Summary
When the predictors are correlated, a quadratic term and an interaction term will carry similar information. This can cause either the quadratic model or the interaction model to be significant; but when both terms are included, because they are so similar, neither may be significant. Standard diagnostics for multicollinearity, such as VIFs, may fail to detect any of this. Even a diagnostic plot, designed specifically to detect the effect of using a quadratic model in place of an interaction, may fail to determine which model is best.
Analysis
The thrust of this analysis, and its main strength, is to characterize situations like the one described in the question. With such a characterization available, it is then an easy task to simulate data that behave accordingly.
Consider two predictors $X_1$ and $X_2$ (which we will automatically standardize so that each has unit variance in the dataset) and suppose the random response $Y$ is determined by these predictors and their interaction plus independent random error:
$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_{1,2} X_1 X_2 + \varepsilon.$$
In many cases the predictors are correlated. A dataset might look like this:
These sample data were generated with $\beta_1 = \beta_2 = 1$ and $\beta_{1,2} = 0.1$. The correlation between $X_1$ and $X_2$ is $0.85$.
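A minimal R sketch of such a simulation follows. The original seed and the error standard deviation are not reported, so `set.seed(17)` and `sd = 0.25` are assumptions, chosen here to mimic the residual standard errors near $0.26$ reported below:

```r
set.seed(17)                                   # arbitrary; the original seed is unknown
n  <- 150
x0 <- rnorm(n)                                 # common component shared by both predictors
d  <- sqrt(1/0.85 - 1)                         # noise SD chosen so cor(x1, x2) is near 0.85
x1 <- as.vector(scale(x0 + rnorm(n, sd = d)))  # standardize to unit variance
x2 <- as.vector(scale(x0 + rnorm(n, sd = d)))
y  <- 1*x1 + 1*x2 + 0.1*x1*x2 + rnorm(n, sd = 0.25)  # beta1 = beta2 = 1, beta12 = 0.1
df <- data.frame(x1, x2, y)
cor(df$x1, df$x2)                              # should be close to 0.85
```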
This is not to say that we necessarily think of $X_1$ and $X_2$ as realizations of random variables: it can include the situation where $X_1$ and $X_2$ are settings in a designed experiment, but for some reason those settings are not orthogonal.
Regardless of how the correlation arises, a good way to describe it is in terms of how much the predictors differ from their average, $X_0 = (X_1 + X_2)/2$. These differences will be fairly small (in the sense that their variances are less than $1$); the greater the correlation between $X_1$ and $X_2$, the smaller these differences will be. Writing, then, $X_1 = X_0 + \delta_1$ and $X_2 = X_0 + \delta_2$, we may re-express (say) $X_2$ in terms of $X_1$ as $X_2 = X_1 + (\delta_2 - \delta_1)$. Used in the interaction term only, the model becomes
$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_{1,2} X_1\left(X_1 + [\delta_2 - \delta_1]\right) + \varepsilon = \left(\beta_1 + \beta_{1,2}[\delta_2 - \delta_1]\right) X_1 + \beta_2 X_2 + \beta_{1,2} X_1^2 + \varepsilon.$$
Provided the values of $\beta_{1,2}[\delta_2 - \delta_1]$ vary only a little compared to $\beta_1$, we can gather this variation with the true random terms, writing
$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_{1,2} X_1^2 + \left(\varepsilon + \beta_{1,2}[\delta_2 - \delta_1] X_1\right).$$
Thus, if we regress $Y$ against $X_1$, $X_2$, and $X_1^2$, we will be making an error: the variation in the residuals will depend on $X_1$ (that is, it will be heteroscedastic). This can be seen with a simple variance calculation:
$$\operatorname{var}\left(\varepsilon + \beta_{1,2}[\delta_2 - \delta_1] X_1\right) = \operatorname{var}(\varepsilon) + \left[\beta_{1,2}^2 \operatorname{var}(\delta_2 - \delta_1)\right] X_1^2.$$
However, if the typical variation in $\varepsilon$ substantially exceeds the typical variation in $\beta_{1,2}[\delta_2 - \delta_1] X_1$, that heteroscedasticity will be so low as to be undetectable (and should yield a fine model). (As shown below, one way to look for this violation of regression assumptions is to plot the absolute values of the residuals against the absolute values of $X_1$, remembering first to standardize $X_1$ if necessary.) This is the characterization we were seeking.
Remembering that $X_1$ and $X_2$ were assumed to be standardized to unit variance, this implies the variance of $\delta_2 - \delta_1$ will be relatively small. To reproduce the observed behavior, then, it should suffice to pick a small absolute value for $\beta_{1,2}$, but make it large enough (or use a large enough dataset) so that it will be significant.
In short, when the predictors are correlated and the interaction is small but not too small, a quadratic term (in either predictor alone) and an interaction term will be individually significant but confounded with each other. Statistical methods alone are unlikely to help us decide which is better to use.
Example
Let's check this out with the sample data by fitting several models. Recall that $\beta_{1,2}$ was set to $0.1$ when simulating these data. Although that is small (the quadratic behavior is not even visible in the previous scatterplots), with $150$ data points we have a chance of detecting it.
First, the quadratic model:
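The fitting call itself is not shown in the original; with the simulated data frame `df` from the sketch above, a call along these lines would produce output of this form:

```r
fit.quad <- lm(y ~ x1 + x2 + I(x1^2), data = df)  # quadratic in x1, no interaction
summary(fit.quad)
```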
```
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.03363    0.03046   1.104  0.27130
x1           0.92188    0.04081  22.592  < 2e-16 ***
x2           1.05208    0.04085  25.756  < 2e-16 ***
I(x1^2)      0.06776    0.02157   3.141  0.00204 **

Residual standard error: 0.2651 on 146 degrees of freedom
Multiple R-squared: 0.9812, Adjusted R-squared: 0.9808
```
The quadratic term is significant. Its coefficient, $0.068$, underestimates $\beta_{1,2} = 0.1$, but it is of the right size and the right sign. As a check for multicollinearity (correlation among the predictors) we compute the variance inflation factors (VIF):
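The original shows only the output; presumably it came from `vif()` in the `car` package, though any VIF implementation would do:

```r
library(car)     # provides vif(); an assumption about the package used
vif(fit.quad)
```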
```
      x1       x2  I(x1^2) 
3.531167 3.538512 1.009199 
```
Any value less than 5 is usually considered just fine. These are not alarming.
Next, the model with an interaction but no quadratic term:
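Again the call is assumed rather than shown in the original:

```r
fit.int <- lm(y ~ x1 + x2 + x1:x2, data = df)  # interaction, no quadratic term
summary(fit.int)
vif(fit.int)
```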
```
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.02887    0.02975    0.97 0.333420
x1           0.93157    0.04036   23.08  < 2e-16 ***
x2           1.04580    0.04039   25.89  < 2e-16 ***
x1:x2        0.08581    0.02451    3.50 0.000617 ***

Residual standard error: 0.2631 on 146 degrees of freedom
Multiple R-squared: 0.9815, Adjusted R-squared: 0.9811

      x1       x2    x1:x2 
3.506569 3.512599 1.004566 
```
All the results are similar to the previous ones. The two models are about equally good (with a very tiny advantage to the interaction model).
Finally, let's include both the interaction and quadratic terms:
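A sketch of the corresponding call, under the same assumptions as before:

```r
fit.both <- lm(y ~ x1 + x2 + I(x1^2) + x1:x2, data = df)  # both terms at once
summary(fit.both)
vif(fit.both)
```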
```
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.02572    0.03074   0.837    0.404
x1           0.92911    0.04088  22.729   <2e-16 ***
x2           1.04771    0.04075  25.710   <2e-16 ***
I(x1^2)      0.01677    0.03926   0.427    0.670
x1:x2        0.06973    0.04495   1.551    0.123

Residual standard error: 0.2638 on 145 degrees of freedom
Multiple R-squared: 0.9815, Adjusted R-squared: 0.981

      x1       x2  I(x1^2)    x1:x2 
3.577700 3.555465 3.374533 3.359040 
```
Now neither the quadratic term nor the interaction term is significant, because each is trying to estimate a part of the interaction in the model. Another way to see this is that nothing was gained (in terms of reducing the residual standard error) by adding the quadratic term to the interaction model or by adding the interaction term to the quadratic model. It is noteworthy that the VIFs do not detect this situation: although the fundamental explanation for what we have seen is the slight collinearity between $X_1$ and $X_2$, which induces a collinearity between $X_1^2$ and $X_1 X_2$, neither is large enough to raise flags.
If we had tried to detect the heteroscedasticity in the quadratic model (the first one), we would be disappointed:
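The plot described below can be reproduced along these lines (a sketch, reusing the assumed `df` and `fit.quad` from above):

```r
r <- abs(residuals(fit.quad))
plot(abs(df$x1), r, xlab = "|x1|", ylab = "|Residual|")  # look for a trend in spread
lines(lowess(abs(df$x1), r), col = "red", lwd = 2)       # lowess smooth of residual sizes
```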
In the loess smooth of this scatterplot there is ever so faint a hint that the sizes of the residuals increase with $|X_1|$, but nobody would take this hint seriously.