Summary
When the predictors are correlated, a quadratic term and an interaction term will carry similar information. This can cause either the quadratic model or the interaction model to be significant; but when both terms are included, because they are so similar, neither may be significant. Standard diagnostics for multicollinearity, such as VIFs, may fail to detect any of this. Even a diagnostic plot, designed specifically to detect the effect of using a quadratic model in place of an interaction, may fail to determine which model is best.
Analysis
The thrust of this analysis, and its main strength, is to characterize situations like the one described in the question. With such a characterization available, it is then an easy task to simulate data that behave accordingly.
Consider two predictors $X_1$ and $X_2$ (which we will automatically standardize so that each has unit variance in the dataset) and suppose the random response $Y$ is determined by these predictors and their interaction plus independent random error:
$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_{1,2} X_1 X_2 + \varepsilon.$$
In many cases the predictors are correlated. A dataset might look like this:
These sample data were generated with $\beta_1 = \beta_2 = 1$ and $\beta_{1,2} = 0.1$. The correlation between $X_1$ and $X_2$ is $0.85$.
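A minimal R sketch of such a simulation follows. The original seed and the error standard deviation are not reported, so `set.seed(17)` and `sd = 0.25` are assumptions, chosen here to mimic the residual standard errors near $0.26$ reported below:

```r
set.seed(17)                                   # arbitrary; the original seed is unknown
n  <- 150
x0 <- rnorm(n)                                 # common component shared by both predictors
d  <- sqrt(1/0.85 - 1)                         # noise SD chosen so cor(x1, x2) is near 0.85
x1 <- as.vector(scale(x0 + rnorm(n, sd = d)))  # standardize to unit variance
x2 <- as.vector(scale(x0 + rnorm(n, sd = d)))
y  <- 1*x1 + 1*x2 + 0.1*x1*x2 + rnorm(n, sd = 0.25)  # beta1 = beta2 = 1, beta12 = 0.1
df <- data.frame(x1, x2, y)
cor(df$x1, df$x2)                              # should be close to 0.85
```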
This is not to say that we necessarily think of $X_1$ and $X_2$ as realizations of random variables: it can include the situation where $X_1$ and $X_2$ are settings in a designed experiment, but for some reason those settings are not orthogonal.
Regardless of how the correlation arises, a good way to describe it is in terms of how much the predictors differ from their average, $X_0 = (X_1 + X_2)/2$. These differences will be fairly small (in the sense that their variances are less than $1$); the greater the correlation between $X_1$ and $X_2$, the smaller these differences will be. Writing, then, $X_1 = X_0 + \delta_1$ and $X_2 = X_0 + \delta_2$, we may re-express (say) $X_2$ in terms of $X_1$ as $X_2 = X_1 + (\delta_2 - \delta_1)$. Used in the interaction term only, the model becomes
$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_{1,2} X_1\left(X_1 + [\delta_2 - \delta_1]\right) + \varepsilon = \left(\beta_1 + \beta_{1,2}[\delta_2 - \delta_1]\right) X_1 + \beta_2 X_2 + \beta_{1,2} X_1^2 + \varepsilon.$$
Provided the values of $\beta_{1,2}[\delta_2 - \delta_1]$ vary only a little compared to $\beta_1$, we can gather this variation with the true random terms, writing
$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_{1,2} X_1^2 + \left(\varepsilon + \beta_{1,2}[\delta_2 - \delta_1] X_1\right).$$
Thus, if we regress $Y$ against $X_1$, $X_2$, and $X_1^2$, we will be making an error: the variation in the residuals will depend on $X_1$ (that is, it will be heteroscedastic). This can be seen with a simple variance calculation:
$$\operatorname{var}\left(\varepsilon + \beta_{1,2}[\delta_2 - \delta_1] X_1\right) = \operatorname{var}(\varepsilon) + \left[\beta_{1,2}^2 \operatorname{var}(\delta_2 - \delta_1)\right] X_1^2.$$
However, if the typical variation in $\varepsilon$ substantially exceeds the typical variation in $\beta_{1,2}[\delta_2 - \delta_1] X_1$, that heteroscedasticity will be so low as to be undetectable (and should yield a fine model). (As shown below, one way to look for this violation of regression assumptions is to plot the absolute values of the residuals against the absolute values of $X_1$, remembering first to standardize $X_1$ if necessary.) This is the characterization we were seeking.
Remembering that $X_1$ and $X_2$ were assumed to be standardized to unit variance, this implies the variance of $\delta_2 - \delta_1$ will be relatively small. To reproduce the observed behavior, then, it should suffice to pick a small absolute value for $\beta_{1,2}$, but make it large enough (or use a large enough dataset) so that it will be significant.
In short, when the predictors are correlated and the interaction is small but not too small, a quadratic term (in either predictor alone) and an interaction term will be individually significant but confounded with each other. Statistical methods alone are unlikely to help us decide which is better to use.
Example
Let's check this out with the sample data by fitting several models. Recall that $\beta_{1,2}$ was set to $0.1$ when simulating these data. Although that is small (the quadratic behavior is not even visible in the previous scatterplots), with $150$ data points we have a chance of detecting it.
First, the quadratic model:
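The fitting call itself is not shown in the original; with the simulated data frame `df` from the sketch above, a call along these lines would produce output of this form:

```r
fit.quad <- lm(y ~ x1 + x2 + I(x1^2), data = df)  # quadratic in x1, no interaction
summary(fit.quad)
```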
```
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.03363    0.03046   1.104  0.27130
x1           0.92188    0.04081  22.592  < 2e-16 ***
x2           1.05208    0.04085  25.756  < 2e-16 ***
I(x1^2)      0.06776    0.02157   3.141  0.00204 **

Residual standard error: 0.2651 on 146 degrees of freedom
Multiple R-squared: 0.9812, Adjusted R-squared: 0.9808
```
The quadratic term is significant. Its coefficient, $0.068$, underestimates $\beta_{1,2} = 0.1$, but it is of the right size and the right sign. As a check for multicollinearity (correlation among the predictors) we compute the variance inflation factors (VIF):
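The original shows only the output; presumably it came from `vif()` in the `car` package, though any VIF implementation would do:

```r
library(car)     # provides vif(); an assumption about the package used
vif(fit.quad)
```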
```
      x1       x2  I(x1^2) 
3.531167 3.538512 1.009199 
```
Any value less than 5 is usually considered just fine. These are not alarming.
Next, the model with an interaction but no quadratic term:
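Again the call is assumed rather than shown in the original:

```r
fit.int <- lm(y ~ x1 + x2 + x1:x2, data = df)  # interaction, no quadratic term
summary(fit.int)
vif(fit.int)
```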
```
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.02887    0.02975    0.97 0.333420
x1           0.93157    0.04036   23.08  < 2e-16 ***
x2           1.04580    0.04039   25.89  < 2e-16 ***
x1:x2        0.08581    0.02451    3.50 0.000617 ***

Residual standard error: 0.2631 on 146 degrees of freedom
Multiple R-squared: 0.9815, Adjusted R-squared: 0.9811

      x1       x2    x1:x2 
3.506569 3.512599 1.004566 
```
All the results are similar to the previous ones. The two models are about equally good (with a very tiny advantage to the interaction model).
Finally, let's include both the interaction and quadratic terms:
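A sketch of the corresponding call, under the same assumptions as before:

```r
fit.both <- lm(y ~ x1 + x2 + I(x1^2) + x1:x2, data = df)  # both terms at once
summary(fit.both)
vif(fit.both)
```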
```
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.02572    0.03074   0.837    0.404
x1           0.92911    0.04088  22.729   <2e-16 ***
x2           1.04771    0.04075  25.710   <2e-16 ***
I(x1^2)      0.01677    0.03926   0.427    0.670
x1:x2        0.06973    0.04495   1.551    0.123

Residual standard error: 0.2638 on 145 degrees of freedom
Multiple R-squared: 0.9815, Adjusted R-squared: 0.981

      x1       x2  I(x1^2)    x1:x2 
3.577700 3.555465 3.374533 3.359040 
```
Now neither the quadratic term nor the interaction term is significant, because each is trying to estimate a part of the interaction in the model. Another way to see this is that nothing was gained (in terms of reducing the residual standard error) by adding the quadratic term to the interaction model or by adding the interaction term to the quadratic model. It is noteworthy that the VIFs do not detect this situation: although the fundamental explanation for what we have seen is the slight collinearity between $X_1$ and $X_2$, which induces a collinearity between $X_1^2$ and $X_1 X_2$, neither is large enough to raise flags.
If we had tried to detect the heteroscedasticity in the quadratic model (the first one), we would be disappointed:
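The plot described below can be reproduced along these lines (a sketch, reusing the assumed `df` and `fit.quad` from above):

```r
r <- abs(residuals(fit.quad))
plot(abs(df$x1), r, xlab = "|x1|", ylab = "|Residual|")  # look for a trend in spread
lines(lowess(abs(df$x1), r), col = "red", lwd = 2)       # lowess smooth of residual sizes
```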
In the loess smooth of this scatterplot there is ever so faint a hint that the sizes of the residuals increase with $|X_1|$, but nobody would take this hint seriously.