Question about how to normalize regression coefficients

I'm not sure "normalize" is the right word to use here, but I will try my best to describe what I want to ask. The estimator used here is least squares.

Suppose you have $y=\beta_0+\beta_1x_1$. You can center it around the mean as $y=\beta_0'+\beta_1x_1'$, where $\beta_0'=\beta_0+\beta_1\bar x_1$ and $x_1'=x_1-\bar x_1$, so that $\beta_0'$ no longer has any influence on the estimate of $\beta_1$.

By this I mean that $\hat\beta_1$ in $y=\beta_1x_1'$ is equivalent to $\hat\beta_1$ in $y=\beta_0+\beta_1x_1$. We have reduced the equation to make the least-squares calculation easier.
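A minimal R sketch of this centering claim, assuming simulated data with arbitrary coefficients chosen only for illustration. Because the centered $x_1'$ sums to zero, the no-intercept slope $\sum x_1' y / \sum x_1'^2$ coincides with the ordinary slope from the fit with an intercept.

```r
# Simulated data (arbitrary choices, for illustration only)
set.seed(1)
x1 <- runif(30, 0, 10)
y  <- 2 + 0.5 * x1 + rnorm(30)

# Ordinary least-squares fit with an intercept
slope_with_intercept <- unname(coef(lm(y ~ x1))["x1"])

# Center x1 and fit through the origin (no intercept)
x1_centered    <- x1 - mean(x1)
slope_centered <- unname(coef(lm(y ~ x1_centered - 1)))

# The same slope in closed form: sum(x' * y) / sum(x'^2)
slope_closed_form <- sum(x1_centered * y) / sum(x1_centered^2)

print(c(slope_with_intercept, slope_centered, slope_closed_form))
```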

How would you apply this method in general? I now have the model $y=\beta_1e^{x_1t}+\beta_2e^{x_2t}$, and I am trying to reduce it to $y=\beta_1x'$.

— Saber CN

What kind of data are you analyzing, and why do you want to remove the covariate, $e^{x_1t}$, from your model? Also, is there a reason you are removing the intercept? If you center the data, the slope will be the same in the model with or without the intercept, but the model with the intercept will fit your data better.

— caburke

@caburke I am not worried about the model fit, because after I calculate $\beta_1$ and $\beta_2$ I can plug them back into the model. The point of this exercise is to estimate $\beta_1$. By reducing the original equation to just $y=\beta_1x'$, the least-squares calculation becomes easier ($x'$ is part of what I'm trying to figure out; it probably involves $e^{x_1t}$). I'm trying to learn the mechanics; this is a question from a book by Tukey.

— Saber CN

@ca The observation at the end of your comment is puzzling. It cannot possibly apply to nonlinear expressions (they contain nothing that could be considered a "slope"), but it isn't true in the OLS setting either: the fit to mean-centered data is just as good as the fit with an intercept. Saber, your model is ambiguous: which of $\beta_1, \beta_2, x_1, x_2, t$ are variables and which are parameters? What is the intended error structure? (And which Tukey book is the question from?)

— Whuber

@whuber It is from Tukey's book "Data Analysis and Regression: A Second Course in Statistics", chapter 14A. $\beta_1,\beta_2$ are the parameters we are trying to estimate, and $x_1,x_2$ are variables, each with $n$ observations. I assume $t$ is a time variable associated with the observations, but it is not specified. The errors are supposed to be normal and can be ignored for this question.

— Saber CN

@whuber I was mostly referring to the first part of the post, but this was unclear in my comment. What I meant is that if you only mean-center $x$, and not $y$, as the OP seems to suggest, and then remove the intercept, the fit will be worse, because it is not necessarily the case that $\bar{y}=0$. "Slope" is clearly not a good term for the coefficients in the model mentioned in the last line of the OP.

— caburke
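Both points in this exchange can be checked numerically. The R sketch below assumes simulated data whose mean is far from zero (arbitrary choices for illustration). Centering both $x$ and $y$ and dropping the intercept reproduces the residual sum of squares of the fit with an intercept, while centering only $x$ and dropping the intercept inflates it by exactly $n\bar y^2$, because the centered $x$ is orthogonal to the constant vector.

```r
# Simulated data whose mean of y is far from zero (arbitrary choices)
set.seed(5)
x <- runif(40, 0, 10)
y <- 20 + 1.5 * x + rnorm(40)

rss <- function(fit) sum(resid(fit)^2)
xc  <- x - mean(x)
yc  <- y - mean(y)

rss_intercept   <- rss(lm(y ~ x))         # usual fit with an intercept
rss_center_both <- rss(lm(yc ~ xc - 1))   # center x and y, no intercept
rss_center_x    <- rss(lm(y ~ xc - 1))    # center only x, no intercept

print(c(rss_intercept, rss_center_both, rss_center_x))
```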

Although I cannot do justice to the question here (that would require a small monograph), it may be helpful to recapitulate some key ideas.

The question

Let's begin by restating the question in unambiguous terminology. The data consist of a list of ordered pairs $(t_i, y_i)$. Known constants $\alpha_1$ and $\alpha_2$ determine the values $x_{1,i} = \exp(\alpha_1 t_i)$ and $x_{2,i} = \exp(\alpha_2 t_i)$. We posit a model in which

$y_i = \beta_1 x_{1,i} + \beta_2 x_{2,i} + \varepsilon_i$

for constants $\beta_1$ and $\beta_2$ to be estimated, where the $\varepsilon_i$ are random and (to a good approximation) independent with a common variance (whose estimation is also of interest).

Background: linear "matching"

Mosteller and Tukey refer to the variables $x_1 = (x_{1,1}, x_{1,2}, \ldots)$ and $x_2$ as "matchers." They will be used to "match" the values of $y = (y_1, y_2, \ldots)$ in a specific way, which I will illustrate. More generally, let $y$ and $x$ be any two vectors in the same Euclidean vector space, with $y$ playing the role of "target" and $x$ that of "matcher". We contemplate systematically varying a coefficient $\lambda$ in order to approximate $y$ by the multiple $\lambda x$. The best approximation is obtained when $\lambda x$ is as close to $y$ as possible. Equivalently, the squared length of $y - \lambda x$ is minimized.

One way to visualize this matching process is to make a scatterplot of $x$ and $y$ on which is drawn the graph of $x \to \lambda x$. The vertical distances between the scatterplot points and this graph are the components of the residual vector $y - \lambda x$; the sum of their squares is to be made as small as possible. Up to a constant of proportionality, these squares are the areas of circles centered at the points $(x_i, y_i)$ with radii equal to the residuals: we want to minimize the sum of the areas of all these circles.
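Setting the derivative of $\|y - \lambda x\|^2$ with respect to $\lambda$ to zero gives $\lambda = \sum_i x_i y_i / \sum_i x_i^2$, and the resulting residual vector is orthogonal to $x$. A minimal R sketch of a single matching step; the helper name `match_coef` and the simulated data are illustrative assumptions, not from the text.

```r
# Hypothetical helper: the least-squares multiple lambda for matching
# target y by matcher x with a line through the origin.
match_coef <- function(y, x) sum(x * y) / sum(x * x)

# Simulated example data (arbitrary choices)
set.seed(2)
x <- rnorm(20)
y <- 3 * x + rnorm(20, sd = 0.3)

lambda   <- match_coef(y, x)
residual <- y - lambda * x

# Same value as R's no-intercept regression of y on x
print(c(lambda, unname(coef(lm(y ~ x - 1)))))

# The residual vector is orthogonal to the matcher (up to rounding)
print(sum(residual * x))

# Nudging lambda either way increases the sum of squared residuals
ss <- function(l) sum((y - l * x)^2)
print(c(ss(lambda - 0.1), ss(lambda), ss(lambda + 0.1)))
```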

Here is an example showing the optimal value of $\lambda$ in the middle panel:

[Figure: panels illustrating the matching of $y$ by $\lambda x$]

The points in the scatterplot are blue; the graph of $x \to \lambda x$ is a red line. This illustration emphasizes that the red line is constrained to pass through the origin $(0,0)$ : it is a very special case of line fitting.

Multiple regression can be obtained by sequential matching

Returning to the setting of the question, we have one target $y$ and two matchers $x_1$ and $x_2$. We seek numbers $b_1$ and $b_2$ for which $b_1 x_1 + b_2 x_2$ approximates $y$ as closely as possible, again in the sense of least distance. Beginning arbitrarily with $x_1$, Mosteller & Tukey match the remaining variables $x_2$ and $y$ to $x_1$. Write the residuals for these matches as $x_{2\cdot 1}$ and $y_{\cdot 1}$, respectively: the $_{\cdot 1}$ indicates that $x_1$ has been "taken out of" the variable.

We can write

$y = \lambda_1 x_1 + y_{\cdot 1}\text{ and }x_2 = \lambda_2 x_1 + x_{2\cdot 1}.$

Having taken $x_1$ out of $x_2$ and $y$ , we proceed to match the target residuals $y_{\cdot 1}$ to the matcher residuals $x_{2\cdot 1}$ . The final residuals are $y_{\cdot 12}$ . Algebraically, we have written

$\begin{aligned} y_{\cdot 1} &= \lambda_3 x_{2\cdot 1} + y_{\cdot 12}; \text{ whence} \\ y &= \lambda_1 x_1 + y_{\cdot 1} = \lambda_1 x_1 + \lambda_3 x_{2\cdot 1} + y_{\cdot 12} = \lambda_1 x_1 + \lambda_3 \left(x_2 - \lambda_2 x_1\right) + y_{\cdot 12} \\ &= \left(\lambda_1 - \lambda_3 \lambda_2\right)x_1 + \lambda_3 x_2 + y_{\cdot 12}. \end{aligned}$

This shows that the $\lambda_3$ in the last step is the coefficient of $x_2$ in a matching of $x_1$ and $x_2$ to $y$ .
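The algebra can be checked numerically. This R sketch reuses the simulation settings from the answer's code ($\alpha_1 = -0.1$, $\alpha_2 = 0.025$, $\beta = (5, -1)$, error standard deviation $1/2$) but a different seed; the helper `take_out` here is a hypothetical variant of the answer's `take.out` that also returns the matching coefficient. It confirms that $\lambda_1 - \lambda_3\lambda_2$ and $\lambda_3$ are the multiple-regression coefficients of $x_1$ and $x_2$.

```r
# Hypothetical helper: match y by x through the origin and return
# both the coefficient and the residuals.
take_out <- function(y, x) {
  lambda <- sum(x * y) / sum(x * x)
  list(lambda = lambda, resid = y - lambda * x)
}

set.seed(3)
t.var <- 1:50
x1 <- exp(-0.1 * t.var)
x2 <- exp(0.025 * t.var)
y  <- 5 * x1 - x2 + rnorm(50, sd = 0.5)

m1 <- take_out(y, x1)               # y   = lambda1 * x1   + y.1
m2 <- take_out(x2, x1)              # x2  = lambda2 * x1   + x2.1
m3 <- take_out(m1$resid, m2$resid)  # y.1 = lambda3 * x2.1 + y.12

sequential <- c(x1 = m1$lambda - m3$lambda * m2$lambda, x2 = m3$lambda)
multiple   <- coef(lm(y ~ x1 + x2 - 1))
print(rbind(sequential, multiple))
```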

We could just as well have proceeded by first taking $x_2$ out of $x_1$ and $y$, producing $x_{1\cdot 2}$ and $y_{\cdot 2}$, and then taking $x_{1\cdot 2}$ out of $y_{\cdot 2}$, yielding a different set of residuals $y_{\cdot 21}$. This time, the coefficient of $x_1$ found in the last step (let's call it $\mu_3$) is the coefficient of $x_1$ in a matching of $x_1$ and $x_2$ to $y$.

Finally, for comparison, we might run a multiple (ordinary least squares) regression of $y$ against $x_1$ and $x_2$. Let those residuals be $y_{\cdot lm}$. It turns out that the coefficients in this multiple regression are precisely the coefficients $\mu_3$ and $\lambda_3$ found previously, and that all three sets of residuals, $y_{\cdot 12}$, $y_{\cdot 21}$, and $y_{\cdot lm}$, are identical.
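A compact numerical check of these claims, written as an R sketch with assertions rather than plots. It assumes the same simulation settings as the answer's code (with a different seed), and the helpers `take_out` and `coef_of` are hypothetical names introduced here.

```r
# Hypothetical helpers for no-intercept matching
coef_of  <- function(y, x) sum(x * y) / sum(x * x)
take_out <- function(y, x) y - coef_of(y, x) * x   # residuals only

set.seed(4)
t.var <- 1:50
x1 <- exp(-0.1 * t.var)
x2 <- exp(0.025 * t.var)
y  <- 5 * x1 - x2 + rnorm(50, sd = 0.5)

# Order: take out x1 first, then x2.1
y.1  <- take_out(y, x1);    x2.1 <- take_out(x2, x1)
y.12 <- take_out(y.1, x2.1); lambda3 <- coef_of(y.1, x2.1)

# Order: take out x2 first, then x1.2
y.2  <- take_out(y, x2);    x1.2 <- take_out(x1, x2)
y.21 <- take_out(y.2, x1.2); mu3 <- coef_of(y.2, x1.2)

# Multiple regression for comparison
fit  <- lm(y ~ x1 + x2 - 1)
y.lm <- resid(fit)

print(rbind(matching = c(mu3, lambda3), lm = coef(fit)))
print(c(max(abs(y.12 - y.21)), max(abs(y.12 - y.lm))))
```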

Depicting the process

None of this is new: it's all in the text. I would like to offer a pictorial analysis, using a scatterplot matrix of everything we have obtained so far.

[Figure: scatterplot matrix of the data, the matchers, and all residuals]

Because these data are simulated, we have the luxury of showing the underlying "true" values of $y$ on the last row and column: these are the values $\beta_1 x_1 + \beta_2 x_2$ without the error added in.

The scatterplots below the diagonal have been decorated with the graphs of the matchers, exactly as in the first figure. Graphs with zero slopes are drawn in red: these indicate situations where the matcher gives us nothing new; the residuals are the same as the target. Also, for reference, the origin (wherever it appears within a plot) is shown as an open red circle: recall that all possible matching lines have to pass through this point.

Much can be learned about regression through studying this plot. Some of the highlights are:

The matching of $x_2$ to $x_1$ (row 2, column 1) is poor. This is a good thing: it indicates that $x_1$ and $x_2$ are providing very different information; using both together will likely be a much better fit to $y$ than using either one alone.
Once a variable has been taken out of a target, it does no good to try to take that variable out again: the best matching line will be zero. See the scatterplots for $x_{2\cdot 1}$ versus $x_1$ or $y_{\cdot 1}$ versus $x_1$ , for instance.
The values $x_1$ , $x_2$ , $x_{1\cdot 2}$ , and $x_{2\cdot 1}$ have all been taken out of $y_{\cdot lm}$ .
Multiple regression of $y$ against $x_1$ and $x_2$ can be achieved by first computing $y_{\cdot 1}$ and $x_{2\cdot 1}$. These scatterplots appear at (row, column) = $(8,1)$ and $(2,1)$, respectively. With these residuals in hand, we look at their scatterplot at $(4,3)$. These three one-variable regressions do the trick. As Mosteller & Tukey explain, the standard errors of the coefficients can be obtained almost as easily from these regressions, too, but that's not the topic of this question, so I will stop here.

Code

These data were (reproducibly) created in R with a simulation. The analyses, checks, and plots were also produced with R. This is the code.

#
# Simulate the data.
#
set.seed(17)
t.var <- 1:50                                    # The "times" t[i]
x <- exp(t.var %o% c(x1=-0.1, x2=0.025) )        # The two "matchers" x[1,] and x[2,]
beta <- c(5, -1)                                 # The (unknown) coefficients
sigma <- 1/2                                     # Standard deviation of the errors
error <- sigma * rnorm(length(t.var))            # Simulated errors
y <- (y.true <- as.vector(x %*% beta)) + error   # True and simulated y values
data <- data.frame(t.var, x, y, y.true)

par(col="Black", bty="o", lty=0, pch=1)
pairs(data)                                      # Get a close look at the data
#
# Take out the various matchers.
#
take.out <- function(y, x) {fit <- lm(y ~ x - 1); resid(fit)}
data <- transform(transform(data, 
  x2.1 = take.out(x2, x1),
  y.1 = take.out(y, x1),
  x1.2 = take.out(x1, x2),
  y.2 = take.out(y, x2)
), 
  y.21 = take.out(y.2, x1.2),
  y.12 = take.out(y.1, x2.1)
)
data$y.lm <- resid(lm(y ~ x - 1))               # Multiple regression for comparison
#
# Analysis.
#
# Reorder the dataframe (for presentation):
data <- data[c(1:3, 5:12, 4)]

# Confirm that the three ways to obtain the fit are the same:
pairs(subset(data, select=c(y.12, y.21, y.lm)))

# Explore what happened:
panel.lm <- function (x, y, col=par("col"), bg=NA, pch=par("pch"),
   cex=1, col.smooth="red",  ...) {
  box(col="Gray", bty="o")
  ok <- is.finite(x) & is.finite(y)
  if (any(ok))  {
    b <- coef(lm(y[ok] ~ x[ok] - 1))
    col0 <- ifelse(abs(b) < 10^-8, "Red", "Blue")
    lwd0 <- ifelse(abs(b) < 10^-8, 3, 2)
    abline(c(0, b), col=col0, lwd=lwd0)
  }
  points(x, y, pch = pch, col="Black", bg = bg, cex = cex)    
  points(matrix(c(0,0), nrow=1), col="Red", pch=1)
}
panel.hist <- function(x, ...) {
  usr <- par("usr"); on.exit(par(usr = usr))      # restore the plot region on exit
  par(usr = c(usr[1:2], 0, 1.5) )
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks; nB <- length(breaks)
  y <- h$counts; y <- y/max(y)
  rect(breaks[-nB], 0, breaks[-1], y,  ...)
}
par(lty=1, pch=19, col="Gray")
pairs(subset(data, select=c(-t.var, -y.12, -y.21)), col="Gray", cex=0.8, 
   lower.panel=panel.lm, diag.panel=panel.hist)

# Additional interesting plots:
par(col="Black", pch=1)
#pairs(subset(data, select=c(-t.var, -x1.2, -y.2, -y.21)))
#pairs(subset(data, select=c(-t.var, -x1, -x2)))
#pairs(subset(data, select=c(x2.1, y.1, y.12)))

# Details of the variances, showing how to obtain multiple regression
# standard errors from the OLS matches.
norm <- function(x) sqrt(sum(x * x))
lapply(data, norm)
s <- summary(lm(y ~ x1 + x2 - 1, data=data))
c(s$sigma, s$coefficients["x1", "Std. Error"] * norm(data$x1.2)) # Equal
c(s$sigma, s$coefficients["x2", "Std. Error"] * norm(data$x2.1)) # Equal
c(s$sigma, norm(data$y.12) / sqrt(length(data$y.12) - 2))        # Equal

— whuber

Could multiple regression of $y$ against $x_1$ and $x_2$ still be achieved by first computing $y_{\cdot 1}$ and $x_{2\cdot 1}$ if $x_1$ and $x_2$ were correlated? Wouldn't it then make a big difference whether we sequentially regressed $y$ on $x_1$ and $x_{2\cdot 1}$ or on $x_2$ and $x_{1\cdot 2}$? How does this relate to one regression equation with multiple explanatory variables?

— miura

@miura, One of the leitmotifs of that chapter in Mosteller & Tukey is that when the $x_i$ are correlated, the partials $x_{i\cdot j}$ have low variances; because their variances appear in the denominator of a formula for the estimation variance of their coefficients, this implies the corresponding coefficients will have relatively uncertain estimates. That's a fact of the data, M&T say, and you need to recognize that. It makes no difference whether you start the regression with $x_1$ or $x_2$: compare y.21 to y.12 in my code.

— whuber
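A small R sketch of this point, assuming simulated data (the sample size, coefficients, and the 0.1 noise level that makes $x_2$ nearly collinear with $x_1$ are arbitrary choices; `take_out`, `norm2`, and `se_check` are hypothetical helper names). It uses the identity $\operatorname{SE}(\hat b_2) = \hat\sigma / \|x_{2\cdot 1}\|$ for the no-intercept fit: when $x_1$ and $x_2$ are strongly correlated, $\|x_{2\cdot 1}\|$ shrinks and the standard error grows, yet the sweep order still gives identical residuals.

```r
take_out <- function(y, x) y - (sum(x * y) / sum(x * x)) * x
norm2    <- function(v) sqrt(sum(v * v))

# Compare ||x2.1||, the reported SE of b2, and sigma / ||x2.1||
se_check <- function(x1, x2, y) {
  s    <- summary(lm(y ~ x1 + x2 - 1))
  x2.1 <- take_out(x2, x1)
  c(norm_x2.1       = norm2(x2.1),
    se_b2           = s$coefficients["x2", "Std. Error"],
    sigma_over_norm = s$sigma / norm2(x2.1))
}

set.seed(6)
n <- 100
x1       <- rnorm(n)
x2_uncor <- rnorm(n)                  # nearly uncorrelated with x1
x2_cor   <- x1 + rnorm(n, sd = 0.1)   # strongly correlated with x1
y_uncor  <- 2 * x1 + 3 * x2_uncor + rnorm(n)
y_cor    <- 2 * x1 + 3 * x2_cor + rnorm(n)

res <- rbind(uncorrelated = se_check(x1, x2_uncor, y_uncor),
             correlated   = se_check(x1, x2_cor, y_cor))
print(res)

# The order of sweeping does not matter, even with correlated matchers
y.12 <- take_out(take_out(y_cor, x1), take_out(x2_cor, x1))
y.21 <- take_out(take_out(y_cor, x2_cor), take_out(x1, x2_cor))
print(max(abs(y.12 - y.21)))
```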

I came across this today; here is what I think about the question by @miura. Think of a two-dimensional space where $y$ is to be projected as a combination of two vectors: $y = a x_1 + b x_2 + \text{res}$ (with $\text{res} = 0$). Now think of $y$ as a combination of three variables, $y = a x_1 + b x_2 + c x_3$, where $x_3 = m x_1 + n x_2$. Then certainly the order in which you choose your variables is going to affect the coefficients. The reason is that the minimum error here can be obtained by various combinations. However, in some examples the minimum error can be obtained by only one combination, and that is where the order will not matter.

— Gaurav Singhal

@whuber Can you elaborate on how this equation might be used for a multivariate regression that also has a constant term, i.e., $y = B_1 x_1 + B_2 x_2 + c$? It is not clear to me how the constant term can be derived. Also, I understand in general what was done for the 2 variables, at least enough to replicate it in Excel. How can that be extended to 3 variables, $x_1, x_2, x_3$? It seems clear that we would need to remove $x_3$ first from $y$, $x_1$, and $x_2$, then remove $x_2$ from $x_1$ and $y$. But it is not clear to me how to then get the $B_3$ term.

— Fairly Nerdy
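One way to handle the constant term, sketched here in R under the assumption that the constant is treated as one more matcher: the vector of ones. Taking that vector out of a variable simply subtracts its mean, which connects back to the centering in the original question. After sweeping out the constant, the two-matcher procedure applies unchanged, and the constant is recovered as $c = \bar y - B_1 \bar x_1 - B_2 \bar x_2$. The data and helper names are illustrative assumptions.

```r
# Hypothetical helpers for no-intercept matching
coef_of  <- function(y, x) sum(x * y) / sum(x * x)
take_out <- function(y, x) y - coef_of(y, x) * x

set.seed(7)
n  <- 60
x1 <- runif(n); x2 <- runif(n)
y  <- 4 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.2)
ones <- rep(1, n)

# Taking the constant vector out of a variable is just centering it
print(max(abs(take_out(x1, ones) - (x1 - mean(x1)))))

# Sweep out the constant, then proceed as in the two-matcher case
y.0  <- take_out(y, ones); x1.0 <- take_out(x1, ones); x2.0 <- take_out(x2, ones)
B2 <- coef_of(take_out(y.0, x1.0), take_out(x2.0, x1.0))
B1 <- coef_of(take_out(y.0, x2.0), take_out(x1.0, x2.0))

# Recover the constant from the means
c0 <- mean(y) - B1 * mean(x1) - B2 * mean(x2)

print(rbind(sweep = c(c0, B1, B2), lm = coef(lm(y ~ x1 + x2))))
```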

I have answered some of my questions in the comment above. For a 3-variable regression, we would have 6 steps. Remove $x_1$ from $x_2$, from $x_3$, and from $y$. Then remove $x_{2\cdot 1}$ from $x_{3\cdot 1}$ and from $y_{\cdot 1}$. Then remove $x_{3\cdot 12}$ from $y_{\cdot 12}$. That yields 6 equations, each of the form variable = lambda × a different variable + residual. One of the equations has $y$ as the first variable, and if you keep substituting the other variables, you get the equation you need.

— Fairly Nerdy
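The six-step procedure just described can be sketched in R, assuming no constant term and simulated data (both arbitrary choices; the helpers `coef_of` and `take_out` are hypothetical names). The last matching gives $B_3$ directly; back-substituting $x_{2\cdot 1} = x_2 - a_2 x_1$ and $x_{3\cdot 12} = x_3 - a_3 x_1 - b_2 x_{2\cdot 1}$ into $y = a_1 x_1 + b_1 x_{2\cdot 1} + B_3 x_{3\cdot 12} + y_{\cdot 123}$ gives the other two coefficients.

```r
# Hypothetical helpers for no-intercept matching
coef_of  <- function(y, x) sum(x * y) / sum(x * x)
take_out <- function(y, x) y - coef_of(y, x) * x

set.seed(8)
n  <- 80
x1 <- rnorm(n)
x2 <- rnorm(n) + 0.5 * x1
x3 <- rnorm(n) - 0.3 * x2
y  <- 1 * x1 - 2 * x2 + 0.5 * x3 + rnorm(n, sd = 0.3)

# Steps 1-3: take x1 out of y, x2, and x3
a1 <- coef_of(y, x1);  y.1  <- take_out(y, x1)
a2 <- coef_of(x2, x1); x2.1 <- take_out(x2, x1)
a3 <- coef_of(x3, x1); x3.1 <- take_out(x3, x1)
# Steps 4-5: take x2.1 out of y.1 and x3.1
b1 <- coef_of(y.1, x2.1);  y.12  <- take_out(y.1, x2.1)
b2 <- coef_of(x3.1, x2.1); x3.12 <- take_out(x3.1, x2.1)
# Step 6: take x3.12 out of y.12
B3 <- coef_of(y.12, x3.12); y.123 <- take_out(y.12, x3.12)

# Back-substitution for the remaining coefficients
B2 <- b1 - B3 * b2
B1 <- a1 - b1 * a2 - B3 * a3 + B3 * b2 * a2

fit <- lm(y ~ x1 + x2 + x3 - 1)
print(rbind(sweep = c(B1, B2, B3), lm = coef(fit)))
```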