This is a very basic question. Why do we use the chi-square distribution? What does this distribution mean? Why is it the distribution used to construct a confidence interval for the variance?

Everywhere I look for an explanation, this is simply presented as a fact: the sources explain when to use chi-square, but not why to use it or why it looks the way it does.

Many thanks to anyone who can point me in the right direction, which is really understanding why I use chi-square when I construct a confidence interval for the variance.

variance chi-squared

— nafrtiti

You use it because, when the data are normal,

$Q = (n-1)\frac{s^2}{\sigma^2}\sim \chi^2_{n-1}$. (This makes $Q$ a pivotal quantity.)

— Glen_b -Reinstate Monica

See also stats.stackexchange.com/questions/15711/… and its links.

— Nick Cox

For those interested in applications of, or further research into, $\chi^2$: you will want to note the distinction between the $\chi^2$ ("chi-squared") distribution and the $\chi$ ("chi") distribution (which is the square root of a $\chi^2$, unsurprisingly).

— whuber

Quick answer

The reason is that, assuming the data are i.i.d. with $X_i\sim N(\mu,\sigma^2)$, and defining

$$\begin{aligned} \bar{X} &= \sum_{i=1}^{N} \frac{X_i}{N},\\ S^2 &= \sum_{i=1}^{N} \frac{(\bar{X}-X_i)^2}{N-1}, \end{aligned}$$

when forming confidence intervals, the sampling distribution associated with the sample variance ($S^2$, remember, a random variable!) is a chi-square distribution ($S^2(N-1)/\sigma^2 \sim \chi^2_{N-1}$), just as the sampling distribution associated with the sample mean is a standard normal distribution ($(\bar{X}-\mu)\sqrt{N}/\sigma \sim N(0,1)$) when you know the variance, and a Student's t distribution when you don't ($(\bar{X}-\mu)\sqrt{N}/S \sim T_{N-1}$).
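The claim that $(N-1)S^2/\sigma^2$ behaves like a chi-square variable with $N-1$ degrees of freedom can be checked roughly by simulation. The sketch below is only an illustration, not part of the original answer: the function name `simulate_q`, the sample size, the parameter values and the seed are all arbitrary choices. It compares the simulated mean and variance of the statistic with the chi-square values $N-1$ and $2(N-1)$.

```python
import random
import statistics

def simulate_q(n, mu, sigma, reps, seed=0):
    """Return `reps` simulated values of (n-1) * S^2 / sigma^2 for normal samples."""
    rng = random.Random(seed)
    qs = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Unbiased sample variance S^2 (divides by n - 1).
        s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
        qs.append((n - 1) * s2 / sigma ** 2)
    return qs

# Hypothetical settings: N = 10 observations from N(3, 2^2).
qs = simulate_q(n=10, mu=3.0, sigma=2.0, reps=20000)
q_mean = statistics.fmean(qs)
q_var = statistics.variance(qs)

# A chi-square variable with 9 degrees of freedom has mean 9 and variance 18.
print("simulated mean:", q_mean)
print("simulated variance:", q_var)
```

Matching two moments is of course not a proof of the distribution; it is only a sanity check on the statement that the long answer proves.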

Long answer

First of all, we will prove that $S^2(N-1)/\sigma^2$ follows a chi-square distribution with $N-1$ degrees of freedom. After that, we will see how this proof is useful when deriving confidence intervals for the variance, and how the chi-square distribution appears (and why it is so useful!). Let's begin.

The proof

For this, you may first want to get familiar with the chi-square distribution in this Wikipedia article. This distribution has only one parameter, the degrees of freedom $\nu$, and happens to have a Moment Generating Function (MGF) given by

$$m_{\chi^2_\nu}(t)=(1-2t)^{-\nu/2}.$$

If we can show that the distribution of $S^2(N-1)/\sigma^2$ has a moment generating function like this one, but with $\nu=N-1$, then we have shown that $S^2(N-1)/\sigma^2$ follows a chi-square distribution with $N-1$ degrees of freedom. To show this, note two facts:

1. If we define

   $$Y = \sum_{i=1}^{N} \frac{(X_i-\mu)^2}{\sigma^2} = \sum_{i=1}^{N} Z_i^2,$$

   where the $Z_i\sim N(0,1)$ are independent standard normal random variables, the moment generating function of $Y$ is given by

   $$\begin{aligned} m_Y(t) &= \mathbb{E}[e^{tY}]\\ &= \mathbb{E}[e^{tZ_1^2}]\times \mathbb{E}[e^{tZ_2^2}]\times \cdots \times \mathbb{E}[e^{tZ_N^2}]\\ &= m_{Z_1^2}(t)\times m_{Z_2^2}(t)\times \cdots \times m_{Z_N^2}(t), \end{aligned}$$

   where the expectation factorizes because the $Z_i$ are independent. The MGF of $Z^2$ is given by

   $$\begin{aligned} m_{Z^2}(t) &= \int_{-\infty}^{\infty} f(z)\exp(tz^2)\,dz\\ &= (1-2t)^{-1/2}, \end{aligned}$$

   where I have used the PDF of the standard normal, $f(z)=e^{-z^2/2}/\sqrt{2\pi}$, and hence

   $$m_Y(t)=(1-2t)^{-N/2},$$

   which implies that $Y$ follows a chi-square distribution with $N$ degrees of freedom.

2. If $Y_1$ and $Y_2$ are independent and each follows a chi-square distribution, with $\nu_1$ and $\nu_2$ degrees of freedom respectively, then $W=Y_1+Y_2$ follows a chi-square distribution with $\nu_1+\nu_2$ degrees of freedom (this follows from taking the MGF of $W$; do this!).
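The MGF of $Z^2$ in the first fact is quoted without its intermediate steps. One way to fill them in (a sketch, valid for $t<1/2$ so that the integral converges) is to merge the two exponentials and recognise the Gaussian integral $\int_{-\infty}^{\infty} e^{-a z^2/2}\,dz=\sqrt{2\pi/a}$ with $a=1-2t$:

$$\begin{aligned} m_{Z^2}(t) &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\,e^{tz^2}\,dz\\ &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(1-2t)z^2/2}\,dz\\ &= \frac{1}{\sqrt{2\pi}}\sqrt{\frac{2\pi}{1-2t}}\\ &= (1-2t)^{-1/2}. \end{aligned}$$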

With the above facts, note that if you multiply the sample variance by $N-1$, you obtain (after some algebra: write $X_i-\bar{X}=(X_i-\mu)-(\bar{X}-\mu)$ and expand the square),

$$(N-1)S^2 = -N(\bar{X}-\mu)^2+\sum_{i=1}^{N}(X_i-\mu)^2,$$

and hence, dividing by $\sigma^2$,

$$\frac{(N-1)S^2}{\sigma^2}+\frac{(\bar{X}-\mu)^2}{\sigma^2/N}=\sum_{i=1}^{N} \frac{(X_i-\mu)^2}{\sigma^2}.$$

Note that the second term on the left-hand side follows a chi-square distribution with 1 degree of freedom (it is the square of the standard normal variable $(\bar{X}-\mu)\sqrt{N}/\sigma$), and the sum on the right-hand side follows a chi-square distribution with $N$ degrees of freedom. For normal data $\bar{X}$ and $S^2$ are independent, so the MGF of the left-hand side is the product of the MGFs of its two terms: $m_{(N-1)S^2/\sigma^2}(t)\,(1-2t)^{-1/2}=(1-2t)^{-N/2}$. Therefore $m_{(N-1)S^2/\sigma^2}(t)=(1-2t)^{-(N-1)/2}$, that is, $S^2(N-1)/\sigma^2$ follows a chi-square distribution with $N-1$ degrees of freedom.
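The decomposition used in this step is a purely algebraic identity, so it can be checked numerically on any data set. The following is a minimal sketch, assuming nothing beyond the definitions of $\bar{X}$ and $S^2$ given earlier; the helper name `decomposition_terms` and the example values are hypothetical.

```python
import random

def decomposition_terms(sample, mu):
    """Return both sides of (N-1) S^2 + N (xbar - mu)^2 = sum (x_i - mu)^2."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    lhs = (n - 1) * s2 + n * (xbar - mu) ** 2
    rhs = sum((x - mu) ** 2 for x in sample)
    return lhs, rhs

# Arbitrary example: 25 draws from N(5, 2^2), with mu taken as the true mean.
rng = random.Random(42)
sample = [rng.gauss(5.0, 2.0) for _ in range(25)]
lhs, rhs = decomposition_terms(sample, mu=5.0)
print(lhs, rhs)  # the two numbers agree up to floating-point rounding
```

The identity holds for any value of `mu`, not only the true mean; the choice $\mu = E[X_i]$ matters only for the distributional statements that follow.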

Calculating the Confidence Interval for the variance.

When looking for a confidence interval for the variance, you want to know the limits $L_1$ and $L_2$ in

$$\mathbb{P}\left(L_1\leq \sigma^2 \leq L_2\right) = 1-\alpha.$$

Let's play with the inequality inside the parentheses. First, divide by $S^2(N-1)$,

$$\frac{L_1}{S^2(N-1)}\leq \frac{\sigma^2}{S^2(N-1)} \leq \frac{L_2}{S^2(N-1)}.$$

And then remember two things: (1) the statistic $S^2(N-1)/\sigma^2$ has a chi-squared distribution with $N-1$ degrees of freedom, and (2) variances are always greater than zero, which implies that you can invert the inequalities, because

$$\begin{aligned} \frac{L_1}{S^2(N-1)}\leq \frac{\sigma^2}{S^2(N-1)} &\;\Rightarrow\; \frac{S^2(N-1)}{\sigma^2}\leq \frac{S^2(N-1)}{L_1},\\ \frac{\sigma^2}{S^2(N-1)} \leq \frac{L_2}{S^2(N-1)} &\;\Rightarrow\; \frac{S^2(N-1)}{L_2} \leq \frac{S^2(N-1)}{\sigma^2}; \end{aligned}$$

hence, the probability we are looking for is

$$\mathbb{P}\left(\frac{S^2(N-1)}{L_2} \leq \frac{S^2(N-1)}{\sigma^2}\leq \frac{S^2(N-1)}{L_1}\right) = 1-\alpha.$$

Note that $S^2(N-1)/\sigma^2 \sim \chi^2(N-1)$. Writing $p_{\chi^2}$ for the density of that distribution, the usual convention is to put probability $\alpha/2$ in each tail:

$$\begin{aligned} \int_{0}^{\frac{S^2(N-1)}{L_2}}p_{\chi^2}(x)\,dx&=\alpha/2,\\ \int_{\frac{S^2(N-1)}{L_1}}^{\infty}p_{\chi^2}(x)\,dx&=\alpha/2. \end{aligned}$$

Calling $\chi^2_{\alpha/2}=\frac{S^2(N-1)}{L_2}$ and $\chi^2_{1-\alpha/2}= \frac{S^2(N-1)}{L_1}$, where the values $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ (the $\alpha/2$ and $1-\alpha/2$ quantiles of the chi-square distribution with $N-1$ degrees of freedom) can be found in chi-square tables (on computers, mainly!), and solving for $L_1$ and $L_2$,

$$\begin{aligned} L_1 &= \frac{S^2(N-1)}{\chi^2_{1-\alpha/2}},\\ L_2 &= \frac{S^2(N-1)}{\chi^2_{\alpha/2}}. \end{aligned}$$

Hence, your confidence interval for the variance is

$$\text{C.I.}=\left(\frac{S^2(N-1)}{\chi^2_{1-\alpha/2}}, \frac{S^2(N-1)}{\chi^2_{\alpha/2}}\right).$$
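The final formula can be turned into a few lines of code. The sketch below is an illustration under stated assumptions, not the answer's own implementation: it uses only the Python standard library, so the chi-square quantiles are obtained by bisection on a series expansion of the CDF, and the names `chi2_cdf`, `chi2_ppf`, `variance_ci` and the example data are all hypothetical. In practice a library quantile function such as `scipy.stats.chi2.ppf` would normally be used instead.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_ppf(p, df):
    """Chi-square quantile: solve chi2_cdf(x, df) = p by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < p:  # expand the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def variance_ci(sample, alpha=0.05):
    """Equal-tailed (1 - alpha) interval for sigma^2, assuming i.i.d. normal data."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    lower = (n - 1) * s2 / chi2_ppf(1 - alpha / 2, n - 1)
    upper = (n - 1) * s2 / chi2_ppf(alpha / 2, n - 1)
    return lower, upper

# Made-up measurements, for illustration only.
data = [9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.7, 10.6, 10.0, 9.9]
ci_low, ci_high = variance_ci(data)
print(ci_low, ci_high)
```

Note that the larger quantile gives the lower limit and the smaller quantile gives the upper limit, because the quantiles sit in the denominator.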

— Néstor

Simply because $S^2$ does not follow a centered chi-square distribution, while $S^2(N-1)/\sigma^2$ does and is therefore easier to work with. Are you asking for a derivation of that? (I.e., do you want someone to show you that $S^2(N-1)/\sigma^2$ follows a chi-square distribution with $N-1$ degrees of freedom?)

— Néstor

It would be helpful to modify this answer to include the very strong but unstated assumption that the sample variance follows a chi-squared distribution when the underlying data are independent and follow a normal distribution. Unlike the theory of the distribution of the sample mean, where in practice its sampling distribution will be approximately Normal to reasonable accuracy in many situations, this same asymptotic behavior tends not to happen with the sample variance (until sample sizes become extremely large).

— whuber
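The sensitivity to the normality assumption raised in this comment can be illustrated with a small coverage simulation. This is only a sketch under stated assumptions: the sample size $N=10$, the number of repetitions, the seed, and the choice of an exponential distribution as the non-normal example are arbitrary, and the two chi-square quantiles for 9 degrees of freedom are rounded table values hard-coded for brevity.

```python
import random

N = 10
# Rounded table values (df = N - 1 = 9): 2.5% and 97.5% chi-square quantiles.
CHI2_LOW, CHI2_HIGH = 2.700, 19.023

def coverage(draw, true_var, reps=5000, seed=1):
    """Fraction of nominal 95% chi-square intervals that contain true_var."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(N)]
        mean = sum(sample) / N
        ss = sum((x - mean) ** 2 for x in sample)  # equals (N - 1) * S^2
        if ss / CHI2_HIGH <= true_var <= ss / CHI2_LOW:
            hits += 1
    return hits / reps

# Standard normal data: variance 1. Exponential data with rate 1: variance 1.
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), true_var=1.0)
expo_cov = coverage(lambda rng: rng.expovariate(1.0), true_var=1.0)
print("normal data:", normal_cov)
print("exponential data:", expo_cov)
```

With normal data the observed coverage should be close to the nominal 95%, while with the skewed exponential data it should fall noticeably short, which is the practical content of the warning above.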

Oops. So, so true! This actually came from a problem solution that I handed out to some students, where I state on the question all these assumptions. I edited the answer now.

— Néstor

@user34756 The reason we don't use the distribution of $S^2$ directly is that its distribution depends on the value of a parameter. You may find it useful to investigate the use of pivotal quantities in constructing confidence intervals.

— Glen_b -Reinstate Monica

Isn't $f(z) = e^{-z^2/2}$ instead of $f(z) = e^{-z^2}$?

— Benoît Legat

Why is chi-square used when constructing a confidence interval for the variance?
