Linearity of variance

16

I think the following two formulas are correct:

$\mathrm{Var}(aX)=a^2 \mathrm{Var}(X)$, where $a$ is a constant

$\mathrm{Var}(X + Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)$ if $X$ and $Y$ are independent

However, I am not sure what is wrong with the following:

$\mathrm{Var}(2X) = \mathrm{Var}(X+X) = \mathrm{Var}(X) + \mathrm{Var}(X)$, which does not equal $2^2 \mathrm{Var}(X)$, i.e. $4\mathrm{Var}(X)$.

If we assume that $X$ is a sample drawn from a population, I think we can always treat $X$ as independent of the other $X$s.

So where does my reasoning go wrong?

variance linearity fallacy

— lanselibai

8

Variance isn't linear: your first statement shows this (if it were, you would have $Var(aX) = a Var(X)$). Covariance, on the other hand, is bilinear.

— Batman

33

$\DeclareMathOperator{\Cov}{Cov}$ $\DeclareMathOperator{\Corr}{Corr}$ $\DeclareMathOperator{\Var}{Var}$

The problem with your line of reasoning is

"I think we can always treat $X$ as independent of the other $X$s."

$X$ is not independent of $X$. The symbol $X$ refers to the same random variable here. Once you know the value of the first $X$ that appears in your formula, this also fixes the value of the second $X$ that appears. If you want them to refer to distinct (and potentially independent) random variables, you need to denote them with different letters (e.g. $X$ and $Y$) or with subscripts (e.g. $X_1$ and $X_2$); the latter is often (but not always) used to denote variables drawn from the same distribution.

If two variables $X$ and $Y$ are independent then $\Pr(X=a|Y=b)$ is the same as $\Pr(X=a)$: knowing the value of $Y$ gives no additional information about the value of $X$. But $\Pr(X=a|X=b)$ is $1$ if $a=b$ and $0$ otherwise: knowing the value of $X$ gives complete information about the value of $X$. [You can replace the probabilities in this paragraph with cumulative distribution functions or, where appropriate, probability density functions, to essentially the same effect.]
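To make the conditional-probability argument concrete, here is a small sketch that counts outcomes for two fair six-sided dice. The dice, the helper `prob`, and the values `a = 3`, `b = 5` are illustrative assumptions, not part of the original question.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Joint outcomes of two independent fair dice (X, Y): 36 equally likely pairs.
outcomes = list(product(faces, faces))

def prob(event, given=lambda x, y: True):
    """Pr(event | given), computed by counting equally likely outcomes."""
    conditioned = [(x, y) for x, y in outcomes if given(x, y)]
    hits = [(x, y) for x, y in conditioned if event(x, y)]
    return Fraction(len(hits), len(conditioned))

a, b = 3, 5
p_x = prob(lambda x, y: x == a)                                          # Pr(X=a)
p_x_given_y = prob(lambda x, y: x == a, given=lambda x, y: y == b)       # Pr(X=a | Y=b)
p_x_given_x_same = prob(lambda x, y: x == a, given=lambda x, y: x == a)  # Pr(X=a | X=a)
p_x_given_x_diff = prob(lambda x, y: x == a, given=lambda x, y: x == b)  # Pr(X=a | X=b), a != b

print(p_x, p_x_given_y)                    # 1/6 1/6
print(p_x_given_x_same, p_x_given_x_diff)  # 1 0
```

Conditioning on $Y$ leaves the probability of $X=a$ unchanged, while conditioning on $X$ itself collapses it to 1 or 0.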

Another way of seeing things is that if two variables are independent then they have zero correlation (though zero correlation does not imply independence!). But $X$ is perfectly correlated with itself, $\Corr(X,X)=1$, so $X$ can't be independent of itself. Note that since the covariance is given by $\Cov(X,Y)=\Corr(X,Y)\sqrt{\Var(X)\Var(Y)}$, then

$\Cov(X,X)=1\sqrt{\Var(X)^2}=\Var(X)$

The more general formula for the variance of a sum of two random variables is

$\Var(X+Y) = \Var(X) + \Var(Y) + 2 \Cov(X,Y)$

In particular, $\Cov(X,X) = \Var(X)$, so

$\Var(X+X) = \Var(X) + \Var(X) + 2\Var(X) = 4\Var(X)$

which is the same as you would have deduced from applying the rule

$\Var(aX) = a^2 \Var(X) \implies \Var(2X) = 4\Var(X)$
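As a quick check of the two cases, the following sketch computes exact variances for a fair six-sided die (an assumed example distribution; `variance` is a helper written for this illustration). It compares $X+X$, where the same roll is counted twice, with $X_1+X_2$, where two independent rolls are added.

```python
from fractions import Fraction
from itertools import product

def variance(values):
    """Variance of a random variable that is uniform over the listed outcomes."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n

faces = range(1, 7)
var_x = variance(list(faces))               # one die
var_2x = variance([2 * x for x in faces])   # X + X: the same roll counted twice
var_x1_plus_x2 = variance(                  # X1 + X2: two independent rolls
    [x1 + x2 for x1, x2 in product(faces, faces)]
)

print(var_x, var_2x, var_x1_plus_x2)  # 35/12 35/3 35/6
```

Here $\Var(2X) = 4\Var(X)$, whereas $\Var(X_1+X_2) = 2\Var(X)$; only the second case satisfies the independence condition of the additive rule.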

Covariance, on the other hand, is bilinear: for random variables $W$, $X$, $Y$ and $Z$ and constants $a$, $b$, $c$ and $d$,

$\Cov(aW + bX, Y) = a \Cov(W,Y) + b \Cov(X,Y)$

$\Cov(X, cY + dZ) = c \Cov(X,Y) + d \Cov(X,Z)$

and overall,

$\Cov(aW + bX, cY + dZ) = ac \Cov(W,Y) + ad \Cov(W,Z) + bc \Cov(X,Y) + bd \Cov(X,Z)$
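The bilinearity identity can be checked exactly on a small finite example. In this sketch the variables are listed outcome by outcome over five equally likely outcomes; the values of `W`, `X`, `Y`, `Z` and the constants are arbitrary choices made for illustration, and `cov` is a helper defined here.

```python
from fractions import Fraction

def cov(xs, ys):
    """Covariance of two variables listed outcome by outcome (equally likely outcomes)."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Four variables on the same five outcomes; they are not assumed independent.
W = [1, 4, 2, 8, 5]
X = [3, 1, 4, 1, 5]
Y = [2, 7, 1, 8, 2]
Z = [0, 1, 1, 2, 3]
a, b, c, d = 2, -3, 5, 7

lhs = cov([a * w + b * x for w, x in zip(W, X)],
          [c * y + d * z for y, z in zip(Y, Z)])
rhs = (a * c * cov(W, Y) + a * d * cov(W, Z)
       + b * c * cov(X, Y) + b * d * cov(X, Z))

print(lhs == rhs)  # True
```

Exact fractions are used so that the comparison is not affected by floating-point rounding.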

You can then use this to prove the (non-linear) results for variance that you wrote in your post:

$\Var(aX) = \Cov(aX, aX) = a^2 \Cov(X,X) = a^2 \Var(X)$

$\begin{aligned} \Var(aX + bY) &= \Cov(aX + bY, aX + bY) \\ &= a^2 \Cov(X,X) + ab \Cov(X,Y) + ba \Cov(X,Y) + b^2 \Cov(Y,Y) \\ &= a^2 \Var(X) + b^2 \Var(Y) + 2ab \Cov(X,Y) \end{aligned}$

The latter gives, as a special case when $a=b=1$ ,

$\Var(X+Y) = \Var(X) + \Var(Y) + 2 \Cov(X,Y)$

When $X$ and $Y$ are uncorrelated (which includes the case where they are independent), then this reduces to $\Var(X+Y) = \Var(X) + \Var(Y)$ . So if you want to manipulate variances in a "linear" way (which is often a nice way to work algebraically), then work with the covariances instead, and exploit their bilinearity.
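A short sketch of both points, with assumed example data: first the general formula for $\Var(aX+bY)$ on dependent variables, then a pair that is uncorrelated but not independent ($V = U^2$ with $U$ uniform on $\{-1, 0, 1\}$), for which the variances still add. The helpers `cov` and `var` and all the numbers are illustrative.

```python
from fractions import Fraction

def cov(xs, ys):
    """Covariance of two variables listed outcome by outcome (equally likely outcomes)."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def var(xs):
    """Variance expressed as the covariance of a variable with itself."""
    return cov(xs, xs)

# General formula on correlated variables (arbitrary values).
X = [1, 2, 4, 7]
Y = [3, 1, 5, 2]
a, b = 2, -3
general_lhs = var([a * x + b * y for x, y in zip(X, Y)])
general_rhs = a**2 * var(X) + b**2 * var(Y) + 2 * a * b * cov(X, Y)
print(general_lhs == general_rhs)  # True

# Uncorrelated but dependent: V is a function of U, yet Cov(U, V) = 0.
U = [-1, 0, 1]
V = [u * u for u in U]
print(cov(U, V))                                               # 0
print(var([u + v for u, v in zip(U, V)]) == var(U) + var(V))   # True
```

The second case shows that zero covariance, not full independence, is what the additive rule actually needs.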

— Silverfish

1

Yes! I think you pinpointed at the beginning that the confusion was essentially a notational one. I found it very helpful when one book (very explicitly, some might say laboriously) explained the interpretation of and rules of evaluating a probabilistic statement (so that, e.g., even if you know what you mean by $\Pr (X+X=n)$ where $X \sim \text{Uniform}(1..6)$, it is technically incorrect if you're considering throwing an $n$ in craps (and $X+X=2X$ would never yield an odd roll); the event would be properly expressed using $X_1,X_2$ i.i.d.).

— Vandermonde

1

This is in contrast to (and I think my misapprehension might have stemmed from) how 2+PRNG(6)+PRNG(6) often is how you would toss dice as above and/or notation/conventions such as $2 \text{d}6 = \text{d}6 + \text{d}6$ in which different instances are genuinely intended to be independent.

— Vandermonde

@Vandermonde That's an interesting point. I initially considered mentioning the use of subscripts to distinguish between "different $X$s" but didn't bother - think I might edit it in now. The argument that "you'd never get an odd total score if the sum was $2X$" is very clear and convincing to someone who can't see the need to distinguish: thanks for sharing it.

— Silverfish

0

Another way of thinking about it is that with random variables $2X \neq X + X$ .

$2X$ would mean two times the value of the outcome of $X$ , while $X + X$ would mean two trials of $X$ . In other words, it's the difference between rolling a die once and doubling the result, vs rolling a die twice.
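The die analogy can be tried out with a small simulation. This is only a sketch: the seed, the sample size `n`, and the variable names are arbitrary choices, and the sample variances will only be approximately equal to the theoretical values $4 \cdot 35/12$ and $2 \cdot 35/12$.

```python
import random
from statistics import pvariance

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000

# Roll one die and double the result.
doubled = [2 * rng.randint(1, 6) for _ in range(n)]
# Roll two separate dice and add them.
two_rolls = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n)]

print(pvariance(doubled))    # roughly 4 * 35/12
print(pvariance(two_rolls))  # roughly 2 * 35/12
print(any(total % 2 for total in doubled))  # False: a doubled roll is never odd
```

The doubled roll has about twice the variance of the two-dice total, and it can never produce an odd number, which is the same point made in the comments above.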

— Benjamin

+1 This is a perfectly clear and correct answer. Welcome to our site!

— whuber

Thanks @whuber!

— Benjamin