How to understand the correlation coefficient formula?

Can anyone help me understand the Pearson correlation formula? The sample $r$ is the mean of the products of the standard scores of the variables $X$ and $Y$.

I more or less understand why $X$ and $Y$ need to be standardized, but how should I understand the product of the two z-scores?

This formula is also called the "product-moment correlation coefficient", but what is the rationale for taking the product? I am not sure whether my question is clear, but I just want to remember the formula intuitively.

correlation descriptive-statistics pearson-r

— Aaron Lu

You may want to read the paper "Thirteen Ways to Look at the Correlation Coefficient" (Rodgers & Nicewander 1988). As the title suggests, it discusses thirteen different intuitive views of the correlation coefficient. So hopefully at least one of them will click :)

— half-pass

The 13 Ways can be found here

— Dimitriy V. Masterov

A 14th way to understand correlation (in terms of the products of z-scores) comes from understanding the covariance of standardized variables, as illustrated at stats.stackexchange.com/questions/18058/… .

— whuber

... And a 15th way uses circles, as shown at stats.stackexchange.com/a/46508/919 : the least-squares fits minimize the total area of the circles (there are at least two ways to do this when the points do not line up exactly), and the correlation coefficient is then their average area (when both variables are standardized).

— whuber

Possible duplicate of What is covariance in plain language?

— kjetil b halvorsen

In the comments, 15 ways to understand the correlation coefficient were suggested:

The 13 ways discussed in the Rodgers and Nicewander article (The American Statistician, February 1988) are

A Function of Raw Scores and Means,

$r =\frac{\sum\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum\left(X_i-\bar{X}\right)^2\sum\left(Y_i-\bar{Y}\right)^2}}.$
Standardized Covariance,

$r = s_{XY}/(s_X s_Y)$
where $s_{XY}$ is the sample covariance and $s_X$ and $s_Y$ are the sample standard deviations.
Standardized Slope of the Regression Line,

$r = b_{Y\cdot X}\frac{s_X}{s_Y} = b_{X\cdot Y}\frac{s_Y}{s_X},$
where $b_{Y\cdot X}$ and $b_{X \cdot Y}$ are the slopes of the regression lines.
Geometric Mean of the Two Regression Slopes,

$r = \pm \sqrt{b_{Y\cdot X}b_{X\cdot Y}}.$
The Square Root of the Ratio of Two Variances (Proportion of Variability Accounted For),

$r = \sqrt{\frac{\sum\left(\hat{Y}_i - \bar{Y}\right)^2}{\sum\left(Y_i-\bar{Y}\right)^2}} = \sqrt{\frac{SS_{REG}}{SS_{TOT}}} = \frac{s_{\hat{Y}}}{s_Y}.$
Mean Cross-Product of Standardized Variables,

$r = \sum z_X z_Y / N.$
A Function of the Angle Between the Two Standardized Regression Lines. The two regression lines (of $Y$ vs. $X$ and $X$ vs. $Y$ ) are symmetric about the diagonal. Let the angle between the two lines be $\beta$ . Then

$r = \sec(\beta)\pm \tan(\beta).$
A Function of the Angle Between the Two Variable Vectors,

$r = \cos(\alpha).$
A Rescaled Variance of the Difference Between Standardized Scores. Letting $z_Y - z_X$ be the difference between standardized $X$ and $Y$ variables for each observation,

$r = 1 - s^2_{(z_Y - z_X)} / 2 = s^2_{(z_Y+z_X)}/2 - 1.$
Estimated from the "Balloon" Rule,

$r \approx \sqrt{1 - (h/H)^2}$
where $H$ is the vertical range of the entire $X-Y$ scatterplot and $h$ is the range through the "center of the distribution on the $X$ axis" (that is, through the point of means).
In Relation to the Bivariate Ellipses of Isoconcentration,

$r = \frac{D^2 - d^2}{D^2 + d^2}$
where $D$ and $d$ are the major and minor axis lengths, respectively. $r$ also equals the slope of the tangent line of an isocontour (in standardized coordinates) at the point the contour crosses the vertical axis.
A Function of Test Statistics from Designed Experiments,

$r = \frac{t}{\sqrt{t^2 + n-2}}$
where $t$ is the test statistic in a two-independent sample $t$ test for a designed experiment with two treatment conditions (coded as $X=0, 1$ ) and $n$ is the combined total number of observations in the two treatment groups.
The Ratio of Two Means. Assume bivariate normality and standardize the variables. Select some arbitrarily large value $X_c$ of $X$ . Then

$r = \frac{\mathbb{E}(Y\,|\,X > X_c)}{\mathbb{E}(X\,|\,X > X_c)}.$

(Most of this is verbatim, with very slight changes in some of the notation.)
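Several of the algebraic forms in the list can be checked against each other numerically. The following is only a minimal sketch in plain Python: the data are made up for illustration, the variable names are my own, and all standard deviations and variances use the divisor $N$ so that the $\sum z_X z_Y / N$ form holds exactly (with the $n-1$ divisor one would divide the cross-product sum by $n-1$ instead).

```python
import math
import statistics as st

# Made-up illustrative data (an assumption, not taken from the thread).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 6.9]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Centered scores, sums of squares, and sum of cross-products.
dx = [xi - mx for xi in x]
dy = [yi - my for yi in y]
sxx = sum(d * d for d in dx)
syy = sum(d * d for d in dy)
sxy = sum(a * b for a, b in zip(dx, dy))

# Raw scores and means.
r_raw = sxy / math.sqrt(sxx * syy)

# Standardized covariance (divisor N throughout).
s_x, s_y = math.sqrt(sxx / n), math.sqrt(syy / n)
r_cov = (sxy / n) / (s_x * s_y)

# Standardized slope, and geometric mean of the two regression slopes.
b_yx, b_xy = sxy / sxx, sxy / syy
r_slope = b_yx * s_x / s_y
r_geo = math.copysign(math.sqrt(b_yx * b_xy), b_yx)  # sign taken from the slope

# Square root of SS_REG / SS_TOT; this form only recovers |r|.
y_hat = [my + b_yx * d for d in dx]
r_var = math.sqrt(sum((yh - my) ** 2 for yh in y_hat) / syy)

# Mean cross-product of z-scores.
zx = [d / s_x for d in dx]
zy = [d / s_y for d in dy]
r_z = sum(a * b for a, b in zip(zx, zy)) / n

# Rescaled variance of the difference (and sum) of z-scores.
r_diff = 1 - st.pvariance([b - a for a, b in zip(zx, zy)]) / 2
r_sum = st.pvariance([b + a for a, b in zip(zx, zy)]) / 2 - 1

results = {
    "raw scores": r_raw,
    "standardized covariance": r_cov,
    "standardized slope": r_slope,
    "geometric mean of slopes": r_geo,
    "sqrt(SS_REG / SS_TOT)": r_var,
    "mean z cross-product": r_z,
    "1 - var(zY - zX) / 2": r_diff,
    "var(zY + zX) / 2 - 1": r_sum,
}
for name, value in results.items():
    print(f"{name:>26}: {value:.6f}")
```

Because the example data are positively correlated, the variance-ratio form agrees with the others; for negatively correlated data it would give the absolute value of $r$.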

Some other methods (perhaps original to this site) are

Via circles. $r$ is the slope of the regression line in standardized coordinates. This line can be characterized in various ways, including geometric ones, such as minimizing the total area of circles drawn between the line and the data points in a scatterplot.
By coloring rectangles. Covariance can be assessed by coloring rectangles in a scatterplot (that is, by summing signed areas of rectangles). When the scatterplot is standardized, the net amount of color--the total signed error--is $r$ .
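
The rectangle description can be sketched numerically. This is only one possible reading and rests on assumptions of mine: each pair of points spans a rectangle whose signed area is $(x_i - x_j)(y_i - y_j)$, the data are made up, and the normalization (dividing the total over unordered pairs by $n^2$, with z-scores based on the divisor-$N$ standard deviation) is chosen so that the net signed area matches $r$.

```python
import math
from itertools import combinations

# Made-up illustrative data (an assumption, not taken from the thread).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 6.9]
n = len(x)


def standardize(values):
    """Return z-scores using the divisor-N standard deviation."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]


zx, zy = standardize(x), standardize(y)

# Signed area of the rectangle spanned by each unordered pair of points:
# positive when the pair runs lower-left to upper-right, negative otherwise.
signed_areas = [
    (zx[i] - zx[j]) * (zy[i] - zy[j]) for i, j in combinations(range(n), 2)
]

# Net signed area, normalized by n**2 (normalization is an assumption here).
r_rect = sum(signed_areas) / n**2

# Reference value: mean cross-product of the z-scores.
r_direct = sum(a * b for a, b in zip(zx, zy)) / n

print(f"net signed rectangle area / n^2: {r_rect:.6f}")
print(f"mean product of z-scores:        {r_direct:.6f}")
```

The two numbers agree because $\sum_{i<j}(x_i - x_j)(y_i - y_j) = n\sum_i (x_i - \bar{x})(y_i - \bar{y})$, which ties the rectangle picture back to the product of z-scores asked about in the question.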

— whuber

Thank you, @Avraham, for trying to bring this unanswered thread to some closure by posting an answer here.

— whuber