What is the intuition behind defining completeness of a statistic as it being impossible to form an unbiased estimator of $0$ from it?

21

In classical statistics, a statistic $T$ of a data set $y_1, \ldots, y_n$ is defined to be complete for a parameter $\theta$ if it is impossible to form an unbiased estimator of $0$ from it nontrivially. That is, the only way to have $E h(T (y )) = 0$ for all $\theta$ is to have $h$ be $0$ almost surely.

Is there any intuition behind this? It seems like a rather mechanical way of defining it. I am aware this has been asked before, but I was wondering whether there is a very easy-to-understand intuition that would make the material easier for introductory students to digest.

— user1398057

2

That's a very good question; I had to dig into it myself. It turns out that the reason it reads like a mechanical definition, and does not seem intuitively meaningful to a standard practitioner like me, is that it is mainly used to prove fundamental results in mathematical statistics. In particular, my brief search revealed that the Lehmann–Scheffé theorem and Basu's theorem require completeness of the statistic in order to hold. These are contributions from the mid-1950s. I can't give you an intuitive explanation, but if you really want to build one, perhaps the associated proofs

— Jeremias K

18

I will try to add to the other answer. First, completeness is a technical condition that is mainly justified by the theorems that use it. So let us start with some related concepts and the theorems where they occur.

Let $X=(X_1,X_2,\dotsc,X_n)$ represent a vector of iid data, which we model as having a distribution $f(x;\theta), \theta \in \Theta$, where the parameter $\theta$ governing the data is unknown. $T=T(X)$ is sufficient if the conditional distribution of $X \mid T$ does not depend on the parameter $\theta$. $V=V(X)$ is ancillary if the distribution of $V$ does not depend on $\theta$ (within the family $f(x;\theta)$). $U=U(X)$ is an unbiased estimator of zero if its expectation is zero, irrespective of $\theta$. $S=S(X)$ is a complete statistic if any unbiased estimator of zero based on $S$ is identically zero, that is, if $\DeclareMathOperator{\E}{\mathbb{E}} \E g(S)=0 (\text{for all $\theta$})$ then $g(S)=0$ a.e. (for all $\theta$).

Now, suppose you have two different unbiased estimators of $\theta$ based on the sufficient statistic $T$, namely $g_1(T), g_2(T)$. That is, in symbols, $\E g_1(T)=\theta$, $\E g_2(T)=\theta$ and $\DeclareMathOperator{\P}{\mathbb{P}} \P(g_1(T) \not= g_2(T) ) > 0$ (for all $\theta$). Then $g_1(T)-g_2(T)$ is an unbiased estimator of zero which is not identically zero, proving that $T$ is not complete. So, completeness of a sufficient statistic $T$ gives us that there is at most one unbiased estimator of $\theta$ based on $T$ (up to almost-sure equality). That is already very close to the Lehmann–Scheffé theorem.
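To see the definition at work in a case where completeness does hold, here is a standard textbook check. It is not part of the argument above, and the binomial model is my own choice of illustration. Suppose $S \sim \text{Binomial}(n,p)$ with $0<p<1$, and write $r = p/(1-p)$. Then

$$\mathbb{E}_p\, g(S) = \sum_{s=0}^{n} g(s)\binom{n}{s}p^{s}(1-p)^{n-s} = (1-p)^{n}\sum_{s=0}^{n} g(s)\binom{n}{s}\, r^{s}.$$

If this is zero for every $p\in(0,1)$, the polynomial $\sum_{s} g(s)\binom{n}{s} r^{s}$ vanishes for every $r>0$, so each coefficient $g(s)\binom{n}{s}$ must be zero, and hence $g(s)=0$ for $s=0,\dotsc,n$. The only unbiased estimator of zero based on $S$ is zero itself, which is exactly the definition of completeness.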

Let us look at some examples. Suppose $X_1, \dotsc, X_n$ are now iid uniform on the interval $(\theta, \theta+1)$. Writing $X_{(1)} < X_{(2)} < \dotsm < X_{(n)}$ for the order statistics, one can show that the pair $(X_{(1)}, X_{(n)})$ is sufficient but not complete. The difference $X_{(n)}-X_{(1)}$ is ancillary, so we can compute its expectation, call it $c$ (which is a function of $n$ only), and then $X_{(n)}-X_{(1)} -c$ is an unbiased estimator of zero which is not identically zero. So our sufficient statistic, in this case, is not complete. And we can see what that means: there exist functions of the sufficient statistic that are not informative about $\theta$ (in the context of the model). This cannot happen with a complete sufficient statistic; it is in a sense maximally informative, in that no functions of it are uninformative. On the other hand, if there is some function of the minimal sufficient statistic that has expectation zero, that could be seen as a noise term, since disturbance/noise terms in models have expectation zero. So we could say that non-complete sufficient statistics do contain some noise.
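A quick simulation can make the uniform example concrete. The sketch below is only an illustration under my own assumptions: the function names, $n=5$, the seed and the three values of $\theta$ are arbitrary, and it uses the standard value $c=(n-1)/(n+1)$ for the expected range on an interval of length one. It estimates the mean of $X_{(n)}-X_{(1)}-c$ for several $\theta$.

```python
import random

def range_minus_c(theta, n, rng):
    """Draw n iid Uniform(theta, theta + 1) values and return R - c."""
    xs = [theta + rng.random() for _ in range(n)]
    c = (n - 1) / (n + 1)  # E[X_(n) - X_(1)] on an interval of length 1
    return (max(xs) - min(xs)) - c

def mean_of_u(theta, n=5, reps=20000, seed=0):
    """Monte Carlo estimate of E_theta[R - c]."""
    rng = random.Random(seed)
    return sum(range_minus_c(theta, n, rng) for _ in range(reps)) / reps

# The estimated mean should be close to zero whatever theta is,
# even though R - c itself is a non-degenerate random variable.
means = {theta: mean_of_u(theta) for theta in (-3.0, 0.0, 10.0)}
for theta, m in means.items():
    print(f"theta = {theta:5.1f}: mean of R - c is about {m:.4f}")
```

The averages hover around zero for every $\theta$ tried, while individual values of $R-c$ are clearly not zero, which is what a nontrivial unbiased estimator of zero looks like in practice.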

Look again at the range $R=X_{(n)}-X_{(1)}$ in this example. Since its distribution does not depend on $\theta$, it does not by itself contain any information about $\theta$. But, together with the sufficient statistic, it does! How? Look at the case where $R=1$ is observed. Then, in the context of our (known to be true) model, we have perfect knowledge of $\theta$! Namely, we can say with certainty that $\theta = X_{(1)}$. You can check that any other value for $\theta$ then leads to either $X_{(1)}$ or $X_{(n)}$ being an impossible observation under the assumed model. On the other hand, if we observe $R=0.1$, then the range of possible values for $\theta$ is rather large (exercise ...).
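The exercise can be sketched in a few lines. Under the model every observation must lie in $(\theta,\theta+1)$, so $\theta$ has to satisfy $X_{(n)}-1<\theta<X_{(1)}$, an interval of width $1-R$. The helper name `compatible_thetas` and the two small data sets below are made up for illustration.

```python
def compatible_thetas(xs):
    """Return (lower, upper): the theta values compatible with the sample xs
    under the Uniform(theta, theta + 1) model."""
    lo = max(xs) - 1.0   # need X_(n) < theta + 1
    hi = min(xs)         # need theta < X_(1)
    if lo > hi:
        raise ValueError("sample range exceeds 1; impossible under the model")
    return lo, hi

wide = [2.05, 2.50, 2.95]     # large range: theta is pinned down tightly
narrow = [2.30, 2.35, 2.40]   # small range: many theta values remain possible

for xs in (wide, narrow):
    lo, hi = compatible_thetas(xs)
    r = max(xs) - min(xs)
    print(f"R = {r:.2f}: theta lies in ({lo:.2f}, {hi:.2f}), width {hi - lo:.2f}")
```

The width of the compatible set is $1-R$ regardless of $n$, which is one way to see how $R$ carries information about precision without carrying information about $\theta$ itself.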

In this sense, the ancillary statistic $R$ does contain some information about the precision with which we can estimate $\theta$ based on this data and model. In this example, and others, the ancillary statistic $R$ "takes over the role of the sample size". Usually, confidence intervals and the like need the sample size $n$, but in this example we can make a conditional confidence interval that is computed using only $R$, not $n$ (exercise). This was an idea of Fisher's: that inference should be conditional on some ancillary statistic.

Now, Basu's theorem: If $T$ is complete sufficient, then it is independent of any ancillary statistic. That is, inference based on a complete sufficient statistic is simpler, in that we do not need to consider conditional inference. Conditioning on a statistic which is independent of $T$ does not change anything, of course.
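A classical instance of Basu's theorem, chosen here by me as an illustration rather than taken from the answer, is the $\text{Normal}(\theta,1)$ model with known variance: the sample mean is complete sufficient and the sample variance is ancillary, so the two are independent. The sketch below only checks a necessary consequence, namely that their simulated correlation is near zero; it does not demonstrate independence. The seed, $n=5$ and $\theta=3$ are arbitrary.

```python
import random
import statistics

def mean_and_variance(theta, n, rng):
    """Sample mean and sample variance of n iid Normal(theta, 1) draws."""
    xs = [rng.gauss(theta, 1.0) for _ in range(n)]
    return statistics.fmean(xs), statistics.variance(xs)

def sample_correlation(a, b):
    """Plain Pearson correlation, written out by hand."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (statistics.pstdev(a) * statistics.pstdev(b))

rng = random.Random(42)
pairs = [mean_and_variance(theta=3.0, n=5, rng=rng) for _ in range(20000)]
means = [m for m, _ in pairs]
variances = [v for _, v in pairs]
corr = sample_correlation(means, variances)
print(f"sample correlation between mean and variance: {corr:.4f}")
```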

Then, a last example to give some more intuition. Change our uniform distribution example to a uniform distribution on the interval $(\theta_1, \theta_2)$ (with $\theta_1<\theta_2$ ). In this case the statistic $(X_{(1)}, X_{(n)})$ is complete and sufficient. What changed? We can see that completeness is really a property of the model. In the former case, we had a restricted parameter space. This restriction destroyed completeness by introducing relationships on the order statistics. By removing this restriction we got completeness! So, in a sense, lack of completeness means that the parameter space is not big enough, and by enlarging it we can hope to restore completeness (and thus, easier inference).
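A short calculation, offered as a sanity check rather than a proof of completeness, shows why the unbiased estimator of zero from the one-parameter example no longer works here. Writing $X_i=\theta_1+(\theta_2-\theta_1)Z_i$ with $Z_i$ iid uniform on $(0,1)$,

$$\mathbb{E}\left(X_{(n)}-X_{(1)}\right) = (\theta_2-\theta_1)\,\frac{n-1}{n+1},$$

which now depends on the parameters. So there is no constant $c$ for which $X_{(n)}-X_{(1)}-c$ has expectation zero for all $(\theta_1,\theta_2)$: enlarging the parameter space has removed that particular unbiased estimator of zero.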

Some other examples where lack of completeness is caused by restrictions on the parameter space:

See my answer to: What kind of information is Fisher information?
Let $X_1, \dotsc, X_n$ be iid $\mathcal{Cauchy}(\theta,\sigma)$ (a location-scale model). Then the order statistics are sufficient but not complete. But now enlarge this model to a fully nonparametric model, still iid but from some completely unspecified distribution $F$. Then the order statistics are sufficient and complete.
For exponential families with canonical parameter space (that is, as large as possible) the minimal sufficient statistic is also complete. But in many cases, introducing restrictions on the parameter space, as with curved exponential families, destroys completeness.

A very relevant paper is An Interpretation of Completeness and Basu's Theorem.

— kjetil b halvorsen

7

Some intuition may be available from the theory of best (minimum variance) unbiased estimators.

If $E_\theta W=\tau(\theta)$ then $W$ is a best unbiased estimator of $\tau(\theta)$ iff $W$ is uncorrelated with all unbiased estimators of zero.

Proof: Let $W$ be an unbiased estimator uncorrelated with all unbiased estimators of zero. Let $W'$ be another estimator such that $E_\theta W'=E_\theta W=\tau(\theta)$. Write $W'=W+(W'-W)$. Since $W'-W$ is an unbiased estimator of zero, it is uncorrelated with $W$ by assumption, so $Var_\theta W'=Var_\theta W+Var_\theta (W'-W)$. Hence, for any $W'$, $Var_\theta W'\geq Var_\theta W$.

Now assume that $W$ is a best unbiased estimator. Let there be some other estimator $U$ with $E_\theta U=0$. Then $\phi_a:=W+aU$ is also unbiased for $\tau(\theta)$. We have $Var_\theta \phi_a=Var_\theta W+2aCov_\theta(W,U)+a^2Var_\theta U.$ If there were a $\theta_0\in\Theta$ such that $Cov_{\theta_0}(W,U)<0$, we would obtain $Var_{\theta_0} \phi_a<Var_{\theta_0} W$ for $a\in(0,-2Cov_{\theta_0}(W,U)/Var_{\theta_0} U)$. (If instead $Cov_{\theta_0}(W,U)>0$, the same inequality holds for $a\in(-2Cov_{\theta_0}(W,U)/Var_{\theta_0} U,0)$.) $W$ could then not be the best unbiased estimator. QED

Intuitively, the result says that if an estimator is optimal, it must not be possible to improve it by just adding some noise to it, in the sense of combining it with an estimator that is just zero on average (being an unbiased estimator of zero).
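The following sketch illustrates the result numerically. The setting is my own choice, not the answer's: data are iid uniform on $(\theta,\theta+1)$, $W=\bar X-\tfrac12$ is unbiased for $\theta$, and $U$ is the difference between the midrange-based estimator and $W$, which makes it an unbiased estimator of zero. Because $W$ is correlated with $U$, some combination $W+aU$ has smaller variance than $W$. The value of $a$ is estimated from the same simulated draws, so the comparison is in-sample.

```python
import random
import statistics

def estimators(theta, n, rng):
    """Return (W, U) for one sample of n iid Uniform(theta, theta + 1) draws."""
    xs = [theta + rng.random() for _ in range(n)]
    w = statistics.fmean(xs) - 0.5              # unbiased for theta
    midrange = (min(xs) + max(xs)) / 2 - 0.5    # also unbiased for theta
    u = midrange - w                            # unbiased estimator of zero
    return w, u

rng = random.Random(1)
draws = [estimators(theta=2.0, n=10, rng=rng) for _ in range(20000)]
w_vals = [w for w, _ in draws]
u_vals = [u for _, u in draws]

mean_w, mean_u = statistics.fmean(w_vals), statistics.fmean(u_vals)
cov_wu = sum((w - mean_w) * (u - mean_u) for w, u in draws) / len(draws)
var_u = statistics.pvariance(u_vals)

a_star = -cov_wu / var_u   # minimises Var(W + a U) over a
phi_vals = [w + a_star * u for w, u in draws]

var_w = statistics.pvariance(w_vals)
var_phi = statistics.pvariance(phi_vals)
print(f"mean of U: {mean_u:.4f}, Cov(W, U): {cov_wu:.5f}")
print(f"Var(W): {var_w:.5f}, Var(W + a*U): {var_phi:.5f}")
```

Here $W$ is not a best unbiased estimator precisely because a nontrivial unbiased estimator of zero correlated with it exists; with a complete sufficient statistic no such $U$ would be available.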

Unfortunately, it is difficult to characterize all unbiased estimators of zero. The situation becomes much simpler if zero itself is the only unbiased estimator of zero, as any statistic $W$ satisfies $Cov_\theta(W,0)=0$ . Completeness describes such a situation.

— Christoph Hanck