I will try to add something to the other answers. First, completeness is a technical condition which is largely justified by the theorems that use it. So let us start with some related concepts and theorems where they show up.
Let X = (X1, X2, …, Xn) denote a vector of iid data, which we model as coming from a distribution f(x; θ), θ ∈ Θ, where the parameter θ governing the data is unknown. T = T(X) is sufficient if the conditional distribution of X ∣ T does not depend on the parameter θ. V = V(X) is ancillary if the distribution of V does not depend on θ (within the family f(x; θ)). U = U(X) is an unbiased estimator of zero if its expectation is zero, irrespective of θ. S = S(X) is a complete statistic if any unbiased estimator of zero based on S is identically zero, that is, if E g(S) = 0 (for all θ) then g(S) = 0 a.e. (for all θ).
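To see how the definition of completeness is actually checked in a simple case (a standard illustration I add here, not needed for what follows): let X1, …, Xn be iid Bernoulli(θ) with 0 < θ < 1, and S = ∑ Xi, so S ∼ Binomial(n, θ). If E g(S) = 0 for all θ, then
$$\sum_{k=0}^{n} g(k)\binom{n}{k}\theta^{k}(1-\theta)^{n-k} \;=\; (1-\theta)^{n}\sum_{k=0}^{n} g(k)\binom{n}{k}\Bigl(\tfrac{\theta}{1-\theta}\Bigr)^{k} \;=\; 0 \quad\text{for all }\theta\in(0,1),$$
and a polynomial in θ/(1−θ) that vanishes on a whole interval must have all coefficients zero, so g(0) = ⋯ = g(n) = 0. Hence S is complete.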
Now, suppose you have two different unbiased estimators of θ based on the sufficient statistic T, say g1(T) and g2(T). That is, in symbols,
$$E\,g_1(T) = \theta, \qquad E\,g_2(T) = \theta$$
and P(g1(T) ≠ g2(T)) > 0 (for some θ). Then g1(T) − g2(T) is an unbiased estimator of zero which is not identically zero, proving that T is not complete. So, completeness of a sufficient statistic T gives us that there exists at most one unbiased estimator of θ based on T (up to almost-sure equality). That is already very close to the Lehmann–Scheffé theorem.
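To spell out the positive direction: if T is complete and g1(T), g2(T) are both unbiased for θ, then
$$E_\theta\{g_1(T)-g_2(T)\}=0 \ \text{for all } \theta \;\Longrightarrow\; g_1(T)=g_2(T)\ \text{a.e.},$$
so the unbiased estimator of θ based on T is essentially unique; combining this with Rao–Blackwellization (conditioning any unbiased estimator on T) gives the Lehmann–Scheffé theorem: that unique estimator is UMVU.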
Let us look at some examples. Suppose now that X1, …, Xn are iid uniform on the interval (θ, θ+1). We can show that the pair (X(1), X(n)) (where X(1) < X(2) < ⋯ < X(n) are the order statistics) is sufficient, but not complete, because the difference X(n) − X(1) is ancillary: we can compute its expectation, call it c (which is a function of n only), and then X(n) − X(1) − c is an unbiased estimator of zero which is not identically zero. So our sufficient statistic is, in this case, sufficient but not complete. And we can see what that means: there are functions of the sufficient statistic which are not informative about θ (in the context of the model). This cannot happen with a complete sufficient statistic; it is in a sense maximally informative, in that no functions of it are uninformative. On the other hand, if there is some function of the minimal sufficient statistic that has expectation zero, that could be seen as a noise term, since disturbance/noise terms in models have expectation zero. So we could say that non-complete sufficient statistics do contain some noise.
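A small simulation sketch (something I add here to make the constant c concrete; for Uniform(θ, θ+1) one can compute c = E(X(n) − X(1)) = (n−1)/(n+1), which is free of θ):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 100_000

for theta in (-3.0, 0.0, 7.5):
    # samples from Uniform(theta, theta + 1), one row per replication
    x = rng.uniform(theta, theta + 1.0, size=(reps, n))
    r = x.max(axis=1) - x.min(axis=1)   # the range X_(n) - X_(1)
    print(theta, r.mean())              # about (n-1)/(n+1) = 0.818..., whatever theta is

# So X_(n) - X_(1) - (n-1)/(n+1) is a function of the sufficient statistic
# with expectation zero for every theta, but it is not identically zero:
# the sufficient statistic (X_(1), X_(n)) is not complete.
```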
Look again at the range R = X(n) − X(1) in this example. Since its distribution does not depend on θ, it does not, by itself, contain any information about θ. But, together with the sufficient statistic, it does! How? Look at the case where R = 1 is observed. Then, in the context of our (known to be true) model, we have perfect knowledge of θ! Namely, we can say with certainty that θ = X(1). You can check that any other value for θ then leads to either X(1) or X(n) being an impossible observation under the assumed model. On the other hand, if we observe R = 0.1, then the range of possible values for θ is rather large (exercise ...).
In this sense, the ancillary statistic R does contain some information about the precision with which we can estimate θ based on this data and model. In this example, and others, the ancillary statistic R "takes over the role of the sample size". Usually, confidence intervals and the like need the sample size n, but in this example we can construct a conditional confidence interval whose length is computed using only R, not n (exercise).
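Here is a small sketch of that conditional interval (my own illustration, still under the Uniform(θ, θ+1) model): the values of θ that are consistent with the data are exactly those in [X(n) − 1, X(1)], an interval of length 1 − R, so the observed value of R tells us directly how precisely θ is pinned down.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 2.0, 5

for _ in range(5):
    x = rng.uniform(theta, theta + 1.0, size=n)
    lo, hi = x.max() - 1.0, x.min()   # all theta values consistent with the model
    r = x.max() - x.min()             # the ancillary range R
    # the interval [lo, hi] always contains theta, and its length is 1 - R
    print(f"R = {r:.3f}   interval = [{lo:.3f}, {hi:.3f}]   length = {hi - lo:.3f}")
```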
This was an idea of Fisher, that inference should be conditional on some ancillary statistic.
Now, Basu's theorem: If T is complete sufficient, then it is independent of any ancillary statistic. That is, inference based on a complete sufficient statistic is simpler, in that we do not need to consider conditional inference.
Conditioning on a statistic which is independent of T does not change anything, of course.
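A quick numerical illustration of Basu's theorem (a standard example that I add here, using a different uniform model): for X1, …, Xn iid Uniform(0, θ), the maximum X(n) is complete and sufficient, and the ratio X(1)/X(n) is ancillary (its distribution is free of θ), so by Basu's theorem the two are independent. A simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, theta = 5, 200_000, 3.0

x = rng.uniform(0.0, theta, size=(reps, n))
t = x.max(axis=1)        # complete sufficient statistic X_(n)
a = x.min(axis=1) / t    # ancillary ratio X_(1) / X_(n)

# Basu: t and a are independent, so the distribution of a should look the
# same whether t happens to be small or large.
small, large = t < np.median(t), t >= np.median(t)
print(a[small].mean(), a[large].mean())   # approximately equal
print(np.corrcoef(t, a)[0, 1])            # approximately 0
```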
Then, a last example to give some more intuition. Change our uniform distribution example to a uniform distribution on the interval (θ1, θ2) (with θ1 < θ2). In this case the statistic (X(1), X(n)) is complete and sufficient. What changed? We can see that completeness is really a property of the model. In the former case we had a restricted parameter space, and this restriction destroyed completeness by introducing a relationship among the order statistics: the range had a distribution free of the parameter. By removing the restriction we got completeness! So, in a sense, lack of completeness means that the parameter space is not big enough, and by enlarging it we can hope to restore completeness (and thus, easier inference).
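To see concretely what the enlarged parameter space changes (again a sketch I add): in the (θ1, θ2) model the range X(n) − X(1) is no longer ancillary; its expectation is (n−1)/(n+1) · (θ2 − θ1), which depends on the parameters, so subtracting a constant no longer produces an unbiased estimator of zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 100_000

for t1, t2 in [(0.0, 1.0), (0.0, 2.0), (-1.0, 4.0)]:
    x = rng.uniform(t1, t2, size=(reps, n))
    r = x.max(axis=1) - x.min(axis=1)
    # E(R) = (n-1)/(n+1) * (t2 - t1): it varies with the parameters, so the
    # construction R - c used in the (theta, theta + 1) model breaks down.
    print((t1, t2), r.mean(), (n - 1) / (n + 1) * (t2 - t1))
```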
For some other examples where lack of completeness is caused by restrictions on the parameter space, see my answer to: What kind of information is Fisher information?
Let X1,…,Xn be iid Cauchy(θ,σ) (a location-scale model). Then the order statistics are sufficient but not complete. But now enlarge this model to a fully nonparametric model, still iid but from some completely unspecified distribution F. Then the order statistics are sufficient and complete.
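To see the lack of completeness in the Cauchy case, the same trick as in the uniform example works (a sketch I add, assuming n ≥ 3): the statistic
$$V = \frac{X_{(3)} - X_{(1)}}{X_{(2)} - X_{(1)}}$$
is location-scale invariant, so its distribution is the same for all (θ, σ), i.e. it is ancillary. If m denotes its median, then g = 1{V > m} − 1/2 is a bounded function of the order statistics with E g = 0 for all (θ, σ) but g not identically zero, so the order statistics are not complete.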
For exponential families with canonical parameter space (that is, as large as possible) the minimal sufficient statistic is also complete. But in many cases, introducing restrictions on the parameter space, as with curved exponential families, destroys completeness.
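A standard illustration of that last point (added here): let X1, …, Xn be iid N(θ, θ²) with θ > 0, a curved exponential family. The minimal sufficient statistic (X̄, S²) is not complete, since
$$E_\theta\Bigl(\tfrac{n}{n+1}\,\bar X^{2}\Bigr) = \tfrac{n}{n+1}\Bigl(\theta^{2} + \tfrac{\theta^{2}}{n}\Bigr) = \theta^{2} = E_\theta(S^{2}),$$
so (n/(n+1)) X̄² − S² is an unbiased estimator of zero which is not identically zero.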
A very relevant paper is Lehmann (1981), "An Interpretation of Completeness and Basu's Theorem".