Show that the estimate converges to the percentile by means of order statistics



Let $X_1, X_2, \ldots, X_{3n}$ be a sequence of iid random variables sampled from an alpha-stable distribution with parameters $\alpha=1.5$, $\beta=0$, $c=1.0$, $\mu=1.0$.

Now consider the sequence $Y_1, Y_2, \ldots, Y_n$, where $Y_{j+1} = X_{3j+1}\,X_{3j+2}\,X_{3j+3} - 1$, for $j = 0, \ldots, n-1$.

I want to estimate its $0.01$-percentile.

My idea is to run a kind of Monte Carlo simulation:

l = 1;
while(l < max_iterations)
{
  Generate $X_1, X_2, \ldots, X_{3n}$ and compute $Y_1, Y_2, \ldots, Y_{n}$;
  Compute the $0.01$-percentile of the current repetition;
  Compute the mean of the $0.01$-percentiles of all iterations performed;
  Compute the variance of the $0.01$-percentiles of all iterations performed;
  Calculate a confidence interval for the estimate of the $0.01$-percentile;

  if(confidence interval is small enough)
    break;

  l = l + 1;
}
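As a sketch, the loop above might look like this in Python. The stopping tolerance `tol`, the relative-width stopping rule, and the use of scipy's `levy_stable` (whose default S1 parameterization I assume matches $\alpha=1.5$, $\beta=0$, $c=1$, $\mu=1$) are my own choices, not part of the original pseudocode:

```python
import numpy as np
from scipy.stats import levy_stable, norm

def estimate_percentile(q=0.01, n=300, max_iterations=100,
                        tol=0.05, conf=0.95, seed=0):
    """Repeatedly estimate the q-percentile of Y = X1*X2*X3 - 1 and
    stop once the CI for the mean estimate is narrow enough."""
    rng = np.random.default_rng(seed)
    z = norm.ppf(1 - (1 - conf) / 2)
    estimates = []
    mean = half_width = np.nan
    for _ in range(max_iterations):
        # X ~ alpha-stable with alpha=1.5, beta=0, c=1, mu=1
        X = levy_stable.rvs(1.5, 0, loc=1.0, scale=1.0,
                            size=(3, n), random_state=rng)
        Y = X.prod(axis=0) - 1          # products of consecutive triples
        estimates.append(np.quantile(Y, q))
        if len(estimates) < 2:
            continue
        mean = np.mean(estimates)
        half_width = z * np.std(estimates, ddof=1) / np.sqrt(len(estimates))
        if half_width < tol * abs(mean):   # assumed relative-width criterion
            break
    return mean, half_width
```

Whether this scheme is statistically justified is exactly what the questions below ask.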

Calling the mean of all the computed sample $0.01$-percentiles $\hat\mu_n$ and their variance $\hat\sigma_n^2$, to compute an appropriate confidence interval for $\mu$ I resort to the strong form of the Central Limit Theorem:

Let $X_1, X_2, \ldots$ be a sequence of iid random variables with $E[X_i]=\mu$ and $0<V[X_i]=\sigma^2<\infty$. Define the sample mean as $\hat\mu_n=(1/n)\sum_{i=1}^n X_i$. Then $(\hat\mu_n-\mu)/\sqrt{\sigma^2/n}$ has a limiting standard normal distribution, that is,

$$\frac{\hat\mu_n-\mu}{\sqrt{\sigma^2/n}} \xrightarrow[n\to\infty]{} N(0,1),$$

and Slutsky's theorem to conclude that

$$\sqrt{n}\,\frac{\hat\mu_n-\mu}{\sqrt{\hat\sigma_n^2}} \xrightarrow[n\to\infty]{} N(0,1).$$

Then the $(1-\alpha)\times 100\%$ confidence interval for $\mu$ is

$$I_\alpha=\left[\hat\mu_n-z_{1-\alpha/2}\sqrt{\frac{\hat\sigma_n^2}{n}},\;\hat\mu_n+z_{1-\alpha/2}\sqrt{\frac{\hat\sigma_n^2}{n}}\right],$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution.
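Numerically this interval is straightforward to compute; here is a minimal sketch (the function name `normal_ci` is mine):

```python
import numpy as np
from scipy.stats import norm

def normal_ci(samples, alpha=0.05):
    """CLT-based (1 - alpha) confidence interval for the mean of `samples`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    mu_hat = samples.mean()
    se = samples.std(ddof=1) / np.sqrt(n)   # sqrt(sigma_hat^2 / n)
    z = norm.ppf(1 - alpha / 2)             # z_{1 - alpha/2}
    return mu_hat - z * se, mu_hat + z * se
```

For example, `normal_ci([1, 2, 3, 4, 5])` returns an interval centered at 3 with half-width about 1.386.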

Questions:

1) Is my approach correct? How can I justify the application of the CLT? I mean, how can I show that the variance is finite? (Do I have to look at the variance of $Y_j$? Because I don't think it is finite...)

2) How can I show that the average of all the computed sample $0.01$-percentiles converges to the true value of the $0.01$-percentile? (I should use order statistics but I'm unsure how to proceed; references are appreciated.)


All the methods applied to sample medians at stats.stackexchange.com/questions/45124 also apply to other percentiles. In effect, your question is identical to that one but merely replaces the 50th percentile with the 1st (or 0.01 perhaps?) percentile.
whuber

@whuber, your answer to that question is extremely good. However, Glen_b states, at the end of his post (the accepted answer), that the approximate normality "doesn't hold for extreme quantiles, because the CLT doesn't kick in there (the average of Z's won't be asymptotically normal). You need different theory for extreme values". How concerned should I be about this statement?
Maya

I believe he didn't really mean extreme quantiles, but only the extremes themselves. (In fact, he corrected that lapse at the end of the same sentence, referring to them as "extreme values.") The distinction is that an extreme quantile, such as the .01 percentile (which marks the bottom 1/10000th of the distribution) will, in the limit, stabilize because more and more data in a sample will still fall below and more and more will fall above that percentile. With an extreme (such as the maximum or minimum) that is no longer the case.
whuber

This is a problem that should be solved in general using empirical process theory. Some help about your level of training would be helpful.
AdamO

Answers:



The variance of $Y$ is not finite. This is because an alpha-stable variable $X$ with $\alpha=3/2$ (a Holtsmark distribution) does have a finite expectation $\mu$ but its variance is infinite. If $Y$ had a finite variance $\sigma^2$, then by exploiting the independence of the $X_i$ and the definition of variance we could compute

$$\sigma^2=\operatorname{Var}(Y)=E(Y^2)-E(Y)^2=E(X_1^2X_2^2X_3^2)-E(X_1X_2X_3)^2=E(X^2)^3-\left(E(X)^3\right)^2=\left(\operatorname{Var}(X)+E(X)^2\right)^3-\mu^6=\left(\operatorname{Var}(X)+\mu^2\right)^3-\mu^6.$$

This cubic equation in $\operatorname{Var}(X)$ has at least one real solution (and up to three solutions, but no more), implying $\operatorname{Var}(X)$ would be finite--but it's not. This contradiction proves the claim.
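For completeness, the real solution can be written down explicitly: writing $v=\operatorname{Var}(X)$, the relation $\sigma^2=(v+\mu^2)^3-\mu^6$ gives

$$v=\left(\sigma^2+\mu^6\right)^{1/3}-\mu^2,$$

which is a finite number whenever $\sigma^2$ is finite, making the contradiction concrete.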


Let's turn to the second question.

Any sample quantile converges to the true quantile as the sample grows large. The next few paragraphs prove this general point.

Let the associated probability be $q=0.01$ (or any other value between $0$ and $1$, exclusive). Write $F$ for the distribution function, so that $Z_q=F^{-1}(q)$ is the $q$th quantile.

All we need to assume is that $F^{-1}$ (the quantile function) is continuous. This assures us that for any $\epsilon>0$ there are probabilities $q_-<q$ and $q_+>q$ for which

$$F(Z_q-\epsilon)=q_-,\quad F(Z_q+\epsilon)=q_+,$$

and that as $\epsilon\to 0$, the limit of the interval $[q_-,q_+]$ is $\{q\}$.

Consider any iid sample of size $n$. The number of elements of this sample that are less than $Z_q-\epsilon$ has a Binomial$(n, q_-)$ distribution, because each element independently has a chance $q_-$ of being less than $Z_q-\epsilon$. The Central Limit Theorem (the usual one!) implies that for sufficiently large $n$, the number of elements less than $Z_q-\epsilon$ is given by a Normal distribution with mean $nq_-$ and variance $nq_-(1-q_-)$ (to an arbitrarily good approximation). Let the CDF of the standard Normal distribution be $\Phi$. The chance that this quantity exceeds $nq$ therefore is arbitrarily close to

$$1-\Phi\left(\frac{nq-nq_-}{\sqrt{nq_-(1-q_-)}}\right)=1-\Phi\left(\sqrt{n}\,\frac{q-q_-}{\sqrt{q_-(1-q_-)}}\right).$$

Because the argument of $\Phi$ on the right hand side is a fixed multiple of $\sqrt{n}$, it grows arbitrarily large as $n$ grows. Since $\Phi$ is a CDF, its value approaches arbitrarily close to $1$, showing the limiting value of this probability is zero.

In words: in the limit, it is almost surely the case that no more than $nq$ of the sample elements are less than $Z_q-\epsilon$. An analogous argument proves it is almost surely the case that no more than $n(1-q)$ of the sample elements are greater than $Z_q+\epsilon$. Together, these imply the $q$ quantile of a sufficiently large sample is extremely likely to lie between $Z_q-\epsilon$ and $Z_q+\epsilon$.

That's all we need in order to know that simulation will work. You may choose any desired degree of accuracy $\epsilon$ and confidence level $1-\alpha$ and know that for a sufficiently large sample size $n$, the order statistic closest to $nq$ in that sample will have a chance of at least $1-\alpha$ of being within $\epsilon$ of the true quantile $Z_q$.
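This concentration is easy to check empirically on a distribution whose quantile function is known in closed form. As an illustration (my choice, not from the argument above), the Exponential(1) distribution has $F^{-1}(q)=-\log(1-q)$:

```python
import numpy as np

rng = np.random.default_rng(0)
q = 0.01
true_q = -np.log(1 - q)   # exact 0.01-quantile of Exponential(1)

errors = {}
for n in (10**3, 10**4, 10**5):
    sample = rng.exponential(size=n)
    errors[n] = abs(np.quantile(sample, q) - true_q)
# The absolute error shrinks roughly on the order of 1/sqrt(n).
```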


Having established that a simulation will work, the rest is easy. Confidence limits can be obtained from limits for the Binomial distribution and then back-transformed. Further explanation (for the q=0.50 quantile, but generalizing to all quantiles) can be found in the answers at Central limit theorem for sample medians.
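A sketch of that back-transformation in Python (this is one standard construction; the rank formulas below are a common choice, not quoted from the answer above):

```python
import numpy as np
from scipy.stats import binom

def quantile_ci(sample, q, conf=0.95):
    """Distribution-free confidence interval for the q-quantile,
    obtained from binomial limits on order-statistic ranks."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    alpha = 1 - conf
    # The count of observations below the true quantile is Binomial(n, q);
    # bracket it between its alpha/2 and 1 - alpha/2 quantiles.
    lo = int(binom.ppf(alpha / 2, n, q))
    hi = int(binom.ppf(1 - alpha / 2, n, q)) + 1
    return x[max(lo, 0)], x[min(hi, n - 1)]
```

The returned endpoints are order statistics of the sample, so no normality assumption on the data is needed.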

Figure: histogram of 0.01 quantiles of Y with n=300 for 1000 iterations

The q=0.01 quantile of Y is negative. Its sampling distribution is highly skewed. To reduce the skew, this figure shows a histogram of the logarithms of the negatives of 1,000 simulated samples of n=300 values of Y.

library(stabledist)
n <- 3e2
q <- 0.01
n.sim <- 1e3

Y.q <- replicate(n.sim, {
  # Each Y is the product of three alpha-stable variates, minus 1
  Y <- apply(matrix(rstable(3*n, 3/2, 0, 1, 1), nrow=3), 2, prod) - 1
  log(-quantile(Y, q))
})
m <- median(-exp(Y.q))
hist(Y.q, freq=FALSE, 
     main=paste("Histogram of the", q, "quantile of Y for", n.sim, "iterations" ),
     xlab="Log(-Y_q)",
     sub=paste("Median is", signif(m, 4), 
               "Negative log is", signif(log(-m), 4)),
     cex.sub=0.8)
abline(v=log(-m), col="Red", lwd=2)
Licensed under cc by-sa 3.0 with attribution required.