Expected number of rolls of a die until each side has appeared 3 times

15

What is the expected number of times you must roll a die until each side has appeared 3 times?

This question was asked in a primary school in New Zealand, where it was solved using simulation. What is the analytical solution to this problem?

— Edgar Santos

6

Because the outcomes of the rolls are random, it is not possible to know in advance how many rolls will be needed. If the question is asking for, say, the expected number of rolls before each side has appeared 3 times, that should be stated explicitly. In that case, stats.stackexchange.com/tags/self-study/info applies.

— Juho Kokkala

3

Tell the New Zealand children to read Norman L. Johnson, Samuel Kotz, N. Balakrishnan, "Discrete Multivariate Distributions", wiley.com/WileyCDA/WileyTitle/productCd-0471128449.html .

— Mark L. Stone

3

Related: How often do you have to roll a 6-sided die to obtain every number at least once?

— Sycorax says Reinstate Monica

28

Suppose all $d=6$ sides have equal chances. Let's generalize and find the expected number of rolls needed until side $1$ has appeared $n_1$ times, side $2$ has appeared $n_2$ times, ..., and side $d$ has appeared $n_d$ times. Because the identities of the sides do not matter (they all have equal chances), the description of this objective can be condensed: suppose that $i_0$ sides do not have to appear at all, $i_1$ of the sides need to appear just once, ..., and $i_n$ of the sides have to appear $n=\max(n_1,n_2,\ldots,n_d)$ times. Let $\mathbf{i}=(i_0,i_1,\ldots,i_n)$ designate this situation and write $e(\mathbf{i})$ for the expected number of rolls. The question asks for $e(0,0,0,6)$: here $i_3 = 6$ indicates that all six sides need to be seen three times each.

An easy recurrence is available. At the next roll, the side that appears corresponds to one of the $i_j$: that is, either we did not need to see it, or we needed to see it once, ..., or we needed to see it $n$ more times. Here $j$ is the number of times we still needed to see it.

When $j=0$, we did not need to see it and nothing changes. This happens with probability $i_0/d$.
When $j \gt 0$, we did need to see this side. Now there is one fewer side that needs to be seen $j$ times and one more side that needs to be seen $j-1$ times. Thus, $i_j$ becomes $i_j-1$ and $i_{j-1}$ becomes $i_{j-1}+1$. Let this operation on the components of $\mathbf{i}$ be designated $\mathbf{i}\cdot j$, so that

$\mathbf{i}\cdot j = (\color{gray}{i_0, \ldots, i_{j-2}}, i_{j-1}+1, i_j-1, \color{gray}{i_{j+1},\ldots, i_n}).$
This happens with probability $i_j/d$.
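As a small illustration of the $\mathbf{i}\cdot j$ operation and its probabilities, here is a minimal Python sketch. The names `shift` and `transitions` are hypothetical helpers introduced only for this example; states are plain tuples $(i_0, \ldots, i_n)$.

```python
from fractions import Fraction

def shift(state, j):
    """Return i.j: one fewer side needing j more appearances, one more needing j-1."""
    if j < 1 or state[j] == 0:
        raise ValueError("the operation needs j > 0 and i_j > 0")
    nxt = list(state)
    nxt[j] -= 1
    nxt[j - 1] += 1
    return tuple(nxt)

def transitions(state):
    """Map each possible j to (probability i_j/d, resulting state) for one roll."""
    d = sum(state)  # the components always sum to the number of sides
    return {
        j: (Fraction(state[j], d), shift(state, j) if j > 0 else state)
        for j in range(len(state))
        if state[j] > 0
    }

print(shift((0, 0, 0, 6), 3))      # (0, 0, 1, 5)
print(transitions((0, 0, 1, 5)))
```

After the first roll from $(0,0,0,6)$ the state is necessarily $(0,0,1,5)$; from there the next roll repeats the same side with probability $1/6$ or shows a new side with probability $5/6$.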

We merely have to count this roll and use recursion to tell us how many more rolls are expected. By the laws of expectation and total probability,

$e(\mathbf{i}) = 1 + \frac{i_0}{d}e(\mathbf{i}) + \sum_{j=1}^n \frac{i_j}{d}e(\mathbf{i}\cdot j)$

(Let us understand that whenever $i_j=0$, the corresponding term in the sum is zero.)

If $i_0=d$, we are done and $e(\mathbf{i}) =0$. Otherwise we may solve for $e(\mathbf{i})$, giving the desired recursive formula

$e(\mathbf{i}) = \frac{d + i_1 e(\mathbf{i}\cdot 1) + \cdots + i_n e(\mathbf{i}\cdot n)}{d - i_0}.\tag{1}$
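Formula $(1)$ can be checked with a few lines of Python. This is a minimal sketch rather than the code used in this answer: it relies on the standard-library `fractions.Fraction` for exact arithmetic and on `functools.lru_cache` for memoization, and the function name `expected_rolls` is an assumption made for the example.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def expected_rolls(state):
    """Expected rolls e(i), where state[j] sides still need j more appearances."""
    d = sum(state)                  # number of sides
    if state[0] == d:               # every side is already satisfied
        return Fraction(0)
    total = Fraction(d)
    for j in range(1, len(state)):
        if state[j] > 0:            # terms with i_j = 0 contribute nothing
            nxt = list(state)
            nxt[j] -= 1             # one fewer side needing j appearances
            nxt[j - 1] += 1         # one more side needing j-1 appearances
            total += state[j] * expected_rolls(tuple(nxt))
    return total / (d - state[0])   # formula (1)

result = expected_rolls((0, 0, 0, 6))
print(result, float(result))
```

As a sanity check, `expected_rolls((0, 6))` is the classical coupon-collector case (each side at least once) and returns $147/10 = 14.7$.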

Notice that

$|\mathbf{i}| = 0(i_0) + 1(i_1) + \cdots + n(i_n)$

is the total number of events we wish to see. The operation $\cdot j$ reduces that quantity by one for any $j\gt 0$ provided $i_j \gt 0$, which always holds for the terms that contribute to $(1)$. Therefore this recursion terminates at a depth of precisely $|\mathbf{i}|$ (equal to $3(6) = 18$ in the question). Moreover (as is not difficult to check), the number of possibilities at each recursion depth in this question is small (never exceeding $8$). Consequently, this is an efficient method, at least when the combinatorial possibilities are not too numerous and we memoize the intermediate results (so that no value of $e$ is calculated more than once).
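The claims about depth and width can be verified by enumerating the states the recursion actually visits. The sketch below is illustrative only; `shift` and `reachable_by_depth` are hypothetical helper names, and states are tuples $(i_0, \ldots, i_n)$.

```python
from collections import defaultdict

def shift(state, j):
    """Apply the i.j operation to a state tuple."""
    nxt = list(state)
    nxt[j] -= 1
    nxt[j - 1] += 1
    return tuple(nxt)

def reachable_by_depth(start):
    """Group every state reachable from `start` by |i| = sum of j * i_j."""
    seen = {start}
    stack = [start]
    while stack:
        state = stack.pop()
        for j in range(1, len(state)):
            if state[j] > 0:
                nxt = shift(state, j)
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    levels = defaultdict(set)
    for state in seen:
        levels[sum(j * count for j, count in enumerate(state))].add(state)
    return levels

levels = reachable_by_depth((0, 0, 0, 6))
print(max(levels))                             # 18: the recursion depth
print(max(len(s) for s in levels.values()))    # 8: the widest level
print(sum(len(s) for s in levels.values()))    # 84: distinct states in total
```

Only 84 values of $e$ ever need to be stored for this question, which is why memoization makes the calculation essentially instantaneous.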

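I compute that

$e(0,0,0,6) = \frac{2\,286\,878\,604\,508\,883}{69\,984\,000\,000\,000}\approx 32.677.$

A simulation in R (code below) gave a mean game length of $32.669$ with a standard error of $0.027$, which is consistent with the theoretical value. The simulated game lengths necessarily begin at $18$, the minimum number of rolls needed to see all six sides three times each.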

# Specify the problem
d <- 6   # Number of faces
k <- 3   # Number of times to see each
N <- 3.26772e6 # Number of rolls

# Simulate many rolls
set.seed(17)
x <- sample(1:d, N, replace=TRUE)

# Use these rolls to play the game repeatedly.
totals <- sapply(1:d, function(i) cumsum(x==i))
n <- 0
base <- rep(0, d)
i.last <- 0
n.list <- list()
for (i in 1:N) {
  if (min(totals[i, ] - base) >= k) {
    base <- totals[i, ]
    n <- n+1
    n.list[[n]] <- i - i.last
    i.last <- i
  }
}

# Summarize the results
sim <- unlist(n.list)
mean(sim)
sd(sim) / sqrt(length(sim))
length(sim)
hist(sim, main="Simulation results", xlab="Number of rolls", freq=FALSE, breaks=0:max(sim))

Implementation

Although the recursive calculation of $e$ is simple, it presents some challenges in some computing environments. Chief among these is storing the values of $e(\mathbf{i})$ as they are computed. This is essential, for otherwise each value will be (redundantly) computed a very large number of times. However, the storage potentially needed for an array indexed by $\mathbf{i}$ could be enormous. Ideally, only values of $\mathbf{i}$ that are actually encountered during the computation should be stored. This calls for a kind of associative array.

To illustrate, here is working R code. The comments describe the creation of a simple "AA" (associative array) class for storing intermediate results. Vectors $\mathbf{i}$ are converted to strings and those are used to index into a list E that will hold all the values. The $\mathbf{i}\cdot j$ operation is implemented as %.%.

These preliminaries enable the recursive function $e$ to be defined rather simply in a way that parallels the mathematical notation. In particular, the line

x <- (d + sum(sapply(1:n, function(i) j[i+1]*e.(j %.% i))))/(d - j[1])

is directly comparable to the formula $(1)$ above. Note that all indexes have been increased by $1$ because R starts indexing its arrays at $1$ rather than $0$.

Timing shows it takes $0.01$ seconds to compute e(c(0,0,0,6)); its value is

32.6771634160506

Accumulated floating point roundoff error has destroyed the last two digits (which should be 68 rather than 06).

e <- function(i) {
  #
  # Create a data structure to "memoize" the values.
  #
  `[[<-.AA` <- function(x, i, value) {
    class(x) <- NULL
    x[[paste(i, collapse=",")]] <- value
    class(x) <- "AA"
    x
  }
  `[[.AA` <- function(x, i) {
    class(x) <- NULL
    x[[paste(i, collapse=",")]]
  }
  E <- list()
  class(E) <- "AA"
  #
  # Define the "." operation.
  #
  `%.%` <- function(i, j) {
    i[j+1] <- i[j+1]-1
    i[j] <- i[j] + 1
    return(i)
  }
  #
  # Define a recursive version of this function.
  #
  e. <- function(j) {
    #
    # Detect initial conditions and return initial values.
    #
    if (min(j) < 0 || sum(j[-1])==0) return(0)
    #
    # Look up the value (if it has already been computed).
    #
    x <- E[[j]]
    if (!is.null(x)) return(x)
    #
    # Compute the value (for the first and only time).
    #
    d <- sum(j)
    n <- length(j) - 1
    x <- (d + sum(sapply(1:n, function(i) j[i+1]*e.(j %.% i))))/(d - j[1])
    #
    # Store the value for later re-use.
    #
    E[[j]] <<- x
    return(x)
  }
  #
  # Do the calculation.
  #
  e.(i)
}
e(c(0,0,0,6))

Finally, here is the original Mathematica implementation that produced the exact answer. The memoization is accomplished via the idiomatic e[i_] := e[i] = ... expression, eliminating almost all the R preliminaries. Internally, though, the two programs are doing the same things in the same way.

shift[j_, x_List] /; Length[x] >= j >= 2 := Module[{i = x},
   i[[j - 1]] = i[[j - 1]] + 1;
   i[[j]] = i[[j]] - 1;
   i];
e[i_] := e[i] = With[{i0 = First@i, d = Plus @@ i},
    (d + Sum[If[i[[k]] > 0, i[[k]]  e[shift[k, i]], 0], {k, 2, Length[i]}])/(d - i0)];
e[{x_, y__}] /; Plus[y] == 0  := e[{x, y}] = 0

e[{0, 0, 0, 6}]

$\frac{2286878604508883}{69984000000000}$

— whuber

5

+1 I imagine some of the notation would be difficult to follow for the students who were asked this question (not that I have any concrete alternative to suggest right now). On the other hand I wonder what they were intended to do with such a question.

— Glen_b -Reinstate Monica

1

@Glen_b They could learn a lot by actually rolling the dice (and tallying the results). It sounds like a good way to keep a class busy for a half hour while the teacher rests :-).

— whuber

12

The original version of this question started life by asking:

how many rolls are needed until each side has appeared 3 times

Of course, that is a question that does not have an answer as @JuhoKokkala commented above: the answer is a random variable with a distribution that needs to be found. The question was then modified to ask: "What is the expected number of rolls." The answer below seeks to answer the original question posed: how to find the distribution of the number of rolls, without using simulation, and just using conceptually simple techniques any New Zealand student with a computer could implement $\rightarrow$ Why not? The problem reduces to a 1-liner.

Distribution of the number of rolls required ... such that each side appears 3 times

We roll a die $n$ times. Let $X_i$ denote the number of times side $i$ of the die appears, where $i \in \{1, \dots, 6\}$. Then the joint pmf of $(X_1, X_2,\dots, X_6)$ is $\text{Multinomial}(n,\frac16)$, that is:

$P\left(X_1=x_1,\ldots ,X_6=x_6\right) \; = \; \frac{n! }{ x_1! \cdots x_6!} \; \frac{1}{6^n} \quad \text{ subject to: } \quad \sum _{i=1}^6 x_i=n$

Let: $\quad N = \min\big\{n: \; {X_i \geq 3 \; \forall_i } \big\}. \;$ Then the cdf of $N$ is: $\quad P(N \leq n) \; = \; P\big(X_{\forall_i} \geq 3 \; \big| \; n\big)$

That is, to find the cdf $P(N \leq n)$, simply calculate, for each value of $n \in \{18, 19, 20,\dots\}$:

$P(X_1 \geq3, \dots , X_6 \geq 3) \quad \text{ where } \quad (X_1, \dots, X_6) \sim \text{Multinomial}(n,\frac16)$

Here, for example, is Mathematica code that does this, as $n$ increases from 18 to say 60. It is basically a one-liner:

 cdf = ParallelTable[ 
   Probability[x1 >= 3 && x2 >= 3 && x3 >= 3 && x4 >= 3 && x5 >= 3 &&  x6 >= 3, 
       {x1, x2, x3, x4, x5, x6} \[Distributed] MultinomialDistribution[n, Table[1/6, 6]]],
    {n, 18, 60}]

... which yields the exact cdf as $n$ increases:

$\begin{array}{cc} 18 & \frac{14889875}{11019960576} \\ 19 & \frac{282907625}{44079842304} \\ 20 & \frac{3111983875}{176319369216} \\ 21 & \frac{116840849125}{3173748645888} \\ 22 & \frac{3283142988125}{50779978334208} \\ 23 & \frac{61483465418375}{609359740010496} \\ \vdots & \vdots \end{array}$
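For readers without Mathematica, the same exact cdf can be obtained by counting sequences directly. The following Python sketch is an alternative to the `Probability[...]` call above, not a translation of it: it counts the length-$n$ sequences in which every side occurs at least $k$ times, adding one side at a time, and divides by $6^n$. The function name `cdf_table` and its parameters are assumptions made for this example.

```python
from fractions import Fraction
from math import comb

def cdf_table(n_max, sides=6, k=3):
    """Exact P(N <= n) for n = sides*k, ..., n_max."""
    # ways[m] = number of length-m sequences over the sides handled so far
    # in which each of those sides occurs at least k times.
    ways = [1] + [0] * n_max        # no sides yet: only the empty sequence
    for _ in range(sides):
        new = [0] * (n_max + 1)
        for m in range(n_max + 1):
            # choose the x >= k positions occupied by the newly added side
            new[m] = sum(comb(m, x) * ways[m - x] for x in range(k, m + 1))
        ways = new
    return {n: Fraction(ways[n], sides ** n) for n in range(sides * k, n_max + 1)}

cdf = cdf_table(60)
for n in range(18, 24):
    print(n, cdf[n])
```

Because $E[N] = \sum_{n \ge 0} P(N > n)$, summing $1 - P(N \le n)$ over a long enough range (and adding $18$ for the values $n < 18$, where the cdf is zero) gives a numerical check against the expected value of about $32.677$ obtained in the other answer.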

Here is a plot of the cdf $P(N\leq n)$ , as a function of $n$ :

To derive the pmf $P(N=n)$ , simply first difference the cdf:

Of course, the distribution has no upper bound, but we can readily solve here for as many values as practically required. The approach is general and should work just as well for any desired combination of sides required.

— wolfies