What is the expected number of times you must roll a die until each side has appeared 3 times?
This question was asked in a primary school in New Zealand and was solved using simulation. What is the analytical solution to this problem?
Answer:
Suppose all $d = 6$ sides have equal chances. Let's generalize and find the expected number of rolls needed until side $1$ has appeared $n_1$ times, side $2$ has appeared $n_2$ times, ..., and side $d$ has appeared $n_d$ times. Because the identities of the sides do not matter (they all have equal chances), the description of this objective can be condensed: let us suppose that $i_0$ sides don't have to appear at all, $i_1$ of the sides need to appear just once, ..., and $i_n$ of the sides have to appear $n = \max(n_1, n_2, \ldots, n_d)$ times. Let $\mathbf{i} = (i_0, i_1, \ldots, i_n)$ designate this situation and write $e(\mathbf{i})$ for the expected number of rolls. The question asks for $e(0,0,0,6)$: $i_3 = 6$ indicates that all six sides need to be seen three times each.
An easy recurrence is available. At the next roll, the side that appears corresponds to one of the $i_j$: that is, either we did not need to see it, or we needed to see it once, ..., or we needed to see it $n$ more times. $j$ is the number of times we needed to see it.
When $j = 0$, we did not need to see it and nothing changes. This happens with probability $i_0/d$.
When $j > 0$, we did need to see this side. Now there is one less side that needs to be seen $j$ times and one more side that needs to be seen $j-1$ times. Thus, $i_j$ becomes $i_j - 1$ and $i_{j-1}$ becomes $i_{j-1} + 1$. Let this operation on the components of $\mathbf{i}$ be designated $\mathbf{i} \cdot j$, so that
$$\mathbf{i} \cdot j = (i_0, \ldots, i_{j-2}, i_{j-1} + 1, i_j - 1, i_{j+1}, \ldots, i_n).$$
This happens with probability $i_j/d$.
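As a small illustration of this operation (a worked example added here, not part of the original derivation), take the state from the question with $d = 6$. The first roll always moves one side from "needs three more" to "needs two more", and the second roll then branches:

$$
\begin{aligned}
(0,0,0,6) \cdot 3 &= (0,0,1,5) && \text{with probability } 6/6, \\
(0,0,1,5) \cdot 2 &= (0,1,0,5) && \text{with probability } 1/6, \\
(0,0,1,5) \cdot 3 &= (0,0,2,4) && \text{with probability } 5/6.
\end{aligned}
$$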
We merely have to count this roll and use recursion to tell us how many more rolls are expected. By the laws of expectation and total probability,
$$e(\mathbf{i}) = 1 + \frac{i_0}{d} e(\mathbf{i}) + \sum_{j=1}^{n} \frac{i_j}{d} e(\mathbf{i} \cdot j).$$
(Let's understand that whenever $i_j = 0$, the corresponding term in the sum is zero.)
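If $i_0 = d$, we are done and $e(\mathbf{i}) = 0$. Otherwise we may solve for $e(\mathbf{i})$, giving the desired recursive formula
$$e(\mathbf{i}) = \frac{d + i_1 e(\mathbf{i} \cdot 1) + \cdots + i_n e(\mathbf{i} \cdot n)}{d - i_0}.$$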
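A quick sanity check of the recursion (an added worked example, under the assumption $n = 1$, i.e. each side only has to appear once): the state is then $(i_0, i_1)$ with $d - i_0 = i_1$, so the recursive formula collapses to one term per step and reproduces the classical coupon-collector value for $d = 6$.

$$
\begin{aligned}
e(i_0, i_1) &= \frac{d + i_1\, e(i_0 + 1, i_1 - 1)}{i_1} = \frac{d}{i_1} + e(i_0 + 1, i_1 - 1), \\
e(5,1) &= 6, \quad e(4,2) = 9, \quad e(3,3) = 11, \quad e(2,4) = 12.5, \quad e(1,5) = 13.7, \\
e(0,6) &= 14.7 = 6\left(1 + \tfrac12 + \tfrac13 + \tfrac14 + \tfrac15 + \tfrac16\right).
\end{aligned}
$$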
Notice that $|\mathbf{i}| = 0(i_0) + 1(i_1) + \cdots + n(i_n)$ is the total number of events we wish to see. The operation $\cdot j$ reduces that quantity by one for any $j > 0$ provided $i_j > 0$, which is always the case. Therefore this recursion terminates at a depth of precisely $|\mathbf{i}|$ (equal to $3(6) = 18$ in the question).
I compute that
$$e(0,0,0,6) \approx 32.677.$$
The following R code checks this value by simulation:
# Specify the problem
d <- 6 # Number of faces
k <- 3 # Number of times to see each
N <- 3.26772e6 # Number of rolls
# Simulate many rolls
set.seed(17)
x <- sample(1:d, N, replace=TRUE)
# Use these rolls to play the game repeatedly.
totals <- sapply(1:d, function(i) cumsum(x==i))
n <- 0
base <- rep(0, d)
i.last <- 0
n.list <- list()
for (i in 1:N) {
  if (min(totals[i, ] - base) >= k) {
    base <- totals[i, ]
    n <- n+1
    n.list[[n]] <- i - i.last
    i.last <- i
  }
}
# Summarize the results
sim <- unlist(n.list)
mean(sim)
sd(sim) / sqrt(length(sim))
length(sim)
hist(sim, main="Simulation results", xlab="Number of rolls", freq=FALSE, breaks=0:max(sim))
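The simulation above slices one long stream of rolls into consecutive games. A more direct (though slower) way to see the same thing is to play each game separately. The sketch below does that; `play_once` and `game_lengths` are hypothetical names introduced here, and the number of games (2000) is an arbitrary choice for illustration.

```r
# Hypothetical helper: play one game and return its length in rolls.
play_once <- function(d = 6, k = 3) {
  counts <- integer(d)               # how often each face has appeared
  rolls <- 0
  while (min(counts) < k) {          # stop once every face has k appearances
    face <- sample.int(d, 1)
    counts[face] <- counts[face] + 1L
    rolls <- rolls + 1
  }
  rolls
}

set.seed(17)
game_lengths <- replicate(2000, play_once())
mean(game_lengths)                              # estimate of the expectation
sd(game_lengths) / sqrt(length(game_lengths))   # its standard error
```

With far fewer games than the streamed version, the standard error is correspondingly larger, so this is only a rough check on the analytical value.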
Although the recursive calculation of $e$ is simple, it presents some challenges in some computing environments. Chief among these is storing the values of $e(\mathbf{i})$ as they are computed. This is essential, for otherwise each value will be (redundantly) computed a very large number of times. However, the storage potentially needed for an array indexed by $\mathbf{i}$ could be enormous. Ideally, only values of $\mathbf{i}$ that are actually encountered during the computation should be stored. This calls for a kind of associative array.
To illustrate, here is working R code. The comments describe the creation of a simple "AA" (associative array) class for storing intermediate results. Vectors $\mathbf{i}$ are converted to strings and those are used to index into a list `E` that will hold all the values. The $\mathbf{i} \cdot j$ operation is implemented as `%.%`.
These preliminaries enable the recursive function $e$ to be defined rather simply in a way that parallels the mathematical notation. In particular, the line
x <- (d + sum(sapply(1:n, function(i) j[i+1]*e.(j %.% i))))/(d - j[1])
is directly comparable to the recursive formula above. Note that all indexes have been increased by $1$ because R starts indexing its arrays at $1$ rather than $0$.
Timing shows it takes … seconds to compute `e(c(0,0,0,6))`; its value is
32.6771634160506
Accumulated floating point roundoff error has destroyed the last two digits (which should be `68` rather than `06`).
e <- function(i) {
  #
  # Create a data structure to "memoize" the values.
  #
  `[[<-.AA` <- function(x, i, value) {
    class(x) <- NULL
    x[[paste(i, collapse=",")]] <- value
    class(x) <- "AA"
    x
  }
  `[[.AA` <- function(x, i) {
    class(x) <- NULL
    x[[paste(i, collapse=",")]]
  }
  E <- list()
  class(E) <- "AA"
  #
  # Define the "." operation.
  #
  `%.%` <- function(i, j) {
    i[j+1] <- i[j+1]-1
    i[j] <- i[j] + 1
    return(i)
  }
  #
  # Define a recursive version of this function.
  #
  e. <- function(j) {
    #
    # Detect initial conditions and return initial values.
    #
    if (min(j) < 0 || sum(j[-1])==0) return(0)
    #
    # Look up the value (if it has already been computed).
    #
    x <- E[[j]]
    if (!is.null(x)) return(x)
    #
    # Compute the value (for the first and only time).
    #
    d <- sum(j)
    n <- length(j) - 1
    x <- (d + sum(sapply(1:n, function(i) j[i+1]*e.(j %.% i))))/(d - j[1])
    #
    # Store the value for later re-use.
    #
    E[[j]] <<- x
    return(x)
  }
  #
  # Do the calculation.
  #
  e.(i)
}
e(c(0,0,0,6))
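For comparison, here is a minimal sketch of the same recursion that uses an R environment as the associative array instead of the "AA" class. This is an alternative written for illustration, not the code that produced the value above; `expected_rolls`, `memo`, and `rec` are hypothetical names. It also skips terms with a zero count rather than recursing into them.

```r
# Hypothetical alternative: memoize e() in an environment keyed by strings.
expected_rolls <- function(i) {
  memo <- new.env(hash = TRUE)            # string key -> expected value
  d <- sum(i)                             # number of faces
  n <- length(i) - 1                      # largest count still required
  rec <- function(state) {
    if (sum(state[-1]) == 0) return(0)    # nothing left to see
    key <- paste(state, collapse = ",")
    if (exists(key, envir = memo, inherits = FALSE)) {
      return(get(key, envir = memo))      # already computed
    }
    total <- d
    for (j in seq_len(n)) {
      if (state[j + 1] > 0) {
        nxt <- state
        nxt[j + 1] <- nxt[j + 1] - 1      # one fewer face needing j more views
        nxt[j] <- nxt[j] + 1              # one more face needing j - 1 views
        total <- total + state[j + 1] * rec(nxt)
      }
    }
    value <- total / (d - state[1])
    assign(key, value, envir = memo)      # store for later re-use
    value
  }
  rec(i)
}

expected_rolls(c(0, 0, 0, 6))   # the question: six faces, three times each
expected_rolls(c(0, 6))         # each face once (coupon collector)
```

Because environments are reference objects, the stored values persist across recursive calls without needing `<<-`.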
Finally, here is the original Mathematica implementation that produced the exact answer. The memoization is accomplished via the idiomatic `e[i_] := e[i] = ...` expression, eliminating almost all the R preliminaries. Internally, though, the two programs are doing the same things in the same way.
shift[j_, x_List] /; Length[x] >= j >= 2 := Module[{i = x},
  i[[j - 1]] = i[[j - 1]] + 1;
  i[[j]] = i[[j]] - 1;
  i];
e[i_] := e[i] = With[{i0 = First@i, d = Plus @@ i},
  (d + Sum[If[i[[k]] > 0, i[[k]] e[shift[k, i]], 0], {k, 2, Length[i]}])/(d - i0)];
e[{x_, y__}] /; Plus[y] == 0 := e[{x, y}] = 0
e[{0, 0, 0, 6}]
The original version of this question started life by asking:
how many rolls are needed until each side has appeared 3 times
Of course, that question does not have an answer, as @JuhoKokkala commented above: the number of rolls is a random variable with a distribution that needs to be found. The question was then modified to ask: "What is the expected number of rolls?" The answer below addresses the original question posed: how to find the distribution of the number of rolls without simulation, using only conceptually simple techniques that any New Zealand student with a computer could implement. Why not? The problem reduces to a one-liner.
Distribution of the number of rolls required ... such that each side appears 3 times
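We roll a die $n$ times. Let $X_i$ denote the number of times side $i$ of the die appears, where $i \in \{1, \ldots, 6\}$. Then the joint pmf of $(X_1, X_2, \ldots, X_6)$ is $\text{Multinomial}(n, \tfrac{1}{6})$, i.e.:
$$P(X_1 = x_1, \ldots, X_6 = x_6) = \frac{n!}{x_1! \cdots x_6!} \frac{1}{6^n} \quad \text{subject to } \sum_{i=1}^{6} x_i = n$$
Let $N = \min\{n : X_i \ge 3 \ \forall i\}$. Then the cdf of $N$ is:
$$P(N \le n) = P(X_i \ge 3 \ \forall i \mid n)$$
That is, to find the cdf $P(N \le n)$, simply calculate for each value of $n \in \{18, 19, 20, \ldots\}$:
$$P(X_1 \ge 3, \ldots, X_6 \ge 3) \quad \text{where } (X_1, \ldots, X_6) \sim \text{Multinomial}(n, \tfrac{1}{6})$$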
Here, for example, is Mathematica code that does this, as $n$ increases from 18 to say 60. It is basically a one-liner:
cdf = ParallelTable[
Probability[x1 >= 3 && x2 >= 3 && x3 >= 3 && x4 >= 3 && x5 >= 3 && x6 >= 3,
{x1, x2, x3, x4, x5, x6} \[Distributed] MultinomialDistribution[n, Table[1/6, 6]]],
{n, 18, 60}]
... which yields the exact cdf as $n$ increases:
Here is a plot of the cdf $P(N \le n)$, as a function of $n$:
To derive the pmf $P(N = n)$, simply first difference the cdf:
Of course, the distribution has no upper bound, but we can readily solve here for as many values as practically required. The approach is general and should work just as well for any desired combination of sides required.
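For readers without Mathematica, the same cdf can be sketched in R. This is a swapped-in technique, not the method used above: instead of calling a multinomial probability routine, it uses the fact that the number of length-$n$ sequences in which every face appears at least $k$ times is $n!$ times the coefficient of $x^n$ in $\left(\sum_{c \ge k} x^c / c!\right)^d$. The name `cdf_rolls` is hypothetical, and the result is floating point rather than exact.

```r
# Hypothetical helper: P(N <= n), the chance that n rolls of a fair d-sided
# die show every face at least k times.
cdf_rolls <- function(n, d = 6, k = 3) {
  # Series coefficients of sum_{c >= k} x^c / c!, truncated at degree n.
  one_face <- ifelse(0:n >= k, 1 / factorial(0:n), 0)
  poly <- c(1, rep(0, n))               # start from the constant series 1
  for (face in seq_len(d)) {            # multiply in one face at a time
    new_poly <- numeric(n + 1)
    for (m in 0:n) {
      new_poly[m + 1] <- sum(poly[1:(m + 1)] * rev(one_face[1:(m + 1)]))
    }
    poly <- new_poly
  }
  factorial(n) * poly[n + 1] / d^n      # favourable sequences / all sequences
}

n_values <- 18:60
cdf <- sapply(n_values, cdf_rolls)
pmf <- diff(c(0, cdf))                  # P(N = n), since P(N <= 17) = 0
```

Since $E[N] = \sum_{n \ge 0} P(N > n)$, summing `1 - cdf_rolls(n)` over a long enough range of $n$ gives a way to compare this distribution with the expected value from the first answer. Note that `factorial(n)` overflows in double precision beyond $n = 170$, so this sketch is only suitable for moderate $n$.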