The method I use is a shadow matrix: a dataset of indicator variables in which 1 means the value is missing and 0 means it is present. Correlating these indicators with one another shows which variables tend to be missing together, and correlating them with the original data shows whether missingness is related to observed values (which points to MAR) or not (which is consistent with MCAR).
An example in R (borrowed from the book "R in Action" by Robert Kabacoff):
#Load dataset
data(sleep, package = "VIM")
x <- as.data.frame(abs(is.na(sleep)))
#Elements of x are 1 if a value in the sleep data is missing and 0 if non-missing.
head(sleep)
head(x)
#Extracting variables that have some missing values.
y <- x[which(sapply(x, sd) > 0)]
cor(y)
#We see that variables Dream and NonD tend to be missing together. To a lesser extent, this is also true with Sleep and NonD, as well as Sleep and Dream.
#Now, looking at the relationship between the presence of missing values in each variable and the observed values in other variables:
cor(sleep, y, use="pairwise.complete.obs")
#NonD is more likely to be missing as Exp, BodyWgt, and Gest increase, suggesting that the missingness for NonD is likely MAR rather than MCAR.
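
To see what these correlations look like when the mechanism is known, here is a minimal sketch that applies the same shadow-matrix steps to simulated data. The data frame `toy`, its column names, and the logistic missingness model are assumptions made up for illustration; none of them come from the book or the `sleep` data.

```r
# Simulated data (hypothetical): one fully observed variable and two
# variables that receive missing values under different mechanisms.
set.seed(42)
n <- 1000
toy <- data.frame(
  weight     = rnorm(n),
  score_mar  = rnorm(n),
  score_mcar = rnorm(n)
)

# MAR: the chance that score_mar is missing rises with the observed weight.
toy$score_mar[runif(n) < plogis(2 * toy$weight - 1)] <- NA

# MCAR: score_mcar is missing about 20% of the time, regardless of anything.
toy$score_mcar[runif(n) < 0.2] <- NA

# Shadow matrix: 1 if the value is missing, 0 if it is observed.
shadow <- as.data.frame(abs(is.na(toy)))

# Keep only the indicators that vary (weight has no missing values).
shadow <- shadow[sapply(shadow, sd) > 0]

# Correlate the observed weight with each missingness indicator.
r <- cor(toy$weight, shadow)
print(round(r, 2))
```

In this setup the indicator for `score_mar` should correlate clearly with `weight`, while the indicator for `score_mcar` should stay near zero. A correlation near zero does not prove MCAR, since missingness could depend on variables that were never measured.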