This was something I thought about:
- If you have data from a coin with a very low probability of success, it seems possible for the Wald confidence interval to contain negative numbers, which is illogical for a probability. With a percentile bootstrap, on the other hand, the CI should never be illogical: in the worst case, the lower bound will be exactly 0.
I verified this in R:
set.seed(123)

p_true <- 0.001  # true success probability
n <- 1000        # observations per simulated data set
n_sims <- 100    # number of simulated data sets
n_boot <- 100    # bootstrap resamples per data set

# Counters for intervals whose lower bound is strictly below 0
wald_negative <- 0
bootstrap_negative <- 0

for (i in 1:n_sims) {
  x <- rbinom(n, 1, p_true)
  successes <- sum(x)
  p_hat <- successes / n

  # Wald interval: p_hat +/- 1.96 * estimated standard error
  se <- sqrt(p_hat * (1 - p_hat) / n)
  wald_lower <- p_hat - 1.96 * se
  wald_upper <- p_hat + 1.96 * se
  if (wald_lower < 0) {
    wald_negative <- wald_negative + 1
  }

  # Percentile bootstrap: resample the data with replacement
  boot_estimates <- numeric(n_boot)
  for (j in 1:n_boot) {
    boot_x <- sample(x, n, replace = TRUE)
    boot_estimates[j] <- sum(boot_x) / n
  }
  boot_lower <- quantile(boot_estimates, 0.025, names = FALSE)
  boot_upper <- quantile(boot_estimates, 0.975, names = FALSE)
  if (boot_lower < 0) {
    bootstrap_negative <- bootstrap_negative + 1
  }
}

cat("Results from", n_sims, "simulations:\n")
cat("Wald CI negative:", wald_negative,
    "times (", round(100 * wald_negative / n_sims, 1), "%)\n")
cat("Bootstrap CI negative:", bootstrap_negative,
    "times (", round(100 * bootstrap_negative / n_sims, 1), "%)\n")
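What the simulation counts can also be checked directly for each possible number of successes. The following is a minimal sketch, assuming the same n = 1000 and the same 1.96 multiplier as the simulation; the helper name `wald_lower_bound` is hypothetical and only used for illustration.

```r
# Hypothetical helper (not part of the simulation): Wald lower bound
# for k successes out of n trials, using the same formula as above.
wald_lower_bound <- function(k, n, z = 1.96) {
  p_hat <- k / n
  p_hat - z * sqrt(p_hat * (1 - p_hat) / n)
}

n <- 1000
k <- 0:6
lower <- wald_lower_bound(k, n)

# One row per success count; `negative` is TRUE where the bound is below 0
print(data.frame(successes = k, wald_lower = lower, negative = lower < 0))

# Percentile bootstrap bounds are quantiles of resampled proportions,
# and every resampled proportion lies in [0, 1].
set.seed(1)
x <- c(1, rep(0, n - 1))  # a sample with exactly one success
boot_p <- replicate(200, mean(sample(x, n, replace = TRUE)))
boot_lower <- quantile(boot_p, 0.025, names = FALSE)
print(boot_lower >= 0)
```

With these settings the Wald lower bound is negative only for 1, 2, or 3 successes. With 0 successes the estimated standard error is 0, so the Wald interval collapses to the single point 0 and is not counted as negative by the `< 0` check.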
- If you have data from an exponential random variable with a rate parameter very close to 0, it seems possible for the Wald confidence interval to contain negative numbers, which is illogical for a rate. With the bootstrap, the CI should again never be illogical: in the worst case, the lower bound will be exactly 0.
I also verified this in R:
set.seed(123)

lambda_true <- 0.001  # true rate parameter
n <- 100              # observations per simulated data set
n_sims <- 100         # number of simulated data sets
n_boot <- 100         # bootstrap resamples per data set

# Counters for intervals whose lower bound is strictly below 0
wald_negative <- 0
bootstrap_negative <- 0

for (i in 1:n_sims) {
  x <- rexp(n, rate = lambda_true)

  # MLE of the rate is 1 / sample mean; its estimated SE is lambda_hat / sqrt(n)
  lambda_hat <- 1 / mean(x)
  se <- lambda_hat / sqrt(n)
  wald_lower <- lambda_hat - 1.96 * se
  wald_upper <- lambda_hat + 1.96 * se
  if (wald_lower < 0) {
    wald_negative <- wald_negative + 1
  }

  # Percentile bootstrap: resample the data with replacement
  boot_estimates <- numeric(n_boot)
  for (j in 1:n_boot) {
    boot_x <- sample(x, n, replace = TRUE)
    boot_estimates[j] <- 1 / mean(boot_x)
  }
  boot_lower <- quantile(boot_estimates, 0.025, names = FALSE)
  boot_upper <- quantile(boot_estimates, 0.975, names = FALSE)
  if (boot_lower < 0) {
    bootstrap_negative <- bootstrap_negative + 1
  }
}

cat("Results from", n_sims, "simulations:\n")
cat("True lambda:", lambda_true, "\n")
cat("Wald CI negative:", wald_negative,
    "times (", round(100 * wald_negative / n_sims, 1), "%)\n")
cat("Bootstrap CI negative:", bootstrap_negative,
    "times (", round(100 * bootstrap_negative / n_sims, 1), "%)\n")
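For the exponential case, the sign of the Wald lower bound can be read off algebraically. A short derivation, assuming the standard error $\hat\lambda/\sqrt{n}$ used in the code above and noting that $\hat\lambda = 1/\bar{x} > 0$:

```latex
\hat\lambda - 1.96\,\frac{\hat\lambda}{\sqrt{n}}
  = \hat\lambda\left(1 - \frac{1.96}{\sqrt{n}}\right) < 0
  \iff \sqrt{n} < 1.96
  \iff n < 1.96^2 \approx 3.84
```

So with n = 100 the lower bound equals 0.804 times the estimate, which is positive for any positive estimate. Under this standard error the Wald lower bound can only go negative when n is 3 or smaller, regardless of how close the true rate is to 0.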
If this is true, is the bootstrap CI always more advantageous than the Wald CI? Now that modern computers make simulation cheap, won't a bootstrap CI almost always be at least as good as the Wald CI, since it avoids the illogical-bounds problem?