Based on Chapter 8 of ModernDive. Code for Quiz 12.
What is the average age of members who have served in Congress?
Set the seed to 123, take a sample of 100 from congress_age and assign it to congress_age_100
set.seed(123)
congress_age_100 <- congress_age %>%
  rep_sample_n(size = 100)
congress_age is the population and congress_age_100 is the sample.
18,635 is the number of observations in the population and 100 is the number of observations in the sample.
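To illustrate why the seed matters, here is a minimal base R sketch of drawing a sample of 100 from a population of 18,635. It assumes a made-up population (toy_population) and a hypothetical helper (draw_sample); neither comes from the quiz. Rows are drawn without replacement, which is the default behaviour of rep_sample_n.

```r
# Hypothetical stand-in for the population; the real congress_age data is not used here.
set.seed(123)
toy_population <- data.frame(age = round(runif(18635, min = 25, max = 90), 1))

# Hypothetical helper: draw n rows without replacement after fixing the seed.
draw_sample <- function(population, n, seed) {
  set.seed(seed)
  population[sample(nrow(population), size = n), , drop = FALSE]
}

sample_a <- draw_sample(toy_population, 100, seed = 123)
sample_b <- draw_sample(toy_population, 100, seed = 123)
sample_c <- draw_sample(toy_population, 100, seed = 4346)

nrow(sample_a)                 # 100 rows in the sample
identical(sample_a, sample_b)  # same seed, same rows
identical(sample_a, sample_c)  # a different seed gives a different sample
```

The same seed always reproduces the same 100 rows, so every result that depends on the sample changes when the seed changes.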
Construct the confidence interval
1. Use specify to indicate the variable from congress_age_100 that you are interested in
Response: age (numeric)
# A tibble: 100 × 1
     age
   <dbl>
 1  53.1
 2  54.9
 3  65.3
 4  60.1
 5  43.8
 6  57.9
 7  55.3
 8  46
 9  42.1
10  37
# … with 90 more rows
2. Use generate to create 1000 replicates of your sample of 100
Response: age (numeric)
# A tibble: 100,000 × 2
# Groups:   replicate [1,000]
   replicate   age
       <int> <dbl>
 1         1  42.1
 2         1  71.2
 3         1  45.6
 4         1  39.6
 5         1  56.8
 6         1  71.6
 7         1  60.5
 8         1  56.4
 9         1  43.3
10         1  53.1
# … with 99,990 more rows
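The 100,000 rows above come from resampling the 100 sampled ages with replacement, 1000 times. A rough base R sketch of that idea follows. It uses a made-up sample (toy_sample) instead of congress_age_100, and it is an illustration of bootstrap resampling, not the code infer runs internally.

```r
set.seed(123)
# Hypothetical sample of 100 ages standing in for congress_age_100.
toy_sample <- round(rnorm(100, mean = 53, sd = 10), 1)

reps <- 1000
n <- length(toy_sample)

# Every replicate draws n values from the original sample WITH replacement,
# so some ages appear several times and others not at all.
bootstrap_long <- data.frame(
  replicate = rep(seq_len(reps), each = n),
  age = sample(toy_sample, size = reps * n, replace = TRUE)
)

dim(bootstrap_long)                      # 100000 rows, 2 columns
all(bootstrap_long$age %in% toy_sample)  # resampling never creates new values
```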
3. Calculate the mean for each replicate
Assign the output to bootstrap_distribution_mean_age
Display bootstrap_distribution_mean_age
bootstrap_distribution_mean_age <- congress_age_100 %>%
  specify(response = age) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")
bootstrap_distribution_mean_age
Response: age (numeric)
# A tibble: 1,000 × 2
   replicate  stat
       <int> <dbl>
 1         1  53.6
 2         2  53.2
 3         3  52.8
 4         4  51.5
 5         5  53.0
 6         6  54.2
 7         7  52.0
 8         8  52.8
 9         9  53.8
10        10  52.4
# … with 990 more rows
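The bootstrap distribution holds one mean per replicate. As a hedged illustration, the same kind of table can be built by hand in base R. The sample below (toy_sample) is made up, so the numbers will not match the output above.

```r
set.seed(123)
# Hypothetical sample of 100 ages standing in for congress_age_100.
toy_sample <- round(rnorm(100, mean = 53, sd = 10), 1)

# One bootstrap mean per replicate: resample with replacement, then average.
boot_means <- vapply(
  seq_len(1000),
  function(i) mean(sample(toy_sample, size = length(toy_sample), replace = TRUE)),
  numeric(1)
)
boot_dist <- data.frame(replicate = seq_along(boot_means), stat = boot_means)

nrow(boot_dist)       # 1000, one row per replicate
mean(boot_dist$stat)  # close to mean(toy_sample)
sd(boot_dist$stat)    # roughly sd(toy_sample) / sqrt(100)
```

The spread of the stat column is what the confidence interval is later built from: it approximates the standard error of the sample mean.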
4. Visualize the bootstrap distribution
visualize(bootstrap_distribution_mean_age)

Calculate the 95% confidence interval using the percentile method
Assign the output to congress_ci_percentile
Display congress_ci_percentile
congress_ci_percentile <- bootstrap_distribution_mean_age %>%
  get_confidence_interval(type = "percentile", level = 0.95)
congress_ci_percentile
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     51.5     55.2
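The percentile method takes the middle 95% of the bootstrap means, so the endpoints are the 2.5th and 97.5th percentiles. A minimal base R sketch, assuming made-up bootstrap means (toy_stat) in place of the stat column above:

```r
set.seed(123)
# Hypothetical bootstrap means standing in for bootstrap_distribution_mean_age$stat.
toy_stat <- rnorm(1000, mean = 53.4, sd = 0.95)

level <- 0.95
alpha <- 1 - level

# Cut off alpha / 2 of the distribution in each tail.
ci <- quantile(toy_stat, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
ci_table <- data.frame(lower_ci = ci[1], upper_ci = ci[2])
ci_table

# Share of bootstrap means that fall inside the interval: about 0.95.
inside <- mean(toy_stat >= ci[1] & toy_stat <= ci[2])
inside
```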
Calculate the observed point estimate of the mean and assign it to obs_mean_age
Display obs_mean_age
obs_mean_age <- congress_age_100 %>%
  specify(response = age) %>%
  calculate(stat = "mean") %>%
  pull()
obs_mean_age
[1] 53.36
Shade the confidence interval
Add a line at the observed mean, obs_mean_age, to your visualization and color it “hotpink”
visualize(bootstrap_distribution_mean_age) +
  shade_confidence_interval(endpoints = congress_ci_percentile) +
  geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 1)

Calculate the population mean to see if it is in the 95% confidence interval
Assign the output to pop_mean_age
Display pop_mean_age
[1] 53.31373
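The code for this step is not shown here. As a sketch only, the population mean is the plain average of the age column over every row of the population. The example below uses a tiny made-up data frame (toy_population); with the real data the analogous call would presumably be mean(congress_age$age), which is an assumption. It then checks the value reported above against the interval endpoints reported earlier.

```r
# Hypothetical five-row stand-in for the population data frame.
toy_population <- data.frame(age = c(42.1, 53.1, 60.5, 71.2, 39.6))

# Population mean: average over ALL rows, with no sampling involved.
pop_mean_toy <- mean(toy_population$age)
pop_mean_toy

# Values copied from the outputs in this document.
ci_lower <- 51.5
ci_upper <- 55.2
pop_mean_reported <- 53.31373

# Is the reported population mean inside the 95% confidence interval?
in_interval <- pop_mean_reported >= ci_lower & pop_mean_reported <= ci_upper
in_interval  # TRUE
```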
Add a line at the population mean, pop_mean_age, to the plot and color it “purple”
visualize(bootstrap_distribution_mean_age) +
  shade_confidence_interval(endpoints = congress_ci_percentile) +
  geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 1) +
  geom_vline(xintercept = pop_mean_age, color = "purple", size = 3)

Save previous plot to preview.png and add to the yaml chunk at the top
Change set.seed(123) to set.seed(4346). Rerun all the code.
When you change the seed, is the population mean in the 95% confidence interval constructed using the bootstrap distribution? No.
If you construct 100 95% confidence intervals, approximately how many do you expect will contain the population mean? 95.
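The last answer can be checked with a small simulation. The sketch below assumes a made-up, normally distributed population (toy_population) and a hypothetical helper (interval_covers). It builds 100 bootstrap percentile intervals from 100 different samples and counts how many contain the population mean.

```r
set.seed(2024)
# Hypothetical population; the real congress_age data is not used here.
toy_population <- rnorm(18635, mean = 53, sd = 10)
pop_mean <- mean(toy_population)

# Hypothetical helper: draw one sample, build one 95% percentile interval
# from its bootstrap distribution, and report whether it contains `target`.
interval_covers <- function(population, target, n = 100, reps = 1000, level = 0.95) {
  smp <- sample(population, size = n)
  boot_means <- replicate(reps, mean(sample(smp, size = n, replace = TRUE)))
  alpha <- 1 - level
  ci <- quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  ci[1] <= target && target <= ci[2]
}

# Repeat for 100 independent samples and count the intervals that cover the mean.
covered <- sum(replicate(100, interval_covers(toy_population, pop_mean)))
covered
```

The count should land near 95 but will vary from run to run, which is also why a single interval, such as the one built with seed 4346, can miss the population mean.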