What’s it all about, Alfie?

Lecture 24

John Zito

Duke University
STA 199 Spring 2025

2025-04-22

Course admin

While you wait

  • Go to your ae project in RStudio.

  • Make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.

  • Click Pull to get today’s application exercise file: ae-20-final-review.qmd.

  • Wait until you’re prompted to work on the application exercise during class before editing the file.

Projects due tomorrow at 5PM

  • Written report in the index file;
  • Code chunks should not be displayed;
  • Five-minute video presentation;
  • It needs to be obvious that all team members participated.

Final exam next Tuesday 4/29 at 9AM

  • Roughly the same length as a midterm;
  • You have three hours to complete it;
  • You get both sides of an 8.5” x 11” sheet of notes;
  • All multiple choice;
  • Room assignments emailed to you;
  • 20% of your final course grade;
  • Replaces a lower in-class midterm score;
  • Visit my OHs to view old midterm solutions.

Style of questions

Things you’ve seen that will reappear:

  • Do you understand this picture?
  • Can you debug this incorrect code?
  • What is the correct statistical interpretation of this quantity?

A skill you will need:

Can you picture the intermediate output of a data pipeline?

From Midterm 1…

# A tibble: 6 × 3
  age   opinion                n_people
  <chr> <chr>                     <dbl>
1 18-49 Remain Against The Law       59
2 18-49 Be Made Legal               292
3 18-49 Not Sure                     40
4 50+   Remain Against The Law       67
5 50+   Be Made Legal               245
6 50+   Not Sure                     68

From Midterm 1…

Why does this give an error?

survey_counts |>
  group_by(age) |>
  summarise(
    age_sum = sum(n_people)
  ) |> 
  mutate(
    prop = n_people / sum(n_people)
  )
Error in `mutate()`:
ℹ In argument: `prop = n_people/sum(n_people)`.
Caused by error:
! object 'n_people' not found

How do the intermediate steps look?

survey_counts 
# A tibble: 6 × 3
  age   opinion                n_people
  <chr> <chr>                     <dbl>
1 18-49 Remain Against The Law       59
2 18-49 Be Made Legal               292
3 18-49 Not Sure                     40
4 50+   Remain Against The Law       67
5 50+   Be Made Legal               245
6 50+   Not Sure                     68

How do the intermediate steps look?

survey_counts |>
  group_by(age)
# A tibble: 6 × 3
# Groups:   age [2]
  age   opinion                n_people
  <chr> <chr>                     <dbl>
1 18-49 Remain Against The Law       59
2 18-49 Be Made Legal               292
3 18-49 Not Sure                     40
4 50+   Remain Against The Law       67
5 50+   Be Made Legal               245
6 50+   Not Sure                     68

How do the intermediate steps look?

survey_counts |>
  group_by(age) |>
  summarise(
    age_sum = sum(n_people)
  )
# A tibble: 2 × 2
  age   age_sum
  <chr>   <dbl>
1 18-49     391
2 50+       380

Indeed, after summarise() the data frame has only two columns, age and age_sum. n_people is not a column in the data frame that is being piped into mutate(), hence the error.
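If you cannot picture an intermediate step in your head, one way to check is to stop the pipeline just before the failing verb and ask which columns are left. The sketch below rebuilds survey_counts by hand from the table printed earlier; the name before_mutate is made up for illustration and is not part of the midterm.

```r
library(dplyr)
library(tibble)

# Recreate the survey_counts table from the midterm question.
survey_counts <- tribble(
  ~age,    ~opinion,                 ~n_people,
  "18-49", "Remain Against The Law",  59,
  "18-49", "Be Made Legal",          292,
  "18-49", "Not Sure",                40,
  "50+",   "Remain Against The Law",  67,
  "50+",   "Be Made Legal",          245,
  "50+",   "Not Sure",                68
)

# Stop the pipeline right before mutate() (before_mutate is a hypothetical name).
before_mutate <- survey_counts |>
  group_by(age) |>
  summarise(age_sum = sum(n_people))

# Which columns can mutate() actually see?
names(before_mutate)

# Is the column we tried to use still there?
"n_people" %in% names(before_mutate)
```

The first call lists only age and age_sum, and the second returns FALSE, which matches the error message.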

How do the intermediate steps look?

survey_counts |>
  group_by(age) |>
  summarise(
    age_sum = sum(n_people)
  ) |> 
  mutate(
    prop = n_people / sum(n_people)
  )
Error in `mutate()`:
ℹ In argument: `prop = n_people/sum(n_people)`.
Caused by error:
! object 'n_people' not found
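How you repair the pipeline depends on what the proportion was supposed to measure, and the question does not say. The sketch below shows two possible fixes under two different assumptions. The result names age_props and opinion_props are hypothetical, and survey_counts is rebuilt by hand from the printed table.

```r
library(dplyr)
library(tibble)

# Recreate the survey_counts table from the midterm question.
survey_counts <- tribble(
  ~age,    ~opinion,                 ~n_people,
  "18-49", "Remain Against The Law",  59,
  "18-49", "Be Made Legal",          292,
  "18-49", "Not Sure",                40,
  "50+",   "Remain Against The Law",  67,
  "50+",   "Be Made Legal",          245,
  "50+",   "Not Sure",                68
)

# Option 1 (assumes we want each age group's share of all respondents):
# keep summarise(), then use the column that actually exists afterwards.
age_props <- survey_counts |>
  group_by(age) |>
  summarise(age_sum = sum(n_people)) |>
  mutate(prop = age_sum / sum(age_sum))

# Option 2 (assumes we want each opinion's share within its age group):
# drop summarise() so n_people is still available to mutate().
opinion_props <- survey_counts |>
  group_by(age) |>
  mutate(prop = n_people / sum(n_people)) |>
  ungroup()

age_props
opinion_props
```

In Option 1 the proportions sum to 1 across the two age groups. In Option 2 they sum to 1 within each age group, because mutate() on a grouped data frame computes sum(n_people) separately for each group.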

The full monty

  • Preparation: importing, joining, reshaping, mutating, etc.;

  • Exploration: pictures and summaries;

  • Modeling: linear/logistic regression, model selection, etc.;

  • Inference: interval estimation and hypothesis testing.

ae-20-final-review

  • Go to your ae project in RStudio.

  • If you haven’t yet done so, make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.

  • If you haven’t yet done so, click Pull to get today’s application exercise file: ae-20-final-review.qmd.

  • Work through the application exercise in class, and render, commit, and push your edits.