AE 06: Sales taxes + data joining

Suggested answers

Important

These are suggested answers. This document should be used as a reference only; it’s not designed to be an exhaustive key.

Getting started

Packages

We’ll use the tidyverse package for this analysis.

Data

The data are available in the data folder:

  • sales-taxes.csv

  • us-regions.csv

sales_taxes <- read_csv("data/sales-taxes.csv")
us_regions <- read_csv("data/us-regions.csv")

And let’s take a look at the data.

glimpse(sales_taxes)
Rows: 51
Columns: 5
$ state              <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "Califo…
$ state_tax_rate     <dbl> 0.0400, 0.0000, 0.0560, 0.0650, 0.0725, 0.0290, 0.0…
$ avg_local_tax_rate <dbl> 0.0529, 0.0182, 0.0278, 0.0295, 0.0160, 0.0491, 0.0…
$ combined_rate      <dbl> 0.0929, 0.0182, 0.0838, 0.0945, 0.0885, 0.0781, 0.0…
$ max_local_tax_rate <dbl> 0.0750, 0.0785, 0.0530, 0.0613, 0.0475, 0.0830, 0.0…
glimpse(us_regions)
Rows: 50
Columns: 2
$ state_name <chr> "Maine", "New Hampshire", "Vermont", "Massachusetts", "Rhod…
$ region     <chr> "Northeast", "Northeast", "Northeast", "Northeast", "Northe…

Sales tax in swing states: if_else

Create new swing_state variable using if_else:

list_of_swing_states <- c("Arizona", "Georgia", "Michigan", "Nevada", 
                          "North Carolina", "Pennsylvania", "Wisconsin")

sales_taxes <- sales_taxes |>
  mutate(
    swing_state = if_else(state %in% list_of_swing_states,
                          "Swing",
                          "Non-swing")) |>
  relocate(swing_state)

sales_taxes
# A tibble: 51 × 6
   swing_state state       state_tax_rate avg_local_tax_rate combined_rate
   <chr>       <chr>                <dbl>              <dbl>         <dbl>
 1 Non-swing   Alabama             0.04               0.0529        0.0929
 2 Non-swing   Alaska              0                  0.0182        0.0182
 3 Swing       Arizona             0.056              0.0278        0.0838
 4 Non-swing   Arkansas            0.065              0.0295        0.0945
 5 Non-swing   California          0.0725             0.016         0.0885
 6 Non-swing   Colorado            0.029              0.0491        0.0781
 7 Non-swing   Connecticut         0.0635             0             0.0635
 8 Non-swing   Delaware            0                  0             0     
 9 Non-swing   Florida             0.06               0.01          0.07  
10 Swing       Georgia             0.04               0.0338        0.0738
# ℹ 41 more rows
# ℹ 1 more variable: max_local_tax_rate <dbl>

Summarize to find the mean sales tax in each type of state:

sales_taxes |>
  group_by(swing_state) |>
  summarize(mean_state_tax = mean(state_tax_rate))
# A tibble: 2 × 2
  swing_state mean_state_tax
  <chr>                <dbl>
1 Non-swing           0.0504
2 Swing               0.0546

Sales tax in coastal states: case_when

Create new coast variable using case_when:

pacific_coast <- c("Alaska", "Washington", "Oregon", "California", "Hawaii")

atlantic_coast <- c(
  "Connecticut", "Delaware", "Georgia", "Florida", "Maine", "Maryland", 
  "Massachusetts", "New Hampshire", "New Jersey", "New York", 
  "North Carolina", "Rhode Island", "South Carolina", "Virginia"
)

sales_taxes <- sales_taxes |>
  mutate(
    coast = case_when(
      state %in% pacific_coast ~ "Pacific",
      state %in% atlantic_coast ~ "Atlantic",
      .default = "Neither"
    )
  ) |>
  relocate(coast)

sales_taxes
# A tibble: 51 × 7
   coast    swing_state state    state_tax_rate avg_local_tax_rate combined_rate
   <chr>    <chr>       <chr>             <dbl>              <dbl>         <dbl>
 1 Neither  Non-swing   Alabama          0.04               0.0529        0.0929
 2 Pacific  Non-swing   Alaska           0                  0.0182        0.0182
 3 Neither  Swing       Arizona          0.056              0.0278        0.0838
 4 Neither  Non-swing   Arkansas         0.065              0.0295        0.0945
 5 Pacific  Non-swing   Califor…         0.0725             0.016         0.0885
 6 Neither  Non-swing   Colorado         0.029              0.0491        0.0781
 7 Atlantic Non-swing   Connect…         0.0635             0             0.0635
 8 Atlantic Non-swing   Delaware         0                  0             0     
 9 Atlantic Non-swing   Florida          0.06               0.01          0.07  
10 Atlantic Swing       Georgia          0.04               0.0338        0.0738
# ℹ 41 more rows
# ℹ 1 more variable: max_local_tax_rate <dbl>

Summarize to find the mean sales tax in each type of state:

sales_taxes |>
  group_by(coast) |>
  summarize(mean_state_tax = mean(state_tax_rate))
# A tibble: 3 × 2
  coast    mean_state_tax
  <chr>             <dbl>
1 Atlantic         0.0484
2 Neither          0.0545
3 Pacific          0.0355

Sales tax in US regions: joining

Join the sales tax data with region data and save the joined data frame as a new data frame, not overwriting either data frame that goes into the join.

sales_taxes_regions <- sales_taxes |>
  full_join(us_regions, 
            by = join_by(state == state_name)) |>
  relocate(region)

sales_taxes_regions
# A tibble: 51 × 8
   region    coast    swing_state state       state_tax_rate avg_local_tax_rate
   <chr>     <chr>    <chr>       <chr>                <dbl>              <dbl>
 1 South     Neither  Non-swing   Alabama             0.04               0.0529
 2 West      Pacific  Non-swing   Alaska              0                  0.0182
 3 West      Neither  Swing       Arizona             0.056              0.0278
 4 South     Neither  Non-swing   Arkansas            0.065              0.0295
 5 West      Pacific  Non-swing   California          0.0725             0.016 
 6 West      Neither  Non-swing   Colorado            0.029              0.0491
 7 Northeast Atlantic Non-swing   Connecticut         0.0635             0     
 8 South     Atlantic Non-swing   Delaware            0                  0     
 9 South     Atlantic Non-swing   Florida             0.06               0.01  
10 South     Atlantic Swing       Georgia             0.04               0.0338
# ℹ 41 more rows
# ℹ 2 more variables: combined_rate <dbl>, max_local_tax_rate <dbl>

Calculate the average sales tax of states in each region. What is surprising in the output?

sales_taxes_regions |>
  group_by(region) |>
  summarize(mean_state_tax = mean(state_tax_rate))
# A tibble: 5 × 2
  region    mean_state_tax
  <chr>              <dbl>
1 Midwest           0.0569
2 Northeast         0.0530
3 South             0.0523
4 West              0.0416
5 <NA>              0.06  

Identify the state with NA for region.

sales_taxes_regions |>
  filter(is.na(region)) |>
  select(state)
# A tibble: 1 × 1
  state               
  <chr>               
1 District of Columbia

Apply a fix for the NA in region, and calculate the mean sales taxes for regions again. Display the results in ascending order of mean sales tax.

sales_taxes_regions |>
  mutate(
    region = if_else(state == "District of Columbia", "Northeast", region)
  ) |>
  group_by(region) |>
  summarize(mean_state_tax = mean(state_tax_rate))
# A tibble: 4 × 2
  region    mean_state_tax
  <chr>              <dbl>
1 Midwest           0.0569
2 Northeast         0.0537
3 South             0.0523
4 West              0.0416

Render, commit, and push

  1. Render your Quarto document.

  2. Go to the Git pane and check the box next to each file listed, i.e., stage your changes. Commit your staged changes using a simple and informative message.

  3. Click on push (the green arrow) to push your changes to your application exercise repo on GitHub.

  4. Go to your repo on GitHub and confirm that you can see the updated files. Once your updated files are in your repo on GitHub, you’re good to go!