AE 06: Sales taxes + data joining
Suggested answers
These are suggested answers. This document should be used as a reference only; it’s not designed to be an exhaustive key.
Getting started
Packages
We’ll use the tidyverse package for this analysis.
Data
The data are available in the data
folder:
sales-taxes.csv
us-regions.csv
And let’s take a look at the data.
glimpse(sales_taxes)
Rows: 51
Columns: 5
$ state <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "Califo…
$ state_tax_rate <dbl> 0.0400, 0.0000, 0.0560, 0.0650, 0.0725, 0.0290, 0.0…
$ avg_local_tax_rate <dbl> 0.0529, 0.0182, 0.0278, 0.0295, 0.0160, 0.0491, 0.0…
$ combined_rate <dbl> 0.0929, 0.0182, 0.0838, 0.0945, 0.0885, 0.0781, 0.0…
$ max_local_tax_rate <dbl> 0.0750, 0.0785, 0.0530, 0.0613, 0.0475, 0.0830, 0.0…
glimpse(us_regions)
Rows: 50
Columns: 2
$ state_name <chr> "Maine", "New Hampshire", "Vermont", "Massachusetts", "Rhod…
$ region <chr> "Northeast", "Northeast", "Northeast", "Northeast", "Northe…
Sales tax in swing states: if_else
Create new swing_state
variable using if_else
:
list_of_swing_states <- c("Arizona", "Georgia", "Michigan", "Nevada",
"North Carolina", "Pennsylvania", "Wisconsin")
sales_taxes <- sales_taxes |>
mutate(
swing_state = if_else(state %in% list_of_swing_states,
"Swing",
"Non-swing")) |>
relocate(swing_state)
sales_taxes
# A tibble: 51 × 6
swing_state state state_tax_rate avg_local_tax_rate combined_rate
<chr> <chr> <dbl> <dbl> <dbl>
1 Non-swing Alabama 0.04 0.0529 0.0929
2 Non-swing Alaska 0 0.0182 0.0182
3 Swing Arizona 0.056 0.0278 0.0838
4 Non-swing Arkansas 0.065 0.0295 0.0945
5 Non-swing California 0.0725 0.016 0.0885
6 Non-swing Colorado 0.029 0.0491 0.0781
7 Non-swing Connecticut 0.0635 0 0.0635
8 Non-swing Delaware 0 0 0
9 Non-swing Florida 0.06 0.01 0.07
10 Swing Georgia 0.04 0.0338 0.0738
# ℹ 41 more rows
# ℹ 1 more variable: max_local_tax_rate <dbl>
Summarize to find the mean sales tax in each type of state:
Sales tax in coastal states: case_when
Create new coast
variable using case_when
:
pacific_coast <- c("Alaska", "Washington", "Oregon", "California", "Hawaii")
atlantic_coast <- c(
"Connecticut", "Delaware", "Georgia", "Florida", "Maine", "Maryland",
"Massachusetts", "New Hampshire", "New Jersey", "New York",
"North Carolina", "Rhode Island", "South Carolina", "Virginia"
)
sales_taxes <- sales_taxes |>
mutate(
coast = case_when(
state %in% pacific_coast ~ "Pacific",
state %in% atlantic_coast ~ "Atlantic",
.default = "Neither"
)
) |>
relocate(coast)
sales_taxes
# A tibble: 51 × 7
coast swing_state state state_tax_rate avg_local_tax_rate combined_rate
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Neither Non-swing Alabama 0.04 0.0529 0.0929
2 Pacific Non-swing Alaska 0 0.0182 0.0182
3 Neither Swing Arizona 0.056 0.0278 0.0838
4 Neither Non-swing Arkansas 0.065 0.0295 0.0945
5 Pacific Non-swing Califor… 0.0725 0.016 0.0885
6 Neither Non-swing Colorado 0.029 0.0491 0.0781
7 Atlantic Non-swing Connect… 0.0635 0 0.0635
8 Atlantic Non-swing Delaware 0 0 0
9 Atlantic Non-swing Florida 0.06 0.01 0.07
10 Atlantic Swing Georgia 0.04 0.0338 0.0738
# ℹ 41 more rows
# ℹ 1 more variable: max_local_tax_rate <dbl>
Summarize to find the mean sales tax in each type of state:
Sales tax in US regions: joining
Join the sales tax data with region data and save the joined data frame as a new data frame, not overwriting either data frame that goes into the join.
sales_taxes_regions <- sales_taxes |>
full_join(us_regions,
by = join_by(state == state_name)) |>
relocate(region)
sales_taxes_regions
# A tibble: 51 × 8
region coast swing_state state state_tax_rate avg_local_tax_rate
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 South Neither Non-swing Alabama 0.04 0.0529
2 West Pacific Non-swing Alaska 0 0.0182
3 West Neither Swing Arizona 0.056 0.0278
4 South Neither Non-swing Arkansas 0.065 0.0295
5 West Pacific Non-swing California 0.0725 0.016
6 West Neither Non-swing Colorado 0.029 0.0491
7 Northeast Atlantic Non-swing Connecticut 0.0635 0
8 South Atlantic Non-swing Delaware 0 0
9 South Atlantic Non-swing Florida 0.06 0.01
10 South Atlantic Swing Georgia 0.04 0.0338
# ℹ 41 more rows
# ℹ 2 more variables: combined_rate <dbl>, max_local_tax_rate <dbl>
Calculate the average sales tax of states in each region. What is surprising in the output?
# A tibble: 5 × 2
region mean_state_tax
<chr> <dbl>
1 Midwest 0.0569
2 Northeast 0.0530
3 South 0.0523
4 West 0.0416
5 <NA> 0.06
Identify the state with NA
for region.
# A tibble: 1 × 1
state
<chr>
1 District of Columbia
Apply a fix for the NA
in region, and calculate the mean sales taxes for regions again. Display the results in ascending order of mean sales tax.
Render, commit, and push
Render your Quarto document.
Go to the Git pane and check the box next to each file listed, i.e., stage your changes. Commit your staged changes using a simple and informative message.
Click on push (the green arrow) to push your changes to your application exercise repo on GitHub.
Go to your repo on GitHub and confirm that you can see the updated files. Once your updated files are in your repo on GitHub, you’re good to go!