Lecture 3
Duke University
STA 199 Spring 2025
2025-01-21
(Dykes to Watch Out For - 1985)
Film passes if…
Double Indemnity (1944) | 🥴 |
Sunset Boulevard (1950) | 🥴 |
Sweet Smell of Success (1957) | ❌ |
One Hundred and One Dalmatians (1961) | ✅ |
Chinatown (1974) | ❌ |
Amadeus (1984) | ❌ |
Goodfellas (1990) | 🥴 |
Bram Stoker’s Dracula (1992) | ❌ |
The Lord of the Rings (2001 - 2003) | ❌ |
Vera Drake (2004) | ✅ |
From FiveThirtyEight
“We did a statistical analysis of films to test two claims: first, that films that pass the Bechdel test — featuring women in stronger roles — see a lower return on investment, and second, that they see lower gross profits. We found no evidence to support either claim.”
ae-02-bechdel-dataviz
Go to RStudio, confirm that you’re in the ae project, and open the document ae-02-bechdel-dataviz.qmd.
Cell labels are helpful for describing what the code is doing, for jumping between code cells in the editor, and for troubleshooting.
message: false hides any messages emitted by the code in your rendered document.
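As a minimal sketch of how these two cell options might look in practice, the contents of a code cell could be written as below. The label load-packages is a hypothetical name chosen for illustration, and loading dplyr is only an assumed example of code that emits messages; the actual cells in ae-02-bechdel-dataviz.qmd may differ.

```r
#| label: load-packages
#| message: false

# The two "#|" lines above are cell options: the label names this cell,
# and message: false keeps package start-up messages out of the rendered document.
# (Label name and package choice are assumptions for illustration.)
library(dplyr)
```

When run as plain R, the #| lines are ordinary comments; they only take effect when the document is rendered.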
Start with the bechdel data frame
Filter for rows where roi is greater than 400 (gross is more than 400 times budget)
Select the columns title, roi, budget_2013, gross_2013, year, and clean_test
# A tibble: 3 × 6
title roi budget_2013 gross_2013 year clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 Paranormal Activity 671. 505595 339424558 2007 dubious
2 The Blair Witch Project 648. 839077 543776715 1999 ok
3 El Mariachi 583. 11622 6778946 1992 nowomen
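The output above might come from a pipeline along the following lines. This is a sketch under assumptions: bechdel_sample is a hypothetical hand-typed subset of rows shown on these slides (not the full bechdel data frame), and roi is computed here as gross_2013 divided by budget_2013, which matches the printed values but may not be how the original column was created.

```r
library(dplyr)

# Hypothetical stand-in for the bechdel data frame: a few rows copied
# from the output shown on the slides.
bechdel_sample <- tibble(
  title = c("Paranormal Activity", "The Blair Witch Project", "El Mariachi",
            "Dredd 3D", "About Time"),
  year = c(2007, 1999, 1992, 2012, 2013),
  gross_2013 = c(339424558, 543776715, 6778946, 55078343, 102648667),
  budget_2013 = c(505595, 839077, 11622, 45658735, 12000000),
  clean_test = c("dubious", "ok", "nowomen", "ok", "ok")
) |>
  # Assumption: roi is gross divided by budget
  mutate(roi = gross_2013 / budget_2013)

high_roi <- bechdel_sample |>
  filter(roi > 400) |>                                         # keep rows with roi above 400
  select(title, roi, budget_2013, gross_2013, year, clean_test) # keep and order these columns

high_roi
```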
|>
The pipe operator passes what comes before it into the function that comes after it as the first argument in that function.
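To make the "first argument" rule concrete, here is a small base R sketch (the vector x is made up for illustration). It compares nested function calls with the equivalent piped form; the native pipe requires R 4.1.0 or later.

```r
x <- c(1, 4, 9, 16)

# Nested calls: read from the inside out
nested <- sum(sqrt(x))

# Piped calls: x becomes the first argument of sqrt(),
# and that result becomes the first argument of sum()
piped <- x |> sqrt() |> sum()

identical(nested, piped)

# Additional arguments stay inside the parentheses;
# the piped value still fills the first position
rounded_nested <- round(3.14159, digits = 2)
rounded_piped <- 3.14159 |> round(digits = 2)
```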
Find movies that pass the Bechdel test and display their titles and ROIs in descending order of ROI.
Start with the bechdel data frame:
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a Slave 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day to Die Hard 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
Filter for rows where binary is equal to "PASS":
# A tibble: 753 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
2 About Time 2013 102648667 12000000 8.55 PASS ok
3 Admission 2013 36014634 13000000 2.77 PASS ok
4 American Hustle 2013 397915817 40000000 9.95 PASS ok
5 August: Osage County 2013 87609748 25000000 3.50 PASS ok
6 Beautiful Creatures 2013 75392809 50000000 1.51 PASS ok
7 Blue Jasmine 2013 101793664 18000000 5.66 PASS ok
8 Carrie 2013 120268278 30000000 4.01 PASS ok
9 Despicable Me 2 2013 1338831390 76000000 17.6 PASS ok
10 Elysium 2013 379242208 120000000 3.16 PASS ok
# ℹ 743 more rows
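A filtering step like this one could be sketched as follows. The data frame bechdel_sample is a hypothetical hand-typed subset of rows shown on these slides, with roi rounded as printed, so the row counts will not match the full data.

```r
library(dplyr)

# Hypothetical stand-in for bechdel: rows copied from the slide output
bechdel_sample <- tibble(
  title = c("21 & Over", "Dredd 3D", "About Time", "Admission",
            "After Earth", "The Blair Witch Project"),
  roi = c(5.22, 1.21, 8.55, 2.77, 2.35, 648),
  binary = c("FAIL", "PASS", "PASS", "PASS", "FAIL", "PASS")
)

passing <- bechdel_sample |>
  filter(binary == "PASS") # == tests equality; a single = would not

passing
```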
Arrange the rows in descending order of roi:
# A tibble: 753 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 The Blair Witch Project 1999 543776715 839077 648. PASS ok
2 The Devil Inside 2012 157289709 1014639 155. PASS ok
3 My Big Fat Greek Wedding 2002 768922942 6475896 119. PASS ok
4 Chasing Amy 1997 39417963 362810 109. PASS ok
5 Slacker 1991 4200140 39349 107. PASS ok
6 Insidious 2010 164379554 1602348 103. PASS ok
7 Paranormal Activity 2 2010 280159759 3204696 87.4 PASS ok
8 Paranormal Activity 3 2011 322170936 5178454 62.2 PASS ok
9 The Last Exorcism 2010 118787648 1922817 61.8 PASS ok
10 Cinderella 1997 246710482 4208591 58.6 PASS ok
# ℹ 743 more rows
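Sorting after filtering could be sketched like this, again on a hypothetical hand-typed subset (bechdel_sample) rather than the full data. Wrapping a column in desc() inside arrange() sorts from largest to smallest.

```r
library(dplyr)

# Hypothetical stand-in for bechdel: rows copied from the slide output
bechdel_sample <- tibble(
  title = c("21 & Over", "Dredd 3D", "About Time", "Admission",
            "After Earth", "The Blair Witch Project"),
  roi = c(5.22, 1.21, 8.55, 2.77, 2.35, 648),
  binary = c("FAIL", "PASS", "PASS", "PASS", "FAIL", "PASS")
)

passing_sorted <- bechdel_sample |>
  filter(binary == "PASS") |>
  arrange(desc(roi)) # largest roi first

passing_sorted
```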
Select columns title and roi:
# A tibble: 753 × 2
title roi
<chr> <dbl>
1 The Blair Witch Project 648.
2 The Devil Inside 155.
3 My Big Fat Greek Wedding 119.
4 Chasing Amy 109.
5 Slacker 107.
6 Insidious 103.
7 Paranormal Activity 2 87.4
8 Paranormal Activity 3 62.2
9 The Last Exorcism 61.8
10 Cinderella 58.6
# ℹ 743 more rows
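Putting the steps together, the whole task might be written as one pipeline, as in the sketch below. As before, bechdel_sample is a hypothetical hand-typed subset of the rows shown on these slides, so only the shape of the result (two columns, sorted by roi) carries over to the full data.

```r
library(dplyr)

# Hypothetical stand-in for bechdel: rows copied from the slide output
bechdel_sample <- tibble(
  title = c("21 & Over", "Dredd 3D", "About Time", "Admission",
            "After Earth", "The Blair Witch Project"),
  roi = c(5.22, 1.21, 8.55, 2.77, 2.35, 648),
  binary = c("FAIL", "PASS", "PASS", "PASS", "FAIL", "PASS")
)

pass_roi <- bechdel_sample |>
  filter(binary == "PASS") |> # movies that pass the Bechdel test
  arrange(desc(roi)) |>       # descending order of ROI
  select(title, roi)          # display only titles and ROIs

pass_roi
```

Each line does one job, and reading the pipe as "and then" reproduces the task statement almost word for word.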
Build cakes (ggplot)
Stack dolls (pipe |>)
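The two constructs can be sketched side by side. In this hedged example, the pipe nests one data step inside the next, while + adds layers to a plot. The data frame bechdel_sample is a hypothetical hand-typed subset of rows from these slides, and the choice of a scatterplot of gross against budget is an assumption for illustration, not necessarily the plot built in class.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for bechdel: rows copied from the slide output
bechdel_sample <- tibble(
  title = c("21 & Over", "Dredd 3D", "About Time", "Admission", "After Earth"),
  gross_2013 = c(67878146, 55078343, 102648667, 36014634, 304895295),
  budget_2013 = c(13000000, 45658735, 12000000, 13000000, 130000000),
  binary = c("FAIL", "PASS", "PASS", "PASS", "FAIL")
)

# Stack dolls: each step's result is passed into the next function with |>
passing <- bechdel_sample |>
  filter(binary == "PASS")

# Build cakes: each ggplot2 layer is added on top of the previous one with +
p <- ggplot(passing, aes(x = budget_2013, y = gross_2013)) +
  geom_point() +
  labs(x = "Budget (2013 dollars)", y = "Gross (2013 dollars)")

p
```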
Master these constructs, and everything will be coming up roses!