Lab 0

Hello, World and STA 199!

Lab
End of lab on Wednesday January 8

This lab will walk you through setting up the course technology, which consists of the following:

Version control software like Git and GitHub is indispensable in modern data science practice, and learning how to use it is one of the things that distinguishes STA 199 from a similar course like STA 101. At a minimum, version control helps prevent your files from looking like this:

We’ve all been there, but it’s time to kick the habit.

1. Access R and RStudio

Instead of asking you to download any new programs onto your computer, we host everything pre-packaged for you in the cloud. You just have to login through your web browser, and you’re ready to go:

  • Go to https://cmgr.oit.duke.edu/containers and login with your Duke NetID and Password.
  • Click STA198-199 to log into the Docker container. You should now see the RStudio environment.

Go to https://cmgr.oit.duke.edu/containers and under Reservations available click on reserve STA 198-199 to reserve a container for yourself.

Note

A container is a self-contained instance of RStudio for you, and you alone. You will do all of your computing in your container.

Once you’ve reserved the container you’ll see that it will show up under My reservations.

To launch your container click on it under My reservations, then click Login, and then Start.1

Warning

Please double and triple check that you have reserved the STA 198-199 container and not some other. The various containers differ in the software they include, so if you reserve something else, you may be missing something we need.

2. Create a GitHub account

Go to https://github.com/ and walk through the steps for creating an account. You do not have to use your Duke email address, but I recommend doing so.2

Note

You’ll need to choose a user name. I recommend reviewing the user name advice at https://happygitwithr.com/github-acct#username-advice before choosing a username.

If you already have a GitHub account, you do not need to create a new one for this course. Just log in to that account to make sure you still remember your username and password. If you are unsure of your login credentials, carefully follow GitHub’s instructions for recovering this information. If you accumulate too many failed login attempts, you will be locked out of your account for the day, which will make it difficult for you to complete the rest of this lab.

3. Set up your SSH key

You will authenticate GitHub using SSH (Secure Shell Protocol – it doesn’t really matter what this means for the purpose of this course). Below is an outline of the authentication steps; you are encouraged to follow along as your TA demonstrates the steps.

Note

You only need to do this authentication process one time on a single system.

  • Go back to your RStudio container and type credentials::ssh_setup_github() into your console.
  • R will ask “No SSH key found. Generate one now?” You should click 1 for yes.
  • You will generate a key. R will then ask “Would you like to open a browser now?” You should click 1 for yes.
  • You may be asked to provide your GitHub username and password to log into GitHub. After entering this information, you should paste the key in and give it a name. You might name it in a way that indicates where the key will be used, e.g., sta199).

You can find more detailed instructions here if you’re interested.

4. Configure Git to introduce yourself

Next, you need to configure your git so that RStudio can communicate with GitHub. This requires two pieces of information: your name and email address.

To do so, you will use the use_git_config() function from the usethis package.

Note

You’ll hear about 📦 packages a lot in the context of R – basically they’re how developers write functions and bundle them to distribute to the community (and more on this later too!).

Type the following lines of code in the console in RStudio filling in your name and the address associated with your GitHub account.

usethis::use_git_config(
  user.name = "Your name", 
  user.email = "Email associated with your GitHub account"
)

For example, mine would be

usethis::use_git_config(
  user.name = "John Zito", 
  user.email = "johnczito@gmail.com"
)

I used my gmail because that is the one I used to create my GitHub account. You should also be using the email address you used to create your GitHub account, it’s ok if it isn’t your Duke email.

When you input the usethis::use_git_config(...) command into the console and hit enter/return, it will appear as if nothing happened. To verify that everything worked, you can briefly switch over to the Terminal tab (should be to the right of Console) and type the commands git config user.name and git config user.email. If all is well, these will return whatever text you originally provided. If you notice a mistake or typo, you can just go back to the Console and rerun usethis::use_git_config(...) with modified inputs.

The following video walks you through the steps outlined in the SSH key generation and Git configuration sections above.

5. Baby’s first repo

A GitHub repository (repo) is a collection of files hosted on GitHub. Repos are the main way we will distribute files to you during the semester. When you copy the files in a repo to your local computing environment (your container, in this case), that’s called “cloning”. So, let’s clone our first repo:

  • Go to the course organization at github.com/sta199-s25 organization on GitHub. Click on the repo hello-world.

  • Click on the green CODE button, select Use SSH (this might already be selected by default, and if it is, you’ll see the text Clone with SSH). Click on the clipboard icon to copy the repo URL.

  • In RStudio, go to FileNew ProjectVersion ControlGit.

  • Copy and paste the URL of your assignment repo into the dialog box Repository URL. Again, please make sure to have SSH highlighted under Clone when you copy the address.

  • Click Create Project, and the files from your GitHub repo will be displayed in the Files pane in RStudio.

You will need to get used to these steps, because you’ll probably clone at least one new repo every week.

6. Hello STA 199!

Fill out the course “Getting to know you” survey on Canvas: https://canvas.duke.edu/courses/50057/quizzes/30406.

We will use the information collected in this survey for a variety of goals, from inviting you to the course GitHub organization (more on that later) to getting to know you as a person and your course goals and concerns.

Footnotes

  1. Yes, it’s too many steps. I don’t know why! But it works, and you’ll get used to it. Trust me, it beats downloading and installing everything you need on your computers!↩︎

  2. GitHub has some perks for students you can take advantage of later in the course or in your future work, and it helps to have a .edu address to get verified as a student.↩︎