GitHub.com is an online platform that is commonly used in industry and academia for collaboration. You will use GitHub in your statistics / data science class to complete individual and team assignments. This is the first of two tutorials to walk you through the steps you will need to successfully use GitHub in your classes.
By the end of this tutorial, you will be able to clone an assignment repo from GitHub, start a new project in RStudio, and configure git.
Below are terms that are commonly used when talking about GitHub [1].
Term | Definition |
---|---|
Git | An open source version control software system |
GitHub | A remote commercial hosting service for Git repositories |
Git repository (or repo) | Similar to a project directory or folder in Google Drive, Dropbox, etc. It tracks changes to files. |
commit | Save changes to a local repo |
pull | Update a local repo |
push | Upload local files to the remote repo |
merge conflict | Contradictory changes that cannot be integrated until they are reconciled by a user |
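To make these terms concrete, the sketch below runs commit, push, and pull entirely on your own machine. It is an illustration under assumptions rather than part of the assignment workflow: a bare repo in a temporary folder stands in for the remote repo on GitHub, and the folder names local and teammate, the file notes.txt, and the demo identity are all hypothetical.

```shell
# Work in a throwaway folder so nothing here touches real projects
workdir="$(mktemp -d)"
cd "$workdir"

# Identity used for the demo commits (hypothetical values)
export GIT_AUTHOR_NAME="Demo Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# A bare repo stands in for the remote repo that GitHub would host
git init -q --bare remote.git

# commit: save a change to the local repo
git init -q local
echo "First draft" > local/notes.txt
git -C local add notes.txt
git -C local commit -q -m "Add notes"
branch="$(git -C local symbolic-ref --short HEAD)"

# push: upload the committed change to the remote repo
git -C local remote add origin "$workdir/remote.git"
git -C local push -q origin "$branch"

# A teammate clones the remote repo, commits an edit, and pushes it
git clone -q "$workdir/remote.git" teammate
echo "Teammate edit" >> teammate/notes.txt
git -C teammate commit -q -am "Edit notes"
git -C teammate push -q origin "$branch"

# pull: update the local repo with the teammate's change
git -C local pull -q --ff-only origin "$branch"
cat local/notes.txt
```

If both people had edited the same line of notes.txt before pulling, git would stop at this last step and report a merge conflict to be reconciled by hand.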
As you begin to use GitHub more, you may also come across terms describing more advanced GitHub actions.
Term | Definition |
---|---|
forking | Create a copy of a repository under your own GitHub profile |
pull request | Submit changes to a remote repo |
branching | Keeping multiple snapshots of a repo |
gh-pages | Special branch which allows creation of a webpage from within GitHub |
GitHub Actions | Mechanism for continuous integration |
If you do not have a GitHub account, you can register for a new account at http://github.com. Your GitHub repo will contain examples of your work that can be shared as you apply for internships and jobs, so choose a username that you are willing to share with future employers. Also consider a username that will still be relevant after your statistics class (e.g. don’t use Stat101Student2020 as your username 🙃). See Username Advice in Happy Git and GitHub for the R User for more tips on choosing a GitHub username.
Most of your assignments will be done using GitHub and RStudio. You’ll begin with a starter repo on GitHub (most likely provided by your instructor) that contains templates and other materials needed to complete the assignment. You’ll conduct your analysis and type your responses in RStudio, then “push” the updates back to the repo on GitHub. In this tutorial, we will focus on the steps to get started on an assignment, specifically cloning a repo and starting a new project in RStudio.
We’ll use the example-assignment repo in the DukeStatSci GitHub organization for this tutorial. You can use this repo to practice the steps in this tutorial; however, note that you cannot push to the repo (we’ll talk more about pushing in the next tutorial).
Go to your course organization on GitHub. The URL for the course organization is provided by your instructor. In this tutorial, the organization is DukeStatSci at www.github.com/DukeStatSci.
Click on the relevant assignment repo. Ours is the example-assignment repo. This contains the starter documents and other materials required for the assignment.
Go to RStudio.
If you are using RStudio through a Docker container
In RStudio, go to File ➡️ New Project ➡️ Version Control ➡️ Git.
Copy and paste the URL of your assignment repo into the dialog box Repository URL. You can leave Project directory name empty. It will default to the name of the GitHub repo.
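Behind this dialog, RStudio performs a git clone. As a rough sketch of the equivalent Terminal step, the snippet below assumes the repo URL is built from the DukeStatSci organization and the example-assignment repo named in this tutorial; substitute the URL of your own assignment repo. It only prints the clone command so you can check it before running it yourself.

```shell
# URL assumed from the organization and repo names used in this tutorial
repo_url="https://github.com/DukeStatSci/example-assignment.git"

# With no folder argument, git clone names the folder after the repo,
# which mirrors leaving "Project directory name" empty in RStudio
project_dir="$(basename "$repo_url" .git)"
echo "Project directory: $project_dir"

# Print the clone command so it can be checked before it is run
echo "git clone $repo_url"
```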
There is one thing to take care of before you start completing the assignment. You need to configure git so that RStudio can communicate with GitHub. To do so, you will use the use_git_config() function from the usethis package.
Type the following lines of code in the console in RStudio, filling in your GitHub username and the email address tied to your GitHub account.
library(usethis)
use_git_config(user.name = "your GitHub username", user.email = "your email")
If you get the error message
Error in library(usethis) : there is no package called ‘usethis’
then you need to install the usethis package by running the code below in the console:
install.packages("usethis")
library(usethis)
Then, rerun the use_git_config() function with your GitHub username and email address.
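By default, use_git_config() writes to git's global (user-level) settings, so the same result can be sketched from the Terminal with git config. This assumes git is available on the command line; the quoted values are the same placeholders as in the R code and should be replaced with your own username and email.

```shell
# Store your name and email in git's global (user-level) settings
git config --global user.name "your GitHub username"
git config --global user.email "your email"

# Read the values back to confirm they were saved
configured_name="$(git config --global --get user.name)"
configured_email="$(git config --global --get user.email)"
echo "user.name  = $configured_name"
echo "user.email = $configured_email"
```

Reading the values back is a quick way to check the configuration if a later commit or push complains about a missing identity.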
That’s it! You’ve cloned a repo, started a new project in RStudio, and configured git. You’re now ready to start your analysis in RStudio!
If you are asked to reenter your password each time you push to GitHub, you can cache your GitHub password so it is saved over a specified period of time. This most likely occurs if you’re using RStudio through RStudio Cloud or RStudio Pro servers provided by the Statistical Science department.
To cache your password, run the following in the Terminal:
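A command along these lines should work, assuming git's built-in credential cache helper (available on Linux and macOS, which covers RStudio Cloud and most RStudio servers). The variable name timeout_seconds is only for illustration.

```shell
# 7 days expressed in seconds: 60 * 60 * 24 * 7
timeout_seconds=$((60 * 60 * 24 * 7))

# Ask git to keep credentials in memory for that long
# (assumes the built-in "cache" credential helper)
git config --global credential.helper "cache --timeout=${timeout_seconds}"

# Show the stored setting to confirm it
git config --global --get credential.helper
```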
This will cache your password for 604800 seconds, i.e. 7 days (60 * 60 * 24 * 7 = 604800). This is generally enough time to complete a lab or homework assignment. You can increase the number of seconds for longer assignments, such as a project that lasts several weeks.
Happy Git and GitHub for the R User by Jenny Bryan, the STAT 545 TAs, and Jim Hester
The instructions from this tutorial were adapted from labs in Data Science in a Box by Mine Çetinkaya-Rundel.
[1] Definitions from Beckman, M., Çetinkaya-Rundel, M., Horton, N., Rundel, C., Sullivan, A., & Tackett, M. Implementing version control with Git as a learning objective in statistics courses. arxiv.org/abs/2001.01988.