It is assumed that you have watched the videos contained within Preparation 1.
In this activity, you will:
- Fork a GitHub repository.
- Connect GitHub and RStudio and share files between these two tools (i.e., “clone” and “pull”).
- Edit and view changes in narrative text and R code chunks in an RMarkdown document.
- Create a branch within a GitHub repository.
- Create and resolve a Pull Request between you and yourself.
In this activity, we begin collaborating on coding projects with ourselves. However, we can collaborate on projects with colleagues, online acquaintances, or, simply, you and future you. Version control is one way to keep track of what a person did and when, which is much better than the system that I used through most of my studies.
The Google Workspace (Drive, Docs, etc.) is a spectacular tool where people collaborate on many different types of projects. Google even has a tool for collaborating on coding notebooks (i.e., Google Colab); however, it is lacking in some programming language options. We can think of GitHub as a Google Drive for collaborating on coding/software projects. Git would then be like Version History for a Google Doc.
Version control systems record the changes that are made at each step of a project. This allows you to rewind to a previous version and play back through the changes you made to eventually arrive at the most recent version. Think back on what you did in your work for Preparation 1.
Each time you make a change, you can (and should) provide a brief, informative commit message. A commit message is additional information that helps keep track of what changes were made. There are many opinions on what needs to be included in commit messages. My recommendation is to use active terms and be brief, but say enough about why the change was needed. Note that I do not always follow this recommendation (I get lazy too), but future me really appreciates it when current me is thoughtful 😄
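To make the idea of commits and commit messages concrete, here is a minimal sketch that records two changes in a throwaway repository and then lists the history. It assumes the `{gert}` package is available (it is installed alongside `{usethis}`); the file name `notes.md`, the author details, and the commit messages are made up for illustration. In this course you will normally do these steps through the RStudio Git pane instead.

```r
# Assumption: {gert} is installed (it comes with {usethis}).
library(gert)

# Work in a temporary folder so no real project is touched.
repo <- tempfile("commit-demo-")
dir.create(repo)
git_init(repo)

# Author details for this demo repository only (hypothetical values).
git_config_set("user.name", "Demo Student", repo = repo)
git_config_set("user.email", "student@example.com", repo = repo)

# First change: create a file, stage it, and commit with a brief, active message.
notes <- file.path(repo, "notes.md")
writeLines("# Activity notes", notes)
git_add("notes.md", repo = repo)
git_commit("Add notes file to track activity progress", repo = repo)

# Second change: edit the file, stage it, and say why the change was needed.
writeLines(c("# Activity notes", "", "- Forked the repo"), notes)
git_add("notes.md", repo = repo)
git_commit("Record fork step so future me knows where the repo came from", repo = repo)

# The history lists the most recent commit first.
history <- git_log(repo = repo)
history[, c("message", "time")]
```

Each row of `history` is one recorded step of the project, which is what lets you rewind to an earlier version later.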
Recall from Activity 0 that you made a personal copy of my repository to your GitHub area. If you forgot how to do that, follow the directions in Task 1, but for this activity (Activity 1).
Planned Pause Point: If you have any questions, contact Bradford or another group.
From your Preparation 1, you saw that if you (individually) made changes to a branched version of your document, then merged those back into your `main` branch, there were no issues with the merge. In fact, two people could be editing the same file (on the same branch) and commit changes with no issues occurring as long as the edits do not happen on the same line.
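The branch-then-merge situation described above can be sketched in a throwaway repository. This is only an illustration under a few assumptions: `{gert}` is installed, and the file name `report.md`, the branch name `edit-title`, and the author details are hypothetical. Because the default branch may be called `main` or `master` depending on your Git setup, the sketch looks the name up instead of hard-coding it.

```r
# Assumption: {gert} is installed (it comes with {usethis}).
library(gert)

# Throwaway repository with demo-only author details (hypothetical values).
repo <- tempfile("merge-demo-")
dir.create(repo)
git_init(repo)
git_config_set("user.name", "Demo Student", repo = repo)
git_config_set("user.email", "student@example.com", repo = repo)

# Commit a first version of the document on the default branch.
report <- file.path(repo, "report.md")
writeLines(c("Title", "Body"), report)
git_add("report.md", repo = repo)
git_commit("Add report skeleton", repo = repo)

# The default branch name depends on the Git setup, so look it up.
default_branch <- git_branch(repo = repo)

# Create a new branch (this also switches to it) and change one line there.
git_branch_create("edit-title", repo = repo)
writeLines(c("Bechdel Test Report", "Body"), report)
git_add("report.md", repo = repo)
git_commit("Rename title to describe the report", repo = repo)

# Switch back and merge. Nothing changed on the default branch in the
# meantime, so the merge goes through without any conflict.
git_branch_checkout(default_branch, repo = repo)
git_merge("edit-title", repo = repo)

merged_lines <- readLines(report)
merged_lines
```

After the merge, the file on the default branch contains the title edited on the branch.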
When resolving merge conflicts, we will be able to see each version of the file. These changes and files, useful metadata about what changed, and the complete history of committed changes together make up a repository. This allows us to decide which changes we want to keep for the next version of the file. We can then keep these repositories in sync across different computers to help facilitate collaboration among different people (or versions of ourselves).
For the remainder of this activity, you will be working with your neighbor(s). Remember to introduce yourself!
I think it is too early to worry about merge conflicts as we are trying to simply become familiar with GitHub. Therefore, we will circle back to this after we update our STA 418/518 Workflow - focusing on collaborating with current and future you. So far our workflow has been:
- Go to Bradford’s repository, and
- Fork a copy of Bradford’s repository into your GitHub space.
In a bit, we will see how to bring this repository into RStudio so that we can edit and work with the R programming language!
Before we get into updating our Workflow, let’s see what typical file types look like in GitHub. There are five versions of the same document within the `docs/` directory/folder of this repo: `day1.md`, `day1.html`, `day1.docx`, `day1.pdf`, and `day1.Rmd`.
With your neighbor(s), explore how each file looks within your repository (click on them to view them within GitHub). Discuss with your neighbor(s) what is easily viewable and what is not. Keep this in mind as we progress through this semester. That is, what do you need to include so that people can view your work as you intended?
Jenny Bryan provides some great information on repo browsability. Throughout this course, we will use my opinionated method of repo organization.
When doing work on activities in this course, our Workflow is going to be:
- Go to Bradford’s repository;
- Fork a copy of Bradford’s repository into your GitHub space;
- Clone your GitHub repository to your RStudio session;
- Do work in RStudio: save, stage, provide a commit message, commit, push; and
- Continue until done.
Before we can do this, we need to get RStudio and GitHub communicating.
- Log in to the RStudio Workbench using your GVSU username and password.
- Verify that you are in an RStudio session (i.e., not the RStudio Workbench Sessions/Project screen).
There are a couple of ways to configure Git in RStudio. For STA 418/518, we will use `{usethis}`. Note that when you see something like `{thing}` in my documents, I mean, “the R package called `thing`.”
We will use the `edit_git_config` function from `{usethis}`. Throughout the semester, I will shorten this to `usethis::edit_git_config` or, “from the R package `usethis`, use the function `edit_git_config`” (in general, `package::function`).
- In your Console pane (left-hand pane), type the following, pressing Enter/Return after each line:

      library(usethis)
      edit_git_config()

  A file called `.gitconfig` should open in the pane above your Console. In this file, copy the information provided below, then update it with your preferred name (or pseudonym) and email (can be any email, but it would probably be helpful to use the same one you used to sign up for GitHub to avoid confusion). This information will be publicly available.

      [user]
        name = "name"
        email = "[email protected]"
      [credential]
        helper = cache --timeout=10000000

  Note that the lines beginning with a left square bracket (`[`) start at the leftmost position, and the lines that do not begin with a left square bracket are indented two spaces.
  Also note that, in the `[credential]` portion, we are instructing RStudio to remember your GitHub credentials for 10,000,000 seconds (or roughly 16 weeks). RStudio is not remembering your credentials yet, but we will resolve this shortly.
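As a quick check on the timeout, the arithmetic below converts 10,000,000 seconds into weeks. The second half is an optional sketch, not a required step: `usethis::use_git_config()` can set the same name and email without editing `.gitconfig` by hand, and `usethis::git_sitrep()` reports what Git currently knows about you. The name and email shown are placeholders, and the calls are wrapped in `interactive()` so they only run when you type them in a live session.

```r
# Convert the credential cache timeout from seconds to weeks.
timeout_seconds <- 10000000
seconds_per_week <- 60 * 60 * 24 * 7
timeout_weeks <- timeout_seconds / seconds_per_week
round(timeout_weeks, 1)  # about 16.5 weeks

# Optional alternative to editing .gitconfig by hand.
# Placeholders: replace the name and email with your own before running.
if (interactive()) {
  usethis::use_git_config(
    user.name = "name",
    user.email = "you@example.com"
  )
  # Print a situation report so you can confirm the values were stored.
  usethis::git_sitrep()
}
```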
Now that you have Git set up within RStudio, we can enable RStudio and GitHub to communicate. To do this, we will need your GitHub username and a Personal Access Token (PAT). PATs are designed to be more secure than your password when communicating between your computer session and GitHub. Conveniently, `{usethis}` has a function for this!
- To create a PAT, type the following in your Console and press Enter/Return:

      create_github_token()

  Note that you previously loaded `{usethis}` (using `library(usethis)`), so I did not ask you to do this again. Once you load a package in your current RStudio session, you do not need to load it again.
- You will be directed to a “New personal access token” page on GitHub in your browser. Since I work on multiple machines (i.e., my personal laptop, my work laptop, my personal desktop, and the RStudio Workbench), I like to provide a unique name for each PAT. For example, in the Note text field, I called this token “GVSU RStudio Workbench”.
  For most of the other options, you will accept the default selections. However, you might want to change the Expiration date. A couple of suggestions: have this PAT expire at the end of this semester (e.g., August 10, 2022) or (risky) choose “No expiration”. After choosing a PAT expiration, scroll down and click on the green Generate Token button.
- After clicking on Generate Token, you will be taken to a “Personal access tokens” page that presents a seemingly random string beginning with `ghp_`. Keep this page open for a little bit (I will tell you when it is safe to close it), as once you close this page, you will never be able to view this PAT again! I highly recommend that you store this code somewhere safe (e.g., a password manager tool). However, if you do lose it, you can go through this process again to create a new one.
- Now we need to associate this PAT with RStudio so that RStudio can connect to your GitHub account. Back in your RStudio Console, type the following, pressing Enter/Return after each line:

      library(gitcreds)
      gitcreds_set()

- In your Console you will be prompted with:

      ? Enter password or token:

  Paste your PAT here (NOT your GitHub password) and press Enter/Return. You should see a message similar to:

      -> Adding new credentials...
      -> Removing credentials from cache...
      -> Done.
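If you want to confirm that the PAT was stored, one possible check is sketched below. The helper `looks_like_classic_pat()` is hypothetical (it is not part of any package); it only tests that a string starts with `ghp_`, as described above, without printing the token itself. The sketch assumes `{gitcreds}` and `{usethis}` are installed, and the lookup is wrapped in `interactive()` because it needs your live session and stored credentials.

```r
# Hypothetical helper: does a string have the shape of the PAT described above?
# It returns TRUE or FALSE and never prints the token.
looks_like_classic_pat <- function(token) {
  is.character(token) && length(token) == 1 && startsWith(token, "ghp_")
}

if (interactive()) {
  # Look up the credential that gitcreds_set() stored for github.com.
  # This raises an error if nothing has been stored yet.
  creds <- gitcreds::gitcreds_get()
  message("Stored token looks like a PAT: ", looks_like_classic_pat(creds$password))

  # A fuller report of how RStudio, Git, and GitHub are connected.
  usethis::git_sitrep()
}
```

If the message reports `FALSE`, the most likely cause is that a GitHub password was pasted instead of the PAT; running `gitcreds_set()` again lets you replace it.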
Now that RStudio and GitHub are connected, we can clone this repo (i.e., copy the repo to our RStudio session)!
- In RStudio, click on the New Project icon (the icon below the Edit drop-down menu);
- Click on Version Control on the New Project Wizard pop-up;
- Click on Git and you should be on a “Clone Git Repository” page;
- Back in your `activity01-rmarkdown` GitHub repo, click on the green Code button near the top of the page;
- Verify that HTTPS is underlined in red on the pop-down, then copy the URL provided;
- Back in RStudio, paste the URL in the “Repository URL” text field;
- The “Project directory name” text field should have automatically populated with `activity01-rmarkdown`. If yours did not, click into this box and press Ctrl/Cmd (usually this is a Mac issue);
- In the “Create project as subdirectory of” field, click on Browse…. Create a New Folder called “STA 418” or “STA 518” (depending on your course), then within this folder, create a New Folder called “Activities”, then click Choose. Note that I am forcing you to use my file system management style.
- Click on Create Project.
Your screen should refresh and the Files pane should say that you are currently in your `activity01-rmarkdown` folder that currently has three files (`.gitignore`, `activity01-rmarkdown.Rproj`, and `README.md`) and a folder (`README-img/`). If you are asked for your GitHub credentials, provide your GitHub username and your PAT.
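The point-and-click cloning steps also have a Console equivalent, sketched here as an optional alternative. The helper `fork_url()` is hypothetical and simply builds the HTTPS URL that the green Code button shows; `"username"` and the destination folder are placeholders you would replace with your own. The sketch assumes `{usethis}` is installed and that the destination folder already exists.

```r
# Hypothetical helper: build the HTTPS clone URL for your fork.
fork_url <- function(username, repo = "activity01-rmarkdown") {
  sprintf("https://github.com/%s/%s.git", username, repo)
}

# Placeholder: replace "username" with your GitHub username.
fork_url("username")

if (interactive()) {
  # Clone the fork into an existing folder and open it as an RStudio project.
  # fork = FALSE because the repository has already been forked on GitHub.
  usethis::create_from_github(
    repo_spec = fork_url("username"),
    destdir = "~/STA 418/Activities",  # placeholder path; must already exist
    fork = FALSE
  )
}
```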
For the rest of this class period, you will complete the RMarkdown document (`activity01-bechdel-test.Rmd`) with your group members. Bradford will continue circling through the groups and be available to help when needed.
- Verify that you are in your forked version of this repo; your page title should be `username/activity01-rmarkdown`, where `username` is replaced with your GitHub username.
- In the main repo page on GitHub, click on the green Code button. Verify that HTTPS is underlined in red on the pop-down, then copy the URL provided.
- In RStudio, click on the New Project icon (the icon below the Edit drop-down menu),
- Click on Version Control on the New Project Wizard pop-up,
- Click on Git and you should be on a “Clone Git Repository” page,
- Paste the URL in the “Repository URL” text field,
- The “Project directory name” text field should have automatically populated with `activity01-rmarkdown`. If yours did not, click into this box and press Ctrl/Cmd (usually this is a Mac issue);
- In the “Create project as subdirectory of” field, click on Browse…. Navigate into your “STA 418” or “STA 518” folder (depending on what you created in Preparation 2), then within this folder, create a New Folder called “Activities”, then click Choose. Note that I am forcing you to use my file system management style.
- Click on Create Project.
You are probably currently in the `main` branch (the drop-down menu next to the icon in the Git pane). We will continue working within your `main` branch, but I want you to notice that you can create new branches within RStudio!
In the Files pane, click on `activity01-bechdel-test.Rmd`. Update `author: "Name"` to your name.
Do not continue in this README document until after your group has completed the `.Rmd` file, then staged, committed, and pushed it to your repo.
YOU DID IT!
Next: Activity 2 will cover creating visual representations of data.
This document is based on David Keyes’s tutorial at R for the Rest of Us, Happy Git with R by Jenny Bryan et al., The Coding Train’s GitHub YouTube video series, and The Carpentries’ Git training.