Git
Core Module
Proper collaboration with other people will require that you can work on the same codebase in an organized manner. This is the reason that version control exists. Simply stated, it is a way to keep track of:
- Who made changes to the code
- When the changes happened
- What changes were made
For a full explanation, please see this page.
It is also important to note that GitHub is not git! GitHub is the dominant player when it comes to hosting repositories, but it is not the only one providing free repository hosting (see Bitbucket or GitLab for some other examples).
That said, we will be using git and GitHub throughout this course. It is a requirement for passing this course that you create a public repository with your code and use git to upload any code changes. How deeply you integrate git into your own projects is up to you, but you are at least expected to be familiar with git and GitHub.
Initial config
What does Git stand for?
The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):
- Random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
- Stupid. Contemptible and Despicable. simple. Take your pick from the dictionary of slang.
- "Global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
- "Goddamn idiotic truckload of sh*t": when it breaks
-
Install git on your computer and check that your installation works by writing git help in a terminal. It should show you the help message for git.
-
Create a GitHub account if you do not already have one.
-
To avoid typing our GitHub username every time we want to push changes, we can store our user details once and for all on our local machine.
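The commands that belong to this step are not reproduced on this page, so the following is only a minimal sketch of what such a one-time setup could look like. It assumes that the intent is to set your commit identity and to let git remember your credentials. The name and email are placeholders, and the scratch HOME directory is an assumption added so that the sketch can be run without touching your real configuration.

```shell
# Sandbox: use a scratch HOME so this sketch does not touch your real
# configuration. Remove this line to apply the settings for real.
export HOME="$(mktemp -d)"

# Identity recorded in every commit you make (placeholder values).
git config --global user.name "Jane Doe"
git config --global user.email "jane.doe@example.com"

# Optional: remember credentials after the first successful login.
# Note that the "store" helper keeps them in plain text on disk.
git config --global credential.helper store

# Show what is now set at the global level.
git config --global --list
```

Settings written with --global apply to every repository for your user, which is why this only has to be done once per machine.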
Git overview
The simplest way to think of version control is as a graph: nodes with lines connecting them.
Each node, which we call a commit, is uniquely identified by a hash string. Each node stores what our code looked like at the point in time when we made the commit, and using the hashes we can easily revert to a specific point in time.
The commits are made up of local changes that we make to our code. The basic workflow for adding commits is as follows.
Assuming that we have made some changes in our local working directory and want these updates to end up in the remote repository, we have to do the following steps:
-
First we run the command git add. This moves our changes to the staging area. While changes are in the staging area we can still easily undo them (for example with git restore --staged). No unique hash has been assigned to the code yet, so we can still overwrite it.
-
To take our code from the staging area and turn it into a commit, we run git commit, which locally adds a node to the graph. Note that the commit has not yet been pushed to the online repository.
-
Finally, we want others to be able to use the changes that we made. We do a simple git push and our commit is uploaded to the remote repository.
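The three steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the course's own setup: a local bare repository in a temporary directory stands in for the GitHub remote, and the file name, commit message and identity are placeholders.

```shell
# Scratch setup: a local bare repository plays the role of the GitHub remote
# (hypothetical paths; on GitHub you would clone an https:// or ssh URL).
demo="$(mktemp -d)"
git init --quiet --bare "$demo/remote.git"
git clone --quiet "$demo/remote.git" "$demo/work"
cd "$demo/work"
git config user.name "Jane Doe"               # placeholder identity, local to this repo
git config user.email "jane.doe@example.com"

echo 'print("hello")' > hello.py             # change something in the working directory
git add hello.py                              # step 1: move the change to the staging area
git commit --quiet -m "Add hello script"      # step 2: turn the staged change into a commit
git push --quiet origin HEAD                  # step 3: upload the commit to the remote

git status --short                            # prints nothing when everything is committed
git log --oneline                             # lists the commit that was just made
```

Pushing HEAD sends the current branch to a branch of the same name on the remote, so the sketch works whether the default branch is called main or master.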
Of course, the real power of version control is the ability to make branches.
Each branch can contain code that is not present on other branches. This is useful when many developers work together on the same project.
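To make the idea of branches concrete, here is a small sketch that can be run in a scratch directory. The branch name experiment, the file names and the identity are hypothetical, chosen only for illustration.

```shell
demo="$(mktemp -d)"
git init --quiet "$demo/repo"
cd "$demo/repo"
git config user.name "Jane Doe"               # placeholder identity
git config user.email "jane.doe@example.com"

echo "stable code" > app.txt
git add app.txt
git commit --quiet -m "Add app"
base="$(git rev-parse --abbrev-ref HEAD)"     # name of the default branch (main or master)

git checkout --quiet -b experiment            # create a new branch and switch to it
echo "risky idea" > idea.txt
git add idea.txt
git commit --quiet -m "Try an idea"

git checkout --quiet "$base"                  # back on the default branch
ls                                            # idea.txt only exists on the experiment branch
git branch                                    # lists both branches
```

The commit made on experiment leaves the default branch untouched, which is what makes it safe to try out changes there.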
❔ Exercises
-
In your GitHub account create a repository, where the intention is that you upload the code from the final exercise from yesterday
-
After creating the repository, clone it to your computer
-
Move/copy the three files from yesterday into the repository (and any others that you made)
-
Stage the files for a commit by using the git add command.
-
Commit the files using the git commit command, where you use the -m argument to provide a commit message. Note: writing good commit messages is a skill in itself. A commit message should be short but informative about the work you are trying to commit. Try to practise writing good commit messages throughout the course. You can see this guideline for help.
-
Finally, push the files to your repository using git push. Make sure to check online that the files have been updated in your repository.
-
You can always use the command git status to check where you are in the process of making a commit.
-
Also check out the git log command, which will show you the history of commits that you have made.
-
-
Make sure that you understand how to make branches, as this will allow you to try out code changes without messing with your working code. A new branch can be created with git branch followed by a branch name, or created and checked out in one step with git checkout -b followed by a branch name.
Afterwards, you can use git checkout to change between branches (remember to commit your work!). Try adding something (a file, a new line of code etc.) to the newly created branch, commit it and then change back to the main branch. You should see that whatever you added on the new branch is not present on the main branch.
Note: the git checkout command is used for a lot of different things in git. It can be used to change branches, to revert changes and to create new branches. An alternative is to use git switch and git restore, which are more modern commands.
-
If you do not already have a clone of the repository belonging to the course, make sure to make one! I am continuously updating the material during the course, so I recommend that you do a git pull on your local copy each day before the lecture.
-
Git may seem like a waste of time when solutions like Dropbox, Google Drive etc. exist, and that is not completely untrue when only one or two people work on a project. However, these file management systems fall short when hundreds to thousands of people work together. For this exercise you will go through the steps of sending an open-source contribution:
-
Go online and find a project you do not own, where you can improve the code. You can either look at this page of good issues to get started with or, for simplicity, you can just choose the repository belonging to the course. Now fork the project by clicking the Fork button.
This will create a copy of the repository under your own GitHub account, to which you have full write access. Note that updates to the original repository do not automatically update the code in your fork.
-
Clone your fork of the project using git clone.
-
By default your local repository will be on the main branch (HINT: you can check this with the git status command). It is good practice to make a new branch when working on some changes. Use the git branch command followed by the git checkout command to create a new branch and switch to it.
-
You are now ready to make changes to the repository. Try to find something to improve (any spelling mistakes?). When you have made the changes, do the standard git cycle:
add -> commit -> push
-
Go online to the original repository and go to the Pull requests tab. Find the compare button and use it to compare the master branch of the original repo with the branch that you just created in your own repository. Check the diff on the page to make sure that it contains the changes you have made.
-
Write a bit about the changes you have made and click Create pull request :)
-
-
Forking a repository has the consequence that your fork and the repository that you forked can diverge. To mitigate this we can set what is called a remote upstream. Take a look at this page, and set a remote upstream for the repository you just forked.
-
After setting the upstream remote, we need to pull and merge any updates. Take a look at this page and figure out how to do this.
-
As a final exercise we want to simulate a merge conflict, which happens when two users try to commit changes to exactly the same lines of code in the codebase, and git is not able to resolve how the different commits should be integrated.
-
In your browser, open your favorite repository (it could be the one you just worked on), go to any file of your choosing, click the edit button and make some change to the file. For example, if you choose a Python file you can just import some random packages at the top of the file. Commit the change.
-
Make sure not to pull the change you just made to your local computer. Locally, make changes to the same lines of the same file and commit them afterwards.
-
Now try to git pull the online changes. What should (hopefully) happen is that git will tell you that it found a merge conflict that needs to be resolved. Open the file and you should see something like this:
<<<<<<< HEAD
this is some content to mess with
content to append
=======
totally different content to merge later
>>>>>>> master
This should be interpreted as follows: everything between <<<<<<< and ======= is the change made by your local commit, and everything between ======= and >>>>>>> is the change you are trying to pull. To fix the merge conflict you have to make the code in the two "cells" work together. When you are done, remove the identifiers <<<<<<<, ======= and >>>>>>>.
-
Finally, commit the merge and try to push.
-
-
(Optional) The above exercises have focused on how to use git from the terminal, which I highly recommend learning. However, if you are using a proper editor, it will also have built-in support for version control. We recommend getting familiar with these features (here is a tutorial for VS Code).
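If you want to rehearse the merge conflict exercise without touching a real repository, it can also be simulated entirely on your own machine. The sketch below is one way to do so under stated assumptions: a local bare repository stands in for GitHub, and two clones (with the hypothetical names alice and bob) stand in for the browser edit and your local edit. The helper make_clone, the file name and the commit messages are likewise made up for illustration.

```shell
demo="$(mktemp -d)"
git init --quiet --bare "$demo/remote.git"    # stands in for the GitHub repository

# Hypothetical helper: clone the remote and set a placeholder identity.
make_clone() {
    git clone --quiet "$demo/remote.git" "$demo/$1"
    git -C "$demo/$1" config user.name "$1"
    git -C "$demo/$1" config user.email "$1@example.com"
}

make_clone alice
cd "$demo/alice"
echo "original line" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
git push --quiet origin HEAD
branch="$(git rev-parse --abbrev-ref HEAD)"   # main or master, depending on your git

make_clone bob                                # bob starts from the same commit

# "Online" edit: alice changes the line and pushes.
echo "alice's version" > notes.txt
git commit --quiet -am "Alice edits notes"
git push --quiet origin HEAD

# Local edit: bob changes the same line without pulling first.
cd "$demo/bob"
echo "bob's version" > notes.txt
git commit --quiet -am "Bob edits notes"

# Pulling now has to merge two edits of the same line, which git cannot do alone.
git pull --no-rebase origin "$branch" || echo "merge conflict reported"
cat notes.txt                                 # the file now contains the conflict markers

# Resolve by hand: keep the content you want, drop the markers, then commit.
printf "alice's version\nbob's version\n" > notes.txt
git add notes.txt
git commit --quiet -m "Merge remote changes into notes"
git push --quiet origin HEAD
```

Here the conflict is resolved by keeping both lines. In a real project the right resolution depends on what the two changes were meant to achieve.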
🧠 Knowledge check
-
How do you know if a certain directory is a git repository?
Solution
You can check if there is a ".git" directory. Alternatively, you can use the git status command.
-
Explain what the file .gitignore is used for.
Solution
The file .gitignore is used to tell git which files to ignore, for example when running a git add . command. This is useful for files that are not part of the codebase but are needed for the code to run (e.g. data files), or for files that contain sensitive information (e.g. .env files that contain API keys and passwords).
-
You have two branches - main and devel. What sequence of commands would you need to execute to make sure that devel is in sync with main?
-
What best practices are you familiar with regarding version control?
Solution
- Use a descriptive commit message
- Make each commit a logical unit
- Incorporate others' changes frequently
- Share your changes frequently
- Coordinate with your co-workers
- Don't commit generated files
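For the knowledge-check question about keeping devel in sync with main, one possible sequence is sketched below. It is an illustration rather than the only valid answer: it uses a merge (a rebase would be another option) and runs in a scratch repository without a remote, so the git pull that you would normally run on main is only shown as a comment. The file name and identity are placeholders.

```shell
demo="$(mktemp -d)"
git init --quiet "$demo/repo"
cd "$demo/repo"
git config user.name "Jane Doe"               # placeholder identity
git config user.email "jane.doe@example.com"

echo "v1" > app.txt
git add app.txt
git commit --quiet -m "Initial commit"
git branch -M main                            # make sure the default branch is called main
git branch devel                              # devel starts at the same commit

echo "v2" > app.txt                           # main moves ahead of devel
git commit --quiet -am "Update app on main"

# The sequence the question asks about:
git checkout --quiet main
# git pull                                    # with a remote: fetch the newest main first
git checkout --quiet devel
git merge --quiet main                        # bring the changes from main into devel

git log --oneline devel                       # devel now contains the commit made on main
```

Because devel had no commits of its own in this sketch, the merge is a simple fast-forward. If devel had diverged, the same command would create a merge commit or report a conflict.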
That covers the basics of git to get you started. In the exercise folder you can find a git cheat sheet with the most useful commands for future reference. Finally, we want to point out another awesome feature of GitHub: the in-browser editor. Sometimes you have a small edit that you want to make, but you would still like to do it in an IDE/editor. Or you may be working from a device other than your usual development machine. GitHub has a built-in editor that can be enabled simply by changing any URL from
to
Try it out on your newly created repository.