Version control
Proper collaboration with other people will require that you work on the same codebase in an organized manner. This is the reason that version control exists. Simply stated, it is a way to keep track of:
- Who made changes to the code
- When did the change happen
- What changes were made
For a full explanation please see this page
Secondly, it is important to note that GitHub is not git! GitHub is the dominating player when it comes to hosting repositories but that does not mean that they are the only one providing free repository hosting (see bitbucket or gitlab for some other examples).
That said we will be using git and Github in these exercises.
Initial config
-
Install git on your computer and make sure that your installation is working by writing
git help
in a terminal and it should show you the help message for git. -
Create a github account if you do not already have one.
-
To make sure that we do not have to type in our GitHub username every time that we want to make some changes, we can once and for all set them on our local machine
# type in a terminal git config --global user.email <email> git config --global user.password <password> git config --global credential.helper store
where you provide your email and password. The last line will make sure that you do not have to type in your password every time you want to push something to your repository.
Git Overview
The most simple way to think of version control is that it is just nodes with lines connecting them
Each node, which we call a commit is uniquely identified by a hash string. Each node stores what our code looked like then (when we made the commit) and using the hash codes we can easily revert to a specific point in time.
The commits are made up of local changes that we make to our code. A basic workflow for adding commits is seen below
Assuming that we have made some changes to our local working directory and that we want to get these updates to be online in the remote repository we have to do the following steps:
-
First, we run the command
git add
. This will move our changes to the staging area. While changes are in the staging area we can very easily revert them (usinggit restore
). There has therefore not been assigned a unique hash to the code yet, and we can therefore still overwrite it. -
To take our code from the staging area and make it into a commit, we simply run
git commit
which will locally add a note to the graph. It is important again, that we have not pushed the commit to the online repository yet. -
Finally, we want others to be able to use the changes that we made. We do a simple
git push
and our commit gets online
Of course, the real power of version control is the ability to make branches, as in the image below
Each branch can contain code that is not present on other branches. This is useful when you are many developers working together on the same project.
Exercises
-
In your GitHub account create a repository, where the intention is to upload and version control a script
-
After creating the repository, clone it to your computer
-
-
Create/Copy/Move the
simple_classifier.py
file and therequirements.txt
file from the last virtual environment exercises into this repository. -
Add the files to a commit by using
git add
command -
Commit the files using
git commit
-
Finally, push the files to your repository using
git push
. Make sure to check online that the files have been updated in your repository. -
You can always use the command
git status
to check where you are in the process of making a commit. -
Make sure that you understand how to make branches, as this will allow you to try out code changes without messing with your working code. Creating a new branch can be done using:
Afterwards, you can use
git checkout
to change between branches (remember to commit your work!) Try adding something (a file, a new line of code etc.) to the newly created branch, commit it and try changing back to master afterward. You should hopefully see whatever you added on the branch is not present on the main branch. -
Try to make a couple of commits to either your newly created branch or your main branch, let's say at least two. Afterwards, try executing
which will give you info about the last couple of commits. Try to figure out how you can roll back to a previous commit? Hint: you need to use the
git checkout
command + the commit hash you get fromgit log
. -
As a final exercise, we want to simulate a merge conflict, which happens when two users try to commit changes to the same lines of code in the codebase, and git is not able to resolve how the different commits should be integrated.
-
In your browser, open your repository, go to any file of your choosing and click the edit button (see image below ) and make some changes to the file. For example, if you choose a Python file you can just import some random packages at the top of the file. Commit the change.
-
Make sure not to pull the change you just made to your local computer. Locally make changes to the same file in the same lines and commit them afterward.
-
-
Now try to
git pull
the online changes. What should (hopefully) happen is that git will tell you that it found a merge conflict that needs to be resolved. Open the file and you should see something like this```txt <<<<<<< HEAD this is some content to mess with content to append ======= totally different content to merge later >>>>>>> master ``` this should be interpreted as everything that's between `<<<<<<<` and `=======` are the changes made by your local commit and everything between `=======` and `>>>>>>>` are the changes you are trying to pull. To fix the merge conflict you simply have to make the code in the two "cells" work together. When you are done, remove the identifiers `<<<<<<<`, `=======` and `>>>>>>>`.
- Finally, commit the merge and try to push.