Virtual environments
Python is a great programming language, largely because of its vast ecosystem of packages. No matter what you want to do, there is probably a package that can get you started. Try to remember the last time you wrote a program using only the Python standard library. Probably never. For this reason, we need a way to install third-party packages, and this is where package managers come into play.
You have probably already used pip, the default package manager for Python, for a long time. pip is
great for beginners, but it is missing one essential feature that you will need as a developer or data scientist:
virtual environments. Virtual environments are an essential way to make sure that the dependencies of different
projects do not cross-contaminate each other. As a naive example, suppose project A requires torch==1.3.0 and
project B requires torch==2.0. Then running
cd project_A # move to project A
pip install torch==1.3.0 # install old torch version
cd ../project_B # move to project B
pip install torch==2.0 # install new torch version
cd ../project_A # move back to project A
python main.py # try executing main script from project A
will mean that, even though we are executing the main script from project A's folder, it will use torch==2.0 instead of
torch==1.3.0. That is the last version we installed, and in both cases pip installs the package into
the same environment, in this case the global environment. Instead, if we did something like:
# Linux/macOS
cd project_A # move to project A
python -m venv env # create a virtual environment in project A
source env/bin/activate # activate that virtual environment
pip install torch==1.3.0 # Install the old torch version into the virtual environment belonging to project A
cd ../project_B # move to project B
python -m venv env # create a virtual environment in project B
source env/bin/activate # activate that virtual environment
pip install torch==2.0 # Install new torch version into the virtual environment belonging to project B
cd ../project_A # Move back to project A
source env/bin/activate # Activate the virtual environment belonging to project A
python main.py # Succeed in executing the main script from project A
# Windows
cd project_A # Move to project A
python -m venv env # Create a virtual environment in project A
.\env\Scripts\activate # Activate that virtual environment
pip install torch==1.3.0 # Install the old torch version into the virtual environment belonging to project A
cd ../project_B # Move to project B
python -m venv env # Create a virtual environment in project B
.\env\Scripts\activate # Activate that virtual environment
pip install torch==2.0 # Install new torch version into the virtual environment belonging to project B
cd ../project_A # Move back to project A
.\env\Scripts\activate # Activate the virtual environment belonging to project A
python main.py # Succeed in executing the main script from project A
then we would be sure that torch==1.3.0 is used when executing main.py in project A because we are using two
different virtual environments. In the above case, we used the venv module
which is the built-in Python module for creating virtual environments. venv+pip is arguably a good combination
but when working on multiple projects it can quickly become a hassle to manage all the different
virtual environments yourself, remembering which Python version to use, which packages to install and so on.
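Because venv is an ordinary standard-library module, the same steps can also be driven from Python. The following is a minimal sketch, not part of the original exercises: the helper names create_env, env_python and in_virtual_environment are hypothetical, and with_pip=False is an assumption made only to keep the sketch fast (the shell commands above do bootstrap pip).

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def create_env(project_dir: Path, name: str = "env") -> Path:
    """Create a virtual environment inside project_dir and return its path.

    Roughly what `python -m venv env` does, except that pip is not
    bootstrapped here (with_pip=False), an assumption made for speed.
    """
    env_dir = project_dir / name
    venv.create(env_dir, with_pip=False)
    return env_dir


def env_python(env_dir: Path) -> Path:
    """Return where the environment's interpreter lives on this platform."""
    if os.name == "nt":
        # Windows layout, matching .\env\Scripts\activate
        return env_dir / "Scripts" / "python.exe"
    # Linux/macOS layout, matching env/bin/activate
    return env_dir / "bin" / "python"


def in_virtual_environment() -> bool:
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_env(Path(tmp))
        print("environment created at:", env_dir)
        print("its interpreter:", env_python(env_dir))
    print("currently inside a virtual environment:", in_virtual_environment())
```

Each environment gets its own interpreter path, which is why packages installed in project A's environment are invisible to project B.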
For this reason, several package managers have been created that can help you manage your virtual environments and dependencies.
In these exercises, we are going to look at how we can use conda to control dependencies when working on
Python projects. Many of you may already have conda installed, but most people have never actually used it. The
workflow presented in these exercises for managing dependencies is as follows:
- Use conda to create environments
- Use pip to install packages in that environment
It is most likely not the optimal way of doing things, but where conda shines over other dependency managers is that it has the best support across all three major operating systems (Windows, macOS, Linux). Therefore, it is a great tool for teaching about virtual environments. Additionally, many local compute clusters at universities only allow you to work on the cluster if you use virtual environments through conda.
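When following this workflow, it helps to confirm that the Python you are running really belongs to the conda environment you activated, since that is where pip will install packages. A minimal sketch of such a check follows. The function name describe_environment is hypothetical, and the sketch relies on the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that conda activate sets in the shell.

```python
import os
import sys
from typing import Mapping, Optional


def describe_environment(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Summarise which environment the running interpreter belongs to."""
    env = os.environ if environ is None else environ
    conda_prefix = env.get("CONDA_PREFIX")
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # Name and location of the active conda environment, if any
        "conda_env": env.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": conda_prefix,
        # True only when the interpreter itself lives in the active conda env
        "python_from_conda_env": conda_prefix is not None
        and os.path.realpath(sys.prefix) == os.path.realpath(conda_prefix),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If python_from_conda_env is False right after activating an environment, the python found on your PATH is probably not the one installed in that environment.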
Exercises
- Download and install conda. You are free to install either full conda or the much simpler version miniconda. The core difference between the two is that full conda already comes with a lot of packages that you would otherwise have to install yourself with miniconda. The downside is that full conda is a much larger package, which can be a huge disadvantage on smaller devices.

- Start a terminal or command prompt and type in conda help, which should show you the help page for the different commands that you can use with conda. If this does not work, you probably need to set a system variable to point to the conda installation.

- The first important conda command is create, which will create a new environment. Execute the command, which takes the form conda create -n <env_name> python=3.11. What does the -n flag do? What does the python=3.11 flag do?

  Solution: The -n flag is used to specify the name of the environment, and the python=3.11 flag is used to specify the version of Python that should be installed in the environment. In general, you can call conda create --help to get information about the different flags you can use with the create command.

- Afterward, use the conda activate command to activate the environment.

- After entering the environment, what pip command should you execute to get a list of all the dependencies already installed in the environment?

  Solution: pip freeze

- We are now ready to install some dependencies. Try to get the script simple_classifier.py running (you can find it here). Essentially, you need to iteratively call pip install <package> and python simple_classifier.py until the script runs.

- The way we usually communicate to other people the requirements needed to run our Python applications/scripts is through requirements.txt files. These files are a simple list of dependencies with the format package==X.Y.Z, where X.Y.Z is the particular version of that package. Construct a requirements.txt file containing the dependencies you just installed to run the script. Remember to specify the exact version you have used!

- We are often interested in listing only the bare minimum necessary to run our code in the requirements.txt file. If you have written more than 2 dependencies in the last exercise, you have too many. Try to figure out which two are strictly necessary to get the application running.

- When you think you have managed to create the file, let's test that it works. Execute these four commands:

    conda create -y -n "newenv" python=3.11
    conda activate newenv
    pip install -r requirements.txt
    python simple_classifier.py

  Make sure you understand what the four commands do. If the script completes without errors, congratulations on creating your first reproducible virtual environment.

- Hopefully, you will be using multiple environments in the future and will forget from time to time what you called them. Which conda command gives you a list of all the environments that you have created? Hint: look at this conda cheat sheet

  Solution: conda env list

- Finally, make sure you also know how to delete unused environments, as these can fill up your laptop. Figure out the command to remove the newenv environment created in the previous exercise.

  Solution: conda env remove -n newenv
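The requirements.txt exercise above asks for exact versions of the few packages the script really needs. One way to avoid copying version numbers by hand is sketched below, using importlib.metadata from the standard library. The function name pinned_requirements and its lookup parameter are hypothetical, introduced only so the sketch can be tried without any particular package installed. The package names in the usage part are placeholders, not the answer to the exercise.

```python
from importlib import metadata
from typing import Callable, Iterable, List


def pinned_requirements(
    names: Iterable[str],
    lookup: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return one 'package==X.Y.Z' line per requested package.

    lookup maps a package name to its installed version. By default it asks
    the current environment and raises PackageNotFoundError for anything
    that is not installed.
    """
    return [f"{name}=={lookup(name)}" for name in names]


if __name__ == "__main__":
    # Placeholder names and versions: replace them with the packages your
    # script imports, and drop the lookup argument to query the real environment.
    fake_versions = {"package_a": "1.2.3", "package_b": "4.5.6"}
    for line in pinned_requirements(fake_versions, lookup=fake_versions.__getitem__):
        print(line)
```

Listing only the packages you import directly, rather than everything pip freeze reports, keeps the file close to the bare minimum the exercise asks for.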
