Virtual environments

Python is a great programming language and this is mostly due to its vast ecosystem of packages. No matter what you want to do, there is probably a package that can get you started. Just try to remember when the last time you wrote a program only using the Python standard library. Probably never. For this reason, we need a way to install third-party packages and this is where package managers come into play.

You have probably already used pip for the longest time, which is the default package manager for Python. pip is great for beginners but it is missing one essential feature that you will need as a developer or data scientist: virtual environments. Virtual environments are an essential way to make sure that the dependencies of different projects do not cross-contaminate each other. As a naive example, consider project A requires torch==1.3.0 and project B requires torch==2.0, then

cd project_A  # move to project A
pip install torch==1.3.0  # install old torch version
cd ../project_B  # move to project B
pip install torch==2.0  # install new torch version
cd ../project_A  # move back to project A
python main.py  # try executing main script from project A

will mean that even though we are executing the main script from project A's folder, it will use torch==2.0 instead of torch==1.3.0 because that is the last version we installed because in both cases pip will install the package into the same environment, in this case, the global environment. Instead, if we did something like:

Unix/macOSWindows

cd project_A  # move to project A
python -m venv env  # create a virtual environment in project A
source env/bin/activate  # activate that virtual environment
pip install torch==1.3.0  # Install the old torch version into the virtual environment belonging to project A
cd ../project_B  # move to project B
python -m venv env  # create a virtual environment in project B
source env/bin/activate  # activate that virtual environment
pip install torch==2.0  # Install new torch version into the virtual environment belonging to project B
cd ../project_A  # Move back to project A
source env/bin/activate  # Activate the virtual environment belonging to project A
python main.py  # Succeed in executing the main script from project A

cd project_A  # Move to project A
python -m venv env  # Create a virtual environment in project A
.\env\Scripts\activate  # Activate that virtual environment
pip install torch==1.3.0  # Install the old torch version into the virtual environment belonging to project A
cd ../project_B  # Move to project B
python -m venv env  # Create a virtual environment in project B
.\env\Scripts\activate  # Activate that virtual environment
pip install torch==2.0  # Install new torch version into the virtual environment belonging to project B
cd ../project_A  # Move back to project A
.\env\Scripts\activate  # Activate the virtual environment belonging to project A
python main.py  # Succeed in executing the main script from project A

then we would be sure that torch==1.3.0 is used when executing main.py in project A because we are using two different virtual environments. In the above case, we used the venv module which is the built-in Python module for creating virtual environments. venv+pip is arguably a good combination but when working on multiple projects it can quickly become a hassle to manage all the different virtual environments yourself, remembering which Python version to use, which packages to install and so on.

For this reason, several package managers have been created that can help you manage your virtual environments and dependencies, with some of the most popular being:

In these exercises, we are going to be looking at how we can use conda to control dependencies when we are working on python projects. Many of you may already have conda installed, but most people have never actually used it. The workflow presented in these exercises for managing dependencies are as follows

Use conda to create environments
Use pip to install packages in that environment

It is most likely not the optimal way of doing things but where conda shines over other dependency managers is that it supports all three major operating systems (Windows, OS, Linux) the best. Therefore, it is a great tool for teaching about virtual environments. Additionally, many local compute clusters in universities only allow you to work on the cluster if you use virtual environments through conda.

Exercises

Download and install conda. You are free to either install full conda or the much simpler version miniconda. The core difference between the two packages is that conda already comes with a lot of packages that you would normally have to install with miniconda. The downside is that conda is a much larger package which can be a huge disadvantage on smaller devices.
Start a terminal or command prompt and type in conda help which should show you the help page for the different commands that you can use with conda. If this does not work you probably need to set some system variable to point to the conda installation
The first important conda command is create which will create a new environment
```
conda create -n "my_environment" python=3.11
```
Execute the command. What does the -n flag do? What does the python=3.11 flag do?

Solution!
The -n flag is used to specify the name of the environment and the python=3.11 flag is used to specify the version of python that should be installed in the environment. In general, you can call conda create --help to get information about the different flags you can use with the create command.
Afterward, use the conda activate command to activate the environment.
After entering the environment, what pip command should you execute to get a list of all the dependencies already installed in the environment?

Solution!
pip freeze
We are now ready to install some dependencies. Try to get the script simple_classifier.py running (you can find it here). Essentially, you need to iteratively call
```
python simple_classifier.py
```
and
```
pip install <missing-package>
```
Until the script runs.
The way we usually communicate to other people the requirements needed to run our Python applications/scripts are called requirement.txt files. These files are a simple list of dependencies with the format
```
dependency1==X.Y.Z
dependency2==X.Y.Z
```
Where X.Y.Z is the particular version of that package. Construct a requirements.txt file containing the dependencies you just installed to run the script. Remember to specify the exact version you have used!
We are often interested in listing only the bare minimum necessary to run our code in the requirements.txt file. If you have written more than 2 dependencies in the last exercise, you have too many. Try figuring out what two are strictly necessary to get the application running?
When you think you have managed to create the file, let's try to test that it works. Execute these four commands:
```
conda create -y -n "newenv" python=3.11
conda activate newenv
pip install -r requirements.txt
python simple_classifier.py
```
Make sure you understand what the four commands does. If it completes without errors, congratulations on creating your first reproducible virtual environment.
Hopefully, you will be using multiple environments in the future and forget from time to time what you call them. Which conda commando gives you a list of all the environments that you have created? Hint: look at this conda cheat sheet

Solution!
conda env list
Finally, make sure you also know how to delete unused environments as these can fill up your laptop. Figure out the command to remove the newenv environment created in the previous exercise.

Solution!
conda env remove -n newenv