Package managers and virtual environments
Core Module
Python's extensive package ecosystem is a major strength. It's rare to write a program relying solely on the Python standard library. Therefore, package managers are essential for installing third-party packages.
You've likely used pip, the default Python package manager. While suitable for beginners, pip lacks a crucial
feature for developers and data scientists: virtual environments. Virtual environments prevent dependency conflicts
between projects. For example, if project A requires torch==1.3.0 and project B requires torch==2.0, the following
scenario illustrates the problem:
```bash
cd project_A              # move to project A
pip install torch==1.3.0  # install old torch version
cd ../project_B           # move to project B
pip install torch==2.0    # install new torch version
cd ../project_A           # move back to project A
python main.py            # try executing main script from project A
```
This means that even though we are executing the main script from project A's folder, it will use torch==2.0 instead of torch==1.3.0, because that is the last version we installed. In both cases, pip installs the package into the same environment, in this case the global environment. Instead, if we did something like this on Linux/macOS:
```bash
cd project_A              # move to project A
python -m venv env        # create a virtual environment in project A
source env/bin/activate   # activate that virtual environment
pip install torch==1.3.0  # install the old torch version into project A's virtual environment
cd ../project_B           # move to project B
python -m venv env        # create a virtual environment in project B
source env/bin/activate   # activate that virtual environment
pip install torch==2.0    # install the new torch version into project B's virtual environment
cd ../project_A           # move back to project A
source env/bin/activate   # activate the virtual environment belonging to project A
python main.py            # succeed in executing the main script from project A
```
or like this on Windows:

```powershell
cd project_A              # move to project A
python -m venv env        # create a virtual environment in project A
.\env\Scripts\activate    # activate that virtual environment
pip install torch==1.3.0  # install the old torch version into project A's virtual environment
cd ../project_B           # move to project B
python -m venv env        # create a virtual environment in project B
.\env\Scripts\activate    # activate that virtual environment
pip install torch==2.0    # install the new torch version into project B's virtual environment
cd ../project_A           # move back to project A
.\env\Scripts\activate    # activate the virtual environment belonging to project A
python main.py            # succeed in executing the main script from project A
```
then we would be sure that torch==1.3.0 is used when executing main.py in project A, because the two projects now use two different virtual environments. In the example above we used venv, the built-in Python module for creating virtual environments. venv+pip is arguably a good combination, but when working on multiple projects it can quickly become a hassle to manage all the different virtual environments yourself, remembering which Python version to use, which packages to install and so on.
Therefore, several package managers have been developed to manage virtual environments and dependencies. Some popular options include:
| 🌟 Framework | 📄 Docs | 📂 Repository | ⭐ GitHub Stars |
|---|---|---|---|
| Conda | 🔗 Link | 🔗 Link | 7.3k |
| Pipenv | 🔗 Link | 🔗 Link | 25.1k |
| Poetry | 🔗 Link | 🔗 Link | 34.1k |
| Pipx | 🔗 Link | 🔗 Link | 12.4k |
| Hatch | 🔗 Link | 🔗 Link | 7.1k |
| PDM | 🔗 Link | 🔗 Link | 8.5k |
| uv | 🔗 Link | 🔗 Link | 76.4k |
The lack of a standard dependency management approach, unlike npm for Node.js or cargo for Rust, is a known issue in the Python community.
This course doesn't mandate a specific package manager, but using one is essential. If you're already familiar with a package manager, continue using it. The best approach is to choose one you like and stick with it. While it's tempting to find the "perfect" package manager, they all accomplish the same goal with minor differences. For a somewhat recent comparison of Python environment management and packaging tools, see this blog post.
If you're new to package managers, I recommend that you use uv. It is becoming the de facto standard in the Python community due to its speed and ease of use. It combines the best features of pip and conda, allowing you to create virtual environments and manage dependencies seamlessly, including across multiple Python versions. The alternative (which has been the recommended approach for many years) is to use conda for creating virtual environments and pip for installing packages within those environments.
uv and conda+pip in this course
Until 2026, the recommended package managers for this course have been conda+pip. However, starting in 2026, we
will transition to recommending uv as the primary package manager, and conda will probably be phased out of the
course in a few years. Therefore, we try as much as possible to provide instructions for both conda+pip and
uv. However, I still expect to have missed a couple of places where only instructions for conda+pip are given.
If you find such places, please report them on the course GitHub repository (or directly to me) so that I can fix
them.
Python dependencies
Before we get started with the exercises, let's first talk a bit about Python dependencies. One of the most common ways
to specify dependencies in the Python community is through a requirements.txt file, which is a simple text file that
contains a list of all the packages that you want to install. The format allows you to specify the package name and
version number you want, using operators such as the following:
```txt
package1           # any version
package2 == x.y.z  # exact version
package3 >= x.y.z  # at least version x.y.z
package4 >  x.y.z  # newer than version x.y.z
package5 <= x.y.z  # at most version x.y.z
package6 <  x.y.z  # older than version x.y.z
package7 ~= x.y.z  # at least version x.y.z, but older than version x.(y+1).0
```
In general, all packages (should) follow the semantic versioning standard, which means that the
version number is split into three parts: x.y.z where x is the major version, y is the minor version and z is
the patch version. Specifying version numbers ensures code reproducibility. Without version numbers, you risk your code breaking when package maintainers change the API. This is especially important in machine learning, where reproducing the exact same model
is crucial. The most common alternative to semantic versioning is calendar versioning, where the
version number is based on the date of release, e.g., 2023.4.1 for a release on April 1st, 2023.
Finally, we also need to discuss dependency resolution, which is the process of figuring out which packages are compatible with each other. This is a complex problem, and different package managers use different algorithms for it. If a package manager takes a long time to install a package, it is most likely busy resolving dependencies. For example, attempting to install a requirements file like the following
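(the version numbers here are illustrative; the point is that matplotlib 3.8 in practice requires a newer numpy than 1.19)

```txt
matplotlib >= 3.8.0
numpy <= 1.19
```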
would simply fail, because no versions of matplotlib and numpy satisfying these constraints are compatible with each other. In this case, we would need to relax the constraints, for example as sketched below, to make it work.
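One relaxed version (again, the exact version numbers are only illustrative):

```txt
matplotlib >= 3.8.0
numpy >= 1.21
```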
Tip
You are only supposed to use one package manager in the course, and therefore you should either do the uv exercises or the conda+pip exercises, not both. The clearest difference between the two approaches is that uv forces you to use a project-based approach, i.e., anything you do with uv should be done inside a project folder, even for one-off scripts, while with conda+pip you can create a virtual environment anywhere on your system and use it wherever you want. For this reason, conda+pip may seem a bit more flexible and easier to use initially, while uv has a bit of a learning curve but is more structured and easier to manage in the long run. In the end, both approaches are valid and it is mostly a matter of personal preference which one you choose.
❔ Exercises (uv)
1. Download and install `uv` following the official installation guide. Verify your installation by running `uv --version` in a terminal; it should display the `uv` version number.

2. If you have successfully installed `uv`, then you should be able to execute the `uv` command in a terminal, which should print a help message listing the available subcommands.
3. I cannot recommend the uv documentation enough. It essentially goes through all the features of `uv` we will be using in the course. That said, let's first try to see how we can use `uv` to create virtual environments and manage dependencies:

    1. Try creating a new virtual environment that uses Python 3.11. What command should you execute to do this?

        Use Python 3.10 or higher

        We recommend using Python 3.10 or higher for this course. Generally, using the second latest Python version (currently 3.13) is advisable, as the newest version may lack support from all dependencies. Check the status of different Python versions here.
    2. After creating the virtual environment, a folder called `.venv` should have been created in your current directory (check this!). To run a script using the virtual environment, you can use the `uv run` command, sketched below; you can think of `uv run` = `python` inside the virtual environment.
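        A minimal sketch (assuming you have some script called `main.py` in the current folder):

        ```bash
        uv run python main.py
        ```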
    3. `uv pip` is a 1-to-1 replacement for `pip` that works directly within the virtual environment created by `uv`. Try installing a package using `uv pip`, for example `numpy`.
    4. Instead of calling `uv run` every time you want to execute a command in the virtual environment, you can also activate the virtual environment like with `venv` or `conda` (see the sketch below), which will change your terminal prompt to indicate that you are now inside the virtual environment. Instead of running `uv pip install` and `uv run`, you can now simply use `pip install` and `python` as you would normally do.
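        On Linux/macOS the activation looks like this (on Windows it would be `.venv\Scripts\activate` instead):

        ```bash
        source .venv/bin/activate
        ```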
    5. Which `uv` command gives you a list of all packages installed in your virtual environment?
4. The above is the very basics of `uv` and is actually not the recommended way of using `uv`. Instead, `uv` works best as a project-based package manager. Let's try that out:

    1. When you start a new project, you can initialize it with `uv init <project_name>`, which will create a new folder with the given project name and set up a virtual environment for you (for example, as sketched below). If you already have a pre-existing folder, you can also run `uv init` inside that folder to set it up as a `uv` project.
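        A minimal sketch, using `my_project` as the project name (matching the folder referenced in the next question):

        ```bash
        uv init my_project
        cd my_project
        ```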
        Which files have been created in the `my_project` folder and what do they do?

        Solution

        The following has been created:

        - a `.venv` folder containing the virtual environment as above
        - a `README.md` file for documenting your project
        - a `pyproject.toml` file for managing your project, more on this file later
    2. To add dependencies to your project, you can use the `uv add` command.
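        For example (any packages will do; these two are just placeholders):

        ```bash
        uv add pandas matplotlib
        ```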
        This will install the packages in your virtual environment and also add them to the `pyproject.toml` file (check this out!). An additional file called `uv.lock` has also been created; can you figure out what the purpose of this file is?

        Solution
        The `uv.lock` file is used to ensure reproducibility between different users of the project. It contains the exact versions of all packages installed in the virtual environment, including sub-dependencies. When another user wants to set up the same environment, they can use the `uv sync` command to install the exact same versions of all packages as specified in the `uv.lock` file.
    3. Another way to add dependencies to your project is to directly edit the `pyproject.toml` file. Try adding `scikit-learn` version `1.2.2` to your `pyproject.toml` file. Afterwards, what command should you execute to install the dependencies specified in the `pyproject.toml` file?
    4. Make sure that everything works as expected by creating a new script that imports all the packages you have installed and try running it using `uv run` (a sketch of such a script is shown below). It should run without any import errors if the previous steps were successful.
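        For example (assuming you added `pandas`, `matplotlib` and `scikit-learn` in the previous steps; adjust the imports to whatever you actually installed):

        ```python
        # check_imports.py - quick sanity check that the project environment works
        import matplotlib
        import pandas
        import sklearn

        print("All imports succeeded")
        ```

        and run it with `uv run python check_imports.py`.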
    5. Something you will encounter later in the course is the need to install dependencies for the development of your project, e.g., testing frameworks, linters, formatters and so on, which are not needed for the actual execution of your project. `uv` has a built-in way to handle this (sketched below). Try adding at least two development dependencies to your project and check how they are stored in the `pyproject.toml` file.
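        For example, assuming `pytest` and `ruff` as the development dependencies (any two will do):

        ```bash
        uv add --dev pytest ruff
        ```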
    6. `uv` also supports defining optional dependencies, i.e., dependencies that are only needed for specific use cases. For example, `pandas` supports the optional dependency `excel` for reading and writing Excel files. Try adding an optional dependency to your project and check how it is stored in the `pyproject.toml` file.

        Solution

        One way to do it is sketched below: in this case we are adding an optional dependency group called `dataframes` and to that group we are adding the package `pandas`, which will add a corresponding entry to your `pyproject.toml` file. Check the `pyproject.toml` file afterwards.
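        A sketch of the command and the resulting `pyproject.toml` entry (the exact version pin will depend on when you run it):

        ```bash
        uv add pandas --optional dataframes
        ```

        ```toml
        [project.optional-dependencies]
        dataframes = [
            "pandas>=2.0.0",
        ]
        ```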
        1. Optional dependencies are, as the name suggests, not installed by default when you call `uv run` or `uv sync`. How do you install optional dependencies?

        2. Finally, how do you specify which optional dependencies should be installed when executing `uv run`?

            Solution

            You can specify this in the `pyproject.toml` file under the `[tool.uv]` section (see docs); alternatively, setting `default-groups = "all"` will install all optional dependencies by default.
5. Let's say that you want to upgrade or downgrade the Python version you are running inside your `uv` project. How do you do that?

    Solution
    The recommended way is to pin the Python version by having a `.python-version` file in the root of your project with the desired Python version. This file can easily be created with the command shown below.
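    One way to do this (using Python 3.12 as an example; pick whichever version you need):

    ```bash
    uv python pin 3.12
    ```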
6. Assume you have a friend working on the same project as you and they are using `pip` together with good old `requirements.txt` files. How do you create a `requirements.txt` file from your `uv` project?
7. (Optional) `uv` also supports the notion of tools, which are external command line tools that you may use in multiple projects. Examples of such tools are `black`, `ruff`, `pytest` and so on (all of which you will encounter later in the course). These tools can be installed globally on your system by using the `uvx` (or `uv tool`) command. The sketch below installs the `cowsay` tool globally on your system and then executes it with the argument `"muuh"`. Try installing at least one tool and executing it.
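    For example (note that the exact flags depend on the `cowsay` version; recent releases take the text via `-t`):

    ```bash
    uvx cowsay -t "muuh"
    ```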
Alias uvr=uv run
I have personally found that typing `uv run` before every command can get a bit tedious. Therefore, I recommend creating a shell alias to simplify this. For example, in bash or zsh, you can add the following line to your `.bashrc` or `.zshrc` file:
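```bash
# shorthand for `uv run` (the name uvr is just a suggestion)
alias uvr="uv run"
```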
and then you can simply use `uvr` instead of `uv run`.
❔ Exercises (conda+pip)
Conda vs. Mamba
If you are using conda then you can also use mamba, which is a drop-in replacement for conda that is faster.
This means that any conda command can be replaced with mamba and it should work. Feel free to use mamba if
you are already familiar with conda or after having gone through the exercises below. Install instructions can
be found here.
For guidance on using conda, refer to the
cheat sheet
in the exercise folder.
1. Download and install `conda`. You are free to either install full `conda` or the much simpler version `miniconda`. Conda includes many pre-installed packages, while Miniconda is a smaller, minimal installation. Conda's larger size can be a disadvantage on smaller devices. Verify your installation by running `conda help` in a terminal; it should display the conda help message. If it doesn't work, you may need to configure a system variable to point to the conda installation.
2. If you have successfully installed conda, then you should be able to execute the `conda` command in a terminal.

    Conda will always tell you what environment you are currently in, indicated by the `(env_name)` in the prompt. By default, it will always start in the `(base)` environment.
3. Try creating a new virtual environment. Make sure that it is called `my_environment` and that it installs version 3.11 of Python. What command should you execute to do this?

    Use Python 3.10 or higher

    We recommend using Python 3.10 or higher for this course. Generally, using the second latest Python version (currently 3.12) is advisable, as the newest version may lack support from all dependencies. Check the status of different Python versions here.
4. Which `conda` command gives you a list of all the environments that you have created?
5. Which `conda` command gives you a list of the packages installed in the current environment?

    1. How do you easily export this list to a text file? Do this, and make sure you export it to a file called `environment.yaml`, as conda uses another format by default than `pip`.

    2. Inspect the file to see what is in it.

    3. The `environment.yaml` file you have created is one way to secure reproducibility between users, because anyone should be able to get an exact copy of your environment if they have your `environment.yaml` file. Try creating a new environment directly from your `environment.yaml` file and check that the packages being installed exactly match what you originally had.
6. As the introduction states, it is fairly safe to use `pip` inside `conda` today. What is the corresponding `pip` command that gives you a list of all `pip` installed packages? And how do you export this to a `requirements.txt` file?
7. If you look through the requirements that both `pip` and `conda` produce, you will see that they are often filled with a lot more packages than what you are using in your project. What you are interested in are the packages that you import in your code: `from package import module`. One way to get around this is to use the package `pipreqs`, which will automatically scan your project and create a requirements file specific to it. Let's try it out:

    1. Install `pipreqs`:
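        A straightforward way to do it (installing with pip into your active environment):

        ```bash
        pip install pipreqs
        ```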
    2. Either try out `pipreqs` on one of your own projects or try it out on some other online project. What does the `requirements.txt` file that `pipreqs` produces look like compared to the files produced by either `pip` or `conda`?
🧠 Knowledge check
1. Try executing a command that asks `pip` for an incompatible set of packages, for example the one sketched below.
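    One such command, reusing the conflicting matplotlib/numpy constraints from earlier in this module (the exact version numbers are only illustrative):

    ```bash
    pip install "matplotlib >= 3.8.0" "numpy <= 1.19"
    ```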
    Based on the error message you get, what would be a compatible way to install these?
This ends the module on setting up virtual environments. While the methods mentioned in the exercises are great ways to construct requirement files automatically, sometimes it is just easier to sit down and manually create the files, because that way you ensure that only the most necessary requirements are installed when creating a new environment.
