Command line interfaces
As discussed in the initial module, the command line offers a robust interface for interacting with your computer. You should already be comfortable executing basic Python commands in the terminal:
However, as projects increase in size and complexity, more sophisticated methods of interacting with your code become necessary. This is where a command line interface (CLI) becomes useful. A CLI allows you to define the user interface of your application directly in the terminal, and the best approach depends on the specific needs of your application.
In this module, we will explore three distinct methods for creating CLIs for your machine learning projects. Each method serves a slightly different purpose, and they can be combined within the same project. You may find some overlap between them, which is perfectly acceptable. The choice of which method to use depends on your specific requirements.
Project scripts
You may already be familiar with executable scripts. An executable script is a Python script that can be run directly
from the terminal without needing to call the Python interpreter. This has been possible in Python for a long time,
using a shebang line at the top of the script. However, we will explore a
specific method for defining executable scripts
using the standard pyproject.toml
file, as covered in this module.
❔ Exercises
-
We are going to assume that you have a training script in your project that you would like to be able to run from the terminal directly without having to call the Python interpreter. Let's assume it is located like this:
In your pyproject.toml file, add the following lines, adjusting the paths to match your project structure:
What do you think the train = "my_project.train:main" line does?
Solution
The line tells the installer (for example pip) to create an executable named train that calls the main function in the train.py file, located in the my_project package.
-
Now, all that is left to do is to install the project again in editable mode.
You should now be able to run the following command in the terminal:
Try it out and see if it works.
-
Add additional commands to your
pyproject.toml
file that allow you to run other scripts in your project from the terminal.
That is all there really is to it. You can now run your scripts directly from the terminal without having to call the Python interpreter. Some good examples of Python packages that use this approach are numpy, pylint and kedro.
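To make the entry point concrete, here is a minimal sketch of what a my_project/train.py module could look like. It assumes the entry train = "my_project.train:main" is declared under the [project.scripts] table of pyproject.toml. The body of main (a fake epoch loop) is purely hypothetical; the point is only that the generated executable calls main() without arguments and uses its return value as the exit code.

```python
import sys
from typing import Optional, Sequence


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Function that the generated `train` executable would call.

    The executable calls main() with no arguments, so we fall back to
    sys.argv. Accepting an explicit argv makes the function easy to test.
    """
    args = list(sys.argv[1:] if argv is None else argv)
    epochs = int(args[0]) if args else 1  # hypothetical positional argument
    for epoch in range(1, epochs + 1):
        print(f"epoch {epoch}/{epochs}")  # placeholder for real training code
    return 0  # becomes the exit code of the executable
```

After reinstalling the project, running train 3 in the terminal would then be expected to behave like calling main(["3"]).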
Command line arguments
If you have experience with Python, you are likely familiar with the argparse package, which enables you to pass arguments directly to your script via the terminal. argparse is a very simple way of constructing what is called a command line interface. However, one limitation of argparse is that it is not easy to define a CLI with subcommands. If we take git as an example, git is the main command, but it has multiple subcommands: push, pull, commit etc. that can all take their own arguments. This kind of CLI with subcommands is possible using only argparse, through its subparsers, but it quickly becomes verbose.
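For comparison, here is a minimal sketch of subcommands written with only argparse, using its add_subparsers method. The program name mytool and the train and evaluate subcommands are hypothetical and only print what they would do. Note how each subcommand needs its own parser object and how the dispatch has to be written by hand.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mytool")
    # Each subcommand gets its own parser with its own arguments.
    subparsers = parser.add_subparsers(dest="command", required=True)

    train = subparsers.add_parser("train", help="Train a model")
    train.add_argument("-o", "--output", default="model.ckpt")

    evaluate = subparsers.add_parser("evaluate", help="Evaluate a saved model")
    evaluate.add_argument("model_path")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    # Manual dispatch on the chosen subcommand.
    if args.command == "train":
        return f"training, saving to {args.output}"
    return f"evaluating {args.model_path}"
```

Calling main(["train", "-o", "m.ckpt"]) corresponds to running mytool train -o m.ckpt in the terminal.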
You could of course ask why we would want to define such a CLI at all. The main argument is that it gives users of our code a single entrypoint for interacting with our application, instead of multiple scripts. As long as all subcommands are properly documented, our interface should be simple to interact with (again, think of git, where each subcommand can be given the -h argument to get specific help).
Instead of using argparse, we are here going to look at the typer package. typer makes it easy to define subcommands and offers many other features, which we are not going to touch upon in this module. For completeness, we should also mention that typer is not the only package for doing this: another excellent framework for creating command line interfaces easily is click, which typer itself is built on.
❔ Exercises
-
Start by installing the typer package and remember to add the package to your requirements.txt file.
-
To get you started with typer, let's just create a simple hello world type of script. Create a new Python file called greetings.py and use the typer package to create a command line interface such that running the following lines
python greetings.py # should print "Hello World!"
python greetings.py --count=3 # should print "Hello World!" three times
python greetings.py --help # should print the help message, informing the user of the possible arguments
executes and gives the expected output. Relevant documentation.
Solution
An important detail with typer is that you need to provide type hints for the arguments, because typer uses them to parse and validate the command line input.
-
Next, let's try a slightly harder example. Below is a simple script that trains a support vector machine on the iris dataset.
iris_classifier.py
Implement a CLI for the script such that the following commands can be run
python iris_classifier.py --output 'model.ckpt' # should train the model and save it to 'model.ckpt'
python iris_classifier.py -o 'model.ckpt' # should be the same as above
Solution
We are here making use of the short name option in typer for giving a shorter alias to the --output option.
-
Next, let's create a CLI that has more than a single command. Continue working in the basic machine learning application from the previous exercise, but this time we want to define two separate commands:
python iris_classifier.py train --output 'model.ckpt'
python iris_classifier.py evaluate 'model.ckpt'
Solution
The only key difference between the two is that in the train command we define the output argument to be an optional parameter, i.e., we provide a default, whereas for the evaluate command it is a required parameter.
-
Finally, let's try to define subcommands for our subcommands, e.g., something similar to how git has the subcommand remote, which in itself has multiple subcommands like add, rename, etc. Continue on the simple machine learning application from the previous exercises, but this time define a CLI for these commands:
python iris_classifier.py train svm --kernel 'linear'
python iris_classifier.py train knn --n-neighbors 5
i.e., the train command now has two subcommands for training different machine learning models (in this case SVM and KNN), each of which takes arguments that are unique to that model. Relevant documentation.
_ vs -
When using typer, note that variables with _ in the name will be converted to - in the CLI. This means that if you have a variable n_neighbors in your code, you should use --n-neighbors in the CLI.
Success
-
(Optional) Let's try to combine what we have learned until now. Try to make your typer CLI into an executable script using the pyproject.toml file and try it out!
Solution
Assuming that our iris_classifier.py script from before is placed in the src/my_project folder, we should just add
and remember to install the project in editable mode
and you should now be able to run the following command in the terminal
This covers the basics of typer, but feel free to dive deeper into how the package can help you customize your CLIs. Check out this page on adding colors to your CLI or this page on validating the inputs to your CLI.
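The patterns from the exercises above can be sketched in a single toy application. This is a minimal sketch, not the course solution: it leaves out the actual scikit-learn training code and only prints what each command would do, and the hello command and the help texts are hypothetical. It shows an option with a default (--count), a short alias (-o), a required positional argument (model_path), nested subcommands through add_typer, and the conversion of n_neighbors to --n-neighbors.

```python
import typer

app = typer.Typer(help="Toy CLI that only prints what it would do.")
train_app = typer.Typer(help="Train a model.")
app.add_typer(train_app, name="train")  # `train` becomes a group of subcommands


@app.command()
def hello(count: int = 1) -> None:
    """Print 'Hello World!' COUNT times (the type hint makes --count an int)."""
    for _ in range(count):
        typer.echo("Hello World!")


@train_app.command()
def svm(
    kernel: str = "linear",
    output: str = typer.Option("model.ckpt", "--output", "-o", help="Where to save the model."),
) -> None:
    """Placeholder for training a support vector machine."""
    typer.echo(f"Would train an SVM (kernel={kernel}) and save it to {output}")


@train_app.command()
def knn(
    n_neighbors: int = 5,  # exposed as --n-neighbors on the command line
    output: str = typer.Option("model.ckpt", "--output", "-o", help="Where to save the model."),
) -> None:
    """Placeholder for training a k-nearest neighbors classifier."""
    typer.echo(f"Would train a KNN (n_neighbors={n_neighbors}) and save it to {output}")


@app.command()
def evaluate(model_path: str) -> None:
    """Placeholder for evaluation; no default, so MODEL_PATH is required."""
    typer.echo(f"Would evaluate the model stored at {model_path}")


# In a script, finish with:
#     if __name__ == "__main__":
#         app()
```

Saved as a script with app() called under the main guard, this should accept commands such as python cli.py train knn --n-neighbors 7 -o m.ckpt. When an app has only a single command, as in the greetings.py exercise, typer runs that command directly, so no subcommand name is needed.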
Non-Python code
The two sections above have shown you how to create a simple CLI for your Python scripts. However, when doing machine learning projects, you often have a lot of non-Python code that you would like to run from the terminal. In the learning modules you have completed so far, you have already encountered a couple of CLI tools that are used in our projects:
As we begin to move into the next couple of learning modules, we are going to encounter even more CLI tools that we need to interact with. Here is an example of a long command that you might need to run in your project in the future:
docker run -v $(pwd):/app -w /app --gpus all --rm -it my_image:latest python my_script.py --arg1 val1 --arg2 val2
This can be a lot to remember, and it can be easy to make mistakes. Instead, it would be nice if we could just do
that is, a command that is easier to remember because the hard-to-remember parts are hidden, while we are still able to configure it to our liking. To help with this, we are going to look at the invoke package.
invoke
is a Python package that allows you to define tasks that can be
run from the terminal. It is a bit like a more advanced version of the Makefile that
you might have encountered in other programming languages. Some good alternatives to invoke
are
just and task, but we have chosen to focus on
invoke
in this module because it can be installed as a Python package, making installation across different systems
easier.
❔ Exercises
-
Start by installing invoke and remember to add the package to your requirements.txt file.
-
Add a tasks.py file to your repository and try to just run
which should work but inform you that no tasks have been added yet.
-
Let's now try to add a task to the tasks.py file. The way to do this with invoke is to import the task decorator from invoke and then decorate a function with it:
from invoke import task
import os

@task
def python(ctx):
    """Print the path of the active Python interpreter."""
    # `which` is used on Unix-like systems, `where` on Windows
    ctx.run("which python" if os.name != "nt" else "where python")
The first argument of any task-decorated function is the ctx context argument, which implements the run method for running any command as we would run it in the terminal. In this case we have simply implemented a task that prints the path of the current Python interpreter, in a way that works on all operating systems. Check that it works by running:
-
Let's try to create a task that simplifies the process of git add, git commit, git push. Create a task such that the following command can be run
Implement it and use the command to commit the taskfile you just created!
-
As you have hopefully realized by now, the most important method in invoke is the ctx.run method, which actually runs the commands you want to run in the terminal. This method takes multiple additional arguments. Try out the arguments warn, pty, echo and explain in your own words what they do.
Solution
- warn: If set to True, the command will not raise an exception if it fails. This can be useful if you want to run multiple commands and you do not want the whole process to stop if one of the commands fails.
- pty: If set to True, the command will be run in a pseudo-terminal. Whether you want to enable this or not depends on the command you are running. Here is a good explanation of when/why you should use it.
- echo: If set to True, the command will be printed to the terminal before it is run.
-
Create a command that simplifies the process of bootstrapping a conda environment and installs the relevant dependencies of your project.
-
Assuming you have completed the exercises on using dvc for version control of data, let's also try to add a task that simplifies the process of adding new data. This is the list of commands that need to be run to add new data to a dvc repository: dvc add, git add, git commit, git push, dvc push. Try to implement a task that simplifies this process. It needs to take two arguments for defining the folder to add and the commit message.
-
As the final exercise, let's try to combine every way of defining CLIs we have learned about in this module. Define a task that does the following
- calls dvc pull to download the data
- calls an entrypoint my_cli with the subcommand train with the arguments --output 'model.ckpt'
- calls
That is all there is to it. You should now be able to define tasks that can be run from the terminal to simplify the process of running your code. We recommend that, as you go through the learning modules in this course, you slowly start to add tasks to your tasks.py file that simplify the process of running the code you are writing.
🧠 Knowledge check
-
What is the purpose of a command line interface?
Solution
A command line interface is a way for you to define the user interface of your application directly in the terminal. It allows you to interact with your code in a more advanced way than just running Python scripts.