Command line interfaces
As discussed in the initial module, the command line offers a robust interface for interacting with your computer. You should already be comfortable executing basic Python commands in the terminal.
However, as projects increase in size and complexity, more sophisticated methods of interacting with your code become necessary. This is where a command line interface (CLI) becomes useful. A CLI allows you to define the user interface of your application directly in the terminal, and the best approach depends on the specific needs of your application.
In this module, we will explore three distinct methods for creating CLIs for your machine learning projects. Each method serves a slightly different purpose, and they can be combined within the same project. Some overlap between them is expected and perfectly acceptable.
Project scripts
You may already be familiar with executable scripts. An executable script is a Python script that can be run directly
from the terminal without needing to call the Python interpreter. This has been possible in Python for a long time,
using a shebang line at the top of the script. However, we will explore a specific method for defining executable
scripts using the standard pyproject.toml file, as covered in this module.
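For reference, the shebang approach mentioned above can be sketched as follows. The file name train.py and the body of main are hypothetical placeholders; on Linux or macOS the file would also need to be marked executable (chmod +x train.py) before it can be started as ./train.py.

```python
#!/usr/bin/env python3
"""Hypothetical train.py: a script made directly executable via a shebang line."""


def main() -> str:
    """Stand-in for the real training logic; returns a status message."""
    message = "Training finished"
    print(message)
    return message


if __name__ == "__main__":
    main()
```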
❔ Exercises
- We are going to assume that you have a training script in your project that you would like to run directly from the terminal without having to call the Python interpreter. Let's assume it is the train.py file inside the my_project package. In your pyproject.toml file, add the following line to the [project.scripts] table, adjusting the path to match your project structure:
  train = "my_project.train:main"
  What do you think this line does?
  Solution: The line instructs the installer to create an executable script named train that executes the main function in the train.py file, located in the my_project package.
- Now, all that is left to do is to install the project again in editable mode (for example with pip install -e .). You should then be able to run the train command in the terminal. Try it out and see if it works.
- Add additional commands to your pyproject.toml file that allow you to run other scripts in your project from the terminal.
That is all there really is to it. You can now run your scripts directly from the terminal without having to call the Python interpreter. Some good examples of Python packages that use this approach are numpy, pylint and kedro.
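To make the module:function syntax concrete, the sketch below mimics, in simplified form, what a generated command does: import the module named before the colon, look up the object named after it, and call it. The helper name load_entry_point_target is hypothetical, and real installers generate their own launcher code; the pyproject.toml fragment in the comment assumes the my_project.train:main layout from the exercise.

```python
import importlib
from typing import Callable

# pyproject.toml fragment assumed from the exercise:
#
#   [project.scripts]
#   train = "my_project.train:main"


def load_entry_point_target(spec: str) -> Callable:
    """Resolve a "package.module:function" string to the function it names."""
    module_name, _, function_name = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# Demonstrated with a standard-library target so the sketch runs anywhere;
# an installed `train` command would resolve "my_project.train:main" similarly.
join = load_entry_point_target("os.path:join")
print(join("models", "model.ckpt"))
```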
Command line arguments
If you have experience with Python, you are likely familiar with the argparse package, which enables you to pass
arguments directly to your script via the terminal.
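A minimal sketch of this, assuming a hypothetical training script that takes a learning rate and an epoch count (the names --lr and --epochs are illustrative, not part of the course code):

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse command line arguments; argv=None means "read sys.argv"."""
    parser = argparse.ArgumentParser(description="Train a model (illustrative only).")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    parser.add_argument("--epochs", type=int, default=10, help="number of epochs")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Training for {args.epochs} epochs with learning rate {args.lr}")
```

Saved as train.py, this could be run as python train.py --lr 0.01 --epochs 5, and python train.py --help would list both options.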
argparse is a very simple way of constructing what is called a command line interface. However, one limitation of
argparse is that defining a CLI with subcommands is cumbersome. If we take git as an example, git is the
main command, but it has multiple subcommands: push, pull, commit, etc., that can all take their own arguments. This
kind of CLI with subcommands is possible using only argparse (through its subparsers), but it requires a fair
amount of boilerplate.
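For comparison, here is a sketch of git-style subcommands using only argparse. The subcommand names mirror the git example, but the program name mygit, the arguments (--force, --message), and the helpers build_parser and describe are illustrative assumptions. Note how each subcommand needs its own parser object and how dispatching on the chosen subcommand is left to us.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with two subcommands, each with its own arguments."""
    parser = argparse.ArgumentParser(prog="mygit")
    subparsers = parser.add_subparsers(dest="command", required=True)

    push = subparsers.add_parser("push", help="upload local commits")
    push.add_argument("--force", action="store_true", help="overwrite remote history")

    commit = subparsers.add_parser("commit", help="record staged changes")
    commit.add_argument("-m", "--message", required=True, help="commit message")
    return parser


def describe(args: argparse.Namespace) -> str:
    """Manual dispatch: we must branch on the subcommand name ourselves."""
    if args.command == "push":
        return f"push (force={args.force})"
    return f"commit with message {args.message!r}"


args = build_parser().parse_args(["commit", "-m", "initial commit"])
print(describe(args))  # prints: commit with message 'initial commit'
```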
You may ask why we would want to define such a CLI at all. The main argument is that it gives users of our code a
single entrypoint for interacting with our application instead of multiple scripts. As long as all subcommands are
properly documented, the interface should be simple to use (again, think of git, where each subcommand accepts the
-h argument to show its own help).
Instead of using argparse, we are going to look at the typer package. typer covers the same ground as argparse
while making it easy to define subcommands, along with many other features that we are not going to touch upon in
this module. For completeness, we should mention that typer is not the only package for doing this: another
excellent framework for creating command line interfaces easily is click, which typer itself is built on.
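As a first impression, a minimal typer sketch with two subcommands might look like this. The command names hello and goodbye and their parameters are illustrative assumptions; the type hints tell typer how to parse each value.

```python
import typer

app = typer.Typer()


@app.command()
def hello(name: str = "World", count: int = 1) -> None:
    """Print a greeting `count` times."""
    for _ in range(count):
        typer.echo(f"Hello {name}!")


@app.command()
def goodbye(name: str = "World", formal: bool = False) -> None:
    """Say goodbye, optionally in a formal tone."""
    if formal:
        typer.echo(f"Goodbye, {name}. Have a good day.")
    else:
        typer.echo(f"Bye {name}!")
```

Ending the file with a call to app() under the usual if __name__ == "__main__": guard turns it into a script. For a file saved under the hypothetical name cli.py, python cli.py hello --count 2 and python cli.py goodbye --formal would then both work, and --help is generated for the app and for each subcommand.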
❔ Exercises
- Start by installing the typer package (for example with pip install typer) and remember to add the package to your requirements.txt file.
- To get you started with typer, let's create a simple hello-world script. Create a new Python file called greetings.py and use the typer package to create a command line interface such that the following lines execute and give the expected output:
  python greetings.py            # should print "Hello World!"
  python greetings.py --count=3  # should print "Hello World!" three times
  python greetings.py --help     # should print the help message, informing the user of the possible arguments
  The typer documentation is relevant here.
  Solution: typer requires type hints for the arguments, because it uses them to decide how each command line value is parsed and validated.
- Next, let's try a slightly harder example. Consider a simple script, iris_classifier.py, that trains a support vector machine on the iris dataset. Implement a CLI for the script such that the following commands can be run:
  python iris_classifier.py --output 'model.ckpt'  # should train the model and save it to 'model.ckpt'
  python iris_classifier.py -o 'model.ckpt'        # should be the same as above
  Solution: This makes use of short option names in typer, which give the --output option the shorter alias -o.
- Next, let's create a CLI that has more than a single command. Continue working on the basic machine learning application from the previous exercise, but this time define two separate commands:
  python iris_classifier.py train --output 'model.ckpt'
  python iris_classifier.py evaluate 'model.ckpt'
  Solution: The only key difference between the two is that in the train command the output argument is an optional parameter, i.e., we provide a default, whereas for the evaluate command it is a required parameter.
- Finally, let's try to define subcommands for our subcommands, similar to how git has the subcommand remote, which in itself has multiple subcommands like add, rename, etc. Continue with the simple machine learning application from the previous exercises, but this time define a CLI for these commands:
  python iris_classifier.py train svm --kernel 'linear'
  python iris_classifier.py train knn --n-neighbors 5
  i.e., the train command now has two subcommands for training different machine learning models (in this case SVM and KNN), each of which takes arguments that are unique to that model. The typer documentation is relevant here.
  Note (_ vs -): When using typer, variables with _ in the name are converted to - in the CLI. This means that if you have a variable n_neighbors in your code, you should use --n-neighbors in the CLI.
- (Optional) Let's try to combine what we have learned so far. Try to make your typer CLI into an executable script using the pyproject.toml file and try it out!
  Solution: Assuming that the iris_classifier.py script from before is placed in the src/my_project folder, add a corresponding entry to the scripts table in pyproject.toml, remember to install the project in editable mode, and you should then be able to run the new command in the terminal.
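One possible shape for the nested train svm / train knn interface from the exercises above is sketched below. It only shows the CLI wiring: the function bodies are placeholders that echo their arguments instead of training anything, and the default values are assumptions. The nesting itself is done with add_typer, which mounts one typer app as a subcommand of another.

```python
import typer

app = typer.Typer(help="Iris classifier CLI (wiring sketch only).")
train_app = typer.Typer(help="Train a model.")
app.add_typer(train_app, name="train")


@train_app.command()
def svm(
    kernel: str = "linear",
    output: str = typer.Option("model.ckpt", "--output", "-o", help="Where to save the model."),
) -> None:
    """Placeholder for training a support vector machine."""
    typer.echo(f"Would train an SVM with kernel={kernel} and save it to {output}")


@train_app.command()
def knn(
    n_neighbors: int = 5,  # exposed on the command line as --n-neighbors
    output: str = typer.Option("model.ckpt", "--output", "-o", help="Where to save the model."),
) -> None:
    """Placeholder for training a k-nearest-neighbors classifier."""
    typer.echo(f"Would train a KNN with n_neighbors={n_neighbors} and save it to {output}")


@app.command()
def evaluate(model_path: str) -> None:
    """Placeholder for evaluating a saved model; model_path is a required argument."""
    typer.echo(f"Would evaluate the model stored at {model_path}")
```

As with any typer script, the file needs to end with a call to app() under an if __name__ == "__main__": guard before python iris_classifier.py train knn --n-neighbors 5 will run. A parameter without a default, such as model_path, becomes a required positional argument, while parameters with defaults become options.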
This covers the basics of typer, but feel free to dive deeper into how the package can help you customize your CLIs.
The typer documentation has pages on adding colors to your CLI and on validating the inputs to your CLI.
Non-Python code
The two sections above have shown you how to create a simple CLI for your Python scripts. However, machine learning projects often involve a lot of non-Python code that you would like to run from the terminal. In the learning modules you have completed so far, you have already encountered a couple of CLI tools that are used in our projects.
As we begin to move into the next couple of learning modules, we are going to encounter even more CLI tools that we need to interact with. Here is an example of a long command that you might need to run in your project in the future:
docker run -v $(pwd):/app -w /app --gpus all --rm -it my_image:latest python my_script.py --arg1 val1 --arg2 val2
This can be a lot to remember, and it can be easy to make mistakes. Instead, it would be nice if we could run a much
shorter command: one that is easier to remember because the hard-to-remember parts are hidden, but that we can still
configure to our liking. To help with this, we are going to look at the invoke package.
invoke is a Python package that allows you to define tasks that can be
run from the terminal. It is a bit like a more advanced version of the Makefile that
you might have encountered in other programming languages. Some good alternatives to invoke are
just and task, but we have chosen to focus on
invoke in this module because it can be installed as a Python package, making installation across different systems
easier.
❔ Exercises
- Start by installing invoke (for example with pip install invoke) and remember to add the package to your requirements.txt file.
- Add a tasks.py file to your repository and try to list the available tasks with invoke --list, which should work but inform you that no tasks have been added yet.
- Let's now try to add a task to the tasks.py file. The way to do this with invoke is to import the task decorator from invoke and then decorate a function with it:
  from invoke import task
  import os

  @task
  def python(ctx):
      """Print the path of the Python interpreter in use."""
      ctx.run("which python" if os.name != "nt" else "where python")
  The first argument of any task-decorated function is the ctx context argument, which implements the run method for running any command as we would run it in the terminal. In this case we have simply implemented a task that prints the location of the current Python interpreter in a way that works on all operating systems. Check that it works by running invoke python.
- Let's try to create a task that simplifies the process of git add, git commit, git push. Create a task that runs all three steps with a single command. Implement it and use the command to commit the task file you just created!
- As you have hopefully realized by now, the most important method in invoke is ctx.run, which actually runs the commands you want to run in the terminal. This method takes multiple additional arguments. Try out the arguments warn, pty, and echo, and explain in your own words what they do.
  Solution:
  - warn: If set to True, no exception is raised if the command fails. This can be useful if you want to run multiple commands and do not want the whole process to stop if one of them fails.
  - pty: If set to True, the command is run in a pseudo-terminal. Whether you want to enable this depends on the command you are running.
  - echo: If set to True, the command is printed to the terminal before it is run.
- Create a command that simplifies the process of bootstrapping a conda environment and installs the relevant dependencies of your project.
- Assuming you have completed the exercises on using dvc for version control of data, let's also try to add a task that simplifies the process of adding new data. This is the list of commands that need to be run to add new data to a dvc repository: dvc add, git add, git commit, git push, dvc push. Try to implement a task that simplifies this process. It needs to take two arguments, defining the folder to add and the commit message.
- As the final exercise, let's try to combine every way of defining CLIs we have learned about in this module. Define a task that does the following:
  - calls dvc pull to download the data
  - calls an entrypoint my_cli with the subcommand train with the arguments --output 'model.ckpt'
- calls
That is all there is to it. You should now be able to define tasks that can be run from the terminal to simplify the
process of running your code. We recommend that, as you go through the learning modules in this course, you gradually
add tasks to your tasks.py file that simplify the process of running the code you are writing.
🧠 Knowledge check
- What is the purpose of a command line interface?
  Solution: A command line interface is a way for you to define the user interface of your application directly in the terminal. It allows you to interact with your code in a more advanced way than just running Python scripts.