Command line interfaces
As we already laid out in the very first module, the command line is a powerful tool for interacting with your computer. You should by now be familiar with running basic Python commands in the terminal.
However, as your projects grow in size and complexity, you will often find yourself in need of more advanced ways of interacting with your code. This is where command line interfaces (CLIs) come into play. A CLI can be seen as a way for you to define the user interface of your application directly in the terminal. There is therefore no right or wrong way of creating a CLI; it is all about what makes sense for your application.
In this module we are going to look at three different ways of creating a CLI for your machine learning projects. They serve slightly different purposes and can therefore be combined in the same project. You will most likely also notice that they overlap in some areas. That is completely fine, and it is up to you to decide which one to use in which situation.
Project scripts
You might already be familiar with the concept of executable scripts. An executable script is a Python script that can be run directly from the terminal without having to call the Python interpreter. This has long been possible in Python by including a so-called shebang line at the top of the script. However, we are going to look at a specific way of defining executable scripts using the standard pyproject.toml file, which you should have learned about in an earlier module.
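For reference, here is a minimal sketch of the shebang approach on Unix-like systems (the file name hello.py is hypothetical). The first line tells the operating system which interpreter to use, so after making the file executable with chmod +x hello.py it can be started as ./hello.py. Running a file this way does not work the same on Windows, which is one reason the pyproject.toml approach is more portable.

```python
#!/usr/bin/env python3
"""Hypothetical hello.py: after `chmod +x hello.py` it runs as ./hello.py."""


def main() -> str:
    """Print a greeting and return it (returning makes it easy to test)."""
    message = "Hello from an executable script!"
    print(message)
    return message


# Only runs when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main()
```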
❔ Exercises
-
We are going to assume that you have a training script in your project that you would like to be able to run directly from the terminal without having to call the Python interpreter. Let's assume it is the file train.py inside the my_project package.
In your pyproject.toml file, add the line train = "my_project.train:main" under the [project.scripts] table. You will need to alter the paths to match your project. What do you think the train = "my_project.train:main" line does?
Solution
The line tells Python that we want to create an executable script called train that should run the main function in the train.py file located in the my_project package.
-
Now all that is left to do is to install the project again in editable mode (pip install -e .), after which you should be able to run the train command directly in the terminal. Try it out and see if it works.
-
Add additional scripts to your pyproject.toml file that allow you to run other scripts in your project from the terminal.
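To demystify what the train = "my_project.train:main" line does, the sketch below uses the standard-library importlib.metadata.EntryPoint class (the .module and .attr properties need Python 3.9 or newer) to parse such a reference. Entries under [project.scripts] are registered in the console_scripts group, and the small wrapper script that gets installed roughly does the same thing as load(): import the module, look up the function and call it. Since my_project only exists in your own project, json:dumps from the standard library serves as a stand-in target.

```python
from importlib.metadata import EntryPoint

# The value "my_project.train:main" has the form "module.path:function".
ep = EntryPoint(name="train", value="my_project.train:main", group="console_scripts")
print(ep.module)  # my_project.train
print(ep.attr)    # main

# Loading an entry point imports the module and fetches the attribute.
# json:dumps is a stand-in target because my_project is hypothetical here.
demo = EntryPoint(name="demo", value="json:dumps", group="console_scripts")
dumps = demo.load()
print(dumps({"a": 1}))  # {"a": 1}
```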
That is all there really is to it. You can now run your scripts directly from the terminal without having to call the Python interpreter. Some good examples of Python packages that use this approach are numpy, pylint and kedro.
Command line arguments
If you have worked with Python for some time, you are probably familiar with the argparse package, which allows you to pass additional arguments to your script directly from the terminal.
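As a quick reminder of how argparse works, here is a minimal sketch with two hypothetical arguments, --lr and --epochs. Each add_argument call declares one option, and parse_args converts the raw strings using the given type.

```python
import argparse

parser = argparse.ArgumentParser(description="Train a model.")
# Hypothetical hyperparameters, each with a type and a default value.
parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
parser.add_argument("--epochs", type=int, default=10, help="number of epochs")

# A real script would call parser.parse_args() without arguments so that it
# reads sys.argv; the explicit list mimics `python train.py --lr 0.01 --epochs 5`.
args = parser.parse_args(["--lr", "0.01", "--epochs", "5"])
print(args.lr, args.epochs)  # 0.01 5
```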
argparse is a very simple way of constructing what is called a command line interface. However, one limitation of argparse is that it is not easy to define a CLI with subcommands. If we take git as an example, git is the main command, but it has multiple subcommands such as push, pull and commit that can each take their own arguments. This kind of CLI with subcommands is somewhat possible to build using only argparse, but it requires a few workarounds.
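For comparison, below is a minimal sketch of one level of git-style subcommands using argparse's add_subparsers (the program name mygit and its options are hypothetical). A single level works reasonably well; it is every additional level of nesting, such as git remote add, that requires creating further subparsers and dispatching on them by hand.

```python
import argparse

parser = argparse.ArgumentParser(prog="mygit", description="A toy git-like CLI.")
# dest="command" stores which subcommand was chosen; required=True (Python 3.7+)
# makes argparse report an error if no subcommand is given.
subparsers = parser.add_subparsers(dest="command", required=True)

push = subparsers.add_parser("push", help="push commits to a remote")
push.add_argument("remote", nargs="?", default="origin")

commit = subparsers.add_parser("commit", help="record changes")
commit.add_argument("-m", "--message", required=True)

# Mimics `mygit commit -m "initial commit"`; each subparser also gets its own -h.
args = parser.parse_args(["commit", "-m", "initial commit"])
print(args.command, args.message)  # commit initial commit
```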
You might of course ask why we would want to define such a CLI at all. The main argument is to give users of our code a single entrypoint for interacting with our application instead of multiple scripts. As long as all subcommands are properly documented, our interface should be simple to interact with (again, think of git, where each subcommand can be given the -h argument to get specific help).
Instead of using argparse, we are here going to look at the typer package. typer covers the functionality of argparse and additionally makes it easy to define subcommands, along with many other features that we are not going to touch upon in this module. For completeness we should also mention that typer is not the only package for doing this; another excellent framework for easily creating command line interfaces is click, which typer is itself built on top of.
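To show how little code subcommands take in typer, here is a minimal sketch of a hypothetical git-like CLI (say, mygit.py): every function decorated with @app.command() becomes a subcommand, and typer derives its arguments and options from the type hints. A parameter with a plain default becomes an option (push --remote), and typer.Option(..., "--message", "-m") declares a required option with a short alias.

```python
import typer

# Hypothetical toy CLI; the echo calls stand in for real git behaviour.
app = typer.Typer(help="A toy git-like command line interface.")


@app.command()
def push(remote: str = "origin") -> None:
    """Push commits to the given remote."""
    typer.echo(f"Pushing to {remote}")


@app.command()
def commit(message: str = typer.Option(..., "--message", "-m")) -> None:
    """Record changes; the commit message is required."""
    typer.echo(f"Committing: {message}")
```

To run it as a script, add `if __name__ == "__main__": app()` at the bottom of the file. Commands such as python mygit.py push --remote upstream, python mygit.py commit -m "first" and python mygit.py commit --help should then work, with help text generated from the docstrings.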
❔ Exercises
-
Start by installing the typer package, and remember to add the package to your requirements.txt file.
-
To get you started with typer, let's create a simple hello-world type of script. Create a new Python file called greetings.py and use the typer package to create a command line interface such that running the following lines

```bash
python greetings.py            # should print "Hello World!"
python greetings.py --count=3  # should print "Hello World!" three times
python greetings.py --help     # should print the help message, informing the user of the possible arguments
```

executes and gives the expected output.
Solution
An important detail with typer is that you need to provide type hints for the arguments, because typer uses them to determine how each argument should be parsed.
-
Next, let's try a slightly harder example: a simple script, iris_classifier.py, that trains a support vector machine on the iris dataset.
Implement a CLI for the script such that the following commands can be run:

```bash
python iris_classifier.py train --output 'model.ckpt'  # should train the model and save it to 'model.ckpt'
python iris_classifier.py train -o 'model.ckpt'        # should be the same as above
```
Solution
Here we make use of typer's support for short option names to give the --output option the shorter alias -o.
-
Next, let's create a CLI that has more than a single command. Continue working on the basic machine learning application from the previous exercise, but this time we want to define two separate commands:

```bash
python iris_classifier.py train --output 'model.ckpt'
python iris_classifier.py evaluate 'model.ckpt'
```
Solution
The key difference between the two commands is that in the train command we define the output argument as an optional parameter (i.e. we provide a default), whereas in the evaluate command the model checkpoint is a required parameter.
-
Finally, let's try to define subcommands for our subcommands, i.e. something similar to how git has the subcommand remote, which itself has multiple subcommands like add, rename etc. Continue with the simple machine learning application from the previous exercises, but this time define a CLI such that, e.g., the train command has two subcommands for training different machine learning models (in this case SVM and KNN), each taking arguments that are unique to that model.
-
(Optional) Let's try to combine what we have learned until now. Try to turn your typer CLI into an executable script using the pyproject.toml file and try it out!
Solution
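Assuming that our iris_classifier.py script from before is placed in the src/my_project folder, we just need to add an entry for it under [project.scripts] in pyproject.toml that points at the typer app object in the script, and then install the project again in editable mode. After that, the CLI can be run directly from the terminal using the script name you chose.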
This covers the basics of typer, but feel free to dive deeper into how the package can help you customize your CLIs. Check out the typer documentation pages on adding colors to your CLI and on validating the inputs to your CLI.
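As a reference for the nested-subcommand exercise, here is one possible sketch using typer's add_typer, which mounts a second Typer app as a command group. The model training is deliberately replaced by placeholder echo calls, and the option names (--kernel, --n-neighbors) are illustrative assumptions rather than the original iris script.

```python
import typer

app = typer.Typer()
# A separate Typer app mounted under the name "train" acts as a command group.
train_app = typer.Typer(help="Train one of several models.")
app.add_typer(train_app, name="train")


@train_app.command()
def svm(
    kernel: str = "linear",
    output: str = typer.Option("model.ckpt", "--output", "-o"),
) -> None:
    """Train a support vector machine (placeholder)."""
    typer.echo(f"Training SVM with kernel={kernel}, saving to {output}")


@train_app.command()
def knn(
    n_neighbors: int = 5,
    output: str = typer.Option("model.ckpt", "--output", "-o"),
) -> None:
    """Train a k-nearest neighbours classifier (placeholder)."""
    typer.echo(f"Training KNN with n_neighbors={n_neighbors}, saving to {output}")


@app.command()
def evaluate(model_checkpoint: str) -> None:
    """Evaluate a saved model (placeholder)."""
    typer.echo(f"Evaluating {model_checkpoint}")
```

With `if __name__ == "__main__": app()` added at the bottom, commands such as python iris_classifier.py train svm --kernel rbf -o svm.ckpt or python iris_classifier.py train knn --n-neighbors 3 should work; typer converts underscores in parameter names to dashes.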
Non-Python code
The two sections above have shown you how to create a simple CLI for your Python scripts. However, when doing machine learning projects, you often have a lot of non-Python code that you would like to run from the terminal. In the learning modules you have already completed, you have encountered a couple of CLI tools that are used in our projects.
As we move into the next couple of learning modules, we are going to encounter even more CLI tools that we need to interact with. Here is an example of a long command that you might need to run in your project in the future:
docker run -v $(pwd):/app -w /app --gpus all --rm -it my_image:latest python my_script.py --arg1 val1 --arg2 val2
This is a lot to remember, and it is easy to make mistakes. Instead, it would be nice if we could run a much shorter command that is easier to remember because the hard-to-remember details are hidden away, while still being able to configure it to our liking. To help with this, we are going to look at the invoke package.
invoke is a Python package that allows you to define tasks that can be run from the terminal. It is a bit like a more advanced version of the Makefile that you might have encountered in other programming languages. Some good alternatives to invoke are just and task, but we have chosen to focus on invoke in this module because it can be installed as a Python package, which makes installation across different systems easier.
❔ Exercises
-
Start by installing invoke, and remember to add the package to your requirements.txt file.
-
Add a tasks.py file to your repository and try running invoke --list, which should work but inform you that no tasks have been added yet.
-
Let's now try to add a task to the tasks.py file. The way to do this with invoke is to import the task decorator from invoke and then decorate a function with it:

```python
import os

from invoke import task


@task
def python(ctx):
    """Print the location of the Python interpreter in use."""
    ctx.run("which python" if os.name != "nt" else "where python")
```
The first argument of any task-decorated function is the ctx context argument, which implements the run method for running commands just as we would run them in the terminal. In this case we have simply implemented a task that prints the location of the current Python interpreter (using which on Unix-like systems and where on Windows), so it works on all operating systems. Check that it works by running invoke python.
-
Let's try to create a task that simplifies the process of running git add, git commit and git push. Create a task that runs all three with a single invoke command, implement it, and use the command to commit the task file you just created!
-
As you have hopefully realized by now, the most important method in invoke is the ctx.run method, which actually runs the commands you want to run in the terminal. This method takes multiple additional arguments. Try out the arguments warn, pty and echo and explain in your own words what they do.
Solution
warn: If set to True, the command will not raise an exception if it fails. This can be useful if you want to run multiple commands and do not want the whole process to stop if one of them fails.
pty: If set to True, the command will be run in a pseudo-terminal. Whether you want to enable this depends on the command you are running; some programs buffer their output or behave differently when they are not attached to a terminal, and interactive programs in particular usually need it.
echo: If set to True, the command will be printed to the terminal before it is run.
-
Create a task that simplifies the process of bootstrapping a conda environment and installing the relevant dependencies of your project.
-
Assuming you have completed the exercises on using dvc for version control of data, let's also try to add a task that simplifies the process of adding new data. These are the commands that need to be run to add new data to a dvc repository: dvc add, git add, git commit, git push and dvc push. Try to implement a task that simplifies this process. It needs to take two arguments: the folder to add and the commit message.
-
As the final exercise, let's try to combine every way of defining CLIs we have learned about in this module. Define a task that does the following:
- calls dvc pull to download the data
- calls an entrypoint my_cli with the subcommand train and the arguments --output 'model.ckpt'
- calls
That is all there is to it. You should now be able to define tasks that can be run from the terminal to simplify the process of running your code. We recommend that, as you go through the learning modules in this course, you gradually add tasks to your tasks.py file that simplify the process of running the code you are writing.
🧠 Knowledge check
-
What is the purpose of a command line interface?
Solution
A command line interface is a way for you to define the user interface of your application directly in the terminal. It allows you to interact with your code in a more advanced way than just running Python scripts.