Part 5: Hello Containers¶
In Parts 1-4 of this training course, you learned how to use the basic building blocks of Nextflow to assemble a simple workflow capable of processing some text, parallelizing execution if there were multiple inputs, and collecting the results for further processing.
However, you were limited to basic UNIX tools available in your environment. Real-world tasks often require various tools and packages not included by default. Typically, you'd need to install these tools, manage their dependencies, and resolve any conflicts.
That is all very tedious and annoying, so we're going to show you how to use containers to solve this problem much more conveniently.
A container is a lightweight, standalone, executable unit of software created from a container image that includes everything needed to run an application, including code, system libraries and settings. As you might imagine, that is going to be very helpful for making your pipelines more reproducible.
Note that we'll be teaching this using Docker, but keep in mind Nextflow supports several other container technologies as well.
How to begin from this section
This section of the course assumes you have completed Parts 1-4 of the Hello Nextflow course and have a complete working pipeline.
If you're starting the course from this point, you'll need to copy the modules directory over from the solutions:
0. Warmup: Run hello-containers.nf¶
We're going to use the workflow script hello-containers.nf as a starting point.
It is equivalent to the script produced by working through Part 4 of this training course, except we've changed the output destinations:
[Listing: hello-containers.nf]
Just to make sure everything is working, run the script once with nextflow run hello-containers.nf before making any changes:
Command output
As before, you will find the output files in the directory specified in the output block (results/hello_containers/).
Directory contents
If that worked for you, you're ready to learn how to use containers.
1. Use a container 'manually'¶
What we want to do is add a step to our workflow that will use a container for execution.
However, we are first going to go over some basic concepts and operations to solidify your understanding of what containers are before we start using them in Nextflow.
1.1. Pull the container image¶
To use a container, you usually download or pull a container image from a container registry, and then run the container image to create a container instance.
The general syntax is docker pull '<container>'.
The docker pull part is the instruction to the container system to pull a container image from a repository.
The '<container>' part is the URI address of the container image.
As an example, let's pull a container image that contains cowpy, a Python implementation of a tool called cowsay that generates ASCII art to display arbitrary text inputs in a fun way.
________________________
< Are we having fun yet? >
------------------------
\ ___-------___
\ _-~~ ~~-_
\ _-~ /~-_
/^\__/^\ /~ \ / \
/| O|| O| / \_______________/ \
| |___||__| / / \ \
| \ / / \ \
| (_______) /______/ \_________ \
| / / \ / \
\ \^\\ \ / \ /
\ || \______________/ _-_ //\__//
\ ||------_-~~-_ ------------- \ --/~ ~\ || __/
~-----||====/~ |==================| |/~~~~~
(_(__/ ./ / \_\ \.
(_(___/ \_____)_)
There are various repositories where you can find published containers.
We used the Seqera Containers service to generate this Docker container image from the cowpy Conda package: 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'.
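If you're curious how to read that URI, here is a small sketch that splits it into its parts using plain shell parameter expansion. The variable names are our own, and the split assumes the simple registry/repository:tag layout used here (no port number in the registry name, and a tag present).

```shell
# Image reference quoted in the text above
image='community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'

registry="${image%%/*}"        # everything before the first slash
remainder="${image#*/}"        # everything after the first slash
repository="${remainder%:*}"   # drop the ":tag" suffix
tag="${image##*:}"             # everything after the last colon

echo "registry:   $registry"
echo "repository: $repository"
echo "tag:        $tag"
```

The repository and tag it prints are the same ones that appear in the first line of the pull output (the tag, followed by "Pulling from library/cowpy").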
Run the complete pull command, docker pull 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273':
Command output
1.1.5--3db457ae1977a273: Pulling from library/cowpy
dafa2b0c44d2: Pull complete
dec6b097362e: Pull complete
f88da01cff0b: Pull complete
4f4fb700ef54: Pull complete
92dc97a3ef36: Pull complete
403f74b0f85e: Pull complete
10b8c00c10a5: Pull complete
17dc7ea432cc: Pull complete
bb36d6c3110d: Pull complete
0ea1a16bbe82: Pull complete
030a47592a0a: Pull complete
c23bdb422167: Pull complete
e1686ff32a11: Pull complete
Digest: sha256:1ebc0043e8cafa61203bf42d29fd05bd14e7b4298e5e8cf986504c15f5aa4160
Status: Downloaded newer image for community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273
community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273
If you've never downloaded the image before, this may take a minute to complete. Once it's done, you have a local copy of the container image.
1.2. Use the container to run cowpy as a one-off command¶
One very common way that people use containers is to run them directly, i.e. non-interactively. This is great for running one-off commands.
The general syntax is docker run --rm '<container>' [tool command].
The docker run --rm '<container>' part is the instruction to the container system to spin up a container instance from a container image and execute a command in it.
The --rm flag tells the system to remove the container instance once the command has completed, so stopped containers don't pile up.
The [tool command] syntax depends on the tool you are using and how the container is set up.
Let's just start with cowpy.
Fully assembled, the container execution command looks like this; go ahead and run it.
Command output
The system spun up the container, ran the cowpy command with its parameters, sent the output to the console and, finally, shut down and removed the container instance.
1.3. Use the container to run cowpy interactively¶
You can also run a container interactively, which gives you a shell prompt inside the container and allows you to play with the command.
1.3.1. Spin up the container¶
To run interactively, we just add -it to the docker run command.
Optionally, we can specify the shell we want to use inside the container by appending e.g. /bin/bash to the command.
Notice that your prompt changes to something like (base) root@b645838b3314:/tmp#, which indicates that you are now inside the container.
You can verify this by running ls / to list directory contents from the root of the filesystem:
We use ls here instead of tree because the tree utility is not available in this container.
You can see that the filesystem inside the container is different from the filesystem on your host system.
One limitation of what we just did is that the container is completely isolated from the host system by default. This means that the container can't access any files on the host system unless you explicitly allow it to do so.
We'll show you how to do that in a minute.
1.3.2. Run the desired tool command(s)¶
Now that you are inside the container, you can run the cowpy command directly and give it some parameters.
For example, the tool documentation says we can change the character ('cowacter') with -c.
Command output
Now the output shows the Linux penguin, Tux, instead of the default cow, because we specified the -c tux parameter.
Because you're inside the container, you can run the cowpy command as many times as you like, varying the input parameters, without having to bother with Docker commands.
Tip
Use the '-c' flag to pick a different character, including:
beavis, cheese, daemon, dragonandcow, ghostbusters, kitty, moose, milk, stegosaurus, turkey, turtle, tux
This is neat. What would be even neater is if we could feed our greetings.csv as input into this.
But since the container doesn't have access to the host filesystem, we can't.
Let's fix that.
1.3.3. Exit the container¶
To exit the container, you can type exit at the prompt or use the Ctrl+D keyboard shortcut.
Your prompt should now be back to what it was before you started the container.
1.3.4. Mount data into the container¶
As noted earlier, the container is isolated from the host system by default.
To allow the container to access the host filesystem, you can mount a volume from the host system into the container using the syntax -v <outside_path>:<inside_path>.
In our case <outside_path> will be the current working directory, so we can just use a dot (.), and <inside_path> is just an alias we make up; let's call it /my_project (the inside path must be absolute).
To mount a volume, we replace the paths and add the volume mounting argument to the docker run command as follows:
docker run --rm -it -v .:/my_project 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273' /bin/bash
This mounts the current working directory as a volume that will be accessible under /my_project inside the container.
You can check that it works by listing the contents of /my_project with ls /my_project:
Command output
You can now see the contents of the working directory from inside the container, including the greetings.csv file under data/.
This effectively established a tunnel through the container wall that you can use to access that part of your filesystem.
1.3.5. Use the mounted data¶
Now that we have mounted the working directory into the container, we can use the cowpy command to display the contents of the greetings.csv file.
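To do this, we'll use cat /my_project/data/greetings.csv | to pipe the contents of the CSV file into the cowpy command. With the turkey character, the full command is cat /my_project/data/greetings.csv | cowpy -c turkey.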
Command output
____________________
/ Hello,English,123 \
| Bonjour,French,456 |
\ Holà,Spanish,789 /
--------------------
\ ,+*^^*+___+++_
\ ,*^^^^ )
\ _+* ^**+_
\ +^ _ _++*+_+++_, )
_+^^*+_ ( ,+*^ ^ \+_ )
{ ) ( ,( ,_+--+--, ^) ^\
{ (\@) } f ,( ,+-^ __*_*_ ^^\_ ^\ )
{:;-/ (_+*-+^^^^^+*+*<_ _++_)_ ) ) /
( / ( ( ,___ ^*+_+* ) < < \
U _/ ) *--< ) ^\-----++__) ) ) )
( ) _(^)^^)) ) )\^^^^^))^*+/ / /
( / (_))_^)) ) ) ))^^^^^))^^^)__/ +^^
( ,/ (^))^)) ) ) ))^^^^^^^))^^) _)
*+__+* (_))^) ) ) ))^^^^^^))^^^^^)____*^
\ \_)^)_)) ))^^^^^^^^^^))^^^^)
(_ ^\__^^^^^^^^^^^^))^^^^^^^)
^\___ ^\__^^^^^^))^^^^^^^^)\\
^^^^^\uuu/^^\uuu/^^^^\^\^\^\^\^\^\^\
___) >____) >___ ^\_\_\_\_\_\_\)
^^^//\\_^^//\\_^ ^(\_\_\_\)
^^^ ^^ ^^^ ^
This produces the desired ASCII art of a turkey rattling off our example greetings! Except here the turkey is repeating the full rows instead of just the greetings. We already know our Nextflow workflow will do a better job!
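If you wanted just the greetings, one option is to keep only the first comma-separated field of each row before handing the text to cowpy. The sketch below shows that step on its own, in any shell, using the three rows from the speech bubble above as sample data. Inside the container, the same cut call could sit between cat and cowpy, assuming cut is installed in the image, which we have not checked here.

```shell
# Sample rows copied from the speech bubble above
rows='Hello,English,123
Bonjour,French,456
Holà,Spanish,789'

# Keep only the first comma-separated field of each row
greetings="$(printf '%s\n' "$rows" | cut -d',' -f1)"
echo "$greetings"
```

This prints Hello, Bonjour and Holà on separate lines.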
Feel free to play around with this command. When you're done, exit the container as before by typing exit at the prompt.
You will find yourself back in your normal shell.
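Back on the host, the same mount option also works for one-off commands, which is handy when you just want to check what the container can see. The following is a sketch rather than a required step: it assumes Docker is installed and running, and it uses "$(pwd)" instead of a dot because an absolute host path is the more conservative choice across Docker versions. The mount_spec variable name is our own.

```shell
image='community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'

# Host path on the left of the colon, mount point inside the container on the right
mount_spec="$(pwd):/my_project"

if command -v docker >/dev/null 2>&1; then
    # One-off, non-interactive listing of the mounted directory
    docker run --rm -v "$mount_spec" "$image" ls /my_project \
        || echo "docker run failed (is the Docker daemon running?)" >&2
else
    echo "docker not found on PATH; skipping" >&2
fi
```

Because there is no -it here, the container starts, lists the directory, and is removed again without ever giving you a prompt.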
Takeaway¶
You know how to pull a container and run it either as a one-off or interactively. You also know how to make your data accessible from within your container, which lets you try any tool you're interested in on real data without having to install any software on your system.
What's next?¶
Learn how to use containers for the execution of Nextflow processes.
2. Use containers in Nextflow¶
Nextflow has built-in support for running processes inside containers to let you run tools you don't have installed in your compute environment. This means that you can use any container image you like to run your processes, and Nextflow will take care of pulling the image, mounting the data, and running the process inside it.
To demonstrate this, we are going to add a cowpy step to the pipeline we've been developing, after the collectGreetings step.
Moo if you're ready to dive in!
2.1. Write a cowpy module¶
First, let's create the cowpy process module.
2.1.1. Create a file stub for the new module¶
Create an empty file for the module called cowpy.nf.
This gives us a place to put the process code.
2.1.2. Copy the cowpy process code into the module file¶
We can model our cowpy process on the other processes we've written previously.
[Listing: modules/cowpy.nf]
The process expects an input_file containing the greetings as well as a character value.
The output will be a new text file containing the ASCII art generated by the cowpy tool.
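Going by that description, and by the command that shows up later in the task log (cat ... | cowpy -c ... > cowpy-...), a minimal version of the module could look like the following. It is written as a shell heredoc so the file can be created in one step from the terminal; treat it as a sketch, since the layout and comments of the course's own file may differ. Note that it overwrites modules/cowpy.nf if the file already exists.

```shell
mkdir -p modules

# The quoted 'EOF' stops the shell from expanding ${...};
# Nextflow interpolates those variables itself at run time
cat > modules/cowpy.nf <<'EOF'
#!/usr/bin/env nextflow

// Generate ASCII art with cowpy
process cowpy {

    input:
    path input_file
    val character

    output:
    path "cowpy-${input_file}"

    script:
    """
    cat ${input_file} | cowpy -c "${character}" > cowpy-${input_file}
    """
}
EOF
```

The two inputs are declared in the order in which they will be passed when the process is called: first the file, then the character.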
2.2. Add cowpy to the workflow¶
Now we need to import the module and call the process.
2.2.1. Import the cowpy process into hello-containers.nf¶
Insert the import declaration include { cowpy } from './modules/cowpy.nf' above the workflow block.
Now the cowpy module is available to use in the workflow.
2.2.2. Add a call to the cowpy process in the workflow¶
Let's connect the cowpy() process to the output of the collectGreetings() process, which as you may recall produces two outputs:
- collectGreetings.out.outfile contains the output file <-- what we want
- collectGreetings.out.report contains the report file with the count of greetings per batch
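In the workflow block, add a call to cowpy() after the collectGreetings() call, passing it the output file and the character: cowpy(collectGreetings.out.outfile, params.character).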
Notice that we declared a new CLI parameter, params.character, in order to specify which character we want to have say the greetings.
2.2.3. Add the character parameter to the params block¶
This is technically optional but it's the recommended practice and it's an opportunity to set a default value for the character while we're at it.
Now we can be lazy and skip typing the character parameter in our command lines.
2.2.4. Update the workflow outputs¶
We need to update the workflow outputs to publish the output of the cowpy process.
2.2.4.1. Update the publish: section¶
In the publish: section of the workflow block, add an entry named cowpy_art that points to the output of the cowpy process.
The cowpy process only produces one output so we can refer to it the usual way by appending .out.
Now let's finish updating the workflow-level outputs.
2.2.4.2. Update the output block¶
We need to add the final cowpy_art output to the output block. While we're at it, let's also edit the publishing destinations since now our pipeline is complete and we know what outputs we really care about.
In the output block, make the following code changes:
Now the published outputs will be a bit more organized.
2.2.5. Run the workflow¶
Just to recap, this is what we are aiming for:
Do you think it's going to work?
Let's delete the previous published outputs to have a clean slate, and run the workflow with the -resume flag (nextflow run hello-containers.nf -resume).
Command output (edited for clarity)
N E X T F L O W ~ version 25.10.2
Launching `hello-containers.nf` [lonely_woese] DSL2 - revision: abf1dccf7f
executor > local (1)
[c9/f5c686] sayHello (3) [100%] 3 of 3, cached: 3 ✔
[ef/3135a8] convertToUpper (3) [100%] 3 of 3, cached: 3 ✔
[7f/f435e3] collectGreetings [100%] 1 of 1, cached: 1 ✔
[9b/02e776] cowpy [ 0%] 0 of 1 ✘
ERROR ~ Error executing process > 'cowpy'
Caused by:
Process `cowpy` terminated with an error exit status (127)
Command executed:
cat COLLECTED-batch-output.txt | cowpy -c "turkey" > cowpy-COLLECTED-batch-output.txt
Command exit status:
127
Command output:
(empty)
Command error:
.command.sh: line 2: cowpy: command not found
Work dir:
/workspaces/training/hello-nextflow/work/9b/02e7761db848f82db3c3e59ff3a9b6
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
-- Check '.nextflow.log' file for details
ERROR ~ Cannot access first() element from an empty List
-- Check '.nextflow.log' file for details
Oh no, there's an error!
Exit status 127 means the shell could not find the executable we asked for, which matches the "cowpy: command not found" message in the command error.
That makes sense, since we're calling the cowpy tool but we haven't actually specified a container yet (oops).
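You can reproduce that exit status without Nextflow or Docker. Here is a minimal sketch that uses a made-up command name, assumed not to exist on your system:

```shell
# Try to run a command name that should not exist anywhere on PATH
status=0
this_command_does_not_exist_12345 2>/dev/null || status=$?

echo "exit status: $status"
```

This prints exit status: 127, the same code Nextflow reported for the cowpy task.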
2.3. Use a container to run the cowpy process¶
We need to specify a container and tell Nextflow to use it for the cowpy() process.
2.3.1. Specify a container for cowpy¶
We can use the same image we were using directly in the first section of this tutorial.
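Edit the cowpy.nf module to add the container directive to the process definition, placing the line container 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273' at the top of the process body, before the input: block: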
[Listing: modules/cowpy.nf]
This tells Nextflow that if the use of Docker is enabled, it should use the container image specified here to execute the process.
2.3.2. Enable use of Docker via the nextflow.config file¶
Notice we said 'if the use of Docker is enabled'. By default, it is not, so we need to tell Nextflow it's allowed to use Docker. To that end, we are going to slightly anticipate the topic of the next and last part of this course (Part 6), which covers configuration.
One of the main ways Nextflow offers for configuring workflow execution is to use a nextflow.config file.
When such a file is present in the current directory, Nextflow will automatically load it in and apply any configuration it contains.
We provided a nextflow.config file with a single line of code that explicitly disables Docker: docker.enabled = false.
Now, let's switch that to true to enable Docker, so that the line reads docker.enabled = true.
Tip
It is possible to enable Docker execution from the command-line, on a per-run basis, using the -with-docker <container> parameter.
However, that only allows us to specify one container for the entire workflow, whereas the approach we just showed you allows us to specify a different container per process.
This is better for modularity, code maintenance and reproducibility.
2.3.3. Run the workflow with Docker enabled¶
Run the workflow with the -resume flag (nextflow run hello-containers.nf -resume):
Command output
N E X T F L O W ~ version 25.10.2
Launching `hello-containers.nf` [drunk_perlman] DSL2 - revision: abf1dccf7f
executor > local (1)
[c9/f5c686] sayHello (3) [100%] 3 of 3, cached: 3 ✔
[ef/3135a8] convertToUpper (3) [100%] 3 of 3, cached: 3 ✔
[7f/f435e3] collectGreetings [100%] 1 of 1, cached: 1 ✔
[98/656c6c] cowpy [100%] 1 of 1 ✔
This time it does indeed work! As usual you can find the workflow outputs in the corresponding results directory, though this time they are a bit more neatly organized, with only the report and the final output at the top level, and all intermediate files shoved out of the way into a subdirectory.
Directory contents
The final ASCII art output is in the results/hello_containers/ directory, under the name cowpy-COLLECTED-batch-output.txt.
File contents
_________
/ HOLà \
| HELLO |
\ BONJOUR /
---------
\ ,+*^^*+___+++_
\ ,*^^^^ )
\ _+* ^**+_
\ +^ _ _++*+_+++_, )
_+^^*+_ ( ,+*^ ^ \+_ )
{ ) ( ,( ,_+--+--, ^) ^\
{ (\@) } f ,( ,+-^ __*_*_ ^^\_ ^\ )
{:;-/ (_+*-+^^^^^+*+*<_ _++_)_ ) ) /
( / ( ( ,___ ^*+_+* ) < < \
U _/ ) *--< ) ^\-----++__) ) ) )
( ) _(^)^^)) ) )\^^^^^))^*+/ / /
( / (_))_^)) ) ) ))^^^^^))^^^)__/ +^^
( ,/ (^))^)) ) ) ))^^^^^^^))^^) _)
*+__+* (_))^) ) ) ))^^^^^^))^^^^^)____*^
\ \_)^)_)) ))^^^^^^^^^^))^^^^)
(_ ^\__^^^^^^^^^^^^))^^^^^^^)
^\___ ^\__^^^^^^))^^^^^^^^)\\
^^^^^\uuu/^^\uuu/^^^^\^\^\^\^\^\^\^\
___) >____) >___ ^\_\_\_\_\_\_\)
^^^//\\_^^//\\_^ ^(\_\_\_\)
^^^ ^^ ^^^ ^
And there it is, our beautiful turkey saying the greetings as desired.
2.3.4. Inspect how Nextflow launched the containerized task¶
As a coda to this section, let's take a look at the work subdirectory for the cowpy process call to get a bit more insight into how Nextflow works with containers under the hood.
Check the output from your nextflow run command to find the path to the work subdirectory for the cowpy process.
Looking at what we got for the run shown above, the console log line for the cowpy process starts with [98/656c6c].
That corresponds to the following truncated directory path: work/98/656c6c.
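To turn the truncated hash into the full directory name, you can let the shell complete it with a glob. The helper below is a sketch: find_task_dir is a name we made up, and it assumes task directories live under ./work unless you pass a different root as the second argument.

```shell
# Print the full task directory for a truncated hash such as "98/656c6c".
# work_root defaults to ./work.
find_task_dir() {
    short_hash="$1"
    work_root="${2:-work}"
    for dir in "$work_root"/"$short_hash"*/; do
        if [ -d "$dir" ]; then
            printf '%s\n' "${dir%/}"
            return 0
        fi
    done
    echo "no task directory matching $work_root/$short_hash*" >&2
    return 1
}

# Replace the hash with the one from your own console output
find_task_dir 98/656c6c || true
```

Once you have the path, ls -a on it will show the hidden .command.* files that Nextflow writes for each task.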
In that directory, you will find the .command.run file, the wrapper script that contains all the commands Nextflow ran on your behalf to set up and execute this task.
File contents
#!/bin/bash
### ---
### name: 'cowpy'
### container: 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'
### outputs:
### - 'cowpy-COLLECTED-batch-output.txt'
### ...
set -e
set -u
NXF_DEBUG=${NXF_DEBUG:=0}; [[ $NXF_DEBUG > 1 ]] && set -x
NXF_ENTRY=${1:-nxf_main}
nxf_sleep() {
sleep $1 2>/dev/null || sleep 1;
}
nxf_date() {
local ts=$(date +%s%3N);
if [[ ${#ts} == 10 ]]; then echo ${ts}000
elif [[ $ts == *%3N ]]; then echo ${ts/\%3N/000}
elif [[ $ts == *3N ]]; then echo ${ts/3N/000}
elif [[ ${#ts} == 13 ]]; then echo $ts
else echo "Unexpected timestamp value: $ts"; exit 1
fi
}
nxf_env() {
echo '============= task environment ============='
env | sort | sed "s/\(.*\)AWS\(.*\)=\(.\{6\}\).*/\1AWS\2=\3xxxxxxxxxxxxx/"
echo '============= task output =================='
}
nxf_kill() {
declare -a children
while read P PP;do
children[$PP]+=" $P"
done < <(ps -e -o pid= -o ppid=)
kill_all() {
[[ $1 != $$ ]] && kill $1 2>/dev/null || true
for i in ${children[$1]:=}; do kill_all $i; done
}
kill_all $1
}
nxf_mktemp() {
local base=${1:-/tmp}
mkdir -p "$base"
if [[ $(uname) = Darwin ]]; then mktemp -d $base/nxf.XXXXXXXXXX
else TMPDIR="$base" mktemp -d -t nxf.XXXXXXXXXX
fi
}
nxf_fs_copy() {
local source=$1
local target=$2
local basedir=$(dirname $1)
mkdir -p $target/$basedir
cp -fRL $source $target/$basedir
}
nxf_fs_move() {
local source=$1
local target=$2
local basedir=$(dirname $1)
mkdir -p $target/$basedir
mv -f $source $target/$basedir
}
nxf_fs_rsync() {
rsync -rRl $1 $2
}
nxf_fs_rclone() {
rclone copyto $1 $2/$1
}
nxf_fs_fcp() {
fcp $1 $2/$1
}
on_exit() {
local last_err=$?
local exit_status=${nxf_main_ret:=0}
[[ ${exit_status} -eq 0 && ${nxf_unstage_ret:=0} -ne 0 ]] && exit_status=${nxf_unstage_ret:=0}
[[ ${exit_status} -eq 0 && ${last_err} -ne 0 ]] && exit_status=${last_err}
printf -- $exit_status > /workspaces/training/hello-nextflow/work/98/656c6c90cce1667c094d880f4b6dcc/.exitcode
set +u
docker rm $NXF_BOXID &>/dev/null || true
exit $exit_status
}
on_term() {
set +e
docker stop $NXF_BOXID
}
nxf_launch() {
docker run -i --cpu-shares 1024 -e "NXF_TASK_WORKDIR" -v /workspaces/training/hello-nextflow/work:/workspaces/training/hello-nextflow/work -w "$NXF_TASK_WORKDIR" --name $NXF_BOXID community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273 /bin/bash -ue /workspaces/training/hello-nextflow/work/98/656c6c90cce1667c094d880f4b6dcc/.command.sh
}
nxf_stage() {
true
# stage input files
rm -f COLLECTED-batch-output.txt
ln -s /workspaces/training/hello-nextflow/work/7f/f435e3f2cf95979b5f3d7647ae6696/COLLECTED-batch-output.txt COLLECTED-batch-output.txt
}
nxf_unstage_outputs() {
true
}
nxf_unstage_controls() {
true
}
nxf_unstage() {
if [[ ${nxf_main_ret:=0} == 0 ]]; then
(set -e -o pipefail; (nxf_unstage_outputs | tee -a .command.out) 3>&1 1>&2 2>&3 | tee -a .command.err)
nxf_unstage_ret=$?
fi
nxf_unstage_controls
}
nxf_main() {
trap on_exit EXIT
trap on_term TERM INT USR2
trap '' USR1
[[ "${NXF_CHDIR:-}" ]] && cd "$NXF_CHDIR"
export NXF_BOXID="nxf-$(dd bs=18 count=1 if=/dev/urandom 2>/dev/null | base64 | tr +/ 0A | tr -d '\r\n')"
NXF_SCRATCH=''
[[ $NXF_DEBUG > 0 ]] && nxf_env
touch /workspaces/training/hello-nextflow/work/98/656c6c90cce1667c094d880f4b6dcc/.command.begin
set +u
set -u
[[ $NXF_SCRATCH ]] && cd $NXF_SCRATCH
export NXF_TASK_WORKDIR="$PWD"
nxf_stage
set +e
(set -o pipefail; (nxf_launch | tee .command.out) 3>&1 1>&2 2>&3 | tee .command.err) &
pid=$!
wait $pid || nxf_main_ret=$?
nxf_unstage
}
$NXF_ENTRY
If you search for nxf_launch in this file, you should see something like this:
nxf_launch() {
docker run -i --cpu-shares 1024 -e "NXF_TASK_WORKDIR" -v /workspaces/training/hello-nextflow/work:/workspaces/training/hello-nextflow/work -w "$NXF_TASK_WORKDIR" --name $NXF_BOXID community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273 /bin/bash -ue /workspaces/training/hello-nextflow/work/98/656c6c90cce1667c094d880f4b6dcc/.command.sh
}
As you can see, Nextflow is using the docker run command to launch the process call.
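It also mounts the pipeline's work directory into the container (the whole work directory rather than only this task's subdirectory, so the symlinked input files staged from other tasks still resolve), sets the working directory inside the container to the task's own subdirectory, and runs our templated bash script in the .command.sh file.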
All the hard work we had to do manually in the first section? Nextflow does it for us behind the scenes!
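If you'd rather run the nxf_launch search from the terminal than in an editor, grep can do it. This is a sketch: show_launch is our own helper name, and the path in the loop assumes the run shown above, so substitute the task directory from your own run.

```shell
# Print the nxf_launch function from a task's .command.run file.
# -A 2 also shows the two lines after the match, which covers the short function body.
show_launch() {
    grep -A 2 '^nxf_launch()' "$1"
}

# Look for the wrapper script of the cowpy task from the run shown above
for f in work/98/656c6c*/.command.run; do
    if [ -f "$f" ]; then
        show_launch "$f"
    fi
done
```

The docker run line it prints is the one to compare against the commands you typed by hand in the first section.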
_______________________
< Hurray for robots...! >
-----------------------
,-----.
| |
,--| |-.
__,----| | | |
,;:: | `_____' |
`._______| i^i |
`----| |---'| .
,-------._| |== ||//
| |_|P`. /'/
`-------' 'Y Y/'/'
.==\ /_\
^__^ / /'| `i
(oo)\_______ /' / | |
(__)\ )\/\ /' / | `i
||----w | ___,;`----'.___L_,-'`\__
|| || i_____;----\.____i""\____\
Takeaway¶
You know how to use containers in Nextflow to run processes.
What's next?¶
Take a break!
When you're ready, move on to Part 6: Hello Config to learn how to configure the execution of your pipeline to fit your infrastructure as well as manage configuration of inputs and parameters.
It's the very last part, and then you'll be done with this course!