By Aleksandra Pawlik, Ngoni Faya, Joseph Guhlin, Megan Guidry, Tom Harrop, Dinindu Senanayake
Rapid development of computational bioinformatics tools means we can more easily push research boundaries. However, this comes at a cost.
The complexity of the software chain that needs to be installed and configured to run advanced workflows results in researchers spending hours, if not days, trying to set up and debug the computational environment before they can even start to analyse their data. Complex setups also hinder reproducibility, one of the core principles of science.
Fortunately, there is a solution growing in popularity – containers.
Simply put, containers allow wrapping up software packages in an executable environment which can be moved from one computer to another with little regard to how each of those computers is set up. You can think of it as a “computer inside a computer”, contained in a relatively lightweight file.
Once created, containers make the analysis workflow fully reproducible without the need to install and configure the required software.
Containers were created to ease the life of software developers, but they quickly made their way into the research world. It’s therefore useful for researchers to understand more about container processes.
There are several container systems, the most popular being Docker. However, Docker is not the best fit for High-Performance Computing (HPC), the environment commonly used for bioinformatics analysis. Among other factors, running it typically requires root-level privileges that shared clusters rarely grant to their users. Singularity, with its security focus and high compatibility with Docker, is currently the best container solution for HPC.
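To make the Docker compatibility mentioned above a little more concrete, here is a minimal Python sketch that assembles the two Singularity commands a researcher would typically start with: `singularity pull`, which converts a Docker image into a local image file, and `singularity exec`, which runs a command inside it. The image `docker://ubuntu:22.04`, the file name `ubuntu_22.04.sif` and the helper function names are illustrative assumptions, not something taken from the webinar. The script only prints the commands; nothing is executed unless you call `run()` on a machine where Singularity is installed.

```python
import shutil
import subprocess


def pull_command(image_uri, sif_path):
    """Build the `singularity pull` command that saves image_uri as sif_path."""
    return ["singularity", "pull", sif_path, image_uri]


def exec_command(sif_path, tool_args):
    """Build the `singularity exec` command that runs tool_args inside sif_path."""
    return ["singularity", "exec", sif_path, *tool_args]


def run(commands):
    """Run each command in order, but only if Singularity is on the PATH."""
    if shutil.which("singularity") is None:
        raise RuntimeError("singularity was not found on this machine")
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical example image and output file, chosen for illustration only.
    image = "docker://ubuntu:22.04"
    sif = "ubuntu_22.04.sif"

    commands = [
        pull_command(image, sif),
        exec_command(sif, ["cat", "/etc/os-release"]),
    ]
    for command in commands:
        print(" ".join(command))
```

On an HPC system the printed lines can be pasted straight into a terminal or a job script. The same pattern applies to a bioinformatics tool image: swap in the image address published by the tool's developers and the command you want to run.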
In July 2020, Genomics Aotearoa, New Zealand eScience Infrastructure (NeSI) and Manaaki Whenua – Landcare Research (MWLR) came together to organise a webinar introducing Singularity containers for reproducible bioinformatics.
The webinar was delivered by two experienced container users, Joseph Guhlin and Tom Harrop, theoretical genomicists from the University of Otago. They gave an overview of containers and their main features, followed by an easy-to-follow tutorial, “Genome assembly of diatom Chr 17”.
The webinar attracted over 70 participants from New Zealand universities and CRIs, a turnout that underlines both the usefulness of containers and the widespread interest in them for modern-day research. Those unable to attend the webinar live are encouraged to watch a full recording on NeSI’s YouTube channel.
Learning Singularity, just like any other computational tool, takes time and practice. However, the potential gains for researchers are significant. They can easily reuse containers shared by their peers and by tool and pipeline developers, or create their own, making their research robust and reproducible.
It is also becoming best practice, and sometimes even a requirement, to publish not just datasets but also the workflows in containers alongside scientific papers.
- Containers on HPC and Cloud with Singularity by Pawsey Computing Centre
- Singularity Containers for Bioinformatics by Pawsey Computing Centre
- R Docker tutorial by ROpenSci Labs
Research is underpinning New Zealand’s response to economic, health, and environmental challenges. Creating opportunities for researchers to learn about containers and other digital best practices, then, is critical for ensuring tomorrow’s decisions are based on robust and reproducible research outputs.
Want to know more about what NeSI and Genomics Aotearoa are doing to support capability growth among New Zealand’s researchers? Check out a summary of their training partnership here or read about the growth of data analysis skills in NZ here. To request a future training topic, reach out to email@example.com or subscribe to NeSI’s training mailing list to get notified of future training events.