Enabling Cancer Interpretation At Scale For The Genomics England 100,000 Genomes Project

Perspectives on training and on-boarding users of the Genomics England Cancer Program

By Jawahar Swaminathan, Ph.D., Program Manager – Population Genomics (aided by Keira Cheetham, Ph.D., Staff Bioinformatics Scientist)

Illumina and Genomics England announced the Bioinformatics and Clinical Interpretation partnership (BCIP) in February 2016 with the aim to “develop a platform and knowledge base that can be used to improve and automate genome interpretation.” As part of this collaboration, Illumina developed a customized version of BaseSpace Variant Interpreter (BSVI)[1] for cancer and rare disease, including various backend services to allow integration between the Genomics England case dispatch pipeline and Illumina systems. What followed was a rigorous schedule of meetings between Genomics England and Illumina (read: long hours, late nights, lots of coffee and many meetings at Genomics England HQ in London!), leading to the development of essential features for cancer interpretation.

In June 2017, following multiple rounds of user acceptance testing and concordance checks, BSVI was adopted by Genomics England as the default interpretation solution. Illumina then began on-boarding users at the 13 Genomic Medicine Centres (GMCs), the recruiting hubs for the various regions of England, by organizing training sessions on the use of the software, with particular focus on the unique way data entered and left the system. This article looks back on these activities and how they are helping in the development of genome interpretation software that meets the diverse needs of Genomics England end users.

Figure 1: The Genomics England Genomic Medicine Centres (Image Courtesy: Genomics England Ltd.)

The GMC training sessions

Over the course of 2018, we carried out training and outreach activities across most of the GMCs. The GMCs are the recruitment hubs for the Genomics England 100,000 Genomes Project and comprise multiple hospitals centered around a geographical area that has the necessary expertise. All training activities were organized by the Genomics England Cancer Interpretation team and were also attended by a representative from Genomics England.

Some humorous takeaways:

  1. Long hours on a packed early-morning train from Cambridge (where we are based) to our destination city, including a hurriedly eaten lunch at a busy Costa Coffee at the hospital before the training (yes, almost every hospital in the UK has one)! Throw in the occasional aborted visit due to an alarmingly growing windscreen crack on a rental car, or boarding the wrong train, and you have the makings of a long and interesting day.
  2. Every NHS hospital looks the same: the usual 1960s concrete exterior, the same typeface on the signs and the same warren of corridors to the Clinical Genetics department.
  3. Working out how to use the display equipment in each hospital before attempting to figure out internet connectivity on slow, ageing hospital computer systems.
  4. Hot chocolate or a burrito at the local train station on the return leg as a treat for a job well done.
  5. Never work with children, animals, or live demos. Although we always got the live demo to work!

All training activities were conducted by my colleague Keira Cheetham and me, and involved a mix of presentations, live demos using cases specific to each GMC, and hands-on instruction on how to use the software and send results back to Genomics England for reporting. The training was also an opportunity for us to talk about the science of interpreting cancer genomes and how Illumina is facilitating greater insights into cancer with whole genome sequencing (WGS).

This was also a great opportunity to see how GMC users were working with the BCIP tools, and all feedback (both good and bad) was gratefully received. We also spoke about upcoming features in these sessions. Attendance varied from 2 to 10 users per GMC, and the venues ranged from really tight spaces (sometimes with windows!) to large meeting rooms and everything in between. What was consistent throughout, however, was the motivation and dedication of the NHS staff in delivering the best possible care to their patients recruited into the Genomics England 100,000 Genomes Project cancer program.

Illumina continues to work with Genomics England to extend its BCIP tools for rare disease interpretation. This offering will soon be available for user acceptance testing and, following that, could be used in Genomics England’s suite of clinical interpretation systems. In the meantime, the UK NHS has announced the commissioning of WGS for rare disease and cancer, to be offered throughout the health system. The cancer outreach activities that Keira and I carried out in 2018 will stand us in good stead for the next round of training for rare disease.

The Genomics England Cancer Outreach Program by numbers

  • ~76 GMC users across 11 GMCs trained
  • ~34 hours of training imparted
  • ~4000 miles travelled (all by British Rail, barring Belfast, Northern Ireland)

[1] The version of BSVI co-developed with Genomics England as part of the BCIP contains extensive customizations for their use cases and is not openly accessible to the public. Please contact your Illumina sales representative for guidance on how to use the publicly available version of BSVI.

New Sequence Quality Metrics in BaseSpace™ Sequence Hub

The Run Monitoring features in BaseSpace™ Sequence Hub (BSSH) enable users to remotely monitor the quality of their sequencing runs and troubleshoot sequencing errors. As part of our efforts to extend real-time Run Monitoring capabilities, we recently released new data quality metrics in BSSH.

% Occupancy for iSeq™ and MiniSeq™ instruments

In a previous release, we added the %Occupied metric to the Charts section of Run Monitoring for the NovaSeq™ systems. With this release, the metric is now also visible for iSeq and MiniSeq systems in BaseSpace Sequence Hub. It can be used to assess loading concentration on the flow cell.

For both patterned and non-patterned flow cells, % Occupancy is the percentage of possible clusters on the flow cell that contain DNA that can ultimately be sequenced. With patterned flow cells (such as iSeq), the number of nanowells on the patterned grid determines the total number of possible clusters. For non-patterned flow cells (such as MiniSeq), the total number of possible clusters is the number of non-duplicated spots identified by Real-Time Analysis (RTA) during template generation.
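To make the definition concrete, here is a minimal Python sketch of the ratio, assuming you already have the number of occupied clusters and the number of possible clusters (nanowells for a patterned flow cell, non-duplicated RTA spots for a non-patterned one). The function name and the counts are hypothetical illustrations, not BSSH or RTA output.

```python
def percent_occupied(occupied_clusters: int, possible_clusters: int) -> float:
    """% Occupancy: occupied clusters as a percentage of possible clusters."""
    if possible_clusters <= 0:
        raise ValueError("possible_clusters must be positive")
    return 100.0 * occupied_clusters / possible_clusters

# Patterned flow cell (e.g. iSeq): the denominator is the number of nanowells.
# These counts are made-up illustrative numbers, not instrument specifications.
patterned = percent_occupied(occupied_clusters=3_400_000, possible_clusters=4_000_000)

# Non-patterned flow cell (e.g. MiniSeq): the denominator is the number of
# non-duplicated spots identified by RTA during template generation.
non_patterned = percent_occupied(occupied_clusters=450_000, possible_clusters=500_000)

print(f"{patterned:.1f}% {non_patterned:.1f}%")  # 85.0% 90.0%
```

The only difference between the two flow cell types is where the denominator comes from; the calculation itself is the same.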

% Pass Filter (%PF) for all instruments

The Flow Cell chart in BaseSpace Sequence Hub has also been updated to include %Pass Filter (%PF) for all instruments. This allows users to determine whether particular tiles of a flow cell have unusual levels of %PF.
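As an illustration of the kind of tile-level check this chart supports, the sketch below computes %PF per tile (clusters passing filter as a percentage of all clusters on the tile) and flags tiles that deviate from the median tile. The tile names, counts, and the 10-percentage-point threshold are hypothetical; BSSH does not necessarily apply any such rule.

```python
from statistics import median

def percent_pf(clusters_pf: int, clusters_raw: int) -> float:
    """%PF: clusters passing filter as a percentage of all clusters on a tile."""
    return 100.0 * clusters_pf / clusters_raw

def flag_unusual_tiles(tiles: dict, max_deviation: float = 10.0) -> list:
    """Return tile names whose %PF differs from the median tile %PF by more than
    max_deviation percentage points (an arbitrary, illustrative threshold)."""
    pf = {name: percent_pf(pf_count, raw) for name, (pf_count, raw) in tiles.items()}
    mid = median(pf.values())
    return sorted(name for name, value in pf.items() if abs(value - mid) > max_deviation)

# Hypothetical (PF clusters, raw clusters) counts per tile.
tiles = {
    "1101": (900, 1000),
    "1102": (880, 1000),
    "1103": (910, 1000),
    "1104": (600, 1000),  # hypothetical outlier tile
}
print(flag_unusual_tiles(tiles))  # ['1104']
```

Comparing against the median rather than the mean keeps a single bad tile from dragging the reference value toward itself.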

With these enhancements, we have added capabilities that are currently not available in Sequence Analysis Viewer (SAV). SAV will be updated in the future so our users have a consistent experience across SAV and BSSH.

BaseSpace™ Clarity LIMS NovaSeq™ Integration Now Supports the S1 Flow Cell

Integration and interoperability between laboratory systems (or the lack thereof) remain a challenge for those performing next-generation sequencing (NGS) or other genomics studies.[i] To address this challenge, we developed version 2.2 of the integration between BaseSpace Clarity LIMS and the NovaSeq 6000 instrument, which now supports the NovaSeq S1 flow cell.

The NovaSeq S1 flow cell delivers up to 0.5TB of output in two days and is ideally suited for high-intensity sequencing applications. Users can now sequence up to 8 human genomes or 80 exomes per run in approximately 24 hours.[ii] Users of both BaseSpace Clarity LIMS and the NovaSeq 6000 instrument can now use this out-of-the-box integration to get up and running quickly.

The NovaSeq 6000 version 2.0 workflow in BaseSpace Clarity LIMS, which supports integration version 2.2.1

The integration helps users track samples throughout the workflow. Specifically, it:

  • Supports S1, S2, and S4 flow cells per sample
  • Supports different applications on the same flow cell
  • Calculates sample and reagent volumes based on the flow cell type
  • Creates an output file for use with liquid handling robots
  • Validates every step in the workflow

The integration also tracks sequencing run information in BaseSpace Clarity LIMS to help with troubleshooting or trending:

  • Run recipe files (JSON) are automatically generated to set up and initiate the run
  • Sample sheets, compatible with BaseSpace Sequence Hub and bcl2fastq v2.19, are automatically generated and placed directly on the NovaSeq 6000 instrument
  • Sequencing runs are tracked, and run metrics are parsed per lane and per flow cell
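The integration generates sample sheets automatically. To illustrate what such a file contains, here is a hypothetical helper that writes a minimal sheet with [Header], [Reads] and [Data] sections in the layout commonly used with bcl2fastq v2.x. The function name, the input structure, the IEMFileVersion value, and the index sequences are all assumptions, not the integration's actual output; check the bcl2fastq v2.19 documentation for the full set of supported sections and columns.

```python
import csv
import io

def build_sample_sheet(samples, read_lengths=(151, 151)):
    """Return a minimal sample sheet as a string (hypothetical helper).

    samples: iterable of dicts with keys "lane", "sample_id", "index", "index2".
    Raises ValueError if two samples in the same lane share an index pair,
    because their reads could not be demultiplexed.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["[Header]"])
    writer.writerow(["IEMFileVersion", "4"])
    writer.writerow([])
    writer.writerow(["[Reads]"])
    for length in read_lengths:  # one line per read, e.g. 2 x 151 cycles
        writer.writerow([length])
    writer.writerow([])
    writer.writerow(["[Data]"])
    writer.writerow(["Lane", "Sample_ID", "Sample_Name", "index", "index2"])
    seen = set()
    for s in samples:
        key = (s["lane"], s["index"], s["index2"])
        if key in seen:
            raise ValueError(f"duplicate index pair in lane {s['lane']}: {s['index']}+{s['index2']}")
        seen.add(key)
        writer.writerow([s["lane"], s["sample_id"], s["sample_id"], s["index"], s["index2"]])
    return out.getvalue()

# Made-up samples and indexes for illustration.
sheet = build_sample_sheet([
    {"lane": 1, "sample_id": "S1", "index": "ACGTACGT", "index2": "TGCATGCA"},
    {"lane": 1, "sample_id": "S2", "index": "GGTTCCAA", "index2": "TTGGAACC"},
])
print(sheet)
```

Checking index collisions at generation time catches a mistake that would otherwise only surface after the run, during demultiplexing.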

If you have questions about this integration, please contact Technical Support.

For Research Use Only. Not for use in diagnostic procedures.

[i] Next-Generation Sequencing Informatics: Challenges and … http://www.archivesofpathology.org/doi/10.5858/arpa.2015-0507-RA. Accessed November 14, 2017.

[ii]  Illumina.com. (2017). Illumina Releases NovaSeq S4 Flow Cell and NovaSeq Xp Workflow. [online] Available at: https://www.illumina.com/company/news-center/press-releases/2017/2308795.html [Accessed 16 Nov. 2017].

Characterizing Bacterial Single Isolates with BaseSpace™ Sequence Hub Apps

A guest blog, written by GoSeqIt

In an increasingly globalized world, bacteria can spread rapidly and easily. Furthermore, they often contain genes that make them resistant to antibiotics or confer high virulence. Sequencing the entire genome of bacteria enables a thorough characterization and thus makes it possible for researchers to monitor the spread of particular strains of bacteria or sets of genes.

In collaboration with the Illumina BaseSpace Sequence Hub development team, GoSeqIt has published two apps for characterizing bacterial single isolates: the Bacterial Analysis Pipeline app and the E. coli Serotyping app. Both are now available to BaseSpace Sequence Hub users.

The input for both apps is a bacterial complete or draft genome in FASTA format (only files with the extension .fa or .fasta are accepted).

The genomes may have been generated by either the BaseSpace SPAdes Genome Assembler app or the Velvet de novo Assembly app.
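Before launching either app, it can be worth checking the input locally. The helper below is a hypothetical sketch, not part of either app: it applies the extension rule stated above (matched exactly as written, so ".FASTA" or ".fna" would be rejected) plus a basic FASTA sanity check, and returns the number of contigs.

```python
from pathlib import Path

# Extensions accepted by both apps, as stated in this post.
ACCEPTED_EXTENSIONS = (".fa", ".fasta")

def validate_assembly(filename: str, content: str) -> int:
    """Return the number of contigs, or raise ValueError if the file looks unusable."""
    if Path(filename).suffix not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"{filename}: only .fa or .fasta files are accepted")
    contigs = 0
    for line_no, line in enumerate(content.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            contigs += 1  # each header starts a new contig
        elif contigs == 0:
            raise ValueError(f"line {line_no}: sequence found before the first '>' header")
    if contigs == 0:
        raise ValueError(f"{filename}: no FASTA records found")
    return contigs

print(validate_assembly("isolate.fasta", ">contig_1\nACGTACGT\n>contig_2\nGGCCTTAA\n"))  # 2
```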

Bacterial Analysis Pipeline App

The Bacterial Analysis Pipeline app first predicts the species of the bacterial draft genome based on the number of k-mers (subsequences of length k) shared between the input genome and the bacterial genomes in a reference database (1). Next, acquired antimicrobial resistance genes are identified using a BLAST-based approach, in which the nucleotide sequence of the input genome is compared to the genes in the ResFinder database (2). Depending on the identified species, Multilocus Sequence Typing (MLST) is performed, also using a BLAST-based approach (3). One hundred twenty-five (125) MLST schemes are currently available.
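The species-prediction step rests on a simple idea: count how many k-mers the input genome shares with each reference genome and report the best match. The sketch below is a toy illustration of that idea, not the app's implementation. It assumes canonical k-mers (a k-mer and its reverse complement count as the same), a default k of 16 chosen only for illustration, and tiny made-up sequences in place of a real reference database.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical_kmers(seq: str, k: int) -> set:
    """Set of canonical k-mers: each k-mer or its reverse complement, whichever sorts first."""
    seq = seq.upper()
    result = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rev_comp = kmer.translate(COMPLEMENT)[::-1]
        result.add(min(kmer, rev_comp))
    return result

def predict_species(query: str, references: dict, k: int = 16) -> tuple:
    """Return (species, shared k-mer count) for the reference sharing the most k-mers."""
    query_kmers = canonical_kmers(query, k)
    scores = {name: len(query_kmers & canonical_kmers(seq, k)) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy sequences; a real reference database holds complete bacterial genomes.
references = {"Species A": "ACGTACGTGGCC", "Species B": "TTTTGGGGAAAA"}
print(predict_species("ACGTACGTGG", references, k=4))  # ('Species A', 5)
```

Using canonical k-mers matters because an assembled contig may be stored in either orientation relative to the reference.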

If the input genome is recognized as belonging to Enterobacteriaceae or the Gram-positive bacteria (Enterococcus, Streptococcus, or Staphylococcus), BLAST is used to search for plasmid replicons in the PlasmidFinder database (4). Identified plasmids of the IncF, IncHI1, IncHI2, IncI1, IncN, or IncA/C type are further subtyped by plasmid MLST (4). Finally, genomes identified as Escherichia coli, Enterococcus sp., Listeria sp., or Staphylococcus aureus are compared to the VirulenceFinder database of known virulence genes (5). For more information, refer to the article “A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance” (6). Figure 1 illustrates the output for species prediction and MLST, while Figure 2 illustrates the output for the prediction of acquired antimicrobial resistance genes.

Figure 1: Example of output from the Bacterial Analysis Pipeline app for species prediction and MLST of the input genome.

Figure 2: Example of output from the Bacterial Analysis Pipeline app for acquired antimicrobial resistance genes in the input genome.
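Several of the steps described above (ResFinder, MLST, PlasmidFinder, VirulenceFinder) depend on deciding which BLAST hits count as a gene being present. A common approach is to require a minimum percent identity and a minimum coverage of the database gene. The sketch below assumes BLAST+ tabular output requested with -outfmt "6 qseqid sseqid pident length slen" and uses 90% identity and 60% coverage purely as example thresholds; these are assumptions, not the apps' documented settings, and the gene names are placeholders.

```python
def filter_hits(blast_tsv: str, min_identity: float = 90.0, min_coverage: float = 60.0) -> dict:
    """Return {database gene: best percent identity} for hits passing both thresholds.

    Coverage is approximated as alignment length / database gene length, which
    can slightly overestimate coverage when the alignment contains gaps.
    """
    best = {}
    for line in blast_tsv.strip().splitlines():
        qseqid, sseqid, pident, length, slen = line.split("\t")
        identity = float(pident)
        coverage = 100.0 * int(length) / int(slen)
        if identity >= min_identity and coverage >= min_coverage:
            best[sseqid] = max(identity, best.get(sseqid, 0.0))
    return best

# Made-up hits: geneB fails on identity, geneC fails on coverage.
hits = (
    "contig1\tgeneA\t99.8\t861\t861\n"
    "contig2\tgeneB\t85.0\t1200\t1200\n"
    "contig3\tgeneC\t100.0\t300\t900\n"
)
print(filter_hits(hits))  # {'geneA': 99.8}
```

Requiring coverage as well as identity prevents a short, highly similar fragment from being reported as a complete gene.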

E. coli Serotyping App

The E. coli Serotyping app uses a BLAST-based approach to predict the serotype of E. coli isolates by comparing the input genome with a database of specific O-antigen processing system genes for O typing and flagellin genes for H typing (7). The app outputs the predicted serotype along with the identified O-antigen genes (wzx, wzy, wzm, and wzt) and flagellin genes (fliC, flkA, fllA, flmA, and flnA).

Figure 3: Example of output from the E. coli Serotyping app. Currently, only E. coli isolates can be serotyped in silico this way.
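The serotype call combines two independent results: an O type from the O-antigen genes and an H type from the flagellin genes. The sketch below shows that final assembly step under stated assumptions. It takes hypothetical (gene, type label, percent identity) tuples, as might be derived from a BLAST search, keeps the highest-identity hit in each gene group, and uses "O?" and "H?" as made-up placeholders when no gene of a group was found. It is not the app's implementation.

```python
# Gene groups named in this post.
O_GENES = {"wzx", "wzy", "wzm", "wzt"}
H_GENES = {"fliC", "flkA", "fllA", "flmA", "flnA"}

def predict_serotype(hits) -> str:
    """hits: list of (gene, type_label, percent_identity) tuples.

    Returns "O:H" built from the best hit among O-antigen genes and the best
    hit among flagellin genes; "O?" / "H?" mark an untyped antigen.
    """
    best_o = max((h for h in hits if h[0] in O_GENES), key=lambda h: h[2], default=None)
    best_h = max((h for h in hits if h[0] in H_GENES), key=lambda h: h[2], default=None)
    o_type = best_o[1] if best_o else "O?"
    h_type = best_h[1] if best_h else "H?"
    return f"{o_type}:{h_type}"

# Made-up hits for illustration.
example_hits = [
    ("wzx", "O157", 99.9),
    ("wzy", "O157", 99.5),
    ("fliC", "H7", 100.0),
    ("fliC", "H19", 92.0),
]
print(predict_serotype(example_hits))  # O157:H7
```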

Using the New Apps

The price for using the Bacterial Analysis Pipeline app is 5 iCredits per uploaded file plus the cost of computing. The E. coli Serotyping app costs 1 iCredit per uploaded file plus the cost of computing.
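For budgeting, the per-file charges above add up simply. A minimal sketch, where the computing cost is a hypothetical input you would supply yourself, because it depends on the analysis and is not stated here:

```python
# Per-file charges as stated in this post.
ICREDITS_PER_FILE = {
    "Bacterial Analysis Pipeline": 5,
    "E. coli Serotyping": 1,
}

def estimate_icredits(app: str, n_files: int, compute_icredits: float = 0.0) -> float:
    """Per-file charge for the app times the number of files, plus computing cost.

    compute_icredits is a hypothetical, user-supplied estimate.
    """
    return ICREDITS_PER_FILE[app] * n_files + compute_icredits

# Ten assemblies run through both apps, excluding compute: 10 * 5 + 10 * 1 = 60 iCredits.
total = estimate_icredits("Bacterial Analysis Pipeline", 10) + estimate_icredits("E. coli Serotyping", 10)
print(total)  # 60.0
```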

Both apps use methods that have been thoroughly described and published in peer-reviewed scientific journals.

References

1) Larsen MV, Cosentino S, Lukjancenko O, Saputra D, Rasmussen S, Hasman H, Sicheritz-Pontén T, Aarestrup FM, Ussery DW, Lund O. Benchmarking of methods for genomic taxonomy. J Clin Microbiol. 2014 May;52(5):1529-39. PMID: 24574292.

2) Zankari E, Hasman H, Cosentino S, Vestergaard M, Rasmussen S, Lund O, Aarestrup FM, Larsen MV. Identification of acquired antimicrobial resistance genes. J Antimicrob Chemother. 2012 Nov;67(11):2640-4. PMID: 22782487.

3) Larsen MV, Cosentino S, Rasmussen S, Friis C, Hasman H, Marvig RL, Jelsbak L, Sicheritz-Pontén T, Ussery DW, Aarestrup FM, Lund O. Multilocus sequence typing of total-genome-sequenced bacteria. J Clin Microbiol. 2012 Apr;50(4):1355-61. PMID: 22238442.

4) Carattoli A, Zankari E, García-Fernández A, Voldby Larsen M, Lund O, Villa L, Møller Aarestrup F, Hasman H. In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing. Antimicrob Agents Chemother. 2014 Jul;58(7):3895-903. PMID: 24777092.

5) Joensen KG, Scheutz F, Lund O, Hasman H, Kaas RS, Nielsen EM, Aarestrup FM. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli. J Clin Microbiol. 2014 May;52(5):1501-10. PMID: 24574290.

6) Thomsen MC, Ahrenfeldt J, Cisneros JL, Jurtz V, Larsen MV, Hasman H, Aarestrup FM, Lund O. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance. PLoS One. 2016 Jun 21;11(6):e0157718. PMID: 27327771.

7) Joensen KG, Scheutz F, Lund O, Hasman H, Kaas RS, Nielsen EM, Aarestrup FM. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli. J Clin Microbiol. 2014 May;52(5):1501-10. PMID: 24574290.

For Research Use Only. Not for use in diagnostic procedures.

Upcoming BaseSpace Developer Conference in San Francisco!

We want to invite all of you to the BaseSpace Developer Conference in San Francisco! We’ve held many BaseSpace Developer Conferences around the world this year, including in Heidelberg, Singapore, and Bangalore, and most recently at the University of Tokyo in Japan!

First of all, we would like to thank all of our developers and speakers; you made this possible. We hope it was a great learning experience, and we look forward to the apps we can bring to BaseSpace. Also, a big shout-out to the University of Tokyo for hosting the event, and to our Illumina team in Japan.

The events showcase the new Native App Engine in BaseSpace, which lets developers easily adapt their command-line pipelines to the BaseSpace cloud infrastructure or an infrastructure of their choice.

During the event, developers are taken through a step-by-step walkthrough in which they build two separate BaseSpace applications by the end! For anyone who is interested in learning more about BaseSpace App development, the BaseSpace Developer Portal has extensive documentation for both Native and Web applications.

We also spend time interacting with developers and users directly to brainstorm ideas and answer any questions they may have.

We are hosting another BaseSpace Developer Conference in San Francisco on December 8th. If you are interested in attending, you can sign up here.

To get an idea of what’s in store when you attend one of our developer conferences, check us out on Twitter at #basedev2014.

For any further questions about BaseSpace App development, please view or post on the developer forum or contact us through BaseSpace support.

Introducing our First BaseSpace Labs Applications – FastQC and Velvet de novo Assembly

We are excited to announce two new applications in BaseSpace, FastQC and Velvet de novo Assembly.

Both applications are currently available to all users and were built by our internal R&D groups using the BaseSpace Native App Engine. They are also the first of many BaseSpace Labs Apps to come; the concept behind BaseSpace Labs Apps is explained in more detail below.

BaseSpace Labs Apps are Illumina’s internally developed applications that extend the functionality within BaseSpace.  Some BaseSpace Labs applications will be experimental or research focused, while others will be used as a step in a greater workflow.  The Apps are reviewed regularly by our team and put through the same review process as third-party apps.

BaseSpace Labs Apps are developed using an accelerated development process in order to make them available to BaseSpace users faster than the BaseSpace Core Apps.  It is important to note that, unlike BaseSpace Core Apps, BaseSpace Labs Apps are not officially supported by Illumina Customer Service.  Support for BaseSpace Labs applications is provided at the developer’s discretion and the apps are provided as-is without any warranty of any kind.

The FastQC app provides a quality assessment of sequence data generated on Illumina sequencers. FastQC for BaseSpace is based on the FastQC software developed by the Bioinformatics Group at the Babraham Institute. It provides a modular set of analyses that can be used to quickly assess whether there are any problems with the sequencing data before doing any additional analysis.

The above figure shows an example output from the FastQC app depicting the quality scores across all bases at each position in the reads. For an example of additional output generated by FastQC, please view this FastQC demo project.
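The per-position quality plot summarizes, for each cycle, the quality scores of all reads at that position. The sketch below computes only the mean per position, a simplification of FastQC's box plots, and assumes 4-line FASTQ records with Phred+33 quality encoding. The function name and reads are hypothetical.

```python
def mean_quality_per_position(fastq_text: str, phred_offset: int = 33) -> list:
    """Mean Phred quality at each read position, from 4-line FASTQ records."""
    totals = []
    counts = []
    lines = fastq_text.strip().splitlines()
    for qual in lines[3::4]:  # every 4th line, starting at the first quality line
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - phred_offset  # decode ASCII character to Phred score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]

# Two made-up reads: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2.
fastq = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n#I5I\n"
print(mean_quality_per_position(fastq))  # [21.0, 40.0, 30.0, 40.0]
```

Counting reads per position separately keeps the means correct when reads have different lengths, for example after trimming.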

The Velvet de novo Assembly app is a de novo assembly pipeline for bacterial samples using the Velvet assembler. One of its key features is an adapter trimming protocol optimized for the Nextera Mate-Pair library prep kit. An application note describing the de novo assembly of 9 different bacterial genomes using the Velvet de novo Assembly app can be found here. In many cases, a single contig representing the entire bacterial genome can be assembled. The figure below is an example of the output generated by the Velvet de novo Assembly app.

Example output generated by the Velvet de novo Assembly app can be found here. We hope you enjoy the FastQC and Velvet de novo Assembly apps. For any questions, feedback, or feature requests for these applications, please send an email to basespacelabs@illumina.com and include the name of the application. Thank you!
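The idea behind junction adapter trimming for the Velvet app's mate-pair support can be sketched in a few lines. This is a deliberately simplified illustration, not the app's trimming protocol: it assumes the 19-base Nextera junction adapter sequence shown below (check Illumina's Nextera Mate Pair documentation), looks only for exact matches in either orientation, truncates the read at the first match, and discards reads shorter than a hypothetical 25-base minimum. Real trimmers also handle partial and mismatched adapter occurrences.

```python
from typing import Optional

# Assumed Nextera junction adapter sequence and its reverse complement.
JUNCTION_ADAPTER = "CTGTCTCTTATACACATCT"
JUNCTION_ADAPTER_RC = "AGATGTGTATAAGAGACAG"

def trim_at_junction(read: str, min_length: int = 25) -> Optional[str]:
    """Truncate a read at the first exact occurrence of the junction adapter
    (either orientation). Return None if the remaining read is shorter than
    min_length, so it can be dropped."""
    hits = [pos for pos in (read.find(JUNCTION_ADAPTER), read.find(JUNCTION_ADAPTER_RC)) if pos != -1]
    trimmed = read[:min(hits)] if hits else read
    return trimmed if len(trimmed) >= min_length else None

read = "ACGT" * 10 + JUNCTION_ADAPTER + "GGGG"
print(len(trim_at_junction(read)))  # 40
```

Sequence after the junction comes from the other end of the circularized fragment, which is why the read is cut at the adapter rather than having only the adapter removed.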

DeepChek®-HIV – App for genotyping by NGS and inferred drug resistance testing – for research use only

HIV genotyping and inferred drug resistance testing have become an integral part of the clinical management of patients infected with HIV, and detecting minority populations of resistant viruses is now routine. Next-generation sequencing (NGS) technology is replacing Sanger sequencing, and end-to-end solutions combining sensitive genomic tests with advanced data management software platforms are in high demand.

DeepChek®-HIV is easy-to-use downstream analysis software for NGS data management, interpretation, and reporting, for Research Use Only. DeepChek is a reliable software and database solution capable of handling the complexity of NGS data for all the key genomic regions involved in HIV drug resistance (reverse transcriptase, protease, integrase, GP41, and GP120/V3). The database is regularly updated with the most recent drug resistance information and provides an efficient downstream analysis platform for clinical laboratories involved in routine HIV-1 genotyping and drug resistance testing.
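The post does not describe how DeepChek calls minority variants. Purely as an illustration of what detecting a minority population means, the sketch below computes the fraction of reads supporting each non-majority codon at a single position and reports those above a frequency cutoff. The 5% cutoff, the 100-read minimum depth, the function name, and the codon counts are all hypothetical.

```python
def minority_variants(codon_counts: dict, min_frequency: float = 0.05,
                      min_depth: int = 100) -> dict:
    """Return {codon: frequency} for non-majority codons at or above min_frequency.

    Returns an empty dict when total read depth is below min_depth, since
    low-frequency calls are unreliable at low coverage.
    """
    depth = sum(codon_counts.values())
    if depth < min_depth:
        return {}
    freqs = {codon: n / depth for codon, n in codon_counts.items()}
    majority = max(freqs, key=freqs.get)
    return {codon: f for codon, f in freqs.items() if codon != majority and f >= min_frequency}

# Made-up read counts per codon at one position (1000 reads in total).
counts = {"AAA": 900, "AGA": 80, "ACA": 20}
print(minority_variants(counts))  # {'AGA': 0.08}
```

Sanger sequencing typically cannot resolve variants present at only a few percent of the viral population, which is where read-level frequencies from NGS add value.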

Link to App in BaseSpace:

https://basespace.illumina.com/apps/414414/DeepChek-HIV 

Link to example dataset with example input data and output results:

https://basespace.illumina.com/s/krdEqmmpwTrn