BaseSpace® Variant Interpreter competes in Next-generation Sequencing (NGS) Bioinformatics Challenge at European Congress of Pathology 2017
Differences in bioinformatics pipelines may contribute to substantial variability across labs, in terms of variant annotation, interpretation, and reporting. The lack of standardization is an emerging concern, especially given the growing availability of commercial bioinformatics software options that reduce the barrier for new labs to adopt next-generation sequencing (NGS). To demonstrate how differences in commercial software can influence analysis, organizers of the Two-Day Symposium for Molecular Biologists in Pathology at the European Congress of Pathology (ECP) 2017 set up an NGS Bioinformatics Challenge where both Illumina and QIAGEN were invited to participate.
The concept of the challenge was simple: 3 institutions in Germany (Universitätsklinikum Köln, Erlangen, and Charité in Berlin) contributed FASTQ files for a total of 12 tumor samples that were known to harbor pathogenic variants. These data were sent to Illumina and QIAGEN 2 months before the event and subjected to variant calling and interpretation using their commercially available offerings. Both Illumina and QIAGEN were blinded to the identity of the known variants. They reported their findings at a round-table session at ECP, where the organizers also revealed what the expected variants were and how each institution had interpreted them. To add an interesting twist, each contributing institution had used a different library prep (and sequencing platform) for their samples: the Berlin samples used the AmpliSeq Colon and Lung v2 hotspot panel and were sequenced on the Ion Torrent PGM; both Erlangen and Köln samples were sequenced on an Illumina MiSeq™ System, but the Erlangen samples used the Illumina TruSight® Tumor 15 prep and the Köln samples used a custom QIAGEN amplicon panel.
Removing the NGS Analytics Data Bottleneck with Field-Programmable Gate Arrays (FPGAs)
The following is a guest blog, written by our partners at Edico Genome.
Demand for next-generation sequencing (NGS) analysis is growing at an exponential rate, creating a shortage of computing power to analyze the rapidly growing body of data. Current projections1 estimate that genomic data will continue to double every seven months, a stark acceleration compared with Moore’s Law, which states that CPU capabilities will double every two years (Figure 1, below). The gap between the two creates a bottleneck for genomics labs.
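To make the two doubling rates concrete, here is a minimal Python sketch of the arithmetic. It assumes only the figures quoted above (data doubling every seven months, CPU capability doubling every two years) and idealized exponential growth; the function names are illustrative and not part of any Illumina or Edico Genome tool.

```python
# Illustrative arithmetic only: assumes smooth exponential growth at the
# doubling times quoted in the text. Function names are hypothetical.

DATA_DOUBLING_YEARS = 7 / 12   # genomic data: doubles every seven months
CPU_DOUBLING_YEARS = 2.0       # Moore's Law: doubles every two years


def growth_factor(years: float, doubling_time_years: float) -> float:
    """Return how many times a quantity multiplies over `years`."""
    return 2 ** (years / doubling_time_years)


def gap_after(years: float) -> float:
    """Return the ratio of data growth to CPU growth after `years`."""
    data = growth_factor(years, DATA_DOUBLING_YEARS)
    cpu = growth_factor(years, CPU_DOUBLING_YEARS)
    return data / cpu


if __name__ == "__main__":
    for years in (1, 2, 5):
        print(f"after {years} year(s): data outgrows CPU by {gap_after(years):.1f}x")
```

Under these assumptions, data volume outpaces CPU capability by roughly 5.4x after only two years, and the ratio keeps compounding from there.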
Providing an alternative to traditional CPU-based systems, Edico Genome’s DRAGEN™ (Dynamic Read Analysis for Genomics) Platform uses FPGA (Field-Programmable Gate Array) technology to provide customers with hardware-accelerated implementations of genome pipeline algorithms. With FPGAs, DRAGEN allows customers to analyze NGS data at unprecedented speeds with extremely high accuracy2 onsite, in the cloud, or through a blended hybrid cloud.
BaseSpace Sequence Hub, hosted on Amazon Web Services, enables the cloud-based deployment of the Edico Genome DRAGEN pipeline. Edico Genome’s DRAGEN Genome Pipeline is now readily available, enabling rapid analysis of whole genome sequencing and targeted resequencing panels.
Also co-authored by Eric Allen.
Recent advancements in the Illumina TruSeq Amplicon technology enable higher multiplexing of amplicons in a single assay. Combined with next-generation sequencing (NGS) from Illumina, these advancements let NGS users perform high-throughput, high-sensitivity genotyping experiments on Illumina sequencers. The new TruSeq Amplicon 3.0 BaseSpace® Sequence Hub App introduces major improvements to support a variety of amplicon sequencing applications, including the recently launched TruSeq Genotype Ne product. TruSeq Genotype Ne is a fully customizable targeted genotyping by sequencing (GBS) solution. Key GBS features of TruSeq Amplicon 3.0 include:
- Support for custom reference genomes, allowing a user to analyze amplicon data against their choice of FASTA file (previously uploaded to Sequence Hub).
- Genotypes of Interest reporting, allowing a user to generate a tabular report of genotypes for each sample, which is analogous to genotyping array outputs.
Example usage of the Genotypes of Interest feature can be found in the example Project below. The Input VCF (variant call file) in this Project (found in the test_NA12878_GOI output files) can be used as a template and customized for use with other datasets.
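To give a sense of what a tabular genotype report derived from a VCF might look like, here is a minimal Python sketch. It is not the TruSeq Amplicon app's implementation. It assumes a standard tab-delimited VCF 4.x layout with a `GT` key in the FORMAT column, and the function name `genotype_table` is hypothetical.

```python
import re

# Hypothetical helper, not part of the TruSeq Amplicon app. Assumes standard
# VCF 4.x columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT <samples...>


def genotype_table(vcf_lines):
    """Return (sample_names, rows), where each row is (site, [genotype per sample])."""
    samples = []
    rows = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        alleles = [ref] + alt.split(",")  # allele index 0 is REF, 1..n are ALT
        gt_index = fields[8].split(":").index("GT")
        calls = []
        for sample_field in fields[9:]:
            gt = sample_field.split(":")[gt_index]
            indices = re.split(r"[/|]", gt)  # unphased "/" or phased "|"
            calls.append(
                "/".join("." if i == "." else alleles[int(i)] for i in indices)
            )
        rows.append((f"{chrom}:{pos}", calls))
    return samples, rows


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
        "chr1\t100\trs1\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:30\t1/1:12",
    ]
    names, table = genotype_table(example)
    print("site\t" + "\t".join(names))
    for site, calls in table:
        print(site + "\t" + "\t".join(calls))
```

The output has one row per site and one column per sample, similar in shape to a genotyping array export.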
Deep sequencing and high-throughput microarray technologies have enabled scientists to routinely generate hundreds of thousands, if not millions, of new data points in a single experiment. The extraordinary rate of data generation, finite resources, and focused research interests limit most investigations to following up on only a small fraction of the data generated from next-generation sequencing (NGS) instruments.
Ten years ago, there were no services available to curate data. Researchers relied on homegrown tools to perform the cumbersome task of matching information to publications, but they didn’t have the expertise to do a re-analysis. A group of entrepreneurial scientists and bioinformaticians envisioned a need for solutions to handle the deluge of data that was coming as more and more genomes, from different kinds of species, were being sequenced. That vision became NextBio® Research, a genomics software platform that could match variant to variant sets and gene expression to DNA methylation to protein-DNA binding across a spectrum of organisms, saving researchers time and resources. By leveraging biomedical ontologies coupled with proto-machine learning algorithms, dynamic data-driven applications were added to aid in the discovery of novel relationships among diseases, compounds, gene perturbations, and pathways.
Next-Generation Sequencing (NGS) users often adopt BaseSpace Sequence Hub at times of change within their organizations. They may be new to NGS, or in the process of scaling up their operations, such as having purchased a new sequencing instrument – perhaps a NovaSeq™ Series instrument. In these situations, it can be challenging to estimate data storage and compute costs, creating uncertainty in the budgeting process.
To address this concern, we are excited to announce an unlimited data storage and compute plan for BaseSpace Sequence Hub that takes the uncertainty away. The plan lets new Sequence Hub customers choose between the traditional pay-for-use plan and a fixed-price, unlimited plan covering all data storage and compute costs in the first year.
With the plan, new customers get unlimited data storage and have access to all of the apps in BaseSpace Sequence Hub without any additional cost. The plan includes Illumina-developed apps as well as third-party apps, such as the recently announced whole genome sequencing Apps from Edico Genome (coming soon). The unlimited plan eliminates any ambiguity associated with the cost of using BaseSpace Sequence Hub and allows customers to understand their usage patterns so they can comfortably estimate their expenses in subsequent years.
For Research Use Only. Not for use in diagnostic procedures.
Despite advances in sequencing technology, conducting studies from an informatics perspective can still be challenging. Managing, analyzing, and interpreting the large volume of data generated from genomic studies calls for a systematic, standardized, and pipeline-centric approach.1
To accommodate this type of approach, we have integrated BaseSpace Clarity LIMS and the NovaSeq Series instruments. The integration helps expedite genomic workflows and can potentially reduce the human error inherent in handling and managing samples in a laboratory.
To date, most of what we know about our genome comes from studying populations of cells. Although few would dispute how far we have come in understanding our genome, many researchers now realize that it may be just as important to fully examine the heterogeneity that exists within the population of cells. Evidence suggests that bulk sequencing methods can mask the contribution of individual cells. As a result, many researchers are turning to an evolving technique: single-cell sequencing.
Pioneered in the 1990s by James Eberwine2 and made more robust by the analytical sensitivity and specificity of next-generation sequencing (NGS) methods,3 single-cell sequencing enables researchers to examine the heterogeneity of cells, and promises to reveal what role individual cells play in disease and complex biological systems.
How? For every cell sequenced, researchers have a comprehensive map of the transcriptome that can be analyzed in several different ways to characterize cells at single-cell resolution. Currently, 3 primary applications stand out: