Removing the NGS Analytics Data Bottleneck with Field-Programmable Gate Arrays (FPGAs)
The following is a guest blog, written by our partners at Edico Genome.
Demand for next-generation sequencing (NGS) analysis is growing exponentially, creating a shortage of computing power to analyze the rapidly growing body of data. Current projections1 indicate that genomic data will continue doubling every seven months, far outpacing Moore’s Law, which states that CPU capabilities double every two years (Figure 1, below). The gap between the two creates a bottleneck for genomics labs.
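To make the size of that gap concrete, here is a minimal sketch of the arithmetic, assuming only the two doubling periods quoted above (seven months for genomic data, two years for CPU capability). The function names are illustrative and do not come from any Edico Genome or Illumina tool.

```python
# Illustrative arithmetic only; the doubling periods are the ones quoted in the text.
DATA_DOUBLING_MONTHS = 7    # genomic data, per the cited projection
CPU_DOUBLING_MONTHS = 24    # Moore's Law, as stated in the text


def growth_factor(months: float, doubling_months: float) -> float:
    """Return the multiplicative growth after `months` for a given doubling period."""
    return 2.0 ** (months / doubling_months)


def demand_to_capacity_gap(months: float) -> float:
    """Return data growth divided by CPU growth after `months` (1.0 means no gap)."""
    data = growth_factor(months, DATA_DOUBLING_MONTHS)
    cpu = growth_factor(months, CPU_DOUBLING_MONTHS)
    return data / cpu


if __name__ == "__main__":
    for years in (1, 2, 5):
        months = 12 * years
        print(
            f"{years} yr: data x{growth_factor(months, DATA_DOUBLING_MONTHS):.1f}, "
            f"CPU x{growth_factor(months, CPU_DOUBLING_MONTHS):.1f}, "
            f"gap x{demand_to_capacity_gap(months):.1f}"
        )
```

Under these assumptions, two years of growth multiplies the data volume roughly elevenfold while CPU capability only doubles, so the shortfall compounds every year rather than staying constant.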
Providing an alternative to traditional CPU-based systems, Edico Genome’s DRAGEN™ (Dynamic Read Analysis for Genomics) Platform uses FPGA (Field-Programmable Gate Array) technology to give customers hardware-accelerated implementations of genome pipeline algorithms. With FPGAs, DRAGEN allows customers to analyze NGS data at unprecedented speeds with extremely high accuracy2 onsite, in the cloud, or through a hybrid cloud.
BaseSpace Sequence Hub, hosted on Amazon Web Services, enables the cloud-based deployment of the Edico Genome DRAGEN pipeline. Edico Genome’s DRAGEN Genome Pipeline is now readily available, enabling rapid analysis of whole genome sequencing and targeted resequencing panels.
Also co-authored by Eric Allen.
Recent advancements in the Illumina TruSeq Amplicon technology enable higher multiplexing of amplicons in a single assay. Combined with next-generation sequencing (NGS) on Illumina sequencers, this lets users perform high-throughput, high-sensitivity genotyping experiments. The new TruSeq Amplicon 3.0 BaseSpace® Sequence Hub App introduces major improvements to support a variety of amplicon sequencing applications, including the recently launched TruSeq Genotype Ne product. TruSeq Genotype Ne is a fully customizable targeted genotyping by sequencing (GBS) solution. Key GBS features of TruSeq Amplicon 3.0 include:
- Support for custom reference genomes, allowing a user to analyze amplicon data against their choice of FASTA file (previously uploaded to Sequence Hub).
- Genotypes of Interest reporting, allowing a user to generate a tabular report of genotypes for each sample, which is analogous to genotyping array outputs.
Example usage of the Genotypes of Interest feature can be found in the example Project below. The Input VCF (Variant Call Format) file in this Project (found in the test_NA12878_GOI output files) can be used as a template and customized for use with other datasets.
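As a rough illustration of what a tabular, array-style genotype report looks like, the sketch below turns the standard GT field of a VCF into one row per locus with one column per sample. It is not the TruSeq Amplicon app's implementation or its output format. The two-sample VCF snippet, the sample names, and the function names are all hypothetical, and only the standard VCF column layout is assumed.

```python
# Hypothetical two-sample VCF snippet; not the app's actual template file.
EXAMPLE_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB
chr1\t1000\tsnp1\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1
chr2\t2000\tsnp2\tC\tT\t.\tPASS\t.\tGT\t0/0\t./.
"""


def decode_genotype(gt: str, ref: str, alts: list[str]) -> str:
    """Translate a GT value such as '0/1' into allele letters such as 'A/G'."""
    alleles = [ref] + alts
    separator = "|" if "|" in gt else "/"
    # A '.' index means the call is missing and is passed through unchanged.
    calls = ["." if idx == "." else alleles[int(idx)] for idx in gt.split(separator)]
    return "/".join(calls)


def genotype_table(vcf_text: str) -> list[dict[str, str]]:
    """Build one row per variant with one genotype column per sample."""
    rows: list[dict[str, str]] = []
    samples: list[str] = []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the nine fixed columns
            continue
        chrom, pos, variant_id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        row = {"locus": f"{chrom}:{pos}", "id": variant_id}
        for sample, value in zip(samples, fields[9:]):
            row[sample] = decode_genotype(value.split(":")[gt_index], ref, alt.split(","))
        rows.append(row)
    return rows


if __name__ == "__main__":
    table = genotype_table(EXAMPLE_VCF)
    header = list(table[0].keys())
    print("\t".join(header))
    for row in table:
        print("\t".join(row[column] for column in header))
```

Keeping missing calls as `./.` rather than dropping them preserves the fixed sample-by-locus grid that makes this kind of report comparable to genotyping array output.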
Deep sequencing and high-throughput microarray technologies have enabled scientists to routinely generate hundreds of thousands, if not millions, of new data points in a single experiment. The extraordinary rate of data generation, finite resources, and focused research interests mean that most investigations follow up on only a small fraction of the data generated by next-generation sequencing (NGS) instruments.
Ten years ago, there were no services available to curate data. Researchers relied on homegrown tools to perform the cumbersome task of matching information to publications, but they didn’t have the expertise to do a re-analysis. A group of entrepreneurial scientists and bioinformaticists envisioned a need for solutions to handle the deluge of data that was coming as more and more genomes, from different kinds of species, were being sequenced. That vision became NextBio® Research, a genomics software platform that could match variant to variant sets and gene expression to DNA methylation to protein-DNA binding across a spectrum of organisms, saving researchers time and resources. By leveraging biomedical ontologies coupled with proto-machine learning algorithms, dynamic data-driven applications were added to aid in the discovery of novel relationships among diseases, compounds, gene perturbations, and pathways.
Next-Generation Sequencing (NGS) users often adopt BaseSpace Sequence Hub at times of change within their organizations. They may be new to NGS, or in the process of scaling up their operations, such as having purchased a new sequencing instrument – perhaps a NovaSeq™ Series instrument. In these situations, it can be challenging to estimate data storage and compute costs, creating uncertainty in the budgeting process.
To address this concern, we are excited to announce an unlimited data storage and compute plan for BaseSpace Sequence Hub that takes the uncertainty away. The plan enables new Sequence Hub customers to choose between the traditional pay-for-use plan and a fixed-price, unlimited plan covering all data storage and compute costs in the first year.
With the plan, new customers get unlimited data storage and have access to all of the apps in BaseSpace Sequence Hub without any additional cost. The plan includes Illumina-developed apps as well as third-party apps, such as the recently announced whole genome sequencing Apps from Edico Genome (coming soon). The unlimited plan eliminates any ambiguity associated with the cost of using BaseSpace Sequence Hub and allows customers to understand their usage patterns so they can comfortably estimate their expenses in subsequent years.
For Research Use Only. Not for use in diagnostic procedures.
Analyzing the genetic basis of a given tumor is important for understanding the progression of cancer and developing new methods of treatment. Cancer researchers use a variety of methods, but none of them efficiently covers all of the variations present in our genes. To help researchers address this challenge, Illumina offers TruSight® Tumor 170, a next-generation sequencing (NGS) assay designed to cover 170 genes associated with cancer.
To help TruSight Tumor customers analyze data from this assay, we are excited to announce two new apps in BaseSpace Sequence Hub:
Additionally, we have made a significant update to the Tumor Normal app. All 3 apps expand our portfolio of cancer research applications by delivering advanced new methods for NGS data generation and analysis.
Despite advances in sequencing technology, conducting studies from an informatics perspective can still be challenging. Managing, analyzing, and interpreting the large volume of data generated from genomic studies calls for a systematic, standardized, and pipeline-centric approach.1
To accommodate this type of approach, we have integrated BaseSpace Clarity LIMS and the NovaSeq Series instruments. The integration helps expedite genomic workflows and can potentially reduce the human error inherent in handling and managing samples in a laboratory.
To date, most of what we know about our genome comes from studying populations of cells. Although few would argue with how far we have come to understand our genome, many researchers now realize that it may be just as important to fully examine the heterogeneity that exists within the population of cells. Evidence suggests that bulk sequencing methods can mask the contribution of individual cells. As a result, many researchers are turning to an evolving technique: single-cell sequencing.
Pioneered in the 1990s by James Eberwine2 and made more robust by the analytical sensitivity and specificity of next-generation sequencing (NGS) methods,3 single-cell sequencing enables researchers to examine the heterogeneity of cells, and promises to reveal what role individual cells play in disease and complex biological systems.
How? For every cell sequenced, researchers have a comprehensive map of the transcriptome that can be analyzed in several different ways to characterize cells at single-cell resolution. Currently, 3 primary applications stand out: