Happy 10-Year Anniversary to BaseSpace® Correlation Engine

Deep sequencing and high throughput microarray technologies have enabled scientists to routinely generate hundreds of thousands if not millions of new data points in a single experiment. The extraordinary rate of data generation, finite resources, and focused research interests limit most investigations to follow up on only a small fraction of the data generated from next-generation sequencing (NGS) instruments.

Ten years ago, there were no services available to curate data. Researchers relied on homegrown tools to perform the cumbersome task of matching information to publications, but they didn’t have the expertise to do a re-analysis. A group of entrepreneurial scientists and bioinformaticians envisioned a need for solutions to handle the deluge of data that was coming as more and more genomes, from different kinds of species, were being sequenced. That vision became NextBio® Research, a genomics software platform that could match variants to variant sets and gene expression to DNA methylation to protein-DNA binding across a spectrum of organisms, saving researchers time and resources. By leveraging biomedical ontologies coupled with proto-machine learning algorithms, dynamic data-driven applications were added to aid in the discovery of novel relationships among diseases, compounds, gene perturbations, and pathways.

After the acquisition of NextBio by Illumina in 2014, one of the primary NextBio Research utilities was rebranded as BaseSpace® Correlation Engine. Today it stands as a key pillar in the BaseSpace® Informatics Suite.

The BaseSpace Correlation Engine public study library has grown steadily over the years. It now approaches 21,000 studies, with more than 130,000 experimental gene signatures collected and curated by a highly skilled team of scientists.* Illumina has also been working with ecosystem partners to improve the quality of the user experience. Over the past year, Illumina has partnered with Elsevier to create connectivity between BaseSpace Correlation Engine and Elsevier’s Pathway Studio. Users can apply data filters to their results or work with the public data and visualize functional relationships among genes found in an experiment.

BaseSpace Correlation Engine results have found their way into hundreds of peer-reviewed citations from distinguished universities around the world and from many of the top 25 large pharmaceutical organizations including:


Universities and Research Institutes

  • Sanford Burnham
  • Mayo Clinic
  • University of Pittsburgh
  • University of Southern California
  • Stanford
  • Harvard
  • Weill Cornell Medical College
  • Karolinska Institute
  • Emory University
  • Kyoto University
  • Vanderbilt
  • Yale
  • University of California, Davis

Pharmaceutical Organizations

  • Celgene
  • Sanofi
  • Regeneron
  • Boehringer-Ingelheim
  • Pfizer
  • Johnson & Johnson
  • Merck
  • Medimmune

Government Organizations

  • NIH-National Institute of Environmental Health Sciences
  • Health Canada
  • Environmental Protection Agency


*Based on internal database read as of 5/2017.

For Research Use Only. Not for use in diagnostic procedures.

Introducing unlimited data and compute plans for new BaseSpace® Sequence Hub customers

Next-generation sequencing (NGS) users often adopt BaseSpace Sequence Hub at times of change within their organizations. They may be new to NGS, or they may be scaling up their operations, for example after purchasing a new sequencing instrument such as a NovaSeq™ Series instrument. In these situations, it can be challenging to estimate data storage and compute costs, creating uncertainty in the budgeting process.

To address this concern, we are excited to announce an unlimited data storage and compute plan for BaseSpace Sequence Hub that takes the uncertainty away. New Sequence Hub customers can choose either the traditional pay-for-use plan or a fixed-price, unlimited plan covering all data storage and compute costs in the first year.

With the unlimited plan, new customers get unlimited data storage and access to all of the apps in BaseSpace Sequence Hub at no additional cost. The plan includes Illumina-developed apps as well as third-party apps, such as the recently announced whole genome sequencing apps from Edico Genome (coming soon). The unlimited plan eliminates any ambiguity associated with the cost of using BaseSpace Sequence Hub and allows customers to understand their usage patterns so they can comfortably estimate their expenses in subsequent years.

These plans are available for both the U.S. and Frankfurt sites. Please contact us to learn more.

For Research Use Only. Not for use in diagnostic procedures.

Advancing cancer research with new BaseSpace® Sequence Hub Apps 

Analyzing the genetic basis of a given tumor is important for understanding the progression of cancer and developing new methods of treatment. Cancer researchers use a variety of methods, but none of them efficiently covers all of the variations present in our genes. To help researchers address this challenge, Illumina offers TruSight® Tumor 170, a next-generation sequencing (NGS) assay designed to cover 170 genes associated with cancer.

To help TruSight Tumor customers analyze data from this assay, we are excited to announce two new apps in BaseSpace Sequence Hub:

  • TruSight Tumor 170 App
  • TruSight Tumor 170 + Watson for Genomics Converter App

Additionally, we have made a significant update to the Tumor Normal app. All three apps expand our portfolio of cancer research applications by delivering advanced, new methods for NGS data generation and analysis.

TruSight Tumor 170 App

The TruSight Tumor 170 app enables streamlined analysis of samples prepared using the TruSight® Tumor 170 library prep kit. This comprehensive somatic panel targets 170 genes and is based on hybrid capture technology optimized for formalin-fixed, paraffin-embedded (FFPE) samples. By using both DNA and RNA input sample pairs, TruSight Tumor 170 can detect small variants (single nucleotide polymorphisms (SNPs) and insertions/deletions (InDels)), amplifications, structural variants (gene fusions), and splice variants. The TruSight Tumor 170 app performs alignment and variant calling for all variant types in a single analysis workflow, and can analyze up to 16 samples (both DNA and RNA) in a single run.

TruSight Tumor 170 + Watson for Genomics Converter App

To efficiently extract information from a TruSight Tumor 170 sample, Illumina has partnered with IBM Watson for Genomics to expedite variant analysis. By leveraging natural language processing (a form of artificial intelligence), Watson for Genomics delivers an annotated, prioritized variant summary containing curated information on the significance of variants detected by sequencing. This curated information includes drug guidelines, clinical trials, and literature matches. TruSight Tumor 170 customers have the option to purchase add-on access to Watson for Genomics when buying the library prep kit. The TruSight Tumor 170 + Watson for Genomics app converts the output from the standard TruSight Tumor 170 app into variant calling files with a format suitable for upload into the Watson for Genomics portal. This app does not have a compute cost, but upload into Watson for Genomics requires purchase of the add-on license.

Tumor Normal App

Lastly, we have updated the Tumor Normal app to version 4.0. This version improves performance and provides updated variant callers for more accurate detection of somatic variants. As with version 3.0, the Tumor Normal app can detect small variants (SNPs and InDels), structural variants (gene fusions), and copy number variants (amplifications and deletions).

For more information, contact us.

For Research Use Only. Not for use in diagnostic procedures. 

BaseSpace® Clarity LIMS integration with NovaSeq™ Series Instruments

Despite advances in sequencing technology, conducting studies from an informatics perspective can still be challenging. Managing, analyzing, and interpreting the large volume of data generated from genomic studies calls for a systematic, standardized, and pipeline-centric approach [1].

To accommodate this type of approach, we have integrated BaseSpace Clarity LIMS and the NovaSeq Series instruments. The integration helps expedite genomic workflows and can potentially reduce the human error inherent in handling and managing samples in a laboratory.

Ready to use, this integration connects BaseSpace Clarity LIMS to the NovaSeq instrument with automated tracking and file generation. Users of both systems can:

  • Apply a pipeline-based approach from sample accessioning to secondary analysis of the data.
  • Positively track samples from sample accessioning to secondary analysis through automation and validation of sample indexes and reagent barcodes.
  • Automate sequencing run information and parse key sequencing metrics from the instrument back into BaseSpace Clarity LIMS.
  • Initiate secondary analysis by streaming sequencing information directly to BaseSpace Sequence Hub.


The protocol provides a series of validated steps, as shown in the illustration below.


The NovaSeq Series preconfigured protocol as seen in BaseSpace Clarity LIMS.

Additionally, the integration has several points at which users can validate it and efficiently test it before putting it into production. These points include:

Integration validation point 1

BaseSpace Clarity LIMS automatically calculates library normalization and pooling volumes. BaseSpace Clarity LIMS generates the run info file and the NovaSeq Sample Sheet including the Library tube ID, which are automatically placed into a specific network folder on the instrument.
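
The kind of arithmetic behind normalization can be illustrated with the standard dilution relation C1 × V1 = C2 × V2. The sketch below is a minimal illustration under stated assumptions, not the calculation BaseSpace Clarity LIMS actually performs: the function name, library names, concentrations, and volumes are all hypothetical.

```python
def normalization_volumes(stock_nm, target_nm, final_ul):
    """Return (library_ul, diluent_ul) needed to dilute one library.

    Uses C1 * V1 = C2 * V2. Argument names and units (nM, uL) are
    assumptions for illustration, not BaseSpace Clarity LIMS fields.
    """
    if stock_nm <= 0 or target_nm <= 0 or final_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_nm > stock_nm:
        raise ValueError("cannot dilute to a higher concentration")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


# Hypothetical libraries: name -> measured stock concentration in nM.
libraries = {"lib_A": 10.0, "lib_B": 4.0}

# Normalize every library to 2 nM in a final volume of 20 uL.
plan = {name: normalization_volumes(conc, 2.0, 20.0)
        for name, conc in libraries.items()}

for name, (lib_ul, diluent_ul) in plan.items():
    print(f"{name}: {lib_ul:.1f} uL library + {diluent_ul:.1f} uL diluent")
```

Pooling equal volumes of libraries normalized this way would give each library roughly equal representation, which is the usual goal of this step.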

Integration validation point 2

Key primary sequencing metrics, such as Yield, %Q30, %Reads PF, and Number of reads, are automatically parsed into BaseSpace Clarity LIMS. This parsing enables users to generate sequencing statistics and monitor sequencing instrument performance over time.
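
Once metrics like these are available per run, monitoring performance over time can be as simple as summarizing a metric and flagging runs that fall below a chosen threshold. The sketch below assumes the metrics have already been exported as plain records; the field names, values, and the 85% threshold are hypothetical and do not reflect the BaseSpace Clarity LIMS schema or API.

```python
from statistics import mean

# Hypothetical parsed metrics, oldest run first. Field names are
# illustrative and do not reflect the BaseSpace Clarity LIMS schema.
runs = [
    {"run_id": "run_001", "yield_gb": 950.0, "pct_q30": 91.2, "pct_pf": 80.1},
    {"run_id": "run_002", "yield_gb": 980.0, "pct_q30": 90.4, "pct_pf": 79.5},
    {"run_id": "run_003", "yield_gb": 720.0, "pct_q30": 82.7, "pct_pf": 70.3},
]


def flag_low_quality(run_records, min_pct_q30):
    """Return IDs of runs whose %Q30 falls below the chosen threshold."""
    return [r["run_id"] for r in run_records if r["pct_q30"] < min_pct_q30]


def summarize(run_records, field):
    """Return (minimum, mean, maximum) of one metric across runs."""
    values = [r[field] for r in run_records]
    return min(values), mean(values), max(values)


flagged = flag_low_quality(runs, min_pct_q30=85.0)
q30_min, q30_mean, q30_max = summarize(runs, "pct_q30")

print("Runs below threshold:", flagged)
print(f"%Q30 min/mean/max: {q30_min:.1f}/{q30_mean:.1f}/{q30_max:.1f}")
```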

Integration validation point 3

NovaSeq 6000 integrates with BaseSpace Sequence Hub, to which sequencing run details and sequencing data are sent automatically, making it even easier to trigger downstream analysis.

The complete integration is available for BaseSpace Clarity LIMS Gold users, and a simplified version is available to all BaseSpace Clarity LIMS users. The integration is currently compatible with S2 flow cells; additional functionality will be added as new flow cells become available.

For more information about this integration, please contact us.

For Research Use Only. Not for use in diagnostic procedures.
  1. “Big Biological Data: Challenges And Opportunities”. Sciencedirect.com. N.p., 2017. Web. 17 Apr. 2017.

Singling out solutions for single-cell analysis

To date, most of what we know about our genome comes from studying populations of cells. Although few would dispute how far we have come in understanding our genome, many researchers now realize that it may be just as important to fully examine the heterogeneity that exists within a population of cells. Evidence suggests that bulk sequencing methods can mask the contribution of individual cells. As a result, many researchers are turning to an evolving technique: single-cell sequencing.

Pioneered in the 1990s by James Eberwine [2] and made more robust by the analytical sensitivity and specificity of next-generation sequencing (NGS) methods [3], single-cell sequencing enables researchers to examine the heterogeneity of cells, and promises to reveal what role individual cells play in disease and complex biological systems.

How? For every cell sequenced, researchers have a comprehensive map of the transcriptome that can be analyzed in several different ways to characterize cells at single-cell resolution. Currently, three primary applications stand out:

  • Assessing cell-to-cell heterogeneity. In this application, researchers dissect cell subtypes in a heterogeneous population of cells using cell surface markers to characterize cell types within a population. Using this method, cells can be bioinformatically classified based on the expression levels of thousands of genes using dimensionality reduction methods, such as principal component analysis (PCA), followed by clustering. This process has even enabled the discovery of previously unknown cell types [4].
  • Mapping cell trajectories. Using this application, researchers can investigate cell lineage trajectories over time and possibly detect expression changes occurring in only a subset of cells or substates along a development path. Notably, in traditional bulk-cell sequencing approaches, these trajectories would be missed as they would be averaged across the population.
  • Dissecting transcriptional mechanics. Using this application, researchers can classify individual cells according to a gene’s transcription state, such as presence or absence of a transcription factor.
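
The point about bulk measurements averaging away a subpopulation can be made concrete with a small numeric sketch. The expression values and the threshold below are invented for illustration only; they are not data from any experiment described here.

```python
from statistics import mean

# Hypothetical expression values for one gene in ten cells: eight cells
# barely express it, two cells express it strongly.
cell_expression = [1, 0, 2, 1, 0, 1, 2, 1, 40, 44]

# A bulk measurement effectively reports one average for the population.
bulk_average = mean(cell_expression)


def split_by_threshold(values, threshold):
    """Partition per-cell values into (low, high) groups."""
    low = [v for v in values if v < threshold]
    high = [v for v in values if v >= threshold]
    return low, high


low_cells, high_cells = split_by_threshold(cell_expression, threshold=10)

print(f"bulk average: {bulk_average:.1f}")
print(f"{len(high_cells)} of {len(cell_expression)} cells are high "
      f"expressers (mean {mean(high_cells):.1f})")
```

The single bulk value sits between the two groups and describes neither of them, while the per-cell view shows a small, strongly expressing subset.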

Yet researchers who conduct single-cell sequencing still face throughput and analysis challenges, so with the potential for this method comes the need for more refined sequencing and bioinformatics tools.

A scalable, high throughput, and straightforward solution

To deliver on the promise of single-cell biology, the Illumina® Bio-Rad® Single-Cell Sequencing Solution combines the Bio-Rad Droplet Digital™ Technology with Illumina NGS library preparation, sequencing, and analysis technologies. This new platform provides a comprehensive workflow for single-cell RNA-Seq that enables controlled experiments with multiple samples, treatment conditions, and time points.

This co-developed solution supports transcriptome analysis of hundreds to thousands of single cells in one experiment, enabling researchers to apply the sensitivity and precision of RNA-Seq to questions that can only be answered by interrogating individual cells.

FlowJo Workflow

After sequencing, the single-cell sequencing data can be instantly transferred, stored, and analyzed securely in BaseSpace Sequence Hub. There, users can access the SureCell RNA Single-Cell App, which was specifically designed to support data analysis for the Illumina Bio-Rad Single-Cell Sequencing Solution. This app enables streamlined data analysis for up to 96 samples across multiple sequencing runs and performs:

  • Read 2 alignment using the STAR aligner
  • Cell barcode and unique molecular identifier (UMI) identification
  • UMI counting for each gene and associated statistics
  • Identification of good barcodes corresponding to single cells
  • Calculation of alignment, cell, and gene metrics

The app generates a BAM file, a cell and gene counts table, and a report that includes analysis metrics and plots.
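
To show what UMI counting means in the steps above, here is a minimal sketch that counts distinct UMIs per cell barcode and gene, so that PCR duplicates are counted once. It is not the SureCell RNA Single-Cell App's implementation: the read tuples are invented, and real pipelines typically also correct sequencing errors in barcodes and UMIs, which this sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical aligned reads as (cell_barcode, umi, gene) tuples.
reads = [
    ("AAAC", "TTGCA", "GAPDH"),
    ("AAAC", "TTGCA", "GAPDH"),  # PCR duplicate: same cell, UMI, and gene
    ("AAAC", "CCGTA", "GAPDH"),
    ("AAAC", "GGATC", "ACTB"),
    ("CCGT", "TTGCA", "GAPDH"),
]


def count_umis(read_records):
    """Count distinct UMIs per (cell barcode, gene) pair."""
    umis = defaultdict(set)
    for barcode, umi, gene in read_records:
        umis[(barcode, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


counts = count_umis(reads)
for (barcode, gene), n in sorted(counts.items()):
    print(barcode, gene, n)
```

Using a set per (barcode, gene) pair is what collapses duplicate reads: five reads above yield four counted molecules.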


The UMI cell plot indicates the total number of cells passing filter; the vertical threshold (red line) must pass through the first knee. The defining features are the two distinct curves, or knees, and the threshold, which indicate the number of valid cells detected in the sample.
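
One simple way to think about locating a knee is to rank barcodes by their total UMI count and look for the sharpest drop. The sketch below uses that crude heuristic on invented totals; it is an assumption made for illustration and is not how the app itself sets the threshold.

```python
import math


def knee_threshold(umi_totals):
    """Return (n_cells, cutoff) from per-barcode UMI totals.

    Crude heuristic: rank barcodes by total UMIs and place the cutoff at
    the largest drop in log10(count) between neighbouring ranks.
    """
    ranked = sorted((c for c in umi_totals if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two non-empty barcodes")
    drops = [math.log10(ranked[i]) - math.log10(ranked[i + 1])
             for i in range(len(ranked) - 1)]
    knee = max(range(len(drops)), key=drops.__getitem__)
    return knee + 1, ranked[knee]


# Hypothetical totals: five real cells followed by background barcodes.
totals = [5200, 4800, 4500, 4100, 3900, 60, 45, 40, 22, 10]
n_cells, cutoff = knee_threshold(totals)
print(f"{n_cells} barcodes kept; lowest kept total = {cutoff}")
```

Real data are noisier than this example, so a production method would need to be more robust than a single largest-drop rule.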


The t-Distributed Stochastic Neighbor Embedding (t-SNE) plot is a two-dimensional projection of cells illustrating potential clusters (populations) of neighboring cells with similar expression profiles.

Downstream analysis with FlowJo SeqGeq

We’ve worked with another one of our partners, FlowJo, to develop an integration between the SureCell RNA Single-Cell App and the SeqGeq toolset. SeqGeq is a set of tools for exploring single-cell NGS data with an intuitive drag-and-drop interface. Users of both systems can transfer files into SeqGeq for additional visualization and analysis, including gene tables and heat maps.


Within SeqGeq, you can directly import data from BaseSpace Sequence Hub.

For more information, and to learn how Illumina instruments and bioinformatics are integrated with the solutions from Bio-Rad and FlowJo, download the technical note titled “Illumina® Bio-Rad® SureCell™ WTA 3′ Library Prep Kit for the ddSEQ™ System” or visit the FlowJo website.

For Research Use Only. Not for use in diagnostic procedures.
  1. Macaulay, Iain C. and Thierry Voet. “Single Cell Genomics: Advances And Future Perspectives”. PLoS Genet 10(1): e1004126. doi:10.1371/journal.pgen.1004126
  2. Eberwine J, Yeh H, Miyashiro K et al. Analysis of gene expression in single live neurons. Proc Natl Acad Sci USA. 1992;89(7):3010. Available at: http://www.pnas.org/content/89/7/3010.short. Accessed March 14, 2017.
  3. Liu S, Trapnell C. Single-cell transcriptome sequencing: recent advances and remaining challenges. 2017.
  4. Macosko E, Basu A, Satija R et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. 2017.

BaseSpace Suite Summit

Join us for our BaseSpace® Suite Informatics Summit in Copenhagen, Denmark, on 31 May and 1 June. The summit takes place immediately after the European Society of Human Genetics (ESHG) annual meeting, and attendance is free. Learn more about our informatics tools and how they’re designed to help you transform complex genomic data into meaningful insights quickly and easily.

Why attend a BaseSpace Suite Summit?

  • Share your perspectives on applying informatics tools in your lab
  • Attend informative sessions and learn how other customers use informatics
  • Get important product information for BaseSpace Clarity LIMS, BaseSpace Sequence Hub, BaseSpace Variant Interpreter (Beta), BaseSpace Cohort Analyzer, and BaseSpace Correlation Engine
  • Learn best practices, including how an integrated approach to informatics can expedite workflows
  • Connect with your peers



BaseSpace Cohort Analyzer Update

BaseSpace Cohort Analyzer enables users to apply complex genomic data in novel ways across the entire drug discovery and development process. Pharmaceutical and biotechnology organizations can incorporate data analysis and interpretation into biomarker discovery, translational research, and clinical trials.

We are writing to summarize recent changes to BaseSpace Cohort Analyzer and to share our plans for 2017.

2016 Highlights

Last year our main focus was on enabling you to upload basic cancer data in a quick, easy, automated and secure manner. We implemented the following features:

User upload of somatic and copy number variation data

Users can create Variant Call Format (VCF) files in BaseSpace Sequence Hub, or with other software, and drag and drop them into a secure site for automated ingestion. This functionality supports somatic mutation and copy number variation (CNV) data from panels, exomes, and whole genome sequencing (WGS).
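
For readers unfamiliar with VCF, the sketch below shows the shape of the format: meta-information and header lines start with "#", and each variant record is a tab-delimited line whose first eight columns are fixed. The records are made up for illustration, and the parser is deliberately minimal; it is not the ingestion logic used by BaseSpace Cohort Analyzer.

```python
# A tiny, made-up VCF body. Real files carry more header lines and
# usually FORMAT and per-sample columns after INFO.
vcf_text = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr17\t7674220\t.\tC\tT\t50\tPASS\tDP=120
chr7\t55191822\t.\tT\tG\t12\tLowQual\tDP=15
"""


def parse_vcf(text):
    """Yield one dict per variant record using the eight fixed VCF columns."""
    columns = ["chrom", "pos", "id", "ref", "alt", "qual", "filter", "info"]
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        record = dict(zip(columns, fields[:8]))
        record["pos"] = int(record["pos"])
        yield record


# Keep only records that passed all filters.
passing = [r for r in parse_vcf(vcf_text) if r["filter"] == "PASS"]
for r in passing:
    print(r["chrom"], r["pos"], r["ref"], ">", r["alt"])
```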

Access control

Your administrators can now assign permissions to private studies uploaded into BaseSpace Cohort Analyzer to allow certain users to see one study but not another.

TCGA Update

We recently started a major update of our data from The Cancer Genome Atlas (TCGA), bringing in thousands of new samples for somatic mutations, CNVs, and more. The update will be finalized in the coming months.

Novel Cancer Outlier Algorithm (Patent Application & AACR)

We developed an improved algorithm for Cancer Outlier Profile Analysis, which was recently released as a new app. We have a pending patent application and will present our method at this year’s American Association for Cancer Research (AACR) conference.


Try our new Cancer Outlier Profile Analysis, which helps identify genes for which a subset of subjects are “outliers,” e.g., upregulated genes such as fusion genes or amplified oncogenes like ERBB2 in breast cancer.
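
The general idea of outlier profiling can be sketched with the classic, previously published form of the COPA transform: median-centre each gene, scale by the median absolute deviation (MAD), and score the gene by a high quantile of the transformed values. This is explicitly not the improved algorithm mentioned above, whose details are not described here; the gene names, expression values, and quantile rule are hypothetical.

```python
from statistics import median


def copa_transform(values):
    """Median-centre one gene's expression values and scale by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; gene has too little variation")
    return [(v - med) / mad for v in values]


def copa_score(values, quantile=0.9):
    """Return the transformed value at the given quantile (simple rank rule)."""
    transformed = sorted(copa_transform(values))
    index = min(len(transformed) - 1, int(quantile * len(transformed)))
    return transformed[index]


# Hypothetical expression of two genes across ten subjects.
expression = {
    "GENE_OUTLIER": [5, 6, 5, 7, 6, 5, 6, 7, 60, 65],
    "GENE_STABLE": [5, 6, 5, 7, 6, 5, 6, 7, 6, 5],
}

scores = {gene: copa_score(vals) for gene, vals in expression.items()}
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, score)
```

Because the median and MAD are barely affected by a few extreme subjects, a gene that is strongly upregulated in only a subset scores far higher than a gene with uniform expression.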

2017 Roadmap

This year we are expanding the ability to upload hundreds of clinical attributes, as well as enabling user upload of RNA-seq data. We will also continue to bring in more TCGA and other public data and increase ways to get input and feedback directly from you. Please note that this roadmap is representative of our current development plan, which may evolve and change over time.

For now, if you have any questions or suggestions, contact us at informatics@illumina.com

We look forward to an exciting year ahead and to hearing from you!

BaseSpace Cohort Analyzer Team

For Research Use Only. Not for use in diagnostic procedures.