BaseSpace Cohort Analyzer enables users to automatically aggregate and analyze subjects with genomics and phenotype data in a few clicks. Ultimately, users can analyze and share data for biomarker discovery, translational research, and clinical trials.
One of the most powerful features of BaseSpace Cohort Analyzer is the ability to centralize all available information for a subject into a single record. This includes phenotypic data from various databases, lab and imaging data, and genomic, methylation, proteomic, and expression data, to name a few. Breaking down data silos in this way enables users to perform integrative analyses and make meaningful discoveries in aggregated data. Now, users of BaseSpace Cohort Analyzer can take advantage of a new beta feature: the Data Uploader.
Data Uploader: Import Somatic, CNV, RNA-Seq and >500 Phenotypical Attributes
You can now easily import your genomic data (somatic mutation or copy number variations between tumor and normal samples), or RNA-Seq data into BaseSpace Cohort Analyzer for analysis. Either upload your own files or directly import from a BaseSpace Sequence Hub Enterprise account. The uploader supports >500 phenotype and subject measurements.
Uploading and Analyzing Data
1. Upload in 2 Steps through the Data Uploader (beta)
- Load data with >500 phenotypic attributes, including age, gender, condition, therapies, overall survival, and other outcomes.
- Load genomic data and RNA-seq data directly from BaseSpace Sequence Hub, or from a desktop in multiple formats.
- Check your data to catch formatting errors prior to ingestion.
2. Process and integrate your data so you can analyze it in real time within BaseSpace Cohort Analyzer.
- Monitor and view study import status through a user interface
- Automatically add meaningful content for analysis such as calculating tumor mutation burden for all uploaded somatic mutation data
3. Analyze Data in BaseSpace Cohort Analyzer
After your data is uploaded, perform cohort analysis using over 100 bioinformatic workflows to:
- Compare your data with other data types or technologies
- Load and view everything associated with a single subject in one place
- Filter and select a cohort based on any phenotype or molecular marker(s).
- Integrate and analyze your data with clinical outcomes and therapies
- Understand the survival, molecular, and clinical differences between two groups
- Find expression outliers in your cohort of interest
- Research meaningful biomarkers and drug targets
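Tumor mutation burden (TMB) is commonly expressed as somatic mutations per megabase of sequence assessed. The sketch below shows that arithmetic and how a TMB value could be used to split a cohort into two groups for comparison. It assumes a fixed covered region size and an illustrative cutoff; `Subject`, `split_by_tmb`, the 30 Mb region, and the threshold of 10 are hypothetical, and this post does not describe BaseSpace Cohort Analyzer's exact TMB definition (which variants are counted, and over what region).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Subject:
    subject_id: str
    somatic_mutations: int  # hypothetical: count of qualifying somatic mutations


def tumor_mutation_burden(mutations: int, covered_mb: float) -> float:
    """Mutations per megabase of sequence assessed for mutations."""
    if covered_mb <= 0:
        raise ValueError("covered_mb must be positive")
    return mutations / covered_mb


def split_by_tmb(subjects: List[Subject], covered_mb: float,
                 threshold: float) -> Dict[str, List[str]]:
    """Assign each subject to a TMB-high or TMB-low group for a two-group comparison."""
    groups: Dict[str, List[str]] = {"tmb_high": [], "tmb_low": []}
    for s in subjects:
        tmb = tumor_mutation_burden(s.somatic_mutations, covered_mb)
        groups["tmb_high" if tmb >= threshold else "tmb_low"].append(s.subject_id)
    return groups
```

For example, with a hypothetical 30 Mb region, a subject with 450 somatic mutations has a TMB of 15 mutations/Mb.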
For more information about BaseSpace Cohort Analyzer, the Data Uploader or to sign up for a free trial, please contact us at firstname.lastname@example.org.
For Research Use Only. Not for use in diagnostic procedures.
A guest blog, written by GoSeqIt
In an increasingly globalized world, bacteria can spread rapidly and easily. Furthermore, they often contain genes that make them resistant to antibiotics or confer high virulence. Sequencing the entire genome of bacteria enables a thorough characterization and thus makes it possible for researchers to monitor the spread of particular strains of bacteria or sets of genes.
In collaboration with the Illumina BaseSpace Sequence Hub development team, GoSeqIt has published two apps for characterizing single bacterial isolates: the Bacterial Analysis Pipeline app and the E. coli Serotyping app. Both apps are now available to BaseSpace Sequence Hub users.
The input for both apps is a bacterial complete or draft genome in FASTA format (only files with the extension .fa or .fasta are accepted).
Bacterial Analysis Pipeline App
The Bacterial Analysis Pipeline app first predicts the species of the bacterial draft genome based on the number of k-mers (oligonucleotides of length k) shared between the input genome and bacterial genomes in a reference database (1). Next, acquired antimicrobial resistance genes are identified using a BLAST-based approach, in which the nucleotide sequence of the input genome is compared to the genes in the ResFinder database (2). Depending on the identified species, Multilocus Sequence Typing (MLST) is performed, also using a BLAST-based approach (3). A total of 125 MLST schemes are currently available.
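The k-mer idea can be sketched in a few lines of Python. This is a simplified illustration, not the app's implementation: it counts distinct k-mers shared between the input and each reference and reports the reference with the most overlap. `predict_species`, the toy sequences, and the small k used in the example are assumptions; real tools use much larger reference databases and longer k-mers and also consider the reverse-complement strand, which this sketch ignores.

```python
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def predict_species(query: str, references: Dict[str, str], k: int) -> Optional[str]:
    """Return the reference sharing the most distinct k-mers with the query."""
    query_kmers = kmers(query, k)
    best_name: Optional[str] = None
    best_shared = -1
    for name, ref_seq in references.items():
        shared = len(query_kmers & kmers(ref_seq, k))
        if shared > best_shared:
            best_name, best_shared = name, shared
    return best_name
```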
If the input genome is recognized as belonging to Enterobacteriaceae or to the Gram-positive bacteria (Enterococcus, Streptococcus, or Staphylococcus), BLAST is used to search for plasmid replicons using the PlasmidFinder database (4). Identified plasmids of the IncF, IncHI1, IncHI2, IncI1, IncN, or IncA/C type are further subtyped by plasmid MLST (4). Finally, genomes identified as Escherichia coli, Enterococcus sp., Listeria sp., or Staphylococcus aureus are compared to the VirulenceFinder database of known virulence genes (5). For more information, refer to the article titled “Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance” (6). Figure 1 illustrates the output for species prediction and MLST, and Figure 2 illustrates the output for the prediction of acquired antimicrobial resistance genes.
Figure 1: Example of output from the Bacterial Analysis Pipeline app for species prediction and MLST of the input genome.
Figure 2: Example of output from the Bacterial Analysis Pipeline app for acquired antimicrobial resistance genes in the input genome.
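MLST assigns an allele number to each housekeeping locus and then looks up the resulting allele profile to obtain a sequence type (ST). The following is a minimal sketch. It assumes the locus sequences have already been extracted from the genome (the app does this with BLAST) and that only exact allele matches count. The two-locus toy scheme and all names are hypothetical; real schemes typically use seven loci.

```python
from typing import Dict, List, Optional, Tuple


def call_allele(locus_seq: str, alleles: Dict[int, str]) -> Optional[int]:
    """Return the allele number whose sequence exactly matches locus_seq, if any."""
    for number, allele_seq in alleles.items():
        if allele_seq == locus_seq:
            return number
    return None


def sequence_type(observed: Dict[str, str],
                  allele_db: Dict[str, Dict[int, str]],
                  profiles: Dict[Tuple[int, ...], int]) -> Optional[int]:
    """Build the allele profile (loci in sorted order) and look up its ST."""
    profile: List[int] = []
    for locus in sorted(allele_db):
        allele = call_allele(observed.get(locus, ""), allele_db[locus])
        if allele is None:
            return None  # missing or novel allele: no known ST
        profile.append(allele)
    return profiles.get(tuple(profile))
```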
E. coli Serotyping App
The E. coli Serotyping app uses a BLAST-based approach to predict the serotype of E. coli isolates by comparing the input genome with a database of specific O-antigen processing system genes for O typing and flagellin genes for H typing (7). The app outputs the predicted serotype along with the identified O-antigen genes (wzx, wzy, wzm, and wzt) and flagellin genes (fliC, flkA, fllA, flmA, and flnA).
Figure 3: Example of output from the E. coli Serotyping app. Currently, only E. coli isolates can be serotyped in silico with this app.
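The O:H prediction can be pictured as filtering BLAST hits and taking the best-scoring antigen for each gene family. In this sketch, the `Hit` record, the identity and coverage thresholds, and the tie-breaking rule are assumptions for illustration; this post does not give the app's actual thresholds or scoring.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Hit:
    gene: str        # database gene family, e.g. "wzx" or "fliC"
    antigen: str     # antigen the database allele belongs to, e.g. "O157" or "H7"
    identity: float  # percent identity of the BLAST alignment
    coverage: float  # percent of the database gene covered by the alignment


O_GENES: Set[str] = {"wzx", "wzy", "wzm", "wzt"}
H_GENES: Set[str] = {"fliC", "flkA", "fllA", "flmA", "flnA"}


def best_antigen(hits: List[Hit], genes: Set[str],
                 min_identity: float, min_coverage: float) -> Optional[str]:
    """Return the antigen of the best passing hit among the given gene families."""
    passing = [h for h in hits
               if h.gene in genes and h.identity >= min_identity
               and h.coverage >= min_coverage]
    if not passing:
        return None
    # Highest identity wins; coverage breaks ties (hypothetical scoring rule).
    return max(passing, key=lambda h: (h.identity, h.coverage)).antigen


def predict_serotype(hits: List[Hit], min_identity: float = 90.0,
                     min_coverage: float = 60.0) -> str:
    o_type = best_antigen(hits, O_GENES, min_identity, min_coverage) or "O?"
    h_type = best_antigen(hits, H_GENES, min_identity, min_coverage) or "H?"
    return f"{o_type}:{h_type}"
```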
Using the New Apps
The price for using the Bacterial Analysis Pipeline app is 5 iCredits per uploaded file plus the cost of computing. The E. coli Serotyping app costs 1 iCredit per uploaded file plus the cost of computing.
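For budgeting, the per-file fees above combine with compute cost through simple arithmetic. This post does not state the compute cost, so this hypothetical helper takes it as an input.

```python
from typing import Dict

# Per-file app fees in iCredits, as listed above.
APP_FEE_PER_FILE: Dict[str, int] = {
    "Bacterial Analysis Pipeline": 5,
    "E. coli Serotyping": 1,
}


def estimate_icredits(app: str, n_files: int, compute_icredits: float) -> float:
    """Total iCredits = per-file app fee * number of files + compute cost (supplied by the caller)."""
    return APP_FEE_PER_FILE[app] * n_files + compute_icredits
```

For example, 10 genomes through the Bacterial Analysis Pipeline app cost 50 iCredits plus compute.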
Both apps use methods that have been thoroughly described and published in renowned scientific journals.
1) Larsen MV, Cosentino S, Lukjancenko O, Saputra D, Rasmussen S, Hasman H, Sicheritz-Pontén T, Aarestrup FM, Ussery DW, Lund O. Benchmarking of methods for genomic taxonomy. J Clin Microbiol. 2014 May;52(5):1529-39. PMID: 24574292.
2) Zankari E, Hasman H, Cosentino S, Vestergaard M, Rasmussen S, Lund O, Aarestrup FM, Larsen MV. Identification of acquired antimicrobial resistance genes. J Antimicrob Chemother. 2012 Nov;67(11):2640-4. PMID: 22782487.
3) Larsen MV, Cosentino S, Rasmussen S, Friis C, Hasman H, Marvig RL, Jelsbak L, Sicheritz-Pontén T, Ussery DW, Aarestrup FM, Lund O. Multilocus sequence typing of total-genome-sequenced bacteria. J Clin Microbiol. 2012 Apr;50(4):1355-61. PMID: 22238442.
4) Carattoli A, Zankari E, García-Fernández A, Voldby Larsen M, Lund O, Villa L, Møller Aarestrup F, Hasman H. In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing. Antimicrob Agents Chemother. 2014 Jul;58(7):3895-903. PMID: 24777092.
5) Joensen KG, Scheutz F, Lund O, Hasman H, Kaas RS, Nielsen EM, Aarestrup FM. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli. J Clin Microbiol. 2014 May;52(5):1501-10. PMID: 24574290.
6) Thomsen MC, Ahrenfeldt J, Cisneros JL, Jurtz V, Larsen MV, Hasman H, Aarestrup FM, Lund O. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance. PLoS One. 2016 Jun 21;11(6):e0157718. PMID: 27327771.
7) Joensen KG, Scheutz F, Lund O, Hasman H, Kaas RS, Nielsen EM, Aarestrup FM. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli. J Clin Microbiol. 2014 May;52(5):1501-10. PMID: 24574290.
For Research Use Only. Not for use in diagnostic procedures.
Integration and interoperability between laboratory systems—or lack thereof—remains a challenge for those performing next-generation sequencing (NGS) or other genomics studies.1 To address this challenge, we developed version 2.2 of the integration between BaseSpace Clarity LIMS and the NovaSeq 6000 instrument. This integration now supports the NovaSeq S4 flow cell, as well as the NovaSeq Xp protocol.
The NovaSeq S4 flow cell delivers up to 6 Tb of output in two days and is ideally suited for high-throughput sequencing applications. Users can now sequence up to 48 human genomes or 384 exomes per run in less than 48 hours. This innovation paves the way for large, population-scale initiatives at the lowest price per sample and enables labs to perform human whole-genome sequencing cost-effectively.2 Now, users of both BaseSpace Clarity LIMS and the NovaSeq 6000 instrument can use this out-of-the-box integration to get their systems up and running sooner.
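As a rough sanity check on these figures, dividing run output evenly across samples gives the raw yield per sample. This back-of-the-envelope sketch assumes the full quoted output of 6 terabases (6,000 Gb) is split evenly and a human genome size of about 3.1 Gb. It ignores control libraries, duplicates, and quality filtering, so real coverage will be lower.

```python
HUMAN_GENOME_GB = 3.1  # approximate human genome size in gigabases (assumption)


def yield_per_sample_gb(run_output_gb: float, n_samples: int) -> float:
    """Raw yield per sample if output is split evenly across samples."""
    return run_output_gb / n_samples


def mean_coverage(sample_yield_gb: float, genome_gb: float = HUMAN_GENOME_GB) -> float:
    """Average fold coverage = bases sequenced / genome size."""
    return sample_yield_gb / genome_gb
```

Under these assumptions, 48 genomes get 125 Gb each (roughly 40x), and 384 exomes get about 15.6 Gb each.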
The new integration helps users track samples throughout the workflow. Specifically, it:
- Supports S1 (see note 3), S2, and S4 flow cells per sample
- Supports different applications on the same flow cell
- Calculates sample and reagent volumes based on the flow cell type
- Creates an output file for use with liquid handling robots
- Validates every step in the workflow
The new integration also tracks sequencing run information in BaseSpace Clarity LIMS to help with troubleshooting or trending:
- Run recipe files (JSON) are automatically generated to set up and initiate the run
- Sample sheets compatible with BaseSpace Sequence Hub and bcl2fastq v2.19 are automatically generated and placed directly on the NovaSeq 6000 instrument
- Sequencing runs are tracked, and run metrics are parsed per lane and per flow cell
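An Illumina sample sheet is a sectioned CSV file. The sketch below writes a minimal sheet with `[Header]`, `[Reads]`, and `[Data]` sections. The column set and values are assumptions for illustration. The sheets generated by the integration may contain additional sections and NovaSeq-specific fields.

```python
import csv
import io
from typing import Dict, List


def build_sample_sheet(experiment_name: str, read_lengths: List[int],
                       samples: List[Dict[str, str]]) -> str:
    """Write a minimal sectioned sample sheet as a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["[Header]"])
    writer.writerow(["Experiment Name", experiment_name])
    writer.writerow([])  # blank line between sections
    writer.writerow(["[Reads]"])
    for length in read_lengths:
        writer.writerow([length])
    writer.writerow([])
    writer.writerow(["[Data]"])
    columns = ["Lane", "Sample_ID", "Sample_Name", "index", "index2"]  # assumed minimal set
    writer.writerow(columns)
    for sample in samples:
        writer.writerow([sample[c] for c in columns])
    return buf.getvalue()
```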
If you have questions about this integration, please email Illumina Technical Support.
- Next-Generation Sequencing Informatics: Challenges and … http://www.archivesofpathology.org/doi/10.5858/arpa.2015-0507-RA. Accessed November 14, 2017.
- Illumina.com. (2017). Illumina Releases NovaSeq S4 Flow Cell and NovaSeq Xp Workflow. [online] Available at: https://www.illumina.com/company/news-center/press-releases/2017/2308795.html [Accessed 16 Nov. 2017].
- Upcoming flow cell in the NovaSeq 6000 instrument portfolio
For Research Use Only. Not for Use in Diagnostic Procedures.
Join us for our upcoming webinar: High-volume sequence analysis with BaseSpace™ Sequence Hub and Edico DRAGEN apps, on Dec 13 at 10AM (PT)
The latest sequencing technologies enable unprecedented throughput and redefine limits for many labs. To adapt, these labs must rethink how they work by automating tasks to reduce touchpoints and by simplifying workflows with integration and robust analysis tools.
In this webinar, we describe BaseSpace™ Sequence Hub and how its newest features support high-throughput, high-volume sequencing. We demonstrate how customers can progress from flow cell loading to variant analysis with zero touchpoints by using the Whole Genome Sequencing or Edico DRAGEN apps. We also describe how the integration with BaseSpace™ Variant Interpreter enables users to interpret identified variants and generate reports.
For Research Use Only. Not for use in diagnostic procedures.
We are pleased to announce the launch of the first integration between the Illumina VeriSeq™ Noninvasive Prenatal Testing (NIPT) Solution and BaseSpace Clarity LIMS.
The VeriSeq™ NIPT Solution is an in vitro diagnostic test intended for use as a sequencing‐based screening test for the detection of fetal aneuploidies from maternal peripheral whole blood samples in pregnant women of at least 10 weeks gestation. VeriSeq NIPT provides information regarding aneuploidy status for chromosomes 21, 18, 13, X, and Y. This product must not be used as the sole basis for diagnosis or other pregnancy management decisions.
To facilitate use of the solution, we have implemented an integration with BaseSpace™ Clarity LIMS, which allows users of both to centralize data in one location, from sample accessioning to reporting, without altering the IVD CE-marked VeriSeq™ NIPT Solution.
At a high level, the integration includes:
- Automatic generation of a sample upload sheet that is compatible with Workflow Manager. The sample sheet generated captures VeriSeq NIPT Sample Type and Sex Chromosomes fields required by Workflow Manager.
- Preconfigured VeriSeq NIPT v1.0 workflow containing a protocol that maps to the VeriSeq NIPT Software Solution for both library prep and reporting.
- Preconfigured VeriSeq NIPT v1.0 Validation workflow and protocol that allows for validation of the integration.
- Batching step that includes automated validation to ensure that the batch size, including No Template Controls (NTCs), equals 48 or 96.
- Generation of a sample sheet that is designed to be used by the VeriSeq NIPT Workflow Manager to start the run.
- An analysis step that populates NIPT report data back into BaseSpace Clarity LIMS. For BaseSpace Clarity LIMS Silver customers, the report data must be manually uploaded. However, for BaseSpace Clarity LIMS Gold users, the report data are automatically uploaded.
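The batch-size rule above is easy to express in code. The following is a minimal sketch, assuming a batch is given as lists of sample and NTC identifiers. `validate_batch` is hypothetical, and the duplicate-ID check is an extra illustrative safeguard rather than a documented feature of the integration.

```python
from typing import List, Tuple

ALLOWED_BATCH_SIZES: Tuple[int, ...] = (48, 96)


def validate_batch(sample_ids: List[str], ntc_ids: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the batch passes."""
    problems: List[str] = []
    total = len(sample_ids) + len(ntc_ids)  # NTCs count toward the batch size
    if total not in ALLOWED_BATCH_SIZES:
        problems.append(f"batch has {total} positions; expected one of {ALLOWED_BATCH_SIZES}")
    all_ids = sample_ids + ntc_ids
    if len(set(all_ids)) != len(all_ids):
        # Illustrative extra check, not a documented feature of the integration.
        problems.append("duplicate identifiers in batch")
    return problems
```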
To learn more about the BaseSpace Clarity LIMS integration to the VeriSeq™ NIPT Solution, please contact us.
For Research Use Only. Not for use in diagnostic procedures.
Bioinformatics tools are a key component of the next-generation sequencing (NGS) workflow and can have a significant impact on results. Alignment and variant calling, in particular, involve complex algorithms, each with unique strengths and weaknesses. The Broad Institute’s BWA+GATK pipeline is among the most popular, but over the last few years, companies including Illumina, Edico Genome, and Sentieon have released additional alignment and variant calling methods. With multiple methods available, users need a way to compare their results so they can select the best tool for their purpose.
The new Hap.py app available on BaseSpace Sequence Hub enables users to compare diploid genotypes at the haplotype level by generating and matching alternate sequences in a small region of the genome that contains one or more variants. Hap.py makes it easy to compare any variant call set against a range of packaged gold-standard truth sets1,2 to perform routine benchmarking.
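Haplotype-level comparison matters because of cases like the following. Consider a one-base deletion in a run of T bases: two callers can report it at different positions, so the VCF records differ even though the resulting sequence is identical. The sketch below applies each call set to a small reference window and compares the resulting pair of haplotype sequences. It is a simplified illustration of the idea, not Hap.py's implementation; the variant tuples, `genotypes_match`, and the unphased comparison by sorting are assumptions.

```python
from typing import List, Tuple

# (0-based position, REF allele, ALT allele); positions index into the reference window
Variant = Tuple[int, str, str]
Genotype = Tuple[List[Variant], List[Variant]]  # variants on haplotype 1 and haplotype 2


def apply_variants(ref: str, variants: List[Variant]) -> str:
    """Build one haplotype by applying non-overlapping variants to a reference window."""
    pieces: List[str] = []
    cursor = 0
    for pos, ref_allele, alt_allele in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants on the same haplotype")
        if ref[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF allele does not match reference at position {pos}")
        pieces.append(ref[cursor:pos])
        pieces.append(alt_allele)
        cursor = pos + len(ref_allele)
    pieces.append(ref[cursor:])
    return "".join(pieces)


def diploid_sequences(ref: str, genotype: Genotype) -> Tuple[str, ...]:
    """Both haplotype sequences, sorted so that phase does not affect the comparison."""
    return tuple(sorted(apply_variants(ref, hap) for hap in genotype))


def genotypes_match(ref: str, truth: Genotype, query: Genotype) -> bool:
    return diploid_sequences(ref, truth) == diploid_sequences(ref, query)


ref = "CATTTTG"
truth: Genotype = ([(1, "AT", "A")], [])  # deletion of one T, left-anchored
query: Genotype = ([], [(4, "TT", "T")])  # same deletion, anchored further right
```

Here, `genotypes_match(ref, truth, query)` is true even though the two records differ position by position.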
Next-generation sequencing (NGS) systems now produce more data than ever before. Additionally, a typical NGS workflow involves manual, time-consuming touchpoints for quality control, analysis setup, and results review. As a result, labs that perform NGS or other complex, high-volume sample processing can be overwhelmed by managing the workflows and data generated. To address these issues and simplify NGS research, we are happy to announce the new version of BaseSpace Sequence Hub. It is designed to enhance your laboratory’s efficiency and support the needs of high-throughput labs.
This update introduces a biosample-centric data model that tracks all biosample activity from lab preparation through analysis delivery. We’re also introducing the following features:
- New automated quality control features
- Automated app launches and workflows
- An updated Application Programming Interface (API) to help you streamline your NGS workflows
- An improved user interface that helps you access your data and perform functions more quickly
Biosample-centric Data Model
Our new biosample-centric data model enables easy tracking of all biosample activity from lab preparation through analysis delivery. Biosamples are the data containers that represent the original DNA source material. They are used to trace all sequencing activities, including lab preparation (with optional laboratory information management system, or LIMS, integration), sequencing runs, data analysis, and delivery of results. Biosamples can be used as inputs to multiple sequencing runs, and they can contain multiple datasets, which can live within separate projects.
Important Note: Biosamples with the same name (the Sample ID in the sample sheet) are automatically aggregated; all FASTQ datasets that share a Sample ID are combined into a single biosample. Give each sample in your sample sheet a unique Sample ID; otherwise, their data will be aggregated together. Learn more about automatic data aggregation here.
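You can mimic the aggregation rule locally to catch naming mistakes before upload. In this sketch, datasets are grouped by Sample ID, and a Sample ID that appears with more than one index in the same sample sheet is flagged, since that usually means two different samples share a name. The function names and the index heuristic are assumptions. A sample sequenced across several lanes with the same index is deliberately not flagged.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def aggregate_by_sample_id(datasets: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (sample_id, fastq_dataset) pairs, as if each Sample ID became one biosample."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample_id, dataset in datasets:
        groups[sample_id].append(dataset)
    return dict(groups)


def suspicious_collisions(rows: List[Tuple[str, str]]) -> Set[str]:
    """Sample IDs used with more than one index in one sample sheet (likely name clashes)."""
    indexes: Dict[str, Set[str]] = defaultdict(set)
    for sample_id, index in rows:
        indexes[sample_id].add(index)
    return {sid for sid, idx in indexes.items() if len(idx) > 1}
```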
Automated Lane QC, App Launch, and Analysis QC
After sequencing, much of the work required to process biosamples can be automated in bulk. By setting up automation ahead of time using the command line interface (CLI), sequencing runs can be automatically passed or failed based on their sequencing quality, converted to FASTQ datasets, used as inputs to an app, and then passed or failed based on their app metrics. Automation removes much of the time-consuming, error-prone manual work of turning sequencing data into downstream results.
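The pass/fail chain described above can be sketched as plain decision logic. The metric names, thresholds, and the `run_app` callback (standing in for FASTQ conversion and app launch) are hypothetical. The actual automation is configured through the BaseSpace CLI, and its settings are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Thresholds:
    min_percent_q30: float    # hypothetical lane-level threshold
    min_yield_gb: float       # hypothetical lane-level threshold
    min_mean_coverage: float  # hypothetical app-level threshold


def lane_passes(metrics: Dict[str, float], t: Thresholds) -> bool:
    """Lane QC: pass only if quality and yield both meet their thresholds."""
    return metrics["percent_q30"] >= t.min_percent_q30 and metrics["yield_gb"] >= t.min_yield_gb


def automate(lanes: Dict[int, Dict[str, float]], t: Thresholds,
             run_app: Callable[[List[int]], Dict[str, float]]) -> Dict[str, object]:
    """Lane QC, then app launch on passing lanes, then analysis QC on app metrics."""
    passed = sorted(lane for lane, m in lanes.items() if lane_passes(m, t))
    if not passed:
        return {"lanes_passed": [], "app_launched": False, "analysis_qc": "FAIL"}
    app_metrics = run_app(passed)  # stands in for FASTQ conversion + app launch
    status = "PASS" if app_metrics["mean_coverage"] >= t.min_mean_coverage else "FAIL"
    return {"lanes_passed": passed, "app_launched": True, "analysis_qc": status}
```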
Improved User Interface
The updated interface provides quick access to all of your data from the My Data menu, while the new Action Toolbar contains new and improved app functions such as requeues, QC status changes, workflows, and collaboration tools.
The Analyses page provides a listing of all analyses in your account. The filters on this page help you quickly narrow your search for specific analyses by their current status.
The Projects and Runs pages function the same as before, providing quick access to all of your sequencing projects and instrument runs.
Advanced Automation and Integration Toolset
Alongside our updated data model, we’ve introduced version 2 of the API, which enables you to interact directly with your data and integrate systems together with your BaseSpace Sequence Hub account.
The new automation tools in version 2 of the API:
- Correspond to the new biosample-centric data model
- Improve performance and robustness of the solution
- Include new documentation
Note: The version 1 API is still fully supported and maintained, although our development is now focused primarily on the version 2 API. The version 1 API documentation remains available (see the links at the end of this post).
Version 2 of BaseSpace CLI is built on the version 2 API. BaseSpace CLI can be used to read data from your BaseSpace Sequence Hub account and to create new data by uploading files and launching apps. The new BaseSpace CLI can also be used to create automated analysis workflows and import biosamples.
BaseMount is a command-line tool that lets you browse runs, projects, biosamples, and datasets and interact directly with the associated files, exactly as you would with any other file system.
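Because BaseMount presents your data as a file system, ordinary file-handling code works on it. The following is a minimal sketch, assuming data are mounted at a directory of your choice. The function name and the directory names used in testing are illustrative and do not describe BaseMount's exact layout.

```python
from pathlib import Path
from typing import List, Set


def find_fastqs(mount_root: str) -> List[Path]:
    """Recursively list FASTQ files under a mounted directory using standard filesystem calls."""
    root = Path(mount_root)
    found: Set[Path] = set()
    for pattern in ("*.fastq.gz", "*.fastq"):
        found.update(root.rglob(pattern))
    return sorted(found)
```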
We hope the new functionality of BaseSpace Sequence Hub enables your lab to boost productivity and discovery. View a video or visit our updated Support Site to learn more about how to use all the new features and tools. Please contact us at email@example.com if you have any questions or comments.
The BaseSpace Sequence Hub Team
- CLI documentation https://developer.basespace.illumina.com/docs/content/documentation/cli/cli-overview
- CLI automated workflow creation docs https://developer.basespace.illumina.com/docs/content/documentation/cli/cli-examples
- Link to v1 API docs https://developer.basespace.illumina.com/docs/content/documentation/rest-api/v1-api-reference
- Link to v2 API docs https://developer.basespace.illumina.com/docs/content/documentation/rest-api/api-reference