BaseSpace™ CLI v1.0.0 is here!

By Swathi A. Ramani, Staff Product Manager – BaseSpace Sequence Hub

If you’ve been using BaseSpace Sequence Hub for some time now, then you probably know that there is a lot more to the platform than the browser console. The Command Line Interface (CLI) is an easy-to-use command line tool that lets users do more with BaseSpace by managing common (and not so common) tasks associated with their genomic data and analysis.

The CLI has been in development for over 4 years, and was created by our talented UK team. They needed automation tools to help sequence more than 20 petabases of data for the 100k Genomes Project. Over the past few months, we’ve been hard at work on the next generation of the CLI. We are thrilled to announce that our CLI is no longer in Beta! In our latest release, we have launched the officially supported BaseSpace (BS) Sequence Hub (BSSH) CLI v1.0.0, and all the exciting features that come with it. In the years since the initial release, we saw incredible product uptake and a lot of positive feedback from the BaseSpace community. With this launch announcement, we are delivering on some of your biggest requests for a robust feature set that simplifies data and analysis wrangling and process automation. This is a great foundation on which we can continue expanding our toolset.

Rich Built-in Features

BS CLI v1.0.0 is a completely different beast from its previous version. With just one file to download and configure, you can control multiple BaseSpace services and automate them through scripts, including uploading samples, downloading runs, launching or stopping apps and workflows, setting custom quality filters for your runs, generating pre-signed URLs, and much more!

Key features include:

  • Flexible install process: The CLI is installed by downloading a single binary with no additional dependencies, which enables you to install the CLI in an environment where you do not have administrator privileges.
  • Support for Linux, Mac and Windows (32 and 64 bit) operating systems
  • Rich options for listing details and filtering with customized output for seamless multi-command pipelines and scripts
  • Powerful data management features including creation, renaming and deletion of BSSH entities
  • Efficient upload of FASTQ datasets or any other file types, coupled with fast download of runs, projects, biosamples and datasets
  • Parameterize, launch, monitor and kill analyses running remotely in BSSH

Importantly, we’ve made sure the above features work nicely together so you don’t have to do the plumbing yourself. For a full list of worked examples visit our help site.

Try It Out Today! 

Our new BS CLI v1.0.0 is ready to serve as your standard toolchain to programmatically read, create and manipulate data in your BSSH account, automate routine tasks, and efficiently manage your applications. You can try it out right now by following the instructions on our help site.

If you are using existing tools like BaseMount or BaseSpace Copy, these will continue to work. However, as we continue to improve the developer experience, we hope to consolidate our existing tools and add new features to the BS CLI v1.0.0 toolchain. 

The more you use BS CLI v1.0.0, the more you will see how powerful it is. We can’t wait to see what you build with it! As always, let us know how we are doing. We want to incorporate best practices into the toolchain wherever possible so that they become customary, so please submit any requests via this blog, Twitter or techsupport@illumina.com. Happy hacking!

The BaseSpace Sequence Hub Team

For Research Use Only.

QB#8581

 

 

 

Somatic Pipeline Improvements with DRAGEN v3.3

by Severine Catreux – Associate Director, Bioinformatics FPGA Development

Significant accuracy gains and speed improvements with DRAGEN v3.3, released April 2019

The DRAGEN engineering and bioinformatics team is excited to announce a new DRAGEN release, v3.3. The second of several releases scheduled for 2019, DRAGEN v3.3 contains improvements across the many pipeline offerings now supported by the DRAGEN platform. This includes accuracy improvements in the germline and somatic pipelines, new features (e.g. CNV DeNovo calling and RNA quantification) and speed gains (Somatic T/N, BCL conversion).

Please see the DRAGEN v3.3 Release Notes for more details. This blog highlights the significant updates to the DRAGEN Somatic Pipeline for small variants that are part of the v3.3 release.

As one of DRAGEN’s core pipelines, the DRAGEN Somatic Pipeline for small variants is utilized by cancer research institutes around the globe. Expanding on the existing functionality, accuracy and speed of the DRAGEN Somatic Pipeline, the v3.3 release placed a high focus on the somatic tumor/normal WGS mode, producing step-function improvements in both accuracy and speed.

Accuracy Improvements

During the development cycle for v3.3, the DRAGEN engineering and bioinformatics teams took a deep dive into the DRAGEN Somatic Pipeline tumor/normal mode, strengthening the existing algorithm to improve accuracy. In the genotyping module, point estimation of the variant allele frequency was replaced with integration over a continuous range of possible frequencies. This led to significant gains in both sensitivity and precision. Additionally, downstream filtering rules were improved to optimize both sensitivity and precision: clustered variants are treated less stringently, and variants positioned at the edge of reads or with low median base quality and MAPQ are now filtered. Finally, the indel PCR error model autocalibration module was made independent between the tumor and normal control, to allow for differences in library preparation between the tumor sample and the control sample.
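To illustrate the difference between a point estimate and integration over allele frequencies, here is a minimal Python sketch. It does not reproduce the DRAGEN genotyper; the binomial read model, the uniform weighting over frequencies, and all function names are assumptions chosen only to make the idea concrete.

```python
from math import comb


def binom_pmf(k, n, f):
    """Probability of seeing k variant-supporting reads out of n at allele frequency f."""
    return comb(n, k) * f**k * (1.0 - f) ** (n - k)


def point_estimate_likelihood(k, n):
    """Likelihood evaluated only at the single best-guess frequency k / n (n must be > 0)."""
    return binom_pmf(k, n, k / n)


def integrated_likelihood(k, n, steps=1000):
    """Likelihood averaged over all frequencies in [0, 1] (assumed uniform weighting),
    computed with the trapezoidal rule."""
    h = 1.0 / steps
    total = 0.5 * (binom_pmf(k, n, 0.0) + binom_pmf(k, n, 1.0))
    for i in range(1, steps):
        total += binom_pmf(k, n, i * h)
    return total * h
```

The point estimate commits to one frequency and so tends to overstate how well the data are explained, while the integrated form accounts for uncertainty in the frequency itself. In this toy model the integral has the closed form 1 / (n + 1), which is a convenient sanity check.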

These changes are precursors to further accuracy improvements planned for the DRAGEN v3.4 release, specifically in the area of liquid tumor support, where tumor-in-normal contamination will be taken into account.

Accuracy gains of DRAGEN 3.3 over the previous DRAGEN version (3.2) as well as other pipelines (GATK4 MuTect2 and Strelka2) are shown in the plot below. Gains are measured for both SNVs and indels on most datasets.

Figure 1: Comparison of False-Positives (FP) and False-Negatives (FN) between GATK4, Strelka2, DRAGEN 3.2 and DRAGEN 3.3. Lower values are better.

Figure 2: The above chart showcases sensitivity improvements in DRAGEN v3.3 in comparison to DRAGEN v3.2 for indels and SNPs.

Speed Gains

DRAGEN v3.3 delivers unprecedentedly fast run times for somatic T/N WGS processing. Users of previous DRAGEN versions will notice substantial speed gains in DRAGEN 3.3 (see graph below). For datasets that were previously HMM-limited, v3.3 delivers up to 6-fold speed improvements, with a typical 100x (tumor) and 40x (normal) run finishing within 1 hour and 40 minutes on an on-premises DRAGEN server. In the cloud, run times average 2 hours and 30 minutes.

The run time gains were obtained from optimizations in the upstream stages of the pipeline: regions of interest are defined more efficiently, and the MAPQ threshold that reads must meet to pass downstream was raised, so fewer reads are passed downstream without loss of sensitivity. Additionally, the accelerated HMM engines were optimized to consume less of the FPGA footprint, such that more engines could be run in parallel.

Run-time comparison for T/N WGS Somatic Calling

Figure 3: The above chart compares DRAGEN v3.2 (Jan. 2019) and v3.3 for tumor-normal whole genome sequencing somatic calling. DRAGEN v3.3 introduces significant speed improvements.

About the DRAGEN Somatic Pipeline

The DRAGEN Somatic Pipeline provides highly accurate, ultra-rapid secondary analysis for tumor-only and tumor/normal experiments to identify cancer-associated mutations.

Tumor/Normal Mode

The DRAGEN Somatic Pipeline offers flexible data analysis to suit the specific needs of users. DRAGEN accepts FASTQ, BAM/CRAM, and BCL files and supports NGS input from whole genome, whole exome, and targeted cancer panels. In the tumor/normal pipeline, both samples go through identical processing steps of mapping, aligning, sorting, and duplicate marking. Then, both sets of tumor and normal reads are passed through the somatic variant caller which looks for sites exhibiting a mutation in the tumor reads while showing little to no evidence of the mutation in the normal reads, thus producing a VCF file containing tumor-specific mutations. The Somatic Pipeline also reports allele frequency, allowing users to assess the prevalence of a specific mutation.
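The tumor/normal comparison described above can be pictured with a deliberately simplified Python sketch. The real somatic caller is statistical; the fixed thresholds, parameter names, and functions below are hypothetical and serve only to show the idea of "present in tumor, little to no evidence in normal".

```python
def allele_frequency(alt_reads, depth):
    """Fraction of reads at a site that support the alternate allele."""
    return alt_reads / depth if depth else 0.0


def is_tumor_specific(tumor_alt, tumor_depth, normal_alt, normal_depth,
                      min_tumor_af=0.05, max_normal_af=0.01):
    """Toy rule: the mutation is visible in the tumor reads and (nearly) absent
    from the normal reads. Both thresholds are illustrative assumptions."""
    tumor_af = allele_frequency(tumor_alt, tumor_depth)
    normal_af = allele_frequency(normal_alt, normal_depth)
    return tumor_af >= min_tumor_af and normal_af <= max_normal_af
```

For example, a site with 12 of 100 tumor reads and 0 of 40 normal reads supporting the mutation would be kept, while the same site with 10 of 40 supporting normal reads would be treated as likely germline and dropped.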


Figure 4: Tumor-Normal pipeline diagram

Tumor-only Mode

In the tumor-only pipeline, users input NGS data from a tumor sample and run it through the same pipeline as for tumor/normal analysis, but without a matching normal sample. The somatic variant caller contains algorithms that distinguish low-frequency alleles from background noise. Although the resulting VCF file does not distinguish germline from somatic variants, it allows researchers and clinicians to determine if a mutation is present in a tumor sample and its allele frequency.

Figure 5: Tumor-only pipeline diagram

Have any feedback, suggestions or data that you’d like to share with the DRAGEN team? Our new community forum is an active, collaborative hub for connecting and sharing feedback.


For Research Use Only. Not for use in diagnostic procedures.


Enhanced Run Monitoring in BaseSpace™ Sequence Hub

The ability to monitor sequencing runs in real time helps users identify issues early and prevent costly sequencing errors. Many users rely on the Sequencing Analysis Viewer (SAV) to access detailed quality metrics generated by the real-time analysis software on Illumina instruments.

BaseSpace Sequence Hub lets users monitor their sequencing runs remotely through the Run Charts function, whose interface closely resembles that of SAV. We have recently released an update, synchronized with SAV, that offers an expanded set of metrics for monitoring run quality. At the same time, we have added a few capabilities previously only present in SAV. These enhancements provide a consistent experience and enable users to make informed decisions on the quality of their sequencing runs – whether they are standing in front of their instrument accessing SAV or monitoring the run remotely using BaseSpace Sequence Hub.

Expanded menu of metrics that maintains consistency with SAV

BaseSpace Sequence Hub now includes per cycle Phasing and Pre-phasing metrics, % No Call, and Median QScore measures in the Charts section of Run Monitoring. These measures were also released as part of SAV 2.4.5. % No Call & Median QScores are available for all sequencing platforms. The new Phasing/Pre-phasing metrics are available for all platforms except MiSeq and HiSeq 2000/2500.


Traditional Phasing (and pre-phasing) metrics, which were calculated once at cycle 25, are now listed as “Legacy Phasing Rate.” The new per-cycle weights are listed as “Phasing Weight” in the Run Charts.


Improved usability

The Charts section of Run Monitoring now includes the same menu structure as SAV 2.4.5. Metrics in the drop-down menus now appear only if they are available for the cycle, significantly improving the usability of the charts.

Extracted, Called, and Scored cycles have a minimum-maximum range

Run Monitoring now provides Extracted, Called, and Scored cycles as a minimum-maximum range during an instrument run. Previously, Run Monitoring showed only the maximum cycles. A wide spread between the leading and lagging tiles might be an indication of a run problem. Now users can easily spot a problem with their run on both SAV and BaseSpace Sequence Hub.
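As a rough illustration of why the range is more informative than the maximum alone, the following Python sketch computes the minimum-maximum range from per-tile progress and flags a wide spread. The tile labels, the input structure, and the `max_spread` threshold are all hypothetical; this is not how SAV or BaseSpace Sequence Hub computes or stores these values.

```python
def cycle_range(cycle_by_tile):
    """Return (minimum, maximum) of the latest completed cycle across tiles."""
    cycles = list(cycle_by_tile.values())
    return min(cycles), max(cycles)


def has_wide_spread(cycle_by_tile, max_spread=3):
    """Flag a run whose lagging tile trails the leading tile by more than
    max_spread cycles (an assumed, illustrative threshold)."""
    lowest, highest = cycle_range(cycle_by_tile)
    return highest - lowest > max_spread
```

With only the maximum reported, a run where one tile is stuck at cycle 42 while the others have reached cycle 51 would look healthy; the range (42, 51) makes the lagging tile visible.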

New Metrics in Both SAV and BaseSpace Sequence Hub

In addition to the changes enumerated above, both SAV and BaseSpace Sequence Hub now include Occupied Count (K) and % Occupied measures in the Charts section of Run Monitoring for NovaSeq systems. The Occupied Count is a measure of the number of wells on the flow cell with DNA. Adding these new metrics will help users understand their loading concentrations and identify issues with their sequencing run.


 

For Research Use Only. Not for use in diagnostic procedures.

Announcing the New Data Uploader in BaseSpace™ Cohort Analyzer

BaseSpace Cohort Analyzer enables users to automatically aggregate and analyze subjects with genomics and phenotype data in a few clicks. Ultimately, users can analyze and share data for biomarker discovery, translational research, and clinical trials.

One of the most powerful features of BaseSpace Cohort Analyzer is the ability to centralize all available information for a subject into a single record. This includes phenotype data obtained from various phenotypic databases, lab and image data, and genomic, methylation, proteomics, and expression data, to name a few. Breaking down siloed data in this way enables users to perform integrative analyses to make meaningful discoveries in aggregated data. Now, users of BaseSpace Cohort Analyzer can take advantage of a new beta feature: the Data Uploader.

Data Uploader: Import Somatic, CNV, RNA-Seq and >500 Phenotypical Attributes

You can now easily import your genomic data (somatic mutation or copy number variations between tumor and normal samples), or RNA-Seq data into BaseSpace Cohort Analyzer for analysis. Either upload your own files or directly import from a BaseSpace Sequence Hub Enterprise account. The uploader supports >500 phenotype and subject measurements.

Uploading and Analyzing Data

1. Upload in 2 Steps through the Data Uploader (beta)

  • Load data with >500 phenotypic attributes, including age, gender, condition, therapies, overall survival and other outcomes.
  • Load genomic data and RNA-seq data directly from BaseSpace Sequence Hub, or from a desktop in multiple formats.
  • Check your data to catch formatting errors prior to ingestion.


2. Process and integrate your data so you can analyze it in real time within BaseSpace Cohort Analyzer.

  • Monitor and view study import status through a user interface
  • Automatically add meaningful content for analysis such as calculating tumor mutation burden for all uploaded somatic mutation data
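Tumor mutation burden is commonly expressed as the number of somatic mutations per megabase of sequenced territory. The post does not describe how BaseSpace Cohort Analyzer computes it, so the Python sketch below shows only that common definition; the function and parameter names are assumptions.

```python
def tumor_mutation_burden(somatic_mutation_count, covered_bases):
    """Somatic mutations per megabase of covered sequence (common definition;
    not necessarily the exact calculation used by the product)."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return somatic_mutation_count / (covered_bases / 1_000_000)
```

For instance, 45 somatic mutations across a 1.5 Mb panel would correspond to 30 mutations per megabase under this definition.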


3. Analyze Data in BaseSpace Cohort Analyzer

After your data is uploaded, perform cohort analysis using over 100 bioinformatic workflows and:

  • Compare your data with other datatypes or technologies
  • Load and view everything associated with a single subject in one place
  • Filter and select a cohort based on any phenotype or molecular marker(s)
  • Integrate and analyze your data with clinical outcomes and therapies
  • Understand the survival, molecular, and clinical differences between two groups
  • Find expression outliers in your cohort of interest
  • Research meaningful biomarkers and drug targets


For more information about BaseSpace Cohort Analyzer or the Data Uploader, or to sign up for a free trial, please contact us at techsupport@illumina.com.

 

For Research Use Only. Not for use in diagnostic procedures.

Welcome to the new BaseSpace® Sequence Hub!


Next-generation sequencing (NGS) systems now produce more data than ever before. Additionally, a typical NGS workflow involves manual, time-consuming touchpoints for quality control, analysis setup, and results review. As a result, labs that perform NGS or other complex, high-volume processing of samples can be overwhelmed managing the workflows and data generated. To address these issues and simplify NGS research, we are happy to announce the new version of BaseSpace Sequence Hub. It is designed to enhance your laboratory’s efficiency and support the needs of high-throughput labs.

This update introduces a biosample-centric data model that provides tracking of all biosample activity from lab preparation through analysis delivery. We’re also introducing the following features:

  • New automated quality control features
  • Automated app launches and workflows
  • An updated Application Programming Interface (API) to help you streamline your next-generation sequencing (NGS) workflows
  • An improved user interface that helps you access your data and perform functions more quickly

New Features

Biosample-centric Data Model

Our new biosample-centric data model enables easy tracking of all biosample activity from lab preparation through analysis delivery. Biosamples are the data containers that represent the original DNA source material. They are used to trace all sequencing activities, including lab preparation (with LIMS integration), sequencing runs, data analysis, and delivery of data.

Figure 1 Access all libraries, runs, requeues, analyses, and datasets associated with biosamples from a single place.

The new data model centers on biosamples, the original source of DNA, so you can easily track all biosample activity from lab preparation, with optional laboratory information management system (LIMS) integration, to delivery of analysis results. Biosamples can be used as inputs to multiple sequencing runs, and they can contain multiple datasets, which can live within separate projects.

Important Note: Biosamples with the same name (Sample ID in the sample sheet) are automatically aggregated. The new features will aggregate all FASTQ data sets with the same Sample ID into a single biosample. It is important to name the samples in your sample sheet uniquely, otherwise they will be aggregated together. Learn more about automatic data aggregation here.
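The aggregation behavior in the note above can be pictured with a small Python sketch. It is only a model of the rule "same Sample ID, same biosample"; the input structure and the dataset names are made up for illustration and do not reflect the platform's internals.

```python
from collections import defaultdict


def aggregate_by_sample_id(fastq_datasets):
    """Group (sample_id, dataset_name) pairs into one biosample per Sample ID."""
    biosamples = defaultdict(list)
    for sample_id, dataset_name in fastq_datasets:
        biosamples[sample_id].append(dataset_name)
    return dict(biosamples)


# Two runs that both use the Sample ID "Sample1" end up in a single biosample,
# even if they came from different source material.
datasets = [
    ("Sample1", "run_A_Sample1_fastq"),
    ("Sample2", "run_A_Sample2_fastq"),
    ("Sample1", "run_B_Sample1_fastq"),
]
grouped = aggregate_by_sample_id(datasets)
```

Here `grouped["Sample1"]` holds the datasets from both runs, which is the desired outcome for a resequenced sample and an unwanted merge if the two "Sample1" entries were actually different samples.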

Figure 2 Samples have been replaced by biosamples as inputs to apps.

Automated Lane QC, App Launch, and Analysis QC

After sequencing, much of the work required to process biosamples can be automated in bulk. By setting up automation ahead of time using the command line interface (CLI), sequencing runs can be automatically passed or failed based on their sequencing quality, converted to FASTQ datasets, used as inputs in an app, and then passed or failed based on their app metrics. Automation removes much of the time-consuming and error-prone manual work of processing sequencing data into downstream results.
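The pass/fail step of such automation amounts to comparing metrics against thresholds defined ahead of time. The Python sketch below shows that logic in isolation; the metric names and minimum values are hypothetical and are not taken from BaseSpace Sequence Hub or its CLI.

```python
def failed_metrics(lane_metrics, minimums):
    """Return the names of metrics that are missing or below their minimum."""
    return sorted(
        name
        for name, minimum in minimums.items()
        if lane_metrics.get(name) is None or lane_metrics[name] < minimum
    )


def lane_qc_status(lane_metrics, minimums):
    """Report 'passed' only when every thresholded metric meets its minimum."""
    return "failed" if failed_metrics(lane_metrics, minimums) else "passed"


# Illustrative thresholds and lane values (assumed names and numbers).
minimums = {"percent_q30": 80.0, "yield_gb": 100.0}
lane = {"percent_q30": 85.2, "yield_gb": 90.0}
```

Returning the list of failing metrics, rather than only a status, mirrors the idea of being able to see why a lane failed.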

Figure 3 View lane metric details to understand why a lane may have failed.

Figure 4 A comparison of the number of touchpoints when using the new automation features of BaseSpace Sequence Hub

Improved User Interface

The updated interface provides quick access to all of your data from the My Data menu, while the new Action Toolbar contains new and improved app functions such as requeues, QC status changes, workflows, and collaboration tools.

Figure 5 The new Action toolbar contains app functions like requeues, QC status changes, workflows, and collaboration tools.

The Analyses page provides a listing of all analyses in your account. The filters on this page help you quickly narrow your search for specific analyses by their current status.

The Projects and Runs pages function the same as before, providing quick access to all of your sequencing projects and instrument runs.

Advanced Automation and Integration Toolset

Alongside our updated data model, we’ve introduced version 2 of the API, which enables you to interact directly with your data and integrate systems together with your BaseSpace Sequence Hub account.

The new automation tools in version 2 of the API:

  • Correspond to the new biosample-centric data model
  • Improve performance and robustness of the solution
  • Include new documentation

Note: The version 1 API is still fully supported and maintained, although we are focusing primarily on version 2 API development. The version 1 API documentation is maintained here.

Version 2 of BaseSpace CLI has been built using the version 2 API. BaseSpace CLI can be used to read data from your BaseSpace Sequence Hub account and create new data by uploading data and launching apps. In addition, the new BaseSpace CLI can be used to create automated analysis workflows and import biosamples.

BaseMount is a command-line tool that lets you browse runs, projects, biosamples, and datasets, and interact directly with the associated files exactly as you would with any other file system.

We hope the new functionality of BaseSpace Sequence Hub enables your lab to boost productivity and discovery. View a video or visit our updated Support Site to learn more about how to use all the new features and tools. Please contact us at techsupport@illumina.com if you have any questions or comments.

Sincerely,
The BaseSpace Sequence Hub Team

References

  1. CLI documentation: https://developer.basespace.illumina.com/docs/content/documentation/cli/cli-overview
  2. CLI automated workflow creation documentation: https://developer.basespace.illumina.com/docs/content/documentation/cli/cli-examples
  3. Version 1 API documentation: https://developer.basespace.illumina.com/docs/content/documentation/rest-api/v1-api-reference
  4. Version 2 API documentation: https://developer.basespace.illumina.com/docs/content/documentation/rest-api/api-reference

 

BaseSpace Cohort Analyzer Update

BaseSpace Cohort Analyzer enables users to apply complex genomic data in novel ways across the entire drug discovery and development process. Pharmaceutical and biotechnology organizations can incorporate data analysis and interpretation into biomarker discovery, translational research, and clinical trials.

We are writing to summarize recent changes to BaseSpace Cohort Analyzer and to share our plans for 2017.

2016 Highlights

Last year our main focus was on enabling you to upload basic cancer data in a quick, easy, automated and secure manner. We implemented the following features:


Updated Command Line Tools – Wait for App Dependencies

We are pleased to announce a minor release of BaseSpaceCLI (0.8.10) with some improvements to existing tools and a new tool – bs wait.

‘bs wait’

The new wait command for BaseSpaceCLI is analogous to the shell command wait and was designed to help connect together separate app launches. The wait command accepts as arguments one or more appsessions and will then wait for these appsessions to finish, polling based on a specified interval (default 60 seconds). Once they have all finished, bs wait returns the appresults that have been generated by the provided appsessions. The intention is that these appresults can then be passed into another app launch, providing some limited app-chaining capabilities.
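The polling behavior described above can be sketched in Python as follows. This is not the implementation of bs wait: the `get_status` callback, the status strings, and the function name are assumptions, and the real command returns appresults rather than the appsession identifiers returned here.

```python
import time

# Hypothetical status values that count as "finished" in this sketch.
FINISHED_STATUSES = {"Complete", "Aborted"}


def wait_for_appsessions(appsession_ids, get_status, interval_seconds=60,
                         sleep=time.sleep):
    """Block until every appsession reports a finished status, checking all
    pending sessions once per polling interval."""
    pending = set(appsession_ids)
    while pending:
        pending = {a for a in pending if get_status(a) not in FINISHED_STATUSES}
        if pending:
            sleep(interval_seconds)
    return list(appsession_ids)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays, and the 60-second default matches the default polling interval mentioned above.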