This update includes a recent version of hap.py which (in combination with vcfeval) has been selected by the GA4GH as the recommended tool for small variant call benchmarking. For more details, see the publication at: https://www.ncbi.nlm.nih.gov/pubmed/30858580
By Swathi A. Ramani, Staff Product Manager – BaseSpace Sequence Hub
If you’ve been using BaseSpace Sequence Hub for some time, you probably know that there is a lot more to the platform than the browser console. The Command Line Interface (CLI) is an easy-to-use tool that lets users do more with BaseSpace by managing common (and not so common) tasks associated with their genomic data and analysis.
The CLI has been in development for over 4 years, and was created by our talented UK team. They needed automation tools to help sequence more than 20 petabases of data for the 100k Genomes Project. Over the past few months, we’ve been hard at work on the next generation of the CLI. We are thrilled to announce that our CLI is no longer in Beta! In our latest release, we have launched the officially supported BaseSpace (BS) Sequence Hub (BSSH) CLI v1.0.0, and all the exciting features that come with it. In the years since the initial release, we saw incredible product uptake and a lot of positive feedback from the BaseSpace community. With this launch announcement, we are delivering on some of your biggest requests for a robust feature set that simplifies data and analysis wrangling and process automation. This is a great foundation on which we can continue expanding our toolset.
Rich Built-in Features
BS CLI v1.0.0 is a completely different beast from its previous version. With just one file to download and configure, you can control multiple BaseSpace services and automate them through scripts, including uploading samples, downloading runs, launching or stopping apps and workflows, setting custom quality filters for your runs, generating pre-signed URLs, and much more!
Flexible install process: The CLI is installed by downloading a single binary with no additional dependencies, which enables you to install the CLI in an environment where you do not have administrator privileges.
Support for Linux, Mac and Windows (32 and 64 bit) operating systems
Rich options for listing details and filtering with customized output for seamless multi-command pipelines and scripts
Powerful data management features including creation, renaming and deletion of BSSH entities
Efficient upload of FASTQ datasets or any other file types, coupled with fast download of runs, projects, biosamples and datasets
Parameterize, launch, monitor and kill analyses running remotely in BSSH
Importantly, we’ve made sure the above features work nicely together so you don’t have to do the plumbing yourself. For a full list of worked examples visit our help site.
Try It Out Today!
Our new BS CLI v1.0.0 is ready to serve as your standard toolchain to programmatically read, create and manipulate data in your BSSH account, automate routine tasks, as well as to efficiently manage your applications. You can try it out right now by following the instructions on our help site.
If you are using existing tools like BaseMount or BaseSpace Copy, these will continue to work. However, as we continue to improve the developer experience, we hope to consolidate our existing tools and add new features to the BS CLI v1.0.0 toolchain.
The more you use BS CLI v1.0.0, the more you will see how powerful it is. We can’t wait to see what you build with it! As always, let us know how we are doing. We want to incorporate best practices into the toolchain as much as possible so that they become customary, so please submit any requests via this blog, Twitter or firstname.lastname@example.org. Happy hacking!
Author: Eric Allen, Associate Director of Bioinformatics at Illumina
As part of the new DRAGEN v3.4 launch, the Illumina software development team has released a new BaseSpace-exclusive DRAGEN app: DRAGEN Enrichment v3.4. Combining the best of DRAGEN with Illumina’s legacy Enrichment 3 App, the DRAGEN Enrichment app provides ultra-rapid analysis and improved accuracy, all at a lower cost per sample.
The DRAGEN Enrichment app is the preferred method for analyzing enrichment data with DRAGEN, delivering a full suite of enrichment-specific metrics and reporting.
Here is what to know:
The DRAGEN Enrichment App is faster and more accurate than the Enrichment (Isaac/Starling) and BWA Enrichment (BWA/GATK) apps, as demonstrated in the visuals below
Small variant calling – The app includes germline and somatic (low-frequency) small variant calling (tumor only); outputs VCF and gVCF in same analysis
Note: Tumor-normal analysis can be conducted by first running the DRAGEN Enrichment app on all your normal and tumor samples, and then running the DRAGEN Somatic app on the resulting BAM files for the Tumor/Normal pairs.
Copy number variant (CNV) calling – uses CNV baseline files based on a panel of normals
Structural variant calling
Enrichment metrics generated:
Read/base enrichment padded/unpadded
% bases covered at 1x, 10x, 20x, 50x
Picard HsMetrics enabled by checkbox
Variety of reference options supported, including hg19, GRCh38 and custom references
Includes built-in targeted region BED files for common enrichment panels, and accepts custom targeted region BEDs
Reports available in-browser and as PDFs and CSVs
Single sample and aggregate reports
Integrated variant annotation (Nirvana) and variant browser
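The "% bases covered at 1x, 10x, 20x, 50x" metrics listed above can be illustrated with a minimal sketch. It assumes the usual definition (the share of targeted positions whose read depth meets or exceeds each threshold); the source does not state how the app computes the metric, and the function name and depth values are hypothetical.

```python
def percent_covered(depths, thresholds=(1, 10, 20, 50)):
    """Return {threshold: percent of positions with depth >= threshold}."""
    total = len(depths)
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {t: 100.0 * sum(d >= t for d in depths) / total for t in thresholds}


# Hypothetical per-base depths over a tiny targeted region (illustrative only).
example_depths = [0, 5, 12, 25, 60, 80, 15, 9, 51, 30]

coverage = percent_covered(example_depths)
for threshold, pct in coverage.items():
    print(f"{threshold}x: {pct:.1f}%")
```

In practice the depths would come from the aligned reads restricted to the targeted-region BED file rather than from a hand-written list.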
The improved small variant calling over other available BaseSpace app solutions is shown below for one replicate of Coriell sample NA12878 with 106x depth:
CNV calling is also enabled in the DRAGEN Enrichment app. The screenshot below from IGV shows a 937,697 bp CNV loss found in a melanoma cancer sample (Me01/ERR174231) around the chromosomal region chr9:125239269-126176965. The sample data was obtained from NCBI’s Sequence Read Archive (accession ERR174231) using the SRA Import BaseSpace App.
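The reported size of the loss can be checked against the quoted coordinates. The sketch below assumes the region is a 1-based, inclusive interval; the source does not state the coordinate convention.

```python
# Region quoted in the text, assumed to be 1-based and inclusive.
region = "chr9:125239269-126176965"

chrom, span = region.split(":")
start, end = (int(x) for x in span.split("-"))

# Inclusive interval length: both endpoints count.
length_bp = end - start + 1
print(f"{chrom}: {length_bp:,} bp")
```

Under that assumption the interval length matches the 937,697 bp stated above.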
Somatic/low-frequency variant calling is also enabled. The table below demonstrates the usefulness of this somatic calling tool:
We’ve also incorporated many of the comprehensive metrics and reporting features built into the legacy Enrichment 3.1.0 app, including read-, base-, and target-level enrichment metrics, as well as the variant table for simple variant call browsing and filtering.
We hope this update enables you to discover new insights. Stay tuned for more app announcements, and let us know if you have any questions.
FOR RESEARCH USE ONLY. NOT FOR USE IN DIAGNOSTIC PROCEDURES.
BaseSpace™ Sequence Hub is used by investigators around the world to facilitate and scale their sequencing and genomic data analysis operations. At Illumina, we understand that security, privacy, and confidentiality are complex issues, and we are committed to protecting our software-as-a-service (SaaS) customers’ data.
To ensure that our customers remain compliant with upcoming changes to the EU General Data Protection Regulation (GDPR), we’ve made a number of updates to privacy practices, policies and agreements that are effective May 25, 2018 for all users globally. These changes include explaining in more detail how we use your information, including your choices, rights, and controls.
Privacy and compliance are a shared responsibility between Illumina and our customers. We are responsible for the security of the BaseSpace Sequence Hub platform. Our cloud provider, Amazon Web Services (AWS), is responsible for providing the tools, services and functionality that enable both the data controller (our customers) and the data processor (Illumina) to be successful.
Figure 1: Shared Responsibility Model
A short summary of our changes:
Improved clarity and transparency: As a key part of GDPR compliance, we’ve described our data processing practices in clear language. For instruments sending Instrument Performance Data (IPD) to BaseSpace Sequence Hub, or connected in the Run Monitoring or Storage and Analysis mode, our updated Illumina® Proactive Technical Note (Link) clearly explains what data is sent to BaseSpace in each of the connectivity modes.
Data Protection Addendum: BaseSpace Sequence Hub leverages AWS to deliver its services. The updated AWS Service Terms (Link) incorporate the GDPR Data Processing Addendum (DPA) and will automatically apply to all customers. Illumina is willing to sign a DPA for customers who ask for it.
Opt-in & Opt-out: Sharing data with BaseSpace Sequence Hub, irrespective of connectivity mode, is entirely controlled by our customers. If you would like to opt out of sharing Instrument Performance Data (IPD), Run Monitoring, or Storage and Analysis mode, you can do so at any time.
In addition, we are continually reviewing and updating our security best practices to safeguard your data and the services we provide. We are ISO 27001 certified, which has a direct emphasis on international compliance and governance. Please review our security and data privacy whitepaper (Link) to learn more about our security practices.
We hope this makes your use of our SaaS products much easier. As always, please contact us at email@example.com if you have any questions.
The ability to monitor sequencing runs in real time helps users identify issues early and prevent costly sequencing errors. Many users rely on the Sequencing Analysis Viewer (SAV) to access detailed quality metrics generated by the real-time analysis software on Illumina instruments.
BaseSpace Sequence Hub lets users remotely monitor their sequencing runs with the Run Charts function, whose interface is very similar to that of SAV. We have recently released a synchronized update with SAV to offer an expanded set of metrics for monitoring run quality. At the same time, we have added a few capabilities previously only present in SAV. These enhancements provide a consistent experience and enable users to make informed decisions on the quality of their sequencing runs, whether they are standing in front of their instrument accessing SAV or monitoring the run remotely using BaseSpace Sequence Hub.
Expanded menu of metrics that maintains consistency with SAV
BaseSpace Sequence Hub now includes per cycle Phasing and Pre-phasing metrics, % No Call, and Median QScore measures in the Charts section of Run Monitoring. These measures were also released as part of SAV 2.4.5. % No Call & Median QScores are available for all sequencing platforms. The new Phasing/Pre-phasing metrics are available for all platforms except MiSeq and HiSeq 2000/2500.
Traditional Phasing (and pre-phasing) metrics, which were calculated once at cycle 25, are now listed as “Legacy Phasing Rate.” The new per-cycle weights are listed as “Phasing Weight” in the Run Charts.
The Charts section of Run Monitoring now includes the same menu structure as SAV 2.4.5. Metrics in the drop-down menus now appear only if they are available for the cycle, significantly improving the usability of the charts.
Extracted, Called, and Scored cycles have a minimum-maximum range
Run Monitoring now provides Extracted, Called, and Scored cycles as a minimum-maximum range during an instrument run. Previously, Run Monitoring showed only the maximum cycles. A wide spread between the leading and lagging tile might be an indication of a run problem. Now users can easily spot a problem with their run on both SAV and BaseSpace Sequence Hub.
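As a minimal sketch of why the minimum-maximum range is useful, the example below derives the range from per-tile cycle counts and flags a wide spread between the leading and lagging tile. The tile names, cycle counts, and the `max_spread` threshold are hypothetical; the source does not define what counts as a wide spread.

```python
def cycle_range(cycles_by_tile):
    """Return (min, max) of the last completed cycle across tiles."""
    values = list(cycles_by_tile.values())
    return min(values), max(values)


def spread_is_wide(cycles_by_tile, max_spread=5):
    """Flag the run when leading and lagging tiles differ by more than max_spread cycles."""
    low, high = cycle_range(cycles_by_tile)
    return (high - low) > max_spread


# Hypothetical last extracted cycle per tile (illustrative only).
extracted = {"1101": 152, "1102": 151, "1103": 140}

low, high = cycle_range(extracted)
print(f"Extracted: {low}-{high}")
if spread_is_wide(extracted):
    print("Wide spread between leading and lagging tile; check the run.")
```

Showing only the maximum (152 here) would hide the lagging tile at cycle 140, which is exactly the signal the range exposes.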
New Metrics in Both SAV and BaseSpace Sequence Hub
In addition to the changes enumerated above, both SAV and BaseSpace Sequence Hub now include Occupied Count (K) and % Occupied measures in the Charts section of Run Monitoring for NovaSeq systems. The Occupied Count is a measure of the number of wells on the flow cell with DNA. Adding these new metrics will help users understand their loading concentrations and identify issues with their sequencing run.
For Research Use Only. Not for use in diagnostic procedures.
Next-generation sequencing (NGS) systems now produce more data than ever before. Additionally, a typical NGS workflow involves manual, time-consuming touchpoints for quality control, analysis setup, and results review. As a result, labs that perform NGS or other complex, high-volume processing of samples can be overwhelmed by managing the workflows and data generated. To address these issues and simplify NGS research, we are happy to announce the new version of BaseSpace Sequence Hub. It is designed to enhance your laboratory’s efficiency and support the needs of high-throughput labs.
This update introduces a biosample-centric data model that provides tracking of all biosample activity from lab preparation through analysis delivery. We’re also introducing the following features:
New automated quality control features
Automated app launches and workflows
An updated Application Programming Interface (API) to help you streamline your next-generation sequencing (NGS) workflows
An improved user interface that helps you access your data and perform functions more quickly
Biosample-centric Data Model
Our new biosample-centric data model enables easy tracking of all biosample activity from lab preparation through analysis delivery. Biosamples are the data containers that represent the original DNA source material. They are used to trace all sequencing activities, including lab preparation (with LIMS integration), sequencing runs, data analysis, and delivery of data.
The new data model centers on biosamples, the original source of DNA, so you can easily track all biosample activity from lab preparation, with optional laboratory information management system (LIMS) integration, to delivery of analysis results. Biosamples can be used as inputs to multiple sequencing runs, and they can contain multiple datasets, which can live within separate projects.
Important Note: Biosamples with the same name (Sample ID in the sample sheet) are automatically aggregated. The new features will aggregate all FASTQ datasets with the same Sample ID into a single biosample. It is important to give each sample in your sample sheet a unique name; otherwise, distinct samples will be aggregated together. Learn more about automatic data aggregation here.
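The aggregation rule above can be sketched as a simple grouping by Sample ID. This is an illustration of the behavior described, not the platform's implementation; the record fields (`sample_id`, `name`) and the dataset names are hypothetical.

```python
from collections import defaultdict


def aggregate_by_sample_id(datasets):
    """Group FASTQ dataset names under the Sample ID they were sequenced with."""
    biosamples = defaultdict(list)
    for ds in datasets:
        biosamples[ds["sample_id"]].append(ds["name"])
    return dict(biosamples)


# Hypothetical FASTQ datasets from two runs. "Sample1" appears in both runs,
# so both datasets end up in the same biosample, whether or not they came
# from the same source DNA.
fastq_datasets = [
    {"sample_id": "Sample1", "name": "RunA_Sample1_L001"},
    {"sample_id": "Sample2", "name": "RunA_Sample2_L001"},
    {"sample_id": "Sample1", "name": "RunB_Sample1_L001"},
]

biosamples = aggregate_by_sample_id(fastq_datasets)
for sample_id, names in biosamples.items():
    print(sample_id, names)
```

If the two "Sample1" entries were actually different specimens, giving them distinct Sample IDs in the sample sheet would keep them in separate biosamples.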
Automated Lane QC, App Launch, and Analysis QC
After sequencing, much of the work required to process biosamples can be automated in bulk. By setting up automation ahead of time using the command line interface (CLI), sequencing runs can be automatically passed or failed based on their sequencing quality, converted to FASTQ datasets, and used as inputs in an app; the resulting analyses can then be passed or failed based on their app metrics. Automation removes much of the time-consuming and error-prone manual work of processing sequencing data into downstream results.
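The sequence of automated steps described above (lane QC, app launch, analysis QC) can be sketched as plain control flow. This is a conceptual sketch only: it does not call BaseSpace Sequence Hub or the CLI, and the `Lane` type, the `run_app` callback, the metric names, and every threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    number: int
    percent_q30: float


def lane_qc(lanes, min_q30=80.0):
    """Pass or fail each lane on a sequencing-quality threshold (hypothetical)."""
    return {lane.number: lane.percent_q30 >= min_q30 for lane in lanes}


def analysis_qc(metrics, thresholds):
    """Pass only if every thresholded app metric meets its minimum."""
    return all(
        metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )


def process_run(lanes, run_app, thresholds, min_q30=80.0):
    """Lane QC, then app launch on passing lanes, then analysis QC."""
    passed = [n for n, ok in lane_qc(lanes, min_q30).items() if ok]
    if not passed:
        return {"lanes_passed": [], "analysis": "not launched"}
    metrics = run_app(passed)  # stands in for FASTQ generation plus the app run
    status = "passed" if analysis_qc(metrics, thresholds) else "failed"
    return {"lanes_passed": passed, "analysis": status}


# Stub app returning made-up metrics, for illustration only.
def stub_app(lane_numbers):
    return {"mean_coverage": 35.0}


result = process_run(
    lanes=[Lane(1, 92.5), Lane(2, 71.0)],
    run_app=stub_app,
    thresholds={"mean_coverage": 30.0},
)
print(result)
```

Keeping each QC gate as a separate function mirrors the way the text treats lane QC and analysis QC as distinct checkpoints that can each stop the pipeline.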
Improved User Interface
The updated interface provides quick access to all of your data from the My Data menu, while the new Action Toolbar contains new and improved app functions such as requeues, QC status changes, workflows, and collaboration tools.
The Analyses page provides a listing of all analyses in your account. The filters on this page help you quickly narrow your search for specific analyses by their current status.
The Projects and Runs pages function the same as before, providing quick access to all of your sequencing projects and instrument runs.
Advanced Automation and Integration Toolset
Alongside our updated data model, we’ve introduced version 2 of the API, which enables you to interact directly with your data and integrate systems together with your BaseSpace Sequence Hub account.
The new automation tools in version 2 of the API:
Correspond to the new biosample-centric data model
Improve performance and robustness of the solution
Include new documentation
Note: The version 1 API is still fully supported and maintained, although we are focusing primarily on version 2 API development. The version 1 API documentation is maintained here.
Version 2 of BaseSpace CLI has been built using the version 2 API. BaseSpace CLI can be used to read data from your BaseSpace Sequence Hub account and to create new data by uploading data and launching apps. In addition, the new BaseSpace CLI can be used to create automated analysis workflows and import biosamples.
BaseMount is a command-line tool that lets you browse runs, projects, biosamples, and datasets, and interact directly with the associated files exactly as you would in any other file system.
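Because mounted data behaves like an ordinary file system, standard file APIs are enough to work with it. The sketch below searches a directory tree for FASTQ files; to stay runnable without an account, it builds a temporary directory as a stand-in for a mount point. The directory layout and file names are hypothetical and may not match what BaseMount actually exposes.

```python
import tempfile
from pathlib import Path


def find_fastqs(mount_point):
    """Return all *.fastq.gz files under mount_point, sorted by path."""
    return sorted(p for p in Path(mount_point).rglob("*.fastq.gz") if p.is_file())


# Stand-in for a mounted account: a temporary tree with a made-up layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sample_dir = root / "Projects" / "DemoProject" / "Samples" / "Sample1" / "Files"
    sample_dir.mkdir(parents=True)
    (sample_dir / "Sample1_S1_L001_R1_001.fastq.gz").write_bytes(b"")
    (sample_dir / "notes.txt").write_text("not a FASTQ")

    found = [p.name for p in find_fastqs(root)]
    print(found)
```

With a real mount, `find_fastqs` would be pointed at the mount directory instead of the temporary tree.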
We hope the new functionality of BaseSpace Sequence Hub enables your lab to boost productivity and discovery. View a video or visit our updated Support Site to learn more about how to use all the new features and tools. Please contact us at firstname.lastname@example.org if you have any questions or comments.