DRAGEN™ Enrichment App – Accurate, rapid analysis for germline and somatic exome experiments

Author: Eric Allen, Associate Director of Bioinformatics at Illumina

As part of the new DRAGEN v3.4 launch, the Illumina software development team has released a new BaseSpace-exclusive DRAGEN app: DRAGEN Enrichment v3.4. Combining the best of DRAGEN with Illumina’s legacy Enrichment 3 App, the DRAGEN Enrichment app provides ultra-rapid analysis and improved accuracy, all at a lower cost per sample.

The DRAGEN Enrichment app is the preferred method for analyzing enrichment data with DRAGEN, delivering a full suite of enrichment-specific metrics and reporting.

Here is what to know:

  • The DRAGEN Enrichment App is faster and more accurate than the Enrichment (Isaac/Starling) and BWA Enrichment (BWA/GATK) apps, as shown in the comparison below
  • Variant Calling:
    • Small variant calling – The app includes germline and somatic (low-frequency, tumor-only) small variant calling; it outputs VCF and gVCF in the same analysis
      • Note: Tumor-normal analysis can be conducted by first running the DRAGEN Enrichment app on all normal and tumor samples, and then running the DRAGEN Somatic app on the resulting BAM files for the tumor/normal pairs.
    • Copy number variant (CNV) calling – uses CNV baseline files based on a panel of normals
    • Structural variant calling
  • Enrichment metrics generated:
    • Read/base enrichment padded/unpadded
    • Uniformity
    • % bases covered at 1x, 10x, 20x, 50x
    • Picard HsMetrics enabled by checkbox
  • Variety of reference options supported, including hg19, GRCh38 and custom references
  • Includes built-in targeted region BED files for common enrichment panels, and accepts custom targeted region BEDs
  • Extensive reporting:
    • In-browser, PDFs, and CSVs
    • Single sample and aggregate reports
  • Integrated variant annotation (Nirvana) and variant browser
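
To make the coverage metrics in the list above concrete, here is a minimal Python sketch of how a "% bases covered at Nx" figure could be derived from per-base depths over the targeted regions. The function name and the toy depth values are hypothetical, and this is not a description of how the app computes its metrics internally.

```python
from typing import Dict, Iterable, Sequence


def pct_bases_covered(depths: Sequence[int],
                      thresholds: Iterable[int] = (1, 10, 20, 50)) -> Dict[int, float]:
    """Return the percentage of target bases with depth >= each threshold."""
    if not depths:
        raise ValueError("depths must contain at least one target base")
    total = len(depths)
    return {t: 100.0 * sum(d >= t for d in depths) / total for t in thresholds}


# Toy per-base depths over a hypothetical 10-base target region.
toy_depths = [0, 3, 12, 25, 25, 48, 51, 60, 75, 110]
coverage = pct_bases_covered(toy_depths)
for threshold, pct in coverage.items():
    print(f"{threshold}x: {pct:.1f}% of target bases")
```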

The table below compares small variant calling accuracy and run time against the other available BaseSpace apps, for one replicate of Coriell sample NA12878 at 106x depth:

Analysis App                       | App Execution Time | DRAGEN-only Execution Time | SNV Recall | SNV Precision | Indel Recall | Indel Precision
DRAGEN Enrichment v3.4.5           | 16m 4s             | 6m 50s                     | 95.04%     | 99.49%        | 86.90%       | 92.18%
(Isaac/Starling) Enrichment v3.1.0 | 53m 20s            | NA                         | 93.26%     | 99.38%        | 78.29%       | 86.90%
BWA Enrichment v2.1.2              | 1h 23m 2s          | NA                         | 90.66%     | 99.78%        | 72.85%       | 89.44%


• Example sample (s01-NFE-CEX-NA12878-demo.vcf) was prepared using the Nextera Flex for Enrichment Library Preparation kit with dual indices and sequenced on a NovaSeq™ S2 flow cell: https://basespace.illumina.com/s/FaxWSm2X1gwO
• Variant accuracy comparison was performed using the Variant Calling Assessment Tool v3.2.0 app.
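
The run-time differences in the table can be turned into speedup factors with a few lines of Python. This is only a worked-arithmetic sketch: the helper `to_seconds` is a hypothetical name, and the durations are the app execution times reported above.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '1h 23m 2s' into seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)\s*([hms])", duration)
    if not parts:
        raise ValueError(f"unrecognized duration: {duration!r}")
    return sum(int(value) * units[unit] for value, unit in parts)


# App execution times as reported in the comparison table.
app_times = {
    "DRAGEN Enrichment v3.4.5": "16m 4s",
    "Enrichment v3.1.0 (Isaac/Starling)": "53m 20s",
    "BWA Enrichment v2.1.2": "1h 23m 2s",
}

baseline = to_seconds(app_times["DRAGEN Enrichment v3.4.5"])
speedups = {app: to_seconds(t) / baseline for app, t in app_times.items()}
for app, factor in speedups.items():
    print(f"{app}: {factor:.1f}x the DRAGEN Enrichment run time")
```

On these numbers, the legacy apps take roughly 3.3 and 5.2 times as long as DRAGEN Enrichment for this one sample.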

CNV calling is also enabled in the DRAGEN Enrichment app. The screenshot below from IGV shows a 937,697 bp CNV loss found in a melanoma sample (Me01/ERR174231) at chr9:125239269-126176965. The sample data was obtained from NCBI’s Sequence Read Archive (accession ERR174231) using the SRA Import BaseSpace App.

Project: SRA: ERP001844 (Agilent SureSelect – Exome CNV Detection – Melanoma). Publication: Magi et al.
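
The reported CNV size follows directly from the region coordinates. The sketch below assumes the coordinates are 1-based and inclusive (an assumption, though it reproduces the 937,697 bp figure); `region_length` is a hypothetical helper name.

```python
def region_length(region: str) -> int:
    """Length in bp of a 1-based, inclusive region such as 'chr9:100-200'."""
    _, span = region.split(":")
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    if end < start:
        raise ValueError("end must not be smaller than start")
    # Inclusive coordinates: both endpoints count, hence the +1.
    return end - start + 1


cnv_length = region_length("chr9:125239269-126176965")
print(f"{cnv_length:,} bp")  # prints "937,697 bp"
```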

Somatic/low-frequency variant calling is also enabled. The table below compares the expected variant frequency (VF) of five known variants in the HD753 reference sample with the VF measured by the DRAGEN Enrichment app:

Variant Type   | Chr    | Pos       | Gene   | Variant         | HD753 – Expected VF (%) | HD753 – Measured VF (DRAGEN Enrichment) (%)
SNV Low GC     | chr.3  | 178936091 | PIK3CA | E545K           | 5.6                     | 3.8
SNV High GC    | chr.19 | 3118942   | GNA11  | Q209L           | 5.6                     | 6
Long Deletion  | chr.7  | 55242464  | EGFR   | ΔE746 – A750    | 5.3                     | 3.3
Long Insertion | chr.7  | 55248998  | EGFR   | V769_D770insASV | 5.6                     | 3.7
SNV High GC    | chr.14 | 105246551 | AKT1   | E17K            | 5                       | 5.7

Project: NovaSeq S4: Nextera Flex for Enrichment (HCC1187, HCC1395, HCC1954, HD753, Coriell Mixture). 1% VF cutoff
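
One way to read the HD753 table is to check each measured VF against the 1% cutoff and against its expected value. The sketch below does this in Python. It assumes the cutoff acts as a simple minimum on the measured VF, which the source does not spell out, and the variable names are illustrative.

```python
# (variant, expected VF %, measured VF %) taken from the HD753 table.
hd753 = [
    ("PIK3CA E545K", 5.6, 3.8),
    ("GNA11 Q209L", 5.6, 6.0),
    ("EGFR ΔE746 – A750", 5.3, 3.3),
    ("EGFR V769_D770insASV", 5.6, 3.7),
    ("AKT1 E17K", 5.0, 5.7),
]
VF_CUTOFF = 1.0  # percent; assumed here to be a minimum on the measured VF

# Variants whose measured VF clears the cutoff.
detected = [name for name, _, measured in hd753 if measured >= VF_CUTOFF]

# Ratio of measured to expected VF for each variant.
ratios = {name: measured / expected for name, expected, measured in hd753}

print(f"{len(detected)} of {len(hd753)} variants at or above {VF_CUTOFF}% VF")
for name, ratio in ratios.items():
    print(f"{name}: measured/expected = {ratio:.2f}")
```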

We’ve also incorporated many of the comprehensive metrics and reporting features built into the legacy Enrichment 3.1.0 app, including read-, base-, and target-level enrichment metrics, as well as the variant table for simple variant call browsing and filtering.

We hope this update enables you to discover new insights. Stay tuned for more app announcements, and let us know if you have any questions.

FOR RESEARCH USE ONLY. NOT FOR USE IN DIAGNOSTIC PROCEDURES.
QB#8389

Somatic Pipeline Improvements with DRAGEN v3.3

by Severine Catreux – Associate Director, Bioinformatics FPGA Development

Significant accuracy gains and speed improvements with DRAGEN v3.3, released April 2019

The DRAGEN engineering and bioinformatics team is excited to announce a new DRAGEN release, v3.3. The second of several releases scheduled for 2019, DRAGEN v3.3 contains improvements across the many pipeline offerings now supported by the DRAGEN platform. This includes accuracy improvements in the germline and somatic pipelines, new features (e.g. CNV DeNovo calling and RNA quantification) and speed gains (Somatic T/N, BCL conversion).

Please see the DRAGEN v3.3 Release Notes for more details. This blog highlights the significant updates to the DRAGEN Somatic Pipeline for small variants that are part of the v3.3 release.

As one of DRAGEN’s core pipelines, the DRAGEN Somatic Pipeline for small variants is used by cancer research institutes around the globe. Building on the existing functionality, accuracy and speed of the DRAGEN Somatic Pipeline, the v3.3 release focused heavily on the somatic tumor/normal WGS mode, producing step-function improvements in both accuracy and speed.

Accuracy Improvements:

During the development cycle for v3.3, the DRAGEN engineering and bioinformatics teams took a deep dive into the DRAGEN Somatic Pipeline tumor/normal mode, strengthening the existing algorithm to improve accuracy. In the genotyping module, point estimation of the variant allele frequency was replaced with continuous integration over a range of possible frequencies, which led to significant gains in both sensitivity and precision. Additionally, downstream filtering rules were improved to optimize both sensitivity and precision: clustered variants are treated less stringently, variants positioned at the edge of reads are filtered, and variants with low median base quality or MAPQ are filtered. Finally, the indel PCR error model autocalibration module was made independent between the tumor and the normal control, to allow for differences in library preparation between the two samples.
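
The difference between a point estimate of allele frequency and integration over a range of frequencies can be sketched in a few lines of Python. This is a toy model and not DRAGEN's genotyping algorithm: the binomial read model, the uniform prior over frequencies, the error rate, and all function names are assumptions made for illustration.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k alt-supporting reads out of n at per-read alt probability p."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)


def alt_read_prob(freq: float, error_rate: float) -> float:
    """Chance that a read shows the alt allele at true allele frequency `freq`."""
    return freq * (1.0 - error_rate) + (1.0 - freq) * error_rate


def point_likelihood(k: int, n: int, error_rate: float) -> float:
    """Likelihood evaluated only at the single estimate f = k / n."""
    return binom_pmf(k, n, alt_read_prob(k / n, error_rate))


def integrated_likelihood(k: int, n: int, error_rate: float, steps: int = 1000) -> float:
    """Likelihood averaged over f in [0, 1] under a uniform prior (midpoint rule)."""
    width = 1.0 / steps
    return sum(
        binom_pmf(k, n, alt_read_prob((i + 0.5) * width, error_rate)) * width
        for i in range(steps)
    )


# Toy site: 6 alt reads out of 60, with an assumed 1% per-read error rate.
k, n, err = 6, 60, 0.01
noise_only = binom_pmf(k, n, err)  # likelihood if the true frequency were 0
print(f"point estimate : {point_likelihood(k, n, err):.4g}")
print(f"integrated     : {integrated_likelihood(k, n, err):.4g}")
print(f"noise only     : {noise_only:.4g}")
```

The point estimate scores the data only at the best-fitting frequency, so it ignores how uncertain that frequency is. The integrated version weighs every candidate frequency, which is the general idea behind replacing a point estimate with integration.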

These changes are precursors to further accuracy improvements planned for the DRAGEN v3.4 release, specifically in the area of liquid tumor support, where tumor-in-normal contamination will be taken into account.

Accuracy gains of DRAGEN 3.3 over previous DRAGEN versions (3.2) as well as other pipelines (GATK4 MuTect2 and Strelka2) are shown in the plot below. Gains are measured for both SNVs and indels on most datasets.

Figure 1: Comparison of False-Positives (FP) and False-Negatives (FN) between GATK4, Strelka2, DRAGEN 3.2 and DRAGEN 3.3. Lower values are better.

Figure 2: The above chart shows sensitivity improvements in DRAGEN v3.3 compared with DRAGEN v3.2 for indels and SNPs.

Speed Gains

DRAGEN v3.3 delivers unprecedentedly fast run times for somatic T/N WGS processing. Users of previous DRAGEN versions will notice substantial speed gains in DRAGEN 3.3 (see graph below). For datasets that were previously HMM-limited, v3.3 delivers up to 6-fold speed improvements, with a typical 100x (tumor) and 40x (normal) run finishing within 1 hour and 40 minutes on an on-premises DRAGEN server. In the cloud, run times average 2 hours and 30 minutes.

The run-time gains come from optimizations in the upstream stages of the pipeline: a more efficient way of defining regions of interest, and a higher MAPQ threshold for reads passed downstream, so that fewer reads reach the later stages without loss of sensitivity. Additionally, the accelerated HMM engines were optimized to consume less of the FPGA footprint, so that more engines can run in parallel.
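
The effect of a higher MAPQ threshold on downstream workload can be illustrated with a small Python sketch. The `Read` class, the toy reads, and both threshold values are hypothetical; the source does not state the thresholds DRAGEN uses.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Read:
    name: str
    mapq: int  # mapping quality reported by the aligner


def reads_passing_mapq(reads: Iterable[Read], min_mapq: int) -> List[Read]:
    """Keep only reads whose mapping quality meets the threshold."""
    return [r for r in reads if r.mapq >= min_mapq]


toy_reads = [Read("r1", 0), Read("r2", 12), Read("r3", 30), Read("r4", 60)]
for threshold in (10, 20):  # hypothetical old and new thresholds
    kept = reads_passing_mapq(toy_reads, threshold)
    print(f"MAPQ >= {threshold}: {len(kept)} of {len(toy_reads)} reads passed downstream")
```

Raising the threshold shrinks the set of reads that later, more expensive stages have to process.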

Run-time comparison for T/N WGS Somatic Calling

Figure 3: The above chart compares DRAGEN v3.2 (Jan. 2019) and v3.3 for tumor-normal whole genome sequencing somatic calling. DRAGEN v3.3 introduces significant speed improvements.

About the DRAGEN Somatic Pipeline

The DRAGEN Somatic Pipeline provides highly accurate, ultra-rapid secondary analysis for tumor-only and tumor/normal experiments to identify cancer-associated mutations.

Tumor/Normal Mode

The DRAGEN Somatic Pipeline offers flexible data analysis to suit the specific needs of users. DRAGEN accepts FASTQ, BAM/CRAM, and BCL files and supports NGS input from whole genome, whole exome, and targeted cancer panels. In the tumor/normal pipeline, both samples go through identical processing steps of mapping, aligning, sorting, and duplicate marking. Then, both sets of tumor and normal reads are passed through the somatic variant caller which looks for sites exhibiting a mutation in the tumor reads while showing little to no evidence of the mutation in the normal reads, thus producing a VCF file containing tumor-specific mutations. The Somatic Pipeline also reports allele frequency, allowing users to assess the prevalence of a specific mutation.
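
The core idea of tumor/normal calling (a mutation seen in the tumor reads with little to no evidence in the normal reads) can be sketched as a simple allele-frequency comparison. This Python example is a deliberately simplified threshold rule, not the somatic caller's actual model; the class, function names, and cutoff values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCounts:
    tumor_alt: int     # tumor reads supporting the alternate allele
    tumor_depth: int   # total tumor reads at the site
    normal_alt: int    # normal reads supporting the alternate allele
    normal_depth: int  # total normal reads at the site


def allele_frequency(alt: int, depth: int) -> float:
    """Fraction of reads supporting the alternate allele (0.0 if no coverage)."""
    return alt / depth if depth > 0 else 0.0


def looks_tumor_specific(site: SiteCounts,
                         min_tumor_af: float = 0.05,
                         max_normal_af: float = 0.01) -> bool:
    """True if the alt allele is present in the tumor and nearly absent in the normal."""
    tumor_af = allele_frequency(site.tumor_alt, site.tumor_depth)
    normal_af = allele_frequency(site.normal_alt, site.normal_depth)
    return tumor_af >= min_tumor_af and normal_af <= max_normal_af


somatic_like = SiteCounts(tumor_alt=12, tumor_depth=100, normal_alt=0, normal_depth=40)
germline_like = SiteCounts(tumor_alt=48, tumor_depth=100, normal_alt=21, normal_depth=40)
print(looks_tumor_specific(somatic_like))   # True
print(looks_tumor_specific(germline_like))  # False
```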


Figure 4: Tumor-Normal pipeline diagram

Tumor-only Mode

In the tumor-only pipeline, users input NGS data from a tumor sample and run it through the same pipeline as for tumor/normal analysis, but without a matched normal sample. The somatic variant caller contains algorithms that distinguish low-frequency alleles from background noise. Although the resulting VCF file does not distinguish germline from somatic variants, it allows researchers and clinicians to determine whether a mutation is present in a tumor sample and at what allele frequency.

Figure 5: Tumor-only pipeline diagram

Have any feedback, suggestions or data that you’d like to share with the DRAGEN team? Our new community forum is an active, collaborative hub for connecting and sharing feedback.


For Research Use Only. Not for use in diagnostic procedures


Doing more with DRAGEN™ v3.2.8

Advancing Workflows through Relentless Innovation

We’ve been busy over the last few months! Back in May, Illumina announced the acquisition of Edico Genome and the DRAGEN™ (Dynamic Read Analysis for GENomics) technology. Since then, we have been hard at work expanding DRAGEN’s capabilities to provide more advanced, robust and performant pipelines for our customers. With the inclusion of DRAGEN into the Illumina ecosystem, we are now able to take advantage of the expertise of both teams to build out an expanded chest of tools that offer added functionality, benefits and ease-of-use.

The team has come a long way since we last published about DRAGEN on the BaseSpace™ Blog, and we are excited to share some insight into what we have been working on. Over the coming months, we will continue to post about our latest releases and activities.

Earlier this month, we released DRAGEN v3.2.8, which introduces a variety of new capabilities designed to deliver more insights from your data.
