MiXCR Immune Repertoire Analyzer version 2.1.11 from MiLaboratories is now available at Illumina BaseSpace™

  • Youting Sun, Senior Bioinformatics Scientist at Illumina
  • Dmitriy Chudakov, CSO at MiLaboratories https://milaboratory.com

The upgraded version of the MiLaboratories LLC flagship software product, MiXCR, is now officially available as an Illumina BaseSpace™ application.

MiXCR is a “gold standard” analytical package for T-cell receptor (TCR) and immunoglobulin (IG) repertoire profiling. Analysts use MiXCR to extract immune repertoires from any type of sequencing data with any level of TCR/IG coverage, ranging from well-enriched libraries such as multiplex PCR or targeted 5’RACE to “rare event” datasets that contain only a few target sequences among hundreds of millions of reads, such as RNA-Seq and even Exome-Seq data.

In the new version, simple selection of species (Human or Mouse), template material (RNA or DNA), library type (targeted or random), and basic information on library preparation enables appropriate analysis settings for a variety of immune repertoire experiment scenarios:

MiXCR Immune Repertoire Analyzer v2.1.11 Input Form Parameters.

Either full-length VDJ or CDR3-only repertoires can be extracted for the TCR or IG chains of interest, with or without out-of-frame and stop codon-containing clonotype variants:

MiXCR Immune Repertoire Analyzer v2.1.11 Analysis Settings.

The application also provides post-analysis metrics in the form of interactive reports, including:

Basic statistics.
Spectratype with major clonotypes.
Quantile Statistics on clonotype frequencies.
Clonotypes with colorized V, D, and J segments.

High extraction efficiency for any type of sequencing data, combined with high accuracy, should make the new MiXCR BaseSpace version a useful resource for many researchers. The ability to review basic parameters and immediately download the resulting figures for reports and publications makes it convenient for everyday work on immune repertoires.

Analysis of amplicon data

Recommended settings for panels (a) Immune repertoire panel and (b) TCR-beta SR panel:

  • Starting material: RNA for panel (a), RNA or Genomic DNA for panel (b)
  • Library type: Targeted TCR/IG library amplification (5’RACE, Amplicon, Multiplex, etc)
  • 5’-end of the library: V gene single primer/multiplex
  • 3’-end of the library: J gene single primer/multiplex
  • Presence of PCR primers and/or adapter sequence: Absent / nearly absent / trimmed
  • Target region: CDR3

For Research Use Only.  Not for use in diagnostic procedures.


Introducing Enrichment v3.0 with Enhanced Variant Calling


Enrichment v3.0

The new Enrichment v3.0 BaseSpace® App (formerly called Isaac Enrichment) introduces major improvements and new features including:

  • Improved small variant calling
  • Copy number variant (CNV) calling
  • Structural variant calling
  • Somatic/low-frequency variant calling
  • Ability to start from FASTQ or BAM
  • GRCh38 reference added
  • Variant table CSV file including variant frequencies
  • Improved variant annotation engine
  • Improved metrics engine


Differential Methylation Analysis with the MethylKit BaseSpace Labs App

In May 2015, Illumina introduced the MethylSeq 1.0 BaseSpace app for analyzing bisulfite sequencing data. Now we are happy to announce the release of the MethylKit BaseSpace Labs app (https://basespace.illumina.com/apps/1550550/MethylKit), which focuses on differential methylation analysis between two groups of bisulfite sequencing samples. This BaseSpace Labs app is based on the MethylKit R package, published in 2012 in Genome Biology (http://www.genomebiology.com/2012/13/10/r87). The MethylKit app includes these features:

  • Coverage Stats Plot for each sample
  • Methylation Stats Plot for each sample
  • Methylation Correlation Plot
  • Differential Methylation Summary Table (Per Chromosome)
  • Differential Methylation Regions (in csv file and bigwig file)
  • Methylation Stats Summary
  • Methylation Stats Percentile Information


Variant calling assessment using Platinum Genomes, NIST Genome in a Bottle, and VCAT 2.0

With the rapid improvements in sequencing throughput, cost, and ease of use, it’s becoming routine to generate lots of variant calls in the form of VCF files. But how do you know if your new variant calls are accurate? How can a non-bioinformatician compare variant calls from different sequencing platforms, reagent kits, biological samples, or software pipelines? Illumina is now offering a carefully designed and highly curated data set and a corresponding BaseSpace Labs App to address these types of comparison questions.

The Platinum Genomes project was started in 2011 with the goal of creating a high confidence, “platinum” quality reference variant call set. This was accomplished by sequencing a large family to high depth using a PCR-Free sample prep to maximize variant calling sensitivity. A large set of candidate variants was obtained from multiple methods and technologies. Candidates that were pedigree consistent were included in the reference call set. Based on this approach, Illumina has derived a set of high-confidence, pedigree-validated reference variant calls for Coriell samples NA12877 and NA12878.
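To make the idea of pedigree consistency concrete, here is a minimal sketch of a Mendelian check on a single parent-child trio. It is only an illustration: the Platinum Genomes project used a larger family, its actual method is not described in this post, and the function name and genotype encoding below are assumptions made for the example.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's genotype can be formed by taking
    one allele from the mother and one from the father.

    Genotypes are 2-tuples of allele indices, e.g. (0, 1) for a
    heterozygous call (0 = reference, 1 = alternate). This encoding
    is a hypothetical choice for illustration.
    """
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


# Toy genotypes, not real data.
print(is_mendelian_consistent((0, 1), (0, 0), (1, 1)))  # consistent
print(is_mendelian_consistent((1, 1), (0, 0), (0, 1)))  # inconsistent
```

A candidate variant whose genotypes fail a check of this kind in the family would be a natural one to exclude from a reference call set.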

The full set of Platinum Genomes public data and documentation are freely available at http://www.illumina.com/platinumgenomes/ . The BaseSpace Platinum Genomes Project also has copies of the platinum VCF files.

Please cite the Platinum Genomes website and Illumina, Inc. in publications and other public usage of the Platinum Genomes data.

In addition, Illumina has upgraded the Variant Calling Assessment Tool (VCAT 2.0) BaseSpace app. The app calculates SNV and indel statistics and optionally determines the overlap between the input variant call sets. Additionally, the quality of SNV and indel calls can be assessed against Platinum Genomes and/or NIST Genome in a Bottle (GIAB) reference variant calls. No existing tool currently offers a simple user interface for using both of these resources. The accuracy and comparison logic in VCAT is primarily based on vcftools, a commonly used open source toolkit for analyzing variant calls. More insight into how VCAT works is available by browsing the VCAT log file.

The Platinum Genomes project is led by Epameinondas Fritzilas and the VCAT project is led by Robert Schmieder, while many other team members have contributed. Please note that while both Platinum Genomes and VCAT are freely available, Illumina does not offer technical support for either of these resources.

There are many interesting ways to use these powerful new tools together. Here’s an example:

Case study on exome sequencing: How much depth is enough?

Using the “Combine Samples” feature in BaseSpace, Nextera Rapid Capture Exome samples of approximately 50x, 100x, 200x, and 400x were created from replicates of Coriell sample NA12878. The source data is here. A BaseSpace Project containing the resulting VCF files and the VCAT 2.0 results is here. The Platinum Genomes v7 recall numbers below suggest that 50x exome depth may only find 80% of the SNVs and 70% of the indels, while exome depths greater than 200x enable finding over 95% of SNVs and over 88% of indels.
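For readers unfamiliar with the recall metric, the sketch below shows how it can be computed from a reference call set and a query call set. It is not VCAT's implementation (VCAT relies primarily on vcftools); the function name, the variant keys, and the toy variants are all assumptions made for illustration.

```python
def recall(truth, calls):
    """Fraction of reference (truth) variants that also appear in the calls."""
    if not truth:
        raise ValueError("truth set is empty")
    return len(truth & calls) / len(truth)


# Variants keyed as (chrom, pos, ref, alt). All values are made up.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr2", 300, "T", "C"),
    ("chr3", 10, "A", "AT"),
}

# A shallower sample misses one truth variant and adds one false positive.
calls_shallow = (truth - {("chr3", 10, "A", "AT")}) | {("chr4", 5, "G", "C")}
# A deeper sample recovers every truth variant.
calls_deep = set(truth)

print(f"{recall(truth, calls_shallow):.2f}")  # 0.80
print(f"{recall(truth, calls_deep):.2f}")     # 1.00
```

Note that false positives do not affect recall; they would show up in a precision calculation instead.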




VCAT 2.0 also enables the analysis of samples other than NA12878 via pairwise intersect comparisons. The Venn diagrams and corresponding tables shown below are from a VCAT report from the same example BaseSpace Project. When using this feature, VCAT also creates new VCF files which represent the unique SNV and indel calls, as well as VCF files for the common calls.
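The pairwise intersect described above amounts to set arithmetic on variant calls. The following is a minimal sketch of that idea, assuming each variant is reduced to a (chrom, pos, ref, alt) key; it does not reproduce VCAT's actual logic or its VCF output, and the function and variable names are hypothetical.

```python
def pairwise_intersect(calls_a, calls_b):
    """Split two call sets into the three regions of a two-set Venn diagram."""
    return {
        "unique_a": calls_a - calls_b,
        "unique_b": calls_b - calls_a,
        "common": calls_a & calls_b,
    }


# Toy call sets, not real data.
sample_a = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "A")}
sample_b = {("chr1", 100, "A", "G"), ("chr2", 75, "G", "A"), ("chr5", 40, "T", "C")}

regions = pairwise_intersect(sample_a, sample_b)
for name, variants in regions.items():
    print(name, len(variants), sorted(variants))
```

Each of the three resulting sets corresponds to one of the VCF files that the report describes: calls unique to each input and calls common to both.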




The unique VCF files are also indexed for browsing within the BaseSpace IGV App. Below is a screenshot showing two SNVs that are found in the 105x exome but missed in the 53x exome due to low coverage depth.


That’s it for now. In an upcoming blog post, we’ll look at Platinum Genomes and NIST GIAB in more detail including some comparisons.

Nextera Rapid Capture Exome: New data sets, new manifest, and new analysis tools!

We are happy to introduce two new Nextera Rapid Capture Exome data sets in BaseSpace:

  • 12 exome samples sequenced (on 1 flow cell) on HiSeq 2500®
  • 1 exome sample sequenced on MiSeq

These exome data sets demonstrate the accuracy of the HiSeq 2500 & MiSeq sequencing platforms, the improved enrichment metrics from using the new targeted region manifest v1.2, and the power and ease of use of the BaseSpace BWA Enrichment App.

Be sure to take a look and compare the difference between the data sets analyzed with manifest v1.1 and v1.2. Both manifest versions are available for use in BaseSpace now.  The v1.2 manifest files will be available for download from the Illumina support web site in the near future and a URL will be provided in an update to this blog post.

Click on the links below to see the project and run folders. You will be asked to “Accept” the Run/Project into your BaseSpace account: this is the same mechanism you would use to share BaseSpace projects or runs with your colleagues/collaborators via a dedicated URL.

  • HiSeq 2500: Nextera Rapid Capture Exome (12plex, CEPH Trio replicates): Project (Sample data & analysis results), Run (QC plots & run summaries).
  • MiSeq v3: Nextera Rapid Capture Exome (NA12878): Project (Sample data & analysis results), Run (QC plots & run summaries).

Materials and Methods: Human Coriell CEPH trio samples NA12878, NA12891, and NA12892; Nextera Rapid Capture Exome kit; analysis with BaseSpace BWA Enrichment App.

Learn more about exome sequencing, Nextera Rapid Capture Exome Kits, and BaseSpace Core Apps.

Whole-genome and cancer analysis: Datasets from Illumina’s FastTrack Services Laboratory


We are pleased to announce the availability of data from two sequencing projects conducted in the Illumina FastTrack Services Laboratory through the Illumina Genome Network (IGN).   Whole-genome and Cancer Analysis Demo Datasets can now be accessed within or downloaded from BaseSpace for free through BaseSpace’s Public Data repository.

Whole-Genome Analysis Dataset:

Results from the ENCODE project reveal that many DNA variants previously associated with disease lie outside the coding regions of genomic DNA. Because whole-genome sequencing (WGS) gives researchers the most complete view, we offer the Illumina FastTrack Services Whole-genome Demo Dataset containing three WGS example datasets using the CEPH family trio sequenced to a depth of ~30x coverage and analyzed using the Whole-Genome Sequencing Informatics Pipeline v2.0. The project includes archival BAM files, variant calls (CNVs, SVs, and SNPs), a sample PDF summary report, and Illumina Omni2.5M genotyping data.

To access the shared whole-genome dataset in your BaseSpace account, click the following shared project link: https://basespace.illumina.com/s/dOTDV9brOuJJ.

Cancer Analysis Dataset:

Cancer is highly heterogeneous at the genetic and histological levels. The Illumina FastTrack Services Cancer Analysis Demo Dataset uses the IGN variant calling and sequencing methodology to address this complexity using ATCC_HCC samples sequenced to 40x coverage for the normal tissue sample and 80x coverage for the tumor tissue sample. The data is analyzed using Cancer Analysis Pipeline v2.0, which uses a Bayesian combined variant calling method that provides the most accurate models for real-life tumor samples, recovering 97% of known SNVs. The datasets include the standard WGS deliverable, as well as somatic variant data and a somatic PDF summary report.

To access the shared cancer analysis dataset in your BaseSpace account, click the following shared project link: https://basespace.illumina.com/s/lAkSmtRTYN1Z.

More About the Illumina Genome Network:

The IGN, consisting of CSPro-certified organizations and Illumina FastTrack Services, offers highly accurate, affordable, end-to-end human whole-genome sequencing services.  The IGN laboratories have experienced scientists using TruSeq technology for superior coverage and quality of even challenging regions, and industry-leading HiSeq systems for the highest throughput.  IGN Services are finalized with data analysis by skilled bioinformaticians to accelerate researchers’ opportunities to discover more from the whole human genome.

We invite you to view these example IGN projects using BaseSpace Apps such as the Broad’s IGV, or by downloading files and exploring the data using your favorite tools.  See for yourself the unmatched performance, data quality and expertise of the Illumina Genome Network.