Tag Archive | HiSeq 2500

Introducing fast, free alignment and variant calling with the Isaac Human Whole Genome Sequencing App

With the widespread adoption of the HiSeq 2500 and its lightning speed, enabling biologists to quickly and inexpensively extract biological information from sequences has become a critical need (1, 2). However, the management and analysis of large data sets is widely recognized as an obstacle to wider adoption of next-generation sequencing: it requires large IT investment and bioinformatics expertise to set up, maintain and run software at reasonable speed, especially for the most demanding applications such as Whole Genome Sequencing (WGS).

To address this, Illumina has developed a user-friendly human WGS analysis workflow that enables scientists with no bioinformatics experience to align and call variants in human whole-genome data 4-6 times faster than existing methods. Combined with Illumina’s PCR-free sample preparation and the HiSeq 2500, the workflow provides a sample-to-answer time of less than 2 days.

In the words of Waibhav Tembe, Ph.D., Director of the Collaborative Bioinformatics Center at TGen, “For whole genome sequencing, the aligner did an awesome job in cutting down the time to align 30x data against human genome and in using available hardware resources effectively.”

With the Isaac Human WGS app in BaseSpace, HiSeq users can now analyze and store WGS data without bioinformatics expertise, Linux experience or IT infrastructure. The workflow is free to use and can be accessed here (access requires a free BaseSpace account).

For those who prefer to keep their data on premises, the workflow is available as part of the HiSeq Analysis Software (HAS), freely available on the Illumina website here. HAS can analyze WGS data in a few hours on a commodity PC, with a single command line or an easy-to-use graphical user interface.

The component algorithms for the Isaac aligner and Variant Caller are released as open source here so that developers can re-use and improve them. The open source version of the Isaac aligner is not commercially supported and is provided as is under the Illumina Open Source Software License, available here.

Finally, data generated by Illumina’s IGN services is analyzed with the Isaac Human WGS workflow.

You can find more details in Table 1 below and in our white paper, available for download here.

Table 1: Isaac Human WGS workflow on premises with the HiSeq Analysis Software. A comparison of analysis metrics with the BWA + GATK workflow shows that comparable data is generated 6 times faster.


(1)    Saunders, C. J. et al. (2012) Rapid Whole-Genome Sequencing for Genetic Disease Diagnosis in Neonatal Intensive Care Units. Sci Transl Med 4:154ra135.

(2)    Jones, S. J. et al. (2010) Evolution of an adenocarcinoma in response to selection by targeted kinase inhibitors. Genome Biol. 11:R82.

High Concordance between HiSeq 2000 and HiSeq 2500 Data

This new HiSeq® 2500 dataset compares data quality and run time for a 2x100bp sequencing run on the HiSeq 2000 and the HiSeq 2500*. The HiSeq 2500 system maintains the industry-leading data quality of the HiSeq 2000, but delivers daily throughput in excess of 100 Gb when used in rapid run mode.

Altogether, our sample-to-answer workflow takes around 50 hours for a 2x100bp run, and we will soon commercialize methods to further improve this time.

Click on the links below to see the project and run folders. You will be asked to “Accept” the Run/Project into your BaseSpace account: this is the same mechanism you will use to share specific real-life projects or runs with your colleagues/collaborators via a dedicated URL.

HiSeq 2000 Run, HiSeq 2500 Run 1 (Flow Cell 1) and Run 2 (Flow Cell 2), Project (alignment and variant calling, analysis with App Store, file downloads)

Materials and Methods: Human sample NA12878**, TruSeq Rapid SBS and Cluster kits (HiSeq 2500) or v3 kits (HiSeq 2000), PCR-free sample prep (in development), BWA/GATK analysis.

Summary of run

Summary of BWA/GATK alignment/variant calling


* Learn more about the features and specifications of the HiSeq 2500 system here. This is our third HiSeq 2500 dataset. View the “Genome in a Day” blog and dataset here. View the HiSeq 2x150bp blog and dataset here.

**A member of the well-studied CEPH family. See details here.

*** The total run on the HiSeq 2000 exceeded 600 Gb, but we focus on the yield of one of the samples for comparison purposes.

2x150bp Human Genome in Record Time with the HiSeq 2500

We are very happy to announce the BaseSpace availability of our second HiSeq 2500® dataset*. It demonstrates the ability to provide high-quality 2x150bp reads in record time: 176 Gb in ~40 hours, including on-board cluster generation and sequencing, with 90.2% of bases at or above Q30 and high-quality alignment and variant calling.
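As a rough check on the figures above, the sketch below converts a Phred quality score into its per-base error probability (using the standard definition Q = -10 log10(p)) and scales a run's yield to a 24-hour rate. The function names are illustrative only, and treating the ~40h run time as exactly 40 hours is an assumption made for the arithmetic.

```python
def phred_to_error_probability(q: float) -> float:
    """Return the per-base error probability for Phred score q (Q = -10 * log10(p))."""
    return 10 ** (-q / 10)


def daily_throughput_gb(yield_gb: float, run_hours: float) -> float:
    """Scale a run's total yield (in Gb) to an equivalent 24-hour rate."""
    if run_hours <= 0:
        raise ValueError("run_hours must be positive")
    return yield_gb / run_hours * 24


if __name__ == "__main__":
    # Q30: the threshold quoted for 90.2% of bases in this dataset.
    print(phred_to_error_probability(30))
    # 176 Gb over an assumed 40-hour run, expressed per day.
    print(round(daily_throughput_gb(176, 40), 1))
```

With these inputs, Q30 corresponds to an expected error rate of 1 in 1,000 bases, and 176 Gb over 40 hours works out to roughly 105.6 Gb per day.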

Long reads allow a more precise analysis of gene fusions and structural variations, which have both been implicated in cancer and other diseases. Long reads also increase the quality of de novo assemblies based on metrics such as N50, contig size and genome coverage**. The incredible speed, daily throughput and data quality of the HiSeq 2500 are critical in settings where fast and accurate answers are required.
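For readers unfamiliar with N50, here is a minimal sketch of how the metric is commonly computed: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function name and the example contig lengths are hypothetical and are not taken from this dataset.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is often quoted alongside contig size and genome coverage.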

Altogether, our sample-to-analysis workflow takes around 50 hours for a 2x100bp run and around 74 hours for a 2x150bp run, and we will soon commercialize methods to further improve this time.

Click on the links below to see the project and run folders. You will be asked to “Accept” the Run/Project into your BaseSpace account: this is the same mechanism you will use to share specific real-life projects or runs with your colleagues/collaborators via a dedicated URL.

Run 1 (Flow Cell 1), Run 2 (Flow Cell 2), Project (alignment and variant calling, analysis with App Store, file downloads)

Materials and Methods: Human Sample NA12878***, TruSeq Rapid SBS and Cluster Kits, PCR-free sample prep (in development), BWA/GATK analysis.

Summary of run

Summary of BWA/GATK alignment/variant calling

 

* Learn more about the features and specifications of the HiSeq 2500 system here, and see the first HiSeq 2500 “Genome in a Day” blog and dataset here.

** The benefits of long, paired-end data for de novo assembly are described in the Tech Note here.

*** A member of the well-studied CEPH family. See details here.