Genome-in-a-Day

From the beginning, we designed BaseSpace to be a place where things move quickly. We did this largely so that our customers and partners could easily deploy their own apps on the platform (more to come on that in a later post), but also out of necessity, to keep pace with the breakneck innovation in our sequencing systems. As many of you have no doubt seen, Illumina recently announced a new high-throughput sequencing instrument, the HiSeq 2500, which will produce high-quality, high-coverage human genomes in a single day. This week, at the Advances in Genome Biology and Technology (AGBT) meeting, Illumina is releasing the first public dataset run on this new system, described in the Application Note here.

The Coriell sample NA18507 was prepared using a modified version of the TruSeq DNA sample prep protocol and sequenced at 40X depth in “rapid run” mode. The run produced ~135 Gb of output, with >90% of reads above Q30. BCL files generated by the HiSeq 2500 were converted to FASTQ files and aligned against human reference build hg19; BAM files and variant calls were generated using CASAVA v1.8.2, with >95% dbSNP concordance. A few additional secondary build metrics:

In keeping with our rapid deployment ethos, the results of this first dataset are being made available this week for navigation within BaseSpace using a prototype genome browser. You can view histograms of SNPs and coverage when zoomed out, move seamlessly from the chromosome level to the base-pair level, view directional stacked reads showing variants relative to the reference, use the gene track to link out to NCBI, and download variants in VCF format.
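If you download the variants, a few lines of Python are enough for a first look. The sketch below is illustrative only: it assumes a plain-text, tab-delimited file with the standard VCF columns (CHROM, POS, ID, REF, ALT, ...), and it assumes the ID column carries a dbSNP identifier for known variants, which may not match what the browser actually exports. The function name `summarize_vcf` and the sample records are made up for this example; they are not real calls from NA18507.

```python
import io


def summarize_vcf(handle):
    """Count variant records, SNPs, and records with a non-empty ID column."""
    total = snps = with_id = 0
    for line in handle:
        # Skip blank lines and VCF header/meta lines (they start with '#').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        var_id, ref, alt = fields[2], fields[3], fields[4]
        total += 1
        # Treat a record as a SNP when REF and every ALT allele are one base.
        alts = alt.split(",")
        if len(ref) == 1 and all(len(a) == 1 and a != "." for a in alts):
            snps += 1
        # Assumption: '.' in the ID column means "no known identifier".
        if var_id != ".":
            with_id += 1
    return {"total": total, "snps": snps, "with_id": with_id}


# Made-up records for illustration only (IDs are placeholders).
_ROWS = [
    ("chr1", "10177", "rs0000001", "A", "C", "50", "PASS", "."),
    ("chr1", "10352", ".", "T", "TA", "40", "PASS", "."),
    ("chr2", "20500", "rs0000002", "G", "A,T", "60", "PASS", "."),
    ("chr2", "30100", ".", "C", "G", "35", "PASS", "."),
]
EXAMPLE_VCF = "##fileformat=VCFv4.1\n" + "\n".join("\t".join(r) for r in _ROWS) + "\n"

if __name__ == "__main__":
    print(summarize_vcf(io.StringIO(EXAMPLE_VCF)))
```

To run it on a real download, pass an open file instead of the in-memory example, for instance `summarize_vcf(open("variants.vcf"))`, where the file name is a placeholder. The share of records with an ID is only a rough proxy for the dbSNP concordance figure quoted above, since the exact definition used for that metric is not given here.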

More importantly, it’s fast enough that serving a 135Gbyte dataset to a thin client feels about as smooth as running through MiSeq data. The fact that data from the world’s fastest human genome sequencing machine can be analyzed this way shows that BaseSpace is on its way to dramatically reducing the complexity of sequencing humans.

The browser highlights what’s possible in terms of “big data” storage and visualization in BaseSpace. But note that it is a prototype we’ve stood up just for the purpose of examining this dataset; we’ll have something far more feature-forward in the coming months, and that browser will go into general release for all users. So with that disclaimer, please enjoy the dataset, explore the nooks and crannies of BaseSpace, and use the feedback button on the right to let us know how we can make it better.

Access Genome-in-a-Day dataset:

Run: https://basespace.illumina.com/s/8fPIHc

Project: https://basespace.illumina.com/s/CWDKXszRUpUV

– Jason Blue-Smith, Product Manager

One response to “Genome-in-a-Day”

  1. cydne says :

    Thanks for this Jason!
