
Import data from SRA into BaseSpace

When a user generates new data, a common workflow is to compare new results with previously published ones. So how would a user of BaseSpace do this? Until now, a user would have two choices:

  1. Download their BaseSpace data and the previously published data to a local machine, then build a bioinformatics workflow to process both datasets.
  2. Download and re-process the previously published data, then upload it to BaseSpace with the BaseSpace FASTQ uploader.

Today, we’re excited to announce that bringing data into BaseSpace from NCBI’s Sequence Read Archive (SRA) is now push-button easy with the SRA Import App, our next BaseSpace Labs release.


Inputs

The SRA Import App lets users import data from any of the three major public sequence repositories: SRA, the European Nucleotide Archive (ENA), and the DNA Data Bank of Japan (DDBJ). All it needs is a valid accession number.

SRA Import form

Accession numbers

From the accession number, the app captures all associated SRA Run data and imports the FASTQ files into BaseSpace, so you can use them with other BaseSpace apps. This means you can enter accessions for studies (SRP*/ERP*/DRP*), experiments (SRX*/ERX*/DRX*), samples (SRS*/ERS*/DRS*), runs (SRR*/ERR*/DRR*), or submissions (SRA*/ERA*/DRA*), and the app will import all associated FASTQ files. Some samples and studies have many associated FASTQ files. We currently limit each import request to 25GB of data, so it helps to check the size of a study first and, if needed, split the import into several requests that each stay within the limit.
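The accession conventions and the 25GB limit above can be illustrated with a small planning helper. This is a minimal sketch, not part of the SRA Import App: `classify_accession` and `plan_batches` are hypothetical names, it assumes you already know the size in bytes of each run (for example from repository metadata), and it assumes the limit is 25 decimal gigabytes. It relies on the shared convention that the first letter of an accession identifies the archive (S = SRA, E = ENA, D = DDBJ).

```python
import re

# First letter of the accession identifies the archive.
REPOSITORY = {"S": "SRA", "E": "ENA", "D": "DDBJ"}

# The remaining two prefix letters identify the object type.
OBJECT_TYPE = {"RP": "study", "RX": "experiment", "RS": "sample",
               "RR": "run", "RA": "submission"}

ACCESSION_RE = re.compile(r"^([SED])(R[PXSRA])(\d+)$")


def classify_accession(accession):
    """Return (repository, object type) for an accession such as 'SRP024239'."""
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized accession: {accession!r}")
    return REPOSITORY[match.group(1)], OBJECT_TYPE[match.group(2)]


# Assumption: the 25GB per-request limit, interpreted as decimal gigabytes.
LIMIT_BYTES = 25 * 10**9


def plan_batches(run_sizes, limit=LIMIT_BYTES):
    """Greedily group {run accession: size in bytes} into batches under the limit.

    Each batch could then be submitted as a separate import request.
    """
    batches, current, current_size = [], [], 0
    for run, size in sorted(run_sizes.items()):
        if size > limit:
            raise ValueError(f"{run} alone exceeds the limit")
        # Start a new batch when adding this run would overflow the current one.
        if current and current_size + size > limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(run)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

For example, three hypothetical 10GB runs would be split into two requests: `plan_batches({"SRR1": 10**10, "SRR2": 10**10, "SRR3": 10**10})` groups the first two runs together and puts the third in its own batch. A greedy pass in accession order keeps the sketch simple; it does not try to minimize the number of requests.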

SRA Runs == BaseSpace Samples

An initial challenge was reconciling SRA’s data model with the BaseSpace data model. SRA submissions vary in how data is organized, but for most of them an SRA Run corresponds to what BaseSpace calls a Sample. As a result, importing an SRA Study with multiple associated Runs creates a separate BaseSpace Sample for each Run.

[Figure: SRA data model compared with the BaseSpace data model]

In some cases this mapping does not hold. For older data in particular (from the GA and GAII days), multiple runs were sometimes needed to generate enough reads for a single sample. In these cases, you can use the Combine tool in the BaseSpace UI to merge the resulting samples into one logical sample.

[Animation: combining samples with the Combine tool in the BaseSpace UI]
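The Run-to-Sample mapping and the exception for multi-run samples can be sketched as follows. This is an illustration under stated assumptions, not the app's logic: `SraRun`, `default_samples`, and `combine_candidates` are hypothetical names, and grouping runs by a shared SRA sample accession (SRS*) is only one heuristic a user might apply to decide which imported samples to combine. The combining itself still happens in the BaseSpace UI.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SraRun:
    run: str     # run accession, e.g. "SRR..."
    sample: str  # accession of the SRA sample the run belongs to, e.g. "SRS..."


def default_samples(runs):
    """Default behavior described above: one BaseSpace Sample per SRA Run."""
    return {r.run: [r.run] for r in runs}


def combine_candidates(runs):
    """Group runs that share an SRA sample accession.

    Groups with two or more runs are candidates for combining into one
    logical sample (hypothetical heuristic).
    """
    groups = defaultdict(list)
    for r in runs:
        groups[r.sample].append(r.run)
    return {sample: sorted(rs) for sample, rs in groups.items() if len(rs) > 1}
```

With runs `SRR1` and `SRR2` both belonging to `SRS1`, the default mapping still yields two samples, while `combine_candidates` flags `SRS1` as a sample whose runs might be merged.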

Illumina data only

The app currently imports only Illumina data. We want data imported with this app to be compatible with our BaseSpace Core Apps, which for now means that data from other sequencing platforms is not supported.
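The Illumina-only restriction amounts to a platform check on each run before import. The sketch below is hypothetical: it assumes each run's metadata is available as a dictionary with `"Run"` and `"Platform"` fields, loosely modeled on a RunInfo-style metadata table, and that Illumina runs are labeled `"ILLUMINA"`. It does not reflect how the app actually reads SRA metadata.

```python
def split_by_platform(runs):
    """Partition run metadata into importable (Illumina) and skipped runs.

    Assumes each record is a dict with "Run" and "Platform" keys (hypothetical
    field names); a missing platform is treated as non-Illumina.
    """
    importable, skipped = [], []
    for record in runs:
        platform = record.get("Platform", "").upper()
        target = importable if platform == "ILLUMINA" else skipped
        target.append(record["Run"])
    return importable, skipped
```

Reporting skipped runs separately, rather than dropping them silently, makes it clear to the user why a study imported fewer samples than it lists.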

Under the Hood

Designing one tool to handle the variability across three huge public repositories is not trivial! Data formats and standards have changed over the years, even within Illumina’s own output, so the app automatically checks for older Illumina formats and converts them to the modern format. For example, quality scores encoded as Phred+64 are converted to Phred+33, and FASTQ headers, whose Illumina format has changed over time, are rewritten for compatibility with BaseSpace Core Apps.
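The two conversions mentioned above can be sketched in a few lines. This is a simplified illustration, not the app's implementation. The quality conversion relies on the fact that Phred+64 and Phred+33 differ only in their ASCII offset, so each character shifts down by 31 while the Q value stays the same. Detection is a heuristic: Phred+64 never uses characters below `@` (ASCII 64), but a Phred+33 file in which every base has Q31 or higher would be misclassified. For headers, the pre-CASAVA 1.8 layout lacks a run number, flowcell ID, and filter flag, so `run_number` and `flowcell` below are hypothetical placeholders and the filter flag is assumed to be `N`.

```python
import re


def detect_quality_encoding(quality_strings):
    """Guess "phred33" or "phred64" from a non-empty set of quality strings.

    Heuristic only: any character below '@' (ASCII 64) rules out Phred+64.
    """
    lowest = min(ord(c) for q in quality_strings for c in q)
    return "phred64" if lowest >= 64 else "phred33"


def phred64_to_phred33(quality):
    """Shift each character down by 31 (64 - 33); the Q value is unchanged."""
    return "".join(chr(ord(c) - 31) for c in quality)


# Legacy (pre-CASAVA 1.8) header, e.g. "@HWUSI-EAS100R:6:73:941:1973#0/1":
# instrument:lane:tile:x:y#index/read
LEGACY_HEADER = re.compile(r"^@([^:]+):(\d+):(\d+):(\d+):(\d+)#([^/]+)/([12])$")


def rewrite_legacy_header(header, run_number="0", flowcell="UNKNOWN"):
    """Rewrite a legacy header into the CASAVA 1.8+ layout.

    run_number and flowcell are placeholders (the legacy format does not
    record them); the filter flag is assumed to be "N" and the control
    number 0.
    """
    m = LEGACY_HEADER.match(header)
    if m is None:
        return header  # already modern or unrecognized: leave unchanged
    instrument, lane, tile, x, y, index, read = m.groups()
    return (f"@{instrument}:{run_number}:{flowcell}:{lane}:{tile}:{x}:{y} "
            f"{read}:N:0:{index}")
```

For example, the Phred+64 string `hhhB` (Q40, Q40, Q40, Q2) becomes `III#` in Phred+33, and `@HWUSI-EAS100R:6:73:941:1973#0/1` becomes `@HWUSI-EAS100R:0:UNKNOWN:6:73:941:1973 1:N:0:0`.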

We hope you find this app useful for doing more with your data on BaseSpace! The app will likely not handle every accession number correctly; if you find one that fails, please send us feedback so we can improve the app.


For an example import and analysis of human gut microbiome data, see SRP024239 (Human Gut Microbiome Dataset).

Introducing our First BaseSpace Labs Applications – FastQC and Velvet de novo Assembly

We are excited to announce two new applications in BaseSpace, FastQC and Velvet de novo Assembly.


Both applications are available to all users and were built by our internal R&D groups using the BaseSpace Native App Engine. They are also the first of many BaseSpace Labs Apps to come; the concept behind BaseSpace Labs Apps is explained in more detail below.

BaseSpace Labs Apps are Illumina’s internally developed applications that extend the functionality of BaseSpace. Some BaseSpace Labs applications will be experimental or research-focused, while others will serve as a step in a larger workflow. The apps are reviewed regularly by our team and go through the same review process as third-party apps.

BaseSpace Labs Apps follow an accelerated development process so that they reach BaseSpace users sooner than BaseSpace Core Apps. Note that, unlike BaseSpace Core Apps, BaseSpace Labs Apps are not officially supported by Illumina Customer Service. Support for BaseSpace Labs applications is provided at the developer’s discretion, and the apps are provided as-is without warranty of any kind.

The FastQC app provides a quality assessment of sequence data generated on Illumina sequencers. FastQC for BaseSpace is based on the FastQC software developed by the Bioinformatics Group at the Babraham Institute. It provides a modular set of analyses that quickly show whether there are any problems with the sequencing data before you do any further analysis.

[Figure: example FastQC output showing quality scores at each base position across all reads]

The figure above shows example output from the FastQC app: the distribution of quality scores across all reads at each base position. For examples of additional FastQC output, see this FastQC demo project.
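The per-position quality plot described above summarizes, for each base position, the quality scores of all reads at that position. The sketch below computes a simplified version of that summary (mean and median only, whereas FastQC draws box plots with quartiles and whiskers). It is not FastQC's implementation; `per_base_quality` is a hypothetical name, and it assumes Phred+33 quality strings by default.

```python
from statistics import mean, median


def per_base_quality(quality_strings, offset=33):
    """Summarize Phred quality at each read position (1-based).

    Returns a list of (position, mean Q, median Q) tuples. Reads of different
    lengths are handled by counting only the reads that reach a position.
    """
    longest = max(len(q) for q in quality_strings)
    summary = []
    for pos in range(longest):
        scores = [ord(q[pos]) - offset for q in quality_strings if len(q) > pos]
        summary.append((pos + 1, mean(scores), median(scores)))
    return summary
```

For the two Phred+33 quality strings `II#` and `I5`, this yields Q40 at position 1, a mean of Q30 at position 2, and Q2 at position 3, which is the kind of drop in quality a per-base plot makes easy to spot.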

The Velvet de novo Assembly app is a de novo assembly pipeline for bacterial samples based on the Velvet assembler. A key feature of this app is an adapter trimming protocol optimized for the Nextera Mate-Pair library prep kit. An application note describing the de novo assembly of 9 different bacterial genomes with the Velvet de novo Assembly app is also available. In many cases, a single contig representing the entire bacterial genome can be assembled. The figure below shows example output generated by the Velvet de novo Assembly app.

[Figure: example output from the Velvet de novo Assembly app]
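The idea behind junction-adapter trimming for mate-pair reads can be sketched as cutting each read where the junction sequence begins. This is a hypothetical, simplified illustration; the app's actual optimized protocol is not described here. It assumes the junction is recognized by the 19-bp Nextera mosaic-end sequence `CTGTCTCTTATACACATCT` or its reverse complement (verify against Illumina's documentation before relying on it), accepts only exact matches, and uses an arbitrary `min_length` of 20 bases. Real trimmers also tolerate mismatches and partial matches at the end of a read.

```python
# Assumption: junction identified by the Nextera mosaic-end sequence.
JUNCTION = "CTGTCTCTTATACACATCT"


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def trim_at_junction(read, quality, junction=JUNCTION, min_length=20):
    """Cut a read and its quality string at the first exact junction match.

    Searches for the junction in both orientations. Returns the trimmed
    (read, quality) pair, or None if fewer than min_length bases remain.
    """
    hits = [i for i in (read.find(junction), read.find(reverse_complement(junction)))
            if i != -1]
    cut = min(hits, default=len(read))  # no match: keep the whole read
    if cut < min_length:
        return None
    return read[:cut], quality[:cut]
```

A read of 25 bases followed by the junction is trimmed to its first 25 bases, while a read with only 5 bases before the junction is discarded as too short to be useful for assembly.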

Example output generated by the Velvet de novo Assembly app is also available. We hope you enjoy the FastQC and Velvet de novo Assembly apps. For questions, feedback, or feature requests for these applications, please email basespacelabs@illumina.com and include the name of the application. Thank you!