When sequencing data supports a new finding that is reported to the scientific community, it is common practice to share the sequencing data that generated the result, through an archive such as NCBI’s Sequence Read Archive. How would a BaseSpace user do this? Until now, a BaseSpace user would have to download their data locally, and then prepare an SRA submission through NCBI’s published guidelines.
Since BaseSpace already stores some of the metadata that go into a submission, we felt we could make this process easier. Today, we’re excited to announce the release of the SRA Submission App, a BaseSpace Labs release. Built in collaboration with NCBI, the app prepares your submission XML and transfers the data directly to SRA from BaseSpace.
With the addition of NextBio products to our informatics offerings, Illumina adds one of the richest compendia of curated genomic data in existence today. As a starting point towards integrating our BaseSpace and NextBio platforms, we are proud to announce the release of the NextBio Transporter App as our latest BaseSpace Labs release. The Transporter sends analysis results from BaseSpace into NextBio Research and requires that users have an existing account with NextBio.
Like the NextBio Annotates RNA-Seq App, the NextBio Transporter takes an AppResult as input, and currently supports outputs from the Cufflinks or RNA Express Core Apps from Illumina. You must also specify your NextBio Research account and domain information; the app takes care of everything else.
As output, users are provided a link to the transported data in NextBio Research, and a QuickView is also generated which displays the information NextBio has found relating to the input data.
Within NextBio Research, users can then explore connections with curated content. For example, by clicking on “Curated Studies”, we can pull up published studies that have produced results that are highly correlated with our transported dataset. NextBio Research offers an incredibly rich platform for biological information, and we are excited to now provide the ability for BaseSpace users to connect their sequencing data to the biological insights offered by NextBio.
When a user generates new data, a common workflow is to compare new results with previously published ones. So how would a user of BaseSpace do this? Until now, a user would have two choices:
- Download their BaseSpace data and previously published data to their local machine, and then build the bioinformatics workflow to process both datasets.
- Re-process previously published data to send into BaseSpace using the BaseSpace FASTQ uploader.
The SRA Import App lets users easily import data from any of the big three public data repositories – SRA, the European Nucleotide Archive (ENA), and the DNA Data Bank of Japan (DDBJ). The only information needed is a valid accession number.
From the accession number, the app will capture all associated SRA Run data, and import the FASTQ files into BaseSpace so that you may use them with other BaseSpace apps. This means that you may enter accessions for studies (SRP*/ERP*/DRP*), experiments (SRX*/ERX*/DRX*), samples (SRS*/ERS*/DRS*), runs (SRR*/ERR*/DRR*), or submissions (SRA*/ERA*/DRA*), and the app will import any associated FASTQ files. Some samples and studies have many associated FASTQ files. We currently limit the import to 25 GB of data per request, so it may help to check the size of a study first and, if it exceeds the limit, import it in smaller pieces (for example, by experiment or run accession).
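The accession prefixes listed above follow a regular pattern: the first letter names the repository (S, E, or D) and the third letter names the record type. As a rough illustration only, here is a minimal Python sketch of how one might check an accession before requesting an import. The function name `classify_accession` and the lookup tables are hypothetical and are not part of the SRA Import App.

```python
import re

# Pattern assumed from the prefixes listed above: repository letter,
# then "R", then a record-type letter, then digits.
_ACCESSION_RE = re.compile(r"^([SED])R([PXSRA])(\d+)$")

_REPOSITORIES = {"S": "SRA", "E": "ENA", "D": "DDBJ"}
_RECORD_TYPES = {
    "P": "study",
    "X": "experiment",
    "S": "sample",
    "R": "run",
    "A": "submission",
}


def classify_accession(accession):
    """Return (repository, record_type) for an accession such as 'SRP024239'.

    Hypothetical helper for illustration; raises ValueError when the
    accession does not match the assumed prefix pattern.
    """
    match = _ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognized accession: {accession!r}")
    repo_letter, type_letter, _digits = match.groups()
    return _REPOSITORIES[repo_letter], _RECORD_TYPES[type_letter]


if __name__ == "__main__":
    print(classify_accession("SRP024239"))
```

A check like this only validates the shape of the accession; whether the record exists and contains Illumina data still has to be determined by the repository itself.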
SRA Runs == BaseSpace Samples
An initial challenge was reconciling SRA’s data model with the BaseSpace data model. There is variability in how data is captured in SRA, but by and large, for most submissions, an SRA Run matches what BaseSpace would call a BaseSpace Sample. This means that if one imports an SRA Study with multiple associated Runs, the Import App will create a different BaseSpace Sample for each Run.
There are some cases where this is incorrect. Particularly for older runs (back in the GA and GAII days), sometimes multiple runs were required to generate enough reads to analyze as one sample. In these cases, you can use the Combine tool in the BaseSpace UI to combine multiple samples into one logical sample.
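To decide which imported samples might need combining, it can help to group run accessions by the sample accession they belong to. The sketch below assumes you have already gathered a run-to-sample mapping from the repository's metadata; the function name `runs_needing_combine` and the accessions in the usage note are made up for illustration and do not come from the app.

```python
from collections import defaultdict


def runs_needing_combine(run_to_sample):
    """Group run accessions by sample accession.

    `run_to_sample` is an assumed, user-supplied dict mapping a run
    accession to its sample accession. Returns only the samples that
    have more than one run, since those are the candidates for combining.
    """
    groups = defaultdict(list)
    for run, sample in run_to_sample.items():
        groups[sample].append(run)
    return {
        sample: sorted(runs)
        for sample, runs in groups.items()
        if len(runs) > 1
    }
```

For example, with a mapping in which the placeholder runs `SRR0000001` and `SRR0000002` both point at the placeholder sample `SRS0000001`, the function returns that one sample with both runs listed, which are the two BaseSpace Samples you would then merge with the Combine tool.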
Illumina data only
The app is currently limited to importing only Illumina data. We want to ensure that data imported with this app is compatible with our Core BaseSpace apps, and that currently means that we don’t support data from other platforms.
Under the Hood
Designing one tool to handle the variability within three huge public repositories is not trivial! Data formats and standards have changed over the years, even just from Illumina, and so the app will automatically check for older Illumina formats, and make modifications to convert data in an older format into a modern format. For example, data with quality scores encoded in the Phred+64 format will be automatically converted to Phred+33, and FASTQ headers, whose format from Illumina has changed over time, are rewritten for compatibility with BaseSpace Core Apps.
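For readers unfamiliar with the quality encodings mentioned above, the conversion itself is a fixed shift of each quality character: Phred+64 stores a score Q as the character with code Q + 64, while Phred+33 uses Q + 33. The following Python sketch shows that arithmetic. It is not the app's implementation, the function name is hypothetical, and it assumes true Phred-scaled scores rather than the older Solexa-scaled values that also used an offset of 64.

```python
def phred64_to_phred33(quality):
    """Convert a Phred+64 quality string to Phred+33.

    Illustrative sketch: each character is decoded with offset 64 and
    re-encoded with offset 33. A negative decoded score means the input
    was not Phred+64 (it may be Solexa-scaled or already Phred+33).
    """
    converted = []
    for ch in quality:
        score = ord(ch) - 64
        if score < 0:
            raise ValueError(
                f"Character {ch!r} is below the Phred+64 range"
            )
        converted.append(chr(score + 33))
    return "".join(converted)
```

The range check matters in practice because a Phred+33 string passed in by mistake usually contains characters below `@`, so the function fails loudly instead of silently producing corrupted scores.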
We hope you find this app useful for enabling you to do more with your data on BaseSpace! There will likely be some accession numbers that the app does not handle correctly; if you run into one, please send us feedback so that the app can be improved.
For an example import and analysis of human gut microbiome data, click here: SRP024239 – Human Gut Microbiome Dataset
Velvet de novo Assembly FastQC
Both applications are currently available for all users and were built using the BaseSpace Native App Engine by our internal R&D groups. These two applications are also the first of many BaseSpace Labs Apps to come. The concept behind BaseSpace Labs Apps is explained in more detail below.
BaseSpace Labs Apps are Illumina’s internally developed applications that extend the functionality within BaseSpace. Some BaseSpace Labs applications will be experimental or research focused, while others will be used as a step in a greater workflow. The Apps are reviewed regularly by our team and put through the same review process as third-party apps.
BaseSpace Labs Apps are developed using an accelerated development process in order to make them available to BaseSpace users faster than the BaseSpace Core Apps. It is important to note that, unlike BaseSpace Core Apps, BaseSpace Labs Apps are not officially supported by Illumina Customer Service. Support for BaseSpace Labs applications is provided at the developer’s discretion and the apps are provided as-is without any warranty of any kind.
The FastQC app can be used to provide a quality assessment of the sequence data generated using Illumina sequencers. FastQC for BaseSpace is based on the FastQC software developed by the Bioinformatics Group at the Babraham Institute. It provides a modular set of analyses which can be used quickly to assess if there are any problems with the sequencing data before doing any additional analysis.
The above figure shows an example output from the FastQC app depicting the quality score across all bases at a given position in the reads. For an example of additional output generated by FastQC, please view this FastQC demo project.
The Velvet de novo Assembly app is a de novo assembly pipeline for bacterial samples using the Velvet assembler. One of the key features of this app is that it has an adapter trimming protocol that has been optimized for the Nextera Mate-Pair library prep kit. An application note describing the de novo assembly of 9 different bacteria using the Velvet de novo Assembly app can be found here. In many cases, a single contig representing the entire bacterial genome can be assembled. The figure below is an example of the output generated by the Velvet de novo Assembly app.
Example output generated by the Velvet de novo Assembly can be found here. We hope you enjoy the FastQC and Velvet de novo Assembly apps. For any questions, feedback, or feature requests for these applications, please send an email to firstname.lastname@example.org and include the name of the application. Thank you!