Deleting Samples and Projects

Our initial release of the delete feature provided users with the ability to delete Runs and Analyses, but we didn’t want to stop there.  Our users now have the ability to delete Samples and Projects from their accounts! Here’s how it works:

  • You can delete a Project from the list of all Projects in your account

  • You can also remove a Project from its specific page

  • Samples can be deleted from within a Project and multiple Samples can be selected at once

  • Deleted items are first moved to the Trash; from there you may Restore items or Empty Trash

  • If you delete a Project or Sample as a collaborator instead of an owner, the data will be un-shared with your account

For any questions, concerns, or feedback, please do not hesitate to contact us.  We are happy to help in any way that we can. Thanks!

Rounding out 2014 with new apps for the BaseSpace platform

We are looking forward to 2015 as we will continue to launch new Apps and support additional applications, but we are excited to close out 2014 with the release of three new Illumina Core Apps in BaseSpace:

image

The Amplicon-DS App enables analysis of the Illumina TruSight Tumor library prep kit. This solution is specifically designed for analysis of all tumor samples, including FFPE. Using targeted TruSeq Amplicon chemistry and a unique, mirrored dual strand (“DS”) assay, researchers can easily detect low-frequency somatic mutations. Amplicon-DS also leverages the mirrored dual strand design to reconcile variant calls and capture deamination events due to FFPE, providing confident measurements even in degraded samples.

The Isaac and BWA Enrichment v2.0 Apps add significant functionality over the Enrichment v1.0 Apps. Both Isaac and BWA can now analyze Nextera Rapid Capture Custom panels built in Illumina’s DesignStudio. Isaac Enrichment v2.0 includes Illumina’s own Isaac pipeline for alignment and variant calling. BWA Enrichment v2.0 incorporates the latest aligner, BWA-MEM, which provides improved accuracy (especially when calling structural variants) and increased speed. Both the Isaac and BWA Enrichment v1.0 Apps are available concurrently with v2.0 Apps in BaseSpace Cloud.

In addition to the above Illumina Core Apps, we are also launching a BaseSpace Labs App called FASTQ Toolkit v1.0.

image

This App gives users enhanced control over their data, allowing manipulation of FASTQ files including adapter trimming, quality trimming, length filtering, and down-sampling. Users can now down-sample or quality-trim their data and determine what effect that has on their variants, gene expression results, or bacterial classifications. Users could also assess their sample data with the FastQC App and then use that information to optimize their samples with the FASTQ Toolkit v1.0.

Specs for the FASTQ Toolkit v1.0 are as follows:

Input- BaseSpace samples (max=200GB per analysis) and user specified parameters that define how the input sample(s) should be processed.

Output- Samples that can be accessed on the “Samples” page of the selected output project. In addition, the App generates a statistics summary file in JSON format that is used to generate the BaseSpace report.

Adapter Trimming- Performed using the approximate matching approach described in TagCleaner. The adapter sequence can be specified separately for the 5′- and 3′-end. Poly-A/T tails are considered repeats of As or Ts at the sequence ends. Trimming them can reduce the number of false positives during database searches, as long tails tend to align well to sequences with low complexity or sequences with tails (e.g. viral sequences) in the database.

Bases can be trimmed from either the 5′- or 3′-end. Alternatively, reads can be trimmed to a maximum read length. Quality trimming on the 3’-end is also available. Note: Aligners such as BWA and Isaac perform trimming internally during alignment. The trimming logic was adapted from BWA.
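As a rough illustration of 3′-end quality trimming, the sketch below follows the publicly known BWA heuristic: walk in from the 3′ end, accumulate the difference between the threshold and each base quality, and cut where that running sum peaks. It is not the App's actual code. The function name `trimQuality3Prime`, the `minLength` parameter, and the Phred+33 encoding are assumptions made for the example.

```javascript
// Sketch of BWA-style 3'-end quality trimming.
// Assumes Phred+33 quality strings; all names here are hypothetical.
function trimQuality3Prime(seq, qual, threshold, minLength) {
    if (threshold < 1 || qual.length === 0) {
        return { seq: seq, qual: qual };
    }
    var sum = 0;           // running sum of (threshold - quality)
    var best = 0;          // largest running sum seen so far
    var cut = seq.length;  // bases [0, cut) are kept
    for (var i = seq.length - 1; i >= minLength; i--) {
        var q = qual.charCodeAt(i) - 33;
        sum += threshold - q;
        if (sum < 0) break;        // good bases now outweigh bad ones: stop
        if (sum > best) {
            best = sum;
            cut = i;               // best cut point so far
        }
    }
    return { seq: seq.slice(0, cut), qual: qual.slice(0, cut) };
}
```

Because the loop never goes below `minLength`, a read is never trimmed shorter than that value; reads that end up too short would be handled by a separate length filter.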

Down-sampling is performed when only a subset of the sample is needed for an application, such as de novo assembly with memory constraints, or when it is not necessary to process a full sample, like validating an approach at varying levels of genomic coverage.
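The post does not say how the App selects the reads it keeps. One common way to draw a fixed-size random subset in a single pass is reservoir sampling, sketched below purely as an illustration. The function name `downSample` and the injectable `random` parameter are assumptions for the example.

```javascript
// Sketch: reservoir sampling of read pairs (not necessarily what the App does).
// `pairs` is an array of read pairs, `target` the number to keep.
// `random` is optional and defaults to Math.random; it is injectable for testing.
function downSample(pairs, target, random) {
    random = random || Math.random;
    var reservoir = [];
    for (var i = 0; i < pairs.length; i++) {
        if (i < target) {
            reservoir.push(pairs[i]);       // fill the reservoir first
        } else {
            // Replace a random slot with probability target / (i + 1)
            var j = Math.floor(random() * (i + 1));
            if (j < target) reservoir[j] = pairs[i];
        }
    }
    return reservoir;
}
```

Note that this approach does not preserve the original order of the pairs, and that sampling whole pairs (rather than individual reads) keeps mates together.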

Filtering- Paired-end reads are only filtered (and removed from the sample) if both reads are filtered out. Otherwise, the filtered mate is replaced by a sequence of Ns (number of Ns will be the minimum read length) to keep the order of pairs in the FASTQ files, which is necessary for many secondary analysis tools.
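The pairing rule described above can be sketched as follows. This is a minimal illustration rather than the App's code: the read object shape (`name`, `seq`, `qual`), the helper names, and the quality character written for the placeholder Ns are all assumptions.

```javascript
// Replace a filtered read with Ns of the minimum read length.
// The '#' quality character for the Ns is an assumption of this sketch.
function maskRead(read, minLength) {
    return {
        name: read.name,
        seq: "N".repeat(minLength),
        qual: "#".repeat(minLength)
    };
}

// `passes` is a hypothetical predicate returning true if a read survives filtering.
// Returns null when the whole pair is removed, otherwise the (possibly masked) pair.
function filterPair(read1, read2, passes, minLength) {
    var keep1 = passes(read1);
    var keep2 = passes(read2);
    if (!keep1 && !keep2) {
        return null; // both mates filtered: drop the pair
    }
    return [
        keep1 ? read1 : maskRead(read1, minLength),
        keep2 ? read2 : maskRead(read2, minLength)
    ];
}
```

Keeping a placeholder for the filtered mate means the Read 1 and Read 2 files stay the same length and in the same order, which is the property downstream tools rely on.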

Nextera Mate-pair conversion- The App supports conversion of Nextera Mate-Pair oriented reads to paired-end oriented reads.

The output of the App contains a set of before and after metrics so you can quickly see the properties of your new data. The table below is an example of the results of down-sampling 2,957,468 read pairs to 500,000 read pairs and at the same time performing quality trimming (< Q30) from the 3’ end of the reads.

image

A read length distribution is also provided, as shown below for Read 1. It shows the distribution of read lengths in your data before and after trimming and allows the user to quickly assess what effect the trimming had on their data.

image

Finally, a read filtering summary is provided, as shown below. Read filtering will only contain numbers if an option that turns on read filtering, such as quality trimming (which filters reads shorter than 32 bp), is selected.

image

We are very proud of the hard work our team has put into providing these Apps for the NGS community and look forward to an even more exciting 2015.

Introducing BSFS, the BaseSpace File System

Today, together with the current release of BaseSpace, we would like to announce a product that has gotten me and other developers on the BaseSpace team really excited, and kept us really busy, over the past months: the BaseSpace File System (abbreviated as BSFS or BaseSpace FS). BSFS is a feature that many developers on the BaseSpace platform have been asking for. It lets you mount the data of your Samples and AppResults residing in BaseSpace directly into your Docker containers and access it on a strictly as-needed basis.

A range of improvements

This addition to the BaseSpace platform will bring in a great number of benefits, which I will go over now:

  • No pre-download of your Samples and AppResults

When running your apps on the BaseSpace Native App Engine with BSFS turned on, you will notice your applications executing right away upon launch. The usual pre-download step, which could take a good few hours on those very large NextSeq or HiSeq samples, is now eliminated.

  • Fewer network data transfers

When an app executes on an input Sample or AppResult, there is no guarantee that it will use the entire input dataset. Until today, the entire input dataset had to be downloaded before any processing could happen. This is no longer the case: BSFS presents a virtual view of your data in the file system and downloads only the data that is actually read from the files.

  • Overlap computation and network transfers

A typical data processing workflow is for an app to read data then process it in an iterative fashion. In order to make this process more efficient BSFS features a data pre-fetch mechanism: while the app is processing data at a certain location in a file, the data directly adjacent and following this location is downloaded automatically. This has the effect of mitigating issues in download speed due to network latency.
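The two ideas above (download only what is read, and fetch the next region ahead of time) can be illustrated with a toy model. BSFS itself is a file system mount and its internals are not described in this post, so everything below is hypothetical: the `LazyFile` name, the `fetchChunk` callback, and the fixed chunk size are assumptions, and the read-ahead is done synchronously here where a real system would do it in the background.

```javascript
// Toy model of on-demand reads with read-ahead (not BSFS's actual implementation).
// fetchChunk(index) is a hypothetical function returning the data of one chunk.
function LazyFile(fetchChunk, chunkSize, totalSize) {
    this.fetchChunk = fetchChunk;
    this.chunkSize = chunkSize;
    this.totalSize = totalSize;
    this.cache = {}; // chunk index -> chunk data
}

// Download a chunk only if it exists and has not been downloaded yet.
LazyFile.prototype.load = function (index) {
    if (index * this.chunkSize >= this.totalSize) return; // past end of file
    if (!(index in this.cache)) {
        this.cache[index] = this.fetchChunk(index);
    }
};

// Read `length` bytes at `offset`, fetching only the chunks that are touched.
LazyFile.prototype.read = function (offset, length) {
    var end = Math.min(offset + length, this.totalSize);
    if (offset >= end) return "";
    var first = Math.floor(offset / this.chunkSize);
    var last = Math.floor((end - 1) / this.chunkSize);
    var out = "";
    for (var i = first; i <= last; i++) {
        this.load(i);
        out += this.cache[i];
    }
    this.load(last + 1); // read-ahead: fetch the chunk that follows the one just read
    var start = offset - first * this.chunkSize;
    return out.slice(start, start + (end - offset));
};
```

An app that reads a file sequentially finds the next chunk already cached, while chunks it never touches are never downloaded.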

  • An improved workflow for developers

One of the major areas of focus of the BaseSpace platform team has been to provide developers with an awesome experience, and adding BSFS to the platform will make things even more awesome!
Soon after this release we will be providing a public Amazon Machine Image (AMI) which is the same one we are using in production today. This image contains all that’s required to get started coding in BaseSpace together with BSFS. This is a huge improvement of the developer workflow as it will provide an environment that is readily usable and in which you can simply drop apps in a docker container and see them interact with your BaseSpace data within minutes of getting started!
Finally, with the download step eliminated, there is nothing left to get in the way of a highly iterative development process, where developers can work directly with their BaseSpace data.

  • New and existing apps

All new apps created in the BaseSpace developer portal will now have BSFS turned on by default. We have also made sure that existing apps can benefit fully from this new addition. If you have been following developer guidelines and conventions (i.e., the /data/input drive should not be written to), enabling BSFS in your existing app should be as easy as flicking the switch.

Upon creation of a new application in the developer portal, you will notice a slightly modified launch spec callback, with a new Options array that is used to turn on BSFS:

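// Launch spec callback: returns the settings used to launch the app's container.
function launchSpec(dataProvider)
{
    var ret = {
        // Command to run inside the container
        commandLine: [ "cat", "/illumina.txt" ],
        // Container image to launch
        containerImageId: "basespace/demo",
        // New Options array; this entry turns on BSFS
        Options: [ "bsfs.enabled=true" ]
    };
    return ret;
}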

You will want to use a callback function with this new Options array, in order to enable BSFS in your existing app.
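For an existing app, the simplest route is to add the Options line to your own callback by hand. The sketch below shows the same change expressed as a small wrapper, mainly to make clear what has to end up in the returned object. The names `existingLaunchSpec` and `withBsfs` are hypothetical, and whether the developer portal accepts a wrapped callback like this is an assumption, not something stated here.

```javascript
// Hypothetical existing callback, written before BSFS was available.
function existingLaunchSpec(dataProvider) {
    return {
        commandLine: [ "cat", "/illumina.txt" ],
        containerImageId: "basespace/demo"
    };
}

// Wrap a launch spec callback so that its result carries "bsfs.enabled=true".
// Any previous bsfs.enabled entry is dropped; other Options are preserved.
function withBsfs(callback) {
    return function (dataProvider) {
        var spec = callback(dataProvider);
        var options = (spec.Options || []).filter(function (option) {
            return option.indexOf("bsfs.enabled=") !== 0;
        });
        options.push("bsfs.enabled=true");
        spec.Options = options;
        return spec;
    };
}

var launchSpec = withBsfs(existingLaunchSpec);
```

Calling `launchSpec(dataProvider)` then returns the original command line and image together with the BSFS option.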

Also, as of today the 16S Metagenomics v1.0 app is running with BaseSpace FS switched on to reap all the performance benefits. In the coming weeks, we will turn on BSFS for the rest of the BaseSpace core apps.

  • Real world performance improvements

The speed-ups we are seeing on these apps only scratch the surface of what is possible. On large samples processed on a single node, the performance benefits are less pronounced, since the download time is dwarfed by the compute time. However, multi-node applications that access only part of an input Sample or AppResult will benefit greatly, as the download portion is always a major contributor to their overall execution time.

With that, I hope you will share my excitement about this announcement, and that BSFS will make your development process even more awesome in BaseSpace.

Links to more resources

BaseSpace Developer portal
BaseSpace FileSystem Developer Documentation
Using BSFS

Import data from SRA into BaseSpace

When a user generates new data, a common workflow is to compare new results with previously published ones. So how would a user of BaseSpace do this? Until now, a user would have two choices:

  1. Download their BaseSpace data and previously published data to their local machine, and then build the bioinformatics workflow to process both datasets.
  2. Re-process previously published data to send into BaseSpace using the BaseSpace FASTQ uploader.

Today, we’re excited to announce that bringing data into BaseSpace from NCBI’s Sequence Read Archive (SRA) becomes push-button easy with the SRA Import App, our next BaseSpace Labs release.

Inputs

The SRA Import App lets users easily import data from any of the big three public data repositories – SRA, the European Nucleotide Archive (ENA), and the DNA Data Bank of Japan (DDBJ). The only information needed is a valid accession number.

SRA Import form

Accession numbers

From the accession number, the app will capture all associated SRA Run data, and import the FASTQ files into BaseSpace so that you may use them with other BaseSpace apps. This means that you may enter accessions for studies (SRP*/ERP*/DRP*), experiments (SRX*/ERX*/DRX*), samples (SRS*/ERS*/DRS*), runs (SRR*/ERR*/DRR*), or submissions (SRA*/ERA*/DRA*), and the app will import any associated FASTQ files. Some samples and studies have many associated FASTQ files. We currently limit the import to 25GB of data per request, so it may help to assess the size of the study first and split the import to stay within the data limit.
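The accession prefixes listed above follow a regular pattern: the first letter identifies the archive and the third letter the record type. A minimal sketch of recognizing them is shown below. It is not the app's own validation logic; the function name `classifyAccession` and the assumption that the prefix is followed only by digits are made for the example.

```javascript
// Sketch: classify an accession number by its three-letter prefix.
var ARCHIVES = { S: "SRA", E: "ENA", D: "DDBJ" };
var RECORD_TYPES = {
    P: "study",
    X: "experiment",
    S: "sample",
    R: "run",
    A: "submission"
};

// Returns { archive, type } or null if the accession is not recognized.
function classifyAccession(accession) {
    var match = /^([SED])R([PXSRA])(\d+)$/.exec(accession.trim().toUpperCase());
    if (!match) return null;
    return { archive: ARCHIVES[match[1]], type: RECORD_TYPES[match[2]] };
}
```

For example, `classifyAccession("SRP024239")` identifies an SRA study, so an import of that accession would pull in every Run associated with the study.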

SRA Runs == BaseSpace Samples

An initial challenge was reconciling SRA’s data model with the BaseSpace data model. There is variability in how data is captured in SRA, but by and large, for most submissions, an SRA Run matches what BaseSpace would call a BaseSpace Sample. This means that if one imports an SRA Study with multiple associated Runs, the Import App will create a different BaseSpace Sample for each Run.

There are some cases where this is incorrect. Particularly for older runs (back in the GA and GAII days), sometimes multiple runs were required to generate enough reads to analyze as one sample. In these cases, you can use the Combine tool in the BaseSpace UI to combine multiple samples into one logical sample.

Illumina data only

The app is currently limited to importing only Illumina data. We want to ensure that data imported with this app is compatible with our Core BaseSpace apps, and that currently means that we don’t support data from other platforms.

Under the Hood

Designing one tool to handle the variability within three huge public repositories is not trivial! Data formats and standards have changed over the years, even just from Illumina, and so the app will automatically check for older Illumina formats, and make modifications to convert data in an older format into a modern format. For example, data with quality scores encoded in the Phred+64 format will be automatically converted to Phred+33, and FASTQ headers, whose format from Illumina has changed over time, are rewritten for compatibility with BaseSpace Core Apps.
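The quality score conversion mentioned above amounts to shifting each quality character by a fixed offset. The sketch below illustrates the idea only; it is not the app's code, the function name `phred64To33` is hypothetical, and it deliberately rejects characters below the Phred+64 range rather than handling older Solexa-scaled scores.

```javascript
// Sketch: convert a Phred+64 quality string to Phred+33.
// Each character encodes (score + offset), so the score is recovered with the
// old offset (64) and re-encoded with the new one (33).
function phred64To33(qual) {
    var out = "";
    for (var i = 0; i < qual.length; i++) {
        var score = qual.charCodeAt(i) - 64;
        if (score < 0) {
            throw new Error("Character '" + qual[i] + "' is not valid Phred+64");
        }
        out += String.fromCharCode(score + 33);
    }
    return out;
}
```

A quality of Q40 is written as `h` in Phred+64 and as `I` in Phred+33, so `phred64To33("hhhh")` yields `"IIII"`.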

We hope you find this app useful for enabling you to do more with your data on BaseSpace! There will likely be some accession numbers that the app does not handle correctly. For those, we invite users to send feedback so that the app can be improved.

For an example import and analysis of human gut microbiome data, click here: SRP024239 – Human Gut Microbiome Dataset

Annotate your RNA-Seq data with NextBio Research

We’ve developed a new BaseSpace app named NextBio Annotates RNA-Seq that lets you ‘test-drive’ NextBio Research on your RNA-Seq data. With a few clicks, you can find how the most differentially expressed genes in your experiment are correlated with diseases, tissue types, public studies, and more.

NextBio Annotates RNA-Seq

When you run the Cufflinks Assembly and DE app on your RNA-Seq data, you receive a list of differentially expressed genes. What do these genes do? In what cell lines are they expressed? What drugs are they associated with? These are the questions that NextBio Research helps to answer.

When you run the NextBio Annotates RNA-Seq app, the most differentially expressed genes from your experiment are annotated via the NextBio Research API. The output report shows you how those genes are correlated with many public experiments. This is accomplished by the extensive curation of public data that has been performed by the NextBio curation team.

Example output from the NextBio Annotates RNA-Seq app

Of course this app gives just a taste of what NextBio Research can do. In the full version of the product, you can import data of many different types — not just RNA-Seq data. You can use the NextBio Correlation Engine to find how ranked lists of genes from your experiments compare to those in thousands of public experiments. If you’re interested, you can try the full version for free.

We hope this app shows you how NextBio Research can enrich your experiments, by easily annotating and correlating your data with curated public experiments.

Shotgun metagenomics can now be analyzed in the BaseSpace platform

We are happy to announce the release of the Kraken Metagenomics App as a part of BaseSpace Apps.

image

With this BaseSpace Labs App researchers will be able to classify the presence of viruses and bacteria in their next-generation sequencing (NGS) samples. Kraken was developed by Derrick Wood in Steven Salzberg’s Lab at Johns Hopkins University. Unlike alignment-based classification methods, Kraken utilizes exact k-mer matching and a novel classification algorithm to perform taxonomic classification of NGS reads. The methods and performance have been described in detail in Genome Biology. The Kraken Metagenomics App limits the taxonomic classification to bacteria and viruses available in the MiniKraken 20140330 database.
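To give a feel for what exact k-mer matching means, here is a deliberately simplified toy. It looks up every k-mer of a read in a table and assigns the read to the taxon with the most hits. This is not Kraken's algorithm: Kraken maps k-mers to taxa in a taxonomy tree and scores paths through that tree, whereas this sketch uses a plain majority vote. The function name, the tiny database, and the value of `k` are all made up for illustration.

```javascript
// Toy exact k-mer classifier (simplified majority vote, NOT Kraken's method).
// kmerToTaxon is a Map from k-mer string to taxon name.
function classifyRead(read, kmerToTaxon, k) {
    var votes = new Map();
    for (var i = 0; i + k <= read.length; i++) {
        var taxon = kmerToTaxon.get(read.slice(i, i + k));
        if (taxon !== undefined) {
            votes.set(taxon, (votes.get(taxon) || 0) + 1);
        }
    }
    var best = null;
    var bestCount = 0;
    votes.forEach(function (count, taxon) {
        if (count > bestCount) {
            best = taxon;
            bestCount = count;
        }
    });
    return best; // null means no k-mer of the read was found in the table
}
```

Because lookups are exact, no alignment is computed, which is where the speed advantage over alignment-based classification comes from.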

In addition to using Kraken for classification, the app also provides a host removal feature which uses the SNAP aligner to remove human reads prior to classification. SNAP is an aligner that was developed by a team from the UC Berkeley AMP Lab, Microsoft, and UCSF. The output of the SNAP host removal step is an anonymized BAM file containing the host filtered reads. If host removal is selected then only the host filtered reads will be used by Kraken for taxonomic classification.

In order to demonstrate the performance of the App, we tested it on data described in a recent publication by Wilson et al., 2014. The authors were able to identify the presence of Leptospira in a CSF sample, using Illumina’s MiSeq desktop sequencer. This was in contrast to other methods, such as qPCR, which failed to identify the Leptospira. The data was downloaded from the SRA (Accession Number SRR1145846) and analyzed using the App. The SRA data contained 52,621 paired-end reads that remained after host filtering the original 3,063,784 paired-end reads. Reads were trimmed to remove sequencing adapters prior to analysis. The analysis completed in 30 minutes and generated results that are consistent with Wilson et al. This analysis demonstrates that the App is able to produce publication-quality results with equivalent sensitivity.

The tables below show some of the basic metrics obtained from the App. Because the host removal option was selected for this analysis, 1,232 of the 52,621 reads were additionally identified as host. Of the remaining 51,389 reads, only 792 were classified as virus or bacteria. The remaining 50,597 reads, which were not assigned a bacterial or viral taxonomy, may have come from contamination by other organisms not found in the MiniKraken database or from host reads not found in the human reference.

image
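The read counts quoted above can be cross-checked with a few lines of arithmetic. The variable names are arbitrary; the input numbers are the ones given in the text.

```javascript
// Cross-check of the read counts reported for SRR1145846.
var readsInSra = 52621;      // paired-end reads in the SRA data
var additionalHost = 1232;   // reads the App additionally identified as host
var classified = 792;        // reads classified as virus or bacteria

var afterHostRemoval = readsInSra - additionalHost;   // 51389
var unclassified = afterHostRemoval - classified;     // 50597

// Share of the host-filtered reads that received a classification, in percent
var classifiedPercent = (100 * classified / afterHostRemoval).toFixed(2); // "1.54"

console.log(afterHostRemoval, unclassified, classifiedPercent + "%");
```

Only about 1.5% of the host-filtered reads were assigned a bacterial or viral taxonomy, which matches the description of classified reads as a small minority.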

The Krona plot obtained from the Kraken Metagenomics App result is shown below. Krona plots allow hierarchical data to be visualized with zoomable pie charts. In the case of metagenomics data, Krona plots display the taxonomic hierarchy of a sample. The taxonomic levels are represented in the radial direction and the organisms within each taxonomic level in the angular direction. More than 20% of the 792 reads that were assigned a taxonomy were identified as Leptospira, which is consistent with the results of Wilson et al.

image

With this App, researchers now have access to a high-performing, sensitive, and interactive tool for analyzing their metagenomics data in BaseSpace. Researchers can use this App to perform hypothesis-free studies of the structure of bacterial and viral communities present in environmental, industrial, and biological samples. We are excited to provide this App to the BaseSpace community and look forward to feedback and suggestions for improving later versions.

Upcoming BaseSpace Developer Conference in San Francisco!

We want to invite all of you to the BaseSpace Developer Conference in San Francisco!  We’ve been active with many BaseSpace Developer Conferences throughout the world this year, including Heidelberg, Singapore, Bangalore, and our most recent visit to the University of Tokyo in Japan!

First of all, we would like to thank all of our developers and speakers: you all made this possible. We hope it was a great learning experience and look forward to the apps we can bring to BaseSpace. Also, a big shout out to the University of Tokyo for hosting the event and to our Illumina team in Japan.

developer pic

The events showcase the new Native App Engine within BaseSpace with which developers can easily adapt their command-line pipelines into the BaseSpace cloud infrastructure or an infrastructure of their choice.

During the event, developers are taken through a step-by-step walkthrough, and by the end they have developed two separate BaseSpace applications! For anyone who is interested in learning more about BaseSpace App development, there is a lot of documentation available on the BaseSpace Developer Portal for both Native and Web applications.

We also spend time interacting with developers and users directly to brainstorm ideas and answer any questions they may have.

helping individual dev

We are hosting another BaseSpace Developer Conference in San Francisco on December 8th. If you are interested in attending, you can sign up here.

To get an idea of what’s in store for you when you attend one of our developer conferences, check us out on Twitter at #basedev2014.

For any further questions about BaseSpace App development, please view or post on the developer forum or contact us through BaseSpace support.