We are happy to announce the release of the Kraken Metagenomics App as a part of BaseSpace Apps.
With this BaseSpace Labs App, researchers will be able to detect and classify viruses and bacteria in their next-generation sequencing (NGS) samples. Kraken was developed by Derrick Wood in Steven Salzberg’s Lab at Johns Hopkins University. Unlike alignment-based classification methods, Kraken uses exact k-mer matching and a novel classification algorithm to perform taxonomic classification of NGS reads. The methods and performance have been described in detail in Genome Biology. The Kraken Metagenomics App limits the taxonomic classification to bacteria and viruses available in the MiniKraken 20140330 database.
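To make the idea of exact k-mer matching concrete, here is a toy sketch in Python. It is not Kraken's implementation: Kraken maps each k-mer to the lowest common ancestor of the genomes that contain it and then scores paths through the taxonomy, whereas this sketch only takes a majority vote over k-mers that are unique to one taxon. The function names, the two tiny reference sequences, and the choice of k=5 are all assumptions made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify(read, index, k):
    """Return the taxon with the most exact k-mer hits, or None if unclassified.

    K-mers shared by more than one taxon are skipped in this toy version;
    Kraken instead assigns them to a lowest common ancestor in the taxonomy.
    """
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer, ())
        if len(taxa) == 1:
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical reference sequences, far shorter than real genomes.
REFERENCES = {
    "taxon_A": "ACGTACGGTCAGTTCA",
    "taxon_B": "TTGACCATGGCATGCA",
}
INDEX = build_index(REFERENCES, k=5)
```

Because only exact dictionary lookups are involved, no alignment is computed, which is the property that distinguishes this family of methods from alignment-based classifiers. A read with no k-mer in the index comes back as `None`, which corresponds to an unclassified read.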
In addition to using Kraken for classification, the app also provides a host removal feature, which uses the SNAP aligner to remove human reads prior to classification. SNAP is an aligner that was developed by a team from the UC Berkeley AMP Lab, Microsoft, and UCSF. The output of the SNAP host removal step is an anonymized BAM file containing the host-filtered reads. If host removal is selected, only the host-filtered reads will be used by Kraken for taxonomic classification.
To demonstrate the performance of the App, we tested it on data described in a recent publication by Wilson et al., 2014. The authors were able to identify the presence of Leptospira in a CSF sample using Illumina’s MiSeq desktop sequencer, in contrast to other methods, such as qPCR, which failed to identify the Leptospira. The data were downloaded from the SRA (Accession Number SRR1145846) and analyzed using the App. The SRA data contained the 52,621 paired-end reads that remained after host filtering of the original 3,063,784 paired-end reads. Reads were trimmed to remove sequencing adapters prior to analysis. The analysis completed in 30 minutes and generated results that are consistent with Wilson et al. This analysis demonstrates that the App is able to produce publication-quality results with equivalent sensitivity.
The tables below show some of the basic metrics obtained from the App. Because the host removal option was selected for this analysis, 1,232 of the 52,621 reads were additionally identified as host. Of the remaining 51,389 reads, only 792 were classified as virus or bacteria. The other 50,597 reads, which were not assigned a bacterial or viral taxonomy, may have come from contamination by organisms not found in the MiniKraken database or from host reads not found in the human reference.
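The read accounting in the paragraph above can be checked with a few lines of Python. The three input counts are taken from the text; the percentages are derived here and were not quoted in the original analysis, and the variable names are illustrative.

```python
# Read counts reported for SRR1145846 in the text above.
input_reads = 52_621        # paired-end reads in the SRA record
host_reads = 1_232          # additionally identified as host by the SNAP step
classified_reads = 792      # assigned a bacterial or viral taxonomy

non_host_reads = input_reads - host_reads
unclassified_reads = non_host_reads - classified_reads

# Derived fractions (not quoted in the original post).
classified_pct = 100 * classified_reads / non_host_reads
host_pct = 100 * host_reads / input_reads

print(f"non-host reads:     {non_host_reads:,}")
print(f"unclassified reads: {unclassified_reads:,}")
print(f"classified:         {classified_pct:.2f}% of non-host reads")
print(f"host:               {host_pct:.2f}% of input reads")
```

The subtraction reproduces the 51,389 and 50,597 figures given above, and the derived fraction shows how small the classified share is relative to the reads that entered Kraken.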
The Krona plot obtained from the Kraken Metagenomics App result is shown below. Krona plots allow hierarchical data to be visualized with zoomable pie charts. For metagenomics data, Krona plots display the taxonomic hierarchy of a sample: the taxonomic levels are represented in the radial direction, and the organisms within each taxonomic level in the angular direction. More than 20% of the 792 reads that were assigned a taxonomy were identified as Leptospira, which is consistent with the results of Wilson et al.
With this App, researchers now have access to a high performing, sensitive, and interactive tool for analyzing their metagenomics data in BaseSpace. Researchers can use this App to perform hypothesis-free studies of the structure of bacterial and viral communities present in environmental, industrial, and biological samples. We are excited to provide this App to the BaseSpace community and look forward to feedback and suggestions for improving later versions.
We want to invite all of you to the BaseSpace Developer Conference in San Francisco! We’ve held BaseSpace Developer Conferences throughout the world this year, including in Heidelberg, Singapore, Bangalore, and, most recently, at the University of Tokyo in Japan!
First of all, we would like to thank all of our developers and speakers. You all made this possible. We hope it was a great learning experience and look forward to the apps we can bring to BaseSpace. Also, a big shout out to the University of Tokyo for hosting the event and to our Illumina team in Japan.
The events showcase the new Native App Engine within BaseSpace, which lets developers easily adapt their command-line pipelines to the BaseSpace cloud infrastructure or an infrastructure of their choice.
During the event, developers follow a step-by-step walkthrough and finish with two separate BaseSpace applications of their own! For anyone who is interested in learning more about BaseSpace App development, there is a lot of documentation available on the BaseSpace Developer Portal for both Native and Web applications.
We also spend time interacting with developers and users directly to brainstorm ideas and answer any questions they may have.
We are hosting another BaseSpace Developer Conference in San Francisco on December 8th. If you are interested in attending, you can sign up here.
To get an idea of what’s in store for you when you attend one of our developer conferences, check us out on Twitter at #basedev2014.
For any further questions about BaseSpace App development, please view or post on the developer forum or contact us through BaseSpace support.
Hello. Aaron from AB SCIEX here, and we are adding Proteomics to BaseSpace. I bet you weren’t expecting that! But about a year ago we started working with the very nice people at Illumina and together we began to map out a grand vision of how we could better enable systems biology/translational medicine/functional genomics (did I miss one?). Both teams recognized that to really help our customers make revolutionary discoveries in biological research we needed to expand beyond our individual core competencies. For too long the omics technologies had been compartmentalized, and that really isn’t how it works in living cells.
In parallel, mass spectrometry-based proteomics was growing up. With the new AB SCIEX next-gen proteomics technology (a.k.a. SWATH Proteomics), we can quantify thousands of proteins in many, many samples reproducibly for the first time, so integration with genomics and transcriptomics would be more meaningful.
I am proud to announce the launch of OneOmics – an exclusive partnership to bring together AB SCIEX next-gen proteomics (NGP) and Illumina next-generation sequencing (NGS) tools in the BaseSpace cloud computing environment. There are four BaseSpace Apps in the AB SCIEX next-gen proteomics toolkit:
• Protein Expression Extractor – for processing raw mass spectrometry data
• Protein Expression Assembler – for protein fold-change analysis
• Protein Expression Browser – to visualize results in biological context
• Protein Expression Analytics – for data quality review
The Protein Expression Extractor and Assembler have some really fancy algorithms generating the results, and the high-powered distributed computing in BaseSpace delivers results up to 50x faster than on the usual high-end desktop computer (from 3 days down to a couple of hours!). Also the ‘don’t try this at home’ paradigm normally associated with mass spec proteomics is a thing of the past, and you don’t have to be a bioinformatics expert to process the data. It’s virtually parameter free. I told you the algorithms are fancy.
But the Protein Expression Browser is the really cool App. There’s not a mass spectrum in sight, and you don’t have to worry about any of the usual impenetrable jargon associated with mass spec proteomics. Just great visuals showing your results in biological context.
Having proteomics and genomics data in the same place/cloud is a huge step forward in itself, but the AB SCIEX and Illumina teams wanted to take this a step further and integrate NGP and NGS. One of the main benefits of using BaseSpace is that there is already a community of bioinformatics developers publishing new apps, and we are really excited about the applications that our collaborators at the Institute for Systems Biology (ISB), Yale, NextBio and Advaita have been developing. Rob Moritz and his team at ISB have developed the SWATHAtlas Ion Library Generator App that can generate standard and modified SWATH Proteomics Libraries as part of their SWATHAtlas project. This means that researchers have easy access to human, yeast and MtB libraries currently, but more are on the way.
There are obviously many ways to combine genomics information with proteomics information. You could simply do your gene expression with Illumina’s TopHat, Cufflinks or RNA Express Apps, and then your protein expression with AB SCIEX’s next-gen proteomics toolkit, and then integrate the results (parallel analysis). But Chris Colangelo and Rob Kitchen at Yale University are developing the RNA-Seq Translator App, available soon, which takes the output from Cufflinks and converts it to protein FASTA files. These files can then be converted into an NGP library and be used as the basis for proteomics analysis (serial analysis). The transcriptomics and proteomics results can be loaded into Advaita’s iPathwayGuide, or mapped onto the genome using NextBio, and differential transcript or gene expression can be compared with differential protein expression.
I hope you are as excited about the possibilities as we are here at AB SCIEX and Illumina. This is only the beginning…
If you’ve been using BaseSpace for a while you may have noticed that there wasn’t a way to permanently remove data from your account. I say that in the past tense because it is no longer true. The wait is over! “Move to Trash” is now available on Runs and Analyses.
This has been one of the most important features for us to get right because it has to do with removing your data and we take that very seriously. That is why we are introducing a two-step delete process that will help prevent accidental deletes and give you the confidence you need to safely manage your data.
First, you will notice a new action available on run and analysis list and detail pages, called “Move to Trash”. On the list pages, you must first highlight a row before the action becomes available.
This action is very similar to moving files on your desktop to the trash or recycle bin. As on your desktop, the data can be recovered, but while it is in the trash it can no longer be viewed or acted upon.
Trashed Items Side-Effects:
- If the items were shared, all share recipients will lose access to that data
- All API access is immediately removed and will return the HTTP status code of 410 (“Gone”)
- Any attempt to view this data on the website will take the user to a 410 error page stating the content is “Gone”
- Data, while in the trash, can only be “Restored” or “Emptied” by the owner.
- Emptying (purging) data from the trash permanently removes it; this cannot be undone.
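For developers calling the API, the practical consequence of the list above is that a 410 response should be treated differently from a generic failure: the item still exists in the owner's trash, so retrying will not help until it is restored. The sketch below shows one way a client might branch on the status code. The helper name `interpret_status` and the returned messages are hypothetical, no request is actually made, and only the 410 behaviour comes from this post; the other branches are ordinary HTTP semantics included for contrast.

```python
from http import HTTPStatus


def interpret_status(status_code):
    """Translate an HTTP status code into a hint for client code.

    Only the 410 case is described in the post; the remaining branches
    are generic HTTP semantics, not documented BaseSpace behaviour.
    """
    if status_code == HTTPStatus.GONE:  # 410
        return "trashed: ask the owner to restore the item, do not retry"
    if status_code == HTTPStatus.NOT_FOUND:  # 404
        return "not found: check the identifier"
    if 200 <= status_code < 300:
        return "ok"
    return "other error"
```

A client would pass in the status code of whatever HTTP response it received, for example `interpret_status(response_status)`, and skip its usual retry logic when the hint begins with "trashed".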
Moving Runs to the Trash
- Runs can be put in the trash from the list or the detail pages.
- Runs cannot be removed if they are in a non-terminal state. The most common non-terminal states would be: running, uploading, analyzing.
- The dialog may also present you with the option to remove all associated analyses that used the run as input.
- All sequencing runs will have at least 1 associated analysis unless they failed or were used just for remote monitoring.
- If you are not the owner of the run, moving this item to the trash will simply remove your access and cannot be undone.
- To restore access, just contact the owner or click on the previously sent share link if it’s still active.
Moving Analyses to the Trash
- Analyses can be put in the trash from the list and detail pages.
- Analyses cannot be removed if in a non-terminal state. The most common non-terminal states would be: pending execution and running.
- If a project is being transferred, some of the analyses may not be removed until after the transfer has been completed.
- Apps that are leveraging data as input may fail if items are moved to the trash.
- If you have items in the trash, we prevent project transfers until all items in that project are restored or emptied.
Emptying and Restoring Items in the Trash
The trash page can be accessed from most of the project and run list pages. The icon is always on the right side of the grid and labeled “View Trash”.
There are only two actions currently on the Trash page: Empty and Restore.
Empty permanently deletes all items in the trash, and Restore returns the items to active status.
Restored items will keep all of their original attributes except for the share recipients.
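The two-step delete process described in this post can be summarized as a small state model. The sketch below is only an illustration of the rules as stated here, not BaseSpace code: the `Item` class, the function names, and the exact state strings are hypothetical, and the set of non-terminal states is assembled from the examples given above.

```python
from dataclasses import dataclass, field

# State names are assumptions based on the examples given in the post.
NON_TERMINAL_STATES = {"running", "uploading", "analyzing", "pending execution"}


@dataclass
class Item:
    name: str
    state: str = "complete"
    shared_with: set = field(default_factory=set)
    location: str = "active"  # "active", "trash", or "deleted"


def move_to_trash(item):
    """Step 1: reversible. Refused while the item is still in progress."""
    if item.state in NON_TERMINAL_STATES:
        raise ValueError(f"{item.name} is {item.state}; cannot move to trash")
    item.shared_with.clear()  # share recipients lose access
    item.location = "trash"


def restore(item):
    """Undo step 1. Share recipients are not brought back."""
    if item.location != "trash":
        raise ValueError(f"{item.name} is not in the trash")
    item.location = "active"


def empty_trash(items):
    """Step 2: permanent. Everything currently in the trash is deleted."""
    for item in items:
        if item.location == "trash":
            item.location = "deleted"
```

Clearing the share list at the moment an item is trashed captures two rules at once: recipients lose access immediately, and a restored item comes back with every attribute except its share recipients. Nothing in the model leads out of the "deleted" state, which mirrors the fact that emptying the trash cannot be undone.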
User Agreement Updates
Because of all of these changes, we have also updated our User agreements to reflect the behavior of these new features. In particular, item 7 states that even though data can be removed it may have been previously shared with other users or apps and subsequently downloaded or copied. You will be prompted to accept these new terms upon your next login. If you have any questions, don’t hesitate to ask!
Velvet de novo Assembly and FastQC
Both applications are currently available for all users and were built using the BaseSpace Native App Engine by our internal R&D groups. These two applications are also the first BaseSpace Labs Apps of many more to come. The concept behind BaseSpace Labs Apps is explained in more detail below.
BaseSpace Labs Apps are Illumina’s internally developed applications that extend the functionality within BaseSpace. Some BaseSpace Labs applications will be experimental or research focused, while others will be used as a step in a greater workflow. The Apps are reviewed regularly by our team and put through the same review process as third-party apps.
BaseSpace Labs Apps are developed using an accelerated development process in order to make them available to BaseSpace users faster than the BaseSpace Core Apps. It is important to note that, unlike BaseSpace Core Apps, BaseSpace Labs Apps are not officially supported by Illumina Customer Service. Support for BaseSpace Labs applications is provided at the developer’s discretion and the apps are provided as-is without any warranty of any kind.
The FastQC app can be used to provide a quality assessment of the sequence data generated using Illumina sequencers. FastQC for BaseSpace is based on the FastQC software developed by the Bioinformatics Group at the Babraham Institute. It provides a modular set of analyses that can be used to quickly assess whether there are any problems with the sequencing data before doing any additional analysis.
The above figure shows an example output from the FastQC app depicting the quality score across all bases at a given position in the reads. For an example of additional output generated by FastQC, please view this FastQC demo project.
The Velvet de novo Assembly app is a de novo assembly pipeline for bacterial samples using the Velvet assembler. One of the key features of this app is that it has an adapter trimming protocol that has been optimized for the Nextera Mate-Pair library prep kit. An application note describing the de novo assembly of 9 different bacteria using the Velvet de novo Assembly app can be found here. In many cases, a single contig representing the entire bacterial genome can be assembled. The figure below is an example of the output generated by the Velvet de novo Assembly app.
Example output generated by the Velvet de novo Assembly can be found here. We hope you enjoy the FastQC and Velvet de novo Assembly apps. For any questions, feedback, or feature requests for these applications, please send an email to firstname.lastname@example.org and include the name of the application. Thank you!
We are excited to announce the availability of a data upload feature for FASTQ files that were previously generated on Illumina sequencing instruments. This simple-to-use feature is accessible from any project to which the user has write access by first clicking on the project and then selecting the Import tab shown below.
The user will then be prompted to select their import type. The user can upload a single sample by clicking on “Sample” as shown below.
The user can then either “Drag and drop” one or more files into the webpage or click on “select files” and choose the files to upload from a file browser. Note that the FASTQ files need to adhere to Illumina standards, as specified below. Data for a single sample can span multiple files. A sample is limited to 16 files with a combined size of 25 GB. Uploading a 25 GB sample takes 1-2 hours over a relatively fast internet connection.
The user will then see a progress bar as the file(s) are uploaded. Once the progress bar completes, the user can add additional files. The user can also set the sample name and associate a genome with the sample in the upper left-hand corner of the screen.
Once all of the files have been added and have finished uploading, the user will need to click on the “Complete Import” button (shown above) to complete the session.
FASTQ file standards
- The uploader will only support gzipped FASTQ files generated on Illumina instruments
- The name of the FASTQ files must conform to the following convention:
- SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz (e.g. SampleName_S1_L001_R1_001.fastq.gz / SampleName_S1_L001_R2_001.fastq.gz)
- The read descriptor in the FASTQ files must conform to the following convention:
- @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber:
- Read 1 has a 1 in the ReadNum field, and Read 2 has a 2 in the ReadNum field
- The number of base calls for each read must equal the number of quality scores
- The number of entries for Read 1 must equal the number of entries for Read 2
- The uploader will determine whether files are paired-end by matching file names that differ only in the ReadNum
- For paired-end reads, the descriptor must match for every entry for both reads 1 and 2
- Every read must have passed filter
- Only one sample can be uploaded at a time
- A maximum of 16 files can be uploaded in a session
- The size of the uploaded files cannot exceed 25 GB
- A detailed description of how to use the uploader can be found in the BaseSpace user guide
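Several of the rules in the list above can be checked locally before starting an upload. The following Python sketch covers only the file-name convention, the pairing rule, the single-sample rule, the file-count and size limits, and the ReadNum field of a descriptor; it does not open the files or compare read counts and quality strings. The function names and regular expressions are assumptions written for this example, "25 GB" is taken to mean decimal gigabytes, and the uploader's own validation may differ.

```python
import re

MAX_FILES = 16
MAX_TOTAL_BYTES = 25 * 10**9  # assumes "25 GB" means decimal gigabytes

# Matches names such as SampleName_S1_L001_R1_001.fastq.gz
FILENAME_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<index>\d{3})\.fastq\.gz$"
)

# @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber
# A trailing colon is tolerated because the convention above is written with one.
DESCRIPTOR_RE = re.compile(
    r"^@[^:\s]+:[^:\s]+:[^:\s]+:\d+:\d+:\d+:\d+ "
    r"(?P<read>[12]):(?P<filter>[^:\s]+):0:(?P<sample_number>[^:\s]*):?$"
)


def check_upload(files):
    """Return a list of problems for a {filename: size_in_bytes} mapping."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files exceeds the limit of {MAX_FILES}")
    if sum(files.values()) > MAX_TOTAL_BYTES:
        problems.append("combined size exceeds 25 GB")
    samples = set()
    for name in files:
        match = FILENAME_RE.match(name)
        if match is None:
            problems.append(f"bad file name: {name}")
            continue
        samples.add(match["sample"])
        # A Read 2 file needs a Read 1 partner that differs only in ReadNum.
        if match["read"] == "2":
            start, end = match.span("read")
            mate = name[:start] + "1" + name[end:]
            if mate not in files:
                problems.append(f"{name} has no matching Read 1 file")
    if len(samples) > 1:
        problems.append("only one sample can be uploaded at a time")
    return problems


def read_number(descriptor):
    """Return ReadNum (1 or 2) from a read descriptor line, or None if malformed."""
    match = DESCRIPTOR_RE.match(descriptor)
    return int(match["read"]) if match else None
```

Calling `check_upload` with the two example file names from the list above and their sizes returns an empty list, while removing the R1 file makes it report the orphaned R2 file. Returning a list of messages rather than raising on the first problem lets a user fix every issue before retrying a long upload.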
DeepChek®-HIV – App for genotyping by NGS and inferred drug resistance testing – for research use only
HIV genotyping and inferred drug resistance testing has become an integral part of the clinical management of patients infected with HIV. Detecting minority populations of resistant viruses is now routinely done. Next-generation sequencing (NGS) technology is replacing Sanger sequencing methodology, and end-to-end solutions combining sensitive genomic tests with advanced data management software platforms are in high demand.
DeepChek®-HIV is easy-to-use downstream analysis software for NGS data management, interpretation, and reporting for Research Use Only. DeepChek is a reliable software and database solution that is capable of handling the complexity of NGS data for all the key genomic regions involved in HIV drug resistance (reverse transcriptase, protease, integrase, GP41, and GP120/V3). The database is regularly updated with the most recent drug resistance information and provides an efficient downstream analysis platform for clinical laboratories involved in routine HIV-1 genotyping and drug resistance testing.
Link to App in BaseSpace:
Link to example dataset with example input data and output results: