Hello. Aaron from AB SCIEX here, and we are adding Proteomics to BaseSpace. I bet you weren’t expecting that! But about a year ago we started working with the very nice people at Illumina and together we began to map out a grand vision of how we could better enable systems biology/translational medicine/functional genomics (did I miss one?). Both teams recognized that to really help our customers make revolutionary discoveries in biological research we needed to expand beyond our individual core competencies. For too long the omics technologies had been compartmentalized, and that really isn’t how it works in living cells.
In parallel, mass spectrometry-based proteomics was growing up. With the new AB SCIEX next-gen proteomics technology (a.k.a. SWATH Proteomics), we can quantify thousands of proteins in many, many samples reproducibly for the first time, so integration with genomics and transcriptomics would be more meaningful.
I am proud to announce the launch of OneOmics – an exclusive partnership to bring together AB SCIEX next-gen proteomics (NGP) and Illumina next-generation sequencing (NGS) tools in the BaseSpace cloud computing environment. There are four BaseSpace Apps in the AB SCIEX next-gen proteomics toolkit:
• Protein Expression Extractor – for processing raw mass spectrometry data
• Protein Expression Assembler – for protein fold-change analysis
• Protein Expression Browser – to visualize results in biological context
• Protein Expression Analytics – for data quality review
The Protein Expression Extractor and Assembler have some really fancy algorithms generating the results, and the high-powered distributed computing in BaseSpace delivers results up to 50x faster than on the usual high-end desktop computer (from 3 days down to a couple of hours!). Also the ‘don’t try this at home’ paradigm normally associated with mass spec proteomics is a thing of the past, and you don’t have to be a bioinformatics expert to process the data. It’s virtually parameter free. I told you the algorithms are fancy.
But the Protein Expression Browser is the really cool App. There’s not a mass spectrum in sight, and you don’t have to worry about any of the usual impenetrable jargon associated with mass spec proteomics. Just great visuals showing your results in biological context.
Having proteomics and genomics data in the same place/cloud is a huge step forward in itself, but the AB SCIEX and Illumina teams wanted to take this a step further and integrate NGP and NGS. One of the main benefits of using BaseSpace is that there is already a community of bioinformatics developers publishing new apps, and we are really excited about the applications that our collaborators at the Institute for Systems Biology (ISB), Yale, NextBio and Advaita have been developing. Rob Moritz and his team at ISB have developed the SWATHAtlas Ion Library Generator App that can generate standard and modified SWATH Proteomics Libraries as part of their SWATHAtlas project. This means that researchers currently have easy access to human, yeast and MtB libraries, with more on the way.
There are obviously many ways to combine genomics information with proteomics information. You could simply do your gene expression with Illumina’s TopHat, Cufflinks or RNA Express Apps, and then your protein expression with AB SCIEX’s next-gen proteomics toolkit, and then integrate the results (parallel analysis). But Chris Colangelo and Rob Kitchen at Yale University are developing the RNA-Seq Translator App, available soon, that takes the output from Cufflinks and converts it to protein FASTA files. These files can then be converted into an NGP library and be used as the basis for proteomics analysis (serial analysis). The transcriptomics and proteomics results can be loaded into Advaita’s iPathwayGuide, or mapped onto the genome using NextBio, and differential transcript or gene expression can be compared with differential protein expression.
I hope you are as excited about the possibilities as we are here at AB SCIEX and Illumina. This is only the beginning…
If you’ve been using BaseSpace for a while you may have noticed that there wasn’t a way to permanently remove data from your account. I say that in the past tense because it is no longer true. The wait is over! “Move to Trash” is now available on Runs and Analyses.
This has been one of the most important features for us to get right because it has to do with removing your data and we take that very seriously. That is why we are introducing a two-step delete process that will help prevent accidental deletes and give you the confidence you need to safely manage your data.
First, you will notice a new action available on run and analysis list and detail pages, called “Move to Trash”. On the list pages, the action becomes available only after you highlight the row that you want.
This action is very similar to moving files on your desktop to the trash or recycle bin. Just like on your desktop, the data can be recovered, but it can no longer be viewed or acted upon.
Trashed Items Side-Effects:
- If the items were shared, all share recipients will lose access to that data
- All API access is immediately removed and will return the HTTP status code of 410 (“Gone”)
- Any attempt to view this data on the website will take the user to a 410 error page stating the content is “Gone”
- Data, while in the trash, can only be “Restored” or “Emptied” by the owner.
- Emptying the trash permanently removes the data and cannot be undone.
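For anyone scripting against the API, the 410 response is the cue that an item has been trashed. Here is a minimal sketch of how a client might interpret status codes. The function name and the returned labels are illustrative assumptions and not part of the BaseSpace API; only the 410 (“Gone”) behavior comes from the list above.

```python
from http import HTTPStatus


def describe_item_status(status_code: int) -> str:
    """Map an HTTP status code to a rough item state.

    Hypothetical helper: only the 410 ("Gone") case is described in the
    post; the other branches are generic HTTP handling.
    """
    if status_code == HTTPStatus.GONE:  # 410: item was moved to the trash
        return "trashed"
    if status_code == HTTPStatus.NOT_FOUND:  # 404: no such item
        return "missing"
    if 200 <= status_code < 300:  # any success code
        return "available"
    return "error"
```

A client that sees "trashed" can stop retrying and tell the user to ask the owner to restore the item.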
Moving Runs to the Trash
- Runs can be put in the trash from the list or the detail pages.
- Runs cannot be removed if they are in a non-terminal state. The most common non-terminal states would be: running, uploading, analyzing.
- The dialog may also present you with the option to remove all associated analyses that used the run as input.
- All sequencing runs will have at least 1 associated analysis unless they were failed or used just for remote monitoring.
- If you are not the owner of the run, moving this item to the trash will simply remove your access and cannot be undone.
- To restore access, just contact the owner or click on the previously sent share link if it’s still active.
Moving Analyses to the Trash
- Analyses can be put in the trash from the list and detail pages.
- Analyses cannot be removed if in a non-terminal state. The most common non-terminal states would be: pending execution and running.
- If a project is being transferred, some of the analyses may not be removed until after the transfer has been completed.
- Apps that are leveraging data as input may fail if items are moved to the trash.
- If you have items in the trash, we prevent project transfers until all items in that project are restored or emptied.
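The non-terminal state rule for runs and analyses can be sketched as a simple lookup. The state labels below are copied from the lists above, but the exact strings, the `item_type` keys and the function name are assumptions for illustration. The post only names the most common non-terminal states, so the sets are not exhaustive.

```python
# State labels taken from the post; the exact strings are assumptions,
# and the post lists only the most common non-terminal states.
NON_TERMINAL_STATES = {
    "run": {"running", "uploading", "analyzing"},
    "analysis": {"pending execution", "running"},
}


def can_move_to_trash(item_type: str, state: str) -> bool:
    """Return True when the item is not in a known non-terminal state."""
    return state.lower() not in NON_TERMINAL_STATES[item_type]
```

For example, `can_move_to_trash("run", "Uploading")` is false, so a script would wait for the upload to finish before trashing the run.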
Emptying and Restoring Items in the Trash
The trash page can be accessed from most of the project and run list pages. The icon is always on the right side of the grid and is labeled “View Trash”.
There are only two actions currently on the Trash page: Empty and Restore.
Empty will permanently delete all items, and Restore allows you to return the items back to being active.
Restored items will keep all of their original attributes except for the share recipients.
User Agreement Updates
Because of all of these changes, we have also updated our User agreements to reflect the behavior of these new features. In particular, item 7 states that even though data can be removed it may have been previously shared with other users or apps and subsequently downloaded or copied. You will be prompted to accept these new terms upon your next login. If you have any questions, don’t hesitate to ask!
Velvet de novo Assembly and FastQC
Both applications are currently available for all users and were built using the BaseSpace Native App Engine by our internal R&D groups. These two applications are also the first of many BaseSpace Labs Apps to come. The concept behind BaseSpace Labs Apps is explained in more detail below.
BaseSpace Labs Apps are Illumina’s internally developed applications that extend the functionality within BaseSpace. Some BaseSpace Labs applications will be experimental or research focused, while others will be used as a step in a greater workflow. The Apps are reviewed regularly by our team and put through the same review process as third-party apps.
BaseSpace Labs Apps are developed using an accelerated development process in order to make them available to BaseSpace users faster than the BaseSpace Core Apps. It is important to note that, unlike BaseSpace Core Apps, BaseSpace Labs Apps are not officially supported by Illumina Customer Service. Support for BaseSpace Labs applications is provided at the developer’s discretion and the apps are provided as-is without any warranty of any kind.
The FastQC app can be used to provide a quality assessment of the sequence data generated using Illumina sequencers. FastQC for BaseSpace is based on the FastQC software developed by the Bioinformatics Group at the Babraham Institute. It provides a modular set of analyses which can be used quickly to assess if there are any problems with the sequencing data before doing any additional analysis.
The above figure shows an example output from the FastQC app depicting the quality score across all bases at a given position in the reads. For an example of additional output generated by FastQC, please view this FastQC demo project.
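To give a feel for what a per-position quality summary involves, here is a minimal sketch that averages quality scores at each read position. It is a simplification and not FastQC's actual implementation (FastQC reports a fuller distribution per position). It assumes the quality strings use the Phred+33 encoding, and the function name is hypothetical.

```python
from typing import Iterable, List


def mean_quality_per_position(quality_lines: Iterable[str], offset: int = 33) -> List[float]:
    """Average the Phred quality score at each position across reads.

    quality_lines: the quality strings (4th line of each FASTQ record).
    offset: ASCII offset of the encoding (assumed Phred+33).
    """
    totals: List[int] = []
    counts: List[int] = []
    for line in quality_lines:
        for i, ch in enumerate(line.rstrip("\r\n")):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```

Reads of different lengths are handled by counting, at each position, only the reads that extend that far.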
The Velvet de novo Assembly app is a de novo assembly pipeline for bacterial samples using the Velvet assembler. One of the key features of this app is that it has an adapter trimming protocol that has been optimized for the Nextera Mate-Pair library prep kit. An application note describing the de novo assembly of 9 different bacteria using the Velvet de novo Assembly app can be found here. In many cases, a single contig representing the entire bacterial genome can be assembled. The figure below is an example of the output generated by the Velvet de novo Assembly app.
Example output generated by the Velvet de novo Assembly can be found here. We hope you enjoy the FastQC and Velvet de novo Assembly apps. For any questions, feedback, or feature requests for these applications, please send an email to firstname.lastname@example.org and include the name of the application. Thank you!
We are excited to announce the availability of a data upload feature for FASTQ files that were previously generated on Illumina sequencing instruments. This simple-to-use feature is accessible from any project to which the user has write access by first clicking on the project and then selecting the Import tab shown below.
The user will then be prompted to select their import type. The user can upload a single sample by clicking on “Sample” as shown below.
The user can then either “Drag and drop” one or more files into the webpage or click on “select files” and select which files they would like to upload from a file browser. Note that the FASTQ files need to adhere to Illumina standards, as specified below. Data for a single sample can constitute multiple files. The total number of files per sample and their combined size are limited to 16 and 25 GB respectively. Uploading a 25 GB sample takes roughly 1-2 hours over a relatively fast internet connection.
The user will then see a progress bar as the files are uploaded. Once the progress bar completes, the user can add additional files. The user can also set the sample name and associate a genome with the sample in the upper left-hand corner of the screen.
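As a rough pre-flight check, the limits and the upload estimate can be worked through in a few lines. This is a sketch only: the function names are hypothetical, and it assumes 1 GB means 10^9 bytes, which the post does not specify.

```python
MAX_FILES_PER_SAMPLE = 16
MAX_TOTAL_BYTES = 25 * 10**9  # assumption: 1 GB = 10**9 bytes


def check_sample_limits(file_sizes_bytes):
    """Return a list of limit violations (empty when the sample is OK)."""
    problems = []
    if len(file_sizes_bytes) > MAX_FILES_PER_SAMPLE:
        problems.append("too many files")
    if sum(file_sizes_bytes) > MAX_TOTAL_BYTES:
        problems.append("combined size too large")
    return problems


def required_mbit_per_s(total_bytes, hours):
    """Sustained throughput needed to upload total_bytes in the given time."""
    return total_bytes * 8 / (hours * 3600) / 1e6
```

Under that assumption, uploading 25 GB in 1 to 2 hours works out to a sustained rate of roughly 56 down to 28 Mbit/s, which gives a sense of what “relatively fast” means here.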
Once the user has imported all of the files and the files complete uploading, the user will need to click on the “Complete Import” button (shown above) to complete the session.
FASTQ file standards
- The uploader will only support gzipped FASTQ files generated on Illumina instruments
- The name of the FASTQ files must conform to the following convention:
- SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz (e.g. SampleName_S1_L001_R1_001.fastq.gz / SampleName_S1_L001_R2_001.fastq.gz)
- The read descriptor in the FASTQ files must conform to the following convention:
- @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber:
- Read 1 descriptor would look like this:
- Read 2 would have a 2 in the ReadNum field, like this:
- The number of base calls for each read must equal the number of quality scores
- The number of entries for Read 1 must equal the number of entries for Read 2
- The uploader will determine if files are paired-end based on the matching file names in which the only difference is the ReadNum
- For paired-end reads, the descriptor must match for every entry for both reads 1 and 2
- Each read must have passed filter
- Only one sample can be uploaded at a time
- A maximum of 16 files can be uploaded in a session
- The size of the uploaded files cannot exceed 25 GB
- A detailed description of how to use the uploader can be found in the BaseSpace user guide
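Several of the rules above can be checked locally before starting an upload. The sketch below is not the uploader's own validation logic. The regular expressions are inferred from the naming convention and descriptor template in the list, so details such as three-digit lane and index fields or a Y/N filter flag are assumptions, and all function names are hypothetical.

```python
import re

# Inferred from the example SampleName_S1_L001_R1_001.fastq.gz (assumption).
FILENAME_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<index>\d{3})\.fastq\.gz$"
)

# Inferred from the template
# @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber
DESCRIPTOR_RE = re.compile(
    r"^@[^:\s]+:[^:\s]+:[^:\s]+:\d+:\d+:\d+:\d+ "
    r"(?P<read>[12]):(?P<filter>[YN]):0:(?P<rest>\S*)$"
)


def is_valid_filename(name):
    """True when the file name follows the naming convention."""
    return FILENAME_RE.match(name) is not None


def mate_filename(name):
    """Return the expected name of the other read's file, or None if invalid."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    other = "2" if m.group("read") == "1" else "1"
    start, end = m.span("read")
    return name[:start] + other + name[end:]


def descriptors_match(desc1, desc2):
    """True when a Read 1 / Read 2 descriptor pair agrees up to the space."""
    m1, m2 = DESCRIPTOR_RE.match(desc1), DESCRIPTOR_RE.match(desc2)
    if m1 is None or m2 is None:
        return False
    same_prefix = desc1.split(" ")[0] == desc2.split(" ")[0]
    return same_prefix and m1.group("read") == "1" and m2.group("read") == "2"
```

For instance, `mate_filename("SampleName_S1_L001_R1_001.fastq.gz")` gives the R2 name from the example above, which mirrors how paired-end files are matched by names that differ only in the read number.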
DeepChek®-HIV – App for genotyping by NGS and inferred drug resistance testing – for research use only
HIV genotyping and inferred drug resistance testing have become an integral part of the clinical management of patients infected with HIV. Detecting minority populations of resistant viruses is now routinely done. Next-generation sequencing (NGS) technology is replacing Sanger sequencing methodology, and end-to-end solutions combining sensitive genomic tests with advanced data management software platforms are in high demand.
DeepChek®-HIV is easy-to-use downstream analysis software for NGS data management, interpretation, and reporting for Research Use Only. DeepChek is a reliable software and database solution that is capable of handling the complexity of NGS data for all the key genomic regions involved in HIV drug resistance (reverse transcriptase, protease, integrase, GP41, and GP120/V3). The database is regularly updated with the most recent drug resistance information and provides an efficient downstream analysis platform for clinical laboratories involved in routine HIV-1 genotyping and drug resistance testing.
Link to App in BaseSpace:
Link to example dataset with example input data and output results:
Sequence and stream data to BaseSpace: done. Run quality check: done. Alignment and variant calling: done. Congratulations, you now have a set of variants! But what good is a set of variants if you can’t describe what they mean, how they might explain the phenotype of the specimen, and which ones aren’t worth worrying about? There is good news: now that Illumina VariantStudio is available on BaseSpace, deciphering the biological meaning of genomic variants is no longer a huge challenge. If you’re not familiar with it, VariantStudio is an easy-to-use tool for variant annotation, filtering, and reporting.
- Step 1: Import data into VariantStudio. Click the import button and you will be able to browse and select VCF and gVCF files stored in BaseSpace to import into VariantStudio. You can import DNA variant data from targeted, exome, or whole-genome sequencing, but VariantStudio only supports human SNP and indel analysis at this time.
- Step 2: Annotate variants using the Illumina Annotation Service, which aggregates annotations from a broad range of public sources including ClinVar, COSMIC, OMIM, and 1000 Genomes Project. Variants will be richly annotated with biological information including transcript consequence, functional impact, known disease association, population allele frequency, and more.
- Step 3: Apply a cascade of filtering options to quickly create a short list of candidate variants that are likely associated with the disease or phenotype. In addition to single sample analysis, you can perform tumor/normal comparisons to identify somatic mutations or family-based analyses to investigate variants underlying rare disease.
- Step 4: Use the provided annotations to classify variants based on their presumed biological impact. A common scheme is pathogenic, likely pathogenic, benign, or unknown significance. Or you can use your own, customizable classification scheme.
- Step 5: Generate a customizable report that summarizes your important variants, along with any additional metadata.
Import variants > Annotate > Filter > Classify and Interpret > Generate Report. Watch our analysis videos and see how quickly you can go through this workflow. VariantStudio is a powerful, secure tool to simplify genomic data interpretation, and accessing it on BaseSpace is just a click away. With the addition of VariantStudio to the BaseSpace Core Apps, BaseSpace users can now execute the entire sample-to-answer workflow, from generating sequence reads to reporting biologically significant results.
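The filtering step in this workflow is a cascade: each filter narrows the list left by the previous one. A minimal sketch of that idea follows. It is not VariantStudio's implementation; the record fields, gene names, thresholds and function names are all made up for illustration.

```python
from typing import Callable, Dict, Iterable, List

Variant = Dict[str, object]
Filter = Callable[[Variant], bool]


def apply_cascade(variants: Iterable[Variant], filters: Iterable[Filter]) -> List[Variant]:
    """Apply each filter in turn, keeping only variants that pass all of them."""
    remaining = list(variants)
    for keep in filters:
        remaining = [v for v in remaining if keep(v)]
    return remaining


def is_rare(variant: Variant) -> bool:
    """Hypothetical filter: population allele frequency below 1%."""
    return variant["allele_freq"] < 0.01


def is_protein_altering(variant: Variant) -> bool:
    """Hypothetical filter: keep consequences that change the protein."""
    return variant["consequence"] in {"missense", "stop_gained"}


# Invented example records; field names are assumptions.
variants = [
    {"gene": "GENE_A", "consequence": "missense", "allele_freq": 0.001},
    {"gene": "GENE_B", "consequence": "synonymous", "allele_freq": 0.0005},
    {"gene": "GENE_C", "consequence": "stop_gained", "allele_freq": 0.2},
]

candidates = apply_cascade(variants, [is_rare, is_protein_altering])
```

Here only the GENE_A record survives both filters, which is the kind of short candidate list Step 3 aims for.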
Two applications for Illumina’s synthetic long reads: TruSeq Long-Read Assembly and TruSeq Phasing Analysis
Using data generated by the TruSeq Synthetic Long-Read DNA Library Prep Kit (also released this week), the TruSeq Long-Read Assembly App executes the assembly of synthetic long reads, and the TruSeq Phasing Analysis App performs whole human genome phasing.
The TruSeq Long-Read Assembly App constructs synthetic long reads from shorter sequencing reads, providing FASTQ files for accurate genome assembly, genome finishing, de novo assembly and metagenomics analysis.
The TruSeq Phasing Analysis App performs whole human genome phasing, identifying haplotype information, co-inherited alleles and phasing de novo mutations. The application reports haplotype blocks across the genome and phasing confidence scores in a phased VCF file.
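To illustrate what haplotype blocks in a phased VCF look like to downstream code, here is a minimal sketch that groups phased calls into blocks. It relies on two conventions from the VCF specification: a `|` separator in the genotype marks a phased call, and the `PS` field identifies the phase set. Whether the app's output uses `PS` in exactly this way is an assumption, and the function name and input tuples are hypothetical.

```python
from collections import defaultdict


def haplotype_blocks(records):
    """Summarize phased variants by phase set.

    records: iterable of (position, genotype, phase_set) tuples, e.g.
    (100, "0|1", 100). Unphased calls ("/" separator) are skipped.
    Returns {phase_set: (first_position, last_position, variant_count)}.
    """
    blocks = defaultdict(list)
    for position, genotype, phase_set in records:
        if "|" in genotype and phase_set is not None:
            blocks[phase_set].append(position)
    return {ps: (min(p), max(p), len(p)) for ps, p in blocks.items()}
```

Given two phased calls in phase set 100 at positions 100 and 250, the result contains a block spanning 100 to 250 with two variants.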
Together with the TruSeq Synthetic Long-Read DNA Library Prep Kit and Illumina’s sequencing technology, these new apps provide a solution for long reads that spans library prep, sequencing, and informatics. See some sample data on the Illumina Blog, and learn more about synthetic long-read and phasing technology on the Illumina website.