Sequence and stream data to BaseSpace: done. Run quality check: done. Alignment and variant calling: done. Congratulations, you now have a set of variants! But what good is a set of variants if you can’t describe what they mean, how they might explain the phenotype of the specimen, and which ones aren’t worth worrying about? There is good news: now that Illumina VariantStudio is available on BaseSpace, deciphering the biological meaning of genomic variants is no longer a huge challenge. If you’re not familiar with it, VariantStudio is an easy-to-use tool for variant annotation, filtering, and reporting.
- Step 1: Import data into VariantStudio. Click the import button and you will be able to browse and select VCF and gVCF files stored in BaseSpace to import into VariantStudio. You can import DNA variant data from targeted, exome, or whole-genome sequencing, but VariantStudio only supports human SNP and indel analysis at this time.
- Step 2: Annotate variants using the Illumina Annotation Service, which aggregates annotations from a broad range of public sources including ClinVar, COSMIC, OMIM, and 1000 Genomes Project. Variants will be richly annotated with biological information including transcript consequence, functional impact, known disease association, population allele frequency, and more.
- Step 3: Apply a cascade of filtering options to quickly create a short list of candidate variants that are likely associated with the disease or phenotype. In addition to single sample analysis, you can perform tumor/normal comparisons to identify somatic mutations or family-based analyses to investigate variants underlying rare disease.
- Step 4: Use the provided annotations to classify variants based on their presumed biological impact. A common scheme is pathogenic, likely pathogenic, benign, or unknown significance. Alternatively, you can define your own classification scheme.
- Step 5: Generate a customizable report that summarizes your important variants, along with any additional metadata.
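The filtering cascade in Step 3 can be pictured as a chain of predicates, each narrowing the list left by the previous one. The Python sketch below is illustrative only: the `Variant` fields, the thresholds, the gene names, and the sample records are all invented for this example and do not reflect VariantStudio's data model or its actual filter options.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Variant:
    # Hypothetical annotated-variant record; field names are assumptions.
    gene: str
    consequence: str    # e.g. "missense", "synonymous", "stop_gained"
    allele_freq: float  # population allele frequency, 0.0 to 1.0
    clinvar: str        # e.g. "pathogenic", "benign", "" if no record


Filter = Callable[[Variant], bool]


def apply_cascade(variants: Iterable[Variant], filters: List[Filter]) -> List[Variant]:
    """Apply each filter in order, keeping only variants that pass all of them."""
    remaining = list(variants)
    for keep in filters:
        remaining = [v for v in remaining if keep(v)]
    return remaining


# Illustrative filters; the cutoff and categories are arbitrary examples.
def is_rare(v: Variant) -> bool:
    return v.allele_freq < 0.01


def is_protein_altering(v: Variant) -> bool:
    return v.consequence in {"missense", "stop_gained", "frameshift"}


def not_benign(v: Variant) -> bool:
    return v.clinvar != "benign"


# Invented records, used only to exercise the cascade.
variants = [
    Variant("GENE_A", "stop_gained", 0.0001, "pathogenic"),
    Variant("GENE_B", "missense", 0.25, ""),
    Variant("GENE_C", "synonymous", 0.002, ""),
    Variant("GENE_D", "missense", 0.003, "benign"),
]

candidates = apply_cascade(variants, [is_rare, is_protein_altering, not_benign])
for v in candidates:
    print(v.gene, v.consequence, v.allele_freq)
```

Ordering the filters from cheapest and most selective to most expensive keeps each later step working on a shorter list, which is the point of a cascade.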
Import variants > Annotate > Filter > Classify and Interpret > Generate Report. Watch our analysis videos and see how quickly you can go through this workflow. VariantStudio is a powerful, secure tool to simplify genomic data interpretation, and accessing it on BaseSpace is just a click away. With the addition of VariantStudio to the BaseSpace Core Apps, BaseSpace users can now execute the entire sample-to-answer workflow, from generating sequence reads to reporting biologically significant results.
Two applications for Illumina’s synthetic long reads: TruSeq Long-Read Assembly and TruSeq Phasing Analysis
Using data generated by the TruSeq Synthetic Long-Read DNA Library Prep Kit (also released this week), the TruSeq Long-Read Assembly App executes the assembly of synthetic long reads, and the TruSeq Phasing Analysis App performs whole human genome phasing.
The TruSeq Long-Read Assembly App constructs synthetic long reads from shorter sequencing reads, providing FASTQ files for accurate genome assembly, genome finishing, de novo assembly and metagenomics analysis.
The TruSeq Phasing Analysis App performs whole human genome phasing, identifying haplotype information, co-inherited alleles and phasing de novo mutations. The application reports haplotype blocks across the genome and phasing confidence scores in a phased VCF file.
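To make the idea of haplotype blocks in a phased VCF concrete, here is a minimal Python sketch. It relies only on two conventions from the VCF specification: a `|` separator in the `GT` field marks a phased genotype, and the `PS` FORMAT field names the phase set a call belongs to. The records are invented, and the sketch assumes the app's output follows these conventions; it does not model the phasing confidence scores, since the tag used for them is not stated here.

```python
from collections import defaultdict

# Invented example records, not output from the TruSeq Phasing Analysis App.
VCF_TEXT = """\
##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1
chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:PS\t0|1:1000
chr1\t1500\t.\tC\tT\t50\tPASS\t.\tGT:PS\t1|0:1000
chr1\t2200\t.\tG\tA\t50\tPASS\t.\tGT\t0/1
chr1\t9000\t.\tT\tC\t50\tPASS\t.\tGT:PS\t0|1:9000
"""


def haplotype_blocks(vcf_text):
    """Group phased genotype calls by (chromosome, phase set)."""
    blocks = defaultdict(list)
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip header and blank lines
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])
        # Pair FORMAT keys with the first sample's values.
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gt = sample.get("GT", "")
        # "|" marks a phased genotype; "/" marks an unphased one.
        if "|" in gt and "PS" in sample:
            blocks[(chrom, sample["PS"])].append((pos, gt))
    return dict(blocks)


for (chrom, ps), calls in haplotype_blocks(VCF_TEXT).items():
    positions = [pos for pos, _ in calls]
    print(f"{chrom} phase set {ps}: {len(calls)} call(s), "
          f"{min(positions)}-{max(positions)}")
```

Within one phase set, the allele on the left of each `|` lies on the same haplotype, so the calls at positions 1000 and 1500 above describe two alleles whose phase relative to each other is known.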
Together with the TruSeq Synthetic Long-Read DNA Library Prep Kit and Illumina’s sequencing technology, these new apps provide a solution for long reads that spans library prep, sequencing, and informatics. See some sample data on the Illumina Blog, and learn more about synthetic long-read and phasing technology on the Illumina website.
We constantly strive to improve the experience for everyone who uses our tools, and today we are excited to announce a few updates to the BaseSpace Prep Tab. The BaseSpace Prep Tab makes it easier to prepare and plan a sequencing Run through a rich web user interface that communicates directly with the instrument to set it up in four easy steps: preparing Biological Samples, Libraries, and Pools, and planning a Run that can be discovered by the instrument. At the moment, the Prep Tab supports only NextSeq instruments, but support for MiSeq and HiSeq instruments is coming in the future. The new features, explained in more detail below, are:
- The ability for users to Import Custom Library Prep Kits
- The ability for users to Import prepped Libraries in one step
- The Prep Libraries section under the Libraries tab is now supported on small laptop screens
For those that would like more information about the Prep Tab, please view the original NextSeq and Prep Tab blog post to learn more.
Import a Custom Library Prep Kit
- When prepping Libraries, under the Library Prep Kit drop-down list, you can now choose to add a Custom Library Prep Kit.
- After selecting Custom Library Prep Kit, in addition to naming your kit, you will be asked to specify a few basic options for supported read types, indexing strategies, and default read cycles.
- Now, click on Choose .csv file to select a template file. You can customize your template file with the following information for the new kit:
- Custom adapter sequences
- Custom indexes (name and sequences)
- Custom default layout
Here’s a simple example template file:
- When complete, click Create New Kit to add this kit to the drop-down list that appears for your account.
- You can also view all of your Library Prep Kits under the My Account section of BaseSpace.
Import Prepped Libraries
- Users can now import prepped libraries in one step, instead of importing biological samples and prepping them in two separate steps.
- First, access the Import feature from the Libraries page within the Prep tab by clicking on Import:
- Click the Choose .csv File button.
- In your .csv file, specify plate information and a list of libraries to import. The following is a simple example file:
- Once a file is selected, click Open.
- The page will now populate with prepped libraries. Ensure that the information displayed is correct.
- When you’re done, you can save the plate for later use or proceed directly to pooling libraries!
We hope you enjoy these changes and look forward to more updates in BaseSpace in the future.
The BaseSpace App store continues to grow in both the breadth and depth of its applications, taking you from raw sequencing data through biological interpretation. Today we’re happy to introduce a new App from PathGenDX called PathSEQ Virome.
Whether you are looking for a contaminant in your biological sample, identifying an infection, or investigating new viruses, PathSEQ Virome can provide the answer. PathSEQ Virome is a BaseSpace app that automatically identifies and characterizes virus genomes from a sequencing run in a comprehensive and systematic way. After streaming from MiSeq, NextSeq, or HiSeq, sequencing data is filtered, matched against the PathSEQ virus database, and quality-controlled for false positives. The app can currently detect more than 50,000 clinically relevant virus genomes and outputs a PDF report from about 2 GB of sequence data within an hour. The report indicates the overall strength of each genome match, assigns a confidence score to novel viruses (if they are related to any genome in the database), and identifies the genomic region of the match. If multiple viruses are present in the same sample, they are combined into the same report, with the strongest matches first.
Getting started with PathSEQ Virome is simple. After running your sequence data on HiSeq or MiSeq, click on “Apps” in the header in BaseSpace and select the PathSEQ Virome app.
Alternatively, you can launch the app from a Project by clicking the “Launch App” button and selecting PathSEQ Virome from the pull-down menu.
Select your Project from the pull-down menu within BaseSpace, and PathSEQ will prompt you to select which samples in the project you want to analyze. By default, the results will be saved to a new project folder named “PathSEQ_virome_results”.
Upon completion, BaseSpace will automatically send you the following email, if you have opted in for these notifications:
To get to the analysis report, simply click on the link in this email. All reports generated by the app are stored in a Project folder named “PathSEQ_virome_results”. Clicking on that folder provides convenient access to all the reports generated by the app.
Check out this demo dataset containing example results from the PathSEQ Virome app; it will be shared with your account once you click the link. The PathSEQ Virome app is available as a free trial for Sample data smaller than 500 MB, and a paid version without that size limitation will be available in the near future.
Didn’t find the viruses you were expecting? Contact PathGENDx to find out about their wet lab viral enrichment process to increase sensitivity.
We invite you to try it out. Comments and feedback on PathSEQ Virome are welcome through the PathGenDX portal: http://enquiry.pathgendx.com/enquiry_form.php
With the latest update of BaseSpace, we have updated the MiSeq Reporter (MSR) Workflows to v2.4! This means all users streaming MiSeq data to BaseSpace with automatic analysis launch will now have the features added since MSR v2.2 was released in BaseSpace. MSR v2.4 in BaseSpace will also produce results consistent with the local MSR v2.4, released in March 2014.
For full details on the MSR v2.4 release in BaseSpace, please see the Customer Release Notes.
New platform level features
- gVCF file output for TruSeq Amplicon, Enrichment, and PCR Amplicon workflows. The gVCF files provide data at all genomic coordinates assayed, regardless of the variant status, and show the reference allele if no variant is found. For more information on gVCF files, please see the following link: https://sites.google.com/site/gvcftools/home/about-gvcf
- BAM and VCF/gVCF files are now stamped with a header containing algorithm versions and parameter settings to easily track your analysis details.
- Updated the Starling variant caller to v2.0.3 for use with TruSeq Amplicon, Resequencing, and PCR Amplicon workflows. The default variant caller for these workflows remains GATK v1.6.
- Enabled StitchedReads in the TruSeq Amplicon and GenerateFASTQ workflows. Read stitching allows construction of a single, longer read from two overlapping, shorter paired reads. For more information on Stitched Reads, please see the Customer Release Notes.
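The idea behind read stitching can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm MiSeq Reporter uses: it requires an exact overlap, ignores base qualities and mismatches, and the `min_overlap` parameter and the 40-base fragment are invented for the example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def stitch(read1, read2, min_overlap=10):
    """Merge a read pair into one longer read.

    read2 comes from the opposite strand, so it is reverse-complemented
    first. The pair is merged when the end of read1 exactly matches the
    start of that reverse complement. Returns None if no overlap of at
    least min_overlap bases is found.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    # Try the longest candidate overlap first, then shorter ones.
    for k in range(longest, min_overlap - 1, -1):
        if read1[-k:] == mate[:k]:
            return read1 + mate[k:]
    return None


# Invented 40-base fragment sequenced as two 25-base reads (10-base overlap).
fragment = "ACGTTGCAAGGCTTACCGATAGGCTTAACCGGATTCAGGA"
read1 = fragment[:25]
read2 = reverse_complement(fragment[-25:])

stitched = stitch(read1, read2)
print(stitched == fragment)
```

A production implementation would also tolerate mismatches in the overlap and reconcile the two base calls using their quality scores.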
Highlights for specific workflows
- The 16S Metagenomics workflow now allows species-level identification (previous versions stopped at genus level). To classify down to the species level, add the sample sheet setting described in the Customer Release Notes, or use Illumina Experiment Manager v1.7 to build your sample sheet. We also added a new, interactive HTML output report to dive deep into your 16S data. Lastly, the classification algorithm has been updated to provide faster results.
- The Enrichment workflow now allows users to specify padding values in the sample sheet. Please see the Customer Release Notes for additional details.
If you have any older runs that you want to push through these new workflows, please contact support through the “Contact Us” button on the site.
We are excited to announce the release of the first third-party native application, the SPAdes Genome Assembler 3.0. This app was developed by the Algorithmic Biology Lab at the St. Petersburg Academic University of the Russian Academy of Sciences using the BaseSpace Native App Engine.
SPAdes is the first open-source de novo assembly application available to the BaseSpace community and can be used to assemble small genomes. The output from the SPAdes app includes various assembly statistics at the contig and scaffold levels.
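The post does not list which statistics the app reports, so the following Python sketch is only an assumption about the kind of summary an assembly report typically contains: sequence count, total bases, longest sequence, and N50. The contig lengths are invented.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        return 0
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def summarize(lengths):
    """Collect a few basic statistics for contigs or scaffolds."""
    return {
        "count": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


# Invented contig lengths, used only to exercise the functions.
contig_lengths = [80, 70, 50, 40, 30, 20]
print(summarize(contig_lengths))
```

The same function applies unchanged to scaffold lengths, which is why assembly reports usually show the two sets of numbers side by side.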
Sample B. cereus data can be found and imported into your BaseSpace account from the following link:
The BaseSpace team is excited to announce the release of our most anticipated applications since the announcement of the Native App Engine last October.
- The TopHat Alignment App can be used to align RNA reads as well as detect gene fusions using the industry-standard method. Illumina’s Isaac method further enables the calling of SNVs and small indels.
- The Cufflinks Assembly & Differential Expression App enables gene expression profiling and detection of novel transcript isoforms.
These applications are part of the BaseSpace Core Apps and were both written with our Native App Engine, taking advantage of its parallelization features, which allow TopHat to process 96 samples in less than a couple of hours with fusion calling enabled. This parallelization is seamless and automatic, bringing the scalability of the cloud to your analysis, so that whether you have 1 sample or 100, they run in the same amount of time.
In the coming days, we will be posting a sample project with fully analyzed data so that you can quickly view the output from the RNA-Seq Core Apps. There is also example data comparing Universal Human Reference and Human Brain mRNA that you can import into your BaseSpace account from BaseSpace’s Public Data repository: