Support for Internet Explorer 9

Coinciding with the release of BaseSpace v3.16, we are officially discontinuing support for Internet Explorer 9 (IE9). After April 17, 2015, users will no longer be able to run BaseSpace on IE9.

With the IE browser having undergone two major upgrades (the latest version is IE11), only a small number of users still access BaseSpace via IE9. More importantly, with the rapid evolution of browser features, the technical burden of supporting old browsers slows down development of newer features.

We apologize for any inconvenience, but encourage you to either upgrade to the latest version of Internet Explorer (IE11) or choose another supported browser such as Chrome, Firefox, or Safari.

New and Updated BaseSpace Apps

It’s been an exciting last 30 days for App releases. At the end of February we released v1.1 of the TruSeq Long-Read Assembly and the TruSeq Phasing Analysis Apps.

 

TruSeq Long-Read Assembly v1.1

TruSeq Phasing Analysis App v1.1

These updates were mainly to improve the performance of the Apps and to fix a few minor bugs. Detailed information can be found in the customer release note.

We also released v2.1 of the Isaac Enrichment and BWA Enrichment Apps.

Isaac Enrichment v2.1

BWA Enrichment v2.1

These updates allow the Apps to take advantage of the multi-launch feature available in the BaseSpace platform. Users can now analyze up to 96 samples in parallel across multiple nodes, dramatically reducing the time to answer. A batch of 96 samples at 50x exome coverage can now be analyzed in as little as 2 hours. Detailed information can be found in the customer release note.

We also published a handful of new third-party Apps.

The GENIUS Metagenomics: Know Now App from CosmosID was released. The release of the App was discussed in GenomeWeb. The GENIUS® Metagenomics application uses CosmosID’s curated genome database and high-performance algorithms to provide rapid, accurate, and actionable bacterial identification at the species, subspecies, and/or strain level. You can explore the App at no cost with their limited-time free trial offer (through April 3, 2015).

GENIUS Metagenomics: Know Now

The MetaPhlAn App from the Huttenhower Lab at the Harvard School of Public Health was published. Like the GENIUS® Metagenomics application, the MetaPhlAn App adds to our growing suite of tools for microbiology data analysis. MetaPhlAn (Metagenomic Phylogenetic Analysis) is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn relies on unique clade-specific marker genes identified from reference genomes, allowing orders-of-magnitude speedups and unambiguous taxonomic assignments.

MetaPhlAn

The EDGC Annotator v1.0 App from EONE-DIAGNOMICS Genome Center was published. EDGC Annotator annotates human variants with information about cancer-related variants, genomic regions, allele frequencies, and clinical knowledge, drawing on ClinVar, OMIM, COSMIC, 1000 Genomes Project allele frequencies, dbSNP, and the VEP (Variant Effect Predictor) database.

EDGC Annotator v1.0

The RNA-Seq Translator v1.0 App from Yale University was published. The App complements our suite of Apps for the OneOmics project. It consumes the output of the Cufflinks Assembly and DE App and converts the differentially expressed genes into a protein reference with which to search spectra obtained from a mass-spectrometry experiment.

RNA-Seq Translator v1.0

Finally, we published an updated version, 3.5, of the SPAdes Genome Assembler App from the Algorithmic Biology Lab. SPAdes is a best-in-class de novo assembler for small genomes. This update includes improved support for Nextera® Mate Pair Libraries, which allows users to generate large, continuous, high-quality assemblies. Users can also mix multiple libraries in a single assembly.

SPAdes Genome Assembler 3.5

We are excited about these new and updated Apps and the continued maturity of the BaseSpace platform. We now have around 60 Apps on BaseSpace covering many of the NGS analysis needs.  We look forward to bringing more functionality in the days to come.

Introducing the New BaseSpace HelpCenter

The team is proud to announce the release of our new help site at help.basespace.illumina.com.  The new HelpCenter makes it easier to navigate and find the relevant information about BaseSpace. It also enables greater user participation (see below for details).

image

image

Site Contents

  • Overview section for new users
  • Articles explaining the usage of different features
  • Release notes
  • Videos
  • Latest news
  • Search
  • Links to other BaseSpace related content like our Developer’s Site

Feedback

This is just the beginning of our content here, and it will continue to be expanded and refined.  If you would like to suggest a topic to be covered or want us to clarify an existing article, all of the content is open source and hosted on GitHub. You can create issues on current or new content and submit pull requests as well.

image

You still have direct access to support via our “Contact Us” widget within BaseSpace, along with email and phone support through the normal Illumina support channels.

Thanks!

Native App Report Engine Improvements

We recently released improvements to our reporting engine for native apps that make some common tasks much easier.  For those of you who are new to the “Native App Engine”, it consists of three main components that allow a developer to package up any analytic workflow and publish it in the BaseSpace App store.

Those three components are:

  1. Form Builder – Custom input form designer
  2. Docker – Linux container technology that your app runs in
  3. Report Builder – HTML 5 template engine

It’s the same toolset that we use internally to publish apps and we are constantly looking to make improvements based on internal and external feedback. 

Report Generation


The reporting engine, which is essentially an HTML template engine, is the last step in building a native app.  When your application finishes processing, we build a meta model around all the data that was produced and allow you to bind that to an HTML template.  We leverage an open source technology called Liquid, a markup template engine used by many other companies with similar needs.  Along with the basic filters that are defined in Liquid, we have extended the syntax to cover BaseSpace-specific needs.

New Features

Find Filter

XPath operator on XML files

  • Takes in an XPath expression and returns the resultant XML text

{{ statsFile | find: "/Statistics/Overall/Stats[SampleID='10002 - R1']/NumberOfClustersPF" }}
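For readers less familiar with XPath predicates, the lookup above can be approximated outside BaseSpace. The following is a minimal Python sketch using only the standard library. The XML layout and values are invented for illustration (only the element names come from the expression above), and it is not how the report engine itself is implemented. Python's ElementTree accepts only paths relative to the root element, so the leading /Statistics step is written as "." here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stats file: element names follow the XPath above, values are made up.
STATS_XML = """
<Statistics>
  <Overall>
    <Stats>
      <SampleID>10001 - R1</SampleID>
      <NumberOfClustersPF>111</NumberOfClustersPF>
    </Stats>
    <Stats>
      <SampleID>10002 - R1</SampleID>
      <NumberOfClustersPF>222</NumberOfClustersPF>
    </Stats>
  </Overall>
</Statistics>
"""


def find_text(xml_text, relative_path):
    """Return the text of the first element matching the path, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(relative_path)
    return node.text if node is not None else None


# The predicate [SampleID='...'] keeps only the Stats block for that sample.
clusters_pf = find_text(
    STATS_XML, "./Overall/Stats[SampleID='10002 - R1']/NumberOfClustersPF"
)
print(clusters_pf)
```

The predicate in square brackets is what narrows the result to a single sample; without it, the first Stats element in the file would be returned.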

 

Stringify Improvements

Stringify is a custom filter that allows you to serialize the contents of a drop, CSV, or XML segment to JSON.

var globals = {}; 
globals.sample = {{ sample | stringify }};
globals.sample.chromosomes = {{ sample.chromosomes | stringify }};
globals.sample.statsByChromosome = {{ statsFile.parse.StatisticsResequencing.Samples.SampleStatistics | stringify }}; 

 

Custom Dictionary Filters

  • where.starts_with – returns the values where the dictionary key starts with the provided string
  • where.ends_with – returns the values where the dictionary key ends with the provided string
  • where.contains – returns the values where the dictionary key contains the provided string
  • first – returns the first value, errors if nothing is there
  • first_or_default – returns the first value or a default (usually null)

{% assign_object datafile1 = result.files.where.starts_with["datafile1"].first %}
{% assign_object anyXml = result.files.where.ends_with[".xml"].first_or_default %}
{% assign_object datafile3 = result.files.where.contains["file3"].first %}
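The semantics of these filters can be sketched in plain Python. This is only an illustration of the behavior described in the list above, under the assumption that the files collection behaves like a dictionary keyed by file name; the helper names and sample entries are hypothetical, not part of the BaseSpace API.

```python
def where(files, predicate):
    """Return the values whose key satisfies the predicate, in dictionary order."""
    return [value for key, value in files.items() if predicate(key)]


def first(values):
    """Return the first value; raise if there is none (mirrors 'first')."""
    if not values:
        raise LookupError("no matching entries")
    return values[0]


def first_or_default(values, default=None):
    """Return the first value, or the default when nothing matched."""
    return values[0] if values else default


# Hypothetical file listing keyed by file name.
files = {
    "datafile1.csv": "csv-1",
    "summary.xml": "xml-1",
    "myfile3.txt": "txt-3",
}

datafile1 = first(where(files, lambda key: key.startswith("datafile1")))
any_xml = first_or_default(where(files, lambda key: key.endswith(".xml")))
datafile3 = first(where(files, lambda key: "file3" in key))
missing = first_or_default(where(files, lambda key: key.endswith(".bam")))
```

The practical difference is in the last line: first_or_default yields None when no key matches, whereas first would raise, so the former is the safer choice for optional output files.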

 

Break and Continue Tags

Allows breaking and continuing in Liquid loops.

{% for file in files %}
	{% if file.href == null %}
		{% continue %}
	{% endif %}
	{% if file.href == 'http://special.com' %}
		{% assign specialHref = file.href %}
		{% break %}
	{% endif %}
{% endfor %}

 

Select Columns

Allows the selection of columns in a CSV file by column name or index.

{% assign grid2 = result.files[key].select['0,1,2'].take[1].parse %}
{% assign grid3 = result.files[key].select['LastName','City','Phone'].take[5].parse %}

 

Take

Ability to take a subset of rows from a CSV file.

{% assign grid2 = result.files[key].select['0,1,2'].take[1].parse %}
{% assign grid3 = result.files[key].select['LastName','City','Phone'].take[5].parse %}
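As a rough picture of what select and take do together, here is a minimal Python sketch. It assumes that the first CSV row is a header and that take counts data rows only; neither detail is confirmed by the text above, and the function name and sample data are hypothetical.

```python
import csv
import io

# Made-up sample data for illustration.
CSV_TEXT = """LastName,City,Phone,Notes
Smith,Austin,555-0100,a
Jones,Boston,555-0101,b
Lee,Chicago,555-0102,c
"""


def select_and_take(csv_text, columns, n_rows):
    """Keep the given columns (names or zero-based indexes) of the first n_rows data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # Resolve column names to positions; integer entries are used as-is.
    indexes = [c if isinstance(c, int) else header.index(c) for c in columns]
    picked = [[row[i] for i in indexes] for row in data[:n_rows]]
    return [[header[i] for i in indexes]] + picked


grid2 = select_and_take(CSV_TEXT, [0, 1, 2], 1)
grid3 = select_and_take(CSV_TEXT, ["LastName", "City", "Phone"], 5)
```

Asking for more rows than exist, as in the second call, simply returns all available rows in this sketch.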

 

ToArray

Ability to output CSV data rows to a 2-dimensional data array.

{{ result.files[key].parse.to_array | stringify }}
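To show the shape of the data a report script would receive, the sketch below mimics to_array followed by stringify in Python. It is an approximation under stated assumptions: the function names are stand-ins, the sample CSV is invented, and cell values are left as strings here, whereas the actual engine may convert numeric cells.

```python
import csv
import io
import json


def to_array(csv_text):
    """Parse CSV text into a list of rows, each row a list of cell strings."""
    return [row for row in csv.reader(io.StringIO(csv_text))]


def stringify(value):
    """Serialize a Python value to a JSON string."""
    return json.dumps(value)


grid_json = stringify(to_array("a,b\n1,2\n"))
print(grid_json)
```

The resulting JSON is an array of arrays, which is the form most JavaScript charting and grid libraries accept directly.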

 

Assign Improvements

Assign now allows assignment of any Liquid object, not just primitives.

{% assign myFiles = result.files %}
{% assign myCsv = myFiles.where.ends_with[".csv"].first.select["0"].take[2].parse.to_array | stringify %}
{{ myCsv }}

 

Summary

We hope developers will leverage these new features to build great interactive reports.  If you want to learn more about native apps, then read our intro post, check out our developer portal, or follow our native app tutorial.

Prokka small genome annotation is now in BaseSpace Apps.

We are pleased to announce the release of our latest BaseSpace Labs App, Prokka Genome Annotation.

 

image

Prokka wraps the tool of the same name developed by Dr. Torsten Seemann of the Victorian Bioinformatics Consortium. Prokka automates the process of building an annotation of a prokaryotic genome, first running a comprehensive set of feature prediction tools, then combining their output into standards-compliant files suitable for further analysis, visualization in genome browsers, or submission to archives.

As input, the Prokka App requires a FASTA file, which is assumed by default to contain assembled contigs from a bacterial or other prokaryotic genome, such as those produced by the SPAdes, Velvet de novo Assembly, or DNAStar Assemble bacteria Apps. Shotgun metagenomic data can also be annotated by making the appropriate selection on the input form. An example of the App’s output can be found here.

Citation: Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014 Jul 15;30(14):2068-9. PMID:24642063

Send data from BaseSpace to NextBio Research

With the addition of NextBio products to our informatics offerings, Illumina adds one of the richest compendia of curated genomic data in existence today. As a starting point towards integrating our BaseSpace and NextBio platforms, we are proud to announce the release of the NextBio Transporter App as our latest BaseSpace Labs release. The Transporter sends analysis results from BaseSpace into NextBio Research and requires that users have an existing account with NextBio.

NextBio Transporter

Similar to the NextBio Annotates RNA-Seq App, the NextBio Transporter uses an AppResult as input to the app, and currently supports outputs from the Cufflinks or RNA Express Core Apps from Illumina. You must also specify your account and domain information in NextBio Research, and the app takes care of everything else.

As output, users are provided a link to the transported data in NextBio Research, and a QuickView is also generated which displays the information NextBio has found relating to the input data.

Transporter Output

Within NextBio Research, users can then explore connections with curated content. For example, by clicking on “Curated Studies”, we can pull up published studies that have produced results that are highly correlated with our transported dataset. NextBio Research offers an incredibly rich platform for biological information, and we are excited to now provide the ability for BaseSpace users to connect their sequencing data to the biological insights offered by NextBio.

Variant calling assessment using Platinum Genomes, NIST Genome in a Bottle, and VCAT 2.0

With the rapid improvements in sequencing throughput, cost, and ease of use, it’s becoming routine to generate lots of variant calls in the form of VCF files. But how do you know if your new variant calls are accurate? How can a non-bioinformatician compare variant calls from different sequencing platforms, reagent kits, biological samples, or software pipelines? Illumina is now offering a carefully designed and highly curated data set and a corresponding BaseSpace Labs App to address these types of comparison questions.

The Platinum Genomes project was started in 2011 with the goal of creating a high confidence, “platinum” quality reference variant call set. This was accomplished by sequencing a large family to high depth using a PCR-Free sample prep to maximize variant calling sensitivity. A large set of candidate variants was obtained from multiple methods and technologies. Candidates that were pedigree consistent were included in the reference call set. Based on this approach, Illumina has derived a set of high-confidence, pedigree-validated reference variant calls for Coriell samples NA12877 and NA12878.

The full set of Platinum Genomes public data and documentation is freely available at http://www.illumina.com/platinumgenomes/. The BaseSpace Platinum Genomes Project also has copies of the platinum VCF files.

Please cite the Platinum Genomes website and Illumina, Inc. in publications and other public usage of the Platinum Genomes data.

In addition, Illumina has upgraded the Variant Calling Assessment Tool (VCAT 2.0) BaseSpace app. The app calculates SNV and indel statistics and optionally determines the overlap between the input variant call sets. Additionally, the quality of SNV and indel calls can be assessed against Platinum Genomes and/or NIST Genome in a Bottle (GIAB) reference variant calls. No existing tool currently offers a simple user interface for using both of these resources. The accuracy and comparison logic in VCAT is primarily based on vcftools, a commonly used open source toolkit for analyzing variant calls. More insight into how VCAT works is available by browsing the VCAT log file.

The Platinum Genomes project is led by Epameinondas Fritzilas and the VCAT project is led by Robert Schmieder, while many other team members have contributed. Please note that while both Platinum Genomes and VCAT are freely available, Illumina does not offer technical support for either of these resources.

There are many interesting ways to use these powerful new tools together. Here’s an example:

Case study on exome sequencing: How much depth is enough?

Using the “Combine Samples” feature in BaseSpace, Nextera Rapid Capture Exome samples of approximately 50x, 100x, 200x, and 400x were created from replicates of Coriell sample NA12878. The source data is here. A BaseSpace Project containing the resulting VCF files and the VCAT 2.0 results is here. The Platinum Genomes v7 recall numbers below suggest that 50x exome depth may only find 80% of the SNVs and 70% of the indels, while exome depths greater than 200x enable finding over 95% of SNVs and over 88% of indels.

image

image

image
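For readers who want the recall figures above made concrete: recall is the fraction of reference (truth) variants that also appear in the call set. The Python sketch below illustrates that definition only. VCAT's own comparison logic is based on vcftools and may treat variant representation differently; the variant keys and values here are made up.

```python
def recall(called, truth):
    """Fraction of truth variants that are present in the call set."""
    if not truth:
        raise ValueError("truth set is empty")
    return len(called & truth) / len(truth)


# Variants keyed as (chromosome, position, ref, alt); all values are invented.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr2", 75, "T", "C"),
    ("chr3", 10, "A", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr2", 75, "T", "C"),
    ("chr4", 999, "G", "T"),  # a call that is not in the truth set
}

snv_recall = recall(called, truth)
```

In this toy example four of the five truth variants are recovered. The extra call on chr4 does not affect recall; it would count against precision instead.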

VCAT 2.0 also enables the analysis of samples other than NA12878 via pairwise intersect comparisons. The Venn diagrams and corresponding tables shown below are from a VCAT report from the same example BaseSpace Project. When using this feature, VCAT also creates new VCF files which represent the unique SNV and indel calls, as well as VCF files for the common calls.

image

 

image
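The three regions of each Venn diagram correspond to simple set operations on the two call sets. The sketch below shows that partition in Python, assuming variants can be compared by an exact (chromosome, position, ref, alt) key. That is a simplification and not necessarily how VCAT or vcftools match records, and the sample calls are invented.

```python
def pairwise_intersect(calls_a, calls_b):
    """Split two call sets into calls unique to each and calls common to both."""
    return {
        "unique_a": calls_a - calls_b,
        "common": calls_a & calls_b,
        "unique_b": calls_b - calls_a,
    }


# Invented calls keyed as (chromosome, position, ref, alt).
deep_exome = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
shallow_exome = {("chr1", 100, "A", "G"), ("chr5", 7, "T", "A")}

parts = pairwise_intersect(deep_exome, shallow_exome)
```

Each of the three resulting sets corresponds to one of the VCF files described above: the calls unique to each input and the calls common to both.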

The Unique VCF files are also indexed for browsing within the BaseSpace IGV App.  Below is a screenshot which shows two SNVs that are found in the 105x exome, but are missed in the 53x exome due to low coverage depth.

image

That’s it for now. In an upcoming blog post, we’ll look at Platinum Genomes and NIST GIAB in more detail including some comparisons.