A guest blog, written by GoSeqIt
In an increasingly globalized world, bacteria can spread rapidly and easily. Furthermore, they often contain genes that make them resistant to antibiotics or confer high virulence. Sequencing the entire genome of bacteria enables a thorough characterization and thus makes it possible for researchers to monitor the spread of particular strains of bacteria or sets of genes.
In collaboration with the Illumina BaseSpace Sequence Hub development team, GoSeqIt has published two apps for characterization of bacterial single isolates. Both apps, the Bacterial Analysis Pipeline app and the E. coli Serotyping app, are now available to BaseSpace Sequence Hub users.
The input for both apps is a bacterial complete or draft genome in FASTA format (only files with the extension .fa or .fasta are accepted).
Bacterial Analysis Pipeline App
The Bacterial Analysis Pipeline app first predicts the species of the bacterial draft genome based on the number of k-mers (oligonucleotides of length k) co-occurring between the input genome and bacterial genomes in a reference database (1). Next, acquired antimicrobial resistance genes are identified using a BLAST-based approach in which the nucleotide sequence of the input genome is compared to the genes in the ResFinder database (2). Depending on the identified species, multilocus sequence typing (MLST) is performed, also using a BLAST-based approach (3). Currently, 125 MLST schemes are available.
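To make the species-prediction step more concrete, here is a minimal Python sketch of k-mer co-occurrence scoring. It is an illustration only and not the app's implementation: the function names, the toy sequences, and the choice of k = 4 are all assumptions made for this example, and real tools work with much longer k-mers and a large reference database.

```python
def kmers(sequence, k):
    """Return the set of all k-mers (substrings of length k) in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def predict_species(query, references, k=4):
    """Rank reference genomes by the number of k-mers shared with the query.

    `references` maps a species name to a genome sequence (hypothetical toy
    data in this sketch). Returns (best_species, scores).
    """
    query_kmers = kmers(query, k)
    scores = {
        name: len(query_kmers & kmers(genome, k))
        for name, genome in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Made-up sequences, purely for illustration.
references = {
    "species_A": "ATGGCGTACGTTAGCCGATAGGCTA",
    "species_B": "TTTTGGGGCCCCAAAATTTTGGGGC",
}
query = "GCGTACGTTAGCCGAT"

best, scores = predict_species(query, references)
print(best, scores)
```

The reference sharing the most k-mers with the input is reported as the predicted species. Using sets counts each distinct k-mer once, which is one of several reasonable scoring choices.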
If the input genome is recognized as belonging to Enterobacteriaceae or the Gram-positive bacteria (Enterococcus, Streptococcus, or Staphylococcus), BLAST is used to search for plasmid replicons using the PlasmidFinder database (4). Identified plasmids of the IncF, IncHI1, IncHI2, IncI1, IncN, or IncA/C type are further subtyped by plasmid MLST (4). Finally, identified Escherichia coli, Enterococcus sp., Listeria sp., and Staphylococcus aureus are compared to the VirulenceFinder database containing known virulence genes (5). For more information, refer to the article titled “A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance” (6). Figure 1 illustrates the output for species prediction and MLST, while Figure 2 illustrates the output for the prediction of acquired antimicrobial resistance genes.
Figure 1: Example of output from the Bacterial Analysis Pipeline app for species prediction and MLST of the input genome.
Figure 2: Example of output from the Bacterial Analysis Pipeline app for acquired antimicrobial resistance genes in the input genome.
E. coli Serotyping App
The E. coli Serotyping app uses a BLAST-based approach to predict the serotype of E. coli isolates by comparing the input genome with a database of specific O-antigen processing system genes for O typing and flagellin genes for H typing (7). The app outputs the predicted serotype along with the identified O-antigen genes (wzx, wzy, wzm, and wzt) and flagellin genes (fliC, flkA, fllA, flmA, and flnA).
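As a rough illustration of how gene hits could be combined into a serotype call, the Python sketch below picks the best-matching O-antigen gene and the best-matching flagellin gene and joins their types. The hit format (dictionaries with `gene`, `type`, and `identity` keys), the function names, and the example values are all hypothetical; the app's actual thresholds and tie-breaking rules are not reproduced here.

```python
# Gene families named in the text above.
O_GENES = {"wzx", "wzy", "wzm", "wzt"}
H_GENES = {"fliC", "flkA", "fllA", "flmA", "flnA"}


def best_type(hits, genes):
    """Return the antigen type of the highest-identity hit among `genes`, or None."""
    candidates = [hit for hit in hits if hit["gene"] in genes]
    if not candidates:
        return None
    return max(candidates, key=lambda hit: hit["identity"])["type"]


def predict_serotype(hits):
    """Combine the best O-type and H-type hits into an 'O:H' serotype string."""
    o_type = best_type(hits, O_GENES) or "O?"  # "O?" marks an untypeable O antigen
    h_type = best_type(hits, H_GENES) or "H?"  # "H?" marks an untypeable H antigen
    return f"{o_type}:{h_type}"


# Hypothetical BLAST-style hits (values invented for the example).
hits = [
    {"gene": "wzx", "type": "O157", "identity": 99.8},
    {"gene": "wzy", "type": "O157", "identity": 100.0},
    {"gene": "fliC", "type": "H7", "identity": 99.9},
]
print(predict_serotype(hits))
```

Reporting "O?" or "H?" when no gene is found is a choice made for this sketch, so that a partial result is still visible.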
Figure 3: Example of output from the E. coli Serotyping app. So far, only E. coli isolates can be serotyped in silico in this way.
Using the New Apps
The price for using the Bacterial Analysis Pipeline app is 5 iCredits per uploaded file plus the cost of computing. The E. coli Serotyping app costs 1 iCredit per uploaded file plus the cost of computing.
Both apps use methods that have been thoroughly described and published in renowned scientific journals.
1) Larsen MV, Cosentino S, Lukjancenko O, Saputra D, Rasmussen S, Hasman H, Sicheritz-Pontén T, Aarestrup FM, Ussery DW, Lund O. Benchmarking of methods for genomic taxonomy. J Clin Microbiol. 2014 May;52(5):1529-39. PMID: 24574292.
2) Zankari E, Hasman H, Cosentino S, Vestergaard M, Rasmussen S, Lund O, Aarestrup FM, Larsen MV. Identification of acquired antimicrobial resistance genes. J Antimicrob Chemother. 2012 Nov;67(11):2640-4. PMID: 22782487.
3) Larsen MV, Cosentino S, Rasmussen S, Friis C, Hasman H, Marvig RL, Jelsbak L, Sicheritz-Pontén T, Ussery DW, Aarestrup FM, Lund O. Multilocus sequence typing of total-genome-sequenced bacteria. J Clin Microbiol. 2012 Apr;50(4):1355-61. PMID: 22238442.
4) Carattoli A, Zankari E, García-Fernández A, Voldby Larsen M, Lund O, Villa L, Møller Aarestrup F, Hasman H. In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing. Antimicrob Agents Chemother. 2014 Jul;58(7):3895-903. PMID: 24777092.
5) Joensen KG, Scheutz F, Lund O, Hasman H, Kaas RS, Nielsen EM, Aarestrup FM. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli. J Clin Microbiol. 2014 May;52(5):1501-10. PMID: 24574290.
6) Thomsen MC, Ahrenfeldt J, Cisneros JL, Jurtz V, Larsen MV, Hasman H, Aarestrup FM, Lund O. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance. PLoS One. 2016 Jun 21;11(6):e0157718. PMID: 27327771.
7) Joensen KG, Scheutz F, Lund O, Hasman H, Kaas RS, Nielsen EM, Aarestrup FM. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli. J Clin Microbiol. 2014 May;52(5):1501-10. PMID: 24574290.
For Research Use Only. Not for use in diagnostic procedures.
We want to invite all of you to the BaseSpace Developer Conference in San Francisco! We’ve held BaseSpace Developer Conferences around the world this year, in Heidelberg, Singapore, and Bangalore, and most recently at the University of Tokyo in Japan!
First of all, we would like to thank all of our developers and speakers; you made this possible. We hope it was a great learning experience, and we look forward to the apps we can bring to BaseSpace. Also, a big shout-out to the University of Tokyo for hosting the event and to our Illumina team in Japan.
The events showcase the new Native App Engine within BaseSpace with which developers can easily adapt their command-line pipelines into the BaseSpace cloud infrastructure or an infrastructure of their choice.
During the event, developers are taken through a step-by-step walkthrough in which they build two separate BaseSpace applications! For anyone who is interested in learning more about BaseSpace App development, there is a lot of documentation available on the BaseSpace Developer Portal for both Native and Web applications.
We also spend time interacting with developers and users directly to brainstorm ideas and answer any questions they may have.
We are hosting another BaseSpace Developer Conference in San Francisco on December 8th. If you are interested in attending, you can sign up here.
To get an idea of what’s in store for you when you attend one of our developer conferences, check us out on Twitter at #basedev2014.
For any further questions about BaseSpace App development, please view or post on the developer forum or contact us through BaseSpace support.
Velvet de novo Assembly and FastQC
Both applications are currently available for all users and were built using the BaseSpace Native App Engine by our internal R&D groups. These two applications are also the first of many BaseSpace Labs Apps to come; the concept behind BaseSpace Labs Apps is explained in more detail below.
BaseSpace Labs Apps are Illumina’s internally developed applications that extend the functionality within BaseSpace. Some BaseSpace Labs applications will be experimental or research focused, while others will be used as a step in a greater workflow. The Apps are reviewed regularly by our team and put through the same review process as third-party apps.
BaseSpace Labs Apps are developed using an accelerated development process in order to make them available to BaseSpace users faster than the BaseSpace Core Apps. It is important to note that, unlike BaseSpace Core Apps, BaseSpace Labs Apps are not officially supported by Illumina Customer Service. Support for BaseSpace Labs applications is provided at the developer’s discretion and the apps are provided as-is without any warranty of any kind.
The FastQC app provides a quality assessment of the sequence data generated using Illumina sequencers. FastQC for BaseSpace is based on the FastQC software developed by the Bioinformatics Group at the Babraham Institute. It provides a modular set of analyses that can be used to quickly assess whether there are any problems with the sequencing data before doing any additional analysis.
The above figure shows an example output from the FastQC app depicting the quality score across all bases at a given position in the reads. For an example of additional output generated by FastQC, please view this FastQC demo project.
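For readers curious about what a per-position quality summary involves, the Python sketch below computes the mean Phred score at each read position from FASTQ quality strings. It is a simplified illustration, not FastQC's code: the function name is hypothetical, Phred+33 encoding is assumed, and FastQC itself reports a fuller distribution per position rather than only the mean.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    `quality_strings` are the quality lines of FASTQ records; `offset=33`
    assumes Phred+33 encoding. Reads may have different lengths.
    """
    totals = []  # sum of scores seen at each position
    counts = []  # number of reads covering each position
    for quality in quality_strings:
        for position, symbol in enumerate(quality):
            if position == len(totals):
                totals.append(0)
                counts.append(0)
            totals[position] += ord(symbol) - offset
            counts[position] += 1
    return [total / count for total, count in zip(totals, counts)]


# Two made-up reads: 'I' encodes Q40 and '5' encodes Q20 in Phred+33.
means = mean_quality_per_position(["IIII", "II5"])
print(means)
```

A drop in these values toward the end of the reads is the kind of pattern the per-position quality plot makes visible.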
The Velvet de novo Assembly app is a de novo assembly pipeline for bacterial samples using the Velvet assembler. One of the key features of this app is an adapter trimming protocol that has been optimized for the Nextera Mate-Pair library prep kit. An application note describing the de novo assembly of 9 different bacteria using the Velvet de novo Assembly app can be found here. In many cases, a single contig representing the entire bacterial genome can be assembled. The figure below is an example of the output generated by the Velvet de novo Assembly app.
Example output generated by the Velvet de novo Assembly app can be found here. We hope you enjoy the FastQC and Velvet de novo Assembly apps. For any questions, feedback, or feature requests for these applications, please send an email to email@example.com and include the name of the application. Thank you!
DeepChek®-HIV – App for genotyping by NGS and inferred drug resistance testing – for research use only
HIV genotyping and inferred drug resistance testing has become an integral part of the clinical management of patients infected with HIV. Detecting minority populations of resistant viruses is now routinely done. Next-generation sequencing (NGS) technology is replacing Sanger sequencing methodology, and end-to-end solutions combining sensitive genomic tests with advanced data management software platforms are in high demand.
DeepChek®-HIV is easy-to-use downstream analysis software for NGS data management, interpretation, and reporting, for Research Use Only. DeepChek is a reliable software and database solution capable of handling the complexity of NGS data for all the key genomic regions involved in HIV drug resistance (reverse transcriptase, protease, integrase, GP41, and GP120/V3). The database is regularly updated with the most recent drug resistance information and provides an efficient downstream analysis platform for clinical laboratories involved in routine HIV-1 genotyping and drug resistance testing.
Registration is now open for the BaseSpace Developer’s Meeting at the European Molecular Biology Laboratory (EMBL) Heidelberg, Germany on May 7, 2014. This free, one-day forum is a great opportunity for both experienced and novice developers to network, exchange ideas, and learn more about the world’s most widely used cloud-based bioinformatics platform for next-generation sequencing. Participants will use the BaseSpace Native App Engine to launch their own bioinformatics apps in BaseSpace.
Why develop for BaseSpace? Because 90% of the world’s next-gen sequencing data is produced on Illumina instruments, and your novel algorithms, open-source tools, and applications for BaseSpace users can directly impact the growth of genomic research. In short, you can change the way the world analyzes genomic data.
Welcome to EMBL & Illumina’s Co-Hosting of 2014 BaseSpace WWDC
Jonathon Blake, Ph.D., Bioinformatics, EMBL
Raymond Tecotzky, Market Manager, BaseSpace, Illumina, Inc.
Keynote: BaseSpace and The Next Frontier for Genomics Storage, Sharing, and Analysis
Elliott Margulies, Ph.D., Product Owner, BaseSpace, Illumina, Inc.
Biomax PEDANT – Pathway Analysis for NGS Data
Dimitrij Frishman, Ph.D., Professor of Bioinformatics, Technical University in Munich, Germany
New Frontiers of Genome Assembly with SPAdes 3.0 on Illumina BaseSpace Platform
Anton Korobeynikov, Ph.D., Associate Professor, Saint Petersburg State University, St. Petersburg, Russian Federation
ABL (Advanced Biological Laboratories/Therapy Edge) DeepChek® Hep B & C Detection App
Dr. Chalom Sayada, CEO, Advanced Biological Laboratories SA
Hands-On Session: Build Your Own BaseSpace App
Greg Roberts, Senior Staff Software Engineer, Illumina, Inc.
Mayank Tyagi, Senior Applications Support Engineer, Illumina, Inc.
Hands-On “Hackathon” Build Your Own BaseSpace App (Choose from Open-Source, Command-Line, or Bring Your Own Code)
Ilya Chorny, Sequencing Application Marketing, Illumina, Inc.
BaseSpace Onsite Introduction – Storage, Sharing, & Analysis in a Box
John Duddy, Senior Staff Software Engineer, Illumina, Inc.
Will be followed by a Networking Reception
Date: Wednesday, 7 May 2014, 9:00 AM-6:30 PM
69117 Heidelberg, Germany
Flex Lab A+B
Got a killer NGS app? Enter your original idea and win an iPad mini at the conference!
We are humbled and excited by the overwhelming attention BaseSpace, BaseSpace Onsite, and the BaseSpace Core Apps have received at AGBT. Things were set in motion on Wednesday by a review of the BaseSpace RNA-Seq Apps (TopHat and Cufflinks) by James Hadfield from Cancer Research UK as part of his presentation at the Illumina User Meeting. Then on Thursday, during the standing-room-only Illumina Workshop, our own Gary Schroth gave a “User’s Perspective” talk on RNA-Seq right after Sheila Fisher’s hot-off-the-press presentation of HiSeq X10 and NextSeq datasets from the Broad Institute. Gary took a deep dive into the TopHat and Cufflinks Apps on BaseSpace and emphasized the high usability and the end-to-end workflow now enabled by BaseSpace. The workflow starts with the creation of samples, libraries, and runs for the NextSeq on the Prep tab, followed by real-time monitoring of sequencing metrics, and finally the streamlined analysis of data, resulting in interactive graphical plots of expression profiles. Gary also mentioned that he is not a bioinformatician, but can now perform RNA-Seq analysis all by himself.
On Thursday evening, the first-ever AGBT “Electronic Poster Session” was held, where about 30 software vendors showcased their solutions in a large, well-catered room (chocolate fountain, crabs, sushi, and all). The two of us who were demoing BaseSpace were kept busy throughout the two hours, and we definitely got the sense that the value proposition of the BaseSpace platform and Apps resonated with all users who stopped by our booth.
Finally, we would like to respond to some questions that have come up in the Twitterverse based on the James Hadfield and Gary Schroth presentations:
1. To cite BaseSpace in journal manuscripts: we recommend citing the specific URL as appropriate
- To cite BaseSpace in general: basespace.illumina.com
- To cite a particular App: https://basespace.illumina.com/apps/303303/BWA-Enrichment etc. (Each App has a dedicated “App description page” that is accessible from the App tab on the top menu bar)
- To cite algorithms/methods used in the Apps: All the methods used within apps are referenced within the corresponding App description page
- To cite a particular Project, with embedded datasets and App analysis sessions, again use the particular URL associated with the dataset. Here is an example of a publicly shared Nextera Rapid Capture Exome project, with 12 exome samples run on the HiSeq 2500 (you will notice the associated “Analyses” and “Samples” along the tab on the left): https://basespace.illumina.com/projects/3289289/
2. Timing of availability of the BaseSpace Core Apps: The BWA/GATK whole-genome (WGS) App, the BWA/GATK Exome (Enrichment) App, as well as the Strelka-based Tumor-Normal App are available on BaseSpace today. The Isaac-based WGS and exome Apps, along with the RNA-Seq Apps will be available by the end of February.
As we near the end of the year, we’d like to share a fun competition put together by James Hadfield, director of the genomics core facility at the University of Cambridge, Cancer Research UK Cambridge Institute, and sponsored by Illumina and BaseSpace. James is a blogger at CoreGenomics, and is running a holiday contest that will test your knowledge of library prep and sequencing applications. How much do you know about the cost of library prep and sequencing for various applications, including genome sequencing, RNA-Seq, Exome Seq and others? Take the CoreGenomics logic challenge and find out!
The winner of this festive competition will receive a MiSeq 600-cycle run to be performed in James’ lab, and 250 iCredits towards BaseSpace analysis. You’re in charge of library prep and, of course, the sample for sequencing, although James may lean towards seasonally appropriate species such as the cranberry or fir tree genome (our pick: a follow-up study on the reindeer rumen microbiome).
So check out the CoreGenomics post here, with a link to enter the contest, as well as all the rules and regs. We’re happy to help support this fun exercise by covering the kit and the iCredits.
Happy Holidays from Illumina and the BaseSpace team!