We are excited to announce the availability of a data upload feature for FASTQ files that were previously generated on Illumina sequencing instruments. This simple-to-use feature is accessible from any project to which the user has write access by first clicking on the project and then selecting the Import tab shown below.
The user will then be prompted to select their import type. The user can upload a single sample by clicking on “Sample” as shown below.
The user can then either “Drag and drop” one or more files into the webpage or click on “select files” and choose the files to upload from a file browser. Note that the FASTQ files need to adhere to Illumina standards, as specified below. Data for a single sample can span multiple files. The total number of files per sample is limited to 16, and their combined size is limited to 25 GB. Uploading a 25 GB sample takes roughly 1-2 hours on a relatively fast internet connection.
The user will then see a progress bar as the files are uploaded. Once the progress bar completes, the user can add additional files. The user can also set the sample name and associate a genome with the sample in the upper left-hand corner of the screen.
Once all of the files have finished uploading, the user will need to click on the “Complete Import” button (shown above) to complete the session.
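As a rough sanity check on the 1-2 hour estimate for a 25 GB sample, the sustained throughput it implies can be worked out directly. This is a minimal sketch that assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead; the function name is made up for illustration.

```python
def required_mbit_per_s(size_gb: float, hours: float) -> float:
    """Average throughput in Mbit/s needed to move size_gb in the given time.

    Assumes decimal units: 1 GB = 10**9 bytes and 1 Mbit = 10**6 bits.
    """
    bits = size_gb * 1e9 * 8
    seconds = hours * 3600
    return bits / seconds / 1e6


if __name__ == "__main__":
    for hours in (1, 2):
        rate = required_mbit_per_s(25, hours)
        print(f"25 GB in {hours} h needs about {rate:.1f} Mbit/s sustained")
```

Under these assumptions, the estimate corresponds to a sustained upload rate of roughly 28 to 56 Mbit/s.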
FASTQ file standards
- The uploader will only support gzipped FASTQ files generated on Illumina instruments
- The name of the FASTQ files must conform to the following convention:
- SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz (e.g. SampleName_S1_L001_R1_001.fastq.gz / SampleName_S1_L001_R2_001.fastq.gz)
- The read descriptor in the FASTQ files must conform to the following convention:
- @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber:
- Read 1 descriptor would look like this:
- Read 2 would have a 2 in the ReadNum field, like this:
- The number of base calls for each read must equal the number of quality scores
- The number of entries for Read 1 must equal the number of entries for Read 2
- The uploader determines whether files are paired-end by matching file names whose only difference is the Read field (R1 versus R2)
- For paired-end reads, the descriptor must match for every entry for both reads 1 and 2
- Each read must have passed filter
- Only one sample can be uploaded at a time
- A maximum of 16 files can be uploaded in a session
- The size of the uploaded files cannot exceed 25 GB
- A detailed description of how to use the uploader can be found in the BaseSpace user guide
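The rules above can be checked locally before starting an upload. The following is a minimal sketch, not the uploader's actual validation logic. The regular expressions are derived from the naming and descriptor conventions listed above. It assumes decimal gigabytes and assumes that a FilterFlag of N marks a read that passed filter, as in standard Illumina read headers. All function names and the example read header are hypothetical.

```python
import re
from collections import defaultdict

MAX_FILES = 16
MAX_TOTAL_BYTES = 25 * 10**9  # assumption: 25 GB in decimal units

# SampleName_S<number>_L<lane>_R<read>_<index>.fastq.gz
NAME_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<index>\d{3})\.fastq\.gz$"
)

# @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber
HEADER_RE = re.compile(
    r"^@[^:\s]+:[^:\s]+:[^:\s]+:\d+:\d+:\d+:\d+ "
    r"(?P<read>[12]):(?P<filter>[YN]):0:(?P<sample>[^:\s]*):?$"
)


def parse_name(filename):
    """Return the file name fields as a dict, or None if the name does not match."""
    match = NAME_RE.match(filename)
    return match.groupdict() if match else None


def check_session(files):
    """Check a mapping of file name -> size in bytes; return a list of problems."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files exceeds the limit of {MAX_FILES}")
    if sum(files.values()) > MAX_TOTAL_BYTES:
        problems.append("combined size exceeds 25 GB")
    samples = set()
    reads_by_key = defaultdict(set)
    for name in files:
        fields = parse_name(name)
        if fields is None:
            problems.append(f"bad file name: {name}")
            continue
        samples.add(fields["sample"])
        # Files that differ only in the Read field belong to the same pair.
        key = (fields["sample"], fields["number"], fields["lane"], fields["index"])
        reads_by_key[key].add(fields["read"])
    if len(samples) > 1:
        problems.append("more than one sample in the session")
    for key, reads in reads_by_key.items():
        if reads == {"2"}:
            problems.append(f"R2 file without matching R1: {'_'.join(key)}")
    return problems


def check_record(header, bases, quals):
    """Check one FASTQ entry; return a problem string, or None if it looks valid."""
    match = HEADER_RE.match(header)
    if match is None:
        return "bad read descriptor"
    if match["filter"] != "N":  # assumption: N means the read passed filter
        return "read did not pass filter"
    if len(bases) != len(quals):
        return "base calls and quality scores differ in length"
    return None


if __name__ == "__main__":
    session = {
        "SampleName_S1_L001_R1_001.fastq.gz": 2 * 10**9,
        "SampleName_S1_L001_R2_001.fastq.gz": 2 * 10**9,
    }
    print(check_session(session))
    # Hypothetical header; the instrument, run, and flow cell values are made up.
    print(check_record("@M00001:12:000000000-A1B2C:1:1101:15000:1350 1:N:0:1", "ACGT", "IIII"))
```

The sketch does not compare Read 1 and Read 2 entry counts or descriptors across files; that would require reading both gzipped files in step.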
DeepChek®-HIV – App for genotyping by NGS and inferred drug resistance testing – for research use only
HIV genotyping and inferred drug resistance testing has become an integral part of the clinical management of patients infected with HIV. Detecting minority populations of resistant viruses is now routinely done. Next-generation sequencing (NGS) technology is replacing Sanger sequencing methodology, and end-to-end solutions combining sensitive genomic tests with advanced data management software platforms are in high demand.
DeepChek®-HIV is easy-to-use downstream analysis software for NGS data management, interpretation, and reporting for Research Use Only. DeepChek is a reliable software and database solution that is capable of handling the complexity of NGS data for all the key genomic regions involved in HIV drug resistance (reverse transcriptase, protease, integrase, GP41, and GP120/V3). The database is regularly updated with the most recent drug resistance information and provides an efficient downstream analysis platform for clinical laboratories involved in routine HIV-1 genotyping and drug resistance testing.
Link to App in BaseSpace:
Link to example dataset with example input data and output results:
Sequence and stream data to BaseSpace: done. Run quality check: done. Alignment and variant calling: done. Congratulations, you now have a set of variants! But what good is a set of variants if you can’t describe what they mean, how they might explain the phenotype of the specimen, and which ones aren’t worth worrying about? There is good news: now that Illumina VariantStudio is available on BaseSpace, deciphering the biological meaning of genomic variants is no longer a huge challenge. If you’re not familiar with it, VariantStudio is an easy-to-use tool for variant annotation, filtering, and reporting.
- Step 1: Import data into VariantStudio. Click the import button and you will be able to browse and select VCF and gVCF files stored in BaseSpace to import into VariantStudio. You can import DNA variant data from targeted, exome, or whole-genome sequencing, but VariantStudio only supports human SNP and indel analysis at this time.
- Step 2: Annotate variants using the Illumina Annotation Service, which aggregates annotations from a broad range of public sources including ClinVar, COSMIC, OMIM, and 1000 Genomes Project. Variants will be richly annotated with biological information including transcript consequence, functional impact, known disease association, population allele frequency, and more.
- Step 3: Apply a cascade of filtering options to quickly create a short list of candidate variants that are likely associated with the disease or phenotype. In addition to single sample analysis, you can perform tumor/normal comparisons to identify somatic mutations or family-based analyses to investigate variants underlying rare disease.
- Step 4: Use the provided annotations to classify variants based on their presumed biological impact. A common scheme is pathogenic, likely pathogenic, benign, or unknown significance. Or you can use your own, customizable classification scheme.
- Step 5: Generate a customizable report that summarizes your important variants, along with any additional metadata.
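The idea of a filter cascade in Step 3 can be illustrated outside the tool. This is a generic sketch and not VariantStudio's interface. The `Variant` fields, the thresholds, and the example records are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Variant:
    gene: str             # hypothetical gene label
    consequence: str      # e.g. "missense", "synonymous"
    population_af: float  # population allele frequency
    quality: float        # variant call quality


Filter = Callable[[Variant], bool]


def cascade(variants: Iterable[Variant], filters: List[Filter]) -> Tuple[List[Variant], List[int]]:
    """Apply filters in order; return the survivors and the count after each step."""
    remaining = list(variants)
    counts = [len(remaining)]
    for keep in filters:
        remaining = [v for v in remaining if keep(v)]
        counts.append(len(remaining))
    return remaining, counts


if __name__ == "__main__":
    variants = [
        Variant("GENE_A", "missense", 0.0005, 60.0),
        Variant("GENE_B", "synonymous", 0.0002, 80.0),
        Variant("GENE_C", "missense", 0.20, 90.0),
        Variant("GENE_D", "stop_gained", 0.0, 10.0),
    ]
    filters = [
        lambda v: v.quality >= 30,                # drop low-quality calls
        lambda v: v.population_af < 0.01,         # keep rare variants
        lambda v: v.consequence != "synonymous",  # keep protein-altering changes
    ]
    shortlist, counts = cascade(variants, filters)
    print(counts)
    print([v.gene for v in shortlist])
```

Recording the count after each step shows how much each filter narrows the candidate list, which helps when deciding how strict a threshold should be.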
Import variants > Annotate > Filter > Classify and Interpret > Generate Report. Watch our analysis videos and see how quickly you can go through this workflow. VariantStudio is a powerful, secure tool to simplify genomic data interpretation, and accessing it on BaseSpace is just a click away. With the addition of VariantStudio to the BaseSpace Core Apps, BaseSpace users can now execute the entire sample-to-answer workflow, from generating sequence reads to reporting biologically significant results.
Two applications for Illumina’s synthetic long reads: TruSeq Long-Read Assembly and TruSeq Phasing Analysis
Using data generated by the TruSeq Synthetic Long-Read DNA Library Prep Kit (also released this week), the TruSeq Long-Read Assembly App executes the assembly of synthetic long reads, and the TruSeq Phasing Analysis App performs whole human genome phasing.
The TruSeq Long-Read Assembly App constructs synthetic long reads from shorter sequencing reads, providing FASTQ files for accurate genome assembly, genome finishing, de novo assembly and metagenomics analysis.
The TruSeq Phasing Analysis App performs whole human genome phasing, identifying haplotype information, co-inherited alleles and phasing de novo mutations. The application reports haplotype blocks across the genome and phasing confidence scores in a phased VCF file.
Together with the TruSeq Long-Read DNA Library Prep Kit and Illumina’s sequencing technology, these new apps provide a solution for long reads that spans library prep, sequencing, and informatics. See some sample data on the Illumina Blog, and learn more about synthetic long-read and phasing technology on the Illumina website.
We constantly strive to improve the experience for everyone using our tools, and today we are excited to announce a few updates to the BaseSpace Prep Tab. The BaseSpace Prep Tab makes it easier to prepare and plan a sequencing Run through a rich web user interface that communicates directly with the instrument. Setup takes four easy steps: preparing Biological Samples, Libraries, and Pools, and planning a Run that can be discovered by the instrument. At the moment, the Prep Tab supports only NextSeq instruments, but support for MiSeq and HiSeq instruments is coming in the future. Today we’ve added a few new features to the BaseSpace Prep Tab, which we will explain in more detail below:
- The ability for users to Import Custom Library Prep Kits
- The ability for users to Import prepped Libraries in one step
- The Prep Libraries section under the Libraries tab is now supported on small laptop screens
For those who would like more information about the Prep Tab, please view the original NextSeq and Prep Tab blog post.
Import a Custom Library Prep Kit
- When prepping Libraries, under the Library Prep Kit drop-down list, you can now choose to add a Custom Library Prep Kit.
- After selecting Custom Library Prep Kit, in addition to naming your kit, you will be asked to specify a few basic options for supported read types, indexing strategies, and default read cycles.
- Now, click on Choose .csv file to select a template file. You can customize your template file with the following information for the new kit:
- Custom adapter sequences
- Custom indexes (name and sequences)
- Custom default layout
Here’s a simple example template file:
- When complete, click Create New Kit to add this kit to the drop-down list that appears for your account.
- You can also view all of your Library Prep Kits under the My Account section of BaseSpace.
Import Prepped Libraries
- Users can now import prepped libraries in one step, instead of importing biological samples and prepping them in two separate steps.
- First, access the Import feature from the Libraries page within the Prep tab by clicking on Import:
- Click the Choose .csv file button.
- In your .csv file, specify plate information and a list of libraries to import. The following is a simple example file:
- Once a file is selected, click Open.
- The page will now populate with prepped libraries. Ensure that the information displayed is correct.
- When you’re done, you can save the plate for later use or proceed directly to pooling libraries!
We hope you enjoy these changes and look forward to more updates in BaseSpace in the future.
The BaseSpace App store is continually expanding in the breadth and depth of applications, taking you from raw sequencing data through biological interpretation. Today we’re happy to introduce a new App from PathGenDX called PathSEQ Virome.
Whether you are looking for a contaminant in your biological sample, identifying an infection in your sample, or investigating new viruses, PathSEQ Virome can provide the answer. PathSEQ Virome is a BaseSpace app that automatically identifies and characterizes virus genomes from a sequencing run in a comprehensive and systematic way. After streaming from MiSeq, NextSeq or HiSeq, sequencing data is filtered, matched against the PathSEQ virus database, and quality-controlled for false positives. The app can currently detect >50,000 clinically relevant virus genomes and outputs a PDF report from ~2GB of sequence data within an hour. The report indicates the overall strength of each genome match, assigns a confidence score to novel viruses (if related to any of the genomes in the database), and identifies the genomic region of the match. If multiple viruses are present in the same sample, they are combined into the same report, with the strongest matches first.
Getting started with PathSEQ Virome is simple. After running your sequence data on HiSeq or MiSeq, click on “Apps” in the header in BaseSpace and select the PathSEQ Virome app.
Alternatively, you can launch the app from a Project by clicking on the “Launch App” button and selecting PathSEQ Virome from the pull-down menu.
Select your Project from the pull-down menu within BaseSpace, and PathSEQ will prompt you to select which samples in the project you want to analyze. By default, the results will be saved to a new project folder named “PathSEQ_virome_results”.
Upon completion, BaseSpace will automatically send you the following email, if you have opted in for these notifications:
To get to the analysis report, simply click on the link in this email. All reports generated by the app are stored in a Project folder named “PathSEQ_virome_results”. Clicking on that folder provides convenient access to all the reports generated by the app.
Check out this demo dataset containing example results from the PathSEQ Virome app; it will be shared with your account once you click on that link. The PathSEQ Virome app is available as a free trial for Sample data smaller than 500 MB, and a paid version without that size limitation will be available in the near future.
Didn’t find the viruses you were expecting? Contact PathGENDx to find out about their wet lab viral enrichment process to increase sensitivity.
We invite you to try it out. Comments and feedback on PathSEQ Virome are welcome through the PathGenDX portal: http://enquiry.pathgendx.com/enquiry_form.php
With the latest update of BaseSpace, we have updated the MiSeq Reporter (MSR) Workflows to v2.4! This means all users streaming MiSeq data to BaseSpace with automatic analysis launch will now have the latest and greatest features incorporated since the release of MSR v2.2 in BaseSpace. MSR v2.4 in BaseSpace will also provide results consistent with MSR local v2.4, released in March 2014.
For full details on the MSR v2.4 release in BaseSpace, please see the Customer Release Notes.
New platform level features
- gVCF file output for TruSeq Amplicon, Enrichment, and PCR Amplicon workflows. The gVCF files provide data at all genomic coordinates assayed, regardless of the variant status, and show the reference allele if no variant is found. For more information on gVCF files, please see the following link: https://sites.google.com/site/gvcftools/home/about-gvcf
- BAM and VCF/gVCF files are now stamped with a header containing algorithm versions and parameter settings to easily track your analysis details.
- Updated the Starling variant caller to v2.0.3 for use with TruSeq Amplicon, Resequencing, and PCR Amplicon workflows. The default variant caller for these workflows remains GATK v1.6.
- Enabled StitchedReads in the TruSeq Amplicon and GenerateFASTQ workflows. Read stitching allows construction of a single, longer read from two overlapping, shorter paired reads. For more information on Stitched Reads, please see the Customer Release Notes.
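To make the idea of read stitching concrete, here is a deliberately simplified sketch. It is not the MiSeq Reporter algorithm, which is described in the Customer Release Notes. It accepts only exact overlaps, and the `min_overlap` parameter and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def stitch(read1: str, read2: str, min_overlap: int = 10):
    """Merge a read pair whose ends overlap; return None if no overlap is found.

    read2 is taken as sequenced from the opposite strand, so it is
    reverse-complemented first. Only exact overlaps are accepted, and the
    longest candidate overlap is tried first.
    """
    r2 = reverse_complement(read2)
    for n in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        if read1[-n:] == r2[:n]:
            return read1 + r2[n:]
    return None


if __name__ == "__main__":
    # Made-up 30-base fragment, read 20 bases from each end (10-base overlap).
    fragment = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
    read1 = fragment[:20]
    read2 = reverse_complement(fragment[-20:])
    print(stitch(read1, read2) == fragment)
```

A production implementation would also tolerate mismatches in the overlap and use the quality scores to choose between conflicting base calls.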
Highlights for specific workflows
- The 16S Metagenomics workflow now allows species-level identification (previous versions stopped at genus level). To classify down to the species level, add the sample sheet setting described in the Customer Release Notes, or use Illumina Experiment Manager v1.7 to build your sample sheet. We also added a new, interactive HTML output report for exploring your 16S data in depth. Lastly, the classification algorithm has been updated to provide faster results.
- The Enrichment workflow now allows users to specify padding values in the sample sheet. Please see the Customer Release Notes for additional details.
If you have any older runs that you want to push through these new workflows, please contact support through the “Contact Us” button on the site.