FASTQ upload is now available in BaseSpace

We are excited to announce the availability of a data upload feature for FASTQ files that were previously generated on Illumina sequencing instruments. This simple-to-use feature is accessible from any project to which the user has write access by first clicking on the project and then selecting the Import tab shown below.

[Image: project tab]

The user will then be prompted to select their import type. The user can upload a single sample by clicking on “Sample” as shown below.

[Image: Samples]

The user can then either “Drag and drop” one or more files into the webpage or click on “select files” and choose the files to upload from a file browser. Note that the FASTQ files need to adhere to Illumina standards, as specified below. Data for a single sample can span multiple files. A sample is limited to 16 files with a combined size of 25 GB. Uploading a 25 GB sample takes 1-2 hours on a relatively fast internet connection.

[Image: drag and drop]
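
As a rough sanity check on the upload time quoted above, the short Python sketch below works out the sustained throughput needed to move 25 GB in one or two hours. It assumes 1 GB = 10^9 bytes and ignores protocol overhead; the function name `required_mbit_per_s` is purely illustrative.

```python
def required_mbit_per_s(size_gb, hours):
    """Sustained rate in Mbit/s needed to upload size_gb within hours."""
    bits = size_gb * 1e9 * 8          # assumption: 1 GB = 10^9 bytes
    return bits / (hours * 3600) / 1e6


for hours in (1, 2):
    rate = required_mbit_per_s(25, hours)
    print(f"{hours} h: about {rate:.1f} Mbit/s")
```

Under these assumptions, a one-hour upload needs roughly 56 Mbit/s of sustained upstream bandwidth and a two-hour upload roughly 28 Mbit/s.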

The user will then see a progress bar as the file(s) are uploaded. Once the progress bar completes, the user can add additional files. The user can also set the sample name and associate a genome with the sample in the upper left-hand corner of the screen.

[Image: upload screen]

Once all of the files have finished uploading, the user must click the “Complete Import” button (shown above) to complete the session.

FASTQ file standards

  • The uploader will only support gzipped FASTQ files generated on Illumina instruments
  • The name of the FASTQ files must conform to the following convention:
    • SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz (e.g. SampleName_S1_L001_R1_001.fastq.gz / SampleName_S1_L001_R2_001.fastq.gz)
  • The read descriptor in the FASTQ files must conform to the following convention:
    • @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber:
      • Read 1 descriptor would look like this:
        @M00900:62:000000000-A2CYG:1:1101:18016:2491 1:N:0:13
      • Read 2 would have a 2 in the ReadNum field, like this:
        @M00900:62:000000000-A2CYG:1:1101:18016:2491 2:N:0:13
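
Before uploading, it can be handy to check file names and read descriptors locally. The Python sketch below is one way to do this with regular expressions. The patterns are inferred from the examples above and are assumptions, not the uploader's actual validation rules; the function names `parse_filename` and `parse_descriptor` are hypothetical.

```python
import re

# Assumed from the example names: S<number>, L<3 digits>, R1 or R2, <3 digits>.
FILENAME_RE = re.compile(
    r"(?P<sample>.+)_S(?P<number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<index>\d{3})\.fastq\.gz"
)

# Assumed from the example descriptors:
# @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber
DESCRIPTOR_RE = re.compile(
    r"@(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+)"
    r":(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r" (?P<read>[12]):(?P<filter>[YN]):0:(?P<sample_number>\d+)"
)


def parse_filename(name):
    """Return the named fields of a FASTQ file name, or None if it does not match."""
    match = FILENAME_RE.fullmatch(name)
    return match.groupdict() if match else None


def parse_descriptor(line):
    """Return the named fields of a read descriptor line, or None if it does not match."""
    match = DESCRIPTOR_RE.fullmatch(line.strip())
    return match.groupdict() if match else None


print(parse_filename("SampleName_S1_L001_R1_001.fastq.gz"))
print(parse_descriptor("@M00900:62:000000000-A2CYG:1:1101:18016:2491 1:N:0:13"))
```

A `None` result means the name or descriptor does not fit the assumed pattern and is worth inspecting before starting an upload.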

Quality considerations

  • The number of base calls for each read must equal the number of quality scores
  • The number of entries for Read 1 must equal the number of entries for Read 2
  • The uploader determines whether files are paired-end by matching file names that differ only in the read number (R1 versus R2)
  • For paired-end reads, the descriptor must match for every entry for both reads 1 and 2
  • Every read must have passed filter
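
The checks listed above can be approximated locally before an upload. The following Python sketch is a minimal illustration, not the uploader's own logic. It assumes four-line FASTQ records, treats a FilterFlag of Y as a read that failed the filter, and compares paired descriptors on the part before the space (the part after it holds the ReadNum and so differs between Read 1 and Read 2). All function names are hypothetical.

```python
import gzip
from itertools import zip_longest


def read_fastq(path):
    """Yield (descriptor, bases, qualities) from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break
            bases = handle.readline().rstrip("\n")
            handle.readline()  # skip the '+' separator line
            qualities = handle.readline().rstrip("\n")
            yield header, bases, qualities


def check_records(records):
    """Return a list of problems found in the records of one file."""
    problems = []
    for i, (header, bases, qualities) in enumerate(records, start=1):
        if len(bases) != len(qualities):
            problems.append(
                f"entry {i}: {len(bases)} base calls but {len(qualities)} quality scores"
            )
        # Assumption: the second field after the space is the FilterFlag,
        # and Y marks a read that did not pass the filter.
        fields = header.split(" ")[-1].split(":")
        if len(fields) > 1 and fields[1] == "Y":
            problems.append(f"entry {i}: read did not pass filter")
    return problems


def check_pair(records_r1, records_r2):
    """Return a list of problems found when comparing Read 1 with Read 2."""
    problems = []
    for i, (r1, r2) in enumerate(zip_longest(records_r1, records_r2), start=1):
        if r1 is None or r2 is None:
            problems.append("Read 1 and Read 2 have different numbers of entries")
            break
        if r1[0].split(" ")[0] != r2[0].split(" ")[0]:
            problems.append(f"entry {i}: descriptors do not match")
    return problems
```

For example, `check_pair(read_fastq(r1_path), read_fastq(r2_path))` streams both files in step, so neither file has to fit in memory.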

Upload parameters

  • Only one sample can be uploaded at a time
  • A maximum of 16 files can be uploaded in a session
  • The combined size of the uploaded files cannot exceed 25 GB
  • A detailed description of how to use the uploader can be found in the BaseSpace user guide
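
The session limits above are easy to check before starting an upload. The Python sketch below is one possible pre-flight check. It assumes 1 GB = 10^9 bytes, which this post does not specify, and the names `check_upload_limits` and `sizes_on_disk` are illustrative only.

```python
import os

MAX_FILES = 16
MAX_TOTAL_BYTES = 25 * 10**9  # assumption: 1 GB = 10^9 bytes


def sizes_on_disk(paths):
    """Map each path to its size in bytes."""
    return {path: os.path.getsize(path) for path in paths}


def check_upload_limits(sizes_by_name):
    """Return a list of limit violations for one upload session."""
    problems = []
    if len(sizes_by_name) > MAX_FILES:
        problems.append(f"{len(sizes_by_name)} files selected, limit is {MAX_FILES}")
    total = sum(sizes_by_name.values())
    if total > MAX_TOTAL_BYTES:
        problems.append(f"combined size is {total / 1e9:.1f} GB, limit is 25 GB")
    return problems
```

Calling `check_upload_limits(sizes_on_disk(paths))` on the files for one sample returns an empty list when the session fits within both limits.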