FASTQ upload is now available in BaseSpace

We are excited to announce the availability of a data upload feature for FASTQ files that were previously generated on Illumina sequencing instruments. This simple-to-use feature is accessible from any project to which the user has write access by first clicking on the project and then selecting the Import tab shown below.

[Screenshot: project Import tab]

The user will then be prompted to select their import type. The user can upload a single sample by clicking on “Sample” as shown below.

[Screenshot: import type selection]

The user can then either “Drag and drop” one or more files onto the webpage or click on “select files” and choose the files to upload from a file browser. Note that the FASTQ files need to adhere to Illumina standards, as specified below. Data for a single sample can span multiple files. The total number of files per sample is limited to 16 and their combined size to 25 GB. Uploading a 25 GB sample takes 1-2 hours on a relatively fast internet connection.

[Screenshot: drag and drop]

The user will then see a progress bar as the file(s) upload. Once the progress bar completes, the user can add additional files. The user can also set the sample name and associate a genome with the sample in the upper left-hand corner of the screen.

[Screenshot: upload screen]

Once all of the files have been added and have finished uploading, the user will need to click on the “Complete Import” button (shown above) to complete the session.

FASTQ file standards

  • The uploader will only support gzipped FASTQ files generated on Illumina instruments
  • The name of the FASTQ files must conform to the following convention:
    • SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz (e.g. SampleName_S1_L001_R1_001.fastq.gz / SampleName_S1_L001_R2_001.fastq.gz)
  • The read descriptor in the FASTQ files must conform to the following convention:
    • @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber:
      • Read 1 descriptor would look like this:
        @M00900:62:000000000-A2CYG:1:1101:18016:2491 1:N:0:13
      • Read 2 would have a 2 in the ReadNum field, like this:
        @M00900:62:000000000-A2CYG:1:1101:18016:2491 2:N:0:13
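The naming and descriptor conventions above can be checked locally before attempting an upload. The following is a minimal sketch, not the validation BaseSpace itself performs: the regular expressions are inferred from the examples in this post, and the allowed characters in the sample name (letters, digits, dashes) are an assumption.

```python
import re

# Assumed pattern, inferred from the example names above:
# SampleName_S<number>_L<lane>_R<read>_<3 digits>.fastq.gz
# Sample names are assumed to contain only letters, digits, and dashes.
FILENAME_RE = re.compile(
    r"^(?P<sample>[A-Za-z0-9-]+)_S(?P<number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<index>\d{3})\.fastq\.gz$"
)

# Assumed pattern for the read descriptor:
# @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber
HEADER_RE = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+)"
    r":(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+) "
    r"(?P<read>[12]):(?P<filter>[YN]):0:(?P<sample_number>\d+)$"
)


def is_valid_filename(name):
    """Return True if the file name follows the assumed convention."""
    return FILENAME_RE.match(name) is not None


def is_valid_header(line):
    """Return True if the descriptor line follows the assumed convention."""
    return HEADER_RE.match(line.strip()) is not None
```

For example, `is_valid_filename("SampleName_S1_L001_R1_001.fastq.gz")` returns `True`, while a name without the `.gz` extension or without the `_S1` sample number field is rejected.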

Quality considerations

  • The number of base calls for each read must equal the number of quality scores
  • The number of entries for Read 1 must equal the number of entries for Read 2
  • The uploader determines whether files are paired-end by matching file names that differ only in the Read field (R1 vs. R2)
  • For paired-end reads, the descriptor must match for every entry for both reads 1 and 2
  • Each read must have passed filter
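These considerations can also be checked before uploading. The sketch below is one possible way to do it, under two assumptions: descriptors "match" when the part before the space is identical, and a read has passed filter when its FilterFlag is `N` (as in the examples above). The function names are illustrative only.

```python
import gzip
from itertools import zip_longest


def read_fastq(path):
    """Yield (header, sequence, quality) tuples from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:  # end of file
                return
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # skip the '+' separator line
            quality = handle.readline().rstrip("\n")
            yield header, sequence, quality


def check_pair(r1_path, r2_path):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    pairs = zip_longest(read_fastq(r1_path), read_fastq(r2_path))
    for n, (rec1, rec2) in enumerate(pairs, start=1):
        if rec1 is None or rec2 is None:
            problems.append("Read 1 and Read 2 have different numbers of entries")
            break
        for label, (header, seq, qual) in (("R1", rec1), ("R2", rec2)):
            if len(seq) != len(qual):
                problems.append(f"entry {n} {label}: base/quality length mismatch")
            # Assumed layout after the space: ReadNum:FilterFlag:0:SampleNumber
            fields = header.split(" ")[-1].split(":")
            if len(fields) < 2 or fields[1] != "N":
                problems.append(f"entry {n} {label}: read did not pass filter")
        # Assumption: the part before the space must be identical in both reads
        if rec1[0].split(" ")[0] != rec2[0].split(" ")[0]:
            problems.append(f"entry {n}: descriptors differ between R1 and R2")
    return problems
```

Calling `check_pair("SampleName_S1_L001_R1_001.fastq.gz", "SampleName_S1_L001_R2_001.fastq.gz")` returns an empty list when none of these problems is found.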

Upload parameters

  • Only one sample can be uploaded at a time
  • A maximum of 16 files can be uploaded in a session
  • The size of the uploaded files cannot exceed 25 GB
  • A detailed description of how to use the uploader can be found in the BaseSpace user guide
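The file count and size limits can be checked up front so that a long upload is not started in vain. This is a small sketch; it assumes that 25 GB means 25 × 10^9 bytes, which the post does not specify, and the function name is illustrative.

```python
import os

MAX_FILES = 16
# Assumption: "25 GB" is read as 25 * 10**9 bytes; the post does not say
# whether decimal or binary gigabytes are meant.
MAX_TOTAL_BYTES = 25 * 10**9


def check_upload_limits(sizes_in_bytes):
    """Return a list of limit violations for one sample's files."""
    problems = []
    if len(sizes_in_bytes) > MAX_FILES:
        problems.append(f"{len(sizes_in_bytes)} files exceeds the limit of {MAX_FILES}")
    total = sum(sizes_in_bytes)
    if total > MAX_TOTAL_BYTES:
        problems.append(f"{total} bytes exceeds the limit of {MAX_TOTAL_BYTES}")
    return problems


def check_files(paths):
    """Convenience wrapper that reads the sizes from disk."""
    return check_upload_limits([os.path.getsize(p) for p in paths])
```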


17 responses to “FASTQ upload is now available in BaseSpace”

  1. Nandita says :

    Good to hear, this is very useful. Can I also get this on my BaseSpace Onsite? Thanks

  2. Barry Murphy says :

    Great News. Really looking forward to seeing how this works.

  3. Dr.K says :

    Do you have a delete or remove function for unsuccessfully uploaded FASTQ files or items?

  4. Chentha Vasu says :

    Why does the failed sign appear when I drag and drop a correctly formatted file (like: 44_S44_L001_R1_001.fastq)? Thanks, Chentha

  5. Chentha Vasu says :

    Not able to import files and failed sign appears. Thanks

    • Ilya Chorny says :

      The files need to be gzipped (GNU zip) with a .gz extension. Please see the BaseSpace user guide for further information about making sure the files conform to Illumina standards.
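For readers with uncompressed FASTQ files, one possible way to produce the required .gz file is sketched below using only the Python standard library. The function name is illustrative; the command-line `gzip` tool would work equally well.

```python
import gzip
import shutil


def gzip_fastq(src_path):
    """Compress src_path to src_path + '.gz' and return the new path."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the data, no full read into memory
    return dst_path
```

For example, `gzip_fastq("44_S44_L001_R1_001.fastq")` writes `44_S44_L001_R1_001.fastq.gz` next to the original file.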

  6. michelmfarah says :

    Hi. I’m trying to import the fastq.gz files and it failed… It says that the names of the files are wrong. Here is the name format of my files.

    CH12_AGTCAAA_L005_R1_001.fastq.gz

    Does anyone know what is wrong?

    Thanks,

    • Ilya Chorny says :

      Hi.

      You should change the name to CH12-AGTCAAA_S1_L005_R1_001.fastq.gz. Note the dash and the underscore.

      Thanks,

      Ilya

  7. Chelse says :

    I am trying to upload a file from 1000 genomes into Illumina, please lead me in the right direction of how to do this.. Thank you!!!!!!!!!!

  8. JLuis says :

    You mentioned that the FASTQ reads must have the following header format to be uploaded to BaseSpace:

    @Instrument:RunID:FlowCellID:Lane:Tile:X:Y readNum:FilterFlag:0:SampleNumber:

    Can I upload Illumina-generated FASTQ files with either of these header formats?

    @FCH9RVLADXX:1:1103:4045:80994#/1

    @FCH9RVLADXX:1:1103:4045:80994#NGTACTAG_NCTGCATA/1

    This is a very concerning issue to me…

    Thanks in advance for your answer.

    JL

    • Ilya Chorny says :

      Hi JL,

      Your headers are not the same as the header we require. Your headers should look like this:

      Read 1 descriptor would look like this:
      @M00900:62:000000000-A2CYG:1:1101:18016:2491 1:N:0:13
      Read 2 would have a 2 in the ReadNum field, like this:
      @M00900:62:000000000-A2CYG:1:1101:18016:2491 2:N:0:13

      So in your case you are missing the RunID and FlowCellID as well as the ReadNum:FilterFlag:0:SampleNumber field.

      @FCH9RVLADXX:1:1103:4045:80994#/1 will need to be converted to

      @FCH9RVLADXX:62:000000000-A2CYG:1:1103:4045:80994 1:N:0:1 for read one and
      @FCH9RVLADXX:62:000000000-A2CYG:1:1103:4045:80994 2:N:0:1 for read two

      You will need to modify your headers accordingly.

      Thanks,

      Ilya
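For readers facing the same conversion, the rewrite described in this reply could be sketched as follows. This is not an official Illumina tool. It assumes the older header has the layout `@Name:Lane:Tile:X:Y#Index/ReadNum`, that the caller supplies the RunID and FlowCellID (the older header does not contain them), and that every read passed filter, so the FilterFlag is written as `N`.

```python
import re

# Assumed layout of the older header: @Name:Lane:Tile:X:Y#Index/ReadNum
OLD_HEADER_RE = re.compile(
    r"^@(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"#(?P<index>[^/]*)/(?P<read>[12])$"
)


def convert_header(line, run_id, flowcell_id, sample_number=1):
    """Rewrite an older-style header into the format required above.

    run_id and flowcell_id must be supplied by the caller. The FilterFlag
    is written as 'N' on the assumption that the read passed filter.
    """
    match = OLD_HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised header: {line!r}")
    f = match.groupdict()
    return (
        f"@{f['instrument']}:{run_id}:{flowcell_id}:{f['lane']}:{f['tile']}"
        f":{f['x']}:{f['y']} {f['read']}:N:0:{sample_number}"
    )
```

With the values used in the reply, `convert_header("@FCH9RVLADXX:1:1103:4045:80994#/1", 62, "000000000-A2CYG")` yields `@FCH9RVLADXX:62:000000000-A2CYG:1:1103:4045:80994 1:N:0:1`.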

      • JLuis says :

        Dear Ilya,

        I have a question about the quality system of the uploaded FASTQs.

        1) Is there any requirement of quality standards?

        I mean, can the ASCII-64 quality encoding be used, or is it required to convert it to ASCII-33? Or are both of them acceptable?

        2) Apart from this, you require the sample number field to be changed to a single digit instead of the adapter sequences; see the example:

        You told me that we should have this kind of header:

        @FCH9RVLADXX:62:000000000-A2CYG:1:1103:4045:80994 1:N:0:1

        But I can get this one:

        @HWI-ST812:425:H9RVLADXX:1:1101:1286:2070 1:N:0:GTAGAGGA_NTAAGGAG -> for read one

        and

        @HWI-ST812:425:H9RVLADXX:1:1101:1286:2070 2:N:0:GTAGAGGA_NTAAGGAG -> for read two

        Are headers with the full adapter sequence also valid, or is it mandatory to convert the adapter into a digit?

        Thanks in advance

        JL

  9. Ilya Chorny says :

    1. ASCII-33
    2. @FCH9RVLADXX:62:000000000-A2CYG:1:1103:4045:80994 1:N:0:1 is the correct format.

    • JLuis says :

      Is there a way to translate the index into a numeric code?

      Have you ever considered accepting FASTQ headers carrying the “index” instead of a number?

      I say this because CASAVA outputs FASTQs with the “index” header, and it is not so straightforward to make that conversion even for bioinformaticians, so common users may find it impossible to upload their data to BaseSpace, and this will disappoint Illumina customers, like in this case.

      Thanks in advance

      JL
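One possible way to make that translation is sketched below; it is not an official Illumina tool, and whether BaseSpace accepts the result is not confirmed anywhere in this thread. It assumes that all reads in a file belong to one sample, so the index sequence in the last field can simply be replaced by a fixed sample number chosen by the user (presumably the number used in the `_S1_` part of the file name).

```python
def replace_index_with_sample_number(header, sample_number):
    """Replace the index sequence in the last header field with a number.

    Assumes the part after the space is ReadNum:FilterFlag:0:Index.
    """
    name, _, comment = header.rstrip("\n").partition(" ")
    fields = comment.split(":")
    if len(fields) != 4:
        raise ValueError(f"unexpected header layout: {header!r}")
    fields[3] = str(sample_number)  # drop the index, keep the other fields
    return f"{name} {':'.join(fields)}"
```

Applied to the read-one header quoted earlier in the thread with sample number 1, this gives `@HWI-ST812:425:H9RVLADXX:1:1101:1286:2070 1:N:0:1`.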
