Barcode Splitting Multiplexed Data

If you multiplexed samples into a single lane using Illumina's barcoded TruSeq adapters, you may need to split your reads into separate files, one per sample. The simplest way to do this is with the "Barcode Splitter" tool in Galaxy. If you don't already have access to Galaxy, you can request it by emailing the Lewis-Sigler Bioinformatics Group with your name, Princeton lab affiliation, and Princeton NetID.

Import your data into Galaxy

  1. Log in to Galaxy
  2. Choose the Get Data => Princeton HTSEQ tool from the left menu
  3. Log in to the HTSEQ database and select Search => Assay Search from the menu to find the assay you are interested in.
  4. Click the [Upload to Galaxy] button next to the Read 1 passed filter data file, and click Upload to Galaxy to confirm on the next page. You should now see a new data file in your Galaxy history that will be yellow while the data imports.
  5. Repeat this process for Read 2 (the index read) and the Barcodes file. If the Barcodes file is missing or inaccurate, you will need to supply a tab-delimited file with two columns: the sample name in the first and the corresponding barcode sequence in the second.
  6. For paired-end runs only: upload the Read 3 data.
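
If you need to build the Barcodes file yourself, the following is a minimal Python sketch of the two-column, tab-delimited layout described in step 5. The sample names and barcode sequences are hypothetical placeholders, and the function name write_barcodes is only illustrative; substitute the values for your own assay.

```python
import csv
import io


def write_barcodes(samples, handle):
    """Write (sample_name, barcode) pairs as two tab-separated columns."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for name, barcode in samples:
        # Column 1: sample name. Column 2: barcode sequence (upper-cased).
        writer.writerow([name, barcode.upper()])


# Hypothetical sample names and barcodes, for illustration only.
samples = [
    ("sample_A", "ATCACG"),
    ("sample_B", "CGATGT"),
    ("sample_C", "TTAGGC"),
]

# Written to an in-memory buffer here; to create a real file, pass
# open("barcodes.txt", "w", newline="") as the handle instead.
buffer = io.StringIO()
write_barcodes(samples, buffer)
barcodes_text = buffer.getvalue()
print(barcodes_text, end="")
```

The resulting text has one line per sample, with a single tab between the name and the barcode and no header row. Whether Galaxy expects a header row is not stated here, so the sketch leaves it out.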

Split the data into individual files for each sample

  1. From the top menu, select [Shared Data] => [Published Workflows] 
  2. Choose either [Barcode Split (single-end)] or [Barcode Split (paired-end)] 
  3. Select [+ Import Workflow] from the top right, and click on "start using this workflow"
  4. Click on the new workflow in the menu (for example, [Imported: Barcode Split (single-end)]) and select [Run]
  5. Select the appropriate data files in the menus for Read 1, Barcodes File, and Read 2.
  6. Choose the number of mismatches to allow when matching barcodes (typically 0 or 1). For paired-end data, you must enter the same number of mismatches for BOTH barcode splitting steps.
  7. Click on [Run Workflow]. You will receive an email at your princeton.edu email address when the splitting is complete.
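
To illustrate what the mismatch setting in step 6 controls, here is a rough Python sketch of barcode matching with a mismatch tolerance. It is not the Galaxy tool's actual implementation. It assumes the barcode sits at the start of the index read and that a read equally close to two barcodes should be left unassigned; the function names and barcode values are hypothetical.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes, max_mismatches=1):
    """Return the sample whose barcode is within max_mismatches, else None.

    Assumption: the barcode occupies the first bases of the index read.
    Reads that tie between two barcodes are treated as unmatched.
    """
    hits = []
    for sample, barcode in barcodes.items():
        distance = hamming(index_read[:len(barcode)], barcode)
        if distance <= max_mismatches:
            hits.append((distance, sample))
    if not hits:
        return None
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # ambiguous: equally close to more than one barcode
    return hits[0][1]


# Hypothetical barcodes, for illustration only.
barcodes = {"sample_A": "ATCACG", "sample_B": "CGATGT"}

exact = assign_sample("ATCACG", barcodes, max_mismatches=0)
strict = assign_sample("ATCACT", barcodes, max_mismatches=0)
tolerant = assign_sample("ATCACT", barcodes, max_mismatches=1)
print(exact, strict, tolerant)
```

With 0 mismatches, a single sequencing error in the index read sends the read to the unmatched file; with 1 mismatch, the same read is recovered. Allowing more mismatches recovers more reads but raises the risk of assigning a read to the wrong sample when barcodes are similar to one another.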

Review the results

Each input read file will be split into multiple files: one for each sample in the barcodes file, plus one for reads that did not match any known sample. A short report also shows how many reads were matched to each sample and what percentage of the total that represents.
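
If you want to cross-check the numbers in the report, the per-sample counts and percentages can be tallied along these lines. This is a sketch only: it assumes you already have one sample assignment per read (with None for reads that matched no barcode), and the exact layout of the Galaxy report may differ.

```python
from collections import Counter


def summarize(assignments):
    """Return (label, read_count, percent_of_total) rows, largest first.

    Reads with no matching barcode (None) are counted as "unmatched".
    """
    counts = Counter(
        label if label is not None else "unmatched" for label in assignments
    )
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (label, n, 100.0 * n / total) for label, n in counts.most_common()
    ]


# Hypothetical assignments for four reads, for illustration only.
assignments = ["sample_A", "sample_A", "sample_B", None]
report = summarize(assignments)
for label, n, percent in report:
    print(f"{label}\t{n}\t{percent:.1f}%")
```

A large unmatched fraction is usually worth investigating, for example by checking the barcode sequences in the Barcodes file against the ones actually used in library preparation.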