Data Retrieval

Once sequencing is complete, the HTSEQ system will email you to let you know that your data is ready.

The data from an Illumina sequencing run consists of one FASTQ-formatted file per read: the forward read, the index read, and optionally the reverse read.  Quality values use the standard Sanger encoding scheme (each Phred score is stored as the ASCII character with code score + 33), and the files are gzipped to reduce file sizes and download times.
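
As a quick sanity check after downloading, the following shell sketch verifies that a gzipped file is intact, counts its reads, and converts a quality character to its numeric score. It assumes standard four-line FASTQ records; the function names are illustrative and are not part of HTSEQ.

```shell
# Sketch for inspecting a downloaded FASTQ file.
# The function names below are hypothetical helpers, not HTSEQ tools.

# Verify that a gzipped file decompresses cleanly (exit status 0 = intact).
check_gzip() {
    gzip -t "$1"
}

# Count reads, assuming each FASTQ record occupies exactly four lines.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Convert one Sanger-encoded quality character to its Phred score.
# printf with a leading quote prints the character's ASCII code.
phred33_score() {
    echo $(( $(printf '%d' "'$1") - 33 ))
}
```

For example, count_reads reads.fastq.gz prints the number of reads in a file named reads.fastq.gz (a placeholder name), and phred33_score I prints 40, because the character I has ASCII code 73.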

In addition, you have access to reads that were filtered out during quality control, in case those are useful to you.  Starting in the summer of 2011, we began spiking 1% PhiX DNA into each sample to assist with quality control.  The HTSEQ system filters fragments that map to PhiX into a separate file.  A fragment is considered to map to PhiX if read 1 (the forward read) aligns to the PhiX genome using Bowtie with default settings.

Your data can be retrieved in two ways: via the HTSeq website or via a network drive.

HTSeq Website

The simplest way to retrieve your data is to log in to HTSeq, search for the assay you are interested in, and click the download button. For detailed instructions, please see the help page. This method allows you to get Sanger-formatted FASTQ files of reads as well as some basic QC analysis. The FASTQ files on HTSeq will be kept for as long as space is available.

Network Drive

In addition to using the interactive HTSEQ website, you can also download your data via a network drive.  This can make scripting downloads simpler and is recommended when downloading data directly to a server such as the TIGRESS Della computing cluster or Lewis-Sigler's Cetus computing cluster.

In order to use this method of downloading your data you need:

  • Princeton NetID: You must have an active Princeton NetID which must match your HTSEQ UserID.  Please contact Lewis-Sigler computing support if you do not have an active NetID or are having difficulty connecting to the network drive.
  • An account on the Arrayfiles server: You must print, sign, and submit the Arrayfiles Data Storage and Retrieval Agreement to obtain an account on arrayfiles.princeton.edu.
  • On campus or VPN network connection: In order to access the network drive you will need to be on the Princeton network, or connected via VPN.

The data is available here

Windows

  1. In the Windows Explorer Address bar, type \\arrayfiles.princeton.edu\htseq
  2. Enter PRINCETON\yourNetId as the username. Enter your NetID password in the password field.

Or for a more permanent way to connect:

  1. Choose Map Network Drive from the Explorer Tools menu.
  2. Select the drive letter of your choice (e.g. "S:" or "Q:") and enter the share address in the Folder field: \\arrayfiles.princeton.edu\htseq.
  3. If you tick the Reconnect at logon checkbox, Windows will attempt to reconnect the network drive when you next log in.

Mac OS X

  1. With the Finder active, select Connect to Server... from the Go menu, or from the keyboard use Command-K.
  2. In the Connect to Server window, type the following in the Address field: smb://yourNetId@arrayfiles.princeton.edu/htseq
  3. Click the Connect button and log in using Princeton as the Workgroup/Domain, your NetID as the Username, and your NetID password.

Linux

  • Browse: Use the smbclient command to connect and transfer data interactively.  This is useful to test your connection to the server or if you do not have administrator rights to the system.
    smbclient //arrayfiles.princeton.edu/htseq -W PRINCETON -U <princeton_netid>
    The password will be your Princeton NetID password.
  • Mount the drive: To mount the network drive as a directory on your system, use the following procedure:
  1. Make a directory for the mountpoint:
    mkdir /mnt/<name-of-mount-point>
  2. Mount the share (on current Linux kernels, where smbfs has been replaced by cifs, use -t cifs instead):
    mount -t smbfs -o username=<username>,password=<password> //arrayfiles.princeton.edu/htseq /mnt/<name-of-mount-point>
  3. Create a symbolic link to the mounted drive: 
    ln -s /mnt/<name-of-mount-point> /<path-of-symlink>
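
For scripted downloads, smbclient can also run non-interactively. The sketch below is one possible approach, not an official HTSEQ tool. It assumes an smbclient authentication file (passed with the -A option) containing username, password, and domain lines; the function names are hypothetical, and the remote directory and file names passed to them are placeholders.

```shell
# Sketch: fetch one file from the htseq share without an interactive session.
# Assumption: credentials are stored in an smbclient auth file of the form
#   username = <princeton_netid>
#   password = <password>
#   domain   = PRINCETON
# Keep that file readable only by you (for example, chmod 600).

SHARE="//arrayfiles.princeton.edu/htseq"

# Build the command string handed to smbclient -c.
# $1 = remote directory on the share, $2 = name of the file to download
smb_get_commands() {
    printf 'cd "%s"; get "%s"\n' "$1" "$2"
}

# Download one file into the current working directory.
# $1 = auth file, $2 = remote directory, $3 = file name
fetch_from_share() {
    smbclient "$SHARE" -A "$1" -c "$(smb_get_commands "$2" "$3")"
}
```

A call such as fetch_from_share ~/.htseq_auth <remote-directory> <file-name> could then be placed in a loop or a cluster job script. Using an auth file rather than a password on the command line keeps the password out of the shell history and the process list.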