Galaxy

The Galaxy workflow system provides a simple way to analyze high-throughput sequencing and other biological datasets.

The Lewis-Sigler Institute Bioinformatics Group has set up a local instance of Galaxy for use by Princeton researchers. In addition to many of the tools available from the public instance of Galaxy at Penn State, our local version provides easy upload of data from the HTSEQ system and a number of customized analysis tools requested by researchers in the Institute.

Getting Started

Getting access to Galaxy is simple: just send an email to the Lewis-Sigler Bioinformatics Group. Once you have been granted access, you can log in at http://galaxy.princeton.edu with your Princeton NetID and password.

Get started by selecting Get Data from the menu on the left. Upload a file from your computer or import sequencing data from HTSEQ.

There are a number of ways to learn about the Galaxy system including:

Available Tools

There are many tools available in Galaxy, and you can even add your own custom tools (just ask). Some of the more useful tools for high-throughput sequencing data include:

  • FASTQ Quality Control and Manipulation
    • Barcode Splitter - Split apart FASTQ files based on sequence barcodes (or indexes)
    • cutadapt - Trim adapter sequences and poor-quality read ends
    • FASTQC - QC report of FASTQ files, excellent for getting an idea of library quality, contamination, etc.
    • Fastx Toolkit - various statistics and manipulation of FASTQ files
  • Genomic Interval Tools
    • BedTools - The Intersect BAM and Count overlapping intervals tools are useful for obtaining coverage of genomic features (genes). A histogram of genome coverage is a good way to summarize how well your genome is covered after sequencing.
  • Mapping/Alignment
    • Bowtie and BWA (BWA has been enhanced with a quality trimming option)
    • SRMA - Short Read Micro Re-Aligner
    • GATK Indel Realigner
    • SAMtools and Picard alignment statistics and analysis
  • Variant (SNP/Indel) Analysis
    • freebayes - Bayesian genetic variant detector
    • GATK Unified Genotyper
    • Various VCF filtering and analysis tools
  • ChIP-Seq
    • MACS - Model-based Analysis of ChIP-Seq (both versions 1.3 and 1.4 available)
    • CCAT - Control-based ChIP-seq Analysis Tool
  • RNA-Seq
    • TopHat and Cufflinks - Analyze RNA-Seq data to find novel splice junctions, calculate transcript abundance, and more
  • Visualization
    • Trackster - Built-in online visualization; save and share your genomic visualizations with others
    • View in IGV - You can view BAM files directly in IGV without downloading the BAM file. For more information check out the tutorial on using Galaxy and IGV at the Princeton HTSEQ Users Group.
    • View at UCSC Genome Browser - View BAM files directly, or convert them to BigWig files to view on UCSC's online genome browser.  Other datatypes such as BED are viewable as well.
  • More - Most command-line tools and scripts can be added to Galaxy, included in your workflows, and shared with others. Just let us know if you would like something added.
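
As a rough illustration of the genome coverage histogram mentioned under BedTools, the Python sketch below counts how many bases are covered at each read depth. It is not the BedTools implementation. The function name coverage_histogram, the toy genome length, and the read intervals are made up for this example, and intervals are assumed to be 0-based and half-open, as in BED files.

```python
from collections import Counter


def coverage_histogram(genome_length, intervals):
    """Return {depth: number_of_bases} for a single sequence.

    Hypothetical helper for illustration only. `intervals` is an
    iterable of (start, end) pairs, 0-based and half-open (BED style).
    """
    # Difference array: +1 where an interval starts, -1 where it ends.
    delta = [0] * (genome_length + 1)
    for start, end in intervals:
        # Clip intervals to the sequence boundaries.
        start = max(0, start)
        end = min(genome_length, end)
        if start < end:
            delta[start] += 1
            delta[end] -= 1

    # A running sum of the difference array gives the depth at each base.
    histogram = Counter()
    depth = 0
    for pos in range(genome_length):
        depth += delta[pos]
        histogram[depth] += 1
    return dict(histogram)


if __name__ == "__main__":
    # Toy data: a 10-base "genome" and three aligned reads.
    reads = [(0, 4), (2, 6), (5, 8)]
    genome_length = 10
    hist = coverage_histogram(genome_length, reads)
    for depth in sorted(hist):
        fraction = hist[depth] / genome_length
        print(f"depth {depth}: {hist[depth]} bases ({fraction:.0%})")
```

For the three toy reads, two bases are uncovered, five are covered once, and three are covered twice. A large share of bases at depth 0 or 1 in real data would suggest that the genome is not well covered.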