RNA-seq Analysis

(Presentation)

Introduction

RNA-seq is the sequencing of RNA molecules from a specific cell, tissue, or species. There are two main motivations for sequencing RNA:

  1. Identifying differentially expressed genes by comparing different samples.
  2. Capturing all RNA molecules (the transcriptome) of a given species.

When a species does not have a sequenced genome, the researcher has to do (2) before they can do (1).

(Haas and Zody, Advancing RNA-Seq analysis, Nature Biotechnology 28:421-423)

Microarrays are an alternative to RNA sequencing for identifying differentially expressed genes. The main limitations of microarray technologies are:

  1. Knowledge of the reference genome is required to design unique probes.
  2. Differential expression can only be measured for genes that are represented on the chip.
  3. GeneChips do not have enough resolution to distinguish differential expression of different isoforms of the same gene.

(Pepke S., Wold B., Mortazavi A., Computation for ChIP-seq and RNA-seq studies. Nature Methods 6:11, Nov 2009)

There are several different methods for normalization and for detecting differentially expressed genes. Wang et al. (Nature Biotechnology 2014) compared results from microarrays and RNA-seq across different methods, and their results show that genes with higher expression levels are detected by most methods.
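As a minimal illustration of why normalization has to come before any comparison, the Python sketch below scales hypothetical read counts to counts per million (CPM) and then computes a per-gene log2 fold change. The gene names, the counts, and the pseudocount of 1 are assumptions made for illustration; CPM is simply the most basic scaling method and is shown here only to make the idea concrete.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts by the library size (total reads) to counts per million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(values_a, values_b, pseudocount=1.0):
    """Per-gene log2(B / A); the pseudocount avoids log(0) and division by zero."""
    return {
        gene: math.log2((values_b[gene] + pseudocount) / (values_a[gene] + pseudocount))
        for gene in values_a
    }


# Hypothetical counts: sample_b was sequenced twice as deeply as sample_a,
# but the relative expression of every gene is the same.
sample_a = {"geneA": 500, "geneB": 300, "geneC": 200}
sample_b = {"geneA": 1000, "geneB": 600, "geneC": 400}

lfc = log2_fold_change(counts_per_million(sample_a), counts_per_million(sample_b))
for gene, value in sorted(lfc.items()):
    print(f"{gene}: log2FC = {value:.2f}")  # prints 0.00 for every gene
```

Passing the raw dictionaries straight to `log2_fold_change` instead gives values close to 1 (roughly a two-fold change) for every gene, purely because of sequencing depth. Scaling by total counts can still be skewed when a few very highly expressed genes dominate a library, which is one reason methods such as DESeq estimate size factors differently.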

The workflow

  • Quality Control: FastQC

  • Quality Trimming: Trimmomatic

  • Alignment: TopHat2 & Bowtie2

  • Deduplication and Indexing: Picard
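The four steps above can be chained together. The following Python sketch only builds and prints the command line for each step (a dry run) for a single-end sample. The file names, the `results` directory, the jar locations, and the Bowtie2 index basename are hypothetical placeholders; the Trimmomatic settings are the commonly cited example values from its manual, not recommendations for this dataset. The sketch also assumes TopHat2's `accepted_hits.bam` is coordinate-sorted (its default), as Picard MarkDuplicates expects.

```python
from pathlib import Path

# Hypothetical placeholders: replace with your own reads, jar paths and index.
READS = "sample.fastq"
TRIMMOMATIC_JAR = "trimmomatic.jar"
PICARD_JAR = "picard.jar"
BOWTIE2_INDEX = "genome_index"  # basename of a prebuilt Bowtie2 index


def build_commands(reads, outdir="results"):
    """Return the command line for each workflow step as an argument list."""
    out = Path(outdir)
    trimmed = str(out / "sample.trimmed.fastq")
    tophat_dir = str(out / "tophat_out")
    aligned = str(Path(tophat_dir) / "accepted_hits.bam")  # TopHat2's default output name
    dedup = str(out / "sample.dedup.bam")
    metrics = str(out / "dup_metrics.txt")
    return [
        # 1. Quality control report on the raw reads
        ["fastqc", "-o", str(out), reads],
        # 2. Single-end quality trimming
        ["java", "-jar", TRIMMOMATIC_JAR, "SE", "-phred33", reads, trimmed,
         "LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36"],
        # 3. Spliced alignment; TopHat2 uses Bowtie2 as its aligner
        ["tophat2", "-o", tophat_dir, BOWTIE2_INDEX, trimmed],
        # 4a. Mark duplicates and drop them from the output BAM
        ["java", "-jar", PICARD_JAR, "MarkDuplicates",
         f"I={aligned}", f"O={dedup}", f"M={metrics}", "REMOVE_DUPLICATES=true"],
        # 4b. Index the deduplicated BAM
        ["java", "-jar", PICARD_JAR, "BuildBamIndex", f"I={dedup}"],
    ]


if __name__ == "__main__":
    # Dry run: print each step instead of executing it.
    for cmd in build_commands(READS):
        print(" ".join(cmd))
```

The `results` directory must exist before FastQC writes to it. To execute a step rather than print it, each list can be passed to `subprocess.run(cmd, check=True)`; keeping commands as argument lists avoids shell quoting problems.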

DESeq Example

Required R/Bioconductor packages

  • Rsamtools
  • GenomicFeatures
  • GenomicAlignments
  • DESeq
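DESeq normalizes samples with per-sample size factors rather than raw library sizes. Below is a minimal Python sketch of the median-of-ratios idea: each gene's count is compared with that gene's geometric mean across all samples, and a sample's size factor is the median of those ratios, computed on the log scale. Genes with a zero count in any sample are skipped because their geometric mean is zero. This illustrates the technique under those assumptions; it is not the DESeq package code, and the count matrix is made up.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows, one per gene, each a list of raw read counts per sample.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Skip genes with a zero in any sample: log(0) is undefined, so their
    # geometric mean cannot serve as a reference.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples (the pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the pseudo-reference, per gene.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


if __name__ == "__main__":
    # Hypothetical counts: sample 2 is sequenced twice as deeply as sample 1.
    counts = [
        [10, 20],
        [50, 100],
        [200, 400],
        [0, 5],  # excluded from size-factor estimation (zero in sample 1)
    ]
    sf = size_factors(counts)
    print([round(f, 4) for f in sf])  # → [0.7071, 1.4142]
    for row in normalize(counts, sf):
        print([round(v, 2) for v in row])
```

After dividing by the size factors, each of the first three genes has the same normalized count in both samples, so the two-fold difference caused by sequencing depth disappears.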

Dataset to download

https://drive.google.com/drive/folders/0B172nc4dAaaObEhfZkVxaUFLY28?usp=sharing