Over-Representation Analysis with ClusterProfiler

Install and load packages

#BiocManager::install("clusterProfiler", version = "3.8")
#BiocManager::install("pathview")
#install.packages("wordcloud")
library(clusterProfiler)
library(wordcloud)

Annotations

I’m using D melanogaster data, so I install and load the annotation “org.Dm.eg.db” below. See all annotations available here: http://bioconductor.org/packages/release/BiocViews.html#___OrgDb (there are 19 presently available).

organism = "org.Dm.eg.db"
#BiocManager::install(organism, character.only = TRUE)
library(organism, character.only = TRUE)

Prepare Input

# reading in input from deseq2
df = read.csv("drosphila_example_de.csv", header=TRUE)

# we want the log2 fold change 
original_gene_list <- df$log2FoldChange

# name the vector
names(original_gene_list) <- df$X

# omit any NA values 
gene_list<-na.omit(original_gene_list)

# sort the list in decreasing order (required for clusterProfiler)
gene_list = sort(gene_list, decreasing = TRUE)

# Exctract significant results (padj < 0.05)
sig_genes_df = subset(df, padj < 0.05)

# From significant results, we want to filter on log2fold change
genes <- sig_genes_df$log2FoldChange

# Name the vector
names(genes) <- sig_genes_df$X

# omit NA values
genes <- na.omit(genes)

# filter on min log2fold change (log2FoldChange > 2)
genes <- names(genes)[abs(genes) > 2]

Create enrichGO object

Params:

Ontology Options: [“BP”, “MF”, “CC”]
keyType This is the source of the annotation (gene ids). The options vary for each annotation. In the example of org.Dm.eg.db, the options are:

“ACCNUM” “ALIAS” “ENSEMBL” “ENSEMBLPROT” “ENSEMBLTRANS” “ENTREZID”
“ENZYME” “EVIDENCE” “EVIDENCEALL” “FLYBASE” “FLYBASECG” “FLYBASEPROT”
“GENENAME” “GO” “GOALL” “MAP” “ONTOLOGY” “ONTOLOGYALL”
“PATH” “PMID” “REFSEQ” “SYMBOL” “UNIGENE” “UNIPROT”

Check which options are available with the keytypes command, for example keytypes(org.Dm.eg.db).

Create the object

go_enrich <- enrichGO(gene = genes,
                      universe = names(gene_list),
                      OrgDb = organism, 
                      keyType = 'ENSEMBL',
                      readable = T,
                      ont = "BP",
                      pvalueCutoff = 0.05, 
                      qvalueCutoff = 0.10)

Output

Upset Plot

Emphasizes the genes overlapping among different gene sets.

#BiocManager::install("enrichplot")
library(enrichplot)
upsetplot(go_enrich)

Wordcloud

wcdf<-read.table(text=go_enrich$GeneRatio, sep = "/")[1]
wcdf$term<-go_enrich[,2]
wordcloud(words = wcdf$term, freq = wcdf$V1, scale=(c(4, .1)), colors=brewer.pal(8, "Dark2"), max.words = 25)

Barplot

barplot(go_enrich, 
        drop = TRUE, 
        showCategory = 10, 
        title = "GO Biological Pathways",
        font.size = 8)

Dotplot

dotplot(go_enrich)

Encrichment map:

Enrichment map organizes enriched terms into a network with edges connecting overlapping gene sets. In this way, mutually overlapping gene sets are tend to cluster together, making it easy to identify functional modules.

emapplot(go_enrich)

Enriched GO induced graph:

goplot(go_enrich, showCategory = 10)

Category Netplot

The cnetplot depicts the linkages of genes and biological concepts (e.g. GO terms or KEGG pathways) as a network (helpful to see which genes are involved in enriched pathways and genes that may belong to multiple annotation categories).

# categorySize can be either 'pvalue' or 'geneNum'
cnetplot(go_enrich, categorySize="pvalue", foldChange=gene_list)