Over-Representation Analysis with ClusterProfiler

Install and load packages

#BiocManager::install("clusterProfiler", version = "3.8")


I’m using D melanogaster data, so I install and load the annotation “org.Dm.eg.db” below. See all annotations available here: http://bioconductor.org/packages/release/BiocViews.html#___OrgDb (there are 19 presently available).

organism = "org.Dm.eg.db"
#BiocManager::install(organism, character.only = TRUE)
library(organism, character.only = TRUE)

Prepare Input

# reading in input from deseq2
df = read.csv("drosphila_example_de.csv", header=TRUE)

# we want the log2 fold change 
original_gene_list <- df$log2FoldChange

# name the vector
names(original_gene_list) <- df$X

# omit any NA values 

# sort the list in decreasing order (required for clusterProfiler)
gene_list = sort(gene_list, decreasing = TRUE)

# Exctract significant results (padj < 0.05)
sig_genes_df = subset(df, padj < 0.05)

# From significant results, we want to filter on log2fold change
genes <- sig_genes_df$log2FoldChange

# Name the vector
names(genes) <- sig_genes_df$X

# omit NA values
genes <- na.omit(genes)

# filter on min log2fold change (log2FoldChange > 2)
genes <- names(genes)[abs(genes) > 2]

Create enrichGO object


Ontology Options: [“BP”, “MF”, “CC”]
keyType This is the source of the annotation (gene ids). The options vary for each annotation. In the example of org.Dm.eg.db, the options are:


Check which options are available with the keytypes command, for example keytypes(org.Dm.eg.db).

Create the object

go_enrich <- enrichGO(gene = genes,
                      universe = names(gene_list),
                      OrgDb = organism, 
                      keyType = 'ENSEMBL',
                      readable = T,
                      ont = "BP",
                      pvalueCutoff = 0.05, 
                      qvalueCutoff = 0.10)


Upset Plot

Emphasizes the genes overlapping among different gene sets.



wcdf<-read.table(text=go_enrich$GeneRatio, sep = "/")[1]
wordcloud(words = wcdf$term, freq = wcdf$V1, scale=(c(4, .1)), colors=brewer.pal(8, "Dark2"), max.words = 25)


        drop = TRUE, 
        showCategory = 10, 
        title = "GO Biological Pathways",
        font.size = 8)



Encrichment map:

Enrichment map organizes enriched terms into a network with edges connecting overlapping gene sets. In this way, mutually overlapping gene sets are tend to cluster together, making it easy to identify functional modules.


Enriched GO induced graph:

goplot(go_enrich, showCategory = 10)

Category Netplot

The cnetplot depicts the linkages of genes and biological concepts (e.g. GO terms or KEGG pathways) as a network (helpful to see which genes are involved in enriched pathways and genes that may belong to multiple annotation categories).

# categorySize can be either 'pvalue' or 'geneNum'
cnetplot(go_enrich, categorySize="pvalue", foldChange=gene_list)