As mentioned in the introduction, this will be a guided walk-through of the online Seurat tutorial, so first, we will download the raw data available here.
Unzip the file and remember where you saved it (you will need to supply the path to the data next).
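Before moving on, it can help to confirm that the unzipped folder really contains what the import step expects. The snippet below is a minimal sketch, not part of Seurat: `missing_10x_files` is a hypothetical helper, and the three file names are an assumption based on the usual layout of this 10X download (`barcodes.tsv`, `genes.tsv`, `matrix.mtx`). The path is also an assumption; replace it with wherever you saved the data.

```r
# Hypothetical helper (not a Seurat function): return the names of the
# expected 10X files that are NOT present in the given directory.
missing_10x_files <- function(data.dir) {
  # Assumed file names for this download; adjust if your folder differs.
  expected <- c("barcodes.tsv", "genes.tsv", "matrix.mtx")
  expected[!file.exists(file.path(data.dir, expected))]
}

# Assumed location of the unzipped data; change to your own path.
data.dir <- "~/Downloads/filtered_gene_bc_matrices/hg19"

missing <- missing_10x_files(data.dir)
if (length(missing) > 0) {
  message("Missing from ", data.dir, ": ", paste(missing, collapse = ", "))
} else {
  message("All expected files found in ", data.dir)
}
```

If any file is reported missing, the most likely cause is that the path points one folder too high or too low in the unzipped directory tree.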
Next, in RStudio, we will load the appropriate libraries and import the raw data. We will also reduce memory usage (important when dealing with large datasets) by storing the counts as a sparse matrix, which Seurat supports:
library(Seurat)
library(dplyr)
library(Matrix)
# Load the PBMC dataset
pbmc.data <- Read10X(data.dir = "~/Downloads/filtered_gene_bc_matrices/hg19/")
# Examine the memory savings between regular and sparse matrices
dense.size <- object.size(x = as.matrix(x = pbmc.data))
dense.size
sparse.size <- object.size(x = pbmc.data)
sparse.size
dense.size/sparse.size
Comparing the dense and sparse sizes shows the memory savings: the ratio dense.size/sparse.size tells us how many times more memory the dense matrix needs than the sparse one.
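To see why the savings arise without loading the PBMC data, the same comparison can be run on a small synthetic matrix. This is only a sketch: the dimensions, the 5% density, and the Poisson counts are made-up assumptions chosen to resemble a mostly-zero count matrix, and the exact ratio will differ for real data.

```r
library(Matrix)

set.seed(1)

# Toy count matrix: 2,000 genes x 500 cells with about 5% non-zero entries.
# Dimensions, density and count distribution are illustrative assumptions.
toy.sparse <- rsparsematrix(
  nrow = 2000, ncol = 500, density = 0.05,
  rand.x = function(n) rpois(n, lambda = 2) + 1  # strictly positive counts
)

# The same values stored densely: every zero now takes up memory too.
toy.dense <- as.matrix(toy.sparse)

sparse.size <- object.size(toy.sparse)
dense.size <- object.size(toy.dense)

# How many times larger the dense representation is
as.numeric(dense.size) / as.numeric(sparse.size)
```

The sparse format only stores the non-zero values and their positions, so the fewer non-zero entries a matrix has, the larger this ratio becomes.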
Now we will initialize the Seurat object using the raw (non-normalized) data. At this point, it is a good idea to perform some initial prefiltering of the data. We do this at the gene and cell level by excluding any genes that are not expressed in at least 3 cells, and excluding any cells that do not have a minimum of 200 detected genes. We also name our project “10X_PBMC”.
pbmc <- CreateSeuratObject(raw.data = pbmc.data, min.cells = 3, min.genes = 200, project = "10X_PBMC")
Depending on your experiment and data, you might want to try different cutoffs. For example, you might want to raise the minimum number of detected genes if you have deep coverage, or not impose it at all if your cells have a very low number of reads.
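To get a feel for what the two cutoffs do, here is a minimal sketch of the same kind of filtering on a tiny made-up matrix in base R. It illustrates the idea only and is not Seurat's internal implementation: the toy values, the small thresholds, and the choice to compute both filters on the unfiltered matrix are assumptions, and Seurat's own ordering or boundary handling may differ.

```r
# Toy counts: 5 genes (rows) x 4 cells (columns); all values are made up.
counts <- matrix(
  c(1, 0, 0, 2,
    0, 0, 0, 3,
    4, 1, 0, 1,
    0, 2, 0, 1,
    5, 0, 0, 0),
  nrow = 5, ncol = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)

# Small thresholds so the effect is visible on a toy example
min.cells <- 2  # keep genes detected in at least this many cells
min.genes <- 2  # keep cells with at least this many detected genes

# A gene counts as "detected" in a cell if its count is above zero
cells.per.gene <- rowSums(counts > 0)
genes.per.cell <- colSums(counts > 0)

# Apply both filters, each computed on the unfiltered matrix
filtered <- counts[cells.per.gene >= min.cells,
                   genes.per.cell >= min.genes,
                   drop = FALSE]

cells.per.gene
genes.per.cell
dim(filtered)
```

Inspecting `cells.per.gene` and `genes.per.cell` on your own data (for example with `summary()` or `hist()`) before choosing cutoffs shows how many genes and cells each threshold would remove.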