Analyzing the data supplied with Seurat is a great way of understanding its functions and versatility, but ultimately, the goal is to be able to analyze your own data.
We often find that the biggest hurdle in adopting a new package or tool in R is loading your own data rather than the supplied data. Here we give you some code snippets that allow you to do just that.
Remember that Seurat has specific functions for different scRNA-seq technologies, but let’s say that the only data you have is a gene expression matrix. That is, a plain text file where each row represents a gene, each column represents a single cell, and each entry is the raw count for that gene in that cell. You can load it in Seurat like this:
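# Read the count matrix; the file name is a placeholder, and we assume gene names are in the first column and cell names in the header row
raw_counts <- read.table(file = "my_raw_counts.csv", sep = ",", header = TRUE, row.names = 1)
mydata <- CreateSeuratObject(raw.data = raw_counts, min.cells = 3, min.genes = 200, project = "mydata_scRNAseq")
mito.genes <- grep(pattern = "^MTN", x = rownames(x = mydata@data), value = TRUE)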
Note that we specified the separator as being a “,”. You should change this to “\t” if your file is tab-delimited.
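To make the separator point concrete, here is a minimal sketch in base R that does not need Seurat. The toy gene and cell names are made up for illustration, and the temporary files stand in for your own count file. It writes the same small matrix once as comma-separated and once as tab-separated text, then reads both back.

```r
# Toy data standing in for a real count file: 3 genes (rows) x 2 cells (columns).
# Gene and cell names are invented for this example.
toy <- data.frame(cell_1 = c(5, 0, 2), cell_2 = c(1, 3, 0),
                  row.names = c("GeneA", "GeneB", "GeneC"))

# Write the same table once comma-separated and once tab-separated.
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")
write.table(toy, file = csv_file, sep = ",", quote = FALSE, col.names = NA)
write.table(toy, file = tsv_file, sep = "\t", quote = FALSE, col.names = NA)

# Only the `sep` argument changes between the two formats.
counts_csv <- as.matrix(read.table(csv_file, sep = ",", header = TRUE, row.names = 1))
counts_tsv <- as.matrix(read.table(tsv_file, sep = "\t", header = TRUE, row.names = 1))

# Both calls should recover the same genes-by-cells matrix.
print(identical(counts_csv, counts_tsv))
print(dim(counts_csv))
```

If the dimensions look wrong after loading your own file (for example, a single column), the separator is the first thing to check.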
Also note that we did not supply any additional files, so there is no UMI information. In this case, the nUMI plot reflects the total molecule counts in each cell rather than UMI molecule counts.
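The per-cell total described here is the sum of each column of the count matrix. As a rough illustration in base R, assuming a small made-up matrix in place of your own data:

```r
# Toy raw count matrix: 4 genes (rows) x 3 cells (columns); values are made up.
raw_counts <- matrix(c(5, 0, 2, 1,
                       1, 3, 0, 0,
                       0, 0, 7, 2),
                     nrow = 4,
                     dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                                     c("cell_1", "cell_2", "cell_3")))

# Total counts per cell: the sum down each column.
total_counts <- colSums(raw_counts)
print(total_counts)
```

Comparing these sums with the values Seurat plots is a quick sanity check that the matrix was loaded with genes as rows and cells as columns.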
Finally, when detecting mitochondrial genes, we changed the pattern to “^MTN” because this is how mitochondrial genes are named in our file. Again, you will need to change this to match the naming in your own file.
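The sketch below shows how such a pattern selects genes and how the mitochondrial fraction per cell follows from it. It uses base R only; the gene names (including the "MTN-" prefixed ones) and counts are hypothetical, and `mito_pattern` is the value you would adapt to your own file.

```r
# Toy matrix: two hypothetical mitochondrial genes plus two other genes, in 2 cells.
raw_counts <- matrix(c(10, 5, 3, 2,
                        8, 0, 1, 1),
                     nrow = 4,
                     dimnames = list(c("MTN-ND1", "MTN-CO1", "GeneA", "GeneB"),
                                     c("cell_1", "cell_2")))

# Pattern matching the start of mitochondrial gene names; change to suit your file.
mito_pattern <- "^MTN"
mito_genes <- grep(pattern = mito_pattern, x = rownames(raw_counts), value = TRUE)

# Fraction of each cell's counts that come from the matched genes.
# drop = FALSE keeps a matrix even if only one gene matches.
percent_mito <- colSums(raw_counts[mito_genes, , drop = FALSE]) / colSums(raw_counts)

print(mito_genes)
print(percent_mito)
```

If `mito_genes` comes back empty, the pattern does not match your gene names; inspecting a few row names usually shows which prefix to use.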
Question: What if I don’t have a gene expression matrix, and instead I have individual counts per single cell, where the genes (rows) are named using their gene IDs (such as ENSEMBL gene IDs)?
Answer: In this case, head over to our online TSAR resource and click on the “merge counts” tab. From there, you can load the files representing all your cells, select the “replace with gene names” option, and download the merged file. It will do the merging and renaming for you automatically.
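To illustrate what merging and renaming involve, here is a small base R sketch. It is not how TSAR is implemented; the per-cell tables, their column names, and the ID-to-name lookup are all assumptions made up for this example. In practice the lookup would come from an annotation file.

```r
# Hypothetical per-cell count tables: one data frame per cell, keyed by gene ID.
cell_1 <- data.frame(gene_id = c("ENSG00000141510", "ENSG00000012048"),
                     count = c(4, 0), stringsAsFactors = FALSE)
cell_2 <- data.frame(gene_id = c("ENSG00000141510", "ENSG00000012048"),
                     count = c(1, 7), stringsAsFactors = FALSE)
per_cell <- list(cell_1 = cell_1, cell_2 = cell_2)

# Rename each count column after its cell, then merge all tables on gene_id.
renamed <- Map(function(df, name) setNames(df, c("gene_id", name)),
               per_cell, names(per_cell))
merged <- Reduce(function(a, b) merge(a, b, by = "gene_id", all = TRUE), renamed)
merged[is.na(merged)] <- 0  # a gene absent from one file gets a zero count

# Illustrative lookup from Ensembl gene ID to gene name.
id_to_name <- c(ENSG00000141510 = "TP53", ENSG00000012048 = "BRCA1")

# Build the genes-by-cells matrix and label rows with gene names.
counts <- as.matrix(merged[, -1])
rownames(counts) <- unname(id_to_name[as.character(merged$gene_id)])
print(counts)
```

Note that `merge` sorts rows by gene ID, so the row order of the result can differ from the order in the input files.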
Analyze a different dataset in Seurat using the methods in the tutorial
Now is the moment of truth! Here we supply a publicly available dataset from 10x Genomics. Using what you have learned in the previous sections, reanalyze this data, filter it according to what you observe, and summarize your results.
- The data is available by following THIS link.
- 1k Brain Cells from an E18 Mouse
- Chromium Demonstration (v2 Chemistry) Dataset by Cell Ranger 2.1.0
- Cells from a combined cortex, hippocampus and subventricular zone of an E18 mouse.
- 931 cells detected
- Sequenced on Illumina HiSeq2500 with approximately 56,000 reads per cell
- 26bp read1 (16bp Chromium barcode and 10bp UMI), 98bp read2 (transcript), and 8bp I7 sample barcode
- Analysis run with --cells=2000
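One thing to watch for with this dataset: it is from mouse, and mouse gene symbols are usually written differently from human ones (for example mt-Nd1 rather than MT-ND1), so the mitochondrial pattern may need adjusting. The sketch below uses base R and a handful of example gene names. It assumes the mitochondrial genes in the downloaded matrix carry a lowercase "mt-" prefix, which you should confirm by inspecting the row names of your own object.

```r
# A few example mouse gene symbols; the real matrix will contain many more.
gene_names <- c("mt-Nd1", "mt-Co1", "Actb", "Gapdh")

# An uppercase, human-style pattern finds nothing because grep is case-sensitive.
human_style <- grep(pattern = "^MT-", x = gene_names, value = TRUE)

# A lowercase pattern matches the mouse-style names.
mouse_style <- grep(pattern = "^mt-", x = gene_names, value = TRUE)

# Alternatively, ignore case so either naming style is matched.
either_style <- grep(pattern = "^mt-", x = gene_names, value = TRUE, ignore.case = TRUE)

print(length(human_style))
print(mouse_style)
```

An empty result from the mitochondrial pattern would make every cell appear to have zero mitochondrial content, so it is worth checking the match before filtering.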