Seurat on BioHPC
Seurat, from the Satija lab (https://satijalab.org/seurat/), is a popular R toolkit for single-cell genomics. Because Seurat is a complex package with many dependencies, it can be difficult to install.
In addition, Seurat 2.3 supports dimensionality reduction with UMAP. Unfortunately this doesn't happen inside R, but requires the umap-learn package for Python. The R reticulate library, which provides the R-to-Python bridge, may not find umap-learn installed in the Anaconda Python environments used on BioHPC. Even if it is found, there can still be conflicts between dependencies in the conda environment and the cluster R modules. If you need UMAP functionality in Seurat, use the seurat/2.3.4 module, which has been built specifically to support it.
There are three ways to use Seurat on BioHPC:
RStudio OnDemand with Seurat (no UMAP)
BioHPC's OnDemand service offers an easy way to launch an RStudio session from the web, which will run on an exclusively allocated cluster node, and allow you to connect by clicking a link in your browser. Seurat is available in the R 3.5.1 OnDemand environment.
Note: the OnDemand session does not currently support the UMAP functionality of Seurat. BioHPC expects to enable this in January 2019.
- Visit https://portal.biohpc.swmed.edu in your web browser, and navigate to the 'BioHPC OnDemand -> RStudio OnDemand' page.
- Choose the 'R 3.5.1 with Seurat' option in the 'Job type' dropdown menu and click 'Launch Job'.
- Once the job launches, click the web link to connect to an RStudio session, log in with the password shown on the OnDemand launch page, and begin working with R and Seurat.
seurat/2.3.4 cluster module (supports UMAP)
To allow use of UMAP functionality in Seurat we have built a seurat/2.3.4 module that you can access via module load seurat/2.3.4. This module provides Seurat inside a Singularity container, where R, Seurat, Python, and umap-learn have all been set up to work together.
After running module load seurat/2.3.4 you can then:
- Start an R CLI session using the command
- Start an rstudio desktop GUI session using the command
Important things to note about this environment:
- Although R / rstudio run inside a container, you can access all of your cluster files.
- The R installation in the container is 3.5.1
- The R installation does not share a personal package library with any other R modules on the cluster.
- You can install packages using install.packages and they will be put in a location ~/R/module-seurat-2.3.4 which is specific to this seurat/2.3.4 containerized environment only.
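To see which packages have been installed into the module-specific library, you can inspect that directory from the shell. This is a minimal sketch: the helper name list_seurat_module_packages is hypothetical, and it assumes packages sit directly under the location named above, one subdirectory per package, which may not match how the module actually lays out the library.

```shell
#!/bin/sh
# Hypothetical helper: list what is in the seurat/2.3.4 personal library.
# Assumption: the library is the directory named in the text above, and
# each installed package appears as one subdirectory of it.
list_seurat_module_packages() {
    lib_dir="${1:-$HOME/R/module-seurat-2.3.4}"
    if [ -d "$lib_dir" ]; then
        ls -1 "$lib_dir"
    else
        echo "No personal library found at $lib_dir"
    fi
}

list_seurat_module_packages
```

If the directory does not exist yet, nothing has been installed with install.packages from inside this module.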
The module allows you to use Seurat with UMAP from:
- A command line session on a BioHPC machine
- A Slurm batch job
- A webGUI web visualization session
- A workstation/thin-client or webDesktop
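For the Slurm batch job case, a job script might look roughly like the one written out below. This is only a sketch under assumptions: the job name, time limit, output file name, and analysis.R are hypothetical placeholders, and it assumes that Rscript is available on the PATH once the module is loaded. You may also need to add a partition option appropriate for your BioHPC allocation.

```shell
#!/bin/sh
# Write a minimal Slurm job script to seurat_job.sh.
# All #SBATCH values and the analysis.R file name are placeholders.
cat > seurat_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=seurat-umap
#SBATCH --nodes=1
#SBATCH --time=02:00:00
#SBATCH --output=seurat-%j.out

# Load the containerized Seurat environment described above.
module load seurat/2.3.4

# Assumption: the module puts Rscript on the PATH.
Rscript analysis.R
EOF

# Submit from a BioHPC machine with:
#   sbatch seurat_job.sh
```

Slurm replaces %j in the output file name with the job ID, so each run writes its own log.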
Install Seurat into a personal library (no UMAP)
If you wish to install Seurat yourself into a personal library, to work with the existing R/3.5.1-gccmkl modules, you can do so, but you will not be able to use the UMAP functionality, because R's reticulate cannot find umap-learn in the Anaconda Python environments on BioHPC.
To install Seurat, follow this procedure at the nucleus login node in an ssh session:
$ module add hdf5_18/1.8.17
$ module add R/3.4.1-gccmkl
$ R
> install.packages("Seurat")
(You can substitute R/3.5.1-gccmkl if you wish to use R 3.5 rather than R 3.4.)
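After the installation finishes, you may want to confirm that Seurat loads from your personal library. The sketch below assumes the R module you loaded provides an Rscript executable; check_seurat is a hypothetical helper name, not something provided by BioHPC.

```shell
#!/bin/sh
# Hypothetical helper: report whether Seurat loads in the R currently on
# the PATH. Assumption: the loaded R module provides Rscript.
check_seurat() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found; load an R module first"
        return 1
    fi
    # Prints the installed Seurat version, or fails if the package is missing.
    Rscript -e 'library(Seurat); cat("Seurat", as.character(packageVersion("Seurat")), "\n")'
}

check_seurat || true
```

Run it in the same session in which you loaded the hdf5 and R modules, so that the same R installation and personal library are used.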
Questions / Comments
Please email firstname.lastname@example.org if you have questions, issues, or comments regarding Seurat on BioHPC.