Seurat on BioHPC
Seurat, from the Satija lab (https://satijalab.org/seurat/), is a popular R toolkit for single-cell genomics. Because Seurat is a complex package with many dependencies, it can be difficult to install.
In addition, Seurat 2.3 and above support dimensionality reduction with UMAP. Unfortunately, this is not implemented in R itself; it requires the umap-learn package for Python. The R reticulate library, which provides the R-to-Python bridge, may not find umap-learn when it is installed into the Anaconda Python environments used on BioHPC. Even if it is found, there can still be conflicts between dependencies in the conda environment and the cluster R modules. If you need to use UMAP functionality in Seurat, you can do so using the seurat/2.3.4 module, which has been built specifically to support this.
There are three ways to use Seurat on BioHPC:
RStudio OnDemand with Seurat (no UMAP)
BioHPC's OnDemand service offers an easy way to launch an RStudio session from the web, which will run on an exclusively allocated cluster node, and allow you to connect by clicking a link in your browser. Seurat is available in the R 3.5.1 OnDemand environment.
Note: the OnDemand session does not currently support the UMAP functionality of Seurat. BioHPC expects to enable this in January 2019.
- Visit https://portal.biohpc.swmed.edu in your web browser, and navigate to the 'BioHPC OnDemand -> RStudio OnDemand' page.
- Choose the 'R 3.5.1 with Seurat' option in the 'Job type' dropdown menu and click 'Launch Job'.
- Once the job launches, you can click the web link to connect to an RStudio session, log in with the password shown on the OnDemand launch page, and begin working with R and Seurat.
seurat/2.3.4 cluster module (supports UMAP)
To allow use of UMAP functionality in Seurat, we have built a seurat/2.3.4 module that you can access via module load seurat/2.3.4. This module provides Seurat inside a Singularity container, where R, Seurat, Python, and umap-learn have all been set up to work together.
After running module load seurat/2.3.4 you can then:
- Start an R CLI session using the command
- Start an rstudio desktop GUI session using the command
Important things to note about this environment:
- Although R / rstudio run inside a container, you can access all of your cluster files.
- The R installation in the container is 3.5.1
- The R installation does not share a personal package library with any other R modules on the cluster.
- You can install packages using install.packages and they will be put in a location, ~/R/module-seurat-2.3.4, which is specific to this seurat/2.3.4 containerized environment only.
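To check which packages have ended up in the module-specific library, you can list that directory from a shell. The following is a minimal sketch, not a BioHPC-provided tool: the function name `list_module_packages` is hypothetical, and it relies only on the fact that an R library directory holds one subdirectory per installed package.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the seurat/2.3.4 module): list the packages
# installed into the personal library used by the containerized R.
list_module_packages() {
    # Default to the library location described above; allow an override.
    local lib_dir="${1:-$HOME/R/module-seurat-2.3.4}"
    local pkg

    if [ ! -d "$lib_dir" ]; then
        echo "No personal library at $lib_dir (nothing installed yet)" >&2
        return 0
    fi

    # An R library contains one subdirectory per installed package.
    for pkg in "$lib_dir"/*/; do
        if [ -d "$pkg" ]; then
            basename "$pkg"
        fi
    done
    return 0
}
```

Running `list_module_packages` with no argument inspects `~/R/module-seurat-2.3.4`; passing a path inspects that directory instead.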
The module allows you to use Seurat with UMAP from:
- A command line session on a BioHPC machine
- A Slurm batch job
- A webGUI web visualization session
- A workstation/thin-client or webDesktop
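For the Slurm batch case, a job script needs to load the module before starting R. The sketch below writes such a script to a file so that it can be inspected before submission. Several details are assumptions rather than documented BioHPC behaviour: the resource values, the file names `seurat_job.sh` and `analysis.R`, and in particular that the `seurat-R` command accepts options and standard-input redirection the way plain `R` does. Check these against your own setup before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only: generate a Slurm job script that runs an R script with the
# containerized Seurat. File names and resource values are placeholders.
job_script="seurat_job.sh"

cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=seurat-umap
#SBATCH --nodes=1
#SBATCH --time=02:00:00
#SBATCH --output=seurat_%j.log

# Make the containerized R / Seurat / umap-learn stack available.
module load seurat/2.3.4

# ASSUMPTION: seurat-R accepts the same options and stdin redirection as
# plain R. analysis.R is a hypothetical script of your own.
seurat-R --no-save < analysis.R
EOF

echo "Wrote $job_script; submit it with: sbatch $job_script"
```

Writing the job script to a file first, rather than submitting directly, makes it easy to adjust the time limit or add a partition line appropriate to your allocation before running `sbatch`.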
Install Seurat into a personal library (no UMAP)
If you wish to install Seurat yourself into a personal library to work with the existing R/3.5.1-gccmkl modules, you can do so, but you will not be able to use the UMAP functionality, because R's reticulate cannot find umap-learn in the Anaconda Python environments on BioHPC.
To install Seurat, follow this procedure on the nucleus login node in an ssh session:
$ module add hdf5_18/1.8.17
$ module add R/3.4.1-gccmkl
$ R
> install.packages("Seurat")
(You can substitute R/3.5.1-gccmkl if you wish to use R 3.5 rather than R 3.4.)
Using a 3.0.0 pre-release or development version of Seurat
The 3.0.0 version of Seurat is in preparation for release, but it is usable and has features that may be of interest. The containerized seurat/2.3.4 module is set up so that you can upgrade to the 3.0.0 pre-release or the development branch of Seurat.
To update to the 3.0.0 pre-release
Every time you load the seurat/2.3.4 module and run seurat-R, you will now be using Seurat 3.0.
To update to the development branch
Every time you load the seurat/2.3.4 module and run seurat-R, you will now be using the Seurat development branch as it stood on the date that you ran these commands.
Returning to the 2.3.4 stable version
Installing packages inside seurat-R adds them to a personal R library in your home directory at ~/R/module-seurat-2.3.4, which is separate from any other R install. You can return to the stable 2.3.4 default version of Seurat in the container by removing this directory.
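Returning to the stable version amounts to deleting that one directory. A cautious sketch follows; the function name `reset_seurat_library` is hypothetical, and the only path it assumes is the library location given above. Note that any other packages you installed into that library are removed along with the upgraded Seurat.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the module-specific personal R library so that
# the container's built-in Seurat 2.3.4 is used again.
reset_seurat_library() {
    # Default to the library location described above; allow an override.
    local lib_dir="${1:-$HOME/R/module-seurat-2.3.4}"

    if [ -d "$lib_dir" ]; then
        # This deletes every package installed into this library.
        rm -rf -- "$lib_dir"
        echo "Removed $lib_dir"
    else
        echo "Nothing to remove: $lib_dir does not exist"
    fi
}
```

Calling `reset_seurat_library` with no argument targets `~/R/module-seurat-2.3.4`. The directory is recreated the next time you install a package inside the containerized R.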
Questions / Comments
Please email email@example.com if you have questions, issues, or comments regarding Seurat on BioHPC.