/home2 - Home directories only. Configuration, code etc. Not for active data analysis. Mirror backup twice-weekly on Mon and Wed.
/project - Large space, high-performance for large files. Not for working with large numbers of small files. Archive large collections of small files (<1MB each) and avoid working on very small files (<100KB). No backup by default. Incremental backup is available: PIs should email biohpc-help@utsouthwestern.edu to request that content on /project be backed up to /project1.
/work - This is also a high-performance filesystem, intended for LIVE HOT data since our recent upgrade. When using /work, you do not need to stripe large single files for performance as you do on /project. Each user has 5 TB of space. /work is mirror backed up once per week (Friday/Saturday, no old versions).
/archive - This is a place for users to store COLD data. Each lab has 5TB of space by default. Quota can be increased upon approval. Accounting usage will be 2/3 of actual usage. The /archive filesystem has a directory tree similar to that of /project.
Overall, single-threaded writes to /work or /archive can reach up to 2.3 GB/s, slightly faster than /project, while metadata queries are slightly slower. For most applications, you will not notice a performance difference among /work, /archive and /project.
** For applications that need to read large files from multiple threads concurrently (e.g. sequencing applications reading a large reference database), /work or /archive are better choices than /project, since the I/O throughput is served by more disk arrays.
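The advice above to archive large collections of small files before keeping them on /project can be sketched as follows. This is only an illustration: the choice of tar with gzip compression is an assumption (the guide does not name a tool), and the function name archive_small_files and the example paths are hypothetical.

```shell
# Hypothetical helper: bundle a directory of small files into one archive
# so that /project holds a single large file instead of many tiny ones.
# Assumption: tar with gzip is acceptable; the guide does not name a tool.
archive_small_files() {
  src_dir=$1    # directory containing the small files
  archive=$2    # path of the .tar.gz file to create
  if [ ! -d "$src_dir" ]; then
    echo "not a directory: $src_dir" >&2
    return 1
  fi
  # -C changes to the parent directory so the archive stores relative paths.
  tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Example (hypothetical paths):
#   archive_small_files /work/mylab/run42_logs /project/mylab/run42_logs.tar.gz
```

Listing the archive with tar -tzf before deleting the originals is a cheap way to confirm that everything was captured.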
/home2 - 50GB per user
/project - 5TB+ for the lab as agreed with PI/department chair (default 5TB for each new lab)
/work - 5TB per user (but not for long term storage)
/archive - 5TB+ for the lab as agreed with PI/department chair (default 5TB for each new lab)
Quota stats show soft and hard limits. You need to keep within the soft limit. The hard limit only exists to give a margin of safety so that jobs generating more data than expected do not fail.
The biohpc_quota command shows your quota status on each filesystem:
To see individual usage for a user on /project, use the lfs quota command:
lfs quota -u <username> -h /project
The /project filesystem is a parallel filesystem consisting of 40 storage targets. By default each file is stored on a single target. Each target can provide read speeds of up to 1GB/s depending on use.
Faster speeds can be achieved for very large files by striping the file across multiple targets. Most software can’t read files fast enough to benefit from striping – but some can. If you have many processes all reading from a single file then striping can also help improve the aggregate speed.
Some important rules:
NEVER use a stripe count of more than 8 – usually no benefit, and it slows things down for others.
ONLY stripe large files. Striping files <1GB will increase the load on the system with no real benefit.
ONLY use the -c stripe count option for setstripe. Never change stripe index or stripe size!
Try to set striping on directories – and keep large and small files separate so you can do this.
When you set striping on a directory it only applies to new files in that directory. To apply striping to old files you must copy (not move) them inside the directory that has striping set.
To set striping for a directory:
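A minimal sketch, assuming /project is a Lustre filesystem (as the lfs quota command suggests) and using only the -c option of lfs setstripe, in line with the rules above. The wrapper name set_dir_stripe and the example path are hypothetical; the check that rejects counts above 8 simply encodes the rule stated earlier.

```shell
# Hypothetical wrapper around "lfs setstripe -c" for a directory.
# It refuses stripe counts outside 1..8, following the rules above.
set_dir_stripe() {
  dir=$1      # directory whose new files should be striped
  count=$2    # stripe count to apply
  case $count in
    ''|*[!0-9]*)
      echo "stripe count must be a whole number" >&2
      return 1
      ;;
  esac
  if [ "$count" -lt 1 ] || [ "$count" -gt 8 ]; then
    echo "stripe count must be between 1 and 8" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # Only the stripe count is set; stripe index and stripe size are left alone.
  lfs setstripe -c "$count" "$dir"
}

# Example (hypothetical path):
#   set_dir_stripe /project/mylab/big_files 4
```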
To see striping settings for a directory or file:
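A minimal sketch, again assuming a Lustre filesystem: lfs getstripe reports the striping layout of the path it is given. The wrapper name show_striping and the example paths are hypothetical.

```shell
# Hypothetical wrapper around "lfs getstripe" for a file or directory.
show_striping() {
  target=$1    # file or directory to inspect
  if [ ! -e "$target" ]; then
    echo "no such file or directory: $target" >&2
    return 1
  fi
  lfs getstripe "$target"
}

# Examples (hypothetical paths):
#   show_striping /project/mylab/big_files
#   show_striping /project/mylab/big_files/reference.fa
```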
To apply striping to an existing file in a directory:
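A minimal sketch of the copy-based approach described above: copying a file inside a striped directory creates a new file that picks up the directory's striping, and the copy can then be renamed over the original. The function name restripe_in_place and the temporary file suffix are hypothetical, and the sketch assumes striping has already been set on the containing directory.

```shell
# Hypothetical helper: rewrite a file so it picks up the striping of the
# directory it lives in. The file is copied (not moved), because only newly
# created files inherit the directory's stripe settings.
restripe_in_place() {
  file=$1
  tmp="$file.restripe.$$"    # temporary name in the same directory
  if [ ! -f "$file" ]; then
    echo "not a regular file: $file" >&2
    return 1
  fi
  if [ -e "$tmp" ]; then
    echo "temporary name already exists: $tmp" >&2
    return 1
  fi
  # Replace the original only if the copy completed successfully.
  cp -p "$file" "$tmp" && mv "$tmp" "$file"
}

# Example (hypothetical path):
#   restripe_in_place /project/mylab/big_files/reference.fa
```

The copy temporarily needs as much free space as the file itself, so check your quota first for very large files.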
BE CAREFUL – make sure you are certain you don’t overwrite the wrong thing. It can be safer to create a new directory and copy files into it.
How many Stripes?
The following general rules are appropriate for our storage system:
Stripe count | When to use
1 | Default. Any file that doesn't fit the criteria below.
2 | Moderate size files (2-10GB) that are read by 1-2 concurrent processes.
4 | Moderate size files (2-10GB) that are read by 3+ concurrent processes regularly; large files (10GB+) that are read by 1-2 concurrent processes.
8 | Large files (10GB+) that are read by 3+ concurrent processes regularly; any very large files (200GB+), to balance storage target usage.
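The rules above can be sketched as a small lookup function. This is an illustration rather than a site tool: the name recommend_stripe_count is hypothetical, sizes are given in whole GB, and treating a file of exactly 10GB as "large" is an assumption, since the rules list 10GB under both ranges.

```shell
# Hypothetical helper: suggest a stripe count from file size and the number
# of processes that regularly read the file at the same time.
# Assumption: exactly 10GB counts as a large file.
recommend_stripe_count() {
  size_gb=$1    # file size in whole GB
  readers=$2    # number of concurrent reading processes
  if [ "$size_gb" -ge 200 ]; then
    echo 8      # very large files: balance storage target usage
  elif [ "$size_gb" -ge 10 ]; then
    if [ "$readers" -ge 3 ]; then echo 8; else echo 4; fi
  elif [ "$size_gb" -ge 2 ]; then
    if [ "$readers" -ge 3 ]; then echo 4; else echo 2; fi
  else
    echo 1      # default: anything under 2GB stays on a single target
  fi
}

# Example: a 50GB file read by 2 processes
#   recommend_stripe_count 50 2
```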
Remember – performance is very good even without striping. You only have to worry about striping at all if you have a real need to increase performance, or are storing files that are 100s of GBs in size.