Hard drives are slow. Whether we are talking about HDDs or SSDs matters less than what we compare them to, and in the context of high-performance computing that comparison is memory (RAM). To put this into perspective, here are some common approximate timings alongside a more intuitive “human scale”, in which every duration is scaled up by a factor of about 5 × 10⁸ so that a fast lookup (an L1 cache access of 2 ns) takes on the order of one second.
| What | Actual | Human scale |
|------|--------|-------------|
| Time for light to travel 30 cm | 1 ns | |
| L1 cache access | 2 ns | 1 s |
| One core to another | 70 ns | 35 s |
| Memory random access | 100 ns | 50 s |
| InfiniBand random access | 1 µs | 8 min |
| SSD random access | 100 µs | 14 h |
| Read 1 MB from memory | 250 µs | 1.4 d |
| HDD random access | 8 ms | 46 d |
| Read 1 MB from Lustre | 8 ms | 46 d |
| Read 1 MB from network | 10 ms | 57 d |
| Read 1 MB from disk | 30 ms | 0.5 yr |
Since many codes have a long history, they may expect to be able to write to disk without repercussions. With multicore machines becoming standard, however, it becomes exceedingly difficult (and expensive!) to provide a fast disk-based filesystem. Not every computing centre has the funding to set up a hierarchical file system composed of layers of memory (fast IO), SSD (fast, persistent) and HDD (cheap, persistent), so an alternative approach is needed that scales with the number of compute cores involved.
The solution is called a ramdisk, or memory-backed filesystem. It behaves like a regular disk, with the difference that its contents are stored in memory only and are never persisted to disk. The ephemeral nature of this storage means that it is mostly useful for temporary files that are frequently read from or written to, e.g. the checkpoint files written by Gaussian or the wavefunction files written by CP2K or Molpro. Even fast semiempirical codes like MOPAC can be sped up this way. For IO-heavy codes, the speed-up can be several orders of magnitude; in practice, one often obtains about a 20-fold increase and significantly more reliable performance.
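Since a ramdisk lives in RAM, its capacity is limited by the memory of the machine. On many Linux systems /dev/shm is a tmpfs mount that defaults to roughly half of the physical RAM; you can check its size and current usage with standard tools (a minimal sketch, assuming a Linux system with /dev/shm mounted):
$ df -h /dev/shm          # size, used and available space of the ramdisk
$ du -sh /dev/shm/*       # how much each directory on it currently occupies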
How to use it
Modern machines should offer /dev/shm as a virtual device. Using a ramdisk for your operations is easy: create a folder on /dev/shm/ named e.g. after your username and use it as a temporary directory just like you would use a scratch directory. Please make sure to delete any files you no longer need, because otherwise they will occupy memory until the machine is rebooted. If you use this mechanism in a compute cluster environment, keep in mind that you need to request the memory for the ramdisk on top of the memory requirements of your code; only then is it guaranteed that no other code crashes as a side effect of your actions.
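As a minimal sketch of how this could look inside a batch job, assuming a Slurm-managed cluster and a Gaussian calculation that reads its scratch location from the GAUSS_SCRDIR environment variable (the input and output file names are hypothetical; adapt paths and variables to your own scheduler and code):
$ mkdir -p /dev/shm/$USER/$SLURM_JOB_ID                 # per-user, per-job ramdisk directory
$ export GAUSS_SCRDIR=/dev/shm/$USER/$SLURM_JOB_ID      # point Gaussian's scratch files at the ramdisk
$ g16 < input.com > output.log                          # run the calculation
$ rm -rf /dev/shm/$USER/$SLURM_JOB_ID                   # free the memory once the job is done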
Should you work with legacy code that requires static paths for temporary storage, you can use symlinks. This is a Unix feature that lets individual files or directories point to a different location, possibly on another file system. In this case it means that, to userspace applications, the folder appears to be on disk while it actually resides in memory. To do that, first create a directory in memory and then replace the on-disk directory with a link to it:
$ mkdir /dev/shm/DEMO
$ cd /path/to/folder/containing/tempdir
$ rm -rf name_of_tempdir          # remove the old on-disk directory (and its contents)
$ ln -s /dev/shm/DEMO name_of_tempdir
The last line actually creates the symlink. Note that deleting the symlink does not delete the contents held in memory; those you need to delete from /dev/shm separately.
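To verify that the redirection works and to clean up afterwards, something along these lines should do (a sketch, using the placeholder names DEMO and name_of_tempdir from above):
$ ls -l name_of_tempdir           # shows "name_of_tempdir -> /dev/shm/DEMO"
$ du -sh /dev/shm/DEMO            # how much memory the temporary files currently occupy
$ rm name_of_tempdir              # removes only the symlink ...
$ rm -rf /dev/shm/DEMO            # ... the in-memory contents must be deleted separately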
Many small writes or reads are a problem
The rate of individual read or write operations is measured in IOPS (input/output operations per second). The following numbers give an idea of why hard drives become so slow when accessing many small files:
| Device | Approximate IOPS |
|--------|------------------|
| Consumer-grade HDD | 50 |
| Enterprise HDD | 200 |
| Consumer-grade SSD | 20,000 |
| Enterprise SSD | 400,000 |
| Memory | 10,000,000 |
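If you want a rough feel for these numbers on your own system, the fio benchmark tool can measure random-read IOPS directly. The invocation below is a minimal sketch, assuming fio is installed and that you may write a 1 GB test file to the target directory; point --directory at /dev/shm, a local SSD or an HDD to compare them:
$ fio --name=randread --rw=randread --bs=4k --size=1G --directory=/dev/shm --runtime=30 --time_based --group_reporting
Remember to delete the test file fio leaves behind in the target directory afterwards, especially on the ramdisk.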
This shows that such workloads (for example post-processing of QM data or parsing of log files) work best when the data is kept in memory. This is particularly true if you plan to follow a random access pattern (read: jumping between log lines), because disks are much better suited to sequential reads (all lines in order). Therefore, it is in many cases significantly faster to copy the data to a ramdisk first, analyse it there, and write the results back to disk for persistence.
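A minimal sketch of such a staging workflow, assuming a large log file under a hypothetical /scratch/$USER path and a post-processing script of your own:
$ mkdir -p /dev/shm/$USER/analysis
$ cp /scratch/$USER/calculation.log /dev/shm/$USER/analysis/                                # one sequential read from disk
$ ./postprocess.py /dev/shm/$USER/analysis/calculation.log > /scratch/$USER/results.txt    # random access now hits memory
$ rm -rf /dev/shm/$USER/analysis                                                            # free the memory again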
Caveat: All numbers reported on this page are guidance only, not hard numbers. Benchmark results depend heavily on the environment and the problem at hand. The numbers are, however, typical for many cases and serve as an illustration.