# Current Situation

## Hardware Stack
Currently planned node configuration:
- 2x Intel Xeon Ice Lake CPUs (26 cores, 2.20 GHz)
- 256 GB DDR4 RAM
- 8 GPUs

We are currently looking into the following GPU configuration (80% SP / 20% DP):
- 9 nodes with 8 Nvidia RTX 3080 each (10 GB GDDR6X, 8704 CUDA cores); alternatively Nvidia Quadro RTX 4000 (8 GB GDDR6, 2304 CUDA cores)
- 2 nodes with 8 Nvidia A100 each (40 GB HBM2)
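
The SP/DP split can be checked with a few lines of arithmetic. The following is a minimal Python sketch, assuming the list above describes 9 SP nodes and 2 DP nodes with 8 GPUs each; the names `NodeGroup` and `precision_share` are illustrative and not part of any planning tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """A group of identical GPU nodes (hypothetical helper type)."""
    gpu_model: str
    precision: str       # "SP" or "DP"
    nodes: int
    gpus_per_node: int
    gpu_memory_gb: int

    @property
    def gpu_count(self) -> int:
        return self.nodes * self.gpus_per_node


def precision_share(groups):
    """Return the fraction of all GPUs that falls into each precision class."""
    total = sum(g.gpu_count for g in groups)
    share = {}
    for g in groups:
        share[g.precision] = share.get(g.precision, 0.0) + g.gpu_count / total
    return share


# Figures taken from the planned configuration above.
PLANNED = [
    NodeGroup("RTX 3080", "SP", nodes=9, gpus_per_node=8, gpu_memory_gb=10),
    NodeGroup("A100", "DP", nodes=2, gpus_per_node=8, gpu_memory_gb=40),
]

if __name__ == "__main__":
    for g in PLANNED:
        total_memory = g.gpu_count * g.gpu_memory_gb
        print(f"{g.gpu_model}: {g.gpu_count} GPUs, {total_memory} GB GPU memory")
    for precision, fraction in precision_share(PLANNED).items():
        print(f"{precision}: {fraction:.0%}")
```

With these figures the sketch counts 72 SP GPUs and 16 DP GPUs, roughly 82% SP and 18% DP, which is in line with the 80/20 target.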

## Software Stack
Currently planned:
- Scientific Linux 7 (to ensure compatibility with the existing CPU cluster)
- In the long term, we plan to migrate to CentOS, since support for SL7 has been discontinued.

# Additional Information
Difference between single precision and double precision:  
[simple explanation](https://www.thecrazyprogrammer.com/2018/04/single-precision-vs-double-precision.html)  
[Comparison of NVIDIA Tesla/Quadro (DP) and NVIDIA GeForce (SP) GPUs](https://www.microway.com/knowledge-center-articles/comparison-of-nvidia-geforce-gpus-and-nvidia-tesla-gpus)

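Desktop GPUs like the RTX 2080 are single-precision (SP) cards, whereas professional ones like the V100 are double-precision (DP) cards. Consumer cards can still compute in double precision, but only at a small fraction of their single-precision throughput.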
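
To make the numerical difference concrete, here is a small sketch that uses only the Python standard library. It rounds a value to IEEE 754 single precision by packing it into 4 bytes and unpacking it again; the helper name `to_single` is made up for this example.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


if __name__ == "__main__":
    # 0.1 has no exact binary representation. Single precision keeps about
    # 7 significant decimal digits, double precision about 16.
    print(f"double: {0.1:.20f}")
    print(f"single: {to_single(0.1):.20f}")

    # Above 2**24, not every integer is representable in single precision.
    print(to_single(16_777_217.0) == 16_777_216.0)  # True
    # Double precision still tells the two values apart.
    print(16_777_217.0 == 16_777_216.0)  # False
```

Whether this loss of digits matters depends on the workload, which is why the requirements below state SP or DP per project.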

# Requirements

Section | Project Description | Hardware Requirements | Software Requirements | Comments
--- | --- | --- | --- | ---
copy | paste | this | line | !
2.4 | Deep learning for fast magnitude estimation of earthquakes | Single precision GPU with > 10 GB GPU memory, at least 200 GB main memory, 8 CPU cores | Up-to-date Nvidia driver; the rest works fine with conda (CUDA, TensorFlow, PyTorch) | I fear that 2 CPUs for 8 GPUs might be too few. The machine I'm currently running on has two Xeon Gold 5122 CPUs and four RTX 2080 Ti GPUs and is CPU-bound.
2.8 | Deep learning with solar images | Single precision GPU with > 20 GB GPU memory (Titan RTX, not 2080 Ti) | Nvidia driver. Module loads for standard Python software, virtualenv. Horovod required for multi-node computations | I suggest 4 GPUs per node, not 8.
1.3 | Machine learning for numerical modelling, data assimilation and data inversion | Double precision GPU with > 10 GB GPU memory (e.g., V100) | Nvidia driver, CUDA, Python, R, TensorFlow and Keras libs, NetCDF support (+ ncview), Climate Data Operators (CDO), tmux | --
2.4 | Full waveform inversion for seismic wave propagation simulation | Single precision GPU with around 20 GB GPU memory | Nvidia driver, conda, HDF5, NetCDF | --
3.6 | Atomistic simulation of (Geo-)materials | DP GPU preferred over SP | Nvidia driver, CUDA, Python, MKL | For us, at least 192 GB main memory, preferably more. Also, why not consider AMD EPYC instead of Xeon from a price/performance point of view?
1.4 | Machine learning for Sentinel-1 SAR data | Double precision GPU with > 10 GB GPU memory | Nvidia driver with CUDA, TensorFlow, Python | --
4.4 | Inundation modelling | DP GPU | PGI Fortran compiler supporting CUDA and OpenACC, PGI debugger, Visual PGI profiler | If we want PGI to work as an MPI compiler, the MPI library needs to be recompiled with PGI.
1.4 | CNN & Capsule Nets; Hyperspectral Image Classification and Regression tasks | Single precision GPU with > 20 GB GPU memory | Nvidia driver, CUDA, Python, TensorFlow and Keras, PyTorch | --
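
As a rough cross-check of the table against the GPU options listed under Hardware Stack, the following Python sketch tests each stated GPU requirement against each candidate card. The memory sizes are transcribed from this document, and only rows that state a GPU memory size are included. Several points are assumptions of this sketch: "around 20 GB" is read as "at least 20 GB", the Quadro RTX 4000 is classified as an SP card, DP cards are taken to serve SP jobs as well, and the names `Gpu`, `Requirement`, and `satisfies` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    model: str
    precision: str   # "SP" or "DP", following this document's classification
    memory_gb: int


@dataclass(frozen=True)
class Requirement:
    section: str
    precision: str       # "SP" or "DP"
    memory_gb: int
    strict: bool = True  # True: "> N GB"; False: "around N GB", read as ">= N GB"


def satisfies(gpu: Gpu, req: Requirement) -> bool:
    """Check precision class and GPU memory; DP cards are assumed to also serve SP jobs."""
    precision_ok = gpu.precision == "DP" or req.precision == "SP"
    if req.strict:
        memory_ok = gpu.memory_gb > req.memory_gb
    else:
        memory_ok = gpu.memory_gb >= req.memory_gb
    return precision_ok and memory_ok


# Candidate cards from the Hardware Stack section.
CANDIDATES = [
    Gpu("RTX 3080", "SP", 10),
    Gpu("Quadro RTX 4000", "SP", 8),  # SP classification is an assumption
    Gpu("A100", "DP", 40),
]

# Rows of the table that state a GPU memory size.
REQUIREMENTS = [
    Requirement("2.4 (magnitude estimation)", "SP", 10),
    Requirement("2.8", "SP", 20),
    Requirement("1.3", "DP", 10),
    Requirement("2.4 (waveform inversion)", "SP", 20, strict=False),
    Requirement("1.4 (Sentinel-1 SAR)", "DP", 10),
    Requirement("1.4 (hyperspectral)", "SP", 20),
]

if __name__ == "__main__":
    for req in REQUIREMENTS:
        models = [gpu.model for gpu in CANDIDATES if satisfies(gpu, req)]
        print(f"{req.section}: {', '.join(models) or 'none'}")
```

Under these assumptions only the 40 GB A100 passes every listed row: the 10 GB RTX 3080 does not exceed a "> 10 GB" threshold, and the 8 GB Quadro RTX 4000 falls below all of them.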