How do you account for GPU compute time on your HPC clusters?
I have a growing and quite heterogeneous GPU partition (SXM4 A100s, PCIe A100s, NVLink-connected V100s, PCIe V100s, T4s, AMD cards arriving soon, etc.) on an HPC cluster of mixed-hardware Debian servers running the OAR scheduler.
Traditionally, we accounted for compute time as core-seconds per job. Despite CPU and memory variability between nodes (fat nodes, high-speed nodes, standard nodes), the differences were small enough that they didn't noticeably affect accounting, especially in a small university setting.
On GPUs, things change quite a bit. The difference in performance and cost between an SXM4 A100 node and a T4 is significant, and our current model is probably not going to cut it, especially since growing university partnerships mean we will be hosting more and more private-sector projects that we will have to account for precisely.
I'm exploring how to do this accounting with our current infrastructure (a rough sketch of what I'm considering is below), but I was also wondering what methods are used by other people operating HPC GPU clusters. If you have any advice on how to do this, or on what strategies/tools you have used, I'd be very glad to hear it!
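For illustration, this is the kind of weighted GPU-seconds scheme I've been sketching. The GPU weight values and the job record fields below are made up for the example (they are not something OAR provides out of the box); in practice the weights would come from hardware cost, benchmarks, or a negotiated charging policy, and the job data from the scheduler's accounting database.

```python
from dataclasses import dataclass

# Hypothetical relative weights per GPU model, normalized so a T4 counts as 1.0.
# These numbers are placeholders, not measured or official figures.
GPU_WEIGHTS = {
    "A100-SXM4": 8.0,
    "A100-PCIe": 6.5,
    "V100-NVLink": 4.0,
    "V100-PCIe": 3.5,
    "T4": 1.0,
}

@dataclass
class JobRecord:
    """Minimal job record; real fields would be pulled from the scheduler's
    accounting data (e.g. OAR job resource assignments and timestamps)."""
    job_id: int
    gpu_model: str
    gpu_count: int
    walltime_seconds: int

def weighted_gpu_seconds(job: JobRecord) -> float:
    """Charge = GPUs allocated x wall-clock seconds x model weight."""
    weight = GPU_WEIGHTS.get(job.gpu_model, 1.0)  # unknown models fall back to 1.0
    return job.gpu_count * job.walltime_seconds * weight

if __name__ == "__main__":
    # Two example jobs of equal duration on very different hardware.
    jobs = [
        JobRecord(job_id=101, gpu_model="A100-SXM4", gpu_count=4, walltime_seconds=3600),
        JobRecord(job_id=102, gpu_model="T4", gpu_count=1, walltime_seconds=3600),
    ]
    for job in jobs:
        print(f"job {job.job_id}: {weighted_gpu_seconds(job):.0f} weighted GPU-seconds")
```

The open questions for me are how to pick defensible weights for such different cards and whether this should live in the scheduler's accounting layer or in a post-processing step over the job logs.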
Thanks!