We have an API in AWS backed by a GPU instance that does inference. We have an auto-scaler set up with a minimum and maximum number of instances, but we aren't sure which metric (GPU/CPU usage, RAM usage, average latency, etc.) or combination of metrics should be used to determine when a new instance needs to be launched to keep up with incoming requests.
Are there best practices regarding which metrics should be used in this scenario? Inference in our case is very GPU-intensive.
Amazon CloudWatch Agent adds Support for NVIDIA GPU Metrics
https://aws.amazon.com/about-aws/whats-new/2022/02/amazon-cloudwatch-agent-nvidia-metrics/
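Once the agent is publishing GPU metrics, you can point your Auto Scaling group at one of them with a target-tracking policy. Here's a minimal boto3 sketch, assuming the agent publishes `nvidia_smi_utilization_gpu` into the `CWAgent` namespace and is configured to aggregate it by `AutoScalingGroupName`; the group name and the 70% target are placeholders to adapt:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Target-tracking policy: keep average GPU utilization across the group near 70%.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="gpu-inference-asg",  # hypothetical ASG name
    PolicyName="gpu-utilization-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "CustomizedMetricSpecification": {
            # Assumed metric name from the CloudWatch agent's NVIDIA GPU support
            "MetricName": "nvidia_smi_utilization_gpu",
            "Namespace": "CWAgent",
            "Dimensions": [
                {"Name": "AutoScalingGroupName", "Value": "gpu-inference-asg"}
            ],
            "Statistic": "Average",
        },
        # Scale out when average GPU utilization exceeds the target,
        # scale back in when it drops well below; tune to your latency SLO.
        "TargetValue": 70.0,
    },
)
```

Note this assumes the agent config appends and aggregates the `AutoScalingGroupName` dimension (via `append_dimensions`/`aggregation_dimensions`), so the metric exists at the group level rather than only per instance.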
So based on these metrics, you'd probably want to monitor: