I am trying to add the NVIDIA GPU module to Ganglia (/ganglia/gmond_python_modules/gpu/nvidia/).
Do we need to apply the ganglia_web.patch patch?
If I do not apply the patch, I don't see any GPU metrics when I go to http://localhost/ganglia/
If I try to apply the patch, I get the following errors:
ubuntu@server:/usr/share/ganglia-webfrontend$ sudo patch -p0 < /home/ubuntu/gmond_python_modules/gpu/nvidia/ganglia_web.patch
sudo: unable to resolve host server
patching file host_view.php
Hunk #1 FAILED at 17.
Hunk #2 FAILED at 37.
Hunk #3 FAILED at 144.
Hunk #4 FAILED at 153.
Hunk #5 FAILED at 169.
5 out of 5 hunks FAILED -- saving rejects to file host_view.php.rej
patching file templates/default/host_view.tpl
Hunk #1 FAILED at 80.
Hunk #2 FAILED at 89.
2 out of 2 hunks FAILED -- saving rejects to file templates/default/host_view.tpl.rej
ubuntu@server:/usr/share/ganglia-webfrontend$ cd /usr/share/ganglia-webfrontend
The readme does not mention what to do with the patch file.
The web interface does list the GPU metrics, but all of their graph images return 404.
When I go to Grid > [name] > [gpu node], I don't see any GPU option.
On the Ganglia server (i.e., the server where gmetad is running), I ran:
git clone https://github.com/ganglia/gmond_python_modules.git
sudo cp gmond_python_modules/gpu/nvidia/graph.d/* /usr/share/ganglia-webfrontend/graph.d/
sudo /etc/init.d/gmetad restart
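A quick way to confirm the graph.d copy actually landed is to compare the two directories. This is a minimal sketch, not part of gmond_python_modules: SRC and DST are assumptions taken from the commands above (repo cloned into the current directory, Ubuntu's ganglia-webfrontend under /usr/share), and missing_graph_files is a hypothetical helper. It only uses os, so it runs under Python 2.7 or 3.

```python
import os

# Assumed paths, matching the commands above; adjust if your layout differs.
SRC = "gmond_python_modules/gpu/nvidia/graph.d"
DST = "/usr/share/ganglia-webfrontend/graph.d"


def missing_graph_files(src, dst):
    """Return the sorted names of regular files in src that are absent from dst."""
    if not os.path.isdir(src):
        raise IOError("source directory not found: {0}".format(src))
    names = []
    for name in os.listdir(src):
        in_src = os.path.isfile(os.path.join(src, name))
        in_dst = os.path.isfile(os.path.join(dst, name))
        if in_src and not in_dst:
            names.append(name)
    return sorted(names)


if __name__ == "__main__":
    try:
        missing = missing_graph_files(SRC, DST)
    except IOError as err:
        print(err)
    else:
        if missing:
            print("not copied yet: " + ", ".join(missing))
        else:
            print("all graph.d files are in place")
```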
On the Ganglia client (i.e., the server where gmond is running and where the GPU is located), I ran:
git clone https://github.com/ganglia/gmond_python_modules.git
sudo pip install nvidia-ml-py
sudo cp gmond_python_modules/gpu/nvidia/python_modules/nvidia.py /usr/lib/ganglia/nvidia.py
sudo cp gmond_python_modules/gpu/nvidia/conf.d/nvidia.pyconf /etc/ganglia/conf.d
sudo /etc/init.d/ganglia-monitor restart
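Before blaming gmond, it can help to check that the nvidia-ml-py bindings work on their own, since nvidia.py relies on the same pynvml module. A minimal sketch, assuming pynvml was installed for the same Python that gmond's modpython.so uses; nvml_status is a hypothetical helper name, while nvmlInit, nvmlDeviceGetCount, nvmlDeviceGetHandleByIndex, nvmlDeviceGetName and nvmlShutdown are regular pynvml functions.

```python
def nvml_status():
    """Return a one-line description of what NVML sees, or why it failed."""
    try:
        import pynvml
    except ImportError:
        return "pynvml not importable: install nvidia-ml-py for this Python"
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as err:
        # Also covers a missing libnvidia-ml (driver not installed/loaded).
        return "nvmlInit failed: {0}".format(err)
    try:
        count = pynvml.nvmlDeviceGetCount()
        names = []
        for i in range(count):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            # Older bindings return bytes, newer ones return str.
            if isinstance(name, bytes):
                name = name.decode()
            names.append(name)
        return "{0} GPU(s): {1}".format(count, ", ".join(names))
    except pynvml.NVMLError as err:
        return "NVML query failed: {0}".format(err)
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(nvml_status())
```

If this reports an import or init failure, gmond will not get GPU metrics from nvidia.py either, whatever the web side does.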
I use:
- Ganglia Web Frontend version 3.6.1
- Ganglia Web Backend (gmetad) version 3.6.0
- RRDtool version 1.4.7
- Ubuntu 14.04.3 LTS x64 server
I ran into this myself, strangely enough also yesterday. I asked a developer of the module, and he said it should "just work"... so, after playing around a bit, I found the following works:
On web host:
On the GPU node (note: these are RHEL/SL/CentOS package names and locations):
From source:
Restart gmond
You don't need to patch the web tree. Then, in the web interface, go to:
There should now be a "gpu metrics" section in the listing. You may have to collapse the other sections to spot it. If it isn't there, go to the Grid > [Name] page and, in the Metric drop-down at the bottom, select one of the gpu_* metrics; that may trigger the section to appear. I had to do that to get one of the nodes to display the "gpu metrics" section, but not for another.
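If the "gpu metrics" section never shows up, one way to tell a gmond problem from a web-frontend problem is to read gmond's XML dump directly and look for GPU metrics. A rough sketch, assuming gmond's default TCP accept channel on port 8649 and that the module's metric names start with "gpu" (change prefix if yours differ); read_gmond_xml and gpu_metrics are hypothetical helpers. The parser searches for HOST elements at any depth, so it should also work on gmetad's XML (default port 8651).

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read gmond's full XML dump; gmond closes the connection when done."""
    sock = socket.create_connection((host, port), timeout=timeout)
    chunks = []
    try:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    finally:
        sock.close()
    return b"".join(chunks)


def gpu_metrics(xml_data, prefix="gpu"):
    """Map each HOST name to the sorted metric names starting with prefix."""
    root = ET.fromstring(xml_data)
    result = {}
    for host in root.iter("HOST"):
        names = sorted(
            m.get("NAME") for m in host.iter("METRIC")
            if m.get("NAME", "").startswith(prefix)
        )
        result[host.get("NAME")] = names
    return result


if __name__ == "__main__":
    try:
        xml_data = read_gmond_xml()
    except socket.error as err:
        print("could not reach gmond: {0}".format(err))
    else:
        for host, names in sorted(gpu_metrics(xml_data).items()):
            print("{0}: {1}".format(host, ", ".join(names) or "no gpu metrics"))
```

If a host lists GPU metrics here but not in the web UI, look at the gmetad/web side (graph.d, RRDs); if it lists none, look at the gmond side (modpython.conf, nvidia.pyconf, pynvml).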
YMMV.
-J
On Ubuntu xenial I found I also needed to add modpython.conf to tell Ganglia's modpython.so to load the nvidia.py module:
sudo pip install nvidia-ml-py
From source:
If you don't have /etc/ganglia/conf.d/modpython.conf
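For the missing modpython.conf case, here is a sketch of a helper that writes one without clobbering an existing file. The template follows the layout of the stock modpython.conf shipped with Ganglia as far as I know, but treat it as an assumption. In particular, params must name the directory that actually contains nvidia.py (the question above copied it to /usr/lib/ganglia/, not to python_modules), and this assumes gmond.conf already includes /etc/ganglia/conf.d/*.conf. render_modpython_conf and write_modpython_conf are hypothetical helpers.

```python
import os

# Assumed Ubuntu-style locations; adjust to your install.
CONF_PATH = "/etc/ganglia/conf.d/modpython.conf"
PYTHON_MODULES_DIR = "/usr/lib/ganglia/python_modules"

# Doubled braces are literal braces in str.format().
TEMPLATE = """modules {{
  module {{
    name = "python_module"
    path = "modpython.so"
    params = "{modules_dir}"
  }}
}}

include ("{conf_dir}/*.pyconf")
"""


def render_modpython_conf(conf_path=CONF_PATH, modules_dir=PYTHON_MODULES_DIR):
    """Return modpython.conf text that loads every *.pyconf next to conf_path."""
    conf_dir = os.path.dirname(conf_path)
    return TEMPLATE.format(modules_dir=modules_dir, conf_dir=conf_dir)


def write_modpython_conf(conf_path=CONF_PATH, modules_dir=PYTHON_MODULES_DIR):
    """Write modpython.conf only if it does not exist yet.

    Returns True if the file was created, False if one was already there.
    """
    if os.path.exists(conf_path):
        return False
    with open(conf_path, "w") as fh:
        fh.write(render_modpython_conf(conf_path, modules_dir))
    return True


if __name__ == "__main__":
    # Dry run: print the config instead of touching /etc.
    print(render_modpython_conf())
```

Call write_modpython_conf() as root, then restart ganglia-monitor. If gmond then complains that it cannot find modpython.so, giving path as an absolute path is worth trying.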