As you can guess from the subject, I have an Optimus laptop. As long as I was running 19.04, I was able to switch to the Nvidia dGPU and back using Prime (via the prime-select {intel|nvidia} command). Things changed after the upgrade to 19.10, though: the day following the upgrade, the system froze, with the kernel complaining about some tasks being stuck, such as an rmmod one. I managed to get my system back by running prime-select nvidia from a chroot root login environment.
I won't go too much into side details, such as removing the iGPU/dGPU drivers from the initramfs (what are these doing in the initramfs anyway?), but at least it now boots, with or without the dGPU prime-activated.
And that's where I come to the problem: if my system boots with the intel profile activated, switching to the nvidia profile doesn't work, since the dGPU isn't detected in hardware; indeed, it is absent from the lspci listing. I have to reboot for the dGPU to be detected again. Hence, whenever I shut down my system, I have to remember to activate the nvidia profile beforehand, or I will have to reboot to be able to use it the next time.
That's my main problem. Another, less annoying one is that I always have to restart the gdm service when switching from nvidia to intel. I can live with that, but it's a problem I didn't have in 19.04.
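For the record, the restart I mean is simply the following (the unit may be named gdm3 instead, depending on the release):

sudo systemctl restart gdm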
Advice on this problem is welcome! Either a way to prevent the dGPU from disappearing from the hardware list, or a method to have the system detect it again, without rebooting that is.
Fwiw, my iGPU is Intel HD Graphics 4600, and my dGPU is an Nvidia GTX 880M.
EDIT: @Syfer Polski, thanks for your informative reply!
I noticed there was an on-demand profile, but I dismissed it as a likely useless attempt, as I had read not so long ago that a truly working Optimus implementation would not come anytime soon... I should have read that README!
So I immediately tried that on-demand profile. At first it didn't work, since I had the 430 driver, which doesn't support it. There should be a driver check refusing to enable the profile for people who are not running a supporting version, and I suspect that's why my system crashed: the on-demand profile may have been activated automatically during the upgrade (only an assumption, I didn't check at the time).
Anyway... so I installed the 435 driver, and indeed the on-demand profile works. However, I don't find it satisfying enough, since my GPU isn't powered off when it's not used, and trying to power it off myself doesn't work. I tried powering it off via a direct ACPI call (sketched after the log below), and indeed it powered off, but:
NVRM: GPU at PCI:0000:01:00: GPU-9b8a3387-4913-0c33-619e-da118e532a5f
NVRM: Xid (PCI:0000:01:00): 79, pid=29013, GPU has fallen off the bus.
NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
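For reference, the power-off went through the acpi_call module (from acpi-call-dkms); the ACPI path is machine-specific, and the _OFF method here is my assumption mirroring the _ON one shown in the solution further down:

sudo modprobe acpi_call
echo '\_SB_.PCI0.PEG0.PEGP._OFF' | sudo tee /proc/acpi/call   # single quotes keep the backslash literal
sudo cat /proc/acpi/call   # read back the result of the last call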
So, unfortunately for me, as long as the proprietary drivers are unable to power off my dGPU when it's not used, I guess I'll stick with the classic intel/nvidia profile system.
Which brings me back to my original question when I boot with the intel mode enabled: how can I get my dGPU back without rebooting?
A rescan (echo 1 > /sys/bus/pci/rescan) shows it in the logs:
pci 0000:01:00.0: [10de:1198] type 00 class 0x030000
pci 0000:01:00.0: reg 0x10: [mem 0xf6000000-0xf6ffffff]
pci 0000:01:00.0: reg 0x14: [mem 0xe0000000-0xefffffff 64bit pref]
pci 0000:01:00.0: reg 0x1c: [mem 0xf0000000-0xf1ffffff 64bit pref]
pci 0000:01:00.0: reg 0x24: [io 0xe000-0xe07f]
pci 0000:01:00.0: reg 0x30: [mem 0xf7000000-0xf707ffff pref]
pci 0000:01:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:01.0 (capable of 126.016 Gb/s with 8 GT/s x16 link)
pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
But lspci remains silent. I can power the device on and off at will through ACPI calls, and the kernel sees it upon rescan, but it's not detected by the drivers, which consequently won't load. There must be something to do, but what?
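For completeness, a couple of checks that narrow this down, assuming the standard sysfs layout and my device's 01:00.0 address:

ls -d /sys/bus/pci/devices/0000:01:00.0           # did the rescan actually create the device node?
readlink /sys/bus/pci/devices/0000:01:00.0/driver  # is any driver bound to it?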
nvidia-prime has changed yet again between Ubuntu 19.04 and 19.10.

Between Ubuntu 16.04 and Ubuntu 18.04, Ubuntu used bbswitch, a community-built kernel module, to turn off the Nvidia GPU in Optimus laptops. However, the module stopped being maintained, and so in Ubuntu 18.10 (since backported to Ubuntu 18.04), switching between GPUs was handled by loading the open-source nouveau drivers. However, this didn't fully switch off the GPU (it was still using ~2 W).

Simultaneously, Nvidia was finally working on coexisting with other GPU drivers. GLVND (the GL Vendor-Neutral Dispatch library) became a thing in Xorg 1.20 and allows multiple GPU drivers to be loaded and powering a display server. This allows granular control: each application can use a separate driver. In practice, it's almost always about the Intel and Nvidia GPUs in Optimus laptops.

There are now three modes prime-select lets you choose from:

- intel mode physically turns the Nvidia GPU off, saving additional power, but requires a reboot to turn it back on, not just a log out.
- nvidia is the reverse.
- on-demand is recommended for people who switch modes frequently: the GPU used to draw a program is determined by environment variables. There are different environment variables for OpenGL and Vulkan applications, and if they're not set, the integrated (Intel) GPU is used. See Nvidia's README for a full explanation of the environment variables involved (__NV_PRIME_RENDER_OFFLOAD, __GLX_VENDOR_LIBRARY_NAME and __VK_LAYER_NV_optimus).

Depending on which driver series supports your GPU, the on-demand profile might not work for you - the oldest driver series supporting it appears to be 435.
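As an illustration of those variables, here is how one would run a single OpenGL program on the Nvidia GPU while everything else stays on the Intel one (glxinfo, from mesa-utils, is just an example; any OpenGL application works the same way):

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep "OpenGL renderer"

If the offload is working, the renderer string reports the Nvidia GPU instead of the Intel one.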
Solution found! I wasn't doing things in the right order. The procedure to get the dGPU back is:
1. Set the profile to either nvidia or on-demand (if supported by your driver): prime-select {nvidia|on-demand}

2. Turn the dGPU on. The BIOS usually turns it on at boot, so there should be no problem here. If you turned it off in the meantime, I'm assuming you know how to turn it back on. In case it stays off for some other reason, you can try your luck with apt install acpi-call-dkms; you will find useful examples in /usr/share/doc/acpi-call-dkms/examples. Handle with care, as it can crash your system badly! In my case, the following ACPI call turns my dGPU on: \_SB_.PCI0.PEG0.PEGP._ON. I give mine as an example; yours may very well not be the same. Don't forget to escape the backslash if you have any.

3. Rescan your PCI bus: echo 1 > /sys/bus/pci/rescan. It may be enough to rescan only part of the bus, though.

4. (may be optional) Load the nvidia module: modprobe nvidia

The whole sequence is sketched as a script below.
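A minimal sketch of the procedure, assuming acpi-call-dkms is installed and that your ACPI path matches mine (it very likely won't; check the examples mentioned above for yours):

# 1. select a dGPU-using profile
sudo prime-select nvidia

# 2. power the dGPU on through ACPI (machine-specific method path!)
sudo modprobe acpi_call
echo '\_SB_.PCI0.PEG0.PEGP._ON' | sudo tee /proc/acpi/call   # single quotes keep the backslash literal

# 3. have the kernel rediscover the device
echo 1 | sudo tee /sys/bus/pci/rescan

# 4. (may be optional) load the driver
sudo modprobe nvidia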
WARNING: Don't power off your GPU with a direct ACPI call unless you're certain it's not bound to any driver (put more simply, the nvidia module should be unloaded), or the driver will crash (a crash example is given in the question). As long as it's loaded, the driver is the one driving the GPU, and grabbing the steering wheel by surprise generally won't do much good.
However, the Nvidia driver has a power management feature that's off by default; it can be activated by passing the following parameter to the nvidia module: NVreg_DynamicPowerManagement=0x01 (typically via an options nvidia NVreg_DynamicPowerManagement=0x01 line in a file under /etc/modprobe.d). Unfortunately, it works for Turing and newer GPUs only (i.e. not my Kepler)... Taken from /usr/src/nvidia-435.21/nvidia/nv-reg.h: