EDIT: see the updates at the end for the solution; title changed to better reflect the problem.
I have Ubuntu 22.04 LTS on a system with a GeForce RTX 2060 card. I recently made some small hardware changes (moving the graphics card from one PCI slot to another and, some days later, installing a case fan), and since the latest change the graphics output of the system randomly dies shortly after booting. Booting is apparently fine: I can log in and start opening my browser, terminals, etc. as usual, and then the screen turns blue, just as when there is no signal. Any attempt to switch to a virtual console (Ctrl+Alt+F3, Ctrl+Alt+F1...) is useless, and all I can do is an Alt+SysRq+REISUB to reboot the system.
Looking at the system/kernel logs, it seems that the problems start with this:
kernel: [ 1531.539086] xhci_hcd 0000:0c:00.2: Unable to change power state from D3hot to D0, device inaccessible
kernel: [ 1531.539241] nouveau 0000:0c:00.0: timer: stalled at ffffffffffffffff
kernel: [ 1531.539244] ------------[ cut here ]------------
kernel: [ 1531.539245] nouveau 0000:0c:00.0: timeout
And later, some lines like:
kernel: [ 1531.599952] xhci_hcd 0000:0c:00.2: Unable to change power state from D3cold to D0, device inaccessible
kernel: [ 1531.599959] xhci_hcd 0000:0c:00.2: Controller not ready at resume -19
kernel: [ 1531.599961] xhci_hcd 0000:0c:00.2: PCI post-resume error -19!
kernel: [ 1531.599962] xhci_hcd 0000:0c:00.2: HC died; cleaning up
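To make excerpts like these easier to read, the messages can be grouped by PCI address. The following is only a rough sketch: the regular expression is my own guess at the line layout shown above, and the names `KERNEL_LINE`, `group_by_pci_device` and `physical_device` are made up for illustration. A PCI address has the form domain:bus:device.function, so 0000:0c:00.0 (nouveau) and 0000:0c:00.2 (xhci_hcd) are two functions of the same physical device.

```python
import re
from collections import defaultdict

# Assumed layout: "[ <seconds>] <driver> <pci-address>: <message>"
KERNEL_LINE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+"
    r"(?P<driver>\S+)\s+"
    r"(?P<address>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"(?P<message>.*)"
)

# Three of the lines quoted in the post.
LOG_LINES = [
    "kernel: [ 1531.539086] xhci_hcd 0000:0c:00.2: Unable to change power state from D3hot to D0, device inaccessible",
    "kernel: [ 1531.539241] nouveau 0000:0c:00.0: timer: stalled at ffffffffffffffff",
    "kernel: [ 1531.599962] xhci_hcd 0000:0c:00.2: HC died; cleaning up",
]


def group_by_pci_device(lines):
    """Map each PCI address to its (timestamp, driver, message) entries."""
    grouped = defaultdict(list)
    for line in lines:
        match = KERNEL_LINE.search(line)
        if match is None:
            continue  # e.g. the "cut here" separator carries no PCI address
        grouped[match["address"]].append(
            (float(match["time"]), match["driver"], match["message"])
        )
    return dict(grouped)


def physical_device(address):
    """Drop the function number: '0000:0c:00.2' -> '0000:0c:00'."""
    return address.rsplit(".", 1)[0]
```

Calling `group_by_pci_device(LOG_LINES)` and applying `physical_device` to each key shows that every message in the excerpt belongs to the single device at 0000:0c:00, which suggests the whole card became unreachable rather than just one driver misbehaving.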
I searched for those messages and read that some people get issues after moving the card from one PCI slot to another (which I find surprising). The funny thing is that I moved the graphics card to a different PCI slot about a week ago, and during that week everything was fine. The issues only started today, after I powered off to add a case fan and rebooted. (The fan is an Arctic P14 slim PWM PST, connected to an Arctic P12 PWM PST that was already installed, which in turn is connected to CHA_FAN1 on the motherboard, an Asus ROG Strix X570-e.)
So I do not know whether the hardware changes are creating conflicts, or whether some update of the nouveau drivers took effect at the last boot (I go a long time between reboots, so I would only have noticed it now).
Does anyone have an idea what the issue is, or what I should look for in the logs to better pinpoint the problem? Thanks a lot!
** UPDATE: I just tried putting the graphics card back in the previous PCI slot, and the problem appears again. So I guess it must be related to some recent driver update or something like that. Does anyone have any idea?
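One way to check the driver-update hypothesis would be to look through the apt history for kernel or Nvidia packages that changed recently. This is a minimal sketch, assuming the usual `Start-Date:` / `Upgrade:` layout of `/var/log/apt/history.log` on Ubuntu. The function name `find_driver_changes` and the keyword list are my own choices, not anything standard; the kernel packages are included because the nouveau module ships with them.

```python
import re

# One package entry looks like "name:arch (old-version, new-version)".
PACKAGE_ENTRY = re.compile(r"([^\s:,]+):(\w+) \(([^)]*)\)")
ACTIONS = ("Install", "Upgrade", "Remove", "Purge")


def find_driver_changes(
    history_text,
    keywords=("linux-image", "linux-modules", "nvidia", "nouveau"),
):
    """Return (start_date, action, package, versions) for matching packages."""
    changes = []
    start_date = None
    for line in history_text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # blank separator lines between transactions
        if key == "Start-Date":
            start_date = value.strip()
        elif key in ACTIONS:
            for name, _arch, versions in PACKAGE_ENTRY.findall(value):
                if any(word in name for word in keywords):
                    changes.append((start_date, key, name, versions))
    return changes
```

It could be called as `find_driver_changes(Path("/var/log/apt/history.log").read_text())` after `from pathlib import Path`. Older transactions are usually rotated into compressed files next to it, so an empty result does not rule out an earlier update.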
** UPDATE 2: As I said in the comment on kanehekili's answer, I think I now know the origin of the problem. The card was originally in an x16 slot, and I then moved it to another slot that accepts an x16 card but is actually an x8 slot. The motherboard documentation very misleadingly labels the slots PCIEX16_1 and PCIEX16_2, omitting the fact that the second slot is only x8. I assume this change triggered some issue with the drivers that persisted even after putting the card back in the x16 slot. The problem was finally solved by installing the Nvidia "driver metapackage from nvidia-driver-530 (proprietary)" through the GUI "additional drivers" menu. Note that the first driver option in the menu, the "-open" version of 530, still gave issues: the system would not fully recognize the card (e.g. nvidia-smi in a terminal reported "no devices were found"). Now everything is apparently fine again, so I am marking the issue as solved.
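For anyone who wants to verify which link width a slot actually gives the card, and which kernel driver is bound to it, the PCI entries in sysfs expose both. The following is only a sketch: `read_pci_link_info` is a name I made up, and the address 0000:0c:00.0 is simply the one from my logs, so it will differ on other systems (and may change when the card is moved to another slot).

```python
import os
from pathlib import Path

LINK_ATTRIBUTES = (
    "current_link_width",
    "max_link_width",
    "current_link_speed",
    "max_link_speed",
)


def read_pci_link_info(address, sysfs_root="/sys/bus/pci/devices"):
    """Read PCIe link attributes and the bound driver for one PCI address."""
    device_dir = Path(sysfs_root) / address
    info = {}
    for attr in LINK_ATTRIBUTES:
        attr_path = device_dir / attr
        # Attributes missing from this device are reported as None.
        info[attr] = attr_path.read_text().strip() if attr_path.exists() else None
    # "driver" is a symlink to the bound kernel driver, absent if none is bound.
    driver_link = device_dir / "driver"
    if driver_link.is_symlink():
        info["driver"] = os.path.basename(os.readlink(driver_link))
    else:
        info["driver"] = None
    return info
```

Running `read_pci_link_info("0000:0c:00.0")` on the affected machine should show the negotiated width in `current_link_width`. As far as I understand, `max_link_width` reflects what the card itself supports rather than the slot, so a value of 8 against a maximum of 16 would be consistent with the card sitting in the x8 slot. The `driver` entry should distinguish nouveau from nvidia.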