This is my first post. I will try to give as much information about the issue as possible. The post turned out to be a little long, but it is quite detailed.
System specs:
Processor: i5-4690K
GPU: MSI R9 290X
RAM: 4 GB DDR3
Storage: 500 GB SSD
Monitor: Benq XL2411
I am dual booting with Windows, and I usually play the game on Windows. The game runs at ~144 FPS with no issues whatsoever.
However, recently I thought about using Linux a little more, and decided to install Ubuntu 18.04. I installed the AMDGPU-PRO driver from AMD. I followed the installation instructions thoroughly, and made sure it is installed correctly.
When I started Dota 2 it worked fine until I entered a public game, after which I suddenly lost display signal, either right after the game started or at some arbitrary time later. Sometimes the signal is lost 5 minutes into the game, sometimes 20, sometimes 40. When I lose display signal, sound keeps playing in the background for a few seconds, then stops or starts looping. The computer is still on, because I can hear the fans and see the LEDs. The power button does not work unless I hold it down for 5 seconds (hard shutdown). After restarting, the game runs perfectly fine if I boot Windows. If I boot Ubuntu, it lasts for another 5-10 minutes (sometimes more) before the same sequence of events repeats.
The issue persists whether I log in to Ubuntu on Wayland or on Xorg, and whether or not I have processes running in the background.
I decided to check the GPU temperature using lm-sensors. I installed it, ran sensors-detect, and ran the sensors command. While idle on the desktop, I got the following output:
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx: +1.00 V
fan1: N/A (min = 0 RPM, max = 0 RPM)
edge: +66.0°C (crit = +104000.0°C, hyst = -273.1°C)
power1: 36.11 W (cap = 208.00 W)
The first thing I noticed was that the GPU temperature kept rising while idle on the desktop, by about 1 degree every few minutes. Also, the fan1 sensor was not working, for some reason. I ran sensors -u, and here is the output:
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:
in0_input: 1.000
fan1:
ERROR: Can't get value of subfeature fan1_input: Can't read
fan1_min: 0.000
fan1_max: 0.000
edge:
temp1_input: 65.000
temp1_crit: 104000.000
temp1_crit_hyst: -273.150
power1:
power1_average: 36.113
power1_cap: 208.000
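For reference, here is a minimal sketch of how the raw `sensors -u` output above could be turned into numbers in Python, for example to log or compare readings. The function name `parse_sensors_u` is made up for illustration, and the sketch assumes the simple layout shown above (a chip name line, `feature:` labels, and `name: value` pairs); it simply skips `Adapter:` and `ERROR:` lines.

```python
def parse_sensors_u(text):
    """Parse `sensors -u` text into {chip: {feature: {subfeature: float}}}.

    Hypothetical helper: assumes the layout shown in the post.
    """
    chips = {}
    chip = None
    feature = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blanks, the adapter description, and unreadable subfeatures.
        if not line or line.startswith("Adapter:") or line.startswith("ERROR:"):
            continue
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep:
            # A line without a colon is a chip name, e.g. "amdgpu-pci-0100".
            chip = chips.setdefault(line, {})
            feature = None
        elif value == "":
            # A bare label such as "fan1:" starts a new feature.
            if chip is not None:
                feature = chip.setdefault(name, {})
        elif feature is not None:
            # "temp1_input: 65.000" style subfeature reading.
            try:
                feature[name] = float(value)
            except ValueError:
                pass  # ignore anything that is not a plain number
    return chips
```

With the output above, the `fan1` feature would contain only `fan1_min` and `fan1_max`, since the `fan1_input` line is an error rather than a reading.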
Then I decided to download a script from git that manually sets the fan speed.
Well, I tested it, and when I execute it with 100 it does make the fans louder; however, I doubt this is really their maximum, as I've heard them much louder while playing Battlefield 4 on Windows. With the fans set this way, the GPU temperature reported by sensors starts dropping while idle on the desktop.
Next I tried playing Dota 2 with the fans set to 100 while monitoring the GPU temperature through sensors. The temperature would rise considerably, reaching about 82-86 degrees in the main menu and 95+ in-game. As soon as it goes above 96-97, the monitor loses signal, as I described earlier. The GPU temperature also rises quite rapidly, 1 degree every few minutes or so. With the fans running at 100 the signal loss no longer happens at the start of the game, but the result is essentially the same after some time.
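Because the machine hangs at the moment the signal is lost, one way to capture the last temperature before the hang would be to log readings to disk as they are taken. The following is only a sketch under assumptions: it reads `temp1_input` directly from the hwmon directory (by hwmon convention this file is in millidegrees Celsius), and the function names and the log file path are hypothetical.

```python
import os
import time
from pathlib import Path


def read_temp_celsius(hwmon_dir, name="temp1_input"):
    """Read a hwmon temperature file (millidegrees Celsius) as degrees Celsius."""
    raw = Path(hwmon_dir, name).read_text().strip()
    return int(raw) / 1000.0


def log_temperature(hwmon_dir, log_path):
    """Append one timestamped reading and force it to disk.

    The fsync is there so the last line is more likely to survive a hard
    shutdown; it is not a guarantee.
    """
    temp = read_temp_celsius(hwmon_dir)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"{stamp} {temp:.1f}\n")
        log.flush()
        os.fsync(log.fileno())
    return temp
```

Calling `log_temperature("/sys/class/drm/card0/device/hwmon/hwmon2", "gpu_temp.log")` every couple of seconds in a loop while the game runs would leave a record of how hot the card was just before the hang.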
I really wanted to make sure that this is not an issue with the hardware, so I used my Radeon settings program to monitor my GPU's performance during a game of Dota 2 on Windows.
The GPU temperature at the main menu was about 74 degrees. In-game it ranged from 85 to 93 at most, staying at 88-90 most of the time, including during teamfights (which should require a lot more GPU power).
The fan speed is about 2400 RPM on average, peaking at about 2550 RPM during gameplay. During idle desktop usage it's more like 1200 RPM. I have run Battlefield 3 and 4 on ultra graphics and had no such issues on Windows. Furthermore, GPU utilization is about 30-50% in Dota 2, with CPU utilization at about 70-80%.
The GPU sensors work perfectly well on Windows and are quite accurate, especially with fan speed.
I looked into what this amdgpu-pro-fans script does, and found that it essentially accesses the card's directory (/sys/class/drm/card0/device/hwmon/hwmon2), gets the value from the file pwm1_max (which is 255), calculates the percentage of the input, and writes the new value to pwm1. I have no idea why this value is 255 or what it represents. There is also a fan1_enable file, which contains a value of 1, and a fan1_input file, which cannot be opened because it is of "unknown type". This is probably related to the issue in sensors. Also, the file temp1_crit shows 104000000, while temp1_crit_hyst shows -273150. I am pretty sure these values are garbage. They can also be seen in the sensors output above.
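The calculation described above can be sketched in Python as follows. This is not the amdgpu-pro-fans script itself, only an illustration of the same idea with hypothetical function names. It assumes the hwmon convention that `pwm1` is a fan duty cycle on a 0-255 scale, which would explain a `pwm1_max` of 255 (100% duty cycle, not an RPM figure), and that integer truncation is acceptable when scaling.

```python
from pathlib import Path


def percent_to_pwm(percent, pwm_max=255):
    """Scale a 0-100 percentage to the 0..pwm_max range used by pwm1."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return percent * pwm_max // 100


def set_fan_percent(hwmon_dir, percent):
    """Read pwm1_max from hwmon_dir and write the scaled value to pwm1.

    Writing under /sys normally requires root. hwmon_dir is passed in
    rather than hard-coded, since the hwmonN index can differ per system.
    """
    hwmon = Path(hwmon_dir)
    pwm_max = int((hwmon / "pwm1_max").read_text().strip())
    value = percent_to_pwm(percent, pwm_max)
    (hwmon / "pwm1").write_text(str(value))
    return value
```

For example, `set_fan_percent("/sys/class/drm/card0/device/hwmon/hwmon2", 100)` would write 255 to `pwm1`, which is the largest value the interface accepts on this scale.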
hwmon2 is the only folder in hwmon. I see some people have hwmon3, but I don't know why. Here is a screenshot of the hwmon2 folder:
[Screenshot of the hwmon2 folder]
I am not 100% sure that GPU overheating is causing my issue, but I think it is very likely.
I have tried reinstalling the AMDGPU-PRO driver at least 5 times. My system and the AMDGPU-PRO driver are up to date.
I am running Dota 2 with the same settings as on Windows.
I am using the Vulkan API in Dota 2. On OpenGL the performance is quite noticeably lower, averaging 80 FPS in-game. On Vulkan it more or less matches my Windows performance.
On a final note, the game's FPS does not really drop before the black screen/loss of signal on my monitor. It happens instantaneously. There is no lag or anything like that.
Any ideas on what may be causing the issues?