I recently built two identical Intel i7 machines running Ubuntu 10.04 (32-bit), and I have a problem with my desktop/GNOME session disappearing.
For example, if I leave the machine sitting at the desktop (and I am fairly sure the same thing happens if I leave it at the login screen), at some point (maybe after 1-3 days) I come back to find only the desktop's background pattern. No icons, no menu bars, nothing else. Clicking, moving the mouse, or pressing keys does nothing.
Accessing the machine through VNC or TeamViewer, I see the same thing!
A couple of times I have also found the machine very slow or mostly unresponsive. Sometimes I was still able to ssh in to the machine and reboot it; once I could not even ssh in and had to power cycle it.
Beyond the base OS, which was a fresh install, I have BackupPC, CrashPlan, TeamViewer, TightVNC server and OpenSSH server installed, and I set up an mdadm software RAID 5 array with 3 drives. I also enabled PAE so that 32-bit Ubuntu would see the full 4 GB of RAM.
Note that I have a third, older machine that is basically identical and has no problems. The only real difference, I believe, is the hardware: it is an Intel i5 machine.
Any thoughts on what I should try, or what I should look at the next time it freezes? I looked through my dmesg log and could not see anything relevant.
Update: last week I upgraded all the machines to Ubuntu 11 to see if that helps. Unfortunately, this morning I turned on the monitor and found an almost blank desktop: no menu bars or clock, only the two icons that are normally on my desktop. Alt+F2 does not seem to do anything. The machine is otherwise responsive, though: I can access the Samba server and ssh in, but the GNOME desktop is unusable.
This time I do see a lot of strange output when I type 'dmesg'. This is just a small sample; how do I look at all of it?
[425794.900377] CPU 7: hi: 186, btch: 31 usd: 0
[425794.900382] active_anon:747275 inactive_anon:207413 isolated_anon:0
[425794.900383] active_file:349 inactive_file:618 isolated_file:0
[425794.900384] unevictable:0 dirty:0 writeback:1 unstable:0
[425794.900385] free:28156 slab_reclaimable:4790 slab_unreclaimable:15168
[425794.900387] mapped:194 shmem:311 pagetables:2724 bounce:0
[425794.900393] DMA free:7292kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15800kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:28kB slab_unreclaimable:44kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[425794.900399] lowmem_reserve[]: 0 869 4031 4031
[425794.900407] Normal free:104852kB min:3736kB low:4668kB high:5604kB active_anon:289524kB inactive_anon:289684kB active_file:1316kB inactive_file:1780kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:890008kB mlocked:0kB dirty:0kB writeback:4kB mapped:468kB shmem:400kB slab_reclaimable:19132kB slab_unreclaimable:60628kB kernel_stack:3848kB pagetables:1472kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:4964 all_unreclaimable? yes
[425794.900413] lowmem_reserve[]: 0 0 25299 25299
[425794.900422] HighMem free:480kB min:512kB low:3908kB high:7308kB active_anon:2699576kB inactive_anon:539968kB active_file:80kB inactive_file:692kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3238372kB mlocked:0kB dirty:0kB writeback:0kB mapped:308kB shmem:844kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:9424kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1216 all_unreclaimable? yes
[425794.900427] lowmem_reserve[]: 0 0 0 0
[425794.900431] DMA: 13*4kB 13*8kB 14*16kB 14*32kB 11*64kB 11*128kB 9*256kB 4*512kB 0*1024kB 0*2048kB 0*4096kB = 7292kB
[425794.900441] Normal: 545*4kB 3738*8kB 1954*16kB 421*32kB 130*64kB 42*128kB 34*256kB 7*512kB 2*1024kB 0*2048kB 0*4096kB = 104852kB
[425794.900451] HighMem: 146*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 600kB
[425794.900461] 7359 total pagecache pages
[425794.900463] 6028 pages in swap cache
[425794.900465] Swap cache stats: add 76118, delete 70090, find 8162246/8166421
[425794.900467] Free swap = 0kB
[425794.900469] Total swap = 93180kB
[425794.910862] 1310704 pages RAM
[425794.910865] 1082370 pages HighMem
[425794.910866] 283533 pages reserved
[425794.910868] 1832 pages shared
[425794.910869] 996924 pages non-shared
[425794.910873] Out of memory: kill process 13783 (lshw) score 31039 or a child
[425794.910877] Killed process 13783 (lshw) vsz:124156kB, anon-rss:105456kB, file-rss:0kB
[433508.955028] CE: hpet increased min_delta_ns to 56952 nsec
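The zone lines near the end of the OOM report (DMA:, Normal:, HighMem:) list the free blocks in each zone as count*size pairs, one per allocation order, followed by the zone total. As an illustration only, here is a minimal Python sketch, assuming the log has been copied out as plain text, that recomputes each total from the block counts; parse_free_lists is a hypothetical helper, not part of any tool.

```python
import re

# A buddy-allocator line from an OOM report looks like
# "DMA: 13*4kB 13*8kB ... 0*4096kB = 7292kB" (timestamp prefix is ignored by search()).
ZONE_RE = re.compile(r"(\w+):\s+((?:\d+\*\d+kB\s*)+)=\s*(\d+)kB")
BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB")


def parse_free_lists(line):
    """Return (zone, computed_kb, reported_kb), or None if the line is not a zone free list."""
    m = ZONE_RE.search(line)
    if m is None:
        return None
    zone, blocks, reported = m.group(1), m.group(2), int(m.group(3))
    # Sum count * block size over every order listed on the line.
    computed = sum(int(count) * int(size) for count, size in BLOCK_RE.findall(blocks))
    return zone, computed, reported


if __name__ == "__main__":
    # The three zone lines copied from the report above.
    report = (
        "DMA: 13*4kB 13*8kB 14*16kB 14*32kB 11*64kB 11*128kB 9*256kB 4*512kB 0*1024kB 0*2048kB 0*4096kB = 7292kB\n"
        "Normal: 545*4kB 3738*8kB 1954*16kB 421*32kB 130*64kB 42*128kB 34*256kB 7*512kB 2*1024kB 0*2048kB 0*4096kB = 104852kB\n"
        "HighMem: 146*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 600kB\n"
    )
    for line in report.splitlines():
        result = parse_free_lists(line)
        if result:
            zone, computed, reported = result
            print("{}: {} kB free (log says {} kB)".format(zone, computed, reported))
```

For these three lines the recomputed sums match the logged totals (7292, 104852 and 600 kB), so roughly 110 MB of low memory and well under 1 MB of HighMem were free when the OOM killer ran, with "Free swap = 0kB" out of only about 91 MB of swap.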
Update:
I have discovered that there may be several issues going on. So far I have solved the following:
When the machine rebooted, it was getting stuck during shutdown at or around a line about automatic updates. I had automatic updates turned on, so I turned them off, and I think that's better: it no longer gets stuck.
On two of the machines I was not using the proprietary NVIDIA driver. Installing it seems to have made a difference: the video looks better, and I am no longer getting the widget errors I was seeing at boot. Hopefully it will also help stability.
I am running CrashPlan for backup, and I had the OpenJDK version of Java installed. Yesterday I noticed the machine was VERY slow; I killed CrashPlan and it got much faster, so either CrashPlan or Java is at fault. Based on past bad experiences, I have uninstalled OpenJDK and installed Sun's (now Oracle's) Java, and will see what happens with that.
On one of the machines I had run the Ubuntu 11 upgrade, but apparently it never completed: when I checked for updates, it said I could upgrade to Ubuntu 11! While trying to update software I got all kinds of errors, so I suspect the system was stuck in some in-between state of package versions. I am finishing the updates and the upgrade to Ubuntu 11 to see if that cleans things up. This is the machine that produced all those strange errors above.
This line is alarming:
Either you have too many programs running (cron jobs that do not finish, VNC clients that do not shut down) or you are experiencing a serious memory leak.
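To tell those two apart, it helps to see whether free memory and swap shrink steadily over time (pointing to a leak) rather than in one sudden spike. A minimal sketch, assuming Python is available on the machine, that reads /proc/meminfo and prints one timestamped line. The helper names are illustrative, and running it every few minutes from cron with output appended to a log file is a suggestion of mine, not something from the question.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_percent(info):
    """Percentage of configured swap in use; 0.0 if there is no swap at all."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info.get("SwapFree", 0)) / total


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/meminfo exists.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print("{} MemFree={}kB SwapFree={}kB swap_used={:.1f}%".format(
            time.strftime("%Y-%m-%d %H:%M:%S"),
            info.get("MemFree", 0),
            info.get("SwapFree", 0),
            swap_used_percent(info)))
```

If the log shows SwapFree heading toward zero over a day or two before each freeze, a leak is the more likely explanation.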
If you install the htop program (sudo apt-get install htop), you can get a quick overview of the programs that consume the most memory. Start htop and press F6 to choose the column the list is sorted on. Use your arrow keys to select MEM% and press Enter.

You might see a lot of duplicate processes (which are actually threads sharing memory). To hide those, press F2, navigate to the Display options menu using Arrow Down, and jump to the next field using Arrow Right. Check "Hide userland threads" and press F10 to save this preference.
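htop is interactive, so it will not tell you what happened overnight. A rough scriptable equivalent, which you could log from cron, is to read VmRSS (resident memory) from /proc/&lt;pid&gt;/status for every process and sort by it. This is a minimal sketch, assuming a Linux /proc filesystem. It sorts by resident size in kB rather than htop's MEM% column, and since threads are not listed as separate top-level entries in /proc, each process appears once. The function names are hypothetical.

```python
import os


def parse_status(text):
    """Return (name, vmrss_kb) from /proc/<pid>/status text; VmRSS is absent for kernel threads."""
    name, rss = None, 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss = int(line.split()[1])  # e.g. "VmRSS:   105456 kB"
    return name, rss


def top_by_rss(proc_root="/proc", limit=10):
    """List (rss_kb, pid, name) for the `limit` processes with the largest resident size."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, rss = parse_status(f.read())
        except OSError:
            continue  # process exited between listdir() and open(), or access denied
        rows.append((rss, int(entry), name))
    rows.sort(reverse=True)  # pids are unique, so names are never compared
    return rows[:limit]


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for rss, pid, name in top_by_rss():
            print("{:>8} kB  {:>6}  {}".format(rss, pid, name))
```

Run periodically, a process whose resident size keeps growing between runs (java under CrashPlan would be a natural one to watch here) is a good leak candidate.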