How do you send the output of an upstart script to a terminal so that you can find tracebacks in Python code? Without tracebacks, things that used to take a second are taking me forever. I'm having to place several file write calls to track down errors. What took seconds to find with a traceback is turning into several minutes. This is miserable. This has been going on for a few weeks now and I'm sick of it. Would someone speak up on this, please? I feel like I'm using assembly without a debugger again.
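To illustrate the kind of workaround described above (writing to a file because no terminal is attached), here is a minimal sketch that appends the full traceback of any uncaught exception to a log file. The function name `install_traceback_log` and the log path in the usage note are hypothetical; this does not redirect upstart's own output, it only keeps the traceback from being lost.

```python
import sys
import traceback


def install_traceback_log(path):
    """Append full tracebacks of uncaught exceptions to the file at `path`.

    `path` is a caller-chosen location (an assumption, not an upstart default).
    """
    def hook(exc_type, exc_value, exc_tb):
        # Open in append mode so earlier tracebacks are preserved.
        with open(path, "a") as log:
            traceback.print_exception(exc_type, exc_value, exc_tb, file=log)

    # Python calls sys.excepthook for any exception that reaches top level.
    sys.excepthook = hook
    return hook
```

Calling `install_traceback_log("/tmp/myjob-traceback.log")` at the top of the script (the path is only an example) should be enough to see where the job died by reading that file afterwards.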
I am fairly new to Ubuntu and I was wondering what a good way is to debug when something crashes.
For example: I installed 11.04, and the default media player, Banshee, makes my laptop crash after a few minutes of playing music. Since the system completely freezes and there is no error message, it's hard to find out what is wrong.
The second problem I have is that sometimes the system crashes to the login screen. I suspect Chromium or Flash, but I am not 100% sure.
So, has anybody got some tips on how to debug stuff like this?
I'm hitting a problem whereby X prevents processes from creating windows, uttering something like the following into ~/.xsession-errors:
cannot open display: :0.0
Maximum number of clients reached
Searching around there are lots of examples of people facing this problem, and sometimes people identify which program they are running is using up all the client slots. See e.g. LP 70872 (Firefox), LP 263211 (gnome-screensaver).
For what it's worth, I run gnome-terminal, thunderbird, chromium-browser, empathy, tomboy and virtualbox nearly all the time, on top of the normal stuff you get with the GNOME desktop, and occasionally some other bits and pieces.
However my question is not "which of my programs is causing this problem" but rather, how can one go about diagnosing this problem?
In the above (and other) bugs, forum reports, etc., a number of tools are suggested:
- xlsclients: lists the client applications for the given display, but I don't think that corresponds to 'X clients'
- xrestop: a top-style X resources tool, one row per X client. Lots of '' clients, not shown in xlsclients output
- xwininfo -root -children: lists X window objects
From what I can gather, the problem might not be too many clients at all, but rather resources kept around in the X server for clients that have long since detached. But it would also appear that you cannot (easily?) relate X resources back to their client. Can one effectively diagnose this issue once it has started to occur, or is a tedious divide-and-conquer approach over the apps I run the only option open to me?
Update Jan 2011: I think I have resolved this issue. For the benefit of anyone stumbling across this, nautilus and/or compiz or something in that chain of software was segfaulting due to a wallpaper I had. I had chosen an XML file as my wallpaper, which defined a rotating gallery of images. It was hand-made, but based on /usr/share/backgrounds/contest/background-1.xml or similar. I disabled the wallpaper and have not had a crash since.
I'm not marking this as answered yet, since the actual specific problem was not my question, but how to diagnose it was. Unfortunately this was mostly trial-and-error which sucks.
The system is a spare Dell 2400 I wiped clean, with Ubuntu 10.04 installed. Update manager has everything current, and I haven't been mucking with drivers or tricky system settings. In fact, it has been a stable and friendly system to install and use.
So imagine my surprise when browsing to http://element-14.com/ (an otherwise useful community site for electronic engineering types) followed a redirect or two, then gave me a black screen, then the "I'm starting up" tune with the pink hazy smoke, after which nothing further works. The keyboard is crashed hard, and the Alt-SysRq key combos do nothing.
More than just firefox and the X server are crashing. I repeated the crash with an SSH session open, and not only did the connection get taken down, but it no longer responded to attempts to get a fresh connection.
I tried enabling Apport, in hopes that it would notice something and help identify the culprit, but it seems to be oblivious to the crash.
Each time, I've had to lean on the power button to reboot.
Google searches hint that there are issues with the particular intel chipset providing the VGA on its motherboard.
I'm looking for advice about how to proceed with debugging this kind of crash. Any ideas?
Update: I followed advice to set up the netconsole kernel module and a matching netcat instance to receive the log. I set up netcat on my XP box, used Alt-SysRq-S to verify it could receive kernel messages, then browsed to the site. Only two printk()s were logged:
[251728.009794] i915: Unknown parameter `modset'
[251728.051420] i915: Unknown parameter `modset'
Hmm. Perhaps my video driver is misconfigured? Especially since I see these same messages in the output of dmesg just after booting.
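Since these messages are easy to miss among everything else the kernel logs at boot, here is a small sketch that scans kernel log text for "Unknown parameter" complaints and reports which module rejected which option. The function name `unknown_module_params` is made up for illustration, and the pattern is based only on the two lines quoted above, so other kernel versions may word the message differently.

```python
import re

# Matches lines such as:  [251728.009794] i915: Unknown parameter `modset'
UNKNOWN_PARAM = re.compile(
    r"(?P<module>\w+): Unknown parameter `(?P<param>[^']+)'"
)


def unknown_module_params(log_text):
    """Return sorted, de-duplicated (module, parameter) pairs the kernel rejected."""
    pairs = {
        (match.group("module"), match.group("param"))
        for match in UNKNOWN_PARAM.finditer(log_text)
    }
    return sorted(pairs)
```

The text to scan could come from saved dmesg output or from whatever netcat captured; on the two lines above it reports a single pair, i915 and modset.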
At least this time I explicitly synced my disks before deliberately crashing the system.
For the record, lspci -nn | grep VGA says:
00:02.0 VGA compatible controller [0300]: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device [8086:2562] (rev 01)
Update: Solved!!!
The hint to use netconsole led to an epiphany. Googling around the phrase "i915 unknown parameter modset" suddenly led me to trip over the root cause.
The name of the option to the i915 driver is modeset not modset.
I changed /etc/modprobe.d/i915.conf to have the correct spelling, rebooted, and now I can access element-14 (and presumably other sites that do whatever it is that element-14 does that triggers the bug in the video driver) without an unpleasant forced reboot.
This leaves behind the (apparently well known) issue that the i915 driver lacks quality, especially on older chipsets. Apparently the Kernel Mode Setting feature is particularly deficient. Without the option spelled correctly, it defaulted to KMS enabled, and also crashed. With it spelled correctly, KMS is disabled, and the driver survives whatever content was triggering the crash.
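A misspelled option like this could in principle be caught before rebooting. The sketch below checks `options` lines in modprobe.d-style text against a list of valid parameter names that the caller supplies (for instance collected by hand from `modinfo`), and suggests the closest valid name. The function name `suspicious_options`, the sample file content, and the idea of passing in the known names are all assumptions for illustration, not something the driver or modprobe provides.

```python
import difflib


def suspicious_options(conf_text, known_params):
    """Find option names that are not valid for their module.

    conf_text:    contents of a modprobe.d-style file.
    known_params: dict mapping module name -> iterable of valid parameter
                  names (supplied by the caller; an assumption of this sketch).
    Returns a list of (module, bad_name, suggestion_or_None) tuples.
    """
    findings = []
    for raw in conf_text.splitlines():
        # Drop comments, then split into whitespace-separated fields.
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 3 or fields[0] != "options":
            continue
        module, settings = fields[1], fields[2:]
        valid = known_params.get(module)
        if valid is None:
            continue  # nothing known about this module, so nothing to check
        valid = list(valid)
        for setting in settings:
            name = setting.split("=", 1)[0]
            if name not in valid:
                guess = difflib.get_close_matches(name, valid, n=1)
                findings.append((module, name, guess[0] if guess else None))
    return findings
```

Given a line such as `options i915 modset=0` and `{"i915": ["modeset"]}`, it flags modset and suggests modeset.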
Also, there are a number of bug pages at launchpad and other community sites that have the wrong spelling of the option name. I strongly suspect that is where I got the spelling I used.
Edit: I've copied the relevant solution to an actual answer, and improved my description of it here.
I have a couple of cron jobs that sometimes produce error output, and I would like to get a notification in my "real" email account, since I don't use my user's mailbox on my Ubuntu laptop. But cron (or maybe it's postfix) keeps trying to email the local root account.
I know I can add the MAILTO variable to the crontab:
ricardo@ricardo-laptop:~$ sudo crontab -l
MAILTO=[email protected]
# m h dom mon dow command
*/5 * * * * /home/ricardo/mrtg/cfg/run.sh
But it doesn't seem to pay any attention to it.
I also tried adding my email to the /etc/aliases file and running newaliases:
ricardo@ricardo-laptop:~$ cat /etc/aliases
# See man 5 aliases for format
postmaster: root
root: [email protected]
ricardo: [email protected]
Still, whenever cron wants to send an email, it sends it to [email protected]:
ricardo@ricardo-laptop:/var/log$ tail mail.log
Aug 3 16:25:01 ricardo-laptop postfix/pickup[2002]: D985B310: uid=0 from=<root>
Aug 3 16:25:01 ricardo-laptop postfix/cleanup[4117]: D985B310: message-id=<20100803192501.D985B310@ricardo-laptop>
Aug 3 16:25:01 ricardo-laptop postfix/qmgr[2003]: D985B310: from=<[email protected]>, size=762, nrcpt=1 (queue active)
Aug 3 16:25:03 ricardo-laptop postfix/smtp[4120]: D985B310: to=<[email protected]>, orig_to=<root>, relay=smtp.gmail.com[74.125.157.109]:25, delay=1.5, delays=0.38/0.02/0.9/0.18, dsn=5.7.0, status=bounced (host smtp.gmail.com[74.125.157.109] said: 530 5.7.0 Must issue a STARTTLS command first. d1sm12275173anc.19 (in reply to MAIL FROM command))
Any suggestions? I'm running Ubuntu 10.04, with everything up to date.
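To make the last log line above easier to read, here is a rough sketch that pulls the recipient, original recipient, DSN code and status out of a single postfix/smtp line. The function name `delivery_summary` is hypothetical, and the patterns are based only on the line quoted above, so they may not cover every postfix log format.

```python
import re


def delivery_summary(line):
    """Extract a few delivery fields from one postfix/smtp log line.

    Returns a dict with keys "to", "orig_to", "dsn" and "status";
    a field that is not present in the line maps to None.
    """
    def find(pattern):
        match = re.search(pattern, line)
        return match.group(1) if match else None

    return {
        # \b keeps "to=" from matching inside "orig_to=".
        "to": find(r"\bto=<([^>]*)>"),
        "orig_to": find(r"\borig_to=<([^>]*)>"),
        "dsn": find(r"\bdsn=([\d.]+)"),
        "status": find(r"\bstatus=(\w+)"),
    }
```

Run on that line, it shows orig_to as root with a different to address, and a status of bounced with DSN 5.7.0. That seems to suggest the recipient is being rewritten and that the bounce comes from the "Must issue a STARTTLS command first" reply.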