I've got a Mac Pro running Mac OS X 10.6.4 Snow Leopard Server and it's recently started getting numerous 'kNetworkError's in Server Admin.app when viewing services. It's acting as a gateway w/NAT and has been so for quite some time.
There is one glaring issue: `bootpd` crashes all the time with the following errors in `/var/log/system.log`:
Aug 12 16:54:59 servername bootpd[3572]: server starting
Aug 12 16:54:59 servername bootpd[3572]: server name servername.domain.tld
Aug 12 16:54:59 servername bootpd[3572]: interface en0: ip 10.0.1.9 mask 255.255.255.0
Aug 12 16:54:59 servername bootpd[3572]: bsdpd: re-reading configuration
Aug 12 16:54:59 servername bootpd[3572]: bsdpd: shadow file size will be set to 48 megabytes
Aug 12 16:54:59 servername bootpd[3572]: bsdpd: age time 00:15:00
Aug 12 16:54:59 servername bootpd[3572]: [3572] detected buffer overflow
Aug 12 16:54:59 servername com.apple.launchd[1] (com.apple.bootpd[3572]): Job appears to have crashed: Abort trap
Aug 12 16:54:59 servername com.apple.ReportCrash.Root[3571]: 2010-08-12 16:54:59.828 ReportCrash[3571:2807] Saved crash report for bootpd[3572] version ??? (???) to /Library/Logs/DiagnosticReports/bootpd_2010-08-12-165459_localhost.crash
It is correctly configured to serve DHCP through en1 (not en0), the "LAN" port. This happens even with no hardware (even switches) connected to the "LAN" port. There are no DHCP clients listed. Oddly, the "Overview" shows 1 static map, but nothing is listed under "Static Maps" and there are no "Computers" in Open Directory. `/var/db/dhcp_leases` is empty.
`/Library/Logs/DiagnosticReports/bootpd_2010-08-12-165459_localhost.crash` is as follows:
Process: bootpd [3572]
Path: /usr/libexec/bootpd
Identifier: bootpd
Version: ??? (???)
Code Type: X86-64 (Native)
Parent Process: launchd [1]
Date/Time: 2010-08-12 16:54:59.713 -0400
OS Version: Mac OS X Server 10.6.4 (10F569)
Report Version: 6
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Application Specific Information:
__abort() called
Thread 0 Crashed: Dispatch queue: com.apple.main-thread
0 libSystem.B.dylib 0x00007fff803c13d6 __kill + 10
1 libSystem.B.dylib 0x00007fff80461913 __abort + 103
2 libSystem.B.dylib 0x00007fff80456157 mach_msg_receive + 0
3 libSystem.B.dylib 0x00007fff803b92cf __strncpy_chk + 14
4 bootpd 0x0000000100014e5d PLCache_read + 782
5 bootpd 0x0000000100004a3d BSDPClients_init + 68
6 bootpd 0x00000001000053b5 bsdp_init + 2396
7 bootpd 0x000000010000200b S_update_services + 1228
8 bootpd 0x0000000100002344 S_server_loop + 571
9 bootpd 0x0000000100003963 main + 1766
10 bootpd 0x0000000100000984 start + 52
Thread 0 crashed with X86 Thread State (64-bit):
rax: 0x0000000000000000 rbx: 0x00007fff5fbfe220 rcx: 0x00007fff5fbfe218 rdx: 0x0000000000000000
rdi: 0x0000000000000df4 rsi: 0x0000000000000006 rbp: 0x00007fff5fbfe240 rsp: 0x00007fff5fbfe218
r8: 0x0000000000000001 r9: 0x0000000100114280 r10: 0x00007fff803bd412 r11: 0xffffff80002e1680
r12: 0xffffffffffffffff r13: 0x00007fff5fbfe330 r14: 0x00007fff5fbfe33b r15: 0x00007fff7009bec0
rip: 0x00007fff803c13d6 rfl: 0x0000000000000202 cr2: 0x000000010004c000
Any thoughts or suggestions as to resolving this?
Okay, solution found.
I googled 'PLCache_read' (the last function listed in `/Library/Logs/DiagnosticReports/bootpd_2010-08-12-165459_localhost.crash` as having been run by `bootpd` before the buffer overflow) and the second hit was in Apple's source for `bootpd` (bsdpd.c, specifically). `PLCache_read()` is passed the `BSDP_CLIENTS_FILE` constant, which, looking at the top of the file, is hardcoded as `/var/db/bsdpd_clients`.
Checking `/var/db/bsdpd_clients`, I found a pseudo-plist containing all the NetBoot clients (remember, NetBoot is built upon bootp) and, sure enough, the last entry was cut off, leaving the file incomplete.
Stopped `bootpd` (`sudo serveradmin stop dhcp`), backed up `/var/db/bsdpd_clients` & emptied it, then started `bootpd` (`sudo serveradmin start dhcp`) and no crashing!
After a reboot, all the other related services (incl. NetBoot) are now back up and Server Admin.app is no longer throwing the 'kNetworkError's.
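For anyone repeating the steps above, here is a rough sketch of the "back up, then empty" part as a small shell helper. The function name `backup_and_empty` and the `.bak` suffix are hypothetical choices for illustration; only the two `serveradmin` commands come from the steps described above.

```shell
# Hypothetical helper: keep a copy of a damaged cache file, then empty the original.
# Usage: backup_and_empty /path/to/file
backup_and_empty() {
    file=$1
    backup="$file.bak"      # assumed backup name; nothing bootpd itself looks for

    # Refuse to continue if there is nothing to back up.
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }

    # Copy first (preserving mode and timestamps), and only truncate if the copy succeeded.
    cp -p "$file" "$backup" || return 1
    : > "$file"             # truncate in place so owner and permissions stay the same
}

# Intended sequence on the server (needs root; shown as comments, not executed here):
#   sudo serveradmin stop dhcp
#   backup_and_empty /var/db/bsdpd_clients
#   sudo serveradmin start dhcp
```

Truncating in place rather than deleting the file keeps its ownership and permissions untouched, which seemed the safer choice for a file under `/var/db`.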
Hmm... The crash log shows that bootpd is running a function called PLCache_read which is copying a string and that, somehow, is causing the buffer overflow. (Incidentally, it looks like the source to bootpd is available here.)
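When reading a report like this, the frames that belong to the crashed binary itself (rather than to libSystem) are usually the ones worth searching for, starting with the topmost. A minimal sketch of pulling that frame out, assuming backtrace lines have the `index image address symbol + offset` layout quoted in the question; the function name `top_frame` is hypothetical, and image names containing spaces would not be handled.

```shell
# Print the symbol of the topmost backtrace frame that belongs to the given image.
# Usage: top_frame bootpd /path/to/report.crash
top_frame() {
    image=$1
    report=$2
    # Backtrace lines look like: "4 bootpd 0x0000000100014e5d PLCache_read + 782"
    # Field 1 is the frame index, field 2 the image, field 3 the address, field 4 the symbol.
    awk -v image="$image" '
        $1 ~ /^[0-9]+$/ && $2 == image && $3 ~ /^0x/ { print $4; exit }
    ' "$report"
}
```

Run against the backtrace quoted in the question with `bootpd` as the image, this prints `PLCache_read`, the first function in bootpd's own code below the libSystem frames.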
My guess is that bootpd is reading a bad config file or getting bad data over the network. I would try running:
and see if that gives any clue as to the source of the problem.
It is clear that someone else had this problem, but, not being registered, I have no idea if they got a useful answer. Moving /etc/bootpd.plist might help.
Ah, you've found an answer while I was typing this. Well, I'll post this answer anyways; perhaps it will be useful to someone else.
I just solved the exact same problem with slight differences.
I'm running 10.6.5 client (not server). Same error messages (or as close to the same as I could tell).
PLCache_read was also the culprit, except that I had no /var/db/bsdpd_clients file, and creating one didn't solve the problem.
Googling PLCache_read also led me to Apple code, except in this case it was dhcpd.c, which led me to the hardcoded variable
#define DHCP_LEASES_FILE "/var/db/dhcpd_leases"
and lo and behold /var/db/dhcpd_leases looked to be full of garbage. I moved it to a temporary filename and now internet sharing works just fine.
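A rough way to check for this kind of damage before moving a file aside is to look for bytes that have no business in a text file. This is only a sketch, assuming the file is meant to be plain text; the function name `has_garbage` and the `.bad` suffix in the usage comment are hypothetical.

```shell
# Succeed (exit status 0) if the file contains any byte that is neither
# printable ASCII nor whitespace; fail otherwise (including for an empty file).
has_garbage() {
    # LC_ALL=C makes the character classes byte-based instead of locale-dependent.
    LC_ALL=C grep -q '[^[:print:][:space:]]' "$1"
}

# Possible use (needs root; shown as a comment, not executed here):
#   has_garbage /var/db/dhcpd_leases && sudo mv /var/db/dhcpd_leases /var/db/dhcpd_leases.bad
```

A clean result does not prove the file is valid, since a truncated entry can still be entirely printable text, so this only catches the "full of garbage" case.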
Morgant, thanks for your in-depth solution. I learned something about how to read crash logs!