I have read quite a few pages about the OOM killer, including http://www.linuxatemyram.com/ , but I still don't understand why a backup script I run would be killed, given this output of 'free -m':
             total       used       free     shared    buffers     cached
Mem:          8070       7968        102          0        293       6523
-/+ buffers/cache:       1151       6919
Swap:         1983          8       1975
Is this (Fedora 8) server doing something weird, or am I misunderstanding something?
This is the vmstat output while the script runs; the script is killed about 5 or 6 lines before the end:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 8 8312 40360 54592 6971548 0 0 716 204 669 1909 21 10 0 69 0
1 8 8312 40384 54592 6971616 0 0 752 10764 855 4284 37 24 1 37 0
2 9 8312 41068 54584 6970236 0 0 664 3752 896 3953 28 20 5 47 0
1 7 8312 40308 54588 6970180 0 0 484 2996 871 2058 21 10 9 59 0
1 8 8312 46228 54588 6963540 0 0 536 13372 531 2154 16 11 6 67 0
2 7 8312 44544 54600 6964004 0 0 340 18452 486 1726 24 14 21 41 0
3 7 8312 40056 54600 6968660 0 0 388 17136 591 1876 44 21 2 32 0
0 8 8312 40324 54612 6967248 0 0 388 14176 715 2146 32 18 1 50 0
2 9 8312 43464 54632 6964352 0 0 412 14080 788 4040 29 23 0 49 0
4 5 8312 41672 54688 6968836 0 0 260 9188 989 3258 63 32 0 4 0
2 6 8312 41376 54764 6969472 0 0 260 6244 587 2255 44 21 0 35 0
3 8 8312 42392 54740 6968180 0 0 228 5316 778 2869 24 17 0 60 0
1 19 8312 40568 54776 6957660 0 0 268 8844 467 3183 26 18 0 55 0
0 9 8312 43008 54788 6955548 0 0 20 19560 721 1036 4 3 17 75 0
1 10 8312 46232 54776 6951812 0 0 64 13608 675 1005 24 8 7 61 0
0 7 8312 47780 54776 6950692 0 0 40 14756 635 866 24 11 11 54 0
1 8 8312 43508 54796 6954844 0 0 128 33140 569 2676 29 19 12 39 0
3 7 8312 43976 54816 6954968 0 0 68 13168 854 2362 39 23 0 38 0
1 8 8312 43924 54824 6954784 0 0 84 9704 901 1866 11 6 0 82 0
3 7 8312 43872 54836 6954844 0 0 72 17812 882 1768 44 19 0 37 0
0 8 8312 43616 54836 6955068 0 0 172 15148 836 1247 36 11 0 53 0
0 9 8312 47708 54844 6950472 0 0 96 15020 556 938 22 5 0 72 0
1 8 8312 48608 54844 6950628 0 0 100 9304 637 1010 31 11 0 58 0
0 9 8312 48816 54844 6950764 0 0 120 15008 814 1192 35 12 0 53 0
4 7 8312 45604 54864 6949256 0 0 80 8604 654 1474 9 5 0 85 0
1 9 8312 41816 54892 6959464 0 0 188 11436 586 2206 26 13 0 61 0
1 9 8312 41584 54856 6956432 0 0 92 7364 763 1712 24 11 21 44 0
3 9 8312 41264 54844 6956360 0 0 88 4524 718 2172 27 16 12 46 0
2 9 8312 44600 54868 6953664 0 0 152 3960 630 2338 21 13 16 50 0
4 6 8312 40388 54860 6957156 0 0 156 4800 646 2789 29 16 6 49 0
4 6 8312 40756 54840 6956224 0 0 104 19644 399 2349 26 15 3 56 0
0 8 8312 40160 54896 6956332 0 0 1488 12436 822 2158 57 23 3 17 0
Output of 'ulimit -a':
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 69632
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 69632
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I found out, much to my embarrassment, that a cron job was killing it. The crontab had a
killall -9 perl
in it, which killed the backup script along with every other perl process. Thanks to everyone who tried to help with this issue!
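For anyone chasing a similar mystery, here is a minimal sketch of how one might scan crontab text for blanket kill commands like the one above. The function name, the pattern, and the sample lines (including the /usr/local/bin/backup.pl path) are made up for illustration; the only entry taken from this thread is the killall line.

```python
import re

# Matches killall or pkill appearing as a whole word anywhere in the line.
KILL_PATTERN = re.compile(r"\b(killall|pkill)\b")


def find_kill_commands(lines):
    """Return (line_number, text) pairs for non-comment lines that mention killall or pkill."""
    hits = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Skip blank lines and crontab comments.
        if not stripped or stripped.startswith("#"):
            continue
        if KILL_PATTERN.search(stripped):
            hits.append((number, stripped))
    return hits


if __name__ == "__main__":
    # Hypothetical crontab contents; only the killall entry comes from the thread.
    sample = [
        "# nightly cleanup",
        "0 3 * * * killall -9 perl",
        "30 2 * * * /usr/local/bin/backup.pl",
    ]
    for number, text in find_kill_commands(sample):
        print(f"line {number}: {text}")
```

To use it on a real system, you could feed it the lines of /etc/crontab, the files under /etc/cron.d, or the output of 'crontab -l' for each user. Where exactly the cron entries live depends on the distribution, so treat those locations as a starting point.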
Not knowing anything about your setup, here's a wild guess:
Your backup script encounters circular symbolic links (links that point back to themselves or to a parent directory) and keeps following them ad infinitum. If the script builds a file list as it goes, that list just keeps growing, and at some point it will eat all the RAM.
Also, Fedora 8 is seriously antiquated.
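If symlink loops were the cause, one common guard is to remember which directories have already been visited and skip repeats. The following is only a sketch in Python, not the asker's script (which appears to be Perl); the function name is invented, and it assumes a POSIX filesystem where a (device, inode) pair identifies a directory.

```python
import os


def walk_without_loops(root):
    """Yield regular-file paths under root, following symlinks but visiting each directory once."""
    seen = set()      # (st_dev, st_ino) of directories already scanned
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            st = os.stat(current)  # follows symlinks to the real directory
        except OSError:
            continue  # broken link, loop error, or permission problem
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # reached this directory before, e.g. via a circular link
        seen.add(key)
        try:
            entries = list(os.scandir(current))
        except OSError:
            continue
        for entry in entries:
            try:
                if entry.is_dir():       # follows symlinks by default
                    stack.append(entry.path)
                elif entry.is_file():
                    yield entry.path
            except OSError:
                continue  # unreadable or self-referential entry
```

Because every directory is scanned at most once, the file list stays bounded by the number of real files, no matter how the links are arranged.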