Tombart's questions -server

Tombart

Asked: 2024-09-25 17:39:31 +0800 CST

Terribly slow I/O on NVMe mdadm RAID array

9

I have a AMD EPYC 7502P 32-Core Linux server (kernel 6.10.6) with 6 NVMe drives, where suddenly I/O performance dropped. All operations takes too much time. Installing package updates takes hours instead of seconds (maybe minutes).

I've tried running fio on filesystem with RAID5. There's a huge difference in clat metric:

    clat (nsec): min=190, max=359716k, avg=16112.91, stdev=592031.05

stdev value is extreme.

full output:

$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][w=53.3MiB/s][w=13.6k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=48391: Wed Sep 25 09:17:02 2024
  write: IOPS=45.5k, BW=178MiB/s (186MB/s)(10.6GiB/61165msec); 0 zone resets
    slat (nsec): min=552, max=123137, avg=2016.89, stdev=468.03
    clat (nsec): min=190, max=359716k, avg=16112.91, stdev=592031.05
     lat (usec): min=10, max=359716, avg=18.13, stdev=592.03
    clat percentiles (usec):
     |  1.00th=[   11],  5.00th=[   12], 10.00th=[   14], 20.00th=[   15],
     | 30.00th=[   15], 40.00th=[   15], 50.00th=[   15], 60.00th=[   16],
     | 70.00th=[   16], 80.00th=[   16], 90.00th=[   17], 95.00th=[   18],
     | 99.00th=[   20], 99.50th=[   22], 99.90th=[   42], 99.95th=[  119],
     | 99.99th=[  186]
   bw (  KiB/s): min=42592, max=290232, per=100.00%, avg=209653.41, stdev=46502.99, samples=105
   iops        : min=10648, max=72558, avg=52413.32, stdev=11625.75, samples=105
  lat (nsec)   : 250=0.01%, 500=0.01%, 1000=0.01%
  lat (usec)   : 10=0.01%, 20=99.15%, 50=0.76%, 100=0.03%, 250=0.06%
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 500=0.01%
  cpu          : usr=12.62%, sys=30.97%, ctx=2800981, majf=0, minf=28
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2784519,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=178MiB/s (186MB/s), 178MiB/s-178MiB/s (186MB/s-186MB/s), io=10.6GiB (11.4GB), run=61165-61165msec

Disk stats (read/write):
    md1: ios=0/710496, merge=0/0, ticks=0/12788992, in_queue=12788992, util=23.31%, aggrios=319833/649980, aggrmerge=0/0, aggrticks=118293/136983, aggrin_queue=255276, aggrutil=14.78%
  nvme1n1: ios=318781/638009, merge=0/0, ticks=118546/131154, in_queue=249701, util=14.71%
  nvme5n1: ios=321508/659460, merge=0/0, ticks=118683/138996, in_queue=257679, util=14.77%
  nvme2n1: ios=320523/647922, merge=0/0, ticks=120634/134284, in_queue=254918, util=14.71%
  nvme3n1: ios=320809/651642, merge=0/0, ticks=118823/135985, in_queue=254808, util=14.73%
  nvme0n1: ios=316267/642934, merge=0/0, ticks=116772/143909, in_queue=260681, util=14.75%
  nvme4n1: ios=321110/659918, merge=0/0, ticks=116300/137570, in_queue=253870, util=14.78%

Probably one disk is faulty, is there a way how to determine the slow disk?

All disks have similar SMART attributes, nothing outstanding. SAMSUNG 7T:

Model Number:                       SAMSUNG MZQL27T6HBLA-00A07
Firmware Version:                   GDC5902Q
Data Units Read:                    2,121,457,831 [1.08 PB]
Data Units Written:                 939,728,748 [481 TB]
Controller Busy Time:               40,224
Power Cycles:                       5
Power On Hours:                     6,913

write performance appears to be very similar:

iostat -xh
Linux 6.10.6+bpo-amd64 (ts01b)  25/09/24        _x86_64_        (64 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.0%    0.0%    4.3%    0.6%    0.0%   90.2%

     r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz Device
    0.12      7.3k     0.00   0.0%    0.43    62.9k md0
 6461.73    548.7M     0.00   0.0%    0.22    87.0k md1
 3583.93     99.9M     9.60   0.3%    1.13    28.5k nvme0n1
 3562.77     98.9M     0.80   0.0%    1.15    28.4k nvme1n1
 3584.54     99.8M     9.74   0.3%    1.18    28.5k nvme2n1
 3565.96     98.8M     1.06   0.0%    1.16    28.4k nvme3n1
 3585.04     99.9M     9.78   0.3%    1.16    28.5k nvme4n1
 3577.56     99.0M     0.86   0.0%    1.17    28.3k nvme5n1

     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz Device
    0.00      0.0k     0.00   0.0%    0.00     4.0k md0
  366.41    146.5M     0.00   0.0%   14.28   409.4k md1
 8369.26     32.7M     1.18   0.0%    3.73     4.0k nvme0n1
 8364.63     32.7M     1.12   0.0%    3.63     4.0k nvme1n1
 8355.48     32.6M     1.10   0.0%    3.56     4.0k nvme2n1
 8365.23     32.7M     1.10   0.0%    3.46     4.0k nvme3n1
 8365.37     32.7M     1.25   0.0%    3.37     4.0k nvme4n1
 8356.70     32.6M     1.06   0.0%    3.29     4.0k nvme5n1

     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz Device
    0.00      0.0k     0.00   0.0%    0.00     0.0k md0
    0.00      0.0k     0.00   0.0%    0.00     0.0k md1
    0.00      0.0k     0.00   0.0%    0.00     0.0k nvme0n1
    0.00      0.0k     0.00   0.0%    0.00     0.0k nvme1n1
    0.00      0.0k     0.00   0.0%    0.00     0.0k nvme2n1
    0.00      0.0k     0.00   0.0%    0.00     0.0k nvme3n1
    0.00      0.0k     0.00   0.0%    0.00     0.0k nvme4n1
    0.00      0.0k     0.00   0.0%    0.00     0.0k nvme5n1

     f/s f_await  aqu-sz  %util Device
    0.00    0.00    0.00   0.0% md0
    0.00    0.00    6.68  46.8% md1
    0.00    0.00   35.24  14.9% nvme0n1
    0.00    0.00   34.50  14.6% nvme1n1
    0.00    0.00   33.98  14.9% nvme2n1
    0.00    0.00   33.06  14.6% nvme3n1
    0.00    0.00   32.33  14.8% nvme4n1
    0.00    0.00   31.72  14.6% nvme5n1

sort of problematic appears to be interrupts

$ dstat -tf --int24 60
----system---- -------------------------------interrupts------------------------------
     time     | 120   128   165   199   213   342   LOC   PMI   IWI   RES   CAL   TLB 
25-09 10:53:45|2602  2620  2688  2695  2649  2725   136k   36  1245  2739   167k  795 
25-09 10:54:45|  64    64    65    64    66    65  2235     1    26    16  2156     3 
25-09 10:55:45|  33    31    32    32    32    30  2050     1    24    10  2162    20 
25-09 10:56:45|  31    31    30    35    30    33  2303     1    26    63  2245     9 
25-09 10:57:45|  36    29    27    34    35    35  2016     1    23    72  2645    10 
25-09 10:58:45|   9     8     9     8     7     8  1766     0    27     4  1892    15 
25-09 10:59:45|  59    62    59    58    60    60  1585     1    22    20  1704     9 
25-09 11:00:45|  25    21    21    26    26    26  1605     0    26    10  1862    10 
25-09 11:01:45|  34    32    32    33    36    31  1515     0    23    24  1948    10 
25-09 11:02:45|  21    23    23    25    22    24  1772     0    27    27  1781     9

the fields with increased interrupts are mapped to 9-edge to all drives nvme[0-5]q9, e.g.:

$ cat /proc/interrupts | grep 120:
IR-PCI-MSIX-0000:01:00.0    9-edge      nvme2q9

EDIT: The 9-edge is probably Metadisk (Software RAID) devices.

Tombart

Asked: 2021-10-01 00:48:22 +0800 CST

How to generate certificates for (secondary) compile puppetserver?

0

I'm trying to scale puppetserver, in order to have redundancy, using round robin DNS. The secondary puppetserver (version 7.4.0) is configured to use the CA authority from primary puppetserver:

/etc/puppetlabs/puppet/puppet.conf:

[main]
ca_name = Puppet CA: puppet-ca-master.company.com
ca_server = puppet-ca-master.company.com
[agent]
server = puppet-ca-master.company.com
runinterval=1800

On the secondary server I've disabled CA service, as there could be only single certificate authority in /etc/puppetlabs/puppetserver/services.d/ca.cfg:

# To enable the CA service, leave the following line uncommented
# puppetlabs.services.ca.certificate-authority-service/certificate-authority-service
# To disable the CA service, comment out the above line and uncomment the line below
puppetlabs.services.ca.certificate-authority-disabled-service/certificate-authority-disabled-service
puppetlabs.trapperkeeper.services.watcher.filesystem-watch-service/filesystem-watch-service

I've removed certificates from the secondary, in order to fetch certificate signed certificate from the CA master:

rm -rf /etc/puppetlabs/puppet/ssl && mkdir -p /etc/puppetlabs/puppet/ssl/certs
chmod 0700 /etc/puppetlabs/puppet/ssl
chown -R puppet /etc/puppetlabs/puppet/ssl

However the puppetserver service refuses to start because of missing certificate:

2021-09-30T09:06:18.220+02:00 ERROR [async-dispatch-2] [p.t.internal] Error during service start!!!
java.lang.IllegalArgumentException: Unable to open 'ssl-cert' file: /etc/puppetlabs/puppet/ssl/certs/secondary-puppetserver.company.com.pem

When I try to run puppet agent -t on the secondary puppetserver it fails to sign the certificate:

Couldn't fetch certificate from CA server; you might still need to sign this agent's certificate (secondary-puppetserver.company.com)

Moreover the private key gets generated, but not a public one:

ll /etc/puppetlabs/puppet/ssl/public_keys/
total 0

Tombart

Asked: 2021-09-07 13:29:58 +0800 CST

How to debug PostgreSQL segmentation fault?

3

I have a PostgreSQL 13 instance that keeps crashing:

LOG:  server process (PID 10722) was terminated by signal 11: Segmentation fault
DETAIL:  Failed process was running: COMMIT
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

I've updated /etc/postgresql/13/main/pg_ctl.conf to include core dumps

pg_ctl_options = '--core-files'

and restarted postgresql service. Now it seems to allow core dumps:

$ for f in `pgrep postgres`; do cat /proc/$f/limits | grep core; done
Max core file size        unlimited            unlimited            bytes

gdb backtrace gives following output

$ gdb /usr/lib/postgresql/13/bin/postgres 13/main/core.postgres.12264

Program terminated with signal SIGSEGV, Segmentation fault.
#0  slot_deform_heap_tuple (natts=5, offp=0x557cc2e60720, tuple=<optimized out>, slot=0x557cc2e606d8) at ./build/../src/backend/executor/execTuples.c:930
930     ./build/../src/backend/executor/execTuples.c: No such file or directory.
(gdb) bt
#0  slot_deform_heap_tuple (natts=5, offp=0x557cc2e60720, tuple=<optimized out>, slot=0x557cc2e606d8) at ./build/../src/backend/executor/execTuples.c:930
#1  tts_buffer_heap_getsomeattrs (slot=0x557cc2e606d8, natts=5) at ./build/../src/backend/executor/execTuples.c:695
#2  0x0000557cc1d3998c in slot_getsomeattrs_int (slot=slot@entry=0x557cc2e606d8, attnum=5) at ./build/../src/backend/executor/execTuples.c:1912
#3  0x0000557cc1d28fba in slot_getsomeattrs (attnum=<optimized out>, slot=0x557cc2e606d8) at ./build/../src/include/executor/tuptable.h:344
#4  ExecInterpExpr (state=0x557cc2e620a8, econtext=0x557cc2ea1768, isnull=<optimized out>) at ./build/../src/backend/executor/execExprInterp.c:482
#5  0x0000557cc1d5548d in ExecEvalExprSwitchContext (isNull=0x7ffdd2599507, econtext=0x557cc2ea1768, state=0x557cc2e620a8) at ./build/../src/include/executor/executor.h:322
#6  ExecQual (econtext=0x557cc2ea1768, state=0x557cc2e620a8) at ./build/../src/include/executor/executor.h:391
#7  MJFillInner (node=0x557cc2ea1558) at ./build/../src/backend/executor/nodeMergejoin.c:494
#8  0x0000557cc1d55ce8 in ExecMergeJoin (pstate=0x557cc2ea1558) at ./build/../src/backend/executor/nodeMergejoin.c:1353
#9  0x0000557cc1d2cc83 in ExecProcNode (node=0x557cc2ea1558) at ./build/../src/include/executor/executor.h:248
#10 ExecutePlan (execute_once=<optimized out>, dest=0x557cc2e1a630, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x557cc2ea1558, 
    estate=0x557cc2ea12f8) at ./build/../src/backend/executor/execMain.c:1632
#11 standard_ExecutorRun (queryDesc=0x557cc2e1a5a0, direction=<optimized out>, count=0, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:350
#12 0x00007f0ec05ae09d in pgss_ExecutorRun (queryDesc=0x557cc2e1a5a0, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at ./build/../contrib/pg_stat_statements/pg_stat_statements.c:1045
#13 0x0000557cc1cdbcd4 in PersistHoldablePortal (portal=portal@entry=0x557cc2d44b78) at ./build/../src/backend/commands/portalcmds.c:407
#14 0x0000557cc1ff95f9 in HoldPortal (portal=portal@entry=0x557cc2d44b78) at ./build/../src/backend/utils/mmgr/portalmem.c:642
#15 0x0000557cc1ff9e7d in PreCommit_Portals (isPrepare=isPrepare@entry=false) at ./build/../src/backend/utils/mmgr/portalmem.c:738
#16 0x0000557cc1c001c4 in CommitTransaction () at ./build/../src/backend/access/transam/xact.c:2087
#17 0x0000557cc1c015d5 in CommitTransactionCommand () at ./build/../src/backend/access/transam/xact.c:3085
#18 0x0000557cc1ea211d in finish_xact_command () at ./build/../src/backend/tcop/postgres.c:2662
#19 0x0000557cc1ea4703 in exec_simple_query (query_string=0x557cc2c9cd28 "COMMIT") at ./build/../src/backend/tcop/postgres.c:1264
#20 0x0000557cc1ea6143 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x557cc2cf6c68, dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4339
#21 0x0000557cc1e25bcd in BackendRun (port=0x557cc2ce94d0, port=0x557cc2ce94d0) at ./build/../src/backend/postmaster/postmaster.c:4526
#22 BackendStartup (port=0x557cc2ce94d0) at ./build/../src/backend/postmaster/postmaster.c:4210
#23 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1739
#24 0x0000557cc1e26b41 in PostmasterMain (argc=5, argv=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:1412
#25 0x0000557cc1b70f4f in main (argc=5, argv=0x557cc2c96c30) at ./build/../src/backend/main/main.c:210

Adding log_statement = 'all' to /etc/postgresql/13/main/postgresql.conf doesn't really help, as postmaster terminates all processes immediately and the query doesn't get written to logs.

here's strace output after running COMMIT

[pid 20006] pwrite64(29, "COMMIT", 6, 15936) = 6
[pid 20006] pwrite64(29, "\0", 1, 15942) = 1
[pid 20006] close(29)                   = 0
[pid 20006] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x10} ---
[pid 20006] +++ killed by SIGSEGV (core dumped) +++
<... select resumed> )                  = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=20006, si_uid=108, si_status=SIGSEGV, si_utime=0, si_stime=0} ---
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV && WCOREDUMP(s)}], WNOHANG, NULL) = 20006
write(2, "2021-09-08 13:38:51.853 UTC [299"..., 198) = 198
write(2, "2021-09-08 13:38:51.853 UTC [299"..., 88) = 88
kill(19324, SIGQUIT)                    = 0
kill(-19324, SIGQUIT)                   = 0
kill(19331, SIGQUIT)                    = 0
kill(-19331, SIGQUIT)                   = 0
kill(19320, SIGQUIT)                    = 0
kill(-19320, SIGQUIT)                   = 0
kill(19319, SIGQUIT)                    = 0
kill(-19319, SIGQUIT)                   = 0
kill(19321, SIGQUIT)                    = 0
kill(-19321, SIGQUIT)                   = 0
kill(19322, SIGQUIT)                    = 0
kill(-19322, SIGQUIT)                   = 0
kill(19323, SIGQUIT)                    = 0
kill(-19323, SIGQUIT)                   = 0
wait4(-1, 0x7ffe90814374, WNOHANG, NULL) = 0
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
select(7, [5 6], NULL, NULL, {tv_sec=5, tv_usec=0}) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=19320, si_uid=108, si_status=2, si_utime=14, si_stime=3} ---

Is there a way how to trace back the exact SQL query that was executed?

Tombart

Asked: 2021-01-07 15:08:12 +0800 CST

ntpd fails to sync TIME_ERROR: 0x41: Clock Unsynchronized

0

On Debian 10, ntpd 4.2.8p12@1.3728-o fails to sync with following error:

kernel reports TIME_ERROR: 0x41: Clock Unsynchronize

here's ntp.conf:

disable monitor

statsdir /var/log/ntpstats

restrict -4 default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict ::1

server 0.us.pool.ntp.org iburst
server 1.us.pool.ntp.org iburst
server 2.us.pool.ntp.org iburst
server 3.us.pool.ntp.org iburst

server   127.127.1.0
fudge    127.127.1.0 stratum 10
restrict 127.127.1.0

driftfile /var/lib/ntp/drift

ntpq -c sysinfo:

associd=0 status=0614 leap_none, sync_ntp, 1 event, freq_mode,
system peer:        50-205-57-38-static.hfc.comcastbusiness.net:123
system peer mode:   client
leap indicator:     00
stratum:            2
log2 precision:     -23
root delay:         70.634
root dispersion:    3.569
reference ID:       50.205.57.38
reference time:     e3a0c049.c39d770a  Wed, Jan  6 2021 23:03:37.764
system jitter:      0.723169
clock jitter:       1.177
clock wander:       0.000
broadcast delay:    -50.000
symm. auth. delay:  0.000

ntpq -c lpeers:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l  286   64   20    0.000    0.000   0.000
*50-205-57-38-st .GPS.            1 u   19   64   37   70.631    1.618   1.843
-ns1.backplanedn 173.162.192.156  2 u   14   64   37   84.235   -1.575   2.852
+c-73-239-136-18 74.6.168.73      3 u   11   64   37   48.606    1.598   2.522
+time-d.bbnx.net 252.74.143.178   2 u   14   64   37   92.632    0.623   0.799

timedatectl:

               Local time: Wed 2021-01-06 23:06:44 UTC
           Universal time: Wed 2021-01-06 23:06:44 UTC
                 RTC time: Wed 2021-01-06 23:06:44
                Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: no
              NTP service: inactive
          RTC in local TZ: no

Any idea what could be wrong?

Tombart

Asked: 2019-08-15 04:13:24 +0800 CST

Docker packets are not being masqueraded (desipite NAT rules)

0

On a machine with Debian 9 (Linux kernel 4.9) I have a Docker (18.06.1) with some containers in brigde mode. For some strange reason some packets from Docker manage to bypass MASQUERADE rule, enp2s0 is a public interface (Docker uses docker0 interface with 172.17.0.1).

$ tcpdump -vvlnn -i enp2s0 port 3000 and src net 172.16.0.0/12
tcpdump: listening on enp2s0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:57:49.918655 IP (tos 0x0, ttl 63, id 62271, offset 0, flags [DF], proto TCP (6), length 52)
    172.17.0.2.55664 > x.x.x.x.3000: Flags [F.], cksum 0xe40c (correct), seq 9863202, ack 476959401, win 856, options [nop,nop,TS val 1382910659 ecr 2481487487], length 0
11:57:50.126683 IP (tos 0x0, ttl 63, id 62272, offset 0, flags [DF], proto TCP (6), length 52)
    172.17.0.2.55664 > x.x.x.x.3000: Flags [F.], cksum 0xe3d8 (correct), seq 0, ack 1, win 856, options [nop,nop,TS val 1382910711 ecr 2481487487], length 0
11:57:50.546660 IP (tos 0x0, ttl 63, id 62273, offset 0, flags [DF], proto TCP (6), length 52)
    172.17.0.2.55664 > x.x.x.x.3000: Flags [F.], cksum 0xe36f (correct), seq 0, ack 1, win 856, options [nop,nop,TS val 1382910816 ecr 2481487487], length 0

NAT rules from iptables-save:

*nat
:PREROUTING ACCEPT [11397418:724275374]
:INPUT ACCEPT [39095:3038067]
:OUTPUT ACCEPT [1328340:79997617]
:POSTROUTING ACCEPT [5102467:306147980]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -o enp2s0 -j MASQUERADE
-A POSTROUTING -s 172.17.0.3/32 -d 172.17.0.3/32 -p tcp -m tcp --dport 5501 -j MASQUERADE
-A POSTROUTING -s 172.17.0.3/32 -d 172.17.0.3/32 -p tcp -m tcp --dport 5500 -j MASQUERADE
-A POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 3000 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 48842 -j DNAT --to-destination 172.17.0.3:5501
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 48841 -j DNAT --to-destination 172.17.0.3:5500
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 13119 -j DNAT --to-destination 172.17.0.2:3000

I've tried to add MANGLE rules to catch those packets, but so far without any success:

*mangle
:PREROUTING ACCEPT [44457014385:7315518035795]
:INPUT ACCEPT [404840097:241773793538]
:FORWARD ACCEPT [44052174279:7073744241603]
:OUTPUT ACCEPT [526370610:171137381220]
:POSTROUTING ACCEPT [44578544703:7244881613871]
:bogus - [0:0]
:spoofing - [0:0]
-A PREROUTING -s 192.168.0.0/24 -i enp2s0 -j spoofing
-A PREROUTING -s 10.0.0.0/8 -i enp2s0 -j spoofing
-A PREROUTING -s 172.16.0.0/12 -i enp2s0 -j spoofing
-A PREROUTING -s 127.0.0.0/8 ! -i lo -j spoofing
-A PREROUTING -p tcp -m tcp --tcp-flags FIN,SYN FIN,SYN -j bogus
-A PREROUTING -p tcp -m tcp --tcp-flags SYN,RST SYN,RST -j bogus
-A PREROUTING -p tcp -m tcp --tcp-flags FIN,RST FIN,RST -j bogus
-A bogus -j LOG --log-prefix "BOGUS: "
-A bogus -j DROP
-A spoofing -j LOG --log-prefix "IP SPOOF: "
-A spoofing -j DROP
COMMIT

Any idea how can I block those packets?

Forwarded packets:

iptables -vnL FORWARD
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
  44G 7074G DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
  44G 7074G DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
  16G 4358G ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
  54M 3269M DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0           
  28G 2712G ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            state INVALID
    0     0 ACCEPT     all  --  docker0 enp2s0  0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  enp2s0 docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0            LOG flags 0 level 4 prefix "fw forward drop "
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW

Forward rules (partially injected by Docker):

-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -m state --state INVALID -j DROP
-A FORWARD -i docker0 -o enp2s0 -j ACCEPT
-A FORWARD -i enp2s0 -o docker0 -j ACCEPT

Also OUTPUT chain should be dropping invalid packets:

-A OUTPUT -m state --state INVALID -j DROP

Tombart

Asked: 2017-05-25 04:54:45 +0800 CST

hwclock: Cannot access the Hardware Clock via any known method

7

On a Debian server, I'm having problem with hwclock:

$ hwclock --show 
hwclock: Cannot access the Hardware Clock via any known method.
hwclock: Use the --debug option to see the details of our search for an access method.

System runs on backports kernel Debian 4.9.18-1~bpo8+1 (2017-04-10).

Here's debug output:

$ hwclock --debug
hwclock from util-linux 2.25.2
hwclock: cannot open /dev/rtc: Device or resource busy
No usable clock interface found.
hwclock: Cannot access the Hardware Clock via any known method.

clocksource:

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

Finally, rtc device exists:

$ ls -l /dev/rtc*
lrwxrwxrwx 1 root root      4 Apr 29 16:41 /dev/rtc -> rtc0
crw------- 1 root root 253, 0 Apr 29 16:41 /dev/rtc0

Tombart

Asked: 2015-01-09 00:42:46 +0800 CST

systemd-journal in Debian Jessie LXC container eats 100% CPU

3

After creating fresh LXC based on Debian Jessie, on a Ubuntu 14.04, systemd-journal eats all CPU available.

lxc-create -n jessie -t debian

Tombart

Asked: 2014-07-16 06:47:04 +0800 CST

Graphite doesn't display any data

0

I'm trying to setup graphite (megacarbon) with Ceres storage.

Right now the default gui graphite-web does not show any data in Tree / Metrics However in search I'm able to search for my metrics and it shows some results. But I'm unable to plot any data. Any idea what could be wrong?

Is there some way how to dump graphite config, so that I could see which config graphite actually uses?

Tue Jul 15 16:49:22 2014 :: [IndexSearcher] performing initial index load
Tue Jul 15 16:49:22 2014 :: [IndexSearcher] reading index data from /opt/graphite/storage/index
Tue Jul 15 16:49:22 2014 :: [IndexSearcher] index reload took 0.002199 seconds (1143 entries)
Tue Jul 15 16:49:24 2014 :: graphite.wsgi - pid 7979 - reloading search index
Tue Jul 15 16:49:24 2014 :: [IndexSearcher] performing initial index load
Tue Jul 15 16:49:24 2014 :: [IndexSearcher] reading index data from /opt/graphite/storage/index
Tue Jul 15 16:49:24 2014 :: [IndexSearcher] index reload took 0.002313 seconds (1143 entries)
Tue Jul 15 16:49:24 2014 :: find_view query=* local_only=0 matches=0
Tue Jul 15 16:49:24 2014 :: received remote find request: pattern=* from=None until=None local_only=0 format=treejson matches=0

Tombart

Asked: 2014-04-11 02:51:52 +0800 CST

nginx fastcgi rewrite: primary script unknown

1

I have following nginx configuration:

  location / {
      try_files $uri $uri/ index.html =404;

      if (!-e $request_filename) {
        rewrite ^/(.+)$ index.php?url=$1 last;
      }
  }


 location ~ .php$ {
    # protection from known vulnerability
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    include fastcgi_params;
    fastcgi_pass   unix:/var/run/php5-fpm.sock;
    fastcgi_index  index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
 }

(fastcgi_params are defaults from Debian package)

it works for request /, however when the request is rewritten the main file is not found:

request is /contact which should be rewritten to /index.php?url=contact

 *104 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client: 10.0.0.1, server: localhost, request: "GET /contact HTTP/1.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "localhost:8080"

I'm unable to get from logs what is actuall fastcgi trying to load, which path?

Tombart

Asked: 2014-02-06 04:13:05 +0800 CST

Java causing kernel panic

0

I'm running quite a lot of computations on Ubuntu 12.04 (3.8.0-35-generic), application is written in Java (Oracle 1.7.0_45-b18) on a server which has 12 AMD cores. Typically the computations uses 10 threads, sometimes the load average in htop looks like this: 13.46, 13.01, 13.07. But usually the numbers are lower.

Anyway, last week one machine freezed, I wasn't able to connect over ssh, from remote console I took this screenshot, rebooted.

java kernel panic

There was nothing interesting in system logs. Today another machine with very same configuration did the same.

Is there a way how to debug kernel panic?

Tombart

Asked: 2013-11-12 01:29:30 +0800 CST

How to update grub with puppet?

4

I would like to change a line in /etc/default/grub with puppet to this:

GRUB_CMDLINE_LINUX="cgroup_enable=memory"

I've tried to used augeas which seems to do this magic:

   exec { "update_grub":
    command => "update-grub",
    refreshonly => true,
   }

  augeas { "grub-serial":
    context => "/files/etc/default/grub",
    changes => [
      "set /files/etc/default/grub/GRUB_CMDLINE_LINUX[last()] cgroup_enable=memory",
    ],
    notify => Exec['update_grub'],
  }

It seems to work, but the result string is not in quotes and also I want to make sure that any other values will be separated by space.

GRUB_CMDLINE_LINUX=cgroup_enable=memory

Is there some mechanism how to append values and escape the whole thing?

GRUB_CMDLINE_LINUX="quiet splash cgroup_enable=memory"

Tombart

Asked: 2013-05-04 07:02:12 +0800 CST

Can't start ejabberd after hostname change

1

When I try start ejabberd service it always crashes.

Starting jabber server: ejabberd
Crash dump was written to: /var/log/ejabberd/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})

Crash dump was written to: /var/log/ejabberd/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
.

I've changed hostanme of server, before that it worked fine, however in config I have:

{hosts, ["localhost", "private.localhost", "public.localhost"]}.

Tombart

Asked: 2013-04-06 00:53:38 +0800 CST

puppet fileserver serve non-module file

5

I'd like to server file which is located in /etc/puppet/files/key.pgp

  file { "/var/www/key.gpg":
    ensure  => present,
    source  => 'puppet:///files/key.gpg',
  }

I'm getting this error:

Not authorized to call find on /file_metadata/files/key.gpg

auth.conf: (I understood that this should match file*)

path /file
allow *

fileserver.conf:

[files]
  path /etc/puppet/files
  allow *

Am I doing something wrong? I'm using librarian for managing my modules, so I don't wanna put anything configuration specific to modules dir.

Tombart

Asked: 2013-03-18 05:02:09 +0800 CST

How to grant in Puppet a global permissions to PostgreSQL?

1

Official postgresql module from puppetlabs allow granting privileges on specific database.

postgresql::database_grant{'grant to myuser':
    privilege   => 'CREATE',
    db          => 'app_production',
    role        => 'myuser',
  }

this would execute:

 GRANT ${privilege} ON database ${db} TO ${role};

However I'd like to execute query on global permissions for given user:

  ALTER role myuser login createdb;

Is there a way how to do this? Or should I use different puppet module for PostgreSQL?

Tombart

Asked: 2013-03-18 04:39:43 +0800 CST

User can't remove symlink which owns

3

I'm trying to remove symlink, although I have appropriate permission, the operation is denied (the user is called capistrno):

capistrno $ rm -f /var/www/app/current
rm: cannot remove `/var/www/app/current': Permission denied

the user should have all permissions to this file

lrwxrwxrwx 1 capistrano capistrano 42 17. mar 13.09 /var/www/app/current -> /var/www/app/releases/20130317120932/

capistrno $ file /var/www/app/current
/var/www/app/current: symbolic link to `/var/www/app/releases/20130317120932'

Any idea what's wrong?

EDIT:

folder /var/www/app

$ ls -laF /var/www/app/
total 16
drwxr-xr-x 4 www-data   www-data 4096 17. mar 14.15 ./
drwxrwxr-x 4 capistrano www-data 4096 17. mar 00.01 ../
drwxrwxr-x 6 capistrano www-data 4096 17. mar 14.15 releases/
drwxrwxr-x 7 capistrano www-data 4096 17. mar 00.39 shared/

user capistrano belongs to this groups:

$ groups
capistrano www-data rvm

Tombart

Asked: 2013-03-17 13:36:46 +0800 CST

How to install packages from source code with Puppet?

1

I'd like to install source code packages which doesn't have binary packages (deb, rpm) yet.

How do I stop execution of a module in case that the module is already install on that machine?

I'm using:

  Exec {
    creates => "${zookeeper_path}/zookeeper/bin/zkServer.sh"
  }

however all the other block are executed anyway. What is the best way? Checking existence of several files? I don't want to untar and recompile all the modules when puppet check for changes.

EDIT:

The installation process consists of several steps:

fetch tar.gz package
untar package
create several config files
create service
ensure service is running

Tombart

Asked: 2013-03-11 08:57:06 +0800 CST

Why Puppet can require each package just once?

6

When defining dependencies in a class each Package can be globally defined just once. I have hierarchy of configuration and some packages should be installed on all machines (that goes to default configuration) but other should be installed only on some category of machines. How I am supposed to check whether that package is already on a machine when Puppet threat is as a duplicate declaration?

  Duplicate declaration: Package[wget] is already declared

should I use a function like this?

  if defined( Package[$package] ) {
    debug("$package already installed")
  } else {
    package { $package: ensure => $ensure }
  }

I would expect from configuration tool to deal whith this issue by default... am I missing something?

Tombart

Asked: 2013-03-11 06:31:36 +0800 CST

how can I call ruby function basename in puppet

5

I'd like to call function File.basename which is available in Ruby. Is it possible in puppet?

Something like:

$filename = basename($download_url)

Tombart

Asked: 2013-03-08 01:53:20 +0800 CST

how to pass parameters to puppet modules?

6

What is the best practice for configuration of puppet modules? I have puppet 2.7.11. I find this way quite messy, it looks like using global variables.

node default {
   $always_apt_update = true
   include apt
}

Should I create class which would inherit most of configuration from the original? The documentation seems to have too many versions and I'm not sure which one applies for me.

UPDATE:

when I try this:

  class { 'apt': 
    always_update => 'true',
  }

I get an error:

Error 400 on SERVER: Invalid parameter always_update at /etc/puppet/manifests/nodes.pp:32

Tombart

Asked: 2013-03-01 10:13:55 +0800 CST

puppet environment variable $PATH is not set

1

I'm trying to install a module with puppet 2.7 on Debian 6.0 and I keep getting this error:

returns: change from notrun to 0 failed: Could not find command 'tar'

this is the relevant code:

 file {"zookeeper-tarball":
    path => "${zookeeper_parent_dir}/${tarball}",
    source => "puppet:///modules/zookeeper/${tarball}",
    ensure => file,
  }

  exec { "zookeeper_untar":
    path => "${zookeeper_parent_dir}",
    command => "tar -xzf ${zookeeper_parent_dir}/${tarball}",
    cwd => "${zookeeper_parent_dir}",
    user => "$user",
    require =>  File["zookeeper-tarball"],
    creates => "${zookeeper_parent_dir}/zookeeper-${zookeeper_version}",
  }

in manifests/site.pp I have this:

Exec {
  path => "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
}

the user is root. Any idea what could be the problem? It seem like the $PATH is empty...

Terribly slow I/O on NVMe mdadm RAID array

How to generate certificates for (secondary) compile puppetserver?

How to debug PostgreSQL segmentation fault?

ntpd fails to sync TIME_ERROR: 0x41: Clock Unsynchronized

Docker packets are not being masqueraded (desipite NAT rules)

hwclock: Cannot access the Hardware Clock via any known method

systemd-journal in Debian Jessie LXC container eats 100% CPU

Graphite doesn't display any data

nginx fastcgi rewrite: primary script unknown

Java causing kernel panic

How to update grub with puppet?

Can't start ejabberd after hostname change

puppet fileserver serve non-module file

How to grant in Puppet a global permissions to PostgreSQL?

User can't remove symlink which owns

How to install packages from source code with Puppet?

Why Puppet can require each package just once?

how can I call ruby function basename in puppet

how to pass parameters to puppet modules?

puppet environment variable $PATH is not set

Can you pass user/pass for HTTP Basic Authentication in URL parameters?

Ping a Specific Port

Check if port is open or closed on a Linux server?

How to automate SSH login with password?

How do I tell Git for Windows where to find my private RSA key?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?