Aaron's questions -server

Aaron

Asked: 2018-04-11 13:26:11 +0800 CST

Random timestamp on first syn-ack on loopback

4

Preface

We are testing a host-based IPS. In this test case, our application listens on the loopback interface and receives traffic in clear text. We use either nginx or haproxy to terminate TLS on the public interface. Our IDP monitors the loopback so that it can see the unencrypted traffic.

Our IDP was seeing malformed / incorrect dates, so we started digging deeper.

[ Update 2 ] As @kasperd mentioned, tcpdump gets its timestamps from the OS. That said, it turns out this bug is tripping up the IDP as well as tcpdump. The IDP sees the connection_established, but fails to see a valid http session because the syn-ack is not valid.

A bug has been filed on redhat.com and centos.org.

Observation

The first syn-ack on the loopback always has a date close to the start of the epoch, or within 2 years of it on VMs. It varies wildly from Dec 1970 to Feb 1973 on VMs and is far in the future on bare metal Xeon servers. NTP is correct on all of our VMs and bare metal servers, with less than 50ms drift.

This only happens on the loopback. We never see this on bond0 on the servers or eth0 on VMs.

Test Servers and laptops

OS: CentOS 7

Platforms:

Dell 20 Core Xeon servers ( bare metal host OS )

HP 20 Core Xeon servers ( bare metal host OS )

VirtualBox on MacOS

Hyper-V on Windows 10 Enterprise on Lenovo P50 with 6 virtual cores.

One Celeron 4 Core 1.6 GHz based router (can not reproduce on the Celeron)

Steps to reproduce

On each platform, we start up a web listener on port 80 on the loopback.

./simple_python 127.0.0.1 &

The code for above is here
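For anyone who wants to reproduce this without the linked script, here is a minimal sketch of a listener that should behave similarly. It is a hypothetical stand-in, not the actual simple_python used in these tests; the names OkHandler and make_server are made up for illustration, and it assumes Python 3 with only the standard library. Binding to port 80 requires root.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for ./simple_python: a tiny HTTP listener on one address."""
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


class OkHandler(BaseHTTPRequestHandler):
    """Answer every GET with a short plain-text body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Stay quiet so the tcpdump output is easier to read.
        pass


def make_server(host="127.0.0.1", port=80):
    """Create (but do not start) an HTTP server bound to host:port."""
    return HTTPServer((host, port), OkHandler)


# Only start serving when a bind address is given, e.g. ./listener.py 127.0.0.1
if __name__ == "__main__" and len(sys.argv) > 1:
    make_server(sys.argv[1]).serve_forever()
```

Any listener that completes a TCP handshake on the loopback should be enough to trigger the capture below, since the odd timestamp is on the syn-ack itself.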

Then we start up tcpdump

tcpdump -p -NNnn -XXxx -tttt -vv -s0 -c2 -i lo &

Then we curl to localhost

curl -s -o /dev/null http://127.0.0.1/

Output

2018-04-10 21:05:30.087769 IP (tos 0x0, ttl 127, id 49233, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.25134 > 127.0.0.1.80: Flags [S], cksum 0xfe30 (incorrect -> 0xce27), seq 4053136920, win 65495, options [mss 65495,sackOK,TS val 22951497 ecr 0,nop,wscale 13], length 0
    0x0000:  0000 0000 0000 0000 0000 0000 0800 4500  ..............E.
    0x0010:  003c c051 4000 7f06 3d68 7f00 0001 7f00  .<.Q@...=h......
    0x0020:  0001 622e 0050 f195 f618 0000 0000 a002  ..b..P..........
    0x0030:  ffd7 fe30 0000 0204 ffd7 0402 080a 015e  ...0...........^
    0x0040:  3649 0000 0000 0103 030d                 6I........
1973-02-14 22:12:10.785902 IP (tos 0x0, ttl 127, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.80 > 127.0.0.1.25134: Flags [S.], cksum 0xfe30 (incorrect -> 0x2f28), seq 3928063281, ack 4053136921, win 65483, options [mss 65495,sackOK,TS val 22951497 ecr 22951497,nop,wscale 13], length 0
    0x0000:  0000 0000 0000 0000 0000 0000 0800 4500  ..............E.
    0x0010:  003c 0000 4000 7f06 fdb9 7f00 0001 7f00  .<..@...........
    0x0020:  0001 0050 622e ea21 7d31 f195 f619 a012  ...Pb..!}1......
    0x0030:  ffcb fe30 0000 0204 ffd7 0402 080a 015e  ...0...........^
    0x0040:  3649 015e 3649 0103 030d                 6I.^6I....

In every case, the syn-ack has a date between 1970 and 1973 on VMs and way in the future on the Xeons.
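To check a longer capture for this, a rough sketch like the following can pick out packets whose capture timestamp is wildly off. It assumes the text output of tcpdump -tttt as shown above, and it assumes the first packet in the capture (the SYN) has a sane timestamp, which matches what we observed. The function names and the one hour threshold are arbitrary choices for illustration.

```python
import re
from datetime import datetime, timedelta

# tcpdump -tttt starts each packet summary line with "YYYY-MM-DD HH:MM:SS.ffffff ".
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) ")


def packet_times(lines):
    """Return the capture timestamps found in tcpdump -tttt output lines."""
    times = []
    for line in lines:
        match = STAMP.match(line)
        if match:  # hex dump and continuation lines are indented and do not match
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def suspicious(times, max_gap=timedelta(hours=1)):
    """Return timestamps more than max_gap away from the first packet's timestamp."""
    if not times:
        return []
    reference = times[0]
    return [t for t in times if abs(t - reference) > max_gap]
```

Fed the two packets above, suspicious() returns only the 1973-02-14 timestamp of the syn-ack.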

I can reproduce this 100% of the time on each of the platforms, except for the Celeron. We don't use Celerons in the data-center. I was just trying to find something not affected.

What else have I tried to make this go away?

I have tried pinning applications to a core using taskset.
I tried setting different variables that affect libc, such as TZ, LANG, LC_ALL, etc...
I have tried disabling all offloading capabilities of the interface, despite it being a loopback and those shouldn't actually do anything.
I have tried a few different sysctl settings.
I tried using different snaplen in tcpdump. ( I am aware of some historical issues around snaplength )
I verified the hardware clock is correct.

What I have not tried

I have not tried to set up receive flow steering, since we would not do this in our data-centers without really good reason.
There are probably a myriad of other things I could try, but this really looks like a libc / buffer / race condition bug of some kind.

Any thoughts on where in the Linux code this might be occurring? I am hesitant to dig into glibc as I am not a C developer.

[Update] It appears @jackthecoiner found where someone else is having this issue as well and has not received any feedback on the Redhat site as of yet.

Aaron

Asked: 2017-07-01 17:31:10 +0800 CST

Proper method to disable polkit.service on CentOS 7

7

Why?

My services are started via proper unit files or init scripts. I have no need for regular users to do anything special on my servers beyond su. I am specifically looking for a way to completely shut down polkit without it starting up on its own when other services are restarted.

I foresee a problem explaining this to auditors in our PCI environment as well. We have to describe the purpose of each service. We do not have a legit use case for polkit in a PCI environment.

Additional note: I did not install polkit. These servers have a very minimal install, around 670MB on /. It appears to have been a systemd update that installed polkit, and the spec apparently has dependencies on all systemd managed services. Once it is installed, I have to rebuild the machine to remove it, just like trying to remove nss once you install it. My concern is that if I force the uninstall, it may leave files behind that will trip up systemd, which assumes polkit is there.

What I have tried:

Create /etc/polkit-1/rules.d/99-deny-all.rules with

polkit.addRule(function(action, subject) {
    return polkit.Result.YES;
});

Then

systemctl daemon-reload && systemctl daemon-reexec

This does nothing, /usr/lib/polkit-1/polkitd --no-debug continues to start when other services under systemd are restarted.

[ Update ] As Alexander mentioned, restarting polkit will apply the settings to polkit itself and that is good, but I am looking for a way to tell polkit to not start that does not break other services.

[ update 2 ] This may actually prevent some services from re-starting correctly.

Mask or disable the service:

This causes other services to hang on startup and shutdown, waiting for polkit.

Edit /usr/share/dbus-1/system-services/org.freedesktop.PolicyKit1.service with:

[snip]
Exec=/bin/false
[snip]

Then

systemctl daemon-reload && systemctl daemon-reexec

This does nothing, /usr/lib/polkit-1/polkitd --no-debug continues to start when other services under systemd are restarted.

I have read the man pages a couple times. It's probably something really simple I am missing. My preference would be for a method that persists after systemd package updates.

The end goal I am looking for is for polkit.service to not start when other daemons are restarted, such as unbound, bind, dhcp, etc.
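To confirm whether polkitd really comes back after restarting another daemon, a small check like the one below can be run before and after the restart. This is only a sketch: it assumes a Linux /proc filesystem, and the find_processes helper and its proc_root parameter are made up for illustration.

```python
import os


def find_processes(needle, proc_root="/proc"):
    """Return sorted (pid, cmdline) pairs whose command line contains needle."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # the process exited or is not readable
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\x00", b" ").decode("utf-8", "replace").strip()
        if needle in cmdline:
            matches.append((int(entry), cmdline))
    return sorted(matches)
```

Calling find_processes("polkitd") before and after restarting unbound, bind, or dhcp shows whether /usr/lib/polkit-1/polkitd --no-debug was started again and under which PID.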

Aaron

Asked: 2016-02-01 11:50:48 +0800 CST

Recursive forwarding Bind DNS server not answering from cache

0

Problem Statement

I have a bind caching and forwarding server that is functioning almost as expected. All queries are forwarded and their TTLs are cached. At first blush, everything appeared normal, until I noticed that the response time never fell below the 150ms latency of my VPN.

Upon performing tcpdump, I found that even though bind was well aware of the TTL being greater than 0 (TTL of 14000 in this case), the resolver was still forwarding every request. The TTL of the response still decrements as expected, but every single request is still forwarded upstream regardless of the TTL remaining in the cached record.

If I disable forwarding, this behavior ceases and cache works as expected.

Version

Bind 9.9.4-29 (Red Hat fork)
OS: CentOS 7

Relevant Parts of the Configuration

allow-query { local; };
recursion yes;
allow-recursion { local; };
max-cache-size unlimited;
stacksize unlimited;
datasize unlimited;

zone "." IN {
    type forward;
    forward first;
    forwarders { 192.168.120.3; 192.168.120.2; };
};

I have tried both "forward first;" and "forward only;". This resolver has almost no load on it and several GB of memory available. At any given time, it never has more than a few hundred records cached.
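One way to see the symptom without tcpdump is to time the same query several times in a row against the resolver. The sketch below builds a bare-bones A record query by hand and measures the round trip over UDP. It is an illustration only: build_query and time_query are made-up names, the query ID is arbitrary, and the response is not parsed or validated.

```python
import socket
import struct
import time


def build_query(name, qid=0x1234):
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def time_query(server, name, port=53, timeout=2.0):
    """Return the round-trip time in milliseconds for one UDP query."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_query(name), (server, port))
        sock.recvfrom(4096)  # wait for any reply; contents are ignored here
        return (time.monotonic() - start) * 1000.0
```

If the cache is being used, the second and later calls to time_query() for the same name should come back well under the 150ms VPN latency. In my case they would not.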

Is it likely that I have the wrong expectation of behavior in bind? Using Unbound I did not see this behavior, but I would like to switch back to bind for other reasons.
