I want to start etcd (single node) in docker from systemd, but something seems to go wrong - it gets terminated about 30 seconds after start.
It looks like the service starts in status "activating" but get terminated after about 30 seconds without reaching the status "active". Perhaps there are any missing signalling between docker container and systemd?
Update (see bottom of post): systemd service status reaches failed (Result: timeout)
- when I remove the Restart=on-failure
instruction.
When I check the status of the etcd service after boot, I get this result:
$ sudo systemctl status etcd● etcd.service - etcd Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2021-08-18 20:13:30 UTC; 4s ago
Process: 2971 ExecStart=/usr/bin/docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 /usr/local/bin/etcd --data-dir=/etcd-data --name etcd0 --advertise-client-urls http://10.0.0.11:2379 --listen-client-urls http://0.0.0.0:2379 --initial-advertise-peer-urls http://10.0.0.11:2380 --listen-peer-urls http://0.0.0.0:2380 --initial-cluster etcd0=http://10.0.0.11:2380 (code=exited, status=125)
Main PID: 2971 (code=exited, status=125)
I run this on an Amazon Linux 2 machine, with a user data script to run at launch. I have confirmed that docker.service
and docker_ecr_login.service
run successfully.
And short after launch of the machine, I can see that the etcd is running:
sudo systemctl status etcd
● etcd.service - etcd
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: activating (start) since Wed 2021-08-18 20:30:07 UTC; 1min 20s ago
Main PID: 1573 (docker)
Tasks: 9
Memory: 24.3M
CGroup: /system.slice/etcd.service
└─1573 /usr/bin/docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd my-aws-account.dkr.ecr.eu-north-1.amazonaws.com...
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.690Z","logger":"raft","caller":"...rm 2"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.691Z","caller":"etcdserver/serve..."3.5"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"membership/clust..."3.5"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"etcdserver/server.go:2...
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"api/capability.g..."3.5"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"etcdserver/serve..."3.5"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"embed/serve.go:9...ests"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.695Z","caller":"etcdmain/main.go...emon"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.695Z","caller":"etcdmain/main.go...emon"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.702Z","caller":"embed/serve.go:1...2379"}
Hint: Some lines were ellipsized, use -l to show in full.
I get the same behavior wether etcd listen to the Node IP (10.0.0.11) or 127.0.0.1.
I can run etcd locally, started from command line (and it does not terminate after 30 seconds), with:
sudo docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd-local \
my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 \
/usr/local/bin/etcd --data-dir=/etcd-data \
--name etcd0 \
--advertise-client-urls http://127.0.0.1:2379 \
--listen-client-urls http://0.0.0.0:2379 \
--initial-advertise-peer-urls http://127.0.0.1:2380 \
--listen-peer-urls http://0.0.0.0:2380 \
--initial-cluster etcd0=http://127.0.0.1:2380
The parameters to etcd is similar to Running a single node etcd - ectd 3.5 documentation.
This is the relevant part of the startup script that is intended to launch etcd:
sudo docker volume create --name etcd-data
cat <<EOF | sudo tee /etc/systemd/system/etcd.service
[Unit]
Description=etcd
After=docker_ecr_login.service
[Service]
Type=notify
ExecStart=/usr/bin/docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data \
--name etcd my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 \
/usr/local/bin/etcd --data-dir=/etcd-data \
--name etcd0 \
--advertise-client-urls http://10.0.0.11:2379 \
--listen-client-urls http://0.0.0.0:2379 \
--initial-advertise-peer-urls http://10.0.0.11:2380 \
--listen-peer-urls http://0.0.0.0:2380 \
--initial-cluster etcd0=http://10.0.0.11:2380
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable etcd
sudo systemctl start etcd
When listing all containers on the machine, I can see that it has been running:
sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a744aed0beb1 my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 "/usr/local/bin/etcd…" 25 minutes ago Exited (0) 24 minutes ago etcd
but I suspect that it cannot be restarted since the container name already exists.
Why does the etcd container get terminated after ~30 seconds, when started from systemd? It appears like it successfully start, but systemd only shows it in status "activating" but never in status "active" and it seem to be terminated after about 30 seconds. Is there some missing signalling from the etcd docker container to systemd? If so, how do I get that signalling correct?
UPDATE:
After removing the Restart=on-failure
instruction in the service unit file, I now get status: failed (Result: timeout):
$ sudo systemctl status etcd
● etcd.service - etcd
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: failed (Result: timeout) since Wed 2021-08-18 21:35:54 UTC; 5min ago
Process: 1567 ExecStart=/usr/bin/docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 /usr/local/bin/etcd --data-dir=/etcd-data --name etcd0 --advertise-client-urls http://127.0.0.1:2379 --listen-client-urls http://0.0.0.0:2379 --initial-advertise-peer-urls http://127.0.0.1:2380 --listen-peer-urls http://0.0.0.0:2380 --initial-cluster etcd0=http://127.0.0.1:2380 (code=exited, status=0/SUCCESS)
Main PID: 1567 (code=exited, status=0/SUCCESS)
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.332Z","caller":"osutil/interrupt...ated"}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.333Z","caller":"embed/etcd.go:36...379"]}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: WARNING: 2021/08/18 21:35:54 [core] grpc: addrConn.createTransport failed ...ing...
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.335Z","caller":"etcdserver/serve...6a6c"}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.337Z","caller":"embed/etcd.go:56...2380"}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.338Z","caller":"embed/etcd.go:56...2380"}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.339Z","caller":"embed/etcd.go:36...379"]}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal systemd[1]: Failed to start etcd.
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal systemd[1]: Unit etcd.service entered failed state.
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal systemd[1]: etcd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
Update: Posting test data and integrating updates based on comments received. docker -d is not required for systemd integration, as originally thought. Type= setting as Michael indicated seems to be in my experience, more important than offloading the daemonized status of a service to docker. The OP problem seemed on first blush to be a side effect of not having backgrounded, as I originally explained. This background seems irrelevant after having tested it further.
Note that the Amazon AWS image used in the OP, is not something that I can test or directly troubleshoot. A contrasting example for etcd and systemd are shown here to help with configuring the endpoint system similarly to mine. System details:
systemd configuration
I ended up with the following systemd service file. Note that Type=simple, owing to Michael suggesting to clarify this point in the response (and, apparently, my own understanding of this piece of the puzzle). You can learn more about systemd types here:
https://www.freedesktop.org/software/systemd/man/systemd.service.html
Type matters; More to the point my original understanding of simple as type, was myopically focused on the lack of communication back to systemd, which caused me to ignore the applicable behavior of what the type setting does in reaction to responses from the called application (in this case docker).
Removal of type, or addition of type to simple, will result in the same behavior regardless. The following configuration in my test worked reliably, as did -d being present or not in the docker run command:
Notes
docker ps
shows the container running, butsystemctl status container-etcd
showed exited and inactive.Confirmation
After updating systemd service file, doing a daemon-reload, etc., I execed into the container and ran a test command against etcd:
docker exec -it etcd sh
etcdctl --endpoints=http://10.4.4.132:2379 member list
Result