Rino Bino

Asked: 2022-10-01 16:43:15 +0800 CST

Jenkins pipelines do not resume properly after Jenkins restart

Issue Summary:

Jenkins LTS with the Durable Task plugin does not properly resume a pipeline job if the Jenkins service is restarted while the job is running.

This is a regression in Jenkins 2.3x and seems to coincide with the migration to systemd (it used to work perfectly fine in 2.2x).

Steps to reproduce the issue:

1. Start with a single-node Jenkins host with the Durable Task plugin installed.
2. Start a pipeline job on the host. A sample pipeline is included at the bottom of this question.
3. While the job is running, restart the Jenkins service with "service jenkins restart" (or restart it using jenkins-cli.jar).
4. After Jenkins starts, the job attempts to resume but eventually fails (log below).

Resuming build at Tue Jul 19 23:26:56 UTC 2022 after Jenkins restart
Waiting to resume part of test-job #5: Waiting for next available executor
Ready to run at Tue Jul 19 23:27:01 UTC 2022
wrapper script does not seem to be touching the log file in /data/jenkins_home/workspace/test-job@tmp/durable-b0167617
(JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)

After the above message is logged, the job goes into a "failed" state.
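
One way to check whether anything is still updating the log file named in the message is to look at its modification time. The sketch below is a hypothetical helper, not part of Jenkins or the Durable Task plugin. It assumes GNU stat (Linux), and the file name used in the commented example is an assumption about what the control directory contains.

```shell
#!/bin/sh
# Hypothetical helper: print how many seconds ago a file was last modified.
# Assumes GNU stat (Linux); on BSD/macOS the equivalent would be `stat -f %m`.
seconds_since_touch() {
  now=$(date +%s)
  # Fail if the file does not exist or cannot be read.
  mtime=$(stat -c %Y "$1") || return 1
  echo $((now - mtime))
}

# Example call (the file name inside the control directory is an assumption):
# seconds_since_touch /data/jenkins_home/workspace/test-job@tmp/durable-b0167617/jenkins-log.txt
```

If the number keeps growing while the step is supposedly running, nothing is writing to the file any more, which would suggest the wrapper process is gone rather than merely slow.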

Manually touching/writing to the mentioned log file does not resolve the problem.
The issue is neither the filesystem nor available memory, as solutions in related tickets/posts have suggested. (This is a regression in the latest versions of Jenkins.)
There are no available plugin updates (fully up to date).
This seemed to start when we upgraded to version 2.332, which also included the migration to systemd. So it is possible that restarting the service through systemd (versus the old init system used prior to 2.332) is breaking the durable tasks.
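
The systemd suspicion could be narrowed down by checking the KillMode of the service unit. With systemd's default, KillMode=control-group, stopping a unit kills every process left in the unit's control group, not only the main process. On a single-node host that would include shell steps started by Jenkins itself. Whether this is what actually happens here is an assumption, not something confirmed in this question. The sketch below uses a hypothetical helper function, classify_kill_mode, and the unit name "jenkins" is taken from the restart command above.

```shell
#!/bin/sh
# Hypothetical helper: classify a "KillMode=..." line as printed by
# `systemctl show <unit> --property=KillMode`.
classify_kill_mode() {
  case "$1" in
    # Remaining processes in the unit's control group are killed on stop.
    KillMode=control-group|KillMode=mixed)
      echo "children-killed" ;;
    # Only the main process (or nothing) is killed on stop.
    KillMode=process|KillMode=none)
      echo "children-survive" ;;
    *)
      echo "unknown" ;;
  esac
}

# Only query systemd when systemctl is available on this host.
if command -v systemctl >/dev/null 2>&1; then
  classify_kill_mode "$(systemctl show jenkins --property=KillMode 2>/dev/null)"
fi
```

A result of "children-killed" would be consistent with the wrapper script being terminated during the restart, but it does not prove that this is the cause of the failure.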

This issue has been filed on the Jenkins official tracker: https://issues.jenkins.io/browse/JENKINS-69061

However, nobody has responded to that report in over two months, so I'm asking here whether anyone has an idea what the cause could be or knows of potential workarounds, and also to increase visibility of the problem.

Example minimal/simple pipeline used in testing this issue:

pipeline {
  agent any

  stages {

    stage("Sleep for 60 seconds") {
      steps {

        echo "Go restart jenkins service now and see that this job won't resume"

        sh "sleep 60"

        echo "The job will never get this far"

      }
    }
  }
}


0 Answers