This question pertains to working with a Python + Poetry project within a Jenkins pipeline and how to retain the .venv/ directory between builds.
SCENARIO:
I have a Jenkins Pipeline job that runs a Python project. The project uses Poetry to create a virtual env in .venv within the workspace. On each subsequent run, the job re-uses .venv as expected, so the pip packages are not re-downloaded every time (unless there is a diff in the poetry.lock file). Everything works as expected.
I want to change the Pipeline to use the Jenkins Workspace Cleanup Plugin: clobber the workspace files but keep some of them, including the pip/poetry/venv environment files. This would let the job re-use the pip packages from the previous run that are still stored in .venv, just as the working Pipeline does today.
Full pipeline file example is at the bottom of this post, but here is a snippet of the cleanWs() portion I've added to the existing Pipeline:
    post {
        always {
            cleanWs(
                deleteDirs: true,
                notFailBuild: true,
                patterns: [
                    [pattern: '.venv', type: 'EXCLUDE'],
                    [pattern: '.venv/**', type: 'EXCLUDE']
                ]
            )
        }
    }
HERE IS THE ISSUE:
The first time the job runs, it works perfectly fine and the workspace cleanup works as expected. The .venv/ directory is retained as expected.
(problem) On subsequent runs of the job, poetry will re-install all the packages and will not re-use the .venv directory:
Creating virtualenv test in /data/jenkins_home/workspace/test-cleanup/.venv
This forces a full re-download of every package, even though .venv already exists. It's been confirmed that /data/jenkins_home/workspace/test-cleanup/.venv already exists before the job runs.
Here is the strange part: if I go to the workspace dir manually on the Jenkins server and run the exact same command, poetry install, it works as expected: .venv is reused and the packages are not reinstalled. So there is something specific about the way it's handled when running in the job that makes it want to recreate the .venv dir.
NOTE that in-project = true is already set for Poetry, so it will always try to use .venv within the current working directory.
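One way to narrow this down from inside the job is to list .venv immediately before and after the checkout step, so the build log shows at which point the directory disappears. This is only a debugging sketch: the two `ls` steps are additions that are not part of the pipelines in this post, and the stage is otherwise the same as in the examples that follow.

```groovy
stage("Prep Build Environment") {
    steps {
        // Debugging aid (not in the original pipeline): is .venv there before checkout?
        // `|| true` keeps the step from failing the build when .venv is absent.
        sh "ls -ld .venv || true"
        script {
            scmVars = git branch: "main", poll: false, url: "[email protected]:my-org/private-repo.git"
        }
        // Same check after checkout, to see whether the checkout step touched it.
        sh "ls -ld .venv || true"
        sh "poetry install"
    }
}
```

If the first listing shows the directory and the second does not, the cleanup step is not what removes it.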
EXAMPLES:
Here is a simple example pipeline that works as expected. When the poetry install step runs, it does not re-download all the packages every time the job runs, only the first time or if there is a diff:
    pipeline {
        agent any
        stages {
            stage("Prep Build Environment") {
                steps {
                    script {
                        scmVars = git branch: "main", poll: false, url: "[email protected]:my-org/private-repo.git"
                    }
                    sh "poetry install"
                }
            }
        }
    }
Here is the new Jenkinsfile pipeline file with cleanWs() added. After this was added, the project no longer re-uses the .venv on each run, even though it still exists:
    pipeline {
        agent any
        stages {
            stage("Prep Build Environment") {
                steps {
                    script {
                        scmVars = git branch: "main", poll: false, url: "[email protected]:my-org/private-repo.git"
                    }
                    sh "poetry install"
                }
            }
        }
        post {
            always {
                cleanWs(
                    deleteDirs: true,
                    notFailBuild: true,
                    patterns: [
                        [pattern: '.venv', type: 'EXCLUDE'],
                        [pattern: '.venv/**', type: 'EXCLUDE']
                    ]
                )
            }
        }
    }
Typically a Jenkins Pipeline will clean the workspace at the start of a build, which is why the .venv folder exists after the build runs but not when the virtualenv step runs during the next build.
If you want to cache or retain some files between builds, the most reliable way to do so is to store those files outside of the workspace. You need to be extremely careful with this, however, because simultaneously running builds that access the same files can lead to resource contention, race conditions, and corrupted files.
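As a rough sketch of that idea, the virtualenv could be moved out of the workspace with Poetry's environment-variable overrides, while concurrent builds of the job are disabled to limit the contention risk. The cache path below is a made-up example, and this assumes that Poetry's POETRY_VIRTUALENVS_* variables take precedence over the in-project setting mentioned in the question; it has not been tested against that setup.

```groovy
pipeline {
    agent any
    options {
        // Keep two builds of this job from writing to the shared virtualenv at once.
        disableConcurrentBuilds()
    }
    environment {
        // Assumption: tell Poetry not to use <workspace>/.venv ...
        POETRY_VIRTUALENVS_IN_PROJECT = "false"
        // ... and to keep virtualenvs in a directory outside the workspace.
        // Hypothetical path; it must be writable by the Jenkins user on the agent.
        POETRY_VIRTUALENVS_PATH = "/data/jenkins_cache/poetry-venvs"
    }
    stages {
        stage("Prep Build Environment") {
            steps {
                script {
                    scmVars = git branch: "main", poll: false, url: "[email protected]:my-org/private-repo.git"
                }
                sh "poetry install"
            }
        }
    }
    post {
        always {
            // The whole workspace can be wiped, since nothing in it needs to survive.
            cleanWs(deleteDirs: true, notFailBuild: true)
        }
    }
}
```

Note that disableConcurrentBuilds() only serializes builds of this one job. Other jobs pointing at the same directory, or builds spread across several agents, would need their own arrangement.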
The quick answer:
Add [pattern: '.git/**', type: 'EXCLUDE'] to your list of cleanWs() patterns. The workspace files, including .venv/, will then persist between job runs.
The long answer:
So the problem seems to be that the git plugin clobbers the workspace when the .git/ directory is missing (when it needs to do a full clone). I haven't seen this behavior documented anywhere.
Since the cleanWs() action removes .git, it triggers this "feature" of the git plugin.
Adding the .git/ directory to the exclude list seems to have solved the original problem and allows the other files to persist between builds.
This behavior can be proven:
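A minimal sketch of a test pipeline for this, assuming foo.txt is simply a marker file written before the checkout and reusing the repo URL from the question. The stage name and the exact shell commands are illustrative guesses, not the original test code.

```groovy
pipeline {
    agent any
    stages {
        stage("Workspace persistence test") {
            steps {
                // Show what survived from the previous build (foo.txt, .git, ...).
                sh "ls -la"
                // Marker file, written before the checkout.
                sh "date > foo.txt"
                git branch: "main", poll: false, url: "[email protected]:my-org/private-repo.git"
                // Second variation only: uncomment to drop the .git data after cloning,
                // which forces a full clone on the next build.
                // sh "rm -rf .git"
            }
        }
    }
}
```

The initial `ls -la` in each build log is what shows whether foo.txt made it through from the previous run.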
Pipeline steps that write a file to the workspace and check out the repo: after the job runs once, foo.txt persists in the workspace.
Remove the .git data after cloning: after the job runs once, foo.txt is now missing at the start of every job run.
With .git/** added to the exclude patterns in the final post/always block, the poetry installs and virtual env are working as expected now.
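A sketch of what that final block looks like, assuming the cleanWs() options from the question with only the extra .git pattern from the quick answer added:

```groovy
post {
    always {
        cleanWs(
            deleteDirs: true,
            notFailBuild: true,
            patterns: [
                // Keep the Poetry virtualenv between builds.
                [pattern: '.venv', type: 'EXCLUDE'],
                [pattern: '.venv/**', type: 'EXCLUDE'],
                // Keep the git data so the next checkout is not a full clone.
                [pattern: '.git/**', type: 'EXCLUDE']
            ]
        )
    }
}
```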