In our company we have a huge code base (>100000 files), so we keep it in several git repositories. This gives us a forest of repositories, with one super repository on top that contains only submodule references.
The idea is to have the super repository just as a convenience glue and update it automatically whenever a developer updates any submodule.
I have experimented with the post-receive hook and ended up with the following implementation:
(it involves git plumbing in order to be able to modify the bare repository directly)
#!/bin/bash -e
UPDATED_BRANCHES="^(master|develop)$"
UPDATED_REPOS="^submodules/.+$"
# determine what branch gets modified
read REV_OLD REV_NEW FULL_REF
BRANCH=${FULL_REF##refs/heads/}
if [[ "${BRANCH}" =~ ${UPDATED_BRANCHES} ]] && [[ "${GL_REPO}" =~ ${UPDATED_REPOS} ]];
then
# determine the name of the branch in the super repository
SUPERBRANCH=$FULL_REF
SUBMODULE_NAME=${GL_REPO##submodules/}
# clean the submodule repo related environment
unset $(git rev-parse --local-env-vars)
# move to the super repository
cd "$SUPERREPO_DIR"
echo "Automatically updating the '$SUBMODULE_NAME' reference in the super repository..."
# modify the index - replace the submodule reference hash
git ls-tree $SUPERBRANCH | \
sed "s/^\([0-7]*\) commit \([0-9a-f]*\)\t$SUBMODULE_NAME\$/\1 commit $REV_NEW\t$SUBMODULE_NAME/" | \
git update-index --index-info
# write the tree containing the modified index
TREE_NEW=$(git write-tree)
COMMIT_OLD=$(git show-ref --hash $SUPERBRANCH)
# write the tree to a new commit and use the current commit as its parent
COMMIT_NEW=$(echo "Auto-update submodule: $SUBMODULE_NAME" | git commit-tree $TREE_NEW -p $COMMIT_OLD)
# update the branch reference
git update-ref $SUPERBRANCH $COMMIT_NEW
# shall we also update the HEAD?
# git symbolic-ref HEAD $SUPERBRANCH
fi
Now the questions are:
- Is it a good idea at all to use a git hook to modify a repository other than the one that triggered the event?
- Is the hook implementation OK? (It seems to be working on my machine, but I have no prior experience with git plumbing, so maybe I have omitted something.)
- I guess there is a possibility of race conditions if two (or more) submodules are updated simultaneously. Is it possible to prevent that somehow, e.g. with a lock file? (We are using gitolite as the access layer.)
- Would it be better to use a clone of the super repository for the modification and then push, as opposed to modifying the bare super repository directly?
Thanks in advance.
There are benefits to the implementation you've done, although you have omitted some possible edge cases, such as checking for unstaged changes in other branches (you might want to add or stash first). The alternative is to use a continuous integration system like Jenkins to handle the updates:
https://wiki.jenkins-ci.org/display/JENKINS/Meet+Jenkins
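As a rough illustration of that alternative, the post-receive hook could shrink to a plain notification and leave the actual super repository update to a Jenkins job. This is only a sketch under assumptions: the Jenkins host, the job name `update-superrepo`, the trigger token, and the parameter names `SUBMODULE`, `BRANCH` and `REV` are hypothetical placeholders, and the job is assumed to be parameterized with remote triggering enabled. Depending on the Jenkins security settings, additional authentication may be required.

```shell
#!/bin/bash
# All three values are hypothetical placeholders, not part of the original setup.
JENKINS_URL="https://jenkins.example.com"
JENKINS_JOB="update-superrepo"
JENKINS_TOKEN="changeme"

# Build the remote-trigger URL for a parameterized job.
# Note: the values are not URL-encoded, so this assumes simple repository names.
jenkins_trigger_url() {
    local submodule=$1 branch=$2 rev=$3
    printf '%s/job/%s/buildWithParameters?token=%s&SUBMODULE=%s&BRANCH=%s&REV=%s\n' \
        "$JENKINS_URL" "$JENKINS_JOB" "$JENKINS_TOKEN" "$submodule" "$branch" "$rev"
}

# post-receive gets one "<old> <new> <ref>" line per updated ref on stdin.
# With DRY_RUN set, the URLs are printed instead of being requested.
handle_refs() {
    local rev_old rev_new full_ref url
    while read -r rev_old rev_new full_ref; do
        case $full_ref in
            refs/heads/master|refs/heads/develop)
                url=$(jenkins_trigger_url "${GL_REPO##submodules/}" \
                    "${full_ref##refs/heads/}" "$rev_new")
                if [[ -n ${DRY_RUN:-} ]]; then
                    echo "$url"
                else
                    curl --silent --show-error --fail --request POST "$url"
                fi
                ;;
        esac
    done
}

# In the real hook, finish with: handle_refs
```

The hook then never touches the super repository itself, so the update logic lives in one place (the Jenkins job) and can be changed without redeploying hooks.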
This has several benefits over the git hooks system. It can be centrally controlled (the more complexity we added, the more trouble we had getting the git hooks to work on the different operating systems our engineers used). There is also more functionality available (lots of user-contributed modules). Our repo scripts now contact Jenkins for repo status and can update accordingly.
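If you stay with the hook approach, the race condition raised in the question can be narrowed by serializing the read-modify-write sequence with a lock file. The following is a minimal sketch, not a drop-in replacement: it assumes `flock(1)` from util-linux is installed on the server, and the function name `update_superrepo_ref` and the lock file name `autoupdate.lock` are hypothetical. It also uses a scratch index, so entries left behind by an earlier run on another branch cannot leak into the new tree, and it passes the old commit to `git update-ref`, so the update is refused if the branch moved in the meantime.

```shell
#!/bin/bash
# Hypothetical helper: update one submodule reference (gitlink) in a bare
# super repository while holding an exclusive lock.
update_superrepo_ref() {
    local superrepo_dir=$1 ref=$2 submodule=$3 rev_new=$4
    local lock_file="$superrepo_dir/autoupdate.lock"   # hypothetical lock path
    local tmp_dir commit_old tree_new commit_new

    (
        # Wait until no other hook instance holds the lock. The lock is
        # released when this subshell exits and file descriptor 9 is closed.
        flock 9 || exit 1

        cd "$superrepo_dir" || exit 1
        # Drop repository-related variables inherited from the submodule hook.
        unset $(git rev-parse --local-env-vars)

        # Scratch index in a fresh directory (git rejects an empty index file).
        tmp_dir=$(mktemp -d) || exit 1
        trap 'rm -rf "$tmp_dir"' EXIT
        export GIT_INDEX_FILE="$tmp_dir/index"

        commit_old=$(git rev-parse --verify "$ref^{commit}") || exit 1
        git read-tree "$commit_old" || exit 1
        # Mode 160000 marks a gitlink, i.e. a submodule reference.
        git update-index --cacheinfo "160000,$rev_new,$submodule" || exit 1
        tree_new=$(git write-tree) || exit 1
        commit_new=$(git commit-tree "$tree_new" -p "$commit_old" \
            -m "Auto-update submodule: $submodule") || exit 1

        # Third argument: only update if the ref still points at commit_old.
        git update-ref "$ref" "$commit_new" "$commit_old"
    ) 9>"$lock_file"
}
```

Inside the hook this would be called as, for example, `update_superrepo_ref "$SUPERREPO_DIR" "$FULL_REF" "$SUBMODULE_NAME" "$REV_NEW"`. The lock only protects against other instances of this hook; the compare-and-swap in `git update-ref` is what catches someone pushing to the super repository directly at the same moment.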