Git Submodules are Fun
As usual, because Git is a confusing, convoluted mess of nested questionably-designed wrapper commands that often do not accurately describe their actual functionality, submodules are not as simple as they appear on the surface, and misunderstanding them can lead to headaches.
As usual.
How submodules are implemented
Before discussing what a submodule is, one must understand gitlinks.
Git keeps track of repository files and their changes by storing them inside objects. These objects represent file types and permissions in a bitmask, following Git’s own specification. While similar to and modeled after traditional Unix/POSIX mode bitmasks, Git’s rendition actually only supports a small fixed list of modes, with additional Git-only bits. In particular, the specification defines a unique file type called “gitlink” that contains a commit object ID—or more colloquially, a SHA-1 commit hash. While this object has a filesystem path associated with it, do note that how this path is used, what repository the commit object belongs to, and even how the commit object is represented in the filesystem is not actually defined here.
Submodules are an extension of this functionality, using gitlinks to keep track of what commit the submodule is currently at within the superproject. Other information handled by submodules includes the remote repository it pulls from, how and where the repository is cloned, and how and where to check out its working tree.
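To make the gitlink idea concrete, here is a minimal sketch that stages one by hand with plumbing. The repository, the vendor/lib path, and the reuse of the repo's own HEAD as the "submodule commit" are all made up for the demo; the point is only that the index entry is a mode plus a commit ID, with no clone or .gitmodules behind it.

```shell
set -eu
# Throwaway repository; every path and name here is hypothetical.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "initial"

# Any commit ID will do for the demonstration; reuse this repo's own HEAD.
commit=$(git rev-parse HEAD)

# Plumbing: stage an entry with the gitlink mode (160000) at vendor/lib.
# No .gitmodules, no clone, and no working-tree directory are involved.
git update-index --add --cacheinfo "160000,$commit,vendor/lib"

# Print the raw index entry: <mode> <object id> <stage> <path>
entry=$(git ls-files --stage -- vendor/lib)
echo "$entry"
```

Nothing in that entry says which repository the commit belongs to. That is the part submodules bolt on.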
How users interact with submodules
Given the above information, you might be fooled into thinking submodules are simple and straightforward. How wrong you are.
Submodules use multiple configuration files across the repository, and if you touch one area of them without also updating the other, you can easily cause errors and stoppages that, on their surface, present no obvious method of resolution.
Submodule operations may read and write any selection of the following, and it is not always obvious which:
- .gitmodules, under [submodule "<submodule-name>"] sections
- .git/config, under [submodule "<submodule-name>"] sections
- .git/modules/<submodule-name>
- the submodule path in the index
- the submodule path in the working tree
There are certain ways to interact with submodules, and doing so in an unsupported way will result in the whole thing falling over. Fun! 🥳
Keep in mind the following when dealing with submodules, if you want to avoid problems. I’ve highlighted potentially fun bits of information in bold.
Submodule names are also filesystem paths
Submodules have names, and these names can contain path separators. By default, the name of a submodule is the filesystem path it is checked out to, but it can really be anything you want. Submodule paths do not have to match their name, and if a submodule is moved without renaming it, the new path and name will not match.
Cloned Git repositories of submodules are stored locally in .git/modules. Each repository is put in a subdirectory created using the submodule’s name. If the name contains path separators, further subdirectories will be created to fully resolve the path. For example, a submodule named hello/world will find its Git repo stored in .git/modules/hello/world.
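A quick way to see the name-to-directory mapping is to add a submodule whose name differs from its path. This is only a sketch: the upstream repo, the libs/thing path, and the demo identity are all invented, and the protocol.file.allow override is there purely because the demo clones from a local path, which recent Git versions refuse for submodules by default.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Identity for the demo commits only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# A stand-in "remote" to add as a submodule.
git init -q upstream
git -C upstream commit -q --allow-empty -m "initial"

git init -q super
cd super
git -c protocol.file.allow=always \
    submodule add --name hello/world "$work/upstream" libs/thing

# The name, not the path, decides where the clone is stored.
ls -d .git/modules/hello/world

# The path is recorded under the name in .gitmodules.
git config -f .gitmodules submodule.hello/world.path
```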
This isn’t really a problem for most users in most situations, and it’s possible to specify custom names for submodules with git submodule add --name. But what if the user didn’t realize that before creating it, and they want to reformat their names? What if they want a moved submodule’s name to match its new path, or to have the names not contain path elements at all after realizing they do by default? Perhaps the repo or project the submodule points to changed somehow, and the user wants the name to reflect the new information. Users with issues like these may attempt to rename their submodules in .gitmodules, which is the most immediately obvious place to look, but this won’t work by itself and will cause errors later on.
Renaming submodules
There is actually no built-in way to rename a submodule. The easiest way to accomplish such a task with native Git tooling is to simply recreate it:
# unregister the submodule; will remove it from .git/config
# NOTE: will empty the submodule working tree as well!
git submodule deinit <submodule-path>
# remove the gitlink from the index
git rm --cached <submodule-path>
# recreate the submodule; will add a new entry to .gitmodules, and add a new
# gitlink to the index to replace the old one
git submodule add --name <new-name> <repo-path> <submodule-path>
# re-init the submodule; will add it back to .git/config with the new name
git submodule init <submodule-path>
However, doing this will not remove the old submodule entry from .gitmodules; that must be done manually. Just as there is no git submodule mv command or similar, there is also no git submodule rm: git submodule add has no direct inverse!
If you don’t want to touch the index, or don’t want to clear the submodule directory as a consequence of git submodule deinit, it is possible to rename a module manually by editing files only. Renaming the module in both .gitmodules and .git/config, and moving the repo to the new name in .git/modules, covers every place the name itself is stored. If the submodule is checked out, the gitdir line in <submodule-path>/.git also has to be pointed at the new location, since it contains the old .git/modules path (and core.worktree in the moved repo’s config needs adjusting if the new name sits at a different directory depth). Furthermore, if the module hasn’t been initialized with git submodule init, then .git/config doesn’t need to be edited. Likewise, there will be no repo in .git/modules/<old-name> if the submodule hasn’t been checked out with git submodule update before.
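The file-only rename might look roughly like the sketch below. The names old-name and new-name, the libs/thing path, and the local upstream repo are hypothetical. One assumption beyond the steps described above: because the submodule is checked out here, the sketch also rewrites the gitdir pointer in its .git file, which records the old .git/modules location.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q upstream
git -C upstream commit -q --allow-empty -m "initial"
git init -q super
cd super
# protocol.file.allow is only needed because the demo clones a local path.
git -c protocol.file.allow=always \
    submodule add --name old-name "$work/upstream" libs/thing

# 1. Rename the section in .gitmodules (tracked) and .git/config (local).
git config -f .gitmodules --rename-section submodule.old-name submodule.new-name
git config --rename-section submodule.old-name submodule.new-name

# 2. Move the cloned repository to the directory for the new name.
mv .git/modules/old-name .git/modules/new-name

# 3. Repoint the checked-out submodule's gitfile at the moved repository.
#    The path is relative to libs/thing, hence the two "..".
printf 'gitdir: ../../.git/modules/new-name\n' > libs/thing/.git

# The submodule should still resolve under its new name.
git submodule status
git config -f .gitmodules submodule.new-name.path
```

Both names sit at the same depth under .git/modules, so the relative core.worktree value in the moved repo stays valid. A rename that changes the depth would need that value adjusted too.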
Submodule filesystem paths are also keys in the index
The filesystem paths of submodules are written to .gitmodules in the repository. Rather than use the submodule name, its path is what is used to associate a gitlink in the index with the submodule. This means that moving a submodule around in the working tree requires moving the gitlink as well.
I imagine this would be particularly confusing for new users that want to move their submodule somewhere else. Upon discovering that there isn’t a dedicated git submodule command to do so, the clever user may believe they can edit .gitmodules directly to point at a new path, and mv the directory themselves to the new destination. But this will not work, and will cause errors later on. In reality, the user must instead use git mv and not touch .gitmodules. As far as Git is concerned, submodules are files and must be moved as such… and git mv must specifically be used so that other files that reference the submodule are also updated. Updating these files manually before using git mv will also cause issues.
Specifically, note that git mv-ing a submodule will properly update everything necessary to ensure the submodule continues to work. The user needn’t worry about implementation details like .gitmodules or .git/modules. While this is a good thing, the documentation does not make it very clear this is how it should be done, and that the other files and repositories basically never need to be touched. (And in fact you will have a bad time if you touch them.)
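As a rough illustration of the supported route, the sketch below moves a submodule with git mv and then checks what changed. The libs/thing and vendor/thing paths and the local upstream repo are made up, and the parent directory is created first because git mv does not create it.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q upstream
git -C upstream commit -q --allow-empty -m "initial"
git init -q super
cd super
# protocol.file.allow is only needed because the demo clones a local path.
git -c protocol.file.allow=always submodule add "$work/upstream" libs/thing
git commit -q -m "add submodule"

# Move the submodule the supported way.
mkdir vendor
git mv libs/thing vendor/thing

# The name is still the original default (libs/thing); only the path moved.
git config -f .gitmodules submodule.libs/thing.path

# The gitlink now lives at the new path in the index...
git ls-files --stage -- vendor/thing

# ...and the checked-out submodule still finds its repository.
git -C vendor/thing rev-parse HEAD
```

Notice the side effect mentioned earlier: after the move, the name and the path no longer match.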
If you want to update the path yourself, or fix a botched move, you must make sure to update the index, submodule.<name>.path in .gitmodules, core.worktree in .git/modules/<submodule-name>/config, and gitdir in <submodule-path>/.git to point to the correct path. All paths should be relative, and neither of the last two paths will exist if the submodule hasn’t been checked out with git submodule update. Note that git submodule deinit will empty the submodule path in the working tree, but will only remove the core.worktree line from the submodule repo config and not the entire repo directory in .git/modules.
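Before hand-editing any of these, it helps to look at what they currently contain. The sketch below prints all four pointers for a freshly added submodule; the name thing, the path libs/thing, and the local upstream repo are assumptions for the demo.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q upstream
git -C upstream commit -q --allow-empty -m "initial"
git init -q super
cd super
# protocol.file.allow is only needed because the demo clones a local path.
git -c protocol.file.allow=always \
    submodule add --name thing "$work/upstream" libs/thing

# 1. The gitlink in the index, keyed by path.
git ls-files --stage -- libs/thing

# 2. The path recorded under the submodule's name in .gitmodules.
path_in_gitmodules=$(git config -f .gitmodules submodule.thing.path)
echo "$path_in_gitmodules"

# 3. The repo's pointer back to its working tree (relative to the repo dir).
worktree=$(git config -f .git/modules/thing/config core.worktree)
echo "$worktree"

# 4. The working tree's pointer to its repo (relative to the submodule path).
gitfile=$(cat libs/thing/.git)
echo "$gitfile"
```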
Submodules and their remotes can fall out of sync
Submodules are cloned from remote repositories. When creating a submodule, the remote is stored in .gitmodules. This then has to be copied to .git/config by running git submodule init… And unless you know/remember to run git submodule sync after you git pull, upstream changes to a submodule’s remote in .gitmodules won’t be reflected in your local .git/config. This can result in hidden problems caused by outdated submodules, where the user may otherwise expect git submodule update to take care of everything. Hell, users may not even know git pull doesn’t update their submodules for them: while Git will fetch the changes to submodules necessary to resolve the commit specified by the gitlink in the index, it won’t check them out for you.
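The drift between .gitmodules and .git/config is easy to reproduce. In this sketch the URL change is simulated by editing .gitmodules directly, standing in for a change that would normally arrive via git pull; the mirror URL, the name thing, and the local upstream repo are all hypothetical.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q upstream
git -C upstream commit -q --allow-empty -m "initial"
git init -q super
cd super
# protocol.file.allow is only needed because the demo clones a local path.
git -c protocol.file.allow=always \
    submodule add --name thing "$work/upstream" libs/thing

# Simulate an upstream change to the remote recorded in .gitmodules.
git config -f .gitmodules submodule.thing.url https://example.invalid/mirror.git

# .git/config still holds the URL copied when the submodule was added.
before=$(git config submodule.thing.url)

# sync copies the URL from .gitmodules into .git/config (no fetch happens).
git submodule sync -- libs/thing
after=$(git config submodule.thing.url)

echo "before sync: $before"
echo "after sync:  $after"
```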
Implementation details
Based on my above descriptions, it may seem like submodules are incomprehensible. But in reality, they aren’t that complex. Going into a bit more detail on how exactly everything ties together can help puzzle stuff out:
- The submodule name, remote path/URL, and filesystem path are stored in the .gitmodules config file, and this file is automatically updated when adding a new submodule with git submodule add. This file is what connects index gitlinks to repositories and other metadata, as mentioned.
- Submodule names, remotes, and update methods are synced into submodule sections in .git/config when running git submodule init. git submodule sync can be used to re-sync the remote URL if .gitmodules changes later on. It is set up like this to allow remotes to be overridden locally without updating .gitmodules in the worktree.
  - N.B.: Filesystem paths are not copied to .git/config! The index is considered the source of truth for submodule paths, and cannot be overridden.
- Submodule names, as mentioned, are only used to associate submodules in .gitmodules and .git/config, and as the path to the repo in .git/modules/<submodule-name>. This is so that submodules’ remotes and filesystem paths can change without requiring changes to local configurations or moving directories around in .git/modules. As long as the name is the same, the remote information and locally cloned repository can be updated in .gitmodules safely.
- The desired commit of a submodule is stored in the index inside a gitlink file object. This object points to the filesystem path the submodule is checked out to. Gitlinks are normally created by git submodule add. Running git add on a nested repository, or plumbing commands if you’re brave enough, will also create one, but without a matching .gitmodules entry. Editing .gitmodules or any other file to try and add a new submodule will see your changes silently ignored, since the submodule commands all first look at the index to locate submodules. This may not be immediately apparent to the user, as it may seem .gitmodules controls submodules, given that Git will create and populate it for you when running git submodule add.
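The last point is easy to check. The sketch below hand-writes a .gitmodules entry for a hypothetical submodule called ghost (the name, path, and URL are invented) and then asks the submodule commands what they see. With no gitlink in the index, they should report nothing.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q super
cd super

# Hand-write a .gitmodules entry; no gitlink is created by doing this.
git config -f .gitmodules submodule.ghost.path ghost
git config -f .gitmodules submodule.ghost.url https://example.invalid/ghost.git
git add .gitmodules
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "hand-written .gitmodules"

# The submodule commands walk the gitlinks in the index, and there are none.
status=$(git submodule status)
git submodule init

# Count index entries with the gitlink mode.
gitlinks=$(git ls-files --stage | grep -c '^160000' || true)

# Nothing was registered in the local config either.
registered=$(git config --local --get-regexp '^submodule\.' || true)

echo "status output: '$status'"
echo "gitlinks in index: $gitlinks"
echo "local submodule config: '$registered'"
```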
A defense of this design
As mentioned above, submodule commands update multiple areas of the repository. They touch a lot of stuff and are rather delicately interwoven. However, looking at the implementation details and all the things you can do with the system when configured in this manner does bring to light certain benefits.
But while there are good reasons for most of these decisions, the added complexity makes it difficult for the layman to understand, while providing only occasional benefit for power-users. One could argue it isn’t worth the hassle.