
Git Submodules are Fun

Updated 9/22/25: More corrections and better description of how gitlinks work internally.
Updated 5/13/25: Corrected some mistakes and added more example scripts.

As usual, because Git is a confusing, convoluted mess of nested questionably-designed wrapper commands that often do not accurately describe their actual functionality, submodules are not as simple as they appear on the surface, and misunderstanding them can lead to headache.

As usual.

Before discussing what a submodule is, one must understand gitlinks.

And before discussing those, one must understand how Git stores data in general.

Git, being a source code manager, needs a way to store your source files in a way that can be easily and quickly accessed later on, while also being agnostic to whatever filesystem you're using so that it can be portable between systems. It does this by writing the contents of your repository into objects. These objects have types, like blob for files, and tree for directories. These types—as well as the permissions of the files they represent—are stored in a bitmask, following Git’s own specification. While similar to and modeled after traditional Unix/POSIX mode bitmasks, Git’s rendition actually only supports a small fixed list of modes, with additional Git-only bits.

In particular, the specification defines a unique file type called “gitlink” that references a commit object name—or more colloquially, a SHA-1 commit hash. Do note that gitlinks only contain a hash; Git does not define what the hash is used for, or even what repository the hash belongs to. Since Git generally ignores the contents of gitlinks, you could theoretically use them for any number of purposes.
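To make the modes a bit more concrete, here is a small sketch that lists a blob and a tree, and then stages a gitlink by hand. The repository layout, the demo identity, and the path some/link are all made up for illustration, and the repo's own HEAD commit is used only as a stand-in hash.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q .
# Hypothetical identity, set locally so the commit works in a clean environment.
git config user.name demo
git config user.email demo@example.com

mkdir dir
echo hello > file.txt
echo world > dir/nested.txt
git add .
git commit -q -m 'initial commit'

# The file is listed as a blob (mode 100644), the directory as a tree (040000).
git ls-tree HEAD

# Stage a gitlink by hand. This repository's own HEAD commit is used here
# purely as a stand-in value; some/link does not exist in the working tree.
commit=$(git rev-parse HEAD)
git update-index --add --cacheinfo 160000,"$commit",some/link

# The new index entry carries mode 160000 and the commit ID.
entry=$(git ls-files -s -- some/link)
echo "$entry"
```

Nothing here involves a submodule yet: the index simply records a mode, a hash, and a path.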

Submodules are one such (and so far only official) extension of this functionality. They use gitlinks to keep track of what commit the submodule is currently at within the superproject.

How submodules are implemented

So, submodules use gitlinks to keep track of the submodule’s commit. This allows the commit of the submodule to be checked in to source control, so it is updated alongside other files in the repository. But how does a submodule know what repo that commit belongs to? In order to track the repository a gitlink refers to, an entry containing the submodule’s repository path is added to the superproject’s .gitmodules. This path can be a URL to a remote resource, or a local directory. Other information handled by submodules includes the remote repository it pulls from, how and where the repository is cloned, and how and where to check out its working tree. The gitlink and the .gitmodules entry are the only parts of a submodule that get committed, and further updates to the submodule can be tracked by updating the gitlink with the submodule’s new commit hash.

  • Checking out submodules: Once a commit hash and repo path are set up, the submodule must be initialized and updated: Initializing a submodule will add it to .git/config, while updating it will clone the repo to .git/modules/<submodule-name>, and if necessary, check out a worktree to the submodule’s path in the superproject.
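As a rough sketch of that init/update sequence, the script below builds a throwaway library repo and superproject, clones the superproject, and then initializes and updates the submodule in the clone. The repo names (lib, super, clone), the path vendor/lib, and the demo identity are assumptions for illustration. The protocol.file.allow override is only there because recent Git versions refuse to clone submodules from local paths by default.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
# Hypothetical identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Needed only because this demo uses local paths as submodule URLs.
allow='protocol.file.allow=always'

# A made-up library repo and a superproject that uses it as a submodule.
git init -q lib
git -C lib commit -q --allow-empty -m 'lib: initial commit'
git init -q super
git -C super -c "$allow" submodule --quiet add "$demo/lib" vendor/lib
git -C super commit -q -m 'add vendor/lib submodule'

# A plain clone has the gitlink and .gitmodules, but no submodule config yet.
git clone -q super clone
cd clone
before=$(git config --get submodule.vendor/lib.url || echo unset)

# init copies the URL from .gitmodules into .git/config
git submodule --quiet init
after_init=$(git config --get submodule.vendor/lib.url)

# update clones into .git/modules/<name> and checks out the worktree
git -c "$allow" submodule --quiet update

echo "before init: $before"
echo "after init:  $after_init"
ls -d .git/modules/vendor/lib vendor/lib/.git
```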

Given the above information, you might be fooled into thinking submodules are relatively simple and straightforward, unless you’re already lost. I don’t blame you. However, there is a lot of nuance here that results in some potentially confusing implementation details.

Submodules use multiple configuration and other files across the repository, and if you touch one area of them without also updating the other, you can easily cause errors and stoppages that, on their surface, present no obvious method of resolution.

Submodule operations may read and write any selection of the following, and it is not always obvious which:

  • .gitmodules, under [submodule "<submodule-name>"] sections
  • .git/config, under [submodule "<submodule-name>"] sections
  • .git/modules/<submodule-name>
  • the submodule’s gitlink in the index
  • the submodule’s worktree in the superproject’s working tree
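If you want to see all five of these for yourself, something like the following should do it. This is only a sketch against a throwaway superproject; the repo names, the vendor/lib path, and the demo identity are invented for the example.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
# Hypothetical identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
git -C lib commit -q --allow-empty -m 'lib: initial commit'
git init -q super
cd super
# protocol.file.allow is overridden only because the demo URL is a local path.
git -c protocol.file.allow=always submodule --quiet add "$demo/lib" vendor/lib

echo '--- .gitmodules'
git config -f .gitmodules --get-regexp '^submodule\.'
echo '--- .git/config'
git config --local --get-regexp '^submodule\.'
echo '--- .git/modules/<submodule-name>'
ls -d .git/modules/vendor/lib
echo '--- gitlink in the index'
gitlink=$(git ls-files -s -- vendor/lib)
echo "$gitlink"
echo '--- worktree (its .git is a file pointing at the repo in .git/modules)'
cat vendor/lib/.git
```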

There are certain ways to interact with submodules, and doing so in an unsupported way will result in the whole thing falling over. And what exactly is “supported” is masked by Git builtins and obscure documentation. Fun! 🥳

Keep in mind the following when dealing with submodules, if you want to avoid problems. I’ve highlighted more potentially fun bits of information in bold.

Submodules and their remotes can fall out of sync

Submodules are cloned from remote repositories. When creating a submodule, the remote is stored in .gitmodules. This then has to be copied to .git/config by running git submodule init… And unless you know/remember to run git submodule sync after you git pull, upstream changes to .gitmodules won’t be reflected in your local repo. This can result in hidden problems caused by outdated submodules, where the user may otherwise expect git submodule update to take care of everything. Hell, users may not even know git pull doesn’t update their submodules for them—while Git will fetch the changes to submodules necessary to resolve the commit specified by the gitlink in the index, it won’t update their worktrees for you.
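Here is a small sketch of the URL going stale. Instead of pulling a real upstream change, it edits .gitmodules directly to simulate one. The repo names and the lib-moved URL are hypothetical, and the new URL does not need to exist, since git submodule sync only copies configuration around.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
# Hypothetical identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
git -C lib commit -q --allow-empty -m 'lib: initial commit'
git init -q super
cd super
# protocol.file.allow is overridden only because the demo URL is a local path.
git -c protocol.file.allow=always submodule --quiet add "$demo/lib" vendor/lib
git commit -q -m 'add vendor/lib submodule'

# Simulate an upstream change to .gitmodules (as if it arrived via git pull).
git config -f .gitmodules submodule.vendor/lib.url "$demo/lib-moved"

# .git/config still has the old URL...
stale=$(git config --get submodule.vendor/lib.url)
# ...until sync copies the new one over.
git submodule --quiet sync
synced=$(git config --get submodule.vendor/lib.url)

echo "before sync: $stale"
echo "after sync:  $synced"
# sync also rewrites the remote inside the checked-out submodule.
git -C vendor/lib remote get-url origin
```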

Submodule names are also filesystem paths

Submodules have names, and these names can contain path separators. By default, the name of a submodule is the filesystem path it is checked out to relative to the repository root, but it can really be anything you want. Submodule paths do not have to match their name, and if a submodule is moved without renaming it, the new path and name will not match. This was presumably done so that adding a submodule back to the same path will reuse/overwrite the old submodule name, and to otherwise guarantee a unique name in the usual usage.

Cloned Git repositories of submodules are stored locally in .git/modules. Each repository is put in a subdirectory created using the submodule’s name. If the name contains path separators, further subdirectories will be created to fully resolve the path. For example, a submodule named hello/world/foo will find its Git repo stored in .git/modules/hello/world/foo.
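To illustrate, this sketch adds a submodule whose name (hello/world/foo) differs from its path (libs/foo) and then looks at where things ended up. The repos, the path, and the demo identity are made up for the example.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
# Hypothetical identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
git -C lib commit -q --allow-empty -m 'lib: initial commit'
git init -q super
cd super
# protocol.file.allow is overridden only because the demo URL is a local path.
git -c protocol.file.allow=always submodule --quiet add \
  --name hello/world/foo "$demo/lib" libs/foo

# The repo is stored under the name, not the path.
ls -d .git/modules/hello/world/foo

# .gitmodules maps the name to the path.
mapped_path=$(git config -f .gitmodules submodule.hello/world/foo.path)
echo "name hello/world/foo -> path $mapped_path"

# The worktree's .git file points back at the repo under .git/modules.
cat libs/foo/.git
```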

These concepts aren’t really a problem for most users in most situations, and it’s possible to specify custom names for submodules with git submodule add --name. But what if the user didn’t realize that before creating it, and they want to reformat their names? What if they want a moved submodule’s name to match its new path, or to have the names not contain path elements at all after realizing they do by default? Perhaps the repo or project the submodule points to changed somehow, and the user wants the name to reflect the new information. Users with issues like these may attempt to rename their submodules in .gitmodules, which is the most immediately obvious place to look, but this won’t work by itself and may cause errors later on.

Renaming submodules

There is actually no built-in way to rename a submodule. The easiest way to accomplish such a task with only native Git commands is to simply recreate it:

path='<submodule-path>'
# deinit the submodule: removes its worktree, and its entry from .git/config
git submodule deinit "$path"
# remove the submodule worktree, its gitlink, and its entry from .gitmodules
git rm "$path"
# recreate the submodule: adds a new entry to .gitmodules, runs
# `submodule init` to add its entry back to .git/config and check it out, and
# adds the worktree directory to the index as a gitlink
git submodule add --name "<new-submodule-name>" "<submodule-repo>" "$path"

In the past, doing git rm <submodule-path> would not remove the submodule from .gitmodules, but it now does with modern versions of Git. However, it still won’t remove the submodule’s Git directory from .git/modules, and git submodule add will re-fetch the repo when adding it under the new name. It’s possible to avoid this by replacing git submodule add with a few manual steps:

old_name='<old-submodule-name>'
new_name='<new-submodule-name>'
path='<submodule-path>'
# deinit the submodule: removes its worktree, and its entry from .git/config
git submodule deinit "$path"
# rename the submodule in .gitmodules
sed -Ei 's|(^\[submodule ")'"$old_name"'("\]$)|\1'"$new_name"'\2|' .gitmodules
# to avoid cloning the submodule again, rename its git dir
mv .git/modules/"$old_name" .git/modules/"$new_name"
# re-init the submodule: adds it back to .git/config under the new name
git submodule init "$path"
# check out the worktree again, reusing the renamed git dir
git submodule update "$path"

Since git submodule add would always run git add to create the gitlink, we can also remove the git rm step. This means we’ve managed to avoid touching the index at all. This is beneficial if you’ve updated the commit hash of your submodule and haven’t committed the change to the superproject yet.

But what if you have local changes to the submodule? git submodule deinit will remove your worktree and as such any uncommitted changes. For this situation, it’s possible to rename a module manually by editing the name in both .gitmodules and .git/config, and moving the submodule repo to the new name in the .git/modules tree. These are the only places the name is used as a key, although the gitdir line in <submodule-path>/.git also contains the path of the repo under .git/modules and has to follow the move.

Furthermore, if the module hasn’t been initialized with git submodule init, then .git/config doesn’t need to be edited. Likewise, there will be no repo in .git/modules/<old-name> if the submodule hasn’t been checked out with git submodule update before.

old_name='<old-submodule-name>'
new_name='<new-submodule-name>'
path='<submodule-path>'
# rename the submodule in .gitmodules and .git/config
sed -Ei 's|(^\[submodule ")'"$old_name"'("\]$)|\1'"$new_name"'\2|' \
  .gitmodules .git/config
# to avoid cloning the submodule again, rename its git dir
mkdir -p "$(dirname .git/modules/"$new_name")"
mv .git/modules/"$old_name" .git/modules/"$new_name"
# repoint the worktree's .git file at the renamed git dir (assumes both names
# have the same number of path components; otherwise also fix core.worktree)
sed -Ei 's|(/modules/)'"$old_name"'$|\1'"$new_name"'|' "$path"/.git

Submodule filesystem paths also exist in the index

As mentioned, there are two parts to a submodule: the gitlink and the entry in .gitmodules. One must keep in mind that the submodule’s path is what is used to look up the gitlink in the index to get the submodule’s commit. This means that moving a submodule’s worktree around in the superproject requires moving the gitlink as well.

I imagine this would be particularly confusing for new users that want to move their submodule somewhere else. Upon discovering that there isn’t a dedicated git submodule mv command to do so, the clever user may believe they can edit .gitmodules directly to point at a new path, and mv the directory themselves to the new destination. But this will not work, and will cause errors later on.

In reality, the user must instead use git mv, which will update .gitmodules for you. As far as Git is concerned, submodules are files and must be treated as such… and git mv must specifically be used so that other files that reference the submodule are also updated. Updating any of these files manually before using git mv will also cause issues. As long as git mv is used, the user needn’t worry about implementation details like .gitmodules or .git/modules. And while this is a good thing, the documentation does not make it immediately clear this is how it should be done, and that the other files basically never need to be touched manually. (And that you will in fact have a bad time if you touch them.)
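A quick sketch of what git mv takes care of, using a throwaway superproject. The repo names and the vendor/lib and third_party/lib paths are invented for the example. Note that the destination's parent directory is created first, since git mv will not create it.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
# Hypothetical identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
git -C lib commit -q --allow-empty -m 'lib: initial commit'
git init -q super
cd super
# protocol.file.allow is overridden only because the demo URL is a local path.
git -c protocol.file.allow=always submodule --quiet add "$demo/lib" vendor/lib
git commit -q -m 'add vendor/lib submodule'

# Move the submodule with git mv rather than plain mv.
mkdir third_party
git mv vendor/lib third_party/lib

# The name is still vendor/lib, but its path in .gitmodules has been updated.
new_path=$(git config -f .gitmodules submodule.vendor/lib.path)
echo "path is now: $new_path"

# The gitlink moved along with the worktree.
git ls-files -s -- vendor/lib third_party/lib

# The moved worktree is still a working checkout of the submodule.
moved_head=$(git -C third_party/lib rev-parse HEAD)
echo "submodule HEAD: $moved_head"
```

The submodule name stays vendor/lib after the move, which is exactly the name/path mismatch described earlier.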

If you for some reason want to move a submodule yourself, or fix a botched move, you must:

  • Remove the old gitlink from the index.
  • Add a new gitlink to the index with the correct hash at the new path. This can be done manually.
  • Update the following paths in the following files to point to the correct locations:
    • submodule.<name>.path in .gitmodules should point to the gitlink
    • core.worktree in .git/modules/<submodule-name>/config should point to the gitlink
    • gitdir in <submodule-path>/.git should point to .git/modules/<submodule-name>

Note that all of the above paths should be relative. The entry in .git/config won’t exist if the submodule hasn’t been initialized with git submodule init, and the last path won’t exist if the submodule hasn’t been checked out with git submodule update.

Also note that git submodule deinit will remove the entire submodule path in the working tree, but will only remove the core.worktree line from the submodule repo config and not the entire repo.

Git specially handles gitlinks: when adding a directory with git add, Git will check if the directory is a valid repo or not. If it isn’t, it will recursively add all files. If it is, however, it will instead add a gitlink to the index with the ID of the commit as the object hash.

You can see this with git ls-files -s. The first column is 160000, which indicates a gitlink, and the second column is the commit hash of the submodule to check out. Normally, this column would instead hold the hash of the blob object storing the contents of the file being added, but for gitlinks it is repurposed.

It is not always possible or desired, however, to manually clone a repo, check out a specific commit or tree, and use git add, especially if you already know the commit you want to use. In this case, it’s actually possible to manually add a gitlink to the index like so:

git update-index --add --cacheinfo 160000,<submodule-commit-hash>,<submodule-path>
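For a fuller picture, here is a sketch of adding a submodule entirely by hand: the gitlink via update-index, the .gitmodules entry via git config, and then a normal init/update to check it out. The repos, the vendor/lib path and name, and the demo identity are assumptions made for the example.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
# Hypothetical identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A made-up library repo; its commit is the one we want to pin.
git init -q lib
git -C lib commit -q --allow-empty -m 'lib: initial commit'
commit=$(git -C lib rev-parse HEAD)

git init -q super
cd super

# 1. Add the gitlink to the index without cloning anything first.
git update-index --add --cacheinfo 160000,"$commit",vendor/lib

# 2. Tell Git which repo that path belongs to.
git config -f .gitmodules submodule.vendor/lib.path vendor/lib
git config -f .gitmodules submodule.vendor/lib.url "$demo/lib"
git add .gitmodules
git commit -q -m 'add vendor/lib submodule by hand'

# 3. From here on it behaves like any other submodule.
#    protocol.file.allow is overridden only because the URL is a local path.
git -c protocol.file.allow=always submodule --quiet update --init

checked_out=$(git -C vendor/lib rev-parse HEAD)
echo "pinned:      $commit"
echo "checked out: $checked_out"
```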

Implementation details

Based on my above descriptions, it may seem like submodules are incomprehensible. But in reality, they aren’t that complex. Going into a bit more detail on how exactly everything ties together can help puzzle stuff out:

  • The submodule name, remote path/URL, and filesystem path are stored in the .gitmodules config file, and this file is automatically updated when adding a new submodule with git submodule add or removing a gitlink with git rm. This file is what connects index gitlinks to repositories and other metadata, as mentioned.
  • Submodule names, remotes, and update methods are synced into submodule sections in .git/config when running git submodule init. git submodule sync can be used to sync settings if .gitmodules changes later on. It is set up like this to allow remotes to be overridden locally in .git/config without updating .gitmodules in the worktree.
    • N.B.: Filesystem paths are not copied to .git/config! See below for why.
  • Information lookups for submodules are usually done by path, at least on the user side. Git will find a submodule’s Git repo using the worktree’s .git path, or it will look up the entry in .gitmodules if the submodule hasn’t been checked out yet.
  • Submodule names, as mentioned, are only used to associate submodules in .gitmodules to their sections in .git/config, and as the path to the repo in .git/modules/<submodule-name>. This is so that submodules’ remotes and repo paths can change without requiring changes to the superproject. As long as the name is the same, the remote information and locally cloned repository can be updated in .gitmodules safely.
  • The desired commit of a submodule is stored in the index using a gitlink entry. The path of this entry is the filesystem path the submodule is checked out to. Editing .gitmodules or any other file to try and add a new submodule will see your changes silently ignored, since submodule commands all look at the index first to locate submodules. This may not be immediately apparent to the user, as it may seem .gitmodules controls submodules, given that Git will create and populate it for you when running git submodule add. When doing things this way, the index must be updated to add a gitlink before anything will work.

A defense of this design

As mentioned above, submodule commands update multiple areas of the repository. They touch a lot of stuff and are rather delicately interwoven. However, looking at the implementation details and all the things you can do with the system when configured in this manner does bring to light certain benefits.

But while there are good reasons for most of these decisions, the added complexity makes it difficult for the layman to understand, while providing only occasional benefit for power-users. One could argue it isn’t worth the hassle, and the system was built this way to not break existing users. As is true for most software.
