Run more submodule tests on Cygwin (fix flaky xfails) #2154

EliahKagan

In #1455, which got Cygwin tests running on GitHub Actions, DWesl mentioned:

I have three test failures left that I don't understand, and hope someone here knows what's going on.

Byron's review mentioned:

The failures are all related to submodule handling, and I think that functionality isn't necessarily widely used anymore.

Further discussion raised the question of whether this was the case, as well as workarounds for using submodules on Cygwin even if submodule-specific functionality is broken. The tests were marked xfail (54bae76, 0eda0a5, 7f3689d) and the PR was merged.

It turns out the GitPython submodule functionality is not broken on Cygwin, and neither are the tests! Instead, the problem is that the rorepo fixture is GitPython's own repository, and when actions/checkout clones GitPython, the top-level gitdb directory is owned by the Administrators group rather than the runneradmin user. Git for Windows special-cases this, judging that repositories owned by the Administrators group are safe for members of that group to trust. Cygwin git does not special-case this, so more safe.directory entries are needed to assuage it.

The key to identifying the cause of the problem is that only tests that use self.rorepo and attempt submodule operations, but not those using @with_rw_repo and attempting submodule operations, failed in this way. The nature of the problem was obscured by several oddities, though the fourth and final oddity is also what revealed it:

The subtleties of safe.directory protections on Windows are unintuitive and complex even in cases where they work fully as intended, only one Git implementation is involved, and CI behavior is likely to match local behavior, none of which hold here.
Whether actions/checkout clones the submodules, or leaves them to be cloned by init-tests-after-clone.sh, is irrelevant to all aspects of this problem. This is even though the action uses Git for Windows, while the init script (in a Cygwin job) uses Cygwin git. The git/ext/gitdb directory is created when checking out GitPython itself, and it is owned by Administrators.
The dubious ownership failure happens in a git cat-file --batch-check invocation we intend to reuse across operations. The git subprocess quits before reading its input. Currently GitPython issues a message about possible dubious ownership in this situation, since that's the most common cause. But it is not the only possible cause of that message from GitPython--so if one does not expect that the error is due to safe.directory protections, then one might dismiss the message.
We run git cat-file --batch-check and send refs through the pipe. But if there is dubious ownership, then the git process quits before reading its input. So there is a race condition: our write may fail with EPIPE and raise BrokenPipeError, or our read may complete with EOF, such that we pass b"" to _parse_object_header, which raises ValueError.
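The race in the last item can be reproduced without git. The following is a minimal sketch, assuming POSIX-style pipe semantics and a child process that exits before reading its standard input (a stand-in for git quitting on dubious ownership). `probe_once` is a hypothetical helper for illustration, not GitPython code.

```python
import subprocess
import sys


def probe_once():
    """Write one line to a child that exits without reading, and report what happened."""
    # Stand-in for `git cat-file --batch-check` quitting early: exit at once, read nothing.
    child = [sys.executable, "-c", "import sys; sys.exit(128)"]
    proc = subprocess.Popen(
        child, stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=0
    )
    try:
        try:
            proc.stdin.write(b"HEAD\n")
        except BrokenPipeError:
            return "EPIPE"  # The child was already gone when we wrote.
        # The write landed in the pipe buffer, but no header line will come back.
        header = proc.stdout.readline()
        return "EOF" if header == b"" else "unexpected output"
    finally:
        try:
            proc.stdin.close()
        except OSError:
            pass
        proc.stdout.close()
        proc.wait()


print(probe_once())
```

Which of the two outcomes is printed depends on timing, which is the point: the same underlying failure can surface as either a failed write or an empty read.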

BrokenPipeError is a subclass of IOError, and we currently swallow IOError in some of the places where this happens. The ValueError case is overwhelmingly more common, and it is what the xfail decorations from #1455 covered. Recently, EPIPE has won the race slightly more often than in the past, and we've had to rerun tests a number of times. It would be possible to add more exception types to the xfail decorations, but a better approach is to verify the complete details of what is going wrong and why, add more regression tests, and fix the problem properly, removing the xfails. This PR does those things.
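Why an expectation keyed to ValueError does not cover the EPIPE outcome follows from the built-in exception hierarchy alone. A small check, where `covered_by` is a hypothetical helper introduced only for illustration:

```python
def covered_by(expected, raised):
    """Return True if an exception of type `raised` would be caught as `expected`."""
    return issubclass(raised, expected)


# In Python 3, IOError is an alias of OSError, and BrokenPipeError derives from it.
assert IOError is OSError
assert covered_by(IOError, BrokenPipeError)

# ValueError and BrokenPipeError are unrelated, so handling one says nothing
# about the other.
assert not covered_by(ValueError, BrokenPipeError)
assert not covered_by(IOError, ValueError)
```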

The fix is very simple: we already have a CI step that adds paths to the Cygwin git safe.directory configuration, and the fix is to add the missing paths related to submodules. But the tests are somewhat nontrivial, and the partially reverted instrumentation to fully confirm the cause and facilitate easier debugging in the future is also somewhat nontrivial.
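As a rough illustration of what adding the missing paths amounts to, here is a sketch that builds the corresponding `git config` invocations. The actual fix lives in the workflow file. `safe_directory_commands` is a hypothetical helper, the checkout root is made up, and the nested smmap path is an assumption about gitdb's layout rather than something stated here.

```python
import posixpath


def safe_directory_commands(repo_root, submodule_paths):
    """Build `git config` argv lists that mark each submodule directory as safe."""
    commands = []
    for relative_path in submodule_paths:
        path = posixpath.join(repo_root, relative_path)
        commands.append(
            ["git", "config", "--global", "--add", "safe.directory", path]
        )
    return commands


# Hypothetical checkout location; the smmap path is an assumed layout.
for argv in safe_directory_commands(
    "/cygdrive/d/a/GitPython/GitPython",
    ["git/ext/gitdb", "git/ext/gitdb/gitdb/ext/smmap"],
):
    print(" ".join(argv))
```

Each submodule root gets its own entry, since each is a separate repository as far as git's ownership check is concerned.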

The code changes and commit messages in this PR were made with Claude Code, as disclosed in commit message trailers. I've reviewed and substantially adjusted the code changes. I've also reviewed and honed the commit messages through many rounds of revision, including manual edits. A few of the commit messages are long and dense. I have made sure to spend more time with those, to ensure I believe the details are warranted, since even though I am well known for writing long detailed commit messages, I understand this is something people are more wary of when LLMs are involved (since they can generate large amounts of text quickly). As for this PR description, no part of it is LLM-generated, though I did use Claude for proofreading.

The commit messages describe the situation, the evidence for it, and the fix from the perspective of tracing what has occurred. One aspect of the bug--the behavior of safe.directory protections on Windows when multiple git implementations are used and the repository has submodules that it must operate on--is particularly non-obvious, unintuitive, and interesting, and it may end up being relevant to future improvements here in GitPython, as well as to submodule portability subtleties that might arise in the future in gitoxide. Hence this section.

On POSIX, the owner and group owner of a file are separate concepts, with each file being owned by one user and group-owned by one group. But on Windows there is a unified concept of Owner. Unlike on a Unix-like system, on Windows a user or a group can own a file in the same sense of "own." Users usually own the files they create, but one of the exceptions to this is that users who are members of the Administrators group create files that are owned by the Administrators group. This exception is actually more specific than that: it only applies when the user is actually running with their unfiltered (full) token in which the Administrators group is active. Usually, members of the Administrators group on Windows run with UAC enabled and configured to require elevation to act with their full administrative powers. But the runneradmin user that runs Windows CI jobs runs with a full admin token, so files it creates are owned by Administrators.

Git uses ownership as a powerful trust signal. It will operate on repositories it thinks the user running it owns, since the configuration and hooks in such a repository are presumably safe. Git checks if some important repository paths, such as the repo's top-level directory and its .git dir, have the user running it as their owner. If they do, it trusts the repository. For any not owned, it checks if they match any values of the safe.directory configuration variable. If any are neither owned by the user nor match entries in safe.directory, then Git refuses to operate. But this gets weird on Windows, where the directories might be owned by something that isn't a user at all:

Git for Windows lets members of the Administrators group operate on repositories whose ownership matches what those users might very well create themselves. If the user is in Administrators, and a directory is owned by Administrators, then Git for Windows considers it to be owned by the user for the purpose of assessing trust.
Cygwin git does not do this. It doesn't generally need to. If you start a program built against cygwin1.dll--whether that program is git or anything else--it changes the owner of the process token to the user, and then the owner inherited by securable objects (such as files) the process creates is the user instead of whatever it was before. So a member of the Administrators group, acting with the full powers thereof, can run Cygwin git with Administrators as the process token owner initially--but the process token owner is changed to the user, almost immediately, before Cygwin git's main() function is called. For this reason, Cygwin git typically has no need to treat repositories owned by the Administrators group specially--it never creates such a repository.
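The difference between the two implementations can be summarized in a deliberately simplified model. This is a sketch of the rules as described above, not of either implementation's actual code: real safe.directory matching is more involved, and the function names and the `implementation` labels are hypothetical.

```python
ADMINISTRATORS = "Administrators"


def counts_as_owned(owner, user, user_groups, implementation):
    """Simplified model of whether `implementation` treats `owner` as the user."""
    if owner == user:
        return True
    # Per the description above, only Git for Windows extends trust to the
    # Administrators group, and only for users who are members of it.
    if implementation == "git-for-windows":
        return owner == ADMINISTRATORS and ADMINISTRATORS in user_groups
    return False


def is_trusted(checked_paths, owners, user, user_groups, safe_directories, implementation):
    """Every checked path must be owned by the user or listed in safe.directory."""
    for path in checked_paths:
        owned = counts_as_owned(owners[path], user, user_groups, implementation)
        listed = path in safe_directories or "*" in safe_directories
        if not (owned or listed):
            return False
    return True


# Hypothetical path and accounts, mirroring the CI situation.
gitdb = "/repo/git/ext/gitdb"
owners = {gitdb: ADMINISTRATORS}
groups = {ADMINISTRATORS}
print(is_trusted([gitdb], owners, "runneradmin", groups, set(), "git-for-windows"))
print(is_trusted([gitdb], owners, "runneradmin", groups, set(), "cygwin"))
print(is_trusted([gitdb], owners, "runneradmin", groups, {gitdb}, "cygwin"))
```

In this model the same directory is trusted by one implementation and rejected by the other until a safe.directory entry is added.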

When we clone a repository that has a submodule, we may or may not also clone the submodule. If we do, we might clone it at the same time, or later. But whatever happens, so long as the top-level repository is able to be checked out, we first get the top-level repository with an empty directory at the submodule root. Whatever ownership we are creating files with, we create the submodule root with that ownership.

In all our Windows CI jobs, including the Cygwin jobs, we use actions/checkout to clone. While actions/checkout is capable of cloning repositories using the GitHub REST API, it only uses this as a fallback strategy. It first checks if a recent enough version of git is available and, if so, uses it. On all our Windows CI jobs, including the Cygwin jobs, actions/checkout clones the GitPython repository using Git for Windows.

Thus, no matter how its submodules are cloned, the top-level directory of gitdb is owned by the Administrators group. Cygwin git would therefore refuse to operate on it. But the GitPython test suite uses the GitPython repository itself, as well as its direct submodule gitdb and its nested submodule smmap, as test fixtures. Because Cygwin git wouldn't operate on the gitdb submodule, tests that use it were failing.
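One way to see whether a particular git rejects a fixture directory is to run a cheap command in it and look at the error text. The sketch below assumes that `git rev-parse --git-dir` triggers the ownership check and that the error contains the phrase "dubious ownership". Both function names are hypothetical, and this is not the code of the test module added in this PR.

```python
import subprocess


def classify(returncode, stderr_text):
    """Classify a git invocation's result by exit status and error text."""
    if returncode == 0:
        return "ok"
    if "dubious ownership" in stderr_text:
        return "dubious-ownership"
    return "other-failure"


def check_repository(path):
    """Run the `git` found on PATH against `path` and classify the outcome."""
    result = subprocess.run(
        ["git", "-C", path, "rev-parse", "--git-dir"],
        capture_output=True,
        text=True,
    )
    return classify(result.returncode, result.stderr)
```

Calling `check_repository` on the gitdb submodule root from a Cygwin shell would, under these assumptions, report `dubious-ownership` before the fix and `ok` after it.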

Most submodule tests didn't have a problem, because they use @with_rw_repo, which creates a new clone, instead of self.rorepo, which uses the repository in place. On Cygwin, @with_rw_repo clones the repository with Cygwin git. As described above, when Cygwin git clones repositories, it clones them as the user.

More remarkably, while I have added safe.directory entries for the nested smmap module, this is actually not strictly necessary--smmap never had dubious ownership! The reason is that only the gitdb directory created in the top-level GitPython checkout is owned by Administrators. No contents of submodules, not even of the gitdb submodule, are checked out as owned by Administrators. This is the case even if actions/checkout is made to clone them all by setting submodules: recursive. (I tested with this to be sure. But I did not keep this, since we have good reasons to validate that our init script will clone the submodules even if they were not cloned before. See #1713 and #1715 on this.) That is to say that none of the files created when cloning submodules are owned by Administrators even when Git for Windows creates them in the same top-level git clone command.

How can this be? Historically, the machinery that operated on submodules was implemented in scripts. Over time, it basically all came to be implemented in C, except that the git-submodule subcommand itself remains implemented by git-submodule.sh. Today, the major functionality of the script is to parse options, apply defaults, look up some information about the current repository, and call git submodule--helper to do the real work. In Git for Windows, the C code that does the real work is in native Windows programs such as git.exe. But git-submodule.sh is a shell script. Its interpreter is sh.exe from the MSYS2 "Git Bash" environment that Git for Windows ships.

MSYS2 is like Cygwin (MSYS2 is a fork of MSYS, which is a fork of Cygwin). Just as Cygwin programs link to cygwin1.dll, which does various setup--including, as described above, resetting the process's token owner to the user--MSYS2 programs link to msys-2.0.dll, which does that very same thing. Therefore, a user who is a member of the Administrators group can run git with Administrators as its process token owner, but in any chain of subprocesses that goes through a shell invocation, everything at or below that invocation will operate with it reset to the user.
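The effect on a chain of subprocesses can be pictured with a toy model. It only encodes the rule stated above (a program linked against the POSIX-emulation DLL resets the token owner, and its descendants inherit the result). The process names in the example chain are illustrative and `owners_along_chain` is a hypothetical helper.

```python
def owners_along_chain(chain, initial_owner, user, posix_layer_programs):
    """Return (program, token owner) for each process in a parent-to-child chain."""
    owner = initial_owner
    result = []
    for program in chain:
        if program in posix_layer_programs:
            # The runtime DLL resets the token owner to the user at startup.
            owner = user
        # Children inherit whatever owner their parent ended up with.
        result.append((program, owner))
    return result


chain = ["git.exe (clone)", "sh.exe (git-submodule.sh)", "git.exe (submodule--helper)"]
for program, owner in owners_along_chain(
    chain, "Administrators", "runneradmin", {"sh.exe (git-submodule.sh)"}
):
    print(f"{program}: {owner}")
```

In this model the first process creates files owned by Administrators, while everything from the shell invocation down creates files owned by the user, consistent with the submodule root and the submodule contents having different owners.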

Copilot

Pull request overview

This PR removes three xfail markers on Cygwin submodule tests by addressing the real underlying cause: Cygwin git refuses to operate on the git/ext/gitdb submodule directory because actions/checkout (Git for Windows) creates it owned by the Administrators group, which Cygwin git—unlike Git for Windows—does not treat as user-owned. The fix adds the missing submodule paths to the Cygwin workflow's safe.directory configuration. The PR also adds a new test_fixture_health.py to surface this class of misconfiguration with clear messages, and adds diagnostic CI steps that print POSIX/NTFS ownership and safe.directory entries to make future debugging easier.

Changes:

Add safe.directory entries for git/ext/gitdb and the nested smmap submodule in the Cygwin workflow; remove now-unneeded xfail markers on three Cygwin submodule tests and tighten one assertion.
Add test/test_fixture_health.py to fail fast with actionable messages when fixture directories are missing, uninitialized, or rejected by git for dubious ownership.
Add ownership / safe.directory diagnostic steps to the standard, Alpine, and Cygwin workflows.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

File	Description

.github/workflows/cygwin-test.yml	Adds safe.directory entries for the gitdb/smmap submodules (the actual fix) and diagnostic steps showing POSIX/NTFS ownership and configured safe.directory entries.
.github/workflows/pythonpackage.yml	Adds POSIX/NTFS ownership diagnostics and a safe.directory listing step on the main CI workflow.
.github/workflows/alpine-test.yml	Adds POSIX ownership and safe.directory listing diagnostics for the Alpine job.
test/test_submodule.py	Removes the Cygwin xfail from test_root_module and tightens the deep-traversal submodule assertion to an exact path list.
test/test_repo.py	Removes the Cygwin xfail from TestRepo.test_submodules.
test/test_docs.py	Removes the Cygwin xfail from test_submodules and drops now-unused sys/pytest imports.
test/test_fixture_health.py	New module verifying that required fixture directories exist, are initialized as git repos, and are trusted by git, providing clear remediation hints on failure.

EliahKagan

I think this will be ready to merge once CI passes.

In case CI on intermediate commits is of interest, this can be seen in my fork. I've refrained from doing multiple pushes here because some of the intermediate commits temporarily introduce many test jobs to ensure I produce the rare ~1% race conditions and that the fix actually makes them go away.

EliahKagan

One shortcoming is that, due to the effect of set -x, the new steps that show POSIX file ownership for debugging purposes produce output that is ugly and hard to read. The solution might just be to do set +x for those steps, though I worry that in realistic situations where something goes wrong and we want to see that, it might cause something to be left out that we actually do want. I'll think about that and hope to include a cosmetic improvement in a future PR. My view is that it shouldn't be considered a blocker.

EliahKagan and others added 11 commits May 17, 2026 23:44

EliahKagan requested a review from Copilot May 18, 2026 06:01

Copilot started reviewing on behalf of EliahKagan May 18, 2026 06:02

Copilot AI reviewed May 18, 2026

EliahKagan commented May 18, 2026

EliahKagan marked this pull request as ready for review May 18, 2026 06:14

EliahKagan deleted the claude/cygwin-safe-directory branch May 18, 2026 12:54

EliahKagan mentioned this pull request May 23, 2026

Cut xtrace noise from POSIX-ownership diagnostic steps #2156

Merged
