This PR fixes a UnicodeDecodeError that occurred when GitPython attempted to read packed-refs files containing ref names encoded with non-UTF-8 character encodings (e.g., Latin-1 encoded tag names from older Git versions). The fix uses Python's surrogateescape error handler, which is the standard approach for handling filesystem operations with potentially mixed or unknown encodings.
Key changes:
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
| --- | --- |
| git/refs/symbolic.py | Adds errors='surrogateescape' to the packed-refs file reader so that ref names not encoded as UTF-8 are read without raising |
| test/test_refs.py | Adds a test case that creates a packed-refs file with a Latin-1 encoded ref name and verifies it can be read without errors |
Summary
Fixes #2064
The packed-refs file can contain ref names that are not valid UTF-8 (e.g., Latin-1 encoded tag names created by older Git versions or systems with different locale settings). Previously, GitPython would fail with UnicodeDecodeError when reading such files.
Reproduction
As described in #2064:
Before fix:
After fix: Successfully reads all 101 tags.
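For readers without the repository from #2064, the failure mode can be approximated without GitPython. The sketch below is an illustration under stated assumptions, not the code from this PR: `read_ref_names` is a hypothetical helper that parses a packed-refs style file, and the file content and tag name are made up for the example.

```python
import os
import tempfile

# Hypothetical packed-refs content. The tag name ends in byte 0xE9,
# which is "é" in Latin-1 but an invalid sequence in UTF-8.
PACKED_REFS = (
    b"# pack-refs with: peeled fully-peeled sorted \n"
    b"0123456789abcdef0123456789abcdef01234567 refs/tags/caf\xe9\n"
)


def read_ref_names(path, errors="strict"):
    """Hypothetical helper: return ref names from a packed-refs style file."""
    names = []
    with open(path, "rt", encoding="UTF-8", errors=errors) as fp:
        for line in fp:
            line = line.strip()
            # Skip blank lines, the header comment, and peeled-tag lines.
            if not line or line.startswith(("#", "^")):
                continue
            _sha, name = line.split(" ", 1)
            names.append(name)
    return names


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "packed-refs")
    with open(path, "wb") as fp:
        fp.write(PACKED_REFS)

    # Strict decoding fails on the Latin-1 byte.
    try:
        read_ref_names(path)
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # With surrogateescape the file is read and the byte is preserved
    # as a lone surrogate in the resulting str.
    names = read_ref_names(path, errors="surrogateescape")
```

Here `strict_failed` ends up `True`, and `names` holds a single entry whose last character is the surrogate `\udce9` standing in for the original byte.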
Changes
Technical Details
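The surrogateescape error handler is Python's standard approach for handling potentially non-UTF-8 data in filesystem operations. It maps each undecodable byte to a lone surrogate code point (U+DC80 to U+DCFF) when decoding, and restores the original byte when encoding with the same handler, so ref names survive a round trip unchanged.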
This is the same approach Python's os.fsdecode() uses on POSIX systems, and it is recommended for filesystem operations where the encoding may be unknown or mixed.
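The round-trip behaviour of surrogateescape can be checked in isolation. This is a minimal sketch using only built-in `bytes.decode` and `str.encode`; the ref name is a made-up example, and nothing here is taken from GitPython itself.

```python
# A ref name as raw bytes, with a Latin-1 "é" (0xE9) that is not valid UTF-8.
raw = b"refs/tags/caf\xe9"

# Decoding with surrogateescape turns the bad byte into the lone surrogate U+DCE9.
decoded = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = decoded.encode("utf-8", errors="surrogateescape")

# Strict encoding refuses lone surrogates, so callers that write the name
# back out need to pass the same error handler.
try:
    decoded.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# If the real encoding is known, the original bytes can be re-decoded properly.
latin1_name = round_tripped.decode("latin-1")
```

One consequence worth keeping in mind: a str containing lone surrogates cannot be encoded strictly, so printing or logging such a ref name on a UTF-8 stream may itself raise unless the output path also tolerates surrogates.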