To add a bit more context: I followed the general approach mentioned by @Byron and used by gitoxide, which is a for loop that iterates over each character while remembering the previous two characters seen. This is faster than the naive approach (since it minimizes the number of times we iterate over the refname string), at the cost of some readability. For comparison, here's the naive approach, where the logic separation matches the docs (one rule per if condition):

    def _check_ref_name_valid_naive(ref_path: PathLike) -> None:
        # Based on https://git-scm.com/docs/git-check-ref-format/
        # Convert once up front: os.PathLike objects have no split()/startswith().
        ref_path = str(ref_path)
        if any(component.startswith(".") or component.endswith(".lock") for component in ref_path.split("/")):
            raise ValueError(f"Invalid reference '{ref_path}': components cannot start with '.' or end with '.lock'")
        elif ".." in ref_path:
            raise ValueError(f"Invalid reference '{ref_path}': references cannot contain '..'")
        elif any(ord(c) < 32 or ord(c) == 127 or c in [" ", "~", "^", ":"] for c in ref_path):
            raise ValueError(
                f"Invalid reference '{ref_path}': references cannot contain ASCII control characters, spaces, tildes (~), carets (^) or colons (:)"
            )
        elif any(c in ["?", "*", "["] for c in ref_path):
            raise ValueError(
                f"Invalid reference '{ref_path}': references cannot contain question marks (?), asterisks (*) or open brackets ([)"
            )
        elif ref_path.startswith("/") or ref_path.endswith("/") or "//" in ref_path:
            raise ValueError(f"Invalid reference '{ref_path}': references cannot start or end with '/', or contain '//'")
        elif ref_path.endswith("."):
            raise ValueError(f"Invalid reference '{ref_path}': references cannot end with '.'")
        elif "@{" in ref_path:
            raise ValueError(f"Invalid reference '{ref_path}': references cannot contain '@{{'")
        elif ref_path == "@":
            raise ValueError(f"Invalid reference '{ref_path}': references cannot be '@'")
        elif "\\" in ref_path:
            raise ValueError(f"Invalid reference '{ref_path}': references cannot contain '\\'")

The naive approach is IMO more readable, but around half as fast as the one in the PR. Although, for reference, on my MacBook M1 Pro, for a refname 25 characters long:
So we are talking about minimal amounts either way. I'll leave the choice of which algorithm to use up to the maintainers.
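
For readers who want to see the shape of the single-pass idea next to the naive version, here is a minimal sketch of a loop that remembers the previous two characters. It is an illustration written for this thread, not the code from the PR: the names `check_ref_name_single_pass` and `time_check` and the error messages are made up here, and it takes a plain `str` rather than a `PathLike`.

```python
import timeit
from typing import Optional


def check_ref_name_single_pass(ref_name: str) -> None:
    """Hypothetical sketch: raise ValueError if ref_name breaks a git-check-ref-format rule."""
    previous: Optional[str] = None
    one_before_previous: Optional[str] = None
    for c in ref_name:
        if c in " ~^:?*[\\":
            raise ValueError(f"Invalid reference {ref_name!r}: forbidden character {c!r}")
        if ord(c) < 32 or ord(c) == 127:
            raise ValueError(f"Invalid reference {ref_name!r}: ASCII control character")
        if c == ".":
            if previous is None or previous == "/":
                raise ValueError(f"Invalid reference {ref_name!r}: a component starts with '.'")
            if previous == ".":
                raise ValueError(f"Invalid reference {ref_name!r}: contains '..'")
        elif c == "/":
            if previous is None or previous == "/":
                raise ValueError(f"Invalid reference {ref_name!r}: starts with '/' or contains '//'")
        elif c == "{" and previous == "@":
            raise ValueError(f"Invalid reference {ref_name!r}: contains '@{{'")
        # Shift the two-character memory forward.
        one_before_previous, previous = previous, c

    # Rules about how the name ends use the characters remembered from the loop.
    if previous == "." or previous == "/":
        raise ValueError(f"Invalid reference {ref_name!r}: ends with '.' or '/'")
    if previous == "@" and one_before_previous is None:
        raise ValueError(f"Invalid reference {ref_name!r}: cannot be '@'")
    # The '.lock' suffix rule is the one place this sketch scans the string again.
    if any(part.endswith(".lock") for part in ref_name.split("/")):
        raise ValueError(f"Invalid reference {ref_name!r}: a component ends with '.lock'")


def time_check(ref_name: str, number: int = 10_000) -> float:
    """Average seconds per call; the result depends entirely on the machine."""
    return timeit.timeit(lambda: check_ref_name_single_pass(ref_name), number=number) / number
```

Calling `time_check("refs/heads/some-feature-name")` gives a rough per-call figure for this sketch only; it says nothing about the timings of the actual PR code.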
Thanks a million, I love this implementation!
Strangely enough, I find the faster version (the one here) more readable as well and would want to keep it for that reason alone.
There is one issue I see that might be hard to solve, but it's time to at least try. It's the general problem of how to interact with paths without running into decoding problems (i.e. Python tries to decode a path as encoding X, and fails, even though it's a valid filesystem path). Maybe @EliahKagan also has ideas regarding this topic.
    # Based on the rules described in https://git-scm.com/docs/git-check-ref-format/#_description
    previous: Union[str, None] = None
    one_before_previous: Union[str, None] = None
    for c in str(ref_path):

Is there a way to avoid converting to str? I assume this tries to decode ref_path with the current string encoding, which changes depending on the interpreter or user configuration and generally causes a lot of trouble.
Unless this PR worsens that problem in some way, which I believe it does not, I would recommend it be fixed separately and later. The code this is replacing already had:
GitPython/git/refs/symbolic.py
Lines 171 to 172 in d40320b
But actually even that neither introduced nor exacerbated the problem. From the commit prior to #1644 being merged:
GitPython/git/refs/symbolic.py
Lines 164 to 174 in 830025b
Note how str(ref_path) was passed to os.path.join, which when given strs returns a str, thus a str was being passed to open. Note also that, while this str call was actually redundant (os.path.join accepts path-like objects since Python 3.6), even it was not the cause of str and not bytes being used. The annotation on ref_path is Union[PathLike, None], where PathLike is:
Line 43 in 830025b
Where both alternatives--str and os.PathLike[str]--represent text that has already been decoded.
So unless I'm missing something--which I admit I could be--I don't think it makes conceptual sense to do anything about that in this pull request. Furthermore, unless the judgment that CVE-2023-41040 was a security vulnerability was mistaken, or something about the variation explicated in #1644 (comment) is less exploitable, it seems to me that this pull request is fixing a vulnerability. Assuming that is the case, then I think this should avoid attempting to make far-reaching changes beyond those that pertain to the vulnerability, and that although reviewing these changes for correctness should not be rushed, other kinds of delays should be mostly avoided. With good regression tests included, as seems to be the case, the code could be improved on later in other ways.
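
To make the point about `str` and `os.PathLike[str]` concrete, here is a small standard-library sketch. The paths are made up, and `posixpath` stands in for `os.path` only so that the result looks the same on every platform. It shows that joining a path-like object works without an explicit `str()` call, and that `os.fspath` returns text for text-based inputs and bytes only when it was given bytes.

```python
import os
import posixpath
from pathlib import PurePosixPath

ref_path = PurePosixPath("refs/heads/main")  # an os.PathLike[str]

# The join accepts path-like objects directly and still returns a str.
joined = posixpath.join("/repo/.git", ref_path)
print(type(joined).__name__, joined)

# os.fspath() gives back str for str-based inputs; no byte decoding is involved.
print(os.fspath(ref_path) == str(ref_path))

# bytes pass through unchanged, so the text-or-bytes decision stays with the caller.
print(type(os.fspath(b"refs/heads/main")).__name__)
```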
Thanks a lot for the thorough assessment, I wholeheartedly agree.
The 'how to handle paths correctly' issue is definitely one of the big breaking points in GitPython, but maybe, for other reasons, this wasn't ever a problem here.
Knowing this is on your radar, maybe one day there will be a solution to it. gitoxide already solves this problem, but it's easier when you have an actual type system and a standard library that makes you aware every step of the way.
Is the ultimate goal to support both str-based and bytes-based ref names and paths?
The goal is correctness, and it's vital that one doesn't try to decode paths to fit some interpreter-controlled encoding. Paths are paths, and if you are lucky, they can be turned into bytes. On Unix, that's always possible and a no-op, but on Windows it may require a conversion. The question is just how these things are supposed to work in Python.
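
One way Python itself approaches this on Unix is the `surrogateescape` error handler behind `os.fsdecode` and `os.fsencode`. The sketch below assumes a POSIX system (on Windows the filesystem encoding and error handler differ, so it may not behave the same there), and the ref name is a made-up example.

```python
import os

# Latin-1 encoded bytes: a legal file name on Unix, but not valid UTF-8.
raw = b"refs/heads/caf\xe9"

try:
    raw.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False  # a strict decode rejects the name

# os.fsdecode/os.fsencode use the filesystem encoding with an error handler
# that (on POSIX) carries undecodable bytes through the str form and back.
as_text = os.fsdecode(raw)
round_tripped = os.fsencode(as_text)

print(strict_decode_ok, round_tripped == raw)
```

The resulting `str` is only safe to hand back to filesystem APIs; printing it or encoding it strictly can still fail, which is part of why this stays a hard problem for a library.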
Does this relate (conceptually, I mean) to the issue in rust-lang/rust#12056?
A great find :) - yes, that's absolutely related. gitoxide internally handles git-paths as bundles of bytes without known encoding, and just like git, it assumes at least ASCII. Conversions do happen but they all go through gix-path to have a central place for it.
Doing something like it would be needed here as well, even though I argue that before that happens universally, there should be some clear definition of what GitPython is supposed to be.
When I took it over by contributing massively, just like you do now, I needed more control for the use case I had in mind, and started implementing all these sloppy pure-python components that don't even get the basics right. With that I turned GitPython into some strange hybrid which I think didn't do it any good besides maybe being a little faster for some use cases. After all, manipulating an index in memory has advantages, but there are also other ways to do it while relying on git entirely.
Maybe this is thinking a step too far, but I strongly believe that the true benefit of GitPython is to be able to call git in a simple manner and to be compliant naturally due to using git directly. This should be its identity.
But then again, it's worth recognizing that changing the various pure-python implementations to use git under the hood probably isn't possible in a non-breaking way.
Another avenue would be to try and get the APIs to use types that don't suffer from encoding/decoding issues related to Paths, and then one day make the jump to replacing the pure-python implementations with the python bindings of gitoxide.
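
As an illustration of the "rely on git entirely" idea applied to this particular PR, ref name validation could in principle be delegated to `git check-ref-format`, which exits with status 0 for a well-formed name. This is only a sketch of that alternative, not something GitPython does: the helper name `is_valid_refname` is made up, it assumes a `git` executable on `PATH`, and spawning a process per check is likely much slower than the in-process loop.

```python
import shutil
import subprocess


def is_valid_refname(ref_name: str, git: str = "git") -> bool:
    """Hypothetical helper: ask git itself whether ref_name is well-formed."""
    result = subprocess.run(
        [git, "check-ref-format", ref_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    # Exit status 0 means git accepts the name; anything else is a rejection or a usage error.
    return result.returncode == 0


if shutil.which("git") is not None:
    print(is_valid_refname("refs/heads/main"))
    print(is_valid_refname("refs/heads/../etc/passwd"))
```

Without `--allow-onelevel`, git also rejects names that contain no slash, such as `main`, so the two approaches would need to agree on which rule set applies.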
It looks like CI improved, and now that the PR was merged, it failed CI due to a lint: https://github.com/gitpython-developers/GitPython/actions/runs/6271134895/job/17030195508#step:4:122 . A quick fix would be appreciated. Edit: I quickly fixed it myself - it seems like sometimes I forget that I am still able to edit text, despite it being Python.
@Byron Because the forthcoming 3.1.37 release that will include this patch will be a security fix, either the existing advisory/CVE should be updated with a correction (both a note and the version change), or a new CVE should be created for the variant of the vulnerability reported at #1644 (comment). I am not sure which of those things should be done here. Usually I would lean toward regarding such things as new bugs meriting new advisories/CVEs, which is also what I see more often. But I do not know that that's the best approach here, because the variant of the exploit where an absolute (or otherwise non-relative) path is used does seem to match the description in the summary section of CVE-2023-41040 even though it doesn't resemble any of the examples. To be clear, I don't mean that this situation is necessarily ambiguous, but instead that I do not have the knowledge and experience to know how it ought to be handled. Either way, this need not delay the release, of course. (Sorry if you're already on top of the CVE/advisory matter and this comment is just noise.)
Thanks for the hint, it's appreciated! I think it's fair to say that I am not on top of CVEs and that I have no intention to be. Even though this sounds harsh, it's just the current reality. But thus far members of the community have picked up the necessary work around CVEs, and I would definitely appreciate it if this kept happening.
A new release was created: https://pypi.org/project/GitPython/3.1.37/
Given the 3.1.37 release title ("3.1.37 - a proper fix CVE-2023-41040") I'm thinking this is intuitively being regarded as fixing the originally reported vulnerability, so perhaps that advisory should be updated, rather than a new one created? I am still not sure.

As noted in #1638 (comment), you (@Byron) could update the local advisory. If you do, a PR could then also be opened on https://github.com/github/advisory-database (where github/advisory-database#2690 was opened) to change the global advisory accordingly. I don't know if there's anything else that would need to be done.

@stsewd Do you have any opinion about what ought to be done here? Would you have any objection to the local advisory being edited this way? Would you instead prefer that this variant, where an absolute path is used, be regarded as a related but separate vulnerability altogether? (I know @Byron can edit the advisory, but I wanted to check in case you had an opinion on this.)

One source of my hesitancy here is that I think a new CVE may still be needed in this kind of situation. That seems common (courtesy of this SO answer).
> As noted in #1638 (comment), you (@Byron) could update the local advisory. If you do, a PR could then also be opened on https://github.com/github/advisory-database (where github/advisory-database#2690 was opened) to change the global advisory accordingly. I don't know if there's anything else that would need to be done.

A good point - I am still getting used to advisories and the local ones are indeed editable. So that one has been adjusted. I kindly ask somebody else to create a PR for the global database though - it seems GitHub makes it hard/impossible to use the web interface for that.

> One source of my hesitancy here is that I think a new CVE may still be needed in this kind of situation.

To me, CVEs are good for creating a far-reaching 'ping' to users of GitPython. Some might see it earlier than the new release. To me it's a question of how much time one wants to spend to create such a ping, and judging from the CVEs I have seen, it's quite expensive.
> A good point

Thanks, but I'm not actually sure if it was a good point. Maybe a new advisory ought to have been, or ought to be, created. I really don't know the proper thing to do here.
If there is an uproar because of how this was handled, it will be possible to undo changes to the local CVE and create a new one. So I think nothing is lost, and I think it's OK to choose less expensive options in dealing with this.
Hi, a new CVE/advisory is usually created for this type of situation, and in the description you can put something like "this was due to an incomplete fix of [link to the other CVE]". I don't oppose editing the current one, but I guess editing doesn't have the same "ping to everyone to upgrade" effect as a new one.
> Hi, a new CVE/advisory is usually created for this type of situation, and in the description you can put something like "this was due to an incomplete fix of [link to the other CVE]". I don't oppose editing the current one, but I guess editing doesn't have the same "ping to everyone to upgrade" effect as a new one.

Thanks!

@Byron Based on this, and also on what I am now seeing is the recent history of this practice being followed for GitPython in CVE-2022-24439/CVE-2023-40267, I recommend making a new advisory. Maybe there is some way I can help with this? However, if for any reason you would still prefer this route not be taken, then I can definitely go ahead and open a PR to update the global advisory with the version change. (I am unsure if that would cause Dependabot to notify users of the security update or not, but I imagine that, if it would not, then a reviewer on the PR would mention that.)

> But thus far members of the community have picked up the necessary work around CVEs, and I would definitely appreciate it if this kept happening.

I have three ideas of what I could do, but I don't know which, if any, of them would help or be wanted. This depends, in part, on what takes up the time for you.
> @Byron Based on this, and also on what I am now seeing is the recent history of this practice being followed for GitPython in CVE-2022-24439/CVE-2023-40267, I recommend making a new advisory. Maybe there is some way I can help with this? However, if for any reason you would still prefer this route not be taken, then I can definitely go ahead and open a PR to update the global advisory with the version change. (I am unsure if that would cause Dependabot to notify users of the security update or not, but I imagine that, if it would not, then a reviewer on the PR would mention that.)

Let's try something: updating version numbers is much cheaper than creating a new 'follow-up' CVE, for all sides actually. One could ask in the PR of the version change if notifications will be sent, and if unknown, @stsewd could probably help to tell as well. If no notification is sent, you could create a new CVE - you would be able to do this here in GitPython and from there it can be elevated, along with requesting a global CVE for it - this is easily done through the maintainer interface. The rest we can take from there should it come to that.
> Let's try something: updating version numbers is much cheaper than creating a new 'follow-up' CVE, for all sides actually.

Sounds good; I will do this. I noticed in the local advisory that, while 3.1.37 is given for patched versions, <=3.1.34 is still given for affected versions. I recommend changing that <=3.1.34 here to <=3.1.36 for consistency, and I'll specify <=3.1.36 in my proposed change to the global advisory. But if that is not the case and you want that specified differently, please let me know and I'll do differently (or amend the PR if I have already made it).

> One could ask in the PR of the version change if notifications will be sent

I'm making the PR through the structured "Suggest improvements" template, in which every field is pretty specific. I'll either include it somewhere if it fits, or otherwise try and add it into the created PR or add a comment with it.

> If no notification is sent, you could create a new CVE - you would be able to do this here in GitPython and from there it can be elevated, along with requesting a global CVE for it - this is easily done through the maintainer interface. The rest we can take from there should it come to that.

Thanks for telling me about that. That is much nicer than the particular specific I had suggested might be used. Of course, I'll still save that for if the above proves insufficient, as you say.
> I noticed in the local advisory that, while 3.1.37 is given for patched versions, <=3.1.34 is still given for affected versions. I recommend changing that <=3.1.34 here to <=3.1.36 for consistency, and I'll specify <=3.1.36 in my proposed change to the global advisory. But if that is not the case and you want that specified differently, please let me know and I'll do differently (or amend the PR if I have already made it).

Thanks for the heads-up, that's an oversight that is now corrected. And thanks again for your help!
No problem! I've submitted the proposed edit to the global advisory in PR github/advisory-database#2753.
github/advisory-database#2753 has been merged and the global GitHub advisory for CVE-2023-41040 has thus been updated.
Further update: The change to the global advisory has caused Dependabot security alerts to be raised, as desired. For example, dmvassallo/EmbeddingScratchwork#248 is a PR opened automatically to resolve a new Dependabot security alert in a project where GitPython had already been upgraded to the previously listed version. Note that this does not necessarily apply to tools that are less closely coupled to the GitHub ecosystem, and I don't know, for example, if any new Renovatebot PRs will be generated.
This change adds checks based on the rules described in the docs in order to more robustly check a refname's validity.