GitPython is a Python code library for programmatically reading from and writing to Git source control repositories.
Let's learn how to use GitPython by quickly installing it and reading from a local cloned Git repository.
This tutorial should work with either Python 2.7 or 3, but Python 3, especially 3.6+, is strongly recommended for all new applications. I used Python 3.6.3 to write this post. In addition to Python, throughout this tutorial we will also use the following application dependencies:
Take a look at this guide for setting up Python 3 and Flask on Ubuntu 16.04 LTS if you need specific instructions to get a base Python development environment set up.
All code in this blog post is available open source under the MIT license on GitHub under the first-steps-gitpython directory of the blog-code-examples repository. Use and abuse the source code as you like for your own applications.
Start by creating a new virtual environment for your project. My virtualenv is named testgit but you can name yours whatever matches the project you are creating.
Activate the newly-created virtualenv.
The virtualenv's name will be prepended to the command prompt after activation.
Now that the virtualenv is activated, we can use the pip command to install GitPython.
Run the pip command and after everything is installed you should see output similar to the following "Successfully installed" message.
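As an optional sanity check, here is a minimal sketch that confirms the package can be found from inside the activated virtualenv. It relies on the fact that GitPython is imported under the module name git. The helper name is_installed is my own illustration and is not part of GitPython.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # GitPython is imported as "git", not "gitpython".
    print("GitPython available:", is_installed("git"))
```

If this prints False, the most likely cause is that pip installed the package into a different environment than the one that is currently activated.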
Next we can start programmatically interacting with Git repositories in our Python applications now that GitPython is installed.
GitPython can work with remote repositories but for simplicity in this tutorial we'll use a cloned repository on our local system.
Clone a repository you want to work with to your local system. If you don't have a specific one in mind use the open source Full Stack Python Git repository that is hosted on GitHub.
Take note of the location where you cloned the repository because we need the path to tell GitPython what repository to handle. Change into the directory for the new Git repository with cd then run the pwd (print working directory) command to get the full path.
You will see some output like /Users/matt/devel/py/fsp. This path is your absolute path to the base of the Git repository.
Use the export command to set an environment variable for the absolute path to the Git repository.
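Before going further it can help to confirm that the variable is visible to Python and points at the base of a repository. The sketch below assumes the variable is named GIT_REPO_PATH, as it is later in this post. The helper looks_like_git_repo is a hypothetical name of mine and only checks for a .git entry, so treat it as a rough check rather than a real validation.

```python
import os


def looks_like_git_repo(path):
    """Return True if path is a directory that contains a .git entry."""
    # .git is normally a directory, but it can be a file (for example
    # in a submodule), so only check that it exists.
    return bool(path) and os.path.isdir(path) and os.path.exists(
        os.path.join(path, ".git"))


if __name__ == "__main__":
    # os.getenv returns None when the variable is not set.
    repo_path = os.getenv("GIT_REPO_PATH")
    if looks_like_git_repo(repo_path):
        print("GIT_REPO_PATH points at", repo_path)
    else:
        print("GIT_REPO_PATH is unset or is not the base of a repository")
```

Remember that export only affects the current shell session, so the variable has to be set again in any new terminal window.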
Our Git repository and path environment variable are all set so let's write the Python code that uses GitPython.
Create a new Python file named read_repo.py and open it so we can start to code up a simple script.
Start with a couple of imports and a constant:
The os module makes it easy to read environment variables, such as the GIT_REPO_PATH variable we set earlier. from git import Repo gives our application access to the GitPython library when we create the Repo object. COMMITS_TO_PRINT is a constant that limits the amount of output based on the number of commits we want our script to print information on. Full Stack Python has over 2,250 commits so there'd be a whole lot of output if we printed every commit.
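To illustrate how a constant like COMMITS_TO_PRINT caps the output, here is a small sketch that takes only the first few items from any iterable. The value 5 matches the number of commits mentioned later in this post. The helper take_commits is a hypothetical name for illustration and it uses plain integers as stand-ins, so nothing here depends on GitPython.

```python
from itertools import islice

# Number of commits to report on; 5 matches the output discussed in this post.
COMMITS_TO_PRINT = 5


def take_commits(commits, limit=COMMITS_TO_PRINT):
    """Return at most `limit` items from any iterable of commits."""
    # islice stops early, so a long history is never fully consumed.
    return list(islice(commits, limit))


if __name__ == "__main__":
    # Stand-in for a long commit history.
    print(take_commits(range(2250)))
```

Stopping early matters for big repositories because it avoids building a list of every commit only to throw most of it away.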
Next within our read_repo.py file create a function to print individual commit information:
The print_commit function takes in a GitPython commit object and prints the 40-character SHA-1 hash for the commit followed by further details about the commit.
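A minimal sketch of such a function might look like the following. It is not necessarily the exact code from the original script. It assumes the commit object exposes the hexsha, summary, author.name, author.email and authored_datetime attributes, which GitPython commit objects provide. I split out a format_commit helper, a name of my own, so the formatting can be tried with a stand-in object.

```python
def format_commit(commit):
    """Build a multi-line description of a commit-like object."""
    return "\n".join([
        "-----",
        # 40-character SHA-1 hash of the commit
        str(commit.hexsha),
        '"{}" by {} ({})'.format(
            commit.summary, commit.author.name, commit.author.email),
        # date and time the commit was authored
        str(commit.authored_datetime),
    ])


def print_commit(commit):
    """Print the description built by format_commit."""
    print(format_commit(commit))
```

Which fields you print and in what order is up to you. The point is that each piece of commit metadata is a plain attribute on the object.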
Below the print_commit function, create another function named print_repository to print details of the Repo object:
print_repository is similar to print_commit but instead prints the repository description, active branch, all remote Git URLs configured for this repository and the latest commit.
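Here is one way that function could be sketched, again as an illustration rather than the exact original code. It assumes the description, active_branch, remotes and head.commit attributes that GitPython's Repo object provides. The format_repository helper and the message wording are my own choices.

```python
def format_repository(repo):
    """Build a multi-line description of a Repo-like object."""
    lines = [
        "Repo description: {}".format(repo.description),
        "Repo active branch is {}".format(repo.active_branch),
    ]
    # A repository can have any number of remotes, including none.
    for remote in repo.remotes:
        lines.append('Remote named "{}" with URL "{}"'.format(
            remote.name, remote.url))
    lines.append("Last commit for repo is {}.".format(
        repo.head.commit.hexsha))
    return "\n".join(lines)


def print_repository(repo):
    """Print the description built by format_repository."""
    print(format_repository(repo))
```

One caveat: GitPython raises an error when active_branch is read on a repository with a detached HEAD, so this sketch assumes a branch is checked out.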
Finally, we need a "main" function for when we invoke the script from the terminal using the python command. Round out our read_repo.py file with it.
The main function handles grabbing the GIT_REPO_PATH environment variable and creates a Repo object based on the path if possible.
If the repository is not empty (an empty repository would indicate a failure to find it), then the print_repository and print_commit functions are called to show the repository data.
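The flow described above can be sketched roughly as follows. Several things here are my assumptions and not taken from the original script: the summarize function and its open_repo parameter are hypothetical names, the repo.bare check stands in for the "not empty" test, and the message wording is made up. GitPython is imported lazily inside the function so the sketch can be exercised with a stand-in object.

```python
import os

COMMITS_TO_PRINT = 5


def summarize(repo_path, open_repo=None):
    """Return lines describing the repository at repo_path."""
    if not repo_path:
        return ["GIT_REPO_PATH is not set."]
    if open_repo is None:
        # Imported here so the sketch loads even without GitPython.
        from git import Repo
        open_repo = Repo
    repo = open_repo(repo_path)
    # Assumption: treat a bare repository as "not loaded properly".
    if repo.bare:
        return ["Could not load repository at {}".format(repo_path)]
    lines = ["Repo at {} successfully loaded.".format(repo_path)]
    # max_count limits how many commits GitPython walks through.
    for commit in repo.iter_commits(max_count=COMMITS_TO_PRINT):
        lines.append("{} {}".format(commit.hexsha, commit.summary))
    return lines


if __name__ == "__main__":
    for line in summarize(os.getenv("GIT_REPO_PATH")):
        print(line)
```

Note that GitPython's Repo raises NoSuchPathError or InvalidGitRepositoryError (both in git.exc) when the path is missing or is not a repository, so a sturdier script would probably catch those as well.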
If you want to copy and paste all of the code found above at once, take a look at the read_repo.py file on GitHub.
Time to test our GitPython-using script. Invoke the read_repo.py file using the following command.
If the virtualenv is activated and the GIT_REPO_PATH environment variable is set properly, we should see output similar to the following.
The specific commits you see will vary based on the last 5 commits I've pushed to the GitHub repository, but if you see something like the output above that is a good sign everything worked as expected.
We just cloned a Git repository and used the GitPython library to read a slew of data about the repository and all of its commits.
GitPython can do more than just read data though - it can also create and write to Git repositories! Take a look at the modifying references documentation page in the official GitPython tutorial or check back here in the future when I get a chance to write up a more advanced GitPython walkthrough.
Questions? Let me know via a GitHub issue ticket on the Full Stack Python repository, on Twitter @fullstackpython or @mattmakai.
See something wrong in this blog post? Fork this page's source on GitHub and submit a pull request.