Git Under the Hood

István Döbrentei, posted Jun 28, 3 min read

I decided to write a short introduction to Git — and more specifically, what’s going on under the hood. Understanding the internal mechanics of Git can help you better grasp its basic behavior and make it easier to understand commands used in everyday workflows.

Did you know that Git is fundamentally a key-value store?

In Git, the key is a SHA-1 (or SHA-256) hash, and the value is a Git object.
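
To make the key-value idea concrete, here is a minimal sketch of a content-addressed store in Python. The class name ObjectStore and its put/get methods are hypothetical, chosen only for illustration. It hashes the raw bytes directly, whereas real Git also adds a header and compresses objects on disk.

```python
import hashlib


class ObjectStore:
    """Toy in-memory content-addressed store (illustrative, not Git's on-disk format)."""

    def __init__(self):
        self._objects = {}  # maps hex digest (key) -> stored bytes (value)

    def put(self, data: bytes) -> str:
        # The key is computed from the value, so callers never choose it.
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ObjectStore()
key = store.put(b"hello")
print(key, store.get(key))
```

Because the key is derived from the value, storing the same bytes twice yields the same key and overwrites nothing new.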

What is a hash?

SHA stands for Secure Hash Algorithm. It’s a family of cryptographic hash functions designed to take input data of any size and produce a fixed-size string of characters — a "digest" or "hash" — that looks like a random sequence of letters and numbers.

Key Properties of a Hash

The same input always produces the same output hash.
It’s fast to compute a hash for any given input.
Given a hash, it's computationally infeasible to reverse it and recover the original input — doing so would require an impractical amount of time and energy.
It's extremely unlikely that two different inputs will produce the same hash (this is called a collision).
Even small changes in the input result in completely different hashes.
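
A few of these properties are easy to observe with Python's standard hashlib module. This is only a small demonstration on sample inputs, not a proof of anything; the helper name sha256_hex is my own.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


a = sha256_hex(b"hello world")
b = sha256_hex(b"hello world")       # same input again
c = sha256_hex(b"hello world!")      # one extra character
big = sha256_hex(b"x" * 1_000_000)   # much larger input

print(a == b)              # True: same input, same hash
print(len(a), len(big))    # 64 64: fixed-size output regardless of input size
print(a == c)              # False: a small change gives a different hash

# Count how many of the 64 hex positions differ between a and c.
diff = sum(1 for x, y in zip(a, c) if x != y)
print(diff, "of 64 hex characters differ")
```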

Each Git object's hash is calculated from its content, prefixed with a short header that records the object's type and size. This guarantees data integrity: the content can't be modified without changing its hash. It also enables deduplication, where identical content is stored only once.

Historically, Git has used SHA-1 hashes for its internal objects, such as blobs, trees, commits, and tags. SHA-1 is a cryptographic hash function that produces a 160-bit (20-byte) digest, so each object is identified by a 40-character hexadecimal string.
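
As a sketch of how an object ID comes about: Git does not hash the file bytes alone, but the bytes preceded by a header of the form "blob <size>" and a NUL byte. The function name git_blob_hash below is hypothetical; the result should match what git hash-object reports for the same content in a SHA-1 repository.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the SHA-1 object ID Git would assign to a blob with this content."""
    # Header: object type, a space, the content length in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(git_blob_hash(b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the well-known empty blob)
```

You can compare the first value against the output of echo "hello world" | git hash-object --stdin.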

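However, SHA-1 is no longer considered collision-resistant (a practical collision was publicly demonstrated in 2017), so newer versions of Git also support SHA-256, a more secure hashing algorithm. A repository can opt in at creation time with git init --object-format=sha256.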

Git isn't the only tool that uses hashes — there are many other use cases for verifying data integrity. A common method is to generate a digest (hash) of a file or message and later recheck the digest to confirm that the content hasn't been modified.

For example, when downloading a file or disk image, you’ll often see a checksum (like SHA-256) provided alongside it. After the download, you can compute the digest of the file on your machine and compare it to the provided value. If they match, the file hasn't been altered.
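
Here is a minimal sketch of that check in Python. The function names file_sha256 and matches_checksum are hypothetical, and the demo uses a throwaway temporary file in place of a real download.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=65536):
    """Hash a file in chunks so large downloads don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare the file's digest with a published checksum (case-insensitive)."""
    return file_sha256(path) == expected_hex.strip().lower()


# Demo: a temporary file stands in for the downloaded image.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend this is a disk image")
    demo_path = tmp.name

published = file_sha256(demo_path)  # in real life, copied from the download page
print(matches_checksum(demo_path, published))  # True

with open(demo_path, "ab") as f:
    f.write(b"!")  # simulate a corrupted or tampered download
print(matches_checksum(demo_path, published))  # False

os.remove(demo_path)
```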

It's also important to remember that a hash function is one-way: you cannot retrieve the original data from the digest.

What Are Git Objects?

Git stores all of its data as a set of objects in its internal database. These objects form the foundation of Git’s version control system.

The four main Git object types are:

Blob (Binary Large Object)

Represents the content of a file.
Stores only the raw data — not the filename or metadata.
Each version of a file’s content is stored as a separate blob.
Think of it as a snapshot of the file's content at a specific point in time.

Tree

Represents a directory.
Contains pointers to blobs (files) and other trees (subdirectories).
Stores, for each entry: the filename, the file mode (permissions), and a reference to a blob or another tree.
Think of it as a snapshot of a folder’s structure and contents.
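
To show where filenames and modes actually live, here is a rough sketch of how a tree object's body is laid out: each entry is the mode, a space, the name, a NUL byte, and the referenced object's ID as 20 raw bytes. The helper names are hypothetical, and the sorting is simplified (real Git orders directory names as if they ended in "/").

```python
import hashlib


def git_object_hash(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>' + NUL + body, the way Git identifies an object."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_object_id) tuples into a tree object body."""
    body = b""
    # Simplification: sort by raw name bytes only.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\x00"
        body += bytes.fromhex(hex_id)  # 20 raw bytes, not 40 hex characters
    return body


readme_blob = git_object_hash("blob", b"hello world\n")
entries = [("100644", "README.md", readme_blob)]  # hypothetical one-file project

print(git_object_hash("tree", tree_body(entries)))
print(git_object_hash("tree", b""))
# 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the well-known empty tree)
```

Note that the blob's content never appears in the tree; the tree only names it and points to it by ID.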

Commit

Represents a snapshot of the entire project at a given point in time.
Points to a single tree object (the project's root directory).
Contains metadata:
Author and committer info
Commit message
Timestamp
References to parent commits (the project's history)
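
A commit object is plain text underneath. The sketch below builds that text and hashes it, to show how each commit's ID depends on its tree, its parents, and its metadata. The function names, the author identity, and the timestamp are all made up for illustration.

```python
import hashlib


def git_object_hash(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>' + NUL + body, the way Git identifies an object."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id, parent_ids, author, committer, message):
    """Assemble the text of a commit object: headers, blank line, message."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]  # none for a root commit
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Hypothetical identity: name, email, Unix timestamp, timezone offset.
who = "Jane Doe <jane@example.com> 1700000000 +0000"

root = git_object_hash(
    "commit", commit_body(EMPTY_TREE, [], who, who, "Initial commit\n"))
child = git_object_hash(
    "commit", commit_body(EMPTY_TREE, [root], who, who, "Second commit\n"))

print("root: ", root)
print("child:", child)
```

Since the child's text contains the parent's ID, changing anything in an earlier commit changes the ID of every commit that follows it.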

Tag

A human-readable label or bookmark that points to another Git object, usually a commit.
Can be lightweight (just a reference, with no tag object stored) or annotated (a full tag object with metadata).
Annotated tags include:
Tagger name
Date
Optional message
Often used for marking releases (e.g. v1.0.0).

One practical consequence of this object model is that Git tracks content, not directories. A tree exists only to hold entries, so if a directory contains no files or subdirectories, Git has nothing to store and the directory never appears in the object database. That's why developers often add a placeholder file, conventionally named .gitkeep, to make Git track an otherwise empty directory.

It's also worth noting that Git does not track full file permissions. Of the permission bits, it records only whether a file is executable (the +x bit); ownership and the remaining read/write bits are not stored.

Since a blob represents only the content of a file (without its name or path), you can’t retrieve the filename directly from a blob object. The filename and path information are stored in the tree object that references the blob.

This design means that two files with different names but identical content will share the same hash — and therefore the same blob. This is one of the ways Git achieves storage efficiency through deduplication.
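
The effect is easy to simulate. In this sketch the file names and contents are invented, objects plays the role of the blob store, and names plays the role of what trees would record.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 object ID Git would assign to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical working directory: two of the three files have identical content.
working_dir = {
    "LICENSE": b"MIT\n",
    "docs/LICENSE-copy": b"MIT\n",
    "README.md": b"# Demo\n",
}

objects = {}  # blob ID -> content, stored once per distinct content
names = {}    # path -> blob ID, the part a tree would keep

for path, content in working_dir.items():
    oid = blob_id(content)
    objects[oid] = content
    names[path] = oid

print(len(working_dir), "files,", len(objects), "blobs")  # 3 files, 2 blobs
print(names["LICENSE"] == names["docs/LICENSE-copy"])     # True
```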
