A Detailed View of Git’s Object Model and Files

By now, you should have the basic skills to manage files. Nonetheless, keeping track of what file is where—working directory, index, and repository—can be confusing.

Let’s follow a series of four pictures to visualize the progress of a single file named file1 as it is edited, staged in the index, and finally committed. Each figure simultaneously shows your working directory, the index, and the object store. For simplicity, let’s stick to just the master branch.

The initial state is shown in Figure 5-1. Here, the working directory contains two files named file1 and file2, with contents “foo” and “bar,” respectively.

In addition to file1 and file2 in the working directory, the master branch has a commit that records a tree with exactly the same “foo” and “bar” contents for files file1 and file2. Furthermore, the index records the SHA1 values a23bf and 9d3a2 for exactly those same file contents. The working directory, the index, and the object store are all synchronized and in agreement. Nothing is dirty.

Figure 5-2 shows the changes after editing file1 in the working directory so that its contents now consist of “quux.” Nothing in the index or in the object store has changed, but the working directory is now considered dirty.
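The notion of "dirty" in Figures 5-1 and 5-2 can be sketched as two comparisons: working directory against index, and index against the tree of the HEAD commit. The following is a minimal Python model, not Git's implementation. It assumes each file ends with a newline, and the names git_blob_id and changed_paths are hypothetical helpers introduced for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA1 ID of a blob: hash of 'blob <size>\\0' plus the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def changed_paths(left: dict, right: dict) -> set:
    """Paths whose blob IDs differ between two path -> ID maps."""
    return {p for p in left.keys() | right.keys() if left.get(p) != right.get(p)}


# Figure 5-1: working directory, index, and HEAD tree all agree.
# (Assumption: both files end with a newline.)
working = {"file1": b"foo\n", "file2": b"bar\n"}
index = {path: git_blob_id(data) for path, data in working.items()}
head_tree = dict(index)

# Figure 5-2: edit file1 in the working directory only.
working["file1"] = b"quux\n"

working_ids = {path: git_blob_id(data) for path, data in working.items()}
unstaged = changed_paths(working_ids, index)  # working directory vs. index
staged = changed_paths(index, head_tree)      # index vs. HEAD

print(sorted(unstaged))  # ['file1']
print(sorted(staged))    # []
```

The model rehashes every file for clarity. The IDs it computes need not match the abbreviated IDs printed in the figures.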

Some interesting changes take place when you use the command git add file1 to stage the edit of file1.

As Figure 5-3 shows, Git first takes the version of file1 from the working directory, computes a SHA1 hash ID (bd71363) for its contents, and places the contents in the object store as a blob under that ID. Next, Git records in the index that the pathname file1 has been updated to the new bd71363 SHA1.
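The two steps of staging (2a and 2b in Figure 5-3) can be sketched in a few lines of Python. This is a simplified in-memory model: the object store and the index are plain dictionaries, and stage is a hypothetical helper, not a Git API. The blob ID computation (SHA1 over the header "blob <size>" followed by a NUL byte and the content) follows Git's blob format.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA1 over 'blob <size>\\0' followed by the raw file content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def stage(path: str, content: bytes, index: dict, object_store: dict) -> str:
    """Model of 'git add <path>' for one file (hypothetical helper)."""
    oid = blob_id(content)
    object_store[oid] = content  # step 2a: add the blob to the object store
    index[path] = oid            # step 2b: point the index entry at the blob
    return oid


object_store, index = {}, {}
stage("file1", b"foo\n", index, object_store)
stage("file2", b"bar\n", index, object_store)
before = dict(index)

# Edit file1 and stage it; file2 is not staged again.
new_id = stage("file1", b"quux\n", index, object_store)

print(blob_id(b"foo\n"))                  # 257cc5642cb1a054f08cc83f2d943e56fd3ebe99
print(index["file1"] == new_id)           # True
print(index["file2"] == before["file2"])  # True: still the original blob
print(len(object_store))                  # 3
```

The old blob for "foo" stays in the object store after the edit is staged, because the HEAD commit's tree still refers to it.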

Since the contents of file2 haven’t changed and since no git add staged file2, the index continues to reference the original blob object for it.

At this point, you have staged file1 in the index, and the working directory and index agree. However, the index is considered dirty with respect to HEAD because it differs from the tree recorded in the object store for the HEAD commit of the master branch.‡

Finally, after all changes have been staged in the index, a commit applies them to the repository. The effects of git commit are shown in Figure 5-4.

‡You can get a dirty index in the other direction, too, irrespective of the working directory state. By reading a non-HEAD commit out of the object store into the index and not checking out the corresponding files into the working directory, you create the situation where the index and working directory are not in agreement and where the index is still dirty with respect to the HEAD.

As Figure 5-4 shows, the commit initiates three steps. First, the virtual tree object that is the index gets converted into a real tree object and placed into the object store under its SHA1 name. Second, a new commit object is created with your log message. The new commit points to the newly created tree object and also to the previous or parent commit. Third, the master branch ref is moved from the most recent commit to the newly created commit object, becoming the new master HEAD.
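The three steps above can be modeled in Python as follows. This is a sketch under stated assumptions, not Git's code: the index is flat (no subdirectories), every file is a regular non-executable file (mode 100644), and the author line, the store and refs dictionaries, and the function names are hypothetical. The tree and commit bodies follow Git's object layouts.

```python
import hashlib


def write_object(store: dict, kind: str, body: bytes) -> str:
    """Store an object under the SHA1 of '<kind> <size>\\0' + body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    oid = hashlib.sha1(header + body).hexdigest()
    store[oid] = (kind, body)
    return oid


def write_tree(index: dict, store: dict) -> str:
    """Step 3a: turn a flat index into a tree object."""
    body = b"".join(
        b"100644 " + path.encode("utf-8") + b"\0" + bytes.fromhex(oid)
        for path, oid in sorted(index.items())
    )
    return write_object(store, "tree", body)


def commit(index, store, refs, branch, message, who) -> str:
    tree_id = write_tree(index, store)               # 3a: index -> tree object
    lines = [f"tree {tree_id}"]
    parent = refs.get(branch)
    if parent is not None:
        lines.append(f"parent {parent}")             # link to previous commit
    lines += [f"author {who}", f"committer {who}", "", message]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    commit_id = write_object(store, "commit", body)  # 3b: make commit object
    refs[branch] = commit_id                         # 3c: move the branch ref
    return commit_id


store, refs = {}, {}
who = "A. Developer <dev@example.com> 0 +0000"  # hypothetical identity and time

index = {
    "file1": write_object(store, "blob", b"foo\n"),
    "file2": write_object(store, "blob", b"bar\n"),
}
first = commit(index, store, refs, "master", "Initial commit", who)

index["file1"] = write_object(store, "blob", b"quux\n")  # git add file1
second = commit(index, store, refs, "master", "Edit file1", who)

print(refs["master"] == second)                      # True
print(f"parent {first}".encode() in store[second][1])  # True
```

After the second commit, the ref for master names the new commit, and the new commit names the first one as its parent, which is the state drawn in Figure 5-4.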

An interesting detail is that the working directory, the index, and the object store (represented by the HEAD of master) are once again all synchronized and in agreement, just as they were in Figure 5-1.

[Figure 5-1. Initial files and objects. Panels: working directory, index, object store (master).]

[Figure 5-2. After editing file1. Step shown: 1. Edit file1.]

[Figure 5-3. After git add. Steps shown: 2a. Add file1 to object store; 2b. Update index.]

[Figure 5-4. After git commit. Steps shown: 3a. Convert index into tree object; 3b. Make commit object; 3c. Update branch ref.]

CHAPTER 6 Commits

In Git, a commit is used to record changes to a repository.

At face value, a Git commit seems no different from a commit or check-in found in other version control systems. However, under the hood, a Git commit operates in a unique way.

When a commit occurs, Git records a snapshot of the index and places that snapshot in the object store. (Preparing the index for a commit is covered in Chapter 5.) This snapshot does not contain a copy of every file and directory in the index, because such a strategy would require enormous and prohibitive amounts of storage. Instead, Git compares the current state of the index to the previous snapshot and so derives a list of affected files and directories. Git creates new blobs for any file that has changed and new trees for any directory that has changed, and it reuses any blob or tree object that has not changed.
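The reuse of unchanged blobs and trees falls out of content addressing: an object's name is the hash of its content, so writing identical content twice yields the same object. The Python sketch below models this with a dictionary as the object store. The snapshot and put functions and the sample file names are hypothetical, and only regular files and directories are handled.

```python
import hashlib


def put(store: dict, kind: str, body: bytes) -> str:
    """Store body under its SHA1 name; identical content is stored once."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    oid = hashlib.sha1(header + body).hexdigest()
    store.setdefault(oid, (kind, body))
    return oid


def snapshot(node, store: dict) -> str:
    """Write a file (bytes) or directory (dict of name -> node); return its ID."""
    if isinstance(node, bytes):
        return put(store, "blob", node)
    # Git orders tree entries as if directory names ended with '/'.
    names = sorted(node, key=lambda n: n + "/" if isinstance(node[n], dict) else n)
    entries = []
    for name in names:
        child = node[name]
        mode = b"40000" if isinstance(child, dict) else b"100644"
        child_id = snapshot(child, store)
        entries.append(mode + b" " + name.encode("utf-8") + b"\0" + bytes.fromhex(child_id))
    return put(store, "tree", b"".join(entries))


store = {}
v1 = {
    "docs": {"readme": b"hello\n"},
    "src": {"main.c": b"int main(void) { return 0; }\n", "util.c": b"/* util */\n"},
}
root1 = snapshot(v1, store)
count_after_first = len(store)  # 3 blobs + 3 trees

# Change a single file and take a second snapshot.
v2 = {"docs": v1["docs"], "src": dict(v1["src"], **{"main.c": b"int main(void) { return 1; }\n"})}
root2 = snapshot(v2, store)

print(count_after_first)               # 6
print(len(store) - count_after_first)  # 3: new blob, new src tree, new root tree
```

Only the changed file and the directories on the path from it to the root get new objects. The docs tree and the util.c blob are shared by both snapshots.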

Commit snapshots are chained together, with each new snapshot pointing to its predecessor. Over time, a sequence of changes is represented as a series of commits.

It may seem expensive to compare the entire index to some prior state, yet the whole process is remarkably fast because every Git object has a SHA1 hash. If two objects, even two subtrees, have the same SHA1 hash, the objects are identical. Git can avoid swaths of recursive comparisons by pruning subtrees that have the same content.
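The pruning idea can be illustrated with a small hash tree in Python. This is a sketch, not Git's diff machinery: the Node class uses a simplified hashing scheme rather than Git's object format, and changed_paths and the visited list are hypothetical names used to show which nodes get examined.

```python
import hashlib


class Node:
    """A file (children is None) or a directory (children is a dict)."""

    def __init__(self, content=b"", children=None):
        self.children = children
        if children is None:
            payload = b"file\0" + content
        else:
            # A directory's ID depends on the names and IDs of its entries.
            listing = "".join(
                f"{name}\0{child.oid}\n" for name, child in sorted(children.items())
            )
            payload = b"dir\0" + listing.encode("utf-8")
        self.oid = hashlib.sha1(payload).hexdigest()


def changed_paths(old, new, path="", visited=None):
    """List paths that differ, skipping any subtree whose IDs match."""
    if visited is not None:
        visited.append(path)
    if old is not None and new is not None and old.oid == new.oid:
        return []  # identical hash: prune, do not descend
    both_dirs = (
        old is not None and new is not None
        and old.children is not None and new.children is not None
    )
    if not both_dirs:
        return [path]  # changed, added, or removed entry
    result = []
    for name in sorted(old.children.keys() | new.children.keys()):
        child_path = f"{path}/{name}" if path else name
        result += changed_paths(
            old.children.get(name), new.children.get(name), child_path, visited
        )
    return result


def tree(main_source: bytes) -> Node:
    docs = Node(children={"a.txt": Node(b"A\n"), "b.txt": Node(b"B\n")})
    src = Node(children={"main.c": Node(main_source), "util.c": Node(b"/* util */\n")})
    return Node(children={"docs": docs, "src": src})


old, new = tree(b"v1\n"), tree(b"v2\n")
visited = []
print(changed_paths(old, new, visited=visited))  # ['src/main.c']
print(visited)  # ['', 'docs', 'src', 'src/main.c', 'src/util.c']
```

The docs directory is checked once and then skipped: its two files are never visited, because the matching directory ID already shows that everything beneath it is unchanged.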

There is a one-to-one correspondence between a set of changes in the repository and a commit: a commit is the only method of introducing changes to a repository, and any change in the repository must be introduced by a commit. This mandate provides accountability. Under no circumstance should repository data change without a record of the change! Just imagine the chaos if, somehow, content in the master repository changed and there was no record of how it happened, who did it, or why.

While commits are most often introduced explicitly by a developer, Git itself can introduce commits. As you’ll see in Chapter 9, a merge operation causes a commit in the repository in addition to any commits made by users before the merge.

How you decide when to commit is pretty much up to you and your preferences or development style. In general, you should perform a commit at well-defined points in time when your development is at a quiescent stage, such as when a test suite passes, when everyone goes home for the day, or any number of other reasons.

However, don’t hesitate to introduce commits! Git is well suited to frequent commits and provides a rich set of commands for manipulating them. Later, you’ll see how several commits, each making small, well-defined changes, can also lead to better organization of changes and easier manipulation of patch sets.
