Commit Graphs - Version Control with Git

“Object Store Pictures” on page 33 introduced some figures to help visualize the layout and connectivity of objects in Git’s data model. Such sketches are illuminating, especially if you are new to Git; however, even a small repository with just a handful of commits, merges, and patches becomes unwieldy to render in the same detail. For example, Figure 6-3 shows a more complete but still somewhat simplified commit graph. Imagine how it would appear if all commits and all data structures were rendered.

Yet one observation about commits can simplify the blueprint tremendously: each commit introduces a tree object that represents the entire repository. Therefore, a commit can be pictured as just a name.

Figure 6-3. Full commit graph

Figure 6-4 shows the same commit graph as Figure 6-3 without depicting the tree and blob objects. Usually for the purpose of discussion or reference, branch names are also shown in the commit graphs.

In computer science, a graph is a collection of nodes and a set of edges between the nodes. There are several types of graphs with different properties. Git makes use of a special graph called a directed acyclic graph (DAG). A DAG has two important properties. First, the edges within the graph are all directed from one node to another. Second, starting at any node in the graph, there is no path along the directed edges that leads back to the starting node.

Git implements the history of commits within a repository as a DAG. In the commit graph, each node is a single commit, and all edges are directed from one descendant node to another parent node, forming an ancestor relationship. The graphs you saw in Figures 6-3 and 6-4 are both DAGs.
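The two DAG properties can be checked mechanically. The following is a minimal sketch, not Git's own implementation: it assumes a history is modeled as a plain dictionary that maps each commit name to a list of its parents, and the commit names (`root`, `left`, `right`, `merge`, `x`, `y`) are invented for illustration.

```python
def is_acyclic(parents):
    """Return True if following parent edges never leads back to the start."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {commit: UNSEEN for commit in parents}

    def visit(commit):
        state[commit] = IN_PROGRESS
        for parent in parents[commit]:
            if state[parent] == IN_PROGRESS:
                return False  # reached a commit that is still being explored: a cycle
            if state[parent] == UNSEEN and not visit(parent):
                return False
        state[commit] = DONE
        return True

    return all(state[c] != UNSEEN or visit(c) for c in parents)


# Hypothetical histories: every edge points from a child to its parent.
history = {"root": [], "left": ["root"], "right": ["root"], "merge": ["left", "right"]}
broken = {"x": ["y"], "y": ["x"]}  # x and y claim each other as parent

print(is_acyclic(history))  # True
print(is_acyclic(broken))   # False
```

A real commit names its parents by their hashes, and its own hash is computed from content that includes those names, so a history like `broken` cannot arise in practice.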

When speaking of the history of commits and discussing the relationship between commits in a graph, the individual commit nodes are often labeled as shown in Figure 6-5.

Commit History | 73

In these diagrams, time is roughly left to right. A is the initial commit as it has no parent, and B occurred after A. Both E and C occurred after B, but no claim can be made about the relative timing between C and E; either could have occurred before the other. In fact, Git doesn’t really care about the time or timing (absolute or relative) of commits. The actual “wall clock” time of a commit can be misleading because a computer’s clock can be set incorrectly or inconsistently. Within a distributed development environment, the problem is exacerbated. Timestamps can’t be trusted. What is certain, though, is that if commit Y points to parent X, then X captures the repository state prior to the repository state of commit Y, regardless of what timestamps might be on the commits.
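To make the point about timestamps concrete, here is a small sketch under stated assumptions: the parent table covers only commits A, B, C, and E from the discussion, and the timestamps are made-up numbers in which B was recorded by a machine whose clock ran behind. Only the parent links give a trustworthy ordering.

```python
# Parent links for a fragment of the labeled graph (child -> parents).
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "E": ["B"]}

# Invented wall-clock values; B's clock was wrong, so B looks older than A.
TIMESTAMP = {"A": 1000, "B": 900, "C": 1100, "E": 1050}


def ancestors(commit):
    """Collect every commit reachable by following parent links."""
    seen = set()
    stack = list(PARENTS[commit])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def is_ancestor(x, y):
    """True if x is certain to precede y, judged by parent links alone."""
    return x in ancestors(y)


print(is_ancestor("A", "B"))                      # True, despite the timestamps
print(TIMESTAMP["A"] < TIMESTAMP["B"])            # False: the clock disagrees
print(is_ancestor("C", "E"), is_ancestor("E", "C"))  # False False: no claim either way
```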

Figure 6-4. Simplified commit graph

Figure 6-5. Labeled commit graph

The commits E and C share a common parent, B. Thus, B is the origin of a branch. The master branch begins with commits A, B, C, and D. Meanwhile, the sequence of commits A, B, E, F, and G forms the branch named pr-17. The branch pr-17 points to commit G. (You can read more about branches in Chapter 7.)

Because it’s a merge, H has more than one parent commit: in this case, D and G. Even though H has two parents, it is present only on the master branch, because pr-17 still refers to G. (The merge operation is discussed in more detail in Chapter 9.)
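One way to see why H belongs to master alone is to treat a branch as the set of commits reachable from its tip. The sketch below assumes the labeled graph is encoded as a dictionary of parent lists, with master pointing at H and pr-17 at G as described above; the helper name `reachable` is hypothetical.

```python
# The labeled commit graph: each commit maps to its parents.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["B"],
    "F": ["E"],
    "G": ["F"],
    "H": ["D", "G"],  # the merge commit
}

# A branch name is just a pointer to a tip commit.
BRANCHES = {"master": "H", "pr-17": "G"}


def reachable(tip):
    """Return the tip plus everything found by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


on_master = reachable(BRANCHES["master"])
on_pr17 = reachable(BRANCHES["pr-17"])

print(sorted(on_pr17))              # ['A', 'B', 'E', 'F', 'G']
print(sorted(on_master - on_pr17))  # ['C', 'D', 'H']
```

Note that once H exists, E, F, and G are also reachable from master through H's second parent, while nothing on pr-17 leads to H.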

In practice, the fine points of intervening commits are considered unimportant. Also, the implementation detail of a commit pointing back to its parent is often elided, as shown in Figure 6-6. Time is still vaguely left to right, there are two branches shown, and there is one identified merge commit (H), but the actual directed edges are simplified because they are implicitly understood.

Figure 6-6. Commit graph without arrows

This kind of commit graph is often used to talk about the operation of certain Git commands and how each might modify the commit history. The graphs are a fairly abstract representation of the actual commit history, in contrast to tools (such as gitk and git show-branch) that provide concrete representations of commit history graphs. In these tools, though, time is usually represented from bottom to top, oldest to most recent. Conceptually, it is the same information.

Using gitk to view the commit graph

A graph, by its very nature, is a visual aid that helps you grasp a complicated structure and its relationships. The gitk command‡ can draw a picture of a repository DAG whenever you want.

Let’s look at our example website:

$ cd public_html
$ gitk

‡Yes, this is one of the few Git commands that is not considered a “subcommand”; thus, it is given as gitk and not git gitk.


The gitk program can do a lot of things, but let’s just focus on the DAG. The graph output looks something like Figure 6-7.

Figure 6-7. Merge viewed with gitk

Here’s what you must know in order to understand the DAG of commits. First of all, each commit can have zero or more parents as follows:

• Normal commits have exactly one parent, which is the previous commit in the history. When you make a change, your change is the difference between your new commit and its parent.

• There is usually only one commit with zero parents: the initial commit, which appears at the bottom of the graph.

• A merge commit, such as the one at the top of the graph, has more than one parent.
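The three cases in this list come down to counting parents. As a rough illustration, the sketch below applies that rule to the labeled graph of Figure 6-5 rather than to the gitk example; the dictionary encoding and the function name `classify` are assumptions made for this sketch.

```python
# The labeled commit graph: each commit maps to its parents.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["B"],
    "F": ["E"],
    "G": ["F"],
    "H": ["D", "G"],
}


def classify(commit):
    """Name the kind of commit based solely on its number of parents."""
    count = len(PARENTS[commit])
    if count == 0:
        return "initial"
    if count == 1:
        return "normal"
    return "merge"


for name in sorted(PARENTS):
    print(name, classify(name))
```

Here A is the only initial commit, H is the only merge, and every other commit is a normal commit with exactly one parent.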

A commit with more than one child is the place where history began to diverge and formed a branch. In Figure 6-7, the commit Remove my poem is the branch point.

There is no permanent record of branch start points, but Git can algorithmically determine them using the git merge-base command.
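The idea behind finding a branch point can be sketched in a few lines. This is a simplified illustration on the labeled graph of Figure 6-5, not the algorithm that git merge-base actually uses: it counts children to spot where history diverged, and it defines a merge base as a common ancestor that is not itself an ancestor of another common ancestor. The helper names are hypothetical.

```python
# The labeled commit graph: each commit maps to its parents.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "E": ["B"], "F": ["E"], "G": ["F"], "H": ["D", "G"],
}


def branch_points():
    """Commits with more than one child, where history began to diverge."""
    children = {commit: 0 for commit in PARENTS}
    for parents in PARENTS.values():
        for parent in parents:
            children[parent] += 1
    return sorted(c for c, n in children.items() if n > 1)


def with_ancestors(commit):
    """Return the commit together with everything reachable through parents."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def merge_bases(x, y):
    """Common ancestors of x and y that no other common ancestor descends from."""
    common = with_ancestors(x) & with_ancestors(y)
    return {
        c for c in common
        if not any(c != d and c in with_ancestors(d) for d in common)
    }


print(branch_points())        # ['B']
print(merge_bases("D", "G"))  # {'B'}
print(merge_bases("H", "G"))  # {'G'}
```

The last line shows a subtlety: after the merge, G is itself an ancestor of H, so the base of the two branch tips is G rather than the original branch point B. Asking about D and G, the tips as they stood before the merge, recovers B.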
