Reachability in Graphs - Version Control with Git

In graph theory, a node, X, is said to be reachable from another node, A, if you can start at A, travel along the arcs of the graph according to the rules, and arrive at X. The set of reachable nodes for a node, A, is the collection of all nodes reachable from A.

In a Git commit graph, the reachable commits are those you can reach from a given commit by traversing the directed parent links. Conceptually, and in terms of dataflow, the set of reachable commits is the set of ancestor commits that flow into and contribute to a given starting commit.

When you specify a commit, Y, to git log, you are actually requesting Git to show the log for all commits that are reachable from Y. You can exclude a specific commit, X, and all commits reachable from X with the expression ^X.

Combining the two forms, git log ^X Y is the same as git log X..Y and might be paraphrased as “Give me all commits that are reachable from Y, and don’t give me any commit leading up to and including X.”

The commit range X..Y is mathematically equivalent to ^X Y. You can also think of it as a set subtraction: use everything leading up to Y minus everything leading up to and including X.

Returning to the commit series from the earlier example, here’s how M~12..M~10 specifies just two commits, A and B. Begin with everything leading up to M~10, as shown in the first line of Figure 6-9. Find everything leading up to and including M~12, as shown in the second line of the figure. And finally, subtract M~12 from M~10 to get the commits shown in the third line of the figure.

Figure 6-8. Linear commit history (M~14, M~13, M~12, M~11 = A, M~10 = B, M~9, ..., M)

Commit History | 77

When your repository history is a simple linear series of commits, it is pretty easy to understand how a range works. But when branches or merges are involved in the graph, things can become a bit tricky, and so it’s important to understand the rigorous definition.

Let’s look at a few more examples. In the case of a master branch with a linear history, as shown in Figure 6-10, the three sets B..E, ^B E, and the set of C, D, and E are equivalent.

Figure 6-10. Simple linear history (A, B, C, D, E in order; master points at E)
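The equivalence of the three sets can be checked in a throwaway repository. The following is a minimal sketch, not part of the original text: the temporary directory, the demo identity, the helper function commit, and the scratch files reach_E and reach_B are all assumptions made for illustration. Each commit is empty and is tagged with its own letter so that it can be named as in the figure.

```shell
# Throwaway repository; every name here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from the text
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit NAME: make an empty commit whose subject and tag are both NAME.
commit() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

for name in A B C D E; do commit "$name"; done

# Subjects selected by each spelling of the range, sorted onto one line.
dotted=$(git log --format=%s B..E | sort | xargs)
caret=$(git log --format=%s ^B E | sort | xargs)

# Explicit set subtraction: (reachable from E) minus (reachable from B).
git rev-list E | sort > reach_E
git rev-list B | sort > reach_B
subtracted=$(comm -23 reach_E reach_B | xargs git show -s --format=%s | sort | xargs)

echo "B..E:      $dotted"
echo "^B E:      $caret"
echo "E minus B: $subtracted"
```

All three lines should list the same commits, C, D, and E. The output is sorted only to make the comparison independent of the order in which git log happens to print commits.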

In Figure 6-11, the master branch at commit V was merged into the topic branch at B.

Figure 6-11. Master merged into topic (topic: A, B, C, D; master: T, U, V, W, X, Y, Z; V is merged into topic at B)

The range topic..master represents those commits in master, but not in topic. Since each commit on the master branch up to and including V (..., T, U, V) contributes to topic, those commits are excluded, leaving W, X, Y, and Z.
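The graph in Figure 6-11 can be rebuilt to confirm this result. This is a sketch under stated assumptions: the figure does not say where topic begins, so the script assumes it forks from master at T; the temporary directory, demo identity, and commit helper are likewise illustrative.

```shell
# Throwaway repository; every name here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit NAME: make an empty commit whose subject and tag are both NAME.
commit() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

commit T
git checkout -q -b topic          # assumption: topic forks from master at T
commit A
git checkout -q master
commit U
commit V
git checkout -q topic
git merge -q --no-ff -m B master && git tag B   # B merges master (at V) into topic
commit C
commit D
git checkout -q master
for name in W X Y Z; do commit "$name"; done

# Commits reachable from master but not from topic.
in_master_only=$(git log --format=%s topic..master | sort | xargs)
echo "topic..master: $in_master_only"
```

Because T, U, and V are reachable from topic through the merge commit B, only W, X, Y, and Z should remain.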

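Figure 6-9. Interpreting ranges as set subtraction
  Everything leading up to M~10:                ..., M~14, M~13, M~12, M~11 (A), M~10 (B)
  Everything leading up to and including M~12:  ..., M~14, M~13, M~12
  Difference:                                   M~11 (A), M~10 (B)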

The inverse of the previous example is shown in Figure 6-12. Here, topic has been merged into master.

Figure 6-12. Topic merged into master (topic: A, B; master: V, W, X, Y, Z)

In this example, the range topic..master, again representing those commits in master but not in topic, is the set of commits on the master branch leading up to and then including V, W, X, Y, and Z.

However, we have to be a little careful and consider the full history of the topic branch. Consider the case where topic originally started as a branch of master and then had master merged into it again, as shown in Figure 6-13.

Figure 6-13. Branch and merge (topic: A, B, C, D; master: U, V, W, X, Y, Z; B has a second parent, V)

In this case, topic..master contains only the commits W, X, Y, and Z. Remember, the range will exclude all commits that are reachable (going back or left over the graph) from topic (i.e., the commits D, C, B, A, and earlier) as well as V, U, and earlier from the other parent of B. The result is just W through Z.

There are two other range permutations. If you leave either the start or the end commit out of a range, HEAD is assumed. Thus, ..end is equivalent to HEAD..end and start.. is equivalent to start..HEAD.
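The HEAD defaults can be observed directly. In this sketch the branch name feature, the temporary directory, the demo identity, and the commit helper are hypothetical; the history is A, B, C on master, with feature forking at B and adding D.

```shell
# Throwaway repository; every name here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit NAME: make an empty commit whose subject and tag are both NAME.
commit() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

for name in A B C; do commit "$name"; done
git checkout -q -b feature B      # hypothetical branch starting at B
commit D                          # HEAD is now feature, at D

# Omitted start: ..master should behave like HEAD..master.
implicit_start=$(git log --format=%s ..master | sort | xargs)
explicit_start=$(git log --format=%s HEAD..master | sort | xargs)

# Omitted end: master.. should behave like master..HEAD.
implicit_end=$(git log --format=%s master.. | sort | xargs)
explicit_end=$(git log --format=%s master..HEAD | sort | xargs)

echo "..master = $implicit_start, HEAD..master = $explicit_start"
echo "master.. = $implicit_end, master..HEAD = $explicit_end"
```

With HEAD at D, the first pair should both select C and the second pair should both select D.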

Finally, just as start..end can be thought of as representing a set subtraction operation, the notation A...B (using three periods) represents the symmetric difference between A and B, or the set of commits that are reachable from either A or B but not from both.

Because of the function’s symmetry, neither commit can really be considered a “start” or “end.” In this sense, A and B are “equal.”


More formally, the set of revisions in the symmetric difference between A and B, A...B, is given by:

$ git rev-list A B --not $(git merge-base --all A B)

Let’s look at an example in Figure 6-14.

Figure 6-14. Symmetric difference (master: A, B, C, D, E, F, G, H, I; dev: U, V, W, X, Y, Z)

We can compute each piece of the symmetric difference definition:

master...dev = (master OR dev) AND NOT (merge-base --all master dev)

The commits that contribute to master are (I, H, ..., B, A, W, V, U). The commits that contribute to dev are (Z, Y, ..., U, C, B, A). The union of those two sets is (A, ..., I, U, ..., Z). The merge base between master and dev is commit W. In more complex cases, there might be multiple merge bases, but here, we have only one. The commits that contribute to W are (W, V, U, C, B, and A); these are also the commits that are common to both master and dev, so they need to be removed to form the symmetric difference: (I, H, Z, Y, X, G, F, E, D).

It may be helpful if you can think of the symmetric difference between two branches, A and B, as “Show everything in branch A or in branch B, but only back to the point where the two branches diverged.”
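The symmetric difference worked out above can be reproduced in a scratch repository. This is only a sketch: the text fixes W as the merge base and C as the fork point, but not which master commit merges W, so the script assumes that commit is F. The temporary directory, demo identity, and commit helper are also illustrative.

```shell
# Throwaway repository; every name here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit NAME: make an empty commit whose subject and tag are both NAME.
commit() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

for name in A B C; do commit "$name"; done
git checkout -q -b dev            # dev forks from master at C
for name in U V W; do commit "$name"; done
git checkout -q master
commit D
commit E
git merge -q --no-ff -m F dev && git tag F   # assumption: F merges dev (at W)
for name in G H I; do commit "$name"; done
git checkout -q dev
for name in X Y Z; do commit "$name"; done

# The three-dot form, and the longer form built from the merge base.
three_dot=$(git log --format=%s master...dev | sort | xargs)
long_form=$(git log --format=%s master dev --not $(git merge-base --all master dev) | sort | xargs)
base=$(git log -1 --format=%s "$(git merge-base master dev)")

echo "merge base:     $base"
echo "master...dev:   $three_dot"
echo "with --not:     $long_form"
```

Both forms should select D through I from master and X, Y, Z from dev, with W reported as the merge base.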

Now that we have gone over what commit ranges are, how to write them, and how they work, it’s important to reveal that Git doesn’t actually support a true range operator. It is purely a notational convenience that A..B represents the underlying ^A B form.

Git actually allows much more powerful commit set manipulation on its command line. Commands that accept a range are actually accepting an arbitrary sequence of “included” and “excluded” commits. For example, you could use git log ^dev ^topic ^bugfix master to select those commits in master but not in any of the dev, topic, or bugfix branches.
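A small experiment shows several exclusions acting together. The layout is an assumption made for illustration: dev, topic, and bugfix each fork from a shared first commit named base and add one commit of their own, while master adds M1 and M2. The temporary directory, demo identity, and commit helper are hypothetical as well.

```shell
# Throwaway repository; every name here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit NAME: make an empty commit whose subject and tag are both NAME.
commit() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

commit base
for branch in dev topic bugfix; do
  git checkout -q -b "$branch" master   # each branch forks at base
  commit "$branch-1"
done
git checkout -q master
commit M1
commit M2

# Exclude everything reachable from any of the three branches.
carets=$(git log --format=%s ^dev ^topic ^bugfix master | sort | xargs)
# The same selection written with --not, as in the rev-list command earlier.
with_not=$(git log --format=%s master --not dev topic bugfix | sort | xargs)

echo "^dev ^topic ^bugfix master: $carets"
echo "master --not dev topic bugfix: $with_not"
```

The commit base is reachable from all three excluded branches, so only M1 and M2 should be listed by either spelling.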

All of these examples may be a bit abstract, but the power of the range representation really comes to fruition when you consider that any branch name can be used as part of the range. As described in “Tracking Branches” on page 180, if one of your branches represents the commits from another repository, you can quickly discover the set of commits that are in your repository that are not in another repository!
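As a rough illustration of that last point, the sketch below clones one scratch repository from another and asks which local commits the other repository lacks. The directory names upstream and local, the commit subjects, and the demo identity are hypothetical, and origin/master is simply the name that git clone gives to its copy of the other repository's master branch by default.

```shell
# Two throwaway repositories; every name here is illustrative.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The "other" repository, with two commits on master.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m shared-1
git commit -q --allow-empty -m shared-2

# A clone of it, with two further commits that exist only locally.
git clone -q "$work/upstream" "$work/local"
cd "$work/local"
git commit -q --allow-empty -m local-1
git commit -q --allow-empty -m local-2

# Commits in this repository's master that the other repository does not have.
unpublished=$(git log --format=%s origin/master..master | sort | xargs)
echo "origin/master..master: $unpublished"
```

Only the two local commits should be reported, since both shared commits are reachable from origin/master.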
