
Graph Scanning Algorithm

In the document R.L. Graham, La Jolla B. Korte, Bonn (Page 37-40)

Input: A graph G (directed or undirected) and some vertex s.

Output: The set R of vertices reachable from s, and a set T ⊆ E(G) such that (R,T) is an arborescence rooted at s, or a tree.

1. Set R := {s}, Q := {s} and T := ∅.
2. If Q = ∅ then stop, else choose a v ∈ Q.
3. Choose a w ∈ V(G) \ R with e = (v, w) ∈ E(G) or e = {v, w} ∈ E(G). If there is no such w then set Q := Q \ {v} and go to step 2.
4. Set R := R ∪ {w}, Q := Q ∪ {w} and T := T ∪ {e}. Go to step 2.

Proposition 2.16. The Graph Scanning Algorithm works correctly.
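The four steps above can be sketched in Python; this is a minimal illustration, not the book's pseudocode. The dict-of-lists representation of the graph and the use of Q as a stack (the algorithm allows any choice of v ∈ Q) are assumptions of this sketch:

```python
def graph_scan(adj, s):
    """Graph Scanning Algorithm: adj maps each vertex to the list of its
    (out-)neighbours; returns the reachable set R and the tree edges T."""
    R = {s}                    # step 1: R := {s}
    T = []                     # step 1: T := empty set
    Q = [s]                    # step 1: Q := {s}, kept here as a stack
    pos = {s: iter(adj[s])}    # remembers how far v's list has been scanned
    while Q:                   # step 2: stop when Q is empty
        v = Q[-1]              # choose some v in Q (last-in here)
        # step 3: look for an edge (v, w) with w not yet in R
        w = next((u for u in pos[v] if u not in R), None)
        if w is None:
            Q.pop()            # no such w: remove v from Q, back to step 2
        else:
            R.add(w)           # step 4: add the new vertex and tree edge
            Q.append(w)
            T.append((v, w))
            pos[w] = iter(adj[w])
    return R, T
```

Keeping an iterator per vertex means no adjacency list is rescanned from the start, which is the idea behind the linear-time implementation discussed below.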

Proof: At any time, (R,T) is a tree or an arborescence rooted at s. Suppose at the end there is a vertex w ∈ V(G) \ R that is reachable from s. Let P be an s-w-path, and let {x,y} or (x,y) be an edge of P with x ∈ R and y ∉ R.

Since x has been added to R, it also has been added to Q at some time during the execution of the algorithm. The algorithm does not stop before removing x from Q. But this is done in step 3 only if there is no edge {x,y} or (x,y) with y ∉ R. □

Since this is the first graph algorithm in this book we discuss some implementation issues. The first question is how the graph is given. There are several natural ways. For example, one can think of a matrix with a row for each vertex and a column for each edge. The incidence matrix of an undirected graph G is the matrix A = (a_{v,e})_{v∈V(G), e∈E(G)} where

a_{v,e} = 1 if v ∈ e,
          0 if v ∉ e.

The incidence matrix of a digraph G is the matrix A = (a_{v,e})_{v∈V(G), e∈E(G)} where

a_{v,(x,y)} = −1 if v = x,
               1 if v = y,
               0 if v ∉ {x,y}.
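The digraph case can be sketched as follows; the list-of-lists matrix representation is an assumption of this illustration:

```python
def incidence_matrix(vertices, edges):
    """Incidence matrix of a digraph: rows indexed by vertices, columns
    by edges; -1 at the tail x and +1 at the head y of each edge (x, y)."""
    idx = {v: i for i, v in enumerate(vertices)}
    A = [[0] * len(edges) for _ in vertices]
    for j, (x, y) in enumerate(edges):
        A[idx[x]][j] = -1   # v = x: the edge leaves v
        A[idx[y]][j] = 1    # v = y: the edge enters v
    return A
```

Each column holds exactly two nonzero entries, which is why this representation wastes space, as noted next.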

2.3 Connectivity 25

Of course this is not very efficient since each column contains only two nonzero entries. The space needed for storing an incidence matrix is obviously O(nm), where n := |V(G)| and m := |E(G)|.

A better way seems to be having a matrix whose rows and columns are indexed by the vertex set. The adjacency matrix of a simple graph G is the 0-1-matrix A = (a_{v,w})_{v,w∈V(G)} with a_{v,w} = 1 iff {v,w} ∈ E(G) or (v,w) ∈ E(G). For graphs with parallel edges we can define a_{v,w} to be the number of edges from v to w. An adjacency matrix requires O(n²) space for simple graphs.
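A small sketch of the parallel-edge variant, where entries count edges rather than merely indicating adjacency (the directed reading of the edge pairs is an assumption here; for an undirected graph one would also increment the symmetric entry):

```python
def adjacency_matrix(vertices, edges):
    """Adjacency matrix of a digraph with possible parallel edges:
    A[v][w] counts the edges from v to w."""
    idx = {v: i for i, v in enumerate(vertices)}
    A = [[0] * len(vertices) for _ in vertices]
    for x, y in edges:
        A[idx[x]][idx[y]] += 1   # one more edge from x to y
    return A
```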

The adjacency matrix is appropriate if the graph is dense, i.e. has Θ(n²) edges (or more). For sparse graphs, say with O(n) edges only, one can do much better.

Besides storing the number of vertices we can simply store a list of the edges, for each edge noting its endpoints. If we address each vertex by a number from 1 to n, the space needed for each edge is O(log n). Hence we need O(m log n) space altogether.

Just storing the edges in an arbitrary order is not very convenient. Almost all graph algorithms require finding the edges incident to a given vertex. Thus one should have a list of incident edges for each vertex. In the case of directed graphs, two lists, one for entering edges and one for leaving edges, are appropriate. This data structure is called an adjacency list; it is the most customary one for graphs.

For direct access to the list(s) of each vertex we have pointers to the heads of all lists; these can be stored with O(n log m) additional bits. Hence the total number of bits required for an adjacency list is O(n log m + m log n).
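Building the two lists for a digraph can be sketched as follows; the numbering of vertices from 0 to n−1 and the function name are assumptions of this illustration:

```python
def adjacency_lists(n, edges):
    """Adjacency lists of a digraph on vertices 0..n-1: one list of
    leaving edges (delta^+) and one of entering edges (delta^-) per vertex."""
    out_edges = [[] for _ in range(n)]   # heads of edges leaving each vertex
    in_edges = [[] for _ in range(n)]    # tails of edges entering each vertex
    for x, y in edges:
        out_edges[x].append(y)
        in_edges[y].append(x)
    return out_edges, in_edges
```

The outer lists play the role of the pointers to the list heads; each edge appears once in an out-list and once in an in-list, giving the O(n log m + m log n) bit bound above.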

Whenever a graph is part of the input of an algorithm in this book, we assume that the graph is given by an adjacency list.

As for elementary operations on numbers (see Section 1.2), we assume that elementary operations on vertices and edges take constant time only. This includes scanning an edge, identifying its ends and accessing the head of the adjacency list for a vertex. The running time will be measured by the parameters n and m, and an algorithm running in O(m+n) time is called linear.

We shall always use the letters n and m for the number of vertices and the number of edges. For many graph algorithms it causes no loss of generality to assume that the graph at hand is simple and connected; hence n − 1 ≤ m < n². Among parallel edges we often have to consider only one, and different connected components can often be analyzed separately. The preprocessing can be done in linear time in advance; see Exercise 13 and the following.

We can now analyze the running time of the Graph Scanning Algorithm:

Proposition 2.17. The Graph Scanning Algorithm can be implemented to run in O(m+n) time. The connected components of a graph can be determined in linear time.

Proof: We assume that G is given by an adjacency list. For each vertex x we introduce a pointer current(x), indicating the current edge in the list containing all edges in δ(x) or δ⁺(x) (this list is part of the input). Initially current(x) is set to the first element of the list. In step 3, the pointer moves forward. When the end of the list is reached, x is removed from Q and will never be inserted again. So the overall running time is proportional to the number of vertices plus the number of edges, i.e. O(n+m).

To identify the connected components of a graph, we apply the algorithm once and check if R = V(G). If so, the graph is connected. Otherwise R is a connected component, and we apply the algorithm to (G, s) for an arbitrary vertex s ∈ V(G) \ R (and iterate until all vertices have been scanned, i.e. added to R).

Again, no edge is scanned twice, so the overall running time remains linear. □

An interesting question is in which order the vertices are chosen in step 3. Obviously we cannot say much about this order if we do not specify how to choose a v ∈ Q in step 2. Two methods are frequently used; they are called Depth-First Search (DFS) and Breadth-First Search (BFS). In DFS we choose the v ∈ Q that was the last to enter Q. In other words, Q is implemented as a LIFO-stack (last-in-first-out). In BFS we choose the v ∈ Q that was the first to enter Q. Here Q is implemented by a FIFO-queue (first-in-first-out).
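The effect of the two choices can be made concrete with a small sketch that only varies how v is picked from Q; it is illustrative rather than the linear-time implementation, since it rescans adj[v] each round, and the function name is an assumption:

```python
from collections import deque

def scan_order(adj, s, method="BFS"):
    """Order in which vertices enter R when Q is a FIFO queue (BFS)
    or a LIFO stack (DFS)."""
    R = {s}
    order = [s]
    Q = deque([s])
    while Q:
        v = Q[0] if method == "BFS" else Q[-1]    # first-in vs. last-in
        w = next((u for u in adj[v] if u not in R), None)
        if w is None:                             # v's edges exhausted
            Q.popleft() if method == "BFS" else Q.pop()
        else:                                     # step 4: new vertex found
            R.add(w)
            order.append(w)
            Q.append(w)
    return order
```

Only the line selecting v differs between the two searches; everything else is the Graph Scanning Algorithm unchanged.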

An algorithm similar to DFS was described before 1900 by Trémaux and Tarry; see König [1936]. BFS seems to have been mentioned first by Moore [1959]. Trees (in the directed case: arborescences) (R,T) computed by DFS and BFS are called DFS-tree and BFS-tree, respectively. For BFS-trees we note the following important property:

Proposition 2.18. A BFS-tree contains a shortest path from s to each vertex reachable from s. The values dist_G(s, v) for all v ∈ V(G) can be determined in linear time.

Proof: We apply BFS to (G, s) and add two statements: initially (in step 1 of the Graph Scanning Algorithm) we set l(s) := 0, and in step 4 we set l(w) := l(v) + 1.

We obviously have that l(v) = dist_{(R,T)}(s, v) for all v ∈ R, at any stage of the algorithm. Moreover, if v is the currently scanned vertex (chosen in step 2), at this time there is no vertex w ∈ R with l(w) > l(v) + 1 (because the vertices are scanned in an order with nondecreasing l-values).

Suppose that when the algorithm terminates there is a vertex w ∈ V(G) with dist_G(s, w) < dist_{(R,T)}(s, w); let w have minimum distance from s in G with this property. Let P be a shortest s-w-path in G, and let e = (v, w) or e = {v, w} be the last edge in P. We have dist_G(s, v) = dist_{(R,T)}(s, v), but e does not belong to T. Moreover, l(w) = dist_{(R,T)}(s, w) > dist_G(s, w) = dist_G(s, v) + 1 = dist_{(R,T)}(s, v) + 1 = l(v) + 1. This inequality combined with the above observation proves that w did not belong to R when v was removed from Q. But this contradicts step 3 because of edge e. □
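The labelled BFS of Proposition 2.18 can be sketched directly; here Q is the FIFO queue and l carries the two added statements from the proof (the dict-of-lists graph representation is an assumption):

```python
from collections import deque

def bfs_distances(adj, s):
    """BFS labels l(v) = dist_G(s, v) for every vertex v reachable
    from s; unreachable vertices receive no label."""
    l = {s: 0}                   # step 1, added statement: l(s) := 0
    Q = deque([s])               # FIFO queue: first in, first out
    while Q:
        v = Q.popleft()
        for w in adj[v]:
            if w not in l:       # step 3: w not yet in R
                l[w] = l[v] + 1  # step 4, added statement: l(w) := l(v) + 1
                Q.append(w)
    return l
```

Because Q is FIFO, vertices are removed in order of nondecreasing l-values, which is exactly the property the proof relies on.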

This result will also follow from the correctness of Dijkstra's Algorithm for the Shortest Path Problem, which can be thought of as a generalization of BFS to the case where we have nonnegative weights on the edges (see Section 7.1).

We now show how to identify the strongly connected components of a digraph. Of course, this can easily be done by using DFS (or BFS) n times. However, it is possible to find the strongly connected components by visiting every edge only twice:
