HAL Id: hal-00905902
https://hal.archives-ouvertes.fr/hal-00905902
Submitted on 18 Nov 2013
A note on Prüfer-like coding and counting forests of uniform hypertrees
Christian Lavault
To cite this version:
Christian Lavault. A note on Prüfer-like coding and counting forests of uniform hypertrees. Journal of Discrete Algorithms, Elsevier, 2012, 12 (1), 29–36 (selected paper of ACiD 2010). hal-00905902
Abstract
This note presents encoding and decoding algorithms for a forest of (labelled) rooted uniform hypertrees and hypercycles that run in linear time and use as few as n − 2 integers in the range [1, n]. It is a simple extension of the classical Prüfer code for (labelled) rooted trees to an encoding for forests of (labelled) rooted uniform hypertrees and hypercycles, which makes it possible to count them according to their number of vertices, hyperedges and hypertrees. In passing, we also recover Cayley's formula for the number of (labelled) rooted trees as well as its generalisation to the number of hypercycles found by Selivanov in the early 70's.
Key words: Hypergraph, Forest of (labelled rooted) hypertrees, Prüfer code, Encoding-decoding, b-uniform, Enumeration.
1 Notations and definitions
A hypergraph H is a pair H = (V, E), where V = {1, 2, . . . , n} denotes the set of vertices and E is a family of subsets of V each of size ≥ 2 called hyperedges (see e.g. [1]).
Two vertices are neighbours if they belong to the same hyperedge. The degree of a vertex is the number of its neighbours. A leaf is a set of b − 1 non-distinguished vertices of degree (b − 1) belonging to the same hyperedge.
A hyperpath (path) between two vertices u and v is a finite sequence of hyperedges $e_1, \ldots, e_k$ such that $e_i \cap e_{i+1} \neq \emptyset$ for any 1 ≤ i ≤ k − 1, with $u \in e_1$ and $v \in e_k$.
A hypergraph H is connected if there exists a path between any two vertices of H.
A connected hypergraph is also called a connected component, or simply a component.
A hypergraph is called b-uniform (or uniform) if every hyperedge e ∈ E contains exactly b vertices (2 ≤ b ≤ n) [8, 14]. For example, 2-uniform hypergraphs are simply graphs. In the present note, only connected b-uniform hypergraphs (2 ≤ b ≤ n) are considered.
∗ LIPN (UMR CNRS 7030), Université Paris 13, 99 av. J.-B. Clément, 93430 Villetaneuse (France).
E-mail: lavault@lipn.univ-paris13.fr
The excess of a connected hypergraph H = (V, E) is defined as
$$\mathrm{exc}(H) = \sum_{e \in E} (|e| - 1) - |V|$$
(see e.g. [8, 14]). Thus, if H is b-uniform, its excess is (b − 1)|E| − |V|. A hypertree is a component of excess −1, which is the smallest excess possible for a connected hypergraph H, and hence, exc(H) ≥ −1 for any H (see the above definition).
A component is rooted if one of its vertices is distinguished from all others. A hypergraph is called a forest if all its components have excess −1 (i.e. are hypertrees); similarly, a hypercycle is a component of excess 0.
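As a small illustration of the definition of the excess, the following Python sketch evaluates exc(H) on two toy hypergraphs. The function name `excess` and the two examples are illustrative assumptions and do not come from the paper.

```python
def excess(vertices, hyperedges):
    """exc(H) = sum over hyperedges of (|e| - 1), minus |V|."""
    return sum(len(e) - 1 for e in hyperedges) - len(vertices)


# One 3-uniform hyperedge on 3 vertices: a hypertree.
hypertree = ({1, 2, 3}, [{1, 2, 3}])
# Two 3-uniform hyperedges sharing two vertices: a hypercycle.
hypercycle = ({1, 2, 3, 4}, [{1, 2, 3}, {2, 3, 4}])

print(excess(*hypertree))   # -1
print(excess(*hypercycle))  # 0
```

Both toy hypergraphs are connected, as the definition requires; the sketch itself does not check connectivity.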
2 Bijective enumeration of a forest of hypertrees
2.1 State of the art and motivations
Concerning connected graphs (2-uniform hypergraphs), there exist several methods for counting trees (see e.g. [2, 3, 4, 7, 9, 10, 11]), including of course the Prüfer code. Prüfer sequences were first introduced by Heinz Prüfer to prove Cayley's enumeration formula in 1918 [13]. In his very elegant proof¹, Prüfer verified Cayley's Theorem [3] by establishing a one-to-one correspondence between labelled free trees of order n and all sequences of n − 2 positive integers from 1 to n. The Prüfer codes can thus be generated by a simple iterative algorithm (see also [9, vol. 1, chap. 2] and [6]).
Remark 1. To compute the Prüfer sequence Seq(T) for a labelled tree T, iteratively delete the leaf with smallest label and append the label of its neighbour to the sequence.
After n − 2 iterations a single edge remains and we have produced a sequence Seq(T) of length n − 2.
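The iterative procedure of Remark 1 can be sketched in Python as follows. This is a minimal illustration, not the linear-time algorithm discussed in the literature: it uses a binary heap and therefore runs in O(n log n) time. The function name `prufer_sequence` and the edge-list input format are assumptions made for the example.

```python
import heapq


def prufer_sequence(n, edges):
    """Prüfer sequence of a labelled tree on vertices 1..n (n >= 2)."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Min-heap of the current leaves (vertices with exactly one neighbour).
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)   # leaf with the smallest label
        (neighbour,) = adj[leaf]       # its unique neighbour
        seq.append(neighbour)
        adj[neighbour].remove(leaf)    # delete the leaf from the tree
        adj[leaf].clear()
        if len(adj[neighbour]) == 1:   # the neighbour has become a leaf
            heapq.heappush(leaves, neighbour)
    return seq


print(prufer_sequence(4, [(1, 2), (2, 3), (3, 4)]))  # [2, 3]
```

For the path 1-2-3-4, leaf 1 is deleted first (recording 2), then leaf 2 (recording 3), and the single edge {3, 4} remains.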
Although Prüfer codes were introduced long ago, a linear-time algorithm for their computation was first given only in the 70's [12, 15], and has since been rediscovered several times in various forms [2, 4, 5]. A recent example is the pair of sequential encoding and decoding schemes presented in [2]: both require optimal Θ(n) time when applied to rooted n-node trees, and they provide the first optimal linear-time decoding algorithm for Neville's codes [2].
Amongst the most recent results, “Prüfer-like” encoding-decoding algorithms generalize the Prüfer code to the case of hypertrees (uniform or arbitrary). In 2009, S. Shannigrahi and S.P. Pal showed in [17] that uniform hypertrees can be Prüfer-like encoded (and decoded) in optimal linear time Θ(n), using only n − 2 integers in the range [1, n] (see Prüfer's Theorem in [13]).
¹ Prüfer's Theorem is as follows. There are $n^{n-2}$ sequences (called Prüfer sequences or Prüfer codes) of length n − 2 with entries being natural numbers from 1 to n; a bijection is established between the set of trees and this set of sequences.
In the case when hypertrees are arbitrary (non uniform), the same authors' encoding and decoding algorithms in [17] require codes of length (n − 2) + p, where p is the number of vertices belonging to more than one hyperedge, or pivots. Since the number p of pivots is bounded by |E|, at most (n − 2) + |E| integers are needed to encode a general hypertree. Therefore, the design of efficient Prüfer-like encoding and decoding algorithms can be extended to arbitrary hypertrees where each hyperedge has at least two vertices. Up until now, no better bounds are known for the length of Prüfer-like codes for arbitrary hypertrees. By contrast, the exact number of distinct hypertrees is known to be
$$\sum_{i=0}^{n-1} \left\{ {n-1 \atop i} \right\} n^{i-1},$$
where $\left\{ {p \atop q} \right\}$ denotes the Stirling numbers of the second kind [9].
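The sum above is easy to evaluate for small n. The sketch below is only an illustration; the helper names `stirling2` and `count_hypertrees` are assumptions. It computes the Stirling numbers of the second kind by their standard recurrence and skips the i = 0 term, which vanishes for n ≥ 2 because the corresponding Stirling number is zero.

```python
def stirling2(p, q):
    """Stirling number of the second kind: partitions of a p-set into q blocks."""
    # table[a][c] = S(a, c), using S(a, c) = c * S(a-1, c) + S(a-1, c-1).
    table = [[0] * (q + 1) for _ in range(p + 1)]
    table[0][0] = 1
    for a in range(1, p + 1):
        for c in range(1, min(a, q) + 1):
            table[a][c] = c * table[a - 1][c] + table[a - 1][c - 1]
    return table[p][q]


def count_hypertrees(n):
    """Sum of S(n-1, i) * n^(i-1) for i = 1..n-1 (the i = 0 term is zero)."""
    return sum(stirling2(n - 1, i) * n ** (i - 1) for i in range(1, n))


print([count_hypertrees(n) for n in range(2, 6)])  # [1, 4, 29, 311]
```

For n = 3 the value 4 accounts for the three labelled trees on three vertices plus the single hyperedge {1, 2, 3}.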
The main motivation of the present note comes from analytic and bijective combinatorics of hypergraphs, including the enumeration of (labelled) rooted forests of uniform hypertrees and hypercycles [14]. These enumeration results are actually tightly linked to Prüfer-like coding and decoding of such combinatorial structures.
The following two algorithms code and decode forests of rooted uniform hypertrees, which in turn makes it possible to enumerate these structures bijectively by using a generalization of the Prüfer sequences. The knowledge of the number of forests of rooted uniform hypertrees provides a concise and simple description of the structures, e.g. by using a recursive pruning of the leaves in the forests.
2.2 Encoding and decoding algorithms for a forest of (labelled) rooted hypertrees
Definition 1. A forest F composed of (k + 1) (labelled) rooted b-uniform hypertrees, with n vertices and s hyperedges, is coded by a 4-tuple (R, r, P, N) defined as follows:
• R is a set of (k + 1) vertices (roots) with distinct labels in [n] ≡ {1, . . . , n},
• r ∈ R is one of the (k + 1) roots,
• P is a partition of [n] \ R into s subsets, each of size (b − 1), and
• N is an (s − 1)-tuple in $[n]^{s-1}$.
Note that the above integers n ≡ n(s), s and k meet the condition n = s(b − 1) + k + 1. Here |E| = s and |V| = n and, since F is b-uniform, exc(F) = s(b − 1) − n = −(k + 1), that is, −1 for each of the (k + 1) hypertrees.
Algorithm 1 takes a given forest F of k + 1 (labelled) rooted uniform hypertrees as input, and returns the 4-tuple (R, r, P, N) coding F as output.
The 4-tuple (R, r, P, N) obtained from the coding in Definition 1 of a forest F (Algorithm 1) gives the following:
1. the number of its components (i.e. the number |R| of its roots),
2. the unique root vertex r attached to the leaf of the last hyperedge in the pruning process,
3. the number of hyperedges |N| + 1, and finally,
4. the number of hypertrees that are not reduced to their roots, namely the number of distinct roots in the pair (N, r).

Algorithm 1 Encoding a forest of rooted hypertrees.
Input: A forest F of k + 1 (labelled) rooted b-uniform hypertrees, with s hyperedges and n = s(b − 1) + k + 1 vertices.
Output: The coding (R, r, P, N) of F as in Definition 1.
1. (R, r, P, N) ← ({roots}, r, { }, ( ))
2. Repeat
3.   Add the set of vertices corresponding to the smallest leaf (with respect to the lexicographical order) to the partition P.
4.   Append the vertex linking that set to the tuple N.
5.   Take F as the “new” forest not having the vertices corresponding to the smallest leaf.
6. Until there is no hyperedge remaining in F.
7. The last vertex put in N is necessarily a root; remove it from N, which is thus left with s − 1 entries, and redefine r as this vertex.
8. Return (R, r, P, N).
Example.
The forest F = (V , E) of rooted uniform hypertrees depicted in Fig. 1 is as follows:
V = {1, 2, . . . , 22} and
E = {{1, 21, 22}, {2, 17, 18}, {3, 13, 19}, {4, 8, 18}, {4, 12, 14}, {6, 7, 13}, {7, 20, 21}, {10, 13, 15}, {11, 18, 21}}.
The roots of the hypertrees are 5, 9, 13 and 16.
In this example, Algorithm 1 outputs (R, r, P , N ) s.t.
R = {5, 9, 13, 16}, r = 13,
P = {{1, 22}, {2, 17}, {3, 19}, {4, 8}, {6, 7}, {10, 15}, {11, 18}, {12, 14}, {20, 21}} and N = (21, 18, 13, 13, 4, 18, 21, 7).
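The pruning process of Algorithm 1 can be replayed on this example with the following Python sketch. It is a straightforward quadratic-time illustration under stated assumptions, not the linear-time procedure of [17]: the function name `encode_forest`, the representation of hyperedges as sets, and the test used to detect a leaf (b − 1 non-root vertices lying in no other remaining hyperedge) are choices made here for illustration. The input is assumed to be a valid rooted b-uniform forest with at least one hyperedge.

```python
def encode_forest(roots, hyperedges):
    """Sketch of Algorithm 1: return (R, r, P, N) for a rooted b-uniform forest."""
    roots = set(roots)
    remaining = [frozenset(e) for e in hyperedges]
    parts, links = [], []
    while remaining:
        # Number of remaining hyperedges containing each vertex.
        count = {}
        for e in remaining:
            for v in e:
                count[v] = count.get(v, 0) + 1
        # A hyperedge carries a leaf when b - 1 of its vertices are non-root
        # vertices lying in no other remaining hyperedge.
        leaves = []
        for e in remaining:
            free = tuple(sorted(v for v in e if count[v] == 1 and v not in roots))
            if len(free) == len(e) - 1:
                leaves.append((free, e))
        leaf, edge = min(leaves, key=lambda item: item[0])  # smallest leaf
        (link,) = edge - set(leaf)                           # vertex linking the leaf
        parts.append(leaf)
        links.append(link)
        remaining.remove(edge)
    r = links.pop()  # the last linking vertex is the root r
    return roots, r, sorted(parts), tuple(links)


E = [{1, 21, 22}, {2, 17, 18}, {3, 13, 19}, {4, 8, 18}, {4, 12, 14},
     {6, 7, 13}, {7, 20, 21}, {10, 13, 15}, {11, 18, 21}]
R, r, P, N = encode_forest({5, 9, 13, 16}, E)
print(r, N)  # 13 (21, 18, 13, 13, 4, 18, 21, 7)
```

The leaves are pruned in the order {1, 22}, {2, 17}, {3, 19}, {10, 15}, {12, 14}, {4, 8}, {11, 18}, {20, 21}, {6, 7}; the sketch returns P sorted lexicographically, which is how the partition is listed in the example.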
Next, Algorithm 2 takes a given 4-tuple (R, r, P, N) as input, and returns a forest of (labelled) rooted b-uniform hypertrees as output.
Within the loop of Algorithm 2, the forest F is found with no ambiguity by choosing the smallest leaf in the lexicographical order.
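A minimal Python sketch of the decoding loop of Algorithm 2 is given below, applied to the 4-tuple of the example. The function name `decode_forest` and the data representation are assumptions made for illustration. "Having no vertex still in N" is interpreted as including the current first entry of N, and the sketch runs in quadratic rather than linear time.

```python
def decode_forest(R, r, P, N):
    """Sketch of Algorithm 2: rebuild the hyperedges of the forest coded by (R, r, P, N)."""
    remaining = sorted(tuple(sorted(p)) for p in P)  # sets of P in lexicographic order
    pending = list(N)                                # entries of N not yet consumed
    hyperedges = []
    while pending:
        blocked = set(pending)  # vertices still occurring in N
        # First set of P having no vertex still in N.
        part = next(p for p in remaining if blocked.isdisjoint(p))
        hyperedges.append({pending[0], *part})
        remaining.remove(part)
        pending.pop(0)
    # The single set left in P is attached to the root r.
    (last,) = remaining
    hyperedges.append({r, *last})
    return set(R), hyperedges


P = [(1, 22), (2, 17), (3, 19), (4, 8), (6, 7), (10, 15),
     (11, 18), (12, 14), (20, 21)]
N = (21, 18, 13, 13, 4, 18, 21, 7)
roots, edges = decode_forest({5, 9, 13, 16}, 13, P, N)
print(edges[0], edges[-1])
```

On this input the first hyperedge built is {1, 21, 22} and the last one is {6, 7, 13}, and the nine hyperedges obtained are exactly those of E in the example.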
Remark 2. Encoding Algorithm 1 and decoding Algorithm 2 both run in optimal linear time, using as few as n − 2 integers in the range [1, n]. This complexity result is a direct consequence of the Prüfer-like encoding algorithm designed in [17] (see Prüfer's Theorem in [13]). The algorithm computes the hyperedge partial order on the hyperedges of the hypertree in linear time by defining a directed acyclic graph (DAG) with vertex set E, where each vertex represents a hyperedge of H. (The proof is completed in [17, Lemma 2].)
Figure 1: A forest of 4 (labelled) rooted hypertrees.
2.3 Enumeration of forests, hypertrees and trees
From Algorithm 1, we obtain the enumeration of forests with (k + 1) (labelled) rooted uniform hypertrees and s hyperedges.
Theorem 1. The number of forests having (k + 1) (labelled) rooted b-uniform hypertrees and s hyperedges is
$$\binom{n}{k+1}\,(k+1)\,\frac{(n-k-1)!}{s!\,(b-1)!^{s}}\; n^{s-1} \;=\; \frac{n!}{k!}\,\frac{n^{s-1}}{s!\,(b-1)!^{s}}, \qquad (1)$$
where the number of vertices is n ≡ n(s) = s(b − 1) + k + 1.
Proof. The proof stems directly from the one-to-one correspondence constructed in Algorithm 1, which codes forests F with the set of 4-tuples (R, r, P, N) defined as in Definition 1. Indeed, the number of forests of (k + 1) (labelled) rooted b-uniform hypertrees and s hyperedges is equal to |R| × #roots × |P| × |N|, where |R|, |P| and |N| denote here the numbers of possible choices for R, P and N, respectively.
Now, we have $|R| = \binom{n}{k+1}$, the number of roots is (k + 1), $|P| = \frac{(n-k-1)!}{s!\,(b-1)!^{s}}$ and $|N| = n^{s-1}$. So, after simplifications, Theorem 1 follows.
Algorithm 2 Decoding to a forest of rooted hypertrees.
Input: Integers n, k and s meeting the condition n = s(b − 1) + k + 1 and the coding (R, r, P, N) of F, as in Definition 1.
Output: Forest F of rooted b-uniform hypertrees, whose k + 1 roots are the vertices of R.
1. Repeat
2.   Build one hyperedge with the first vertex in the (s − 1)-tuple N and the vertices of the first set of P (w.r.t. the lexicographic order) having no vertex still in the remaining tuple N.
3.   Remove the above set from P and delete the first vertex from the (s − 1)-tuple N.
4. Until the tuple N is empty.
5. Build the s-th hyperedge with the last subset of P and the vertex r.
6. Return the forest F obtained.

Setting k = 0 in the above Eq. (1) (Theorem 1) yields $\frac{n!\, n^{s-1}}{s!\,(b-1)!^{s}}$, whence the following corollary.
Corollary 1. The number of (labelled) rooted b-uniform hypertrees with s hyperedges is
$$\frac{(n-1)!}{s!\,(b-1)!^{s}}\; n^{s},$$
where the number of vertices is n ≡ n(s) = s(b − 1) + 1.
Note that Corollary 1 is a generalization of Cayley's Theorem enumerating rooted trees of n vertices: whenever b = 2 (and thus s = n − 1), it states that, for any n ≥ 1, the number of (nonplane labelled) rooted trees of n vertices is $n^{n-1}$ [3].
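The counting argument behind Eq. (1) can be checked numerically on small cases. The sketch below multiplies the four factors of the proof of Theorem 1 and compares the result with Cayley's formula when b = 2 and k = 0. The function name `count_rooted_forests` is an assumption made for the example, and s ≥ 1 is assumed.

```python
from math import comb, factorial


def count_rooted_forests(s, b, k):
    """Eq. (1): forests of k + 1 rooted b-uniform hypertrees with s >= 1 hyperedges."""
    n = s * (b - 1) + k + 1
    choices_R = comb(n, k + 1)   # choices for the set R of roots
    choices_r = k + 1            # choices for r within R
    # Partitions of the n - k - 1 non-root vertices into s sets of size b - 1.
    choices_P = factorial(n - k - 1) // (factorial(s) * factorial(b - 1) ** s)
    choices_N = n ** (s - 1)     # choices for the (s - 1)-tuple N
    return choices_R * choices_r * choices_P * choices_N


# b = 2, k = 0, s = n - 1: rooted labelled trees on n vertices.
print([count_rooted_forests(n - 1, 2, 0) for n in range(2, 6)])  # [2, 9, 64, 625]
```

The printed values are n^(n−1) for n = 2, ..., 5. As a further sanity check, a single 3-uniform hyperedge (s = 1, b = 3, k = 0) gives 3, one rooted hypertree per choice of root.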
One of the advantages offered by a bijective construction proof is also the possibility of performing random generation and learning some characteristic properties of the structures in F [14].
A generalization of Subsection 2.3 also gives an explicit expression for the number of uniform hypercycles and yields an alternative proof of Selivanov's 1972 enumeration result.
3 Enumeration of uniform hypercycles
Together with hypertrees, hypercycles are the simplest structures and they have excess 0.
Along the same lines as in Theorem 1, the following bijective proof gives the enumeration formula (first given by Selivanov in [16]) of the uniform hypercycles in a forest F.
Theorem 2. [16] The number of b-uniform hypercycles with s hyperedges is
$$\frac{(b-1)\, n!\, n^{s-1}}{2\,(b-1)!^{s}} \sum_{j=2}^{s} \frac{j}{s^{j}\,(s-j)!} \;=\; \frac{(b-1)\, n!\, n^{s-1}}{2\,(b-1)!^{s}} \cdot \frac{1}{s\,(s-2)!},$$
where the number of vertices is n ≡ n(s) = s(b − 1).
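The equality between the finite sum and the closed form in the statement of Theorem 2 can be checked with exact rational arithmetic. The short sketch below does so for small values of s. The function names are illustrative assumptions, and the check concerns only this algebraic identity, not the enumeration itself.

```python
from fractions import Fraction
from math import factorial


def finite_sum(s):
    """Sum over j = 2..s of j / (s^j * (s - j)!)."""
    return sum(Fraction(j, s ** j * factorial(s - j)) for j in range(2, s + 1))


def closed_form(s):
    """1 / (s * (s - 2)!)."""
    return Fraction(1, s * factorial(s - 2))


print(all(finite_sum(s) == closed_form(s) for s in range(2, 13)))  # True
```

The identity holds for every s ≥ 2 by telescoping: with A_m = 1/(s^(m−1) (s − m)!), one has A_m − A_(m+1) = m/(s^m (s − m)!) for m < s, and the last term of the sum equals A_s.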
Proof. A hypercycle having a cycle length j corresponds to a forest F of j(b − 1) rooted hypertrees, up to an arrangement of the hypertrees in all distinct ways of forming a cycle. F has (s − j) hyperedges and j(b − 1) components to be arranged into a cycle.
The number of b-uniform hypercycles having a cycle length j (2 ≤ j ≤ s) and with s hyperedges is thus
$$\left[ \binom{n}{j(b-1)}\, j(b-1)\, \frac{\big((s-j)(b-1)\big)!}{(s-j)!\,(b-1)!^{s-j}} \right] \left[ \frac{1}{2}\, \frac{\big(j(b-1)\big)!}{(b-2)!^{j}} \right], \qquad (2)$$
where n ≡ n(s) = s(b − 1).
In the above Eq. (2) indeed, the left-hand factor (between brackets) counts the number of forests of rooted b-uniform hypertrees, while the right-hand one counts the number of smooth hypercycles with j distinct hyperedges, and thus labelled with the set {1, . . . , j(b − 1)}.
Now, $(b-1)!^{-s+j}\,(b-2)!^{-j} = (b-1)!^{j}\,(b-1)!^{-s}$ and, since n = s(b − 1), we have (s − j)(b − 1) = n − j(b − 1). So, Eq. (2) simplifies to
$$\frac{n!\,(b-1)}{2\,(b-1)!^{s}} \cdot \frac{(b-1)!^{j}}{(s-j)!},$$
with j ranging from 2 to s.
Finally, substituting $n^{s-1}/s^{j}$ for $(b-1)!^{j}$ and summing on 2 ≤ j ≤ s gives the finite sum in Theorem 2:
s
X
j=2