HAL Id: hal-01194700
https://hal.archives-ouvertes.fr/hal-01194700
Preprint submitted on 7 Sep 2015
What is the minimal cardinal of a family which shatters all d-subsets of a finite set?
N Chevallier, A Fruchard
To cite this version:
N Chevallier, A Fruchard. What is the minimal cardinal of a family which shatters all d-subsets of a finite set? 2015. ⟨hal-01194700⟩
N. Chevallier and A. Fruchard September 7, 2015
In this note, d ≤ n are positive integers. Let S be a finite set of cardinal |S| = n and let $2^S$ denote its power set, i.e. the set of its subsets. A d-subset of S is a subset of S of cardinal d.
Let $F \subseteq 2^S$ and $A \subseteq S$. The trace of F on A is the family $F_A = \{E \cap A \,;\, E \in F\}$. One says that F shatters A if $F_A = 2^A$. The VC-dimension of F is the maximal cardinal of a subset of S that is shattered by F [7]. The following is well-known [7, 4, 5]:
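The three definitions above translate directly into a few lines of code. The following is a minimal sketch, not part of the original note; the function names `trace`, `shatters` and `vc_dimension` are illustrative choices, and subsets are represented as Python sets.

```python
from itertools import combinations

def trace(family, A):
    """Trace of the family on A: the set {E ∩ A ; E in family}."""
    A = frozenset(A)
    return {frozenset(E) & A for E in family}

def shatters(family, A):
    """True when the trace of the family on A is the whole power set of A."""
    # The trace is always a subset of 2^A, so comparing cardinals is enough.
    return len(trace(family, A)) == 2 ** len(set(A))

def vc_dimension(family, S):
    """Largest size of a subset of S shattered by the family (-1 for the empty family)."""
    best = -1
    for d in range(len(S) + 1):
        if any(shatters(family, A) for A in combinations(S, d)):
            best = d
    return best

S = {1, 2, 3}
F = [set(), {1}, {2}, {1, 2, 3}]
print(shatters(F, {1, 2}))   # True: the traces are {}, {1}, {2}, {1, 2}
print(vc_dimension(F, S))    # 2: a 3-subset would need 8 distinct traces
```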
Theorem 1. (Vapnik-Chervonenkis, Sauer, Shelah)
If VC-dim(F) ≤ d (i.e. if F shatters no (d+1)-subset of S) then $|F| \le c(d, n)$, where $c(d, n) = \binom{n}{0} + \cdots + \binom{n}{d}$. Moreover this bound is tight: it is achieved e.g. for $F = \binom{S}{\le d}$, the family of all k-subsets of S, 0 ≤ k ≤ d.
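As a sanity check of the tightness claim, one can build the family of all subsets of cardinal at most d for small parameters and confirm that it has exactly c(d, n) members and shatters no (d+1)-subset. This is only an illustrative sketch; the helper names `c`, `subsets_up_to` and `shatters` and the values n = 5, d = 2 are assumptions made for the example.

```python
from itertools import combinations
from math import comb

def c(d, n):
    """The bound of Theorem 1: C(n, 0) + ... + C(n, d)."""
    return sum(comb(n, k) for k in range(d + 1))

def subsets_up_to(S, d):
    """All k-subsets of S for 0 <= k <= d."""
    return [frozenset(E) for k in range(d + 1) for E in combinations(S, k)]

def shatters(family, A):
    """True when the traces of the family on A exhaust the power set of A."""
    A = frozenset(A)
    return len({E & A for E in family}) == 2 ** len(A)

n, d = 5, 2
S = list(range(n))
F = subsets_up_to(S, d)
shatters_some_larger = any(shatters(F, A) for A in combinations(S, d + 1))
print(len(F), c(d, n), shatters_some_larger)  # 16 16 False
```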
A first natural question is:
Question 1. Assume a family $F \subseteq 2^S$ is maximal for inclusion among all families of VC-dimension at most d. Does F always have the maximal possible cardinal c(d, n)?
Let us define the index of F as follows:
Ind F = max{d ∈ {0, ..., n} ; F shatters all d-subsets of S}.
Let C(d, n) = min{|F| ; Ind F = d}. For instance, we have C(1, n) = 2, attained e.g. by F = {∅, S} (more generally, by any pair F = {E, S \ E}). Of course we have $2^d \le C(d, n) \le 2^n$. The question is:
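For very small n the quantity C(d, n) can be found by exhaustive search, which makes the definitions of Ind F and C(d, n) concrete. The sketch below is a naive enumeration over all families of subsets, written for illustration only (the names `index` and `brute_force_C` are not from the note); it becomes infeasible almost immediately as n grows.

```python
from itertools import combinations

def shatters(family, A):
    """True when the traces of the family on A exhaust the power set of A."""
    A = frozenset(A)
    return len({E & A for E in family}) == 2 ** len(A)

def index(family, S):
    """Ind F: the largest d such that every d-subset of S is shattered (None if there is none)."""
    best = None
    for d in range(len(S) + 1):
        # Shattering all (d+1)-subsets implies shattering all d-subsets,
        # so the search can stop at the first failure.
        if not all(shatters(family, A) for A in combinations(S, d)):
            break
        best = d
    return best

def brute_force_C(d, n):
    """Smallest cardinal of a family of subsets of {0, ..., n-1} with index exactly d."""
    S = list(range(n))
    power_set = [frozenset(E) for k in range(n + 1) for E in combinations(S, k)]
    for size in range(1, 2 ** n + 1):
        for family in combinations(power_set, size):
            if index(family, S) == d:
                return size
    return None

print(brute_force_C(1, 3), brute_force_C(2, 3), brute_force_C(2, 4))  # 2 4 5
```

The search for C(2, 4) already inspects several thousand families; the number of candidates grows doubly exponentially in n.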
Question 2. Give the exact value of C(d, n) for 2 ≤ d ≤ n. If this is not possible, give lower and upper bounds as accurate as possible.
A well-known duality yields another formulation of Question 2. Let $\varphi : S \to 2^F$, $a \mapsto \{E \in F \,;\, a \in E\}$ and set $\mathcal{S} = \varphi(S)$. In this manner, we have for all a ∈ S and all E ∈ F:
$a \in E \Leftrightarrow E \in \varphi(a)$. (1)
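One can check that F shatters A ⊆ S if and only if, for every partition (B, C) of A (i.e. A = B ∪ C and B ∩ C = ∅), the intersection $\bigl(\bigcap_{b \in B} \varphi(b)\bigr) \cap \bigl(\bigcap_{c \in C} \overline{\varphi(c)}\bigr)$ is nonempty, where the notation $\overline{Y}$ stands for $F \setminus Y$.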
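The equivalence between the direct and the dual formulation of shattering can be tested numerically on a small example. In this sketch the members of F are identified with their positions in a list, which is a convenience assumed for the code, not a convention of the note; the function names are likewise illustrative.

```python
from itertools import combinations

def phi(family, a):
    """Dual image of a: indices of the members of the family that contain a."""
    return frozenset(i for i, E in enumerate(family) if a in E)

def shatters_dual(family, A):
    """Dual test: for every partition (B, C) of A, the intersection of the phi(b), b in B,
    with the complements of the phi(c), c in C, must be nonempty."""
    A = list(A)
    everything = frozenset(range(len(family)))
    for k in range(len(A) + 1):
        for B in combinations(A, k):
            cell = everything
            for a in A:
                cell &= phi(family, a) if a in B else everything - phi(family, a)
            if not cell:
                return False
    return True

def shatters_direct(family, A):
    """Direct test: the traces of the family on A exhaust the power set of A."""
    A = frozenset(A)
    return len({frozenset(E) & A for E in family}) == 2 ** len(A)

S = [1, 2, 3]
F = [set(), {1}, {2}, {1, 2, 3}]
subsets = [A for k in range(len(S) + 1) for A in combinations(S, k)]
print(all(shatters_dual(F, A) == shatters_direct(F, A) for A in subsets))  # True
```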
If Ind F ≥ 2, then ϕ is a one-to-one correspondence from S to $\mathcal{S} = \varphi(S) \subseteq 2^F$, so that $n \le 2^{|F|}$; hence we have log n ≤ C(d, n) for all 2 ≤ d ≤ n, where log denotes the logarithm in base 2.
The case d = 2. Using for instance the binary expansion, it is easy to show that the order of magnitude of C(2, n) is actually log n. The next statement refines this.
Proposition 2. If $n = \frac{1}{2}\binom{2l}{l} = \binom{2l-1}{l-1}$, then C(2, n) = 2l.
Proof. (Recall the notation $\overline{A} = F \setminus A$.) We first prove by contradiction that C(2, n) > 2l − 1.
Actually, if a family F of subsets of S shatters all 2-subsets of S, then the image $\mathcal{S} \subseteq 2^F$ of S by ϕ must satisfy
$\forall A \ne B \in \mathcal{S}$, $A \cap B$, $A \cap \overline{B}$, $\overline{A} \cap B$, and $\overline{A} \cap \overline{B}$ are nonempty. (2)
In particular $\mathcal{S}$ is a Sperner family of F (i.e. an antichain for the partial order of inclusion; one finds several other expressions in the literature: ‘Sperner system’, ‘independent system’, ‘clutter’, ‘completely separating system’, etc.). For a survey on Sperner families and several generalizations, we refer e.g. to [1] and the references therein.
Assume now that |F| = 2l − 1; it is known [6, 2, 3] that all Sperner families of F have a cardinal at most $\binom{2l-1}{l-1}$, and that there are only two Sperner families of maximal cardinal: the families $\binom{F}{l-1}$ and $\binom{F}{l}$, i.e. of (l−1)-subsets, resp. l-subsets of F. However, neither of these families satisfies both $A \cap B$ and $\overline{A} \cap \overline{B}$ nonempty in (2): the first one contains pairs of disjoint sets, and the second one contains pairs of sets whose union is F. As a consequence, we must have |F| ≥ 2l.
Conversely, let $S = \{a_1, \dots, a_n\}$, consider $\binom{\{1,\dots,2l\}}{l}$, the set of l-subsets of {1, . . . , 2l}, and choose one element in each pair of complementary l-subsets. We then obtain a family $\{A_1, \dots, A_n\}$ which satisfies (2). Now we set $F = \{E_1, \dots, E_{2l}\}$, with $E_i = \{a_j \,;\, i \in A_j\}$. The characterization (1) shows that F shatters every 2-subset of S.
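The construction in the second half of the proof is easy to replay for small l. In the sketch below the representative of each complementary pair is taken to be the l-subset containing 1; this is one admissible choice among many, assumed here only to make the code deterministic, and the elements a_j are represented by the integers 0, ..., n−1.

```python
from itertools import combinations
from math import comb

def proposition2_family(l):
    """Return (n, [E_1, ..., E_2l]) with n = C(2l-1, l-1), following the proof of Proposition 2."""
    ground = range(1, 2 * l + 1)
    # One l-subset from each complementary pair: keep the one that contains 1.
    A = [frozenset(X) for X in combinations(ground, l) if 1 in X]
    # E_i collects the (indices of the) elements a_j such that i belongs to A_j.
    family = [frozenset(j for j, Aj in enumerate(A) if i in Aj) for i in ground]
    return len(A), family

def shatters(family, A):
    """True when the traces of the family on A exhaust the power set of A."""
    A = frozenset(A)
    return len({E & A for E in family}) == 2 ** len(A)

n, family = proposition2_family(3)
all_pairs_shattered = all(shatters(family, pair) for pair in combinations(range(n), 2))
print(n, len(family), all_pairs_shattered)  # 10 6 True
```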
The proof of the following statement is straightforward.
Corollary 3. If $\binom{2l-1}{l-1} < n \le \binom{2l+1}{l}$, then 2l ≤ C(2, n) ≤ 2l + 2.
The upper bound can be slightly improved: one can prove that, if $\binom{2l-1}{l-1} < n \le \binom{2l}{l-1}$, then 2l ≤ C(2, n) ≤ 2l + 1.
Question 3. It seems that we have C(2, n) = k if and only if $\binom{k-2}{\lfloor (k-1)/2 \rfloor - 1} < n \le \binom{k-1}{\lfloor k/2 \rfloor - 1}$, where ⌊x⌋ denotes the integer part of x. Is it true? Is it already known?
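The conjectured characterization can be turned into a small function. Since the lower bound for k coincides with the upper bound for k − 1, the intervals tile the integers, and the conjectured value is simply the least k with $n \le \binom{k-1}{\lfloor k/2 \rfloor - 1}$. The sketch below (the name `conjectured_C2` is illustrative) evaluates this for n = 2, ..., 11; it computes the conjectured values only and proves nothing.

```python
from math import comb

def conjectured_C2(n):
    """Least k with n <= C(k - 1, k // 2 - 1): the value of C(2, n) suggested by Question 3."""
    k = 2
    while comb(k - 1, k // 2 - 1) < n:
        k += 1
    return k

print([conjectured_C2(n) for n in range(2, 12)])
# [4, 4, 5, 6, 6, 6, 6, 6, 6, 7]
```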
The first values are C(2,2) = C(2,3) = 4, C(2,4) = 5, C(2,5) = · · · = C(2,10) = 6. A computer search seems to be of little use, at least with a naive treatment. Already in order to obtain C(2,11) = 7, we would have to verify that C(2,11) > 6, i.e. to find, for each of the $\binom{2^{11}}{6} \approx 10^{17}$ families F in $2^S$, some 2-subset that is not shattered by the family. (Alternatively, in the dual statement, we have to check “only” $\binom{2^{6}}{11} \approx 7 \cdot 10^{11}$ families $\mathcal{S}$ in $2^F$.)
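The two counts quoted above are plain binomial coefficients and can be reproduced exactly; the variable names below are illustrative.

```python
from math import comb

n, k = 11, 6
primal = comb(2 ** n, k)   # families of 6 subsets of an 11-element set
dual = comb(2 ** k, n)     # families of 11 subsets of a 6-element set
print(f"{primal:.2e}")     # 1.02e+17
print(dual)                # 743595781824
```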
The case d ≥ 3. From now on, we assume n ≥ 4.
Proposition 4. For all 3 ≤ d < n, we have $C(d, n) \le \frac{2^d}{d!} (3 \log n)^d$.
The constant 3 can be improved. The proof below shows that, for all a > 1 and all n large enough, $C(d, n) \le \frac{2^d}{d!} (a \log n)^d$.
Proof. Let $F_0 \subset 2^S$ be a minimal separating system of S, i.e. such that, for all distinct a, b ∈ S, there exists $E_{ab} \in F_0$ which satisfies $b \notin E_{ab} \ni a$. Since this amounts to choosing $F_0$ minimal such that $\mathcal{S} = \varphi(S)$ is a Sperner family for $F_0$, we know that $|F_0| = N$ if and only if $\binom{N-1}{\lfloor (N-1)/2 \rfloor} < n \le \binom{N}{\lfloor N/2 \rfloor}$, hence $N := |F_0| \le 2 + \log n + \frac{1}{2} \log\log n \le 3 \log n$ since n ≥ 4. We assume N ≥ 2 in the sequel. Given two disjoint subsets B and C of S such that |B ∪ C| = d, the set $E_{BC} = \bigcap_{c \in C} \bigcup_{b \in B} E_{bc}$ contains B and does not meet C. Let F be the collection of all such sets $E_{BC}$; then F shatters all subsets of S of cardinal at most d.
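The construction of the sets $E_{BC}$ can be replayed on a small example. The sketch below makes several assumptions that are not in the note: S is {0, ..., n−1}, the separating system is obtained by assigning to each element a distinct ⌊N/2⌋-subset of {0, ..., N−1}, and $E_{ab}$ is taken to be the first member of $F_0$ that separates a from b. It only checks the shattering property and makes no claim about the cardinal of the resulting family.

```python
from itertools import combinations
from math import comb

def separating_system(n):
    """N sets on S = {0, ..., n-1}, with N least such that C(N, N // 2) >= n.
    Element a receives a distinct (N // 2)-subset code[a]; member i is {a ; i in code[a]}."""
    N = 1
    while comb(N, N // 2) < n:
        N += 1
    codes = list(combinations(range(N), N // 2))[:n]
    return [frozenset(a for a in range(n) if i in codes[a]) for i in range(N)]

def separator(F0, a, b):
    """A member of F0 containing a but not b (one possible choice of E_ab)."""
    return next(E for E in F0 if a in E and b not in E)

def E_BC(F0, S, B, C):
    """Intersection over c in C of the union over b in B of E_bc."""
    result = frozenset(S)
    for c in C:
        union = frozenset()
        for b in B:
            union |= separator(F0, b, c)
        result &= union
    return result

def build_family(n, d):
    """All sets E_BC for disjoint B, C with |B ∪ C| = d."""
    S = range(n)
    F0 = separating_system(n)
    family = set()
    for A in combinations(S, d):
        for k in range(d + 1):
            for B in combinations(A, k):
                C = [a for a in A if a not in B]
                family.add(E_BC(F0, S, B, C))
    return family

def shatters(family, A):
    """True when the traces of the family on A exhaust the power set of A."""
    A = frozenset(A)
    return len({E & A for E in family}) == 2 ** len(A)

family = build_family(6, 3)
print(all(shatters(family, A) for A in combinations(range(6), 3)))  # True
```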
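To estimate |F|, we consider $F_k$ the collection of all such sets $E_{BC}$ with |B| = k (and thus |C| = d − k). We have $|F_k| = \binom{N}{k}\binom{N-k}{d-k}$ (with $N = |F_0|$). Then we choose $F = \bigcup_{k=0}^{d} F_k$. We obtain $|F| \le \sum_{k=0}^{d} \binom{N}{k}\binom{N-k}{d-k} = \binom{N}{d} 2^d \le \frac{2^d}{d!} N^d \le \frac{2^d}{d!} (3 \log n)^d$.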
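The closed form of the sum follows from the identity $\binom{N}{k}\binom{N-k}{d-k} = \binom{N}{d}\binom{d}{k}$, summed over k. A short numerical check of the sum and of the subsequent inequality, for a range of small N and d chosen arbitrarily for illustration:

```python
from math import comb, factorial

def size_bound(N, d):
    """The sum over k of C(N, k) * C(N - k, d - k), as in the estimate of |F|."""
    return sum(comb(N, k) * comb(N - k, d - k) for k in range(d + 1))

for N in range(2, 12):
    for d in range(N + 1):
        total = size_bound(N, d)
        assert total == 2 ** d * comb(N, d)             # closed form of the sum
        assert total * factorial(d) <= 2 ** d * N ** d  # C(N, d) <= N^d / d!

print(size_bound(6, 3))  # 160
```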
Question 4. Is $(\log n)^{\lfloor d/2 \rfloor \lfloor (d+1)/2 \rfloor}$ the right order of magnitude for C(d, n)?
By constructing auxiliary Sperner families from $\mathcal{S} = \varphi(S)$, it is possible to give a better lower bound for C(d, n) than only C(d, n) ≥ C(2, n). For instance, in the case d = 3, for all distinct $A, B, C \in \mathcal{S}$, we must have $A \cap B \not\subseteq C$. One can check that this implies that the family $\{A \cap B \,;\, A, B \in \mathcal{S},\ A \ne B\}$ is a Sperner family, therefore we obtain $\binom{n}{2} \le \binom{C(3,n)}{\lfloor C(3,n)/2 \rfloor}$. Unfortunately, this does not modify the order of magnitude. Already in this case d = 3, we do not know whether C(3, n) is of order log n, $(\log n)^2$, or an intermediate order of magnitude. Another formulation is:
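To see what the inequality gives in practice, one can compute the least integer m with $\binom{n}{2} \le \binom{m}{\lfloor m/2 \rfloor}$, which is then a lower bound for C(3, n), and compare it with the analogous bound obtained from n alone. The function names below are illustrative; the comparison is consistent with the remark that the order of magnitude stays logarithmic.

```python
from math import comb

def least_m(target):
    """Least m such that C(m, m // 2) >= target."""
    m = 1
    while comb(m, m // 2) < target:
        m += 1
    return m

def lower_bound_d3(n):
    """Lower bound on C(3, n) from the Sperner family of pairwise intersections."""
    return least_m(comb(n, 2))

def lower_bound_sperner(n):
    """Lower bound obtained from the Sperner property of the n dual sets alone."""
    return least_m(n)

for n in (10, 100, 1000):
    print(n, lower_bound_sperner(n), lower_bound_d3(n))
# 10 5 8
# 100 9 15
# 1000 13 22
```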
Question 5. Prove or disprove: There exists C > 0 such that, for all k ∈ ℕ, if F is a finite set of cardinal k and $\mathcal{S} \subseteq 2^F$ satisfies $A \cap B \not\subseteq C$ for all distinct $A, B, C \in \mathcal{S}$, then $|\mathcal{S}| \le C\, 2^{C\sqrt{k}}$.
References
[1] P. Borg, Intersecting families of sets and permutations: a survey. Int. J. Math. Game Theory Algebra 21 (2012) 543–559.
[2] G. Katona, On a conjecture of Erdős and a stronger form of Sperner’s theorem, Studia Sci. Math. Hungar. 1 (1966) 59–63.
[3] D. Lubell, A short proof of Sperner’s theorem, J. Combin. Theory 1 (1966) 299.
[4] N. Sauer, On the density of families of sets, J. Combin. Theory 25 (1972) 80–83.
[5] S. Shelah, A combinatorial problem, stability and order for models and theories in infinite languages, Pacific J. Math. 41 (1972) 247–261.
[6] E. Sperner, Ein Satz über Untermengen einer endlichen Menge, Math. Zeitschrift 27 (1928) 544–548.
[7] V. N. Vapnik and A. Y. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, Theory Probab. Appl. 16 (1971) 264–280.