Formal Languages-Course 1.
Géraud Sénizergues
Bordeaux university
04/05/2020
Master computer-science MINF19, IEI, 2019/20
Contents
1 Introduction
What are formal languages?
Brief history
What shall we study?
Where is it useful?
2 Formal languages
Words
Languages
Introduction
What are formal languages?
A language is, informally, a way to communicate by means of sequences of symbols (whether written symbols or spoken elementary sounds).
Examples:
Natural languages: written English, spoken English, written international scientific English, written Vietnamese, written French, etc.
Artificial languages: formal mathematical language (as defined by some logical system, the “sequent calculus” for example), Lisp (a programming language), lambda-calculus (a “theoretical” programming language), etc.
Each language has two aspects:
the syntax: which sequences of symbols (words, sentences, discourses, ...) are correct;
the semantics: what does a sentence mean? One hopes that the meaning of a sentence can be deduced from the structure of the sentence and the meaning of each of its elementary components (the compositionality principle).
Both aspects (syntax and semantics) can be described by mathematical models.
A model for the syntax is a formal language, i.e. a set of words.
A model for the semantics is a formal semantics, i.e. a mathematical definition of a map
{correct sentences} → {mathematical functions}
that associates with every sentence its meaning.
Brief history
1850-1920-now: mathematicians defined formal languages expressing mathematics [Frege, Peano, Pieri, Hilbert, Russell-Whitehead]. These languages were used to model mathematical proofs. They were not rigorously used in practice, but they were necessary to study the power of mathematics: what is provable and what is not provable.
1920-1960-now: linguists modeled natural languages by several kinds of formal grammars: AB-grammars, context-free grammars, Lambek grammars, etc.
1950-now: computer scientists defined programming languages as formal languages, using rational expressions, grammars, automata.
1960-now: logicians [Büchi, Rabin, Shelah, Muchnik-Semenov, etc.] used automata in order to show decidability (and other properties) of some logical theories.
1970-now: some formal languages expressing mathematics were implemented in practice on computers; thus formal proofs could be produced for many mathematical theorems: Automath [1970], HOL [produced the formal proof of the Kepler conjecture], Mizar, Coq [ACM software prize, 2013], PVS, etc.
What shall we study?
To describe a formal language L we can use:
- “algebraic operations”: operations on words and operations on languages; w ∈ L if w can be obtained by a specific sequence of operations;
- “rewriting”: w ∈ L if w can be obtained from a starting word w0 by a sequence of rewritings, i.e. of local modifications;
- “abstract machines” (automata): w ∈ L if the automaton A, on input w, computes and eventually answers “yes”.
Algorithmic problems:
INPUT: a description of a language (a rational expression, a grammar),
OUTPUT: an automaton that recognizes the language.
The “automaton” is a mathematical object that can easily be turned into a program that solves the problem on a computer.
INPUT: a word and a language,
QUESTION: does the word belong to the language? Example: given an alleged C program P, is P really a C program?
INPUT: two languages L, M,
QUESTION: L = M?
Example: given a specification of a language and a program that analyzes words, does the program implement the specification? That is, does it recognize exactly the words that were defined as correct?
INPUT: a description D of some kind,
OUTPUT: a description E of another kind, defining the same language.
Example: translate an automaton into a rational expression.
Where is it useful?
Compiling: a compiler is a device that translates a given source language S into a given target language T, while preserving the semantics. The first step of a compiler consists of the lexical analysis of the input text:
P → P1
The second step consists of the syntactic analysis of the text P1:
P1 → P2
where P2 is the abstract syntax tree of P1. These two steps rely strongly on formal language theory.
Computer architecture: building a finite automaton is a step on the way to building a sequential boolean circuit that computes a given function.
Computational biology: many algorithms on long words (that model proteins or DNA sequences) are based on automata or on rewriting systems.
Automatic language processing: in order to process natural language text and extract its semantics, or to produce a natural language sentence expressing some meaning, the natural language must be modeled by some grammar and the semantics must be modeled as some function of the structure of the sentences (expressed by some tree).
Formal languages
Formal languages: Words
Free Monoid
Let Σ be a “set of symbols” (mathematically, Σ is just some set), called the alphabet.
We call word over the alphabet Σ any finite sequence of symbols from Σ.
Examples:
Σ = {a, b, c},
w1 = aabcabbc, w2 = bcabbcaaaa, w3 = cccaa, w4 = c.
Length: the length |u| of a word u is the number of positions of u:
|w1| = 8, |w2| = 10, |w3| = 5, |w4| = 1.
One can denote the letter in position i of a word u by u[i], with positions numbered from 0. For example:
w3[0] = c, w3[1] = c, w3[2] = c, w3[3] = a, w3[4] = a.
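A minimal sketch of these notions, assuming (for illustration only) that words are modelled by Python strings; the names SIGMA and is_word are not part of the course.

```python
# Words over the alphabet {a, b, c}, modelled as Python strings (illustrative choice).
SIGMA = {"a", "b", "c"}

w1, w2, w3, w4 = "aabcabbc", "bcabbcaaaa", "cccaa", "c"

def is_word(w: str, alphabet: set) -> bool:
    """Return True if every symbol of w belongs to the alphabet."""
    return all(letter in alphabet for letter in w)

# |w1|, |w2|, |w3|, |w4|: the length is the number of positions.
lengths = [len(w) for w in (w1, w2, w3, w4)]

# w3[0], ..., w3[4]: positions are numbered from 0, as in the slides.
letters_of_w3 = [w3[i] for i in range(len(w3))]
```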
Length relative to a letter:
For a letter x ∈ Σ and a word u ∈ Σ∗ we denote by |u|_x the number of positions of u that are labelled by the letter x. Formally:
|u|_x = Card{i ∈ [0, |u|−1] | u[i] = x}.
Example:
u = abaccbaaa
|u|_a = 5, |u|_b = 2, |u|_c = 2
|u| = 9 = |u|_a + |u|_b + |u|_c.
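The formal definition can be transcribed almost literally. This is a sketch assuming words are Python strings; count_letter is a hypothetical helper name.

```python
def count_letter(u: str, x: str) -> int:
    """|u|_x: number of positions i in [0, |u|-1] such that u[i] == x."""
    return sum(1 for i in range(len(u)) if u[i] == x)

u = "abaccbaaa"
# One entry per letter of the alphabet {a, b, c}.
counts = {x: count_letter(u, x) for x in "abc"}
```

Summing the three counts gives back the length of u, since every position carries exactly one letter.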
Product:
For words u, v, the “concatenation product” u·v is obtained by putting u before v and gluing the right end of u to the left end of v. Examples:
w1·w2 = aabcabbc·bcabbcaaaa = aabcabbcbcabbcaaaa
w3·w4 = cccaa·c = cccaac.
Empty word:
We denote by ε the empty word: it has length 0. The product by ε does not modify any word, i.e. for every word u:
u·ε = ε·u = u
Product and length: for all words u, v:
|u·v| = |u| + |v|.
For all words u, v and every letter x:
|u·v|_x = |u|_x + |v|_x.
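A small sketch of the product and of the two length identities, assuming words are Python strings and ε is the empty string; concat, count_letter and EPSILON are illustrative names.

```python
EPSILON = ""  # the empty word, modelled as the empty string

def concat(u: str, v: str) -> str:
    """Concatenation product u·v."""
    return u + v

def count_letter(u: str, x: str) -> int:
    """|u|_x: number of positions of u labelled by the letter x."""
    return sum(1 for letter in u if letter == x)

w1, w2, w3, w4 = "aabcabbc", "bcabbcaaaa", "cccaa", "c"
p12 = concat(w1, w2)   # w1·w2
p34 = concat(w3, w4)   # w3·w4

# |u·v| = |u| + |v| and |u·v|_x = |u|_x + |v|_x, checked on w1, w2.
length_is_additive = len(p12) == len(w1) + len(w2)
counts_are_additive = all(
    count_letter(p12, x) == count_letter(w1, x) + count_letter(w2, x) for x in "abc"
)
```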
Σ∗: we denote by Σ∗ the set of all words over the alphabet Σ.
The law · has the following properties: for all u, v, w ∈ Σ∗,
u·(v·w) = (u·v)·w (1)
u·ε = ε·u = u (2)
Properties (1) and (2) are summarized by:
⟨Σ∗, ·, ε⟩ is a monoid.
It is called the free monoid over Σ.
Other (useful) properties: for all u, v, w ∈ Σ∗,
u·v = u·w ⇒ v = w and u·w = v·w ⇒ u = v (3)
The free monoid is cancellative.
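Properties (1) and (3) can be checked by brute force on a finite sample of words. This is only a sanity check under the assumption that words are Python strings, not a proof; words_up_to is a hypothetical helper.

```python
from itertools import product

def words_up_to(alphabet: str, max_len: int) -> list:
    """All words over `alphabet` of length at most max_len, shortest first."""
    return ["".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n)]

WORDS = words_up_to("ab", 2)  # 1 + 2 + 4 = 7 words, including the empty word

# (1) associativity of the product.
associative = all(
    u + (v + w) == (u + v) + w for u in WORDS for v in WORDS for w in WORDS
)
# (3) left and right cancellation.
left_cancellative = all(
    v == w for u in WORDS for v in WORDS for w in WORDS if u + v == u + w
)
right_cancellative = all(
    u == v for u in WORDS for v in WORDS for w in WORDS if u + w == v + w
)
```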
Free Monoid: left-cancellation
[Figure: left-cancellation, uv = uw ⇕ v = w]
Free Monoid: right-cancellation
[Figure: right-cancellation, uw = vw ⇕ u = v]
Free Monoid: reversal
The reversal of a word u of length ℓ is defined by:
u^R := u[ℓ−1] u[ℓ−2] ··· u[0].
Example:
u = abcbbc, u^R = cbbcba.
Proposition
∀u, v ∈ Σ∗, (u·v)^R = v^R·u^R, ε^R = ε.
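A sketch of the reversal that follows the definition position by position, assuming words are Python strings; reverse is an illustrative name (the idiom u[::-1] would give the same result).

```python
def reverse(u: str) -> str:
    """u^R = u[l-1] u[l-2] ... u[0], where l = |u|."""
    return "".join(u[i] for i in range(len(u) - 1, -1, -1))

u, v = "abcbbc", "cab"
u_reversed = reverse(u)

# The proposition on one instance: (u·v)^R = v^R·u^R.
proposition_holds = reverse(u + v) == reverse(v) + reverse(u)
```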
Free Monoid: factors
A word v is called a factor of a word w iff there exist words α, β such that
α·v·β = w
A word v is called a prefix of a word w iff there exists a word β such that
v·β = w
A word v is called a suffix of a word w iff there exists a word α such that
α·v = w
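The three definitions translate into simple tests on slices. This is a sketch assuming words are Python strings; the function names are illustrative.

```python
def is_prefix(v: str, w: str) -> bool:
    """v is a prefix of w iff w = v·β for some word β."""
    return w[:len(v)] == v

def is_suffix(v: str, w: str) -> bool:
    """v is a suffix of w iff w = α·v for some word α."""
    return len(v) <= len(w) and w[len(w) - len(v):] == v

def is_factor(v: str, w: str) -> bool:
    """v is a factor of w iff w = α·v·β; try every possible length of α."""
    return any(w[i:i + len(v)] == v for i in range(len(w) - len(v) + 1))

# Sample data chosen for this sketch.
sample = "bbabb"
checks = (is_factor("abb", sample), is_prefix("abb", sample), is_suffix("abb", sample))
```

Every prefix and every suffix is in particular a factor (take α = ε or β = ε).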
[Figure: three diagrams of a word w: v is a factor of w (w = α·v·β), v is a prefix of w (w = v·β), v is a suffix of w (w = α·v).]
Free Monoid: reversal-factors
Example: Σ = {a, b, c}.
u = abb, v1 = aaabaab, v2 = ababab, v3 = bbabb.
u is not a factor of v1.
u is not a factor of v2.
u is a factor of v3.
u is a suffix of v3.
u is not a prefix of v3.
Example (continued): Σ = {a, b, c}.
u = abb, v4 = ababbabbbabbbb.
u is a factor of v4 (each occurrence is marked with brackets):
v4 = ab[abb]abbbabbbb = ababb[abb]babbbb = ababbabbb[abb]bb
We say: u has 3 occurrences as a factor of v4: one in position 2, one in position 5, one in position 9.
u is not a prefix of v4. u is not a suffix of v4.
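The positions of the occurrences can be listed by trying every start position. This is a sketch assuming words are Python strings; occurrences is an illustrative name.

```python
def occurrences(u: str, v: str) -> list:
    """Positions i such that u occurs as a factor of v starting at position i."""
    return [i for i in range(len(v) - len(u) + 1) if v[i:i + len(u)] == u]

u, v4 = "abb", "ababbabbbabbbb"
positions = occurrences(u, v4)

# u is a prefix iff 0 is an occurrence; a suffix iff |v4| - |u| is an occurrence.
u_is_prefix = 0 in positions
u_is_suffix = (len(v4) - len(u)) in positions
```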
Example (continued): Σ = {a, b, c}.
u = abb, v5 = abbcabb.
u is both a prefix and a suffix of v5:
v5 = [abb]cabb = abbc[abb].
Exercise
Let Σ = {a, b, c}.
u = abbaa, v1 = babbaab, v2 = abbabbaabb, v3 = aaabbabbabba, v4 = bbaa, v5 = abababbaa.
1- For which i ∈ [1, 5] is it true that u is a factor of vi?
2- For which i ∈ [1, 5] is it true that u is a prefix of vi?
3- For which i ∈ [1, 5] is it true that u is a suffix of vi?
Exercise
Let us suppose that u, v are words such that u is a factor of v. Does it imply, in general, that:
1- u is a factor of v^R?
2- u^R is a factor of v^R?
3- u^R is a factor of v?
Hint: use Proposition 2.2.
Exercise
Let us suppose that u, v are words such that u is a prefix of v. Does it imply, in general, that:
1- u is a prefix of v^R?
2- u^R is a prefix of v^R?
3- u^R is a suffix of v^R?
Exercise
Let Σ be an alphabet. Show that: for all words u1, u2, v ∈ Σ∗, if u1, u2 are both prefixes of v, then (u1 is a prefix of u2) or (u2 is a prefix of u1).
Exercise
Let Σ be an alphabet. A word v ∈ Σ∗ is called square-free if it does not have any factor of the form w·w with w ≠ ε.
For example, v0 = abcabaca is square-free, while abcbcba is not square-free since cb·cb is a factor of abcbcba.
Show that, on the alphabet Σ = {a, b}, there are only finitely many square-free words. Compute the set of all square-free words on {a, b}.
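The definition of square-freeness can be tested mechanically. The sketch below assumes words are Python strings and only checks the two example words; it does not answer the exercise. The name is_square_free is illustrative.

```python
def is_square_free(v: str) -> bool:
    """True iff v has no factor of the form w·w with w non-empty."""
    n = len(v)
    for i in range(n):                        # start position of a candidate square
        for k in range(1, (n - i) // 2 + 1):  # k = |w| >= 1, with w·w fitting in v
            if v[i:i + k] == v[i + k:i + 2 * k]:
                return False
    return True

first_example = is_square_free("abcabaca")
second_example = is_square_free("abcbcba")
```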
Exercise
(difficult, but solved in combinatorics textbooks)
A word v is called cube-free if it does not have any factor of the form w·w·w with w ≠ ε.
Show that, on the alphabet Σ = {a, b}, there are infinitely many cube-free words.
Exercise
Let Σ be an alphabet. Show that: for all words u, v, u′, v′ ∈ Σ∗, u·v = u′·v′ if and only if there exists some word w ∈ Σ∗ such that
(u = u′·w and v′ = w·v) or (u′ = u·w and v = w·v′).
Hint: distinguish the case where |u| ≤ |u′| and the case where |u′| ≤ |u|; a picture will be helpful.
This property is called the equidivisibility of the free monoid.
Free Monoid: powers
The power u^n of the word u is defined by induction on the natural number n:
u^0 = ε, u^{n+1} = u^n·u
For example:
u^1 = u,
u^2 = u·u (it is called the square of u),
u^3 = u·u·u (it is called the cube of u).
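The inductive definition can be written as a recursive function. This is a sketch assuming words are Python strings; power is an illustrative name (Python's u * n gives the same word).

```python
def power(u: str, n: int) -> str:
    """u^n by induction on n: u^0 = ε and u^(n+1) = u^n · u."""
    if n == 0:
        return ""               # the empty word
    return power(u, n - 1) + u  # inductive step

u = "ab"
square_of_u = power(u, 2)
cube_of_u = power(u, 3)
```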
Formal languages: Languages
Definition-Product
Definition
We call language over the alphabet Σ any subset L of Σ∗.
Examples:
Let Σ = {a, b, c}. Here are some languages over Σ:
L1 = {a, ab, ac, bcc},
L2 = {ac, ca},
L3 = {u ∈ Σ∗ | |u| is even},
L4 = {u ∈ Σ∗ | ∃α, β ∈ Σ∗, u = α·bba·β}.
Product: for languages L, M over Σ we define
L·M = {u·v | u ∈ L, v ∈ M}
Example:
L1·L2 = {aac, abac, acac, bccac, aca, abca, acca, bccca}.
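For finite languages, the product is a direct set comprehension. This is a sketch assuming finite languages are Python sets of strings; lang_product is an illustrative name.

```python
def lang_product(L: set, M: set) -> set:
    """L·M = {u·v | u in L, v in M}."""
    return {u + v for u in L for v in M}

L1 = {"a", "ab", "ac", "bcc"}
L2 = {"ac", "ca"}
L1_L2 = lang_product(L1, L2)
```

Infinite languages such as L3 and L4 cannot be stored as sets; they would instead be represented by a membership test.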
Reversal
The reversal of a language L is defined by:
L^R = {u^R | u ∈ L}.
Example:
L1 = {a, ab, ac, bcc}, L2 = {ac, ca},
L1^R = {a, ba, ca, ccb}, L2^R = {ca, ac}.
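The reversal of a finite language reverses each of its words. This is a sketch assuming finite languages are Python sets of strings; lang_reverse is an illustrative name.

```python
def lang_reverse(L: set) -> set:
    """L^R = {u^R | u in L}; u[::-1] is the reversal of the string u."""
    return {u[::-1] for u in L}

L1 = {"a", "ab", "ac", "bcc"}
L2 = {"ac", "ca"}
L1_R = lang_reverse(L1)
L2_R = lang_reverse(L2)
```

Here L2^R = L2 as sets, although no individual word of L2 is equal to its own reversal.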
Empty-Unit
The empty language is denoted by ∅: it has no element:
∀u ∈ Σ∗, u ∉ ∅.
The unit language {ε} has the property that, for every language L:
L·{ε} = {ε}·L = L,
while:
L·∅ = ∅, ∅·L = ∅.
Power
The power L^n of the language L is defined by induction on the natural number n:
L^0 = {ε}, L^{n+1} = L^n·L
For example:
L^1 = L,
L^2 = L·L (it is called the square of L).
Example:
L = {ab, ca}, L^2 = {abab, abca, caab, caca}.
Note that abca ∈ L^2 but abca is not a square.
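A sketch of the power of a finite language, assuming languages are Python sets of strings; lang_power and is_square are illustrative names.

```python
def lang_power(L: set, n: int) -> set:
    """L^n by induction: L^0 = {ε}, L^(k+1) = L^k · L."""
    result = {""}  # L^0 contains only the empty word
    for _ in range(n):
        result = {u + v for u in result for v in L}
    return result

def is_square(w: str) -> bool:
    """True iff w = u·u for some word u."""
    half = len(w) // 2
    return len(w) % 2 == 0 and w[:half] == w[half:]

L = {"ab", "ca"}
L_squared = lang_power(L, 2)
squares_in_L_squared = {w for w in L_squared if is_square(w)}
```

Only abab and caca are squares of words: the square of a language mixes different words of L.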
Boolean operations
union:
L ∪ M = {u ∈ Σ∗ | u ∈ L or u ∈ M}
intersection:
L ∩ M = {u ∈ Σ∗ | u ∈ L and u ∈ M}
set difference:
L \ M = {u ∈ Σ∗ | u ∈ L and u ∉ M}
complement:
CL = Σ∗ \ L.
Note that:
L \ M = L ∩ CM
∪, ∩ are both commutative and associative; each of these two operations is distributive over the other:
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
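These identities can be checked on sample languages. Since Σ∗ is infinite, the sketch below assumes a finite stand-in (all words of length at most 3 over {a, b}) for the complement; UNIVERSE, words_up_to and the sample sets A, B, C are illustrative choices.

```python
from itertools import product

def words_up_to(alphabet: str, max_len: int) -> set:
    """All words over `alphabet` of length at most max_len."""
    return {"".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n)}

UNIVERSE = words_up_to("ab", 3)  # finite stand-in for Σ∗ (assumption of this sketch)

def complement(L: set) -> set:
    """CL, taken relative to the finite universe above."""
    return UNIVERSE - L

A, B, C = {"", "a", "ab"}, {"a", "b", "aba"}, {"ab", "ba"}

difference_law = (A - B) == (A & complement(B))
distributive_1 = (A & (B | C)) == ((A & B) | (A & C))
distributive_2 = (A | (B & C)) == ((A | B) & (A | C))
```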
Star
star:
L∗ := ⋃_{n=0}^{∞} L^n.
Example:
L = {ab, acc}
L∗ = {ε, ab, acc, abab, abacc, accab, accacc, ababab, abaccab, accabab, accaccab, ababacc, abaccacc, accabacc, accaccacc, ...}
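L∗ is infinite as soon as L contains a non-empty word, so a program can only enumerate a finite part of it. The sketch below assumes languages are Python sets of strings and lists the words of L∗ up to a chosen length; star_up_to is an illustrative name.

```python
def star_up_to(L: set, max_len: int) -> set:
    """Words of L* of length at most max_len."""
    result = {""}    # L^0 = {ε}
    frontier = {""}  # words found in the previous round
    while frontier:
        # Extend the newest words by one more factor from L, keeping short ones only.
        frontier = {
            u + v for u in frontier for v in L if len(u + v) <= max_len
        } - result
        result |= frontier
    return result

L = {"ab", "acc"}
short_words = star_up_to(L, 5)
```

The loop stops because each round only keeps words not seen before, and there are finitely many words of bounded length.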
Cross
cross:
L+ := ⋃_{n=1}^{∞} L^n.
Example (continued):
L = {ab, acc}
L+ = {ab, acc, abab, abacc, accab, accacc, ababab, abaccab, accabab, accaccab, ababacc, abaccacc, accabacc, accaccacc, ...}
Left-quotient
Left-quotient: let u ∈ Σ∗, L ⊆ Σ∗.
u^{-1}L = {v ∈ Σ∗ | uv ∈ L}
Let U ⊆ Σ∗, L ⊆ Σ∗.
U^{-1}L = {v ∈ Σ∗ | ∃u ∈ U, uv ∈ L}
Note that
U^{-1}L = ⋃_{u ∈ U} u^{-1}L
[Figure: a word w = uv ∈ L split into u followed by v, with v ∈ u^{-1}L.]
Left-quotient:
U^{-1}L = {v ∈ Σ∗ | ∃u ∈ U, uv ∈ L}
[Figure: a word w = uv ∈ L split into u ∈ U followed by v, with v ∈ U^{-1}L.]
Right-quotient
Right-quotient: let u ∈ Σ∗, L ⊆ Σ∗.
Lu^{-1} = {v ∈ Σ∗ | vu ∈ L}
Let U ⊆ Σ∗, L ⊆ Σ∗.
LU^{-1} = {v ∈ Σ∗ | ∃u ∈ U, vu ∈ L}
Note that
LU^{-1} = ⋃_{u ∈ U} Lu^{-1}
[Figure: a word w = vu ∈ L split into v followed by u, with v ∈ Lu^{-1}.]
Quotient
Example
Let L = {ab, acb, abb, bba}.
(ab)^{-1}L = {ε, b}, (bb)^{-1}L = {a}, a^{-1}L = {b, cb, bb},
Lb^{-1} = {a, ac, ab}, La^{-1} = {bb}.
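For finite languages, both quotients can be computed by stripping a prefix or a suffix. This is a sketch assuming languages are Python sets of strings; left_quotient and right_quotient are illustrative names.

```python
def left_quotient(U: set, L: set) -> set:
    """U^{-1}L = {v | there is u in U with u·v in L}."""
    return {w[len(u):] for u in U for w in L if w.startswith(u)}

def right_quotient(L: set, U: set) -> set:
    """L U^{-1} = {v | there is u in U with v·u in L}."""
    return {w[:len(w) - len(u)] for u in U for w in L if w.endswith(u)}

L = {"ab", "acb", "abb", "bba"}
by_ab = left_quotient({"ab"}, L)   # (ab)^{-1}L
by_bb = left_quotient({"bb"}, L)   # (bb)^{-1}L
by_a = left_quotient({"a"}, L)     # a^{-1}L
over_b = right_quotient(L, {"b"})  # L b^{-1}
over_a = right_quotient(L, {"a"})  # L a^{-1}
```

The quotient by a single word u is obtained by passing the one-element set {u}.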
Example
Let L = (aab)∗.
(aa)^{-1}L = b·(aab)∗, (b)^{-1}L = ∅, (aab)^{-1}L = L,
L(aa)^{-1} = ∅, Lb^{-1} = (aab)∗·aa, L(aab)^{-1} = L.
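Since L = (aab)∗ is infinite, these equalities can only be tested on a bounded set of words. The sketch below assumes membership in L is decided with Python's re.fullmatch and compares both sides on all words of length at most 7 over {a, b}; it is a sanity check, not a proof, and the names are illustrative.

```python
import re
from itertools import product

def in_L(w: str) -> bool:
    """Membership test for L = (aab)*."""
    return re.fullmatch(r"(aab)*", w) is not None

def words_up_to(alphabet: str, max_len: int) -> list:
    """All words over `alphabet` of length at most max_len."""
    return ["".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n)]

WORDS = words_up_to("ab", 7)

left_aa = {v for v in WORDS if in_L("aa" + v)}    # (aa)^{-1}L, bounded
left_b = {v for v in WORDS if in_L("b" + v)}      # (b)^{-1}L, bounded
left_aab = {v for v in WORDS if in_L("aab" + v)}  # (aab)^{-1}L, bounded
right_aa = {v for v in WORDS if in_L(v + "aa")}   # L(aa)^{-1}, bounded
right_b = {v for v in WORDS if in_L(v + "b")}     # L b^{-1}, bounded

# Right-hand sides of the slide, restricted to the same bounded set of words.
expected_left_aa = {v for v in WORDS if re.fullmatch(r"b(aab)*", v)}
expected_right_b = {v for v in WORDS if re.fullmatch(r"(aab)*aa", v)}
L_bounded = {v for v in WORDS if in_L(v)}
```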