
Formal Languages-Course 1.




Géraud Sénizergues

Bordeaux university

04/05/2020

Master computer-science MINF19, IEI, 2019/20


Contents

1 Introduction
   What are formal languages?
   Brief history
   What shall we study?
   Where is it useful?

2 Formal languages
   Words
   Languages


Introduction

What are formal languages?

A language is, informally, a way to communicate by means of sequences of symbols (whether written symbols or spoken elementary sounds).

Examples:

Natural languages: written English, spoken English, written international scientific English, written Vietnamese, written French, etc.

Artificial languages: formal mathematical language (as defined by some logical system, the “sequent calculus” for example), Lisp (a programming language), lambda-calculus (a “theoretical” programming language), etc.

Each language has two aspects:

the syntax: which sequences of symbols (words, sentences, discourses, ...) are correct?

the semantics: what does a sentence mean? One hopes that the meaning of a sentence can be deduced from the structure of the sentence and the meaning of each of its elementary components (compositionality principle).

Both aspects (syntax and semantics) can be described by mathematical models.

A model for the syntax is a formal language, i.e. a set of words.

A model for the semantics is a formal semantics, i.e. a mathematical definition of a functional

{correct sentences} → {mathematical functions}

that associates to every sentence its meaning.

Brief history

1850-1920-now: mathematicians defined formal languages expressing mathematics [Frege, Peano, Pieri, Hilbert, Russell-Whitehead]. These languages were used to model mathematical proofs. They were not rigorously used in practice, but they were necessary to study the power of mathematics: what is provable and what is not provable.

1920-1960-now: linguists modeled natural languages by several kinds of formal grammars: AB-grammars, context-free grammars, Lambek-grammars, etc.

1950-now: computer scientists defined programming languages as formal languages, using rational expressions, grammars, automata, etc.

1960-now: logicians [Büchi, Rabin, Shelah, Muchnik-Semenov, etc.] used automata in order to show decidability (and other properties) of some logical theories.

1970-now: some formal languages expressing mathematics were practically implemented on computers; thus formal proofs could be produced for many mathematical theorems: Automath [1970], HOL [produced the formal proof of the Kepler conjecture], Mizar, Coq [ACM software prize, 2013], PVS, etc.

What shall we study?

To describe a formal language $L$ we can use:

- “algebraic operations”: operations on words and operations on languages; $w \in L$ if $w$ can be obtained by a specific sequence of operations;

- “rewriting”: $w \in L$ if $w$ can be obtained from a starting word $w_0$ by a sequence of rewritings, i.e. of local modifications;

- “abstract machines” (automata): $w \in L$ if the automaton $A$, on input $w$, computes and eventually answers “yes”.
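The third kind of description (abstract machines) can be sketched in Python. This is only an illustration: the function name `accepts`, the dictionary encoding of the transitions and the example automaton `EVEN` (words of even length over {a, b, c}) are assumptions made for this sketch, not notation from the course.

```python
def accepts(transitions, start, accepting, word):
    """Run a deterministic finite automaton on `word`.

    `transitions` maps (state, letter) to the next state.
    Returns True ("yes") if the run ends in an accepting state.
    """
    state = start
    for letter in word:
        key = (state, letter)
        if key not in transitions:
            return False  # no move available on this letter: reject
        state = transitions[key]
    return state in accepting


# Hypothetical example: two states 0 and 1 recording the parity of the
# number of letters read so far; every letter flips the state.
EVEN = {(q, x): 1 - q for q in (0, 1) for x in "abc"}

print(accepts(EVEN, 0, {0}, "abca"))  # True  (length 4)
print(accepts(EVEN, 0, {0}, "abc"))   # False (length 3)
```

The automaton is plain data (a dictionary), and the same small program runs any such automaton.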

Algorithmic problems:

INPUT: a description of a language (a rational expression, a grammar),

OUTPUT: an automaton that recognizes the language.

The “automaton” is a mathematical object that can easily be turned into a program that solves the problem on a computer.

INPUT: a word and a language,

QUESTION: does the word belong to the language?

Example: given an alleged C program $P$: is $P$ really a C program?

INPUT: two languages $L, M$,

QUESTION: $L = M$?

Example: given a specification of a language and a program that analyzes words, does the program implement the specification, i.e. does it recognize exactly the words that were defined as correct?

INPUT: a description $D$ of some kind,

OUTPUT: a description $E$ of another kind, defining the same language.

Example: translate an automaton into a rational expression.

Where is it useful?

Compiling: a compiler is a device that translates a given source language $S$ into a given target language $T$, while preserving the semantics. The first step of a compiler is the lexical analysis of the input text:

$P \to P_1$

The second step is the syntactic analysis of the text $P_1$:

$P_1 \to P_2$

where $P_2$ is the abstract syntax tree of $P_1$. These two steps rely heavily on formal language theory.

Computer architecture: building a finite automaton is a step on the way to building a sequential boolean circuit that computes a given function.

Computational biology: many algorithms on long words (that model proteins or DNA sequences) are based on automata or on rewriting systems.

Automatic language processing: in order to process natural language text and extract its semantics, or to produce a natural language sentence expressing some meaning, the natural language must be modeled by some grammar and the semantics must be modeled as some function of the structure of the sentences (expressed by some tree).


Formal languages

Formal languages : Words

Free monoid

Let $\Sigma$ be a “set of symbols” (mathematically, $\Sigma$ is just some set).

We call word over the alphabet $\Sigma$ any finite sequence of symbols from $\Sigma$.

Examples: for $\Sigma = \{a, b, c\}$,

$w_1 = aabcabbc$, $w_2 = bcabbcaaaa$, $w_3 = cccaa$, $w_4 = c$.

Length: the length $|u|$ of a word $u$ is the number of positions of $u$:

$|w_1| = 8$, $|w_2| = 10$, $|w_3| = 5$, $|w_4| = 1$.

One can denote the letter in position $i$ of a word $u$ by $u[i]$ (positions are numbered from 0). For example:

$w_3[0] = c$, $w_3[1] = c$, $w_3[2] = c$, $w_3[3] = a$, $w_3[4] = a$.
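As an illustration, words can be modelled by Python strings: `len` gives the length and indexing (from 0) gives the letter in a given position. The names `SIGMA` and `is_word` are hypothetical helpers introduced for this sketch only.

```python
SIGMA = {"a", "b", "c"}


def is_word(u, alphabet=SIGMA):
    """True if every symbol of the string `u` belongs to `alphabet`."""
    return all(x in alphabet for x in u)


w1, w2, w3, w4 = "aabcabbc", "bcabbcaaaa", "cccaa", "c"

print(all(is_word(w) for w in (w1, w2, w3, w4)))  # True
print([len(w) for w in (w1, w2, w3, w4)])         # [8, 10, 5, 1]
print([w3[i] for i in range(len(w3))])            # ['c', 'c', 'c', 'a', 'a']
```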

Length relative to a letter:

For a letter $x \in \Sigma$ and a word $u \in \Sigma^*$ we denote by $|u|_x$ the number of positions of $u$ that are labelled by the letter $x$. Formally:

$|u|_x = \mathrm{Card}\{ i \in [0, |u|-1] \mid u[i] = x \}$.

Example:

$u = abaccbaaa$

$|u|_a = 5$, $|u|_b = 2$, $|u|_c = 2$

$|u| = 9 = |u|_a + |u|_b + |u|_c$.
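The formal definition of $|u|_x$ translates almost literally into code. A minimal sketch, where the function name `count_letter` is an assumption of this example:

```python
def count_letter(u, x):
    """Number of positions i in [0, |u|-1] such that u[i] == x."""
    return sum(1 for i in range(len(u)) if u[i] == x)


u = "abaccbaaa"
counts = {x: count_letter(u, x) for x in "abc"}

print(counts)                         # {'a': 5, 'b': 2, 'c': 2}
print(len(u) == sum(counts.values())) # True
```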

Product:

For words, the “concatenation product” $u \cdot v$ of $u, v$ is obtained by putting $u$ before $v$, with nothing between the right end of $u$ and the left end of $v$.

$w_1 \cdot w_2 = aabcabbc \cdot bcabbcaaaa = aabcabbcbcabbcaaaa$

$w_3 \cdot w_4 = cccaa \cdot c = cccaac$.

Empty word:

We denote by $\varepsilon$ the empty word: it has length 0. The product by $\varepsilon$ does not modify any word, i.e. for every word $u$:

$u \cdot \varepsilon = \varepsilon \cdot u = u$

Product and length: for all words $u, v$:

$|u \cdot v| = |u| + |v|$.

For all words $u, v$ and every letter $x$:

$|u \cdot v|_x = |u|_x + |v|_x$.
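These identities can be checked on examples with Python strings, where the product is string concatenation and the empty word is the empty string. The names `product` and `EPSILON` are assumptions of this sketch; `str.count` is used here to count the occurrences of a single letter.

```python
EPSILON = ""  # the empty word


def product(u, v):
    """Concatenation product of two words."""
    return u + v


w1, w2 = "aabcabbc", "bcabbcaaaa"
p = product(w1, w2)

print(p)                                      # aabcabbcbcabbcaaaa
print(product(w1, EPSILON) == w1 == product(EPSILON, w1))  # True
print(len(p) == len(w1) + len(w2))            # True
print(all(p.count(x) == w1.count(x) + w2.count(x) for x in "abc"))  # True
```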

$\Sigma^*$: we denote by $\Sigma^*$ the set of all words over the alphabet $\Sigma$.

The law $\cdot$ has the following properties: for every $u, v, w \in \Sigma^*$,

$u \cdot (v \cdot w) = (u \cdot v) \cdot w$ (1)

$u \cdot \varepsilon = \varepsilon \cdot u = u$ (2)

Properties (1) and (2) are summarized by:

$\langle \Sigma^*, \cdot, \varepsilon \rangle$ is a monoid.

It is called the free monoid over $\Sigma$. Other (useful) properties: for every $u, v, w \in \Sigma^*$,

$u \cdot v = u \cdot w \Rightarrow v = w$ and $u \cdot w = v \cdot w \Rightarrow u = v$ (3)

The free monoid is cancellative.
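Properties (1), (2) and (3) can be tested exhaustively on a small finite set of words. The sketch below does so for all words of length at most 2 over {a, b}; this is a sanity check on examples, not a proof, and the helper name `words_up_to` is an assumption of this example.

```python
from itertools import product as tuples


def words_up_to(alphabet, max_len):
    """List of all words over `alphabet` of length at most `max_len`."""
    return ["".join(t)
            for n in range(max_len + 1)
            for t in tuples(sorted(alphabet), repeat=n)]


WORDS = words_up_to({"a", "b"}, 2)  # 1 + 2 + 4 = 7 words

# (1) associativity and (2) neutrality of the empty word
assoc = all(u + (v + w) == (u + v) + w
            for u in WORDS for v in WORDS for w in WORDS)
unit = all(u + "" == "" + u == u for u in WORDS)

# (3) left and right cancellation
left_cancel = all(v == w for u in WORDS for v in WORDS for w in WORDS
                  if u + v == u + w)
right_cancel = all(u == v for u in WORDS for v in WORDS for w in WORDS
                   if u + w == v + w)

print(assoc, unit, left_cancel, right_cancel)  # True True True True
```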

Free monoid: left-cancellation

[Figure: the words $uv$ and $uw$ drawn one above the other, aligned on their common prefix $u$: $uv = uw \iff v = w$.]

Free monoid: right-cancellation

[Figure: the words $uw$ and $vw$ drawn one above the other, aligned on their common suffix $w$: $uw = vw \iff u = v$.]

Free monoid: reversal

The reversal of a word $u$ of length $\ell$ is defined by:

$u^R := u[\ell-1]\, u[\ell-2] \cdots u[0]$.

Example:

$u = abcbbc$, $u^R = cbbcba$.

Proposition

$\forall u, v \in \Sigma^*$, $(u \cdot v)^R = v^R \cdot u^R$, and $\varepsilon^R = \varepsilon$.
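The definition of the reversal and the proposition can be tried out on examples. A minimal sketch, where the function name `reversal` is an assumption of this example:

```python
def reversal(u):
    """u^R = u[l-1] u[l-2] ... u[0], where l is the length of u."""
    return "".join(u[i] for i in range(len(u) - 1, -1, -1))


u, v = "abcbbc", "cccaa"

print(reversal(u))                                    # cbbcba
print(reversal(u + v) == reversal(v) + reversal(u))   # True
print(reversal("") == "")                             # True
```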

Free monoid: factors

A word $v$ is called a factor of a word $w$ iff there exist words $\alpha, \beta$ such that

$\alpha \cdot v \cdot \beta = w$

A word $v$ is called a prefix of a word $w$ iff there exists a word $\beta$ such that

$v \cdot \beta = w$

A word $v$ is called a suffix of a word $w$ iff there exists a word $\alpha$ such that

$\alpha \cdot v = w$
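The three definitions can be turned into tests on Python strings. In this sketch the function names are assumptions of the example, and the existential quantifiers over $\alpha, \beta$ are replaced by a search over the positions where $v$ could start.

```python
def is_prefix(v, w):
    """True iff w = v . beta for some word beta."""
    return w[:len(v)] == v


def is_suffix(v, w):
    """True iff w = alpha . v for some word alpha."""
    return len(v) <= len(w) and w[len(w) - len(v):] == v


def is_factor(v, w):
    """True iff w = alpha . v . beta for some words alpha, beta."""
    return any(w[i:i + len(v)] == v for i in range(len(w) - len(v) + 1))


print(is_factor("abb", "bbabb"))   # True
print(is_suffix("abb", "bbabb"))   # True
print(is_prefix("abb", "bbabb"))   # False
print(is_factor("abb", "ababab"))  # False
```

Every prefix and every suffix of $w$ is in particular a factor of $w$ (take $\alpha = \varepsilon$ or $\beta = \varepsilon$).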

[Figure: three pictures of a word $w$: $v$ is a suffix of $w$ ($w = \alpha v$); $v$ is a factor of $w$ ($w = \alpha v \beta$); $v$ is a prefix of $w$ ($w = v \beta$).]

Free monoid: reversal-factors

Example: $\Sigma = \{a, b, c\}$.

$u = abb$, $v_1 = aaabaab$, $v_2 = ababab$, $v_3 = bbabb$.

$u$ is not a factor of $v_1$.

$u$ is not a factor of $v_2$.

$u$ is a factor of $v_3$. $u$ is a suffix of $v_3$. $u$ is not a prefix of $v_3$.

Example (continued): $\Sigma = \{a, b, c\}$.

$u = abb$, $v_4 = ababbabbbabbbb$.

$u$ is a factor of $v_4$:

$v_4 = ab \cdot abb \cdot abbbabbbb = ababb \cdot abb \cdot babbbb = ababbabbb \cdot abb \cdot bb$

We say that $u$ has 3 occurrences as a factor of $v_4$: one in position 2, one in position 5, one in position 9.

$u$ is not a prefix of $v_4$. $u$ is not a suffix of $v_4$.
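The positions of the occurrences can be computed by trying every possible starting position. A minimal sketch, where the function name `occurrences` is an assumption of this example:

```python
def occurrences(u, v):
    """Positions i such that u occurs as a factor of v starting at position i."""
    return [i for i in range(len(v) - len(u) + 1) if v[i:i + len(u)] == u]


v4 = "ababbabbbabbbb"

print(occurrences("abb", v4))  # [2, 5, 9]
```

With this function, $u$ is a prefix of $v$ exactly when position 0 is in the list, and a suffix exactly when position $|v| - |u|$ is.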

Example (continued): $\Sigma = \{a, b, c\}$.

$u = abb$, $v_5 = abbcabb$.

$u$ is both a prefix and a suffix of $v_5$:

$v_5 = abb \cdot cabb = abbc \cdot abb$.

Exercise

Let $\Sigma = \{a, b, c\}$ and

$u = abbaa$, $v_1 = babbaab$, $v_2 = abbabbaabb$, $v_3 = aaabbabbabba$, $v_4 = bbaa$, $v_5 = abababbaa$.

1- For which $i \in [1,5]$ is it true that $u$ is a factor of $v_i$?

2- For which $i \in [1,5]$ is it true that $u$ is a prefix of $v_i$?

3- For which $i \in [1,5]$ is it true that $u$ is a suffix of $v_i$?

Exercise

Suppose that $u, v$ are words such that $u$ is a factor of $v$. Does it imply, in general, that:

1- $u$ is a factor of $v^R$?

2- $u^R$ is a factor of $v^R$?

3- $u^R$ is a factor of $v$?

Hint: use Proposition 2.2.

Exercise

Suppose that $u, v$ are words such that $u$ is a prefix of $v$. Does it imply, in general, that:

1- $u$ is a prefix of $v^R$?

2- $u^R$ is a prefix of $v^R$?

3- $u^R$ is a suffix of $v^R$?

Exercise

Let $\Sigma$ be an alphabet. Show that, for all words $u_1, u_2, v \in \Sigma^*$, if $u_1, u_2$ are both prefixes of $v$, then ($u_1$ is a prefix of $u_2$) or ($u_2$ is a prefix of $u_1$).

Exercise

Let $\Sigma$ be an alphabet. A word $v \in \Sigma^*$ is called square-free if it does not have any factor of the form $w \cdot w$ with $w \neq \varepsilon$.

For example, $v_0 = abcabaca$ is square-free, while $abcbcba$ is not square-free since $cb \cdot cb$ is a factor of $abcbcba$.

Show that, on the alphabet $\Sigma = \{a, b\}$, there are only finitely many square-free words. Compute the set of all square-free words on $\{a, b\}$.
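One possible way to carry out the computation by machine is sketched below. It relies on the remark that every prefix of a square-free word is square-free, so all square-free words can be reached by extending shorter ones letter by letter. The function names are assumptions of this sketch, and the search halts only when the set of square-free words is finite; the run is a check, not a substitute for the proof asked for.

```python
def is_square_free(v):
    """True iff v has no factor w.w with w non-empty."""
    n = len(v)
    return not any(v[i:i + k] == v[i + k:i + 2 * k]
                   for k in range(1, n // 2 + 1)      # k = |w|
                   for i in range(n - 2 * k + 1))     # i = start of w.w


def square_free_words(alphabet):
    """Enumerate square-free words by increasing length.

    Stops when no square-free word of the current length can be
    extended (so it loops forever if there are infinitely many).
    """
    found, frontier = [], [""]
    while frontier:
        found.extend(frontier)
        frontier = [v + x for v in frontier for x in sorted(alphabet)
                    if is_square_free(v + x)]
    return found


print(is_square_free("abcabaca"), is_square_free("abcbcba"))  # True False
print(square_free_words({"a", "b"}))
# ['', 'a', 'b', 'ab', 'ba', 'aba', 'bab']
```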

Exercise (difficult, but solved in combinatorics textbooks)

A word $v$ is called cube-free if it does not have any factor of the form $w \cdot w \cdot w$ with $w \neq \varepsilon$.

Show that, on the alphabet $\Sigma = \{a, b\}$, there are infinitely many cube-free words.

Exercise

Let $\Sigma$ be an alphabet. Show that, for all words $u, v, u', v' \in \Sigma^*$, $u \cdot v = u' \cdot v'$ if and only if there exists some word $w \in \Sigma^*$ such that

($u' = u \cdot w$ and $v = w \cdot v'$) or ($u = u' \cdot w$ and $v' = w \cdot v$).

Hint: distinguish the case where $|u| \leq |u'|$ and the case where $|u'| \leq |u|$; a picture will be helpful.

This property is called the equidivisibility of the free monoid.

Free monoid: powers

The power $u^n$ of the word $u$ is defined by induction on the natural integer $n$:

$u^0 = \varepsilon$, $u^{n+1} = u^n \cdot u$.

For example:

$u^1 = u$,

$u^2 = u \cdot u$ (it is called the square of $u$),

$u^3 = u \cdot u \cdot u$ (it is called the cube of $u$).

Formal languages: Languages

Definition, product

Definition

We call language over the alphabet $\Sigma$ any subset $L$ of $\Sigma^*$.

Examples: let $\Sigma = \{a, b, c\}$. Here are some languages over $\Sigma$:

$L_1 = \{a, ab, ac, bcc\}$, $L_2 = \{ac, ca\}$,

$L_3 = \{u \in \Sigma^* \mid |u| \text{ is even}\}$, $L_4 = \{u \in \Sigma^* \mid \exists \alpha, \beta \in \Sigma^*,\ u = \alpha \cdot bba \cdot \beta\}$.

Product: for $L, M$ languages over $\Sigma$ we define

$L \cdot M = \{u \cdot v \mid u \in L, v \in M\}$

Example:

$L_1 \cdot L_2 = \{aac, abac, acac, bccac, aca, abca, acca, bccca\}$.
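For finite languages, the product can be computed directly from its definition. In this sketch finite languages are represented by Python sets of strings; the function name `product` is an assumption of the example.

```python
def product(L, M):
    """L . M = { u . v | u in L, v in M } for finite languages L, M."""
    return {u + v for u in L for v in M}


L1 = {"a", "ab", "ac", "bcc"}
L2 = {"ac", "ca"}

expected = {"aac", "abac", "acac", "bccac", "aca", "abca", "acca", "bccca"}
print(product(L1, L2) == expected)  # True
```

Infinite languages such as $L_3$ and $L_4$ cannot be stored as sets; they would rather be represented by a membership test.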

Reversal

The reversal of a language $L$ is defined by:

$L^R = \{u^R \mid u \in L\}$.

Example:

$L_1 = \{a, ab, ac, bcc\}$, $L_2 = \{ac, ca\}$,

$L_1^R = \{a, ba, ca, ccb\}$, $L_2^R = \{ca, ac\}$.

Empty language, unit language

The empty language is denoted by $\emptyset$: it has no element:

$\forall u \in \Sigma^*,\ u \notin \emptyset$.

The unit language $\{\varepsilon\}$ has the property that, for every language $L$:

$L \cdot \{\varepsilon\} = \{\varepsilon\} \cdot L = L$,

while:

$L \cdot \emptyset = \emptyset$, $\emptyset \cdot L = \emptyset$.

Power

The power $L^n$ of the language $L$ is defined by induction on the natural integer $n$:

$L^0 = \{\varepsilon\}$, $L^{n+1} = L^n \cdot L$.

For example:

$L^1 = L$,

$L^2 = L \cdot L$ (it is called the square of $L$).

Example:

$L = \{ab, ca\}$, $L^2 = \{abab, abca, caab, caca\}$.

Note that $abca \in L^2$ but $abca$ is not a square.
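The inductive definition of $L^n$ can be followed step by step for finite languages. A minimal sketch with Python sets of strings; the function names `product`, `power` and `is_square` are assumptions of this example.

```python
def product(L, M):
    """L . M for finite languages given as sets of strings."""
    return {u + v for u in L for v in M}


def power(L, n):
    """L^0 = {epsilon}, L^(n+1) = L^n . L."""
    result = {""}
    for _ in range(n):
        result = product(result, L)
    return result


def is_square(u):
    """True iff u = w . w for some word w."""
    half = len(u) // 2
    return len(u) % 2 == 0 and u[:half] == u[half:]


L = {"ab", "ca"}

print(power(L, 2) == {"abab", "abca", "caab", "caca"})  # True
print("abca" in power(L, 2), is_square("abca"))         # True False
```

So the square of a language contains, but is in general larger than, the set of squares of its words.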

Boolean operations

union:

$L \cup M = \{u \in \Sigma^* \mid u \in L \text{ or } u \in M\}$

intersection:

$L \cap M = \{u \in \Sigma^* \mid u \in L \text{ and } u \in M\}$

set difference:

$L \setminus M = \{u \in \Sigma^* \mid u \in L \text{ and } u \notin M\}$

complement:

$\complement L = \Sigma^* \setminus L$.

Note that:

$L \setminus M = L \cap \complement M$

$\cup$ and $\cap$ are both commutative and associative; each of these two operations is distributive over the other:

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$

$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
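These identities can be observed on finite examples with Python's set operators (`|`, `&`, `-`). Since $\Sigma^*$ is infinite, the sketch below works inside a finite window `U` of all words of length at most 4, and the complement is taken relative to `U`; this restriction, the helper names and the three sample languages are assumptions of the example.

```python
from itertools import product as tuples


def words_up_to(alphabet, max_len):
    """Set of all words over `alphabet` of length at most `max_len`."""
    return {"".join(t)
            for n in range(max_len + 1)
            for t in tuples(alphabet, repeat=n)}


U = words_up_to("abc", 4)  # finite window standing in for Sigma^*

A = {u for u in U if len(u) % 2 == 0}     # words of even length
B = {u for u in U if "bba" in u}          # words having bba as a factor
C = {u for u in U if u.startswith("a")}   # words beginning with a


def complement(L):
    """Complement relative to the window U (not to the whole of Sigma^*)."""
    return U - L


print(A - B == A & complement(B))          # True
print(A & (B | C) == (A & B) | (A & C))    # True
print(A | (B & C) == (A | B) & (A | C))    # True
```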

Star

star:

$L^* := \bigcup_{n=0}^{\infty} L^n$.

Example:

$L = \{ab, acc\}$

$L^* = \{\varepsilon, ab, acc, abab, abacc, accab, accacc, ababab, abaccab, accabab, accaccab, ababacc, abaccacc, accabacc, accaccacc, \ldots\}$

Cross

cross:

$L^+ := \bigcup_{n=1}^{\infty} L^n$.

Example (continued):

$L = \{ab, acc\}$

$L^+ = \{ab, acc, abab, abacc, accab, accacc, ababab, abaccab, accabab, accaccab, ababacc, abaccacc, accabacc, accaccacc, \ldots\}$
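$L^*$ and $L^+$ are infinite as soon as $L$ contains a non-empty word, so a program can only list a finite part of them. The sketch below truncates the unions at a maximal power; the function names and the truncation parameter `max_power` are assumptions of this example.

```python
def product(L, M):
    """L . M for finite languages given as sets of strings."""
    return {u + v for u in L for v in M}


def power(L, n):
    """L^0 = {epsilon}, L^(n+1) = L^n . L."""
    result = {""}
    for _ in range(n):
        result = product(result, L)
    return result


def star_up_to(L, max_power):
    """Union of L^n for 0 <= n <= max_power (a finite part of L^*)."""
    result = set()
    for n in range(max_power + 1):
        result |= power(L, n)
    return result


def cross_up_to(L, max_power):
    """Union of L^n for 1 <= n <= max_power (a finite part of L^+)."""
    result = set()
    for n in range(1, max_power + 1):
        result |= power(L, n)
    return result


L = {"ab", "acc"}

print(len(star_up_to(L, 3)))                 # 15
print(star_up_to(L, 3) - cross_up_to(L, 3))  # {''}
```

With `max_power = 3` one obtains exactly the 15 words listed in the example for $L^*$; here $L^+$ differs from $L^*$ only by the empty word, because $\varepsilon \notin L$.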

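Left-quotient

Left-quotient: let $u \in \Sigma^*$, $L \subseteq \Sigma^*$.

$u^{-1} L = \{v \in \Sigma^* \mid uv \in L\}$

Let $U \subseteq \Sigma^*$, $L \subseteq \Sigma^*$.

$U^{-1} L = \{v \in \Sigma^* \mid \exists u \in U,\ uv \in L\}$

Note that

$U^{-1} L = \bigcup_{u \in U} u^{-1} L$

[Figure: a word $w = uv \in L$ with prefix $u$; the remaining suffix $v$ belongs to $u^{-1} L$.]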

[Figure: a word $w = uv \in L$ with prefix $u \in U$; the remaining suffix $v$ belongs to $U^{-1} L$.]

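Right-quotient

Right-quotient: let $u \in \Sigma^*$, $L \subseteq \Sigma^*$.

$L u^{-1} = \{v \in \Sigma^* \mid vu \in L\}$

Let $U \subseteq \Sigma^*$, $L \subseteq \Sigma^*$.

$L U^{-1} = \{v \in \Sigma^* \mid \exists u \in U,\ vu \in L\}$

Note that

$L U^{-1} = \bigcup_{u \in U} L u^{-1}$

[Figure: a word $w = vu \in L$ with suffix $u$; the remaining prefix $v$ belongs to $L u^{-1}$.]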

Quotient

Example

Let $L = \{ab, acb, abb, bba\}$.

$(ab)^{-1} L = \{\varepsilon, b\}$, $(bb)^{-1} L = \{a\}$, $a^{-1} L = \{b, cb, bb\}$,

$L b^{-1} = \{a, ac, ab\}$, $L a^{-1} = \{bb\}$.
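For a finite language, both quotients can be computed by removing the given prefix (or suffix) from the words that have it. A minimal sketch; the function names are assumptions of this example, and the set versions follow the remark that $U^{-1}L$ and $LU^{-1}$ are unions of quotients by single words.

```python
def left_quotient(u, L):
    """u^{-1} L = { v | u.v in L } for a finite language L."""
    return {w[len(u):] for w in L if w.startswith(u)}


def right_quotient(L, u):
    """L u^{-1} = { v | v.u in L } for a finite language L."""
    return {w[:len(w) - len(u)] for w in L if w.endswith(u)}


def left_quotient_set(U, L):
    """U^{-1} L as the union of the u^{-1} L for u in U."""
    return set().union(*(left_quotient(u, L) for u in U))


def right_quotient_set(L, U):
    """L U^{-1} as the union of the L u^{-1} for u in U."""
    return set().union(*(right_quotient(L, u) for u in U))


L = {"ab", "acb", "abb", "bba"}

print(left_quotient("ab", L) == {"", "b"})          # True
print(left_quotient("bb", L) == {"a"})              # True
print(left_quotient("a", L) == {"b", "cb", "bb"})   # True
print(right_quotient(L, "b") == {"a", "ac", "ab"})  # True
print(right_quotient(L, "a") == {"bb"})             # True
```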

Example

Let $L = (aab)^*$.

$(aa)^{-1} L = b \cdot (aab)^*$, $(b)^{-1} L = \emptyset$, $(aab)^{-1} L = L$,

$L (aa)^{-1} = \emptyset$, $L b^{-1} = (aab)^* \cdot aa$, $L (aab)^{-1} = L$.
