Learning Disjunctive Logic Programs from Interpretation Transition

(1)

Learning Disjunctive Logic Programs from Interpretation Transition

Yi Huang^1,3, Yisong Wang^1,?, Ying Zhang², and Mingyi Zhang²

1Guizhou University, Guiyang 550025, P. R. China

2Guizhou Academy of sciences, Guiyang 550001, P. R. China

3Chongqing University of Arts and Sciences, Chongqing 402160, P. R. China

Abstract. We present a new framework for learning disjunctive logic programs from interpretation transitions, calledLFDT. It is a nontrivial extension to Inoue, Ribeiro and Sakama’sLF1Tlearning framework, which learns normal logic programs from interpretation transitions. Two resolutions for disjunctive rules are also presented and used inLFDTto simplify learned disjunctive rules.

1 Introduction

In machine learning and specifically inductive logic programming [1, 2], it is an im- portant task to learn the dynamics of complex systems, such as Boolean networks. A Boolean network consist of a set of Boolean variables each of which has a Boolean function. It is a simple yet a powerful mathematical tool to describe dynamics of complex systems [3, 4].

Since a seminal work on representing Boolean networks by logic programs [5], there is increasing interest in approaching the task from the perspective of logic programming [6–8]. In particular, Inoue, Ribeiro and Sakama proposed a novel framework, namedLF1T, for learning normal logic programs from interpretation transitions that are pairs hI, Jiof interpretations withTP(I) = J, whereTP is the immediate consequence operator for a normal logic programP [6].

It is well-known that disjunctive logic programs [9, 10] are substantially more ex- pressive than normal logic programs at many aspects. To extend LF1Tfor learning disjunctive logic programs from interpretation transitions, we present a new immediate consequence operatorT_P^dfor a disjunctive logic programP. Informally,T_P^d(I)consists of all minimal models of the heads of rules inPwhose bodies are satisfied byI.

Comparing withLF1Tframework for learning normal logic programs, a nontrivial work in the paper is to handle with nondeterministic interpretation transitions,i.e., it is possible there are two interpretation transitionshI, JiandhI, J⁰iin an observation withJ 6=J⁰. Combining with two new proposed resolutions for disjunctive rules, we achieve the new framework for learning disjunctive logic programs from interpretation transitions, calledLFDT. It is proved being both sound and complete.

?Corresponding author, [email protected].

(2)

2 Disjunctive Logic Programs

We assume a underlying first-order language without proper function symbolsLand denote its Herbrand base (the set of all ground atoms) byA. We assumeAis finite for our learning purpose.

A(disjunctive) logic programis a finite set of(disjunctive) rulesof the form A1∨ · · · ∨Ak←B1∧ · · · ∧Bm∧ ¬C1∧ · · · ∧ ¬Cn, (1) wherek≥1, m≥0, n≥0andAi(1≤i≤k), Bj(1≤j≤m), andCs(1≤s≤n) are atoms ofL.

For a rulerof the form (1), thehead ofr, written hd(r), isA1∨ · · · ∨Ak; the bodyofr, writtenbd(r), is the conjunctionB1∧ · · · ∧Bm∧ ¬C1∧ · · · ∧ ¬Cn; the atoms occurring in the body ofrpositively (resp. negatively) is denoted bybd⁺(r) = {B1, . . . , Bm} (resp.bd⁻(r) = {C1, . . . , Cn}). If k = 1 then ris normal. Anor- mal logic programis a finite set of normal rules. Given a logic programP, we denote hd(P) =S

r∈Phd(r)andbd(P) =S

r∈Pbd(r). For convenience, we also writehd(r) as the set{A1, . . . , Ak}, andbd(r)as the set{B1, . . . , Bm,¬C1, . . . ,¬Cn}when there is no confusion. In this sense, a rulerof the form (1) can be alternatively written as

{A1, . . . , Ak} ← {B1, . . . , Bm,¬C1, . . . ,¬Cn}. (2) Asubstitutionθis a function from variables to terms, which is written in the following form

{x1/t₁, . . . , x_n/t_n} (3) wherexis(1≤i≤n)are pair-wise distinct variables andtis(1 ≤i≤n)are terms (of the languageL). Theapplicationeθofθto an expressioneis obtained fromeby simultaneously replacing all occurrences of each variablexi inewith the same term ti, andeθis called aninstanceofe[11]. The substitutionθisgroundifticontains no variables for everyxi/tiinθ. If the instance eθofecontains no variable then it is a ground instanceofe. Thegroundof a logic programP, writtenground(P), is the set S

r∈Pground(r), where

ground(r) ={rθ|rθis a ground instance ofr}. (4) For simplicity, we assume logic programs are always ground in the following of the paper, unless explicitly stated otherwise.

Letr₁, r₂be two rules. We say thatr₁subsumesr₂, writtenr₁r₂, if there exists a substitutionθsuch thathd(r₁)θ⊆hd(r₂)andbd(r₁)θ⊆bd(r₂). In this sense, we say thatr₁ismore (or equally) general thanr₂, andr₂isless (or equally) general thanr₁. Byr₁≺r₂we meanr₁subsumesr₂butr₂does not subsumer₁. For a logic program P, we denoteSR(P)the logic program obtained fromP by removing all rules that are properly subsumed by some other rules inP,i.e.,

SR(P) ={r∈P |6 ∃r⁰∈P s.t. r⁰≺r}. (5) A(Herbrand) interpretationIis a set of ground atoms. An interpretationIsatisfies a ground ruler ifI satisfiesbd(r)implies I satisfieshd(r). Itsatisfies a ruler ifI

(3)

satisfies every ground rule in ground(r). It satisfiesa logic program P if it satisfies every rule in the logic programP. In this case we callI amodelof a (ground) rule (resp., logic program). In the following we use “|=” to denote the satisfaction relation, and “≡” to denote the classical equivalence relation.

A ruler is applicable w.r.t. an interpretation I if I |= bd(r). Let P be a logic program. We denoteapp(P, I)the set of rules inPthat are applicablew.r.t.I.

Definition 1. LetPbe a disjunctive logic program. Theimmediate consequence oper- atorT_P^d : 2^A→2²^Ais defined as follows, forI⊆ A,

T_P^d(I) ={S|Sis a minimal (under set inclusion) model of hd(app(P, I))}. (6) Please note that the operatorT_P^d is a generalization to the operatorT_P for normal logic programs [12], and it is similar to the operatorT_P^ndfor logic programs with abstract constraint atoms [13].

3 Two resolutions

To extend the learning algorithm for normal logic programs in [6] to disjunctive logic programs, we extend its ground resolution for disjunctive rules and present a combined resolution. Recall that a literallis either an atom or its classical negation. The comple- ment ofl, writtenl, is defined asA=¬Aand¬A=AwhereAis an atom. For a set Sof atoms, we denote¬S={¬A|A∈S}.

Definition 2 (disjunctively ground resolution).Letrandr⁰be two ground rules. The rulerisdisjunctively ground resolvable w.r.t.r⁰on a literallwhenever

(a) l∈bd(r)andl∈bd(r⁰), (b) bd(r⁰)\ {l} ⊆bd(r)\ {l}, and (c) hd(r⁰)⊆hd(r).

The disjunctive ground resolvent ofrw.r.t.r⁰onlis the rule hd(r)←bd(r)\ {l}. We denote it bygr(r, r⁰). In particular, if the above condition (b) is strengthen to

bd(r⁰)\ {l}=bd(r)\ {l} (7) then we say thatrisdisjunctively naive resolvablew.r.t.r⁰onl.

The following proposition shows that the disjunctive ground resolution preserves the equivalence of theT_P^d operator in terms ofT_P^d(I) =T_P^d0(I)for everyI⊆ Awhere P⁰is obtained fromP by adding some disjunctive ground resolvent.

Proposition 1. Let P be a ground logic program containing two disjunctive rulesr andr⁰ such thatris disjunctively ground resolvable w.r.t.r⁰ on a literall, andQ = P∪ {gr(r, r⁰)}. Then hd(app(P, I))≡hd(app(Q, I))for everyI⊆ A.

(4)

Definition 3 (combined resolution).Letr₁, . . . , r_kandrbe the following rules:

r1:hd(r1)←bd⁺∪ ¬(bd⁻∪ {b1}), ...

rk :hd(rk)←bd⁺∪ ¬(bd⁻∪ {bk}), r:hd(r)←bd⁺∪ {bi|1≤i≤k} ∪ ¬bd⁻⁰ such that

– bd⁻∩ {bi|1≤i≤k}=∅, and – hd(r_i)⊆hd(r)for everyi(1≤i≤k).

Then thecombined resolventofr, r1, . . . , rk, writtencr(r, r1, . . . , rk), is the rule r^∗:hd(r)←bd⁺∪ ¬(bd⁻∪bd⁻⁰). (8) In this case we say that the rulesr, r₁, . . . , r_karecombined resolvable.

The next proposition shows that, similar to the disjunctive ground resolution, the combined resolution preserves the equivalence of theT_P^doperator as well.

Proposition 2. Let P be a logic program containing rulesr, r1, . . . , rk such thatr, r1,. . .,rk are combined resolvable, andQ = P ∪ {cr(r, r1, . . . , rk)}. It holds that hd(app(P, I))≡hd(app(Q, I))for anyI⊆ A.

4 Learning from 1-step Transitions

In the section we present our inductive learning task for disjunctive logic programs and its learning algorithm. Properties of the algorithm are investigated as well.

4.1 The Learning Task

Abackground theoryis a logic program. Anexample(orobservation) is astate transition(orinterpretation transition),i.e., a tuplehI, JiwithI ⊆ AandJ ⊆ A, which means that the stateJ is a candidate successor of the stateIin a Boolean network, or J ∈T_P^d(I)for a disjunctive logic programP. LetEbe a set of examples. We denote Eⁱ ={I | hI, Ji ∈ E},E^o ={J | hI, Ji ∈ S}andE(I) = {J | hI, Ji ∈ E}for I⊆ A. The setEistotalwheneverEⁱ= 2^A.

Definition 4 (the learning task). An inductive learning task from nondeterministic state transitions is, given a background theoryBand a setEof examples (state transitions), to find a hypothesis (a logic program)Hsuch that, for every examplehI, Ji ∈E, J ∈T_B∪H^d (I)holds.

(5)

The above inductive learning task is written asILT(B, E). Such a hypothesisHto the inductive learning task is called asolutiontoILT(B, E). For our learning purpose, the given examples have to be restricted. For instance, letE = {h∅,∅i,h∅,{p}i}and B =∅. There will be no logic programH satisfying{∅,{p}} ⊆T_B∪H^d (∅), since the sets in the collectionT_B∪H^d (∅)are incomparable under set inclusion, while∅and{p}

are comparable under set inclusion.

A set E of state transitions iscoherent if J andJ⁰ are incomparable under set inclusion for every hI, JiandhI, J⁰iin E,i.e.,J and J⁰ are all minimal under set inclusion. A setEof state transitions isconsistent w.r.t.a logic programP, if for each hI, Ji ∈E,I|=bd(r)impliesJ |=hd(r)holds for every rulerinP.

The following property identifies the sufficient and necessary condition for the ex- istence of a solution to an inductive learning task.

Proposition 3. Given an inductive learning task ILT(B, E)whereBis a background theory andEis a set of observations, there exists a solutionHto ILT(B, E)if and only ifEis coherent andEis consistent w.r.t.B.

4.2 An Inductive Learning Algorithm

In the following we present a bottom-up method to compute a logic program for our inductive learning tasks. This method generates hypothesis by generalization from the most specific rules until all examples are covered.

Firstly, letq∈ AandI⊆ A. We denoter^I_q the following rule:

q←I∪ ¬I (9)

which is the most specific normal rule such that qbelongs to a candidate successor of the state I. Now theLFDTalgorithm is showed in Algorithm 1. Intuitively, this algorithm is to construct the following rules for these examples with the same first state in the state transitionshI, J1i, . . . ,hI, JmiofE:

H ←I∪ ¬I, H is a minimal hitting set ofJ1, . . . , Jm. (10) The algorithmAddRule, shown in Algorithm 2, adds these rules into the result. It also simplifies the result by removing being subsumed rules through disjunctive ground resolution and combined resolution. Since disjunctive ground resolution is a generalization of ground resolution, this algorithm is also a generalization of the algorithmLF1T in [6], which learns normal logic programs from(deterministic) state transitions,i.e., I₁6=I₂for any two distinct state transitionshI1, J₁iandhI2, J₂iinE.

LetP be a logic program, and E be a coherent set of state transitions which is consistentw.r.t.a background theoryB. The logic programP iscompleteforE w.r.t.

B if{J | hI, Ji ∈ E} ⊆T_B∪P^d (I)for anyI ∈ Eⁱ, it issoundforEifT_B∪P^d (I)⊆ {J | hI, Ji ∈E}for anyI∈Eⁱ. A learning algorithm iscomplete(resp.sound) forE w.r.t.B if its output is complete (resp. sound) forEw.r.t.B. In the following we show the correctness of theLFDTalgorithm according to its soundness and completeness.

Theorem 1. The algorithmLFDTis sound and complete (with disjunctive ground resolution, combined resolution, and/or subsumption reduction). Namely, ifEis coherent and B is consistent w.r.t.E then the output P by the algorithmLFDT is sound and complete forEw.r.t.B.

(6)

Algorithm 1: LFDT(E, B)

Input:A setEof state transitions and a background theoryBsuch thatEis coherent and it is consistentw.r.t.B

Output:A logic programP

1 P←B;

2 foreachhI, Ji ∈Edo

3 Q← {rq^I|q∈J};

4 E←E\ {hI, Ji};

5 foreachhI⁰, J⁰i ∈EwithI⁰=Ido

6 E←E\ {hI⁰, J⁰i};

7 foreachp∈J⁰andr∈Qdo Q←Q∪ {hd(r)∪ {p} ←bd(r)};

8 ;

9 end

10 foreachr∈Qdo AddRule(r, P);

11 ;

12 end

13 P←P\B;

14 returnP;

5 Concluding Remarks and Future Work

In this paper we proposed a new frameworkLFDTfor learning disjunctive programs from interpretation transitions. It is a nontrivial and substantial extension to theLF1T framework. One remaining challenge work is to apply the learning approach to practical domains, such as bio-informatics for whichLF1Tis successfully applied.

Acknowledgement. We thank reviewers for their helpful comments. This work is partially supported by NSFC under grant 63170161, Stadholder Fund of Guizhou Province under grant (2012)62, Outstanding Young Talent Training Fund of Guizhou Province under grant (2015)01 and Science and Technology Fund of Guizhou Province under grant [2014]76400, Scientific and Technological Research Program of Chongqing Municipal Education Commission under Grant No. KJ1601129.

References

1. Stephen Muggleton and Luc De Raedt. Inductive logic programming: Theory and methods.

Journal of Logic Programming, 19/20:629–679, 1994.

2. Stephen Muggleton, Luc De Raedt, David Poole, Ivan Bratko, Peter A. Flach, Katsumi Inoue, and Ashwin Srinivasan. ILP turns 20 - biography and future challenges.Machine Learning, 86(1):3–23, 2012.

3. S.A. Kauffman. Metabolic stability and epigenesis in randomly constructed genetic nets.

Journal of Theoretical Biology, 22(3):437 – 467, 1969.

4. Stuart A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution.

Oxford: Oxford University Press, 1993.

5. Katsumi Inoue. Logic programming for boolean networks. In Toby Walsh, editor,IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, pages 924–930. IJCAI/AAAI, 2011.

(7)

Algorithm 2: AddRule(r, P) Input:A rulerand a logic programP

1 if∃r⁰∈Ps.t.r⁰≺rthen return;

2 ;

3 foreachr⁰∈Pdo ifr≺r⁰thenP←P\ {r⁰};

4 ;

5 ;

6 P←P∪ {r};

7 whiler, r1, . . . , rk∈Pare combined resolvabledo AddRule(cr(r, r1, . . . , rk));

8 ;

9 foreachr⁰∈Pdo

10 ifris disjunctively ground resolvable w.r.t.r⁰then

11 AddRule(gr(r, r⁰), P);

12 else ifr⁰is disjunctively ground resolvable w.r.t.rthen

13 AddRule(gr(r⁰, r), P);

14 end

15 end

16 returnP;

6. Katsumi Inoue, Tony Ribeiro, and Chiaki Sakama. Learning from interpretation transition.

Machine Learning, 94(1):51–79, 2014.

7. Maxime Folschette, Lo¨ıc Paulev´e, Katsumi Inoue, Morgan Magnin, and Olivier H. Roux.

Identification of biological regulatory networks from process hitting models. Theoretical Computer Science, 568:49–71, 2015.

8. David Mart´ınez, Tony Ribeiro, Katsumi Inoue, Guillem Aleny`a, and Carme Torras. Learning probabilistic action models from interpretation transitions. In Marina De Vos, Thomas Eiter, Yuliya Lierler, and Francesca Toni, editors,Proceedings of the Technical Communications of the 31st International Conference on Logic Programming (ICLP 2015), Cork, Ireland, August 31 - September 4, 2015., volume 1433 of CEUR Workshop Proceedings. CEUR- WS.org, 2015.

9. Michael Gelfond and Vladimir Lifschitz. Classical negation in logic programs and disjunctive databases.New Generation Computing, 9:365–385, 1991.

10. Jorge Lobo, Jack Minker, and Arcot Rajasekar. Foundations of disjunctive logic programming. Logic Programming. MIT Press, 1992.

11. J. W. Lloyd.Foundations of Logic Programming. Springer-Verlag, 2nd edition, 1987.

12. Maarten H. van Emden and Robert A. Kowalski. The semantics of predicate logic as a programming language. Journal of the ACM, 23(4):733–742, 1976.

13. Victor W. Marek and Miroslaw Truszczynski. Logic programs with abstract constraint atoms.

InProceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence (AAAI 2004), pages 86–91, San Jose, California, USA, 2004. AAAI Press.

14. Gordon D. Plotkin. A note on inductive generalisation. In B. Meltzer and D. Michie, editors, Machine Intelligence, volume 5, chapter 8, pages 153–163. Edingburgh Unviersity Press, 1970.