Learning Disjunctive Logic Programs from Interpretation Transition
Yi Huang1,3, Yisong Wang1,?, Ying Zhang2, and Mingyi Zhang2
1Guizhou University, Guiyang 550025, P. R. China
2Guizhou Academy of sciences, Guiyang 550001, P. R. China
3Chongqing University of Arts and Sciences, Chongqing 402160, P. R. China
Abstract. We present a new framework for learning disjunctive logic programs from interpretation transitions, calledLFDT. It is a nontrivial extension to Inoue, Ribeiro and Sakama’sLF1Tlearning framework, which learns normal logic pro- grams from interpretation transitions. Two resolutions for disjunctive rules are also presented and used inLFDTto simplify learned disjunctive rules.
1 Introduction
In machine learning and specifically inductive logic programming [1, 2], it is an im- portant task to learn the dynamics of complex systems, such as Boolean networks. A Boolean network consist of a set of Boolean variables each of which has a Boolean function. It is a simple yet a powerful mathematical tool to describe dynamics of com- plex systems [3, 4].
Since a seminal work on representing Boolean networks by logic programs [5], there is increasing interest in approaching the task from the perspective of logic pro- gramming [6–8]. In particular, Inoue, Ribeiro and Sakama proposed a novel frame- work, namedLF1T, for learning normal logic programs from interpretation transitions that are pairs hI, Jiof interpretations withTP(I) = J, whereTP is the immediate consequence operator for a normal logic programP [6].
It is well-known that disjunctive logic programs [9, 10] are substantially more ex- pressive than normal logic programs at many aspects. To extend LF1Tfor learning disjunctive logic programs from interpretation transitions, we present a new immediate consequence operatorTPdfor a disjunctive logic programP. Informally,TPd(I)consists of all minimal models of the heads of rules inPwhose bodies are satisfied byI.
Comparing withLF1Tframework for learning normal logic programs, a nontrivial work in the paper is to handle with nondeterministic interpretation transitions,i.e., it is possible there are two interpretation transitionshI, JiandhI, J0iin an observation withJ 6=J0. Combining with two new proposed resolutions for disjunctive rules, we achieve the new framework for learning disjunctive logic programs from interpretation transitions, calledLFDT. It is proved being both sound and complete.
?Corresponding author, [email protected].
2 Disjunctive Logic Programs
We assume a underlying first-order language without proper function symbolsLand denote its Herbrand base (the set of all ground atoms) byA. We assumeAis finite for our learning purpose.
A(disjunctive) logic programis a finite set of(disjunctive) rulesof the form A1∨ · · · ∨Ak←B1∧ · · · ∧Bm∧ ¬C1∧ · · · ∧ ¬Cn, (1) wherek≥1, m≥0, n≥0andAi(1≤i≤k), Bj(1≤j≤m), andCs(1≤s≤n) are atoms ofL.
For a rulerof the form (1), thehead ofr, written hd(r), isA1∨ · · · ∨Ak; the bodyofr, writtenbd(r), is the conjunctionB1∧ · · · ∧Bm∧ ¬C1∧ · · · ∧ ¬Cn; the atoms occurring in the body ofrpositively (resp. negatively) is denoted bybd+(r) = {B1, . . . , Bm} (resp.bd−(r) = {C1, . . . , Cn}). If k = 1 then ris normal. Anor- mal logic programis a finite set of normal rules. Given a logic programP, we denote hd(P) =S
r∈Phd(r)andbd(P) =S
r∈Pbd(r). For convenience, we also writehd(r) as the set{A1, . . . , Ak}, andbd(r)as the set{B1, . . . , Bm,¬C1, . . . ,¬Cn}when there is no confusion. In this sense, a rulerof the form (1) can be alternatively written as
{A1, . . . , Ak} ← {B1, . . . , Bm,¬C1, . . . ,¬Cn}. (2) Asubstitutionθis a function from variables to terms, which is written in the fol- lowing form
{x1/t1, . . . , xn/tn} (3) wherexis(1≤i≤n)are pair-wise distinct variables andtis(1 ≤i≤n)are terms (of the languageL). Theapplicationeθofθto an expressioneis obtained fromeby simultaneously replacing all occurrences of each variablexi inewith the same term ti, andeθis called aninstanceofe[11]. The substitutionθisgroundifticontains no variables for everyxi/tiinθ. If the instance eθofecontains no variable then it is a ground instanceofe. Thegroundof a logic programP, writtenground(P), is the set S
r∈Pground(r), where
ground(r) ={rθ|rθis a ground instance ofr}. (4) For simplicity, we assume logic programs are always ground in the following of the paper, unless explicitly stated otherwise.
Letr1, r2be two rules. We say thatr1subsumesr2, writtenr1r2, if there exists a substitutionθsuch thathd(r1)θ⊆hd(r2)andbd(r1)θ⊆bd(r2). In this sense, we say thatr1ismore (or equally) general thanr2, andr2isless (or equally) general thanr1. Byr1≺r2we meanr1subsumesr2butr2does not subsumer1. For a logic program P, we denoteSR(P)the logic program obtained fromP by removing all rules that are properly subsumed by some other rules inP,i.e.,
SR(P) ={r∈P |6 ∃r0∈P s.t. r0≺r}. (5) A(Herbrand) interpretationIis a set of ground atoms. An interpretationIsatisfies a ground ruler ifI satisfiesbd(r)implies I satisfieshd(r). Itsatisfies a ruler ifI
satisfies every ground rule in ground(r). It satisfiesa logic program P if it satisfies every rule in the logic programP. In this case we callI amodelof a (ground) rule (resp., logic program). In the following we use “|=” to denote the satisfaction relation, and “≡” to denote the classical equivalence relation.
A ruler is applicable w.r.t. an interpretation I if I |= bd(r). Let P be a logic program. We denoteapp(P, I)the set of rules inPthat are applicablew.r.t.I.
Definition 1. LetPbe a disjunctive logic program. Theimmediate consequence oper- atorTPd : 2A→22Ais defined as follows, forI⊆ A,
TPd(I) ={S|Sis a minimal (under set inclusion) model of hd(app(P, I))}. (6) Please note that the operatorTPd is a generalization to the operatorTP for normal logic programs [12], and it is similar to the operatorTPndfor logic programs with ab- stract constraint atoms [13].
3 Two resolutions
To extend the learning algorithm for normal logic programs in [6] to disjunctive logic programs, we extend its ground resolution for disjunctive rules and present a combined resolution. Recall that a literallis either an atom or its classical negation. The comple- ment ofl, writtenl, is defined asA=¬Aand¬A=AwhereAis an atom. For a set Sof atoms, we denote¬S={¬A|A∈S}.
Definition 2 (disjunctively ground resolution).Letrandr0be two ground rules. The rulerisdisjunctively ground resolvable w.r.t.r0on a literallwhenever
(a) l∈bd(r)andl∈bd(r0), (b) bd(r0)\ {l} ⊆bd(r)\ {l}, and (c) hd(r0)⊆hd(r).
The disjunctive ground resolvent ofrw.r.t.r0onlis the rule hd(r)←bd(r)\ {l}. We denote it bygr(r, r0). In particular, if the above condition (b) is strengthen to
bd(r0)\ {l}=bd(r)\ {l} (7) then we say thatrisdisjunctively naive resolvablew.r.t.r0onl.
The following proposition shows that the disjunctive ground resolution preserves the equivalence of theTPd operator in terms ofTPd(I) =TPd0(I)for everyI⊆ Awhere P0is obtained fromP by adding some disjunctive ground resolvent.
Proposition 1. Let P be a ground logic program containing two disjunctive rulesr andr0 such thatris disjunctively ground resolvable w.r.t.r0 on a literall, andQ = P∪ {gr(r, r0)}. Then hd(app(P, I))≡hd(app(Q, I))for everyI⊆ A.
Definition 3 (combined resolution).Letr1, . . . , rkandrbe the following rules:
r1:hd(r1)←bd+∪ ¬(bd−∪ {b1}), ...
rk :hd(rk)←bd+∪ ¬(bd−∪ {bk}), r:hd(r)←bd+∪ {bi|1≤i≤k} ∪ ¬bd−0 such that
– bd−∩ {bi|1≤i≤k}=∅, and – hd(ri)⊆hd(r)for everyi(1≤i≤k).
Then thecombined resolventofr, r1, . . . , rk, writtencr(r, r1, . . . , rk), is the rule r∗:hd(r)←bd+∪ ¬(bd−∪bd−0). (8) In this case we say that the rulesr, r1, . . . , rkarecombined resolvable.
The next proposition shows that, similar to the disjunctive ground resolution, the combined resolution preserves the equivalence of theTPdoperator as well.
Proposition 2. Let P be a logic program containing rulesr, r1, . . . , rk such thatr, r1,. . .,rk are combined resolvable, andQ = P ∪ {cr(r, r1, . . . , rk)}. It holds that hd(app(P, I))≡hd(app(Q, I))for anyI⊆ A.
4 Learning from 1-step Transitions
In the section we present our inductive learning task for disjunctive logic programs and its learning algorithm. Properties of the algorithm are investigated as well.
4.1 The Learning Task
Abackground theoryis a logic program. Anexample(orobservation) is astate tran- sition(orinterpretation transition),i.e., a tuplehI, JiwithI ⊆ AandJ ⊆ A, which means that the stateJ is a candidate successor of the stateIin a Boolean network, or J ∈TPd(I)for a disjunctive logic programP. LetEbe a set of examples. We denote Ei ={I | hI, Ji ∈ E},Eo ={J | hI, Ji ∈ S}andE(I) = {J | hI, Ji ∈ E}for I⊆ A. The setEistotalwheneverEi= 2A.
Definition 4 (the learning task). An inductive learning task from nondeterministic state transitions is, given a background theoryBand a setEof examples (state transi- tions), to find a hypothesis (a logic program)Hsuch that, for every examplehI, Ji ∈E, J ∈TB∪Hd (I)holds.
The above inductive learning task is written asILT(B, E). Such a hypothesisHto the inductive learning task is called asolutiontoILT(B, E). For our learning purpose, the given examples have to be restricted. For instance, letE = {h∅,∅i,h∅,{p}i}and B =∅. There will be no logic programH satisfying{∅,{p}} ⊆TB∪Hd (∅), since the sets in the collectionTB∪Hd (∅)are incomparable under set inclusion, while∅and{p}
are comparable under set inclusion.
A set E of state transitions iscoherent if J andJ0 are incomparable under set inclusion for every hI, JiandhI, J0iin E,i.e.,J and J0 are all minimal under set inclusion. A setEof state transitions isconsistent w.r.t.a logic programP, if for each hI, Ji ∈E,I|=bd(r)impliesJ |=hd(r)holds for every rulerinP.
The following property identifies the sufficient and necessary condition for the ex- istence of a solution to an inductive learning task.
Proposition 3. Given an inductive learning task ILT(B, E)whereBis a background theory andEis a set of observations, there exists a solutionHto ILT(B, E)if and only ifEis coherent andEis consistent w.r.t.B.
4.2 An Inductive Learning Algorithm
In the following we present a bottom-up method to compute a logic program for our inductive learning tasks. This method generates hypothesis by generalization from the most specific rules until all examples are covered.
Firstly, letq∈ AandI⊆ A. We denoterIq the following rule:
q←I∪ ¬I (9)
which is the most specific normal rule such that qbelongs to a candidate successor of the state I. Now theLFDTalgorithm is showed in Algorithm 1. Intuitively, this algorithm is to construct the following rules for these examples with the same first state in the state transitionshI, J1i, . . . ,hI, JmiofE:
H ←I∪ ¬I, H is a minimal hitting set ofJ1, . . . , Jm. (10) The algorithmAddRule, shown in Algorithm 2, adds these rules into the result. It also simplifies the result by removing being subsumed rules through disjunctive ground resolution and combined resolution. Since disjunctive ground resolution is a generaliza- tion of ground resolution, this algorithm is also a generalization of the algorithmLF1T in [6], which learns normal logic programs from(deterministic) state transitions,i.e., I16=I2for any two distinct state transitionshI1, J1iandhI2, J2iinE.
LetP be a logic program, and E be a coherent set of state transitions which is consistentw.r.t.a background theoryB. The logic programP iscompleteforE w.r.t.
B if{J | hI, Ji ∈ E} ⊆TB∪Pd (I)for anyI ∈ Ei, it issoundforEifTB∪Pd (I)⊆ {J | hI, Ji ∈E}for anyI∈Ei. A learning algorithm iscomplete(resp.sound) forE w.r.t.B if its output is complete (resp. sound) forEw.r.t.B. In the following we show the correctness of theLFDTalgorithm according to its soundness and completeness.
Theorem 1. The algorithmLFDTis sound and complete (with disjunctive ground res- olution, combined resolution, and/or subsumption reduction). Namely, ifEis coherent and B is consistent w.r.t.E then the output P by the algorithmLFDT is sound and complete forEw.r.t.B.
Algorithm 1: LFDT(E, B)
Input:A setEof state transitions and a background theoryBsuch thatEis coherent and it is consistentw.r.t.B
Output:A logic programP
1 P←B;
2 foreachhI, Ji ∈Edo
3 Q← {rqI|q∈J};
4 E←E\ {hI, Ji};
5 foreachhI0, J0i ∈EwithI0=Ido
6 E←E\ {hI0, J0i};
7 foreachp∈J0andr∈Qdo Q←Q∪ {hd(r)∪ {p} ←bd(r)};
8 ;
9 end
10 foreachr∈Qdo AddRule(r, P);
11 ;
12 end
13 P←P\B;
14 returnP;
5 Concluding Remarks and Future Work
In this paper we proposed a new frameworkLFDTfor learning disjunctive programs from interpretation transitions. It is a nontrivial and substantial extension to theLF1T framework. One remaining challenge work is to apply the learning approach to practical domains, such as bio-informatics for whichLF1Tis successfully applied.
Acknowledgement. We thank reviewers for their helpful comments. This work is partially supported by NSFC under grant 63170161, Stadholder Fund of Guizhou Province under grant (2012)62, Outstanding Young Talent Training Fund of Guizhou Province under grant (2015)01 and Science and Technology Fund of Guizhou Province under grant [2014]76400, Scientific and Technological Research Program of Chongqing Municipal Education Commission under Grant No. KJ1601129.
References
1. Stephen Muggleton and Luc De Raedt. Inductive logic programming: Theory and methods.
Journal of Logic Programming, 19/20:629–679, 1994.
2. Stephen Muggleton, Luc De Raedt, David Poole, Ivan Bratko, Peter A. Flach, Katsumi Inoue, and Ashwin Srinivasan. ILP turns 20 - biography and future challenges.Machine Learning, 86(1):3–23, 2012.
3. S.A. Kauffman. Metabolic stability and epigenesis in randomly constructed genetic nets.
Journal of Theoretical Biology, 22(3):437 – 467, 1969.
4. Stuart A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution.
Oxford: Oxford University Press, 1993.
5. Katsumi Inoue. Logic programming for boolean networks. In Toby Walsh, editor,IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, pages 924–930. IJCAI/AAAI, 2011.
Algorithm 2: AddRule(r, P) Input:A rulerand a logic programP
1 if∃r0∈Ps.t.r0≺rthen return;
2 ;
3 foreachr0∈Pdo ifr≺r0thenP←P\ {r0};
4 ;
5 ;
6 P←P∪ {r};
7 whiler, r1, . . . , rk∈Pare combined resolvabledo AddRule(cr(r, r1, . . . , rk));
8 ;
9 foreachr0∈Pdo
10 ifris disjunctively ground resolvable w.r.t.r0then
11 AddRule(gr(r, r0), P);
12 else ifr0is disjunctively ground resolvable w.r.t.rthen
13 AddRule(gr(r0, r), P);
14 end
15 end
16 returnP;
6. Katsumi Inoue, Tony Ribeiro, and Chiaki Sakama. Learning from interpretation transition.
Machine Learning, 94(1):51–79, 2014.
7. Maxime Folschette, Lo¨ıc Paulev´e, Katsumi Inoue, Morgan Magnin, and Olivier H. Roux.
Identification of biological regulatory networks from process hitting models. Theoretical Computer Science, 568:49–71, 2015.
8. David Mart´ınez, Tony Ribeiro, Katsumi Inoue, Guillem Aleny`a, and Carme Torras. Learning probabilistic action models from interpretation transitions. In Marina De Vos, Thomas Eiter, Yuliya Lierler, and Francesca Toni, editors,Proceedings of the Technical Communications of the 31st International Conference on Logic Programming (ICLP 2015), Cork, Ireland, August 31 - September 4, 2015., volume 1433 of CEUR Workshop Proceedings. CEUR- WS.org, 2015.
9. Michael Gelfond and Vladimir Lifschitz. Classical negation in logic programs and disjunc- tive databases.New Generation Computing, 9:365–385, 1991.
10. Jorge Lobo, Jack Minker, and Arcot Rajasekar. Foundations of disjunctive logic program- ming. Logic Programming. MIT Press, 1992.
11. J. W. Lloyd.Foundations of Logic Programming. Springer-Verlag, 2nd edition, 1987.
12. Maarten H. van Emden and Robert A. Kowalski. The semantics of predicate logic as a programming language. Journal of the ACM, 23(4):733–742, 1976.
13. Victor W. Marek and Miroslaw Truszczynski. Logic programs with abstract constraint atoms.
InProceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence (AAAI 2004), pages 86–91, San Jose, California, USA, 2004. AAAI Press.
14. Gordon D. Plotkin. A note on inductive generalisation. In B. Meltzer and D. Michie, editors, Machine Intelligence, volume 5, chapter 8, pages 153–163. Edingburgh Unviersity Press, 1970.