Ontology mutation testing

(1)

February 3, 2016 Cesare Bartolini

Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg

(2)

Outline

1 Mutation testing

2 Mutant generation

3 OWL ontologies

4 OWL mutation testing

5 Validation

(4)

What is testing

- Verifying the conformance of the System Under Test (SUT) to its requirements
- Many properties to verify
  - Correctness
  - Performance
  - Security
  - ...
- Many different ways of testing
- Requires a Test Suite (TS)
- Manual, automated, test factory...

(5)

Testing 101

Figure: How to run a test

(6)

Testing 102

Figure: How standard testing works

(7)

Mutation testing

(8)

Mutation essentials

Figure: Mutation testing process

(9)

Step 1: normal test suite run

- Use the unmodified SUT ("golden")
- Run the test suite TS
  - Right or wrong doesn’t matter!
- Store the output R₀ in some format
  - Text, XML, binary...

Important!

Tests should not fail (i.e., break execution) against the "golden" SUT.

Consequently

It’s important to fix the TS first.

(12)

Step 2: generate the mutants

- Start from the ground string ("golden" SUT)
- Apply the mutation operators
- Remove equivalent mutants (optional)
- Reduce the number of mutants (optional)
- Store the mutated SUTs
- Have n mutants at the end

(13)

Step 3: mutant runs

- Batch runner
  - Fetches a mutant
  - Runs TS against the mutant
  - Stores the results R₁, ..., Rₙ
  - Rinse & repeat

Complexity

Mutation testing can be computationally very expensive. Think of a TS with 100 tests run over code that generates 10K mutants: that is 100 × 10,000 = 1,000,000 test executions.

(15)

Step 4: check the outputs

- Oracle compares results
  - R₁, ..., Rₙ against R₀
  - Comparison may be difficult
- Results differ: mutant is killed
- Results do not differ: mutant is alive
- Best result: 100% killed mutants

Important!

Tests may fail (i.e., execution breaks) against the mutant. The result is different from the "golden" anyway.

- E.g., if the mutant introduces an infinite loop
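Steps 1 to 4 can be pictured with a minimal Java sketch. Everything in it is an assumption made for illustration: the SUT is reduced to a function on integers, the TS to an array of inputs, and the class and method names (`MutationRun`, `runSuite`, `countKilled`) are hypothetical. A crash is recorded as an output, so a mutant that breaks execution differs from the "golden" run; infinite loops are not handled here and would need a timeout in a real runner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

/** Toy mutation-testing run: the SUT is a function, the TS is a list of inputs. */
final class MutationRun {

    /** Runs every test input against one SUT variant and stores the outputs as text. */
    static List<String> runSuite(IntUnaryOperator sut, int[] testInputs) {
        List<String> results = new ArrayList<>();
        for (int input : testInputs) {
            try {
                results.add(String.valueOf(sut.applyAsInt(input)));
            } catch (RuntimeException e) {
                // A broken execution is still a result, and it differs from a clean golden run.
                results.add("FAILED: " + e.getClass().getSimpleName());
            }
        }
        return results;
    }

    /** Counts the mutants whose results differ from the golden results R0. */
    static int countKilled(IntUnaryOperator golden,
                           List<IntUnaryOperator> mutants,
                           int[] testInputs) {
        List<String> r0 = runSuite(golden, testInputs);
        int killed = 0;
        for (IntUnaryOperator mutant : mutants) {
            if (!runSuite(mutant, testInputs).equals(r0)) {
                killed++;
            }
        }
        return killed;
    }
}
```

A mutant that returns the same values as the golden function on every test input stays alive in this sketch, which is exactly the case that calls for either a new test or a closer look at the SUT.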

(17)

Step 5: and now?

- Mutation testing tells me how good my test suite is
  - Find patterns of live mutants
- But it can also give me insights into the SUT
- Example: mutants alive because path not covered
  - Reason 1: missing a test in the TS (must add tests)
  - Reason 2: unreachable code (must modify the SUT)
- Analysis can be complex
- Generally used for unit testing

(19)

How mutant generation works

- Based on error testing or fault testing
- Hypothesis: the original SUT is correct
- Inject an error in the code
  - A single error
  - E.g., remove a semicolon
- Each error injection is a separate mutant
- Alive mutant ⇔ TS cannot detect the error
  - Specific tests should be added

(20)

Semantic mutant generation

- Traditional mutant generation is syntactic
- Mutation can also operate on the semantics
  - E.g., + changed to -
- The system is still formally correct
- But now it should behave differently from the "golden"
- If it doesn’t, then
  - TS doesn’t even go there, or
  - TS goes there but code is irrelevant

This can be an error in the code or in the test suite!

(21)

Typical mutation operations

- Remove statement
- Change variable type
- Change unary operators
- Change arithmetical operators
- Change comparison operators
- Change logical operators
- Reverse conditions
- Reverse then and else branches
- Change 1 into 0
- ...

Important!

Never use random changes.
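As an illustration of a deterministic operator, the sketch below generates one mutant per single replacement of an arithmetic operator. It is a simplification under stated assumptions: it works on raw source text, so it would also match a `+` inside `++` or inside a string literal, whereas real mutation tools typically work on a parsed representation. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Generates one mutant per single operator replacement in a source string. */
final class OperatorMutator {

    /**
     * Replaces each occurrence of {@code from} with {@code to}, one occurrence
     * per mutant, so that every mutant contains exactly one injected change.
     */
    static List<String> mutate(String source, String from, String to) {
        if (from.isEmpty()) {
            throw new IllegalArgumentException("operator to replace must not be empty");
        }
        List<String> mutants = new ArrayList<>();
        int index = source.indexOf(from);
        while (index >= 0) {
            mutants.add(source.substring(0, index) + to
                    + source.substring(index + from.length()));
            index = source.indexOf(from, index + from.length());
        }
        return mutants;
    }
}
```

Calling `OperatorMutator.mutate("a + b + c", "+", "-")` yields the two mutants `a - b + c` and `a + b - c`: the same input always gives the same mutants, and each mutant carries a single error.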

(23)

Equivalent mutants

- Mutants are supposed to be different
- Two different mutants might behave identically

Example

for (int i = 1; i < n; i++)  // "golden"
for (int i = 0; i < n; i++)  // Mutant 1
for (int i = 1; i <= n; i++) // Mutant 2

Techniques exist to detect equivalent mutants.
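The three loop headers can be compared with a small sketch, assuming a loop body that only counts iterations and never reads `i` (the class and method names are hypothetical). Under that assumption the two mutants run the same number of iterations, so no test can tell them apart; with a body that uses `i`, they could differ.

```java
/** Counts loop iterations for the golden loop and the two mutants of the example. */
final class LoopMutants {

    static int golden(int n) {
        int count = 0;
        for (int i = 1; i < n; i++) {
            count++;
        }
        return count;
    }

    static int mutant1(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) { // start index changed from 1 to 0
            count++;
        }
        return count;
    }

    static int mutant2(int n) {
        int count = 0;
        for (int i = 1; i <= n; i++) { // comparison changed from < to <=
            count++;
        }
        return count;
    }
}
```

For small positive `n`, both mutants perform one more iteration than the golden loop, so a test on the iteration count kills both, yet keeping both in the mutant set adds cost without adding information.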


(25)

Too many mutants?

- If TS is changed, mutation testing should be redone
  - Possibly too much computation
- It may be necessary to further reduce the number of mutants
  - Heuristics or algorithms such as Category Partition

(27)

What is the Web Ontology Language (OWL)?

(28)

OWL essentials

- Knowledge representation
  - Ontologies are descriptions of a knowledge domain
  - RDF is too low-level
  - OWL derives from DAML+OIL
- Representation of real-world objects
- Ontologies do not define anything
  - Objects are defined in the domain itself
- Ontologies describe relations
  - By means of axioms

(29)

OWL classification

Syntax

Abstract modelling with no mandatory syntax. Possibilities:

- RDF/XML (standard, XML-based, W3C)
- OWL/XML (uses own tagset, XML-based, W3C)
- Manchester (highly descriptive, almost textual)
- Turtle (descriptive, similar to SPARQL syntax)
- ...

Semantics (OWL 2)

- OWL Full
- OWL-DL
- Several profiles

Syntax and semantics are irrelevant for the present work.

(32)

OWL structure

- Entities (named or anonymous)
  - Classes
  - Individuals
  - Object properties
  - Data properties
  - Datatypes
  - Annotations
  - ...
- Axioms
  - Subclass
  - Domain
  - Range
  - Class assertion
  - ...

(34)

OWL mutation

(35)

OWL mutation testing basics

- The SUT is the ontology
- TS built for the ontology
  - E.g., SPARQL queries
- The tester must be able to run tests for the specific SUT
  - E.g., SPARQL engine

(36)

A more practical perspective

- The SUT is the software
- TS built for the software
  - E.g., input values for the program
- The tester only needs to run the software
  - E.g., batch execution

(37)

Differences

Testing the ontology

- Deeper analysis of the ontology
- Harder to develop tests (no specific functionality)
- Harder to say when the output is wrong
- Harder to compare results (ask later)
- The testing setup is more complex because OWL does not execute

Testing the software

- Focus only on the software requirements
- Plenty of test generation methodologies
- Outputs are clearer
- Easy to compare outputs (the software has an output format)
- The testing setup must only invoke the program

(39)

Mutation operators: categories

- Five categories of operators
  - Entities in general (E)
  - Classes (C)
  - Object properties (O)
  - Data properties (D)
  - Named individuals (I)

(40)

Mutation operators

Operator  Description                           Applies to
ERE       Remove the entity and all its axioms  Any entity
ERL       Remove entity labels                  Any entity
ECL       Change label language                 Any entity
CRS       Remove a single subclass axiom        Class
CSC       Swap the class with its superclass    Class
CRD       Remove disjoint class                 Class
CRE       Remove equivalent class               Class
OND       Remove a property domain              Object property
ONR       Remove a property range               Object property
ODR       Change property domain to range       Object property
ORD       Change property range to domain       Object property
ODP       Assign domain to superclass           Object property
ODC       Assign domain to subclass             Object property
ORP       Assign range to superclass            Object property
ORC       Assign range to subclass              Object property
ORI       Remove inverse property               Object property
DAP       Assign property to superclass         Data property
DAC       Assign property to subclass           Data property
DRT       Remove data type                      Data property
IAP       Assign to superclasses                Named individual

(41)

Some examples

ERE operator

- Completely removes an entity
- Also removes all axioms associated with it
- If it’s a class, its subclasses become subclasses of Thing

OND operator

- Removes a domain from an object property
- The object property actually expands its domain
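The effect of these two operators can be sketched on a toy axiom model. This is not the OWL API and not the tool's actual code: `ToyAxiom` and `ToyOperators` are hypothetical names, and an axiom is reduced to a kind, a subject and an object, all plain strings. Removing a class drops the subclass axioms that mention it, which leaves its former subclasses with no asserted superclass, i.e. directly under Thing.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy axiom (hypothetical model): a kind such as "SubClassOf", a subject and an object. */
final class ToyAxiom {
    final String kind;
    final String subject;
    final String object;

    ToyAxiom(String kind, String subject, String object) {
        this.kind = kind;
        this.subject = subject;
        this.object = object;
    }
}

final class ToyOperators {

    /** ERE: the mutant keeps only the axioms that do not mention the entity. */
    static List<ToyAxiom> removeEntity(List<ToyAxiom> ontology, String entity) {
        List<ToyAxiom> mutant = new ArrayList<>();
        for (ToyAxiom axiom : ontology) {
            if (!axiom.subject.equals(entity) && !axiom.object.equals(entity)) {
                mutant.add(axiom);
            }
        }
        return mutant;
    }

    /** OND: the mutant drops one domain axiom of the given property. */
    static List<ToyAxiom> removeDomain(List<ToyAxiom> ontology, String property, String domain) {
        List<ToyAxiom> mutant = new ArrayList<>();
        for (ToyAxiom axiom : ontology) {
            boolean isTarget = axiom.kind.equals("Domain")
                    && axiom.subject.equals(property)
                    && axiom.object.equals(domain);
            if (!isTarget) {
                mutant.add(axiom);
            }
        }
        return mutant;
    }
}
```

Both methods return a new list and leave the input untouched, so the "golden" ontology can be reused to generate the next mutant.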


(44)

Experimental setup

- Programming language: Java 7+
  - Just because I haven’t learnt lambda expressions yet
- Mutant generator: based on OWL API 4
- SUT is the OWL ontology in RDF/XML format
- TS is a set of SPARQL queries
- Query engine: based on Apache Jena/ARQ
- https://github.com/guerret/lu.uni.owl.mutatingowls

Why two libraries?

I had already developed a tool for operating on ontologies using OWL API, but OWL API does not manage SPARQL.


(46)

How the testing works

1. Generate the mutants
   1.1 Why this step first?
2. Run all queries on "golden" ontology
3. Store the results (not as text)
4. For each mutant:
   4.1 Run all queries on the mutant
   4.2 Compare against the "golden" results
   4.3 Reset the "golden" results
   4.4 Store if the mutant is killed or alive
5. Output a detailed report

(47)

Result comparison

- Mutation testing normally compares text
- SPARQL results may have a different order of the output
  - Text is not an option
- Better to compare the mutants one by one
  - Too much space needed to store all results
- Jena/ARQ has an order-neutral comparison method
  - ResultSetCompare.equalsByTerm
- But I must reset the "golden" results after each comparison
  - By default, parsing "consumes" the data

Why this?
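One way to picture both points, order-neutral comparison and the need to reset, is the plain-Java sketch below. It is a stand-in and does not use Jena: result rows are assumed to be strings, a result set is assumed to behave like a one-pass `Iterator`, and `RowComparison` is a hypothetical name. Comparing drains both iterators, so the "golden" side is empty on the next comparison unless a fresh iterator is obtained, which plays the role of the reset here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Order-neutral comparison of two one-pass row sequences. */
final class RowComparison {

    /** Drains the iterator into a sorted copy, so the original row order no longer matters. */
    static List<String> drainSorted(Iterator<String> rows) {
        List<String> copy = new ArrayList<>();
        while (rows.hasNext()) {
            copy.add(rows.next());
        }
        Collections.sort(copy);
        return copy;
    }

    /** True when both sides hold the same rows with the same multiplicities. */
    static boolean sameRows(Iterator<String> left, Iterator<String> right) {
        return drainSorted(left).equals(drainSorted(right));
    }
}
```

Reusing an already drained iterator for the golden results would make every non-empty mutant result look different, that is, every such mutant would be wrongly reported as killed.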


(49)

Example

- Tried to reuse existing material to avoid bias
- Reference SUT: the pizza ontology
  - http://protege.stanford.edu/ontologies/pizza/pizza.owl
- Set of SPARQL queries: not immediately available
  - Found https://code.google.com/p/twouse/wiki/SPARQLASExamples
  - Had to convert back to SPARQL
  - Very minimal, had to introduce two additional tests

(50)

Results

Operator         Mutants killed  Total mutants  Percentage
ERE                         108            112       96.43
ERL                          95             95      100.00
ECL                          95             95      100.00
CRS                         255            255      100.00
CSC                          83             83      100.00
CRD                         471            753       62.55
CRE                          41             41      100.00
OND                           0              6        0.00
ONR                           0              7        0.00
ODR                           0              6        0.00
ORD                           0              7        0.00
ODP                           0              6        0.00
ODC                           1            250        0.40
ORP                           0              7        0.00
ORC                           1            253        0.40
IRT                           0             10        0.00
Other operators               0              0        0.00
Total                      1150           1986       57.62
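The per-operator percentages are the share of killed mutants among the generated ones. A minimal sketch of that computation follows, assuming two-decimal formatting and a value of 0.00 when an operator generates no mutants (as in the "Other operators" row); `MutationScore` is a hypothetical name.

```java
import java.util.Locale;

/** Percentage of killed mutants for one operator. */
final class MutationScore {

    /** Returns 100 * killed / total with two decimals, or "0.00" when there are no mutants. */
    static String percentage(int killed, int total) {
        double pct = (total == 0) ? 0.0 : 100.0 * killed / total;
        return String.format(Locale.ROOT, "%.2f", pct);
    }
}
```

For instance, `MutationScore.percentage(108, 112)` gives the 96.43 of the ERE row and `MutationScore.percentage(471, 753)` gives the 62.55 of the CRD row.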

(51)

Some preliminary analyses

Considerations on the TS

- The TS mainly covers the class hierarchy
  - More tests needed for properties and individuals
- Tests cover only a branch of the class hierarchy
  - Tests needed for the rest

Considerations on the SUT

- Some object properties are not used anywhere
  - This might mean they are irrelevant

(52)

Future developments

- Full-fledged test suite
- Using both the ontology and the software relying on it as SUT
- Extend the set of mutation operators, e.g.:
  - Change the OWL cardinality constraints
  - Operate on annotations other than labels
- Algorithms to reduce the complexity (e.g., detect equivalents)
- Add a new, "structural" level of mutation (unique to ontologies), e.g.:
  - Change a subclass axiom into an object property
  - Create named classes from unnamed ones
  - Split intersections into separate entities
  - ...

This work will be presented at the AMARETTO workshop, co-located with the MODELSWARD conference, on February 19.
