VALIDATION & VERIFICATION

(1)

UNIV. RENNES 1, 2020-2021

MUTATION ANALYSIS

BENOIT COMBEMALE

PROFESSOR, UNIV. TOULOUSE, FRANCE

HTTP://COMBEMALE.FR

BENOIT COMBEMALE

PROFESSOR, UNIV. RENNES 1 & INRIA, FRANCE

HTTP://COMBEMALE.FR

(2)

V&V M2 ISTIC

A typical test case

public class CharSetTest {
    ...
    @Test
    public void testConstructor() {
        // Initializes the program
        CharSet set = CharSet.getInstance("a");
        // Triggers specific behavior
        CharRange[] array = set.getCharRanges();
        // Specifies the expected effects
        assertEquals("[a]", set.toString());
        assertEquals(1, array.length);
        assertEquals("a", array[0].toString());
    }
    ...
}

(3)

Test cases are expected to:

• Cover main requirements

• Stress the application

• Prevent regressions

• Reveal bugs

(4)

Test case quality is impacted by:

• How well the input has been chosen

• How strong the assertions are

(5)

How can we be sure that a test suite is adequate enough to find bugs?

(6)

Code coverage

• Most used approach

• Relatively “cheap” to compute

• Multiple effective implementations (JaCoCo, OpenClover, Cobertura)

• IDE integration out-of-the-box and via plugins

• Supported in Continuous Integration servers and GitHub

(7)

Example

long fact(int n) {
    if (n == 0) {
        return 1;
    }
    long result = 1;
    for (int i = 2; i <= n; i++) {
        result = result * i;
    }
    return result;
}

@Test
public void factorialWith5Test() {
    long obs = fact(5);
    assertTrue(5 < obs);
}

@Test
public void factorialWith0Test() {
    assertEquals(1, fact(0));
}
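The two tests above execute every statement of fact, yet the assertion 5 < obs accepts many wrong results. A minimal sketch of that weakness follows; factFaulty is a hypothetical faulty variant (the '*' is replaced by '+') introduced only for illustration.

```java
// Sketch: a weak assertion lets a faulty implementation pass.
final class WeakAssertionDemo {

    // The factorial from the example.
    static long fact(int n) {
        if (n == 0) {
            return 1;
        }
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result = result * i;
        }
        return result;
    }

    // Hypothetical faulty variant: '*' was replaced by '+'.
    static long factFaulty(int n) {
        if (n == 0) {
            return 1;
        }
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result = result + i;
        }
        return result;
    }

    public static void main(String[] args) {
        // Weak oracle from the example: holds for both versions.
        System.out.println(5 < fact(5));          // true
        System.out.println(5 < factFaulty(5));    // true
        // Stronger oracle: only the correct version satisfies it.
        System.out.println(fact(5) == 120);       // true
        System.out.println(factFaulty(5) == 120); // false
    }
}
```

Both versions reach full statement coverage with the original tests; only the exact-value check tells them apart.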

(8)

Code coverage

• Useful for spotting untested code

• High coverage ⇏ Effective test suite

• Well tested projects have high coverage

• The opposite may not be true! (e.g., tests with no assertions)

• 100% coverage is hard to achieve and may be impractical

(9)

A real example…

(10)

Apache Commons Collections

• Issue tracker

• Continuous Integration

• Automated Static Analysis (Checkstyle)

• Code coverage monitoring (Cobertura)

(11)

3,552 methods

(12)

13,677 tests 3,552 methods

(13)

85% coverage

(14)

926 test cases

(15)

public class AbstractHashedMap<K, V> extends AbstractMap<K, V> implements IterableMap<K, V> {
    protected void ensureCapacity(final int newCapacity) {
        final int oldCapacity = data.length;
        if (newCapacity <= oldCapacity) {
            return;
        }
        if (size == 0) {
            threshold = calculateThreshold(newCapacity, loadFactor);
            data = new HashEntry[newCapacity];
        } else {
            final HashEntry<K, V> oldEntries[] = data;
            final HashEntry<K, V> newEntries[] = new HashEntry[newCapacity];
            modCount++;
            for (int i = oldCapacity - 1; i >= 0; i--) {
                HashEntry<K, V> entry = oldEntries[i];
                if (entry != null) {
                    oldEntries[i] = null;
                    do {
                        final HashEntry<K, V> next = entry.next;
                        final int index = hashIndex(entry.hashCode, newCapacity);
                        entry.next = newEntries[index];
                        newEntries[index] = entry;
                        entry = next;
                    } while (entry != null);
                }
            }
            threshold = calculateThreshold(newCapacity, loadFactor);
            data = newEntries;
        }
    }
}

926 test cases

Executed 11,593 times

If the code is removed, no test fails:

public class AbstractHashedMap<K, V> extends AbstractMap<K, V> implements IterableMap<K, V> {
    protected void ensureCapacity(final int newCapacity) {
    }
}

(16)

Mutation Analysis

• Proposed in 1970

• Programmers create programs close to being correct

• A test suite that detects all simple faults also detects the complex ones

(17)

Intuition

• The higher the quality of the tests, the more confidence we can have in the program

• Mutation analysis evaluates the quality of the tests

• If the test cases can detect intentionally injected faults, they can detect real faults

(18)

Mutation analysis

• Qualifies a set of test cases

• evaluates the proportion of faults that the tests detect

• based on fault injection

• Evaluating the quality of the test cases is important for assessing confidence in the program

(19)

Mutation analysis

• The choice of injected faults is very important

• faults are modeled by mutation operators

• Mutant = original program with one injected fault

• Two oracle functions

• Difference in traces between the original program and the mutant

• Executable contracts
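A minimal sketch of the first oracle function, the difference between the original program and the mutant: a mutant counts as killed as soon as one test input makes the two versions produce different outputs. The names ORIGINAL, MUTANT and isKilled are hypothetical, and the "trace" is reduced to a single return value for brevity.

```java
import java.util.function.IntBinaryOperator;

// Sketch: a mutant is killed when some input separates it from the original.
final class TraceOracleDemo {

    // Original program and a mutant obtained by replacing '+' with '-'.
    static final IntBinaryOperator ORIGINAL = (a, b) -> a + b;
    static final IntBinaryOperator MUTANT = (a, b) -> a - b;

    // Returns true if at least one input pair yields different outputs.
    static boolean isKilled(IntBinaryOperator original,
                            IntBinaryOperator mutant,
                            int[][] inputs) {
        for (int[] in : inputs) {
            if (original.applyAsInt(in[0], in[1]) != mutant.applyAsInt(in[0], in[1])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Inputs with b == 0 cannot tell a + b from a - b: the mutant stays alive.
        System.out.println(isKilled(ORIGINAL, MUTANT, new int[][] {{3, 0}, {7, 0}})); // false
        // Adding an input with b != 0 kills it.
        System.out.println(isKilled(ORIGINAL, MUTANT, new int[][] {{3, 0}, {2, 5}})); // true
    }
}
```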

(20)

Mutation analysis

  put (x : INTEGER) is
      -- put x in the set
    require
      not_full: not full
    do
1     if not has (x) then
2       count := count + 1
3       structure.put (x, count)
      end -- if
    ensure
      has: has (x)
      not_empty: not empty
    end -- put

Mutations shown on the slide: "- 1", Remove-inst

(21)

Process

[Diagram: mutants (Mutant 1 to Mutant 6) are generated from program P and executed against the test cases (TC). Each mutant is either killed or alive. A live mutant goes through a diagnosis with three possible outcomes: equivalent mutant (removed from the set of mutants), incomplete specification (add contracts), or insufficient test cases. Steps are labeled as automatic or manual.]

(22)

Equivalent mutants

• An equivalent mutant is functionally equivalent to the original program

• no test case can kill it

int Min(int i, int j) {
    int minval = i;
    if (j < i) minval = j;
    return minval;
}

int Min(int i, int j) {
    int minval = i;
    if (j < minval) minval = j;
    return minval;
}
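The two versions of Min compare j against i and against minval, which holds the value of i at that point, so they always agree. A small sketch that checks this exhaustively over a bounded range; agreeOnRange is a hypothetical helper, and a bounded check only illustrates equivalence, it does not prove it.

```java
// Sketch: the original and the mutant of Min agree on every input in a range.
final class EquivalentMutantDemo {

    static int min(int i, int j) {
        int minval = i;
        if (j < i) minval = j;
        return minval;
    }

    // Mutant: 'i' replaced by 'minval' in the condition.
    static int minMutant(int i, int j) {
        int minval = i;
        if (j < minval) minval = j;
        return minval;
    }

    // True if both versions return the same value for all pairs in [lo, hi].
    static boolean agreeOnRange(int lo, int hi) {
        for (int i = lo; i <= hi; i++) {
            for (int j = lo; j <= hi; j++) {
                if (min(i, j) != minMutant(i, j)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnRange(-50, 50)); // true: no test input separates them
    }
}
```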

(23)

Live mutants

• What if a mutant is not killed?

• insufficient test cases => add test cases

• equivalent mutant => remove the mutant

(24)

Mutation score

• $Q(C_i)$ = mutation score of $C_i$ = $d_i / m_i$

• $d_i$ = number of killed mutants

• $m_i$ = number of non-equivalent mutants

• Caution: $Q(C_i) = 100\%$ does not imply bug-free

• Quality of a system $S$ made of components $C_i$

• $Q(S) = \sum_i d_i / \sum_i m_i$
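A short sketch of the two formulas, with d_i as the number of killed mutants and m_i as the number of non-equivalent mutants of component C_i. The class and method names are hypothetical.

```java
// Sketch: mutation score of a component and of a whole system.
final class MutationScore {

    // Q(C_i) = d_i / m_i
    static double score(int killed, int nonEquivalent) {
        if (nonEquivalent <= 0) {
            throw new IllegalArgumentException("no non-equivalent mutants");
        }
        return (double) killed / nonEquivalent;
    }

    // Q(S) = (sum of d_i) / (sum of m_i)
    static double systemScore(int[] killed, int[] nonEquivalent) {
        if (killed.length != nonEquivalent.length) {
            throw new IllegalArgumentException("one entry per component is expected");
        }
        int d = 0;
        int m = 0;
        for (int i = 0; i < killed.length; i++) {
            d += killed[i];
            m += nonEquivalent[i];
        }
        return score(d, m);
    }

    public static void main(String[] args) {
        // Two components: 8 of 10 and 3 of 10 mutants killed.
        System.out.println(score(8, 10));
        System.out.println(score(3, 10));
        // System score: (8 + 3) / (10 + 10)
        System.out.println(systemScore(new int[] {8, 3}, new int[] {10, 10}));
    }
}
```

Because the sums are taken before dividing, large components weigh more in Q(S) than small ones.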

(25)

Mutation operators (1)

• Arithmetic operator replacement

• Example: '+' becomes '-' and vice versa

• Logical operator replacement

• logical operators (and, or, nand, nor, xor) are replaced

• expressions are replaced by TRUE and/or FALSE

(26)

Mutation operators (2)

• Relational operator replacement

• relational operators (<, >, <=, >=, =, /=) are replaced

• Statement deletion

• Variable and constant perturbation

• +1 on a variable

• each boolean is replaced by its complement

(27)

OO operators

• To evaluate test cases for OO programs, it is important to have specific operators that model OO design faults

• Any ideas of OO faults?

(28)

OO operators (1)

• Exception handling fault

• force an exception

• Visibility

• change a private element to public and vice versa

• Reference fault (alias/copy)

• set an object to null after its creation

• remove a clone or copy statement

• add a clone

(29)

OO operators (2)

• Swap parameters in a method declaration

• Polymorphism

• assign to a variable an object of a "sibling" type

• call a method on a "sibling" object

• remove the call to super

• remove a method overload

(30)

OO operators (3)

• In Java

• errors on static

• inject faults into libraries

(31)

Five examples

             #classes  #statements  #test cases  coverage
lang              132         8442         2352       94%
collection        286         6780        13677       84%
gson               66         2377          951       79%
io                103         2573          962       87%
dagger             23         4984          128       89%

(32)

Number of mutants with PIT

[Bar chart: number of mutants (axis from 0 to 10000) for commons-lang, commons-collections, gson, commons-io and dagger]

(33)

Mutation score

[Chart: % of mutants killed, axis from 0.2 to 1.0]

(34)

Negate condition

original mutant

== !=

!= ==

<= >

>= <

< >=

> <=

(35)

Mutation score (neg. cond)

[Chart: % of negated conditionals killed, axis from 0.2 to 0.8]

(36)

Conditionals Boundary Mutator

original mutant

< <=

<= <

> >=

>= >
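A boundary mutant such as '>=' becoming '>' changes the result for exactly one input, the boundary value itself. A minimal sketch with a hypothetical isAdult predicate: only a test that uses the boundary value can kill the mutant.

```java
// Sketch: only a boundary-value input kills a conditionals-boundary mutant.
final class BoundaryMutantDemo {

    // Hypothetical original predicate.
    static boolean isAdult(int age) {
        return age >= 18;
    }

    // Mutant: '>=' replaced by '>'.
    static boolean isAdultMutant(int age) {
        return age > 18;
    }

    // True if the given input distinguishes the mutant from the original.
    static boolean kills(int input) {
        return isAdult(input) != isAdultMutant(input);
    }

    public static void main(String[] args) {
        System.out.println(kills(5));  // false
        System.out.println(kills(30)); // false
        System.out.println(kills(18)); // true: the boundary value
    }
}
```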

(37)

Mutation score (cond. boundaries)

[Chart: % of conditionals boundary mutations killed, axis from 0.1 to 0.6]

(38)

Mutation testing

• Test generation driven by:

• quality: choose a target quality Q(Ci)

• effort: choose a maximum number of possible test cases MaxTC

(39)

Mutation testing

• Improving the quality of a set of test cases

• while the current Q(Ci) < the target Q(Ci) and nTC <= MaxTC

• add test cases (nTC++)

• rerun the mutants

• eliminate equivalent mutants

• recompute Q(Ci)

• Reducing the size of a set of test cases

• remove test cases that kill the same mutants
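The reduction step can be sketched as a greedy pass over a kill matrix: a test is kept only if it kills at least one mutant that the tests kept so far do not kill. This is one possible reading of the bullet, not a prescribed algorithm; the class name, the method name and the data are hypothetical, and the greedy result depends on iteration order and is not guaranteed to be minimal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: drop test cases that only kill mutants already killed by kept tests.
final class SuiteReduction {

    // killedBy maps a test name to the ids of the mutants it kills.
    static List<String> reduce(Map<String, Set<Integer>> killedBy) {
        List<String> kept = new ArrayList<>();
        Set<Integer> alreadyKilled = new HashSet<>();
        for (Map.Entry<String, Set<Integer>> entry : killedBy.entrySet()) {
            // Keep the test only if it contributes a new killed mutant.
            if (!alreadyKilled.containsAll(entry.getValue())) {
                kept.add(entry.getKey());
                alreadyKilled.addAll(entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the result is deterministic.
        Map<String, Set<Integer>> killedBy = new LinkedHashMap<>();
        killedBy.put("t1", new HashSet<>(Arrays.asList(1, 2)));
        killedBy.put("t2", new HashSet<>(Arrays.asList(2)));
        killedBy.put("t3", new HashSet<>(Arrays.asList(3)));
        System.out.println(reduce(killedBy)); // [t1, t3]
    }
}
```

The mutation score is unchanged by the reduction, since every mutant killed by a removed test is still killed by a kept one.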

(40)

Some available tools for Java

• Javalanche

• https://www.st.cs.uni-saarland.de/mutation/

• Bytecode manipulation

• Major

• http://mutation-testing.org/

• AST manipulation

• Extensible

• PIT

• http://pitest.org

• Bytecode manipulation

• Our favorite choice

(41)

PIT or PITest

• Open source, in active development and production ready

• Integrates with major build systems

• State of the art mutation testing

• Extensible via plugins

• Concurrent execution

• Test selection

(42)

Current limitations of mutation testing

• Few integrated tools

• Expensive computation

• Huge number of mutants

• Presence of equivalent mutants

(43)

Equivalent mutants

class Foo {
    int min;

    public void bar(int i) {
        if (i < min) {
            min = i;
        }
        System.out.println("" + min);
    }
}

Mutant (< replaced by <=):

public void bar(int i) {
    if (i <= min) {
        min = i;
    }
}

Henry Coles, Making Mutants Work For You or how I learned to stop worrying and love equivalent mutants, Paris JUG, October 24th, 2018

(44)

Equivalent mutants

• Harmful for quality metrics

• May provide helpful information for developers

• May signal code duplication or redundancy → Refactor

• May signal buddy tests → Fix

(45)

Equivalent mutants

public void bar(int i) {
    if (i < min) {
        min = i;
    }
}

Mutant (< replaced by <=):

public void bar(int i) {
    if (i <= min) {
        min = i;
    }
}

(46)

Equivalent mutants

public void bar(int i) {
    min = Math.min(i, min);
}

The code is more expressive

(47)

Equivalent mutants

public void someLogic(int i) {
    if (i <= 100) {
        throw new IllegalArgumentException();
    }
    if (i > 100) {
        doSomething();
    }
}

Mutant (> replaced by >=):

    if (i >= 100) {
        doSomething();
    }

Code is redundant

(48)

Equivalent mutants

public void someLogic(int i) {
    if (i <= 100) {
        throw new IllegalArgumentException();
    }
    doSomething();
}

Refactor the code

(49)

Live mutants provide good hints

Original method:

public void isNotEqualTo(Object expected) {
    int[] actual = getSubject();
    try {
        int[] expectedArray = (int[]) expected;
        if (actual == expected || Arrays.equals(actual, expectedArray)) {
            failWithRawMessage(...);
        }
    } catch (ClassCastException ignored) {}
}

Original test:

@Test
public void isNotEqualTo_FailEquals() {
    try {
        assertThat(array(2, 3)).isNotEqualTo(array(2, 3));
    } catch (AssertionError err) {
        assertThat(err).hasMessage(...);
    }
}

Live mutant (the call to failWithRawMessage is removed):

public void isNotEqualTo(Object expected) {
    int[] actual = getSubject();
    try {
        int[] expectedArray = (int[]) expected;
        if (actual == expected || Arrays.equals(actual, expectedArray)) {
            // failWithRawMessage(...);
        }
    } catch (ClassCastException ignored) {}
}

Fixed test (fails when no AssertionError is raised):

@Test
public void isNotEqualTo_FailEquals() {
    try {
        assertThat(array(2, 3)).isNotEqualTo(array(2, 3));
        throw new Error(...);
    } catch (AssertionError err) {
        assertThat(err).hasMessage(...);
    }
}

Google Truth: https://github.com/google/truth

(50)

Overcoming the limitations

• Do fewer

• Reduce the number of mutants

• Sample the mutants

• Use fewer operators

• Do smarter

• Distribute the analysis

• Cloud infrastructure

• Do faster

• Improve efficiency

(51)

PIT Strategy

• Bytecode manipulation → avoids compilation cycles

• Runs tests in parallel

• Cheap tests are executed first

• Runs only tests covering the mutant

• Stops when the mutant is killed

• Mutates only changed code on a regular basis
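Three of the points above (run only the tests covering the mutant, run cheap tests first, stop when the mutant is killed) can be sketched as a small scheduling routine. This is an illustrative model, not PIT's actual implementation; the TestCase class, its fields and the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: select covering tests, order them by cost, stop at the first kill.
final class MutantScheduling {

    // Hypothetical model of a test: cost, covered lines, mutants it detects.
    static final class TestCase {
        final String name;
        final long millis;
        final Set<Integer> coveredLines;
        final Set<Integer> detects;

        TestCase(String name, long millis, Set<Integer> coveredLines, Set<Integer> detects) {
            this.name = name;
            this.millis = millis;
            this.coveredLines = coveredLines;
            this.detects = detects;
        }
    }

    static Set<Integer> setOf(Integer... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    // Returns the names of the tests executed for one mutant, in execution order.
    static List<String> testsRunFor(int mutantId, int mutatedLine, List<TestCase> suite) {
        List<TestCase> covering = new ArrayList<>();
        for (TestCase t : suite) {
            if (t.coveredLines.contains(mutatedLine)) {
                covering.add(t); // tests that never reach the mutant are skipped
            }
        }
        covering.sort(Comparator.comparingLong((TestCase t) -> t.millis)); // cheap first
        List<String> executed = new ArrayList<>();
        for (TestCase t : covering) {
            executed.add(t.name);
            if (t.detects.contains(mutantId)) {
                break; // the mutant is killed, no need to run more tests
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        List<TestCase> suite = Arrays.asList(
            new TestCase("slow", 500, setOf(1, 2), setOf(7)),
            new TestCase("fast", 5, setOf(2), setOf()),
            new TestCase("other", 1, setOf(9), setOf()),
            new TestCase("quick", 10, setOf(2), setOf(7)));
        // Mutant 7 sits on line 2: "other" is skipped, "slow" is never needed.
        System.out.println(testsRunFor(7, 2, suite)); // [fast, quick]
    }
}
```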

(52)

Deletion operators

• Remove statements, constants, blocks, etc.

• Correlated mutation score

• 80% fewer mutants

• Still hard to understand

(53)

Extreme mutation

Proposed in 2016

R. Niedermayr, E. Juergens, and S. Wagner, "Will my tests tell me if I break this code?", Proceedings of the International Workshop on Continuous Software Evolution and Delivery, 2016, pp. 23–29.

(54)

Extreme mutation

Original:

public void setValue(int x) {
    if (a + x < 10)
        a += x;
}

Extreme mutant:

public void setValue(int x) {
    // pass
}

Original:

public int fact(int x) {
    int result = 1;
    for (int i = 1; i <= x; i++) {
        result *= i;
    }
    return result;
}

Extreme mutant:

public int fact(int x) {
    return 42;
}

(55)

Example

long fact(int n) {
    if (n == 0) {
        return 1;
    }
    long result = 1;
    for (int i = 2; i <= n; i++) {
        result = result * i;
    }
    return result;
}

Extreme mutants:

long fact(int n) {
    return 1;
}

long fact(int n) {
    return 0;
}

(56)

Extreme mutation

• Creates fewer mutants → Faster analysis

• Works at the method level → Easier to understand

• Many equivalent mutants can be detected → More reliable

(57)

Pseudo-tested methods

• Methods executed by the test suite

• No extreme mutation is detected

• Found in well tested projects

(58)

A pseudo-tested method

class VList {
    private List elements;
    private int version;

    public void add(Object item) {
        elements.add(item);
        incrementVersion();
    }

    public int size() {
        return elements.size();
    }

    // Pseudo-tested
    private void incrementVersion() {
        version++;
    }
}

class VListTest {
    @Test
    public void testAdd() {
        VList l = new VList();
        l.add(1);
        assertEquals(1, l.size());
    }
}

Testability issues
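A minimal sketch of why incrementVersion is pseudo-tested: emptying its body leaves the size-only oracle satisfied, while an oracle that also observes the version detects the change. The extremeMutant flag simulates the extreme mutation and getVersion is a hypothetical accessor; neither appears in the VList class above, which illustrates the testability issue, since the field is not observable from the test.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: an emptied incrementVersion goes unnoticed by a size-only oracle.
final class PseudoTestedDemo {

    static class VList {
        private final List<Object> elements = new ArrayList<>();
        private int version;
        // Hypothetical switch: when true, incrementVersion behaves as if its body were removed.
        private final boolean extremeMutant;

        VList(boolean extremeMutant) {
            this.extremeMutant = extremeMutant;
        }

        public void add(Object item) {
            elements.add(item);
            incrementVersion();
        }

        public int size() {
            return elements.size();
        }

        // Hypothetical accessor added to make the version observable.
        public int getVersion() {
            return version;
        }

        private void incrementVersion() {
            if (!extremeMutant) {
                version++;
            }
        }
    }

    // Oracle of the original test: checks the size only.
    static boolean sizeOracle(VList list) {
        list.add(1);
        return list.size() == 1;
    }

    // Stronger oracle: also checks that the version changed.
    static boolean versionOracle(VList list) {
        int before = list.getVersion();
        list.add(1);
        return list.size() == 1 && list.getVersion() == before + 1;
    }

    public static void main(String[] args) {
        System.out.println(sizeOracle(new VList(false)));    // true
        System.out.println(sizeOracle(new VList(true)));     // true: mutant not detected
        System.out.println(versionOracle(new VList(false))); // true
        System.out.println(versionOracle(new VList(true)));  // false: mutant detected
    }
}
```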

(59)

Descartes: I mutate therefore I am

• A set of extensions for PIT

• Implements extreme mutation

• Finds pseudo-tested methods in Java projects

(60)

[Diagram: PIT takes as input the compiled files, compiled test files, source files, test source files, dependencies, and a configuration (operators to use, number of execution threads, …). Descartes plugs into PIT. For a class A with voidMethod(), intMethod() and strMethod(), the replacement values shown are: void, 3, "S", null. The execution unit produces reports, including a .json output.]

(61)

Practical results: number of mutants

Project                          Descartes     Gregor
AuthZForce PDP Core                    626       7296
Amazon Web Services SDK             161758    2141689
Apache Commons CLI                     271       2560
Apache Commons Codec                   979       9233
Apache Commons Collections            3558      20394
Apache Commons IO                     1164       8809
Apache Commons Lang                   3872      30361
Apache Flink                          4935      43619
Google Gson                            848       7353
Jaxen XPath Engine                    1252      12210
JFreeChart                            7210      89592
Java Git                              7152      78316
Joda-Time                             4525      31233
JOpt Simple                            412       2271
jsoup                                 1566      14054
SAT4J Core                            2304      17163
Apache PdfBox                         7559      79763
SCIFIO                                3627      62768
Spoon                                 4713      43916
Urban Airship Client Library          3082      17345
XWiki Rendering Engine                5534     112605

[Stacked bar chart, 0% to 100%: share of mutants created by Descartes and by Gregor for each project]

(62)

Practical results: time

Project                          Descartes     Gregor
AuthZForce PDP Core               00:08:00   01:23:50
Amazon Web Services SDK           01:32:23   06:11:22
Apache Commons CLI                00:00:13   00:01:26
Apache Commons Codec              00:02:02   00:07:57
Apache Commons Collections        00:01:41   00:05:41
Apache Commons IO                 00:02:16   00:12:48
Apache Commons Lang               00:02:07   00:21:02
Apache Flink                      00:14:04   02:29:45
Google Gson                       00:01:08   00:05:34
Jaxen XPath Engine                00:01:31   00:24:40
JFreeChart                        00:05:48   00:41:28
Java Git                          01:30:08   16:02:03
Joda-Time                         00:03:39   00:16:32
JOpt Simple                       00:00:37   00:01:36
jsoup                             00:02:43   00:12:49
SAT4J Core                        00:53:09   10:55:50
Apache PdfBox                     00:44:07   06:20:25
SCIFIO                            00:24:14   03:12:11
Spoon                             02:24:55   56:47:57
Urban Airship Client Library      00:07:25   00:11:31
XWiki Rendering Engine            00:10:56   02:07:19

[Stacked bar chart, 0% to 100%: share of analysis time (hh:mm:ss) taken by Descartes and by Gregor for each project]

(63)

Raw mutation score correlation

[Scatter plot: raw mutation score obtained with Gregor against the score obtained with Descartes, both axes from 50% to 100%]

(64)

Pseudo-tested methods

Project                          Pseudo-tested methods
AuthZForce PDP Core                                 13
Amazon Web Services SDK                            224
Apache Commons CLI                                   2
Apache Commons Codec                                12
Apache Commons Collections                          40
Apache Commons IO                                   29
Apache Commons Lang                                 47
Apache Flink                                       100
Google Gson                                         10
Jaxen XPath Engine                                  11
JFreeChart                                         476
Java Git                                           296
Joda-Time                                           82
JOpt Simple                                          2
jsoup                                               28
SAT4J Core                                         143
Apache PdfBox                                      473
SCIFIO                                              72
Spoon                                              213
Urban Airship Client Library                        28
XWiki Rendering Engine                             239

[Bar chart: number of pseudo-tested methods per project, axis from 0 to 500]

(65)

Some examples

(66)

Apache Commons Codec

public void testIsEncodeEquals() {
    final String[][] data = {
        {"Meyer", "M\u00fcller"},
        {"Meyer", "Mayr"},
        ...
        {"Miyagi", "Miyako"}
    };
    for (final String[] element : data) {
        final boolean encodeEqual =
            this.getStringEncoder().isEncodeEqual(element[1], element[0]);
    }
}

No assertions

Pseudo-tested

(67)

Apache Commons IO

public void testTee() {
    ByteArrayOutputStream baos1 = new ByteArrayOutputStream();
    ByteArrayOutputStream baos2 = new ByteArrayOutputStream();
    TeeOutputStream tos = new TeeOutputStream(baos1, baos2);
    ...
    tos.write(array);
    assertByteArrayEquals(baos1.toByteArray(), baos2.toByteArray());
}

Weak oracle

Result is the same if nothing is written

Pseudo-tested

(68)

Apache Commons Collections

class SingletonListIterator implements Iterator<Node> {
    ...
    void add() {
        throw new UnsupportedOperationException();
    }
    ...
}

class SingletonListIteratorTest {
    ...
    @Test
    void testAdd() {
        SingletonListIterator it = ...;
        try {
            it.add(value);
            // A fail is needed here
        }
        catch (Exception ex) {}
        ...
    }
}

No exception is thrown

Pseudo-tested
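The missing fail can be illustrated with a self-contained sketch. The two methods below stand in for the weak and the strict version of the test and return whether the test would pass; the Runnable stands in for the call to add. All names are hypothetical, and a real test would call the fail method of its test framework instead of returning false.

```java
// Sketch: a try/catch test without a fail passes even when nothing is thrown.
final class ExpectedExceptionDemo {

    // Weak version: passes whether or not the exception is thrown.
    static boolean weakTestPasses(Runnable add) {
        try {
            add.run();
        } catch (Exception ex) {
            // expected, nothing to check
        }
        return true;
    }

    // Strict version: reaching the line after the call means the test fails.
    static boolean strictTestPasses(Runnable add) {
        try {
            add.run();
            return false; // plays the role of fail()
        } catch (UnsupportedOperationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        Runnable original = () -> { throw new UnsupportedOperationException(); };
        Runnable emptied = () -> { }; // extreme mutant: body removed

        System.out.println(weakTestPasses(original));   // true
        System.out.println(weakTestPasses(emptied));    // true: mutant not detected
        System.out.println(strictTestPasses(original)); // true
        System.out.println(strictTestPasses(emptied));  // false: mutant detected
    }
}
```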

(69)

Amazon Web Services SDK

class SdkTLSSocketFactory {
    // Pseudo-tested
    protected void prepareSocket(SSLSocket s) {
        ...
        s.setEnabledProtocols(protocols);
        ...
    }
}

@Test
void typical() {
    SdkTLSSocketFactory f = ...;
    f.prepareSocket(new TestSSLSocket() {
        @Override
        public void setEnabledProtocols(String[] protocols) {
            assertTrue(Arrays.equals(protocols, expected));
        }
        ...
    });
}

No assertion is verified if the method is emptied

(70)

Partially-tested methods

class AClass {
    private int aField = 0;

    public AClass(int field) {
        aField = field;
    }

    public boolean equals(Object other) {
        return other instanceof AClass
            && ((AClass) other).aField == aField;
    }
}

Extreme mutants of equals: return false; and return true;

@Test
public void test() {
    AClass a = new AClass(3);
    AClass b = new AClass(3);
    AClass c = new AClass(4);
    assertTrue(a.equals(b));
    assertFalse(a == c);  // Is always false (in Java)
}

Mixed results may also reveal faults
