UNIV. RENNES 1, 2020-2021
VALIDATION & VERIFICATION
MUTATION ANALYSIS
BENOIT COMBEMALE
PROFESSOR, UNIV. TOULOUSE, FRANCE
HTTP://COMBEMALE.FR
BENOIT COMBEMALE
PROFESSOR, UNIV. RENNES 1 & INRIA, FRANCE
HTTP://COMBEMALE.FR
V&V M2 ISTIC
A typical test case
public class CharSetTest { ...
    @Test
    public void testConstructor() {
        // Initializes the program
        CharSet set = CharSet.getInstance("a");
        // Triggers specific behavior
        CharRange[] array = set.getCharRanges();
        // Specifies the expected effects
        assertEquals("[a]", set.toString());
        assertEquals(1, array.length);
        assertEquals("a", array[0].toString());
    } ...
}
Test cases are expected to:
• Cover main requirements
• Stress the application
• Prevent regressions
• Reveal bugs
Test case quality is impacted by:
• How well the input has been chosen
• How strong the assertions are
How can we be sure that a test suite is adequate
enough to find bugs?
Code coverage
• Most used approach
• Relatively “cheap” to compute
• Multiple effective implementations (JaCoCo, OpenClover, Cobertura)
• IDE integration out-of-the-box and via plugins
• Supported in continuous integration servers and GitHub
Example
long fact(int n) {
    if (n == 0) {
        return 1;
    }
    long result = 1;
    for (int i = 2; i <= n; i++) {
        result = result * i;
    }
    return result;
}

@Test
public void factorialWith5Test() {
    long obs = fact(5);
    assertTrue(5 < obs);
}

@Test
public void factorialWith0Test() {
    assertEquals(1, fact(0));
}
Code coverage
• Useful to spot untested code
• High coverage ⇏ Effective test suite
• Well-tested projects have high coverage
• The converse may not hold (e.g. tests without assertions)
• 100% coverage is hard to achieve and may be impractical
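To illustrate why high coverage does not imply an effective test suite, here is a minimal sketch built on the factorial example. The faulty variant `faultyFact` and the helpers `weakSuitePasses` and `strongSuitePasses` are hypothetical names introduced for illustration. Both suites execute every statement of the function, but only the one with the exact expected value notices the fault.

```java
import java.util.function.IntToLongFunction;

class CoverageVsOracle {
    // Implementation from the factorial example.
    static long fact(int n) {
        if (n == 0) {
            return 1;
        }
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result = result * i;
        }
        return result;
    }

    // Hypothetical faulty variant: '*' replaced by '+' in the loop.
    static long faultyFact(int n) {
        if (n == 0) {
            return 1;
        }
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result = result + i;
        }
        return result;
    }

    // Same checks as the two tests of the example: full coverage, weak oracle.
    static boolean weakSuitePasses(IntToLongFunction f) {
        return 5 < f.applyAsLong(5) && f.applyAsLong(0) == 1;
    }

    // Same inputs (same coverage), but the exact expected value is asserted.
    static boolean strongSuitePasses(IntToLongFunction f) {
        return f.applyAsLong(5) == 120 && f.applyAsLong(0) == 1;
    }

    public static void main(String[] args) {
        System.out.println("weak suite on faulty code passes:   "
                + weakSuitePasses(CoverageVsOracle::faultyFact));
        System.out.println("strong suite on faulty code passes: "
                + strongSuitePasses(CoverageVsOracle::faultyFact));
    }
}
```

With the faulty variant, `fact(5)` evaluates to 15, which still satisfies `5 < obs`, so the weak suite keeps passing while the strong one fails.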
A real example…
Apache Commons Collections
• Issue tracker
• Continuous Integration
• Automated static analysis (Checkstyle)
• Code coverage monitoring (Cobertura)
• 3,552 methods
• 13,677 tests
• 85% coverage
926 test cases
public class AbstractHashedMap<K, V> extends AbstractMap<K, V> implements IterableMap<K, V> {
    protected void ensureCapacity(final int newCapacity) {
        final int oldCapacity = data.length;
        if (newCapacity <= oldCapacity) {
            return;
        }
        if (size == 0) {
            threshold = calculateThreshold(newCapacity, loadFactor);
            data = new HashEntry[newCapacity];
        } else {
            final HashEntry<K, V> oldEntries[] = data;
            final HashEntry<K, V> newEntries[] = new HashEntry[newCapacity];
            modCount++;
            for (int i = oldCapacity - 1; i >= 0; i--) {
                HashEntry<K, V> entry = oldEntries[i];
                if (entry != null) {
                    oldEntries[i] = null;
                    do {
                        final HashEntry<K, V> next = entry.next;
                        final int index = hashIndex(entry.hashCode, newCapacity);
                        entry.next = newEntries[index];
                        newEntries[index] = entry;
                        entry = next;
                    } while (entry != null);
                }
            }
            threshold = calculateThreshold(newCapacity, loadFactor);
            data = newEntries;
        }
    }
}
Executed 11,593 times
If the code is removed no test fails
926 test cases
public class AbstractHashedMap<K, V> extends AbstractMap<K, V> implements IterableMap<K, V> {
    protected void ensureCapacity(final int newCapacity) {
        // body removed
    }
}
Mutation Analysis
• Proposed in the 1970s
• Competent programmer hypothesis: programmers create programs close to being correct
• Coupling effect: a test suite that detects all simple faults also detects the complex ones
Intuition
• The higher the quality of the tests, the more confidence we can have in the program
• Mutation analysis makes it possible to assess the quality of the tests
• If the test cases can detect intentionally injected faults, they can detect real faults
Mutation analysis
• Qualifies a set of test cases
• Evaluates the proportion of faults that the tests detect
• Based on fault injection
• Assessing the quality of the test cases is important for assessing confidence in the program
Mutation analysis
• The choice of injected faults is very important
• Faults are modeled by mutation operators
• Mutant = original program with one injected fault
• Two oracle functions
• Difference in traces between the original program and the mutant
• Executable contracts
put (x : INTEGER) is
      -- put x in the set
   require
      not_full: not full
   do
      if not has (x) then
         count := count + 1
         structure.put (x, count)
      end -- if
   ensure
      has: has (x)
      not_empty: not empty
   end -- put
Example mutations: "+ 1" replaced by "- 1"; Remove-inst (statement removal)
Mutation analysis
Process
[Figure: mutation analysis process. Mutant generation derives Mutant 1 to Mutant 6 from program P. The test cases (TC) are executed against each mutant, which ends up either killed or alive. A live mutant is diagnosed as one of: equivalent mutant (removed from the set of mutants), incomplete specification (add contracts), or insufficient test cases. The figure labels its steps as automatic or manual.]
Equivalent mutants
• An equivalent mutant is functionally equivalent to the original
• No test case can kill it
// Original
int Min(int i, int j) {
    int minval = i;
    if (j < i) minval = j;
    return minval;
}
// Mutant: i replaced by minval in the condition
int Min(int i, int j) {
    int minval = i;
    if (j < minval) minval = j;
    return minval;
}
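The two versions of Min can be compared mechanically. The sketch below is only an illustration: the names `minOriginal`, `minMutant` and `canBeKilled` are hypothetical, and a bounded search is evidence rather than a proof. Here the equivalence also follows by reasoning, since `minval` equals `i` when the condition is evaluated.

```java
class EquivalentMinMutant {
    // Original: compares j with i.
    static int minOriginal(int i, int j) {
        int minval = i;
        if (j < i) {
            minval = j;
        }
        return minval;
    }

    // Mutant: compares j with minval, which still holds the value of i.
    static int minMutant(int i, int j) {
        int minval = i;
        if (j < minval) {
            minval = j;
        }
        return minval;
    }

    // Returns true if some input pair in [-bound, bound] x [-bound, bound]
    // makes the two versions return different values.
    static boolean canBeKilled(int bound) {
        for (int i = -bound; i <= bound; i++) {
            for (int j = -bound; j <= bound; j++) {
                if (minOriginal(i, j) != minMutant(i, j)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("killable within bound 50: " + canBeKilled(50));
    }
}
```

No input in the searched range distinguishes the two versions, so no test case drawn from that range can kill the mutant.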
Live mutants
• What if a mutant is not killed?
• Insufficient test cases => add test cases
• Equivalent mutant => remove the mutant
Mutation score
• Q(C_i) = mutation score of C_i = d_i / m_i
• d_i = number of killed mutants
• m_i = number of non-equivalent mutants
• Warning: Q(C_i) = 100% ⇏ bug free
• Quality of a system S made of components C_i
• Q(S) = Σ d_i / Σ m_i
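The two formulas can be written down directly. This is a small sketch with made-up numbers; the class and method names are hypothetical. It also shows that the system score is a ratio of sums, not the average of the component scores.

```java
class MutationScore {
    // Q(C_i) = d_i / m_i, with d_i killed mutants and m_i non-equivalent mutants.
    static double score(int killed, int nonEquivalent) {
        if (nonEquivalent <= 0 || killed < 0 || killed > nonEquivalent) {
            throw new IllegalArgumentException("expected 0 <= killed <= nonEquivalent, nonEquivalent > 0");
        }
        return (double) killed / nonEquivalent;
    }

    // Q(S) = (sum of d_i) / (sum of m_i) over all components of S.
    static double systemScore(int[] killed, int[] nonEquivalent) {
        if (killed.length != nonEquivalent.length) {
            throw new IllegalArgumentException("one entry per component is expected in both arrays");
        }
        int d = 0;
        int m = 0;
        for (int i = 0; i < killed.length; i++) {
            d += killed[i];
            m += nonEquivalent[i];
        }
        return score(d, m);
    }

    public static void main(String[] args) {
        // Made-up data: component 1 kills 9 of 10 mutants, component 2 kills 1 of 2.
        int[] killed = {9, 1};
        int[] nonEquivalent = {10, 2};
        System.out.println("Q(C1) = " + score(killed[0], nonEquivalent[0]));
        System.out.println("Q(C2) = " + score(killed[1], nonEquivalent[1]));
        System.out.println("Q(S)  = " + systemScore(killed, nonEquivalent));
    }
}
```

In this made-up case the component scores are 0.9 and 0.5, while Q(S) is 10/12: components with more mutants weigh more in the system score.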
Mutation operators (1)
• Arithmetic operator replacement
• Example: '+' becomes '-' and vice versa
• Logical operator replacement
• Logical operators (and, or, nand, nor, xor) are replaced
• Expressions are replaced by TRUE and/or FALSE
Mutation operators (2)
• Relational operator replacement
• Relational operators (<, >, <=, >=, =, /=) are replaced
• Statement deletion
• Variable and constant perturbation
• +1 on a variable
• Each boolean is replaced by its complement
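As an illustration of these operators, the sketch below applies a few of them by hand to a tiny function and uses a difference of outputs as the oracle. The function (absolute difference), the mutants and all names are assumptions made for the example; a real tool generates the mutants automatically.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

class MutationOperatorsDemo {
    // Hypothetical program under test: absolute difference of a and b.
    static final IntBinaryOperator ORIGINAL = (a, b) -> a > b ? a - b : b - a;

    // One hand-written mutant per operator, each with a single injected fault.
    static Map<String, IntBinaryOperator> mutants() {
        Map<String, IntBinaryOperator> m = new LinkedHashMap<>();
        m.put("arithmetic: '-' becomes '+'", (a, b) -> a > b ? a + b : b - a);
        m.put("relational: '>' becomes '<'", (a, b) -> a < b ? a - b : b - a);
        m.put("relational: '>' becomes '>='", (a, b) -> a >= b ? a - b : b - a);
        m.put("perturbation: +1 on a", (a, b) -> a > b ? (a + 1) - b : b - a);
        return m;
    }

    // Oracle by output difference: a mutant is killed when at least one input
    // makes it return a value different from the original program.
    static boolean isKilled(IntBinaryOperator mutant, int[][] inputs) {
        for (int[] in : inputs) {
            if (ORIGINAL.applyAsInt(in[0], in[1]) != mutant.applyAsInt(in[0], in[1])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[][] inputs = {{5, 3}, {3, 5}, {4, 4}};
        for (Map.Entry<String, IntBinaryOperator> e : mutants().entrySet()) {
            String verdict = isKilled(e.getValue(), inputs) ? "killed" : "alive";
            System.out.println(e.getKey() + " -> " + verdict);
        }
    }
}
```

The '>=' mutant stays alive on every input because both branches return 0 when a equals b: it is an equivalent mutant of this particular function.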
OO operators
• To assess test cases for OO programs, it is important to have specific operators that model OO design faults
• Any ideas of OO faults?
OO operators (1)
• Exception handling fault
• Force an exception
• Visibility
• Turn a private element into a public one and vice versa
• Reference fault (alias/copy)
• Set an object to null after its creation
• Remove a clone or copy instruction
• Add a clone
OO operators (2)
• Swap parameters in a method declaration
• Polymorphism
• Assign a variable with an object of a "sibling" type
• Call a method on a "sibling" object
• Remove the call to super
• Remove the overloading of a method
OO operators (3)
• In Java
• Errors on static
• Inject faults into libraries
Five examples
             #classes  #statements  #test cases  coverage
lang              132         8442         2352       94%
collection        286         6780        13677       84%
gson               66         2377          951       79%
io                103         2573          962       87%
dagger             23         4984          128       89%
Number of mutants with PIT
[Bar chart: number of mutants (axis from 0 to 10000) for commons-lang, commons-collections, gson, commons-io, dagger]
Mutation score
[Chart: % mutations killed (axis from 0.2 to 1.0)]
Negate condition
original  mutant
==        !=
!=        ==
<=        >
>=        <
<         >=
>         <=
Mutation score (neg. cond)
[Chart: % negate conditionals killed (axis from 0.2 to 0.8)]
Conditionals Boundary Mutator
original  mutant
<         <=
<=        <
>         >=
>=        >
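A small sketch may help to see why boundary mutants are harder to kill than negated conditionals. The predicate `age >= 18` and all names below are hypothetical. The negated mutant differs from the original on almost every input, while the boundary mutant differs only on the boundary value itself.

```java
import java.util.function.IntPredicate;

class ConditionalMutants {
    // Hypothetical predicate under test.
    static final IntPredicate ORIGINAL = age -> age >= 18;
    // Negate condition: '>=' becomes '<'.
    static final IntPredicate NEGATED = age -> age < 18;
    // Conditionals boundary: '>=' becomes '>'.
    static final IntPredicate BOUNDARY = age -> age > 18;

    // A mutant is killed when one of the test inputs yields a different answer.
    static boolean killedBy(IntPredicate mutant, int... inputs) {
        for (int x : inputs) {
            if (ORIGINAL.test(x) != mutant.test(x)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("negated, inputs 30 and 5:  " + killedBy(NEGATED, 30, 5));
        System.out.println("boundary, inputs 30 and 5: " + killedBy(BOUNDARY, 30, 5));
        System.out.println("boundary, input 18:        " + killedBy(BOUNDARY, 18));
    }
}
```

Inputs far from the boundary kill the negated mutant but leave the boundary mutant alive; only a test that uses the value 18 kills it.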
Mutation score (neg. cond. boundaries)
[Chart: % conditionals boundary mutations killed (axis from 0.1 to 0.6)]
Mutation testing
• Test generation driven by:
• Quality: choose a target quality Q(Ci)
• Effort: choose a maximum number of test cases MaxTC
Mutation testing
• Improving the quality of a set of test cases
• While the current Q(Ci) < the target Q(Ci) and nTC <= MaxTC
• Add test cases (nTC++)
• Re-run the mutants
• Eliminate equivalent mutants
• Recompute Q(Ci)
• Reducing the size of a set of test cases
• Remove test cases that kill the same mutants
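The reduction step can be sketched as a greedy pass over a kill matrix. Everything here is an assumption made for illustration: the kill matrix is given as one `BitSet` per test (bit m set when the test kills mutant m), and the greedy strategy is one simple way to drop tests that kill nothing new, not the method prescribed by the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

class SuiteMinimizer {
    // Keeps a test only if it kills at least one mutant that no previously
    // kept test kills. Returns the indices of the kept tests.
    static List<Integer> minimize(List<BitSet> kills) {
        List<Integer> kept = new ArrayList<>();
        BitSet alreadyKilled = new BitSet();
        for (int t = 0; t < kills.size(); t++) {
            BitSet fresh = (BitSet) kills.get(t).clone();
            fresh.andNot(alreadyKilled); // mutants killed by t and by no kept test
            if (!fresh.isEmpty()) {
                kept.add(t);
                alreadyKilled.or(kills.get(t));
            }
        }
        return kept;
    }

    // Helper to build the set of mutants killed by one test.
    static BitSet mutants(int... ids) {
        BitSet b = new BitSet();
        for (int id : ids) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        // Made-up kill matrix for four tests and three mutants.
        List<BitSet> kills = Arrays.asList(
                mutants(0, 1),  // test 0
                mutants(0, 1),  // test 1 kills the same mutants as test 0
                mutants(2),     // test 2
                mutants(1, 2)); // test 3 kills nothing new
        System.out.println("kept tests: " + minimize(kills));
    }
}
```

The reduced suite kills the same mutants as the full one, so the mutation score is unchanged. The result depends on the order of the tests and is not guaranteed to be the smallest possible suite.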
Some available tools for Java
• Javalanche
• https://www.st.cs.uni-saarland.de/mutation/
• Bytecode manipulation
• Major
• http://mutation-testing.org/
• AST manipulation
• Extensible
• PIT
• http://pitest.org
• Bytecode manipulation
• Our favorite choice
PIT or PITest
• Open source, in active development and production ready
• Integrates with major build systems
• State of the art mutation testing
• Extensible via plugins
• Concurrent execution
• Test selection
Current limitations of mutation testing
• Few integrated tools
• Expensive computation
• Huge number of mutants
• Presence of equivalent mutants
Equivalent mutants
// Original
class Foo {
    int min;
    public void bar(int i) {
        if (i < min) {
            min = i;
        }
        System.out.println("" + min);
    }
}
// Mutant: < replaced by <=
class Foo {
    int min;
    public void bar(int i) {
        if (i <= min) {
            min = i;
        }
        System.out.println("" + min);
    }
}
Henry Coles, Making Mutants Work For You or how I learned to stop worrying and love equivalent mutants, Paris JUG, October 24th, 2018
Equivalent mutants
• Harmful for quality metrics
• May provide helpful information for developers
• May signal code duplication or redundancy → Refactor
• May signal buggy tests → Fix
Equivalent mutants
class Foo {
    int min;
    public void bar(int i) {
        min = Math.min(i, min);
        System.out.println("" + min);
    }
}
The code is more expressive
Equivalent mutants
// Original
public void someLogic(int i) {
    if (i <= 100) {
        throw new IllegalArgumentException();
    }
    if (i > 100) {
        doSomething();
    }
}
// Mutant: > replaced by >=
public void someLogic(int i) {
    if (i <= 100) {
        throw new IllegalArgumentException();
    }
    if (i >= 100) {
        doSomething();
    }
}
Code is redundant
Equivalent mutants
public void someLogic(int i) {
    if (i <= 100) {
        throw new IllegalArgumentException();
    }
    doSomething();
}
Refactor the code
Live mutants provide good hints
// Method under test
public void isNotEqualTo(Object expected) {
    int[] actual = getSubject();
    try {
        int[] expectedArray = (int[]) expected;
        if (actual == expected || Arrays.equals(actual, expectedArray)) {
            failWithRawMessage(...);
        }
    } catch (ClassCastException ignored) {}
}

// Original test
@Test
public void isNotEqualTo_FailEquals() {
    try {
        assertThat(array(2, 3)).isNotEqualTo(array(2, 3));
    } catch (AssertionError err) {
        assertThat(err).hasMessage(...);
    }
}

// Live mutant: the call to failWithRawMessage is removed and the test above still passes
public void isNotEqualTo(Object expected) {
    int[] actual = getSubject();
    try {
        int[] expectedArray = (int[]) expected;
        if (actual == expected || Arrays.equals(actual, expectedArray)) {
            // failWithRawMessage(...);
        }
    } catch (ClassCastException ignored) {}
}

// Fixed test: fails when no AssertionError is raised
@Test
public void isNotEqualTo_FailEquals() {
    try {
        assertThat(array(2, 3)).isNotEqualTo(array(2, 3));
        throw new Error(...);
    } catch (AssertionError err) {
        assertThat(err).hasMessage(...);
    }
}
Google Truth: https://github.com/google/truth
Overcoming the limitations
• Do fewer
• Reduce the number of mutants
• Sample the mutants
• Use fewer operators
• Do smarter
• Distribute the analysis
• Cloud infrastructure
• Do faster
• Improve efficiency
PIT Strategy
• Bytecode manipulation → avoids compilation cycles
• Runs tests in parallel
• Cheap tests are executed first
• Runs only tests covering the mutant
• Stops when the mutant is killed
• Mutates only changed code on a regular basis
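Three of these points (only covering tests, cheap tests first, stop at the first kill) can be sketched as a simple scheduling loop. This is a toy model, not PIT's API: the `TestCase` class, its fields and the simulated detection data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MutantScheduler {
    // Hypothetical test model (not a PIT type).
    static final class TestCase {
        final String name;
        final long millis;               // measured duration of the test
        final Set<Integer> coveredLines; // lines executed by the test
        final Set<Integer> detects;      // mutated lines the test would detect (simulated)

        TestCase(String name, long millis, List<Integer> coveredLines, List<Integer> detects) {
            this.name = name;
            this.millis = millis;
            this.coveredLines = new HashSet<>(coveredLines);
            this.detects = new HashSet<>(detects);
        }
    }

    // Returns the names of the tests actually executed for a mutant on the given line.
    static List<String> testsRunFor(int mutatedLine, List<TestCase> suite) {
        List<TestCase> covering = new ArrayList<>();
        for (TestCase t : suite) {
            if (t.coveredLines.contains(mutatedLine)) { // only tests covering the mutant
                covering.add(t);
            }
        }
        covering.sort(Comparator.comparingLong((TestCase t) -> t.millis)); // cheap tests first
        List<String> executed = new ArrayList<>();
        for (TestCase t : covering) {
            executed.add(t.name);
            if (t.detects.contains(mutatedLine)) {
                break; // mutant killed: stop
            }
        }
        return executed;
    }

    // Made-up suite used for the demonstration.
    static List<TestCase> sampleSuite() {
        return Arrays.asList(
                new TestCase("slow", 900, Arrays.asList(1, 2), Arrays.asList(1)),
                new TestCase("fast", 5, Arrays.asList(1), Collections.<Integer>emptyList()),
                new TestCase("medium", 50, Arrays.asList(1, 3), Arrays.asList(1, 3)),
                new TestCase("other", 1, Arrays.asList(4), Collections.<Integer>emptyList()));
    }

    public static void main(String[] args) {
        System.out.println("mutant on line 1 runs: " + testsRunFor(1, sampleSuite()));
    }
}
```

For the mutant on line 1, the test that does not cover the line is skipped, and the slowest test is never run because a cheaper one already kills the mutant.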
Deletion operators
• Remove statements, constants, blocks, etc.
• Correlated mutation score
• 80% fewer mutants
• Still hard to understand
Extreme mutation
• Proposed in 2016
R. Niedermayr, E. Juergens, and S. Wagner, "Will my tests tell me if I break this code?", Proceedings of the International Workshop on Continuous Software Evolution and Delivery, 2016, pp. 23–29.
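// Original
public void setValue(int x) {
    if (a + x < 10)
        a += x;
}
// Extreme mutant: the body is removed
public void setValue(int x) {
    // pass
}
// Original
public int fact(int x) {
    int result = 1;
    for (int i = 1; i <= x; i++) {
        result *= i;
    }
    return result;
}
// Extreme mutant: the body is replaced by a constant return value
public int fact(int x) {
    return 42;
}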
Example
// Original
long fact(int n) {
    if (n == 0) {
        return 1;
    }
    long result = 1;
    for (int i = 2; i <= n; i++) {
        result = result * i;
    }
    return result;
}
// Extreme mutants
long fact(int n) {
    return 1;
}
long fact(int n) {
    return 0;
}
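The following sketch runs the two extreme mutants of fact against two test suites modeled as predicates. The suite with assertions reuses the checks of the earlier factorial tests (`5 < fact(5)` and `fact(0) == 1`); the suite without assertions and all names are hypothetical and serve only to show what happens when no extreme mutant is detected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntToLongFunction;
import java.util.function.Predicate;

class ExtremeMutationDemo {
    // Extreme mutants of fact: the whole body is replaced by a constant.
    static final List<IntToLongFunction> EXTREME_MUTANTS =
            Arrays.<IntToLongFunction>asList(n -> 1L, n -> 0L);

    // Suite with assertions: returns true when both checks pass.
    static final Predicate<IntToLongFunction> WITH_ASSERTIONS =
            f -> 5 < f.applyAsLong(5) && f.applyAsLong(0) == 1;

    // Hypothetical suite that calls the method but asserts nothing.
    static final Predicate<IntToLongFunction> WITHOUT_ASSERTIONS = f -> {
        f.applyAsLong(5);
        f.applyAsLong(0);
        return true;
    };

    // True when the suite passes on every extreme mutant, i.e. none is detected.
    static boolean noMutantDetected(Predicate<IntToLongFunction> suite) {
        for (IntToLongFunction mutant : EXTREME_MUTANTS) {
            if (!suite.test(mutant)) {
                return false; // the suite fails on this mutant: detected
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("with assertions, no mutant detected:    "
                + noMutantDetected(WITH_ASSERTIONS));
        System.out.println("without assertions, no mutant detected: "
                + noMutantDetected(WITHOUT_ASSERTIONS));
    }
}
```

The suite with assertions fails on both mutants, because neither 1 nor 0 is greater than 5. The suite without assertions executes fact but cannot tell the mutants from the original.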
Extreme mutation
• Creates fewer mutants → Faster analysis
• Works at the method level → Easier to understand
• Many equivalent mutants can be detected → More reliable
Pseudo-tested methods
• Methods executed by the test suite
• No extreme mutation is detected
• Found in well tested projects
A pseudo-tested method
class VList {
    private List elements;
    private int version;
    public void add(Object item) {
        elements.add(item);
        incrementVersion();
    }
    public int size() {
        return elements.size();
    }
    // Pseudo-tested: testAdd executes this method but never checks version (testability issues)
    private void incrementVersion() {
        version++;
    }
}
class VListTest {
    @Test
    public void testAdd() {
        VList l = new VList();
        l.add(1);
        assertEquals(1, l.size());
    }
}
Descartes: I mutate therefore I am
• A set of extensions for PIT
• Implements extreme mutation
• Finds pseudo-tested methods in Java projects
[Figure: Descartes running inside PIT. Inputs: source files, test source files, compiled files, compiled test files, dependencies, and configuration (operators to use, number of execution threads, …). For a class A declaring voidMethod(), intMethod() and strMethod(), the operators shown are void, 3, "S" and null. An execution unit produces the reports (for example .json).]
Practical results: number of mutants
Project                         Descartes     Gregor
AuthZForce PDP Core                   626       7296
Amazon Web Services SDK            161758    2141689
Apache Commons CLI                    271       2560
Apache Commons Codec                  979       9233
Apache Commons Collections           3558      20394
Apache Commons IO                    1164       8809
Apache Commons Lang                  3872      30361
Apache Flink                         4935      43619
Google Gson                           848       7353
Jaxen XPath Engine                   1252      12210
JFreeChart                           7210      89592
Java Git                             7152      78316
Joda-Time                            4525      31233
JOpt Simple                           412       2271
jsoup                                1566      14054
SAT4J Core                           2304      17163
Apache PdfBox                        7559      79763
SCIFIO                               3627      62768
Spoon                                4713      43916
Urban Airship Client Library         3082      17345
XWiki Rendering Engine               5534     112605
Practical results: time
Project                         Descartes     Gregor
AuthZForce PDP Core              00:08:00   01:23:50
Amazon Web Services SDK          01:32:23   06:11:22
Apache Commons CLI               00:00:13   00:01:26
Apache Commons Codec             00:02:02   00:07:57
Apache Commons Collections       00:01:41   00:05:41
Apache Commons IO                00:02:16   00:12:48
Apache Commons Lang              00:02:07   00:21:02
Apache Flink                     00:14:04   02:29:45
Google Gson                      00:01:08   00:05:34
Jaxen XPath Engine               00:01:31   00:24:40
JFreeChart                       00:05:48   00:41:28
Java Git                         01:30:08   16:02:03
Joda-Time                        00:03:39   00:16:32
JOpt Simple                      00:00:37   00:01:36
jsoup                            00:02:43   00:12:49
SAT4J Core                       00:53:09   10:55:50
Apache PdfBox                    00:44:07   06:20:25
SCIFIO                           00:24:14   03:12:11
Spoon                            02:24:55   56:47:57
Urban Airship Client Library     00:07:25   00:11:31
XWiki Rendering Engine           00:10:56   02:07:19
Raw mutation score correlation
[Scatter plot: Gregor and Descartes mutation scores, both axes from 50% to 100%]
Pseudo-tested methods
Project                         Pseudo-tested methods
AuthZForce PDP Core                                13
Amazon Web Services SDK                           224
Apache Commons CLI                                  2
Apache Commons Codec                               12
Apache Commons Collections                         40
Apache Commons IO                                  29
Apache Commons Lang                                47
Apache Flink                                      100
Google Gson                                        10
Jaxen XPath Engine                                 11
JFreeChart                                        476
Java Git                                          296
Joda-Time                                          82
JOpt Simple                                         2
jsoup                                              28
SAT4J Core                                        143
Apache PdfBox                                     473
SCIFIO                                             72
Spoon                                             213
Urban Airship Client Library                       28
XWiki Rendering Engine                            239
Some examples
Apache Commons Codec
public void testIsEncodeEquals() {
    final String[][] data = {
        {"Meyer", "M\u00fcller"},
        {"Meyer", "Mayr"},
        ...
        {"Miyagi", "Miyako"}
    };
    for (final String[] element : data) {
        // Pseudo-tested: isEncodeEqual is called but its result is never checked (no assertions)
        final boolean encodeEqual =
            this.getStringEncoder().isEncodeEqual(element[1], element[0]);
    }
}
Apache Commons IO
public void testTee() {
    ByteArrayOutputStream baos1 = new ByteArrayOutputStream();
    ByteArrayOutputStream baos2 = new ByteArrayOutputStream();
    TeeOutputStream tos = new TeeOutputStream(baos1, baos2);
    ...
    // Pseudo-tested: write
    tos.write(array);
    // Weak oracle: the result is the same if nothing is written
    assertByteArrayEquals(baos1.toByteArray(), baos2.toByteArray());
}
Apache Commons Collections
class SingletonListIterator implements Iterator<Node> { ...
    // Pseudo-tested
    void add() {
        throw new UnsupportedOperationException();
    } ...
}
class SingletonListIteratorTest { ...
    @Test
    void testAdd() {
        SingletonListIterator it = ...;
        try {
            it.add(value);
            // If no exception is thrown, the test still passes: a fail is needed here
        }
        catch (Exception ex) {}
        ...
    }
}
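The effect of the missing fail can be reproduced in isolation. In this sketch the `Adder` interface and both test helpers are hypothetical stand-ins for the iterator and its test; returning `false` plays the role of a call to `fail()`.

```java
class MissingFailDemo {
    // Hypothetical stand-in for the iterator's add operation.
    interface Adder {
        void add(Object value);
    }

    // Original behavior: the operation is not supported.
    static final Adder ORIGINAL = value -> {
        throw new UnsupportedOperationException();
    };

    // Extreme mutant: the body is removed, so nothing is thrown.
    static final Adder EMPTIED = value -> { };

    // Same shape as testAdd: passes whether or not an exception is thrown.
    static boolean weakTestPasses(Adder it) {
        try {
            it.add("x");
        } catch (Exception ex) {
            // expected, but nothing forces it to happen
        }
        return true;
    }

    // Stronger version: reaching the line after add() makes the test fail.
    static boolean strongTestPasses(Adder it) {
        try {
            it.add("x");
            return false; // plays the role of fail()
        } catch (UnsupportedOperationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("weak test on emptied method:   " + weakTestPasses(EMPTIED));
        System.out.println("strong test on emptied method: " + strongTestPasses(EMPTIED));
    }
}
```

The weak test passes on both versions, so the emptied method goes unnoticed; the stronger test passes on the original and fails on the mutant.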
Amazon Web Services SDK
class SdkTLSSocketFactory {
    // Pseudo-tested
    protected void prepareSocket(SSLSocket s) {
        ...
        s.setEnabledProtocols(protocols);
        ...
    }
}
@Test
void typical() {
    SdkTLSSocketFactory f = ...;
    f.prepareSocket(new TestSSLSocket() {
        // No assertion is verified if prepareSocket is emptied
        @Override
        public void setEnabledProtocols(String[] protocols) {
            assertTrue(Arrays.equals(protocols, expected));
        } ...
    });
}
Partially-tested methods
class AClass {
    private int aField = 0;
    public AClass(int field) {
        aField = field;
    }
    // Extreme mutants of equals: "return false;" and "return true;"
    public boolean equals(Object other) {
        return other instanceof AClass &&
            ((AClass) other).aField == aField;
    }
}
@Test
public void test() {
    AClass a = new AClass(3);
    AClass b = new AClass(3);
    AClass c = new AClass(4);
    assertTrue(a.equals(b));   // detects "return false;"
    assertFalse(a == c);       // compares references, never calls equals: "return true;" is not detected
}
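To make the partially-tested case concrete, the sketch below models equals and its two extreme mutants as `BiPredicate` values and replays the test against each. The modeling and the names (`testPasses`, `strongerTestPasses`) are hypothetical; the stronger variant is one possible fix, assuming the intent was to compare `a` and `c` with equals.

```java
import java.util.function.BiPredicate;

class PartiallyTestedDemo {
    static final class AClass {
        final int aField;

        AClass(int field) {
            aField = field;
        }
    }

    // equals and its two extreme mutants, modeled as functions of (this, other).
    static final BiPredicate<AClass, Object> ORIGINAL =
            (self, other) -> other instanceof AClass && ((AClass) other).aField == self.aField;
    static final BiPredicate<AClass, Object> ALWAYS_FALSE = (self, other) -> false;
    static final BiPredicate<AClass, Object> ALWAYS_TRUE = (self, other) -> true;

    // Same checks as the test: the second one compares references only.
    static boolean testPasses(BiPredicate<AClass, Object> equalsImpl) {
        AClass a = new AClass(3);
        AClass b = new AClass(3);
        AClass c = new AClass(4);
        return equalsImpl.test(a, b) && !(a == c);
    }

    // Stronger variant: the second check also goes through equals.
    static boolean strongerTestPasses(BiPredicate<AClass, Object> equalsImpl) {
        AClass a = new AClass(3);
        AClass b = new AClass(3);
        AClass c = new AClass(4);
        return equalsImpl.test(a, b) && !equalsImpl.test(a, c);
    }

    public static void main(String[] args) {
        System.out.println("test passes with 'return false': " + testPasses(ALWAYS_FALSE));
        System.out.println("test passes with 'return true':  " + testPasses(ALWAYS_TRUE));
        System.out.println("stronger test with 'return true': " + strongerTestPasses(ALWAYS_TRUE));
    }
}
```

One extreme mutant is detected and the other is not, which is what makes the method partially tested rather than pseudo-tested.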