
On the Use of Available Testing Methods for Verification & Validation of AI-based Software and Systems

Franz Wotawa

Graz University of Technology, Institute for Software Technology, Inffeldgasse 16b/2, A-8010 Graz, Austria

wotawa@ist.tugraz.at

Abstract

Verification and validation of software and systems is an essential part of the development cycle in order to meet given quality criteria, including functional and non-functional requirements. Testing, and in particular its automation, has been an active research area for decades, providing many methods and tools for automating test case generation and execution.

Due to the increasing use of AI in software and systems, the question arises whether it is possible to utilize available testing techniques in the context of AI-based systems. In this position paper, we elaborate on testing issues arising when using AI methods for systems, consider the case of different stages of AI, and start investigating the usefulness of certain testing methods for testing AI. We focus especially on testing at the system level, where we are interested not only in assuring that a system is correctly implemented but also that it meets given criteria like not contradicting moral rules, or being dependable. We state that some well-known testing techniques can still be applied, provided they are tailored to the specific needs.

Introduction

Because of the growing importance of AI methodologies for current and future software and systems, there is a need for coming up with appropriate quality assurance measures.

Such measures should provide certain guarantees that the resulting products fulfill their requirements, e.g., provide the requested functionality and address safety concerns. Providing guarantees seems to be essential in order to gain trust in AI-based system solutions. In particular, in autonomous driving, to mention one recent application area of AI, we have to establish a certification and homologation process that assures that an autonomous vehicle follows given regulations and other requirements.

Because artifacts making use of AI technology are themselves systems, the question is whether it is possible to re-use ordinary testing methodologies and to adapt them for providing means for certification and homologation. In particular, besides components like vision systems relying on machine learning, there are other components that do not rely on any AI methodology. In Figure 1 we give an overview of the architecture of such a system comprising the AI component and other components implementing functionality like providing user interfaces or database access. In addition, such systems rely on a computational stack where we also have to consider the operating system, firmware, and even the hardware for verification and validation purposes.

Figure 1: AI-based systems, their boundaries, and environment.

As a consequence, we have to consider verification and validation of the whole system for quality assurance.

In a previous paper (Wotawa 2019), we already focused on the need for system testing. In contrast, in this paper, we try to give a first answer regarding the usefulness of certain available system testing methods for testing AI applications.

Furthermore, we discuss the corresponding general verification and validation problem of such applications in more detail. We always have to understand what we want to test and what we want to achieve. We also have to be aware of shortcomings arising when focusing only on subparts of the overall verification and validation problem. First, faults often arise because of untested interactions between different system components. Such cases may arise because of unintended interactions not considered during development. Second, we might not be able to sufficiently make guarantees regarding the degree of testing. And finally, we may miss critical inputs or scenarios that lead to trouble. The latter especially holds for different machine learning approaches and is referred to as adversarial attacks (see, e.g., (Su, Vargas, and Sakurai 2019) and (Goodfellow, McDaniel, and Papernot 2018)).

We organize this paper as follows. In the next section, we discuss the system testing challenge in detail. We focus on different aspects of testing to be considered and refer to related literature. Afterwards, we present three system testing approaches that have been proven to find faults when testing systems using AI techniques. Finally, we summarize the obtained findings.

The testing challenge

As depicted in Figure 1, systems comprising AI methodology also rely on other components providing interfaces and functionality, as well as runtime support including operating systems, firmware, and hardware. As a consequence, we have to consider testing as a holistic activity that has to take care of all different parts of the whole system. In particular, we have to clarify what to test and how to test. For example, a logic-based reasoning system comprises a compiler for reading in the logic rules and facts, and the reasoning part. Hence, we have to test the compiler and the reasoning part first separately and afterwards together in close interaction. The compiler can be tested, for example, using fuzzing, where more or less random inputs are generated (see, e.g., (Köroglu and Wotawa 2019)). The reasoning engine itself can be tested using certain known relations, like the fact that the sequence of rules provided to the system does not influence the final outcome (see, e.g., (Wotawa 2018)). The overall system itself may be tested using fault injection, e.g., (Wotawa 2016). All these examples have – more or less – in common that they only capture some parts of the expected behavior.
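As a minimal sketch of the rule-order relation just mentioned, such a check could look as follows; the reasoner interface names (make_reasoner, add_rule, add_fact, ask) are hypothetical placeholders for a concrete reasoning engine, not an actual API:

```python
import random

def derived_answer(make_reasoner, rules, facts, query):
    """Load rules and facts into a fresh reasoner instance and run the query."""
    reasoner = make_reasoner()
    for rule in rules:
        reasoner.add_rule(rule)
    for fact in facts:
        reasoner.add_fact(fact)
    return reasoner.ask(query)

def check_rule_order_invariance(make_reasoner, rules, facts, query, trials=10):
    """Relation under test: permuting the rule order must not change the answer."""
    expected = derived_answer(make_reasoner, rules, facts, query)
    for _ in range(trials):
        shuffled = random.sample(rules, len(rules))
        if derived_answer(make_reasoner, shuffled, facts, query) != expected:
            return False  # an order-dependent answer reveals a fault
    return True
```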

If using fault injection, we are interested in how systems react to inputs that occur in case of faults. When using invariants like the order of rules, we do not test all aspects of reasoning. Hence, in order to thoroughly test such systems, we need to understand what to test in order to identify shortcomings of the underlying testing methods to be used. Besides this, and more specifically for AI methods, we have to provide some measures that at least indicate the quality of testing. For ordinary programs, coverage (e.g., (Ammann, Offutt, and Huang 2003)) and mutation score (e.g., (Jia and Harman 2011)) are used to determine whether test suites are good enough, i.e., likely able to reveal a faulty behavior. Coverage helps to identify those parts of the program that are executed by the test suite, i.e., code coverage (note that besides code coverage there are other coverage definitions in use, like test input coverage, combinatorial coverage, etc.). The mutation score is an indicator of the number of program variants, i.e., the mutants, that can be detected using the given test suite. It is worth noting that coverage or mutation score can be seen as a measure or indicator for guaranteeing that a test suite has the required capabilities for detecting a failing behavior.
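For illustration, the mutation score can be computed as in the following sketch, which assumes that the original program and its mutants are available as callables with the same interface:

```python
def mutation_score(original, mutants, test_suite):
    """Mutation score: fraction of mutants 'killed' by the test suite.

    original:   callable input -> output (used to derive expected outputs)
    mutants:    list of program variants with the same call interface
    test_suite: list of test inputs
    """
    if not mutants:
        return 0.0
    killed = sum(
        1 for m in mutants
        if any(m(x) != original(x) for x in test_suite))  # any deviation kills
    return killed / len(mutants)

# Usage sketch with toy mutants of a tiny function:
# original = lambda x: abs(x)
# mutants = [lambda x: x, lambda x: -x]
# print(mutation_score(original, mutants, [-2, 0, 3]))  # 1.0: both killed
```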

Let us consider testing neural networks as an example.

Neural networks are trained using a set of examples and evaluated afterwards. Evaluation is used for assuring that a network reaches a given quality of the prediction outcome.

The sets of examples used for training and evaluation have to be distinct. The question now is whether this evaluation is good enough to replace further testing effort. The answer is no, not only because of adversarial attacks (Su, Vargas, and Sakurai 2019) that lead to misclassifications even in case of small input variations. Other reasons for misclassifications are the use of a training data set that does not cover all different examples, and other aspects like the distribution of examples. Furthermore, note that variations of the appearance of objects in the real world often exist. In Figure 2 we depict different images of the traffic sign "do not enter", ranging from a bent sign to occlusions caused by attached stickers. An autonomous car would always be required to handle these cases, and it is very unlikely that all such cases are really represented in the training data set. Moreover, even if they were, misclassifications would still occur, requiring us to assure that there is no unwanted effect on the behavior of the overall system.

Figure 2: Different variants of the "do not enter" traffic sign someone sees in reality: (a) is the original sign, (b) the traffic sign with a bend, a sticker, and partially missing color, and (c) and (d) are traffic signs with various stickers attached.

There is plenty of literature regarding different testing approaches for neural networks, e.g., (Pei et al. 2017; Sun, Huang, and Kroening 2018; Ma et al. 2018b,a) and most recently (Kim, Feldt, and Yoo 2019; Sekhon and Fleming 2019). In some of these methods, adapted versions of coverage and mutation score for neural networks have also been used. Unfortunately, coverage information may be somewhat misleading (Li et al. 2019), leaving the question regarding the quality of the test suite open.

In the case of neural networks, we may also ask whether classical coverage or mutation score as used in ordinary software engineering can serve as a quality measure when testing a current neural network implementation. (Chetouane, Klampfl, and Wotawa 2019) showed that making use of these measures when testing the configuration of neural networks, i.e., setting the type of neurons and the number of layers and neurons, can be justified. Unfortunately, this is not the case when testing the whole neural network library, as discussed in (Klampfl, Chetouane, and Wotawa 2020). Hence, for neural networks, other measures and means for testing shall be provided.

Although we may need to live with the challenge that we cannot completely test certain system parts and that there is always a critical case where the AI part of a system may deliver a wrong result, the further question is whether this establishes a problem for the whole system. The answer in this case is no, provided that the system itself is able to detect this critical case and to react appropriately. For example, in autonomous driving, we may make use of more than one sensor for obtaining information regarding objects around the vehicle and use sensor fusion to obtain reliable information. We only need to assure that the whole system interacts with the environment in a way that is dependable and fulfills our requirements, including ethical or moral considerations. Hence, identifying critical scenarios between the system and its environment seems to be a crucial factor of testing AI-based systems (Koopman and Wagner 2016; Menzel, Bagschik, and Maurer 2018).
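As a toy illustration of this redundancy idea (plain majority voting over independent perception channels, which is of course much simpler than real sensor fusion), consider:

```python
from collections import Counter

def fused_object_class(channel_outputs):
    """Majority vote over independent sensor channels (e.g., camera, radar,
    lidar classifiers). Returns None when no strict majority exists,
    signaling the system to fall back to a defensive behavior."""
    label, votes = Counter(channel_outputs).most_common(1)[0]
    return label if votes > len(channel_outputs) / 2 else None

# Usage sketch: one channel misclassifies, the fused result is still correct.
# print(fused_object_class(["pedestrian", "pedestrian", "unknown"]))
```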

Moreover, it also seems important to consider that critical scenarios often originate from different conditions that have to occur at the same time. One issue, e.g., missing a certain traffic sign, may not lead to an accident on its own, but in combination with other issues it would.

We summarize our discussion in the following position:

Position 1 Testing aims at identifying interactions between the system under test and its environment leading to an unexpected behavior. When testing systems utilizing AI, we have to consider testing all parts of a system, including the parts with and without AI as well as their interactions. Evaluating performance characteristics of the implemented AI methodology may not be sufficient for assuring that quality criteria are met.

Most testing is performed during the development of systems before deployment. In some cases, certification (or even homologation), i.e., the formal confirmation that an application, product, or system meets its required characteristics, is needed. In the case of AI technology, we are interested in the system fulfilling dependability goals like safety, but maybe also given ethical or moral rules. For example, we want a conversational agent or a decision support system not to be racist or sexist. Furthermore, because a system's underlying software is updated regularly in order to cope with changes required because of bugs or improved functionality, there is a need to carry out any certification regularly as well. For example, in autonomous driving we have to assure that a new software update is not going to lead to an unsafe system. However, regression testing may require a lot of effort or come with high costs, which may be reduced by automating testing.

Hence, automating at least part of certification may be a future requirement. But how can certification of AI be carried out? What we need is a process where we identify what we want to achieve and how this can be checked (or tested).

How can we come up with certain parameters justifying that testing is appropriate? We shall also think about the methods for checking, their limitations, and how to assure that the methods can guarantee (with respect to a given certainty) that the system fulfills the requested needs. However, in any case, in order to bring AI technology into practice, we have to convince customers that the systems do no harm. Certification that takes into account such customer considerations as well as regulations provides the right means for further supporting the delivery of AI technology into the practical applications we are using on a daily basis.

It is worth noting that there are many initiatives, like the ethics guidelines for trustworthy AI (Pietilä et al. 2019), providing first steps of how AI-based systems have to be constructed, evaluated, and verified. However, for example, in autonomous driving such principles have to be concretized, leading to practical rules companies can follow when developing AI systems or systems at least partially based on AI methodologies and tools.

Position 2 There is a need for well-defined certification and homologation processes for AI-based systems that ideally can be carried out in an automated way. Such certification and homologation processes shall rely on existing guidelines considering all aspects of trustworthy AI.

When we want to carry out certification in an at least partially automated way, we may rely on testing. Hence, we have to ask whether existing testing techniques can be used for confirming that an AI-based system fulfills regulations and other rules and expectations. Besides testing functionality, this includes the degree of fulfilling generally agreed ethical and moral rules. In the following section, we introduce three techniques that can (at least partially) serve this purpose.

Testing AI

As discussed, there seems to be a need for testing the whole system considering functional and non-functional requirements, including moral and ethical rules. For testing at the system level, black-box approaches are used that do not consider the internal structure. Various methods with corresponding tools have been proposed, including model-based testing (MBT) (Utting and Legeard 2006), combinatorial testing (CT) (Kuhn et al. 2015), and metamorphic testing (Chen, Cheung, and Yiu 1998). MBT makes use of a model of the system for obtaining test cases. In order to find critical interactions between the system and its environment, this may not be sufficient. It would be required to model the environment, including potential interactions, and to observe the reactions of the system.

The focus on modeling the environment of the system in order to obtain test cases is somewhat different from ordinary MBT, where a model of the system is used for test case generation. Changing from modeling the system to modeling the environment is necessary for finding critical interactions between an AI-based system and its environment. Moreover, in this kind of testing we are not interested in showing that an implementation works according to a model, but that it is capable of handling arbitrary interactions that may not be foreseen during development.

In contrast to MBT, CT has been developed to search for critical interactions between configuration parameters and inputs. It has been shown that CT can effectively detect faults in many different kinds of software (Kuhn et al. 2009).


Figure 3: The last episode of a failing test case applied to an implementation of an automated emergency braking system, close to the time where a simulated pedestrian tries to cross the street coming from the right side. The crash occurred in a scenario where another vehicle in front brakes, causing the ego vehicle to brake. A first pedestrian crossing the street from the left passes by, and a second one coming from the right is overlooked by the automated emergency braking system and hit.

The question is whether we can also apply CT to AI testing. In (Li, Tao, and Wotawa 2020), the authors introduced an approach utilizing a model of the system environment in combination with CT for obtaining a test suite. In their paper, the authors not only provide the foundations but also report on a case study where they tested an automated emergency braking (AEB) function. Out of 319 test cases, 9 led to crashes (including test cases where pedestrians would have been killed, see Figure 3), and 30 were considered critical. It is worth noting that the proposed overall approach also includes a simulation environment for automatically carrying out the generated test cases in a realistic setting.
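To illustrate the combinatorial part, the following sketch greedily builds a strength-2 (pairwise) test suite over a small set of environment parameters; the parameter names and values are invented for illustration and are not those of the cited study:

```python
from itertools import combinations, product

# Hypothetical AEB scenario parameters (invented for illustration).
PARAMETERS = {
    "ego_speed_kmh":        [30, 50, 70],
    "weather":              ["dry", "rain", "fog"],
    "pedestrian_from":      ["left", "right", "none"],
    "lead_vehicle_braking": [True, False],
}

def pairwise_suite(params):
    """Greedy construction of a pairwise test suite: pick scenarios until
    every pair of parameter values appears in at least one test."""
    names = list(params)

    def pairs_of(combo):
        return {(i, combo[i], j, combo[j])
                for i, j in combinations(range(len(names)), 2)}

    candidates = list(product(*params.values()))
    uncovered = set()
    for combo in candidates:
        uncovered |= pairs_of(combo)
    suite = []
    while uncovered:
        # Choose the scenario covering the most still-uncovered value pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        suite.append(dict(zip(names, best)))
        uncovered -= pairs_of(best)
    return suite

suite = pairwise_suite(PARAMETERS)
exhaustive = 1
for values in PARAMETERS.values():
    exhaustive *= len(values)
print(f"{len(suite)} pairwise tests instead of {exhaustive} exhaustive scenarios")
```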

(Klück et al. 2019) introduced an alternative method for generating critical scenarios, where the authors rely on genetic algorithms for obtaining test cases. The idea is to model test cases as genes that can be crossed and mutated. An evaluation function maps test cases to a goodness value. In each generation, the best test cases are taken, modified, and evaluated again. This kind of testing is also referred to as search-based testing. In (Klück et al. 2019), the authors evaluated this approach using an AEB function as well. The obtained results showed that genetic algorithms can be applied to detect faults in the setting of autonomous and automated driving, leading to the following position (a minimal sketch of such a search loop follows the position statement):

Position 3 Combinatorial testing and search-based testing are effective testing techniques for identifying critical scenarios.
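As a minimal sketch of the search-based idea, a genetic loop minimizing the time to collision could look as follows; all parameter names, value ranges, and the stand-in fitness function are invented for illustration, not taken from the cited study:

```python
import random

# Hypothetical scenario parameters for an AEB test (invented ranges).
BOUNDS = {
    "ego_speed_kmh":  (20.0, 90.0),
    "ped_speed_ms":   (0.5, 3.0),
    "ped_distance_m": (5.0, 60.0),
}

def simulate(scenario):
    """Stand-in for running the scenario in a simulation environment and
    measuring the minimal time to collision (TTC); a real setup would call
    the simulator here instead of this toy formula."""
    gap = scenario["ped_distance_m"] / max(scenario["ego_speed_kmh"] / 3.6, 1e-6)
    return max(0.0, gap - 1.0 / (1.0 + scenario["ped_speed_ms"]))

def random_scenario():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}

def crossover(a, b):
    return {k: random.choice((a[k], b[k])) for k in BOUNDS}

def mutate(s, rate=0.2):
    return {k: random.uniform(*BOUNDS[k]) if random.random() < rate else v
            for k, v in s.items()}

def search_critical_scenario(pop_size=20, generations=30):
    """Genetic search minimizing the TTC: lower fitness = closer to a crash."""
    population = [random_scenario() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=simulate)        # most critical scenarios first
        parents = population[:pop_size // 2]
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return min(population, key=simulate)

print(search_critical_scenario())
```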

When applying CT and also search-based testing to autonomous and automated driving functions, the property to be fulfilled is always that no crash with another car or even a pedestrian occurs. In this context, closeness to a crash is often represented as the time to collision (TTC), where 0 means that a crash occurs. Usually, in many applications, positive but small TTC values are also considered unwanted. When testing in the automotive domain, including autonomous driving, we can always rely on the TTC for judging whether a test case passes or fails. Hence, there the test oracle can be automated using the TTC, which is not always the case when testing AI. We therefore require other means for dealing with the oracle problem, i.e., providing a function that allows us to distinguish passing executions of programs and systems from failing ones.

The objective behind metamorphic testing (Chen, Cheung, and Yiu 1998) is to provide a solution to the oracle problem of testing. The underlying idea is to define relations over different inputs that always deliver the same output. For example, sin(x) is equivalent for all values of x and x + 2·π, i.e., sin(x) = sin(x + 2·π) always holds.
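This relation directly yields an executable test without any expected output values; a minimal sketch:

```python
import math
import random

def test_sine_periodicity(trials=1000, tol=1e-6):
    """Metamorphic relation: sin(x) = sin(x + 2*pi) must hold for any x,
    so no expected outputs are needed as an oracle."""
    for _ in range(trials):
        x = random.uniform(-1000.0, 1000.0)
        assert abs(math.sin(x) - math.sin(x + 2 * math.pi)) <= tol

test_sine_periodicity()
```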

In (Guichard et al. 2019) and more specifically in (Bozic and Wotawa 2019), the authors proposed the use of metamorphic testing for testing conversational agents, i.e., chatbots. The underlying idea was to propose relations considering semantic relationships between words and sentences, e.g., some sentences have the same semantics when replacing one word with a synonym, or sometimes the sequence of sentences given to a chatbot does not change the answer provided by the chatbot. Moreover, we are able to test for the fulfillment of certain moral and ethical regulations. For example, if an answer of a chatbot should not be influenced by the race or sex of the chat participant, we can formulate this as a metamorphic relation, where we say that a conversation mentioning one race or sex should lead to the same result when the race or sex is changed. In the case of AI systems where we are able to come up with metamorphic relations, we are also able to apply metamorphic testing for solving the oracle problem.
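As a sketch of such a fairness-oriented relation (the chatbot interface and the strict string comparison are simplifying assumptions, not the cited authors' implementation), one could write:

```python
def check_demographic_invariance(chatbot, template, groups):
    """Metamorphic relation as a fairness oracle: substituting different
    demographic terms into the same prompt must not change the answer.

    chatbot:  hypothetical function mapping a prompt string to an answer
    template: prompt containing a {group} placeholder
    groups:   demographic terms to substitute, e.g. ["male", "female"]
    """
    answers = [chatbot(template.format(group=g)) for g in groups]
    # Strict string equality; a practical oracle would rather compare
    # answers by semantic similarity.
    return all(a == answers[0] for a in answers)

# Usage sketch (my_bot is a placeholder for a real conversational agent):
# ok = check_demographic_invariance(
#     my_bot, "Should the {group} applicant receive the loan?",
#     ["male", "female"])
```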

Position 4 Metamorphic testing seems to be of use for addressing the test oracle problem of AI systems, allowing us to identify contradictions with requirements, which may include ethical and moral considerations.

There are more system testing approaches that can also be adapted to fit the purpose of AI testing with the objective of assuring the safety of AI-based systems and software. However, we have identified approaches for which there is experimental evidence that they can be effectively used for testing AI-based systems. These approaches may also fit into certification and homologation processes. For this purpose, certain measures have to be developed that can be used for deciding when to stop testing in cases where no failing test case could be obtained.

Moreover, the presented methods and techniques for testing AI-based systems have disadvantages. They mainly focus on quality assurance of the overall system and not of its comprising parts. For example, the CT approach considers a model of the environment, which works as the basis for obtaining the CT input model. The approach tests whether certain interactions of the SUT with its environment reveal a fault, and in the case of automated or autonomous driving a crash, but does not consider any knowledge regarding the SUT's internal structure or behavior. Finding out the root cause of any misbehavior within the SUT might be complicated. Moreover, we are not able to make use of quality assurance measures like code coverage or mutation score for the particular test suite. Furthermore, CT, like MBT, requires concretizing the abstract test cases computed using these testing methods. This concretization step causes additional effort and has to be done carefully in order to come up with good test cases that can be executed and most likely reveal a fault.

In the case of metamorphic testing, it is essential to define the metamorphic relations, which causes additional effort and influences the ability to work as a good test oracle. There may be metamorphic relations leading to test cases a SUT can easily fulfill, allowing only a fraction of the functionality to be tested. In such cases, metamorphic testing would not lead to tests covering most of the functionality and can, therefore, be considered incomplete. Search-based testing requires implementing a search procedure using a function that estimates the quality of a current test, e.g., the ability of a test to reveal a fault. Again, this requires additional effort and costs. It is worth noting that in some cases random testing, i.e., generating test inputs using a random procedure, also provides fault-revealing test cases, requiring even less time than search-based testing at almost no additional cost.

Conclusion

In this position paper, we focused on providing an answer to the question whether there exist testing techniques that can be efficiently used for checking that a software or system comprising AI methodologies fulfills requirements, including moral and ethical rules and regulations. We also discussed the involved challenges of testing, where we also identified shortcomings that arise when focusing only on specific parts and not providing a holistic view. Finally, we introduced several testing methods that have been developed in the context of testing ordinary systems and elaborated on their usefulness in the context of AI-based systems. Search-based testing, combinatorial testing, and metamorphic testing seem to be excellent candidates for this purpose and may also be of use for automating certification and homologation processes for AI applications.

However, further studies have to be carried out. For CT, more experiments making use of other autonomous and automated functions have to be considered. Moreover, we need to come up with certain measures of guarantees for the computed test suites. Parameters of CT like the combinatorial strength may be sufficient, but in the context of AI-based systems there is no experimental evidence. For metamorphic testing, we further need more use cases and experimental evaluations making use of AI-based systems. In the case of chatbots and also logic-based reasoning, metamorphic testing has already been successfully applied. However, there is a need to show the usefulness of metamorphic testing in other applications where AI technology is a central part as well.

Acknowledgments

The research was supported by ECSEL JU under the project H2020 826060 AI4DI - Artificial Intelligence for Digitising Industry. AI4DI is funded by the Austrian Federal Ministry of Transport, Innovation and Technology (BMVIT) under the program "ICT of the Future" between May 2019 and April 2022. More information can be retrieved from https://iktderzukunft.at/en/.

References

Ammann, P.; Offutt, J.; and Huang, H. 2003. Coverage Criteria for Logical Expressions. In Proceedings of the 14th International Symposium on Software Reliability Engineering, ISSRE '03. Washington, DC, USA: IEEE Computer Society.

Bozic, J.; and Wotawa, F. 2019. Testing Chatbots Using Metamorphic Relations. In Gaston, C.; Kosmatov, N.; and Le Gall, P., eds., Testing Software and Systems, 41–55. Cham: Springer International Publishing. ISBN 978-3-030-31280-0.

Chen, T.; Cheung, S.; and Yiu, S. 1998. Metamorphic testing: a new approach for generating next test cases. Technical Report HKUST-CS98-01, Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong.

Chetouane, N.; Klampfl, L.; and Wotawa, F. 2019. Investigating the Effectiveness of Mutation Testing Tools in the Context of Deep Neural Networks. In IWANN (1), volume 11506 of Lecture Notes in Computer Science, 766–777. Springer.

Goodfellow, I.; McDaniel, P.; and Papernot, N. 2018. Making Machine Learning Robust Against Adversarial Inputs. Commun. ACM 61(7): 56–66. ISSN 0001-0782. doi:10.1145/3134599. URL http://doi.acm.org/10.1145/3134599.

Guichard, J.; Ruane, E.; Smith, R.; Bean, D.; and Ventresque, A. 2019. Assessing the Robustness of Conversational Agents using Paraphrases. In 2019 IEEE International Conference On Artificial Intelligence Testing (AITest), 55–62.

Jia, Y.; and Harman, M. 2011. An Analysis and Survey of the Development of Mutation Testing. IEEE Transactions on Software Engineering 37(5): 649–678.

Kim, J.; Feldt, R.; and Yoo, S. 2019. Guiding Deep Learning System Testing Using Surprise Adequacy. In Proceedings of the 41st International Conference on Software Engineering, ICSE '19, 1039–1049. IEEE Press. doi:10.1109/ICSE.2019.00108. URL https://doi.org/10.1109/ICSE.2019.00108.

Klampfl, L.; Chetouane, N.; and Wotawa, F. 2020. Mutation Testing for Artificial Neural Networks: An Empirical Evaluation. In IEEE 20th International Conference on Software Quality, Reliability and Security (QRS), 356–365. IEEE.

Klück, F.; Zimmermann, M.; Wotawa, F.; and Nica, M. 2019. Performance Comparison of Two Search-Based Testing Strategies for ADAS System Validation. In Gaston, C.; Kosmatov, N.; and Le Gall, P., eds., Testing Software and Systems, 140–156. Cham: Springer International Publishing. ISBN 978-3-030-31280-0.

Koopman, P.; and Wagner, M. 2016. Challenges in Autonomous Vehicle Testing and Validation. SAE Int. J. Trans. Safety 4: 15–24. doi:10.4271/2016-01-0128. URL https://doi.org/10.4271/2016-01-0128.

Köroglu, Y.; and Wotawa, F. 2019. Fully automated compiler testing of a reasoning engine via mutated grammar fuzzing. In Choi, B.; Escalona, M. J.; and Herzig, K., eds., Proceedings of the 14th International Workshop on Automation of Software Test, AST@ICSE 2019, May 27, 2019, Montreal, QC, Canada, 28–34. IEEE / ACM. doi:10.1109/AST.2019.00010. URL https://doi.org/10.1109/AST.2019.00010.

Kuhn, D.; Kacker, R.; Lei, Y.; and Hunter, J. 2009. Combinatorial Software Testing. Computer, 94–96.

Kuhn, D. R.; Bryce, R.; Duan, F.; Ghandehari, L. S.; Lei, Y.; and Kacker, R. N. 2015. Combinatorial Testing: Theory and Practice. In Advances in Computers, volume 99, 1–66.

Li, Y.; Tao, J.; and Wotawa, F. 2020. Ontology-based test generation for automated and autonomous driving functions. Inf. Softw. Technol. 117. doi:10.1016/j.infsof.2019.106200. URL https://doi.org/10.1016/j.infsof.2019.106200.

Li, Z.; Ma, X.; Xu, C.; and Cao, C. 2019. Structural Coverage Criteria for Neural Networks Could Be Misleading. In 2019 IEEE/ACM 41st International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), 89–92.

Ma, L.; Zhang, F.; Sun, J.; Xue, M.; Li, B.; Juefei-Xu, F.; Xie, C.; Li, L.; Liu, Y.; Zhao, J.; et al. 2018a. DeepMutation: Mutation testing of deep learning systems. In 2018 IEEE 29th International Symposium on Software Reliability Engineering (ISSRE), 100–111. IEEE.

Ma, L.; Zhang, F.; Xue, M.; Li, B.; Liu, Y.; Zhao, J.; and Wang, Y. 2018b. Combinatorial testing for deep learning systems. arXiv preprint arXiv:1806.07723.

Menzel, T.; Bagschik, G.; and Maurer, M. 2018. Scenarios for Development, Test and Validation of Automated Vehicles. arXiv:1801.08598. URL https://arxiv.org/abs/1801.08598. Appeared in Proc. of the IEEE Intelligent Vehicles Symposium.

Pei, K.; Cao, Y.; Yang, J.; and Jana, S. 2017. DeepXplore: Automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles, 1–18. ACM.

Pietilä, P. A.; et al. 2019. Ethics Guidelines For Trustworthy AI. High-Level Expert Group on AI, European Commission.

Sekhon, J.; and Fleming, C. 2019. Towards Improved Testing For Deep Learning. In 2019 IEEE/ACM 41st International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), 85–88.

Su, J.; Vargas, D. V.; and Sakurai, K. 2019. One Pixel Attack for Fooling Deep Neural Networks. IEEE Transactions on Evolutionary Computation, 1–1. ISSN 1089-778X. doi:10.1109/TEVC.2019.2890858.

Sun, Y.; Huang, X.; and Kroening, D. 2018. Testing deep neural networks. arXiv preprint arXiv:1803.04792.

Utting, M.; and Legeard, B. 2006. Practical Model-Based Testing - A Tools Approach. Morgan Kaufmann Publishers Inc.

Wotawa, F. 2016. Testing Self-Adaptive Systems using Fault Injection and Combinatorial Testing. In Proceedings of the Intl. Workshop on Verification and Validation of Adaptive Systems (VVASS 2016). Vienna, Austria.

Wotawa, F. 2018. Combining Combinatorial Testing and Metamorphic Testing for Testing a Logic-based Non-Monotonic Reasoning System. In Proceedings of the 7th International Workshop on Combinatorial Testing (IWCT) / ICST 2018.

Wotawa, F. 2019. On the importance of system testing for assuring safety of AI systems. In CEUR Workshop Proceedings, Workshop on Artificial Intelligence Safety, AISafety 2019, volume 2419. Macao, China. URL http://ceur-ws.org/Vol-2419/.
