Risk estimation and prediction of cyber attacks


Academic year: 2021


Risk Estimation and Prediction of Cyber Attacks

Thesis

Pavel Yermalovich

Doctorate in Computer Science

Philosophiæ doctor (Ph. D.)

Québec, Canada


Résumé

The use of information is closely tied to its security. By exploiting vulnerabilities, a third party can compromise the security of an information system. Threat modelling helps predict the most likely attacks against a given infrastructure so that they can be better countered.

The proposed research project, “Risk estimation and prediction of cyber attacks”, aims to combine different cyber attack prediction techniques to better protect a computer system. It is necessary to find the most informative parameters, namely the attack prediction markers, in order to create functions that give the probability of an attack as a function of time. Predicting an attack is essential for preventing potential risks. Consequently, risk forecasting contributes greatly to optimizing information security budget planning. This scientific work focuses on the ontology and the stages of a cyber attack, as well as on the main representatives of the attacking side and their motivation.

Carrying out this scientific work will help determine, in real time, the risk level of an information system in order to reconfigure it and protect it better. To establish the risk level over a selected future time interval, a mathematical decomposition must be performed. To do so, we must select the information system parameters required for the forecasts and their statistical data for risk assessment. Nevertheless, the actual risk level may exceed the established indicator. Let us not forget that risk analysis sometimes takes too long and produces risk values that are already out of date.

In this scientific work, we continue to examine the question of obtaining risk values in real time. To that end, we introduce an automated risk analysis method that helps reveal the risk value at any moment. This method forms the basis for predicting the probability of a targeted cyber attack. The established risk level makes it possible to optimize the information security budget and redistribute it to strengthen the most vulnerable areas.


Abstract

The use of information is inextricably linked with its security. The presence of vulnerabilities enables a third party to breach the security of information. Threat modelling helps to identify the infrastructure areas that are most likely to be exposed to attacks.

This research project, entitled “Risk estimation and prediction of cyber attacks”, aims to combine different techniques for predicting cyber attacks to better protect a computer system. It is necessary to find the most informative parameters, namely the attack prediction markers, to create functions that give the probability of an attack as a function of time. The prediction of an attack is essential for the prevention of potential risk. Therefore, risk forecasting contributes a lot to the optimization of information security budget planning. This scientific work focuses on the ontology and stages of a cyberattack, as well as the main representatives of the attacking side and their motivation.

Carrying out this scientific work will help determine, in real time, the risk level of an information system in order to reconfigure and better protect it. To establish the risk level at a selected time interval in the future, one has to perform a mathematical decomposition. To do this, we need to select the information system parameters required for the predictions and their statistical data for risk assessment. Nevertheless, the actual risk level may exceed the established indicator. Let us not forget that the risk analysis sometimes takes too much time and yields risk values that are already outdated.

In this scientific work, we continue reviewing the issue of obtaining real-time risk values. For this, we introduce an automated risk analysis method, which helps to reveal the risk value at any point in time. This method forms the basis for predicting the probability of a targeted cyber attack. The established risk level will help to optimize the information security budget and redistribute it to strengthen the most vulnerable areas.


Contents

Résumé
Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Foreword
Introduction

1 Formalization of attack prediction problem
1.1 Résumé
1.2 Abstract
1.3 Introduction
1.4 Related Work
1.4.1 Statistical analysis
1.4.2 Structural system analysis
1.4.3 Log analysis
1.5 Formalization of problem
1.5.1 Introduction of our approach on the basis of a simplified example
1.5.2 Formalization of the Problem
1.5.3 Asset valuation
1.5.4 Defense valuation
1.6 Conclusion
Bibliography

2 Determining the probability of cyberattacks
2.1 Résumé
2.2 Abstract
2.3 Introduction
2.3.1 Motivation
2.4 Related Work
2.4.1 Formal analysis of attack graphs
2.4.2 Threat modeling
Injection
2.4.3 Threat Prediction Platform
2.4.4 Types of protection
2.5 Comparison table
2.6 Formal consideration
2.6.1 Problem
2.6.2 Formalization
2.7 Risk analysis
2.7.1 Analysis of business activities or processes
2.7.2 Audit of the protection techniques applied to ensure the assets safety
2.8 System states modeling
2.8.1 Cyberattack probability evaluation in each of the states
2.9 Example of simplified website administration on a cloud dedicated server
2.9.1 Example of assets analysis
2.9.2 Example of protection measures analysis
2.9.3 Example of system states modeling
2.10 Future Work
Bibliography

3 Information security risk assessment based on decomposition probability via Bayesian Network
3.1 Résumé
3.2 Abstract
3.3 Introduction
3.3.1 Motivation
3.3.2 Our Contributions
3.4 Background
3.4.1 Harmonized Method of Risk Analysis
3.4.2 OWASP Risk Rating Methodology
3.4.3 Common Vulnerability Scoring System
3.5 Information security risk assessment based on decomposition of the risk formula
3.5.1 Impact based on the contextual approach
Example 1
Example 2
Example 3
3.5.2 Risk decomposition probability
Attacks
Threats
Vulnerabilities
Exploitability
Attack vectors
3.5.3 Example of simplified website administration on a cloud dedicated server
Example of assets analysis
Example of analysis the vulnerability
Example of protection measures analysis
Example of threat analysis
Example of analysis of the probability of attack P(a)
Example of analysis of the probability of attacks P(A)
3.6 Related Work
Bibliography

4 Ontology-based model for security assessment: predicting cyber attacks through threat activity analysis
4.1 Résumé
4.2 Abstract
4.3 Introduction
4.3.1 Motivation
4.3.2 Our Contributions
4.4 Components of ontology
4.4.1 Definitions
4.4.2 Types of cyber attacks
4.4.3 Attack Patterns
Attack Pattern ID
Description
Likelihood Of Attack
Typical Severity
Relationship
Execution Flow
Prerequisites
Skills Required
Consequences
Mitigations
Related Weaknesses
4.4.4 Components of a successful cyberattack
Reconnaissance
Scanning
Weaponization
Exploitation: Access and escalation
Exfiltration
Assault
Sustainment
Obfuscation
Taking back control
4.4.5 Classification of hackers
White hat
Black hat
Grey hat
Script kiddie
Blue hat
Hacktivists (cyberactivists)
4.4.6 Theory of human needs and motivation
4.5 Decomposing Attack Probabilities into Probability Components
4.5.1 Attacks
4.5.2 Threats
4.5.3 Vulnerabilities
4.5.4 Exploitability
4.5.5 Attack vectors
4.6 Prediction of a cyberattack based on categorization of threats
4.7 Future work
Bibliography

5 Dashboard Visualization Techniques in Information Security
5.1 Résumé
5.2 Abstract
5.3 Introduction
5.4 Background
5.4.1 Definition of Data Dashboard in Information Security
5.4.2 Stages of Creation of Data Dashboards
5.5 Methodology
5.6 Definition of users of Data Dashboard
5.6.1 Responsibilities of ISM and ISC
5.7 Creation of indicators
5.7.1 Information required for completion of tasks by ISM
5.7.2 Information required for completion of tasks by ISC
5.8 Visualization Techniques
5.8.1 List of visual presentation techniques
5.8.2 Visualization prototype
5.9 Layout Testing
5.10 Related Work
5.11 Future Work
Bibliography

6 Risk Forecasting Automation on the Basis of MEHARI
6.1 Résumé
6.2 Abstract
6.3 Introduction
6.3.1 Motivation
6.3.2 Our Contributions
6.4 Background Information
6.4.1 Harmonized Method of Risk Analysis
6.5 Proposed method
6.5.1 Attacks
6.5.2 Threats
6.5.3 Vulnerabilities
6.5.5 Attack vectors
6.6 Experiment result
6.7 Related work
6.7.1 OWASP Risk Rating Methodology
6.7.2 Quantitative CVSS-based cyber security risk assessment methodology
Bibliography

Conclusion


List of Tables

1.1 MySQL Stored SQL Injection (CVE-2013-0375) [14]
1.2 Example of asset valuation applying MEHARI [11] for content management: confidentiality (C), integrity (I), availability (A) and efficiency (E) (management processes, with respect to compliance with laws or regulations)
1.3 Set of options for calculating probability and influence
2.1 Comparison of Risk Prediction methodologies and techniques
3.1 Visualization of the component $P(\vec{v} \mid d_j)$
3.2 Example of the likelihood of successfully exploiting the vulnerability
3.3 Example of the likelihood of successfully exploiting the vulnerability through the attack vector when counteracting the protective equipment
3.4 Example of the likelihood of successfully exploiting the vulnerability via threat
3.5 Example of the attack probability value (SQL-injection)
3.6 Example of the likelihood of attacks
4.1 Hackers and their motivation
4.2 Visualization of the component $P(\vec{v} \mid d_j)$
6.1 Main risk assessment methods
6.2 Content comparison in various MEHARI versions
6.3 Visualization of component $P(\vec{v} \mid d_j)$


List of Figures

0.1 Research steps
0.2 General scheme of the research project
0.3 Chronology of preparation of articles and their abstracts
1.1 Prediction timeline [18]
1.2 User classification steps [6]
1.3 Website administration on a cloud dedicated server
1.4 Representation of the use of firewall in the workplace
1.5 Risk analysis
1.6 System configuration change cycle
2.1 Network vulnerability analysis [31]
2.2 Reminder based on AI2 system bandwidth and unsupervised outlier analysis after three months of deployment [32]
2.3 General scheme of the research project
2.4 MEHARI Synthesis table: Risk seriousness [22]
2.5 Graphical representation of the periods when risks take the passed calculated value
2.6 System configuration change cycle
2.7 System states modeling
2.8 Graphical representation of states and risk levels
2.9 Attack consists of exploiting an existing vulnerability using an attack vector
2.10 Visualized decomposition of an attack on an asset
2.11 Website administration on a cloud dedicated server
2.12 The mechanism of game for cybersecurity
3.1 Graphical representation of the periods when risks take the passed calculated value
3.2 Visualized decomposition of an attack on an asset
3.3 Graphical representations of the example of Impact based on the contextual approach
3.4 Graphical representations of the examples of Impact based on the contextual approach
3.5 Attack consists of exploiting an existing vulnerability using an attack vector
3.6 The attack consists of exploiting an existing vulnerability using an attack vector
3.7 Example based on the administration of an Internet site hosted on a dedicated virtual server
4.1 Ontology components
4.2 Successful cyberattack
4.3 Classification of hackers
4.4 Attack probability decomposition: exploiting the existing vulnerability using an attack vector
4.5 Threat activity analysis for establishing the likelihood of an attack in the future
4.6 Cyberattack prediction stages
5.1 Schematic connection of dashboard components
5.2 Information overloaded dashboard
5.3 Visualization prototype
5.4 The prototype presented on the tablet
5.5 The main tasks of SOC
5.6 Microarchitecture of SOC
6.1 Graphical representation of risks taking the passed calculated values
6.2 Graphical structural representation of MEHARI Expert [9]
6.3 Risk assessment scheme proposed by MEHARI Expert
6.4 Attack probability decomposition: exploiting the existing vulnerability using an attack vector
6.5 Risk assessment and prediction scheme proposed by MEHARI Expert
6.6 Example based on the administration of an Internet site hosted on a dedicated virtual server [19]
6.7 Security level determining speed: New approach versus MEHARI
6.8 Information Security Risk Assessment Model Based on Dynamic Bayesian Networks [14]


The greatest victory is that which requires no battle.


Acknowledgements

First of all, I would like to sincerely thank my research advisor, Professor Mohamed Mejri, for giving me a chance to work with him and for his support and guidance. I could not have asked for a better advisor! You are brilliant, passionate and you do not hesitate to help the people around you! I especially want to thank you for sharing your experience and encouraging me to explore my ideas regularly, even when they were not directly linked to the research topic. You were always inspiring me to do more. It is without hesitation and with all my heart that I would recommend you to another student.

Special thanks to the following individuals who helped me achieve the completion of this work:

• Great Magister Wiwa
• Alexander Kovalev
• Nadia Tawbi
• Pascal Tesson
• Béchir Ktari
• Anne Laurent
• Andrey Kajava
• Mariia A. Umryk
• Iryna Shuliak
• Oleksandra Levchenko
• Mahedine Djamaï
• Bita Sadeghi-Tabatabai
• André L’Écuyer
• Anatoly L. Zharin
• Mohamed Barkaoui
• Raphael Khoury

I would like to sincerely thank the members of the Jury for agreeing to participate in correcting this thesis, but also for the attention they have shown to my work.

Finally, I would like to thank my whole family for their support. It is thanks to you that I am where I am and who I am today.

Thanks to Aqua, for teaching me the truth: "We have little time, so we will do it anyhow". God’s Blessing on this Wonderful World!


Foreword

We present in this thesis six different papers, all of which have been peer-reviewed and presented at various conferences. Here is some information regarding the authors and their roles for each paper.

Formalization of Attack Prediction Problem

The paper was written by Pavel Yermalovich jointly with Professor Mohamed Mejri. It was published in the Proceedings of the 2018 IEEE International Conference "Quality Management, Transport and Information Security, Information Technologies" (IT&QM&IS). Changes between the integrated version of the article and its published version include:

• Extended version of section Formalization of the Problem.
• Added section Asset valuation (example of using theoretical calculations).
• Added section Defense valuation (example of using theoretical calculations).

The article was written by Pavel Yermalovich before the presentation of the research project. Professor Mohamed Mejri coordinated the research direction and pointed out the priority research areas. Pavel Yermalovich took over the entire process of writing and editing the scientific article.

Determining the probability of cyberattacks

The paper was written by Pavel Yermalovich jointly with Professor Mohamed Mejri. It was published in the Proceedings of the "5th International Conference on Engineering and Formal Sciences Brussels" in 2020. Following the presentation, this paper was selected for publication in the "European Journal of Engineering and Formal Sciences" in 2020. Changes between the integrated version of the article and its published version include:

– Added subsection Formal analysis of attack graphs.
– Added subsection Threat modelling.
– Added subsection Threat Prediction Platform.
– Added subsection Types of protection.
– Added subsection Comparison table of methodologies and techniques relating to the Risk Prediction Problem.

• Extended version of section System states modelling.

• Extended version of section Example of simplified website administration on a cloud dedicated server.

Pavel Yermalovich is the main author. Professor Mohamed Mejri provided feedback throughout the article. Pavel Yermalovich took over the entire process of writing and editing the scientific article.

Information security risk assessment based on decomposition probability via Bayesian Network

The paper was written by Pavel Yermalovich jointly with Professor Mohamed Mejri. It was accepted for publication in the Proceedings of the "IEEE International Symposium on Networks, Computer and Communications" in 2020. The symposium was postponed from June 2020 to October 2020 due to the COVID-19 pandemic. Changes between the integrated version of the article and its published version include:

• Added section Background:
– Added subsection Harmonized Method of Risk Analysis.
– Added subsection OWASP Risk Rating Methodology.
– Added subsection Common Vulnerability Scoring System.

• Extended version of section Information security risk assessment based on decomposition of the risk formula.

• Added subsection Example of simplified website administration on a cloud dedicated server:

– Extended version of subsection Example of assets analysis.

– Extended version of subsection Example of analysis the vulnerability.
– Extended version of subsection Example of protection measures analysis.

– Extended version of subsection Example of threat analysis.

– Extended version of subsection Example of analysis of the probability of attack P(a).

– Extended version of subsection Example of analysis of the probability of attacks P(A).

• Extended version of section Related Work.

The article was written by Pavel Yermalovich. Professor Mohamed Mejri helped to implement decomposition probability of risk for information security risk assessment. Pavel Yermalovich took over the entire process of writing and editing the scientific article.

Ontology-based model for security assessment: predicting cyberattacks through threat activity analysis

This paper was written by Pavel Yermalovich jointly with Professor Mohamed Mejri. It was published in the Proceedings of the "13th International Conference on Security and its Applications (CNSA 2020)" in 2020. Following the presentation, the paper was selected for publication in the following journals: "Computer Science & Information Technology (CS & IT) - Print version" and "The International Journal of Computer Networks & Communications (IJCNC) - Online version" in 2020. There is no difference between the integrated version of the article and its published version. The article was written by Pavel Yermalovich. Professor Mohamed Mejri provided feedback on the ideas presented in the paper. Pavel Yermalovich took over the entire process of writing the scientific article.

Dashboard Visualization Techniques in Information Security

The paper was written by Pavel Yermalovich. It was accepted for publication in the Proceedings of the "IEEE International Symposium on Networks, Computer and Communications" in 2020. The symposium was postponed from June 2020 to October 2020 due to the COVID-19 pandemic. Changes between the integrated version of the article and its published version include:

• Added subsection List of visual presentation techniques.
• Extended version of subsection Visualization prototype.
• Extended version of section Related Work.

Risk Forecasting Automation on the Basis of MEHARI

This paper was written by Pavel Yermalovich and Professor Mohamed Mejri. It was accepted for publication in the Proceedings of the "Springer Communications in Computer and Information Science" in 2020. Changes between the integrated version of the article and its published version include:

• Extended version of subsection Attack vectors.
• Extended version of subsection Quantitative CVSS-based cyber security risk assessment methodology.

Professor Mohamed Mejri provided feedback on the ideas presented in the paper. Pavel Yermalovich took over the entire process of writing and editing the scientific article.


Introduction

Today, information systems are used by different stakeholders on a regular basis. The use of information is inextricably linked with its security [9], which is founded on confidentiality, integrity and availability. Each component of an information security base has its own vulnerabilities. Exploiting these vulnerabilities allows a third party to breach security, either entirely or partially (a partial breach of confidentiality or integrity, or obtaining access to the information).

However, the number of identified vulnerabilities, including Zero Day vulnerabilities, is growing alongside the number of hackers and information security experts. Hackers can identify and exploit a Zero Day [10] vulnerability that is not yet known to the global community. In this case, modeling may result in an inaccurate or false security assessment, because the analysis is based solely on known vulnerabilities. A proactive scan does not ensure 100% protection [8] against Zero Day vulnerabilities. This may necessitate strengthening the safety rings around some links that are less prone to attacks. However, there may not be enough time to strengthen the protection of the weak links in the system.

Today there are different systems that analyze logs [3] and network activity (NIDS) [1]. These systems rely on already known and established parameters to reveal activity that deviates from the "normal" level. This "normal" level is set by an information security specialist based on his or her own experience. This method has several disadvantages. First, it relies on the experience of the specialist who sets up the security system. Once the system has been installed and configured, one only receives system alerts that require further investigation. Initially, we do not know whether an alert signals a real attack or a non-standard situation that the security specialist did not anticipate during configuration. Analyzing such information can take a long time.

If the attack is successful, it is important to react quickly and correctly to the incident. It is worth noting that checking Command and Control traffic, also known as C&C or C2, is a complicated process. If the infected server receives commands through non-standard (exotic) channels, such as tweets, an ICMP tunnel, or short-range RF protocols such as Bluetooth, the probability of detecting communication between the infected server and the management infrastructure will be very low [4]. For this reason, it is very important to have tools that can predict an attack or recognize it while it is being carried out.

System threat modelling gives a probabilistic picture of an attack plan. Unfortunately, the simulation is not tied to time: we cannot predict when the attack will be carried out. Periodic scanning of information systems for known vulnerabilities yields only a list of the system's vulnerabilities. This list, however, does not provide an accurate risk assessment for each vulnerability found. Thus, for SIEM (Security Information and Event Management), it is important to have a list of primary actions and reactions ordered by importance, and to classify these primary responses correctly according to the vulnerability found. First, the results of the vulnerability assessment covering the most important assets should be used, in order to protect those assets against the identified critical vulnerabilities.
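As a minimal sketch of such an ordered list, the findings of a scan can be sorted so that the most important assets and the most severe vulnerabilities come first. The `Finding` type, the importance scale and the sample data are assumptions made for this illustration only; they are not taken from any SIEM product or from the method developed in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One vulnerability reported by a scan (all fields are illustrative)."""
    asset: str
    vulnerability: str
    asset_importance: int  # assumed scale: 1 (low) to 4 (critical)
    severity: float        # assumed scale: 0.0 to 10.0


def prioritize(findings):
    """Order findings by asset importance first, then by severity."""
    return sorted(
        findings,
        key=lambda f: (f.asset_importance, f.severity),
        reverse=True,
    )


# Hypothetical scan results.
findings = [
    Finding("intranet wiki", "outdated TLS configuration", 1, 5.3),
    Finding("customer database", "SQL injection", 4, 9.8),
    Finding("public website", "cross-site scripting", 3, 6.1),
    Finding("customer database", "weak password policy", 4, 7.5),
]

for f in prioritize(findings):
    print(f"{f.asset}: {f.vulnerability} "
          f"(importance {f.asset_importance}, severity {f.severity})")
```

Sorting on the pair (importance, severity) is only one possible policy; a weighted score would be an equally reasonable choice.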

To date, there are Artificial Intelligence (AI) systems that are trained through the analysis of traffic logs to identify outliers. With this approach, it is possible to identify an attack with a certain probability in real time. The difference between an IDS and AI is that the AI learns without deeply analyzing the attack signature.
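The idea of flagging outliers in log-derived data can be illustrated with a much simpler statistical stand-in. The sketch below is not a learning system and is not the approach of the cited work; it applies a modified z-score based on the median absolute deviation (MAD) to hypothetical per-minute counts. The function name, the sample data and the threshold of 3.5 are assumptions chosen for the illustration.

```python
from statistics import median


def find_outliers(counts, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute deviation
    (MAD), which are less distorted by the outliers themselves than the mean
    and the standard deviation.
    """
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        return []  # the score is undefined when the MAD is zero
    return [
        i for i, c in enumerate(counts)
        if 0.6745 * abs(c - med) / mad > threshold
    ]


# Hypothetical number of failed logins per minute extracted from a log.
failed_logins = [4, 6, 5, 7, 5, 6, 4, 95, 5, 6]
print(find_outliers(failed_logins))
```

No attack signature is involved: a minute is reported only because it differs strongly from the rest of the observed data.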

This research aims to develop an attack prediction system based on various system parameters. Today it is impossible to determine precisely when a planned attack will be carried out and which vector will be used. This confirms the relevance of attack prediction, which makes it possible to identify the levels prone to risk at every moment. Thus, it is proposed to extend risk prediction to all the existing data (the history of risk indicators).

Risk assessment is limited to calculating the risk level at a certain point in time, like a freeze frame in a movie. A full risk analysis encompasses several stages, and each stage takes time. While the values are being calculated, one or several indicators might change radically by the end of one or more stages of the analysis. Hypothetically, the risk indicator can change significantly in the meantime and may even exceed the maximum allowed level. This deviation cannot be tracked promptly because the analysis takes so long. For example, conducting a risk assessment with the MEHARI Expert tool [6] may take more than six months [11]. It follows that, under such a risk assessment model, there are periods that remain uncontrolled.
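The notion of an uncontrolled period can be made concrete with a small simulation. The sketch below assumes a hypothetical daily series of actual risk values and an assessment that captures the risk exactly, but only every `assessment_interval` days; the function name, the scale and the data are invented for the illustration.

```python
def uncontrolled_days(actual_risk, assessment_interval, max_allowed):
    """Return the days on which the actual risk exceeds `max_allowed`
    while the most recent assessment still reports an acceptable value.

    `actual_risk[d]` is the (normally unknown) risk level on day d. An
    assessment is assumed to capture it on days 0, interval, 2 * interval, ...
    """
    days = []
    for day, risk in enumerate(actual_risk):
        last_assessment_day = (day // assessment_interval) * assessment_interval
        reported = actual_risk[last_assessment_day]
        if risk > max_allowed and reported <= max_allowed:
            days.append(day)
    return days


# Hypothetical daily risk levels on an arbitrary 0-10 scale.
risk = [2, 2, 3, 6, 7, 7, 3, 2, 2, 8, 8, 9]
print(uncontrolled_days(risk, assessment_interval=6, max_allowed=5))
```

With a six-day interval the sample series contains six days on which the limit is exceeded unnoticed; with a one-day interval there are none, which is the motivation for obtaining risk values continuously.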

This research is an attempt to propose a method for forecasting the level of risk, in other words, to analyze the current information system and predict the likelihood of attacks on a system S in the future.

• Given: Parameters of information system S:
– a set of business processes;
– a set of protection techniques applied to ensure the safety of assets;
– a set of log files;
– a security policy;
– a risk assessment methodology.
• Find:
– The level of risk in the future.

At this stage, no attack prediction system is integrated into benchmark risk analysis. However, all the required tools and threat predicting methods are at our disposal [7]. Many benchmarks and risk analysis techniques do not rely on attack prediction techniques to predict risk levels. Based on this, it is necessary to create or modify the existing risk analysis methodology by adding an attack prediction component.

The attack prediction system should be able to integrate risk analysis techniques. At the same time, the methods used by the prediction system should be maximally automated. A system comprising the best risk prediction techniques will help to identify the right information security budget.

Attack prediction is essential for preventing a potential risk. Therefore, risk forecasting contributes a lot to the optimization of information security budget planning. This work focuses on the ontology and stages of a cyberattack, as well as the main representatives of the attacking side and their motivation.

Let us consider the risk of losing an asset as a result of a possible attack on it, for each of the traditional security areas of concern: availability (Equation 1), integrity (Equation 2), and confidentiality (Equation 3):

$$R_{\text{asset}}(\text{Attacks}_{\text{availability}}) = \sum_{i=1}^{k} P_{\text{availability}}(\text{attack}_i) \cdot I_{\text{availability}}(\text{attack}_i) \tag{1}$$

$$R_{\text{asset}}(\text{Attacks}_{\text{integrity}}) = \sum_{i=1}^{k} P_{\text{integrity}}(\text{attack}_i) \cdot I_{\text{integrity}}(\text{attack}_i) \tag{2}$$

$$R_{\text{asset}}(\text{Attacks}_{\text{confidentiality}}) = \sum_{i=1}^{k} P_{\text{confidentiality}}(\text{attack}_i) \cdot I_{\text{confidentiality}}(\text{attack}_i) \tag{3}$$

where $R_{\text{asset}}(\text{Attacks})$ is the risk value due to attack events, $P_{\text{availability}}(\text{attack}_i)$ is the probability that the attack event $\text{attack}_i$ causes a loss of availability of the asset, and $I_{\text{availability}}(\text{attack}_i)$ is the impact (a likely consequence) of that loss of availability. The same applies to the integrity and confidentiality of an asset. The number of attacks is expressed by $k$.
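The three equations share the same form, a sum of probability times impact over the k attacks, so a single function can evaluate any of them. The following is a minimal sketch; the function name and the numeric values are hypothetical and serve only to show the arithmetic.

```python
def asset_risk(attacks):
    """Risk for one security concern of one asset.

    `attacks` is an iterable of (probability, impact) pairs, one pair per
    attack i = 1..k. The result is the sum of probability * impact.
    """
    return sum(p * i for p, i in attacks)


# Hypothetical values: probabilities in [0, 1], impacts on an assumed 1-4 scale.
availability_attacks = [(0.5, 4), (0.25, 2)]
integrity_attacks = [(0.125, 4)]
confidentiality_attacks = [(0.5, 2), (0.25, 4)]

print("availability:", asset_risk(availability_attacks))
print("integrity:", asset_risk(integrity_attacks))
print("confidentiality:", asset_risk(confidentiality_attacks))
```

For the availability example, the two attacks contribute 0.5 · 4 = 2.0 and 0.25 · 2 = 0.5, which gives a risk value of 2.5.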

In this thesis we present six different papers that have been peer-reviewed and presented at various conferences. The research steps are displayed in Figure 0.1.

Figure 0.1 – Research steps

Figure 0.2 – General scheme of the research project

To that end, we present in this thesis six papers which introduce new ways of formalizing the attack prediction problem (Chapter 1), new approaches to determining the probability of cyber attacks and to threat activity analysis (Chapters 2 and 4), and new means of information security risk assessment and risk visualization (Chapters 3, 5 and 6). Figure 0.3 displays the chronology of preparation of the articles and their abstracts.

More specifically, the contributions of each paper are described below.

Attack prediction is a broad topic, ranging from the prediction of online attacks to that of crypto attacks. This research is time bound and cannot cover all possible types of attacks. Therefore, our primary focus is on the following issues:

1. Formalization of the cyber attack prediction problem.

2. Development of a mathematical apparatus for risk calculation with threat prediction potential. Threat prediction is considered to be the most important component of risk prediction.

3. Introduction of changes to the Attack Ontology including the introduction of essential revisions in the Attack Prediction section.

4. Modification of the existing risk analysis methodology for risk forecasting.

5. Visualization of the risk for different roles in information security (Information Security Manager and Information Security Consultant).

Based on the accepted goals and time limits of the study, we present the steps required to achieve the objectives.

1. Objective: Analysis of the State of the Art of attack prediction problem:

• Risk methodology MEHARI.

• Detection of attacks (IDS, IPS, etc.).

2. Objective: Systematization of the State of the Art of attack prediction problem:

• Modification of model of ontology for security assessment of attack prediction problem.

• Determining the probability of cyber attacks.

3. Objective: Identification of signatures (direct and indirect) and characteristics of attacks with a certain probability. Prediction approach:

• Attack prediction systems.

• Proposal of attack prediction.

• Creation of new attack prediction approach.

Figure 0.3 – Chronology of preparation of articles and their abstracts

The first objectives cover the systematization of existing knowledge, as well as the definition of attack signatures. They laid the foundation for the “Related Work” sections of the subsequent scientific articles. Systematization and analysis of research data provided the basis for the articles on determining the likelihood of cyber attacks, presented in the paper [13], and on the changes introduced to the existing attack ontology, presented in the paper [17]. These articles were based on the mathematical decomposition of an attack, a method presented in [16].

Previous articles provided a foundation for a method that uses the accumulated material to adapt MEHARI for risk prediction. This technique allows the future risk level to be predicted (this article is under consideration by the conference commission for admission to publication).

The proposed scientific developments listed in the articles rely on the following techniques:

1. Formalization of Attack Prediction Problem:

• Systematization and analysis.

• Structural system analysis.

• Log analysis.

• Formalization.

2. Determining the probability of cyber attacks:

• Systematization and analysis.

• Statistical analysis.

• Machine learning to identify outliers.

• Analysis of business activities or processes.

• Audit of the protection techniques applied to ensure the safety of assets.

• System states modeling.

• Cyber attack probability evaluation.

• Visualized decomposition of an attack on an asset.

• Signatures and Heuristic Detections.

3. Information security risk assessment based on decomposition probability via Bayesian Network:

• Law of Total Probability [18].

• Contextual method of functional dependence of the choice of the level of impact indicators.

• New approach to the decomposition of a risk equation into simple components via Bayesian Network [2].

• Risk decomposition probability.

• Threat assessment.

• Vulnerabilities’ assessment.

• Visualization of attack vectors.

4. Ontology-based model for security assessment: predicting cyber attacks through threat activity analysis:

• Systematization and analysis.

• The ontology of cyber attacks.

• Identifying the absolute risk level based on reliable external data, such as OWASP, CVSS, etc.

• Attack Pattern Identification.

• Representation of an attack by stages of its implementation (components of a successful cyber attack).

• Classification of Hackers.

• Theory of Human Needs and Motivation.

• Prediction of a cyber attack based on the categorization of threats.

5. Dashboard Visualization Techniques in Information Security:

• Systematization and analysis.

• Definition of Data Dashboard in Information Security.

• Responsibilities of Information Security Manager & Information Security Consultant.

• Visualization techniques.

• Visualization prototype.

• Connection logic of dashboard components.

• Layout testing.

6. Risk Forecasting Automation based on MEHARI:

• Systematization and analysis.

• Creation of new attack prediction approach.

• Benchmarks based on security standards.

• Connection of external databases to identify the values of threats and vulnerabilities.

• Harmonized Method of Risk Analysis (MEHARI).

• Risk Forecasting Automation based on MEHARI.

• Ontology-based model for security assessment: predicting cyber attacks through threat activity analysis.

• Information security risk assessment based on decomposition probability via Bayesian Network.

• Determining the probability of cyber attacks.

• Dashboard Visualization Techniques in Information Security.

Contributions

Formalization of attack prediction problem (Chapter 1)

In this chapter, we present an analysis of different techniques with an attempt to identify the most informative parameters and attack prediction markers, which would lay the foundation for the development of attack probability functions. The functional dependencies obtained should be formally verified for further testing on a real system. The findings of this research could be applied during the future assessment of information system risk levels to ensure more effective information security management.

Determining the Probability of cyber attacks (Chapter 2)

Systematization and analysis of research data provided the basis for writing the article on determining the likelihood of cyber attacks [14]. As a result, it becomes possible not only to simulate a threat but also to determine its risk level, depending on different system security configurations. This will ensure more effective information security management. This work will serve as a basis for further research on the distribution of investments in information security.

Information security risk assessment based on decomposition probability via Bayesian Network (Chapter 3)

This article explores the idea of assessing risk for a future period, much like anticipating what will happen later in a film. In other words, the article presents an approach to predicting a potential future risk, which relies on forecasting the likelihood of an attack on information system assets.

To establish the risk level at a selected time interval in the future, one has to perform a mathematical decomposition. To do this, we need to select the required information system parameters for the predictions and their statistical data for risk assessment.

This method can be used to ensure more detailed budget planning when ensuring the protection of the information system. It can also be applied when the information protection configuration changes, to satisfy the accepted level of risk associated with projected threats and vulnerabilities. This research is also an attempt to prove the possibility of predicting the level of risk by analyzing the weakest points, relying on the analysis of statistical data and hackers’ behaviour in different environments. The article was based on the mathematical decomposition of an attack.

Ontology-based model for security assessment: predicting cyber attacks through threat activity analysis (Chapter 4)

In the course of preparing this scientific work, it was decided to update the ontology of attacks. The study recognized the need to introduce additional parameters describing an attack, and thus its decomposition. This paper introduces the ontology components used to describe a cyberattack: the different types of cyberattacks, attack patterns, the components of a successful cyberattack and a classification of hackers. It provides tools for establishing the likelihood of an attack in the future. We present a summary of ideas aimed at improving the current risk prediction methods. These findings were presented in the article "Ontology-based model for security assessment: predicting cyber attacks through threat activity analysis" [17].

Dashboard Visualization Techniques in Information Security (Chapter 5)

During the preparation of this scientific work, we decided to display the level of risk and its forecasts on an information security dashboard. The article "Dashboard Visualization Techniques in Information Security" [12] was developed as part of the course "GLO-7006 Engineering of human-machine interfaces". This article revealed the importance of applying dashboards in the information security field to ensure fast and accurate decision-making. A prompt understanding of the required parameters and their visualization are important components of information security management. This article describes dashboard display techniques based on an advanced understanding of the tasks assigned to standardized roles in the information security field. A detailed understanding of the roles performed in information security management is an integral part of the process of creating a dashboard visualization tool. In parallel, advanced visualization increases the speed of response to an information security incident, while adaptation to a specific role enables a more ergonomic presentation of data.

Risk Forecasting Automation based on MEHARI (Chapter 6)

Previous articles provided a foundation for a method that uses the accumulated material to adapt MEHARI for risk prediction. This technique allows the future risk level to be predicted. We continue reviewing the issue of obtaining real-time risk values. To this end, we introduce an automated risk analysis method, which helps to reveal the risk value at any point in time. Conducting a risk assessment with the MEHARI Expert tool [5] may take more than six months [15]. It follows that, under such a risk assessment model, there are periods that remain uncontrolled. This method forms the basis for predicting the probability of a targeted cyber attack [14]. The established risk level will help to optimize the information security budget and redistribute it to strengthen the most vulnerable areas. This work will significantly contribute to the improvement and modification of the existing risk analysis methods. The automatic collection of data from various sources forms the basis of risk level automation and decreases the duration of the risk analysis. The article offers ideas for modifying the MEHARI approach as an example. The possibility of forecasting risk, coupled with the capacity to verify the risk value at any time, is essential for proper information security budgeting. This work therefore emphasizes one of the key trends in the field of risk analysis, namely risk level forecasting.

Chapter 1

Formalization of attack prediction problem

Authors: Pavel Yermalovich, Mohamed Mejri

Conference: IEEE International Conference "Quality Management, Transport and Information Security, Information Technologies" (IT&QM&IS)

Status: peer reviewed; published; presented

Year: 2018

1.1 Résumé

The use of information is inextricably linked to its security. The presence of vulnerabilities allows a third party to breach the security of information. Threat modeling makes it possible to identify the infrastructure areas most likely to be exposed to attacks. In some cases, threat modeling cannot be regarded as a sufficient protection method. This article, entitled "Formalization of attack prediction problem", presents an analysis of different techniques in an attempt to identify the most informative parameters and attack prediction markers, which would lay the groundwork for developing attack probability functions. The functional dependencies obtained must be formally verified through additional tests on a real system. The results of this research could be applied in future assessments of information system risk levels in order to ensure more effective information security management.

1.2 Abstract

The use of information is inextricably linked with its security. The presence of vulnerabilities enables a third party to breach the security of information. Threat modeling helps to identify the infrastructure areas that are most likely to be exposed to attacks. In some cases, threat modeling cannot be regarded as a sufficient protection method. This paper, entitled "Formalization of attack prediction problem", presents an analysis of different techniques in an attempt to identify the most informative parameters and attack prediction markers, which would lay the foundation for the development of attack probability functions. The functional dependencies obtained should be formally verified through further testing on a real system. The findings of this research could be applied during the future assessment of information system risk levels to ensure more effective information security management.

1.3 Introduction

Today, various systems analyze logs [7] and network activity (NIDS) [1]. These systems rely on already known and established parameters to reveal activity that deviates from the "normal" level. This "normal" level is set by an information security specialist based on his/her own experience. This method has several disadvantages. First, it relies on the experience of the specialist who sets up the security system. After the system has been successfully installed and configured, one only receives system alerts that require further investigation of the incident. Initially, we do not know whether an alert indicates a real attack or a non-standard situation that the security specialist did not anticipate during configuration. The analysis of such information can take much time.

If the attack is successful, it is important to react quickly and correctly to the incident. It is worth noting that checking for command and control (also known as C&C or C2) communication is a complicated process. If the infected server uses non-standard (exotic) channels to receive commands, such as tweets, an ICMP tunnel or short-range RF protocols such as Bluetooth, the probability of detecting communication between the infected server and the management infrastructure will be very low [8]. For this reason, it is very important to have tools that can predict an attack or recognize it while it is being carried out.

1.4 Related Work

1.4.1 Statistical analysis

Forecasting is the development of a forecast; in the narrow sense, it is a dedicated scientific study of the concrete prospects for the further development of a process. The need for a forecast stems from the desire to know the future, which, in principle, cannot be predicted with 100% accuracy by statistical, probabilistic, empirical or philosophical means.

The accuracy of any forecast depends on:

• The volume of actual (verified) reference data and the period of its collection;

• The amount of unverified input data and its collection period;

• The system’s properties and the set objective.

As the number of factors influencing the forecast’s accuracy increases, the forecast is almost completely replaced by a routine calculation with some established error.

The main forecasting methods include:

• Statistical methods;

• Modeling methods;

• Expertise;

• Intuition (that is to say a forecast made without the use of technical means, impromptu, which assumes that a specialist has experience with the scientific methods previously used in this type of forecast).

Statistical forecasting methods are a kind of mathematical forecasting method that makes it possible to extend dynamic series into the future. They cover: the design, study and application of modern mathematical-statistical prediction methods based on objective data (including nonparametric least squares methods with an assessment of forecast accuracy, adaptive methods, autoregressive methods and others); the development of the theory and practice of probabilistic statistical modeling of expert forecasting methods, including methods for analyzing subjective expert estimates based on statistics of non-numerical data; and the design, study and application of forecasting methods under risk, as well as combined forecasting methods using joint economic-mathematical and econometric (both mathematical-statistical and expert) models. The scientific basis of mathematical-statistical forecasting methods is formed by applied statistics and decision theory. The simplest methods of reconstructing the dependencies used for prediction start from a given time series, that is, a function defined at a finite number of points on the time axis. In this case, the time series is often considered within the framework of a particular probability model; other factors (independent variables) may be introduced in addition to time, for example, the volume of the money supply. The time series can be multidimensional.
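As a minimal illustration of the simplest case mentioned above, a time series defined at a finite number of points can be extrapolated by fitting a linear trend with ordinary least squares. The sketch below is an assumption made for illustration only: the function names and the sample series are hypothetical and do not come from the cited works.

```python
from typing import Sequence, Tuple


def fit_linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by least squares, with x = 0, 1, ..., n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values: Sequence[float], steps_ahead: int = 1) -> float:
    """Extrapolate the fitted trend steps_ahead points past the last observation."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical series, e.g. the number of security incidents per week.
weekly_incidents = [2, 4, 6, 8]
next_week = forecast(weekly_incidents, steps_ahead=1)
```

A linear trend is only one of the statistical methods listed above; adaptive or autoregressive models would replace `fit_linear_trend` while keeping the same interface.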

1.4.2 Structural system analysis

Some approaches divide the system into simple components, for example, a website composed of various plug-ins and using a certain style layout. Each of the plug-ins, the business logic and the layouts have their own vulnerabilities. Let us focus on known vulnerabilities, including zero-day ones. The authors of [15] use machine learning tools to detect websites that have not yet been compromised but may become malicious in the future, over a reasonably long time horizon (about a year). Web server malware often exploits outdated or unpatched versions of popular content management systems (CMS). The most popular CMS are tested quite often, as evidenced by numerous articles on vulnerabilities found in CMS². To date, most of the work on identifying malware on web servers, both in academia, such as [4], [9], and in industry³, is primarily based on detecting the presence of an active infection on a website. The main contribution of [18] is to propose, implement and evaluate a general methodology for identifying web servers that are at high risk of becoming malicious before they actually become malicious. In [15], the use of C4.5 [15] decision trees is studied.

A similar search technique is also described in Vasek and Moore’s recent work [18] [20]. Vasek and Moore manually identified the CMS used by a website and studied the correlation between this CMS and the website security.

The following sequence is used to determine cyber attacks [18]:

• Classification of websites;

• Learning process;

• Dynamic extraction of the list of features.

The following parameters have been chosen for the prediction of attacks:

• Traffic statistics (AlexaWeb Information Service);

• File system structure;

• Webpage structure and contents;

– Website size;

– Parking page or immediate forwarding to another website;

– Keywords of spam links;

– Keywords;

– Document Object Model (DOM) tree;

– Html tag frequency;

– Style tree;

– Adult site? (yes, no);

• Malicious websites (PhishTank);

• Backlinks (links to the site);

• Load percentile;

• Reach per million.

² https://thehackernews.com/2017/05/hacking-wordpress-blog-admin.html

³ McAfee Site Advisor www.siteadvisor.com/

Let us assume that at a given time t, the classifier predicts that a certain website w is likely to become compromised in the future. Since the website has not been compromised yet, and may not be compromised for a while, we cannot immediately know whether the prediction is correct. Instead, we have to wait until time (t + h) to verify whether the site has become compromised between t and (t + h), or whether the classifier was inaccurate. This is rather problematic, since training the classifier alone would then take at least until (t + h) and result in additional waiting.

This case is displayed in Figure 1.1(a).

First, one has to decide what a meaningful value for the horizon h is. Ultimately, a classifier also has to be chosen and designed.

Unless otherwise noted, we will assume that h is set to one year. This choice does not affect our classifier design, but impacts the data we are using for the training purposes.

Second, while we cannot predict the future at time t, we can resort to the past.

More precisely, for training purposes, the authors of [18] solve this issue by extracting a set of features from, and classifying, the archived version of the website as it appeared at time (t − h). Afterwards, they check whether it has become malicious by time t.

This algorithm is depicted in Figure 1.1(b).

Figure 1.1 – Prediction timeline [18]
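The training procedure of Figure 1.1(b) can be sketched as follows: archived snapshots taken at least h before the current time t are labeled according to whether the site was compromised within the horizon h after the snapshot. This is only an illustrative sketch under stated assumptions; the `Snapshot` record, the `compromised_on` mapping and the feature dictionary are hypothetical and are not the data structures used in [18].

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass
class Snapshot:
    """Archived version of a website (hypothetical record layout)."""
    site: str
    taken_on: date
    features: Dict[str, float]


def build_training_set(
    snapshots: List[Snapshot],
    compromised_on: Dict[str, Optional[date]],
    t: date,
    horizon: timedelta,
) -> List[Tuple[Dict[str, float], bool]]:
    """Label each sufficiently old snapshot using only what is known at time t."""
    training = []
    for snap in snapshots:
        deadline = snap.taken_on + horizon
        if deadline > t:
            # The outcome of this snapshot cannot be observed yet at time t.
            continue
        hit = compromised_on.get(snap.site)
        # Positive label: the site became malicious within h after the snapshot.
        label = hit is not None and snap.taken_on < hit <= deadline
        training.append((snap.features, label))
    return training
```

Snapshots that are too recent are skipped rather than labeled as benign, because their outcome is still unknown at time t.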

Advantages:

• Approaching the problem of attack prediction from a new angle: analysis of the structure of the website and of a certain number of its characteristics.

Disadvantages:

• Superficial analysis of a number of website parameters, for example, the number of backlinks. It is necessary to study the quality of these links (trust flow, link quality, incoming languages, forwarding links, text links, images, frames, referring domains, anchor text). This can be carried out through various Internet services such as majestic⁶ and ahrefs;

• The classifier’s performance is limited by several difficulties inherent in predicting whether a website will become malicious;

• No analysis of logs to check for false positives;

• Penetration tests are not performed, which results in the absence of detailed information about the vulnerabilities for patching;

• The absence of a generally accessible global database listing the websites "susceptible to attacks". Large companies will try to conceal information about hacking, because it affects their reputation;

• The absence of correlation among the website content, language, target region and the probability of a threat;

• Website layouts undergo frequent changes, which affects the frequency of scanning;

• Many websites have a unique structure and layout (websites of agencies creating Internet sites). It is almost impossible to find something similar in the attack history.

⁶ https://majestic.com/

The application of this attack prediction method relies on Big Data methods (Big Data approach [16]). However, all system parameters have to be identified before Big Data methods are applied. An in-depth consideration of each system will allow us to determine the characteristics of the system for further analysis.

1.4.3 Log analysis

The forecast is based on historical data, which can be obtained from log analysis. Before a log is analyzed to detect the presence of attacks, it should be converted into a convenient format. The proposed approach to knowledge extraction from log files makes it possible to group users quickly. It integrates the page classification into the user classification [6], in other words, it exploits the results of the page classification when classifying users. The user classification steps are shown in Figure 1.2.

Figure 1.2 – User classification steps [6]

The user classification results in two groups: ordinary users and potential attackers. If we include an analysis of hackers’ behavior in the log analysis, we will be able to conduct an analytical assessment of potential attackers. Detecting abnormal user behavior through log analysis makes it possible to detect attacks in real time [10].
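To make the idea of reusing the page classification inside the user classification more concrete, the sketch below first assigns a class to each page and then labels a user according to the share of their requests that target pages of a given class. It is a toy illustration, not the method of [6]: the page classes, the 0.5 threshold and all names are hypothetical assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical result of the page classification step.
PAGE_CLASS = {
    "/index.html": "public",
    "/news.html": "public",
    "/admin/login": "sensitive",
    "/admin/config": "sensitive",
}


def classify_users(
    events: Iterable[Tuple[str, str]],
    page_class: Dict[str, str],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Split users into two groups from (user, page) log events.

    A user is flagged when more than `threshold` of their requests
    target pages classified as "sensitive" (an assumed, illustrative rule).
    """
    total = defaultdict(int)
    sensitive = defaultdict(int)
    for user, page in events:
        total[user] += 1
        if page_class.get(page, "unknown") == "sensitive":
            sensitive[user] += 1
    return {
        user: "potential attacker"
        if sensitive[user] / total[user] > threshold
        else "ordinary user"
        for user in total
    }
```

In practice, the decision rule would be learned from historical data rather than fixed by hand.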

Advantages:

• Detection of attacks in real time if there are known attack signatures [17].

Disadvantages:

• Attacks of this type [19] cannot be fully detected once they have been launched [3].

• Real-time log analysis requires powerful computing resources if the system has many peripherals. In addition, many companies ignore Critical Log Information analysis because of budgetary constraints.

1.5 Formalization of problem

1.5.1 Introduction of our approach on the basis of a simplified example

In this section we will attempt to clarify the application of our approach by referring to a simplified example. Figure 1.3 illustrates the example of a simplified website administration on a dedicated cloud server (system S).

Figure 1.3 – Website administration on a cloud dedicated server

The list below contains a general set of components for website administration on a dedicated server:

1. Workplace

a) Firewall;

b) Anti-virus;

c) Logging/reporting by operating system (OS) and anti-virus.

2. Router (for connection to the Internet Service Provider (ISP));

3. Content Delivery Network (CDN);

4. Cloud Virtual Private Server (VPS);

a) Firewall;

b) Load balancer (e.g. Gobetween, Nginx, etc.);

c) Proxy server (e.g. Squid, Varnish, etc.);

d) Web server (e.g. Apache, IIS, etc.);

e) Static files (e.g. images, CSS, JS, etc.);

f) Database (e.g. MySQL, MSSQL, etc.);

g) Logging/reporting by OS.

We will refer to this example later to clarify different steps within the framework of our approach.

1.5.2 Formalization of the Problem

The following notation is used in this formalization:

• A calligraphic letter denotes a domain, for example $\mathcal{A}$;

• A capital letter denotes a subset, for example $A$;

• A small letter denotes an element, for example $a$.

The initial step is aimed at reviewing the existing system S. The list of incoming data components is presented below:

• $\mathcal{BP}$: represents a set of business processes. The sequence $bp_0, bp_1, \dots$ denotes meta-variables that range over $\mathcal{BP}$. For instance, $bp_0$ stands for content management in the workplace, as Figure 1.3 illustrates (specific to our example). Each business process involves a number of assets; for example, content management ($bp_0$) uses Application data (data bases) ($a_0$), Electronic mail (E-mail) ($a_1$), Local Area Network services (LAN services) ($a_{10}$) and Web editing Service ($a_{27}$): $bp_0 = \{a_0, a_1, a_{10}, a_{27}\}$.

• $\mathcal{A}$: represents a set of assets. The sequence $a_0, a_1, \dots$ denotes meta-variables that range over $\mathcal{A}$. Examples of assets include Application data (data bases), Electronic mail (E-mail), Local Area Network services (LAN services), Web editing Service, Digital accounting control, etc.

• $\mathcal{AN}$: describes a set of attribute names of $\mathcal{A}$ (the set of assets). Availability (avl), Integrity (int), Confidentiality (cnf) and Efficiency (eff) are the attribute names we use to specify the security class of an asset $a$. Efficiency, for example, refers to the efficiency of the management process in complying with legal, regulatory or contractual requirements, for the laws and regulations domain.

• $\mathcal{AV}$: represents a set of attribute values. We propose to use a 4-point scale (1 = Low, 2 = Acceptable, 3 = Inadmissible, 4 = Intolerable) for each attribute name of an asset (avl, int, cnf, eff) when assessing the degree of importance (the level of loss when the asset is lost) of each value.

• $Val$: represents the asset valuation function (1.1).

$$Val : \mathcal{A} \times \mathcal{AN} \longrightarrow \mathcal{AV} \qquad (1.1)$$

In our example, $Val(a_0) = (4, 3, 3)$, which is established for the following three attribute components of asset $a_0$: Availability $a_0[0] = 4$, Integrity $a_0[1] = 3$, Confidentiality $a_0[2] = 3$.
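The valuation function (1.1) can be pictured as a lookup table keyed by an asset and an attribute name. The sketch below is a minimal illustration under stated assumptions: it stores only the example values given for the asset a0, and the identifiers `VALUATION`, `val` and `val_vector` are hypothetical names introduced here.

```python
# Attribute names (AN) and the 4-point scale of attribute values (AV).
ATTRIBUTE_NAMES = ("avl", "int", "cnf", "eff")
SCALE = {1: "Low", 2: "Acceptable", 3: "Inadmissible", 4: "Intolerable"}

# Business process bp0 (content management) and the assets it uses.
bp0 = {"a0", "a1", "a10", "a27"}

# Val restricted to the example asset a0; keys are (asset, attribute name).
VALUATION = {
    ("a0", "avl"): 4,
    ("a0", "int"): 3,
    ("a0", "cnf"): 3,
}


def val(asset: str, attribute: str) -> int:
    """Return the value of one attribute of one asset."""
    if attribute not in ATTRIBUTE_NAMES:
        raise ValueError(f"unknown attribute name: {attribute}")
    return VALUATION[(asset, attribute)]


def val_vector(asset: str, attributes=("avl", "int", "cnf")) -> tuple:
    """Return the values of several attributes as a tuple, as in Val(a0) = (4, 3, 3)."""
    return tuple(val(asset, name) for name in attributes)
```

With these definitions, `val_vector("a0")` reproduces the triple quoted in the text, and `SCALE[val("a0", "avl")]` gives the verbal level of the availability requirement.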

• $\mathcal{D}$: describes the defense techniques applied to ensure the safety of the assets $\mathcal{A}$. The sequence $d_0, d_1, \dots$ denotes meta-variables that range over $\mathcal{D}$. In our example, the following techniques were used to protect the operating computer: a firewall, an antivirus and logging. Each security measure can be classified based on its configuration (Conf), that is, the configuration affecting the functioning and performance of the system.

• $\mathcal{DN}$: represents a set of attribute names of $\mathcal{D}$. The sequence $d_{0n}, d_{1n}, \dots$ denotes meta-variables that range over $\mathcal{DN}$. Examples of attribute names of protection techniques include: IP address range, port number, service name, etc.

• $\mathcal{DV}$: a set of attribute values of $\mathcal{D}$. The sequence $d_{0v}, d_{1v}, \dots$ denotes meta-variables that range over $\mathcal{DV}$. Examples of attribute values of the protection techniques applied to ensure the safety of assets: IP address range, port number, service name, etc.

• $DC$: a function (1.2) of each of the security components $d$.

$$DC : \mathcal{D} \times \mathcal{DN} \longrightarrow \mathcal{DV} \qquad (1.2)$$

For example, $dc_{firewall}$ stands for the firewall settings (open ports for the services used, traffic filtering), $dc_{antivirus}$ deals with the antivirus settings (update frequency and system scan frequency), while $dc_{log}$ encompasses the logging data for the logging system.

• $\mathcal{T}$: stands for a set of time points associated with $\mathcal{A}$. The sequence $t_0, t_1, \dots$ denotes meta-variables that range over $\mathcal{T}$. Each time point $t$ is an elementary time unit at which the state of the system $S$ can be determined. Usually, the initial system state corresponds to $t_0$.

• $\mathcal{EVT}$: represents a set of events describing $\mathcal{A}$. The sequence $evt_0, evt_1, \dots$ denotes meta-variables that range over $\mathcal{EVT}$. Examples of events include: a transaction on application data (data bases), sending electronic mail, transferring files in a local area network, Web editing, etc. Each event has its own meta-variable, which is included in $\mathcal{EVT}$. These meta-variables are distinguished by the event date $t$ and the description of the event itself (following a link, opening a page, sending a message, etc.), indicating the asset that participates in the event.

• $LOG$ is a sequence of lines, where each line represents an event with its own information (different details). Each of these lines is created by a function (1.3), whose output is determined by the level of detail adopted in the system configuration.

$$LOG : \mathcal{A}, \mathcal{T}, \mathcal{AN}, \mathcal{AV} \qquad (1.3)$$

For example, the log file of a web server can contain different information, depending on its settings. A single entry from the log file of a web server running Apache 2.4 [2] looks as follows:

127.0.0.1 - [10/Oct/2000:13:55:36 -0700] "GET /robot.txt HTTP/1.0" 200 2326
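Before such an entry can be used as an event with a time point, it has to be converted into a structured record. The sketch below parses the sample line with a regular expression; the pattern is written for this particular line layout (host, one placeholder field, timestamp, request, status, size) and is an assumption, since the actual fields depend on the server's log format settings. The field names are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

LINE = '127.0.0.1 - [10/Oct/2000:13:55:36 -0700] "GET /robot.txt HTTP/1.0" 200 2326'

# Pattern matching the layout of the sample line above (assumed layout).
PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one log line into an event record, or return None if it does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Month names are parsed with the default (English) locale.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


event = parse_line(LINE)
```

The resulting record carries the time point of the event and the requested resource, which is the information needed to relate a log line to an event and an asset.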

• $Cost$: represents a function (1.4) that returns the price of protection measures.

$$Cost : \mathcal{DN} \times DC \longrightarrow \mathbb{R}^+ \qquad (1.4)$$

For our research purposes: Cost : (dn

To find out the cost of installing and configuring a firewall, one needs to know its type (for example, a software firewall built into an antivirus, or a hardware implementation⁹) and its configuration cost (ready-made configuration, setup time by an expert). An example is shown in Figure 1.4.

Figure 1.4 – Representation of the use of firewall in the workplace
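The configuration function (1.2) and the cost considerations above can be illustrated with a small lookup table and a toy cost model. Everything in this sketch is an assumption made for illustration: the attribute names, the settings, the prices and the additive cost rule (purchase price plus expert setup time) are hypothetical and do not reproduce the exact form of function (1.4).

```python
# DC as a lookup: (defense technique, attribute name) -> attribute value.
# All settings below are hypothetical examples.
DEFENSE_CONFIG = {
    ("firewall", "open_ports"): (80, 443),
    ("firewall", "traffic_filtering"): True,
    ("antivirus", "update_frequency_hours"): 24,
    ("antivirus", "scan_frequency_hours"): 168,
    ("log", "detail_level"): "verbose",
}


def dc(defense: str, attribute_name: str):
    """Return the configured value of one attribute of one defense technique."""
    return DEFENSE_CONFIG[(defense, attribute_name)]


def setup_cost(purchase_price: float, setup_hours: float, hourly_rate: float) -> float:
    """Toy cost model: price of the product plus the expert's setup time."""
    if min(purchase_price, setup_hours, hourly_rate) < 0:
        raise ValueError("cost components must be non-negative")
    return purchase_price + setup_hours * hourly_rate


# Example: a hardware firewall configured by an expert in four hours.
firewall_cost = setup_cost(purchase_price=500.0, setup_hours=4, hourly_rate=100.0)
```

A ready-made configuration would correspond to `setup_hours=0`, so that only the purchase price remains.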

• $\mathcal{V}$: describes a set of vulnerabilities. The sequence $v_0, v_1, \dots$ denotes meta-variables that range over $\mathcal{V}$.

• $\mathcal{VN}$: stands for a set of attribute names (metrics) of $\mathcal{V}$. The sequence $V_{0n}, V_{1n}, \dots$ denotes meta-variables that range over $\mathcal{VN}$. Examples of attribute names of vulnerabilities include: Base Score, Attack Vector, Attack Complexity, Privileges Required, User Interaction, Scope, Confidentiality Impact, Integrity Impact and Availability Impact.

• $\mathcal{VV}$: stands for a set of attribute values of $\mathcal{V}$. The sequence $V_{0v}, V_{1v}, \dots$ denotes meta-variables that range over $\mathcal{VV}$. For example, see the values for MySQL Stored SQL Injection (CVE-2013-0375) [14] in Table 1.1.

• SP: stands for the security policy. The security policy comprises a set of instructions (the sequence of sp0, sp1, ...describes meta-variables that range over SP) for implement-ing business processes. The security policy can be accepted and formalized “on paper” as a set of safety rules. However, there are times when the actual security policy is different from what is on paper. Therefore, we will consider two types of security policies, Real (SPr) and Theoretical (SPt). For example, the proper use of security measures D to ensure the safety of an acceptable level for collection of assets A or setting minimum re-quirements for the configuration of the protection system. The security policy describes which ports should be open to the firewall, antivirus updates frequency, time frames

9 Firepower 9000 Series. https://www.cisco.com/c/en/us/products/security/firepower-9000-series/index.html

10


Table 1.1 – MySQL Stored SQL Injection (CVE-2013-0375) [14]

| AN (Metric) | AV (Value) | Comments |
|---|---|---|
| Attack Vector | Network | The attacker connects to the exploitable MySQL database via a network. |
| Attack Complexity | Low | Replication has to be enabled on the target database. Although disabled by default, it is common for it to be enabled, meaning we are assuming the worst-case scenario. |
| Privileges Required | Low | The attack requires an account with the ability to change user-supplied identifiers, such as table names. Basic users do not get this privilege by default, but it is not considered a sufficiently trusted privilege to warrant this metric being High. |
| User Interaction | None | |
| Scope | Changed | The vulnerable component is the MySQL server database and the impacted component is a remote MySQL server database (or databases). |
| Confidentiality Impact | Low | The injected SQL runs with high privilege and can access information the attacker should not have access to. Although this runs on a remote database (or databases), it may be possible to exfiltrate the information as part of the SQL statement. The malicious SQL is injected into SQL statements that are part of the replication functionality, preventing the attacker from executing arbitrary SQL statements. |
| Integrity Impact | Low | The injected SQL runs with high privilege and can modify information the attacker should not have access to. The malicious SQL is injected into SQL statements that are part of the replication functionality, preventing the attacker from executing arbitrary SQL statements. |
| Availability Impact | None | Although injected code is run with high privilege, the nature of this attack prevents arbitrary SQL statements being run that could affect the availability of MySQL databases. |


for the system antivirus scan, the level of logging detail, and the location of the security measures undertaken (firewall at the entrance, antivirus and logging inside the system).

• RM: the adopted risk assessment methodology (MEHARI, CobiT, etc.) is represented as a function RM : A × D × SP −→ R that returns the level of risk R of losing assets A when protection components D are used and, consequently, of disrupting business processes BP.
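To make the attribute names VN and attribute values VV more concrete, the sketch below stores the entries of Table 1.1 as a name-to-value mapping and derives the Base Score from them. It is only an illustration: the numeric weights and equations are those of the public CVSS v3.0 specification, while the names `WEIGHTS`, `PR_WEIGHTS`, `roundup`, `base_score` and `CVE_2013_0375` are hypothetical helpers introduced here, not part of the formalism above.

```python
import math

# CVSS v3.0 numeric weights per metric value (public CVSS v3.0 specification).
WEIGHTS = {
    "Attack Vector": {"Network": 0.85, "Adjacent": 0.62, "Local": 0.55, "Physical": 0.2},
    "Attack Complexity": {"Low": 0.77, "High": 0.44},
    "User Interaction": {"None": 0.85, "Required": 0.62},
    "Confidentiality Impact": {"High": 0.56, "Low": 0.22, "None": 0.0},
    "Integrity Impact": {"High": 0.56, "Low": 0.22, "None": 0.0},
    "Availability Impact": {"High": 0.56, "Low": 0.22, "None": 0.0},
}
# The weight of Privileges Required depends on the Scope value.
PR_WEIGHTS = {
    "Unchanged": {"None": 0.85, "Low": 0.62, "High": 0.27},
    "Changed": {"None": 0.85, "Low": 0.68, "High": 0.5},
}


def roundup(x):
    """Round up to one decimal place (simple version of the CVSS Roundup)."""
    return math.ceil(x * 10) / 10


def base_score(vuln):
    """Compute the CVSS v3.0 Base Score from a {metric name: value} mapping."""
    w = {name: table[vuln[name]] for name, table in WEIGHTS.items()}
    pr = PR_WEIGHTS[vuln["Scope"]][vuln["Privileges Required"]]
    changed = vuln["Scope"] == "Changed"

    # Impact sub-score.
    iss = 1 - ((1 - w["Confidentiality Impact"])
               * (1 - w["Integrity Impact"])
               * (1 - w["Availability Impact"]))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score.
    exploitability = (8.22 * w["Attack Vector"] * w["Attack Complexity"]
                      * pr * w["User Interaction"])

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


# Attribute names (keys) and attribute values taken from Table 1.1.
CVE_2013_0375 = {
    "Attack Vector": "Network",
    "Attack Complexity": "Low",
    "Privileges Required": "Low",
    "User Interaction": "None",
    "Scope": "Changed",
    "Confidentiality Impact": "Low",
    "Integrity Impact": "Low",
    "Availability Impact": "None",
}

print(base_score(CVE_2013_0375))  # prints 6.4
```

In this representation the dictionary keys play the role of elements of VN and the dictionary values the role of elements of VV.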

Based on the above system input parameters, we can represent the initial state of the system as S = (A, D). At the output stage we get the modified state of the system S′ = (A, D′), which satisfies the following requirements:

1. RM(A, D′) represents an acceptable risk level for our system.

2. D′ = min_{x ∈ D} Cost(D, Conf, x) stands for the selection of a minimum-cost configuration that meets the requirements for an acceptable risk level.
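The two requirements can be read as a small constrained selection problem: among the candidate configurations, keep those whose risk is acceptable and take the cheapest one. The following is a minimal sketch under stated assumptions. The class `Configuration`, the function `select_configuration`, and all cost and risk figures are hypothetical illustrations; in practice the risk values would come from the adopted methodology RM and the prices from the Cost function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Configuration:
    name: str
    cost: float  # hypothetical price of the protection measure
    risk: float  # hypothetical risk level obtained for this configuration


def select_configuration(candidates, acceptable_risk):
    """Return the cheapest candidate whose risk is acceptable, or None."""
    feasible = [c for c in candidates if c.risk <= acceptable_risk]
    if not feasible:
        return None  # no configuration satisfies requirement 1
    return min(feasible, key=lambda c: c.cost)  # requirement 2


# Invented example values, for illustration only.
candidates = [
    Configuration("software firewall, default rules", 100.0, 0.40),
    Configuration("software firewall, expert setup", 400.0, 0.15),
    Configuration("hardware firewall, expert setup", 5000.0, 0.05),
]

best = select_configuration(candidates, acceptable_risk=0.20)
print(best.name)  # prints "software firewall, expert setup"
```

Filtering before minimizing keeps the two requirements separate, so a different risk threshold or cost model can be swapped in without changing the selection logic.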

Based on the input data, we proceed to formalize the attack prediction function “Oracle” (1.5).

Oracle : BP, A, D, LOG, SPr, RM −→ [0, 1] (1.5)

where [0, 1] is the probability value of the event. Vdetected denotes the detected flaws in the configuration Conf of the security system D (errors made by developers, in system design or in passwords, as well as malware and system vulnerabilities). Objectives of attacks: Challenge; Ego; Espionage; Ideology; Mischief; Money; Revenge.

Our contribution to the development of this field is as follows:

• Addition of variables to the “Oracle” prediction function (1.6):

Oracle : LOG, D, Vdetected, A, SPr, BP, X −→ [0, 1] (1.6)

We can interpret formula (1.6) as the probability of an attack on the asset, AttA, given the system state logi, D, Vdetected, ai, spi, X, as expressed by formula (1.7).

Oracle : P (AttA | logi, D, Vdetected, ai, spi, X) −→ [0, 1] (1.7)

• Identification of useful/informative parameters X for improving the forecasting function


• Creation of a chain of interconnections:

Attack type −→ Purpose of attack −→ Attacker’s profile −→ Attacker’s experience and behavioral pattern −→ Attack prevention measures to be undertaken.

For example, such measures include a change in the system configuration, a system modification, an allocation of funds for system protection, or the creation of a working schedule.

• Attempt to overcome an attack based on its type. We can distinguish the following types of attack:

– Espionage (collection of classified information or obtaining unauthorized access to the information);
– Vandalism (corruption of information);
– Denial of Service (DDoS);
– Interference in the operation of equipment (civil or military);
– Capture of infrastructure control.

• EXP: experience, knowledge and tactics of the attacking side.

• External threats:

– Threats in social network messages or in sent emails;
– Text in backlinks;
– Text in backlinks to the attack target.

• Trapping (HoneyPot):

– Fail2ban 11, IPS, etc.;
– Pages of nonexistent administrators (as a reconnaissance component);
– Creation of markers that can influence the attack probability.
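Formula (1.7) treats the prediction as a conditional probability of an attack given the observed system state. One simple way to illustrate this reading, which is not claimed to be the estimation method used in this work, is a relative-frequency estimate over past observations. In the sketch below the function `estimate_attack_probability`, the tuple encoding of a state, and the sample history are all hypothetical.

```python
from collections import Counter


def estimate_attack_probability(history, state):
    """Relative frequency of attacks among past observations of `state`.

    `history` is an iterable of (state, was_attacked) pairs, where each
    state is a hashable summary of the system (hypothetical encoding).
    Returns None when the state has never been observed.
    """
    seen = Counter()
    attacked = Counter()
    for observed_state, was_attacked in history:
        seen[observed_state] += 1
        if was_attacked:
            attacked[observed_state] += 1
    if seen[state] == 0:
        return None
    return attacked[state] / seen[state]


# Invented observations: (log marker, detected vulnerability) -> attacked?
history = [
    (("port_scan", "unpatched_db"), True),
    (("port_scan", "unpatched_db"), True),
    (("port_scan", "unpatched_db"), False),
    (("port_scan", "unpatched_db"), True),
    (("no_scan", "unpatched_db"), False),
]

p = estimate_attack_probability(history, ("port_scan", "unpatched_db"))
print(p)  # prints 0.75
```

Adding an informative parameter X corresponds here to extending the state tuple with another component, which changes how the past observations are grouped.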

Let us analyze the system description in more detail.

1.5.3 Asset valuation

The initial stage considers the assets. They are identified through the definition of the business processes that use them (application data, databases, personal office data, local area network services, common services, working environment, digital accounting control, etc.): A : a0, ..., an, where n ∈ N.

11
