Research Internship Proposal 2019-2020

Subject: Flow Records Anonymization for Secure Outsourcing

Supervisors: Mohamad Nassar (1), Bechara Al Bouna (1), Christian Salim (2), Nathalie Mitton (2)

(1) Ticket Labs - Université Antonine - Lebanon.

(2) Inria Lille - Nord Europe - France.

Internship conditions

Duration: 5 months starting March/April 2020

Deadline to Apply: January 30, 2020

Location: Inria, 40 Avenue Halley 59650 Villeneuve d’Ascq, France

Hosting Team: FUN Team (Future Ubiquitous Networks) - Inria

Description

Monitoring and intrusion detection in present-day networks require ever more storage and costly computation resources. For example, a university network with a 10 Gbps optical Internet connection, an average load of 650 Mbps and peaks of up to 1.0 Gbps exports several hundred million flows per day. This calls the scalability of many monitoring and intrusion detection schemes into question. Exchanging data over a wireless network is even more challenging because of the limitations of the medium and of the end devices. Such exchange also endangers corporate and individual privacy, and therefore requires specialized service agreements and data protection. Routing in wireless networks also differs from traditional routing, since IP cannot be applied directly, either for routing or for addressing. Specific routing and addressing schemes have thus been designed for constrained wireless networks, such as RPL [1] or OLSR [2] on the one hand and 6LoWPAN [3] on the other.

Indeed, wider dissemination brings greater risks to the privacy of the users of the networks under measurement and to the security of these networks. While it is not a complete solution to these issues, anonymization (i.e., the deletion or transformation of information that is considered sensitive and that could be used to reveal the identity of entities involved in a communication) is an important tool for privacy protection within network measurement infrastructures. Research on network trace anonymization techniques and on attacks against them is ongoing [4]. Still, there is increasing evidence that anonymization of network trace or flow data is, on its own, insufficient for many data protection applications. Another idea is to use differential privacy as an extra level of indirection between the actual database and the user. A differentially private mechanism receives a query from the user and perturbs the answer in a carefully calibrated way to prevent privacy leaks.
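As an illustration of how a differentially private answer can be produced, the sketch below applies the Laplace mechanism to a counting query over flow records. The record layout `(src, dst, dst_port)`, the function names and the choice of record-level privacy (one flow record added or removed) are assumptions made for this example only; they are not prescribed by the proposal.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def noisy_count(flows, predicate, epsilon, rng):
    """Answer 'how many flows satisfy predicate?' with epsilon-DP noise.

    Assumption: neighboring databases differ in one flow record, so a
    counting query has sensitivity 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for flow in flows if predicate(flow))
    return true_count + laplace_noise(1.0 / epsilon, rng)

if __name__ == "__main__":
    # Hypothetical flow records: (source, destination, destination port).
    flows = [
        ("10.0.0.1", "10.0.0.9", 443),
        ("10.0.0.2", "10.0.0.9", 22),
        ("10.0.0.3", "10.0.0.9", 443),
    ]
    rng = random.Random()
    print(noisy_count(flows, lambda f: f[2] == 443, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives a stronger guarantee but a noisier answer. Protecting a whole endpoint rather than a single record would require a larger sensitivity bound, which this sketch does not address.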

The main objective of this project is to provide a framework for the private analysis of flow data sent over a wireless network. For example, we want to prevent an adversary from asserting that endpoint X contacted endpoint Y at time T, while still guaranteeing a “relevant” flow data analysis to the data analyst. The analysis can be purely statistical or can involve training a machine learning algorithm.

In terms of literature, the student should review current anonymization techniques (e.g., truncation, reverse truncation, permutation, direct substitution) for the secure outsourcing of wireless flow data, in order to analyze their security properties and their robustness against different kinds of recovery attacks.
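To make three of these techniques concrete, the sketch below applies them to IPv4 addresses. Truncation and reverse truncation follow the usual reading in RFC 6235 [4] (zeroing low-order or high-order bits, respectively). The keyed permutation is built here from a small Feistel network over HMAC-SHA256, an illustrative construction chosen for this example and not a vetted cipher; the function names and key are hypothetical.

```python
import hashlib
import hmac
import ipaddress

def truncate(addr, keep_bits):
    """Truncation: keep the keep_bits high-order bits, zero the rest."""
    ip = int(ipaddress.IPv4Address(addr))
    mask = (0xFFFFFFFF << (32 - keep_bits)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(ip & mask))

def reverse_truncate(addr, keep_bits):
    """Reverse truncation: keep the keep_bits low-order bits, zero the rest."""
    ip = int(ipaddress.IPv4Address(addr))
    mask = (1 << keep_bits) - 1
    return str(ipaddress.IPv4Address(ip & mask))

def permute(addr, key, rounds=4):
    """Keyed one-to-one mapping of the 32-bit address space.

    Illustrative Feistel network on two 16-bit halves; any Feistel
    network is a bijection, so distinct inputs never collide.
    """
    ip = int(ipaddress.IPv4Address(addr))
    left, right = ip >> 16, ip & 0xFFFF
    for r in range(rounds):
        msg = bytes([r]) + right.to_bytes(2, "big")
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        left, right = right, left ^ int.from_bytes(digest[:2], "big")
    return str(ipaddress.IPv4Address((left << 16) | right))

if __name__ == "__main__":
    print(truncate("192.168.17.42", 24))         # 192.168.17.0
    print(reverse_truncate("192.168.17.42", 8))  # 0.0.0.42
    print(permute("192.168.17.42", b"hypothetical-key"))
```

Truncation hides hosts but keeps the subnet, reverse truncation does the opposite, and the permutation keeps addresses distinguishable without preserving prefixes. Each choice exposes different information to a recovery attack, which is the kind of trade-off the literature review is meant to analyze.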

As a result, the student has to motivate the need for differential privacy and argue for its advantages.

If needed, the student may resort to one of the generalizations of differential privacy, such as Blowfish Privacy.

The deployment of the private flow analysis may be based on one of the following architectures:

1) Cloud computing: in this case, the data should be collected and altered so as to prevent further privacy leakage before being released to the cloud. Anonymization seems to be the best technique for this scenario. In addition, some differential privacy algorithms work directly on a sanitized data release rather than at the level of individual queries. Once the data has been sanitized in a differentially private way and released, analysts can query it publicly.

2) Federated learning [7]: when the network flows come from several flow collectors across the monitored network, it may be better to keep the data where it is and move the query (or the machine learning model, if one is to be trained) to the flow collectors.

Suppose a user wants to train a machine learning model to classify benign vs. malicious flows. At the same time, the user is not allowed to reveal any private information about the network users: their unique addresses (MAC, private, IPv6), the sites they visited, etc. In federated learning (FL), the data never leaves the client nodes; instead, the model is stored at a server, sent to the devices participating in the federation, and then re-trained on the local data at each node. It is also common for the server to open a “round of training” and ask the data owners (clients) for summaries of their local data. The clients respond to these queries and send updates to the server. At this level, differential privacy, or more generally Blowfish Privacy [8], is needed to ensure a meaningful privacy guarantee. This might not be possible over a wireless network because of network and node limitations. These techniques would thus need to be adapted to our specific context.

3) Blockchain
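For the federated learning scenario (option 2 above), one "round of training" can be sketched as follows. This is a minimal sketch, assuming a logistic-regression classifier, per-client update clipping and Gaussian noise added at the server, in the style of differentially private federated averaging. All names and parameters (`local_update`, `max_norm`, `noise_multiplier`) are hypothetical, and calibrating the noise to a concrete privacy budget, or to a Blowfish policy, is not shown.

```python
import math
import random

def local_update(weights, data, lr=0.1):
    """One SGD pass of logistic regression on a collector's local flows.

    data is a list of (features, label) pairs with label in {0, 1}.
    Only the weight delta is returned; raw flows never leave the client.
    """
    w = list(weights)
    for features, label in data:
        z = sum(wi * xi for wi, xi in zip(w, features))
        z = max(-30.0, min(30.0, z))  # avoid overflow in exp
        pred = 1.0 / (1.0 + math.exp(-z))
        for i, xi in enumerate(features):
            w[i] -= lr * (pred - label) * xi
    return [wi - w0 for wi, w0 in zip(w, weights)]

def clip(delta, max_norm):
    """Scale delta down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(d * d for d in delta))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [d * factor for d in delta]

def federated_round(weights, client_datasets, max_norm, noise_multiplier, rng):
    """Server side: sum clipped client deltas, add Gaussian noise, average."""
    total = [0.0] * len(weights)
    for data in client_datasets:
        delta = clip(local_update(weights, data), max_norm)
        total = [t + d for t, d in zip(total, delta)]
    n = len(client_datasets)
    sigma = noise_multiplier * max_norm
    noisy_mean = [(t + rng.gauss(0.0, sigma)) / n for t in total]
    return [w + d for w, d in zip(weights, noisy_mean)]

if __name__ == "__main__":
    # Two hypothetical collectors; features are [bias, one flow statistic].
    clients = [
        [([1.0, 2.0], 1), ([1.0, -2.0], 0)],
        [([1.0, 1.5], 1), ([1.0, -1.0], 0)],
    ]
    weights = [0.0, 0.0]
    rng = random.Random()
    for _ in range(5):
        weights = federated_round(weights, clients, 1.0, 0.1, rng)
    print(weights)
```

Clipping bounds how much any single collector can influence the model, which is what makes the added noise meaningful. On constrained wireless nodes, the cost of the local pass and the size of each update are the parameters most likely to need adaptation.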

The student has to decide on one or more scenarios and define a set of requirements. The student then selects the most appropriate framework based on these requirements. Finally, the student should run a set of experiments to test the efficiency of the privatization technique on real data using network analysis tools.

Concerning Blowfish Privacy, the student should:

• Understand the Blowfish Privacy inequality.

• Understand the Blowfish Privacy policy: the discriminative secret graph G and the constraints Q.

• Prove that his or her new approach satisfies the Blowfish Privacy inequality (computing the relaxation parameter if necessary).

• List and explain the constraints Q in this scenario.

• Define the discriminative pairs and their graph G in this scenario.
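For reference, the inequality and policy mentioned in the list above can be stated as follows. This is a summary of the definition in [8] as recalled here, using that paper's notation (policy P, domain T, neighbor set N(P), mechanism M); it should be checked against [8] before being relied on in a proof.

```latex
% Policy: domain T, discriminative secret graph G = (V, E) with V = T,
% and I_Q, the set of databases that satisfy the constraints Q.
P = (\mathcal{T},\, G,\, \mathcal{I}_Q)

% A mechanism M satisfies (\varepsilon, P)-Blowfish privacy if
\Pr[M(D_1) \in S] \;\le\; e^{\varepsilon}\, \Pr[M(D_2) \in S]
\quad \text{for all } (D_1, D_2) \in N(P) \text{ and all } S \subseteq \mathrm{range}(M).
```

When G is the complete graph and there are no constraints, N(P) contains every pair of databases that differ in one tuple, and the definition reduces to standard ε-differential privacy. Restricting the edges of G to the pairs of values that must remain indistinguishable is what allows a policy to trade privacy for utility.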

Skills Required

Advanced programming skills are required, along with basic knowledge of network analysis and cryptography, and a grounding in wireless networking.

References

[1] J. Tripathi, J. C. de Oliveira and J. P. Vasseur, "A performance evaluation study of RPL: Routing Protocol for Low power and Lossy Networks," 2010 44th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, 2010, pp. 1-6. (https://tools.ietf.org/html/rfc6550)

[2] T. Clausen, P. Jacquet, C. Adjih, A. Laouiti, P. Minet, P. Muhlethaler, A. Qayyum, L. Viennot. Optimized Link State Routing Protocol (OLSR).

[3] G. Mulligan. The 6LoWPAN architecture. EmNets '07: Proceedings of the 4th Workshop on Embedded Networked Sensors, June 2007.

[4] E. Boschi, B. Trammell, IP Flow Anonymization Support, RFC 6235, http://tools.ietf.org/html/rfc6235, ETH Zurich, May 2011

[5] Martin Burkhart, Dominik Schatzmann, Brian Trammell, Elisa Boschi, and Bernhard Plattner. The role of network trace anonymization under attack. SIGCOMM Comput. Commun. Rev. 40, 1 (January 2010), 5-11. DOI=10.1145/1672308.1672310 http://doi.acm.org/10.1145/1672308.1672310

[6] Abhinav Parate and Gerome Miklau. A Framework for Utility-Driven Network Trace Anonymization. University of Massachusetts, Amherst. Technical Report 2008. http://people.cs.umass.edu/~miklau/pubs/techreport/parate08tracepub.pdf

[7] Ryffel, T., Trask, A., Dahl, M., Wagner, B., Mancuso, J., Rueckert, D. and Passerat-Palmbach, J., 2018. A generic framework for privacy preserving deep learning. arXiv preprint arXiv:1811.04017.

[8] He, X., Machanavajjhala, A. and Ding, B., 2014, June. Blowfish privacy: Tuning privacy-utility trade-offs using policies. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data (pp. 1447-1458). ACM.
