Use of random ranges to anonymise DNS queries

As an alternative to the use of anonymity infrastructures to increase the privacy of DNS resolutions, we can consider the introduction of noise in the DNS queries. Although this proposal does not seem to have been widely studied, some initial ideas are presented by Zhao et al. in [160]. The model presented by the authors is inspired by PIR (Private Information Retrieval) techniques [42, 107], which are used to retrieve information from a database without revealing what information is wanted.

The approach presented by Zhao et al. works as follows: a user $U$, instead of launching just a single query to the DNS server $NS$, constructs a set of queries $Q\{H_i\}_{i=1}^{n}$. If we assume DNS queries of type A, this range of queries will include up to $n$ different domain names to be resolved. The query $Q\{H_i\}$ will be the only one that includes the domain name desired by $U$. All the other queries $Q\{H_1\} \ldots Q\{H_{i-1}\}$ and $Q\{H_{i+1}\} \ldots Q\{H_n\}$ are chosen at random from a database $DB$. The authors claim that this very simple model considerably increases the privacy of the queries of user $U$. Indeed, the only information disclosed by user $U$ to third parties (e.g., the DNS server $NS$ and possible attackers with either active or passive access to the channel between $U$ and $NS$) is that the real query $Q\{H_i\}$ is within the interval $[1, n]$. Zhao et al. presume that the probability of successfully predicting the query $Q\{H_i\}$ requested by user $U$ can be expressed as $P_i = \frac{1}{n}$. We refer the reader to [160] for a more accurate description of the whole proposal.
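The range construction described above can be illustrated with a minimal Python sketch. The function names `build_range` and `naive_guess_probability`, and the representation of $DB$ as a list of names, are assumptions made for illustration; they do not come from [160].

```python
import random


def build_range(desired, database, n, rng=random):
    """Return n distinct names: the desired one plus n - 1 random decoys.

    The list is shuffled so that the position of the desired name
    carries no information.
    """
    decoys = rng.sample([h for h in database if h != desired], n - 1)
    queries = decoys + [desired]
    rng.shuffle(queries)
    return queries


def naive_guess_probability(queries):
    """Guess probability assumed by the model: P_i = 1/n."""
    return 1 / len(queries)


if __name__ == "__main__":
    db = ["H1", "H2", "H3", "H4", "H5", "H6"]
    q = build_range("H1", db, n=3)
    print(q, naive_guess_probability(q))
```

The sketch only covers a single, undisturbed exchange, which is the setting in which the $1/n$ estimate holds.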

However, we consider that the probability model presented in [160] is very optimistic. We believe that the degree of privacy offered by the model can clearly be degraded if we consider active attacks, in which an adversary is capable of interacting with the channel. Indeed, the approach does not address possible cases in which the resolution of query $Q\{H_i\}$ fails. Active attackers that can manipulate network traffic (e.g., by means of RST attacks [10] or by sending suitable ICMP traffic [131]) could launch a blind attack against the resolution protocol. This attack is based on dropping the query $Q\{H_i\}$ or its associated response. Since attackers do not know which query-response pair is desired by the client, they will try to force a failed resolution of every query in $Q\{H_i\}_{i=1}^{n}$ and of their associated responses. If so, user $U$ will be forced to restart the process and generate a new range of queries, i.e., to request $Q\{H_i\}$ once again. Depending on how this new range is managed, the degree of privacy estimated by the probabilistic model in [160] clearly decreases. Let $Q_j\{H_i\}_{i=1}^{n}$ be the $j$-th consecutive range exchanged for the resolution of the query $Q\{H_i\}$. The probability of success for an attacker trying to guess $Q\{H_i\}$ must then be defined as follows:

$$P_{ij} = \frac{1}{\left| Q_1\{H_i\}_{i=1}^{n} \cap Q_2\{H_i\}_{i=1}^{n} \cap \ldots \cap Q_j\{H_i\}_{i=1}^{n} \right|}$$

Let us exemplify this privacy level reduction attack by using the following ideal scenario. We assume a query range size of $n = 3$, a database of queries $DB = \{H_1, H_2, H_3, H_4, H_5, H_6\}$, a DNS server $NS$, and a query resolution desired by the client, $Q\{H_1\}$. In the first stage of the protocol (cf. Table 4.2, Step 1), the client constructs a range query by choosing $H_2$ and $H_3$ from $DB$ at random, resulting in $Q_1 = \{H_1, H_2, H_3\}$. Then, this range is sent to $NS$ and intercepted by the attacker. In this step, from the point of view of the attacker, the guess probability is $P_{i1} = 1/n = 1/3$. At this moment, we suppose that the attacker is able to cause the resolution of $Q\{H_1\}$ to fail by manipulating the network traffic. Thus, the client is forced to construct (cf. Step 2) a new range $Q_2 = \{H_1, H_2, H_5\}$, which again includes $H_1$, and in which $H_2$ and $H_5$ are chosen randomly from $DB$. When this new range is sent, the attacker can intercept it and calculate the intersection between the previous range and the current one, resulting in a privacy reduction, since $Q_1 \cap Q_2 = \{H_1, H_2\}$ and, consequently, $P_{i2} = 1/2$. Finally, if the attacker again forces an incomplete resolution of $Q\{H_1\}$ in Step 2 and intercepts the range $Q_3 = \{H_1, H_6, H_4\}$ built and sent by the client in Step 3, the attacker can deduce the desired query by simply applying the same intersection strategy to $Q_2$ and $Q_3$.
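The intersection strategy of this example can be reproduced with a short Python sketch. The function name `intersection_attack` is hypothetical, and the sketch assumes, as in the scenario above, that the desired query appears in every observed range (otherwise the candidate set would become empty).

```python
from fractions import Fraction


def intersection_attack(ranges):
    """Return (step, candidate set, guess probability) per observed range.

    The attacker keeps only the names present in every range seen so far.
    """
    candidates = None
    results = []
    for step, observed in enumerate(ranges, start=1):
        observed = set(observed)
        candidates = observed if candidates is None else candidates & observed
        results.append((step, candidates, Fraction(1, len(candidates))))
    return results


if __name__ == "__main__":
    ranges = [
        {"H1", "H2", "H3"},  # Q1
        {"H1", "H2", "H5"},  # Q2
        {"H1", "H6", "H4"},  # Q3
    ]
    for step, candidates, prob in intersection_attack(ranges):
        print(step, sorted(candidates), prob)
```

With the three ranges of the example, the guess probabilities evolve as 1/3, 1/2, and 1, matching the values discussed in the text.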

Zhao et al. present in [161] a second approach intended to reduce the bandwidth consumption imposed by the previous model. The new approach also draws inspiration from PIR approaches. It relies on the construction of two ranges $Q_1\{H_i\}_{i=1}^{n}$ and $Q_2\{H_i\}_{i=1}^{n+1}$, where $H_{n+1} \in Q_2$ is the true query defined by user $U$. Once $Q_1$ and $Q_2$ are defined, the ranges are sent to two independent servers $NS_1$ and $NS_2$. Assuming the resolution of DNS queries of type A, each server resolves every query in its range, obtaining all the IP addresses (defined in [161] as $X_i$) associated with each query $H_i$. $NS_1$ computes $R_1 = \sum_{i=1}^{n} \otimes X_i$ and $NS_2$ computes $R_2 = \sum_{i=1}^{n+1} \otimes X_i$. Both $R_1$ and $R_2$ are sent to user $U$, who obtains the resolution associated with $H_{n+1}$ using the expression $X_{n+1} = R_1 \otimes R_2$. As we can observe, the bandwidth consumption of this new approach is considerably smaller than that of [160], since only two responses (instead of $n$) are exchanged.
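As a rough illustration of the two-server computation, the following Python sketch assumes that $\otimes$ denotes a bitwise exclusive-or over 32-bit IPv4 addresses and that $Q_2$ consists of $Q_1$ plus the true query. Both are assumptions, consistent with the recovery step $X_{n+1} = R_1 \otimes R_2$ but not spelled out here. The `ZONE` table, its names, and its addresses (taken from documentation address ranges) are hypothetical stand-ins for real resolutions.

```python
from functools import reduce
from ipaddress import IPv4Address
from operator import xor

# Hypothetical name-to-address table standing in for real DNS resolutions.
ZONE = {
    "h1.example": IPv4Address("192.0.2.10"),
    "h2.example": IPv4Address("198.51.100.7"),
    "h3.example": IPv4Address("203.0.113.42"),
    "target.example": IPv4Address("192.0.2.99"),
}


def server_response(names):
    """Combine the addresses of every name in a range into a single value."""
    return reduce(xor, (int(ZONE[name]) for name in names), 0)


q1 = ["h1.example", "h2.example", "h3.example"]  # decoys only
q2 = q1 + ["target.example"]                     # decoys plus the true query

r1 = server_response(q1)  # computed by NS1
r2 = server_response(q2)  # computed by NS2

# The decoy addresses cancel out, leaving only the address of the true query.
recovered = IPv4Address(r1 ^ r2)
print(recovered)
```

Under these assumptions each server returns a single combined value, which is why only two responses travel back to the user regardless of $n$.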

Step   Range                        Intersection                  Guess prob.
1      $Q_1 = \{H_1, H_2, H_3\}$    -                             $P_{i1} = 1/3$
2      $Q_2 = \{H_1, H_2, H_5\}$    $Q_1 \cap Q_2 = \{H_1, H_2\}$ $P_{i2} = 1/2$
3      $Q_3 = \{H_1, H_6, H_4\}$    $Q_2 \cap Q_3 = \{H_1\}$      $P_{i3} = 1$

Table 4.2: Intersection attack against the protocol of Zhao et al. [160]

The main benefit of this last proposal, beyond the reduction of bandwidth consumption, is that it preserves the privacy of the queries against attacks at the server side. However, it presents an important drawback: it requires modifying the DNS protocol and its associated tools.

Let us note that the proposal modifies the mechanisms for both querying the servers and responding to the clients. Moreover, it still presents security deficiencies that can be exploited by means of active attacks against the communication channel between resolvers and servers. Indeed, attackers controlling the channel can still intercept both ranges $Q_1$ and $Q_2$. If so, they can easily obtain the true query established by user $U$ by simply computing $Q_2 \setminus Q_1 = \{H_{n+1}\}$. Similarly, if attackers successfully intercept both $R_1$ and $R_2$ coming from servers $NS_1$ and $NS_2$, they can obtain the corresponding mapping address by performing the same computation expected to be used by user $U$, i.e., by computing $X_{n+1} = R_1 \otimes R_2$. Once they obtain such a value, they can simply infer the original query defined by user $U$ by requesting a reverse DNS mapping of $X_{n+1}$. Analogously, active control of the channel can allow attackers to forge resolutions. Indeed, without any additional measures, a legitimate user does not have non-existence proofs to corroborate query failures. This is especially relevant for UDP-based lookup services, like the DNS, where delivery of messages is not guaranteed. Attackers can successfully apply this kind of attack by intercepting at least one of the server responses. An attacker can, for example, intercept $R_1$, compute $R_2^* = R_1 \otimes R_3$ (where $R_3$ is a malicious resolution), and finally send it as the response coming from server $NS_2$. Then, the resolver associated with user $U$ will resolve the mapping address as follows: $R_1 \otimes R_2^* = R_1 \otimes R_1 \otimes R_3 = R_3$.
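Both channel attacks can be sketched in a few lines of Python. As an assumption, $\otimes$ is modelled as a bitwise exclusive-or on integers, which matches the cancellation $R_1 \otimes R_1 \otimes R_3 = R_3$ used above. The function names and the sample values are hypothetical.

```python
def infer_true_query(q1, q2):
    """Names present in the second range but not in the first."""
    return set(q2) - set(q1)


def forge_second_response(r1, r3):
    """Forged R2* such that the client computes R1 xor R2* = R3."""
    return r1 ^ r3


# Passive inference from the two intercepted ranges.
q1 = {"h1.example", "h2.example"}
q2 = q1 | {"target.example"}
leaked = infer_true_query(q1, q2)

# Active forgery: the attacker intercepts R1 and substitutes NS2's answer.
r1 = 0x0A0B0C0D          # intercepted value (arbitrary sample)
r3 = 0xC6336401          # malicious resolution chosen by the attacker
r2_forged = forge_second_response(r1, r3)
client_result = r1 ^ r2_forged  # what the resolver of user U computes

print(leaked, hex(client_result))
```

The forgery needs no knowledge of the decoys or of the true query: intercepting $R_1$ alone is enough to steer the client to $R_3$.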

As an alternative to the approaches presented in [160, 161], we propose to distribute the load of the set of ranges launched by user $U$ among several servers $NS_1 \ldots NS_m$. Unlike the previous schemes, our approach aims at constructing different ranges of queries for every server $NS_1 \ldots NS_m$. The ranges will be distributed from $Q\{H_1^{NS_1}\} \ldots Q\{H_n^{NS_1}\}$ to $Q\{H_1^{NS_m}\} \ldots Q\{H_n^{NS_m}\}$. When the responses associated with these queries are obtained from the set of servers, user $U$ verifies that the desired query has been successfully processed. If so, the rest of the information is simply discarded. Otherwise, if the query is not processed, i.e., user $U$ does not receive the corresponding response, a new set of ranges is generated and proposed to the set of servers. To avoid the inference attack discussed above, ranges are constructed in independent sessions to prevent information leakage about the legitimate query. Let us note that by using this strategy, we preserve the privacy of queries against both the servers and the communication channel. In order to guarantee integrity of queries, authenticity of queries, and non-existence proofs, our proposal moreover relies on the use of the DNS security extension DNSSEC. The formal description of our proposed protocol is the following:

– $\bigcap_{j=1}^{n} Q_j = \emptyset$

• User $U$ concurrently and randomly sends each range $Q_j$ to a different server $NS_w$, $\forall w \in [1, m]$, with DNSSEC extensions enabled.

• User $U$ verifies that all the responses have been properly received and that their DNSSEC signatures are correct. Otherwise, the failed queries are retried until the responses are received and their signatures are correct, or until a certain number of retries $R$ is reached. In the latter case, the user is warned and the whole protocol is aborted.

• User $U$ discards all those resolutions that are not associated with $Q^*\{H\}$.
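The sending, verification, retry, and discard steps listed above can be outlined as follows. This is only a sketch under several assumptions: the ranges are taken as given (their construction is not modelled), the queries are issued sequentially rather than concurrently, and `send` is a hypothetical callback standing in for a DNSSEC-validating lookup that returns `(address, signature_ok)` or `None` when no response arrives.

```python
import random


def resolve_ranges(ranges, servers, desired, send, max_retries, rng=random):
    """Send each range to a different server and keep only the desired answer.

    ranges      -- list of lists of names, one list per server
    servers     -- list of server identifiers, at least len(ranges) long
    send        -- hypothetical callback: send(server, name) returns
                   (address, signature_ok), or None if the response is lost
    max_retries -- retries allowed per query before the protocol is aborted
    """
    if len(servers) < len(ranges):
        raise ValueError("need at least one server per range")
    if not any(desired in names for names in ranges):
        raise ValueError("desired name is not part of any range")

    # Random assignment of every range to a distinct server.
    assignment = rng.sample(servers, len(ranges))

    answers = {}
    for names, server in zip(ranges, assignment):
        for name in names:
            for _ in range(1 + max_retries):
                reply = send(server, name)
                if reply is not None and reply[1]:  # received and signature valid
                    answers[name] = reply[0]
                    break
            else:
                # Retry budget exhausted: warn the user and abort everything.
                raise RuntimeError(f"resolution of {name!r} failed; protocol aborted")

    # Every resolution other than the desired one is discarded.
    return answers[desired]
```

A caller would wrap the `RuntimeError` to warn the user; a concurrent implementation would issue the per-server queries in parallel, which this sequential sketch does not attempt.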

From the document Contributions to Privacy and Anonymity on the Internet (pages 68-71)