
4.6.3 Deduplication Rate

Our proposed solution aims at providing a robust security layer that guarantees confidentiality and privacy without impacting the underlying deduplication technique. Each file is split into blocks by the client, who applies the best possible chunking algorithm. When encrypted data blocks are received by the MM, a hash of each block is computed and compared to the hashes of the blocks already stored.

This task is completely independent of the chunking technique used by clients.

Also, none of the encryptions performed in the system affects the deduplication effectiveness, since all of them are fully deterministic: identical blocks always produce identical ciphertexts. ClouDedup therefore provides additional security properties without any impact on the deduplication rate.
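The chapter does not prescribe a concrete deterministic cipher, so the following minimal Python sketch assumes AES-SIV (deterministic when used without a nonce) and models the metadata manager's block index as an in-memory dictionary; all names are illustrative.

# Why deterministic encryption preserves the deduplication rate:
# equal plaintext blocks yield equal ciphertexts, so hashing the
# ciphertext still detects duplicates.
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESSIV

server_key = AESSIV.generate_key(256)      # hypothetical MM secret key
siv = AESSIV(server_key)
stored = {}                                # MM block index: hash -> ciphertext

def put_block(block: bytes) -> str:
    """Encrypt a block deterministically and deduplicate by ciphertext hash."""
    ct = siv.encrypt(block, None)          # same block -> same ciphertext
    h = hashlib.sha256(ct).hexdigest()
    stored.setdefault(h, ct)               # only new blocks consume storage
    return h

# Two users upload the same block: a single copy is stored.
assert put_block(b"identical content") == put_block(b"identical content")
assert len(stored) == 1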

4.6.4 Security

We explained the main security benefits of our solution in Section 4.2.4. We now focus on potential attack scenarios and issues that might arise.

Curious Cloud Storage Provider As stated in the threat model section, we assume that an attacker, such as a malicious storage provider, has full access to the storage. With access to the storage alone, the attacker cannot obtain any information. Indeed, files are split into blocks, and each block is first encrypted with convergent encryption and then further encrypted with one or more secret keys using a deterministic encryption mechanism. As discussed in Chapter 3, deterministic encryption can effectively provide full confidentiality in this setting. Moreover, no metadata (file owner, file name, file size, etc.) are stored at the cloud storage provider. Thanks to this setup, the attacker cannot perform any dictionary attack on predictable files.
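For illustration, the two layers described above can be sketched as follows. The sketch assumes AES-SIV for both the convergent and the outer deterministic layer, which the chapter does not mandate; key names are illustrative.

import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESSIV

def convergent_encrypt(block: bytes) -> tuple[bytes, bytes]:
    """CE: the key is derived from the block itself (SHA-256, 32 bytes)."""
    block_key = hashlib.sha256(block).digest()
    return block_key, AESSIV(block_key).encrypt(block, None)

server_key = AESSIV.generate_key(256)      # hypothetical component secret key

def outer_encrypt(ce_ciphertext: bytes) -> bytes:
    """Extra deterministic layer under a key the storage provider never sees."""
    return AESSIV(server_key).encrypt(ce_ciphertext, None)

block_key, inner = convergent_encrypt(b"a data block")
outer = outer_encrypt(inner)
# Without server_key the storage provider cannot recompute 'outer' for a
# guessed plaintext, so offline dictionary attacks on predictable files fail.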

Compromised Metadata Manager A worse scenario is the one in which the attacker manages to compromise the metadata manager and thus gains access to data, metadata and encrypted keys. In this case, confidentiality and privacy are still guaranteed, since block keys are encrypted with users' secret keys and the gateway's secret key. The only information the attacker can obtain is data similarity and the relationships between files, users and blocks. However, as file names are encrypted by users, this information is of no use to the attacker, unless he manages to match a record to a predictable file according to its size and popularity. Also, as discussed in Chapter 3, deterministic encryption assures confidentiality even when used in conjunction with block-level deduplication; indeed, ciphertext-only attacks based on the analysis of block frequency do not appear to be feasible in real scenarios.

Compromised Gateway The system must guarantee confidentiality and privacy even in the unlikely event that the gateway is compromised. The additional encryption performed by the metadata manager before sending data to the storage provider enforces data protection by adding another encryption layer; confidentiality is thus still guaranteed and offline dictionary attacks are not possible. If the attacker compromises the gateway, only online attacks are possible, since this component directly communicates with users. The effect of such a breach is limited, since data uploaded by users are encrypted with convergent encryption, which achieves confidentiality for unpredictable files [5]. Furthermore, a rate-limiting strategy put in place by the metadata manager can limit online brute-force attacks performed by the gateway.

Compromised Gateway and Metadata Manager In the worst scenario, the attacker manages to obtain all secret keys by compromising both the gateway and the metadata manager. In this case, the attacker will be able to remove the two additional layers of encryption and perform offline dictionary attacks on predictable files. However, since data are encrypted with convergent encryption by users, confidentiality for unpredictable files is still guaranteed.
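The offline dictionary attack that becomes possible at this point can be sketched as follows; the candidate values are made up and the CE construction matches the earlier sketch.

import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESSIV

def ce(block: bytes) -> bytes:
    """Bare convergent encryption: the key is derived from the block itself."""
    return AESSIV(hashlib.sha256(block).digest()).encrypt(block, None)

target = ce(b"PIN: 1234")                 # victim's block, both layers stripped
dictionary = (b"PIN: %04d" % n for n in range(10000))    # predictable space
recovered = next(c for c in dictionary if ce(c) == target)
assert recovered == b"PIN: 1234"
# For unpredictable content the candidate space cannot be enumerated, which is
# exactly the residual confidentiality guarantee of convergent encryption.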

Malicious Users Colluding with the Metadata Manager Another interesting scenario is the one in which one or more users collude with the metadata manager in order to circumvent the gateway and compromise confidentiality. In such a scenario, a dictionary attack works as follows: the malicious user generates a plaintext, encrypts it with convergent encryption and uploads it as usual. The gateway receives the upload request and encrypts all data blocks with its secret key. At this point, the metadata manager can easily check whether the file has already been stored, as it would do for deduplication, and send feedback to the user. If the file exists, the user learns that it has already been uploaded by another user. Such a simple attack may prove extremely effective, since it can be used to discover confidential information such as the PIN code in a letter from a bank or a password in an email. However, we argue that this kind of attack would not have a severe impact in real scenarios, for two reasons. First, such an attack must be perpetrated online, which means that the attack rate is limited by the upload capacity of the user; moreover, similarly to the metadata manager, the gateway may easily prevent this attack by putting in place a rate-limiting strategy. Second, in a real scenario all users would belong to the same organization and be authenticated using an internal, trusted strong authentication mechanism. In addition, due to the high number of upload requests it entails, such an attack would likely leave traces and be detected promptly; a potentially malicious user would therefore be strongly discouraged.
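The confirmation attack can be simulated in a few lines; all components run in-process here and the keys and sample contents are illustrative.

import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESSIV

gateway_key = AESSIV.generate_key(256)

def ce(block: bytes) -> bytes:
    return AESSIV(hashlib.sha256(block).digest()).encrypt(block, None)

def gateway_layer(ct: bytes) -> bytes:
    return AESSIV(gateway_key).encrypt(ct, None)

# Index held by the metadata manager after a victim's upload.
mm_index = {hashlib.sha256(gateway_layer(ce(b"bank letter, PIN 4321"))).digest()}

def colluding_check(candidate: bytes) -> bool:
    """A malicious user uploads a guess; the colluding MM leaks the dedup hit."""
    return hashlib.sha256(gateway_layer(ce(candidate))).digest() in mm_index

assert colluding_check(b"bank letter, PIN 4321")        # guess confirmed
# Each guess is a real upload through the gateway, i.e. the attack is online,
# which is why rate limiting and strong authentication blunt it in practice.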

External Attacker Finally, we analyze the impact of an attacker who attempts to compromise users and has no access to the storage. If an attacker only compromises one or more users, he can attempt to perform online dictionary attacks. As the gateway and the metadata manager are not compromised, the attacker can only retrieve data belonging to the compromised users, thanks to the access control mechanism. Furthermore, as mentioned above, the gateway can limit such attacks by setting a maximum threshold on the rate at which users can send requests.
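As an illustration of such a threshold, a per-user token-bucket limiter of the following kind could be applied by the gateway or the metadata manager; the parameters are purely illustrative and not taken from the chapter.

import time

class TokenBucket:
    """Allows short bursts while capping the sustained request rate."""
    def __init__(self, rate: float, burst: int):
        self.rate, self.capacity = rate, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(rate=2.0, burst=5)    # ~2 upload requests/s per user
served = sum(limiter.allow() for _ in range(100))
assert served <= 6                          # burst plus marginal refill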

PerfectDedup

5.1 Introduction

ClouDedup achieves secure block-level deduplication at the cost of requiring a complex architecture where the most crucial encryption operation is delegated to a trusted component. Also, as discussed in the security analysis section, a Metadata Manager colluding with one or more users may easily circumvent the protection guaranteed by the additional encryption layer and successfully perform COF and LRI attacks.

Starting from these two drawbacks, we aim at designing a scheme, called PerfectDedup, with a simpler architecture in which users can autonomously assess whether a data block can be deduplicated by running a privacy-preserving protocol with an untrusted Cloud Storage Provider. Such an approach has the additional, non-negligible benefit of allowing for client-side (source-based) deduplication, which brings bandwidth savings in addition to storage space savings. Thanks to the privacy-preserving protocol, PerfectDedup securely and efficiently combines client-side cross-user block-level deduplication with confidentiality against potentially malicious (curious) cloud storage providers, without relying on a trusted entity for the encryption operation. Unlike ClouDedup, this scheme thus allows a client to securely check whether a block is a duplicate before uploading and encrypting it.

Data Popularity In PerfectDedup, we propose to counter the weaknesses of convergent encryption by taking into account the popularity [7] of the data segments. Data segments stored by several users, that is, popular ones, are protected only under the weaker CE mechanism, whereas unpopular data segments that are unique in storage are protected under semantically secure encryption. This differentiation of encryption mechanisms lends itself perfectly to efficient deduplication, since the popular data segments that are encrypted under CE are exactly the ones that need to be deduplicated. The scheme also assures proper security of stored data: sensitive, and thus unpopular, data segments enjoy strong protection thanks to semantically secure encryption, whereas popular data segments do not actually suffer from the weaknesses of CE, since they are shared by several users and are therefore much less sensitive. Nevertheless, this approach raises a new challenge: users need to decide on the popularity of each data segment before storing it, and the mechanism through which this decision is taken paves the way for a series of exposures very similar to the ones affecting CE. The focus of popularity-based schemes then becomes the design of a secure mechanism for determining the popularity of data segments.
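The popularity-driven choice of cipher can be sketched as follows; AES-SIV stands in for CE and AES-GCM with a fresh random nonce for the semantically secure cipher (neither is mandated by the chapter), while is_popular abstracts the secure popularity check introduced below.

import hashlib, os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM, AESSIV

user_key = AESGCM.generate_key(bit_length=256)   # hypothetical per-user key

def encrypt_block(block: bytes, is_popular: bool) -> bytes:
    if is_popular:
        # CE: deterministic, so identical popular blocks deduplicate.
        return AESSIV(hashlib.sha256(block).digest()).encrypt(block, None)
    # Unpopular: semantically secure, so identical plaintexts yield
    # unlinkable ciphertexts and nothing leaks about unique data.
    nonce = os.urandom(12)
    return nonce + AESGCM(user_key).encrypt(nonce, block, None)

assert encrypt_block(b"shared", True) == encrypt_block(b"shared", True)
assert encrypt_block(b"secret", False) != encrypt_block(b"secret", False)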

We suggest a new scheme for the secure deduplication of encrypted data, based on the aforementioned popularity principle. The main building block of this scheme is an original mechanism for detecting the popularity of data segments in a perfectly secure way. Users can look up data segments in a list of popular segments stored by the Cloud Storage Provider (CSP), based on data segment identifiers computed with a Perfect Hash Function (PHF). Thanks to this technique, there is no information leakage about unpopular data segments, while popular data segments are identified very efficiently. Based on this new popularity detection technique, our scheme achieves deduplication of encrypted data at block level in a perfectly secure manner.
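The following toy sketch keeps only the shape of this lookup protocol: a plain dictionary stands in for the compact PHF that the CSP would actually publish, and identifiers are SHA-256 digests. It is not the real PHF construction, only an illustration of why the query leaks nothing about unpopular blocks.

import hashlib

popular_ids = {hashlib.sha256(b).digest() for b in (b"blk-a", b"blk-b")}

# CSP side: a "PHF" assigning each popular ID a slot, plus the slot table
# storing the ID so the client can verify a (possibly accidental) match.
phf = {bid: slot for slot, bid in enumerate(sorted(popular_ids))}
table = {slot: bid for bid, slot in phf.items()}

def client_is_popular(block: bytes) -> bool:
    bid = hashlib.sha256(block).digest()
    # A real PHF maps any input to some slot; unpopular IDs land on an
    # arbitrary slot, so the queried index reveals nothing about the block.
    slot = phf.get(bid, hash(bid) % max(len(table), 1))
    return table.get(slot) == bid

assert client_is_popular(b"blk-a")
assert not client_is_popular(b"never uploaded")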

The advantages of our scheme can be summarized as follows:

• PerfectDedup allows for storage size reduction by deduplication of popular data;

• PerfectDedup relies on symmetric encryption algorithms, which are known to be very efficient even when dealing with large data;

• PerfectDedup achieves deduplication at the level of blocks, which leads to higher storage space savings compared to file-level deduplication [12];

• PerfectDedup does not require any coordination or initialization among users;

• PerfectDedup does not incur any storage overhead for unpopular data blocks;