

2.4.7 Countermeasures

Countermeasures against cache attacks can be envisioned at three levels: directly at the architecture or microarchitecture level, at the operating system or hypervisor level, and finally at the application level.

2.4.7.1 Architecture or microarchitecture level

Instruction set Some changes to the instruction set itself can mitigate or completely remove side channels. For example, on the ARM architecture, flushing a cache line is a privileged operation – contrary to the x86 clflush instruction – thus removing the Flush+Reload attack vector for this architecture. Additionally, Intel introduced an extension to its x86 instruction set, called AES-NI [Int08].

These new instructions enable data encryption and decryption with the AES algorithm. They execute in a fixed time – thus countering time-driven attacks – and remove memory accesses to key-dependent data – thus countering trace-driven and access-driven attacks. This countermeasure however requires changing existing programs to use these instructions, and only works on supported hardware.
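
As an illustration, the following minimal sketch encrypts one AES-128 block using the AES-NI intrinsics. It assumes the eleven 128-bit round keys have already been expanded into rk[] and omits key expansion and error handling; the function name is illustrative.

/* Minimal sketch of one AES-128 block encryption with AES-NI intrinsics.
 * Assumes the 11 round keys have already been expanded into rk[].
 * Compile with: gcc -maes -O2 aesni_sketch.c */
#include <immintrin.h>   /* AES-NI and SSE intrinsics */
#include <stdint.h>

void aes128_encrypt_block(const __m128i rk[11],
                          const uint8_t in[16], uint8_t out[16])
{
    __m128i state = _mm_loadu_si128((const __m128i *)in);

    state = _mm_xor_si128(state, rk[0]);            /* initial AddRoundKey */
    for (int round = 1; round < 10; round++)
        state = _mm_aesenc_si128(state, rk[round]); /* 9 full rounds */
    state = _mm_aesenclast_si128(state, rk[10]);    /* final round */

    _mm_storeu_si128((__m128i *)out, state);
}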

Secure cache designs Two general solutions are proposed regarding new secure cache designs: either removing cache interferences, or randomizing them such that information about cache timings is useless to the attacker.

To remove cache interferences, Page [Pag05] proposed partitioning the cache architecture, as well as introducing changes to the instruction set. This approach however is quite heavy and causes performance degradation. Wang and Lee [WL06, WL07] proposed the Partition Locked Cache (PLcache). It avoids cache interference by dynamically locking cache lines of sensitive programs, thus preventing other processes from evicting them. After analyzing the security of PLcache [KASZ08], Kong et al. [KASZ09, KASZ13] proposed an improvement over its logic: preloading all critical data in cache before the beginning of the cryptographic operations. All critical data will thus be locked in cache. Domnister et al. [DJL+11] proposed to modify the replacement policy in the cache controller. It statically reserves one or several ways for each hardware thread, without the need for the instruction set to change.

To randomize cache interferences, Wang and Lee [WL06, WL07] proposed the Random Permutation Cache (RPcache). It creates permutation tables so that the memory-to-cache-set mapping is not the same for sensitive programs as for others. After analyzing the security of RPcache [KASZ08], Kong et al. [KASZ09, KASZ13] proposed two improvements over it. The first is the use of informing loads to secure RPcache. Informing loads are special instructions that inform the software when a load misses in the cache. An exception can then be raised, and the exception handler can load critical data such that future loads hit in the cache, eliminating time variations. The second is to use informing loads to change the permutations of RPcache when critical data misses the cache. Wang and Lee [WL08] introduced a third secure cache design, called Newcache, that also randomizes interferences. It adds a level of indirection for the memory-to-cache mapping, as well as a modified random replacement algorithm. Liu and Lee [LL13] investigated the security of Newcache, and proposed a modification to the replacement algorithm to counter specifically crafted attacks. Liu and Lee [LL14] proposed changing the filling policy of the cache to de-correlate it from demand: instead of filling the cache on each miss, data populates the cache after being served to the processor, randomly within a configurable time window.

Prefetcher Fuchs and Lee [FL15] suggested using disruptive prefetching to mitigate cache side channels. Prefetchers are traditionally studied with respect to performance; in terms of security, the key idea is that the prefetcher adds noise to the original memory access sequence. By altering the prefetching policy, the cache behavior becomes less predictable. Their new prefetching policy prevents attacks on the L1 data cache, and has been tested against a Prime+Probe attack.

2.4.7.2 Operating system or hypervisor level

Isolation A first countermeasure at the system level – operating system or hypervisor – is to isolate the different virtual machine processes, so that they do not share the resources that are the cause of information leakage.

The most obvious solution is to only allow one virtual machine per physical machine. However, this removes the main benefit of virtualization in terms of performance and thus raises the cost of cloud computing. A more fine-grained solution is to physically isolate only previously annotated functions [BJB15].

Similarly to new cache designs, page coloring provides cache isolation, but operates at the software level [RNSE09, SSCZ11, KPMR12]. Other papers take a more relaxed isolation approach in software. Zhang et al. [ZR13] proposed Düppel, which repeatedly cleanses caches that are time-shared, e.g., the L1 cache (a minimal sketch of such a cleansing loop is given after this paragraph). Varadarajan et al. [VRS14] investigated the role of the scheduler to limit the frequency of cross-VM interactions. Although this does not completely eliminate the possibility of such interactions, side channels require frequent measurements to be accurate; changing the scheduler to limit the frequency of preemptions is thus a practical way of defeating side channels on private caches in virtualized environments. This countermeasure however needs to be evaluated against the newer cache attacks on the shared last-level cache.
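
The cleansing idea can be sketched in software as a loop that periodically walks a buffer the size of the L1 data cache, evicting whatever else resides there. The 32 KiB cache size, 64-byte line size, and function name below are assumptions for illustration, not the actual implementation of [ZR13].

/* Illustrative sketch of periodic L1 data cache cleansing (in the spirit
 * of Düppel, not its actual implementation). Sizes are assumptions. */
#include <stdint.h>

#define L1_SIZE   (32 * 1024)   /* assumed 32 KiB L1 data cache */
#define LINE_SIZE 64            /* assumed 64-byte cache lines */

static volatile uint8_t cleansing_buffer[L1_SIZE];

void cleanse_l1_once(void)
{
    /* Touch one byte per cache line so the whole buffer is brought into
     * the L1 data cache, evicting previously cached victim data. */
    for (unsigned offset = 0; offset < L1_SIZE; offset += LINE_SIZE)
        (void)cleansing_buffer[offset];
}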

Noise in timers Another solution is to use lower-resolution timers or remove them altogether [VDS11, MDS12]. Indeed, one condition for side channels is the ability for the attacker to perform fine-grained measurements to distinguish between, e.g., a cache hit and a cache miss. However, this solution does not account for legitimate uses of fine-grained timers.
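
The effect of a lower-resolution timer can be sketched with a hypothetical wrapper that rounds the timestamp counter down to a coarse step, so that the difference between a cache hit and a miss disappears below the granularity. The function name and the 4096-cycle step are illustrative assumptions, not an actual OS or hypervisor mechanism.

/* Illustrative sketch only: a hypothetical "fuzzy" timer as an OS or
 * hypervisor might expose to untrusted code. */
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc() */

#define TIMER_GRANULARITY 4096   /* cycles; assumed coarse step */

static uint64_t fuzzy_rdtsc(void)
{
    uint64_t t = __rdtsc();
    /* A cache hit (tens of cycles) and a miss (hundreds of cycles) both
     * fall inside one 4096-cycle step, so they become indistinguishable
     * to code that only sees this timer. */
    return t - (t % TIMER_GRANULARITY);
}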

Normalized timings Deterministic execution in cloud architectures is an alternative solution to remove timing channels [AHFG10, LGR13]. Similar to this idea, Braun et al. [BJB15] proposed modifying the OS to offer protection to previously annotated sensitive functions. Here the key idea is that the execution time is not entirely deterministic: only the key-dependent computations are time-padded, not external timing differences (e.g., OS scheduling, CPU frequency scaling).
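
A minimal sketch of the time-padding idea (not the actual mechanism of [BJB15]): the key-dependent operation is followed by a busy-wait until a fixed worst-case budget has elapsed, so its externally visible duration no longer depends on the key. PAD_NS, padded_call and sensitive_op are placeholders.

/* Illustration of time padding for a key-dependent operation. */
#include <time.h>
#include <stdint.h>

#define PAD_NS 200000ULL   /* assumed worst-case duration of sensitive_op */

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

void padded_call(void (*sensitive_op)(void *), void *arg)
{
    uint64_t start = now_ns();
    sensitive_op(arg);                    /* key-dependent work */
    while (now_ns() - start < PAD_NS)     /* pad to the fixed budget */
        ;                                 /* busy-wait */
}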

2.4.7.3 Application level

Finally, countermeasures can be built directly at the application level. They are specific to each application, and require changes either in the compilation flow or in the algorithms themselves. Since they focus on specific applications, they are not able to prevent all cache attacks, e.g., covert channels.

Compiler-based mitigations Compiler-based mitigations have the advantage of automating part of the changes needed to prevent side channels. Coppens et al. [CVDD09] proposed using automated compiler techniques to remove key-dependent control flow and key-dependent timings in cryptographic software. The changes are made in the compiler backend, leveraging x86 conditional move instructions to eliminate branches.
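
The kind of transformation involved can be illustrated with a hand-written branchless selection, in the same spirit as this cmov-based rewriting (an illustration, not the actual compiler pass): the secret-dependent branch is replaced by arithmetic on a mask, which the compiler can lower to a conditional move. The function name is hypothetical.

/* Branchless selection: the secret-dependent "if" is replaced by mask
 * arithmetic, lowered to cmov rather than a branch.
 * Original, leaky form:
 *     if (secret_bit) r = a; else r = b;   // branch depends on secret */
#include <stdint.h>

uint32_t ct_select(uint32_t secret_bit, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)(-(int32_t)(secret_bit & 1)); /* 0 or 0xFFFFFFFF */
    return (a & mask) | (b & ~mask);
}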

Subsequently, Cleemput et al. [CCD12] investigated variable-latency instructions. They evaluated changes to the compiler backend that either compensate for this variable latency, or force the use of fixed-latency operations. They conclude that these transformations incur too high an overhead when strong protection is required, making this solution impractical. Crane et al. [CHB+15] explored software diversity. Their key idea is to create several clones of sensitive program fragments that are functionally equivalent but differ in their runtime characteristics. At runtime, the program dynamically and randomly chooses which control path to take.

Manual changes in applications Manual changes in applications are tedious, since they require finding the source of the leak and patching it manually, and they depend on the algorithm and implementation. For example, Brickell et al. [BGNS06] focused on AES and proposed compressed and randomized tables, as well as pre-loading cache lines. However, Blömer and Krummel [BK07] showed that these countermeasures are sometimes not sufficient. For AES, bitslice implementations avoid using table lookups altogether [RSD06, Kö08, KS09], without any change in hardware.
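
In the same spirit as these table-hardening techniques (though not taken from any of the cited implementations), a lookup can be made independent of the secret index by reading every table entry and masking out all but the selected one, as in the following sketch; the function name is hypothetical.

/* Illustrative manual mitigation: a table lookup that reads every entry,
 * so the memory access pattern no longer depends on the secret index. */
#include <stdint.h>
#include <stddef.h>

uint8_t ct_table_lookup(const uint8_t table[256], uint8_t secret_index)
{
    uint8_t result = 0;
    for (size_t i = 0; i < 256; i++) {
        /* mask is 0xFF only when i == secret_index, 0x00 otherwise */
        uint8_t mask = (uint8_t)(0u - (unsigned)(i == secret_index));
        result |= table[i] & mask;   /* every entry is read */
    }
    return result;
}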

Such manual changes are nonetheless necessary in real-world critical software, since developers cannot assume changes to the hardware or operating system. For instance, OpenSSL includes mitigations against Percival's attack [Per05]¹, against the branch prediction attack of Acıiçmez et al. [AGS07]², and against the attack of Yarom et al. [YB14] on ECDSA³. To that effect, methods that detect side channels in binaries can be used to find the source of information leakage and to close it [DFK+13, GSM15].

¹ See [Ope], “Changes between 0.9.7g and 0.9.7h”, 11 October 2005.

² See [Ope], “Changes between 0.9.8e and 0.9.8f”, 11 October 2007.

³ See [Ope], “Changes between 1.0.1f and 1.0.1g”, 7 April 2014.