Anatomy of a Remote Kernel Exploit

(1)

Dan Rosenberg

Anatomy of a Remote Kernel Exploit

(Dartmouth Edition)

(2)

Who am I?

▪ Security consultant and vulnerability researcher at Virtual Security Research in Boston

▫ App/net pentesting, code review, etc.

▫ Published some bugs

▫ Rooted a few Android phones

▫ Focus on Linux kernel

▫ Research on kernel exploitation and mitigation

(3)

Agenda

▪ Motivation

▪ Challenges of remote exploitation

▪ Prior work

▪ Case study: ROSE remote stack overflow

▫ Exploitation

▫ Backdoor

▪ Future work

(4)

Why am I giving this talk?

Motivation

(5)

Why Remote Kernel Exploits?

▪ Instant root

▫ No need to escalate privileges

▪ Remote userland exploitation is hard!

▫ Full ASLR + NX/DEP

▫ Sandboxing

▫ Reduced privileges

(6)

Goals of This Talk

▪ Explore operating system internals from perspective of an attacker

▪ Discuss kernel data structures and subsystems

▪ Exploit development methodology

▪ Individual bugs vs. exploit techniques

(7)

Wait, so you mean this is kind of hard?

Challenges of Remote Kernel Exploitation

(8)

Warning: Fragile

▪ Consequence of failed remote userland exploit:

▫ Crash application/service, wait until restarted

▫ Crash child process, try again immediately

▪ Consequence of failed remote kernel exploit:

▫ Kernel panic, game over

(9)

Lack of Environment Control

▪ Typical local kernel exploit:

▫ Can trigger allocation of heap structures

▫ Can trigger calling of function pointers

▫ High amount of information leakage available to local users

▪ Remote kernel exploit:

▫ ?

(10)

Primer: Process vs. Interrupt Context

▪ System calls occur in “process context”:

▫ Kernel is executing code, but is associated with userland process

▫ Has credentials, network/filesystem namespace, etc.

▪ On Linux, asynchronous events (e.g. network data) occur in “interrupt context”:

▫ Network driver generates hardware interrupt

▫ Kernel dispatches data to appropriate softirq handler

▫ No userland process associated with execution

▫ On Linux, associated with ksoftirqd kernel thread

(11)

Escape From Interrupt Context

▪ End goal: userland code execution (remote shell)

▫ How do we get there?

▫ No process backing execution

▪ Need to transition

▫ Interrupt context to process context to userland

(12)

Prior Work

What's been done before?

(13)

A Few Statistics

▪ 18 known exploits for 16 vulnerabilities

▫ 19 authors

▫ 9 with full public source code

▫ 3 with partial or PoC source

▪ Wide range of platforms

▫ Solaris and OS X still need some remote love

(14)

By Operating System

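[Chart: known exploits by operating system. Categories: Windows, Linux, *BSD. Values as extracted, mapping to categories not recoverable: 8, 2, 3, 3]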

(15)

By Vulnerability Class

▪ Stack overflow: 12

▪ Heap overflow: 3

▪ Array indexing: 1

(16)

By Year

[Chart: known exploits per year; only the axis ticks (1 to 5) survive, per-year values not recoverable]

(17)

Highlights

▪ Barnaby Jack: Step into the Ring 0 (August 2005)

▫ First publication on remote kernel exploitation

▫ Transition to userland and kernel backdoor

▪ Sinan Eren: GREENAPPLE (May 2006)

▫ First remote kernel exploit in Immunity CANVAS

(18)

Highlights (cont.)

▪ hdm, skape, Johnny Cache (November 2006)

▫ Broadcom, Dlink, and Netgear wifi drivers

▫ First remote kernel exploits in Metasploit

▪ Alfredo Ortega, Gerardo Richarte: OpenBSD IPv6 mbuf overflow (April 2007)

▫ First public remote kernel heap overflow

▫ Bypasses userland NX

(19)

Primer: NX (Non-Executable Pages)

▪ Pages have permissions: read, write, execute

▫ Initially, on Intel chips, page table entries only supported read and write flags

▫ Read implied executable

▪ Before long, realized this was a bad idea

▫ Malicious data can be executed as code!

▪ NX is implemented using bit 63 of the page table entry:

▫ Natively supported on 64-bit platforms

▫ Supported on PAE CPUs (need hardware + software)

▫ Emulated in userland by kernel
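As a small illustration of the bit described above, the following sketch tests and toggles bit 63 of a 64-bit page table entry value. The names PTE_NX, pte_is_nx, and pte_set_nx are hypothetical helpers for this example, not kernel APIs, and the code works on plain integers rather than live page tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 of a 64-bit (PAE or long-mode) page table entry is the NX flag. */
#define PTE_NX (UINT64_C(1) << 63)

/* Returns true when the entry marks its page as non-executable. */
static bool pte_is_nx(uint64_t pte)
{
    return (pte & PTE_NX) != 0;
}

/* Returns a copy of the entry with the NX flag set or cleared. */
static uint64_t pte_set_nx(uint64_t pte, bool nx)
{
    return nx ? (pte | PTE_NX) : (pte & ~PTE_NX);
}
```

A 32-bit non-PAE entry has no bit 63 at all, which is why NX enforcement needs PAE or a 64-bit kernel.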

(20)

Highlights (cont.)

▪ Kostya Kortchinsky: MS08-001 (January 2008)

▫ Immunity CANVAS

▫ First publicized remote Windows kernel pool overflow

▪ sgrakkyu: sctp-houdini (April 2009)

▫ First remote Linux sl*b overflow

▫ Introduced vsyscall trick to transition from interrupt context to userland

(21)

Primer: Linux Virtual Syscalls

▪ On x86-64 machines, Linux supports “virtual syscalls”

▫ Three system calls that can be implemented entirely in userland: gettimeofday, getcpu, time

▪ Trapping to kernel mode is relatively expensive

▫ Check CPL, switch stack, store trap frame, reload %cs and %ss

▪ Faster to just stay in userland

▪ “vsyscall” page accomplishes this by mapping a page exported by the kernel into every userland process

(22)

So What Was That Trick?

▪ sgrakkyu realized this is a good attack vector

▪ vsyscall page is a shadowed mapping: read-write version in kernel memory, read-execute in user memory

▪ In interrupt context, we can write into the kernel mapping of this page, overwriting a virtual syscall

▪ Now every userland process will execute our userland shellcode whenever they call a virtual syscall!

(23)

Observations

▪ Majority stack overflows, but none dealt with NX kernel stack

▫ Let's fix that

▪ No Linux interrupt context stack overflows

▫ sgrakkyu and twiz showed us how in Phrack 64, let's do it in real life

▪ Wireless drivers suck

▫ Six 802.11 remote kernel exploits

(24)

Building the Exploit

Or: How I Learned to Stop Worrying and Love the Ham

(25)

Target: 32-bit x86 PAE Kernel

▪ Kernel has NX support (CONFIG_DEBUG_RODATA)

▫ Only enforced on PAE (32-bit) or 64-bit kernels

▪ Can't execute first-stage shellcode on kernel stack

▪ Can't introduce code into userspace without proper page permissions

▪ No vsyscall trick for easy transitions

(26)

Test Setup

▪ Attacker and victim VMs (Ubuntu 10.04)

▪ Debugging using KGDB over virtual serial port (host pipe)

▪ BPQ (AX.25 over Ethernet)

▪ Except for glue code, exploit written entirely in x86 assembly

(27)

Famous Last Words

Debian Security Advisory DSA-2240-1:

Dan Rosenburg reported two issues in the Linux implementation of the Amateur Radio X.25 PLP (Rose) protocol. A remote user can cause a denial of service by providing specially crafted facilities fields.

(28)

Intro to ROSE

▪ Rarely used amateur radio protocol

▪ Provides network layer on top of AX.25's link layer

▪ Uses 10-digit addresses and AX.25 callsigns

▪ Static routing only

(29)

CVE-2011-1493

▪ On initiating a ROSE connection, parties exchange facilities (supported features)

▪ FAC_NATIONAL_DIGIS allows host to provide list of digipeaters

▪ Parsing for this field reads length value from frame and copies digipeater addresses without bounds checking, causing a stack overflow

(30)

Sad Code :-(

...

l = p[1];

...

else if (*p == FAC_NATIONAL_DIGIS) {
        fac_national_digis_received = 1;
        facilities->source_ndigis = 0;
        facilities->dest_ndigis = 0;
        for (pt = p + 2, lg = 0 ; lg < l ; pt += AX25_ADDR_LEN, lg += AX25_ADDR_LEN) {
                if (pt[6] & AX25_HBIT)
                        memcpy(&facilities->dest_digis[facilities->dest_ndigis++], pt, AX25_ADDR_LEN);
                else
                        memcpy(&facilities->source_digis[facilities->source_ndigis++], pt, AX25_ADDR_LEN);
        }
}
...
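For contrast with the loop above, here is a minimal userland sketch of the same parsing step with bounds checks added. It is not the actual kernel patch. The struct facilities_model type, the avail parameter, and the choice to reject lengths that are not a multiple of the address size are assumptions made for this example. The constant values (7-byte addresses, 0x80 flag bit, six digipeaters per array) are taken to match the kernel headers, but treat them as assumptions too.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AX25_ADDR_LEN  7     /* bytes per AX.25 address */
#define AX25_HBIT      0x80  /* flag bit tested in the seventh byte */
#define ROSE_MAX_DIGIS 6     /* assumed capacity of each digipeater array */

typedef struct { uint8_t addr[AX25_ADDR_LEN]; } ax25_address;

/* Simplified stand-in for the kernel's facilities structure. */
struct facilities_model {
    ax25_address dest_digis[ROSE_MAX_DIGIS];
    ax25_address source_digis[ROSE_MAX_DIGIS];
    unsigned int dest_ndigis;
    unsigned int source_ndigis;
};

/*
 * Parse one FAC_NATIONAL_DIGIS field. p points at the facility code,
 * p[1] is the sender-supplied length, and avail is the number of bytes
 * really present at p. Returns 0 on success, -1 if the field is malformed.
 */
static int parse_national_digis(struct facilities_model *f,
                                const uint8_t *p, size_t avail)
{
    if (avail < 2)
        return -1;

    size_t l = p[1];
    /* The claimed length must fit in the frame and hold whole addresses. */
    if (l > avail - 2 || l % AX25_ADDR_LEN != 0)
        return -1;

    f->dest_ndigis = 0;
    f->source_ndigis = 0;

    const uint8_t *pt = p + 2;
    for (size_t lg = 0; lg < l; lg += AX25_ADDR_LEN, pt += AX25_ADDR_LEN) {
        if (pt[6] & AX25_HBIT) {
            if (f->dest_ndigis >= ROSE_MAX_DIGIS)
                return -1;          /* would overflow dest_digis */
            memcpy(&f->dest_digis[f->dest_ndigis++], pt, AX25_ADDR_LEN);
        } else {
            if (f->source_ndigis >= ROSE_MAX_DIGIS)
                return -1;          /* would overflow source_digis */
            memcpy(&f->source_digis[f->source_ndigis++], pt, AX25_ADDR_LEN);
        }
    }
    return 0;
}
```

The two checks that matter are the comparison of the length byte against the data actually received and the per-array capacity test before each copy. The original loop has neither.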

(31)

Constraint #1

▪ The seventh byte of each AX.25 address is AND'd with AX25_HBIT (0x80); if the bit is set, the address is treated as a destination digipeater

▫ Otherwise, treated as a source digipeater

▪ Every seventh byte of our payload must have its high bit consistently set (0x80 or greater) or consistently clear (less than 0x80), or we'll copy into the wrong array

▪ Requires manual tweaking
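The constraint can be checked mechanically instead of by eye. The sketch below is a hypothetical helper (hbit_is_consistent is not from the talk) that walks a buffer in 7-byte chunks and reports whether the high bit of every seventh byte agrees with the first chunk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AX25_ADDR_LEN 7
#define AX25_HBIT     0x80

/*
 * Returns true when every complete 7-byte chunk of the buffer has the same
 * AX25_HBIT value in its seventh byte, so the parser would copy all chunks
 * into the same array. A trailing partial chunk is ignored.
 */
static bool hbit_is_consistent(const uint8_t *buf, size_t len)
{
    if (len < AX25_ADDR_LEN)
        return true;

    uint8_t first = buf[AX25_ADDR_LEN - 1] & AX25_HBIT;
    for (size_t off = AX25_ADDR_LEN; off + AX25_ADDR_LEN <= len; off += AX25_ADDR_LEN) {
        if ((buf[off + AX25_ADDR_LEN - 1] & AX25_HBIT) != first)
            return false;
    }
    return true;
}
```

A buffer filled with 0x90 passes, since every byte has the high bit set. A single byte below 0x80 at a seventh-byte position makes it fail.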

(32)

Plan of Attack

Get EIP → Unrestricted code execution → Install kernel backdoor → Restore and recover

(33)

Triggering the Bug

▪ Fairly trivial

▪ Modify ROSE facilities output functions to craft frame with overly large length field for FAC_NATIONAL_DIGIS, followed by lots of NOPs (0x90)


(34)

Evil ROSE Frame

| ROSE header | Facilities total length = XX | 0x00 | FAC_NATIONAL | FAC_NATIONAL_DIGIS | len = 0xff | 0x9090... |

(35)

Got EIP

▪ Recompile ROSE module, reload, and use rose_call to initiate connection to target

▪ Overflowed softirq stack (interrupt handler)

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1456]
0x90909090 in ?? ()
(gdb) i r
eax            0x0          0
ecx            0xde3a5f3c   -566599876
edx            0x296        662
ebx            0x90909090   -1869574000
esp            0xd11e199c   0xd11e199c
ebp            0x90909090   0x90909090
esi            0x90909090   -1869574000
edi            0x90909090   -1869574000
eip            0x90909090   0x90909090
eflags         0x10286      [ PF SF IF RF ]
cs             0x60         96
ss             0x68         104
ds             0x9090007b   -1869610885
es             0x9090007b   -1869610885
fs             0xffff       65535
gs             0xffff       65535

(36)

How to Execute Code?

▪ Traditionally, return into shellcode on stack

▪ Problem 1: we don't know where we are

▫ Trampolines are easy

▪ Problem 2: softirq stack is non-executable


(37)

Primer: Registers

▪ x86-32 has several general purpose registers:

▫ %eax, %ebx, %ecx, %edx, %esi, %edi

▪ Some have “traditional” uses

▫ %eax is return code

▫ %ecx is a counter

▫ %esi/%edi are source and destination of copy

▪ Special registers: %esp (stack pointer), %ebp (frame pointer), %eip (instruction pointer)

(38)

Primer: Calling Convention

▪ How do we invoke functions?

▫ Traditionally, put arguments on stack (%esp), and issue a “call” instruction

▪ Different in kernel mode:

● First argument in %eax

● Second in %edx

● Third in %ecx

(39)

Primer: ROP

▪ We control the return address and data at %esp

▪ Each return will direct execution to address at stack pointer and increment it

▪ Chain together function epilogues (“gadgets”) to perform arbitrary computation

▪ Relies on homogeneity of distribution (binary) kernels and lack of randomization

▫ Choose gadgets that are more likely to appear in constant locations across kernels

(40)

Making our Stack Executable

▪ Kernel has nice function to do this for us:

▫ set_memory_x()

▪ Calling convention has arguments in registers

▪ ROP stub steps:

▫ Load (%esp & ~0xfff) into %eax

▫ Load 4 into %edx

static unsigned long rop_stub[] = {
/*1*/ PUSH_ESP_POP_EAX,
/*4*/ 0xffffffff, 0xffffffff,
/*3*/ 0xffffffff, ALIGN_EAX,
/*2*/ 0xffffffff, 0xffffffff,
/*1*/ RET,
/*4*/ POP_EDX, 0x00000004,
/*3*/ 0xffffffff, 0xffffffff,
/*2*/ 0xffffffff, 0xffffffff,
...
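The two argument values in the ROP stub steps are plain address arithmetic. set_memory_x() takes a starting address and a page count, so the first step rounds the stack pointer down to its page base and the second passes 4. The sketch below only reproduces that arithmetic in userland. The helper names page_base and span_end are hypothetical, and 4 KiB pages are assumed.

```c
#include <stdint.h>

#define PAGE_SIZE_4K UINT32_C(0x1000)  /* assumed: 4 KiB pages */

/* Round an address down to the start of its page: addr & ~0xfff. */
static uint32_t page_base(uint32_t addr)
{
    return addr & ~(PAGE_SIZE_4K - 1);
}

/* First address past a run of numpages pages starting at addr's page. */
static uint32_t span_end(uint32_t addr, uint32_t numpages)
{
    return page_base(addr) + numpages * PAGE_SIZE_4K;
}
```

With the %esp value from the crash dump, 0xd11e199c, the page base is 0xd11e1000 and four pages cover addresses up to 0xd11e5000.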

(41)

Overcoming Space Constraints

▪ We now have traditional shellcode executing on the softirq stack!

▪ Problem: length is limited to 0xff (255), minus what we've already used

▪ Not enough room for a useful payload


(42)

Needle in a Haystack

▪ Full ROSE frame is intact somewhere on the kernel heap

▪ Pointer to a memory region containing our socket data lives on the stack

▪ Walk up the stack, following kernel heap pointers

▪ Search general area for tag included in ROSE frame
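The search step is an ordinary byte-pattern scan. As a sketch, assuming the region to search is already available as a buffer, a helper like the hypothetical find_tag below returns the first position of a marker. The tag value and region bounds are placeholders here. The talk does not give them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return a pointer to the first occurrence of tag inside region,
 * or NULL when the tag is empty, too long, or not present.
 */
static const uint8_t *find_tag(const uint8_t *region, size_t region_len,
                               const uint8_t *tag, size_t tag_len)
{
    if (tag_len == 0 || tag_len > region_len)
        return NULL;

    for (size_t i = 0; i + tag_len <= region_len; i++) {
        if (memcmp(region + i, tag, tag_len) == 0)
            return region + i;
    }
    return NULL;
}
```

A longer, less common tag lowers the chance of matching unrelated heap contents. The scan never reads past region_len.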

(43)

What Now?

▪ We can execute arbitrary-length payloads now!

▪ Goal: install kernel backdoor in ICMP handler


(44)

Primer: Linux Networking

▪ What happens when network data is received?

▪ Hardware magic happens, driver layer (linux/drivers/net) receives low-level frame

▪ Driver identifies “this is an IP packet”, sends to network layer (linux/net/ipv{4,6})

▪ Network layer checks “what protocol is this” (TCP, UDP, ICMP, etc.) and dispatches to appropriate protocol handler (linux/net/*)

(45)

Protocol Handlers

/* Array of network protocol structure */
const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;

/* Definition of network protocol structure */
struct net_protocol {
        int (*handler)(struct sk_buff *skb);
        void (*err_handler)(struct sk_buff *skb, u32 info);
        ...
};

/* Standard well-defined IP protocols. */
enum {
        IPPROTO_IP = 0,         /* Dummy protocol for TCP */
        IPPROTO_ICMP = 1,       /* Internet Control Message Protocol */
        ...

(46)

Hooking ICMP

▪ Storage on softirq stack

▫ Already executable, safe, persistent

▪ Copy hook and address of original ICMP handler

▫ We'll need this later

▪ Handler is in read-only memory

▫ Flip write-protect bit in %cr0 register

▪ Write address of our hook into the ICMP entry's handler function pointer
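The mechanism behind the hook is function-pointer interposition on a dispatch table: save the original pointer, store a new one, and chain to the original so normal processing continues. The following userland model shows only that pattern. All names (struct packet, proto_entry, protos, install_hook, deliver) are invented for the illustration and none of this touches kernel memory or page protections.

```c
#include <stddef.h>

#define MAX_PROTOS 256
#define PROTO_ICMP 1

struct packet { int proto; int tagged; };
typedef int (*handler_fn)(struct packet *pkt);
struct proto_entry { handler_fn handler; };

/* Stand-in for the original protocol handler. */
static int icmp_model_rcv(struct packet *pkt) { (void)pkt; return 42; }

static struct proto_entry icmp_entry = { icmp_model_rcv };
static struct proto_entry *protos[MAX_PROTOS] = { [PROTO_ICMP] = &icmp_entry };

static handler_fn saved_handler;  /* original handler, kept for chaining */
static int hook_hits;             /* counts packets the hook acted on */

/* Interposed handler: inspect the packet, then defer to the original. */
static int hook(struct packet *pkt)
{
    if (pkt->tagged)
        hook_hits++;
    return saved_handler(pkt);
}

/* Save the current handler for a protocol and replace it with hook. */
static void install_hook(int proto)
{
    saved_handler = protos[proto]->handler;
    protos[proto]->handler = hook;
}

/* Dispatch a packet through the table, as the network layer would. */
static int deliver(struct packet *pkt)
{
    if (pkt->proto < 0 || pkt->proto >= MAX_PROTOS || !protos[pkt->proto])
        return -1;
    return protos[pkt->proto]->handler(pkt);
}
```

Because the hook always calls the saved handler, callers of deliver() see the same return value before and after install_hook(). This is the property that keeps the ICMP stack working.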

(47)

Hooked In

inet_protos:
        IPPROTO_IP
        IPPROTO_ICMP
        ...

net_protocol:
        handler
        err_handler
        ...

icmp_rcv:
<icmp_rcv>:     push ebp
<icmp_rcv+1>:   mov ebp,esp
<icmp_rcv+3>:   push edi
<icmp_rcv+4>:   push esi
...

hook:
<hook>:         push edi
<hook+1>:       push esi
<hook+2>:       push ebx
<hook+3>:       push eax
...

(48)

Time to Rebuild...

▪ We've destroyed large portions of the softirq stack

▪ How can we keep the kernel running?


(49)

Cleaning Up the Locks

▪ ROSE protocol is holding two spinlocks

▫ If we don't release these, the ROSE stack will deadlock soon

▪ Problem: ROSE is a module, we don't know where the locks live

(50)

Needle in a Haystack, Again

▪ Global modules variable: linked list of loaded kernel modules

▪ A plan!

▫ Follow linked list until we find ROSE module

▫ Read module structure, find start of .data section

▫ Scan .data section for byte pattern of two consecutive spinlocks (distinctive signature)

▫ Release them
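The first step of the plan is a list walk keyed on the module name. The kernel's real list is a circular doubly linked list embedded in struct module. The sketch below swaps in a simplified, NULL-terminated singly linked model (struct module_model and find_module are hypothetical) to show just the lookup.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a loaded-module record. */
struct module_model {
    const struct module_model *next;  /* next module, NULL at the end */
    const char *name;                 /* module name, e.g. "rose" */
    const unsigned char *data;        /* start of the module's .data */
    size_t data_len;                  /* size of .data in bytes */
};

/* Walk the list and return the module with the given name, or NULL. */
static const struct module_model *find_module(const struct module_model *head,
                                              const char *name)
{
    for (const struct module_model *m = head; m != NULL; m = m->next) {
        if (strcmp(m->name, name) == 0)
            return m;
    }
    return NULL;
}
```

Once the record is found, its data pointer and length bound the region that the signature scan for the two spinlocks has to cover.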

(51)

Preemption Woes

▪ Preemption count must be consistent with what the kernel is expecting, or scheduler will...

...complain and fix it for you?!

if (unlikely(prev_count != preempt_count())) {
        printk(KERN_ERR "huh, entered softirq %u %s %p"
               "with preempt_count %08x,"
               " exited with %08x?\n", vec_nr,
               softirq_to_name[vec_nr], h->action,
               prev_count, preempt_count());
        preempt_count() = prev_count;
}

▪ Let's avoid that warning...

(52)

Has Anybody Seen a Preemption Count?

▪ Preempt count lives at known location in thread_info struct, at base of kernel stack:

struct thread_info {
        struct task_struct *task;        /* main task structure */
        struct exec_domain *exec_domain; /* execution domain */
        __u32 flags;                     /* low level flags */
        __u32 status;                    /* thread synchronous flags */
        __u32 cpu;                       /* current CPU */
        int preempt_count;               /* 0 => preemptable, <0 => BUG */
        ...
};
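Because thread_info sits at the base of the stack, its address follows from the stack pointer by masking off the low bits, and the field's address follows from the struct layout. The sketch below models that calculation for a 32-bit kernel. It assumes 8 KiB stacks (THREAD_SIZE of 8192) and replaces the pointer fields with 32-bit integers so the offsets match a 32-bit target when compiled anywhere. The model struct and helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define THREAD_SIZE UINT32_C(8192)  /* assumed: 8 KiB kernel stacks */

/* 32-bit layout model of the leading thread_info fields shown above. */
struct thread_info_model {
    uint32_t task;          /* pointer on the target, 4 bytes */
    uint32_t exec_domain;   /* pointer on the target, 4 bytes */
    uint32_t flags;
    uint32_t status;
    uint32_t cpu;
    int32_t  preempt_count;
};

/* Base of the stack that contains esp, where thread_info lives. */
static uint32_t thread_info_base(uint32_t esp)
{
    return esp & ~(THREAD_SIZE - 1);
}

/* Address of preempt_count for the stack that contains esp. */
static uint32_t preempt_count_addr(uint32_t esp)
{
    return thread_info_base(esp) +
           (uint32_t)offsetof(struct thread_info_model, preempt_count);
}
```

Under these assumptions the field is 20 bytes (0x14) into the structure, so for the %esp seen in the crash dump (0xd11e199c) the base is 0xd11e0000 and the count lives at 0xd11e0014.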

(53)

Unwinding the Stack

▪ Stack is partially corrupted from overflow

▪ Need to restore it to recoverable state

▪ Walk up stack from current location until we match a signature of a known good state

▪ Adjust ESP to good state, and return

[Diagram: stack with overflowed region, unwound to frame boundary]

(54)

Refresher: What Have We Achieved?

▪ Trigger the overflow, gain control of EIP

▪ Leverage ROP to mark softirq stack executable, jump into shellcode

▪ Search for intact ROSE frame on kernel heap, mark executable, jump into it

▪ Install kernel backdoor by hooking ICMP handler

(55)

Kernel Backdoors for Fun and Profit

(Insert “backdoor” joke)

(56)

What About That Backdoor Part?

▪ Whenever an ICMP packet is received, our hook is called

▪ Check for magic tag in ICMP header

▪ Two distinct types of packets

▫ “Install” packets contain userland shellcode

▫ “Trigger” packets cause shellcode to execute

(57)

Backdoor Strategy

▪ Problem: ICMP handler also runs in softirq context

▫ Want userland code execution

▪ Phase 1: transition to kernel-mode process context

▪ Phase 2: hijack userland control flow

(58)

Backdoor Phase 1

▪ Check for magic tag and packet type

▪ If “install” packet, copy userland payload into safe place (softirq stack)

Install userland payload → Hook system call → Continue execution

(59)

Transition to Process Context

▪ If “trigger” packet, need to transition to process context

▪ Easiest way: hook system call


(60)

Primer: System Calls

▪ Userland process invokes a system call (read, write, fork, etc.)

▪ Traditional mechanism is int 0x80 (more recently everything uses sysenter/syscall)

▪ Index into Interrupt Descriptor Table, check privileges

▪ Invokes handler specified by IDT (syscall entry point)

(61)

System Call Hijacking

▪ How to find system call table at runtime?

▫ sidt instruction retrieves IDT address

▫ Find handler for INT 0x80 (syscall)

▫ Scan function for byte pattern calling into syscall table

▪ Read-only syscall table

▫ More flipping write-protect bit in %cr0

▪ Store original syscall handler for later, write address of hook into syscall table

(62)

Carry On...

▪ Want working ICMP stack

▪ Call original ICMP handler


(63)

Backdoor Phase 2

▪ We've copied userland payload to kernel memory

▪ Some process comes along and calls our hooked system call...

▪ Need to hijack process for userland code execution

(64)

Only Root, Please

▪ Only interested in root processes

▪ How to verify?

▫ thread_info → task_struct → cred

▫ Unstable, annoying...

Check root privileges → Inject userland payload → Divert userland execution → Continue execution

(65)

System Calls from Kernel Mode?

▪ System calls are extremely useful abstractions

▫ Friendly interface, kernel does most of the work

▪ Poll: is it possible to call system calls via INT 0x80 from kernel mode?

▫ Tally your votes...

(66)

System Calls from Kernel Mode!

▪ Most system calls will work when called from kernel

▪ Stack switch only occurs on inter-PL interrupts

▫ Based on CPL vs. DPL of GDT descriptor

▫ Happens on int and iret

▪ When called from kernel mode, just an ordinary intra-PL interrupt

(67)

Exceptions (No Pun Intended)

▪ Doesn't work quite right with some system calls

▫ Some require pt_regs (per-thread register) structure

▫ Assumptions about state of stack at time of system call

▪ fork, execve, iopl, vm86old, sigreturn, clone, vm86, rt_sigreturn, sigaltstack, vfork

(68)

Checking for Root

▪ Easy: load %eax with 0x18 (getuid), INT 0x80

▪ Check %eax (return code) for 0

▪ If not zero, call original syscall handler for hooked function

▪ If zero, unhook syscall and continue payload

(69)

Lethal Injection

▪ Kernel stack contains pointer to saved userland %esp

▪ Copy userland payload from kernel memory to userland stack

(70)

Let it Run...

▪ Userland stack is non-executable (NX)

▪ Call mprotect syscall via INT 0x80 to mark userland stack executable

(71)

It's a Diversion!

▪ Need to redirect userland control flow

▪ Kernel stack contains pointer to saved userland %eip

▪ Give original saved %eip to userland shellcode for later

▪ Overwrite pointer with address of payload on userland stack

(72)

Keep on Running

▪ Want hijacked process to keep running

▪ Jump to original handler for hijacked system call


(73)

Userland Payloads

▪ Use your imagination!

▫ Connect-back root shells work just fine

▪ Payloads are prefixed with stub that keeps hijacked process running

▫ Fork new process

▫ Child runs shellcode

▫ Parent jumps to original saved %eip
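The stub's structure is the standard fork pattern: the child does the extra work while the parent carries on as if nothing happened. A minimal C sketch of that pattern follows. The function names are hypothetical, the "work" is an ordinary callback, and the parent here simply returns to its caller instead of jumping to a saved %eip.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork once. The child runs task() and exits with its return value.
 * The parent returns immediately with the child's pid (or -1 on failure)
 * and continues with whatever it was doing.
 */
static pid_t run_in_child(int (*task)(void))
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(task());
    return pid;
}

/* Demonstration helper: wait for a child and return its exit status. */
static int child_exit_status(pid_t pid)
{
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Placeholder task used for the demonstration. */
static int demo_task(void)
{
    return 7;
}
```

The wait helper exists only so the demonstration can observe the child. A stub that wants the parent to continue undisturbed would not block on the child.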

(74)

ROSE Exploitation Demo

(75)

Future Work

No, this isn't a perfect exploit.

(76)

Hard-Coding

▪ Advantages over signatures / fingerprinting

▫ Reliability vs. portability

▪ On PAE kernel, ROP gadgets seem unavoidable

▫ Minimize number of ROP gadgets

▫ Minimize hard-coding of other data structures

▪ On non-PAE kernel, situation is better

▫ Can survive with one JMP ESP (if you know saved EIP offset)

(77)

Future Work: Offense

▪ Remote fingerprinting of kernel

▫ Automatic generation of ROP gadgets

▪ Exploiting other packet families

▫ IrDA, Bluetooth, X.25?

▪ Finding that TCP/IP bug that breaks the Internet

(78)

Future Work: Defense

▪ Randomize kernel base at boot

▫ Prevents code reuse (e.g. ROP) remotely in absence of remote kernel memory disclosure

▪ Fuzz and audit networking protocols more rigorously

▪ Inline functions that alter page permissions directly (prevent easy ROP)

▪ Policies on preventing page permission modification after initialization

(79)

Thanks To...

▪ Ralf Baechle

▪ Nelson Elhage

▪ Kees Cook

▪ twiz, sgrakkyu

(80)

Questions?

E-mail: [email protected]

Twitter: @djrbliss

Company: http://www.vsecurity.com

Personal: http://www.vulnfactory.org

Exploit code: