[VulnWatch] Linux kernel i386 SMP page fault handler privilege escalation

From: Paul Starzetz (ihaquer_at_isec.pl)
Date: 01/12/05

    Date: Wed, 12 Jan 2005 13:22:57 +0100 (CET)
    To: vulnwatch@vulnwatch.org, <bugtraq@securityfocus.com>


    Synopsis: Linux kernel i386 SMP page fault handler privilege escalation
    Product: Linux kernel
    Version: 2.2 up to and including 2.2.27-rc1, 2.4 up to and including
               2.4.29-rc1, 2.6 up to and including 2.6.10
    Vendor: http://www.kernel.org/
    URL: http://isec.pl/vulnerabilities/isec-0022-pagefault.txt
    CVE: CAN-2005-0001
    Author: Paul Starzetz <ihaquer@isec.pl>
    Date: Jan 12, 2005


    A locally exploitable flaw has been found in the Linux page fault
    handler code that allows users to gain root privileges when running on
    a multiprocessor machine.


    The Linux kernel is the core software component of a Linux environment
    and is responsible for managing machine resources. One of the
    functions of an operating system kernel is the handling of virtual
    memory. On Linux, virtual memory is provided on demand when an
    application accesses virtual memory areas.

    One of the core components of the Linux VM subsystem is the page fault
    handler that is called if applications try to access virtual memory
    currently not physically mapped or not available in their address space.

    The page fault handler must correctly identify the type of the
    requested virtual memory access and take the appropriate action to
    allow or deny the application's VM request. Actions taken may also
    include a stack expansion if the access goes just below the
    application's actual stack limit.

    An exploitable race condition exists in the page fault handler if two
    concurrent threads sharing the same virtual memory space request stack
    expansion at the same time. It is only exploitable on multiprocessor
    machines (this includes systems with hyperthreading).


    For the i386 architecture, the vulnerable code resides in
    arch/i386/mm/fault.c in the kernel source tree:

    [186] down_read(&mm->mmap_sem);

           vma = find_vma(mm, address);
           if (!vma)
                  goto bad_area;
           if (vma->vm_start <= address)
                  goto good_area;
           if (!(vma->vm_flags & VM_GROWSDOWN))
                  goto bad_area;
           if (error_code & 4) {
                  /*
                   * accessing the stack below %esp is always a bug.
                   * The "+ 32" is there due to some instructions (like
                   * pusha) doing post-decrement on the stack and that
                   * doesn't show up until later..
                   */
    [*]           if (address + 32 < regs->esp)
                         goto bad_area;
           }
           if (expand_stack(vma, address))
                  goto bad_area;

    where the line number refers to kernel version 2.4.28.

    Since the page fault handler runs with the mmap_sem semaphore held for
    reading only, two concurrent threads may both enter the section after
    line 186.

    The checks following line 186 ensure that the VM request is valid and,
    in case it goes just below the actual stack limit [*], that the stack
    is expanded accordingly. On Linux the notion of a stack includes any
    VM_GROWSDOWN virtual memory area; that is, it need not be the
    process's actual stack.
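
    The order of the checks in the excerpt above can be restated as a
    small userspace function. This is only a sketch for following the
    logic: struct toy_vma, classify_fault(), the verdict names and the
    flag value are hypothetical stand-ins, not kernel definitions, and a
    failing expand_stack() (which also leads to bad_area in the excerpt)
    is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
#define TOY_PAGE_SIZE     4096u
#define VM_GROWSDOWN_FLAG 0x0100u  /* illustrative flag value (assumption) */
#define USER_MODE_BIT     4u       /* the bit tested by "error_code & 4" */

enum fault_verdict { FAULT_BAD_AREA, FAULT_GOOD_AREA, FAULT_EXPAND_STACK };

struct toy_vma {
    uint32_t vm_start;  /* lowest address covered by the area */
    uint32_t vm_flags;
};

/* Applies the checks in the same order as the excerpt; vma may be NULL. */
enum fault_verdict classify_fault(const struct toy_vma *vma, uint32_t address,
                                  uint32_t error_code, uint32_t esp)
{
    if (vma == NULL)
        return FAULT_BAD_AREA;

    /* Areas start on a page boundary; the sketch relies on that. */
    assert((vma->vm_start & (TOY_PAGE_SIZE - 1u)) == 0);

    if (vma->vm_start <= address)
        return FAULT_GOOD_AREA;
    if (!(vma->vm_flags & VM_GROWSDOWN_FLAG))
        return FAULT_BAD_AREA;

    /* Faults flagged by bit 4 that lie more than 32 bytes below %esp
     * are rejected; the 32-byte slack is the "+ 32" in the excerpt. */
    if ((error_code & USER_MODE_BIT) && address + 32 < esp)
        return FAULT_BAD_AREA;

    return FAULT_EXPAND_STACK;
}
```

    With an area starting at 0xbffff000 and %esp at the same address, a
    fault 16 bytes below the area yields FAULT_EXPAND_STACK, while a fault
    a full page below it yields FAULT_BAD_AREA.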

    The exploitable race condition scenario looks as follows:

    A. thread_1 accesses a VM_GROWSDOWN area just below its actual starting
    address, let's call it fault_1,

    B. thread_2 accesses the same area at address fault_2 where fault_2 +
    PAGE_SIZE <= fault_1, that is:

    [ NOPAGE ] [fault_1 ] [ VMA ] ---> higher addresses
    [fault_2 ] [ NOPAGE ] [ VMA ]

    where one [] bracket pair stands for a page frame in the application's
    page table.

    C. if thread_2 is slightly faster than thread_1, the following happens:

    [ PAGE2 ] [PAGE1 VMA ]

    that is, the stack is first expanded inside the expand_stack() function
    to cover fault_2; immediately afterwards, however, it is 'expanded' to
    cover only fault_1, since thread_1 has already passed the necessary
    checks. In other words, the process's page table now includes two page
    references (PTEs), but only one of them (page1) is covered by the
    virtual memory area descriptor. The race window is very small, but it
    is exploitable.
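
    The interleaving in steps A to C can be replayed deterministically in
    a toy model. The sketch below is a single-threaded simulation, not
    kernel code: struct toy_mm and all function names are hypothetical,
    and mapping the page is folded into the expansion step for brevity.
    It only shows how acting on a stale check leaves a mapped page below
    the area's start address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1u))

/* Hypothetical model: one grows-down area and the pages mapped for it. */
struct toy_mm {
    uint32_t vm_start;      /* lowest address the area descriptor covers */
    uint32_t mapped[4];     /* page addresses that received a PTE */
    unsigned mapped_count;
};

/* Check phase: the fault lies below the area, so expansion is requested. */
bool passes_checks(const struct toy_mm *mm, uint32_t address)
{
    return address < mm->vm_start;
}

/* Act phase without re-checking: vm_start is set from this fault alone. */
void expand_unguarded(struct toy_mm *mm, uint32_t address)
{
    assert(mm->mapped_count < 4);
    mm->vm_start = address & TOY_PAGE_MASK;
    mm->mapped[mm->mapped_count++] = address & TOY_PAGE_MASK;
}

/* Counts mapped pages that the area descriptor no longer covers. */
unsigned orphaned_pages(const struct toy_mm *mm)
{
    unsigned n = 0;
    for (unsigned i = 0; i < mm->mapped_count; i++)
        if (mm->mapped[i] < mm->vm_start)
            n++;
    return n;
}

/* Replays A-C: both faults pass the checks, then thread_2 acts first. */
unsigned replay_race(uint32_t area_start)
{
    struct toy_mm mm = { area_start, {0}, 0 };
    uint32_t fault_1 = area_start - TOY_PAGE_SIZE;  /* just below the area */
    uint32_t fault_2 = fault_1 - TOY_PAGE_SIZE;     /* fault_2 + PAGE_SIZE <= fault_1 */

    bool ok_1 = passes_checks(&mm, fault_1);
    bool ok_2 = passes_checks(&mm, fault_2);

    if (ok_2)
        expand_unguarded(&mm, fault_2);  /* thread_2 is slightly faster */
    if (ok_1)
        expand_unguarded(&mm, fault_1);  /* thread_1 acts on a stale check */

    return orphaned_pages(&mm);
}
```

    In this model replay_race(0xbffff000) reports one orphaned page, which
    corresponds to page2 in the diagram. When each fault is checked and
    handled in turn, with no overlap, no page is orphaned.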

    Once the reference to page2 is available in the page table, it can be
    freely read or written by both threads. It will also not be released to
    the virtual memory management on process termination. Similar techniques
    like in


    may be further used to inject these lost page frames into a setuid
    application in order to gain elevated privileges (due to kmod this is
    also possible without any executable setuid binaries).


    Unprivileged local users can gain elevated (root) privileges on SMP
    machines.


    Paul Starzetz <ihaquer@isec.pl> has identified the vulnerability and
    performed further research. RedHat reported that a customer also pointed
    out some problems with the page fault handler on SMP around 20.12.2004
    and that they had already included a patch for this vulnerability in
    the kernel-2.4.21-27.EL release; however, the bug did not make it to
    the security division.



    This document and all the information it contains are provided "as is",
    for educational purposes only, without warranty of any kind, whether
    express or implied.

    The authors reserve the right not to be responsible for the topicality,
    correctness, completeness or quality of the information provided in
    this document. Liability claims regarding damage caused by the use of
    any information provided, including any kind of information which is
    incomplete or incorrect, will therefore be rejected.


    Proof-of-concept code will not be disclosed at this time. Special
    thanks go to OSDL and Marcelo Tosatti for providing an SMP testbed.

    --
    Paul Starzetz
    iSEC Security Research

