Announcing Userland Exec

From: the grugq (grugq_at_hcunix.net)
Date: 01/01/04

  • Next message: Thomas Walpuski: "multiple payload handling flaws in isakmpd, again"
    Date: Thu, 01 Jan 2004 19:02:26 +0000
    To: bugtraq@securityfocus.com, full-disclosure@lists.netsys.com
    
    
    

    Hey,

    This is an implementation of userland exec, that is, code which replaces
    the current process image with a new one. The accompaning paper explains
    the design and implementation of this code. The full src code is also
    included.

    peace,

    --gq

    
    

            The Design and Implementation of Userland Exec

                            by the grugq

    Abstract:

    This paper examines a mechanism to execute a binary without the use of the
    syscall execve(). The design for this mechanism is simple: clean up the
    address space; check for, and load, the dynamic linker; load the binary; set
    up the stack; locate the entry point, and transfer control of execution. One
    implemenation of this mechanism is used to illustrate this design.

    Introduction:

    Userland exec replaces the existing process image within the current address
    space with a new one. In this, userland exec mimics the behavior of the
    system call execve()*. However, because it operates in userland, the kernel
    process structures which describe the process image remain unchanged. This
    means that the process name reported by system utilities will be the old
    process name, etc.

    The ability to load a new process image without the direct aid of the kernel
    is important in many scenarios. For example: a program (e.g. shellcode) could
    load a binary off the wire and execute it without first creating a copy on
    disk; or, a program could extract a binary from an encrypted data store and
    execute it without creating a plain text image on the disk. Userland exec is
    useful for any situation where it is preferable not to create a file on the
    disk when executing a program.

    This paper will describe the design and implementation of the userland exec
    code. First it will discuss how execve() creates a new process image, and what
    this process image looks like (including the stack layout). Based on this
    knowledge the design of the userland exec code will be examined, using the
    userland exec implementation to illustrate concepts.

    *) This implementation of userland exec is specific to ELF binaries on i386
    Linux systems, however the methodology outlined is universal across Unix-like
    systems.

    Background:

    When a Unix program wants to execute another program it will use the execve()
    system call to load the new program as a process image (creating the new
    process via fork() is ignored here). The man page for the exec family of
    function calls (execl() and the other wrappers for execve()) describes what
    happens: "[t]he exec family of functions replaces the current process image
    with a new process image." A more complete picture of what happens when a
    process image is replaced can be found in the execve() man page:

           execve() does not return on success, and the text, data,
           bss, and stack of the calling process are overwritten by
           that [sic] of the program loaded.

    Clearly, the program which calls execve() is taken out of memory and the new
    program loaded. When this happens, the .text and .data segments as well as the
    heap and the stack are replaced. Any implementation of userland exec would
    have to remove the old process' text, data and heap, and reset the stack.

    To design a userland exec implementation, a more detailed description of how
    execve() operates is required. This more detailed description of how a new
    process image is created from an ELF binary would be:

    First, execve() cleans the address space of the current process by unmapping
    all of the pages of memory which form the existing image. If the executable is
    dynamically linked -- that is, if it uses run time libraries provided by the
    system -- the dynamic linker is loaded into memory. The main ELF executable is
    then mapped into the process address space. The stack is initialized with the
    environmental variables and the other parameters to main(). The stack is also
    initialized with arguments for the dynamic linker to allow it to locate the
    binary in memory. Finally, the kernel determines the correct entry point:
    either the dynamic linker, if it is loaded, or the main binary, if it is
    not.The kernel then transfers execution to this entry point.

    This more detailed description provides a blueprint for the userland exec
    implementation. However, some specific details still need to be clarified:
    in particular the exact layout of the stack just after the image is created.
    The diagram below roughly illustrates this stack layout.

              -----------------------------------------
                            [ 0 ] <-- top of the stack
                            [ envp strings ]
                            [ 0 ]
                            [ argv strings ]
                            [ 0 ]
                            [ auxv ]
                            [ 0 ]
                            [ envp ]
                            [ 0 ]
                            [ argv ]
                            [ argc ] <-- stack pointer
              -----------------------------------------

    At the top of the stack are the environmental and argument strings. The bottom
    of the stack starts with the argument count, then come the argument
    pointerarray and the environmental pointer array. In the middle is the
    Elf_aux vector. This array of data structures provides information to the
    dynamic linker about the process image, e.g. the location of the program
    header table.

    The dynamic linker is responsible for relocating the ELF binary, if required,
    as well as loading all the necessary libraries and performing run time symbol
    resolution. For more information on dynamic linkers see [Levine 1999],
    [grugq 2001] and [grugq & scut 2001].

    Design:

    With a clear understanding of how execve() creates a new process image, it is
    possible to enumerate a list of steps that a userland implementation must
    follow. That list is:

    1. Clean out the address space
    2. If the binary is dynamically linked, load the dynamic linker
    3. Load the binary
    4. Initialize the stack
    5. Determine the entry point (i.e. the dynamic linker or the main executable)
    6. Transfer execution to the entry point

    Note: the second and third step are actually interchangeable.

    Implementation:

    Step 0, preserving arguments to main()

    When userland exec is invoked, it accepts arguments to the main() function of
    the new program. While building the new process image, userland exec will
    reset the stack and potentially damage those parameters. For this reason, the
    first thing that userland exec must do is preserve these arguments for later
    use. In this implementation, the arguments are stored in tables allocated with
    mmap(). The contents of these tables can then be copied back into the stack
    when it is being initialized for the new process image.

    Step 1, clean out the address space

    To prepare the address space for a new process image userland exec needs to
    clean out only certain address ranges, not the entire old image. The address
    range of primary importance is the one to which the new program binary has
    been relocated. On i386 Linux systems, this address is usually 0x08048000. The
    .text segment, and the .data segment, of the current process's binary will be
    mapped to this location. Following the .data segment will be the .bss, and
    following that, the heap. Userland exec must therefore map the old executable
    down, in addition to mapping down the old heap. Depending on the state of the
    program, the heap can be of varying size.

    The userland exec implementation must determine where the memory range which
    comprises the .text, .data and heap, begins and ends so that it can map the
    range down. There are several different ways of determining this range. In
    some instances, for a specific target, it is possible to hard-code the values.
    For a more generic approach, one approach might be to register a signal
    handler and start probing the address space, e.g. to determine the end of the
    heap. Using the hard coded start address of 0x08048000 and the probed heap
    size, userland exec could map down this range.

    In this implementation, userland exec uses the process maps file maintained by
    the kernel to determine the location and size of the process' .text and .data
    segments and the heap. Convieniently, under Linux, the first three lines of
    the /proc/self/maps file contain the .text segment, the .data segment and the
    heap. By mapping down the first three memory segments listed in the maps file
    userland exec can clean the range required for the new .text and .data
    segments, and the new heap.

    Step 2, check for, and load, the dynamic linker

    Determining whether a given ELF binary requires a dynamic linker, or not, is
    incredibly straightforward. The execve() man pages describe the exact method
    used by the kernel (and by userland exec) to check for a dynamic linker.

           If the executable is a dynamically-linked ELF executable,
           the interpreter named in the PT_INTERP segment is used to
           load the needed shared libraries.

    The dynamic linker is only required for those processes which utilize shared
    libraries from the system. If a binary requires dynamic linking, the path
    to the desired dynamic linker will be stored in a special program header
    segment: PT_INTERP. By searching the program header table of the new ELF
    image, userland exec can determine whether a dynamic linker is required, and,
    simultaneously, which dynamic linker to load. Loading the linker after this is
    trivial, requiring only an understanding of how ELF binaries are loaded into
    memory. The next section will discuss how this is done.

    Step 3, load the main binary

    The principle for loading an ELF binary is the same regardless of whether the
    binary is loaded from a memory buffer or off the disk. For each PT_LOAD
    segment in the program header table, the loader ensures that the ELF image
    from p_offset for p_filesz bytes is copied into a memory buffer (with p_flags
    permissions) at p_vaddr of p_memsz (aligned to p_align) bytes. That is, each
    PT_LOAD segment essentially says "this part of the ELF goes into this part of
    the memory image".

    The userland exec implementation scans the program header table and determines
    the total amount of memory the loaded ELF will occupy. This simply means
    aligning the p_memsz values of all the PT_LOAD segments using p_align, then
    adding them together. The userland exec code then allocates a memory range
    starting from the first PT_LOAD segment's p_vaddr for the total length
    previously calculated. With the memory now reserved, all that remains is to
    copy the PT_LOAD segments into their appropriate ranges and then set the
    segment permissions based on their p_flags.

    Step 4, initialize the stack

    With the main binary, and any dynamic linker, loaded into memory, the last
    major task which still remains is setting up the stack. The stack, at program
    start up, has a strict format examined briefly above. To initialize the
    stack, userland exec must reset the stack pointer. Performing this is simply
    a matter of calculating the size of the stack contents (the argv strings,
    environmental strings, auxv, etc. etc.) and subtracting this value from the
    stack base. With the space allocated for the new parameters, userland exec can
    begin creating the stack layout.

    To create the stack layout the userland exec implementation must extract the
    saved parameters for the main() function (argc, argv and envp) and copy them
    to the stack. This can trivially be accomplished using the saved parameters
    from step 0 and some simple pointer arithmetic. The Elf_Aux vector is the only
    data structure which requires some explanation. This vector contains data
    which the dynamic linker needs in order to integrate with the process image
    correctly. There are only a few key pieces of information required by the
    dynamic linker to perform its task. These are: its own load address (AT_BASE);
    the location of the program header table (AT_PHDR), as well as the number of
    program headers (AT_PHNUM); the size of a page of memory (AT_PAGESZ); any
    flags to alter its run time behavior (AT_FLAGS), and the entry point for the
    main executable (AT_ENTRY).

    Step 5, determine entry point

    Locating the correct entry point for the new process image is very straight
    forward. If a dynamic linker was loaded, then the entry point is the load
    address of the linker plus the e_entry value, otherwise it is the main ELF's
    e_entry.

    Step 6, transfer control of execution

    The final step is to start the new process image running by transferring
    control of execution to the entry point located during step 5.

    These six steps are all that are required to load an arbitrary ELF binary into
    an address space and execute it.

    Conclusion:

    It has been shown that the execve() system call can be mimicked in userland
    with a minimum of difficulty. Creating a new process image is a simple matter
    of: cleaning out the address space; checking for, and loading, the dynamic
    linker; loading the binary; initializing the stack; determining the entry
    point and, transferring control of execution. By following these six steps,a
    new process image can be created and run. Using this implementation as a start
    point, it is possible to create both smaller, and also more elaborate,userland
    exec implementations.

    History of userland exec:

    The author first developed a userland exec implementation in early 2002. The
    initial proof of concept implementation demonstrated that it was indeed
    possible to mimic execve() in userland. However, several limitations in the
    implementation made it unfit for public release. Recently, some people have
    posted incredibly crude mechanisms for executing a new binary in an address
    space occupied by another process image. Given the severe limitations of this
    technique, and the request of a friend for the old proof of concept
    implementation, the author felt compelled to implement a fully functional
    version of userland exec. Design and implementation took less than 48 hours
    to complete.

    Acknowledgments:

    As always, the creation of any work is not possible without the help of other
    individuals. Here, then, is a list of those who've been involved in some
    capacity or another in making userland exec: mammon_; gera; duke, and Grendel,
    PhD. Additionally thanks to those ppl who made .sg so enjoyable Halvar &
    Evelyn, SK, Saumil, Charl and Dave Aitel. And all the ppl who came to my talk.

    Bibliography:

    http://www.phrack.org/phrack/58/p58-0x05 grugq & scut, 2001
    http://downloads.securityfocus.com/library/subversiveld.pdf grugq, 2001
    http://www.iecc.com/linker/ levine, 1999

    
    



  • Next message: Thomas Walpuski: "multiple payload handling flaws in isakmpd, again"

    Relevant Pages

    • Announcing Userland Exec
      ... up the stack; locate the entry point, ... Userland exec replaces the existing process image within the current address ... system -- the dynamic linker is loaded into memory. ...
      (Full-Disclosure)
    • [Full-Disclosure] Announcing Userland Exec
      ... up the stack; locate the entry point, ... Userland exec replaces the existing process image within the current address ... system -- the dynamic linker is loaded into memory. ...
      (Full-Disclosure)