Re: Linux kernels DoSable by file-max limit

From: Andrea Arcangeli (andrea@suse.de)
Date: 07/10/02


Date: Wed, 10 Jul 2002 23:07:41 +0200
From: Andrea Arcangeli <andrea@suse.de>
To: Paul Starzetz <paul@starzetz.de>

On Sun, Jul 07, 2002 at 10:54:44PM +0200, Paul Starzetz wrote:
> Hi,
>
> the recently mentioned problem in BSD kernels concerning the global
> limit of open files seems to be present in the Linux-kernel too. However
> as mentioned in the advisory about the BSD specific problem the Linux
> kernel keeps some additional file slots reserved for the root user. This
> code can be found in the fs/file_table.c source file (2.4.18):
>
> struct file * get_empty_filp(void)
> {
> static int old_max = 0;
> struct file * f;
>
> file_list_lock();
> if (files_stat.nr_free_files > NR_RESERVED_FILES) {
> used_one:
> f = list_entry(free_list.next, struct file, f_list);
>
> [...]
>
> /*
> * Use a reserved one if we're the superuser
> */
> [*] if (files_stat.nr_free_files && !current->euid)
> goto used_one;
>
>
> Grepping the source code (2.4.18) reveals that the limit is pretty low:
>
> ./include/linux/fs.h:#define NR_RESERVED_FILES 10 /* reserved for root */

Well, that reservation isn't really a security feature in the first
place. I mean, there's nothing to exploit: it's more of a hack that
tries to give root a better chance of keeping a usable machine after
you hit file-max, but it's not guaranteed to work at all, regardless of
malicious or non-malicious workloads. Linux never enforces that
nr_free_files stays at a level >= NR_RESERVED_FILES. It just tries to
do that lazily, so it's not guaranteed that you will have any
nr_free_files when you happen to need them.

For example, if you only keep opening files since boot and never
execute a single close() or exit() syscall, nr_free_files never
becomes non-zero. So no matter who you are (root or not), you will
never pass the test "if (files_stat.nr_free_files && !current->euid)",
because nr_free_files will always be zero.

Furthermore, that part of the VFS file allocation management needs a
rewrite (I hope it will happen in 2.5), and file-max should go away
like inode-max went away in 2.3. At the moment released files have no
way to be freed dynamically, and that's not good. There should be a
proper slab cache, and fput should kmem_cache_free the file instead of
putting it onto the unshrinkable "file_table.c::free_list". But this
is more of a linux-kernel topic...

Once it's possible to shrink the released files, the file-max limit
can go away (we need it now, or else all the RAM could be pinned in
this non-shrinkable "free_list"). Then if you allocate all the RAM into
files you will run the machine OOM at some point. That moves the DoS
issue elsewhere, into the memory management area, where it becomes a
generic problem, no longer specific to file allocations. After you hit
the OOM point, even if you could allocate the file from a
root-reserved file pool, you still may not be able to allocate the
dentry and the inode.

Anyway, regardless of the possible memory management OOM DoSes (when
running out of RAM), removing file-max is a good thing because it makes
Linux much more usable: if you need lots of files during a temporary
spike of load, you won't be left with a huge leak of files hanging
around, because the VM will shrink them as you need more RAM later. And
if you hit OOM, it's very likely (though not guaranteed, considering
the different algorithms for handling OOM conditions, some deadlock
prone, some not) that the offending task will be killed too, rendering
any malicious attack much less reproducible than now.

> [..]
> Exploitability to get uid=0 has not been confirmed yet but seems possible.

If that's the case, it's a userspace bug in the suid apps that you're
executing; it's certainly not a kernel issue.

Andrea


