
Using rlimit (And Why You Should)

I’ve been going through some old notes and came across a reminder of setrlimit(2).

This is a C system call that allows an application to specify resource limitations on a number of important parameters:

  • RLIMIT_AS – The maximum size of the process’s virtual memory (address space) in bytes.
  • RLIMIT_CORE – Maximum size of core file.
  • RLIMIT_CPU – CPU time limit in seconds.
  • RLIMIT_DATA – The maximum size of the process’s data segment (initialized data, uninitialized data, and heap).
  • RLIMIT_FSIZE – The maximum size of files that the process may create.
  • RLIMIT_MEMLOCK – The maximum number of bytes of memory that may be locked into RAM.
  • RLIMIT_MSGQUEUE – Specifies the limit on the number of bytes that can be allocated for POSIX message queues for the real user ID of the calling process.
  • RLIMIT_NICE – Specifies a ceiling to which the process’s nice value can be raised using setpriority(2) or nice(2).
  • RLIMIT_NOFILE – Specifies a value one greater than the maximum file descriptor number that can be opened by this process.
  • RLIMIT_NPROC – The maximum number of processes (or, more precisely on Linux, threads) that can be created for the real user ID of the calling process.
  • RLIMIT_RSS – Specifies the limit (in pages) of the process’s resident set (the number of virtual pages resident in RAM).
  • RLIMIT_RTPRIO – Specifies a ceiling on the real-time priority that may be set for this process using sched_setscheduler(2) and sched_setparam(2).
  • RLIMIT_RTTIME – Specifies a limit (in microseconds) on the amount of CPU time that a process scheduled under a real-time scheduling policy may consume without making a blocking system call.
  • RLIMIT_SIGPENDING – Specifies the limit on the number of signals that may be queued for the real user ID of the calling process.
  • RLIMIT_STACK – The maximum size of the process stack, in bytes.

The default limits for all programs are specified in configuration files (/etc/security/limits.conf and the /etc/security/limits.d directory), or can be set for an individual shell and its child processes via the ulimit shell builtin. Under Linux the current resource limits for a process are visible in /proc/[pid]/limits.

The limits can also be set programmatically, via setrlimit(2). Any process can give itself more restrictive limits. Any privileged process (running as root or with the correct capability) can give itself more permissive limits.
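As a minimal sketch of the programmatic approach (the function name is just illustrative), a process can read its current limit with getrlimit(2) and tighten the soft limit with setrlimit(2); lowering a limit requires no special privilege:

#include <sys/resource.h>

/* Tighten our own CPU-time soft limit to 60 seconds; leave the hard limit alone.
   The process receives SIGXCPU once the soft limit is exceeded. */
int limit_cpu_seconds(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CPU, &rl) == -1)
        return -1;

    rl.rlim_cur = 60;     /* soft limit in seconds; rl.rlim_max is left as-is */
    return setrlimit(RLIMIT_CPU, &rl);
}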

I believe most systems default to unlimited or very high limits and it is the responsibility of the application to specify tighter limits. Better secured systems will do the reverse – they’ll have much tighter restrictions and use a privileged loader to grant more resources to specific programs.

Why do we care?

Security in depth.

First, people make mistakes. Setting reasonable limits keeps a runaway process from taking down the system.

Second, attackers will take advantage of any opportunity they can find. Buffer overflows aren’t an abstract concern: they are real and often allow an attacker to execute arbitrary code. Reasonable limits may be enough to sharply curtail the damage caused by an exploit.

Here are some concrete examples:

First, setting RLIMIT_NPROC to zero means that the process cannot fork/exec a new process, so an attacker cannot use it to execute arbitrary code as the current user. (Note: the man page suggests this may limit the total number of processes for the real user ID, not just this process and its children. This should be double-checked.) It also prevents a more subtle attack where a process is repeatedly forked until a desired PID is acquired. PIDs should be unique, but apparently some kernels now support a larger PID space than the traditional pid_t, which means legacy system calls may be ambiguous.
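A sketch of that first case, assuming the process never legitimately needs to create children (fork(2) should then fail with EAGAIN, subject to the per-user caveat above):

#include <sys/resource.h>

/* Forbid creating any new processes from this point on. */
int forbid_fork(void)
{
    struct rlimit rl = { .rlim_cur = 0, .rlim_max = 0 };
    return setrlimit(RLIMIT_NPROC, &rl);
}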

Second, setting RLIMIT_AS, RLIMIT_DATA, and RLIMIT_MEMLOCK to reasonable values prevents a process from forcing the system to thrash by limiting available memory.
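A sketch of the second case, with a purely illustrative 512 MB cap on the address space; the right value is application-specific:

#include <sys/resource.h>

/* Cap total virtual memory; malloc()/mmap() calls fail once the cap is reached. */
int cap_address_space(rlim_t bytes)
{
    struct rlimit rl = { .rlim_cur = bytes, .rlim_max = bytes };
    return setrlimit(RLIMIT_AS, &rl);
}

/* e.g. cap_address_space(512UL * 1024 * 1024); */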

Third, setting RLIMIT_CORE to a reasonable value (or disabling core dumps entirely) has historically been used to prevent denial-of-service attacks that fill the disk with core dumps. Today core dumps are often disabled to ensure that sensitive information such as encryption keys is not inadvertently written to disk, where an attacker could later retrieve it. Sensitive memory should also be locked into RAM with mlock(2) to prevent it from being written to the swap disk.
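A sketch of the third case: core dumps disabled outright, plus mlock(2) to pin a buffer of key material in RAM (this counts against RLIMIT_MEMLOCK):

#include <sys/mman.h>
#include <sys/resource.h>

/* No core files, ever: a crash cannot spill sensitive memory to disk. */
int disable_core_dumps(void)
{
    struct rlimit rl = { .rlim_cur = 0, .rlim_max = 0 };
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Pin a buffer holding secrets so it is never written to swap. */
int pin_secret(void *buf, size_t len)
{
    return mlock(buf, len);
}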

What about Java?

Does this impact Java?

Yes.

The standard classloader maintains an open ‘file handle’ for every loaded class. This can be thousands of open file handles for application servers. I’ve seen real-world failures that were ultimately tracked down to hitting the RLIMIT_NOFILE limit.

There are three solutions. The first is to increase the number of permitted open files for everyone via the limits.conf file. This is undesirable – we want applications and users to have enough resources to do their job but not much more.

The second is to increase the number of permitted open files for just the developers and application servers. This is better than the first option but can still let a rogue process cause a lot of damage.

The third is to write a simple launcher app that sets a higher limit before doing an exec() to launch the application server or developer’s IDE. This ensures that only the authorized applications get the additional resources.
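A minimal sketch of such a launcher, assuming it runs with the privilege needed to raise the hard limit (the limit value is only a placeholder):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/resource.h>

int main(int argc, char *argv[])
{
    /* Illustrative value; raising the hard limit requires root or CAP_SYS_RESOURCE. */
    struct rlimit rl = { .rlim_cur = 8192, .rlim_max = 8192 };

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <program> [args...]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    if (setrlimit(RLIMIT_NOFILE, &rl) == -1) {
        perror("setrlimit");
        exit(EXIT_FAILURE);
    }

    /* Resource limits are preserved across exec(), so the application
       server or IDE inherits the higher limit. */
    execvp(argv[1], &argv[1]);
    perror("execvp");   /* only reached if exec failed */
    exit(EXIT_FAILURE);
}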

(Java’s SecurityManager can also be used to limit resource usage but that’s beyond the scope of this discussion.)

Sample code

Finally, some sample code from the prlimit(2) man page. The setrlimit version is similar.

#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/resource.h>

#define errExit(msg)	do { perror(msg);  exit(EXIT_FAILURE); } while (0)

int
main(int argc, char *argv[])
{
    struct rlimit old, new;
    struct rlimit *newp;
    pid_t pid;

    if (!(argc == 2 || argc == 4)) {
        fprintf(stderr, "Usage: %s <pid> [<new-soft-limit> <new-hard-limit>]\n",
                argv[0]);
        exit(EXIT_FAILURE);
    }

    pid = atoi(argv[1]);        /* PID of target process */

    newp = NULL;
    if (argc == 4) {
        new.rlim_cur = atoi(argv[2]);
        new.rlim_max = atoi(argv[3]);
        newp = &new;
    }

    /* Set CPU time limit of target process; retrieve and display previous limit */
    if (prlimit(pid, RLIMIT_CPU, newp, &old) == -1)
        errExit("prlimit-1");
    printf("Previous limits: soft=%lld; hard=%lld\n",
           (long long) old.rlim_cur, (long long) old.rlim_max);

    /* Retrieve and display new CPU time limit */
    if (prlimit(pid, RLIMIT_CPU, NULL, &old) == -1)
        errExit("prlimit-2");
    printf("New limits: soft=%lld; hard=%lld\n",
           (long long) old.rlim_cur, (long long) old.rlim_max);

    exit(EXIT_SUCCESS);
}

Usage in practice

It should not be hard to write a function that sets these limits as part of program startup, perhaps as the final step in initialization but before reading anything provided by the user. In many cases we can take the existing resource usage and add just enough headroom to cover the user’s request, e.g., two additional file handles, one for input and one for output.
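A sketch of that idea for file descriptors, assuming Linux (/proc/self/fd) and a hypothetical helper name:

#include <dirent.h>
#include <sys/resource.h>

/* Count our currently open descriptors via /proc/self/fd, then set the
   soft limit to that count plus a little headroom (e.g. 2: input and output). */
int tighten_nofile(rlim_t headroom)
{
    DIR *d = opendir("/proc/self/fd");
    rlim_t open_fds = 0;
    struct rlimit rl;

    if (d == NULL)
        return -1;
    while (readdir(d) != NULL)
        open_fds++;     /* rough count: includes ".", ".." and the DIR's own fd */
    closedir(d);

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    rl.rlim_cur = open_fds + headroom;
    return setrlimit(RLIMIT_NOFILE, &rl);
}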

In other cases it’s harder to identify good limits but there are three approaches.

The first is to focus on what’s critical. E.g., many applications know that they should never launch a subprocess, so RLIMIT_NPROC can be set to zero. (Again, after verifying that this limits processes under the current process, not all processes for the user.) They know that they should never need to open more than a handful of additional files, so RLIMIT_NOFILE can be set to allow a few more open files but no more. Even these modest restrictions can go a long way towards limiting damage.

The second is to simply pick some large value that you are sure will be adequate for limits on memory or processor usage. Maybe 100 MB is an order of magnitude too large – but it’s an order of magnitude smaller than it was before. This approach can be especially useful for subprocesses in a boss/worker architecture where the amount of resources required by any individual worker can be well-estimated.

The final approach requires more work but will give you the best numbers. During development you’ll add a little bit of additional scaffolding:

  • Run the program as setuid root but immediately change the effective user to an unprivileged user.
  • Set a high hard limit and low soft limit.
  • Check whether the soft limit is hit on every system call. (You should already be checking for errors.)
  • On soft limit hits, change the effective user to root, bump the soft limit, restore the original effective user, and retry the operation (see the sketch after this list).
  • Log it every time you must bump the soft limit. Variant: have an external process poll the /proc/[pid]/limits file.
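A rough sketch of that retry step for open files; the effective-user juggling is elided and the increment is arbitrary:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/resource.h>

/* Open a file; if the soft RLIMIT_NOFILE limit is hit, log it, bump the
   soft limit (up to the hard limit), and retry once. */
int open_with_bump(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1 && errno == EMFILE) {      /* per-process fd limit reached */
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur < rl.rlim_max) {
            rlim_t bumped = rl.rlim_cur + 16;       /* arbitrary increment */
            if (bumped > rl.rlim_max)
                bumped = rl.rlim_max;
            fprintf(stderr, "bumping RLIMIT_NOFILE: %lld -> %lld\n",
                    (long long) rl.rlim_cur, (long long) bumped);
            rl.rlim_cur = bumped;
            if (setrlimit(RLIMIT_NOFILE, &rl) == 0)
                fd = open(path, O_RDONLY);          /* retry once */
        }
    }
    return fd;
}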

With good functional and acceptance tests you should have a solid idea about the resources required by the program. You’ll still want to be generous with the final resource limits but it should give you a good ‘order of magnitude’ estimate for what you need, e.g., 10 MB vs 2 GB.

On a final note: disk quotas

We’ve been discussing resource limits on an individual process, but sometimes the problem is resource exhaustion over time. Specifically disk usage: an application could inadvertently cause a denial of service by filling the disk.

There’s an easy solution to this – enabling disk quotas. We normally think of disk quotas as being used to make sure users on a multi-user system play well together but they can also be used as a security measure to constrain compromised servers.

Reference: Using rlimit (And Why You Should) from our JCG partner Bear Giles at the Invariant Properties blog.