not knowing the /proc filesystem


Overview

Edit: this post was discussed on Hacker News. Some of the comments contained valuable reviews of the C code listed below. I’ve appended some edits after the original listing.

I’ve been working quite a bit with files and file systems on Linux recently, mostly from the vantage point of either a shell or a python script. I wanted to practice coding against the Linux API, so I cracked open my copy of The Linux Programming Interface to see if I could find some useful info. As usual, I found myself on an enjoyable tangent learning about file system and process fundamentals.

I developed a fairly simple goal for a small project over the weekend: pick some essential aspect of Linux file systems and learn a bit about it. Specifically, try to use the Linux API directly or at least understand what parts of the API are being used by whatever script is being used to get the job done.

The task I eventually settled on is described as a simple exercise in the book mentioned above. Write a program that prints a list of all processes running on the system that are associated with a specific user. Print the pid and name of the program being run. It’s possible to glean all of this information from the /proc file system. The file /proc/$pid/status holds the info about uid and program name.

The general plan:

  1. Write a quick script in python to do the job
  2. Run the script using strace to see what system calls are being made
  3. Write a program in C to do the same job but use the API directly
  4. Run the program through strace and compare the footprint
  5. Profile both programs

Python script

The initial script is very straightforward. It relies on the os module and the pathlib.Path class to get the uid and list the pids of all running processes. It then opens the /proc/$pid/status file for each process and reads two lines: the first to get the name of the program being run and the ninth to get the uid.

#!/usr/bin/python
import os
from pathlib import Path

__doc__ ="""procuall.py: enumerate all processes run by the user who runs the
script.

usage: python procuall.py
"""

def search_proc():
    """for each process under /proc
       parse the status file and try to match uid
       if there is a match, the process belongs to the user"""
    uid = os.getuid()
    all = Path("/proc")
    procs = all.glob("[0-9]*")
    processes = [p for p in procs]
    info = {}
    for proc in processes:
        status = proc.joinpath("status")
        with open(status) as f:
            lines = f.readlines()
            cmd = lines[0].split("\t")
            name = cmd[1].strip("\n")
            uidl = lines[8].split("\t")
            puid = uidl[1].strip("\n")
            if int(puid) == uid:
                info.update({name: proc.stem})
    return info


def main():
    """Get info about running processes
    output the command being run and the pid"""
    info = search_proc()
    for k, v in info.items():
        print(f"cmd: {k}, pid: {v}")


if __name__ == '__main__':
    main()

I found it quite beneficial to sketch out a working program like this. It didn’t take more than a couple of minutes to get it working and it made it possible to think of ways to improve the program for the second iteration.
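One possible refinement for a second iteration, as a rough sketch rather than the script actually used here: look up the status fields by key instead of by line position, and collect (pid, name) pairs in a list so that several processes with the same name are not collapsed into one dictionary entry. The function names parse_status, real_uid and owned_by are made up for this illustration, and the sample text is a shortened, hand-written status file.

```python
def parse_status(text):
    """Turn the contents of a /proc/$pid/status file into a dict.

    Each line looks like "Key:\tvalue", so split on the first colon.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def real_uid(fields):
    """The Uid line lists several uids; the first one is the real uid."""
    return int(fields["Uid"].split()[0])


def owned_by(uid, statuses):
    """statuses maps pid -> raw status text; return sorted (pid, name) pairs."""
    pairs = []
    for pid, text in statuses.items():
        fields = parse_status(text)
        if real_uid(fields) == uid:
            pairs.append((pid, fields["Name"]))
    return sorted(pairs)


if __name__ == "__main__":
    # Hand-written sample, not captured from a real system.
    sample = "Name:\tnvim\nUmask:\t0022\nState:\tS (sleeping)\nUid:\t1000\t1000\t1000\t1000\n"
    for pid, name in owned_by(1000, {306559: sample, 306560: sample}):
        print(f"cmd: {name}, pid: {pid}")
```

Keying on the field name means the lookup keeps working if the position of the Uid line in the status file shifts.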

Strace output

The first thing that I went looking for in the strace output was which system calls get used to list the contents of a directory. In the python code, Path.glob("[0-9]*") is used to list the numeric entries in the /proc directory. The listing happens over the course of three system calls, lines 912-914 below. Note that these calls are lower level than anything we would normally use directly from a C program, where opendir and readdir wrap them. The call to openat on line 912 opens the directory and returns its file descriptor. This is then passed as the first argument to the four following calls, lines 913-916 below: newfstatat fetches the stat structure for the directory, getdents64 reads the entries, and close releases the descriptor. The hex value is the address of the user-space buffer that the kernel fills with directory entries. It is passed to getdents64 along with the size of that buffer in bytes (32768 here), not the st_size attribute.

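One curious part of the system calls generated by the python code is that it seems to call newfstatat with an empty string and then getdents64 a second time. As far as I can tell, the empty path combined with AT_EMPTY_PATH simply means “operate on the file descriptor itself”, which makes the call equivalent to fstat. The second getdents64 call returns 0, which is how the kernel signals the end of the directory.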

One moderately interesting thing that occurred to me after writing the initial python program was that it’s not necessary to read the uid line of the /proc/$PID/status file. An fstat call using the file descriptor is enough to glean this info.

# output of strace python procuall.py
...
910 getuid()                                = 1000
911 newfstatat(AT_FDCWD, "/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}, 0) = 0
912 openat(AT_FDCWD, "/proc", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
913 newfstatat(3, "", {st_mode=S_IFDIR|0555, st_size=0, ...}, AT_EMPTY_PATH) = 0
914 getdents64(3, 0x560cbd19e900 /* 416 entries */, 32768) = 11232
915 getdents64(3, 0x560cbd19e900 /* 0 entries */, 32768) = 0
916 close(3)                                = 0
...
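The directory listing and the stat-based ownership check can be combined in a few lines of python. This is a minimal sketch under some assumptions: entries_owned_by is a name invented for this example, os.scandir is (as far as I know) built on opendir/readdir and so should produce a similar openat/getdents64/close sequence under strace, and the owner of a /proc/$pid entry normally reflects the effective uid of the process rather than the real uid read from the status file.

```python
import os


def entries_owned_by(uid, root="/proc"):
    """List the all-digit entries under root whose owner matches uid."""
    owned = []
    with os.scandir(root) as it:  # opens the directory, closes it on exit
        for entry in it:
            if not (entry.name.isascii() and entry.name.isdigit()):
                continue  # skip non-pid entries such as "net" or "self"
            try:
                if entry.stat().st_uid == uid:
                    owned.append(entry.name)
            except OSError:
                continue  # the entry vanished between listing and stat
    return sorted(owned, key=int)


if __name__ == "__main__":
    for pid in entries_owned_by(os.getuid()):
        print(pid)
```

The root parameter exists only so the function can be pointed at an ordinary directory for testing.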

It was quite clear from the strace output that the overhead of the interpreter costs practically the same as running the program itself. Strace generated over 3000 lines of output. The C version used about half the number of system calls. The really major difference is the absence of the virtual machine.

# output of strace ./procuall adam
...
97 openat(AT_FDCWD, "/proc", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
98 newfstatat(3, "", {st_mode=S_IFDIR|0555, st_size=0, ...}, AT_EMPTY_PATH) = 0
99 getdents64(3, 0x7d3db0 /* 438 entries */, 32768) = 12552
...

The following is the strace output surrounding the code that generates the data for the eventual printed line of:

cmd: nvim, pid: 306559

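One thing that stands out is the call to ioctl, which also returns an error. The TCGETS request asks for terminal attributes, so as far as I understand it is a check for whether the file descriptor refers to a terminal. Python’s open() makes such a check when choosing a buffering mode, so ENOTTY here looks like the expected “not a terminal” answer for a regular file rather than a real failure.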

3339 openat(AT_FDCWD, "/proc/306559/status", O_RDONLY|O_CLOEXEC) = 3
3340 newfstatat(3, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
3341 ioctl(3, TCGETS, 0x7ffff4b194b0)        = -1 ENOTTY (Inappropriate ioctl for devi»
3342 lseek(3, 0, SEEK_CUR)                   = 0
3343 read(3, "Name:\tnvim\nUmask:\t0022\nState:\tS "..., 8192) = 1466
3344 read(3, "", 8192)                       = 0
3345 close(3)                                = 0
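The ioctl and lseek calls in this trace can be poked at in isolation. The sketch below is only an illustration and uses a temporary file instead of a /proc entry. It assumes that os.isatty ends up issuing the TCGETS ioctl (true of glibc as far as I know), and it shows that lseek with offset 0 and SEEK_CUR reports the current position without moving it.

```python
import os
import tempfile

# Create a small regular file to stand in for /proc/$pid/status.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Name:\tnvim\n")
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    is_terminal = os.isatty(fd)                 # "is this fd a terminal?"
    start = os.lseek(fd, 0, os.SEEK_CUR)        # offset 0 from "here": a query, not a move
    os.read(fd, 5)                              # consume the 5 bytes of "Name:"
    after_read = os.lseek(fd, 0, os.SEEK_CUR)   # the position advanced by the read
finally:
    os.close(fd)
    os.unlink(path)

print(is_terminal, start, after_read)  # prints: False 0 5
```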

Strace output from the C program for gathering data for the following line.

Name:   nvim               pid:306559

Again, the output looks a bit tidier. One noticeable difference is that we’re now at a much higher file descriptor number. This is because of the way the program’s loop is running: we’re not closing files after we’re finished with them, instead we’re just letting this happen implicitly on program exit. There is also no lseek operation as there was in the python code. The file pointer is at byte zero on opening the file, and lseek(3, 0, SEEK_CUR) does not actually move it: an offset of zero relative to the current position just reports where the pointer is, so the python version appears to be querying the offset rather than repositioning it.

1149 openat(AT_FDCWD, "/proc/306559/status", O_RDONLY) = 341
1150 newfstatat(341, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
1151 newfstatat(341, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
1152 read(341, "Name:\tnvim\nUmask:\t0022\nState:\tS "..., 1024) = 1024
1153 write(1, "Name:\tnvim               pid:306"..., 60) = 60
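The climbing descriptor number is easy to reproduce. The following sketch, which opens os.devnull purely as a convenient stand-in file, compares leaking descriptors with closing each one before the next open. The kernel hands out the lowest free descriptor number, so the closed case keeps getting the same number back.

```python
import os

# Leak: open the same file three times without closing anything.
leaked = [os.open(os.devnull, os.O_RDONLY) for _ in range(3)]

# Tidy: close each descriptor before opening the next one.
reused = []
for _ in range(3):
    fd = os.open(os.devnull, os.O_RDONLY)
    reused.append(fd)
    os.close(fd)

# Clean up the leaked descriptors so the demo itself does not leak.
for fd in leaked:
    os.close(fd)

print("leaked:", leaked)  # three distinct, increasing numbers
print("reused:", reused)  # the same number three times
```

The exact numbers depend on what the process already has open, which is why none are shown here.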

C program

The listing for the C code is included below. I definitely don’t have as much practice writing C as python. I try to pick up good habits when I see them. There are some good online resources for this, like Chris Wellons’ website.

The only real overhead to speak of in the C version is including a few headers at the top of the program. I’m using a custom function for reading lines from a file.

/* Function to read a line from file */
char* fgetLine(size_t size, FILE* fd)
{
	char* input = NULL;
	char* buf;
	buf = malloc(size);  /* allocate the min number of bytes */
	if (buf == NULL) {
		fprintf(stderr, "malloc error\n");
		exit(1);
	}
	size_t len = 0, newlen = 0;
	do {
		/* read at most size bytes */
		if(!fgets(buf, size, fd))
		{
			/* read null bytes */
			buf[0] = '\0';
			return buf;
		}
		newlen = strlen(buf); /* check length of string to be copied */
		if (newlen > 0 && buf[newlen-1] == '\n') {
			buf[--newlen] = '\0'; /* remove the trailing newline if present */
		}
		if (newlen == size-1) { /* we're not finished */
			size *= 2;   /* Double the size    */
		} 		
		input = realloc(input, size);
		if (input == NULL) {
			fprintf(stderr, "realloc error\n");
			exit(1);
		}
		memcpy(input + len, buf, newlen+1); /* begin to write at byte 0, else */
		len += newlen;
		size += newlen;
	} while (buf[newlen] && buf[newlen-1]!='\n' && buf[newlen-1]!=EOF);
	return input;
}

Other than that I’m making use of a function written by Michael Kerrisk to translate a name to a uid.

#include <dirent.h>
#include <stdarg.h>
#include <sys/stat.h>

#include "adio.h"
#include "cscratch_common.h"
#include "ugid_info.h"


/* procuall.c: all process being run by a user 
 *
 * usage: procuall <username>
 * */

#define MAXLINE 512
#define LPID 5

int s_isdigit(const char* s) {
    int result = 0;
    while (*s != '\0') {
        if (('0' <= *s) && ('9' >= *s)) {
            result = 1;
        }
        s++;
    }
    return result;
}

char*
make_filename(const char* pid) {
    char* fname;
    if (s_isdigit(pid)) {
        sprintf(fname, "/proc/%s/status", pid);
        return fname;
    }
    return NULL;
}

int
main (int argc, char* argv[])
{
    char* uname = argv[1];
    uid_t uid = uidFromName(uname);
    printf("user: %s\tuid: %d\n", uname, uid);

    DIR* dirp;
    char* proc = "/proc";
    dirp = opendir(proc);
    struct dirent* dp;
    char* fname;
    int size = 0;
    FILE* fp;
    int fd;
    char* lone;

    struct stat sb;

    if (dirp) {
        errno = 0;
        while ((dp = readdir(dirp)) != NULL) {
            fname = make_filename(dp->d_name);
            if (fname) {
                fp = fopen(fname, "r");
                fd = fileno(fp);
                if (fstat(fd, &sb) == -1) {
                    return -1; /* just cheese it! */
                }
                if (uid == sb.st_uid) {
                    lone = fgetLine(MAXLINE, fp);
                    printf("%-24s pid:%-30.30s\n", lone, dp->d_name);
                }
            }
        }
        closedir(dirp);
    }
    return 0;
}

Edit: safely reading a /proc/$PID/status “file”

So, one fairly serious shortcoming of the code listings above is that they do not correctly handle the case where the /proc/$PID process disappears between the call to readdir and the call to fopen. I’ve created another post called /proc los that documents some of the steps taken to build understanding of how to fix the errors in the original code. As an additional measure in the following excerpt, access is used to check whether the pseudo file is still available for reading, so it doesn’t even attempt the call to fopen if it’s not.

    dirp = opendir(PROC);
    if (dirp) {
        errno = 0;
        while ((dp = readdir(dirp)) != NULL) {
            fname = make_filename(dp->d_name);
            if (access(fname, F_OK) == 0) {
                fp = fopen(fname, "r");
                if (fp == NULL) { /* Entered a bad state */
                    fprintf(stderr, "Error: fopen attempted read on %s, returned %d", fname, errno);
                } else {
                    fd = fileno(fp);
                    if (fstat(fd, &sb) == -1) {
                        return -1; /* just cheese it! */
                    }
                    if (uid == sb.st_uid) {
                        lone = fgetLine(MAXLINE, fp);
                        printf("%-24s pid:%-30.30s\n", lone, dp->d_name);
                    }
                }
            } else {
                fprintf(stderr, "Error: %s failed before fopen\n", dp->d_name);
            }
        }
        closedir(dirp);
    }
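The same race exists in the python script, and a process can still exit between an existence check and the open, so the open itself has to tolerate failure. A minimal sketch of that approach follows. read_status is a hypothetical helper, not part of the original script, and the two exception types reflect my understanding of the errors /proc tends to raise in this situation.

```python
def read_status(path):
    """Return the text of a status file, or None if the process is gone."""
    try:
        with open(path) as f:
            return f.read()
    except (FileNotFoundError, ProcessLookupError):
        # ENOENT: the /proc/$pid directory vanished before open().
        # ESRCH: the process exited while the file was being read.
        return None


if __name__ == "__main__":
    text = read_status("/proc/self/status")
    print("gone" if text is None else text.splitlines()[0])
```

Callers then skip any pid for which the helper returns None instead of crashing mid-loop.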

The updated listing can be found on github

Edit: fix s_isdigit

The original routine s_isdigit had a logic bug, as highlighted in one of the comments.

As written, it should be s_hasdigit, but it’s actually wrong - it should be: … otherwise, it will choke on directories like /proc/etc64/ or /proc/net6/ if any such directory is added. (Plus other bits like early return, but that’s not a correctness bug.)

A slightly adapted version of the code posted in that comment looks like the following. It also returns early as soon as it’s clear that the check fails.

/* check that string s is a contiguous array of integer characters */
bool s_isinteger(const char* s) {
    bool result = (*s != '\0');
    while (*s != '\0') {
        if ((*s < '0') || (*s > '9')) {
            return false;
        }
        s++;
    }
    return result;
}
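To make the difference between the two checks concrete, here is a small python sketch of both. The names has_digit and is_integer are chosen for this illustration only: the first mirrors what the original s_isdigit effectively tested, the second mirrors s_isinteger.

```python
def has_digit(name):
    """True if any character is an ASCII digit (the original, buggy check)."""
    return any("0" <= ch <= "9" for ch in name)


def is_integer(name):
    """True if name is non-empty and made up only of the ASCII digits 0-9."""
    return name != "" and all("0" <= ch <= "9" for ch in name)


if __name__ == "__main__":
    for name in ("306559", "net6", "self", ""):
        print(repr(name), has_digit(name), is_integer(name))
```

An entry such as net6 passes the first check but fails the second, which is exactly the case the comment warns about.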

Profiling

Okay, so this section started off as kind of a joke: I ran both programs using the time command on Linux. They run so quickly, it’s a bit hard to determine (but not that hard to guess) which one runs faster. As mentioned above, there is a huge overhead in running a python interpreter. As much as I love python the language for its simple, readable syntax and just how accessible that makes it as a programming language, I do wonder from time to time if life would be any different if I wasn’t lugging a python interpreter everywhere.

The initial output was as follows:

python3 scripts/procuall.py  0.03s user 0.01s system 97% cpu 0.043 total
./procuall adam  0.00s user 0.01s system 93% cpu 0.009 total

I then stumbled across a CLI tool called hyperfine that is aimed at benchmarking programs. Using this, and tweaking a few flags, it was easy to generate some useful information about the two programs.

proc % hyperfine --warmup=100 --shell=none "python scripts/procuall.py" "./procuall adam"
Benchmark 1: python scripts/procuall.py
  Time (mean ± σ):      32.2 ms ±   0.3 ms    [User: 24.3 ms, System: 7.6 ms]
  Range (min … max):    31.5 ms …  33.2 ms    92 runs

Benchmark 2: ./procuall adam
  Time (mean ± σ):       2.5 ms ±   0.1 ms    [User: 0.3 ms, System: 2.1 ms]
  Range (min … max):     2.3 ms …   3.5 ms    989 runs

Summary
  ./procuall adam ran
   12.97 ± 0.72 times faster than python scripts/procuall.py
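For a rough feel of what a benchmark tool does, the warmup-then-measure loop can be sketched in python. This is not how hyperfine is implemented, time_command is a name invented here, and launching the child from python adds its own overhead, so the numbers are only useful for comparing commands against each other. As a sanity check on the summary above, dividing the rounded means (32.2 ms by 2.5 ms) gives roughly 12.9.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=10, warmup=3):
    """Return (mean, stdev) wall-clock seconds for running argv."""
    for _ in range(warmup):  # warm caches before measuring
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Times an empty python program, i.e. mostly interpreter startup.
    mean, stdev = time_command([sys.executable, "-c", "pass"])
    print(f"{mean * 1000:.1f} ms ± {stdev * 1000:.1f} ms")
```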

Edit: Fix memory leaks and handle early returns

The initial listing of the code had a number of other issues, in particular it doesn’t handle early returns well. Also, it makes things overly complicated by using heap allocation where stack allocation will do. Also, there are some other issues around overly complicated and unsafe use of print statements. A good summary of the issues that were addressed can be found in the email copied into the commit message: sha 24cc39

Here is a full listing of the corrected code:

#include <dirent.h>
#include <stdarg.h>
#include <sys/stat.h>

#include "cscratch_common.h"
#include "ugid_info.h"


/* procuall.c: all process being run by a user
 *
 * usage: procuall <username>
 * */

#define MAXLINE 4096
#define PROC "/proc"

/* check that string s is a contiguous array of integer characters */
bool s_isinteger(const char* s) {
    bool result = (*s != '\0');
    while (*s != '\0') {
        if ((*s < '0') || (*s > '9')) {
            return false;
        }
        s++;
    }
    return result;
}

int
make_filename(char fname[MAXLINE], const char* pid) {
    if (s_isinteger(pid)) {
        snprintf(fname, MAXLINE, "/proc/%s/status", pid);
        return 0;
    }
    return -1;
}

int
main (int argc, char* argv[]) {
    if (argc != 2) {
        printf("usage: procuall <username>\n");
        exit(EPERM);
    }
    char* uname = argv[1];
    uid_t uid = uidFromName(uname);
    printf("user: %s\tuid: %d\n", uname, uid);

    DIR* dirp;
    struct dirent* dp;
    FILE* fp;
    int fd;

    char* line = NULL;
    ssize_t rread = 0;
    size_t len = 0;
    struct stat sb;

    dirp = opendir(PROC);
    if (dirp) {
        errno = 0;
        while ((dp = readdir(dirp)) != NULL) {

            char fname[MAXLINE];
            if (make_filename(fname, dp->d_name) == -1) {
                continue; /* not a numeric /proc/$PID entry, skip it */
            }

            if (access(fname, F_OK) != 0) {
                /* the process exited after readdir returned its entry */
                fprintf(stderr, "Error: %s failed before fopen\n", dp->d_name);
                continue;
            }

            if ((fp = fopen(fname, "r")) == NULL) { /* Entered a bad state */
                fprintf(stderr, "Error: fopen attempted read on %s, returned %d\n", fname, errno);
                continue;
            }

            if ((fd = fileno(fp)) == -1) {
                fprintf(stderr, "Error fileno %d\n", errno);
                goto err_proc; /* err_proc closes fp exactly once */
            }

            if (fstat(fd, &sb) == -1) {
                fprintf(stderr, "Error fstat %d\n", errno);
                goto err_proc;
            }

            if (uid == sb.st_uid) {
                if ((rread = getline(&line, &len, fp)) != -1) {
                    printf("%spid:\t%s\n", line, dp->d_name);
                } else {
                    fprintf(stderr, "Error getline %d\n", errno);
                }
            }
err_proc:
            fclose(fp);
        }
        /* cleanup directory stuff */
        free(line); /* buffer allocated by getline */
        closedir(dirp);
    }
    return 0;
}