Overview
Edit: this post was discussed on Hacker News. Some of the comments contained valuable reviews of the C code listed below. I’ve appended some edits after the original listing.
I’ve been working quite a bit with files and file systems on Linux recently, mostly from the vantage point of either a shell or a python script. I wanted to practice coding against the Linux API, so I cracked open my copy of The Linux Programming Interface to see if I could find some useful info. As usual, I found myself on an enjoyable tangent learning about file system and process fundamentals.
I developed a fairly simple goal for a small project over the weekend: pick some essential aspect of Linux file systems and learn a bit about it. Specifically, try to use the Linux API directly or at least understand what parts of the API are being used by whatever script is being used to get the job done.
The task I eventually settled on is described as a simple exercise in the book mentioned above: write a program that prints a list of all processes running on the system that are associated with a specific user, printing the pid and name of the program being run. It’s possible to glean all of this information from the /proc file system; the file /proc/$pid/status holds the info about the uid and program name.
The general plan:
- Write a quick script in python to do the job
- Run the script using strace to see what system calls are being made
- Write a program in C to do the same job but use the API directly
- Run the program through strace and compare the footprint
- Profile both programs
Python script
The initial script is very straightforward. It relies on the os module and the pathlib.Path class to get the uid and to list the pids of all running processes. It then opens the /proc/$pid/status file for each process and reads two lines: the first to get the name of the program being run and the ninth to get the uid.
#!/usr/bin/python
import os
from pathlib import Path

__doc__ = """procuall.py: enumerate all processes run by the user who runs the
script.

usage: python procuall.py
"""


def search_proc():
    """For each process under /proc, parse the status file and try to match
    the uid; if there is a match, the process belongs to the user."""
    uid = os.getuid()
    all = Path("/proc")
    procs = all.glob("[0-9]*")
    processes = [p for p in procs]
    info = {}
    for proc in processes:
        status = proc.joinpath("status")
        with open(status) as f:
            lines = f.readlines()
            cmd = lines[0].split("\t")
            name = cmd[1].strip("\n")
            uidl = lines[8].split("\t")
            puid = uidl[1].strip("\n")
            if int(puid) == uid:
                info.update({name: proc.stem})
    return info


def main():
    """Get info about running processes;
    output the command being run and the pid."""
    info = search_proc()
    for k, v in info.items():
        print(f"cmd: {k}, pid: {v}")


if __name__ == '__main__':
    main()
I found it quite beneficial to sketch out a working program like this. It didn’t take more than a couple of minutes to get it working and it made it possible to think of ways to improve the program for the second iteration.
Strace output
The first thing that I went looking for in the strace output was which system calls get used to list the contents of a directory. In the python code the Path.glob("[0-9]*") call is what lists the entries of the /proc directory. The listing happens over the course of three system calls, lines 912–914 below. Note that these calls are lower level than anything we would use directly from a C program. The directory is first opened (line 912); openat returns a file descriptor for the directory, which is then passed as the first argument to the four calls that follow on lines 913–916. newfstatat on line 913 fetches the stat structure for the open directory, and getdents64 on lines 914–915 does the actual listing: the hex value is the address of the user-space buffer that the kernel fills with directory entries, and 32768 is the size of that buffer in bytes.
One curious part of the system calls generated by the python code is that it calls newfstatat with an empty string for the pathname. This isn’t a stray empty directory: the AT_EMPTY_PATH flag in the final argument means the call operates on the file descriptor itself rather than on a path, making it equivalent to an fstat on the already-open directory.
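The same pattern can be reproduced from Python in a small sketch (not part of the original script): os.stat accepts an open file descriptor, and on Linux a trace of this turns into the same newfstatat(fd, "", ..., AT_EMPTY_PATH) shape seen above.

```python
import os
import stat

# Open the directory and then stat it by descriptor, mirroring the
# openat + newfstatat(fd, "", ..., AT_EMPTY_PATH) pair in the trace.
fd = os.open("/proc", os.O_RDONLY)
st = os.stat(fd)  # fstat-by-descriptor under the hood
print(stat.S_ISDIR(st.st_mode))
os.close(fd)
```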
One thing that occurred to me after writing the initial python program was that it’s not necessary to read the uid line of the /proc/$PID/status file at all. An fstat call using the file descriptor is enough to glean this info, since the /proc entry is owned by the user running the process.
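In python that improvement is a one-line sketch: stat the status file and read st_uid from the result instead of parsing the ninth line of text.

```python
import os

def uid_of(pid):
    """Owner uid of a process, taken from the stat structure of its
    /proc entry rather than by parsing the Uid: line of status."""
    return os.stat(f"/proc/{pid}/status").st_uid

# The current process necessarily belongs to the current user.
print(uid_of(os.getpid()) == os.getuid())
```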
# output of strace python procuall.py
...
910 getuid() = 1000
911 newfstatat(AT_FDCWD, "/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}, 0) = 0
912 openat(AT_FDCWD, "/proc", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
913 newfstatat(3, "", {st_mode=S_IFDIR|0555, st_size=0, ...}, AT_EMPTY_PATH) = 0
914 getdents64(3, 0x560cbd19e900 /* 416 entries */, 32768) = 11232
915 getdents64(3, 0x560cbd19e900 /* 0 entries */, 32768) = 0
916 close(3) = 0
...
It was quite clear from the strace output that the overhead of the interpreter costs practically as much as running the program itself: strace generated over 3000 lines of output, and the C version used about half the number of system calls. The major difference is the absence of the virtual machine.
# output of strace ./procuall adam
...
97 openat(AT_FDCWD, "/proc", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
98 newfstatat(3, "", {st_mode=S_IFDIR|0555, st_size=0, ...}, AT_EMPTY_PATH) = 0
99 getdents64(3, 0x7d3db0 /* 438 entries */, 32768) = 12552
...
The following is the strace output surrounding the code that generates the data for the eventual printed line of:
cmd: nvim, pid: 306559
One thing that stands out is the call to ioctl, which also appears to fail. The ioctl call signifies that the program is trying to do a terminal control operation: TCGETS asks for the terminal attributes of the descriptor, which is how the interpreter finds out whether the newly opened file is a terminal (that affects how its I/O gets buffered). The ENOTTY result is simply the answer “no, this is a regular file” rather than a genuine error.
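The probe is easy to reproduce from Python: os.isatty performs the same terminal query, and when the descriptor refers to a regular file it just returns False instead of raising an error.

```python
import os

# Same question the interpreter asks: is this descriptor a terminal?
# On a regular file the underlying TCGETS ioctl fails with ENOTTY,
# which surfaces here as a plain False.
fd = os.open("/proc/self/status", os.O_RDONLY)
print(os.isatty(fd))
os.close(fd)
```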
3339 openat(AT_FDCWD, "/proc/306559/status", O_RDONLY|O_CLOEXEC) = 3
3340 newfstatat(3, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
3341 ioctl(3, TCGETS, 0x7ffff4b194b0) = -1 ENOTTY (Inappropriate ioctl for devi»
3342 lseek(3, 0, SEEK_CUR) = 0
3343 read(3, "Name:\tnvim\nUmask:\t0022\nState:\tS "..., 8192) = 1466
3344 read(3, "", 8192) = 0
3345 close(3) = 0
Strace output from the C program while gathering data for the following line:
Name: nvim pid:306559
Again, the output looks a bit tidier. One noticeable difference is that we’re now at a much higher file descriptor number. This is down to the way the program’s loop runs: we’re not closing files after we’re finished with them, instead letting that happen implicitly on program exit. There is also no lseek operation as there was in the python code. Note that lseek(3, 0, SEEK_CUR) doesn’t actually move the file pointer; it returns the current offset, which Python’s buffered I/O layer uses to keep its own notion of the position in sync with the kernel’s.
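A quick sketch of the point about the offset: a freshly opened file already reports position zero, so whatever the interpreter learns from that lseek, it isn’t moving anything.

```python
# A freshly opened file starts at offset zero; the lseek(fd, 0, SEEK_CUR)
# in the Python trace only queries the current position.
with open("/proc/self/status") as f:
    print(f.tell())
```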
1149 openat(AT_FDCWD, "/proc/306559/status", O_RDONLY) = 341
1150 newfstatat(341, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
1151 newfstatat(341, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
1152 read(341, "Name:\tnvim\nUmask:\t0022\nState:\tS "..., 1024) = 1024
1153 write(1, "Name:\tnvim pid:306"..., 60) = 60
C program
The listing for the C code is included below. I definitely don’t have as much practice writing C as python, but I try to pick up good habits when I see them. There are some good online resources for this, like Chris Wellons’ website.
The only real overhead to speak of in the C version is pulling in a few headers at the top of the program. I’m using a custom function for reading lines from a file.
/* Function to read a line from file */
char* fgetLine(size_t size, FILE* fd)
{
    char* input = NULL;
    char* buf;
    buf = malloc(size); /* allocate the min number of bytes */
    if (buf == NULL) {
        fprintf(stderr, "malloc error\n");
        exit(1);
    }
    size_t len = 0, newlen = 0;
    do {
        /* read at most size bytes */
        if (!fgets(buf, size, fd)) {
            /* read null bytes */
            buf[0] = '\0';
            return buf;
        }
        newlen = strlen(buf); /* check length of string to be copied */
        if (newlen > 0 && buf[newlen-1] == '\n') {
            buf[--newlen] = '\0'; /* remove the trailing newline if present */
        }
        if (newlen == size-1) { /* we're not finished */
            size *= 2; /* Double the size */
        }
        input = realloc(input, size);
        if (input == NULL) {
            fprintf(stderr, "realloc error\n");
            exit(1);
        }
        memcpy(input + len, buf, newlen+1); /* append at byte len */
        len += newlen;
        size += newlen;
    } while (buf[newlen] && buf[newlen-1] != '\n' && buf[newlen-1] != EOF);
    return input;
}
Other than that I’m making use of a function written by Michael Kerrisk to translate a name to a uid.
#include <dirent.h>
#include <stdarg.h>
#include <sys/stat.h>

#include "adio.h"
#include "cscratch_common.h"
#include "ugid_info.h"

/* procuall.c: all processes being run by a user
 *
 * usage: procuall <username>
 * */

#define MAXLINE 512
#define LPID 5

int s_isdigit(const char* s) {
    int result = 0;
    while (*s != '\0') {
        if (('0' <= *s) && ('9' >= *s)) {
            result = 1;
        }
        s++;
    }
    return result;
}

char*
make_filename(const char* pid) {
    static char fname[MAXLINE]; /* static storage, so the returned pointer stays valid */
    if (s_isdigit(pid)) {
        snprintf(fname, sizeof(fname), "/proc/%s/status", pid);
        return fname;
    }
    return NULL;
}

int
main(int argc, char* argv[])
{
    char* uname = argv[1];
    uid_t uid = uidFromName(uname);
    printf("user: %s\tuid: %d\n", uname, uid);
    DIR* dirp;
    char* proc = "/proc";
    dirp = opendir(proc);
    struct dirent* dp;
    char* fname;
    FILE* fp;
    int fd;
    char* lone;
    struct stat sb;
    if (dirp) {
        errno = 0;
        while ((dp = readdir(dirp)) != NULL) {
            fname = make_filename(dp->d_name);
            if (fname) {
                fp = fopen(fname, "r");
                fd = fileno(fp);
                if (fstat(fd, &sb) == -1) {
                    return -1; /* just cheese it! */
                }
                if (uid == sb.st_uid) {
                    lone = fgetLine(MAXLINE, fp);
                    printf("%-24s pid:%-30.30s\n", lone, dp->d_name);
                }
            }
        }
        closedir(dirp);
    }
    return 0;
}
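The uidFromName lookup used above has a direct python analogue, for comparison; this is a sketch using the standard pwd module, not Kerrisk’s code.

```python
import pwd

def uid_from_name(name):
    """Resolve a login name to a numeric uid via the passwd database,
    the same job uidFromName does in the C listing. Raises KeyError
    if the name is unknown."""
    return pwd.getpwnam(name).pw_uid

print(uid_from_name("root"))
```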
Edit: safely reading a /proc/$PID/status “file”
So, one fairly serious shortcoming of the code listings above is that they do not correctly handle the case where the /proc/$PID entry disappears between the call to readdir and the call to fopen. I’ve created another post called /proc los that documents some of the steps taken to build an understanding of how to fix the errors in the original code. As an additional measure in the following excerpt, access is used to check whether the pseudo file is still available for reading, so the program doesn’t even attempt the call to fopen if it’s not.
dirp = opendir(PROC);
if (dirp) {
    errno = 0;
    while ((dp = readdir(dirp)) != NULL) {
        fname = make_filename(dp->d_name);
        if (fname == NULL) { /* not a numeric pid entry, e.g. "self" */
            continue;
        }
        if (access(fname, F_OK) == 0) {
            fp = fopen(fname, "r");
            if (fp == NULL) { /* Entered a bad state */
                fprintf(stderr, "Error: fopen attempted read on %s, returned %d\n", fname, errno);
            } else {
                fd = fileno(fp);
                if (fstat(fd, &sb) == -1) {
                    return -1; /* just cheese it! */
                }
                if (uid == sb.st_uid) {
                    lone = fgetLine(MAXLINE, fp);
                    printf("%-24s pid:%-30.30s\n", lone, dp->d_name);
                }
            }
        } else {
            fprintf(stderr, "Error: %s failed before fopen\n", dp->d_name);
        }
    }
    closedir(dirp);
}
The updated listing can be found on github.
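The same race exists in the python script at the top of the post. There the idiomatic fix is to catch the exception rather than pre-check with access; a sketch:

```python
import os

def read_status_name(pid):
    """First line of /proc/<pid>/status, or None if the process
    exited between listing the directory and opening the file."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return f.readline().rstrip("\n")
    except FileNotFoundError:
        return None

print(read_status_name(os.getpid()))
```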
Edit: fix s_isdigit
The original routine s_isdigit had a logic bug, as highlighted in one of the comments:
As written, it should be s_hasdigit, but it’s actually wrong - it should be: … otherwise, it will choke on directories like /proc/etc64/ or /proc/net6/ if any such directory is added. (Plus other bits like early return, but that’s not a correctness bug.)
A slightly adapted version of the code posted in that comment would look like the following; it also returns early as soon as it’s clear that the check fails.
/* check that string s is a contiguous array of integer characters */
bool s_isinteger(const char* s) {
    bool result = (*s != '\0');
    while (*s != '\0') {
        if ((*s < '0') || (*s > '9')) {
            return false;
        }
        s++;
    }
    return result;
}
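The python script sidesteps this with the glob pattern [0-9]*, though that pattern only anchors the first character. An equivalent of the corrected check written in python would look like:

```python
def s_isinteger(s):
    """Non-empty string consisting only of ASCII digits, matching the
    semantics of the corrected C routine above."""
    return s != "" and all("0" <= c <= "9" for c in s)

print(s_isinteger("306559"), s_isinteger("net6"), s_isinteger(""))
```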
Profiling
Okay, so this section started off as kind of a joke: I ran both programs using the time command on Linux. They run so quickly that it’s a bit hard to determine (but not that hard to guess) which one runs faster. As mentioned above, there is a huge overhead in running a python interpreter. As much as I love python for its simple, readable syntax and just how accessible that makes it as a programming language, I do wonder from time to time if life would be any different if I wasn’t lugging a python interpreter everywhere.
The initial output was as follows:
python3 scripts/procuall.py 0.03s user 0.01s system 97% cpu 0.043 total
./procuall adam 0.00s user 0.01s system 93% cpu 0.009 total
I then stumbled across a CLI tool called hyperfine that is aimed at benchmarking programs. Using this, and tweaking a few flags, it was easy to generate some useful information about the two programs.
proc % hyperfine --warmup=100 --shell=none "python scripts/procuall.py" "./procuall adam"
Benchmark 1: python scripts/procuall.py
Time (mean ± σ): 32.2 ms ± 0.3 ms [User: 24.3 ms, System: 7.6 ms]
Range (min … max): 31.5 ms … 33.2 ms 92 runs
Benchmark 2: ./procuall adam
Time (mean ± σ): 2.5 ms ± 0.1 ms [User: 0.3 ms, System: 2.1 ms]
Range (min … max): 2.3 ms … 3.5 ms 989 runs
Summary
./procuall adam ran
12.97 ± 0.72 times faster than python scripts/procuall.py