Background
This post is a successor to not knowing the /proc filesystem.
The aim is to improve on at least one major shortcoming of the code
listing from the other post. The major fault with that program is that
it does not handle the case where /proc/$PID disappears between the
call to readdir and the call to fopen on the status file.
To be honest, I was aware of this possibility while writing the other
program, but was temporarily uninspired to search for a solution.
(“Ideenlos” is German for uninspired, hence the title; I live in
Austria, hence the German.)
Understanding the problem
To better understand the problem, a short script (shown under Test
runner below) can be used to simulate what happens if a process
disappears while we are attempting to read from its entry in the
pseudo filesystem.
One of the side effects of my “lazybastarditis” is that I have rarely
spent time trying to use the error codes that the system provides.
I followed the graph of header files from <errno.h>, and eventually
landed at:
/* excerpt from /usr/include/asm-generic/errno-base.h */
/*...*/
#define EPERM 1 /* Operation not permitted */
#define ENOENT 2 /* No such file or directory */
#define ESRCH 3 /* No such process */
/*...*/
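As a rough illustration of how these constants surface in a program, the sketch below maps the three codes from the excerpt to their symbolic names and lets strerror supply the message text. The helpers errno_name and print_errno are made up for this example (they are not part of proclos), and the numeric values assume Linux.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* errno_name: hypothetical helper, returns the symbolic name for the
 * three codes shown in the excerpt above, or "other" for anything else. */
const char* errno_name(int code) {
    switch (code) {
    case EPERM:  return "EPERM";
    case ENOENT: return "ENOENT";
    case ESRCH:  return "ESRCH";
    default:     return "other";
    }
}

/* print_errno: print the code, its symbolic name and the libc message */
void print_errno(int code) {
    printf("%d %s: %s\n", code, errno_name(code), strerror(code));
}
```

Calling print_errno(ENOENT) from a main function prints the number, the name and whatever message text the C library has for that code.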
Test runner
To simulate the program running into an issue where the process it is
trying to gather info about dies while it attempts the read, it’s
possible to use the following shell script. It sets two timers, TOUTER
and TINNER. The former is passed as an argument to the sleep command in
the shell, which runs in the background, while proclos is started with
the pid of that background process (the value of $!) and the TINNER
timer. Within the C program below, the sleep function is called with
the larger time interval, to make sure that the process we are trying
to observe has enough time to die.
#!/usr/bin/sh
TOUTER=1
TINNER=2
sleep $TOUTER &
./proclos $! $TINNER
The following listing (sha 187e96) is intentionally buggy and is used to
demonstrate what happens when the test above is run. The fopen manpage
describes the return value as a FILE pointer on successful completion;
otherwise NULL is returned and errno is set to indicate the error.
/* proclos: test what happens if a process disappears between the call to
 *
 * program will attempt to read /proc/$PID/status file, but the process with
 * $PID will be killed between the call to readdir and the call to fopen on the
 * /proc/$PID/status file.
 */
#include <dirent.h>
#include <stdarg.h>
#include <sys/stat.h>

#include "adio.h"
#include "cscratch_common.h"

#define MAXLINE 512
#define MAXFNAME 128
#define LPID 5
#define PROC "/proc"

/* check that string s contains only contiguous integer characters */
bool s_isinteger(const char* s) {
    bool result = (*s != '\0');
    while (*s != '\0') {
        if ((*s < '0') || (*s > '9')) {
            return false;
        }
        s++;
    }
    return result;
}

int main(int argc, char* argv[]) {
    if (argc != 3) {
        printf("usage: proclos <pid> <t>\n");
        exit(EPERM);
    }

    char* pid;
    pid = argv[1];
    if (!s_isinteger(pid)) {
        fprintf(stderr, "Error: invalid pid %s\n", pid);
        exit(EPERM);
    }

    /* Otherwise set up the path we want to read */
    char fname[MAXFNAME];
    sprintf(fname, "/proc/%s/status", pid);

    int time;
    time = atoi(argv[2]);
    if (!time) {
        fprintf(stderr, "Error: %s not positive integer\n", argv[2]);
        exit(EPERM);
    }

    DIR* dirp;
    struct dirent* dp;
    FILE* fp;
    int fd;
    char* lone;
    struct stat sb;
    int size = 0;

    dirp = opendir(PROC);
    if (dirp) {
        errno = 0;
        if ((dp = readdir(dirp)) != NULL) {
            printf("e1: %d\n", errno);
            sleep(time); /* Sleep while the process gets killed */
            fp = fopen(fname, "r");
            printf("e2: %d\n", errno);
            fd = fileno(fp);
            if (fstat(fd, &sb) == -1) {
                return -1; /* just cheese it! */
            }
            lone = fgetLine(MAXLINE, fp);
            printf("%-24s pid:%-30.30s\n", lone, pid);
        }
        closedir(dirp);
    }
    return 0;
}
Compiling and running this version using the test runner, we can see
that fopen sets errno to 2, i.e. ENOENT. The output also shows that the
subsequent call to fileno, which receives the NULL pointer returned by
fopen, causes the program to segfault.
proc % ./proclos-runner.sh (proclos◆◆) ~/Code/cscratch/proc
e1: 0
e2: 2
./proclos-runner.sh: line 6: 369159 Segmentation fault (core dumped) ./proclos $! $TINNER
zsh: exit 139 ./proclos-runner.sh
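The runner depends on timers, so each test takes a couple of seconds. As a sketch of an alternative (an assumption for illustration, not what the commit does), the same ENOENT can be provoked without sleeping: fork a child that exits at once, reap it with waitpid, and only then try to open its status file. The helper name vanished_errno is made up for this example, and the code assumes a Linux /proc.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* vanished_errno: hypothetical helper. Forks a child that exits at once,
 * reaps it, then tries to open the child's status file.
 * Returns the errno left by fopen, 0 if the open unexpectedly succeeded,
 * or -1 if fork or waitpid failed. */
int vanished_errno(void) {
    pid_t child = fork();
    if (child == -1) {
        return -1;
    }
    if (child == 0) {
        _exit(0); /* child: disappear immediately */
    }
    if (waitpid(child, NULL, 0) == -1) {
        return -1;
    }

    /* Once the child has been reaped, its /proc entry is gone. */
    char fname[64];
    snprintf(fname, sizeof fname, "/proc/%ld/status", (long)child);

    errno = 0;
    FILE* fp = fopen(fname, "r");
    if (fp == NULL) {
        return errno;
    }
    fclose(fp);
    return 0;
}
```

On a Linux system the return value should be ENOENT, matching the e2 line in the output above. A pid being reused between the waitpid and the fopen is theoretically possible, so this is a demonstration rather than a guarantee.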
Cleaning up the code
The minimal viable solution might look something like the following. As soon as
fopen returns NULL, the program is in a bad state and we have to act. In the
case of this example, we can just print an error message and exit.
...
fp = fopen(fname, "r");
if (fp == NULL) {
    fprintf(stderr, "Error: fopen failed to complete with code %d\n", errno);
    exit(ESRCH); /* Process not found */
}
...
Running the test again:
proc % ./proclos-runner.sh (proclos◆) ~/Code/cscratch/proc
Error: fopen failed to complete with code 2
zsh: exit 3 ./proclos-runner.sh
Commit sha 95cf741 reflects the updated code.
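Exiting is fine for this test program, but a program that walks every entry in /proc would probably rather skip a process that has vanished and carry on. The following is a minimal sketch of that idea, not the code from the commit: open_status and process_vanished are hypothetical helpers, and treating ESRCH the same as ENOENT is an assumption on my part.

```c
#include <errno.h>
#include <stdio.h>

#define MAXFNAME 128

/* open_status: hypothetical helper. Opens /proc/<pid>/status for reading.
 * On failure returns NULL and stores the errno from fopen in *err;
 * on success stores 0 in *err. */
FILE* open_status(const char* pid, int* err) {
    char fname[MAXFNAME];
    int n = snprintf(fname, sizeof fname, "/proc/%s/status", pid);
    if (n < 0 || (size_t)n >= sizeof fname) {
        *err = ENAMETOOLONG; /* pid string does not fit into fname */
        return NULL;
    }
    errno = 0;
    FILE* fp = fopen(fname, "r");
    *err = (fp == NULL) ? errno : 0; /* save errno before any other call */
    return fp;
}

/* process_vanished: nonzero if err suggests that the process went away
 * between listing /proc and opening its status file. */
int process_vanished(int err) {
    return err == ENOENT || err == ESRCH;
}
```

Inside a readdir loop, the caller could check for a NULL return, continue to the next entry when process_vanished(err) is true, and report any other error code as a real failure. Saving errno straight after fopen matters because later library calls are allowed to overwrite it.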
Digression:
“Die Lage in Österreich ist hoffnungslos, aber nicht ernst.” - Alfred Polgar, 1922
The situation in Austria is hopeless, but not serious.