A customer reported a failure. The logs showed:
error → waitpid failed 'Reason: No child processes'
The “No child processes” error came from waitpid() after using
fork/spawn to launch a utility that loads data into a database.
Detailed investigation suggests that some other process the user is running had changed the disposition of SIGCHLD, possibly the shell (e.g. bash!) used to launch our server processes. If SIGCHLD is set to SIG_IGN when a process is started using fork()/exec(), the exit status of the child is NOT kept for the parent, so waitpid() cannot retrieve it and fails with ECHILD ("No child processes") instead.
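The most likely reason for the "No child processes" error from waitpid() is that the disposition of SIGCHLD is not SIG_DFL. A newly started program would normally expect the default. On Linux, however, an ignored signal stays ignored across fork() and exec(), so a process run from the shell (or the shell itself) can hand our process SIGCHLD already set to SIG_IGN. A custom handler installed by the parent is reset to SIG_DFL on exec(), so SIG_IGN is the case that gets inherited.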
I reproduced the “No child processes” error by setting the signal handler for SIGCHLD to SIG_IGN using signal(SIGCHLD, SIG_IGN).
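A minimal sketch of that reproduction, assuming Linux. The helper names spawn_and_wait() and wait_result_with() are made up for illustration and are not from the original server code. The child exits immediately. The parent then calls waitpid() once with SIGCHLD ignored and once with the default disposition.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once, wait for it,
 * and return 0 if waitpid() succeeded or the errno value if it failed. */
static int spawn_and_wait(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return errno;
    if (pid == 0)
        _exit(0);                      /* child: exit immediately */

    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);  /* retry if interrupted by a signal */
    } while (r < 0 && errno == EINTR);

    if (r < 0) {
        int err = errno;
        fprintf(stderr, "waitpid failed 'Reason: %s'\n", strerror(err));
        return err;
    }
    return 0;
}

/* Hypothetical helper: run spawn_and_wait() with SIGCHLD set to the
 * given disposition, then restore whatever was in place before. */
static int wait_result_with(void (*disposition)(int))
{
    void (*previous)(int) = signal(SIGCHLD, disposition);
    int result = spawn_and_wait();
    signal(SIGCHLD, previous);
    return result;
}
```

Calling wait_result_with(SIG_IGN) is expected to return ECHILD, because the kernel discards the child's exit status instead of keeping it for the parent. Calling wait_result_with(SIG_DFL) should return 0.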
Running the process under strace shows the signal dispositions being set. Because this happened at a customer installation, I added logging to report whether the SIGCHLD handler was something other than SIG_DFL (the call signal(SIGCHLD, SIG_DFL) returns the previous handler). Unfortunately I never got confirmation: the problem stopped occurring once the fix was delivered!
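The logging described above could look roughly like the sketch below. The actual code that was delivered is not shown in this post, so the function name reset_sigchld_to_default() and the log wording are assumptions. It relies only on signal() returning the previous disposition.

```c
#include <signal.h>
#include <stdio.h>

/* Hypothetical helper: restore the default SIGCHLD disposition.
 * Returns 1 if the previous disposition was not SIG_DFL,
 * 0 if it was already SIG_DFL, and -1 if signal() failed. */
static int reset_sigchld_to_default(void)
{
    void (*previous)(int) = signal(SIGCHLD, SIG_DFL);

    if (previous == SIG_ERR) {
        perror("signal(SIGCHLD, SIG_DFL)");
        return -1;
    }
    if (previous == SIG_IGN) {
        fprintf(stderr, "SIGCHLD was set to SIG_IGN; reset to SIG_DFL\n");
        return 1;
    }
    if (previous != SIG_DFL) {
        fprintf(stderr, "SIGCHLD had a custom handler; reset to SIG_DFL\n");
        return 1;
    }
    return 0;                          /* already the default, nothing to log */
}
```

Calling this once at start-up, before any fork()/exec(), would both record an unexpected inherited disposition and make later waitpid() calls independent of how the process was launched.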
References:
https://colinxy.github.io/computer-science/2017/01/27/bash-handles-signals.html
https://xaizek.github.io/2014-10-05/why-restoring-signal-handlers-is-important/