It will terminate when EOF is reached on the socket, which is why you leave the socket open in the main process until it no longer needs the worker process. Until then, it will block in fgets until a newline is read, enter the loop body, then block again.
Note that this method will fail miserably if the command sent to the worker has a newline. This can be fixed by replacing or removing newline characters before fprintfing in fake_system, if it concerns you.
Kevin Barry
Yes, the problem could be fixed by messing with overcommit settings. No, it could not be fixed by the specific change suggested by Nominal Animal.
If overcommit_memory is set to 1, it certainly does fix it. If you doubt this, please try it before posting further condescending comments.
I also don't understand why you consider Linux overcommit policies to be somehow hard to understand. While you may think so, it does not mean an average Linux user or administrator should find it difficult. I am, however, very familiar with the policy of throwing more and bigger hardware at every problem encountered, rather than understanding the systems. I despise that attitude.
In a nutshell, vm.overcommit_memory (/proc/sys/vm/overcommit_memory) determines the policy:
0: Heuristic overcommit: grant allocations unless they are obviously impossible to satisfy.
1: Always overcommit: grant all allocations.
2: Strict accounting: grant allocations only up to swap + RAM * vm.overcommit_ratio / 100.
vm.overcommit_ratio is in /proc/sys/vm/overcommit_ratio.
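For reference, these settings can be inspected and changed with sysctl; it is a system-wide change, so treat it with care:

```shell
# Inspect the current policy and ratio:
cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio

# Switch to "always overcommit" (mode 1) until reboot; needs root:
sudo sysctl vm.overcommit_memory=1

# To persist across reboots, add this line to /etc/sysctl.conf:
#   vm.overcommit_memory = 1
```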
Quote:
Originally Posted by Nominal Animal
If overcommit_memory is set to 1, it certainly does fix it. If you doubt this, please try it before posting further condescending comments.
Whether or not this is true, it's not a good policy to make OS-specific (not to mention system-wide) changes as a solution to a problem that can be fixed by reconsidering the design of the program.
Kevin Barry
Quote:
Originally Posted by Nominal Animal
please try it before posting further condescending comments.
I have tested Linux overcommit behavior in many situations, including ones nearly the same as the example in this thread. You are quoting roughly correct documentation (maybe exactly matching some version of the documentation) but still misinterpreting the way it all fits together.
Quote:
Originally Posted by ta0kira
Whether or not this is true, it's not a good policy to make OS-specific (not to mention system-wide) changes as a solution to a problem that can be fixed by reconsidering the design of the program.
Right. I suppose you did not see my "it is kind of a big hammer to use" in my earlier post.
I definitely agree with you that forking a slave early, to execute the external programs, is the correct solution. It is not at all an out-of-the-ordinary one; just consider e.g. Apache mod_cgid or mod_fastcgi.
If, as the original poster states, pgrep and dmidecode are the only external programs used, it would make most sense to provide the command output to the parent process either as a descriptor or stream, or as a dynamically allocated string. Although I normally recommend a stream approach, a complete data block may be a better approach here; it depends on how the OP processes the output. After all, the two commands produce limited output, so the amount of memory needed to process it is certainly no issue here.
Quote:
Originally Posted by Nominal Animal
Right. I suppose you did not see my "it is kind of a big hammer to use" in my earlier post.
I did see that, but there were no caveats as to the effects this solution might have on the operation of the rest of the system, and the solution's continued discussion increases its perceived suitability.
Kevin Barry
Quote:
Originally Posted by ta0kira
I did see that, but there were no caveats as to the effects this solution might have on the operation of the rest of the system, and the solution's continued discussion increases its perceived suitability.
Perhaps the OP, linuxdev817, could tell us what kind of program this is? Since it uses 55 GB on a machine with 64 GB of RAM, it really should be the major task run on that machine at that time. In principle, if running that program is the main purpose of the machine, then changing the overcommit settings is a valid option, although the effects must be considered with respect to other services running on that machine. However, in this case, it makes a lot more sense to modify the original program to avoid the problem entirely. Not only is it cleaner, but it turns out the modifications are not too complicated at all.
Let us get back to the original issue at hand, and inspect both of the external programs run.
dmidecode should only be run once, at the beginning. The output is really only useful for hardware identification, and the interesting bits of the output should not change while the application is running. To get reliable information about hot-plugged hardware and hardware changes, you would be better off switching to an external script that users can modify according to their exact needs. Even then, if your main program is only interested in saving the output to a single file, you can just fork() a slave process at the very beginning of the program, and have it wait in a blocking read. A single byte transferred from the master tells the slave to (re-)write the file, and a single byte transferred back tells the master the operation was completed (perhaps with the exit status?). In this form, the code should be even simpler than the example code shown by ta0kira above.
That leaves just the pgrep. As mentioned earlier in this thread, it can be reimplemented by reading /proc. I've included an example C99 program to do exactly that. My function prefers to use the process name files /proc/pid/comm provided by kernels 2.6.30 and newer, but it will automatically fall back to the first component of /proc/pid/cmdline if the former are not available. (For other use cases, reading the /proc/pid/exe symlinks would tell exactly which binary each process is running, but for security reasons those symlinks are usually readable only by the owner user. Because the OP is interested in finding the pids for ntpd, I assumed that approach would not work.)
I've only lightly tested the code. It needs a bit of cleanup and polishing before real use. It should handle all errors in a sane manner, and not leak memory. It should be thread-safe (it specifically uses readdir_r()), but the directory lookup functions are thread cancellation points. So, if you use it in a threaded environment, you should add code to disable thread cancellation for the duration of the function, so that the thread is not cancelled midway, leaking dynamically allocated memory. Only main() uses stdio.h, so you can safely omit that include when using the function in your own programs. Feel free to use the code in any way you wish, just don't hold me responsible.
Code:
#define _ISOC99_SOURCE
#define _BSD_SOURCE
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>
#include <stdlib.h>
#include <stddef.h>
#include <sys/types.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>
/* This function finds and returns a list of process IDs currently
* running a command matching the specified parameter.
* The parameter should be the name or base name of the command.
* (The parameter should not specify a path.)
*
* If successful, the function returns a pointer to a zero-terminated list.
* If an error occurs, the function will return NULL with errno set.
* If no processes are found, the function will return NULL with errno == 0.
*
* If the kernel provides /proc/<pid>/comm files, they are used;
* otherwise the function falls back to /proc/<pid>/cmdline files.
* When matching, the specified parameter is compared as-is,
* but the contents of the above files are trimmed:
* leading path (up to and including the final /), and any trailing
* ASCII control characters (including newlines and tabs), are ignored.
* The result must match the parameter exactly.
*
* This function is thread-safe.
* (You should disable thread cancellation while this function runs,
* so that if the thread is cancelled, it does not leak memory.)
*/
pid_t *pidof(const char *const command)
{
const size_t command_length = (command) ? strlen(command) : 0;
/* PID list */
pid_t *pid = NULL;
size_t pids = 0;
size_t pids_max = 0;
/* Buffer for "/proc/<pid>/cmdline" */
char path[16 + 3 * sizeof(pid_t)] = "/proc/";
char *name;
/* Buffer for command data */
char data[128];
unsigned char *contents;
int descriptor;
/* /proc handle and a directory entry */
DIR *dir = NULL;
struct dirent *dir_entry = NULL;
/* General and temporary variables */
struct dirent *entry;
int result;
pid_t curr;
ssize_t bytes;
size_t length;
/* An empty or NULL comm is invalid. */
if (command_length < (size_t)1) {
errno = EINVAL;
return NULL;
}
dir = opendir("/proc/");
if (!dir)
return NULL;
/* From this point on, some cleanup is necessary before failing.
* This expands to a full block { ... }. When using the macro
* in an if..then..else expression, you must not use a semicolon
* at end, or you will end the statement! */
#define RETURN_FAILURE(with_errno) \
{ const int __e = (with_errno); \
int __r; \
if (pid) free(pid); \
if (dir_entry) free(dir_entry); \
do { \
__r = closedir(dir); \
} while (__r == -1 && errno == EINTR); \
errno = __e; \
return NULL; \
}
/* Allocate a directory entry sufficiently large for /proc/ entries. */
dir_entry = malloc( offsetof(struct dirent, d_name)
+ fpathconf(dirfd(dir), _PC_NAME_MAX)
+ (size_t)1 );
if (!dir_entry)
RETURN_FAILURE(ENOMEM);
while (1) {
/* Get the next directory entry. */
do {
entry = NULL;
result = readdir_r(dir, dir_entry, &entry);
} while (result == EINTR);
/* Note: readdir_r() returns the error number directly on failure;
* it does not return -1 and set errno. */
if (result)
RETURN_FAILURE(result);
/* All done? */
if (!entry)
break;
#ifdef _DIRENT_HAVE_D_TYPE
/* Skip entries that are not directories. */
if (entry->d_type != DT_UNKNOWN && entry->d_type != DT_DIR)
continue;
#endif
/* Ignore entries not starting with a digit. */
if (!(entry->d_name[0] >= '0' && entry->d_name[0] <= '9'))
continue;
/* Parse the PID, and construct the path at the same time. */
{ const char *src = (const char *)&(entry->d_name[0]);
/* Skip "/proc/" in path. */
name = path + 6;
curr = (pid_t)0;
while (*src >= '0' && *src <= '9') {
const pid_t prev = curr;
curr = (pid_t)10 * curr + (pid_t)(*src - '0');
if (curr < prev)
break; /* Overflow! */
*(name++) = *(src++);
}
/* Overflow? Not a PID directory? */
if (*src)
continue;
}
/* Append slash to the path. */
*(name++) = '/';
/* Open /proc/<pid>/comm file. */
name[0] = 'c';
name[1] = 'o';
name[2] = 'm';
name[3] = 'm';
name[4] = 0;
do {
descriptor = open(path, O_RDONLY | O_NOCTTY);
} while (descriptor == -1 && errno == EINTR);
if (descriptor != -1)
goto opened;
/* Open /proc/<pid>/cmdline file. */
name[0] = 'c';
name[1] = 'm';
name[2] = 'd';
name[3] = 'l';
name[4] = 'i';
name[5] = 'n';
name[6] = 'e';
name[7] = 0;
do {
descriptor = open(path, O_RDONLY | O_NOCTTY);
} while (descriptor == -1 && errno == EINTR);
if (descriptor != -1)
goto opened;
/* The process no longer exists, or we don't have sufficient
* access rights to know what it is. So ignore it. */
continue;
opened:
/* Read the initial part of the file. Note that it is a procfs file,
* so a loop (for less than a page of data) is not necessary.
* All read errors are treated as if the file was never there. */
do {
bytes = read(descriptor, data, sizeof(data) - (size_t)1);
} while (bytes == (ssize_t)-1 && errno == EINTR);
/* Close the file. */
do {
result = close(descriptor);
} while (result == -1 && errno == EINTR);
/* Did we get anything out of the file? */
if (bytes < (ssize_t)1)
continue;
/* Yes, bytes bytes. */
data[bytes] = 0;
/* Skip leading dash, or directory part if a path. */
if (data[0] == '-')
contents = (unsigned char *)data + 1;
else
if (data[0] == '/')
contents = (unsigned char *)strrchr(data, '/') + 1;
else
contents = (unsigned char *)data;
/* Trim out trailing control characters. */
length = strlen((char *)contents);
while (length > (size_t)0 && (contents[length - 1] < 32 || contents[length - 1] == 127))
length--;
contents[length] = 0;
/* The result must match the given command exactly. */
if (length != command_length || strcmp(command, (char *)contents))
continue;
/* Grow pid list if necessary. */
if (pids >= pids_max) {
const size_t new_max = pids + 128;
pid_t *new_pid;
new_pid = realloc(pid, sizeof(pid_t) * (new_max + (size_t)1));
if (!new_pid)
RETURN_FAILURE(ENOMEM);
pid = new_pid;
pids_max = new_max;
}
/* Append this pid to the list. */
pid[pids++] = curr;
}
/* Close the directory handle. */
do {
result = closedir(dir);
} while (result == -1 && errno == EINTR);
if (result)
RETURN_FAILURE(errno);
/* The RETURN_FAILURE macro is not valid after this point. */
#undef RETURN_FAILURE
/* If there are no pids, discard the list. */
if (!pids && pid) {
free(pid);
pid = NULL;
}
/* If there are pids, terminate the list. */
if (pids)
pid[pids] = (pid_t)0;
/* Done. */
errno = 0;
return pid;
}
int main(int argc, char *argv[])
{
int arg, i, n;
pid_t *list;
for (arg = 1; arg < argc; arg++) {
list = pidof(argv[arg]);
if (!list) {
if (errno) {
fprintf(stderr, "%s: %s.\n", argv[arg], strerror(errno));
fflush(stderr);
} else {
fprintf(stdout, "%s: No processes.\n", argv[arg]);
fflush(stdout);
}
continue;
}
n = 0; while (list[n]) n++;
fprintf(stdout, "%s: Found %d processes:\n", argv[arg], n);
for (i = 0; i < n; i++)
fprintf(stdout, "\t%d\n", (int)list[i]);
fflush(stdout);
free(list);
}
return 0;
}
Having tested it on a couple of machines, it seems that the example program is more than twice as fast as a similar pgrep command; I timed a few hundred ./example getty and ./example bash runs against pgrep getty and pgrep bash. The output contained the same PIDs, of course. The memory use should also stay quite moderate (much less than running pgrep), so it should be a realistic alternative to pgrep.
Certain programs modify their argv[0]. For example, interactive Bash sets argv[0] to "-bash". See the code under the comment Skip leading dash above for how I handled that. (For an absolute path, only the base name (file name) part is used; otherwise, a leading dash is skipped. In all cases trailing ASCII control characters, including newlines and tabs, are ignored. Whatever is left must match the given parameter exactly.)
I'm sure part of the increased efficiency comes from avoiding regex.h. Also, to avoid the confound of a notional argv[0] value, you could readlink() and basename() /proc/*/exe.
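That idea can be sketched in a few lines; exe_basename() is an illustrative helper, and as discussed later in the thread, it typically only works for processes you own:

```c
#include <libgen.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve /proc/<pid>/exe and return the base name (a pointer into buf),
 * or NULL if the link cannot be read (no access, or a kernel process). */
static char *exe_basename(pid_t pid, char *buf, size_t len)
{
    char path[64];
    ssize_t n;
    snprintf(path, sizeof path, "/proc/%ld/exe", (long)pid);
    n = readlink(path, buf, len - 1);
    if (n < 0)
        return NULL;
    buf[n] = '\0';         /* readlink() does not NUL-terminate */
    return basename(buf);  /* POSIX basename() may modify buf */
}
```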
One problem with replicating pgrep is that it would have to find information about running processes differently on systems without procfs (e.g. FreeBSD without Linux procfs compatibility enabled.) I know that many people who program on Linux often say, "that's ok; I'll never want to run it on FreeBSD, etc." (not you specifically), but it's always nice to not have to redesign non-trivial code if you change your mind.
Kevin Barry
I'm sure part of the increased efficiency comes from avoiding regex.h.
I'd wager the majority, considering most of the run-time difference is in user CPU time used.
Quote:
Originally Posted by ta0kira
Also, to avoid the confound of a notional argv[0] value you could readlink and basename/proc/*/exe.
On most systems, the data at argv[0] can be modified (the string contents, not necessarily the pointer, nor the length). Changes are immediately visible to other processes. In Linux kernels 2.6.30 and later, prctl(PR_SET_NAME, (unsigned long)&namestring, 0UL, 0UL, 0UL) can be used to set the program name. These two are distinct values, but both are modifiable by the process itself. Thus, both /proc/*/cmdline and /proc/*/comm are under the process's own control, and may contain anything.
On the other hand, /proc/*/exe is a reliable symlink to the binary the process runs. The symlink is set when a process exec()s, so cannot be manipulated by the process itself. It is a dangling (empty) symlink for kernel processes. However, like I wrote earlier, readlink() on /proc/*/exe will typically fail due to insufficient access rights. Normally, only the owner user is allowed to find out which binary each process runs, for security reasons.
After some careful consideration, I believe it would be safe to use a setuid-root binary to report the PIDs of processes running the specified binaries. While it should not create new or potential security issues, it could possibly aid a local attacker in correctly timing some existing security hole in some specific circumstances. On the other hand, the information it outputs is normally available via ps -C command -o pid= anyway, except that processes can modify the information ps relies upon.
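As an aside, the name-changing behaviour described above is easy to demonstrate: this sketch sets the comm name via prctl() and reads it back from /proc/self/comm (the name "renamed-proc" is an arbitrary example):

```c
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Set this process's comm name (the kernel keeps at most 15 characters). */
static int set_comm_name(const char *name)
{
    return prctl(PR_SET_NAME, (unsigned long)name, 0UL, 0UL, 0UL);
}

/* Read the comm name back from /proc/self/comm. */
static int get_comm_name(char *buf, size_t len)
{
    FILE *f = fopen("/proc/self/comm", "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';  /* trim the trailing newline */
    return 0;
}
```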
In the OP's case, the program would have to fork() a slave before allocating lots of memory, and have that slave fork() and exec the child.
Here is an example program I wrote for this case. It turns out to be even simpler, and faster, because instead of inspecting the /proc/*/exe symlinks, we can just stat() the targets and compare them to the listed files. The target files we are interested in need to be kept open to make sure the device and inode numbers stay stable. I understand that on distributed filesystems like NFS they are allowed to change if such a change does not introduce a conflict. Keeping a read-only descriptor makes sure the identifiers do not change, and it also verifies that the user is able to read the target file. It might be a good idea to add checks so that only non-setuid executable target files are considered -- but since this is just an example program, I didn't bother. If somebody finds this useful, we can discuss that and other security implications further in another thread.
This version is quite raw, has some designs that may be considered weird (for example, the output buffer is filled backwards), and it is only very lightly tested, but I believe it is a very promising, efficient approach for those who need the PIDs of processes running specified binaries. Here is the code:
Code:
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <dirent.h>
#include <stdlib.h>
#include <errno.h>
/* Limit the number of binaries specified on the command line,
* so we won't consume too many file descriptors et cetera.
*/
#define MAX_BINARIES 512
#define EXITSTATUS_OK 0 /* Success */
#define EXITSTATUS_OUTPUT 1 /* Error writing to standard output */
#define EXITSTATUS_NONE 0 /* No files found */
#define EXITSTATUS_TOOMANY 2 /* Too many binaries specified */
#define EXITSTATUS_MEMORY 2 /* Out of memory */
#define EXITSTATUS_OPENDIR 3 /* Cannot open /proc */
#define EXITSTATUS_READDIR 3 /* Error reading /proc */
#define EXITSTATUS_CLOSEDIR 3 /* Error closing /proc */
/* Output buffer. */
static unsigned char out_data[65536];
static unsigned char *const out_ends = out_data + sizeof(out_data);
static unsigned char *const out_mark = out_data + 2 + 3 * sizeof(unsigned long);
static unsigned char *out_next = out_data + sizeof(out_data);
static int out_errno = 0;
/* Flush output. */
static inline int out_flush(void)
{
const int f = STDOUT_FILENO;
unsigned char *p = out_next;
unsigned char *const q = out_ends;
ssize_t n;
/* Mark buffer clear */
out_next = out_ends;
/* Previous error? */
if (out_errno)
return out_errno;
while (p < q) {
/* Write some of the buffered data. */
n = write(f, p, (size_t)(q - p));
if (n > (ssize_t)0)
p += n;
else
if (n != (ssize_t)-1)
return out_errno = EIO;
else
if (errno != EINTR)
return out_errno = errno;
}
return 0;
}
/* Write a PID and a newline to standard output. */
static inline void out_pid(const pid_t pid)
{
/* Note: PIDs are always positive. */
unsigned long u = (unsigned long)pid;
/* If the buffer is full, flush it. */
if (out_next < out_mark)
out_flush();
/* Prepend the PID to the buffer. */
*(--out_next) = '\n';
do {
*(--out_next) = '0' + (u % 10UL);
u /= 10UL;
} while (u);
/* Done. */
return;
}
int main(int argc, char *argv[])
{
size_t count = 0;
dev_t device[MAX_BINARIES];
ino_t inode[MAX_BINARIES];
int descriptor[MAX_BINARIES];
DIR *dir = NULL;
struct dirent *entry;
char exe[128] = "/proc/"; /* "/proc/<pid>/exe" */
struct stat info;
size_t i;
pid_t pid;
const char *src;
char *dst;
int arg, result, desc;
/* Scan the named files on the command line. */
for (arg = 1; arg < argc; arg++) {
/* Open file read-only. */
do {
desc = open(argv[arg], O_RDONLY | O_NOCTTY);
} while (desc == -1 && errno == EINTR);
if (desc == -1)
continue;
/* Get statistics on it. */
do {
result = fstat(desc, (struct stat *)&info);
} while (result == -1 && errno == EINTR);
if (result == -1) {
do {
result = close(desc);
} while (result == -1 && errno == EINTR);
continue;
}
/* Check if already listed. */
for (i = 0; i < count; i++)
if (device[i] == info.st_dev && inode[i] == info.st_ino)
break;
if (i < count) {
do {
result = close(desc);
} while (result == -1 && errno == EINTR);
continue;
}
/* Too many? */
if (count >= (size_t)MAX_BINARIES) {
for (i = 0; i < count; i++)
do {
result = close(descriptor[i]);
} while (result == -1 && errno == EINTR);
return EXITSTATUS_TOOMANY;
}
/* Append device and inode to the list. */
device[count] = info.st_dev;
inode[count] = info.st_ino;
descriptor[count] = desc;
count++;
}
/* Any files found? */
if (count < (size_t)1)
return EXITSTATUS_NONE;
/* Open /proc for scanning. */
do {
dir = opendir("/proc/");
} while (!dir && errno == EINTR);
if (!dir) {
for (i = 0; i < count; i++)
do {
result = close(descriptor[i]);
} while (result == -1 && errno == EINTR);
return EXITSTATUS_OPENDIR;
}
/* Scan loop. */
while (1) {
do {
errno = 0;
entry = readdir(dir);
} while (!entry && errno == EINTR);
if (!entry)
break;
#if defined(_DIRENT_HAVE_D_TYPE) && defined(DT_DIR) && defined(DT_UNKNOWN)
/* Skip entries that are not directories. */
if (entry->d_type != DT_DIR && entry->d_type != DT_UNKNOWN)
continue;
#endif
/* Skip entries that do not begin with a digit. */
if (!(entry->d_name[0] >= '0' && entry->d_name[0] <= '9'))
continue;
/* Parse the entry as a PID, but construct /proc/PID simultaneously. */
src = (const char *)&(entry->d_name[0]);
dst = (char *)&(exe[6]); /* Skip "/proc/", which is already there. */
pid = (pid_t)0;
while (*src >= '0' && *src <= '9') {
const pid_t old = pid;
pid = (pid_t)10 * pid + (pid_t)(*src - '0');
/* Check for overflow */
if (pid < old)
break;
*(dst++) = *(src++);
}
/* Not a valid PID? */
if (!pid || *src)
continue;
/* Complete the exe name. */
dst[0] = '/';
dst[1] = 'e';
dst[2] = 'x';
dst[3] = 'e';
dst[4] = 0;
/* Get the statistics on the symlink target.
* Note that this operation may fail, if the process has
* already exited. */
do {
result = stat(exe, (struct stat *)&info);
} while (result == -1 && errno == EINTR);
if (result == -1)
continue;
/* Check if we have a match. */
for (i = 0; i < count; i++)
if (device[i] == info.st_dev && inode[i] == info.st_ino)
break;
/* No match? */
if (i >= count)
continue;
/* Output this pid. */
out_pid(pid);
}
/* Read error? */
if (errno) {
do {
result = closedir(dir);
} while (result == -1 && errno == EINTR);
for (i = 0; i < count; i++)
do {
result = close(descriptor[i]);
} while (result == -1 && errno == EINTR);
out_flush();
return EXITSTATUS_READDIR;
}
/* Close the directory. */
do {
result = closedir(dir);
} while (result == -1 && errno == EINTR);
if (result == -1) {
for (i = 0; i < count; i++)
do {
result = close(descriptor[i]);
} while (result == -1 && errno == EINTR);
out_flush();
return EXITSTATUS_CLOSEDIR;
}
/* Close all descriptors. We no longer need them,
* to keep the device and inode numbers stable. */
for (i = 0; i < count; i++)
do {
result = close(descriptor[i]);
} while (result == -1 && errno == EINTR);
/* Flush output. */
if (out_flush())
return EXITSTATUS_OUTPUT;
/* All done successfully. */
return EXITSTATUS_OK;
}
As usual, you are free to use the code in any way you wish, just do not hold me responsible.
The program will only list the PIDs of processes it is allowed to see, basically those you own. You can override that by installing it setuid root. For the reasons I explained above, I do believe this is safe. As far as I can tell, the filesystem access check override capabilities CAP_FOWNER, CAP_DAC_OVERRIDE, and CAP_DAC_READ_SEARCH do not apply to procfs files; setuid root seems to be the only effective override. Here's how I'd install it, if the above was saved as pidof-executable.c:
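Something along these lines should do; the compiler flags and install path are illustrative:

```shell
# Build the example, then install it setuid root (mode 4755):
gcc -W -Wall -O2 pidof-executable.c -o pidof-executable
sudo install -o root -g root -m 4755 pidof-executable /usr/local/bin/pidof-executable
```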
If there is somebody interested in developing it further, I suggest we take it up on a new thread, and not hijack this one.
Quote:
Originally Posted by ta0kira
One problem with replicating pgrep is that it would have to find information about running processes differently on systems without procfs (e.g. FreeBSD without Linux procfs compatibility enabled.)
I agree. According to Wikipedia, /proc is supported by many, but not all, Unix-like OSes. In FreeBSD, ps seems to be quite tightly coupled to an undocumented sysctl interface to obtain process information when /proc is not mounted; the ps man page says not all information on processes is available then. In other words, on FreeBSD at least, it is very difficult, probably impossible -- since the undocumented interface is liable to change in the future -- to reliably open-code the functionality.
Therefore, forking a slave process early, before allocating lots of memory, and having that process fork() and execute external scripts (ones that can be modified to suit each environment best), is likely the most modular and effective approach in the long term. Both of the example programs I've listed in this thread work fine as external programs too.
On the other hand, when handling lots of data and I/O, there are Linux facilities not generally available on other OSes that are useful in implementing more efficient algorithms. I'm mostly thinking of I/O -- mmap(), madvise()/posix_fadvise(), file leases, and so on -- but also peculiarities (deviations from the POSIX specs) in the threading model. In high-performance computing, specifically running very large simulations, I know those features can be utilized to get more out of the hardware. It should be considered whether portability is sufficiently important to deny the use of those features. (Personally, for tools, commandline and desktop applications, I believe portability trumps Linux-specific features.)
Quote:
Originally Posted by ta0kira
I know that many people who program on Linux often say, "that's ok; I'll never want to run it on FreeBSD, etc." (not you specifically), but it's always nice to not have to redesign non-trivial code if you change your mind.
No, no, I agree! I encounter this very often with shell scripts and the like -- usually when somebody has to work around bugs, inanities, or limitations that could have been taken care of from the get-go, with a bit of thought before implementation. (The most typical one is whitespace, accented, or non-Latin letters in file names.) "That will not happen" is not that different from "I'll only need it here".
In the OP's case, consider the external commands, for example. (I'm assuming this is a real, heavy-duty application, properly installed system-wide, and not just a standalone program compiled and sometimes run by the OP.) If they are hard-coded, you have to recompile the application whenever you want to modify the commands. On the other hand, if you have companion scripts, say /usr/share/application/ntpd-pids.sh and /usr/share/application/hardware-identity.sh, they can be exec'd directly (fork()+exec(), not system()) instead of the commands, without having to somehow configure or edit their names/paths, and the end users or packagers can update and modify the scripts to match different use cases and environments. For example, I might wish to use IPMI tools instead of dmidecode for hardware identification.
With this approach, the communications between the master process and the slave process (doing the fork() and exec()) can be much simplified. Basically, the one bidirectional socket is enough. Let the master process send a fixed-size identifier, telling the slave which script to run. Have the slave process return the exit status when it completes. This is not only portable (socketpair(), fork(), execv() are all in POSIX.1-2001), but should be implementable and extendable rather cleanly. If you want to run multiple external commands simultaneously, let the master process include a unique token in the identifier, and have the slave include that in the response. (It is then just a bit of thread management to sort it out in the master side.) The end result is, it all will just work for end-users, but if they want, they can adapt it to their needs. If they cannot rely on pgrep, they can use for example one of the programs I listed in this thread. Or something totally different.
This program is for an appliance that reconstructs network traffic for analysis in real time (well, as close to "real time" as possible). The 55 GB of memory is allocated early to eliminate the overhead of memory allocation as the program executes. Furthermore, the program:
- is the main task on the system.
- is very tightly integrated with the current hardware configuration.
- is reliant on the Debian OS (due to package dependencies). A Debian specific approach that will not work on FreeBSD will be fine for my purposes. Yes, this is the "that's ok; I'll never want to run it on FreeBSD, etc." approach.
- will only need to call dmidecode once at the beginning. This can be done before the memory is allocated. Hot-plugged hardware will not be a concern.
- may need to call pgrep simultaneously from different threads
- only needs to know whether pgrep and dmidecode were successful. The output files generated by both of these will be parsed by the program.
- currently runs on kernel 2.6.26.
Granted that pgrep is the only call holding me back right now, I am more inclined to use something like the substitute that Nominal has posted. This will be more than enough to solve the current problem. However, if in the future I need to add additional calls to external binaries, forking a child process (as ta0kira initially suggested, and Nominal also stated is the best approach) at the beginning will probably become the better solution. Programs such as pgrep are much more heavily tested than anything I would create, so I would rather have the reliability of those existing binaries over something I create. In addition, I am going to be spending so much time coding the analysis portions of the software that I would like to avoid the overhead of adding custom code when binaries exist that can perform essentially the same task.
Due to the nature of the processing the system performs, I am also hesitant to spawn additional threads and processes. It currently uses all 24 available CPUs simultaneously when under any type of load, so it would be nice if the existing threads and processes could be used to solve this problem.
Last edited by linuxdev817; 08-29-2011 at 11:02 AM.
I tried the code presented by ta0kira, but can't get it to work.
Nothing appears in the worker_loop.
i.e., the fgets just sits there and never receives anything.
I have added a printf in the fake_system call, and it gets the right command text.
But, I expect that the fprintf in there should be picked up by the fgets in the worker_loop !
Has this code actually been tested by anyone else?