fgets() do not give any output for ls command run using popen
Hi Guys,
I am a new member of this community. I have an application that uses popen() extensively, from several functions, to query information from the shell. In long-running tests I see something strange: fgets() stops returning any output for an ls command. Details of the problem are below:
1. I am using a cross-compiled kernel on my embedded device: “Linux (none) 3.3.8-2.0 mips GNU/Linux”. The problem is in one small module of my application.
2. In my code I use popen() to get the full name (absolute path) of the file abc123.bin in a specific directory. The file is always present at that path, and the command I run through popen() is “ls /home/xyz/abc*.bin”. No other file there has a name starting with “abc”.
3. Normally this always passes, but it fails during a long-run test that does the following:
a. Soft reboot using system() with the argument “reboot -f”.
b. Get the file name using popen(), and read it using fgets().
c. Copy the file “/home/xyz/abc123.bin” to some other location.
d. Soft reboot using system() with the argument “reboot -f”.
After these steps it fails after 500~1000 iterations. Below is the code snippet from the function where the error occurs:
//======================Code snippet Start=================================//
char command[256] = {'\0'}, command_op[256] = {'\0'};
FILE *fp = NULL;

/* check if file is present */
memset(command, '\0', sizeof(command));
snprintf(command, (sizeof(command)-1), "ls %s%s", "/home/xyz/", "abc*.bin");
printf("-. command %s\n", command);
fp = popen(command, "r");
if (fp == NULL)
{
    printf("-. FAIL at %s:%s:%d\n", __FILE__, __FUNCTION__, __LINE__); //line number XXX
    return -1;
}
if (fgets(command_op, sizeof(command_op), fp) == NULL)
{
    printf("-. fgets FAIL at %s:%s:%d errno %d\n", __FILE__, __FUNCTION__, __LINE__, errno); //line number YYY
    return -1;
}
//======================Code snippet End=================================//
Error logs are like:
----------------------------------------------------------------
-. command ls /home/xyz/abc*.bin
-. fgets FAIL at test.c:test_function:YYY errno 0
----------------------------------------------------------------------------
4. Investigation:
a. No failure in popen.
b. fgets() fails and returns NULL, and errno is 0 (my understanding is that popen() sets errno and fgets() does not, and popen() passed, so it is 0).
c. If I run the same command, “ls /home/xyz/abc*.bin”, at the shell prompt, it returns the correct result.
d. If, after getting this error, I call the same function (which uses popen() to get the file name) again without rebooting the device, it always fails in fgets(). I tried adding 3 retries of the above snippet on failure.
e. If I call system("ls /home/xyz/abc*.bin >/tmp/temp_output.txt"); and read the output file using fgets(), it always passes (I tried this by adding the code in the path where fgets() after popen() fails).
5. Questions:
a. Since popen() does not fail, is fgets() always expected to get the output of the command run by popen()?
b. Why do successive retries of (popen() + fgets()) not get the expected result once the failure has happened, although it passes if we reboot and try again?
c. Why does (system() + fgets()) give the right result when (popen() + fgets()) does not?
d. Is there a better alternative to popen()?
Please help me find the problem/solution, and let me know if more information is needed.
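On question (d), one option that avoids starting a shell and ls at all is to expand the wildcard in-process with glob(3). The following is only a minimal sketch under that assumption, not a diagnosis of the failure above; the helper name first_match() and its return codes are made up for illustration.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the first path matching `pattern` into `out`.
 * Returns 0 on success, 1 if nothing matched, -1 on any other error
 * (including a result that does not fit in `out`). */
static int first_match(const char *pattern, char *out, size_t out_len)
{
    glob_t g;
    int rc;

    assert(pattern != NULL && out != NULL && out_len > 0);
    memset(&g, 0, sizeof(g));

    /* glob() expands the wildcard inside this process: no fork, no shell. */
    rc = glob(pattern, 0, NULL, &g);
    if (rc != 0)
        return rc == GLOB_NOMATCH ? 1 : -1;

    if (strlen(g.gl_pathv[0]) >= out_len) {
        globfree(&g);
        return -1;
    }
    snprintf(out, out_len, "%s", g.gl_pathv[0]);
    globfree(&g);
    return 0;
}
```

With the path from the post, the call would look like first_match("/home/xyz/abc*.bin", command_op, sizeof(command_op)). Because no child process is created, a crash in the child cannot show up as an empty read.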
I don't have a good answer for you, but I do have an investigation suggestion. fgets is returning NULL, which means the read on the popen stream is hitting EOF (since you didn't have a popen failure). At the point where you print the fgets failure, call pclose and look at its return value. It may give you some clue as to why you are getting EOF without any output from the ls command.
Excerpts from the popen man page:
The pclose() function waits for the associated process to terminate and returns the exit status of the command as returned by wait4(2).
The pclose() function returns -1 if wait4(2) returns an error, or some other error is detected. In the event of an error, these functions set errno to indicate the cause of the error.
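A minimal sketch of that suggestion, assuming a hypothetical helper named first_line_of() (not from the original code): it reads the first line of the command's output, uses ferror() to tell a plain EOF from a read error, and always calls pclose() so the raw status is available to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run `command`, store its first output line
 * (newline stripped) in `line`, and store the raw pclose() result in *status.
 * Returns 0 if a line was read, 1 on EOF with no output,
 * -1 if popen() failed or the read reported an error. */
static int first_line_of(const char *command, char *line, size_t len, int *status)
{
    FILE *fp;
    int result = 0;

    assert(command != NULL && line != NULL && len > 0 && status != NULL);
    line[0] = '\0';
    *status = -1;

    fp = popen(command, "r");
    if (fp == NULL)
        return -1;

    if (fgets(line, (int)len, fp) == NULL)
        result = ferror(fp) ? -1 : 1;      /* NULL without ferror() means EOF */
    else
        line[strcspn(line, "\n")] = '\0';

    /* Always reap the child; the status says how the command ended. */
    *status = pclose(fp);
    return result;
}
```

A return of 1 together with a non-zero *status would match the situation in this thread: the command produced no output and did not end normally.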
As suggested, I captured the return value of pclose() when the issue occurs. I got the same result in two separate long runs of the application. Observations:
“errno: 0, ret_value: 11”: these are the printed errno and the return value of pclose().
This may be helpful: the PID of my application is 730 and the PID of the ‘khelper’ process is 11.
“waitpid(): on success, returns the process ID of the child whose state has changed; if WNOHANG was specified and one or more child(ren) specified by pid exist, but have not yet changed state, then 0 is returned. On error, -1 is returned.”
I am still confused about the meaning of the return value ‘11’. As per the man page it should be the return value of waitpid(). But ‘11’ is the PID of the ‘khelper’ process, which is not the process created by popen(). And errno is 0, so there should not be any error.
I believe the pclose return value is the status portion rather than the pid of the child process, so you should be able to use the status macros (man 2 wait) to pull out information. 11 is odd as a return value though since I believe the lower 8 bits are the exit value and standard (gnu) ls only returns 0, 1, and 2 (iirc). You might need to peruse the source code of your ls to see what meaning the exit value has.
Discussion:
1. As per the above output, it seems the process did not exit properly. The reason could be that it caught some signal, but the signal number comes out as ‘44’, which is not a valid signal number.
2. Could this return value ‘11’ be “SIGSEGV: 11: Invalid memory reference”? I am not sure whether pclose() returns a signal number.
Let me know whether the above testing is OK, and what else we can try to find the root cause of this issue.
You are correct in number 2: it is a SIGSEGV. Your check for it is incorrect, though. The macros that examine the return value take the value itself, not a pointer to it. In the wait function calls you pass a pointer to your variable, but in the macros you use the variable itself.
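The point about the macros can be sketched as follows. This assumes the usual Linux status encoding; describe_status() is a made-up helper name, and the status is passed to the macros by value.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: describe a wait-style status, such as the value
 * returned by pclose(). The macros receive the int itself, not a pointer. */
static void describe_status(int status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (status == -1)
        snprintf(buf, len, "pclose failed");
    else if (WIFEXITED(status))
        snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else
        snprintf(buf, len, "unrecognized status %d", status);
}
```

On Linux, a raw status of 11 decodes as termination by signal 11 (SIGSEGV), while a normal exit with code 11 would carry the code in the next byte up (11 << 8).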
So with that in hand, I'd guess you have memory corruption, potentially a bad RAM module where you only hit the bad spot once in a while. You could run an extensive memtest on the RAM and see if that shows the issue.
On the fly, you might be able to test it by flushing the Linux file buffer cache when you see the issue and re-running the program (without rebooting).
(You might want to sync first, but since you are just testing, I'm guessing no outstanding data needs to be written out.)
sysctl -w vm.drop_caches=3
Release memory used by the Linux kernel in caches
=1 --> to free pagecache
=2 --> to free dentries and inodes
=3 --> to free pagecache, dentries and inodes
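If it is easier to trigger this from inside the application when the failure is detected, the same thing can be done by writing the value to /proc/sys/vm/drop_caches. A rough sketch, assuming the process runs as root; the function name drop_caches() and the path parameter are illustrative only (the path is a parameter so the sketch can be tried against an ordinary file first).

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `mode` (1, 2 or 3) to the file at `path`,
 * normally "/proc/sys/vm/drop_caches". Returns 0 on success, -1 on error. */
static int drop_caches(const char *path, int mode)
{
    FILE *f;

    assert(path != NULL);
    if (mode < 1 || mode > 3)
        return -1;

    sync();                      /* write out dirty data first */

    f = fopen(path, "w");
    if (f == NULL)
        return -1;               /* typically a permissions problem */
    if (fprintf(f, "%d\n", mode) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

The call drop_caches("/proc/sys/vm/drop_caches", 3) would correspond to the sysctl command above.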
Last edited by estabroo; 02-19-2014 at 11:44 AM.
Reason: put in macro example
If you have any other of these embedded devices, run the same test on them and see if any of them exhibit the same behavior. With regard to the current one, how long did you run the memory tests? Was it about the same amount of time as when you run your reboot test?
The reason I ask is that, since it is an intermittent issue that occurs after a time, it might be a thermal one. By that I mean some chip on the board might be heating up and going out of spec, a weak solder joint might be separating once it's warmed up, the voltage regulator might be failing under thermal load, ...
If you have a thermal camera, you could take images of the board under load and see if anything looks unusually warm. If you don't have one, touch the top of the chips (wear a grounding strap): they'll be warm but shouldn't be hot (the voltage regulator might be an exception if it has to dissipate a lot of excess energy from its input voltage).