LinuxQuestions.org


madhukar87 02-12-2014 02:57 AM

fgets() do not give any output for ls command run using popen
 
Hi Guys,

I am a new member of this community. I have an application that uses popen() extensively, from several functions, to query information from the shell. During long runs I see something strange: fgets() stops returning any output for an ls command. The details are below:

1. My embedded device runs a cross-compiled kernel: “Linux (none) 3.3.8-2.0 mips GNU/Linux”. The problem is in one small module of my application.

2. My code uses popen() to get the complete file name (absolute path) of abc123.bin in a specific directory. The file ‘abc123.bin’ is always present at that path, and the command I run through popen() is “ls /home/xyz/abc*.bin”. No other file there has a name starting with “abc”.

3. Normally it always passes, but it fails during a long-run test that repeats the following steps:

a. Soft reboot using system() with the argument “reboot -f”.
b. Get the file name using popen(), and read it using fgets().
c. Copy the file “/home/xyz/abc123.bin” to some other location.
d. Soft reboot using system() with the argument “reboot -f”.

After 500 to 1000 iterations of these steps, it fails. Below is the code snippet from the function where the error occurs:
//======================Code snippet Start=================================//
char command[256] = {'\0'}, command_op[256] = {'\0'};
FILE *fp = NULL;
/* check if file is present*/
memset(command, '\0', sizeof(command));
snprintf(command,(sizeof(command)-1), "ls %s%s","/home/xyz/","abc*.bin");
printf ("-. command %s\n",command);
fp = popen(command, "r");
if(fp == NULL)
{
printf ("-. FAIL at %s:%s:%d\n",__FILE__,__FUNCTION__,__LINE__); //line number XXX
return -1;
}
if (fgets(command_op, sizeof(command_op), fp) == NULL)
{
printf ("-. fgets FAIL at %s:%s:%d errno %d\n",__FILE__,__FUNCTION__,__LINE__,errno); //line number YYY
return -1;
}
//======================Code snippet End=================================//
The error log looks like this:
----------------------------------------------------------------
-. command ls /home/xyz/abc*.bin
-. fgets FAIL at test.c:test_function:YYY errno 0
----------------------------------------------------------------------------




4. Investigation:

a. popen() does not fail.
b. fgets() fails and returns NULL, and errno is 0 (my understanding is that popen() sets errno and fgets() does not, so since popen() succeeded it stays 0).
c. If I run the same command, “ls /home/xyz/abc*.bin”, at the shell prompt, it returns the correct result.
d. Once the error has occurred, calling the same function again (the one that uses popen() to get the file name) without rebooting the device always fails in fgets(). I tried this by adding 3 retries of the snippet above on failure.
e. If I instead call system("ls /home/xyz/abc*.bin >/tmp/temp_output.txt"); and read the output file using fgets(), it always passes (I tried this by adding the code to the path where fgets() after popen() fails).

5. Questions:

a. Since popen() did not fail, is fgets() always expected to get the output of the command run by popen()?
b. Once the failure has occurred, why do successive retries of popen() + fgets() keep failing, while it passes again after a reboot?
c. Why does system() + fgets() give the right result when popen() + fgets() does not?
d. Is there a better alternative to popen()?

Please help me find the problem/solution; Let me know if more information is needed.
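
On question 5d: since only a file name match is needed, one option that avoids starting a shell and an ls process at all is POSIX glob(3). Nobody in this thread proposed or tested it, so treat the following as a sketch only. The helper name first_match and its return codes are assumptions made for illustration.

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original application):
 * copies the first path matching pattern into out.
 * Returns 0 on a match, 1 if nothing matched, -1 on any other error.
 * Assumes outlen > 0. */
int first_match(const char *pattern, char *out, size_t outlen)
{
    glob_t g;
    int result;

    memset(&g, 0, sizeof g);              /* safe to globfree() on every path */
    int rc = glob(pattern, 0, NULL, &g);  /* expands the wildcard in-process */

    if (rc == 0 && g.gl_pathc > 0) {
        snprintf(out, outlen, "%s", g.gl_pathv[0]);
        result = 0;
    } else if (rc == GLOB_NOMATCH) {
        result = 1;
    } else {
        result = -1;                      /* GLOB_NOSPACE or GLOB_ABORTED */
    }

    globfree(&g);
    return result;
}
```

With this, first_match("/home/xyz/abc*.bin", command_op, sizeof(command_op)) would take the place of the popen() + fgets() pair. It does not explain why ls crashes, but it removes the child process from this code path.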

estabroo 02-12-2014 01:53 PM

I don't have a good answer for you but I do have an investigation suggestion. fgets is returning null which means reading on the popen is returning EOF (since you didn't have a popen failure). Where you output the fgets failure do a pclose and look at the return value. It may give you some clue as to why you are getting the EOF without any other output from the ls command.

excerpts from popen manpage
The pclose() function waits for the associated process to terminate and returns the exit status of the command as returned by wait4(2).
The pclose() function returns -1 if wait4(2) returns an error, or some other error is detected. In the event of an error, these functions set errno to indicate the cause of the error.
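
A minimal sketch of that check, assuming a hypothetical helper named first_line_of (the name, signature, and return codes are not from the original code). It always calls pclose(), even when fgets() returns NULL, and decodes the status with the macros from wait(2):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper (name and return codes are assumptions):
 * runs cmd through popen(), stores its first output line in out,
 * and always calls pclose() so the child's wait status is not lost.
 * Returns 0 if a line was read, -2 if the command printed nothing,
 * -1 if popen() or pclose() itself failed. Assumes outlen > 0. */
int first_line_of(const char *cmd, char *out, size_t outlen, int *status)
{
    FILE *fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;

    int got_line = (fgets(out, (int)outlen, fp) != NULL);
    if (got_line)
        out[strcspn(out, "\n")] = '\0';   /* strip the trailing newline */
    else
        out[0] = '\0';

    *status = pclose(fp);                 /* a wait status, not a PID */
    if (*status == -1)
        return -1;

    /* The macros take the status by value. */
    if (WIFEXITED(*status))
        fprintf(stderr, "exit code %d\n", WEXITSTATUS(*status));
    else if (WIFSIGNALED(*status))
        fprintf(stderr, "killed by signal %d\n", WTERMSIG(*status));

    return got_line ? 0 : -2;
}
```

A return of -2 together with WIFSIGNALED(status) being true would mean the command died before printing anything, which is different from ls simply finding no file.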

madhukar87 02-13-2014 09:35 PM

Hi,

As suggested, I captured the return value of pclose() when the issue occurs. I got the same result in two separate long runs. My observations:
“errno: 0, ret_value: 11”: these are the printed errno and the return value of pclose().
This may be useful: my application's PID is 730, and the ‘khelper’ process has PID 11.
From the man page: “waitpid(): on success, returns the process ID of the child whose state has changed; if WNOHANG was specified and one or more child(ren) specified by pid exist, but have not yet changed state, then 0 is returned. On error, -1 is returned.”
I am still confused about the meaning of the return value ‘11’. According to the man page it should be the return value of waitpid(), but ‘11’ is the PID of the ‘khelper’ process, which was not created by popen(). And errno is 0, so there should not have been an error.

estabroo 02-14-2014 11:42 AM

I believe the pclose return value is the status portion rather than the pid of the child process, so you should be able to use the status macros (man 2 wait) to pull out information. 11 is odd as a return value though since I believe the lower 8 bits are the exit value and standard (gnu) ls only returns 0, 1, and 2 (iirc). You might need to peruse the source code of your ls to see what meaning the exit value has.

madhukar87 02-18-2014 05:14 AM

I wrote a sample program to decode the return status ‘11’ as described in ‘man 2 wait’. The code and its output are below:

Code:
//===============================================================//
int ret_value = -1;
int *status = 0;
int status_variable = 11;
status = &status_variable;
if(ret_value = WIFEXITED(status))
{
printf("WIFEXITED true ret_value :%d\n",ret_value);
}
else
{
printf("WIFEXITED false ret_value :%d\n",ret_value);
}
ret_value = WEXITSTATUS(status);
printf("WEXITSTATUS ret_value :%d\n",ret_value);

if(ret_value = WIFSIGNALED(status))
{
printf("WIFSIGNALED true ret_value :%d\n",ret_value);
}
else
{
printf("WIFSIGNALED false ret_value :%d\n",ret_value);
}
ret_value = WTERMSIG(status);
printf("WTERMSIG ret_value :%d\n",ret_value);
if(ret_value = WCOREDUMP(status))
{
printf("WCOREDUMP true ret_value :%d\n",ret_value);
}
else
{
printf("WCOREDUMP false ret_value :%d\n",ret_value);
}
if(ret_value = WIFSTOPPED(status))
{
printf("WIFSTOPPED true ret_value :%d\n",ret_value);
}
else
{
printf("WIFSTOPPED false ret_value :%d\n",ret_value);
}
ret_value = WSTOPSIG(status);
printf("WSTOPSIG true ret_value :%d\n",ret_value);

if(ret_value = WIFCONTINUED(status))
{
printf("WIFCONTINUED true ret_value :%d\n",ret_value);
}
else
{
printf("WIFCONTINUED false ret_value :%d\n",ret_value);
}
//================================================================//

Output:
//================================================================//
WIFEXITED false ret_value :0
WEXITSTATUS ret_value :2
WIFSIGNALED true ret_value :1
WTERMSIG ret_value :44
WCOREDUMP true ret_value :128
WIFSTOPPED false ret_value :0
WSTOPSIG true ret_value :2
WIFCONTINUED false ret_value :0
//================================================================//

Discussion:
1. According to the output above, the process did not exit normally. It may have been terminated by a signal, but the signal number comes out as ‘44’, which does not look like a valid signal number.
2. Could the return value ‘11’ be “SIGSEGV: 11: Invalid memory reference”? I am not sure whether pclose() returns a signal number.
Please let me know whether the testing above is OK, and what else we can try to find the root cause of this issue.

estabroo 02-19-2014 11:40 AM

You are correct in number 2: it is a SIGSEGV. Your check for it is incorrect, though. The macros that examine the return value take the value itself, not a pointer to it. In the wait function calls you pass a pointer to your variable, but in the macros you use the variable itself.

WIFEXITED(status_variable)
WIFSIGNALED(status_variable)
WTERMSIG(status_variable)
...
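
As a small illustration of passing the status by value, here is a sketch with two hypothetical helpers (terminating_signal and print_status are names chosen for this example, not part of any library):

```c
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: returns the terminating signal encoded in a wait
 * status, or 0 if the child was not killed by a signal.
 * The status is passed by value, exactly as pclose() returned it. */
int terminating_signal(int status)
{
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}

/* Hypothetical helper: prints a one-line description of a wait status. */
void print_status(int status)
{
    if (WIFEXITED(status))
        printf("exited normally, code %d\n", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("killed by signal %d\n", WTERMSIG(status));
    else
        printf("unrecognised status 0x%x\n", (unsigned)status);
}
```

On Linux, calling print_status(11) reports a kill by signal 11, whereas passing a pointer to the variable feeds the macros an unrelated address.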


So with that in hand, I'd guess you have memory corruption, potentially a bad RAM module where you only hit the bad spot every once in a while. You could run an extensive memtest on the RAM and see if that shows the issue.

On the fly, you might be able to test it by flushing the Linux file buffer cache when you see the issue and re-running the program (without rebooting).

(might want to sync first, but since you are just doing test, I'm guessing no outstanding data needs to be written out)
sysctl -w vm.drop_caches=3


Release memory used by the Linux kernel in caches
=1 --> to free pagecache
=2 --> to free dentries and inodes
=3 --> to free pagecache, dentries and inodes

madhukar87 02-24-2014 11:56 PM

Hi,

I flushed the caches using the code below, but even after that, fgets() still gets no output after popen().

code:
///---------------------------------------------------//
system("free");
system("sync; echo 1 > /proc/sys/vm/drop_caches");
system("sync; echo 2 > /proc/sys/vm/drop_caches");
system("sync; echo 3 > /proc/sys/vm/drop_caches");
system("free");
//-----------------------------------------------------//

output:
//------------------------------------------------------//
             total       used       free     shared    buffers
Mem:        769272     104920     664352          0          0
-/+ buffers:           104920     664352
Swap:            0          0          0
             total       used       free     shared    buffers
Mem:        769272     104424     664848          0          0
-/+ buffers:           104424     664848
Swap:            0          0          0
//------------------------------------------------------//

I also tried flushing the caches from the command line, but after flushing, fgets() still gets no output.

estabroo 02-25-2014 08:01 AM

I'd run the memtest and see if it shows any issues.

madhukar87 03-06-2014 01:00 AM

I tried some memory tests but could not find any issue. I am very confused about how to find the root cause of this problem.

estabroo 03-12-2014 08:25 AM

Sorry it's been a while.

If you have any more of these embedded devices, run the same test on them and see if any of them exhibit the same behavior. With regard to the current one, how long did you run the memory tests? Was it about the same amount of time as when you run your reboot test?

The reason I ask is that, since it is an intermittent issue that occurs after a time, it might be a thermal one. By that I mean some chip on the board might be heating up and going out of spec, a weak solder joint might separate once it has warmed up, the voltage regulator might be failing under thermal load, ...

If you have a thermal camera, you could take images of the board under load and see if anything looks unusually warm. If you don't have one, touch the top of the chips (wear a grounding strap); they'll be warm but shouldn't be hot (the voltage regulator might be an exception if it has to dissipate a lot of excess energy from its input voltage).


All times are GMT -5.