LinuxQuestions.org
Old 02-12-2014, 02:57 AM   #1
madhukar87
LQ Newbie
 
Registered: Feb 2014
Posts: 5

Rep: Reputation: Disabled
fgets() does not return any output for an ls command run using popen()


Hi Guys,

I am a new member of this community. I have an application that uses popen() extensively, from several functions, to query information from the shell. During long runs I see something strange: fgets() stops returning any output for an ls command. Details of the problem are below:

1. I am using a cross-compiled kernel on my embedded device: “Linux (none) 3.3.8-2.0 mips GNU/Linux”. The problem is in one small module of my application.

2. My code uses popen() to get the complete file name (the absolute path of abc123.bin) at a specific path, where the file ‘abc123.bin’ is always present in the file system. The command I run through popen() is “ls /home/xyz/abc*.bin”. No other file there has a name starting with “abc”.

3. Normally this always passes, but it fails during a long-run test that does the following:

a. Soft reboot using the system() call with the argument “reboot -f”.
b. Get the file name using popen() and read it using fgets().
c. Copy the file “/home/xyz/abc123.bin” to some other location.
d. Soft reboot using the system() call with the argument “reboot -f”.
After repeating the above steps, it fails after 500~1000 executions. Below is the code snippet from the function where the error occurs:
//======================Code snippet Start=================================//
char command[256] = {'\0'}, command_op[256] = {'\0'};
FILE *fp = NULL;
/* check if file is present */
memset(command, '\0', sizeof(command));
snprintf(command, (sizeof(command) - 1), "ls %s%s", "/home/xyz/", "abc*.bin");
printf("-. command %s\n", command);
fp = popen(command, "r");
if (fp == NULL)
{
    printf("-. FAIL at %s:%s:%d\n", __FILE__, __FUNCTION__, __LINE__); //line number XXX
    return -1;
}
if (fgets(command_op, sizeof(command_op), fp) == NULL)
{
    printf("-. fgets FAIL at %s:%s:%d errno %d\n", __FILE__, __FUNCTION__, __LINE__, errno); //line number YYY
    return -1;
}
//======================Code snippet End=================================//
Error logs are like:
----------------------------------------------------------------
-. command ls /home/xyz/abc*.bin
-. fgets FAIL at test.c:test_function:YYY errno 0
----------------------------------------------------------------------------




4. Investigation:

a. popen() does not fail.
b. fgets() fails and returns NULL; errno is 0 (my understanding is that popen() sets errno and fgets() does not, and since popen() passed it stays 0).
c. If I run the same command, “ls /home/xyz/abc*.bin”, at the shell prompt, it returns the correct result.
d. Once this error has occurred, calling the same function (which uses popen() to get the file name) again without rebooting the device always fails in fgets(). I tried adding 3 retries of the above snippet on failure.
e. If I call system("ls /home/xyz/abc*.bin >/tmp/temp_output.txt"); and read the output file using fgets(), it always passes (I tried this by adding the code to the path taken when fgets() after popen() fails).

5. Questions:

a. Since popen() does not fail, is fgets() always expected to get the output of the command run by popen()?
b. Why do successive retries of popen() + fgets() not get the expected result once the failure has occurred, although it passes if we reboot and try again?
c. Why does system() + fgets() give the right result when popen() + fgets() does not?
d. Is there a better alternative to popen()?

Please help me find the problem or a solution, and let me know if more information is needed.
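
On question 5d, one possible alternative for this particular lookup is glob(3), which expands the pattern inside the calling process, so no shell and no ls are spawned. The following is a minimal sketch, not something proposed in the thread; the helper name find_first_match and its return convention are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper (not from the original application): copy the first
 * path matching `pattern` into `out`.
 * Returns 0 on a match, 1 if nothing matched, -1 on any other error
 * (including a result too long for `out`). */
int find_first_match(const char *pattern, char *out, size_t out_size)
{
    glob_t results = {0};
    int rc = glob(pattern, 0, NULL, &results);

    if (rc != 0) {
        globfree(&results);
        return (rc == GLOB_NOMATCH) ? 1 : -1;
    }

    /* Without GLOB_NOSORT the matches are sorted, so index 0 is the first. */
    int written = snprintf(out, out_size, "%s", results.gl_pathv[0]);
    globfree(&results);

    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

For the case in this post, a call such as find_first_match("/home/xyz/abc*.bin", command_op, sizeof(command_op)) would fill command_op with the file name without a trailing newline. Whether it avoids the failure seen here is untested; it only removes the child process from the picture.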
 
Old 02-12-2014, 01:53 PM   #2
estabroo
Senior Member
 
Registered: Jun 2008
Distribution: debian, ubuntu, sidux
Posts: 1,126
Blog Entries: 2

Rep: Reputation: 124
I don't have a good answer for you but I do have an investigation suggestion. fgets() is returning NULL, which means the read on the popen() stream is hitting EOF (since you didn't have a popen() failure). Where you print the fgets() failure, do a pclose() and look at the return value. It may give you some clue as to why you are getting the EOF without any other output from the ls command.

Excerpts from the popen man page:
The pclose() function waits for the associated process to terminate and returns the exit status of the command as returned by wait4(2).
The pclose() function returns -1 if wait4(2) returns an error, or some other error is detected. In the event of an error, these functions set errno to indicate the cause of the error.
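
The suggestion above can be sketched roughly as follows: tell EOF apart from a read error with ferror(), and always call pclose() so the child's wait status is available to inspect. The helper name read_first_line and its return convention are assumptions for illustration, not code from the application.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: run `cmd`, read its first output line into `line`,
 * and store pclose()'s return value in *status (-1 if popen() failed).
 * Returns 0 if a line was read, 1 on EOF with no output, -1 on error. */
int read_first_line(const char *cmd, char *line, int line_size, int *status)
{
    *status = -1;

    FILE *fp = popen(cmd, "r");
    if (fp == NULL) {
        return -1;
    }

    int result = 0;
    if (fgets(line, line_size, fp) == NULL) {
        /* NULL means either a read error or EOF before any data arrived. */
        result = ferror(fp) ? -1 : 1;
    }

    /* Always reap the child; the wait status says how the command ended. */
    *status = pclose(fp);
    return result;
}
```

A return of 1 together with a non-zero status would point at the command itself (for example ls dying before it wrote anything) rather than at fgets().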
 
Old 02-13-2014, 09:35 PM   #3
madhukar87
LQ Newbie
 
Registered: Feb 2014
Posts: 5

Original Poster
Rep: Reputation: Disabled
Hi,

As suggested, I captured the return value of pclose() when the issue occurs. I got the same result in two long runs of the application. Below are the observations:
“errno: 0, ret_value: 11”: these are the printed values of errno and the return value of pclose().
This info may be helpful: my application's PID is 730 and the ‘khelper’ process has PID 11.
“waitpid(): on success, returns the process ID of the child whose state has changed; if WNOHANG was specified and one or more child(ren) specified by pid exist, but have not yet changed state, then 0 is returned. On error, -1 is returned.”
I am still confused about what the return value ‘11’ means. As per the man page it should be the return value of waitpid(). But ‘11’ is the PID of the ‘khelper’ process, which is not the process created by popen(). And errno is 0, so there should not be any error.
 
Old 02-14-2014, 11:42 AM   #4
estabroo
Senior Member
 
Registered: Jun 2008
Distribution: debian, ubuntu, sidux
Posts: 1,126
Blog Entries: 2

Rep: Reputation: 124
I believe the pclose return value is the status portion rather than the pid of the child process, so you should be able to use the status macros (man 2 wait) to pull out information. 11 is odd as a return value though since I believe the lower 8 bits are the exit value and standard (gnu) ls only returns 0, 1, and 2 (iirc). You might need to peruse the source code of your ls to see what meaning the exit value has.
 
Old 02-18-2014, 05:14 AM   #5
madhukar87
LQ Newbie
 
Registered: Feb 2014
Posts: 5

Original Poster
Rep: Reputation: Disabled
I wrote a sample program to decode the return status ‘11’ using the macros described in ‘man 2 wait’. Below are the code and its output:

Code:
//===============================================================//
int ret_value = -1;
int *status = 0;
int status_variable = 11;
status = &status_variable;
if(ret_value = WIFEXITED(status))
{
printf("WIFEXITED true ret_value :%d\n",ret_value);
}
else
{
printf("WIFEXITED false ret_value :%d\n",ret_value);
}
ret_value = WEXITSTATUS(status);
printf("WEXITSTATUS ret_value :%d\n",ret_value);

if(ret_value = WIFSIGNALED(status))
{
printf("WIFSIGNALED true ret_value :%d\n",ret_value);
}
else
{
printf("WIFSIGNALED false ret_value :%d\n",ret_value);
}
ret_value = WTERMSIG(status);
printf("WTERMSIG ret_value :%d\n",ret_value);
if(ret_value = WCOREDUMP(status))
{
printf("WCOREDUMP true ret_value :%d\n",ret_value);
}
else
{
printf("WCOREDUMP false ret_value :%d\n",ret_value);
}
if(ret_value = WIFSTOPPED(status))
{
printf("WIFSTOPPED true ret_value :%d\n",ret_value);
}
else
{
printf("WIFSTOPPED false ret_value :%d\n",ret_value);
}
ret_value = WSTOPSIG(status);
printf("WSTOPSIG true ret_value :%d\n",ret_value);

if(ret_value = WIFCONTINUED(status))
{
printf("WIFCONTINUED true ret_value :%d\n",ret_value);
}
else
{
printf("WIFCONTINUED false ret_value :%d\n",ret_value);
}
//================================================================//

Output:
//================================================================//
WIFEXITED false ret_value :0
WEXITSTATUS ret_value :2
WIFSIGNALED true ret_value :1
WTERMSIG ret_value :44
WCOREDUMP true ret_value :128
WIFSTOPPED false ret_value :0
WSTOPSIG true ret_value :2
WIFCONTINUED false ret_value :0
//================================================================//

Discussion:
1. As per the above output, it seems the process did not exit normally. The reason may be a signal, but the signal number comes out as ‘44’, which is not a valid signal number.
2. Is it possible that this return value ‘11’ is “SIGSEGV: 11: Invalid memory reference”? I am not sure whether pclose() returns a signal number.
Is the above testing OK? And what else can we try to find the root cause of this issue?
 
Old 02-19-2014, 11:40 AM   #6
estabroo
Senior Member
 
Registered: Jun 2008
Distribution: debian, ubuntu, sidux
Posts: 1,126
Blog Entries: 2

Rep: Reputation: 124
You are correct in number 2: it is a SIGSEGV. Your check for it is incorrect, though. The macros that examine the return value take the value itself, not a pointer to the value. In the wait function calls you pass a pointer to your variable, but in the macros you use the actual variable.

WIFEXITED(status_variable)
WIFSIGNALED(status_variable)
WTERMSIG(status_variable)
...
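
As a small sketch of the by-value usage described above, the helpers below decode a status such as the 11 returned by pclose(). The names termination_signal and print_status are hypothetical. On Linux the low seven bits of the wait status hold the terminating signal, so a status of 11 decodes as killed by signal 11 (SIGSEGV); POSIX itself only specifies the macros, not the bit layout.

```c
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: return the signal that terminated the child,
 * or 0 if the status does not describe death by signal. */
int termination_signal(int status)
{
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Hypothetical helper: print a one-line description of a status value
 * as returned by pclose() or stored by wait()/waitpid(). */
void print_status(int status)
{
    if (status == -1) {
        printf("pclose() itself failed\n");
    } else if (WIFEXITED(status)) {
        printf("exited with code %d\n", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        printf("killed by signal %d\n", WTERMSIG(status));
    } else {
        printf("unrecognized status %d\n", status);
    }
}
```

Note that the macros receive the int itself. Passing a pointer, as in the earlier test program, makes them decode the bits of an address, which would explain the meaningless values such as signal 44.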


So with that in hand I'd guess you have memory corruption, potentially a bad RAM module, where you only hit the bad spot in RAM every once in a while. You could run an extensive memtest on the RAM and see if that shows the issue.

On the fly you might be able to test it by flushing the linux file buffer cache when you see the issue and re-running the program (without rebooting).

(might want to sync first, but since you are just doing test, I'm guessing no outstanding data needs to be written out)
sysctl -w vm.drop_caches=3


Release memory used by the Linux kernel in caches
=1 --> to free pagecache
=2 --> to free dentries and inodes
=3 --> to free pagecache, dentries and inodes
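
The same flush can also be done from C without going through a shell. This is only a sketch under stated assumptions: the helper name drop_caches_at is hypothetical, the control file is passed in as a parameter (on a real system it would be /proc/sys/vm/drop_caches), and writing to that file requires root.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write dirty pages out with sync(), then write
 * `mode` (1, 2 or 3, as listed above) to the control file at `path`.
 * Returns 0 on success, -1 on a bad mode or any I/O failure. */
int drop_caches_at(const char *path, int mode)
{
    if (mode < 1 || mode > 3) {
        return -1;
    }

    sync();

    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        return -1;
    }

    int ok = fprintf(fp, "%d\n", mode) > 0;
    /* fclose() flushes the buffer, so a rejected write shows up here. */
    if (fclose(fp) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

Taking the path as an argument is a design choice for this sketch so that it can be tried against an ordinary file before pointing it at /proc.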

Last edited by estabroo; 02-19-2014 at 11:44 AM. Reason: put in macro example
 
Old 02-24-2014, 11:56 PM   #7
madhukar87
LQ Newbie
 
Registered: Feb 2014
Posts: 5

Original Poster
Rep: Reputation: Disabled
Hi,

I flushed the caches using the code below, but even after this fgets() still gets no output after popen().

code:
///---------------------------------------------------//
system("free");
system("sync; echo 1 > /proc/sys/vm/drop_caches");
system("sync; echo 2 > /proc/sys/vm/drop_caches");
system("sync; echo 3 > /proc/sys/vm/drop_caches");
system("free");
//-----------------------------------------------------//

output:
//------------------------------------------------------//
              total       used       free     shared    buffers
Mem:         769272     104920     664352          0          0
-/+ buffers:            104920     664352
Swap:             0          0          0
              total       used       free     shared    buffers
Mem:         769272     104424     664848          0          0
-/+ buffers:            104424     664848
Swap:             0          0          0
//------------------------------------------------------//

I also tried flushing the caches from the command line, but after flushing, fgets() is still unable to get any output.
 
Old 02-25-2014, 08:01 AM   #8
estabroo
Senior Member
 
Registered: Jun 2008
Distribution: debian, ubuntu, sidux
Posts: 1,126
Blog Entries: 2

Rep: Reputation: 124
I'd run the memtest and see if it shows any issues.
 
Old 03-06-2014, 01:00 AM   #9
madhukar87
LQ Newbie
 
Registered: Feb 2014
Posts: 5

Original Poster
Rep: Reputation: Disabled
I tried some memory tests but could not see any issue. I am very confused about how to find the root cause of this problem.
 
Old 03-12-2014, 08:25 AM   #10
estabroo
Senior Member
 
Registered: Jun 2008
Distribution: debian, ubuntu, sidux
Posts: 1,126
Blog Entries: 2

Rep: Reputation: 124
Sorry it's been a while.

If you have any other of these embedded devices, run the same test on them and see if any of them exhibit the same behavior. With regard to the current one, how long did you run the memory tests? Was it about the same amount of time as when you run your reboot test?

The reason I ask is that, since it is an intermittent issue that occurs after a time, it might be a thermal one. By that I mean some chip on the board might be heating up and going out of spec, a weak solder joint might separate once it's warmed up, the voltage regulator might be failing under thermal load, ...

If you have a thermal camera you could take images of the board under load and see if anything looks unusually warm. If you don't have one, touch the top of the chips (wear a grounding strap): they'll be warm but shouldn't be hot (the voltage regulator might be an exception to that if it has to dissipate a lot of excess energy from its input voltage).
 
  

