LinuxQuestions.org
Old 08-08-2010, 04:48 PM   #1
wroom
Member
 
Registered: Dec 2009
Location: Sweden
Posts: 83

Rep: Reputation: 24
redirecting stdin, stdout and stderr, and finding files name and other stats


I would like to have a general discussion about C programming redirected io, file names, link handling and related.

If you have the time and interest to read this, and share the curiosity or have good answers to give, then please comment.

Intro:
I'm working on an application used for backup/archiving. It can archive the contents of block devices and tapes as well as regular files. The application stores data in tightly packed, low-redundancy heaps with multiple indexes pointing to uniquely stored (shared) fragments in the heap. It supports taking and reverting to snapshots of the total storage on several computers running different OSes, as well as simply archiving single files. It uses Hamming code diversity to defeat disk rot, instead of RAID arrays, which have proven to become pretty much useless when the arrays climb over some terabytes in size. It is intended to be a distributed CMS (content management system) for a diversity of platforms, with a focus on secure storage/archiving.


In doing this heap of a job, I stumbled into the topic of how to manage multiple files/devices/pipes/fifos with a common method.

Let's say I have a unix shell tool that acts like gzip, cat, dd etc. in being able to pipe data between applications.
Example:
dd if=/dev/sda bs=1b | gzip -cq > my.sda.raw.gz

the tool can handle different files in a struct array, like:
Code:
enum FilesOpenStatusValue {
FileIsClosed = 0,
FileIsOpen,
FileIsFopen,
FileIsPopen
};

// Struct declarations
//

// Definition of FileRecord
typedef struct FileRecord_t {
  char name[FILENAME_MAX];        // File name
  int open;                       // Does the file exist? Is the file opened? And how is the file opened?
  union FileHandle_u {
    FILE * stream;                // Stream pointer
    int handle;                   // File handle
  } f;
  int flags;                      // File open flags
  int statStatus;                 // Does the file exist? Or what is the errno from stat?
  struct stat stat;               // File statistics struct
  int encoding;                   // File encoding bitmap
  struct hd_geometry BiosGeom;    // Geometry as reported from BIOS if file is a block device
  off_t sectorSize;               // Sector size on device if file is a block device
  off_t sectors;                  // Number of sectors on device if file is a block device
  int readahead;                  // Readahead value if file is a block device. (BLKRAGET)
} FileRecord;
The above struct can describe all the "files" the tool operates on: regular files, fifos and pipes such as stdin, stdout and stderr (redirected or not), as well as child-process I/O opened with, for example, popen().

I can then call the following function to get the important details of all the different files:
Code:
int FilesStatFromName(struct FileRecord_t * fr, int num, int nodereference)
{
  int tmpflags;
  int c;
  int n;
  int s;

  n=0;
  for (c=0; c<num; c++) {
    fr[c].encoding = FileNameSuffixType(fr[c].name);

    if (nodereference) {
      // Do not follow symlinks
      s = lstat(fr[c].name, &fr[c].stat);
    } else {
      // Follow symlinks
      s = stat(fr[c].name, &fr[c].stat);
    }

    if (s == -1) {
      fr[c].statStatus=errno;
      continue;
    } else {
      fr[c].statStatus=0; // Stat was good

      // If it is a block device...
      if (S_ISBLK(fr[c].stat.st_mode)) {

	// Need to temporarily open the block device file to get extended stats
	tmpflags = fr[c].flags;
	fr[c].flags = O_RDONLY | O_CLOEXEC | O_NOATIME;
	if ((fr[c].f.handle = OpenBlock(fr[c].name,fr[c].flags)) == -1) {
	    fr[c].statStatus=errno;
	    fr[c].flags = tmpflags;
	    continue;
	}

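	/* get logical sector size. BLKSSZGET stores an int, so read it into an
	   int rather than writing 4 bytes into the possibly wider off_t field */
	{
	  int ssz = 0;

	  if (ioctl(fr[c].f.handle,BLKSSZGET,&ssz) == -1) {
	    fr[c].statStatus=errno;
	    CloseBlock(fr[c].f.handle); // do not leak the descriptor
	    fr[c].flags = tmpflags;
	    continue;
	  }
	  fr[c].sectorSize = ssz;
	}

	/* get disk size in number of 512 byte blocks. BLKGETSIZE stores an unsigned long */
	{
	  unsigned long nblk = 0;

	  if (ioctl(fr[c].f.handle,BLKGETSIZE,&nblk) == -1) {
	    fr[c].statStatus=errno;
	    CloseBlock(fr[c].f.handle); // do not leak the descriptor
	    fr[c].flags = tmpflags;
	    continue;
	  }
	  fr[c].sectors = nblk;
	}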

	if (fr[c].sectorSize > 512) {
	  unsigned long bsm;

	  bsm=fr[c].sectorSize/512;

	  if ((512*bsm) != fr[c].sectorSize) {
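	    // Block size on input device is not an integer multiple of 512. This is unsupported.
	    fr[c].statStatus=EINVAL;
	    CloseBlock(fr[c].f.handle); // do not leak the descriptor on the error path
	    fr[c].flags = tmpflags;
	    continue;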
	  } else {
	    fr[c].sectors = fr[c].sectors / bsm; // correct block count for actual sector size
	  }
	} else if (fr[c].sectorSize < 512) {
	  unsigned long bsm;

	  bsm=512/fr[c].sectorSize;
	  if ((512/bsm) != fr[c].sectorSize) {
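	    // Block size on input device is not an integer divisor of 512. This is unsupported.
	    fr[c].statStatus=EINVAL;
	    CloseBlock(fr[c].f.handle); // do not leak the descriptor on the error path
	    fr[c].flags = tmpflags;
	    continue;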
	  } else {
	    fr[c].sectors = fr[c].sectors * bsm; // correct block count for actual sector size
	  }
	}
	// Else leave fr[c].sectors as it is for fr[c].sectorSize == 512

	// Since stat reports st_size of block devices to be zero, we
	// need to update st_size with correct value for the block device.
	fr[c].stat.st_size = fr[c].sectorSize * fr[c].sectors;

	/* Get readahead */
	if (ioctl(fr[c].f.handle,BLKRAGET,&fr[c].readahead) == -1) {
	  fr[c].readahead = 0;
	}

	fr[c].flags = tmpflags;
	if (CloseBlock(fr[c].f.handle) == -1) {
	  fr[c].statStatus=errno;
	  continue;
	}
      }

      ++n;
    }
  }

  return n;
}
Simply calling FilesStatFromName(TheFiles, NumberOfFiles, 0); will give all the necessary stats for all the files in the array.
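
As a side note on the block device part: a minimal sketch, assuming Linux with <linux/fs.h>, of reading the device size with BLKGETSIZE64, which reports bytes directly so that no conversion from 512-byte blocks is needed. The helper name block_device_size is hypothetical and not part of the tool above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKGETSIZE64, BLKSSZGET (Linux only) */

/* Hypothetical helper, not part of the tool above.
 * Returns 0 and fills *bytes and *sector_size for a block device,
 * or -1 with errno set (ENOTBLK if path is not a block device). */
int block_device_size(const char *path, uint64_t *bytes, int *sector_size)
{
    struct stat st;
    int fd, saved;

    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    /* fstat() on the open descriptor avoids a stat()/open() race. */
    if (fstat(fd, &st) == -1)
        goto fail;
    if (!S_ISBLK(st.st_mode)) {
        errno = ENOTBLK;
        goto fail;
    }
    if (ioctl(fd, BLKGETSIZE64, bytes) == -1)     /* size in bytes, 64 bit */
        goto fail;
    if (ioctl(fd, BLKSSZGET, sector_size) == -1)  /* logical sector size, an int */
        goto fail;

    return close(fd);

fail:
    saved = errno;      /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return -1;
}
```

Every error path closes the descriptor before returning, so a failed ioctl does not leave the device open.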

The problem is when handling redirected stdio with the same method. How do we get the name of a file that was "redirected into" standard input, (stdin)?

There simply does not seem to exist a quick and universal method for this.

One way to be able to specify redirection is to use the single dash argument.

Like: dd if=/dev/sda1 bs=1b | toolname - - > some.file

The first dash argument to the program "toolname" will indicate the input file to be piped in from stdin, and the second dash indicates the output file to be piped to via stdout.

The following code snippet will handle the dashes to represent stdin, stdout and also stderr, depending on the order they appear on the command line.

Code:
    int stdiocnt = 0;
    TotalFiles = 0;
    while (optind < argc) {

      strcat(cmdstr,argv[optind]);
      strcat(cmdstr,"\n");

      if (strcmp(argv[optind],"-") == 0) {
	// Special handling of redirect for stdin, stdout and stderr with a dash.
	switch (stdiocnt) {
	case 0 : // stdin
	  strncpy(FileRec[TotalFiles].name,"/proc/self/fd/0",FILENAME_MAX);
	  ++stdiocnt;
	  break;
	case 1 : // stdout
	  strncpy(FileRec[TotalFiles].name,"/proc/self/fd/1",FILENAME_MAX);
	  ++stdiocnt;
	  break;
	case 2 : // stderr
	  strncpy(FileRec[TotalFiles].name,"/proc/self/fd/2",FILENAME_MAX);
	  ++stdiocnt; // without this, a fourth dash never reaches the default case
	  break;
	default:
	  usage("More than three - specified. Have only stdin, stdout and stderr to redirect");
	  break;
	}
      } else {
	// Default, if no redirect with a dash.
	strncpy(FileRec[TotalFiles].name,argv[optind],FILENAME_MAX-1);
	FileRec[TotalFiles].name[FILENAME_MAX-1] = '\0'; // strncpy does not terminate on truncation
      }

      FilesStatFromName(&FileRec[TotalFiles], 1, CmdPar.nodereference);

      ++TotalFiles;
      ++optind;
    }
The above code is very much dependent on the platform being Linux, and will presumably not work on all flavors of Linux.

It depends upon the assumption that, for example, the file /proc/self/fd/0 is exactly the same as the process stdin.

When performing stat() on /proc/self/fd/0 for the following command line: toolname - < /dev/sde1

I get:
Code:
File stat info for /proc/self/fd/0:
  Device Id:    15
  Inode:        1601
  Mode: 24992, block device
  Permissions:  640
  Hard links:   1
  UID/GID:      0/6
  Rdev: 2113
  Size: 640132383744
  Block size:   4096
  Blocks:       0
  Sector size:  512
  Sectors:      1250258562
  Readahead:    256
  Atime:        Fri Aug  6 23:13:46 2010
  Mtime:        Tue Jul 13 21:24:10 2010
  Ctime:        Tue Jul 13 21:24:18 2010
  Stat status:  0, (Success)
If i specify not to follow symlinks, and thus use lstat(), like: toolname --no-dereference - < /dev/sde1

I get:
Code:
File stat info for /proc/self/fd/0:
  Device Id:    3
  Inode:        7870887
  Mode: 41280, Symbolic link
  Permissions:  500
  Hard links:   1
  UID/GID:      0/0
  Rdev: 0
  Size: 64
  Block size:   1024
  Blocks:       0
  Sector size:  0
  Sectors:      0
  Readahead:    0
  Atime:        Sun Aug  8 23:16:32 2010
  Mtime:        Sun Aug  8 23:16:32 2010
  Ctime:        Sun Aug  8 23:16:32 2010
  Stat status:  0, (Success)
  Symbolic link points to: /dev/sde1
Inode 7870887 is verified to be that process's stdin stream. So /proc/self/fd/0 is a link to whatever got redirected to stdin.

Also available in the file space is /dev/stdin, which will lstat() like:
Code:
File stat info for /dev/stdin:
  Device Id:    15
  Inode:        2481
  Mode: 41471, Symbolic link
  Permissions:  777
  Hard links:   1
  UID/GID:      0/0
  Rdev: 0
  Size: 15
  Block size:   4096
  Blocks:       0
  Sector size:  0
  Sectors:      0
  Readahead:    0
  Atime:        Sun Aug  8 21:35:56 2010
  Mtime:        Tue Jul 13 21:24:17 2010
  Ctime:        Tue Jul 13 21:24:17 2010
  Stat status:  0, (Success)
  Symbolic link points to: /proc/self/fd/0
Please note that using:
Code:
    if ((fr[c].stat.st_mode & S_IFMT) == S_IFLNK) {
      char linkname[FILENAME_MAX];
      int linklen;
      if ((linklen = readlink(fr[c].name, linkname, FILENAME_MAX-1)) != -1) {
	linkname[linklen] = '\0'; // readlink() does not null-terminate; the size-1 above leaves room for this
	ConsoleLogPrintf("  Symbolic link points to: %s\n", linkname);
      }
    }
...shows us that /dev/stdin is a link that points to /proc/self/fd/0
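
The same readlink() idea can be applied directly to a descriptor number. A minimal sketch, assuming Linux with /proc mounted; the helper name fd_target_name is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. Linux-specific: relies on /proc/self/fd/N.
 * Copies what descriptor fd refers to into buf (always null-terminated
 * on success). Returns the length, or -1 on error with errno set.
 * A target longer than bufsize-1 is silently truncated. */
ssize_t fd_target_name(int fd, char *buf, size_t bufsize)
{
    char link[64];
    ssize_t len;

    if (bufsize == 0) {
        errno = EINVAL;
        return -1;
    }
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);

    /* readlink() does not append '\0', so leave room for it. */
    len = readlink(link, buf, bufsize - 1);
    if (len == -1)
        return -1;
    buf[len] = '\0';
    return len;
}
```

For a pipe the result is not a path but something like pipe:[12345], and for an unlinked file the kernel appends " (deleted)", so the returned text is a hint rather than a name that can always be reopened.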

Using stat() instead to follow the symlink will give us:
Code:
File stat info for /dev/stdin:
  Device Id:    15
  Inode:        1601
  Mode: 24992, block device
  Permissions:  640
  Hard links:   1
  UID/GID:      0/6
  Rdev: 2113
  Size: 640132383744
  Block size:   4096
  Blocks:       0
  Sector size:  512
  Sectors:      1250258562
  Readahead:    256
  Atime:        Fri Aug  6 23:13:46 2010
  Mtime:        Tue Jul 13 21:24:10 2010
  Ctime:        Tue Jul 13 21:24:18 2010
  Stat status:  0, (Success)
Aha! Inode is 1601, which we recognize as the disk partition we redirected to stdin above.

So this method seems to work pretty well. But it doesn't feel like a robust method, since it has lots of dependencies that need to be carefully checked for.

The questions i stand with:
  1. Is there a better way to do this?
  2. What about portability?
  3. Is there a better way of getting the file name of the redirected file (respecting the fact that there may not always exist such a thing as a file name for a redirection pipe)?
  4. Should i work with inodes instead, and then take a completely different approach when porting to non-unix platforms?
  5. Why isn't there a system call like get_filename(stdin); ?
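
On questions 1, 2 and 4: one possibly more portable route is to skip the name entirely and call fstat() on the descriptor itself, which is plain POSIX and needs neither /proc nor /dev/stdin. A minimal sketch; the enum and classify_fd are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical classification of what a descriptor is attached to. */
enum fd_kind {
    FD_INVALID, FD_REGULAR, FD_BLOCK, FD_TERMINAL,
    FD_CHARDEV, FD_PIPE, FD_SOCKET, FD_DIRECTORY, FD_OTHER
};

enum fd_kind classify_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return FD_INVALID;          /* e.g. EBADF: descriptor not open */
    if (S_ISREG(st.st_mode))
        return FD_REGULAR;
    if (S_ISBLK(st.st_mode))
        return FD_BLOCK;
    if (S_ISCHR(st.st_mode))        /* terminals are character devices */
        return isatty(fd) ? FD_TERMINAL : FD_CHARDEV;
    if (S_ISFIFO(st.st_mode))       /* both pipes and named fifos */
        return FD_PIPE;
    if (S_ISSOCK(st.st_mode))
        return FD_SOCKET;
    if (S_ISDIR(st.st_mode))
        return FD_DIRECTORY;
    return FD_OTHER;
}
```

Calling classify_fd(STDIN_FILENO) after toolname - < /dev/sde1 should report a block device without touching any path. What fstat() cannot provide is a name; for that, something like the /proc readlink shown earlier still seems to be needed on Linux.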

If you have any input on this, or some questions, then please don't hesitate to post in this thread.

To add some offtopic to the thread - Here is a performance tip: When doing data shuffling on streams one should avoid just using some arbitrary record length, (like 512 bytes). Use stat() to get the recommended block size in stat.st_blksize and use copy buffers of that size to get optimal throughput in your programs.
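
A minimal sketch of that tip, assuming POSIX read()/write() and taking the buffer size from the input descriptor's st_blksize. The name copy_stream and the sanity bounds (512 bytes to 1 MiB, with a 4096-byte fallback) are assumptions for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd using a
 * buffer sized from st_blksize. Returns bytes copied, or -1 on error. */
long long copy_stream(int in_fd, int out_fd)
{
    struct stat st;
    size_t bufsize = 4096;      /* fallback if st_blksize looks unusable */
    char *buf;
    long long total = 0;
    ssize_t n;

    /* Sanity-check the reported value before trusting it. */
    if (fstat(in_fd, &st) == 0 &&
        st.st_blksize >= 512 && st.st_blksize <= (1 << 20))
        bufsize = (size_t)st.st_blksize;

    buf = malloc(bufsize);
    if (buf == NULL)
        return -1;

    while ((n = read(in_fd, buf, bufsize)) != 0) {
        char *p = buf;

        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted read: retry */
            total = -1;
            break;
        }
        while (n > 0) {         /* write() may accept only part of the data */
            ssize_t w = write(out_fd, p, (size_t)n);

            if (w == -1) {
                if (errno == EINTR)
                    continue;   /* interrupted write: retry */
                total = -1;
                goto done;
            }
            p += w;
            n -= w;
            total += w;
        }
    }
done:
    free(buf);
    return total;
}
```

Something like copy_stream(STDIN_FILENO, STDOUT_FILENO) would then behave like a simple cat with a buffer matched to the input.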
 
Old 08-10-2010, 04:37 AM   #2
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,281

Rep: Reputation: 171
very dangermouse I would say.
redirection has nothing to do with the OS.
it's something that the shell does.

If i was you I would bar redirection.

stat(3) IS_REGULAR_FILE or !IS_PIPE some such.
or isatty(3)

inodes cannot be trusted.

Code:
$ touch 1
$ ls -i 1
709970 1
$ gvim 1
$ ls -i 1
709971 1
i.e. 'race condition' unless your filesystem is read only.
 
Old 08-10-2010, 05:08 AM   #3
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,245
Blog Entries: 15

Rep: Reputation: 233
I was thinking maybe you should create a library similar to boost that can run on many systems and does the complex task you need. As for how to do it on other systems, I honestly have no idea. Hey, maybe you can find some ideas in the source code of boost? Take note of the license first, btw. Maybe the idea is free to copy and just not the source.

Quote:
Originally Posted by wroom View Post
To add some offtopic to the thread - Here is a performance tip: When doing data shuffling on streams one should avoid just using some arbitrary record length, (like 512 bytes). Use stat() to get the recommended block size in stat.st_blksize and use copy buffers of that size to get optimal throughput in your programs.
This should be useful, thanks. Do you think stat.st_blksize is dependable?
 
Old 08-14-2010, 06:26 AM   #4
wroom
Member
 
Registered: Dec 2009
Location: Sweden
Posts: 83

Original Poster
Rep: Reputation: 24
Redirection barring:
Quote:
Originally Posted by bigearsbilly View Post
very dangermouse I would say.
redirection has nothing to do with the OS.
it's something that the shell does.

If i was you I would bar redirection.

stat(3) IS_REGULAR_FILE or !IS_PIPE some such.
or isatty(3)
Yes. One should not push random binary data dumps to the terminal just because someone made a typo on the command line. Testing with isatty() or a similar method is highly recommended when stdio may be redirected.

Not everybody knows that if one happens to push random binary data to stdout, eventually something random comes back to stdin from the terminal and may do horrendous things, like recalling interesting commands such as rm -rf /* or mkdosfs from the history file.

But i would rather have redirection easily available with < > | and such, than people doing more or less creative things to achieve redirection anyhow. Perhaps in a less reliable manner.
In VMS, piping commands are not supported by default. Still, i've used redirection and even command piping a lot when working with VMS. If there is a means, there is a way to do it.

Say one makes a check in the program so that binary dumps never go to a terminal. Still someone can do mkfifo f1 ; dd if=/dev/sdf2 | gzip -cq > f1 & sleep 20 ; cat f1 in a remote shell and then try to upload a disk dump through a remote login terminal session log file, not knowing what satanic rituals the random data may trigger in the tty.
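
The terminal check discussed above could be sketched as follows. This is a minimal example using only isatty(); the function name binary_output_allowed is hypothetical, and as noted it cannot catch data that reaches a terminal indirectly through a fifo or a log file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Hypothetical guard: returns 1 if fd is an open descriptor that is
 * not a terminal, 0 if binary output to it should be refused. */
int binary_output_allowed(int fd)
{
    errno = 0;
    if (isatty(fd))
        return 0;       /* a terminal: refuse */
    /* isatty() also returns 0 on errors, so tell EBADF apart from ENOTTY */
    if (errno == EBADF)
        return 0;       /* not an open descriptor at all */
    return 1;
}
```

A tool could call binary_output_allowed(STDOUT_FILENO) before writing and print a usage message instead of the dump when it returns 0.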

Inodes:
Quote:
Originally Posted by bigearsbilly View Post
inodes cannot be trusted.

Code:
$ touch 1
$ ls -i 1
709970 1
$ gvim 1
$ ls -i 1
709971 1
i.e. 'race condition' unless your filesystem is read only.
Nope. Inodes can't be trusted. Never turn your back on your inodes. Or they'll fragg your ***

But seriously - maybe this behaviour is just what we want?
Code:
touch 1
ls -i 1*
287025 1
emacs 1
ls -i 1*
287056 1  287025 1~
Our editor makes a new file and names it "1" after it has renamed the old file to "1~". It's a matter of point of view.
But a backup system must be faithful to the directory hierarchy / file names, and cannot expect the inode numbers to remain the same over time.
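
The editor behaviour can be reproduced in a few lines of C. A minimal sketch, assuming POSIX; path_unchanged and replace_via_rename are hypothetical names. It compares the (st_dev, st_ino) pair rather than the inode number alone, since inode numbers are only unique within one filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if path still refers to the file described by *before,
 * 0 if it now refers to a different file, -1 if stat() fails. */
int path_unchanged(const char *path, const struct stat *before)
{
    struct stat now;

    if (stat(path, &now) == -1)
        return -1;
    return now.st_dev == before->st_dev && now.st_ino == before->st_ino;
}

/* Mimics an editor's save: create a new file next to path and rename
 * it over the old name. Returns 0 on success, -1 on error. */
int replace_via_rename(const char *path)
{
    char tmp[4096];
    int fd, n;

    n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    fd = mkstemp(tmp);          /* new file, therefore a new inode */
    if (fd == -1)
        return -1;
    close(fd);
    if (rename(tmp, path) == -1) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

After replace_via_rename(path) the name is the same but path_unchanged() reports 0, which is the situation a backup index keyed on inode numbers would have to cope with.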

In fact, the most essential thing about backup, is saving the actual content of the files. Never mind the directories, file names or ACL's.
With thrashed directories, but fully recovered file contents, one can often recover the directory structure enough from memory and looking at older backups and investigating contents of files.

But losing a bunch of blocks at the beginning of a zip file? Then it does not matter that you still have the file name, size and directory hierarchy. One will need a CUDA cluster worthy of SETI or the NSA to be able to brute-force recover the zip file from the remaining contents and CRC.

Trusting the enumeration:
Another example of things not to trust, besides inode mapping, is how to keep track of your disk partitions over boots, and when disks are added/removed/repartitioned or when boot order changes in BIOS.
Say you have a tool that does incremental image backups of your disks. It then needs to keep track of the enumeration of disks/partitions in your system. This may change if you change settings in BIOS, upgrade BIOS, upgrade the kernel or make some other changes to the system.

Example:
Since your SATA controller has two SATA3 ports, which happen to enumerate as /dev/sdc and /dev/sdd, you decide to have your system root in the partition /dev/sdc1, swap partition in /dev/sdd1, home partition mirrored over /dev/sd[ab]1 and a raid5 array on partition 2 of all four drives, and it works perfectly like that for some months.

Then one of the disks in the raid array starts to go bad and you buy another drive to recover. You hot-add your new disk since you dare not reboot with a failed raid, the new disk enumerates as /dev/sde, and you happily add the new drive to your raid array and let the array resync.

Finally, when you have the raid up and running perfectly, you reboot...

...Grub gives a sneaky comment that it can not find your root partition...

XYZZY - Nothing happens


It turns out your new drive now enumerates as /dev/sdc since it is on one of the SATA2 ports, that BIOS decided to enumerate before the two SATA3 ports. And your root partition has automagically moved to /dev/sdd.
Even worse if your raid is not set up with superblocks. Then the files on the raid disk may, if you are the slightest bit unlucky, soon become as organised as a bowl of cornflakes.

Of course, using superblocks in raid arrays, mounting everything by UUID instead of device name, putting an MBR on all disks, and telling grub to boot from the partition with the matching UUID of your root partition will counteract this problem.
A caveat of using UUIDs and volume identifiers is that you might end up with two drives/partitions with identical volume IDs or UUIDs if you clone disks/partitions, or hot-add, for example, a USB disk.


So a backup tool should look at the actual contents of data, and build indexes around the data to reflect the enumerations of drives/partitions, instead of relying on the enumerations and ending up making huge differential backups when the order of drives/partitions changes.


Redirection again...
We can make the assumption that on most newer Linux installations we will find descriptors for stdin, stdout and stderr in the proc file system, (/proc/self/fd/[012]), or in /dev/stdin etc.. but still, it is there just because somebody said it should be so. And it might well not always be the case.

Maintaining standard:
Some people work on obsoleting the proc file system... Some installations of Linux use the udev file system, while others don't... And people tend to make their own device type mappings, despite the effort there is to keep it at least half consistent. One recent example is that someone decided to start presenting ATA drives as sd devices instead of continuing to use the hd driver.

One simply can't trust the configuration! This is in my humble opinion one of the bigger threats to Linux. There should be an even harder effort to maintain standard.

Making an application for Linux that needs to dig the slightest bit deeper, and that will work on all installations, is becoming a challenge. I guess, for example, that the developers at VMware will concur with this.
 
Old 08-14-2010, 06:48 AM   #5
wroom
Member
 
Registered: Dec 2009
Location: Sweden
Posts: 83

Original Poster
Rep: Reputation: 24
Quote:
Originally Posted by konsolebox View Post
I was thinking maybe you should create a library similar to boost that can run on many systems and does the complex task you need. As for how to do it on other systems, I honestly have no idea. Hey, maybe you can find some ideas in the source code of boost? Take note of the license first, btw. Maybe the idea is free to copy and just not the source.
My thought was to develop the application as a library/platform with a framework on top of that, and the library part should be as portable between platforms as possible. I also want to publish open software in the public domain as a better choice than the currently available options for dealing with bad-block recovery, backups and archiving. Getting tired of the headaches with dd, badblocks, etc...
And i also want to provide commercial tools for recovery/backup and archiving aimed for the MS platforms.

I have decided to avoid C++ as much as possible for speed and reliability, and stick to C. In the good old spirit of both Linux as well as "embedded software development".

Still, it is quite feasible to produce C++ that doesn't "inherit" bad behaviour, fragment the memory, or use 90% of resources just to keep the GUI up.
So it might be that when the lower-level routines have stabilized, I will make a C++ library of it, or add it to some open software library. Could be boost.

Looking at other libraries to try to stick to convention, and possibly making one's routines portable, is a good idea.

But if you take a look at the source code of f.ex. gzip, you will find that because it must be as portable as possible, it is using a very old standard of C. So it can compile on DOS, RT11, VMS... as well as for Linux and Windows.

Quote:
Originally Posted by konsolebox View Post
This should be useful, thanks. Do you think stat.st_blksize is dependable?
Of course one has to apply the vanilla fuzzy validity tests before using it, but yes. It is "rather" dependable.
(Having in mind what i posted before on maintaining standard).
 
  


Tags
filename, linux, programming, redirection, stat

