LinuxQuestions.org
Old 12-16-2010, 08:47 AM   #1
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 723
Reading a simple file format in C


I made a string key-value mapping struct in C, and functions to add and remove entries. I would also like to write a function to read in this file format:

Code:
key: value
another: another value
key3:: @END DELIMITER@
this is a
multi-
line value

@END DELIMITER@
key4:: %%%
another multi
line value
%%%
key5: my value
It would be pretty easy if I could read the file line by line, but that's not easily done in C. I wonder if you have any better suggestions?
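As a rough sketch of how the format above could be read one line at a time with nothing but standard C, consider the following. Everything here is an assumption made for illustration: the names `kv_next` and `dup_str` are hypothetical, a `key:: DELIMITER` value is taken to run until a line that equals the delimiter exactly, blanks after the colon are skipped, and no line is expected to be longer than `BUFSIZ - 1` bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc'd copy of s (strdup is not ISO C before C23). */
static char *dup_str (const char *s)
{
        char *p = malloc (strlen (s) + 1);

        if (p != NULL)
                strcpy (p, s);
        return p;
}

/*
 * Hypothetical reader: fetch the next key/value pair from in.
 * Returns 1 and sets *key and *value (caller frees both), 0 at end of
 * file, -1 on a malformed entry or allocation failure.
 */
int kv_next (FILE *in, char **key, char **value)
{
        char    line [BUFSIZ];  /* assumes lines fit in BUFSIZ - 1 bytes */
        char    delim [BUFSIZ];
        char    *rest, *text, *tmp;
        size_t  len = 0, n;

        /* skip blank lines between entries */
        do {
                if (fgets (line, sizeof (line), in) == NULL)
                        return 0;
                line [strcspn (line, "\n")] = '\0';
        } while (line [0] == '\0');

        if ((rest = strchr (line, ':')) == NULL)
                return -1;
        *rest++ = '\0';                         /* line now holds only the key */
        if ((*key = dup_str (line)) == NULL)
                return -1;

        if (*rest != ':') {                     /* "key: value" */
                rest += strspn (rest, " \t");
                if ((*value = dup_str (rest)) != NULL)
                        return 1;
                free (*key);
                return -1;
        }

        /* "key:: DELIMITER": collect lines until one equals the delimiter */
        rest++;
        rest += strspn (rest, " \t");
        strcpy (delim, rest);
        text = dup_str ("");
        while (text != NULL && fgets (line, sizeof (line), in) != NULL) {
                n = strcspn (line, "\n");
                if (n == strlen (delim) && strncmp (line, delim, n) == 0) {
                        *value = text;
                        return 1;
                }
                n = strlen (line);
                if ((tmp = realloc (text, len + n + 1)) == NULL)
                        break;
                text = tmp;
                memcpy (text + len, line, n + 1);       /* keeps the newline */
                len += n;
        }
        /* allocation failure, or end of file before the delimiter */
        free (text);
        free (*key);
        return -1;
}
```

A caller would loop with `while (kv_next (in, &k, &v) == 1)`, hand each pair to the mapping's add function, and free `k` and `v` afterwards. Multi-line values keep their line breaks, including the blank line before `@END DELIMITER@` in the sample.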
 
Old 12-16-2010, 10:09 AM   #2
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

Rep: Reputation: 1065
You can read an entire line in C a bunch of ways; possibly the easiest is fgets():
Code:
#include <stdio.h>
#include <stdlib.h>

int     main    (int argc, char *argv [])
{
        char    buf [BUFSIZ];           /* input line buffer            */
        FILE    *in;

        /*      here the input file name comes from the command line;
         *      assign it somehow or other                              */
        if (argc != 2) {
                (void) fprintf (stderr, "usage: %s file\n", argv [0]);
                exit (EXIT_FAILURE);
        }
        if ((in = fopen (argv [1], "r")) == (FILE *) NULL) {
                (void) fprintf (stderr, "%s:\tcan't open %s\n", argv [0], argv [1]);
                exit (EXIT_FAILURE);
        }
        /*      read the thing, one line per fgets() call       */
        while (fgets (buf, sizeof (buf), in) != (char *) NULL) {
                /*      if (pattern match) do something;
                 *      else if (some other pattern match) do something else    */
        }
        (void) fclose (in);
        exit (EXIT_SUCCESS);
}
Hope this helps some.
 
Old 12-16-2010, 10:33 AM   #3
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723
But what if the line is greater in length than BUFSIZ?
 
Old 12-16-2010, 10:45 AM   #4
devnull10
Member
 
Registered: Jan 2010
Location: Lancashire
Distribution: Slackware Stable
Posts: 572

Rep: Reputation: 120
Quote:
Originally Posted by MTK358
But what if the line is greater in length than BUFSIZ?
That's why you make your buffer large enough to cope!
 
Old 12-16-2010, 10:51 AM   #5
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2081
If you don't mind sacrificing some portability, GNU libc has getline (it is also specified in POSIX.1-2008, so it is not limited to glibc).
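For reference, here is a minimal sketch of how getline() is typically used. The helper name `longest_line` is made up for this example. getline() is specified by POSIX.1-2008 as well as provided by glibc, so the code assumes a POSIX-style system; the `_POSIX_C_SOURCE` definition requests that declaration when compiling in a strict ISO C mode.

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX.1-2008 declarations such as getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>          /* ssize_t */

/* Hypothetical helper: length in bytes of the longest line, newline included. */
size_t longest_line (FILE *in)
{
        char    *line = NULL;   /* getline() allocates and grows this buffer */
        size_t  cap = 0;        /* current capacity of the buffer            */
        size_t  longest = 0;
        ssize_t len;            /* bytes read, or -1 at end of file or error */

        while ((len = getline (&line, &cap, in)) != -1) {
                if ((size_t) len > longest)
                        longest = (size_t) len;
        }
        free (line);            /* one buffer is reused for every line */
        return longest;
}
```

The point is that the caller never picks a maximum line length: the buffer is reallocated as needed, and the same buffer is reused from one call to the next.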
 
Old 12-16-2010, 11:13 AM   #6
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

Rep: Reputation: 1065
BUFSIZ (at least on my 64-bit system) is 8192 (bytes). If that ain't big enough, you can simply declare
Code:
char     buf [BUFSIZ*2];
But it's pretty unlikely that you'll have a line that long in a line feed terminated file.

If you're using a 32-bit system (or any other), you can check the sizes of things with
Code:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <time.h>

int     main    (void)
{
        /*      sizeof yields a size_t, so cast it to match %lu */
        (void) fprintf (stdout, "char\t\t%lu\n", (unsigned long) sizeof (char));
        (void) fprintf (stdout, "short\t\t%lu\n", (unsigned long) sizeof (short));
        (void) fprintf (stdout, "int\t\t%lu\n", (unsigned long) sizeof (int));
        (void) fprintf (stdout, "long\t\t%lu\n", (unsigned long) sizeof (long));
        (void) fprintf (stdout, "float\t\t%lu\n", (unsigned long) sizeof (float));
        (void) fprintf (stdout, "double\t\t%lu\n", (unsigned long) sizeof (double));
        (void) fprintf (stdout, "BUFSIZ\t\t%d\n", BUFSIZ);
        (void) fprintf (stdout, "FILENAME_MAX\t%d\n", FILENAME_MAX);
        (void) fprintf (stdout, "time_t\t\t%lu\n", (unsigned long) sizeof (time_t));
        /*      the standard <limits.h> names replace MAXSHORT, MAXINT
         *      and MAXLONG from the obsolete <values.h>                */
        (void) fprintf (stdout, "SHRT_MAX\t%d\n", SHRT_MAX);
        (void) fprintf (stdout, "INT_MAX\t\t%d\n", INT_MAX);
        (void) fprintf (stdout, "LONG_MAX\t%ld\n", LONG_MAX);
        exit (EXIT_SUCCESS);
}
You can add other stuff to the above; it's just a quick-and-dirty display thingie.

Hope this helps some.
 
Old 12-16-2010, 12:38 PM   #7
jiml8
Senior Member
 
Registered: Sep 2003
Posts: 3,171

Rep: Reputation: 116
If you're really worried about overrunning your buffer, you *could* start by scanning the file to find the longest line (the greatest number of characters between end of line markers) and then set your buffer appropriately.

Alternatively (and probably better), fgets() requires you to specify a maximum size. Set the maximum size to your buffer size, then after reading the line check to see how many characters you read and if there is an end of line marker as the last character. If not, either throw an error or process appropriately.
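One way to "process appropriately" is to keep calling fgets() until the end of line marker shows up, growing a heap buffer as you go. The sketch below does that in plain ISO C; the function name `read_line` is hypothetical, and the caller is assumed to free the returned string.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: read one complete line of any length from in.
 * Returns a malloc'd string (newline included when present), or NULL
 * at end of file or on allocation failure.
 */
char *read_line (FILE *in)
{
        char    chunk [BUFSIZ]; /* fixed-size piece read by each fgets() */
        char    *line = NULL, *tmp;
        size_t  len = 0, n;

        while (fgets (chunk, sizeof (chunk), in) != NULL) {
                n = strlen (chunk);
                if ((tmp = realloc (line, len + n + 1)) == NULL) {
                        free (line);
                        return NULL;
                }
                line = tmp;
                memcpy (line + len, chunk, n + 1);      /* copies the '\0' too */
                len += n;
                if (n > 0 && chunk [n - 1] == '\n')
                        break;          /* saw the end of line marker */
        }
        return line;
}
```

If the last character of a chunk is not a newline, the line was longer than the chunk (or the file ended without a newline), so the loop simply reads more and appends it.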
 
Old 12-16-2010, 12:40 PM   #8
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723
Maybe just using BUFSIZ is OK. It just seems bad to set arbitrary limits on things that shouldn't have any.

And I'll look into getline, that might be a nice solution.

Anyway, what's the point of writing "(void)" in front of each fprintf, and why not use printf(str) instead of fprintf(stdout, str)?
 
Old 12-16-2010, 03:08 PM   #9
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by MTK358
But what if the line is greater in length than BUFSIZ?
http://www.and.org/vstr/comparison
 
Old 12-16-2010, 03:12 PM   #10
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
That is, I used the search term "C" dynamic string library in Yahoo.
 
Old 12-16-2010, 03:18 PM   #11
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374
Quote:
Originally Posted by MTK358
Anyway, what's the point of writing "(void)" in front of each fprintf, and why not use printf(str) instead of fprintf(stdout, str)?
I can't speak for tronayne, but I would assume:

1. fprintf() returns the number of characters printed. By typecasting the return value as void, there are no warnings or errors generated by the compiler regarding unused return values.

2. Use of fprintf() is probably habit. There are times I've written programs that need to output to stdout, stderr, and/or a file descriptor. fprintf() provides a consistent interface to all output streams, rather than using another function for one special case. Use of fprintf() from those other programs bleeds over into programming habit, and I use printf() rarely.
 
Old 12-16-2010, 03:19 PM   #12
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

Rep: Reputation: 1065
From http://www.delorie.com/gnu/docs/glibc/libc_226.html,
Code:
...
Macro: int BUFSIZ
    The value of this macro is an integer constant expression that is good to use for the size argument to setvbuf. This value is guaranteed to be at least 256.

    The value of BUFSIZ is chosen on each system so as to make stream I/O efficient. So it is a good idea to use BUFSIZ as the size for the buffer when you call setvbuf.

    Actually, you can get an even better value to use for the buffer size by means of the fstat system call: it is found in the st_blksize field of the file attributes. See section 14.9.1 The meaning of the File Attributes.

    Sometimes people also use BUFSIZ as the allocation size of buffers used for related purposes, such as strings used to receive a line of input with fgets (see section 12.8 Character Input). There is no particular reason to use BUFSIZ for this instead of any other integer, except that it might lead to doing I/O in chunks of an efficient size.
For dealing with text files, BUFSIZ is a sensible and efficient size for a line buffer, unless you know for sure that your lines are longer than the defined BUFSIZ.
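The st_blksize suggestion in the quoted manual text can be sketched roughly as follows. The function name `preferred_block_size` is invented for this example, and fstat() and fileno() are POSIX rather than ISO C, so this assumes a POSIX-style system.

```c
#define _POSIX_C_SOURCE 200809L /* request fileno() and fstat() declarations */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: preferred I/O block size for the file behind fp,
 * taken from st_blksize, falling back to BUFSIZ if fstat() fails.
 */
long preferred_block_size (FILE *fp)
{
        struct stat st;

        if (fstat (fileno (fp), &st) == 0 && st.st_blksize > 0)
                return (long) st.st_blksize;
        return (long) BUFSIZ;
}
```

The result could then be handed to setvbuf() before any other operation on the stream, for example `setvbuf (fp, NULL, _IOFBF, (size_t) preferred_block_size (fp))`. With a null buffer pointer the C standard only says the size may be used, so treat it as a hint.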

Here's a perhaps overdone utility I wrote some years ago for examining files for the shortest and longest lines in a file (and you can change BUFSIZ to, oh, 16384 or some overkill value you'd like); hope it helps some.
Code:
/*
 *      Name:           $Source: /usr/local/cvsroot/general/length.c,v $
 *      Purpose:
 *      Version:        $Revision: 1.1.1.1 $
 *      Modified:       $Date: 2009/01/29 14:50:49 $
 *      Author:         $Author: trona $
 *      Date:
 *      $Log: length.c,v $
 *      Revision 1.1.1.1  2009/01/29 14:50:49  trona
 *      inital
 *
 *      Revision 1.1.1.1  2007/06/12 12:30:50  trona
 *      initial installation after SATA failure
 *
 *      Revision 1.1.1.1  2005/11/27 16:20:33  trona
 *      initial installation
 *
 *      Revision 1.2  2004/06/15 14:05:52  trona
 *      add use display
 *
 *      Revision 1.1  2004/05/06 14:13:30  trona
 *      initial installation of length utility
 *
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#ifndef TRUE
#       define  TRUE    1
#endif
#ifndef FALSE
#       define  FALSE   0
#endif

int     main    (int argc, char *argv [])
{
        char    buf [BUFSIZ];           /* input buffer                 */
        int     c;                      /* general-purpose              */
        int     error = FALSE;          /* error flag                   */
        int     len;                    /* length of line               */
        int     max_len = 0;            /* maximum line                 */
        int     min_len = BUFSIZ;       /* minimum line                 */
        int     vopt = FALSE;           /* verbose option               */
        time_t  t0 = (time_t) 0;        /* start time                   */
        time_t  t1 = (time_t) 0;        /* finish time                  */
        FILE    *in;                    /* we may need a file           */

        /*      process the command line arguments                      */
        while ((c = getopt (argc, argv, "?v")) != -1) {
                switch (c) {
                case '?':
                        error = TRUE;
                        break;
                case 'v':
                        vopt = TRUE;
                        break;
                default:
                        (void) fprintf (stderr, "getopt() bug\n");
                        exit (EXIT_FAILURE);
                }
        }
        /*      any errors in the arguments, or a '?' entered...*/
        if (optind == argc || error) {
                (void) fprintf (stderr,
                    "%s:\tshow the length (in bytes) of the shortest and longest line in file[s]\n",
                    argv [0]);
                (void) fprintf (stderr, "usage: %s [-v] file...\n",
                    argv [0]);
                exit (EXIT_FAILURE);
        }
        /*      get a start time                                */
        if (time (&t0) < (time_t) 0)
                (void) fprintf (stderr,
                    "%s:\tcan't read system clock\n", argv [0]);
        /*      now process any arguments supplied...           */
        while (optind != argc) {
                (void) fprintf (stderr, "Processing %s...\n", argv [optind]);
                /*      open the input file             */
                if ((in = fopen (argv [optind], "r")) == (FILE *) NULL) {
                        (void) fprintf (stderr, "%s:\tcan't open %s\n", argv [0], argv [optind]);
                        exit (EXIT_FAILURE);
                }
                /*      reset the counters, then scan the input file    */
                max_len = 0;
                min_len = BUFSIZ;
                while (fgets (buf, sizeof (buf), in) != (char *) NULL) {
                        len = (int) strlen (buf);
                        if (len > max_len)
                                max_len = len;
                        if (len < min_len)
                                min_len = len;
                }
                /*      close the input file            */
                if (fclose (in))
                        (void) fprintf (stderr, "%s:\tcan't close %s\n", argv [0], argv [optind]);
                (void) fprintf (stdout,
                    "%s:\tshortest line is [%d], longest line is [%d]\n", argv [optind], min_len, max_len);
                optind++;
        }
        /*      get a finish time                       */
        if (time (&t1) < (time_t) 0)
                (void) fprintf (stderr,
                    "%s:\tcan't read system clock\n", argv [0]);
        if (vopt)
                (void) fprintf (stderr,
                    "%s duration %g seconds\n", argv [0], difftime (t1, t0));
        exit (EXIT_SUCCESS);
}
 
Old 12-16-2010, 07:27 PM   #13
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723
Is there a way to tell if the C library is GNU at compile time, so I can do something like this?

Code:
#ifdef GNU_C_LIBRARY
// use getline
#else
// use fgets and BUFSIZ
#endif
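A possible shape for such a compile-time check is sketched below. glibc defines `__GLIBC__` once any of its headers has been included, and POSIX systems define `_POSIX_VERSION` in `<unistd.h>`. `HAVE_GETLINE` and `line_reader_name` are names made up for this example, and the sketch assumes `<unistd.h>` exists, which is not true on every non-POSIX platform.

```c
#include <stdio.h>
#include <unistd.h>     /* defines _POSIX_VERSION on POSIX systems */

/* getline() is in glibc and in POSIX.1-2008 (version value 200809L) */
#if defined (__GLIBC__) || (defined (_POSIX_VERSION) && _POSIX_VERSION >= 200809L)
#       define  HAVE_GETLINE    1
#else
#       define  HAVE_GETLINE    0
#endif

/* Hypothetical helper: report which line reader this build would use. */
const char *line_reader_name (void)
{
#if HAVE_GETLINE
        return "getline";       /* use getline */
#else
        return "fgets";         /* use fgets and BUFSIZ */
#endif
}
```

Testing for the POSIX version as well as for glibc means the getline branch is also taken on other C libraries that implement POSIX.1-2008.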
 
Old 12-17-2010, 08:42 AM   #14
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

Rep: Reputation: 1065
Quote:
Anyway, what's the point of writing "(void)" in front of each fprintf, and why not use printf(str) instead of fprintf(stdout, str)?
Dark_Helmet spoke truly -- I've used the fprintf() function for... uh, about 25 years now, rather than the printf() function, because of the consistent interface between stdout, stderr and any other file (in Unix and Linux, almost everything is treated as a file). The (void) cast simply tells the compiler that I know it's an integer function and will return a value and, no, I don't care about the value so shut up. Over the years I've adopted the habit of doing that with everything I write and every function type (well, not with void functions). It informs the compiler, and it reminds me, when I look at something after some passage of time, what the heck I was trying to accomplish there; basically, it's good practice.

C can be forgiving about types, but she can also be a mean mutha if you don't say what you mean; it just makes life easier to spend a second or two telling 'er.
 
Old 12-17-2010, 09:27 AM   #15
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

Rep: Reputation: 1065
Quote:
Is there a way to tell if the C library is GNU at compile time, so I can do something like this?
Yeah, but...

Using manufacturer extensions will bite you in the hiney. Been there, did that, don't ever want to do it again.

The examples I gave you above were all written on Solaris boxes and all written never, ever using extensions. Porting to Linux from Solaris was child's play -- type make, hit the carriage return, done deal (if you really want to writhe in agony, even if you're getting paid for it, try porting something written in C from "extension-rich" Windows to Linux; arrgghh!).

Just because you can doesn't always mean that you ought to, methinks. And, the day will come when that extension will bite you hard and you'll wish that you'd never done it.

Hope this helps some.
 
  


All times are GMT -5.
