LinuxQuestions.org


bkelly13 08-26-2020 08:59 PM

fscanf(...) not completing the read
 
Edit, Just to put this at the top: I have followed the advice here and changed strategy to use strtok() and strtod().

The goal is to read a CSV file and put the values into variables. I have extracted the misbehaving code into a test program, and it reproduces the problem. The printf at line 79 looks good, but the next printf at line 79 shows only one character read. I am expecting a lot more. The same concept applies within the for loop.
Conclusion: There is something about the fscanf(…) that I do not understand. Here is the code.

Edit: I simplified the code, shortened the input file to remove non-symptomatic lines, and am providing the output. In summary, the fscanf(...) is reading one field at a time, on one line. I checked errno immediately after the fscanf call and it is always zero.
After reading the post from GazL, I was certain that would resolve the problem. Alas, there is another defect that I am looking right at but cannot see.

Thank you for your time and patience.

Code:

// CentOS Linux release 7.7.1908 (Core)
// compile with c++ test.cpp
// run c++ --version to get, in part:
// (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
// run with    ./a.out

#include <stdio.h>
#include <string.h>
#include <errno.h>
int main( )
{
  FILE * multi_column_file;
  char  multi_column_name[] = "td_short.csv";
  int    field_count =    0;
 
  extern int errno;
 
  multi_column_file = fopen( multi_column_name, "r");

  size_t len = 0;
  ssize_t read = 0;;
     
  int row_number;
  float time, sine_1, sine_2, sine_3, sine_4, sine_5, composite = 0.0;
     
  char a[ 128 ] = "no";  char b[ 128 ] = "no"; char c[ 128 ] = "no";
  char d[ 128 ] = "no";  char e[ 128 ] = "no"; char f[ 128 ] = "no";
  char g[ 128 ] = "no";  char h[ 128 ] = "no";
     
      printf( "\n%4d errno at for entry        %d", __LINE__, errno );

      for( int i = 0; i < 5; i ++ )
      {
          // Simplified test, get the values as string
        field_count = fscanf( multi_column_file, "%s, %s, %s, %s, %s, %s, %s, %s\n",
            &a, &b, &c, &d, &e, &f, &g, &h );

        printf( "\n%4d after read strings errno %d", __LINE__, errno );
        printf( "\n%4d field_count %4d text  %s %s %s %s %s %s %s %s",
            __LINE__, field_count, a, b, c, d, e, f, g, h );
        printf( "\n%4d length a %4d length b %4d\n", __LINE__, strlen( a ), strlen( b ) );
       

           
          // This is the goal
        field_count = fscanf( multi_column_file, "%d, %f, %f, %f, %f, %f, %f, %f\n",
            &row_number, &time, &sine_1, &sine_2, &sine_3, &sine_4, &sine_5, &composite );

        printf( "\n%4d after read numbers errno %d", __LINE__, errno );
        printf( "\n%4d field_count %4d text  %d %f %f %f %f %f %f %f",
            __LINE__, field_count, &row_number, &time, &sine_1,
            &sine_2, &sine_3, &sine_4, &sine_5, &composite );     

        printf( "\n%4d end loop .............\n",  __LINE__ );
      }

  fclose( multi_column_file );
  printf( "\n%4d Exit\n", __LINE__ );
  return 0;
}

Here is the shortened input data, removed the header.
Code:

    0,  0.0001,  0.0002,  0.0003,  0.0004,  0.0005,  0.0006,    0.0007
    1,  0.0008,  5.2264,  8.3165,  9.2705,  8.1347,  20.0000,  50.9481
    2,  0.0017,  10.3956,  16.2695,  17.6336,  14.8629,  34.6410,  93.8025
    3,  0.0025,  15.4508,  23.5114,  24.2705,  19.0211,  40.0000,  122.2539
    4,  0.0033,  20.3368,  29.7258,  28.5317,  19.8904,  34.6410,  133.1258
    5,  0.0042,  25.0000,  34.6410,  30.0000,  17.3205,  20.0000,  126.9615
    6,  0.0050,  29.3893,  38.0423,  28.5317,  11.7557,  0.0000,  107.7189
    7,  0.0058,  33.4565,  39.7809,  24.2705,  4.1582, -20.0000,  81.6661
    8,  0.0067,  37.1572,  39.7809,  17.6336,  -4.1582, -34.6410,  55.7724
    9,  0.0075,  40.4508,  38.0423,  9.2705, -11.7557, -40.0000,  36.0079
  10,  0.0083,  43.3013,  34.6410,  0.0000, -17.3205, -34.6410,  25.9808
  11,  0.0092,  45.6773,  29.7258,  -9.2705, -19.8904, -20.0000,  26.2421
  12,  0.0100,  47.5528,  23.5114, -17.6336, -19.0211,  -0.0000,  34.4095
  13,  0.0108,  48.9074,  16.2695, -24.2705, -14.8629,  20.0000,  46.0434
  14,  0.0117,  49.7261,  8.3165, -28.5317,  -8.1347,  34.6410,  56.0172
  15,  0.0125,  50.0000,  0.0000, -30.0000,  -0.0000,  40.0000,  60.0000
  16,  0.0133,  49.7261,  -8.3165, -28.5317,  8.1347,  34.6410,  55.6537

And here is the output
Code:

[mcs@localhost sim]$ ./a.out

  30 errno at for entry        0
  38 after read strings errno 0
  40 field_count    1 text  0, no no no no no no no
  41 length a    2 length b    2

  49 after read numbers errno 0
  51 field_count    1 text  -888742996 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
  54 end loop .............

  38 after read strings errno 0
  40 field_count    1 text  .0001, no no no no no no no  /// added notice field 0.0001
  41 length a    6 length b    2

  49 after read numbers errno 0
  51 field_count    1 text  -888742996 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
  54 end loop .............

  38 after read strings errno 0
  40 field_count    1 text  .0002, no no no no no no no  /// compare with 0001 above, this is the next field on that row
  41 length a    6 length b    2

  49 after read numbers errno 0
  51 field_count    1 text  -888742996 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
  54 end loop .............

 lines omitted
  58 Exit


rtmistler 08-27-2020 06:06 AM

DEBUG IT

You started to debug field_count in that compiled out code.

Debug it for the problem lines.

Perhaps the data sample should show the lines before, on, and after line 79.

Use GDB and break at the fscanf() call, step over it and check the outcome.

Check errno if field_count is negative.

Move the problem lines up to start at line 2 and see what happens.

Use fgets() and see what it thinks about those lines.

Read the whole file into a RAM buffer and examine that in the debugger.

GazL 08-27-2020 06:28 AM

scanf() doesn't treat a comma as a field-separator. When your input contains commas you need to specify them in the format string something like scanf(" %d, %f, %f, ...", &args, ...) else you'll get a format mismatch when it hits the comma.

Oh, and use 'free()' not 'delete' in a C program.
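
A minimal sketch of the point about commas, using sscanf() on a string rather than reading a file. The two helper names are hypothetical and only serve to compare the return values of the two format strings.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many of three fields sscanf() converts
 * when the format string contains no commas. */
int scan_without_commas(const char *line)
{
    int row;
    double x, y;
    return sscanf(line, "%d %lf %lf", &row, &x, &y);
}

/* Same three fields, but the commas are written as literals in the format.
 * The blank before each comma lets optional whitespace precede it. */
int scan_with_commas(const char *line)
{
    int row;
    double x, y;
    return sscanf(line, " %d , %lf , %lf", &row, &x, &y);
}
```

With a line such as "    0,  0.0001,  0.0002", the first helper stops after the integer because the next input character is a comma that nothing in the format matches, while the second converts all three fields.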

rtmistler 08-27-2020 08:10 AM

Figured I'd give that a small try, but your code has numerous compile errors.

boughtonp 08-27-2020 08:16 AM

Quote:

The goal is to read a CSV file and put the values into variables
I'm pretty sure there'll be an existing library that is tested and optimised to do that already. Why would you want to spend time re-writing a buggy version of it?


bkelly13 08-27-2020 09:48 AM

Quote:

Originally Posted by rtmistler (Post 6160052)
Figured I'd give that a small try, but your code has numerous compile errors.

That code did compile and run on my machine (CentOS, compiled with the c++ command). I will look at errno and check in the debugger.
Thanks for giving it a try.

EdGr 08-27-2020 11:19 AM

In addition to the delimiter problem that GazL pointed out, the code has a buffer overrun on the string variables. The safest method is to use getline to read the line and allocate a buffer, and then parse the line yourself.
Ed
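
A minimal sketch of reading lines with getline(), assuming a POSIX system. The helper name longest_line is hypothetical; it only demonstrates that getline() sizes the buffer itself, so no fixed-size array can be overrun.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes getline() on strict compilers */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: reads every line with getline() and returns the
 * length of the longest one (newline included), or -1 if nothing was read. */
long longest_line(FILE *fp)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;      /* current capacity, maintained by getline() */
    ssize_t len;
    long longest = -1;

    while ((len = getline(&line, &cap, fp)) != -1) {
        if (len > longest)
            longest = len;
        /* a real program would parse `line` here */
    }
    free(line);          /* the buffer came from malloc(), so release it */
    return longest;
}
```

The same loop shape works for parsing: replace the length comparison with a call that splits and converts the current line.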

bkelly13 08-28-2020 11:22 AM

Quote:

Originally Posted by GazL (Post 6160036)
scanf() doesn't treat a comma as a field-separator. When your input contains commas you need to specify them in the format string something like scanf(" %d, %f, %f, ...", &args, ...) else you'll get a format mismatch when it hits the comma.

Oh, and use 'free()' not 'delete' in a C program.

I was sure this would fix the problem, but no such luck. I edited the OP and shortened the code down to essentials, and showed the output.

boughtonp: Yes, there is probably such a library. But, once past the header, every line of data takes the exact same format.

In the meantime, I will be looking at sscanf(), or scans(), or something like that to read into a string and then parse the string. Extra code, but maybe I can make that work.

Thank you for your time and patience.

rtmistler 08-28-2020 11:50 AM

Probably better to skip the all-in-one library functions of the scanf() family and instead process the file line by line in character arrays, looking for the comma delimiter.

More exhaustive code, but that's what loops are for.

Hey, ... done plenty decoding serial protocols, especially since you don't get the entire packet all the time. At least here you do not have to have a state machine which is re-entrant to remember your last parsed point, you just validate line by line from a file.
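
One way the line-by-line approach might look, as a sketch only. The helper split_on_commas is a hypothetical name, and it assumes the line is already in a writable character array.

```c
#include <stddef.h>

/* Hypothetical helper: splits `line` in place at each comma and records a
 * pointer to the start of every field. Returns the number of fields found,
 * which may exceed `max`; only the first `max` pointers are stored. */
size_t split_on_commas(char *line, char *fields[], size_t max)
{
    size_t n = 0;
    char *start = line;

    for (char *p = line; ; p++) {
        if (*p == ',' || *p == '\0') {
            int last = (*p == '\0');
            *p = '\0';               /* terminate the current field */
            if (n < max)
                fields[n] = start;
            n++;
            if (last)
                break;
            start = p + 1;           /* next field begins after the comma */
        }
    }
    return n;
}
```

Unlike strtok(), this loop keeps empty fields, so comparing the returned count with the expected eight columns flags both missing and extra values on a line.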

GazL 08-28-2020 04:04 PM

%s will eat the comma in the input, so there won't be one to match the literal comma in the scanf format string. It should work with %d and %f however.
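
A small sketch of why %s fails here, again with sscanf() on a string. The helper first_two_fields is hypothetical, and the %[^,] scanset is shown as one possible alternative for text fields, not as something used elsewhere in this thread.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the first two comma-separated fields of
 * `line` into a and b (each at least 128 bytes). With use_scanset == 0 it
 * uses %s, as in the original test program; otherwise it uses %[^,],
 * which stops at the comma instead of consuming it. */
int first_two_fields(const char *line, char *a, char *b, int use_scanset)
{
    if (use_scanset)
        return sscanf(line, " %127[^,], %127[^,]", a, b);
    return sscanf(line, "%127s, %127s", a, b);
}
```

For the input "    0,  0.0001" the %s variant returns 1 and stores "0," in a, which matches the first line of output in the original post. The scanset variant returns 2 with "0" and "0.0001". The 127 width keeps either conversion from writing past a 128-byte buffer.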

Here's a minimal example to show it works with your input file:
Code:

#include <stdio.h>

int main()
{

    int n;
    int d;
    double f[7];
       
    while ( ( n = scanf( " %d, %lf, %lf, %lf, %lf, %lf, %lf, %lf", 
                        &d , &f[0], &f[1], &f[2], &f[3], &f[4], &f[5], &f[6]) ) != EOF )
    {
        if ( n != 8 )
        {
            fprintf(stderr, "invalid input line, only %d fields.\n", n);
            break;
        }

        printf ("Line % 4d\t%12.6f %12.6f %12.6f %12.6f %12.6f %12.6f %12.6f\n", d, f[0], f[1], f[2], f[3], f[4], f[5], f[6] );
       
    }
   
    return 0;
}

P.S. This example wouldn't detect a line with too many input fields.

scanf() is notoriously poor at dealing with malformed input. If robustness matters you'll likely be better off using getline() with strtok() and strtod().
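
A minimal sketch of the strtok() and strtod() approach, assuming the line has already been read into a writable buffer (for example by getline()). The helper name parse_row and its error convention are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place on commas and converts each
 * field with strtod(). Returns the number of values stored in `out`, or -1
 * if a field is not a number or the line has more than `max` fields. */
int parse_row(char *line, double *out, int max)
{
    int n = 0;

    for (char *tok = strtok(line, ","); tok != NULL; tok = strtok(NULL, ",")) {
        char *end;
        double v = strtod(tok, &end);   /* skips leading blanks itself */

        if (end == tok)
            return -1;                  /* nothing was converted */
        while (isspace((unsigned char)*end))
            end++;                      /* allow trailing blanks and newline */
        if (*end != '\0' || n == max)
            return -1;                  /* trailing junk, or too many fields */
        out[n++] = v;
    }
    return n;
}
```

A caller expecting eight columns would pass max as 8 and treat any return value other than 8 as a bad line, which also catches the too-many-fields case mentioned in the P.S. Note that strtok() skips empty fields, so two adjacent commas show up as a short count rather than as an empty value.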

bkelly13 08-28-2020 04:37 PM

Ok, I have changed my strategy and am going with the strtok and the strtod.
Wrote some test code and got it working.
Thank you to each of you for your time and patience.

