LinuxQuestions.org
Old 03-13-2023, 11:01 AM   #1
sean mckinney
Member
 
Registered: Mar 2004
Posts: 32

Rep: Reputation: 0
I can't figure out why this script is not giving me the output I expect. I assume it's my fault; pointers would be appreciated.


I want the script to scan multiple csv's for certain column titles and tell me whether or not the titles were found and, if they were found, what their column/field numbers are. The output is sent to another csv in a different folder.
The line numbers, 1) to 17), have been added for reference for this post.

Code:
1) gawk -F, '{if(NR ==1){
2)		      print "a ,FILENAME ,FNR ,NR ,a ,set_max_height_col_label ,set_max_height_col_nos ,a ,max_height_status ,status_clo_nos ,a ,activation_date ,activ_date_col_nos" > ("output/set_max_height_all_hopefully.csv")
3)	      	      print " " >> ("output/set_max_height_all_hopefully.csv")
4)  		      }
5)	   	      if (FNR ==1) {Limit=0 ;status=0 ;act_date=0 ;cCol_titles_found=0 }		      
6)		      if ((FNR <=2) && (cCol_titles_found <1) && ($1 !~/sep/) && ($1 !=""))     { cCol_titles_found++		      
7)		      	      	         for (i=1; i<=NF; i++){if  (($i  ~/HOME.heightLimit/)	      && ($i !~/HOME.heightLimitStatus/))	{Limit = i }
8)				   	    		       if  (($i  ~/HOME.heightLimitStatus/) || ($i ~/HOME.isReachedLimitHeight/))	{status = i }
9)							       if  (($i  ~/RECOVER.activeTimestamp/)|| ($i ~/DETAILS.activeTimestamp/))	{act_date = i }
10)					      	 	       }	
11)			   
12)		      		         if (Limit ==0)  	       {$(Limit)    =1}
13)		      		         if (status ==0) 	       {$(status)   =2}
14)		      		         if (act_date ==0) 	       {$(act_date) =3}
15)		      		         print "  ,"FILENAME","FNR","NR","$(Limit)","Limit","$(status)","status", "$(act_date)","act_date >> ("output/set_max_height_all_hopefully.csv")
				   
# close ------------------------------------------------------------------------------------(FNR <=2) && ......
16)												 }

17)			 }' *.csv > output/$file1
I believe everything works correctly up to 'line 11'.

In some of the csv's some of the columns do not exist, so in those cases "Limit", "status" and "act_date" will, as appropriate, be 0 (zero) at 'line 11'. They are zeroed by 'line 5' at the start of the scan of each 'new' csv. Where this applies, lines 12 to 14 are intended to assign non-zero values to the relevant $(xyz) and so prevent 'line 15' from printing "$0".
This appears to work as I see 1, 2 ,3 in the correct places in the output.

I think the problem lies in the output of line 15. Wherever the column titles have been found i.e. "Limit" or "status" or "act_date" are not zero, the cells in the output csv, that should contain the column's title, are empty.

Any ideas where I am going wrong?

Thanks.

Last edited by sean mckinney; 03-13-2023 at 01:03 PM.
 
Old 03-13-2023, 11:17 AM   #2
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,660

Rep: Reputation: 2584
Quote:
Originally Posted by sean mckinney
I want the script to scan multiple csv's for certain column titles and tell me whether or not the titles were found and, if they were found, what their column/field numbers are.
Hrm, why does that sound familiar?

How do I scan several hundreds files for, in each file the first instance of an entry in a particular column and.......

As I noted in your previous thread:
Quote:
Originally Posted by boughtonp
CSV logic should use a CSV parser - I would be looking to implement this in a language with a dedicated CSV parser, e.g. Python.
A proper CSV parser will have already encountered and dealt with all the bugs and edge cases you're currently spending time re-creating.

 
Old 03-13-2023, 11:36 AM   #3
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,399
Blog Entries: 3

Rep: Reputation: 3778
Quote:
Originally Posted by boughtonp
A proper CSV parser will have already encountered and dealt with all the bugs and edge cases you're currently spending time re-creating.
Yes. If you are in a hurry then use Perl's CSV parser. If you are not in a hurry then the upstream version of AWK now has CSV support and it is just a matter of years until the changes percolate out to the various distros.

Based on the Text::CSV manual page, untested:

Code:
#!/usr/bin/perl

use Text::CSV;
use strict;
use warnings;

my $file = shift || die;

my @rows;

# Read/parse CSV
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open(my $fh, "<:encoding(utf8)", $file) or die("$file: $!");
while (my $row = $csv->getline ($fh)) {
        $row->[2] =~ m/pattern/ or next; # 3rd field should match
        push @rows, $row;
}

close $fh;

# and write as CSV
open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
$csv->say ($fh, $_) for @rows;
close $fh or die "new.csv: $!";

exit(0);
 
Old 03-13-2023, 12:01 PM   #4
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,167

Rep: Reputation: 7393
You need to prepare a small csv file that reproduces the error, then insert logging statements into the script, and you will "easily" find the problem. But a real csv parser is definitely a much better and safer way.
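A minimal sketch of that debugging approach, assuming a made-up two-line file named small.csv with invented column contents. It only logs what awk sees in the header line and sends the log lines to stderr so they stay out of the normal output.

```shell
#!/bin/sh
# Build a tiny CSV in a temporary directory (file name and contents are hypothetical).
tmpdir=$(mktemp -d)
printf '%s\n' 'time,HOME.heightLimit,other' '1,120,x' > "$tmpdir/small.csv"

# On the first line of each file, log the field count and every field to stderr.
# 2>&1 captures the log here only so it can be inspected in this sketch.
debug=$(cd "$tmpdir" && awk -F, 'FNR == 1 {
        print "DEBUG " FILENAME ": NF=" NF > "/dev/stderr"
        for (i = 1; i <= NF; i++)
                print "DEBUG   field " i " = [" $i "]" > "/dev/stderr"
}' small.csv 2>&1)

printf '%s\n' "$debug"
rm -r "$tmpdir"
```

A file whose header is not split the way you expect (for example because it uses a different separator) would show up here as NF=1 with the whole line inside the brackets.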
 
Old 03-13-2023, 01:16 PM   #5
sean mckinney
Member
 
Registered: Mar 2004
Posts: 32

Original Poster
Rep: Reputation: 0
Thanks boughtonp etc.
I am aware that a proper csv parser would be better, but I would have to learn that from scratch and, at a guess, I would still end up asking questions about things I cannot see.
The gist of the script in this thread is intended to be added to the script in the post that boughtonp cited. That script works, to my satisfaction, scanning around 3700 csv's totalling around 5,200,000+ lines whilst looking at the contents of around 50 columns in those csv's.

Given that the script that boughtonp cited is now around 1100 lines long, it would be a massive amount of work to
a) learn a parser and
b) rewrite the code.

Just as a matter of interest I may well do just that, since I am thinking of trying to adapt the larger script to run on a Macbook. But for the moment, and taking it that I accept that folks are correct and that a proper parser would be better, I would still like to know if anyone can see what the problem is in the script in post number 1 of this thread.

Last edited by sean mckinney; 03-13-2023 at 01:32 PM.
 
Old 03-13-2023, 01:30 PM   #6
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,399
Blog Entries: 3

Rep: Reputation: 3778
Quote:
Originally Posted by sean mckinney
b) rewrite the code
If you refactor it for readability, the bug might become more visible either during the rewrite or as a result of being able to see it more clearly afterwards. There are a lot of ways that script could be made more readable.

Code:
#!/usr/bin/awk -f

BEGIN {
        FS =","
        out = "output/set_max_height_all_hopefully.csv"
}

NR == 1 {
        print "a ,FILENAME ,FNR ,NR ,a ,set_max_height_col_label " \
            ",set_max_height_col_nos ,a ,max_height_status ," \
            "status_clo_nos ,a ,activation_date ,activ_date_col_nos" \
            > out
        print " " > out
}

FNR == 1 {       
        Limit=0;        
        status=0;       
        act_date=0;
        cCol_titles_found=0;
}                      

(FNR <= 2) && (cCol_titles_found < 1) && ($1 !~ /sep/) && ($1 != "") {
        cCol_titles_found++                 
        for (i=1; i<=NF; i++) {
                if (($i ~/HOME.heightLimit/) && 
                    ($i !~/HOME.heightLimitStatus/)) {
                        Limit = i
                }
                if (($i  ~/HOME.heightLimitStatus/) || 
                    ($i ~/HOME.isReachedLimitHeight/)) {
                        status = i
                }
                if (($i  ~/RECOVER.activeTimestamp/) || 
                    ($i ~/DETAILS.activeTimestamp/)) {
                        act_date = i
                }
        }      

        if (Limit ==0) {
                $(Limit) = 1
        }
        if (status ==0) {
                $(status) = 2
        }
        if (act_date ==0) {
                $(act_date) = 3
        }

        print "  c," FILENAME "," FNR "," NR ", c ," $(Limit) \
            "," Limit ", c ," $(status) "," status ",  c  ,"\
            $(act_date) "," act_date \
            >> out
}
So, for example, most of the if() clauses there can be made more AWKish by omitting the if() part. Indentation and line wrapping are generally considered helpful, too. If you need line numbers, then most editors have an option to turn that on or off for you so that the paste into LQ is not broken. Or, as a last resort, use cat -n on the script.
 
Old 03-13-2023, 02:39 PM   #7
sean mckinney
Member
 
Registered: Mar 2004
Posts: 32

Original Poster
Rep: Reputation: 0
Many thanks Turbocapitalist.
 
Old 03-16-2023, 09:12 AM   #8
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,860

Rep: Reputation: 1225
Quote:
Code:
12)		      		         if (Limit ==0)  	       {$(Limit)    =1}
13)		      		         if (status ==0) 	       {$(status)   =2}
14)		      		         if (act_date ==0) 	       {$(act_date) =3}
When Limit, status or act_date is 0, the $(Limit), $(status) and $(act_date) evaluate to $0, i.e. the assignment replaces the entire line with a number. Is this intended?
Perhaps you simply want to change the variables' values?
Code:
12)		      		         if (Limit ==0)  	       {Limit    =1}
13)		      		         if (status ==0) 	       {status   =2}
14)		      		         if (act_date ==0) 	       {act_date =3}

Last edited by MadeInGermany; 03-16-2023 at 10:17 AM.
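To illustrate the point about $0, here is a minimal sketch, assuming a made-up header line in which the Limit column is found at field 2 and the status column is missing. In awk, $(0) is the whole record, so assigning to it replaces the line and re-splits it into fields.

```shell
#!/bin/sh
# Hypothetical one-line header standing in for a real CSV header row.
header='time,HOME.heightLimit,other'

# Limit is found (field 2) but status is not (it stays 0).
# "$(status) = 2" is then "$0 = 2": the record becomes the single field "2",
# so the field that Limit points at no longer exists.
result=$(printf '%s\n' "$header" | awk -F, '{
        Limit = 2; status = 0
        before = "[" $(Limit) "] NF=" NF
        if (status == 0) { $(status) = 2 }
        after = "[" $(Limit) "] NF=" NF
        print "before: " before
        print "after: " after
}')

printf '%s\n' "$result"
```

This prints "before: [HOME.heightLimit] NF=3" and then "after: [] NF=1". The title that was found comes out empty after the assignment, which appears to match the empty cells described in post #1.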
 
Old 03-18-2023, 07:47 AM   #9
sean mckinney
Member
 
Registered: Mar 2004
Posts: 32

Original Poster
Rep: Reputation: 0
@Turbocapitalist, thanks for the indentation example, it's proving very useful in tidying up the main script, which was a MESS. I will look into trying your BEGIN; it's something I have been meaning to do.
That said, I still couldn't see what's wrong with the tidied-up version of my idea, so in the end I rewrote it using simple "if" and "print" statements. The new version works.
Many thanks.
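One way to produce this kind of report without assigning to any field, as a hedged sketch rather than the code the poster ended up with: the header line, the limit_label and status_label names and the "not found" text are all assumptions for illustration. The dots in the patterns are escaped here so that they match a literal dot, which the original patterns did not require.

```shell
#!/bin/sh
# Hypothetical header: the Limit column exists, the status column does not.
header='time,HOME.heightLimit,other'

line=$(printf '%s\n' "$header" | awk -F, '{
        Limit = 0; status = 0
        for (i = 1; i <= NF; i++) {
                if ($i ~ /HOME\.heightLimit/ && $i !~ /HOME\.heightLimitStatus/) Limit = i
                if ($i ~ /HOME\.heightLimitStatus/ || $i ~ /HOME\.isReachedLimitHeight/) status = i
        }
        # Choose the text to print without writing to any field, so $0 stays intact.
        limit_label  = (Limit  ? $(Limit)  : "not found")
        status_label = (status ? $(status) : "not found")
        print limit_label "," Limit "," status_label "," status
}')

printf '%s\n' "$line"
```

With the sample header this prints "HOME.heightLimit,2,not found,0": a column number of 0 still signals a missing title, and the titles that were found are left untouched.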
 
Old 03-18-2023, 07:49 AM   #10
sean mckinney
Member
 
Registered: Mar 2004
Posts: 32

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by MadeInGermany
The $(Limit) and $(status) and $(act_date) evaluate to $0 i.e. the entire line becomes a number.
Thanks. Can I ask what you mean by "the entire line becomes a number"?
 
Old 03-26-2023, 11:35 AM   #11
sean mckinney
Member
 
Registered: Mar 2004
Posts: 32

Original Poster
Rep: Reputation: 0
Ahhh, I might have found the problem, but I 'found' this when looking at another puzzle. Some of the csv's use a comma as the field separator; the others seem to use a tab as the field separator.

I have read that I can use two field separators, and experimenting seems to show that replacing
"gawk -F,"
with
"gawk -F[,"\T"]"
works.
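For comparison, a minimal sketch of the more conventional spelling, assuming two made-up one-line inputs. The lowercase \t is the standard awk escape for a tab; the uppercase \T is not a standard escape, so it may not mean "tab" in every awk. Putting the whole separator in single quotes keeps the shell from interpreting the brackets or the backslash.

```shell
#!/bin/sh
# Two hypothetical one-line inputs: one comma-separated, one tab-separated.
comma_line='a,b,c'
tab_line=$(printf 'a\tb\tc')

# The bracket expression [,\t] makes awk split on either a comma or a tab.
from_comma=$(printf '%s\n' "$comma_line" | awk -F'[,\t]' '{ print NF ":" $2 }')
from_tab=$(printf '%s\n' "$tab_line" | awk -F'[,\t]' '{ print NF ":" $2 }')

printf '%s\n' "$from_comma" "$from_tab"
```

Both lines should come out as "3:b", that is, three fields with "b" as the second one, whichever separator the file uses.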
 
  


All times are GMT -5.