I can't figure out why this script is not giving me the output I expect. I assume it's my fault, so pointers would be appreciated.
I want the script to scan multiple CSV files for certain column titles and tell me whether or not the titles were found and, if they were found, what their column/field numbers are. The output is sent to another CSV file in a different folder.
The line numbers, 1) to 17), have been added for reference for this post.
Code:
1) gawk -F, '{if(NR ==1){
2) print "a ,FILENAME ,FNR ,NR ,a ,set_max_height_col_label ,set_max_height_col_nos ,a ,max_height_status ,status_clo_nos ,a ,activation_date ,activ_date_col_nos" > ("output/set_max_height_all_hopefully.csv")
3) print " " >> ("output/set_max_height_all_hopefully.csv")
4) }
5) if (FNR ==1) {Limit=0 ;status=0 ;act_date=0 ;cCol_titles_found=0 }
6) if ((FNR <=2) && (cCol_titles_found <1) && ($1 !~/sep/) && ($1 !="")) { cCol_titles_found++
7) for (i=1; i<=NF; i++){if (($i ~/HOME.heightLimit/) && ($i !~/HOME.heightLimitStatus/)) {Limit = i }
8) if (($i ~/HOME.heightLimitStatus/) || ($i ~/HOME.isReachedLimitHeight/)) {status = i }
9) if (($i ~/RECOVER.activeTimestamp/)|| ($i ~/DETAILS.activeTimestamp/)) {act_date = i }
10) }
11)
12) if (Limit ==0) {$(Limit) =1}
13) if (status ==0) {$(status) =2}
14) if (act_date ==0) {$(act_date) =3}
15) print " ,"FILENAME","FNR","NR","$(Limit)","Limit","$(status)","status", "$(act_date)","act_date >> ("output/set_max_height_all_hopefully.csv")
# close ------------------------------------------------------------------------------------(FNR <=2) && ......
16) }
17) }' *.csv > output/$file1
I believe everything works correctly up to 'line 11'.
In some of the CSV files some of the columns do not exist, so in those cases "Limit", "status" and "act_date" will, as appropriate, be 0 (zero) at 'line 11'. They are zeroed, by 'line 5', at the start of the scan of each 'new' CSV file. Where this applies, lines 12 to 14 are intended to prevent the printing of "$0" in 'line 15' and to assign non-zero values to the relevant $(xyz).
This appears to work, as I see 1, 2, 3 in the correct places in the output.
I think the problem lies in the output of line 15. Wherever the column titles have been found i.e. "Limit" or "status" or "act_date" are not zero, the cells in the output csv, that should contain the column's title, are empty.
Any ideas where I am going wrong?
Thanks.
Last edited by sean mckinney; 03-13-2023 at 01:03 PM.
I want the script to scan multiple csv's for certain column titles and tell me whether or not the titles were found and, if they were found, what their column/field numbers are.
A proper CSV parser will have already encountered and dealt with all the bugs and edge cases you're currently spending time re-creating.
Yes. If you are in a hurry then use Perl's CSV parser. If you are not in a hurry then the upstream version of AWK now has CSV support and it is just a matter of years until the changes percolate out to the various distros.
Based on the Text::CSV manual page, untested:
Code:
#!/usr/bin/perl
use Text::CSV;
use strict;
use warnings;
my $file = shift || die;
my @rows;
# Read/parse CSV
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open(my $fh, "<:encoding(utf8)", $file) or die("$file: $!");
while (my $row = $csv->getline ($fh)) {
    $row->[2] =~ m/pattern/ or next; # 3rd field should match
    push @rows, $row;
}
close $fh;
# and write as CSV
open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
$csv->say ($fh, $_) for @rows;
close $fh or die "new.csv: $!";
exit(0);
You need to prepare a small csv file, reproduce the error and insert logging statements to that script and you will "easily" find the problem. But a real csv parser is definitely a much better and safer way.
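A rough sketch of what that could look like, using a cut-down version of the title search from post 1. The scratch directory, the file name `small.csv` and the `DEBUG` line format are made up for this example. Only two of the three searches are included to keep it short.

```shell
# Scratch directory and file name are invented for this example.
tmp=$(mktemp -d)
printf 'sep=,\nCUSTOM.date,HOME.heightLimit,HOME.heightLimitStatus\n1,2,3\n' > "$tmp/small.csv"

trace=$(awk -F, '
FNR == 1 { Limit = 0; status = 0 }
FNR <= 2 && $1 !~ /sep/ && $1 != "" {
    for (i = 1; i <= NF; i++) {
        if ($i ~ /HOME.heightLimit/ && $i !~ /HOME.heightLimitStatus/) Limit = i
        if ($i ~ /HOME.heightLimitStatus/) status = i
        # Log each field and the current state of the search to stderr.
        printf "DEBUG line %d field %d=[%s] Limit=%d status=%d\n", FNR, i, $i, Limit, status > "/dev/stderr"
    }
}' "$tmp/small.csv" 2>&1)

printf '%s\n' "$trace"
rm -r "$tmp"
```

Writing the log to stderr keeps it out of the real output file, and the square brackets make empty fields easy to spot.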
Thanks Boughtonp etc..
I am aware that a proper CSV parser would be better, but I would have to learn that from scratch and, at a guess, I would still end up asking questions about things I cannot see.
The gist of the script in this thread is intended to be added to the script of the post that Boughton cited. That script works, to my satisfaction, scanning around 3700 CSV files, totalling around 5,200,000+ lines, whilst looking at the contents of around 50 columns in those files.
Given that the script that Boughton cited is now around 1100 lines long it would be a massive amount of work to
a) learn a parser and
b) rewrite the code.
Just as a matter of interest I may well do just that, since I am thinking of trying to adapt the larger script to run on a Macbook. But for the moment, and taking it that I accept that folks are correct and that a proper parser would be better, I would still like to know if anyone can see what the problem is in the script in post number 1 of this thread.
Last edited by sean mckinney; 03-13-2023 at 01:32 PM.
If you refactor it for readability, the bug might become more visible either during the rewrite or as a result of being able to see it more clearly afterwards. There are a lot of ways that script could be made more readable.
Code:
#!/usr/bin/awk -f
BEGIN {
    FS =","
    out = "output/set_max_height_all_hopefully.csv"
}
NR == 1 {
    print "a ,FILENAME ,FNR ,NR ,a ,set_max_height_col_label " \
          ",set_max_height_col_nos ,a ,max_height_status ," \
          "status_clo_nos ,a ,activation_date ,activ_date_col_nos" \
          > out
    print " " > out
}
FNR == 1 {
    Limit=0;
    status=0;
    act_date=0;
    cCol_titles_found=0;
}
(FNR <= 2) && (cCol_titles_found < 1) && ($1 !~ /sep/) && ($1 != "") {
    cCol_titles_found++
    for (i=1; i<=NF; i++) {
        if (($i ~/HOME.heightLimit/) &&
            ($i !~/HOME.heightLimitStatus/)) {
            Limit = i
        }
        if (($i ~/HOME.heightLimitStatus/) ||
            ($i ~/HOME.isReachedLimitHeight/)) {
            status = i
        }
        if (($i ~/RECOVER.activeTimestamp/) ||
            ($i ~/DETAILS.activeTimestamp/)) {
            act_date = i
        }
    }
    if (Limit ==0) {
        $(Limit) = 1
    }
    if (status ==0) {
        $(status) = 2
    }
    if (act_date ==0) {
        $(act_date) = 3
    }
    print " c," FILENAME "," FNR "," NR ", c ," $(Limit) \
          "," Limit ", c ," $(status) "," status ", c ,"\
          $(act_date) "," act_date \
          >> out
}
So, for example, most of the if() clauses there can be made more AWKish by omitting the if() part. Indentation and line wrapping are generally considered helpful, too. If you need line numbers, then most editors have an option to turn that on or off for you so that the paste into LQ is not broken. Or, as a last resort, use cat -n on the script.
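To illustrate the "omit the if()" point on its own, here is a small side-by-side sketch. The two-line input and the variable names are invented for the example. Both commands should behave the same, since an awk pattern in front of a block acts as the condition for that block.

```shell
input='title row
data row'

# Condition written as an if() inside an unconditional block.
with_if=$(printf '%s\n' "$input" | awk '{ if (FNR == 1) { print "header: " $0 } }')

# The same condition written as a pattern in front of the block.
with_pattern=$(printf '%s\n' "$input" | awk 'FNR == 1 { print "header: " $0 }')

printf '%s\n%s\n' "$with_if" "$with_pattern"
```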
12) if (Limit ==0) {$(Limit) =1}
13) if (status ==0) {$(status) =2}
14) if (act_date ==0) {$(act_date) =3}
Whenever Limit, status or act_date is 0, $(Limit), $(status) and $(act_date) evaluate to $0, so the assignment replaces the entire line with a number. Is this intended?
Perhaps you simply want to change the variables' values?:
Code:
12) if (Limit ==0) {Limit =1}
13) if (status ==0) {status =2}
14) if (act_date ==0) {act_date =3}
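A small sketch of the effect, in case it helps to see it. The header row is made up, and the values of Limit and status are set by hand to what the title search would find for it. Assigning to $0 makes awk split the new record again, so the original fields are gone by the time the print runs.

```shell
# Hypothetical header row: heightLimit is in column 2, no status column exists.
header='CUSTOM.date,HOME.heightLimit,HOME.speed'

debug_log=$(printf '%s\n' "$header" | awk -F, '
{
    Limit = 2; status = 0                 # hand-set search results for this row
    printf "before: NF=%d $0=[%s]\n", NF, $0
    if (status == 0) { $(status) = 2 }    # same as line 13) of post 1: assigns to $0
    printf "after:  NF=%d $0=[%s]\n", NF, $0
    printf "title:  [%s]\n", $(Limit)     # $2 of the new one-field record
}')

printf '%s\n' "$debug_log"
```

On this input the last line should come out as `title:  []`, which matches the empty cells described in post 1: a title that was found disappears as soon as one of the other titles was not.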
Last edited by MadeInGermany; 03-16-2023 at 10:17 AM.
@Turbocapatalist, thanks for the indentation example, it's proving very useful in tidying up the main script, which was a MESS. I will look into trying your BEGIN, it's something I have been meaning to do.
That said I still couldn't see what's wrong with the tidied up version of my idea, so, in the end I rewrote it using simple "if" and "print" statements. The new version works.
Many thanks.
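The rewritten version was not posted, so the following is only a guess at one possible shape for it, not the actual script. It picks the text to print with plain comparisons and never assigns to a field, so the record stays intact and the column numbers stay 0 for titles that were not found. The header row and the names `limit_label` and `status_label` are invented for the example.

```shell
# Hypothetical header row: heightLimit is in column 2, no status column exists.
header='CUSTOM.date,HOME.heightLimit,HOME.speed'

row=$(printf '%s\n' "$header" | awk -F, '
{
    Limit = 0; status = 0
    for (i = 1; i <= NF; i++) {
        if ($i ~ /HOME.heightLimit/ && $i !~ /HOME.heightLimitStatus/) Limit = i
        if ($i ~ /HOME.heightLimitStatus/ || $i ~ /HOME.isReachedLimitHeight/) status = i
    }
    # Choose a placeholder or the real title without touching $0.
    limit_label  = (Limit  == 0) ? 1 : $(Limit)
    status_label = (status == 0) ? 2 : $(status)
    print limit_label "," Limit "," status_label "," status
}')

printf '%s\n' "$row"
```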