LinuxQuestions.org


sean mckinney 11-14-2021 11:37 AM

tail help
 
As an addition to what I was doing / am doing in https://www.linuxquestions.org/quest...54#post6299554

I want to write the last 20 lines of every CSV file in one folder to a single CSV file in another folder.

I think

tail -n 20 *.csv > folder2/last-20s.csv

works beautifully. I am currently on a Windows computer and can't check the actual command I wrote, hence the "I think".

But while writing each qualifying line to last-20s.csv, I would like, if possible, to add:
1) the running count of lines across all the CSV files processed up to that point,
2) the running count of lines within the individual CSV file being processed,
3) the name of the individual CSV file being processed.

In awk terms, if that terminology applies to "tail" at all, the above would amount to NR, FNR and FILENAME.
If relevant and necessary, I would be using awk or gawk as it/they is/are the only programming language/s with which I have any familiarity.
For reference, there are 2000+ CSV files being scanned, with well over 4,000,000 lines in total.
Any help would be gratefully received.

shruggy 11-14-2021 12:33 PM

Quote:

Originally Posted by sean mckinney (Post 6301270)
I think

tail -n 20 *.csv > folder2/last-20s.csv

works beautifully

The result won't be a CSV anymore unless you also specify -q, because when tail is given more than one file it prints a header with the file name before each file's output:
Code:

$ tail -n3 <(seq 5) <(seq 20)
==> /dev/fd/63 <==
3
4
5

==> /dev/fd/62 <==
18
19
20

Code:

$ tail -qn3 <(seq 5) <(seq 20)
3
4
5
18
19
20
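For comparison, if only the file name (item 3 in the question) were needed, tail itself could still do the trimming and each line could be tagged afterwards. This is a minimal sketch, assuming a POSIX shell and any awk; the function name tail_with_names is made up for illustration and does not appear in the original posts:

```shell
# Sketch only: prefix each of the last N lines of every file with its name.
# tail_with_names is a hypothetical helper name.
tail_with_names() {
    n=$1; shift
    for f in "$@"; do
        # awk -v interprets backslash escapes in the value, so file names
        # containing backslashes would need extra care.
        tail -n "$n" -- "$f" | awk -v name="$f" -v OFS=, '{ print name, $0 }'
    done
}
```

Called as `tail_with_names 20 *.csv > folder2/last-20s.csv`, this starts one tail and one awk per file. It does not provide the NR and FNR counts, since awk only ever sees the lines that tail passes on.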

Quote:

If relevant and necessary, I would be using awk or gawk as it/they is/are the only programming language/s with which I have any familiarity.
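Yes, given your requirements, I guess awk is the more reasonable choice here, although emulating tail with awk is rather inefficient: awk has to read every line of every file, whereas tail can usually skip straight to the end of a regular file.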

With gawk, I'd try something like
Code:

gawk -vOFS=, 'ENDFILE{print NR,FNR,FILENAME; print}' *.csv
This prints NR, FNR and FILENAME once per file, followed by that file's last line. The second print could be replaced with something different, e.g. with
Code:

system("tail -n20 " FILENAME)
Then again, Miller (mlr) has both a tail verb and awk-like built-in variables.

Turbocapitalist 11-14-2021 02:03 PM

Perl might do the job.

Code:

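perl -n -e '
        # Print the summary for the file just finished, then reset state.
        sub flush_file {
              print "\n$file Count: $c\n";
              print(join("\n", @a), "\n");
              @a = ();
              $c = 0;
        }

        # $ARGV holds the name of the file currently being read.
        flush_file() if defined($file) && $file ne $ARGV;
        $file = $ARGV;

        chomp;
        $c++;
        push(@a, $_);             # keep the whole line, not just the first field
        shift(@a) if @a > 20;     # retain only the last 20 lines

        END {
              flush_file() if defined($file);
        }
' *.csv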

Thus it can be made to work a little like AWK, but with more flexibility.

