I have a (large) file that contains a specific string on the nth line. I need a fast command line or script that will output the _previous line_, i.e. line number n - 1.
It can be awk, Perl, ksh, ... as long as it can run under Solaris from the command line and it is very fast (and simple).
As an example, say the file looks like this:
line 1
line 2
line 3
....
this is the line I want to extract
abcXXXXXdef
....
line 100
line 101
When I execute:
yourscript XXXXX
the output is:
this is the line I want to extract
As a bonus question, it would be even better if the script could return specific strings from the previous line. The previous line's format looks like this:
[1 0 0 1 111.11 222.22] 0 0 333.33 444.44
yourscript XXXXX
would ideally return
111.11 222.22 333.33 444.44
Note that the 999.99 format can vary: each value is a decimal number with any number of decimal digits.
This may be slightly OT. I simply tried extracting a line given the line number from a largish file (on Solaris). My program is:
Code:
#!/usr/bin/ksh
echo "Enter file name : \c"; read fname
lc=$(wc -l $fname |awk '{print $1}')
ml=$(($lc/2))
echo "Line count : $lc"
echo "Middle line : $ml"
echo "--------headtail---------"
time head -n ${ml} $fname |tail -1
echo "--------sed--------------"
time sed -n -e "${ml},${ml}p" $fname
echo "--------nl--------------"
time nl -ba -nln -s+ $fname |grep "^${ml}" |cut -d"+" -f2
echo "--------awk------------"
time nawk -v ml=$ml 'NR==ml {print}' $fname
The head/tail approach by perfect_circle was generally (over 8-10 runs) the SLOWEST when the lines were short:
Code:
SF1B : /supmis/ora/11mar05 > sd
Enter file name : bbbb
Line count : 390614
Middle line : 195307
--------headtail---------
000401557901
real 0m1.66s
user 0m0.44s
sys 0m2.74s
--------sed--------------
000401557901
real 0m0.35s
user 0m0.15s
sys 0m0.20s
--------nl--------------
000401557901
real 0m0.93s
user 0m1.07s
sys 0m0.27s
--------awk------------
000401557901
real 0m0.87s
user 0m0.81s
sys 0m0.05s
SF1B : /supmis/ora/11mar05 >
BUT, the same approach was generally FASTEST when the lines were much longer:
Code:
SF1B : /supmis/ora/11mar05 > sd
Enter file name : 0004newpl.dat
Line count : 390614
Middle line : 195307
--------headtail---------
0004|504807296|000401557901|Y|N|08050|GAA05|01|735|RTL-INDIVIDUAL|GRP AC WITH AVGBAL= 5000|C|no|1. Up to Rs 1 lac|N|0|G|More Than 3 Months|27-JUL-2004|11-MAR-2005|VINOD VASANT PATIL|R1|N|INR|N|1|SBA|SBKIT|15-NOV-2004
real 0m2.85s
user 0m0.95s
sys 0m3.89s
--------sed--------------
0004|504807296|000401557901|Y|N|08050|GAA05|01|735|RTL-INDIVIDUAL|GRP AC WITH AVGBAL= 5000|C|no|1. Up to Rs 1 lac|N|0|G|More Than 3 Months|27-JUL-2004|11-MAR-2005|VINOD VASANT PATIL|R1|N|INR|N|1|SBA|SBKIT|15-NOV-2004
real 0m4.74s
user 0m2.24s
sys 0m2.47s
--------nl--------------
0004|504807296|000401557901|Y|N|08050|GAA05|01|735|RTL-INDIVIDUAL|GRP AC WITH AVGBAL= 5000|C|no|1. Up to Rs 1 lac|N|0|G|More Than 3 Months|27-JUL-2004|11-MAR-2005|VINOD VASANT PATIL|R1|N|INR|N|1|SBA|SBKIT|15-NOV-2004
real 0m2.98s
user 0m3.25s
sys 0m1.92s
--------awk------------
0004|504807296|000401557901|Y|N|08050|GAA05|01|735|RTL-INDIVIDUAL|GRP AC WITH AVGBAL= 5000|C|no|1. Up to Rs 1 lac|N|0|G|More Than 3 Months|27-JUL-2004|11-MAR-2005|VINOD VASANT PATIL|R1|N|INR|N|1|SBA|SBKIT|15-NOV-2004
real 0m3.22s
user 0m2.51s
sys 0m0.71s
SF1B : /supmis/ora/11mar05 >
Maybe there is nothing intriguing here and I'm just being picky, but if there is a reason for the difference, I would like to know it.
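One factor that may be worth ruling out: as written in the script above, the sed and nawk commands keep reading to the end of the file after printing the target line, whereas head stops at line ml. Below is a minimal sketch of early-exit variants, assuming a POSIX sed and awk (on Solaris, nawk as used above). The function names line_sed and line_awk are hypothetical, and these variants have not been timed on the files above, so this may not by itself explain the reversal between short and long lines.

```shell
#!/bin/sh
# Print line $1 of file $2, then stop reading immediately.
# line_sed and line_awk are illustrative names, not part of the original script.

# sed: print line n and quit, instead of scanning to EOF.
line_sed() {
    sed -n "${1}{p;q;}" "$2"
}

# awk: same idea; 'exit' stops input processing after the match.
line_awk() {
    awk -v n="$1" 'NR == n { print; exit }' "$2"
}
```

For the first test file, the calls would be `line_sed 195307 bbbb` and `line_awk 195307 bbbb`.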
Perfect_circle's elegant solution makes two passes over the file, doesn't it? First a grep, then a head.
Would a Perl script (Perl is a language I do not know) that makes only one pass be better? That is, something along the lines of this pseudocode:
previous_line = blank
do while pattern not found and not EOF:
    read current_line
    if current_line matches *pattern_we_are_looking_for* then {output the 4 parameters from previous_line, then exit loop}
    previous_line = current_line
end loop
exit
#!/usr/bin/perl -w
# Print the line before each line that matches a pattern.
use strict;

my $PATTERN = "^abc.*def";    # what to find

if ( @ARGV != 1 ) {
    print "USE: cooltool.pl filename\n";
    exit(1);
}
my $FILENAME = $ARGV[0];
open( my $fh, '<', $FILENAME ) or die "error opening $FILENAME: $!\n";

my $LASTLINE;                 # previous line; undefined while reading line 1
while ( my $LINE = <$fh> ) {
    # Skip the print if the match is on the very first line (no previous line).
    if ( $LINE =~ /$PATTERN/ && defined $LASTLINE ) {
        # substitute
        # "[1 0 0 1 111.11 222.22] 0 0 333.33 444.44"
        # by "111.11 222.22 333.33 444.44"
        $LASTLINE =~ s/^\[\d+ \d+ \d+ \d+ (\d+\.?\d* \d+\.?\d*)\] \d+ \d+ (\d+\.?\d* \d+\.?\d*)/$1 $2/;
        print $LASTLINE;
        #last;    # uncomment if you ONLY need to find the 1st occurrence
    }
    $LASTLINE = $LINE;
}
close($fh);
exit(0);
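Since the original question also allowed awk, the same one-pass idea can be sketched without Perl. This is a minimal sketch, assuming a POSIX awk (nawk on Solaris) and the exact previous-line layout given in the question. The function names prev_line and prev_numbers are hypothetical. Unlike the Perl script, it takes the search string as an argument and stops at the first match.

```shell
#!/bin/sh
# prev_line PATTERN FILE
# Print the line immediately before the first line matching PATTERN.
# PATTERN is used as an awk regular expression (assumption: no backslashes in it).
prev_line() {
    awk -v pat="$1" '
        $0 ~ pat { if (NR > 1) print prev; exit }   # first match: print saved line, stop
        { prev = $0 }                               # otherwise remember this line
    ' "$2"
}

# prev_numbers PATTERN FILE
# Same, but print only the four decimal values, assuming the layout
# "[1 0 0 1 111.11 222.22] 0 0 333.33 444.44" (fields 5, 6, 9 and 10).
prev_numbers() {
    prev_line "$1" "$2" | awk '{ gsub(/\[|\]/, ""); print $5, $6, $9, $10 }'
}
```

With these loaded, `prev_numbers XXXXX yourfile` plays the role of `yourscript XXXXX` from the question. Picking the values by field position rather than by a full regular expression is a design choice that depends on the line layout never changing.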