LinuxQuestions.org
03-17-2005, 07:34 PM   #1
heyyou
LQ Newbie
 
Registered: Mar 2005
Posts: 2

Rep: Reputation: 0
Extracting a specific line from an ASCII file


I have a (large) file that contains a specific string on the nth line. I need a fast command line or script that will output the _previous line_, i.e. line number n - 1.

It can be awk, Perl, ksh, ... as long as it can run under Solaris from the command line and it is very fast (and simple).

As an example, say the file looks like this:

line 1
line 2
line 3
....
this is the line I want to extract
abcXXXXXdef
....
line 100
line 101

When I execute:

yourscript XXXXX

the output is:

this is the line I want to extract

As a bonus question, it would be even better if the script could return specific strings from the previous line! The previous line looks like this:
[1 0 0 1 111.11 222.22] 0 0 333.33 444.44

yourscript XXXXX
would ideally return
111.11 222.22 333.33 444.44

Note that the number format can vary: each value is a decimal number with any number of decimal digits.

Thanks
 
03-17-2005, 08:10 PM   #2
AltF4
Member
 
Registered: Sep 2002
Location: .at
Distribution: SuSE, Knoppix
Posts: 532

Rep: Reputation: 31
Code:
#!/usr/bin/perl -w

use strict;

my $LINE;               # Line buffer
my $LINENUM = 4;        # Line number to print (hardcoded)

# Expect exactly one argument: the file to read
my $NARGS = $#ARGV + 1;
if ( $NARGS != 1 ) {
        print "USE: cooltool.pl filename\n";
        exit(1);
}
my $FILENAME = $ARGV[0];
# Three-argument open with a lexical filehandle; report the OS error on failure
open ( my $FH , "<" , $FILENAME ) or die "error opening $FILENAME: $!\n";

while ( $LINE = <$FH> ) {
        if ( $. == $LINENUM ) {
                # substitute
                # "[1 0 0 1 111.11 222.22] 0 0 333.33 444.44"
                # by "111.11 222.22 333.33 444.44"
                $LINE =~ s/^\[\d+ \d+ \d+ \d+ (\d+\.?\d* \d+\.?\d*)\] \d+ \d+ (\d+\.?\d* \d+\.?\d*)/$1 $2/;
                print $LINE;
                last;
        }
}

close($FH);
exit(0);
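
The regular expression above targets one specific layout. As a rough sketch of the same extraction without a long regex, assuming the line always has ten whitespace-separated fields with the wanted values in fields 5, 6, 9 and 10 (an assumption drawn from the single sample line in the first post), awk's field splitting can do the work. The function name extract_numbers is made up for illustration.

```shell
# extract_numbers: hypothetical helper, reads lines on stdin.
# Assumes the sample layout "[1 0 0 1 111.11 222.22] 0 0 333.33 444.44":
# ten whitespace-separated fields, wanted values in fields 5, 6, 9 and 10.
extract_numbers() {
    awk '{
        sub(/\]$/, "", $6)      # drop the closing bracket attached to field 6
        print $5, $6, $9, $10
    }'
}
```

Usage: echo '[1 0 0 1 111.11 222.22] 0 0 333.33 444.44' | extract_numbers
Because the fields are printed as text, any number of decimal digits passes through unchanged.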
 
03-17-2005, 09:22 PM   #3
perfect_circle
Senior Member
 
Registered: Oct 2004
Location: Athens, Greece
Distribution: Slackware, arch
Posts: 1,783

Rep: Reputation: 53
Something like this will output the line preceding every line that matches a pattern (XXXXX):
Code:
#!/bin/bash
# usage: script <pattern> <filename>
# Prints the line before each line that matches <pattern>.
for i in $(grep -n -e "$1" "$2" | cut -d':' -f1); do
    head -n $((i-1)) "$2" | tail -n 1
done
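
The loop above reads the file once for grep and then again for every match. A possible single-pass sketch keeps only the previous line in memory. The function name prev_line is hypothetical, the pattern is treated as an awk regular expression rather than a grep one, and on Solaris the old /usr/bin/awk may need to be swapped for nawk to get -v.

```shell
# prev_line: hypothetical helper. $1 = pattern (awk regex), $2 = file.
# Prints the line before every matching line, reading the file once.
prev_line() {
    awk -v pat="$1" '
        NR > 1 && $0 ~ pat { print prev }   # a match on line 1 has no predecessor
        { prev = $0 }                       # remember this line for the next one
    ' "$2"
}
```

Usage: prev_line XXXXX bigfile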
 
03-18-2005, 05:50 AM   #4
dustu76
Member
 
Registered: Sep 2004
Distribution: OpenSuSe
Posts: 153

Rep: Reputation: 30
This may be slightly OT. I simply tried extracting a line given the line number from a largish file (on Solaris). My program is:

Code:
#!/usr/bin/ksh

echo "Enter file name : \c"; read fname
lc=$(wc -l "$fname" |awk '{print $1}')
ml=$((lc/2))
echo "Line count : $lc"
echo "Middle line : $ml"

echo "--------headtail---------"
time head -n ${ml} "$fname" |tail -1

echo "--------sed--------------"
time sed -n -e "${ml},${ml}p" "$fname"

echo "--------nl--------------"
time nl -ba -nln -s+ "$fname" |grep "^${ml}" |cut -d"+" -f2

echo "--------awk------------"
time nawk -v ml=$ml 'NR==ml {print}' "$fname"
The head/tail approach by perfect_circle was generally (over 8-10 runs) the SLOWEST when each line was short:

Code:
SF1B : /supmis/ora/11mar05 > sd
Enter file name : bbbb
Line count : 390614
Middle line : 195307
--------headtail---------
000401557901

real    0m1.66s
user    0m0.44s
sys     0m2.74s
--------sed--------------
000401557901

real    0m0.35s
user    0m0.15s
sys     0m0.20s
--------nl--------------
000401557901

real    0m0.93s
user    0m1.07s
sys     0m0.27s
--------awk------------
000401557901

real    0m0.87s
user    0m0.81s
sys     0m0.05s
SF1B : /supmis/ora/11mar05 >
BUT, the same approach was generally FASTEST when the lines were much longer:

Code:
SF1B : /supmis/ora/11mar05 > sd
Enter file name : 0004newpl.dat
Line count : 390614
Middle line : 195307
--------headtail---------
0004|504807296|000401557901|Y|N|08050|GAA05|01|735|RTL-INDIVIDUAL|GRP AC WITH AVGBAL= 5000|C|no|1. Up to Rs 1 lac|N|0|G|More Than 3 Months|27-JUL-2004|11-MAR-2005|VINOD VASANT PATIL|R1|N|INR|N|1|SBA|SBKIT|15-NOV-2004

real    0m2.85s
user    0m0.95s
sys     0m3.89s
--------sed--------------
0004|504807296|000401557901|Y|N|08050|GAA05|01|735|RTL-INDIVIDUAL|GRP AC WITH AVGBAL= 5000|C|no|1. Up to Rs 1 lac|N|0|G|More Than 3 Months|27-JUL-2004|11-MAR-2005|VINOD VASANT PATIL|R1|N|INR|N|1|SBA|SBKIT|15-NOV-2004

real    0m4.74s
user    0m2.24s
sys     0m2.47s
--------nl--------------
0004|504807296|000401557901|Y|N|08050|GAA05|01|735|RTL-INDIVIDUAL|GRP AC WITH AVGBAL= 5000|C|no|1. Up to Rs 1 lac|N|0|G|More Than 3 Months|27-JUL-2004|11-MAR-2005|VINOD VASANT PATIL|R1|N|INR|N|1|SBA|SBKIT|15-NOV-2004

real    0m2.98s
user    0m3.25s
sys     0m1.92s
--------awk------------
0004|504807296|000401557901|Y|N|08050|GAA05|01|735|RTL-INDIVIDUAL|GRP AC WITH AVGBAL= 5000|C|no|1. Up to Rs 1 lac|N|0|G|More Than 3 Months|27-JUL-2004|11-MAR-2005|VINOD VASANT PATIL|R1|N|INR|N|1|SBA|SBKIT|15-NOV-2004

real    0m3.22s
user    0m2.51s
sys     0m0.71s
SF1B : /supmis/ora/11mar05 >
Maybe there is nothing intriguing here & I'm just being picky (but if there is - I would like to know the reason)....
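
One factor that may contribute, offered as an assumption rather than a measured explanation: head stops reading at line ml, while the sed and nawk commands above print the line and then keep scanning to the end of the file. A sketch of early-exit variants that could be added to the comparison (the function names line_sed and line_awk are made up here, and how they would time on the machine above is unknown):

```shell
# $1 = line number, $2 = file.
# Both variants stop reading as soon as the requested line has been printed.
line_sed() {
    sed -n "${1}{p;q;}" "$2"
}

line_awk() {
    awk -v n="$1" 'NR == n { print; exit }' "$2"
}
```

Usage: time line_sed "$ml" "$fname" and time line_awk "$ml" "$fname", alongside the four commands already in the script.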
 
03-18-2005, 10:27 AM   #5
heyyou
LQ Newbie
 
Registered: Mar 2005
Posts: 2

Original Poster
Rep: Reputation: 0
perfect_circle's elegant solution makes two passes over the file, doesn't it? First a grep, then a head.

Would a Perl script (Perl being a language I do not know) that makes only one pass be better? I.e. with pseudocode along these lines:

previous_line = blank
do while pattern not found and not EOF:
    read current_line
    if current_line matches *pattern_we_are_looking_for* then {output the 4 parameters from previous_line, then exit loop}
    previous_line = current_line
end loop
exit
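
The pseudocode above maps fairly directly onto awk. A minimal sketch, assuming the previous line has the ten-field layout shown in the first post and that only the first match matters; prev_fields is a made-up name, and on Solaris nawk may be needed in place of awk:

```shell
# prev_fields: hypothetical helper. $1 = pattern (awk regex), $2 = file.
# Stops at the first matching line and prints fields 5, 6, 9 and 10 of the
# line before it, with the trailing "]" removed from field 6.
prev_fields() {
    awk -v pat="$1" '
        $0 ~ pat {
            if (NR > 1) {
                split(prev, f, " ")      # split the previous line on whitespace
                sub(/\]$/, "", f[6])
                print f[5], f[6], f[9], f[10]
            }
            exit                         # first match only, one pass
        }
        { prev = $0 }                    # previous_line = current_line
    ' "$2"
}
```

Usage: prev_fields XXXXX bigfile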
 
03-19-2005, 07:20 AM   #6
perfect_circle
Senior Member
 
Registered: Oct 2004
Location: Athens, Greece
Distribution: Slackware, arch
Posts: 1,783

Rep: Reputation: 53
If efficiency and speed are what you need, you should write this in C. It's really simple.
 
03-21-2005, 02:48 AM   #7
AltF4
Member
 
Registered: Sep 2002
Location: .at
Distribution: SuSE, Knoppix
Posts: 532

Rep: Reputation: 31
Code:
#!/usr/bin/perl -w

# print line before pattern

use strict;

my $LINE;               # Line buffer
my $PATTERN = "^abc.*def";      # what to find

my $NARGS = $#ARGV + 1;
if ( $NARGS != 1 ) {
        print "USE: cooltool.pl filename\n";
        exit(1);
}
my $FILENAME = $ARGV[0];
# Three-argument open with a lexical filehandle; report the OS error on failure
open ( my $FH , "<" , $FILENAME ) or die "error opening $FILENAME: $!\n";


my $LASTLINE;           # previous line; undefined until one line has been read
while ( $LINE = <$FH> ) {
        # a match on the very first line has no previous line to print
        if ( $LINE =~ /$PATTERN/ && defined $LASTLINE ) {
                # substitute
                # "[1 0 0 1 111.11 222.22] 0 0 333.33 444.44"
                # by "111.11 222.22 333.33 444.44"
                $LASTLINE =~ s/^\[\d+ \d+ \d+ \d+ (\d+\.?\d* \d+\.?\d*)\] \d+ \d+ (\d+\.?\d* \d+\.?\d*)/$1 $2/;
                print $LASTLINE;
                #last; # uncomment if you ONLY need to find the 1st occurrence
        }
        $LASTLINE = $LINE;
}

close($FH);
exit(0);

Last edited by AltF4; 03-21-2005 at 05:57 PM.
 
03-21-2005, 04:19 PM   #8
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 36
Code:
head -linenumber | tail -1
For a really short command line.
 
  


All times are GMT -5.
