LinuxQuestions.org
Old 10-13-2010, 04:56 PM   #1
ezekieldas
Member
 
Registered: Mar 2010
Posts: 122

Rep: Reputation: 16
scripting problem: remove lines above foo


I thought this was a cut trick but maybe it's a sed trick?

cat /tmp/foo

a
b
c
7
a
b
88

Reading from the bottom up, remove everything above the first 'a' to get:

cat /tmp/foo

a
b
88
 
Old 10-13-2010, 05:22 PM   #2
paulsm4
LQ Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
There are many alternatives, including "sed". Here's an example:
Code:
sed -n -e '5,$p' foo.txt
Quote:
a
b
88
 
Old 10-13-2010, 05:42 PM   #3
ezekieldas
Member
 
Registered: Mar 2010
Posts: 122

Original Poster
Rep: Reputation: 16
Hmm... thanks paulsm4. I'm trying to recall an example of this. The real case involves *any number of lines*, unlike my a, b, 88 example. I think your solution assumes the 'a' is always going to be on the 5th line. It may not be.

Reading the file from the bottom up, find the first instance of 'a' and remove everything above it. The character 'a' may be any number of lines above the last line of the file.

I'm stumped... for the moment.
 
Old 10-13-2010, 06:15 PM   #4
paulsm4
LQ Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
Forget about "a" for a minute. Let's say you want to print the first line containing "7" and everything after it. Easy:
Code:
$ sed -n -e '/7/,$p' tmp.txt
7
a
b
88
Or first "b":
Code:
$ sed -n -e '/b/,$p' tmp.txt
b
c
7
a
b
88
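The same address form can start at the last match rather than the first if the line number is looked up beforehand. A minimal sketch, assuming the sample data from post #1 and a hypothetical helper name from_last_match; it reads the file twice, so it suits regular files rather than pipes:

```shell
# Hypothetical helper: print from the last line matching a pattern to the end of the file.
from_last_match() {
    pattern=$1
    file=$2
    # grep -n prefixes each matching line with "N:"; keep the last N.
    n=$(grep -n -- "$pattern" "$file" | tail -n 1 | cut -d: -f1)
    # No match: print nothing and report failure.
    [ -n "$n" ] || return 1
    sed -n "${n},\$p" "$file"
}

sample=$(mktemp)
printf '%s\n' a b c 7 a b 88 > "$sample"
result=$(from_last_match 'a' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With the sample data this prints a, b and 88, because the last 'a' is on line 5.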

Last edited by paulsm4; 10-13-2010 at 07:36 PM.
 
Old 10-13-2010, 06:47 PM   #5
ezekieldas
Member
 
Registered: Mar 2010
Posts: 122

Original Poster
Rep: Reputation: 16
The file isn't static; it's a logfile for a process that does a thread dump whenever it receives a QUIT signal. A thread dump begins with the string "Full thread dump Java HotSpot", so more accurately, I want to remove everything above that line. But more to the point: there may be 1, 10, or 100 of these in any given logfile, and it's only the most recent one that I care about. That being the case, the file must be read from the bottom up.
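Reading the file bottom-up can be done quite literally. A minimal sketch, assuming GNU tac is available (it is part of GNU coreutils, not POSIX) and using a made-up log; only the marker string comes from the description above, and last_dump is a hypothetical name:

```shell
# Hypothetical helper: print the most recent thread dump in a log.
last_dump() {
    # tac reverses the line order, sed prints up to the first marker it
    # meets (the last one in the file) and quits, tac restores the order.
    tac "$1" | sed '/Full thread dump Java HotSpot/q' | tac
}

log=$(mktemp)
# Invented sample content, for illustration only.
printf '%s\n' 'startup' \
    'Full thread dump Java HotSpot' 'old thread' \
    'some logging' \
    'Full thread dump Java HotSpot' 'new thread' > "$log"
result=$(last_dump "$log")
printf '%s\n' "$result"
rm -f "$log"
```

If the marker never appears, sed never quits and the whole file is printed unchanged.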
 
Old 10-13-2010, 06:57 PM   #6
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,608

Rep: Reputation: 449
Hi,

try this
Code:
sed -n '/a/ h; /a/ ! H; $ {x;p}' file
 
Old 10-13-2010, 07:04 PM   #7
ezekieldas
Member
 
Registered: Mar 2010
Posts: 122

Original Poster
Rep: Reputation: 16
Thank you crts. That is truly beautiful. And it works.

I'm looking through the very extensive sed manual page to understand... but do you have a second to explain?
 
Old 10-13-2010, 07:05 PM   #8
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Use awk. It's easier to understand.

Code:
# awk '/a/{delete a}{a[FNR]=$0}END{for(i in a) print a[i]}' file
a
b
88
This says: when a line matching "a" is found, delete the array "a" (the array name has nothing to do with the pattern). "a[FNR]=$0" stores each line in the array "a", indexed by its line number. Lastly, the END block prints the array.

Last edited by ghostdog74; 10-13-2010 at 07:08 PM.
 
Old 10-13-2010, 07:49 PM   #9
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,608

Rep: Reputation: 449
Quote:
Originally Posted by ezekieldas
Thank you crts. That is truly beautiful. And it works.

I'm looking through the very extensive sed manual page to understand... but do you have a second to explain?
Certainly.
The first thing you need to know in order to understand it is that 'sed' has two buffers: the pattern buffer and the hold buffer.
The pattern buffer can be manipulated, i.e. you can use commands and regular expressions to alter the line that has been read. The hold buffer simply holds some content that you put in there (hence the name). The hold buffer cannot be edited directly; content only moves between it and the pattern buffer.
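A small sketch of the two buffers in action, separate from the command discussed in this thread: line 1 is copied to the hold buffer with 'h', and on line 2 the 'G' command appends the hold buffer to the pattern buffer, so the first line reappears after the second. The input words are arbitrary.

```shell
# Line 1: h copies the pattern buffer ("first") into the hold buffer.
# Line 2: G appends a newline plus the hold buffer to the pattern buffer,
#         then p prints the result.
result=$(printf '%s\n' first second | sed -n '1h; 2{G;p;}')
printf '%s\n' "$result"
```

This prints "second" followed by "first": the hold buffer carried line 1 across to line 2.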

So now to our example. Let's break it down
Code:
sed -n '/a/ h'
This searches for lines that contain the letter 'a'. If such a line is found, it is copied into the hold buffer. That is what the command 'h' does. This is what our two buffers look like now
Code:
hold      pattern
a\n       a\n
  ^ newline character
The next command is
Code:
/a/ ! H
This searches for lines that do not contain 'a'. That is what the exclamation mark (!) is for. But this time the content is not simply copied to the hold buffer (which would overwrite the content in it); it is appended to the content that is already in there. That is what 'H' does. This is what the two buffers look like just before the second appearance of 'a'
Code:
hold                  pattern
a\n b\n c\n 7\n       7\n
   ^ there are no spaces; I just added them for readability
When the next line is read, 'a' is encountered again and the pattern space is copied to the hold buffer, which overwrites everything that was in there. So this is what the buffers look like after the second 'a'
Code:
hold      pattern
a\n       a\n
  ^ newline character
The next lines - since they do not contain 'a' - are again appended to the hold buffer. This goes on until 'sed' reads the last line of the file. The last line is marked by the '$' sign
Code:
$ {x;p}
Since it does not contain 'a' it was already appended to the hold buffer by the second command. So this is what the buffers look like
Code:
hold               pattern
a\n b\n 88\n       88\n
   ^ there are no spaces; I just added them for readability
Since we are on the last line (marked by '$') the commands in the curly braces '{}' are also executed. The 'x' command exchanges the contents of the hold and the pattern buffer. So after 'x' is executed our buffers look like this
Code:
hold        pattern
88\n        a\n b\n 88\n
               ^ there are no spaces; I just added them for readability
Then the 'p' command simply instructs 'sed' to print the pattern buffer, which results in our desired output:
Code:
sed -n '/a/ h; /a/ ! H; $ {x;p}' file
a
b
88
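To watch the hold buffer change, a trace can be added to the same script. This sketch, which is not part of the original command, swaps the buffers with 'x', prints the swapped-in hold content unambiguously with 'l' (newlines appear as \n, the end of the buffer as $), and swaps back. Strictly speaking, sed strips the trailing newline when it reads a line and 'H' inserts a newline before the text it appends, so the real hold buffer has no newline at its very end; 'p' adds one on output.

```shell
file=$(mktemp)
printf '%s\n' a b c 7 a b 88 > "$file"
# One output line per input line: the hold buffer after h or H has run.
trace=$(sed -n '/a/h; /a/!H; x; l; x' "$file")
printf '%s\n' "$trace"
rm -f "$file"
```

The fourth line of the trace reads a\nb\nc\n7$, the fifth drops back to a$ when the second 'a' overwrites the hold buffer, and the last reads a\nb\n88$.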
Some final remarks:
1.) Note that the pattern buffer and hold buffer are sometimes referred to as pattern space and hold space as well.

2.) Note that the '$' sign inside a regular expression does not refer to the last line but to the end of the line. Here, however, it is used as an address, so it means the last line.

3.) If you are interested in learning 'sed' then read this most excellent tutorial on 'sed':
http://www.grymoire.com/Unix/Sed.html

Come back afterwards and try to understand how our 'sed' command works here.

Hope this helps.

Last edited by crts; 10-13-2010 at 07:57 PM.
 
Old 10-13-2010, 11:31 PM   #10
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,608

Rep: Reputation: 449
What am I missing?

Quote:
Originally Posted by ghostdog74
Code:
awk '/a/{delete a}{a[FNR]=$0}END{for(i in a) print a[i]}' file
First, the command works with the sample data. However, when I try to run it with this sample file and 'somestring' instead of 'a', the output is odd:
Code:
$ cat file
ssomejunk 
A
B
in


other 
somestring 
  
     
data 
  
foo
somestring 
bar
somestring 
88
advc
$ awk '/somestring/{delete a}{a[FNR]=$0}END{for(i in a) print i" "a[i]}' file
17 88
18 advc
16 somestring
Do you get the same result, or is one of my shell options possibly interfering? Maybe it has something to do with the number of lines (a hash issue)? This only happens when the file has more than 10 lines.
 
Old 10-13-2010, 11:46 PM   #11
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
@crts, that's because a for loop used that way doesn't have a defined order. The more correct way should be

Code:
# awk '/somestring/{delete a;d=0}{a[++d]=$0}END{for(i=1;i<=d;i++) print a[i]}' file
somestring
88
advc
by using an integer counter and a counting loop instead of FNR and "for(i in a)"
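The same approach with the marker passed in as a variable, as a sketch; the helper name tail_from_last and the variable name pat are made up here, and the sample file is a shortened version of the one in post #10:

```shell
# Hypothetical helper: print from the last line matching a regex to the end.
tail_from_last() {
    awk -v pat="$1" '
        $0 ~ pat { delete a; d = 0 }      # marker found: start over
        { a[++d] = $0 }                   # store lines under 1, 2, 3, ...
        END { for (i = 1; i <= d; i++) print a[i] }   # numeric order
    ' "$2"
}

file=$(mktemp)
printf '%s\n' junk somestring data foo somestring bar somestring 88 advc > "$file"
result=$(tail_from_last 'somestring' "$file")
printf '%s\n' "$result"
rm -f "$file"
```

Because the loop counts from 1 to d, the lines come out in the order they were stored, regardless of how awk arranges the array internally.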
 
Old 10-14-2010, 09:34 AM   #12
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,608

Rep: Reputation: 449
Hm,

did some searching and I found this on associative arrays:
http://en.wikipedia.org/wiki/Associative_array

So, the ordering issue suggests that awk uses hash tables to implement associative arrays. Something to keep in mind.
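Since only the order of arrival matters here, the array can be avoided altogether by accumulating lines in a string. A sketch using the sample data from post #1, with buf as an arbitrary variable name:

```shell
file=$(mktemp)
printf '%s\n' a b c 7 a b 88 > "$file"
# Reset the buffer at each match, append every line, print once at the end.
result=$(awk '/a/ { buf = "" } { buf = buf $0 "\n" } END { printf "%s", buf }' "$file")
printf '%s\n' "$result"
rm -f "$file"
```

This prints a, b and 88, and no array traversal order is involved.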
 
  

