LinuxQuestions.org
Old 02-22-2011, 06:20 AM   #1
elexx
LQ Newbie
 
Registered: Feb 2011
Posts: 4

Rep: Reputation: 0
sed - parse text file


hello,

I have the following problem: I want to extract a few lines from a structured text file, but I can't work out the sed command for it.

input:
Code:
+ headline1
   + subheader1
   + subheader2
      + this line
      + that line
   + subheader3
+ headline2
   + subheader2
      + another line
   + subheader3
The output I'm trying to get:
Code:
subheader2
this line
that line
subheader2
another line
I need to get "subheader2" followed by all lines up to the next subheader.

Is it possible to count the leading spaces in front of a "+" and take the text up to the next line that has the same number of spaces in front of its "+"?

And please excuse my bad English, I promise I'll improve it.
 
Old 02-22-2011, 07:37 AM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
Hi and welcome to LinuxQuestions! Following your idea, I suggest some awk code:
Code:
awk '/^      \+/{ if (c) { print pre; c = 0 } print }!/^      \+/{ pre = $0; c = 1 }' file
Waiting for a sed guru to find a suitable solution.
 
1 members found this post helpful.
Old 02-22-2011, 07:53 AM   #3
elexx
LQ Newbie
 
Registered: Feb 2011
Posts: 4

Original Poster
Rep: Reputation: 0
Thank you for the fast reply!
I think I didn't explain the task correctly, sorry.

I need to get the text starting with the keyword "subheader2", followed by every line until another header on the same level (= the same number of leading spaces as "subheader2") appears. The number of spaces in front of two "subheader2" lines may vary, but the child headers always have more spaces than their "subheader2".

And of course I'll also take an awk solution; I just stumbled across sed and thought it would solve my problem ;-)
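
The "level" of a line can be measured directly as the number of blanks in front of its "+". A minimal sketch in POSIX awk, assuming every line of interest has the form "<blanks>+ <text>"; the function name show_depth is made up for illustration:

```shell
# show_depth [FILE...]: print "<depth>: <text>" for each "<blanks>+ <text>" line.
show_depth() {
  awk '
    # RLENGTH is the length of the matched "<blanks>+<blanks>" prefix.
    match($0, /^[ \t]*\+[ \t]*/) {
      depth = index($0, "+") - 1        # number of blanks before the first "+"
      print depth ": " substr($0, RLENGTH + 1)
    }
  ' "$@"
}

printf '%s\n' '+ headline1' '   + subheader2' '      + this line' | show_depth
# prints:
# 0: headline1
# 3: subheader2
# 6: this line
```

Two lines are on the same level exactly when their depth values are equal, and a child line always has a larger depth than its header.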

Last edited by elexx; 02-22-2011 at 07:56 AM.
 
Old 02-22-2011, 08:18 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,628

Rep: Reputation: 2943
As I am assuming the example is probably not the real text, this will require some changes, but maybe something like:
Code:
awk 'match($0,/^([^+]*\+ )(.*)/,f) && /subheader2/{len = length(f[1])}/subheader[^2]/{len = 0}len && length(f[1]) >= len{print f[2]}' file
 
2 members found this post helpful.
Old 02-22-2011, 09:00 AM   #5
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,665

Rep: Reputation: 492
Hi,

this sed works with your sample data:
Code:
sed -nr '/subheader2/,/subheader3|headline/ {/subheader3|headline/d;s/^[[:blank:]]*\+[[:blank:]]*//;p}' file
It will also work if there is no subheader3 as in
Code:
+ headline1
   + subheader1
   + subheader2
      + this line
      + that line
   + subheader3
+ headline2
   + subheader2
      + another line
+ headline3
   + subheader2
      + another line
   + subheader3
Let me know if you have funny things like 'skipping' a header like
subheader2
...
subheader4
[EDIT]
Also let us know how many subheaders there are, e.g. the above will backfire if you have something like
subheader23

If you have fewer than 20 subheaders, this is more robust than the above:

Code:
sed -nr '/subheader2/,/subheader[^2]|headline/ {/subheader[^2]|headline/d;s/^[[:blank:]]*\+[[:blank:]]*//;p}' file
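
The "subheader23" pitfall mentioned above comes from matching the name without an anchor. A small sketch of the difference, using grep only to count matching lines; the sample data is made up and assumes the header name is the last thing on its line:

```shell
sample='   + subheader2
   + subheader23'

# Unanchored: both lines match, because "subheader2" is a prefix of "subheader23".
printf '%s\n' "$sample" | grep -c 'subheader2'
# prints 2

# Anchored at the end of the line (trailing blanks allowed): only the exact header matches.
printf '%s\n' "$sample" | grep -c -E 'subheader2[[:blank:]]*$'
# prints 1
```

The same anchored expression could serve as the start address of a sed range, as long as nothing else follows the header name on its line.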

Last edited by crts; 02-22-2011 at 09:07 AM.
 
2 members found this post helpful.
Old 02-22-2011, 10:08 AM   #6
elexx
LQ Newbie
 
Registered: Feb 2011
Posts: 4

Original Poster
Rep: Reputation: 0
The solutions are great, but they depend on knowing the name of the header that follows "subheader2" (in my bad example it is "subheader3"). The output of the program I'm trying to parse has no fixed order of subheaders.
The only fixed thing is the name "subheader2". I also know that the number of spaces in front of it equals the number of spaces in front of the next header on the same level, but the name of that header is unknown (it is one of about 20 different headers).
The data could also be:
Code:
+ headline2
   + subheader2
      + another line
   + another level two line
+ headline1
   + subheader2
      + this line
      + that line
   + onemoreheader
   + another level two line
   + anotherheader
... yea, I know my explanation was really bad, but I hope I clarified it.
 
Old 02-22-2011, 11:01 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,628

Rep: Reputation: 2943
How about:
Code:
awk 'match($0,/^([^+]*\+ )(.*)/,f) && /subheader2/{len = length(f[1]);print f[2];next}len{if(length(f[1]) > len)print f[2];else len = 0}' file
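
Note that the three-argument form of match() used above is a gawk extension. For other awk implementations, the same indentation-based idea can be sketched in POSIX awk as follows. This is only a sketch under assumptions: every relevant line looks like "<blanks>+ <text>", the header text has no trailing whitespace, and the names extract_section, want and inside are made up for illustration.

```shell
# extract_section NAME [FILE...]
# Print NAME and every following line that is indented deeper than NAME,
# stopping at the next line on the same or a shallower level.
extract_section() {
  name=$1; shift
  awk -v name="$name" '
    match($0, /^[ \t]*\+[ \t]*/) {
      depth = index($0, "+") - 1          # blanks before the "+"
      text  = substr($0, RLENGTH + 1)     # line without the "<blanks>+ " prefix
      if (text == name) {                 # exact comparison, so "subheader23" is skipped
        want = depth; inside = 1
        print text
        next
      }
      if (inside && depth > want) { print text; next }
      inside = 0                          # same or shallower level ends the section
    }
  ' "$@"
}
```

It would be called as `extract_section subheader2 file`. Because the comparison uses depth only, the name of the header that ends the section does not need to be known, and a section that is ended by a shallower header (or by the end of the file) is still printed.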
 
1 members found this post helpful.
Old 02-22-2011, 11:03 AM   #8
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,665

Rep: Reputation: 492
Quote:
Originally Posted by elexx View Post
The solutions are great, but they depend on knowing the name of the header that follows "subheader2" (in my bad example it is "subheader3"). The output of the program I'm trying to parse has no fixed order of subheaders.
The only fixed thing is the name "subheader2". I also know that the number of spaces in front of it equals the number of spaces in front of the next header on the same level, but the name of that header is unknown (it is one of about 20 different headers).
The data could also be:
Code:
+ headline2
   + subheader2
      + another line
   + another level two line
+ headline1
   + subheader2
      + this line
      + that line
   + onemoreheader
   + another level two line
   + anotherheader
... yea, I know my explanation was really bad, but I hope I clarified it.
Ok,

based on the new criteria
Code:
sed -nr '/subheader2/ {:a N;s/^([[:blank:]]*\+[[:blank:]]*)(.+)\n\1.*/\2/;Ta;s/[[:blank:]]*\+[[:blank:]]*//g;p}' file
 
1 members found this post helpful.
Old 02-22-2011, 11:19 AM   #9
elexx
LQ Newbie
 
Registered: Feb 2011
Posts: 4

Original Poster
Rep: Reputation: 0
Thanks a lot, grail and crts! Both scripts are excellent!

*hands over a few bottles of virtual beer to you*

 
  

