LinuxQuestions.org
Old 07-13-2007, 03:47 PM   #1
mauran
LQ Newbie
 
Registered: Dec 2005
Location: Sri Lanka
Distribution: Ubuntu 7.04
Posts: 17

Rep: Reputation: 0
select all text between a pattern using grep


How can I select all the text between a specific pattern using grep?
Or can I?

Quote:
balahblah

<text> blahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblah blahblahblah
</text>

bbbnnkkkmnmm
I need to select all the text situated between <text> and </text>.
 
Old 07-13-2007, 04:40 PM   #2
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,417

Rep: Reputation: 1985
grep doesn't select text; by default it finds and prints entire lines that match a regex. See "man grep" for details.
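For the simple case where both tags sit on one line, grep can at least print only the matching part of a line instead of the whole line. A minimal sketch, assuming a grep that supports the -o option (GNU and BSD grep do; it is not in POSIX) and an invented sample line:

```shell
# Sample input is invented for illustration.
sample='junk <text>hello world</text> more junk'

# -o prints only the part of the line that matched the regex, tags included.
matched=$(printf '%s\n' "$sample" | grep -o '<text>.*</text>')
printf '%s\n' "$matched"
```

This still works line by line and leaves the tags in the output, so it does not help with the multi-line sample in the question.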
 
Old 07-13-2007, 04:57 PM   #3
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 344
You can use pcregrep as in:

pcregrep -Mi "^<text>\s.*\s</text>" somefile
 
Old 07-13-2007, 10:56 PM   #4
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Code:
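awk '/^$/ { next }                           # skip empty lines
{
    if (!inside && match($0, "<text>")) {
        start = RSTART + RLENGTH             # first character after <text>
        if (match($0, "</text>")) {          # both tags on this line
            print substr($0, start, RSTART - start)
        } else {                             # opening tag only: start collecting
            inside = 1
            line = substr($0, start)
        }
    } else if (inside) {
        closing = match($0, "</text>")       # 0 when there is no closing tag
        piece = closing ? substr($0, 1, RSTART - 1) : $0
        if (piece != "") line = (line == "" ? piece : line "\n" piece)
        if (closing) { print line; inside = 0 }
    }
}' "file"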
 
Old 07-13-2007, 11:13 PM   #5
farkus888
Member
 
Registered: Oct 2006
Distribution: usually use arch
Posts: 103

Rep: Reputation: 15
Pretty sure sed can do that; a quick Google for "sed one liners" should get you that method. I know it's quicker than that awk method if it's there.
 
Old 07-13-2007, 11:55 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
"sed -n '/<text>/,/<\/text>/p' filename" should do it. It will need some work if both tags are on the same line.
And yes, go look at the one-liners on the sed site on SourceForge.
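To see why the same-line case needs work, here is a small sketch with invented sample input. sed only starts looking for the end of a range on the line after the one that opened it, so a pair of tags on a single line leaves the range open:

```shell
# Invented sample: one pair of tags on a single line, then a second pair later.
input='before
<text>one</text>
after
<text>two</text>
end'

# The range opens on line 2; the end pattern is first tested on line 3,
# so the range stays open until line 4 and "after" is printed as well.
ranged=$(printf '%s\n' "$input" | sed -n '/<text>/,/<\/text>/p')
printf '%s\n' "$ranged"
```

The line "after" sits outside both pairs of tags but is printed anyway, which is the extra work hinted at above.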
 
Old 07-14-2007, 12:09 AM   #7
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by syg00
"sed -n '/<text>/,/<\/text>/p' filename" should do it. Will need some work if both are on the one line.
And yes, go look at the one-liners on the sed site on sf.
If I am not wrong, the OP wants to extract the text in between the tags, so I guess some more manipulation is required with the sed method.
Code:
awk '/<text>/,/<\/text>/' file   # equivalent to: sed -n '/<text>/,/<\/text>/p' file (when the tags are on separate lines)
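One possible form of that extra manipulation, as a sketch only: keep the sed range, delete the two tags inside it, and print whatever is left. The sample input is invented but modeled on the question, and it assumes the tags are not both on one line:

```shell
# Invented sample modeled on the question: text starts on the <text> line.
input='balahblah

<text> blah blah blah
more blah
</text>

bbbnnkkkmnmm'

# Inside the range: delete both tags, then print the line only if
# at least one character is left (/./ needs one character to match).
inner=$(printf '%s\n' "$input" |
    sed -n '/<text>/,/<\/text>/{
        s/<text>//
        s/<\/text>//
        /./p
    }')
printf '%s\n' "$inner"
```

The leading space after <text> is kept as is, and empty lines inside the tags are dropped by the /./ test.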
 
Old 07-14-2007, 12:15 AM   #8
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by farkus888
I know it's quicker than that awk method if it's there.
Well, it really doesn't matter, does it?
 
Old 07-14-2007, 12:29 AM   #9
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
Quote:
Originally Posted by ghostdog74
If I am not wrong, the OP wants to extract the text in between the tags, so I guess some more manipulation is required with the sed method.
Yeah, you might be right - I was thinking "inclusive" of tags.
Oh well.

As always, there are lots of ways of getting the job done. It won't take much to clean up, depending on what the OP actually wanted.
 
Old 07-14-2007, 12:58 AM   #10
farkus888
Member
 
Registered: Oct 2006
Distribution: usually use arch
Posts: 103

Rep: Reputation: 15
Quote:
Originally Posted by ghostdog74
well, it really doesn't matter, does it?
I always like to find the shortest [least code required] method to do something, especially for someone like this: if they couldn't figure this out on their own, they probably don't understand everything that's going on in the code you provided. I'm not trying to knock you by any means, just providing insight into a simpler method. When I am new to something, it drives me crazy to have people show me overcomplicated methods for doing something very simple; it makes it harder to understand well enough to do it on my own next time. I try to help people avoid the learning problems I had, not just give them a one-time fix for their problem.
 
Old 07-14-2007, 01:11 AM   #11
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
If the tags and the contents are on the same line, then it can be done easily using sed:
sed -n '/<text>/,/<\/text>/s/.*<text>\(.*\)<\/text>/\1/p' file
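A small sketch of the same idea on an invented one-line sample. It differs from the command above in two ways that are my own variation: it drops the range address, and it adds a trailing .* so that anything after the closing tag is discarded too:

```shell
# Invented one-line sample.
sample='junk <text>hello world</text> more junk'

# \(...\) captures what lies between the tags; \1 replaces the whole line
# with that capture. The leading and trailing .* swallow the rest of the line.
captured=$(printf '%s\n' "$sample" | sed -n 's/.*<text>\(.*\)<\/text>.*/\1/p')
printf '%s\n' "$captured"
```

Because .* is greedy, this extracts at most one pair of tags per line.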

I've used something similar with k3b. If you save the project to a file, it actually creates a zip archive containing two files. One of them is named maindata.xml.
Code:
jschiwal@hpamd64:~> unzip podcasts.k3b
Archive:  podcasts.k3b
 extracting: mimetype
  inflating: maindata.xml
The xml file contains a catalog of backed up files. You could use this file to give you a list of names that are safe to delete because they are backed up.
Code:
...
<file name="JM-001.ogg" >
<url>/home/jschiwal/Podcasts/JM-001.ogg</url>
</file>
<file name="LQ-Podcast-050207.mp3" >
<url>/home/jschiwal/Podcasts/LQ-Podcast-050207.mp3</url>
</file>
<file name="LQ-Podcast-051207.mp3" >
<url>/home/jschiwal/Podcasts/LQ-Podcast-051207.mp3</url>
</file>
Notice the similar pattern. The filenames are between the <url></url> tags.
Code:
sed -n '/^<url>/s/^<url>\(.*\)<\/url>/\1/p' maindata.xml

...
/home/jschiwal/Podcasts/CrankyGeeks/crankygeeks.064.mp4
/home/jschiwal/Podcasts/CrankyGeeks/crankygeeks.066.mp4
/home/jschiwal/Podcasts/CrankyGeeks/crankygeeks.067.mp4
/home/jschiwal/Podcasts/JM-001.ogg
/home/jschiwal/Podcasts/LQ-Podcast-050207.mp3
/home/jschiwal/Podcasts/LQ-Podcast-051207.mp3
...
In this case, because the source is an XML file, you need to watch for the patterns &gt; &lt; &amp; and replace them with the characters >, <, & respectively. So three more sed commands are necessary, with &amp; replaced last so that nothing is decoded twice.

Code:
jschiwal@hpamd64:~> sed -n '/^<url>/{
> s/^<url>\(.*\)<\/url>/\1/
> s/&gt;/>/g
> s/&lt;/</g
> s/&amp;/\&/g
> p
> }' maindata.xml
...
/home/jschiwal/Podcasts/50@10712b865b6a420bdea05b6cc5bfde98
/home/jschiwal/Podcasts/CrankyGeeks/crankygeeks.064.mp4
/home/jschiwal/Podcasts/CrankyGeeks/crankygeeks.066.mp4
/home/jschiwal/Podcasts/CrankyGeeks/<crankygeeks>&.067.mp4
/home/jschiwal/Podcasts/JM-001.ogg
/home/jschiwal/Podcasts/LQ-Podcast-050207.mp3
/home/jschiwal/Podcasts/LQ-Podcast-051207.mp3
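The order of the three substitutions matters: &amp; has to be decoded last. A quick sketch with an invented name shows what goes wrong otherwise:

```shell
# Invented sample: the file name literally contains the four characters "&lt;".
# In XML that is written as "&amp;lt;".
encoded='name&amp;lt;.mp3'

# &amp; decoded last: the name comes out as intended.
right=$(printf '%s\n' "$encoded" | sed 's/&gt;/>/g;s/&lt;/</g;s/&amp;/\&/g')

# &amp; decoded first: the freshly produced "&lt;" is decoded a second time.
wrong=$(printf '%s\n' "$encoded" | sed 's/&amp;/\&/g;s/&gt;/>/g;s/&lt;/</g')

printf 'right: %s\nwrong: %s\n' "$right" "$wrong"
```

In the replacement part of s///, a bare & means "the whole match", which is why the ampersand is written as \& there.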
Whatever method you use, it is best to test it out. You may have forgotten some patterns that can trip you up. The first time I did this I forgot about the reserved characters in XML, and files containing these characters weren't being deleted.
In composing this message, I added one sed rule at a time and tested it before going to the next one. By simply pressing the up arrow in the shell and adding semicolons between the sed commands, I can convert this into a true one-liner:
Code:
sed -n '/^<url>/{s/^<url>\(.*\)<\/url>/\1/;s/&gt;/>/g;s/&lt;/</g;s/&amp;/\&/g;p}' maindata.xml
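In the spirit of testing it out, here is a sketch of a throwaway check for the one-liner. The sample file, its paths, and the variable names are all invented. The only change to the command is a semicolon before the closing brace, which GNU sed does not need but some other sed implementations may want:

```shell
# Throwaway sample file; the names and paths are invented.
tmpdir=$(mktemp -d)
cat > "$tmpdir/maindata.xml" <<'EOF'
<file name="plain.ogg" >
<url>/tmp/demo/plain.ogg</url>
</file>
<file name="odd.mp4" >
<url>/tmp/demo/&lt;odd&gt;&amp;.mp4</url>
</file>
EOF

# The one-liner from the post, with "p;}" instead of "p}".
actual=$(sed -n '/^<url>/{s/^<url>\(.*\)<\/url>/\1/;s/&gt;/>/g;s/&lt;/</g;s/&amp;/\&/g;p;}' "$tmpdir/maindata.xml")

expected='/tmp/demo/plain.ogg
/tmp/demo/<odd>&.mp4'

if [ "$actual" = "$expected" ]; then
    echo "ok"
else
    echo "mismatch:"
    printf '%s\n' "$actual"
fi
rm -r "$tmpdir"
```

Adding one awkward file name per rule to the sample file gives a cheap way to catch a forgotten pattern before anything gets deleted.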


I hope I remember to change the filename back to "crankygeeks.067.mp4" after this demonstration!

Last edited by jschiwal; 07-14-2007 at 03:53 AM.
 
Old 07-14-2007, 02:13 AM   #12
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
Quote:
Originally Posted by jschiwal
sed -n '/<text>/,/<\/text>/s/.*/<text>\(.*\)<\/text>/\1/p' file.
A small typo maybe ???
 
Old 07-14-2007, 03:32 AM   #13
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by farkus888
I always like to find the shortest [least code required] method to do something.
That's the problem with one-liners in general, IMO. They are short and specific to a task, but not necessarily easy to understand for whoever reads or maintains them.
 
Old 07-14-2007, 03:52 AM   #14
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
Quote:
Originally Posted by syg00
A small typo maybe ???
Yes. That is the one line I didn't test out. I'll blame it on finger memory.

Last edited by jschiwal; 07-14-2007 at 03:54 AM.
 
Old 07-14-2007, 03:52 AM   #15
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
Edit: Response to ghostdog74.

"quick and dirty" hacks are fine for ad hoc one-time needs.
In a corporate environment, it pays to have a better (and better documented) generic solution. Personally I prefer perl in such a circumstance, but each to their own.

For a home user it may not matter.
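As an illustration of what a more generic, documented version might look like while staying in shell and awk, here is a sketch. The function name between_tags and the variable names are hypothetical, and it is no substitute for a real XML parser:

```shell
# between_tags TAG [FILE...]
# Print the text found between <TAG> and </TAG>, without the tags themselves.
# Works when the tags share a line or are spread over several lines.
# Limits: no attributes, no nesting, one pair per line, empty lines dropped.
between_tags() {
    tag=$1
    shift
    awk -v otag="<$tag>" -v ctag="</$tag>" '
    {
        line = $0
        if (!inside) {
            i = index(line, otag)
            if (i == 0) next                  # no opening tag on this line
            line = substr(line, i + length(otag))
            inside = 1
        }
        j = index(line, ctag)
        if (j > 0) {                          # closing tag found: stop here
            line = substr(line, 1, j - 1)
            inside = 0
        }
        if (line != "") print line            # drop lines that end up empty
    }' "$@"
}

# Usage on invented input: tags on separate lines, then tags on one line.
printf '%s\n' 'x' '<text> a' 'b' '</text>' 'y' | between_tags text
printf '%s\n' 'p <url>/q/r</url> s' | between_tags url
```

Using index() instead of a regex means the tag name is matched literally, so characters that are special in regular expressions need no escaping.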

Last edited by syg00; 07-14-2007 at 03:54 AM.
 
  


All times are GMT -5. The time now is 02:47 AM.
