LinuxQuestions.org
Old 02-15-2013, 03:22 PM   #1
threezerous
Member
 
Registered: Jul 2009
Posts: 94

Rep: Reputation: 15
retrieve a section of grep result


Hello folks,

I ran grep on all my XML files to search for the string http:// and saved the results to an output file.

The grep command I ran: find -name "*.xml" -type f -exec grep -H "http://" {} \; > /tmp/grep_output

The contents of the output file are:

/filepath//abc1_filename.xml:6: <title>http://blahblahblah/rmans/wdfth/hfpt</title>
/filepath//abc2_filename.xml:1: <keywords>auto_http://blahblahblah/rmans/kwd/hgdft</keywords>
/filepath//abc3_filename.xml:17: <desc>url_http://blahblahblah/rmans/metadesc/dklt</desc>
/filepath//abc4_filename.xml:18: <user>autourl_http://blahblahblah/rmans/wnrs/hftp</user>
/filepath//abc5_filename.xml:20: <ttl>http://blahblahblah/rmans/pqrd/dec/prts</ttl>
/filepath//abc6_filename.xml:4: <target>http://blahblahblah/rmans/xyz/seca/tttyz</target>

I actually want the grep results without the opening or closing tags. Basically, I want to strip the text between the file:line prefix and http:, plus everything from the closing tag (</) to the end of the line. My output would then look something like this:

/filepath//abc1_filename.xml:6: http://blahblahblah/rmans/wdfth/hfpt
/filepath//abc2_filename.xml:1: http://blahblahblah/rmans/kwd/hgdft
/filepath//abc3_filename.xml:17: http://blahblahblah/rmans/metadesc/dklt
/filepath//abc4_filename.xml:18: http://blahblahblah/rmans/wnrs/hftp
/filepath//abc5_filename.xml:20: http://blahblahblah/rmans/pqrd/dec/prts
/filepath//abc6_filename.xml:4: http://blahblahblah/rmans/xyz/seca/tttyz

I know I have to read the output file in a while loop and use sed, something like

#!/bin/bash

while read line
do
sed 'something'
done < grep_output

but that is as far as I could get. I know how to use sed in vi to replace a string, but I am not sure how to use it here.

Any suggestions or pointers will help a lot.

Thanks in advance.
 
Old 02-15-2013, 03:36 PM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
You can try to refine the regular expression in the grep command. First of all use the -o option to retrieve only the part matching the regexp, then you may need the -E option for extended regular expressions. Example:
Code:
grep -EHo 'http://[^<]+'
This matches http:// followed by any character until the < symbol (excluded). Hope this helps.
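
For completeness, here is one way the pattern might slot into the original find command. This is only a sketch: the temporary directory and sample.xml are made up for illustration, and -n is added on the assumption that the line numbers shown in the first post are wanted.

```shell
#!/bin/bash
# Hypothetical sample data so the sketch can be run anywhere.
workdir=$(mktemp -d)
printf '%s\n' '<doc>' '<title>http://example.com/a/b</title>' \
    '<keywords>auto_http://example.com/c/d</keywords>' '</doc>' > "$workdir/sample.xml"

# -H prints the file name, -n the line number, -o only the matched text.
# "{} +" hands many files to a single grep instead of starting one per file.
result=$(cd "$workdir" && find . -name '*.xml' -type f -exec grep -EHno 'http://[^<]+' {} +)
printf '%s\n' "$result"

rm -rf "$workdir"
```

With GNU grep the result should be in file:line:URL form, so there is no space after the second colon as there is in the desired output from post #1.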
 
Old 02-15-2013, 03:45 PM   #3
whizje
Member
 
Registered: Sep 2008
Location: The Netherlands
Distribution: Slackware64 current
Posts: 592

Rep: Reputation: 140
Code:
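# Drop the opening tag and any prefix before "http" (auto_, url_, ...), then everything from the closing tag on
sed 's/\(.*: \)<[^>]*>[^<]*\(http[^<]*\)<.*/\1\2/' /tmp/grep_output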
 
Old 02-15-2013, 04:24 PM   #4
threezerous
Member
 
Registered: Jul 2009
Posts: 94

Original Poster
Rep: Reputation: 15
Long-winded approach

Thanks whizje and colucix,

On my own I found a long-winded approach to resolve this. I wrote two scripts, the second reading the output of the first:

#!/bin/bash

while read line
do
sed s/xml.*http:/xml~http/
done < grep_output


#!/bin/bash

while read line
do
cut -d "<" -f1
done < output_of_first

Definitely, not as nifty and nimble as you guys suggested, but worked
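
A side note on the loops above, offered as a sketch rather than a record of what was actually run: sed and cut already read their input line by line, so the while read wrapper is not needed. Inside such a loop, read consumes the first line and the command then reads the rest of standard input itself, so the first record may be dropped. The version below uses hypothetical sample data in the grep_output layout from post #1, and the sed expression is an assumption chosen to keep the file:line: prefix intact.

```shell
#!/bin/bash
# Hypothetical sample in the same layout as the grep output from post #1.
grep_output=$(mktemp)
cat > "$grep_output" <<'EOF'
/filepath//abc1_filename.xml:6: <title>http://blahblahblah/rmans/wdfth/hfpt</title>
/filepath//abc2_filename.xml:1: <keywords>auto_http://blahblahblah/rmans/kwd/hgdft</keywords>
EOF

# Loop form: read takes line 1, then sed processes whatever is left on stdin.
looped=$(while read -r line; do sed 's/<[^>]*>[^<]*http:/http:/'; done < "$grep_output")

# Loop-free form: sed drops the opening tag and any prefix before "http:",
# then cut removes everything from the closing tag onward.
piped=$(sed 's/<[^>]*>[^<]*http:/http:/' "$grep_output" | cut -d '<' -f1)

printf 'loop output:\n%s\n\npipeline output:\n%s\n' "$looped" "$piped"
rm -f "$grep_output"
```

Comparing the two outputs shows the difference: the pipeline handles both sample records, while the loop never passes the first one to sed.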
 
Old 02-17-2013, 05:37 PM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1959
grep is used for extracting whole lines, or at best whole pattern matches. To extract substrings from matches you need to use sed or a similar tool, as demonstrated.
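
As an illustration of the "similar tool" route, here is a minimal awk sketch. It assumes the file:line: prefix always ends at the first colon followed by a space, as in the sample output from post #1; the sample line and the variable names are only for illustration.

```shell
#!/bin/bash
# Hypothetical sample line in the layout shown in post #1.
line='/filepath//abc3_filename.xml:17: <desc>url_http://blahblahblah/rmans/metadesc/dklt</desc>'

# match() sets RSTART and RLENGTH to the span of the URL; substr() cuts it out.
# The prefix is the line with everything after the first ": " removed.
result=$(printf '%s\n' "$line" | awk 'match($0, /http:\/\/[^<]+/) {
    prefix = $0
    sub(/: .*/, ": ", prefix)
    print prefix substr($0, RSTART, RLENGTH)
}')
printf '%s\n' "$result"
```

Lines without a URL are skipped, because the action only runs when match() succeeds.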

Incidentally, since the input is xml, you may be better off using a tool that has an actual xml parser built in, such as xmlstarlet. If I could see an example of the raw input, perhaps I could help develop an xpath rule to extract what you wanted.
 
Old 02-18-2013, 07:38 AM   #6
Habitual
LQ Addict
 
Registered: Jan 2011
Posts: 8,248
Blog Entries: 11

Rep: Reputation: 2288
Quote:
Originally Posted by threezerous
Definitely, not as nifty and nimble as you guys suggested, but worked
Code:
#!/bin/bash

while read line
do
   sed s/xml.*http:/xml~http/
done < grep_output


#!/bin/bash

while read line
do
  cut -d "<" -f1
done < output_of_first
Props for your own solution. However 'elegant' or 'inelegant' it may be, it is yours, and now you are less likely to forget that accomplishment.

1 demerit for not using [code][/code] tags. (Documentation)
 
  

