LinuxQuestions.org
Old 08-11-2008, 05:00 PM   #1
Jean Of mArc
LQ Newbie
 
Registered: Nov 2005
Location: Canada
Posts: 25

Rep: Reputation: 15
Searching for regular expression in command line


Hello!

I am somewhat familiar with regular expressions, and know a bit about sed, grep, etc.

What I would like to do is, on the command line, take a text file and do a search on it, using a regular expression. Each "hit" for that search is then displayed on the command line as a new line.

Example:

Textfile (gibberish.txt):
alksjjlkfbdfklsdklbdlfhnds8367495634klasnhjaslkfdbaskjsdln826394864klaslksajdsbas321123

$ cat gibberish.txt | <extra command> "[0-9]*"
8367495634
826394864
321123

Any ideas?

Thanks!

Jean Of mArc
 
Old 08-11-2008, 05:20 PM   #2
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 34
Code:
echo "abc123ab133" | sed 's/[a-z]/ /g' | awk '{for(i=1; i<=NF;i++){print $i} }'
One way, though not exactly one extra command.
 
Old 08-11-2008, 05:27 PM   #3
Mr. C.
Senior Member
 
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 59
How about something like:

sed 's/[^0-9][^0-9]*/\n/g' gibberish.txt
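
A note on portability: as far as I know, `\n` in a sed replacement is a GNU extension, so the command above may print a literal "n" on other seds. The sketch below does the same "turn every run of non-digits into a line break" with tr instead. The helper name split_digit_runs is made up for illustration.

```shell
# split_digit_runs: hypothetical helper, not a standard command.
# Reads text on stdin and prints each run of digits on its own line.
split_digit_runs() {
    # -c complements the set (everything that is NOT 0-9),
    # -s squeezes each run of replaced characters into a single newline.
    # grep . then drops the empty line left when the input starts with non-digits.
    tr -cs '0-9' '\n' | grep .
}

# Demo on the sample text from the first post.
printf '%s\n' 'alksjjlkfbdfklsdklbdlfhnds8367495634klasnhjaslkfdbaskjsdln826394864klaslksajdsbas321123' \
    | split_digit_runs
```

Like the sed version, this works by removing what is not wanted, so it only suits patterns that can be described as "everything except these characters".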
 
Old 08-11-2008, 05:32 PM   #4
Jean Of mArc
LQ Newbie
 
Registered: Nov 2005
Location: Canada
Posts: 25

Original Poster
Rep: Reputation: 15
Hmmm...

Thanks for that suggestion, but it works by eliminating the NON-numeric stuff. What I gave was just an example, and that approach won't work for what I'm actually hoping to do.

There's a site that hosts a bunch of mp3 links for their podcasts. Rather than always right-clicking on the mp3 links to download them, my thought was:

Take the HTML source, find all the mp3 links, wget those files.

There are a lot of extra files I don't need on the site as well, so I don't just want to wget the whole page or site or anything.

So basically, I'm trying to match anything in the source that is fulfilled by the expression:

"http://www\.example\.com/.*\.mp3"

It seems to me that there ought to be a way in unix to just do a search for something, rather than eliminating everything that ISN'T the thing we're looking for.

Thanks!

Jean Of mArc
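
A minimal sketch of the "find all the mp3 links, then wget them" idea, assuming a grep that supports the -o option (GNU and BSD grep both do). The helper name extract_mp3_links and the page fragment are made up for illustration. One detail worth noting: a greedy `.*` as in the expression above would run from the first "http" to the last ".mp3" on a line, so the sketch uses `[^"]*` to stop at the closing quote of each href.

```shell
# extract_mp3_links: hypothetical helper, not a standard command.
# Prints each matching link once, reading the named files or stdin.
extract_mp3_links() {
    # -o prints only the matched text, -h suppresses file name prefixes,
    # -E selects extended regular expressions.
    # [^"]* cannot cross a double quote, so two links on one line stay separate.
    grep -ohE 'http://www\.example\.com/[^"]*\.mp3' "$@" | sort -u
}

# Demo on a made-up HTML fragment with two links on a single line.
printf '%s\n' \
    '<a href="http://www.example.com/ep1.mp3">1</a> <a href="http://www.example.com/ep2.mp3">2</a>' \
    | extract_mp3_links

# To download, the list could be piped to wget, which reads URLs from
# standard input when given "-i -" (page.html is a placeholder file name):
#   extract_mp3_links page.html | wget -i -
```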
 
Old 08-11-2008, 05:33 PM   #5
kenoshi
Member
 
Registered: Sep 2007
Location: SF Bay Area, CA
Distribution: CentOS, SLES 10+, RHEL 3+, Debian Sarge
Posts: 159

Rep: Reputation: 32
sed -r 's/([a-zA-Z]+)/\n/g' gibberish
 
Old 08-11-2008, 05:38 PM   #6
Mr. C.
Senior Member
 
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 59
It is hard to solve a problem when the sample provided isn't actually representative. How can we possibly know what you want? Try to be as clear as possible with your requests and data.

There are lots of ways to solve problems - that is where *nixes really shine.

How about using egrep:

egrep -o 'http://[^ ]*\.mp3' gibberish.txt
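
For reference, here is the same -o approach applied to the digits example from the first post. This is only a sketch: extract_matches is a made-up helper name, and `grep -E` is used as the equivalent spelling of egrep.

```shell
# extract_matches: hypothetical helper, not a standard command.
# Prints every match of the given extended regex on its own line.
extract_matches() {
    # -o prints only the part of each line that matched,
    # -E selects extended regex syntax, -- ends option parsing.
    grep -oE -- "$1"
}

# Demo on the sample text from the first post.
# [0-9]+ is used rather than [0-9]* so that the pattern cannot match an empty string.
printf '%s\n' 'alksjjlkfbdfklsdklbdlfhnds8367495634klasnhjaslkfdbaskjsdln826394864klaslksajdsbas321123' \
    | extract_matches '[0-9]+'
```

Each match lands on its own line even when several occur on the same input line, which is what the original question asked for.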
 
Old 08-11-2008, 05:53 PM   #7
Jean Of mArc
LQ Newbie
 
Registered: Nov 2005
Location: Canada
Posts: 25

Original Poster
Rep: Reputation: 15
Thanks Mr. C!

That helped a lot.
I'm sorry if it bothered you that the data wasn't the same; I was mainly looking to see if there was a program that could simply do regex searches and print the matches, whatever the use case. I hope that makes sense.

"egrep -o" is exactly what I was hoping for, so thanks for pointing that out!!!

Oh, and is there a way to specify "not" a block of text? For example, I know that you can do this for individual characters:
[^A-Z] (to mean NOT A-Z), but is there a way you can do something like:
^APPLE (to mean NOT the literal string of "APPLE"??)
 
Old 08-11-2008, 05:57 PM   #8
kenoshi
Member
 
Registered: Sep 2007
Location: SF Bay Area, CA
Distribution: CentOS, SLES 10+, RHEL 3+, Debian Sarge
Posts: 159

Rep: Reputation: 32
sed -n 's/.*\(http[^ ]*\.mp3\).*/\1/p' somefile.html

Damn, Mr C is fast lol
 
Old 08-11-2008, 05:59 PM   #9
Jean Of mArc
LQ Newbie
 
Registered: Nov 2005
Location: Canada
Posts: 25

Original Poster
Rep: Reputation: 15
Well, thank you for your help as well, Kenoshi (ケノシ)
 
Old 08-11-2008, 09:01 PM   #10
Mr. C.
Senior Member
 
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 59
No problem. It takes time to learn what's relevant and what's not.

You can use grep -v to exclude lines that contain APPLE:

Code:
$ echo -e 'I LOVE\nGOOD APPLE\nPIE' | grep -v APPLE
I LOVE
PIE
or use awk's ability to use a regular expression as a field delimiter:

Code:
$ echo -e 'I LOVE\nGOOD APPLE\nPIE' | awk -F'APPLE' '{print $1}'  
I LOVE
GOOD 
PIE
The answer to the "can you do it" question is almost invariably "YES!", so it's not an interesting question by itself. The challenge is to learn how to characterize your problem, which tool to use, and how to use it.

Last edited by Mr. C.; 08-11-2008 at 09:02 PM.
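
On the "NOT the literal string" question: as far as I know, basic and extended regular expressions have no operator that negates a whole string, so the usual route is to filter lines rather than to negate inside the pattern. A minimal sketch, with exclude_literal as a made-up helper name and made-up sample URLs:

```shell
# exclude_literal: hypothetical helper, not a standard command.
# Passes through only the lines that do NOT contain the given literal string.
exclude_literal() {
    # -v inverts the match, -F treats the argument as a fixed string
    # rather than a regex, -- ends option parsing.
    grep -vF -- "$1"
}

# Drop the line containing APPLE.
printf '%s\n' 'I LOVE' 'GOOD APPLE' 'PIE' | exclude_literal 'APPLE'

# The same filter in awk: print lines that the pattern does not match.
printf '%s\n' 'I LOVE' 'GOOD APPLE' 'PIE' | awk '!/APPLE/'

# Combined with grep -o: extract .mp3 links first, then drop those containing "promo".
printf '%s\n' 'x http://www.example.com/ep1.mp3 y http://www.example.com/promo.mp3 z' \
    | grep -oE 'http://[^ ]*\.mp3' \
    | exclude_literal 'promo'
```

Extracting first and excluding second keeps each pattern simple, at the cost of one more process in the pipeline.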
 
  


All times are GMT -5. The time now is 02:00 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration