LinuxQuestions.org


Jean Of mArc 08-11-2008 04:00 PM

Searching for regular expression in command line
 
Hello!

I am somewhat familiar with regular expressions, and know a bit about sed, grep, etc.

What I would like to do is, on the command line, take a text file and search it using a regular expression. Each "hit" for that search would then be printed on its own line.

Example:

Textfile (gibberish.txt):
alksjjlkfbdfklsdklbdlfhnds8367495634klasnhjaslkfdbaskjsdln826394864klaslksajdsbas321123

$ cat gibberish.txt | <extra command> "[0-9]*"
8367495634
826394864
321123

Any ideas?

Thanks!

Jean Of mArc

jim mcnamara 08-11-2008 04:20 PM

Code:

echo "abc123ab133" | sed 's/[a-z]/ /g' | awk '{for(i=1; i<=NF;i++){print $i} }'
One way, though not exactly one extra command.

Mr. C. 08-11-2008 04:27 PM

How about something like:

sed 's/[^0-9][^0-9]*/\n/g' gibberish.txt
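
The `\n` in the replacement is turned into a newline by GNU sed, but not every sed implementation treats it that way. A rough, more portable sketch of the same idea uses tr instead. The helper name `digit_runs` and the sample input are only for illustration:

```shell
# Hypothetical helper: print each run of digits from stdin on its own line.
digit_runs() {
    # tr -c: act on the complement of 0-9 (every non-digit character)
    # tr -s: squeeze each run of replaced characters into a single newline
    # sed:   drop the empty line left over when the input starts with non-digits
    tr -cs '0-9' '\n' | sed '/^$/d'
}

# Sample input borrowed from the earlier reply.
printf 'abc123ab133\n' | digit_runs
```

With a real file this would be `digit_runs < gibberish.txt`.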

Jean Of mArc 08-11-2008 04:32 PM

Hmmm...
 
Thanks for that suggestion; however, it works by eliminating the NON-numeric stuff. What I gave was just an example, and that approach won't work for what I'm actually hoping to do.

There's a site that hosts a bunch of mp3 links for their podcasts. Rather than always right-clicking on the mp3 links to download them, my thought was:

Take the HTML source, find all the mp3 links, wget those files.

There are a lot of extra files I don't need on the site as well, so I don't just want to wget the whole page or site or anything.

So basically, I'm trying to match anything in the source that is fulfilled by the expression:

"http://www\.example\.com/.*\.mp3"

It seems to me that there ought to be a way in unix to just do a search for something, rather than eliminating everything that ISN'T the thing we're looking for.

Thanks!

Jean Of mArc

kenoshi 08-11-2008 04:33 PM

sed -r 's/([a-zA-Z]+)/\n/g' gibberish.txt

Mr. C. 08-11-2008 04:38 PM

It is hard to solve a problem when the sample provided isn't actually representative. How can we possibly know what you want? Try to be as clear as possible with your requests and data.

There are lots of ways to solve problems - that is where *nixes really shine.

How about using egrep:

egrep -o 'http://[^ ]*\.mp3' gibberish.txt
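
As a sketch of how that `-o` output might feed the download step described above: since each match lands on its own line, the list can be de-duplicated and handed to wget. This assumes the links sit inside double-quoted href attributes, so the pattern also stops at a `"`. The helper name `mp3_links`, the file name `page.html`, and the sample HTML are made up for illustration:

```shell
# Hypothetical helper: list the unique .mp3 URLs found in HTML read from stdin.
mp3_links() {
    # -o prints only the matched text, one match per line;
    # [^" ]* keeps a match from running past a closing quote or a space
    grep -oE 'http://[^" ]*\.mp3' | sort -u
}

# Small inline sample standing in for the real page source.
printf '%s\n' \
    '<a href="http://www.example.com/a.mp3">A</a> <a href="http://www.example.com/b.mp3">B</a>' \
    '<a href="http://www.example.com/notes.pdf">notes</a>' \
    '<a href="http://www.example.com/a.mp3">A again</a>' |
    mp3_links

# To download from a saved page (wget reads URLs from stdin with "-i -"):
#   mp3_links < page.html | wget -i -
```

The sample prints the a.mp3 and b.mp3 links once each and skips the PDF.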

Jean Of mArc 08-11-2008 04:53 PM

Thanks Mr. C!

That helped a lot.
I'm sorry if you were bothered that the data wasn't the same. I was really looking to see if there was a program that could just plain do regex searches and print the matches, regardless of what it was being used for. I hope that makes sense.
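
That kind of general "print every match" command can be sketched as a thin wrapper around grep's `-o` option. The name `matches` is hypothetical, and `+` is used instead of `*` because `[0-9]*` can also match the empty string:

```shell
# Hypothetical wrapper: print every match of an extended regex, one per line.
matches() {
    # -o: print only the matched text; -E: extended regular expressions;
    # --: end of options, so a pattern starting with "-" is not read as a flag
    grep -oE -- "$1"
}

# The sample line from the first post.
printf '%s\n' 'alksjjlkfbdfklsdklbdlfhnds8367495634klasnhjaslkfdbaskjsdln826394864klaslksajdsbas321123' |
    matches '[0-9]+'
```

This prints 8367495634, 826394864 and 321123 on separate lines, as in the original example.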

"egrep -o" is exactly what I was hoping for, so thanks for pointing that out!!!

Oh, and is there a way to specify "not" a block of text? For example, I know that you can do this for individual characters:
[^A-Z] (to mean NOT A-Z), but is there a way you can do something like:
^APPLE (to mean NOT the literal string of "APPLE"??)

kenoshi 08-11-2008 04:57 PM

sed -n 's/.*\(http[^ ]*\.mp3\).*/\1/p' somefile.html

Damn, Mr C is fast lol :D

Jean Of mArc 08-11-2008 04:59 PM

Well, thank you for your help as well, Kenoshi (ケノシ)

Mr. C. 08-11-2008 08:01 PM

No problem. It takes time to learn what's relevant and what's not.

You can use grep -v to exclude lines that contain APPLE:

Code:

$ echo -e 'I LOVE\nGOOD APPLE\nPIE' | grep -v APPLE
I LOVE
PIE

or use awk's ability to use a regular expression as a field delimiter:

Code:

$ echo -e 'I LOVE\nGOOD APPLE\nPIE' | awk -F'APPLE' '{print $1}' 
I LOVE
GOOD
PIE
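
Because `grep -v` filters whole lines, it combines well with `grep -o`, which puts each match on a line of its own. A rough sketch, assuming the goal is to list mp3 links but skip any whose URL contains a given literal string. The helper name `mp3_links_without`, the word `promo`, and the sample URLs are made up for illustration:

```shell
# Hypothetical helper: list .mp3 URLs from stdin, minus those containing "$1".
mp3_links_without() {
    # -o emits one URL per line, so -v can then drop individual URLs;
    # -F treats the excluded text as a literal string rather than a regex
    grep -oE 'http://[^" ]*\.mp3' | grep -vF -- "$1"
}

printf '%s\n' \
    '<a href="http://www.example.com/ep1.mp3">1</a> <a href="http://www.example.com/promo-ep1.mp3">promo</a>' \
    '<a href="http://www.example.com/ep2.mp3">2</a>' |
    mp3_links_without promo
```

Here the ep1.mp3 and ep2.mp3 links are printed and the promo link is dropped.
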

The answer to the "can you do it" question is almost invariably "YES!", so it's not an interesting question by itself. The challenge is to learn how to characterize your problem, which tool to use, and how to use it.


All times are GMT -5.