LinuxQuestions.org
Old 06-01-2005, 04:01 PM   #1
sopiaz57
Member
 
Registered: Apr 2003
Distribution: RH 8
Posts: 246

Rep: Reputation: 30
text searching


Hey guys, I am a bit new to text searching and expressions. Basically I have a text file filled with garbage; what is valuable is the data in between ( )


What sort of expression would pull out all the text in between these parens?

Thanks in advance.
 
Old 06-01-2005, 05:51 PM   #2
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 56
Hi,
This may get you started when the info is on one line. If the ( and ) are on separate lines, you may need to temporarily replace the \n with something else, like a tilde (~).

Code:
cat file.txt | grep -o '([^)]*)'
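For the multi-line case, here is a rough sketch of the ~ placeholder idea. The sample text is made up, and it assumes a grep that supports -o (GNU and BSD grep do) and data that contains no ~ of its own.

```shell
# Made-up sample: the second pair of parens spans two lines.
sample=$(printf 'junk (one) junk\nmore (two\nlines) end\n')

# Swap each newline for ~ so everything sits on one line, then extract.
joined=$(printf '%s\n' "$sample" | tr '\n' '~' | grep -o '([^)]*)')

# Swap the ~ placeholders back into newlines afterwards.
restored=$(printf '%s\n' "$joined" | tr '~' '\n')

printf '%s\n' "$restored"
```

The middle step yields (one) and (two~lines); the last tr turns the ~ back into a line break.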
 
Old 06-02-2005, 02:44 AM   #3
tvynr
Member
 
Registered: Apr 2004
Distribution: Debian
Posts: 143

Rep: Reputation: 15
I had luck with the following command. It is capable of handling files with corresponding ( and ) on multiple lines.

Code:
cat test.txt | tr -d '\012' | egrep -o '\([^(]*\)'
I escaped the parentheses shown in the previous post to make it work on my machine. Using this command without the escape characters produced some very odd results which I intend to investigate at some point.

Cheers,

Zachary Palmer
 
Old 06-02-2005, 09:17 AM   #4
sopiaz57
Member
 
Registered: Apr 2003
Distribution: RH 8
Posts: 246

Original Poster
Rep: Reputation: 30
Looks interesting. Can you break it down for me?
 
Old 06-02-2005, 01:00 PM   #5
tvynr
Member
 
Registered: Apr 2004
Distribution: Debian
Posts: 143

Rep: Reputation: 15
I'd be glad to. If I cover something you already know, bear with me; I have no idea how much experience you have.

Code:
cat test.txt
The output of this part is the contents of the file. cat reads the file's contents and writes them to its standard output. The pipe symbol ('|') makes the standard output for the left-hand command readable as the standard input to the right-hand command, sort of plugging them together.
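A small sketch of the plumbing, using a scratch file created only for the demonstration (the file and its contents are made up). It shows that piping from cat and redirecting the file both hand the same bytes to the right-hand command's standard input.

```shell
# Hypothetical scratch file, created only for this demonstration.
tmpfile=$(mktemp)
printf 'hello (world)\n' > "$tmpfile"

# cat writes the file to its standard output; the pipe feeds that to tr.
piped=$(cat "$tmpfile" | tr 'a-z' 'A-Z')

# A redirect connects the file to tr's standard input directly, no cat needed.
redirected=$(tr 'a-z' 'A-Z' < "$tmpfile")

printf '%s\n%s\n' "$piped" "$redirected"
rm -f "$tmpfile"
```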

Code:
tr -d '\012'
tr stands for "translate"; this program is designed to translate characters on the standard input to another set of characters, written to the standard output. In this case, rather than translating the characters, the -d flag causes tr to delete the characters instead. There is only one character specified in the set (between the ' marks): \012. tr assumes that patterns of the form \nnn indicate the character with the octal value nnn. In this case, the octal value 12 (decimal value 10) represents the Linux newline character.
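To make the difference between translating and deleting concrete, here is a quick sketch on made-up two-line input. It also checks that '\n' names the same character as '\012'.

```shell
# Made-up two-line input.
text=$(printf 'ab(c\nd)ef\n')

# Translate mode: every newline becomes a space.
translated=$(printf '%s\n' "$text" | tr '\012' ' ')

# Delete mode: every newline is dropped, so the lines are glued together.
deleted=$(printf '%s\n' "$text" | tr -d '\012')

# '\n' is another way to write the same character as octal 012.
deleted_n=$(printf '%s\n' "$text" | tr -d '\n')

printf '[%s]\n[%s]\n[%s]\n' "$translated" "$deleted" "$deleted_n"
```

Note that deleting the newline joins "c" and "d" with nothing in between, which matters if the line break was separating words.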

Code:
egrep -o '\([^(]*\)'
This snippet is fairly complicated. To my understanding, egrep does the same thing as grep -E (capital E; lowercase -e only marks the next argument as the pattern): it interprets the expression as an extended regular expression rather than the basic kind that plain grep uses. If you're not familiar with regular expressions, check out the egrep manpage or have a Google. I'll explain the expression itself below. The -o flag causes egrep to only display the part of the line which matched the expression; normally grep displays the whole line on which the pattern occurred.
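Here is a small sketch of what -o changes, run on a made-up line. It uses grep -E, the flag form of egrep, and assumes a grep that supports -o (GNU and BSD grep do; it is not in POSIX).

```shell
# Made-up input line.
line='junk (keep me) more junk'

# Without -o, grep prints the whole line that contains a match.
whole=$(printf '%s\n' "$line" | grep -E '\([^(]*\)')

# With -o, grep prints only the part of the line that matched.
only=$(printf '%s\n' "$line" | grep -oE '\([^(]*\)')

printf 'whole: %s\nonly:  %s\n' "$whole" "$only"
```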

The pattern contains a number of backslash characters because regular expressions use parentheses as a grouping symbol. For example, the regular expression "ab*" means 'a' followed by zero or more 'b's (ex., "a", "ab", "abb", but not "aba" or "abab"). The regular expression "(ab)*", on the other hand, means zero or more 'ab's (ex., "ab", "abab", and even "", but not "abb" or "aab"). Since we want to look for literal parentheses, we use the backslash character to tell grep that the character immediately following it isn't regular expression syntax but an actual character in the pattern.
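If you want to try the grouping examples yourself, here is a sketch. The matches helper is a name I made up for this post; it uses grep -x so the pattern has to cover the whole string, which is how the examples above are meant.

```shell
# matches PATTERN STRING: prints "yes" if the extended regular expression
# PATTERN matches the whole of STRING, "no" otherwise. (Hypothetical helper.)
matches() {
    if printf '%s\n' "$2" | grep -qxE "$1"; then
        echo yes
    else
        echo no
    fi
}

matches 'ab*' 'abb'       # yes: 'a' followed by two 'b's
matches 'ab*' 'abab'      # no:  the second 'a' is not allowed
matches '(ab)*' 'abab'    # yes: the group 'ab' repeated twice
matches '(ab)*' 'abb'     # no:  the trailing 'b' is not a full group
matches '\(ab\)' '(ab)'   # yes: escaped parentheses match literal ones
```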

If you drop the backslashes and remember that the parentheses are literal characters while [, ^, ], and * are regular expression syntax, the pattern looks like this:

([^(]*)

That is, the expression looks for an open parenthesis, then the pattern [^(] zero or more times, then a closed parenthesis.

The pattern [^(] is fairly simple. The [] characters tell the regular expression that the contents are to be interpreted as a character set; any of the characters or character representations contained within match it. For example, [abd] matches 'a', 'b', or 'd', [a-ce] matches 'a', 'b', 'c', or 'e', and [a-zA-Z0-9] matches any alphanumeric character.

The ^ symbol in sets indicates "not." That is, [^a] represents any character which isn't 'a'. Therefore, the expression [^(] matches any character which isn't '('.
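A quick sketch to check sets and negated sets against a handful of made-up one-character lines. LC_ALL=C is my addition so that ranges like a-c mean plain ASCII order whatever your locale is.

```shell
# One made-up character per line: a b c d e (
chars=$(printf 'a\nb\nc\nd\ne\n(\n')

# [a-ce] keeps 'a', 'b', 'c', and 'e'; -x requires the whole line to match.
in_set=$(printf '%s\n' "$chars" | LC_ALL=C grep -x '[a-ce]' | tr -d '\n')

# [^(] keeps every character except the open parenthesis.
not_paren=$(printf '%s\n' "$chars" | LC_ALL=C grep -x '[^(]' | tr -d '\n')

printf 'in_set=%s not_paren=%s\n' "$in_set" "$not_paren"
```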

So, the expression matches an open parenthesis, any number of characters which aren't an open parenthesis, and then a closed parenthesis.

***I should note that I made an assumption about your intentions. Take, for example, the file

Code:
0123(abcd)efgh)4567(ijk)
If you want this expression to provide:

Code:
(abcd)efgh)
(ijk)
you use the expression I gave you. If, however, you want

Code:
(abcd)
(ijk)
(stopping at the first available closed parenthesis rather than the last one), you want to use

Code:
cat test.txt | tr -d '\012' | egrep -o '\([^)]*\)'
(noting that the parenthesis inside of the set by the ^ is a closed parenthesis and not an open one; it needs no backslash there, because inside [] a backslash is just another literal character).
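Here is a sketch that runs both variants on the example line so you can compare them side by side. It assumes a grep that supports -o, uses grep -E in place of egrep, and writes the sets without a backslash inside the brackets, since a backslash there is taken literally.

```shell
# The example line from above.
line='0123(abcd)efgh)4567(ijk)'

# Set excludes '(': the match runs to the last ')' before the next '('.
up_to_last=$(printf '%s\n' "$line" | grep -oE '\([^(]*\)')

# Set excludes ')': the match stops at the first ')' it reaches.
up_to_first=$(printf '%s\n' "$line" | grep -oE '\([^)]*\)')

printf 'excluding "(":\n%s\n\nexcluding ")":\n%s\n' "$up_to_last" "$up_to_first"
```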

Any questions?

Cheers,

Zachary Palmer
 
  

