LinuxQuestions.org
Old 07-11-2009, 09:22 AM   #1
nonnumquam
LQ Newbie
 
Registered: Jul 2009
Posts: 4

Rep: Reputation: 0
Capture data in brackets


I'm trying to capture the data between the brackets in a bash script. Within the brackets, the first two columns always have the same length, but the name can consist of several words and run up to something like 64 characters. I'd like to get the name in one variable.

Sample of data:
00608EA UNACKED DN [00000089 #1 Bob Smith]L 0 REL 00000384 | C5393580 | C53931FC
00608EA UNACKED DN [000000A8 #1 Andrew J Landau]L 0 REL 00000384 | C5393580 | C53931FC
00608EA UNACKED DN [00000217 #1 Robert James Alburn]L 0 REL 00000384 | C5393580 | C53931FC
00608EA DELETE D ~ [000001E9 #1 Mike Lious]L 0 NEVER
 
Old 07-11-2009, 09:46 AM   #2
nonnumquam
LQ Newbie
 
Registered: Jul 2009
Posts: 4

Original Poster
Rep: Reputation: 0
Also, in the script I am processing these lines one at a time. This is not the only type/format of line in the file. I'm guessing this will be slow but I'm just trying to get the job done.
 
Old 07-11-2009, 09:55 AM   #3
Robhogg
Member
 
Registered: Sep 2004
Location: Old York, North Yorks.
Distribution: Debian 7 (mainly)
Posts: 653

Rep: Reputation: 85
You could pipe each line through the following sed command:

Code:
sed 's/\(.*\[\)\([^]]*\)\(\].*\)/\2/'
This translates as:

Match:
\(.*\[\) - first group: any number of characters, followed by an opening square bracket.
\([^]]*\) - second group: any number of characters that are not a closing square bracket.
\(\].*\) - third group: a closing square bracket, followed by any number of characters.
... and substitute with:
\2 - whatever was matched by the second group.

For more, see Streams and Sed (Rute Users' Tutorial and Exposition)
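A minimal sketch of the sed command above applied to a mixed input, since the original poster mentions that the file also holds other line formats. The middle input line is made up for illustration. The `-n` option and the `p` flag are an addition to the command as posted: without them, sed prints lines that do not match unchanged.

```shell
# Two bracketed lines from the sample data plus one made-up line without brackets.
input='00608EA UNACKED DN [00000089 #1 Bob Smith]L 0 REL 00000384 | C5393580 | C53931FC
some other record type without brackets
00608EA DELETE D ~ [000001E9 #1 Mike Lious]L 0 NEVER'

# -n suppresses automatic printing; the p flag prints only lines where the
# substitution succeeded, so the unbracketed line is dropped.
result=$(printf '%s\n' "$input" | sed -n 's/\(.*\[\)\([^]]*\)\(\].*\)/\2/p')
printf '%s\n' "$result"
```

With the command exactly as posted (no `-n`, no `p`), the middle line would appear in the output as-is.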

Last edited by Robhogg; 07-11-2009 at 09:58 AM.
 
Old 07-11-2009, 10:06 AM   #4
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

Rep: Reputation: 108
Or, if you do other things to each line in a bash loop, here's how to do it for one line within bash:
Code:
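while IFS= read -r LINE; do
    LINE=${LINE%%]*}     # drop everything from the first ] onward
    LINE=${LINE##*\[}    # drop everything up to and including the last remaining [
    echo "$LINE"
done < file.txt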
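The original question asks for the name alone in one variable. A possible extension of the parameter-expansion approach above, as a sketch only: `bracket_name` is a hypothetical helper name, and it assumes the ID and the `#1` column never contain spaces, as in the sample data.

```shell
# bracket_name (hypothetical helper): print the name from one input line,
# or return 1 if the line has no [...] section.
bracket_name() {
    case $1 in
        *\[*\]*) ;;          # line contains [ ... ]: carry on
        *) return 1 ;;       # some other record type: skip it
    esac
    inner=${1%%]*}           # drop everything from the first ] onward
    inner=${inner##*\[}      # drop everything up to the last remaining [
    inner=${inner#* }        # drop the fixed-width ID column
    printf '%s\n' "${inner#* }"   # drop the "#1" column, leaving the name
}

name=$(bracket_name '00608EA UNACKED DN [000000A8 #1 Andrew J Landau]L 0 REL 00000384 | C5393580 | C53931FC')
echo "$name"
```

Inside a read loop this could be called as `name=$(bracket_name "$LINE") || continue`, which skips the lines in other formats.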
 
Old 07-11-2009, 10:24 AM   #5
nonnumquam
LQ Newbie
 
Registered: Jul 2009
Posts: 4

Original Poster
Rep: Reputation: 0
That works great. Thanks!

However, I have another, similar scenario where I thought the same approach would work:

interval: <3/4/2009 4:00:00 PM - 3/27/2009 3:00:00 PM BY_DOW 42>

I tried
sed 's/\(.*\<\)\(<^>>*\)\(\>.*\)/\2/'

and it returns the whole line. Does it have to do with the < > symbols?

I clearly need to study and better understand these kinds of expressions.
 
Old 07-11-2009, 10:34 AM   #6
nonnumquam
LQ Newbie
 
Registered: Jul 2009
Posts: 4

Original Poster
Rep: Reputation: 0
The second technique does the job.

Although, I'd still like to understand what is different in the sed command.

Thanks again.
 
Old 07-11-2009, 10:57 AM   #7
Robhogg
Member
 
Registered: Sep 2004
Location: Old York, North Yorks.
Distribution: Debian 7 (mainly)
Posts: 653

Rep: Reputation: 85
Quote:
Originally Posted by nonnumquam View Post
I tried
sed 's/\(.*\<\)\(<^>>*\)\(\>.*\)/\2/'

and it returns the whole line. Does it have to do with the < > symbols?
Yes, it does. The \< and \> are special (I noticed there's a bit of a misprint in the Rute's chapter on Sed): they match the beginning and end of a word, while the unescaped < and > match literal angle brackets. Also, I suspect you meant [^>] (any character that is not a right angle bracket) rather than <^>>* (a left angle bracket, a caret, a right angle bracket, and then any number of further right angle brackets). The pattern should be:

Code:
sed 's/\(.*<\)\([^>]*\)\(>.*\)/\2/'
Quote:
I clearly need to study and better understand these kinds of expressions.
A lifetime's study. Regexes are very powerful, but also can be pretty confusing (especially given that there are at least three different vocabularies in regular use). Chapter 5 of the Rute should get you started.
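As a quick check, here is a small sketch that runs the corrected angle-bracket pattern from this post against the sample interval line from post #5. The variable names are only for illustration.

```shell
line='interval: <3/4/2009 4:00:00 PM - 3/27/2009 3:00:00 PM BY_DOW 42>'

# Unescaped < and > are literal characters; [^>]* captures everything between them.
interval=$(printf '%s\n' "$line" | sed 's/\(.*<\)\([^>]*\)\(>.*\)/\2/')
echo "$interval"
```

This should print only the text between the angle brackets, without the `interval:` prefix.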

Last edited by Robhogg; 07-11-2009 at 11:06 AM.
 
Old 07-11-2009, 11:32 AM   #8
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269
Although many prefer the extremely obscure sed syntax for this, I really hate to look at it. It makes me cringe when I see lines like:

Code:
sed 's/\(.*\[\)\([^]]*\)\(\].*\)/\2/'
Really, it makes me sick to my stomach, so I always like to remind myself that this is not the only way. For example:

Code:
bash-3.1$ cat test | cut -d [ -f 2 | cut -d ] -f 1
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
or

Code:
bash-3.1$ cut -d [ -f 2 test | cut -d ] -f 1
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
where 'test' is the file containing the data.
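A variation on the cut pipeline above, as a sketch: the delimiters are quoted so the shell never treats `[` or `]` as glob characters, and `-s` is added to the first cut so that lines without a `[` are dropped instead of passed through whole. The middle input line is made up for illustration.

```shell
# Two bracketed lines from the sample data plus one made-up line without brackets.
input='00608EA UNACKED DN [00000089 #1 Bob Smith]L 0 REL 00000384 | C5393580 | C53931FC
some other record type without brackets
00608EA DELETE D ~ [000001E9 #1 Mike Lious]L 0 NEVER'

# First cut: keep what follows the first [ (and, with -s, skip lines lacking one).
# Second cut: keep what precedes the first ].
result=$(printf '%s\n' "$input" | cut -s -d '[' -f 2 | cut -d ']' -f 1)
printf '%s\n' "$result"
```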
 
Old 07-11-2009, 12:20 PM   #9
Robhogg
Member
 
Registered: Sep 2004
Location: Old York, North Yorks.
Distribution: Debian 7 (mainly)
Posts: 653

Rep: Reputation: 85
Quote:
Originally Posted by H_TeXMeX_H View Post
Although many prefer the extremely obscure sed syntax for this, I really hate to look at it, it makes me cringe... Really, it makes me sick to my stomach...
Wow, that's a strong reaction.

Yes, in this case you're probably right. There are other cases, though, where a single juicy regular expression can be the alternative to a couple of dozen lines of code.
 
Old 07-11-2009, 03:41 PM   #10
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269
I have nothing against using sed or regular expressions, but they should be used where they are not obscure, or at least in the least obscure way possible. Personally, I've never had to compose such a huge regular expression for anything, and whenever I found myself trying to, I realized that maybe this was not the way. What will I think when I come back and look at the script later? It's just some magic line that will take a while to unravel.

Maybe others can see into the code ... are you Neo?

EDIT:
Oh and BTW, I have nothing against your solution; it works and is probably more efficient than anything else. I'm just making a general comment, because I see a lot of these huge sed lines and it's hard to take. I usually just move on to the next thread at that point. I don't even bother testing them; I know for sure the magic hidden within is strong.

Last edited by H_TeXMeX_H; 07-11-2009 at 03:45 PM.
 
Old 07-11-2009, 09:34 PM   #11
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,973
Blog Entries: 11

Rep: Reputation: 879
Quote:
Originally Posted by H_TeXMeX_H View Post
...., so I always like to remind myself that this is not the only way. For example:

Code:
bash-3.1$ cat test | cut -d [ -f 2 | cut -d ] -f 1
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
or

Code:
bash-3.1$ cut -d [ -f 2 test | cut -d ] -f 1
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
where 'test' is the file containing the data.
Or awk ;}

Code:
awk -F"[\x5d\x5b]" '{print $2}' test
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious

Cheers,
Tink
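The `\x5d\x5b` hex escapes in the awk field separator above may not be understood by every awk implementation. A sketch of the same idea with the brackets written literally: in a bracket expression, a `]` placed first is taken as a literal character, so `[][]` matches either bracket. The `NF >= 3` guard is an addition that skips lines without a bracket pair, and the middle input line is made up for illustration.

```shell
# Two bracketed lines from the sample data plus one made-up line without brackets.
input='00608EA UNACKED DN [00000089 #1 Bob Smith]L 0 REL 00000384 | C5393580 | C53931FC
some other record type without brackets
00608EA DELETE D ~ [000001E9 #1 Mike Lious]L 0 NEVER'

# Split each line on [ or ]; field 2 is then the bracketed text.
result=$(printf '%s\n' "$input" | awk -F'[][]' 'NF >= 3 { print $2 }')
printf '%s\n' "$result"
```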
 
Old 07-11-2009, 10:31 PM   #12
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Code:
awk 'BEGIN{FS="]"}{for(i=1;i<=NF;i++){gsub(/.*\[/,"",$i)} ;$NF=""}1' file
output
Code:
# more file
00608EA UNACKED DN [00000089 #1 Bob Smith]L 0 REL 00000384 [ | C5393580 | C53931FC ]
00608EA UNACKED DN [000000A8 #1 Andrew J Landau]L 0 REL 00000384 [ | C5393580 | C53931FC ]
00608EA UNACKED DN [00000217 #1 Robert James Alburn]L 0 REL 00000384 | C5393580 | C53931FC
00608EA DELETE D ~ [000001E9 #1 Mike Lious]L 0 NEVER

# ./test.sh
00000089 #1 Bob Smith  | C5393580 | C53931FC
000000A8 #1 Andrew J Landau  | C5393580 | C53931FC
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
 
  

