I'm trying to capture the data between the brackets in a bash script. Within the brackets, the first two columns are always the same length, but the name can consist of several words and run up to something like 64 characters. I'd like to get the name into one variable.
Sample of data:
00608EA UNACKED DN [00000089 #1 Bob Smith]L 0 REL 00000384 | C5393580 | C53931FC
00608EA UNACKED DN [000000A8 #1 Andrew J Landau]L 0 REL 00000384 | C5393580 | C53931FC
00608EA UNACKED DN [00000217 #1 Robert James Alburn]L 0 REL 00000384 | C5393580 | C53931FC
00608EA DELETE D ~ [000001E9 #1 Mike Lious]L 0 NEVER
Also, in the script I am processing these lines one at a time. This is not the only type/format of line in the file. I'm guessing this will be slow but I'm just trying to get the job done.
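If the worry is the cost of starting an external program for every line, one option is to stay inside bash and use its built-in regex match. This is only a sketch: it assumes bash 3.0 or later and single spaces between the columns, and the names re, parse_line, id, num and name are made up for the example.

```shell
#!/usr/bin/env bash
# Regex kept in a variable so nothing needs quoting inside [[ ]].
# Groups: first column, second column, everything else up to the closing bracket.
re='\[([^ ]+) ([^ ]+) ([^]]+)\]'

parse_line() {
    # Sets id, num and name (illustrative global names); returns 0 on a match.
    if [[ $1 =~ $re ]]; then
        id=${BASH_REMATCH[1]}
        num=${BASH_REMATCH[2]}
        name=${BASH_REMATCH[3]}
        return 0
    fi
    return 1
}

# Lines in other formats simply fail the match and are skipped.
while IFS= read -r line; do
    if parse_line "$line"; then
        printf '%s|%s|%s\n' "$id" "$num" "$name"
    fi
done <<'EOF'
00608EA UNACKED DN [000000A8 #1 Andrew J Landau]L 0 REL 00000384 | C5393580 | C53931FC
a line in some other format
EOF
```

No sed, cut or awk process is started per line here, which is usually what makes line-at-a-time loops slow.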
You could pipe each line through the following sed command:
Code:
sed 's/\(.*\[\)\([^]]*\)\(\].*\)/\2/'
This translates as:
Match:
\(.*\[\) - first group: any number of characters, followed by an opening square bracket.
\([^]]*\) - second group: any number of characters that are not a closing square bracket.
\(\].*\) - third group: a closing square bracket, followed by any number of characters.
... and substitute with: \2 - whatever was matched by the second group.
For more, see Streams and Sed (Rute Users' Tutorial and Exposition)
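To get from the sed output to the name in its own variable, something along these lines might do. It is a sketch, not a tested solution for the real file: extract_bracket and the variable names are invented here, and it assumes the columns inside the brackets are separated by single spaces.

```shell
#!/usr/bin/env bash
# Wrap the sed command from above; -n plus the p flag prints only
# lines where the substitution actually matched.
extract_bracket() {
    printf '%s\n' "$1" | sed -n 's/\(.*\[\)\([^]]*\)\(\].*\)/\2/p'
}

line='00608EA UNACKED DN [00000089 #1 Bob Smith]L 0 REL 00000384 | C5393580 | C53931FC'

inner=$(extract_bracket "$line")   # text between the brackets
id=${inner%% *}                    # first column: everything before the first space
rest=${inner#* }                   # drop the first column
num=${rest%% *}                    # second column
name=${rest#* }                    # whatever is left is the name

printf 'id=%s num=%s name=%s\n' "$id" "$num" "$name"
```

The parameter expansions do the splitting inside bash, so sed is only called once per line.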
and it returns the whole line. Does it have to do with the < > symbols?
Yes, it does. The escaped \< and \> are special (I noticed there's a bit of a misprint in the Rute chapter on sed): they match the beginning and end of a word, while the unescaped < and > match literal angle brackets. Also, I suspect you meant [^>] (any character that is not a right angle bracket) rather than <^>> (an angle bracket, a caret, and two more angle brackets). The pattern should be:
Code:
sed 's/\(.*<\)\([^>]*\)\(>.*\)/\2/'
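As for getting the whole line back: when the pattern of an s command does not match, sed prints the line unchanged. A small sketch of that behaviour follows, using made-up input lines (the real angle-bracket data was not posted), together with the -n option and p flag, which print only the lines where a substitution happened.

```shell
#!/usr/bin/env bash
# Hypothetical input: only the first line contains an angle-bracketed field.
input='alpha <00000089 #1 Bob Smith> omega
a line with no angle brackets'

# Without -n: the non-matching line comes back unchanged.
all=$(printf '%s\n' "$input" | sed 's/\(.*<\)\([^>]*\)\(>.*\)/\2/')

# With -n and the p flag: only lines where the substitution happened are printed.
only=$(printf '%s\n' "$input" | sed -n 's/\(.*<\)\([^>]*\)\(>.*\)/\2/p')

printf '%s\n' "$all"
printf -- '---\n'
printf '%s\n' "$only"
```

The second form is probably the more useful one for a file that mixes several line formats.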
Quote:
I clearly need to study and better understand these kinds of expressions.
A lifetime's study. Regexes are very powerful, but also can be pretty confusing (especially given that there are at least three different vocabularies in regular use). Chapter 5 of the Rute should get you started.
Although many prefer the extremely obscure sed syntax for this, I really hate to look at it; it makes me cringe when I see lines like:
Code:
sed 's/\(.*\[\)\([^]]*\)\(\].*\)/\2/'
Really, it makes me sick to my stomach, so I always like to remind myself that this is not the only way. For example:
Code:
bash-3.1$ cat test | cut -d [ -f 2 | cut -d ] -f 1
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
or
Code:
bash-3.1$ cut -d [ -f 2 test | cut -d ] -f 1
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
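The same cut approach can be pushed a little further to get just the name into a variable. This is a sketch with assumptions: the data and the variable names are illustrative, and the columns inside the brackets are taken to be separated by single spaces. The -s option tells cut to skip lines that do not contain the delimiter, which matters when the file has other kinds of lines.

```shell
#!/usr/bin/env bash
# One line from the sample data plus a made-up line without brackets.
data='00608EA DELETE D ~ [000001E9 #1 Mike Lious]L 0 NEVER
some other kind of line'

# Quote the delimiters so the shell never treats them as pattern characters.
# -s drops lines that contain no '[' at all.
fields=$(printf '%s\n' "$data" | cut -s -d '[' -f 2 | cut -d ']' -f 1)

# Space-delimited fields 3 onward are the name, however many words it has.
name=$(printf '%s\n' "$fields" | cut -d ' ' -f 3-)

printf 'fields=%s\nname=%s\n' "$fields" "$name"
```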
Quote:
Although many prefer the extremely obscure sed syntax for this, I really hate to look at it, it makes me cringe... Really, it makes me sick to my stomach...
Wow, that's a strong reaction.
Yes, in this case you're probably right. There are other cases, though, where a single juicy regular expression can be the alternative to a couple of dozen lines of code.
I have nothing against using sed or regular expressions, but they should be used where they are not obscure, or at least in the least obscure way possible. Personally, I've never had to compose such a huge regular expression for anything, and whenever I found myself trying to, I realized that maybe this was not the way. What will I think when I come back and look at the script later? It will just be some magic line that takes a while to unravel.
Maybe others can see into the code ... are you Neo?
EDIT:
Oh, and BTW, I have nothing against your solution; it works and is probably more efficient than anything else. I'm just making a general comment, because I see a lot of these huge sed lines and they're hard to take. I usually just move on to the next thread at that point. I don't even bother testing them; I know for sure the magic hidden within is strong.
Last edited by H_TeXMeX_H; 07-11-2009 at 03:45 PM.
In the cut examples above, 'test' is the file containing the data.
Or awk ;}
Code:
awk -F"[\x5d\x5b]" '{print $2}' test
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
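The \x hex escapes are not part of POSIX awk, so they may not be understood by every awk. A variant that spells the brackets out directly, and also strips the first two columns so only the name is left, could look like this. It is a sketch: the variable names are made up and single or repeated spaces between columns are assumed.

```shell
#!/usr/bin/env bash
line='00608EA UNACKED DN [00000217 #1 Robert James Alburn]L 0 REL 00000384 | C5393580 | C53931FC'

# '[][]' is a bracket expression matching either ']' or '['.
# NF > 1 skips lines that contain neither bracket.
name=$(printf '%s\n' "$line" | awk -F'[][]' 'NF > 1 {
    inner = $2
    sub(/^[^ ]+ +[^ ]+ +/, "", inner)   # drop the first two space-separated columns
    print inner
}')

printf '%s\n' "$name"
```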