07-11-2009, 10:22 AM
|
#1
|
LQ Newbie
Registered: Jul 2009
Posts: 4
Rep:
|
Capture data in brackets
I'm trying to capture the data between the brackets in a bash script. Within the brackets, the first two columns are always the same length, but the name can consist of several words and be up to about 64 characters long. I'd like to get the name in one variable.
Sample of data:
00608EA UNACKED DN [00000089 #1 Bob Smith]L 0 REL 00000384 | C5393580 | C53931FC
00608EA UNACKED DN [000000A8 #1 Andrew J Landau]L 0 REL 00000384 | C5393580 | C53931FC
00608EA UNACKED DN [00000217 #1 Robert James Alburn]L 0 REL 00000384 | C5393580 | C53931FC
00608EA DELETE D ~ [000001E9 #1 Mike Lious]L 0 NEVER
|
|
|
07-11-2009, 10:46 AM
|
#2
|
LQ Newbie
Registered: Jul 2009
Posts: 4
Original Poster
Rep:
|
Also, in the script I am processing these lines one at a time. This is not the only type/format of line in the file. I'm guessing this will be slow but I'm just trying to get the job done.
|
|
|
07-11-2009, 10:55 AM
|
#3
|
Member
Registered: Sep 2004
Location: Old York, North Yorks.
Distribution: Debian 7 (mainly)
Posts: 653
Rep:
|
You could pipe each line through the following sed command:
Code:
sed 's/\(.*\[\)\([^]]*\)\(\].*\)/\2/'
This translates as:
Match:
\(.*\[\) - first group: any number of characters, followed by an opening square bracket.
\([^]]*\) - second group: any number of characters that are not a closing square bracket.
\(\].*\) - third group: a closing square bracket, followed by any number of characters.
... and substitute with:
\2 - whatever was matched by the second group.
For more, see Streams and Sed (Rute Users' Tutorial and Exposition)
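As a rough sketch of how the sed command above might be used to get just the name into its own variable, as the original question asked: the variable names line, inside and name are made up for illustration, and the sample line is copied from the first post. The -n option and trailing p are an assumption about what is wanted here; they make sed print only lines where the substitution matched, which may help because the file also contains other line formats.

```shell
# Sample line copied from the first post (the variable names are hypothetical).
line='00608EA UNACKED DN [00000089 #1 Bob Smith]L 0 REL 00000384 | C5393580 | C53931FC'

# Keep only the text between [ and ]. With -n and the trailing "p",
# lines that contain no bracketed field produce no output at all.
inside=$(printf '%s\n' "$line" | sed -n 's/.*\[\([^]]*\)].*/\1/p')

# The first two columns contain no spaces, so the name is whatever
# follows the second space.
name=${inside#* * }

printf '%s\n' "$inside"   # 00000089 #1 Bob Smith
printf '%s\n' "$name"     # Bob Smith
```

Only one capture group is needed here, since the text before and after the brackets is discarded anyway.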
Last edited by Robhogg; 07-11-2009 at 10:58 AM.
|
|
|
07-11-2009, 11:06 AM
|
#4
|
Senior Member
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536
Rep: 
|
Or, if you do other things to each line in a bash loop, here's how to do it for one line within bash:
Code:
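while IFS= read -r LINE; do
    LINE=${LINE%%]*}     # drop the first "]" and everything after it
    LINE=${LINE##*\[}    # drop everything up to and including the last "["
    echo "$LINE"
done < file.txt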
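Since the file also contains lines in other formats, a guarded variant of the same idea might look like the sketch below. The function name extract_name and the variable field are hypothetical, and it assumes (as the first post says) that the two columns before the name never contain spaces.

```shell
# extract_name: hypothetical helper that prints the name from one line,
# or returns 1 if the line has no [ ... ] field.
extract_name() {
    case $1 in
        *\[*\]*) ;;           # line contains a bracketed field
        *) return 1 ;;        # some other line format: skip it
    esac
    field=${1#*\[}            # drop everything up to the first "["
    field=${field%%\]*}       # drop the first "]" and everything after it
    printf '%s\n' "${field#* * }"   # drop the two leading columns
}

extract_name '00608EA DELETE D ~ [000001E9 #1 Mike Lious]L 0 NEVER'   # Mike Lious
extract_name 'a line in some other format' || echo 'no brackets here'
```

Inside a read loop this could be called as name=$(extract_name "$LINE") || continue, so that unrelated lines are simply skipped.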
|
|
|
07-11-2009, 11:24 AM
|
#5
|
LQ Newbie
Registered: Jul 2009
Posts: 4
Original Poster
Rep:
|
That works great. Thanks!
However, I have another, similar scenario where I thought the same approach would work:
interval: <3/4/2009 4:00:00 PM - 3/27/2009 3:00:00 PM BY_DOW 42>
I tried
sed 's/\(.*\<\)\(<^>>*\)\(\>.*\)/\2/'
and it returns the whole line. Does it have to do with the < > symbols?
I clearly need to study and better understand these kinds of expressions.
|
|
|
07-11-2009, 11:34 AM
|
#6
|
LQ Newbie
Registered: Jul 2009
Posts: 4
Original Poster
Rep:
|
The second technique does the job.
Although, I'd still like to understand what is different in the sed command.
Thanks again.
|
|
|
07-11-2009, 11:57 AM
|
#7
|
Member
Registered: Sep 2004
Location: Old York, North Yorks.
Distribution: Debian 7 (mainly)
Posts: 653
Rep:
|
Quote:
Originally Posted by nonnumquam
I tried
sed 's/\(.*\<\)\(<^>>*\)\(\>.*\)/\2/'
and it returns the whole line. Does it have to do with the < > symbols?
|
Yes, it does. The \< and \> are special (I noticed there's a bit of a misprint in the Rute's chapter on Sed): they match the beginning and end of a word, while the unescaped < and > match literal angle brackets. Also, I suspect you meant [^>] (any character that is not a right angle bracket) rather than <^>> (a left angle bracket, a caret, and two right angle brackets). The pattern should be:
Code:
sed 's/\(.*<\)\([^>]*\)\(>.*\)/\2/'
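To see the corrected pattern at work, here is a small sketch using the interval line from the question. The variable names are made up for illustration, and the pattern is trimmed to a single capture group. The second half shows the same extraction with parameter expansion, which needs no escaping at all.

```shell
# Sample line copied from the question (variable names are hypothetical).
line='interval: <3/4/2009 4:00:00 PM - 3/27/2009 3:00:00 PM BY_DOW 42>'

# Unescaped < and > are ordinary characters in a sed pattern.
interval=$(printf '%s\n' "$line" | sed 's/.*<\([^>]*\)>.*/\1/')

# The same result without sed.
tmp=${line#*<}          # drop everything up to the first "<"
interval2=${tmp%%>*}    # drop the first ">" and everything after it

printf '%s\n' "$interval"    # 3/4/2009 4:00:00 PM - 3/27/2009 3:00:00 PM BY_DOW 42
printf '%s\n' "$interval2"   # same text
```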
Quote:
I clearly need to study and better understand these kinds of expressions.
|
A lifetime's study. Regexes are very powerful, but can also be pretty confusing (especially given that there are at least three different vocabularies in regular use). Chapter 5 of the Rute should get you started.
Last edited by Robhogg; 07-11-2009 at 12:06 PM.
|
|
|
07-11-2009, 12:32 PM
|
#8
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
Although many prefer the extremely obscure sed syntax for this, I really hate to look at it. It makes me cringe when I see lines like:
Code:
sed 's/\(.*\[\)\([^]]*\)\(\].*\)/\2/'
Really, it makes me sick to my stomach, so I always like to remind myself that this is not the only way. For example:
Code:
bash-3.1$ cat test | cut -d [ -f 2 | cut -d ] -f 1
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
or
Code:
bash-3.1$ cut -d [ -f 2 test | cut -d ] -f 1
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
where 'test' is the file containing the data.
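One thing to watch with cut: by default it passes through, unchanged, any line that does not contain the delimiter, and the original poster mentioned that the file holds other line formats too. A possible way around that, sketched here with made-up input, is the -s option on the first cut, which suppresses such lines. The delimiters are quoted only as a precaution.

```shell
# Two hypothetical input lines: one without brackets, one from the sample data.
result=$(printf '%s\n' \
    'a line in some other format' \
    '00608EA DELETE D ~ [000001E9 #1 Mike Lious]L 0 NEVER' |
    cut -s -d '[' -f 2 | cut -d ']' -f 1)

printf '%s\n' "$result"   # 000001E9 #1 Mike Lious
```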
|
|
|
07-11-2009, 01:20 PM
|
#9
|
Member
Registered: Sep 2004
Location: Old York, North Yorks.
Distribution: Debian 7 (mainly)
Posts: 653
Rep:
|
Quote:
Originally Posted by H_TeXMeX_H
Although many prefer the extremely obscure sed syntax for this, I really hate to look at it, it makes me cringe... Really, it makes me sick to my stomach...
|
Wow, that's a strong reaction.
Yes, in this case you're probably right. There are other cases, though, where a single juicy regular expression can be the alternative to a couple of dozen lines of code.
|
|
|
07-11-2009, 04:41 PM
|
#10
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
I have nothing against using sed or regular expressions, but they should be used where they are not obscure, or at least in the least obscure way possible. Personally, I've never had to compose such a huge regular expression for anything, and whenever I found myself trying to, I realized that maybe this was not the way. What will I think when I come back and look at the script later? It will just be some magic line that takes a while to unravel.
Maybe others can see into the code ... are you Neo?
EDIT:
Oh and BTW, I have nothing against your solution; it works and is probably more efficient than anything else. I'm just making a general comment, because I see a lot of these huge sed lines and it's hard to take. I usually just move on to the next thread at that point. I don't even bother testing them; I know for sure the magic hidden within is strong.
Last edited by H_TeXMeX_H; 07-11-2009 at 04:45 PM.
|
|
|
07-11-2009, 10:34 PM
|
#11
|
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
|
Quote:
Originally Posted by H_TeXMeX_H
...., so I always like to remind myself that this is not the only way. For example:
Code:
bash-3.1$ cat test | cut -d [ -f 2 | cut -d ] -f 1
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
or
Code:
bash-3.1$ cut -d [ -f 2 test | cut -d ] -f 1
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
where 'test' is the file containing the data.
|
Or awk ;}
Code:
awk -F"[\x5d\x5b]" '{print $2}' test
00000089 #1 Bob Smith
000000A8 #1 Andrew J Landau
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
Cheers,
Tink
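The \x5d and \x5b hex escapes above may not be understood by every awk implementation. A possibly more portable spelling, offered only as a sketch, is the bracket expression [][], in which a ] placed first is taken literally. The NF > 1 guard is an added assumption: it skips lines that contain no brackets, and the variable names are made up.

```shell
# Sample line copied from the first post (variable names are hypothetical).
line='00608EA UNACKED DN [000000A8 #1 Andrew J Landau]L 0 REL 00000384 | C5393580 | C53931FC'

# Split each line on either "[" or "]" and print the second field,
# but only when the line was actually split (NF > 1).
field=$(printf '%s\n' "$line" | awk -F'[][]' 'NF > 1 {print $2}')

printf '%s\n' "$field"   # 000000A8 #1 Andrew J Landau
```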
|
|
|
07-11-2009, 11:31 PM
|
#12
|
Senior Member
Registered: Aug 2006
Posts: 2,697
|
Code:
awk 'BEGIN{FS="]"}{for(i=1;i<=NF;i++){gsub(/.*\[/,"",$i)} ;$NF=""}1' file
output
Code:
# more file
00608EA UNACKED DN [00000089 #1 Bob Smith]L 0 REL 00000384 [ | C5393580 | C53931FC ]
00608EA UNACKED DN [000000A8 #1 Andrew J Landau]L 0 REL 00000384 [ | C5393580 | C53931FC ]
00608EA UNACKED DN [00000217 #1 Robert James Alburn]L 0 REL 00000384 | C5393580 | C53931FC
00608EA DELETE D ~ [000001E9 #1 Mike Lious]L 0 NEVER
# ./test.sh
00000089 #1 Bob Smith | C5393580 | C53931FC
000000A8 #1 Andrew J Landau | C5393580 | C53931FC
00000217 #1 Robert James Alburn
000001E9 #1 Mike Lious
|
|
|
All times are GMT -5.
|