Old 10-14-2012, 07:31 PM   #1
buntuser
LQ Newbie
 
Registered: Apr 2012
Posts: 8

Rep: Reputation: Disabled
extract part of line based on multiple conditions (sed?)


I want to extract part of a line, for example:
MANGO444CHERRY--ORANGE--WHITE--WHITE--MANGO555CHERRY--ORANGE--BLACK--BLACK
I want to take the number that is followed by CHERRY, chosen according to the color that comes after it (WHITE or BLACK).
I tried:
Code:
 sed -r 's/MANGO([0-9]+).*WHITE.*?/\1/'
which gave me the desired output: 444
But when I tried
Code:
 sed -r 's/MANGO([0-9]+).*BLACK.*?/\1/'
I expected to get 555, but instead got the whole line.
Also, my expression does not take ORANGE into account.
What I need is an expression that will:
a. return 444 if set to MANGO([0-9]+).*ORANGE.*WHITE
b. return 555 if set to MANGO([0-9]+).*ORANGE.*BLACK
c. return nothing if there is no ORANGE in the string, i.e.:
MANGO444CHERRY--BANANA--WHITE--WHITE--MANGO555CHERRY--BANANA--BLACK--BLACK --> will return no output

Last edited by buntuser; 10-15-2012 at 04:18 AM.
 
Old 10-14-2012, 08:34 PM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 738
Please put your code in [code] tags to make it easier to read.

First, what is the intent of this construct?
Code:
.*?
This appears to be saying: "a group of any number of characters which either does or does not appear". I think this is redundant.

Second, note that
Code:
MANGO([0-9]+).*
matches the first instance of <<MANGO, followed by at least one digit, then any number of characters>>. The second instance gets swallowed by the ".*". So, to pick out the 2nd instance, you have to work a bit harder.
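
To see the leftmost-match behaviour described above, here is a small sketch, assuming GNU sed (for -r) and using the sample line from the first post. The helper name first_mango_before is made up for illustration. Whichever color is given, the match starts at the first MANGO, so the captured number is the same.

```shell
# Sample line from the first post.
line='MANGO444CHERRY--ORANGE--WHITE--WHITE--MANGO555CHERRY--ORANGE--BLACK--BLACK'

# first_mango_before COLOR (hypothetical helper): capture the digits after
# the leftmost MANGO that has COLOR somewhere later on the line.
first_mango_before() {
    printf '%s\n' "$line" | sed -nr "s/MANGO([0-9]+).*$1.*/\1/p"
}

first_mango_before WHITE   # prints 444
first_mango_before BLACK   # also prints 444, not 555
```

Note that without -n and p, sed prints a line unchanged whenever the s command fails to match, which is one way to end up with the whole line as output.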

I think that addressing is going to be helpful here. For example, to find lines containing "ORANGE" and then replace "RED" with "BLUE", you can do this:
Code:
sed '/ORANGE/s/RED/BLUE/' filename
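
A quick way to try the addressed form, using two made-up input lines (the function name recolor is only for illustration). Only the line that contains ORANGE is edited; the other passes through untouched.

```shell
# recolor (hypothetical helper): apply the substitution only on lines
# that match the address /ORANGE/.
recolor() {
    sed '/ORANGE/s/RED/BLUE/'
}

printf '%s\n' 'ORANGE RED' 'APPLE RED' | recolor
# prints:
# ORANGE BLUE
# APPLE RED
```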
 
Old 10-15-2012, 05:14 AM   #3
buntuser
LQ Newbie
 
Registered: Apr 2012
Posts: 8

Original Poster
Rep: Reputation: Disabled
pixellany - the line in the example is just for demonstrating my problem. Actually, the line is more like:
MANGO444CHERRYab c% f12 3#$tORANGE<55> &y#g2Tzz)-WHITE etc..
This is why .*? is there.
Regarding MANGO([0-9]+).* - yes, I want to get the first instance only, not all of them, but it has to match two criteria.
Trying to describe the command in plain English:
match the first number, preceded by MANGO, and followed by A (AND) B
So, looking at our line of text: MANGO444CHERRY--ORANGE--WHITE--WHITE--MANGO555CHERRY--ORANGE--BLACK--BLACK
assuming A=ORANGE B=WHITE the expected output would be 444
assuming A=ORANGE B=BLACK the expected output would be 555
In case the command finds no match, no output will occur (null). For example:
assuming A=APPLE B=WHITE the expected output would be null
thanks!
 
Old 10-15-2012, 05:25 AM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 738
Please post a snippet of the actual file---as I said earlier, if MANGO444 and MANGO555 are on the same line, it makes the problem harder. I'm assuming from your example that the logic is:
If ORANGE and WHITE, then get 1st instance of MANGO[0-9]+, and extract just the number
If ORANGE and BLACK, then get 2nd instance of MANGO[0-9]+, and extract just the number
If not ORANGE, then return nothing

Finally, your sample line of text does not explain why you are using ".*?"

Last edited by pixellany; 10-15-2012 at 05:36 AM.
 
Old 10-15-2012, 04:48 PM   #5
buntuser
LQ Newbie
 
Registered: Apr 2012
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pixellany View Post
Please post a snippet of the actual file---as I said earlier, if MANGO444 and MANGO555 are on the same line, it makes the problem harder. I'm assuming from your example that the logic is:
If ORANGE and WHITE, then get 1st instance of MANGO[0-9]+, (followed by MANGO AND WHITE) and extract just the number --> TRUE
If ORANGE and BLACK, then get 2nd instance of MANGO[0-9]+, (followed by MANGO AND WHITE) and extract just the number --> TRUE
If not ORANGE, then return nothing --> TRUE

Finally, your sample line of text does not explain why you are using ".*?"
my remarks above
--> reviewing your first answer, I did not aim for
Quote:
a group of any number of characters which either does or does not appear
but rather - "a group of any number or kind of characters"

Last edited by buntuser; 10-15-2012 at 04:50 PM.
 
Old 10-15-2012, 08:01 PM   #6
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 738
Quote:
Originally Posted by buntuser View Post
my remarks above
--> reviewing your first answer, I did not aim for but rather - "a group of any number or kind of characters"
.* = any number of characters, including spaces (anything printable)
.? = a character is optional
.+ = at least one character

? means that the preceding regex is optional, so one might assume that ".*?" means "any number of characters, or not". But "any number of characters" already includes NO characters, so I still think ".*?" is redundant. Can you give an example where ".*?" and ".*" would give a different result?
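
A small check of that point, assuming GNU sed: there ".*?" is not a lazy (shortest-match) quantifier as it is in Perl-style regexes, and it matches the same text as ".*". Other sed implementations may treat "*?" differently, so this is only a sketch for GNU sed. The function names are made up for illustration.

```shell
# Assumes GNU sed. Both helpers (hypothetical names) run on the same input.
with_star()          { printf '%s\n' 'aXbXc' | sed -r 's/a.*X/-/'; }
with_star_question() { printf '%s\n' 'aXbXc' | sed -r 's/a.*?X/-/'; }

with_star            # prints -c  (greedy: matched "aXbX")
with_star_question   # prints -c  as well under GNU sed
```

A lazy match would have stopped at the first X and printed -bXc instead.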

With your confirmation of the main logic, I'll think about what the code should be.
 
Old 10-16-2012, 05:00 AM   #7
buntuser
LQ Newbie
 
Registered: Apr 2012
Posts: 8

Original Poster
Rep: Reputation: Disabled
It should be just *
 
Old 10-16-2012, 08:07 AM   #8
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 738
Here is something that I think matches the logic discussed earlier. First is a dump of a file I created, followed by a sed command. The logic is:
If white and blue, then print the number after the 1st "red".
If white and green, then print the number after the 2nd "red":
Code:
[mherring@herring_desk play]$ more logic
red1 white blue red2
red1 white green red2
red1 black blue red2
red1 black green red2
red1 white blue red2
[mherring@herring_desk play]$ sed -n  '/white/{/blue/s/red/XXX/1;/green/s/red/XXX/2};s/.*XXX\([0-9]*\).*/\1/p' logic
1
2
1
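
The trailing 1 and 2 in the command above are occurrence numbers for the s command: they tell sed to replace only the Nth match on the line. A minimal sketch of just that piece, on one line taken from the sample file (mark_nth is a hypothetical helper name):

```shell
# mark_nth N (hypothetical helper): replace only the Nth "red" on each line.
mark_nth() {
    sed "s/red/XXX/$1"
}

printf '%s\n' 'red1 white green red2' | mark_nth 1   # prints XXX1 white green red2
printf '%s\n' 'red1 white green red2' | mark_nth 2   # prints red1 white green XXX2
```

Once the wanted occurrence carries the XXX marker, the final substitution in the command only has to pull out the digits that follow XXX.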
 
Old 10-16-2012, 09:14 AM   #9
buntuser
LQ Newbie
 
Registered: Apr 2012
Posts: 8

Original Poster
Rep: Reputation: Disabled
thank you pixellany for your help.
Code:
 sed -nr 's/.*MANGO([0-9]+).*APPLE.*WHITE.*/\1/p'
also did the trick for me... thanks again!
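
One thing to watch with this form, sketched below under the assumption of GNU sed and a made-up line in which both MANGO entries are followed by ORANGE and WHITE: the leading .* is greedy, so the capture comes from the last MANGO that still satisfies the rest of the pattern, not the first. The function names are hypothetical.

```shell
# Made-up line: both entries qualify for ORANGE ... WHITE.
line='MANGO444CHERRY--ORANGE--WHITE--MANGO555CHERRY--ORANGE--WHITE'

# With a leading .* the greedy prefix swallows the first entry.
with_leading_star() {
    printf '%s\n' "$line" | sed -nr 's/.*MANGO([0-9]+).*ORANGE.*WHITE.*/\1/p'
}

# Without it, the match starts at the leftmost MANGO.
without_leading_star() {
    printf '%s\n' "$line" | sed -nr 's/MANGO([0-9]+).*ORANGE.*WHITE.*/\1/p'
}

with_leading_star      # prints 555
without_leading_star   # prints 444
```

Dropping the leading .* only gives a clean result when nothing precedes the first MANGO, because any text before the match is left in place by the substitution.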
 
Old 10-16-2012, 08:02 PM   #10
buntuser
LQ Newbie
 
Registered: Apr 2012
Posts: 8

Original Poster
Rep: Reputation: Disabled
hi again
It appears the sed command works on the sample document, but not on the real one.
i tried both
Code:
 sed -n  '/hebrew/{/DESPiTE/s/downloadsubtitle.php?id=/XXX/1};s/.*XXX\([0-9]*\).*/\1/p'
which returns 228344 instead of 228338 (I used only part of your suggestion, because I need only the first match).
and:
Code:
 sed -nr 's/.*downloadsubtitle.php\?id\=([0-9]+).*hebrew.*DESPiTE.*/\1/p'
which returns 228343 instead of 228338.
The expected result is 228338 because it's the first number preceded by "downloadsubtitle.php?id=" and followed by "hebrew" and "DESPiTE".
I pasted the actual doc here: http://pastebin.com/unAifctF because it seemed too long to put here on the forum.
Hope this makes sense...
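
One possible way to get "the first entry that satisfies both conditions", which is not taken from the replies above: split the text so that every entry starts on its own line, then test each entry separately and stop at the first hit. This is only a sketch on the MANGO sample line, since the real document is not reproduced here. It assumes GNU sed (for \n in the replacement and for -r), that every entry begins with the marker, and that the two keywords belong to an entry when they appear before the next marker. The name first_id is made up; for the real document the marker would have to be swapped for the actual one.

```shell
line='MANGO444CHERRY--ORANGE--WHITE--WHITE--MANGO555CHERRY--ORANGE--BLACK--BLACK'

# first_id A B (hypothetical helper): read text on stdin and print the number
# after the first MANGO whose own segment contains A followed by B.
first_id() {
    sed 's/MANGO/\n&/g' |                              # one entry per line
        sed -nr "s/^MANGO([0-9]+).*$1.*$2.*/\1/p" |    # keep qualifying ids
        head -n 1                                      # first hit only
}

printf '%s\n' "$line" | first_id ORANGE WHITE   # prints 444
printf '%s\n' "$line" | first_id ORANGE BLACK   # prints 555
printf '%s\n' "$line" | first_id APPLE WHITE    # prints nothing
```

Because each entry is tested on its own line, a greedy .* can no longer reach across into a later entry, which is what makes the BLACK case return 555 here.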
 
  

