LinuxQuestions.org
Old 11-09-2015, 07:01 AM   #1
zinon75
LQ Newbie
 
Registered: Mar 2011
Location: europe
Distribution: debian, ubuntu, mint, redhat, ttylinux
Posts: 14

Rep: Reputation: 0
Regular expression question in python...


Code:
#!/usr/bin/python

import re

abc = """
random text
random text
random text
flashvars="flvsource=http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv&preview_image=http://a.jpg&backgcolor=FFFFFF&autoplay=true&url_logo=http://logo-player.png&logo=top_right&floating_navbar=false&color_nav_bar_top=0x478dc2&color_nav_bar_bottom=0xE7EBEC&ads_background_color=0x00CCFF&ads_border_color=0xCCCCCC&scrubber_position_color=0x6AA1CE&scrubber_load_color=0x888888&scrubber_background_color=0xBBBBBB&volume_bar_color=0xBBBBBB&aspect_ratio=stretch"></embed>' onClick="javascript:document.code.embed.focus();document.code.embed.select(); return false;" />
random text
random text
dsfgdsgdgdgdgdgdfgdgdfretrtsdgdf
"""

flvsource = re.findall('http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv', abc)
flvsource2 = re.findall('http://zzz.tvstation.tv/uploads/*.flv', abc)

print flvsource
print flvsource2
I need to extract this kind of string:
"http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv"

with something similar to this:
"http://zzz.tvstation.tv/uploads/*.flv"

How can I do this?

Last edited by zinon75; 11-09-2015 at 07:08 AM.
 
Old 11-09-2015, 08:00 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
"http://zzz.tvstation.tv/uploads/*.flv" :- looking at this, I would imagine it is a glob and not a regex. Read as a regex, it currently says:

find the string "http://zzz.tvstation.tv/uploads" followed by zero or more slashes (/*) followed by any character (.) and then the string "flv"

As you can see, this would never match your previous string. If you want to practice your regex skills, may I suggest the following site:

http://www.rubular.com/

This should help you to build the necessary matching pattern
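
To see that reading in action, here is a minimal Python 3 sketch. The `text` value is a shortened stand-in for the flashvars line in the first post, and the "Xflv" string is made up purely to show what the pattern does accept.

```python
import re

# Shortened stand-in for the flashvars line in the first post.
text = ('flvsource=http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv'
        '&preview_image=http://a.jpg')

# Read as a regex: "uploads", zero or more "/", any ONE character, then "flv".
glob_like = 'http://zzz.tvstation.tv/uploads/*.flv'

no_match = re.findall(glob_like, text)
print(no_match)   # → []

# A made-up string that does fit that reading: "/" + one character + "flv".
odd_match = re.findall(glob_like, 'http://zzz.tvstation.tv/uploads/Xflv')
print(odd_match)  # → ['http://zzz.tvstation.tv/uploads/Xflv']
```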
 
1 members found this post helpful.
Old 11-09-2015, 01:19 PM   #3
zinon75
LQ Newbie
 
Registered: Mar 2011
Location: europe
Distribution: debian, ubuntu, mint, redhat, ttylinux
Posts: 14

Original Poster
Rep: Reputation: 0
grail,

I got this working:
"http:\/\/zzz.tvstation.tv\/uploads\/.....................flv"

20 "." dots for the letters and numbers, and one last dot for the "." in ".flv".

I can't figure out how to do the "*.flv" without the dots.
 
Old 11-09-2015, 02:06 PM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Did you go to the site I recommended? If so, check out the quick reference at the bottom and specifically the last column on the right.

Here is a gotcha line that can bugger up your regex, and it may be something you have to be careful of:
Code:
flashvars="flvsource=http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv&preview_image=http://a.jpg&backgcolor=FFFFFF&autoplay=true&url_logo=http://logo-player.png_flv&logo=top_right&floating_navbar=false&color_nav_bar_top=0x478dc2&color_nav_bar_bottom=0xE7EBEC&ads_background_color=0x00CCFF&ads_border_color=0xCCCCCC&scrubber_position_color=0x6AA1CE&scrubber_load_color=0x888888&scrubber_background_color=0xBBBBBB&volume_bar_color=0xBBBBBB&aspect_ratio=stretch"></embed>' onClick="javascript:document.code.embed.focus();document.code.embed.select(); return false;" />
 
1 members found this post helpful.
Old 11-10-2015, 02:51 AM   #5
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Quote:
Originally Posted by zinon75 View Post
I can't figure out how to do the "*.flv" without the dots.
It's ok buddy, you are actually really close. Try to invert the sequence. Instead of *. try .*

Since you've been working hard at this, I'll show you one possible approach. With this file (flv.txt):
Code:
http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv&preview_image=http://a.jpg&backgcolor=FFFFFF&autoplay=true&url_logo=http://logo-player.png&logo=top_right&floating_navbar=false&color_nav_bar_top=0x478dc2&color_nav_bar_bottom=0xE7EBEC&ads_background_color=0x00CCFF&ads_border_color=0xCCCCCC&scrubber_position_color=0x6AA1CE&scrubber_load_color=0x888888&scrubber_background_color=0xBBBBBB&volume_bar_color=0xBBBBBB&aspect_ratio=stretch"></embed>' onClick="javascript:document.code.embed.focus();document.code.embed.select(); return false;
I get the result you want with this regex (in bold):
Code:
grep -o 'http.*flv' flv.txt 
http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv
Check out grail's link. Regular expressions can be a pain in the b.tt, but they are very useful!

Best regards,
HMW
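
Since the original question was about Python, here is a rough Python 3 equivalent of the grep command above, as a sketch only. The `line` value is a shortened copy of the flv.txt content, and `re.findall` is used because that is what the script in the first post calls.

```python
import re

# Shortened copy of the line stored in flv.txt above.
line = ('http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv'
        '&preview_image=http://a.jpg&backgcolor=FFFFFF&autoplay=true'
        '&url_logo=http://logo-player.png&logo=top_right')

# Same pattern as in the grep -o call: "http", any run of characters, then "flv".
matches = re.findall(r'http.*flv', line)
print(matches)  # → ['http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv']
```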
 
1 members found this post helpful.
Old 11-10-2015, 02:53 AM   #6
zinon75
LQ Newbie
 
Registered: Mar 2011
Location: europe
Distribution: debian, ubuntu, mint, redhat, ttylinux
Posts: 14

Original Poster
Rep: Reputation: 0

grail,

"http:\/\/zzz.tvstation.tv\/uploads\/.{1,20}.flv"

OK, this is working right.

"check out the quick reference at the bottom and specifically the last column on the right."
That helped me out.
 
Old 11-10-2015, 03:04 AM   #7
zinon75
LQ Newbie
 
Registered: Mar 2011
Location: europe
Distribution: debian, ubuntu, mint, redhat, ttylinux
Posts: 14

Original Poster
Rep: Reputation: 0
HMW,

that works too.

I use the "?" and "*" in bash all the time,
but I'm not familiar with regular expressions
in Python or in Ruby.
 
Old 11-10-2015, 04:10 AM   #8
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Quote:
Originally Posted by zinon75 View Post
HMW,

that works too.

I use the "?" and "*" in bash all the time,
but I'm not familiar with regular expressions
in Python or in Ruby.
They are pretty much the same.
.* means (roughly) "match any character, repeated any number of times (including zero)", whereas
.{1,20} means "match any character, repeated between 1 and 20 times".

So, as you have noticed, they both work for you in this particular case.

Best regards,
HMW
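
The difference between the two quantifiers can be checked with a small Python 3 sketch. It tests the file name from the thread on its own rather than the whole URL, and the bare ".flv" string is a made-up edge case.

```python
import re

name = 'abcd345mngjrUeoKSDfd.flv'   # file name from the thread
stem = name[:-len('.flv')]
print(len(stem))                     # → 20

# Both quantifiers accept a 20-character stem.
any_length = re.fullmatch(r'.*\.flv', name)
up_to_20 = re.fullmatch(r'.{1,20}\.flv', name)
print(any_length is not None, up_to_20 is not None)  # → True True

# ".*" also accepts zero characters, ".{1,20}" needs at least one.
empty_any = re.fullmatch(r'.*\.flv', '.flv')
empty_bounded = re.fullmatch(r'.{1,20}\.flv', '.flv')
print(empty_any is not None, empty_bounded is not None)  # → True False
```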
 
Old 11-10-2015, 10:23 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
I would also point out that had you used HMW's suggested solution on the extra line I provided, this is where you might get tripped up. However, part of writing regular expressions also relies on how well you know the data, e.g. will you never have a file name longer than 20 characters?
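
A short Python 3 sketch of the length question. The longer file name is hypothetical, both sample lines are shortened, and the character-class pattern is only one possible alternative, assuming file names never contain "/", "&" or a double quote.

```python
import re

# Pattern from post #6, copied as written there.
bounded = r'http:\/\/zzz.tvstation.tv\/uploads\/.{1,20}.flv'

# Assumed alternative: any run of characters that are not "/", "&" or '"'.
open_ended = r'http://zzz\.tvstation\.tv/uploads/[^/&"]+\.flv'

sample_short = ('flvsource=http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv'
                '&preview_image=http://a.jpg')
# Hypothetical file name that is longer than 20 characters.
sample_long = ('flvsource=http://zzz.tvstation.tv/uploads/a_much_longer_file_name_2015.flv'
               '&preview_image=http://a.jpg')

short_hits = re.findall(bounded, sample_short)
long_hits = re.findall(bounded, sample_long)
open_hits = re.findall(open_ended, sample_long)

print(short_hits)  # → ['http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv']
print(long_hits)   # → []
print(open_hits)   # → ['http://zzz.tvstation.tv/uploads/a_much_longer_file_name_2015.flv']
```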
 
Old 11-10-2015, 10:54 AM   #10
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Quote:
Originally Posted by grail View Post
I would also point out that had you used HMW's suggested solution on the extra line I provided, this is where you might get tripped up
^True. In that case, you'd have to use this regex:
Code:
grep -o 'http.*\.flv' flv.txt 
http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv
Best regards,
HMW
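
The effect of escaping the dot can be reproduced in Python 3 along these lines. The `gotcha` value is a shortened copy of the line from post #4, keeping the "png_flv" part that causes the trouble.

```python
import re

# Shortened copy of grail's gotcha line; note "png_flv" near the end.
gotcha = ('flvsource=http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv'
          '&preview_image=http://a.jpg'
          '&url_logo=http://logo-player.png_flv&logo=top_right')

# Unescaped: the greedy ".*" runs on to the LAST "flv", which is in "png_flv".
plain = re.search(r'http.*flv', gotcha).group(0)
print(plain.endswith('png_flv'))  # → True

# Escaped dot: the match has to end in a literal ".flv".
escaped = re.search(r'http.*\.flv', gotcha).group(0)
print(escaped)  # → http://zzz.tvstation.tv/uploads/abcd345mngjrUeoKSDfd.flv
```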
 
Old 11-10-2015, 11:31 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Hmmm ... yes, in that case it would work. I am not familiar with what variations there might be in the data, though, e.g. there could be another '.flv' file name further along the string.
My point was to show that greediness can sometimes be a limiting factor.
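
To illustrate the greediness point, here is a Python 3 sketch with a made-up line that contains two '.flv' URLs. The file names "first.flv" and "second.flv" are hypothetical. The non-greedy quantifier ".*?" is one common way to stop at the nearest ".flv"; it is not something suggested earlier in the thread.

```python
import re

# Hypothetical line with two .flv URLs.
two = ('flvsource=http://zzz.tvstation.tv/uploads/first.flv'
       '&url_logo=http://zzz.tvstation.tv/uploads/second.flv&logo=top_right')

# Greedy: one match stretching from the first "http" to the last ".flv".
greedy = re.findall(r'http.*\.flv', two)
print(len(greedy))  # → 1

# Non-greedy: each match stops at the nearest ".flv".
lazy = re.findall(r'http.*?\.flv', two)
print(lazy)
# → ['http://zzz.tvstation.tv/uploads/first.flv', 'http://zzz.tvstation.tv/uploads/second.flv']
```

Note that a non-greedy match still begins at the first "http" it finds, so an earlier non-flv URL on the same line would end up at the front of the match.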
 
  



Tags
python, regex


