LinuxQuestions.org
Old 03-06-2013, 03:14 PM   #1
metallica1973
Senior Member
 
Registered: Feb 2003
Location: Washington D.C
Posts: 2,190

Rep: Reputation: 60
Python Newbie Question Regex


I started teaching myself Python and am stuck trying to understand why I am not getting the output that I want. Long story short,
I am using pdb for debugging, and here is the function that is giving me trouble:
Code:
import re
...
...
...

def find_all_flvs(url):
    soup = BeautifulSoup(urllib2.urlopen(url))
    flvs = []
    for link in soup.findAll(onclick=re.compile("doShowCHys=1*")):
        link = str(link)
        vidnum   = re.search("\d{5,6}.*&amp", link)
        vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum

        for hashval_url in BeautifulSoup(urllib2.urlopen(vidurl)).findAll("flv"):

            flvs.append(hashval_url.text)

    return flvs
I verified that my regex (\d{5,6}.*&amp) is correct. This input:
Code:
"/home/Player.aspx?lpk4=108148&playChapter=True\',960,540,94343);return false;"
produces:
Code:
108148
which is what I want. Stepping through with pdb, I get to:
Code:
vidnum   = re.search("\d{5,6}.*&amp", link)
and this is what I end up with as the output:
Code:
<_sre.SRE_Match object at 0xaaf8de8>
when I should be seeing:
Code:
108148
so it can be simply appended to:
Code:
vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum
producing:
Code:
(pdb)p vidurl
I have been through several URLs and cannot seem to figure out what I am doing wrong:

http://www.tutorialspoint.com/python...xpressions.htm

??

Last edited by metallica1973; 03-06-2013 at 04:43 PM.
 
Old 03-06-2013, 03:36 PM   #2
metallica1973
Senior Member
 
Registered: Feb 2003
Location: Washington D.C
Posts: 2,190

Original Poster
Rep: Reputation: 60
I made progress. The things you can find out by just reading:
PHP Code:
re.search(pattern, string, flags=0)

    Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

and

re.findall(pattern, string, flags=0)

    Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
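To make the difference between the two return types concrete, here is a minimal sketch. The sample string is an assumption modelled on the onclick value from the first post, with "&" written as "&amp;" the way it would likely appear once the tag is converted with str().

```python
import re

# Hypothetical sample text; not taken from the real page.
link = "/home/Player.aspx?lpk4=108148&amp;playChapter=True"
pattern = r"\d{5,6}.*&amp"

match = re.search(pattern, link)    # a match object, or None if nothing matched
found = re.findall(pattern, link)   # always a list of strings (possibly empty)

print(type(found).__name__)   # list
print(found)                  # ['108148&amp']
if match is not None:
    print(match.group(0))     # 108148&amp
```

So neither function hands back a bare string: search gives an object you have to ask for the text, and findall gives a list you have to index.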
I was simply using the wrong function. I replaced re.search with re.findall and it worked partially.
Code:
vidnum   = re.findall("\d{5,6}.*&amp", link)
(pdb)p vidnum
['108148&amp']
(pdb)p vidurl
http://www.blahblah.com/home/GetPlay...px?lpk4=108148['108148&amp']
How do I remove the brackets and single quotes to produce only:
Code:
http://www.blahblah.com/home/GetPlay...px?lpk4=108148&amp
??

Last edited by metallica1973; 03-06-2013 at 03:44 PM.
 
Old 03-06-2013, 03:44 PM   #3
evo2
LQ Guru
 
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,724

Rep: Reputation: 1705
Hi,
Quote:
Originally Posted by metallica1973 View Post
How do I remove the brackets and single quotes to produce only:
Code:
108148&amp
??
That is just because vidnum is a list (of length 1). So, what you actually want is:
Code:
vidnum[0]
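A small sketch of how that indexing could be plugged into the URL formatting. The sample link string is hypothetical, and the emptiness check is an extra precaution that was not part of the original code.

```python
import re

# Hypothetical sample text modelled on the first post.
link = "/home/Player.aspx?lpk4=108148&amp;playChapter=True"
vidnum = re.findall(r"\d{5,6}.*&amp", link)

# findall returns an empty list when nothing matches, so check before indexing.
if vidnum:
    vidurl = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum[0]
else:
    vidurl = None

print(vidurl)
```

Formatting the list itself with %s is what puts the brackets and quotes into the URL; formatting its first element does not.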
Evo2.
 
Old 03-06-2013, 03:49 PM   #4
metallica1973
Senior Member
 
Registered: Feb 2003
Location: Washington D.C
Posts: 2,190

Original Poster
Rep: Reputation: 60
Are you kidding me! Simply brilliant. Many thanks.
 
Old 03-06-2013, 05:34 PM   #5
mina86
Member
 
Registered: Aug 2008
Distribution: Debian
Posts: 517

Rep: Reputation: 229
You can use search just fine. It returns a match object (or None if nothing matches), and you can get the matched string by calling its group() method.
Code:
def find_all_flvs(url):
    soup = BeautifulSoup(urllib2.urlopen(url))
    flvs = []
    # Compile once and keep the bound search method for reuse in the loop.
    numSearch = re.compile(r'\d{5,6}.*&amp;').search
    for link in soup.findAll(onclick=re.compile('doShowCHys=1*')):
        m = numSearch(str(link))
        if m:  # search returns None when there is no match
            # m.group(0) is the whole matched text
            vidurl = 'http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=' + m.group(0)
            for hashval_url in BeautifulSoup(urllib2.urlopen(vidurl)).findAll("flv"):
                flvs.append(hashval_url.text)
    return flvs
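If only the digits are wanted, one option is a capturing group. This is a sketch rather than a drop-in change: the sample string is hypothetical, and the parenthesised pattern is a variation that does not appear in the code above.

```python
import re

# Hypothetical input, modelled on the onclick value quoted in the first post.
link = "/home/Player.aspx?lpk4=108148&amp;playChapter=True"

# The parentheses capture only the digits; "&amp" must follow but is not captured.
m = re.search(r"(\d{5,6})&amp", link)
if m:
    whole = m.group(0)    # entire match
    number = m.group(1)   # text captured by the first group
    print(whole)          # 108148&amp
    print(number)         # 108148
```

group(0) is always the full match, while group(1) and up refer to the parenthesised parts, counted from the left.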
Also note that you should use r'…' (or r"…") when you write regexes. In r'…' (and r"…") a backslash has no special meaning to Python, so it is passed through to the regex engine unchanged. Compare the two:
Code:
foo_re = re.compile(r'^[A-Z]:\\.*\\')
bar_re = re.compile('^[A-Z]:\\\\.*\\\\')
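A quick way to convince yourself that the two spellings are the same pattern is to compare the strings directly. The path below is a made-up example, used only to exercise the regex.

```python
import re

raw_pattern = r'^[A-Z]:\\.*\\'        # raw string: backslashes are kept as typed
plain_pattern = '^[A-Z]:\\\\.*\\\\'   # plain string: every backslash is doubled

# Both literals produce the same pattern text.
print(raw_pattern == plain_pattern)   # True

# Hypothetical Windows-style path.
path = 'C:\\Users\\demo\\file.txt'
m = re.match(raw_pattern, path)
matched = m.group(0)
print(matched == 'C:\\Users\\demo\\')  # True
```

The raw form needs half as many backslashes, which is why it is easier to read and harder to get wrong.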
Read more at http://docs.python.org/2/library/re.html
 
  

