LinuxQuestions.org - Python Combining Two Commands


I have been digging deeper into Python and want to make my code as efficient as possible. The fewer lines of code, the better, so I have been experimenting and wanted to ask the Python gurus whether this is possible. So:

Code:

...

...

In [109]: kbfileurl = re.search('<p>For more information about this update.*</p>', tbull.text.encode('utf8'))

In [110]: kbfileurl.group()

Out[110]: '<p>For more information about this update, see <a href="https://support.microsoft.com/kb/3020393">Microsoft Knowledge Base Article 3020393</a>.</p>'
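The pattern in In [109] ends in a greedy `.*`, which matches as much as it can before backing off to the last `</p>`. That is harmless while the paragraph sits alone on its line, but if the page puts more markup on the same line the match overruns. Below is a small Python 3 sketch contrasting greedy `.*` with non-greedy `.*?`; the input string, including the extra `<p>Other text.</p>`, is made up for illustration.

Code:

```python
import re

# Hypothetical input: two paragraphs on one line, so a greedy .* has room to overrun.
html = ('<p>For more information about this update, see '
        '<a href="https://support.microsoft.com/kb/3020393">'
        'Microsoft Knowledge Base Article 3020393</a>.</p><p>Other text.</p>')

# Greedy: .* runs to the LAST '</p>' on the line, swallowing the second paragraph.
greedy = re.search(r'<p>For more information about this update.*</p>', html).group()

# Non-greedy: .*? stops at the FIRST '</p>' after the prefix.
lazy = re.search(r'<p>For more information about this update.*?</p>', html).group()

print(greedy.endswith('<p>Other text.</p>'))  # True
print(lazy.endswith('3020393</a>.</p>'))      # True
```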

So, based on the string I parsed out of the HTML page, I would like to pull out only the URL, in a one-liner:
https://support.microsoft.com/kb/3020393
So is it possible to combine kbfileurl.group() with re.compile():

Code:

kbfileurl.group().encode('ascii')re.compile(r'\bhttps://support.microsoft.com/kb/\d+\b')

to parse out:
https://support.microsoft.com/kb/3020393
??
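For what it's worth, a compiled pattern has its own `.search()` method, so the compile, search and `.group()` calls can be chained into one expression. A minimal Python 3 sketch, assuming the page text is a `str`; the `page_text` name is hypothetical and its value is simply the paragraph from Out[110]. The dots are escaped so they match a literal `.`.

Code:

```python
import re

# Hypothetical stand-in for the page text: the paragraph shown in Out[110].
page_text = ('<p>For more information about this update, see '
             '<a href="https://support.microsoft.com/kb/3020393">'
             'Microsoft Knowledge Base Article 3020393</a>.</p>')

# First search, as in the question: a Match object for the paragraph.
kbfileurl = re.search(r'<p>For more information about this update.*</p>', page_text)

# Compile, search the matched paragraph, and take the matched text, all in one expression.
kburl = re.compile(r'\bhttps://support\.microsoft\.com/kb/\d+\b').search(kbfileurl.group()).group()

print(kburl)  # https://support.microsoft.com/kb/3020393
```

Compiling only pays off when the pattern is reused; for a single call, `re.search(pattern, text)` does the same thing.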

After playing around with it, I made a small modification and came up with this, though it is not exactly a one-liner:

Code:

In [201]: kbfileurl = re.search('<p>For more information about this update.*</p>', tbull.text.encode('utf8')).group()

In [202]: kbfileurl

Out[202]: '<p>For more information about this update, see <a href="https://support.microsoft.com/kb/3020393">Microsoft Knowledge Base Article 3020393</a>.</p>'

In [203]: kburl = re.search(r'\bhttps://support.microsoft.com/kb/\d+\b', kbfileurl).group(0)

In [204]: kburl

Out[204]: 'https://support.microsoft.com/kb/3020393'
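One caveat with chaining `.group()` straight onto `re.search(...)` as in In [201] and In [203]: `re.search` returns `None` when nothing matches, and `None.group()` raises `AttributeError`. Here is a sketch of the same two searches wrapped in a hypothetical helper, `extract_kb_url`, that returns `None` instead of crashing when a page lacks the paragraph or the link (Python 3, page text as `str`).

Code:

```python
import re

# The two patterns from In [201] and In [203], compiled once (dots escaped here).
KB_PARAGRAPH = re.compile(r'<p>For more information about this update.*</p>')
KB_URL = re.compile(r'\bhttps://support\.microsoft\.com/kb/\d+\b')


def extract_kb_url(page_text):
    """Hypothetical helper: return the KB article URL, or None if either search misses."""
    paragraph = KB_PARAGRAPH.search(page_text)
    if paragraph is None:
        return None
    url = KB_URL.search(paragraph.group())
    return url.group() if url else None


sample = ('<p>For more information about this update, see '
          '<a href="https://support.microsoft.com/kb/3020393">'
          'Microsoft Knowledge Base Article 3020393</a>.</p>')
print(extract_kb_url(sample))            # https://support.microsoft.com/kb/3020393
print(extract_kb_url('<p>no link</p>'))  # None
```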

In the real world you should not parse HTML with your own regexes, but there is nothing wrong with trying it for learning.
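As one example of the non-regex route, Python 3's standard library ships `html.parser.HTMLParser`, which calls `handle_starttag(tag, attrs)` for every opening tag. A minimal sketch that collects `href` values pointing at KB articles; the `KBLinkParser` class and `KB_PREFIX` constant are hypothetical names, and the input is the paragraph from Out[202].

Code:

```python
from html.parser import HTMLParser

# Hypothetical filter: keep only links to KB articles.
KB_PREFIX = 'https://support.microsoft.com/kb/'


class KBLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at KB articles."""

    def __init__(self):
        super().__init__()
        self.kb_links = []

    def handle_starttag(self, tag, attrs):
        # tag is lowercased by the parser; attrs is a list of (name, value) pairs.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value and value.startswith(KB_PREFIX):
                    self.kb_links.append(value)


parser = KBLinkParser()
parser.feed('<p>For more information about this update, see '
            '<a href="https://support.microsoft.com/kb/3020393">'
            'Microsoft Knowledge Base Article 3020393</a>.</p>')
parser.close()
print(parser.kb_links)  # ['https://support.microsoft.com/kb/3020393']
```

Unlike a regex, the parser does not care about attribute order, quoting style, or line breaks inside the tag.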

Anyway, you should be doing this with a single regex search. If this pattern is specific enough, it is all you need:

Code:

re.search(r'\bhttps://support.microsoft.com/kb/\d+\b', CONTENTS_OF_ENTIRE_FILE)

If that matches too much, or the wrong link, put more contextual information in the regex.
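To illustrate, here is a Python 3 sketch run against a made-up page containing two KB links (the 1111111 link is invented). Note that `re.search` only returns the first match, so "too many results" tends to show up as the wrong link, while `re.findall` returns every match. Adding the paragraph's wording as context and a capturing group `(...)` around the URL lets one search return just the URL via `.group(1)`.

Code:

```python
import re

# Hypothetical stand-in for CONTENTS_OF_ENTIRE_FILE: two KB links, only one in the target paragraph.
page_text = (
    '<p>See also <a href="https://support.microsoft.com/kb/1111111">KB1111111</a>.</p>\n'
    '<p>For more information about this update, see '
    '<a href="https://support.microsoft.com/kb/3020393">'
    'Microsoft Knowledge Base Article 3020393</a>.</p>\n'
)

kb_url = re.compile(r'\bhttps://support\.microsoft\.com/kb/\d+\b')

# Without context: search() returns the first link on the page, findall() every link.
print(kb_url.search(page_text).group())  # https://support.microsoft.com/kb/1111111
print(kb_url.findall(page_text))

# With context: anchor on the paragraph's wording and capture only the URL.
in_context = re.compile(
    r'For more information about this update.*?href="(https://support\.microsoft\.com/kb/\d+)"'
)
print(in_context.search(page_text).group(1))  # https://support.microsoft.com/kb/3020393
```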