Python BeautifulSoup Re Finding Digits Within Tags
I tried:
Code:
rpart = wphtml.find_all('tbody', limit=1, td=re.compile(r'\<td\>\d*.\d*.\d*.\<\/td\>'))
or
Code:
for tag in wphtml.find_all('tbody', limit=1, string=re.compile(r"\b\<td\>\d*.\d*.\d*.\<\/td\>\b")):
    print(tag.contents)
hoping to get:
4.2.2
4.2.1
etc..
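Two regex details matter for patterns like the ones above. First, in an ordinary (non-raw) Python string literal, `"\b"` is the backspace character, not a regex word boundary, so patterns should be written with the `r"..."` prefix. Second, an unescaped `.` matches any character and `\d*` may match nothing, so a pattern shaped like `\d*.\d*.\d*` also accepts text with no digits at all. The sketch below only demonstrates these two points with the standard `re` module; the variable names and test strings are made up for illustration.

```python
import re

# In an ordinary string literal "\b" is the backspace character (\x08),
# so this pattern looks for a backspace followed by "4".
plain = re.compile("\b4")
# With the r prefix the regex engine sees \b, i.e. a word boundary.
raw = re.compile(r"\b4")

# An unescaped "." matches any character and "\d*" may match nothing,
# so this loose pattern also accepts strings that contain no digits.
loose = re.compile(r"\d*.\d*.\d*")
# Escaped dots and "\d+" require something shaped like 4.2.2.
strict = re.compile(r"\d+\.\d+\.\d+")

if __name__ == "__main__":
    print(plain.search("v 4.2.2"))          # None
    print(raw.search("v 4.2.2").group())    # 4
    print(bool(loose.fullmatch("ab")))      # True
    print(bool(strict.fullmatch("ab")))     # False
    print(bool(strict.fullmatch("4.2.2")))  # True
```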
So what I am trying to do is:
1 - Search through the HTML page and capture only the first <tbody>...</tbody>, hence limit=1
2 - Run a regex over the result and print only the version numbers that are inside the <td>...</td> tags (e.g. <td>4.2.2</td>)
3 - Resulting in:
4.2.2
4.2.1
etc..
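One way to do these three steps with BeautifulSoup is sketched below; it is not the original poster's code. In `find_all`, an unrecognized keyword such as `td=` is treated as a filter on an HTML attribute named `td`, and `string=` is tested against the tag's own `.string`, which is `None` for a `<tbody>` with several children, so the `<td>` cells have to be searched for inside the first `<tbody>` instead. The function name `versions_in_first_tbody` and the `SAMPLE` markup are hypothetical; in practice `html` would be the downloaded page source, and the `beautifulsoup4` package must be installed.

```python
import re

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Matches a whole cell like "4.2.2": dots escaped, at least one digit per part.
VERSION_RE = re.compile(r"^\s*\d+\.\d+\.\d+\s*$")


def versions_in_first_tbody(html):
    """Return the text of every <td> in the first <tbody> that looks like x.y.z."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody")  # first <tbody> only, like find_all("tbody", limit=1)
    if tbody is None:
        return []
    # string= is matched against each <td>'s .string (its single text child);
    # cells containing nested tags have .string == None and are skipped.
    cells = tbody.find_all("td", string=VERSION_RE)
    return [td.get_text(strip=True) for td in cells]


# Hypothetical sample page standing in for the real HTML.
SAMPLE = """
<table>
  <tbody>
    <tr><td>4.2.2</td><td>stable</td></tr>
    <tr><td>4.2.1</td><td>older</td></tr>
  </tbody>
  <tbody>
    <tr><td>9.9.9</td></tr>
  </tbody>
</table>
"""

if __name__ == "__main__":
    for version in versions_in_first_tbody(SAMPLE):
        print(version)
```

Cells are selected by their text rather than by running the regex over raw `<td>` markup, so the pattern only has to describe the version number itself.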
Last edited by metallica1973; 07-20-2015 at 02:57 PM.
OK, me again. I tried my own idea on a rather familiar URL (http://www.linuxquestions.org/questions/) to see whether it works in practice with wget, and it does. Check this out:
Code:
wget -qO- http://www.linuxquestions.org/questions/ | sed -n '/<form.*>/,/<\/form>/p' | sed -n 's/.*<td.*>\(.*\)<\/td>.*$/\1/p'
User Name
Password
What I did here was pipe the output from wget (the source code of linuxquestions.org/questions/) to sed. Then I extracted the text between the <form> and </form> tags (note that a sed /start/,/end/ range prints every such block on the page, not just the first), and piped that into another sed that extracts and prints ONLY the text between the <td> tags, which is User Name and Password.
So if you simply change the address to yours, and the tags to the ones you are looking for, I see no reason why you shouldn't be able to get this working without BeautifulSoup.
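For comparison, the same "cut out a block, then pull out the <td> text" idea can be done in Python with only the standard `re` module. This is a minimal sketch, not the code from the post: the function name `versions_from_html` is hypothetical, and regex-based extraction assumes simple markup without nested tables. Like `limit=1`, and unlike a sed range, `re.search` stops at the first `<tbody>`.

```python
import re

# re.search returns only the first match, so this grabs the first <tbody> only.
FIRST_TBODY = re.compile(r"<tbody\b[^>]*>(.*?)</tbody>", re.IGNORECASE | re.DOTALL)
# Captures the text between an opening <td ...> and the next </td>.
TD_TEXT = re.compile(r"<td\b[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)
# A whole cell must look like x.y.z, with the dots escaped.
VERSION = re.compile(r"\d+\.\d+\.\d+")


def versions_from_html(html):
    """Return x.y.z strings found as whole <td> cells inside the first <tbody>."""
    match = FIRST_TBODY.search(html)
    if match is None:
        return []
    cells = (text.strip() for text in TD_TEXT.findall(match.group(1)))
    return [cell for cell in cells if VERSION.fullmatch(cell)]
```

To run it against a live page, the HTML can be fetched with `urllib.request.urlopen(url).read().decode("utf-8", "replace")` and passed to `versions_from_html`.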
Best regards,
HMW
PS. I like both Python and BeautifulSoup, but I do believe it's overkill for this operation. However, if you WANT to use those you should of course do that. DS.
Last edited by HMW; 07-21-2015 at 02:00 PM.
Reason: Spelling... again...