LinuxQuestions.org
Old 08-27-2014, 08:49 AM   #1
sayhello_to_the_world
Member
 
Registered: May 2013
Posts: 229

Rep: Reputation: Disabled
python peewee: minor changes to get the job done


I am fairly new to Python and want to store the results of a parsing job in a database. I have heard that peewee is very useful and handy for such tasks.

I want to use Python and peewee, and I think I have to do something like the following. After installing peewee correctly, I ran the script; here is what happened.


Code:
import urllib
import urlparse
import re
import peewee
import json

db = MySQLDatabase('cpan', user='root',passwd='rimbaud')

class User(Model):
    name = TextField()
    cname = TextField()
    email = TextField()
    url = TextField()

    class Meta:
        database = db # this model uses the cpan database


User.create_table() #ensure table is created


url = "http://search.cpan.org/author/?W"
html = urllib.urlopen(url).read()
for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
    alk = urlparse.urljoin(url, lk)

    data = { 'url':alk, 'name':name, 'cname':capname }

    phtml = urllib.urlopen(alk).read()
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    if memail:
        data['email'] = memail.group(1)


data = json.load() #your json data file here

for entry in data: #assuming your data is an array of JSON objects
    user = User.create(name=entry["name"], cname=entry["cname"],
        email=entry["email"], url=entry["url"])
    user.save()
I got back this error.


Code:
Traceback (most recent call last):
  File "cpan5.py", line 10, in <module>
    db = MySQLDatabase('cpan', user='root',passwd='rimbaud')
NameError: name 'MySQLDatabase' is not defined
linux-70ce:/home/martin/perl #
Assuming this is all right now, I have set up the following.
It fails at a certain point, though.

Code:
import urllib
import urlparse
import re
# import peewee
import json
from peewee import *



#from peewee import MySQLDatabase ('cpan', user='root',passwd='rimbaud') 


db = MySQLDatabase('cpan', user='root',passwd='rimbaud') 

class User(Model):
    name = TextField()
    cname = TextField()
    email = TextField()
    url = TextField()

    class Meta:
        database = db # this model uses the cpan database


User.create_table() #ensure table is created


url = "http://search.cpan.org/author/?W"
html = urllib.urlopen(url).read()
for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
    alk = urlparse.urljoin(url, lk)

    data = { 'url':alk, 'name':name, 'cname':capname }

    phtml = urllib.urlopen(alk).read()
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    if memail:
        data['email'] = memail.group(1)


data = json.load('emailyour json data file here

for entry in data: #assuming your data is an array of JSON objects
    user = User.create(name=entry["name"], cname=entry["cname"],
        email=entry["email"], url=entry["url"])
    user.save()
I guess that a data file must exist, one that has been created by the script during the parsing. Is this right?


Code:
martin@linux-70ce:~/perl> python cpan_100.py
Traceback (most recent call last):
  File "cpan_100.py", line 47, in <module>
    data = json.load('emailyour json data file here
  File "/usr/lib/python2.7/json/__init__.py", line 286, in load
    return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
martin@linux-70ce:~/perl>

Well, at the moment I do not know why I get so many errors.
I would be happy for any and all hints.

Love to hear from you.
 
Old 08-27-2014, 09:53 AM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,634

Rep: Reputation: 7965
Quote:
Originally Posted by sayhello_to_the_world View Post
I am fairly new to Python and want to store the results of a parsing job in a database. I have heard that peewee is very useful and handy for such tasks. I want to use Python and peewee, and I think I have to do something like the following. After installing peewee correctly, I ran the script; here is what happened.

I got back this error.
Code:
Traceback (most recent call last):
  File "cpan5.py", line 10, in <module>
    db = MySQLDatabase('cpan', user='root',passwd='rimbaud')
NameError: name 'MySQLDatabase' is not defined
linux-70ce:/home/martin/perl #
Assuming this is all right now, I have set up the following. It fails at a certain point, though.
Does it fail or not??? Why are you ASSUMING it works? Did you correct the error in the script or not?
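For reference, the NameError in the first traceback comes from how the name was imported: a plain `import peewee` binds only the module name, so the class has to be written as `peewee.MySQLDatabase`, or imported by name as the second script does with `from peewee import *`. The sketch below shows the same behaviour with the standard-library `json` module standing in for peewee (an assumption made only so that it runs without a database driver installed).

```python
# Stand-in demo: the standard-library json module plays the role of peewee.
import json

# "import json" binds only the module name, so the class must be qualified.
qualified = json.JSONDecoder()

# The unqualified name has not been imported yet, so this raises NameError,
# just like MySQLDatabase did in the first script.
try:
    JSONDecoder()
except NameError as exc:
    message = str(exc)

# Importing the name explicitly makes the unqualified form work.
from json import JSONDecoder

unqualified = JSONDecoder()
```

Applied to the script in question, that means either `peewee.MySQLDatabase(...)`, `peewee.Model` and `peewee.TextField`, or an explicit `from peewee import MySQLDatabase, Model, TextField`.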
Quote:
I guess that a data file must exist, one that has been created by the script during the parsing. Is this right?
..and why are you asking? Can you not see if a file is created? Did you LOOK for a file?
Quote:
Code:
martin@linux-70ce:~/perl> python cpan_100.py
Traceback (most recent call last):
  File "cpan_100.py", line 47, in <module>
    data = json.load('emailyour json data file here
  File "/usr/lib/python2.7/json/__init__.py", line 286, in load
    return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
martin@linux-70ce:~/perl>
Well, at the moment I do not know why I get so many errors.
Did you not read the error message?? It is VERY clear...you're trying to load a file of the name "emailyour json data file here", didn't close your parens, and have a TOTALLY different syntax from what you had in the first script. You've been posting this same script/error set for over two months in other forums, and have had other people point this out. Why is this not clear?

Either fix the file name/syntax, or it won't work.
 
Old 08-27-2014, 12:16 PM   #3
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,222

Rep: Reputation: 5320
Quote:
data = json.load('emailyour json data file here
Well, this line, which was clearly pointed out in the error message, is obviously a) wrong and b) the problem. What is it supposed to do, and why do you think that what you have there will work?

If it was supposed to be an "example", then you were supposed to change the quoted text in the example to a file-like object, not to leave it as a string (with no closing quote).
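A minimal sketch of the difference, assuming a hypothetical file name `authors.json` and made-up sample records: `json.load` reads from an object that has a `.read()` method (such as an open file), while `json.loads` parses a string that already contains JSON text.

```python
import json
import os
import tempfile

# Made-up sample records, written to a temporary file so the sketch runs anywhere.
records = [{"name": "Example Author", "cname": "EXAMPLE"}]
path = os.path.join(tempfile.mkdtemp(), "authors.json")  # hypothetical file name
with open(path, "w") as fh:
    json.dump(records, fh)

# json.load takes the open file object, not the file name.
with open(path) as fh:
    loaded = json.load(fh)

# json.loads takes a string holding JSON text.
parsed = json.loads('[{"name": "Example Author", "cname": "EXAMPLE"}]')

# Passing a plain string to json.load reproduces the error from the traceback.
try:
    json.load(path)
except AttributeError as exc:
    error = str(exc)
```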

Last edited by dugan; 08-27-2014 at 12:25 PM.
 
1 members found this post helpful.
Old 08-28-2014, 03:00 AM   #4
sayhello_to_the_world
Member
 
Registered: May 2013
Posts: 229

Original Poster
Rep: Reputation: Disabled
Hello TB0ne, hello dugan,

Many thanks for the replies and all the tips. I will do as advised and will name a file, e.g. like so

What is done here: we are passing a string to json.load. This call expects a "file-like" object.
We can call open on a file and use the returned handle.

Or what if we just use the results of the parser?

BTW: We are already populating the data object when parsing the HTML, so we can omit the data = json.load('email') line and simply access the data object directly in the for loop at the end as written.
The above-mentioned line was just added as an example, but it is clear that we get the data from the parsing job in the first place!

We also might want to do data = [] before the HTML parsing loop, and then we can do entry = { 'url':alk, 'name':name, 'cname':capname } and data.append(entry.copy()) within the loop.
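Roughly sketched, that idea could look like the following. The HTML snippet and the `author_pages` dictionary are made up and stand in for the pages the real script fetches over the network; only the two regular expressions are taken from the script above.

```python
import re

# Made-up HTML shaped like the markup the script's regex expects.
html = (
    '<a href="/~alice/"><b>ALICE</b></a><br/><small>Alice Example</small>'
    '<a href="/~bob/"><b>BOB</b></a><br/><small>Bob Example</small>'
)
# Hypothetical stand-in for the per-author pages fetched with urllib.
author_pages = {
    "/~alice/": '<a href="mailto:alice@example.org">mail</a>',
    "/~bob/": "<p>no address listed</p>",
}

pattern = r'<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>'

data = []  # one dict per author is collected here
for lk, capname, name in re.findall(pattern, html):
    # A fresh dict per iteration; "email" defaults to None so the key always exists.
    entry = {"url": lk, "name": name, "cname": capname, "email": None}
    memail = re.search(r'<a href="mailto:(.*?)">', author_pages[lk])
    if memail:
        entry["email"] = memail.group(1)
    data.append(entry)
```

Because a new dict is created on every pass, the `.copy()` is not strictly needed in this variant, and the default for "email" avoids a KeyError later for authors without a mailto link.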

What do you say?

Last edited by sayhello_to_the_world; 08-28-2014 at 08:28 AM.
 
Old 08-28-2014, 08:58 AM   #5
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,634

Rep: Reputation: 7965
Quote:
Originally Posted by sayhello_to_the_world View Post
Hello TB0ne, hello dugan,
Many thanks for the replies and all the tips. I will do as advised and will name a file, e.g. like so
These aren't 'tips'...this is reading the VERY CLEAR ERROR MESSAGE. It told you specifically what the problem was, and where.
Quote:
What is done here: we are passing a string to json.load. This call expects a "file-like" object. We can call open on a file and use the returned handle. Or what if we just use the results of the parser?

BTW: We are already populating the data object when parsing the HTML, so we can omit the data = json.load('email') line and simply access the data object directly in the for loop at the end as written. The above-mentioned line was just added as an example, but it is clear that we get the data from the parsing job in the first place!

We also might want to do data = [] before the HTML parsing loop, and then we can do entry = { 'url':alk, 'name':name, 'cname':capname } and data.append(entry.copy()) within the loop.

What do you say?
We say it's your program...write it however you'd like. This is much like your other threads, asking about Perl and other languages to do DB insertion/manipulation. Those programs were copied from other websites, and you just posted errors you got when trying to run them. If you're not going to actually learn how to write the code, then you really should hire someone to write it for you.
 
Old 09-02-2014, 04:22 PM   #6
sayhello_to_the_world
Member
 
Registered: May 2013
Posts: 229

Original Poster
Rep: Reputation: Disabled
hello dear TBone

tx for the hints and all your support!




We are already populating the data object when parsing the HTML. That said, I think we can omit the data = json.load('email') line and simply access the data object directly in the for loop at the end as written. We are getting the data from the parsing process.
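One caveat with reading the data object directly, shown here as a small sketch with made-up values: in the script, data is a single dict that is reassigned on every pass of the parsing loop, and iterating over a dict yields its keys, not records.

```python
# Made-up record; in the script, data ends up holding only the last author parsed.
data = {"url": "http://example.org/~alice/", "name": "Alice Example", "cname": "ALICE"}

# Iterating over a dict yields its keys, so each "entry" is a plain string.
entries = [entry for entry in data]

# Indexing a string with a string key is what the final loop would attempt.
try:
    entries[0]["name"]
except TypeError as exc:
    error = str(exc)
```

So even with the table in place, the final loop would likely stop with a TypeError unless the records are collected in a list first.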

So we can go like this:



Code:
import urllib
import urlparse
import re
# import peewee
import json
from peewee import *


#from peewee import MySQLDatabase ('cpan', user='root',passwd='rimbaud') 


db = MySQLDatabase('cpan', user='root',passwd='rimbaud') 

class User(Model):
    name = TextField()
    cname = TextField()
    email = TextField()
    url = TextField()

    class Meta:
        database = db # this model uses the cpan database

        
User.create_table() #ensure table is created


url = "http://search.cpan.org/author/?W"
html = urllib.urlopen(url).read()
for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
    alk = urlparse.urljoin(url, lk)

    data = { 'url':alk, 'name':name, 'cname':capname }

    phtml = urllib.urlopen(alk).read()
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    if memail:
        data['email'] = memail.group(1)


# data = json.load('email') #your json data file here

for entry in data: #assuming your data is an array of JSON objects
    user = User.create(name=entry["name"], cname=entry["cname"],
        email=entry["email"], url=entry["url"])
    user.save()
Note: I disabled this line in the code:

Code:
 # data = json.load('email') #your json data file here
Doing so, I went ahead and ran the code:


Code:
martin@linux-70ce:~/perl> python cpan_100.py
Traceback (most recent call last):
  File "cpan_100.py", line 27, in <module>
    User.create_table() #ensure table is created
  File "build/bdist.linux-i686/egg/peewee.py", line 3078, in create_table
  File "build/bdist.linux-i686/egg/peewee.py", line 2471, in create_table
  File "build/bdist.linux-i686/egg/peewee.py", line 2414, in execute_sql
  File "build/bdist.linux-i686/egg/peewee.py", line 2283, in __exit__
  File "build/bdist.linux-i686/egg/peewee.py", line 2406, in execute_sql
  File "/usr/lib/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
peewee.OperationalError: (1050, "Table 'user' already exists")
martin@linux-70ce:~/perl>
Well, it seems to be clear that I have some other issues now...
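Error 1050 says that the table was already created, presumably by an earlier run of the script, so a second unconditional CREATE TABLE is refused. The sketch below reproduces that behaviour with the standard-library `sqlite3` module and an in-memory database; this is a deliberate swap for peewee and MySQL so that it runs without a server, and the column list is only an approximation of the User model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
ddl = "CREATE TABLE user (name TEXT, cname TEXT, email TEXT, url TEXT)"
conn.execute(ddl)

# A second plain CREATE TABLE fails, much like the 1050 error above.
try:
    conn.execute(ddl)
except sqlite3.OperationalError as exc:
    error = str(exc)

# With IF NOT EXISTS the statement is a no-op when the table is already there.
conn.execute(
    "CREATE TABLE IF NOT EXISTS user (name TEXT, cname TEXT, email TEXT, url TEXT)"
)

# Count the tables named "user" to confirm nothing was duplicated.
table_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'user'"
).fetchone()[0]
```

As far as I know, peewee's create_table accepts a flag for the same purpose (fail_silently=True in the 2.x releases, safe=True in later ones), but that should be checked against the documentation of the installed version.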

Last edited by sayhello_to_the_world; 09-02-2014 at 04:31 PM.
 
Old 09-02-2014, 06:29 PM   #7
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,634

Rep: Reputation: 7965
Quote:
Originally Posted by sayhello_to_the_world View Post
hello dear TBone
tx for the hints and all your support!
And as you've been told MANY times, SPELL OUT YOUR WORDS, and don't use text speak.
Quote:
We are already populating the data object when parsing the HTML. That said, I think we can omit the data = json.load('email') line and simply access the data object directly in the for loop at the end as written. We are getting the data from the parsing process. So we can go like this:
Code:
import urllib
import urlparse
import re
# import peewee
import json
from peewee import *
#from peewee import MySQLDatabase ('cpan', user='root',passwd='rimbaud') 
db = MySQLDatabase('cpan', user='root',passwd='rimbaud') 

class User(Model):
    name = TextField()
    cname = TextField()
    email = TextField()
    url = TextField()

    class Meta:
        database = db # this model uses the cpan database
        
User.create_table() #ensure table is created

url = "http://search.cpan.org/author/?W"
html = urllib.urlopen(url).read()
for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
    alk = urlparse.urljoin(url, lk)
    data = { 'url':alk, 'name':name, 'cname':capname }
    phtml = urllib.urlopen(alk).read()
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    if memail:
        data['email'] = memail.group(1)

# data = json.load('email') #your json data file here
for entry in data: #assuming your data is an array of JSON objects
    user = User.create(name=entry["name"], cname=entry["cname"],
        email=entry["email"], url=entry["url"])
    user.save()
Note: I disabled this line in the code:
Code:
 # data = json.load('email') #your json data file here
Doing so, I went ahead and ran the code:
Code:
martin@linux-70ce:~/perl> python cpan_100.py
Traceback (most recent call last):
  File "cpan_100.py", line 27, in <module>
    User.create_table() #ensure table is created
  File "build/bdist.linux-i686/egg/peewee.py", line 3078, in create_table
  File "build/bdist.linux-i686/egg/peewee.py", line 2471, in create_table
  File "build/bdist.linux-i686/egg/peewee.py", line 2414, in execute_sql
  File "build/bdist.linux-i686/egg/peewee.py", line 2283, in __exit__
  File "build/bdist.linux-i686/egg/peewee.py", line 2406, in execute_sql
  File "/usr/lib/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
peewee.OperationalError: (1050, "Table 'user' already exists")
martin@linux-70ce:~/perl>
Well, it seems to be clear that I have some other issues now...
Right, because you aren't passing the right arguments to the functions, since you commented it out, rather than fixing the syntax problem(s). You've asked about how to do this in perl before, and now python. You've directly copied programs from others...haven't you tried to read/understand the code, so that YOU can write it, rather than asking us to customize a program for you?
 
Old 09-06-2014, 04:34 PM   #8
sayhello_to_the_world
Member
 
Registered: May 2013
Posts: 229

Original Poster
Rep: Reputation: Disabled
Hello dear TB0ne,


Well said, you're right: all your preliminary thoughts and ideas are not bad, sure thing.

You convinced me to do more work to find out the issues. For now I will figure out what goes wrong with the database connection.


Code:
  File "cpan_100.py", line 27, in <module>
    User.create_table() #ensure table is created
  File "build/bdist.linux-i686/egg/peewee.py", line 3078, in create_table
  File "build/bdist.linux-i686/egg/peewee.py", line 2471, in create_table
  File "build/bdist.linux-i686/egg/peewee.py", line 2414, in execute_sql
  File "build/bdist.linux-i686/egg/peewee.py", line 2283, in __exit__
  File "build/bdist.linux-i686/egg/peewee.py", line 2406, in execute_sql
  File "/usr/lib/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
I will come back and let you know all my findings.


Thanks for all your help. Greetings
 
Old 09-06-2014, 08:49 PM   #9
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,634

Rep: Reputation: 7965
Quote:
Originally Posted by sayhello_to_the_world View Post
Hello dear TB0ne,

Well said, you're right: all your preliminary thoughts and ideas are not bad, sure thing.

You convinced me to do more work to find out the issues. For now I will figure out what goes wrong with the database connection.

I will come back and let you know all my findings.
Yes...right...like you've said several times before, after you've been "convinced" to show some effort, but don't ever actually seem to DO it. Just like you've never come back and posted your work or solutions.

Based on your posting history, I just don't believe you.
 
  


All times are GMT -5. The time now is 07:15 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration