Well, I'm fairly new to Python. I want to store the results of a parsing job in a database. I heard of peewee, which is said to be very useful and handy for such tasks.
I want to use Python and peewee, and I think I have to do something like the following. After installing peewee correctly, I ran the script; now see what happened.
Code:
import urllib
import urlparse
import re
import peewee
import json
db = MySQLDatabase('cpan', user='root',passwd='rimbaud')
class User(Model):
    name = TextField()
    cname = TextField()
    email = TextField()
    url = TextField()
    class Meta:
        database = db # this model uses the cpan database
User.create_table() #ensure table is created
url = "http://search.cpan.org/author/?W"
html = urllib.urlopen(url).read()
for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
    alk = urlparse.urljoin(url, lk)
    data = { 'url':alk, 'name':name, 'cname':capname }
    phtml = urllib.urlopen(alk).read()
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    if memail:
        data['email'] = memail.group(1)
data = json.load() #your json data file here
for entry in data: #assuming your data is an array of JSON objects
    user = User.create(name=entry["name"], cname=entry["cname"],
                       email=entry["email"], url=entry["url"])
    user.save()
I got back this error:
Code:
Traceback (most recent call last):
  File "cpan5.py", line 10, in <module>
    db = MySQLDatabase('cpan', user='root',passwd='rimbaud')
NameError: name 'MySQLDatabase' is not defined
linux-70ce:/home/martin/perl #
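The NameError comes from how `import` binds names: `import peewee` binds only the module name `peewee`, so its classes must be reached as `peewee.MySQLDatabase`, `peewee.Model` and `peewee.TextField`, or imported explicitly (`from peewee import MySQLDatabase, Model, TextField`). Because peewee may not be installed everywhere, the sketch below shows the same rule with the standard-library `json` module as a stand-in (written for Python 3; the rule is identical in 2.7):

```python
# Import name binding, illustrated with the stdlib json module as a
# stand-in for peewee: the rule is the same for any module.
import json

def call_unqualified():
    """Use loads() without the module prefix and report the error type."""
    try:
        loads('[1, 2]')  # 'loads' was never bound in this module's namespace
    except NameError as exc:
        return type(exc).__name__
    return None

# After "import json", the qualified name works.
qualified = json.loads('[1, 2]')

# "from json import ..." binds the name itself (here under an alias).
from json import loads as direct_loads
direct = direct_loads('[1, 2]')

print(call_unqualified())  # NameError
print(qualified, direct)   # [1, 2] [1, 2]
```

Applied to the script, either prefix every peewee name with `peewee.` or switch to `from peewee import *`, as the next version of the script does.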
Assuming this is all right now, I have set this up...
So far, so good, but it fails at a certain point:
Code:
import urllib
import urlparse
import re
# import peewee
import json
from peewee import *
#from peewee import MySQLDatabase ('cpan', user='root',passwd='rimbaud')
db = MySQLDatabase('cpan', user='root',passwd='rimbaud')
class User(Model):
    name = TextField()
    cname = TextField()
    email = TextField()
    url = TextField()
    class Meta:
        database = db # this model uses the cpan database
User.create_table() #ensure table is created
url = "http://search.cpan.org/author/?W"
html = urllib.urlopen(url).read()
for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
    alk = urlparse.urljoin(url, lk)
    data = { 'url':alk, 'name':name, 'cname':capname }
    phtml = urllib.urlopen(alk).read()
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    if memail:
        data['email'] = memail.group(1)
data = json.load('emailyour json data file here
for entry in data: #assuming your data is an array of JSON objects
    user = User.create(name=entry["name"], cname=entry["cname"],
                       email=entry["email"], url=entry["url"])
    user.save()
I guess a data file must exist: one that has been created by the script during the parsing... is this right?
Code:
martin@linux-70ce:~/perl> python cpan_100.py
Traceback (most recent call last):
  File "cpan_100.py", line 47, in <module>
    data = json.load('emailyour json data file here
  File "/usr/lib/python2.7/json/__init__.py", line 286, in load
    return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
martin@linux-70ce:~/perl>
Well, at the moment I do not know why I get so many errors.
I would be happy for any and all hints.
Quote:
Well, I'm fairly new to Python. I want to store the results of a parsing job in a database. I heard of peewee, which is said to be very useful and handy for such tasks. I want to use Python and peewee, and I think I have to do something like the following. After installing peewee correctly, I ran the script; now see what happened.
I got back this error.
Assuming this is all right now, I have set this up... So far, so good, but it fails at a certain point.
Does it fail or not??? Why are you ASSUMING it works? Did you correct the error in the script or not?
Quote:
I guess a data file must exist: one that has been created by the script during the parsing... is this right?
...and why are you asking? Can you not see if a file is created? Did you LOOK for a file?
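For reference: nothing in the script writes a file, so no data file appears on its own. `json.load()` only reads an existing file through a handle returned by `open()`; if an intermediate JSON file were wanted, it would have to be written first, for example with `json.dump()`. A minimal Python 3 sketch; the file name `authors.json` and the sample record are hypothetical:

```python
import json
import os
import tempfile

# Hypothetical parse results (in the thread they would come from the regex loop).
records = [{"name": "Example Author", "cname": "EXAMPLE",
            "email": "ex@example.org", "url": "http://example.org/~example/"}]

workdir = tempfile.mkdtemp()                  # fresh, empty directory
path = os.path.join(workdir, "authors.json")  # hypothetical file name

existed_before = os.path.exists(path)  # False: nothing has written it yet

# Write the parse results out...
with open(path, "w") as fh:
    json.dump(records, fh)

# ...then read them back by passing the file *handle*, not the name.
with open(path) as fh:
    loaded = json.load(fh)

print(existed_before, os.path.exists(path), loaded == records)  # False True True
```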
Quote:
Well, at the moment I do not know why I get so many errors.
Did you not read the error message?? It is VERY clear...you're trying to load a file of the name "emailyour json data file here", didn't close your parens, and have a TOTALLY different syntax from what you had in the first script. You've been posting this same script/error set for over two months in other forums, and have had other people point this out. Why is this not clear?
Either fix the file name/syntax, or it won't work.
Well, this line, which was clearly pointed out in the error message, is obviously a) wrong and b) the problem. What is it supposed to do, and why do you think that what you have there will work?
If it was supposed to be an "example", then you were supposed to change the quoted text in the example to a file-like object, not to leave it as a string (with no closing quote).
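The traceback line `return loads(fp.read(), ...)` shows why a string fails: `json.load()` calls `.read()` on whatever it receives, so it needs a file-like object, while `json.loads()` takes the JSON text itself as a string. A minimal Python 3 sketch using `io.StringIO` as the file-like object; the sample JSON is made up for illustration:

```python
import io
import json

# Hypothetical sample: a JSON array of objects, as the original comment expects.
text = ('[{"name": "Example Author", "cname": "EXAMPLE", '
        '"email": "ex@example.org", "url": "http://example.org/~example/"}]')

def load_from_plain_string():
    """Reproduce the thread's mistake: passing a str to json.load()."""
    try:
        json.load(text)
    except AttributeError as exc:
        return str(exc)
    return None

# json.loads() parses a string...
from_string = json.loads(text)

# ...and json.load() parses a file-like object (StringIO here,
# or a handle returned by open() for a real file).
from_file_like = json.load(io.StringIO(text))

print(load_from_plain_string())       # 'str' object has no attribute 'read'
print(from_string == from_file_like)  # True
```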
Many thanks for the replies and all the tips. I will do as advised and will name a file, e.g. like so
What is done here: we are passing a string to json.load. This function expects a "file-like" object.
We can call open on a file and use the returned handle.
Or what if we just use the results of the parser?
BTW: we are already populating the data object when parsing the HTML, so we can omit the data = json.load('email') line and simply access the data object directly in the for loop at the end, as written.
That line was only added as an example; it is clear that we get the data from the parsing job in the first place!
We also might want to do data = [] before the HTML-parsing loop, and then do entry = { 'url':alk, 'name':name, 'cname':capname } and data.append(entry.copy()) within the loop.
What do you say?
Last edited by sayhello_to_the_world; 08-28-2014 at 08:28 AM.
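The accumulate-then-iterate idea (an empty list before the loop, one dict appended per author) can be sketched as follows. This is Python 3 (`urllib.parse.urljoin` replaces 2.7's `urlparse.urljoin`), and it runs offline: the index HTML and the `author_pages` dict are hypothetical stand-ins for the pages `urllib.urlopen` would download. Defaulting `email` to `None` is an added assumption so that every record has the same keys:

```python
import re
from urllib.parse import urljoin  # Python 3 home of urlparse.urljoin

# Hypothetical stand-ins for the downloaded pages, so the sketch runs offline.
base_url = "http://search.cpan.org/author/?W"
index_html = ('<a href="/~alice/"><b>ALICE</b></a><br/><small>Alice Example</small>'
              '<a href="/~bob/"><b>BOB</b></a><br/><small>Bob Example</small>')
author_pages = {
    "http://search.cpan.org/~alice/": '<a href="mailto:alice@example.org">',
    "http://search.cpan.org/~bob/": "no address on this page",
}

pattern = r'<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>'

data = []  # one dict per author, built up inside the loop
for lk, capname, name in re.findall(pattern, index_html):
    alk = urljoin(base_url, lk)
    entry = {"url": alk, "name": name, "cname": capname, "email": None}
    memail = re.search(r'<a href="mailto:(.*?)">', author_pages[alk])
    if memail:
        entry["email"] = memail.group(1)
    # A fresh dict is created on every pass, so appending it directly is
    # safe; .copy() would only matter if one dict object were reused.
    data.append(entry)

for entry in data:  # each item is now a dict, as the final loop expects
    print(entry["cname"], entry["email"])
```

Run as-is, this prints `ALICE alice@example.org` and `BOB None`.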
Hello TBone, hello dugan,
many thanks for the replies and all the tips. I will do as advised and will name a file, e.g. like so
These aren't 'tips'...this is reading the VERY CLEAR ERROR MESSAGE. It told you specifically what the problem was, and where.
Quote:
What is done here: we are passing a string to json.load. This function expects a "file-like" object. We can call open on a file and use the returned handle. Or what if we just use the results of the parser?
BTW: we are already populating the data object when parsing the HTML, so we can omit the data = json.load('email') line and simply access the data object directly in the for loop at the end, as written. That line was only added as an example; it is clear that we get the data from the parsing job in the first place!
We also might want to do data = [] before the HTML-parsing loop, and then do entry = { 'url':alk, 'name':name, 'cname':capname } and data.append(entry.copy()) within the loop.
What do you say?
We say it's your program...write it however you'd like. This is much like your other threads, asking about Perl and other languages to do DB insertion/manipulation. Those programs were copied from other websites, and you just posted errors you got when trying to run them. If you're not going to actually learn how to write the code, then you really should hire someone to write it for you.
We are already populating the data object when parsing the HTML, so I think we can omit the data = json.load('email') line and simply access the data object directly in the for loop at the end, as written. We are getting the data from the parsing process.
So we can go like this:
Code:
import urllib
import urlparse
import re
# import peewee
import json
from peewee import *
#from peewee import MySQLDatabase ('cpan', user='root',passwd='rimbaud')
db = MySQLDatabase('cpan', user='root',passwd='rimbaud')
class User(Model):
    name = TextField()
    cname = TextField()
    email = TextField()
    url = TextField()
    class Meta:
        database = db # this model uses the cpan database
User.create_table() #ensure table is created
url = "http://search.cpan.org/author/?W"
html = urllib.urlopen(url).read()
for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
    alk = urlparse.urljoin(url, lk)
    data = { 'url':alk, 'name':name, 'cname':capname }
    phtml = urllib.urlopen(alk).read()
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    if memail:
        data['email'] = memail.group(1)
# data = json.load('email') #your json data file here
for entry in data: #assuming your data is an array of JSON objects
    user = User.create(name=entry["name"], cname=entry["cname"],
                       email=entry["email"], url=entry["url"])
    user.save()
Note: I disabled this line in the code:
Code:
# data = json.load('email') #your json data file here
Doing so, I got further and ran the code:
Code:
martin@linux-70ce:~/perl> python cpan_100.py
Traceback (most recent call last):
  File "cpan_100.py", line 27, in <module>
    User.create_table() #ensure table is created
  File "build/bdist.linux-i686/egg/peewee.py", line 3078, in create_table
  File "build/bdist.linux-i686/egg/peewee.py", line 2471, in create_table
  File "build/bdist.linux-i686/egg/peewee.py", line 2414, in execute_sql
  File "build/bdist.linux-i686/egg/peewee.py", line 2283, in __exit__
  File "build/bdist.linux-i686/egg/peewee.py", line 2406, in execute_sql
  File "/usr/lib/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
peewee.OperationalError: (1050, "Table 'user' already exists")
martin@linux-70ce:~/perl>
Well, it seems clear that I have some other issues now...
Last edited by sayhello_to_the_world; 09-02-2014 at 04:31 PM.
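MySQL error 1050 means the connection itself worked: an earlier run already created the `user` table, and the unconditional `CREATE TABLE` fails on the second run. The usual fix is a create-if-missing statement. As far as I know, peewee 2.x exposed this as `User.create_table(fail_silently=True)`, while peewee 3.x uses `create_table(safe=True)` (the default there); check the documentation for the installed version. Since peewee and MySQL may not be available, this Python 3 sketch shows the same idea with the standard-library `sqlite3` module and an in-memory database as a stand-in, then inserts a hypothetical list of parsed records:

```python
import sqlite3

# In-memory SQLite stands in for the thread's MySQL "cpan" database.
conn = sqlite3.connect(":memory:")

schema = """
CREATE TABLE IF NOT EXISTS user (
    name  TEXT,
    cname TEXT,
    email TEXT,
    url   TEXT
)
"""

# Running the statement twice is harmless: IF NOT EXISTS skips creation
# when the table is already there (as on a second run of the script).
conn.execute(schema)
conn.execute(schema)

# Hypothetical parse results, shaped like the list built in the parsing loop.
data = [
    {"name": "Alice Example", "cname": "ALICE", "email": "alice@example.org",
     "url": "http://search.cpan.org/~alice/"},
    {"name": "Bob Example", "cname": "BOB", "email": None,
     "url": "http://search.cpan.org/~bob/"},
]

with conn:  # the connection context manager commits on success
    conn.executemany(
        "INSERT INTO user (name, cname, email, url) "
        "VALUES (:name, :cname, :email, :url)",
        data,
    )

rows = conn.execute("SELECT cname, email FROM user ORDER BY cname").fetchall()
print(rows)  # [('ALICE', 'alice@example.org'), ('BOB', None)]
```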
Hello dear TBone,
tx for the hints and all your support!
And as you've been told MANY times, SPELL OUT YOUR WORDS, and don't use text speak.
Quote:
We are already populating the data object when parsing the HTML, so I think we can omit the data = json.load('email') line and simply access the data object directly in the for loop at the end, as written. We are getting the data from the parsing process. So we can go like this:
Well, it seems clear that I have some other issues now...
Right, because you aren't passing the right arguments to the functions, since you commented it out, rather than fixing the syntax problem(s). You've asked about how to do this in perl before, and now python. You've directly copied programs from others...haven't you tried to read/understand the code, so that YOU can write it, rather than asking us to customize a program for you?
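One concrete instance of the argument problem: with the `json.load` line commented out, `data` is no longer a list of records but the single dict left over from the last pass of the parsing loop. Iterating over a dict yields its keys (strings), so `entry["name"]` raises a TypeError; and when no mailto link was found, the `'email'` key is missing altogether, which `dict.get()` tolerates. A small Python 3 sketch with a made-up record (key order shown assumes Python 3.7+ insertion-ordered dicts):

```python
# Hypothetical leftover value of "data" after the parsing loop: one dict,
# without an "email" key because no mailto link was found.
data = {"url": "http://search.cpan.org/~bob/", "name": "Bob Example", "cname": "BOB"}

keys_seen = [entry for entry in data]  # iterating a dict yields its keys

def index_like_a_record(entry):
    """Mimic entry["name"] from the original loop when entry is a key string."""
    try:
        return entry["name"]
    except TypeError as exc:
        return type(exc).__name__

print(keys_seen)                          # ['url', 'name', 'cname']
print(index_like_a_record(keys_seen[0]))  # TypeError
print(data.get("email"))                  # None instead of a KeyError
```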
Right, you're right: all your preliminary thoughts and ideas are not bad, sure thing.
You convinced me to do more work to find out the issues. For now, I am figuring out what goes wrong with the database connection.
Code:
  File "cpan_100.py", line 27, in <module>
    User.create_table() #ensure table is created
  File "build/bdist.linux-i686/egg/peewee.py", line 3078, in create_table
  File "build/bdist.linux-i686/egg/peewee.py", line 2471, in create_table
  File "build/bdist.linux-i686/egg/peewee.py", line 2414, in execute_sql
  File "build/bdist.linux-i686/egg/peewee.py", line 2283, in __exit__
  File "build/bdist.linux-i686/egg/peewee.py", line 2406, in execute_sql
  File "/usr/lib/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
I will come back and let you know all my findings.
Yes...right...like you've said several times before, after you've been "convinced" to show some effort, but don't ever actually seem to DO it. Just like you've never come back and posted your work or solutions.
Based on your posting history, I just don't believe you.