OK, I wasn't really sure where to post this, but I'm using PyGreSQL in a Python script, so it's sort of programming.
I have to update tens of thousands of entries and it just takes forever. Think days. I started the script last night and it's about 15% of the way through. It's just a simple for loop with a one-line update command.
The PostgreSQL server runs in a virtual machine whose VDI file is located on a USB hard drive.
Is there a particular reason it's slow? Maybe the print statements slow it down? It prints the command once per loop iteration so I know how far along it is.
You say "postgresql server is a virtual machine in which the VDI file is located on a USB hard drive." Is your script making a remote connection to the DB?
Is the drive USB 2.0 or 3.0?
Do you have to update one row at a time?
What is the load on the VM and the host (memory/CPU usage and disk I/O)?
The print is potentially a bottleneck, but the query you are using is also the first place to look for improving performance.
Tens of thousands of records should not be a problem for any RDBMS if the data model and queries are well considered.
You don't provide any clues as to how complex the update may be, how many tables are involved, how records are identified, whether it requires subqueries, etc. That would be helpful information.
My first question would be whether it is actually necessary to update each record or row in a scripted loop at all. Doing one query per row is not very efficient unless it is really necessary.
Can you not rewrite the query to update all or multiple records at once, so the RDBMS can optimize it once and do what it does best?
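To illustrate what a single set-based update could look like, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so it runs without a server; the table `items`, the columns `id` and `value`, and the `staging` table are all made-up names, not anything from the original script. The idea is to load the new values into a staging table in one batch and then let the database apply them with one UPDATE.

```python
import sqlite3

# Stand-in database: sqlite3 is used only so the sketch runs anywhere.
# Table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO items (id, value) VALUES (?, ?)",
    [(1, 1.0), (2, 2.0), (3, 3.0)],
)

# Hypothetical new values that a loop would otherwise apply one by one.
new_values = [(1, 10.5), (3, 30.5)]

# Load all new values into a staging table in a single batch.
conn.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO staging (id, value) VALUES (?, ?)", new_values)

# One set-based UPDATE replaces the per-row loop; rows without a
# staging entry are left untouched by the WHERE clause.
conn.execute(
    """
    UPDATE items
    SET value = (SELECT s.value FROM staging AS s WHERE s.id = items.id)
    WHERE id IN (SELECT id FROM staging)
    """
)
conn.commit()

rows = conn.execute("SELECT id, value FROM items ORDER BY id").fetchall()
print(rows)
```

The correlated-subquery form is used because it is plain SQL that should carry over to PostgreSQL; PostgreSQL also offers `UPDATE ... FROM`, which may read more naturally there.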
Code:
cmd = '''update "{}" set close = {} where time = '{}';'''.format( t, i['avg'], i['time'] )
print( cmd )
db.query( cmd )
print()
That's what's in the for loop that's taking days. It's doing what it's supposed to do, otherwise.
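Going by that loop, three things look cheap to try, though which one is the real cause is only a guess: run the whole loop inside one transaction instead of letting every statement commit on its own (each commit has to reach the disk, which is likely to hurt on a USB-backed VM), make sure the `time` column is indexed so each update is a lookup rather than a full table scan, and print progress only occasionally. Below is a rough sketch of all three. It uses Python's built-in sqlite3 as a stand-in so it runs without a server; `apply_updates`, `report_every`, the table name `ticks`, and the sample data are made up for illustration, and PyGreSQL has its own placeholder syntax and transaction calls that would need to be substituted.

```python
import sqlite3

def apply_updates(conn, table, updates, report_every=1000):
    """Apply all updates in a single transaction (hypothetical helper)."""
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    # Values are passed as parameters instead of being formatted into the SQL.
    sql = f"UPDATE {quoted} SET close = ? WHERE time = ?"
    done = 0
    with conn:  # commits once on success, rolls back on error
        for i in updates:
            conn.execute(sql, (i["avg"], i["time"]))
            done += 1
            # Report progress occasionally rather than on every row.
            if done % report_every == 0:
                print(f"{done} rows updated")
    return done

# Stand-in data: 5000 rows with made-up time keys.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "ticks" (time TEXT, close REAL)')
conn.executemany(
    'INSERT INTO "ticks" (time, close) VALUES (?, NULL)',
    [(f"t{n}",) for n in range(5000)],
)
# Index the lookup column so WHERE time = ... does not scan the table.
conn.execute('CREATE INDEX ticks_time_idx ON "ticks" (time)')
conn.commit()

updates = [{"time": f"t{n}", "avg": n * 0.5} for n in range(5000)]
count = apply_updates(conn, "ticks", updates)
```

If the table already has an index or primary key on `time`, the index step changes nothing, and the single transaction is the part most worth testing first.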