LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   postgresql update slow` (https://www.linuxquestions.org/questions/programming-9/postgresql-update-slow%60-4175650567/)

hedron 03-20-2019 01:20 PM

postgresql update slow`
 
Ok, I wasn't really sure where to post this, but I'm using pygresql in a python script, so it's sort of programming.

I have to update 10s of thousands of entries and it just takes forever. Think days. I started the script last night and it's about 15% of the way through. It's just a simple for loop with a one line update command.

The postgresql server is a virtual machine in which the VDI file is located on a USB hard drive.

Is there a particular reason it's slow? Maybe the print statements slow it down? It prints the command once a loop so I know how far along it is.

dc.901 03-20-2019 01:29 PM

Quote:

Originally Posted by hedron (Post 5975909)
Ok, I wasn't really sure where to post this, but I'm using pygresql in a python script, so it's sort of programming.

I have to update 10s of thousands of entries and it just takes forever. Think days. I started the script last night and it's about 15% of the way through. It's just a simple for loop with a one line update command.

The postgresql server is a virtual machine in which the VDI file is located on a USB hard drive.

Is there a particular reason it's slow? Maybe the print statements slow it down? It prints the command once a loop so I know how far along it is.

You say "postgresql server is a virtual machine in which the VDI file is located on a USB hard drive." Is your script making remote connection to DB?
USB 2.0 or 3.0?
Do you have to do update one row at a time?
How is the load on VM and the host (memory/CPU usage, and disk IO)?

astrogeek 03-20-2019 01:38 PM

The print is potentially a bottleneck, but the query you are using is also the first place to look when improving performance.

Tens of thousands of records should not be a problem for any RDBMS if the data model and queries are well considered.

You don't provide any clues as to how complex the update may be: how many tables are involved, how records are identified, whether it requires subqueries, etc. That would be helpful information.

My first question would be whether it is actually necessary to update each record or row in a scripted loop at all? Doing one query per row is not very efficient unless it is really necessary.

Can you not rewrite the query to update all or multiple records and allow the RDBMS to optimize it once and do what it does best?
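Something like this, for example. This is only a sketch — the table name "prices" and the row layout are guesses, since we haven't seen your schema — but it shows how thousands of single-row updates can collapse into one statement the server plans once:

```python
# Sketch: build ONE batched UPDATE instead of thousands of single-row queries.
# Table/column names ("prices", close, time) are assumed for illustration;
# adjust the ::timestamp cast to whatever type your time column really is.

def build_batched_update(table, rows):
    """Collapse per-row updates into a single UPDATE ... FROM (VALUES ...)."""
    values = ",\n    ".join(
        "({}, '{}')".format(r["avg"], r["time"]) for r in rows
    )
    return (
        'UPDATE "{0}" AS t SET close = v.close\n'
        "FROM (VALUES\n    {1}\n) AS v(close, time)\n"
        "WHERE t.time = v.time::timestamp;"
    ).format(table, values)

rows = [
    {"avg": 1.5, "time": "2019-03-20 10:00"},
    {"avg": 1.7, "time": "2019-03-20 10:01"},
]
sql = build_batched_update("prices", rows)
print(sql)
```

You would then issue that one query (or a few, in chunks of a few thousand rows) instead of looping in Python, and let the planner do its job.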

hedron 03-20-2019 02:15 PM

Quote:

Originally Posted by astrogeek (Post 5975922)
The print is potentially a bottleneck, but the query you are using is also the first place to look when improving performance.

Tens of thousands of records should not be a problem for any RDBMS if the data model and queries are well considered.

You don't provide any clues as to how complex the update may be: how many tables are involved, how records are identified, whether it requires subqueries, etc. That would be helpful information.

My first question would be whether it is actually necessary to update each record or row in a scripted loop at all? Doing one query per row is not very efficient unless it is really necessary.

Can you not rewrite the query to update all or multiple records and allow the RDBMS to optimize it once and do what it does best?

Code:

cmd = '''update "{}" set close = {} where time = '{}';'''.format( t, i['avg'], i['time'] )
print( cmd )
db.query( cmd )
print()

That's what's in the for loop that's taking days. It's doing what it's supposed to do, otherwise.


EDIT: Yes, it was the print statement.
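For reference, what fixed it: print progress once every N rows instead of once per row. Rough sketch with made-up data, not my real script — `db.query()` is commented out here:

```python
# Sketch: throttle progress output to every 1000th row instead of every row.
# `items` and the query construction stand in for the real loop over entries.

items = [{"avg": i * 0.1, "time": "t{}".format(i)} for i in range(5000)]

done = 0
for i in items:
    cmd = '''update "{}" set close = {} where time = '{}';'''.format(
        "prices", i["avg"], i["time"])
    # db.query(cmd)  # actual database call omitted in this sketch
    done += 1
    if done % 1000 == 0:           # print rarely, not once per query
        print("{}/{} rows updated".format(done, len(items)))
```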

NevemTeve 03-21-2019 06:20 AM

Let's hope you have an index on field 'time'.
Also, you could try to use bind variables.
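Something like this (a sketch only — it assumes PyGreSQL-style `$n` placeholders and the table/column names from the code you posted; the database calls are shown as comments):

```python
# Sketch: index + bind variables. Table name "prices" is assumed here.

# Run once, so the WHERE clause stops scanning the whole table on every update:
index_ddl = 'CREATE INDEX IF NOT EXISTS idx_prices_time ON "prices" (time);'

# Parameterised update: the statement text stays constant and the values are
# passed separately, so they are never interpolated into the SQL string.
cmd = 'UPDATE "prices" SET close = $1 WHERE time = $2;'
params = (1.5, "2019-03-20 10:00")

# With a live PyGreSQL connection this would be roughly:
#   db.query(index_ddl)
#   db.query(cmd, params)   # $1, $2 are filled in by the driver
```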


All times are GMT -5. The time now is 06:21 AM.