Old 07-19-2019, 12:13 PM   #1
hedron
Member
 
Registered: Jul 2009
Location: NYC
Distribution: Slackware64-multilib 14.2, SARPI
Posts: 322

Rep: Reputation: 30
Why do writes to SQL databases take so long?


Granted, the files I'm writing total about 40GB or so and there are about a million of them, so I expected a long wait. But I'm projecting it's going to take two whole weeks to finish. I'm already using the COPY command, which is much quicker than reading each CSV file and INSERTing rows one at a time, but this is ridiculous.

Is the fact that I'm using a Python script a factor? I know Python loops are very slow, especially with large data sets. Basically my script is:

Code:
import re

for FILE in csv_dir:
    TABLE = re.findall( "^[a-z]+", FILE )[0]  # findall returns a list; take the first match as the table name
    DB( **config ).query( '''copy "{}" ( col1, col2, col3, col4 ) from '{}' delimiter ',' csv header;'''.format( TABLE, FILE ) )
...or so. That's the meat of it.
 
Old 07-19-2019, 01:23 PM   #2
MensaWater
LQ Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, CoreOS, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 7,795
Blog Entries: 15

Rep: Reputation: 1645
Quote:
Originally Posted by hedron View Post
there are about a million of them
It often takes far more time to process thousands of small files than a couple of very large ones occupying the same total space, because the directory structure has to be accounted for on every file.

Have you tried moving, say, 5 of them to a separate directory (not a subdirectory of the one you're in) and checking how long that subset takes per file, compared with the per-file time from your directory with a million files?
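(As a rough illustration of the test being suggested: a minimal timing sketch, assuming a PostgreSQL server, the psycopg2 driver, and placeholder connection settings, directory, table and column names -- none of these come from the thread.)

Code:
# Timing sketch (assumptions: psycopg2, Postgres, placeholder names and paths).
import os, re, time
import psycopg2

conn = psycopg2.connect(dbname="mydb", user="me")    # hypothetical connection settings
subset_dir = "/tmp/csv_subset"                       # holds, say, 5 of the million files

with conn, conn.cursor() as cur:
    for name in os.listdir(subset_dir):
        table = re.match(r"^[a-z]+", name).group()   # table name from the file's leading letters
        start = time.perf_counter()
        with open(os.path.join(subset_dir, name)) as f:
            cur.copy_expert(
                'copy "{}" ( col1, col2, col3, col4 ) from stdin with ( format csv, header )'.format(table), f)
        print(name, "took", time.perf_counter() - start, "seconds")
If the per-file time in the small directory is noticeably lower, the size of the directory itself is part of the problem.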
 
2 members found this post helpful.
Old 07-19-2019, 02:14 PM   #3
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: Slackware
Posts: 9,030

Rep: Reputation: 3899
If you have an index on that column, that's going to add a lot to the time.
 
3 members found this post helpful.
Old 07-19-2019, 02:42 PM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 21,673

Rep: Reputation: 5734
Quote:
Originally Posted by dugan View Post
If you have an index on that column, that's going to add a lot to the time.
Yep...you can always drop the index, load the data, then put the index back. Or load into a temporary table, and copy things over directly.

There are also things in play like where the database is and where your import file is...if you've got everything on ONE hard drive spindle, that's far different from reading from one and writing to another. Personally, I'd use the MySQL csv import function directly, and skip Python. Direct load should be faster.
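(A minimal sketch of the drop-index / load / re-create-index idea, assuming PostgreSQL -- the COPY syntax in the original post is Postgres -- and the psycopg2 driver; the table, column and index names are hypothetical.)

Code:
# Sketch only: drop the index, bulk-load, then rebuild the index once at the end.
import psycopg2

conn = psycopg2.connect(dbname="mydb", user="me")    # hypothetical connection settings
with conn, conn.cursor() as cur:
    cur.execute('drop index if exists mytable_col1_idx;')
    with open("/path/to/mytable.csv") as f:
        cur.copy_expert(
            'copy "mytable" ( col1, col2, col3, col4 ) from stdin with ( format csv, header )', f)
    cur.execute('create index mytable_col1_idx on mytable ( col1 );')
Building the index once over the loaded table is generally much cheaper than updating it for every inserted row.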
 
3 members found this post helpful.
Old 07-19-2019, 02:43 PM   #5
individual
Member
 
Registered: Jul 2018
Posts: 231

Rep: Reputation: 175
Adding on to what others have suggested, have you tried using transactions, i.e. disabling auto-commit and only committing after every N records? Of course, with that amount of data it's going to take some time.
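(A minimal sketch of the commit-every-N-files idea, again assuming psycopg2 against the Postgres-style COPY from the original post; the batch size, names and paths are placeholders.)

Code:
# Sketch: keep autocommit off and commit once per batch of files instead of per file.
import os, re
import psycopg2

conn = psycopg2.connect(dbname="mydb", user="me")    # hypothetical connection settings
conn.autocommit = False                              # psycopg2's default, stated explicitly
cur = conn.cursor()

BATCH = 1000                                         # arbitrary batch size
csv_dir = "/path/to/csv_dir"                         # as in the original script
for i, name in enumerate(sorted(os.listdir(csv_dir)), start=1):
    table = re.match(r"^[a-z]+", name).group()
    with open(os.path.join(csv_dir, name)) as f:
        cur.copy_expert(
            'copy "{}" ( col1, col2, col3, col4 ) from stdin with ( format csv, header )'.format(table), f)
    if i % BATCH == 0:
        conn.commit()                                # one commit per 1000 files
conn.commit()                                        # commit the final partial batch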

Quote:
Originally Posted by TB0ne View Post
Personally, I'd use the MySQL csv import function directly, and skip Python. Direct load should be faster.
Does MySQL only handle "well-formed" csv?

Last edited by individual; 07-19-2019 at 02:45 PM.
 
Old 07-19-2019, 02:48 PM   #6
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 21,673

Rep: Reputation: 5734
Quote:
Originally Posted by individual View Post
Adding on to what others have suggested, have you tried using transactions, i.e. disabling auto-commit and only committing after every N records? Of course, with that amount of data it's going to take some time.

Does MySQL only handle "well-formed" csv?
Yep...needs to be clean and consistent, but it *IS* faster.
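(For reference, a sketch of what a direct MySQL CSV load might look like, assuming the LOAD DATA INFILE statement and the mysql-connector-python driver; names and paths are placeholders, and it only applies if the server really is MySQL -- the COPY syntax in the original post looks like PostgreSQL.)

Code:
# Sketch: MySQL bulk CSV import via LOAD DATA, assuming mysql-connector-python
# and a server with local_infile enabled; all names and paths are placeholders.
import mysql.connector

conn = mysql.connector.connect(user="me", database="mydb", allow_local_infile=True)
cur = conn.cursor()
cur.execute("""
    LOAD DATA LOCAL INFILE '/path/to/mytable.csv'
    INTO TABLE mytable
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    IGNORE 1 LINES
    (col1, col2, col3, col4)
""")
conn.commit()
cur.close()
conn.close()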
 
  

