
Skaperen 05-10-2012 03:18 PM

Dumping MySQL
 
I see that mysqldump outputs the databases as SQL statements that re-create them. That might make sense. But I wonder whether it scales. If I have several terabytes of data, and therefore several terabytes of dumped SQL, and need to restore it, can that be done without needing several terabytes of RAM to cache the SQL, or several terabytes of extra disk space above and beyond the space the restored databases themselves will occupy?

I guess if it can treat the SQL as a serialized data stream that it simply deserializes as it goes, that could work. SQL just doesn't seem like a language that lends itself to that very easily. It seems to me that a serialization format along the lines of JSON (not necessarily JSON exactly) would be better suited.
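
In other words, can a restore along these lines run in more or less constant memory, with the client just feeding statements to the server as fast as it reads them? (File and database names here are made up; this is the kind of pipeline I have in mind, not something I have tried at this scale.)

Code:

# hypothetical restore of a multi-terabyte dump straight from the compressed
# file, without ever putting the uncompressed SQL on disk
xz -dc alldbs.sql.xz | mysql -u root -p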

Kustom42 05-10-2012 03:31 PM

It will probably kill your system on I/O with that much data being read by MySQL, but it should complete (maybe next year), as long as you are doing it via the mysql CLI over SSH and not trying to use something like phpMyAdmin.

I have never heard of anyone with that much data inside of MySQL; that's pretty nuts.

Kustom42 05-10-2012 03:32 PM

You can also look at using the 'quick' option for the mysql CLI, as it will read one row of data at a time as opposed to trying to read an entire table at a time (the MySQL default).
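
For example (database name made up; from what I remember --quick is already part of the --opt defaults in newer mysqldump versions, but it does not hurt to be explicit):

Code:

# --quick makes mysqldump fetch and write one row at a time instead of
# buffering each whole table in memory before writing it out
mysqldump --quick bigdb > bigdb.sql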

Skaperen 05-10-2012 03:45 PM

Quote:

Originally Posted by Kustom42 (Post 4675380)
It will probably kill your system on I/O with that much data being read by MySQL, but it should complete (maybe next year), as long as you are doing it via the mysql CLI over SSH and not trying to use something like phpMyAdmin.

I have never heard of anyone with that much data inside of MySQL; that's pretty nuts.

So Postgres instead?

chrism01 05-10-2012 08:18 PM

I think you can tune mysqldump to create multi-row inserts, which saves disk space. I don't think it puts everything in memory and then writes to disk; I believe it streams in/out to/from disk.
For storage, most people pipe the dump through gzip/bzip2.
Also, consider mysqlhotcopy for MyISAM or ARCHIVE tables; otherwise you might need 'MySQL Enterprise Backup' for InnoDB: https://dev.mysql.com/doc/mysql-back...p-methods.html
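
Something along these lines, for example (database name made up; I believe --extended-insert is already on as part of --opt, but check the man page):

Code:

# multi-row inserts keep the dump smaller; piping straight into gzip means the
# uncompressed SQL never has to sit on disk
mysqldump --opt --extended-insert bigdb | gzip > bigdb.sql.gz

# for MyISAM/ARCHIVE tables, mysqlhotcopy just copies the table files
mysqlhotcopy bigdb /backups/bigdb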

Skaperen 05-11-2012 11:45 AM

Quote:

Originally Posted by chrism01 (Post 4675550)
I think you can tune mysqldump to create multi-row inserts, which saves disk space. I don't think it puts everything in memory and then writes to disk; I believe it streams in/out to/from disk.
For storage, most people pipe the dump through gzip/bzip2.
Also, consider mysqlhotcopy for MyISAM or ARCHIVE tables; otherwise you might need 'MySQL Enterprise Backup' for InnoDB: https://dev.mysql.com/doc/mysql-back...p-methods.html

I do use "xz -9" on my dumps. Saves more space than gzip or bzip2 do.

I would like to see a tool that is the SQL equivalent of rsync. It would be nice if it could work across different database types (for example, to do a one-time sync of all the tables of one MySQL database over to a specified database in Postgres).

chrism01 05-13-2012 08:45 PM

I think finding an existing tool for that would be unlikely (but not impossible; try SourceForge).
However, I have written similar things in the past for data take-ons etc. using Perl.
You can open multiple db handles to different dbs/schemas simultaneously, then ETL (Extract / Transform / Load) the data from one to the other.
I've even had ones that run continuously, sort of like an rsync if you like... :)
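
For a one-off copy of a single, simple table you can even get away with a straight pipe between the two CLIs, something like this (table, column and database names are just examples; it assumes the target table already exists with compatible types, and that the data has no NULLs or embedded tabs/newlines, since the two tools escape those differently):

Code:

# pull rows out of MySQL as tab-separated text and load them into Postgres
mysql --batch --skip-column-names -e 'SELECT id, name, email FROM users' sourcedb \
    | psql -d targetdb -c '\copy users (id, name, email) from pstdin'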


Quote:

do use "xz -9" on my dumps. Saves more space than gzip or bzip2 do.
Just out of curiosity, did you compare like with like ie gzip -9, bzip2 -9, xz -9; the results tend to differ depending on the exact data.
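
i.e. a quick like-for-like run over the same dump file, something like (filenames made up):

Code:

# compress the same file with each tool at -9 and compare size and time
for z in gzip bzip2 xz; do
    echo "== $z -9 =="
    time $z -9 -c dump.sql > dump.sql.$z
done
ls -l dump.sql dump.sql.*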

Skaperen 05-16-2012 08:16 PM

Quote:

Originally Posted by chrism01 (Post 4677618)
I think finding an existing tool for that would be unlikely (but not impossible; try SourceForge).
However, I have written similar things in the past for data take-ons etc. using Perl.
You can open multiple db handles to different dbs/schemas simultaneously, then ETL (Extract / Transform / Load) the data from one to the other.
I've even had ones that run continuously, sort of like an rsync if you like... :)



Just out of curiosity, did you compare like with like, i.e. gzip -9, bzip2 -9, xz -9? The results tend to differ depending on the exact data.

I only did a simple compression test. xz worked best on size, though it was slower. From experience with these compression methods, I expected xz to win on space and gzip to win on time, and that is how it turned out. Most cases are like that.

