Hi. I'm jon.404, a Unix/Linux/Database/Openstack/Kubernetes Administrator, AWS/GCP/Azure Engineer, mathematics enthusiast, and amateur philosopher. This is where I rant about that which upsets me, laugh about that which amuses me, and jabber about that which holds my interest most: *nix.

PostgreSQL 8.x backup caveat

Posted 03-24-2011 at 09:48 AM by rocket357
Updated 03-29-2011 at 07:58 AM by rocket357

During the deployment of our latest software revision (a major version bump that migrated all of our databases from MSSQL to Linux/PostgreSQL and all of our web servers from Apache/PHP to Tomcat/GWT), IT operations immediately started hitting snags with database backups. This major revision allows customers to store documents, photos, etc., directly in the database (something MSSQL 2005 isn't capable of). A simple "pg_dump <opts> <dbname>" resulted in a massive wait: for larger customer databases, the backup operation could easily take 10+ hours to complete.

The solution was to break the backup into four separate files: one for the schema layout, one for the table containing photos (COPY ... WITH BINARY), one for the table containing documents (COPY ... WITH BINARY), and a final one for all of the remaining data (excluding those two tables). Suddenly backups take mere minutes. Restores are a bit trickier, since you have to restore the schema, then the data, then the binary data (in that order), but that's nothing a shell script can't accomplish; a sketch of both directions follows.
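Here's a minimal sketch of the split, assuming PostgreSQL 8.2 or later (where pg_dump gained the -T/--exclude-table switch). The database name "appdb" and the table names "photos" and "documents" are placeholders; substitute your own.

Code:
#!/bin/sh
# Split backup sketch -- "appdb", "photos", and "documents" are
# hypothetical names, not from the original post.
DB=appdb
OUT=/backup/$DB

# 1. Schema layout only
pg_dump --schema-only "$DB" > "$OUT.schema.sql"

# 2 and 3. The binary tables, in PostgreSQL's binary COPY format
psql -d "$DB" -c "\copy photos TO '$OUT.photos.copy' WITH BINARY"
psql -d "$DB" -c "\copy documents TO '$OUT.documents.copy' WITH BINARY"

# 4. Everything else, data only, skipping the two binary tables
pg_dump --data-only -T photos -T documents "$DB" > "$OUT.data.sql"

The restore script just replays the pieces in the order noted above (schema, then plain data, then binary data):

Code:
#!/bin/sh
DB=appdb
OUT=/backup/$DB

psql -d "$DB" -f "$OUT.schema.sql"
psql -d "$DB" -f "$OUT.data.sql"
psql -d "$DB" -c "\copy photos FROM '$OUT.photos.copy' WITH BINARY"
psql -d "$DB" -c "\copy documents FROM '$OUT.documents.copy' WITH BINARY"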

This raised an interesting question: why did the straight pg_dump commands take forever to complete?

The answer is that when you store binary data as text, you have to represent "unprintable" characters somehow so that no data is lost. This is typically accomplished with base64 encoding. I won't go into terribly intricate detail here, but basically base64 splits the input into a "stream" of 6-bit blocks, and each 6-bit block is mapped to a printable character (which takes 8 bits). Every 3 bytes of input become 4 bytes of output, so there's roughly a 33% increase in output size, and processing it takes time...a lot of time. If that sounds inefficient, well, it is.
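You can watch the inflation happen with a quick test on any Linux box (the file name here is just an example):

Code:
$ head -c 3000000 /dev/urandom > blob.bin   # 3,000,000 bytes of "binary" data
$ base64 blob.bin | wc -c                   # roughly 4,050,000 bytes out:
                                            # 4 output bytes per 3 input bytes,
                                            # plus line-wrap newlines

Text-format dumps of binary columns pay the same toll: every byte of binary data costs more than a byte on disk, plus the CPU time to encode it on the way out and decode it on the way back in.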

So remember: if your schema includes binary data such as photos or office documents, split the tables containing the binaries off with COPY ... WITH BINARY. You'll save yourself a ton of time and storage space.