Old 03-03-2007, 01:02 AM   #1
saumitra
LQ Newbie
 
Registered: Mar 2007
Posts: 9

Rep: Reputation: 0
PostgreSQL database replication just by transferring/mirroring the files


I have a few questions about replicating data from one PostgreSQL instance to another, but using a somewhat different approach.

A short summary of my project:

I am working on a data replication project. I have written a kernel module for kernel 2.6 that works with two machines, A and B: whenever I update any file (anywhere in the directory tree) under a specified directory on machine A, my program updates the same file on machine B.

(On each write system call on machine A, the difference between the new file and the old file is applied as a patch on machine B.)



So I can put my PostgreSQL data directory somewhere like /usr/share/data on machine A, and start with an identical copy of it on machine B.

What I want to do now is replicate the changes made on machine A over to machine B.



So I started my program in this setup, passing it the whole data directory, and did the following (a rough sketch of the run is below):

1> stopped PostgreSQL on B

2> made updates on A

3> started PostgreSQL on B

4> checked the database on B -- it was updated.
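A sketch of that run, just to make it concrete -- the init-script path, database name, and test query here are made-up placeholders, not my real commands:

Code:
# on machine B: stop the server so it is not touching the mirrored files
ssh machineB '/etc/init.d/postgresql stop'

# on machine A: make a change; the kernel module mirrors the modified files to B
psql -h machineA -U postgres -d testdb -c "UPDATE t SET val = val + 1;"

# on machine B: start the server again and check that the change is visible
ssh machineB '/etc/init.d/postgresql start'
psql -h machineB -U postgres -d testdb -c "SELECT val FROM t;"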



The only problem is that the update takes a long time; I want it to run faster.



So, can I avoid replicating some files, such as the log files?

For example, the 000010000000 segment in pg_xlog is 16 MB and takes far too long to patch.

Or should I replicate only the files under the ..../base/ directory?



(Currently, to test my code, I'm using diff and patch, which I know are not efficient ways to do this; I'll implement the real algorithms later. Per modified file, it roughly works as sketched below.)
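Just to illustrate the current placeholder mechanism -- the hostname, paths, and file name here are assumptions, and plain diff/patch is really only suited to text files, which is one reason it has to be replaced with a binary delta later:

Code:
# on machine A: compute the difference between the old and new versions of a file
diff -u /usr/share/data/somefile.orig /usr/share/data/somefile > /tmp/somefile.diff

# ship the difference to machine B and apply it to B's copy of the same file
scp /tmp/somefile.diff machineB:/tmp/
ssh machineB 'patch /usr/share/data/somefile < /tmp/somefile.diff'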
 
Old 03-03-2007, 05:01 AM   #2
fukawi2
Member
 
Registered: Oct 2006
Location: Melbourne, Australia
Distribution: ArchLinux, ArchServer, Fedora, CentOS
Posts: 448

Rep: Reputation: 34
You might get better answers on the PostgreSQL mailing lists at www.postgresql.org

They're very helpful there.
 
Old 03-03-2007, 02:27 PM   #3
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,004
Blog Entries: 11

Rep: Reputation: 903
Quote:
Originally Posted by saumitra
(On each write system call on machine A, the difference between the new file and the old file is applied as a patch on machine B.) [...] So, can I avoid replicating some files, such as the log files? For example, the 000010000000 segment in pg_xlog is 16 MB and takes far too long to patch. Or should I replicate only the files under the ..../base/ directory?
Even if certain files could be omitted, you'd face two basic issues with that approach.

1) The table data and each index of a table live in one file each; those files have a certain size, and the bigger they get, the longer the replication will take. Your issue is that you have to transfer the whole file even for a minute change -- or are you taking binary diffs when you refer to patching?

2) Whether you take diffs or not, the second Postgres instance may try to write back local changes while you're updating the file, and you'll run into concurrency issues. My suggestion would be to have a close look at transferring only the WAL and tackling it from there (rough sketch below).
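For reference, a bare-bones sketch of what "transferring only the WAL" looks like with Postgres' built-in WAL archiving (8.x-style settings; the archive directory is a placeholder, not a tested setup):

Code:
# postgresql.conf on machine A: hand each completed 16 MB WAL segment to a shell command
archive_command = 'cp %p /var/lib/pgsql/wal_archive/%f'   # %p = path of the segment, %f = its file name

# recovery.conf on machine B: replay the shipped segments into the standby copy
restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'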

All that said: I think asking on Postgres' General
mailing list would be a marvelous idea.


Cheers,
Tink
 
Old 03-04-2007, 11:27 PM   #4
saumitra
LQ Newbie
 
Registered: Mar 2007
Posts: 9

Original Poster
Rep: Reputation: 0
Yes, I'm taking binary differences.

The replication works something like this. I update something on machine A, and then:

1> the WAL is updated (.../pg_xlog/00000100000)

2> (after about 2 minutes) .../base/<database number> gets updated

3> (after about 5 minutes) the WAL is updated again (.../pg_xlog/00000100000)

4> ../pg_clog/0000 is updated

5> ../global/pg_control gets updated

Only after all of this can I see the database updated.

How can I see the results immediately?
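(One thing I'm going to try, if I'm reading the docs correctly: the data pages under base/ seem to be written back mainly around checkpoints, which happen every few minutes by default, so forcing a checkpoint on A might make the changes show up sooner. Connection details here are placeholders:)

Code:
# run against the server on machine A; asks PostgreSQL to flush dirty data pages now
psql -h machineA -U postgres -d testdb -c "CHECKPOINT;"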
 
Old 03-04-2007, 11:56 PM   #5
fukawi2
Member
 
Registered: Oct 2006
Location: Melbourne, Australia
Distribution: ArchLinux, ArchServer, Fedora, CentOS
Posts: 448

Rep: Reputation: 34
Is there a reason you can't use a tried and tested method such as "slony" to achieve this?
 
Old 03-07-2007, 02:16 AM   #6
saumitra
LQ Newbie
 
Registered: Mar 2007
Posts: 9

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by fukawi2
Is there a reason you can't use a tried and tested method such as "slony" to achieve this?

Yes, but I want to do the database replication at the file-system level: in my project I replicate any file on the hard disk that gets updated, so I want to apply it to replicating PostgreSQL.

So far I can replicate the database, but there is a lag of 2-3 seconds because of the big log file in "xlog". How can I avoid that delay?
 
Old 03-07-2007, 06:35 AM   #7
fukawi2
Member
 
Registered: Oct 2006
Location: Melbourne, Australia
Distribution: ArchLinux, ArchServer, Fedora, CentOS
Posts: 448

Rep: Reputation: 34
That's the best you're going to get, short of increasing your hard drive and network performance at the hardware level -- your bottleneck is the time it takes to transfer the file from one host to the other, correct?

There are some things that are best left to the "traditional" methods...
 
  

