LinuxQuestions.org


Doom0r 07-14-2008 12:11 AM

Solution for File Redundancy between Multi-OS WS, nix File Server, and NAS appliance?
 
Hi everybody. After 4 years, search isn't giving me enough of an answer here, so I think it's time for my first post.

As the subject states, I'm looking for a file redundancy solution (mirroring more or less) that meets a few criteria. This will mostly be for a media collection. The idea is triplication as both backup and availability solutions.

A separate backup solution will run alongside it for files that I wish to keep different versions of (rsync or svn perhaps, maybe locally and mirrored to the file server), such as code (just starting to learn), graphical projects, and the like as I get into more things.

Hardware:

1. Workstation: Multi-OS (XP, nix, others maybe) core2duo box which will most likely use NTFS as the shared FS (FS shouldn't be relevant, but for the sake...)

2. File Server: Multi-purpose server running Debian testing ATM (P3-800, IDE drives)

3. NAS: NAS appliance, LinkStation Live V2 to be specific, which *may* be changed to an alternate embedded nix or a more complete Debian install, however currently supports SMB and FTP

Ideal goals:

1. Most content will be created on the Workstation, such as ripping my new CDs to FLAC using EAC or cdparanoia. The Workstation copy will also serve as its own normal local source.

2. Content should be mirrored to File Server and be accessible such as the name implies: able to be served off, backup is not the sole intent here. A LinuxMCE box will get files from this eventually.

3. NAS will get a copy of the media at least and possible backup content. So, at a minimum for the initial layout I need a method to update an embedded device (small cpu) with the content the File Server gets, but not necessarily all. NAS will mostly be there simply as a triplicate in the solution, but will also serve as a fail-over for the File Server, requiring files to be directly accessible, not locked into a backup scheme.

Problems in finding a solution:

1. Anything that needs to run on the Workstation should be as OS-agnostic or developed enough for most any OS as possible. Windows, nix, hopefully BSD should be supported, with ideally OSX and Solaris also.

2. The method should have a way to deal with changes and deletions, such as Unison does, in a manner that you are notified and have the option to decide the appropriate action. This should be on-demand/able to be scheduled as this stuff won't constantly change (when I'm satisfied with a CD rip, I'll sync it). For stuff I need constant version control, I can implement that elsewhere.

This should not take excessively long for normal syncing conditions as we have 11k+ files (~100GB) already and will certainly grow. Counter-point in #3.

3. There needs to be a method to periodically check the integrity of duplicates/originals. This is so that, if our replicating system doesn't check file integrity during normal operation, we can still ensure that files haven't changed in some manner.

This is to both make sure that we haven't missed a file that has changed and we don't know (i.e. no access timestamps) and the possibility of data corruption/bad sectors.

4. The NAS device needs a way to receive its copy without that being an overly taxing process. Yet, just as in issue 3, we need to periodically ensure this copy is perfect. Periodic, low-load checks on portions of the files (or replacement) would be better than a full, hard, non-stop check. Keep in mind that this is a MIPS(sel) embedded platform and we don't want to turn it into a mini oven. ;x Also, by default, this is simply an FTP or SMB share to us.

5. Scalability should be available to cover at least the normal expectations of the Home Network model. In other words, we may add varying storage devices and/or systems into the mix over time. Failed devices and/or systems should be expected to never be replaced by exact hardware, but rather what's available to me at the moment.
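Issues 3 and 4 above (periodic integrity checks, done in small low-load portions) could be approached with a stored checksum manifest that is re-verified a few files at a time. The following is only a minimal sketch, not a tested tool: the names `build_manifest` and `verify_batch`, the choice of SHA-256, and the batch size are all assumptions made for illustration, and it assumes the share (FTP/SMB) is already mounted as a normal directory.

```python
import hashlib
import os
import time


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large FLAC files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a checksum for every file under root, keyed by relative path."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = {"sha256": sha256_of(full), "checked": 0.0}
    return manifest


def verify_batch(root, manifest, batch_size=50, pause=0.0):
    """Re-hash only the batch_size least recently checked files.

    Returns the relative paths that are missing or no longer match.
    """
    oldest_first = sorted(manifest, key=lambda rel: manifest[rel]["checked"])
    problems = []
    for rel in oldest_first[:batch_size]:
        full = os.path.join(root, *rel.split("/"))
        if not os.path.isfile(full) or sha256_of(full) != manifest[rel]["sha256"]:
            problems.append(rel)
        manifest[rel]["checked"] = time.time()
        time.sleep(pause)  # optional breather for a small CPU
    return problems
```

Run from cron with a small `batch_size`, each pass would touch only a slice of the collection, and every file would eventually be re-checked. The manifest is a plain dict, so it could be saved between runs with the standard `json` module.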

Summary

I would like to be able to use file redundancy as both a backup and fail-over/availability solution in a mixed hardware environment. Original source is multi-OS, second copy goes to Debian server, third copy goes to LinkStation Live NAS device. Updates/syncing should be on-demand primarily, scheduling as a direct or indirect (cron) option. NAS device normally doesn't need immediate push, but it should be available. And, finally, we want to check all our data integrity periodically.

I simply haven't been able to find much info about mirroring/redundancy in general other than things like distributed file systems over raid, let alone simply mirroring info across multiple platforms and checking integrity while making all copies available at any moment.

Open-minded, don't be afraid to shoot an idea at me.

Thanks,
Lenard

emi_ramo 07-14-2008 04:40 AM

There are lots of tools on Linux that may help you with mirroring, such as lftp (su -c 'apt-get install lftp' ; man lftp). It looks at timestamps to decide whether files need to be re-downloaded. With that, you'll be able to sync filesystems with a simple bash script called from cron or from the shell (on demand). I guess the NAS has a Linux system on it, as does your Debian server. If you're not sure, or as an alternative, you can also keep a small file recording the last sync date, so you can sync only the files changed since then (using find, for example).
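The "small file with the last sync date" idea can be sketched roughly as below. This is an illustration under assumptions, not part of lftp or any existing tool: `changed_since` and `mark_synced` are made-up names, and the stamp file is assumed to live outside the tree being scanned. It is the same idea as `find DIR -newer STAMPFILE`.

```python
import os


def changed_since(root, stamp_path):
    """Return relative paths under root modified after the stamp file's mtime.

    If the stamp file does not exist yet, every file counts as changed.
    """
    last_sync = os.path.getmtime(stamp_path) if os.path.exists(stamp_path) else 0.0
    changed = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            if os.path.getmtime(full) > last_sync:
                changed.append(os.path.relpath(full, root))
    return sorted(changed)


def mark_synced(stamp_path):
    """Create the stamp file if needed and set its modification time to now."""
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, None)
```

Note that this relies entirely on modification times, so it inherits every weakness of timestamp-based syncing.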

You can also let your workstation serving its files over Samba, so server will be able to access them and mirror if necessary/demanded (just mounting SAMBA filesystem and syncing both).

I don't know too much about M$ solutions for that, but surely you can find some free FTP client that does the same (mirroring via FTP, and able to take command-line options so you can automate the mirroring). It isn't needed at all, though: everything can be done directly from Linux.

After mirroring is done, serving may be done by Samba and/or NFS. I suppose SMB is better for M$ and worse than NFS in speed and robustness.

If some server (Debian, NAS) goes down, you'll still have the other on a separate SAMBA and FTP host.

Whenever you want to sync, you execute a script (for example, on Debian) that will sync its files with SAMBA Workstation files (mounting SAMBA and doing find+cp stuff, in both directions). Then, it will connect via SSH to NAS and autocall the syncing script. Once done, you'll have synced everything!!
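The find+cp step of such a script might look something like the sketch below, assuming the Workstation's Samba share is already mounted as an ordinary directory. The name `copy_newer` is hypothetical, the copy runs in one direction only (call it twice with the arguments swapped for the "both directions" variant), and deletions are not handled at all.

```python
import os
import shutil


def copy_newer(source_root, dest_root):
    """Copy files that are missing on the destination or newer on the source.

    Returns the sorted list of relative paths that were copied.
    """
    copied = []
    for folder, _dirs, files in os.walk(source_root):
        for name in files:
            src = os.path.join(folder, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(dest_root, rel)
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also carries over the modification time
                copied.append(rel)
    return sorted(copied)
```

Because `shutil.copy2` preserves the modification time, a second run right after the first should find nothing left to copy.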

I know there are more mirroring-specific solutions such as rsync, but I've never used it. Just remember that it can do the find+lftp job.

Am I pissing out of box? (hehe, this is a literal translation from Spanish that means: Have I understood your question?)

emi

Doom0r 07-14-2008 06:22 PM

FTP is certainly an option, however my questions would be, which is more robust for permissions and which has better performance?

I have no experience with SMB or NFS, so I don't know how they stack up against FTP. That's certainly one thing I should have looked into already. Still, features will limit choices. Timestamps will not be enough to decide whether a file should be updated or checked, however. NTFS, if I remember correctly, does not update timestamps on many actions, leaving you unable to determine whether a file or its metadata has changed. Timestamps would be enough only for files that have had their content modified, or that have been rewritten with the same file name and different content.

I would still need some checksum style system to compare all files every once in a while to make sure none are corrupt. This would have to be added on top of a simple difference copy system.
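As a rough illustration of what such a checksum comparison could look like, the sketch below hashes both copies and only reports what it finds, leaving the decision to the user. It is an assumption-laden example rather than an existing tool: `compare_trees` and the report keys are invented names, SHA-256 is an arbitrary choice, and both copies are assumed to be reachable as local or mounted directories.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(root):
    """Return the set of relative file paths under root, using '/' separators."""
    found = set()
    for folder, _dirs, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(folder, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def compare_trees(source_root, mirror_root):
    """Report files missing, extra, or differing by content on the mirror."""
    source, mirror = list_files(source_root), list_files(mirror_root)
    differing = sorted(
        rel
        for rel in source & mirror
        if file_digest(os.path.join(source_root, rel))
        != file_digest(os.path.join(mirror_root, rel))
    )
    return {
        "missing_on_mirror": sorted(source - mirror),
        "extra_on_mirror": sorted(mirror - source),
        "differing": differing,
    }
```

Since every file on both sides is read in full, this is the expensive periodic check, not something to run on every normal sync. A file that keeps the same size and timestamp but has different bytes still shows up under `differing`.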

The NAS would have to have its OS changed before being able to run much on it. It runs a Linux kernel, but it's currently a limited environment; this *could* easily be changed to give it a slightly larger set of commands or even a near-complete distro. Without changing it, we would need to run all checks from a different machine, which would take too much bandwidth to do routinely. This would be fine for the periodic complete integrity check, though. Since this is just replicating the changes made to the Debian file server, a solution that repeats those changes to the NAS would be ideal.

We could, also, choose to use the currently embedded tools to copy the changes to itself... that is if it can do that. I will have to look more into that; I know it does backup processes, but I don't know if it will mirror to itself and do an integrity check.

You're definitely "on the right path", but I would like to use a more developed tool along the lines of Unison that can display options for me to decide what to do when it finds a difference, but not spend too long on every *normal* sync to see what needs to change. This would take a lot of work to implement into a script. Some sort of a more simple copy via FTP or SMB method as you described would probably be best for the NAS device.

Always open to thoughts.

Thanks,
Lenard

emi_ramo 07-15-2008 05:52 AM

I don't know what Unison is. You should look at 'man lftp', that's pretty interesting. You also have multithreaded download with pget (increases speed magically :); very accurate mirroring options: by name regex, by date, by size; print copied or not-copied files; change file permissions (if server supports), etc

You decide :D
emi

Doom0r 07-15-2008 02:31 PM

Quote:

Originally Posted by emi_ramo (Post 3214963)
I don't know what Unison is.

http://www.cis.upenn.edu/~bcpierce/unison/

Quote:

Originally Posted by emi_ramo (Post 3214963)
You should look at 'man lftp', that's pretty interesting. You also have multithreaded download with pget (increases speed magically :); very accurate mirroring options: by name regex, by date, by size; print copied or not-copied files; change file permissions (if server supports), etc

This still is just a transfer mechanism as it doesn't do any checksum validation. Unfortunately size and modification dates are not enough on a filesystem that doesn't update on metadata changes which don't change a file size or update access time. =\

Still need something more... anyone?

emi_ramo 07-15-2008 03:28 PM

And why not Unison? It's in the Debian repositories and it seems to work just fine. You'll probably need to add some tunneling to work with its ssh connection, but it seems good. Doesn't it?

Doom0r 07-15-2008 08:34 PM

Well, Unison requires the same version across all devices and operating systems. If your Windows version was hypothetically 2.5.3, your Linux version would have to be a corresponding 2.5.3. Secondly, it's no longer in development. And most importantly, this will not work for the NAS device without changing its OS, and even then I'm not sure what complications there might be.


All times are GMT -5.