LinuxQuestions.org


Basslord1124 12-03-2019 08:33 AM

Dealing with backups from 2 data hoarders....
 
So, the upgrade of my new Debian file server is coming to a close...and I have begun the data transfer process. During this process I figured it would be a good idea to try and organize some of the files so at least there is some sort of structure to it AND to go through some of the files so I can keep what's important and remove what's not.

I am dealing basically with my data AND my wife's data going back the last 10 years or so. I am finding that we are both quite the data hoarders...or at least we both went through the phase where we downloaded everything in sight...music, books, movies, etc. Also with me being in IT, having multiple computers, and not having a RAID system in place (still don't BUT plan to after our tax season), I have backups and copies of files everywhere. And I'm sure there are backups of user computers that I have worked on over the years! :eek: So it's pretty overwhelming.

Early next year (March-April) I will be purchasing at least two 3-6TB hard drives to use for file storage in a RAID mirror. While I am ok on space for now, at least with that setup I won't have to worry about space for a while.

So anyways, that's my current situation. About to get back to sifting through files/folders again and move things around and/or delete stuff. Anybody else go through years and years worth of data to try and organize it? Any tips you want to share? Words of encouragement? :)

rtmistler 12-03-2019 08:56 AM

More space, which is the first thing you're doing in the next few months.

As far as determining what is needed versus not, I've relegated very old stuff to something like a USB flash drive in the 1-2 TB range (provided it fits), and disconnect that leaving it in a drawer. If no one barks, then there's the answer, but the data is stored. I have no problem obtaining several TB sized flash drives for low cost, filling them up and then putting them in a drawer. Haven't yet gotten to the point where they don't work, but perhaps that is a very long, long term consideration. It also depends largely on whether or not you expect to use things like old pictures and videos. Experience shows that in times of prominent change, such as marriages, deaths, or big life events, people do look to find that data so they can make collages with it.

michaelk 12-03-2019 10:28 AM

Maybe a bit off topic but just a reminder that RAID is not a backup.

SSDs and flash drives are nice but I don't consider them good for long term storage. You can't put them in a drawer and forget about them.

syg00 12-03-2019 04:17 PM

Give more space to a data hoarder ? Guess what'll happen. Guess how I know.

I second the RAID warning above - think about what happens when you accidentally do "rm -rf Photos". But at least you'll have some redundancy - for a while .... :p
The issue I have is multiple copies. Everywhere. Some are deliberate, some just happen. There are a bunch of duplicate finder tools available - GUI and cli; I use the latter. The problem with all of them is I have never trusted the automatic deletion option. I have photos, some of which have been edited. Which one(s) to keep is an interesting discussion I continually have with myself - and sometimes herself. Allowing a tool to just delete all but the oldest (or newest) just isn't on. They can usually produce a list you can then parse, but again it ain't always obvious.

tl;dr - any way you cut it, you've got some legwork ahead of you.
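
For anyone who wants the "produce a list, delete nothing" approach described above without trusting a tool's auto-delete, here is a minimal Python sketch. It is not how any particular duplicate finder works internally; it simply groups files by size and then by SHA-256 digest, so identical content is found regardless of file name. The function names and the example path are made up for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    # Files of different sizes cannot be identical, so bucket by size first
    # and only hash files that share a size with at least one other file.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


def print_report(root):
    """Print each group of identical files, separated by a blank line."""
    for group in find_duplicates(root):
        print("\n".join(group))
        print()
```

Calling `print_report("/srv/files")` (a hypothetical path) only prints the groups; deciding which copy to keep stays a manual step. Edited photos have different content, so they will not show up as duplicates of their originals.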

Basslord1124 12-04-2019 08:30 AM

Quote:

Originally Posted by rtmistler (Post 6064265)
More space, which is the first thing you're doing in the next few months.

As far as determining what is needed versus not, I've relegated very old stuff to something like a USB flash drive in the 1-2 TB range (provided it fits), and disconnect that leaving it in a drawer. If no one barks, then there's the answer, but the data is stored. I have no problem obtaining several TB sized flash drives for low cost, filling them up and then putting them in a drawer. Haven't yet gotten to the point where they don't work, but perhaps that is a very long, long term consideration. It also depends largely on whether or not you expect to use things like old pictures and videos. Experience shows that in times of prominent change, such as marriages, deaths, or big life events, people do look to find that data so they can make collages with it.

I like the flash drive approach...and when I step back and look, I have sort of done that a little bit. I have put some backups on flash drives just to have a copy of files in another location. Sometimes I'd just do it on a whim.

Quote:

Originally Posted by michaelk (Post 6064277)
Maybe a bit off topic but just a reminder that RAID is not a backup.

SSDs and flash drives are nice but I don't consider them good for long term storage. You can't put them in a drawer and forget about them.

In this case, it will be a mirror, so that if one drive fails I still have a full copy until the RAID can be rebuilt. I like to sort of see a mirror as a constantly running backup, unless both drives fail at the same time, and I don't know how often something like that would occur. Despite using the RAID though, I will still be backing up periodic copies to other locations in the event of any kind of failure.

What really is a good long term solution though? SSDs and flash drives can only take so many writes, hard drives can have mechanical failure, and I would imagine CD/DVD media can deteriorate as well.

Quote:

Originally Posted by syg00 (Post 6064380)
Give more space to a data hoarder ? Guess what'll happen. Guess how I know.

I second the RAID warning above - think about what happens when you accidentally do "rm -rf Photos". But at least you'll have some redundancy - for a while .... :p
The issue I have is multiple copies. Everywhere. Some are deliberate, some just happen. There are a bunch of duplicate finder tools available - GUI and cli; I use the latter. The problem with all of them is I have never trusted the automatic deletion option. I have photos, some of which have been edited. Which one(s) to keep is an interesting discussion I continually have with myself - and sometimes herself. Allowing a tool to just delete all but the oldest (or newest) just isn't on. They can usually produce a list you can then parse, but again it ain't always obvious.

tl;dr - any way you cut it, you've got some legwork ahead of you.

Oh yeah, I'm sure more space will be good for these data hoarders haha! :foot:

Yeah I'm bad for multiple copies too...or maybe it's a good thing. I record music and I may have 4-5 copies of the same song BUT each with a different setting or effect applied so I can see how each one sounds.

As for the legwork...yeah I know. :( It's coming together though. I have decided that I would create select Public folders both me and my wife can access and share. So pictures, music, other media, etc. And then we would each have our own user backup folder for everything else. It's simple enough and hopefully should work out ok.

michaelk 12-04-2019 09:16 AM

I am talking about archival storage. The problem with flash memory is not write limitations but data retention. When unplugged, the charge leaks from the memory cells, and how fast it leaks is temperature dependent. At normal room temperatures, i.e. 72 degrees F, the JEDEC general SSD specifications have data failure around 2 years. Optical media, if stored properly, lasts 30 to 40 years based on a study by the Library of Congress, but that was about 10 years ago. I think there are optical disks that are supposed to have a life span of 100 years.

Basslord1124 12-05-2019 08:26 AM

Ah ok...wow, 2 years really? That's crazy. I wonder if that depends on the situation or on the hardware manufacturer. I have a Sandisk Cruzer Enterprise 4GB flash drive that I would bet is around 10 years old and is still going strong. I don't use it for mission critical stuff and I have wiped the original password security partition off of it. I use it mainly now for installing Linux distros on machines. And it's been through all sorts of temperature fluctuations.

I just recently came across the optical media that can last 100 years...pretty nice, although I haven't burned a CD/DVD in a while.

michaelk 12-05-2019 09:38 AM

Long term storage, i.e. throw the flash drive in the drawer and do not use it or plug it into a computer for 2 years. If you're always using the drive, data retention is not a concern.

ordealbyfire83 12-05-2019 12:39 PM

I generally shy away from USB sticks as backups because they're basically the flash memory plus a controller circuit. The controller varies by manufacturer and by model. I've had a couple of flash drives die completely, as in electronically dead, though the flash memory was probably still fine. So unless you have identical flash drives on hand to use as controller-donors (i.e. for desoldering / replacing the flash memory) then they're as good as useless. Optical disks don't have that problem and will definitely last longer. Think of a flash drive as carrying the CD-R plus the CD-ROM drive in your pocket, soldered together. At least CDs / DVDs are portable from one drive to another.

agillator 12-05-2019 07:35 PM

If you are going to undertake a project this huge, are you also checking for identical files regardless of name? A major task but might eliminate a number of files you might otherwise miss.

I agree that you need to end up on non-temporary media, i.e. DVDs or something similar, but 10 years' worth of files? How many DVDs would that take?

You might consider developing a system of backups and then once a year updating a set of DVDs or something. Personally I back up my entire LAN (8 computers - a couple of TB) daily to a separate computer. For my setup I use a raspberry pi for the backup computer because it is inexpensive and works well, and I put the backups on an 8TB external drive. Of course it can fail, and I plan for it to fail, so periodically I move all irreplaceable files to permanent media. But usually I will have some advance warning of possible failure. Also, any good rsync based backup system (personally I am happy with rsnapshot) will keep the actual disk space used to a minimum because of hard links. As to irreplaceable, that is a judgment call of course. But I am one that leans heavily in the direction of if I haven't needed it in two or three years I can probably do without it, except for some things like tax files, family records and so on. I have a stack of backup DVDs going back many, many years and I can only think of one time I have tried to find something on them (without success, I might add). By the way, a daily backup (after the initial one) of my entire LAN normally takes 10 to 30 minutes in the background depending on what we have done in the previous 24 hours. Of course if I were really picky I would use two drives and alternate them in some fashion so even with catastrophic failure I wouldn't lose much, if anything.
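
To illustrate why hard links keep the space used by snapshot-style backups small, here is a rough Python sketch of the idea. It is not rsnapshot's or rsync's actual implementation, and the `snapshot` function and directory names are invented for the example: a file that is unchanged since the previous snapshot is hard-linked to the old copy (taking no extra data space), and only new or changed files are copied.

```python
import filecmp
import os
import shutil


def snapshot(source, previous, current):
    """Copy source into current, hard-linking files unchanged since previous.

    previous may be None for the very first snapshot. previous and current
    must be on the same filesystem, because hard links cannot cross
    filesystems.
    """
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        dest_dir = os.path.normpath(os.path.join(current, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            new = os.path.join(dest_dir, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=True):
                # Unchanged: add a second directory entry for the same data.
                os.link(old, new)
            else:
                # New or changed: store a fresh copy, preserving timestamps.
                shutil.copy2(src, new)
```

For example, `snapshot("/home", None, "/backup/daily.0")` followed the next day by `snapshot("/home", "/backup/daily.0", "/backup/daily.1")` (hypothetical paths) gives two complete-looking trees, while only files that changed in between occupy additional space. Each snapshot can be deleted independently; the data goes away only when its last link is removed.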

Basslord1124 12-06-2019 08:59 AM

Quote:

Originally Posted by agillator (Post 6065153)
If you are going to undertake a project this huge, are you also checking for identical files regardless of name? A major task but might eliminate a number of files you might otherwise miss.

I agree that you need to end up on non-temporary media, i.e. DVDs or something similar, but 10 years' worth of files? How many DVDs would that take?

You might consider developing a system of backups and then once a year updating a set of DVDs or something. Personally I back up my entire LAN (8 computers - a couple of TB) daily to a separate computer. For my setup I use a raspberry pi for the backup computer because it is inexpensive and works well, and I put the backups on an 8TB external drive. Of course it can fail, and I plan for it to fail, so periodically I move all irreplaceable files to permanent media. But usually I will have some advance warning of possible failure. Also, any good rsync based backup system (personally I am happy with rsnapshot) will keep the actual disk space used to a minimum because of hard links. As to irreplaceable, that is a judgment call of course. But I am one that leans heavily in the direction of if I haven't needed it in two or three years I can probably do without it, except for some things like tax files, family records and so on. I have a stack of backup DVDs going back many, many years and I can only think of one time I have tried to find something on them (without success, I might add). By the way, a daily backup (after the initial one) of my entire LAN normally takes 10 to 30 minutes in the background depending on what we have done in the previous 24 hours. Of course if I were really picky I would use two drives and alternate them in some fashion so even with catastrophic failure I wouldn't lose much, if anything.


What sort of started this whole thing for me was that my wife and I had our first (and at this point, probably only) child almost 2 years ago...born early at 24 weeks. So as a result, she has been a big focus as far as pictures AND videos of her progress and everything. We have a lot of stuff posted on Facebook too, but some stuff is not posted there. So you have lots of pictures/videos of her PLUS the fact that we live in an age where the quality of pictures has put them into the MB range (instead of the KB range that many of us had seen). And the wife had mentioned making sure we keep her stuff backed up. She's not big on backing up data, but I totally understand it in our case with our daughter. And with me doing IT, she probably figures I'd back it up anyways. So I had known for a while that my backup process was "ok" but definitely needed improvement. So the birth of my daughter was basically the inspiration for cleaning this up. So that's where I'm at now.

As far as identical files/names...from a picture perspective, I don't look at the names so much as most have the default name from whatever the device gave it...usually a timestamp and some sort of other identifier. I am doing all these from a GUI (actually mostly from Windows right now) so I can at least see a thumbnail of the image and then put it in its appropriate place. In 99% of these cases, I'm not changing any file names. Just finding similar-themed pictures and putting them in the right folder...birthdays, weddings, concerts, other events, etc. Some stuff that we accumulated over the years such as funny pics (such as the infamous cat meme pictures) and other "non personals" are getting deleted. Yes we've accumulated things like that too. :eek: :rolleyes: :) Through my digging I finally came across my Linux Registered User number...so I now have it in my signature. :)

Something else that will be more of a future "task" but on the same topic is that I am a musician and do have a collection of recordings I've made over the years. Some of these "sessions" can contain multiple instrument tracks...which from a data standpoint is multiple .wav files that can sometimes add up to a few GB in size. I eventually want to downsize these too. AND another thing is I have been making YouTube videos for a little while now, and creating those can take a few GB of video clips for a single YouTube video. But that will be tackled sometime after I get the initial backup thing sorted out.

In the grand scheme, my daughter's pics/videos are a huge priority in terms of backups, so that will probably get a lot more emphasis in the backup process, as will other "important things" (tax records like you mentioned, for example). Everything else will fall below that. I had pondered having, beyond my main file server, another file server/external drive/NAS as one additional spot for a 2nd backup (but not performed as often). If I did go the optical disk route, that might be the big focus for our daughter's pictures and videos while everything else may remain on hard disk or other media. I had pondered too that I might take some of the super low priority stuff and put it on optical or some other lower end media and clear it off any hard disk (just b/c I figure it won't be needed or accessed).

I'll cross that bridge when I get there. I'm still sifting through things now and organizing them. And speaking of which, I have just now started getting into our daughter's pics/videos, after working on this off and on for the past 3-4 days.

agillator 12-06-2019 10:29 AM

Wow, do you have a project! However, the thing that still keeps clattering around in the back of my mind is the number of optical disks involved for actual permanent storage. I agree with the folks who don't trust the USB drives. But, if you have a TB of data to save and are going to save it to standard DVDs that is 250 or so disks - perhaps less if you go the double layer route. That's not impossible, but . . . . In all seriousness you might look into the idea of using one or more raspberry pi's with external drives to make a backup system that is itself backed up. I haven't seen any figures on how long a hard drive lasts without being powered or used, but if you can find that you can swap drives back and forth, always keeping one off site (in a safety deposit box?), and have a vanishingly minuscule chance of losing anything important. The pi's are inexpensive (a complete kit for about $75, the board itself for $35), good external drives are not too bad . . . .
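
The disk count above can be checked with a quick back-of-the-envelope calculation. The sketch below assumes nominal capacities of 4.7 GB for a single-layer DVD and 8.5 GB for a dual-layer one (decimal gigabytes) and treats 1 TB as 1000 GB; the function name is made up for the example. Real-world counts will run higher, since files rarely pack perfectly onto disks, so "250 or so" is a reasonable working figure.

```python
import math

DVD_SINGLE_LAYER_GB = 4.7  # nominal capacity, decimal gigabytes
DVD_DUAL_LAYER_GB = 8.5    # nominal capacity, decimal gigabytes


def discs_needed(data_gb, disc_gb):
    """Minimum number of discs, assuming data packs perfectly onto each."""
    return math.ceil(data_gb / disc_gb)


print(discs_needed(1000, DVD_SINGLE_LAYER_GB))  # 213
print(discs_needed(1000, DVD_DUAL_LAYER_GB))    # 118
```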

One other thought and word of warning: are you sure when your daughter is 16 or 28 she is going to want all those pictures around? She may hate you because grown up kids often don't think the pictures we think are 'cute' are so cute. You may have to hide them VERY well and protect them VERY well; Fort Knox comes to mind.

Personally I have all the photos I have taken since I went from film to digital (kicking and screaming, mind you) so that goes back ten years or so and backups have served me well. You may find you don't have as much of a problem as you imagine. Some of my photos are quite important. My wife is an attorney and some of my photos are evidence in cases. Those she maintains in addition to my copies. So far we have lost none that we know of.

Good luck; have fun.

Basslord1124 12-26-2019 01:59 PM

Update
 
I would like to say I'm near completion but in all honesty, I think this will be an ongoing thing. As long as we are taking pictures/video and doing things, there will always be data there. I do feel like I have made some significant progress and that the worst of it is out of the way...mainly the organizational aspect. I feel like I could much more easily locate things with this new organizational structure. And no doubt it'll be easier to manage too.

I still have more files to sift through and put places, but at this point right now, I have used about 310GB of a 500GB drive and 52GB on another drive. I do honestly think when it's all said and done that I'll be looking at a TB. I may try and revisit this thread with size updates.

And in all honesty, when I step back and look at this, it sorta seems like a waste. I did it for myself more than anything else. I kinda feel like my wife doesn't care about what it is I'm doing. She rarely asks for any kind of old data and I honestly don't think that will change. She just likes the assurance that I keep it or have it backed up somewhere.


All times are GMT -5.