How can I purge files logarithmically by modification date (BASH)?
Hi!
Today I run backups regularly and purge backups older than a specific date. What I would like instead is to keep all files from the last two days, one file per day for the last week, one file per week for the last month, one file per month for the last year, and one file for every year beyond that. I don't fully understand what logic I should implement to achieve something like this. Can anyone help me with pointers on how to implement it, and maybe suggestions on packages that could be of help? What I have achieved so far is this: Code:
smart_rm () |
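A minimal sketch of one way that keep-everything-for-2-days / 1-per-day / 1-per-week / 1-per-month / 1-per-year logic can be expressed, assuming bash 4 (for associative arrays), GNU find and date, a single flat backup directory, and filenames without newlines; the path and all names below are illustrative, not the poster's actual function:
Code:
#!/bin/bash
# Illustrative only: keep all files from the last 2 days, one per day for the
# last week, one per week for the last month, one per month for the last year,
# and one per year beyond that.

backup_dir=${1:-/path/to/backups}     # hypothetical location
now=$(date +%s)
declare -A kept                       # one entry per day/week/month/year bucket already kept

# Newest first, so the survivor in each bucket is the most recent file.
while read -r mtime path; do
    secs=${mtime%.*}                          # strip fractional seconds from %T@
    age_days=$(( (now - secs) / 86400 ))

    if   (( age_days <= 2 ));   then continue                        # keep everything this recent
    elif (( age_days <= 7 ));   then bucket=day-$(date -d "@$secs" +%Y%m%d)
    elif (( age_days <= 31 ));  then bucket=week-$(date -d "@$secs" +%G%V)
    elif (( age_days <= 366 )); then bucket=month-$(date -d "@$secs" +%Y%m)
    else                             bucket=year-$(date -d "@$secs" +%Y)
    fi

    if [[ -n ${kept[$bucket]} ]]; then
        echo rm -- "$path"                    # dry run; drop the echo once it looks right
    else
        kept[$bucket]=$path
    fi
done < <(find "$backup_dir" -maxdepth 1 -type f -printf '%T@ %p\n' | sort -rn)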
Have you looked at the -exec, -mtime, and similar options of the find command?
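For instance, something along these lines (the path and the 30-day threshold are only illustrative):
Code:
# delete backups whose contents were last modified more than 30 days ago
find /path/to/backups -type f -mtime +30 -exec rm {} +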
|
Some possible logic:
Code:
BASENAME = SomeFileName |
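A rough bash rendering of that tag-and-rename idea, assuming (as a later reply suggests) that AN, MO and WK mark the yearly, monthly and weekly survivors and that the tag is decided from today's date; these are illustrative guesses, not CollieJim's actual pseudocode:
Code:
#!/bin/bash
# Tag today's backup so it can survive later purges: AN on Jan 1st,
# MO on the 1st of each month, WK on Mondays.  Illustrative only.

BASENAME=SomeFileName            # e.g. "$(hostname)-$(date +%Y%m%d)"

if   [ "$(date +%m%d)" = 0101 ]; then TAG=AN     # one file per year
elif [ "$(date +%d)"   = 01 ];   then TAG=MO     # one file per month
elif [ "$(date +%u)"   = 1 ];    then TAG=WK     # one file per week
else                                  TAG=""     # ordinary daily file
fi

[ -n "$TAG" ] && mv "$BASENAME" "${BASENAME}.${TAG}"

# The purge then only needs the tag and the age: e.g. delete untagged files
# older than 7 days, .WK files older than ~31 days, .MO files older than
# ~366 days, and keep .AN files forever.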
CollieJim's pseudo code looks good. You have to decide on Day of Week, Day of Month, Month of Year values, etc.
|
CollieJim's pseudocode looks promising, though I don't fully understand how it would be implemented. In the first part, are you suggesting I should modify the filenames of the files?
Code:
if day of month == 1

With this snippet I can sort the files on their modification date:
Code:
#Select and sort all files

Code:
#Delete files

I have also thought about the approach with find, using the -mtime option:
Code:
find /path/to/files* -mtime +5 -exec rm {} \;

Any suggestions on how I should proceed? If I have misunderstood CollieJim's code then please help me understand what he means :) |
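One common pattern for the select-and-sort and delete steps (illustrative paths; assumes GNU find/sort and filenames without newlines) is to print each file with its modification time and sort on that field:
Code:
# Select and sort all files, oldest first, keyed on mtime as an epoch stamp
find /path/to/files -maxdepth 1 -type f -printf '%T@ %p\n' | sort -n

# Delete everything except the 10 newest files (dry run: remove the echo)
find /path/to/files -maxdepth 1 -type f -printf '%T@ %p\n' |
    sort -rn | tail -n +11 | cut -d' ' -f2- |
    while read -r f; do echo rm -- "$f"; done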
Could you use something like the Towers of Hanoi backup rotation scheme? There is a shell script that claims to implement it here.
|
Quote:
I also came across information about Rsnapshot. It's a utility that does what I want automatically, so I might base the whole backup system on that instead. Suggestions? |
I expected BASENAME to be derived from a hostname or username plus a timestamp, among other possibilities. That way each file is unique but grouped by tag (AN, WK, MO).
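For example (illustrative only):
Code:
# unique per run, grouped by host; the AN/WK/MO tag is appended separately
BASENAME="$(hostname)-$(date +%Y%m%d-%H%M%S)"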
|
rsnapshot couldn't be used in the way I needed it to, so I've kept trying to find a solution myself. This is what I've come up with:
Code:
#!/bin/bash

I've used this script to generate files to test with:
Code:
#!/bin/bash |
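For testing a purge script against a fake history, back-dated files can be generated with GNU touch -d; a minimal sketch (directory and file names are illustrative):
Code:
#!/bin/bash
# Create one empty file per day for roughly the last 3 years,
# with matching modification times.
mkdir -p /tmp/backup_test
for days_ago in $(seq 0 1100); do
    stamp=$(date -d "$days_ago days ago" +%Y%m%d)
    touch -d "$days_ago days ago" "/tmp/backup_test/backup-$stamp"
done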
Rather than running the stat command on each individual file, I might be tempted to do something like this:
Code:
ls -ltd --time-style full-iso | ( read modes links owner group size date time utc_offset file_name |
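Spelled out a little further, that approach might look like this (illustrative; it relies on GNU ls output format and breaks on filenames containing newlines, which is one reason find -printf is often preferred):
Code:
ls -ltd --time-style=full-iso /path/to/backups/* |
while read -r modes links owner group size date time utc_offset file_name; do
    # $date is YYYY-MM-DD; decide here whether this file's bucket already
    # has a survivor, and delete the file if it does
    echo "$date $file_name"
done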
Just use find, this is what it is for.
Code:
# all files between 25 and 35 days old to maximum depth of 2. |
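Such a selection could be written, for instance, as follows (path and thresholds are illustrative; -daystart measures ages from the start of today rather than from exact 24-hour periods):
Code:
# all files between 25 and 35 days old, to a maximum depth of 2
find /path/to/backups -maxdepth 2 -daystart -type f -mtime +25 -mtime -35 -print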
Quote:
Just curious though, will find require more resources? |
At some point, whatever tool you use is going to have to walk the filesystem, whether it is the shell doing it through wildcards or whether it is find.
find's role is to do just that and, while I don't have any data to back me up, I would be surprised if it isn't optimised. BTW, if you want to look for an alternative, stat is useful for its variety of output. You can parse the output quite easily to get attributes you want. But I would use find if it was me. |
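For example, GNU stat's format strings make it easy to pull out just the attributes you want:
Code:
# modification time as seconds since the epoch, followed by the file name
stat -c '%Y %n' /path/to/backups/*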
Quote:
Code:
FILES="$(find $TRGT_DIR* -daystart \( -mtime +$DATE_RM_THRESHOLD -a -mtime -$DATE_RM_LIMIT \) \+)" |
You haven't given find an -exec action: -exec some_command {} \+
{} is a placeholder for all the files that find finds. + means pass them all through at once. |
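So the earlier snippet would become something like this (variables as in the original; the first line is a dry run to check what matches):
Code:
# first see what would be removed ...
find $TRGT_DIR* -daystart \( -mtime +$DATE_RM_THRESHOLD -a -mtime -$DATE_RM_LIMIT \) -print
# ... then let find hand the whole batch to rm in one go
find $TRGT_DIR* -daystart \( -mtime +$DATE_RM_THRESHOLD -a -mtime -$DATE_RM_LIMIT \) -exec rm -- {} +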