[SOLVED] How can I purge files logarithmically by modification date (BASH)?
Hi!
Today I run backups regularly and purge backups older than a specific date. What I would like instead is to keep all files from the last two days, one file per day for the last week, one file per week for the last month, one file per month for the last year, and one file per year beyond that.
I don't fully understand what logic I should implement to achieve this. Can anyone give me pointers on how to implement it, and maybe suggest packages that could help?
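One way to frame the scheme described above is as a mapping from a file's age to a retention bucket, keeping one file per bucket. A minimal sketch of that mapping, assuming the boundaries above (the `bucket_for_age` helper name is illustrative, not from any package):

```shell
#!/bin/bash
# Map a file age (in days) to a retention bucket label.
# Keeping one file per bucket gives the desired thinning:
# everything for 2 days, daily for a week, weekly for a month,
# monthly for a year, then yearly.
bucket_for_age ()
{
    local age_days=$1
    if   [ "$age_days" -le 2 ];   then echo "all-$age_days"        # keep every file
    elif [ "$age_days" -le 7 ];   then echo "day-$age_days"
    elif [ "$age_days" -le 30 ];  then echo "week-$((age_days / 7))"
    elif [ "$age_days" -le 365 ]; then echo "month-$((age_days / 30))"
    else                               echo "year-$((age_days / 365))"
    fi
}
```

Keeping the first (newest) file seen per bucket and deleting the rest then implements the whole policy in one pass.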
What I have achieved so far is this:
Code:
smart_rm ()
{
    #If no directory has been specified, exit
    if [ -z "$1" ]; then
        echo "$ISO_DATETIME [ERROR]: You must specify a directory to clean."
        return 1
    fi
    local TRGT_DIR=$1
    #Target must be a directory
    if [ ! -d "$TRGT_DIR" ]; then
        echo "$ISO_DATETIME [ERROR]: The target must exist and be a directory."
        return 1
    fi
    #Make sure that the path ends with /
    if [ "${TRGT_DIR#${TRGT_DIR%?}}" != "/" ]; then
        TRGT_DIR="${TRGT_DIR}/"
    fi
    #Select and sort all files, newest first
    local FILES
    for i in $(ls -t "$TRGT_DIR")
    do
        FILES=("${FILES[@]}" "${TRGT_DIR}${i}")
    done
    #Delete files
    local FILES_TO_KEEP
    local FILES_FROM_LAST_WEEK
    for i in "${FILES[@]}"
    do
        local MOD_DATE=$(stat -c %y "$i")
        MOD_DATE=$(date -d "${MOD_DATE:0:10}" +%s)
        #If the file has been modified within two days we keep it
        if [ "$MOD_DATE" -gt "$(date -d "2 days ago" +%s)" ]; then
            FILES_TO_KEEP=("${FILES_TO_KEEP[@]}" "$i")
        fi
        #WHAT NOW?!?!
    done
}
BASENAME = SomeFileName
if day of month == 1
    BASENAME = $BASENAME + "_MO"
if day of month == 1 and month == 6 then
    BASENAME = $BASENAME + "_AN"
if day of week == MONDAY
    BASENAME = $BASENAME + "_WK"

do backups

for each backup file
    if *AN*
        continue
    else if *MO*
        if older than 1 year
            delete
        fi
        continue
    else if *WK*
        if older than 30 days
            delete
        fi
        continue
    else
        if older than 7 days
            delete
        fi
CollieJim's pseudocode looks promising, though I don't fully understand how it would be implemented. In the first part, are you suggesting I should modify the filenames of the files?
Code:
if day of month == 1
    BASENAME = $BASENAME + "_MO"
if day of month == 1 and month == 6 then
    BASENAME = $BASENAME + "_AN"
if day of week == MONDAY
    BASENAME = $BASENAME + "_WK"
Since I can't set a filename before the backup has been run (the script needs to purge backups that already exist), I don't understand how I should select the files. My problem is that I don't know how to select files within a certain time period and keep only one of them.
With this snippet I can sort the files on their modification date:
Code:
#Select and sort all files, newest first
local FILES
for i in $(ls -t "$TRGT_DIR")
do
    FILES=("${FILES[@]}" "${TRGT_DIR}${i}")
done
But I still get all files when I just want one for every day, one for every week, and so on. So my first attempt was to filter them afterwards:
Code:
#Delete files
local FILES_TO_KEEP
local FILES_FROM_LAST_WEEK
for i in "${FILES[@]}"
do
    local MOD_DATE=$(stat -c %y "$i")
    MOD_DATE=$(date -d "${MOD_DATE:0:10}" +%s)
    #If the file has been modified within two days we keep it
    if [ "$MOD_DATE" -gt "$(date -d "2 days ago" +%s)" ]; then
        FILES_TO_KEEP=("${FILES_TO_KEEP[@]}" "$i")
    fi
    #WHAT NOW?!?!
done
...But I have no idea what to do with it.
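One way to turn a loop like the one above into a selection is to derive a bucket number from each file's age and keep only the first file seen in each bucket; because the files are walked newest-first (`ls -t` order), bucket numbers only ever increase, so a single "last bucket" variable suffices. A sketch for the weekly case (the `select_weekly_deletions` helper name is mine; the same pattern generalises to daily, monthly, and yearly buckets):

```shell
#!/bin/bash
# For each week-sized age bucket, keep the first (newest) file
# seen and print the rest as deletion candidates.
select_weekly_deletions ()
{
    local dir=$1 file mtime bucket now last_bucket=-1
    now=$(date +%s)
    for file in $(ls -t "$dir"); do       # newest first; assumes no spaces in names
        mtime=$(stat -c %Y "$dir/$file")  # %Y: mtime as epoch seconds
        bucket=$(( (now - mtime) / (7 * 86400) ))
        if [ "$bucket" -eq "$last_bucket" ]; then
            echo "$dir/$file"             # bucket already has its keeper
        else
            last_bucket=$bucket           # newest file in a new bucket: keep it
        fi
    done
}
```

Feeding the printed candidates to `rm` (after a dry run) would then complete the purge for that granularity.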
I have also thought about using find with the -mtime option:
Code:
find /path/to/files* -mtime +5 -exec rm {} \;
At the moment it seems like the most reasonable option. I guess I would still need to compare against each file's modification date to get a date range, and I would also like to sort the files on modification date so that I keep the newest copy from each week, etc.
Any suggestions on how I should proceed? If I've misunderstood CollieJim's code, then please help me understand what he means.
That looks interesting! Thanks, I'll look into that!
I also came across information about Rsnapshot. It's a utility that does what I want automatically, so I might base the whole backup system on that instead. Suggestions?
I expected basename to be derived from a hostname or username and timestamp, among other possibilities. That way each is unique but grouped by tag (AN, WK, MO).
rsnapshot couldn't be used in the way I needed it to, so I've kept trying to find a solution myself. This is what I've come up with:
Code:
#!/bin/bash
smart_rm ()
{
#If wrong number of parameters been specified exit
if [ -z "$1" ]; then
echo "$ISO_DATETIME [ERROR]: You must specify a directory to clean."
return 1
fi
local TRGT_DIR=$1
#Target must be a directory
if [ ! -d "$TRGT_DIR" ]; then
echo "$ISO_DATETIME [ERROR]: The target must exist and be a directory."
return 1
fi
#Make sure that the path ends with /
if [ "${TRGT_DIR#${TRGT_DIR%?}}" != "/" ]; then
TRGT_DIR="${TRGT_DIR}/"
fi
#Files to delete
local FILES_TO_DELETE
#Set a minimum age for files to be deleted
local DATE_RM_THRESHOLD=2
#Create the controller for found files
local FOUND_ONE=1
COUNTER=0
#Loop as long as there are files to examine
for FILE in $(ls -t "$TRGT_DIR")
do
#Get the file's modification date
MTIME=$(stat -c %Y "$TRGT_DIR$FILE") #%Y: modification time as epoch seconds
#Find one to save for every day the last 7 days
if [ $DATE_RM_THRESHOLD -le 7 ]; then
#Get date range
DAY_END=$(date -d "$DATE_RM_THRESHOLD days ago" +%s)
DAY_START=$(($DAY_END-60*60*24))
#If the file's modification time is earlier than our threshold we push it back one day
if [ $MTIME -lt $DAY_END ]; then
DATE_RM_THRESHOLD=$(($DATE_RM_THRESHOLD+1))
FOUND_ONE=1
fi
#Have we found one to keep for this day?
if [ $FOUND_ONE -eq 1 ] && [ $MTIME -ge $DAY_START ] && [ $MTIME -lt $DAY_END ]; then
FOUND_ONE=0
echo "DAY"
echo "$FILE"
else
FILES_TO_DELETE=("${FILES_TO_DELETE[@]}" "$TRGT_DIR$FILE")
fi
fi
#Find one to save for every week the last 4 weeks
if [ $DATE_RM_THRESHOLD -gt 7 ] && [ $DATE_RM_THRESHOLD -le $((7*4)) ]; then
#Get date range
WEEK_END=$(date -d "$DATE_RM_THRESHOLD days ago" +%s)
WEEK_START=$(($WEEK_END-60*60*24*7))
#If the file's modification time is earlier than our threshold we push it back one week
if [ $MTIME -lt $WEEK_END ]; then
DATE_RM_THRESHOLD=$(($DATE_RM_THRESHOLD+7))
FOUND_ONE=1
fi
#Have we found one to keep for this week?
if [ $FOUND_ONE -eq 1 ] && [ $MTIME -ge $WEEK_START ] && [ $MTIME -lt $WEEK_END ]; then
FOUND_ONE=0
echo "WEEK"
echo "$FILE"
else
FILES_TO_DELETE=("${FILES_TO_DELETE[@]}" "$TRGT_DIR$FILE")
fi
fi
#Find one to save for every month the last 12 months
if [ $DATE_RM_THRESHOLD -gt $((7*4)) ] && [ $DATE_RM_THRESHOLD -le $((30*12)) ]; then
#Get date range
MONTH_END=$(date -d "$DATE_RM_THRESHOLD days ago" +%s)
MONTH_START=$(($MONTH_END-60*60*24*30))
#If the file's modification time is earlier than our threshold we push it back one month
if [ $MTIME -lt $MONTH_END ]; then
DATE_RM_THRESHOLD=$(($DATE_RM_THRESHOLD+30))
FOUND_ONE=1
fi
#Have we found one to keep for this month?
if [ $FOUND_ONE -eq 1 ] && [ $MTIME -ge $MONTH_START ] && [ $MTIME -lt $MONTH_END ]; then
FOUND_ONE=0
echo "MONTH"
echo "$FILE"
else
FILES_TO_DELETE=("${FILES_TO_DELETE[@]}" "$TRGT_DIR$FILE")
fi
fi
#Find one to save for every year
if [ $DATE_RM_THRESHOLD -gt $((30*12)) ]; then
#Get date range
YEAR_END=$(date -d "$DATE_RM_THRESHOLD days ago" +%s)
YEAR_START=$(($YEAR_END-60*60*24*365))
#If the file's modification time is earlier than our threshold we push it back one year
if [ $MTIME -lt $YEAR_END ]; then
DATE_RM_THRESHOLD=$(($DATE_RM_THRESHOLD+365))
FOUND_ONE=1
fi
#Have we found one to keep for this year?
if [ $FOUND_ONE -eq 1 ] && [ $MTIME -ge $YEAR_START ] && [ $MTIME -lt $YEAR_END ]; then
FOUND_ONE=0
echo "YEAR"
echo "$FILE"
else
FILES_TO_DELETE=("${FILES_TO_DELETE[@]}" "$TRGT_DIR$FILE")
fi
fi
done
#Show result
#for FILE in ${FILES_TO_DELETE[@]}
#do
# echo $FILE
#done
#Delete the selected files
for FILE in "${FILES_TO_DELETE[@]}"
do
echo "$FILE"
rm -R "$FILE"
done
}
It "almost" works! The first run goes as it should, but on every subsequent run it deletes one more file even though there should be no more files to delete.
I've used this script to generate files to test with:
Code:
#!/bin/bash
DAYS=0
DATE=$(date -d "$DAYS days ago" +%Y-%m-%d)
while [ $DAYS -le 1200 ]
do
DATE=$(date -d "$DAYS days ago" +%Y-%m-%d)
touch "/home/niklas/test/$DATE.txt"
touch -d "$DATE" "/home/niklas/test/$DATE.txt"
DAYS=$(($DAYS+1))
done
echo "You've just created a whole lot of files!"
Rather than running the stat command on each individual file, I might be tempted to do something like this:
Code:
ls -lt --time-style=full-iso | tail -n +2 | ( read modes links owner group size date time utc_offset file_name
while [ $? -eq 0 ]
do
date -d "$date $time $utc_offset" +%s
read modes links owner group size date time utc_offset file_name
done
)
# all files between 25 and 35 days old to maximum depth of 2.
FILES="$(find . \( -mtime +25 -a -mtime -35 \) -maxdepth 2 -type f -exec /bin/ls -1 {} \+)"
for i in "$FILES" ; do /bin/ls -l "$i" ; done
An old saying in the software field is "good enough is good enough". IOW, it is easy to obsess on getting This done exactly right, and That done exactly right, etc. Really, good enough is ok. If you have some file(s) that is about 30 days old, that is good enough. Rinse and repeat for 7 days, 90 days, 180 days, etc.
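The range test above can be extended to keep only the newest match in the window: print each match with its epoch mtime, sort numerically, and drop the first line from the deletion list. A sketch using GNU find's `-printf` (the 25-35 day window is just the example's values; `echo` is left in front of `rm` as a dry-run guard):

```shell
#!/bin/bash
# Delete all but the newest file in a given age window.
# %T@ prints mtime as epoch seconds (GNU find); sort -rn puts
# the newest first, and tail -n +2 removes it from the list,
# leaving only the older files in the bucket.
find . -maxdepth 2 -type f -mtime +25 -mtime -35 -printf '%T@ %p\n' |
    sort -rn |
    tail -n +2 |
    cut -d' ' -f2- |
    while read -r f; do
        echo rm "$f"        # drop the echo once the list looks right
    done
```

Running one such pipeline per bucket (week, month, year) keeps the script close to the "good enough" spirit above.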
I tried find first, but I didn't know it could find files modified within a date range larger than one day (find -mtime x), and I never thought you could combine two of the same test. Thank you! This will make the code much simpler!
Just curious though, will find require more resources?
At some point, whatever tool you use is going to have to walk the filesystem, whether it is the shell doing it through wildcards or whether it is find.
find's role is to do just that and, while I don't have any data to back me up, I would be surprised if it isn't optimised.
BTW, if you want to look for an alternative, stat is useful for its variety of output. You can parse the output quite easily to get attributes you want. But I would use find if it was me.
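On the stat route mentioned above: GNU stat accepts multiple files in one call, so a whole directory can be listed as "epoch-mtime name" pairs without parsing ls. A small sketch (`%Y` and `%n` are documented GNU stat format codes; the glob and variable names are mine):

```shell
#!/bin/bash
# One stat call for the whole directory instead of one per file:
# %Y prints mtime as epoch seconds, %n the file name.
# Sorting numerically on the first field orders entries newest-first.
stat -c '%Y %n' ./* 2>/dev/null | sort -rn
# Picking just the newest entry's name:
newest=$(stat -c '%Y %n' ./* 2>/dev/null | sort -rn | head -n 1 | cut -d' ' -f2-)
echo "Newest: $newest"
```

The sorted "mtime name" list is also a convenient input for the bucket-selection loops earlier in the thread.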
I'm trying the "find" solution now but I keep getting an error I don't know how to get rid of: