LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
01-15-2012, 12:58 PM   #1
gunnarflax (LQ Newbie)
How can I purge files logarithmically by modification date (BASH)?


Hi!

Today I run backups regularly and purge any backup older than a specific date. What I would like instead is to keep all files from the last two days, one file per day from the last week, one file per week from the last month, one file per month from the last year, and one file per year beyond that.

I don't fully understand what logic I should implement to achieve something like this. Can anyone give me pointers on how to implement it, and perhaps suggest packages that could help?

What I have achieved so far is this:

Code:
smart_rm ()
{
	#If no directory has been specified, exit
	if [ -z "$1" ]; then
		echo "$ISO_DATETIME [ERROR]: You must specify a directory to clean."
		return 1
	fi

	local TRGT_DIR=$1

	#Target must be a directory
	if [ ! -d "$TRGT_DIR" ]; then
		echo "$ISO_DATETIME [ERROR]: The target must exist and be a directory."
		return 1
	fi

	#Make sure that the path ends with /
	if [ "${TRGT_DIR#${TRGT_DIR%?}}" != "/" ]; then
		TRGT_DIR="${TRGT_DIR}/"
	fi

	#Select and sort all files (newest first)
	local FILES

	for i in $(ls -t "$TRGT_DIR")
	do
		FILES=("${FILES[@]}" "${TRGT_DIR}${i}")
	done

	#Delete files
	local FILES_TO_KEEP
	local FILES_FROM_LAST_WEEK

	for i in "${FILES[@]}"
	do
		local MOD_DATE=$(stat -c %y "$i")
		MOD_DATE=$(date -d "${MOD_DATE:0:10}" +%s)

		#If the file has been modified within two days we keep it
		if [ "$MOD_DATE" -gt "$(date -d "2 days ago" +%s)" ]; then
			FILES_TO_KEEP=("${FILES_TO_KEEP[@]}" "$i")
		fi

#WHAT NOW?!?!

	done
}
Thanks!
 
01-15-2012, 11:27 PM   #2
kakaka (Member)
Have you looked at the -exec, -mtime, and similar options of the find command?
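For example, a rough sketch on throwaway files (the directory and the 7-day cutoff here are only placeholders):

```shell
# Create a scratch directory with one old and one new file
dir=$(mktemp -d)
touch -d "10 days ago" "$dir/old.txt"
touch "$dir/new.txt"

# Dry run: list files older than 7 days before deleting anything
find "$dir" -maxdepth 1 -type f -mtime +7 -print

# Once the selection looks right, swap -print for a delete action:
# find "$dir" -maxdepth 1 -type f -mtime +7 -exec rm -- {} +
```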
 
01-16-2012, 12:16 AM   #3
CollieJim (Member)
Some possible logic:
Code:
BASENAME = SomeFileName


if day of month == 1
   BASENAME = $BASENAME + "_MO"
if day of month == 1  and  month == 6  then
   BASENAME = $BASENAME + "_AN"
if day of week == MONDAY
   BASENAME = $BASENAME + "_WK"

do backups

for each backup file
   if *AN*
       continue
   else if *MO* 
       if older than 1 year
           delete
       fi
       continue
   else if *WK*
       if older than 30 days
           delete
       fi
       continue
   else
       if older than 7 days
           delete
       fi
 
01-16-2012, 12:40 AM   #4
devUnix (Member)
CollieJim's pseudo code looks good. You have to decide on Day of Week, Day of Month, Month of Year values, etc.
 
01-16-2012, 03:38 AM   #5
gunnarflax (LQ Newbie, Original Poster)
CollieJim's pseudo code looks promising, though I don't fully understand how it would be implemented. In the first part, are you suggesting I should modify the filenames of the files?

Code:
if day of month == 1
   BASENAME = $BASENAME + "_MO"
if day of month == 1  and  month == 6  then
   BASENAME = $BASENAME + "_AN"
if day of week == MONDAY
   BASENAME = $BASENAME + "_WK"
Since I can't set a filename before the backup has been run (I need the script to purge backups that already exist), I don't understand how I would be able to select the files. My problem is that I don't know how to select files within a certain time period and keep only one of them.

With this snippet I can sort the files on their modification date:

Code:
#Select and sort all files
local FILES

for i in $(ls -t "$TRGT_DIR")
do
	FILES=("${FILES[@]}" "${TRGT_DIR}${i}")
done
But I still get all files when I just want one for every day, one for every week, etc. So my first attempt was to try to filter these afterwards:

Code:
#Delete files
local FILES_TO_KEEP
local FILES_FROM_LAST_WEEK

for i in "${FILES[@]}"
do
	local MOD_DATE=$(stat -c %y "$i")
	MOD_DATE=$(date -d "${MOD_DATE:0:10}" +%s)

	#If the file has been modified within two days we keep it
	if [ "$MOD_DATE" -gt "$(date -d "2 days ago" +%s)" ]; then
		FILES_TO_KEEP=("${FILES_TO_KEEP[@]}" "$i")
	fi

#WHAT NOW?!?!

done
...But I have no idea what to do with it.

I have also thought about the approach with find and using the -mtime option:

Code:
find /path/to/files* -mtime +5 -exec rm {} \;
At the moment it seems like the most reasonable option. I guess I would still need to compare it to a modification date on the file to get a date range. And I would also like to be able to sort them on modification date so that I keep the newest copy from the week, etc.

Any suggestions on how I should proceed? If I've misunderstood CollieJim's code then please help me understand what he means.
 
01-16-2012, 05:39 AM   #6
catkin (LQ 5k Club)
Could you use something like the Towers of Hanoi backup rotation scheme? There is a shell script that says it implements it here.
 
01-16-2012, 06:46 AM   #7
gunnarflax (LQ Newbie, Original Poster)
Quote:
Originally Posted by catkin
Could you use something like the Towers of Hanoi backup rotation scheme? There is a shell script that says it implements it here.
That looks interesting! Thanks, I'll look into that!

I also came across information about Rsnapshot. It's a utility that does what I want automatically, so I might base the whole backup system on that instead. Suggestions?
 
01-16-2012, 07:26 AM   #8
CollieJim (Member)
I expected the basename to be derived from a hostname or username plus a timestamp, among other possibilities. That way each name is unique but grouped by its tag (AN, WK, MO).
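In other words, the tag would just be part of a unique file name, e.g. (a hypothetical naming scheme along those lines):

```shell
tag=""                                # would be set by date rules as in post #3
[ "$(date +%u)" = "1" ] && tag="_WK"  # Monday: mark as the weekly copy

name="$(hostname)_$(date +%Y%m%d_%H%M%S)${tag}.tar.gz"
echo "$name"
```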
 
01-16-2012, 03:48 PM   #9
gunnarflax (LQ Newbie, Original Poster)
rsnapshot couldn't be used in the way I needed it to, so I've kept trying to find a solution myself. This is what I've come up with:

Code:
#!/bin/bash

smart_rm ()
{
	#If no directory has been specified, exit
	if [ -z "$1" ]; then
		echo "$ISO_DATETIME [ERROR]: You must specify a directory to clean."
		return 1
	fi

	local TRGT_DIR=$1

	#Target must be a directory
	if [ ! -d "$TRGT_DIR" ]; then
		echo "$ISO_DATETIME [ERROR]: The target must exist and be a directory."
		return 1
	fi

	#Make sure that the path ends with /
	if [ "${TRGT_DIR#${TRGT_DIR%?}}" != "/" ]; then
		TRGT_DIR="${TRGT_DIR}/"
	fi

	#Files to delete
	local FILES_TO_DELETE
	#Set a minimum age for files to be deleted
	local DATE_RM_THRESHOLD=2
	#Create the controller for found files
	local FOUND_ONE=1

	COUNTER=0

	#Loop as long as there are files to examine
	for FILE in $(ls -t "$TRGT_DIR")
	do
		#Get the file's modification date
		MTIME=$(date -d "$(stat -c %y "$TRGT_DIR$FILE")" +%s)

		#Find one to save for every day the last 7 days
		if [ $DATE_RM_THRESHOLD -le 7 ]; then

			#Get date range
			DAY_END=$(date -d "$DATE_RM_THRESHOLD days ago" +%s)
			DAY_START=$(($DAY_END-60*60*24))

			#If the file's modification time is earlier than our threshold we push it back one day
			if [ $MTIME -lt $DAY_END ]; then
				DATE_RM_THRESHOLD=$(($DATE_RM_THRESHOLD+1))
				FOUND_ONE=1
			fi

			#Have we found one to keep for this day?
			if [ $FOUND_ONE -eq 1 ] && [ $MTIME -ge $DAY_START ] && [ $MTIME -lt $DAY_END ]; then
				FOUND_ONE=0
				echo "DAY"
				echo "$FILE"
			else
				FILES_TO_DELETE=("${FILES_TO_DELETE[@]}" "$TRGT_DIR$FILE")
			fi
		fi
		
		#Find one to save for every week the last 4 weeks
		if [ $DATE_RM_THRESHOLD -gt 7 ] && [ $DATE_RM_THRESHOLD -le $((7*4)) ]; then
			
			#Get date range
			WEEK_END=$(date -d "$DATE_RM_THRESHOLD days ago" +%s)
			WEEK_START=$(($WEEK_END-60*60*24*7))

			#If the file's modification time is earlier than our threshold we push it back one week
			if [ $MTIME -lt $WEEK_END ]; then
				DATE_RM_THRESHOLD=$(($DATE_RM_THRESHOLD+7))
				FOUND_ONE=1
			fi

			#Have we found one to keep for this week?
			if [ $FOUND_ONE -eq 1 ] && [ $MTIME -ge $WEEK_START ] && [ $MTIME -lt $WEEK_END ]; then
				FOUND_ONE=0
				echo "WEEK"
				echo "$FILE"
			else
				FILES_TO_DELETE=("${FILES_TO_DELETE[@]}" "$TRGT_DIR$FILE")
			fi	
		fi

		#Find one to save for every month the last 12 months
		if [ $DATE_RM_THRESHOLD -gt $((7*4)) ] && [ $DATE_RM_THRESHOLD -le $((30*12)) ]; then

			#Get date range
			MONTH_END=$(date -d "$DATE_RM_THRESHOLD days ago" +%s)
			MONTH_START=$(($MONTH_END-60*60*24*30))

			#If the file's modification time is earlier than our threshold we push it back one month
			if [ $MTIME -lt $MONTH_END ]; then
				DATE_RM_THRESHOLD=$(($DATE_RM_THRESHOLD+30))
				FOUND_ONE=1
			fi

			#Have we found one to keep for this month?
			if [ $FOUND_ONE -eq 1 ] && [ $MTIME -ge $MONTH_START ] && [ $MTIME -lt $MONTH_END ]; then
				FOUND_ONE=0
				echo "MONTH"
				echo "$FILE"
			else
				FILES_TO_DELETE=("${FILES_TO_DELETE[@]}" "$TRGT_DIR$FILE")
			fi	
		fi

		#Find one to save for every year
		if [ $DATE_RM_THRESHOLD -gt $((30*12)) ]; then
			
			#Get date range
			YEAR_END=$(date -d "$DATE_RM_THRESHOLD days ago" +%s)
			YEAR_START=$(($YEAR_END-60*60*24*365))

			#If the file's modification time is earlier than our threshold we push it back one year
			if [ $MTIME -lt $YEAR_END ]; then
				DATE_RM_THRESHOLD=$(($DATE_RM_THRESHOLD+365))
				FOUND_ONE=1
			fi

			#Have we found one to keep for this year?
			if [ $FOUND_ONE -eq 1 ] && [ $MTIME -ge $YEAR_START ] && [ $MTIME -lt $YEAR_END ]; then
				FOUND_ONE=0
				echo "YEAR"
				echo "$FILE"
			else
				FILES_TO_DELETE=("${FILES_TO_DELETE[@]}" "$TRGT_DIR$FILE")
			fi
		fi
	done

	#Show result
	#for FILE in ${FILES_TO_DELETE[@]}
	#do
	#	echo $FILE
	#done

	#Delete the selected files
	for FILE in "${FILES_TO_DELETE[@]}"
	do
		echo "$FILE"
		rm -R "$FILE"
	done
}
It "almost" works! The first run everything goes as it should, but on every subsequent run it deletes one more file even though there should be no more files to delete.

I've used this script to generate files to test with:

Code:
#!/bin/bash

DAYS=0

while [ $DAYS -le 1200 ]
do
	DATE=$(date -d "$DAYS days ago" +%Y-%m-%d)

	touch "/home/niklas/test/$DATE.txt"
	touch -d "$DATE" "/home/niklas/test/$DATE.txt"

	DAYS=$(($DAYS+1))
done

echo "You've just created a whole lot of files!"
Any suggestions or improvements to my code?
 
01-16-2012, 07:37 PM   #10
kakaka (Member)
Rather than running the stat command on each individual file, I might be tempted to do something like this:

Code:
ls -lt --time-style full-iso | tail -n +2 | ( read  modes links owner group size date time utc_offset file_name
while [ $? -eq 0 ]
    do
        date -d "$date $time $utc_offset" +%s
        read   modes links owner group size date time utc_offset file_name
    done
)
 
01-16-2012, 08:13 PM   #11
padeen (Member)
Just use find; this is what it's for.

Code:
# all files between 25 and 35 days old, to a maximum depth of 2.
FILES="$(find . -maxdepth 2 \( -mtime +25 -a -mtime -35 \) -type f -exec /bin/ls -1 {} \+)"
for i in $FILES ; do /bin/ls -l "$i" ; done
An old saying in the software field is "good enough is good enough". IOW, it is easy to obsess on getting This done exactly right, and That done exactly right, etc. Really, good enough is ok. If you have some file(s) that is about 30 days old, that is good enough. Rinse and repeat for 7 days, 90 days, 180 days, etc.
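The "rinse and repeat" could be wrapped in a small helper: one find call per age bucket, keeping only the newest file in the bucket. A sketch (GNU find assumed for -printf; the bucket bounds below are arbitrary):

```shell
purge_bucket () {
    # $1 = directory, $2/$3 = bucket bounds in days; keep the newest
    # file whose age falls between them and delete the rest
    find "$1" -maxdepth 1 -type f -mtime +"$2" -mtime -"$3" \
        -printf '%T@ %p\n' | sort -rn | tail -n +2 | cut -d' ' -f2- |
    while IFS= read -r f; do
        rm -- "$f"
    done
}

# Demo on throwaway files aged 10, 15 and 20 days
dir=$(mktemp -d)
for d in 10 15 20; do touch -d "$d days ago" "$dir/backup_$d"; done
purge_bucket "$dir" 7 30   # keeps only the newest file in the 7-30 day bucket
ls "$dir"
```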
 
01-17-2012, 02:45 AM   #12
gunnarflax (LQ Newbie, Original Poster)
Quote:
Originally Posted by padeen
Just use find, this is what it is for.

Code:
# all files between 25 and 35 days old, to a maximum depth of 2.
FILES="$(find . -maxdepth 2 \( -mtime +25 -a -mtime -35 \) -type f -exec /bin/ls -1 {} \+)"
for i in $FILES ; do /bin/ls -l "$i" ; done
An old saying in the software field is "good enough is good enough". IOW, it is easy to obsess on getting This done exactly right, and That done exactly right, etc. Really, good enough is ok. If you have some file(s) that is about 30 days old, that is good enough. Rinse and repeat for 7 days, 90 days, 180 days, etc.
I tried find first, but I didn't know it could find files modified within a date range larger than one day (find -mtime x). I never thought you could combine the same test twice. Thank you! This will make the code much simpler!

Just curious though, will find require more resources?
 
01-17-2012, 07:13 AM   #13
padeen (Member)
At some point, whatever tool you use is going to have to walk the filesystem, whether it is the shell doing it through wildcards or whether it is find.

find's role is to do just that and, while I don't have any data to back me up, I would be surprised if it isn't optimised.

BTW, if you want to look for an alternative, stat is useful for its variety of output. You can parse the output quite easily to get attributes you want. But I would use find if it was me.
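For example, GNU stat can print the mtime directly as seconds since the epoch (%Y), which avoids the date-string round-trip used earlier in the thread:

```shell
f=$(mktemp)
touch -d "3 days ago" "$f"

mtime=$(stat -c %Y "$f")              # modification time, epoch seconds
now=$(date +%s)
age_days=$(( (now - mtime) / 86400 ))
echo "$age_days days old"
```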
 
01-17-2012, 09:13 AM   #14
gunnarflax (LQ Newbie, Original Poster)
Quote:
Originally Posted by padeen
At some point, whatever tool you use is going to have to walk the filesystem, whether it is the shell doing it through wildcards or whether it is find.

find's role is to do just that and, while I don't have any data to back me up, I would be surprised if it isn't optimised.

BTW, if you want to look for an alternative, stat is useful for its variety of output. You can parse the output quite easily to get attributes you want. But I would use find if it was me.
I'm trying the "find" solution now but I keep getting an error that I don't know how to get rid of:

Code:
FILES="$(find $TRGT_DIR* -daystart \( -mtime +$DATE_RM_THRESHOLD -a -mtime -$DATE_RM_LIMIT \) \+)"

output:
find: paths must precede expression: +
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
How do I proceed? I can't find an answer on Google.
 
01-17-2012, 09:48 AM   #15
padeen (Member)
You haven't given find an -exec action: -exec some_command {} \+

{} is a placeholder for the files that find finds; the + means they are all passed to the command at once.
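Applied to the command from post #14, that would look something like this (same variable names as in the script above; the directory here is only a stand-in for testing):

```shell
TRGT_DIR=$(mktemp -d)                  # stand-in for the real backup directory
touch -d "10 days ago" "$TRGT_DIR/old-backup.txt"
DATE_RM_THRESHOLD=5
DATE_RM_LIMIT=30

# The stray "+" needs an action in front of it: -exec ... {} +
FILES=$(find "$TRGT_DIR" -daystart \
        \( -mtime +"$DATE_RM_THRESHOLD" -a -mtime -"$DATE_RM_LIMIT" \) \
        -type f -exec ls -1 {} +)
echo "$FILES"
```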
 
  


Tags: bash, purge, smart, ubuntu