LinuxQuestions.org


goodiemobster 10-02-2020 07:08 AM

deleting files in directories but with exclusions of some directories
 
Hi,
I'm trying to delete files older than 1 year on a server with more than 1000 vhosts.

location of the files:
/var/www/vhosts/CLIENT1/httpdocs/files/fotos/
/var/www/vhosts/CLIENT2/httpdocs/files/fotos/
...
/var/www/vhosts/CLIENT1000/httpdocs/files/fotos/

I used to do this with a simple:
find /var/www/vhosts/ -type f -name '*.jpg' -mtime +365 -exec rm {} \;

This worked very well, but now some clients don't want these files removed.

Is there a way to exclude multiple /CLIENTX/httpdocs/files/fotos directories?
It would be great if I could put these in a blacklist.txt or something, which I can simply adjust before running the removal command.

I'm new here and relatively new to Linux; if I did not post this in the right forum, my apologies!

pan64 10-02-2020 07:23 AM

https://stackoverflow.com/questions/...n-find-command

Turbocapitalist 10-02-2020 07:33 AM

If there is a pattern to the excluded directories such that you can make a relevant regular expression pattern, then you could use the -regex option:

Code:

find /var/www/vhosts/ -type f -name '*.jpg' -mtime +365 \
        -regextype posix-egrep \
        -not -regex '^/var/www/vhosts/CLIENT[0-9]+/httpdocs/files/fotos/.*' \
        -print

If that works, then append a -delete after the -print option. For the types of regular expression patterns supported, try find -regextype help or see the manual page.

Otherwise you can make a shell script to generate the find command.

Edit: There is an implied logical AND between all the find options. It doesn't have to be written but it is there.
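
A tiny sandbox run of that pattern, with a throwaway /tmp tree standing in for /var/www/vhosts (the -mtime test is left out so the freshly created demo files match):

```shell
# Build a disposable tree: one "excluded" client, one other directory.
root=$(mktemp -d)
mkdir -p "$root/CLIENT1/httpdocs/files/fotos" "$root/keepme/httpdocs/files/fotos"
touch "$root/CLIENT1/httpdocs/files/fotos/a.jpg" "$root/keepme/httpdocs/files/fotos/b.jpg"

# -regex matches the whole path, so the excluded client's files drop out.
matched=$(find "$root" -type f -name '*.jpg' \
        -regextype posix-egrep \
        -not -regex '.*/CLIENT[0-9]+/httpdocs/files/fotos/.*' \
        -print)
echo "$matched"
rm -rf "$root"
```

Only the file outside the CLIENT[0-9]+ tree survives the filter.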

goodiemobster 10-02-2020 08:06 AM

Thanks for the answers. I must say that unfortunately regex does not apply, because the client names are all different (like company names).

I'm also looking for a nice overview, so I can quickly see which clients are excluded, instead of one long line with all the directories strung together.
Mixing up some of the solutions you provided, would this work? (I will definitely test this myself in a testing environment, but a quick hint is always welcome :-)


find /var/www/vhosts/ -type f -name '*.jpg' -mtime +365 -exec rm {} \; \
-not \( -path /companyx -prune \) \
-not \( -path /firmZ -prune \) \
-not \( -path /othercompany12 -prune \) \
-not \( -path /somecompany \) \
-not \( -path /xyz \) \
-not \( -path /123 \) \
-not \( -path /list -prune \) \
-not \( -path /goes -prune \) \
-not \( -path /on -prune \)

Turbocapitalist 10-02-2020 08:52 AM

I'm not sure there's an easy way with find alone if you are reading a file containing excluded directories. You could pipe things through some other utilities, though, and the combined result may do what you want.

Code:

find /var/www/vhosts/ -type f -print0 \
        | grep --invert-match --null-data --extended-regexp --file directories.to.exclude.txt \
        | xargs --null echo rm

The file 'directories.to.exclude.txt' would then contain the directory patterns to exclude. It would be good to anchor each one to the start of the line with a caret ^ and to write them as absolute paths rather than relative paths.

Code:

cat << EOF > directories.to.exclude.txt
^/var/www/vhosts/companyx/
^/var/www/vhosts/firmZ/
^/var/www/vhosts/othercompany12/
^/var/www/vhosts/somecompany/
^/var/www/vhosts/xyz/
^/var/www/vhosts/123/
^/var/www/vhosts/list/
^/var/www/vhosts/goes/
^/var/www/vhosts/on/
EOF

See "man grep" and "man xargs" for shorter options.
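
Here is a sketch of the same pipeline with the -name/-mtime filters from your original command kept in, run against a throwaway sandbox (on the server the paths and the exclude patterns would of course be under /var/www/vhosts):

```shell
# Sandbox: two clients, both with files older than a year.
root=$(mktemp -d)
mkdir -p "$root/companyx/httpdocs/files/fotos" "$root/firmZ/httpdocs/files/fotos"
touch -d '400 days ago' "$root/companyx/httpdocs/files/fotos/old.jpg" \
                        "$root/firmZ/httpdocs/files/fotos/old.jpg"

# Exclude companyx, anchored as an absolute path.
printf '%s\n' "^$root/companyx/" > "$root/exclude.txt"

# NUL-separated pipeline: find selects, grep -v drops the excluded client,
# xargs would hand the rest to rm (echo keeps it a dry run here).
doomed=$(find "$root" -type f -name '*.jpg' -mtime +365 -print0 \
        | grep --invert-match --null-data --extended-regexp --file "$root/exclude.txt" \
        | xargs --null echo)
echo "$doomed"
rm -rf "$root"
```

Only firmZ's old file is left in the deletion list; drop the echo to actually remove.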

MadeInGermany 10-02-2020 08:58 AM

The following comes to my mind - untested.
Code:

skipdirs=(
/var/www/vhosts/companyx
/var/www/vhosts/firmZ
/var/www/vhosts/othercompany12
/var/www/vhosts/list
/var/www/vhosts/goes
/var/www/vhosts/on
)
oIFS=$IFS
IFS="
"
prunelist= or=
for d in "${skipdirs[@]}"
do
  prunelist+="$or
-path
$d
"
  or="-o"
done
[ -n "$or" ] && prunelist="-type
d
(
$prunelist
)
-prune
-o
"
# -path must match the full path as find prints it, hence the absolute
# names in skipdirs; drop the "echo" once the output looks right.
find /var/www/vhosts/ $prunelist -type f -name '*.jpg' -mtime +365 -atime +7 -exec echo rm {} \;
IFS=$oIFS


pan64 10-02-2020 09:13 AM

It looks to me like we need a client-specific setup/config and a script to process them one by one.
I would probably use something else instead of find and shell: perl/python/...
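
For instance, the per-client config could be as simple as a marker file in each vhost directory (a hypothetical sketch with made-up names, shown against a sandbox rather than /var/www/vhosts):

```shell
# Sandbox: clientA opts out of the cleanup via a marker file.
root=$(mktemp -d)
mkdir -p "$root/clientA" "$root/clientB"
touch "$root/clientA/.keep_photos"

kept=""
for dir in "$root"/*/
do
  # Skip any client that carries the opt-out marker.
  [ -e "$dir.keep_photos" ] && continue
  kept="$kept$(basename "$dir")"
done
echo "would clean: $kept"
rm -rf "$root"
```

The cleanup loop then only ever touches vhosts without the marker, and opting a client out is a single touch command.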

X-LFS-2010 10-04-2020 12:32 AM

> I'm trying to delete files older then 1 year on a server with +1000 vhosts.

#1 Be VERY CAREFUL. PCs are infamous for occasionally having the wrong time: either the PC clock is off at boot, or some files (that you moved or copied) have a wrong date (perhaps they were altered while the PC clock was off), etc.

In general: never do backups by time unless you don't care what is lost.

computersavvy 10-04-2020 10:27 AM

Quote:

Originally Posted by goodiemobster (Post 6171877)
Hi,
I'm trying to delete files older than 1 year on a server with more than 1000 vhosts.

location of the files:
/var/www/vhosts/CLIENT1/httpdocs/files/fotos/
/var/www/vhosts/CLIENT2/httpdocs/files/fotos/
...
/var/www/vhosts/CLIENT1000/httpdocs/files/fotos/

I used to do this with a simple:
find /var/www/vhosts/ -type f -name '*.jpg' -mtime +365 -exec rm {} \;

This worked very well, but now some clients don't want these files removed.

Is there a way to exclude multiple /CLIENTX/httpdocs/files/fotos directories?
It would be great if I could put these in a blacklist.txt or something, which I can simply adjust before running the removal command.

I'm new here and relatively new to Linux; if I did not post this in the right forum, my apologies!


It would seem to me very easy to create a file with a list of all the clients whose files should be deleted, in the format
CLIENT1
CLIENT2
CLIENT3
etc.

Yes, the initial creation of the file may take a while if done manually, but only a few minutes if done with a script that pulls the client names from the directory names, and maintenance should be easy.
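
That generation step could be a one-liner over the vhost directory names (shown here against a throwaway directory; on the server the root would be /var/www/vhosts):

```shell
# Sandbox stand-in for /var/www/vhosts with three client directories.
root=$(mktemp -d)
mkdir -p "$root/CLIENT1" "$root/CLIENT2" "$root/CLIENT3"

# One client name per line: just the basename of each top-level directory.
find "$root" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | sort > "$root/clientfile"
cat "$root/clientfile"
```

Opting a client out is then just deleting their line from the file.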

Then wrap the find command in a while loop which reads the client names from the file, one at a time, and for each client does the find and delete, similar to

Code:

while read -r CLIENT
do
  find /var/www/vhosts/"$CLIENT"/httpdocs/files/fotos/ -type f -name '*.jpg' -mtime +365 -exec rm {} \;
done < clientfile

Something like this would make it easy to add or remove clients from the list whose files would be deleted, with a simple edit of the client file.

chrism01 10-08-2020 01:03 AM

I was going to suggest basically the same thing: create a whitelist of those that DO need to be deleted and loop through it - much simpler than all that fancy negative matching; K.I.S.S. :)

This would also be a good time to add a cmd at the end to check the disk space, as the uncleared companies' files are going to eat up the disk ...
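
e.g. something along these lines tacked onto the end of the job (checking / here as a stand-in; on the server you would point df at whatever filesystem actually holds /var/www):

```shell
# Grab the "Capacity" column from POSIX-format df output for one filesystem.
usage=$(df -P / | awk 'NR==2 {print $5}')
echo "filesystem usage: $usage"
```

That percentage could then be compared against a threshold to mail a warning when the uncleared clients start filling the disk.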

