LinuxQuestions.org


emmalg 02-19-2013 05:10 AM

Elegant bash script needed to make new directories based on matched patterns
 
Hi,

I am challenging myself a bit here and could do with some help.

I have a directory full of files which have long names such as (made up example):
aa_bbbb_ccc1D__YYYYMMDD.....
aa_bbbb_ccc1F__YYYYMMDD....
aa_bbbb_ccc2D__YYYYMMDD....
aa_gggg_ccc1D__YYYYMMDD...
aa_gggg_ccc1F__YYYYMMDD...

etc. We ignore the _gggg_ files.

I was aiming for something which avoided looping over all the entries as there might be a lot, so starting with:

find -name 'aa_bbbb_ccc1D*' -exec ...

For each file found I want to find the YYYYMM date characters, so something like:

expr substr "$fname" 22 6 seems to work.
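
For reference, here is a small sketch of that extraction, run against the made-up name pattern above with a real date filled in. The file name and variable names are hypothetical. In this example the date starts at character 16, not 22, because the real names are longer than the made-up ones. It assumes GNU expr, which provides substr, and shows the equivalent bash substring expansion, which counts from 0 rather than 1.

```shell
#!/bin/bash
# Hypothetical file name following the pattern in the post, with a real date filled in.
fname="aa_bbbb_ccc1D__20130219_data.txt"

# expr substr (GNU) counts from 1: the date starts at character 16 in this example.
yyyymm_expr=$(expr substr "$fname" 16 6)

# Bash substring expansion counts from 0, so the same slice starts at offset 15.
yyyymm_bash=${fname:15:6}

echo "$yyyymm_expr $yyyymm_bash"
```

The bash form avoids starting an external process for every file, which matters when there are many files.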

I want to use this YYYYMM to create a directory elsewhere on the system if it doesn't exist, then copy the file into it.

Then I move onto the aa_bbbb_ccc1F files, then the aa_bbbb_ccc2D files, the 2F files....

How can I do this most efficiently?

Cheers

colucix 02-19-2013 06:39 AM

Well, a loop would make the commands more readable and explicit and maybe less prone to errors. Anyway, here we go:
Code:

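mkdir -p $(find . -name 'aa_bbbb_ccc??*' | uniq -s17 -w6 | sed -r 's:.{17}(.{6}).*:/path/to/destination/\1:')
This one just creates the destination directories. The uniq command skips the first 17 characters of each line (the leading ./ plus the 15-character prefix) and compares only the next 6, so it drops adjacent lines that share the same year and month. Because find does not sort its output, some repeats may get through, but that is harmless: mkdir -p accepts directories that already exist. The sed command extracts the year and month and prepends the path of the destination directory. As you can see, the pipeline is inside a command substitution and its results are passed as arguments to mkdir.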
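
A variation on the same idea, as a minimal sketch: extract YYYYMM first and then use sort -u, which removes repeats wherever they occur in the list. The temporary directories and sample file names are assumptions made up for the demonstration, and the 17-character offset again assumes the exact name layout from the example.

```shell
#!/bin/bash
# Work in throwaway directories so the sketch has no side effects.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/aa_bbbb_ccc1D__20130105_x" "$src/aa_bbbb_ccc1F__20130220_x" \
      "$src/aa_bbbb_ccc2D__20130107_x" "$src/aa_gggg_ccc1D__20130301_x"

# Cut YYYYMM out of each "./name" path (17 characters precede the date),
# then let sort -u drop repeats regardless of their position in the list.
months=$(cd "$src" && find . -name 'aa_bbbb_ccc??*' | sed -E 's:^.{17}(.{6}).*:\1:' | sort -u)

# $months is left unquoted on purpose so that each line becomes one word.
for m in $months; do
  mkdir -p "$dest/$m"
done
ls "$dest"
```

The aa_gggg file is never matched by the -name pattern, so no directory is created for its month.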

At this point you can move the files using another find command with -exec:
Code:

find . -name 'aa_bbbb_ccc??*' -exec bash -c 'file=$1; echo mv "$file" "/path/to/destination/${file:17:6}/"' _ {} \;
Since the directory contents are usually still in the filesystem cache after the first find, the second run should not take long. The file name is passed to bash as a positional argument rather than pasted into the script text, so names containing spaces or quotes are handled safely. The whole thing assumes the files are named exactly as in your example and that the number of characters before the YYYYMM part is the same for all of them. Please note the echo statement before mv: it lets you review the commands before actually executing them. If they look correct, remove echo and run again. Hope this helps.

allend 02-19-2013 06:57 AM

I suggest using a bash script with a loop:
Code:

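#!/bin/bash

Path_Stub="/tmp/"   # destination root; adjust as needed

# One pass per file-name prefix; list entries are separated by spaces, not commas.
for name in "aa_bbbb_ccc1D" "aa_bbbb_ccc1F" "aa_bbbb_ccc2D"; do
  for file in "$name"*; do
    [[ -e "$file" ]] || continue   # skip the literal pattern when nothing matches
    # ${file##*_} keeps everything after the last underscore
    mkdir -p "$Path_Stub${file##*_}"
    cp "$file" "$Path_Stub${file##*_}/"
  done
done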


allend 02-19-2013 05:42 PM

Rereading the original post, I realised I missed the requirement to extract just the YYYYMM.
Based on the example given, '${file:15:6}' would be a better parameter expansion than '${file##*_}' in the above script. Bash offsets start at 0, and the example names have 15 characters before the date.
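
A fixed offset only works while every prefix has the same length. As a hedged alternative, the sketch below pulls the six digits that follow the double underscore using a bash regular expression. The function name yyyymm_of and the sample names are hypothetical, and it assumes the date always comes directly after "__".

```shell
#!/bin/bash
# yyyymm_of NAME: print the six digits that follow "__" in NAME.
# Returns non-zero and prints nothing when NAME has no such date.
yyyymm_of() {
  local re='__([0-9]{6})'
  [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

# The second name has a longer prefix, so a fixed offset would miss its date.
yyyymm_of "aa_bbbb_ccc1D__20130219_x"
yyyymm_of "aa_bbbb_ccc10D__20121130_x"
```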

emmalg 03-11-2013 05:24 AM

Thanks guys! Sorry for the really late response - the job only comes up once a month and I was too busy to pursue it in between!

I'm actually going to give the loop a go to start with, as on a monthly basis we only have a few files at a time. If it turns out to be too slow when we are doing several years' worth, I will use yours, colucix, though I will have to do some minor editing to your solution. The one thing I now see I wasn't clear about in the OP (and which is what makes it such a pain) is that I need to put the date directory inside a directory named after the file type, like this:

/destination_dir/bbbb_ccc1D/YYYYMM
/destination_dir/bbbb_ccc1F/YYYYMM
/destination_dir/bbbb_ccc2D/YYYYMM...

Why someone thought an archive like that was a good idea I don't know! You end up with one directory per file which is bloody stupid if you ask me.
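
One possible way to adapt the loop to that layout, as a minimal sketch rather than a tested solution for the real archive: the function name archive_files is hypothetical, and it assumes every name looks like aa_<type>__YYYYMMDD..., so that the type is whatever sits between the leading "aa_" and the double underscore. The demonstration runs in temporary directories with made-up files.

```shell
#!/bin/bash
# archive_files SRC DEST: copy aa_bbbb_ccc* files from SRC into DEST/<type>/<YYYYMM>/.
# Hypothetical helper; assumes names look like aa_<type>__YYYYMMDD...
archive_files() {
  local src=$1 dest=$2 file base type yyyymm
  for file in "$src"/aa_bbbb_ccc*; do
    [ -e "$file" ] || continue     # the glob matched nothing
    base=${file##*/}               # strip the directory part
    type=${base#aa_}               # drop the leading "aa_"
    type=${type%%__*}              # keep what precedes "__", e.g. bbbb_ccc1D
    yyyymm=${base#*__}             # text after the first "__"
    yyyymm=${yyyymm:0:6}           # first six characters: YYYYMM
    mkdir -p "$dest/$type/$yyyymm"
    cp "$file" "$dest/$type/$yyyymm/"
  done
}

# Demonstration in throwaway directories.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/aa_bbbb_ccc1D__20130219_x" "$src/aa_bbbb_ccc1F__20130219_x" \
      "$src/aa_gggg_ccc1D__20130219_x"
archive_files "$src" "$dest"
find "$dest" -type f | sort
```

Deriving the type from the name means new variants such as 2F need no change to the script, and the _gggg_ files are skipped because the glob never matches them.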


All times are GMT -5.