LinuxQuestions.org
02-19-2013, 05:10 AM   #1
emmalg
Member
 
Registered: Jun 2009
Location: Spain
Distribution: Various, Ubuntu, Fedora, Open Solaris, Solaris, RHEL, CentOS
Posts: 64

Rep: Reputation: 16
Elegant bash script needed to make new directories based on matched patterns


Hi,

I am challenging myself a bit here and could do with some help.

I have a directory full of files which have long names such as (made up example):
aa_bbbb_ccc1D__YYYYMMDD.....
aa_bbbb_ccc1F__YYYYMMDD....
aa_bbbb_ccc2D__YYYYMMDD....
aa_gggg_ccc1D__YYYYMMDD...
aa_gggg_ccc1F__YYYYMMDD...

etc. We ignore the _gggg_ files.

I was aiming for something which avoided looping over all the entries as there might be a lot, so starting with:

find -name 'aa_bbbb_ccc1D*' -exec ...

For each file found I want to find the YYYYMM date characters, so something like:

expr substr "$fname" 22 6 seems to work.

I want to use this YYYYMM to create a directory elsewhere on the system if it doesn't exist, then copy the file into it.

Then I move onto the aa_bbbb_ccc1F files, then the aa_bbbb_ccc2D files, the 2F files....

How can I do this most efficiently?

Cheers
 
02-19-2013, 06:39 AM   #2
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,489

Rep: Reputation: 1956
Well, a loop would make the commands more readable and explicit and maybe less prone to errors. Anyway, here we go:
Code:
mkdir -p $(find . -name aa_bbbb_ccc??\* | uniq -s17 -w6 | sed -r 's:.{17}(.{6}).*:/path/to/destination/\1:')
This one just creates the destination directories. The uniq command removes adjacent duplicates, comparing only the year and the month (any duplicates that are not adjacent are harmless, because mkdir -p accepts a directory that already exists). The sed command extracts the year and month and prepends the path of the destination directory. As you can see, the pipeline is inside a command substitution and its results are passed as arguments to mkdir.
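To see what each stage of that pipeline does, here is a small sketch that feeds it made-up names instead of real find output. The function names and the .dat suffix are only assumptions for illustration, and it relies on GNU uniq and sed, as the command above does. The second function is a variant (not the command from the post) that extracts first and then uses sort -u, which also collapses duplicates that are not adjacent.

```shell
#!/bin/bash
# Made-up names in the form find prints them (leading "./").
sample_names() {
  printf '%s\n' \
    ./aa_bbbb_ccc1D__20130105.dat \
    ./aa_bbbb_ccc1F__20130117.dat \
    ./aa_bbbb_ccc2D__20130203.dat
}

# Same stages as the pipeline above: skip 17 characters, compare the next 6,
# then rewrite each surviving line into a destination path.
month_dirs() {
  uniq -s17 -w6 | sed -r 's:.{17}(.{6}).*:/path/to/destination/\1:'
}

# Variant: extract YYYYMM first, then drop all duplicates with sort -u.
month_dirs_sorted() {
  sed -r 's:.{17}(.{6}).*:/path/to/destination/\1:' | sort -u
}

sample_names | month_dirs
# prints:
# /path/to/destination/201301
# /path/to/destination/201302
```

Passing that output to mkdir -p creates one directory per month.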

At this point you can move the files using another find command with -exec:
Code:
find . -name 'aa_bbbb_ccc??*' -exec bash -c 'file=$1; echo mv "$file" "/path/to/destination/${file:17:6}/"' _ {} \;
Since the directory contents will usually still be in the filesystem cache after the first find, the second run should not take long. The whole thing assumes the files are named exactly as in your example and that the number of characters before the YYYYMM part is the same for every file. Please note the echo before mv: it lets you review the commands before actually executing them. If they look correct, remove echo and run again. Hope this helps.
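If starting one bash process per file turns out to be slow, a batched variant is possible. The following is only a sketch under a few assumptions: the function name copy_by_month is hypothetical, it uses cp because the original question asked for a copy, it creates the month directory itself, and it takes the offset 15 from the made-up names after stripping the directory part. With "{} +" find hands many file names to a single bash invocation.

```shell
#!/bin/bash
# Hypothetical helper. $1 = source directory, $2 = destination root.
copy_by_month() {
  find "$1" -maxdepth 1 -type f -name 'aa_bbbb_ccc??*' -exec bash -c '
    dest=$1; shift
    for file in "$@"; do
      base=${file##*/}        # drop the directory part, e.g. "./"
      month=${base:15:6}      # YYYYMM, counted on the made-up names
      mkdir -p "$dest/$month"
      cp -- "$file" "$dest/$month/"
    done
  ' _ "$2" {} +
}
```

It would be called as copy_by_month . /path/to/destination from the directory holding the files. The offset needs adjusting for the real file names.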
 
1 members found this post helpful.
02-19-2013, 06:57 AM   #3
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 3,408

Rep: Reputation: 834
I suggest using a bash script with a loop:
Code:
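#!/bin/bash

Path_Stub="/tmp/"

# Prefixes to process; the _gggg_ files are never matched.
for name in "aa_bbbb_ccc1D" "aa_bbbb_ccc1F" "aa_bbbb_ccc2D"; do
  for file in "$name"*; do
    # Skip the literal pattern when nothing matches.
    [[ -e "$file" ]] || continue
    # The directory is named after the text following the last underscore.
    dest="$Path_Stub${file##*_}"
    mkdir -p "$dest"
    cp "$file" "$dest/"
  done
done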

Last edited by allend; 02-19-2013 at 07:11 AM.
 
2 members found this post helpful.
02-19-2013, 05:42 PM   #4
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 3,408

Rep: Reputation: 834
Rereading the original post, I realised I missed the requirement to just extract the YYYYMM.
Based on the example given, '${file:15:6}' (bash substring offsets are zero-based, and 15 characters precede the date in the example name) would be a better parameter expansion than '${file##*_}' in the above script.
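A quick way to check the offset is to try the expansions on one name. This sketch uses a made-up name with a real date and an assumed .dat suffix; the real names are longer, so the numbers will differ. Note that expr substr counts from 1 while the bash expansion counts from 0.

```shell
#!/bin/bash
file="aa_bbbb_ccc1D__20130219.dat"   # made-up name, 15 characters before the date

echo "${file##*_}"            # after the last underscore: 20130219.dat
echo "${file:15:6}"           # zero-based offset 15: 201302
echo "${file:16:6}"           # one character too far: 013021
expr substr "$file" 16 6      # one-based position 16: 201302
```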

Last edited by allend; 02-19-2013 at 05:43 PM.
 
1 members found this post helpful.
03-11-2013, 05:24 AM   #5
emmalg
Member
 
Registered: Jun 2009
Location: Spain
Distribution: Various, Ubuntu, Fedora, Open Solaris, Solaris, RHEL, CentOS
Posts: 64

Original Poster
Rep: Reputation: 16
Thanks guys! Sorry for the really late response - the job only comes up once a month and I was too busy to pursue it in between!

I'm actually going to give the loop a go to start with, as on a monthly basis we only have a few files at a time. If it turns out to be too slow when we are doing several years' worth, I will use yours, colucix, but I do have to do some minor editing to your solution. The one thing I now see I wasn't clear about in the OP (which is what makes it such a pain) is that I need to put the date directory inside a directory like this:

/destination_dir/bbbb_ccc1D/YYYYMM
/destination_dir/bbbb_ccc1F/YYYYMM
/destination_dir/bbbb_ccc2D/YYYYMM...

Why someone thought an archive like that was a good idea I don't know! You end up with one directory per file which is bloody stupid if you ask me.
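For that nested layout, the loop can derive both directory levels from the file name. This is only a sketch: the function names archive_dir and archive_files are hypothetical, and it assumes every name looks like aa_<group>__<YYYYMMDD>... with a double underscore before the date, as in the made-up example.

```shell
#!/bin/bash
# Hypothetical helper: print the archive directory for one file name.
archive_dir() {
  local dest_root=$1 base=${2##*/}   # base = name without directory part
  local group=${base#aa_}            # drop the leading "aa_"
  group=${group%%__*}                # keep the text before the "__"
  local rest=${base#*__}             # text after the "__", starts with the date
  printf '%s/%s/%s\n' "$dest_root" "$group" "${rest:0:6}"
}

# Hypothetical helper: copy each file into its archive directory.
archive_files() {
  local dest_root=$1 file dir
  shift
  for file in "$@"; do
    [[ -e $file ]] || continue       # an unmatched glob stays literal; skip it
    dir=$(archive_dir "$dest_root" "$file")
    mkdir -p "$dir"
    cp -- "$file" "$dir/"
  done
}
```

A call such as archive_files /destination_dir aa_bbbb_ccc* would then place aa_bbbb_ccc1D__20130219... under /destination_dir/bbbb_ccc1D/201302, while the _gggg_ files are never matched by the glob.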

Last edited by emmalg; 03-11-2013 at 05:51 AM.
 
  


Tags
bash, pattern, scripting
