LinuxQuestions.org
11-06-2010, 03:40 PM   #1
tubaboy
LQ Newbie
 
Registered: Nov 2010
Posts: 3

Rep: Reputation: 0
Bash scripting issue


Hi,

So what I'm trying to do is get a list of filenames from a directory, then group those files according to any common prefixes and store those groupings in an array. I'm stuck on a certain line, but it very well could be I'm just going about this the wrong way. Here's what I have so far:

Code:
DIR_PATH=$1;
[ ! -d $DIR_PATH ] && { echo "First argument does not appear to be a valid path"; exit 1; }

#like js|css
EXT=$2;

#CREATE ARRAY OF GROUPED FILES
FILES=$( ls $DIR_PATH/*.$EXT | xargs -n1 basename );
FILE_COUNT=$( echo $FILES | wc -w );
i=0;
for FILE in $FILES
do
  prefix=$( echo $FILE | cut -d '.' -f1 ); #this correctly gets everything up to first '.'
  group=$( echo ${FILES} | grep "${prefix}" );
  echo "group = $group"; #this still has all the other files that don't match $prefix
  #remove all entries that have the current prefix (so we don't continue looping though them) -- this doesn't work either
  FILES=$( echo -e $FILES | grep -Ev "${prefix}*\.${EXT}" );
...

What am I doing wrong?

Thanks!!
 
11-06-2010, 04:52 PM   #2
tubaboy
LQ Newbie
 
Registered: Nov 2010
Posts: 3

Original Poster
Rep: Reputation: 0
Figured it out

I figured it out. When I was echoing the $FILES string to pipe to sed or grep, I hadn't put double quotes around it. This means the newlines between the file names were not being preserved, so grep saw the whole list as a single line!

Correct lines:

Code:
group=$( echo "${FILES}" | grep "${prefix}" );
FILES=$( echo "${FILES}" | grep -Ev "${prefix}*\.${EXT}" );
Ugh.
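
For anyone landing here later, a minimal sketch of why the quotes matter. The file names are made up for illustration. Unquoted, the shell word-splits the value and echo joins the pieces with single spaces, so grep receives one long line; quoted, the newlines survive and grep can filter line by line.

```shell
# Hypothetical file list, one name per line (the shape ls | xargs -n1 basename produces).
FILES=$(printf '%s\n' xxx.yyy.txt xxx.zzz.txt jjj.bbb.txt)

# Unquoted: word splitting turns the newlines into spaces, so grep gets ONE line
# and either keeps all of it or none of it.
unquoted=$( echo $FILES | grep "jjj" )

# Quoted: the newlines are preserved, so grep sees three lines and keeps one.
quoted=$( echo "$FILES" | grep "jjj" )

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The unquoted result still contains every file name, which matches the symptom in the first post.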
 
11-06-2010, 06:12 PM   #3
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,066
Blog Entries: 11

Rep: Reputation: 910
Hi, welcome to LQ,

Good on yah for sorting it yourself, and thanks for
posting back your findings!


Cheers,
Tink
 
11-06-2010, 11:11 PM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,254

Rep: Reputation: 2686
I know you have solved your initial issue, but I thought I would give a little feedback on what you have, as it confused me a little.
Quote:
#CREATE ARRAY OF GROUPED FILES
Whilst I understand what you have said here, strictly speaking you only created a single string with:
Code:
FILES=$( ls $DIR_PATH/*.$EXT | xargs -n1 basename )
Yes, it will act like an array in a for loop, because word splitting takes effect and whitespace is used as the delimiter.
The assumption here is also that no file names contain spaces.

To actually be an array you need another set of parentheses, like so:
Code:
FILES=($( ls $DIR_PATH/*.$EXT | xargs -n1 basename ))
I mention this as then you would be able to use array properties in the next line:
Code:
FILE_COUNT=$( echo $FILES | wc -w )
#becomes
FILE_COUNT=${#FILES[*]}
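
A rough sketch of the difference, under a couple of assumptions: the directory is a throwaway one created with mktemp, the file names are invented, and a plain glob is swapped in for `ls | xargs basename` so that a name containing a space stays in one piece. It needs bash, since arrays are not part of plain sh.

```shell
#!/usr/bin/env bash
# Assumption: a throwaway directory with made-up file names, one containing a space.
DIR_PATH=$(mktemp -d)
EXT=txt
touch "$DIR_PATH/xxx.yyy.txt" "$DIR_PATH/jjj.txt" "$DIR_PATH/my notes.txt"

shopt -s nullglob                 # an empty match gives an empty array
paths=( "$DIR_PATH"/*."$EXT" )    # the glob keeps each file name as one element
FILES=( "${paths[@]##*/}" )       # strip the directory part from every element

FILE_COUNT=${#FILES[@]}                          # counts array elements
WORD_COUNT=$(( $(echo "${FILES[*]}" | wc -w) ))  # counts whitespace-separated words

echo "elements: $FILE_COUNT, words: $WORD_COUNT"
rm -r "$DIR_PATH"
```

The element count is 3 while the word count is 4, because "my notes.txt" is one file but two words.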
The following assumes no dots anywhere else in the filename:
Code:
prefix=$( echo $FILE | cut -d '.' -f1 ); #this correctly gets everything up to first '.'
As you already have the extension stored in EXT you could use substitution:
Code:
prefix=${FILE%.$EXT}
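
A small sketch comparing the two approaches on an invented file name. Note that they only agree when the name has a single dot; with more dots, the substitution strips just the extension while cut stops at the first dot.

```shell
EXT=txt
FILE=xxx.yyy.txt                  # hypothetical name with two dots

by_cut=$( echo "$FILE" | cut -d '.' -f1 )   # everything before the FIRST dot
by_subst=${FILE%.$EXT}                      # only the trailing ".txt" removed

echo "cut:          $by_cut"
echo "substitution: $by_subst"
```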
Lastly, the following line concerns me:
Code:
FILES=$( echo "${FILES}" | grep -Ev "${prefix}*\.${EXT}" );
The reason it concerns me is that changing the value of what you are iterating over could come unstuck and cause the rest of the program to go very wrong (just a thought).
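
To make the concern concrete, here is a sketch with made-up values. The shell expands the word list of a for loop once, before the first iteration, so reassigning FILES inside the body changes the variable but not the number of iterations.

```shell
FILES="a.txt b.txt c.txt"
visited=""
for FILE in $FILES; do
  FILES="a.txt"               # attempt to shrink the list mid-loop
  visited="$visited$FILE "    # record what the loop actually visits
done
echo "visited: $visited"
echo "FILES is now: $FILES"
```

The loop still visits all three names, even though FILES holds only one by the end.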
 
11-07-2010, 03:14 PM   #5
tubaboy
LQ Newbie
 
Registered: Nov 2010
Posts: 3

Original Poster
Rep: Reputation: 0
Hi Grail

Thanks for all your thoughts on the matter. I'm afraid that comment was a bit misleading however, as the array I was creating was further down the line. You are right that I was trying to modify the looping condition variable to affect the number of loops made. And this was a flawed decision.

Having finally got the whole script working, I realized I was making things a lot more complicated than I needed to. As often happens, the end result was very simple and took very few lines (though I'm sure someone could simplify further):

Code:
FILES=$( ls "$DIR_PATH"/*."$EXT" | xargs -n1 basename );
added=""; #names already handled, one per line
for FILE in $FILES
do
  prefix=$( echo "$FILE" | cut -d '.' -f1 );
  group=$( echo "${FILES}" | grep -Ei "^${prefix}.*${EXT}$" );

  num_already_added=$( echo "${added}" | grep -c "^${prefix}.*${EXT}$" );
  if [ "$num_already_added" -lt 1 ] #ADD ALL FILES WITH THIS PREFIX
  then
    for g in $group
    do
      #STUFF TO DO TO THESE FILES GOES HERE
      #keep one name per line so the ^...$ anchors in the grep above can match
      added="${added}${g}"$'\n';
    done
  fi
done
exit
In the end, there was no need for an array. I understand how in your example an array would be preferable, but I don't think there would be an advantage now.

Finally, the reason for using cut instead of bash substitution is that the file names WILL have multiple dots. What I'm doing here is taking files like:

xxx.yyy.txt
xxx.zzz.txt
xxx.aaa.txt
jjj.bbb.txt
jjj.txt

and aggregating all lines from the files with the 'xxx' prefix into one file, say, xxx.aggregate.txt, then doing the same for 'jjj', etc.

Is there a way to get just the 'xxx' using bash substitution?

Thanks again for your kind response.
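
For readers following along, the aggregation described above might be sketched roughly like this. Everything concrete here is an assumption for illustration: SRC and OUT are throwaway directories, the file contents are invented, and the aggregates are written to a separate directory so that a second run does not pick them up as input.

```shell
#!/usr/bin/env bash
# Assumptions: SRC and OUT are throwaway directories; names and contents are made up.
SRC=$(mktemp -d); OUT=$(mktemp -d)
EXT=txt
echo "one"   > "$SRC/xxx.yyy.txt"
echo "two"   > "$SRC/xxx.zzz.txt"
echo "three" > "$SRC/jjj.bbb.txt"
echo "four"  > "$SRC/jjj.txt"

for path in "$SRC"/*."$EXT"; do
  [ -e "$path" ] || continue        # the glob matched nothing
  name=${path##*/}                  # strip the directory
  prefix=${name%%.*}                # everything before the first dot
  cat "$path" >> "$OUT/$prefix.aggregate.$EXT"
done

xxx_lines=$(cat "$OUT/xxx.aggregate.txt")
jjj_lines=$(cat "$OUT/jjj.aggregate.txt")
echo "$xxx_lines"; echo "$jjj_lines"
rm -r "$SRC" "$OUT"
```

Appending with >> means no bookkeeping of already-handled prefixes is needed; each file simply lands in the aggregate named after its own prefix.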
 
11-15-2010, 08:41 AM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,254

Rep: Reputation: 2686
Hi tubaboy

Sorry this took so long, but I was on holiday when I posted and have only just got back to a computer.
Yes, substitution is not an issue, and here is another alternative for you to look at:
Code:
#!/bin/bash

declare -A FILES    # associative array, needs bash 4 or later

DIR_PATH=t2
EXT=txt

while read -r FILE
do
    prefix=${FILE%%.*}    # everything before the first dot
    (( ${#FILES[$prefix]} > 0 )) && continue    # this prefix is already collected
    FILES[$prefix]=$(find "$DIR_PATH" -type f -iname "$prefix*.$EXT" -printf "%f ")
done< <(find "$DIR_PATH" -type f -iname "*.$EXT" -printf "%f\n")    # -printf is GNU find

for x in "${!FILES[@]}"
do
    echo "${FILES[$x]}" # this is just to show the right details are being retrieved
    #STUFF TO DO TO THESE FILES GOES HERE
done
Hope some of this helps.
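
One possible variation on the script above, offered as a sketch rather than a replacement: group the names in a single pass by appending to the associative array, instead of running a second find per prefix. This also sidesteps the case where the `-iname "$prefix*.$EXT"` test would match a longer prefix (for example `xxxx.aaa.txt` when the prefix is `xxx`). The file names and the BY_PREFIX name are invented for the example, and bash 4 or later is assumed.

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays. The file names below are made up.
declare -A BY_PREFIX

for FILE in xxx.yyy.txt xxx.zzz.txt xxxx.aaa.txt jjj.txt; do
  prefix=${FILE%%.*}              # %% removes the LONGEST ".*" suffix: text before the first dot
  BY_PREFIX[$prefix]+="$FILE "    # append this name to its prefix's group
done

for p in "${!BY_PREFIX[@]}"; do
  echo "$p: ${BY_PREFIX[$p]}"
done
```

By contrast, `${FILE%.*}` with a single % removes only the shortest match, so `xxx.yyy.txt` would become `xxx.yyy`.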
 
  

