LinuxQuestions.org

tubaboy 11-06-2010 02:40 PM

Bash scripting issue
 
Hi,

So what I'm trying to do is get a list of filenames from a directory, then group those files according to any common prefixes and store those groupings in an array. I'm stuck on a certain line, but it very well could be I'm just going about this the wrong way. Here's what I have so far:

Code:

DIR_PATH=$1;
[ ! -d $DIR_PATH ] && { echo "First argument does not appear to be a valid path"; exit 1; }

#like js|css
EXT=$2;

#CREATE ARRAY OF GROUPED FILES
FILES=$( ls $DIR_PATH/*.$EXT | xargs -n1 basename );
FILE_COUNT=$( echo $FILES | wc -w );
i=0;
for FILE in $FILES
do
  prefix=$( echo $FILE | cut -d '.' -f1 ); #this correctly gets everything up to first '.'
  group=$( echo ${FILES} | grep "${prefix}" );
  echo "group = $group"; #this still has all the other files that don't match $prefix
  #remove all entries that have the current prefix (so we don't continue looping though them) -- this doesn't work either
  FILES=$( echo -e $FILES | grep -Ev "${prefix}*\.${EXT}" );
...


What am I doing wrong?

Thanks!!

tubaboy 11-06-2010 03:52 PM

Figured it out
 
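I figured it out. When I echoed the $FILES string to pipe it to sed or grep, I hadn't put double quotes around it. Without the quotes the shell word-splits the value and echo rejoins the words with single spaces, so the newlines were not being preserved and grep saw the whole list as one line!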

Correct lines:

Code:

group=$( echo "${FILES}" | grep "${prefix}" );
FILES=$( echo "${FILES}" | grep -Ev "${prefix}*\.${EXT}" );

Ugh.
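
The effect is easy to reproduce in isolation. The following is only a sketch: it uses a made-up three-name list in place of the real ls output, and the variable names unquoted and quoted are chosen just for the demonstration.

```bash
# Hypothetical file list, one name per line (stands in for the ls output).
FILES='xxx.yyy.txt
xxx.zzz.txt
jjj.txt'

# Unquoted: word splitting turns the newlines into single spaces,
# so grep receives ONE line and prints all of it if anything matches.
unquoted=$(echo $FILES | grep "xxx")

# Quoted: the newlines survive, so grep filters line by line.
quoted=$(echo "$FILES" | grep "xxx")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

The unquoted result still contains jjj.txt, which is the "group still has all the other files" symptom from the first post.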

Tinkster 11-06-2010 05:12 PM

Hi, welcome to LQ,

Good on yah for sorting it yourself, and thanks for
posting back your findings!


Cheers,
Tink

grail 11-06-2010 10:11 PM

I know you have solved your initial issue, but I thought I would give a little feedback on what you have, as it confused me a little.
Quote:

#CREATE ARRAY OF GROUPED FILES
Whilst I understand what you have said here, strictly speaking you only created a single string with:
Code:

FILES=$( ls $DIR_PATH/*.$EXT | xargs -n1 basename )
Yes, it will act like an array in a for loop, because word splitting takes effect and whitespace is used as the delimiter.
This also assumes that no file names contain spaces.

To actually be an array you need another set of parentheses, like so:
Code:

FILES=($( ls $DIR_PATH/*.$EXT | xargs -n1 basename ))
I mention this as then you would be able to use array properties in the next line:
Code:

FILE_COUNT=$( echo $FILES | wc -w )
#becomes
FILE_COUNT=${#FILES[*]}
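
If file names with spaces are a concern, one possible variation is to fill the array from a glob rather than from ls output. This is a sketch under assumptions, not a drop-in replacement: the temporary directory and the three file names are invented for the demonstration, and nullglob is enabled so that an unmatched pattern yields an empty array.

```bash
#!/usr/bin/env bash
# Sketch: fill a real array from a glob instead of parsing ls output.
# The directory and file names below are made up for the demonstration.
DIR_PATH=$(mktemp -d)
EXT=txt
touch "$DIR_PATH/xxx.yyy.txt" "$DIR_PATH/jjj.txt" "$DIR_PATH/two words.txt"

shopt -s nullglob                 # an unmatched glob expands to nothing
FILES=( "$DIR_PATH"/*."$EXT" )    # one array element per file, spaces intact
FILES=( "${FILES[@]##*/}" )       # strip the directory part (like basename)

FILE_COUNT=${#FILES[@]}
echo "count = $FILE_COUNT"
for FILE in "${FILES[@]}"; do
  echo "file = $FILE"
done

rm -r "$DIR_PATH"
```

Quoting "${FILES[@]}" in the loop is what keeps "two words.txt" as a single element.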

The following assumes there are no dots anywhere else in the filename:
Code:

prefix=$( echo $FILE | cut -d '.' -f1 ); #this correctly gets everything up to first '.'
As you already have the extension stored in EXT, you could use substitution:
Code:

prefix=${FILE%.$EXT}
Lastly, the following line concerns me:
Code:

FILES=$( echo "${FILES}" | grep -Ev "${prefix}*\.${EXT}" );
The reason it concerns me is that changing the value you are iterating over could come unstuck and cause the rest of the program to go very wrong (just a thought).

tubaboy 11-07-2010 02:14 PM

Hi Grail
 
Thanks for all your thoughts on the matter. I'm afraid that comment was a bit misleading however, as the array I was creating was further down the line. You are right that I was trying to modify the looping condition variable to affect the number of loops made. And this was a flawed decision.

Having finally got the whole script working, I realized I was making things a lot more complicated than I needed to. As often happens, the end result was very simple and took very few lines (though I'm sure someone could simplify further):

Code:

FILES=$( ls $DIR_PATH/*.$EXT | xargs -n1 basename );
for FILE in $FILES
do
  prefix=$( echo $FILE | cut -d '.' -f1 );
  group=$( echo "${FILES}" | grep -Ei "^${prefix}.*${EXT}$" );

  num_already_added=$( echo "${added}" | grep -c "^${prefix}.*${EXT}$" );
  if [ $num_already_added -lt 1 ] #ADD ALL FILES WITH THIS PREFIX
  then
    for g in $group
    do 
      #STUFF TO DO TO THESE FILES GOES HERE
      added="${added}${g}"$'\n'; #one name per line, so the ^ anchor in the grep above can match
    done
  fi 
done
exit

In the end, there was no need for an Array. I understand how in your example an array would be preferable, but I don't think there would be an advantage now.

Finally, the reason for using cut instead of bash substitution is that the file names WILL have multiple dots. What I'm doing here is taking files like:

xxx.yyy.txt
xxx.zzz.txt
xxx.aaa.txt
jjj.bbb.txt
jjj.txt

and aggregating all lines from the files with the 'xxx' prefix into one file, say, xxx.aggregate.txt, then doing same for 'jjj', etc.
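
For what it's worth, the aggregation described here could also be done in a single pass by appending each file to the output for its prefix, with no grouping step at all. The following is a sketch only: aggregate_by_prefix is a hypothetical helper name, and it assumes the aggregate files are written to a separate output directory so they are not picked up as input.

```bash
# Sketch: append each file to "<prefix>.aggregate.<ext>" in a separate
# output directory. aggregate_by_prefix is a hypothetical helper name.
# usage: aggregate_by_prefix SRC_DIR EXT OUT_DIR
aggregate_by_prefix() {
  local src="$1" ext="$2" out="$3"
  local path name prefix
  mkdir -p "$out" || return 1
  for path in "$src"/*."$ext"; do
    [ -f "$path" ] || continue                        # glob matched nothing
    name=$(basename "$path")
    prefix=$(printf '%s\n' "$name" | cut -d '.' -f1)  # text before first dot
    cat "$path" >> "$out/$prefix.aggregate.$ext"
  done
}
```

A call such as aggregate_by_prefix "$DIR_PATH" "$EXT" /some/output/dir would do the work. Because the function appends, the output directory should be empty before each run, otherwise the lines are added a second time.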

Is there a way to get just the 'xxx' using bash substitution?

Thanks again for your kind response.

grail 11-15-2010 07:41 AM

Hi tubaboy

Sorry this took so long, but I was on holiday when I posted and have only just got back to a computer :)
Yes, substitution is not an issue, and here is another alternative for you to look at:
Code:

#!/bin/bash

declare -A FILES

DIR_PATH=t2
EXT=txt

while read -r FILE
do
    prefix=${FILE%%.*}
    (( ${#FILES[$prefix]} > 0 )) && continue
    FILES[$prefix]=$(find "$DIR_PATH" -type f -iname "$prefix*.$EXT" -printf "%f ")
done < <(find "$DIR_PATH" -type f -iname "*.$EXT" -printf "%f\n")

for x in "${!FILES[@]}"
do
    echo "${FILES[$x]}" # this is just to show the right details are being retrieved
    #STUFF TO DO TO THESE FILES GOES HERE
done
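
The line in the script that answers the substitution question is prefix=${FILE%%.*}. A short illustration of how it differs from the earlier ${FILE%.$EXT} suggestion, using one of the example names from the post above (the variable names short and long are just for the demonstration):

```bash
FILE=xxx.yyy.txt
EXT=txt

short=${FILE%.$EXT}   # % removes the SHORTEST suffix matching ".txt"
long=${FILE%%.*}      # %% removes the LONGEST suffix matching ".*"

echo "$short"   # xxx.yyy
echo "$long"    # xxx
```

So the doubled %% is what cuts at the first dot rather than the last one, which matches what cut -d '.' -f1 was doing.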

Hope some of this helps.


All times are GMT -5.