[SOLVED] bash

Hoxygen232 · 01-23-2013, 04:08 AM

Hi,

I would like to print this file:

Code:

anche
due
-tre
quattro
pile,
a
patto.
che
eroghino--
la
stessa
tensione,

to another file, like this:

Code:

anche
due
tre
quattro
pile
a
patto
che
eroghino
la
stessa
tensione

That is, without spaces, commas, periods, dashes, etc.
It has to be clean because afterwards I need to parse the file word by word.

I have done this:

Code:

while read line
do
    TRASFORMER=$( echo "$line" | tr "[:upper:]" "[:lower:]" | sed -e 's/\.//g' -e 's/\,//g' -e 's/\-//g' -e 's/ //g' )
    echo "WORD TO SEARCH (transformed): $TRASFORMER:"
    WORD=$( fgrep -w -i "$TRASFORMER" "$dictfile" )  # search a dictionary
    if [ -z "$WORD" ]
    then
        echo "not found"
        echo
    else
        echo "word found.........."
        RIGA=$( fgrep -w -n "$WORD" "$dictfile" | sed s/:$WORD//g | sed 's/ //g')
        echo $RIGA
        echo "$RIGA" >> $FILE_OUTPUT
    fi
done < $FILE_INPUT

But when it reaches the word "a", it prints this on the terminal:

Code:

WORD TO SEARCH: a:
word found..........
sed: expression -e #1, character 5: command `s' not terminated

Why?

Thank you

millgates · 01-23-2013, 05:14 AM

Hi,

Code:

TRASFORMER=$( echo "$line" | tr "[:upper:]" "[:lower:]" | sed -e 's/\.//g' -e 's/\,//g' -e 's/\-//g' -e 's/ //g' )

You don't need a separate expression for each character. You can put them all in one bracket expression:

Code:

sed -e 's/[., -]//g'
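A caveat about bracket expressions: a `-` between two other characters is read as a range, so `,- ` would mean "everything from comma to space", a reversed range that GNU sed can reject as invalid. Putting the dash last (or first) makes it a literal character. A minimal sketch, with a hypothetical helper name `clean_with_sed`, run on a few words from the sample file:

```bash
#!/bin/bash
# Strip periods, commas, spaces and dashes from every input line.
# The '-' is placed last inside [...] so it is a literal dash, not a range.
clean_with_sed() {
    sed -e 's/[., -]//g'
}

printf '%s\n' '-tre' 'pile,' 'patto.' 'eroghino--' | clean_with_sed
# tre
# pile
# patto
# eroghino
```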

Also, bash can do this by itself with pattern substitution, without calling sed:

Code:

TRASFORMER="${line//[,. -]/}"
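The same range caveat applies to bash patterns: inside `[...]` the dash should come last so it is not read as a range. A small sketch wrapping the expansion in a hypothetical `clean_word` function and applying it to some of the sample words:

```bash
#!/bin/bash
# Remove every comma, period, space and dash from a word using bash's
# ${parameter//pattern/string} expansion; no external command is started.
# The '-' is last inside [...] so it is matched literally.
clean_word() {
    local line=$1
    printf '%s\n' "${line//[,. -]/}"
}

for w in 'anche' '-tre' 'pile,' 'patto.' 'eroghino--' 'tensione,'; do
    clean_word "$w"
done
```

Because no extra process is spawned per line, this is usually faster than an `echo | tr | sed` pipeline inside the loop when the input file is long.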

You can also transform the text to lower/upper case like this (this needs bash 4.0 or later):

Code:

lower="${line,,}"
upper="${line^^}"

But this is not necessary, since your greps are case-insensitive anyway.
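If you still want normalized lowercase words (for example, for the output file), both steps can be done with parameter expansion. A sketch, assuming bash 4.0 or later; the function name `normalize` is hypothetical:

```bash
#!/bin/bash
# Requires bash 4.0+ for the ,, (lowercase) case-modification operator.
normalize() {
    local word="${1//[,. -]/}"   # drop commas, periods, spaces and dashes
    printf '%s\n' "${word,,}"    # lowercase what is left (${word^^} would uppercase)
}

normalize 'Tensione,'   # prints: tensione
normalize '-TRE'        # prints: tre
```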

Code:

sed: expression -e #1, character 5: command `s' not terminated

This is because you don't quote the expression:

Code:

sed s/:$WORD//g

If $WORD contains any whitespace characters (spaces, tabs or newlines), the unquoted expression is word-split by bash, so sed receives only the first piece as its script.
My guess is that the

Code:

WORD=$( fgrep -w -i "$TRASFORMER" "$dictfile" )

finds multiple matches in your $dictfile, so $WORD contains multiple words separated by newlines.
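The effect of the missing quotes can be seen without sed at all, by letting `printf` show each argument it receives. A sketch using a hypothetical two-line value for `$WORD`, like the one fgrep might return when several dictionary lines match:

```bash
#!/bin/bash
# Hypothetical value: two matching dictionary lines separated by a newline.
WORD=$'a\nwhisky-a-gogo'

# Unquoted: bash splits the expansion on the newline, so sed would get
# two separate arguments. printf prints each argument inside <...>.
printf '<%s>\n' s/:$WORD//g
# <s/:a>
# <whisky-a-gogo//g>

# Quoted: one argument, but it still spans two lines.
printf '<%s>\n' "s/:$WORD//g"
# <s/:a
# whisky-a-gogo//g>
```

With this value, sed's script would be just `s/:a`, four characters long, which would fit an error reported at character 5. Quoting alone does not fix the underlying problem: a multi-line `$WORD` still does not form one valid `s` command, so the search has to return a single line.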

Hoxygen232 · 01-23-2013, 07:38 AM

You're right. The problem is that when searching dictfile for the word, it finds the LAST word that contains 'a'; e.g. when I search for the word "a", it finds whisky-a-gogo.
So how do I solve it?

Thanks

millgates · 01-23-2013, 08:53 AM

Well, you need a better matching pattern. A dash is not considered a "word" character, so grep -w foo will match "bar-foo-baz". If your "$dictfile" contains one word per line, with no leading/trailing whitespace, you can match whole lines:

Code:

grep "^${WORD}$"
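To see the difference, here is a sketch against a tiny hypothetical dictionary. The `find_line` helper is also hypothetical: it shows one way the line-number lookup from the first post could use whole-line matching, with `-F -x` instead of the `^...$` regex so that any regex metacharacters in the word are taken literally.

```bash
#!/bin/bash
# Hypothetical stand-in for "$dictfile": one word per line, no extra whitespace.
dictfile=$(mktemp)
trap 'rm -f "$dictfile"' EXIT
printf '%s\n' 'a' 'patto' 'whisky-a-gogo' > "$dictfile"

# With -w, '-' acts as a word boundary, so "a" also matches whisky-a-gogo.
grep -w 'a' "$dictfile"
# a
# whisky-a-gogo

# Anchoring the pattern to the whole line keeps only the exact entry.
grep '^a$' "$dictfile"
# a

# Print the dictionary line number of a word (nothing if it is absent).
# -F fixed string, -x whole-line match, -i ignore case, -n prefix line number.
find_line() {
    grep -F -x -i -n -- "$1" "$dictfile" | cut -d: -f1
}

find_line 'a'       # prints 1
find_line 'PATTO'   # prints 2
find_line 'gogo'    # prints nothing
```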

Hoxygen232 · 01-23-2013, 08:57 AM

Thanks, it works great.