LinuxQuestions.org - [SOLVED] bash - exclude some characters from a file

Hoxygen232

01-23-2013 04:08 AM

hi,

I would like to print this file:

Code:

anche
due
-tre
quattro
pile,
a
patto.
che
eroghino--
la
stessa
tensione,

to another file in this way:

Code:

anche
due
tre
quattro
pile
a
patto
che
eroghino
la
stessa
tensione

So, without spaces, commas, etc.
It has to be free of spaces and punctuation because afterwards I need to parse the file word by word.

I have done this:

Code:

while read line
do
    TRASFORMER=$( echo "$line" | tr "[:upper:]" "[:lower:]" | sed -e 's/\.//g' -e 's/\,//g' -e 's/\-//g' -e 's/ //g' )
    echo "WORD TO SEARCH (trasformata): $TRASFORMER:"
    WORD=$( fgrep -w -i "$TRASFORMER" "$dictfile" )  # search the dictionary
    if [ -z "$WORD" ]
    then
        echo "not found"
        echo
    else
        echo "word found.........."
        RIGA=$( fgrep -w -n "$WORD" "$dictfile" | sed s/:$WORD//g | sed 's/ //g')
        echo $RIGA
        echo "$RIGA" >> $FILE_OUTPUT
    fi
done < $FILE_INPUT

but when it reaches the word "a", it prints this on the terminal:

Code:

WORD TO SEARCH: a:
word found..........
sed: expression -e #1, character 5: command `s' not terminated

why?

Thank you

millgates

01-23-2013 05:14 AM

Hi,

Code:

TRASFORMER=$( echo "$line" | tr "[:upper:]" "[:lower:]" | sed -e 's/\.//g' -e 's/\,//g' -e 's/\-//g' -e 's/ //g' )

You don't have to put each character into a separate regexp. A single bracket expression covers them all; put the dash last so it is taken literally rather than as a range:

Code:

sed -e 's/[., -]//g'

Also, if you have a recent enough bash, this should work:

Code:

TRANSFORMER="${line//[,. -]/}"

You can also transform the text to lower/upper case like this:

Code:

lower="${line,,}"
upper="${line^^}"

But this is not necessary, since your greps are case-insensitive anyway.
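
As a rough sketch of how the two expansions above could be combined, here is a small helper that lowercases a word and strips dots, commas, spaces and dashes without calling tr or sed. The function name clean_word is made up for this example, and the ${var,,} form assumes bash 4 or newer.

```bash
#!/usr/bin/env bash
# clean_word is a hypothetical helper, not part of the original script.
clean_word() {
    local w=$1
    w=${w,,}               # lowercase (bash 4+)
    w=${w//[., -]/}        # drop dots, commas, spaces, dashes (dash last = literal)
    printf '%s\n' "$w"
}

first=$(clean_word "Pile,")
second=$(clean_word "-tre")
third=$(clean_word "eroghino--")
echo "$first $second $third"
```

Inside the loop this would be used as TRASFORMER=$(clean_word "$line").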

Code:

sed: expression -e #1, character 5: command `s' not terminated

This is because you don't quote the expression:

Code:

sed s/:$WORD//g

If $WORD contains any whitespace characters, the expression will be word-split by bash.
My guess is that the

Code:

WORD=$( fgrep -w -i "$TRASFORMER" "$dictfile" )

finds multiple matches in your $dictfile, so $WORD contains multiple words separated by newlines.
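
To make the word-splitting visible, here is a minimal sketch that counts how many arguments a command would receive with and without quotes. count_args is a throwaway function invented for this demo, and the two-line value of WORD is only an assumption about what fgrep returned from your dictionary.

```bash
# count_args is a hypothetical helper: it prints the number of arguments it gets.
count_args() { echo "$#"; }

# Assumed value: two matches separated by a newline.
WORD="a
whisky-a-gogo"

# Unquoted: the shell splits at the newline, so sed would see two
# separate arguments, the first being the incomplete script "s/:a".
unquoted=$(count_args s/:$WORD//g)

# Quoted: the whole expression stays one argument.
quoted=$(count_args "s/:$WORD//g")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

Quoting alone only fixes the splitting. With several matches in $WORD the sed script would still contain a newline, so the search itself also has to return a single line.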

Hoxygen232

01-23-2013 07:38 AM

You're right. The problem is that, while searching for the word in dictfile, it finds the LAST word that contains 'a';
i.e. I search for the word a and it finds: whisky-a-gogo
So how do I solve it?

thanks

millgates

01-23-2013 08:53 AM

Well, you need a better matching pattern. A dash is not considered a "word" character, so grep -w foo will match "bar-foo-baz". If your "$dictfile" contains one word per line, with no leading/trailing whitespace, you can match whole lines:

Code:

grep "^${WORD}$"
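
An alternative sketch of the same idea uses the standard grep options -x (match the whole line) and -F (treat the pattern as a fixed string), which avoids building a regex from $WORD. The three-word dictionary below is invented test data and the temporary file stands in for your real $dictfile.

```bash
# Made-up sample dictionary, one word per line.
dictfile=$(mktemp)
printf '%s\n' a anche whisky-a-gogo > "$dictfile"

WORD=a

# -w treats the dashes as word boundaries, so "whisky-a-gogo" matches too.
loose=$(grep -c -w -F -- "$WORD" "$dictfile")

# -x only accepts lines that are exactly "a".
exact=$(grep -c -x -F -- "$WORD" "$dictfile")

# Line number of the exact match, without any sed post-processing.
line_no=$(grep -n -x -F -- "$WORD" "$dictfile" | cut -d: -f1)

echo "loose=$loose exact=$exact line=$line_no"
rm -f "$dictfile"
```

Because -n prefixes each match with "number:", cutting at the first colon yields the line number directly, so the sed s/:$WORD//g step is no longer needed.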

Hoxygen232

01-23-2013 08:57 AM

thanks, it works great

All times are GMT -5.