LinuxQuestions.org - Script to replace Control characters with a number, if not with Space

Sunray74

12-03-2014 01:32 PM

Hi Guys,

I'm new to Linux and I have a project where my script needs to read text files and replace the Control-L character in the first byte of a line with a number. If no Control-L character is present, the script should insert a space as the first byte of the line, for every line in the file.

I have written the script as far as replacing the Control-L character, but I am unable to add a DECODE-style condition to the sed command, i.e. if there is no Control-L, then insert a space in the first byte.

My Script:

** Quote **

#!/bin/bash

outdir='outdir'
indir=`pwd`

if [ $# -eq 1 ]
then
    filename=$1

    >$outdir/$filename.txt

    files=`ls $indir/$filename[a-z].txt`

    for i in $files
    do
        i=`basename $i`
        sed s/^L/1/g $indir/$i > $outdir/$i
        cat $outdir/$i >> $outdir/$filename.txt
        rm $outdir/$i
    done
else
    echo "Invalid args"
    exit 1
fi

** UnQuote **

Any help would be highly appreciated.

Regards.

grail

12-03-2014 07:21 PM

Couple of points:

1. Please use [code][/code] tags around code or data to protect formatting

2. 'Invalid args', whilst semi-informative, does not advise what the correct input should have been

3. $() is clearer and more versatile than ``

4. [[]] is preferred over [] and (()) preferred for arithmetic

5. Use your globbing directly in the for loop and try not to use ls. This example may not have issues, but should your file names contain unusual characters, such as spaces, the for loop will not behave as expected

Code:

for i in $indir/$filename[a-z].txt

6. Use meaningful variable names rather than something like the 'i' in the for loop above. In a large script the start of the loop may be many lines back, so when you come across $i you have no real idea what it refers to

7. An alternative to basename is the ## bash parameter expansion, e.g. ${i##*/}

8. Quote all variables, again to preserve any unusual characters ... of course the caveat is you may well want the unusual characters to expand

9. Quote sed statements (I have also shown quoting for variables below)

Code:

sed 's/^L/1/g' "$indir/$i" > "$outdir/$i"

10. You go from indir_i to outdir_i to outdir_filename and then remove outdir_i. Why not cut out the middle man and simply redirect the sed output straight into outdir_filename? This also lets you remove the basename line, as we can now simply use our for loop variable:

Code:

sed 's/^L/1/g' "$i" >> "$outdir/$filename"

11. The sed 'g' flag makes no sense given your requirement that we only look at the start of the line. Currently you will replace '^L' anywhere in the line and not specifically at the start. Here is the alternative, anchored with a leading ^ (the '^L' that follows it is the Control-L character itself):

Code:

sed 's/^^L/1/' "$i" >> "$outdir/$filename"

Above are merely suggestions :)
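Taken together, the suggestions above could be folded back into the original script roughly as follows. This is only a sketch: the function name `merge_parts`, the variable names and the wording of the usage message are illustrative assumptions, and the Control-L character is generated with printf rather than typed literally. It keeps the `.txt` suffix on the merged output file, as in the original script.

```shell
#!/bin/bash
# Sketch only: merge_parts and the usage text are illustrative, not from the thread.

ff=$(printf '\f')    # a literal Control-L (form feed) character

merge_parts() {
    if (( $# != 1 )); then
        # Point 2: say what the correct input looks like.
        echo "Usage: merge_parts <file-prefix>" >&2
        return 1
    fi

    local prefix=$1 indir=$PWD outdir=outdir part

    : > "$outdir/$prefix.txt"    # start with an empty result file

    # Point 5: glob directly in the loop instead of parsing ls.
    for part in "$indir/$prefix"[a-z].txt; do
        [[ -e $part ]] || continue          # the glob matched nothing
        echo "merging ${part##*/}" >&2      # point 7: ## instead of basename
        # Points 8-11: quoted, anchored, no 'g' flag, appended straight to the result.
        sed "s/^${ff}/1/" "$part" >> "$outdir/$prefix.txt"
    done
}

# Example call, assuming ./outdir exists and files such as reporta.txt are present:
# merge_parts report
```

Wrapping the logic in a function is a choice made for this sketch; the same body works at the top level of a script with `$1` and `exit 1`.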

Ok, on to your actual question. You are already replacing the '^L', so now you need to tell sed what to do for the lines that do not start with ^L.
An important tip here is that you need to process the non-^L lines first (try placing the two expressions below in the opposite order and see what happens):

Code:

sed -re 's/^([^^L])/ \1/' -e 's/^^L/1/' "$i" >> "$outdir/$filename"
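To see why the order matters, here is a small self-contained run of the same two expressions. The function name `mark_first_byte` and the sample input are made up for illustration, and the Control-L is produced with printf so that the example can be pasted as plain text.

```shell
ff=$(printf '\f')    # a literal Control-L (form feed) character

mark_first_byte() {
    # 1st expression: a line whose first byte is not ^L gets a leading space.
    # 2nd expression: a leading ^L is replaced with 1.
    sed -E -e "s/^([^${ff}])/ \1/" -e "s/^${ff}/1/"
}

printf 'plain\n\fnew page\n' | mark_first_byte
# prints:
#  plain
# 1new page
```

With the expressions swapped, the ^L line would first become `1new page` and then match the "not ^L" rule, ending up as ` 1new page`. Note also that an empty line stays empty, because `[^...]` needs one character to match.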

Sunray74

12-03-2014 11:30 PM

Thanks a lot, grail. I will test the functionality and update you accordingly.

Thanks once again for your suggestions; I will make sure that I follow the standards.

Sunray74

12-04-2014 10:40 AM

Thanks a ton! The code worked for me, and I have taken your suggestions into consideration and modified the code accordingly.

Thanks once again.

Regards

grail

12-04-2014 08:27 PM

No probs ... glad we got there :) Please mark as SOLVED once you have your solution.

Sunray74

12-08-2014 10:55 AM

Hi Grail,

I need your help finding a way to detect a "*" (asterisk) in a file name, for example test*123. Based on that, I have to put a character (one of the four letters F, P, R or X) in place of the "*" and write the file name as "testF123" (the asterisk replaced with "F" in this case) to the output file.

If the file name contains a *, there should be a condition that puts one of the four letters mentioned above in place of the * and writes the new file name (TESTF123) at the top left of the output, on the 3rd or 4th line. If there is no "*" in the file name, the earlier logic should be applied as is.

Does that make sense?

Please let me know if you need more clarification, so that I can get back to you with more information.

Thanks for your help.

grail

12-08-2014 06:26 PM

Quoting ... quoting ... quoting :)

Double quotes around variables will protect them from expanding unusual characters, such as *, and single quotes force any character to be seen as is. Using these two concepts, you should easily be able to get bash to work out which files you need to update.
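As a rough illustration of that tip, the sketch below keeps the asterisk literal by quoting it and swaps it for a letter with sed. The helper name `replace_star` is hypothetical, and how the letter (F, P, R or X) gets chosen is not covered in the thread, so it is simply passed in as an argument.

```shell
# Hypothetical helper: replace the first literal * in a name with the given letter.
replace_star() {
    case $1 in
        *'*'*)  # the quoted '*' is a literal asterisk, not a wildcard
            printf '%s\n' "$1" | sed "s/\*/$2/" ;;
        *)      # no asterisk: leave the name unchanged
            printf '%s\n' "$1" ;;
    esac
}

replace_star 'test*123' F    # prints testF123
replace_star 'test123' F     # prints test123
```

The single quotes around `'test*123'` stop the shell from globbing the name, and the double quotes around `"$1"` keep it intact when it is expanded inside the function.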

Sunray74

12-10-2014 10:34 AM

Thanks for providing the tips. I was able to achieve the requirement I posted a couple of days back with the help of the sed command.

Also, can you guide me on how to search for a specific string in a file and add a new string below the searched string?

Example: using my script above, I need to search for the word "search_word" in the files my script reads and insert the new word "new_word" on the line exactly below it, then write that to the output. "search_word" occurs in many places, but always starting at the first byte of a line.

I would appreciate it if you could shed some light on this.

Regards

pan64

12-10-2014 10:42 AM

Quote:

Originally Posted by Sunray74 (Post 5282543)

Also, can you guide me on how to search for a specific string in a file and add a new string below the searched string?

In general:
sed 's/<search string>/<replace string>/' inputfile > outputfile
works; you only need to specify those strings. For example, the beginning of a line is ^, so:
sed 's/^search_word/new_wordsearch_word/'
will do something like what you described. See the sed man page, tutorials and usage examples to find the most convenient way.

Sunray74

12-10-2014 11:51 AM

Hi Pan64,

Thanks for the idea, but this would actually replace the existing string, which I do not want.
I tried that option, searching for 'search_word' and replacing it with 'search_word new_word'. I can achieve it that way, but my requirement is as below.

File:

xxxxxxxxxxxxxxxxxxx
search_wordxxxxxxxx
xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
search_wordxxxxxxxx
xxxxxxxxxxxxxxxxxxx

New Output should be like this...

xxxxxxxxxxxxxxxxxxx
search_wordxxxxxxxx
new_wordxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
search_wordxxxxxxxx
new_wordxxxxxxxxxxx

I would appreciate it if you could guide me in achieving the required output.

Thanks a lot for all your help.

Regards,

pan64

12-10-2014 12:00 PM

So that means you want to print the original line and the modified version too?
sed '/search_word/{p;s/search_word/replace_word/}'
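A quick run on sample data shows what this command produces. In the sketch below the function name `print_and_substitute` and the input are made up; the pattern is anchored with ^ because the word is said to start at the first byte, and a `;` is added before the closing brace, which GNU sed does not need but some other sed implementations expect.

```shell
print_and_substitute() {
    # p prints the matching line unchanged; the s command then edits the copy
    # still held in the pattern space, which is printed at the end of the cycle.
    sed '/^search_word/{p;s/search_word/new_word/;}'
}

printf 'xxxx\nsearch_wordxxxx\nyyyy\n' | print_and_substitute
# prints:
# xxxx
# search_wordxxxx
# new_wordxxxx
# yyyy
```

The second copy keeps the rest of the original line (`new_wordxxxx`), because only the matched word is substituted.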

Sunray74

12-10-2014 12:05 PM

Awesome, thanks a lot.

I have tested the logic on the command line and it worked. I will include this logic in my script to test the functionality and update you with the results.

Thanks once again Pan64.

pan64

12-10-2014 12:09 PM

you are welcome
(if you really want to say thanks just press YES)

Sunray74

12-10-2014 01:37 PM

Hi Pan64,

I made a mistake here: the original file has empty lines after the "search_word" line, and only "new_word" should be inserted below "search_word", but the command copied the whole line.

File:

xxxxxxxxxxxxxxxxxxx
search_wordxxxxxxxx

xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
search_wordxxxxxxxx

xxxxxxxxxxxxxxxxxxx

New Output should be like this...

xxxxxxxxxxxxxxxxxxx
search_wordxxxxxxxx
new_word

xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
search_wordxxxxxxxx
new_word

xxxxxxxxxxxxxxxxxxx

Sorry for the confusion. Could you please provide the correct syntax?

grail

12-10-2014 07:02 PM

So try:

Code:

sed '/search/a\new_word' file

Also, you can look here for more information on how to use sed.

pan64

12-11-2014 02:39 AM

Hi grail, what is that \ good for?
sed '/search/a\new_word' file
it works without that too.

grail

12-11-2014 07:54 AM

Well to be perfectly honest I just always followed the grymoire stuff, http://www.grymoire.com/Unix/Sed.html#uh-40

But if it works without I am happy too :)
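On the backslash question: POSIX specifies the append command as `a\` followed by a newline and then the text, while the one-line forms used in this thread, with or without the backslash, are GNU sed extensions. That would explain why both variants behave the same on a GNU system. A sketch of the portable spelling, with a made-up function name and sample input:

```shell
append_after_match() {
    # Portable form: a, backslash, newline, then the text to append.
    # The text line starts in column 0 so that no leading blanks are involved.
    sed '/^search_word/a\
new_word'
}

printf 'xxxx\nsearch_wordxxxx\n\nyyyy\n' | append_after_match
```

For this input, `new_word` appears on a line of its own directly after `search_wordxxxx`, and the empty line that followed the match is kept.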

Sunray74

12-11-2014 10:14 PM

Thanks a lot for all your help.

pan64

12-12-2014 03:12 AM

Glad to help you.
If you really want to say thanks just press YES

Sunray74

12-16-2014 12:42 PM

Hi Guys,

I need some more help with a specific scenario that I want to handle with a sed command. I have the logic in awk, but since much of my existing logic uses sed, I am asking for the sed equivalent.

I have a pattern as shown below. I need to search for 'zzzzz' and delete the next line, but the first occurrence of the pattern should not be touched; the deletion should apply from the 2nd occurrence onwards.

xxxxx
yyyyy
zzzzz -->search string and delete the empty line below (there are 3 empty lines) from the 2nd pattern

12345
12345
xxxxx
yyyyy
zzzzz --> basically from here, the empty line should be deleted

12345
12345

The awk command that does the job is shown below, tested successfully:
awk '/<search_word>/&&c++ {next} 1' < test2.txt

The sed commands I tried are not working:
sed '1!{/^<search_word>/d;}' < test2.txt
or
sed '2,${/^<search_word>/d;}' < test2.txt

Appreciate your help.

pan64

12-16-2014 12:48 PM

This is a new problem; it would be nice to open a new thread for that next time.
I do not really understand what should be deleted, and I also do not really understand that awk script (for example, what is that c++ good for?). Can you please show the result too?
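On the `c++` question: it is a post-increment on a counter that starts out as 0. At the first matching line `c++` evaluates to 0 (false), so the line falls through to the trailing `1` and is printed; at every later match it evaluates to a non-zero value, so `next` skips that line. A small sketch with a made-up function name and sample data:

```shell
keep_first_match() {
    # Print everything, but drop the 2nd and later lines that match /^zzzzz/.
    awk '/^zzzzz/ && c++ { next } 1'
}

printf 'xxxxx\nzzzzz\n12345\nxxxxx\nzzzzz\n12345\n' | keep_first_match
# prints:
# xxxxx
# zzzzz
# 12345
# xxxxx
# 12345
```

As written, this removes the repeated matching lines themselves, not the line that follows them. The `1!` and `2,$` addresses in the sed attempts select by line number, not by occurrence of the pattern, which may be why they behave differently.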

Sunray74

12-16-2014 01:12 PM

Hi Pan64,

I actually used the logic you advised, with <search_word> and <replace_word>, but it was copying the whole line and then replacing <search_word> with <replace_word>. So, to get rid of the rest of the line, I modified the command as follows:

Original: sed '/search_word/{p;s/search_word/replace_word/}'

modified: sed -e "/^ <search_word>/{p;s/<search_word> .*/<replace_word>/}"

But this inserts a new line, so I need another sed command to delete the extra line created by the command above, like:

-e "/<look_for_replace_word>/{N;s/\n.*//;}"

This works fine, but now there is an issue with the alignment. From the 2nd pattern onwards the output looks good after deleting the extra line, but the alignment goes wrong at the first pattern, because I am deleting the line below that pattern, which should not happen. The next line should be deleted only from the 2nd pattern onwards.

Is there an option that would use the original command above and add <replace_word> without adding a new line, i.e. read the empty line and write <replace_word> into it without inserting? Then I would not need to delete any extra lines.

Does that make sense?

onebuck

12-16-2014 04:17 PM

Moderator response

This whole thread seems to be Homework style queries. Spoon feeding helps no one.

Per the LQ Rules, please do not post homework assignments verbatim. We're happy to assist if you have specific questions or have hit a stumbling point, however. Let us know what you've already tried and what references you have used (including class notes, books, and Google searches) and we'll do our best to help. Also, keep in mind that your instructor might also be an LQ member.

All times are GMT -5.