LinuxQuestions.org
Old 08-27-2010, 02:56 PM   #1
dmafcoi
LQ Newbie
 
Registered: May 2010
Posts: 22

Rep: Reputation: 0
Text manipulation find / replace variable efficiency


What I have works, but I'm wondering what the 'right' way is to replace the digits with the letters given in this loop. Should I somehow use a case statement or multiple sed expressions? I thought of both but couldn't get either to work.
Code:
# ...
bcv=$(echo $line | awk -F" " '{ print $1 }' | sed 's/1/q/g;s/2/w/g;s/3/e/g') # and so on
Code:
while read line
do
     bcv=$(echo $line | awk -F" " '{ print $1 }')
     if [ $bcv == "" ]
     then
          echo "skipping null line"
     else
          bcv=$(echo $line | awk -F" " '{ print $1 }')
          bcv=$(echo ${bcv//1/q})
          bcv=$(echo ${bcv//2/w})
          bcv=$(echo ${bcv//3/e})
          bcv=$(echo ${bcv//4/r})
          bcv=$(echo ${bcv//5/t})
          bcv=$(echo ${bcv//6/y})
          bcv=$(echo ${bcv//7/u})
          bcv=$(echo ${bcv//8/i})
          bcv=$(echo ${bcv//9/o})
          bcv=$(echo ${bcv//0/p})
          lc=$(echo $line | awk -F" " '{ print $1 }' | head -c 4)
          lc2=$(echo ${babb}${cch} | head -c 4)
          if [ $lc == $lc2 ]
          then
               echo "inserting ref to ${filelink}"
               echo "<p><a href=\"toc.html\">${bcv}</a> | ${line}</p>" >> ${filelink}
          else
               echo "<p>${line}</p>" >> ${filelink}
               echo "skipped non-ref line"
          fi
     fi
done < $x
 
Old 08-27-2010, 10:32 PM   #2
quanta
Member
 
Registered: Aug 2007
Location: Vietnam
Distribution: RedHat based, Debian based, Slackware, Gentoo
Posts: 724

Rep: Reputation: 100
You can use 'sed -e' as many times as you want.
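As a minimal sketch of that suggestion (the sample string `gen1v23` is made up, and only three of the ten digits are shown), several `-e` expressions can be passed to a single sed process, and the result matches the semicolon-separated form from the first post:

```shell
#!/bin/bash
# Sample reference, made up for illustration.
ref="gen1v23"

# One sed process, one -e per substitution (only three digits shown).
with_e=$(printf '%s\n' "$ref" | sed -e 's/1/q/g' -e 's/2/w/g' -e 's/3/e/g')

# The same script written as one expression, separated by semicolons.
with_semicolons=$(printf '%s\n' "$ref" | sed 's/1/q/g;s/2/w/g;s/3/e/g')

echo "$with_e"
echo "$with_semicolons"
```

Either form starts sed once per call, so the cost inside a loop comes mostly from how often the pipeline is launched rather than from how many expressions it carries.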
 
1 members found this post helpful.
Old 08-28-2010, 03:28 AM   #3
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976
The first example, using multiple sed commands separated by semicolons, works for me. Could you show an example of what actually doesn't work? As an aside: awk's default field separator is any run of blanks (spaces or tabs), so there is no need to specify it with the -F option.
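A quick way to check the aside about the field separator, sketched with a made-up sample line that has leading spaces and a tab in it:

```shell
#!/bin/bash
# Sample line with leading spaces and a tab; made up for illustration.
line=$(printf '   gen1v1\tIn the   beginning')

# Default splitting: runs of spaces and tabs separate fields,
# and leading blanks are ignored.
default_fs=$(printf '%s\n' "$line" | awk '{ print $1 }')

# -F" " selects that same default behaviour, so it adds nothing.
explicit_fs=$(printf '%s\n' "$line" | awk -F" " '{ print $1 }')

echo "$default_fs"
echo "$explicit_fs"
```

Both commands should print the same first field, which is why the `-F" "` can simply be dropped.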

Back to the topic, this task is perfectly suitable for the tr command. Examples:
Code:
$ echo 01234567890123456789 | tr 1234567890 qwertyuiop
pqwertyuiopqwertyuio
$ echo 457262 | tr 1234567890 qwertyuiop
rtuwyw
The characters (digits) in the first set are translated to the matching characters in the second set. If I understand correctly, this is exactly what you need, done efficiently.

Last edited by colucix; 08-28-2010 at 04:09 AM. Reason: Spelling corrected... maybe...
 
1 members found this post helpful.
Old 08-28-2010, 04:17 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
I would also add that having a sed follow an awk is a complete waste, as either one has the necessary options to perform both tasks.
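For instance, here is a rough sketch of doing both steps in sed alone. The sample line is made up, and it assumes the line does not start with blanks (awk's `$1` would skip leading blanks, this sed script would not):

```shell
#!/bin/bash
line="gen1v12 In the beginning"   # made-up sample line

# Drop everything from the first blank onward, then translate the digits
# with sed's y command (a per-character mapping, much like tr).
bcv=$(printf '%s\n' "$line" | sed -e 's/[[:space:]].*//' -e 'y/1234567890/qwertyuiop/')

echo "$bcv"
```

This replaces the awk-then-sed pair with one process per line.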
 
1 members found this post helpful.
Old 08-28-2010, 10:03 AM   #5
dmafcoi
LQ Newbie
 
Registered: May 2010
Posts: 22

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by colucix View Post
... this task is perfectly suitable for the tr command.
tr! Exactly what I needed, thank you. I could get sed to work, but it was taking forever... still learning. I can usually get done what I need, but I want to do it 'well'. If anything's worth doing...

thanks all

Code:
while read line
do
     bcv=$(echo $line | awk '{ print $1 }')
     if [ $bcv == "" ]
     then
          echo "skipping null line"
     else
          bcv=$(echo $line | awk '{ print $1 }' | tr 1234567890 qwertyuiop)
          lc=$(echo $line | awk '{ print $1 }' | head -c 4)
          lc2=$(echo ${babb}${cch} | head -c 4)
          if [ $lc == $lc2 ]
          then
               echo "inserting ref to ${filelink}"
               echo "<p><a href=\"toc.html\">${bcv}</a> | ${line}</p>" >> ${filelink}
          else
               echo "<p>${line}</p>" >> ${filelink}
               echo "skipped non-ref line"
          fi
     fi
done < $x

Last edited by dmafcoi; 08-28-2010 at 10:36 AM.
 
Old 08-28-2010, 10:55 AM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
Well I am guessing there must be more to the script seeing as there are several variables used that have never been assigned.
Glad you got it working though.

I am curious though, why not throw your awk onto the input for your loop and that way negate the need to test for null lines?
Unless of course you need to see this output for some reason.

Would look something like:
Code:
while read -r line
do
    : # loop body goes here
done < <(awk '!/^$/{print $1}' "$x")
Also you can get away without the echo | awk | head as well and keep it all in house, ie in bash:
Code:
lc=${line:0:4}
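A small sketch comparing the two approaches on a made-up sample line; the variable names `lc_external` and `lc_builtin` are only for illustration:

```shell
#!/bin/bash
line="gen1v12 In the beginning"   # made-up sample line

# External tools: extra processes just to get four characters.
lc_external=$(echo "$line" | head -c 4)

# Bash substring expansion: ${var:offset:length}, no extra process.
lc_builtin=${line:0:4}

echo "$lc_external" "$lc_builtin"
```

Note that `${var:offset:length}` is a bash feature and is not available in plain POSIX sh.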
In fact, sorry I keep thinking about things, you could tidy it up something like:
Code:
while read -r line
do
    bcv=$(echo $line | tr 1234567890 qwertyuiop)
    lc2="${babb}${cch}"
    if [[ ${line:0:4} == ${lc2:0:4} ]]
    then
        echo "inserting ref to ${filelink}"
        echo "<p><a href=\"toc.html\">${bcv}</a> | ${line}</p>" >> ${filelink}
    else
        echo "<p>${line}</p>" >> ${filelink}
        echo "skipped non-ref line"
    fi
done < <(awk '!/^$/{print $1}' $x)
Well, something to think about at least. You may need to play with it a little bit.
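One reason to feed the loop with `< <(...)` rather than a pipe is sketched below, with made-up input and counter names. In bash's default configuration the loop on the right side of a pipe runs in a subshell, so variables set inside it are lost, while the process substitution form runs the loop in the current shell:

```shell
#!/bin/bash
# Made-up input with a blank line in the middle.
input=$'gen1v1 first\n\ngen1v2 second'

count_pipe=0
printf '%s\n' "$input" | awk '!/^$/{ print $1 }' | while read -r field; do
    count_pipe=$((count_pipe + 1))        # runs in a subshell; the change is lost
done

count_procsub=0
while read -r field; do
    count_procsub=$((count_procsub + 1))  # runs in the current shell
done < <(printf '%s\n' "$input" | awk '!/^$/{ print $1 }')

echo "pipe: $count_pipe, process substitution: $count_procsub"
```

The awk filter also drops the blank line before the loop sees it, so no empty-line test is needed inside the loop.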
 
1 members found this post helpful.
Old 08-28-2010, 01:21 PM   #7
dmafcoi
LQ Newbie
 
Registered: May 2010
Posts: 22

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by grail View Post
Well I am guessing there must be more to the script seeing as there are several variables used that have never been assigned.
...
Hi and thank you. I have a folder of txt files. Some of the lines in these files have a book and chapter reference. My goal was to convert all the txt files to basic html with a table of contents, then to epub, etc. My e-reader's numbers are on the top row, so I have to hold down the alt key to get a number, which is a pain, so I changed all the numbers to the corresponding letters. Here's the whole thing if you want to see it... I can usually get done what I need to do, but my code is ugly and the only way to learn is to do and ask for improvements...

I didn't know I could easily extract a range of text like echo ${var:0:4}, thanks.

Code:
#!/bin/bash
# begin toc.html
echo "<HTML>" > toc.html
echo "<div>Table of Contents</div>" >> toc.html
echo "<br>" >> toc.html
echo "<BODY>" >> toc.html

# put each chapter into file
for i in `cat books.nfo`
do
	sed -i '/./!d' $i

	bnam=$(echo $i | awk -F"." '{ print $1 }')
	csplit --digits=3 -s -f ${bnam}_ $i /CHAPTER/ {*}

	rm ${bnam}_000

	echo "<div>$bnam</div>" >> toc.html
	echo "  <blockquote>" >> toc.html
	echo "  <div>" >> toc.html

	for x in `ls *[0-9]`
	do
		chn=$(echo $x | awk -F"_" '{ print $2 }')
		sed -i 's/.$//' $x
		sed -i '/^$/d' $x
		# get book abbr from chapters.nfo
		babb=$(cat chapters.nfo | grep -m1 $bnam | awk '{ print $4 }' | tr "[:upper:]" "[:lower:]")
		echo "babb is $babb"
		cch=$(echo $x | awk '{ print $2 }' | sed 's/^0*//;s/^$/0/')
		sed -i "s/^\([0-9][0-9]*\)\( .*\)/${babb}${cch}v\\1  \2/" $x
		sed -i '1d' $x

		idx=$(cat chapters.nfo | grep -m1 "${bnam}_${chn}.html" | awk '{ print $1 }')
		filelink=$(echo ${idx}_${babb}_${chn}.html)

		echo "  <a href=\"${filelink}\">${cch}</a> " >> toc.html

		echo "Creating html for ${filelink}"
		echo "<HTML>" > ${filelink}
		echo "<a href=\"toc.html\"><b>$bnam $cch</b></a>" >> ${filelink}
		echo "<BODY>" >> ${filelink}

		while read line
		do
			bcv=$(echo $line | awk '{ print $1 }')
			if [ $bcv == "" ]
			then
				echo "skipping null line"
			else
				bcv=$(echo $line | awk '{ print $1 }' | tr 1234567890 qwertyuiop)
				lc=$(echo $line | awk '{ print $1 }' | head -c 4)
				lc2=$(echo ${babb}${cch} | head -c 4)
				if [ $lc == $lc2 ]
				then
					echo "inserting verse to ${filelink}"
					echo "<p><a href=\"toc.html\">${bcv}</a> | ${line}</p>" >> ${filelink}
				else
					echo "<p>${line}</p>" >> ${filelink}
					echo "skipped non-verse line"
				fi
			fi
		done < $x


	echo "<a href=\"toc.html\">TOC</a>" >> ${filelink}
	echo "</BODY>" >> ${filelink}
	echo "</HTML>" >> ${filelink}

	rm $x
	done

	echo "  </div>" >> toc.html
	echo "  </blockquote>" >> toc.html

done

echo "</BODY>" >> toc.html
echo "</HTML>" >> toc.html
 
Old 08-29-2010, 01:05 AM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
Ok ... I will say 2 things:

1. I am intrigued and would be happy to have a look at improvements but would need one of your input files (ie the ones with the chapters)

2. You would probably find a number of people recommend Perl for this type of thing. Unfortunately I am newish to Perl so would not be a lot of use there

I can tell you straight off that there are some easy things you can remove / change that will help eliminate errors.

e.g. You should almost never parse the output of 'ls': filenames containing spaces or other unusual characters get split apart, and the results can be unpredictable and sometimes unreproducible (mainly on other systems). This could easily be changed:
Code:
for x in *[0-9]
Also, I realise this is probably only ever going to be for your own use, but in case you later work on something for other users, you should consider not using such ambiguous variable names as 'i' and 'x'.

I also noticed this line:
Quote:
sed -i '/^$/d' $x
Assuming this is the same $x we were looking at as the input to the loop we first helped you with, it means your test for empty lines is moot, as this line has already deleted them all.
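To make the 'ls' point concrete, here is a sketch in a scratch directory. The two file names are made up, and one deliberately contains spaces:

```shell
#!/bin/bash
# Scratch directory with made-up chapter files, one containing spaces.
workdir=$(mktemp -d)
touch "$workdir/gen_001" "$workdir/song of songs_001"
cd "$workdir" || exit 1

ls_count=0
for x in `ls *[0-9]`; do      # the output of ls is split on whitespace
    ls_count=$((ls_count + 1))
done

glob_count=0
for x in *[0-9]; do           # each matching file stays one word
    [ -e "$x" ] || continue   # skip the literal pattern if nothing matched
    glob_count=$((glob_count + 1))
done

echo "ls loop: $ls_count items, glob loop: $glob_count items"
cd - >/dev/null || exit 1
rm -r "$workdir"
```

The ls loop sees four "files" because the name with spaces is split into three words, while the glob loop sees the two files that actually exist.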
 
1 members found this post helpful.
Old 08-29-2010, 01:45 AM   #9
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by dmafcoi View Post
Code:
while read line
do
     bcv=$(echo $line | awk '{ print $1 }')
     if [ $bcv == "" ]
     then
          echo "skipping null line"
     else
          bcv=$(echo $line | awk '{ print $1 }' | tr 1234567890 qwertyuiop)
          lc=$(echo $line | awk '{ print $1 }' | head -c 4)
          lc2=$(echo ${babb}${cch} | head -c 4)
          if [ $lc == $lc2 ]
          then
               echo "inserting ref to ${filelink}"
               echo "<p><a href=\"toc.html\">${bcv}</a> | ${line}</p>" >> ${filelink}
          else
               echo "<p>${line}</p>" >> ${filelink}
               echo "skipped non-ref line"
          fi
     fi
done < $x

All bash, no external commands
Code:
tr() {
    stra="$1"
    strb="$2"
    string="$3"
    for((i=0;i<${#stra};i++))
    do
        search="${stra:$i:1}"
        replace="${strb:$i:1}"
        string="${string//$search/$replace}"
    done
    echo "$string"
}

while read -r a b
do
     case "$a" in
       "") echo "skipping null line";;
       *)
           bcv=$(tr "1234567890" "qwertyuiop" "$a")
           lc="${a:0:4}"
           babbcch="${babb}${cch}"
           lc2="${babbcch:0:4}"
           case "$lc" in
              "$lc2")
               echo "inserting ref to ${filelink}"
               echo "<p><a href=\"toc.html\">${bcv}</a> | ${a} ${b}</p>" >> ${filelink}
                 ;;
              *)
               echo "<p>${a} ${b}</p>" >> ${filelink}
               echo "skipped non-ref line"
                 ;;
           esac
     esac
done < $x
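A possible variation on the function above, offered only as a sketch: `translate` and its use of `REPLY` are hypothetical names, not part of the original post. It walks the input one character at a time, so each character is mapped exactly once (sets that overlap, such as swapping two letters, behave like the real tr), and it leaves the result in a variable, so the caller does not need a `$(...)` command substitution, which itself forks a subshell:

```shell
#!/bin/bash
# translate FROM TO STRING: hypothetical helper; the result is left in REPLY.
translate() {
    local from=$1 to=$2 input=$3
    local out="" ch rest idx i
    for ((i = 0; i < ${#input}; i++)); do
        ch=${input:i:1}
        rest=${from#*"$ch"}               # what follows ch in FROM, if ch is present
        if [[ $rest != "$from" ]]; then
            idx=$(( ${#from} - ${#rest} - 1 ))   # position of ch in FROM
            out+=${to:idx:1}
        else
            out+=$ch                      # not in FROM: copy unchanged
        fi
    done
    REPLY=$out
}

translate 1234567890 qwertyuiop "gen12v3"
echo "$REPLY"
digits_result=$REPLY

translate ab ba "abba"                    # overlapping sets: a and b are swapped
echo "$REPLY"
swap_result=$REPLY
```

For the digits-to-letters case in this thread both approaches give the same answer; the difference only shows up when the two sets share characters.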
 
1 members found this post helpful.
Old 08-29-2010, 07:01 AM   #10
dmafcoi
LQ Newbie
 
Registered: May 2010
Posts: 22

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by ghostdog74 View Post
All bash, no external commands
...
I can't imagine it being any more efficient... thanks for taking time to type all that out... very helpful
 
Old 08-29-2010, 07:18 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
Quote:
I can't imagine it being any more efficient
Actually, fewer calls to outside programs (and hence less forking of processes) will make it quite a bit quicker, especially on large data.
 
1 members found this post helpful.
Old 08-29-2010, 07:27 AM   #12
dmafcoi
LQ Newbie
 
Registered: May 2010
Posts: 22

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by grail View Post
...would be happy to have a look at improvements but would need one of your input files (ie the ones with the chapters)...

eg. You should almost never use 'ls' as it has unpredictable and sometimes unreproducible results (mainly on other systems). This could easily be changed:
Code:
for x in *[0-9]
If it is just curiosity I'll happily share one of the chapter files, but I'd feel bad for you to take even more time on something that was only an hour-long thing and is already done. You've already been more helpful than you may realize:

for x in `ls *[0-9]` # bad example commonly found on 'help' sites
for x in *[0-9] # I didn't know I could do that

var=$(echo something | head -c 4)
echo ${var:0:4}

awk -F" " '{ print $1 }' # commonly found example
awk '{ print $1 }'

while read... I thought the '< file' HAD to be a filename, because most examples show it that way. You showed me it can also be the output of other commands: while read ... done < <(awk ...)

You've corrected, hopefully, a lot of bad habits, so thanks again.
 
Old 08-29-2010, 07:36 AM   #13
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by dmafcoi View Post
I can't imagine it being any more efficient... thanks for taking time to type all that out... very helpful
Calling external commands like awk/head/tr as you iterate over the file is always slower due to the forking of extra processes. You can always do a test run to find out.
Code:
$ more file
1234567890 blah blah
blah blah 1234567890 1234567890
1234567890 blah blah
blah blah 1234567890 1234567890


$ cat test.internal
#!/bin/bash
tr() {
    stra="$1"
    strb="$2"
    string="$3"
    for((i=0;i<${#stra};i++))
    do
        search="${stra:$i:1}"
        replace="${strb:$i:1}"
        string="${string//$search/$replace}"
    done
    echo "$string"
}

while read -r a b
do
      tr "1234567890" "qwertyuiop" "$a"
      echo ${a:0:4}
done <"file"

$ cat test.external
#!/bin/bash
while read -r line
do
      echo $line | awk '{ print $1 }' | tr 1234567890 qwertyuiop
      echo $line | awk '{ print $1 }' | head -c 4
done < "file"
Result
Code:
$ time bash test.internal >/dev/null

real    0m0.014s
user    0m0.012s
sys     0m0.001s

$ time bash test.external >/dev/null

real    0m0.043s
user    0m0.013s
sys     0m0.032s

I have also looked at your latest code. The many extra seds and awks, the useless use of ls in the for loop, and the useless uses of cat all affect how efficient your script will be.
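As one example of trimming such a pipeline, here is a sketch of the `babb=` lookup done with a single awk. The layout of chapters.nfo is not shown in the thread, so the temporary file and its two lines below are made up purely for illustration:

```shell
#!/bin/bash
# Made-up stand-in for chapters.nfo; the real file's layout is an assumption.
nfo=$(mktemp)
printf '%s\n' '01 genesis_001.html Genesis GEN' \
              '02 exodus_001.html Exodus EXO' > "$nfo"
bnam="genesis"

# Original style: cat | grep -m1 | awk | tr (four processes).
babb_pipeline=$(cat "$nfo" | grep -m1 "$bnam" | awk '{ print $4 }' | tr "[:upper:]" "[:lower:]")

# One awk: match the line, lowercase field 4, stop at the first hit.
babb_awk=$(awk -v pat="$bnam" '$0 ~ pat { print tolower($4); exit }' "$nfo")

echo "$babb_pipeline" "$babb_awk"
rm -f "$nfo"
```

Both forms treat the book name as a regular expression, so names containing regex metacharacters would need extra care in either version.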
 
1 members found this post helpful.
  

