[SOLVED] Replacing word occurrence with an increasing number in a file using bash

twoleggedtripod · 08-02-2010, 11:50 AM

Hi there.

I have a file in the form below, and wish to replace each start line with an increasing number. So instead of:

Code:

start
content content
start
content content
start
content content

I want to generate:

Code:

1
content content
2
content content
3
content content

I've tried this code in bash:

Code:

for ((a=1; a=100 ; a++))
do
        sed 's/start/'$a'/' input > output
done

to make it work but unfortunately it just spits out the following:

Code:

100
content content
100
content content
100
content content

After several searches and a bit of messing around, it's clear I'm missing something, so was wondering if anyone could offer any insight?

Thanks a lot

grail · 08-02-2010, 11:58 AM

Well, I would not agree with the method you are using, but the reason your for statement is not working is that you assign the variable two different values: first the value 1 and then the value 100. Let us have a look at how the for statement is made up and see if that helps you:

for(<blah>;<foo>;<bar>)

for - command or statement to be used

<blah> - set variable to initial value

<foo> - the test (expression) checked before each pass; the loop stops when it is false

<bar> - increase variable by a set amount

Your issue is at the <foo> stage: you have assigned 100 to the variable instead of testing whether it has reached 100.

I will leave the rest to you
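
As a minimal sketch of the difference (the variable names are only illustrative, and this assumes bash): a single = inside (( )) assigns, so the middle expression is always non-zero and can never stop the loop, while <= only tests the value.

```shell
#!/usr/bin/env bash

# A single "=" inside (( )) assigns. The result is 100, which is non-zero,
# so the expression counts as true no matter what a held before.
a=1
if (( a = 100 )); then
    after_assign=$a
fi

# "<=" only tests; a keeps its value.
a=1
if (( a <= 100 )); then
    after_compare=$a
fi

# With a real test in the middle slot, the loop stops on its own.
passes=0
for (( a = 1; a <= 3; a++ )); do
    passes=$((passes + 1))
done

echo "after assignment: $after_assign, after comparison: $after_compare, passes: $passes"
```

This prints "after assignment: 100, after comparison: 1, passes: 3".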

smoker · 08-02-2010, 12:06 PM

Your comparison operator is not a comparison operator.
You have assigned something to the variable a twice.

http://www.linuxconfig.org/Bash_scri...ic-comparisons

twoleggedtripod · 08-02-2010, 12:19 PM

Ah, sorry about that, I'd cleaned up the code and missed an operator in there. It should be:

Code:

for ((a=1; a<=100 ; a++))
do
        sed 's/start/'$a'/' input > output
done
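
With the test written as <= the loop does terminate, yet the output is still all 100s. A minimal sketch of why, assuming a small input like the one in the first post (the temporary directory is only for illustration): every pass reads the unchanged input, replaces every start with the current value of a, and overwrites output, so only the last pass survives.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf 'start\ncontent content\nstart\ncontent content\n' > "$workdir/input"

for (( a = 1; a <= 100; a++ )); do
    # Each pass rereads the untouched input and truncates output with ">",
    # so whatever the previous pass wrote is lost.
    sed "s/start/$a/" "$workdir/input" > "$workdir/output"
done

result=$(cat "$workdir/output")
echo "$result"
rm -r "$workdir"
```

Both start lines come out as 100, because the counter has to advance per matching line within one pass over the file, not once per pass.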

smoker · 08-02-2010, 12:41 PM

You have still not compared 2 things. Look at the examples.

dannybpng · 08-02-2010, 03:33 PM

Here is one way to do it:

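Code:

a=0
while IFS= read -r Line
do
    if [ "$Line" = "start" ]; then
        # A line that is exactly "start": bump the counter and print it instead
        a=$((a + 1))
        printf '%s\n' "$a"
    else
        # Any other line is copied through unchanged
        printf '%s\n' "$Line"
    fi
done < input > output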

I would just do it with awk like so:

awk '/start/ {count++; print count; next}
{print}' input > output
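
One thing worth knowing about the awk version: the pattern /start/ matches any line that contains start, not just lines that consist of it. A small sketch of the difference, using made-up sample lines and hypothetical function names (number_matching, number_exact):

```shell
number_matching() {   # renumbers any line that contains "start"
    awk '/start/ {count++; print count; next} {print}'
}
number_exact() {      # renumbers only lines that are exactly "start"
    awk '$0 == "start" {count++; print count; next} {print}'
}

sample=$(printf 'start\nrestart the pump\nstart\ndata')   # hypothetical input
loose=$(printf '%s\n' "$sample" | number_matching)
strict=$(printf '%s\n' "$sample" | number_exact)
printf '%s\n---\n%s\n' "$loose" "$strict"
```

The first variant turns "restart the pump" into 2, while the second leaves it alone and numbers only the two bare start lines.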

Trickie · 08-03-2010, 06:07 AM

I'm not sure why you would want to do that within the file when you can enumerate the lines as the file is read. For example, when using cat -n or a text editor like vi. Is this a homework question from college?

twoleggedtripod · 08-03-2010, 08:47 AM

No, not homework. I attempted to make it as general as possible, so I can understand the reasoning.

What I'm writing it for is an output file from a different program. In short, the program runs something, names the cycle=x on one line, provides stuff I need, then a variable amount of data. Earlier, each run was labeled cycle=1, cycle=2 etc, so that was simply a case of using "grep -A $numberoflines cycle=$a", successfully using the "for ((a=1; a<=100 ; a++))" to strip that from a file, and do what I wanted to it:

Code:

#for ((n=1; n <= 9 ; n++))
#do
#        grep -A $1 "cycle =      $n" potentialcurve > out_$n
#done
#
#for ((m=10; m <= 90 ; m++))
#do
#        grep -A $1 "cycle =     $m" potentialcurve > out_$m
#done
#
#for ((x=1; x <= 90 ; x++))
#do
#       sed '1s/.*/$coord/' out_$x > tmol_$x
#       echo '$end' >> tmol_$x
#       babel -itmol tmol_$x -oxyz test_$x.xyz
#       cat test_$x.xyz >> combined.xyz
#done
#
#rm o* t*

However, under a different option the cycle runs for each point a few times and spits the final answer out, giving cycle=13, cycle=25, cycle=5 etc., which makes the previous method useless.

From what I understood, the "for ((a=1; a<=100 ; a++))" line says start at 1, then go to another number (in this case 100) at regular intervals (a++), however it seems I'm wrong with this interpretation.

The awk method does work and is murderously simple, but is admittedly something I haven't looked at at all yet (but this is a newbie forum, right?)

From what I understand from other posters I'm missing something small and easy to find from my first attempt, which I also admittedly haven't figured out yet, but I'll keep you posted.

archtoad6 · 08-03-2010, 10:40 AM

The thread is marked "[SOLVED]", what was the solution? (Be polite, give us feedback, let us & those who follow know what you did.)

I thought there was a missing '$', but

Code:

for ((a=1; a<=10 ; a++)); do echo $a; done

&

Code:

for ((a=1; $a<=10 ; a++)); do echo $a; done

both work.

Personally, I would use awk.
(See Taylor's Laws of Programming)

jthill · 08-03-2010, 12:48 PM

sed isn't really the right tool for this: starting 10000 processes for a 10000 line file is ... not aesthetically pleasing. Plus, in circumstances other than casual use it can actually turn into a performance problem. So get used to avoiding things like that.

The awk oneliner is probably best:

Code:

awk '/start/{sub(/start/,++n)};{print}'

or if you want the substitution only on lines that contain *only* 'start',

Code:

awk '/^start$/{$0=++n};{print}'

but you can do it without ever leaving bash if you want. The whole-line case is really easy. I'll explain the IFS stuff in a minute.

Code:

IFS=''; n=0
while read -r; do
    [[ $REPLY == start ]] && REPLY=$((++n))
    echo "$REPLY"
done
unset IFS

and bash's parameter expansion can do quite a lot -- it's (relative to what bash can do) also easy to do the more general substitution in bash:

Code:

IFS=''; n=0
while read -r; do
    [[ $REPLY == *start* ]] &&
        REPLY=${REPLY%%start*}$((++n))${REPLY#*start}
    echo "$REPLY"
done
unset IFS

The shell uses the 'IFS' variable to decide where to split words. The 'read' command distributes the fields of each input line into the variables you give it, or puts the whole line in 'REPLY' if you don't give it one. With the default IFS, an unquoted expansion such as $REPLY is split on whitespace, so by the time echo prints it, whitespace sequences are collapsed to a single space. Setting IFS to '' tells the shell not to do any word splitting. Do remember to unset it afterwards; an unset IFS gives the default behavior.
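
A short sketch of the whitespace behavior described here, with a made-up test string. The IFS change is made inside a command substitution, which runs in a subshell, so it does not leak into the rest of the script.

```shell
line='a    b   c'            # four spaces, then three

# Default IFS: the unquoted expansion is split into words and echo rejoins
# them with single spaces.
collapsed=$(echo $line)

# Empty IFS: no word splitting, so the spacing survives even unquoted.
kept=$(IFS=''; echo $line)

# Quoting avoids the splitting without touching IFS at all.
quoted=$(echo "$line")

printf '[%s]\n[%s]\n[%s]\n' "$collapsed" "$kept" "$quoted"
```

The first line printed is [a b c]; the other two keep the original spacing. Quoting the expansion is usually the less intrusive choice, since it also prevents filename expansion of characters such as *.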

rojee · 08-04-2010, 01:00 AM

cat & sed

as per:

[root@hostest ~]# cat it.txt
start
content content
start
content content
start
content content

[root@hostest ~]# sed '/^start/d' it.txt |cat -n |sed 's/content.*/\n&/'
1
content content
2
content content
3
content content
[root@hostest ~]#

archtoad6 · 08-06-2010, 08:36 AM

rojee,

I think OP made a slight mistake in his sample file: 'start' is a literal, but 'content content' is not. Therefore you cannot expect a search for "content" to be useful. As I said, I would use awk & jthill was kind enough to provide 2 awk-based solutions.

twoleggedtripod,

It's a custom at LQ to give positive feedback to those who tried to help you by telling what solution you adopted. That also helps create an archive of solutions for others in the future. Please give back to LQ, don't be a taker only.

rojee · 08-08-2010, 06:42 AM

[root@hostest ~]# cat text.txt
start
contantly changing content content
start
mary had a little lamb
start
and everwhere mary went
[root@hostest ~]#
[root@hostest ~]#
[root@hostest ~]# sed '/^start/d' text.txt |cat -n |sed 's/\t.*/\n&/'
1
contantly changing content content
2
mary had a little lamb
3
and everwhere mary went
[root@hostest ~]#

archtoad6 · 08-08-2010, 12:47 PM

Quote:

Originally Posted by twoleggedtripod

In short, the program runs something, names the cycle=x on one line, provides stuff I need, then a variable amount of data.

I take "variable amount of data" to mean possibly more than one line. The sed approaches posted so far don't allow for that. I still think awk is the right tool for this job.
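
A rough sketch of how awk could cope with blocks of varying length, assuming every block begins with a line containing "cycle =" (as in the grep patterns quoted earlier in the thread) and that numbering the output files in order of appearance is what is wanted. The function name split_cycles, the sample data and the out_N file names are made up for illustration.

```shell
split_cycles() {
    # $1 = input file, $2 = existing directory to write out_1, out_2, ... into
    awk -v dir="$2" '
        /cycle =/ {                  # a new block starts here
            if (out != "") close(out)
            out = dir "/out_" (++n)  # number blocks by order of appearance
        }
        out != "" { print > out }    # copy the marker line and what follows
    ' "$1"
}

# Usage with made-up data: the cycle labels are out of order and the
# blocks have different lengths.
tmp=$(mktemp -d)
printf 'cycle = 13\nx\ny\ncycle = 25\nz\ncycle = 5\np\nq\nr\n' > "$tmp/potentialcurve"
split_cycles "$tmp/potentialcurve" "$tmp"
second=$(cat "$tmp/out_2")
files=$(ls "$tmp" | grep -c '^out_')
echo "$files files; out_2 holds: $second"
rm -r "$tmp"
```

Because the block boundaries come from the marker lines themselves, no line count has to be passed in, and the value after "cycle =" no longer matters.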

BTW, using '/' for the delimiter in a regex in both sed & perl is optional -- any character is allowed (RTM).
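
A small sketch of the delimiter point for sed's s command, using a made-up path. Whatever character follows the s becomes the delimiter (the sed documentation excludes backslash and newline), which saves escaping when the pattern itself contains slashes.

```shell
path='/usr/local/bin/tool'     # made-up path

# With "/" as the delimiter every slash in the pattern must be escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# The character right after "s" becomes the delimiter, so "|" works too.
with_pipe=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

printf '%s\n%s\n' "$with_slash" "$with_pipe"
```

Both commands print /opt/bin/tool.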

Also, FWIW, code samples & ASCII files are best put in "Code:" blocks when posting on LQ. Long or wide ones go better in a pastebin.