LinuxQuestions.org
Old 12-09-2008, 01:20 PM   #1
Jude Terror
LQ Newbie
 
Registered: Dec 2008
Posts: 27

Rep: Reputation: 0
Is it possible to manipulate strings INSIDE an awk script?


--I posted this in the Linux Newbie forum before realizing it should probably go here. If any mods can delete it from there, please feel free--

Hello, I'm new to the forums, though I've used them as a reference many times in the past and found them extremely helpful. A little background info on my skills - I've been teaching myself Linux shell scripts as well as php through reverse engineering. I'm pretty good at looking at how something is done and then feeling my way through creating a script to do what I need, but I can't just write one off the top of my head - yet.

Anyway, I've been working on this problem for two days, and I can't seem to figure out a solution. I am working with a program that uses an xml file to process jpeg images. In many cases, one set of files may contain several hundred jpg images, which are ordered into pages with the xml file. For example, there may be filename_0001.jpg, filename_0002.jpg, filename_0003.jpg, etc., and inside the xml file, there is data about the image contained inside an xml field for each image:

Code:
<page leafNum="1">
<width>123</width>
<height>123</height>
</page>
<page leafNum="2">
<width>123</width>
<height>123</height>
</page>
...and so on, but with many more fields inside the page tags, and with hundreds or thousands of pages.

Now, I have run into some situations where I need to insert a page or two into the mix, which would be a major pain, since it would require changing the "leafNum" for every single page after the pages I insert in the xml file to make space, as well as renaming each of the jpeg files in the same fashion. So I decided that I should devote some time to automating this process with shell scripts.

Now, I WAS able to successfully create a script to rename the jpeg files, adding a number of my choice onto the file numbering. However, I am hitting a wall with the XML.

What I have right now is this:

Code:
awk '{if ($2 ~ /^leafNum/)
{$test=$2;
print $test}
}' filename.xml
The print line exists just so that I can see the output as I try to figure this out.

This works to identify the lines that need to be changed - those containing "leafNum." It then takes the "leafNum="###">" string from that line and sticks it in the $test variable for me, which I hoped to use to manipulate it. I tried several methods of string manipulation on the $test variable, including sed and simply using things like $test=${test#leafNum} to no avail - errors every time. I'm not sure if they just don't work inside an awk command, or if they require a different syntax. I also tried putting the awk command inside a do loop that was processing the file line by line, echoing the read variable into the awk command to search for the leafNum, but I couldn't get the syntax right on that either.

I need to cut that $test variable, which currently reads "leafNum="###">" (where ### can be any integer from 0 to 9999, without any excess zeros) so that I can add a variable $X (a number which I will define based on how many pages I am inserting), and then rewrite the line in the xml file - something like:

Code:
print "<page leafNum="$test">"
...though I haven't gotten around to working that part out either, as I'm stuck on this first part. So here is the plan for my script:

Code:
awk '{if ($2 ~ /^leafNum/)
{$test=$2;

(manipulate this variable to just the numbers, and add to it)

(rewrite the xml line)

}

(ELSE print the line as normal)

}' filename.xml

(output to new file)
So, sorry about the really long post, but I thought I'd give you all the info I have. Does anyone have any suggestions for me?
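
A note for readers following along: shell constructs such as ${test#leafNum} or a piped sed are not available inside an awk program, and in awk a leading "$" means "the field whose number is stored in this variable", not "the value of this variable". awk's own string functions (gsub, sub, substr, split) work on plain variables. A minimal sketch, assuming each page tag sits on one line exactly as in the sample above; the variable name num, the offset of 10 and the sample input are illustrative and not taken from the original script:

```shell
# Sample input: two page tags, shaped like the ones in the post above.
xml='<page leafNum="1">
<width>123</width>
</page>
<page leafNum="2">
<width>123</width>
</page>'

# num is an ordinary awk variable, so it is written without a dollar sign.
result=$(printf '%s\n' "$xml" | awk '
$2 ~ /^leafNum/ {
    num = $2                   # for example: leafNum="2">
    gsub(/[^0-9]/, "", num)    # drop everything that is not a digit
    print "<page leafNum=\"" (num + 10) "\">"
    next
}
{ print }                      # all other lines pass through unchanged
')
printf '%s\n' "$result"
```

Stripping non-digits is only safe while the field holds a single number; a field with two attributes would need a stricter pattern.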
 
Old 12-09-2008, 02:00 PM   #2
Aeiri
Member
 
Registered: Feb 2004
Posts: 307

Rep: Reputation: 30
If the input will always be:

Code:
<page leafNum="{numbers}">
With the newline at the end and the double quotes every time, you could do this:

Code:
$test=substr($test, 10, length($test)-11);
$test+=$X;
print "<page leafNum=\"" $test "\">";
And for the else statement:

Code:
else print $0;

Last edited by Aeiri; 12-09-2008 at 02:05 PM.
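
The offsets passed to substr follow from the fixed text around the number: leafNum=" is 9 characters, so the digits start at position 10, and the trailing "> adds 2 more, so the number of digits is the field length minus 11. A quick check of that arithmetic, assuming the field always looks like leafNum="630"> (the three sample values are illustrative):

```shell
# Run three sample fields through the same substr() call as above.
result=$(printf '%s\n' 'leafNum="7">' 'leafNum="630">' 'leafNum="9999">' | awk '
{
    s = $1
    print substr(s, 10, length(s) - 11)
}')
printf '%s\n' "$result"
```

If the attribute were ever followed by a second attribute, or closed with "/>", the fixed offsets would no longer line up.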
 
Old 12-09-2008, 02:00 PM   #3
Aeiri
Member
 
Registered: Feb 2004
Posts: 307

Rep: Reputation: 30
(somehow double posted, please ignore)

Last edited by Aeiri; 12-09-2008 at 02:02 PM.
 
Old 12-09-2008, 02:17 PM   #4
forrestt
Senior Member
 
Registered: Mar 2004
Location: Cary, NC, USA
Distribution: Fedora, Kubuntu, RedHat, CentOS, SuSe
Posts: 1,288

Rep: Reputation: 99
Here is a start. Remember, you will also still need to insert the xml data for the page(s) you are adding. The variable "newpage" is the position in the pages that you should be inserting the page (I used 2 for your above sample data).

Code:
awk '{
    if ($2 ~ /^leafNum/) {
        newpage = 2
        leafLine = $2
        split(leafLine, parts, "\"")
        if (parts[2] >= newpage) {
            parts[2]++
        }
        print "<page leafNum=\"" parts[2] "\">"
    } else
        print $0
}' filename.xml
HTH

Forrest
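
To see why parts[2] holds the number: splitting the field on the double-quote character yields three pieces, and the digits are the middle one. A small sketch that prints the pieces, using an illustrative sample field:

```shell
# Show what split() produces for one leafNum field.
result=$(printf '%s\n' 'leafNum="630">' | awk '
{
    n = split($1, parts, "\"")    # split on the double-quote character
    print n
    for (i = 1; i <= n; i++)
        print i ": " parts[i]
}')
printf '%s\n' "$result"
```

Unlike fixed substr offsets, this does not depend on how many digits the number has, only on the quotes being present.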
 
Old 12-09-2008, 02:37 PM   #5
Jude Terror
LQ Newbie
 
Registered: Dec 2008
Posts: 27

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by Aeiri
(somehow double posted, please ignore)

Genius! Thank you! It works like a charm. I can now proceed with the script. I'll post it when I'm done (or if I get stuck again). Thanks a lot.
 
Old 12-09-2008, 03:04 PM   #7
Jude Terror
LQ Newbie
 
Registered: Dec 2008
Posts: 27

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by forrestt
Here is a start. Remember, you will also still need to insert the xml data for the page(s) you are adding. The variable "newpage" is the position in the pages that you should be inserting the page (I used 2 for your above sample data).

Code:
awk '{if ($2 ~ /^leafNum/) {newpage=2; leafLine=$2; split(leafLine, parts, "\""); if ( parts[2] >= newpage ) { parts[2]++;} print "<page leafNum=\"" parts[2] "\">"; } else print $0 } ' filename.xml
HTH

Forrest

Thanks! I will try that out tomorrow (I'm about to head out for the night, so I have to finish tomorrow).

Right now, this is where I'm at using Aeiri's suggestions:

Code:
awk '{if ($2 ~ /^leafNum/)
	{
	$test=$2;
	$test=substr($test, 10, length($test)-11);
	if ($test >= 630)
		{
		$test=$test+2;
		print "    <page leafNum=\"" $test "\">";
		}
	else
		print "    <page leafNum=\"" $test "\">"
	}
else
{print $0}
    	}' filename.xml
Where "630" is the page I'm starting the inserts at and "2" is the number of pages. This works perfectly.

However, when I try to substitute a variable:

Code:
X=2
awk '{if ($2 ~ /^leafNum/)
	{
	$test=$2;
	$test=substr($test, 10, length($test)-11);
	if ($test >= 630)
		{
		$test=$test+$X;
		print "    <page leafNum=\"" $test "\">";
		}
	else
		print "    <page leafNum=\"" $test "\">"
	}
else
{print $0}
    	}' filename.xml
It seems to be multiplying the variables instead of adding them, so that when it reaches page 630, it outputs 1260 instead of 632. The only change I'm making between the two is using $test=$test+$X instead of $test=$test+2. I'm sure I'm missing something boneheaded here which will be plainly obvious in the morning...

Anyway, thanks for the help today, it broke through a two-day barrier for me.

Last edited by Jude Terror; 12-11-2008 at 08:33 AM.
 
Old 12-11-2008, 08:51 AM   #8
Jude Terror
LQ Newbie
 
Registered: Dec 2008
Posts: 27

Original Poster
Rep: Reputation: 0
Ok, so I've got my program pretty much completed to do all I need:

Code:
#!/bin/bash
filename="filename"
X=2
Y=4
awk '{if ($1 ~ /^<leafNum/)
	{
	$test=$1;
	$test=substr($test, 10, length($test)-19);
	if (($test + 0) >= 4)
		{
		$test=$test+2;
		print "        <leafNum>" $test "</leafNum>";
		}
	else
		print "        <leafNum>" $test "</leafNum>"
	}
else
{print $0}
	}' $filename"_scandata.xml" > "temp.xml"

awk '{if ($1 ~ /^<leafCount/)
	{
	$test=$1;
	$test=substr($test, 12, length($test)-23);
	$test=$test+2;
	print "    <leafCount>" $test "</leafCount>";
	}
else
{print $0}
	}' "temp.xml" > "temp2.xml"

awk '{if ($2 ~ /^leafNum/)
	{
	$test=$2;
	$test=substr($test, 10, length($test)-11);
	if (($test + 0) >= 4)
		{
		$test=$test+2;
		print "    <page leafNum=\"" $test "\">";
		}
	else
		print "    <page leafNum=\"" $test "\">"
	}
else
{print $0}
    	}' "temp2.xml" > $filename"_new.xml"

rm temp.xml
rm temp2.xml

#EOF
I've adapted it to edit two other fields in the XML as well, and it works perfectly.

The problem is, I cannot seem to use the variables defined in the beginning inside the awk statements. In other words, wherever I have "2" in the awk statements, I would like to use $X instead, and wherever I have 4, I would like to use $Y. This way, I can have a user define how many pages they are inserting ($X) and what page the insertion goes at ($Y) through prompts, rather than having to edit the numbers into the code manually. However, when I replace 2 with $X in the statement "$test=$test+2;", which is part of the awk statement, instead of adding two to the value of $test, it appears to double the value of $test. Is there a special way for me to word this statement to add two variables ($test=$test+$X doesn't seem to work)? I've even tried putting the value of $X into a new variable in the awk statement itself (awk -v num=$x), but that does not seem to work either. Any ideas?
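
Two things commonly trip up a -v attempt like the one described; this is a guess at the cause, not a diagnosis. Shell variable names are case-sensitive, so $x is empty when only X was set, and inside the awk program the variable has to be written num, not $num. A minimal sketch (the name num comes from the post; the input value 630 is illustrative):

```shell
X=2
x=""   # stands in for the lower-case x, which the script never assigns

# Lower-case $x expands to nothing, so num is empty and counts as 0.
wrong_case=$(echo 630 | awk -v num="$x" '{ print $0 + num }')

# $num inside awk means "field number num", here field 2, which is empty.
wrong_sigil=$(echo 630 | awk -v num="$X" '{ print $0 + $num }')

# Upper-case $X in the shell, plain num inside awk.
right=$(echo 630 | awk -v num="$X" '{ print $0 + num }')

echo "$wrong_case $wrong_sigil $right"
```

Only the last form adds the shell value; the first two leave 630 unchanged.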
 
Old 12-11-2008, 11:03 AM   #9
Jude Terror
LQ Newbie
 
Registered: Dec 2008
Posts: 27

Original Poster
Rep: Reputation: 0
Finally figured this out myself - I have to assign the variables in the awk statement (awk -v varx=$X -v vary=$Y) and call them inside the awk statement without the "$" (test=$test+varx). It works now.
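
For anyone adapting this, the three awk passes and the temporary files can probably be folded into one pass that uses -v and plain awk variables throughout. The following is a sketch under stated assumptions, not the poster's final script: the names count, start and first_number are made up for illustration, the sample input only imitates the structure described in the thread, and each tag is assumed to sit on its own line with a single number in it.

```shell
#!/bin/bash
# Sketch only: shift leaf numbers in a single awk pass.
count=2   # number of pages being inserted (X in the thread)
start=4   # first leaf number that has to move (Y in the thread)

# Small stand-in for filename_scandata.xml; the structure is assumed.
input='    <leafCount>5</leafCount>
    <page leafNum="3">
        <leafNum>3</leafNum>
    </page>
    <page leafNum="4">
        <leafNum>4</leafNum>
    </page>'

result=$(printf '%s\n' "$input" | awk -v count="$count" -v start="$start" '
# Return the first run of digits in s as a number, or -1 if there is none.
function first_number(s) {
    if (match(s, /[0-9]+/))
        return substr(s, RSTART, RLENGTH) + 0
    return -1
}
/<leafCount>/ {
    n = first_number($0)
    sub(/[0-9]+/, n + count)       # the total grows by the inserted pages
}
/<leafNum>/ || /<page leafNum="/ {
    n = first_number($0)
    if (n >= start)
        sub(/[0-9]+/, n + count)   # shift only leaves at or after start
}
{ print }                          # print every line, changed or not
')
printf '%s\n' "$result"
```

Editing the number in place with sub() keeps each line's original indentation, so the leading spaces do not have to be hard-coded in the print statements.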
 
  

