Hello, I'm new to the forums, though I've used them as a reference many times in the past and found them extremely helpful. A little background info on my skills - I've been teaching myself Linux shell scripting as well as PHP through reverse engineering. I'm pretty good at looking at how something is done and then feeling my way through creating a script to do what I need, but I can't just write one off the top of my head - yet.
Anyway, I've been working on this problem for two days, and I can't seem to figure out a solution. I am working with a program that uses an XML file to process JPEG images. In many cases, one set of files may contain several hundred JPEG images, which are ordered into pages by the XML file. For example, there may be filename_0001.jpg, filename_0002.jpg, filename_0003.jpg, etc., and inside the XML file there is a page element holding the data for each image:
Code:
<page leafNum="1">
<width>123</width>
<height>123</height>
</page>
<page leafNum="2">
<width>123</width>
<height>123</height>
</page>
...and so on, but with many more fields inside the page tags, and with hundreds or thousands of pages.
Now, I have run into some situations where I need to insert a page or two into the mix, which would be a major pain, since it would require changing the "leafNum" of every single page after the insertion point in the XML file to make space, as well as renaming each of the JPEG files in the same fashion. So I decided that I should devote some time to automating this process with shell scripts.
Now, I WAS able to successfully create a script to rename the jpeg files, adding a number of my choice onto the file numbering. However, I am hitting a wall with the XML.
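For reference, the rename step could look roughly like the sketch below. It is not the actual script, only an illustration: the function name shift_jpegs and its four arguments are hypothetical, and it assumes four-digit numbering as in filename_0001.jpg, a prefix without regex special characters, and file names without newlines.

```shell
# shift_jpegs DIR PREFIX START OFFSET   (hypothetical helper, not the original script)
# Adds OFFSET to the number of every PREFIX_NNNN.jpg in DIR whose number is >= START.
shift_jpegs() {
    dir=$1 prefix=$2 start=$3 offset=$4
    # Work from the highest number down so a rename never lands on a file
    # that is still waiting to be moved.
    ls "$dir" | grep "^${prefix}_[0-9][0-9][0-9][0-9]\.jpg\$" | sort -r |
    while read -r f; do
        n=${f#"${prefix}_"}                 # drop the prefix: 0003.jpg
        n=${n%.jpg}                         # drop the extension: 0003
        n=$(echo "$n" | sed 's/^0*//')      # strip leading zeros so 0008 is not read as octal
        n=${n:-0}                           # "0000" becomes empty above, so restore 0
        if [ "$n" -ge "$start" ]; then
            new=$(printf '%s_%04d.jpg' "$prefix" $((n + offset)))
            mv "$dir/$f" "$dir/$new"
        fi
    done
}
```

Calling shift_jpegs . filename 2 2 would leave filename_0001.jpg alone and move filename_0002.jpg and filename_0003.jpg to filename_0004.jpg and filename_0005.jpg.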
What I have right now is this:
Code:
awk '{if ($2 ~ /^leafNum/)
{$test=$2;
print $test}
}' filename.xml
The print line exists just so that I can see the output as I try to figure this out.
This works to identify the lines that need to be changed - those containing "leafNum". It then takes the "leafNum="###">" string from that line and sticks it in the $test variable for me, which I hoped to use to manipulate it. I tried several methods of string manipulation on the $test variable, including sed and simply using things like $test=${test#leafNum}, to no avail - errors every time. I'm not sure if they just don't work inside an awk command, or if they require a different syntax. I also tried putting the awk command inside a while loop that processed the file line by line, echoing the read variable into the awk command to search for the leafNum, but I couldn't get the syntax right on that either.
I need to cut that $test variable, which currently reads "leafNum="###">" (where ### can be any integer from 0 to 9999, without any leading zeros), down to just the number so that I can add a variable $X to it (a number which I will define based on how many pages I am inserting), and then rewrite the line in the XML file - something like:
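A likely reason for those errors: sed and ${test#leafNum} are shell tools, while everything between the single quotes is an awk program with its own syntax. In awk, variables are written without a leading $, and $test means "the field whose number is stored in test". Since test starts out empty, $test is the same as $0, so $test=$2 overwrites the whole line with the second field, which is why the print appears to work. The sketch below shows the digits being pulled out with awk's own gsub function; the wrapper name leaf_digits is hypothetical and only there to make the snippet self-contained.

```shell
# leaf_digits: hypothetical wrapper that prints only the digits of its argument.
leaf_digits() {
    printf '%s\n' "$1" | awk '{
        test = $1                   # plain awk variable: no leading $
        gsub(/[^0-9]/, "", test)    # delete every character that is not a digit
        print test
    }'
}

leaf_digits 'leafNum="123">'    # prints 123
```

Once the value holds only digits, awk treats it as a number in arithmetic, so adding an offset to it needs no further conversion.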
Code:
print "<page leafNum="$test">"
...though I haven't gotten around to working that part out either, as I'm stuck on this first part. So here is the plan for my script:
Code:
awk '{if ($2 ~ /^leafNum/)
{$test=$2;
(manipulate this variable to just the numbers, and add to it)
(rewrite the xml line)
}
(ELSE print the line as normal)
}' filename.xml
(output to new file)
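The plan above could be sketched as follows. This is one possible approach under stated assumptions, not a tested solution for the real files: the function name renumber_leaves, the start argument (first leafNum to shift) and the output name filename_new.xml are hypothetical, and it assumes each page tag sits on its own line with leafNum as the second whitespace-separated field, as in the sample.

```shell
# renumber_leaves START X   (hypothetical helper)
# Reads XML on stdin, adds X to every leafNum >= START, writes the result to stdout.
renumber_leaves() {
    awk -v start="$1" -v X="$2" '
    $2 ~ /^leafNum=/ {
        num = $2
        gsub(/[^0-9]/, "", num)             # keep only the digits
        if (num + 0 >= start + 0)           # "+ 0" forces a numeric comparison
            sub(/leafNum="[0-9]+"/, "leafNum=\"" (num + X) "\"")
    }
    { print }                               # print every line, changed or not
    '
}

# Example: make room for 3 new pages starting at leaf 2
# renumber_leaves 2 3 < filename.xml > filename_new.xml
```

Using sub on the whole line, rather than rebuilding it with print, keeps the original indentation and any other attributes on the page tag. Writing to a new file and checking it before replacing the original avoids damaging the XML if the pattern does not match as expected.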
So, sorry about the really long post, but I thought I'd give you all the info I have. Does anyone have any suggestions for me?