LinuxQuestions.org


keogk 01-04-2010 03:51 PM

Grep and AWK
 
I have an XML file that I want to pull a number out of.
The file looks like this:
<WEBSERVER>
<actual>
<temp>
27.855
</temp>
<timestamp>
04.01.2010
</timestamp>
</actual>

Then it starts over again, but with item 1, 2, 3, etc. instead of actual.
I want to pull just the number between <temp> and </temp>.

That term appears a few times in the document as well.
Would I use grep or awk to do this?

ozanbaba 01-04-2010 04:04 PM

Quote:

Originally Posted by keogk (Post 3813997)
I have an XML file that I want to pull a number out of.
The file looks like this:
<WEBSERVER>
<actual>
<temp>
27.855
</temp>
<timestamp>
04.01.2010
</timestamp>
</actual>

Then it starts over again, but with item 1, 2, 3, etc. instead of actual.
I want to pull just the number between <temp> and </temp>.

That term appears a few times in the document as well.
Would I use grep or awk to do this?

Are they on the same line?

colucix 01-04-2010 04:11 PM

Quote:

Originally Posted by keogk (Post 3813997)
Would I use grep or awk to do this?

awk or sed or even perl as in:
Code:

while (<>) {
  if (/<temp>/../<\/temp>/) {
    next if /<temp>/ || /<\/temp>/;
    print;
  }
}


keogk 01-04-2010 04:13 PM

No, each piece of code is on its own line.
The number I need is always on line 4 and is always the only thing on line 4.

carolh 01-04-2010 04:15 PM

re: grep and awk
 
I would use both awk and grep, like this:
$ cat YourFile | awk '/<temp>/,/<\/temp>/' | grep -v temp

where the awk command prints all lines between each
<temp> and </temp> pair, including the <temp> and </temp> lines themselves.
So:
grep -v temp
removes those lines.

syg00 01-04-2010 05:02 PM

Do it all in one call
Code:

awk '/<temp/,/<\/temp/ {if ($0 !~ /temp/) {print } }'  temp.txt
Similarly, the perl above can be reduced to a one-liner.

ghostdog74 01-04-2010 06:45 PM

Quote:

Originally Posted by keogk (Post 3814022)
No, each piece of code is on its own line.
The number I need is always on line 4 and is always the only thing on line 4.

Code:

awk 'NR==4' file

ghostdog74 01-04-2010 06:47 PM

Quote:

Originally Posted by carolh (Post 3814025)
I would use both awk and grep, like this:
$ cat YourFile | awk '/<temp>/,/<\/temp>/' | grep -v temp

where the awk command prints all lines between each
<temp> and </temp> pair, including the <temp> and </temp> lines themselves.
So:
grep -v temp
removes those lines.

1) No need to use cat; awk can read the file itself.
2) Use grep + awk on BIG files.
3) Other than that, just awk will do.
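
As a rough sketch of point 1: awk accepts the file name as an argument, so the same pipeline works without cat. The sample file below is an assumption built from the excerpt in the first post (the second block, called item1 here, and the closing </WEBSERVER> are made up for illustration).

```shell
# Hypothetical sample file: one <actual> block and one made-up <item1> block.
sample=$(mktemp)
printf '%s\n' '<WEBSERVER>' '<actual>' '<temp>' '27.855' '</temp>' \
  '<timestamp>' '04.01.2010' '</timestamp>' '</actual>' \
  '<item1>' '<temp>' '26.100' '</temp>' '</item1>' '</WEBSERVER>' > "$sample"

# awk reads the file directly; grep then drops the <temp> and </temp> lines.
temps=$(awk '/<temp>/,/<\/temp>/' "$sample" | grep -v temp)
printf '%s\n' "$temps"
```

This prints every temperature in the file, one per line, not just the one under actual.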

pixellany 01-04-2010 07:44 PM

Quote:

Originally Posted by keogk (Post 3814022)
No, each piece of code is on its own line.
The number I need is always on line 4 and is always the only thing on line 4.

You are contradicting what you said in the first post. If the number is always on line 4, then all you need is:
Code:

sed -n '4p' filename
If it is in fact what you first said, then try this:
Code:

sed -n '/<temp>/,/<\/temp>/{/^[0-9]/p}' filename
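
Since the first post says <temp> appears several times, here is a sketch of a middle ground, assuming (the thread does not say) that only the reading inside the actual block is wanted. The sample file and the item1 block name are made up for illustration.

```shell
# Hypothetical sample file with two <temp> readings.
sample=$(mktemp)
printf '%s\n' '<WEBSERVER>' '<actual>' '<temp>' '27.855' '</temp>' '</actual>' \
  '<item1>' '<temp>' '26.100' '</temp>' '</item1>' '</WEBSERVER>' > "$sample"

# Track whether we are inside <actual>...</actual>. When <temp> is seen
# there, read the next line (the number), print it, and stop reading.
actual_temp=$(awk '/<actual>/{inblock=1} /<\/actual>/{inblock=0}
  inblock && /<temp>/{ if ((getline line) > 0) print line; exit }' "$sample")
printf '%s\n' "$actual_temp"
```

Unlike printing line 4, this does not depend on where the actual block sits in the file.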

sundialsvcs 01-04-2010 07:55 PM

"Tools for the job."

Perl provides a very large library of XML-support routines... all of them thoroughly tested.

Use one to read the XML file and then to apply an "XPath expression" to automagically select from it exactly the nodes that you want. Then, output the results as you please.
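
To illustrate the XPath idea from the shell, here is a sketch that swaps the Perl modules for xmllint (part of libxml2, and it may not be installed on every system). It assumes the file is well-formed XML, so the sample adds a closing </WEBSERVER> that the excerpt in the first post does not show; the item1 block is also made up.

```shell
# Hypothetical, well-formed sample file.
sample=$(mktemp)
printf '%s\n' '<WEBSERVER>' '<actual>' '<temp>' '27.855' '</temp>' '</actual>' \
  '<item1>' '<temp>' '26.100' '</temp>' '</item1>' '</WEBSERVER>' > "$sample"

# Select the temp element under actual; normalize-space() strips the
# newlines that surround the number in the file.
if command -v xmllint >/dev/null 2>&1; then
  xpath_temp=$(xmllint --xpath 'normalize-space(/WEBSERVER/actual/temp)' "$sample")
  printf '%s\n' "$xpath_temp"
else
  echo "xmllint not installed; skipping" >&2
fi
```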

The Unix/Linux environments provide you with "an embarrassment of riches" in terms of "possible ways to do it." What you want to find, then, is the best way.

Quite frankly, IMHO, Perl usually is that "best way," hands down. And the reason for this is the astounding "CPAN" library.

ghostdog74 01-04-2010 08:03 PM

Ideally, that should be the case: using libraries to do the job. But for this simple case, there's no need to. It's not that complicated a task.


All times are GMT -5.