Old 01-21-2012, 03:07 PM   #1
Using Awk on an Unusual Character?

I need to process the fifth line of this text using awk:

Today Jan 21 Sun 22 Mon 23 Tue 24 Wed 25
[74]Light Wintry Mix [75]Partly Cloudy [76]Showers [77]Sunny [78]Partly
Light Wintry Mix Partly Cloudy Showers Sunny Partly Cloudy
30°FHigh 35°High 52°High 45°High 43°High
18°Low 33°Low 39°Low 31°Low 30°Low
Chance of Precip:
100% Chance of Precip:
20% Chance of Rain:
60% Chance of Rain:
0% Chance of Rain:

Notice there are small circles between the numbers and the letters on the fifth line.

I have tried this:

awk 'BEGIN {print "\n\t\t\b\b\b\b\b\bTHE FIVE DAY WEATHER REPORT\n"} \
/[0-9][0-9]*[a-z|A-Z]*/{print $1"\t\t"$2" "$3"\t\t"$4" "$5"\n"}' 2>> error.txt
This gives too much info. I need to narrow it down to only the fifth line.
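
If the line of interest is known only by its position, one hedged way to select it is awk's built-in record counter NR. The sketch below assumes a POSIX shell and awk; nth_line is a hypothetical helper name, not something from the thread.

```shell
# Hypothetical helper: print only line number $1 of standard input.
nth_line() {
    awk -v n="$1" 'NR == n { print; exit }'
}

# Usage with the first lines of the sample text: prints the fifth line.
nth_line 5 <<'EOF'
Today Jan 21 Sun 22 Mon 23 Tue 24 Wed 25
[74]Light Wintry Mix [75]Partly Cloudy [76]Showers [77]Sunny [78]Partly
Light Wintry Mix Partly Cloudy Showers Sunny Partly Cloudy
30°FHigh 35°High 52°High 45°High 43°High
18°Low 33°Low 39°Low 31°Low 30°Low
Chance of Precip:
EOF
```

Selecting by position breaks as soon as the layout of the page changes, which is why matching on the content of the line is usually the more robust choice.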
Old 01-21-2012, 04:13 PM   #2
Nominal Animal
° is the degree sign. Its byte encoding depends on the character set in use: in UTF-8 it is \xC2\xB0, and in ISO-8859-1, ISO-8859-15 and Windows-1252 it is \xB0.
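
As a quick check of those byte values, the sketch below prints the degree sign's encoding in both cases. It assumes a POSIX shell with printf and od; the octal escapes \302\260 and \260 are simply 0xC2 0xB0 and 0xB0 written in the form printf accepts portably.

```shell
# UTF-8 encoding of the degree sign: two bytes, 0xC2 0xB0
printf '\302\260' | od -An -tx1
# ISO-8859-1 / ISO-8859-15 / Windows-1252 encoding: one byte, 0xB0
printf '\260' | od -An -tx1
```

Running od -An -tx1 on the real input file in the same way should show which of the two encodings it uses.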

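Let's assume you don't know the character set, and that you're only interested in getting the five numeric values (and nothing else) from the file using awk. The solution is simple: use a field separator that swallows everything between one number and the next (the degree sign, an optional unit letter, the word "High", and any trailing whitespace), and only handle the first line that contains such separators. Because a separator also follows the last value, NF will be one more than the number of temperatures. Note that a regular-expression RS, used here to accept both LF and CRLF line endings, works in gawk and mawk but is not guaranteed by POSIX.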

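awk 'BEGIN { RS="[\r\n]+"; FS="[^-+0-9.,]+[Hh][Ii][Gg][Hh][\t\v\f ]*"; }
  (NF > 1) { printf("%s", $1);                              # first temperature
             for (i = 2; i < NF; i++) printf("\t%s", $i);   # the rest; field NF is the empty tail
             printf("\n");                                  # end the output line
             exit;                                          # stop after the first matching line
           }' input-file > output-file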
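
To see why NF ends up one larger than the number of temperatures, the following sketch prints every field that the "High" separator produces for the sample line. It is an illustration only: show_fields is a hypothetical helper, the whitespace class is reduced to tab and space, and the default line-based RS is used so that it runs in any POSIX awk.

```shell
# Hypothetical helper: show how awk splits each input line with the "High" separator.
show_fields() {
    awk 'BEGIN { FS = "[^-+0-9.,]+[Hh][Ii][Gg][Hh][\t ]*" }
         { printf("NF=%d", NF)
           for (i = 1; i <= NF; i++) printf(" [%s]", $i)
           printf("\n") }'
}

printf '%s\n' '30°FHigh 35°High 52°High 45°High 43°High' | show_fields
# prints: NF=6 [30] [35] [52] [45] [43] []
```

The empty sixth field is what remains after the final separator, which is why the loops over the temperatures stop at NF-1.
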
If you want to record both the high and the low temperatures, and output them on separate lines, use
awk 'BEGIN { RS="[\r\n]+"; FS="[^-+0-9.,]+([Hh][Ii][Gg][Hh]|[Ll][Oo][Ww])[\t\v\f ]*"; }
  (NF > 1) { if ($0 ~ /[Hh][Ii][Gg][Hh]/) {
                 split("", hi);                           # clear the array
                 nhi = NF-1;                              # field NF is the empty tail
                 for (i = 1; i <= nhi; i++) hi[i] = $i;
             } else
             if ($0 ~ /[Ll][Oo][Ww]/) {
                 split("", lo);
                 nlo = NF-1;
                 for (i = 1; i <= nlo; i++) lo[i] = $i;
             }
           }
       END { printf("%s", hi[1]);
             for (i = 2; i <= nhi; i++) printf("\t%s", hi[i]);
             printf("\n%s", lo[1]);                       # lows go on their own line
             for (i = 2; i <= nlo; i++) printf("\t%s", lo[i]);
             printf("\n");
           }' input-file > output-file
If you want to output each low and high value in pairs (low1 high1 low2 high2 ... lowN highN), replace the END rule with
       END { n = nhi; if (nlo > n) n = nlo;
             printf("%s\t%s", lo[1], hi[1]);
             for (i = 2; i <= n; i++) printf("\t%s\t%s", lo[i], hi[i]);
             printf("\n");
           }' input-file > output-file
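
For reference, here is a self-contained sketch of the paired output wrapped in a shell function, so it can be tried on the sample lines directly. It is a variation on the approach above, not the original code: low_high_pairs is a hypothetical name, the whitespace class is reduced to tab and space, and a trailing carriage return is stripped with sub() instead of relying on a regular-expression RS, so that it also runs in awks without that extension.

```shell
# Hypothetical helper: read the forecast on stdin and print
# low1 high1 low2 high2 ... as one tab-separated line.
low_high_pairs() {
    awk 'BEGIN { FS = "[^-+0-9.,]+([Hh][Ii][Gg][Hh]|[Ll][Oo][Ww])[\t ]*" }
         { sub(/\r$/, "") }    # tolerate CRLF line endings; $0 is re-split afterwards
         NF > 1 && /[Hh][Ii][Gg][Hh]/ { nhi = NF - 1; for (i = 1; i <= nhi; i++) hi[i] = $i; next }
         NF > 1 && /[Ll][Oo][Ww]/     { nlo = NF - 1; for (i = 1; i <= nlo; i++) lo[i] = $i }
         END { n = (nhi > nlo) ? nhi : nlo
               for (i = 1; i <= n; i++)
                   printf("%s%s\t%s", (i > 1 ? "\t" : ""), lo[i], hi[i])
               printf("\n") }'
}

printf '%s\n' '30°FHigh 35°High 52°High 45°High 43°High' \
              '18°Low 33°Low 39°Low 31°Low 30°Low' | low_high_pairs
```

With the two sample lines this should print the ten values in the order 18 30 33 35 39 52 31 45 30 43, separated by tabs.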

Last edited by Nominal Animal; 01-21-2012 at 04:40 PM. Reason: removed a duplicate "use" in the description.
Old 01-21-2012, 04:25 PM   #3
Original Poster

Wow, that is amazing. Thank you for explaining all of that to me.


awk, bash
