LinuxQuestions.org


cryingthug 01-21-2012 04:07 PM

Using Awk on an Unusual Character?
 
I need to process the fifth line of this text using awk:

Today Jan 21 Sun 22 Mon 23 Tue 24 Wed 25
[74]Light Wintry Mix [75]Partly Cloudy [76]Showers [77]Sunny [78]Partly
Cloudy
Light Wintry Mix Partly Cloudy Showers Sunny Partly Cloudy
30°FHigh 35°High 52°High 45°High 43°High
18°Low 33°Low 39°Low 31°Low 30°Low
Chance of Precip:
100% Chance of Precip:
20% Chance of Rain:
60% Chance of Rain:
0% Chance of Rain:
10%


Notice there are small circles between the numbers and the letters on the fifth line.

I have tried this:

Code:

awk 'BEGIN {print "\n\t\t\b\b\b\b\b\bTHE FIVE DAY WEATHER REPORT\n"} \
/[0-9][0-9]*[a-z|A-Z]*/{print $1"\t\t"$2" "$3"\t\t"$4" "$5"\n"}' 2>> error.txt

This gives too much info. I need to narrow it down to only the fifth line.

Nominal Animal 01-21-2012 05:13 PM

° is the degree character. The actual byte sequence depends on which character set is used. In UTF-8 it is \xC2\xB0, and in ISO-8859-1, ISO-8859-15 and Windows-1252 it is \xB0.
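To find out which of these encodings your file actually uses, you can dump the bytes around the symbol. A minimal sketch, assuming a POSIX shell with od and tr; hex_bytes and the two sample strings are made up for illustration:

```shell
# Hypothetical helper: print the bytes of its argument as one hex string.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \t\n'
}

# "30°F" as UTF-8: the degree sign is the two bytes C2 B0 (octal 302 260)
utf8_sample=$(printf '30\302\260F')
# The same text as ISO-8859-1: the degree sign is the single byte B0
latin1_sample=$(printf '30\260F')

hex_bytes "$utf8_sample"; echo      # prints 3330c2b046
hex_bytes "$latin1_sample"; echo    # prints 3330b046
```

On a real file, something like `od -c input-file` shows the same bytes in context.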

Let's assume you don't know the character set, and that you're only interested in getting the five numeric values (and nothing else) from the file using awk. The solution is simple: use a field separator that swallows everything from the degree symbol through the word "High", and only handle the first line that contains such separators. Note that because a separator follows every value, including the last one, awk sees an empty trailing field, so NF will be one more than the number of temperatures. (A regular expression in RS works in gawk and mawk; POSIX leaves multi-character RS unspecified.)

Code:

awk 'BEGIN { RS="[\r\n]+"; FS="[^-+0-9.,]+[Hh][Ii][Gg][Hh][\t\v\f ]*"; }
  (NF > 1) { printf("%s", $1);
            for (i = 2; i < NF; i++) printf("\t%s", $i);
            printf("\n");
            exit(0);
          }' input-file > output-file
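To see why NF is one more than the number of temperatures, it helps to print the fields for a single line. A small sketch, assuming a POSIX shell and any awk that accepts a regular expression as FS; show_fields and the three-value sample line are made up for illustration, and the whitespace class is simplified to space and tab:

```shell
# A shortened sample line, with the degree sign written as UTF-8 bytes.
sample=$(printf '30\302\260FHigh 35\302\260High 52\302\260High')

# Hypothetical helper: split stdin with the "High" separator and list every field.
show_fields() {
    awk 'BEGIN { FS = "[^-+0-9.,]+[Hh][Ii][Gg][Hh][ \t]*" }
         { printf("NF=%d\n", NF)
           for (i = 1; i <= NF; i++) printf("field %d: [%s]\n", i, $i)
         }'
}

printf '%s\n' "$sample" | show_fields
```

For this three-value line it reports NF=4: fields 1 to 3 hold 30, 35 and 52, and field 4 is the empty string after the final separator. That is why the loop above stops before NF.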

If you want to record both the high and the low temperatures, and output them on separate lines, use
Code:

awk 'BEGIN { RS="[\r\n]+"; FS="[^-+0-9.,]+([Hh][Ii][Gg][Hh]|[Ll][Oo][Ww])[\t\v\f ]*"; }
  (NF > 1) { if ($0 ~ /[Hh][Ii][Gg][Hh]/) {
                split("", hi);              # clear any highs stored earlier
                nhi = NF-1;                 # the last field is the empty one after the final separator
                for (i = 1; i <= nhi; i++) hi[i] = $i;
            } else
            if ($0 ~ /[Ll][Oo][Ww]/) {
                split("", lo);              # clear any lows stored earlier
                nlo = NF-1;
                for (i = 1; i <= nlo; i++) lo[i] = $i;
            }
          }
      END { printf("%s", hi[1]);            # first output line: highs, tab-separated
            for (i = 2; i <= nhi; i++) printf("\t%s", hi[i]);
            printf("\n");
            printf("%s", lo[1]);            # second output line: lows, tab-separated
            for (i = 2; i <= nlo; i++) printf("\t%s", lo[i]);
            printf("\n");
          }' input-file > output-file

If you want to output the values as low/high pairs on a single line (low1 high1 low2 high2 ... lowN highN), use this END rule
Code:

      END { n = nhi; if (nlo > n) n = nlo;
            printf("%s\t%s", lo[1], hi[1]);
            for (i = 2; i <= n; i++) printf("\t%s\t%s", lo[i], hi[i]);
            printf("\n")
          }

instead.
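If the temperatures are always on the fifth line, as in the sample above, selecting by record number is a simpler alternative. A rough sketch, assuming a POSIX shell; fifth_line_numbers is a made-up helper name, and unlike the scripts above it breaks as soon as the layout shifts by a line:

```shell
# Hypothetical helper: print the numeric values found on line 5, tab-separated.
# Reads the files named as arguments, or standard input when there are none.
fifth_line_numbers() {
    awk 'NR == 5 {
        gsub(/[^-+0-9.,]+/, "\t")   # turn every non-numeric run into one tab
        sub(/\t$/, "")              # drop the tab left behind by the last "High"
        print
        exit
    }' "$@"
}

# Four filler lines followed by a shortened temperature line (degree sign as UTF-8).
printf 'a\nb\nc\nd\n30\302\260FHigh 35\302\260High\n' | fifth_line_numbers
```

With a real file it would be called as `fifth_line_numbers input-file > output-file`.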

cryingthug 01-21-2012 05:25 PM

Thanks!
 
Wow, that is amazing. Thank you for explaining all of that to me.


All times are GMT -5.