01-21-2012, 04:07 PM   #1
cryingthug
Using Awk on an Unusual Character?


I need to process the fifth line of this text using awk:

Today Jan 21 Sun 22 Mon 23 Tue 24 Wed 25
[74]Light Wintry Mix [75]Partly Cloudy [76]Showers [77]Sunny [78]Partly
Cloudy
Light Wintry Mix Partly Cloudy Showers Sunny Partly Cloudy
30°FHigh 35°High 52°High 45°High 43°High
18°Low 33°Low 39°Low 31°Low 30°Low
Chance of Precip:
100% Chance of Precip:
20% Chance of Rain:
60% Chance of Rain:
0% Chance of Rain:
10%


Notice there are small circles between the numbers and the letters on the fifth line.

I have tried this:

Code:
awk 'BEGIN {print "\n\t\t\b\b\b\b\b\bTHE FIVE DAY WEATHER REPORT\n"} \
/[0-9][0-9]*[a-z|A-Z]*/{print $1"\t\t"$2" "$3"\t\t"$4" "$5"\n"}' 2>> error.txt
This gives too much info. I need to narrow it down to only the fifth line.
 
01-21-2012, 05:13 PM   #2
Nominal Animal
° is the degree character. The actual bytes depend on which character encoding is used. In UTF-8 it is \xC2\xB0, and in ISO-8859-1, ISO-8859-15 and Windows-1252 it is \xB0.
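If you want to see those bytes for yourself, here is a minimal sketch using printf and od. The function names degree_utf8 and degree_latin1 are made up for this illustration; running od -c on your real input file in the same way should show which encoding it uses.

```shell
# Hypothetical helpers: emit the degree sign in each encoding and dump it as hex.
# Octal escapes (\302\260 = 0xC2 0xB0) are used because printf handles them portably.
degree_utf8()   { printf '\302\260' | od -An -tx1 | tr -d ' \n'; }   # UTF-8: two bytes
degree_latin1() { printf '\260'     | od -An -tx1 | tr -d ' \n'; }   # ISO-8859-1: one byte

echo "UTF-8:      $(degree_utf8)"
echo "ISO-8859-1: $(degree_latin1)"
```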

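Let's assume you don't know the character set, and that you're only interested in getting the five numeric values (and nothing else) from the file using awk. The solution is simple: use a field separator that matches everything between the numbers (the degree sign, the optional F, the word "High", and any trailing whitespace), and only handle the first line that contains such separators. Note that because a separator follows every value, including the last one, NF will be one more than the number of temperatures. (A regular expression in RS works in gawk and mawk; POSIX leaves a multi-character RS unspecified.)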

Code:
awk 'BEGIN { RS="[\r\n]+"; FS="[^-+0-9.,]+[Hh][Ii][Gg][Hh][\t\v\f ]*"; }
  (NF > 1) { printf("%s", $1);
             for (i = 2; i < NF; i++) printf("\t%s", $i);
             printf("\n");
             exit(0);
           }' input-file > output-file
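To see why NF ends up one larger than the number of temperatures, here is a small sketch that runs the same field separator over the sample line and prints every field in brackets. The helper name show_fields is hypothetical, the degree sign is assumed to be UTF-8 (written as octal escapes), and RS is left at its default because the input is a single line.

```shell
# Hypothetical helper: split the sample "High" line and report NF plus each field.
show_fields() {
    printf '30\302\260FHigh 35\302\260High 52\302\260High 45\302\260High 43\302\260High\n' |
    awk 'BEGIN { FS = "[^-+0-9.,]+[Hh][Ii][Gg][Hh][\t ]*" }
         { printf("NF=%d", NF)
           for (i = 1; i <= NF; i++) printf(" [%s]", $i)   # brackets make empty fields visible
           printf("\n") }'
}
show_fields
```

The last bracket pair should come out empty: the final "°High" is itself a separator, so awk counts an empty field after it.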
If you want to record both the high and the low temperatures and output them on separate lines, use
Code:
awk 'BEGIN { RS="[\r\n]+"; FS="[^-+0-9.,]+([Hh][Ii][Gg][Hh]|[Ll][Oo][Ww])[\t\v\f ]*"; }
  (NF > 1) { # The separator follows every value, so the last field is empty
             # and there are NF-1 temperatures on the line.
             if ($0 ~ /[Hh][Ii][Gg][Hh]/) {
                 split("", hi);
                 nhi = NF-1;
                 for (i = 1; i <= nhi; i++) hi[i] = $i;
             } else
             if ($0 ~ /[Ll][Oo][Ww]/) {
                 split("", lo);
                 nlo = NF-1;
                 for (i = 1; i <= nlo; i++) lo[i] = $i;
             }
           }
       END { printf("%s", hi[1]);
             for (i = 2; i <= nhi; i++) printf("\t%s", hi[i]);
             printf("\n");
             printf("%s", lo[1]);
             for (i = 2; i <= nlo; i++) printf("\t%s", lo[i]);
             printf("\n");
           }' input-file > output-file
If you want to output each high and low value in pairs (low1 high1 low2 high2 ... lowN highN), use
Code:
       END { n = nhi; if (nlo > n) n = nlo;
             printf("%s\t%s", lo[1], hi[1]);
             for (i = 2; i <= n; i++) printf("\t%s\t%s", lo[i], hi[i]);
             printf("\n")
           }
instead.
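As a rough end-to-end check of the paired output, the sketch below feeds the two temperature lines from the sample into a condensed version of the same idea. It is not the exact script above: the helper name low_high_pairs is hypothetical, the input is assumed to be UTF-8 with plain newline line endings (so RS is left at its default), and the arrays are filled with i <= NF-1 so the last temperature on each line is included.

```shell
# Hypothetical helper: print low/high pairs (low1 high1 low2 high2 ...) for the sample data.
low_high_pairs() {
    printf '30\302\260FHigh 35\302\260High 52\302\260High 45\302\260High 43\302\260High\n18\302\260Low 33\302\260Low 39\302\260Low 31\302\260Low 30\302\260Low\n' |
    awk 'BEGIN { FS = "[^-+0-9.,]+([Hh][Ii][Gg][Hh]|[Ll][Oo][Ww])[\t ]*" }
         # NF-1 values per line, because the trailing separator adds one empty field.
         NF > 1 && /[Hh][Ii][Gg][Hh]/ { nhi = NF - 1; for (i = 1; i <= nhi; i++) hi[i] = $i; next }
         NF > 1 && /[Ll][Oo][Ww]/     { nlo = NF - 1; for (i = 1; i <= nlo; i++) lo[i] = $i }
         END { n = (nhi > nlo) ? nhi : nlo
               for (i = 1; i <= n; i++)
                   printf("%s%s\t%s", (i > 1 ? "\t" : ""), lo[i], hi[i])
               printf("\n") }'
}
low_high_pairs
```

With the sample data this should print ten tab-separated numbers, starting with 18 and 30 (the first low and the first high).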

01-21-2012, 05:25 PM   #3
cryingthug
Original Poster
Thanks!

Wow, that is amazing. Thank you for explaining all of that to me.
 
  

