LinuxQuestions.org
03-30-2012, 02:47 PM   #16
KlausMu
Original Poster

Quote:
Originally Posted by PTrenholme
This slight rewrite might make NominalAnimal's code somewhat easier to follow, maintain and write.

For information only: run on the same file, the two scripts give different output. Many fields agree, but others are wrong, as you can see here:
Minimum_ 10.0 1.0 27.0 1.0 34.0 1.0 24.0 1.0 #your code
daily-minimum 10.0 27.8 34.5 24.4 23.6 41.6 36.9 17.2 #the Output from Nominal Animal is correct
 
03-31-2012, 05:36 PM   #17
PTrenholme
Quote:
Originally Posted by KlausMu
For information only: run on the same file, the two scripts give different output. Many fields agree, but others are wrong, as you can see here:
Minimum_ 10.0 1.0 27.0 1.0 34.0 1.0 24.0 1.0 #your code
daily-minimum 10.0 27.8 34.5 24.4 23.6 41.6 36.9 17.2 #the Output from Nominal Animal is correct

Yes, I noticed that after I posted, and was working on a correction with a somewhat more complicated calculation algorithm. (There is a problem with computing a variance from sum(x^2) - n*(avg(x))^2: if the numbers are large and all close to their average, the two terms are nearly equal, so their computed difference may come out as zero or as nothing but accumulated rounding error. Your numbers look like they might be of that kind.)

The problem in the code I posted is the test if ($field) {, which should be if ($field != "") {, so the first function looks like this:
Code:
function get_stats_for(date, dates, count, sum, ss, min, max,  field) 
{
    dates[date]++
    for (field = 6; field <= NF; field++) {
        if ($field !="") {
          ++count[date, field]
          if (count[date, field] == 1) {
            min[date, field]=max[date, field]=$field
            sum[date, field]=ss[date, field]=0.0
          }
          sum[date, field] += $field
          ss[date, field] += $field * $field
          if ($field < min[date, field]) min[date, field] = $field
          if ($field > max[date, field]) max[date, field] = $field
        }
    }
}
That test may not, in fact, be necessary: it protects against the case where one of the input fields is entered as a null string, which would only happen if you were using an actual CSV file for your input and an observation was missing (i.e., entered as ",,").
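
To see how the two tests differ, here is a minimal sketch. The helper name count_fields and the sample line are made up for illustration; they are not part of the script in this thread. It counts how many comma-separated fields pass each test. The point it shows is that if ($field) also rejects a reading that is numerically 0, while if ($field != "") rejects only a truly empty field.

```shell
# Hypothetical helper: count how many comma-separated fields pass each test.
count_fields() {
    # $1 = one line of comma-separated input
    printf '%s\n' "$1" | awk -F, '{
        truthy = 0; nonempty = 0
        for (i = 1; i <= NF; i++) {
            if ($i) truthy++            # false for "" and for a numeric 0
            if ($i != "") nonempty++    # false only for ""
        }
        print truthy, nonempty
    }'
}

count_fields "5,0,,7"    # prints: 2 3
```

With the first test the reading 0 is silently left out of the count, sum, minimum and maximum, which matters for log lines that contain many 0 columns.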

Oh, if you're willing to assume that the first observation represents a "typical" value, then the following pair of functions may produce more numerically stable results:
Code:
function get_stats_for(date, dates, count, sum, ss, min, max,  field, diff)
{
    # base[] is a global array: the first value seen for each (date, field),
    # used as the offset that keeps the running sums small.
    dates[date]++
    for (field = 6; field <= NF; field++) {
        if ($field != "") {
            ++count[date, field]
            if (count[date, field] == 1) {
                min[date, field] = max[date, field] = $field
                sum[date, field] = ss[date, field] = 0.0
                base[date, field] = $field
            }
            # Accumulate offsets from the base value, not the raw values.
            diff = $field - base[date, field]
            sum[date, field] += diff
            ss[date, field] += diff^2
            if ($field < min[date, field]) min[date, field] = $field
            if ($field > max[date, field]) max[date, field] = $field
        }
    }
}
function write_results(unit, dates, count, sum, ss, min, max,  n,k,i,sorted,date,field,avg,var)
{
    # Uses the global base[] and the global fields (the highest field number
    # to report), which is assumed to be set elsewhere in the script.
    printf("\n%s results\n", unit)
    n = asorti(dates, sorted)    # asorti() is a gawk extension
    for (k = 1; k <= n; ++k) {
        date = sorted[k]
        print "    " date
        printf("\tCount")
        for (field = 6; field <= fields; ++field) {
            printf("\t%d", count[date, field])
        }
        printf("\n\tMinimum")
        for (field = 6; field <= fields; field++) {
            printf("\t%.1f", min[date, field])
        }
        printf("\n\tAverage")
        for (field = 6; field <= fields; field++) {
            # Add the base back to turn the mean offset into the mean value.
            printf("\t%.1f", (count[date, field] > 0) ? ((sum[date, field] / count[date, field]) + base[date, field]) : 0)
        }
        printf("\n\tStdErr")
        for (field = 6; field <= fields; field++) {
            if (count[date, field] > 1) {
                avg = sum[date, field] / count[date, field]
                # Sum of squared deviations is ss - n*avg^2; the base shift cancels out.
                var = (ss[date, field] - count[date, field] * avg^2) / (count[date, field] - 1)
                printf("\t%.1f", sqrt(var > 0 ? var : 0))
            }
            else {
                printf("\t%.1f", 0.0)
            }
        }
        printf("\n\tMaximum")
        for (field = 6; field <= fields; field++) {
            printf("\t%.1f", max[date, field])
        }
        printf("\n\n")
    }
}
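
As a standalone check of the shifted-data idea used above, here is a minimal sketch that can be run on its own. The function name shifted_sd and the three sample values are hypothetical. It reads one number per line, takes the first value as the base, accumulates only the offsets from that base, and prints the mean and the sample standard deviation.

```shell
# Hypothetical helper: one-pass mean and sample standard deviation,
# accumulating offsets from the first value instead of the raw values.
shifted_sd() {
    awk '
        NF == 0 { next }                 # skip blank lines
        n == 0  { base = $1 }            # first observation becomes the base
        { d = $1 - base; n++; sum += d; ss += d * d }
        END {
            if (n == 0) exit 1
            mean = base + sum / n
            # Sum of squared deviations is ss - sum^2/n; the shift cancels out.
            var = (n > 1) ? (ss - sum * sum / n) / (n - 1) : 0
            if (var < 0) var = 0         # guard against tiny negative rounding
            printf "%.1f %.1f\n", mean, sqrt(var)
        }'
}

printf '1000000001\n1000000002\n1000000003\n' | shifted_sd    # prints: 1000000002.0 1.0
```

The offsets here are 0, 1 and 2, so the sums stay tiny even though the raw values are around 10^9. Squaring the raw values would give numbers near 10^18, beyond what a double can represent exactly.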
 
08-09-2012, 02:25 PM   #18
KlausMu
Original Poster
New output

I have a new final output format for the log file. The number of columns is the same, but the date format has changed, and unfortunately the existing date handling no longer works with it. This is the old date command string:
{ cmd = "LANG=C LC_ALL=C date -d \047" $1 " " $2 " " $3 " " $4 " " $5 "\047 +\047%Y-%m-%d %GW%V %Y-%m\047"
cmd | getline datestr
close(cmd)

Could you please correct that for me? Thank you.

The old output:
Mon Feb 27 21:11:00 2012 10.9 32.9 40.6 29.5 29.7 49.8 42.4 45.6 0 0 0 0 0 0 270179 0 0 0 21:10 Mo,21:10
The new output:
03.04.2012 20:55:59 24,1 32,9 34,9 31,2 31,1 48,2 38,4 40,4 0 0 0 0 0 0 361 0 0 0 20:55 Di,20:55
 
08-14-2012, 11:53 AM   #19
PTrenholme
Quote:
Originally Posted by KlausMu
For information only: run on the same file, the two scripts give different output. Many fields agree, but others are wrong, as you can see here:
Minimum_ 10.0 1.0 27.0 1.0 34.0 1.0 24.0 1.0 #your code
daily-minimum 10.0 27.8 34.5 24.4 23.6 41.6 36.9 17.2 #the Output from Nominal Animal is correct

I haven't actually looked, but I did mention that I'd changed the variance estimator to a standard error estimate to illustrate how easy it was to make simple changes. (Depending on your data model, a standard error is often preferable to an uncorrected variance. But that's a different subject.)
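
To keep the terms apart, here is a minimal sketch that computes three related quantities for the same data: the uncorrected (population) variance, the sample standard deviation, and the standard error of the mean. The function name spread_stats and the sample numbers are hypothetical. It uses a plain two-pass calculation, not the one-pass code from this thread. Note that sqrt(sum of squared deviations / (n-1)) is the sample standard deviation; the standard error of the mean divides that by sqrt(n) again.

```shell
# Hypothetical helper: population variance, sample standard deviation and
# standard error of the mean, from one number per line on stdin.
spread_stats() {
    awk '
        NF { x[++n] = $1; sum += $1 }    # first pass: store values, total them
        END {
            if (n < 2) exit 1
            mean = sum / n
            for (i = 1; i <= n; i++) dev2 += (x[i] - mean) ^ 2
            pop_var   = dev2 / n                 # uncorrected variance
            sample_sd = sqrt(dev2 / (n - 1))     # sample standard deviation
            std_err   = sample_sd / sqrt(n)      # standard error of the mean
            printf "%.4f %.4f %.4f\n", pop_var, sample_sd, std_err
        }'
}

printf '%s\n' 2 4 4 4 5 5 7 9 | spread_stats    # prints: 4.0000 2.1381 0.7559
```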
 
08-14-2012, 02:59 PM   #20
KlausMu
Original Poster
Quote:
Originally Posted by PTrenholme
I haven't actually looked, but I did mention that I'd changed the variance estimator to a standard error estimate to illustrate how easy it was to make simple changes. (Depending on your data model, a standard error is often preferable to an uncorrected variance. But that's a different subject.)

Thanks for the answer. I have another problem: the date format in the output has changed. Could you please correct the date handling (marked in red) for me?
Quote:
The old output:
Mon Feb 27 21:11:00 2012 10.9 32.9 40.6 29.5 29.7 49.8 42.4 45.6 0 0 0 0 0 0 270179 0 0 0 21:10 Mo,21:10
The new output:
03.04.2012 20:55:59 24,1 32,9 34,9 31,2 31,1 48,2 38,4 40,4 0 0 0 0 0 0 361 0 0 0 20:55 Di,20:55
In this string:
Quote:
{ cmd = "LANG=C LC_ALL=C date -d \047" $1 " " $2 " " $3 " " $4 " " $5 "\047 +\047%Y-%m-%d %GW%V %Y-%m\047"
cmd | getline datestr
Many thanks.
 
08-15-2012, 03:43 PM   #21
PTrenholme
Try using -Rd instead of -d and dropping the whole output format specification. Here's an example:
Code:
$ date -d '03/04/2012 20:55:59' +'%Y-%m-%d %GW%V %Y-%m' #(Using your output specification)
2012-03-04 2012W09 2012-03
$ date -Rd '03/04/2012 20:55:59' #(Using the RFC 2822 format. My locale specifies that times, by default, are UTC-8.)
Sun, 04 Mar 2012 20:55:59 -0800
Note that the output format you used (the first output, above) does not produce the "new output" string you displayed. Is this embedded in your code somewhere? (I didn't try to figure out what code you were actually using, since this thread is somewhat long. I'd suggest that you reference a post number for the whole block of code from which you extract parts for which you have questions.)

Note also that trying your commands interactively (as I did, above) is a useful technique for isolating problems.

See man date for date output formatting details.
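
One caveat about the new log format, offered as a hedged sketch and not a tested fix for the full script: the trailing "Di" (Dienstag, Tuesday) in the new line suggests that 03.04.2012 means 3 April 2012 in DD.MM.YYYY order, since that day was a Tuesday, whereas date -d '03/04/2012' reads the string as MM/DD/YYYY, as the 2012-03-04 output above shows. The new lines also appear to use decimal commas and to carry only two date fields, so the readings would start at field 3 and no longer at field 6. The helper below, with the made-up name normalize_log_line, rewrites such a line with an ISO date and decimal points.

```shell
# Hypothetical helper: turn "DD.MM.YYYY HH:MM:SS 24,1 ..." into
# "YYYY-MM-DD HH:MM:SS 24.1 ..." (assumes the DD.MM.YYYY reading is right).
normalize_log_line() {
    awk '{
        # Leave lines alone whose first field is not three dot-separated parts.
        if (split($1, d, /\./) != 3) { print; next }
        $1 = d[3] "-" d[2] "-" d[1]
        # Convert decimal commas in purely numeric fields only.
        for (i = 3; i <= NF; i++)
            if ($i ~ /^-?[0-9]+,[0-9]+$/) sub(/,/, ".", $i)
        print
    }'
}

echo "03.04.2012 20:55:59 24,1 32,9 0 361 20:55 Di,20:55" | normalize_log_line
# prints: 2012-04-03 20:55:59 24.1 32.9 0 361 20:55 Di,20:55
```

The resulting "2012-04-03 20:55:59" is a form that GNU date -d accepts unambiguously, so the same rearrangement could be used to build the command string in place of $1 " " $2 " " $3 " " $4 " " $5.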
 
  


All times are GMT -5.