LinuxQuestions.org
Old 12-09-2010, 05:46 AM   #16
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,488

Rep: Reputation: 1956

Assuming you can use the code with the asorti function, as per post #2:
Code:
BEGIN{ FS = ","; getline }
{
  balance[$1] = balance[$1] + $2
}
END{
  n = asorti(balance,indices)

  for (i = 1; i <= n; i++)
      printf "%s, %5.2f\n", indices[i], balance[indices[i]]
}
you can change it to the following:
Code:
BEGIN{ FS = ","; getline }
{
  balance[$1] = ( balance[$1] "," $2 )
}
END{
  n = asorti(balance,indices)

  for (i = 1; i <= n; i++)
      printf "%s%s\n", indices[i], balance[indices[i]]
}
Here $2 is not summed; instead the values are concatenated into a string, with a comma as the separator. The format in the printf statement has to change as well, since you now print a string and not a floating-point number. The same modifications can be applied to the other versions of the code. Hope this helps.
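As a rough illustration of the concatenation step, here is a minimal sketch wrapped in a shell function so it can be run directly. The function name merge_values and the sample records (name,amount with alice and bob) are made up for this example and are not the original input file. Since asorti is a gawk extension, the sketch swaps it for a pipe through sort, and it skips the header with NR > 1 instead of getline; the resulting order can differ from asorti for keys containing unusual characters.

```shell
# Hypothetical helper: merge the second column of records sharing the same first column.
merge_values() {
  awk -F, '
    # Skip the header line, then append each value to the string stored for its key.
    NR > 1 { merged[$1] = merged[$1] "," $2 }
    # Each stored string already starts with a comma, so key and values join cleanly.
    END { for (key in merged) print key merged[key] }
  ' | LC_ALL=C sort -t, -k1,1
}

# Made-up sample input: one header line and three records.
result=$(printf '%s\n' 'name,amount' 'bob,5.00' 'alice,10.50' 'alice,2.25' | merge_values)
printf '%s\n' "$result"
```

With this sample the two alice records collapse into one line that carries both amounts, in the order they appeared in the input.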
 
Old 12-09-2010, 06:36 AM   #17
czezz
Member
 
Registered: Nov 2004
Location: Poland/Warsaw
Distribution: Slackware/Solaris
Posts: 563

Rep: Reputation: 30
For the original input file it works perfectly.
However, my real file is slightly different. I thought it would be easy for me to modify the script, but it turned out to be much more difficult.

Here is a sample:
Code:
# text text text
12/7/10 00:00,gg2a,15791,3372,4018,5,
12/7/10 00:00,gg2b,4961,92,31190,4,
# text2 text2 text2
12/7/10 00:00,gg2a,1.8840170106E10,3.043735864E9,1.5796434242E10,1.7081492E7,
12/7/10 00:00,gg2b,8.6964647131E10,1.1799862993E10,7.5164784138E10,7.1079514E7,
What I want to achieve:
Code:
12/7/10 00:00,gg2a,15791,3372,4018,5,1.8840170106E10,3.043735864E9,1.5796434242E10,1.7081492E7,
12/7/10 00:00,gg2b,4961,92,31190,4,8.6964647131E10,1.1799862993E10,7.5164784138E10,7.1079514E7,
My modified script is below (but it doesn't work well :/ ):
Code:
BEGIN{ FS = ","; getline }
{
  balance[$1 "," ,$2, "," $3 "," ,$4 "," ,$5 "," ,$6] = ( balance[$1 "," ,$2, "," $3 "," ,$4 "," ,$5 "," ,$6] "," $3 "," $4 )
}
END{
  n = asorti(balance,indices)

  for (i = 1; i <= n; i++)
      printf "%s%s\n", indices[i], balance[indices[i]]
}

Last edited by czezz; 12-10-2010 at 04:33 AM.
 
Old 12-09-2010, 07:00 AM   #18
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,488

Rep: Reputation: 1956
Yes, things are a bit more complicated here. First a question: based on what criteria do you merge lines? How many variants of the "gg" field may occur?

Last edited by colucix; 12-10-2010 at 04:59 AM.
 
Old 12-09-2010, 07:36 AM   #19
czezz
Member
 
Registered: Nov 2004
Location: Poland/Warsaw
Distribution: Slackware/Solaris
Posts: 563

Rep: Reputation: 30
The criterion for merging lines is:
Code:
12/7/10 00:00,gg2a
where "gg" may be: gg2a, gg2b, gg4a, gg4b, etc.

There are 2 types of lines, but each contains 6 columns separated by commas:
e.g. type 1:
Code:
12/7/10 00:00,gg2b,4961,92,31190,4,
e.g. type 2:
Code:
12/7/10 00:00,gg2a,1.8840170106E10,3.043735864E9,1.5796434242E10,1.7081492E7,
Sometimes lines starting with a hash occur between them; these should be ignored.

Last edited by czezz; 12-10-2010 at 04:33 AM.
 
Old 12-09-2010, 08:43 AM   #20
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,488

Rep: Reputation: 1956
Here is a working example based on the input data in post #17:
Code:
BEGIN { FS = "," }

! /^#/ {
  
  balance[$1 "," $2] = ( balance[$1 "," $2] "," $3  "," $4  "," $5 "," $6 )

}

END { 
     n = asorti(balance,indices)

     for (i = 1; i <= n; i++)
         printf "%s%s,\n", indices[i], balance[indices[i]]
}
First note the (negated) regular expression before the main rule: ! /^#/. This excludes every line that begins with a hash (in other words, the rule is applied only to lines that do not begin with a hash). The rest should be clear, as you already tried something similar.
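To see the hash filter and the two-field key at work, here is a small sketch that feeds the sample from post #17 through the same logic. The function name merge_by_key is hypothetical, and the script is a variation rather than the code above: it uses a "/^#/ { next }" rule in place of the negated pattern and pipes through sort instead of calling asorti, so it should also run on awk implementations without the gawk extension.

```shell
# Hypothetical helper: merge columns 3-6 of all records sharing columns 1 and 2.
merge_by_key() {
  awk -F, '
    # Ignore comment lines that start with a hash.
    /^#/ { next }
    {
      key = $1 "," $2
      merged[key] = merged[key] "," $3 "," $4 "," $5 "," $6
    }
    # Re-append the trailing comma that the input lines carry.
    END { for (key in merged) print key merged[key] "," }
  ' | LC_ALL=C sort -t, -k1,2
}

# Sample data copied from post #17.
merged=$(printf '%s\n' \
  '# text text text' \
  '12/7/10 00:00,gg2a,15791,3372,4018,5,' \
  '12/7/10 00:00,gg2b,4961,92,31190,4,' \
  '# text2 text2 text2' \
  '12/7/10 00:00,gg2a,1.8840170106E10,3.043735864E9,1.5796434242E10,1.7081492E7,' \
  '12/7/10 00:00,gg2b,8.6964647131E10,1.1799862993E10,7.5164784138E10,7.1079514E7,' \
  | merge_by_key)
printf '%s\n' "$merged"
```

The two printed lines should match the desired output shown in post #17, one for gg2a and one for gg2b.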

I have some doubts about the sorting process, though. First, if you want to sort by date (for example from the oldest to the most recent), you need a date format that sorts naturally in alphanumeric order. For example:
Code:
10/07/12 00:00
10/08/13 04:00
This way the asorti function sorts the strings in alphanumeric order and the result is automatically sorted by date. Alternatively, you might transform the date string into a date number (Julian date), sort the numbers numerically and finally transform them back to the original format. You can do this using awk's time functions.
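One possible way to get a naturally sortable date without changing the file is to build a temporary sort key, sort on it, and then strip it again. The sketch below assumes the first field is month/day/two-digit-year followed by a zero-padded HH:MM time, which is only a guess about the real data; the function name sort_by_date and the sample records are invented for illustration. It uses plain split and printf, not the time functions mentioned above, and two-digit years will still sort wrongly across a century boundary.

```shell
# Hypothetical helper: order records by date, oldest first.
sort_by_date() {
  awk -F, '
    {
      split($1, stamp, " ")      # stamp[1] = date part, stamp[2] = time part
      split(stamp[1], d, "/")    # assumed order: month, day, two-digit year
      # Prefix each record with a zero-padded year/month/day key and a tab.
      printf "%02d/%02d/%02d %s\t%s\n", d[3], d[1], d[2], stamp[2], $0
    }
  ' | LC_ALL=C sort | cut -f2-   # sort on the key, then drop it again
}

# Made-up records whose plain string order is not chronological.
by_date=$(printf '%s\n' \
  '12/7/10 00:00,gg2a,1,' \
  '2/15/10 04:00,gg2a,2,' \
  '12/10/09 00:00,gg2a,3,' \
  | sort_by_date)
printf '%s\n' "$by_date"
```

The same key could be used as the array index inside the merging script, so that the sorted indices come out in date order.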

Last edited by colucix; 12-10-2010 at 05:00 AM.
 
Old 12-10-2010, 04:32 AM   #21
czezz
Member
 
Registered: Nov 2004
Location: Poland/Warsaw
Distribution: Slackware/Solaris
Posts: 563

Rep: Reputation: 30
Works perfectly! Thank you.

I made one big mistake that made the situation unclear to you.
The gg flag should always be "gg[one digit 0-9][one letter a-z]".
I wrote:
Quote:
where "gg" may be: gg2a, gg2b, gg4a, gg4b, etc.
but then, by mistake, I used a different, longer name in one example, which I have already corrected.
Sorry about that.
 
Old 12-10-2010, 05:01 AM   #22
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,488

Rep: Reputation: 1956
Corrected my posts accordingly!
 
  



Tags
awk, bash, csv, duplicate, merge, records

