LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 03-02-2011, 08:47 PM   #1
chargaff
LQ Newbie
 
Registered: Jul 2010
Posts: 7

Rep: Reputation: 0
Limit in number of fields that awk can handle ?


Hi,

I have a file with 200 000 lines and I want to append the fields of each line based on a matching first field. The resulting file should have 70 000 columns but has "only" 18 000. The command I'm using works perfectly with a smaller file, which leads to 14 000 columns. Could there be a limit on the number of fields that awk can handle? Here's my awk command:

Code:
awk -F, 'END { for (k in _) print _[k] } { _[$1] = $1 in _ ? _[$1] FS $4 : $1","$4 } ' file > out
Also, this command writes ^M (Windows line break) after each column. Removing them is easy, but where do they come from?

Working on Ubuntu 10.10

Any help would be greatly appreciated
 
Old 03-03-2011, 02:06 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
So to confirm, you have a csv file that has at least 70,000 lines where the first field is the same?

The issue is not the number of columns but more likely the number of characters a single string can contain.
Unfortunately, I was not able to find a definitive answer as to whether a maximum size exists.

So my suggestion would be to use the length function and, once a particular entry reaches a certain size, split it into a new one.
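A rough sketch of that splitting idea, not a tested fix for the original file. The sample data, the `max` cap (set tiny here so the split is visible), and the `row`/`part` array names are all assumptions made for illustration; whether awk actually has a string-length limit is still unconfirmed in this thread.

```shell
# Hypothetical sample data: field 1 is the key, field 4 is the value to collect.
tmp=$(mktemp -d)
printf 'a,x,x,1\nb,x,x,2\na,x,x,3\na,x,x,4\na,x,x,5\n' > "$tmp/file"

# max is an assumed cap on the length of one output string.
awk -F, -v max=5 '
{
    key = $1 SUBSEP (part[$1] + 0)            # current chunk for this key
    if (!(key in row)) {
        row[key] = $1 FS $4                   # first value seen for this key
    } else if (length(row[key] FS $4) > max) {
        part[$1]++                            # chunk is full: start a new one
        row[$1 SUBSEP part[$1]] = $1 FS $4
    } else {
        row[key] = row[key] FS $4             # still room: keep appending
    }
}
END { for (k in row) print row[k] }
' "$tmp/file" | LC_ALL=C sort > "$tmp/out"

cat "$tmp/out"
```

With this sample, key `a` ends up on two output lines instead of one, so the chunks would still have to be joined again afterwards if a single line per key is required.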
 
Old 03-03-2011, 06:19 AM   #3
Reuti
Senior Member
 
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 15.2
Posts: 1,339

Rep: Reputation: 260
Maybe using dos2unix will help to remove the CR in the file. Does it originate from Windows?

Could the problem be rephrased? First sort the CSV file by the first field, then output $4 from every line until $1 changes, and at that point output the final '\n' to end the line.
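The sort-then-stream approach could look roughly like this. It is only a sketch: the sample data and the `prev` variable are made up for illustration, and `sort -s` (stable sort, available in GNU and BSD sort) is assumed so that values keep their original order within a key. Because awk only holds one short string at a time here, no long string is ever built in memory.

```shell
# Hypothetical sample data: field 1 is the key, field 4 is the value to collect.
tmp=$(mktemp -d)
printf 'b,x,x,2\na,x,x,1\nb,x,x,4\na,x,x,3\n' > "$tmp/file"

# Sort by the first field only; -s keeps lines with equal keys in input order.
LC_ALL=C sort -s -t, -k1,1 "$tmp/file" |
awk -F, '
NR == 1 || $1 != prev {            # key changed: close the previous output line
    if (NR > 1) printf "\n"
    printf "%s", $1                # start a new output line with the key
    prev = $1
}
{ printf "%s%s", FS, $4 }          # append this value to the current line
END { if (NR > 0) printf "\n" }    # final newline for the last key
' > "$tmp/out"

cat "$tmp/out"
```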
 
Old 03-03-2011, 08:29 AM   #4
chargaff
LQ Newbie
 
Registered: Jul 2010
Posts: 7

Original Poster
Rep: Reputation: 0
Thanks grail and Reuti for your replies.

Yes, exactly grail, the file has at least 70,000 lines where the first field is the same.

I will test a solution to split the strings soon, it's a good idea.

Reuti, the problem with the ^M is not how to remove them (with dos2unix, sed, etc.) but why they are there. This file has never seen a Windows machine...
 
Old 03-03-2011, 08:36 AM   #5
Reuti
Senior Member
 
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 15.2
Posts: 1,339

Rep: Reputation: 260
Quote:
Originally Posted by chargaff
Reuti, the problem with the ^M is not how to remove them (with dos2unix, sed, etc.) but why they are there. This file has never seen a Windows machine...
OK, then: which application creates the file? Maybe it is set up to write CSV that is compatible with Windows, as outlined in RFC 4180, which specifies CRLF line breaks. In that case the CR is there because it should be there.
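If the input really does use CRLF line endings, one plausible reading (assuming $4 is the last field on each line, which the thread does not state) is that the CR is part of $4 and is copied along with every appended value, which would match a ^M after each column. A minimal sketch that strips the CR inside awk before the fields are used; the sample data and the `row` array name are hypothetical:

```shell
# Hypothetical sample with CRLF line endings, as RFC 4180 describes.
tmp=$(mktemp -d)
printf 'a,x,x,1\r\nb,x,x,2\r\na,x,x,3\r\n' > "$tmp/file"

awk -F, '
{ sub(/\r$/, "") }                 # drop a trailing CR; $0 is re-split into fields
{ row[$1] = ($1 in row) ? row[$1] FS $4 : $1 FS $4 }
END { for (k in row) print row[k] }
' "$tmp/file" | LC_ALL=C sort > "$tmp/out"

# od -c shows every byte, so any remaining \r would be visible.
od -c "$tmp/out"
```

Running `od -c` on the original input in the same way would confirm whether the CRs are already present before awk touches the file.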
 
  

