LinuxQuestions.org
Old 02-27-2014, 11:33 AM   #1
keif
Member
 
Registered: Apr 2013
Posts: 83

Rep: Reputation: Disabled
Create columns for each section of a line following the comma


Hello all,

I have a problem separating data in the lines below. Normally, in order to get a single column I would do something like:

Code:
awk '{print $1}'
The problem with that is, the lines in the file below cannot be separated by column.

So I need to find a way to separate the sections of the lines into columns so it'll be easier to distribute the data.

If anyone has advice on how I can put every string of characters that resides before each comma into a column, and remove the comma, it would be greatly appreciated.

Code:
2,1393516939,1,6005411421404424578,0,762,3
3,1393516905,1,6005411421405090576,0,4830,3
25,1393516975,1,6005411420255300952,0,0,3
1,1393516861,3,6005411420240423232,0,13516,3
9,1393516945,1,6005411421420585154,70,70,3
1,1393517058,1,6005411421404433942,0,0,3
7,1393517036,6,6005411420217366497,0,7595,3
3,1393517053,1,6005411421404416707,0,14386,3
9,1393517090,3,6005411420245755810,0,5241,3
8,1393516890,1,6005411421404436234,0,26552,3
1,1393517103,1,6005411421401195114,0,287,3
3,1393517153,1,6005411421430297097,0,0,3
1,1393517206,1,6005411421428406379,0,0,3
2,1393517080,1,6005411421401184431,0,0,3
7,1393517104,24,6005411421421405923,0,1229,3
3,1393517266,1,6005411421404444238,0,0,3
3,1393517320,9,6005411421422787410,5,5,3
8,1393517154,17,6005411421404371936,0,78,3
2,1393517273,1,6005411421404434783,0,4553,3
1,1393517271,29,6005411421428498194,0,20,3
3,1393517394,22,6005411421422787410,5,5,3
8,1393517424,1,6005411421404440558,0,4603,3
7,1393517288,1,6005411421401181726,0,0,3
1,1393517469,2,6005411421426871137,0,0,3
2,1393517436,2,6005411421405090626,0,30,3
27,1393517312,1,6005411420237455817,0,130,3
7,1393517493,2,6005411421430881585,0,0,3
3,1393517496,1,6005411421406976625,0,9262,3
2,1393517517,1,6005411421431197239,0,241,3
7,1393517558,6,6005411421406989560,0,1049,3
3,1393517593,3,6005411421411910445,0,0,3
7,1393517622,6,6005411421400938787,0,3099,3
3,1393517661,1,6005411421418123919,0,0,3
2,1393517589,1,6005411421404371704,0,0,3
8,1393517501,1,6005411421412875571,0,103,3
7,1393517702,3,6005411421431198013,0,0,3
2,1393517766,4,6005411421404434254,0,458,3
3,1393517734,3,6005411421428407294,0,0,3
7,1393517814,4,6005411421401193838,0,5791,3
 
Old 02-27-2014, 12:02 PM   #2
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,294

Rep: Reputation: 355
With this InFile ...
Code:
2,1393516939,1,6005411421404424578,0,762,3
3,1393516905,1,6005411421405090576,0,4830,3
25,1393516975,1,6005411420255300952,0,0,3
1,1393516861,3,6005411420240423232,0,13516,3
9,1393516945,1,6005411421420585154,70,70,3
1,1393517058,1,6005411421404433942,0,0,3
... this column ...
Code:
column -ts, $InFile >$OutFile
... produced this OutFile ...
Code:
2   1393516939  1  6005411421404424578  0   762    3
3   1393516905  1  6005411421405090576  0   4830   3
25  1393516975  1  6005411420255300952  0   0      3
1   1393516861  3  6005411420240423232  0   13516  3
9   1393516945  1  6005411421420585154  70  70     3
1   1393517058  1  6005411421404433942  0   0      3
Daniel B. Martin
 
2 members found this post helpful.
Old 02-27-2014, 01:35 PM   #3
keif
Member
 
Registered: Apr 2013
Posts: 83

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by danielbmartin View Post
With this InFile ...
Code:
2,1393516939,1,6005411421404424578,0,762,3
3,1393516905,1,6005411421405090576,0,4830,3
25,1393516975,1,6005411420255300952,0,0,3
1,1393516861,3,6005411420240423232,0,13516,3
9,1393516945,1,6005411421420585154,70,70,3
1,1393517058,1,6005411421404433942,0,0,3
... this column ...
Code:
column -ts, $InFile >$OutFile
... produced this OutFile ...
Code:
2   1393516939  1  6005411421404424578  0   762    3
3   1393516905  1  6005411421405090576  0   4830   3
25  1393516975  1  6005411420255300952  0   0      3
1   1393516861  3  6005411420240423232  0   13516  3
9   1393516945  1  6005411421420585154  70  70     3
1   1393517058  1  6005411421404433942  0   0      3
Daniel B. Martin
Thank you Daniel. That was exactly what I was looking for.
 
Old 02-27-2014, 07:54 PM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,978

Rep: Reputation: 2123
Quote:
The problem with that is, the lines in the file below cannot be separated by column.
Not true, you simply need to alter your original awk:
Code:
awk -F, '{print $1}' file
However, if the result provided by Daniel above is what is required, that is the best way to produce it.
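For completeness: if the goal is every column, space-separated, the same field-separator trick extends to the whole line. A minimal sketch (the filename file is just a placeholder); assigning $1 to itself forces awk to rebuild the record using the output field separator:

```shell
# Build a tiny placeholder sample file named "file"
printf '%s\n' '2,1393516939,1,762' '25,1393516975,0,0' > file

# Split on commas, then print all fields joined by single spaces;
# the self-assignment $1=$1 makes awk rejoin the record with OFS
awk -F, -v OFS=' ' '{$1=$1}1' file
```

This loses the alignment that column gives, but it removes the commas exactly as asked.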
 
Old 02-27-2014, 08:16 PM   #5
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,294

Rep: Reputation: 355
Quote:
Originally Posted by grail View Post
Code:
awk -F, '{print $1}' file
This delivers only column 1. I interpreted OP's requirement as being for all columns.

If only column 1 is needed I would use cut. Fewer keystrokes.
Code:
cut -d, -f1 $InFile >$OutFile
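cut also accepts a field list, so several columns can be pulled in one pass; a quick sketch (sample.txt is a placeholder name). Note that cut keeps the comma between the selected fields unless told otherwise:

```shell
# Placeholder sample line
printf '%s\n' '2,1393516939,1,6005411421404424578' > sample.txt

cut -d, -f1 sample.txt     # first field only
cut -d, -f1,3 sample.txt   # fields 1 and 3, still comma-joined
```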
Daniel B. Martin
 
Old 02-27-2014, 08:24 PM   #6
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 3,433
Blog Entries: 1

Rep: Reputation: 1144
Or if you have a very evil computer science prof who wants you to do everything with shell builtins:

Code:
IFS="," read -r col1 col2 col3 col4 col5 col6 col7 <<< "2,1393516939,1,6005411421404424578,0,762,3"
printf "$col1 $col2 $col3 $col4 $col5 $col6 $col7"
 
Old 02-27-2014, 08:25 PM   #7
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,294

Rep: Reputation: 355
Post #2 in this thread proposed using column. That works nicely, but all data is left-justified. To my eye, right-justified values make a more attractive table (assuming that's what you really want). column doesn't have a shift-right option. (Pity!) A Google search turned up an awk script which does the job: it reads the file twice, first to measure the widest entry in each column, then again to print each field right-justified in that width. The code isn't pretty, but it works.

With this InFile ...
Code:
2,1393516939,1,6005411421404424578,0,762,3
3,1393516905,1,6005411421405090576,0,4830,3
25,1393516975,1,6005411420255300952,0,0,3
1,1393516861,3,6005411420240423232,0,13516,3
9,1393516945,1,6005411421420585154,70,70,3
1,1393517058,1,6005411421404433942,0,0,3
... this awk ...
Code:
awk -F, '
NR==FNR {for (i=1;i<=NF;i++)
  if (w[i]<length($i)) w[i]=length($i); next}
        {for (i=1;i<=NF;i++)
  printf "%*s",w[i]+(i>1?1:0),$i; print ""}' \
  $InFile $InFile >$OutFile
... produced this OutFile ...
Code:
 2 1393516939 1 6005411421404424578  0   762 3
 3 1393516905 1 6005411421405090576  0  4830 3
25 1393516975 1 6005411420255300952  0     0 3
 1 1393516861 3 6005411420240423232  0 13516 3
 9 1393516945 1 6005411421420585154 70    70 3
 1 1393517058 1 6005411421404433942  0     0 3
Daniel B. Martin
 
Old 02-27-2014, 08:32 PM   #8
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 3,433
Blog Entries: 1

Rep: Reputation: 1144
Just guessing at the field widths here, but this should do roughly the same thing as that awk program, if right-justification is really what is needed.

Code:
IFS="," read -r col1 col2 col3 col4 col5 col6 col7 <<< "2,1393516939,1,6005411421404424578,0,762,3"
printf "%1d $col1 %12d $col2 %1d $col3 %18d $col4 %1d $col5 %8d $col6 %1d $col7 \n"
 
Old 02-27-2014, 09:01 PM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,294

Rep: Reputation: 355
Quote:
Originally Posted by szboardstretcher View Post
Just guessing at the field widths here, but this should do roughly the same thing as that awk program, if right-justification is really what is needed.

Code:
IFS="," read -r col1 col2 col3 col4 col5 col6 col7 <<< "2,1393516939,1,6005411421404424578,0,762,3"
printf "%1d $col1 %12d $col2 %1d $col3 %18d $col4 %1d $col5 %8d $col6 %1d $col7 \n"
I don't take credit for the awk; I found it with a Google search. Its strength is that it is generalized: the user need not know the number of columns in the input file or the width needed to print each column.

Daniel B. Martin
 
Old 02-27-2014, 11:30 PM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,978

Rep: Reputation: 2123
Quote:
Originally Posted by danielbmartin View Post
This delivers only column 1. I interpreted OP's requirement as being for all columns.

If only column 1 is needed I would use cut. Fewer keystrokes.
Code:
cut -d, -f1 $InFile >$OutFile
Daniel B. Martin
There are of course easier / better options, but I was working from the OP's original awk, as they seemed to be saying it could no longer separate the data.
 
Old 02-28-2014, 03:44 PM   #11
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,164

Rep: Reputation: 333
As a variation on the code in the post by Mr. Martin (#7, above), consider this gawk script. (Note that it uses gawk extensions.)
My point is to illustrate the additional possibilities gawk provides (although the columnizing part could be done in POSIX awk).
Anyhow:
Code:
#!/bin/gawk -f
# Convert a CSV file to a fixed field length file
#
# Usage: convert {file1 {file2 ... fileN} output_file}
# 
# where input defaults to stdin and output to stdout
#
# Notes: This program stores all the input data in an array,
#        available for processing, if needed, before the
#        output is written.
#
#        The number of fields is NOT assumed to be the same
#        in each record. Since the data array is two-dimensional,
#        the number of fields in the i-th input line is length(data[i]),
#        and those field values are data[i][j], j=1...length(data[i]).
########################################################################
#
# Initialization
BEGIN {
  # Set the field separator to a comma
  FS=","
  # If more than one file is specified on the input
  # line, and OutFile is not defined, assume that
  # the last entry is the output file name.
  if ((! OutFile) && (ARGC > 2)) {
    OutFile=ARGV[ARGC-1]
    --ARGC
  }
  else {
    OutFile="/dev/stdout"
  }
}

{
  ++record_count
  if (field_count < NF) {
    field_count = NF
  }
  for (i=1;i<=NF;i++) {
    if (w[i]<length($i)) {
      w[i]=length($i)
    }
    data[record_count][i]=$i
  }
}

# Add an extra blank before each field except the first
END {
  for (i=2; i<=field_count; ++i) {
    ++w[i]
  }
}

# Write the output
END {
  for (j=1; j<=record_count; ++j) {
    for (i=1;i<=length(data[j]);++i) {
      printf("%" w[i] "s", data[j][i]) > OutFile
    }
    printf "\n" > OutFile
  }
}

# And, to illustrate other possibilities, some statistics ...
END {
  for (j=1; j<=record_count; ++j) {
    for (i=1; i<=length(data[j]);++i) {
      datum=0+data[j][i]
      if (datum != data[j][i]) {
        continue # Skip any non-numeric values
      }
      ++n[i]
      if (datum > max[i]) {
        max[i]=datum
      }
      if (min[i] > datum) {
        min[i]=datum
      }
      sum[i]+=datum
      ss[i]+=datum*datum
      ++freq[i][datum]
    }
  }
  for (i=1; i<=field_count; ++i) {
    if (n[i]>0) {
      mean=sum[i]/n[i]
      nv=asorti(freq[i],asc)
      if (nv==1) {
        printf("Column %d: All values are equal to %d.\n", i, asc[1])
        continue
      }
      median=asc[nv]
      std=((ss[i]-(mean*mean))/(n[i]-1))**0.5
      printf("Column %d: Average=%f, Meadian=%f, Standard Error=%f\n", i, mean, median, std)
    }
    else {
      printf("Column %d had no numeric entries.\n", i)
    }
  }
}
which, after being saved as "columize" and made executable, produces:
Code:
$ ./columize test_data test_out
Column 1: Average=5.333333, Median=1.000000, Standard Error=7.711160
Column 2: Average=1393517309.000000, Median=1393516861.000000, Standard Error=1393517309.000027
Column 3: Average=4.333333, Median=1.000000, Standard Error=8.062620
Column 4: Average=6005411421262476288.000000, Median=6005411420217366528.000000, Standard Error=6005411421262476288.000000
Column 5: Average=2.051282, Median=0.000000, Standard Error=11.408437
Column 6: Average=2663.974359, Median=0.000000, Standard Error=6005.381993
Column 7: All values are equal to 3.
with test_out as follows:
Code:
$ cat test_out 
 2 1393516939  1 6005411421404424578  0   762 3
 3 1393516905  1 6005411421405090576  0  4830 3
25 1393516975  1 6005411420255300952  0     0 3
 1 1393516861  3 6005411420240423232  0 13516 3
 9 1393516945  1 6005411421420585154 70    70 3
 1 1393517058  1 6005411421404433942  0     0 3
 7 1393517036  6 6005411420217366497  0  7595 3
 3 1393517053  1 6005411421404416707  0 14386 3
 9 1393517090  3 6005411420245755810  0  5241 3
 8 1393516890  1 6005411421404436234  0 26552 3
 1 1393517103  1 6005411421401195114  0   287 3
 3 1393517153  1 6005411421430297097  0     0 3
 1 1393517206  1 6005411421428406379  0     0 3
 2 1393517080  1 6005411421401184431  0     0 3
 7 1393517104 24 6005411421421405923  0  1229 3
 3 1393517266  1 6005411421404444238  0     0 3
 3 1393517320  9 6005411421422787410  5     5 3
 8 1393517154 17 6005411421404371936  0    78 3
 2 1393517273  1 6005411421404434783  0  4553 3
 1 1393517271 29 6005411421428498194  0    20 3
 3 1393517394 22 6005411421422787410  5     5 3
 8 1393517424  1 6005411421404440558  0  4603 3
 7 1393517288  1 6005411421401181726  0     0 3
 1 1393517469  2 6005411421426871137  0     0 3
 2 1393517436  2 6005411421405090626  0    30 3
27 1393517312  1 6005411420237455817  0   130 3
 7 1393517493  2 6005411421430881585  0     0 3
 3 1393517496  1 6005411421406976625  0  9262 3
 2 1393517517  1 6005411421431197239  0   241 3
 7 1393517558  6 6005411421406989560  0  1049 3
 3 1393517593  3 6005411421411910445  0     0 3
 7 1393517622  6 6005411421400938787  0  3099 3
 3 1393517661  1 6005411421418123919  0     0 3
 2 1393517589  1 6005411421404371704  0     0 3
 8 1393517501  1 6005411421412875571  0   103 3
 7 1393517702  3 6005411421431198013  0     0 3
 2 1393517766  4 6005411421404434254  0   458 3
 3 1393517734  3 6005411421428407294  0     0 3
 7 1393517814  4 6005411421401193838  0  5791 3
 
  

