[SOLVED] Create columns for each section of a line following the comma
Programming — This forum is for all programming questions. The question does not have to be directly related to Linux and any language is fair game.
Hello all,
I have a problem separating data in the lines below. Normally, in order to get a single column I would do something like:
Code:
awk '{print $1}'
The problem with that is that the lines in the file below are not separated into columns by whitespace; the fields are delimited by commas.
So I need to find a way to split each line at the commas so it'll be easier to work with the data.
If anyone has advice on how I can put every string of characters that comes before each comma into its own column, and remove the comma, it would be greatly appreciated.
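For what it's worth, both awk and column can be told that the comma is the delimiter. A minimal sketch (the file name data.csv and its contents are invented here for illustration):
Code:
```shell
# Invented sample file for illustration:
printf 'john,smith,77\njane,doe,4\n' > data.csv

# Tell awk that fields are comma-separated, then print the first field:
awk -F, '{print $1}' data.csv

# Or split every field into an aligned column, discarding the commas:
column -s, -t data.csv
```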
Post #2 in this thread proposed using column. That works nicely, but all the data is left-justified. To my eye, right-justified values make a more attractive table (assuming a table is what you really want). Column doesn't have a shift-right option. (Pity!) A Google search turned up an awk script which does the job. The code isn't pretty but it works.
I don't take credit for the awk; I found it with a Google search. Its strength is that it is generalized: the user need not know the number of columns in the input file or the width needed to print each column.
This delivers only column 1; I interpreted the OP's requirement as being for all columns.
If only column 1 is needed, I would use cut. Fewer keystrokes.
Code:
cut -d, -f1 $InFile >$OutFile
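For instance, with made-up sample data ($InFile and $OutFile stand in for real file names):
Code:
```shell
# Hypothetical file names and contents, for illustration only:
InFile=in.csv; OutFile=out.txt
printf 'john,smith,77\njane,doe,4\n' > "$InFile"

# -d, sets the comma as delimiter; -f1 selects the first field:
cut -d, -f1 "$InFile" > "$OutFile"
cat "$OutFile"
```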
Daniel B. Martin
There are, of course, easier / better options, but I was using the OP's original input, as they seemed to be saying that the awk they were using was no longer able to provide the output they wanted.
As a variation on the code in the post by Mr. Martin (#7, above), consider this gawk script. (Note that it uses gawk extensions.)
My point is to illustrate the additional possibilities provided by gawk (though, actually, the column-alignment part could be done in POSIX awk).
Anyhow:
Code:
#!/bin/gawk -f
# Convert a CSV file to a fixed field length file
#
# Usage: convert {file1 {file2 ... fileN} output_file}
#
# where input defaults to stdin and output to stdout
#
# Notes: This program stores all the input data in an array,
# available for processing, if needed, before the
# output is written.
#
# The number of fields is NOT assumed to be the same
# in each record. Since the data array is two-dimensional,
# the number of fields in the i-th input line is length(data[i]),
# and those field values are data[i][j], j=1...length(data[i]).
########################################################################
#
# Initialization
BEGIN {
# Set the field separator to a comma
FS=","
# If more than one file is specified on the input
# line, and OutFile is not defined, assume that
# the last entry is the output file name.
if ((! OutFile) && (ARGC > 2)) {
OutFile=ARGV[ARGC-1]
--ARGC
}
else {
OutFile="/dev/stdout"
}
}
{
++record_count
if (field_count < NF) {
field_count = NF
}
for (i=1;i<=NF;i++) {
if (w[i]<length($i)) {
w[i]=length($i)
}
data[record_count][i]=$i
}
}
# Add an extra blank before each field except the first
END {
for (i=2; i<=field_count; ++i) {
++w[i]
}
}
# Write the output
END {
for (j=1; j<=record_count; ++j) {
for (i=1;i<=length(data[j]);++i) {
printf("%" w[i] "s", data[j][i]) > OutFile
}
printf "\n" > OutFile
}
}
# And, to illustrate other possibilities, some statistics ...
END {
for (j=1; j<=record_count; ++j) {
for (i=1; i<=length(data[j]);++i) {
datum=0+data[j][i]
if (datum != data[j][i]) {
continue # Skip any non-numeric values
}
++n[i]
if (datum > max[i]) {
max[i]=datum
}
if (min[i] > datum) {
min[i]=datum
}
sum[i]+=datum
ss[i]+=datum*datum
++freq[i][datum]
}
}
for (i=1; i<=field_count; ++i) {
if (n[i]>0) {
mean=sum[i]/n[i]
nv=asorti(freq[i],asc)
if (nv==1) {
printf("Column %d: All values are equal to %d.\n", i, asc[1])
continue
}
median=asc[nv]
std=((ss[i]-(mean*mean))/(n[i]-1))**0.5
printf("Column %d: Average=%f, Median=%f, Standard Error=%f\n", i, mean, median, std)
}
else {
printf("Column %d had no numeric entries.\n", i)
}
}
}
which, after being saved as "columize" and made executable, produces:
Code:
$ ./columize test_data test_out
Column 1: Average=5.333333, Median=1.000000, Standard Error=7.711160
Column 2: Average=1393517309.000000, Median=1393516861.000000, Standard Error=1393517309.000027
Column 3: Average=4.333333, Median=1.000000, Standard Error=8.062620
Column 4: Average=6005411421262476288.000000, Median=6005411420217366528.000000, Standard Error=6005411421262476288.000000
Column 5: Average=2.051282, Median=0.000000, Standard Error=11.408437
Column 6: Average=2663.974359, Median=0.000000, Standard Error=6005.381993
Column 7: All values are equal to 3.
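As noted above, the right-justification itself needs nothing beyond POSIX awk. A minimal two-pass sketch (the file name test_data and its contents are invented here; the script reads the input twice, measuring column widths on the first pass):
Code:
```shell
# Invented sample data; any comma-separated file works:
printf '1,22,333\n4444,5,66\n' > test_data

# Pass 1 (NR==FNR): record the widest entry seen in each column.
# Pass 2: print each field right-justified, padded one wider than its column.
awk -F, '
NR == FNR { for (i = 1; i <= NF; i++) if (length($i) > w[i]) w[i] = length($i); next }
{ for (i = 1; i <= NF; i++) printf("%" (w[i] + 1) "s", $i); printf "\n" }
' test_data test_data
```
(The format string is built by concatenation, "%" (w[i]+1) "s", because dynamic widths via %*s are a gawk extension and not portable awk.)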