How to convert 1 column into several rows in Linux?

markraem · 11-03-2004, 11:47 AM

Suppose I have an ASCII file (data.txt) that contains only one column:
#cat data.txt
item10
item11
item12
item20
item21
item22
item30
item31
item32
...

I am looking for a script that creates one row for every 3 elements in the column.

This means that after executing

#script data.txt > result.txt

result.txt will be :

#cat result.txt
item10,item11,item12
item20,item21,item22
item30,item31,item32
...

I know it is possible to convert the whole column into a single row (see this forum), but I do not see how to convert it into multiple rows based on a fixed number of elements per row.

Anybody have any thoughts?

Thanks in advance.

DarkstarNL · 11-03-2004, 01:12 PM

This can't be that hard! Read a line from the file, print it, print a comma, read another line, print it, print a comma, read another line, print it, print a newline, and so on.

The Bash scripting HOWTO can help you with this!

kscott121 · 11-03-2004, 01:25 PM

Here is almost what you want (it is a Bash script).
Save it and make it executable (using chmod).
It came from a script that acts on multiple files in a directory, and I wasn't sure how to strip that part away, so I didn't.
It presumes that the input file is foo.dat and the output file is foo.new.
I ran it and it works for me.
Good luck,
Ken

#!/bin/sh
# doit4.sh
# Read a column of items and output it three per row:
# item1,item2,item3
# item4,item5,item6 etc.
for file in foo.dat ; do
    exec < "$file"    # take stdin from the input file
    while read line
    do
        temp1=$line

        read line
        temp2=$line

        read line
        temp3=$line
        # append the row to the output file, then echo it to the terminal
        echo "$temp1,$temp2,$temp3" >> foo.new
        echo "$temp1,$temp2,$temp3"
    done
done

druuna · 11-03-2004, 01:27 PM

Or a 'one-liner':

awk 'BEGIN { FS="\n"; ORS="," } { print $0 }' test | sed 's/\([a-zA-Z0-9]*,[a-zA-Z0-9]*,[a-zA-Z0-9]*\),/\1\n/g'

This only works if itemXX is made up of lower-case letters, upper-case letters and/or digits, e.g. item_01 would fail.

The regular expression could be written shorter and better, but I leave that to somebody else.

DarkstarNL's suggestion would allow you to make the code 'a bit' more readable and you also don't have the limitation that the above one-liner has (at the moment).
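
For what it's worth, here is a rough sketch of a variant that avoids the character-class limitation: instead of matching the items with a regex, awk picks the separator from the line number (NR). The function name group3 is made up for illustration, and the items are assumed to arrive one per line on stdin.

```shell
#!/bin/sh
# group3: hypothetical helper name, not part of any existing tool.
# Joins every 3 input lines into one comma-separated row.
group3() {
    awk '{
             # First line of a row: start a new line (except at the very top);
             # otherwise separate from the previous item with a comma.
             sep = (NR % 3 == 1) ? (NR == 1 ? "" : "\n") : ","
             printf "%s%s", sep, $0
         }
         END { if (NR) printf "\n" }'
}

# Small demonstration with items that contain underscores
printf '%s\n' item_10 item_11 item_12 item_20 item_21 item_22 | group3
```

Usage would be something like group3 < data.txt > result.txt. A trailing incomplete group is printed as a shorter row.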

Matir · 11-03-2004, 02:24 PM

Try this:

Code:

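#!/bin/bash
# 'let' is a bash builtin, so run this with bash rather than plain sh
n=0

for i in `cat ${1}`
        do STR=${STR},${i}
        let n=${n}+1
        if [ "$n" -eq "3" ] ;
                then echo ${STR} | sed 's/^,//g'    # strip the leading comma
                STR=""
                n=0
        fi
done
# print any leftover items that did not fill a complete row
if [ -n "${STR}" ] ; then echo ${STR} | sed 's/^,//g' ; fi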

Usage:

Code:

conv.sh inputfile > outputfile

markraem · 11-04-2004, 03:50 AM

Since my data contain underscores, I tried Matir's solution and it works like a champ!

Even though I am not using the other solutions, I learnt something from them (the awk command will get further study from me).

Special thanks to Matir.

Matir · 11-04-2004, 09:23 PM

Not a problem, just glad I was able to help.

Most of what I do here is just read and ask questions. It's rare that I am able to help anyone, and tonight I seem to have been able to help in 2 threads.

On another note, I got myself several wonderful linux books tonight. Well, 2 directly linux. I got "Linux Complete", "Linux Security Cookbook", and "Security+ For Dummies." This could be useful.

quantumqueen · 03-29-2010, 01:13 PM

What if I desire to skip the first few lines before ordering the column vector into multiple rows? Can this be added to Matir's script? Also, I'd like to prevent any carriage returns in the output.

kscott121 · 03-29-2010, 01:43 PM

Maybe put a couple or three of these dummy line reads in before you start processing:

read line
temp=$line

Not sure how to have no carriage returns, but without any line breaks you'll get only one run-on line of output, and you probably don't want that.
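
A minimal sketch of the dummy-read idea, assuming the number of header lines is known in advance. The function name skip_then_group and its two arguments are made up for illustration. It also drops a trailing carriage return (the CTRL-M character found in DOS-style files) from each item, on the assumption that this is what was meant by carriage returns.

```shell
#!/bin/sh
# skip_then_group: hypothetical helper, reads items from stdin.
#   $1 = number of leading lines to skip
#   $2 = number of items per output row
skip_then_group() {
    skip=$1
    per_row=$2
    n=0
    row=
    cr=$(printf '\r')

    # Dummy reads: consume and discard the header lines.
    while [ "$skip" -gt 0 ] && IFS= read -r junk; do
        skip=$((skip - 1))
    done

    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%"$cr"}              # drop a trailing carriage return, if any
        if [ "$n" -eq 0 ]; then
            row=$line
        else
            row="$row,$line"
        fi
        n=$((n + 1))
        if [ "$n" -eq "$per_row" ]; then
            printf '%s\n' "$row"
            n=0
        fi
    done

    # Print a final, incomplete row only if there is one.
    [ "$n" -eq 0 ] || printf '%s\n' "$row"
}
```

It would be called as, for example, skip_then_group 2 3 < inputfile > outputfile to skip two lines and then write three items per row.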

quantumqueen · 03-30-2010, 11:24 AM

I ended up running sed '1d' repeatedly in a shell script, like this:

#!/bin/bash
COUNTER=0
sed '1d' 94Mo.Spe > temp_$COUNTER.txt
while [ $COUNTER != 11 ]
do
sed '1d' temp_$COUNTER.txt > temp_mid.txt
let COUNTER=COUNTER+1
cp temp_mid.txt temp_$COUNTER.txt
done
cp temp_$COUNTER.txt 94Mo_mod.txt
rm temp*.txt

This removed the header lines. Then I opened the _mod.txt file in xemacs and used replace-string with CTRL-Q CTRL-M to remove the carriage returns. I then used Matir's code to rearrange the column into multiple rows, modified so that the values are separated by a space instead of a comma:

#!/bin/bash
n=0

for i in `cat ${1}`
do STR=${STR}${i}" "
let n=${n}+1
if [ "$n" -eq "8" ] ;
then echo ${STR}
STR=""
n=0
fi
done
echo ${STR}
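
For reference, the three steps above (dropping the header lines, removing the carriage returns, regrouping eight values per row) can probably be collapsed into a single pipeline. This is only a sketch: the function name reshape is made up, and the count of 12 header lines is my reading of the sed loop above (one initial '1d' plus eleven passes), so adjust it if your file differs.

```shell
#!/bin/sh
# reshape: hypothetical helper name. $1 = input file.
reshape() {
    # tail -n +13 : start output at line 13, i.e. skip 12 header lines (assumed count)
    # tr -d '\r'  : delete every carriage return (CTRL-M)
    # paste       : eight '-' operands read eight lines per row, joined by a space
    tail -n +13 "$1" | tr -d '\r' | paste -d' ' - - - - - - - -
}

# Example call, using the file names from the post above:
#   reshape 94Mo.Spe > 94Mo_mod.txt
```

One difference from the loop version: if the number of values is not a multiple of eight, paste pads the last row with empty fields, so that row ends in trailing spaces.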