LinuxQuestions.org


lethalfang 06-20-2011 02:26 PM

text manipulation in bash: sort columns according to the first row
 
Let's say I have 50,000 rows of data that look like the following:


B A D C Z E F ......
3 1 4 1 9 2 3 ......
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
0 1 3 4 2 4 8 ......


I want to sort the columns according to the values in the first row, so that the output looks like this:

A B C D E F ...... Z
1 3 1 4 2 3 ...... 9
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
1 0 4 3 4 8 ...... 2



How do I do that?


I tried transposing the columns into rows with awk, sorting the result with sort, and then transposing back with awk.

But awk fails once the number of columns exceeds about 33K. Transposing my 50,000 rows produces 50,000 columns, so the transpose back did not work for me.
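For anyone who wants to reproduce the approach described above, here is a minimal sketch of the transpose, sort, transpose-back pipeline. It assumes whitespace-separated fields, enough memory to hold the whole file, and an awk that does not cap the number of fields per record (as far as I know, GNU awk is limited only by memory). The function names transpose and sort_columns_by_header are made up for illustration.

```shell
#!/bin/bash
# Sketch of the transpose -> sort -> transpose-back pipeline.
# Assumes whitespace-separated fields and an awk without a small
# per-record field limit; the function names are hypothetical.

# transpose: turn rows into columns. Every cell is held in memory,
# so the whole input must fit in RAM.
transpose() {
  awk '
    {
      for (i = 1; i <= NF; i++) cell[NR, i] = $i
      if (NF > maxnf) maxnf = NF
    }
    END {
      # NR still holds the number of input rows here.
      for (i = 1; i <= maxnf; i++)
        for (r = 1; r <= NR; r++)
          printf "%s%s", cell[r, i], (r < NR ? OFS : ORS)
    }'
}

# sort_columns_by_header: after transposing, each former column is one
# line that starts with its header name, so sorting on field 1 reorders
# the columns. LC_ALL=C gives plain byte order; -s keeps equal names in
# their original order.
sort_columns_by_header() {
  transpose | LC_ALL=C sort -s -k1,1 | transpose
}
```

Usage would be something like sort_columns_by_header < data.txt > sorted.txt (both file names are placeholders).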


Any idea how to do this?

Thanks in advance.

drub 06-20-2011 02:51 PM

Tokenize a line, save into a temp file, sort the temp file
 
Hey Lethalfang

The shell builtin set can be used to split a line into tokens, which become the positional parameters $1, $2, and so on. The tokens can then be reordered.

Sorry, I don't have a functional example to provide, but it goes something like this:

Code:

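#!/bin/bash
tmpfile="MyTmpFile"
: > "$tmpfile"    # start with an empty temp file

# You may want to change the field separator value. Here it is set to ":"
# If not, ignore the next 2 lines.
orig_IFS=$IFS
IFS=":"

set -f    # disable globbing so a token such as "*" is not expanded
while read -r line    # reads lines from standard input
do
  # Tokenize the input line into the positional parameters $1, $2, ...
  set -- $line

  # You now have the tokens of the input line.
  printf 'token 1 ..... %s\n' "$1"
  printf 'token 2 ..... %s\n' "$2"
  printf 'token 9 ..... %s\n' "$9"

  # Do your magic. Re-order them?
  # Append with >> so every line is kept (> would overwrite the file each time).
  printf '%s %s %s\n' "$2" "$9" "$1" >> "$tmpfile"
done
set +f
IFS=$orig_IFS    # restore the original field separator

# Sort the tempfile, selecting the field(s) of interest.
sort "$tmpfile"

exit 0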
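Applied to the column-sorting question, the set idea could look roughly like the bash sketch below. Instead of fixed positions such as $2 $9 $1, the column order is computed from the sorted header. It assumes whitespace-separated fields, a header in the first line, and bash rather than plain sh (for the ${!k} indirect expansion); sort_columns_set is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical sketch: apply the "set" tokenizing idea to the column-sort
# question. Assumes whitespace-separated fields and a header in line 1.
sort_columns_set() {
  local header order line k out
  read -r header || return 1
  set -f   # turn off globbing while unquoted expansions are split into words

  # Pair each header name with its column number, sort by name, keep the numbers.
  order=$(i=0
          for name in $header; do
            i=$((i + 1))
            printf '%s %d\n' "$name" "$i"
          done | LC_ALL=C sort -k1,1 | cut -d' ' -f2)

  # Re-emit the header, then the rest of the input, reordering each line.
  { printf '%s\n' "$header"; cat; } | while read -r line; do
    set -- $line                # tokens become $1, $2, ...
    out=
    for k in $order; do
      out="$out${out:+ }${!k}"  # ${!k} expands to the k-th positional parameter
    done
    printf '%s\n' "$out"
  done
  set +f
}
```

Something like sort_columns_set < data.txt should work, but a bash loop over 50,000 rows will be much slower than an awk solution.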

I hope this is what you are after.

Cheers
Drub

colucix 06-20-2011 02:52 PM

You can try the asorti function (available from GNU awk version 3.1.2). You need to sort only the first row and use the sorted indices to print the fields in the correct order:
Code:

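NR == 1 {

  # Map each column name in the header to its column number, e.g. array["B"] = 1.
  # A repeated name keeps only its last column, so the names should be unique.
  for ( i = 1; i <= NF; i++ )
    array[$i] = i

  # Sort the names; indices[1..n] now holds them in ascending order.
  n = asorti(array, indices)

}

{

  # Print every line (the header included) with its fields in the sorted order,
  # separated by OFS (a single space) and without a trailing space.
  for ( i = 1; i <= n; i++ )
    printf "%s%s", $(array[indices[i]]), (i < n ? OFS : ORS)

}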
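If GNU awk is not available, the same idea (sort only the header, then print every line's fields in the sorted order) can be sketched without asorti by letting sort compute the column order. This is a hypothetical variant, not the script above; it assumes whitespace-separated fields and takes a file name because the file is read twice.

```shell
#!/bin/bash
# Hypothetical variant of the asorti approach for awks that lack asorti:
# sort(1) computes the column order from the header, then a plain POSIX awk
# program prints every line with its fields in that order.
sort_columns_portable() {
  local file=$1 order
  # "name colnum" pairs from line 1, sorted by name -> space-separated colnums
  order=$(head -n 1 "$file" |
          awk '{ for (i = 1; i <= NF; i++) print $i, i }' |
          LC_ALL=C sort -k1,1 |
          awk '{ printf "%s ", $2 }')
  # split() with " " uses default field splitting, so the trailing space is ignored.
  awk -v order="$order" '
    BEGIN { n = split(order, col, " ") }
    { for (i = 1; i <= n; i++) printf "%s%s", $(col[i]), (i < n ? OFS : ORS) }
  ' "$file"
}
```

For example: sort_columns_portable data.txt > sorted.txt (placeholder file names).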


lisle2011 06-20-2011 03:09 PM

Your question
 
(printf "PERM LINKS OWNER GROUP SIZE MONTH DAY HH:MM/YEAR NAME\n"; ls -l) | column -t

This works for a directory listing. Whether it works for your data depends on whether, and how, your columns are separated (column -t splits on whitespace by default).

See man column.

lethalfang 06-20-2011 04:01 PM

Quote:

Originally Posted by colucix (Post 4391066)
You can try the asorti function (available from GNU awk version 3.1.2). You need to sort only the first row and use the sorted indices to print the fields in the correct order.

What should the code for this function look like?
Thanks.

colucix 06-20-2011 04:10 PM

Quote:

Originally Posted by lethalfang (Post 4391112)
What should the code for this function look like?
Thanks.

You can try it as is. Suppose you write it in a file called test.awk:
Code:

awk -f test.awk file
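where file is the file you want to sort. The script parses the first row to get the sorted order of the column names, which is the order in which the fields of every line must be printed, and then prints each line with its fields rearranged accordingly. Note that asorti is a GNU awk extension, so if your default awk is not GNU awk, run the script as gawk -f test.awk file instead. If it is not clear, feel free to ask. More details about the asorti function are here.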
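Because asorti exists only in some awk implementations, it may help to check which awk on the system can run test.awk before calling it. The helper below is a hypothetical sketch: it tries gawk and then awk, and treats any failure of a one-line asorti test program as "not supported".

```shell
#!/bin/bash
# Hypothetical helper: print the name of the first awk on PATH that provides
# asorti (tries gawk, then awk). An awk without asorti fails to run the tiny
# BEGIN-only test program, and its error message is discarded.
pick_awk() {
  local candidate
  for candidate in gawk awk; do
    if command -v "$candidate" >/dev/null 2>&1 &&
       "$candidate" 'BEGIN { a["x"] = 1; exit (asorti(a, b) == 1 ? 0 : 1) }' 2>/dev/null
    then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "no awk with asorti found" >&2
  return 1
}
```

Usage could then be: AWK=$(pick_awk) && "$AWK" -f test.awk file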

