LinuxQuestions.org
Old 06-20-2011, 03:26 PM   #1
lethalfang
LQ Newbie
 
Registered: Jun 2011
Location: San Francisco, CA
Posts: 11

Rep: Reputation: Disabled
text manipulation in bash: sort columns according to the first row


Let's say I have 50,000 rows of data that look like the following:


B A D C Z E F ......
3 1 4 1 9 2 3 ......
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
0 1 3 4 2 4 8 ......


And I want to sort the columns of the data according to the first row, such that the output looks like:

A B C D E F ...... Z
1 3 1 4 2 3 ...... 9
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
1 0 4 3 4 8 ...... 2



How do I do that?


I tried to transpose the columns into rows using "awk", then sort them with the "sort" command, and then transpose back using "awk" again.

But awk doesn't work for me if the number of columns exceeds ~33K, which is what happens once the 50,000 rows have been transposed into columns, i.e., transposing back didn't work.


Any idea how to do this?

Thanks in advance.
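
For reference, the transpose / sort / transpose idea described above can be sketched roughly as below. This is only an illustration on a tiny sample, not the script actually used in this post: the function names transpose and sort_columns_by_header are made up, whitespace-separated fields are assumed, and the whole table is held in memory. It does nothing about the per-line field limit mentioned above, since the second transpose still has to read very long lines.

```shell
#!/bin/sh
# transpose: swap rows and columns of whitespace-separated input (POSIX awk).
# Hypothetical helper name; every cell is kept in memory until END.
transpose() {
  awk '
    {
      for (i = 1; i <= NF; i++) cell[NR, i] = $i
      if (NF > ncols) ncols = NF
    }
    END {
      for (i = 1; i <= ncols; i++) {
        line = cell[1, i]
        for (r = 2; r <= NR; r++) line = line " " cell[r, i]
        print line
      }
    }'
}

# sort_columns_by_header: after the first transpose each original column is
# one line starting with its header, so sorting on field 1 orders the columns.
sort_columns_by_header() {
  transpose | LC_ALL=C sort -k1,1 | transpose
}

# Small sample in the shape of the data above.
printf 'B A D C\n3 1 4 1\n0 1 3 4\n' | sort_columns_by_header
# prints:
# A B C D
# 1 3 1 4
# 1 0 4 3
```

LC_ALL=C is set only so that the header names are compared byte by byte, whatever the locale.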
 
Old 06-20-2011, 03:51 PM   #2
drub
LQ Newbie
 
Registered: Feb 2009
Posts: 3

Rep: Reputation: 0
Tokenize a line, save into a temp file, sort the temp file

Hey Lethalfang

The shell's set builtin can be used to split a line into tokens ($1, $2, ...). The tokens can then be re-ordered.

Sorry, I don't have a functional example to provide, but it goes something like this:

Code:
tmpfile="MyTmpFile"

# You may want to change the field separator. Here it is set to ":".
# If not, ignore the next 2 lines (and the IFS restore after the loop).
orig_IFS=$IFS
IFS=":"

# Start from an empty temp file.
: > "$tmpfile"

# Lines are read from standard input.
while read -r line
do
  # Tokenize the input line: the tokens become $1, $2, ...
  set -- $line
  
  # You now have the tokens of the input line.
  echo "token 1 ..... $1"
  echo "token 2 ..... $2"
  echo "token 9 ..... $9"
  
  # Do your magic. Re-order them?
  echo "$2 $9 $1" >> "$tmpfile"
done

IFS=$orig_IFS

# Sort the temp file, selecting the field(s) of interest.
sort "$tmpfile"

exit 0
I hope this is what you are after.

Cheers
Drub
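
To make the tokenizing step concrete, here is a minimal sketch of how set splits a line on IFS. The helper name split_line is made up for illustration, and the function body runs in a subshell so the IFS and globbing changes do not leak into the calling shell.

```shell
#!/bin/sh
# split_line LINE [SEP]: print the tokens of LINE one per line.
# Hypothetical helper; SEP defaults to a single space.
split_line() (
  IFS=${2:-' '}   # separator used by the shell's word splitting
  line=$1
  set -f          # no globbing, so a token such as * stays literal
  set -- $line    # unquoted on purpose: the tokens become $1, $2, ...
  printf '%s\n' "$@"
)

split_line "B:A:D:C" ":"
# prints:
# B
# A
# D
# C
```

Once the tokens are in the positional parameters they can be printed in any order, for example printf '%s %s\n' "$2" "$1".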
 
Old 06-20-2011, 03:52 PM   #3
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
You can try the asorti function (available from GNU awk version 3.1.2). You need to sort only the first row and use the sorted indices to print the fields in the correct order:
Code:
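# First row: remember each column's position under its name,
# then sort the names.
NR == 1 {

  c = 0

  for ( i = 1; i <= NF; i++ )
    array[$i] = ++c
  
  # indices[1..n] now holds the column names in sorted order
  n = asorti(array, indices)

}

# Every row, including the first: print the fields in sorted-name order.
{

  for ( i = 1; i <= n; i++ )
    printf "%s ", $(array[indices[i]])

  printf "\n"

}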

Last edited by colucix; 06-20-2011 at 04:02 PM.
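
If your awk has no asorti, the same idea can be sketched in plain POSIX awk by sorting the column positions by hand. This is a swapped-in variant, not the script above: it uses a simple insertion sort on the first row, and the function name sort_columns is made up. Unlike an array indexed by name, it also keeps columns whose names are repeated. The insertion sort is quadratic in the number of columns, so treat it as an illustration for a modest column count.

```shell
#!/bin/sh
# sort_columns: reorder whitespace-separated columns by the names in row 1.
# Hypothetical helper; reads standard input, writes standard output.
sort_columns() {
  awk '
    NR == 1 {
      n = NF
      for (i = 1; i <= n; i++) { name[i] = $i; order[i] = i }
      # Insertion sort of the column positions by name.
      # Appending "" forces a string comparison even for numeric-looking names.
      for (i = 2; i <= n; i++) {
        pos = order[i]
        for (j = i - 1; j >= 1 && (name[order[j]] "") > (name[pos] ""); j--)
          order[j + 1] = order[j]
        order[j + 1] = pos
      }
    }
    {
      # Print the fields of every row (including the first) in sorted order.
      line = $(order[1])
      for (i = 2; i <= n; i++) line = line " " $(order[i])
      print line
    }'
}

printf 'B A D C Z E F\n3 1 4 1 9 2 3\n0 1 3 4 2 4 8\n' | sort_columns
# prints:
# A B C D E F Z
# 1 3 1 4 2 3 9
# 1 0 4 3 4 8 2
```

Only the first row is sorted; every other row is read once and printed in the remembered order, so no transposing is needed.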
 
Old 06-20-2011, 04:09 PM   #4
lisle2011
Member
 
Registered: Mar 2011
Location: Surrey B.C. Canada (Metro Vancouver)
Distribution: Slackware 2.6.33.4-smp
Posts: 179
Blog Entries: 1

Rep: Reputation: 25
Your question

(printf "PERM LINKS OWNER GROUP SIZE MONTH DAY HH:MM/YEAR NAME\n"; ls -l) | column -t

This works for a directory listing. Whether it works for your data depends on how your columns are separated, or whether they are separated at all.

See man column.
 
Old 06-20-2011, 05:01 PM   #5
lethalfang
LQ Newbie
 
Registered: Jun 2011
Location: San Francisco, CA
Posts: 11

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by colucix
You can try the asorti function (available from GNU awk version 3.1.2). You need to sort only the first row and use the sorted indices to print the fields in the correct order:
Code:
NR == 1 {

  c = 0

  for ( i = 1; i <= NF; i++ )
    array[$i] = ++c
  
  n = asorti(array, indices)

}

{

  for ( i = 1; i <= n; i++ )
    printf "%s ", $(array[indices[i]])
    
    printf "\n"

}


What should the code look like to use this function?
Thanks.
 
Old 06-20-2011, 05:10 PM   #6
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
Quote:
Originally Posted by lethalfang
What should the code look like to use this function?
Thanks.
You can try it as is. Suppose you save it in a file called test.awk:
Code:
awk -f test.awk file
where file is the file whose columns you want to sort. The script parses the first row to get the sorted order of the column names, that is, the order in which to print the fields of every line, then prints each line with its fields in that order. If anything is unclear, feel free to ask. The asorti function is described in the GNU awk manual.
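
A small end-to-end run might look like the sketch below. It assumes GNU awk is installed as gawk. The scratch directory and the four-column sample are made up for the illustration; the file names test.awk and file follow the post.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current one is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The awk script from this thread, saved as test.awk.
cat > test.awk <<'EOF'
NR == 1 {
  c = 0
  for ( i = 1; i <= NF; i++ )
    array[$i] = ++c
  n = asorti(array, indices)
}
{
  for ( i = 1; i <= n; i++ )
    printf "%s ", $(array[indices[i]])
  printf "\n"
}
EOF

# A tiny sample input, named "file" as in the post.
printf 'B A D C\n3 1 4 1\n0 1 3 4\n' > file

# asorti is a gawk extension, so call gawk explicitly in case "awk"
# is a different implementation on this system.
if command -v gawk >/dev/null 2>&1; then
  gawk -f test.awk file
else
  echo "gawk not found; asorti needs GNU awk" >&2
fi
```

With gawk available this should print the columns in the order A B C D. Each output line ends with a trailing space because of the "%s " format; redirect the output (for example > sorted.txt) to keep the result.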
 
  

