LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
01-21-2013, 05:33 PM   #1
jv61
LQ Newbie
 
Registered: May 2012
Posts: 24

Rep: Reputation: Disabled
How to order data based on missing values?


I have a file which looks like this.

Code:
con1	BC1a:25	BC1b:25	BC2a:2	BC2b:20	BC3a:5 BC3b:56
con2	BC1a:25	BC2a:2				
con3	BC2a:2	BC3a:5	BC3b:6			
con4	BC1b:20	BC2a:12	BC2b:20	BC3a:50	BC3b:5
Not all rows have the same number of columns. I would like to order the columns by their BC tag and insert a placeholder ("-") wherever a tag is missing from a row.

My result should look something like this:

Code:
con1	BC1a:25	BC1b:25	BC2a:2	BC2b:20	BC3a:5 BC3b:56
con2	BC1a:25	-	BC2a:2	-	-	-
con3	-	-	BC2a:2	-	BC3a:5	BC3b:6
con4	-	BC1b:20	BC2a:12	BC2b:20	BC3a:50 BC3b:5

My actual file has many thousands of rows and 150 columns. Any ideas on how to do this?

Thanks in advance.
 
01-21-2013, 09:55 PM   #2
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,714

Rep: Reputation: 1280
Not easy, but doable.

I'd do it in Perl: its associative arrays (hashes) would do the job, using the labels such as BC1a as keys.
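For anyone without Perl at hand: bash 4 and later also has associative arrays (declare -A), which can play the role of the Perl hash here. A minimal sketch, assuming bash 4+, using two tokens from the sample data; the array name row is only illustrative:

```shell
#!/bin/bash
# Requires bash 4+ for associative arrays ("declare -A").
# "row" is an illustrative name; the tokens come from the sample data.
declare -A row=()
for tok in BC2a:12 BC1b:20; do
  label=${tok%%:*}    # text before the first ":", e.g. BC2a
  row[$label]=$tok    # keep the whole token under its label
done
# A label that never appeared falls back to the "-" placeholder.
printf '%s\n' "${row[BC2a]}" "${row[BC1b]}" "${row[BC3a]:--}"
```

The ${row[BC3a]:--} expansion substitutes "-" for a label that is unset, which is exactly the placeholder the question asks for.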

Start with an array containing all the labels you will be using, in the order you want them output.

Then create a reference hash table that maps each of those labels to the placeholder value "-".

In a loop over each line, initialize a new hash table as a copy of the reference table. Then, for each token on the line (you do have to split the token at the ":" to get its label), replace the new hash table's entry for that BCxx label with the token itself.

Finally, print the new hash table's entries by walking the label array, so they come out in the proper order.
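The steps above translate fairly directly into bash 4+ associative arrays. This is a sketch, not jpollard's Perl code: the function name align_fields is hypothetical, the labels are gathered from the data in a first pass instead of being typed in by hand, and GNU sort -V is assumed so that a two-digit tag such as BC10a would sort after BC2a. Output fields are tab-separated, as in the original question.

```shell
#!/bin/bash
# Sketch of the hash-table approach in bash 4+ (not the poster's Perl).
# align_fields is a hypothetical name; GNU "sort -V" is assumed.
align_fields() {
  local file=$1 tok label
  local -a fields labels
  local -A ref=() row=()

  # Pass 1: every distinct label (text before the first ":"),
  # skipping the row name in the first field, in version-sort order.
  mapfile -t labels < <(
    while read -r -a fields; do
      for tok in "${fields[@]:1}"; do
        printf '%s\n' "${tok%%:*}"
      done
    done < "$file" | sort -uV
  )

  # Reference table: every label starts out as the placeholder "-".
  for label in "${labels[@]}"; do
    ref[$label]=-
  done

  # Pass 2: per line, copy the reference table, overwrite the labels
  # that are present, then print the entries in label-array order.
  while read -r -a fields; do
    (( ${#fields[@]} == 0 )) && continue    # skip blank lines
    row=()
    for label in "${!ref[@]}"; do
      row[$label]=${ref[$label]}
    done
    for tok in "${fields[@]:1}"; do
      row[${tok%%:*}]=$tok
    done
    printf '%s' "${fields[0]}"
    for label in "${labels[@]}"; do
      printf '\t%s' "${row[$label]}"
    done
    printf '\n'
  done < "$file"
}
```

After sourcing the script, usage would be something like align_fields data.txt > output.txt. Because each token is placed by its label rather than by its position, this version does not need the fields on a line to already be sorted. Bash loops are not fast, so on a very large file a Perl or awk version is likely to be quicker.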

You could do it in Python too, but I'm not that familiar with Python. I can't give you sample code right now (I have to deal with some cats), but maybe tomorrow.
 
01-22-2013, 10:00 AM   #3
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 4,685

Rep: Reputation: 1571
My attempt at a bash solution.
Code:
#!/bin/bash

# Pass 1: read data.txt once and build a sorted, de-duplicated list of
# prefixes (the part of each field before the ":"), skipping the first
# field on each line (the row name).
while read -r -a aline ; do
  for (( i=1; i<${#aline[@]}; i+=1 )); do
    echo "${aline[i]%:*}"
  done
done < data.txt | sort -u > prefixes.txt

# Pass 2: walk the sorted prefix list for every line, copying a field
# when its prefix matches and writing "-" when it is missing.
readarray -t prefixes < prefixes.txt
: > output.txt    # truncate any output left over from a previous run
while read -r -a aline ; do
  j=1             # index of the next field on this line not yet written
  echo -n "${aline[0]}" >> output.txt
  for (( i=0; i<${#prefixes[@]}; i+=1 )); do
    if [[ "${aline[j]%:*}" == "${prefixes[i]}" ]] ; then
      echo -n " ${aline[j]}" >> output.txt
      (( j+=1 ))
    else
      echo -n " -" >> output.txt
    fi
  done
  echo "" >> output.txt
done < data.txt

# Cleanup
rm prefixes.txt
With your input saved in a file named data.txt, this script produces a file output.txt containing:
Code:
con1 BC1a:25 BC1b:25 BC2a:2 BC2b:20 BC3a:5 BC3b:56
con2 BC1a:25 - BC2a:2 - - -
con3 - - BC2a:2 - BC3a:5 BC3b:6
con4 - BC1b:20 BC2a:12 BC2b:20 BC3a:50 BC3b:5
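I should point out that the script assumes the fields on each line already appear in the same lexicographic order that sort gives the prefix list. If a line listed BC2a before BC10a, for example, the comparison would get out of step: the BC10a field would be dropped and a "-" written in its column.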
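To see why that assumption matters once tags reach two digits (whether the real 150-column file has such tags is an assumption here), compare plain sort with GNU sort -V (version sort). LC_ALL=C is set only to make the plain sort's byte order deterministic:

```shell
#!/bin/bash
# Plain sort compares character by character: BC10a and BC1b first differ
# at '0' vs 'b', and digits sort before letters, so BC10a comes first.
lexical=$(printf '%s\n' BC2a BC10a BC1b | LC_ALL=C sort)
# GNU version sort compares the embedded numbers numerically.
natural=$(printf '%s\n' BC2a BC10a BC1b | sort -V)
printf 'sort:     %s\n' "${lexical//$'\n'/ }"
printf 'sort -V:  %s\n' "${natural//$'\n'/ }"
```

Swapping sort -u for sort -uV in the first pass of the script would give the natural order instead, but each line would still have to list its fields in that same order.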

Last edited by allend; 01-22-2013 at 10:30 AM.
 
All times are GMT -5.
