LinuxQuestions.org


cedance 03-29-2011 04:30 AM

sort multiple columns + replace another column
 
Hi, I have two files as shown here:

File 1: http://cl.ly/2n0J3O2w1Z0p2A1j0o30
File 2: http://cl.ly/2j3B2h1J193Z0r1B3I43

I need to sort by the first column in reverse order and then by the second column numerically, which I managed to do with this command:

Code:

sort -k1,1r -k2,2n <file1> <file2>
The output is shown here: http://cl.ly/3a38450X0n260I1f0z3s
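
To illustrate how the two sort keys interact, here is a minimal sketch on made-up rows (sample.txt and its contents are hypothetical, not the files linked above):

```shell
# Hypothetical sample rows (name, position, label); not the poster's data.
printf '%s\n' 'chr1 300 a' 'chr2 50 b' 'chr1 20 c' 'chr2 7 d' > sample.txt

# -k1,1r : the key is field 1 only, compared as text, in reverse order
# -k2,2n : ties on field 1 are broken by field 2, compared numerically
sort -k1,1r -k2,2n sample.txt
```

The chr2 rows come out first (reverse order on column 1), and within each group the second column ascends numerically, so 7 sorts before 50.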

However, since the 4th column simply numbers the entries from 1 to n within each file, the merged output contains duplicate values in that column. I would like this column to run from JUNC000001 to JUNC<number of entries> again. I can get the number of entries with:
Code:

cat <merged_file> | wc -l
But is it possible to replace this column with JUNC<number>, running from 1 to the number of entries, directly from the Unix command line?

Thank you very much!

David the H. 03-29-2011 05:59 AM

It's rather hard to run tests when all we have are screenshots to work from. I'm not about to transcribe those images into actual text. Care to post some actual data in text format?

In any case, I'm not entirely clear on what you're asking. In your examples all entries in c1 are the same, and I don't see any duplicate lines in the output screenshot, only duplicates inside c4. If I understand correctly though, you want to replace everything in the c4 column with sequential numbers? That should be possible with awk. Something like this, perhaps?
Code:

awk 'BEGIN{format="%s %s %s JUNC%08d %s %s %s %s %s %s %s\n"} ; { i++ ; $4=i ;  printf(format , $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11) }' merged-file
It's kind of messy though, because I assume you'd want to properly zero-pad the entries, and my knowledge of printf in awk is limited. I can't figure out how to do it without defining every field separately (if that's even possible). Perhaps someone with more experience can come along and simplify it. :)
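
One possible simplification, offered as a sketch rather than a tested answer for the real files: assign a formatted string to $4 with sprintf and let awk print the rebuilt record, so the other fields never have to be listed. It assumes whitespace-separated input and that space-separated output is acceptable. The sample rows and the name renumbered-file are made up for illustration.

```shell
# Made-up stand-in for the merged file; column 4 contains duplicate IDs.
printf '%s\n' \
  'chr2 7 90 JUNC00000001 5 +' \
  'chr2 50 120 JUNC00000001 3 -' \
  'chr1 20 80 JUNC00000002 9 +' > merged-file

# NR is the current input line number; sprintf zero-pads it to 8 digits.
# The trailing "1" is an always-true pattern, so every rebuilt record is printed.
awk '{ $4 = sprintf("JUNC%08d", NR) } 1' merged-file > renumbered-file
cat renumbered-file
```

Assigning to a field makes awk rejoin the record with the output separator, which is a single space by default. If the data is tab-separated, adding -F'\t' -v OFS='\t' should keep the tabs. The width in %08d can be changed to match the ID format actually used in the files.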


BTW, it's not usually necessary to use cat and a pipe with wc (or most other CLI tools, for that matter). wc can take a file name as an argument.
Code:

wc -l merged-file
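
A small sketch of the difference, in case the count is needed in a script: given a file name, wc prints the name after the count, while reading from redirected standard input prints only the count. The file lines.txt is a made-up example.

```shell
# Made-up three-line file for illustration.
printf '%s\n' a b c > lines.txt

wc -l lines.txt          # prints the count followed by the file name
n=$(wc -l < lines.txt)   # with input redirection, only the count is printed
echo "entries: $((n))"   # arithmetic expansion drops any padding around the number
```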


All times are GMT -5.