Old 06-22-2006, 06:54 PM   #1
smkamene
Member
 
Registered: Sep 2004
Location: Atlanta
Posts: 34

Rep: Reputation: 23
Can't get "sort" to work on a particular column


Hello folks, I am on an HP-UX box running the Korn shell. I am trying to sort a file on a particular column, and I also need the result to be unique on that column. I have tried many different things, such as:

cat goodtest|sort -u -k 2,2
cat goodtest|sort -u +1

Nothing seems to work. I need the second column to be unique, but I am still getting duplicates. Here is my file. Please help.

/dev/rdsk/c4t1d3 06BD 09A:1 15A:C2 RAID-5 Grp'd (M) RW 34526
/dev/rdsk/c6t1d3 06BD 09A:1 15A:C2 RAID-5 Grp'd (M) RW 34526
/dev/rdsk/c8t1d3 06BD 08A:1 15A:C2 RAID-5 Grp'd (M) RW 34526
/dev/rdsk/c10t1d3 06BD 08A:1 15A:C2 RAID-5 Grp'd (M) RW 34526
 
Old 06-23-2006, 09:26 AM   #2
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 234
Form #1 works fine on my MEPIS 3.3.2 GNU/Linux box running bash, although I would have used the form:
Code:
sort -uk 2,2 goodtest
Are you using a different version of sort?
Code:
$ sort --version
sort (coreutils) 5.2.1
Written by Mike Haertel and Paul Eggert.

Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Sorry, I pretty much use only bash & definitely only GNU/Linux, so I have no idea if you're having a korn or hp-ux problem.
 
Old 06-23-2006, 09:35 AM   #3
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 36
Code:
awk '!x[$2]++' filename > newfilename
where $2 is the second column
 
Old 06-23-2006, 09:36 AM   #4
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 36
And yes, his sort is probably not the GNU version - he's on hpux.
 
Old 06-23-2006, 03:07 PM   #5
smkamene
Member
 
Registered: Sep 2004
Location: Atlanta
Posts: 34

Original Poster
Rep: Reputation: 23
Strange, but this is what I get. For some reason it removes two instances of 06BD but leaves two there, even though I used -u. Any ideas, guys?

# sort -uk 2,2 goodtest
/dev/rdsk/c4t1d3 06BD 09A:1 15A:C2 RAID-5 Grp'd (M) RW 34526
/dev/rdsk/c10t1d3 06BD 08A:1 15A:C2 RAID-5 Grp'd (M) RW 34526
 
Old 06-23-2006, 03:13 PM   #6
smkamene
Member
 
Registered: Sep 2004
Location: Atlanta
Posts: 34

Original Poster
Rep: Reputation: 23
Jim,

Your example did work. I've used awk before, mostly for something like this: "cat filename|awk '{print $2}'". Can you explain the syntax of your command? Thank you very much.
 
Old 06-25-2006, 08:20 AM   #7
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 234
Yes Jim, please explain.

FWIW, I think this is what's happening:
x[] is an array.
Its indices are values of $2.
Every time it "sees" a $2, the '++' increments the value associated w/ that index, creating a new array element if necessary.
This works because awk arrays can have non-numeric indexing, like a hash.
This may seem backward -- the indices are strings & the array elements are numbers.

The part that I'm not sure about is why the logical negation, '!', makes it work -- w/ it in, only the 1st instance of a value for $2 prints; remove it, & everything after the 1st instance prints. I suspect that the '!' is operating on the logical value of the array element "x[$2]", which, before it is created by the '++', is false. See: http://www.gnu.org/software/gawk/man...l#Truth-Values. Note: only the trailing '++', "post-increment", will work, because it hands the old value to the test & only then adds 1.
I believe he is also relying on an implicit "print $0" when no other action is specified.
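
To check the guesses above, here is a small sketch that runs the three variants side by side. The sample data is hypothetical: only the first two columns of the original file, plus an invented 06BE row so that there is more than one key. The variable names are made up for illustration.

```shell
# Hypothetical sample: first two columns only, with an invented 06BE row.
sample=$(printf '%s\n' \
  '/dev/rdsk/c4t1d3 06BD' \
  '/dev/rdsk/c6t1d3 06BD' \
  '/dev/rdsk/c8t1d3 06BE' \
  '/dev/rdsk/c10t1d3 06BD')

# Negation + post-increment: the old value (0 on first sight) is what gets tested.
first_only=$(printf '%s\n' "$sample" | awk '!x[$2]++')

# Post-increment without negation: true only from the second sighting onward.
repeats_only=$(printf '%s\n' "$sample" | awk 'x[$2]++')

# Negation + pre-increment: the value is already 1 when tested, so never true.
none=$(printf '%s\n' "$sample" | awk '!++x[$2]')

printf 'first_only:\n%s\n\nrepeats_only:\n%s\n\nnone:\n%s\n' \
  "$first_only" "$repeats_only" "$none"
```

With this input, first_only should hold the c4t1d3 and c8t1d3 lines, repeats_only the c6t1d3 and c10t1d3 lines, and none should be empty.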
 
Old 06-25-2006, 08:26 AM   #8
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 36
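If an awk rule consists of just a '<condition>' (some testable true/false value) with no action, then awk prints $0 whenever the condition is true - that is the default action.

awk supports associative arrays. x[] is an array - the test is
if !x[column value] then print $0. The ++ then increments the element, after the test has used its old value. From then on x[column value] is always non-zero, so later lines with the same column value fail the test.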
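
For anyone who finds the one-liner too terse, the same logic can be written out long-hand. This is only a sketch: the sample data is hypothetical (two columns, with an invented 06BE row), and the variable names short_form and long_form are made up here.

```shell
# Hypothetical sample: first two columns only, with an invented 06BE row.
sample=$(printf '%s\n' \
  '/dev/rdsk/c4t1d3 06BD' \
  '/dev/rdsk/c6t1d3 06BD' \
  '/dev/rdsk/c8t1d3 06BE' \
  '/dev/rdsk/c10t1d3 06BD')

# The terse form from post #3.
short_form=$(printf '%s\n' "$sample" | awk '!x[$2]++')

# The same logic spelled out with an explicit test and action.
long_form=$(printf '%s\n' "$sample" | awk '{
    if (x[$2] == 0) {   # an element never seen before compares equal to 0
        print $0        # so the first line with this key is printed
    }
    x[$2]++             # every later line with the same key finds a count >= 1
}')

printf '%s\n' "$long_form"
```

Both forms should print the same lines: the first line seen for each distinct value of column 2.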

Last edited by jim mcnamara; 06-25-2006 at 08:28 AM.
 
Old 06-25-2006, 11:27 AM   #9
smkamene
Member
 
Registered: Sep 2004
Location: Atlanta
Posts: 34

Original Poster
Rep: Reputation: 23
Jim,

Thank you very much for taking the time to explain this. My programming skills are rather weak. So "!x[column value] then print $0" means that the first time around x is nothing, so when x is compared against the value in [column value] it will be true, since x is not equal to 06BD, and hence the line goes to stdout. Then, the second time around, did x get assigned the value from [column value], so that when x is tested against [column value] it is equal to 06BD and therefore the line is not printed to stdout?

Thank you again
 
Old 06-25-2006, 01:16 PM   #10
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 234
Not at all.

x is an array.

x[$2] is an element of that array -- one such element is created for each unique $2.

In addition to being a field value in your input, each $2 is an index of the array.

The value of the array element x[$2] is the number of lines that contain that particular $2. This results from x[$2] being incremented each time the script reads a line that contains that particular $2.

The first time the script reads a line containing some new $2, x[$2] tests false because it is empty, as yet undefined. (It is then incremented to '1', after it is tested). The '!' negates the false to true, & the std. awk default action, print $0, is performed. (Print $0 means print the whole line).

The values of the array x are not the values of $2, but the number of occurrences of those values. Even though they are strings, the $2's are indices (names) of the elements of x.
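
One way to see this for yourself is to dump the array once all input has been read. A minimal sketch, again on hypothetical two-column data with an invented 06BE row; the output goes through sort because awk does not promise any particular order for "for (k in x)".

```shell
# Hypothetical sample: first two columns only, with an invented 06BE row.
sample=$(printf '%s\n' \
  '/dev/rdsk/c4t1d3 06BD' \
  '/dev/rdsk/c6t1d3 06BD' \
  '/dev/rdsk/c8t1d3 06BE' \
  '/dev/rdsk/c10t1d3 06BD')

# Count every line by its second field, then print index and count at the end.
counts=$(printf '%s\n' "$sample" |
  awk '{ x[$2]++ } END { for (k in x) print k, x[k] }' | sort)

printf '%s\n' "$counts"
```

The indices (06BD, 06BE) are the strings from column 2, and the values next to them (3 and 1 for this sample) are the occurrence counts.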
 
Old 06-27-2006, 10:57 AM   #11
smkamene
Member
 
Registered: Sep 2004
Location: Atlanta
Posts: 34

Original Poster
Rep: Reputation: 23
archtoad6, I think I need to do more reading on associative arrays and awk. I am still struggling to understand. But I was playing around with sort and found this command to work, although I am still not clear what the "2.2b" option does. I know that the first 2 means the second field, but I have no clue about the "2b". Here is what worked for me:

sort -k 2.2b,2 -u filename

Thanks
 
Old 06-30-2006, 11:37 AM   #12
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 234
In GNU/Linux, & not necessarily on your HP-UX,
Code:
sort -k 2.2b,2 -u filename
would mean "Sort <filename> on the 2nd field only, starting w/ the 2nd character (origin 1) of that field, ignoring leading blanks when finding where the key starts; & show only one line per distinct key". To see if it might be different for you, I suggest you: a) post the version of your sort & b) check your man page -- it may be different from mine.
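
A possible reason the 'b' matters, sketched under a loud assumption: the thread never shows the exact whitespace in goodtest, so the input below is hypothetical. It pads the shorter device names with an extra space so the second column lines up. LC_ALL=C is set so that the comparison is plain byte order; the behavior described is that of GNU sort and may differ on HP-UX.

```shell
# Hypothetical column-aligned input: shorter device names get an extra
# space of padding so that the second column lines up.
aligned=$(printf '%s\n' \
  '/dev/rdsk/c4t1d3  06BD' \
  '/dev/rdsk/c6t1d3  06BD' \
  '/dev/rdsk/c8t1d3  06BD' \
  '/dev/rdsk/c10t1d3 06BD')

# Without b: the key for field 2 includes its leading blanks, so the
# two-space and one-space keys count as different values.
with_blanks=$(printf '%s\n' "$aligned" | LC_ALL=C sort -u -k 2,2)

# With b on the start position: leading blanks are skipped before the key starts.
no_blanks=$(printf '%s\n' "$aligned" | LC_ALL=C sort -u -k 2b,2)

# The form from post #11: same idea, but the key starts at the 2nd non-blank character.
op_form=$(printf '%s\n' "$aligned" | LC_ALL=C sort -u -k 2.2b,2)

printf '%s\n' "$with_blanks" | wc -l   # 2 with GNU sort
printf '%s\n' "$no_blanks" | wc -l     # 1 with GNU sort
printf '%s\n' "$op_form" | wc -l       # 1 with GNU sort
```

If the real file is padded like this, that would be consistent with post #5, where one short-named line and the c10t1d3 line both survived -u. Whether that is what actually happened on the HP-UX box is a guess.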
 
  

