06-22-2006, 06:54 PM
|
#1
|
Member
Registered: Sep 2004
Location: Atlanta
Posts: 34
Rep:
|
Can't get "sort" to work on a particular column
Hello folks, I am on an HP-UX box running Korn shell. I am trying to sort a file on a particular column, and I also need that column to be unique. I have tried several different things, such as:
cat goodtest|sort -u -k 2,2
cat goodtest|sort -u +1
Nothing seems to work. I need the second column to be unique, but I am still getting duplicates. Here is my file. Please help me.
/dev/rdsk/c4t1d3 06BD 09A:1 15A:C2 RAID-5 Grp'd (M) RW 34526
/dev/rdsk/c6t1d3 06BD 09A:1 15A:C2 RAID-5 Grp'd (M) RW 34526
/dev/rdsk/c8t1d3 06BD 08A:1 15A:C2 RAID-5 Grp'd (M) RW 34526
/dev/rdsk/c10t1d3 06BD 08A:1 15A:C2 RAID-5 Grp'd (M) RW 34526
|
|
|
06-23-2006, 09:26 AM
|
#2
|
Senior Member
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
|
Form #1 works fine on my MEPIS 3.3.2 GNU/Linux box running bash, although I would have used the form:
Code:
sort -uk 2,2 goodtest
Are you using a different version of sort?
Code:
$ sort --version
sort (coreutils) 5.2.1
Written by Mike Haertel and Paul Eggert.
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Sorry, I pretty much use only bash & definitely only GNU/Linux, so I have no idea if you're having a korn or hp-ux problem.
|
|
|
06-23-2006, 09:35 AM
|
#3
|
Member
Registered: May 2002
Posts: 964
Rep:
|
Code:
awk '!x[$2]++' filename > newfilename
where $2 is the second column
|
|
|
06-23-2006, 09:36 AM
|
#4
|
Member
Registered: May 2002
Posts: 964
Rep:
|
And yes, his sort is probably not the GNU version - he's on hpux.
|
|
|
06-23-2006, 03:07 PM
|
#5
|
Member
Registered: Sep 2004
Location: Atlanta
Posts: 34
Original Poster
Rep:
|
Strange, but this is what I get. For some reason it removes two instances of 06BD but leaves two there, even though I used -u. Any ideas, guys?
# sort -uk 2,2 goodtest
/dev/rdsk/c4t1d3 06BD 09A:1 15A:C2 RAID-5 Grp'd (M) RW 34526
/dev/rdsk/c10t1d3 06BD 08A:1 15A:C2 RAID-5 Grp'd (M) RW 34526
|
|
|
06-23-2006, 03:13 PM
|
#6
|
Member
Registered: Sep 2004
Location: Atlanta
Posts: 34
Original Poster
Rep:
|
Jim,
Your example did work. I've used awk before, mostly something like this: "cat filename|awk '{print $2}'". Can you explain the syntax of your command? Thank you very much.
|
|
|
06-25-2006, 08:20 AM
|
#7
|
Senior Member
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
|
Yes Jim, please explain.
FWIW, I think this is what's happening:
x[] is an array.
Its indices are values of $2.
Every time it "sees" a $2, the '++' increments the value associated w/ that index, creating a new array element if necessary.
This works because awk arrays can have non-numeric indexing, like a hash.
This may seem backward -- the indices are strings & the array elements are numbers.
The part that I'm not sure about is why the logical negation, '!', makes it work. With it in, only the 1st instance of a value for $2 prints; remove it, & every instance after the 1st prints. I suspect that the '!' is operating on the logical value of the array element "x[$2]", which, before it is created by the '++', is false. See: http://www.gnu.org/software/gawk/man...l#Truth-Values. Note: only the trailing '++', "post-increment", will work.
I believe he is also relying on an implicit "print $0" when no other action is specified.
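A minimal sketch of the three variants described above, run against a hypothetical four-line sample (not the original poster's file). The variable names are illustrative only.

```shell
# Hypothetical sample: the value 06BD appears three times in column 2.
sample='a 06BD
b 06BD
c 07AA
d 06BD'

# With '!' and post-increment: only the first line for each $2 prints.
first_only=$(printf '%s\n' "$sample" | awk '!x[$2]++')

# Without '!': only the lines after the first occurrence print.
repeats_only=$(printf '%s\n' "$sample" | awk 'x[$2]++')

# Pre-increment: the element is already >= 1 when tested, so nothing prints.
pre_increment=$(printf '%s\n' "$sample" | awk '!(++x[$2])')

printf 'first occurrences:\n%s\n' "$first_only"
printf 'repeats:\n%s\n' "$repeats_only"
printf 'pre-increment output is empty: [%s]\n' "$pre_increment"
```

The post-increment form matters because the test sees the old value (empty, hence false) before the element is bumped to 1.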
|
|
|
06-25-2006, 08:26 AM
|
#8
|
Member
Registered: May 2002
Posts: 964
Rep:
|
If you have just '<condition>', some testable (true/false) value, with no action, then when it is true awk performs the default action: print $0.
awk supports associative arrays, and x[] is one. The test is: if !x[column value] then print $0. The ++ then increments the element, so from then on x[column value] is always non-zero.
Last edited by jim mcnamara; 06-25-2006 at 08:28 AM.
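As a rough illustration of the explanation above, the one-liner can be spelled out with an explicit test and an explicit increment. This is a sketch on a hypothetical sample, not the poster's data, and the longhand form is one possible expansion rather than anything awk does internally.

```shell
# Hypothetical sample input with a repeated second column.
sample='a 06BD
b 06BD
c 07AA
d 06BD'

# Terse form from the thread: a pattern with no action prints $0 when true.
short=$(printf '%s\n' "$sample" | awk '!x[$2]++')

# Longhand sketch: test the counter for this column value, then increment it.
# An element that has never been set compares equal to 0.
long=$(printf '%s\n' "$sample" | awk '{ if (x[$2] == 0) print $0; x[$2]++ }')

printf '%s\n' "$long"
```

Both commands should produce the same lines, one per distinct value of the second column.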
|
|
|
06-25-2006, 11:27 AM
|
#9
|
Member
Registered: Sep 2004
Location: Atlanta
Posts: 34
Original Poster
Rep:
|
Jim,
Thank you very much for taking the time to explain this. My programming skills are rather weak. So "!x[column value] then print $0" means that the first time around x is nothing, so when x is compared against the value in [column value] the test is true, since x is not equal to 06BD, and hence the line goes to stdout. Then the second time around, did x get assigned the value from [column value], so that when x is tested against [column value] it is equal to 06BD and therefore the line is not printed to stdout?
Thank you again
|
|
|
06-25-2006, 01:16 PM
|
#10
|
Senior Member
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
|
Not at all.
x is an array.
x[$2] is an element of that array -- one such element is created for each unique $2.
In addition to being a field value in your input, each $2 is an index of the array.
The value of the array element x[$2] is the number of lines that contain that particular $2. This results from x[$2] being incremented each time the script reads a line that contains that particular $2.
The first time the script reads a line containing some new $2, x[$2] tests false because it is empty, as yet undefined. (It is then incremented to '1', after it is tested). The '!' negates the false to true, & the std. awk default action, print $0, is performed. (Print $0 means print the whole line).
The values of the array x are not the values of $2, but the number of occurrences of those values. Even though they are strings, the $2's are indices (names) of the elements of x.
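To make the point about indices versus values concrete, here is a small sketch that only counts and then dumps the array at the end. The sample data is hypothetical, and the output is piped through sort because awk's for-in loop visits indices in no guaranteed order.

```shell
# Hypothetical sample: 06BD occurs three times, 07AA once.
sample='a 06BD
b 06BD
c 07AA
d 06BD'

# Count lines per column-2 value; the indices are strings, the values are counts.
counts=$(printf '%s\n' "$sample" |
  awk '{ x[$2]++ } END { for (k in x) print k, x[k] }' |
  LC_ALL=C sort)

printf '%s\n' "$counts"
```

Each output line is an index of x followed by the number of input lines that carried it.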
|
|
|
06-27-2006, 10:57 AM
|
#11
|
Member
Registered: Sep 2004
Location: Atlanta
Posts: 34
Original Poster
Rep:
|
archtoad6, I think I need to do more reading on associative arrays and awk. I am still struggling to understand. But I was playing around with sort and found this command to work, although I am still not clear what the "2.2b" option does. I know that the first 2 means the second field, but I have no clue about 2b. Here is what worked for me:
sort -k 2.2b,2 -u filename
Thanks
|
|
|
06-30-2006, 11:37 AM
|
#12
|
Senior Member
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
|
In GNU/Linux, & not necessarily on your HP-UX,
Code:
sort -k 2.2b,2 -u filename
would mean "Sort <filename> on the 2nd field only, starting with the 2nd character (origin 1) after skipping leading blanks, & show only one line for each distinct key". To see if it might be different for you, I suggest you: a) post the version of your sort & b) check your man page, since it may be different from mine.
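A small sketch of what the b modifier changes, assuming a POSIX-style sort in the C locale. The two-line sample is hypothetical: it pads the first column with a different number of blanks on each line, as a column-aligned listing would. Whether the original poster's file was spaced this way is an assumption, not something the thread confirms.

```shell
# Hypothetical sample: same column-2 value, different number of blanks before it.
sample='/dev/rdsk/c4t1d3   06BD
/dev/rdsk/c10t1d3  06BD'

# Without b, the blanks before the field count as part of the key,
# so the two keys differ and -u keeps both lines.
plain=$(printf '%s\n' "$sample" | LC_ALL=C sort -u -k 2,2 | wc -l | tr -d ' ')

# With b on the start position, leading blanks are skipped before the key starts.
skip_blanks=$(printf '%s\n' "$sample" | LC_ALL=C sort -u -k 2b,2 | wc -l | tr -d ' ')

# The form from the thread: skip blanks, then start at the 2nd character.
offset=$(printf '%s\n' "$sample" | LC_ALL=C sort -u -k 2.2b,2 | wc -l | tr -d ' ')

printf 'lines kept: plain=%s, 2b=%s, 2.2b=%s\n' "$plain" "$skip_blanks" "$offset"
```

If the counts differ between the first command and the other two on your system, differing leading blanks are a likely reason that plain -k 2,2 left duplicates.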
|
|
|
All times are GMT -5.