Script to remove repetition from file
Hi Guys,
Consider the following scenario. I have a file which contains a list of users, e.g.
jone
micheal
jone
jone
steve
adam
steve
As you can see, this list contains repetitions. I need to remove the repetitions from this file, as it has hundreds of entries. Can I have a sample script? Please guide.
|
Code:
sort file.txt | uniq
|
I thought of sort too but that would destroy the original order.
So, using awk (maybe): for each line, if associative_array[$0] doesn't exist yet, set its value in the array to NR. At EOF, sort by the value and dump out. I'd have to develop it, but it seems OK.
|
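The idea above can be sketched roughly as follows. This is only one possible reading of it: the function name dedupe_keep_order and the array name first are made up for illustration, and the sort is done outside awk with sort -n on the stored line number rather than inside awk.

```shell
# Sketch: keep the first occurrence of each line, in original order.
# dedupe_keep_order is a hypothetical name, not something from the thread.
dedupe_keep_order() {
  # Remember the line number (NR) at which each distinct line first appears,
  # then emit "NR<TAB>line" at EOF. awk's "for (k in array)" order is
  # unspecified, so the pairs are sorted numerically afterwards and the
  # leading number is stripped again with cut.
  awk '!($0 in first) { first[$0] = NR }
       END { for (line in first) print first[line] "\t" line }' "$@" |
    sort -n | cut -f2-
}

# Usage with the sample list from the first post:
printf '%s\n' jone micheal jone jone steve adam steve | dedupe_keep_order
# prints: jone, micheal, steve, adam (one per line)
```

Reading from a file works the same way: dedupe_keep_order file.txt.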
True, but I can't see how the original order could really matter in this scenario.
|
Many thanks guys
|
$ cat file
jone
micheal
jone
jone
steve
adam
steve
$ sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' file
jone
micheal
steve
adam
HTH
HANDY ONE-LINERS FOR SED (Unix stream editor) Apr. 26, 2004
Latest version of this file is usually at:
http://sed.sourceforge.net/sed1line.txt
http://www.student.northpark.edu/pem...d/sed1line.txt
|
Code:
# sort -u file
|
Anyway, here's a shorter, more readable awk solution to your problem:
Code:
$ awk 'seen[$0]!=1{print} {seen[$0]=1}' file
|
Hya,
I am trying to understand that sed command (and its regular expression), but it seems that I need more time. So far I have found that the script works with sed on a Macintosh (most probably the BSD one; sed -v or --version gives me an error), but with GNU sed version 4.1.5 (on Penguin: Debian lenny and etch) it does not, even with the --posix option. I will update. Happy Penguins!
|
Hya,
Update to post #10. After
Code:
setenv LANG C
Now I know it is off topic. Happy Penguins!
|
$ sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' file
I can decipher everything except the "[ -~]" part. "[a-f]" means anything in the range of a through f (it can also mean A through F; it does on my system). I assume that "[ -~]" is meant to mean everything from " " (space) to "~". After several experiments, I am finding that ranges that include more than alphas and digits can be ambiguous and unpredictable, if for no other reason than that characters within a range can have a special meaning. I have never seen anything about this in the books.
|
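What a range such as [ -~] covers depends on the locale's collation order; only in the C (POSIX) locale is it guaranteed to follow plain byte order. A minimal sketch of pinning the locale for just the one command is below. The setenv LANG C mentioned earlier in the thread is csh syntax; using LC_ALL=C as a per-command prefix in a POSIX shell is an assumption on my part, and dedupe_sed is a made-up wrapper name.

```shell
# Sketch: run the thread's sed one-liner with the C locale forced, so that
# the range [ -~] is interpreted by byte value (space through tilde).
# dedupe_sed is a hypothetical wrapper name.
dedupe_sed() {
  LC_ALL=C sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' "$@"
}

# Usage with the sample list from the first post:
printf '%s\n' jone micheal jone jone steve adam steve | dedupe_sed
```

Setting the variable as a prefix affects only that sed invocation, so the rest of the session keeps its normal locale.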
Hi.
|
That's why Perl has the use locale; pragma available. :) Actually, I thought this page would mention it (http://perldoc.perl.org/perltrap.html), but it doesn't :(
|
I can't find my handy little reference, but I believe [[:print:]] and [ -~] match the same thing, at least in the C locale.
So it would match any non-control character, i.e. any ASCII character from char(32) (space) through char(126) (~).
|
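One way to check that belief on a given system is to test every ASCII byte against both bracket expressions. The sketch below does that with grep in the C locale; compare_classes is a made-up helper name, and bytes above 127 and the NUL byte are deliberately left out.

```shell
# Sketch: compare [ -~] with [[:print:]] for bytes 1..127 in the C locale.
# compare_classes is a hypothetical helper name, not from the thread.
compare_classes() {
  diffs=0
  i=1
  while [ "$i" -le 127 ]; do
    # Byte 10 (newline) is skipped because grep works line by line.
    if [ "$i" -ne 10 ]; then
      # Build the single character from its octal code, e.g. \101 for "A".
      ch=$(printf "\\$(printf '%03o' "$i")")
      if printf '%s\n' "$ch" | LC_ALL=C grep -q '^[ -~]$'; then
        in_range=1
      else
        in_range=0
      fi
      if printf '%s\n' "$ch" | LC_ALL=C grep -q '^[[:print:]]$'; then
        in_print=1
      else
        in_print=0
      fi
      if [ "$in_range" -ne "$in_print" ]; then
        echo "byte $i differs"
        diffs=$((diffs + 1))
      fi
    fi
    i=$((i + 1))
  done
  if [ "$diffs" -eq 0 ]; then
    echo "no differences"
  fi
}

compare_classes
```

In the C locale this should report no differences, since both expressions cover bytes 32 through 126. Removing the LC_ALL=C prefixes shows whether the current locale treats the range differently.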