LinuxQuestions.org


talat 02-21-2008 06:27 AM

Script to remove repetition from file
 
Hi Guys,

Consider the following scenario. I have a file that contains a list of users, e.g.:


jone
micheal
jone
jone
steve
adam
steve

Now, as you can see, this list contains repeated entries. I need to remove the repetition, as the file has hundreds of entries. Could someone provide a sample script? Please guide.

acid_kewpie 02-21-2008 06:29 AM

Code:

sort file.txt | uniq

AnanthaP 02-21-2008 06:43 AM

I thought of sort too but that would destroy the original order.

So using awk (maybe):
For each line,
if associative_array[$0] doesn't exist yet, store NR (the current line number) as its value;
At EOF,
sort the entries by that stored value and dump out the keys.

Have to develop it but seems OK.

End
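
The plan above could be sketched roughly as follows. This is only one possible reading of it: the function name dedupe_keep_order is made up for illustration, and the final sort is handed to sort(1) instead of being done inside awk.

```shell
# Hypothetical helper (name invented for this sketch): reads lines on stdin.
dedupe_keep_order() {
    awk '
        # Remember the line number of the first occurrence only.
        !($0 in first) { first[$0] = NR }
        END {
            # Emit "line-number TAB text" for every distinct line.
            for (line in first) printf "%d\t%s\n", first[line], line
        }
    ' | sort -n | cut -f2-   # restore the original order, then drop the number
}

printf 'jone\nmicheal\njone\njone\nsteve\nadam\nsteve\n' | dedupe_keep_order
```

With the sample list from the first post this prints jone, micheal, steve, adam, in that order.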

acid_kewpie 02-21-2008 06:49 AM

True, but I can't see that the original order could really matter in this scenario.

talat 02-23-2008 03:43 AM

Many thanks guys

/bin/bash 02-23-2008 05:47 AM

$ cat file
jone
micheal
jone
jone
steve
adam
steve

$ sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' file
jone
micheal
steve
adam


HTH
HANDY ONE-LINERS FOR SED (Unix stream editor) Apr. 26, 2004
Latest version of this file is usually at:
http://sed.sourceforge.net/sed1line.txt
http://www.student.northpark.edu/pem...d/sed1line.txt

ghostdog74 02-23-2008 06:50 AM

Code:

# sort -u file
adam
jone
micheal
steve

# awk '!x[$0]++' file
jone
micheal
steve
adam

Quote:

sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' file
don't think OP will understand.
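
For readers new to the awk idiom in this post: '!x[$0]++' uses each whole line as an array key, and the pattern is true only while that line's counter is still zero. A long-hand sketch of the same logic might look like this (the function name dedupe_verbose and the array name count are made up for illustration):

```shell
# Same logic as awk '!x[$0]++', written out step by step.
dedupe_verbose() {
    awk '{
        if (count[$0] == 0)   # an unseen line has an empty (zero) counter
            print             # so the first occurrence is printed
        count[$0]++           # every occurrence bumps the counter
    }' "$@"
}

printf 'jone\nmicheal\njone\njone\nsteve\nadam\nsteve\n' | dedupe_verbose
```

Unlike sort -u, this keeps the lines in their original order.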

pixellany 02-23-2008 07:04 AM

Quote:

sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' file

don't think OP will understand.
I'm not sure if there are 100 people in the WORLD who would understand.....;) They say that C gives you the power to write incomprehensible code. SED's pretty good at that too........

angrybanana 02-23-2008 11:11 PM

Quote:

sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' file
Wow.. my head hurts just looking at that. I'm not that great with sed, can someone please explain that?

Anyway, here's a shorter, more readable awk solution to your problem:
Code:

$ awk 'seen[$0]!=1{print} {seen[$0]=1}' file
jone
micheal
steve
adam
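
Since an explanation of the sed one-liner was asked for, here is the same script split over several lines with a comment per command. It is a reading of the script, not an official description; the wrapper name dedupe_sed is invented, and LC_ALL=C is an added assumption so that the [ -~] range is taken in plain ASCII order.

```shell
# Hypothetical wrapper around the one-liner from sed1line.txt.
dedupe_sed() {
    LC_ALL=C sed -n '
        # G: append the hold space (the lines kept so far) to the current line.
        G
        # Double the first newline, so that every kept line has a newline
        # directly in front of it, including the most recently kept one.
        s/\n/&&/
        # If the current line (group 1, including its newline) occurs again
        # among the kept lines, it is a duplicate: delete and start over.
        /^\([ -~]*\n\).*\n\1/d
        # Not a duplicate: remove the extra newline again,
        s/\n//
        # store the current line plus all kept lines in the hold space,
        h
        # and print only the first line of the pattern space.
        P
    ' "$@"
}

printf 'jone\nmicheal\njone\njone\nsteve\nadam\nsteve\n' | dedupe_sed
```

One consequence of [ -~] worth knowing: a line containing a tab or a non-ASCII character can never match the duplicate test, so such lines would be printed every time they occur.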


kaz2100 02-24-2008 09:28 AM

Hya,

I am trying to understand that sed command (and regular expression). However, it seems that I need more time.

So far, I have found that the script works with sed on a Macintosh (most probably the BSD one; sed -v or --version gives me an error). But with GNU sed version 4.1.5 (on Penguin: Debian lenny and etch) it does not work, even with the --posix option.

I will update.

Happy Penguins!

kaz2100 02-24-2008 12:12 PM

Hya,

update to post #10.

After
Code:

setenv LANG C
the sed script works as expected. LANG was en_US when the script did not work.

Now I know it is off topic.

Happy Penguins!
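
setenv is csh/tcsh syntax. A way to get the same effect for a single command, which should work from csh as well as from sh or bash, is to go through env(1). The sketch below sets LC_ALL instead of LANG, on the assumption that overriding every locale category is acceptable; LC_ALL takes precedence over LANG, so changing LANG alone has no effect when LC_ALL is already set.

```shell
# env sets the variable for the child process only; the session is untouched.
result=$(printf 'jone\nmicheal\njone\njone\nsteve\nadam\nsteve\n' |
    env LC_ALL=C sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P')
printf '%s\n' "$result"

# For a whole sh/bash session the counterpart of "setenv LANG C" would be:
#   export LANG=C
```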

pixellany 02-24-2008 12:12 PM

$ sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' file

I can decipher everything except the "[ -~]" part.
"[a-f]" means anything in the range of a thru f (it can also mean A thru F---it does on my system).

I assume that "[ -~]" is meant to match everything from " " (space) to "~". After several experiments, I am finding that ranges that include more than letters and digits can be ambiguous and unpredictable, if for no other reason than that characters within a range can have a special meaning. I have never seen anything about this in the books.

makyo 02-24-2008 01:44 PM

Hi.
Quote:

Originally Posted by pixellany (Post 3068302)
... I have never seen anything about this in the books.

Quote:

"Caution: ranges are locale-sensitive, and thus not portable."

-- Classic Shell Programming, page 34, POSIX meta-characters table, Robbins and Beebe, O'Reilly, 2005
On the other hand, I skimmed Effective AWK Programming and didn't see any warning, nor in Programming Perl, 3rd edition. Perhaps such warnings are taken for granted by the time one is ready for awk and perl ... cheers, makyo
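
To make the caution about ranges concrete, here is a small sketch using grep. It pins the locale to C, where a range is plain byte order, and contrasts the range with POSIX character classes, which state the intent without relying on collation order. What [a-f] matches in other locales depends on the system, so no output is claimed for those.

```shell
# C locale: [a-f] is the bytes "a" through "f", so the uppercase B is excluded.
printf 'b\nB\n' | LC_ALL=C grep '[a-f]'          # prints: b

# Character classes name the set directly.
printf 'b\nB\n' | LC_ALL=C grep '[[:lower:]]'    # prints: b
printf 'b\nB\n' | LC_ALL=C grep '[[:alpha:]]'    # prints: b and B
```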

chrism01 02-24-2008 05:28 PM

That's why Perl has the

use locale;

pragma available. :)
Actually I thought this page would mention it (http://perldoc.perl.org/perltrap.html) but it doesn't :(

/bin/bash 03-02-2008 08:10 AM

I can't find my handy little reference, but I believe [[:print:]] and [ -~] match the same characters in the C locale.
So it would match any printable ASCII character, i.e. character codes 32 (space) through 126 (~), and no control characters.
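
That belief can be checked for the C locale with a quick sketch: generate every ASCII character from 1 to 127, keep only the ones in each set with tr, and compare. The helper name all_ascii is made up, and the check says nothing about other locales.

```shell
# Hypothetical helper: emit the characters with codes 1..127.
all_ascii() {
    LC_ALL=C awk 'BEGIN { for (i = 1; i < 128; i++) printf "%c", i }'
}

# tr -cd deletes everything that is NOT in the given set.
by_class=$(all_ascii | LC_ALL=C tr -cd '[:print:]')
by_range=$(all_ascii | LC_ALL=C tr -cd ' -~')

if [ "$by_class" = "$by_range" ]; then
    echo "same set: ${#by_class} characters"    # 95 = codes 32 through 126
else
    echo "the two sets differ"
fi
```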


All times are GMT -5.