LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 02-21-2008, 06:27 AM   #1
talat
Member
 
Registered: Jan 2006
Distribution: Centos
Posts: 145

Rep: Reputation: 16
Question Script to remove repetition from file


Hi Guys,

Consider the following scenario. I have a file which has a list of users, e.g.


jone
micheal
jone
jone
steve
adam
steve

Now, as you can see, this list has repetition as well. I need to remove the repetition from this file, as it has hundreds of entries. Can I have a sample script? Please guide.
 
Old 02-21-2008, 06:29 AM   #2
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,417

Rep: Reputation: 1985
Code:
sort file.txt | uniq
 
Old 02-21-2008, 06:43 AM   #3
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Posts: 952

Rep: Reputation: 217
I thought of sort too but that would destroy the original order.

So using awk (maybe):
In each line,
if associative_array[$0] doesn't exist yet, then store NR as its value in the array;
On EOF,
Sort by the value and dump out.

Have to develop it but seems OK.

End
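
The idea in this post could be sketched roughly as below. This is only one reading of the pseudocode, not a tested production script: the temporary file, the array name `first`, and the variable `result` are made up for illustration, and the sample data is the list from post #1. It records the first line number of each distinct line in awk, then uses sort and cut to put the lines back in first-seen order.

```shell
# Sample data from post #1 in a throwaway file (illustrative only).
tmp=$(mktemp)
printf '%s\n' jone micheal jone jone steve adam steve > "$tmp"

# awk: remember the line number (NR) where each distinct line first appears,
# then at end of input print "first line number <TAB> text" for each one.
# sort -n restores the original order; cut -f2- drops the number column.
result=$(awk '
  !($0 in first) { first[$0] = NR }
  END { for (line in first) print first[line] "\t" line }
' "$tmp" | sort -n | cut -f2-)

printf '%s\n' "$result"
rm -f "$tmp"
```

The extra sort step is needed because awk's `for (line in first)` loop visits the array in an unspecified order.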
 
Old 02-21-2008, 06:49 AM   #4
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,417

Rep: Reputation: 1985
True, but I can't see that the original order could really matter in this scenario.
 
Old 02-23-2008, 03:43 AM   #5
talat
Member
 
Registered: Jan 2006
Distribution: Centos
Posts: 145

Original Poster
Rep: Reputation: 16
Many thanks guys
 
Old 02-23-2008, 05:47 AM   #6
/bin/bash
Senior Member
 
Registered: Jul 2003
Location: Indiana
Distribution: Mandrake Slackware-current QNX4.25
Posts: 1,802

Rep: Reputation: 47
$ cat file
jone
micheal
jone
jone
steve
adam
steve

$ sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' file
jone
micheal
steve
adam


HTH
HANDY ONE-LINERS FOR SED (Unix stream editor) Apr. 26, 2004
Latest version of this file is usually at:
http://sed.sourceforge.net/sed1line.txt
http://www.student.northpark.edu/pem...d/sed1line.txt
 
Old 02-23-2008, 06:50 AM   #7
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Code:
# sort -u file
adam
jone
micheal
steve

# awk '!x[$0]++' file
jone
micheal
steve
adam
Quote:
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' file
don't think OP will understand.
 
Old 02-23-2008, 07:04 AM   #8
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Quote:
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' file

don't think OP will understand.
I'm not sure if there are 100 people in the WORLD who would understand..... They say that C gives you the power to write incomprehensible code. SED's pretty good at that too........
 
Old 02-23-2008, 11:11 PM   #9
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
Quote:
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' file
Wow.. my head hurts just looking at that. I'm not that great with sed, can someone please explain that?

anyways, here's a shorter/more readable awk solution to your problem
Code:
$ awk 'seen[$0]!=1{print} {seen[$0]=1}' file
jone
micheal
steve
adam
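
Since an explanation of the sed one-liner was asked for, here is a hedged, step-by-step reading of it, with the same commands spread over several lines and commented. The temporary file and the variable `result` are made up for illustration, and running it under `LC_ALL=C` is an assumption added here so that the range `[ -~]` means ASCII space through tilde.

```shell
# Sample data from post #1 in a throwaway file (illustrative only).
tmp=$(mktemp)
printf '%s\n' jone micheal jone jone steve adam steve > "$tmp"

# LC_ALL=C is an assumption: it pins the range [ -~] to ASCII space..tilde.
result=$(LC_ALL=C sed -n '
  # G: append a newline plus the hold space (the lines printed so far).
  G
  # Double the first newline: current line, empty line, then the seen lines.
  s/\n/&&/
  # If the current line occurs again among the seen lines, drop it.
  /^\([ -~]*\n\).*\n\1/d
  # Otherwise remove the extra newline again ...
  s/\n//
  # ... save "current line + seen lines" back into the hold space ...
  h
  # ... and print only the first line of the pattern space.
  P
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because every unique line is kept in the hold space, the work per line grows with the number of distinct lines, and a line containing characters outside `[ -~]` never matches the test, so its repeats are not removed.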
 
Old 02-24-2008, 09:28 AM   #10
kaz2100
Senior Member
 
Registered: Apr 2005
Location: Penguin land, with apple, no gates
Distribution: SlackWare > Debian testing woody(32) sarge etch lenny squeeze(+64) wheezy .. bullseye bookworm
Posts: 1,832

Rep: Reputation: 108
Hya,

I am trying to understand that sed command (and regular expression). However, it seems that I need more time.

So far, I have found that the script works with sed on Macintosh (most probably the BSD one; sed -v or --version gives me an error). But GNU sed version 4.1.5 (on Penguin, Debian lenny and etch) does not work, even with the --posix option.

I will update.

Happy Penguins!
 
Old 02-24-2008, 12:12 PM   #11
kaz2100
Senior Member
 
Registered: Apr 2005
Location: Penguin land, with apple, no gates
Distribution: SlackWare > Debian testing woody(32) sarge etch lenny squeeze(+64) wheezy .. bullseye bookworm
Posts: 1,832

Rep: Reputation: 108
Hya,

update to post #10.

After
Code:
setenv LANG C
the sed script works as expected. LANG was en_US when the script did not work.

Now I know it is off topic.

Happy Penguins!

Last edited by kaz2100; 02-24-2008 at 12:12 PM. Reason: typo 1 -> 10
 
Old 02-24-2008, 12:12 PM   #12
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
$ sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' file

I can decipher everything except the "[ -~]" part (in bold in the original post).
"[a-f]" means anything in the range of a thru f (it can also mean A thru F---it does on my system).

I assume that "[ -~]" is meant to mean everything from " " (space) to "~". After several experiments, I am finding that ranges that include more than alphas and digits can be ambiguous and unpredictable, if for no other reason than that characters within a range can have a special meaning. I have never seen anything about this in the books.
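
A small experiment along these lines, as a sketch only: it checks the range `[a-f]` against the two made-up test lines `b` and `B`, once with the C locale forced for that single command and once under whatever locale the session already uses. The variable names `c_count` and `cur_count` are illustrative. `LC_ALL=C command` is the Bourne-shell way of setting the locale for one command, and LC_ALL takes precedence over LANG.

```shell
# In the C locale a range follows ASCII byte order, so [a-f] matches "b"
# but not "B".
c_count=$(printf '%s\n' b B | LC_ALL=C grep -c '^[a-f]$')
echo "C locale: $c_count line(s) matched"

# Under the locale of the current session the count may differ, depending
# on the system and its collation rules.
cur_count=$(printf '%s\n' b B | grep -c '^[a-f]$')
echo "current locale: $cur_count line(s) matched"
```

If the two counts differ on a given system, that is the locale sensitivity of ranges showing up.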
 
Old 02-24-2008, 01:44 PM   #13
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 735

Rep: Reputation: 76
Hi.
Quote:
Originally Posted by pixellany View Post
... I have never seen anything about this in the books.
Quote:
"Caution: ranges are locale-sensitive, and thus not portable."

-- Classic Shell Programming, page 34, POSIX meta-characters table, Robbins and Beebe, O'Reilly, 2005
On the other hand, I skimmed Effective AWK Programming, and didn't see any warning, nor in Programming Perl, 3rd. Perhaps such warnings are taken for granted by the time one is ready for awk and perl ... cheers, makyo

Last edited by makyo; 02-24-2008 at 01:45 PM.
 
Old 02-24-2008, 05:28 PM   #14
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,358

Rep: Reputation: 2751
That's why Perl has the

use locale;

pragma available.
Actually I thought this page would mention it (http://perldoc.perl.org/perltrap.html) but it doesn't
 
Old 03-02-2008, 08:10 AM   #15
/bin/bash
Senior Member
 
Registered: Jul 2003
Location: Indiana
Distribution: Mandrake Slackware-current QNX4.25
Posts: 1,802

Rep: Reputation: 47
I can't find my handy little reference but I believe [[:print:]] and [ -~] are the same thing, at least in the C locale.
So it would match any non-control character, i.e. any ASCII character from char(32) through char(126).

Last edited by /bin/bash; 03-02-2008 at 08:11 AM. Reason: Turn off smilies.
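
One way to check that belief, as a rough sketch rather than a definitive test: loop over the byte values 1 to 127 (newline skipped, since grep works line by line) and compare what `[[:print:]]` and `[ -~]` match under `LC_ALL=C`. The variable names are made up for illustration, and the check says nothing about other locales.

```shell
mismatches=0
i=1
while [ "$i" -le 127 ]; do
  if [ "$i" -ne 10 ]; then    # skip newline, grep works per line
    # Build the single character with byte value $i from its octal code.
    ch=$(printf '%b' "\\0$(printf '%03o' "$i")")
    a=no
    b=no
    printf '%s\n' "$ch" | LC_ALL=C grep -q '^[[:print:]]$' && a=yes
    printf '%s\n' "$ch" | LC_ALL=C grep -q '^[ -~]$' && b=yes
    # Count every character on which the two patterns disagree.
    [ "$a" = "$b" ] || mismatches=$((mismatches + 1))
  fi
  i=$((i + 1))
done
echo "characters treated differently in the C locale: $mismatches"
```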
 