LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux, and any language is fair game.

11-07-2011, 06:44 PM   #1
ut0ugh1
Member
 
Registered: Oct 2011
Posts: 59

Rep: Reputation: Disabled
SED script with conditional filtering


Is there anyone who can write a shorter sed script that does the same thing as the following?

cat file.txt | sed -e '/\([0-9][0-9]\)\1\{0,\}/d' -e '/\([bcdfghjklmnpqrstvxwz]\)\{44,60\}/d' -e '/\([bcdfghjklmnpqrstvxwz][bcdfghjklmnpqrstvxwz]\)\1\{2,\}/d' -e '/\([b][b][b]\)\1\{0,\}/d' -e '/\([c][c][c]\)\1\{0,\}/d' -e '/\([d][d][d]\)\1\{0,\}/d' -e '/\([f][f][f]\)\1\{0,\}/d' -e '/\([g][g][g]\)\1\{0,\}/d' -e '/\([h][h][h]\)\1\{0,\}/d' -e '/\([j][j][j]\)\1\{0,\}/d' -e '/\([k][k][k]\)\1\{0,\}/d' -e '/\([l][l][l]\)\1\{0,\}/d' -e '/\([m][m][m]\)\1\{0,\}/d' -e '/\([n][n][n]\)\1\{0,\}/d' -e '/\([p][p][p]\)\1\{0,\}/d' -e '/\([q][q][q]\)\1\{0,\}/d' -e '/\([r][r][r]\)\1\{0,\}/d' -e '/\([s][s][s]\)\1\{0,\}/d' -e '/\([t][t][t]\)\1\{0,\}/d' -e '/\([v][v][v]\)\1\{0,\}/d' -e '/\([x][x][x]\)\1\{0,\}/d' -e '/\([w][w][w]\)\1\{0,\}/d' -e '/\([z][z][z]\)\1\{0,\}/d' -e '/\([aeiou][aeiou][aeiou]\)\1\{0,\}/d'

thx
 
11-07-2011, 07:47 PM   #2
Ian John Locke II
Member
 
Registered: Mar 2008
Location: /dev/null
Distribution: Slackware, Android, Slackware64
Posts: 130

Rep: Reputation: 17
Well, for starters, you can use the -r option (extended regular expressions in GNU sed) so you don't have to escape the grouping and interval characters ( ) { } like you're doing above. That will cut out a lot of backslashes.
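A minimal sketch of what -r buys, assuming GNU sed (current versions also accept -E as a synonym for -r): the first expression from the original post, written in basic and in extended syntax, gives the same result on a few made-up lines. Note that a back-reference like \1 inside an extended regex is a GNU extension rather than POSIX.

```shell
# Made-up sample lines (not from the thread).
input='ab8787
hello
x12y'

# Basic regex (BRE), as in the original post: ( ) { } need backslashes.
bre=$(printf '%s\n' "$input" | sed -e '/\([0-9][0-9]\)\1\{0,\}/d')

# Extended regex (ERE) via -E (GNU sed also accepts -r): no backslashes
# needed on the group or the interval. \1 inside an ERE is a GNU extension.
ere=$(printf '%s\n' "$input" | sed -E -e '/([0-9][0-9])\1{0,}/d')

printf '%s\n' "$bre" "$ere"
```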

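Second, where you're looking for consonants, you can use [^aeiou], which is shorter than [bcdfghjklmnpqrstvxwz]. It isn't an exact match, though: [^aeiou] also matches y, digits, upper-case letters, spaces and punctuation, so it only behaves the same if the input is all lower-case letters and you're happy to count y as a consonant.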
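To see where the two classes differ, here is a small sketch (GNU sed assumed; the sample words are made up) that prints the lines containing three "consonants" in a row under each definition.

```shell
# Made-up sample lines: "syzygy" has y, "ab12cd" has digits.
input='syzygy
ab12cd
strength'

# Three in a row from the explicit list in the original post (no "y").
explicit=$(printf '%s\n' "$input" | sed -E -n '/[bcdfghjklmnpqrstvxwz]{3}/p')

# Three in a row of anything that is not a lower-case vowel.
negated=$(printf '%s\n' "$input" | sed -E -n '/[^aeiou]{3}/p')

printf 'explicit:\n%s\nnegated:\n%s\n' "$explicit" "$negated"
```

Only "strength" matches the explicit list; [^aeiou] also catches "syzygy" because it counts y, and "ab12cd" because it counts digits.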

Where you're specifying, for example, [d][d][d], you can just use d{3} (with -r; in basic syntax it's d\{3\}).
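A quick check, assuming GNU sed and made-up input, that the bracketed form and the interval form delete the same lines:

```shell
# Made-up sample lines.
input='oddd
odd
sadder'

# Bracketed form, as in the original post: [d][d][d].
long=$(printf '%s\n' "$input" | sed -e '/[d][d][d]/d')

# Interval form with -E: d{3} means "d" exactly three times in a row.
# (Without -E/-r the basic-regex spelling is d\{3\}.)
short=$(printf '%s\n' "$input" | sed -E -e '/d{3}/d')

printf '%s\n' "$long" "$short"
```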

I'm not sure why you're using \1 in certain places when that's usually a back-ref but I'm not extraordinarily familiar with sed.

Also, there's no reason to do
Code:
cat file.txt | sed -e ...
You can just do
Code:
sed -e ... <file.txt
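A small sketch comparing three ways of feeding a file to sed; the temporary file and its contents are made up. sed also accepts file names directly as arguments, which avoids both the extra cat process and the redirect.

```shell
# Throwaway input file with made-up contents.
tmp=$(mktemp)
printf '%s\n' 'keep me' 'drop 42' > "$tmp"

via_cat=$(cat "$tmp" | sed -e '/[0-9][0-9]/d')   # extra cat process and pipe
via_redirect=$(sed -e '/[0-9][0-9]/d' < "$tmp")  # the shell opens the file
via_arg=$(sed -e '/[0-9][0-9]/d' "$tmp")         # sed opens the file itself

rm -f "$tmp"
printf '%s\n' "$via_cat" "$via_redirect" "$via_arg"
```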
 
11-07-2011, 11:27 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
Maybe you should try giving a clue as to what it is you are trying to do?
 
11-08-2011, 10:01 AM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Please use [code][/code] tags around your code, to preserve formatting and to improve readability.

And whatever else, a command that long and complex really should be placed into its own script file, with clear formatting (one expression per line) and comments, so that it's easier to read and understand.

Then use sed -f scriptfile to run it.
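A sketch of what such a script file might look like, using two of the original expressions in their original basic-regex form; the temporary file and the sample lines are made up for illustration. Lines starting with # are comments in a sed script; just avoid a first line reading #n, which sed treats like the -n option.

```shell
# Write a commented sed script to a temporary file (name is arbitrary).
script=$(mktemp)
cat > "$script" <<'EOF'
# Delete lines containing two consecutive digits.
/\([0-9][0-9]\)\1\{0,\}/d
# Delete lines containing three consecutive vowels.
/\([aeiou][aeiou][aeiou]\)\1\{0,\}/d
EOF

# Run it with sed -f; the input lines are made up.
result=$(printf '%s\n' ok x99 aeiou | sed -f "$script")
rm -f "$script"
printf '%s\n' "$result"
```

A script written in extended syntax would be run the same way with sed -r -f (or -E -f).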

Finally, how about an example of the input text, so we can see what it's supposed to do, and to test variations?

Edit:
Quote:
Originally Posted by Ian John Locke II
I'm not sure why you're using \1 in certain places when that's usually a back-ref but I'm not extraordinarily familiar with sed.
Yes, those are back-references. They aren't just for the replacement field. It's possible to reuse them inside the original expression too.
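A minimal illustration, assuming GNU sed and made-up numbers: with -n and the p command, only lines where a two-digit group is immediately repeated get printed, because \1 must match exactly the text the group captured.

```shell
# Made-up numbers: 8787 and 878787 repeat a two-digit group, 8788 does not.
# \( \) captures two digits; \1 must then match that same text again.
matches=$(printf '%s\n' 8787 8788 878787 | sed -n '/\([0-9][0-9]\)\1/p')
printf '%s\n' "$matches"
```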

So if I'm reading it right, the first expression should match any repeating sequence of two digits (87, 8787, 878787, etc.).

But since the back-reference has been told to repeat zero or more times ( {0,} == * ), and the command simply deletes the line, it seems to me that all that extra is superfluous: any line containing two consecutive digits already matches. Simply matching the two digits alone will achieve the same effect.
Code:
-e '/[0-9]\{2\}/d'      # basic regex syntax
-r -e '/[0-9]{2}/d'     # extended syntax (GNU sed -r)
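Applying the same reasoning to every expression gives a much shorter filter. This is a sketch, assuming GNU sed with -E (or -r), the consonant list from the original post (which has no y), and reading the third expression as a consonant pair that repeats; the sample words are made up. The twenty separate single-letter triple rules collapse into one back-reference, ([C])\1\1, which matches any consonant three times in a row.

```shell
# Made-up sample words chosen to exercise each rule (not from the thread).
input='hello
abc12
brrr
queue
tktktk
mississippi
bcdfghjklmnpqrstvwxzbcdfghjklmnpqrstvwxzbcdf
bcdfghjklmnpqrstvwxzbcdfghjklmnpqrstvwxzbcd
strength'

# Condensed filter (GNU sed -E; \1 inside an ERE is a GNU extension).
# [C] below stands for the consonant list [bcdfghjklmnpqrstvwxz]:
#   /[0-9]{2}/d        two digits in a row (the \1{0,} tail was redundant)
#   /[C]{44}/d         44+ consonants in a row (same lines as {44,60})
#   /([C]{2})\1\1/d    a consonant pair three times in a row (tktktk)
#   /([C])\1\1/d       the same consonant three times (replaces bbb..zzz)
#   /[aeiou]{3}/d      three vowels in a row
short=$(printf '%s\n' "$input" | sed -E \
  -e '/[0-9]{2}/d' \
  -e '/[bcdfghjklmnpqrstvwxz]{44}/d' \
  -e '/([bcdfghjklmnpqrstvwxz]{2})\1\1/d' \
  -e '/([bcdfghjklmnpqrstvwxz])\1\1/d' \
  -e '/[aeiou]{3}/d')

printf '%s\n' "$short"
```

On these words it keeps hello, mississippi, the 43-consonant line and strength, and drops the rest. The upper bound of 60 in {44,60} never changes which lines get deleted, since any longer run of consonants contains a run of 44.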
***Edit2: I just realized this is the same guy who was warned recently about posting some kind of cracking question. I think we really need to hear more about the purpose of this code before helping him further.

Last edited by David the H.; 11-08-2011 at 10:39 AM. Reason: as stated
 
11-08-2011, 01:13 PM   #5
Ian John Locke II
Member
 
Registered: Mar 2008
Location: /dev/null
Distribution: Slackware, Android, Slackware64
Posts: 130

Rep: Reputation: 17
Quote:
Originally Posted by David the H.
Yes, those are back-references. They aren't just for the replacement field. It's possible to reuse them inside the original expression too.
Ah, I've never used the /expr/command much. Usually stick to s///. Good to know though. Thanks for the explanation.
 
  


All times are GMT -5.