LinuxQuestions.org

SED script with conditional filtering (https://www.linuxquestions.org/questions/programming-9/sed-script-with-conditional-filtering-912388/)

ut0ugh1 11-07-2011 06:44 PM

SED script with conditional filtering
 
Is there anyone who can write a shorter sed script that does the same thing as the following?

Code:

cat file.txt | sed -e '/\([0-9][0-9]\)\1\{0,\}/d' \
    -e '/\([bcdfghjklmnpqrstvxwz]\)\{44,60\}/d' \
    -e '/\([bcdfghjklmnpqrstvxwz][bcdfghjklmnpqrstvxwz]\)\1\{2,\}/d' \
    -e '/\([b][b][b]\)\1\{0,\}/d' -e '/\([c][c][c]\)\1\{0,\}/d' -e '/\([d][d][d]\)\1\{0,\}/d' -e '/\([f][f][f]\)\1\{0,\}/d' \
    -e '/\([g][g][g]\)\1\{0,\}/d' -e '/\([h][h][h]\)\1\{0,\}/d' -e '/\([j][j][j]\)\1\{0,\}/d' -e '/\([k][k][k]\)\1\{0,\}/d' \
    -e '/\([l][l][l]\)\1\{0,\}/d' -e '/\([m][m][m]\)\1\{0,\}/d' -e '/\([n][n][n]\)\1\{0,\}/d' -e '/\([p][p][p]\)\1\{0,\}/d' \
    -e '/\([q][q][q]\)\1\{0,\}/d' -e '/\([r][r][r]\)\1\{0,\}/d' -e '/\([s][s][s]\)\1\{0,\}/d' -e '/\([t][t][t]\)\1\{0,\}/d' \
    -e '/\([v][v][v]\)\1\{0,\}/d' -e '/\([x][x][x]\)\1\{0,\}/d' -e '/\([w][w][w]\)\1\{0,\}/d' -e '/\([z][z][z]\)\1\{0,\}/d' \
    -e '/\([aeiou][aeiou][aeiou]\)\1\{0,\}/d'

thx

Ian John Locke II 11-07-2011 07:47 PM

Well for starters, you can use the -r option so you don't have to escape the special characters like you're doing above. That'll cut out I don't know how many backslashes.
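A small sketch of what -r buys you, assuming GNU sed (where -r switches to extended regular expressions and \1 back-references still work inside them). It runs the consonant-pair expression from the first post both ways on a few made-up words, with the stray backslash before the second bracket treated as a typo; the variable names bre and ere are just for illustration.

```shell
#!/bin/sh
# Assumes GNU sed: -r selects extended regular expressions (ERE).
# Basic regex (the default): grouping and intervals need backslashes.
bre='/\([bcdfghjklmnpqrstvxwz][bcdfghjklmnpqrstvxwz]\)\1\{2,\}/d'
# Extended regex (-r): the same pattern with the escaping dropped.
ere='/([bcdfghjklmnpqrstvxwz][bcdfghjklmnpqrstvxwz])\1{2,}/d'

# Both commands delete "ststst" (the pair "st" three times) and keep the rest.
printf '%s\n' ststst stst hello | sed -e "$bre"
printf '%s\n' ststst stst hello | sed -r -e "$ere"
```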

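Second, for the times where you're looking for consonants you can do [^aeiou], which is shorter than [bcdfghjklmnpqrstvxwz]. Just be aware that [^aeiou] matches any character that isn't a lowercase vowel, including y, digits, capitals, spaces and punctuation, so it's only equivalent if your lines contain nothing but lowercase letters other than y.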
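A quick way to see the difference between the negated class and the explicit list: print the lines that contain at least one character from each. The three sample lines are made up.

```shell
#!/bin/sh
# Print the lines that contain at least one character from each class.
# [^aeiou] also matches "y" and "1"; the explicit list (which has no y) does not.
printf '%s\n' aya a1a aba | sed -n '/[^aeiou]/p'                # aya, a1a, aba
printf '%s\n' aya a1a aba | sed -n '/[bcdfghjklmnpqrstvxwz]/p'  # aba only
```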

For the times where you're specifying [d][d][d], for example, you can just use d{3} (with -r; without it you'd write d\{3\}).
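To make the interval syntax concrete, here is a sketch (GNU sed assumed for -r) that deletes lines containing "ddd" three ways. Without -r the braces have to be escaped; in a basic regex GNU sed reads an unescaped d{3} as the literal text "d{3}". The sample words are made up.

```shell
#!/bin/sh
# Three equivalent ways to delete lines containing "ddd"; assumes GNU sed for -r.
printf '%s\n' add addd dadd | sed -e '/[d][d][d]/d'   # bracket per character
printf '%s\n' add addd dadd | sed -e '/d\{3\}/d'      # basic regex interval
printf '%s\n' add addd dadd | sed -r -e '/d{3}/d'     # extended regex interval
# Each prints "add" and "dadd".

# Without -r, GNU sed treats unescaped braces as literal characters:
printf '%s\n' ddd 'd{3}' | sed -n '/d{3}/p'           # prints only "d{3}"
```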

I'm not sure why you're using \1 in certain places when that's usually a back-ref but I'm not extraordinarily familiar with sed.

Also, there's no reason to do
Code:

cat file.txt | sed -e ...
You can just do
Code:

sed -e ... <file.txt
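For completeness, a sketch showing that the cat pipeline, the redirect, and simply passing the file name to sed all give the same output. The temporary file stands in for file.txt, and its contents are made up.

```shell
#!/bin/sh
# A throwaway input file standing in for file.txt.
f=$(mktemp)
printf '%s\n' abc a12 xyz > "$f"

cat "$f" | sed -e '/[0-9]\{2\}/d'   # extra cat process, same result
sed -e '/[0-9]\{2\}/d' < "$f"       # the shell connects the file to sed's stdin
sed -e '/[0-9]\{2\}/d' "$f"         # sed opens the file itself
# All three print "abc" and "xyz".

rm -f "$f"
```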

grail 11-07-2011 11:27 PM

Maybe you should try giving a clue as to what it is you are trying to do?

David the H. 11-08-2011 10:01 AM

Please use [code][/code] tags around your code, to preserve formatting and to improve readability.

And whatever else, a command that long and complex really should be placed into its own script file, complete with clear formatting (one expression per line) and comments, so that it's more easily readable and comprehensible.

Then use sed -f scriptfile to run it.
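A sketch of what that could look like for the filter in the first post, assuming GNU sed (-r for extended regexes, with back-references allowed inside them). The file name filter.sed and the sample words are made up. The trailing \1{0,} repeats are dropped because they can't change which lines get deleted, the twenty per-letter expressions are collapsed into one back-reference, and the stray backslash in the consonant-pair expression is treated as a typo.

```shell
#!/bin/sh
# Hypothetical script file name; one expression per line, each with a comment.
cat > filter.sed <<'EOF'
# two digits in a row
/[0-9]{2}/d
# a run of 44 to 60 consonants
/[bcdfghjklmnpqrstvxwz]{44,60}/d
# the same consonant pair three or more times in a row, e.g. "ststst"
/([bcdfghjklmnpqrstvxwz]{2})\1{2,}/d
# any one consonant three times in a row (bbb, ccc, ... zzz)
/([bcdfghjklmnpqrstvxwz])\1\1/d
# three vowels in a row
/[aeiou]{3}/d
EOF

# Run it with -f; -r is needed because the file uses extended-regex syntax.
printf '%s\n' hello ab12cd ststst abbbc queue keep | sed -r -f filter.sed
# Prints "hello" and "keep".
```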

Finally, how about an example of the input text, so we can see what it's supposed to do, and to test variations?

Edit:
Quote:

Originally Posted by Ian John Locke II
I'm not sure why you're using \1 in certain places when that's usually a back-ref but I'm not extraordinarily familiar with sed.

Yes, those are back-references. They aren't just for the replacement field. It's possible to reuse them inside the original expression too.
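A quick illustration of a back-reference used inside the pattern itself: \1 has to match exactly the text that the first \( \) group matched on that line, not just any text fitting the same sub-pattern. This uses plain basic-regex syntax, so no -r is needed; the sample words are made up.

```shell
#!/bin/sh
# \(.\)\1 : any character immediately followed by the same character.
printf '%s\n' book bake | sed -n '/\(.\)\1/p'                 # prints "book"
# \([0-9][0-9]\)\1 : a two-digit pair followed by that same pair.
printf '%s\n' 8787 8788 | sed -n '/\([0-9][0-9]\)\1/p'        # prints "8787"
# By contrast, [0-9][0-9][0-9][0-9] accepts any four digits.
printf '%s\n' 8787 8788 | sed -n '/[0-9][0-9][0-9][0-9]/p'    # prints both
```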

So if I'm reading it right, the first expression should match any repeating sequence of two digits (87, 8787, 878787, etc.).

But since the back-reference is only required to match zero or more times ( {0,} == * ), and the command simply deletes the line, it seems to me that all that extra is superfluous. Matching any two adjacent digits alone will have the same effect (written here with -r syntax):
Code:

-e '/[0-9]{2}/d'
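To check that claim on some made-up input, this sketch (GNU sed assumed for -r) runs the original first expression and the shortened one side by side; they should delete exactly the same lines.

```shell
#!/bin/sh
# Original expression from the first post (basic regex) vs. the short version (-r).
printf '%s\n' 87 8787 8a7 1x2 | sed -e '/\([0-9][0-9]\)\1\{0,\}/d'
printf '%s\n' 87 8787 8a7 1x2 | sed -r -e '/[0-9]{2}/d'
# Both print "8a7" and "1x2": any line with two adjacent digits is deleted either way.
```

The same reasoning applies to the per-letter and vowel expressions: since \1\{0,\} is allowed to match nothing, /\([b][b][b]\)\1\{0,\}/d deletes exactly the same lines as /bbb/d.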

Edit2: I just realized this is the same guy who was warned recently about posting some kind of cracking question. I think we really need to hear more about the purpose of this code before helping him further.

Ian John Locke II 11-08-2011 01:13 PM

Quote:

Originally Posted by David the H. (Post 4518902)
Yes, those are back-references. They aren't just for the replacement field. It's possible to reuse them inside the original expression too.

Ah, I've never used the /expr/command much. Usually stick to s///. Good to know though. Thanks for the explanation.


All times are GMT -5.