SED script with conditional filtering
Can anyone write a shorter sed script that does the same thing as the following?
cat file.txt | sed -e '/\([0-9][0-9]\)\1\{0,\}/d' -e '/\([bcdfghjklmnpqrstvxwz]\)\{44,60\}/d' -e '/\([bcdfghjklmnpqrstvxwz]\[bcdfghjklmnpqrstvxwz]\)\1\{2,\}/d' -e '/\([b][b][b]\)\1\{0,\}/d' -e '/\([c][c][c]\)\1\{0,\}/d' -e '/\([d][d][d]\)\1\{0,\}/d' -e '/\([f][f][f]\)\1\{0,\}/d' -e '/\([g][g][g]\)\1\{0,\}/d' -e '/\([h][h][h]\)\1\{0,\}/d' -e '/\([j][j][j]\)\1\{0,\}/d' -e '/\([k][k][k]\)\1\{0,\}/d' -e '/\([l][l][l]\)\1\{0,\}/d' -e '/\([l][l][l]\)\1\{0,\}/d' -e '/\([m][m][m]\)\1\{0,\}/d' -e '/\([n][n][n]\)\1\{0,\}/d' -e '/\([p][p][p]\)\1\{0,\}/d' -e '/\([q][q][q]\)\1\{0,\}/d' -e '/\([r][r][r]\)\1\{0,\}/d' -e '/\([s][s][s]\)\1\{0,\}/d' -e '/\([t][t][t]\)\1\{0,\}/d' -e '/\([v][v][v]\)\1\{0,\}/d' -e '/\([x][x][x]\)\1\{0,\}/d' -e '/\([w][w][w]\)\1\{0,\}/d' -e '/\([z][z][z]\)\1\{0,\}/d' -e '/\([aeiou][aeiou][aeiou]\)\1\{0,\}/d'
Thanks.
Well, for starters, you can use the -r option (GNU sed; -E also works) to switch to extended regular expressions, so you don't have to escape the braces and parentheses the way you're doing above. That alone cuts out a lot of backslashes.
Second, where you're looking for consonants you can use [^aeiou], which is shorter than [bcdfghjklmnpqrstvxwz]. Be aware that it is not an exact replacement: it matches any character that is not one of those five vowels, including digits, spaces, punctuation, uppercase letters, and y. Where you're specifying [d][d][d], for example, you can just use d{3}. I'm not sure why you're using \1 in certain places, since that's a back-reference, but I'm not extraordinarily familiar with sed. Also, there's no reason to do
Code:
cat file.txt | sed -e ...
when you can do
Code:
sed -e ... <file.txt
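A minimal sketch of how those suggestions might combine, assuming GNU sed (where -E is equivalent to -r and back-references are accepted in extended mode). The sample words and the temporary file are made up for illustration, since no input was posted, and the original's third expression is left out because its intent is unclear.

```shell
#!/bin/sh
# Hypothetical sample input; the original poster did not provide any.
tmp=$(mktemp)
printf '%s\n' 'hello' 'ab12cd' 'bookkkeeper' 'queue' 'strength' > "$tmp"

# -E turns on extended regexes, so { } and ( ) need no backslashes.
# The file is passed as an argument instead of being piped through cat.
# Expressions, in order: two digits in a row, 44 or more consonants in a
# row, the same consonant three times in a row, three vowels in a row.
result=$(sed -E \
    -e '/[0-9]{2}/d' \
    -e '/[bcdfghjklmnpqrstvwxz]{44}/d' \
    -e '/([bcdfghjklmnpqrstvwxz])\1\1/d' \
    -e '/[aeiou]{3}/d' \
    "$tmp")

rm -f "$tmp"
printf '%s\n' "$result"
```

The single back-reference expression stands in for the twenty per-letter ones such as [b][b][b]. The consonant list is spelled out there rather than written as [^aeiou], so that a run such as three spaces or "---" is not treated as a repeated consonant.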
Maybe you should try giving a clue as to what it is you are trying to do?
Please use [code][/code] tags around your code, to preserve formatting and to improve readability.
And whatever else, a command that long and complex really should be placed into its own script file, with clear formatting (one expression per line) and comments, so that it's easier to read and understand. Then use sed -f scriptfile to run it.
Finally, how about an example of the input text, so we can see what it's supposed to do and test variations?
Edit:
So if I'm reading it right, the first expression should match any repeating sequence of two digits (87, 8787, 878787, etc). But since the repeat is told to match zero or more times ( {0,} == * ) and the command simply deletes the line, all that extra is superfluous. Matching the first two digits alone achieves the same effect (with -r, so the braces need no backslashes).
Code:
-e '/[0-9]{2}/d'
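As a rough illustration of the script-file suggestion, the sketch below writes a commented sed script, runs it with -f, and then checks that the long and short forms of the first expression delete the same lines. It assumes GNU sed; the file names come from mktemp and the sample lines are invented for the test.

```shell
#!/bin/sh
# Hypothetical files; mktemp keeps the sketch self-contained.
script=$(mktemp)
input=$(mktemp)

# One expression per line, each preceded by a comment line.
cat > "$script" <<'EOF'
# two digits in a row
/[0-9]{2}/d
# the same consonant three times in a row
/([bcdfghjklmnpqrstvwxz])\1\1/d
# three vowels in a row
/[aeiou]{3}/d
EOF

printf '%s\n' 'plain' 'x87y' 'zzzap' 'aeon' 'a8b7' > "$input"

# Run the script file; -E is needed because the braces are unescaped.
kept=$(sed -E -f "$script" "$input")

# The original first expression and the shorter form, both written as
# basic regexes, should delete exactly the same lines.
long_form=$(sed -e '/\([0-9][0-9]\)\1\{0,\}/d' "$input")
short_form=$(sed -e '/[0-9]\{2\}/d' "$input")

rm -f "$script" "$input"
printf '%s\n' "$kept"
```

Keeping the expressions in a file also makes it easy to comment one out while testing against real input.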