using sed to trim lines greater than maximum number of characters
Hi all,
I'm new to this, so my knowledge is very limited and I need a lot of help. I need to trim a single line of text to 52 characters or fewer. (All lines have at least 2 characters in them; there are no blank ones.) I can only use GNU sed for this because I want to extend the following mess:
Code:
s/[^A-Za-z0-9_ `@#$+-=,.'(){}]//g;s/  */ /g;s/^ *//;s/ *$//
The 2nd part changes multiple spaces to just one. (The 's/  */ /g' part has 2 spaces followed by 1.) The 3rd part gets rid of leading spaces. The 4th part gets rid of trailing spaces. Everything works, but I'm missing the 5th part to trim the result to 52 characters or fewer, which must be done last. Or actually, I should probably trim trailing spaces last, because I can't have them. By the way, if there's a way I can better combine all this stuff, don't hesitate to tell me! Thanks in advance!
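One thing that may be worth checking in the keep-list above: inside a bracket expression, a `-` between two characters forms a range, so `+-=` is likely read as "everything from `+` to `=`" rather than three literal characters. In ASCII that span includes `/`, `:`, `;` and `<`. A minimal sketch, assuming a POSIX shell and the C locale (forced here with `LC_ALL=C`); the function names and the shortened character list are made up for illustration:

```shell
# Hypothetical shortened keep-list with '-' in the middle: '+-=' acts as a range.
keep_with_range() { LC_ALL=C sed 's/[^A-Za-z0-9_ +-=]//g'; }

# Same list with '-' moved to the end, where it is a literal hyphen.
keep_literal() { LC_ALL=C sed 's/[^A-Za-z0-9_ +=-]//g'; }

printf '%s\n' 'a/b:c<d-e' | keep_with_range   # slash, colon and '<' survive
printf '%s\n' 'a/b:c<d-e' | keep_literal      # only letters and '-' survive
```

If the intent is to keep a literal hyphen, putting it first or last in the list avoids the ambiguity.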
Hi and welcome to LQ,
Try this to trim the line:
Code:
sed 's/\(.\{,52\}\).*/\1/'
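To see what the trim does on both long and short lines, here is a small sketch. It assumes a POSIX shell; `trim52` is a hypothetical helper name, and it uses `\{0,52\}`, which as far as I know is the portable spelling of the GNU-style `\{,52\}`:

```shell
# Hypothetical helper: keep at most the first 52 characters of each line.
trim52() { sed 's/^\(.\{0,52\}\).*/\1/'; }

long_line=$(printf '%060d' 0)                    # a 60-character line of zeros
trimmed=$(printf '%s\n' "$long_line" | trim52)
echo "${#trimmed}"                               # 52

printf '%s\n' 'short line' | trim52              # unchanged: short line
```

Lines already at or under the limit pass through untouched, because the capture group matches the whole line and `.*` matches nothing.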
Well, I would probably add that you could compare the exclusion list with your inclusion list and see which is shorter. Also, and yes, I read the part about ONLY sed, but it's worth mentioning that awk could handle a few things for you to give you less to change, namely the handling of multiple spaces and of leading and trailing white space (just a suggestion).
I ended up putting your part 4th, as I thought I would have to. So now the meat of it looks like this:
Code:
s/[^A-Za-z0-9_ `',;@#$+-=}{]//g;s/  */ /g;s/^ *//;s/\(.\{,52\}\).*/\1/;s/ *$//
Is there a "real" help manual for sed anywhere, or is this just a regex thing? Anyway, thanks a lot!
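The reason the trim has to come before the trailing-space cleanup can be shown with a line whose 52nd character is a space. This is only a sketch, assuming a POSIX shell; the two function names and the sample line are invented for the demonstration:

```shell
# Same two steps in both orders; only the order differs.
trim_then_strip() { sed 's/^\(.\{0,52\}\).*/\1/;s/ *$//'; }
strip_then_trim() { sed 's/ *$//;s/^\(.\{0,52\}\).*/\1/'; }

# 51 zeros, one space, then more text: the cut at 52 lands right after the space.
sample=$(printf '%051d tail' 0)

printf '%s\n' "$sample" | trim_then_strip | awk '{ print length($0) }'   # 51
printf '%s\n' "$sample" | strip_then_trim | awk '{ print length($0) }'   # 52
```

In the second ordering the cut exposes a new trailing space that nothing removes afterwards, which is exactly what the file names can't have.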
I would say the bulk of this is probably regex, but the following is a fairly good resource anyway:
http://www.grymoire.com/Unix/Sed.html
@grail:
Thanks for the suggestion, but I'm actually inserting this into an existing Windows batch script of all things (using GNU sed for Windows), which is why I couldn't use anything else. It is used to rename files before they are processed by other existing scripts. Here's part of it:
Code:
...
Don't ask. :p
As a thought, depending on the version of RegEx you have available to you in your environment, for your first section, how about using something like:
Code:
s/[[:punct:]]//g
Now, I'm pretty sure that the syntax is correct, but if it doesn't work in your situation, it should be close enough to give you an idea of where to take it to clean up your script a bit more. For RegEx, there are a few POSIX character classes that you can use to help you get to what you are looking for faster:
Code:
[:digit:] Only the digits 0 to 9
Code:
s/[[:punct:][:space:]]//g
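For a feel of how much these classes remove, here is a quick sketch. It assumes a POSIX shell and the C locale; the file name and function names are invented purely for illustration:

```shell
# Hypothetical file name used only for illustration.
name='my_file-v2.0 (final).htm'

strip_punct()       { LC_ALL=C sed 's/[[:punct:]]//g'; }
strip_punct_space() { LC_ALL=C sed 's/[[:punct:][:space:]]//g'; }

printf '%s\n' "$name" | strip_punct         # myfilev20 finalhtm
printf '%s\n' "$name" | strip_punct_space   # myfilev20finalhtm
```

Note that `[:punct:]` also takes out the underscore, hyphen, dot and parentheses, so it is much more aggressive than a hand-picked list.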
Unfortunately, they wanted to keep as many as possible, so I needed to remove just the 'poison' ones that are invalid for file names in Windows:
Code:
~!/\:?"<>|
But thanks for the tip, ShadowCat8; it might come in handy someday.
Why not remove just the "poison" ones like this:
Code:
sed 's,[~!/\:?"<>|],,g'
Consider '+' rather than '*' in removing extra spaces:
Code:
sed -r 's, +, ,g'
Code:
sed 's,^ ,,'
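A quick way to try the first two of these on sample input is sketched below. It assumes a POSIX shell and a sed that accepts `-E` (GNU sed treats it as a synonym for `-r`); the function names and sample strings are hypothetical:

```shell
# Remove the "poison" characters, using ',' as the s-command delimiter.
strip_poison()   { sed 's,[~!/\:?"<>|],,g'; }

# Collapse each run of one or more spaces into a single space.
squeeze_spaces() { sed -E 's, +, ,g'; }

printf '%s\n' 'what?/is:"this"<now>|~!' | strip_poison   # whatisthisnow
printf '%s\n' 'a   b  c' | squeeze_spaces                # a b c
```

Using `,` as the delimiter means the `/` inside the bracket list needs no escaping.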
Thanks for your input, much appreciated.
btw, my final code for this section turned out like this:
Code:
if exist *.htm (