Old 08-15-2011, 03:56 PM   #1
cxny
LQ Newbie
 
Registered: Aug 2011
Posts: 6

Rep: Reputation: Disabled
using sed to trim lines greater than maximum number of characters


Hi all,

I'm new to this, so my knowledge is very limited and I need a lot of help.

I need to cut a single line of text down to 52 characters or fewer.
*All lines have at least 2 chars. in them; there are no blank ones.

I need to only use GNU sed for this because I want to continue the following mess:

Code:
s/[^A-Za-z0-9_ `@#$+-=,.'(){}]//g;s/  */ /g;s/^ *//;s/ *$//
1st part keeps only certain chars., i.e. no *!"<>[], etc.
2nd part changes multiple spaces to just one. (In 's/  */ /g' the pattern is 2 spaces followed by *, and the replacement is 1 space.)
3rd part gets rid of leading spaces.
4th part gets rid of trailing spaces.

Everything works but I'm missing the 5th part to trim the result to 52 chars. or less which must be done last. Or actually, I should probably trim trailing spaces last 'cause I can't have 'em.

btw, if there's a way I can better combine all this stuff, don't hesitate to tell me!

Thanks in advance!

Last edited by cxny; 08-15-2011 at 04:14 PM.
 
Old 08-15-2011, 05:36 PM   #2
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,604

Rep: Reputation: 446
Hi and welcome to LQ,

try this to trim the line:
Code:
sed 's/\(.\{,52\}\).*/\1/'
Let me know if you have trouble incorporating it into the solution you have got so far.
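
For anyone following along, here is a minimal sketch of how that substitution behaves, assuming a POSIX shell and GNU sed. The sample strings and variable names are made up for illustration. Note that `\{,52\}` with the lower bound left out is a GNU extension; the sketch spells it `\{0,52\}`, which should behave the same and is the more portable form.

```shell
# Build a 60-character test line: six repetitions of "0123456789".
long_line=$(printf '0123456789%.0s' 1 2 3 4 5 6)

# Capture at most the first 52 characters; the trailing .* swallows the rest.
trimmed=$(printf '%s\n' "$long_line" | sed 's/\(.\{0,52\}\).*/\1/')

# A line that is already short enough passes through unchanged.
short=$(printf '%s\n' 'short name' | sed 's/\(.\{0,52\}\).*/\1/')

# Show the lengths before and after, then the untouched short line.
printf '%s\n' "${#long_line} -> ${#trimmed}"
printf '%s\n' "$short"
```

The group is greedy, so it always takes as many characters as it can up to the limit.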
 
Old 08-15-2011, 11:07 PM   #3
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,517

Rep: Reputation: 1896
Well, I would probably add that you could compare the exclusion list with your inclusion list and see which is shorter. Also, and yes, I read the part about ONLY sed, it is worth mentioning that awk could handle a few things for you and leave you less to change, namely the handling of multiple spaces and of leading and trailing white space (just a suggestion).
 
Old 08-16-2011, 12:44 AM   #4
cxny
LQ Newbie
 
Registered: Aug 2011
Posts: 6

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by crts
Hi and welcome to LQ,

try this to trim the line:
Code:
sed 's/\(.\{,52\}\).*/\1/'
Let me know if you have trouble incorporating it into the solution you have got so far.
Whoa, wasn't expecting such a quick response. It worked perfectly!

I ended up putting your part 4th as I thought I would have to. So now the meat of it looks like this:
Code:
s/[^A-Za-z0-9_ `',;@#$+-=}{]//g;s/  */ /g;s/^ *//;s/\(.\{,52\}\).*/\1/;s/ *$//
btw, I Googled and searched this site for an answer before posting but found nothing too helpful.
Is there a "real" help manual for sed anywhere, or is this just a Regex thing?

Anyway, thanks a lot!

Last edited by cxny; 08-16-2011 at 01:13 AM.
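
The order of the last two steps matters: cutting at a fixed length can leave a space as the new last character, so the trailing-space removal has to run after the cut. A small sketch of the difference, assuming GNU sed; the limit of 6 and the sample name are made up to keep the example short.

```shell
name='alpha beta gamma'

# Strip trailing spaces first, then cut to 6 characters:
# the cut lands right after "alpha" plus one space, which is left dangling.
wrong=$(printf '%s\n' "$name" | sed 's/ *$//;s/\(.\{0,6\}\).*/\1/')

# Cut first, then strip trailing spaces: the dangling space is removed.
right=$(printf '%s\n' "$name" | sed 's/\(.\{0,6\}\).*/\1/;s/ *$//')

# Brackets make any trailing space visible.
printf '[%s]\n' "$wrong" "$right"
```

The result can be shorter than the limit after the final step, which fits the "52 characters or less" requirement.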
 
Old 08-16-2011, 01:01 AM   #5
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,517

Rep: Reputation: 1896
I would say the bulk of this is probably regex, but the following is a fairly good resource anyway:

http://www.grymoire.com/Unix/Sed.html
 
Old 08-16-2011, 01:13 AM   #6
cxny
LQ Newbie
 
Registered: Aug 2011
Posts: 6

Original Poster
Rep: Reputation: Disabled
@grail:

Thanks for the suggestion but I'm actually inserting this into an existing Windows batch script of all things (using GNU sed for Windows) which is why I couldn't use anything else.

This is used to rename files before they are processed by other existing scripts. Here's part of it:
Code:
...
if exist *.htm (
  :: generate new file names
  for /F "skip=5 tokens=5" %%S in ('dir /X *.htm') do (
    if "%%S" NEQ "free" (
      for /F "delims=" %%L in ('dir /B "%%S"') do (
        for /f "tokens=*" %%N in ('echo "%%~nL" ^| "%~dp0sed.exe" "s/[^A-Za-z0-9_ `',;@#$+-=}{]//g;s/  */ /g;s/^ *//;s/\(.\{,52\}\).*/\1/;s/ *$//"') do (
          ... save the new names ...
        )
      )
    )
  )
  ... rename the files ...
)
...
Ever try renaming files with poison chars. using a Windows batch?!
Don't ask.

Last edited by cxny; 08-16-2011 at 01:19 AM.
 
Old 08-16-2011, 08:21 PM   #7
ShadowCat8
Member
 
Registered: Nov 2004
Location: Arcadia, CA
Distribution: Gentoo, Arch, (RedHat4.x-9.x, FedoraCore 1.x-4.x, Debian Potato-Sarge, LFS 6.0, etc.)
Posts: 209

Rep: Reputation: 43
As a thought, depending on the version of RegEx you have available to you in your environment, for your first section, how about using something like:
Code:
 s/[[:punct:]]//g
to strip all the punctuation characters from the line?

Now, I'm pretty sure that the syntax is correct, but, if it doesn't work in your situation, it should be close enough to give you an idea where to take it to clean up your script a bit more. For RegEx, there are a few POSIX character classes that you can use to help you get to what you are looking for faster:
Code:
[:digit:]	Only the digits 0 to 9
[:alnum:]	Any alphanumeric character 0 to 9 OR A to Z or a to z.
[:alpha:]	Any alpha character A to Z or a to z.
[:blank:]	Space and TAB characters only.
[:xdigit:]	Hexadecimal notation 0-9, A-F, a-f.
[:punct:]	Punctuation symbols . , " ' ? ! ; : # $ % & ( ) * + - / < > = @ [ ] \ ^ _ { } | ~
[:print:]	Any printable character.
[:space:]	Any whitespace character (space, tab, NL, FF, VT, CR). Many systems abbreviate this as \s.
[:graph:]	Any visible character, i.e. printable and not a space.
[:upper:]	Any alpha character A to Z.
[:lower:]	Any alpha character a to z.
[:cntrl:]	Control characters: NUL SOH STX ETX EOT ENQ ACK BEL BS TAB LF VT FF CR SO SI DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC IS4 IS3 IS2 IS1 DEL.
And, as an example, to strip ALL punctuation and whitespace characters, you can use:
Code:
 s/[[:punct:][:space:]]//g
HTH.
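
A quick sketch of the two class-based substitutions above, assuming GNU sed and a POSIX shell; the sample file name is made up.

```shell
sample='Hello, World! (draft #2).htm'

# Remove every punctuation character.
no_punct=$(printf '%s\n' "$sample" | sed 's/[[:punct:]]//g')

# Remove punctuation and whitespace together.
no_punct_space=$(printf '%s\n' "$sample" | sed 's/[[:punct:][:space:]]//g')

printf '%s\n' "$no_punct" "$no_punct_space"
```

Note that `[:punct:]` also removes the dot, comma and other characters that are harmless in file names, so it only fits when nothing of that kind needs to be kept.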
 
Old 08-17-2011, 03:34 PM   #8
cxny
LQ Newbie
 
Registered: Aug 2011
Posts: 6

Original Poster
Rep: Reputation: Disabled
Unfortunately, they wanted to keep as many as possible, so I needed to remove just the 'poison' ones that are invalid for file names in Windows:
Code:
~!/\:?"<>|
And some were removed to not break other scripts, especially: ~!()

but thanks for the tip ShadowCat8, might come in handy someday.
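
One thing worth double-checking in the negated bracket expression used earlier in the thread: inside `[...]`, the sequence `+-=` is read as a range from `+` to `=`. In ASCII that range also covers `,` `-` `.` `/` the digits `:` `;` and `<`, so `/`, `:` and `<` are kept rather than removed. It may never matter for names that come from `dir`, since Windows file names cannot contain those three characters, but here is a sketch of the difference. It assumes GNU sed and the C locale, and the sample string is made up.

```shell
sample='a/b:c<d+e-f=g'

# '+-=' in the middle of the set is a range, so '/', ':' and '<' survive.
as_range=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/[^A-Za-z+-=]//g')

# With '-' moved to the end of the set it is a literal hyphen.
as_literal=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/[^A-Za-z+=-]//g')

printf '%s\n' "$as_range" "$as_literal"
```

Putting the hyphen last (or first, right after the `^`) is the usual way to make it literal.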
 
Old 09-02-2011, 08:15 PM   #9
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 231
Why not remove just the "poison" ones like this:
Code:
sed 's,[~!/\:?"<>|],,g'
(Note: I was taught to use ',' as my std. delimiter)

Consider '+' rather than '*' in removing extra spaces:
Code:
sed -r 's, +, ,g'
If that last did its job, then the following will suffice for leading spaces:
Code:
sed 's,^ ,,'
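
A short sketch of the last two suggestions chained together, assuming GNU sed (for `-r`) and a POSIX shell; the sample text is made up.

```shell
sample='   too    many   spaces'

# -r switches GNU sed to extended regexes, where ' +' means "one or more spaces".
# After squeezing, at most one leading space can remain, so 's,^ ,,' is enough.
squeezed=$(printf '%s\n' "$sample" | sed -r 's, +, ,g;s,^ ,,')

# Brackets make any leftover leading space visible.
printf '[%s]\n' "$squeezed"
```

The comma delimiter is purely a matter of taste; any character that does not appear in the pattern or replacement works.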
 
Old 09-03-2011, 01:19 PM   #10
cxny
LQ Newbie
 
Registered: Aug 2011
Posts: 6

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by archtoad6
Why not remove just the "poison" ones like this:
Code:
sed 's,[~!/\:?"<>|],,g'
(Note: I was taught to use ',' as my std. delimiter)
The problem is that I cannot use the poison chars. within a Windows batch file because they would break the script. For instance, the > would be interpreted as redirection, the | as a pipe, the ! just causes havoc, etc. Also, GNU sed for Windows needs parameters to be in quotes, i.e. sed "s/[^A-Za-z0-9...]", so I couldn't specify the quote symbol either. This is why I had to choose the 'NOT' method instead.

Quote:
Originally Posted by archtoad6
Consider '+' rather than '*' in removing extra spaces:
Code:
sed -r 's, +, ,g'
Being that I wanted it all on one line, I found I couldn't specify the -r option in between commands.

Quote:
Originally Posted by archtoad6
If that last did its job, then the following will suffice for leading spaces:
Code:
sed 's,^ ,,'
Makes sense, you saved me a character!

Thanks for your input, much appreciated.
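
On the `-r` point: the option applies to the whole script, so it cannot be switched on for one command in the middle, and with it the `\(...\)` and `\{...\}` elsewhere in the line would have to lose their backslashes. GNU sed does accept `\+` in its default (basic) mode as an extension, though, so the `+` form can be used without any option. A sketch, assuming GNU sed (other sed implementations may not support `\+`); the sample text is made up.

```shell
sample='one   two  three'

# Basic-regex spelling used in the thread: two spaces followed by '*'.
with_star=$(printf '%s\n' "$sample" | sed 's/  */ /g')

# Same effect with the GNU '\+' extension, still in basic mode.
with_plus=$(printf '%s\n' "$sample" | sed 's/ \+/ /g')

printf '%s\n' "$with_star" "$with_plus"
```

Both forms produce the same output; the `\+` version costs one extra character.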
 
Old 09-03-2011, 01:25 PM   #11
cxny
LQ Newbie
 
Registered: Aug 2011
Posts: 6

Original Poster
Rep: Reputation: Disabled
btw, my final code for this section turned out like this:
Code:
if exist *.htm (
	:: generate new filenames (remove bad chars.,superfluous spaces and limit filename size)
	for /F "skip=5 tokens=5" %%S in ('dir /X *.htm') do (
		if "%%S" NEQ "free" (
			for /F "delims=" %%L in ('dir /B "%%S"') do (
				for /f "tokens=*" %%N in ('echo "%%~nL" ^| "%~dp0sed.exe" "s/[^A-Za-z0-9_ `',;@#$+-=}{]//g;s/  */ /g;s/^ //;s/\(.\{,52\}\).*/\1/;s/ *$//"') do (
					CALL :RENAME_Files "%%S" "%%L" "%%N"
				)
			)
		)
	)
)

Last edited by cxny; 09-03-2011 at 01:27 PM.
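
For trying the sed part outside the batch file, here is a sketch that wraps the same five steps in a shell function. It assumes a POSIX shell and GNU sed. `clean_name` is a hypothetical helper name, the sample input is made up, and the length step is spelled `\{0,52\}` instead of `\{,52\}`. Inside the double quotes the backtick and the `$` signs are backslash-escaped for the shell only; sed receives the same characters as in the batch version.

```shell
# clean_name: hypothetical helper that applies the thread's five steps to one name.
#   1. drop every character outside the allowed set
#   2. squeeze runs of spaces to a single space
#   3. drop the one leading space that may remain
#   4. cut to at most 52 characters
#   5. drop trailing spaces (last, because the cut can expose one)
clean_name() {
  printf '%s\n' "$1" |
    sed "s/[^A-Za-z0-9_ \`',;@#\$+-=}{]//g;s/  */ /g;s/^ //;s/\(.\{0,52\}\).*/\1/;s/ *\$//"
}

clean_name '  My *File*   [v2]!  '
```

Feeding it a handful of awkward names this way is a quick check before running the real rename.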
 
  

