LinuxQuestions.org


Stevy12 03-14-2011 10:56 PM

Remove lines with sed
 
Hi

I have a large file and need to remove every line that contains one or more symbols.

For example: . , ! " # $ % & / ( ) = ? ¡ ¿ ' ´ + * ¨ { } ] [ - _ : ; , > < (maybe more)

Thanks in advance!

GlennsPref 03-14-2011 11:39 PM

Quote:

Hi, Welcome to LQ!

LQ has a fantastic search function that may save you time waiting for an answer to a popular question.

With over 4 million posts to search it's possible the answer has been given.
:)

Some tutorials here from IBM.

The first one may be the one.
What you want to do is remove all chars except A-Z, a-z and 0-9.
The regexp would look like [^A-Za-z0-9]

http://www.ibm.com/developerworks/li...ry/l-sed2.html

http://www.ibm.com/developerworks/vi...for+a+new+user

http://www.regular-expressions.info

http://sed.sourceforge.net/sed1line.txt

Regards Glenn

Stevy12 03-15-2011 12:10 AM

Hi GlennsPref :)

I don't want to remove chars, I want to remove the whole lines containing one or more symbols.
I have already read about sed on some Spanish sites. By the way, I can't understand English tutorials, so it's a bit hard to learn from these sites.

Anyway, I will take a look.

GlennsPref 03-15-2011 12:34 AM

Using a regexp that matches lines made up only of [A-Za-z0-9]:

Code:

# print only lines which match regular expression (emulates "grep")
 sed -n '/regexp/p'          # method 1
 sed '/regexp/!d'            # method 2

and redirect the output to a file.

Code:

sed -n '/^[A-Za-z0-9]*$/p' inputfile > ~/filename
Lines containing any other character (symbols, but also spaces) do not appear in the output.

Or the other way 'round.
Code:

# print only lines which do NOT match regexp (emulates "grep -v")
 sed -n '/regexp/!p'          # method 1, corresponds to above
 sed '/regexp/d'              # method 2, simpler syntax

This is the theory of it anyway. (awk, grep, sed and vi)
Hope that helps, you'll have to experiment.
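
To check that the two "grep -v" methods above really give the same result, here is a minimal sketch. The three sample lines and the pattern "foo" are made up for illustration; they are not from the OP's file.

```shell
# Hypothetical sample input: three lines, one of which contains "foo".
sample=$(printf 'alpha\nfoo bar\nbeta\n')

# Method 1: -n suppresses automatic printing, so only the lines that
# do NOT match /foo/ are printed explicitly by '!p'.
out1=$(printf '%s\n' "$sample" | sed -n '/foo/!p')

# Method 2: delete the matching lines and let sed print the rest.
out2=$(printf '%s\n' "$sample" | sed '/foo/d')

printf 'method 1:\n%s\n' "$out1"
printf 'method 2:\n%s\n' "$out2"
```

Both variables should end up holding the same two lines, which is why the shorter 'd' form is usually preferred.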

Cheers Glenn

grail 03-15-2011 01:42 AM

The 'd' command is indeed the easier option, and the easier regex is a character class:
Code:

sed '/[^[:alnum:]]/d' file
Include the '-i' option to make the change in the file itself, or redirect the output if a separate file is required.
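
As a rough sketch of the redirect variant, assuming a small invented sample in place of the real file (the temporary files and their contents are hypothetical):

```shell
# Hypothetical sample data standing in for the large file.
src=$(mktemp)
dst=$(mktemp)
printf 'abc123\nkey=value\nplainword\nwhat?\n' > "$src"

# Delete every line holding at least one non-alphanumeric character and
# write the surviving lines to a separate file; "$src" is left unchanged.
sed '/[^[:alnum:]]/d' "$src" > "$dst"

kept=$(cat "$dst")
printf '%s\n' "$kept"

# Clean up the temporary files.
rm -f "$src" "$dst"
```

Redirecting to a second file keeps the original around, which is handy while you are still experimenting with the pattern.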

GlennsPref 03-15-2011 04:34 AM

Hmm, I got this, still looking for something more succinct.

Code:

bash-4.1$ sed -e '/[^A-Z][^a-z][^0-9]/d' -e '/:/d; / *#/d; /^ *$/d' /home/glenn/build/scripting/filename1
checking sed to remove lines with symbols 0123456789
 sed G
bash-4.1$

input file, /home/glenn/build/scripting/filename1

Code:

http://sed.sourceforge.net/sed1line.txt
-------------------------------------------------------------------------
USEFUL ONE-LINE SCRIPTS FOR SED (Unix stream editor)        Dec. 29, 2005
Compiled by Eric Pement - pemente[at]northpark[dot]edu        version 5.5

Latest version of this file (in English) is usually at:
  http://sed.sourceforge.net/sed1line.txt
  http://www.pement.org/sed/sed1line.txt

This file will also available in other languages:
  Chinese    - http://sed.sourceforge.net/sed1line_zh-CN.html
  Czech      - http://sed.sourceforge.net/sed1line_cz.html
  Dutch      - http://sed.sourceforge.net/sed1line_nl.html
  French      - http://sed.sourceforge.net/sed1line_fr.html
  German      - http://sed.sourceforge.net/sed1line_de.html
  Italian    - (pending)
  Portuguese  - http://sed.sourceforge.net/sed1line_pt-BR.html
  Spanish    - (pending)

#ref. http://www.ibm.com/developerworks/linux/library/l-sed2.html
#sed script that will remove HTML tags from a file
sed -e 's/<[^>]*>//g' myfile.html
checking sed to remove lines with symbols 0123456789
# Rem blank lines and # comments

# Use following sed magic to remove both comments and empty lines at the same expense:

sed '/ *#/d; /^ *$/d'

#SED processes whatever you give it, and displays it on "STDOUT"---by default, your terminal window. It does not change filenames---that is done with the "mv" command.

#why "ls -d" ?

#I think you need something like this:
for filename in *; do newname= $(sed 's/+//g' $filename); mv $filename $newname; done

To drill down in the directory tree, use "$(ls -R) instead of "*"

sed -e '/[^.][^,][^!][^"][^#][^$][^%][^&][^/][^(][^)][^=][^?][^¡][^¿][^'][^´][^+][^*][^¨][^{][^}][^]][^[][^-][^_][^:][^]][:blank:][:alnum:]/d' /home/glenn/filename1
sed s -e '/[^\.][^\,][^\!][^\"][^\#][^\$][^\%][^\&][^\/][^\(][^\)][^\=][^\?][^\¡][^\¿][^\'][^\´][^\+][^\*][^\¨][^\{][^\}][^\]][^\[][^\-][^\_][^\:][^\]][:blank:][:alnum:]/d' /home/glenn/filename1
sed -e '/[[:blank:]][[:alnum:]]/d' /home/glenn/filename1
cat /home/glenn/filename1 | sed -d '/#\.\*\[\]\\\/\$\^\-\_\?/d'
cat /home/glenn/filename1 | sed -e '/#\*\[\]\\/d'
cat /home/glenn/filename1 | sed -e '/#\.\*\[\]\\\/\$\^\-\_\?/d'



FILE SPACING:

 # double space a file
 sed G

 # double space a file which already has blank lines in it. Output file
 # should contain no more than one blank line between lines of text.
 sed '/^$/d;G'

 # triple space a file
 sed 'G;G'

 # undo double-spacing (assumes even-numbered lines are always blank)
 sed 'n;d'

 # insert a blank line above every line which matches "regex"
 sed '/regex/{x;p;x;}'

 # insert a blank line below every line which matches "regex"
 sed '/regex/G'

 # insert a blank line above and below every line which matches "regex"
 sed '/regex/{x;p;x;G;}'

NUMBERING:

 # number each line of a file (simple left alignment). Using a tab (see
 # note on '\t' at end of file) instead of space will preserve margins.
 sed = filename | sed 'N;s/\n/\t/'

 # number each line of a file (number on left, right-aligned)
 sed = filename | sed 'N; s/^/    /; s/ *\(.\{6,\}\)\n/\1  /'

 # number each line of file, but only print numbers if line is not blank
 sed '/./=' filename | sed '/./N; s/\n/ /'

 # count lines (emulates "wc -l")
 sed -n '$='

TEXT CONVERSION AND SUBSTITUTION:

 # IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format.
 sed 's/.$//'              # assumes that all lines end with CR/LF
 sed 's/^M$//'              # in bash/tcsh, press Ctrl-V then Ctrl-M
 sed 's/\x0D$//'            # works on ssed, gsed 3.02.80 or higher

 # IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format.
 sed "s/$/`echo -e \\\r`/"            # command line under ksh
 sed 's/$'"/`echo \\\r`/"            # command line under bash
 sed "s/$/`echo \\\r`/"              # command line under zsh
 sed 's/$/\r/'                        # gsed 3.02.80 or higher

 # IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format.
 sed "s/$//"                          # method 1
 sed -n p                            # method 2

 # IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format.
 # Can only be done with UnxUtils sed, version 4.0.7 or higher. The
 # UnxUtils version can be identified by the custom "--text" switch
 # which appears when you use the "--help" switch. Otherwise, changing
 # DOS newlines to Unix newlines cannot be done with sed in a DOS
 # environment. Use "tr" instead.
 sed "s/\r//" infile >outfile        # UnxUtils sed v4.0.7 or higher
 tr -d \r <infile >outfile            # GNU tr version 1.22 or higher

 # delete leading whitespace (spaces, tabs) from front of each line
 # aligns all text flush left
 sed 's/^[ \t]*//'                    # see note on '\t' at end of file

 # delete trailing whitespace (spaces, tabs) from end of each line
 sed 's/[ \t]*$//'                    # see note on '\t' at end of file

 # delete BOTH leading and trailing whitespace from each line
 sed 's/^[ \t]*//;s/[ \t]*$//'

 # insert 5 blank spaces at beginning of each line (make page offset)
 sed 's/^/    /'

 # align all text flush right on a 79-column width
 sed -e :a -e 's/^.\{1,78\}$/ &/;ta'  # set at 78 plus 1 space

 # center all text in the middle of 79-column width. In method 1,
 # spaces at the beginning of the line are significant, and trailing
 # spaces are appended at the end of the line. In method 2, spaces at
 # the beginning of the line are discarded in centering the line, and
 # no trailing spaces appear at the end of lines.
 sed  -e :a -e 's/^.\{1,77\}$/ & /;ta'                    # method 1
 sed  -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/'  # method 2

 # substitute (find and replace) "foo" with "bar" on each line
 sed 's/foo/bar/'            # replaces only 1st instance in a line
 sed 's/foo/bar/4'            # replaces only 4th instance in a line
 sed 's/foo/bar/g'            # replaces ALL instances in a line
 sed 's/\(.*\)foo\(.*foo\)/\1bar\2/' # replace the next-to-last case
 sed 's/\(.*\)foo/\1bar/'            # replace only the last case

 # substitute "foo" with "bar" ONLY for lines which contain "baz"
 sed '/baz/s/foo/bar/g'

 # substitute "foo" with "bar" EXCEPT for lines which contain "baz"
 sed '/baz/!s/foo/bar/g'

 # change "scarlet" or "ruby" or "puce" to "red"
 sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g'  # most seds
 gsed 's/scarlet\|ruby\|puce/red/g'                # GNU sed only

 # reverse order of lines (emulates "tac")
 # bug/feature in HHsed v1.5 causes blank lines to be deleted
 sed '1!G;h;$!d'              # method 1
 sed -n '1!G;h;$p'            # method 2

 # reverse each character on the line (emulates "rev")
 sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'

 # join pairs of lines side-by-side (like "paste")
 sed '$!N;s/\n/ /'

 # if a line ends with a backslash, append the next line to it
 sed -e :a -e '/\\$/N; s/\\\n//; ta'

Will that do?

@Grail.
On my system that command erased everything except blank lines.

I'm still learning these tools. Hope this wasn't homework!!!

Regards Glenn

colucix 03-15-2011 04:48 AM

Quote:

Originally Posted by GlennsPref (Post 4291268)
On my system that command erased everything except blank lines.

Maybe we have to retain lines with blank spaces (or tabs) among the words:
Code:

sed '/[^[:alnum:][:space:]]/d' filename
;)
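
A small comparison of the two bracket expressions, as a sketch only. The sample lines are invented, and the variable names are just for illustration.

```shell
# Hypothetical sample: a multi-word line, a line with punctuation,
# an empty line and a single word.
sample=$(printf 'two words\nhello, world\n\nsingle\n')

# Without [:space:] a blank between words counts as a "symbol",
# so the multi-word line is deleted as well.
without_space=$(printf '%s\n' "$sample" | sed '/[^[:alnum:]]/d')

# With [:space:] inside the negated class, spaces and tabs are allowed
# and only lines with real punctuation are deleted.
with_space=$(printf '%s\n' "$sample" | sed '/[^[:alnum:][:space:]]/d')

printf 'without [:space:]:\n%s\n' "$without_space"
printf 'with [:space:]:\n%s\n' "$with_space"
```

In both cases the empty line survives, because an empty line contains no character that could match the negated class.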

GlennsPref 03-15-2011 05:39 AM

@colucix, That one works well.

Quote:

sed '/[^[:alnum:][:space:]]/d' filename
But it left the blank lines behind too.

I'm having trouble appending another regexp to remove leading spaces.

Cheers Glenn

jschiwal 03-15-2011 06:29 AM

You can also use the -v option in grep to exclude lines matching the patterns, leaving the rest of the lines.
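
A minimal sketch of the grep approach, reusing the bracket expression posted earlier in the thread. The sample lines are made up.

```shell
# Hypothetical sample input.
sample=$(printf 'two words\nhello, world\nkey=value\nsingle\n')

# grep -v prints only the lines that do NOT match the pattern.
with_grep=$(printf '%s\n' "$sample" | grep -v '[^[:alnum:][:space:]]')

# The sed equivalent, for comparison.
with_sed=$(printf '%s\n' "$sample" | sed '/[^[:alnum:][:space:]]/d')

printf '%s\n' "$with_grep"
```

Both commands should leave the same lines, so it mostly comes down to which tool you find easier to read.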

colucix 03-15-2011 06:32 AM

You can put together multiple expressions using multiple -e options. For example:
Code:

sed -e '/[^[:alnum:][:space:]]/d' -e 's/^[[:space:]]*//' -e '/^$/d' filename
The second one removes leading spaces, if any. The third one removes empty lines. Since a line may contain only spaces, it is better to keep the expressions in this order: first the spaces are removed, then the resulting empty line is deleted.
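
To see why the order matters, here is a sketch with an invented sample that includes a spaces-only line. The second command swaps the last two expressions purely for comparison.

```shell
# Hypothetical sample: a normal line, a spaces-only line,
# an indented line and a line with a symbol.
sample=$(printf 'keep me\n   \n  indented\nno!way\n')

# Order from the post above: strip leading spaces first, then delete
# the lines that have become empty.
cleaned=$(printf '%s\n' "$sample" |
  sed -e '/[^[:alnum:][:space:]]/d' -e 's/^[[:space:]]*//' -e '/^$/d')

# Swapped order: the spaces-only line is not yet empty when /^$/d runs,
# so it survives as an empty line after the substitution.
swapped=$(printf '%s\n' "$sample" |
  sed -e '/[^[:alnum:][:space:]]/d' -e '/^$/d' -e 's/^[[:space:]]*//')

printf 'original order:\n%s\n' "$cleaned"
printf 'swapped order:\n%s\n' "$swapped"
```

With the original order the spaces-only line disappears; with the swapped order an empty line is left in the output.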

GlennsPref 03-15-2011 07:54 AM

Beautiful! colucix.

I hope the OP likes it.

Glenn

Animal X 03-15-2011 08:13 AM

Does it have to be sed? grep -v followed by the regex for your symbols will give you what you want.

Edit: ah, just saw that someone already mentioned it.

Stevy12 03-16-2011 12:09 PM

Thanks everyone, this helped me so much :)

grail 03-16-2011 07:10 PM

Glad you got a solution. Please mark as SOLVED once you are satisfied.


All times are GMT -5.