LinuxQuestions.org


sorrymouse 10-30-2011 08:30 PM

using sed to remove all characters on a line except the first
 
I have a file that is structured as follows:

@HW367
TCTGTCTGATC
+HW367
########
@HW785
GCGCTGCG
+HW785
##%##:DDD

etc. Line lengths vary, and the lines contain many special characters. Each @HW or +HW is followed by a unique ID number; the IDs come in pairs, so each number has one @ entry and one + entry. I need to remove everything on the + line except the +. I have been trying to modify a sed script:

sed '/+/,/\n/ s/*//'

to do this, which I adapted from this one:

sed '/start/,/stop/ s/#.*//'

But I don't really know what I am doing. Any ideas would be really appreciated.

Thank you!

Juako 10-30-2011 08:59 PM

Code:

sed -r 's/^(.).*$/\1/g'

The general form is:

s/what-to-match/what-to-replace-it-with/g

s = substitute (replace)
g = "global" flag: replace every match on the line, not just the first

what to match:
^ = beginning of line
(.) = any single character. Surround it in parentheses to refer to it later as backreference #1 (the \1)
.* = any characters following
$ = end of line

what to replace it with:
\1 = the text captured by (.) in the match part of the s command
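
A quick way to see what this expression does is to pipe a few lines through it. This is only a sketch: the helper name keep_first_char and the sample lines are made up for illustration, and -E is used here as the equivalent spelling of -r that both GNU and BSD sed accept. The g flag is left off because it makes no difference for this pattern.

```shell
# Hypothetical helper: reduce every input line to its first character.
keep_first_char() {
    # ^(.) captures the first character, .*$ consumes the rest of the line,
    # and \1 puts back only the captured character.
    sed -E 's/^(.).*$/\1/'
}

# Sample lines, invented for this demonstration.
printf '%s\n' '@HW367' 'TCTGTCTGATC' '+HW367' | keep_first_char
```

With that input it should print @, T and +, one per line. Note that it touches every line, not only the "+" ones.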

I used the -r switch (extended regular expressions) mostly so I don't have to escape the parentheses; otherwise it would have looked like this:

Code:

sed 's/^\(.\).*$/\1/g'

Edit: Sorry, I just saw that you only need to do this on lines beginning with a "+". This will do:

Code:

sed -r '/\+/s/^(.).*$/\1/g'

Of course, since you already know the replacement will always be a "+", you could hard-code it too:

Code:

sed -r 's/^\+.*$/+/g'
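
To check the result against the sample from the first post, something like the following can be used. It is a sketch under a few assumptions: the function name strip_plus_headers is invented here, -E is used as the portable spelling of -r, and the address is anchored as /^\+/ so that it selects only lines that begin with "+". An unanchored /\+/ address selects any line that contains a "+", so combined with s/^(.).*$/\1/ it would also cut a quality line such as ##+## down to its first character.

```shell
# Hypothetical helper: replace every line that starts with "+" by a bare "+".
strip_plus_headers() {
    # /^\+/ restricts the substitution to lines beginning with a literal "+";
    # s/.*/+/ then replaces the whole line with a single "+".
    sed -E '/^\+/s/.*/+/'
}

# The sample records from the original question.
printf '%s\n' '@HW367' 'TCTGTCTGATC' '+HW367' '########' \
              '@HW785' 'GCGCTGCG' '+HW785' '##%##:DDD' | strip_plus_headers
```

A quality line that itself begins with "+" would still be rewritten by this. If that can occur in the data, selecting lines by position (the third line of every four-line record) may be the safer approach.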

crts 10-31-2011 04:36 AM

Quote:

Originally Posted by Juako (Post 4512170)
Code:

sed -r '/\+/s/^(.).*$/\1/g'

Hi,

that is a good idea. But you really do not need the 'g' flag at the end here. Since you only want to keep the first character there is no need for 'global' repetition of the 's' command.

Juako 10-31-2011 08:10 AM

Quote:

Originally Posted by crts (Post 4512336)
Hi,

that is a good idea. But you really do not need the 'g' flag at the end here. Since you only want to keep the first character there is no need for 'global' repetition of the 's' command.

You're right that "global" isn't necessary here. But since I end up using it in most cases, I tend to leave it out only when it affects the result (in this case I would have omitted it if the OP had wanted to, say, transform just the first occurrence of a "+" and leave the rest of the line alone). Since the command here stops after the first match anyway, cutting the line down to a "+", the flag doesn't change anything in practice, so I went with my usual pattern.
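
The point about the g flag can be checked directly. A minimal sketch, with made-up sample lines and variable names:

```shell
# Two sample "+" lines, invented for this check.
input=$(printf '%s\n' '+HW367' '+HW785')

# The pattern is anchored with ^ and $, so it can match at most once per line;
# the g flag has nothing extra to replace.
with_g=$(printf '%s\n' "$input" | sed -E 's/^\+.*$/+/g')
without_g=$(printf '%s\n' "$input" | sed -E 's/^\+.*$/+/')
[ "$with_g" = "$without_g" ] && echo "identical output"

# For contrast, an unanchored pattern where g does matter.
first_only=$(echo 'a+b+c' | sed 's/+/-/')     # only the first "+" is replaced
all_matches=$(echo 'a+b+c' | sed 's/+/-/g')   # every "+" is replaced
echo "$first_only $all_matches"
```

The last line should print a-b+c a-b-c, which is the kind of case where leaving g on or off changes the result.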

sorrymouse 10-31-2011 10:33 AM

Thank you!
 
This seems to be doing the trick. I appreciate the effort this community puts into helping people like me out!


All times are GMT -5.