LinuxQuestions.org


sumeet inani 10-05-2011 08:52 AM

replace 8 times successive spaces by |
 
1 Attachment(s)
Hi,
I am trying to make a GUI for cmdow.
The log is attached.
My aim is to separate the columns with |, which the 'autoit' software understands.
The problem is that I have to replace each run of one or more consecutive spaces with a single |. The window title (the last column) may itself contain spaces, and those must be left alone.
Is there a way, from the command line, to replace ' \+' with '|' only for the first 8 occurrences on each line?

Thank You.

crts 10-05-2011 10:27 AM

Hi,

A somewhat convoluted sed command will do:
Code:

sed -r 'h;s/[[:blank:]]+/|/g;x;s/([^[:blank:]]+[[:blank:]]+){8}//;x;s/\|[^|]+//g8;G;s/\n/|/' cmdow-log
This will also work if the Window Title (last field) contains runs of more than one consecutive space. If we can assume that the title never contains more than one consecutive space (or if we are happy to collapse such runs to a single space), then things can be simplified:
Code:

sed -r 's/[[:blank:]]+/|/g;s/\|/ /g9' cmdow-log
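
To make the first one-liner easier to decipher, here is a minimal sketch that runs the same command on a single line and annotates each stage. The sample line is made up for illustration (it is not taken from the attached log), and GNU sed is assumed.

```shell
# Hypothetical sample line: 8 columns, then a title containing double spaces.
line='0x01008E 1 2372 Min Ina Ena Hid msseces  GDI+  Window'

# h         copy the original line to the hold space
# s/../|/g  turn every run of blanks into a single |
# x         swap: pattern space = original line, hold space = piped copy
# s/(..){8}//  strip the first 8 columns, leaving the untouched title
# x         swap back: pattern space = piped copy, hold space = title
# s/..//g8  delete the 8th and every later "|field" (the piped title words)
# G         append a newline plus the held title
# s/\n/|/   join the two parts with a |
full=$(printf '%s\n' "$line" |
  sed -r 'h;s/[[:blank:]]+/|/g;x;s/([^[:blank:]]+[[:blank:]]+){8}//;x;s/\|[^|]+//g8;G;s/\n/|/')

printf '%s\n' "$full"
# 0x01008E|1|2372|Min|Ina|Ena|Hid|msseces|GDI+  Window
```

The double space inside the title survives because the title is cut from the untouched copy of the line rather than from the piped one.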

lithos 10-05-2011 10:57 AM

If you had written:

I have a file:
Code:

0x010064 1 2880 Res Ina Ena Hid explorer WorkerW
0x030032 1 2880 Res Ina Ena Hid explorer WorkerW
0x030052 1 2880 Res Ina Ena Hid explorer DDEMLEvent
0x03004E 1 2880 Res Ina Ena Hid explorer DDEMLMom
0x01008E 1 2372 Min Ina Ena Hid msseces  GDI+ Window
0x01007E 1 2880 Res Ina Ena Hid explorer tooltips_class32
0x01006E 1 2880 Res Ina Ena Vis explorer Program Manager

in which you would like to replace the spaces (" ") with "|",

it would have been clear.
But when you say "The problem is I have to replace one or more consecutive spaces by a single | "

I read it as: you have runs of eight (" ...8x... ") spaces and need to replace each run with a single "|".

Of course, it can be done with many commands, such as "sed" or "awk".


@crts NICE WORK !

grail 10-05-2011 10:42 PM

Not sure what you wanted to do about the Header line, but the following will have the rest the way you want:
Code:

ruby -ane '(0..($F.length - 1)).each{|i| $F[i]+=(i<=7)?"|":" "};puts $F.join' cmdow-log

sumeet inani 10-05-2011 11:33 PM

to crts
Code:

sed -r 'h;s/[[:blank:]]+/|/g;x;s/([^[:blank:]]+[[:blank:]]+){8}//;x;s/\|[^|]+//g8;G;s/\n/|/' cmdow-log
It works like a charm.

Actually, I am using UnxUtils on Windows, which has no binary for 'ruby'.

I will try to decipher the code.

sumeet inani 10-05-2011 11:55 PM

Can you explain this to me?
Code:

echo "123 abc" | sed 's/[0-9]*/& &/'
123 123 abc
understandable
echo "abc 123" | sed 's/[0-9]*/& &/'
 abc 123
In this case 123 should have matched, so I was expecting 'abc 123 123'.
Since 'abc ' is not part of the search pattern, it should have been left unchanged.


grail 10-06-2011 02:08 AM

The issue is where the match starts. Because you are using the asterisk, meaning zero or more, sed scans
from left to right and finds zero digits at the very start of the string, so it replaces that empty match with
itself, a space and itself. Two lots of nothing with a space between them leave you with a space at the start.
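
The empty match can be made visible by wrapping the match in markers instead of doubling it. This is only a small sketch; the <> markers are chosen purely for illustration.

```shell
# [0-9]* is satisfied by zero digits, so the first match is the empty
# string at the very start of the line.
empty=$(echo "abc 123" | sed 's/[0-9]*/<&>/')
echo "$empty"       # <>abc 123

# When the line starts with digits, the same pattern matches them all.
leading=$(echo "123 abc" | sed 's/[0-9]*/<&>/')
echo "$leading"     # <123> abc
```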

What makes less sense to me however is:
Code:

$ echo "abc 123" | sed 's/[0-9][0-9]*/& &/'
abc 123 123
#understandable as now you have asked for a digit followed by zero or more

$ echo "abc 123" | sed 's/[0-9]+/& &/'
abc 123
# this asks for one or more (which I would interpret the same as above) but does not work


sumeet inani 10-06-2011 05:18 AM

I get it.
I was reading the sed tutorial by Bruce Barnett.
I haven't used the g flag, so only the first occurrence is substituted and the rest is printed as it is.
So 'abc 123' is effectively ^abc 123$, and the output is '^ ^abc 123', where ^ stands for the empty match at the start of the line.

sumeet inani 10-06-2011 05:29 AM

to grail
I think
Code:

echo abc 123 | sed 's/[0-9]\+/& &/' gives
abc 123 123
NOTE:\+ not +
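
The unescaped + appeared to do nothing because, without -r, sed treats + as an ordinary character. A quick sketch with a made-up input that contains a literal plus sign shows the difference (GNU sed is assumed for \+):

```shell
# Without -r, "+" is a literal plus sign, so [0-9]+ matches "1+".
literal=$(echo "abc 1+ 123" | sed 's/[0-9]+/X/')
echo "$literal"     # abc X 123

# Escaped as \+ it means "one or more", so the match is just "1".
repeat=$(echo "abc 1+ 123" | sed 's/[0-9]\+/X/')
echo "$repeat"      # abc X+ 123
```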


sumeet inani 10-06-2011 05:31 AM

I also found a solution to the question I asked.
It is simple and workable, though it collapses extra spaces in the last column:
Code:

sed -e "s/ \+/|/g" -e "s/|/ /9g" cmdow-log.txt
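
The trade-off is easiest to see on a line whose title contains a double space. The line below is made up for illustration (it is not from the attached log), and GNU sed is assumed.

```shell
# Hypothetical line whose title contains a double space.
line='0x01006E 1 2880 Res Ina Ena Vis explorer Program  Manager'

# Step 1 turns every run of spaces into |; step 2 turns the 9th and later
# | back into single spaces, so the double space in the title is lost.
out=$(printf '%s\n' "$line" | sed -e "s/ \+/|/g" -e "s/|/ /9g")
echo "$out"
# 0x01006E|1|2880|Res|Ina|Ena|Vis|explorer|Program Manager
```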

grail 10-06-2011 10:13 AM

Or you could just use the '-r' switch. What I presented does confuse me a little, but I do know that it is an extended regular expression solution (I was trying to direct you to this):
Code:

$ echo "abc 123" | sed -r 's/[0-9]+/& &/'
abc 123 123


crts 10-06-2011 12:27 PM

about '-r' option
 
Reading the last few posts, there appear to be some misconceptions about the '-r' option and extended RegEx in GNU sed.
The most important thing to note is that GNU sed understands the extended RegEx operators by default. Supplying the '-r' option does not add any functionality; it simply removes the need to escape them. E.g., "+" is an extended RegEx operator. To have sed interpret it as such without '-r', you have to prepend a backslash, as in "\+". Using the -r option simply makes the backslash unnecessary in most cases:
Code:

echo 'word' | sed 's/w\+/C/'
echo 'word' | sed -r 's/w+/C/' # same as above

The same goes for parentheses:
Code:

echo 'word word' | sed 's/\(word\) \1/\1 CHANGE/'
echo 'word word' | sed -r 's/(word) \1/\1 CHANGE/' # same as above

However, this is not true for "\<" and "\>":
Code:

echo 'word' | sed 's/\<w.*\>/CHANGE/'
echo 'word' | sed -r 's/\<w.*\>/CHANGE/' # same as above
echo 'word' | sed -r 's/<w.*>/CHANGE/' # not same as above; expects literal '<' and '>' in input string.

So word boundary symbols "\<" and "\>" need to be escaped in any case. This behavior is a bit inconsistent.
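
For reference, a short sketch of what the three -r variants do, including a made-up input with literal angle brackets (GNU sed assumed):

```shell
# Escaped \< and \> are word boundaries, even with -r.
boundary=$(echo 'word' | sed -r 's/\<w.*\>/CHANGE/')
echo "$boundary"    # CHANGE

# Unescaped < and > are literal characters, so 'word' does not match.
nomatch=$(echo 'word' | sed -r 's/<w.*>/CHANGE/')
echo "$nomatch"     # word

# The same pattern does match when the input really contains < and >.
angle=$(echo '<word>' | sed -r 's/<w.*>/CHANGE/')
echo "$angle"       # CHANGE
```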

@OP: You said that you are reading the tutorial by Bruce Barnett. I suppose you mean the tutorials on this site:
http://www.grymoire.com/Unix

You might get a bit confused when you read the tutorial about Regex in general on that site, especially this chapter:
http://www.grymoire.com/Unix/Regular.html#uh-12

It says that "\{" and "\}" are basic RegEx and that they cannot be used as extended RegEx. However, in the table further down they are marked as extended RegEx. This is contradictory.

Anyway, RegExes are a great source of confusion, since every language/program seems to add its own small modifications to them.

BTW, this is how sed handles "{}":
Code:

echo 'hello' | sed 's/l\{2\}/CC/'
echo 'hello' | sed -r 's/l{2}/CC/' # same as above


grail 10-06-2011 06:31 PM

Thanks for the clarity crts :)


All times are GMT -5.