Replace the first 8 runs of consecutive spaces with |

sumeet inani · 10-05-2011, 08:52 AM

Hi,
I am trying to make a GUI for cmdow.
Attached here is the log.
My aim is to separate the columns with |, which is understood by the AutoIt software.
The problem is that I have to replace each run of one or more consecutive spaces with a single |. The window title may also contain spaces, but those must be left as they are.
Is there a way, using the command line, to replace ' \+' with '|' only 8 times in each line?

Thank you.

crts · 10-05-2011, 10:27 AM

Hi,

A somewhat convoluted sed command will do it:

Code:

sed -r 'h;s/[[:blank:]]+/|/g;x;s/([^[:blank:]]+[[:blank:]]+){8}//;x;s/\|[^|]+//g8;G;s/\n/|/' cmdow-log

This will also work if the window title (last field) contains two or more consecutive spaces. If we can assume that the title never contains more than one consecutive space (or if we are happy to collapse such runs to a single space), then things can be simplified:

Code:

sed -r 's/[[:blank:]]+/|/g;s/\|/ /g9' cmdow-log
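A quick way to see the difference between the two commands is to run both on the same line. The sketch below assumes GNU sed (for `-r` and the `g8`/`g9` flags); the sample line is made up, with a double space inside the title, and `full`/`simple` are just illustrative function names:

```shell
#!/bin/sh
# Hypothetical sample line: 8 cmdow columns, then a title containing a double space.
line='0x01006E 1 2880 Res Ina Ena Vis explorer Program  Manager'

# First command: the title keeps its original spacing.
full() {
  sed -r 'h;s/[[:blank:]]+/|/g;x;s/([^[:blank:]]+[[:blank:]]+){8}//;x;s/\|[^|]+//g8;G;s/\n/|/'
}

# Simplified command: every blank run becomes "|", then the 9th and later
# "|" are turned back into a single space, so the double space collapses.
simple() {
  sed -r 's/[[:blank:]]+/|/g;s/\|/ /g9'
}

printf '%s\n' "$line" | full    # 0x01006E|1|2880|Res|Ina|Ena|Vis|explorer|Program  Manager
printf '%s\n' "$line" | simple  # 0x01006E|1|2880|Res|Ina|Ena|Vis|explorer|Program Manager
```

Only the first command keeps the double space in the title, which is the trade-off described above.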

lithos · 10-05-2011, 10:57 AM

If you had written:

I have a file:

Code:

0x010064 1 2880 Res Ina Ena Hid explorer WorkerW
0x030032 1 2880 Res Ina Ena Hid explorer WorkerW
0x030052 1 2880 Res Ina Ena Hid explorer DDEMLEvent
0x03004E 1 2880 Res Ina Ena Hid explorer DDEMLMom
0x01008E 1 2372 Min Ina Ena Hid msseces  GDI+ Window
0x01007E 1 2880 Res Ina Ena Hid explorer tooltips_class32
0x01006E 1 2880 Res Ina Ena Vis explorer Program Manager

in which you would like to replace the " " spaces with "|",

it would have been understood.
But when you say "The problem is I have to replace one or more consecutive spaces by a single | ",

I read it as you having a run of " ...8x... " spaces that you need to replace with a single "|".

Of course, it can be done with many tools, such as sed or awk.

@crts: NICE WORK!
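Since awk was mentioned: as a rough sketch (assuming a POSIX awk; `to_pipes_awk` is just an illustrative name), awk can rejoin the fields itself. Its default field splitting treats any run of blanks as a single separator, so, like the simplified sed command, it collapses extra spaces inside the title:

```shell
#!/bin/sh
# Join fields 1..9 with "|" and everything after field 9 (the rest of the title) with " ".
to_pipes_awk() {
  awk '{
    out = $1
    for (i = 2; i <= NF; i++)
      out = out (i <= 9 ? "|" : " ") $i
    print out
  }'
}

printf '%s\n' '0x01006E 1 2880 Res Ina Ena Vis explorer Program  Manager' | to_pipes_awk
# 0x01006E|1|2880|Res|Ina|Ena|Vis|explorer|Program Manager
```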

grail · 10-05-2011, 10:42 PM

Not sure what you wanted to do about the header line, but the following will format the rest the way you want:

Code:

ruby -ane 'puts $F.first(8).join("|") + "|" + $F.drop(8).join(" ")' cmdow-log

sumeet inani · 10-05-2011, 11:33 PM

To crts:

Code:

sed -r 'h;s/[[:blank:]]+/|/g;x;s/([^[:blank:]]+[[:blank:]]+){8}//;x;s/\|[^|]+//g8;G;s/\n/|/' cmdow-log

It works like a charm.

Actually, I am using UnxUtils on Windows, which has no binary for ruby.

I will try to decipher the code.
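For anyone else trying to decipher it, here is the same sed script spread over several lines with one comment per command. It is only a re-layout of crts's one-liner (GNU sed assumed); the wrapper name `cmdow_to_pipes` is hypothetical:

```shell
#!/bin/sh
# Turn the first 8 blank runs into "|" while keeping the title's own spacing.
cmdow_to_pipes() {
  sed -r '
    # Keep an untouched copy of the line in the hold space.
    h
    # Turn every run of blanks into a single "|".
    s/[[:blank:]]+/|/g
    # Swap: pattern space = original line, hold space = piped line.
    x
    # Drop the first 8 fields and their trailing blanks, leaving only the title.
    s/([^[:blank:]]+[[:blank:]]+){8}//
    # Swap back: pattern space = piped line, hold space = title.
    x
    # Delete the 8th "|field" and every one after it (the piped title words).
    s/\|[^|]+//g8
    # Append a newline plus the hold space (the original title).
    G
    # Turn that newline into the last "|" separator.
    s/\n/|/
  '
}

printf '%s\n' '0x01008E 1 2372 Min Ina Ena Hid msseces  GDI+ Window' | cmdow_to_pipes
# 0x01008E|1|2372|Min|Ina|Ena|Hid|msseces|GDI+ Window
```

Usage on the log would be something like `cmdow_to_pipes < cmdow-log`. The trick is that the hold space carries the title with its original spacing while the pattern space gets chopped into fields.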

sumeet inani · 10-05-2011, 11:55 PM

Can you explain this to me?

Code:

$ echo "123 abc" | sed 's/[0-9]*/& &/'
123 123 abc
# understandable

$ echo "abc 123" | sed 's/[0-9]*/& &/'
 abc 123

In this case I expected 123 to match, giving 'abc 123 123', since 'abc ' is not part of the search pattern and should be left unchanged.

grail · 10-06-2011, 02:08 AM

The issue is where the match starts. Because you are using an asterisk, meaning zero or more, sed looks from left to right
and finds zero digits at the very start of the string, so it replaces that empty match with itself, a space and
itself. Two lots of nothing with a space between them leaves you with a space at the start.
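Wrapping the match in `<` and `>` instead of `& &` makes the empty match easy to see. A small sketch, assuming GNU sed (the angle brackets are just display markers):

```shell
#!/bin/sh
# [0-9]* can match zero digits, so the first (leftmost) match is the empty string at the start.
echo "abc 123" | sed 's/[0-9]*/<&>/'       # <>abc 123
# [0-9][0-9]* needs at least one digit, so the first match is "123".
echo "abc 123" | sed 's/[0-9][0-9]*/<&>/'  # abc <123>
```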

What makes less sense to me, however, is:

Code:

$ echo "abc 123" | sed 's/[0-9][0-9]*/& &/'
abc 123 123
# understandable, as now you have asked for a digit followed by zero or more digits

$ echo "abc 123" | sed 's/[0-9]+/& &/'
abc 123
# this asks for one or more (which I would interpret the same as above) but does not work

sumeet inani · 10-06-2011, 05:18 AM

I get it.
I was reading the sed tutorial by Bruce Barnett.
I haven't used the g flag, so only the first occurrence is substituted and the rest of the line is printed as it is.
So 'abc 123' is effectively ^abc 123$, and the first match is the empty string at ^. The output is therefore '^ ^abc 123', where ^ stands for "nothing".
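The effect of the g flag is easiest to see with a pattern that cannot match the empty string. A minimal sketch (the input is made up; GNU sed assumed, although any sed should behave the same here):

```shell
#!/bin/sh
# Without g, only the first match on the line is replaced.
echo "1 2 3" | sed 's/[0-9]/<&>/'    # <1> 2 3
# With g, every match on the line is replaced.
echo "1 2 3" | sed 's/[0-9]/<&>/g'   # <1> <2> <3>
```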

sumeet inani · 10-06-2011, 05:29 AM

To grail:
I think this is what you were after (note \+, not +):

Code:

$ echo abc 123 | sed 's/[0-9]\+/& &/'
abc 123 123
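To see why the plain + version seemed to do nothing, here is a sketch assuming GNU sed: in a basic regex a bare + is an ordinary character, so [0-9]+ only matches a digit followed by a literal plus sign. The input 'abc 12+' is made up to show this:

```shell
#!/bin/sh
# Basic regex: "+" is literal, so [0-9]+ matches "2+" here.
echo "abc 12+" | sed 's/[0-9]+/<&>/'     # abc 1<2+>
# GNU basic regex: "\+" means one or more.
echo "abc 123" | sed 's/[0-9]\+/<&>/'    # abc <123>
# Extended regex (-r): "+" means one or more.
echo "abc 123" | sed -r 's/[0-9]+/<&>/'  # abc <123>
```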

sumeet inani · 10-06-2011, 05:31 AM

Also, I found a solution to the question I asked.
It is simple and workable, though it collapses extra spaces in the last column:

Code:

sed -e "s/ \+/|/g" -e "s/|/ /9g" cmdow-log.txt
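The 9g (or g9) flag is what limits the replacement to the title. A small sketch of how GNU sed treats a number on its own versus a number combined with g (the input is made up; other seds may not accept the combination, so GNU sed is assumed):

```shell
#!/bin/sh
# A number N alone replaces only the Nth match.
echo 'a|b|c|d' | sed 's/|/ /2'    # a|b c|d
# N combined with g replaces the Nth match and every later one.
echo 'a|b|c|d' | sed 's/|/ /2g'   # a|b c d
# GNU sed accepts the two flags in either order.
echo 'a|b|c|d' | sed 's/|/ /g2'   # a|b c d
```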

grail · 10-06-2011, 10:13 AM

Or you could just use the '-r' switch. What I presented does confuse me a little, but I do know that it is an extended regular expression solution (this is what I was trying to direct you to):

Code:

$ echo "abc 123" | sed -r 's/[0-9]+/& &/'
abc 123 123

crts · 10-06-2011, 12:27 PM

So, reading the last few posts, there appear to be some misconceptions about the '-r' option and extended RegEx in GNU sed.
The most important thing to notice is that GNU sed understands the extended RegEx operators by default. Supplying the '-r' option does not add any functionality; it simply removes the need to escape them. For example, "+" is an extended RegEx operator. To have sed interpret it as such without -r, you have to prepend a backslash, as in "\+". Using the -r option makes the backslash unnecessary in most cases:

Code:

echo 'word' | sed 's/w\+/C/'
echo 'word' | sed -r 's/w+/C/' # same as above

The same goes for parentheses:

Code:

echo 'word word' | sed 's/\(word\) \1/\1 CHANGE/'
echo 'word word' | sed -r 's/(word) \1/\1 CHANGE/' # same as above

However, this is not true for "\<" and "\>":

Code:

echo 'word' | sed 's/\<w.*\>/CHANGE/'
echo 'word' | sed -r 's/\<w.*\>/CHANGE/' # same as above
echo 'word' | sed -r 's/<w.*>/CHANGE/' # not same as above; expects literal '<' and '>' in input string.

So the word boundary symbols "\<" and "\>" need to be escaped in either mode. This behavior is a bit inconsistent.
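A quick way to check this is to feed sed an input that contains literal < and > characters. A sketch assuming GNU sed (the input string is made up, and [a-z]* is used instead of .* so that the match stops at the end of the word):

```shell
#!/bin/sh
input='a <word> b'
# \< and \> are word boundaries in both modes, so the literal < and > survive.
echo "$input" | sed 's/\<w[a-z]*\>/X/'      # a <X> b
echo "$input" | sed -r 's/\<w[a-z]*\>/X/'   # a <X> b
# Without the backslashes, -r reads < and > as literal characters.
echo "$input" | sed -r 's/<w[a-z]*>/X/'     # a X b
```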

@OP: You said that you are reading the tutorial by Bruce Barnett. I suppose you mean the tutorials on this site:
http://www.grymoire.com/Unix

You might get a bit confused when you read the tutorial about Regex in general on that site, especially this chapter:
http://www.grymoire.com/Unix/Regular.html#uh-12

It says that "\{" and "\}" are basic RegEx and cannot be used in extended RegEx. However, the table further down marks them as extended RegEx, which is contradictory.

Anyway, RegExes are a great source of confusion, since every language/program seems to add its own small modifications to them.

BTW, this is how sed handles "{}":

Code:

echo 'hello' | sed 's/l\{2\}/CC/'
echo 'hello' | sed -r 's/l{2}/CC/' # same as above
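The same interval syntax is what makes crts's very first command work: {8} repeats the "field plus blanks" group eight times. As a sketch (GNU sed assumed; the sample line is made up), here it is in both notations:

```shell
#!/bin/sh
line='0x01006E 1 2880 Res Ina Ena Vis explorer Program  Manager'
# Extended regex: (...) groups and {8} repeats the group eight times.
echo "$line" | sed -r 's/([^[:blank:]]+[[:blank:]]+){8}//'
# Basic regex: the same pattern with \( \), \+ and \{ \} escaped.
echo "$line" | sed 's/\([^[:blank:]]\+[[:blank:]]\+\)\{8\}//'
# Both print: Program  Manager
```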

grail · 10-06-2011, 06:31 PM

Thanks for the clarity, crts.