[SOLVED] regex problem

validator456 · 08-03-2014, 03:01 AM

I want to select whole lines from an XML file (SVG), but only the lines with clip-path in them, like this piece of code.

Code:

<path
       d=""
       transform="translate(475.84716,103.57904)"
       clip-path="url(#SVG_CP_1)"
       id="path18"
       style="fill:#e9afaf;fill-rule:evenodd;stroke:none" />

How do I match this line?

sycamorex · 08-03-2014, 03:48 AM

There are a few ways of doing it. One of them is:

Code:

sed -n '/clip-path/p' file.txt

grail · 08-03-2014, 03:54 AM

Anything wrong with simple grep?

Code:

grep 'clip-path' file

It would appear that not much effort went into solving this?

validator456 · 08-03-2014, 04:02 AM

I see that I haven't made myself clear. I want to remove from an XML-file this code:

Code:

<path
       d="m 381.8312,251.53728 0.95997,0.24001 0.71999,1.20009 0.71998,0 0.47999,-1.68012 1.91996,0.24002 0.71998,0.48003 -0.24,0.48003 1.19998,2.8802 1.19997,0 2.15995,1.92013 0.71998,-0.24001 2.87994,1.68011 0.23999,0.48003 0.95998,-0.48003 1.43997,2.40017 1.19997,-0.24002 0.47999,0.72005 0.23999,1.20008 1.43997,0.24002 0,0.72005 1.43997,1.20008 0.47998,1.68012 -1.67996,-0.24002 -0.95997,0.48003 -0.95998,2.16015 -1.91996,1.92013 -0.95998,1.68012 0,0.72005 -0.47999,0.48003 -2.87993,-0.48003 -1.91995,0.24001 -2.39995,1.20009 -2.15995,-0.24002 -3.11993,0.96007 -0.71998,0.24001 -3.83991,0.24002 -4.3199,-0.72005 -2.15995,-2.40017 -2.15995,-4.08028 0.71998,-0.48003 -0.47999,0 -0.71998,-1.20008 -1.19997,-2.8802 -0.24,-2.16015 5.99986,-2.16015 3.35993,-3.84026 0.23999,-0.96007 1.19997,-1.20008 0.24,-0.96006 2.15995,-1.4401 z"
       transform="matrix(3.8794702,0,0,3.8794702,-1292.2521,-947.17743)"
       clip-path="url(#SVG_CP_1)"
       id="path4314"
       style="fill:#e6f6ba;fill-opacity:1;fill-rule:evenodd;stroke:#757575;stroke-width:0.18043701;stroke-miterlimit:4;stroke-opacity:1" />

and this code:

Code:

<path
       d=""
       transform="translate(882.67042,139.23984)"
       clip-path="url(#SVG_CP_1)"
       id="path346"
       style="fill:#e9afaf;stroke:#ffffff;stroke-width:0.47999001;stroke-linecap:square;stroke-miterlimit:10" />

but not this code:

Code:

 <path
         d="m 563.375,496.84375 c -0.033,1.00895 -0.23446,1.97092 -0.5625,2.875 l 10.71875,-2.875 -10.15625,0 z"
         id="path25038"
         style="fill:none;stroke:#000000;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:9;stroke-opacity:1;stroke-dasharray:none" />

<path.*/> This regex will select all the path-lines.
I only want to select the path-lines with clip-path in them.

The code is taken from the file: https://commons.wikimedia.org/wiki/F...arr%C3%A9s.svg

grail · 08-03-2014, 09:19 AM

Ok ... thanks for explaining further

Although I am a little more confused

Is the data you are showing supposed to be on a single line? If not, I am not sure I see how the regex '<path.*/>' would encompass what we are seeing?
Also, you do not allude to what you are using to parse this data, i.e. sed, perl, ruby, etc.
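
One way to settle the single-line question is to count matches. Line-oriented tools such as grep apply the regex to one line at a time, so '<path.*/>' can only match when a whole element sits on one physical line. A minimal sketch, assuming a made-up two-element file (sample.svg is hypothetical, laid out like the pasted snippets):

```shell
# Hypothetical sample: each element is spread over several lines.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.svg" <<'EOF'
<path
       d=""
       clip-path="url(#SVG_CP_1)"
       id="path18" />
<path
       d="m 1,1 z"
       id="path25038" />
EOF

# Count lines that hold a complete element (grep exits 1 on zero matches, hence || true).
whole=$(grep -c '<path.*/>' "$tmpdir/sample.svg" || true)
# Count lines that merely open an element.
opens=$(grep -c '<path' "$tmpdir/sample.svg" || true)
echo "whole-element lines: $whole, opening lines: $opens"
rm -rf "$tmpdir"
```

If the first count is 0 while the second is not, the elements span several lines and a per-line regex will not see them as one unit.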

syg00 · 08-03-2014, 09:38 AM

Quote:

Originally Posted by grail

Although I am a little more confused

Don't go confusing gurus - bad karma ...

validator456 · 08-03-2014, 01:53 PM

Code:

<path (a lot of code) />

This, to me, is one line. "<path" is the opening tag and "/>" is the closing tag.

Because it is an SVG file (which is an XML file), many lines start with the "<path" tag. But I only want to delete the path-lines with a particular line of code in them: the ones with the clip-path line.

I am cleaning up the above-mentioned SVG file. A lot of code needs to be removed, not just in this file but also in other files the same creator has made: https://commons.wikimedia.org/wiki/C...lta_in_Spanish. So I am trying to find a way to automate it.

Now that I look at it again, I think it can only be done with perl or ruby. I have some working knowledge of regex but almost no knowledge of perl.

keefaz · 08-03-2014, 03:07 PM

Code:

grep -v 'clip-path' file.svg > newfile.svg

Seems to work on the example file (no newlines between <path and />)

grail · 08-03-2014, 08:14 PM

Well if we can assume that the data is on one line, it would be as easy as:

Code:

sed -i '/<path.*clip-path/d' file

This will remove all lines in a file where '<path' appears before 'clip-path'

If this is not the desired effect and the data really is spread over multiple lines, you will need some form of looping structure to capture what you are looking for.
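
For the multi-line case, such a loop could look roughly like the sketch below. It assumes every path element is self-closing (ends in "/>") and that no attribute value contains "/>". The file sample.svg is a made-up test file, not the real one. The sed script appends lines with N until the closing "/>" is in the pattern space, then deletes the whole element if it mentions clip-path.

```shell
# Hypothetical sample with one clip-path element and one element to keep.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.svg" <<'EOF'
<g>
<path
       d=""
       transform="translate(475.84716,103.57904)"
       clip-path="url(#SVG_CP_1)"
       id="path18"
       style="fill:#e9afaf;fill-rule:evenodd;stroke:none" />
 <path
         d="m 563.375,496.84375 z"
         id="path25038"
         style="fill:none;stroke:#000000" />
</g>
EOF

# On a line opening <path: loop (label a), appending the next line with N
# until "/>" has been read; then drop the collected element if it has clip-path.
cleaned=$(sed '/<path/{
:a
/\/>/!{
N
ba
}
/clip-path/d
}' "$tmpdir/sample.svg")
printf '%s\n' "$cleaned"
rm -rf "$tmpdir"
```

The same script also covers elements that sit on a single line, since the loop body is skipped when "/>" is already present. Running it without -i first and comparing the output against the original is a cheap safety check.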

pan64 · 08-04-2014, 01:38 AM

To me it looks like a job for a multi-line regexp, probably with perl:
read the whole file into a variable, and apply
s!<path[^/]+clip[-]path[^/]+/>!!gm
I have not tested this; you probably need to check greediness too.

validator456 · 08-04-2014, 01:58 AM

Yes, that worked, Grail. Thank you all for your answers.