LinuxQuestions.org


Felipe 09-24-2010 06:45 AM

sed and regexp for search in multilines
 
Hello,

I have a text file with a structure like this:
<managed-data-source .....
...
name="nameDS"
....
/>
<connection-pool ....
....
name="connectioPool"
....
>
...
</connection-pool>

Can anyone give me a regular expression (for sed or grep) to search for the data sources and the connection pools?

I tried:

sed -n -e '/<connection-pool/,/\/>/p' file
This works fine for connection-pool, but

sed -n -e '/<managed-data-source/,/>/p' file
doesn't work. (The real file is bigger; I've summarized it and put it in an echo for testing):

echo ' <managed-data-source connection-pool-name="Example Connection Pool" jndi-name="jdbc/OracleDS" name="OracleDS"/>
<managed-data-source login-timeout="15" connection-pool-name="mds1PoolDS" jndi-name="jdbc/mds1DS" name="mds1"/>
<managed-data-source login-timeout="15" connection-pool-name="mds2PoolDS" jndi-name="jdbc/mds2DS" name="mds2DS"/>
<managed-data-source login-timeout="15" connection-pool-name="mds3PoolDS" jndi-name="jdbc/mds3DS" name="mds3DS"/>
<managed-data-source login-timeout="15" connection-pool-name="mds4PoolDS" jndi-name="jdbc/mds4DS" name="mds4DS"/>

<connection-pool name="Example Connection Pool">
<managed-data-source login-timeout="15" connection-pool-name="mds2PoolDS" jndi-name="jdbc/mds2DS" name="mds5DS"/>

<connection-pool name="mds2PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" max-connect-attempts="5" max-connections="50" min-connections="5" initial-limit="5" used-connection-wait-timeout="30" lower-threshold-limit="10" time-to-live-timeout="300" property-check-interval="90" validate-connection="true" validate-connection-statement="select 1 from dual">
<managed-data-source
login-timeout="15"
connection-pool-name="PrrrPoolDS"
jndi-name="jdbc/PrrrDS"
name="PrrrDS"/>
' | sed -n '/<managed-data-source/,/>/p'


If you execute the previous command, you will see that it also displays connection-pool lines.

Why does this happen, and how can I avoid it?

Thanks

druuna 09-24-2010 07:23 AM

Hi,

Your posted structure and the echo example are not the same.

Could you post or attach a relevant part (or parts) of the text file you are using?

Felipe 09-24-2010 07:34 AM

You are right. I tried to extract a piece of the file.

But the command is the same. With that echo, I'm trying to extract the <managed-data-source .... /> elements using the filter sed -n '/<managed-data-source/,/>/p', but it returns text I don't expect.
If you execute it, you'll see that it shows <connection-pool...> lines even though they are not in the filter, right?

What's wrong?

Thanks

druuna 09-24-2010 07:48 AM

Hi,

Maybe you don't understand what I'm saying in my previous post: Post or attach the original input file here. Without it we cannot help you because the given examples in your first post are not the same.

BTW:
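The sed command used in your echo example does what it is asked. A range (/<managed-data-source/,/>/) starts at a line that matches the first pattern, and sed only looks for the end pattern from the next line on. So a single-line <managed-data-source .../> does not close its own range: the range stays open until the next line that contains a >, and that can be a <connection-pool ...> line. In your echo example this ends up printing everything.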
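A minimal demonstration of that range behavior, assuming any POSIX sed (GNU sed behaves the same here). The three input lines are a hypothetical excerpt modeled on the echo example: the range opens on the <managed-data-source line, sed does not test the end pattern against that same line, and so the following <connection-pool ...> line is what closes the range.

```shell
#!/bin/sh
# Hypothetical three-line excerpt modeled on the echo example.
input='<managed-data-source name="mds4DS"/>
<connection-pool name="Example Connection Pool">
<connection-pool name="mds2PoolDS">'

# Line 1 matches /<managed-data-source/ and opens the range; the end address />/
# is only tested from line 2 on, so line 2 (a connection-pool line) closes it
# and is printed as well. Line 3 is outside any range and is not printed.
printf '%s\n' "$input" | sed -n '/<managed-data-source/,/>/p'
```

Note that the range is not "greedy": it stops at the first later line containing a >, which is why line 3 is left out.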

Felipe 09-24-2010 07:55 AM

OK,

But the problem is that I want the shortest match, I mean the first ">", not the last.
What can I do to match from <managed-data-source to the first occurrence of ">"?

Thanks

druuna 09-24-2010 08:09 AM

Would you be so kind as to do what I asked in posts number 2 and number 4? I'm not going to ask again...

If you don't post/attach/upload the original input file we cannot and will not help you.

Felipe 09-27-2010 03:48 AM

Here is the file:

I'm looking for filters for:
- Search for a managed-data-source by name (e.g. name="Apl2DS").
- Search for a connection-pool by name (e.g. name="Apl1PoolDS").
- Look for all managed-data-sources.
- Look for all connection-pools.

To list all managed-data-sources I use a filter like:


sed -e "/<managed-data-source/,/[^>]*>/p", but it doesn't work.
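One possible way to get "from <managed-data-source up to the first />" with sed, sketched under the assumption that every managed-data-source element is self-closing (ends in />), as in the file below. Instead of a range, it appends input lines to the pattern space with N until the pattern space contains />, so single-line and multi-line elements both come out whole. (The filter above prints too much for two reasons: without -n, sed prints every line anyway and prints range lines a second time, and /[^>]*>/ matches any line that contains a >.) The sample input is a hypothetical excerpt.

```shell
#!/bin/sh
# Hypothetical excerpt: a single-line element, a connection-pool line,
# and an element split over several lines.
input='<managed-data-source login-timeout="15" name="mds4DS"/>
<connection-pool name="mds2PoolDS">
<managed-data-source
login-timeout="15"
name="PrrrDS"/>'

# For each line containing <managed-data-source: while the pattern space
# has no "/>", append the next input line (N) and loop; then print once.
printf '%s\n' "$input" | sed -n '
/<managed-data-source/{
:more
/\/>/!{
  N
  b more
}
  p
}'
```

Adding s/\n/ /g just before p would join each multi-line element onto a single line.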


Any idea?
Thanks


<?xml version = '1.0' encoding = 'UTF-8'?>
<data-sources xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://xmlns.oracle.com/oracleas/schema/data-sources-10_1.xsd" schema-major-version="10" schema-minor-version="1">
<managed-data-source login-timeout="15" connection-pool-name="Apl1PoolDS" jndi-name="jdbc/Apl1PoolDS" name="Apl1DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Apl2PoolDS" jndi-name="jdbc/gis" name="Apl2DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Apl3PoolDS" jndi-name="jdbc/Apl3DS" name="Apl3DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Apl4PoolDS" jndi-name="jdbc/Apl4DS" name="Apl4DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Apl5PoolDS" jndi-name="jdbc/Apl5DS" name="Apl5DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Apl6PoolDS" jndi-name="jdbc/Apl6DS" name="Apl6DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Apl7PoolDS" jndi-name="jdbc/Apl7DS" name="Apl7DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Apl8PoolDS" jndi-name="jdbc/Apl8DS" name="Apl8DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Apl9PoolDS" jndi-name="jdbc/Apl9DS" name="Apl9DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Ap10PoolDS" jndi-name="jdbc/Ap10DS" name="Ap10DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Ap11PoolDS" jndi-name="jdbc/Ap11DS" name="Ap11DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Ap12PoolDS" jndi-name="jdbc/Ap12DS" name="Ap12DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Ap13PoolDS" jndi-name="jdbc/Ap13DS" name="Ap13DS"/>
<managed-data-source login-timeout="15" connection-pool-name="Ap14PoolDS" jndi-name="jdbc/Ap14DS" name="ArsiDS"/>
<connection-pool name="Apl1PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" max-connect-attempts="5" max-connections="25" min-connections="5" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.driver.OracleDriver" user="Apl1" password="clave1" url="jdbc:oracle:thin:@database1.com:1521:EIC1">
<property name="v$session.program" value="Apl1PoolDS"/>
</connection-factory>
</connection-pool>
<connection-pool name="Apl2PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" max-connect-attempts="5" max-connections="25" min-connections="5" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.driver.OracleDriver" user="Apl2" password="Apl2" url="jdbc:oracle:thin:@database2.com:1521:rcl">
<property name="v$session.program" value="Apl2PoolDS"/>
</connection-factory>
</connection-pool>
<connection-pool name="Apl3PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" lower-threshold-limit="10" max-connect-attempts="5" max-connections="50" min-connections="5" property-check-interval="90" time-to-live-timeout="300" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="Apl3" password="Apl3" url="jdbc:oracle:thin:@//database2.com:1521/orcl">
<property name="connectionCacheName" value="Apl3PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>
<connection-pool name="Apl4PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" lower-threshold-limit="10" max-connect-attempts="5" max-connections="25" min-connections="5" property-check-interval="90" time-to-live-timeout="300" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="Apl4" password="clave3" url="jdbc:oracle:thin:@//database3.com:1521/PRcl01">
<property name="connectionCacheName" value="Apl4PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>
<connection-pool name="Apl5PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" lower-threshold-limit="10" max-connect-attempts="5" max-connections="50" min-connections="5" property-check-interval="90" time-to-live-timeout="300" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="Apl5" password="Apl5" url="jdbc:oracle:thin:@//database2.com:1521/orcl">
<property name="connectionCacheName" value="Apl5PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>
<connection-pool name="Apl6PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" lower-threshold-limit="10" max-connect-attempts="5" max-connections="50" min-connections="5" property-check-interval="90" time-to-live-timeout="300" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="Apl6" password="Apl6" url="jdbc:oracle:thin:@//database2.com:1521/orcl">
<property name="connectionCacheName" value="Apl6PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>
<connection-pool name="Apl7PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" lower-threshold-limit="10" max-connect-attempts="5" max-connections="50" min-connections="5" property-check-interval="90" time-to-live-timeout="300" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="Apl7" password="Apl7" url="jdbc:oracle:thin:@//database2.com:1521/orcl">
<property name="connectionCacheName" value="Apl7PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>
<connection-pool name="Apl8PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" lower-threshold-limit="10" max-connect-attempts="5" max-connections="50" min-connections="5" property-check-interval="90" time-to-live-timeout="300" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="Apl8" password="Apl8" url="jdbc:oracle:thin:@//database2.com:1521/orcl">
<property name="connectionCacheName" value="Apl8PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>
<connection-pool name="Apl9PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" lower-threshold-limit="10" max-connect-attempts="5" max-connections="50" min-connections="5" property-check-interval="90" time-to-live-timeout="300" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="Apl9" password="Apl9" url="jdbc:oracle:thin:@//database2.com:1521/orcl">
<property name="connectionCacheName" value="Apl9PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>
<connection-pool name="Ap10PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" lower-threshold-limit="10" max-connect-attempts="5" max-connections="50" min-connections="5" property-check-interval="90" time-to-live-timeout="300" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="Ap10" password="Ap10" url="jdbc:oracle:thin:@//database2.com:1521/orcl">
<property name="connectionCacheName" value="Ap10PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>
<connection-pool name="Ap11PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" lower-threshold-limit="10" max-connect-attempts="5" max-connections="50" min-connections="5" property-check-interval="90" time-to-live-timeout="300" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="Ap11" password="Ap1s" url="jdbc:oracle:thin:@//database2.com:1521/orcl">
<property name="connectionCacheName" value="Ap11PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>
<connection-pool name="Ap12PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" lower-threshold-limit="10" max-connect-attempts="5" max-connections="50" min-connections="5" property-check-interval="90" time-to-live-timeout="300" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="apl3" password="apl33" url="jdbc:oracle:thin:@//database4.com:1521/orcl3">
<property name="connectionCacheName" value="Ap12PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>
<connection-pool name="Ap13PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" lower-threshold-limit="10" max-connect-attempts="5" max-connections="50" min-connections="5" property-check-interval="90" time-to-live-timeout="300" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="Ap13" password="Ap13" url="jdbc:oracle:thin:@//database2.com:1521/orcl">
<property name="connectionCacheName" value="Ap13PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>
<connection-pool name="Ap14PoolDS" abandoned-connection-timeout="90" connection-retry-interval="30" inactivity-timeout="90" initial-limit="5" lower-threshold-limit="10" max-connect-attempts="5" max-connections="50" min-connections="5" property-check-interval="90" time-to-live-timeout="300" used-connection-wait-timeout="30" validate-connection="true" validate-connection-statement="select 1 from dual">
<connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="Ap14" password="Ap14" url="jdbc:oracle:thin:@//database2.com:1521/orcl">
<property name="connectionCacheName" value="Ap14PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>
</data-sources>

kurumi 09-27-2010 04:07 AM

You should really use an XML parser. Here's a Ruby example (other languages can do the same with their XML libraries):

Code:

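#!/usr/bin/env ruby
# Ruby 1.9.1
# (no "-w" here: on Linux, env would receive "ruby -w" as one program name)

require 'rexml/document'
include REXML
ret = File.read("file")   # "file" is the input XML file; File.read also closes it
xml = Document.new(ret)
xml.elements.each("*/managed-data-source") do |element|
    puts element if element.attributes["name"] == "Apl2DS"
end
# ....
# ....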


druuna 09-27-2010 05:43 AM

Hi,

All the managed data source entries are on a single line, so no multi-line sed is needed:
Quote:

- Look for all managed-data-sources
sed -n '/<managed-data-source/p' inputfile

Quote:

- Search for a managed-data-source by name (e.g. name="Apl2DS")
sed -n '/<managed-data-source/{/name="Apl2DS"/p;}' inputfile
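One caveat with matching /name="..."/: it is a plain substring match, so it also hits connection-pool-name="..." on the same line. A small sketch using one line from the posted file: searching for a managed-data-source named Apl1PoolDS (a hypothetical search; no data source has that name) still prints the Apl1DS line. Requiring a space before name= avoids this, assuming attributes are separated by single spaces as they are in this file.

```shell
#!/bin/sh
# One line taken from the posted file.
line='<managed-data-source login-timeout="15" connection-pool-name="Apl1PoolDS" jndi-name="jdbc/Apl1PoolDS" name="Apl1DS"/>'

# Substring match: connection-pool-name="Apl1PoolDS" contains name="Apl1PoolDS",
# so this prints the line although the element's own name is Apl1DS.
printf '%s\n' "$line" | sed -n '/<managed-data-source/{/name="Apl1PoolDS"/p;}'

# With a leading space only the name attribute itself can match:
printf '%s\n' "$line" | sed -n '/<managed-data-source/{/ name="Apl1PoolDS"/p;}'   # prints nothing
printf '%s\n' "$line" | sed -n '/<managed-data-source/{/ name="Apl1DS"/p;}'       # prints the line
```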

The following are a lot harder to do with sed, because not all entries have the same number of lines (do have a look at kurumi's suggestion!).
The answers below use GNU sed's addr,+N form and assume that every <connection-pool ... entry has 7 lines (not true for the first two in your example!!)
Quote:

- Search for a connection-pool by name (e.g. name="Apl1PoolDS")
sed -n '/<connection-pool name="Ap13PoolDS"/,+6p' inputfile

Quote:

- Look for all connection-pool
sed -n '/<connection-pool /,+6p' inputfile
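Since every connection-pool element in this file spans several lines and ends with its own </connection-pool> line, an ordinary range with that closing tag as the end address should cope with pools of any length (including the 5-line ones at the top of the file) and needs no GNU-only +N. A sketch on a shortened, hypothetical excerpt of the file with most attributes trimmed:

```shell
#!/bin/sh
# Shortened excerpt of the posted file (attributes trimmed): a 5-line pool
# followed by a 7-line pool.
input='<connection-pool name="Apl1PoolDS">
<connection-factory user="Apl1">
<property name="v$session.program" value="Apl1PoolDS"/>
</connection-factory>
</connection-pool>
<connection-pool name="Apl3PoolDS">
<connection-factory user="Apl3">
<property name="connectionCacheName" value="Apl3PoolDS"/>
<property name="connectionCachingEnabled" value="true"/>
<property name="fastConnectionFailoverEnabled" value="true"/>
</connection-factory>
</connection-pool>'

# All pools: each range runs from an opening tag to the next closing tag.
printf '%s\n' "$input" | sed -n '/<connection-pool /,/<\/connection-pool>/p'

# One pool by name: prints only the five lines of Apl1PoolDS.
printf '%s\n' "$input" | sed -n '/<connection-pool name="Apl1PoolDS"/,/<\/connection-pool>/p'
```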

Parsing XML (and HTML) files isn't easy because the layout can vary a lot. Perl and Ruby do have XML parsers that could help.

Hope this helps.

Felipe 09-27-2010 07:48 AM

Thank you.

I suppose I'll have to use a parser: right now I only need to filter, but eventually I'll have to modify the data.

The problem is that I'm writing shell scripts and don't know Perl or Ruby. I'll try to find a simple XML parser that I can use from shell scripts.

Regards

GrapefruiTgirl 09-27-2010 07:58 AM

At a glance, my opinion is that gawk (awk) would be better suited to a task like this, even though as you see, so far sed is doing the job. As mentioned/implied above, parsing markup languages can be tricky.
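A rough sketch of the awk idea, assuming any POSIX awk (gawk, mawk, busybox awk) and that no attribute value contains a ">" (true for the file above). Setting the record separator RS to ">" makes each tag one record however many lines it spans, so a multi-line <managed-data-source ...> comes out as a single record. This is plain text splitting, not real XML parsing; the input is a hypothetical excerpt.

```shell
#!/bin/sh
# Hypothetical excerpt: a single-line element, a connection-pool line,
# and a managed-data-source split over several lines.
input='<managed-data-source login-timeout="15" name="mds4DS"/>
<connection-pool name="mds2PoolDS">
<managed-data-source
login-timeout="15"
name="PrrrDS"/>'

# RS=">": each record is the text up to the next ">", so newlines inside a
# tag are ordinary characters and a tag split over lines is one record.
printf '%s\n' "$input" | awk 'BEGIN { RS = ">" }
/<managed-data-source/ {
    sub(/^[ \t\n]+/, "")   # drop the newline left over from the previous tag
    gsub(/\n/, " ")        # join a multi-line tag onto one line
    print $0 ">"           # put back the ">" consumed as the separator
}'
```

Adding a condition such as / name="PrrrDS"/ to the pattern would select a single data source by name.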

Just in case you might be interested, I figured I'd point you to `xgawk` or `XMLgawk`, which is just what it sounds like: awk, for parsing XML; here's the homepage: http://home.vrweb.de/~juergen.kahrs/gawk/XML/

Note that I haven't played with it in some time, but it worked well when I last tried it, and it can only have gotten better since. :)

Good luck!


All times are GMT -5.