LinuxQuestions.org
Old 02-15-2012, 03:42 AM   #1
tlin
LQ Newbie
 
Registered: Feb 2012
Posts: 3

Rep: Reputation: Disabled
Question extract part of a string and reinsert back into original


Hi,

I'm new to LQ and bash programming. Currently I'm faced with this task:

The file name is abc.xml.
The contents of the file look like this:
...
...
...
<some text1 value="WWXXRRR222BBBCC" some1 moretext2/>
<some text2 value="YYXXRRR444KKKCC" some4 moretext7/>
<some text1 value="WWXXYYY222BBBTT" some6 moretext5/>
<some text2 value="GGG66RRIIIBBBCC" some8 moretext0/>
<some text1 value="WWXXRRR222VVVEE" some9 moretext3/>
...
...
...

The value (WWXXRRR222BBBCC) can be completely random but has a fixed length.
The some# moretext# parts following the value are also random, but always start with (" some).
The string before the value always starts with (value=").
I need to extract the value, double it (WWXXRRR222BBBCC becomes WWXXRRR222BBBCCWWXXRRR222BBBCC), and insert it back in exactly the same place.

I've got to the point where I can produce a list of just the values, but I don't know how to process the file one line at a time and put each doubled value back into the original file.
I used this code:

grep "<Key parity" abc.xml | sed -e 's/.*value=\"//' -e 's/\" from.*//'

Thanks in advance!
 
Old 02-15-2012, 03:54 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 7,514
Blog Entries: 1

Rep: Reputation: 1140
Hi,

Give this a try:
Code:
sed 's/"\(.*\)"/"\1\1"/' infile
Example run:
Code:
$ sed 's/"\(.*\)"/"\1\1"/' infile
<some text1 value="WWXXRRR222BBBCCWWXXRRR222BBBCC" some1 moretext2/>
<some text2 value="YYXXRRR444KKKCCYYXXRRR444KKKCC" some4 moretext7/>
<some text1 value="WWXXYYY222BBBTTWWXXYYY222BBBTT" some6 moretext5/>
<some text2 value="GGG66RRIIIBBBCCGGG66RRIIIBBBCC" some8 moretext0/>
<some text1 value="WWXXRRR222VVVEEWWXXRRR222VVVEE" some9 moretext3/>
The above uses a back reference: whatever matches between \( and \) in the search part can be inserted as \1 in the replacement part.
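
To see the back reference at work on a single line, here is a minimal sketch. It assumes a POSIX shell and any standard sed; the variable names line and doubled are made up for illustration and are not part of the thread.

```shell
# Sample line taken from the first example in this thread.
line='<some text1 value="WWXXRRR222BBBCC" some1 moretext2/>'

# \( ... \) captures everything between the first and last double quote
# on the line; \1 in the replacement writes that captured text back, twice.
doubled=$(printf '%s\n' "$line" | sed 's/"\(.*\)"/"\1\1"/')

printf '%s\n' "$doubled"
```

Because .* is greedy, this only behaves as intended when the line contains exactly one pair of double quotes.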

PS: If this needs to be done in-place (changes are made in the original file), change the command to
Code:
sed -i 's/"\(.*\)"/"\1\1"/' infile
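A more cautious variation is to give -i a suffix so that sed keeps a copy of the original file. The sketch below assumes a sed that supports -i with an attached suffix (GNU and BSD sed both do); the scratch file created with mktemp is only a stand-in for infile.

```shell
# Hypothetical scratch file standing in for "infile".
infile=$(mktemp)
printf '%s\n' '<some text1 value="WWXXRRR222BBBCC" some1 moretext2/>' > "$infile"

# -i.bak edits the file in place and keeps the untouched original
# next to it as "$infile.bak".
sed -i.bak 's/"\(.*\)"/"\1\1"/' "$infile"

edited=$(cat "$infile")
backup=$(cat "$infile.bak")
rm -f "$infile" "$infile.bak"

printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"
```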
Hope this helps.
 
Old 02-15-2012, 05:42 AM   #3
tlin
LQ Newbie
 
Registered: Feb 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Thanks druuna for the quick reply!

Your solution works for my example, but when I run it on my actual file I don't get the result I'm looking for.

I guess it's because my actual file has a lot of double quotes (") in it.

It looks more like this:
<some text="1" value="WWXXRRR222BBBCC" some="ABC" moretext="1"/>
<some text="2" value="YYXXRRR444KKKCC" some="3" moretext="4"/>
<some text="1" value="WWXXYYY222BBBTT" some="7" moretext="5"/>
<some text="2" value="GGG66RRIIIBBBCC" some="8" moretext="0"/>
<some text="1" value="WWXXRRR222VVVEE" some="6" moretext="3"/>

There is also some text before and after these lines that I don't want to change at all.

Thanks!
 
Old 02-15-2012, 06:00 AM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 7,514
Blog Entries: 1

Rep: Reputation: 1140
Hi,

Assuming you did post a relevant example this time:
Code:
sed 's/value="\(.*\)" some/value="\1\1" some/' infile
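The command above relies on the greedy .* stopping at the last (" some) on the line. A possible variation, offered as a sketch rather than as the poster's method, is to capture with [^"]* so the match cannot run past the closing quote of the value attribute. The variable names line and result are made up for illustration.

```shell
# Sample line taken from the second example in this thread.
line='<some text="1" value="WWXXRRR222BBBCC" some="ABC" moretext="1"/>'

# [^"]* matches any run of characters that are not double quotes, so the
# capture ends at the closing quote of the value attribute no matter how
# many other quoted attributes follow on the same line.
result=$(printf '%s\n' "$line" | sed 's/value="\([^"]*\)"/value="\1\1"/')

printf '%s\n' "$result"
```

Note that this pattern would also match an attribute whose name merely ends in value, so it is only safe when no such attribute appears earlier on the line.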
Quote:
There is also some text before this and after this that I don't want to change at all.
You need to provide a relevant example; this doesn't help. What comes before and after?

Hope this helps.

Last edited by druuna; 02-15-2012 at 06:01 AM.
 
Old 02-15-2012, 08:43 AM   #5
tlin
LQ Newbie
 
Registered: Feb 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Sorry for giving a bad example to start with.
I figured out how to do everything using your code as a starting point.

Here is what I ended up writing, which works for me:

Code:
sed '/^\s*<some text=/s/"\([^"]*\)"\([^"]*\)"\([^"]*\)"/"\1"\2"\3\3"/' abc.xml
I want to change only lines starting with <some text=

so the result ends up looking like this:

Code:
sed '/^\s*<some text=/s/"\([^"]*\)"\([^"]*\)"\([^"]*\)"/"\1"\2"\3\3"/' abc.xml

<some other text="6" something else="AAYURYE" more text="YYUU" number="234332">  #Something before (no change)
<some text="1" value="WWXXRRR222BBBCCWWXXRRR222BBBCC" some="ABC" moretext="1"/>  #Changed because it starts with <some text=
<some text="2" value="YYXXRRR444KKKCCYYXXRRR444KKKCC" some="3" moretext="4"/>    #Changed because it starts with <some text=
<some text="1" value="WWXXYYY222BBBTTWWXXYYY222BBBTT" some="7" moretext="5"/>    #Changed because it starts with <some text=
<some text="2" value="GGG66RRIIIBBBCCGGG66RRIIIBBBCC" some="8" moretext="0"/>    #Changed because it starts with <some text=
<some text="1" value="WWXXRRR222VVVEEWWXXRRR222VVVEE" some="6" moretext="3"/>    #Changed because it starts with <some text=
<some other text="8" something else="AAYTTBB" more text="IIEE" number="278462">  #Something after (no change)
Hopefully this will help someone else.
Note: I was told this sed usage may not work on other Unix systems but should be fine on Linux (\s is a GNU sed extension; the POSIX equivalent is [[:space:]]).
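
For readers on a non-GNU sed, a portable variant might look like the sketch below. It swaps \s for the POSIX character class [[:space:]] and otherwise keeps the command from this post unchanged. The temporary file and the variable name portable are assumptions made for illustration; the two sample lines are taken from the thread, with leading spaces added to the second one to exercise the whitespace match.

```shell
# Hypothetical sample file standing in for abc.xml.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<some other text="6" something else="AAYURYE" more text="YYUU" number="234332">
  <some text="1" value="WWXXRRR222BBBCC" some="ABC" moretext="1"/>
EOF

# The address /^[[:space:]]*<some text=/ limits the substitution to lines
# that start (after optional whitespace) with <some text=.
# The substitution captures the first three quote-delimited pieces and
# writes the third one (the value) back twice.
portable=$(sed '/^[[:space:]]*<some text=/s/"\([^"]*\)"\([^"]*\)"\([^"]*\)"/"\1"\2"\3\3"/' "$tmp")
rm -f "$tmp"

printf '%s\n' "$portable"
```

As in the original command, this depends on value being the second attribute on the line.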

Thanks druuna!

Last edited by tlin; 02-15-2012 at 08:44 AM.
 
Old 02-15-2012, 01:38 PM   #6
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 1,421

Rep: Reputation: 360
Alternate solution with xmlstarlet:
Code:
xml ed --inplace -u '//Key[@parity]/@value' -x 'concat(.,.)' abc.xml
 
  

