LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
05-12-2012, 02:16 AM   #1
nbkisnz
LQ Newbie
 
Registered: May 2012
Posts: 1

Search and replace Pattern preceding another pattern


Hi All,

I have an XML file which should ideally contain records of the format "<EMPLOYEE><EMP_NAME>ABC</EMP_NAME><EMP_ID>XY21Z</EMP_ID></EMPLOYEE><DEPARTMENT><DEPT_NAME>HR</DEPT_NAME></DEPARTMENT>"

But due to some issues in the ETL, some records are getting created in the following format:
"<EMPLOYEE><EMP_NAME>ABC</EMP_NAME><EMP_ID>XY21Z<DEPARTMENT><DEPT_NAME>HR</DEPT_NAME></DEPARTMENT>"

Due to certain limitations in the ETL process, I have to rectify such records using shell programming. Is there a command I can use to find all occurrences of the "<DEPARTMENT>" tag that aren't preceded by "</EMPLOYEE>", so that I can replace those occurrences with the correct format?

Thanks in advance
Sriram
 
05-12-2012, 03:34 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

If I understand correctly, maybe something like:
Code:
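# print only the lines where "/EMPLOYEE" is NOT followed by "DEPARTMENT" (one record per line assumed)
sed -n '/\/EMPLOYEE.*DEPARTMENT/!p' file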
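A minimal variation on the same filter, assuming each record sits on its own line: it reports the line numbers of records that contain a <DEPARTMENT> tag not directly preceded by </EMPLOYEE>, and ignores lines with no <DEPARTMENT> at all. The function name find_broken and the file name records.xml are hypothetical; the two sample lines are the correct and broken records from the first post.

```shell
# Hypothetical sample file: line 1 is a correct record, line 2 a broken one.
cat > records.xml <<'EOF'
<EMPLOYEE><EMP_NAME>ABC</EMP_NAME><EMP_ID>XY21Z</EMP_ID></EMPLOYEE><DEPARTMENT><DEPT_NAME>HR</DEPT_NAME></DEPARTMENT>
<EMPLOYEE><EMP_NAME>ABC</EMP_NAME><EMP_ID>XY21Z<DEPARTMENT><DEPT_NAME>HR</DEPT_NAME></DEPARTMENT>
EOF

# find_broken FILE: print "lineno:record" for every line that has a <DEPARTMENT>
# tag but not the correct "</EMPLOYEE><DEPARTMENT>" sequence.
find_broken() {
    grep -n '<DEPARTMENT>' "$1" | grep -v '</EMPLOYEE><DEPARTMENT>'
}

# Prints the broken record, prefixed with "2:".
find_broken records.xml
```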
 
05-12-2012, 07:30 AM   #3
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Please use [CODE][/CODE] tags around the data to make it more readable. See:
Quote:
Originally Posted by nbkisnz
Code:
<EMPLOYEE>
 <EMP_NAME>ABC</EMP_NAME>
 <EMP_ID>XY21Z</EMP_ID>
</EMPLOYEE>
<DEPARTMENT>
 <DEPT_NAME>HR</DEPT_NAME>
</DEPARTMENT>
But due to some issues in the ETL, some records are getting created in the following format:
Code:
<EMPLOYEE>
 <EMP_NAME>ABC</EMP_NAME>
 <EMP_ID>XY21Z
  <DEPARTMENT>
   <DEPT_NAME>HR</DEPT_NAME>
  </DEPARTMENT>
Are those records really that broken? Are the <EMP_ID> and <EMPLOYEE> elements never closed at all?

It is not difficult to fix things like this using awk with < as the record separator and > as the field separator. You just need a stack of the currently open elements, which you manipulate to correct the structure.
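A minimal sketch of that awk approach, under assumptions that are not from the thread: no comments or CDATA sections, no literal < or > inside text, and the only repair needed is to close a still-open <EMPLOYEE> (plus anything open inside it, such as <EMP_ID>) when a <DEPARTMENT> starts. The function name fix_records is hypothetical.

```shell
# fix_records FILE (hypothetical name): stream FILE through awk, splitting on "<" so that
# each record is "TAG>text", and keep a stack of the elements that are currently open.
# Assumed repair rule: when <DEPARTMENT> starts while an <EMPLOYEE> is still open, close
# that <EMPLOYEE> and everything opened inside it (e.g. <EMP_ID>) first.
fix_records() {
    awk '
    BEGIN { RS = "<"; FS = ">"; depth = 0 }
    NR == 1 { printf "%s", $0; next }       # text before the first "<"
    {
        tag = $1
        c = substr(tag, 1, 1)
        if (c == "/") {
            # closing tag: pop it if it matches the innermost open element
            if (depth > 0 && stack[depth] == substr(tag, 2)) depth--
        } else if (c != "?" && c != "!" && substr(tag, length(tag), 1) != "/") {
            # opening tag (not a declaration, comment or self-closing tag)
            name = tag
            sub(/[ \t\n].*/, "", name)      # drop any attributes
            if (name == "DEPARTMENT") {
                i = depth                   # look for the innermost open EMPLOYEE
                while (i > 0 && stack[i] != "EMPLOYEE") i--
                # close it and everything inside it, innermost first
                while (i > 0 && depth >= i) printf "</%s>", stack[depth--]
            }
            stack[++depth] = name
        }
        printf "<%s", $0                    # re-emit the tag and the text after it unchanged
    }
    END { while (depth > 0) printf "</%s>", stack[depth--] }   # close anything still open at EOF
    ' "$1"
}
```

Usage: fix_records broken.xml > fixed.xml. A record that is already correct never has an open <EMPLOYEE> when its <DEPARTMENT> starts, so it passes through unchanged.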

However, it seems to me both your records are broken. The first one uses the braindead Microsoft approach, where sibling nodes apply to each other. (Do not expect that kind of data model to survive standard XML tools: they expect elements to be "containers", where an element applies only to its contents, the elements within it.) The latter one leaves elements open, and is therefore not even well-formed XML.

Could you verify exactly what needs to be done to fix the broken records, and whether the correct records are formatted as you first displayed?
 
05-13-2012, 01:50 PM   #4
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

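If newlines, tabs, and other eye-candy whitespace don't matter, this seems to do the job:
Code:
# Add the missing closing tags only where <DEPARTMENT> directly follows text (not a closing tag),
# so records that are already correct are left unchanged.
sed 's/\([^>]\)<DEPARTMENT>/\1<\/EMP_ID><\/EMPLOYEE><DEPARTMENT>/g' LQnbkisnz.xml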
--- rod.
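A minimal sketch for trying the substitution on a file that mixes correct and broken records, assuming one record per line and no extra whitespace (mixed.xml and fixed.xml are hypothetical names; the sample lines are the two records from the first post). The \([^>]\) group makes sed fire only when <DEPARTMENT> directly follows text rather than a closing tag, so records that already contain </EMPLOYEE> are left alone; a plain s/<DEPARTMENT>/.../g would add a second set of closing tags to those. The final check counts lines that are still broken and should print 0.

```shell
# Hypothetical sample input: one correct record, one broken record.
cat > mixed.xml <<'EOF'
<EMPLOYEE><EMP_NAME>ABC</EMP_NAME><EMP_ID>XY21Z</EMP_ID></EMPLOYEE><DEPARTMENT><DEPT_NAME>HR</DEPT_NAME></DEPARTMENT>
<EMPLOYEE><EMP_NAME>ABC</EMP_NAME><EMP_ID>XY21Z<DEPARTMENT><DEPT_NAME>HR</DEPT_NAME></DEPARTMENT>
EOF

# Add the missing closing tags only where <DEPARTMENT> follows text, not a closing tag.
sed 's/\([^>]\)<DEPARTMENT>/\1<\/EMP_ID><\/EMPLOYEE><DEPARTMENT>/g' mixed.xml > fixed.xml

# Count lines that still have a <DEPARTMENT> not directly preceded by </EMPLOYEE>.
grep '<DEPARTMENT>' fixed.xml | grep -v '</EMPLOYEE><DEPARTMENT>' | wc -l | tr -d ' '
```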
 
  


All times are GMT -5.
