Please use [CODE][/CODE] tags around the data to make it more readable. See:
Quote:
Originally Posted by nbkisnz
Code:
<EMPLOYEE>
<EMP_NAME>ABC</EMP_NAME>
<EMP_ID>XY21Z</EMP_ID>
</EMPLOYEE>
<DEPARTMENT>
<DEPT_NAME>HR</DEPT_NAME>
</DEPARTMENT>
But due to some issues in the ETL, some records are getting created in the following format:
Code:
<EMPLOYEE>
<EMP_NAME>ABC</EMP_NAME>
<EMP_ID>XY21Z
<DEPARTMENT>
<DEPT_NAME>HR</DEPT_NAME>
</DEPARTMENT>
Are those records really that broken? Are the <EMP_ID> and <EMPLOYEE> elements never closed at all?
It is not difficult to fix things like this using awk, with < as the record separator and > as the field separator. You just need a stack describing the currently open elements, and then manipulate that stack to correct the structure.
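To make the stack idea concrete, here is a rough sketch in Python rather than awk. It is only an illustration under assumptions that still need confirming, not a definitive fix: it assumes EMPLOYEE and DEPARTMENT are top-level siblings that never nest, that an element which has received text is a leaf and must be closed before another element opens, and that tags carry no attributes. The names repair and TOP_LEVEL are made up for this example.

```python
import re

# Assumption: these element names are top-level siblings and never nest.
TOP_LEVEL = {"EMPLOYEE", "DEPARTMENT"}

# Matches an attribute-free open/close tag, or a run of text between tags.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")


def repair(text, top_level=TOP_LEVEL):
    """Close elements that were left open, using a stack of open names."""
    out = []
    stack = []        # names of currently open elements, innermost last
    has_text = False  # has the innermost open element received text?

    def close_innermost():
        out.append("</%s>" % stack.pop())

    for match in TOKEN.finditer(text):
        closing, name, chars = match.groups()
        if chars is not None:
            if chars.strip():
                out.append(chars.strip())
                has_text = True
            continue
        if closing:
            if name not in stack:
                continue  # stray close tag with no matching open: drop it
            while stack[-1] != name:
                close_innermost()  # close anything left open inside it
            close_innermost()
            has_text = False
        else:
            if stack and has_text:
                close_innermost()  # assumed leaf: close before opening more
            if name in top_level:
                while stack:
                    close_innermost()  # a new record closes the previous one
            stack.append(name)
            out.append("<%s>" % name)
            has_text = False
    while stack:
        close_innermost()  # close whatever is still open at end of input
    return "".join(out)


broken = """<EMPLOYEE>
<EMP_NAME>ABC</EMP_NAME>
<EMP_ID>XY21Z
<DEPARTMENT>
<DEPT_NAME>HR</DEPT_NAME>
</DEPARTMENT>
"""
print(repair(broken))
# <EMPLOYEE><EMP_NAME>ABC</EMP_NAME><EMP_ID>XY21Z</EMP_ID></EMPLOYEE><DEPARTMENT><DEPT_NAME>HR</DEPT_NAME></DEPARTMENT>
```

The output is written without line breaks to keep the sketch short. The same logic carries over to awk once the rules for which elements may nest are settled.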
However, it seems to me that both of your records are broken. The first one uses the braindead Microsoft approach, where sibling nodes apply to each other. (Do not expect that kind of data model to survive any standard XML tools: they expect elements to be "containers", where an element applies only to its contents, that is, the elements within it.) The second one leaves elements open, and is therefore not even well-formed XML.
Could you verify exactly what needs to be done to fix the broken records, and whether the correct records are formatted as you first displayed?