LinuxQuestions.org
05-07-2014, 01:57 AM   #1
azheruddin
Splitting file on content basis


Dear all,

I have a file of around 20 MB and want to split it by content, using awk or the split utility.

I have already split it by size, but the resulting files are of no use, so I want to split it by content instead.

I need to split this file by record and add the opening and closing tags to each of the resulting files.
For example, the original file has these opening tags:
<?xml version="1.0" encoding="UTF-8"?>
<ns0:ABCFile xmlns:ns0="urn:PQR:OTHERS:WXYZ:HELLOTEST">
<ABCFileHeader>
<RecordType>01</RecordType>
<Date>20140405</Date>
<TotalRecord>46048</TotalRecord> // 46048/4 = 11512 records in each file
</ABCFileHeader>
.
.
An actual record looks like this:
<ABRecordDetail>
<RecordType>02</RecordType>
<LineItem>0000000002</LineItem>
<CompanyCode>PQR</CompanyCode>
<ABDate>20130901</ABDate>
<CurrencyKey>PVR</CurrencyKey>
<AmountInDC>0</AmountInDC>
<AmountInLC>0</AmountInLC>
<CostCenter>BBN</CostCenter>
<FType>DTH</FType>
<QNumber>VBR3581 </QNumber>
<SNumber>9kBQ</SNumber>
<VNumber>BBGRB</VNumber>
<Assignment>0945</Assignment>
</ABRecordDetail>

The 15 lines above make up one record. The original file has 46048 such records, so I want to split it so that each file holds 46048/4 = 11512 records, plus the opening and closing tags.

Opening tags:

<?xml version="1.0" encoding="UTF-8"?>
<ns0:ABCFile xmlns:ns0="urn:PQR:OTHERS:WXYZ:HELLOTEST">
<ABCFileHeader>
<RecordType>01</RecordType>
<Date>20140405</Date>
<TotalRecord>46048</TotalRecord> // 46048/4 = 11512 records in each file, so in each split file this tag would be <TotalRecord>11512</TotalRecord>
</ABCFileHeader>

Closing tag:
</ns0:ABCFile>

I hope that is clear. Put simply, the file needs to be split by content (by record, i.e. every 15 lines), with the fixed tags added at the top and bottom of each file.
 
05-07-2014, 09:38 AM   #2
sag47
Your best bet is to use a language that has an XML parsing library. Use an option parsing library to take options (such as how many splits or the name of the output file) and then output the split files (e.g. file001.xml file002.xml etc). You're not going to get a decent solution unless you use real parsing.
 
05-07-2014, 10:28 AM   #3
pan64
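Actually, you can try setting the record separator to </ABRecordDetail> and printing each record into a file whose name is built from the record number (NR) and the number of records per file:
Code:
awk -v per=11512 '
      BEGIN { RS = "</ABRecordDetail>" }    # a multi-character RS needs gawk or mawk
      /<ABRecordDetail>/ {                   # skip the leftover text after the last record
        # records 1..per go to file1.xml, the next per to file2.xml, and so on
        filename = "file" (int((NR - 1) / per) + 1) ".xml"
        printf "%s%s", $0, RS > filename     # RS is stripped from $0, so put the tag back
      }
' inputfile
But it was not tested, and it only splits the records: the header ends up in the first file only, and the closing </ns0:ABCFile> tag is not written to any file.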
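A line-based variant could also write the header and the closing tag into every file. The following is only a rough sketch under some assumptions: every tag sits on its own line as in the sample from post #1, nothing but the closing </ns0:ABCFile> follows the last record, and the names per, prefix, sample.xml and file001.xml are made up for the illustration. It buffers one file's worth of records so that <TotalRecord> can be rewritten to the number of records actually written. The first part only builds a small 4-record test file; for the real file you would pass per=11512 and the real file name.

```shell
# Build a tiny test input: the header from post #1 plus 4 shortened records.
{
  printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' \
    '<ns0:ABCFile xmlns:ns0="urn:PQR:OTHERS:WXYZ:HELLOTEST">' \
    '<ABCFileHeader>' '<RecordType>01</RecordType>' '<Date>20140405</Date>' \
    '<TotalRecord>4</TotalRecord>' '</ABCFileHeader>'
  for i in 1 2 3 4; do
    printf '<ABRecordDetail>\n<RecordType>02</RecordType>\n<LineItem>%010d</LineItem>\n</ABRecordDetail>\n' "$i"
  done
  echo '</ns0:ABCFile>'
} > sample.xml

# Split into files of "per" records each: file001.xml, file002.xml, ...
awk -v per=2 -v prefix=file '
BEGIN { footer = "</ns0:ABCFile>" }

# Write the buffered records, wrapped in header and footer, to the next file.
function flush_part(   name, hdr) {
    if (count == 0) return
    name = sprintf("%s%03d.xml", prefix, ++nfile)
    hdr = header
    # put the number of records in this file into the header copy
    sub(/<TotalRecord>[0-9]+<\/TotalRecord>/, "<TotalRecord>" count "</TotalRecord>", hdr)
    printf "%s%s%s\n", hdr, body, footer > name
    close(name)
    body = ""; count = 0
}

{
    # everything before the first record is the header
    if (!seen && $0 !~ /<ABRecordDetail>/) { header = header $0 "\n"; next }
    seen = 1
    # the closing root tag is re-added per file, so drop the original one
    if ($0 ~ /<\/ns0:ABCFile>/) next
    body = body $0 "\n"
    # a complete record has been buffered; flush once "per" records are collected
    if ($0 ~ /<\/ABRecordDetail>/ && ++count == per) flush_part()
}

END { flush_part() }   # write any remaining records
' sample.xml
```

If the record count does not divide evenly, the last file simply gets the remainder and its <TotalRecord> reflects that. This is still plain text matching, not XML parsing, so the advice in post #2 applies if the layout of the file ever changes.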

Last edited by pan64; 05-07-2014 at 10:29 AM.
 
  

