LinuxQuestions.org
Old 08-31-2004, 11:40 AM   #1
anirudh
Member
 
Registered: Aug 2004
Location: bangalore india
Posts: 50

Rep: Reputation: 15
Splitting A Large XML File


Hi,
I have a 200 MB XML file containing 19,845 records, each with a different number of lines.
I want to separate the records and store each one in its own new file (one new file per record). Can anybody help me do this?
How can I do this in Linux, or in Java or C?
I tried using split in Linux, but it cuts the file after a fixed default number of lines rather than at the record boundaries, which is what I need.

Please help.
 
Old 08-31-2004, 02:56 PM   #2
Cedrik
Senior Member
 
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140

Rep: Reputation: 244
What do your records look like?
 
Old 08-31-2004, 03:10 PM   #3
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
awk, Perl or Python all lend themselves to
such tasks :)


Cheers,
Tink
 
Old 09-01-2004, 06:12 AM   #4
anirudh
Member
 
Registered: Aug 2004
Location: bangalore india
Posts: 50

Original Poster
Rep: Reputation: 15
Each of my records looks like the code given below:


<party xmlns:defns="http://INDL060BB:8080/home/SCOTT/AIG/xsd" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:ns5="http://INDL060BB:8080/home/SCOTT/AIG/xsd" xmlns:ns4="http://INDL060BB:8080/home/SCOTT/AIG/xsd/" xmlns:ns3="http://INDL060BB:8080/home/SCOTT/AIG/xsd/" xmlns:ns2="http://INDL060BB:8080/home/SCOTT/AIG/xsd" xmlns:ns1="http://INDL060BB:8080/home/SCOTT/AIG/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://INDL060BB:8080/home/SCOTT/AIG/xsd/party.xsd">
<partyGdr>
<addressGdr>
<address />
<addressType>Physical</addressType>
</addressGdr>
<aigLegalEntity>
<aigLegalEntityType>AIGC</aigLegalEntityType>
<corporateSplit>
<domesticOrForeign>Foreign</domesticOrForeign>
<generalOrLife>General</generalOrLife>
</corporateSplit>
<fcrClassificationCode>ForeignOffsGenEurB</fcrClassificationCode>
</aigLegalEntity>
<currentOwner>CICADA</currentOwner>
<industryClassification>
<activity>
<activityType activityTypeScheme="" />
<activityCode />
</activity>
</industryClassification>
<lastUpdate>
<timestamp>2004-07-02T08:47:58.000000</timestamp>
<user>magellan</user>
</lastUpdate>
<names>
<sequenceNumber>1</sequenceNumber>
<legalName>American International Underwriters Overseas Association</legalName>
<longName>American International Underwriters Overseas Association</longName>
<shortName>American International Underwriters Overseas Association</shortName>
</names>
<originalOwner>CICADA</originalOwner>
<parentage>
<immediate>
<partyId>AC0000755</partyId>
<partyName>American International Group, Inc.</partyName>
<providerAssignedId />
</immediate>
</parentage>
<partyType partyTypeScheme="Party">AIG LEGAL ENTITY</partyType>
<processingDirective>MOD</processingDirective>
<processingDirectiveIssuer>GDR</processingDirectiveIssuer>
<processingDirectiveDate>2004-04-23</processingDirectiveDate>
<recordStatus>Active</recordStatus>
<sourceSystem>
<aigClientId>AIUOA</aigClientId>
<aigClientParentId>AIG</aigClientParentId>
<counterPartyName>American International Underwriters Overseas Association</counterPartyName>
<internalId>D326C</internalId>
<lastUpdate>
<timestamp>2004-07-02T08:47:58.000000</timestamp>
<user>magellan</user>
</lastUpdate>
<reportingDate>2003-03-31</reportingDate>
<systemId>CPP</systemId>
</sourceSystem>
</partyGdr>
<partyId>AC0000169</partyId>
<partyName>American International Underwriters Overseas Association</partyName>
</party>
 
Old 09-02-2004, 01:25 PM   #5
anirudh
Member
 
Registered: Aug 2004
Location: bangalore india
Posts: 50

Original Poster
Rep: Reputation: 15
SAX in Java

Hi there,
I found that SAX can be used to do this splitting, but I don't know Java. Can anybody help me do this? Each record looks like the XML above.
Please.
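
SAX is an event-driven interface: the parser reports start tags, text and end tags one at a time, so a 200 MB file never has to be held in memory. The post asks about Java; as a rough sketch of the same idea, the version below uses Python's standard xml.sax module instead of Java's SAX API. It assumes the <party> records sit inside a single wrapping root element (the thread does not show one), and the names RecordSplitter, split_with_sax and the result_ prefix are illustrative only.

```python
import os
import xml.sax
from xml.sax.saxutils import XMLGenerator


class RecordSplitter(xml.sax.ContentHandler):
    """Write every <party> element, with everything inside it, to its own file."""

    def __init__(self, output_dir, prefix="result_", record_tag="party"):
        super().__init__()
        self.output_dir = output_dir
        self.prefix = prefix
        self.record_tag = record_tag
        self.count = 0      # records written so far
        self.depth = 0      # element nesting depth inside the current record
        self.stream = None  # output file of the current record, if any
        self.writer = None  # XMLGenerator that serialises events to the stream

    def startElement(self, name, attrs):
        if self.writer is None:
            if name != self.record_tag:
                return  # outside any record, e.g. the assumed root element
            path = os.path.join(self.output_dir, f"{self.prefix}{self.count}")
            self.stream = open(path, "w", encoding="utf-8")
            self.writer = XMLGenerator(self.stream, encoding="utf-8")
            self.count += 1
        self.depth += 1
        self.writer.startElement(name, attrs)

    def characters(self, content):
        if self.writer is not None:
            self.writer.characters(content)

    def endElement(self, name):
        if self.writer is None:
            return
        self.writer.endElement(name)
        self.depth -= 1
        if self.depth == 0:  # the record's own closing tag was just written
            self.stream.write("\n")
            self.stream.close()
            self.stream = self.writer = None


def split_with_sax(xml_path, output_dir):
    """Stream xml_path through the handler and return the number of records."""
    handler = RecordSplitter(output_dir)
    xml.sax.parse(xml_path, handler)
    return handler.count
```

Calling split_with_sax("records.xml", "/home/me/output") would write result_0, result_1 and so on into an existing output directory. Because the records are re-serialised from parser events, the output is equivalent XML but not byte-identical to the input: for example, an empty element such as <address /> comes out as <address></address>.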
 
Old 09-02-2004, 06:35 PM   #6
Cedrik
Senior Member
 
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140

Rep: Reputation: 244
Try this code (in Perl):

Code:
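#!/usr/bin/perl
use strict;
use warnings;

# Configure these before running.
my $xml_file    = "records.xml";
my $output_dir  = "/home/me/output";
my $file_prefix = "result_";
my $open        = 0;    # true while inside a <party> record
my $count       = 0;    # number of records found so far
my $result;             # handle of the output file being written

open my $xml, "<", $xml_file or die "can't open $xml_file: $!";

while (<$xml>) {
    if (/^<party\sxmlns/) {
        # Opening tag of a record: start a new output file.
        print "New record found\nCreating $file_prefix$count\n";
        open $result, ">", "$output_dir/$file_prefix$count"
                or die "Error : can't open $output_dir/$file_prefix$count: $!";
        print $result $_;
        $open = 1;
        $count++;

    } elsif (/^<\/party>/) {
        # Closing tag: write it and finish the current file.
        if ($open) {
            print $result $_;
            close $result;
            $open = 0;
        }
    } elsif ($open) {
        # Any other line inside a record is copied unchanged.
        print $result $_;
    }
}
close $xml;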
Configure the variables inside, then chmod +x the script and run it with ./

Last edited by Cedrik; 09-02-2004 at 06:42 PM.
 
Old 09-03-2004, 04:15 AM   #7
anirudh
Member
 
Registered: Aug 2004
Location: bangalore india
Posts: 50

Original Poster
Rep: Reputation: 15
Hi Cedrik, thanks a lot, the program worked.
 
Old 09-03-2004, 04:28 AM   #8
Cedrik
Senior Member
 
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140

Rep: Reputation: 244
Good. You may want to learn a little Perl to adapt the script to your needs; for example, it could take the XML file and output directory as arguments rather than hard-coding them...
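
As a rough illustration of that suggestion, here is the same line-based logic with the input file and output directory passed in as arguments. It is sketched in Python (one of the languages suggested earlier in the thread) rather than Perl, and it is not Cedrik's script: the names split_records and main, the script name split_records.py and the UTF-8 encoding are assumptions. Like the Perl version, it relies on each record's opening and closing <party> tags starting at the beginning of a line.

```python
import os
import re
import sys

# Same patterns as the Perl script: a record starts with "<party xmlns..."
# and ends with "</party>", both at the start of a line.
RECORD_START = re.compile(r"^<party\sxmlns")
RECORD_END = re.compile(r"^</party>")


def split_records(xml_path, output_dir, prefix="result_"):
    """Copy each <party>...</party> block to its own file; return the count."""
    count = 0
    out = None  # output file of the record currently being copied
    with open(xml_path, encoding="utf-8") as xml_file:  # encoding is assumed
        for line in xml_file:
            if RECORD_START.match(line):
                if out is not None:
                    out.close()
                path = os.path.join(output_dir, f"{prefix}{count}")
                out = open(path, "w", encoding="utf-8")
                count += 1
            if out is not None:
                out.write(line)
                if RECORD_END.match(line):
                    out.close()
                    out = None
    if out is not None:  # input ended in the middle of a record
        out.close()
    return count


def main(argv):
    """argv holds the two arguments: the XML file and the output directory."""
    if len(argv) != 2:
        print("usage: split_records.py XML_FILE OUTPUT_DIR", file=sys.stderr)
        return 2
    xml_path, output_dir = argv
    os.makedirs(output_dir, exist_ok=True)
    count = split_records(xml_path, output_dir)
    print(f"{count} records written to {output_dir}")
    return 0
```

To use it as a command-line script, one would add a final line such as sys.exit(main(sys.argv[1:])) and run it as: python split_records.py records.xml /home/me/output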
 
  

