LinuxQuestions.org

anirudh 08-31-2004 11:40 AM

Splitting a Large XML File
 
Hi,
I have a 200 MB XML file containing 19,845 records, each with a different number of lines.
I want to separate the records and store each one in its own file (one new file per record). Can anybody help me do this?
How can I do this in Linux, or in Java or C?
I tried using split in Linux, but it only cuts the file after a fixed number of lines, not at the record boundaries as I need.

Please help.

Cedrik 08-31-2004 02:56 PM

What do your records look like?

Tinkster 08-31-2004 03:10 PM

awk, Perl, or Python would lend themselves to
such tasks :)


Cheers,
Tink

anirudh 09-01-2004 06:12 AM

Each of my records looks like the code given below:


<party xmlns:defns="http://INDL060BB:8080/home/SCOTT/AIG/xsd" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:ns5="http://INDL060BB:8080/home/SCOTT/AIG/xsd" xmlns:ns4="http://INDL060BB:8080/home/SCOTT/AIG/xsd/" xmlns:ns3="http://INDL060BB:8080/home/SCOTT/AIG/xsd/" xmlns:ns2="http://INDL060BB:8080/home/SCOTT/AIG/xsd" xmlns:ns1="http://INDL060BB:8080/home/SCOTT/AIG/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://INDL060BB:8080/home/SCOTT/AIG/xsd/party.xsd">
  <partyGdr>
    <addressGdr>
      <address />
      <addressType>Physical</addressType>
    </addressGdr>
    <aigLegalEntity>
      <aigLegalEntityType>AIGC</aigLegalEntityType>
      <corporateSplit>
        <domesticOrForeign>Foreign</domesticOrForeign>
        <generalOrLife>General</generalOrLife>
      </corporateSplit>
      <fcrClassificationCode>ForeignOffsGenEurB</fcrClassificationCode>
    </aigLegalEntity>
    <currentOwner>CICADA</currentOwner>
    <industryClassification>
      <activity>
        <activityType activityTypeScheme="" />
        <activityCode />
      </activity>
    </industryClassification>
    <lastUpdate>
      <timestamp>2004-07-02T08:47:58.000000</timestamp>
      <user>magellan</user>
    </lastUpdate>
    <names>
      <sequenceNumber>1</sequenceNumber>
      <legalName>American International Underwriters Overseas Association</legalName>
      <longName>American International Underwriters Overseas Association</longName>
      <shortName>American International Underwriters Overseas Association</shortName>
    </names>
    <originalOwner>CICADA</originalOwner>
    <parentage>
      <immediate>
        <partyId>AC0000755</partyId>
        <partyName>American International Group, Inc.</partyName>
        <providerAssignedId />
      </immediate>
    </parentage>
    <partyType partyTypeScheme="Party">AIG LEGAL ENTITY</partyType>
    <processingDirective>MOD</processingDirective>
    <processingDirectiveIssuer>GDR</processingDirectiveIssuer>
    <processingDirectiveDate>2004-04-23</processingDirectiveDate>
    <recordStatus>Active</recordStatus>
    <sourceSystem>
      <aigClientId>AIUOA</aigClientId>
      <aigClientParentId>AIG</aigClientParentId>
      <counterPartyName>American International Underwriters Overseas Association</counterPartyName>
      <internalId>D326C</internalId>
      <lastUpdate>
        <timestamp>2004-07-02T08:47:58.000000</timestamp>
        <user>magellan</user>
      </lastUpdate>
      <reportingDate>2003-03-31</reportingDate>
      <systemId>CPP</systemId>
    </sourceSystem>
  </partyGdr>
  <partyId>AC0000169</partyId>
  <partyName>American International Underwriters Overseas Association</partyName>
</party>

anirudh 09-02-2004 01:25 PM

SAX in Java
 
Hi there,
I found that SAX can be used to do this splitting, but I don't know Java. Can anybody help me do this? Each record looks like the XML above.
Please.

Cedrik 09-02-2004 06:35 PM

Try this code (in Perl):

Code:

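#!/usr/bin/perl
use strict;
use warnings;

# Configure these three variables before running.
my $xml_file    = "records.xml";
my $output_dir  = "/home/me/output";
my $file_prefix = "result_";

my $count = 0;   # number of record files created so far
my $out;         # handle of the record file being written; undef between records

open my $in, '<', $xml_file or die "can't open $xml_file: $!";

while (my $line = <$in>) {
    if ($line =~ /^<party\sxmlns/) {
        # Opening tag of a record: start a new output file.
        my $path = "$output_dir/$file_prefix$count";
        print "New record found\nCreating $file_prefix$count\n";
        open $out, '>', $path or die "Error: can't open $path: $!";
        print {$out} $line;
        $count++;
    } elsif ($line =~ /^<\/party>/) {
        # Closing tag: write it and finish the current file.
        if ($out) {
            print {$out} $line;
            close $out;
            undef $out;
        }
    } elsif ($out) {
        # Any other line inside an open record.
        print {$out} $line;
    }
}
close $in;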

Set the variables at the top of the script, then chmod +x it and run it with ./

anirudh 09-03-2004 04:15 AM

Hi Cedrik, thanks a lot, the program worked.
:)

Cedrik 09-03-2004 04:28 AM

;) Good. You may want to learn a little Perl to adapt the script to your needs, for example so that it takes the XML file and output directory as arguments rather than hard-coding them...
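
For what it's worth, here is a minimal sketch of that adaptation. It assumes the records still start with <party xmlns at the beginning of a line and end with </party> at the beginning of a line, as in the sample posted above. The sub name split_records and the optional prefix argument are illustrative choices, not part of any library; the result_ default is carried over from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: split_records() is a hypothetical helper name.
# Copies each <party xmlns ...> ... </party> block of $xml_file into its
# own file under $output_dir and returns the number of files created.
sub split_records {
    my ($xml_file, $output_dir, $file_prefix) = @_;
    $file_prefix = "result_" unless defined $file_prefix;

    open my $in, '<', $xml_file or die "can't open $xml_file: $!";
    my $out;        # current record file, undef between records
    my $count = 0;  # number of record files created so far

    while (my $line = <$in>) {
        if ($line =~ /^<party\sxmlns/) {
            # Opening tag of a record: start a new output file.
            my $path = "$output_dir/$file_prefix$count";
            open $out, '>', $path or die "can't open $path: $!";
            $count++;
        }
        next unless $out;          # skip anything outside a record
        print {$out} $line;
        if ($line =~ /^<\/party>/) {
            # Closing tag was just written: finish this file.
            close $out or die "can't close record file: $!";
            undef $out;
        }
    }
    close $in;
    return $count;
}

# Command-line entry point: XML file and output directory as arguments.
if (@ARGV == 2) {
    my $n = split_records(@ARGV);
    print "$n records written to $ARGV[1]\n";
} else {
    warn "usage: $0 XML_FILE OUTPUT_DIR\n";
}
```

Saved under some name such as split_records.pl (the name is just an example), it would be run as ./split_records.pl records.xml /home/me/output. The output directory has to exist already, since the sketch does not create it.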


All times are GMT -5.