LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

08-31-2004, 12:40 PM   #1
anirudh
Member
 
Registered: Aug 2004
Location: bangalore india
Posts: 50

Rep: Reputation: 15
Splitting a Large XML File


Hi,
I have a 200 MB XML file with 19,845 records, and each record has a different number of lines.
I want to separate the records and store each one in its own file (one new file per record). Can anybody help me do this?
How can I do this in Linux, or in Java/C?
I tried using split in Linux, but it only cuts the file into pieces of a fixed number of lines (1,000 by default), not into whole records as I want.

Please help.
 
08-31-2004, 03:56 PM   #2
Cedrik
Senior Member
 
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140

Rep: Reputation: 242
What do your records look like?
 
08-31-2004, 04:10 PM   #3
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,000
Blog Entries: 11

Rep: Reputation: 893
Awk, Perl or Python would lend themselves to
such tasks :)


Cheers,
Tink
 
09-01-2004, 07:12 AM   #4
anirudh
Member
 
Registered: Aug 2004
Location: bangalore india
Posts: 50

Original Poster
Rep: Reputation: 15
Each of my records looks like the one below:


<party xmlns:defns="http://INDL060BB:8080/home/SCOTT/AIG/xsd" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:ns5="http://INDL060BB:8080/home/SCOTT/AIG/xsd" xmlns:ns4="http://INDL060BB:8080/home/SCOTT/AIG/xsd/" xmlns:ns3="http://INDL060BB:8080/home/SCOTT/AIG/xsd/" xmlns:ns2="http://INDL060BB:8080/home/SCOTT/AIG/xsd" xmlns:ns1="http://INDL060BB:8080/home/SCOTT/AIG/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://INDL060BB:8080/home/SCOTT/AIG/xsd/party.xsd">
  <partyGdr>
    <addressGdr>
      <address />
      <addressType>Physical</addressType>
    </addressGdr>
    <aigLegalEntity>
      <aigLegalEntityType>AIGC</aigLegalEntityType>
      <corporateSplit>
        <domesticOrForeign>Foreign</domesticOrForeign>
        <generalOrLife>General</generalOrLife>
      </corporateSplit>
      <fcrClassificationCode>ForeignOffsGenEurB</fcrClassificationCode>
    </aigLegalEntity>
    <currentOwner>CICADA</currentOwner>
    <industryClassification>
      <activity>
        <activityType activityTypeScheme="" />
        <activityCode />
      </activity>
    </industryClassification>
    <lastUpdate>
      <timestamp>2004-07-02T08:47:58.000000</timestamp>
      <user>magellan</user>
    </lastUpdate>
    <names>
      <sequenceNumber>1</sequenceNumber>
      <legalName>American International Underwriters Overseas Association</legalName>
      <longName>American International Underwriters Overseas Association</longName>
      <shortName>American International Underwriters Overseas Association</shortName>
    </names>
    <originalOwner>CICADA</originalOwner>
    <parentage>
      <immediate>
        <partyId>AC0000755</partyId>
        <partyName>American International Group, Inc.</partyName>
        <providerAssignedId />
      </immediate>
    </parentage>
    <partyType partyTypeScheme="Party">AIG LEGAL ENTITY</partyType>
    <processingDirective>MOD</processingDirective>
    <processingDirectiveIssuer>GDR</processingDirectiveIssuer>
    <processingDirectiveDate>2004-04-23</processingDirectiveDate>
    <recordStatus>Active</recordStatus>
    <sourceSystem>
      <aigClientId>AIUOA</aigClientId>
      <aigClientParentId>AIG</aigClientParentId>
      <counterPartyName>American International Underwriters Overseas Association</counterPartyName>
      <internalId>D326C</internalId>
      <lastUpdate>
        <timestamp>2004-07-02T08:47:58.000000</timestamp>
        <user>magellan</user>
      </lastUpdate>
      <reportingDate>2003-03-31</reportingDate>
      <systemId>CPP</systemId>
    </sourceSystem>
  </partyGdr>
  <partyId>AC0000169</partyId>
  <partyName>American International Underwriters Overseas Association</partyName>
</party>
 
09-02-2004, 02:25 PM   #5
anirudh
Member
 
Registered: Aug 2004
Location: bangalore india
Posts: 50

Original Poster
Rep: Reputation: 15
SAX in Java

Hi there,
I found that SAX can be used to do this, but I don't know Java. Can anybody help me do this splitting? Each record looks like the XML above.
Please help.
 
09-02-2004, 07:35 PM   #6
Cedrik
Senior Member
 
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140

Rep: Reputation: 242
Try this code (in Perl):

Code:
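#!/usr/bin/perl
use strict;
use warnings;

# Configure these before running
my $xml_file    = "records.xml";       # input file
my $output_dir  = "/home/me/output";   # existing directory for the pieces
my $file_prefix = "result_";           # output files: result_0, result_1, ...
my $open        = 0;                   # true while inside a <party> record
my $count       = 0;                   # number of records found so far

open XML_FILE, "<", $xml_file or die "can't open $xml_file: $!";

while (<XML_FILE>) {
    if (/^<party\sxmlns/) {
        # Start of a record: open a new output file
        print "New record found\nCreating $file_prefix$count\n";
        open RESULT, ">", "$output_dir/$file_prefix$count"
                or die "Error : can't open $output_dir/$file_prefix$count: $!";
        print RESULT $_;
        $open = 1;
        $count++;

    } elsif (/^<\/party>/) {
        # End of a record: write the closing tag and close the file
        if ($open) {
            print RESULT $_;
            close RESULT;
            $open = 0;
        }
    } elsif ($open) {
        # Inside a record: copy the line unchanged
        print RESULT $_;
    }
}
close XML_FILE;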
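Set the variables at the top first, then make the script executable with chmod +x and run it with ./ followed by the script name. The output directory must already exist.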

Last edited by Cedrik; 09-02-2004 at 07:42 PM.
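Cedrik's script depends on every `<party xmlns` and `</party>` tag starting at the beginning of a line. If that is not guaranteed (for example, if the tags are indented or several records share one line), a variant can set Perl's input record separator `$/` to `</party>` and cut the file on the closing tag instead. This is plain text matching, not the SAX parsing mentioned in post #5, and it is only a sketch under stated assumptions: `</party>` appears only as a record's end tag (never inside CDATA or comments) and `<party>` elements are never nested. The subroutine name `split_records` is hypothetical, and the output names result_0, result_1, ... simply mirror the original script's default prefix.

```perl
#!/usr/bin/perl
# Sketch: split <party> records without relying on line breaks, by
# setting Perl's input record separator $/ to the closing tag.
# Plain text matching, NOT a real XML (SAX) parser. Assumes:
#   - "</party>" only ever appears as a record's end tag
#     (not inside CDATA sections or comments);
#   - <party> elements are never nested inside one another.
use strict;
use warnings;

# Hypothetical helper: returns the number of records written.
sub split_records {
    my ($xml_file, $output_dir) = @_;
    open my $in, "<", $xml_file or die "can't open $xml_file: $!";

    local $/ = "</party>";   # each read returns text up to and including </party>
    my $count = 0;
    while (my $chunk = <$in>) {
        # Keep the text from the record's opening tag to the end of the chunk.
        # [\s>] stops <partyGdr>, <partyId>, <partyName> etc. from matching.
        next unless $chunk =~ /(<party[\s>].*<\/party>)\z/s;
        my $record = $1;

        my $out_file = "$output_dir/result_$count";
        open my $out, ">", $out_file or die "can't open $out_file: $!";
        print $out "$record\n";
        close $out or die "can't close $out_file: $!";
        $count++;
    }
    close $in;
    return $count;
}

if (@ARGV == 2) {
    my $n = split_records(@ARGV);
    print "Wrote $n records\n";
} else {
    warn "usage: $0 input.xml output_dir\n";
}
```

Because each read stops right after a `</party>` tag, only one record is held in memory at a time, which matters for a 200 MB input. Any text before the first record (an XML declaration or a wrapping root element) and after the last one is simply skipped.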
 
09-03-2004, 05:15 AM   #7
anirudh
Member
 
Registered: Aug 2004
Location: bangalore india
Posts: 50

Original Poster
Rep: Reputation: 15
Hi Cedrik, thanks a lot, the program worked.
 
09-03-2004, 05:28 AM   #8
Cedrik
Senior Member
 
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140

Rep: Reputation: 242
Good. You might learn a little Perl to adapt the script to your needs, for example so that it takes the XML file and output directory as arguments rather than hard-coding them...
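Following that suggestion, here is a sketch of how the script might take the XML file and output directory as command-line arguments. The script name split_party.pl, the subroutine name `split_by_lines` and the optional third "prefix" argument are hypothetical additions; the matching rules are the same as in the script above, so it still expects each `<party xmlns` line and each `</party>` line to start at the beginning of a line.

```perl
#!/usr/bin/perl
# Line-based splitter adapted to take its paths as arguments
# (split_party.pl is a hypothetical name):
#   ./split_party.pl records.xml /home/me/output [prefix]
# Like the original, assumes each record starts with a line beginning
# "<party xmlns" and ends with a line beginning "</party>".
use strict;
use warnings;

# Hypothetical helper: returns the number of records written.
sub split_by_lines {
    my ($xml_file, $output_dir, $file_prefix) = @_;
    $file_prefix = "result_" unless defined $file_prefix;   # optional 3rd argument

    open my $in, "<", $xml_file or die "can't open $xml_file: $!";
    my ($out, $count) = (undef, 0);    # $out is defined while inside a record

    while (my $line = <$in>) {
        if ($line =~ /^<party\sxmlns/) {
            # Start of a record: open the next numbered output file
            close $out if $out;
            my $out_file = "$output_dir/$file_prefix$count";
            open $out, ">", $out_file or die "can't open $out_file: $!";
            $count++;
        }
        print $out $line if $out;      # copy lines that belong to a record
        if ($out && $line =~ /^<\/party>/) {
            # End of a record: close its file
            close $out or die "can't close output file: $!";
            undef $out;
        }
    }
    close $out if $out;
    close $in;
    return $count;
}

if (@ARGV == 2 || @ARGV == 3) {
    my $n = split_by_lines(@ARGV);
    print "Wrote $n records\n";
} else {
    warn "usage: $0 input.xml output_dir [prefix]\n";
}
```

After chmod +x, run it as ./split_party.pl records.xml /home/me/output. Using a lexical filehandle (`my $out`) instead of the bareword `RESULT` keeps the handle local to the subroutine, and the "is a record open?" flag becomes simply whether `$out` is defined.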
 
  





All times are GMT -5.
