LinuxQuestions.org
Old 12-10-2010, 08:31 AM   #1
Mithrilhall
Member
 
Registered: Feb 2002
Location: Massachusetts
Distribution: Debian (Lenny)
Posts: 286

Rep: Reputation: 30
Question Splitting text file into multiple files


I have a text file that is filled with references to duplicate files.

I'm trying to create a text file for each duplicate file found that contains the paths to the duplicates. I would also like the text file names to be based on the size and file name.

Something like:
231.5 KB - P&S.doc.txt
138.5 KB - LIMITED#C71.doc.txt


If someone could point me in the right direction I would greatly appreciate it.

Code:
Name	Path	Size	Last Change	Last Access	File Type	Owner	Attributes
P&S.doc	(3 Files)						
  P&S.doc	Z:\Leg\_Pri_Leg\Pur\P&S\BUY\Barry V\	231.5 KB	11/2/2001 4:07 PM	11/22/2010 2:38 AM	.doc (Microsoft Office Word 97 - 2003 Document)	Lou_A	C
  P&S.doc	Z:\Leg\_Pri_Leg\P&S\BUY\Barry V\	231.5 KB	11/2/2001 4:07 PM	11/22/2010 2:38 AM	.doc (Microsoft Office Word 97 - 2003 Document)	DMs	C
  P&S.doc	Z:\Leg\_Pri_Leg\Props\Pur\P&S\BUY\Barry V\	231.5 KB	11/2/2001 4:07 PM	11/22/2010 2:38 AM	.doc (Microsoft Office Word 97 - 2003 Document)	DMs	C
LIMITED#C71.doc	(2 Files)						
  LIMITED#C71.doc	Z:\Leg\_Pri_Leg\Pur\CV\	138.5 KB	12/15/2003 1:04 PM	11/22/2010 2:38 AM	.doc (Microsoft Office Word 97 - 2003 Document)	Lou_A	C
  LIMITED#C71.doc	Z:\Leg\_Pri_Leg\Props\Pur\CV\	138.5 KB	12/15/2003 1:04 PM	11/22/2010 2:38 AM	.doc (Microsoft Office Word 97 - 2003 Document)	DMs	C
ps revised.8.30.05.clean.doc	(3 Files)						
  ps revised.8.30.05.clean.doc	Z:\Leg\_Pri_Leg\Props\Pur\P&S\Sell\VP\Summit\	54.5 KB	8/31/2005 11:46 AM	11/22/2010 2:38 AM	.doc (Microsoft Office Word 97 - 2003 Document)	DMs	C
  ps revised.8.30.05.clean.doc	Z:\Leg\_Pri_Leg\P&S\Sell\VP\Summit\	54.5 KB	8/31/2005 11:46 AM	11/22/2010 2:38 AM	.doc (Microsoft Office Word 97 - 2003 Document)	DMs	C
  ps revised.8.30.05.clean.doc	Z:\Leg\_Pri_Leg\Pur\P&S\Sell\VP\Summit\	54.5 KB	8/31/2005 11:46 AM	11/22/2010 2:38 AM	.doc (Microsoft Office Word 97 - 2003 Document)	Lou_A	C
Copy of 08 Lee All July Billing.xls	(2 Files)						
  Copy of 08 Lee All July Billing.xls	Z:\IS\_Sh_IS\Dev\Doc\Docl 26 upgrade\AS6 backup code\APImport\	131.5 KB	7/30/2010 12:11 PM	11/22/2010 2:38 AM	.xls (Microsoft Office Excel 97-2003 Worksheet)	Administrators	C
  Copy of 08 Lee All July Billing.xls	Z:\AP\Kellie\	131.5 KB	7/30/2010 10:03 AM	11/22/2010 2:38 AM	.xls (Microsoft Office Excel 97-2003 Worksheet)	Kellie	C
 
Old 12-10-2010, 08:54 AM   #2
Snark1994
Senior Member
 
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,630
Blog Entries: 3

Rep: Reputation: 345
Do you have any previous programming experience? The problem you describe could be solved fairly easily in any number of languages. I would go for Python (or Perl, if you prefer, or even Bash, if you're strangely masochistic).

And I assume that what you posted was an extract rather than the whole file, or it would be far, far quicker to just do it by hand.
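
For reference, the Python route suggested here might look roughly like the sketch below. It is only a minimal illustration, assuming the layout shown in the first post: tab-separated fields, one header row, unindented group rows, and indented detail rows whose path ends in a backslash. The function names `group_duplicates` and `write_groups` are made up for this example, and the text encoding of the real listing is unknown.

```python
from collections import defaultdict
from pathlib import Path


def group_duplicates(lines):
    """Map each output file name to the list of full duplicate paths.

    Assumes the layout from the question: tab-separated fields, a header
    row, unindented group rows, and indented detail rows.
    """
    groups = defaultdict(list)
    rows = iter(lines)
    next(rows, None)  # skip the column header row
    for raw in rows:
        line = raw.rstrip("\r\n")
        if not line.startswith(" "):
            continue  # group row such as "P&S.doc<TAB>(3 Files)", or a blank line
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # malformed detail row, ignored in this sketch
        name, path, size = fields[0].strip(), fields[1], fields[2]
        # Output name follows the requested pattern "<size> - <name>.txt"
        groups[f"{size} - {name}.txt"].append(path + name)
    return groups


def write_groups(groups, out_dir="."):
    """Write one text file per group into out_dir (a hypothetical location)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, paths in groups.items():
        (out / filename).write_text("\n".join(paths) + "\n", encoding="utf-8")
```

It could be called as `write_groups(group_duplicates(open("duplicates.txt")), "out")`, where both `duplicates.txt` and `out` are placeholder names. Grouping in memory keeps the code short; for a 50 MB listing that should be manageable, but it is a design choice rather than a requirement.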
 
Old 12-10-2010, 08:58 AM   #3
Mithrilhall
Member
 
Registered: Feb 2002
Location: Massachusetts
Distribution: Debian (Lenny)
Posts: 286

Original Poster
Rep: Reputation: 30
I do have some programming experience. I'm familiar with C, C++, VB 6, PHP, ASP, COBOL... but I haven't coded in a while. I have taken a look at Python in the past.

And you are correct, that is only a partial sample of the file in question. The file is a 50 MB text file.
 
Old 12-10-2010, 09:24 AM   #4
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,362

Rep: Reputation: 1910
A good job for awk. Example:
Code:
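# Requires GNU awk (gawk): gensub() is a gawk extension.
BEGIN {
  FS = "\t"     # fields are tab-separated
  getline       # skip the header line
}

# Group rows do not start with a space, e.g. "P&S.doc<TAB>(3 Files)"
!/^ / {
  dupname = $1
  # Strip "(" and " Files)" to leave the count; "+ 0" forces a numeric
  # value so that counts of 10 or more are not compared as strings.
  ndup = gensub(/\(| Files\)/, "", "g", $2) + 0
  for ( i = 1; i <= ndup; i++ ) {
    if ( (getline) <= 0 ) break    # stop at end of input
    # On the indented detail rows, $2 is the path and $3 is the size
    file = ( $3 " - " dupname ".txt" )
    print $2 dupname >> file
  }
  close(file)   # do not keep one open file per group
}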
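
The script above trusts the "(N Files)" count on each group row. Before splitting a 50 MB listing it may be worth confirming that those counts match the rows that actually follow. The Python sketch below is one hedged way to do that; `check_counts` is a hypothetical helper, and it assumes the same tab-separated layout as the sample in the first post. The count is converted with `int()` so that a value such as 12 is handled as a number rather than as text.

```python
import re

# Matches the count in a group row's second field, e.g. "(3 Files)"
COUNT_RE = re.compile(r"\((\d+) Files\)")


def check_counts(lines):
    """Return (name, declared, found) for each group whose counts disagree."""
    mismatches = []
    current = None  # [name, declared count, detail rows seen so far]
    rows = iter(lines)
    next(rows, None)  # skip the column header row
    for raw in rows:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(" "):
            if current is not None:
                current[2] += 1  # indented detail row belongs to the open group
            continue
        # A new group row closes the previous group
        if current is not None and current[1] != current[2]:
            mismatches.append(tuple(current))
        fields = line.split("\t")
        match = COUNT_RE.search(fields[1]) if len(fields) > 1 else None
        current = [fields[0], int(match.group(1)) if match else 0, 0]
    if current is not None and current[1] != current[2]:
        mismatches.append(tuple(current))
    return mismatches
```

An empty result means every group row was followed by exactly as many indented rows as it declared.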
 
Old 12-10-2010, 09:51 AM   #5
Mithrilhall
Member
 
Registered: Feb 2002
Location: Massachusetts
Distribution: Debian (Lenny)
Posts: 286

Original Poster
Rep: Reputation: 30
Colucix, thanks for the link.

The text you provided, is it a filter for awk?



I think I see how it is used:

Code:
awk -f (your file) input_file

Last edited by Mithrilhall; 12-10-2010 at 10:19 AM.
 
Old 12-10-2010, 10:32 AM   #6
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,362

Rep: Reputation: 1910
Quote:
Originally Posted by Mithrilhall
The text you provided, is it a filter for awk?

I don't know what you mean by "filter". It is simply a piece of awk code.

Quote:
Originally Posted by Mithrilhall
I think I see how it is used:

Code:
awk -f (your file) input_file

Exactly!

awk is a very powerful tool for parsing and extracting information from text files. However, if you want to learn a new language or refresh an old one, Python is more complete since, as you already know, it offers a huge collection of libraries for a large variety of tasks. In any case, you normally don't need to develop complicated awk programs; you can treat awk as a handy command-line utility and limit your learning to the basics.
 
  

