LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
11-07-2009, 02:32 PM   #1
truculentknight
LQ Newbie
 
Registered: Nov 2009
Posts: 3

Rep: Reputation: 0
Editing Large Text Files.


Hey everyone,

I'm having problems editing a large text file; the file is the output of software that analyses the data.

The data describes products... here's an example!!


<item>

<title> Callaway Golf Mens RH X-Forged Chrome Approach Wedge (50 Degrees-12 Degrees Bounce) S300 (Stiff) Flex </title>

<category>Sports</category>

<pubDate>Sat, 06 Feb 2010 08:05:13 GMT</pubDate>

<link></link>

<description>

<a href=""><b>Callaway Golf Mens RH X-Forged Chrome Approach Wedge (50 Degrees-12 Degrees Bounce) S300 (Stiff) Flex</b></a><br>

<table>

<tr>

<td><a href=""><img align="left" src="http://shop.callawaygolf.com/images/products/wedges/2008/x-forged-chrome/1.jpg"> </a>

Legendary clubmaker Roger Cleveland raised the bar once again with the new X-Forged Wedges. Designed with input from Tour players, they are constructed from soft 1020 carbon steel for incredible feel. The clubs also feature a tighter heel-toe radius that provides increased versatility from anywhere around the green. </td>

</tr>

<tr>

<td>

Price: $109.00 <a href="">Buy/More Info</a>

</td>

</tr>

</table>

</description>

</item>


I would love to learn how to separate out the important information. First, I want to determine how many categories there are. I tried using grep, but I couldn't get it to work.

Couldn't grep be used with a wildcard to identify all of the categories within this large file? Something like "<category>*</category>"?
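A note on the wildcard: in grep's regular expressions "*" is not a shell-style wildcard. It means "zero or more of the previous character", so "<category>*</category>" only matches "<category", any number of ">" characters, then "</category>". "Anything" is written ".*", or more safely "[^<]*" (anything up to the next tag). The sketch below is a minimal illustration, not a tool tested on the real data: it assumes each <category>...</category> pair sits on one line, relies on the -o option of GNU/BSD grep, and the function names and sample file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list each distinct category once (sorted).
# Assumes every <category>...</category> pair sits on a single line.
list_categories() {
    # -o prints only the matched text; [^<]* matches the name up to the next tag.
    grep -o '<category>[^<]*</category>' "$1" |
        sed -e 's/<category>//' -e 's|</category>||' |
        sort -u
}

# Hypothetical helper: number of distinct categories.
count_categories() {
    list_categories "$1" | wc -l
}

# Usage: build a tiny sample file and run both helpers on it.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<item><title>Wedge</title><category>Sports</category></item>
<item><title>Driver</title><category>Sports</category></item>
<item><title>Novel</title><category>Books</category></item>
EOF

list_categories "$sample"     # prints Books, then Sports
count_categories "$sample"    # prints the count (2 for this sample)
```

sort -u is what removes the duplicates; without it "Sports" would be listed (and counted) twice.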

I would also like to identify products that cost less than $100. How can both of these things be done?

Thanks!!
 
11-07-2009, 03:22 PM   #2
gerryd
LQ Newbie
 
Registered: Jul 2009
Distribution: slackware 13
Posts: 21

Rep: Reputation: 1
grep displays the lines matching the pattern you give it. For it to work here, each category (opening tag, name and closing tag) would have to be on a single line.
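To illustrate that point: grep works one line at a time, so a tag pair split across lines is invisible to it. One workaround, sketched here under the assumption that the whole file can be treated as one long line (fine for moderate sizes, memory-hungry for huge files) and that category names contain no "<", is to turn newlines into spaces with tr before matching. The function name and sample data are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract category names even when a tag pair is
# split across several lines, by turning newlines into spaces first.
categories_any_layout() {
    tr '\n' ' ' < "$1" |
        grep -o '<category>[^<]*</category>' |
        sed -e 's/<category>[[:space:]]*//' -e 's|[[:space:]]*</category>||'
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# First pair is split over three lines, second pair is on one line.
printf '<category>\nSports\n</category>\n<category>Books</category>\n' > "$sample"

# Line by line, grep only sees the one-line pair:
grep -c '<category>[^<]*</category>' "$sample"   # prints 1
# After joining the lines, both categories are found:
categories_any_layout "$sample"                   # prints Sports, then Books
```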
 
11-07-2009, 03:58 PM   #3
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,979

Rep: Reputation: 850
Hello truculentknight and welcome to LQ,

In this case grep should work for you:
Code:
grep '<category>' *
will print every line containing the category tag. (Don't escape the angle brackets: in GNU grep, \< and \> mean word boundaries, not literal < and >.)

Otherwise, this looks very much like an XML file. I'd suggest using a scripting language like Perl, which has packages for parsing XML files (http://xml.coverpages.org/perl-xml-faq11.html). This will help you if you have to do something more elaborate than simply finding lines in such a file.

Markus
 
11-07-2009, 06:45 PM   #4
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
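You are using the wrong tool for the job. Ideally you should be using an HTML or XML parser. But if you want to do it hardcore, use gawk:
Code:
gawk -v RS="</item>" '
# RS="</item>" makes each product one record (multi-character RS needs gawk or a compatible awk)
/Price:/ {                     # skip records without a price, e.g. the tail after the last item
 gsub(/.*Price:[ \t]*/, "")    # drop everything up to and including "Price:"
 gsub(/[ \t]*<.*/, "")         # drop everything from the next tag onwards
 print
}
' file
output
Code:
# ./shell.sh
$109.00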
See here or here for similar examples

Last edited by ghostdog74; 11-07-2009 at 07:06 PM.
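The other half of the original question (products under $100) can be answered with awk as well. This sketch differs from the record-splitting approach above: instead of setting RS to "</item>", it reads line by line with any POSIX awk, remembers the latest title and price, and decides when it reaches "</item>". It assumes each item has its <title>...</title> on one line and a single "Price: $N.NN" with no thousands separators; cheap_items and the sample data are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "price<TAB>title" for every item whose
# price is below the limit given as the second argument.
cheap_items() {
    awk -v max="$2" '
    /<title>/ {                       # remember the current title, trimmed
        title = $0
        sub(/.*<title>[ \t]*/, "", title)
        sub(/[ \t]*<\/title>.*/, "", title)
    }
    /Price: *\$/ {                    # keep only the number after "Price: $"
        price = $0
        sub(/.*Price: *\$/, "", price)
        sub(/[^0-9.].*/, "", price)
    }
    /<\/item>/ {                      # item finished: report it if cheap enough
        if (price != "" && price + 0 < max + 0)
            printf "%s\t%s\n", price, title
        title = ""; price = ""
    }' "$1"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<item>
<title> Callaway Golf Mens RH X-Forged Chrome Approach Wedge </title>
<td>
Price: $109.00 <a href="">Buy/More Info</a>
</td>
</item>
<item>
<title> Golf Balls (12 pack) </title>
<td>
Price: $39.99 <a href="">Buy/More Info</a>
</td>
</item>
EOF

cheap_items "$sample" 100    # prints only the $39.99 item
```

Resetting title and price at every "</item>" keeps an item without a price from inheriting the previous item's price.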
 
11-08-2009, 09:54 PM   #5
truculentknight
LQ Newbie
 
Registered: Nov 2009
Posts: 3

Original Poster
Rep: Reputation: 0
Wow!! You guys are really helpful. By any chance, could someone help me figure out how to write a Perl script that will help me with these large data files?

I need a Perl script that can determine...

1. How many different categories there are (I need a number), and also display all the different categories on the terminal, without displaying any category more than once.
2. Extract all of the categories I specify, along with the products associated with those categories (all of the XML), into a separate file.
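For readers who want a starting point, here is a minimal sketch of the two tasks using grep, sed, sort and awk instead of Perl. This is a swapped-in approach, not an XML parser: it assumes each <item> block has exactly one <category>NAME</category> on a single line and that the comma-separated list of wanted categories contains no spaces. show_categories, extract_items and the sample file are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper for task 1: print each distinct category once,
# then the total number of distinct categories.
show_categories() {
    cats=$(grep -o '<category>[^<]*</category>' "$1" |
           sed -e 's/<category>//' -e 's|</category>||' | sort -u)
    printf '%s\n' "$cats"
    printf 'Total: %s\n' "$(printf '%s\n' "$cats" | grep -c .)"
}

# Hypothetical helper for task 2: copy whole <item>...</item> blocks whose
# category is in the comma-separated list $2 from file $1 into file $3.
extract_items() {
    awk -v wanted="$2" '
    BEGIN {                                  # "Sports,Books" -> keep["Sports"], keep["Books"]
        n = split(wanted, w, ",")
        for (i = 1; i <= n; i++) keep[w[i]] = 1
    }
    /<item>/ { buf = ""; inside = 1 }        # start buffering a new item
    inside   { buf = buf $0 "\n" }           # remember every line of it
    /<category>/ {                           # note the item category
        name = $0
        sub(/.*<category>[ \t]*/, "", name)
        sub(/[ \t]*<\/category>.*/, "", name)
    }
    /<\/item>/ {                             # item complete: keep or drop it
        if (inside && (name in keep)) printf "%s", buf
        inside = 0; name = ""
    }' "$1" > "$3"
}

sample=$(mktemp); out=$(mktemp)
trap 'rm -f "$sample" "$out"' EXIT
cat > "$sample" <<'EOF'
<item>
<title>Wedge</title>
<category>Sports</category>
</item>
<item>
<title>Novel</title>
<category>Books</category>
</item>
<item>
<title>Driver</title>
<category>Sports</category>
</item>
EOF

show_categories "$sample"              # Books, Sports, then "Total: 2"
extract_items "$sample" Sports "$out"  # the two Sports items go to $out
cat "$out"
```

Buffering each item until "</item>" is what lets the whole block be written out, even though the category line sits in the middle of it.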

I'm seriously not a programmer. I've used Linux for years, but I still can't program. I don't think this script would be that hard, though? Can someone help me with it?

Thanks.
 
11-08-2009, 10:27 PM   #6
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Nobody is a programmer at first. All we ever did was read the docs and practice! If you want to program in Perl, read the docs and start learning it. See my sig for a Perl doc link.
 
11-08-2009, 11:46 PM   #7
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
As per ghostdog, you're best off learning how to program, or you'll forever be asking questions and be unable to make the most of the answers, which may take a long time to arrive.
Start with the Perl docs as per his link, then look at search.cpan.org.
Search on XML. XML::Parser http://search.cpan.org/~msergeant/XM...2.36/Parser.pm is comprehensive, but probably overkill. Try XML::Simple http://search.cpan.org/~grantm/XML-S.../XML/Simple.pm or XML::Twig http://search.cpan.org/~mirod/XML-Twig-3.32/Twig.pm
 
11-09-2009, 12:24 AM   #8
truculentknight
LQ Newbie
 
Registered: Nov 2009
Posts: 3

Original Poster
Rep: Reputation: 0
Hey!

I would totally love to do that; unfortunately I need the Perl script to run my business, and I don't have time to learn Perl... haha

Instead I will just hire someone to make the script for me. Thanks anyway!
 
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Sorting large text files | tmaxx | AIX | 14 | 02-19-2009 07:32 PM
how can I differentiate two large text files using shell script? Files are like below | surya_gadde | Linux - Software | 1 | 01-20-2009 03:52 AM
Text editor for large files | PMorph | Linux - Software | 1 | 07-17-2007 09:07 AM
sed with large text files? | apollyonus | Linux - Server | 3 | 03-22-2007 08:33 AM
editing text files | killertofu | Linux - Newbie | 2 | 10-15-2004 06:55 PM


All times are GMT -5.