LinuxQuestions.org
10-09-2004, 05:55 PM   #1
linux-nerd
LQ Newbie
 
Registered: Sep 2004
Posts: 26

Rep: Reputation: 15
Using Grep and Egrep


I have a file like the one below:

Code:
   REFRESH(1800 sec):
   [1]file://localhost/home/george/Documents/food_network/www.foodnetwork
   .com/food/recipes/recipe/0%2C%2CFOOD_9936_9266%2C00.html

   [spacer.gif]
   [spacer.gif] [2][spacer.gif] [spacer.gif] [spacer.gif]

                                  IFRAME:
   [3]http://adsremote.scripps.com/html.ng/site=FOOD&category=RECIPES&vgn
       content=RECIPES&pagetype=RECIPE_DETAIL&adsize=468x60&PagePos=1
   [4][params.richmedia=yes&site=FOOD&category=RECIPES&vgncontent=RECIPES
             &pagetype=RECIPE_DETAIL&adsize=468x60&PagePos=1] 

   [spacer.gif] [5][spacer.gif] 
   [6][USEMAP:spacer.gif]

   [spacer.gif]
   [lnb_bgseperator.jpg]

   [spacer.gif]
   Search
   ____________
   [Recipes] Go!
   o  [7]Search Tips
   [spacer.gif]

   [spacer.gif]
   [8]RECIPES 
   [spacer.gif]
   o   [9]Power Search 
   o   [10]Recipe Collections 
   o   [11]Recipes of the Day 
   [spacer.gif]
   [spacer.gif]

   In Our Store
   [12][BKS6045_lnb.jpg] 
   [13]Paula Deen 3-Book Kit
   $45.95

                               Find a TV Show

                        [Select a TV Show.........]
                               [spacer.gif]

   Sponsor
   Recommendations
   [spacer.gif]
   [spacer.gif]
   [spacer.gif]
   [spacer.gif]

   [spacer.gif] [spacer.gif]
   [spacer.gif]
   [14]Home > [15]Recipes
   [spacer.gif]
   Persimmon Punch (Soo Jeung Ga)
   Recipe courtesy Hyungshin Song
   Show:  [16]Cooking Live Episode:  [17]Solal: Korean New Year 
   [spacer.gif]
   [spacer.gif] Recipe Summary
   Prep Time: 15 minutes
   Cook Time: 8 hours 30 minutes
   Yield: about 2 quarts
   Ratings and Reviews
   User Rating: No Rating
   [18]Rate Recipe    [19]Read Reviews [20]Ratings & Reviews FAQ
   [spacer.gif]
   [21]Add To Recipe Box [22]Add to My Recipe Box
   [23]Email [24]Email to a Friend 
   [25]Print  Print: [26]Full Page
   [27]3X5 Card | [28]4X6 Card
   [spacer.gif]
   [spacer.gif]

                                [spacer.gif]
                               ADVERTISEMENT
                                [spacer.gif]
                                [spacer.gif]

   2 quarts water
   1/2 cup fresh ginger, sliced thin
   3 cinnamon sticks
   1 1/2 cups sugar
   1 cup dried persimmons, sliced
   1/2 cup pine nuts

   In a pot, combine the water, ginger, and cinnamon and let simmer for
   1/2 hour. Remove from heat and strain the liquid. Stir in the sugar
   and persimmons. Cool and let sit in the refrigerator overnight. Serve
   well chilled with a teaspoon of pine nuts floating in each cup.

   Other Recipes from this Episode
   [bullet_orange_3x3.gif] [29]Rice Cake Soup with Wontons (Duk Kook)
   [bullet_orange_3x3.gif] [30]Water Kimchi (Mul Kimchi)
   [bullet_orange_3x3.gif] [31]Clam and Scallion Pancakes (Jogae Pa Jon)

   [spacer.gif] [spacer.gif]

                  [spacer.gif] [32]Home   |  [33]About Us
            |  [34]Questions  |  [35]Advertising  |  [36]Privacy
                            Policy  |  [37]Legal
      [38]DIY   |  [39]FINE LIVING   |  [40]HGTV   |  [41]Shop At Home
                           |  [42]Video On Demand

           [43] 2004 Scripps Networks, Inc. All rights reserved.

   [44][flogo.jpg] 

   [blank.gif]

References

   1. file://localhost/home/george/Documents/food_network/www.foodnetwork.com/food/recipes/recipe/0%2C%2CFOOD_9936_9266%2C00.html
   2. http://www.foodnetwork.com/
   3. http://adsremote.scripps.com/html.ng...8x60&PagePos=1
   4. http://adsremote.scripps.com/click.n...8x60&PagePos=1
   5. http://www.foodnetwork.com/
   6. LYNXIMGMAP:file://localhost/home/george/Documents/food_network/www.foodnetwork.com/food/recipes/recipe/0%2C%2CFOOD_9936_9266%2C00.html#mainnav
   7. http://www.foodnetwork.com/food/reci...840790,00.html
   8. file://localhost/food/recipes/0,1977,FOOD_9936,00.html
   9. file://localhost/food/re_super_search/0,1977,FOOD_9934,00.html
  10. file://localhost/food/re_collections/0,1977,FOOD_11656,00.html
  11. file://localhost/food/re_chef_host_recipes_week/0,1977,FOOD_9931,00.html
  12. http://store.foodnetwork.com/shop/pr...pe=subcategory
  13. http://store.foodnetwork.com/shop/pr...pe=subcategory
  14. file://localhost/food/home/0,1904,FOOD_9888,00.html
  15. file://localhost/food/recipes/0,1977,FOOD_9936,00.html
  16. file://localhost/food/show_cl/0,1976,FOOD_9952,00.html
  17. file://localhost/food/show_cl/episode/0,1976,FOOD_9952_14041,00.html
  18. http://web.foodnetwork.com/food/web/...,,9266,00.html
  19. file://localhost/food/my_recipe_box/review/0,1973,FOOD_9919_9266,00.html
  20. http://www.foodnetwork.com/food/reci...D_9935,00.html
  21. http://web.foodnetwork.com/food/web/...peBox?rid=9266
  22. http://web.foodnetwork.com/food/web/...peBox?rid=9266
  23. javascript:popup('/cr/cda/email/recipe/1,1249,FOOD_9936_9266,00.html',400,350)
  24. javascript:popup('/cr/cda/email/recipe/1,1249,FOOD_9936_9266,00.html',400,350)
  25. file://localhost/food/cda/recipe_print/0,1946,FOOD_9936_9266_PRINT-RECIPE-FULL-PAGE,00.html
  26. file://localhost/food/cda/recipe_print/0,1946,FOOD_9936_9266_PRINT-RECIPE-FULL-PAGE,00.html
  27. file://localhost/food/cda/recipe_print/0,1946,FOOD_9936_9266_PRINT-RECIPE-3X5-CARD,00.html
  28. file://localhost/food/cda/recipe_print/0,1946,FOOD_9936_9266_PRINT-RECIPE-4X6-CARD,00.html
  29. file://localhost/food/recipes/recipe/0,1977,FOOD_9936_9263,00.html
  30. file://localhost/food/recipes/recipe/0,1977,FOOD_9936_9264,00.html
  31. file://localhost/food/recipes/recipe/0,1977,FOOD_9936_9265,00.html
  32. file://localhost/food/home
  33. file://localhost/food/about_us
  34. file://localhost/food/faq
  35. file://localhost/food/advertising
  36. file://localhost/food/privacy_policy
  37. file://localhost/food/legal_information
  38. http://www.diynetwork.com/
  39. http://www.fineliving.com/
  40. http://www.hgtv.com/
  41. http://www.shopathometv.com/
  42. http://www.foodnetwork.com/food/food_network_on_demand
  43. http://www.scripps.com/
  44. http://www.foodnetwork.com/
How do I strip it so I just get this:

Code:
 [15]Recipes
   [spacer.gif]
   Persimmon Punch (Soo Jeung Ga)
   Recipe courtesy Hyungshin Song
   Show:  [16]Cooking Live Episode:  [17]Solal: Korean New Year 
   [spacer.gif]
   [spacer.gif] Recipe Summary
   Prep Time: 15 minutes
   Cook Time: 8 hours 30 minutes
   Yield: about 2 quarts
   Ratings and Reviews
   User Rating: No Rating
   [18]Rate Recipe    [19]Read Reviews [20]Ratings & Reviews FAQ
   [spacer.gif]
   [21]Add To Recipe Box [22]Add to My Recipe Box
   [23]Email [24]Email to a Friend 
   [25]Print  Print: [26]Full Page
   [27]3X5 Card | [28]4X6 Card
   [spacer.gif]
   [spacer.gif]

                                [spacer.gif]
                               ADVERTISEMENT
                                [spacer.gif]
                                [spacer.gif]

   2 quarts water
   1/2 cup fresh ginger, sliced thin
   3 cinnamon sticks
   1 1/2 cups sugar
   1 cup dried persimmons, sliced
   1/2 cup pine nuts

   In a pot, combine the water, ginger, and cinnamon and let simmer for
   1/2 hour. Remove from heat and strain the liquid. Stir in the sugar
   and persimmons. Cool and let sit in the refrigerator overnight. Serve
   well chilled with a teaspoon of pine nuts floating in each cup.

   Other Recipes from this Episode
   [bullet_orange_3x3.gif] [29]Rice Cake Soup with Wontons (Duk Kook)
   [bullet_orange_3x3.gif] [30]Water Kimchi (Mul Kimchi)
   [bullet_orange_3x3.gif] [31]Clam and Scallion Pancakes (Jogae Pa Jon)

   [spacer.gif] [spacer.gif]

                  [spacer.gif] [32]Home
So the part I want starts with
`[15]` and ends with `[number]Home`

Once we have that in a pipe, I need to:

1) Strip out everything between the square brackets []
2) Rename the file so it is called Persimmon Punch (Soo Jeung Ga).txt (i.e. the 3rd line)

The file may be longer or shorter than the one specified above.

This needs to be done recursively

i.e.

for i in *.html
do
    : # do stuff with "$i" as the filename string
done

Last edited by linux-nerd; 10-09-2004 at 05:56 PM.
 
10-09-2004, 06:09 PM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
http://www.linuxquestions.org/rules.php
Do not expect LQ members to do your homework - you will learn much more by doing it yourself.
 
10-09-2004, 06:11 PM   #3
linux-nerd
LQ Newbie
 
Registered: Sep 2004
Posts: 26

Original Poster
Rep: Reputation: 15
That's not my homework... I need to do it for a friend...
 
10-09-2004, 06:23 PM   #4
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Why would a friend require:
Quote:
The file may be longer or shorter than the one specified above.

This needs to be done recursively
If it's not your homework it's still his, and the same
thing applies...



Cheers,
Tink
 
10-09-2004, 06:32 PM   #5
linux-nerd
LQ Newbie
 
Registered: Sep 2004
Posts: 26

Original Poster
Rep: Reputation: 15
This isn't homework.

I am doing it for a friend; she doesn't know how to use Linux and knows much less than me.

I have downloaded an entire recipe site for my friend, as her Windows computer ate all her files.
She had used copy + paste from the website for nearly 3000 recipe files (and other websites).
I thought there might be an easier way using egrep/grep. I know how to use grep for basic stuff like extracting matches from files, but not how to change them.

If you can't give me a complete answer, perhaps a hint would be good, like:
"to extract everything between two specific regexps (or text in backquotes), you need to type this"

or

"to output the nth line of a file, you need to type this"

or

"to strip a regexp that occurs multiple times in a file...."



Once I know how to do it for one file, it will work for the 500+ files I have here, I am sure.
If not, I will fiddle with it until it does.

I also need to do it for other websites, which I will be able to suss out myself.
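
Each of the three hints asked for above maps onto a short sed command. What follows is a minimal sketch, not a complete answer: the helper names (between, nth_line, strip_brackets) are made up for illustration, and the sample text is a cut-down stand-in for the real lynx dump.

```shell
#!/bin/sh
# Hypothetical helpers, one per hint; the sample text is a cut-down stand-in
# for the real lynx dump.

# Print from the first line matching regex $1 through the next line matching
# regex $2 (the patterns must not contain an unescaped "/").
between() {
    sed -n "/$1/,/$2/p"
}

# Print only line number $1 of the input.
nth_line() {
    sed -n "${1}p"
}

# Delete every [bracketed] chunk, however many there are on a line.
strip_brackets() {
    sed 's/\[[^]]*\]//g'
}

sample='[14]Home > [15]Recipes
[spacer.gif]
Persimmon Punch (Soo Jeung Ga)
[spacer.gif] [32]Home   |  [33]About Us
[38]DIY   |  [39]FINE LIVING'

printf '%s\n' "$sample" | between '\[15\]' '\[[0-9][0-9]*\]Home'
printf '%s\n' "$sample" | nth_line 3
printf '%s\n' "$sample" | strip_brackets
```

Note that a sed range works on whole lines, so the first and last lines of the extract keep whatever text sits before `[15]` and after `Home`. grep and egrep only select lines; changing their contents is a job for sed (or awk).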
 
10-10-2004, 11:37 AM   #6
linux-nerd
LQ Newbie
 
Registered: Sep 2004
Posts: 26

Original Poster
Rep: Reputation: 15
Well, I did it...
Code:
mkdir -p stripped2
for i in *.html
do
    # Dump the page as text, drop the first 56 lines of navigation,
    # then delete leftover image, advert and footer lines.
    lynx -dump -hiddenlinks=ignore -nolist "$i" |
        sed -e '1,56d' -e '/\.gif/d' -e '/ADVERTISEMENT/d' -e '/Privacy/d' \
            -e '/HGTV/d' -e '/Scripps/d' -e '/\.jpg/d' > "stripped2/$i.txt"
    echo "file done"
done

Last edited by linux-nerd; 10-10-2004 at 11:45 AM.
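
The loop above names each output after the .html file rather than after the recipe title that the first post asked for. A possible variation is sketched below, assuming every dump contains the `[15]` and `[number]Home` markers from the sample and that the title is the 3rd line of the extracted section. The helper names clean_dump and title_of are hypothetical, and plain `lynx -dump` is used on the assumption that it prints the numbered link markers as in the sample file.

```shell
#!/bin/sh
# Sketch only: clean_dump and title_of are hypothetical helper names.

# Keep the lines from "[15]" through the "[number]Home" footer line,
# then remove every [bracketed] tag.
clean_dump() {
    sed -n '/\[15\]/,/\[[0-9][0-9]*\]Home/p' | sed 's/\[[^]]*\]//g'
}

# Print the 3rd line with surrounding whitespace trimmed; "/" is replaced
# because it cannot appear in a file name.
title_of() {
    sed -n '3{s/^[[:space:]]*//;s/[[:space:]]*$//;p;}' | tr '/' '-'
}

mkdir -p stripped2
for i in *.html
do
    [ -e "$i" ] || continue              # no .html files here: nothing to do
    tmp="stripped2/tmp.$$"
    lynx -dump "$i" | clean_dump > "$tmp"
    title=$(title_of < "$tmp")
    # Fall back to the original file name if no title line was found.
    mv "$tmp" "stripped2/${title:-$i}.txt"
    echo "file done"
done
```

Two recipes with the same title would overwrite each other here, so it may be worth checking for an existing file before the mv.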
 
  

