LinuxQuestions.org
Old 08-07-2006, 05:07 AM   #1
changcheh
Member
 
Registered: Nov 2003
Location: china
Distribution: Linux Mint
Posts: 49

Rep: Reputation: 15
awk searching a string from a file within another file


I'm trying to copy some text from a website to a file using awk. The text I want is between 2 strings I know. e.g.

text I don't want
string 1
text I want to grab
more text I want to grab

string 2
more text I don't want

I've written a bash script which works if I manually enter the date. However, I'd like to automate date entry.

I used awk -f to read the date from a file, but awk thinks the filename is todaysdate,/to comments/. I've tried using " and ' to quote the filename, but awk ignores this. Is there a better way to input the date (pipes, setting variables, etc.)?


#!/bin/bash

curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html
date '+%A %d%m%y' > todaysdate
awk -f 'todaysdate',/"to comments"/ crossfit.txt >> workout.txt

thanks a lot
 
Old 08-07-2006, 05:40 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2373
Hi,

Using awk and a separate file with the date for this is too complicated (my opinion). Using sed and variables would make this a lot easier, as the example shows:

Code:
#!/bin/bash

fromThis="string 1"
toThis="string 2"

sed -n "/$fromThis/,/$toThis/p" infile
Or, using your code snippet:
Code:
#!/bin/bash

curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html
todaysdate="`date '+%A %d%m%y'`"
sed -n "/$todaysdate/,/to comments/p" crossfit.html >> workout.txt
Hope this helps.
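
To try the range address without downloading anything, here is a minimal sketch that runs it on the sample text from the first post. The function name extract_between and the inline sample are made up for this example; they are not part of the script above. Note that both arguments are treated as regular expressions, so a string containing / or . would need escaping.

```shell
#!/bin/bash

# Hypothetical helper: print every block of lines that starts at a line
# matching $1 and ends at the next line matching $2 (read from stdin).
extract_between() {
    sed -n "/$1/,/$2/p"
}

# Sample text taken from the first post of the thread.
sample='text I don'\''t want
string 1
text I want to grab
more text I want to grab

string 2
more text I don'\''t want'

result=$(printf '%s\n' "$sample" | extract_between "string 1" "string 2")
printf '%s\n' "$result"
```

The two delimiter lines are part of the output; sed prints the range inclusively.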
 
Old 08-07-2006, 02:57 PM   #3
changcheh
Member
 
Registered: Nov 2003
Location: china
Distribution: Linux Mint
Posts: 49

Original Poster
Rep: Reputation: 15
Thanks for the reply Druuna

However, the script still does not work. I removed the -n option from sed so it would write its output to a new text file, and changed crossfit.html to crossfit.txt, to give:
sed "/$todaysdate/,/to comments/p" crossfit.txt >> workout.txt

But the text file contains all of the original document, not just the text between the patterns. I have played around with switches using /p,/P,/n,/N and /x and every time I get the whole document. What am I doing wrong?

I tried both your script and the modified version of mine. I also entered the date manually as 060807, since it's incorrect on the crossfit page today, e.g.

sed "/'Monday 060807'/,/'toThis'/x" crossfit.txt >> workout.txt

Any ideas?
 
Old 08-07-2006, 03:14 PM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2373
Hi,

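It's not clear to me what's wrong:
- Why did you remove the -n option? It's essential for the sed command to work correctly: without it, sed prints every line of the input, and the p command prints the lines in the range a second time.
- Could it be that sed is being greedy, i.e. are the two search strings unique in the file?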
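
A small sketch of the effect of -n, using made-up input (the START and END markers are placeholders, not strings from the crossfit page):

```shell
#!/bin/bash

input='before
START
wanted
END
after'

# Without -n: sed prints every line once by default, and the p command
# prints the lines inside the range a second time.
without_n=$(printf '%s\n' "$input" | sed '/START/,/END/p')

# With -n: automatic printing is switched off, so only p produces output.
with_n=$(printf '%s\n' "$input" | sed -n '/START/,/END/p')

echo "--- without -n ---"
echo "$without_n"
echo "--- with -n ---"
echo "$with_n"
```

Without -n the whole input comes back with the range lines doubled, which matches the "I get the whole document" symptom.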

If that doesn't help:

Could you post a few examples of what you tried (and the results)?

Hope this helps.
 
Old 08-07-2006, 03:44 PM   #5
changcheh
Member
 
Registered: Nov 2003
Location: china
Distribution: Linux Mint
Posts: 49

Original Poster
Rep: Reputation: 15
Hi Druuna

Thank you very much for your help. I have it working now using this script:

#!/bin/bash

curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html

todayDate="`date '+%B %d, %Y'`"

sed -n "/$todayDate/,/to comments./p" crossfit.txt > workout.txt


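I had written the last line like this:
sed -n "/'$todayDate'/,/'to comments.'/p" crossfit.txt > workout.txt
with extra single quotes inside the patterns, and that is why it didn't work: inside the double quotes, sed treats the single quotes as literal characters to match.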
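
For anyone who wants to see the difference, a minimal sketch on made-up page text (the date is a fixed example value rather than the output of date, and the page variable is invented for the illustration):

```shell
#!/bin/bash

todayDate="August 07, 2006"   # fixed example value, not the output of date

page='header
August 07, 2006
Workout text
Post time to comments.
footer'

# Inside double quotes the single quotes are ordinary characters, so sed
# searches for the date and the end string with quote marks around them.
quoted=$(printf '%s\n' "$page" | sed -n "/'$todayDate'/,/'to comments.'/p")

# Without the inner quotes the patterns match the page text.
unquoted=$(printf '%s\n' "$page" | sed -n "/$todayDate/,/to comments./p")

echo "with inner quotes:    [$quoted]"
echo "without inner quotes: [$unquoted]"
```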

Thanks again
Johnny
 
Old 08-07-2006, 06:10 PM   #6
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
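The search term "Monday 060807" is not unique to the page, and for the other matches there is no line matching the end of the range, so sed keeps printing to the end of the file. A false positive on the start pattern is what was causing the problem. Matching the surrounding HTML as well makes the start pattern unique: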

sed -n '/<h3 class="title">'"$todayDate"'<\/h3>/,/to comment/p' crossfit.html

I studied the problem like this:
cat -n crossfit.html | sed -n '/Monday 060807/p'

The cat -n part will add line numbers to each line which makes it easier to see what the ranges are.
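
The same investigation can be sketched on a small made-up page. The page text and the ^...$ anchoring are assumptions for this illustration; on the real page the surrounding HTML tags play the role of the anchors.

```shell
#!/bin/bash

page='menu: Monday 060807
sidebar
Monday 060807
workout
Post time to comments.
archive link: Monday 060807
footer'

# Number the lines to see where each match of the start pattern falls.
matches=$(printf '%s\n' "$page" | cat -n | sed -n '/Monday 060807/p')
echo "$matches"

# Loose pattern: the first range starts at the menu line instead of the
# real heading, and another range starts at the archive link and never
# finds an end pattern, so sed prints through to the end of the input.
loose=$(printf '%s\n' "$page" | sed -n '/Monday 060807/,/to comments/p')

# Tighter pattern: only a line consisting of exactly the date starts a range.
tight=$(printf '%s\n' "$page" | sed -n '/^Monday 060807$/,/to comments/p')

echo "--- loose ---"
echo "$loose"
echo "--- tight ---"
echo "$tight"
```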

Last edited by jschiwal; 08-10-2006 at 03:22 AM.
 
Old 08-13-2006, 07:35 AM   #7
changcheh
Member
 
Registered: Nov 2003
Location: china
Distribution: Linux Mint
Posts: 49

Original Poster
Rep: Reputation: 15
I'd like to thank everybody for their help. I've finally got a working script for the crossfit webpage. It downloads the workout of the day and adds it to an html file, so that you have an archive of workouts.

I used 3 scripts and a text file.
  1. Create a text file called 3days containing a single number: 1 for the first day of the four-day cycle, or 4 for the rest day.
  2. Create the 3 scripts (script, workday and restday) and make them executable: chmod +x scriptname
  3. Create a folder named john, or change the scripts to use another folder name.
  4. Run ./script once a day.

MAIN SCRIPT

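#!/bin/bash
# 3days holds the current day of the four-day cycle:
# 1-3 are workout days, anything else is treated as the rest day.
file=$(cat 3days)
# Quote "$file" so an empty or missing 3days file does not break the test.
if [ "$file" = 1 ]
then
    ./workday
    echo 2 > 3days
elif [ "$file" = 2 ]
then
    ./workday
    echo 3 > 3days
elif [ "$file" = 3 ]
then
    ./workday
    echo 4 > 3days
else
    ./restday
    echo 1 > 3days
fi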
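
The four-day bookkeeping could also be written with arithmetic instead of one branch per day. This is only a sketch: the helper names next_day and script_for_day are hypothetical, and it assumes the stored value is always a number from 1 to 4.

```shell
#!/bin/bash

# Hypothetical helper: given the current day of the cycle (1-4),
# print the next one, wrapping from 4 back to 1.
next_day() {
    echo $(( $1 % 4 + 1 ))
}

# Hypothetical helper: print the name of the script to run for a given day.
script_for_day() {
    case "$1" in
        1|2|3) echo workday ;;
        *)     echo restday ;;
    esac
}

# Show what would happen on each day of the cycle.
for day in 1 2 3 4; do
    echo "day $day: run $(script_for_day "$day"), then store $(next_day "$day")"
done
```

In the main script this would come down to running "./$(script_for_day "$file")" and then writing the result of next_day "$file" back to 3days.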


WORKDAY SCRIPT
#!/bin/bash

curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html

todayDate="$(date '+%B %d, %Y')"

sed -n "/$todayDate/,/to comments./p" crossfit.txt > john/"$todayDate".txt
cat john/"$todayDate".txt >> john/Workouts.txt
rm crossfit.txt

# Extract today's entry and rewrite the "Post ... to comments." lines.
# All substitutions go into one sed call: a pipeline that reads work.html
# and redirects back into work.html truncates the file before it is read.
sed -n '/<div class="date"> '"$todayDate"' <\/div>/,/to comments./p' crossfit.html |
sed -e 's/Post time to comments\./For Time./g' \
    -e 's/Post time and body weight to comments\./For Time./g' \
    -e 's/Post loads to comments\./For Load./g' \
    -e 's/Post your choice of girls and rounds completed to comments\./Choose a Girl for Rounds./g' > work.html
cp work.html john/"$todayDate".html
cat work.html >> john/Workouts.html
rm work.html crossfit.html


REST DAY SCRIPT
#!/bin/bash
curl http://www.crossfit.com > crossfit.html
todayDate="$(date '+%B %d, %Y')"
# Extract today's entry, rewrite the "Post ... to comments." lines and keep
# only the lines that mention "Rest Day". Everything runs in one pipeline:
# reading work.html and redirecting back into it would truncate the file.
sed -n '/<div class="date"> '"$todayDate"' <\/div>/,/to comments./p' crossfit.html |
sed -e 's/Post time to comments\./For Time./g' \
    -e 's/Post time and body weight to comments\./For Time./g' \
    -e 's/Post loads to comments\./For Load./g' \
    -e 's/Post your choice of girls and rounds completed to comments\./Choose a Girl for Rounds./g' |
sed -n '/Rest Day/p' > work.html
cp work.html john/"$todayDate".html
cat work.html >> john/Workouts.html
rm work.html crossfit.html
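
One caution about commands of the form cat file | sed ... > file: the shell opens the output file for writing, which truncates it, while cat is only just starting, so the file is often emptied before it has been read. A safer pattern, sketched here with temporary files standing in for work.html (the variable names are made up for the example), is to write to a second file and move it into place afterwards:

```shell
#!/bin/bash

workfile=$(mktemp)   # stand-in for work.html
tmpfile=$(mktemp)    # scratch file for the edited copy

printf '%s\n' "Post time to comments." "Post loads to comments." > "$workfile"

# Apply all substitutions in one sed call, write the result to the scratch
# file, and replace the original only if sed succeeded.
sed -e 's/Post time to comments\./For Time./g' \
    -e 's/Post loads to comments\./For Load./g' \
    "$workfile" > "$tmpfile" && mv "$tmpfile" "$workfile"

result=$(cat "$workfile")
echo "$result"
rm -f "$workfile"
```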
 
Old 12-29-2006, 09:18 AM   #8
grahamatlq
Member
 
Registered: Dec 2006
Posts: 37

Rep: Reputation: 17
awk format

#!/bin/bash

curl some-site > crossfit.html
textutil -convert txt crossfit.html
date '+%A %d%m%y' > todaysdate
awk -f 'todaysdate',/"to comments"/ crossfit.txt >> workout.txt

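This doesn't work because 'awk -f' expects its argument to be the name of a file containing the awk program, not the input data.
See man awk.

The correct format is (with todaysdate replaced by the actual date text, since awk matches the pattern literally):
awk '/todaysdate/,/to comments/' crossfit.txt >> workout.txt

You could write an awk script to make life easier:
--------------------------
#!/usr/bin/awk -f

# Replace todaysdate with the date text to search for.
# The opening brace must be on the same line as the pattern; on a line of
# its own it starts a second rule that prints every input line.
/todaysdate/,/to comments/ {
    print > "workout.txt"
}
--------------------------
Save this as thescript.awk and make it executable with chmod a+x thescript.awk.

Run it like:

$ curl some-site | ./thescript.awk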
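
To avoid typing the date into the pattern, the shell variable can be handed to awk with -v. A minimal sketch on made-up page text follows; the awk variable name start, the fixed date and the page variable are assumptions for the example. index() is used instead of a regular expression so the date is matched as a plain string.

```shell
#!/bin/bash

# Fixed example value; a real script would use: todayDate=$(date '+%B %d, %Y')
todayDate="August 07, 2006"

page='header
August 07, 2006
Workout text
Post time to comments.
footer'

# -v copies the shell variable into the awk variable "start".
# The range begins at a line containing that string and ends at the first
# following line matching /to comments/. With no action given, awk prints
# the lines in the range.
result=$(printf '%s\n' "$page" |
    awk -v start="$todayDate" 'index($0, start), /to comments/')
echo "$result"
```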
 
  



Tags
awk, bash, curl, script, website


All times are GMT -5.