LinuxQuestions.org


changcheh 08-07-2006 05:07 AM

awk searching a string from a file within another file
 
I'm trying to copy some text from a website to a file using awk. The text I want is between 2 strings I know. e.g.

text I don't want
string 1
text I want to grab
more text I want to grab

string 2
more text I don't want

I've written a bash script which works if I manually enter the date. However, I'd like to automate date entry.

I used awk -f to read the date from a file, but awk thinks the filename is todaysdate,/to comments/. I've tried using " and ' to quote the filename, but awk ignores this. Is there a better way to input the date (pipes, setting variables, etc.)?


#!/bin/bash

curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html
date '+%A %d%m%y' > todaysdate
awk -f 'todaysdate',/"to comments"/ crossfit.txt >> workout.txt

thanks a lot

druuna 08-07-2006 05:40 AM

Hi,

Using awk and a separate file with the date for this is too complicated (my opinion). Using sed and variables would make this a lot easier, as this example shows:

Code:

#!/bin/bash

fromThis="string 1"
toThis="string 2"

sed -n "/$fromThis/,/$toThis/p" infile

Or, using your code snippet:
Code:

#!/bin/bash

curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html
todaysdate="`date '+%A %d%m%y'`"
sed -n "/$todaysdate/,/to comments/p" crossfit.html >> workout.txt

Hope this helps.
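
To see what the range expression prints, here is a small self-contained sketch run against the sample text from the first post. The function name extract_range and the inline sample are made up for illustration, and it assumes the two strings contain nothing that is special to sed (such as / or .).

```shell
#!/bin/bash

# extract_range START END: print every line from the first line matching
# START through the next line matching END (both inclusive).
# Hypothetical helper; START and END are used as sed regular expressions.
extract_range() {
    sed -n "/$1/,/$2/p"
}

# Sample input taken from the first post of this thread.
extract_range "string 1" "string 2" <<'EOF'
text I don't want
string 1
text I want to grab
more text I want to grab

string 2
more text I don't want
EOF
```

Only the lines from "string 1" through "string 2" should come out; the "don't want" lines on either side are dropped.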

changcheh 08-07-2006 02:57 PM

Thanks for the reply Druuna

However, the script still does not work. I removed the -n option from sed so it would write its output to a new text file and changed crossfit.html to crossfit.txt to give:
sed "/$todaysdate/,/to comments/p" crossfit.txt >> workout.txt

But the text file contains all of the original document, not just the pattern. I have played around with switches using /p,/P,/n,/N and /x and every time I get the whole document. What am I doing wrong?

I tried using your script and the modified version of mine. I also manually entered the date, 060807, as it's incorrect on the crossfit page today, e.g.

sed "/'Monday 060807'/,/'toThis'/x" crossfit.txt >> workout.txt

Any ideas?

druuna 08-07-2006 03:14 PM

Hi,

It's not clear to me what's wrong:
- Why did you remove the -n option? It's essential for the sed command to work correctly.
- Could it be that sed is being greedy, i.e. are the 2 strings unique in the file?

If that doesn't help:

Could you post a few examples of what you tried (and the results)?

Hope this helps.
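
The effect of -n can be checked on a tiny made-up sample. This is only a sketch: the four input lines and the variable names are invented for the demonstration.

```shell
#!/bin/bash

# Made-up four-line input for the demonstration.
sample() { printf 'before\nstart\nend\nafter\n'; }

# Without -n, sed prints every line by default, and the p command prints
# the lines inside the range a second time.
without_n=$(sample | sed '/start/,/end/p')

# With -n, the default printing is switched off, so only p produces output.
with_n=$(sample | sed -n '/start/,/end/p')

printf 'without -n:\n%s\n\nwith -n:\n%s\n' "$without_n" "$with_n"
```

Without -n the whole input comes back with the range lines doubled, which matches the "I get the whole document" symptom described above.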

changcheh 08-07-2006 03:44 PM

Hi Druuna

Thank you very much for your help. I have it working now using this script:

#!/bin/bash

curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html

todayDate="`date '+%B %d, %Y'`"

sed -n "/$todayDate/,/to comments./p" crossfit.txt > workout.txt


I had written the last line like this:
sed -n "/'$todayDate'/,/'to comments.'/p" crossfit.txt > workout.txt
with single quotes inside the patterns, and that is why it didn't work.

Thanks again
Johnny
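
For anyone wondering why the extra quotes matter: inside a double-quoted string the shell treats single quotes as ordinary characters, so they become part of the pattern sed looks for. A minimal sketch, using a fixed sample date and a made-up page (both are assumptions for illustration):

```shell
#!/bin/bash

todayDate="August 07, 2006"   # fixed sample value, for illustration only

# Inside double quotes the single quotes are literal, so this pattern asks
# sed for the date with a ' character on each side of it.
quoted="/'$todayDate'/,/'to comments.'/p"
plain="/$todayDate/,/to comments./p"

# Show the two sed programs exactly as sed receives them.
printf '%s\n%s\n' "$quoted" "$plain"

# Made-up page content for the demonstration.
page() {
    printf 'header\nAugust 07, 2006\nworkout\nPost time to comments.\nfooter\n'
}

quoted_out=$(page | sed -n "$quoted")   # nothing matches, so this stays empty
plain_out=$(page | sed -n "$plain")     # the date line through the comments line

printf 'quoted pattern printed: [%s]\n' "$quoted_out"
printf 'plain pattern printed:\n%s\n' "$plain_out"
```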

jschiwal 08-07-2006 06:10 PM

The search term "Monday 060807" is not unique on the page, and for the other matches there is no line that matches the end of the range, so sed keeps printing until the end of the file. That false positive is what was causing the problem.

sed -n '/<h3 class="title">'"$todayDate"'<\/h3>/,/to comment/p' crossfit.html

I studied the problem like this:
cat -n crossfit.html | sed -n '/Monday 060807/p'

The cat -n part will add line numbers to each line which makes it easier to see what the ranges are.
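
A small sketch of the false-positive behaviour, using a made-up page (the real crossfit.html markup differs; the h3 tags here are simplified for illustration):

```shell
#!/bin/bash

# Made-up page in which the date appears three times.
page() {
    printf '%s\n' \
        'menu: Monday 060807' \
        'sidebar' \
        '<h3>Monday 060807</h3>' \
        'workout' \
        'Post time to comments' \
        'footer' \
        'archive: Monday 060807' \
        'trailing'
}

# Loose start pattern: the range opens at the first mention of the date,
# and the last mention opens a second range that never finds its end,
# so it runs on to the end of the input.
loose=$(page | sed -n '/Monday 060807/,/to comments/p')

# Start pattern anchored on the surrounding markup: only the wanted block.
strict=$(page | sed -n '/<h3>Monday 060807<\/h3>/,/to comments/p')

# Line numbers of every line that mentions the date.
page | cat -n | sed -n '/Monday 060807/p'

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

With the loose pattern everything except the "footer" line is printed; the stricter pattern returns just the three lines of the workout block.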

changcheh 08-13-2006 07:35 AM

I'd like to thank everybody for their help. I've finally got a working script for the crossfit webpage. It downloads the workout of the day and adds it to an html file, so that you have an archive of workouts.

I used 3 scripts and a text file.
  1. Create a text file called 3days with a single number in it - e.g. for the 1st day of the four day cycle set it to 1, or for the rest day set it to 4.
  2. Create the 3 scripts: script, workday and restday and make them executable; chmod +x scriptsname
  3. Create a folder named john, or change the script to specify another folder name
  4. Run the ./script once a day

MAIN SCRIPT

#!/bin/bash
file=`cat 3days`
if [ $file = 1 ]
then
./workday
echo 2 > 3days
elif [ $file = 2 ]
then
./workday
echo 3 > 3days
elif [ $file = 3 ]
then
./workday
echo 4 > 3days
else
./restday
echo 1 > 3days
fi
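
The same three-days-on, one-day-off cycle could also be sketched with case statements. This is just an alternative layout, not a replacement for the script above; the function names next_day and script_for are made up, and the loop only prints what would happen rather than running anything.

```shell
#!/bin/bash

# next_day N: print the counter value that follows N in the 1-2-3-4 cycle.
# Anything other than 1, 2 or 3 is treated as the rest day and wraps to 1.
next_day() {
    case "$1" in
        1|2|3) echo $(( $1 + 1 )) ;;
        *)     echo 1 ;;
    esac
}

# script_for N: print the name of the script to run on day N of the cycle.
script_for() {
    case "$1" in
        1|2|3) echo ./workday ;;
        *)     echo ./restday ;;
    esac
}

# Dry run over one full cycle.
for day in 1 2 3 4; do
    printf 'day %s: run %s, then store %s in 3days\n' \
        "$day" "$(script_for "$day")" "$(next_day "$day")"
done
```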


WORKDAY SCRIPT
#!/bin/bash

curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html

todayDate="`date '+%B %d, %Y'`"

sed -n "/$todayDate/,/to comments./p" crossfit.txt > john/"$todayDate".txt
cat john/"$todayDate".txt >> john/Workouts.txt
rm crossfit.txt

# Do the extraction and all substitutions in one pipeline. Redirecting
# output to the file that is being read (cat work.html | sed ... > work.html)
# lets the shell empty the file before it has been read.
sed -n '/<div class="date"> '"$todayDate"' <\/div>/,/to comments./p' crossfit.html |
sed -e 's/Post time to comments\./For Time./g' \
    -e 's/Post time and body weight to comments\./For Time./g' \
    -e 's/Post loads to comments\./For Load./g' \
    -e 's/Post your choice of girls and rounds completed to comments\./Choose a Girl for Rounds./g' > work.html
cat work.html > john/"$todayDate".html
cat work.html >> john/Workouts.html
rm work.html
rm crossfit.html


REST DAY SCRIPT
#!/bin/bash
curl http://www.crossfit.com > crossfit.html
todayDate="`date '+%B %d, %Y'`"
# Run the whole chain as one pipeline. Redirecting output to the file that
# is being read (cat work.html | sed ... > work.html) lets the shell empty
# the file before it has been read.
sed -n '/<div class="date"> '"$todayDate"' <\/div>/,/to comments./p' crossfit.html |
sed -e 's/Post time to comments\./For Time./g' \
    -e 's/Post time and body weight to comments\./For Time./g' \
    -e 's/Post loads to comments\./For Load./g' \
    -e 's/Post your choice of girls and rounds completed to comments\./Choose a Girl for Rounds./g' |
sed -n '/Rest Day/p' > work.html
cat work.html > john/"$todayDate".html
cat work.html >> john/Workouts.html
rm work.html
rm crossfit.html

grahamatlq 12-29-2006 09:18 AM

awk format
 
#!/bin/bash

curl some-site > crossfit.html
textutil -convert txt crossfit.html
date '+%A %d%m%y' > todaysdate
awk -f 'todaysdate',/"to comments"/ crossfit.txt >> workout.txt

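This doesn't work because 'awk -f' expects the name of a file containing an awk program, not a file of text to search for.
See man awk.

The correct format for a range is shown below, where todaysdate stands for the actual date text (awk matches the pattern literally; it does not read the todaysdate file):
awk '/todaysdate/,/to comments/' crossfit.txt >> workout.txt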

You could write an awk script to make life easier:
--------------------------
#!/usr/bin/awk -f

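# The opening brace must be on the same line as the pattern; on a line of
# its own it becomes a separate rule that runs for every input line.
/todaysdate/,/to comments/ {
    print > "workout.txt"
}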
--------------------------
Save this as thescript.awk and make it executable with chmod a+x thescript.awk.

Run it like:

$ curl some-site | ./thescript.awk
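
To feed the date from a shell variable into awk, which was the original question, one option is awk's -v flag. A minimal sketch: the function name awk_range, the fixed sample date, and the sample lines are all made up for illustration. Note that -v interprets backslash escapes in the value, which is harmless for a date string.

```shell
#!/bin/bash

# awk_range START: print from the first line containing the text START
# through the next line matching "to comments" (both inclusive).
# Hypothetical helper; index() does a plain substring test, so START is
# not treated as a regular expression.
awk_range() {
    awk -v start="$1" 'index($0, start), /to comments/'
}

todaysdate="Monday 060807"   # fixed sample; a real script would take this from date

printf '%s\n' 'intro' 'Monday 060807' 'workout' 'Post time to comments' 'footer' |
    awk_range "$todaysdate"
```

This should print the three lines from the date through "Post time to comments", with no temporary todaysdate file needed.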


All times are GMT -5.