LinuxQuestions.org - awk searching a string from a file within another file

awk searching a string from a file within another file

I'm trying to copy some text from a website to a file using awk. The text I want is between 2 strings I know. e.g.

text I don't want
string 1
text I want to grab
more text I want to grab
string 2
more text I don't want

I've written a bash script which works if I manually enter the date. However, I'd like to automate date entry.

I used awk -f to read the date from a file, but awk treats the whole argument, todaysdate,/to comments/, as the filename. I've tried quoting the filename with " and ', but awk ignores this. Is there a better way to input the date (pipes, setting variables, etc.)?

#!/bin/bash

curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html
date '+%A %d%m%y' > todaysdate
awk -f 'todaysdate',/"to comments"/ crossfit.txt >> workout.txt

thanks a lot

Hi,

Using awk and a separate file with the date for this is too complicated (in my opinion). Using sed and variables would make this a lot easier, as this example shows:

Code:

#!/bin/bash

fromThis="string 1"
toThis="string 2"

sed -n "/$fromThis/,/$toThis/p" infile
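To see what the range form prints, here is a minimal sketch run against a throwaway file. The file contents copy the layout from the question, and the temporary directory is only an assumption made so the example is self-contained.

```shell
#!/bin/bash
# Sample input mirroring the layout described in the question.
workdir=$(mktemp -d)
cat > "$workdir/infile" <<'EOF'
text I don't want
string 1
text I want to grab
more text I want to grab
string 2
more text I don't want
EOF

fromThis="string 1"
toThis="string 2"

# -n suppresses sed's default output; p prints only the lines inside the range.
result=$(sed -n "/$fromThis/,/$toThis/p" "$workdir/infile")
printf '%s\n' "$result"
```

Note that the range is inclusive: the lines matching "string 1" and "string 2" are printed along with everything between them.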

Or, using your code snippet:

Code:

#!/bin/bash

curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html

todaysdate="`date '+%A %d%m%y'`"

sed -n "/$todaysdate/,/to comments/p" crossfit.html >> workout.txt

Hope this helps.

Thanks for the reply Druuna

However, the script still does not work. I removed the -n option from sed so it would write its output to a new text file, and changed crossfit.html to crossfit.txt, to give:
sed "/$todaysdate/,/to comments/p" crossfit.txt >> workout.txt

But the text file contains all of the original document, not just the pattern. I have played around with switches using /p,/P,/n,/N and /x and every time I get the whole document. What am I doing wrong?

I tried both your script and the modified version of mine. I also entered the date manually (060807), as it's incorrect on the crossfit page today, e.g.:

sed "/'Monday 060807'/,/'toThis'/x" crossfit.txt >> workout.txt

Any Ideas?

Hi,

It's not clear to me what's wrong:
- Why did you remove the -n option? It's essential for the sed command to work correctly.
- Could it be that sed is being greedy, i.e. are the two strings unique in the file?

If that doesn't help:

Could you post a few examples of what you tried (and the results)?

Hope this helps.
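For anyone puzzled by the "whole document" result: a small sketch of what -n changes, using made-up sample lines rather than the real page. Without -n, sed prints every line once by default and the p command prints the range lines a second time.

```shell
#!/bin/bash
# Five made-up lines; "start" and "end" stand in for the two known strings.
sample=$(printf '%s\n' "before" "start" "wanted" "end" "after")

# Without -n: all lines are printed, and the range lines appear twice.
without_n=$(printf '%s\n' "$sample" | sed '/start/,/end/p')

# With -n: only the lines selected by p are printed.
with_n=$(printf '%s\n' "$sample" | sed -n '/start/,/end/p')

echo "--- without -n ---"
printf '%s\n' "$without_n"
echo "--- with -n ---"
printf '%s\n' "$with_n"
```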

Hi Druuna

Thank you very much for your help. I have it working now using this script:

#!/bin/bash

curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html

todayDate="`date '+%B %d, %Y'`"

sed -n "/$todayDate/,/to comments./p" crossfit.txt > workout.txt

I had written the last line like this:

sed -n "/'$todayDate'/,/'to comments.'/p" crossfit.txt > workout.txt

with single quotes inside the pattern, and that is why it didn't work.
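The difference between the two lines comes down to shell quoting: inside double quotes, a single quote is an ordinary character, so sed ends up searching for text that includes the quote marks. A minimal sketch with a fixed sample date and made-up page text (both are assumptions for illustration):

```shell
#!/bin/bash
todayDate="August 06, 2007"   # fixed sample value so the result is repeatable
sample=$(printf '%s\n' "header" "August 06, 2007" "workout text" "Post time to comments." "footer")

# The inner single quotes are passed to sed literally, so nothing matches.
quoted=$(printf '%s\n' "$sample" | sed -n "/'$todayDate'/,/'to comments.'/p")

# Without the inner quotes, the range is found.
unquoted=$(printf '%s\n' "$sample" | sed -n "/$todayDate/,/to comments./p")

echo "with inner quotes:    [$quoted]"
echo "without inner quotes: [$unquoted]"
```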

Thanks again
Johnny

The search term "Monday 060807" is not unique to the page. But on the other matches, there is no end range match. So a false positive is what was causing the problem.

sed -n '/<h3 class="title">'"$todayDate"'<\/h3>/,/to comment/p' crossfit.html

I studied the problem like this:
cat -n crossfit.html | sed -n '/Monday 060807/p'

The cat -n part will add line numbers to each line which makes it easier to see what the ranges are.
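The false positive can be reproduced on a small made-up page. This sketch uses sed's = command to list the matching line numbers (an alternative to cat -n) and compares the loose pattern with the anchored one; the HTML below is invented for illustration and is not the real crossfit markup.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cat > "$workdir/page.html" <<'EOF'
<h3 class="title">Monday 060807</h3>
<p>workout</p>
<p>Post time to comments.</p>
<p>archive link: Monday 060807</p>
<p>footer</p>
EOF

todayDate="Monday 060807"

# Line numbers of every line that contains the date.
matches=$(sed -n "/$todayDate/=" "$workdir/page.html")

# Loose start pattern: the second match opens a range that never closes,
# so sed keeps printing until the end of the file.
loose=$(sed -n "/$todayDate/,/to comment/p" "$workdir/page.html")

# Anchoring the start pattern on the surrounding tag avoids the second match.
strict=$(sed -n '/<h3 class="title">'"$todayDate"'<\/h3>/,/to comment/p' "$workdir/page.html")

echo "date found on lines:" $matches
echo "loose range prints $(printf '%s\n' "$loose" | wc -l | tr -d ' ') lines"
echo "strict range prints $(printf '%s\n' "$strict" | wc -l | tr -d ' ') lines"
```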

I'd like to thank everybody for their help. I've finally got a working script for the crossfit webpage. It downloads the workout of the day and adds it to an html file, so that you have an archive of workouts.

I used 3 scripts and a text file.

Create a text file called 3days with a single number in it - e.g. for the 1st day of the four day cycle set it to 1, or for the rest day set it to 4.
Create the 3 scripts: script, workday and restday and make them executable; chmod +x scriptsname
Create a folder named john, or change the script to specify another folder name
Run the ./script once a day

MAIN SCRIPT

#!/bin/bash
file=`cat 3days`
if [ $file = 1 ]
then
./workday
echo 2 > 3days
elif [ $file = 2 ]
then
./workday
echo 3 > 3days
elif [ $file = 3 ]
then
./workday
echo 4 > 3days
else
./restday
echo 1 > 3days
fi
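The three-days-on, one-day-off counter can also be written with a case statement. This is only a sketch: workday and restday are stand-in functions here (assumptions, so it runs without the real scripts or network access), and the 3days file lives in a temporary directory.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo 1 > 3days

# Stand-ins for ./workday and ./restday.
workday() { echo "work day"; }
restday() { echo "rest day"; }

run_once() {
    local day
    day=$(cat 3days)
    case "$day" in
        1|2|3) workday; echo $((day + 1)) > 3days ;;  # advance through the work days
        *)     restday; echo 1 > 3days ;;             # rest, then restart the cycle
    esac
}

# Simulate five consecutive daily runs.
log=$(for i in 1 2 3 4 5; do run_once; done)
printf '%s\n' "$log"
```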

WORKDAY SCRIPT
#!/bin/bash

curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html

todayDate="`date '+%B %d, %Y'`"

sed -n "/$todayDate/,/to comments./p" crossfit.txt > john/"$todayDate".txt
cat john/"$todayDate".txt >> john/Workouts.txt
rm crossfit.txt

# Build work.html in one pipeline. Redirecting sed's output to the same file it is reading can truncate the file before it is read.
sed -n '/<div class="date"> '"$todayDate"' <\/div>/,/to comments./p' crossfit.html |
sed -e 's/Post time to comments./For Time./g' \
    -e 's/Post time and body weight to comments./For Time./g' \
    -e 's/Post loads to comments./For Load./g' \
    -e 's/Post your choice of girls and rounds completed to comments./Choose a Girl for Rounds./g' > work.html
cat work.html > john/"$todayDate".html
cat work.html >> john/Workouts.html
rm work.html
rm crossfit.html

REST DAY SCRIPT
#!/bin/bash
curl http://www.crossfit.com > crossfit.html
todayDate="`date '+%B %d, %Y'`"
# Build work.html in one pipeline. Redirecting sed's output to the same file it is reading can truncate the file before it is read.
sed -n '/<div class="date"> '"$todayDate"' <\/div>/,/to comments./p' crossfit.html |
sed -e 's/Post time to comments./For Time./g' \
    -e 's/Post time and body weight to comments./For Time./g' \
    -e 's/Post loads to comments./For Load./g' \
    -e 's/Post your choice of girls and rounds completed to comments./Choose a Girl for Rounds./g' |
sed -n -e '/Rest Day/p' > work.html
cat work.html > john/"$todayDate".html
cat work.html >> john/Workouts.html
rm work.html
rm crossfit.html
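A general note for anyone adapting these scripts: when a file has to be rewritten in place, one common approach is to write to a second file and then move it over the original. The sketch below uses a throwaway work.html with made-up contents; the .tmp name is just an assumption.

```shell
#!/bin/bash
workdir=$(mktemp -d)
printf '%s\n' "Fran" "Post time to comments." > "$workdir/work.html"

# Write the edited text to a temporary file, then replace the original,
# so the input is never truncated while it is still being read.
sed 's/Post time to comments\./For Time./g' "$workdir/work.html" > "$workdir/work.html.tmp" &&
    mv "$workdir/work.html.tmp" "$workdir/work.html"

cat "$workdir/work.html"
```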

#!/bin/bash

curl some-site > crossfit.html
textutil -convert txt crossfit.html
date '+%A %d%m%y' > todaysdate
awk -f 'todaysdate',/"to comments"/ crossfit.txt >> workout.txt

This doesn't work because 'awk -f' expects its argument to be the name of a file containing an awk program, not text to search for.
See man awk.

The correct format is:
awk '/todaysdate/,/to comments/' crossfit.txt >> workout.txt
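In that form, /todaysdate/ matches the literal text "todaysdate". To search for the actual date, one option is to pass it in as an awk variable with -v and match it dynamically. A minimal sketch, using a fixed sample date and made-up input (the variable name start is an assumption):

```shell
#!/bin/bash
todaysdate="Monday 060807"   # fixed sample; a real script would take this from date
sample=$(printf '%s\n' "intro" "Monday 060807" "workout" "Post time to comments" "outro")

# -v copies the shell value into the awk variable start;
# "$0 ~ start" uses that value as the regular expression that opens the range.
result=$(printf '%s\n' "$sample" | awk -v start="$todaysdate" '$0 ~ start, /to comments/')
printf '%s\n' "$result"
```

Because start is treated as a regular expression, a date containing regex metacharacters would need escaping; the plain date formats used in this thread do not.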

You could write an awk script to make life easier:
--------------------------
#!/usr/bin/awk -f

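# The opening brace must be on the same line as the pattern;
# otherwise the block applies to every input line.
/todaysdate/,/to comments/ {
    print > "workout.txt"
}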
--------------------------
Save this as thescript.awk and make it executable (chmod a+x thescript.awk).

Run it like:

$ curl some-site | ./thescript.awk