awk searching a string from a file within another file
I'm trying to copy some text from a website to a file using awk. The text I want is between 2 strings I know. e.g.
text I don't want string 1
text I want to grab
more text I want to grab string 2 more text I don't want
I've written a bash script which works if I manually enter the date. However, I'd like to automate date entry.
I used awk -f to read from a file, but awk thinks the filename is todaysdate,/to comments/. I've tried using " and ' to quote the filename, but awk ignores this. Is there a better way to input the date (pipes, setting variables, etc.)?
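One way to hand the date to awk without a separate program file is the -v option, which sets an awk variable from the shell. The sketch below is not the poster's script: the sample text, the variable names and the use of index() for literal matching are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical sample input standing in for the downloaded page.
sample=$(mktemp)
cat > "$sample" <<'EOF'
text I don't want
Monday 060807
text I want to grab
more text I want to grab
Post time to comments.
more text I don't want
EOF

todaysdate="Monday 060807"   # assumed value; a real script would build this with date

# -v copies the shell value into an awk variable. index() does a literal
# substring test, so the date is not interpreted as a regular expression.
# The range "start, end" selects lines from the start match through the end match.
result=$(awk -v start="$todaysdate" -v end="to comments" \
    'index($0, start), index($0, end)' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

With this form the date never appears where awk expects a filename, which is the confusion described above.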
Using awk and a separate file with the date for this is too complicated (in my opinion). Using sed and variables would make this a lot easier, as this example shows:
Code:
#!/bin/bash
fromThis="string 1"
toThis="string 2"
sed -n "/$fromThis/,/$toThis/p" infile
However, the script still does not work. I removed the -n option from sed so it would write its output to a new text file, and changed crossfit.html to crossfit.txt, to give: sed "/$todaysdate/,/to comments/p" crossfit.txt >> workout.txt
But the text file contains all of the original document, not just the pattern. I have played around with switches using /p,/P,/n,/N and /x and every time I get the whole document. What am I doing wrong?
I tried using your script and the modified version of mine. I also entered the date manually (060807), as it's incorrect on the crossfit page today, e.g.
sed "/'Monday 060807'/,/'toThis'/x" crossfit.txt >> workout.txt
It's not clear to me what's wrong:
- Why did you remove the -n option? It's essential for the sed command to work correctly.
- Could it be that sed is being greedy? I.e., are the 2 string pairs unique?
If that doesn't help:
Could you post a few examples of what you tried (and the results)?
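To see why -n matters: without it sed prints every input line automatically, and the p command then prints the lines in the range a second time. A small sketch with made-up input (START and END are hypothetical markers, not strings from the crossfit page):

```shell
#!/bin/bash
# Made-up five-line input.
input=$(printf '%s\n' before START wanted END after)

# Without -n: every line is auto-printed, and lines in the range are printed again by p.
without_n=$(printf '%s\n' "$input" | sed '/START/,/END/p')

# With -n: auto-print is off, so only the lines selected by p appear.
with_n=$(printf '%s\n' "$input" | sed -n '/START/,/END/p')

echo "without -n: $(printf '%s\n' "$without_n" | wc -l) lines"
echo "with -n:    $(printf '%s\n' "$with_n" | wc -l) lines"
```

Redirecting the output with > or >> works the same way with or without -n, so the option does not need to be dropped in order to write to a file.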
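The search term "Monday 060807" is not unique to the page. On the other matches there is no end-of-range match, so sed keeps printing until the end of the file. That false positive is what was causing the problem. Anchoring the start pattern to the heading markup makes it unique: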
sed -n '/<h3 class="title">'"$todayDate"'<\/h3>/,/to comment/p' crossfit.html
I studied the problem like this:
cat -n crossfit.html | sed -n '/Monday 060807/p'
The cat -n part will add line numbers to each line which makes it easier to see what the ranges are.
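The false positive described above can be reproduced on a small scale. The page content below is invented for the demonstration; only the two sed commands mirror the ones discussed in this thread.

```shell
#!/bin/bash
# Hypothetical page: the date appears in the real heading and again in a later link.
page=$(mktemp)
cat > "$page" <<'EOF'
<p>header</p>
<h3 class="title">Monday 060807</h3>
<p>Workout text</p>
<p>Post time to comments.</p>
<p>unrelated</p>
<a href="#">Monday 060807</a>
<p>footer one</p>
<p>footer two</p>
EOF

todayDate="Monday 060807"

# Loose start pattern: the later link opens a second range that never
# finds "to comments", so it runs on to the last line of the file.
loose=$(sed -n "/$todayDate/,/to comments/p" "$page")

# Tight start pattern: only the heading can open the range.
tight=$(sed -n '/<h3 class="title">'"$todayDate"'<\/h3>/,/to comments/p' "$page")

echo "loose match: $(printf '%s\n' "$loose" | wc -l) lines"
echo "tight match: $(printf '%s\n' "$tight" | wc -l) lines"
rm -f "$page"
```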
I'd like to thank everybody for their help. I've finally got a working script for the crossfit webpage. It downloads the workout of the day and adds it to an html file, so that you have an archive of workouts.
I used 3 scripts and a text file.
Create a text file called 3days with a single number in it - e.g. for the 1st day of the four day cycle set it to 1, or for the rest day set it to 4.
Create the 3 scripts: script, workday and restday and make them executable; chmod +x scriptsname
Create a folder named john, or change the script to specify another folder name
Run the ./script once a day
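The four-day cycle in the steps above (three workout days, then a rest day) can also be advanced with a case statement. This is only a sketch: next_day is a hypothetical helper, a temporary file stands in for 3days, and the ./workday and ./restday calls are replaced by echo so the example runs on its own.

```shell
#!/bin/bash
# next_day is a hypothetical helper: given the current day (1-4), print
# which script would run and the number to store for tomorrow.
next_day() {
    case "$1" in
        1|2|3) echo "workday $(( $1 + 1 ))" ;;
        *)     echo "restday 1" ;;   # day 4, or anything unexpected, restarts the cycle
    esac
}

counter=$(mktemp)          # stands in for the 3days file
echo 3 > "$counter"

# Split the helper's two-word answer into positional parameters.
set -- $(next_day "$(cat "$counter")")
action=$1
tomorrow=$2

echo "today: $action, tomorrow's counter: $tomorrow"
echo "$tomorrow" > "$counter"
stored=$(cat "$counter")
rm -f "$counter"
```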
MAIN SCRIPT
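#!/bin/bash
# 3days holds the current position (1-4) in the four-day cycle.
file=$(cat 3days)
if [ "$file" = 1 ]
then
./workday
echo 2 > 3days
elif [ "$file" = 2 ]
then
./workday
echo 3 > 3days
elif [ "$file" = 3 ]
then
./workday
echo 4 > 3days
else
# Day 4 is the rest day; afterwards the cycle starts again at 1.
./restday
echo 1 > 3days
fi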
WORK DAY SCRIPT
#!/bin/bash
curl http://www.crossfit.com > crossfit.html
todayDate=$(date '+%B %d, %Y')
# Extract today's entry and rewrite the scoring lines in a single pipeline.
sed -n '/<div class="date"> '"$todayDate"' <\/div>/,/to comments./p' crossfit.html |
sed -e 's/Post time to comments\./For Time./g' \
    -e 's/Post time and body weight to comments\./For Time./g' \
    -e 's/Post loads to comments\./For Load./g' \
    -e 's/Post your choice of girls and rounds completed to comments\./Choose a Girl for Rounds./g' > work.html
cp work.html john/"$todayDate".html
cat work.html >> john/Workouts.html
rm work.html
rm crossfit.html
REST DAY SCRIPT
#!/bin/bash
curl http://www.crossfit.com > crossfit.html
todayDate=$(date '+%B %d, %Y')
# Extract today's entry, rewrite the scoring lines, then keep only the Rest Day lines.
sed -n '/<div class="date"> '"$todayDate"' <\/div>/,/to comments./p' crossfit.html |
sed -e 's/Post time to comments\./For Time./g' \
    -e 's/Post time and body weight to comments\./For Time./g' \
    -e 's/Post loads to comments\./For Load./g' \
    -e 's/Post your choice of girls and rounds completed to comments\./Choose a Girl for Rounds./g' |
sed -n -e '/Rest Day/p' > work.html
cp work.html john/"$todayDate".html
cat work.html >> john/Workouts.html
rm work.html
rm crossfit.html
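A general caution when chaining sed edits on one file: in a command such as cat f | sed ... > f, the shell opens f for writing, and truncates it, before cat is guaranteed to have read it, so the result is often an empty file. A sketch of a safer pattern, using a made-up two-line file and several -e expressions in one sed call:

```shell
#!/bin/bash
# Made-up working file with two of the phrases to be rewritten.
work=$(mktemp)
printf '%s\n' 'Post time to comments.' 'Post loads to comments.' > "$work"

# Write to a second temporary file and rename it over the original,
# instead of redirecting onto the file that is being read.
tmp=$(mktemp)
sed -e 's/Post time to comments\./For Time./g' \
    -e 's/Post loads to comments\./For Load./g' "$work" > "$tmp" && mv "$tmp" "$work"

result=$(cat "$work")
printf '%s\n' "$result"
rm -f "$work"
```

The dots are escaped here so that they match a literal full stop rather than any character.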