awk searching a string from a file within another file
I'm trying to copy some text from a website to a file using awk. The text I want is between 2 strings I know. e.g.
text I don't want
string 1
text I want to grab
more text I want to grab
string 2
more text I don't want

I've written a bash script which works if I manually enter the date. However, I'd like to automate date entry. I used awk -f to read the date from a file, but awk thinks the filename is todaysdate,/to comments/. I've tried using " and ' to quote the filename, but awk ignores this. Is there a better way to input the date (pipes, setting variables etc)?

#!/bin/bash
curl http://www.crossfit.com > crossfit.html
textutil -convert txt crossfit.html
date '+%A %d%m%y' > todaysdate
awk -f 'todaysdate',/"to comments"/ crossfit.txt >> workout.txt

Thanks a lot
Hi,
Using awk and a separate file with the date is too complicated for this (my opinion). Using sed and variables would make this a lot easier, as the example shows: Code:
#!/bin/bash Code:
#!/bin/bash |
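A minimal sketch of the sed-and-variable idea, not necessarily the script that was posted here. The sample text stands in for the converted page, and the date is fixed so the example is repeatable; both are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for crossfit.txt (the real page content will differ).
sample=$(mktemp)
cat > "$sample" <<'EOF'
text I don't want
Monday 060807
text I want to grab
more text I want to grab
Post to comments
more text I don't want
EOF

# Fixed here for the example; in a real script this would be:
#   todaysdate=$(date '+%A %d%m%y')
todaysdate="Monday 060807"

# Double quotes let the shell expand $todaysdate before sed reads the script.
# -n turns off sed's default output; p prints only the lines in the range.
result=$(sed -n "/$todaysdate/,/to comments/p" "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Holding the date in a shell variable avoids the intermediate todaysdate file altogether.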
Thanks for the reply Druuna
However, the script still does not work. I removed the -n option from sed so it would write its output to a new text file and changed crossfit.html to crossfit.txt to give:

sed "/$todaysdate/,/to comments/p" crossfit.txt >> workout.txt

But the text file contains all of the original document, not just the text between the patterns. I have played around with switches using /p, /P, /n, /N and /x and every time I get the whole document. What am I doing wrong? I tried using your script and the modified version of mine. I also manually entered the date, 060807, as it's incorrect on the crossfit page today, e.g.

sed "/'Monday 060807'/,/'toThis'/x" crossfit.txt >> workout.txt

Any ideas?
Hi,
It's not clear to me what's wrong:
- Why did you remove the -n option? It's essential for the sed command to work correctly.
- Could it be that sed is being greedy, i.e. are the 2 strings unique in the file?
If that doesn't help, could you post a few examples of what you tried (and the results)?
Hope this helps.
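To illustrate why -n matters, here is a small sketch on made-up input (the START and END markers are placeholders, not text from the crossfit page). Without -n, sed prints every line once by default and the p command prints the lines in the range a second time, so the output is the whole document plus duplicates.

```shell
#!/bin/bash
# Made-up five-line input; START and END are placeholder markers.
sample=$(printf '%s\n' "before" "START" "wanted" "END" "after")

# Without -n: every line is printed by default, range lines are printed twice.
without_n=$(printf '%s\n' "$sample" | sed '/START/,/END/p')

# With -n: default output is off, so only the p command prints.
with_n=$(printf '%s\n' "$sample" | sed -n '/START/,/END/p')

echo "without -n: $(printf '%s\n' "$without_n" | grep -c '') lines"
echo "with -n:    $(printf '%s\n' "$with_n" | grep -c '') lines"
```

On this input the first command produces 8 lines and the second produces 3.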
Hi Druuna
Thank you very much for your help. I have it working now using this script:
I had written the last line like this:

sed -n "/'$todayDate'/,/'to comments.'/p" crossfit.txt > workout.txt

with quotes around the patterns, and that is why it didn't work. Thanks again, Johnny
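For anyone hitting the same thing, a short sketch of why the inner quotes break the match. The sample lines are invented for the example. Inside double quotes the shell treats single quotes as ordinary characters, so sed ends up looking for the date with literal quote marks around it.

```shell
#!/bin/bash
todayDate="Monday 060807"
# Invented sample lines standing in for crossfit.txt.
sample=$(printf '%s\n' "intro" "Monday 060807" "workout text" \
    "Post thoughts to comments." "footer")

# The single quotes are part of the pattern here, so nothing matches.
quoted=$(printf '%s\n' "$sample" | sed -n "/'$todayDate'/,/'to comments.'/p")

# Without the inner quotes the patterns match the text itself.
unquoted=$(printf '%s\n' "$sample" | sed -n "/$todayDate/,/to comments./p")

printf 'with inner quotes:    [%s]\n' "$quoted"
printf 'without inner quotes:\n%s\n' "$unquoted"
```

The first result is empty; the second holds the three lines from the date to the comments line.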
The search term "Monday 060807" is not unique to the page. But on the other matches, there is no end range match. So a false positive is what was causing the problem.
Anchoring the start pattern to the heading markup avoids it:

sed -n '/<h3 class="title">'"$todayDate"'<\/h3>/,/to comment/p' crossfit.html

I studied the problem like this:

cat -n crossfit.html | sed -n '/Monday 060807/p'

The cat -n part adds line numbers to each line, which makes it easier to see what the ranges are.
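A rough reproduction of the false positive, using an invented five-line page. The markup is an assumption modelled on the h3 pattern; the real page is larger and will differ. The second occurrence of the date opens a new range that never finds its end, so sed keeps printing to the end of the file.

```shell
#!/bin/bash
todayDate="Monday 060807"
# Invented page: the date appears in the post heading and again in a link list.
page=$(mktemp)
cat > "$page" <<'EOF'
<h3 class="title">Monday 060807</h3>
<p>Workout text</p>
<a href="#">Post thoughts to comments.</a>
<li><a href="#">Monday 060807</a></li>
<li><a href="#">Sunday 050807</a></li>
EOF

# Number the lines to see where the date matches (lines 1 and 4 here).
cat -n "$page" | sed -n "/$todayDate/p"

# Loose start pattern: line 4 opens a second range with no end match.
loose=$(sed -n "/$todayDate/,/to comment/p" "$page")

# Anchored start pattern: only the heading can open the range.
anchored=$(sed -n '/<h3 class="title">'"$todayDate"'<\/h3>/,/to comment/p' "$page")

echo "loose:    $(printf '%s\n' "$loose" | grep -c '') lines"
echo "anchored: $(printf '%s\n' "$anchored" | grep -c '') lines"
rm -f "$page"
```

The loose pattern returns all 5 lines of this page, while the anchored one returns only the 3 lines of the post.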
I'd like to thank everybody for their help. I've finally got a working script for the crossfit webpage. It downloads the workout of the day and adds it to an html file, so that you have an archive of workouts.
I used 3 scripts and a text file.
MAIN SCRIPT
WORKDAY SCRIPT
#!/bin/bash
REST DAY SCRIPT
#!/bin/bash
awk format
#!/bin/bash
curl some-site > crossfit.html
textutil -convert txt crossfit.html
date '+%A %d%m%y' > todaysdate
awk -f 'todaysdate',/"to comments"/ crossfit.txt >> workout.txt

This doesn't work since 'awk -f' expects its argument to be a file containing an awk program, not data. See man awk. The correct format is:

awk '/todaysdate/,/to comments/' crossfit.txt >> workout.txt

Here todaysdate stands for the actual date text. awk treats whatever is between the slashes as a regular expression and does not read the todaysdate file. You could write an awk script to make life easier:

--------------------------
#!/usr/bin/awk -f
/todaysdate/,/to comments/ { print >"workout.txt" }
--------------------------

Save this as thescript.awk and chmod a+x it. Run it like:

$ curl some-site | ./thescript.awk
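If the date should come from the shell rather than being typed into the pattern, awk's -v option can carry it in. A minimal sketch on invented sample text, with the date fixed so the example is repeatable (the variable names today and d are only for illustration):

```shell
#!/bin/bash
# Invented stand-in for crossfit.txt.
sample=$(mktemp)
cat > "$sample" <<'EOF'
text I don't want
Monday 060807
text I want to grab
more text I want to grab
Post to comments
more text I don't want
EOF

# Fixed for the example; a real script would use: today=$(date '+%A %d%m%y')
today="Monday 060807"

# -v copies the shell value into the awk variable d before the program runs.
# index($0, d) is a plain substring test, so the date is not read as a regex.
# The range runs from the first line containing d to the next "to comments".
result=$(awk -v d="$today" 'index($0, d), /to comments/' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

This keeps the whole job in one awk call and needs no todaysdate file.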