Loop through a list of URLs in a txt file, parse out parameters, and pass them to wget in bash.
What I have:
1. a list of URLs in a text file (i.e. in this form: http://www.domain.tld/more-stuff-here)
2. a script that extracts parameters from the text file of URLs (example below)
3. a script that downloads a file with wget (example below)

I want to create a loop that:
1. takes a text file of URLs
2. parses $host and $host_and_domain from each URL
3. sends $host and $host_and_domain to the wget script
4. creates a file name by appending $host with the time/date (i.e. mm:dd:yy:hh:mm:ss)

Feel free to let me know if I could clarify anything. Also open to code examples to play with instead of outright answers. Thanks!

Example of URL parsing script:
Code:
#!/bin/sh
Example of wget script:
Code:
#!/bin/sh
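A minimal sketch of the kind of URL parsing being described, assuming the format shown above; the variable names full_url, proto, url, and host are assumptions borrowed from later replies in this thread, not the original script:

```shell
#!/bin/sh
full_url="http://www.domain.tld/more-stuff-here"

# pull off the protocol prefix, e.g. "http://"
proto="$(echo "$full_url" | sed -e 's,^\(.*://\).*,\1,')"

# strip the protocol, leaving host + path
url="${full_url#$proto}"

# host is everything up to the first slash
host="${url%%/*}"

echo "proto=$proto host=$host"
```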
Unfortunately, I can't get to that link. Anyway, what you need is a loop like:
Code:
# assumes no spaces in urls
with a URL file along these lines:
Code:
http://rute.2038bug.com/index.html.gz
http://tldp.org/LDP/Bash-Beginners-G...tml/index.html
http://www.tldp.org/LDP/abs/html/
That should get you started.
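The body of the loop was lost in the paste above; given the surviving "assumes no spaces in urls" comment, it was presumably something along these lines (the file name test-urls.txt is an assumption, and the sample file is created here just to make the sketch self-contained):

```shell
#!/bin/bash
# stand-in for the real URL file
printf 'http://rute.2038bug.com/index.html.gz\nhttp://www.tldp.org/LDP/abs/html/\n' > test-urls.txt

# word-splitting on $(cat ...) is why this assumes no spaces in the URLs
for full_url in $(cat test-urls.txt); do
    echo "processing $full_url"
done
```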
Thanks Chris. That definitely helped. I used "export date=$(date +%s)" to append to "$host" for the file-naming convention.
Strangely, instead of iterating through the list and processing each line, the script only processes the last line of the text file. "test-urls.txt" is an 11-line file with no spaces. It contains URLs in this form: http://[host.com]/[pages-go-here]

I'll look into this, but if you have any suggestions in the meantime, feel free to share. Here's the updated code:
Code:
#!/bin/sh
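The updated script itself was lost above; a sketch of the naming idea just described, assuming the parsing from earlier in the thread (the output-file pattern host-timestamp is an assumption, and the wget call is only echoed here):

```shell
#!/bin/bash
full_url="http://www.domain.tld/more-stuff-here"

# parse the host as in the earlier script
url="${full_url#*://}"
host="${url%%/*}"

# epoch timestamp for a unique file name (export is not needed here)
date=$(date +%s)
outfile="${host}-${date}"

echo "would run: wget -O $outfile $full_url"
```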
What line separators are you using? Is this a Linux or a DOS file?

Cheers, Tink
Just out of curiosity, are you aware that the greps you are using are doing nothing?
If we assume the format you provided is correct for each line of the file (http://[host.com]/[pages-go-here]), then something like:
Code:
proto="$(echo $full_url | grep :// | sed -e 's,^\(.*://\).*,\1,g')"
will reduce what has been passed in. Also, I am not exactly sure what details are in the URL lines of the file, but are you aware that wget can read URL information directly from a file (its -i option)? Just a thought.
Also, there is no need to use the 'export' keyword to define a variable unless you want it to be visible to a sub-shell.
Tink, I'm using CR line terminators and this is a Mac OS file.
Thanks
grail, no, I wasn't aware of that. Übernoob with this stuff.

Re: your thought, would this be wget's "-i" option? If so, the reason I didn't use it was that I also want to parse out the host of each URL and use that value to name the files I'm downloading. But if it's something else, I could look into it. Thanks.

In case it helps, here are sample lines from the text file:

Chris, thanks for the feedback.
Do you have to use sh? Can you use bash? That is, could the first line be #!/bin/bash?

The reason for asking is that sh may effectively be several different shells depending on the distro, and even if it is linked to bash, bash called as sh has only a subset of its full functionality.

Regarding only getting the last line, and evolving the code to work with URLs including spaces, the outer loop could be changed to:
Code:
while read -r full_url
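The suggested outer loop, completed as a runnable sketch (the file name and sample contents are assumptions); redirecting the file into the loop reads one whole line at a time, so URLs containing spaces survive, and read -r keeps any backslashes literal:

```shell
#!/bin/bash
# stand-in for the real URL file; note the space in the first URL
printf 'http://a.example/page one\nhttp://b.example/two\n' > test-urls.txt

while read -r full_url; do
    echo "got: $full_url"
done < test-urls.txt
```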
Code:
url="$(echo ${full_url/$proto/})"
does not need the echo or the command substitution; it can be just:
Code:
url=${full_url/$proto/}
or, portably in POSIX sh:
Code:
url=${full_url#$proto}
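A quick check that the three forms above agree, assuming $proto holds the protocol prefix as parsed earlier in the thread (note the first two substitution forms need bash; only the last is POSIX sh):

```shell
#!/bin/bash
full_url="http://www.domain.tld/more-stuff-here"
proto="http://"

a="$(echo ${full_url/$proto/})"   # echo + command substitution
b=${full_url/$proto/}             # bash pattern substitution
c=${full_url#$proto}              # POSIX prefix removal

echo "$c"
```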
Thanks, I put #!/bin/bash instead.
So it appears the special characters in the URLs could prevent the script from working as intended. To troubleshoot, I substituted the URLs with random strings without any special characters and could echo each line just fine. However, even using the -r option in the script below doesn't produce any output when I reinsert the URLs into the text file.
Code:
#!/bin/bash
Problem solved: it was a filetype issue, as Tink may have alluded to earlier. The script copied/pasted in comment #3 works.

I noticed that when the "file" command returned "ASCII text" and nothing else for the file in question, the script worked. However, when "file" returned, for example, "ASCII text, with CR line terminators", the script did not work.
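One way to normalise such a file before feeding it to the loop, as a sketch (file names here are assumptions): tr converts the classic-Mac CR line terminators that "file" reported into the LF terminators the shell's read expects.

```shell
#!/bin/bash
# make a file with CR line terminators, as 'file' reported
printf 'http://a.example/x\rhttp://b.example/y\r' > cr-urls.txt

# convert CR to LF; 'file' should then report plain ASCII text
tr '\r' '\n' < cr-urls.txt > lf-urls.txt
```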
So how does the script look now? There may be things that can be tidied up, such as replacing:
Code:
user="$(echo $url | grep @ | cut -d@ -f1)"
with:
Code:
user=${url%%@*}
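Worth noting: the two forms above differ when a URL has no user part. A quick sketch (sample URLs are assumptions):

```shell
#!/bin/bash
with_user="user@www.domain.tld/page"
no_user="www.domain.tld/page"

# grep @ prints nothing when there is no @, so this yields an empty string
u1="$(echo "$no_user" | grep @ | cut -d@ -f1)"

# ${var%%@*} returns the whole string when there is no @
u2=${no_user%%@*}

echo "grep/cut: '$u1'  expansion: '$u2'"
```

With a user present, both forms agree: ${with_user%%@*} gives "user".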
May I also ask if the content of the text file with URLs you posted in #8 is incomplete? I ask because it obviously has no user and/or host details anywhere in it (this may be confidential), so the other lines for setting user and host seem to have nothing to work on.
catkin, yep, I will happily paste in the new code, assuming I'm able to later today. Thanks
grail, the script I adapted for processing the URLs was written by someone else, and I didn't (and still don't, to a certain extent) understand all the code. There was actually no user information in the original file; I just kept that line because I wasn't quite ready to mess with that part. Before posting the finished script, though, I plan to strip out all the superfluous code so you'll be able to see how I'm using it. Thanks