LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 07-17-2020, 08:10 AM   #1
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Rep: Reputation: Disabled
Find if file available on Webpage


Hi There,

I am trying to find out whether a file (typically an .mp3) is available on a particular website.

If it is available, I want to run a command; if it is not available, either exit or run a different command.

I have had a go, but I'm not sure where I am going wrong. I am hoping to write this in bash, as the other files I am running are in bash - keeping it simple (if possible)...

Code:
file="music.mp3"
url="http://www.some.site/here"


curl -s $url/$file | grep 404

  
if [ -f $file ]; then
    echo " File -> $file <- FOUND!"
    Run_download_script_here
else
    echo " File -> $file <- Not found!"
fi
exit 0
The above is not displaying the desired result. I simply want to check that the file is available on the website (for downloading).

I am getting lost with the grep section, and possibly with the if statement checking whether the file shows up on the website. Am I required to download the file, or is there a command to make sure the file is there without downloading it?

Hope that makes sense...?

Thank you very much
Cheers
 
Old 07-17-2020, 08:16 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,123

Rep: Reputation: 7371
You probably need to add -w '%{http_code}' to curl.
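A minimal sketch of that idea (the URL and file name are the thread's placeholders, not a real site): ask curl for only the numeric status code, discard the body, and branch on the result. The reporting is factored into a small function so the branching logic can be seen without a live server.

```shell
#!/bin/sh
# Sketch of the -w '%{http_code}' approach; URL/file names are placeholders.

report_status() {
    # $1 = HTTP status code, $2 = file name
    if [ "$1" = "200" ]; then
        echo "File -> $2 <- FOUND!"
    else
        echo "File -> $2 <- Not found! (HTTP $1)"
    fi
}

url="http://www.some.site/here"
file="music.mp3"

# Live check (requires network): -o /dev/null discards the body,
# -w '%{http_code}' prints just the status code on stdout.
# status=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url/$file")
# report_status "$status" "$file"
```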
 
1 members found this post helpful.
Old 07-17-2020, 09:29 AM   #3
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,382
Blog Entries: 3

Rep: Reputation: 3773
Also, curl sends the fetched data to stdout, so the output from -w will be appended there unless you use -w '%{stderr} %{http_code}' to redirect it to stderr. However, I don't know how to juggle that so that the file is saved from stdout at the same time stderr gets piped to grep.

Another option would be to use wget which will return an error code in the event of a 404 HTTP status code or similar failure.

Code:
#!/bin/sh

file="music.mp3"
url="http://www.some.site/here"

if wget $url/$file; then
     echo " File -> $file <- FOUND!"
     Run_download_script_here
else
     echo " File -> $file <- Not found!" 
fi

exit 0
 
2 members found this post helpful.
Old 07-17-2020, 10:07 AM   #4
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,123

Rep: Reputation: 7371
curl -O <file> can be useful too
 
1 members found this post helpful.
Old 07-17-2020, 10:24 AM   #5
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,688

Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist View Post
I don't know how to juggle that so that the file is saved from stdout at the same time stderr gets piped to grep.
It's sure possible, but rather complicated:
Code:
#!/bin/sh
{
  {
    (echo file; echo 404 >&2) |
      sed 's/^/stdout: /'
  } 2>&1 >&3 3>&- |
    sed 's/^/stderr: /'
} 3>&1

Last edited by shruggy; 07-17-2020 at 01:18 PM.
 
2 members found this post helpful.
Old 07-17-2020, 04:35 PM   #6
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Original Poster
Rep: Reputation: Disabled
I am aiming to run the script to see if the file is present on the webpage, not necessarily to download the file in order to trigger a command.

With the stdout, I assume that would go after the curl line (with no output file)?

Let's say the file (the mp3) is found - will curl download the file, or simply output some data we can use to say Yes/No in the if statement?

I don't understand what the code in #5 is doing, or where it should go to make this happen. It looks like it's doing something when it sees a 404 error?
 
Old 07-17-2020, 05:17 PM   #7
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,765

Rep: Reputation: 2225
In the OP:
Code:
if [ -f $file ]
^^ Isn't this testing whether a file by the name contained in the variable (music.mp3) exists on the local disk, in the directory in which the script is running? It has nothing to do with whether a file by that name is found by curl...
 
1 members found this post helpful.
Old 07-17-2020, 05:35 PM   #8
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,688

Rep: Reputation: Disabled
Quote:
Originally Posted by orangepeel190 View Post
I am aiming to run the script to see if the file is present on the webpage, not necessarily downloading the file to trigger a command.
Then perhaps curl -f would suffice?
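A hedged sketch of how that could look (URL and file name are again the thread's placeholders): with --fail, curl exits non-zero on an HTTP error such as 404 or 509 instead of printing the error page, and --head requests only the response headers, so nothing is downloaded either way.

```shell
#!/bin/sh
# Sketch of curl --fail as an availability check; names are placeholders.

check_remote() {
    # $1 = full URL; thanks to --fail, curl exits non-zero on HTTP errors,
    # and --head means only headers are requested (no download).
    curl --silent --fail --head --max-time 10 "$1" >/dev/null
}

url="http://www.some.site/here"
file="music.mp3"

if check_remote "$url/$file"; then
    echo "File -> $file <- FOUND!"
else
    echo "File -> $file <- Not found!"
fi
```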

Quote:
Originally Posted by orangepeel190 View Post
I don’t understand what the code in #5 is doing and where it should go to try and make it happen?
It looks like it’s doing something when it sees a 404 error?
It separately evaluates standard output and standard error. Using your code from #1, it would be something like this:
Code:
#!/bin/sh
{
  {
    curl -w '%{stderr} %{http_code}' -s $url/$file >$file
  } 2>&1 >&3 3>&- | grep -q 404 && echo not found
} 3>&1
 
Old 07-17-2020, 05:36 PM   #9
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Original Poster
Rep: Reputation: Disabled
Thanks scasey,

I was simply having an attempt at some scripting, rather than be “one of those people” that simply asks someone else to do all the work for them.

Yes, I am aware that -f checks whether the file is available on the local disk; maybe it was not the best thing to use, given that the aim is to see whether the file is available on the website. I was interested to see where the scripting in #5 would best go so I could give it a go.

The issue I am seeing is that the script could download an error message posing as $file, which the system will then see as a pass.

A classic example was this morning: I ran a script thinking it had downloaded the file, yet when I dug deeper, the file was named like the mp3 file but contained the message below (cat $file):

Quote:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>509 Bandwidth Limit Exceeded</TITLE>
</HEAD><BODY>
<H1>Bandwidth Limit Exceeded</H1>


The server is temporarily unable to service your
request due to the site owner reaching his/her
bandwidth limit. Please try again later.
</BODY></HTML>
The -f test resulted in "Yes, the file was downloaded", when it clearly was not the audio file. I am now having to add an extra conditional to the script to ensure the file is larger than, say, 1 MB: if the file is downloaded and larger than 1 MB, then "Success, the file was downloaded"; if not, delete the small file and error out - try again later.

It would be good not to have to download the file at all if it is not the correct file, or not even available... this has become a somewhat bigger problem than a simple download script.
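The size guard described above could be sketched like this (file name and 1 MB threshold are illustrative, not from a tested script): succeed only if the downloaded file exists and exceeds the threshold, otherwise delete it and report failure.

```shell
#!/bin/sh
# Sketch of the "is the download big enough to be the real mp3" guard.
# File name and threshold are illustrative placeholders.

min_bytes=1048576   # 1 MiB

check_size() {
    # $1 = path; succeed only if the file exists and is bigger than $min_bytes
    [ -f "$1" ] && [ "$(wc -c < "$1")" -gt "$min_bytes" ]
}

file="music.mp3"
if check_size "$file"; then
    echo "Success, the file was downloaded"
else
    rm -f "$file"   # discard the masquerading error page, if any
    echo "Error: file missing or too small - try again later"
fi
```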

Happy to try as many options as possible to get a functioning script and learn in the process.
Appreciate the feedback and assistance to my steep learning curve....

Last edited by orangepeel190; 07-17-2020 at 09:16 PM.
 
Old 07-17-2020, 05:47 PM   #10
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Original Poster
Rep: Reputation: Disabled
Thanks shruggy,

I'll try to pop that into the script, as well as develop an IF/THEN statement on the size of the file, to try to filter for the correct file.

The curve ball presented itself this morning with the bandwidth message masquerading as the .mp3 file. I am thinking a size-comparison filter would best help tighten the check on the downloaded file, as typically the error files are smaller than 1 MB and the audio files on this site are larger than 3 MB.

I was hoping not to have to go through the download process and filter locally, but rather to have some funky script that looks at the size of the remote file to make sure it "appears" to be the correct file (by availability, name and size) and, if so, runs a command (either download or send an email) saying the correct remote file appears to be available.

I was thinking of dumping the HTML and using grep to pull out the file name or the 404 error, but that technique appears flawed after this morning's file masqueraded as the audio file while containing the "This server is unavailable...." message - hence now also having to compare file sizes.

I am hoping something like this is possible with some crafty but robust scripting and commands in Linux? I'm running it on a Raspberry Pi.
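The "look at the size of the remote file first" wish is doable with a HEAD request, since servers typically report the file size in a Content-Length header. A hedged sketch (placeholder URL/file/threshold), with the header parsing split into a function so it also works on saved header text:

```shell
#!/bin/sh
# Sketch: read Content-Length from a HEAD request instead of downloading.
# URL, file name and threshold are placeholders.

content_length() {
    # Reads HTTP headers on stdin, prints the Content-Length value (or 0).
    # n+0 forces a numeric result, which also drops any trailing CR.
    awk 'tolower($1) == "content-length:" { n = $2 } END { print n + 0 }'
}

min_bytes=1048576
url="http://www.some.site/here"
file="music.mp3"

# Live use (requires network):
# size=$(curl --silent --head --max-time 10 "$url/$file" | content_length)
# if [ "$size" -gt "$min_bytes" ]; then
#     echo "Remote file looks right ($size bytes) - safe to download"
# else
#     echo "Remote file missing or too small ($size bytes)"
# fi
```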
 
Old 07-18-2020, 01:05 AM   #11
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,382
Blog Entries: 3

Rep: Reputation: 3773
Quote:
Originally Posted by orangepeel190 View Post
I was thinking of dumping the HTML and using grep to pull out the file name or the 404 error, but that technique appears flawed after this morning's file masqueraded as the audio file while containing the "This server is unavailable...." message - hence now also having to compare file sizes.
Is that information visible in the response headers?

Code:
curl --silent --head ${url}/${file}
wget --quiet --server-response --output-document=/dev/null ${url}/${file}
 
1 members found this post helpful.
Old 07-18-2020, 01:16 AM   #12
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Original Poster
Rep: Reputation: Disabled
Thanks for your response....
The website issue has since been resolved - it was a hosting issue this morning - so I cannot answer your question about what the server reported while it was having problems. I don't know why there was a bandwidth issue, but I sent an email, they confirmed they were having issues, and it appears to be resolved now (more bandwidth?).

I've checked the curl output, and it would appear best to run a check based on the header returning 404 or 200:

Bash Script

Code:
user:~/$ curl --silent —head $url/$bogus_file
HTTP/1.1 404 Not Found
Date: Sat, 18 Jul 2020 06:07:05 GMT
Server: Apache
Content-Type: text/html; charset=iso-8859-1

user:~/$ curl --silent --head $url/$file
HTTP/1.1 200 OK
Date: Sat, 18 Jul 2020 06:07:52 GMT
Server: Apache
Last-Modified: Fri, 17 Jul 2020 07:17:15 GMT
Accept-Ranges: bytes
Content-Length: 5281011
Content-Type: audio/mpeg
How can I grep the header (for either 200 or 404) to give me an IF/THEN/ELSE/FI option: run the download if 200 is returned, or error out if 404 is returned?

It seems like the webserver had a fit this morning, which prompted my additional question about file size.

Would something like this potentially work?

Code:
enquiry=$(curl -sLo --head /dev/null -w "%{$url/$file}\n" ${1})
if [[ $enquiry != 200 ]]; then
    echo "Success  ${enquiry} on ${1}"
    echo "Sending notification email the file is available...."
else
    echo "File is not available.... try again later"
fi
Would it be —head or -I (capital i (eye))

Or alternatively, enquiry != 404 (equating to error / not available)?

Close for a bash script?

Last edited by orangepeel190; 07-18-2020 at 01:34 AM.
 
Old 07-18-2020, 03:26 AM   #13
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,382
Blog Entries: 3

Rep: Reputation: 3773
Quote:
Originally Posted by orangepeel190 View Post
Would it be —head or -I (capital i (eye))

Or alternate for enquiry != 404 (equating to error or not available)

Close for a bash script?
Usually there are long options and short options, so with curl it is a matter of style, not substance, whether you use -I or --head. Mind the type of dashes, though: you need two plain hyphens (--), not one em-dash (—), in this particular case.

The conditional statements in shell scripting work on the exit codes of programs, or of piped chains of programs. So you could do,

Code:
if /usr/bin/true; then
        echo OK
else
        echo Not OK
fi

if /usr/bin/false; then
        echo OK
else
        echo Not OK
fi
And then

Code:
if curl --silent --head $url/$file | grep -q -P '^HTTP/[\d.]+\s200'; then
        echo OK
else
        echo Try later
fi
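An alternative sketch (not from the thread): instead of grepping the status line, have curl print the numeric code with -w '%{http_code}' and branch on that, which sidesteps differences in the status line such as "HTTP/1.1 200 OK" versus "HTTP/2 200". The decision logic is factored into a function; the URL and file name are the thread's placeholders.

```shell
#!/bin/sh
# Sketch of branching on the numeric status code; names are placeholders.

decide() {
    # $1 = HTTP status code as printed by curl -w '%{http_code}'
    case "$1" in
        200) echo "OK" ;;
        404) echo "Not found" ;;
        *)   echo "Try later (HTTP $1)" ;;
    esac
}

url="http://www.some.site/here"
file="music.mp3"

# Live check (requires network): the headers go to -o /dev/null,
# so only the status code from -w reaches stdout.
# decide "$(curl --silent --head -o /dev/null --max-time 10 -w '%{http_code}' "$url/$file")"
```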
 
1 members found this post helpful.
Old 07-18-2020, 04:08 AM   #14
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,123

Rep: Reputation: 7371
what about this?
Code:
curl -s --fail -w '%{http_code}\n' $url
 
Old 07-18-2020, 04:29 AM   #15
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Original Poster
Rep: Reputation: Disabled
pan64 - that appears to download the file rather than just checking whether it's available... is there an alternative that checks via a curl command or result (potentially in the header) without downloading?

Last edited by orangepeel190; 07-18-2020 at 05:05 AM.
 
  

