Old 01-03-2019, 12:22 PM   #16
peter7089
Member
 
Registered: May 2016
Distribution: MX Linux
Posts: 249

Original Poster
Rep: Reputation: Disabled

Quote:
Originally Posted by l0f4r0 View Post
Ok, you can get rid of the directory structure with the --no-directories switch, as some members told you before.
However, my command should work nonetheless (on my side it works fine and produces a links.txt file full of links). So please provide the following outputs:
Code:
wget --version
wget -r -A "*.jpg,*.jpeg" --ignore-case --spider --no-directories http://www.slackware.com/ 2>&1
wget -r -A "*.jpg,*.jpeg" --ignore-case --spider --no-directories http://website.com/dir1/ 2>&1
curl http://website.com/dir1/
NB: regarding the last 2 commands, you might want to anonymize the outputs and give us bogus URLs instead. Actually, I'm only interested in seeing the overall look of the output and whether there are any URLs pointing to .jpg/.jpeg resources...
The wget version is 1.18, but I am not sure how to post the output of the two wget commands. I tried redirecting it to a text file, but it didn't work.
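What I tried was something along these lines (an approximation, so the exact command is a guess):
Code:
# approximate attempt: output.txt ends up nearly empty
wget -r -A "*.jpg,*.jpeg" --ignore-case --spider --no-directories http://www.slackware.com/ > output.txt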
 
Old 01-04-2019, 01:09 AM   #17
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 900

Rep: Reputation: 290
^ Redirections can be tricky sometimes.
Try this:
Code:
wget -r -A "*.jpg,*.jpeg" --ignore-case --spider --no-directories http://www.slackware.com/ &>wgetSlackware.txt
wget -r -A "*.jpg,*.jpeg" --ignore-case --spider --no-directories http://website.com/dir1/ &>wgetWebsite.txt
curl http://website.com/dir1/ >curlWebsite.txt
Then attach those 3 files in a new post (anonymize their content if need be).
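NB: wget writes its log messages to stderr rather than stdout, which is why a plain > redirect leaves the file empty. The &> form above is bash shorthand for capturing both streams; if your shell is not bash (I'm assuming it is), a portable equivalent for the first command would be:
Code:
# POSIX-portable form: send stdout to the file, then point stderr at stdout
wget -r -A "*.jpg,*.jpeg" --ignore-case --spider --no-directories http://www.slackware.com/ >wgetSlackware.txt 2>&1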
 
Old 01-04-2019, 03:18 AM   #18
peter7089
Member
 
Registered: May 2016
Distribution: MX Linux
Posts: 249

Original Poster
Rep: Reputation: Disabled
This time it worked. But I found that wget goes outside the directory it is scraping. If the URL is http://website.com/dir1/, it moves on to http://website.com/ after it finishes parsing the links in http://website.com/dir1/. I don't know if this is normal behavior, though.

These are the files:

wgetSlackware.txt

curlWebsite.txt

wgetWebsite.txt
 
Old 01-04-2019, 06:57 AM   #19
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 900

Rep: Reputation: 290
^ Ok, the commands are processing fine now. It's just hard for me to analyze their outputs since you have anonymized them quite a lot!
Yes, it's normal that wget downloads resources from outside the dir1 folder: that is its recursive mode at work. You can add the --level=1 and/or --no-parent options if you want to disable that behavior. Is it better now?
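For example, a sketch of the earlier command confined to dir1 (same bogus URL as before, untested on your actual site):
Code:
# --no-parent prevents wget from ascending above /dir1/ when following links
wget -r --no-parent -A "*.jpg,*.jpeg" --ignore-case --spider --no-directories http://website.com/dir1/ &>wgetWebsite.txt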

Last edited by l0f4r0; 01-04-2019 at 06:58 AM.
 
Old 01-04-2019, 12:34 PM   #20
peter7089
Member
 
Registered: May 2016
Distribution: MX Linux
Posts: 249

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by l0f4r0 View Post
^ Ok, the commands are processing fine now. It's just hard for me to analyze their outputs since you have anonymized them quite a lot!
Ok, no problem. I still learned some new things.
 
Old 01-05-2019, 04:04 AM   #21
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 900

Rep: Reputation: 290
If your problem has been resolved, please mark your thread as such (see HOWTO in my sig).
 
  

