LinuxQuestions.org
Old 07-29-2017, 04:46 AM   #1
rupeshforu3
Member
 
Registered: Jun 2013
Location: India
Distribution: any Linux, BSD, Solaris, sco unixware, Windows 8
Posts: 59

Rep: Reputation: 0
How to copy file properties and directory structure of a website without any contents


Hi, I am Rupesh from India. I have examined a website that contains about 50,000 mp3 files, of which I want roughly 11,000. I have already downloaded 8,000 mp3 files and want to download up to 3,000 more from the same website and discard the remaining files.

I downloaded the files using an offline browser called Extreme Picture Finder. It has an option called "skip if the destination file exists", and I plan to re-download the files with that option selected. The application also has options for scanning and spidering the website, all of which I understand.

Previously, after downloading files with the offline browser, I copied them to another directory; the original directory structure was lost, so the files now sit in different directories.

The 11,000 files I want come to about 135 GB, of which I have downloaded 93 GB so far. If I can obtain the website's directory structure, along with files that carry only the filenames and no data, I can keep my local file and directory names the same as the website's.

At present I have openSUSE Leap 42.2 installed on my system. Upon opening a terminal emulator and issuing the command ls -R > filenames.txt, I can obtain a list of local filenames and directory names. Is there a command or tool to obtain just the filenames and directory names of a directory on a website and store the output in a text file?

So please suggest a way to obtain the list of directory names, and the filenames contained in those directories, and store it in a text file. If possible, please also suggest how to recreate the website's directory structure and filenames locally without any file content.
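
To make this concrete, here is a rough local sketch of what I am after, where paths.txt is a hypothetical text file holding one relative path per line (producing such a list from the website is exactly the part I do not know how to do):

Code:
# Recreate the directory tree and zero-length placeholder files
# from a list of relative paths, one per line.
while IFS= read -r path; do
    mkdir -p "$(dirname "$path")"   # create the parent directories
    touch "$path"                   # create an empty file with the same name
done < paths.txt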

Regards,
Rupesh.
 
Old 07-29-2017, 07:29 AM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,632

Rep: Reputation: 7965
Quote:
Originally Posted by rupeshforu3 View Post
I have examined a website that contains about 50,000 mp3 files, of which I want roughly 11,000... So please suggest a way to obtain the list of directory names, and the filenames contained in those directories, and store it in a text file.
Ok...we'll suggest you write a script to just compare what you want to download with what you've already downloaded. Since you've been here for several years, under both "rupeshforu3" and "rupeshforu", and have asked about scripting/programming for quite some time, this should be fairly easy for you to do:
http://www.linuxquestions.org/questi...eg-4175605139/
http://www.linuxquestions.org/questi...ux-4175516540/
http://www.linuxquestions.org/questi...an-4175442279/
http://www.linuxquestions.org/questi...ml#post4985952

...and since your thread involves what is essentially stealing/copyright violations from a website (by your own admission), I'm reporting this to the moderators for review.
 
Old 08-02-2017, 08:59 AM   #3
rupeshforu3
Member
 
Registered: Jun 2013
Location: India
Distribution: any Linux, BSD, Solaris, sco unixware, Windows 8
Posts: 59

Original Poster
Rep: Reputation: 0
I am not going to steal anyone's data or harm others. The site I want to download from is a non-profit spiritual website, and they distribute the files freely. The website itself clearly states that it does not host any copyrighted material, and it asks visitors to report anything they find that is copyrighted. For reference, I am providing the website address below. As the content they provide is not copyrighted, anyone can download it.


http://www.pravachanam.com/


Regards,
Rupesh.
 
Old 08-02-2017, 09:05 AM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,632

Rep: Reputation: 7965
Quote:
Originally Posted by rupeshforu3 View Post
I am not going to steal anyone's data or harm others. The site I want to download from is a non-profit spiritual website, and they distribute the files freely...
Be that as it may...you still haven't shown any work or effort on your part, and you have done this here previously. Again, as you've been told before, we WILL NOT write your scripts for you, but we will be happy to help if you're stuck. The first step is for you to post what you have done/tried on your own, which you have not, despite being asked.

You have been registered here for four years; your previous questions are in this same vein, going back to 2013:
http://www.linuxquestions.org/questi...eg-4175605139/
http://www.linuxquestions.org/questi...re-4175478332/

You said four years ago that you were a 'newbie'...that is not the case after four years. So again, you will have to write your own scripts. There are ample tutorials, scripting examples, etc. that you can find with a simple Google search, and that you should be familiar with after four years. Once you have a script written, post it if you can't make it work and we will all be happy to help you.
 
Old 08-02-2017, 11:10 AM   #5
IsaacKuo
Senior Member
 
Registered: Apr 2004
Location: Baton Rouge, Louisiana, USA
Distribution: Debian Stable
Posts: 2,546
Blog Entries: 8

Rep: Reputation: 465
Quote:
Originally Posted by rupeshforu3 View Post
So please suggest a way to obtain the list of directory names, and the filenames contained in those directories, and store it in a text file.
There is generally no way to get this directly unless you have ssh or ftp access to the site's files. Instead, you can use a web spider to crawl the web site, following every page it can reach to discover all of the linked files.

So you end up downloading the content of the HTML pages pretty much no matter what.

Not precisely what you asked for, but try something like this:

Code:
wget --spider --recursive --level=inf --no-verbose --output-file=outfile.txt http://www.pravachanam.com/
This will spider through the entire web site, downloading all of the HTML pages it can find in search of linked files. Then you can get a list of all of the mp3 files found with:

Code:
cat outfile.txt | grep ".mp3" | awk '{print $4}'
This will give you a list of URLs. You can then process that list to figure out which ones you already have, and use some method to download the remaining URLs...
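
As a rough sketch of that last step (everything here is hypothetical: urls.txt would hold the list extracted above, and ~/music is wherever the already-downloaded mp3s live), something like this filters out what you already have and fetches the rest:

Code:
# Build a list of mp3 filenames already on disk.
find ~/music -type f -name '*.mp3' -printf '%f\n' | sort > have.txt

# Keep only the URLs whose basenames are not in that list,
# then hand the remainder to wget.
while IFS= read -r url; do
    grep -qxF "$(basename "$url")" have.txt || echo "$url"
done < urls.txt > missing.txt

wget --no-clobber --input-file=missing.txt
Matching on the bare filename assumes the names are unique across directories; if they are not, compare full relative paths instead.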
 
Old 08-02-2017, 11:18 AM   #6
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,305
Blog Entries: 3

Rep: Reputation: 3720
Quote:
Originally Posted by IsaacKuo View Post
Code:
cat outfile.txt | grep ".mp3" | awk '{print $4}'
Or

Code:
awk '/\.mp3$/ { print $4; }' outfile.txt
 
Old 08-02-2017, 11:38 AM   #7
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,305
Blog Entries: 3

Rep: Reputation: 3720
It would be best if you could connect with SSH and then use find or rsync.

If you use wget, you'll have to download all the HTML files anyway just to be able to follow the links. However, you might want to look more closely at some of its options: --reject, --delete-after, and --recursive.
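
A rough sketch of both routes (user@host and /path/to/mp3s are placeholders, and the ssh/rsync variants only work if the site owner actually gives you access):

Code:
# With shell access: dump every directory and file name, no content
ssh user@host 'find /path/to/mp3s' > remote-listing.txt

# Or recreate just the directory tree locally, with no files at all
rsync -av --include='*/' --exclude='*' user@host:/path/to/mp3s/ ./mirror/

# Without shell access: crawl with wget, refuse to fetch the mp3s and
# delete each fetched page after it has been processed, keeping the log
wget --recursive --level=inf --no-verbose --output-file=structure.txt \
     --reject mp3 --delete-after http://www.pravachanam.com/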
 
Old 08-03-2017, 03:35 AM   #8
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
Quote:
Originally Posted by rupeshforu3 View Post
I have examined a website that contains about 50,000 mp3 files, of which I want roughly 11,000. I have already downloaded 8,000 mp3 files and want to download up to 3,000 more from the same website and discard the remaining files.
In that case the thread title indicates an X-Y problem: what you THINK is the solution to your problem isn't.

What you really want is this:
http://dt.iki.fi/download-filetype-website
 
Old 08-08-2017, 01:48 PM   #9
AwesomeMachine
LQ Guru
 
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524

Rep: Reputation: 1015
Even if the OP isn't technically in violation of the law, I'm sure the site in question doesn't want users running download robots.
 
  

