LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Software (http://www.linuxquestions.org/questions/linux-software-2/)
-   -   Effective FTP client for mass-download for Linux? (http://www.linuxquestions.org/questions/linux-software-2/effective-ftp-client-for-mass-download-for-linux-673519/)

Sheridan 10-01-2008 07:43 AM

Effective FTP client for mass-download for Linux?
 
Hello folks,

I hope you can help me - I've been looking for a solution for a long time, but nothing I found did me much good...

I have to transfer a _lot_ of files of various sizes from a remote server on the other side of the world (literally), which has limited bandwidth, an unreliable connection, etc., and the only possible method of file transfer is passive mode FTP (nothing else is possible).

The server's root FTP folder holds about 14000 subdirectories, each of which contains lots of small files, some big ones, and some that cannot be downloaded at all (permission issues).

The owner of the server provided access to our company through FTP only and is unwilling/unable to provide more useful access.

I need an FTP client which can reach at least the following goals:

1) Not bothered by unreliable connection, stupid server (drops connection or logs you out seemingly randomly), frozen transfers, etc etc.
2) Able to download several (10+) files in parallel (effectiveness)
3) Handles passive mode
4) Can "auto-skip" files with screwed-up names (has "?" in filename, etc), ignores permission errors, etc

Can be either command line or graphic.

What would you suggest (I tried wget, mc and the ftp CLI so far, but it was a very disappointing experience)?

Any comment is greatly appreciated.

Levente

indienick 10-01-2008 08:39 AM

Well, I use GFTP when I'm in an X session. When I build a TGZ package from a Slackbuild, I back it up onto the FTP server at my website provider. I usually do this once every two weeks or so, and GFTP prompts me to overwrite or skip the transfer of a file if it already exists at the remote location (or the local location, depending on the direction of transfer).

Thankfully, it just prompts the overwrite/skip options with a list of the files in question, instead of one-by-one interactive prompting.

If you're looking for something from the command line, I use (almost exclusively) LFTP, but I have never messed around with situations where "if the file exists on the far end, and it is of the same size and modification date, skip it".

I don't know about "stupid server" allowances, though, or about automatically re-establishing lost connections with either client.

ilikejam 10-01-2008 08:57 AM

Hi.

wget does everything you need except for parallel downloads. It should retry after lost connections up to 20 times, it escapes weird characters in filenames (so it'll download the files, but you shouldn't get any breakage from illegal characters in the filename), and it'll do passive FTP.

Dave

theYinYeti 10-01-2008 08:57 AM

I would suggest “lftp”.

Yves.

i92guboj 10-01-2008 10:55 AM

Quote:

Originally Posted by Sheridan (Post 3296967)
1) Not bothered by unreliable connection, stupid server (drops connection or logs you out seemingly randomly), frozen transfers, etc etc.

Quote:

Originally Posted by wget man page
Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.

Quote:

2) Able to download several (10+) files in parallel (effectiveness)
This one is not supported. However, writing a wrapper for that purpose should be trivial, even in shell script. There's a simple way around it as well: put all the download links in a text file, then split it into 4 files (for example; any number will do). Then run wget -i four times in four xterms, once for each file. You will get 4 concurrent wgets, each downloading a separate list of URLs. Simple, clean, efficient.
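A minimal sketch of that split-and-run idea, assuming GNU split. The URLs and file names here are placeholders, and the wget calls are only echoed so nothing is actually fetched; drop the `echo` to run the downloads for real:

```shell
#!/bin/sh
# urls.txt stands in for a real list of FTP URLs, one per line.
printf '%s\n' \
  "ftp://ftp.example.com/dir/file1" \
  "ftp://ftp.example.com/dir/file2" \
  "ftp://ftp.example.com/dir/file3" \
  "ftp://ftp.example.com/dir/file4" > urls.txt

# Split into 4 chunks without breaking lines (GNU split): urls.00 .. urls.03
split -n l/4 -d urls.txt urls.

# One wget per chunk, in the background; 'wait' blocks until all finish.
for chunk in urls.0*; do
    echo wget --tries=0 --passive-ftp -i "$chunk" &   # drop 'echo' to really download
done
wait
```

Each backgrounded wget works through its own list independently, so one stalled file only holds up one of the four streams.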

Quote:

3) Handles passive mode
The hard thing is finding an ftp/download client that doesn't. Of course wget does, by default.

Quote:

4) Can "auto-skip" files with screwed-up names (has "?" in filename, etc), ignores permission errors, etc
Wget can use -i to read URLs from a file; if a given file can't be retrieved due to an error, it moves on to the next one. You can use --tries=number to retry a given number of times, or 0 for infinite (the default is 20). Even with 0, it will still skip a file if the error is critical, so you shouldn't have any problem at all there.
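Put together, the options described above might look like this; urls.txt and the log file name are hypothetical:

```shell
# Hypothetical invocation: read URLs from a file, retry forever on
# non-fatal errors, force passive FTP, resume partial files, log to a file.
wget --tries=0 --passive-ftp --continue -i urls.txt -o wget.log
```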

Quote:

What would you suggest (I tried wget, mc and the ftp CLI so far, but it was a very disappointing experience)?
Explain why. If the server is completely screwed, ALL clients will fail you, no matter how good and/or complete they are.

On another note, there's axel, though I found it a bit unstable in certain circumstances (but it can do threaded downloads).

If you prefer something graphical, there are kget and d4x; I have no idea how solid or good they are. I only use graphical tools when I have no option, or when the command line counterpart is insanely complicated, which is not the case here.

jlinkels 10-01-2008 08:10 PM

wget was the one first coming to my mind as well.

Are you *sure* you need parallel downloads? Parallel downloads are mostly relevant when a server limits the bandwidth per connection. However, I have the impression that you are dealing with a server which has low bandwidth anyway.

If so, parallel downloads will just share the available bandwidth among your transfers, slowing down each individual one while the sum stays the same. This is of course not true if you have to share that bandwidth with others; then it pays to have, say, 4 parallel downloads while the other client has only 1. At least until he uses a download manager as well, of course, and creates multiple streams himself.

jlinkels

i92guboj 10-01-2008 11:21 PM

Quote:

Originally Posted by jlinkels (Post 3297542)
wget was the one first coming to my mind as well.

Are you *sure* you need parallel downloads? Parallel downloads are mostly relevant when a server limits the bandwidth per connection. However, I have the impression that you are dealing with a server which has low bandwidth anyway.

If so, parallel downloads will just share the available bandwidth among your transfers, slowing down each individual one while the sum stays the same. This is of course not true if you have to share that bandwidth with others; then it pays to have, say, 4 parallel downloads while the other client has only 1. At least until he uses a download manager as well, of course, and creates multiple streams himself.

jlinkels

Yes. And what's more, more threads means more server load, and less bandwidth per thread can make things even worse, because starved connections have a much higher chance of failing. As I said above, if it's a crappy server, you are not going to fix it with a download manager, unless it's a very specific issue like the one you describe (limited bandwidth per connection).

Wget and curl are very solid programs. You can find nicer ones, but hardly better ones for that task.

Sheridan 10-02-2008 04:49 AM

Dear Folks,

Thank you so much for all the replies. You gave me some things to test, and for that I'm very grateful. I'll make sure to report back on which one works best, but for now I think lftp will be the winner...

Anyway...

Some of you asked why I had problems with wget. Well, maybe I'm just green, but here's the problem:

I used the following syntax to mirror the directories onto the local server:

Code:

wget --mirror ftp://ftp.somesite.net/bigdirectory -o /home/me/xferlog
The result is that in the current directory I get some empty folders, but no files, and wget exits after a few passes. No apparent errors.
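Since lftp looks like the winner so far, a rough lftp equivalent of that mirror attempt might be sketched like this; the host and paths are just the placeholders from the command above, and the parallel count is an arbitrary choice:

```shell
# Sketch only: ftp.somesite.net and the paths are placeholders.
lftp -e "
  set ftp:passive-mode on;
  set net:max-retries 0;                      # keep retrying dropped connections
  mirror --continue --parallel=10 /bigdirectory /home/me/bigdirectory;
  quit
" ftp://ftp.somesite.net
```

lftp's mirror resumes partial files, skips files that already exist with the same size and timestamp, and keeps going past entries it cannot read, which matches the requirements in the first post.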

Levente

