8 May 2009

Recursively download an FTP folder with wget

I have been using cURL quite extensively for both downloading and uploading from the command line. But this morning I found myself unable to download a full folder tree over FTP with it. My need was quite simple: regularly download new and changed files and directories from under a fixed path on an FTP server.

After fiddling around with cURL for a bit, I realized it was simply not meant for the job (or maybe it was just me being unable to figure it out!).

So here’s the result of my attempt with an old, handy tool – GNU wget – in under 5 mins (of which 3.5 were needed to look up switches in the manual):

wget \
  --mirror ftp://ftp.srvr.com/public/docs/reports/mytechieself/ \
  --user=my_username \
  --password=mypassword \
  -N \
  -nH \
  --cut-dirs=3 \
  -nv

The switches are explained below:

  • --mirror (or -m) = recursively download new and changed files (aka mirroring)
  • -N (or --timestamping) = download only files that are newer than the local copy
  • -nH (or --no-host-directories) = disables creation of host-prefixed directories. In recursive download mode, wget would otherwise start by creating a top-level directory named “ftp.srvr.com” (using the above example). This switch stops it from doing so.
  • --cut-dirs = skips the creation of a given number of parent directories when writing files to disk. In our example, it skips creating the public/, docs/ and reports/ directories and starts directly with mytechieself/. This is handy when you want wget to avoid recreating unnecessary intermediate folder structures and to begin from the directory that actually matters. See the manual for a good example, and the path sketch right after this list.
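
To make the effect of -nH and --cut-dirs=3 concrete, here is a quick sketch of where a single file would end up under each setting (the file name is a made-up example, not from the original command):

   Remote file:        ftp://ftp.srvr.com/public/docs/reports/mytechieself/2009/april.pdf

   Default:            ftp.srvr.com/public/docs/reports/mytechieself/2009/april.pdf
   -nH:                public/docs/reports/mytechieself/2009/april.pdf
   -nH --cut-dirs=3:   mytechieself/2009/april.pdf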

The above command line should work on almost every platform where wget is available. It should also be quite simple to wrap a shell script or DOS batch script around this command, add it to a scheduler job and run it periodically to download whatever you need – whenever you need it :-)
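
For instance, here is a minimal sketch of such a wrapper for a Unix cron set-up (the local paths, script name and schedule are made-up placeholders, not part of the original command):

   #!/bin/sh
   # mirror-reports.sh - mirror the reports folder into a local directory
   # (hypothetical paths and credentials; adjust to your own set-up)
   cd /home/me/reports || exit 1
   wget --mirror -N -nH --cut-dirs=3 -nv \
        --user=my_username --password=mypassword \
        ftp://ftp.srvr.com/public/docs/reports/mytechieself/

   # A crontab entry to run it every night at 2am:
   # 0 2 * * * /home/me/mirror-reports.sh >> /home/me/mirror.log 2>&1

The -nv (non-verbose) switch keeps wget's output terse, so the log written by cron stays readable.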

Admins may have quickly noticed that this set-up can be used to regularly download stuff like log files, core dumps, reports and what not. All you need is FTP access and wget :-D The wget manual has just such an example.
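
As a sketch of that idea, wget's --accept (-A) switch can restrict the mirror to matching file names, say only .log files (the server path and user name below are hypothetical):

   wget --mirror -nH --cut-dirs=1 -nv -A "*.log" \
        --user=admin_user --password=mypassword \
        ftp://ftp.srvr.com/var/logs/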

And finally, if you like what you’ve seen, do take some time to look at the other excellent projects under GNU.