If I run wget twice, it doesn't recognise that it has already downloaded that file, and creates a new one. Is there any way to prevent it downloading the file again?
$ wget https://cdn.sstatic.net/askubuntu/img/logo.png
...
Saving to: ‘logo.png’
...
$ wget https://cdn.sstatic.net/askubuntu/img/logo.png
...
Saving to: ‘logo.png.1’
...
(Happy to use curl or a similar scriptable alternative if wget can't do this.)
I suggest you use the -N option. It enables time-stamping, which re-downloads the file only if it is newer on the server than the downloaded version.
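As a minimal sketch of the time-stamping approach, the helper below wraps wget -N so that repeated runs reuse the local copy unless the server reports a newer one. The function name fetch_if_newer is hypothetical (it is not part of wget), and the sketch assumes the server sends a usable Last-Modified header.

```shell
#!/bin/sh
# Hypothetical helper: download URL into the current directory, but only
# transfer it again when the server copy is newer than the local one.
fetch_if_newer() {
    url=$1
    # -N (--timestamping) compares the remote timestamp with the local file
    # and skips the transfer when the local copy is already up to date.
    wget -N "$url"
}

# Usage (same URL as in the question). Running it twice should leave a
# single logo.png instead of creating logo.png.1:
# fetch_if_newer https://cdn.sstatic.net/askubuntu/img/logo.png
```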
Caveat (from αғsнιη's comment)
If the server is not configured properly, it may always report that the file is new, and -N will always re-download the file. In this case, -nc is probably a better option.

Yes, it's the -c option. If the file is the same, the second download attempt will stop.
Caveats (from jofel's comments)
If the file has changed on the server, the -c option can give incorrect results. With -c, wget simply asks the server for any data beyond the part of the file that is already downloaded, nothing else. It does not check whether the part that is already downloaded has changed. Thus, you could get a corrupted file that is a mixture of the old and new file.
Local test
You can test it by running a simple local web server as follows (thanks to @roadmr's answer):
Open a terminal window and type:
Now open another terminal and do:
Note that filename-to-download is the file located in /path/to/parent-download-dir/ that we want to download.
Now if you run the wget command multiple times you will see:
OK, now go to the /path/to/parent-download-dir/ directory and add something to the source file. For example, if it is a text file, add an extra line and save the file. Now try wget -c ... again. You will see that wget transfers data again even though you had already downloaded the file.
Reason: why re-downloading?
Because the file on the server is now larger than the previously downloaded copy, and for no other reason.
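The mixed-file risk can be illustrated without a server. The sketch below does not run wget. It imitates, with plain files, what a resumed download amounts to: keep the local bytes and append only the remote bytes past the local size. The function name simulate_resume and the file names are hypothetical.

```shell
#!/bin/sh
# Imitate a resumed download (the idea behind wget -c) with local files:
# the local copy is kept as-is and only the bytes past its length are appended.
simulate_resume() {
    remote=$1       # stands in for the file on the server
    local_copy=$2   # stands in for the previously downloaded file
    have=$(( $(wc -c < "$local_copy") ))
    total=$(( $(wc -c < "$remote") ))
    if [ "$have" -ge "$total" ]; then
        echo "nothing to do"
        return 0
    fi
    # tail -c +K prints from byte K onwards (1-based), which skips
    # the first $have bytes of the remote file.
    tail -c +"$((have + 1))" "$remote" >> "$local_copy"
}

# Demo: the "server" file is downloaded, then changes and grows.
dir=$(mktemp -d)
printf 'old line\n' > "$dir/server.txt"
cp "$dir/server.txt" "$dir/local.txt"
printf 'NEW line\nextra line\n' > "$dir/server.txt"
simulate_resume "$dir/server.txt" "$dir/local.txt"
cat "$dir/local.txt"
# prints "old line" then "extra line": neither the old file nor the new one
rm -r "$dir"
```

The first line of the local copy is never compared with the server's version, which is why the changed "NEW line" goes unnoticed.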
There is also another wget option, -nc:
When the -nc option is specified, wget will refuse to download copies of the same file. If you already have the file that wget tries to download, it will refuse to download it unless you rename or remove the local file.
Sometimes this option is very useful, and I recommend using -nc instead of -c or -N, because those options can modify or overwrite your local file with downloaded data when the names match.
Caveat (from jofel's comment)
The -nc option does not update the file if it has changed on the server. If you know the file will change, the -N option is preferable. If you know the file will not change (or you don't care), then -nc is OK.

I know this was a specific question concerning wget, but the OP did mention "Happy to use curl or a similar scriptable alternative if wget can't do this." I am not sure what the requirement here is (multiple files, keep old version if different from original, replace with newly downloaded version). Depending on what you want and how you want to handle duplicates, you may need more than this. A very simple way to do what you seem to want is to use curl instead.
This command will replace the old file with the newly downloaded one every time.
Do not output this to the terminal (that is, without the "> [filename]") if you are downloading a binary file as opposed to text. Doing so could mess with your terminal session. If you do this by accident, you may need to open another shell/terminal session.
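The curl approach above can be sketched as follows. The helper name fetch_replace is hypothetical. It uses -o to write to a named file, which has a similar overwrite-every-time effect to redirecting with ">" and keeps binary data off the terminal.

```shell
#!/bin/sh
# Hypothetical helper: always fetch URL and overwrite DEST with the result.
fetch_replace() {
    url=$1
    dest=$2
    # -f  treat HTTP error statuses as failures (non-zero exit)
    # -sS hide the progress meter but still print error messages
    # -L  follow redirects
    # -o  write the response body to DEST instead of the terminal
    curl -fsSL -o "$dest" "$url"
}

# Usage (same URL as in the question):
# fetch_replace https://cdn.sstatic.net/askubuntu/img/logo.png logo.png
```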