I was mirroring a website with the following command:
wget -m -nc -p -E -k -np -e robots=off https://www.somesite.com/ & disown
Everything was going fine until I saw that it was stuck at
Reusing existing connection to www.somesite.com:443.
and I closed that tty.
What should I do to make it continue?
Here is part of the wget output:
www.somesite.com/.../sport.html [ <=> ] 833.32K 1.53MB/s in 0.5s
Last-modified header missing -- time-stamps turned off.
2018-02-10 16:34:23 (1.53 MB/s) - ‘www.somesite.com/.../sport.html’ saved [853319]
--2018-02-10 16:34:23-- http://www.somesite.com/.../social
Reusing existing connection to www.somesite.com:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘www.somesite.com/.../social.html’
www.somesite.com/.../social.html [ <=> ] 141.35K 816KB/s in 0.2s
Last-modified header missing -- time-stamps turned off.
2018-02-10 16:34:24 (816 KB/s) - ‘www.somesite.com/.../social.html’ saved [144747]
--2018-02-10 16:34:24-- http://www.somesite.com/.../parliament
Reusing existing connection to www.somesite.com:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘www.somesite.com/.../parliament.html’
The command I used is:
wget -m -c -p -E -k -np -e robots=off https://www.somesite.com
Is there a way to instruct wget not to re-download URLs it has already downloaded?
Just run the command again. wget is clever enough to continue the download, but you must specify the correct options.

For example, remove the -nc option if you want to re-download files that have changed on the server (see also "Skip download if files exist in wget?"). If the download was interrupted in the middle of a large file, you might want to add the -c option, which tells wget to continue getting a partially-downloaded file. Both options are described in man wget.

You should also consider using screen or tmux instead of disown, so that you can check the status and output of your background processes.
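A minimal sketch of the whole workflow with screen (the session name "mirror" is an arbitrary choice, and the URL is the placeholder from the question):

```shell
# Start a named screen session so the mirror survives closing the tty
screen -S mirror

# Inside the session, re-run the mirror.
# -c resumes partially-downloaded files; -nc is dropped so that
# wget can refresh files that have changed on the server.
wget -m -c -p -E -k -np -e robots=off https://www.somesite.com/

# Detach with Ctrl-a d; the download keeps running.
# Later, reattach to check status and output:
screen -r mirror
```

Unlike disown, this keeps the process's output attached to a terminal you can return to at any time.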