I'm trying to download Winamp's website in case they shut it down. I need to download literally everything.
I tried once with wget and I managed to download the website itself, but when I try to download any file linked from it, I get a file without an extension or name. How can I fix that?
You may need to mirror the website completely, but be aware that some links may really be dead. You can use HTTrack or wget:
With HTTrack, first install it:
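On Debian or Ubuntu, for example, that would be something like:

sudo apt-get install httrack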
Now run it, allowing external links to a depth of 1:
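A sketch of the invocation (the URL and output directory are placeholders, not from the original answer):

httrack "https://www.winamp.com/" -O ./winamp-mirror --ext-depth=1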
This will also pull in the Winamp CDN files, but it won't keep following links out across the whole internet.
If you'd rather use wget, the equivalent options are described in man wget.
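A sketch of a wget mirror command along those lines (the URL is a placeholder):

wget --mirror --page-requisites --convert-links --adjust-extension --no-parent "https://www.winamp.com/"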
This is the most effective and easy way I've found to create a complete mirror of a website that can be viewed locally, with working scripts, styles, and so on.
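A sketch of the command (the URL here is a placeholder, not from the original answer):

wget -m -p -E -k "https://www.winamp.com/"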
Using -m (mirror) instead of -r is preferred, as it intuitively downloads assets and you don't have to specify a recursion depth; mirroring generally determines the correct depth to return a functioning site.

The -p -E -k options ensure that you're not downloading entire pages that happen to be linked to (e.g. a link to a Twitter profile causing you to download Twitter's code) while still including all the prerequisite files (JavaScript, CSS, etc.) that the site needs. Proper site structure is preserved as well, instead of the one big .html file with embedded scripts/stylesheets that can sometimes be the output.

It's fast, I have never had to limit anything to get it to work, and the resulting directory looks better than simply using the -r "url" argument; it also provides better insight into how the site was put together, especially if you're reverse-engineering it for educational purposes.

If your IP ends up getting kicked from the site, or the download stops, try running the same command again but with --wait="duration" enabled. This adds a delay between requests so as not to trigger any DDoS flags on their end.

Note that if you're downloading a web app, or a site with lots of JavaScript that was compiled from TypeScript, you won't be able to get the TypeScript that was used originally, only what is compiled and sent to the browser. Take this into consideration if the site is very script-heavy.
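For example, retrying with a pause between requests might look like this (the five-second wait and the URL are arbitrary placeholders):

wget -m -p -E -k --wait=5 "https://www.winamp.com/"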
Try:
wget -r --no-parent http://www.mysite.com/dict
-r means recursive, and --no-parent stops wget from ascending into the parent directory.
If you want to download everything associated with the link you have, you can try something like the command below.
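A reasonable sketch of such a command (the URL and the exact options are assumptions, not from the original answer):

wget --recursive --page-requisites --convert-links --no-parent "https://www.winamp.com/"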
You may want to use --wait="duration" to avoid getting your IP blocked. It looks odd to request page after page without any wait periods; that's not how a human browses.