Sometimes a link has Unicode characters in it, such as http://www.example.com/файл.zip
If you point your browser to it, it will properly prompt you to download the file as файл.zip. But if you try to do it with wget, the saved file name comes out as a mix of ?, percent-encoding (like %D0%BB), and the string "(invalid encoding)" after the filename.
What parameters can I add to wget, or any other command line tricks, so that it behaves as Chrome and Firefox and saves the file exactly as specified in the rendered link - in this case, as файл.zip?
The solution should work without having to explicitly write the file name in the command, so an explicit wget -O файл.zip http://www.example.com/файл.zip is not a good solution.
I realize that as soon as you run wget http://www.example.com/файл.zip, it tries to retrieve http://www.example.com/%D1%84%D0%B0%D0%B9%D0%BB.zip; that is, it converts the link to percent-encoding, which may be the reason why it doesn't save the filename "properly".
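To illustrate the conversion described above, here is a minimal Python 3 sketch showing that the percent-encoded URL path is just the UTF-8 bytes of the original name, and that the encoding is reversible. It only demonstrates the encoding itself; it makes no claim about how wget implements it internally.

```python
from urllib.parse import quote, unquote

name = "файл.zip"

# quote() encodes the string as UTF-8 and escapes every non-ASCII byte as %XX.
encoded = quote(name)
print(encoded)  # %D1%84%D0%B0%D0%B9%D0%BB.zip

# unquote() reverses the process, decoding the bytes as UTF-8 by default.
decoded = unquote(encoded)
print(decoded == name)  # True
```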
I posted a somewhat related question here, whose answer may or may not be of help to this one.
For wget, you can use:
if your system can handle UTF-8 or other encoding properly.
Finally, if you still have those % symbols left in your downloaded file name, you can use the Python function urllib.unquote(filename), which replaces %xx escapes with their single-character equivalents.

You can use curl instead, as follows:
It will save it to файл.zip.
I couldn't find a way to solve this issue with wget, but could successfully transfer the files with Midnight Commander.

My answer is similar to the one posted by Balaji Purushotham.
I had to add .parse to get this working in Python (that is, urllib.parse.unquote rather than urllib.unquote, which is how Python 3 names it):
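A rough sketch of that Python 3 approach might look like the following. The helper names decoded_name and decode_filename, and the idea of renaming the file on disk after the download, are assumptions made for illustration; they are not taken from the original answers.

```python
import os
from urllib.parse import unquote


def decoded_name(name):
    """Return name with %xx escapes replaced (bytes are decoded as UTF-8)."""
    return unquote(name)


def decode_filename(path):
    """Rename the file at path so its name no longer contains %xx escapes.

    Returns the new path, or the original path if nothing was changed.
    """
    directory, name = os.path.split(path)
    new_name = decoded_name(name)
    # Leave the file alone if there is nothing to decode, or if decoding
    # would introduce a path separator (for example from %2F).
    if new_name == name or "/" in new_name or os.sep in new_name:
        return path
    target = os.path.join(directory, new_name)
    os.rename(path, target)
    return target
```

For example, calling decode_filename on a downloaded file named %D1%84%D0%B0%D0%B9%D0%BB.zip would rename it to файл.zip, provided the file system accepts UTF-8 names.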