I am receiving a file with a Faroese name and trying to save it from a PHP script:
2010_08_Útflutningur.xls
On Ubuntu 10.04 LTS it is saved as:
2010_08_�tflutningur.xls (invalid encoding)
I've installed and run utf8-migration-tool, but it had no effect.
Is this an Ubuntu problem that I can fix, or do I just have to give up and modify the name in PHP?
Is there a document that states which character set is acceptable for filenames in Ubuntu, or what the encoding rules are?
Thanks
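By default Ubuntu uses UTF-8 for filenames, as do most modern Linux distributions and many other operating systems (Windows/NTFS is the best-known exception, using UTF-16). Strictly speaking, the Linux kernel treats a filename as an arbitrary sequence of bytes (anything except "/" and the NUL byte). UTF-8 is the convention that the locale and desktop tools use to interpret those bytes. A name written in another encoding is therefore stored without complaint but displayed incorrectly.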
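To see why the name shows up as �, it helps to check whether its bytes are valid UTF-8. The minimal PHP sketch below does that. The helper name is_valid_utf8 is made up for illustration. The sketch compares Ú encoded as UTF-8 (the two bytes C3 9A) with Ú encoded as ISO-8859-1 (the single byte DA). It is only a guess, though one consistent with the symptom, that the saved name contains the ISO-8859-1 form.

```php
<?php
// Sketch: test whether a filename's bytes form valid UTF-8.
// Linux stores filenames as raw bytes; file managers such as Nautilus decode
// them as UTF-8, so invalid bytes are typically shown as U+FFFD (the � character).
function is_valid_utf8(string $name): bool
{
    // With the /u modifier PCRE rejects subjects that are not valid UTF-8,
    // and preg_match() returns false instead of 1.
    return preg_match('//u', $name) === 1;
}

$utf8Name   = "2010_08_\u{00DA}tflutningur.xls"; // Ú as the two bytes C3 9A
$latin1Name = "2010_08_\xDAtflutningur.xls";     // Ú as the single ISO-8859-1 byte DA

var_dump(is_valid_utf8($utf8Name));   // bool(true)
var_dump(is_valid_utf8($latin1Name)); // bool(false)
echo bin2hex("\u{00DA}"), PHP_EOL;    // c39a
```

A lone DA byte followed by "t" is not a valid UTF-8 sequence, which matches the "(invalid encoding)" label in the question.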
To fix files that already have names in the wrong encoding, like the one you show, you can try nautilus-filename-repairer.
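nautilus-filename-repairer is interactive. If you would rather script the repair, here is a rough PHP sketch of the same idea. It is not how that tool is implemented. The function name repair_filenames and the default ISO-8859-1 guess are assumptions. The function scans a directory for names that are not valid UTF-8, converts each one from the guessed encoding with iconv(), and renames the file, refusing to overwrite an existing file.

```php
<?php
// Hypothetical helper: rename entries in $dir whose names are not valid UTF-8.
// $assumedEncoding is a guess you supply; if it is wrong, the new name is
// valid UTF-8 but shows the wrong characters.
function repair_filenames(string $dir, string $assumedEncoding = 'ISO-8859-1'): array
{
    $entries = scandir($dir);
    if ($entries === false) {
        throw new RuntimeException("Cannot read directory $dir");
    }

    $renamed = [];
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..' || preg_match('//u', $entry) === 1) {
            continue; // skip dot entries and names that are already valid UTF-8
        }
        $fixed = iconv($assumedEncoding, 'UTF-8', $entry);
        if ($fixed === false || file_exists("$dir/$fixed")) {
            continue; // conversion failed, or renaming would overwrite a file
        }
        if (rename("$dir/$entry", "$dir/$fixed")) {
            $renamed[$entry] = $fixed; // old name => new name
        }
    }
    return $renamed;
}
```

Because every byte is a valid ISO-8859-1 character, a wrong guess still produces a rename. Try the function on a copy of the directory first.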
You can use the PHP iconv() function to convert strings (filenames) from one encoding to another. Of course, that requires knowing which encoding they are in to begin with. To get correctly encoded filenames from the client, you can try the technique explained by eswald.
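Here is a minimal sketch of the iconv() approach, assuming you already know (or are willing to guess) the source encoding. The helper name filename_to_utf8 and the ISO-8859-1 example input are illustrative, not taken from the question.

```php
<?php
// Sketch: convert a filename from a known source encoding to UTF-8 before
// saving it. iconv() cannot detect the source encoding; you must supply it.
function filename_to_utf8(string $name, string $sourceEncoding): string
{
    $converted = iconv($sourceEncoding, 'UTF-8', $name);
    if ($converted === false) {
        // iconv() returns false when the input is not valid in $sourceEncoding.
        throw new InvalidArgumentException("Filename is not valid $sourceEncoding");
    }
    return $converted;
}

// Example: assume the name arrived as ISO-8859-1 bytes.
$incoming = "2010_08_\xDAtflutningur.xls";
$name = filename_to_utf8($incoming, 'ISO-8859-1'); // "2010_08_Útflutningur.xls" in UTF-8
file_put_contents(sys_get_temp_dir() . '/' . $name, 'example data');
```

For encodings where some byte sequences are illegal, the helper turns iconv()'s false return into an exception rather than silently saving an empty name.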
This looks like an encoding issue. Unfortunately, PHP needs a bit of hand-holding when it comes to encodings, because its strings are plain byte sequences and the standard string functions treat each byte as one character. If you are creating the filename within PHP, utf8_encode() should be helpful. Note, however, that it assumes the input is ISO-8859-1. It is also deprecated as of PHP 8.2; iconv('ISO-8859-1', 'UTF-8', $name) is an equivalent replacement.
On the other hand, if you are using the filename submitted by a client, perhaps you can ask the client to do the encoding for you. That is done with the accept-charset attribute of the <form> tag, and/or by setting the charset of the page that the form is on. Some clients honor one and some the other, so for best results set both to UTF-8.
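Here is a sketch of both halves, under a few assumptions. The form posts to a placeholder save.php, and the file field is called upload. When a submitted name is still not valid UTF-8, the server falls back to treating it as ISO-8859-1. The helper names upload_form_html and safe_upload_name are hypothetical.

```php
<?php
// Client side: declare the page charset and ask the browser to submit UTF-8.
function upload_form_html(): string
{
    return <<<HTML
<!DOCTYPE html>
<html>
<head><meta charset="UTF-8"></head>
<body>
<form action="save.php" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
  <input type="file" name="upload">
  <input type="submit" value="Upload">
</form>
</body>
</html>
HTML;
}

// Server side: make the client-supplied name valid UTF-8 and drop any path part.
function safe_upload_name(string $clientName): string
{
    if (preg_match('//u', $clientName) !== 1) {
        // Not UTF-8: assume ISO-8859-1 (a guess), where every byte maps to a character.
        $clientName = (string) iconv('ISO-8859-1', 'UTF-8', $clientName);
    }
    // Normalize Windows separators, strip NUL bytes, keep the last path component.
    $clientName = str_replace(["\\", "\0"], ['/', ''], $clientName);
    $slash = strrchr($clientName, '/');
    $base = $slash === false ? $clientName : substr($slash, 1);
    if ($base === '' || $base === '.' || $base === '..') {
        throw new InvalidArgumentException('Unusable upload filename');
    }
    return $base;
}
```

In the upload handler you would then call something like move_uploaded_file($_FILES['upload']['tmp_name'], $targetDir . '/' . safe_upload_name($_FILES['upload']['name'])), where $targetDir is your own storage directory. Sending header('Content-Type: text/html; charset=UTF-8') before printing the form also sets the page charset at the HTTP level.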