I have a couple of PHP applications currently running on Windows Server 2003. They use PHP, MySQL and Apache on Windows, and the project is to move them to a new Linux server (Debian-based).
But I have a problem with user-uploaded files whose names contain 'special characters' (non-ASCII characters such as éèàç), which is common in French.
For example, the file "accusé réception.pdf" is stored as:
$ ls
accus? r?ception.pdf
It seems there is no problem when I upload a new file on the Linux server: the file gets a name like that on the filesystem, but the application can find it. The problem is with the migrated content: the file is there, but the application can't find it!
I wonder where the problem can come from:
- the filesystem's character set/encoding (I think it comes from here)
- the PHP code of the applications themselves, which would be a problem as I can't change it. I can file bug reports, but I'm not sure when they would be fixed.
- something else
Above all, I need to find a way to fix it. As it only happens with migrated data, I could write a script or tune my filesystem/PHP/whatever to solve it when putting these applications into production on the Linux server.
Thanks in advance for your help.
Note: when the application can't find a file, my Apache logs are filled with 'readdir() expects parameter 1 to be resource, boolean given in ...' errors
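One way to narrow down where the mismatch comes from is to look at the raw bytes of the stored names instead of what the terminal displays. Below is a minimal Python sketch of that check. The helper names `classify_name` and `report` are made up for illustration, and the "possibly iso-8859-1" label is only a guess: the bytes alone cannot prove which single-byte encoding was used.

```python
import os


def classify_name(raw: bytes) -> str:
    """Guess how a raw filename is encoded, based only on its bytes."""
    if raw.isascii():
        return "ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so probably a legacy single-byte encoding.
        return "not utf-8 (possibly iso-8859-1)"


def report(directory: str) -> dict:
    """Map each raw filename in `directory` to its guessed encoding."""
    # Passing a bytes path makes os.listdir return undecoded byte names.
    return {raw: classify_name(raw) for raw in os.listdir(os.fsencode(directory))}


if __name__ == "__main__":
    for raw, kind in sorted(report(".").items()):
        # repr of a bytes object is plain ASCII, so this prints on any terminal.
        print(kind, repr(raw))
```

If the migrated files and the freshly uploaded ones fall into different categories, the application and the migration are most likely using different filename encodings.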
Windows usually uses Unicode to encode non-ASCII characters, so if you're using a Unicode (UTF-8) locale on your Debian server, you should be set. The locale doesn't have to be French just because the characters you're trying to use are a French speciality. I just tested this: I have my LANG set to en_US.UTF-8, and I can create a file with the name you mentioned ("accusé réception.pdf"), and it shows up that way as well.
Chances are the accents are there and just can't be displayed. To test this theory, replace that "ls" command with "LANG=en_US.UTF-8 ls". If the name shows up correctly, it's just your terminal settings. In that case, set your LANG variable in your shell's startup file (e.g. .bashrc) or system-wide in /etc/default/locale.
I finally found some information about my problem, and a workaround. The company that develops these PHP applications told me to use ISO-8859-1 to serve the stored files and to configure Apache accordingly. That doesn't solve my problem, but it gave me an idea.
I used convmv http://www.j3e.de/linux/convmv/man/ (thanks to "How to tell the language encoding of a filename on Linux?") to convert the filenames from UTF-8 (I think the copy to Debian makes them UTF-8) to ISO-8859-1.
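As a rough illustration of what this kind of conversion does, here is a minimal Python sketch. It is not convmv itself and not necessarily what was run here. With convmv, the `-f` and `-t` options name the source and target encodings, and it only performs a dry run unless `--notest` is given. The function names `reencode_name` and `convert_directory` are hypothetical. The sketch handles a single directory only and treats any non-ASCII name that is valid UTF-8 as needing conversion, which is an assumption that can misfire on unusual ISO-8859-1 names.

```python
import os


def reencode_name(raw: bytes, src: str = "utf-8", dst: str = "iso-8859-1"):
    """Return `raw` re-encoded from `src` to `dst`, or None if nothing to do."""
    if raw.isascii():
        return None  # pure ASCII is identical in both encodings
    try:
        return raw.decode(src).encode(dst)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Not valid `src`, or contains characters `dst` cannot represent.
        return None


def convert_directory(directory: str, dry_run: bool = True) -> list:
    """Plan (and optionally perform) the renames for one directory."""
    root = os.fsencode(directory)
    planned = []
    for raw in sorted(os.listdir(root)):  # bytes in, bytes out
        new = reencode_name(raw)
        if new is None:
            continue
        target = os.path.join(root, new)
        if os.path.lexists(target):
            continue  # never overwrite an existing entry
        planned.append((raw, new))
        if not dry_run:
            os.rename(os.path.join(root, raw), target)
    return planned
```

Calling `convert_directory(path)` first and reading the returned list before re-running with `dry_run=False` mirrors convmv's test-by-default behaviour, and a backup of the upload directory is advisable either way.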
This solves my problem: my applications can now find the stored files (both migrated and new).
The only remaining problem is that my shell still doesn't display the filenames correctly, but this is a 'minor' problem.
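The display issue is what one would expect if the shell runs in a UTF-8 locale while the names are now stored as ISO-8859-1 bytes. A small Python sketch of the effect follows. The helper `as_seen_by` is made up for illustration, and the terminal's actual encoding is an assumption.

```python
name = "accusé réception.pdf"
latin1_bytes = name.encode("iso-8859-1")  # what the applications expect on disk
utf8_bytes = name.encode("utf-8")         # what a UTF-8 locale would write


def as_seen_by(raw: bytes, terminal_encoding: str) -> str:
    """Decode raw filename bytes the way a terminal in that encoding would."""
    # Undecodable bytes become U+FFFD, much like the '?' printed by ls.
    return raw.decode(terminal_encoding, errors="replace")


# ascii() keeps the output printable whatever the terminal encoding is.
print(ascii(as_seen_by(latin1_bytes, "utf-8")))       # accents are lost
print(ascii(as_seen_by(latin1_bytes, "iso-8859-1")))  # accents are intact
print(ascii(as_seen_by(utf8_bytes, "iso-8859-1")))    # classic mojibake
```

So the names on disk are fine for the applications. Only a terminal or locale set to ISO-8859-1 would render them with their accents.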
PS: sorry that the original question did not describe the problem correctly, and for posting the solution so late.