This is related to this Stack Overflow post:
glob() can't find file names with multibyte characters on Windows?
I'm having issues with PHP and files that have multibyte characters on Windows. Here's my test case:
print_r(scandir('./uploads/'));
print_r(glob('./uploads/*'));
Correct Output on remote UNIX server:
Array
(
[0] => .
[1] => ..
[2] => filename-äöü.jpg
[3] => filename.jpg
[4] => test이test.jpg
[5] => имя файла.jpg
[6] => פילענאַמע.jpg
[7] => 文件名.jpg
)
Array
(
[0] => ./uploads/filename-äöü.jpg
[1] => ./uploads/filename.jpg
[2] => ./uploads/test이test.jpg
[3] => ./uploads/имя файла.jpg
[4] => ./uploads/פילענאַמע.jpg
[5] => ./uploads/文件名.jpg
)
Incorrect Output locally on Windows:
Array
(
[0] => .
[1] => ..
[2] => ??? ?????.jpg
[3] => ???.jpg
[4] => ?????????.jpg
[5] => filename-äöü.jpg
[6] => filename.jpg
[7] => test?test.jpg
)
Array
(
[0] => ./uploads/filename-äöü.jpg
[1] => ./uploads/filename.jpg
)
Here's a relevant excerpt from the answer I chose to accept (which actually is a quote from an article that was posted online over 2 years ago):
From the comments on this article: http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php
The output from your PHP installation on Windows is easy to explain: you installed the wrong version of PHP, and used a version not compiled to use the Unicode version of the Win32 API. For this reason, the filesystem calls used by PHP will use the legacy "ANSI" API and so the C/C++ libraries linked with this version of PHP will first try to convert your UTF-8-encoded PHP string into the local "ANSI" codepage selected in the running environment (see the CHCP command before starting PHP from a command line window).
Your version of Windows is MOST PROBABLY NOT responsible of this weird thing. Actually, this is YOUR version of PHP which is not compiled correctly, and that uses the legacy ANSI version of the Win32 API (for compatibility with the legacy 16-bit versions of Windows 95/98 whose filesystem support in the kernel actually had no direct support for Unicode, but used an internal conversion layer to convert Unicode to the local ANSI codepage before using the actual ANSI version of the API).
Recompile PHP using the compiler option to use the UNICODE version of the Win32 API (which should be the default today, and anyway always the default for PHP installed on a server that will NEVER be Windows 95 or Windows 98...)
I can't confirm whether this is my problem or not. I used phpinfo() and did not find anything interesting, but I wasn't sure what to look for. I've been using XAMPP for easy installations, so I'm really not sure exactly how it was installed.
I'm using Windows 7, 64 bit - so forgive my ignorance, but I'm not even sure if "Win32" is relevant here. How can I check if my current version of PHP was compiled with the configuration mentioned above?
- PHP Version: 5.3.8
- System: Windows NT WES-PC 6.1 build 7601 (Windows 7 Home Premium Edition Service Pack 1) i586
- Build Date: Aug 23 2011 11:47:20
- Compiler: MSVC9 (Visual C++ 2008)
- Architecture: x86
- Configure Command:
cscript /nologo configure.js "--enable-snapshot-build" "--disable-isapi" "--enable-debug-pack" "--disable-isapi" "--without-mssql" "--without-pdo-mssql" "--without-pi3web" "--with-pdo-oci=D:\php-sdk\oracle\instantclient10\sdk,shared" "--with-oci8=D:\php-sdk\oracle\instantclient10\sdk,shared" "--with-oci8-11g=D:\php-sdk\oracle\instantclient11\sdk,shared" "--enable-object-out-dir=../obj/" "--enable-com-dotnet" "--with-mcrypt=static" "--disable-static-analyze"
In case it's relevant or reveals any useful information, here's a screenshot of my phpinfo() (mbstring section):
How can I find out if my PHP install was "compiled with the UNICODE version of the Win32 API"? (and does that actually make any sense?)
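Independently of how the binary was built, one way to see the symptom directly is a behavioral probe: write a file with a non-ASCII name and check whether the same name comes back from scandir(). The sketch below is only an illustration; the helper name fileNameRoundTrips and the probe file name are made up for this example, and it assumes the script may write to the system temp directory.

```php
<?php
// Hypothetical probe: does a UTF-8 file name survive a write/list round trip?
function fileNameRoundTrips($dir, $name)
{
    $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;
    if (@file_put_contents($path, 'probe') === false) {
        return false; // the file could not be created under this name
    }
    $listed = scandir($dir);
    // Strict comparison: the listed name must match byte for byte.
    $found = is_array($listed) && in_array($name, $listed, true);
    @unlink($path); // remove the probe file again
    return $found;
}

// UTF-8 bytes for "文件名-probe.txt", written as escapes so the result
// does not depend on the encoding this script is saved in.
$name = "\xE6\x96\x87\xE4\xBB\xB6\xE5\x90\x8D-probe.txt";
var_dump(fileNameRoundTrips(sys_get_temp_dir(), $name));
```

A result of false on a given machine would match the question marks shown in the Windows output above; it does not by itself say which API the build uses.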
I think you should download an official binary from the PHP for Windows repository and install it (take note of the installation path).
After that you will need to configure Apache to use the new binary instead of the one it shipped with. It is simple:
Find your httpd.conf file in the WAMP folder (something like C:\wamp\bin\apache\ApacheXXX\conf\httpd.conf); it may also be possible to open it through the tray icon. Once you have found it, locate the line that starts with
LoadModule php5_module
Replace this line so that it points to your new php5_module, which is probably at c:/php/php5apache2_2.dll (this is why you noted the installation path), resulting in something like
LoadModule php5_module "c:/php/php5apache2_2.dll"
Voila. Restart the WAMP server and test your application with the latest version of PHP built specifically for Windows.
I'm not sure this will solve your problem, but it is surely a sound way to go. If you have problems with the PHP setup, read this article.
Good luck!
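After swapping the module and restarting, it may help to confirm that Apache is really serving the new build rather than the bundled one. A minimal sketch follows; the helper name describeRuntime is made up for this example, and the script is assumed to be requested through the web server, not run from the command line.

```php
<?php
// Hypothetical helper: summarize which PHP build is answering the request.
function describeRuntime()
{
    return array(
        'version'  => PHP_VERSION,            // e.g. compare with the build you installed
        'os'       => PHP_OS,
        'sapi'     => php_sapi_name(),        // how PHP is attached to the server
        'ini_file' => php_ini_loaded_file(),  // false when no php.ini was loaded
    );
}

print_r(describeRuntime());
```

If the version or the php.ini path still belongs to the old installation, the LoadModule line is probably not the one Apache is reading.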
Here is some code I worked on to handle an mbstring problem I was running into. I ended up iterating through every combination of encodings and options until one of them produced the output I needed. I have the feeling this kind of procedure might help you find the answer you're seeking.
Do not rely on the documentation; in my case, the options and encodings did not behave the way I expected. I recall that in my testing I would get the rectangles, ?s, and things like A~. My testing was exactly like yours: print_r the info. In my case, my script imports customer and sales info into QuickBooks, which cannot handle UTF-8 (either QB itself can't or the QODBC driver can't). Tildes, graves, and umlauts are out of the question.
That link above is http://www.php.net/manual/en/function.mb-detect-encoding.php#89915 and if Google finds you here, definitely go read that.
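The trial-and-error approach described above could look roughly like this. It is not the code from the linked comment, only a sketch that assumes the mbstring extension is loaded; the helper name tryEncodings and the candidate list are chosen purely for illustration.

```php
<?php
// Hypothetical helper: interpret the same bytes under several candidate
// source encodings and return each result converted to UTF-8.
function tryEncodings($bytes, array $candidates)
{
    $results = array();
    foreach ($candidates as $from) {
        $results[$from] = mb_convert_encoding($bytes, 'UTF-8', $from);
    }
    return $results;
}

$bytes = "\xE4\xF6\xFC"; // "äöü" when read as ISO-8859-1 or Windows-1252
$candidates = array('ISO-8859-1', 'Windows-1252', 'Windows-1251');

foreach (tryEncodings($bytes, $candidates) as $from => $text) {
    // Print the hex bytes too, so a misconfigured console cannot hide the difference.
    echo $from, ' => ', $text, ' (', bin2hex($text), ")\n";
}
```

Comparing the hex output between candidates is usually more reliable than judging by what the terminal or browser renders.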
It seems as though this question has been out there for a while, and whether or not PHP was compiled with Unicode flags does not affect its Unicode support. But if you need to determine whether a given PE image was likely compiled against the Unicode version of the Windows API, you can use dumpbin to examine the kernel32.dll imports it uses. This is not exactly something I would do pragmatically, but in a pinch it could work for diagnostics.
For example, a Unicode executable could list:
noting the number of functions ending in W, aka Wide for Unicode characters.
For an ANSI executable or DLL, you may see something closer to:
with most of the functions ending in A, we can see the executable was most likely compiled with ANSI flags.
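The A-versus-W tally described above can be sketched as a small script. This assumes you have already extracted the imported function names (for instance by hand from the dumpbin output) into an array; the helper name countAnsiWideImports and the sample names are illustrative only, and a trailing A or W is just a heuristic, not proof of how the binary was built.

```php
<?php
// Hypothetical helper: tally import names by their trailing A (ANSI) or W (wide) suffix.
function countAnsiWideImports(array $names)
{
    $counts = array('ansi' => 0, 'wide' => 0, 'other' => 0);
    foreach ($names as $name) {
        // Require a lowercase letter or digit before the suffix so that
        // all-caps endings are not counted by accident.
        if (preg_match('/[a-z0-9]W$/', $name)) {
            $counts['wide']++;
        } elseif (preg_match('/[a-z0-9]A$/', $name)) {
            $counts['ansi']++;
        } else {
            $counts['other']++;
        }
    }
    return $counts;
}

// Illustrative sample, not taken from a real PHP binary.
$sample = array('CreateFileW', 'FindFirstFileW', 'GetModuleFileNameA', 'CloseHandle');
print_r(countAnsiWideImports($sample));
```

A binary whose file-related imports are mostly counted under 'ansi' would be consistent with the ANSI build described in the question.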
I believe you'll want to check to see if PHP was compiled with mbstring (or has the mbstring module installed and enabled if you're using modules). Having that extension enabled should solve your issues. This page should tell you everything you need to know to get it working.
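Checking for mbstring from a script is straightforward. The sketch below only reports whether the extension is loaded and what its internal encoding is; the helper name mbstringStatus is made up for this example, and the check cannot show whether enabling the extension changes how file names are handled.

```php
<?php
// Hypothetical helper: report whether mbstring is available in this PHP build.
function mbstringStatus()
{
    $loaded = extension_loaded('mbstring');
    return array(
        'loaded'            => $loaded,
        // Only call mbstring functions when the extension is actually present.
        'internal_encoding' => $loaded ? mb_internal_encoding() : null,
    );
}

var_dump(mbstringStatus());
```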