I need another pair of eyes to help debug an issue. This will be a long message. I'm going to provide a lot of context because I'm not sure which details might be important.
We have two servers: dev, and production. The existing servers are running RHEL7 and due to hit end of life. I am in the process of building two new RHEL9 VMs to take their place.
The servers host a variety of apps. They're independent of one another and each resides in its own subdirectory off the document root. The document root is located in /var/www/html.
The one that's giving me trouble is a fresh, shiny installation of Drupal 10.2. I installed it to /var/www/drupal using composer, and then created a symbolic link at /var/www/html/databases that points to /var/www/drupal/web. I had to adjust the permissions a bit to make sure the Apache user could do everything it needed to, but pretty soon I had a functional installation of Drupal on both dev and prod.
Then the one on dev stopped working, shortly after I cloned a copy of our custom theme to the themes folder. When I visit, I get this error message:
The website encountered an unexpected error. Try again later.
The ssl_access_log shows this error:
10.133.0.30 - - [21/Feb/2024:13:55:07 -0600] "GET /databases/ HTTP/1.1" 500 61
Oh boy, a 500 error message. The error message that could mean literally anything and therefore means nothing at all. Terrific.
I set my LogLevel to debug, reproduced the issue and checked the ssl_error_log, where it reports the following:
[Wed Feb 21 13:55:07.118063 2024] [ssl:info] [pid 7769:tid 7944] [client 10.133.0.30:64845] AH01964: Connection to child 81 established (server undcflwebdev0.und.edu:443)
[Wed Feb 21 13:55:07.118300 2024] [ssl:debug] [pid 7769:tid 7944] ssl_engine_kernel.c(2391): [client 10.133.0.30:64845] AH02043: SSL virtual host for servername undcflwebdev0.und.edu found
[Wed Feb 21 13:55:07.118492 2024] [core:debug] [pid 7769:tid 7944] protocol.c(2460): [client 10.133.0.30:64845] AH03155: select protocol from , choices=h2,http/1.1 for server undcflwebdev0.und.edu
[Wed Feb 21 13:55:07.120744 2024] [ssl:debug] [pid 7769:tid 7944] ssl_engine_kernel.c(2251): [client 10.133.0.30:64845] AH02041: Protocol: TLSv1.3, Cipher: TLS_AES_256_GCM_SHA384 (256/256 bits)
[Wed Feb 21 13:55:07.120862 2024] [socache_shmcb:debug] [pid 7769:tid 7944] mod_socache_shmcb.c(508): AH00831: socache_shmcb_store (0x16 -> subcache 22)
[Wed Feb 21 13:55:07.120889 2024] [socache_shmcb:debug] [pid 7769:tid 7944] mod_socache_shmcb.c(745): AH00842: expiring 11 and reclaiming 0 removed socache entries
[Wed Feb 21 13:55:07.120901 2024] [socache_shmcb:debug] [pid 7769:tid 7944] mod_socache_shmcb.c(765): AH00843: we now have 0 socache entries
[Wed Feb 21 13:55:07.120911 2024] [socache_shmcb:debug] [pid 7769:tid 7944] mod_socache_shmcb.c(862): AH00847: insert happened at idx=0, data=(0:32)
[Wed Feb 21 13:55:07.120917 2024] [socache_shmcb:debug] [pid 7769:tid 7944] mod_socache_shmcb.c(865): AH00848: finished insert, subcache: idx_pos/idx_used=0/1, data_pos/data_used=0/230
[Wed Feb 21 13:55:07.120923 2024] [socache_shmcb:debug] [pid 7769:tid 7944] mod_socache_shmcb.c(530): AH00834: leaving socache_shmcb_store successfully
[Wed Feb 21 13:55:07.121011 2024] [ssl:debug] [pid 7769:tid 7944] ssl_engine_kernel.c(415): [client 10.133.0.30:64845] AH02034: Initial (No.1) HTTPS request received for child 81 (server undcflwebdev0.und.edu:443)
[Wed Feb 21 13:55:07.121357 2024] [authz_core:debug] [pid 7769:tid 7944] mod_authz_core.c(815): [client 10.133.0.30:64845] AH01626: authorization result of Require all granted: granted
[Wed Feb 21 13:55:07.121390 2024] [authz_core:debug] [pid 7769:tid 7944] mod_authz_core.c(815): [client 10.133.0.30:64845] AH01626: authorization result of <RequireAny>: granted
[Wed Feb 21 13:55:07.121491 2024] [authz_core:debug] [pid 7769:tid 7944] mod_authz_core.c(815): [client 10.133.0.30:64845] AH01626: authorization result of Require all granted: granted
[Wed Feb 21 13:55:07.121514 2024] [authz_core:debug] [pid 7769:tid 7944] mod_authz_core.c(815): [client 10.133.0.30:64845] AH01626: authorization result of <RequireAny>: granted
[Wed Feb 21 13:55:07.121680 2024] [proxy:debug] [pid 7769:tid 7944] mod_proxy.c(1520): [client 10.133.0.30:64845] AH01143: Running scheme unix handler (attempt 0)
[Wed Feb 21 13:55:07.121703 2024] [proxy_ajp:debug] [pid 7769:tid 7944] mod_proxy_ajp.c(795): [client 10.133.0.30:64845] AH00894: declining URL fcgi://localhost/var/www/html/databases/index.php
[Wed Feb 21 13:55:07.121719 2024] [proxy_fcgi:debug] [pid 7769:tid 7944] mod_proxy_fcgi.c(1069): [client 10.133.0.30:64845] AH01076: url: fcgi://localhost/var/www/html/databases/index.php proxyname: (null) proxyport: 0
[Wed Feb 21 13:55:07.121736 2024] [proxy_fcgi:debug] [pid 7769:tid 7944] mod_proxy_fcgi.c(1078): [client 10.133.0.30:64845] AH01078: serving URL fcgi://localhost/var/www/html/databases/index.php
[Wed Feb 21 13:55:07.121758 2024] [proxy:debug] [pid 7769:tid 7944] proxy_util.c(2568): AH00942: FCGI: has acquired connection for (*:80)
[Wed Feb 21 13:55:07.121776 2024] [proxy:debug] [pid 7769:tid 7944] proxy_util.c(2626): [client 10.133.0.30:64845] AH00944: connecting fcgi://localhost/var/www/html/databases/index.php to localhost:8000
[Wed Feb 21 13:55:07.121792 2024] [proxy:debug] [pid 7769:tid 7944] proxy_util.c(2662): [client 10.133.0.30:64845] AH02545: fcgi: has determined UDS as /run/php-fpm/www.sock
[Wed Feb 21 13:55:07.122008 2024] [proxy:debug] [pid 7769:tid 7944] proxy_util.c(2852): [client 10.133.0.30:64845] AH00947: connected /var/www/html/databases/index.php to httpd-UDS:0
[Wed Feb 21 13:55:07.122082 2024] [proxy:debug] [pid 7769:tid 7944] proxy_util.c(3222): AH02823: FCGI: connection established with Unix domain socket /run/php-fpm/www.sock (*:80)
[Wed Feb 21 13:55:07.150902 2024] [proxy:debug] [pid 7769:tid 7944] proxy_util.c(2584): AH00943: FCGI: has released connection for (*:80)
[Wed Feb 21 13:55:07.151140 2024] [ssl:debug] [pid 7769:tid 7944] ssl_engine_io.c(1151): [client 10.133.0.30:64845] AH02001: Connection closed to child 81 with standard shutdown (server undcflwebdev0.und.edu:443)
There aren't any obvious smoking guns, but I think the problem might be this line:
[Wed Feb 21 13:55:07.121703 2024] [proxy_ajp:debug] [pid 7769:tid 7944] mod_proxy_ajp.c(795): [client 10.133.0.30:64845] AH00894: declining URL fcgi://localhost/var/www/html/databases/index.php
If I understand correctly, FCGI is the interface between Apache and PHP. If FCGI is declining the connection, then the PHP script won't be interpreted and everything grinds to a halt.
We have SELinux installed, and it often causes problems, so I temporarily turned it off. But that didn't help -- the problem occurs even when SELinux is been put in permissive mode. So I don't think it's an SELinux issue.
Everything else on the server is working fine. I have multiple PHP apps living in their own folders that work without a hitch. So I reasoned perhaps it's something about the symbolic link. I deleted the link and replaced it with an ordinary folder named "databases" containing an index.html file that says "Hello world". It worked great. I renamed that index.html to index.php and it immediately failed with the same error messages as above. On a whim, I copied that index.php file to "test.php" and accessed that. And it worked fine!
Meanwhile, every other PHP app on the machine continues to work smoothly. Also, the installation of Drupal on the production server is running just fine despite being in a symbolic link. As far as I can tell, the problem only occurs for the exact path /var/www/html/databases/index.php, regardless of whether databases is an actual folder or a symbolic link. Literally every other PHP file on the server works fine.
I did some googling. I did not find anyone who ran into quite the same problem. I did find some people reporting the error code AH00894 on servers with reverse proxies in place. That gave me pause, because I do indeed have a reverse proxy set up. One of the other apps on the server is ancient, crufty and annoyingly irreplaceable. It won't run in anything newer than PHP 5.6. So I've got a docker container running providing a PHP 5.6 environment. Apache is configured to talk to the docker instance through a reverse proxy, thus:
# Listen to all traffic on port 80.
<VirtualHost *:80>
# Keep the original host name.
ProxyPreserveHost On
# Redirect all traffic for the /archon folder to :8080 instead.
ProxyPass /archon http://localhost:8080/
ProxyPassReverse /archon http://localhost:8080/
</VirtualHost>
And again in ssl.conf I have the following bits:
SSLProxyEngine On
ProxyPreserveHost On
# Prevent SSL offloading
RequestHeader set X-Forwarded-Proto "https"
RequestHeader set X-Forwarded-Port "443"
# Redirect all traffic for the /archon folder to :8080 instead.
ProxyPass /archon http://localhost:8080/
ProxyPassReverse /archon http://localhost:8080/
The proxied app works fine. I can go to https://dev.example.com/archon/ and it loads up. But I wondered if maybe the proxying is somehow interfering with the drupal instance.
So I disabled all of the proxy stuff. I commented out the virtualhost and the all the Proxy config lines in ssl.conf, and restarted Apache. I verified that archon stopped working as expected, and then tried my Drupal. It generated the exact same error. So I turned all the proxy stuff back on.
At this point I've been banging my head against this problem for two days. If I've read the logs correctly, the problem is that FCGI is refusing the connection. But I don't know WHY it's refusing. Why do all of the other PHP apps on the server work fine? The production server is configured identically to the dev server, but production works and dev doesn't. Why? Every other PHP file on the server works like a charm. Why? What is it about this one URL that consistently dies when everything else works?
I could maybe rename the index.php file to something else and then use a .htaccess file to identify that file name as the index of the folder. That might be a workaround. But it's awfully awkward, and I worry that renaming Drupal's core file would cause unexpected issues. I don't know if any of Drupal's code has that file name hard coded anywhere.
I'm stuck. I don't know what else to try to figure out what the hell is wrong with the wretched thing. The new servers are supposed to go live Monday the 26th. I am rapidly running out of time, and this is my last blocker. Any suggestions would be appreciated.