I have a PHP application that uses pdf2htmlEX and HTML Purifier to convert PDF documents to text format. The conversion process consists of a few steps:
1. uploading a book through a web browser
2. converting the PDF to HTML pages using pdf2htmlEX
3. processing the resulting pages into txt files using HTML Purifier
For most documents everything works properly, but for some documents with a lot of pages (more than 230), step 3 fails. While HTML Purifier is processing a page, PHP raises the error: "PHP Fatal error: Maximum execution time of 0 seconds exceeded". In my configuration max_execution_time is set to 0. I attached strace to the Apache process, and here is the output just before termination:
lstat("/tmp/books/3349/html/78.page", {st_mode=S_IFREG|0644, st_size=40165, ...}) = 0
open("/tmp/books/3349/html/78.page", O_RDONLY) = 20
fstat(20, {st_mode=S_IFREG|0644, st_size=40165, ...}) = 0
lseek(20, 0, SEEK_CUR) = 0
fstat(20, {st_mode=S_IFREG|0644, st_size=40165, ...}) = 0
read(20, "<div class=\"pd w1 h1\"><div id=\"p"..., 8192) = 8192
read(20, "AACAsAQAAQFgCAAAgLAEAABCWAAAACEs"..., 8192) = 8192
read(20, "7\"><span class=\"_ _1f\"> </span>F"..., 8192) = 8192
read(20, "class=\"_ _8\"> </span>of<span cla"..., 8192) = 8192
read(20, "/span></div><div class=\"t m1 x7a"..., 8192) = 7397
read(20, "", 8192) = 0
read(20, "", 8192) = 0
close(20) = 0
lstat("/tmp/books/3349/text/78.txt", 0x7fff115a43f0) = -1 ENOENT (No such file or directory)
open("/tmp/books/3349/text/78.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 20
fstat(20, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
lseek(20, 0, SEEK_CUR) = 0
write(20, "66 2. TOPOSESa \357\254\201xed space is a"..., 2157) = 2157
close(20) = 0
lstat("/tmp/books/3349/html/79.page", {st_mode=S_IFREG|0644, st_size=48214, ...}) = 0
open("/tmp/books/3349/html/79.page", O_RDONLY) = 20
fstat(20, {st_mode=S_IFREG|0644, st_size=48214, ...}) = 0
lseek(20, 0, SEEK_CUR) = 0
fstat(20, {st_mode=S_IFREG|0644, st_size=48214, ...}) = 0
read(20, "<div class=\"pd w1 h1\"><div id=\"p"..., 8192) = 8192
read(20, "AWAIAACAsAQAAYN5hAoBPSWIEdtXWCAD"..., 8192) = 8192
read(20, "=\"_ _0\"></span>oof<span class=\"f"..., 8192) = 8192
read(20, "c\"></span>).</span></div><div cl"..., 8192) = 8192
read(20, "lass=\"_ _23\"> </span>sho<span cl"..., 8192) = 8192
read(20, "ls0 ws0 r0\">F<span class=\"ff4\"><"..., 8192) = 7254
read(20, "", 8192) = 0
read(20, "", 8192) = 0
close(20) = 0
--- SIGPROF (Profiling timer expired) @ 0 (0) ---
What's interesting is that I have two environments with the same system configuration: one on AWS and another in a VirtualBox VM. Both run Ubuntu 12.04 + Apache 2.2 + PHP 5.4.13 and the configuration settings are the same, but the problem occurs only on the AWS node. Any ideas?
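To compare the two nodes, something like the following could be dropped in at the start of step 3 to log the time limit PHP actually sees for that request. This is only a minimal diagnostic sketch: `describe_time_limit()` is a hypothetical helper name, not part of my application, and I am assuming the value reported at runtime may differ from what I expect from php.ini.

```php
<?php
// Hypothetical diagnostic helper (not part of the application):
// collects the execution time limit as seen by the current request.
function describe_time_limit()
{
    $limit = ini_get('max_execution_time');

    return array(
        'max_execution_time' => $limit,               // raw ini value, e.g. "0"
        'unlimited'          => ((int) $limit === 0), // 0 means no limit
        'sapi'               => php_sapi_name(),      // e.g. "apache2handler" or "cli"
        'loaded_ini'         => php_ini_loaded_file() // php.ini actually in use, or false
    );
}

// Ask PHP to restart the timer with no limit; returns false if the call is refused.
$ok = set_time_limit(0);

$info = describe_time_limit();
$info['set_time_limit_ok'] = $ok;

// Write the result to the error log so both environments can be compared.
error_log('time limit check: ' . var_export($info, true));
```

Running this on both the AWS node and the VirtualBox VM would at least show whether the two requests really run with the same limit and the same php.ini.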