I've been trying to use ImageMagick with OpenCL to speed up resizing of images in batch.
For this, I've started a GPU instance (g2.2xlarge) on Amazon EC2, which, according to AWS, features:
High-performance NVIDIA GPUs, each with 1,536 CUDA cores and 4GB of video memory
I've used a dedicated AMI for GPU instances, namely the "Amazon Linux AMI with NVIDIA GRID GPU Driver" provided by NVIDIA.
With OpenMP
Before compiling ImageMagick from source, I tried the built-in ImageMagick, which only supports OpenMP, as a basis for comparison:
$ convert --version
Version: ImageMagick 6.7.8-9 2015-10-08 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2012 ImageMagick Studio LLC
Features: OpenMP
I resized a 50 Mpx JPEG image to 25% of its size, and timed it:
$ time convert -resize 1158x1737 01.jpg 01b.jpg
real 0m1.371s
user 0m5.388s
sys 0m0.204s
I ran it several times to make sure the timing is consistent (in particular because ImageMagick benchmarks the performance of the available devices on first use).
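To make that repeated-run check systematic, here is a minimal bash sketch. `bench` is a hypothetical helper, not part of ImageMagick: it discards one untimed warm-up run (to absorb the first-use device benchmark mentioned above) and reports the mean wall-clock time of the remaining runs. It assumes GNU `date` for the `%N` nanoseconds format, which Amazon Linux provides.

```bash
# bench RUNS CMD [ARGS...]  (hypothetical helper)
# Runs CMD once as an untimed warm-up, then RUNS timed runs, and prints
# the mean wall-clock time in seconds with millisecond precision.
# Requires GNU date for the %N (nanoseconds) format.
bench() {
  local runs=$1
  shift
  if ! [[ $runs =~ ^[1-9][0-9]*$ ]]; then
    echo "bench: RUNS must be a positive integer" >&2
    return 1
  fi
  "$@" > /dev/null 2>&1    # warm-up run, not included in the mean
  local total=0 start end i
  for ((i = 0; i < runs; i++)); do
    start=$(date +%s.%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s.%N)
    total=$(awk -v t="$total" -v s="$start" -v e="$end" 'BEGIN { printf "%.6f", t + e - s }')
  done
  awk -v t="$total" -v n="$runs" 'BEGIN { printf "%.3f\n", t / n }'
}
```

For example, `bench 5 convert -resize 1158x1737 01.jpg 01b.jpg` would print the mean of five timed runs after one warm-up.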
With OpenCL
I then downloaded the ImageMagick sources, and compiled them:
$ export C_INCLUDE_PATH=/opt/nvidia/cuda/include
$ ./configure --enable-opencl
$ make
I then switched to the directory containing the compiled binaries and checked that OpenCL was now enabled:
$ ./convert --version
Version: ImageMagick 6.9.2-5 Q16 x86_64 2015-11-08 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2015 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC OpenCL OpenMP
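If you want to script this check (for example, to stop early when a build came out without OpenCL), the `Features:` line can be tested directly. A minimal sketch; `has_feature` is a hypothetical helper that reads `--version` output on standard input:

```bash
# has_feature NAME  (hypothetical helper)
# Reads `convert --version` output on stdin and succeeds only if NAME
# appears as a whole word on the "Features:" line.
has_feature() {
  grep '^Features:' | grep -qw -- "$1"
}
```

Usage: `./convert --version | has_feature OpenCL && echo "OpenCL enabled"`. Matching whole words (`-w`) keeps `OpenCL` from being confused with `OpenMP` or other entries on the same line.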
Then I ran the same benchmark:
$ time ./convert -resize 1158x1737 01.jpg 01b.jpg
real 0m2.655s
user 0m1.720s
sys 0m0.928s
Again, I ran it several times to ensure that the timing was consistent.
To my surprise, this is about half as fast as the OpenMP-only version (2.655 s vs. 1.371 s real time).
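The two runs above differ in more than OpenCL (ImageMagick 6.7.8-9 vs. 6.9.2-5, and a different build), so one way to isolate the OpenCL cost is to time the same compiled binary with OpenCL switched off. The sketch below assumes that this build honors the `MAGICK_OCL_DEVICE` environment variable with the value `OFF`; check the environment-variable documentation for your ImageMagick version before relying on it. `compare_ocl` is a hypothetical helper, and GNU `date` is assumed.

```bash
# compare_ocl BINARY [ARGS...]  (hypothetical helper)
# Times one run with OpenCL disabled via MAGICK_OCL_DEVICE=OFF (assumed to
# be honored by this build), then one run with the inherited environment,
# printing "<mode><TAB><seconds>" for each. Requires GNU date (%N).
# Note: the "default" run inherits any MAGICK_OCL_DEVICE already exported.
compare_ocl() {
  local bin=$1
  shift
  local mode start end
  for mode in OFF default; do
    start=$(date +%s.%N)
    if [[ $mode == OFF ]]; then
      MAGICK_OCL_DEVICE=OFF "$bin" "$@" > /dev/null 2>&1
    else
      "$bin" "$@" > /dev/null 2>&1
    fi
    end=$(date +%s.%N)
    awk -v m="$mode" -v s="$start" -v e="$end" 'BEGIN { printf "%s\t%.3f\n", m, e - s }'
  done
}
```

For example, `compare_ocl ./convert -resize 1158x1737 01.jpg 01b.jpg`, repeated a few times and ignoring the very first call, since that one also pays for the device benchmark.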
Trying to make sense of it
As suggested in a Stack Overflow answer, I checked ImageMagick's device benchmark file:
$ cat ~/.cache/ImageMagick/ImagemagickOpenCLDeviceProfile
<version>ImageMagick Device Selection v0.9</version>
<device><type></type><name>GRID K520</name><driver>340.32</driver><max cu>8</max cu><max clock>797</max clock><score>0.2780</score></device>
<device><type></type><score>1.4140</score></device>
Note: this file is only created when I run the compiled version of ImageMagick; for some reason, it's not created when I run the version that ships with Amazon Linux.
So as I read it, there are two devices that ImageMagick can use:
- The GPU, recognized as a NVIDIA GRID K520, with a score of 0.278
- An unknown device (the CPU?), with a score of 1.414
So as far as I understand it, the CPU outperforms the GPU here.
OK, the CPU is not bad (Xeon E5-2670 @ 2.60 GHz), but the GPU is supposed to be quite a beast in its domain.
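With more than a couple of devices, the profile is easier to read as a small table. A minimal awk-based sketch (`list_devices` is a hypothetical helper) that prints each device's score and name in file order, assuming the one-`<device>`-per-line layout shown above; devices without a `<name>` element are shown as `(unnamed)`:

```bash
# list_devices PROFILE_FILE  (hypothetical helper)
# Prints "<score><TAB><name>" for every <device> line in the profile,
# in file order. Assumes each <device> entry sits on a single line.
list_devices() {
  awk '
    /<device>/ {
      name = "(unnamed)"
      score = "?"
      # <name> is 6 chars and </name> is 7, so strip 13 chars of tags.
      if (match($0, /<name>[^<]*<\/name>/))
        name = substr($0, RSTART + 6, RLENGTH - 13)
      # <score> is 7 chars and </score> is 8, so strip 15 chars of tags.
      if (match($0, /<score>[^<]*<\/score>/))
        score = substr($0, RSTART + 7, RLENGTH - 15)
      print score "\t" name
    }' "$1"
}
```

Run against `~/.cache/ImageMagick/ImagemagickOpenCLDeviceProfile` with the contents above, this prints `0.2780	GRID K520` followed by `1.4140	(unnamed)`.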
My questions
- How can the compiled ImageMagick version be half as fast as the version that ships with Amazon Linux?
- How can the CPU outperform the GPU in the ImageMagick benchmark?
Any hint would be welcome to regain the expected GPU performance.
Using OpenCL does not replace the normal initialization, it adds to it, so it will always take longer. We have the kernels precompiled, of course, but just loading the libraries, creating the command queues, loading the kernels... it all takes time. It's unfortunate, but "OpenCL mode" is not well suited to this kind of one-shot command-line usage. An application or persistent server that initializes the ImageMagick library once and then makes multiple calls into it will do really well.
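Short of writing such an application, one way to spread the start-up cost from the command line is to resize a whole batch in a single process. A sketch using `mogrify`, which processes every file passed to it in one invocation; it assumes, without having measured it, that the OpenCL setup then happens once per process rather than once per image. `resize_batch` and the output directory name are hypothetical.

```bash
# resize_batch OUTDIR GEOMETRY FILE...  (hypothetical helper)
# Resizes all FILEs in a single mogrify process, writing the results to
# OUTDIR (mogrify -path) instead of overwriting the originals in place.
resize_batch() {
  local outdir=$1 geometry=$2
  shift 2
  mkdir -p -- "$outdir"
  mogrify -path "$outdir" -resize "$geometry" "$@"
}
```

For example, `resize_batch resized 1158x1737 *.jpg` starts ImageMagick once for all the JPEGs in the current directory, instead of once per `convert` call.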
You are reading the information wrong. A lower score means the device is faster, so the GPU is roughly 5x faster than the CPU here (1.414 / 0.278 ≈ 5.1). The term "score" can be confusing in this situation, so we might rename it in a future release of ImageMagick.
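The ratio can be checked directly from the two scores in the profile. A tiny sketch, assuming the score scales with the measured run time (lower is faster); `speedup` is a hypothetical helper:

```bash
# speedup SLOWER_SCORE FASTER_SCORE  (hypothetical helper)
# Device-profile scores are lower-is-faster, so slower/faster is how many
# times faster the lower-scoring device is. Fails on a non-positive divisor.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { if (fast + 0 <= 0) exit 1; printf "%.1f\n", slow / fast }'
}

speedup 1.414 0.278   # prints 5.1
```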