A Better Way to Handle Images in Drupal

February 1st, 2008 by Andy

Warning: Drupal nerddom ahead! (I spent enough time on this issue that I wanted to post this write-up in the hopes that it’ll help someone else out there…)

Though I’ve recently gushed about how version control has helped us upgrade Drupal much more easily, I had a bear of a time getting one site upgraded. The site is very simple as Drupal goes, with only a handful of contributed modules, but a decent amount of content, including lots of images. After struggling with issues related to both the Image and Image Assist modules, I finally decided to just bite the bullet and make some radical changes to the way the site handles images.

This site only uses images displayed inline with other content (and doesn’t use any other file uploads). I’d been planning to make these changes anyway, since I agree with the Drupal prevailing winds that such images should not be their own nodes, nor do they require the overhead of “private” Drupal files.

I made these changes:

  • switched from “private” to “public” files
  • dropped the Image and Image Assist modules in favor of IMCE

In a nutshell, the process went like this:

  1. change the Drupal download method and file system path
  2. run a custom converter script I cooked up to change Image Assist’s “bbcode”-style markup to xhtml
  3. add a couple of Apache rewrite rules to keep the old image URLs from breaking
  4. switch from Image and Image Assist to IMCE

Private to Public

The changes to Drupal’s download method (from “private” to “public” files), and file system path (changed to a web-accessible path relative to the Drupal install) were smoother than I’d anticipated. The database revealed no instances of the old path other than the files table, which I’d address after changing image-handling modules. So essentially I just needed to move the files on the server, change Drupal’s settings, and clear Drupal’s cache.

Image Converter Script

The image converter script was obviously what took all the time. I wrote a script that bootstrapped Drupal, ran through every node in the database that was using Image Assist’s bbcode-like markup, and converted each instance of that markup to xhtml, doing its best to get the proper image dimensions as it went.

Special care had to be taken with Drupal’s input formats and desired image sizes. Specifically, I needed to anticipate what xhtml tags and attributes would offer the most flexibility (e.g. image captions and varying image sizes…) while still working within a reasonable HTML filter.

The script is pretty clunky, but got the job done.
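
For anyone curious what that kind of conversion looks like, here is a minimal sketch in Python. It is not the script described above: it doesn’t bootstrap Drupal, and it assumes the markup looks like [img_assist|nid=12|title=Sunset|align=left|width=200|height=150]. The paths dictionary is a hypothetical stand-in for looking up each image’s new public URL in the database, and the align-* class names are made up for illustration.

```python
import re
from html import escape

# Assumed tag shape (an assumption, not taken from the original script):
# [img_assist|nid=12|title=Sunset|align=left|width=200|height=150]
TAG_RE = re.compile(r"\[img_assist\|([^\]]+)\]")


def parse_attrs(raw):
    """Split 'key=value|key=value' into a dict, skipping malformed pieces."""
    attrs = {}
    for piece in raw.split("|"):
        key, sep, value = piece.partition("=")
        if sep:
            attrs[key.strip()] = value.strip()
    return attrs


def convert(body, paths):
    """Replace each tag in body with an XHTML <img /> element.

    paths maps node IDs (as strings) to public image URLs. It is a
    hypothetical stand-in for a lookup against the Drupal database.
    """
    def replace(match):
        attrs = parse_attrs(match.group(1))
        src = paths.get(attrs.get("nid"))
        if src is None:
            return match.group(0)  # unknown image: leave the tag untouched
        parts = ['src="%s"' % escape(src, quote=True),
                 'alt="%s"' % escape(attrs.get("title", ""), quote=True)]
        # Only emit dimensions that are plain integers.
        for name in ("width", "height"):
            if attrs.get(name, "").isdigit():
                parts.append('%s="%s"' % (name, attrs[name]))
        # A class (hypothetical name) tends to survive HTML filters
        # better than an inline style attribute would.
        if attrs.get("align") in ("left", "right", "center"):
            parts.append('class="align-%s"' % attrs["align"])
        return "<img %s />" % " ".join(parts)

    return TAG_RE.sub(replace, body)
```

Calling convert(node_body, {"12": "/files/images/sunset.jpg"}) would swap the tag for an img element and leave any tag with an unknown node ID alone, which makes it easy to spot leftovers afterwards.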

mod_rewrite

Next, I enlisted Apache’s mod_rewrite to help keep old image URLs (e.g. all of those indexed and cached by search engines) from breaking. This consisted of two parts: 1) rewrite all requests for the old private image paths to the new public ones; and 2) rewrite all the old ‘+’ marks Drupal had used to represent spaces in image names for the old private URLs to the proper ‘%20’ encoding required by the new public URLs (and remind the site editors that image file names should not include spaces!).

The first rewrite was simple — I just added the following line to Drupal’s .htaccess file:

RewriteRule ^system/files/images/(.*)$ /files/images/$1 [R=301,L]

The second one was more complex… A big hat tip goes to dkg for supplying the solution. I added the following lines to the .htaccess file in the new files directory:

RewriteCond %{SCRIPT_FILENAME} ^.*\+.*$
RewriteRule ^([^+]*)\+(.*)$ $1\%20$2 [N]

Here’s dkg’s explanation:

the [first] line says “if there is a plus in the filename, apply the next rule”

and the rule itself says:

“match the first bunch of text that is not a +, then match the plus, then match everything else. keep the first bunch of stuff ($1), put in a literal %20 (we need the backslash or the % is interpreted as a backreference to the RewriteCond itself), and then put in the trailing business.”

the [N] says “now restart the rewriting from the top again with the new string.” This is useful in case there is more than one + character, since each application of the RewriteRule only replaces a single character.
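
To see why the restart matters, here is a small Python sketch that mimics the loop. It is only a simulation under simplifying assumptions: it reuses the same pattern as the RewriteRule, but it ignores the RewriteCond and everything else Apache does around the rule. The function name rewrite_pluses and the sample file names are made up for illustration.

```python
import re

# Same pattern as the RewriteRule: text without a plus, one plus, the rest.
PLUS_RULE = re.compile(r"^([^+]*)\+(.*)$")


def rewrite_pluses(path):
    """Replace one '+' per pass, restarting until none are left.

    Returns the rewritten path and the number of passes that changed it,
    which loosely corresponds to how often [N] restarts the rewriting.
    """
    rounds = 0
    while True:
        match = PLUS_RULE.match(path)
        if match is None:
            return path, rounds
        # Equivalent of the substitution $1\%20$2
        path = match.group(1) + "%20" + match.group(2)
        rounds += 1


print(rewrite_pluses("images/my+old+photo.jpg"))
```

A name with two plus signs takes two passes, and a name with none falls straight through, which is the behaviour dkg describes.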

Switch to IMCE

Next, I switched the contributed image-handling modules. I installed and configured IMCE, and made sure I could see all of the previous images. Then I removed and uninstalled Image and Image Assist, and related settings/data like the image gallery vocabularies, any remaining image node path aliases, and some leftover image records from the files table in the database.

Finally, I was ready to try the Drupal upgrade again. No problems this time ;)

Many thanks go to all of the awesome Drupal developers who’ve contributed untold hours to the Drupal community!