Disclaimer: I hate writing. I’m using AI to get my ideas onto paper. The opinions, experience, and numbers are mine. The grammar is not.

I resize a lot of GIFs. At Rave we process user-uploaded content at scale, and gifsicle is the tool everyone reaches for. It’s been around forever, it works, and it’s… fine. Until you’re staring at a 34MB GIF waiting for gifsicle to grind through 367 frames and wondering why this takes 23 seconds in 2025.

So over the holiday break, instead of doing something reasonable, I paired up with Claude and wrote a GIF processing library in Rust. Three days later, rusticle was resizing GIFs 3-6x faster than gifsicle while producing comparable or smaller output files. AI handled a lot of the implementation grunt work, but the architectural decisions and performance insights were the human side of the equation.

The speedup didn’t come from where you’d expect. It wasn’t “Rust fast, C slow.” The biggest wins came from rethinking what work actually needs to happen when you resize a GIF.

The Baseline: Just Do It In Rust

Day one was boring in the best way. Wire up fast_image_resize (which uses AVX2/NEON internally) for the actual pixel resizing, use the gif crate for decode/encode, and imagequant (the same engine behind pngquant and gifski) for color quantization. Throw jemalloc on top because GIF processing is allocation-heavy.

This already beat gifsicle. Not dramatically, but noticeably. The problem was the encode step. Resizing is fast. Quantizing every frame through imagequant is not.

Here’s what the GIF resize pipeline looks like:

┌──────────┐    ┌──────────┐    ┌──────────────┐    ┌──────────┐
│  Decode  │───▶│  Resize  │───▶│  Quantize    │───▶│  Encode  │
│  GIF     │    │  Frames  │    │  RGBA → 256  │    │  GIF     │
│          │    │  (fast)  │    │  (SLOW)      │    │          │
└──────────┘    └──────────┘    └──────────────┘    └──────────┘
                  ~10% time        ~80% time           ~10% time

The quantization step dominates. Every frame gets fed through imagequant, which runs a median-cut algorithm, builds an optimal 256-color palette, applies Floyd-Steinberg dithering, and remaps every pixel. It’s excellent quality. It’s also doing way more work than we need for a resize operation.

The Insight: You Already Have a Palette

Here’s the thing about resizing a GIF. The source GIF already has a carefully chosen palette (either a global palette or per-frame local palettes). When you resize, the colors don’t fundamentally change. You’re interpolating between existing pixels, and the resulting colors are blends of colors that were already representable in the original palette.

So what if we just… skip quantization entirely and map the resized pixels back to the original palette?

Standard Pipeline:
┌─────────┐    ┌──────────┐    ┌───────────────────────┐    ┌─────────┐
│ Resized │───▶│imagequant│───▶│ New 256-color palette │───▶│ Indexed │
│ RGBA    │    │ (slow)   │    │ + dithered indices    │    │ pixels  │
└─────────┘    └──────────┘    └───────────────────────┘    └─────────┘

Fast Path:
┌─────────┐    ┌──────────┐    ┌───────────────────────┐    ┌─────────┐
│ Resized │───▶│Palette   │───▶│ Original palette      │───▶│ Indexed │
│ RGBA    │    │ LUT (O1) │    │ + nearest-neighbor    │    │ pixels  │
└─────────┘    └──────────┘    └───────────────────────┘    └─────────┘

The naive approach would be: for each pixel, scan all 256 palette entries, compute the Euclidean distance in RGB space, pick the closest one. That’s O(pixels × 256), which isn’t great when you’ve got a 640×480 frame with 307,200 pixels.
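In Rust, that naive scan looks something like this (a minimal sketch; `nearest_naive` is an illustrative name, not rusticle's API):

```rust
// Naive nearest-neighbor palette match: scan every entry for every pixel.
// O(palette_len) work per lookup -- this is the cost the LUT removes.
fn nearest_naive(palette: &[[u8; 3]], r: u8, g: u8, b: u8) -> u8 {
    let mut best = 0usize;
    let mut best_dist = u32::MAX;
    for (i, &[pr, pg, pb]) in palette.iter().enumerate() {
        let (dr, dg, db) = (
            r as i32 - pr as i32,
            g as i32 - pg as i32,
            b as i32 - pb as i32,
        );
        // Squared Euclidean distance in RGB space (no sqrt needed to rank).
        let dist = (dr * dr + dg * dg + db * db) as u32;
        if dist < best_dist {
            best_dist = dist;
            best = i;
        }
    }
    best as u8
}
```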

The Palette LUT: Trading 262KB For O(1) Lookups

Instead of computing distances at encode time, I precompute a lookup table that covers the entire RGB color space at reduced precision. Chop each 8-bit channel down to 6 bits, giving you a 64×64×64 table with 262,144 entries. Each entry stores the index of the nearest palette color for that region of color space.

Full RGB Space (256³ = 16.7M colors)
┌─────────────────────────────────┐
│                                 │
│  Too big to precompute.         │
│  16.7 million entries.          │
│                                 │
└─────────────────────────────────┘

Quantized RGB Space (64³ = 262K entries)
┌─────────────────────────────────┐
│ Each cell covers a 4×4×4 cube   │
│ of the original color space.    │
│                                 │
│ Entry = nearest palette index   │
│                                 │
│ 262KB total. Fits in L2 cache.  │
└─────────────────────────────────┘

Lookup: pixel (R=201, G=44, B=180)
┌─────────────────────────────────────────────────┐
│                                                 │
│  R' = 201 >> 2 = 50 ─┐                         │
│  G' =  44 >> 2 = 11  ├──▶ table[(50<<12)       │
│  B' = 180 >> 2 = 45 ─┘          |(11<<6)       │
│                                  |45]           │
│                        = palette index 7        │
│                                                 │
│  A few shifts, two ORs, one array access. Done. │
└─────────────────────────────────────────────────┘

Building the table is the expensive part, but you do it once per GIF. The construction is parallelized across the 64 R-slices using Rayon, so on an M-series Mac it takes a couple milliseconds. After that, every pixel lookup is a bit shift and an array index.

The 6-bit precision (rather than 5-bit, which would give a 32KB table) was a deliberate upgrade. The original 5-bit version worked but showed measurable quality loss on GIFs with subtle gradients. Going to 6 bits cost 230KB of extra memory and bought +1-2 dB PSNR. Still trivially small for any server workload.
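Here's a sketch of the build-and-lookup shape (sequential for clarity; the real build parallelizes the outer R loop with Rayon, and `PaletteLut` is an illustrative name, not rusticle's actual type):

```rust
// 6-bit-per-channel lookup table: 64^3 = 262,144 one-byte entries (~262KB).
// Built once per GIF; after that every pixel lookup is shifts + one index.
struct PaletteLut {
    table: Vec<u8>,
}

impl PaletteLut {
    fn build(palette: &[[u8; 3]]) -> Self {
        let mut table = vec![0u8; 64 * 64 * 64];
        for r6 in 0..64u32 {
            for g6 in 0..64u32 {
                for b6 in 0..64u32 {
                    // Center of the 4x4x4 cube this cell covers.
                    let (r, g, b) = ((r6 << 2) + 2, (g6 << 2) + 2, (b6 << 2) + 2);
                    let sq = |a: u32, b: u32| {
                        let d = a as i32 - b as i32;
                        (d * d) as u32
                    };
                    // Nearest palette entry for this cell (naive scan is fine
                    // here: it runs 262K times per GIF, not once per pixel).
                    let mut best = 0u8;
                    let mut best_dist = u32::MAX;
                    for (i, p) in palette.iter().enumerate() {
                        let dist = sq(r, p[0] as u32)
                            + sq(g, p[1] as u32)
                            + sq(b, p[2] as u32);
                        if dist < best_dist {
                            best_dist = dist;
                            best = i as u8;
                        }
                    }
                    table[((r6 << 12) | (g6 << 6) | b6) as usize] = best;
                }
            }
        }
        PaletteLut { table }
    }

    #[inline]
    fn lookup(&self, r: u8, g: u8, b: u8) -> u8 {
        // Drop each channel to 6 bits, pack into an 18-bit index.
        let idx = ((r as usize >> 2) << 12) | ((g as usize >> 2) << 6) | (b as usize >> 2);
        self.table[idx]
    }
}
```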

Quality Gating: Knowing When to Bail

The fast path is great when it works, but it doesn’t always work. A GIF with a global palette of 32 cartoon colors will map beautifully. A photographic GIF where the resize introduces colors far from any palette entry will look terrible. You need to know the difference automatically.

Every frame gets quality-checked during the palette mapping pass. For each opaque pixel, I track the squared distance between the original color and the nearest palette match. Three metrics gate the fast path:

┌─────────────────────────────────────────────┐
│           Quality Gate (per frame)           │
│                                              │
│  avg_distance² < 150     Mean color error    │
│  outlier_ratio < 5%      Badly-matched px    │
│  palette_utilization > 30%   Palette spread  │
│                                              │
│  ALL THREE must pass.                        │
│  Any failure ──▶ fallback to imagequant      │
└─────────────────────────────────────────────┘

Typical cartoon GIF:              Photographic GIF:
avg_dist²:  ~29    ✓ PASS        avg_dist²:  ~340   ✗ FAIL
outliers:    0%    ✓ PASS        outliers:   12%    ✗ FAIL
utilization: 88%   ✓ PASS        utilization: 95%   ✓ PASS
─────────────────────            ─────────────────────
Result: fast path (4.9x)         Result: imagequant (3.5x)

The 150 threshold for average distance squared was empirically chosen from testing against the resize artifacts I cared about. Below 150, the palette approximation is visually indistinguishable from a proper requantization at web-display sizes. Above it, you start seeing color banding in gradients and halo effects around edges. The outlier ratio catches cases where most pixels are fine but a few are wildly wrong, and palette utilization catches degenerate cases where the resized content only uses a tiny slice of the original palette.

The fallback to imagequant isn’t a failure state. On GIFs with local palettes (no global palette to reuse), the fast path never activates, and rusticle still beats gifsicle by 3.5-3.8x on the strength of the other optimizations alone.
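The gate itself is just three comparisons over stats accumulated during the mapping pass. A minimal sketch, with illustrative field names (not rusticle's actual API):

```rust
// Per-frame quality stats gathered during the palette-mapping pass.
// Thresholds match the gates described above; field names are illustrative.
struct QualityStats {
    sum_dist_sq: u64,   // sum of squared RGB distance, opaque pixels only
    opaque: u64,        // opaque pixel count
    outliers: u64,      // pixels whose distance exceeds an outlier cutoff
    colors_used: usize, // distinct palette indices actually emitted
    palette_len: usize, // size of the original palette
}

impl QualityStats {
    fn fast_path_ok(&self) -> bool {
        if self.opaque == 0 {
            return false;
        }
        let avg_dist_sq = self.sum_dist_sq as f64 / self.opaque as f64;
        let outlier_ratio = self.outliers as f64 / self.opaque as f64;
        let utilization = self.colors_used as f64 / self.palette_len as f64;
        // All three gates must pass, or the frame falls back to imagequant.
        avg_dist_sq < 150.0 && outlier_ratio < 0.05 && utilization > 0.30
    }
}
```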

The Quality Tradeoff, Honestly

The fast path trades quality for speed. There’s no getting around it. Here are the real numbers:

Quality Comparison (resize to 320×240)
──────────────────────────────────────────────────────────────
Test File         │ rusticle PSNR │ gifsicle PSNR │ Δ
──────────────────┼───────────────┼───────────────┼─────────
cartoon (fast)    │   31.9 dB     │   40.5 dB     │ -8.6 dB
photo (fast)      │   36.9 dB     │   38.8 dB     │ -1.9 dB
photo (imagequant)│   34.7 dB     │   37.2 dB     │ -2.5 dB
──────────────────┴───────────────┴───────────────┴─────────

Both tools rate "GOOD" (PSNR ≥ 30 dB) on all tests.
gifsicle rates "EXCELLENT" on cartoon content.

The question: do you care?

If you’re serving originals to a photo editor, yes, you care. If you’re generating thumbnails or chat previews at web resolution, 31.9 dB is more than adequate. Nobody is zooming into your 320×240 GIF thumbnail and complaining about quantization artifacts.

And here’s the part that surprised me: rusticle often produces smaller files than gifsicle, even with the “lower quality” fast path. The original palette tends to be well-optimized for LZW compression, while a freshly-quantized palette can actually compress worse despite being more accurate.

SIMD: Making Frame Comparison Fast

Beyond the LUT, rusticle uses portable SIMD (std::simd on nightly) for two operations that run on every frame pair: marking unchanged pixels transparent and computing diff bounding boxes.

The pixel comparison loads 16 bytes (4 RGBA pixels) at a time, computes the absolute difference per channel using max(a,b) - min(a,b) (no branching), and checks all channels against a threshold simultaneously:

SIMD Pixel Comparison (4 pixels at a time)
────────────────────────────────────────────────────────

Load 16 bytes from current frame:
┌────┬────┬────┬────┬────┬────┬────┬────┬───┐
│ R₀ │ G₀ │ B₀ │ A₀ │ R₁ │ G₁ │ B₁ │ A₁ │...
└────┴────┴────┴────┴────┴────┴────┴────┴───┘

Load 16 bytes from previous frame:
┌────┬────┬────┬────┬────┬────┬────┬────┬───┐
│ R₀'│ G₀'│ B₀'│ A₀'│ R₁'│ G₁'│ B₁'│ A₁'│...
└────┴────┴────┴────┴────┴────┴────┴────┴───┘

diff = max(curr, prev) - min(curr, prev)
mask = diff <= threshold

Check per-pixel:
  Pixel 0: (mask & 0x000F) == 0x000F  →  match  →  transparent
  Pixel 1: (mask & 0x00F0) == 0x00F0  →  match  →  transparent
  Pixel 2: (mask & 0x0F00) == 0x0F00  →  match  →  transparent
  Pixel 3: (mask & 0xF000) == 0xF000  →  match  →  transparent
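Since std::simd is nightly-only, here's the same logic rendered as stable scalar Rust; it computes exactly what the diagram shows, one byte lane at a time instead of sixteen at once (the function name is illustrative):

```rust
// Scalar rendering of the SIMD comparison: for each of the 4 RGBA pixels
// in a 16-byte chunk, the pixel counts as "unchanged" only if all four
// channel diffs are within the threshold.
fn unchanged_pixels_in_chunk(curr: &[u8; 16], prev: &[u8; 16], threshold: u8) -> [bool; 4] {
    let mut lane_ok = [false; 16];
    for i in 0..16 {
        // Branch-free absolute difference, as the SIMD version computes it.
        let diff = curr[i].max(prev[i]) - curr[i].min(prev[i]);
        lane_ok[i] = diff <= threshold;
    }
    let mut out = [false; 4];
    for p in 0..4 {
        // Equivalent of checking the per-pixel nibble of the SIMD mask.
        out[p] = lane_ok[p * 4..p * 4 + 4].iter().all(|&ok| ok);
    }
    out
}
```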

The diff bounding box uses the same SIMD comparison to find the minimal rectangle containing all changed pixels between frames. Rows are scanned with SIMD and early exit; columns are scanned scalar, because column-wise traversal defeats SIMD memory access patterns. At optimization level O3, frames are cropped to just this bounding box, which dramatically reduces encoding work for GIFs where only a small region changes per frame.

Frame N-1                    Frame N
┌────────────────────┐       ┌────────────────────┐
│                    │       │                    │
│                    │       │        ┌──┐        │
│                    │  vs   │        │▓▓│ changed│
│                    │       │        └──┘        │
│                    │       │                    │
└────────────────────┘       └────────────────────┘

Diff bounding box:
┌──┐  ← Only this region gets encoded
│▓▓│     (with position offset in frame header)
└──┘

640×480 frame with 40×40 change = ~99.5% less pixel data to encode
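A scalar sketch of the bounding-box scan (rusticle does the row comparisons with SIMD; `diff_bbox` is an illustrative name):

```rust
// Minimal bounding box of changed pixels between two same-sized RGBA frames.
// Returns (x, y, width, height), or None if the frames are identical.
fn diff_bbox(
    curr: &[u8],
    prev: &[u8],
    width: usize,
    height: usize,
) -> Option<(usize, usize, usize, usize)> {
    let (mut min_x, mut min_y) = (usize::MAX, usize::MAX);
    let (mut max_x, mut max_y) = (0usize, 0usize);
    for y in 0..height {
        for x in 0..width {
            let i = (y * width + x) * 4;
            if curr[i..i + 4] != prev[i..i + 4] {
                min_x = min_x.min(x);
                min_y = min_y.min(y);
                max_x = max_x.max(x);
                max_y = max_y.max(y);
            }
        }
    }
    if min_x == usize::MAX {
        None // no changed pixels: frame can be dropped or left untouched
    } else {
        Some((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    }
}
```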

The Bug That Made Everything Click

Here’s my favorite part. After wiring up the fast path and SIMD optimization, I was getting disappointing results. On a 24-frame cartoon GIF, only 1 out of 24 frames was using the fast path. The other 23 were falling back to imagequant. The quality gate was rejecting almost everything.

I spent an embarrassing amount of time checking my thresholds before realizing the problem: the optimization pass was running before encode. It marks unchanged pixels as transparent by setting their RGBA values to [0, 0, 0, 0]. Those transparent pixels have garbage RGB values (literally zeros) that don’t match anything in the palette, which inflates the average distance and blows out the outlier ratio.

Before fix:
┌─────────────────────────────────────────────────┐
│ Frame after optimize():                         │
│                                                 │
│ [255, 0, 0, 255]  ← opaque red, dist=0    ✓    │
│ [  0, 0, 0,   0]  ← transparent, dist=HIGH ✗   │
│ [  0, 0, 0,   0]  ← transparent, dist=HIGH ✗   │
│ [  0,255, 0, 255]  ← opaque green, dist=0  ✓   │
│                                                 │
│ avg_dist² = 9500    ← way over 150 threshold    │
│ Fast path: REJECTED (falls back to imagequant)  │
└─────────────────────────────────────────────────┘

After fix (skip transparent pixels in quality stats):
┌─────────────────────────────────────────────────┐
│ Same frame, but only count opaque pixels:       │
│                                                 │
│ [255, 0, 0, 255]  ← opaque red, dist=0    ✓    │
│ [  0, 0, 0,   0]  ← transparent, SKIPPED       │
│ [  0, 0, 0,   0]  ← transparent, SKIPPED       │
│ [  0,255, 0, 255]  ← opaque green, dist=0  ✓   │
│                                                 │
│ avg_dist² = 0       ← well under threshold      │
│ Fast path: ACCEPTED                             │
└─────────────────────────────────────────────────┘

Result: 1/24 frames fast path  →  24/24 frames fast path
        Encode: 70ms           →  31ms (2.3x faster)

One if pixel[3] >= 128 check in the quality stats loop. That was it. The fix is three lines of code and it more than doubled encode performance on optimized GIFs.
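The fixed stats loop, sketched with a naive nearest-palette scan for clarity (names are illustrative, not rusticle's actual code):

```rust
// Quality-stats accumulation with the fix: transparent pixels (alpha < 128)
// are skipped, so the [0, 0, 0, 0] pixels written by the optimize pass no
// longer inflate the average distance. Returns (sum of squared distances,
// opaque pixel count). Assumes a non-empty palette.
fn accumulate_dist_sq(rgba: &[u8], palette: &[[u8; 3]]) -> (u64, u64) {
    let (mut sum, mut opaque) = (0u64, 0u64);
    for px in rgba.chunks_exact(4) {
        if px[3] < 128 {
            continue; // the fix: ignore transparent pixels entirely
        }
        // Distance to the nearest palette entry (naive scan for clarity).
        let dist = palette
            .iter()
            .map(|p| {
                let dr = px[0] as i64 - p[0] as i64;
                let dg = px[1] as i64 - p[1] as i64;
                let db = px[2] as i64 - p[2] as i64;
                (dr * dr + dg * dg + db * db) as u64
            })
            .min()
            .unwrap();
        sum += dist;
        opaque += 1;
    }
    (sum, opaque)
}
```

Run against the four-pixel frame in the diagram above, the two transparent pixels contribute nothing, and the average distance comes out zero instead of thousands.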

Results

All benchmarks on Apple Silicon with release builds, jemalloc, and SIMD enabled.

Speedup vs gifsicle 1.96
───────────────────────────────────────────────
Resize (fast path)  │████████████████████│ 4.6-4.9x
Resize (fallback)   │███████████████     │ 3.5-3.8x
Full pipeline       │█████████████████████████│ 4.7-6.2x
───────────────────────────────────────────────

Output file sizes (resize + optimize O3 + lossy 80):
───────────────────────────────────────────────
Test          │ rusticle  │ gifsicle  │ Winner
9MB, 197 fr   │ 5.9 MB   │ 6.5 MB   │ rusticle
34MB, 367 fr  │ 6.0 MB   │ 5.1 MB   │ gifsicle
───────────────────────────────────────────────

The full pipeline number (6.2x on the 34MB file) is where everything compounds: fast_image_resize for SIMD resizing, rayon for parallel frame quantization, the palette LUT for skipping imagequant on eligible frames, SIMD diff detection for frame optimization, and diff bounding box cropping to reduce encoding work.

On the file size front, it’s a mixed bag. Rusticle wins on some inputs and loses on others. Gifsicle has decades of LZW encoding optimizations that I haven’t tried to replicate. For my use case, the speed difference matters more than a 15% file size variance.

What I’d Do Differently

I used portable_simd, which requires nightly Rust. This was fine for a frustration project but would be a problem for adoption. The same operations could be done with std::arch intrinsics on stable, or even just let LLVM auto-vectorize the scalar code (which it does reasonably well for the simpler loops).

The quality gating thresholds are empirically tuned against my test corpus. They work well for the GIF categories I care about (UI recordings, cartoon content, photo-derived thumbnails), but a GIF with an unusual color distribution might trip the gate incorrectly in either direction. More testing with a broader corpus would help.

The lossy compression implementation is conservative by design. Even at quality=0, the maximum lossy threshold is 20. Gifsicle goes much more aggressive at low quality settings. I’d rather be too conservative here because lossy artifacts in GIFs are visible and ugly.

Try It

cargo install rusticle-cli

# Resize preserving aspect ratio
rusticle resize input.gif --fit -W 640 -H 480

# Full pipeline: resize + optimize + lossy
rusticle resize input.gif --fit -W 640 -H 480 --optimize o3 --lossy 80

# Compare quality between outputs
rusticle quality original.gif processed.gif

The library is also usable as a crate if you want to embed it:

use rusticle::{Gif, Filter, OptLevel};

let bytes = Gif::from_bytes(&data)?
    .resize(640, 480, Filter::Lanczos3)?
    .optimize(OptLevel::O2)
    .lossy(80)
    .to_bytes()?;

The code is at github.com/GEverding/rusticle. MIT licensed. Still alpha, so expect rough edges, but the core pipeline is solid and benchmarked.

The best optimization is the work you don’t do. For most GIF resizing, full requantization is work you don’t need to do.