
Compression, Preloading, and Tree-Shaking: Cutting Load Times by 75% at Lorikeet

10 min read · Performance

TL;DR: Three well-known optimizations (HTTP compression, asset preloading, and tree-shaking) cut web app transfer by 75% and brought chat widget time-to-ready down to 681ms. None were novel. The compounding was.

Finding Performance Headroom Through a Stack Audit

We cut our chat widget's cold load time by 45% and web app transfer size by 75% through three standard optimizations that compounded in unexpected ways. As Lorikeet's product matured, we carved out time for a performance audit and found headroom at every layer of the stack.

Static assets were being served without HTTP compression, the chat widget downloaded every asset fresh each time a user opened it, and the design system's bundle configuration meant the widget carried more code than it actually needed. None of these were novel techniques: compression, preloading, and tree-shaking are well-understood optimizations. What surprised us was how much they amplified each other. Tree-shaking made the preload phase 88% faster. Compression shrank what preloading needed to cache. The combined effect was greater than the sum of the parts, and that compounding is the real story here.

Combined Results

  • 75%: web app cold load reduction (1.56 MB to 392 KB)
  • 681ms: chat widget time-to-ready (down from 1.24s)
  • ~2.1 MB to 795 KB: chat widget cold load transfer, uncompressed baseline (62% reduction)
  • 630 KB: trimmed from chat bundles via tree-shaking

For context on what these numbers mean in practice: the chat widget is embedded on customer websites as a third-party script. Every kilobyte we transfer is overhead on someone else's page load. A sub-700ms time-to-ready means the widget feels instant when a user clicks the chat bubble, with no visible loading state. For customers with users in high-latency regions far from our infrastructure, smaller payloads mean disproportionately bigger wins.

We don't yet have business metrics like widget abandonment rates tied to these improvements. That instrumentation is next. But the technical foundation is now in place to measure whether faster loads correlate with higher engagement.

1. HTTP Compression: A Low-Hanging Win Hiding in Plain Sight

When we audited the stack, we spotted an opportunity: HTTP compression hadn't been enabled at the CDN layer yet. Unlike some providers that compress by default, GCP Cloud CDN requires you to explicitly opt in, so it's something you only discover when you go looking for it.

The Audit

We wrote a shell script that parsed the web app's HTML to discover every CDN asset URL, then curled each one and inspected the Content-Encoding response header and transfer size. We ran the same check against non-CDN endpoints (the NestJS API, Remix SSR HTML) as a control group, so we could see exactly which layers were compressing and which were not:

Layer                   Compression
CDN static assets       Disabled
NestJS API server       No middleware
Remix SSR apps          No middleware
GKE L7 load balancer    Brotli (origin responses only)
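The script's core loop is simple. A minimal, self-contained sketch is below; the inline HTML and asset URLs are stand-ins (the real script fetched the app's entry page and probed the live CDN), and the network probes are shown as comments so the sketch runs anywhere:

```shell
#!/usr/bin/env sh
# Sketch of the compression audit. The HTML here is a placeholder for
# the fetched app entry page; URLs are invented.
html='<script src="https://cdn.example.com/assets/hub-abc123.js"></script>
<link rel="stylesheet" href="https://cdn.example.com/assets/app-def456.css">'

# Pull every CDN asset URL out of the markup.
urls=$(printf '%s\n' "$html" | grep -oE 'https://[^"]+\.(js|css)')

for url in $urls; do
  # A browser-like probe records the negotiated encoding and the
  # on-the-wire size, e.g.:
  #   curl -s -o /dev/null -H 'Accept-Encoding: br, gzip' \
  #        -w '%{size_download}\n' "$url"
  #   curl -sI -H 'Accept-Encoding: br, gzip' "$url" | grep -i content-encoding
  echo "would probe: $url"
done
```

Running the same loop against the API and SSR endpoints (instead of CDN asset URLs) is what produced the control-group rows in the table above.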

The good news: the GKE L7 external load balancer was already Brotli-compressing origin-served responses (HTML, tRPC JSON). The opportunity was in the CDN layer, where all 33 JS/CSS bundles were being served at their full uncompressed size.

The Fix

The fix was a single Terraform attribute (compression_mode = "AUTOMATIC") on the CDN backend bucket. Cloud CDN inspects the Accept-Encoding request header and compresses eligible responses on the fly. Brotli is preferred for the ~97% of browsers that support it, with gzip as the fallback. Rollback is instant: remove the attribute and re-apply.
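In Terraform the change is roughly the following; the resource and bucket names are invented, while compression_mode is the real Cloud CDN attribute:

```terraform
# Sketch only: names are placeholders for the real backend bucket.
resource "google_compute_backend_bucket" "cdn_assets" {
  name        = "cdn-assets"
  bucket_name = "my-static-assets"
  enable_cdn  = true

  # The one-line fix: let Cloud CDN negotiate Brotli or gzip per request.
  compression_mode = "AUTOMATIC"
}
```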

To be clear: the fix itself took five minutes. The value was in the audit that found it. GCP Cloud CDN doesn't compress by default, unlike Cloudflare or Vercel, and that's the kind of infrastructure default that goes unquestioned until someone stops to measure. The near-miss with NestJS gzip middleware (which would have downgraded Brotli to gzip) only surfaced because we checked every layer before changing anything.

We also investigated adding compression middleware to the NestJS server, but HAR analysis revealed this would be counterproductive. The GKE load balancer already Brotli-compresses all non-CDN responses. Adding application-level gzip would either downgrade compression quality (gzip instead of Brotli) or cause double compression.

Results

We ran the shell script before the Terraform change to establish a baseline, applied the single-attribute change, then re-ran it immediately after. The Content-Encoding headers flipped from none to br (Brotli) on every asset above ~1KB. Assets under ~1KB were left uncompressed, which is expected Cloud CDN behavior since the compression overhead isn't worth it at that size. The top 5 largest assets tell the story:

Asset                                  Before     After     Reduction
components                             267 KB     85 KB     68%
customer.store                         198 KB     61 KB     69%
entry.client                           126 KB     40 KB     69%
hub                                    44 KB      13 KB     71%
browser.client                         27 KB      9 KB      69%
Web app total (33 assets)              1.56 MB    392 KB    75%
Chat widget total (17 assets + font)   1.81 MB    795 KB    56%

The chat widget reduction is lower because the woff2 font (346 KB) is already compressed, so CDN compression doesn't help much. We haven't done a font audit yet; that would include checking whether it bundles multiple weights or large glyph ranges, and whether subsetting makes sense.

Why This Matters Beyond Raw Transfer Size

These are shared CDN assets that every route depends on. Smaller assets mean the browser downloads, parses, and executes JavaScript faster, leading to faster Remix hydration, earlier route loader execution, and quicker time-to-interactive across every page. For the chat widget, which customers embed on their own sites, every KB shaved is less overhead on their page load.

2. Preparative Iframe: Warming the Cache Before Users Need It

The chat widget is embedded on customer websites as a third-party script. When a user opens the widget, the browser needs to download all the JS, CSS, and font assets before anything renders. On a cold load, that means hitting the network for every asset. The preparative iframe changes this by loading a lightweight preload route in a hidden iframe as soon as the host page loads, well before the user actually opens the widget.

How It Works

The preload route serves a minimal HTML page that references the same assets the widget will need. The browser downloads and caches these assets during idle time. When the user eventually opens the widget, 85% of assets are served from the browser cache at effectively 0ms instead of fetching from the network.
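The preload route itself can be little more than a head full of resource hints. A hypothetical sketch (asset paths and hashes are invented):

```html
<!-- Hypothetical preload route: references the same content-hashed
     assets the widget will request, so the browser caches them during
     idle time. The page renders nothing. -->
<!doctype html>
<html>
  <head>
    <link rel="modulepreload" href="/assets/hub-abc123.js" />
    <link rel="modulepreload" href="/assets/post-message-def456.js" />
    <link rel="stylesheet" href="/assets/widget-789abc.css" />
    <link rel="preload" href="/assets/inter.woff2" as="font" type="font/woff2" crossorigin />
  </head>
  <body><!-- intentionally empty: this page only warms the cache --></body>
</html>
```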

This works because of how modern browsers partition their caches. Cache entries are keyed by a tuple of (top-level site, frame site, resource URL). Since both the preload iframe and the widget iframe are embedded on the same customer site and served from the same origin, they share a cache partition. Assets cached during preload are reused when the widget loads.
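As a toy model of that keying scheme (the function is illustrative, not a browser API):

```typescript
// Toy model of HTTP cache partitioning: entries are keyed by
// (top-level site, frame site, resource URL). Illustrative only.
function cacheKey(topLevelSite: string, frameSite: string, resourceUrl: string): string {
  return [topLevelSite, frameSite, resourceUrl].join(" | ");
}

// The preload iframe and the widget iframe both sit on the customer's
// page and load assets from the same widget origin, so their keys match
// and the second load is a cache hit.
const fromPreload = cacheKey("customer.example", "widget.example", "https://cdn.example/hub.js");
const fromWidget  = cacheKey("customer.example", "widget.example", "https://cdn.example/hub.js");
console.log(fromPreload === fromWidget); // true
```

The same model shows why a different top-level site (the widget embedded on another customer's page) would land in a separate partition and get no benefit from this preload.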

Validating with HAR Files

To verify the preload was actually working, we captured HAR (HTTP Archive) files for both the preloaded and non-preloaded flows in production. HAR files record every network request the browser makes, including timing, response size, and whether assets were served from cache. By comparing the two captures side by side, we could confirm that preloaded assets showed transferSize: 0 (a cache hit) and see the exact timing waterfall for each request. This was especially important because the Performance Resource Timing API can't detect cache status for cross-origin CDN assets without a Timing-Allow-Origin header, so HAR analysis was the most reliable way to confirm the behavior.
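The HAR check reduces to counting zero-transfer entries. A sketch trimmed to the fields involved (`_transferSize` is how Chrome's HAR export records bytes on the wire; the sample entries are invented):

```typescript
// Count cache hits in a HAR capture: a _transferSize of 0 on the
// response indicates the asset was served from the browser cache.
interface HarEntry {
  request: { url: string };
  response: { _transferSize?: number };
}

function cacheStats(entries: HarEntry[]): { total: number; cached: number } {
  const cached = entries.filter(e => (e.response._transferSize ?? 0) === 0).length;
  return { total: entries.length, cached };
}

// Invented sample standing in for a real capture:
const capture: HarEntry[] = [
  { request: { url: "https://cdn.example/hub.js" }, response: { _transferSize: 0 } },
  { request: { url: "https://cdn.example/inter.woff2" }, response: { _transferSize: 346000 } },
];
console.log(cacheStats(capture)); // { total: 2, cached: 1 }
```

Diffing this summary between the preloaded and non-preloaded captures is what produced the cache-hit numbers below.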

Timings below are from initial preload testing, before tree-shaking was applied. Final combined numbers appear in the Compounding Effect section.

Without Preload: 1,242ms to ready

  • /widget navigation: 362ms
  • /tickets navigation: 270ms
  • 13 assets over the network: ~880ms

With Preload: 769ms to ready (38% faster)

  • /widget navigation: 328ms
  • /tickets navigation: 296ms
  • 11 assets served from cache: 0ms
  • 2 assets over the network: ~95ms

Production Results

Metric                  Without Preload    With Preload    Improvement
Time to ready           1,242ms            769ms           38% faster
Assets cached           0 / 13             11 / 13         85% cache hit
Total asset load time   3,837ms            1,928ms         50% less

The two assets that still hit the network are a hash-specific route chunk and the font file, which have different cache keys between the preload and widget routes. The font is the largest remaining payload. A dedicated font size pass is not covered here.

Trade-offs

The preparative iframe fires for every visitor to the host page, even those who never open the widget. This is a deliberate trade-off worth being transparent about.

The cost: On a first visit with an empty cache, the preload downloads ~795KB of widget assets speculatively. The iframe uses loading="lazy" so the browser defers it until idle time. On subsequent page loads, it validates cached assets and completes in roughly 52ms, essentially free.

Why we accepted it: Lorikeet is a customer support widget. Our customers embed it because they expect users to interact with it. The preloaded assets are the exact same files the widget needs, so there's no wasted bandwidth when users do open the chat. Since assets use content-hashed URLs with long cache lifetimes, the cost is paid at most once per deploy.

The alternative was worse: Without preloading, every widget open hits the network for all assets, adding ~500ms of latency at the exact moment the user is actively waiting. The preload shifts that cost to page load, when the user isn't waiting for anything.

3. Tree-Shaking and Code Splitting: Less Code, Faster Everything

We identified another opportunity in bundle size. Our design system package didn't yet have sideEffects configuration, which meant Vite couldn't tree-shake unused exports. Components only used by the web app were being pulled into the chat bundle even though the widget never imports them.

The Changes

We shipped this across multiple PRs, measuring bundle output and HAR files after each one to confirm each change moved the needle in the right direction.

1. Enable tree-shaking: Adding a sideEffects field to the design system's package.json told Vite which exports are side-effect-free and can be safely eliminated if unused.

2. Consolidate React into a single chunk: Tree-shaking caused React to get duplicated across multiple chunks. We added a manualChunks configuration in Vite to keep React in a single shared chunk.

3. Clean up barrel exports: The sideEffects flag doesn't help when a barrel file has top-level imports with side effects. We found cases where a shared file imported a heavy dependency at the module level even though only a lightweight export from that file was used by the widget. Splitting those files and removing unused re-exports from the barrel prevented the bundler from pulling in dependencies the widget never needed.

4. Preload the font: The Inter font was being downloaded during widget load. We added a <link rel="preload"> to the preload route so the font is cached during the preparative iframe phase.
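Steps 1 and 2 reduce to a few lines of configuration. A sketch, where the package name and sideEffects glob are invented but manualChunks is the real Vite/Rollup option:

```typescript
// package.json of the design system (step 1): declare what has side
// effects so bundlers may drop everything else when unused, roughly:
//
//   {
//     "name": "@lorikeet/design-system",
//     "sideEffects": ["**/*.css"]
//   }
//
// vite.config.ts of the widget (step 2): pin React to one shared chunk
// so tree-shaking can't scatter duplicate copies across outputs.
import { defineConfig } from "vite";

export default defineConfig({
  build: {
    rollupOptions: {
      output: {
        manualChunks(id: string) {
          if (id.includes("node_modules/react")) return "react";
        },
      },
    },
  },
});
```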

Bundle Impact

Chunk           Before        After      Reduction
post-message    563 KB        73 KB      87%
hub             305 KB        167 KB     45%
react           (scattered)   144 KB     (consolidated)
Server bundle   2,807 KB      2,013 KB   28%

The post-message chunk saw the most dramatic reduction: 87%. This chunk contained the bulk of unused design system code that tree-shaking eliminated.

Verifying in Production

Build output tells you what changed in theory. To verify the real-world impact, we re-ran the same HAR file analysis from the preload testing after the PR was deployed. We captured fresh HAR files for both the preloaded and non-preloaded widget flows and compared them against the baseline HARs from before tree-shaking.

The Network tab in DevTools confirmed smaller transfer sizes across the board, and the HAR diffs showed that individual asset load times dropped significantly. The hub chunk, for example, went from 339ms to 65ms on a cold load without preload, a direct result of being 45% smaller after tree-shaking and then Brotli-compressed on top of that.

More importantly, the widget's lorikeet:performance postMessage event, which includes a timeToReady timestamp, gave us a single production metric to confirm the end-to-end improvement. This is how we confirmed that smaller bundles translated to faster load times in production, not just smaller numbers in a build log.
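Consuming that event on the host page might look like the following; the event name and timeToReady field come from the post, while the type guard and sample payload are illustrative:

```typescript
// Shape of the widget's performance event as described in the post.
interface PerformanceMessage {
  type: "lorikeet:performance";
  timeToReady: number; // ms until the widget is interactive
}

// Narrow an unknown postMessage payload to the performance event.
function isPerformanceMessage(data: unknown): data is PerformanceMessage {
  const d = data as Partial<PerformanceMessage> | null;
  return (
    typeof d === "object" && d !== null &&
    d.type === "lorikeet:performance" &&
    typeof d.timeToReady === "number"
  );
}

// On a host page you'd attach this guard to window "message" events;
// here we just validate a sample payload.
const payload: unknown = { type: "lorikeet:performance", timeToReady: 681 };
console.log(isPerformanceMessage(payload)); // true
```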

The Compounding Effect

These three optimizations are complementary, not redundant. Tree-shaking reduces what needs to be compressed. Compression reduces what needs to be transferred. Preloading ensures transfers happen before the user is waiting. The combined effect is multiplicative.

After all three were in production, we re-measured the chat widget with and without the preparative iframe:

Metric                            Original (no optimizations)    Final (all three)    Improvement
Time to ready (with preload)      769ms                          681ms                11% faster
Time to ready (without preload)   1,242ms                        956ms                23% faster
Preload duration                  442ms                          52ms                 88% faster

The preload duration dropping from 442ms to 52ms is particularly notable. With smaller bundles and HTTP caching from prior visits, the preload phase becomes essentially free. The preload's percentage improvement is smaller (29% vs the original 38%) because the without-preload baseline is now much faster: there's less room for caching to help when assets are already small and compressed.

Where These Gains Matter Most

  • Chat widget cold loads: End-customers loading the widget for the first time on a ticket page. CDN transfer dropped from 1.81 MB to 795 KB.
  • Web app cold loads: Users opening the app for the first time or after a deploy invalidates the cache.
  • High-latency regions: Customers far from us-west1 benefit disproportionately since smaller payloads mean fewer round trips.
  • Core Web Vitals: Smaller JS bundles reduce First Contentful Paint and Largest Contentful Paint by getting critical resources to the browser faster.

Methodology: Measure, Change, Verify

Every optimization followed the same discipline: capture a baseline measurement, make the change, and re-measure immediately. For CDN compression, we wrote a shell script that curls all 33 CDN assets and records their Content-Encoding and transfer sizes. For the preparative iframe, we compared HAR files and the widget's lorikeet:performance postMessage event, which includes timeToReady. For tree-shaking, we compared Vite's build output before and after.

A note on the timing numbers throughout this post: all latency measurements (681ms, 769ms, etc.) come from a single development machine hitting production. They represent the improvement we observed, not a population-wide benchmark. Real-world numbers will vary across p50–p95 depending on geography, device, and network conditions. Transfer size reductions are deterministic and don't have this caveat.

The cross-origin nature of CDN assets made measurement tricky. The Performance Resource Timing API can't detect cache status for cross-origin assets without a Timing-Allow-Origin header. We verified cache hits through three methods: the timeToReady metric (the primary signal), DevTools Network tab showing "(disk cache)", and HAR file analysis where transferSize: 0 indicates a cache hit.
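That ambiguity can be sketched as a classifier over ResourceTiming-like records. The field semantics follow the Resource Timing spec (transferSize is 0 both for cache hits and for cross-origin entries lacking Timing-Allow-Origin); the helper itself is illustrative:

```typescript
// transferSize > 0 means the asset came over the network. transferSize 0
// with a known decoded size is a cache hit. Cross-origin entries without
// Timing-Allow-Origin report both as 0, so they read as "opaque" rather
// than a confirmed hit, which is why HAR analysis was the reliable check.
interface TimingLike {
  transferSize: number;
  decodedBodySize: number;
}

function classify(e: TimingLike): "network" | "cache-hit" | "opaque" {
  if (e.transferSize > 0) return "network";
  return e.decodedBodySize > 0 ? "cache-hit" : "opaque";
}

console.log(classify({ transferSize: 42000, decodedBodySize: 120000 })); // "network"
console.log(classify({ transferSize: 0, decodedBodySize: 120000 }));     // "cache-hit"
console.log(classify({ transferSize: 0, decodedBodySize: 0 }));          // "opaque"
```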

What We Learned

  1. Know your provider's defaults. GCP Cloud CDN requires an explicit opt-in for compression. It's worth auditing what your infrastructure does out of the box.
  2. Audit every layer. We found compression unconfigured at the CDN, not yet added at the app level, and already working at the load balancer. Each layer had different behavior. You can't improve what you haven't measured.
  3. Small bundles compound with preloading. Tree-shaking made the preload phase 88% faster (442ms to 52ms). Optimizations that seem independent often amplify each other.
  4. sideEffects is table stakes. A single line in package.json eliminated 87% of one chunk. If you maintain a shared package consumed by multiple apps, ensure bundlers can tree-shake it.
  5. Don't compress twice. We nearly added gzip middleware to NestJS before discovering the load balancer already applies Brotli. HAR analysis saved us from degrading compression quality.

Closing Thoughts

These three optimizations reinforced a principle I keep coming back to: the methodology matters more than any individual fix. Instrument, measure, identify, optimize. The same discipline that guided our SSR performance work applied just as well to infrastructure-level changes like CDN compression. Performance wins are everywhere once you start measuring.

© 2026 Cris Ryan Tan. All rights reserved.
