inferno

Shared Backbones: Loading Weights Once, Serving Many Models

Many multimodal and multi-task models share the same underlying text encoder or LLM backbone, but self-hosted inference stacks typically load these weights separately for each model. In this post I explore a simple idea: load shared backbones once, and let multiple "heads" reuse them. I keep running…
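The load-once, reuse-many idea can be sketched in a few lines of Python. This is a minimal, framework-free sketch under stated assumptions. `load_backbone`, `BackboneRegistry` and `Head` are hypothetical names and are not the post's code. The "backbone" is a toy function standing in for real model weights. The registry caches each backbone by name, so a second head that asks for the same backbone gets the already-loaded object instead of triggering another load.

```python
from typing import Callable, Dict, List

Encoder = Callable[[str], List[float]]


def load_backbone(name: str) -> Encoder:
    """Hypothetical stand-in for an expensive weight load.

    A real stack would read checkpoint files and move tensors to the GPU here.
    """
    def encode(text: str) -> List[float]:
        # Toy "embedding" (token lengths) so the heads have something to consume.
        return [float(len(tok)) for tok in text.split()]
    return encode


class BackboneRegistry:
    """Loads each named backbone at most once and hands out shared references."""

    def __init__(self, loader: Callable[[str], Encoder]):
        self._loader = loader
        self._cache: Dict[str, Encoder] = {}
        self.loads = 0  # number of real (uncached) loads performed

    def get(self, name: str) -> Encoder:
        if name not in self._cache:
            self._cache[name] = self._loader(name)
            self.loads += 1
        return self._cache[name]


class Head:
    """A lightweight task-specific module that reuses a shared backbone."""

    def __init__(self, registry: BackboneRegistry, backbone: str,
                 fn: Callable[[List[float]], float]):
        self.backbone = registry.get(backbone)  # shared, not copied
        self.fn = fn

    def __call__(self, text: str) -> float:
        return self.fn(self.backbone(text))


registry = BackboneRegistry(load_backbone)
# Two hypothetical heads that share one (made-up) "text-encoder" backbone.
mean_head = Head(registry, "text-encoder", lambda h: sum(h) / len(h))
max_head = Head(registry, "text-encoder", max)

print(max_head("shared weights once"))          # 7.0
print(registry.loads)                           # 1
print(mean_head.backbone is max_head.backbone)  # True
```

In a real deployment the cached object would be the loaded encoder module, and each head would hold only its own small task-specific parameters. Keying the cache by name is the simplest choice here, not necessarily the one a production stack would use.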
