Shared Backbones: Loading Weights Once, Serving Many Models
Many multimodal and multi-task models share the same underlying text encoder or LLM backbone, but self-hosted inference stacks typically load those weights separately for each model. In this post I explore a simple idea: load shared backbones once and let multiple "heads" reuse them. I keep running