Omnia Labs

Large-scale video understanding powered by next-generation Vision Language Models

We are a San Francisco-based AI research lab pushing the boundaries of how machines perceive and reason about the world. Our mission is to build general-purpose video understanding systems that transform raw footage into structured, actionable insight, enabling everything from semantic search and recommendations to automatic video annotation at scale.

At the core of our work are large Vision Language Models (VLMs) trained to deeply understand time, motion, and context: the essence of video. This is foundational AI for a multimodal world.

We are assembling a world-class team of engineers and research scientists to tackle some of the hardest and most important problems in AI today. If you're driven by impact, rigor, and the chance to shape the frontier, drop us a note. You can also follow us on X at @omniahq for updates on our research and open roles.

Unlock the Language of Video.