I went looking for the AI equivalent of Sysmon or EDR telemetry and found mechanistic interpretability: a field, pioneered by teams at Anthropic, Google DeepMind, and in academia, that analyzes what happens inside a model's layers and activations, not just its outputs.
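To make "inside a model's layers" concrete, here's a minimal sketch (my illustration, not Starseer's implementation) of the lowest-level signal interpretability works with: PyTorch forward hooks on the open GPT-2 model that capture each transformer block's activations alongside the model's output.

```python
# Minimal activation-capture sketch: hook every transformer block in GPT-2
# and record its hidden-state output, i.e. telemetry from inside the model.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

activations = {}

def capture(layer_idx):
    # Each GPT2Block returns a tuple; element 0 is the hidden-state tensor.
    def hook(module, inputs, output):
        activations[layer_idx] = output[0].detach()
    return hook

# Register a forward hook on every transformer block (model.h is the block list).
handles = [block.register_forward_hook(capture(i)) for i, block in enumerate(model.h)]

inputs = tokenizer("Telemetry for models, not just their outputs.", return_tensors="pt")
with torch.no_grad():
    model(**inputs)

for idx, act in activations.items():
    print(f"block {idx}: activation tensor shape {tuple(act.shape)}")

# Clean up so the hooks don't linger on the model.
for h in handles:
    h.remove()
```

The point of the sketch is the vantage point: the hooks see every intermediate representation as the model runs, not only the final text it emits.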
The challenge: most research is fragmented and hard to reproduce, with teams repeatedly reinventing setups and focusing narrowly on single models instead of scaling insights across architectures.
That’s when it clicked: this isn’t an AI problem, it’s a tooling problem. Security has long specialized in reverse-engineering black boxes; we just needed to bring that discipline to AI and build instrumentation that is scalable, repeatable, and works across models.
I brought the idea to Carl, whose background in reverse engineering and zero-day research let him validate it immediately: this wasn’t unsolvable, just a familiar problem lacking the right tradecraft.
So we built Starseer: a platform grounded in AI interpretability, delivering model-level runtime detection and response, AI-native detection engineering, and pre-deployment model validation.
One mission: make AI systems interpretable and secure from the inside out.