The stochastic challenge
Traditional software is predictable: Input A plus function B always equals output C. This determinism allows engineers to develop robust tests. On the other hand, generative AI is stochastic and unpredictable. The exact same prompt often yields different results on Monday versus Tuesday, breaking the traditional unit testing that engineers know and love. To ship enterprise-ready AI, engineers cannot rely on mere “vibe checks” that pass today but fail when customers use the product. Product builders need to adopt a new infrastructure layer: The AI Evaluation Stack. This framework is informed by my extensive experience shipping AI products for Fortune 500 enterprise customers in high-stakes industries, where “hallucination” is not funny — it’s a huge complia [...]
Traditional software governance often uses static compliance checklists, quarterly audits and after-the-fact reviews. But this method can't keep up with AI systems that change in real time. A mac [...]
“When you get a demo and something works 90% of the time, that’s just the first nine.” — Andrej Karpathy
The “March of Nines” frames a common production reality: You can reach the first 90% [...]
Agents are the trendiest topic in AI today — and with good reason. Taking gen AI out of the protected sandbox of the chat interface and allowing it to act directly on the world represents a leap for [...]
Salesforce launched a suite of monitoring tools on Thursday designed to solve what has become one of the thorniest problems in corporate artificial intelligence: Once companies deploy AI agents to han [...]
AI vibe coders have yet another reason to thank Andrej Karpathy, the coiner of the term. The former Director of AI at Tesla and co-founder of OpenAI, now running his own independent AI project, recent [...]
Unrelenting, persistent attacks on frontier models make them fail, with the patterns of failure varying by model and developer. Red teaming shows that it’s not the sophisticated, complex attacks tha [...]