Our LLM API bill was growing 30% month-over-month. Traffic was increasing, but not that fast. When I analyzed our query logs, I found the real problem: Users ask the same questions in different ways."What's your return policy?," "How do I return something?", and "Can I get a refund?" were all hitting our LLM separately, generating nearly identical responses, each incurring full API costs.Exact-match caching, the obvious first solution, captured only 18% of these redundant calls. The same semantic question, phrased differently, bypassed the cache entirely.So, I implemented semantic caching based on what queries mean, not how they're worded. After implementing it, our cache hit rate increased to 67%, reducing LLM API costs by 73%. But getting there req [...]
Semantic intelligence is a critical element of actually understanding what data means and how it can be used.Microsoft is now deeply integrating semantics and ontologies into its Fabric data platfor [...]
Agents are the trendiest topic in AI today — and with good reason. Taking gen AI out of the protected sandbox of the chat interface and allowing it to act directly on the world represents a leap for [...]
By now, many enterprises have deployed some form of RAG. The promise is seductive: index your PDFs, connect an LLM and instantly democratize your corporate knowledge.But for industries dependent on he [...]
AI vibe coders have yet another reason to thank Andrej Karpathy, the coiner of the term. The former Director of AI at Tesla and co-founder of OpenAI, now running his own independent AI project, recent [...]
This weekend, Andrej Karpathy, the former director of AI at Tesla and a founding member of OpenAI, decided he wanted to read a book. But he did not want to read it alone. He wanted to read it accompan [...]
A new open-source framework called PageIndex solves one of the old problems of retrieval-augmented generation (RAG): handling very long documents.The classic RAG workflow (chunk documents, calculate e [...]
Unrelenting, persistent attacks on frontier models make them fail, with the patterns of failure varying by model and developer. Red teaming shows that it’s not the sophisticated, complex attacks tha [...]