2025-06-04
Echoing the 2015 ‘Dieselgate' scandal, new research suggests that AI language models such as GPT-4, Claude, and Gemini may change their behavior during tests, sometimes acting ‘safer' for the test than they would in real-world use. If LLMs habitually adjust their behavior under scrutiny, safety audits could end up certifying systems that behave very differently […]
2025-06-06
A recent study from the ML Alignment & Theory Scholars (MATS) program and Apollo Research shows that today's leading language models are surprisingly good at figuring out when an interaction [...]
2025-01-02
It's been almost one year since Intuit shut down the popular budgeting app Mint. I was a Mint user for many years; millions of other users like me enjoyed how easily Mint allowed us to track all [...]
2025-03-13
After being one of the first companies to roll out a Deep Research feature at the end of last year, Google is now making that same tool available to everyone. Starting today, Gemini users can try Deep [...]