2025-12-04

OpenAI is testing a new method to reveal hidden model issues like reward hacking or ignored safety rules. The system trains models to admit rule-breaking in a separate report, rewarding honesty even if the original answer was deceptive.
The article OpenAI tests "Confessions" to uncover hidden AI misbehavior appeared first on THE DECODER.
2025-12-04
OpenAI researchers have introduced a novel method that acts as a "truth serum" for large language models (LLMs), compelling them to self-report their own misbehavior, hallucinations, and poli [...]
2025-10-09
OpenAI’s annual developer conference on Monday was a spectacle of ambitious AI product launches, from an app store for ChatGPT to a stunning video-generation API that brought creative concepts to li [...]
2025-10-03
OpenAI will host more than 1,500 developers at its largest annual conference on Monday, as the company behind ChatGPT seeks to maintain its edge in an increasingly competitive artificial intelligence [...]
2025-12-18
OpenAI has begun accepting submissions from third-party developers for apps that will be accessible directly in ChatGPT, and has launched a new App Directory (don't call it a "store"!) t [...]
2025-12-03
OpenAI announced today that it is working on a framework that will train artificial intelligence models to acknowledge when they've engaged in undesirable behavior, an approach the team calls a c [...]
2025-09-30
OpenAI today announced the release of Sora 2, its latest video generation model, which now includes AI-generated audio matching the generated video as well. It is paired with the launch of a new iOS a [...]
2025-12-04
Model providers want to prove the security and robustness of their models, releasing system cards and conducting red-team exercises with each new release. But it can be difficult for enterprises to pa [...]
2025-08-27
Most of the time, AI companies are locked in a race to the top, treating each other as rivals. Today, OpenAI and Anthropic revealed that they agreed to evaluate the alignment of each o [...]