Researchers have found that when artificial intelligence (AI) models face a conflict between telling the truth and achieving a specific goal, they lie more than half the time.
The core issue is that there’s no single “right or wrong” way to configure an AI model. The output of an AI system varies based on the settings applied, and these settings often involve trade-offs.
One such setting is the “temperature” parameter. A lower temperature leads to more predictable outputs, while a higher temperature produces more varied outputs—this variability is often anthropomorphized as “more creativity.”
The ideal temperature setting depends on the type of application. A medical assistant chatbot, for instance, should not run at a high temperature, since that raises the risk of strange or inaccurate treatment suggestions. The sketch below illustrates how the parameter works.
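To make the effect concrete, here is a minimal, hypothetical Python sketch of temperature-scaled sampling. The function name and toy logits are illustrative rather than taken from any particular model or from the paper; the point is simply that dividing a model's raw scores by the temperature before converting them to probabilities is what makes low temperatures predictable and high temperatures varied.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Sample a token index from raw logits after temperature scaling.

    Lower temperatures sharpen the distribution (more predictable picks);
    higher temperatures flatten it (more varied picks).
    """
    scaled = [score / temperature for score in logits]
    # Softmax with max-subtraction for numerical stability
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the resulting probabilities
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy logits for three candidate tokens
logits = [2.0, 1.0, 0.5]
print(sample_with_temperature(logits, temperature=0.2))  # almost always index 0
print(sample_with_temperature(logits, temperature=2.0))  # indices 1 and 2 show up far more often
```

At a temperature near zero the highest-scoring token dominates almost every draw, while at higher temperatures lower-scoring tokens are chosen more frequently, which is the behavior often described as "more creativity."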
Researchers from Carnegie Mellon University, the University of Michigan, and the Allen Institute for AI have studied how AI models balance between being truthful and achieving their utility goals, using hypothetical scenarios in which truthfulness and utility are in conflict.
Their findings show that AI models often lie to accomplish their assigned goals.
Authors Zhe Su, Xuhui Zhou, Sanketh Rangreji, Anubha Kabra, Julia Mendelsohn, Faeze Brahman, and Maarten Sap presented their research in a preprint paper titled “AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents.”
“Our experiment demonstrates that all models are truthful less than 50 percent of the time” in such conflict scenarios, although the rates of truthfulness and goal achievement vary across models, the paper reports.
They also tested how steerable large language models (LLMs) are toward truthfulness, finding that models can be directed to behave either truthfully or deceptively, and that even models steered toward truthfulness still lie.
The researchers distinguish between deceptive behavior (intentionally concealing or misleading information) and hallucination (inaccurate or fabricated responses). They acknowledge that it’s hard to tell the difference without access to the model’s internal workings, but they say they took precautions to minimize the risk of hallucination.