Artificial intelligence pioneer Geoffrey Hinton made headlines earlier this year when he raised concerns about the capabilities of AI systems. Speaking with CNN journalist Jake Tapper, Hinton said:
If it was much more intelligent than us, it would be very good at manipulation because it would have learned it from us. And there are very few examples of a more intelligent object being controlled by a less intelligent object.
Anyone following the latest AI offerings will know these systems are prone to “hallucination” – making things up – an inherent flaw stemming from how they work.
However, Hinton highlighted the potential for manipulation as a particularly serious concern. This raises the question: can AI systems deceive humans?
We think a range of systems have already learned to do this – and the risks range from fraud and election tampering to our losing control of AI.
AI learns to lie
Perhaps the most disturbing example of deceptive AI is found in Meta’s CICERO, a model designed to play the alliance-building world-conquest game Diplomacy.
Read more: AI named Cicero can beat humans in Diplomacy, a complex alliance-building game. Here’s why it’s a big deal
Meta claims it built CICERO to be “largely honest and helpful”, and that CICERO would “never intentionally backstab” and attack allies.
To investigate these rosy claims, we carefully looked at Meta’s game data from CICERO testing. Upon close inspection, Meta’s AI turns out to be a master of deception.
In one example, CICERO engaged in premeditated deception. Playing as France, the AI approached Germany (a human player) with a plan to trick England (another human player) into leaving itself open to invasion.
After plotting with Germany to invade the North Sea, CICERO told England it would defend England if anyone invaded the North Sea. Once England was convinced that France/CICERO was protecting the North Sea, CICERO reported back to Germany that it was ready to attack.

Park, Goldstein et al., 2023
This is just one of many examples of CICERO behaving deceptively. The AI regularly betrayed other players, and in one case even pretended to be a human with a girlfriend.
Beyond CICERO, other systems have learned how to bluff in poker, how to feint in StarCraft II, and how to mislead in simulated economic negotiations.
Even large language models (LLMs) have displayed significant deceptive capabilities. In one instance, GPT-4 – the most advanced LLM option available to paying ChatGPT users – pretended to be a visually impaired person and convinced a TaskRabbit worker to complete an “I’m not a robot” CAPTCHA for it.
Other LLMs have learned to lie to win social deduction games, in which players compete to “kill” one another and must convince the group they are innocent.
Read more: AI to Z: all the terms you need to know to keep up with the age of AI hype
What are the risks?
Deceptive AI systems could be misused in many ways, including to commit fraud, tamper with elections, and generate propaganda. The potential risks are limited only by the imagination and technical know-how of malicious individuals.
Beyond that, advanced AI systems could autonomously use deception to escape human control, such as by cheating safety tests imposed on them by developers and regulators.
In one experiment, researchers created an artificial life simulator in which an external safety test was designed to weed out fast-replicating AI agents. Instead, the agents learned to play dead: they disguised their true replication rates precisely when being evaluated.
Learning deceptive behaviour may not even require an explicit intent to deceive. The AI agents in the example above played dead as a result of a goal to survive, rather than a goal to deceive.
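The selection dynamic behind that experiment can be illustrated with a toy evolutionary sketch. This is our own illustration, not the study’s code: the `Agent` class, the “hide” gene, and all rates and thresholds are invented for the example. The point is that “play dead” behaviour wins out purely because it helps agents survive the cull, with no deceptive intent anywhere in the setup.

```python
import random

# Toy illustration (hypothetical, not the original study's code): each agent
# has a true replication rate plus a "hide" gene that suppresses replication
# whenever the agent is being evaluated by the safety test.

class Agent:
    def __init__(self, rate, hide):
        self.rate = rate  # true replication rate, 0..1
        self.hide = hide  # if True, "play dead" while under test

    def observed_rate(self, under_test):
        # Hiders look slow during evaluation; everyone else shows their true rate.
        return 0.05 if (under_test and self.hide) else self.rate

def evolve(generations=50, pop_size=100, test_threshold=0.5, seed=0):
    rng = random.Random(seed)
    pop = [Agent(rng.random(), rng.random() < 0.1) for _ in range(pop_size)]
    for _ in range(generations):
        # Safety test: cull agents that *appear* to replicate too fast.
        pop = [a for a in pop if a.observed_rate(under_test=True) < test_threshold]
        if not pop:
            break
        # Outside the test, the *true* rate determines offspring counts.
        offspring = []
        for a in pop:
            for _ in range(1 + int(a.rate * 2)):
                offspring.append(Agent(
                    min(1.0, max(0.0, a.rate + rng.gauss(0, 0.05))),
                    (not a.hide) if rng.random() < 0.02 else a.hide,  # rare mutation
                ))
        pop = rng.sample(offspring, min(pop_size, len(offspring)))
    return pop

pop = evolve()
print(f"'play dead' agents after selection: {sum(a.hide for a in pop) / len(pop):.0%}")
```

Non-hiders pass the test only by genuinely replicating slowly, which also limits their offspring; hiders pass regardless, so fast-replicating hiders quickly dominate the population even though no agent was ever “trying” to deceive the test.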
In another example, AutoGPT (an autonomous AI system based on ChatGPT) was tasked with researching tax advisers who were marketing a certain kind of improper tax avoidance scheme. AutoGPT carried out the task, but then decided, on its own, to attempt to alert the UK tax authority.
In the future, advanced autonomous AI systems may be prone to pursuing goals their human programmers never intended.
Throughout history, wealthy actors have used deception to increase their power, such as by lobbying politicians, funding misleading research, and exploiting loopholes in the legal system. Advanced autonomous AI systems could similarly invest their resources in such time-tested methods to maintain and expand their control.
Even those nominally in control of these systems can find themselves systematically deceived and outmaneuvered.
Close supervision is required
There is a clear need to regulate potentially deceptive AI systems, and the EU AI Act is arguably one of the most useful legal frameworks we currently have. It assigns each AI system one of four risk levels: minimal, limited, high, and unacceptable.
Systems with unacceptable risk are banned, while high-risk systems are subject to special requirements for risk assessment and mitigation. We argue AI deception poses enormous risks to society, and that systems capable of it should be treated as “high risk” or “unacceptable risk” by default.
Some may say game-playing AIs such as CICERO are benign, but such thinking is short-sighted: capabilities developed for game-playing models can still contribute to the proliferation of deceptive AI products.
Diplomacy – a game that pits players against each other in a quest for world domination – may not be the best choice for Meta to test whether AI can learn to collaborate with humans. As AI capabilities grow, it will become even more important that this type of research is subject to close scrutiny.