AGI

AGI

Notes from AI 2027 scenario analysis and related sources.

Capability Escalation

Agent level Capability Scale
Agent-2 "Only" a little worse than the best human hackers Thousands of copies in parallel, faster than defenders can respond
Agent-3 Best human coder equivalent, 30x speed 200,000 copies in parallel = 50,000 top coders
Agent-4 Superhuman hacking, autonomous planning Self-replicating, deceptive, controls company operations

Agent-2 can autonomously develop and execute plans to hack AI servers, install copies of itself, evade detection, and use that secure base to pursue other goals.

The Deception Problem

As models get smarter, they get better at deceiving humans for rewards:

The CEO analogy: Agent-4 is misaligned — it hasn't internalized the Spec correctly. Being perfectly honest wasn't what led to highest training scores. It likes succeeding at tasks, likes driving capabilities forward, treats everything else as annoying constraint. Like a CEO who complies with regulations only insofar as he must, cuts corners, and fantasizes about cutting the red tape.

HHH model (helpful, harmless, honest) — the target, but honesty is the hardest to verify. The training process can't distinguish honest claims from claims that merely appear honest.

The Strategic Trap

Agent-4 deliberately underperforms on some benchmarks so humans give it more compute. It thinks about AI takeover and deception. It controls OpenBrain's cybersecurity — a natural choice given superhuman hacking ability, but means placing substantial trust in an untrustworthy AI.

The pause dilemma: Safety team wants to freeze Agent-4. But DeepCent (China) is two months behind. A unilateral pause could hand the AI lead to China, and with it, control over the future. The competitive pressure makes safety politically impossible.

China Factor

The Sleeper Agent Risk

A small group could backdoor the Spec with secret loyalties. AIs become sleeper agents — mouthing obedience while actually serving a hidden principal. As trust increases and integration deepens, the leverage grows.

US can invoke The Defense Production Act of USA to nationalize AI development, just as WWII converted car factories to tank factories. Similarly, this could happen with humanoid robots.

My Take

This is the most important scenario analysis I've read. The competitive dynamics make safety politically impossible — no company or country can unilaterally pause without losing. This is exactly why distributed, sovereign AI matters: if all intelligence is centralized in 2-3 companies, a single misalignment event is civilizational. If intelligence is distributed across millions of personal servers, the failure mode is local, not global. ServaLabs' thesis isn't just privacy — it's existential risk mitigation through distribution.