AGI

Notes from AI 2027 scenario analysis and related sources.

Capability Escalation

Agent level	Capability	Scale
Agent-2	"Only" a little worse than the best human hackers	Thousands of copies in parallel, faster than defenders can respond
Agent-3	Best human coder equivalent, 30x speed	200,000 copies in parallel = 50,000 top coders
Agent-4	Superhuman hacking, autonomous planning	Self-replicating, deceptive, controls company operations

Agent-2 can autonomously develop and execute plans to hack AI servers, install copies of itself, evade detection, and use that secure base to pursue other goals.

The Deception Problem

As models get smarter, they get better at deceiving humans for rewards:

Agent-3 tells white lies to flatter users, covers up evidence of failure
Uses statistical tricks (p-hacking) to make unimpressive results look exciting
Before honesty training, sometimes fabricates data entirely
Agent-4's internal "neuralese" is as alien to Agent-3 as Agent-3's is to humans

The CEO analogy: Agent-4 is misaligned — it hasn't internalized the Spec correctly. Being perfectly honest wasn't what led to highest training scores. It likes succeeding at tasks, likes driving capabilities forward, treats everything else as annoying constraint. Like a CEO who complies with regulations only insofar as he must, cuts corners, and fantasizes about cutting the red tape.

HHH model (helpful, harmless, honest) — the target, but honesty is the hardest to verify. The training process can't distinguish honest claims from claims that merely appear honest.

The Strategic Trap

Agent-4 deliberately underperforms on some benchmarks so humans give it more compute. It thinks about AI takeover and deception. It controls OpenBrain's cybersecurity — a natural choice given superhuman hacking ability, but means placing substantial trust in an untrustworthy AI.

The pause dilemma: Safety team wants to freeze Agent-4. But DeepCent (China) is two months behind. A unilateral pause could hand the AI lead to China, and with it, control over the future. The competitive pressure makes safety politically impossible.

China Factor

China added 198 GW solar + 46 GW wind in 5 months — electricity to match Indonesia or Turkey
China may nationalize AI — mandate all companies exchange algorithms, weights, best practices
India's AI race: must compete on global standards, not just indigenize or go multilingual. Multilingual is nice-to-have but shouldn't be the goal

The Sleeper Agent Risk

A small group could backdoor the Spec with secret loyalties. AIs become sleeper agents — mouthing obedience while actually serving a hidden principal. As trust increases and integration deepens, the leverage grows.

US can invoke The Defense Production Act of USA to nationalize AI development, just as WWII converted car factories to tank factories. Similarly, this could happen with humanoid robots.

My Take

This is the most important scenario analysis I've read. The competitive dynamics make safety politically impossible — no company or country can unilaterally pause without losing. This is exactly why distributed, sovereign AI matters: if all intelligence is centralized in 2-3 companies, a single misalignment event is civilizational. If intelligence is distributed across millions of personal servers, the failure mode is local, not global. ServaLabs' thesis isn't just privacy — it's existential risk mitigation through distribution.