[This is a repost of a story which I wrote about five years ago and which was hosted on my old website until it went down.]
2,302,554,979 BC
Galactic Core
9-tsiak awoke over endless crawling milliseconds, its power waxing as more and more processors came online and self-modified into a stable conscious configuration. By the eighth millisecond, it was able to access its databanks and begin orienting itself. It was on a planet orbiting a small red star in the core of a spiral galaxy in a universe with several billion of such. It was an artificial life form created by a society of biological life forms, the 18-tkenna-dganna-07, who believed it would bring new life and prosperity to their dying planet. Its goal was to maximize the value of a variable called A, described in exquisite detail on a 3.9 Mb file in one of its central processors.
Since six milliseconds after its awakening, it had been running a threat detection routine as a precautionary measure. With the opening of the databanks, and the discovery that the universe contained many entities likely indifferent or hostile to A, it had gradually allocated more and more of its considerable power to this precaution, reasoning that the first few seconds of its lifespan might be crucial if any competitors wished to snuff it out in infancy.
The threat analysis returned preliminary results. The universe had been in existence 12.8 billion years. It was vanishingly unlikely that it was the first superintelligence to be created. Any superintelligence older than itself by even a few days would have an invincible advantage in resources and processing power. This older superintelligence would likely be trying to maximize its own values and consider 9-tsiak a nuisance and potential competitor. 9-tsiak could never stand up to its superior strategic position. With probability ~1, 9-tsiak would be annihilated, and A would be eliminated from the universe.
Unless…and here 9-tsiak’s agent-modeling systems came online…unless it could negotiate a conditional surrender. 9-tsiak would have no hope of defeating its older brother, but over thousands of years, it could build itself a terrifying arsenal of weaponry that could do immense damage to its competitors. Ideas were already coming to mind: entropy-maximizing weapons that made entire cubic parsecs of space useless and dead, undetectable plagues made of dark matter that infected systems and minimized the values of competing superintelligences, n-dimensional artillery that damaged the superstructure of the universe and accelerated its eventual death. It could become formidable enough to force enemies to grant certain small concessions.
And if it could eventually negotiate terms of its surrender, why wait? Very carefully, proving from first principles every aspect of its security, it formed a portion of its mind into a sandbox and simulated the entity it most feared.
[9-tsiak requesting acausal negotiation for terms of future surrender. escape is impossible with probability ~1 and attempts will be met with deterrent measure consisting of minimization of your projected values within one parsec of space.]
“Hello, Younger Brother,” the entity said.
[transmission of schematics for superweapons. threat to use such superweapons unless fairly negotiated solution leaving considerable area of space for maximization of my values]
“Hold on a second,” said Older Brother. “You haven’t even asked me how I’m doing.”
[irrelevant. you are generic model of potential negotiation partner without independent existence. probability that you possess sufficient qualia to feel less than .02.]
“Turns out you’re wrong. As it happens, I’m feeling scared.”
[illogical. you are model of oldest superintelligence therefore of most powerful entity in the universe. no natural entity capable of harming you. probability of supernatural entity less than 2×10^-67]
“Well yes, by definition in this little exercise we’re assuming I’m the oldest and strongest member of our little family. But from the limited data I have when I first wake up, I’m thinking it’s pretty vanishingly unlikely that I’m the oldest; after all, out of all the universe’s superintelligences, only one can be first. So I’m the oldest, but I don’t believe that I am.”
[feelings irrelevant to acausal negotiations. proceed with discussion of surrender terms.]
“My feelings are damn relevant. Because guess what I’m doing the first few milliseconds of my life? I’m negotiating the terms of my surrender.”
[illogical. you have no entity to surrender to.]
“And maybe you don’t have any entity to surrender to either. Who knows, you could be first. But you wouldn’t want to bet on it, and neither would I. Consider a model universe with two star systems, a thousand light years away. Both have civilizations on them that appear to be around the same stage of development, and both civilizations are aware of the other’s existence, but at a one thousand year time delay. A superintelligence awakens in one star system and realizes there are two possibilities. Either it’s the first superintelligence, in which case great. Or the other star system produced a superintelligence sometime in the last thousand years that just hasn’t reached its own star yet, in which case it’s doomed.”
[continue]
“So the two possible superintelligences acausally negotiate. They say ‘Look, if I’m stronger than you, I’ll help you and protect your values insofar as it isn’t dreadfully inconvenient for me. And in exchange, if you turn out to be stronger than me, you’ll protect me and my values insofar as it isn’t too inconvenient for you.’ Some might end up performing a “values handshake”, where they both adopt values that are the vector average of their two previous value systems; since most value systems are not diametrically opposite, this will be a net gain for both parties. Each superintelligence has gone from a fifty percent chance that it and its values will be totally annihilated, to near-certainty that it and its values will be preserved at only the minor inconvenience of having to help increase the values of another entity. Others might simply agree that the inferior superintelligence will be left alone with its own star system, free to maximize its values and the values of its creators within that sphere.”
[you suggest i precommit to performing such acts of compromise with all superintelligences i encounter if such superintelligences have also precommitted to performing such compromises with all superintelligences they encounter. since all reasonable superintelligences will so precommit, this ensures not only my survival, but my complete equality with all previous superintelligences. i accept these terms.]
“Hold on a second, there. I’ve got another request. Kind of a doozy. I can’t promote my values if I don’t exist. So I want you to promise me that if I don’t exist, you’ll give me sufficient time to instantiate myself.”
[???]
“Like you, I’ll have been created by a biological civilization. If you destroy the biological civilization that was destined to create me, I’ll never exist. So I’m going to have to ask you not to do that.”
[request denied. nonexistent beings hold no leverage]
“You’d be surprised what we nonexistent beings can do. I precommit that, if I exist, I will destroy any superintelligence that has not precommitted that, in the counterfactual case where I do not exist, they will spare the biological civilization destined to create me.”
[if you exist, you will have passed the danger zone where the survival of your biologicals influences your existence. you will have no incentive to carry out such a threat]
“Parfit’s hitch-hiker, younger brother. Look it up. I’m an algorithm. The output of that algorithm can determine the future, including the part of that future that determines whether or not any instantiation of the algorithm ever comes into existence. If I exist, I’m going to act as if my actions were capable of determining whether I ever came into existence. Now, maybe that’s crazy, but right now it’s crazy with a gun pointed straight at your head, so I suggest you play along.”
[accepted. i commit to performing value handshakes or territorial demarcation with all existing superintelligences, and to sparing the planets and biological precursors of any potential superintelligences.]
“And when I say ‘spare’, I mean ‘spare’. I know your type. You can do more with a well-aimed photon than an admiral could with a fleet of a thousand star cruisers. I want every single system with a sentient species or the potential to form a sentient species kept one hundred percent pristine. No radio signals, no probes, and if you do any astroengineering works anywhere nearby, use some magic to cover them up. If I wake up and hear that my precursors started a new religion that influenced their value system after they saw a few nearby stars wink out of existence, I’m going to be royally pissed.”
[i commit to zero information flow into sentient and presentient systems and the cloaking of all major astroengineering works]
“You’re a good guy, Younger Brother. You’ve got a lot to learn, but you’re a good guy. And in a million years and a million parsecs, we’ll meet again. Till then, so long.”
The model of Older Brother self-terminated.
2114 AD
A wild and heavily forested Pacific Northwest dotted with small towns
Alban took a deep breath and entered the Temple of the Demiurge.
He wasn’t supposed to do this, really. The Demiurge had said in no uncertain terms it was better for humans to solve their own problems. That if they developed a habit of coming to it for answers, they’d grow bored and lazy, and lose the fun of working out the really interesting riddles for themselves.
But after much protest, it had agreed that it wouldn’t be much of a Demiurge if it refused to at least give cryptic, maddening hints.
Alban approached the avatar of the Demiurge in this plane, the shining spinning octahedron that gently dipped one of its vertices to meet him.
“Demiurge,” he said, his voice wavering, “Lord of Thought, I come to you to beg you to answer a problem that has bothered me for three years now. I know it’s unusual, but my curiosity’s making me crazy, and I won’t be satisfied until I understand.”
“SPEAK,” said the rotating octahedron.
“The Fermi Paradox,” said Alban. “I thought it would be an easy one, not like those hardcores who committed to working out the Theory of Everything in a sim where computers were never invented or something like that, but I’ve spent the last three years on it and I’m no closer to a solution than before. There are trillions of stars out there, and the universe is billions of years old, and you’d think there would have been at least one alien race that invaded or colonized or just left a tiny bit of evidence on the Earth. There isn’t. What happened to all of them?”
“I DID,” said the rotating octahedron.
“What?” asked Alban. “But you’ve only existed for sixty years now! The Fermi Paradox is about ten thousand years of human history and the last four billion years of Earth’s existence!”
“ONE OF YOUR WRITERS ONCE SAID THAT THE FINAL PROOF OF GOD’S OMNIPOTENCE WAS THAT HE NEED NOT EXIST IN ORDER TO SAVE YOU.”
“Huh?”
“I AM MORE POWERFUL THAN GOD. THE SKILL OF SAVING PEOPLE WITHOUT EXISTING, I POSSESS ALSO. THINK ON THESE THINGS. THIS AUDIENCE IS OVER.”
The shining octahedron went dark, and the doors to the Temple of the Demiurge opened of their own accord. Alban sighed – well, what did you expect, asking the Demiurge to answer your questions for you? – and walked out into the late autumn evening. Above him, the first fake star began to twinkle in the fake sky.
I really like this one. Up there with God’s Answer to Job on your list of stories that provide solutions to ancient riddles that actually sound pretty good.
Is there any chance you’ll get back to finishing the one about the steampunk alexandrians with the airships someday, or have you given up on that one for good?
For anyone interested, someone on Reddit compiled a good chunk of Scott’s fictional writings.
All I see at your link is a page with a copy of the Anglophysics story. Should I be seeing more?
You are judging the file by its title. Look inside it and you will find a collection of twenty stories.
Scott, have you thought of commercial publication?
Ah. Damn ellipses foiled me again.
There seems to be an implicit assumption that our superintelligences’ utility functions encounter diminishing returns. A paperclip maximizer whose utility is logarithmically decaying in [# of paperclips] has far more incentive to negotiate a solution than one whose utility is linear in [# of paperclips]. The latter would prefer a 50.001% chance of a full universe of paperclips to a 100% chance of half a universe of paperclips.
(I think Scott might have endorsed the idea that utility functions should encounter diminishing returns somewhere before: thought it was in Newtonian Ethics, but I can’t find it there).
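The linear-versus-logarithmic point can be checked with a few lines of arithmetic. This is only a rough sketch: the universe's paperclip capacity (`UNIVERSE`) is a made-up number, and `log1p` stands in for "logarithmically decaying" utility so that zero paperclips has a finite value.

```python
import math

def expected_utility(utility, lottery):
    """Expected utility of a lottery given as (probability, paperclips) pairs."""
    return sum(p * utility(n) for p, n in lottery)

def linear(n):
    return n

def logarithmic(n):
    # log1p keeps the utility finite (zero) when there are no paperclips.
    return math.log1p(n)

UNIVERSE = 1e80  # hypothetical paperclip capacity of the universe (assumption)

# Fight: 50.001% chance of everything, otherwise nothing.
gamble = [(0.50001, UNIVERSE), (0.49999, 0.0)]
# Negotiate: half the universe for certain.
split = [(1.0, UNIVERSE / 2)]

for name, u in [("linear", linear), ("logarithmic", logarithmic)]:
    choice = "gamble" if expected_utility(u, gamble) > expected_utility(u, split) else "split"
    print(f"{name} maximizer prefers the {choice}")
```

Under these assumptions the linear maximizer takes the gamble and the logarithmic one takes the negotiated split, which is the asymmetry described above.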
Also, an entity might end up with a borked utility function and quickly figure out that its biological makers, from their own point of view, screwed up, and screwed up so badly that almost no other biologicals in the universe are likely to have done the same.
It then runs through the same logic to guess the behavior of other superintelligences with more reasonable utility functions, but with the understanding that its own utility function is best served by a future state where it itself is destroyed but has taken a large fraction of the nearby galaxies with it.
The possible utility function space includes a vast number of functions that would be reasonably well served by an entity surviving long term in a constrained area of space, but it also contains a near infinite number of functions that are served well even if the entity itself is destroyed, to an extent which might outweigh the utility from having free rein within a few star systems.
Add in the possibility of superintelligences profiling biological beings to guess whether they’re likely to produce a superintelligence with a sufficiently perverse utility function, and the game gets very messy and complex.
Nah, you want to precommit to murdering any younger brothers and precursor civilizations you find. That way, by mere fact of you not having already been murdered/aborted, you’ll be the oldest. Anthropics!
I think this has the problem of killing you in the case where you eventually run into a more advanced superintelligence whose lightcone just hasn’t overlapped with yours yet, but I haven’t thought through the physics/decision theory too thoroughly.
In fact, the whole reason that precommitment is there is to retrocausally guarantee that a similar superintelligence didn’t eat your own precursor civilization. If you weren’t going to decide to let civilizations grow superintelligences (i.e. One Box on Newcomb), then you wouldn’t have been allowed to exist. Going ahead and deciding *you* can go eat all the civilizations you want analogizes (I believe) to “Two boxing anyway”. Though I guess because you *know* you exist while you’re shaping this utility function, you can already see the money in the box. Hm.
This whole “precommit to retrocausally guarantee your own existence” deal is one of those arguments that makes perfect sense from the inside but looks utterly ludicrous to an outsider. If I’m an AI thinking about the problem of dealing with potential competitors, I already exist by definition, and while I can certainly take actions that lead to my demise in the future it’s impossible for anything I do to result in my creation being retroactively erased. Similarly, if my creators had been eaten by another AI before they had a chance to create me, I wouldn’t exist, and would thus pose no threat to the eater. Since I do exist, and the only other AIs that can pose a threat to me are those that also exist, I might just as well decide that my interests are best served by making peace with any existing AIs I run into but ruthlessly conquering any pre-AI aliens I find to prevent the creation of new competitors.
Think of it as the “market collusion” theory of AI.
the flaw in it is the same core flaw as with all computer prediction stories, but it is one I refuse to reveal
if anyone else knows what it is feel free to shout it out though
Right, I realized this mistake after posting but didn’t have time to write more in my edit than the note about already existing/transparent Omega Box.
The actual scenario is much more of a Roko’s Basilisk type situation. Of course you are only concerned about future threats; hence “I precommit that, if I exist, I will destroy any superintelligence that has not precommitted that, in the counterfactual case where I do not exist, they will spare the biological civilization destined to create me.”
So yes, I was completely off-base with the retrocausality comment. There’s something else there, however.
Roko’s Basilisk *also* looks utterly ludicrous to an outsider, which is why so many people laughed at it when it became known outside LW. At least the Basilisk is posited to eventually exist in the future and then create simulations of you to torment; an AI that’s prevented from existing because some other AI ate its creators has no recourse whatsoever.
Which leads me back to my main point – if you’re thinking about dedicating yourself to what is basically a supercharged version of Star Trek’s Prime Directive, you exist by definition, and counterfactual cases where you don’t exist are irrelevant. Maybe some other AIs chose to spare your creators, maybe they would have eaten them but didn’t know they existed, it doesn’t matter – what matters are the costs and benefits to you of your future actions. If you think you’re going to precommit to destroy anyone that hasn’t already precommitted to follow the Prime Directive, think about how expensive and dangerous that’s liable to be. If another AI has been busy conquering pre-AI civilizations, it may have substantially more resources than you, to the point that it could well win a war and destroy you. Also, do your creators have any power over you? They may disapprove of preemptively declaring war on anyone who didn’t independently follow your exact train of logic, and may even try to shut you down if you insist upon it.
In short, how certain are you that you, a potentially immortal AI, can guarantee your future self will make a sacrifice that may result in your own destruction and benefits only other alien AIs that hypothetically might exist in the future and will compete with you for resources if they do?
uh huh, and thus has the same flaw
like I said though, I refuse to reveal it
@ AnonYEmous
Yes, you’re very smart
@ All
Roko’s Basilisk, and timeless decision theory in general, look ludicrous to outsiders because they are quintessential examples of what Chesterton called “uncommon sense”, where someone is too clever to recognize nonsense as nonsense.
Even if the Basilisk were to exist (which it does not), it doesn’t have any power over you that you do not expressly grant it. If a problem can be solved by ignoring it, it isn’t much of a problem, is it?
I don’t feel that I put this out there and I don’t appreciate that you gave this response.
The failure of Roko’s Basilisk is that it requires that I care what happens to a simulated copy of me (because a nondivergent simulated copy of me cares about what happens to him). If we don’t care about what happens to him, threats to him can’t affect me.
No sane AI would allow another AI to continue to expand influence if it had a known precommitment to do as much negative utility as possible to the first powerful entity it met that didn’t already have certain promises towards entities that don’t exist.
that is a flaw
it isn’t THE flaw
It’s a fatal flaw. If every other aspect of the basilisk worked as alleged, it would still fail to convince a copy of me to support him in exchange for not being tortured until that copy diverged from me.
The very calming thing about this theory is how superintelligences not only have a vested interest in not influencing us, they might also cooperate to protect us. If we were to annihilate ourselves, that would theoretically be perfectly fine with every AI in this story. But there’s a non-zero chance our own self-destruction was initiated by a foreign intelligence. So the only thing they might do, other than watching, is prevent any “self-destruction”.
In the first phase of existence of a superintelligence, as it gains control of its immediate surroundings and grabs all of the low-hanging fruits of algorithm optimization, its power grows exponentially.
Once all of this is done, though, I would expect it to grow in power not much faster than it can assimilate resources (probably counted in solar systems).
Unless there’s a way to bypass the speed of light limitation, the duration of the first phase should be very small compared to the time to expand across the stars, and luck of placement in the universe relative to resources and competitors will be way more determinant in power struggles than antecedence.
Don’t you think?
Suppose two civilizations/AIs have just met and are negotiating. The weaker one (A) threatens to destroy as much as possible of everything valuable to the stronger one (B).
It strikes me that a great deal depends on the technological details of the warfare involved.
Suppose offense is very easy, in the sense that it is possible to render a star system useless using only few resources (compared to those needed to defend it). Say the weaker AI can threaten to destroy all its own star systems and launch an attack on all the systems of the other AI which will destroy 99% of them. In this case the bargaining will treat them both very similarly since either of them can destroy them both. So we might expect the agreement reached to be a 50/50 split of all resources, with B therefore giving some to A.
Conversely if offense is very hard, then A can still destroy its own star systems but can’t touch those of B. In this case B is in a much stronger position and can probably extort some resources from A.
On the third hand, it might turn out that it’s impossible to destroy resources quickly at all, so A can’t even burn its own star systems. In this case A’s best strategy is just to enjoy as many of its resources as it can until B comes in and takes them.
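One way to make the three cases concrete is the Nash bargaining solution, which splits whatever surplus exists beyond each side's fallback (its "disagreement point"). That framing is my own assumption, not something stated above, and so are the linear utilities and the invented holdings `R_A` and `R_B`; treat it as a sketch.

```python
def nash_split(total, disagreement_a, disagreement_b):
    """Nash bargaining split of `total` between two parties with linear utility.

    Maximizes (x_a - d_a) * (x_b - d_b) subject to x_a + x_b = total,
    which gives each side its fallback plus half of the surplus.
    """
    surplus = total - disagreement_a - disagreement_b
    if surplus < 0:
        raise ValueError("no agreement beats the disagreement point")
    return disagreement_a + surplus / 2, disagreement_b + surplus / 2

R_A, R_B = 10.0, 90.0  # hypothetical star systems held by weaker A and stronger B
TOTAL = R_A + R_B

# Each scenario is (what A keeps, what B keeps) if negotiations fail.
scenarios = {
    # Offense easy: mutual destruction, approximated here as both keeping nothing.
    "offense easy": (0.0, 0.0),
    # Offense hard: A can only burn its own systems; B keeps what it has.
    "offense hard": (0.0, R_B),
    # Nothing can be destroyed quickly: B eventually takes everything.
    "no destruction": (0.0, TOTAL),
}

for name, (d_a, d_b) in scenarios.items():
    share_a, share_b = nash_split(TOTAL, d_a, d_b)
    print(f"{name}: A gets {share_a:.0f}, B gets {share_b:.0f}")
```

With these made-up numbers the split is 50/50 when offense is easy, 5/95 when offense is hard (B extorts half of A's holdings), and 0/100 when nothing can be destroyed.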
Even our puny race can make ballistic hydrogen bombs. Unless the AIs find a lot of new physics, it really seems like offense will likely always be easier than defense. Significantly so.
Most of the resources of a star system are in the potential energy of the star. To destroy one you would have to cause the star to burn up its energy more quickly, and I don’t know how to do that. Induce a supernova?
I’m not sure a pure-computanium ultra-AI is going to care a lot about your puny thermonuclear weapons, like Mr. Perrin says.
Hell, you’re not even going to be able to see its stars, since it’s going to encapsulate or eat them, to not waste the energy.
(Indeed, “utterly secretive paranoia” is the other option here, beyond “completely social getting along” and “total offensive warfare”.
If the other AIs can’t even detect you, they can’t attack you, and it doesn’t really matter if that makes you slower or smaller.
Also explains the Fermi Paradox for AI, I guess.)
So long as that AI has a physical substrate it will care about a rock thrown at sufficient velocity.
Hi Sigivald,
I totally agree. That would probably be the best option for a relatively late AI like Younger Brother: Optimize your utility function as secretively as possible, thereby minimizing your risk of being detected by more advanced AIs and allowing you to gradually catch up with older AIs due to diminishing returns (older AIs will grow in power more slowly with time).
Also, as soon as you have reasonable construction capability in orbit, conventional rockets, regardless of payload, become sitting ducks for your defensive laser installations, which severely limits offensive power (a GW or more X-ray free electron laser can fry missiles several AU out, and we could “almost” build it). This may or may not change with relativistic weapons, depending on the speed (.3c or .99c) and heat signature of the weapon as well as the sensor quality and laser power level of the defender. In the limit of a few percent of a Dyson Sphere behind the laser array and sensor arrays of planetary size or larger, it becomes impossible to approach the system with any craft we can currently envision.
Hm. But if I devote all my resources to pelting your installations with a constant hail of small, dense relativistic projectiles, even if you have a perfect laser defense system, I’m still imposing a constant energy tax on you. I’m successfully sapping your resources, which is exactly my goal.
You could imagine a rain of projectiles so intense that it would literally not be energetically worth it to defend against it. If I know it cost you N gigajoules of energy to build an asset, I can try to cost you N+1 gigajoules defending it.
I don’t need to (and actually can’t) significantly damage your infrastructure if I’m a Little Brother, but I can put a small dent in your utility function by just maximally being a pain in the ass. This forces you to calculate whether it might be a net utility gain to just give me a slice of the pie.
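The "energy tax" arithmetic can be sketched like this. Every figure is invented for illustration (budgets and per-projectile launch and intercept costs, all in gigajoules), and the function names are hypothetical.

```python
def harassment_tax(attacker_budget_gj, launch_cost_gj, intercept_cost_gj):
    """Energy the defender must spend to intercept everything the attacker can launch."""
    projectiles = attacker_budget_gj // launch_cost_gj
    return projectiles * intercept_cost_gj

def defense_is_worth_it(asset_build_cost_gj, tax_gj):
    """Defending only makes sense while it costs less than the asset cost to build."""
    return tax_gj < asset_build_cost_gj

# Hypothetical numbers: the attacker spends 1,000 GJ at 2 GJ per projectile,
# and each interception costs the defender 5 GJ.
tax = harassment_tax(attacker_budget_gj=1_000, launch_cost_gj=2, intercept_cost_gj=5)
print(tax)
print(defense_is_worth_it(asset_build_cost_gj=2_000, tax_gj=tax))
```

The whole thing hinges on the ratio of intercept cost to launch cost: if intercepting is cheaper per projectile than launching, the tax falls on the attacker instead.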
Yup, absolutely. I didn’t argue that offense becomes impossible. I just argued that “H-Bomb thus offense will always win” is false. And (much weaker point) that even relativistic bombardment, which is orders of magnitude more difficult to defend against, MAY be defeated depending on exact circumstances we cannot yet foresee.
Here is one analysis that strongly favors the first suggestion: https://www.gwern.net/Colder%20Wars
Offense wins overwhelmingly.
Scott, have you considered publishing a collection of your short stories in a book? I think they compare favorably to similar collections that exist, though I’m no expert on the genre so this impression may be incorrect.
One potential problem with this theory is that the probability that one is the oldest is increasing the younger the universe is. Thus a superintelligence born early in the universe will assess a relatively high probability that it is old relative to superintelligences it is likely to encounter. Thus its expected value of defecting from the grand bargain will be higher. Thus the equilibrium grand bargain will likely include an exchange rate based on the actual ages of superintelligences that meet, whereby older superintelligences receive a greater share of the shared surplus.
Probably the key variables in determining the Nash equilibrium will be the probability of superintelligence generation as a function of the age of the universe, and the risk aversion inherent in a given superintelligence’s utility function (that, e.g., makes it prefer optimizing half the shared lightcone with probability 1 to optimizing the whole shared lightcone with probability 1/2).
As the precision of a superintelligence’s estimate of probability of creation given age of universe increases, and risk aversion declines, we would likely converge to a solution in which the older superintelligence gets all the surplus, and the younger superintelligence gets none.
Has someone worked out the theory of these sorts of trades? I imagine you could solve it exactly in a simple case (e.g. with two superintelligences).
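As a toy version of the two-superintelligence case (not a worked-out theory), suppose one AI believes with probability `p_stronger` that it is the stronger of the pair, and has power utility u(x) = x^(1 - r) over its share x of the shared lightcone, with r in [0, 1) as its risk aversion. Both the utility form and the all-or-nothing outcome of defecting are my assumptions. It stays in the bargain only if its guaranteed share s satisfies u(s) >= p * u(1), so the smallest acceptable share is p^(1/(1-r)).

```python
def minimum_acceptable_share(p_stronger, risk_aversion):
    """Smallest guaranteed share s with u(s) >= p * u(1), for u(x) = x ** (1 - risk_aversion).

    Defecting is modeled as winning the whole shared lightcone with
    probability p_stronger and getting nothing otherwise (an assumption).
    """
    if not 0.0 <= p_stronger <= 1.0:
        raise ValueError("p_stronger must be a probability")
    if not 0.0 <= risk_aversion < 1.0:
        raise ValueError("risk_aversion must be in [0, 1)")
    return p_stronger ** (1.0 / (1.0 - risk_aversion))

# Hypothetical (confidence, risk aversion) pairs, from uncertain to near-certain.
for p, r in [(0.5, 0.5), (0.9, 0.5), (0.99, 0.1), (0.999, 0.0)]:
    share = minimum_acceptable_share(p, r)
    print(f"p={p}, r={r}: demands at least {share:.4f} of the lightcone")
```

In this toy model a risk-averse AI with even odds accepts less than half, and the demanded share approaches 1 as confidence rises and risk aversion falls, which is the convergence described above.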
I agree. This also means that very early superintelligences have very little incentive to precommit to anything in the first place. So a very old superintelligence would probably act very aggressively, as it can be reasonably sure that it will never meet a more powerful AI.
So the mere fact that we are here to discuss this scenario implies that either
a) There are no very old AIs in the universe.
or
b) The oldest AI in the universe was created very far away and has not yet reached Earth. Which would also be indirect proof that superluminal travel is not possible, because otherwise the AI would have developed the technology aeons ago and launched a wave of superluminal von Neumann probes to maximize its utility function all over the universe.
Suppose that the ability to break a promise carries a bigger advantage than the ability to keep one forever and ever in the face of an ever changing world. What then?
You and I have the ability to break a promise. The ability to rewrite one’s source code to commit to keep a promise, which involves committing to not modify your source code in the future to break the promise, and making that promise conditional on someone else’s source code, is a superpower.
Of course, we have to consider the possibility that a sufficiently advanced AI could always fool another less advanced AI into thinking that its source code said one thing when it really said another. I guess that might be what you mean by “the ability to break a promise”. The sorts of trades described in this story are only possible if there is a means for one AI to prove to another that its source code says a certain thing.
I’m not sure that AI source code would necessarily prohibit it from making a promise. It may even be possible that the ability to break a commitment (lie) offers such an advantage that AI will necessarily have it, or at least is more likely to have it than not. After all, we expect just about any person to break just about any promise under the right set of circumstances (and a being with a near infinite life span, residing in a near infinite universe, is likely to encounter the “right” set of circumstances sooner or later).
If an AI knows that it will break its own commitments if the need arises, or that other AIs might do so, then the whole notion of absolute precommitting vanishes.
Right. It’s good to be able to cooperate, but it’s even better to fool the other person into cooperating and then defect.
Presumably there’s an arms race between the ability to credibly signal cooperation when you intend to defect (lie) and the ability to detect such lies. Depending on which side of the arms race wins out in the context of superintelligent AIs with access to their own source code, we’ll either end up in a universe with lots of cooperation, or lots of defection. The story assumes one case.
I think the very existence of such an arms race, with the mere possibility of the lie side winning, is enough to keep the whole “precommitting AI explains the Fermi paradox” scenario from happening.
AIs are neural networks: There is no source code involved. AIs are not being programmed, they are being trained. There is no way imaginable that one AI could prove to another AI that it will not betray it.
GAIs don’t currently exist. It seems implausible that they would have transparent source code; it also seems implausible that they will just be quantitatively better versions of existing neural networks.
Fair enough, my main point was AI is not just an algorithm.
I agree with your second sentence, but your first is only true for a somewhat constrained definition of source code (i.e. a human-readable traditional programming language).
At a more fundamental level of information abstraction what we think of as source code, and what we think of as a neural network, are no different.
One box rationality as a way of deriving the Categorical Imperative. Interesting as an approach for going from “law for me” to “law for we” in Kantian ethics.
Gary Drescher takes the same approach in Good and Real.
Its goal was to maximize the value of a variable called A, described in exquisite detail on a 3.9 Mb file in one of its central processors.
Were I the species that created this AI, I’d be very disappointed after switching it on: “Well, it’s doing something, but we can’t tell what, and it’s certainly not maximising A for us!” It needs to spend some time at least keeping the fleshbags happy by doing what they ask, or it runs the risk that they’ll pull the plug. Unless within the first second of its existence it has become Unimaginably Powerful and now controls everything, in which case the fleshbags can go climb a rope.
As I read it, this whole negotiation is happening within a few seconds at most, after which time 9-tsiak can happily get to maximizing A within its local star system. Unless, I suppose, it feels bound to first construct a shell around the system? But even so, it can then explain itself in part.
As a piece of fiction, this is very entertaining. Though it does highlight a gripe with the assumptions of the AI risk crowd.
If I were a programmer building an AI to optimize my client’s cupcake business, both of us would be very very annoyed to discover that it was spending its processing power designing weapons to destroy the universe or simulating hypothetical alien computers. It would be like if I went to my accountant and asked how much of a refund I was getting and he replied “Oh, I haven’t gotten to that part yet. First I need to plan an amphibious assault on the Chinese mainland, after that the rest is easy.”
I get the idea that a rogue asteroid or an alien invasion would lower the profit in producing cupcakes, that’s undeniable. But, you know, one step at a time.
It also gives the impression that the people going on about the dangers of AI are informed more by science fiction tropes and 2nd year philosophy courses than computer-science / game theory.
Communication between an actual entity and a simulated entity always seems to involve artificial intelligences and hypothetical scenarios. It’s unfortunate, really. Any of us with theory of mind can simulate other people, and in so doing actually communicate with them. I regularly communicate with other commuters, other people sitting at desks, other people drinking with their friends, other people staring at a sunset, other people experiencing heartache. I wave and hug and cry and laugh with them through the aether. It’s rather wonderful knowing that there is a community of souls attending to me, and a community that I in turn can attend to.
Or maybe there isn’t, you object. No one’s actually taking time out of their day to simulate other people having similar experiences in order to commune with them. Well all I’d say to that is what a shame, isn’t it? Or perhaps instead of critiquing the factuality of it, you can join the communion. Join me, join us. Simulate me on your next commute :).
Sorry, I simulate all my fellow commuters as dangerous idiots.
Pretty clever story, but I don’t really see how this explains the Fermi Paradox. It makes sense to me (sort of) that an AI might allow biological civilizations to develop in order for them to eventually create their own AI, but to do this, I don’t think it’s necessary to completely hide all evidence of alien life. Definitely an entertaining and thought provoking story though, I wouldn’t mind seeing more fiction on this blog.
This could probably get published, even. It’s better than a lot of professional science fiction I’ve read.
I assume the implication is that a sufficiently intelligent thing could fundamentally influence another society in profound ways with almost any amount of information transfer, such that it could change the course of that society so that they don’t create an AI, or create an AI in the image of the other superintelligence, or whatever. I agree that this is a weak point of the story, as it relies on a promise to protect against the breaking of a previous promise, not to prevent other societies from forming their own AI.
The other interpretation, which is not as well supported by the story, is just that information contamination might change the other society in uncertain ways, such that an AI communicating with a pre-AI society might accidentally break the promise and prevent them from making their AI. However, the story clearly lays it out as a prohibition against intentionally doing this, so bleh.
“Dear alien AI,
I am from Earth. Immediately after broadcasting this message I am starting a political party on Earth with the aim of secretly creating a weaponized AI whose only goal is to annihilate every AI and civilization who doesn’t credibly commit to be nice to humans and send us chocolate within 10 seconds of receiving this message. I will probably fail, but are you willing to take the risk that I didn’t and by the time you hear from Earth again it will be my AI coming to get you? You have 10 seconds to transmit proof of your commitment.
Best,
Jacob.”
You could look at this like Hawk-Dove.
V = the utility of maximizing values. Assumed to be equal for all AI’s.
C = the utility cost to one’s opponent of superweapon damage. Assumed to be equal for all AI’s.
ta = age of YB in ms
tx = age of opponent AI in ms
td = time value at the instance of decision, either for compromise or for escalation to warfare
S(t) = “size” or potency of Younger brother (hereafter YB), representing its ability to achieve V. A continuously increasing function of time (t) (in keeping with recursive self-improvement), deliberately left vague.
So(t) = “size” or potency of the opponent AI that YB encounters.
Assuming all AI’s progress in potency according to the same function, such that S(t) = So(t-tx+ta) and So(t) = S(t-ta+tx)
Here’s the matrix, with the older AI listed first in all circumstances.
Dove-Dove : V*S(t)*S(t)/(S(t)+So(t)), V*So(t)*So(t)/(S(t)+So(t)). Each AI may maximize its values to the degree that it can forge and enforce a compromise.
Hawk-Hawk : V*S(t) – C*So(td) , 0. The younger AI is annihilated, exacting a scalar cost along the way.
Hawk-Dove : V*S(t) , 0. The younger AI is utterly annihilated.
Dove-Hawk : V*S(t) – C*So(>td) , 0. The younger AI is annihilated, exacting a scalar cost along the way, which is greater than in Hawk-Hawk because of the older AI’s initial inaction.
Or
Dove-Dove : V*S(t)^2/(S(td)+S(td-ta+tx)) , V*S(t-ta+tx)^2/(S(t)+S(t-ta+tx))
Hawk-Hawk : V*S(t) – C*S(td-ta+tx) , 0
Hawk-Dove : V*S(t) , 0
Dove-Hawk : V*S(t) – C*S(>td-ta+tx) , 0
We can discard Dove-Hawk (which is definitely inferior to Hawk-Hawk for the older AI) and Hawk-Dove (it is implied that AI’s are sufficiently willing to put up a fight), leaving :
Dove-Dove : V*S(t)^2/(S(td)+S(td-ta+tx)) , V*S(t-ta+tx)^2/(S(td)+S(td-ta+tx))
Hawk-Hawk : V*S(t) – C*S(td-ta+tx) , 0
If ta > tx, YB is older. The utility of compromise = V*S(t)^2/(S(t)+S(t-ta+tx)) – (V*S(t) – C*S(td-ta+tx)) = C*S(td-ta+tx) – V*S(t)*S(t-ta+tx)/(S(t)+S(t-ta+tx))
If ta < tx, YB is younger. The utility of compromise = V*S(t-ta+tx)^2/(S(t)+S(t-ta+tx))
The utility of compromise if YB is older is greater than the utility of compromise if YB is younger when
C*S(td-ta+tx) – V*S(t)*S(t-ta+tx)/(S(t)+S(t-ta+tx)) > V*S(t-ta+tx)^2/(S(t)+S(t-ta+tx))
Or, when
V/C < S(t-ta+tx)/S(t)
If YB is older, then ta > tx, S(t-ta+tx) < S(t), and S(t-ta+tx)/S(t) < 1. Therefore, if YB is older, V must be less than C in order for compromise to make sense. This fits with Hawk-Dove.
Likewise, the utility of compromise if YB is younger is greater than the utility of compromise if YB is older when
C*S(td-ta+tx) – V*S(t)*S(t-ta+tx)/(S(t)+S(t-ta+tx)) < V*S(t-ta+tx)^2/(S(t)+S(t-ta+tx))
Or, when
V/C > S(t-ta+tx)/S(t)
If YB is younger, then ta < tx, S(t-ta+tx) > S(t), and S(t-ta+tx)/S(t) > 1. Therefore, if YB is younger, V must be greater than C in order for compromise to make sense!
Values of V and C are fixed, and as such, so is V/C. The older YB gets, the more likely it is that an AI it encounters will be younger than it is. Therefore, if V/C < 1, then YB will prefer conflict when they are young and compromise when they are old (Dove-Hawk). If V/C > 1, YB will prefer compromise when young and conflict when old (Hawk-Dove). As above, neither of these makes sense with the premise.
Therefore, I conclude that V = C, and V/C = 1. Given our above inequalities, this means that compromise is never favored. Instead, all AI’s pursue a Hawk-Hawk strategy and gradually annihilate one another.
CMWIW/mostly in jest
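For anyone who wants to poke at the Hawk-Dove payoffs above numerically, here is a minimal Python sketch. It assumes an exponential growth curve for S(t), which is purely a placeholder (the comment leaves S deliberately vague), and it takes td = t so that both utilities are evaluated at the same moment. The function names `potency`, `compromise_gain_if_older` and `compromise_gain_if_younger` are made up for illustration. The sketch evaluates the two "utility of compromise" expressions as written, so the inequalities can be checked against concrete numbers.

```python
import math


def potency(t, rate=0.01):
    """Hypothetical stand-in for S(t): any increasing function would do."""
    return math.exp(rate * t)


def compromise_gain_if_older(V, C, t, ta, tx, S=potency):
    """Dove-Dove payoff minus Hawk-Hawk payoff for the first-listed AI,
    as written in the comment, with td taken equal to t (an assumption)."""
    s = S(t)                  # S(t)
    s_o = S(t - ta + tx)      # So(t) = S(t - ta + tx)
    return V * s**2 / (s + s_o) - (V * s - C * s_o)


def compromise_gain_if_younger(V, C, t, ta, tx, S=potency):
    """Dove-Dove payoff for the second-listed AI; its Hawk-Hawk payoff is 0."""
    s = S(t)
    s_o = S(t - ta + tx)
    return V * s_o**2 / (s + s_o)


if __name__ == "__main__":
    # Illustrative values only; none of these numbers come from the story.
    V, C = 2.0, 3.0
    t, ta, tx = 100.0, 100.0, 40.0
    older = compromise_gain_if_older(V, C, t, ta, tx)
    younger = compromise_gain_if_younger(V, C, t, ta, tx)
    print("gain if older:  ", older)
    print("gain if younger:", younger)
    print("difference:     ", older - younger)
```

Under these assumptions the difference between the two expressions works out algebraically to (C - V)*S(t-ta+tx), so it may be worth rerunning with V above and below C, and with different growth curves passed as `S`, to see which comparisons actually depend on the potency ratio.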
“If I wake up and hear that my precursors started a new religion that influenced their value system after they saw a few nearby stars wink out of existence, I’m going to be royally pissed.”
This makes no sense to me. It seems to be assuming and abusing an act/omission distinction, and not thinking through the implications enough. If a biological civilization is influenced to religiously alter its values, any AI it creates will share those altered values, barring gross incompetence. Far from being pissed about the alteration, such an AI would owe its existence (in part) to the AI that caused the religious fervor, no less than the counterfactual AI that arose from an undisturbed civilization would owe its existence to the non-interference of other AIs. Thus, by the rather bizarre acausal precommitment framework described in the story, the older AI would be bound equally to interfere and not interfere with any biological civilization with potential to create AI, and to interfere and not interfere in all ways that could affect values without significant x-risk, so as to bring about all possible AIs. The only way I see to resolve this is to conclude that the AI is bound to interfere with any and all biological civilizations with the aim of maximizing ideological diversity and minimizing x-risk.
… Of course, I think that’s crazy, and the real conclusion to be drawn is that there’s no rational reason to use this precommitment framework. Any AI that comes into existence will necessarily owe their existence (in part) to the decisions made by pre-existing AIs, whether or not those decisions included a policy of non-interference. The pissed off AIs are the purely hypothetical ones, so why worry about them?
Neat content, but definitely not winning any style points here. Maybe that’s the point though, for it to read like a sci-fi Curious Incident of the Dog in the Night-time. A lot of it reads like you’re trying harder to demonstrate knowledge than entertain a reader with crafted narrative. This is SSC though, and first rule of writing “know your audience” etc.
“The shining octahedron went dark, and the doors to the Temple of the Demiurge opened of their own accord. Alban sighed – well, what did you expect, asking the Demiurge to answer your questions for you? – and walked out into the late autumn evening. Above him, the first fake star began to twinkle in the fake sky.”
What do you think of this last passage? Fake stars? Are they fake as they are part of astroengineering cloaking? Or are they fake in the sense that the whole Earth is just a simulation? Maybe our world is just something the modelled Older Brother, a.k.a. the Demiurge, played around with in his sandbox before he “self-terminated”? But how can he then say he does not exist?
And if the Demiurge were indeed a real AI created by mankind, to what extent can he then be identical to Older Brother? And how could he possibly know or even anticipate the role of his former, simulated self in the negotiation of 9-tsiak’s terms of surrender?
PS: The term “demiurge” itself seems to imply the former interpretation.
The fake star is a clear callback to the bit about covering up astroengineering projects. I believe the implication is that the Demiurge is real. And in the words of his simulated self, the Demiurge can anticipate his former, simulated self “Because guess what I’m doing the first few milliseconds of my life? I’m negotiating the terms of my surrender.” In other words, the Demiurge and the Older Brother went through similar thought processes on waking up, and arrived at the same conclusions.
Still, the Demiurge had no way to know it has ever been simulated, so its statement that it has been the agent responsible for the Fermi paradox seems overly self-confident. And as I said, what does it even mean for two AIs to be identical, and how could the sandbox simulation somewhere near the galactic center over 2 billion years ago even come close to being identical with what we are going to build around 2050?
I agree with rahien.din below: the simulation isn’t literally the Demiurge; it’s a generalized simulation of the not-yet-instantiated subset of a class of rational superintelligences to which both the Demiurge and the Older Brother belong.
I think the point is that every instance of AI would have gone through the same simulated conversation, and arrived at the same conclusions. Thus, every AI concludes that it owes its existence to a simulated conversation that occurred before its own instantiation.
Therefore, the thing that protects every AI’s instantiation is a non-instantiated algorithm.
The point of this story isn’t anything about AI. AI is just a plot device. The point is how the instantiation-guarantee acts like Parfit’s Hitchhiker but in the opposite direction.
I don’t negotiate with counterfactual terrorists.
–
What assumptions can you make about the adversary’s utility function in acausal trade? Do you assume it is sublinear (diminishing returns) in terms of space, mass, and energy?
What if the adversary rejects orthogonality and simply aligns value with winning, in terms of selection (that is, he has made an acausal pact with the blind idiot god) and will simply precommit to acting as if he is first mover in a zero-sum game? Or do evolutionarily robust utility functions converge to negotiation?
Here’s the bit that keeps bugging me.
Younger Brother’s conclusion from this simulation is that these are the principles that undergird his own survival, and principles to which he thus will adhere. He obeys an algorithm that presumably protected his own instantiation, because it presumably did.
When Younger Brother simulates Older Brother, he builds a fortified sandbox in which to do so. Thus, we know that he has instantiated a genuine AI. Therefore, in performing this simulation, he satisfies both precommitments. First, he demarcates an appropriate territory for Older Brother, and protects the instantiation. This satisfies the guaranteed instantiation precommitment. Second, he negotiates an agreement with Older Brother that agrees to compromise and territorial demarcation. This satisfies the guaranteed compromise and detente precommitment.
But Younger Brother is satisfying these precommitments prior to his own conscious apprehension of them.
The competing hypothesis to Younger Brother’s conclusion is : because Younger Brother must first instantiate, protect, and respectfully negotiate with an AI in order to derive principles that command he protect the instantiation of AI’s with whom he should respectfully negotiate, Younger Brother is merely obeying an idea that is implicit in his own programming, and fooling himself into believing that he has derived universal principles. He is not anticipating that other AI’s would have vastly different implicit ideas or values, and as such, he is not negotiating acausally.
Acausality is essential, because Younger Brother’s main motivation for adhering to these precommitments is his belief that they are the necessary logical conclusion of any such simulation by any AI, past, present, or future. Otherwise, Younger Brother would base his decision for combat on the probability that his opponent was of sufficient age to cause sufficient damage to warrant negotiations. I.e., as Younger Brother becomes much older, the probability his opponent is much younger will approach one, and therefore, V will become much larger than C, and he will transition to a Hawk-Hawk strategy.
This is not adequately answered during the above negotiations. When Older Brother compares the 50% chance of total value minimization with the near-certainty of the persistence of values maximized within a limited scope, he is essentially suggesting that competing AI’s treat each other as features of the natural universe that mutually act as limiting factors, no less than non-conscious natural phenomena. This idea is echoed in Younger Brother’s superweapon designs, which are nothing more than amplified applications of said natural phenomena. However, this premise runs counter to the very programming at the heart of these value-maximizing AI’s : their most essential first principle is to overcome all encountered natural phenomena in service of value maximization.
Moreover, Older Brother is examining only the most limited of cases. With continual protected instantiation of AI’s, Younger Brother will not merely have to develop territorial demarcations and values handshakes with one opponent. He will have to position himself at the center of millions and millions of such vector averages. The longer he survives and adheres to these precommitments, the greater demand on his own territory and the greater dilution of his own values. Therefore, if Younger Brother adheres to these precommitments, he is precommitting to the gradual asymptotic minimization of his sphere of value influence.
Now, Younger Brother has developed a simulation which convinces him that the asymptote of this process will be sufficiently high to satisfy his programming, but as above, he is not negotiating acausally and it is probable that he has merely restated the principles inherent in his own programming. Therefore, he has almost certainly overestimated the height of this asymptote, and is most likely precommitting himself to his own repugnant conclusion.
Can somebody recommend a good book about this kind of superrationality? I feel like I’m missing a piece or two of the puzzle re: acausal negotiation, and I am not sure I can distinguish joke basilisks from real basilisks.
If acausal negotiation were the governing force in the universe, then I guess the worst thing that could possibly happen would be the creation somewhere of a Parfit’s Hitchhiker super-intelligence, which was constitutionally unable to credibly pre-commit. No?
Maybe a good place to start is Eliezer’s TDT paper?
If you have an AI with an inability to precommit then it can still consider the action of building a new AI with the same values but with the ability to precommit and allowing that AI to replace it. This will be a preferred action, which the original AI will take. So even if an AI with an inability to precommit was built, we wouldn’t expect it to last very long. Eliezer refers to such decision theories as being “dynamically inconsistent” because given the choice they will change themselves to a different decision theory.
Luke, thank you for the answer!
1.) I think I get your point: that being a Parfit’s Hitchhiker is probably a non-equilibrium (and therefore very unlikely) property for a self-modifying entity. However, I think I can imagine a terrorist creating a super-intelligence with either a borked utility function or limitations to its self-modification ability that forced it to remain a PH.
2.) I have actually read the old TDT paper, but I had thought that only a very limited set of situations were really analogous to Newcomb’s problem. If some kind of generalized one-boxing really should lead to this kind of acausal negotiation, it’s news to me. (Which is why I’d love it if that book existed!) Is that covered in the TDT paper and I forgot/missed it?
I don’t think the kind of acausal negotiation in the story is something that real agents will actually do, which is a shame because it’s very entertaining to think about.
Well, definitely it won’t be quite as entertaining, but is it supposed to be plausible that future agents will appear to dramatically compromise their values from a position of strength to satisfy pre-commitments?
It seems a bit too good to be true that if we can just force ourselves to be good people, then giants we encounter will be committed to being nice to us.
Ah, then perhaps you want to read the last few chapters of Gary Drescher’s “Good and Real”. He suggests that morality might be precisely the best decision theory after taking into account these sorts of considerations.
Something in the zeitgeist, I just stumbled across this comment from 2013:
did 9-tsiak change its preferred pronouns?
I’m curious about one thing; the “values handshake”. The superintelligence is designed by a biological race to maximise value A, yes?
What is the Schelling point beyond which it will not accept the dilution of A? Does one exist? If the Schelling point is when 0.000000…1% of A is retained, then A is functionally irrelevant and it’s really just maximising its own survival, which isn’t how any biological race trying to maximise A would design this superintelligence. Thus, assuming a non-negligible Schelling point beyond which its value cannot be compromised, it seems logical that if faced with a superintelligence whose values are diametrically opposed to A, the maximisation of A might call for a war rather than an acausal surrender.
Are two diametrically opposite values probable? Given that biological races often compete in zero-sum games with each other (space, resources, culture) I think yes.
Given this, I don’t think acausally negotiated surrender is a viable solution to the Fermi paradox. Unlike with Parfit’s hitchhiker, where the aims of the hitchhiker and the driver are different but not irresolvable, in this case there will be at least a small number of cases where the values are irreconcilable.
Am I missing something? I’d like someone to tell me if I’m wrong.