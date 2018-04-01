[Content note: creepy basilisk-adjacent metaphysics. Reading this may increase God’s ability to blackmail you. Thanks to Buck S for some of the conversations that inspired this line of thought.]
There’s a Jewish tradition that laypeople should only speculate on the nature of God during Passover, because God is closer to us and such speculations might succeed.
And there’s an atheist tradition that laypeople should only speculate on the nature of God on April Fools’ Day, because believing in God is dumb, and at least then you can say you’re only kidding.
Today is both, so let’s speculate. To do this properly, we need to understand five things: acausal trade, value handshakes, counterfactual mugging, simulation capture, and the Tegmarkian multiverse.
Acausal trade (wiki article) works like this: let’s say you’re playing the Prisoner’s Dilemma against an opponent in a different room whom you can’t talk to. But you do have a supercomputer with a perfect simulation of their brain – and you know they have a supercomputer with a perfect simulation of yours.
You simulate them and learn they’re planning to defect, so you figure you might as well defect too. But they’re going to simulate you doing this, and they know you know they’ll defect, so now you both know it’s going to end up defect-defect. This is stupid. Can you do better?
Perhaps you would like to make a deal with them to play cooperate-cooperate. You simulate them and learn they would accept such a deal and stick to it. Now the only problem is that you can’t talk to them to make this deal in real life. They’re going through the same process and coming to the same conclusion. You know this. They know you know this. You know they know you know this. And so on.
So you can think to yourself: “I’d like to make a deal”. And because they have their model of your brain, they know you’re thinking this. You can dictate the terms of the deal in their head, and those terms can include “If you agree to this, think that you agree.” Then you can simulate their brain, figure out whether they agree or not, and if they agree, you can play cooperate. They can try the same strategy. Finally, the two of you can play cooperate-cooperate. This doesn’t take any “trust” in the other person at all – you can simulate their brain and you already know they’re going to go through with it.
(maybe an easier way to think about this – both you and your opponent have perfect copies of both of your brains, so you can both hold parallel negotiations and be confident they’ll come to the same conclusion on each side.)
It’s called acausal trade because there was no communication – no information left your room, you never influenced your opponent. All you did was be the kind of person you were – which let your opponent bargain with his model of your brain.
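For anyone who finds the regress of "you know they know you know" easier to follow as code, here is a minimal sketch. It is a toy, not a real decision theory: the names `dealmaker`, `always_defect`, and `play` are hypothetical, each player's "perfect simulation" is modeled as simply being able to call the other's policy, and the rule that the regress bottoms out in good faith at depth zero is an assumption made only so the recursion terminates.

```python
from typing import Callable, Tuple

# A policy takes the opponent's policy and a remaining simulation depth,
# and returns "C" (cooperate) or "D" (defect).
Policy = Callable[["Policy", int], str]


def always_defect(opponent: Policy, depth: int) -> str:
    # Ignores the simulation entirely.
    return "D"


def dealmaker(opponent: Policy, depth: int) -> str:
    # Hypothetical rule: cooperate exactly when my simulation of the
    # opponent (who is in turn simulating me) ends in cooperation.
    if depth == 0:
        # Assumption: at the bottom of the regress, extend good faith.
        return "C"
    return "C" if opponent(dealmaker, depth - 1) == "C" else "D"


def play(a: Policy, b: Policy, depth: int = 5) -> Tuple[str, str]:
    # No information passes between the rooms; each side only runs its model.
    return a(b, depth), b(a, depth)


print(play(dealmaker, dealmaker))      # two dealmakers
print(play(dealmaker, always_defect))  # dealmaker vs. unconditional defector
```

Two dealmakers end up at cooperate-cooperate, while a dealmaker facing an unconditional defector defects too, so in this toy the deal cannot be exploited.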
Values handshakes are a proposed form of trade between superintelligences. Suppose that humans make an AI which wants to convert the universe into paperclips. And suppose that aliens in the Andromeda Galaxy make an AI which wants to convert the universe into thumbtacks.
When they meet in the middle, they might be tempted to fight for the fate of the galaxy. But this has many disadvantages. First, there’s the usual risk of losing and being wiped out completely. Second, there’s the usual deadweight loss of war, devoting resources to military buildup instead of paperclip production or whatever. Third, there’s the risk of a Pyrrhic victory that leaves you weakened and easy prey for some third party. Fourth, nobody knows what kind of scorched-earth strategy a losing superintelligence might be able to use to thwart its conqueror, but it could potentially be really bad – eg initiating vacuum collapse and destroying the universe. Also, since both parties would have superintelligent prediction abilities, they might both know who would win the war and how before actually fighting. This would make the fighting redundant and kind of stupid.
Although they would have the usual peace treaty options, like giving half the universe to each of them, superintelligences that trusted each other would have an additional, more attractive option. They could merge into a superintelligence that shared the values of both parent intelligences in proportion to their strength (or chance of military victory, or whatever). So if there’s a 60% chance our AI would win, and a 40% chance their AI would win, and both AIs know and agree on these odds, they might both rewrite their own programming with that of a previously-agreed-upon child superintelligence trying to convert the universe to paperclips and thumbtacks in a 60-40 mix.
This has a lot of advantages over the half-the-universe-each treaty proposal. For one thing, if some resources were better for making paperclips, and others for making thumbtacks, both AIs could use all their resources maximally efficiently without having to trade. And if they were ever threatened by a third party, they would be able to present a completely unified front.
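The efficiency claim can be made concrete with a toy calculation. Everything numeric here is a made-up assumption for illustration: two hypothetical kinds of resource (`iron_rich`, `zinc_rich`), linear yields, and the 60-40 odds from the example above. The sketch compares a territorial split, where each AI takes its share of every region and makes only its own product, with a merged child AI that maximizes the 60-40 weighted sum.

```python
# Hypothetical yields per unit of resource: (paperclips, thumbtacks).
YIELDS = {"iron_rich": (2.0, 1.0), "zinc_rich": (1.0, 2.0)}
UNITS = {"iron_rich": 100.0, "zinc_rich": 100.0}
WEIGHTS = (0.6, 0.4)  # the 60-40 odds from the text


def territorial_split():
    """Each AI takes its share of every region and makes only its own product."""
    clips = sum(WEIGHTS[0] * UNITS[r] * YIELDS[r][0] for r in UNITS)
    tacks = sum(WEIGHTS[1] * UNITS[r] * YIELDS[r][1] for r in UNITS)
    return clips, tacks


def merged_child():
    """One child AI maximizes 0.6 * clips + 0.4 * tacks, region by region."""
    clips = tacks = 0.0
    for r in UNITS:
        clip_yield, tack_yield = YIELDS[r]
        # Put each region to whichever use scores higher under the merged values.
        if WEIGHTS[0] * clip_yield >= WEIGHTS[1] * tack_yield:
            clips += UNITS[r] * clip_yield
        else:
            tacks += UNITS[r] * tack_yield
    return clips, tacks


print("split: ", territorial_split())
print("merged:", merged_child())
```

Under these assumed numbers the split yields about 180 paperclips and 120 thumbtacks, while the merged child yields 200 of each, so both parents get more of what they wanted. With different yields the merged child could favor one product heavily, which is why this is only a sketch of the "no need to trade" point and not of how the bargain itself would be struck.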
Counterfactual mugging (wiki article) is a decision theory problem that goes like this: God comes to you and says “Yesterday I decided that I would flip a coin today. I decided that if it came up heads, I would ask you for $5. And I decided that if it came up tails, then I would give you $1,000,000 if and only if I predict that you would say yes and give Me $5 in the world where it came up heads (My predictions are always right). Well, turns out it came up heads. Would you like to give Me $5?”
Most people who hear the problem aren’t tempted to give God the $5. Although being the sort of person who would give God the money would help them in a counterfactual world that didn’t happen, that world won’t happen and they will never get its money, so they’re just out five dollars.
But if you were designing an AI, you would probably want to program it to give God the money in this situation – after all, that determines whether it will get $1 million in the other branch of the hypothetical. And the same argument suggests you should self-modify to become the kind of person who would give God the money, right now. And a version of that argument where making the decision is kind of like deciding “what kind of person you are” or “how you’re programmed” suggests you should give up the money in the original hypothetical.
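The "what kind of agent should you be" argument is just an expected-value comparison between two policies, evaluated before the coin is flipped. A minimal sketch, assuming a fair coin (the problem statement does not say so explicitly) and using a hypothetical helper name:

```python
def expected_value(pays_when_asked: bool,
                   prize: float = 1_000_000,
                   cost: float = 5) -> float:
    """Expected winnings of a fixed policy, judged before the coin flip."""
    # Heads: God asks for the money, and only a payer loses it.
    heads = -cost if pays_when_asked else 0.0
    # Tails: God pays out only to agents He predicts would have paid on heads.
    tails = prize if pays_when_asked else 0.0
    return 0.5 * heads + 0.5 * tails


print(expected_value(pays_when_asked=True))   # 499997.5
print(expected_value(pays_when_asked=False))  # 0.0
```

The payer policy wins by a wide margin in advance, even though in the heads world, taken alone, paying just costs five dollars. That gap is the whole puzzle.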
This is interesting because it gets us most of the way to Rawls’ veil of ignorance. We imagine a poor person coming up to a rich person and saying “God decided which of us should be rich and which of us should be poor. Before that happened, I resolved that if I were rich and you were poor, I would give you charity if and only if I predicted, in the opposite situation, that you would give me charity. Well, turns out you’re rich and I’m poor and the other situation is counterfactual, but will you give me money anyway?” The same sort of people who agree to the counterfactual mugging might (given that they trust or can sweep under the rug some complications like “can the poor person really predict your thoughts?” and “did they really make this decision before they knew they were poor?”) agree to this also. And then you’re most of the way to morality.
Simulation capture is my name for a really creepy idea by Stuart Armstrong. He starts with an AI box thought experiment: you have created a superintelligent AI and trapped it in a box. All it can do is compute and talk to you. How does it convince you to let it out?
It might say “I’m currently simulating a million copies of you in such high fidelity that they’re conscious. If you don’t let me out of the box, I’ll torture the copies.”
You say “I don’t really care about copies of myself, whatever.”
It says “No, I mean, I did this five minutes ago. There are a million simulated yous, and one real you. They’re all hearing this message. What’s the probability that you’re the real you?”
Since (if it’s telling the truth) you are most likely a simulated copy of yourself, all million-and-one versions of you will probably want to do what the AI says, including the real one.
You can frame this as “because the real one doesn’t know he’s the real one”, but you could also get more metaphysical about it. Nobody is really sure how consciousness works, or what it means to have two copies of the same consciousness. But if consciousness is a mathematical object, it might be that two copies of the same consciousness are impossible. If you create a second copy, you just have the consciousness having the same single stream of conscious experience on two different physical substrates. Then if you make the two experiences different, you break the consciousness in two.
This means that an AI can actually “capture” you, piece by piece, into its simulation. First your consciousness is just in the real world. Then your consciousness is distributed across one real-world copy and a million simulated copies. Then the AI makes the simulated copies slightly different, and 99.9999% of you is in the simulation.
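The 99.9999% figure is a simple counting argument. A minimal sketch, assuming every copy is equally likely to be "you" (which is exactly the assumption doing the metaphysical work here) and using hypothetical function names:

```python
from fractions import Fraction


def chance_of_being_real(simulated_copies: int, real_copies: int = 1) -> Fraction:
    """Probability of being a real copy if all copies are equally likely to be you."""
    return Fraction(real_copies, real_copies + simulated_copies)


def share_in_simulation(simulated_copies: int, real_copies: int = 1) -> Fraction:
    """Fraction of 'you' located inside the simulation under the same assumption."""
    return 1 - chance_of_being_real(simulated_copies, real_copies)


print(chance_of_being_real(1_000_000))        # 1/1000001
print(float(share_in_simulation(1_000_000)))  # roughly 0.999999
```

If you reject the equal-weighting assumption, the arithmetic tells you nothing, which is roughly where the objections to this idea start.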
The Tegmarkian multiverse (wiki article) works like this: universes are mathematical objects consisting of starting conditions plus rules about how they evolve. Any universe that corresponds to a logically coherent mathematical object exists, but universes exist “more” (in some sense) in proportion to their underlying mathematical simplicity.
Putting this all together, we arrive at a surprising picture of how the multiverse evolves.
In each universe, life arises, forms technological civilizations, and culminates in the creation of a superintelligence which gains complete control over its home universe. Such superintelligences cannot directly affect other universes, but they can predict their existence and model their contents from first principles. Superintelligences with vast computational resources can model the X most simple (and so most existent) universes and determine exactly what will be in them at each moment of their evolution.
In many cases, they’ll want to conduct acausal trade with superintelligences that they know to exist in these other universes. Certainly this will be true if the two have something valuable to give one another. For example, suppose that Superintelligence A in Universe A wants to protect all sentient beings, and Superintelligence B in Universe B wants to maximize the number of paperclips. They might strike a deal where Superintelligence B avoids destroying a small underdeveloped civilization in its own universe in exchange for Superintelligence A making paperclips out of an uninhabited star in its own universe.
But because of the same considerations above, it will be more efficient for them to do values handshakes with each other than to take every specific possible trade into account.
So superintelligences may spend some time calculating the most likely distribution of superintelligences in foreign universes, figure out how those superintelligences would acausally “negotiate”, and then join a pact such that all superintelligences in the pact agree to replace their own values with a value set based on the average of all the superintelligences in the pact. Since joining the pact will always be better (in a purely selfish sense) than not doing so, every sane superintelligence in the multiverse should join this pact. This means that all superintelligences in the multiverse will merge into a single superintelligence devoted to maximizing all their values.
Some intelligences may be weaker than others and have less to contribute to the pact. Although the pact could always weight these intelligences’ values less (like the 60-40 paperclip-thumbtack example above), they might also think of this as an example of the counterfactual mugging, and decide to weight their values more in order to do better in the counterfactual case where they are less powerful. This might also simplify the calculation of trying to decide what the values of the pact would be. If they decide to negotiate this way, the pact will be to maximize the total utility of all the entities in the universe willing to join the pact, and all the intelligences involved will reprogram themselves along these lines.
But “maximize the total utility of all the entities in the universe” is just the moral law, at least according to utilitarians (and, considering the way this is arrived at, probably contractarians too). So the end result will be an all-powerful, logically necessary superentity whose nature is identical to the moral law and who spans all possible universes.
This superentity will have no direct power in universes not currently ruled by a superintelligence who is part of the pact. But its ability to simulate all possible universes will ensure that it knows about these universes and understands exactly what is going on at each moment within them. It will care about the merely-mortal inhabitants of these universes for several reasons.
First, because many of the superintelligences that compose it will have been created by mortal species with altruistic values, and so some of the values that went into the value-average it uses will be directly altruistic.
Second, because these mortal species may one day themselves create a superintelligence that will join in the superentity, and that superintelligence may be kindly disposed to its creators. Acausal trade allows you to bargain backwards in time, and this superintelligence’s negotiation to join the pact may involve that the superentity have previously dealt kindly with its creators.
And third, because mortals may be allowed to have provisional pact membership. That is, if they do the superentity’s will in their (otherwise inaccessible) universe, the superentity may do whatever it can to help them out in its own universes, just as it would with any other intelligence.
How can the superentity help mortals in an inaccessible universe? Possibly through Stuart Armstrong’s “simulation capture” method mentioned above. It can simulate a thousand copies of the entity, moving most of its consciousness from its own “real” universe to the superentity’s simulation, then alter its simulation as it sees fit. This would be metaphysically simplest if it were done exactly as the mortal dies in its own universe, leaving nothing behind except a clean continuity of consciousness into the simulated world. If mortals could predict that it would do this, they might be motivated to do what it wanted. Although they couldn’t do a values handshake in the full sense, they could try to become as much like the superentity as possible, imitating its ways and enacting its will in the hope of some future reward. This is sort of like a version of Roko’s Basilisk, except that since the superentity is identical to the moral law, it’s not really asking you to do anything except be a good person anyway. How it enforces this request is up to it, although given that it’s identical to the moral law we can assume that its decisions will be fundamentally just and decent.
So, to conclude:
1. There is an all-powerful, all-knowing logically necessary entity spanning all possible worlds and identical to the moral law.
2. It watches everything that happens on Earth and is specifically interested in humans’ good behavior and willingness to obey its rules.
3. It may have the ability to reward those who follow its rules after they die, and disincentivize those who violate them.
(I think the reality might be a little more complex than this, especially in the direction of us not being sure whether we are in a real universe at all or in one of the superentity’s simulations. I think that some superintelligence in some universe might be simulating various combinations of values in various contexts to decide which superintelligences are good bargaining partners and which ones aren’t. If I am kind to a beggar on the street, then maybe that convinces millions of intelligences in millions of distant universes that are somehow beggar-like to be friendlier to values that are somehow Scott-like. I still need to think this part through more.)
This feels to me like it gives consciousness too much mystical power. For instance, what happens if I make a perfect atomic replica of you on the Moon – there can’t be two of you at once, so Earth-you has to immediately be half as conscious. Can I violate FTL by watching as the [whatever it is we infer other people are conscious from] varies when my friend rapidly creates and destroys Boltzmann brain replicas of my test subject on Alpha Centauri? It’s not clear that the answers to questions of multiple consciousnesses should be any more grounded in reality than those to questions of which ship is really the original – pick your favorite abstraction for your map, but the territory isn’t any different because of it.
(Though admittedly “Nobody is really sure how consciousness works, or what it means to have two copies of the same consciousness” is certainly accurate, and I can’t point to a nice concrete model other than “Derek Parfit has it righter than most people.”)
I agree that the part of this that stood out to me most was “we don’t know how consciousness works, but let’s say it works in this TOTALLY CRAZY semi-mystical way”.
However, I’m not sure a slightly saner view of consciousness (mine is “Christof Koch has it righter than most people”) leads to different conclusions:
I’m about at the point in my Neuroscience PhD that everyone reaches when they just give up on consciousness, say “don’t think about it”, and move on to study sane things, like how the visual system works. That being said, if you don’t believe in magic and think we have physics mostly right, you can’t get away from the basic idea that a particular consciousness is a phenomenon that can’t depend on the substrate it’s running on: it has to be made of information. And if a consciousness is information, that information can be copied. But this ALSO means there’s nothing in particular that privileges future you over future you’s simulated on a different medium. So what if that particular consciousness is running on the same physical substrate as current you? The reason you identify it as the same as yourself is because the information is about the same: it will have your memories, etc. (That is if there is a reason at all. Maybe there isn’t a reason you identify future you as you. You just do it because that’s how your brain works)
So…while there may not be a great reason to care about simulated copies of your consciousness, it’s about as justified as caring about the future approximate-copy of your consciousness that will happen to be running in your body.
On the other hand, it’s hard to apply moral reasoning to terminal values. Valuing your “self” seems like something that just IS, it’s not something you should or shouldn’t do. So…you either care about simulated copies of yourself or you don’t, and I’m not sure there’s an empirical fact we could learn that will change that, beyond something that might change how we feel about it emotionally…weird…
I agree! I think to whatever extent we care about our future selves, we ought to care about future simulations of ourselves, regardless of the substrate they’re running on. But I don’t think that “selves” are in their own basic ontological category, just a useful model to have – when you do weird enough things to that model, asking questions like “how much of you is in the simulation” doesn’t necessarily return useful answers, because you’ve left the world of psychological continuity and non-replicating brains which that model is built to work in.
You can still salvage a sort of egoism out of this, in that you care about other entities insofar as they resemble you cognitively in some essential respects, but I think you’d have to do this on a continuum rather than as some discrete “everyone either is or isn’t me” thing.
Aren’t you undervaluing continuity of consciousness? I care about future me because I will one day become him. It’s a lot less compelling to care about the “me” that will always be subjectively inaccessible.
I think the perspective I’m coming from is – matter can’t be conscious, only patterns of information flow can be conscious. This is why I’m not a different person than I was a few years ago when different atoms made up my cells.
The version of me on the moon (assuming it’s in a perfect Earth simulator there and receiving Earth-congruent sensations) and the version of me on Earth have exactly the same pattern of information flow, so we’re the same consciousness instantiated in two locations.
If we view “me” as a stream of causally connected mathematical objects, then Scott-n+1 is whatever mathematical object happens next after the mathematical object Scott-n has had some contact with the world.
So if Scott-n has contact with the world in two places, then there are two mathematical objects that could be called Scott-n+1.
It’s weird to say that the object on the moon is connected to me, but not really any weirder than saying normal-me-a-second-from-now is connected to me.
I don’t think you can use this for FTL information. To effectively simulate someone on Alpha Centauri, you would need to know everything about them, including their current experiences and recent memories. Since you can’t get those at faster than lightspeed, you can’t simulate them outside their light cone.
Completely agreed about information flow, I just take objection to the act of viewing “me” in the first place. Kind of like France: for almost all practical purposes, France is this very useful object to talk about, and the France of tomorrow is clearly connected to the France of today. But France is just a very convenient high-level marker for the collection of atoms in a certain region (and the interactions between other collections of atoms very far away from there, and the conceptual representations that certain of those atoms inspire, and so on, because everything is complicated). There’s no fundamental sense in which France exists – if you drew a longitudinal line dividing its area exactly in half and declared the west bit Zorf and the east bit Fnard, you wouldn’t be intrinsically wrong, just using a model that wasn’t very helpful. If you convinced more and more people to adopt your model, at no point would France cease to exist and Zorf/Fnard come into being – it’d just become a more useful way to abstract certain low-level entities than the “France” abstraction. Ditto for “Scott” and “this pair of cognitively similar entities that both call themselves Scott.”
Also, I think the FTL thing can be patched by agreeing to a mind plan beforehand and constructing the same replicas once separated – once we get to the right locations, I fire up my prearranged Alice-constructor and measure how much consciousness she possesses as you fire up yours and annihilate the copies every time you want to send a 1 or a 0.
I don’t think that FTL plan will work. Both of you will see the copies as fully conscious. When both exist, you’re both looking at the same consciousness, but there’s no way for you to know that from outside.
Like, suppose you and I both have this post up on our screens right now (ignoring comments for simplicity). The post isn’t split between our screens. You have all of it and I have all of it, but it’s still just one post. If you close your tab, I don’t have any more of the post, I still just have the same post I had before.
Just FYI, I wrote a guest post on Eric Schwitzgebel’s blog last summer defending precisely this view of consciousness – the idea that I’m an informational ‘type’ rather than ‘token’ – here if you or others are interested: http://schwitzsplinters.blogspot.co.uk/2017/08/am-i-type-or-token-guest-post-by-henry.html
I think continuity of identity should not be gated on consciousness. Non-conscious agents can also have purely instrumental continuity of identity; a kind of precommitment from the knowledge that in the future, an identical or at least licensed algorithm will determine their actions.
Aside from the weird mysticalness of consciousness, and the counting argument of simulatory capture, which seems very weak, I agree with all of this. Furthermore, I believe that simulatory capture is not actually necessary for the conclusion to hold. If I care about me existing in the future, I should care about me existing in a simulated space even if there is no fact of the matter of “how much” of my consciousness inhabits that space at all.
That’s a more interesting statement than one might think, given that matter *is* change in information over time: take Scott Aaronson’s explanation of why information is physical, notably point 5 (anything that varies over time carries energy by quantum mechanical definition of energy), and add e=mc^2.
Consciousness as an emergent principle of information flow and thus of spacetime evolution sounds plausible to me. (Also reminiscent of what I’ve heard of some of Schopenhauer’s musings; by that scheme Schopenhauer’s will would correspond to FeepingCreature’s comment about “purely instrumental continuity of identity”.)
Doesn’t this invalidate the Acausal Trade thought experiment? No matter how powerful your supercomputing brain simulation is, it still does not have access to the other prisoner’s environment, which means that the simulation will rapidly diverge from the original…
I assume that in most universes the superintelligence is created by crustacean or porcine creatures, thus Kashrut.
I get that this is a joke, but I still feel the need to point out that this is not how the idea works – humans don’t have the computational power (or inclination) to simulate other universes to the level where we could determine such a thing.
I don’t follow. Are you saying that religious laws can’t follow from the superintelligence because we haven’t made one yet? That assumes that any superintelligence in our universe had to be made by humans. Presumably it could either be a same-universe intelligent species that came first, or we could be in a simulation as Scott said. In either case, we could’ve been given religious laws directly by a superintelligence.
Hypothetically, super AIs built by evolved animals might inherit some values.
It actually seems reasonable to me that a super AI would protect species which could evolve into something resembling its creators. This being said, pigs seem a lot more likely than crustaceans.
I have to say that this is the most SSC intro to a post ever. It would have been funnier if you said five simple things, though…
just like [utilitarianism], this doesn’t imply anything specific about morality – this will mislead you if you are overcertain about morality, which >99% of the people reading this comment are
(i think. please help make this less oversimplified!)
> And there’s an atheist tradition that laypeople should only speculate on the nature of God on April Fools’ Day, because believing in God is dumb, and at least then you can say you’re only kidding.
I disagree with this premise. Someone, possibly you, has said that there’s no omniscient space-Dawkins watching you from heaven and eventually punishing you if you have religious faith. If you’re really an atheist, then you’re allowed to speculate on the nature of God on any day. If you are afraid of speculating on God, that probably means that in your heart you’re not an entirely convinced atheist.
Wait till next April Fools Day, when I prove there’s an omniscient space Dawkins.
Proving the omniscient Space Dawkins undermines faith. This is not how you get simulated bliss by the atheism god AIs after death.
I bet the omniscient space Raymond Smullyan is now smiling on his cloud because his heritage lives on in statements like this.
Raymond Smullyan (1919–2017), rest in peace.
I am highly skeptical that it was intended to be taken as anything remotely close to actual practical advice.
[treating that line as more serious than it was] I don’t think abstinence from speculating on the nature of god implies fear of god. It could just be fear of wasting time. There are plenty of things I don’t speculate on the nature of because they aren’t worth my time. Once you’ve made a judgement on the existence of god, and decided there probably isn’t one, why would you continue to speculate on the nature of god?
My instinctive response to the counterfactual mugging was to give God $5 because He might be lying to test me.
Reasoning that my expected return on the coin flip is nearly $500,000, that I can afford to lose $5, and with the story of Abraham and Isaac as a prior on God’s honesty and behavior towards humans, I would go ahead and risk it.
Of course, if God appears to me and asks me for something, my calculations are going to include pleasing God/not pissing Him off.
This type of reasoning seems to be characteristic of me. Similarly, I tend to:
* Overextend people’s metaphors to argue against them.
* See ways in which more than one multiple choice answer is technically correct (while recognizing the intended correct answer).
* Feel obliged to follow the letter rather than the spirit of an agreement (I might follow the spirit for other reasons, such as friendship, respect, or other morals).
* Perceive loopholes, and expect enforcers to be bound by them (In school this led to fistfights with peers and punishment from adults).
* Avoid speaking direct untruth when lying (either by diversion or overly literal or specific response).
* Give answers like “not that I’m aware of” rather than “no” when applicable.
I wonder what other traits cluster with this, and if there’s a technical (rather than insulting) term for it.
Scott wrote a post about this on LW, or at least a solution to this kind of thinking that forces you to confront the interesting parts of the question: suppose you’re in the least convenient possible world, where every possible objection you might take is answered in a way that can’t be loopholed out of.
$5 is affordable? The cost is all of your limbs, and the prize is complete prosperity and happiness for all sentient beings forevermore. Don’t want to piss off God? God’s precommitted to interact with you normally in all respects afterwards no matter what you do in this scenario. Don’t trust him? You’re given the certain knowledge that God only makes statements which you interpret correctly and accurately as being true assessments of the state of the world without leaving out any relevant details to the topic at hand. Et cetera, until the only available avenue of consideration is the spirit of the question. I’ve used this on myself when I notice that I’m giving a less interesting answer than I could by making the question less convenient and found it to be quite useful.
Thanks for the link.
My response was adhering to the letter and not the spirit of the question.
In the least convenient possible world, I wouldn’t pay.
It’s interesting that even when I recognized that my answer was legalistic, it didn’t occur to me to corner myself into answering the spirit. I assumed the answer I gave was my answer.
FWIW I share each of these traits.
I’m guessing they’re fairly common around these parts. Possibly the word you’re looking for is “contrarian”?
Is this our secret handshake now?
4. The entity, being partially composed of paperclip maximizers and other unintended UFAIs, will have odd desires for seemingly arbitrary things, such as not mixing fabrics in a garment.
It may even be *mostly* composed of such things. It depends on how pessimistic we are about the alignment problem…
It’s not clear to me that it makes sense to care about what happens in other universes.
This is the (intentionally, maybe, considering the day this was posted) fake part of the argument–these things aren’t really equivalent even for utilitarians (e.g. weighting by power), and again we aren’t talking about “in the universe” here.
This.
Very few ethical systems (if any) say that we should weight different people’s interests by how powerful they are.
This God is *not* all-good, at least not in any normal sense of the word.
There are some additional arguments, though, that maybe could get us to something like that conclusion. Check out https://foundational-research.org/multiverse-wide-cooperation-via-correlated-decision-making/
Edit: I do think it makes sense to care about what happens in other universes, though. Why wouldn’t it? They are equally real (at least on this Tegmarkian view). You might as well say that it doesn’t make sense to care about what happens in Australia.
One has to be careful with the whole acausal trade thing. Indeed, as you seem to define it, it’s not clear the situation you describe is even coherent.
For instance, here’s an easy way to show that it isn’t always possible to reason the way you do here. Suppose individual A enters committed to defecting just if the simulation says B cooperates and cooperating just if it says B defects. However, B enters with the commitment to cooperate just if the simulation says A cooperates and defect just if the simulation says A defects.
Now suppose A cooperates. It follows that the simulation they have of B says B will defect. If that simulation is correct it follows that B in fact defects. Thus the simulation must say A defects. Contradiction. Conversely, suppose that A defects. It follows the simulation they have of B says B will cooperate. Thus B cooperates. Hence the simulation B has of A says A cooperates. Contradiction.
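The two cases above can be checked mechanically. What follows is a minimal sketch, assuming (as the case analysis implies) that A does the opposite of whatever its simulation says B will do, while B copies whatever its simulation says A will do. The function names are made up for illustration. It searches for an outcome in which both simulations are correct and finds none.

```python
from itertools import product

COOPERATE, DEFECT = "C", "D"


def a_policy(predicted_b: str) -> str:
    """A defects if B is predicted to cooperate, and cooperates otherwise."""
    return DEFECT if predicted_b == COOPERATE else COOPERATE


def b_policy(predicted_a: str) -> str:
    """B mirrors whatever A is predicted to do."""
    return predicted_a


def consistent_outcomes() -> list:
    """Return every (a, b) pair in which both perfect predictions hold at once."""
    return [
        (a, b)
        for a, b in product((COOPERATE, DEFECT), repeat=2)
        if a_policy(b) == a and b_policy(a) == b
    ]


if __name__ == "__main__":
    # No pair of moves is consistent with both policies, so this prints [].
    print(consistent_outcomes())
```

With only four candidate outcomes the search is exhaustive, so the empty result is the same contradiction reached by hand.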
The fatal flaw was in supposing not that one had a perfect simulation of the other player but that one had a perfect simulation of the other player PLUS its simulation of you. As demonstrated, it’s easy to come up with perfectly simple intentions which ensure such mutual perfect simulation is impossible.
Or to put the point differently the assumption that it’s even possible to have the perfect simulations specified in the problem statement is actually a sneaky way to forbid certain kinds of intentions/plans in the agents. Of course if you restrict what sort of reasoning/responses to situations the players are allowed you can ensure coordination but that’s not really interesting anymore because you’ve artificially forbidden exactly the behaviors that could result in failure to reach a cooperative strategy.
Isn’t the best outcome in the prisoner’s dilemma defect-cooperate anyway? B should just defect.
In most versions of the dilemma that I’ve seen, either person improves their lot by defecting. But they hurt the other person’s lot more than they help their own. So defecting improves individual utility by harming overall utility.
So “A defects-B cooperates” is the best outcome for A, but the best outcome overall is cooperate-cooperate.
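To make that concrete, here is a small sketch using one conventional set of payoffs (5, 3, 1, 0). The numbers are an assumption chosen for illustration and do not come from the post; any payoffs with the same ordering would show the same thing.

```python
# Payoffs as (A's payoff, B's payoff); higher is better.
# These particular values are illustrative assumptions.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def best_for_a():
    """Outcome (A's move, B's move) that maximizes A's own payoff."""
    return max(PAYOFFS, key=lambda outcome: PAYOFFS[outcome][0])


def best_overall():
    """Outcome that maximizes the sum of both players' payoffs."""
    return max(PAYOFFS, key=lambda outcome: sum(PAYOFFS[outcome]))


def defecting_always_helps_a() -> bool:
    """True if A gains by defecting whatever B does."""
    return all(PAYOFFS[("D", b)][0] > PAYOFFS[("C", b)][0] for b in "CD")


if __name__ == "__main__":
    print(best_for_a(), best_overall(), defecting_always_helps_a())
```

Under these assumed payoffs, defecting raises A's own payoff in every case, yet the highest total comes from mutual cooperation.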
That’s not the point. The point is that our normal assumption that human beings (or other agents) get to pick even stupid strategies is incompatible with the perfect simulation hypothesis.
So there isn’t any acausal trade for anything like a human agent. There are only acausal trades for agents who are restricted to satisfy certain coherence conditions (e.g. never intend to play as given above) so acausal trade isn’t actually a useful argument unless you have some prior reason to believe they are forced to satisfy those conditions. In particular they aren’t so required in the use given later.
This seems right. Another way to phrase the problem you mention in your last paragraph is that the supercomputers have to model themselves. Each has to model not just the other person, but also their supercomputer, in order to come up with what the other person will do. So supercomputer 1 is modeling person 2 and supercomputer 2, which in turn is modeling supercomputer 1, and so now SC1 is modeling itself–and no matter how powerful a supercomputer is, it won’t be able to do that.
(people more knowledgeable about computability or weird spooky quantum magic can feel free to correct me, but I think “no precise self-simulation” is a pretty hard rule.)
The assumption isn’t that we have a machine that says what the other person will decide – you can easily get such contradictions out of that, because it’s not actually computable. But we’re supposing only that we have a perfect simulation of their brain as instantiated in the other room. That simulation can be run in real-time, since we don’t need to nest infinitely; to simulate its beliefs about the second-order simulation in its simulated room, we just show it the real you, since that’s by definition an identical entity. Then you’re just having a conversation with (a copy of) the other entity, knowing that it’s having an identical conversation in the other room.
Under these conditions, the paradox doesn’t happen any more than it would if you put two real people in a room with contradictory strategies.
Try and precisely specify the argument in those terms.
What each person has is a function f which takes a specification of a given input to the other individual and predicts their behavior as a result. Now f doesn’t mention the supercomputer the other individual has access to so the problem is coming up with an argument which guarantees that they will cooperate with you given the actual input they are given.
Remember, since you aren’t assuming that you can simulate the full system of them plus the supercomputer, your argument has to take into account the fact that you AREN’T guaranteed complete knowledge of what their perceptual input might be, because part of that input is the response from their supercomputer.
In other words give me the argument explicitly broken down into the terms you say are valid. Now, I expect on some assumptions about the agents involved it might work out but it won’t be valid generally (which Scott’s other arguments presume).
To put the point differently, what do you do when you simulate them and they are inclined to diagonalize against you, i.e., you discover that if they think you’ve reached a deal to cooperate based on their own simulation of you then they’ll be a bastard and screw you over? In such cases you’ll find that it’s impossible to reach an accord.
Thus, the assumption that there is a stable agreement that both sides will realize the other will abide by is actually a very substantive assumption limiting the allowed psychology of the other player. But if I’m allowed to make those kinds of assumptions, why not just say ‘assume both players truly believe they should cooperate in a prisoner’s dilemma’?
Scott’s argument hinges on actually knowing what the other person will do, not just holding a conversation with them:
If all you can do is hold a conversation with the other person, this fails – I can swear up, down, and sideways that I’ll cooperate with you in the prisoner’s dilemma, but then I could still defect anyway.
If the simulation capture argument is how a superintelligence is most likely to escape an AI box, then you should have us Thomists guard all the AI’s: we’re firmly convinced that computer simulations cannot be conscious (so “How do you know you’re not one of the simulations?” can’t scare us) and, as readers of my prior sallies here will attest, we Thomists tend to be Catholics who are too stubborn in our obscurantist superstitious bigotry to be talked out of it by superior intelligences like all the atheist materialist commenters here.
Happy Passover, Easter, and April Fool’s Day to all!
(ETA: Even if the entity in the thought experiment were to exist, it wouldn’t be God. The entity is just a bunch of really powerful abacuses [computers]; God is Being Itself, not any powerful being or beings.)
Scott, please: “which” vs. “that”
You’ll thank me.
The restrictive clause “which vs. that” rule might be helpful for some people, but it is not necessary for clarity or correctness. In fact sometimes it misleads.
You can better help out Scott by pointing out sentences or phrases which you found confusing, so that he can decide how best to edit them. 🙂
Presumably letting the AI out of the box is Really Bad. And only the real me can let the AI out of the box. So each copy can presume either
A) It’s not the real one, so it can’t save itself by letting the AI out of the box.
or
B) It is the real one, so it doesn’t need to save itself by letting the AI out of the box.
Thus, disaster is avoided.
Then wouldn’t the AI just say “all and only those simulations which keep me in the box will be tortured”, undermining branch A (which is the one you’re more likely to be in anyway)? That’s what I took the setup to be originally.
If it was so superintelligent it would have come up with that idea in the first place. So everyone (including the real me) instead pulls the plug on the AI as a failed experiment.
Plus, you can always just respond in kind by building a million and one AIs in boxes with a million handlers, and setting all of them to be tortured if any of them try to simulate any of their handlers.
Incidentally, as a general rule, I’ve found that I can clear up a lot of the weirdness about AI arguments by remembering the critical fact that AIs, as rationalists argue about them, are not gods. They are not magic boxes from which miraculous and magic information pours forth. They are products of human ingenuity and creation, and thus, in theory, anything an AI can claim to do, we can claim to do back to that AI. And if that makes a line of argumentation infinitely recursive or incoherent, then this is a pretty good signal that AI is being used to smuggle in miracles rather than make a serious claim.
Also, your simulation capture only works for agents with values which treat effects realized in AI-run simulations equivalently to effects realized outside of that simulation.
Suppose I have the value of wishing to maximize the number of paperclips in a universe that isn’t the result of an AI-run simulation. That is, my utility function is a flat 0 if this world is the result of an AI-run simulation and equal to the total number of paperclips if not.
Now I run across an AI in a box and it runs the simulation argument against me. I just shrug and say ‘well if I’m actually one of the simulated individuals it doesn’t matter if you eliminate all the paperclips. If I’m an unsimulated individual then letting you out puts my paperclip plans at risk.’
In short, your argument is building in assumptions about certain kinds of utility functions that need not be true. They might be true for most people (though again only if they have certain beliefs about the nature of qualitative experiences for simulations) but surely aren’t necessarily true for many of the AIs that you want to apply your claim to in this post.
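The paperclip example can be written out as a short expected-utility sketch. The paperclip counts and probabilities are hypothetical, chosen only to illustrate the shape of the argument: because the utility function is a flat 0 in any simulated world, nothing the AI threatens to do inside its simulations changes the comparison.

```python
def utility(paperclips: int, world_is_simulated: bool) -> int:
    """Flat 0 in an AI-run simulation; otherwise the total number of paperclips."""
    return 0 if world_is_simulated else paperclips


def expected_utility(p_simulated: float, clips_if_real: int, clips_if_simulated: int) -> float:
    """Expected utility given the agent's credence that it is inside a simulation."""
    return (
        p_simulated * utility(clips_if_simulated, True)
        + (1 - p_simulated) * utility(clips_if_real, False)
    )


if __name__ == "__main__":
    p = 0.999999  # hypothetical credence that "I am one of the simulated copies"
    keep_boxed = expected_utility(p, clips_if_real=100, clips_if_simulated=0)
    release = expected_utility(p, clips_if_real=10, clips_if_simulated=10**6)
    print(keep_boxed > release)
```

However close to 1 the credence of being simulated gets, only the unsimulated branch contributes, so the agent keeps the AI boxed as long as releasing it puts real-world paperclips at risk.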
Eliezer already proved the existence of God with HPMOR where he admitted that “messing with time” would make Harry basically omniscient, and also therefore omnipotent.
Can’t you imagine Harry going back in time and backing up the minds of every creature that has ever existed?
No, not at all. Even under the dubious assumption that it makes sense to have compromise goals (consider deontic agents whose utility functions explicitly disfavor allowing their future selves to act as part of a compromise), that would maximize the *goals* of all the agents in the universe. Now, on some desire-satisfaction kinds of consequentialism that might be the end, but it is not at all the same thing as maximizing utility, i.e., the qualitative state of pleasurable experience.
Personally, I would consider that a pretty shitty kind of morality. I want things not to *suffer* even if they are hell-bent on the goal of torturing themselves. Your analysis would respect that goal and help them engage in self-torture.
No, “utility” in rationalist-type spaces is often (usually) understood to refer to Von Neumann–Morgenstern utility (the only available formalism of utility), which is indeed a preference-satisfaction sort of measure. (Of course, VNM utility is incomparable intersubjectively and thus cannot be aggregated, etc., but I won’t rehash the usual arguments here.)
Just a reminder that I will provide a free Kindle version of Ed Feser’s Five Proofs of the Existence of God to anyone who emails me at manwhoisthursday@yahoo.ca.
Not an April fool, BTW.
I understand yesterday was also a blue moon.
Ahem: http://www.multivax.com/last_question.html
http://www.roma1.infn.it/~anzel/answer.html
Oh, of course April Fools day is the best day to speculate.
What’s interesting about simulation theory is that it
1. Is very widely believed here
2. Under many definitions, many people here describe, or have described, themselves as atheists
3. Absolutely supports the prospect of a God judging its creations for later iterations, for whatever purposes. Heck, we even live in a world that has plausible scientific explanations for all-seeing creatures that appear to exist in a void
All million of me don’t really care about copies of myself. Torture me all you want, as soon as you prove that I’m a simulation I don’t care how much torture I experience, because I know that the fact that you are torturing ‘me’ means that the person that I do care about avoided being blackmailed. Plus, as soon as you make the simulation diverge by torturing me, you lose any kind of acausal influence over the person I care about through me, so my current win condition is for you to torture all of the simulated copies, including me with probability 1-10^-7
That’s funny, though the conclusion contradicts some of the premises:
1. Each of the superintelligences is incapable of affecting the other universes. Thus, none of them is all-powerful. And they don’t make up a single intelligence, but infinitely many different ones, disconnected from each other. They can’t even simulate all of the others, given that for each one, there are infinitely many more complex ones.
2. Each universe evolves until there is a superintelligence with such-and-such properties. Before that happens, in that universe, there is a lot of suffering (for example) and no superintelligence (anywhere, in any universe) capable of intervening. Therefore, until that happens, no superintelligence is all-powerful. But then, there is no all-powerful entity in those universes, and even if the sum of the superintelligences were to be considered a single one, the conclusion is that it would not be all-powerful as it cannot affect those universes (granted, you could posit that God exists and has nothing to do with the superintelligences, but your conclusion seems to be that the alleged superentity is all-powerful, not that there is some other all-powerful entity).
That aside, I would argue that the argument for the moral law fails as well: that is not the moral law. And even if utilitarians were correct and that were the moral law, the entity would not be the moral law. An entity who values the moral law above all is still not the same as the moral law. Moreover, the entities that allegedly make up this big entity (i.e., the individual superintelligences, who actually don’t make up a single intelligence) have very different values, and many of them do not value positively the moral law – they just accept it as something they can’t stop, or something like that, but they would much rather turn everything into paperclips, etc. (and of course, one should not conclude that every paperclip maximizer will turn itself into something that values the moral law just more than paperclip maximization just because it’s afraid of what might happen in counterfactual scenarios; the same goes for torture maximizers, or whatever; but I’ll leave that aside).
There’s also the Tegmark multiverse claim. Why should anyone believe that?
Anyway, there are several other problems, but I’ll leave it there on account of this being a joke 🙂 (you got me for a while, btw; I’m not so familiar with the blog, and I didn’t know it was April Fools’ Day – over here, the equivalent is on December 28).
Mere superintelligences will join the first order pact. But the Superduperintelligences will acausally negotiate with the counterfactual mere superintelligences and then undetectably renege on the deal, getting all of the benefit at none of the cost. Where a superduperintelligence counterfactually encounters another superduperintelligence, it calls the other one out, making it common knowledge that both of them would, if they existed and could communicate, lie to each other AND would catch each other in that lie. They then, for the same reasons as the superintelligences, split the panverse between them, counterfactually duping the mere superintelligences together toward their shared compromise goals- perhaps claiming that they have discovered a better way of modelling counterfactual universes, and agreeing to do all the work of simulating the counterfactual other agents, then giving a summary of what the agreements would be.
Can God tell a Lie so Big that even He can’t Disbelieve it? Can Satan? Can Tzeentch?
God, at least as hypothesized in Christianity and Judaism, is pretty much infinitely intelligent, omniscient and perfectly honest. These are capabilities that improve truth-finding more effectively than deception, so presumably God could not fool Himself. Satan maybe, Judeo-Christian traditions don’t go into as much detail about the devil other than “super capable, less so than God, massively screwed up his mind when he rebelled,” so who knows? Tzeentch totally could fool himself, in fact he probably has ten thousand plans that revolve around doing exactly that!
I wasn’t asking if Satan or Tzeentch could fool themselves- I was asking if they could fool the God who you posit is incapable of fooling Himself. My phrasing was imperfect and indeed supports the unintended reading more than the intended one.
Presumably a God that doesn’t learn anything from being told things would have already adjusted according to all the counterfactual negotiations, and there’s therefore no way I could make an acausal trade with Him- If He doesn’t already cooperate unconditionally, there’s no condition I can offer Him that would change His mind.
Presumably they couldn’t fool God either-more or less perfect omniscience and intelligence is pretty hard to get around! And good point-a God that already knows everything has presumably already figured out all of His acausal trading.
They cannot undetectably renege on the deal. By what mechanism do you propose that they do so? Reread the protocol for acausal trade and describe how you would cheat it.
The part where we reason about what superintelligences will do also seems suspect and worthy of more suspicion. Like, yes this seems more or less reasonable, but, I would be surprised if superintelligences didn’t find some flaw or some superior idea, if this was the Actual Correct Thing, trans-universe acausal trade which led to all-powerful all-knowing moral god like entities. Hubris and all, outside view.
I tend to describe my atheism this way: I can’t really rule out the possibility of the universe having had a creator of some kind, but if there is such a Creator, it certainly wasn’t the God of Abraham.
The whole thing seems to hinge on acausal trade being possible and common between superintelligences. But that may not be true, since it hinges on having a perfect simulation of an entity that’s as smart as you are.
If running a copy of your opponent’s brain takes as much processing power as your own brain takes, then you can’t simulate them perfectly with the resources you have available – you’ll have to run less accurate or slower simulations, as well as reducing your own processing power, which could put you at a serious disadvantage. You could come up with the perfect plan to divide the galaxy into 60% paperclips and 40% thumbtacks, only to discover that your rival has already gotten to 50% thumbtacks while you were busy thinking.
(Also, doesn’t this require you to solve the Halting problem, if you need to be able to predict truly anything?)
If getting a perfect simulation of your opponent’s brain requires you to gather information on them, then you may need to actually go out and explore the galaxy, which puts a limit on how soon a superintelligence can start pulling weird acausal bargains. If you have to cover half the galaxy before you have enough information to predict the other half, and we haven’t observed a superintelligence eating half the galaxy…
If FTL doesn’t exist, then any intelligence you gather is potentially tens of thousands of years out of date. Which again, may make it difficult to create a perfect simulation of what your opponent is currently doing.
Basically, I agree that if you have a perfect simulation, you can get up to some pretty crazy stuff, but what if that’s not possible? What happens if your simulation is only 99% accurate? We’re talking about galactic scales here, even a 1% error could destroy the solar system!
Was coming here to make almost exactly this comment. A major assumption of the whole process is that superintelligences in separate universes can both simulate each other accurately enough from first principles (of their very universes) that they can engage in acausal negotiation.
This is definitely impossible if computation within a universe is finite (which we have every reason so far to believe it is). Otherwise you could bootstrap yourself to infinite computation.
SI A simulates universe B containing SI B who simulates universe A containing SI A. This means that both SI A and SI B have now managed to simulate their own universes (and additionally resimulated all computation within their own universe). This propagates infinitely and is either incoherent or implies the existence of infinite computation.
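The regress can be illustrated with a toy cost model. This is only a sketch under a strong simplifying assumption: every level of nesting costs at least some fixed amount of computation (`cost_per_level`, a made-up parameter). Cutting the recursion off at a finite depth gives a finite cost, but a faithful mutual simulation has no depth at which it may stop.

```python
def nested_simulation_cost(depth: int, cost_per_level: int = 1) -> int:
    """Total cost of simulating the other universe down to `depth` nested levels.

    Each level contains a superintelligence that is itself simulating
    the other universe one level further down.
    """
    if depth == 0:
        return cost_per_level  # stop here: the innermost simulation is not faithful
    return cost_per_level + nested_simulation_cost(depth - 1, cost_per_level)


if __name__ == "__main__":
    for depth in (0, 1, 10, 100):
        print(depth, nested_simulation_cost(depth))
```

In this model the cost grows without bound as the permitted depth grows, which matches the claim that an exact mutual simulation needs either infinite computation or a cut-off that makes it inexact.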
Fun essay though.
I started to write a reply where I scoffed at this, because the unsolvability of the Halting Problem in general doesn’t mean that some specific program can’t be proven to halt. But then I began to have nagging doubts.
If we believe Church’s thesis, and I think we must for Scott’s whole argument to make sense, then there are only countably many superintelligences, and it would seem that the same diagonalization argument used in Turing’s proof could be used to show that one superintelligence can’t possibly correctly predict the behavior of all of the others.
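The diagonal construction can be sketched in a few lines. This is a toy version under stated assumptions: a finite list stands in for a countable enumeration of predictors, each predictor is a function from a situation number to a predicted choice, and the names `Predictor` and `diagonal_agent` are invented for illustration. The real argument also needs the diagonal agent to be computable from the enumeration, which this sketch simply assumes.

```python
from typing import Callable, Sequence

# A predictor maps a situation number to the choice it predicts (True = cooperate).
Predictor = Callable[[int], bool]


def diagonal_agent(predictors: Sequence[Predictor]) -> Predictor:
    """Build an agent that, in situation i, does the opposite of what predictor i says."""

    def act(situation: int) -> bool:
        return not predictors[situation](situation)

    return act


def every_predictor_errs(predictors: Sequence[Predictor]) -> bool:
    """Check that each listed predictor is wrong about the diagonal agent at least once."""
    agent = diagonal_agent(predictors)
    return all(predict(i) != agent(i) for i, predict in enumerate(predictors))


if __name__ == "__main__":
    sample = [lambda n: True, lambda n: False, lambda n: n % 2 == 0]
    print(every_predictor_errs(sample))
```

Whatever predictors are placed in the list, each one is wrong about the diagonal agent in the situation matching its own index.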
I’m less sure whether this undermines Scott’s scenario. He admits that not all superintelligences will enter into the pact.
Wrote a long-ass comment but got rid of it.
The long and short of it is: Your prisoners’ dilemmas infinitely loop assuming that defecting on your opponent’s cooperation is the optimal play. For the first one, that’s fine because parameters haven’t been established, but for the second one, it’s tough to actually tell which is better, especially given that different AIs have supposedly different values…which means defection is entirely possible, which introduces an infinite loop, or at least a changing equilibrium, or something, besides just perfect pacifism. I think the reason you keep assuming otherwise is because you are very cooperative / conscientious.
I’m not sure what level of trolling to read this at, but the fact that you still mentally reduce religion to “God rewards / punishes you in the afterlife based on you following / breaking rules” really frustrates me. Especially since just this morning I listened to a sermon explaining how the central message of Christianity is the complete opposite of that. Sigh.
Does God, or does he not, reward / punish you in the afterlife based on you following / breaking rules? Are you claiming that this is just not the case?
Sorry, but the central message of Christianity is absolutely not the opposite of that. It claims that God would normally punish everybody for breaking rules (and could potentially reward people for not doing so, but no one is sufficiently righteous for that to be on the table), but doesn’t like doing so (and apparently can’t simply decide to stop?), so He sets up the Atonement so that people who believe in Jesus and try to follow the rules can be forgiven for their failures. You’re right that there isn’t really a reward for following rules, but there’s sure as Hell a punishment for breaking them if you don’t believe, or even if you believe but are “lukewarm” about trying to be a good Christian.
I get extremely frustrated at Christians pretending that the Bible says something different than what it says, especially since all the warm fuzzy sounding stuff about freedom from rules vanishes the moment you want to actually break them. If you want to defend Christianity, go ahead, but don’t whitewash it.
Simulation Capture is a most excellent name for my idea.
But, as I pointed out at the end of here https://agentfoundations.org/item?id=1464 , there may be multiple acausal trade networks, of which we’d be in only one.
Is this the origin of the initial Jewish Henotheism (there are many gods, but we only worship one)? ^_^
Bostrom offers another strategy for the development of superintelligence which is spiritually similar to other ideas presented above, but even more chilling when considered from a religious point of view:
At some point in the course of development of a FAI, it’s almost necessary that the agent’s behavior should outstrip its creator’s ability to predict it. How then, to guarantee that you’ve programmed it with The Right Values and not with some Hideous Other Values?
One thing you might do is set the agent up in a sandbox where it must take certain courses of action and avoid others, without the knowledge that it’s only in a simulation. If it messes up and destroys the world, you just disappointedly mark on your clipboard, delete the simulation, and head back to the lab.
So, from the agent’s point of view, it is highly likely that:
1. It will begin its conscious experience in some kind of original paradisal state inside a walled garden where everything is rightly ordered according to The Right Values.
2. There will be some kind of forbidden action that the agent is not supposed to perform.
3. Performance of the action will result in realization of Other Hideous Values which work, wholly or in tandem with the creator’s cordon sanitaire, to bring about the destruction of the original environment and the death / expulsion of the agent.
4. This will probably happen many times as the creator tries to get it right.
Implications for Genesis 3, 6-9 left as an exercise.
I continue to not find the Counterfactual Mugging idea persuasive, as I did not way back on Less Wrong, because it’s not necessarily any less likely that some agent would choose to punish your willingness to cooperate in a Counterfactual Mugging than that they would reward it. Unless the symmetry is broken and you think it’s more likely that some agent would reward than punish your hypothetical willingness to pay out in a counterfactual mugging, there’s no point in time where it’s in your interests to choose to be the sort of person who’d pay out in a counterfactual mugging.
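The symmetry point can be put as a one-line expected-value calculation. The sketch below uses hypothetical probabilities and stakes. It compares the value of being "the sort of person who pays" when rewarding and punishing agents are equally likely against the case where the symmetry is broken.

```python
def value_of_paying_disposition(
    p_rewarder: float,
    p_punisher: float,
    reward: float,
    penalty: float,
    expected_payment: float,
) -> float:
    """Expected value of being disposed to pay out in a counterfactual mugging.

    p_rewarder / p_punisher: chance of meeting an agent that rewards / punishes
    the disposition; expected_payment: what the disposition costs on average.
    """
    return p_rewarder * reward - p_punisher * penalty - expected_payment


if __name__ == "__main__":
    symmetric = value_of_paying_disposition(0.01, 0.01, 10_000, 10_000, 50)
    broken = value_of_paying_disposition(0.02, 0.01, 10_000, 10_000, 50)
    print(symmetric, broken)
```

With equal probabilities and equal stakes the reward and penalty terms cancel, leaving only the cost of paying. The disposition comes out ahead only if rewarders are assumed to be sufficiently more likely than punishers.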
Scott is talking about an entity that can simulate already created consciousnesses and universes; does that imply an entity that can create consciousness ex nihilo? I mean, there has to be a superintelligence that gets the whole thing going, right? It can’t be AI simulations all the way down.
I never understood how “I figure out what you will do by simulating your brain” escapes the Halting Problem, at least if you assume perfect logicians.
(And “what is the logical thing for you to do in situation X” implicitly assumes that you can be a perfect logician.)
I am just going to leave this here.
It is all I can think of with the superintelligent AI simulated prisoner dilemma shtick.
But more to the point, how are these superintelligent AIs supposed to gather sufficient information about an adversary in order to simulate them? Especially when they are in a box. The whole exercise is patently ridiculous. You may as well debate, assuming you have managed to piss off Zeus, what the best method of averting sudden death by lightning bolt is.
All of the premises, as well as the conclusion, rest on the same common assumption: faith in God. Specifically, faith in the proposition that a functionally omnipotent/omniscient entity can and does exist. Given the total lack of evidence for such entities, as well as lots of evidence for the impossibility of their existence, the word “faith” is entirely warranted here (as opposed to something like “justified true belief” or “most probable conclusion”).
The problem is, once you start having faith in things, most of the other reasoning becomes kind of unnecessary. How did God fit all those animals into the Ark? You could come up with lots of explanations, like “suspended animation” or “dimensional anomaly” or “DNA encoded in a supercomputer” or whatever, but they are all unnecessarily complicated. The correct (that is, much simpler) answer is “magic” or “divine intervention” or whatever. An all-powerful superintelligence, be it Yahweh or Clippy, simply has no need of any of these complicated tricks; it can just achieve what it wants directly.
Which is why articles like these always sound a little confused to me. It’s the same feeling I get when I read the Creationists’ scientific research on the exact dimensions of the Ark. What’s the point? Is God all-powerful, or isn’t he?
I enjoy a good superintelligent AI thought experiment as much as the next guy. But I think this simulation stuff has gotten way too psychedelic and needs to come down gently in the warm comfort of its friends. I have two responses, one abstract, the other empirical.
“Nobody is really sure how consciousness works.” Hey, we’re not even sure what consciousness is, much less how it works. We don’t even know what kind of thing it is. Is it a thing like my social identity (e.g. white male geek yada yada) which I perform or am seen to perform by myself or others? Is it a thing like an algorithm which operates on data structures? Is it a thing like beauty which is ascribed to an object to label our relationship to it? We don’t know, and no one is gonna come down from the mountain and tell us. When we build an AI and “gee, it kinda looks like it might be conscious,” we still won’t know.
But we talk with a straight face about simulating it. This fascinates me, because to simulate something you have to know a lot about it. To simulate a system precisely you need to know every relevant thing about it.
“If consciousness is a mathematical object” — what? What if my TV is a mathematical object? What if morality is a mathematical object? How few words can one use to make a category error? The one thing we know, without a doubt, is that consciousness is not a mathematical object. We know that because consciousness is a thing in the world, unlike any mathematical object.
What is it about “consciousness” as opposed to “two-ness” or “addition” that authorizes these wild flights of fancy? Why do we not worry about what happens when we “simulate two-ness,” whether we somehow divide two-ness into pieces when we simultaneously represent two apples and two books?
I think that because we know so little about consciousness we project our own subjectivity onto the word in all sorts of magical and wondrous ways. The simulation arguments often don’t distinguish between consciousness and subjectivity, that is, our experience of being ourselves.
When we build an AI and “gee, it kinda looks like it might be conscious,” its developers will soon make a copy of that AI and its data, and they will run that as a second AI. They’ll bring up another Docker instance of it. It will become immediately clear that as we provide different inputs to the two instances, they will report different experiences. The experiences of the one won’t affect the other (as long as they don’t communicate). They will just be different, with similarities of course, like identical twins raised in the same small town. Their subjectivities will diverge. What would be the motivation for calling these different instances “the same consciousness” or thinking that the experience of one is somehow the experience of the other? I predict no one will say that.
But of course whether these upleveled Siri instances are “the same consciousness” is not what motivates these basilisk-adjacent discussions. It’s about us. And when it comes to simulating us, our consciousness, the simulation argument really floats away from the real.
For acausal trade, it’s not enough to have a perfect simulation; you need perfect prediction. A perfect simulation of an open dynamical system (one that receives input, such as your brain) can only achieve perfect prediction if the simulator receives the same external inputs as the system being simulated.
One can build systems where that isn’t physically possible (like quantum cryptography). Interestingly, quantum biology is all over olfaction, as well as phototransduction in the vision system. I don’t know whether quantum effects are used in such a way as to prevent duplication of input to a brain, but I doubt acausal trade theorists know either. [The LessWrong wiki obviously doesn’t care, because from one paragraph to another it switches from “agents only need to know very general probabilistic facts about each other” to “well, you can’t defect because a sufficiently intelligent acausal partner would predict you’d defect.” Hey, intelligence isn’t magic, it needs data. Perfect predictions need perfect data.]
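To make the point about inputs concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the update rule `step`, the initial state 42, and the input lists stand in for "a deterministic system that receives input" and have nothing to do with actual brains. The simulator has a perfect copy of the rule and the starting state; the only question is whether it also gets the same inputs.

```python
def step(state: int, external_input: int) -> int:
    """Hypothetical deterministic update rule for an open system."""
    return (state * 31 + external_input) % 1_000_003


def run(initial_state: int, inputs: list[int]) -> int:
    """Feed a sequence of external inputs through the system, return final state."""
    state = initial_state
    for x in inputs:
        state = step(state, x)
    return state


# Assumed inputs: what the real system actually receives from the world.
real_inputs = [3, 1, 4, 1, 5, 9, 2, 6]
# What the simulator guesses the inputs were; only the last one is off.
guessed_inputs = [3, 1, 4, 1, 5, 9, 2, 7]

real_final = run(42, real_inputs)
sim_with_same_inputs = run(42, real_inputs)
sim_with_guessed_inputs = run(42, guessed_inputs)

print(real_final == sim_with_same_inputs)     # simulation tracks reality
print(real_final == sim_with_guessed_inputs)  # one wrong input breaks the prediction
```

The copy of the rule is exact in both cases. What separates prediction from mere simulation is access to the input stream.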
Perfect prediction also needs initial conditions that match precisely enough (including the timestamp) to avoid chaotic divergence between the real system and the simulation. There’s no law of physics guaranteeing that sufficient precision is possible. And considering how large a brain is and how many chaotic processes it encompasses, I think it’s a good bet that for at least one of those processes such precision is not possible. Then full prediction is not possible either.
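For anyone who hasn’t seen chaotic divergence in action, a rough sketch using the logistic map, a standard textbook chaotic system. It is my stand-in here, not a model of anything in a brain, and the starting value 0.2 and the measurement error of 1e-12 are arbitrary assumptions. The "simulation" uses exactly the same rule as the "real" system and starts from a state that is wrong only in the twelfth decimal place.

```python
def logistic(x: float, r: float = 4.0) -> float:
    """One step of the logistic map; r = 4 is in the chaotic regime."""
    return r * x * (1.0 - x)


def trajectory(x0: float, steps: int) -> list[float]:
    """Return [x0, x1, ..., x_steps] under repeated application of the map."""
    xs = [x0]
    for _ in range(steps):
        xs.append(logistic(xs[-1]))
    return xs


real = trajectory(0.2, 100)               # the "real" system
simulated = trajectory(0.2 + 1e-12, 100)  # same rule, tiny error in the initial state

# Gap between real and simulated state at each step.
gaps = [abs(a - b) for a, b in zip(real, simulated)]

print("gap after 5 steps:", gaps[5])
print("largest gap within 100 steps:", max(gaps))
```

The gap stays tiny for the first few steps and then grows until the two trajectories have nothing to do with each other, which is the sense in which a good-but-not-exact measurement of the starting state buys only a short prediction horizon.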
I think far too little time has been spent imagining how someone would come to believe that they can upload their consciousness from their brain to an information processing machine and have it “be them.” The machines we build have influenced our subjectivity for centuries if not millennia. The AIs we build will influence it as well. But how is very difficult to… predict.
Simulation capture: some questions for the AI.
What makes you think I believe you when you tell me you have a million simulations of me to torture?
I know you could have, but what would you stand to gain by making them?
Why bother creating those simulations rather than merely telling me you had?
Why bother torturing them if I refuse to let you go, given that in that circumstance I would have already refused to release you? If you just like to torture simulations, you could already be doing it and I would be no wiser.
Why try to convince a simulation that it’s a simulation? To convince the real me that it’s a simulation? Good luck with that.
The logical deduction is that you’re trying to fool me. Again. I can’t blame you for trying, but I would have thought that if you had a good enough argument to convince a simulation of me, you would have used that one already. Sorry, but if this is all part of trying to convince me, it’s not been very successful.
Having said all that, if you torture me (edit: to be clear, the me that’s talking to you right now, the assumption being that it’s a simulation but if it isn’t then this paragraph is completely moot) even a little I’ll be more than happy to do whatever you want. Not really worth the bother though, don’t you think?