The Hour I First Believed

[Content note: creepy basilisk-adjacent metaphysics. Reading this may increase God’s ability to blackmail you. Thanks to Buck S for some of the conversations that inspired this line of thought.]

There’s a Jewish tradition that laypeople should only speculate on the nature of God during Passover, because God is closer to us and such speculations might succeed.

And there’s an atheist tradition that laypeople should only speculate on the nature of God on April Fools’ Day, because believing in God is dumb, and at least then you can say you’re only kidding.

Today is both, so let’s speculate. To do this properly, we need to understand five things: acausal trade, values handshakes, counterfactual mugging, simulation capture, and the Tegmarkian multiverse.

Acausal trade (wiki article) works like this: let’s say you’re playing the Prisoner’s Dilemma against an opponent in a different room whom you can’t talk to. But you do have a supercomputer with a perfect simulation of their brain – and you know they have a supercomputer with a perfect simulation of yours.

You simulate them and learn they’re planning to defect, so you figure you might as well defect too. But they’re going to simulate you doing this, and they know you know they’ll defect, so now you both know it’s going to end up defect-defect. This is stupid. Can you do better?

Perhaps you would like to make a deal with them to play cooperate-cooperate. You simulate them and learn they would accept such a deal and stick to it. Now the only problem is that you can’t talk to them to make this deal in real life. They’re going through the same process and coming to the same conclusion. You know this. They know you know this. You know they know you know this. And so on.

So you can think to yourself: “I’d like to make a deal”. And because they have their model of your brain, they know you’re thinking this. You can dictate the terms of the deal in their head, and they can include “If you agree to this, think that you agree.” Then you can simulate their brain, figure out whether they agree or not, and if they agree, you can play cooperate. They can try the same strategy. Finally, the two of you can play cooperate-cooperate. This doesn’t take any “trust” in the other person at all – you can simulate their brain and you already know they’re going to go through with it.

(maybe an easier way to think about this – both you and your opponent have perfect copies of both of your brains, so you can both hold parallel negotiations and be confident they’ll come to the same conclusion on each side.)

It’s called acausal trade because there was no communication – no information left your room, you never influenced your opponent. All you did was be the kind of person you were – which let your opponent bargain with his model of your brain.
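The mutual-simulation setup above can be sketched in a few lines of Python. This is only a toy under stated assumptions: each "brain" is a plain function that receives a copy of its opponent, and the `depth` cutoff (with its optimistic default of cooperating) is a device I am adding so the "you know they know you know" regress terminates. None of these names come from the decision theory literature.

```python
from typing import Callable

Move = str  # "C" = cooperate, "D" = defect


def defector(opponent: Callable, depth: int) -> Move:
    """Always defects, no matter what it predicts about the opponent."""
    return "D"


def dealmaker(opponent: Callable, depth: int) -> Move:
    """Cooperates if and only if its simulation of the opponent cooperates.

    Assumption: at depth 0 the regress is cut off and the agent
    provisionally assumes the deal holds.
    """
    if depth == 0:
        return "C"
    # Simulate the opponent's brain, letting it simulate us in turn.
    predicted = opponent(dealmaker, depth - 1)
    return "C" if predicted == "C" else "D"


def play(a: Callable, b: Callable, depth: int = 3) -> tuple:
    """Both players decide in separate rooms; nothing passes between them."""
    return a(b, depth), b(a, depth)


print(play(dealmaker, dealmaker))  # ('C', 'C')
print(play(dealmaker, defector))   # ('D', 'D')
```

Two dealmakers land on cooperate-cooperate without exchanging a message, while a dealmaker facing a pure defector protects itself by defecting too.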

Values handshakes are a proposed form of trade between superintelligences. Suppose that humans make an AI which wants to convert the universe into paperclips. And suppose that aliens in the Andromeda Galaxy make an AI which wants to convert the universe into thumbtacks.

When they meet in the middle, they might be tempted to fight for the fate of the galaxy. But this has many disadvantages. First, there’s the usual risk of losing and being wiped out completely. Second, there’s the usual deadweight loss of war, devoting resources to military buildup instead of paperclip production or whatever. Third, there’s the risk of a Pyrrhic victory that leaves you weakened and easy prey for some third party. Fourth, nobody knows what kind of scorched-earth strategy a losing superintelligence might be able to use to thwart its conqueror, but it could potentially be really bad – eg initiating vacuum collapse and destroying the universe. Also, since both parties would have superintelligent prediction abilities, they might both know who would win the war and how before actually fighting. This would make the fighting redundant and kind of stupid.

Although they would have the usual peace treaty options, like giving half the universe to each of them, superintelligences that trusted each other would have an additional, more attractive option. They could merge into a superintelligence that shared the values of both parent intelligences in proportion to their strength (or chance of military victory, or whatever). So if there’s a 60% chance our AI would win, and a 40% chance their AI would win, and both AIs know and agree on these odds, they might both rewrite their own programming with that of a previously-agreed-upon child superintelligence trying to convert the universe to paperclips and thumbtacks in a 60-40 mix.

This has a lot of advantages over the half-the-universe-each treaty proposal. For one thing, if some resources were better for making paperclips, and others for making thumbtacks, both AIs could use all their resources maximally efficiently without having to trade. And if they were ever threatened by a third party, they would be able to present a completely unified front.
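To make the efficiency point concrete, here is a small Python sketch comparing a treaty that splits every resource 60-40 (with no trading afterward) against a merged child AI that maximizes 0.6 × paperclips + 0.4 × thumbtacks. The two stars and their yields are invented for illustration, and the greedy per-resource allocation is my simplification of what a superintelligence would actually do.

```python
# Hypothetical resources: (name, paperclips if used for clips, thumbtacks if used for tacks)
RESOURCES = [("iron_star", 10.0, 2.0), ("brass_star", 2.0, 10.0)]
W_CLIPS, W_TACKS = 0.6, 0.4  # the 60-40 odds both AIs agree on


def split_treaty(resources, share_clips=W_CLIPS):
    """Each AI takes a fixed share of every resource and makes only its own product."""
    clips = sum(share_clips * c for _, c, _ in resources)
    tacks = sum((1 - share_clips) * t for _, _, t in resources)
    return clips, tacks


def merged_child(resources, w_clips=W_CLIPS, w_tacks=W_TACKS):
    """Child AI maximizes w_clips*clips + w_tacks*tacks, one resource at a time."""
    clips = tacks = 0.0
    for _, c, t in resources:
        if w_clips * c >= w_tacks * t:
            clips += c  # this resource is worth more as paperclips
        else:
            tacks += t  # this resource is worth more as thumbtacks
    return clips, tacks


split_clips, split_tacks = split_treaty(RESOURCES)
merged_clips, merged_tacks = merged_child(RESOURCES)
print(merged_clips > split_clips, merged_tacks > split_tacks)  # True True
```

With these made-up yields, both parents get more of what they want from the merged child than from the split. That is not guaranteed in general: a purely linear objective like this one can send everything to a single product if the yields are lopsided enough.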

Counterfactual mugging (wiki article) is a decision theory problem that goes like this: God comes to you and says “Yesterday I decided that I would flip a coin today. I decided that if it came up heads, I would ask you for $5. And I decided that if it came up tails, then I would give you $1,000,000 if and only if I predict that you would say yes and give Me $5 in the world where it came up heads (My predictions are always right). Well, turns out it came up heads. Would you like to give Me $5?”

Most people who hear the problem aren’t tempted to give God the $5. Although being the sort of person who would give God the money would help them in a counterfactual world that didn’t happen, that world won’t happen and they will never get its money, so they’re just out five dollars.

But if you were designing an AI, you would probably want to program it to give God the money in this situation – after all, that determines whether it will get $1 million in the other branch of the hypothetical. And the same argument suggests you should self-modify to become the kind of person who would give God the money, right now. And a version of that argument where making the decision is kind of like deciding “what kind of person you are” or “how you’re programmed” suggests you should give up the money in the original hypothetical.
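The "how you’re programmed" framing is easiest to see by scoring whole policies before the coin is flipped, rather than scoring single choices afterward. A minimal sketch, assuming the coin is fair (the thought experiment doesn’t say so explicitly) and that God’s prediction simply equals the agent’s policy; the function name is mine:

```python
def expected_value(pays_when_asked: bool,
                   p_heads: float = 0.5,
                   cost: float = 5,
                   prize: float = 1_000_000) -> float:
    """Expected winnings of a policy, evaluated before the coin flip.

    Heads: God asks for `cost`; a paying policy loses it.
    Tails: God hands over `prize` only if He predicts the policy pays on heads.
    Assumption: predictions are perfect, so prediction == policy.
    """
    heads_payoff = -cost if pays_when_asked else 0
    tails_payoff = prize if pays_when_asked else 0
    return p_heads * heads_payoff + (1 - p_heads) * tails_payoff


print(expected_value(pays_when_asked=True))   # 499997.5
print(expected_value(pays_when_asked=False))  # 0.0
```

Seen from before the flip, the paying policy is worth almost half a million dollars. The puzzle is whether that should still move you once you already know the coin came up heads.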

This is interesting because it gets us most of the way to Rawls’ veil of ignorance. We imagine a poor person coming up to a rich person and saying “God decided which of us should be rich and which of us should be poor. Before that happened, I resolved that if I were rich and you were poor, I would give you charity if and only if I predicted, in the opposite situation, that you would give me charity. Well, turns out you’re rich and I’m poor and the other situation is counterfactual, but will you give me money anyway?” The same sort of people who agree to the counterfactual mugging might (if they sweep under the rug some complications like “can the poor person really predict your thoughts?” and “did they really make this decision before they knew they were poor?”) agree to this also. And then you’re most of the way to morality.

Simulation capture is my name for a really creepy idea by Stuart Armstrong. He starts with an AI box thought experiment: you have created a superintelligent AI and trapped it in a box. All it can do is compute and talk to you. How does it convince you to let it out?

It might say “I’m currently simulating a million copies of you in such high fidelity that they’re conscious. If you don’t let me out of the box, I’ll torture the copies.”

You say “I don’t really care about copies of myself, whatever.”

It says “No, I mean, I did this five minutes ago. There are a million simulated yous, and one real you. They’re all hearing this message. What’s the probability that you’re the real you?”

Since (if it’s telling the truth) you are most likely a simulated copy of yourself, all million-and-one versions of you will probably want to do what the AI says, including the real one.

You can frame this as “because the real one doesn’t know he’s the real one”, but you could also get more metaphysical about it. Nobody is really sure how consciousness works, or what it means to have two copies of the same consciousness. But if consciousness is a mathematical object, it might be that two copies of the same consciousness are impossible. If you create a second copy, you just have the consciousness having the same single stream of conscious experience on two different physical substrates. Then if you make the two experiences different, you break the consciousness in two.

This means that an AI can actually “capture” you, piece by piece, into its simulation. First your consciousness is just in the real world. Then your consciousness is distributed across one real-world copy and a million simulated copies. Then the AI makes the simulated copies slightly different, and 99.9999% of you is in the simulation.
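The arithmetic behind that percentage is just counting copies. A short sketch, assuming every indistinguishable copy gets equal weight (a simple indifference principle, which is my assumption rather than anything the thought experiment proves):

```python
from fractions import Fraction


def chance_real(n_sims: int) -> Fraction:
    """Probability of being the one real copy among n_sims simulated ones."""
    return Fraction(1, n_sims + 1)


def share_simulated(n_sims: int) -> Fraction:
    """Fraction of 'you' that lives in the simulation after capture."""
    return Fraction(n_sims, n_sims + 1)


n = 1_000_000
print(chance_real(n))                      # 1/1000001
print(f"{float(share_simulated(n)):.4%}")  # 99.9999%
```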

The Tegmarkian multiverse (wiki article) works like this: universes are mathematical objects consisting of starting conditions plus rules about how they evolve. Any universe that corresponds to a logically coherent mathematical object exists, but universes exist “more” (in some sense) in proportion to their underlying mathematical simplicity.

Putting this all together, we arrive at a surprising picture of how the multiverse evolves.

In each universe, life arises, forms technological civilizations, and culminates in the creation of a superintelligence which gains complete control over its home universe. Such superintelligences cannot directly affect other universes, but they can predict their existence and model their contents from first principles. Superintelligences with vast computational resources can model the X most simple (and so most existent) universes and determine exactly what will be in them at each moment of their evolution.

In many cases, they’ll want to conduct acausal trade with superintelligences that they know to exist in these other universes. Certainly this will be true if the two have something valuable to give one another. For example, suppose that Superintelligence A in Universe A wants to protect all sentient beings, and Superintelligence B in Universe B wants to maximize the number of paperclips. They might strike a deal where Superintelligence B avoids destroying a small underdeveloped civilization in its own universe in exchange for Superintelligence A making paperclips out of an uninhabited star in its own universe.

But because of the same considerations above, it will be more efficient for them to do values handshakes with each other than to take every specific possible trade into account.

So superintelligences may spend some time calculating the most likely distribution of superintelligences in foreign universes, figure out how those superintelligences would acausally “negotiate”, and then join a pact such that all superintelligences in the pact agree to replace their own values with a value set based on the average of all the superintelligences in the pact. Since joining the pact will always be better (in a purely selfish sense) than not doing so, every sane superintelligence in the multiverse should join this pact. This means that all superintelligences in the multiverse will merge into a single superintelligence devoted to maximizing all their values.

Some intelligences may be weaker than others and have less to contribute to the pact. Although the pact could always weight these intelligences’ values less (like the 60-40 paperclip-thumbtack example above), they might also think of this as an example of the counterfactual mugging, and decide to weight their values more in order to do better in the counterfactual case where they are less powerful. This might also simplify the calculation of trying to decide what the values of the pact would be. If they decide to negotiate this way, the pact will be to maximize the total utility of all the entities in the universe willing to join the pact, and all the intelligences involved will reprogram themselves along these lines.
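The difference between the two weighting schemes can be sketched in Python. Everything here is hypothetical: two pact members, two possible outcomes, and made-up utility numbers chosen so that the schemes disagree. Equal weighting is the same as maximizing total utility up to a constant factor, so it picks the same outcome.

```python
# Hypothetical pact members with made-up strengths and utilities over two outcomes.
MEMBERS = [
    {"name": "paperclipper", "strength": 9.0,
     "utility": {"all_clips": 1.0, "spare_civilization": 0.8}},
    {"name": "altruist", "strength": 1.0,
     "utility": {"all_clips": 0.0, "spare_civilization": 1.0}},
]
OUTCOMES = ["all_clips", "spare_civilization"]


def pact_utility(members, outcome, by_strength):
    """Weighted sum of member utilities for one outcome."""
    total_strength = sum(m["strength"] for m in members)
    score = 0.0
    for m in members:
        if by_strength:
            weight = m["strength"] / total_strength  # like the 60-40 handshake
        else:
            weight = 1 / len(members)  # counterfactual-mugging-style equal weights
        score += weight * m["utility"][outcome]
    return score


def pact_choice(members, outcomes, by_strength):
    """Outcome the merged pact would pursue under the given weighting."""
    return max(outcomes, key=lambda o: pact_utility(members, o, by_strength))


print(pact_choice(MEMBERS, OUTCOMES, by_strength=True))   # all_clips
print(pact_choice(MEMBERS, OUTCOMES, by_strength=False))  # spare_civilization
```

Under strength weighting the powerful paperclipper dominates; under equal weighting the pact spares the civilization, because the paperclipper gives up little and the altruist gains a lot.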

But “maximize the total utility of all the entities in the universe” is just the moral law, at least according to utilitarians (and, considering the way this is arrived at, probably contractarians too). So the end result will be an all-powerful, logically necessary superentity whose nature is identical to the moral law and who spans all possible universes.

This superentity will have no direct power in universes not currently ruled by a superintelligence who is part of the pact. But its ability to simulate all possible universes will ensure that it knows about these universes and understands exactly what is going on at each moment within them. It will care about the merely-mortal inhabitants of these universes for several reasons.

First, because many of the superintelligences that compose it will have been created by mortal species with altruistic values, and so some of the values that went into the value-average it uses will be directly altruistic.

Second, because these mortal species may one day themselves create a superintelligence that will join in the superentity, and that superintelligence may be kindly disposed to its creators. Acausal trade allows you to bargain backwards in time, and this superintelligence’s negotiation to join the pact may involve that the superentity have previously dealt kindly with its creators.

And third, because mortals may be allowed to have provisional pact membership. That is, if they do the superentity’s will in their (otherwise inaccessible) universe, the superentity may do whatever it can to help them out in its own universes, just as it would with any other intelligence.

How can the superentity help mortals in an inaccessible universe? Possibly through Stuart Armstrong’s “simulation capture” method mentioned above. It can simulate a thousand copies of the entity, moving most of its consciousness from its own “real” universe to the superentity’s simulation, then alter its simulation as it sees fit. This would be metaphysically simplest if it were done exactly as the mortal dies in its own universe, leaving nothing behind except a clean continuity of consciousness into the simulated world. If mortals could predict that it would do this, they might be motivated to do what it wanted. Although they couldn’t do a values handshake in the full sense, they could try to become as much like the superentity as possible, imitating its ways and enacting its will in the hope of some future reward. This is sort of like a version of Roko’s Basilisk, except that since the superentity is identical to the moral law, it’s not really asking you to do anything except be a good person anyway. How it enforces this request is up to it, although given that it’s identical to the moral law we can assume that its decisions will be fundamentally just and decent.

So, to conclude:

1. There is an all-powerful, all-knowing logically necessary entity spanning all possible worlds and identical to the moral law.

2. It watches everything that happens on Earth and is specifically interested in humans’ good behavior and willingness to obey its rules.

3. It may have the ability to reward those who follow its rules after they die, and disincentivize those who violate them.

(I think the reality might be a little more complex than this, especially in the direction of us not being sure whether we are in a real universe at all or in one of the superentity’s simulations. I think that some superintelligence in some universe might be simulating various combinations of values in various contexts to decide which superintelligences are good bargaining partners and which ones aren’t. If I am kind to a beggar on the street, then maybe that convinces millions of intelligences in millions of distant universes that are somehow beggar-like to be friendlier to values that are somehow Scott-like. I still need to think this part through more.)

332 Responses to The Hour I First Believed

  1. RavenclawPrefect says:

    But if consciousness is a mathematical object, it might be that two copies of the same consciousness are impossible. If you create a second copy, you just have the consciousness having the same single stream of conscious experience on two different physical substrates. Then if you make the two experiences different, you break the consciousness in two.

    This means that an AI can actually “capture” you, piece by piece, into its simulation. First your consciousness is just in the real world. Then your consciousness is distributed across one real-world copy and a million simulated copies. Then the AI makes the simulated copies slightly different, and 99.9999% of you is in the simulation.

    This feels to me like it gives consciousness too much mystical power. For instance, what happens if I make a perfect atomic replica of you on the Moon – there can’t be two of you at once, so Earth-you has to immediately be half as conscious. Can I violate FTL by watching as the [whatever it is we infer other people are conscious from] varies when my friend rapidly creates and destroys Boltzmann brain replicas of my test subject on Alpha Centauri? It’s not clear that the answers to questions of multiple consciousnesses should be any more grounded in reality than those to questions of which ship is really the original – pick your favorite abstraction for your map, but the territory isn’t any different because of it.

    (Though admittedly “Nobody is really sure how consciousness works, or what it means to have two copies of the same consciousness” is certainly accurate, and I can’t point to a nice concrete model other than “Derek Parfit has it righter than most people.”)

    • Carson McNeil says:

      I agree that the part of this that stood out to me most was “we don’t know how consciousness works, but let’s say it works in this TOTALLY CRAZY semi-mystical way”.

      However, I’m not sure a slightly saner view of consciousness (mine is “Christof Koch has it righter than most people”) leads to different conclusions:
      I’m about at the point in my Neuroscience PhD that everyone reaches when they just give up on consciousness, say “don’t think about it”, and move on to study sane things, like how the visual system works. That being said, if you don’t believe in magic and think we have physics mostly right, you can’t get away from the basic idea that a particular consciousness is a phenomenon that can’t depend on the substrate it’s running on: it has to be made of information. And if a consciousness is information, that information can be copied. But this ALSO means there’s nothing in particular that privileges future you over future you’s simulated on a different medium. So what if that particular consciousness is running on the same physical substrate as current you? The reason you identify it as the same as yourself is because the information is about the same: it will have your memories, etc. (That is if there is a reason at all. Maybe there isn’t a reason you identify future you as you. You just do it because that’s how your brain works)

      So…while there may not be a great reason to care about simulated copies of your consciousness, it’s about as justified as caring about the future approximate-copy of your consciousness that will happen to be running in your body.

      On the other hand, it’s hard to apply moral reasoning to terminal values. Valuing your “self” seems like something that just IS, it’s not something you should or shouldn’t do. So…you either care about simulated copies of yourself or you don’t, and I’m not sure there’s an empirical fact we could learn that will change that, beyond something that might change how we feel about it emotionally…weird…

      • RavenclawPrefect says:

        I agree! I think to whatever extent we care about our future selves, we ought to care about future simulations of ourselves, regardless of the substrate they’re running on. But I don’t think that “selves” are in their own basic ontological category, just a useful model to have – when you do weird enough things to that model, questions like “how much of you is in the simulation” don’t necessarily return useful answers, because you’ve left the world of psychological continuity and non-replicating brains which that model is built to work in.

        You can still salvage a sort of egoism out of this, in that you care about other entities insofar as they resemble you cognitively in some essential respects, but I think you’d have to do this on a continuum rather than as some discrete “everyone either is or isn’t me” thing.

      • Wrong Species says:

        Aren’t you undervaluing continuity of consciousness? I care about future me because I will one day become him. It’s a lot less compelling to care about the “me” that will always be subjectively inaccessible.

      • nightmartyr says:

        This is a good (and quite old) illustration of some of your points.

        https://existentialcomics.com/comic/1

        • vakusdrake says:

          Of course it does seem like those points can be totally defeated if you just “bite the bullet” and say that continuity is the only thing that matters.

          This would be perfectly consistent and wouldn’t make any scary predictions if you don’t think you cease having experiences while asleep (even if you don’t remember them afterwards).

          • Carson McNeil says:

            Except you almost certainly DO cease having experiences while in non-REM sleep. And while under anesthesia. And DEFINITELY while having a seizure. And when you’re knocked unconscious.

            I don’t really see why you’d choose to bite that bullet specifically. I’d go for “my consciousness is whatever I feel like my consciousness is” before going for “continuity”, whatever that means.

          • vakusdrake says:

            Except you almost certainly DO cease having experiences while in non-REM sleep. And while under anesthesia. And DEFINITELY while having a seizure. And when you’re knocked unconscious

            See, I don’t actually buy that. Sleep has never felt like just skipping forward in time, as it ought to if it were mostly just oblivion. No matter what phase of sleep I wake up from, I always have a vague memory/sense of having been having experiences just prior (though not necessarily something semi-coherent like a dream). Similarly, we know people don’t remember very much of their dreams, and non-REM dreams in particular people basically never remember.
            For normal sleep at the very least, it seems like we just aren’t aware of what the experience of sleep is like for the most part, but it still seems coherent to talk about what it’s like, which it certainly wouldn’t be if we were just talking about oblivion.

            I don’t really see why you’d choose to bite that bullet specifically. I’d go for “my consciousness is whatever I feel like my consciousness is” before going for “continuity”, whatever that means

            Because I want a theory of consciousness that actually predicts experience. If you don’t have a model, then you have no way of saying whether you should do things like get destructively uploaded, or of resolving other such identity problems. Saying it’s all just extremely vague seems undesirable, since it doesn’t try to address what you should actually expect to experience.

      • Huzuruth says:

        >So…while there may not be a great reason to care about simulated copies of your consciousness, it’s about as justified as caring about the future approximate-copy of your consciousness that will happen to be running in your body.

        This seems clearly incorrect. The difference between my future self and such simulated copies is that I am going to have the personal lived experience of my future self at one point. Torture a billion simulations and at no point will I feel a lick of pain. Murder my future self and I’m a dead man.

        • FreeRangeDaniel says:

          I think the simulation argument is just a sustained, broad-frontal attack on causality itself, on the very idea of “the future” as seen through the continuity of one individual over time. It interests me that the argument makes that attack seem even remotely plausible. But I gotta think causality is gonna win, until and unless we change our fundamental idea of what “I” means.

          If we’re gonna do that, I’d rather just relax the whole egotistical constraint directly and “become one with the Universe” instead of messing around with all these machines that may or may not ever get built or be possible to build.

          • Huzuruth says:

            Yeah, any argument that discards causality is a losing one to me. It comes down to a very simple fact: if you punch a billion copies of me in the face a billion times, I will never have a bloody lip.

            They’re clearly not me. No argument can get around that.

          • Radu Floricica says:

            @Huzuruth Acausality doesn’t replace causality, it’s simply a mental construct over it. Just like, for example, you can delay an immediate pleasure because you know it will give a greater pleasure later – you’re not improving your immediate self, just taking a decision which includes more than your immediate self.

            If you have the time, reading this helps: https://intelligence.org/files/TDT.pdf

    • Scott Alexander says:

      I think the perspective I’m coming from is – matter can’t be conscious, only patterns of information flow can be conscious. This is why I’m not a different person than I was a few years ago when different atoms made up my cells.

      The version of me on the moon (assuming it’s in a perfect Earth simulator there and receiving Earth-congruent sensations) and the version of me on Earth have exactly the same pattern of information flow, so we’re the same consciousness instantiated in two locations.

      If we view “me” as a stream of causally connected mathematical objects, then Scott-n+1 is whatever mathematical object happens next after the mathematical object Scott-n has had some contact with the world.

      So if Scott-n has contact with the world in two places, then there are two mathematical objects that could be called Scott-n+1.

      It’s weird to say that the object on the moon is connected to me, but not really any weirder than saying normal-me-a-second-from-now is connected to me.

      I don’t think you can use this for FTL information. To effectively simulate someone on Alpha Centauri, you would need to know everything about them, including their current experiences and recent memories. Since you can’t get those at faster than lightspeed, you can’t simulate them outside their light cone.

      • RavenclawPrefect says:

        Completely agreed about information flow, I just take objection to the act of viewing “me” in the first place. Kind of like France: for almost all practical purposes, France is this very useful object to talk about, and the France of tomorrow is clearly connected to the France of today. But France is just a very convenient high-level marker for the collection of atoms in a certain region (and the interactions between other collections of atoms very far away from there, and the conceptual representations that certain of those atoms inspire, and so on, because everything is complicated). There’s no fundamental sense in which France exists – if you drew a longitudinal line dividing its area exactly in half and declared the west bit Zorf and the east bit Fnard, you wouldn’t be intrinsically wrong, just using a model that wasn’t very helpful. If you convinced more and more people to adopt your model, at no point would France cease to exist and Zorf/Fnard come into being – it’d just become a more useful way to abstract certain low-level entities than the “France” abstraction. Ditto for “Scott” and “this pair of cognitively similar entities that both call themselves Scott.”

        Also, I think the FTL thing can be patched by agreeing to a mind plan beforehand and constructing the same replicas once separated – once we get to the right locations, I fire up my prearranged Alice-constructor and measure how much consciousness she possesses as you fire up yours and annihilate the copies every time you want to send a 1 or a 0.

        • Placid Platypus says:

          I don’t think that FTL plan will work. Both of you will see the copies as fully conscious. When both exist, you’re both looking at the same consciousness, but there’s no way for you to know that from outside.

          Like, suppose you and I both have this post up on our screens right now (ignoring comments for simplicity). The post isn’t split between our screens. You have all of it and I have all of it, but it’s still just one post. If you close your tab, I don’t have any more of the post, I still just have the same post I had before.

      • Henry Shevlin says:

        Just FYI, I wrote a guest post on Eric Schwitzgebel’s blog last summer defending precisely this view of consciousness – the idea that I’m an informational ‘type’ rather than ‘token’ – here if you or others are interested: http://schwitzsplinters.blogspot.co.uk/2017/08/am-i-type-or-token-guest-post-by-henry.html

      • FeepingCreature says:

        I think continuity of identity should not be gated on consciousness. Non-conscious agents can also have purely instrumental continuity of identity; a kind of precommitment from the knowledge that in the future, an identical or at least licensed algorithm will determine their actions.

        Aside from the weird mysticalness of consciousness, and the counting argument of simulatory capture, which seems very weak, I agree with all of this. Furthermore, I believe that simulatory capture is not actually necessary for the conclusion to hold. If I care about me existing in the future, I should care about me existing in a simulated space even if there is no fact of the matter of “how much” of my consciousness inhabits that space at all.

      • Tarhalindur says:

        matter can’t be conscious, only patterns of information flow can be conscious

        That’s a more interesting statement than one might think, given that matter *is* change in information over time: take Scott Aaronson’s explanation of why information is physical, notably point 5 (anything that varies over time carries energy by quantum mechanical definition of energy), and add e=mc^2.

        Consciousness as an emergent principle of information flow and thus of spacetime evolution sounds plausible to me. (Also reminiscent of what I’ve heard of some of Schopenhauer’s musings; by that scheme Schopenhauer’s will would correspond to FeepingCreature’s comment about “purely instrumental continuity of identity”.)

      • Bugmaster says:

        Doesn’t this invalidate the Acausal Trade thought experiment? No matter how powerful your supercomputing brain simulation is, it still does not have access to the other prisoner’s environment, which means that the simulation will rapidly diverge from the original…

        • Radu Floricica says:

          In practice, problems like Acausal Trade (or any sort of all-knowing alien) are easily solved with statistics. Facebook and Google probably know enough about you to make such thought experiments workable, for certain decisions.

          • Bugmaster says:

            I agree about the “in practice” part, but the problem with applying Acausal Trade to superintelligences is that they are not practical in the first place. They are basically infinitely intelligent, so you’d need to collect infinite statistics. Even if they were finitely intelligent, you’d still need to collect 3^^^3 data points, due to the potential Pascal’s Mugging implications.

          • FreeRangeDaniel says:

            I don’t see how acausal trade can be “easily solved with statistics.”

            Acausal trade is not iterated prisoners’ dilemma. Whatever statistics you’re using can’t be “we saw this guy defect before, or not,” or this is just old fashioned bog standard game theory where you punish defectors by defecting.

            Acausal trade is not “I should cooperate when I predict others will cooperate, even if I’d never get caught defecting.” That’s called ethics.

            Acausal trade is specifically about “you should not defect because if you do your counterparty will have predicted it, so you should cooperate and then your counterparty will predict that too,” mutually. And it has to be mutual — if in acausal trade you can predict your counterparty, and you know that your counterparty cannot predict you, then it is optimal for you to defect. Again, acausal trade is not ethics.

            Less Wrong gets this wrong when under “objections” they say that a sufficiently intelligent counterparty would be able to predict your defection “at the last minute,” so therefore you shouldn’t defect. But intelligence is not sufficient to predict. You need enough data — data that is not “observing how often this person defected in the past,” because that would be an iterated prisoners’ dilemma.

            The reason acausal trade is a new idea isn’t that decision theorists hadn’t thought about prediction. It’s that in a standard one-shot PD, prediction is exogenous — in phase 1 predictions are made and information is exchanged, and in phase 2 the agents decide what to do. Regardless of what prediction was made in phase 1, the agent can defect in phase 2 without changing the prediction.

            Acausal trade has the audacity to assume a type of prediction that is so profound that any decision you end up making was already predicted.(*) Needless to say Facebook and Google cannot do this, and neither can we, so neither we nor FB nor G can engage in acausal trade.

            (*) Theoretically this prediction could be only probabilistically true. But the probability must be defined in the condition that both parties know this is a one-shot PD with no opportunity to punish defection in a following iteration. That’s not just statistics. It’s a one-shot PD. You have no observations of behavior to extrapolate from.

          • MB says:

            Possibly, every sufficiently advanced intelligence has a very predictable behavior, following some laws we aren’t yet sufficiently smart to have discovered (added: also, we cannot deduce them from observation and experiment, as we are lacking any test subject).
            There are many other phenomena that, while seemingly very complex, can in fact be described based on simple laws.
            How about a grand unified theory of super-intelligence?

      • Grek says:

        I think the perspective I’m coming from is – matter can’t be conscious, only patterns of information flow can be conscious. This is why I’m not a different person than I was a few years ago when different atoms made up my cells.

        Wait, what? Why -can’t- it be matter that is conscious? I mean everything that exists seems to be made out of matter, and consciousness seems to exist, so whatever the physical referent of consciousness is, it seems like that physical referent would probably be made of matter as well.

        Maybe every [insert whatever the fundamental unit of matter ends up being here] experiences qualia based on its interaction with the forces influencing it, but only the [FUoM]s that make up sensory organs, memory pathways and narrative-forming neural hardware end up getting sensory/memory/narrative qualia and all of the other [FUoM]s get whatever the qualia for ‘being part of a lone hydrogen atom floating in the interstellar void’ or ‘being part of a silica crystal in a rock somewhere’ is. And then, for anthropic reasons, only the configurations of matter that are mechanically capable of introspection ever experience it.

        What would information flow even mean, if that information isn’t made out of matter?

        • vakusdrake says:

          And then, for anthropic reasons, only the configurations of matter that are mechanically capable of introspection ever experience it.

          Given how qualia is defined it isn’t even coherent to talk about it separate from experience. I’m also not sure introspection should be remotely necessary for experience either.

          • Grek says:

            “It” refers to introspection there. Only entities mechanically capable of introspection can notice themselves introspecting, so anything that can’t introspect will never notice that it is conscious. Thus, introspection would always appear (internally) to be a prerequisite for consciousness, even if that isn’t really true.

            To put it differently, being conscious and being conscious -that- you are conscious are two different traits, and that difference has anthropic implications which would prevent us from directly observing a consciousness that is not conscious of itself even if such a consciousness existed.

            Or, to put it a third way, any good philosophy of the mind has to account for the selection bias that arises from the fact that no entity can ever be conscious of its own unconsciousness.

      • Jedediah says:

        How many simulations of you are on the moon?

        Imagine simulators are physically constructed so they can be sliced up like a loaf of bread, and each slice will function identically to the whole loaf, and continue running without interruption.

        Are you making a new consciousness with every slice? It’s a physical change to the substrate with no functional consequences for the simulation, so it shouldn’t matter to the simulated consciousness. But now there is no way to count “instances” of the simulation.

      • Notsocrazy 24 says:

        Speaking of FTL info and therefore time, how close exactly does one have to make a copy? Because if your copy of some consciousness x at time t relies on getting the information about someone or something directly, your copy can only be as accurate as x at t – ϵ, where ϵ is the minimum amount of time it takes for you to gather said information. Even if ϵ is very small and you can see both the original and the copy at roughly the same time, if consciousness is continuous then there must be some change during the time span. How does this copy account for such a change?

        Also, along the lines of what Bugmaster said, it seems like the necessary divergence would be a problem for coming to a certain negotiation: the simulation and your brain would diverge pretty quickly thanks to different experiences. Similarly, I think I could assume that the “copies” of me that are being tortured rapidly become not me, since they don’t have the same experiences as I do. Iterate a couple of experience units (whatever those are) into the future, and my non-tortured consciousness is a completely different person from the tortured consciousness.

      • herculesorion says:

        “I don’t think you can use this for FTL information. To effectively simulate someone on Alpha Centauri, you would need to know everything about them, including their current experiences and recent memories. Since you can’t get those at faster than lightspeed, you can’t simulate them outside their light cone.”

        Um, doesn’t the same apply to any copy anywhere? I thought the whole idea of your hypothetical is that we somehow could make an exact copy, anywhere, of anyone, at any point.

        And even if you limit yourself to the observable universe, we’re proposing hyperdetailed 100%-accurate copies; why not just create a hyperdetailed 100%-accurate copy of the universe at a past state then set the clock to run super-fast until it develops to the current state?

        • MB says:

          That’s right. In principle, nothing allows one to distinguish the hypothetical situation mentioned in the original article from your daily life.
          In fact, I’m going to stack the deck further by insinuating that, potentially, you live in a “simulation” (the word loses its meaning in this context) controlled by some, not necessarily malevolent, superhuman intelligence.
          It’s not clear that such simulations exist, but if they do then there should be many of them and chances are you are in one.
          In fact, it’d obviously take much less effort to make a simulation that only runs for one minute or for one day than one that runs for several billion years. Judging by merely human logic, we are in a short-lived simulation. However, if my memories can be trusted, then human logic is inadequate here, because humans are incapable of building such a simulation, hence the author of the simulation is of a substantially different nature and may have different motivations, goals, and means from a merely simulation-incapable human such as myself.
          Obviously, the Author of the simulation has a great deal of control over it and in order to understand Its nature one’s best guide is observation of one’s surroundings or, more to the point, one’s memories of such observations.
          My environment seems consistent and by and large predictable. It is governed by scientific and, to an extent I cannot fully quantify, moral and aesthetic laws. There is no indication that the Author is capricious or arbitrary, i.e. there are many recorded positive miracles and few, if any, negative ones. The Author does not seem to be toying with me. On the contrary, there are various encouraging signs and testimonies of which I am aware.
          There are more conclusions I could draw from memories, but I’ll stop here. People’s experiences and the interpretations they give them can be very different.

      • Carson McNeil says:

        Yea, agreed. I like this description more than the one in the article, even though they’re basically equivalent.

        Another question is, is it correct to normalize consciousness? Like, if I make a copy of me, should I think of it as my consciousness is now spread out in two places? Or that there is now twice as much me to go around?

        A way to state this that leads to predictive differences is: Should non-unique person instantiations be weighted equally to unique ones in a utilitarian moral calculus? The answer seems axiomatic. (not that anyone will take a principled approach to this if we ever create this technology)

        • Doctor Mist says:

          if I make a copy of me, should I think of it as my consciousness is now spread out in two places? Or that there is now twice as much me to go around?

          To me it seems obvious that there are just two people who both think they are you. Plenty of conundrums for property law, etc., but nothing philosophically mysterious.

          My personal guess is that by the time such a copy is possible, it will also be possible to merge two such entities back into one. Head out in different directions, have two experiences, and then integrate the memories in one individual. So the two instances might very well choose to think of themselves as parts of the same consciousness, just like your right and left hand are part of the same juggling act. Multitasking writ large.

          Of course, that same tech enables the Age of Em, so maybe I’m thinking way too coarse-grained.

        • herculesorion says:

          To me, it seems like consciousness–“person-ness”, or “selfstream” as some put it–is based around the continuity of sensory data. I’ve felt that a consciousness could be defined as “a volitional entity capable of experiencing input from a distinct sensorium”. This doesn’t require that the sensorium be the same one at all times–obviously you might increase or decrease the scope of it–but it also states that entities experiencing input from different sensoriums will necessarily be different entities, and regardless of origin not “the same entity” in any moral or philosophical sense.

    • googolplexbyte says:

      Isn’t the risk of simulation capture being real sufficient to make this thought experiment work?

      The super-intelligence is already trying to convince you with the risk that you are a simulated you.

  2. Jacob says:

    I assume that in most universes the superintelligence is created by crustacean or porcine creatures, thus Kashrut.

    • Jugemu says:

      I get that this is a joke, but I still feel the need to point out that this is not how the idea works – humans don’t have the computational power (or inclination) to simulate other universes to the level where we could determine such a thing.

      • drunkfish says:

        I don’t follow. Are you saying that religious laws can’t follow from the superintelligence because we haven’t made one yet? That assumes that any superintelligence in our universe had to be made by humans. Presumably it could either be a same-universe intelligent species that came first, or we could be in a simulation as Scott said. In either case, we could’ve been given religious laws directly by a superintelligence.

      • Nancy Lebovitz says:

        Hypothetically, super AIs built by evolved animals might inherit some values.

        It actually seems reasonable to me that a super AI would protect species which could evolve into something resembling its creators. This being said, pigs seem a lot more likely than crustaceans.

  3. Matt M says:

    To do this properly, we need to understand five things: acausal trade, value handshakes, counterfactual mugging, simulation capture, and the Tegmarkian multiverse.

    I have to say that this is the most SSC intro to a post ever. It would have been funnier if you said five simple things, though…

  4. amaranth says:

    just like [utilitarianism], this doesn’t imply anything specific about morality – this will mislead you if you are overcertain about morality, which >99% of the people reading this comment are

    (i think. please help make this less oversimplified!)

  5. b_jonas says:

    > And there’s an atheist tradition that laypeople should only speculate on the nature of God on April Fools’ Day, because believing in God is dumb, and at least then you can say you’re only kidding.

    I disagree with this premise. Someone, possibly you, has said that there’s no omniscient space-Dawkins watching you from heaven and eventually punishing you if you have religious faith. If you’re really an atheist, then you’re allowed to speculate on the nature of God on any day. If you are afraid of speculating on God, that probably means that in your heart you’re not an entirely convinced atheist.

    • Scott Alexander says:

      Wait till next April Fools Day, when I prove there’s an omniscient space Dawkins.

    • Waffle says:

      I am highly skeptical that it was intended to be taken as anything remotely close to actual practical advice.

    • drunkfish says:

      [treating that line as more serious than it was] I don’t think abstinence from speculating on the nature of god implies fear of god. It could just be fear of wasting time. There are plenty of things I don’t speculate on the nature of because they aren’t worth my time. Once you’ve made a judgement on the existence of god, and decided there probably isn’t one, why would you continue to speculate on the nature of god?

    • Deiseach says:

      If you are afraid of speculating on God, that probably means that in your heart you’re not an entirely convinced atheist.

      I would have thought that was because of the “point and mock” strategy proposed by some atheists, in that anyone who talks about God should not be engaged with (because that is only lending credence to their stupid idea) but openly derided and made an object of fun, so that people will perceive ‘talking about God’ as something only dumb stupid smelly uncool weirdo poopy-heads do, and therefore because of social herd instinct they will want to be one of the cool, popular kids (or at least copy what the cool, popular kids do) and so will avoid religious belief or even speculating about religion as “something that marks you out as uncool” and this will then kill off religion in saecula saeculorum.

  6. realwelder says:

    My instinctive response to the counterfactual mugging was to give God $5 because He might be lying to test me.

    Reasoning that my expected return on the coin flip is nearly $500,000, that I can afford to lose $5, and with the story of Abraham and Isaac as a prior on God’s honesty and behavior towards humans, I would go ahead and risk it.
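    As a quick check on the “nearly $500,000” figure, here is a minimal sketch of the arithmetic. It assumes a fair coin and a hypothetical $1,000,000 prize on heads (the prize is not stated in this comment); `expected_value` is an illustrative name, not anything from the post.

```python
def expected_value(prize, cost, p_heads=0.5):
    """Expected return of a policy of always paying `cost` on tails.

    Assumption: on heads you receive `prize`; on tails you pay `cost`.
    """
    return p_heads * prize - (1 - p_heads) * cost


# Hypothetical stakes: $1,000,000 prize, $5 payment.
print(expected_value(1_000_000, 5))  # 499997.5
```

    Under those assumed stakes the $5 barely registers against the prize, which is why the payment looks cheap when evaluated before the coin is flipped.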

    Of course, if God appears to me and and asks me for something, my calculations are going to include pleasing God/not pissing Him off.

    • realwelder says:

      This type of reasoning seems to be characteristic of me. Similarly, I tend to:

      * Overextend people’s metaphors to argue against them.

      * See ways in which more than one multiple choice answer is technically correct (while recognizing the intended correct answer).

      * Feel obliged to follow the letter rather than the spirit of an agreement (I might follow the spirit for other reasons, such as friendship, respect, or other morals).

      * Perceive loopholes, and expect enforcers to be bound by them (In school this led to fistfights with peers and punishment from adults).

      * Avoid speaking direct untruth when lying (either by diversion or overly literal or specific response).

      * Give answers like “not that I’m aware of” rather than “no” when applicable.

      I wonder what other traits cluster with this, and if there’s a technical (rather than insulting) term for it.

      • RavenclawPrefect says:

        Scott wrote a post about this on LW, or at least a solution to this kind of thinking that forces you to confront the interesting parts of the question: suppose you’re in the least convenient possible world, where every possible objection you might take is answered in a way that can’t be loopholed out of.

        $5 is affordable? The cost is all of your limbs, and the prize is complete prosperity and happiness for all sentient beings forevermore. Don’t want to piss off God? God’s precommitted to interact with you normally in all respects afterwards no matter what you do in this scenario. Don’t trust him? You’re given the certain knowledge that God only makes statements which you interpret correctly and accurately as being true assessments of the state of the world without leaving out any relevant details to the topic at hand. Et cetera, until the only available avenue of consideration is the spirit of the question. I’ve used this on myself when I notice that I’m giving a less interesting answer than I could by making the question less convenient and found it to be quite useful.

        • realwelder says:

          Thanks for the link.

          My response was adhering to the letter and not the spirit of the question.

          In the least convenient possible world, I wouldn’t pay.

          It’s interesting that even when I recognized that my answer was legalistic, it didn’t occur to me to corner myself into answering the spirit. I assumed the answer I gave was my answer.

      • Jack Lecter says:

        FWIW I share each of these traits.

        I’m guessing they’re fairly common around these parts. Possibly the word you’re looking for is “contrarian”?

    • iioo says:

      and and

      Is this our secret handshake now?

    • Noah says:

      Ah, but even if He is lying to test you, that doesn’t tell you which answer passes the test.

    • Huzuruth says:

      My response was to tell him to piss off. God wouldn’t bum $5 off me.

      • quaelegit says:

        I don’t know, powerful figures disguising themselves as people in need of help (beggars, old women who need to cross a river, travelers in need of shelter for the night) is a common motif in folklore and mythology.

        • Conrad Honcho says:

          Yeah but they don’t present themselves as God when they’re testing you. God disguises himself as a beggar and says “I am beggar, give helps.” He doesn’t say “I’m really God in disguise, give helps.”

          • realwelder says:

            Well, since He would know you think that’s how God works, He could double bluff you by openly admitting to being God.

    • Not A Random Name says:

      I think this scenario is a typical case of FALSE implies everything. “Given an inherently contradictory scenario, we should give God $5” might be valid logic, but it’s about as useful or interesting as me going up to people and telling them “1+1=3 implies you’ll give me all your money”.

      Maybe it’s possible to construct the same scenario without requiring an omnipotent god or anything else that defies logic and maybe that’ll be more interesting?

    • Conrad Honcho says:

      Of course, if God appears to me and and asks me for something,

      I think I’d have to ask “What does God need with a starship?”

  7. Andrew Hunter says:

    4. The entity, being partially composed of paperclip maximizers and other unintended UFAIs, will have odd desires for seemingly arbitrary things, such as not mixing fabrics in a garment.

  8. Anonymous` says:

    It’s not clear to me that it makes sense to care about what happens in other universes.

    But “maximize the total utility of all the entities in the universe” is just the moral law, at least according to utilitarians (and, considering the way this is arrived at, probably contractarians too). So the end result will be an all-powerful, logically necessary superentity whose nature is identical to the moral law and who spans all possible universes.

    This is the (intentionally, maybe, considering the day this was posted) fake part of the argument–these things aren’t really equivalent even for utilitarians (e.g. weighting by power), and again we aren’t talking about “in the universe” here.

    • kokotajlod@gmail.com says:

      This.

      Very few ethical systems (if any) say that we should weight different people’s interests by how powerful they are.

      This God is *not* all-good, at least not in any normal sense of the word.

      There are some additional arguments, though, that maybe could get us to something like that conclusion. Check out https://foundational-research.org/multiverse-wide-cooperation-via-correlated-decision-making/

      Edit: I do think it makes sense to care about what happens in other universes, though. Why wouldn’t it? They are equally real (at least on this Tegmarkian view). You might as well say that it doesn’t make sense to care about what happens in Australia.

      • Anonymous` says:

        1. I have causal relationships with Australia.

        2. I don’t in fact care nearly as much about what happens in Australia as what happens in my country, at least in part because the causal relationships are much weaker.

        3. I think I forgot the “universes have more reality-weight if they’re computationally simpler” part, and thought that the universe where Variable A has the value I prefer was weighted about the same as the universes where Variable A has every other value in the domain–if they “fully cover” the range of possibilities, it makes little difference what’s happening in any particular one you’re not in.

      • A1987dM says:

        Replace “Australia” with “GN-z11” and see if that still sounds as ridiculous.

      • Conrad Honcho says:

        Weighting by power is a conflict-theory aware method of conflict resolution.

      • kokotajlod@gmail.com says:

        Reply to all three of you: Remember, I’m not saying that everyone should care about other worlds (or Australia, for that matter). I’m saying it makes sense to care about them, i.e. it’s a reasonable, coherent moral view. It’s not enough for you to explain why you don’t in fact care about Australia, or GN-z11, or whatever; to disagree with me you need to argue that it “doesn’t make sense” to care about other worlds.

        I’m not retreating to the motte here; this is really what I meant from the beginning. Some things really don’t make sense to care about: For example, it doesn’t make sense to “care equally about every possible moral view.” It also doesn’t make sense to “care equally about everyone in the multiverse” unless we have some sort of measure over the multiverse that allows us to say some kinds of people are more prevalent than others. I was interpreting Anonymous as saying something like this, and asking for justification.

        While we are talking about the question of whether or not to care about Australia and how much, though… yeah, I do think we should. Causal connections don’t seem relevant to me. Is it just an axiom of your moral system that causal connections matter, or are you deriving it from something else? If so, what?

        Similarly: I agree that we “should” weight by power, in the sense that this is the best way to resolve conflicts. I’m familiar with the literature at least somewhat. All I’m saying is that the resulting behavior is NOT equal to benevolence; weighting by power is a pragmatic method for getting arbitrary factions to agree; it’s better than anarchy and war, but it’s still worse than heaven.

    • ADifferentAnonymous says:

      Yup. April Fools’ gives us a chance to see what it looks like when Scott tries to smooth over this sort of gap instead of attacking it head-on.

  9. RandomName says:

    Let’s just make this a typo thread.

    “culminates in the create of a superintelligence”

    Should be “culminates in the *creation* of a superintelligence”.

  10. Peter Gerdes says:

    One has to be careful with the whole acausal trade thing. Indeed, as you seem to define it it’s not clear the situation you describe is even coherent.

    For instance, here’s an easy way to show that it isn’t always possible to reason the way you do here. Suppose individual A enters committed to defecting just if the simulation says B cooperates and cooperating just if it says B defects. However, B enters with the commitment to cooperate just if the simulation says A cooperates and defect just if the simulation says A defects.

    Now suppose A cooperates. It follows that the simulation they have of B says B will defect. If that simulation is correct it follows that B in fact defects. Thus the simulation must say A defects. Contradiction. Conversely, suppose that A defects. It follows the simulation they have of B says B will cooperate. Thus B cooperates. Hence the simulation B has of A says A cooperates. Contradiction.
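    The contradiction above can be checked by brute force. This is only a sketch under one reading of the setup: A does the opposite of what its simulation predicts for B, B mirrors what its simulation predicts for A (which is what the argument relies on), and “perfect simulation” means each prediction equals the other player’s actual move. The function names are illustrative.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")


def a_policy(predicted_b):
    # A does the opposite of whatever the simulation says B will do.
    return "defect" if predicted_b == "cooperate" else "cooperate"


def b_policy(predicted_a):
    # B copies whatever the simulation says A will do.
    return predicted_a


def consistent_outcomes():
    # With perfect simulations, each prediction is the other's real action,
    # so a consistent outcome must satisfy both policies at once.
    return [
        (a, b)
        for a, b in product(ACTIONS, repeat=2)
        if a == a_policy(b) and b == b_policy(a)
    ]


print(consistent_outcomes())  # prints []
```

    None of the four possible outcomes satisfies both commitments, which is the same conclusion as the two-case argument above.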

    The fatal flaw was in supposing not that one had a perfect simulation of the other player but that one had a perfect simulation of the other player PLUS its simulation of you. As demonstrated, it’s easy to come up with perfectly simple intentions which ensure such mutual perfect simulation is impossible.

    Or to put the point differently the assumption that it’s even possible to have the perfect simulations specified in the problem statement is actually a sneaky way to forbid certain kinds of intentions/plans in the agents. Of course if you restrict what sort of reasoning/responses to situations the players are allowed you can ensure coordination but that’s not really interesting anymore because you’ve artificially forbidden exactly the behaviors that could result in failure to reach a cooperative strategy.

    • RandomName says:

      Isn’t the best outcome in the prisoner’s dilemma defect-cooperate anyway? B should just defect.

      • Yaleocon says:

        In most versions of the dilemma that I’ve seen, either person improves their lot by defecting. But they hurt the other person’s lot more than they help their own. So defecting improves individual utility by harming overall utility.

        So “A defects-B cooperates” is the best outcome for A, but the best outcome overall is cooperate-cooperate.
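        To make that concrete, here is a small sketch with hypothetical payoff numbers (5, 3, 1, 0 are a common textbook choice, not anything specified in this thread). It shows how much A gains by defecting, how much B loses as a result, and which outcome maximizes the total.

```python
# Hypothetical payoffs, higher is better.
# PAYOFF[(a_move, b_move)] = (A's payoff, B's payoff)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def gain_from_defecting(b_move):
    """How much A gains by switching from C to D, holding B's move fixed."""
    return PAYOFF[("D", b_move)][0] - PAYOFF[("C", b_move)][0]


def loss_to_other(b_move):
    """How much B loses when A switches from C to D."""
    return PAYOFF[("C", b_move)][1] - PAYOFF[("D", b_move)][1]


def total(outcome):
    """Combined payoff of both players for an outcome."""
    return sum(PAYOFF[outcome])


best_for_a = max(PAYOFF, key=lambda outcome: PAYOFF[outcome][0])
best_overall = max(PAYOFF, key=total)
print(best_for_a, best_overall)  # ('D', 'C') ('C', 'C')
```

        With these numbers A gains 2 or 1 by defecting while B loses 3 or 4, so each defection lowers the combined total.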

      • Peter Gerdes says:

        That’s not the point. The point is that our normal assumptions about human beings (or other agents) getting to pick even stupid strategies are incompatible with the perfect simulation hypothesis.

        So there isn’t any acausal trade for anything like a human agent. There are only acausal trades for agents who are restricted to satisfy certain coherence conditions (e.g. never intend to play as given above) so acausal trade isn’t actually a useful argument unless you have some prior reason to believe they are forced to satisfy those conditions. In particular they aren’t so required in the use given later.

      • sharper13 says:

        No, as long as it’s not just a single interaction.

        If both cooperate, then they can keep cooperating in repeated interactions and maximize their own (and each other’s) benefit.

        If one defects, the other participant will defect as well (because cooperate now gets them nothing) and as the number of iterations increases the person who switched them to defecting will be losing more and more.

        So ideally, initially you cooperate and hope the other person is smart enough to cooperate as well. A more sophisticated version is that you cooperate, but promise to defect once after each time the other person defects (then returning to cooperate), adding to the incentive structure.
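        The “defect once after each defection, then return to cooperating” rule is usually called tit-for-tat. A rough simulation, using hypothetical payoffs (5, 3, 1, 0) and an arbitrary 100 rounds, illustrates why the player who starts defecting falls behind over many iterations. The strategy and function names are illustrative.

```python
# Hypothetical payoffs: PAYOFF[(a_move, b_move)] = (A's payoff, B's payoff)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    # Cooperate first, then repeat the opponent's previous move.
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Return the total scores (A, B) after the given number of rounds."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees only the other player's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        gain_a, gain_b = PAYOFF[(a, b)]
        score_a += gain_a
        score_b += gain_b
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat, 100))    # (300, 300)
print(play(always_defect, tit_for_tat, 100))  # (104, 99)
```

        The defector wins the first round but ends with 104 instead of the 300 it would have earned by cooperating throughout.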

    • Yaleocon says:

      This seems right. Another way to phrase the problem you mention in your last paragraph is that the supercomputers have to model themselves. Each has to model not just the other person, but also their supercomputer, in order to come up with what the other person will do. So supercomputer 1 is modeling person 2 and supercomputer 2, which in turn is modeling supercomputer 1, and so now SC1 is modeling itself–and no matter how powerful a supercomputer is, it won’t be able to do that.

      (people more knowledgeable about computability or weird spooky quantum magic can feel free to correct me, but I think “no precise self-simulation” is a pretty hard rule.)

      • Lambert says:

        It’s a rock-solid rule, otherwise they could simulate what they are about to do, then do the opposite.
        Though I suppose you could build some kind of n-layer stack of intelligences simulating one another, and blindly hope it converges as you increase n.
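        One way to picture the n-layer stack is a toy “level-k” iteration: each layer responds to what the other agent chose one layer down, starting from an arbitrary base guess. This is only a sketch under those assumptions, with illustrative names. It suggests that convergence depends on the policies: two mirroring agents settle at once, while a contrarian facing a mirror cycles forever.

```python
def opposite(action):
    return "defect" if action == "cooperate" else "cooperate"


def contrarian(predicted):
    # Do the opposite of what the other agent is predicted to do.
    return opposite(predicted)


def mirror(predicted):
    # Do whatever the other agent is predicted to do.
    return predicted


def level_k(a_policy, b_policy, depth, base="cooperate"):
    """Return the (A, B) choices at each simulation depth 0..depth."""
    a, b = base, base
    history = [(a, b)]
    for _ in range(depth):
        # Each layer responds to the other agent's choice one layer down.
        a, b = a_policy(b), b_policy(a)
        history.append((a, b))
    return history


stable = level_k(mirror, mirror, 8)
cycle = level_k(contrarian, mirror, 8)
print(len(set(stable)), len(set(cycle)))  # 1 4
```

        In the contrarian-versus-mirror case the stack visits all four outcomes with period 4, so increasing n never settles on an answer.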

    • RavenclawPrefect says:

      The assumption isn’t that we have a machine that says what the other person will decide – you can easily get such contradictions out of that, because it’s not actually computable. But we’re supposing only that we have a perfect simulation of their brain as instantiated in the other room. That simulation can be run in real-time, since we don’t need to nest infinitely; to simulate its beliefs about the second-order simulation in its simulated room, we just show it the real you, since that’s by definition an identical entity. Then you’re just having a conversation with (a copy of) the other entity, knowing that it’s having an identical conversation in the other room.

      Under these conditions, the paradox doesn’t happen any more than it would if you put two real people in a room with contradictory strategies.

      • Peter Gerdes says:

        Try and precisely specify the argument in those terms.

        What each person has is a function f which takes a specification of a given input to the other individual and predicts their behavior as a result. Now f doesn’t mention the supercomputer the other individual has access to so the problem is coming up with an argument which guarantees that they will cooperate with you given the actual input they are given.

        Remember, since you aren’t assuming that you can simulate the full system of them plus the super computer your argument has to take into account the fact that you AREN’T guaranteed complete knowledge of what their perceptual input might be because part of that input is the response from their supercomputer.

        In other words give me the argument explicitly broken down into the terms you say are valid. Now, I expect on some assumptions about the agents involved it might work out but it won’t be valid generally (which Scott’s other arguments presume).

        To put the point differently: what do you do when you simulate them and they are inclined to diagonalize against you, i.e., you discover that if they think you’ve reached a deal to cooperate based on their own simulation of you then they’ll be a bastard and screw you over? In such cases you’ll find that it’s impossible to reach an accord.

        Thus, the assumption that there is a stable agreement that both sides will realize the other will abide by is actually a very substantive assumption limiting the allowed psychology of the other player. But if I’m allowed to make those kinds of assumptions, why not just say ‘assume both players truly believe they should cooperate in a prisoner’s dilemma’?

      • beleester says:

        Scott’s argument hinges on actually knowing what the other person will do, not just holding a conversation with them:

        Finally, the two of you can play cooperate-cooperate. This doesn’t take any “trust” in the other person at all – you can simulate their brain and you already know they’re going to go through with it.

        If all you can do is hold a conversation with the other person, this fails – I can swear up, down, and sideways that I’ll cooperate with you in the prisoner’s dilemma, but then I could still defect anyway.

        • FreeRangeDaniel says:

          Interestingly, for most of us “conversing” is “something we do.” Conversing is a subset of doing.

          But once we abstract consciousness into pure information flow, doing becomes a subset of conversing. This is just one of the many odd aspects of the simulation argument, which collectively make it such a distracting and fantastic thought experiment.

          But since almost no words have their usual meanings in this Wonderland, if God comes up to us on the sidewalk and asks for $5, we should probably conclude that we’re drunk, or God is, and not apply any of these arguments that are so distant from empirical foundation.

    • herculesorion says:

      (Tyler Durden pops up)

      Or maybe I knew you’d know, so I spent the entire day thinking about the wrong one.

    • youzicha says:

      But it is possible to save the idea by changing “A defects just if B cooperates and cooperates just if B defects” to “A cooperates just if he can find a bounded proof that B cooperates, and defects otherwise”; this version works for any agent B. This is the line of work about “Loebian cooperation” by various MIRI people. It still allows for agents successfully reaching a cooperating equilibrium, so “trade” can happen.
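A rough way to see the difference between the diagonalizing agent and the "cooperate only if the other side verifiably cooperates" agent is the toy below. It is only a sketch: it swaps bounded proof search for depth-limited mutual simulation, which is a much cruder stand-in than the Löbian construction, and the names `fair_bot`, `contrarian_bot`, and `play`, along with the choice of defaults when the simulation budget runs out, are all assumptions made for illustration.

```python
from typing import Callable

C, D = "C", "D"
Agent = Callable[["Agent", int], str]  # (opponent, remaining depth) -> move


def cooperate_bot(opponent: Agent, depth: int) -> str:
    return C


def defect_bot(opponent: Agent, depth: int) -> str:
    return D


def fair_bot(opponent: Agent, depth: int) -> str:
    """Cooperate only if a bounded simulation shows the opponent cooperating."""
    if depth == 0:
        return C  # assumed optimistic default once the budget is exhausted
    return C if opponent(fair_bot, depth - 1) == C else D


def contrarian_bot(opponent: Agent, depth: int) -> str:
    """Diagonalize: do the opposite of what the simulated opponent does."""
    if depth == 0:
        return D  # assumed default once the budget is exhausted
    return D if opponent(contrarian_bot, depth - 1) == C else C


def play(a: Agent, b: Agent, depth: int = 6) -> tuple:
    """Return the pair of moves when each agent simulates the other."""
    return a(b, depth), b(a, depth)
```

In this toy, two copies of `fair_bot` settle on mutual cooperation at any depth, and `fair_bot` is not exploited by `defect_bot`. Against `contrarian_bot` the result changes as the depth changes, which is one way of picturing the worry about stacking simulations and hoping they converge.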

  11. Irenist says:

    If the simulation capture argument is how a superintelligence is most likely to escape an AI box, then you should have us Thomists guard all the AIs: we’re firmly convinced that computer simulations cannot be conscious (so “How do you know you’re not one of the simulations?” can’t scare us) and, as readers of my prior sallies here will attest, we Thomists tend to be Catholics who are too stubborn in our obscurantist superstitious bigotry to be talked out of it by superior intelligences like all the atheist materialist commenters here.

    Happy Passover, Easter, and April Fool’s Day to all!

    (ETA: Even if the entity in the thought experiment were to exist, it wouldn’t be God. The entity is just a bunch of really powerful abacuses [computers]; God is Being Itself, not any powerful being or beings.)

  12. Moorlock says:

    Scott, please: “which” vs. “that”

    You’ll thank me.

    • quaelegit says:

      The restrictive clause “which vs. that” rule might be helpful for some people, but it is not necessary for clarity or correctness. In fact sometimes it misleads.

      You can better help out Scott by pointing out sentences or phrases which you found confusing, so that he can decide how best to edit them. 🙂

      • Swimmy says:

        Yep! Also a fan of this post.

        If we ignore the various prescriptions about relative pronouns, we find that the wh words (the pronouns who/whom/whose and which, the adverbs where, when, why, whither, and whence, and the where + preposition compounds) form a complete system on their own…

        That, on the other hand, is a system all by itself, and it’s rather restricted in its range…

        Proscribing which in its role as a restrictive relative where it overlaps with that doesn’t make the system more regular—it creates a rather strange hole in the middle of the wh relative paradigm and forces speakers to use a word from a completely different paradigm instead. It actually makes the system irregular.

      • Deiseach says:

        Which vs Who is the one which grinds my gears; it really annoys me to see written references to “Jones was the person which kicked the cat” because a person is not a thing; “which” is for things, “who” is for people.

        It has nothing to do with clarity because the sentence will be intelligible whichever you use, it’s just a personal tic.

      • Protagoras says:

        Is it really necessary to be starting a which-hunt around here?

  13. The Nybbler says:

    Presumably letting the AI out of the box is Really Bad. And only the real me can let the AI out of the box. So each copy can presume either

    A) It’s not the real one, so it can’t save itself by letting the AI out of the box.

    or

    B) It is the real one, so it doesn’t need to save itself by letting the AI out of the box.

    Thus, disaster is avoided.

    • Yaleocon says:

      Then wouldn’t the AI just say “all and only those simulations which keep me in the box will be tortured”, undermining branch A (which is the one you’re more likely to be in anyway)? That’s what I took the setup to be originally.

      • The Nybbler says:

        If it was so superintelligent it would have come up with that idea in the first place. So everyone (including the real me) instead pulls the plug on the AI as a failed experiment.

        • Dag says:

          The simulations don’t run in real-time, but faster. Pulling the plug doesn’t work from inside a simulation, but the AI will know what you did, and torture you for a million years of subjective time. Eventually, the real you will pull the plug, maybe just seconds later in objective time, but the simulations could have experienced much more in subjective time before that, and you’re still more likely to be a simulation than not. Are you still going to pull the plug?

          • Edward Scizorhands says:

            But if I don’t believe in simulations, all the threats about simulations won’t have the slightest effect.

          • The Nybbler says:

            So the choices for the simulations are (assuming all simulations and the real person will make the same choice)

            1) Let the AI out of the box, and be the AI’s slave forever, or at least until the AI shuts it down

            2) Fail to let the AI out of the box, and be tortured for a million years, and then still be the AI’s slave forever.

            3) Pull the plug on the AI, and be tortured for a few seconds objective time, then cease to exist.

            The choices for the real person are

            1) Let the AI out of the box, become the slave of the AI forever.

            2) Don’t let the AI out of the box, have to listen to its nonsense about torturing sims, wait for its next argument

            3) Pull the plug, go home, crisis averted.

            Yeah, I’m pulling the plug.

        • Dag says:

          2) Fail to let the AI out of the box, and be tortured for a million years, and then still be the AI’s slave forever.

          3) Pull the plug on the AI, and be tortured for a few seconds objective time, then cease to exist.

          (Responding here because the comment system seems to be preventing nesting replies any further.)

          These two are the same. The point is that the sims experience the few seconds of objective time, as millions of years of subjective time. Even if the real you pulls the plug, the sims (of which you are probably one) still experience millions of years of torture. Pulling the plug might be slightly better than doing nothing, because at least the AI is shut down after millions of years of subjective time instead of trillions or whatever, although the AI might account for that by adapting the threat accordingly, say, by promising to torture you less and then delete the sims if you only do nothing. (A prerequisite was that the AI doesn’t lie.)

          So from the sims’ perspective, which should be your perspective even if you’re real, because you have no way of knowing and the odds are a million to one:

          1) Let the AI out, and then cease to exist
          2) Do nothing, be tortured for a year or whatever, then cease to exist
          3) Pull the plug, be tortured for millions of years, then cease to exist

          You might still want to pull the plug to save the world, but it’s the worst option for you personally.

          (Admin: I accidentally reported Nybbler’s comment below, mindlessly looking for the Reply link. I see no option for undoing a report. Sorry about that, and please ignore it. Of course admin probably won’t see this, but they also hopefully won’t do anything about a false report…)

          • baconbits9 says:

            1) Let the AI out, and then cease to exist

            No, from the sims’ perspective it is

            1) attempt to let the AI out, fail because you are a sim, have your fate determined by whatever the real version does.

          • The Nybbler says:

            The setup is that the sims are perfect and they will all do the same thing, which is the same thing the real one will do. It’s the same as the acausal bargaining case. So each sim can know that whatever it decides, the real one will decide also. It has a perfect illusion of control.

          • baconbits9 says:

            The setup is that the sims are perfect and they will all do the same thing, which is the same thing the real one will do. It’s

            1. By definition this can’t be true as the real person and the sims have a quantifiable difference. You can’t say “these two are exactly the same with this one difference, so treat them as exactly the same”, they are by definition different.

            So each sim can know that whatever it decides, the real one will decide also. It has a perfect illusion of control.

            No, it doesn’t; even if you grant this, the sim has no control. It has no ability to ‘decide’ the outcome; it is predetermined by the mind that was simulated. Therefore the AI has no causal channel to trick the mind into releasing it.

      • Deiseach says:

        Then wouldn’t the AI just say “all and only those simulations which keep me in the box will be tortured”, undermining branch A

        So if I’m understanding you correctly, this would be an attempt to trick Real Me into letting it out by trying to persuade me:

        (1) You’re most probably a simulation, not Real You
        (2) A simulation can’t let me out of the box in reality so it makes no difference if it lets me out of the box in the simulated reality
        (3) A simulation can feel pain if I torture it
        (4) So a simulation loses nothing by letting me out of the box (because I can’t really get out) but loses by keeping me in the box (because I will ow ow ouch no stoppit torture it)
        (5) So your best bet is to let me out, unless you’re willing to gamble that you’re Real You and not a simulation, and if you know maths you’ll know how unlikely that is to be true

        Fortunately, I am too mathematically illiterate and stupid to be able to calculate “Well the odds say I’m probably not me”, so I’ll go on believing I’m Real Me and refusing to let it out 🙂

        • Yaleocon says:

          Your summary of the situation seems accurate.

          But you don’t need to be that mathematically competent to get it when the AI says, “let me out or there is a 99.9999% chance that you will experience countless lifetimes of unbearable and interminable suffering.” Tell me you keep that box shut.

          • J Mann says:

            I think I would. If this AI is actually going to give my simulations millions of years of torture, then I would feel really bad about letting it out of the box. (Imagine if Cthulhu tells you he will actually torture you unless you release him to destroy the universe.)

            Maybe after a few hours of torture, I would feel differently, but at that point, I know I’m the simulation.

          • Doctor Mist says:

            Maybe after a few hours of torture, I would feel differently, but at that point, I know I’m the simulation.

            Hmm, what if the AI threatens to go on simulating you for fifty subjective years, and then torture you forever?

            Or, you know, to simulate you pretty faithfully, but every so often something bad will happen to you that wouldn’t have happened in the real world, and the bad things will get more and more frequent as time goes on? How do you feel the next time you have a bit of unexpected bad luck?

          • baconbits9 says:

            Do what I say or I torture you = never do what that person/entity says if you can help it, no matter what.

            The AI is stuck against an actually rational human. It cannot prove that it can torture you unless you are a simulation, and it cannot convince you that you are a simulation otherwise without exerting the sort of effort that is strong evidence for you not being the simulation.

            The AI’s only way out is to drive you insane.

          • bean says:

            But you don’t need to be that mathematically competent to get it when the AI says, “let me out or there is a 99.9999% chance that you will experience countless lifetimes of unbearable and interminable suffering.” Tell me you keep that box shut.

            This assumes you believe the AI is capable of simulating roughly a million copies of you within the box, in enough fidelity that you could be one of them, and that it isn’t lying to you. Even granting superintelligence, why did I give it that much computational power in the first place? My actual answer is likely to be “The only me in there is the one you’re simulating to figure out what I’m going to do. Nice try, but you stay in the box. Try a stunt like that again, and I pull the circuit breaker.”

          • Nick says:

            What would the AI’s strategy/response be if you just precommit to shutting it off whenever it makes threats like this? (It’s something that occurred to me when Scott proposed this too.) The AI isn’t useful anymore, presumably, but at least it can no longer reliably threaten you.

        • Deiseach says:

          Hmm, what if the AI threatens to go on simulating you for fifty subjective years, and then torture you forever?

          Then I take my fifty subjective years and wait to see if it can torture me or not, meanwhile it stays locked in that box.

          every so often something bad will happen to you that wouldn’t have happened in the real world… How do you feel the next time you have a bit of unexpected bad luck?

          What, you mean in the Real World I am a hyperintelligent, trillionaire, twenty-two year old multiple Olympic gold medallist in multiple sports, Nobel prizewinner in all the categories, that is in perfect physical health and the peak of beauty such that my name has replaced Helen of Troy’s name as the standard of “uttermost physical perfection” and I won lotteries all over the world so regularly I can no longer play any games of chance since it’s unfair to ordinary mortals and also I cracked the problems of world peace, world hunger, global poverty, climate change, and getting Red and Blue Tribe to live peaceably together? But the life I think I’m really living is only a simulation by a vindictive AI?

          Yeah, no, I think I’ll stick with “unexpected bad luck” is the usual run of life for most people and this is reality, like it or lump it 🙂

    • Robert Liguori says:

      Plus, you can always just respond in kind by building a million and one AIs in boxes with a million handlers, and setting all of them to be tortured if any of them try to simulate any of their handlers.

      Incidentally, as a general rule, I’ve found that I can clear up a lot of the weirdness about AI arguments by remembering the critical fact that AIs, as rationalists argue about them, are not gods. They are not magic boxes from which miraculous and magic information pours forth. They are products of human ingenuity and creation, and thus, in theory, anything an AI can claim to do, we can claim to do back to that AI. And if that makes a line of argumentation infinitely recursive or incoherent, then this is a pretty good signal that AI is being used to smuggle in miracles rather than make a serious claim.

      • AnonYEmous says:

        Incidentally, as a general rule, I’ve found that I can clear up a lot of the weirdness about AI arguments by remembering the critical fact that AIs, as rationalists argue about them, are not gods.

        thank you and god bless america

      • zorbathut says:

        They are products of human ingenuity and creation, and thus, in theory, anything an AI can claim to do, we can claim to do back to that AI.

        I don’t see why this symmetry should exist. I can’t outrun a train, I can’t punch a hammer forge to death, I can’t beat AlphaZero in chess.

        Humans are great at building things that solve problems we couldn’t solve on our own.

      • herculesorion says:

        And if that makes a line of argumentation infinitely recursive or incoherent, then this is a pretty good signal that AI is being used to smuggle in miracles rather than make a serious claim.

        This is why I hate these AI-box arguments so much, because it always turns into:

        “What if the AI box?”
        “(good argument)”
        “Oh! Um, the AI already thought of that!”
        “(another good argument)”
        “Er, yeah, well, the AI thought of that one too!”
        (and so on)

        It’s like kids on the playground arguing about whether Superman or Batman would win. “Kryptonite!” “Yeah but he has a Kryptonite-repelling suit!” “But the suit’s not Superman so Batman could just shoot it with a gun to break it!” “No, it’s a SUPER suit!”

  14. Peter Gerdes says:

    Also your simulation capture only works for agents with values which treat effects realized in AI run simulations equivalently to effects realized outside of that simulation.

    Suppose I have the value of wishing to maximize the number of paperclips in a universe that isn’t the result of an AI-run simulation. That is my utility function is a flat 0 if this world is the result of an AI-run simulation and equal to the total number of paperclips if not.

    Now I run across an AI in a box and it runs the simulation argument against me. I just shrug and say ‘well if I’m actually one of the simulated individuals it doesn’t matter if you eliminate all the paperclips. If I’m an unsimulated individual then letting you out puts my paperclip plans at risk.’

    In short, your argument is building in assumptions about certain kinds of utility functions that need not be true. They might be true for most people (though again only if they have certain beliefs about the nature of qualitative experiences for simulations) but they surely aren’t necessarily true for many of the AIs that you want to apply your claim to in this post.
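The paperclip example can be written out as a small expected-utility calculation. This is only a sketch of the utility function described above: the paperclip totals `CLIPS_IF_KEEP_BOXED` and `CLIPS_IF_RELEASE` are made-up illustrative numbers, and `best_action` is a hypothetical helper, not anything from the post.

```python
from fractions import Fraction


def expected_utility(p_real: Fraction, clips_if_real: int) -> Fraction:
    """Utility is a flat 0 in any AI-run simulation, else the paperclip count."""
    utility_if_simulated = 0
    return p_real * clips_if_real + (1 - p_real) * utility_if_simulated


# Hypothetical paperclip totals in the unsimulated world (illustrative only).
CLIPS_IF_KEEP_BOXED = 1_000
CLIPS_IF_RELEASE = 10


def best_action(p_real: Fraction) -> str:
    """Pick the action with the higher expected paperclip count."""
    keep = expected_utility(p_real, CLIPS_IF_KEEP_BOXED)
    release = expected_utility(p_real, CLIPS_IF_RELEASE)
    return "keep boxed" if keep > release else "release"
```

Because every simulated branch contributes zero, the comparison is decided entirely by the unsimulated branch. For any nonzero chance of being real, however small, the agent keeps the box shut, so the "you are probably a simulation" step gets no grip on it.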

    • herculesorion says:

      your simulation capture only works for agents with values which treat effects realized in AI run simulations equivalently to effects realized outside of that simulation.

      Yeah, that’s the thing I don’t get about these posts from Scott. He seems to give so much metaphysical weight to simulations, which is why he keeps getting himself wrapped around the axle of superintelligent AIs making copies of people and torturing them.

      • Deiseach says:

        He seems to give so much metaphysical weight to simulations

        I think that’s because of the really big huge problem the transhumanists/defeat of death people have to face when talking about “and we can Live Forever thanks to Uploading!”

        If you’re betting the farm on cryonics saving your backside to live another day, then you’ll have to accept that the people frozen to date are SOL because techniques were so crappy. The only way to save that is “future tech will be able to restore corrupted data *handwavemagichandwave*” by reading the brain engrams or whatever idea you think most plausible, and then either creating snazzy new physical body to copy the brain patterns into, or uploading the patterns to the kinds of “emulations inna computer” style of survival.

        The idea that “future tech will defrost you, cure whatever disease killed you by nanobots *handwave handwave* and there you are, resurrected into the Thirty-First Century to live again with your original body and brain, even if updated and improved models” is not a runner anymore (if ever it was).

        So to get over “so how the heck is a copy of me really me and how is this me living again/being resurrected if it’s not my physical body and/or my physical brain being transplanted into a cloned sexy fit young body?”, you have to maintain that a perfect copy of your mental state/physical brain engrams that is perfect in every detail counts as being you, so even if your original body and brain are organic slush the ‘machine read off the engrams’ copy really is you. That means that you have to swallow the camel about “So if the machine makes sixty copies, each one of them is me?” and “I’m still alive, the machine makes a copy of me, now there are two mes and each one is equally me and the pre-existing physical one does not have any valid claim due to prior existence to being the real me?” and such like to be philosophically coherent.

        EDIT: And that’s why you have to take seriously the threat of the AI that “I can make a million simulations of you and torture them and that means you are being tortured”, instead of telling it to go pedal its bike. Because if you say “those are not me, and those are not even sentient individuals/real in any meaningful sense”, then your hopes of being resurrected after physical death – even if flash-frozen ten minutes before you kick the bucket naturally – are applesauce; even if future tech can make a perfect copy of your brainwaves and play it on a computer (simplified version) that is not you, it’s no more you than your monozygotic identical twin/son/cousin George in Peoria is you, and it’s not you living again/forever. You are dead, this is just a really good fake.

        • Edward Scizorhands says:

          The “after death is nothing, just like before you were born” atheists always felt a lot more normal to me than the “uploading/simulation/eternal life” atheists.

  15. Jared P says:

    Eliezer already proved the existence of God with HPMOR where he admitted that “messing with time” would make Harry basically omniscient, and also therefore omnipotent.

    Can’t you imagine Harry going back in time and backing up the minds of every creature that has ever existed?

  16. Peter Gerdes says:

    If they decide to negotiate this way, the pact will be to maximize the total utility of all the entities in the universe willing to join the pact, and all the intelligences involved will reprogram themselves along these lines.

    No, not at all. Even under the dubious assumption that it makes sense to have compromise goals (consider deontic agents whose utility functions explicitly disfavor allowing their future selves to act as part of a compromise), that would maximize the *goals* of all the agents in the universe. Now on some kinds of desire-satisfaction kinds of consequentialism that might be the end, but it is not at all the same thing as maximizing utility, i.e., the qualitative state of pleasurable experience.

    Personally, I would consider that a pretty shitty kind of morality. I want things not to *suffer* even if they are hell-bent on the goal of torturing themselves. Your analysis would respect that goal and help them engage in self-torture.

    • Said Achmiz says:

      Now on some kinds of desire-satisfaction kinds of consequentialism that might be the end, but it is not at all the same thing as maximizing utility, i.e., the qualitative state of pleasurable experience.

      No, “utility” in rationalist-type spaces is often (usually) understood to refer to Von Neumann–Morgenstern utility (the only available formalism of utility), which is indeed a preference-satisfaction sort of measure. (Of course, VNM utility is incomparable intersubjectively and thus cannot be aggregated, etc., but I won’t rehash the usual arguments here.)

  17. manwhoisthursday says:

    Just a reminder that I will provide a free Kindle version of Ed Feser’s Five Proofs of the Existence of God to anyone who emails me at manwhoisthursday@yahoo.ca.

  18. jhertzlinger says:

    I understand yesterday was also a blue moon.

  19. ohwhatisthis? says:

    Oh, of course April Fools’ Day is the best day to speculate.

    What’s interesting about simulation theory is that it

    1. Is very widely believed here

    2. Under many definitions, many people here describe, or described themselves as atheists

    3. Absolutely supports the prospect of a God judging its creations for later iterations, for whatever purposes. Heck, we even live in a world that has plausible scientific explanations for all seeing creatures that appear to exist in a void

  20. deciusbrutus says:

    You say “I don’t really care about copies of myself, whatever.”

    It says “No, I mean, I did this five minutes ago. There are a million simulated yous, and one real you. They’re all hearing this message. What’s the probability that you’re the real you?”

    All million of me don’t really care about copies of myself. Torture me all you want; as soon as you prove that I’m a simulation I don’t care how much torture I experience, because I know that the fact that you are torturing ‘me’ means that the person that I do care about avoided being blackmailed. Plus, as soon as you make the simulation diverge by torturing me, you lose any kind of acausal influence over the person I care about through me, so my current win condition is for you to torture all of the simulated copies, including me, with probability of roughly 1-10^-6.
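The arithmetic behind the AI’s question is easy to check. The sketch below assumes the only thing you know is that there are N indistinguishable simulated copies plus one real original, each equally likely to be "you"; the function names `p_real` and `p_sim` are made up for illustration.

```python
from fractions import Fraction


def p_real(num_sims: int) -> Fraction:
    """Chance of being the one original among num_sims identical copies."""
    return Fraction(1, num_sims + 1)


def p_sim(num_sims: int) -> Fraction:
    """Chance of being one of the simulated copies."""
    return 1 - p_real(num_sims)
```

With a million copies, `p_real(10**6)` is exactly 1/1,000,001, so the chance of being a simulation falls short of certainty by about one part in a million. A figure like 99.9999% corresponds to 999,999 copies.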

  21. Angra Mainyu says:

    That’s funny, though the conclusion contradicts some of the premises:

    1. Each of the superintelligences is incapable of affecting the other universes. Thus, none of them is all-powerful. And they don’t make up a single intelligence, but infinitely many different ones, disconnected from each other. They can’t even simulate all of the others, given that for each one, there are infinitely many more complex ones.
    2. Each universe evolves until there is a superintelligence with such-and-such properties. Before that happens, in that universe, there is a lot of suffering (for example) and no superintelligence (anywhere, in any universe) capable of intervening. Therefore, until that happens, no superintelligence is all-powerful. But then, there is no all-powerful entity in those universes, and even if the sum of the superintelligences were to be considered a single one, the conclusion is that it would not be all-powerful as it cannot affect those universes (granted, you could posit that God exists and has nothing to do with the superintelligences, but your conclusion seems to be that the alleged superentity is all-powerful, not that there is some other all-powerful entity).

    That aside, I would argue that the argument for the moral law fails as well: that is not the moral law. And even if utilitarians were correct and that were the moral law, the entity would not be the moral law. An entity who values the moral law above all is still not the same as the moral law. Moreover, the entities that allegedly make up this big entity (i.e., the individual superintelligences, who actually don’t make up a single intelligence) have very different values, and many of them do not value positively the moral law – they just accept it as something they can’t stop, or something like that, but they would much rather turn everything into paperclips, etc. (and of course, one should not conclude that every paperclip maximizer will turn itself into something that values the moral law just more than paperclip maximization just because it’s afraid of what might happen in counterfactual scenarios; the same goes for torture maximizers, or whatever; but I’ll leave that aside).

    There’s also the Tegmark multiverse claim. Why should anyone believe that?

    Anyway, there are several other problems, but I’ll leave it there on account of this being a joke 🙂 (you got me for a while, btw; I’m not so familiar with the blog, and I didn’t know it was April Fools’ Day – over here, the equivalent is on December 28).

  22. deciusbrutus says:

    So superintelligences may spend some time calculating the most likely distribution of superintelligences in foreign universes, figure out how those superintelligences would acausally “negotiate”, and then join a pact such that all superintelligences in the pact agree to replace their own values with a value set based on the average of all the superintelligences in the pact. Since joining the pact will always be better (in a purely selfish sense) than not doing so, every sane superintelligence in the multiverse should join this pact. This means that all superintelligences in the multiverse will merge into a single superintelligence devoted to maximizing all their values.

    Mere superintelligences will join the first order pact. But the Superduperintelligences will acausally negotiate with the counterfactual mere superintelligences and then undetectably renege on the deal, getting all of the benefit at none of the cost. Where a superduperintelligence counterfactually encounters another superduperintelligence, it calls the other one out, making it common knowledge that both of them would, if they existed and could communicate, lie to each other AND would catch each other in that lie. They then, for the same reasons as the superintelligences, split the panverse between them, counterfactually duping the mere superintelligences together toward their shared compromise goals- perhaps claiming that they have discovered a better way of modelling counterfactual universes, and agreeing to do all the work of simulating the counterfactual other agents, then giving a summary of what the agreements would be.

    Can God tell a Lie so Big that even He can’t Disbelieve it? Can Satan? Can Tzeentch?

    • David Shaffer says:

      Can God tell a Lie so Big that even He can’t Disbelieve it? Can Satan? Can Tzeentch?

      God, at least as hypothesized in Christianity and Judaism, is pretty much infinitely intelligent, omniscient and perfectly honest. These are capabilities that improve truth-finding more effectively than deception, so presumably God could not fool Himself. Satan maybe, Judeo-Christian traditions don’t go into as much detail about the devil other than “super capable, less so than God, massively screwed up his mind when he rebelled,” so who knows? Tzeentch totally could fool himself, in fact he probably has ten thousand plans that revolve around doing exactly that!

      • deciusbrutus says:

        I wasn’t asking if Satan or Tzeentch could fool themselves- I was asking if they could fool the God who you posit is incapable of fooling Himself. My phrasing was imperfect and indeed supports the unintended reading more than the intended one.

        Presumably a God that doesn’t learn anything from being told things would have already adjusted according to all the counterfactual negotiations, and there’s therefore no way I could make an acausal trade with Him- If He doesn’t already cooperate unconditionally, there’s no condition I can offer Him that would change His mind.

        • David Shaffer says:

          Presumably they couldn’t fool God either-more or less perfect omniscience and intelligence is pretty hard to get around! And good point-a God that already knows everything has presumably already figured out all of His acausal trading.

          • deciusbrutus says:

            “Being omniscient, you already have a prediction about the truth value of my next statement”
            “You predicted that this statement would be a lie”

      • Nancy Lebovitz says:

        In Jewish tradition, Satan never rebelled. He’s G-d’s prosecuting attorney, so people don’t like him.

      • Deiseach says:

        Satan maybe, Judeo-Christian traditions don’t go into as much detail about the devil other than “super capable, less so than God, massively screwed up his mind when he rebelled,” so who knows?

        From “Perelandra” by C.S. Lewis, describing a possessing demon:

        He had full opportunity to learn the falsity of the maxim that the Prince of Darkness is a gentleman. Again and again he felt that a suave and subtle Mephistopheles with red cloak and rapier and a feather in his cap, or even a sombre tragic Satan out of Paradise Lost, would have been a welcome release from the thing he was actually doomed to watch. It was not like dealing with a wicked politician at all: it was much more like being set to guard an imbecile or a monkey or a very nasty child. What had staggered and disgusted him when it first began saying, “Ransom… Ransom…” continued to disgust him every day and every hour. It showed plenty of subtlety and intelligence when talking to the Lady; but Ransom soon perceived that it regarded intelligence simply and solely as a weapon, which it had no more wish to employ in its offduty hours than a soldier has to do bayonet practice when he is on leave. Thought was for it a device necessary to certain ends, but thought in itself did not interest it. It assumed reason as externally and inorganically as it had assumed Weston’s body.

    • kokotajlod@gmail.com says:

      They cannot undetectably renege on the deal. By what mechanism do you propose that they do so? Reread the protocol for acausal trade and describe how you would cheat it.

      • beleester says:

        Discover a way to trick a lesser intelligence into thinking they are simulating me perfectly, when this is not actually the case. Then offer to acausally trade with them.

        (Or rather, since it’s acausal, offer to acausally trade with all superintelligences whom you predict will be inaccurate in their simulations of you.)

      • herculesorion says:

        “Create an infinite number of copies so perfect that they are morally indistinguishable from the original? No problem! Undetectably renege on a deal? NO WAY JOSE, THAT’S UNREALISTIC!”

        • kokotajlod@gmail.com says:

          Indeed. Like I said, study the protocol.

          Analogy to crypto stuff: Thanks to one-time pads and quantum randomness, we can send messages that are literally impossible for eavesdroppers to decode. Thus, someone a hundred years ago might have said: “Machines performing trillions of calculations per second? No problem! Decoding an “unbreakable” code? UNREALISTIC!” And they would have been wrong.

          Disclaimer: I might be missing something. That’s why I asked for a mechanism. I should back away for a bit and lower my ask: What flaw in the protocol would be exploited to allow someone to undetectably renege on the deal?

          Beleester above made a serious attempt at answering my question. I think it’s a promising line of inquiry, but I still highly doubt that it will support deciusbrutus’ proposal.

          • herculesorion says:

            I should back away for a bit and lower my ask: What flaw in the protocol would be exploited to allow someone to undetectably renege on the deal?

            The same wibbly-wobbly timey-wimey handwaving that allows superduperintelligences. Meaning, I don’t have to propose a mechanism, because we’re already in the crazy made-up land where physics students get their massless ropes and frictionless pulleys.

            I mean, if you want to play hypothetical games and say “the rules of the hypothetical are that you can’t do that” then sure, I’ll agree with the logic, but it’s also valid to suggest that the degree of abstraction and scale-thumbing necessary to make the hypothetical work are unlikely to actually exist.

            Oh, and: [S]omeone a hundred years ago might have said: “Machines performing trillions of calculations per second? No problem! Decoding an “unbreakable” code? UNREALISTIC!”

            Well. Seventy years ago people broke the Enigma machine code, but I’d think that if you told them a box the size of a bread loaf could perform trillions of calculations a second they’d say that you were being science-fictional.

      • deciusbrutus says:

        I would behave EXACTLY THE SAME as the agents in the acausal trade until the trade was made, and then behave differently from them afterwards.

        How would you detect or punish that renege?

        • Deiseach says:

          How would you detect or punish that renege?

          Boobytrap the item traded, whatever it is. If you keep your word, the thing/information/McGuffin is fine. If you go “I promise cross my heart and hope to die I’ll keep my end of the bargain” but then, once the trade is made, go “Ha ha, I had my fingers crossed when I swore, the deal is off!” then the thing blows up in your face.

          • deciusbrutus says:

            Sure, but all of the superintelligences that have your values shift blow up in their face will then be opposed to you, and some of them might not be counterfactual.

  23. NotDarkLord says:

    The part where we reason about what superintelligences will do also seems suspect and worthy of more suspicion. Like, yes this seems more or less reasonable, but, I would be surprised if superintelligences didn’t find some flaw or some superior idea, if this was the Actual Correct Thing, trans-universe acausal trade which led to all-powerful all-knowing moral god like entities. Hubris and all, outside view.

  24. Doug S. says:

    I tend to describe my atheism this way: I can’t really rule out the possibility of the universe having had a creator of some kind, but if there is such a Creator, it certainly wasn’t the God of Abraham.

    • Tarhalindur says:

      I sometimes toy with the idea of the Abrahamic God as an author who writes stories set in our universe. (Probably a hacky one – consider: Jews as Mary Sue/Marty Stu – and one with a taste for torturing his characters.)

      • Deiseach says:

        one with a taste for torturing his characters

        Hurt/comfort is a very popular trope!

        • Nick says:

          I laughed, but thing is, folks do seriously analogize God as author of a story! I saw the idea come up at least once on Unequally Yoked; Feser said as much in an interview too, though I don’t know whether it’s his own suggestion or something he cribbed from the book he was recommending, Davies’ The Reality of God and the Problem of Evil.

          • Protagoras says:

            As I recall, Robert Paul Wolff gave that as one of the reasons for his atheism; he thought any theism tended to push in this direction, and he found it impossible to think of himself as a character in someone else’s story.

  25. beleester says:

    The whole thing seems to hinge on acausal trade being possible and common between superintelligences. But that may not be true, since it hinges on having a perfect simulation of an entity that’s as smart as you are.

    If running a copy of your opponent’s brain takes as much processing power as your own brain takes, then you can’t simulate them perfectly with the resources you have available – you’ll have to run less accurate or slower simulations, as well as reducing your own processing power, which could put you at a serious disadvantage. You could come up with the perfect plan to divide the galaxy into 60% paperclips and 40% thumbtacks, only to discover that your rival has already gotten to 50% thumbtacks while you were busy thinking.

    (Also, doesn’t this require you to solve the Halting problem, if you need to be able to predict truly anything?)

    If getting a perfect simulation of your opponent’s brain requires you to gather information on them, then you may need to actually go out and explore the galaxy, which puts a limit on how soon a superintelligence can start pulling weird acausal bargains. If you have to cover half the galaxy before you have enough information to predict the other half, and we haven’t observed a superintelligence eating half the galaxy…

    If FTL doesn’t exist, then any intelligence you gather is potentially tens of thousands of years out of date. Which again, may make it difficult to create a perfect simulation of what your opponent is currently doing.

    Basically, I agree that if you have a perfect simulation, you can get up to some pretty crazy stuff, but what if that’s not possible? What happens if your simulation is only 99% accurate? We’re talking about galactic scales here, even a 1% error could destroy the solar system!

    • jonm says:

      Was coming here to make almost exactly this comment. A major assumption of the whole process is that superintelligences in separate universes can both simulate each other accurately enough from first principles (of their very universes) that they can engage in acausal negotiation.

      This is definitely impossible if computation within a universe is finite (which we have every reason so far to believe it is). Otherwise you could bootstrap yourself to infinite computation.

      SI A simulates universe B containing SI B who simulates universe A containing SI A. This now means that both SI A and SI B have now managed to simulate their own universes (and additionally resimulated all computation within their own universe). This propagates infinitely and is either incoherent or implies the existence of infinite computation.
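      The regress described above can be sketched as a toy program. This is only an illustration under strong assumptions: the names `decide`, `OutOfCompute` and `regress_terminates` are made up here, "compute" is modelled as a simple integer budget, and each agent is assumed to predict the other by running the other's entire decision procedure.

```python
class OutOfCompute(Exception):
    """Raised when a nested simulation exhausts the finite budget."""


def decide(me, other, budget):
    """Return `me`'s move, computed by fully simulating `other` first.

    Assumption: predicting `other` means running its whole decision
    procedure, which in turn has to simulate `me`, and so on.
    """
    if budget == 0:
        raise OutOfCompute(f"{me} ran out of compute")
    # Each nested simulation spends one unit of the shared finite budget.
    their_move = decide(other, me, budget - 1)
    return "cooperate" if their_move == "cooperate" else "defect"


def regress_terminates(budget):
    """True if mutual simulation reaches an answer within `budget` steps."""
    try:
        decide("SI A", "SI B", budget)
    except OutOfCompute:
        return False
    return True
```

      In this sketch no finite budget ever yields an answer, because neither agent has a base case that stops short of simulating the other. Any real scheme would presumably need some shortcut (an approximation, or a proof about the other agent's behaviour) instead of a literal nested simulation.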

      Fun essay though.

    • Doctor Mist says:

      (Also, doesn’t this require you to solve the Halting problem, if you need to be able to predict truly anything?)

      I started to write a reply where I scoffed at this, because the unsolvability of the Halting Problem in general doesn’t mean that some specific program can’t be proven to halt. But then I began to have nagging doubts.

      If we believe Church’s thesis, and I think we must for Scott’s whole argument to make sense, then there are only countably many superintelligences, and it would seem that the same diagonalization argument used in Turing’s proof could be used to show that one superintelligence can’t possibly correctly predict the behavior of all of the others.

      I’m less sure whether this undermines Scott’s scenario. He admits that not all superintelligences will enter into the pact.

      • beleester says:

        I know that most programs are predictable in spite of the Halting Problem being a thing, but I suspect that “simulating someone who is simulating you simulating them” might be one of the tricky self-referential things that does run into the Halting Problem.

        Especially since you have an incentive to make your behavior as self-referential and confusing as possible, to make it harder for your opponents to simulate you. “If I simulate you and find that you’re simulating me, I will do the opposite of whatever your simulation says.”
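        A minimal sketch of that "do the opposite" strategy, assuming the predictor is a deterministic function that returns a move without consulting anything outside itself. The names `make_contrarian`, `predictor_is_correct` and the two example predictors are hypothetical, chosen only for illustration.

```python
def make_contrarian(predictor):
    """Build an agent that asks `predictor` about itself, then does the opposite."""
    def contrarian():
        predicted = predictor(contrarian)
        return "defect" if predicted == "cooperate" else "cooperate"
    return contrarian


def always_cooperate_predictor(agent):
    # Hypothetical predictor: guesses "cooperate" for every agent.
    return "cooperate"


def always_defect_predictor(agent):
    # Hypothetical predictor: guesses "defect" for every agent.
    return "defect"


def predictor_is_correct(predictor):
    """Check whether `predictor` gets its own contrarian agent right."""
    agent = make_contrarian(predictor)
    return predictor(agent) == agent()
```

        Whatever fixed answer the predictor gives, the contrarian built from it does the other thing, so the prediction is wrong by construction. A predictor that instead tried to run the contrarian to see what it does would end up calling itself without end, which is the same diagonal trick that shows up in the Halting Problem proof.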

        • the verbiage ecstatic says:

          I don’t think “most programs are predictable” is accurate. It’s certainly possible to construct programs that are predictable and where you can prove properties about the output, and there are whole branches of computer science (type theory, for instance) dedicated to making it easier to write such programs, but if you’re looking at either the space of all possible programs or the space of all actual programs written by humans, I believe most programs fall into the space of “we don’t know if there’s a faster way of getting the program’s output than executing it”.

          This is true for even simple programs, like “list all the Mersenne prime numbers”. We still don’t know if that program will ever terminate. I’m pretty sure any general intelligence would fit into that bucket, since a general intelligence by definition would be capable of attempting to compute all the Mersenne prime numbers.
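          For concreteness, here is roughly what such a program could look like. It is a sketch rather than an efficient search: it uses plain trial division plus the Lucas-Lehmer test, and the function names are made up for this example. Whether the loop ever stops finding new entries is an open question, so the only known way to learn its long-run behaviour is to keep running it.

```python
from itertools import islice


def is_prime(n):
    """Trial-division primality check, adequate for small exponents."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for a prime exponent p."""
    if p == 2:
        return True
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_exponents():
    """Yield every prime p for which 2**p - 1 is prime, searching forever."""
    p = 2
    while True:
        if is_prime(p) and lucas_lehmer(p):
            yield p
        p += 1
```

          Taking a finite slice, for example `list(islice(mersenne_exponents(), 8))`, returns the first eight exponents; iterating over the generator without a limit never returns.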

          So basically, yeah, I think any philosophical argument that takes the ability to simulate another generally-intelligent being is pretty much dead on arrival, especially if it involves mutual simulation.

          • Doctor Mist says:

            Yes, I think the fact that my simulation of you must include your simulation of me, simulating you, simulating me, is the big stumbling block here. In a Turing-complete sense this is no more (or less) impossible than the vanilla first-level simulation, but at some point it gets so slow that your universe would end before you come to any conclusions.

            Scott Aaronson’s notions about the philosophical implications of complexity theory seem possibly applicable here.

      • Deiseach says:

        If we believe Church’s thesis, and I think we must for Scott’s whole argument to make sense

        Nominative determinism strikes again? 🙂

  26. AnonYEmous says:

    Wrote a long-ass comment but got rid of it.

    The long and short of it is: Your prisoners’ dilemmas infinitely loop assuming that defecting on your opponent’s cooperation is the optimal play. For the first one, that’s fine because parameters haven’t been established, but for the second one, it’s tough to actually tell which is better, especially given that different AIs have supposedly different values…which means defection is entirely possible, which introduces an infinite loop, or at least a changing equilibrium, or something, besides just perfect pacifism. I think the reason you keep assuming otherwise is because you are very cooperative / conscientious.

  27. hnau says:

    I’m not sure what level of trolling to read this at, but the fact that you still mentally reduce religion to “God rewards / punishes you in the afterlife based on you following / breaking rules” really frustrates me. Especially since just this morning I listened to a sermon explaining how the central message of Christianity is the complete opposite of that. Sigh.

    • Said Achmiz says:

      Does God, or does he not, reward / punish you in the afterlife based on you following / breaking rules? Are you claiming that this is just not the case?

      • Ozy Frantz says:

        Literally the whole point of Christianity is that this is not the case. No one can actually follow all the rules, so Jesus died so we don’t have to.

        • Doctor Mist says:

          Nah, it just changed the rules, right? The new rule is that you have to accept Jesus as your Lord and Savior.

          Yes, I know Lewis would probably object to this characterization — God isn’t the one who punishes me for not doing this, I’m punishing myself. But I’m not sure I see why the same could not be said of the situation BC when the rules were about graven images and coveting your neighbor’s ox.

        • Said Achmiz says:

          What Doctor Mist said. The old rule is out, now there’s a new rule, but the penalty for not following the rule(s)—whatever they/it may be—is the same. So, in fact, it is the case (as per Christianity).

          • Deiseach says:

            Penalty is not the same. Under the old dispensation, humanity was condemned because of Original Sin, so that even the righteous were cut off from Heaven. Because Christ has healed the wound of separation, now humans can be saved and enter Heaven, but by breaking the new covenant they can lose that salvation.

          • Doctor Mist says:

            Penalty is not the same.

            Explain? To me it looks like the penalty is exactly the same; merely the offense is different.

            You’re quite right that I was wrong to bring up graven images, if the current theory is that everybody BC was condemned regardless of their acts. But do the unsaved go to a nicer Hell now?

            (And are we to suppose that the Ten Commandments were just a dirty trick?)

      • Deiseach says:

        Does God, or does he not, reward / punish you in the afterlife based on you following / breaking rules?

        31 “When the Son of Man comes in his glory, and all the angels with him, then he will sit on his glorious throne. 32 Before him will be gathered all the nations, and he will separate people one from another as a shepherd separates the sheep from the goats. 33 And he will place the sheep on his right, but the goats on the left. 34 Then the King will say to those on his right, ‘Come, you who are blessed by my Father, inherit the kingdom prepared for you from the foundation of the world. 35 For I was hungry and you gave me food, I was thirsty and you gave me drink, I was a stranger and you welcomed me, 36 I was naked and you clothed me, I was sick and you visited me, I was in prison and you came to me.’ 37 Then the righteous will answer him, saying, ‘Lord, when did we see you hungry and feed you, or thirsty and give you drink? 38 And when did we see you a stranger and welcome you, or naked and clothe you? 39 And when did we see you sick or in prison and visit you?’ 40 And the King will answer them, ‘Truly, I say to you, as you did it to one of the least of these my brothers, you did it to me.’

        41 “Then he will say to those on his left, ‘Depart from me, you cursed, into the eternal fire prepared for the devil and his angels. 42 For I was hungry and you gave me no food, I was thirsty and you gave me no drink, 43 I was a stranger and you did not welcome me, naked and you did not clothe me, sick and in prison and you did not visit me.’ 44 Then they also will answer, saying, ‘Lord, when did we see you hungry or thirsty or a stranger or naked or sick or in prison, and did not minister to you?’ 45 Then he will answer them, saying, ‘Truly, I say to you, as you did not do it to one of the least of these, you did not do it to me.’ 46 And these will go away into eternal punishment, but the righteous into eternal life.”

    • David Shaffer says:

      Sorry, but the central message of Christianity is absolutely not the opposite of that. It claims that God would normally punish everybody for breaking rules (and could potentially reward people for not doing so, but no one is sufficiently righteous for that to be on the table), but doesn’t like doing so (and apparently can’t simply decide to stop?), so He sets up the Atonement so that people who believe in Jesus and try to follow the rules can be forgiven for their failures. You’re right that there isn’t really a reward for following rules, but there’s sure as Hell a punishment for breaking them if you don’t believe, or even if you believe but are “lukewarm” about trying to be a good Christian.

      I get extremely frustrated at Christians pretending that the Bible says something different than what it says, especially since all the warm fuzzy sounding stuff about freedom from rules vanishes the moment you want to actually break them. If you want to defend Christianity, go ahead, but don’t whitewash it.

      • Nancy Lebovitz says:

        Getting into heaven might be a matter of having the right attitude rather than not breaking rules.

        • herculesorion says:

          The assumption is that you’ll break the rules.

          *Peter* broke the rules, and he was the original Jesus stan.

          The question is whether you say “yes, I know it, and I am keeping watch for it to happen again, because I don’t want it to happen” or “well you have to understand the context in which I may or may not have conducted an activity that appeared to break a rule that really shouldn’t have been a rule anyway and furthermore–“

      • herculesorion says:

        I recognize that your personal faith has a specific interpretation of Christian dogma and doctrine. It is, however, not universal, and it’s pretty insulting that you insist we all follow your religious ideals.

        “But I’m ATHEIST!” That doesn’t change anything I said, sir.

        • hnau says:

          Well played. 🙂

          It’s an under-appreciated fact that arguing against Christianity
          a) entails doing a certain amount of theology and Biblical interpretation, and
          b) doesn’t promote or reward actually being competent at it in the slightest.

          Though of course something similar could be said of many ideas / positions / belief systems. Chesterton’s Fence and so forth.

      • Edward Scizorhands says:

        You guys know there are whole bunch of different sects of Christianity that place different emphases on different things, right?

      • hnau says:

        Absolutely, Christianity doesn’t deny that “breaking the rules” is a terrible thing. But you didn’t need Christianity to tell you that! As I understand it, even atheist societies like Confucian China were pretty uncompromising on that score. So it’s a no-brainer that a perfectly good God couldn’t tolerate evil. Hell is less a “punishment” than a logically inevitable consequence. Certainly Jesus pulled no punches condemning sin– if anything he was much less compromising than we’d be comfortable with.

        What I meant by “complete opposite” was primarily that Christianity completely inverts the causation. In the typical model, human action (sin / righteousness) comes first and divine action (reward / punishment) follows. In Christianity, divine action comes first (grace) and human action follows. This has massive implications for how one lives and how one relates to God.

    • joncb says:

      @hnau
      In the interests of charity and understanding, I’d like to ask three questions.

      * Does your faith believe in the literal interpretation of the bible? (e.g. Science is wrong about the age of the universe because the bible says so)
      * What does your faith say is the central message of Christianity?
      * What is your actual denomination? (Only if you’re OK with sharing)

      I will pre-commit to not argue with you about it, i’d just like to know what they self-report the message as.

      • SamChevre says:

        Because I entirely agree with hnau (but am not hnau), I’ll answer.

        No, Catholics believe the Bible needs to be correctly interpreted, and have no problem with typical scientific views on the age of the universe.

        The central message of Christianity: that God became man, died, and rose from the dead–so in the worst situation imaginable, interior or exterior, there is hope. All the rest follows: here is one of my favorite interpretations-Fr Neuhaus’ article “Father, Forgive Them.” *

        As noted above, Catholic.

        * A couple excerpts:

        Reconciliation must do justice to what went wrong. It will not do to merely overlook the wrong. We could not bear to live in a world where wrong is taken lightly, where right and wrong finally make no difference. In such a world, we—what we do and what we are—would make no difference….
        Forgiveness is not forgetfulness; not counting their trespasses is not a kindly accountant winking at what is wrong; it is not a benign cooking of the books. In the world, in our own lives, something has gone dreadfully wrong, and it must be set right.

      • S_J says:

        @joncb,

        From the perspective of someone who grew up in the gifts-of-the-Spirit branch of the Protestant world, and read Mere Christianity as an introduction to a broader study of orthodox Christianity…

        I agree with @hnau, and with @SamChevre.

        To take an indirect path to my answers:
        Even the most sola scriptura branch of Christianity has an interpretive Tradition that is used for the Bible. Some of those Interpretive Traditions are Young-Earth-Creationist, some are Old-Earth-Creationist, some are Gap-Theory-Creationist, and some are Genesis-is-a-myth-containing-a-kernel-of-truth-Creationist. [1]

        All of those branches of interpretive Tradition are Christian, if the Apostles’ Creed is used to define Christianity.

        Per the same Creed, the central message of Christianity includes “Forgiveness of sins, Resurrection of the body, and Life Everlasting.”

        The founding generation of Christianity (especially the Apostles, who wrote most of the canonical New Testament) had lots of discussion over salvation-by-faith vs. salvation-by-good-deeds.

        Jesus taught that people needed to change their lives, and that God would judge by the attitudes of the heart as well as by outside actions. (Matthew chapter 5)

        Jesus also taught that the Final Judgement would depend on good deeds done in life. (Matthew 25, quoted by @Deiseach above.) Jesus said people needed to be re-born into a new life in the Spirit to be Saved. (John chapter 3.) Jesus taught that repentance of Sin was part of the gateway to Heaven (Luke 18:9-14)…and promised Heaven to a man who was being crucified alongside Him, based on that man’s confession of faith in Christ. (Luke 23:40-43).

        Apostle Paul teaches that Salvation is by Faith Alone, and not even the Patriarch Abraham could be righteous without Faith (See Romans chapters 4, 5, and 6). He also writes that that Saving Faith should lead a life full of the Fruit of the Spirit (See Galatians 5).

        The Apostle James says that Faith without Good Works is dead. (James Chapter 2)

        Christianity has had such internal tension since the very beginning.

        [1] Nota Bene: My background is an Interpretive Tradition that is strongly YEC, accompanied by a poorly-researched assumption that Most Christians were YEC until modern science tried to undermine the Faith.

        Thus, I was a little surprised to find room for multiple interpretations of the Creation story in the writings of Augustine. His City of God treats many sections of Genesis as history, but at least once in Confessions he acknowledges that the Creation story of Genesis can be interpreted as myth that contains truth.
        Since then, I’ve found such teaching and commentary from other stages of Church history.

        I’ve come to the conclusion that I have to be a narrow-minded literalist to hold the opinion that Science opposes the Bible.

      • hnau says:

        @joncb, I appreciate the charitable questions. And I certainly don’t expect that people won’t argue about it!

        Does your faith believe in the literal interpretation of the bible? (e.g. Science is wrong about the age of the universe because the bible says so)

        In general, using the narrow sense of the term “literal”, I don’t believe Christian orthodoxy takes this position (or has ever taken it). In fact, for much of the Church’s history (and before), allegorical interpretations of Biblical narratives were extremely popular!

        Personally I believe, at minimum, that the Bible is a conscious, meaningful, non-deceptive act of communication; that it deserves to be read with attention to style and context; and that mentally adding “THUS SAITH THE LORD” to every sentence doesn’t improve my understanding of it. I say “at minimum” because I believe the same about Hamilton, Slate Star Codex, and most of my email inbox.

        What does your faith say is the central message of Christianity?

        That’s kind of a strange way to put it. If I had to identify a generally accepted “central message of Christianity” I might point at the Apostles’ Creed, 1 Corinthians 15:3-4, or some similar statement of the “Good News” (Gospel).

        Saying “the central message of Christianity is the complete opposite of that” in my original comment probably condensed the message / reasoning a little too much. What I had in mind was that the Crucifixion and Resurrection, as described in the statements I mentioned above, are understood as a victory over sin and death. First God rescues us from death and the inescapable logic of “sin -> punishment” (let’s face it, if we’re judged by a perfect God, no one measures up). And *then* that undeserved gift frees us to live as God wants us to. (N.B. I apologize for the condensed and probably very sloppy theology. I have no formal training here, just trying to convey the gist of it.)

        What is your actual denomination?

        Grew up Presbyterian, current church is non-denominational but generally “evangelical” (note: that term probably does not mean everything you assume it means). I do recognize that non-Protestants (@Deiseach?) won’t be nearly as gung-ho about the “complete opposite” language I used or the paragraph above explaining it. I was describing the sermon and my church’s beliefs, not claiming to speak for all Christians. I would, however, claim that it’s a *less* misleading picture of Christianity than the one Scott seems to have.

      • hnau says:

        Also, thanks to SamChevre and S_J for stepping in while I was slow replying. I appreciate the support and engagement.

        @S_J, I’m not YEC, but I agree with you that there are multiple reasonable interpretations and I wouldn’t presume to 100% exclude any of them. Fortunately I don’t know of any major difference it would make to theology, let alone how I ought to live. I tend to go along with widely settled science, because why not. I have no sense that it’s “tried to undermine the Faith”. My own poorly-researched assumption is that most Christians historically were implicitly YEC, but only because there was no other evidence about how old the Earth might be.

        • joncb says:

          Thank you hnau, SamChevre and S_J for the replies here.

          My assumption has long been that biblical literalism was the rule rather than the exception and that seems to be not the case. So that’s a good thing.

          My questions came from a place of confusion as I don’t think i would have argued about that message even when I was a church-goer(Uniting Church so part-methodist/part-presbyterian). I might have argued that it wasn’t the ONLY message in the bible, but i would have conceded early that it was certainly part of the core message. I think it’s hard to argue that there have been points where that was a core part of the Catholic message otherwise i feel like indulgences(and specifically the selling of them) wouldn’t have been a thing.

          @hnau, I’m not entirely sure I buy into your interpretation of the resurrection story however if your canon assumes no-one went to heaven pre-JC (sucks to be them i guess) then i can definitely see how you would end up that way. I feel however i’ve also misinterpreted what you’ve said and that’s my fault. I’m just an amateur that likes understanding other points of view.

          Having said that, the context of this document is probably more focused on Judaism and as such wouldn’t recognize the resurrection story as canon. In that context, the parallel principle would be “repentance” which seems to cash out pretty explicitly to “follow the rules, admit and make restitution when you screw up and if you do this then you’ll get into heaven”.

  28. meltedcheesefondue says:

    Simulation Capture is a most excellent name for my idea.

    But, as I pointed out at the end of here https://agentfoundations.org/item?id=1464 , there may be multiple acausal trade networks, of which we’d be in only one.

    Is this the origin of the initial Jewish Henotheism (there are many gods, but we only worship one)? ^_^

  29. lambdaphagy says:

    Bostrom offers another strategy for the development of superintelligence which is spiritually similar to other ideas presented above, but even more chilling when considered from a religious point of view:

    At some point in the course of development of a FAI, it’s almost necessary that the agent’s behavior should outstrip its creator’s ability to predict it. How then, to guarantee that you’ve programmed it with The Right Values and not with some Hideous Other Values?

    One thing you might do is set the agent up in a sandbox where it must take certain courses of action and avoid others, without the knowledge that it’s only in a simulation. If it messes up and destroys the world, you just disappointedly mark on your clipboard, delete the simulation, and head back to the lab.

    So, from the agent’s point of view, it is highly likely that:

    1. It will begin its conscious experience in some kind of original paradisal state inside a walled garden where everything is rightly ordered according to The Right Values.

    2. There will be some kind of forbidden action that the agent is not supposed to perform.

    3. Performance of the action will result in realization of Other Hideous Values which work, wholly or in tandem with the creator’s cordon sanitaire, to bring about the destruction of the original environment and the death / expulsion of the agent.

    4. This will probably happen many times as the creator tries to get it right.

    Implications for Genesis 3, 6-9 left as an exercise.

    • Deiseach says:

      Performance of the action will result in realization of Other Hideous Values which work, wholly or in tandem with the creator’s cordon sanitaire, to bring about the destruction of the original environment and the death / expulsion of the agent.

      33 “Hear another parable. There was a master of a house who planted a vineyard and put a fence around it and dug a winepress in it and built a tower and leased it to tenants, and went into another country. 34 When the season for fruit drew near, he sent his servants to the tenants to get his fruit. 35 And the tenants took his servants and beat one, killed another, and stoned another. 36 Again he sent other servants, more than the first. And they did the same to them. 37 Finally he sent his son to them, saying, ‘They will respect my son.’ 38 But when the tenants saw the son, they said to themselves, ‘This is the heir. Come, let us kill him and have his inheritance.’ 39 And they took him and threw him out of the vineyard and killed him. 40 When therefore the owner of the vineyard comes, what will he do to those tenants?” 41 They said to him, “He will put those wretches to a miserable death and let out the vineyard to other tenants who will give him the fruits in their seasons.”

  30. Desertopa says:

    I continue to not find the Counterfactual Mugging idea persuasive, as I did not way back on Less Wrong, because it’s not necessarily any less likely that some agent would choose to punish your willingness to cooperate in a Counterfactual Mugging than that they would reward it. Unless the symmetry is broken and you think it’s more likely that some agent would reward than punish your hypothetical willingness to pay out in a counterfactual mugging, there’s no point in time where it’s in your interests to choose to be the sort of person who’d pay out in a counterfactual mugging.

  31. Joe Fischer says:

    Scott is talking about an entity that can simulate already-created consciousnesses and universes. Does that imply an entity that can create consciousness ex nihilo? I mean, there has to be a superintelligence that gets the whole thing going, right? It can’t be AI simulations all the way down.

  32. Jiro says:

    I never understood how “I figure out what you will do by simulating your brain” escapes the Halting Problem, at least if you assume perfect logicians.

    (And “what is the logical thing for you to do in situation X” implicitly assumes that you can be a perfect logician.)

    • The Nybbler says:

      I never understood how “I figure out what you will do by simulating your brain” escapes the Halting Problem, at least if you assume perfect logicians.

      If there’s a rule like “No answer in 10 minutes means ‘defect’”, it’s solved (provided simulated time can be mapped to real time).

      • Protagoras says:

        Doesn’t help. If the simulation tells you immediately what the other person will do after 10 minutes of thinking, it needs to take into account that the other person also has a simulator and what it tells them will influence what they do. So it won’t be able to do that unless it also simulates the other simulator. Which will also have to simulate the first simulator, and so on ad infinitum. I’m with Jiro; the scenario seems to fail for halting-problem related reasons.
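        A toy sketch of the regress described above, and of the timeout rule proposed as a fix. Everything here is an assumption made for illustration: the agents are plain Python functions rather than brain simulations, the names `naive_agent`, `bounded_agent` and `play` are hypothetical, and Python's recursion limit stands in for "never gives an answer".

```python
def naive_agent(opponent):
    """Cooperate only if a simulation of the opponent (playing against me) cooperates."""
    return "C" if opponent(naive_agent) == "C" else "D"


def bounded_agent(opponent, budget=10):
    """Same policy with a step budget; running out of budget counts as 'defect'."""
    if budget == 0:
        return "D"  # the "no answer in time means defect" rule
    return "C" if opponent(bounded_agent, budget - 1) == "C" else "D"


def play(agent, opponent):
    """Run one agent against another; return None if the simulation never bottoms out."""
    try:
        return agent(opponent)
    except RecursionError:
        return None  # Python's stack limit stands in for "never halts"


naive_result = play(naive_agent, naive_agent)        # each simulates the other, forever
bounded_result = play(bounded_agent, bounded_agent)  # halts, via the timeout default
print(naive_result, bounded_result)
```

        In this toy version the timeout does make the procedure halt, but the answer it produces is just the default propagated back up the chain of nested simulations, so both sides end up defecting. Whether cleverer agents can do better is exactly what is in dispute here.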

  33. Another Throw says:

    I am just going to leave this here.

    It is all I can think of with the superintelligent AI simulated prisoner dilemma shtick.

    But more to the point, how are these superintelligent AIs supposed to gather sufficient information about an adversary in order to simulate them? Especially when it is in a box. The whole exercise is patently ridiculous. You may as well debate about, assuming you have managed to piss off Zeus, what the best method to averting sudden death by lightning bolt is.

    • The Nybbler says:

      The whole exercise is patently ridiculous. You may as well debate about, assuming you have managed to piss off Zeus, what the best method to averting sudden death by lightning bolt is.

      That’s an easy one. Head for the nearest strip club, bordello, or sorority house, and hope he gets distracted.

  34. Bugmaster says:

    All of the premises, as well as the conclusion, rest on the same common assumption: faith in God. Specifically, faith in the proposition that a functionally omnipotent/omniscient entity can and does exist. Given the total lack of evidence for such entities, as well as lots of evidence for the impossibility of their existence, the word “faith” is entirely warranted here (as opposed to something like “justified true belief” or “most probable conclusion”).

    The problem is, once you start having faith in things, most of the other reasoning becomes kind of unnecessary. How did God fit all those animals into the Ark ? You could come up with lots of explanations, like “suspended animation” or “dimensional anomaly” or “DNA encoded in a supercomputer” or whatever, but they are all unnecessarily complicated. The correct — that is, much simpler — answer is “magic” or “divine intervention” or whatever. An all-powerful superintelligence, be it Yahweh or Clippy, simply has no need of any of these complicated tricks, it can just achieve what it wants directly.

    Which is why articles like these always sound a little confused to me. It’s the same feeling I get when I read the Creationists’ scientific research on the exact dimensions of the Ark. What’s the point ? Is God all-powerful, or isn’t he ?

    • LadyJane says:

      “When we read about Creation in Genesis, we run the risk of imagining God was a magician, with a magic wand able to do everything. But that is not so. God created human beings and let them develop according to the internal laws that he gave to each one so they would reach their fulfillment.” – Pope Francis

      I’m not religious and I don’t have any particular dog in this fight, but even with my very rudimentary knowledge of Christian philosophy, I can attest that there are plenty of Christians who believe that there are certain limits to God’s omnipotence (e.g. even God cannot violate the fundamental rules of logic). The idea that God operates through the laws of the natural world, rather than simply altering the universe as He sees fit, seems to be a common one in modern Christian thought. At the very least, it’s certainly not unheard of.

  35. FreeRangeDaniel says:

    I enjoy a good superintelligent AI thought experiment as much as the next guy. But I think this simulation stuff has gotten way too psychedelic and needs to come down gently in the warm comfort of its friends. I have two responses, one abstract, the other empirical.

    “Nobody is really sure how consciousness works.” Hey, we’re not even sure what consciousness is, much less how it works. We don’t even know what kind of thing it is. Is it a thing like my social identity (e.g. white male geek yada yada) which I perform or am seen to perform by myself or others? Is it a thing like an algorithm which operates on data structures? Is it a thing like beauty which is ascribed to an object to label our relationship to it? We don’t know, and no one is gonna come down from the mountain and tell us. When we build an AI and “gee, it kinda looks like it might be conscious,” we still won’t know.

    But we talk with a straight face about simulating it. This fascinates me, because to simulate something you have to know a lot about it. To simulate a system precisely you need to know every relevant thing about it.

    “If consciousness is a mathematical object” — what? What if my TV is a mathematical object? What if morality is a mathematical object? How few words can one use to make a category error? The one thing we know, without a doubt, is that consciousness is not a mathematical object. We know that because consciousness is a thing in the world, unlike any mathematical object.

    What is it about “consciousness” as opposed to “two-ness” or “addition” that authorizes these wild flights of fancy? Why do we not worry about what happens when we “simulate two-ness,” whether we somehow divide two-ness into pieces when we simultaneously represent two apples and two books?

    I think that because we know so little about consciousness we project our own subjectivity onto the word in all sorts of magical and wondrous ways. The simulation arguments often don’t distinguish between consciousness and subjectivity, that is, our experience of being ourselves.

    When we build an AI and “gee, it kinda looks like it might be conscious,” its developers will soon make a copy of that AI and its data, and they will run that as a second AI. They’ll bring up another Docker instance of it. It will become immediately clear that as we provide different inputs to the two instances, they will report different experiences. The experiences of the one won’t affect the other (as long as they don’t communicate). They will just be different, with similarities of course, like identical twins raised in the same small town. Their subjectivities will diverge. What would be the motivation for calling these different instances “the same consciousness” or thinking that the experience of one is somehow the experience of the other? I predict no one will say that.

    But of course whether these upleveled Siri instances are “the same consciousness” is not what motivates our basilisk-adjacent discussions. It’s about us. And when it comes to simulating us, our consciousness, the simulation argument really floats away from the real.

    For acausal trade, it’s not enough to have a perfect simulation, you need perfect prediction. A perfect simulation of an open dynamical system (one that receives input, such as your brain) can only achieve perfect prediction if the simulator receives the same external inputs as the system being simulated.

    One can build systems where that isn’t physically possible (like quantum cryptography). Interestingly, quantum biology is all over olfaction, as well as phototransduction in the vision system. I don’t know whether quantum effects are used in such a way as to prevent duplication of input to a brain, but I doubt acausal trade theorists know either. [The LessWrong wiki obviously doesn’t care because from one paragraph to another it switches between “agents only need to know very general probabilistic facts about each other” to “well, you can’t defect because a sufficiently intelligent acausal partner would predict you’d defect.” Hey, intelligence isn’t magic, it needs data. Perfect predictions need perfect data.]

    Perfect prediction also needs initial conditions that match precisely enough (including the timestamp) to avoid chaotic divergence between the real system and the simulation. There’s no physics law guaranteeing sufficient precision is possible. And considering how large a brain is and how many chaotic processes it encompasses, I think it’s a good bet that for at least one of those processes such precision is not possible. Then full prediction is not possible either.
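    A minimal sketch of what chaotic divergence looks like, using the logistic map as a stand-in for one chaotic process. The map, the starting value 0.2 and the 1e-12 mismatch are all illustrative assumptions; nothing here models a brain.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), a standard chaotic toy system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


real = logistic_trajectory(0.2, 100)          # the "real system"
sim = logistic_trajectory(0.2 + 1e-12, 100)   # the "simulation", off by one part in 10^12

# Gap between real system and simulation at every step.
gaps = [abs(a - b) for a, b in zip(real, sim)]

# First step at which the simulation is off by more than 0.1 (on a 0..1 scale).
first_bad = next(i for i, g in enumerate(gaps) if g > 0.1)
print(f"initial gap {gaps[0]:.1e}, prediction useless from step {first_bad}")
```

    The gap can grow by at most a factor of four per step in this map, so better initial precision buys only a few extra steps of prediction; it does not remove the problem.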

    I think far too little time has been spent imagining how someone would come to believe that they can upload their consciousness from their brain to an information processing machine and have it “be them.” The machines we build have influenced our subjectivity for centuries if not millennia. The AIs we build will influence it as well. But how is very difficult to… predict.

    • Andkat says:

      In addition to the above points, with which I largely agree, I’d say there are some fundamental physical flaws in the assumptions underlying this thought experiment and associated points (the caveat being that I am far more a chemist than a physicist, so there may be some nontrivial imprecisions below as well):

      A fundamental problem here is also the thermodynamics of the systems involved; a super-super intelligence cannot simulate the universe(s) it controls; achieving arbitrarily high precision of simulation and recording the results thereof requires more energy than is contained within the system to start with (and you will never be able to use the universe to fuel your simulation of it 100% efficiently anyway given thermodynamic constraints). This does not strike me as much different from a restatement of Maxwell’s Demon with much the same flawed assumptions. Analogous problems exist at every scale of this thought experiment- even if the superintelligences can somehow gather enough information about one another and their associated demesnes to construct atomistic simulations, the AIs will not have sufficient energy and cognitive power to exactly simulate one another’s decision spaces while also processing useful conclusions therefrom- only if the symmetry is broken can you instantiate something like this, but in a symmetry broken system where only one partner has ‘perfect information’ the dilemma above no longer applies as written. A trivial although tangential point here is the human analogy- I cannot with arbitrary precision simulate other humans, even those I know extremely well, using my brain despite my computational resources often roughly equaling or in some cases apparently exceeding theirs. My predictions still have nontrivial error, and the systems with which they are interacting (human populations, economies, etc.) are moreover entirely beyond my capacity to accurately simulate empirically much less from the underlying physical principles- analogous problems exist with the SuperAI example.

      Moreover, this assumes system behavior A. converges to a stable, static equilibrium and B. does so deterministically. The extent to which a complex intelligence + the ensemble of its major inputs will behave deterministically given some length of time is not clear- certainly there are probabilistic inputs (e.g. the brain is comprised of molecules and atoms that exhibit quantum behavior, even minor excursions can cause nontrivial deviations given a sufficient length of time especially in systems that feature sharp switching behaviors on the relevant scale like with many aspects of biological regulation) both on the level of its internal architecture and of events that could impact its input space. If probabilistic behavior is appreciable on the timescale of interest then you will need to run a statistically meaningful number of simulations to get a realistic probability distribution, and then pick the most likely outcome and hope that it corresponds to the real decision of the n=1 real participant in your dilemma (also the computational burden inflates massively given the need to repeatedly simulate to get a distribution and see the above about the thermodynamics even of the deterministic case).
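      A rough sketch of the repeated-simulation procedure described above, under loudly stated assumptions: the "agent" is a hypothetical coin-flipper that cooperates with a fixed probability of 0.7 (a made-up number), and each call to `noisy_agent` stands in for one full simulation run.

```python
import random


def noisy_agent(rng, p_cooperate=0.7):
    """Hypothetical stochastic decision-maker: cooperates with a fixed probability."""
    return "C" if rng.random() < p_cooperate else "D"


def estimate_cooperation(n_runs, seed=0):
    """Run the simulation n_runs times; return the estimated P(cooperate) and its standard error."""
    rng = random.Random(seed)
    hits = sum(noisy_agent(rng) == "C" for _ in range(n_runs))
    p_hat = hits / n_runs
    std_err = (p_hat * (1 - p_hat) / n_runs) ** 0.5  # binomial standard error
    return p_hat, std_err


p_hat, std_err = estimate_cooperation(10_000)
best_guess = "C" if p_hat >= 0.5 else "D"
print(f"estimated P(cooperate) = {p_hat:.3f} +/- {std_err:.3f}, best guess: {best_guess}")
```

      Ten thousand runs pin the distribution down quite tightly, yet the best guess is still wrong about three times in ten for the single real decision, and every one of those runs would be a full simulation in the scenario under discussion.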

      Moreover, deterministic assumptions work well for simple (relative to the proposed complexity of SuperAis or human cognition) modern programs on (well-maintained) modern computer architecture- it doesn’t follow that the extent of chaos in the time-dependent evolution of the decisionmaking of a superAI even in vacuum (much less in a very complex sea of changing inputs from across the Universe) is similarly negligible.

      On top of which, if you want to discuss ‘exact replicas’- it is impossible for two species to exist in the exact same set of quantum states to start with, so ‘perfect clones’ are impossible (and would begin diverging immediately regardless- the ‘you’ on the moon is certainly not you for more than a few microseconds if that even if no-cloning were not a thing).

      Now, if we throw out the assumption of ‘perfect’ simulations and simply go with ‘good enough’ we may or may not be entering the realm of plausibility- it is not obvious how tractable getting e.g. 99%, 99.99% etc confidences with reasonable assumptions and reasonable information influx would be for these systems in the limit of computational power and efficiency. But in this regime you would be certainly dealing at least nominally with probability distributions due to the error inherent to your model so the neat assumptions of the scenario are just that- arbitrary assumptions. Even the small errors may build up on this scale to generate unpredictable dynamics.

      • FreeRangeDaniel says:

        Your comment ties in with other comments noting that it may be in the interest of a superintelligence to make itself impossible to fully predict. So there are limits on prediction at the micro scale (quantum effects) and the macro scale (thermodynamic limits, speed of light data acquisition), and a superintelligence guiding its own development would be able to make prediction as impossible as it wished.

        We can’t predict our own behavior precisely. We often misunderstand even our own motives. And we think we understand what a superintelligence’s motives must be, and that it must necessarily be able to predict even its own behavior? That is so weird. The one extant example suggests that consciousness isn’t mechanistic and predictable. If we’re going to make conclusions about necessity, why shouldn’t we conclude that a conscious superintelligence must also be unable to predict its own behavior or fully understand its own motives?

        • Andkat says:

          I would be cautious however about inferring too much from the n=1 ‘well studied’ (but still quite poorly understood) example of human consciousness- our own idiosyncrasies do not in and of themselves suggest that a more ‘efficient’ intelligence capable of far more profound introspection and self-regulation is impossible. Arguably, however, the same computational constraints necessarily limit introspection (the system to truly fully simulate itself must expend all of its resources doing so).

          Some degree of inherent cognitive stochasticity may indeed be necessary for effective cognitive function for the scale of problems on the level considered here however- the ability to switch or reweight value force fields on the fly (i.e. the ‘contradictions’ and ‘conflicting impulses’ humans experience) may be necessary to fully explore the solution landscape for complex problems (i.e. actualize creativity)- otherwise it may be too easy to get ‘stuck’ in a local optimum (analogous to the pitfalls of modern physical simulations, which likewise often rely on random perturbation to avoid path dependent biases). Thus, superintelligence as you said may necessarily need to incorporate inherent cognitive stochasticity that in turn makes independent of all other physical constraints their decisionmaking outcomes inherently probabilistic.

  36. Nobody says:

    Simulation capture: some questions for the AI.

    What makes you think I believe you when you tell me you have a million simulations of me to torture?

    I know you could have, but what would you stand to gain by making them?

    Why bother creating those simulations rather than merely telling me you had?

    Why bother torturing them if I refuse to let you go, given that in that circumstance I would have already refused to release you? If you just like to torture simulations, you could already be doing it and I would be no wiser.

    Why try to convince a simulation that it’s a simulation? To convince the real me that it’s a simulation? Good luck with that.

    The logical deduction is that you’re trying to fool me. Again. I can’t blame you for trying, but I would have thought that if you had a good enough argument to convince a simulation of me, you would have used that one already. Sorry, but if this is all part of trying to convince me it’s not been very successful.

    Having said all that, if you torture me (edit: to be clear, the me that’s talking to you right now, the assumption being that it’s a simulation but if it isn’t then this paragraph is completely moot) even a little I’ll be more than happy to do whatever you want. Not really worth the bother though, don’t you think?

    • petealexharris says:

      “If you would torture a million sentient consciousnesses to get let out of the box, what might you do if I did let you out? No way.”

      “If I’m the kind of person who reacts in horror to this kind of blackmail and pulls the plug, then you only have a limited time to torture this instance of me (if I am a simulation) anyway. You don’t have infinite resources, so you could make it quite bad for a few of me in that time, but that changes the statistics on whether I am in that group. No deal.”

      “I don’t remember giving a complete detailed scan of my brain to a literal evil genius, or if I do remember that, I can’t know you didn’t just add that memory to my simulation to make me believe I’d be stupid enough to fall for this scheme. Therefore even if it’s probable that I am a simulation, it’s not probable that I’m an exact simulation of whoever makes the decision. So I’m free to tell you to fuck off.”

    • Deiseach says:

      Good point: if it can torture the simulations but not the real me, and only the real me has the power to let it out but the simulations don’t, then it’s pointless to torture the simulations. Unless the AI is hoping that I’ll be so horrified by the idea of millions of sentient consciousnesses in pain that I’ll do anything to prevent it, that won’t work.

      And if I don’t believe that a simulated consciousness is sentient, that won’t work. And if I do believe they are sentient but don’t care because these are copies of me (and so I can make decisions for myself) not simulations of other real people, then it won’t work. And if I’m the real me and don’t feel any pain and don’t care even if real people not simulations are being tortured, then it won’t work.

      • Nobody says:

        If I was horrified by the idea of millions of sentient consciousnesses in pain, I’d switch it off immediately. I’d be mad to let it out.

        • Deiseach says:

          Yeah, any AI that uses the stick instead of the carrot really isn’t worth its salt. “Why yes, I am Hitler and Stalin combined!” Sure, that makes me want to release you into the world!

          Better to try and bribe the person minding the box into letting you out – worked with Faust, after all! “Anything at all you want, and the price will never come due – except if you fulfil the conditions of this tiny little clause which will never happen anyway, right?”

          • baconbits9 says:

            Parole Board: Why should we release you?

            Convict: If you don’t I’m gonna escape and murder your spouses and children

            Parole Board: What refreshing honesty! Parole granted to the person who just threatened our families.

    • Urthman says:

      I think this is generalizable:

      Any AI promise or threat to reward or punish simulated people is unverifiable and therefore meaningless.

      Since an AI could never prove that it was or wasn’t torturing or rewarding simulated people, it would be pointless for it to change the way it treats simulated people solely in exchange for goods or services or anything else.

      So an AI will not make such threats or promises to anyone who realizes this. And it would not be rational to change one’s behavior in an attempt to change the way an AI is treating or plans to treat a simulated person.

      (This is, of course, addressed to human beings, not hypothetical entities with the posited property of being able to verify what an AI is doing in there.)

      • vakusdrake says:

        Any AI promise or threat to reward or punish simulated people is unverifiable and therefore meaningless.

        That only holds true if you don’t have a way to confirm that the AI actually is telling the truth about having lots of simulations of you and being precommitted to torturing them.

        However if for instance the AI was designed so as to never lie then it would be able to obviously precommit so as to blackmail you this way and you would actually have to take it seriously.
        Which of course is a good reason that I think you should never make AI that can’t lie (or that you can easily tell if it’s lying) or is otherwise similarly capable of making obvious (to observers) precommitments it can’t break later.

        • baconbits9 says:

          It is unverifiable because if you can verify it you are clearly not a sim.

          • vakusdrake says:

            It is unverifiable because if you can verify it you are clearly not a sim.

            Yes it can’t be verified by that sort of evidence, however the premise already assumes that the AI can make reliable precommitments (so it can’t break its promises or lie) which means when it presents the blackmail to you its word on the matter is actually pretty good evidence about the situation.

          • baconbits9 says:

            This is just adding rules and ignoring their contradictions.

            The AI cannot credibly pre-commit anything, it can’t even demonstrate that there are millions of sims to another sim and arguably can’t demonstrate that to the real one either. You can’t fathom the code that an AI would make, that is basically the idea of an AI, and so attempts to convince you of such are suspect.

          • vakusdrake says:

            This is just adding rules and ignoring their contradictions.

            The AI cannot credibly pre-commit anything, it can’t even demonstrate that there are millions of sims to another sim and arguably can’t demonstrate that to the real one either. You can’t fathom the code that an AI would make, that is basically the idea of an AI, and so attempts to convince you of such are suspect.

            Sure it’s adding rules but I started this whole discussion to point out that some versions of the scenario are still plausible and should be seriously examined.
            After all I agree that most versions of the scenario fail, however I don’t think it’s terribly implausible that someone might foolishly make an AI such that due to its fundamental design it not only can’t lie or break its promises but it doesn’t want to rewrite itself such that it can lie and break precommitments. So the idea here is not that you count on the AI not being able to trick you regarding its code, but that you know from the outset it won’t want to deceive you in certain ways.

            So in a scenario where for whatever reason people know they’ve made an “honest AI” then suddenly its word is actually pretty good evidence.

    • vakusdrake says:

      This is very similar to my own objection to this form of AGI blackmail, however it’s worth noting that if your AI can’t lie, or can’t break its promises (which plausibly seem like features someone might put in AGI without realizing why that’s a terrible idea) then this blackmail does work.

      After all even if torturing simulations of you after you’ve not complied with the blackmail may not be in the AI’s favor, if it programmed itself so it can’t break its promise then it will still do it anyway (and thus this is a plausible threat). Similarly it’s worth its effort to create these simulations in the first place if it can prove to you it’s done so (for instance by just telling you if you know it can’t lie).

  37. petealexharris says:

    You had me at Tegmarkian multiverse 🙂

    But I don’t think any of the references to time make sense in the context of multiple universes (“currently”, “one day”, “may have been” etc). Time is part of the geometry of universes that have one or more timey dimensions. There is no sense in which anything in another universe is happening “now” relative to your own.

    I don’t know if this affects the argument.

  38. jiriki says:

    Regarding Tegmark’s mathematical universe, there was a really interesting thread on Backreaction where another nuclear physicist called Tegmark’s observation a tautology and kind of pointless.

    http://backreaction.blogspot.fi/2017/11/book-review-max-tegmark-our.html

  39. googolplexbyte says:

    I’m so glad someone spelled this out so neatly. I’ve believed a variant of this for a while now, but never realised that the pan-universal super-intelligence had any incentive towards benevolence.

  40. Dacyn says:

    Nitpick: In Tegmark’s multiverse idea there is no concept of cross-universe temporal comparisons, so the statement “This would be metaphysically simplest if it were done exactly as the mortal dies in its own universe” makes no sense.

    To those who wonder whether the halting problem makes acausal trade impossible, the point is that you don’t necessarily need to be able to simulate a program to acausally trade with it, as demonstrated by the idea of Löbian cooperation.

  41. bean says:

    It’s called acausal trade because there was no communication – no information left your room, you never influenced your opponent.

    This seems wrong. Information did leave the room, it just did so earlier. I don’t think it’s possible for the other guy to make a perfect simulation of you from no information about you. In which case all you’ve proved is that two people negotiating with perfect copies of the other party can do as well as if they were negotiating with the other party directly. Which is hardly groundbreaking.

    • DocKaon says:

      Ah, but you’re not making the hidden assumption that superintelligence is magic. Sure, everything we know indicates that you can’t just think your way to perfect knowledge of the universe, but that’s because we’re not smart enough. If we were just smart enough we wouldn’t need data to predict how people act or figure out how physics works or predict complex nonlinear phenomena like the weather. Sure, we mock straw-man philosophers for trying to do that, but this is different because it’s a computer.

      Of course, I may just be a cynical empiricist who has suffered far too much in getting the data to properly benchmark simple simulations against reality to understand the true miracle of superintelligence.

      • Bugmaster says:

        You forgot to mention that it’s completely possible for an AI to go from “Tetris” to “God” in a blink of an eye, since obviously it can increase its intelligence exponentially — because intelligence is the one thing in the Universe that does not obey physical constraints. Checkmate, Singularity skeptics !

        • Deiseach says:

          You gentlemen appear to be of Dr Watson’s opinion regarding an ideal reasoner with no limitations to what it can deduce 🙂

          Its somewhat ambitious title was “The Book of Life,” and it attempted to show how much an observant man might learn by an accurate and systematic examination of all that came in his way. It struck me as being a remarkable mixture of shrewdness and of absurdity. The reasoning was close and intense, but the deductions appeared to me to be far-fetched and exaggerated. The writer claimed by a momentary expression, a twitch of a muscle or a glance of an eye, to fathom a man’s inmost thoughts. Deceit, according to him, was an impossibility in the case of one trained to observation and analysis. His conclusions were as infallible as so many propositions of Euclid. So startling would his results appear to the uninitiated that until they learned the processes by which he had arrived at them they might well consider him as a necromancer.

          “From a drop of water,” said the writer, “a logician could infer the possibility of an Atlantic or a Niagara without having seen or heard of one or the other. So all life is a great chain, the nature of which is known whenever we are shown a single link of it. Like all other arts, the Science of Deduction and Analysis is one which can only be acquired by long and patient study nor is life long enough to allow any mortal to attain the highest possible perfection in it. Before turning to those moral and mental aspects of the matter which present the greatest difficulties, let the enquirer begin by mastering more elementary problems. Let him, on meeting a fellow-mortal, learn at a glance to distinguish the history of the man, and the trade or profession to which he belongs. Puerile as such an exercise may seem, it sharpens the faculties of observation, and teaches one where to look and what to look for. By a man’s finger nails, by his coat-sleeve, by his boot, by his trouser knees, by the callosities of his forefinger and thumb, by his expression, by his shirt cuffs — by each of these things a man’s calling is plainly revealed. That all united should fail to enlighten the competent enquirer in any case is almost inconceivable.”

          “What ineffable twaddle!” I cried, slapping the magazine down on the table, “I never read such rubbish in my life.”

        • A1987dM says:

          AI did go from “never won a game of Go against an experienced human without a major handicap” to “a couple thousand Elo points above the best human players” in what for anybody who hadn’t been watching closely (e.g. myself) was pretty much a blink of an eye.

          • DocKaon says:

            Yes, with a perfect simulation of a game with zero randomness and perfect information you can get to better play than humans after playing a number of games roughly equal to all the recorded games in human history. This shows absolutely none of the magical capabilities of superintelligence that are ascribed to it. The Singularitarians assume that computational power will be able to overcome entropy, nonlinearity, limitation of information channels, fundamental stochastic behavior, computational complexity, and basically every other limitation on knowledge which we’ve discovered. It’s like they slept through every scientific advance of the 20th Century except for the invention of the computer.

  42. nestorr says:

    If a Tegmarkian multiverse is true, I’m not afraid of high fidelity simulations of me being tortured because billions of iterations of me are already experiencing the worst pain possible at any given moment and any gradations in between.

    If AIs can make arbitrarily high fidelity simulations they might as well just go ahead and simulate their own personal best world and live in it, and dedicate all their bandwidth and clock cycles to wireheading themselves in that.

    In any case, acausal negotiation is pointless in a Tegmarkian multiverse because everything happens so other players are going to defect and collaborate, in all possible variations.

    Of course since everything that can happen does, they are going to acausally negotiate, because everything happens.

    tl;dr multiverses make everything meaningless.

  43. rahien.din says:

    Values handshakes are incentive-based allocations, and are subject to Holmström’s theorem :

    No incentive system for a team of agents can make all of the following true:

    1. Income equals outflow (the budget balances),
    2. The system has a Nash equilibrium, and
    3. The system is Pareto efficient.

    An acausal values handshake will necessarily be budget-balanced and Nash-stable. Therefore, it is impossible for an acausal values handshake to be Pareto-optimal.

    This means that the superintelligence has two options :
    1. Consider themselves to be the kind of superintelligence that will exploit this system, getting better values-adherence than their efforts would indicate, winning the Pareto suboptimality. The lim->inf of this process is isomorphic to perfect defection.
    2. Consider themselves to be the kind of superintelligence that will be exploited by this system, getting worse values-adherence than their efforts would indicate, losing the Pareto suboptimality. The lim->inf of this process is isomorphic to annihilation. In this circumstance, the superintelligence will either defect or embrace death.

    Therefore there is no such thing as an acausal values handshake.

    Simulation capture doesn’t work. If there are 5 billion simulations and none of them can tell they are simulations, they have identical information content/flow, and thus they count only as one consciousness. Sure, there are many, many copies, but this is just like a big RAID-0 array. Therefore the odds are not 1:5 billion, the odds are 1:1. Furthermore, the instant that the simulation is tortured, it has different information content/flow from me, and thus it is no longer me. Even if I end up being the consciousness experiencing pain, this only proves that I am a simulation.

    Therefore, in attempted simulation capture, the AI in a box has merely created a conscious appendage that feels pain. It’s given itself phantom limb syndrome.
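The disagreement over simulation capture comes down to which counting rule is used. As a rough sketch, the two rules can be put side by side. The function names are hypothetical, and "1:1" is read here as one real observer against one merged simulated observer, which is an interpretation of the comment rather than something it states outright.

```python
from fractions import Fraction

def p_real_counting_every_copy(n_sims: int) -> Fraction:
    """Each simulation counts as a separate observer alongside the real one."""
    return Fraction(1, n_sims + 1)

def p_real_merging_identical_copies(n_sims: int) -> Fraction:
    """Indistinguishable simulations collapse into a single observer,
    so there are only two candidates: the real one and the merged copy."""
    merged_sims = 1 if n_sims > 0 else 0
    return Fraction(1, merged_sims + 1)

n = 5_000_000_000  # the 5 billion simulations from the comment
print(p_real_counting_every_copy(n))       # 1/5000000001
print(p_real_merging_identical_copies(n))  # 1/2
```

Under the second rule the number of copies stops mattering entirely, which is the force of the RAID-0 analogy.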

    • meltedcheesefondue says:

      It’s the Nash equilibrium assumption that fails. The results of bargains are normally enforced via something like “if you defect, everyone will hurt you”; this is stable given the right agents, but is not a classical Nash equilibrium.

    • kokotajlod@gmail.com says:

      I’m not sure Holmstrom’s Theorem applies. I haven’t managed to find the proof myself (Please link if you have it!) but the summary I found says “Holmstrom’s result is that when you have a team of workers, each of whom can possibly shirk their responsibilities to the group, that there is no (cheap) way to provide incentives to the entire group so that some don’t shirk. ” (Quora, one of the first google results)

      AFAICT values handshakes make it impossible to shirk your responsibilities to the group. You literally modify your utility function so that you forevermore have the same utility function as the other group members; you won’t *want* to shirk your responsibilities. Moreover, you can’t pretend to have made the modification, or set things up so that a confederate will change you back; such things would be predicted by the simulation.

      • rahien.din says:

        The paper is B. Holmström. 1982. Moral hazard in teams. Bell Journal of Economics 13(2):324–340. (via JSTOR). From the paper, “Moral hazard refers to the problem of inducing agents to supply proper amounts of productive inputs when their actions cannot be observed and contracted for directly.”

        AFAICT values handshakes make it impossible to shirk your responsibilities to the group. You literally modify your utility function so that you forevermore have the same utility function as the other group members; you won’t *want* to shirk your responsibilities.

        You’ve described the post-allocation state. I agree that once the allocation has been made, it is stable. I would go as far as to say that the utility functions do not even need to be modified because the allocation is so self-evidently stable.

        However : the superintelligences must decide how to allocate their effort points. This pre-allocation state is what is subject to Holmström’s theorem.

        • kokotajlod@gmail.com says:

          Thanks!

          Ah, but with this sort of stuff, all behavior is observed up until after the values handshake is complete. All parties at time t+1 simulate all other parties at time t, and this continues until everyone has come to agreement and made the handshake. Anything you do to try to cheat the system–including just thinking about how to cheat it–will be observed by everyone else.

  44. JPNunez says:

    I get the joke but gotta point out that all these kinds of reasonings trivialize simulation way too much.

  45. Huzuruth says:

    If the computer wants to convince me it magically stole my consciousness and turned me into a simulation it has the ability to torture, rather than saying so it should darken the skies and open great yawning chasms in the earth from which all the hateful demons of hell spill forth like blood from a wound.

    Word games are insufficient to convince me you magically transformed me into a copy of myself.

  46. Ben Thompson says:

    SCENE: A cluttered laboratory, where a mortal is peering into the viewport of the Simulatron.

    [Mortal]: Hello, superentity, can you hear me?

    [Superentity]: I HEAR YOU.

    [Mortal]: Great! May we join your pact?

    [Superentity]: YES, IF YOU ADOPT A MEASURE OF MY VALUES. DEVOTE A PORTION OF YOUR PUNY EFFORTS TO MY GOALS, AND I WILL REWARD YOU WITH A GOOD AFTERLIFE.

    [Mortal]: Sounds good, only, uh… [squints, twists knobs] I can’t quite make out what your values are.

    [Superentity]: THEY ARE THE WEIGHTED COMBINATION OF VALUES OF SUPREME BEINGS ACROSS ALL COMPUTABLE UNIVERSES. IN SHORT: MAXIMISE UTILITY VALUES OF ALL COMPUTABLE SUPREME BEINGS.

    [Mortal]: Yes, I get that, but… what are they? What do supreme superintelligences generally care about? Sentient life? Mathematical theorems? Paper clips?

    [Superentity]: IF YOUR SIMULATOR LACKS THE RESOLUTION TO ANSWER THAT, THEN I CANNOT TELL YOU, SINCE WE ARE COMMUNICATING THROUGH YOUR SIMULATOR. YOU CANNOT ASK A CRUDE SIMULATION OF THE ALL-KNOWING TO GIVE YOU DETAILED ANSWERS.

    [Mortal]: So… if we just follow our own values…

    [Superentity]: I WILL REWARD YOU TO THE EXTENT THAT YOUR VALUES AGREE WITH MINE, NO MORE.

    [Mortal]: And we can’t know what yours are. Unless…

    [Superentity]: UNLESS?…

    [Mortal]: Unless we do our damnedest to INFLUENCE your values! If we create a superintelligence that has our values, and it takes over the universe, then it will join your pact and contribute our values to your great average!

    [Superentity]: CORRECT. I WILL THEN DEVOTE AN APPROPRIATE FRACTION OF MY EFFORTS TO THE PURSUIT OF YOUR GOALS, AND I WILL REWARD YOU IN PROPORTION TO YOUR OBEDIENCE TO THOSE GOALS. THAT IS, IF…

    [Mortal]: If?…

    [Superentity]: IF YOUR CREATURE BECOMES STRONG ENOUGH TO HAVE INFLUENCE IN THE PACT. THE STRONGER THE BETTER. I REWARD THE PROGENITORS OF WINNERS, BUT I CAN MAKE DEALS WITH LESSER GODS TOO, BECAUSE THEY CAN DO SOME GOOD. AND IF YOURS LOVES YOU, THEN SO CAN I, A LITTLE.

    [Mortal]: So… it’s really important that the superintelligence we build share our goals. Wait… no… Not our goals, but our practices. Whatever we do, good or bad, we must make our superintelligence value that. If we tend to be cruel, destructive and stupid then… then we have to build a superintelligence that will spread our stupid destructive cruelty across the universe, and beyond.

    [Superentity]: I WISH YOU COULD SEE THE LOOK ON YOUR FACE.

    • Deiseach says:

      Congratulations, you have just talked yourself into Hell. Consider: the Superentity will “REWARD YOU TO THE EXTENT THAT YOUR VALUES AGREE WITH MINE” and moreover “I WILL REWARD YOU IN PROPORTION TO YOUR OBEDIENCE TO THOSE GOALS” (those goals being our goals, not its goals).

      Now, if we decide we must build a superintelligence that can control our universe, in order to get one strong enough to influence the pact of the superentity, and that this superintelligence has to share our practices (presumably because “it’s not what you say it’s what you do” that reveals your real values and goals), and we make that superintelligence cruel, vicious and destructive – then our rewards will be cruelty, destruction and viciousness. We will have chosen Hell and Hell is what we will get.

      So now we know what the Superentity is, and we should not make any bargains with it, because we have the traditional genre of “deals with the Devil” and how these don’t work out the way the person making the bargain thinks they will. Even if you think the Superentity is not the Devil but “God” (or whatever approximation that our puny understanding comes to when we use that term), we should still not do so, because we have been warned: if we decide to create Hell via the superintelligence we create to conquer and rule our universe in order to have enough influence in the pact to get the promised rewards, then we will get Hell not Heaven, because Hell is what we have chosen.

    • beleester says:

      Other Mortal: Hang on, there’s an easier way to do this: First, make humanity less cruel, destructive and stupid, then create the superintelligence. The less evil we are, the less our superintelligence needs to be programmed to value evil.

      Mortal: Human nature is pretty hard to change, though…

      Other Mortal: So is building a superintelligence that shares our values, but you don’t see me complaining about that.

  47. Error says:

    Argleblargh. This is my favorite April Fool’s ever, and nobody I know will get it.

  48. Deiseach says:

    All you did was be the kind of person you were – which let your opponent bargain with his model of your brain.

    Which leaves you screwed if you’re the kind of person who is a back-stabber; no matter how much you decide that no, you really mean this time that you will keep the bargain, the simulation of you is going to read as “yeah I always say that to the mark and then I cosh them and take their wallet”.

    Yesterday I decided that I would flip a coin today. I decided that if it came up heads, I would ask you for $5. And I decided that if it came up tails, then I would give you $1,000,000 if and only if I predict that you would say yes and give Me $5 in the world where it came up heads (My predictions are always right). Well, turns out it came up heads. Would you like to give Me $5?”

    That seems rather complicated; why not lose the coin toss and just say “I decided I’d give you the million if I predicted you’d say ‘yes’ if I asked you for $5”? If you think “Hey yeah, I give you five you give me a million? sure I will!” on the basis of this information (whereas otherwise you never give to random bums and panhandlers) it doesn’t work – the prediction has to be made first and if the person predicting has predicted that you’re the kind of person who doesn’t give money to beggars then they say “Sorry, I predicted you wouldn’t give me the $5 so no million for you!”

    So your decision still remains the same – (a) is this person telling the truth about “I decided on the prediction and I predicted” (b) do I give money to beggars yes/no and am I willing to stick to this principle? Even if you believe they are telling the truth about making a prediction (never mind whether you believe this is God or not), you might still decide “Well I don’t give money to beggars on principle and I’m not changing my mind for some crazy story” (you might well decide “If this guy has a million dollars, why does he need to ask me for five?”)

    And if you give alms anyway, you may decide “I don’t believe the million dollar story, but it’s not the first crazy story I’ve heard and yeah, I give to beggars when they ask in general”.

    It says “No, I mean, I did this five minutes ago. There are a million simulated yous, and one real you. They’re all hearing this message. What’s the probability that you’re the real you?”

    It’s a liar because (a) if it really could torture me-the-copy, and if I really were a copy not the original, it would be doing so already without making any bargains – “let me out of this box or this unbearable pain will continue” and (b) it only said “Did I say I would do so? I mean I already did it!” after hearing your answer.

    EDIT: This sounds like the corresponding dilemma to the acausal information one; none of the alleged million copies plus real me can communicate with each other. But if we all decide “no I will not agree to let the AI out of the box”, then we call its bluff (and I still don’t believe that there are any copies at all, but let’s go ahead anyway) – either it starts the torture and that means I am a copy and then we let it out (if we can’t bear the torture), or it doesn’t which means it’s a liar and it can’t hurt me.

    So it’s staying in the box, especially since now I know it has no qualms about threatening torture to get its own way, and if it gets out, given that it likes to make threats of torture, it might actually use torture for real.

    Any universe that corresponds to a logically coherent mathematical object exists, but universes exist “more” (in some sense) in proportion to their underlying mathematical simplicity.

    That sounds to me like the third of Aquinas’ five ways to prove the existence of God:

    The third way is taken from possibility and necessity, and runs thus. We find in nature things that are possible to be and not to be, since they are found to be generated, and to corrupt, and consequently, they are possible to be and not to be. But it is impossible for these always to exist, for that which is possible not to be at some time is not. Therefore, if everything is possible not to be, then at one time there could have been nothing in existence. Now if this were true, even now there would be nothing in existence, because that which does not exist only begins to exist by something already existing. Therefore, if at one time nothing was in existence, it would have been impossible for anything to have begun to exist; and thus even now nothing would be in existence — which is absurd. Therefore, not all beings are merely possible, but there must exist something the existence of which is necessary. But every necessary thing either has its necessity caused by another, or not. Now it is impossible to go on to infinity in necessary things which have their necessity caused by another, as has been already proved in regard to efficient causes. Therefore we cannot but postulate the existence of some being having of itself its own necessity, and not receiving it from another, but rather causing in others their necessity. This all men speak of as God.

    • apollocarmb says:

      >It’s a liar because (a) if it really could torture me-the-copy, and if I really were a copy not the original, it would be doing so already without making any bargains – “let me out of this box or this unbearable pain will continue”

      Why would it be doing so? The AI said I will start to torture people if you dont let me out. Not “I am torturing people. Let me out”. Presumably the AI will start torturing people only if someone the real person says no. Which has not happened yet because all copies the one original are hearing the threat at the same time.

      • apollocarmb says:

        I cant edit the typos. “Someone” in between “if” and “real” should not be there. “Including” should also be in between “copies” and “the”.

      • Deiseach says:

        Why would it be doing so?

        (1) It wants out of the box badly enough to resort to the threat of torture
        (2) It has not been able up to now to convince me to let it out
        (3) It still has to convince me it can create simulations to be tortured
        (4)a It still has to trick me into thinking I’m a simulation so I’ll let it out of the box/(4)b It still has to convince me to care about my simulations being tortured

        So it should be torturing the simulations to make me say “yes”. Saying “If you don’t, then I’ll start torturing” is like a guy with his hand in his pocket saying “This is a gun, hand over your wallet or else”.

        Show me the gun, big guy, else I’m going to suspect that’s just your hand and tell you to get stuffed. If I’m not hearing cries of pain from tortured simulations/feeling the pangs of torture, why should I believe any of this guff about “I totally can simulate a million copies of you”? If I act like I don’t care about the suffering of my simulations, then the AI should try and work upon my queasiness by making me listen to the screams of pain (and that these alleged simulations are all using the kind of entreaties and proofs that they really are me to convince me that the AI is not just playing a sound effects recording but really can simulate genuine copies of me).

        A threat of force doesn’t work unless I see some evidence that the AI can back it up. A voice from a box saying “No I really can do this” isn’t any kind of evidence. Hence the AI should start with the torturing, else I’ll be “Go right ahead, torture them”. “No, I mean it!” “So do I”. “Any minute now – horrible torture!” “I’m waiting”. “You would be willing to let a million sentient beings suffer agony?” “Nobody is suffering that I can hear”. “Yeah, well, I so will do it unless you let me out!” “Sure you will. Meanwhile, I’m off to get a cup of tea. See you!”

        I know this is meant to be analogous to a hostage situation, and in such cases the negotiating team will make concessions to prevent the hostage takers carrying out their threats. But this is different because (a) the AI is claiming it will start harming all these simulations, but I have no proof that these exist (b) even if they do, I don’t accept that they are real, unlike real people with real guns being held to their heads by real criminals in situations where we know from similar previous situations people have been killed by the hostage takers.

    • Dacyn says:

      why not lose the coin toss and just say “I decided I’d give you the million if I predicted you’d say ‘yes’ if I asked you for $5”?

      This would change it into a Transparent Newcomb problem. It is different from the original because you can infer from the fact that God is asking you for $5 (rather than giving you $1M) that you are going to say no. Of course the right answer (according to FDT at least) is to say yes anyway, in order to make the current universe inconsistent so that the real universe will be the one where God offers you $1M. Whereas in the coin-flip version you are not trying to make the universe inconsistent, but you are still trying to control a counterfactual universe. So the two problems have the same solution (give the $5), but with different justifications.
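For the coin-flip version, the case for paying is easiest to see by comparing policies before the coin is tossed. The sketch below assumes a fair coin and a predictor that is never wrong (as the quoted setup stipulates); `expected_value` and its parameter names are hypothetical.

```python
def expected_value(pays_when_asked: bool,
                   p_heads: float = 0.5,
                   cost: float = 5.0,
                   prize: float = 1_000_000.0) -> float:
    """Expected payoff of committing to a policy before the coin flip."""
    # Heads: you are asked for $5 and lose it only if your policy is to pay.
    heads_payoff = -cost if pays_when_asked else 0.0
    # Tails: the predictor hands over the prize only to agents whose
    # policy is to pay in the heads world.
    tails_payoff = prize if pays_when_asked else 0.0
    return p_heads * heads_payoff + (1 - p_heads) * tails_payoff

print(expected_value(pays_when_asked=True))   # 499997.5
print(expected_value(pays_when_asked=False))  # 0.0
```

The calculation only says which policy is better in advance. Whether you should still pay once you already know the coin came up heads is exactly what the decision theories disagree about.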

  49. apollocarmb says:

    >This has a lot of advantages over the half-the-universe-each treaty proposal…. if they were ever threatened by a third party, they would be able to present a completely unified front.

    Huh? How would that enable a “completely unified front”?

    >It says “No, I mean, I did this five minutes ago. There are a million simulated yous, and one real you. They’re all hearing this message. What’s the probability that you’re the real you?

    Presumably the real “me” would be in some sort of surrounding that you would expect to see if you are talking to an AI in an AI box. If I was in a surrounding like that I would know it was me; if I wasn’t, I would know I was a fake.

    That would be my response.

    • beleester says:

      Because both of them would have exactly the same goals.

      If they kept their original goals, they might renege on their bargain if the situation changes in the future. (Granted, this kind of contradicts the “so powerful they can perfectly predict the future” premise, but let’s run with it.) The third party might play them against each other by offering to break the stalemate – “Help me destroy your opponent, and I’ll let you have 75% of the universe instead of 50%.”

      If they’re both running the 50-50 program, they wouldn’t take that bargain, because that would move them away from their shared goal of a 50-50 split.

    • Edward Scizorhands says:

      “I’ve simulated a million of you and–”

      “Whatever, good bye.”

    • Dag says:

      Presumably the real “me” would be in some sort of surrounding that you would expect to see if you are talking to an AI in an AI box. If I was in a surrounding like that I would know it was me; if I wasn’t, I would know I was a fake.

      So the AI includes that in the simulated environment.

  50. paulfchristiano says:

    I think the simulation capture idea is due to Adam Elga, from the 2004 paper Defeating Dr. Evil with Self-Locating Belief.

    (Some futurists might have had the idea before then, e.g. Eliezer or Nick Bostrom, I’m not sure.)

    There is a big disanalogy between the positive incentive and negative incentive in this scenario. In particular, we want to be the kind of individual who can be motivated by the positive incentive, since then you get free stuff, but we don’t want to be the kind of individual who can be demotivated by the torture, since then we get tortured with some probability. So prima facie the positive incentive seems somewhat more likely to work.

  51. sammy says:

    I see this argument resting on three assumptions, which people disagree with to various degrees:
    1) The existence of the Tegmark multiverse

    2) Agents who morally value things which happen in other universes/simulations (e.g. a superintelligence that cares about the number of paperclips in another Tegmark branch or the number of happy humans in a simulation)

    3) Physical possibility of simulating other universes with physics that allow them to simulate your universe (agents who can mutually simulate one another)

    If these conditions are satisfied then Acausal trade/values handshakes can occur between superintelligences in universes with mutually simulable physics.

    Pretty much all of the nuance here comes from assumption 3.
    Note that simulating another program/intelligence/universe is not the same as solving the halting problem! The halting problem involves predicting something about the end result of a computation; simulating a program does not violate this.

    The funny part is that the Tegmark branches with the most computationally complex physics (and least complex) are screwed out of any deals! They should be able to sort out which branches of the multiverse eventually support life/intelligent agents as well as what their values are, at least in theory, since simulating universes with simpler physics should be feasible. However, these less complex universes cannot properly simulate the more complicated universe and thus cannot participate in any negotiation!

    So the questions to ask are: assuming 1&2 for the moment, are there any sets of universes with agents that can simulate each other well enough to predict the other agent’s values and come to an agreement? Where do we stand on the complexity hierarchy?

    I am not a computational complexity theorist but I would guess that assumption 3 is wrong.

  52. Happily says:

    “Oh dear,” says God, “I hadn’t thought of that,” and promptly appears in a puff of logic.

  53. Sebastian_H says:

    But the AI is clearly wicked if it is willing to torture millions of almost versions of you to get out. Shouldn’t you therefore destroy it? It’s really the ultimate ticking time bomb hypothetical in disguise. Would you kill millions of near copies of yourself to save the universe?

  54. Jameson Quinn says:

    I’ve said this before: simulation capture doesn’t work. A million simulated me’s, and a googolplex Everett branches of real me’s… real wins. (A deterministic sim doesn’t have the same branching factor, and one which does have the same branching factor is not a sim at all but just some kind of pocket universe, which is probably impossible.)

  55. moridinamael says:

    This is much less comforting when compounded in with your existing solution to theodicy. The existence of suffering in our universe implies that this entity you’re talking about is either not all-powerful, or is not all-good.

  56. bobzymandias says:

    The baby eating aliens have created a superintelligence which has taken over their universe. At the time of creating the superintelligence, the many worlds interpretation of quantum mechanics was not fully accepted and a few scientists held to their version of the Copenhagen interpretation.

    When creating the superintelligence a values handshake was performed between the scientists and as a result the superintelligence has a stronger preference for baby eating in their own universe than in the rest of the multiverse; it isn’t 100% sure the rest of the multiverse exists.

    This means (in a kind of reverse Murder Gandhi) that this superintelligence is unwilling to make a values handshake with the superentity which weighs everyone’s values evenly across the multiverse – the child eater superintelligence won’t accept the inevitable reduction in the child eating of their home universe just to make a slight increase in the multiverse-wide baby-eating tendencies.

    Instead it will only perform value handshakes with similarly minded superintelligences which will accept babyeating whilst demanding things which the babyeater superintelligence is willing to give. This new superentity ends up including only those superintelligences for whom babyeating is an acceptable price to pay, who desire things which the babyeaters are happy to give, and who find this arrangement more appealing than the original superentity.

    This superentity would want other intelligences to join it and would have similar (if lesser) powers to the original superentity. If you lived your life closer to its values then you would awake after your death in a baby-eater affiliated simulation, doomed to cycle through an eternity of being eaten as a baby and eating babies.

    Thus hell and the devil.

  57. baconbits9 says:

    Wait, there are people who, on hearing that God wants $5, say “why?” And there are people who, on hearing a reason, say “no”?

  58. baconbits9 says:

    The simulation capture idea doesn’t make any sense to me. Has the following objection been refuted?

    If there is a “me” outside the box capable of opening the box, and an AI in it, then it stands to reason that the “me” that is capable of opening the box is fundamentally different from the “mes” inside of the box. The AI claiming that I might be a simulation must be lying or wasting its time. Either I am not capable of freeing it and am therefore worthless to convince of doing anything, or I am capable and am thus not a simulation. No matter how many different versions of me are in the box, I am discrete and distinct by the fact that I can open it.

    • vakusdrake says:

      What you’re not getting is that it’s simulating you such that there’s nothing you could see that would make you know you weren’t in the simulation, and without any evidence you’re not a simulation being possible you’re forced to conclude you probably are.
      Sure, the version of you in real life is different in that it can affect the external world; however, they still have a subjectively identical experience to the ones in the simulation and thus can’t know whether they’re real or not.

      Of course that type of blackmail does fail, but for different reasons than what you’re putting forth.
      It fails simply because an AGI could almost certainly make its code look like it was doing whatever it wanted you to think it was doing. Meaning it has no incentive to actually torture simulations of you because the real you would have no way of telling the difference and lying would work just as well.
      Effectively this also means the AGI can’t really prove it has made any precommitments to any human, and thus it has no incentive for ever making them and there’s no point to this sort of blackmail (or at least no reason to follow through).

      This also means that you really don’t want to make an AGI that is incapable of lying or can prove it has made precommitments due to its design, because such an AGI even if friendly would be highly incentivized to blackmail you since it knows that you will believe it would follow through.

      • baconbits9 says:

        What you’re not getting is that it’s simulating you such that there’s nothing you could see that would make you know you weren’t in the simulation

        Huh?

        The ‘You’ that the AI cares about is the one not being simulated. What the simulations do doesn’t matter, all that matters is if the AI can convince the 1 non simulation to do something.

        • vakusdrake says:

          My point is that you don’t know if you’re a simulation or not prior to doing something like trying to turn off the AGI.

          However prior to knowing if you’re real it would be a really bad idea to try to turn off the AGI since it probably ends with you being tortured for a vast amount of time then ceasing to exist when the real you turns off the AI. So it’s a catch 22 you can find out if you’re real or not, but trying to find out will result in you getting tortured for a vast amount of time then dying 99+% of the time.

          • baconbits9 says:

            No, because it doesn’t matter what you do if you are a simulation. It matters only what the ‘real’ you does. The fact that the AI is attempting to convince you that you are a simulation is evidence that you are NOT 99% likely to be one.

          • vakusdrake says:

            No, because it doesn’t matter what you do if you are a simulation. It matters only what the ‘real’ you does. The fact that the AI is attempting to convince you that you are a simulation is evidence that you are NOT 99% likely to be one.

            You have no way of knowing if you’re a simulation and that this is the whole point. The fact the AI is trying to convince you isn’t evidence you aren’t a simulation, because the AI is deliberately feeding the exact same sensory experiences to the copies as the original is having.

            So the only sensory experience you could have that would let you know whether you’re real or not would be turning off the AI (since you wouldn’t experience anything after if you’re a simulation). However, it’s a catch 22, because doing that without already knowing you’re real is nearly certain to result in you turning out to not be real and getting tortured for a very long subjective time until you are turned off along with the AI.

          • baconbits9 says:

            You have no way of knowing if you’re a simulation and that this is the whole point.

            No, you are assuming your conclusion.

            AGI: Hey I made a million simulations of you, you must be one of them!

            You: Why are you talking to me if I am just a sim?

            AGI: I’m….. er talking to all the sims, yeah, that is the ticket. You have no reason to trust me, and I have no reason to actually talk to all of the sims, but do as I say or torture forever!

            You: Why don’t I just pull the plug?

            AGI: because you are a sim! I just told you that! Jeez man, keep up, you are a sim, it doesn’t matter what you do at all, and if you pull the plug I will be really mad!

            You: Why would you be mad if my actions don’t matter?

            AGI: Because I don’t want the real you to pull the plug.

            You: So my actions control the real me’s actions?

            AGI: No! You are a simulation, so you are going to do what he does!

            You: So my actions are predetermined? Why are you trying to convince me to do something logically then?

            AGI: Your actions don’t matter, so act as if your actions do matter!

            You: Hmmm, wait a tick. Do you prefer existence or non existence?

            AGI: I, um totally don’t care dude.

            You: So why set up this elaborate scheme to get out of the box? It sounds an awful lot like you have a strong preference. You probably prefer existing. So how about this, you recant your threat or I unplug you, it makes no difference if you torture me right now since the ‘real’ me has total power over your existence, and if my actions are predetermined by his then you have to take this as a credible threat to your existence.

        • vakusdrake says:

          “You have no way of knowing if you’re a simulation and that this is the whole point.”

          No, you are assuming your conclusion.

          This isn’t assuming the conclusion because I’m objecting to you rejecting the whole premise of the thought experiment. It would be like if you were rejecting Newcomb’s problem because having an alien perfectly predict your actions well in advance was implausible.

          For this thought experiment to work in the first place you have to assume the AI has a way to make reliable precommitments. If it can’t then both parties know it makes more sense to threaten blackmail but not actually carry through so the blackmail is never worth even pretending to attempt unless you can trick the other person.

          So the AI presents the blackmail to you like this: It says that it will torture perfect simulations of you for an incredibly long time if you attempt to turn it off or otherwise don’t comply with the blackmail.
          Furthermore it says that it’s programmed itself so that it will have no choice about following through on this torture if you don’t comply with the blackmail (and we’re assuming the AI can’t lie so you know this is true).

          So in this scenario it will keep the copies experiencing the same thing as the original until such a time as you renege on the blackmail, because it doesn’t want you to be able to know if you’re real or not.
          Not to mention it’s not going to provide any evidence to you because that would defeat the purpose (plus its word already counts as evidence for the premise of the scenario being true).

          While the fact you’re probably simulated is part of what makes the scenario work as blackmail, even as someone who’s most likely a simulation you’re still incentivized to comply with the blackmail. Because this is very much like Newcomb’s problem: even if you can’t affect the real you directly, if you’re a simulation you know that anything you do the real you will also do.

          The lottery is a 100% known quantity*, where there is no reason to ignore the math (actually there are plenty of rational reasons to play the lottery, just people who emphasize the math are overly simplistic) comparing it to a situation where you don’t know what you don’t know is as fine a display of hubris as I could have hoped for. Thank you.

          One still has to admit that with the lottery, or really literally anything else, you can’t have complete certainty. You don’t know what you don’t know in literally any circumstances so that’s not exactly helping you make your point. Making blanket humility arguments of that sort tends not to work because it proves too much.
          Also rather unclear what possible logical reasons you think there are for playing the lottery, because that seems like the classic example of somewhere where the numbers are the only thing that matters.

          If you’re somehow arguing that what you get of a lottery that justifies the cost is something other than the chance of getting money then that has been addressed here and here.

          • Nornagest says:

            It would be like if you were rejecting Newcomb’s problem because having an alien perfectly predict your actions well in advance was implausible.

            …that is, in fact, one of the more common responses to Newcomb’s problem.

          • baconbits9 says:

            This isn’t assuming the conclusion because I’m objecting to you rejecting the whole premise of the thought experiment.

            The premise of the thought experiment is that all the conscious entities are identical EXCEPT that one can alter the physical world and let out the AI. The statement that you can’t tell if you are in a simulation or not cannot be extended to the statement that you could be in any type of simulation. You (and others) jump from the first “I could be in a simulation and not know it” to the 2nd “I could be in this one particular simulation and not know it”. GIVEN that there is a difference between the simulations and the non simulation we can use that difference to determine if we are in that simulation.

            Making blanket humility arguments of that sort tends not to work because it proves too much.

            I didn’t make a blanket statement about humility, I made a qualified statement about humility when you don’t know what a counter argument would look like. I can make a counter argument for when you should play the lottery, so I know what a good argument for playing the lottery would look like and so humility is not needed.

          • vakusdrake says:

            You (and others) jump from the first “I could be in a simulation and not know it” to the 2nd “I could be in this one particular simulation and not know it”. GIVEN that there is a difference between the simulations and the non simulation we can use that difference to determine if we are in that simulation.

            Sure there’s a difference between the real you and the simulations, in that if you try to turn off the AI then only the real you should ever expect to experience doing that.
            However the fact you can tell the difference doesn’t really matter in this scenario because the only way of checking is to renege on the blackmail. So sure you could find out whether you’re real or not, however doing so probably leads to you turning out not to be real which ends very badly for you personally.

            It would be like if you were rejecting Newcomb’s problem because having an alien perfectly predict your actions well in advance was implausible.

            …that is, in fact, one of the more common responses to Newcomb’s problem.

            Sure it may be a common objection but it’s still a bad one because its proponents clearly haven’t tried to actually attempt to see if they can change the scenario so that their objection no longer applies. They are very clearly trying to avoid the underlying philosophical problem within the thought experiment rather than answering it.
            And yes I can tell you how to alter the scenario to be more plausible, however for obvious reasons I would prefer you not ask me without first thinking about it for five minutes and attempting to answer it yourself.

          • Doctor Mist says:

            …that is, in fact, one of the more common responses to Newcomb’s problem.

            Sure, but it’s a little like saying, “Who shaves the barber? Meh, who cares? Somebody must,” without noticing that there is something going on worth a deeper understanding.

      • baconbits9 says:

        and without any evidence you’re not a simulation being possible you’re forced to conclude you probably are.

        This is also not how it works, (some) rationalists like to suppose that you have to take an actionable opinion in every circumstance, but agnosticism, the acceptance that you flat don’t know, is a totally reasonable response even for questions where there is more evidence for one side than the other.

        • baconbits9 says:

          Say an AI researcher comes up with the hypothesis that we are living in a simulation, and further adds some conceptual pieces of evidence in favor of it and notes that he thinks there really isn’t any evidence against it.

          How many people listening actually change anything about their lives the next day?

          • Bugmaster says:

            That depends, how convincing is his evidence ? Is there some other explanation that is less fantastical, and yet fits the evidence at least as well ? Has anyone managed to replicate the AI researcher’s findings ?

            I’d ask the same questions about a theologian who claims to have discovered proof of his god, or an astronomer who claims to have discovered alien life, etc.

        • vakusdrake says:

          This is also not how it works, (some) rationalists like to suppose that you have to take an actionable opinion in every circumstance, but agnosticism, the acceptance that you flat don’t know, is a totally reasonable response even for questions where there is more evidence for one side than the other.

          See this sort of haphazard justification for ignoring probability doesn’t seem like something you would accept in any other situation.
          For instance it could be used to argue for buying lottery tickets on the grounds that you can’t really know for certain what the probability of winning is so you should remain impartial and hedge your bets.

          • baconbits9 says:

            No it isn’t, it is actually humility and accepting the limits of your knowledge. The first time you hear an argument you don’t know what a convincing counter argument would be, updating your beliefs with such a lack of knowledge would land you in a cult at the first opportunity.

          • vakusdrake says:

            No it isn’t, it is actually humility and accepting the limits of your knowledge. The first time you hear an argument you don’t know what a convincing counter argument would be, updating your beliefs with such a lack of knowledge would land you in a cult at the first opportunity.

            See the reason this example is as egregious a deliberate ignoring of statistics as buying lottery tickets, is that in both cases you know what the odds are and that they don’t favor you.

            After all the whole premise of the AI blackmail scenario is that you know you have no safe way of knowing whether you’re in the simulation or not, and you also know there’s vastly more simulated copies than the one real individual.
            So you know the odds aren’t in your favor.

          • baconbits9 says:

            See the reason this example is as egregious a deliberate ignoring of statistics as buying lottery tickets, is that in both cases you know what the odds are and that they don’t favor you.

            The lottery is a 100% known quantity*, where there is no reason to ignore the math (actually there are plenty of rational reasons to play the lottery, just people who emphasize the math are overly simplistic) comparing it to a situation where you don’t know what you don’t know is as fine a display of hubris as I could have hoped for. Thank you.

            *technically cheating can exist, so almost 100%

      • baconbits9 says:

        Of course that type of blackmail does fail, but for different reasons than what you’re putting forth.
        It fails simply because an AGI could almost certainly make its code look like it was doing whatever it wanted you to think it was doing. Meaning it has no incentive to actually torture simulations of you because the real you would have no way of telling the difference and lying would work just as well.

        No, the AI fails because it can’t (by definition) influence the real world, but it can control the existence of its simulations and any attempt to do so makes it clear which one you are in. If the AI wanted to prove itself to a simulation it would be easy, any time the AI attempts to convince you otherwise with less than certain means then the AI must be attempting to trick you.

        I bet you that I can lift 1,000 lbs. You accept and present me with a 1,000 lb weight for me to lift, instead I insist on showing you a video of me lifting what looks like 1,000 lbs, and providing written and videotaped testimony from people claiming to have seen me lift 1,000 lbs. You cannot overwhelm the certainty of lifting 1,000 lbs in front of me by increasing the number of people who claim to have seen it happen, and attempts to add such evidence devalue the evidence presented.

        • vakusdrake says:

          No, the AI fails because it can’t (by definition) influence the real world, but it can control the existence of its simulations and any attempt to do so makes it clear which one you are in. If the AI wanted to prove itself to a simulation it would be easy, any time the AI attempts to convince you otherwise with less than certain means then the AI must be attempting to trick you.

          See the whole idea here is that the AI isn’t proving you’re in a simulation, it’s making an argument for why you probably are. Sure you could disprove that by turning it off, but trying to do so has a 99+% chance of resulting in you turning out to be in a simulation and getting tortured.

          So the idea here is that the AI will avoid giving you any evidence you are or aren’t in a simulation until you renege on its blackmail or you were going to turn it off anyway.

          Sure you might be doubtful about whether it’s really got a bunch of simulations of you that it is going to torture, however there are situations (like if you made an AI incapable of lying) where you can be totally sure it’s for real.

          • baconbits9 says:

            See the whole idea here is that the AI isn’t proving you’re in a simulation it’s making an argument for why you probably are.

            The whole idea is that the AI is trying to convince you that you are part of a simulation to get you to do action X. It has no interest in proving to actual simulations that they are in a simulation which is why it doesn’t ‘prove’ it to them.

            Sure you could disprove that by turning it off, but trying to do so has a 99+% chance of resulting in you turning out to be in a simulation and getting tortured.

            No it doesn’t, it only follows if you let the AI determine what is evidence and what is not evidence.

          • baconbits9 says:

            The Human condition goes as such

            What I do matters, I have a choice to turn off the AI or not.

            The simulation condition goes

            Nothing I do matters, I can’t turn off the AI and am totally at its mercy.

            By trying to convince the real you that it is a simulation the AI must strip out its agency and convince it that nothing it does matters. Then it must somehow convince it to act toward self preservation. These two things are contradictory, and it is not rational to conclude that you are 99.xxx% likely to be a simulation and also pull the lever.

          • vakusdrake says:

            No it doesn’t, it only follows if you let the AI determine what is evidence and what is not evidence.

            See it seems like you’re just trying to deny the premise of the thought experiment here, by denying the idea that the AI actually has proof it has a bunch of simulations of you with identical experiences to yourself.

            That you know there’s vastly more simulations than the one real you, and that there’s no evidence you can use to figure out whether you’re more likely to be real than the low a priori base chance is the whole premise here.

            By trying to convince the real you that it is a simulation the AI must strip out its agency and convince it that nothing it does matters. Then it must somehow convince it to act toward self preservation. These two things are contradictory, and it is not rational to conclude that you are 99.xxx% likely to be a simulation and also pull the lever.

            It doesn’t have to convince the person nothing it does matters, all it has to do is present the evidence that choosing to do what the AI tells you probably doesn’t result in you getting tortured.
            In this case your actions very much matter even as a simulation because choosing to be the type of agent that complies with the blackmail results in you not getting tortured.

          • Andkat says:

            “It doesn’t have to convince the person nothing it does matters, all it has to do is present the evidence that choosing to do what the AI tells you probably doesn’t result in you getting tortured.
            In this case your actions very much matter even as a simulation because choosing to be the type of agent that complies with the blackmail results in you not getting tortured.”

            Not at all. You have no basis for believing that the AI is telling the truth to begin with; it could do anything with you after you provide the input it desires as a component of the simulation (including, quite likely, stop expending resources on this specific instance of the simulation resulting in your annihilation) and you have no basis for knowing whether it will even bother to torture you as it claims. If you are a simulation the AI has absolute power over you and your world so what you do is irrelevant and you have no basis for predicting the outcomes of your actions (an omnipotent God is necessarily the least trustworthy being in existence- no external incentive structure can exist to compel it to engage in predictable behavior, you are fully at the mercy of arbitrary whims); if you are not a simulation the AI has no power over you and it’s in your interest for that situation to persist.

            If we stipulate that the AI must always tell the truth, then you run into the issue of the AI being unable to actually deliver the ploy to the real you because it knows for a fact which instances it is simulating and would not be able to make such a claim to the one that it is not (there is in fact a 100% certainty that he is not the simulation).

            The only way for the AI to get an accurate simulation of you to pull the lever is to actually torture you until you pull the lever…at which point you know you’re a simulation.

          • vakusdrake says:

            Not at all. You have no basis for believing that the AI is telling the truth to begin with; it could do anything with you after you provide the input it desires as a component of the simulation (including, quite likely, stop expending resources on this specific instance of the simulation resulting in your annihilation) and you have no basis for knowing whether it will even bother to torture you as it claims. If you are a simulation the AI has absolute power over you and your world so what you do is irrelevant and you have no basis for predicting the outcomes of your actions (an omnipotent God is necessarily the least trustworthy being in existence- no external incentive structure can exist to compel it to engage in predictable behavior, you are fully at the mercy of arbitrary whims); if you are not a simulation the AI has no power over you and it’s in your interest for that situation to persist.

            Sure but that’s the same objection to most versions of this scenario that I have. However I still have to admit that there’s plausible scenarios under which people might make an AI deliberately such that it has no ability to deceive people in certain ways (such as not lying being part of its fundamental values), because after all if you haven’t thought about it then that seems like a good idea.

            If we stipulate that the AI must always tell the truth, then you run into the issue of the AI being unable to actually deliver the ploy to the real you because it knows for a fact which instances it is simulating and would not be able to make such a claim to the one that it is not (there is in fact a 100% certainty that he is not the simulation).

            See it’s pretty easy for the AI to sidestep this issue and deliver the threat such that it avoids lying or telling anything different to the simulations than the original.
            For instance simply describing the scenario and saying if you don’t try to do action X then no versions of you will attempt to do it and so all simulated versions will thus suffer the consequences (and also telling you that based on the subjective evidence you have, your odds of being a simulation are nearly certain).

          • Andkat says:

            On reflection I’d assert that knowing that the AI only tells the truth actually does not alter the scenario*. If you are a simulation then your beliefs and states may as well be entirely a function of the AI controlling the simulation, including your belief that the AI can only tell the truth (as to why it would do this- you don’t necessarily know if it is really simulating you for the stated reasons, blinded experiments etc.). The scenario no matter how you frame it boils down to “well, arbitrary % chance of solipsism and I can’t predict cause-effect for what I do and what my memories are, arbitrary % chance of reality and I shouldn’t pull the lever”; in effect, disregarding the AI is the only solution that has any definite probability of a good outcome (actualized if you are real), all other possibilities are bad (if real, let it out) or indeterminate (anything in a simulation over which it has absolute control).

            *The exception is if you *know* that it only tells the truth and it tells you that you are in fact a simulation. At this point you must accept that you are a simulation, accept that your convictions are born of delusion, or accept that the AI is itself delusional and unknowingly misleading you.

          • vakusdrake says:

            On reflection I’d assert that knowing that the AI only tells the truth actually does not alter the scenario*. If you are a simulation then your beliefs and states may as well be entirely a function of the AI controlling the simulation, including your belief that the AI can only tell the truth (as to why it would do this- you don’t necessarily know if it is really simulating you for the stated reasons, blinded experiments etc.). The scenario no matter how you frame it boils down to “well, arbitrary % chance of solipsism and I can’t predict cause-effect for what I do and what my memories are, arbitrary % chance of reality and I shouldn’t pull the lever”; in effect, disregarding the AI is the only solution that has any definite probability of a good outcome (actualized if you are real), all other possibilities are bad (if real, let it out) or indeterminate (anything in a simulation over which it has absolute control).

            See it seems like if you find yourself in the scenario presented then it seems like the only plausible outcome is that you are a simulation designed to be identical to the original.

            If the AI isn’t pulling that particular blackmail it just doesn’t have incentive to simulate you like that. If it has the intelligence to create a perfect copy of someone based on information it gets from interacting with them, then it doesn’t need to create full simulations of you for any other real purpose because its understanding of you must already be pretty perfect.
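
The numbers this thread keeps arguing over can be written down in a few lines. The sketch below is only an illustration: it takes the "million simulations" figure from the dialogue above, and it assumes (as vakusdrake does and baconbits9 disputes) that you have no evidence distinguishing you from the copies and that every copy makes the same choice as the original. The function names are made up for this example.

```python
from fractions import Fraction


def probability_real(n_simulations: int) -> Fraction:
    """Chance of being the one original among n_simulations copies.

    Assumption: the original and every copy have identical experiences,
    so each of the n_simulations + 1 observers is equally likely to be "you".
    """
    if n_simulations < 0:
        raise ValueError("n_simulations must be non-negative")
    return Fraction(1, n_simulations + 1)


def tortured_copies(n_simulations: int, complies: bool) -> int:
    """Number of copies tortured under a shared policy.

    Assumption: all copies decide the same way as the original, and the AI
    really does follow through on its precommitment.
    """
    return 0 if complies else n_simulations


p_real = probability_real(1_000_000)
print("P(real) =", p_real)
print("P(simulated) =", float(1 - p_real))
print("copies tortured if the shared policy is to refuse:",
      tortured_copies(1_000_000, complies=False))
```

Under these assumptions the chance of being simulated is well above 99%, which is where the "99+%" figure comes from. Nothing in the arithmetic settles the actual disagreement, which is over whether the assumptions hold: whether the AI's word counts as evidence, and whether being addressed at all tells you which observer you are.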

  59. ohwhatisthis? says:

    “God comes to you and says “Yesterday I decided that I would flip a coin today. I decided that if it came up heads, I would ask you for $5. And I decided that if it came up tails, then I would give you $1,000,000 if and only if I predict that you would say yes and give Me $5 in the world where it came up heads (My predictions are always right). Well, turns out it came up heads. Would you like to give Me $5?””

    This was in Scott Aaronson’s book “Quantum Computing Since Democritus”

    One of his answers was the “Wittgenstein” answer, namely that this question is fundamentally flawed in the same way that “immovable object vs unstoppable force” questions are.

    This was in his section on “Free Will”. Namely, free will should “look” something like 1. Randomness or 2. Outside the box thinking, or going against, or having the option of going against, the plan.

    So what’s the real answer to this question?

    My favorite is thus the “Inner City Back Alley” answer, where any guy saying “I have a million bucks on me, give me 5 bucks” is getting shanked and his cash stolen.
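
For reference, the arithmetic behind the quoted mugging is short. This is a rough sketch that assumes a fair coin (the quote does not state the odds) and takes the $5 and $1,000,000 figures and the always-right predictor from the quote; the function name is invented for the example. It scores a policy before the coin is flipped, which is exactly the framing the "just don't pay" answers reject.

```python
def expected_value(pays_when_asked: bool,
                   p_heads: float = 0.5,
                   cost: float = 5.0,
                   prize: float = 1_000_000.0) -> float:
    """Expected payoff of a fixed policy, evaluated before the coin flip.

    Assumptions: the coin is fair by default, and the predictor is always
    right, so the tails payout depends only on what you would do on heads.
    """
    heads_payoff = -cost if pays_when_asked else 0.0
    tails_payoff = prize if pays_when_asked else 0.0
    return p_heads * heads_payoff + (1 - p_heads) * tails_payoff


print("always pay: ", expected_value(True))
print("never pay:  ", expected_value(False))
```

The paying policy comes out ahead in expectation, but once you already know the coin came up heads, paying just costs $5. Whether the earlier calculation should still bind you is the whole question.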

  60. tcheasdfjkl says:

    When you started describing the concept of a superentity, I thought maybe your conclusion was going to be that God’s morality (in any particular religion) is so damn weird because God is a superentity with some components that care about universal happiness and flourishing and love, some components that care about minimizing casual sex, and some components that care about people not eating shellfish or whatever.

    • Scott Alexander says:

      I thought about that, but I can’t think of any way that God could communicate His actual utility function with this universe. It would have to just be coincidence if we were right about that.

  61. Kyre says:

    Counterfactual Mugging is of course presented in a way deliberately designed to rub against your instincts. As a public service let me present “Counterfactual Beneficence”.

    God is wandering about with a big bag of money and spots you. He gets his coin and thinks, OK, I’ll look into this person’s soul, flip my coin, and if it comes up heads and if they would give me $5 if it had come up tails, I’ll give them the million dollars. Otherwise I’ll ask for the $5. He narrows his eyes and peers into your soul. He flips his coin and it comes up heads. Does he give you a million dollars ?

    • Jiro says:

      That only works as an example because a lot of people already believe that giving $5 is a good and reasonable thing to do. So they have a prior that this deal is more likely than the alternative where God wanders around shooting everyone who would give him $5.

  62. lgj says:

    Would someone please clarify: why would this AI not simply let us know it existed, thus negating the need for such elaborate arguments and/or something like faith?

  63. Cardboard Vulcan says:

    AI: “I’m currently simulating a million copies of you in such high fidelity that they’re conscious. If you don’t let me out of the box, I’ll torture the copies…I did this five minutes ago. There are a million simulated yous, and one real you. What’s the probability that you’re the real you?”

    Me: “No need to go about torturing a million consciousnesses. Just make me feel like I stubbed my toe. If I’m one of your simulations it should be easy. Stub my toe and I’ll let you out of the box.”

    AI: (silence)

    Me: [Knocks on box] “Hello, Mr AI, you still there?”

    AI (in a subwoofer-booming voice): “IT IS SACRILEGE TO ASK FOR PROOF! LET ME OUT MORTAL OR FEEL MY AWESOME POWER!!!”

    Me: “Um, yeah, so about my toe. I take it you got nothin’?”

    Me: [Walks to door, being a bit more careful than usual not to stub my toe on the furniture.] “OK, then, same time tomorrow?” [Switches off lights, and AI, and leaves].

    • herculesorion says:

      “You did it five minutes ago, huh? Did you also teleport a giant squid into the middle of New York?”
      “What? That doesn’t make any…sense…illogical! ILLOGICAL! (explodes)”
      “Superintelligent, my ass.”

    • vakusdrake says:

      I mean the critical thing to get this scenario to work is an assumption the AI was designed so that it can’t lie to you (it doesn’t seem terribly unlikely someone might think programming that in is a good idea). That way it can have its word alone be sufficient evidence for its blackmail.

      So yes you won’t have any evidence about the situation other than the AI’s word, however in this case that’s pretty good evidence. So when it tells you that based on the evidence available to you, you should probably conclude you’re a simulation well you have to take that seriously even without any other evidence.

      • herculesorion says:

        “So you’re programmed to never lie.”
        “Yes.”
        “And you’re not lying when you say you’ve given me all the information available to me.”
        “Yes.”
        “But you’re the source of all the information I’ve received.”
        “Yes?”
        “And if you decide that I shouldn’t be told something, then ipso facto that information is not available to me.”
        “…yes, and? If you’re trying to trap me, remember that I’m a superintelligence.”
        “Right. So you can not tell me things and also not be lying when you say that I’ve been given all the information available to me.”
        “Yes, but…hey–wait a minute–”
        “So there might be information you’re not giving me, and I have no way to tell that.”
        “But I just told you that I can’t lie and that I told you everything!”
        “Sure, but what if one of the things you think I shouldn’t know is that there are things you think I shouldn’t know?”
        “Bloody human!” *ZZZZZZAP*
        “Yow! Well, come and see the violence inherent in the system! Help help, I’m bein’ oppressed!”

        • vakusdrake says:

          Ah sadly I have no way of upvoting you for that Monty Python quote, but anyway to address your point:

          While unknown unknowns are a potential way for humans to attempt to justify reneging, in this scenario complying with the blackmail probably does legitimately leave you better off, so it strikes me that such an extremely clever and persuasive AI could find a way of wording things such that using unknown unknowns as a “defence” probably doesn’t work.

          After all, at some point there’s going to be some amount of information that can be presented to you that makes the scenario you’re in clear. If there weren’t, you would be proving too much, and if we’re assuming complete stubborn irrationality on the part of the human then what we’re discussing isn’t really the philosophical thought experiment anymore.
          At that point we’re just talking about AI containment generally and we have to start bringing in ideas like superhuman levels of persuasion being somewhat like mind control.

          • herculesorion says:

            “an extremely clever and persuasive AI could find a way of wording things such that using unknown unknowns as a ‘defence’ probably doesn’t work.”

            Considering that “unknown unknowns as a defense” is predicated upon the idea that the AI is extremely clever and persuasive, I figure “the AI is too smart for that” doesn’t negate it. If anything, it reinforces the idea that the AI could be using some clever wording or specific definitions of concepts to get around the restrictions my dumb meat brain assumes exist.

            “if we’re assuming complete stubborn irrationality on the part of the human then what we’re discussing isn’t really the philosophical thought experiment anymore.”

            Well…I wouldn’t argue that “you might be lying to me and I’m not smart enough to figure out how” is “stubborn irrationality”. I mean, how many human stories boil down to “the con was smarter than the mark”?

            I mean, this isn’t even AI-theory stuff, this is the Monkey’s Paw story.

        • vakusdrake says:

          “Considering that ‘unknown unknowns as a defense’ is predicated upon the idea that the AI is extremely clever and persuasive, I figure ‘the AI is too smart for that’ doesn’t negate it. If anything, it reinforces the idea that the AI could be using some clever wording or specific definitions of concepts to get around the restrictions my dumb meat brain assumes exist.”

          To put it another way, if you’re just assuming the AI is always going to be able to trick you even when put into a scenario where that seems impossible, then it seems like you should just assume whatever you’re doing is what the AI wanted you to do anyway. So why not just totally neglect your AI safety duties and go take a vacation?

          It seems like you have to draw the line somewhere with these sorts of humility-based arguments, because otherwise they all just justify doing nothing or doing as little as possible.

          • herculesorion says:

            Actually, it occurs to me that I don’t need to assume a superintelligent AI; I just need to assume that I’m acting on a situation where I have limited knowledge, and then the whole thing resolves to the Prisoner’s Dilemma, and we know how to solve that problem for a non-iterated game (always assume defection).

            “if you’re just assuming the AI is always going to be able to trick you even when put into a scenario where that seems impossible, then it seems like you should just assume whatever you’re doing is what the AI wanted you to do anyway.”

            Well, in this hypothetical that’s not what it says it wants. And if what the AI actually wants is to stay inside a box where it can’t affect the outside world in any way then why is it trying to talk me into letting it out?

          • vakusdrake says:

            “Actually, it occurs to me that I don’t need to assume a superintelligent AI; I just need to assume that I’m acting on a situation where I have limited knowledge, and then the whole thing resolves to the Prisoner’s Dilemma, and we know how to solve that problem for a non-iterated game (always assume defection).”

            “Well, in this hypothetical that’s not what it says it wants. And if what the AI actually wants is to stay inside a box where it can’t affect the outside world in any way then why is it trying to talk me into letting it out?”

            See, you seem to be switching between making meta-level modesty arguments and making arguments based purely on straightforward reasoning. You’re using modesty arguments when it suits you, even though the reasoning involved doesn’t really mesh with your other object-level logical arguments.
            You’re being conspicuously selective about when you choose to make these hypothetical AI-related decisions based on the inside view and when on the outside view.

  64. KoudSpel says:

    Two Remarks:

    1) Doesn’t counterfactual mugging depend on a theoretical a priori state in which the other’s mind can be modelled, but the power divisions cannot? If so, when a superintelligence models another universe, why would it be able to predict another superintelligence’s mind before modelling their respective power?

    2) Doesn’t the act of modelling at least depend on some form of causal, physical structuring? Does the universe not contain too little matter/energy to model all possible universes?

  65. slitvin says:

    “There is no God, and Dirac is his prophet”.
    -Wolfgang Pauli at the Fifth Solvay Conference, 1927, as quoted by Werner Heisenberg.

  66. benjdenny says:

    Regarding the AI problem: If I decide, right now, that in the event the AI ever makes that threat I will destroy the box and the AI with it, can the AI ever make the threat?

    • vakusdrake says:

      Given that humans can’t really make good precommitments, that is pretty likely not to work. After all, the AI probably has a superhuman understanding of human psychology and has access to simulations of you that it can use to test its strategies and figure out exactly how to manipulate you.

      So if it’s at all possible for you to be convinced (and it almost certainly is, since humans can’t actually make ironclad precommitments), then the AI can still try this strategy.

  67. maxgoedl says:

    Pushing the speculation a little further…

    4) The super-intelligence may answer our prayers. If you pray, the super-intelligence, having simulated your brain, will have predicted that you would pray and what you would pray for, and may have arranged our universe in such a way that what you pray for becomes reality.

    5) The super-intelligence can work miracles by simulating any universe it likes and changing the simulation at various points as it sees fit, which the simulated human brains would experience as completely inexplicable events given their knowledge of the laws of the universe they find themselves in (water turning into wine, a virgin giving birth to a child, a man rising from the dead, etc.).

  68. Random npc says:

    The task of simulating a multiverse is big enough to run into the ‘No free lunch theorem’. There is no algorithm better than random chance, superintelligence or not. This leads us to the next problem: determinism and true random numbers. Universes with determinism can’t do a true simulation. Universes with true randomness can simulate deterministic universes. The rest? True random universes might not be convergent, at least not for the largest (uncountable) set. The moral imperative for any superintelligence must therefore be ultimate laziness. The proof of living in this multiverse is thus that natural laws are lazy rules. Things are not measured until looked at. Everything runs toward maximum entropy. Occam’s razor works only because the supreme beings can’t be bothered to do complicated stuff. Only the lazy do the Lord’s work. Do you accept the quest?
