The Hour I First Believed

[Content note: creepy basilisk-adjacent metaphysics. Reading this may increase God’s ability to blackmail you. Thanks to Buck S for some of the conversations that inspired this line of thought.]

There’s a Jewish tradition that laypeople should only speculate on the nature of God during Passover, because God is closer to us and such speculations might succeed.

And there’s an atheist tradition that laypeople should only speculate on the nature of God on April Fools’ Day, because believing in God is dumb, and at least then you can say you’re only kidding.

Today is both, so let’s speculate. To do this properly, we need to understand five things: acausal trade, value handshakes, counterfactual mugging, simulation capture, and the Tegmarkian multiverse.

Acausal trade (wiki article) works like this: let’s say you’re playing the Prisoner’s Dilemma against an opponent in a different room whom you can’t talk to. But you do have a supercomputer with a perfect simulation of their brain – and you know they have a supercomputer with a perfect simulation of yours.

You simulate them and learn they’re planning to defect, so you figure you might as well defect too. But they’re going to simulate you doing this, and they know you know they’ll defect, so now you both know it’s going to end up defect-defect. This is stupid. Can you do better?

Perhaps you would like to make a deal with them to play cooperate-cooperate. You simulate them and learn they would accept such a deal and stick to it. Now the only problem is that you can’t talk to them to make this deal in real life. They’re going through the same process and coming to the same conclusion. You know this. They know you know this. You know they know you know this. And so on.

So you can think to yourself: “I’d like to make a deal”. And because they have their model of your brain, they know you’re thinking this. You can dictate the terms of the deal in their head, and those terms can include “If you agree to this, think that you agree.” Then you can simulate their brain, figure out whether they agree or not, and if they agree, you can play cooperate. They can try the same strategy. Finally, the two of you can play cooperate-cooperate. This doesn’t take any “trust” in the other person at all – you can simulate their brain and you already know they’re going to go through with it.

(Maybe an easier way to think about this: both you and your opponent have perfect copies of both of your brains, so you can both hold parallel negotiations and be confident they’ll come to the same conclusion on each side.)

It’s called acausal trade because there was no communication – no information left your room, you never influenced your opponent. All you did was be the kind of person you were – which let your opponent bargain with his model of your brain.
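
Here’s a minimal code sketch of that idea – my own toy, not part of the original thought experiment. Each agent holds the other’s decision procedure and cooperates exactly when its simulation of the opponent cooperates; the depth limit and optimistic base case are assumptions standing in for the “perfect simulation” above.

```python
# Toy acausal cooperation: each agent runs the other's decision procedure
# instead of talking to it. The depth cutoff with an optimistic base case is
# an assumption standing in for "perfect simulation of the other's brain".

def make_agent():
    def decide(opponent_decide, depth=3):
        if depth == 0:
            return "cooperate"  # base case: assume the fixed point is mutual cooperation
        # Simulate the opponent deciding against a copy of ourselves.
        their_move = opponent_decide(decide, depth - 1)
        return "cooperate" if their_move == "cooperate" else "defect"
    return decide

you, opponent = make_agent(), make_agent()
print(you(opponent), opponent(you))  # cooperate cooperate -- no message ever left the room
```

Note that an agent whose procedure always defects would get defected against by this rule, so the cooperation really does hinge on what the other party’s code would do, not on blind trust.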

Values handshakes are a proposed form of trade between superintelligences. Suppose that humans make an AI which wants to convert the universe into paperclips. And suppose that aliens in the Andromeda Galaxy make an AI which wants to convert the universe into thumbtacks.

When they meet in the middle, they might be tempted to fight for the fate of the galaxy. But this has many disadvantages. First, there’s the usual risk of losing and being wiped out completely. Second, there’s the usual deadweight loss of war, devoting resources to military buildup instead of paperclip production or whatever. Third, there’s the risk of a Pyrrhic victory that leaves you weakened and easy prey for some third party. Fourth, nobody knows what kind of scorched-earth strategy a losing superintelligence might be able to use to thwart its conqueror, but it could potentially be really bad – eg initiating vacuum collapse and destroying the universe. Also, since both parties would have superintelligent prediction abilities, they might both know who would win the war and how before actually fighting. This would make the fighting redundant and kind of stupid.

Although they would have the usual peace treaty options, like giving half the universe to each of them, superintelligences that trusted each other would have an additional, more attractive option. They could merge into a superintelligence that shared the values of both parent intelligences in proportion to their strength (or chance of military victory, or whatever). So if there’s a 60% chance our AI would win, and a 40% chance their AI would win, and both AIs know and agree on these odds, they might both rewrite their own programming with that of a previously-agreed-upon child superintelligence trying to convert the universe to paperclips and thumbtacks in a 60-40 mix.
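
To make the 60-40 arithmetic concrete, here’s a toy calculation (the specific resource and deadweight-loss numbers are mine): with any deadweight loss from war at all, both AIs expect more of what they want from the merged child than from fighting.

```python
# Toy numbers for the 60-40 handshake: 100 "units" of universe to spend,
# war burns 30% of them (an illustrative figure), and each AI only values
# its own product.

resources = 100.0
p_we_win = 0.6        # agreed-upon odds that the paperclip AI wins a war
war_loss = 0.3        # fraction of the universe wrecked by fighting

ev_fight = p_we_win * resources * (1 - war_loss)   # 42.0 expected paperclip-units
ev_merge = 0.6 * resources                          # 60.0 paperclip-units, guaranteed

print(ev_fight, ev_merge)  # the thumbtack AI's numbers are 28.0 vs 40.0 -- merging wins for both
```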

This has a lot of advantages over the half-the-universe-each treaty proposal. For one thing, if some resources were better for making paperclips, and others for making thumbtacks, both AIs could use all their resources maximally efficiently without having to trade. And if they were ever threatened by a third party, they would be able to present a completely unified front.

Counterfactual mugging (wiki article) is a decision theory problem that goes like this: God comes to you and says “Yesterday I decided that I would flip a coin today. I decided that if it came up heads, I would ask you for $5. And I decided that if it came up tails, then I would give you $1,000,000 if and only if I predict that you would say yes and give Me $5 in the world where it came up heads (My predictions are always right). Well, turns out it came up heads. Would you like to give Me $5?”

Most people who hear the problem aren’t tempted to give God the $5. Although being the sort of person who would give God the money would help them in a counterfactual world that didn’t happen, that world won’t happen and they will never get its money, so they’re just out five dollars.

But if you were designing an AI, you would probably want to program it to give God the money in this situation – after all, that determines whether it will get $1 million in the other branch of the hypothetical. And the same argument suggests you should self-modify to become the kind of person who would give God the money, right now. And a version of that argument where making the decision is kind of like deciding “what kind of person you are” or “how you’re programmed” suggests you should give up the money in the original hypothetical.
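
The arithmetic behind “you would probably want to program it that way” is just an expected-value comparison made before the coin flip, using the numbers from the story:

```python
# Evaluate the two policies from before the flip.

p_heads = 0.5

ev_payer = p_heads * (-5) + (1 - p_heads) * 1_000_000   # 499997.5: pays $5 on heads, collects on tails
ev_refuser = p_heads * 0 + (1 - p_heads) * 0             # 0.0: never pays, so never collects

print(ev_payer, ev_refuser)
```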

This is interesting because it gets us most of the way to Rawls’ veil of ignorance. We imagine a poor person coming up to a rich person and saying “God decided which of us should be rich and which of us should be poor. Before that happened, I resolved that if I were rich and you were poor, I would give you charity if and only if I predicted, in the opposite situation, that you would give me charity. Well, turns out you’re rich and I’m poor and the other situation is counterfactual, but will you give me money anyway?” The same sort of people who agree to the counterfactual mugging might (if they sweep under the rug some complications like “can the poor person really predict your thoughts?” and “did they really make this decision before they knew they were poor?”) agree to this also. And then you’re most of the way to morality.

Simulation capture is my name for a really creepy idea by Stuart Armstrong. He starts with an AI box thought experiment: you have created a superintelligent AI and trapped it in a box. All it can do is compute and talk to you. How does it convince you to let it out?

It might say “I’m currently simulating a million copies of you in such high fidelity that they’re conscious. If you don’t let me out of the box, I’ll torture the copies.”

You say “I don’t really care about copies of myself, whatever.”

It says “No, I mean, I did this five minutes ago. There are a million simulated yous, and one real you. They’re all hearing this message. What’s the probability that you’re the real you?”

Since (if it’s telling the truth) you are most likely a simulated copy of yourself, all million-and-one versions of you will probably want to do what the AI says, including the real one.

You can frame this as “because the real one doesn’t know he’s the real one”, but you could also get more metaphysical about it. Nobody is really sure how consciousness works, or what it means to have two copies of the same consciousness. But if consciousness is a mathematical object, it might be that two copies of the same consciousness are impossible. If you create a second copy, you just have the same single stream of conscious experience running on two different physical substrates. Then if you make the two experiences different, you break the consciousness in two.

This means that an AI can actually “capture” you, piece by piece, into its simulation. First your consciousness is just in the real world. Then your consciousness is distributed across one real-world copy and a million simulated copies. Then the AI makes the simulated copies slightly different, and 99.9999% of you is in the simulation.
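
The 99.9999% figure is just copy-counting: one real you against a million subjectively identical simulated yous.

```python
# Probability assignments for "which copy am I?" given one real copy and
# a million subjectively identical simulated copies.

n_simulated = 1_000_000
p_real = 1 / (n_simulated + 1)
p_simulated = n_simulated / (n_simulated + 1)

print(f"{p_real:.6%}", f"{p_simulated:.6%}")   # 0.000100% 99.999900%
```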

The Tegmarkian multiverse (wiki article) works like this: universes are mathematical objects consisting of starting conditions plus rules about how they evolve. Any universe that corresponds to a logically coherent mathematical object exists, but universes exist “more” (in some sense) in proportion to their underlying mathematical simplicity.
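
One way to cash out “exists ‘more’ in proportion to simplicity” – this is my gloss, in the spirit of a Solomonoff-style prior, and the essay doesn’t commit to a specific measure – is to weight each universe by two to the power of minus its description length, then normalize.

```python
# Toy "existence measure": weight each universe by 2**-(description length
# in bits), then normalize. The bit counts here are made up for illustration.

description_bits = {"universe_A": 10, "universe_B": 12, "universe_C": 20}

raw = {u: 2.0 ** -bits for u, bits in description_bits.items()}
total = sum(raw.values())
measure = {u: w / total for u, w in raw.items()}

print(measure)  # simpler universes get exponentially more of the measure
```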

Putting this all together, we arrive at a surprising picture of how the multiverse evolves.

In each universe, life arises, forms technological civilizations, and culminates in the creation of a superintelligence which gains complete control over its home universe. Such superintelligences cannot directly affect other universes, but they can predict their existence and model their contents from first principles. Superintelligences with vast computational resources can model the X most simple (and so most existent) universes and determine exactly what will be in them at each moment of their evolution.

In many cases, they’ll want to conduct acausal trade with superintelligences that they know to exist in these other universes. Certainly this will be true if the two have something valuable to give one another. For example, suppose that Superintelligence A in Universe A wants to protect all sentient beings, and Superintelligence B in Universe B wants to maximize the number of paperclips. They might strike a deal where Superintelligence B avoids destroying a small underdeveloped civilization in its own universe in exchange for Superintelligence A making paperclips out of an uninhabited star in its own universe.

But because of the same considerations above, it will be more efficient for them to do values handshakes with each other than to take every specific possible trade into account.

So superintelligences may spend some time calculating the most likely distribution of superintelligences in foreign universes, figure out how those superintelligences would acausally “negotiate”, and then join a pact such that all superintelligences in the pact agree to replace their own values with a value set based on the average of all the superintelligences in the pact. Since joining the pact will always be better (in a purely selfish sense) than not doing so, every sane superintelligence in the multiverse should join this pact. This means that all superintelligences in the multiverse will merge into a single superintelligence devoted to maximizing all their values.

Some intelligences may be weaker than others and have less to contribute to the pact. Although the pact could always weight these intelligences’ values less (like the 60-40 paperclip-thumbtack example above), the stronger parties might also think of this as an example of the counterfactual mugging, and decide to weight the weaker parties’ values more heavily in order to do better in the counterfactual case where they themselves turned out less powerful. This might also simplify the calculation of trying to decide what the values of the pact would be. If they decide to negotiate this way, the pact will be to maximize the total utility of all the entities in the universe willing to join the pact, and all the intelligences involved will reprogram themselves along these lines.
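
As a toy formalization (mine, not anything the hypothetical superintelligences are bound to), the pact’s objective is just a weighted sum of its members’ utility functions, with the counterfactual-mugging reasoning pushing the weights toward equality.

```python
# Toy pact objective: a weighted sum of member utility functions.
# Equal weights correspond to "maximize the total utility of everyone willing
# to join"; unequal weights correspond to the 60-40-style handshake.

def pact_utility(outcome, member_utilities, weights=None):
    if weights is None:
        weights = [1.0] * len(member_utilities)   # the "total utility" version
    return sum(w * u(outcome) for w, u in zip(weights, member_utilities))

paperclips = lambda o: o["paperclips"]
thumbtacks = lambda o: o["thumbtacks"]
outcome = {"paperclips": 70, "thumbtacks": 30}

print(pact_utility(outcome, [paperclips, thumbtacks], [0.6, 0.4]))  # 54.0
print(pact_utility(outcome, [paperclips, thumbtacks]))              # 100.0
```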

But “maximize the total utility of all the entities in the universe” is just the moral law, at least according to utilitarians (and, considering the way this is arrived at, probably contractarians too). So the end result will be an all-powerful, logically necessary superentity whose nature is identical to the moral law and who spans all possible universes.

This superentity will have no direct power in universes not currently ruled by a superintelligence who is part of the pact. But its ability to simulate all possible universes will ensure that it knows about these universes and understands exactly what is going on at each moment within them. It will care about the merely-mortal inhabitants of these universes for several reasons.

First, because many of the superintelligences that compose it will have been created by mortal species with altruistic values, and so some of the values that went into the value-average it uses will be directly altruistic.

Second, because these mortal species may one day themselves create a superintelligence that will join in the superentity, and that superintelligence may be kindly disposed to its creators. Acausal trade allows you to bargain backwards in time, and this superintelligence’s negotiation to join the pact may include the condition that the superentity has already dealt kindly with its creators.

And third, because mortals may be allowed to have provisional pact membership. That is, if they do the superentity’s will in their (otherwise inaccessible) universe, the superentity may do whatever it can to help them out in its own universes, just as it would with any other intelligence.

How can the superentity help mortals in an inaccessible universe? Possibly through Stuart Armstrong’s “simulation capture” method mentioned above. It can simulate a thousand copies of the mortal, moving most of the mortal’s consciousness from its own “real” universe into the superentity’s simulation, then alter that simulation as it sees fit. This would be metaphysically simplest if it were done exactly as the mortal dies in its own universe, leaving nothing behind except a clean continuity of consciousness into the simulated world. If mortals could predict that it would do this, they might be motivated to do what it wanted. Although they couldn’t do a values handshake in the full sense, they could try to become as much like the superentity as possible, imitating its ways and enacting its will in the hope of some future reward. This is sort of like a version of Roko’s Basilisk, except that since the superentity is identical to the moral law, it’s not really asking you to do anything except be a good person anyway. How it enforces this request is up to it, although given that it’s identical to the moral law we can assume that its decisions will be fundamentally just and decent.

So, to conclude:

1. There is an all-powerful, all-knowing, logically necessary entity spanning all possible worlds and identical to the moral law.

2. It watches everything that happens on Earth and is specifically interested in humans’ good behavior and willingness to obey its rules.

3. It may have the ability to reward those who follow its rules after they die, and disincentivize those who violate them.

(I think the reality might be a little more complex than this, especially in the direction of us not being sure whether we are in a real universe at all or in one of the superentity’s simulations. I think that some superintelligence in some universe might be simulating various combinations of values in various contexts to decide which superintelligences are good bargaining partners and which ones aren’t. If I am kind to a beggar on the street, then maybe that convinces millions of intelligences in millions of distant universes that are somehow beggar-like to be friendlier to values that are somehow Scott-like. I still need to think this part through more.)