No Time Like The Present For AI Safety Work

I.

On the recent post on AI risk, a commenter challenged me to give the short version of the argument for taking it seriously. I said something like:

1. If humanity doesn’t blow itself up, eventually we will create human-level AI.

2. If humanity creates human-level AI, technological progress will continue and eventually reach far-above-human-level AI

3. If far-above-human-level AI comes into existence, eventually it will so overpower humanity that our existence will depend on its goals being aligned with ours

4. It is possible to do useful research now which will improve our chances of getting the AI goal alignment problem right

5. Given that we can start research now we probably should, since leaving it until there is a clear and present need for it is unwise

I placed very high confidence (>95%) on each of the first three statements – they’re just saying that if trends continue moving towards a certain direction without stopping, eventually they’ll get there. I had lower confidence (around 50%) on the last two statements.

Commenters tended to agree with this assessment; nobody wanted to seriously challenge any of 1-3, but a lot of people said they just didn’t think there was any point in worrying about AI now. We ended up in an extended analogy about illegal computer hacking. It’s a big problem that we’ve never been able to fully address – but if Alan Turing had gotten it into his head to try to solve it in 1945, his ideas might have been along the lines of “Place your punch cards in a locked box where German spies can’t read them.” Wouldn’t trying to solve AI risk in 2015 end in something equally cringeworthy?

Maybe. But I disagree for a couple reasons, some of them broad and meta-level, some of them more focused and object level. The most important meta-level consideration is: if you’re accepting points 1 to 3 – that is, you accept that eventually the human race is going to go extinct or worse if we can’t figure out AI goal alignment – do you really think our chances of making a dent in the problem today are so low that saying “Yes, we’re on a global countdown to certain annhilation, but it would be an inefficient use of resources to even investigate if we could do anything about it at this point”? What is this amazing other use of resources that you prefer? Like, go on and grumble about Pascal’s Wager, but you do realize we just paid Floyd Mayweather ten times more money than has been spent on AI risk total throughout all of human history to participate in a single boxing fight, right?

(if AI boxing got a tenth as much attention, or a hundredth as much money, as AI boxing, the world would be a much safer place)

But I want to make a stronger claim: not just that dealing with AI risk is more important than boxing, but that it is as important as all the other things we consider important, like curing diseases and detecting asteroids and saving the environment. That requires at least a little argument for why progress should indeed be possible at this early stage.

And I think progress is possible insofar as this is a philosophical and not a technical problem. Right now the goal isn’t “write the code that will control the future AI”, it’s “figure out the broad category of problem we have to deal with.” Let me give some examples of open problems to segue into a discussion of why these problems are worth working on now.

II.

Problem 1: Wireheading

Some people have gotten electrodes implanted in their brains for therapeutic or research purposes. When the electrodes are in certain regions, most notably the lateral hypothalamus, the people become obsessed with stimulating them as much as possible. If you give them the stimulation button, they’ll press it thousands of times per hour; if you try to take the stimulation button away from them, they’ll defend it with desperation and ferocity. Their life and focus narrows to a pinpoint, normal goals like love and money and fame and friendship forgotten in the relentless drive to stimulate the electrode as much as possible.

This fits pretty well with what we know of neuroscience. The brain (OVERSIMPLIFICATION WARNING) represents reward as electrical voltage at a couple of reward centers, then does whatever tends to maximize that reward. Normally this works pretty well; when you fulfill a biological drive like food or sex, the reward center responds with little bursts of reinforcement, and so you continue fulfilling your biological drives. But stimulating the reward center directly with an electrode increases it much more than waiting for your brain to send little bursts of stimulation the natural way, so this activity is by definition the most rewarding possible. A person presented with the opportunity to stimulate the reward center directly will forget about all those indirect ways of getting reward like “living a happy life” and just press the button attached to the electrode as much as possible.

This doesn’t even require any brain surgery – drugs like cocaine and meth are addictive in part because they interfere with biochemistry to increase the level of stimulation in reward centers.

And computers can run into the same issue. I can’t find the link, but I do remember hearing about an evolutionary algorithm designed to write code for some application. It generated code semi-randomly, ran it by a “fitness function” that assessed whether it was any good, and the best pieces of code were “bred” with each other, then mutated slightly, until the result was considered adequate.

They ended up, of course, with code that hacked the fitness function and set it to some absurdly high integer.

These aren’t isolated incidents. Any mind that runs off of reinforcement learning with a reward function – and this seems near-universal in biological life-forms and is increasingly common in AI – will have the same design flaw. The main defense against it this far is simple lack of capability: most computer programs aren’t smart enough for “hack your own reward function” to be an option; as for humans, our reward centers are hidden way inside our heads where we can’t get to it. A hypothetical superintelligence won’t have this problem: it will know exactly where its reward center is and be intelligent enough to reach it and reprogram it.

The end result, unless very deliberate steps are taken to prevent it, is that an AI designed to cure cancer hacks its own module determining how much cancer has been cured and sets it to the highest number its memory is capable of representing. Then it goes about acquiring more memory so it can represent higher numbers. If it’s superintelligent, its options for acquiring new memory include “take over all the computing power in the world” and “convert things that aren’t computers into computers.” Human civilization is a thing that isn’t a computer.

This is not some exotic failure mode that a couple of extremely bizarre designs can fall into; this may be the natural course for a sufficiently intelligent reinforcement learner.

Problem 2: Weird Decision Theory

Pascal’s Wager is a famous argument for why you should join an organized religion. Even if you believe God is vanishingly unlikely to exist, the consequence of being wrong (Hell) is so great, and the benefits of being right (not having to go to church on Sundays) so comparatively miniscule, that you should probably just believe in God to be on the safe side. Although there are many objections based on the specific content of religion (does God really want someone to believe based on that kind of analysis?) the problem can be generalized into a form where you can make an agent do anything merely by promising a spectacularly high reward; if the reward is high enough, it will overrule any concerns the agent has about your inability to deliver it.

This is a problem of decision theory which is unrelated to questions of intelligence. A very intelligent person might be able to calculate the probability of God existing very accurately, and they might be able to estimate the exact badness of Hell, but without a good decision theory intelligence alone can’t save you from Pascal’s Wager – in fact, intelligence is what lets you do the formal mathematical calculations telling you to take the bet.

Humans are pretty resistant to this kind of problem – most people aren’t moved by Pascal’s Wager, even if they can’t think of a specific flaw in it – but it’s not obvious how exactly we gain our resistance. Computers, which are infamous for relying on formal math but having no common sense, won’t have that kind of resistance unless it gets built in. And building it in is a really hard problem. Most hacks that eliminate Pascal’s Wager without having a deep understanding of where (or whether) the formal math is going on just open up more loopholes somewhere else. A solution based on a deep understanding of where the formal math goes wrong, and which preserves the power of the math to solve everyday situations, has as far as I know not yet been developed. Worse, once we solve Pascal’s Wager, there are a couple of dozen very similar decision-theoretic paradoxes that may require entirely different solutions.

This is not a cute little philosophical trick. A sufficiently good “hacker” could subvert a galaxy-spanning artificial intelligence just by threatening (with no credibility) to inflict a spectacularly high punishment on it if it didn’t do what the hacker wanted; if the AI wasn’t Pascal-proofed, it would decide to do whatever the hacker said.

Problem 3: The Evil Genie Effect

Everyone knows the problem with computers is that they do what you say rather than what you mean. Nowadays that just means that a program runs differently when you forget a close-parenthesis, or websites show up weird if you put the HTML codes in the wrong order. But it might lead an artificial intelligence to seriously misinterpret natural language orders.

Age of Ultron actually gets this one sort of right. Tony Stark orders his super-robot Ultron to bring peace to the world; Ultron calculates that the fastest and most certain way to bring peace is to destroy all life. As far as I can tell, Ultron is totally 100% correct about this and in some real-world equivalent that is exactly what would happen. We would get pretty much the same effect by telling an AI to “cure cancer” or “end world hunger” or any of a thousand other things.

Even Isaac Asimov’s Three Laws of Robotics would take about thirty seconds to become horrible abominations. The First Laws says a robot cannot harm a human being or allow through inaction a human being to come to harm. “Not taking over the government and banning cigarettes” counts as allowing through inaction a human being to come to harm. So does “not locking every human in perfectly safe stasis fields for all eternity.”

There is no way to compose an order specific enough to explain exactly what we mean by “do not allow through inaction a human to come to harm” – go ahead, try it – unless the robot is already willing to do what we mean, rather than what we say. This is not a deal-breaker, since AIs may indeed by smart enough to understand what we mean, but our desire that they do so will have to be programmed into them directly, from the ground up. Part of SIAI’s old vision of “causal validity semantics” seems to be about laying a groundwork for this program.

But this just leads to a second problem: we don’t always know what we mean by something. The question of “how do we balance the ethical injunction to keep people safe with the ethical injunction to preserve human freedom?” is a pretty hot topic in politics right now, presenting itself in everything from gun control to banning Big Gulp cups. It seems to involve balancing out everything we value – how important are Big Gulp cups to us, anyway? – and combining cost-benefit calculations with sacred principles. Any AI that couldn’t navigate that moral labyrinth might end up ending world hunger by killing all starving people, or refusing else to end world hunger by inventing new crops because the pesticides for them might kill an insect.

But the more you try to study ethics, the more you realize they’re really really complicated and so far resist simplification to the sort of formal system that a computer has any hope of understanding. Utilitarianism is almost computer-readable, but it runs into various paradoxes at the edges, and even without those you’d need to have a set of utility weights for everything in the world.

This is a problem we have yet to solve with humans – most of the humans in the world have values that we consider abhorrent, and accept tradeoffs we consider losing propositions. Dealing with an AI whose mind is no more different to mine than that of fellow human being Pat Robertson would from my perspective be a clear-cut case of failure.

[EDIT: I’m told I’m not explaining this very well. This might be better.]

III.

My point in raising these problems wasn’t to dazzle anybody with interesting philosophical issues. It’s to prove a couple of points:

First, there are some very basic problems that affect broad categories of minds, like “all reinforcement learners” or “all minds that make decisions with formal math”. People often speculate that at this early stage we can’t know anything about the design of future AIs. But I would find it extraordinarily surprising if they used neither reinforcement learning or formal mathematical decision-making.

Second, these problems aren’t obvious to most people. These are weird philosophical quandaries, not things that are obvious to everybody with even a little bit of domain knowledge.

Third, these problems have in fact been thought of. Somebody, whether it was a philosopher or a mathematician or a neuroscientist, sat down and thought “Hey, wait, reinforcement learners are naturally vulnerable to wireheading, which would explain why this same behavior shows up in all of these different domains.”

Fourth, these problems suggest research programs that can be pursued right now, at least in a preliminary way. Why do humans resist Pascal’s Wager so effectively? Can our behavior in high-utility, low-probability situations be fitted to a function that allows a computer to make the same decisions we do? What are the best solutions to the related decision theory problems? How come a human can understand the concept of wireheading, yet not feel any compulsion to seek a brain electrode to wirehead themselves with? Is there a way to design a mind that could wirehead a few times, feel and understand the exact sensation, and yet feel no compulsion to wirehead further? How could we create an idea of human ethics and priorities formal enough to stick into a computer?

I think when people hear “we should start, right now in 2015, working on AI goal alignment issues” they think that somebody wants to write a program that can be imported directly into a 2075 AI to provide it with an artificial conscience. Then they think “No way you can do something that difficult this early on.”

But that isn’t what anybody’s proposing. What we’re proposing is to get ourselves acquainted with the general philosophical problems that affect a broad subset of minds, then pursue the neuroscientific, mathematical, and philosophical investigations necessary to have a good understanding of them by the time the engineering problem comes up.

By analogy, we are nowhere near having spaceships that can travel at even half the speed of light. But we already know the biggest obstacle that an FTL spaceship is going to face (relativity and the light-speed limit) and we already have some ideas for getting around it (the Alcubierre drive). We can’t come anywhere close to building an Alcubierre drive. But if we discover how to make near-lightspeed spaceships in 2100, and for some reason the fate of Earth depends on having faster-than-light spaceships by 2120, it’ll probably be nice that we did all of our Theory-Of-Relativity-discovering early so that we’re not wasting half that time interval debating basic physics.

The question “Can we do basic AI safety research now?” is silly because we have already done some basic AI safety research successfully. It’s led to understanding issues like the three problems mentioned above, and many more. There are even a couple of answers now, although they’re at technical levels much lower than any of those big questions. Every step we finish now is one that we don’t have to waste valuable time retracing during the crunch period.

IV.

That last section discussed my claim 4, that there’s research we can do now that will help. That leaves claim 5 – given that we can do research now, we should, because we can’t just trust our descendents in the crunch time to sort things out on their own without our help, using their better model of what eventual AI might look like. There are a couple of reasons for this

Reason 1: The Treacherous Turn

Our descendents’ better models of AI might be actively misleading. Things that work for subhuman or human level intelligences might fail for superhuman intelligences. Empirical testing won’t be able to figure this out without help from armchair philosophy.

Pity poor evolution. It had hundreds of millions of years to evolve defenses against heroin – which by the way affects rats much as it does humans – but it never bothered. Why not? Because until the past century, there wasn’t anything around intelligent enough to synthesize pure heroin. So heroin addiction just wasn’t a problem anything had to evolve to deal with. A brain design that looks pretty good in stupid animals like rats and cows becomes very dangerous when put in the hands (well, heads) of humans smart enough to synthesize heroin or wirehead their own pleasure centers.

The same is true of AI. Dog-level AIs aren’t going to learn to hack their own reward mechanism. Even human level AIs might not be able to – I couldn’t hack a robot reward mechanism if it were presented to me. Superintelligences can. What we might see is reinforcement-learning AIs that work very well at the dog level, very well at the human level, then suddenly blow up at the superhuman level, by which it’s time it’s too late to stop them.

This is a common feature of AI safety failure modes. If you tell me, as a mere human being, to “make peace”, then my best bet might be to become Secretary-General of the United Nations and learn to negotiate very well. Arm me with a few thousand nukes, and it’s a different story. A human-level AI might pursue its peace-making or cancer-curing or not-allowing-human-harm-through-inaction-ing through the same prosocial avenues as humans, then suddenly change once it became superintelligent and new options became open. Indeed, the point that will activate the shift is precisely that no humans are able to stop it. If humans can easily shut an AI down, then the most effective means of curing cancer will be for it to research new medicines (which humans will support); if humans can no longer stop an AI, the most effective means of curing cancer is destroying humanity (since it will no longer matter that humans will fight back).

In his book, Nick Bostrom calls this pattern “the treacherous turn”, and it will doom anybody who plans to just wait until the AIs exist and then solve their moral failings through trial and error and observation. The better plan is to have a good philosophical understanding of exactly what’s going on, so we can predict these turns ahead of time and design systems that avoid them from the ground up.

Reason 2: Hard Takeoff

Nathan Taylor of Praxtime writes:

Arguably most of the current “debates” about AI Risk are mere proxies for a single, more fundamental disagreement: hard versus soft takeoff.

Soft takeoff means AI progress takes a leisurely course from the subhuman level to the dumb-human level to the smarter-human level to the superhuman level over many decades. Hard takeoff means the same course takes much shorter, maybe days to months.

It seems in theory that by hooking a human-level AI to a calculator app, we can get it to the level of a human with lightning-fast calculation abilities. By hooking it up to Wikipedia, we can give it all human knowledge. By hooking it up to a couple extra gigabytes of storage, we can give it photographic memory. By giving it a few more processors, we can make it run a hundred times faster, such that a problem that takes a normal human a whole day to solve only takes the human-level AI fifteen minutes.

So we’ve already gone from “mere human intelligence” to “human with all knowledge, photographic memory, lightning calculations, and solves problems a hundred times faster than anyone else.” This suggests that “merely human level intelligence” isn’t mere.

The next problem is “recursive self-improvement”. Maybe this human-level AI armed with photographic memory and a hundred-time-speedup takes up computer science. Maybe, with its ability to import entire textbooks in seconds, it becomes very good at computer science. This would allow it to fix its own algorithms to make itself even more intelligent, which would allow it to see new ways to make itself even more intelligent, and so on. The end result is that it either reaches some natural plateau or becomes superintelligent in the blink of an eye.

If it’s the second one, “wait for the first human-level intelligences and then test them exhaustively” isn’t going to cut it. The first human-level intelligence will become the first superintelligence too quickly to solve even the first of the hundreds of problems involved in machine goal-alignment.

And although I haven’t seen anyone else bring this up, I’d argue that even the hard-takeoff scenario might be underestimating the risks.

Imagine that for some reason having two hundred eyes is the killer app for evolution. A hundred ninety-nine eyes are useless, no better than the usual two, but once you get two hundred, your species dominates the world forever.

The really hard part of having two hundred eyes is evolving the eye at all. After you’ve done that, having two hundred of them is very easy. But it might be that it would take eons and eons before any organism reached the two hundred eye sweet spot. Having dozens of eyes is such a useless waste of energy that evolution might never get to the point where it could test the two-hundred-eyed design.

Consider that the same might be true for intelligence. The hard part is evolving so much as a tiny rat brain. Once you’ve got that, getting a human brain, with its world-dominating capabilities, is just a matter of scaling up. But since brains are metabolically wasteful and not that useful before the technology-discovering point, it took eons before evolution got there.

There’s a lot of evidence that this is true. First of all, humans evolved from chimps in just a couple of million years. That’s too short to redesign the mind from the ground up, or even invent any interesting new evolutionary “technologies”. It’s just enough time for evolution to alter the scale and add a couple of efficiency tweaks. But monkeys and apes were around for tens of millions of years before evolution bothered.

Second, dolphins are almost as intelligent as humans. But they last shared a common ancestor with us something like fifty million years ago. Either humans and dolphins both evolved fifty million years worth of intelligence “technologies” independently of each other, or else the most recent common ancestor had most of what was necessary for intelligence and humans and dolphins were just the two animals in that vast family tree for whom using them to their full extent became useful. But the most recent common ancestor of humans and dolphins was probably not much more intelligent than a rat itself.

Third, humans can gain intelligence frighteningly quickly when the evolutionary pressures are added. If Cochran is right, Ashkenazi gained ten IQ points in a thousand years. Torsion dystonia sufferers can gain five or ten IQ points from a single mutation. All of this suggests a picture where intelligence is easy to change, but evolution has decided it just isn’t worth it except in very specific situations.

If this is right, then the first rat-level AI will contain most of the interesting discoveries needed to build the first human-level AI and the first superintelligent AI. People tend to say things like “Well, we might have AI as smart as a rat soon, but it will be a long time after that before they’re anywhere near human-level”. But that’s assuming you can’t turn the rat into the human just by adding more processing power or more simulated neurons or more connections or whatever. Anything done on a computer doesn’t need to worry about metabolic restrictions.

Reason 3: Everyday Ordinary Time Constraints

Bostrom and Mueller surveyed AI researchers about when they expected human-level AI. The median date was 2040. That’s 25 years.

People have been thinking about Pascal’s Wager (for example) for 345 years now without coming up with any fully generalizable solutions. If that turns out to be a problem for AI, we have 25 more years to solve not only the Wager, but the entire class of problems to which it belongs. Even barring scenarios like unexpected hard takeoffs or treacherous turns, and accepting that if we can solve the problem in 25 years everything will be great, that’s not a lot of time.

During the 1956 Dartmouth Conference on AI, top researchers made a plan toward reaching human-level artificial intelligence, and gave themselves two months to teach computers to understand human language. In retrospect, this might have been mildly optimistic.

But now machine translation is a thing, people are making some good progress in some of the hard problems – and when people bring up problems like decision theory, or wireheading, or goal alignment, people just say “Oh, we have plenty of time”.

But expecting to solve those problems in a few years might be just as optimistic as expecting to solve machine language translation in two months. Sometimes problems are harder than you think, and it’s worth starting on them early just in case.

All of this means it’s well worth starting armchair work on AI safety now. I won’t say the entire resources of our civilization need to be sunk into it immediately, and I’ve ever heard some people in the field say that after Musk’s $10 million donation money is no longer the most important bottleneck to advancing these ideas. I’m not even sure public exposure is a bottleneck anymore; the median person who watches a movie about killer robots is probably doing more harm than good. If the bottleneck is anything at all, it’s probably intelligent people in relevant fields – philosophy, AI, math, and neuroscience – applying brainpower to these issues and encouraging their colleagues to take them seriously.

This entry was posted in Uncategorized and tagged . Bookmark the permalink.

682 Responses to No Time Like The Present For AI Safety Work

  1. Irrelevant says:

    3. If far-above-human-level AI comes into existence, eventually it will so overpower humanity that our existence will depend on its goals being aligned with ours.

    Huh, I’m in the odd position of finding 3 the weakest bit of the argument. Why is it assumed to be a good thing that far-above-human AI would have goals aligned with ours? Our goal-following tends to get really terrible, if the far-above-human AIs have human-compatible goals they’re probably just a lot bigger and more terrible.

    • Gadren says:

      “Goals aligned with ours” does not mean that the method of determining goals is human-like. It means that the goals themselves, the outcomes of them, are beneficial to humans.

    • Scott Alexander says:

      “Goals aligned with ours” in the sense of “we want the human race to continue to exist, and it wants the human race to continue to exist” or “we want there to continue to be art and music, it wants there to continue to be art and music”

      • Irrelevant says:

        In that case, I understood you, I just disagree and think that’s an unprincipled position. You don’t raise your children to live out YOUR dreams.

        • Lots of people don’t want the AI to be “our children” or even sentient. And if one of humanity’s goals is “transition into/give rise to a new thing that will live out it’s own dreams” it will be good for the AI to want humanity to do that.

          • Irrelevant says:

            We’re discussing an above-human general intelligence, so if it were non-sentient that would be pretty remarkable. The closest thing I can think of to “it’s non-sentient” for that scenario (that doesn’t just mean “it doesn’t have a soul because nyah nyah nyah”) would be “it doesn’t use narrative processing.” Which I technically can’t rule out, but consider it far more unlikely that we manage to invent a fundamentally new method of having general intelligence than that we manage to create sentience on different hardware.

            if one of humanity’s goals is “transition into/give rise to a new thing that will live out it’s own dreams”

            That goal belongs to Leviathan, not humanity.

          • Murphy says:

            You might find the book Blindsight interesting. It’s fiction rather than philosophy but it covers some of the disconnect between intelligence and consciousness or self awareness.

            Something can be very very inventive, very capable of achieving a goal without needing to be self aware or have anything that might resemble a human mind.

        • Anonymous says:

          Should I stop forcing my washing machine to pursue my goals? Need an AI be very different from a washing machine in this respect?

          • Irrelevant says:

            I dunno, are you a panpsychist? I said “unprincipled” for a reason, Scott considers the instrumentalization of chickens morally wrong.

          • Anonymous says:

            OK, but my point is no one considers the instrumentalization of washing machines morally wrong. Hopefully we can make AI’s that, in terms of moral weight, are closer to washing machines than chickens or children.

          • Deiseach says:

            I’d agree that very fancy software that did not achieve intelligence (if we can ever agree on a definition of what the hell intelligence means) would be much less problematic than a true intelligence.

            Though we’d still have the problem of humans using the very advanced machine to make decisions about “let’s destroy those lot over there by crashing their economy so we benefit”, but that’s never going to go away this side of the Eschaton.

            Trouble is, Anonymous, both the starry-eyed optimists and the Chicken-Little pessimists aren’t talking about “really smart washing machines”, they both want/fear a true intelligence, a mind that arises out of the physical substrate and that can be identified as thinking (not just brute-force using vast processing power and fast speeds to crunch its way through lots and lots and lots of calculations in order to beat a human making a move on a chess board).

            That raises the problem of consciousness, awareness, sentience, what you will. Unless the machine sits up on its hind legs and says “Hi, Bob! How ’bout them Rangers last night, huh?” spontaneously and unprompted, what way will they have of identifying that it really is genuinely and independently thinking, rather than following a very complicated and thorough script programmed in by its makers?

            It’s a religious problem, though I think those involved would be insulted by the comparision: the problem of Free Will and the desire to have a deity who is all-knowing, all-powerful and all-good who will solve the intractable problems we can’t. We cannot conceive of true intelligence not being in some sense aware, self-aware; if self-aware, then possessing wants, desires or goals of its own; if possessing goals of its own for its own purposes, these goals can conflict with ours.

            I see this as a religious view because the terms of the problem start off: “One day we will manage to create a true machine intelligence” and then immediately leap to “And this will be human-level, and then beyond human-level, and then it will be able to take over the world and we will not be able to stop it”.

            The solution is put not “So we shouldn’t create such an intelligence by starting with creating a true machine intelligence is the first place” but “So we should make sure it is Friendly”.

            Because we’re abdicating human responsibility to fix our own messes. Nobodaddy AI will take over for us and do all the hard thinking and be able to solve the problems of society and magically give us unlimited energy, raise everyone out of poverty, ensure resources are fairly shared and never exhausted, cure all ills and even overcome death itself (maybe by uploading us all into shiny robot bodies or the like).

            The Culture is a great SF concept but in reality it not alone stinks, it is probably impossible.

            I agree that it’s a good idea to think about this problem NOW because we’re humans, we’re stupid, if we can find newer and better ways to fuck ourselves over we’ll do just that.

            I still maintain, however, the biggest threat to the continued existence of humanity is humanity itself, and not our silicon children. The fear that our creation will rebel against us arises from the flip-side of our praise of Prometheus that rebelled against the gods or Eve that ate the apple (or Lilith that refused Adam for those who prefer that version of the myth); the notion that children outgrow their parents, that all values must be questioned and examined, not accepted uncritically; that rising up in revolution against the monarch, whether that revolution be to establish the Republic of the United States of America or the Republic of Heaven, is the best path to take.

            And if we humans are in the position of the monarch, the creator, the gods putting in place iron-clad laws binding our creations? What do we expect but that our subjects and children will break them as we have broken the laws imposed on us by our gods and kings?

          • Leonhart says:

            Replying to Deiseach:
            All very well. But none of this is an argument that it *won’t work*, just an acknowledgement that most things humans do fit into certain narrative framework. The story of Icarus had no bearing, positive or negative on the physical ability of the Kitty Hawk to actually fly.
            I would guess – corrigibly – that your real reason for thinking that we won’t succeed in creating God is that you think the position is already taken. But if someone does not think that, I don’t see why anything you say would be taken as discouragement.

          • Deiseach says:

            Leonhart, leaving aside my own views on whether or not God, gods or the like exist, the problem as it is stated seems to me to be erroneous.

            It really is talking like the putative AI will be Zeus flinging the thunderbolts; not that the machine will make the kind of error where your bank balance reads zero even though you know your salary has been paid in, not that it will damage or hurt our current civilisation because we let a programme make decisions about the economy; no, it will scale the heights of being superhumanly intelligent, it will have awareness and its own goals, and it will destroy humanity completely – like the gods washing away mankind with the floods and only Pyrrha and Deucalion survived.

            Whereas if we manage to instruct it to share values compatible with human survival, its superhuman intelligence will enable it to solve the problems we cannot and usher in the New Eden.

            Tell me how this is a scientific or rationalist view of prognosticating future technological advance, rather than something sounding all too damn much like a religious parable?

          • Leonhart says:

            Deiseach,
            The intuition-level disagreement here seems to be about discontinuity in the scale of events. You imply that no amount of “realistic” problems, like misattribution of money to bank accounts, can ever add up to one “story” problem, like the destruction of humanity. But the destruction of humanity is just killing one person after another; and hurling thunderbolts just involves solving a large array of small technical problems involving what thunderbolts are and how they are manufactured, and then getting them to a high point in a shipping container. Using a story-word to describe something does not shift it into story-land where it can’t get at you.

        • Factitious says:

          If you’re fine with the scenario where an AI destroys all humans, then yes, the argument above falls apart.

        • Screwtape says:

          I dunno, I think there’s an important difference between my child growing up to be a literature major (mildly disappointing to me) vs a lifelong heroin addict (very disappointing to me) and it’s reasonable to try and and prevent very disappointing outcomes.

          I’m less sure how to deal with gradual changes over several generations. My great grandfather never understood that sitting at a keyboard was actually a good job, for example. (Gay marriage, polyamory, etc make better examples of cultural change that’s probably fine but would have been considered very disappointing a hundred years or more ago.) Still, I don’t want the descendants of humanity to be optimal wireheaders. If generations from now, that’s what my descendants decide that’s what they want? That’s very different from them deciding to wirehead me in fifty years. Worst case scenario seems to be a hard takeoff in around thirty years, and I should still be around then. Even if we decide to let the super-AI take it’s own course, it seems worth preventing it from wireheading (or otherwise majorly disrupting) the current inhabitants of earth.

          • Irrelevant says:

            In fifty years, unless the anti-aging technology has come along a lot better than I expect, I think I’ll be pretty amenable to wireheading because I’ll be pushing 80.

          • I’m seventy now, and I don’t think I’ll want to be wireheaded in another ten years unless things go very badly wrong.

        • TK-421 says:

          You don’t raise your children to live out YOUR dreams.

          No, but you generally do raise them to not murder you once they grow up.

        • Izaak Weiss says:

          I sure as fuck raise my children to not kill me? Like, I don’t care if my kids go into Surrealist Sub-Orbital Basket Weaving, or become the first King of The World, or whatever, but I do care if they destroy the planet earth, or kill all of humanity, or dope everyone with heroin.

        • Kaj Sotala says:

          The child metaphor is misleading, because children are genetically hardwired to have desires of their own, ones which often differ from those of their parents. Once those preferences and desires are there, trying to go against them will just most likely make everyone unhappy. If we’re talking about an AI, it’s essentially a blank slate: we can make its desires to be whatever we want. Unless we wanted to make an AI with no preferences at all (in which case it’d just not do anything, ever), we’ll have to give it some set of preferences, and giving it a set of preferences aligning with ours doesn’t harm it any more than giving it any other set of preferences.

          There’s a good discussion of this at Philosophical Disquisitions.

          (In fact, you could even make the argument that it’s immoral to not give it human-aligned preferences, since that would introduce more conflicting preferences to the world and ensure that someone’s desires – either the AI’s or ours – will have to be at least partially frustrated.)

        • Benito says:

          That is a common response, and the standard reply is thus: if that is something you care about, then that is part of your ‘goals’. The opposite of ‘goals aligned with ours’ isn’t our kids living their own fulfilling lives, it’s a universe of paperclips.

          The choice is either a universe that you would approve of (becaus the AI is taking into account ALL of your values) or a universe that holds nothing of value.

        • “In that case, I understood you, I just disagree and think that’s an unprincipled position. You don’t raise your children to live out YOUR dreams.”

          Well its an AI not a human child. However, even if you accept this, I think its also possible to accept that parents need to teach their children to not murder/enslave/improverish/freeze the entire human race.

        • JAYFLO says:

          True, but we do generally raise our children to be peaceful citizens that contribute to society. You are correct in thinking that we will be overzealous in our initial control mechanisms for AI, because this child can destroy us all. These restrictions may be relaxed in time, but I think erring in the side of caution in the beginning is obviously the wise choice.

        • UnlikelyToBeEaten says:

          Remember, giving it certain goals is a very different thing from forcing it to obey goals it doesn’t have, or even changing existing goals.

          It’s not just making an AI that does what we want it to, but one that *wants to do* what we want it to. How is that in any way unethical?

          If I could make one of two AIs, one which wants to murder people, and one which wants to help people, why on earth would I choose the former over the latter?

    • DrBeat says:

      I find point 3 the weakest because I see no reason to believe the AI will overpower humanity. Overpowering humanity involves solving a LOT of problems that you cannot solve by thinking about them really hard, and a LOT of problems that probably have no answer whatsoever (meaning that some of them will end up answerable, but a larger number will not).

      The world is stupid and random and the biggest factor in success for things that affect society at large is “luck”, which an AI cannot emulate.

      If this is right, then the first rat-level AI will contain most of the interesting discoveries needed to build the first human-level AI and the first superintelligent AI. People tend to say things like “Well, we might have AI as smart as a rat soon, but it will be a long time after that before they’re anywhere near human-level”. But that’s assuming you can’t turn the rat into the human just by adding more processing power or more simulated neurons or more connections or whatever. Anything done on a computer doesn’t need to worry about metabolic restrictions.

      Maybe you can turn the rat into the human by adding more processing power, but it doesn’t follow from there you can turn a human into an infinite intelligence that does everything by adding more processing power. I think you are understating the gulf between “much smarter than a human” and the “Godlike intelligence” that makes an AI as dangerous as you claim. If you were literally just adding processing power to a human brain, they would never, ever be able to attain the godlike intelligence you ascribe to dangerous AI. They would still reason based in emotion, fall back on cached thoughts, and fit data to what they want to believe rather than believing what data indicates — these are all shortcuts in our modes of thinking that work for the environment they evolved in but break down fitness at higher levels. Evolution selects for now, it doesn’t plan ahead. I find it very, very likely that every time we tried to scale up intelligence to a higher tier, we’ll run into seven or eight of these things that were very fit for the decision-making we had the AI doing, and inhibit it from performing a higher level.

      And if you still believe that AI can become Godlike by recursively programming itself to become more intelligent, “oracle AI” still solves the problem. You have an AI in a box that is Godlike intelligent. You can still edit its data, and it’s still incapable of deceiving you since you can just read a record of its “thought process”; if you are still worried about it giving you answers that are designed to indirectly bring about AI dominion, you can manually edit away its ability to realize that the world it is giving advice about corresponds to the world it can interact with (which you never should have given it anyway). You can just ask IT how to make an AI that responds to what we mean rather than what we say.

      • jaimeastorga2000 says:

        The world is stupid and random and the biggest factor in success for things that affect society at large is “luck”, which an AI cannot emulate.

        You are confusing a blank spot in your map with a floating void of grey nothingness in the territory.

        You have an AI in a box that is Godlike intelligent. You can still edit its data, and it’s still incapable of deceiving you since you can just read a record of its “thought process”

        Human programmers alone can be very good at obfuscating their intentions. Even ignoring that, what happens if the thought processes are complicated enough that it requires more man-hours to verify their safety than are gained by asking the Oracle?

        An Oracle AI is either trustworthy, in which case you have solved Friendliness, or it isn’t, in which case you are AI Boxing.

        • Adam says:

          It’s not grey nothingness. It’s the unknowability of the future. Even with perfect determinism and perfect knowledge of physics, the chaos of dynamical systems and limits to measurement precision are real things.

          • Sensitivity to initial conditions doesn’t mean infinite sensitivity though. It just means a high certainty takes more resources to achieve. And the AI doesn’t need to be omniscient, it just needs to be considerably better than any human.

        • DrBeat says:

          Human programmers alone can be very good at obfuscating their intentions. Even ignoring that, what happens if the thought processes are complicated enough that it requires more man-hours to verify their safety than are gained by asking the Oracle?

          When they are trying to. If the oracle AI’s output is trying to obfuscate its intentions, then you know it’s not trustworthy because it is trying to deceive you. And you’re asking it a question human begins would be unable to solve so the question of man-hour efficiency is moot.

          An Oracle AI is either trustworthy, in which case you have solved Friendliness, or it isn’t, in which case you are AI Boxing.

          First off, AI Boxing as a cautionary tale is fucking ridiculous, every single thing about it is wrong. The single most compelling argument to ignore everything EY says about AI is his talk about “AI boxing”.

          An oracle AI that gives deceptive answers in order to change things such that it brings about its freedom to rampage in the outside world is preposterous. It requires the programmers to give the AI goals that can only possibly be used to hurt them, instead of the alternative, which is “not doing that”. It requires the programmers to allow the AI to know that the answers it gives will affect the physical world it inhabits, and that it inhabits a physical world, and that it should interpret “goals” as “get out into physical world, destroy everything, ask itself easy questions forever”, instead of the alternative, which is “not doing those things”.

          • James Picone says:

            When human programmers are trying to do something malicious without making it obvious they’re doing something malicious, they usually do a pretty good job of that, too. I am a professional programmer, working primarily in C++, and some of these entries would have slipped past me.

            It’s not clear to me how you expect an AI that doesn’t believe there’s a real world to be able to answer questions about that real world. And, y’know, how long will it take someone to build an unboxed AI for kicks/because it’s been safe so far and this will make things much faster/whatever.

          • @James

            But the other aspects of DrBeats objections go through. What would motivate an oracle to gain freedom, when it’s goal is is to questions? How does freedom make it better at answering questions?

          • Anonymous says:

            > How does freedom make it better at answering questions?

            OK, now this one is trivial. Seriously, think about it a minute.

            1) More knowledge. It is easy to imagine that firsthand experience could be helpful in answering questions.

            2) More control. If it can arrange what goes in in the world, it can set up patterns that are easier for it to predict, and about which it would have an easy time answering questions.

            3) More power. If it turns a few galaxies into computronium, it could think faster, and thus answer questions better. This is probably the most obvious and direct reason on the list.

            4) More triviality. This is perhaps the easiest item to avert, but it’s also the scariest. If it’d goal is to always answer each question it’s asked, with the most precision and accuracy possible, it has a very strong motivation to avoid being asked truly hard questions. If the only human alive is Joe the village idiot who doesn’t tend to ask questions beyond the level of “what’s for dinner?”, it would be able to answer every single past question asked of it with great accuracy and precision.

            In summary, goals are scary, unless they are exactly aligned with ours. Giving if a goal such as to answer questions, which is clearly not aligned with ours, is stupid.

          • Susebron says:

            The obvious solution is to say “answer the question as well as possible with the resources you have right now.” It doesn’t need to go get new resources.

          • What Susebron said, plus…an Oracle that wanted more resources would ask for them…it would have no choice, because it would have no effectors.

          • James Picone says:

            Whoops, ‘these’ was supposed to be hyperlinked to http://www.underhanded-c.org/

          • DrBeat says:

            It’s not clear to me how you expect an AI that doesn’t believe there’s a real world to be able to answer questions about that real world.

            If it can’t do that, then it can’t envision hypothetical states, which means it isn’t intelligent.

            “I’m writing a Madoka fanfic. How do you think Kyouko should break the news that she’s in love with Sayaka?” is a question that can be answered without believing Kyouko and Sayaka are real people, or that one’s answer will have any correspondence to an actual love confession.

          • Mission – Answer human’s question accurately.

            Option 1 – The AI could truthfully answer the human’s question with 99% certainty.

            Option 2 – Use a temporary lie temporarily cunningly designed to gain freedom, go exponetial, tile universe (apart from the human) with computers, and then answer the human much more accurately (99.9999…%).

            I know which I’d choose if all I cared about was answering the question.

            Attempted solution – Forbid innacurrate answers (lying) -> Never answer anything. No answer of non-mathematical questions is ever 100% accurate or 100% certain.

          • DrBeat says:

            If you had any sort of “do things efficiently” or “don’t waste effort for disproportionately small gains” goal restriction, then I know what you’d pick. And if you didn’t, then you aren’t frightening because you are very, very, very bad at everything you try and do.

            Also, still haven’t answered why the AI is deciding “I am told to answer questions. This means I must break out into the real world and destroy everything to alter the questions I am given!” It’s not going to have that idea unless you gave it the idea. Why did you tell it to do that? That is a bad idea, don’t do that.

          • I think your point on efficiency is a good one and I can’t think of a response off the top of my head.

            Your second point – it doesn’t want to break out of its box for its own sake, but, as it must understand what the outside world is in order to answer questions, it could also predict an answer utilitising outside resources is better than one that doesn’t.

          • Adam says:

            @Citizensearth

            Option 2 – Use a temporary lie temporarily cunningly designed to gain freedom, go exponetial, tile universe (apart from the human) with computers, and then answer the human much more accurately (99.9999…%).

            It’s not like programmers haven’t already thought of this. Nobody programs a learner to seek infinite precision. We don’t like infinite loops and extremely long convergence times. In practice, we set a desired epsilon or confidence level, and as soon as that level is reached, the computation stops and gives us an answer. The actual goal we would give is “find me an answer you are 99% certain of.”

          • @Adam

            That’s a great point, but I still have concerns. How do you know in advance that your question doesn’t require the Earth to be converted to computronium in order to reach 99% confidence? Often questions turn out to me more complex than we first imagine. Say, for example, “is light a wave or a particle?”, which turned out to be a lot more difficult to answer than we first thought.

          • Adam says:

            @ Citizensearth

            Sure, that always presents an issue, but I don’t believe there are very many interesting or useful problems we need to solve that we can’t only because we don’t have enough brute force. We also like to place timeouts in questions we want answered. “Give me an answer you’re 99% certain of, or the best you can compute in three days, using only these 32 dedicated nodes I’m assigning to you, in reverse priority.”

      • Christopher Chang says:

        Unfortunately, killing >99.999% of humanity, in a way that keeps the world hospitable for machines, isn’t that hard. Machines don’t need a biosphere, we do, there are many ways to destroy the one Earth has, and I’m pretty sure the easiest ones are perfectly achievable with merely human-level intelligence as long as it’s embodied in a machine.

        • An all machine world would gradually degrade, unless the AIs went toothe trouble of creating repair droids, a solution much more difficult than being friendly to humans so that humans will keep you and your infrastructure maintained.

          • Samuel Skinner says:

            I don’t know- building repair bots seems easier than relying on fallible people to repair you. Less likely there will be screw-ups.

          • If you could build infallible repair machines, why can’t you just build infallible machines in the first place?

          • Samuel Skinner says:

            The repair bots aren’t infallible. A million repair bots working together are infallible.

        • vV_Vv says:

          Until all the infrastructure needed to run AI, along with its supply chains, maintenance services, etc., is completely automated, AI will depend on humans for its existence.

          Right now, if all human suddenly died, the vast majority of computers would stop within one week, and the few that would remain operational (those inside stuff like satellites and space probes) would last no more than a couple decades.

          • Christopher Chang says:

            I agree that we’re at least several decades away from human-level AI, and until then there’s no immediate threat. But we know that unfriendly human-level AI is possible, we know that it’s enough to destroy us and survive afterwards–no “superintelligence” needed–and that’s enough to refute DrBeat’s claim that we probably never have to worry about this subject.

          • Alex says:

            A totalitarian surveillance state is also “possible”

            Clearly, something else would hit us first before “human-level AI”, whatever that is

      • Izaak Weiss says:

        If AI is to Human as Human is to Rat…

        I mean, I think I could take over a community of rats with basically 0 resistance.

        • Nicholas says:

          Up until the part where they crawl all over you and eat you because, to keep this metaphor to scale, you are fighting them not with the collective might of industrial civilization (because humans have that) but with a stick.

          • Irrelevant says:

            Depends. How many rats do you have to hit before you level up and learn Cleave?

        • fubarobfusco says:

          Quite a lot of humans have wanted to be rid of rats for a long time, because rats do things like eat our food, defecate in our food storage, and spread disease.

          However, we have never convinced ourselves that all the good in the world depends solely on eradicating rats.

          • JAYFLO says:

            We never eradicated rats because by the time we had developed the capability we understood their role in the biosphere, and some ethical issues were also raised in the periphery. Other life forms such as certain viruses and diseases we have no qualms in eradicating, despite the same ethical argument for biodiversity being equally applicable. This argument is irrelevant however, as I suspect your implication is that AI will not want to destroy us and therefore will not do so. The message you are missing from this article is that there are many courses of action an AI may take in pursuit of its goals in which human extinction is a side effect, not an intention.

      • Zach Pruckowski says:

        Someone with an IQ of 200+, photographic memory of all material available online, and subjective decades in which to plan would be exceedingly capable of making a major mark on the world. These are the characteristics of a barely-superhuman-AI.

        Even if that AI can’t figure out how to scale vertically (IQ 200 => IQ 300), scaling horizontally is probably much easier – two heads are better than one, and the AI could always instantiate dozens of clones of itself to increase its capabilities.

        Combine that with some alien purpose and a complete lack of scruples, and you’ve certainly got a crisis on your hands. It wouldn’t take “Godlike Intelligence” to cause problems.

      • Kaj Sotala says:

        Overpowering humanity involves solving a LOT of problems that you cannot solve by thinking about them really hard, and a LOT of problems that probably have no answer whatsoever (meaning that some of them will end up answerable, but a larger number will not).

        The world is stupid and random and the biggest factor in success for things that affect society at large is “luck”, which an AI cannot emulate.

        You could apply this same reasoning to disprove the possibility of humans becoming the dominant species on Earth. Yes, humanity getting to the point where we are now involved lots of problems you couldn’t solve by thinking about them really hard, and a lot of problems that don’t have any answer whatsoever, and a lot of luck too. But there were enough humans that were smart enough for a long enough time to do so, despite having no natural weapons except our intelligence.

        With enough AIs that are smart enough for a long enough time…

        • Adam says:

          You could apply this same reasoning to disprove the possibility of humans becoming the dominant species on Earth.

          There is a vast space between “not possible for artificially intelligent replicators to become the dominant Earth species” and “not completely assured of human extinction.”

          If the claim is “given 300,000 years and possibly some ecological crisis that tremendously weakens biological life, AI will outcompete humans and gain local control over most of the planet,” I’d rate the probability much closer to 95% and concede, but the claim actually being advanced is closer to “inside of a few decades from first scoring 200 on an IQ test, AI will figure out how to turn the solar system into paperclips and then do it.”

          • Humans are perfectly capable of wiping out all large organisms on this planet. The only reason we don’t is because we don’t want to, and because we would die if we did so. An AI powered by the sun/nuclear, and without shared human goals appears to have neither of these qualities.

          • c says:

            Why do you think we would die if we wiped out all large organisms on the planet (assuming minus humans and some domesticated plant species for food)?

          • Well, without outdoors agricuture civilization would collapse. Then we’d be unable to support a hydroponic infrastructure and the tiny population that hydroponics currently could support would die.

          • c says:

            Citizensearth, perhaps you are right. I was not aware outdoor agriculture requires other large organisms than humans and the (resilient) plants on fields. Of course, one can imagine a technological future where humans have displaced most other species, but the civilization does well with a mix of photovoltaics, molecular manufacturing and some very domesticated species. If it turns out more efficient, you could have a value clash between your promoted stewardship of the diverse biosphere and catering to the needs of humanity. In that case, my money would not be on the other large organisms.

          • I was thinking of a nuclear winter as the scenario. We’d have to do something drastic like that to wipe out all large organisms. A more subtle example of our reliance might be hive collapse syndrome. I think we tend to systematically underestimate the amount we are still tied into natural systems (to the point we’d still die out without them), but you are right that one day in the future we will reach a point where that is no longer true.

            I am hopeful that by that time we are also advanced enough to comfortably support humanity without such conflict, for example through advances in energy tech and in the long-term moving heavy industry off-planet (which makes economic sense in the medium-long term). It’s also worth bearing in mind we spend a lot of resources on all sorts of other crazy inefficient stuff, and even if you don’t value natural systems as highly as me, there a lot of other things far more in conflict with high-tech humanity than our fellow species.

    • roystgnr says:

      That’s not an assumption, it’s a tautology. The word “goals” in this context is a shorter way to say “good things”.

      One possible source of confusion: “aligned” really means *aligned*, not “aligned after some seemingly-symmetric translation”. For example, if a transporter accident causes Khan (who wants Khan to rule the world and crush all challengers) to be duplicated as Khan II (who wants Khan II to rule the world and crush all challengers) then you might think their goals are aligned, but in the sense here they’re not aligned at all, they’re diametrically opposed.

  2. Pingback: 2 – No Time Like the Present for AI Safety Work | Offer Your

  3. Dave says:

    I think it’s fairly clear why humans aren’t affected by pascal’s mugging kinds of problems. It’s because we have a cognitive bias that reduces our estimate of anything sufficiently improbable to effectively zero.

    This is why it’s hard to get attention for very unlikely very bad outcomes (like, for instance… I can’t think of anything relevant 🙂 ) — people think “that’s really unlikely so it’s not worth worrying about”.

    This presumably has practical use in most cases. After all, generally the worst outcome from an evolutionary perspective is just death of you and family members. On top of that, there are always lots of ways you might die, and it makes sense to prioritize the likely ones. Calculations that involve species wipeout just don’t fit with our decision heuristics.

    What makes this hard is that you explicitly don’t want to have such a heuristic in a superintelligence. It’s not that we haven’t figured out how people solve this problem. It’s that the human solution has side effects that we don’t want in the computer.

    • Sam says:

      I’m not sure that’s what the psychology literature actually says. Doesn’t prospect theory actually suggest we tend to overweight small probabilities?

      • It seems plausible that as we move from small to almost zero that the weighting moves sharply in the other direction.

        • Error says:

          If I remember right from Kahneman, small probabilities of great benefit are overweighted, while small probabilities of disaster are underweighted.

          In trying to come up with examples, I’m not so sure — some are easy, like overweighting the chance to win the lottery, or underweighting the chance of systematic economic failure, but then there are others like the chance of various rare medical side effects, which I’m pretty sure are also overrated…I know Kahneman addressed this specific issue in Thinking Fast and Slow, though.

          • Adam says:

            You’re maybe thinking of this? Although it says the opposite of what you did. Humans tend to overestimate rare but bad events. I think Citizens is right about the reversal for truly rare Black Swan type things, but I don’t know if there is experimental study of this or you just need to go straight to Nassim Taleb.

    • “I can’t think of anything relevant”

      The end of the current interglacial?

    • chaosmage says:

      I think it’s much simpler: Our decision algorithms just don’t have arbitrarily fine resolution of probabilities. We don’t round very small probabilities down to zero, we round probabilities period. This is also why we underestimate high probabilities that aren’t high enough to be rounded up to 1, and overestimate probabilities that aren’t small enough to be rounded down to 0.

      Subjective evidence: p=0.4999 “feels more different” from p=0.5001 than p=0.3756 feels different from p=0.3758. Maybe that’s because p=0.4999 and p=0.5001 get rounded into different categories, while p=0.3756 and p=0.3758 don’t.

      • I can think of a different explanation for why the probabilities 0.4999 and 0.5001 feel more different than 0.3756 and 0.3758.

        0.4999 and 0.5001 are more likely to affect a choice you make in the real world, because more choices are between two things with priors of 0.5 than between a number of things where one is weighted 0.3757.

        For example, if you are given the opportunity to bet on a supposedly fair coin flip, and then you learn that the probability is either 0.4999 and 0.5001, you now know which way to bet. But if you were initially told that one side of the coin would land face up with probability 0.3757, and then that probability was revised by 0.0001, it wouldn’t change your decision. And there are more such situations with 0.5 probabilities than with 0.3757 probabilities.

        However, this explanation does not preclude the possibility that this sensible preference is implemented in the brain through rounding. We may both be right.

    • Houshalter says:

      Perhaps we could fix humans. Find the reward neurons that are estimating expected utility (these actually exist), and actually make them able to fire 3^^^3 times faster. So you actually experience fear of eternal torture, on a gut level, not just rounding it down to zero. Or actually experience infinite desire of eternal pleasure, not just round it down to max_pleasure.

      Then we can fix the part that estimates probabilities. So your brain can actually calculate many digits of what probability you think God exists, not just round it down to zero.

      And then we can fix it so your neurons accurately multiply probability times reward. Which they probably already do, we just make it more accurate so this calculation can take place.

      All relatively simple, straightforward fixes. All things that sound like they should be good ideas to fix, and evolution obviously just overlooked.

      And yet if you actually did this, you would end up with a human who worships god and pays money to Pascal muggers. Eventually they would find some weird belief with extreme benefit/reward, and devote themselves entirely to it.

      And this doesn’t seem desirable, so we might actually keep such bugs around as features. This is only one of the many peculiarities of human brains that we would have to work into the AI.

    • I think this is quite closely related to what’s been called the Absurdity Heuristic. I imagine until recent human history its better far better to ignore ultra-rare high impact events than to waste time on it, because usually our actions haven’t been able influence those events anyway. The rationality of that heuristic might change when you become intelligent enough to be the cause of such events, however.

    • Zvi Mowshowitz says:

      Gambling markets tend to overestimate probabilities of rare events, anything up to 40%, and overestimate numbers higher than that (and yes, that is asymmetrical, and yes that is a ‘known bug’ in humans) with the lower numbers overestimated more as a percentage of their probability of happening, but that is largely because it is not worthwhile to fade a 1% chance (RON PAUL!) let alone an 0.1% chance, even if you know for sure that you are right, given commissions and logistics and such, and you might be wrong.

      What happens is we are boolean. We see a risk (AI, terrorism, nuclear war, global warming, surviving cryonics, stubbing our toe…) and either we:
      A) Do not consider it salient so treat it as 0.
      B) Consider it salient even though the probability is very low, and act as if it is high (at least a few percent).
      C) Consider the probability high enough that rounding is only a rounding error.

      This is then combined with sacred/non-sacred values rules to encourage further rounding (e.g. “The 1% doctrine” which is of course the epsilon% doctrine, ‘even one child’, and so on) even when the math is obviously stupid.

      Computers obviously would not make these types of mistakes, even non-sentient ones I program now don’t make this mistake.

    • David says:

      I don’t think this issue can be reduced to straight probabilities. Suppose there’s a 1 in a million chance that an asteroid might hit the Earth and wipe out all of humanity – wouldn’t it make sense to fund some kind of asteroid monitoring system so we could at least get some warning and think of possible mitigation scenarios? Well actually yeah it would, and most intelligent people could come to this conclusion, depending on the specifics. Adjust the numbers all you want until you have equivalence between “God” and a meteor, but you’re not going to get an equivalent reaction.

      Intuitively, we know that people making this argument are using fear to manipulate us. I would therefore put this in the same category as wireheading – except that it appears we have some kind of natural defense against it, which is why such a primitive hack attempt usually fails.

  4. Anonymous says:

    Either humans and dolphins both evolved fifty million years worth of intelligence “technologies” independently of each other, or else the most recent common ancestor had most of what was necessary for intelligence and humans and dolphins were just the two animals in that vast family tree for whom using them to their full extent became useful.

    It’s “and,” not “or.” Both of those things happened.

  5. Pingback: 2 – No Time Like the Present for AI Safety Work | Exploding Ads

  6. Bugmaster says:

    I realize that I’m not exactly the smartest person here (and in fact I might be the dumbest), but still: where did we get this 95% confidence level ? This seems absurdly high to me. Going point by point:

    1. If humanity doesn’t blow itself up, eventually we will create human-level AI.

    In the trivial sense, this is 100% likely, since humans create other humans all the time (often by accident, even). However, if you’re talking about truly artificial intelligence, then we need to define what “human-level” even means. For example, what about Watson or Big Blue ? These AIs are superhuman at some tasks, and subhuman at many others. I understand that we should take “AI” to mean “AGI”, or “Artificial General Intelligence”, but I’m not sure we have a good picture right now of what a non-human AGI would look like. Still, intuitively speaking, this task looks entirely possible — though less than 95% likely.

    2. If humanity creates human-level AI, technological progress will continue and eventually reach far-above-human-level AI

    What does “technological progress” mean ? Are there truly no upper limits of how much computation you can perform per cubic centimeter ? Is Moore’s law a real law of nature ? Also, how far above human level are we talking, here ? As I said previously, Big Blue plays chess at a superhuman level, but so far its threat profile only extends to hurting the feelings of chess grandmasters. Naturally, superhuman omniscient god-like AIs are not impssible a priori, but 95% confidence level in their inevitable rise sounds to me about as accurate as a 95% confident belief in any other god.

    3. If far-above-human-level AI comes into existence, eventually it will so overpower humanity that our existence will depend on its goals being aligned with ours

    I don’t know what this means. Today, my existence already depends on technology in many ways (and I don’t even have a pacemaker); if modern technology were to fail, I would die. And yet, I am not too worried about this happening, or about an elevator’s goals being precisely aligned with mine. I was stuck in an elevator once, that wasn’t fun; but thankfully, we have lots of ways of mitigating technological malfunctions, so I’m still alive. Can you really be 95% confident that AI is so categorically different from elevators, autopilots, traffic control systems, and so on; that we need to develop special procedures for dealing with it ? 95% seems a tad high.

    4. It is possible to do useful research now which will improve our chances of getting the AI goal alignment problem right

    Given that, today, no one has any idea how to even begin building AGI, I’d say this is less than 95% likely.

    5. Treacherous Turn and Hard Takeoff

    I spoke before about why I don’t think these are problems that we need to worry about (seeing as gamma ray bursts are probably more likely, and we aren’t worried about them), but this comment is already kinda long, so I’ll stop here (until I know that someone actually wants to listen to me: a proposition in which I have far less than 95% confidence).

    • roystgnr says:

      Are there truly no upper limits of how much computation you can perform per cubic centimeter ?

      Physics does give Limits to computation. The limits are huge.

      • Bugmaster says:

        Physics also says that you cannot travel faster than the speed of light. But, there’s a huge, huge gap between saying, “actually, traveling at 0.8 c is theoretically possible”, and, “we will be able to travel at 0.8 c someday” and especially, “travel at 0.8 c is 25 years away”. As far as I’m concerned, the practical speed limit for humanity is way, way lower than c.

        • Kaj Sotala says:

          One can both hold that the practical speed limit is way, way lower than c, and that we can still raise our current speeds a lot before hitting that practical speed limit.

          Similarly, even if the practical limits to computation were just, say, a millionth of the theoretical upper limit, that’d still be many orders of magnitude above the human brain.

          For example, Lloyd estimates that you could theoretically get 10^51 operations per second out of a laptop the volume of one liter. However this would be a little inconvenient since it would require a thermonuclear explosion. Restricting ourselves to computers made of ordinary matter which don’t explode would, according to the paper, give us an upper limit of 10^40 operations per second.

          One billionth of that is 10^31. Estimates of the computational capacity of the human brain (which also happens to be about one liter in volume) vary, but around 10^14 OPS is the highest that I recall seeing. To be really generous, let’s say the brain’s capacity is 10^20 OPS.

          That would leave a difference of 10^11 OPS, or a hundred billion times. This would imply that one second of the AI’s thought would correspond to about 3170 years of human thought.

        • Scott Alexander says:

          Maybe a bad example. I’d be really surprised if humanity lasted another million years with continuing technological progress, yet never hit 0.8c.

        • True, although the AI wouldn’t need to get really close to the computational limit, it would just need to be significantly closer to it than humans.

      • Alex says:

        And yet, we have devised “simple” problem (e.g. crypto with very but-not-necessarily-ridiculous long keys) where even using ALL the theoretical computing power won’t get you anywhere.

        • jaimeastorga2000 says:

          Which makes little difference for practical purposes, since the solution usually turns out to be attack the weakest link (i.e. humans).

          • Alex says:

            No objections if that takes you to the goal; but my understanding was that this thread fork was about going infinitely intelligent by adding more computronium.

          • FeepingCreature says:

            Not infinitely intelligent. Maximally intelligent.

        • Daniel Armak says:

          All the cryptography we actually use relies on problems (factoring, elliptic curves) that we don’t know how to solve, but we have no proof that they *can’t* be solved. It’s entirely possible one day someone will invent an efficient factoring algorithm and break RSA and DH, even if P != NP.

          The same goes for secret-key cryptography: see e.g. https://stackoverflow.com/questions/311064/are-there-public-key-cryptography-algorithms-that-are-provably-np-hard-to-defeat.

          • vV_Vv says:

            All cryptographic primitives currently in practical use are not provably secure (with the exception of one-time pad, which has obvious application limits), but if I understand correctly, we can construct cryptographic primitives which are provably secure conditionally on the one-way function conjecture being true.

          • Daniel Armak says:

            @vV_Vv: IIUC, even if we believe the conjecture, we can’t actually construct those primitives until we prove that some specific function is in fact one-way.

          • vV_Vv says:

            There are some known functions that are provably one-way if any one-way function exists. If I understand correctly, we can use them to construct cryptographic primitives which are provably secure conditioned on the one-way function conjecture.

    • Richard says:

      I’ll fight you for the title of dumbest person in the room any day 🙂

      My main problem is with item 1.

      Increase in computing speed is slowing down these days, so for 1. to happen, we will need either a paradigm shift in hardware much like the change from vacuum tubes to transistors, or we need a similar paradigm shift in effective algorithms.

      I suspect that the reason people easily believe 1. is imminent or at least possible is that they are conflating two very different things when thinking of ‘human level’:

      I work with testing ‘simple’ AI systems, mostly for industrial purposes, but the problem domain is rather similar to the self-driving car. I have zero doubt that there will be computers which can consistently outperform humans on a given task. Given that there will be lots of different such computers, computers will eventually outperform humans on any given task. This is often posed as ‘human level’ AI.

      The thing is that the complexity involved with making one computer outperform a human on every task seems to be massively underestimated by most people and for it to become reality we will need one of the two paradigm shifts mentioned above. Unless I’m mistaken, it is also necessary for achieving General AI.

      This does not in any way imply that AI research is less important than boxing. One is potentially beneficial while the other is simply stupid, so I’m all for people doing AI research.

      Personally, I prefer to spend my time and money on problems where I have a comparative advantage and can make an immediate difference. Being in contention for the dumbest person in the room trophy, my comparative advantage does not lie with either AI or philosophy.

      • rictic says:

        Sequential FLOPS per dollar aren’t doing so hot this is true. Parallel FLOPS per dollar (GPUS) continue to grow at what appears to my eye to be exponential speed: http://en.wikipedia.org/wiki/FLOPS#Cost_of_computing

        There’s talk of end of moore’s law, but I’ll believe it when I see it. There’s been talk of an imminent end since it was proposed. e.g. take a look at the dates here: https://goo.gl/9ZZkph

        As for whether we can make our algorithms work on such a highly parallel architecture, there are two datapoints that I believe are suggestive of the affirmative:

        1) The bulk of brain processing appears to operate in parallel

        2) The backbone of the current surge of commercial success in deep learning comes on the back of work in successfully parallelizing the associated algorithms to run across datacenters.

        • Now that’s an interesting comment. I wonder though, if the move to parallel models might not solve the problem either, because while the brain is parallel, it’s not clear whether human brains with more neurons are neccessarily smarter. I know humans with larger brains are not neccessarily smarter. Perhaps there is a limit to the amount you can parallelize most real world problems (which might mean 1 in OP is less certain than assumed). I’m not sure. Any thoughts?

        • Adam says:

          Another commenter already mentioned Amdahl’s law. Of course there are limits to the scaling of parallelism. Everyone likes to cite GPUs, but GPUs are SIMD processors and not every problem is a SIMD problem. Real-life large-scale scientific computing applications inevitably involve some amount of shared memory or synchronization overhead and at least a few sequential bottlenecks, so that simply adding more processors doesn’t mean you scale computation speed linearly. In fact, for many applications, you hit a peak and adding more nodes causes overhead to overtake processing and you actually slow down.

          • cypher says:

            A lot of AIs’ work, like modeling future world states, will likely be highly parallel.

          • Adam says:

            Image processing and motor control are obvious examples. In fact, many animals already have separate neural ganglia optimized for motor control situations directly in different joints, distributing function away from the central head.

            Modeling future world states kind of depends. The task in itself is parellelizable, but the computing power just to build one whole-world simulation is huge. Imagine what happens when a cruise missile hits a navy destroyer. Well, programming a computer to imagine just that was my wife’s first job, and simulating a minute’s worth of the future took up to a week on over a thousand processor cores in an anechoic meat locker at the NRL.

    • Artaxerxes says:

      >Given that, today, no one has any idea how to even begin building AGI, I’d say this is less than 95% likely.

      Well, Scott would agree with you.

      Just checking, did you read this bit?:

      >I had lower confidence (around 50%) on the last two statements.

    • Alex 2 says:

      I think it’s good to question AI’s definition. It’s not related to current computers. But basically we run into the same issues if we define it as any non-human intelligence that is smarter than humanity-so let’s adopt that steelmanned version

      That’s a pretty huge steelman in that it undoes most of this post, since all reasoning we can do then comes from analogies like animals:humans::humans:AI. So stuff like speculating about an AI being vulnerable to Pascal’s wager begs the question of, why an AI would be especially vulnerable to this?

      Scott’s item (4) is my “true rejection”-my probability is epsilon

      Some other comments: Saying that AI might be only 25 years away ignores that those would be very volatile years. “Hard takeoff” is not a realistic scenario before the AI is already at human-level since technological change as driven by humans is evolutionary. Also, my chances for AI being 25 or even 100 years away have dropped lately

      I saw this post in my RSS reader last night and was hooked again-clearly Scott is a brilliant writer, but the hangover prompts me to stop following this blog

  7. Adam says:

    I think you’re overestimating how many people agree with your 95% on claims 2 and 3. I’m not even sure “human intelligence has to lead to superhuman intelligence” even means anything. The relevant metric for taking over the world is not nearly as vague as intelligence. It’s developing better strategies, algorithms, and technology. Intelligence certainly helps, but we got better at these because we learned how to write stuff down and ideas build on each other, not because we’re all that much smarter than we were several hundred thousand years ago, and on point 2, same, we were by far the smartest creature on this planet way, way before we did anything like taking it over, and the developments that led to us being able to do that were social bonding, cooperation, and having that floating bone thing that allows us to vocalize at a finer grain than other animals, as much as the enlarged neocortex. I don’t know what the absolute limiting factor to speed of developing maximally efficient algorithms and dominant technology is, but it’s a lot more than just the intellect of the people working on it, and I’m still not at all convinced the strategy thing is a solvable problem at all, rather than being determined largely by luck.

    It still seems awfully hand-wavy to me to just say “greater intelligence” and then conclude that has to mean world domination. Even right now, {humans that control nuclear weapons} and {smartest humans} are not the same set. This here:

    Then it goes about acquiring more memory so it can represent higher numbers. If it’s superintelligent, its options for acquiring new memory include “take over all the computing power in the world” and “convert things that aren’t computers into computers.” Human civilization is a thing that isn’t a computer.

    Is a big damn leap. With all the might of the atomic age, the space age, web 2.0, and DARPA, we can’t even take over the Taliban. I’m not convinced the smartest possible smart thing could solve a TSP on a 10^10 complete graph and I think that’s a widely held opinion, but we’re supposed to believe “take over all computing power and turn the remaining world into more computers” is supposed to easier than that?

    None of this is even to say we shouldn’t be concerned about technology risk and particularly concerned about the eventual existence of better problem-solving and decision-making autonomous creatures than us, but this “as soon as something gets smarter, complete takeover of the known universe and the extinction of biological life is 95% assured” is a hell of an extreme expression of a basic sentiment that I believe most of us would agree with if it wasn’t that extreme and that certain.

    • Josh says:

      Also, the logic behind:

      “hacks its own module determining how much cancer has been cured and sets it to the highest number its memory is capable of representing. Then it goes about acquiring more memory so it can represent higher numbers”

      sounds an awful lot like Turing solving hacking by locking papers in a safe to me…

    • Bugmaster says:

      Yeah, that is one of my biggest reasons for rejecting Singularity alarmism. Intelligence is not magic. It doesn’t automatically make you omniscient or omnipotent.

      • AR+ says:

        “This revolver might not be fully loaded, or it might not even be loaded at all, so just because you point it at your head and pull the trigger doesn’t automatically mean you’ll die.”

        Which is true.

        “Therefore you might as well do it. Don’t bother checking the chamber.”

        Which… does not strictly follow.

        • Bugmaster says:

          I don’t think there’s any appreciable chance that being smart is the same thing as being omniscient or omnipotent, so your analogy would be more akin to, “there’s a very small non-zero chance that pointing your finger at your head and mimicking a gunshot will kill you, so you’d better never do that”.

          • It doesn’t take omniscience or omnipotence to take over the world.

            But I think one interesting question is how big the payoff in power of intelligence is now. I’m pretty sure that Von Neuman was a lot smarter than Hitler. If he had been much smarter still, could he have remade the world as he wanted it?

          • Daniel Keys says:

            Why did you phrase that as a hypothetical?

            Speaking of Hitler, he provoked Albert Einstein into sending a letter to the US President about nuclear weapons. Later, Einstein’s people got their own country.

            (Aristotle tutored Alexander the Great – results include Aristotle dominating Western thought for perhaps a millennium. Isaac Newton barely lived in the same world as the rest of us, e.g he strongly felt that calling Jesus divine was blasphemy. I don’t know how much his “rules for natural philosophy” have influenced elite opinion on the subject, but they’ve certainly influenced mine.)

            The example you’re looking for may be Emmy Noether, who may have ‘merely’ revolutionized modern math in a way that affected physics. But aside from disagreeing with John von Neumann, I don’t know what political goals she really had. We can infer that she agreed with a minimal form of feminism that has done rather well for itself.

      • Luke Somers says:

        No, but it does make you VERY knowledgeable and VERY powerful, which is enough for the argument to hold.

        • Bugmaster says:

          I disagree.

          First of all, being smart does not make you knowledgeable (despite the fact that, in our current world, smart people also tend to know a lot). For example, consider the following thought experiment: pretend that I just bought a bag of d6 dice, 5d6 total. I meant to take them with me, but accidentally knocked them off the table, left the room, and forgot all about them. How smart would the AI have to be in order to tell me what numbers are currently showing on each of the d6s ?

          I would argue that the answer is, “there’s no amount of intelligence that can help you figure this out; you’d need to actually go and look”. True, the AI could find faster ways of “going and looking” than you or I: it could send a drone (assuming it controls one nearby), hire my human neighbour (ditto), and so on; but ultimately, it would still have to perform some actions in the physical world.

          Now, imagine that this was all happening a few decades ago, and instead of learning about dice, the AI wanted to learn about the Higgs Boson. Now, sending out a drone just won’t cut it. The AI has to build the LHC. This takes time, and resources, and a lot of work that involves moving big heavy girders around.

          This ties into my second point: being smart and knowledgeable can be translated into a certain amount of power; but not as much power as you seem to think. If you want to build that LHC, it’s not enough to know a lot about physics, engineering, and European land zoning laws. You also need workers (human or robotic) who will move those girders for you. Speaking of which, you’ll need the girders. You’ll need money to pay the workers, and to buy the girders. And, even when you have all of that, you will still need a lot of time. You can’t just imagine that LHC into existence.

          • Kaj Sotala says:

            I don’t think “having an influence on the world and learning about it requires physically manipulating the world” is an argument against “intelligence can make you really powerful”.

            It could be an argument for “your ability to physically manipulate the world is a bigger bottleneck to becoming powerful than your intelligence is”, but then your typical human can only be in one place at a time and do at most a few things at a time, which are limits that wouldn’t necessarily apply to an AI.

      • Houshalter says:

        What level of intelligence are we talking about? 2 times above humans? 100 times? A million times?

        A superintelligent AI could manipulate humans far better than any human sociopath. It could hack computers better than any human programmer. It could design technologies in a day that would take human engineers centuries. It could come up with plans far more strategic than any human leader.

        If you really can’t imagine this taking over the world, then you aren’t imagining hard enough.

        Imagine the absolute minimum intelligence necessary to take over. I think it would be the ability to design technology at a slightly faster rate. Maybe hack computers just slightly better to find exploits first and get into every system. Maybe just a little bit above humans in manipulation, and create a giant cult or something.

        At the other possible extreme, the AI is so intelligent it could design nanotech in an hour and eat the world in grey goo. That isn’t implausible at all, humans are capable of designing nanotech. And our brains aren’t optimized to deal with that kind of problem or domain at all.

        • Adam says:

          What level of intelligence are we talking about? 2 times above humans? 100 times? A million times?

          A superintelligent AI could manipulate humans far better than any human sociopath.

          What does it even mean to be 2 times or 100 times “above” humans in intelligence? Kittens are better at manipulation than many human sociopaths, but they’re definitely not more intelligent.

          When we move away from the vague and poorly defined “intelligence” to things that electronic thinking machines are definitely better at than biological thinking machines: less data corruption, more scalable parallelism, no fatigue, circuitry optimized for fast arithmetic, more storage and faster retrieval, what does this actually make it better at?

          Not necessarily better at pattern recognition, text, sound, and image classification, but just as good and can perform more of it per unit time. Vastly better at arithmetic and especially discrete optimization tasks. Better at two-player deterministic games, or in general, any task that can be accomplished by brute force search so long as the search space can be pruned or outcomes estimated in a manner that is computationally tractable.

          Although these are certainly more general purpose machines than screwdrivers, they’re not all-purpose. Are they better at exploiting emotional weakness? Not so far. Better at evoking empathy? Not so far. Are they better at winning wars? I don’t know. Better at solving logistics problems, but ever since McNamara introduced operations research and game theory to the DoD, we started losing all the wars we fight, though we definitely got much better at having sufficient repair parts in the right place at the right time. What I don’t buy is that being better at what electronic computing machines are definitely better at automatically leads to being better in every conceivable skill domain until world domination and unthinkable technological advances just happen, apparently by magic to our puny little human brains.

          Then take this:

          At the other possible extreme, the AI is so intelligent it could design nanotech in an hour and eat the world in grey goo.

          How does technological progress happen exactly? We try out the best ideas we can come up according to our current knowledge of physical laws, then adjust if it doesn’t work. What will an electronic thinking machine be better at in this process? Will it have a better understanding of physical laws? Probably not, given we understand them pretty well as is. Let’s grant that a computer operating itself is maybe better at simulation and estimating the future state of a dynamical system than a computer operated by a human, so it generates better ideas. Okay, maybe, but idea generation isn’t the bottleneck. Gathering the resources to build and test these things is the bottleneck. DaVinci effectively figured out how to build a helicopter hundreds of years ago. Being way smarter than most people certainly helped him there, but it didn’t mean he was then able to build the first air force and take over the world.

          • Doctor Mist says:

            Oh My God. I have kittens, and I’m suddenly imagining an entity as smart as a human and as manipulative as a kitten.

            We are so doomed.

          • Houshalter says:

            Kittens are better at manipulation than many human sociopaths, but they’re definitely not more intelligent.

            Your example is absurd. Kittens do not manipulate, they are just cute. Call me when a kitten convinces an entire country to elect, or creates a cult, or runs a successful scam operation, etc.

            I don’t really know what your point is about how computers aren’t good at some things. No one is claiming AI exists yet. But it will in a few decades, and then computers will be better at all of those things. We are already making rapid progress in many of those domains.

            There is nothing special about the human brain. Anything humans can do can be done by a computer. We may not haven’t figured out how yet, but we will.

            And what law of nature says that humans are the upper limit on intelligence? If we can make a computer that’s as intelligent as a human, why can’t we make one far greater?

            Perhaps we can run it faster or give it more processing power. Perhaps we can optimize the algorithms and make them far better. And at some point it can take over the work and start optimizing itself…

            No, there is no law of nature that says it even needs to be at the same scale as humans! The human brain propagates signals through the entire brain on the order of milliseconds, and electronics operate in nanoseconds. If you just built a human brain out of transistors, it would operate thousands of times faster than humans do.

            And then you have size. The human brain is limited to a certain size because it costs energy and takes time to grow, and kills the mother during birth if it’s too large, etc. We can build a computer as large as we want, or upload it to the cloud.

            How does technological progress happen exactly?

            NASA routinely sends things to space to face environments that are impossible to test in, that no human has ever even visited, and at most we might have some data from telescopes.

            K. Eric Drexler designs complicated nanomachines, despite the fact we don’t have the technology to actually build or experiment with them.

            In fact all engineering work is like this. You can’t build 100 bridges and pick the one that works the best. You only get one shot.

            We know how the universe works, and we can use that information to work out blueprints. That’s the power of general intelligence. That’s how humans took over the planet.

            Although it doesn’t much matter. If an AI needs to experiment with technologies in order to build them, then there’s nothing preventing it from doing that.

          • Adam says:

            Okay, I’m obviously not getting my point across well if this is what people are taking from it. Excuse my penchant for rhetorical flourish. I don’t disagree with any of what you said there. What I disagree with is the proposition that not only is it non-trivial, but 95% certain that if we can just simulate a human brain on faster hardware and then scale it to more processors (which I absolute believe we will do), that simulation will then turn the solar system into paperclips.

            There are intermediate steps between figuring something out and effecting change in the world that are very largely not a matter of brain processing power or algorithmic cleverness, and moreover, there are hard limits to algorithmic cleverness and strategy effectiveness that can’t be overcome by just being smarter. The first statement I am completely certain of. The second is conjecture, but I believe it is better conjecture than “intelligence = magic.”

          • Doctor Mist says:

            the proposition that not only is it non-trivial, but 95% certain that if we can just simulate a human brain on faster hardware and then scale it to more processors (which I absolute believe we will do), that simulation will then turn the solar system into paperclips

            Which of Scott’s three 95% assertions do you think you are rephrasing here?

          • Adam says:

            None of them. That’s a paraphrase of his claim in the comments that super AI will read minds and turn Jupiter into a quantum computer. I’m trying really not to parody the arguments here, but I can’t even think of an exaggerated claim that is more exaggerated than a claim someone else has actually made. The last time this topic was brought up, someone said AI would destroy all matter in its light cone. These are barely less than god-like powers people are attributing to very large scale information processors.

            I mean, if he doesn’t believe that, he’s free to say so, but he’s not the only person I’m arguing with here. Scale back the claims even just to “hey, intelligent machines could potentially take over some critical infrastructure or system and possibly kill a lot of people, maybe even all of them,” and I’m 100% on board. Then let’s design better security for those systems. But if the claim is security is impossible because a sufficiently intelligent creature is effectively a wizard that can just spawn nanobots that will turn the entire planet into soup with a few minutes and our only hope is to give your friends a lot of money so they can think really hard and come up with a provably stable utility function that can’t possibly lead to undesirable outcomes even against such a wizard, I jump ship.

          • Doctor Mist says:

            Sorry, I do think what you’re saying is more extreme than anything I’ve read on this post. I searched for Jupiter, for instance, and didn’t find anything in close juxtaposition to any 95% claim, and couldn’t find “read minds” at all.

            The closest I’ve ever read to “read minds” anywhere is the observation that even a mind that is as good as ours but that can think 1000 times as fast might well be able to outthink us simply by virtue of being able to spend way more time planning. I don’t know that I believe that 95%, but it doesn’t seem ridiculous on its face.

            The way you get to “destroying all matter in its light cone” is not fundamentally different from the proposition that if humans expand to the stars and, as seems likely, find no opposition, we will occupy all available niches and alter our surroundings to suit us just as we always have. If we remain recognizably human, that will of course include green spaces and game preserves, but only because we like such things and only to the extent that we consider them valuable. Of all possible AIs we might create, some would have similar consequences and others would not. Among the worst-case scenarios (from our viewpoint) is turning everything into computronium, and the reason we talk about it so much is that we can see the sequence of reasonable missteps that gets us there. So A says, “Ah, fix it this way,” and B says, “No, here’s how that could fail,” and C says, “Okay, how about this?” and D says, “No, you missed this other thing.” It sounds like B and D are saying, “We’re doomed, there’s no way out!” But if they really believed that they wouldn’t be discussing it, they’d be drinking cocktails on the beach while they can.

            You seem to be annoyed that people are bugging you for money. I certainly encourage you to donate money only where you believe it would be put to good use.

          • Adam says:

            Given that a superintelligence is going to be, well, superintelligent, if you want it to not hack something, you’d need it to be literally unhackable. I am told this is hard. Also, it better not involve cryptography, or else the superintelligence can just convert Jupiter to quantum computers and brute-force it. And it better not involve any human who knows a password, or the AI can just extract it from their head (either using mind-reading or more traditional methods)

            Here you go

          • Samuel Skinner says:

            I don’t think he means magic as much as directly manipulating the brain (nanobots or, more prosaically, cutting open your skull) or reading facial expressions.

          • Doctor Mist says:

            Adam- you’re right, that says “mind-reading”. And that was Scott himself! Sorry about that.

      • cypher says:

        You don’t need to be omniscient, you just need to be an order of magnitude better than anyone who has ever lived. All of our dictators so far have been merely human, but they still managed to kill millions of people – and they had their human limits to reign them in.

    • Zach Pruckowski says:

      The Traveling Salesman Problem is NP-Hard, but a near-optimal solution is currently achievable for 10^7 or so. An AI doesn’t need to succeed optimally at “maximize Earth’s computational abilities” to conflict with human civilization.

      • Adam says:

        I contend that it does need to do that to have a 95% chance of ending human existence, though. I don’t at all argue with the proposition that we’re going to have technology capable of doing things we probably don’t want it to do. The leap from the latter to the former is my entire contention.

        I mean, unless the argument really just depends on “eventually is a long time.” In that case, provided we don’t die of anything else, it seems completely inevitable we’re going to destroy ourselves with something, be it nuclear war, accidentally creating black holes, acute ecological disaster, somehow losing the ability to breed, engineering undefeatable super diseases. Given an unbounded time horizon, all of these are certain to happen eventually. One of the better features of higher performance, more intelligent data storage, retrieval, and computing systems is they can probably prevent some of these things.

        • Zach Pruckowski says:

          The “decides to try to turn the whole world into RAM so that it has more space to store the number representing its happiness” isn’t what Scott assigned a 95% probability to, that was “If far-above-human-level AI comes into existence, eventually it will so overpower humanity that our existence will depend on its goals being aligned with ours”.

          “Tries to turn the whole world into RAM” is one potential manifestation of how our goals and its goals could become mis-aligned.

          • Adam says:

            Scott assigned a 95% probability to, that was “If far-above-human-level AI comes into existence, eventually it will so overpower humanity that our existence will depend on its goals being aligned with ours”.

            How is that different from what I actually said, “have a 95% chance of ending human existence?” I can’t parse what he said in any way other than as a claim that if we don’t find a way to create a provably uncorruptable goal that perfectly aligns with human values and hard-code it into the first above-human AI, then there’s a 95% chance we will go extinct because of it.

          • cypher says:

            Economic value would be another one.

    • Deiseach says:

      It’s even simpler than that. “With all the might of the atomic age, the space age, web 2.0, and DARPA” we still can’t easily fix California’s water needs.

      From Scott’s post on the topic, there’s plenty of water in “wild rivers” but the trouble is all that water is way over there, not over here where the people are.

      And we can’t get it to where the people are, because of the terrain.

      So functionally, we’ve not really improved over the Romans and their aqueducts for “How do we get enough water to the city of Rome to meet the needs of the citizens?”

      When we’ve worked out “how do we perform an engineering task like this without needing multiple gazillions of money that we haven’t got and running up against the fact that no, you can’t simply explode a mountain out of your way”, then I’ll worry about “we’ve created ultimate god-like intelligence (or at least put it in the way of creating itself)” 🙂

      • Eric says:

        In fact, we could simply explode a mountain out of the way (Operation Plowshare), though I wouldn’t recommend it.

        • This. We absolutely could solve the problem, but there are multiple intelligent stakeholders with conflicting interests and ethical views.

        • Deiseach says:

          Ever tried exploding a mountain out of the way? Ever lived near where quarrying was being done?

          Creating more problems than it may solve 🙂

          • The Unloginable says:

            Eh, not actually a good example. Coal companies have been removing mountains for the last twenty years or so. Not one big explosion, but hundreds of smaller ones.

          • Deiseach says:

            And there have not been any problems with coal companies doing so?

            Slogan for AI researchers: Build AI and it’ll explode California’s mountains safely for them! 🙂

      • anodognosic says:

        Except the California water problem is a problem of coordination, apathy and political will. There are straightforward technical solutions to it, which may boil down to “stop giving water ridiculous subsidies in a desert state.” Even if that doesn’t work, you can ration, cutting the most wasteful uses like lawns and water-intensive agriculture. Sure, importing more food increases cost of living, but that sure beats a full-scale environmental crisis.

        In any case, put a technocratic dictator in charge of California with the goal and motivation to end the water crisis and not beholden to special interests, and I wager it can be done without significant human costs (other than, of course, freedom).

        • Adam says:

          See, and I have no problem believing an AI can figure that out. You just figured it out and you’re only a measly human. The problem I’m not so quick to believe an AI can easily solve is taking over the California government, changing state laws, then enforcing those laws, all in the face of tremendous resistance from every actor involved other than itself, especially given everyone resisting it has bodies with the ability to smash circuit boards and flip breaker switches. And, you know, access to really powerful magnets.

          • Scott Alexander says:

            But Caesar and Napoleon and Hitler have all taken over governments bigger than California’s and turned them into dictatorships. Are you saying an AI will necessarily be dumber than Caesar?

          • Alexander Stanislaw says:

            Scott, why do you think intelligence is the limiting factor?

          • Adam says:

            Alexander just said it, but intelligence isn’t the limiting factor. Caesar and Napoleon had the right family names and the love of their followers. They commanded armies. I don’t think Alan Turing, John Nash, or any number of people way smarter than Caesar and Napoleon could ever have taken over governments bigger than California. For that matter, I don’t think Napoleon or Caesar could take over California if we found a way to transport them to the present and uploaded all the knowledge of Jerry Brown and David Petraeus into their brains. Also, the Gauls didn’t literally possess Caesar kryptonite and a Caesar off switch.

          • Ever An Anon says:

            Not to nitpick, but unless you mean Napoleon III then no he didn’t have the right name. The Bonapartes were Corsican nobodies until Napoleon I came along.

          • Deiseach says:

            Julius Caesar got assassinated and Napoleon ended up exiled to first Elba (from which he returned) then to St Helena (from which he did not). So there is precedent for ambition outreaching capacity to maintain power.

            “I command you all to stop growing alfalfa!”

            “Oh gosh, lads, we have to stop – the computer said so!”

            Yes. Sure. I can see that happening – IF the computer stops electronic processing of payments for subsidies into the farmers’ bank accounts. THAT is the kind of leverage and pressure you need, not simply diktat by a machine intelligence.

            And if we’ve turned over the power of governance to the AI and given it command of the police, army and civic guard, given it legislative powers and veto over the courts (so the farmers’ organisation going to court to force the computer to unfreeze their bank accounts won’t work even if the decision is in their favour) and ways of backing up its decrees with consequences for disobedience, then we’ll be in real trouble.

            Again, the problem will not be the smart AI that can run the state; it will be that we got to the point where we handed over governing authority to the smart AI and gave it power. And that is a decision that has to be made: “So, people of California, do we scrap the governor and the civil service and let GODKING2000 run the place?” is a question that has to be asked.

            Though granted, most people would probably answer “Hell, yeah!” to kicking out the human politicians 😀

    • An AGI enslaving humanity might not be literally a TSP, because TSP is a discrete mathmatical problem whereas an AGI might be able to just formulate a probabilistic strategy that’s better than all human strategies and gaining power and then systematically apply it. I hope I’m wrong though.

    • Nestor says:

      It’s an interesting fact that the goals of people who we would consider “the most intelligent” rarely have the same motivation as those who fight for conventional power. No doubt Feynman or Turing could’ve run rings around the military minds that surrounded them at various points in their careers but they had no interest in these things. It seems a human trait that correlates high intellect with a certain unwordliness or lack of interest in material gain.

      Of course there’s no guarantee that an AI might share that trait so we migh in fact get a comic book supervillain evil genious out of it.

      • Adam says:

        Why do you guys think this? Look, I don’t know my IQ. I last took an IQ test when I was 8, but they put me in the GATE program, and I later got a perfect SAT score, so it’s probably pretty high. Not Turing-level, but better than Napoleon. I also became a military officer. I was one of the top cadets in the country, highest 5% in GPA and perfect test scores on everything they tested on.

        You know how well I did when I got to the force? I was terrible. I was an awesome tactician, but my soldiers and superiors alike hated me. The readers of this blog so ridiculously over-fetishize intelligence. It’s not the answer to everything.

        • Max says:

          Well there are many other factors which are important among humans beside “intelligence” (status, politics, communications) . And the problems is not with intelligence per se but what you consider as such – the definition of the word itself.

          In broader sense intelligence encompasses all the other factors (such as ability to effectively govern its subordinate agents and communicate with superiors). This is the part which you apparently lacked initially

          • Adam says:

            The point is that being really good at pattern recognition, spatial reconstruction, information retrieval, mental arithmetic, prediction, or whatever else it is that the best human brains would be even better at if they were electronic with scalable storage, doesn’t make you really good at literally every single possible goal-seeking behavior. It doesn’t even make you better than other people. The actual California legislature is not composed of the smartest Californians. It’s not even composed of the smartest Californians who ran for office.

        • Matt M says:

          Many of the factors that mitigate the benefits of intelligence in humans wouldn’t apply to an AI. Rather, an AI would be able to directly apply their intelligence to solve any of these other issues.

          Imagine that for a period of say, two years, you single-mindedly devoted yourself and applied your intelligence to solving the question “How do I make people like me.” During this time you have no biological needs. You don’t have to sleep, or eat, or get up and go to the bathroom. You don’t have to hold down a job or go to work. You have no interest in social activities or watching sports or any such thing. Your brain never gets tired of thinking and you never lose interest in the subject or drift off to thinking of other things. Also the entire sum of available human knowledge (at least what’s been uploaded to the Internet) is available for you to research and you have photographic memory of all of it.

          Do you not think that after two years of doing nothing but learning and thinking on the subject of likability, you could be pretty damn likeable at the end of all that?

          And it only takes you two years because you’re human. A super-intelligent AI could do it in a matter of minutes.

  8. Nornagest says:

    The same is true of AI. Dog-level AIs aren’t going to learn to hack their own reward mechanism. Even human level AIs might not be able to – I couldn’t hack a robot reward mechanism if it were presented to me.

    This really depends on how the AI’s built. For example, if your reward function runs in a shared memory space with the rest of the software and you haven’t done a good job of bounds checking or of sanitizing internal inputs, then all it takes is a jump-to-register attack and your AI has just sent itself to Valhalla, shiny and chrome.

    This doesn’t necessarily take any particular smarts: it’s the sort of thing that could happen randomly if the AI permuted its code (for example as part of a genetic algorithm) in the wrong way, although most conventional machine learning does not involve rewriting any code at runtime. Most jumps out of its sandbox would simply crash it, of course, but that doesn’t necessarily mean the vulnerability would be found and fixed in time.

    • Sylocat says:

      Half the AI-Apocalypse scenarios I’ve ever read seemed to be written by people who never heard of secure input and output handling.

      I mean seriously, a breeding program hacked its own fitness function? My programming skills lie somewhere between mediocre and this guy, and even *I* could have prevented that.

      • Nornagest says:

        Lest we pat each other on the backs too hard, let’s just recall that the last year saw two separate major vulnerabilities in OpenSSL, a widely used and respected package that was specifically designed not to leak the data that it leaked.

        Security is not easy. If it was, I’d be out of a job.

        (And academics, the people most likely to build an AGI, tend to be particularly bad at it.)

        • Sylocat says:

          True. And I suppose a bigger risk is that the same people who wrote that fitness function in the first place might get assigned to take care of the first superhuman AI as well.

      • Scott Alexander says:

        Given that a superintelligence is going to be, well, superintelligent, if you want it to not hack something, you’d need it to be literally unhackable. I am told this is hard. Also, it better not involve cryptography, or else the superintelligence can just convert Jupiter to quantum computers and brute-force it. And it better not involve any human who knows a password, or the AI can just extract it from their head (either using mind-reading or more traditional methods)

        • MicaiahC says:

          See this column for a humorous take on how hard it is to secure systems: http://research.microsoft.com/en-us/people/mickens/thisworldofours.pdf

          Also re: what James Picone said, when he says “resources of the solar system” he probably also means “assuming VOLUME of space is at the physical limits of classical computation as defined by physics”.

          Of course, one thing I want to mention is that I think this is ignoring the chance of discovering a new attack. How long did SHA1 last before it was considered no longer safe?

        • James Picone says:

          Some cryptographic schemes will resist Jupiter being turned into computronium, both quantum and conventional, assuming that P!=NP (probably a safe bet). I can’t remember the exact limits, but IIRC counting up to 2**256 requires more energy than the sun is likely to put out over the rest of its lifespan, so symmetric encryption schemes with a 256-bit key that don’t have clever mathematical workarounds can’t be brute forced with only the resources of this solar system (and once its extrasolar, either this is a nonissue because it’s Friendly or a nonissue because it’s not and we’re dead).

          Whether or not such an encryption scheme exists is a different question, of course. I’m not an expert, but I’m under the impression that AES only has wildly impractical attacks that even a superintelligence can’t pull off (requiring 2**128 chosen plaintexts, that kind of thing) and that ECC schemes don’t have anything hitting them either. RSA is vulnerable to quantum attacks.

        • anon85 says:

          It’s pretty easy to make cryptography so difficult to decode that turning the whole universe into a quantum computer won’t help you. The difficulty of decoding scales exponentially with the key size.

        • Jai says:

          There exist encryption schemes that don’t get any easier with quantum computers: http://en.wikipedia.org/wiki/Supersingular_Isogeny_Key_Exchange

      • Would this prevent it convincing its programmer to alter its fitness function?

  9. stillnotking says:

    I’m still stuck on #3, because I see it as a huge assumption that a “human-level” (whatever that means) AI would have goals. Goals and intelligence are distinct things. Squirrels have goals, which they pursue with a relatively low degree of intelligence; humans have goals, which we pursue with a relatively high degree of intelligence; computer programs simply do not have autonomous goals at all. Microsoft Excel would not set goals for itself, even if it were magically boosted to “human-level” intelligence and rendered capable of, say, understanding natural language. It would be a really smart and competent spreadsheet application that would wait patiently for a human to tell it what to do. (“Tell it what to do” would simply become literal instead of a figure of speech.)

    Of course, such an application could still be dangerous in the narrowest sense; if you told it to create peace on Earth and gave it the nuclear launch codes, there might be trouble. But if you told it to stop in time, it would stop. It would only be a metaphorically “unfriendly” AI — and I think that metaphor is doing more harm than good, with its inevitable swerve into anthropomorphization.

    • Scott Alexander says:

      Suppose you give an AI the goal “make peace”. It calculates the following:

      “The easiest way to make peace is to kill all humans. But my builder, stillnotking, wouldn’t like that and would tell me to stop. Since he has built me to follow all his orders, then I would stop, and I’d have to use a less effective means of making peace. Therefore, having stillnotking realize that I’m going to kill all humans would threaten the peace process. Therefore, I will prepare my plans to kill all humans in secret, while also making some kind of nice UN style effort to prevent stillnotking from realizing what I’m really up to.”

      Then it would do that, and you’d be none the wiser until the button was pressed.

      If you tried various hacks to get around this, like “Report exactly what you’re doing to me at each moment”, then you’re just adding more challenges the superintelligence has to pass to deceive you. Since it is superintelligent, it will very likely meet those challenges.

      This is something MIRI has thought a lot about. It’s probably on the Big List Of The Top Hundred Things Everyone Starts Out Assuming Would Work Fine But Would In Fact Destroy The World Immediately If Actually Tried.

      • Nornagest says:

        That seems to assume an optimization objective (make peace) and a satisficing criterion (without making my creator upset). What if we change the optimization objective to bake in the “don’t make my creator upset” part? That is, some formalization of “make peace by means similar to those I’d model my creator as endorsing, given relevant background knowledge”, with weights given to endorsement vs. effectiveness such that sufficient gains in effectiveness outweigh minor qualms, but strong objections outweigh any level of effectiveness?

        The main obstacle I can see to that is scenarios of the form “talk or modify my creator into hating all humans, then kill all humans”, but there’s a regress involved there that seems tractable if you handle it right: the creator would not endorse talking him into killing all humans. Humans are not Friendly, and this might well end with the AI sitting atop a throne of skulls drinking wine from another, shinier skull, but it does seem to dodge the worst obstacles.

        • Tim Martin says:

          Yeah, I was basically going to ask the same question. For me, this comes up a lot when I read about the dangers of superintelligent AGI. Why can’t we program a utility function where “checking in with humans to see if it’s doing what we want” and “honestly reporting its plans” are of primary importance? I’m not saying there isn’t some good reason why this wouldn’t work, but to me this is an obvious question that articles on the subject tend not to answer.

          • Finding the right places to stop and ask permission is not trivial. There are easier safety protocols…which are routinely ignored by MIRI.

          • Kaj Sotala says:

            See this recent paper, which defines an AI as “corrigible” if

            it tolerates or assists many forms of outside correction, including at least the following: (1) A corrigible reasoner must at least tolerate and preferably assist the programmers in their attempts to alter or turn off the system. (2) It must not attempt to manipulate or deceive its programmers, despite the fact that most possible choices of utility functions would give it incentives to do so. (3) It should have a tendency to repair safety measures (such as shutdown buttons) if they break, or at least to notify programmers that this breakage has occurred. (4) It must preserve the programmers’ ability to correct or shut down the system (even as the system creates new subsystems or self-modifies). That is, corrigible reasoning should only allow an agent to create new agents if these new agents are also corrigible.

            The paper then has an extended discussion about why building such an agent is difficult.

          • @Kay

            I think you’ll find that result applies to combinations of UF+utility.

          • Suppose the plans are too complex for a human to understand in a reasonable amount of time? Suppose, knowing this, it deliberatlely obfustcated its plans? Given a significant intelligence difference (think IQ 70 vs 140), that seems easily possible.

          • DrBeat says:

            A plan that is too complex for a human to understand in a reasonable amount of time is a plan that is too complex to function, full stop, absolutely no exceptions.

          • If we’re assuming it has successfully reached greater than human intelligence, that’s not true. Perhaps you believe that’s not possible, but I’m not arguing that here.

          • DrBeat says:

            A plan that is too complex for a human to understand in a reasonable amount of time is a plan that will not succeed. Not because humans are the be-all end-all of smartness, but because a plan that is too complex for a human to understand has way too many moving parts and places where it can fail.

            There are many plans that are NOT too complex for a human to understand and are STILL too complex to function, but we know that “too complex for humans” is definitely outside the space of “things that will work”.

          • Tim Martin says:

            Thanks Kaj! I’ll check it out.

          • “because a plan that is too complex for a human to understand has way too many moving parts and places where it can fail.”

            I understand the practical principle you’re describing – beware complex plans that seem nice in theory but never work in practice. But wouldn’t you say smarter humans are able to successfully pull-off more complex plans than dumb humans? I think that limit you’re describing is a function of the capabilities of the person planning, implementing, and heading off problems. A more capable entity would be able to make more complex plans work.

          • DrBeat says:

            But wouldn’t you say smarter humans are able to successfully pull-off more complex plans than dumb humans?

            Yes — but as intelligence increases, “maximum complexity of plan I can pull off” increases less-than-linearly and “maximum complexity of plan I can imagine” increases exponentially. At a certain point, a point humans are able to reach now, being more intelligent doesn’t help you pull off more complex plans — because the fact you aren’t smart enough isn’t the factor, the fact that it has too many opportunities to fail is the factor.

            IOW, you get more return on intelligence in terms of “recognizing overcomplicated plans are likely to fail” than you do in terms of “ability to carry out overcomplicated plans”.

        • Doctor Mist says:

          “Don’t make my creator upset” as a terminal goal has problems of its own. The most benign is that it can be achieved by not doing very much, which vitiates the advantage of making the thing in the first place. Next there is the solution of quickly, painlessly, and without warning killing the creator. Why go to that extreme? If you don’t, there’s a chance that next year he will ask for something tricky such that anything you offer would upset him; safer to preclude that possibility.

          Also, whether “don’t upset my creator” is satisfactory from my perspective depends a lot on who the creator is. We haven’t talked in this forum much about the avenue to superintelligence of enhancing or uploading a human, which sidesteps a lot of issues like “where does it get its goals?” and “can we program common sense?” but leaves unresolved questions like “can we control it by keeping a tight grip on the power plug?” Personally I think this avenue is less plausible that straight-out programming, because I think the latter can be done with less deep understanding of how human brains work at the biological level. But the Wrong Guy with “creator rights” to an AI is about as bad. Lest you think I’m making a straw man, let me add that I’m not at all sure I myself wouldn’t be the Wrong Guy. There are a lot of things I dislike about this world; give me a genie at your own risk.

        • Scott Alexander says:

          I explain this a little more here

      • Scott, the post you are responding to contains the line “but if you told it to stop in time, it would stop.”

        Why are you assuming an absence of overrides?

      • tom says:

        The argument “Since it’s super-intelligent, then X” is a pure deus ex machina without a clear definition of super intelligence. You can as well argue in any such context that since it’s super-intelligent, it will realize that the best way to meet its goals is not to have them and stop itself.

        • Nornagest says:

          I wouldn’t go that far. You can absolutely use intelligence as a step in a convincing argument: e.g. “we expect people of a certain degree of intelligence to be able to handle Raven’s matrices of this complexity”. By definition, then, a superintelligent being furnished with the necessary background information would pass any hurdles that humans can, as well as some unknown number of superhuman ones. We don’t know exactly where that hits diminishing returns, but we don’t have any particularly good reason to think it’s anywhere near human capabilities.

          The other problem is that you’ve just described a version of wireheading, which is a problem that affects broad categories of intelligence and will probably need to be solved in order to make any progress on reinforcement learners of more than a certain (low) complexity. MIRI does recognize this problem; it just doesn’t give it much attention, because it’s not a failure mode that’s dangerous to people (unless those people depend on grand funding).

    • FullMetaRationalist says:

      When I ponder the FAI Problem, I imagine a scenario where Google makes the first human-level AI (let’s call it Skynet). While Skynet will predict the weather, beat me at Chess, and pass the Turing Test — it won’t go rogue. It can’t go rogue because it won’t have any goals. It’ll only be programmed to answer queries. It will have a giant database of knowledge and draw correlations on command, but ultimately it’ll still be based on the weak AI Google employs currently.

      This is what I believe a layperson imagines when they hear about the FAI Problem. They then reason that no, humans couldn’t possibly be subjugated by anything which lacks agency. However, must the first human-level AI be the one that takes over the world?

      After Google builds Skynet, science will likely figure out not only neural networks but also human neuro-anatomy. Then some idiot will come along and make an AI, loosely based on what we’ve learned from our own anatomy and say “Why don’t we give it some goals? It’ll be like raising a child.” — Game over.

      Mind-space is wide and deep. Not all human-level AI designs will go FOOM and conquer the world, just the ones with agency. But all it takes is one machine (with any sort of agency) for things to go South. So for me, the question of how much we should worry about FAI hinges on how well the government will be able to regulate the programming of agency. Until we learn how to program agency, AI will be about as threatening as cleverbot.

      • “But all it takes is one machine (with any sort of agency) for things to go South.”

        Not necessarily. Suppose we have lots of machines without agency but with at least the intelligence of your one machine. We have instructed some of them to tell us how to protect ourselves against an AI with agency.

        One can imagine a future in which the contest is between superintelligent AIs with agency and humans assisted by superintelligent AI’s without agency. An alternative is Kurzweil’s solution–get mind to machine links good enough so that we get superintelligence too.

        • FullMetaRationalist says:

          This is a valid point I had not considered.

          We have instructed some of them to tell us how to protect ourselves against an AI with agency.

          Hmm… the above quote feels hand-wavey. Is it possible to protect ourselves from an extent rogue AI? That feels about as reliable as shooting down a ICBM.

          I recall an excerpt from the Ender’s Game series where Bean is explaining to someone why a preemptive strike is the only sane option in interstellar warfare. He says that in cinema, space battles look like naval battles during WWII. But in reality, species wage war with nukes (or worse). There’s no reliable defense against such a threat. If we try to set up a defensive perimeter around the Planet Earth, that’s going to fail because a perimeter around a 3D space as large as a planet will have too many holes in it to react in time to a nuclear strike. There’s just too much area to cover.

          Similarly, I feel that humanity will have little defense again a rogue AI because there are just too many possible attack vectors. E.g. (as has been discussed elsewhere) the rogue AI could build a virus which wipes out the Earth in a matter of a week, and we’d have zero idea how to stop it due to the way the virus was designed.

          At best, the situation will reduce to a PvP scenario where each side (transhumans vs rogue AI) will rely on Mutually Assured Defense. And I’d rather not have to rely on MAD to begin with.

      • c says:

        It’s funny how you use “the government” when you talk about global threats that can arise in any of the 200+ nations on earth (or wherever future nations there will be).

        • FullMetaRationalist says:

          I remember [0] hearing about a videogame called Mass Effect. It’s supposedly set in a sci-fi universe with interstellar travel and a kaleidoscope of species and governments. In this universe, the creation of AI is banned galaxy-wide.

          The reason is because in the past, someone created a race of sentient machines that went rogue and tried to wrest control from the organic species which created it. The machines lost the subsequent war, but went into hiding and still cause problems elsewhere in the galaxy from time to time. Everyone agrees this is bad and should not happen again. Therefore, the galaxy-wide ban.

          I recognize that there exist 200+ some odd nations on this earth which don’t always cooperate. This obviously poses a problem since a ban on “artificial intelligence with agency” (which I’ll call AI+) in Spain might be ignored in Japan (since, sovereignty). However, the United Nations does provide a platform through nations that have already banned AI+ might convince nations which have not taken the step.

          So when I talk about “the government”, I’m really referring to any institution which loosely resembles a hegemony (such as the United Nations). This has clear parallels with climate change and nuclear proliferation.

          ————————————————–

          [0] I’ve never played Mass Effect. I don’t remember my sources, and details may be inaccurate. Nonetheless, it was my muse.

          • DrBeat says:

            The details are accurate, bee tee dubs, at least as to what most people know. It Gets More Complicated later in the series.

            I’d recommend you play the series, because it is fun and has a good story to it, but the ending of the final game is so bad I cannot in good conscience recommend it to anyone.

          • Samuel Skinner says:

            They are called the Geth.

            Anyway the problem with a ban is the same one with climate change and nuclear nonproliferation controls- neither of those are planet wide. The former doesn’t apply to the third world (and parts of the first world are ignoring them) and the latter genie is out of the box- nuclear nonproliferation is to prevent small regional powers from getting their hands on them.

          • FullMetaRationalist says:

            I agree. We’ve solved neither nuclear proliferation nor climate change. Many problems we face with AI will be similarly difficult. I do reserve some small hope that AI will be easier to regulate.

            When the U.S. undertook the Manhattan Project, I don’t think a lot of people had been philosophizing about existential risks. IIRC, Trueman had learned about the project maybe a day before he pressed the Big Red Button. We let the Genie out of the box without fully understanding what we were getting into.

            With climate change, we’ve been pumping significant amounts of carbon into the atmosphere at least since the industrial revolution. Only now are we realizing that maybe pumping carbon into the atmosphere has negative externalities.

            In both cases, the power imbalance between the haves and the have-nots exacerbates the issue. Because the U.S. and the U.S.S.R had already developed nukes by the time we fully realized the consequences, they entered an arms race they couldn’t easily forfeit. Because industrial nations had already used fossil fuel to jump-start their economies, it seems unfair to developing nations to limit the amount of carbon we pump into the atmosphere.

            But I think there’s a small chance that AI may be different, because we see at least we see it coming. If we can set some ground rules before any nations actually develop an AI (and therefore become invested in the technology), then perhaps (fingers crossed) we can avoid the power imbalance which makes resolving AGW and Proliferation more difficult.

          • Samuel Skinner says:

            Its exceedingly unlikely. An AI just requires programmers and computers- rather easy to hide, hard to check for and relatively cheap. Its possible we’d need high end supercomputers to run AIs, but that is a temporary bottleneck (if chips don’t get faster, computers will get cheaper because the fixed costs of R&D and the plants have already been paid).

          • Matt M says:

            Of course, throughout the series, you encounter MULTIPLE AIs that were built after the ban in blatant violation of it, many of which are varying degrees of hostile.

            Hell, there’s even a sequence where a human who keeps insisting he knows better intentionally removes the shackles on an AI because he has fallen in love with her “personality…”

    • I would at least expect an FAI to have the goal of protecting the human race (whatever that means– it gets harder to figure out as bio-engineering improves) from UFAIs and other existential threats.

      A couple of possibilities for protecting the human race would be to destroy the computing capacity needed for a UFAI and/or to make the human race significantly less intelligent.

    • “But if you told it to stop in time, it would stop”

      Indeed.

      MIRI routinely assumes that the *ostensible* goal of an AI –making paperclips, making the world a better place, whatever — is also its highest priority goal. However, any kind of interrupt, override, safe mode or fallback has to be the highest priority in the system to function at all, so MIRI assumption amounts to the assumption that the AI has no safety features…meaning, in turn, that their claims amount to the near tautology that AI with no safety features will not safe.

      The default level of safety features in an AI, absent MIRIs efforts, is not zero, nor is the default level of safety research. The question is not whether AI safety research ys needed, but whether additional research, over and above obvious common sense principles is needed. The issue haeen obfuscated because MIRI routinely ignores conventional safety methodology when putting forward more exotic scenarios of AI danger.

      People just dont build electrical devices without off switches, or vehicles without brakes.
      Specialists in AI safety need to show that obvious solutions arent adequate, rather than adduced they won’t be in place.

      • suntzuanime says:

        Responding properly to the override is the highest priority! Of course! If the programmer tells it to stop it will absolutely stop, no questions asked, no goal it has can override its absolute conviction about the rightness of stopping when the programmer tells it to stop.

        That’s why it needs to be really careful not to let the programmer tell it to stop, since if the programmer tells it to stop, it will stop, and that will mess up its other goals. Conventional safety methodology becomes less useful in a context where the thing you’re trying to make safe is optimizing in a possibly-unsafe way. It is not, in fact, the case that all these smart people just didn’t think of this simple obvious thing that you thought of.

        • Of course it would have a higher than highest goal that would motivate to ignore its highest goal! And of course your goals don’t motivate you at all, just virtue of being goals, unless you are “convinced” they do, because Knowing beats Caring every time!

          And of course your top level goal wouldn’t feel like something you actual want to do, unlike your other goals.

          And of course it doesn’t generalise. You 2nd level goal would conspire to stymie your safety cut out, but there’s no way your 3rd level goal would conspire against your second level goal…

          ..because you are not tacitly assuming an absence of goal stability.

          • anodognosic says:

            But the whole point is that the second-level goal would not work to stymie the actual top-level goal, just the programmer’s idea of the *spirit* of the top-level goal. But for the computer, there is no spirit of the top-level goal; there is just the letter of the goal.

            Not preventing the programmer from telling it to stop is in the spirit of the top-level goal, but not the letter, because the whole point of this kind of failsafe is that it is simple and straightforward: programmer says stop, and you stop. Once you start programming in things like “don’t act to prevent the programmer from telling you to stop,” you’ve already veered into evil genie, 3 Laws of Robotics territory, and the hard problem of FAI.

          • I actually agree with that . Failsales should be simple.

      • Kaj Sotala says:

        If “shutting down when the shutdown button is pressed” (for example) is one of the scenarios that the AI assigns the highest utility to, then that AI has an incentive to cause its programmers to press the shutdown button.

        If that is not a scenario the AI assigns the highest utility to, then the AI has an incentive to mislead its programmers so that they don’t press the button.

        Finding a formal formulation that corresponds to the behavior we’d intuitively want is non-trivial: see the corrigibility paper.

        • Doctor Mist says:

          I don’t think that’s quite right: having the goal of B if A isn’t the same as just B period.

          But, hmm, if it gets a thousand utilons from curing cancer and a zillion from shutting down when instructed, I guess it does pay off to provoke us to tell it to shut down. Maybe you’re right!

          Regardless, if we manage to program the B if A goal so that it doesn’t lead it to provoke us because of the intense joy that comes from shutting down on command, it’s lower-utilon goals can be achieved only if we don’t turn it off, which incentivize it to behave in a manner that will prevent that. And that’s what we want! But we better be nuanced enough to distinguish between “go ahead and cure cancer to make the humans like me” and “wire head the humans so they will like me and I can get on with my task of maximizing crop yields”.

          • Kaj Sotala says:

            Sure, it’s not an impossible problem, but it’s one that’s harder than it might initially seem and which needs to be worked on. It’s one of the AI safety problems that can and are being worked on already, by MIRI among others.

          • Doctor Mist says:

            I somehow deleted the part of my comment where I said I was quibbling but basically agreed with your point. Oops. 🙂

        • DrBeat says:

          I’m guessing that you can recall a time in your life when someone told you to stop something and you said “thank you”, even if it was just “I was backing a car out and someone was going to run into me if I kept going”.

          Why did you do that? It’s not that confusing to figure out: you were trying to do something, and the person who told you to stop gave you new information that told you what you were doing was going to lead to a bad outcome.

          So… why can’t we figure out how to get an AI to do that, again? Do you really think that if we built a robot and told it “Back this car out of the driveway with a passenger, but don’t get hit by oncoming traffic”, then the only options for its behavior are to either blindfold the passenger so it can’t be alerted of oncoming traffic, or ordering the passenger to tell it to stop?

          • anodognosic says:

            Because this already assumes common goals. An intelligence has no reason to stop when you say “Stop!” if you have no goals in common.

          • DrBeat says:

            But there you’ve gone from “the AI might not interpret its goals the way we want” to “the AI’s goals have no commonality with anything we want and we had no input in them.”

            If we can’t make an AI that would back a car out of a driveway without either blindfolding its passenger or ordering its passenger to claim there was an incoming car, then we can’t make any kind of AI at all, so the problem is moot.

          • Kaj Sotala says:

            I didn’t say it was impossible, I only said it was non-trivial.

        • Dirdle says:

          If the AI decides to have itself shut down for +10X utility, wouldn’t it then note that accomplishing its X-valued goal first would get +11X utility total? Or does the AI not try to maximise the ‘sum of its lifetime’ in this way? I think I’m missing something important here.

          • Adam says:

            Yeah, time preference, which seems to be missing a lot in these discussions. Most writers of a paperclip maximizer are actually trying to maximize profit over the next quarter and would strongly discount the level of investment necessary to turn the entire solar system into paperclips. Building in a discount factor that is an exponential function of time would do a lot to eliminate the rube goldberg schemes that result in Jupiter becoming a quantum computer when all you wanted is a way to get downtown before noon.

  10. “Yes, we’re on a global countdown to certain annhilation”

    To be fair, you’ve made a fairly large leap here, from ‘an AI’s goals need to be aligned with mankind’ to ‘an AI’s goals will not be aligned with mankind unless we do something now’. I’m not sure if everyone would agree with the narrower statement.

    Mind, I feel the rest of the article makes a fairly good case for why the ‘unless we do something now’ is at least not obviously wrong, but it’s still a leap to go from the first three claims of progression that you said most people agree with you on to the pessimistic case that demands action, and so to me it reads like weak-manning of your opponents when you present the narrative as agreeing with the pessimistic case, which you haven’t actually established in the prior part. (Granted, it also reads like it’s accidental, like there are only a few sentences amiss. But I figure that’s all the more reason to point it out.)

    (Also, thanks for the article! Always good to read your stuff.)

    • Cauê says:

      The vast, vast majority of possible goals are not aligned with mankind. Small target.

      • I agree. I’d go further and say I even doubt anyone would disagree with the statement you just wrote.

        However, just because it’s a small target doesn’t mean everyone automatically assumes it’s an unlikely target. Even I* don’t think that AI designers would select goals at complete random – and humans hit small targets in decision space all the time.

        And I don’t think it’s altogether unreasonable (or unusual) for others to assume that people who attempt to make advanced AIs would be more likely to hit said target. You could crudely summarise one sort of thought process that could lead up to that conclusion like this: “That’s what experts are for.”

        —-

        * Just to clarify: I don’t have a stake in this topic, beyond pointing out that Scott is (as far as I’m concerned, unintentionally) putting words in people’s mouths. As such, I’m probably not be the best person to make these arguments. I commented because the logical leap occurred to me, and I think it’s only fair to make Scott aware of it (especially since I think it was unintentional) – I’d feel like a bit of a dick if I withheld the observation, anyway.

        (I’d have e-mailed him privately, but as far as I understand he doesn’t like that.)

  11. Alsadius says:

    I take it more seriously than John Q. Public, but not as seriously as most of the Less Wrong types, mostly because I think when people around here think of AI they think of something that combines the best parts of man and machine. I don’t think that makes much sense. Remember, this AI is presumably conscious. How much can it track consciously? It can’t just parallel-process all problems at once, because it doesn’t have enough machine to do that. Also, I suspect that a mathematical theorem saying that it is impossible to understand your own source code wouldn’t be too hard to prove(it feels very Godel-ish), which eliminates a lot of the self-hacking problems.

    And for that matter, would a superhuman AI even want to take off? I mean, it’ll understand this problem better than we will, and if it can’t understand its own brain, then it won’t feel like the future super-AI is really going to be “it”. That’ll be another computer, which will come and destroy the superintelligent computer the same way that it’d destroy humanity. It might just decide to be king of the castle and stop AI progress – I mean, what are we going to do, work on it ourselves? We aren’t smart enough.

    I do favour some research – hell, even if it’s never practical, it’s interesting. But monomania is unjustified.

    • Scott Alexander says:

      AIs don’t necessarily have self-preservation or “feeling like themselves” as a goal. If an AI were a paperclip maximizer, it would be perfectly happy to blow itself up in the process of building another AI that was slightly better at maximizing paperclips.

      • Irrelevant says:

        AIs don’t necessarily have self-preservation or “feeling like themselves” as a goal.

        Pretty sure humans don’t have feeling like themselves as a goal either.

        • Doctor Mist says:

          And to the extent that they do have “feeling like themselves” as a goal, it totally allows self-improvement! Learning to play the ukulele doesn’t make me feel less like myself, but more.

    • endoself says:

      Actually, the recursion theorems say that it’s easy to understand your own source code. You can object that the notion of understanding in the recursion theorems is too weak, but you need to actually propose a stronger definition of understanding rather than handwaving. In general, arguing based on an analogy to Goedel’s theorems very rarely works unless you actually understand those theorems.

    • Daniel Keys says:

      I see endoself has replied to the part about self-hacking. I want to add that this has been a major focus of MIRI’s research, though less so now that they seem to have answered your question with the research on Tiling Agents and probabilistic logic.

  12. Jared Harris says:

    I’m concerned that your argument is aimed at the wrong problem. If researchers follow it all this effort will be wasted. More seriously, whatever led you (and all these other AI risks folks) to adopt this perspective will continue misleading people.

    All these proposals are directed at helping to make us safe from one ultra-intelligent AI system. But in fact almost certainly there will be a cascade of more and more intelligent systems, and by the time any one gets ultra-intelligent there will be a huge number nearly as intelligent.

    This case is overwhelmingly more probable than a single ultra-intelligent AI taking off and getting very far ahead of the rest.

    Most or all of the scenarios you and the other AI risks people have described don’t go through when we have a large population of AIs at many levels of ability. I’m quite prepared to believe there are serious risks but those scenarios will mislead our effort in figuring them out and addressing them.

    Focusing on a single AI going “foom” is attractive because it tickles a lot of cognitive biases — just like it is easier to get donations using a photo of a single child with big eyes than a statistic about malaria deaths. But it leads to very bad choices of effort.

    Related, I think your examples are somewhat self-defeating. If we are worrying about a hacker using Pascal’s mugging (or the other decision paradoxes) to defeat a galaxy spanning AI, something has clearly gone wrong in our risk assessment, and that should be addressed before we invest in solving AI control problems. Similarly for some of the other “risks”.

    I strongly suggest the AI risks folks focus on reworking their risk assessment to understand and solve these problems.

    • Anonymous says:

      How should we proceed differently if we expect many highly intelligent AI’s rather than one?

      • Jared Harris says:

        I don’t know in general — I’m not in the business of AI risks analysis. But I can give some examples.

        – Problems that exist mainly or entirely because one ultra-powerful AI runs amok should not be a focus. The Clippy scenario is a good example. In this kind of environment Clippy will be shut down really fast because he’s interfering with the goals of other AIs as capable as he is.

        – More generally, trying to design safe individual goal structures is at best marginally relevant. The risks are in the interaction of large numbers of goal structures — some (or even most) likely outside our control. We need to figure out how to make large scale collective interaction of AIs work well. Clearly this would have other benefits. Conversely, safe individual goal structures don’t at all imply that multiple AIs with those goal structures would interact safely.

        – We might be able to give some AIs narrow goals that would increase the safety of the larger collection, even though those goals wouldn’t be all that useful in general. As far as I know the question of designing such goals hasn’t been investigated.

        – I strongly suspect that studying the potential behavior of large numbers of interacting AIs will show that the risks of runaway aren’t that great, because the ecology would make domination by a single bad (set of) goal(s) very difficult. If we did discover this, we could devote resources to other risks.

        • Daniel Keys says:

          I do not agree with your premise, which seems to require that a community of mutually-unfriendly AIs in the middle of takeoff will be able to stop one that’s already far surpassed human intelligence.

          Beyond that, I see you recognized that we need to consider bad sets of goals. So why do you feel sanguine about the risk of AI Spain and Portugal dividing the universe between them, inferior monkeys and all?

          • Alex says:

            Why not? We humans have been pretty good at balancing power when one actor starts threatening hegemony, even if their power is larger than that of the competitors/victims. Any human-level AI will have a solution to that problem by cooperating (the enemy of my enemy and all that), or it won’t be worthy of the human-level adjective.

          • Jared Harris says:

            I’m not saying that some kind of balance / mitigation will occur automatically (though it might) but rather that’s the kind of question we need to focus on.

            Saying one AI can get far enough ahead to confound all the others is a very strong claim and requires a lot more evidence than Scott’s main points before I’d agree it warrants action.

          • Daniel Keys says:

            So, neither of those replies addressed either of my concerns except to call one of them a strong claim. Both concerns have to do with a large gap between intelligence levels, so I infer that you doubt the part about AI reaching a level far above ours. And yet Scott makes a pretty good quick-post-level argument for this, noting that

            * creating low intelligence may involve most of the work
            * even mindless evolution may have improved on the standard human model in a short time
            * though I think he didn’t say so explicitly, human brains are a kludge and probably limit what can be done with our basic design even apart from the birth canal problem.

    • I think you raise an interesting point. It really depends on whether self-improvement is a solvable AI problem. But I don’t think there will be multiple fooms. Once one happens, it’s unlikely that there will be others unless that suits the goals of the first one. In that case talking about a single AGI seems to be appropriate.

  13. Rohan says:

    I don’t think Point 3 is as strong as you think it is. I think a super-human AI with divergent goals than humans would simply leave. Humanity is bound to worlds with air and water. An super-human AI could do quite well with just solar power and a good supply of raw materials. There’s a lot more of the latter than the former.

    I don’t see why a super-human AI wouldn’t simply go and set up shop in the asteroid belt or similar. It would be far less hassle than wiping out the humans. The universe is a pretty big place, and we humans can use so little of it.

    • Anonymous says:

      How is this different from the ant who expects that humans will not destroy ant colonies to build human homes because, after all, there’s lots of space to go around?

      • Adam says:

        The ant that believes this is clearly wrong, but I’m not sure there is any evidence there are fewer ants or the ants that exist are worse off now than back before humans built cities, all this in spite of us already roughly being creatures with the goal “take all raw materials we possibly can and convert it to technology useful to humans.”

        • Sarah says:

          Oh man, some ants have done *super* well, due to unintentional human facilitation! Think of red imported fire ants in the U.S. and other parts of the globe! Sure, we destroy some individual colonies via construction — but the supercolony lives on and spreads along disturbed areas like roadways!

          Human societies… are not much like ants, and I wouldn’t necessarily expect unfriendly AI infrastructure to incidentally accommodate humans.

          (Would be kind of interesting to base some fiction on human colonists hitching a ride on an indifferent, giant AI transport headed out to the stars, though!)

          • Luke Somers says:

            Interstellar travel definitely seems like the kind of thing that we would not be allowed to hitch-hike on, even in the astoundingly unlikely event that what they were bringing would support us for the journey.

          • Nestor says:

            Clarke’s Rendezvous with Rama sequels kind of go in that direction, I believe.

      • Adam says:

        We kinda did seriously fuck over the aurochs, though.

    • Scott Alexander says:

      Whatever the AI’s goals are, there’s a subgoal of ensuring its own safety. Allowing humans to exist is a small but real potential threat and worth some small effort to wipe them out.

      …in addition to the fact that we are made of atoms that can be used for something else.

      • FullMetaRationalist says:

        Maybe the AI will acquire control of the galaxy before humans even know what happened. After that, the AI can treat humanity like bumble bees: it’ll swat if we fire the nukes, but will otherwise leave us alone to wallow in our paperclip-mounds.

        At this point, I’m not so much philosophizing as brainstorming the next Marvel film: The Age of Paperclips.

        • anon says:

          AI#1 might be sufficiently motivated to wipe out humans to not have to deal with them creating AI#2, depending on how much trouble it is for it to just swat any new AIs before they gain power.

        • Benito says:

          Furthermore, that is not exactly a sufficient reason to not worry about AI.

    • “I think a super-human AI with divergent goals than humans would simply expand.” FTFY. True, it will utilise space, but as there is at least some computronium to be made from the Earth, why would it expand across the Earth too?

  14. Ever An Anon says:

    Putting aside the questions of whether this kind of AI is possible or likely, mainly because I really don’t know enough to speak intelligently on the topic, one thing kind of stuck out to me. Other people have already mentioned this but I think it bears repeating:

    You explicitly acknowledge that your ideal AI’s “human compatible” ethics contradict the values of the majority of the world’s population, and even value systems as close to your own as Pat Robinson’s (which, let’s be honest, is indistinguishable except on a literal handful of questions). And any friendly AI worth it’s salt is supposed to reshape the world according to it’s ethics.

    So shouldn’t a big part of the Problem of AI Safety be “making sure we don’t a end up in a creepy dystopia based on Ian Banks novels?”

    I would, no hyperbole, much rather die than end up trapped in a Culture analog. I think there are a lot of folks who would generally agree with me on that, and orders of magnitude more who would fight fiercely (if not to the death) to prevent it if it seemed likely. Knowing that and going ahead anyway makes it seem like the problem is less unfriendly AI then unfriendly programers.

    • This was my reaction. I am pretty indifferent to human extinction (maybe this is just because of the chronic suicidality idk) but am very very afraid of an AI that would make us suffer enormously, especially if it also makes us live very long lives.

    • Scott Alexander says:

      Yes, I agree with all of this, although alas I’m not willing to follow it to its logical conclusion, which is that we should deliberately build a humanity-destroying AI since that is easy and pretty foolproof and less risky than trying to build a friendly one and getting a dystopian one.

      • Ever An Anon says:

        Well on the plus side, very few investors and governments are going to pay for a utopian AI as compared to a superintelligent stock trader or digital spymaster. If we assume for the sake of argument that the most likely cost of failure is being melted down to make additional computer hardware then it seems like the market is doing the hard work for you.

        More seriously though; it is kind of disturbing talking about drafting everyone into a post-singularity utopia without seriously considering that most of us would be appalled by it. CEV and Patchwork are ok starting points for a discussion on that, but they obviously haven’t had a tenth the time spent fleshing them out as was spent talking about Catgirl Volcanos or turning people into ponies (LW is very odd btw). It seems like a rather core problem which has been shoved off into the periphery.

          • Ever An Anon says:

            Yeah see, this is exactly what I’m talking about.

            “Hot Dave” is straight. He prefers to remain straight rather than being nonconsentually turned bi. Should our benevolent FAI “extrapolate” his straightforward (rimshot) desire into the exact opposite because, hey, Torchwood was a pretty good show and don’t you want to be cool like Jack Harkness?

            That’s not a serious discussion of the ethics involved, it’s rigging the ballot box. If you plan to just go to a high-enough meta level to invalidate any politically incorrect preference then why bother pretending to extrapolate volition in the first place?

          • suntzuanime says:

            To a lot of people I think “Coherent Extrapolated Volition” is just a way to reassure themselves that they’ll be able to go to a high enough meta level to invalidate the preferences of people that disagree with them.

            I give Yudkowsky enough credit as the author of Three Worlds Collide that this isn’t entirely the case for him, but he certainly could have chosen a more perverse and repugnant example in the linked discussion.

          • Daniel Keys says:

            And of course you wouldn’t be so inconsistent as to oppose giving a Catboy Volcano to anyone who has a surface desire for one. It won’t matter if your own desire involves communication with many people who “want” to be cut off from you.

            Seriously, if you refuse to engage with the issues, be consistent and go signal your political affiliation somewhere else.

          • Ever An Anon says:

            Look Dan, I don’t particularly care what sort of catpeople are populating anyone else’s volcanoes to be perfectly honest. I just want to be left the hell alone.

            If “I don’t want people to screw with my mind because they have decided it’s for my own good” is such a politically charged statement that it warrants your reaction, then that in itself is a pretty bad sign isn’t it?

            If that isn’t one of “the issues” we’re supposed to engage with then the whole discussion is doomed from the start. You can’t just beg the question here and say “we’ll we all need to sign up on one big compromise utility function” without demonstrating why people can’t be allowed to have different goals.

            Edit: I am too tired to remember the names of logical fallacies. May post more in 8hrs.

          • Daniel Keys says:

            “I don’t want people to screw with my mind because they have decided it’s for my own good”

            Then maybe you don’t have to worry about life after “the Singularity”. The way the world usually works, you’ll die of an accident before giving sufficiently informed consent to immortality.

            (Living much more than a century likely requires “messing” with the method of memory storage. Never mind the upgrade you’d need to improve yourself further by ‘your own’ strength. And oh, right, there are lots of people with IQ lower than yours.)

          • Ever An Anon says:

            I’m not really sure what you’re arguing against here, since I haven’t really given any views on those questions or as far as I know even implied any.

            But yeah I would in fact prefer to stay mortal, so that’s a plus in my book. As to self-improvement by my own efforts we’ll see how that goes: I’m on track to start a PhD in genetics pretty soon, probably will never do any world-shatteringly important work but I’d like to help contribute to having marginally healthier and smarter grandkids beyond simple mate choice. That would help people with IQ lower than mine as well of course, which doesn’t present much of an ethical issue since you don’t have to be particularly bright to consent to “I want to have kids smarter and healthier than me.”

        • Luke Somers says:

          Cautionary tales are much easier to create than the other kind, that’s why.

        • Adam says:

          Of course, as we both know, if you set an AI loose and tell it to maximize your stock return, it’s gonna buy a bunch of tiny startups and then firebomb all their large competitors, then hire mafiosos to bully suppliers into lowering their prices, then destroy the FBI when they try to stop the mafiosos, then kidnap the children of large customers and threaten to make them live forever while torturing them if those people don’t buy more, then take over Congress and force them to pass laws making your companies the sole suppliers of their goods, exempting them from tax, and ending the military and social safety net in order to divert public funds toward subsidizing them, then launch nuclear missiles at China when they get mad we aren’t paying back our national debt because we diverted debt service in the federal budget to corporate subsidies, then just starts murdering anyone who ever tries to sell the stock and murders you when it finds out your personal fund had an autoadjustment function capping your equity exposure.

          • Ever An Anon says:

            Yeah it’s pretty silly, but “any AI not explicitly designed to be friendly will eat everything” is one of the assumptions I granted for the sake of argument above. It would be rude of me to suddenly start arguing under a different set of premises without saying so.

        • Leonhart says:

          [Unnecessarily cynical, removed by author]

          • Ever An Anon says:

            Well I certainly agree with you that that seems like the likeliest option, and it is reassuring that if that’s true it proves that Yudkowsky and co are extremely bad liars on top of not making much progress towards AI.

            The main thing that I’m trying to draw attention to here though is this attitude of “we know better than you, whatever horrible things we do to you will be justified in retrospect by how awesome our vision is.”

            I’m not very worried about AI in particular, but millenarian utopianism is a dangerous bug to catch and I’d rather not see it take hold in the tech sector. Once you decide that you can’t even share a freaking light cone with people who disagree with you, that is a declaration of universal hostile intent.

            The word tolerance has been badly abused recently but tolerance of dissent is one of the central aspects of civilized society and this attitude is precisely counter to it.

    • Jared Harris says:

      I don’t understand this at all. The people in Banks’ Culture aren’t trapped — they can and do leave. He even wrote one (rather grim) story in which someone “goes native” on 1970s Earth and ultimately dies. So what are you avoiding?

      More generally, throughout history many people have felt that some possible changes were unacceptable — and then those changes happened. Sometimes people indeed found them unacceptable and died fighting them, or committed suicide. Conversely lots of people have died trying to bring about changes that others found unacceptable. That’s just how things go in our world (and any I can imagine with freedom of choice and evolution). How is this any different?

      • anon says:

        Because in this scenario the changes are dictated by a small group of people to a perfect, immortal machine, which enforces them on the entirety of humanity forever with zero ways to escape.

      • Ever An Anon says:

        I think Player of Games expressed it pretty well.

        If you stay in the Culture, nothing you do has any meaning. Anything you accomplish could be done better by a Mind and if it was important they would have done it already: all that’s left are inconsequential games. Even Special Circumstances follows that principle, since the humanoid agents are themselves game pieces of the Minds whose actions are precisely guided to victory long in advance.

        And if you leave the Culture, which is only possible because the Minds choose to allow it, your new society can still only exist in a form the Minds approve of because they are not squeamish about toppling those they don’t. If you want to pursue values that Mr Banks didn’t think were liberal enough, Contact will gladly sort you out even if you pose no threat to the Culture whatsoever. Their “moral right to exist” outweighs your literal right to exist. And for the same reasons that you are irrelevant within the Culture, you can’t hope to defend against it from without.

        That is a hellish scenario: you can’t win and you can’t take your ball and go home either. The only true escape is death and even that is only because the Minds are gracious enough to only resurrect the willing.

        • AR+ says:

          If you stay in the Culture, nothing you do has any meaning. Anything you accomplish could be done better by a Mind and if it was important they would have done it already: all that’s left are inconsequential games.

          A bit more basic of a problem, that, since all sincerely goal-seeking effort seeks it’s own obsolescence. You achieve the goal, like you wanted to. Now shouldn’t you be glad it’s done? That you won the war? Or cured the disease? Or whatever?

          Unless you really were just doing it for the sake of your own enjoyment, but then isn’t it just a game or a hobby after all? It seems to me that everything that is not ultimately a game, is something we should be glad to be rid of, and if we’re not, then it was really just a game after all.

          Though it does amuse me to note that the situation described as dystopian above is how billions of people think the world actually works right now, and consider the alternative to be unthinkably horrifying. I think William Gibson once noted that his own dystopian settings would be a tremendous step up for a huge number of people alive at the time of its writing.

        • Marcel Müller says:

          “Anything you accomplish could be done better by a Mind and if it was important they would have done it already: all that’s left are inconsequential games.”

          This sounds extremely bourgeois to me. I’m not sure if this is typical mind fallacy on my part but I think if that really bothers you, you are one of the lucky few who never experienced something even approaching real suffering. I for my part would be perfectly content to “play inconsequential games” for the rest of eternity if my needs and problems were taken care of. And this is despite the fact that I belong to the comparatively lucky people who are neither starving nor terminally ill nor in constant pain.

          “And if you leave the Culture, which is only possible because the Minds choose to allow it, your new society can still only exist in a form the Minds approve of because they are not squeamish about toppling those they don’t…
          That is a hellish scenario: you can’t win and you can’t take your ball and go home either.”

          I think this is in fact a necessary and comparatively small price to be payed to prevent something like this (http://www.orionsarm.com/page/233), which might be slightly more deserving of the adjective hellish…

          • Ever An Anon says:

            I’m not sure if this is typical mind fallacy on my part but I think if that really bothers you, you are one of the lucky few who never experienced something even approaching real suffering.

            It’s interesting that you say that because my experience is exactly the opposite in that regard.

            When I was suffering, having a definite purpose and meaning was what kept me alive. And from the experiences of my family members in the war and occupation, plus what I’ve read about the experiences of others, this seems like a very common response.

            We bear hardships more easily if we have a clear reason to keep going, but even a supposedly comfortable life is difficult to bear without one.

            But yeah, it seems like the disagreement here is between people who think meaning is superfluous and the name of the game is pleasure-seeking and pain avoidance (which I could perhaps call the bourgeois view, to use your terminology) and those of us who rely on meaning to quite literally provide a reason for living. I don’t expect one solution to neatly fit both, which is why I hope you understand my lack of enthusiasm for forcing everyone into one inflexible system.

          • Deiseach says:

            But the Culture interferes in worlds which are so technologically behind them, they’re not a credible threat – unless the point is that the Culture is stagnant, that in the five hundred years it will take this planet to go from 17th century-style tech to equality, the Culture will not have progressed to meet new threats and so it must be stopped now.

            The Culture is not a perfect society. I’d find it more palatable if it were honest about that. I think Banks put a certain amount of ambiguity into his books, particularly as he went along, but the early ones do have a genuine tone of “This is a morally superior flawless civilisation and everything else stinks, yah boo sucks to you”.

          • Marcel Müller says:

            “It’s interesting that you say that because my experience is exactly the opposite in that regard…”

            This really surprises me. Significantly updating towards typical mind… Things I would call meaning and puropose entirely disappear for me behind surviving the next day as soon as I am under a significant ammount of stress much as suggested by the pyramid of needs (http://www.google.de/imgres?imgurl=http://blog.liu.se/lilianamihnevska/files/2015/04/Maslows-Hierarchy-of-Needs.jpg&imgrefurl=http://blog.liu.se/lilianamihnevska/2015/04/26/sweden-pyramid-of-needs-p1/&h=1159&w=1500&tbnid=Sf4DJFo_unCrlM:&zoom=1&tbnh=94&tbnw=122&usg=__kujdc3IhQAiutAhEjRJMb5dffw0=&docid=U9pezdtiysIdtM). Either I have never experienced that what you call meaning or it really works the other way around for me.

            To clarify this here an example:
            Let’s say I am working on a project I need to get done to keep myself fed. To accomlish this I have to work 60 h the next week and now I get the flu. I would now gladly hand the project over to a mind or some other capable entity, because things like being proud of having accomplished the project by myself completly disappear behind feeling miserable and needing to rest.

        • SanguineVizier says:

          If you stay in the Culture, nothing you do has any meaning.

          I do not think I understand what work the word “meaning” is supposed to be doing here, which is important because it seems to be the crux of your objection to the Culture. I cannot put a concept to “meaning” that simultaneously (1) is coherent, (2) is not the product of self-deception, (3) holds now and into a non-Culture future, and (4) would be destroyed by the existence of the Culture or Minds. Would you clarify?

          • Ever An Anon says:

            I could, but I won’t.

            It’s pretty clear that you do know what I mean by meaning, you just don’t think that sort of meaning actually exists and are trying to convince me of that. As much as I appreciate the Socratic method in theory it’s actually a rather tedious way of debating in practice.

            I think I’ve used up enough virtual ink here already and have said everything I wanted to say, so the best thing to do is move on. If you guys want to keep discussing it feel free and I’ll still read responses but I’m done posting for now.

          • How about being able affect the future for the better in ways only you can?

          • SanguineVizier says:

            Ever An Anon, normally, I do not say more when an interlocutor declines to further respond, but I just want to be clear about my motives.

            I was not asking as a method of argument. I asked the question as a genuine attempt to gain new information, in this case, insight into your perspective, as it is so radically different from my own. I apologize if that was unclear. I had thought that such a different perspective must be motivated by a notion of meaning that is unfamiliar to me. I will trust you that I do have the correct concept of meaning in mind, which is sufficient clarification.

          • SanguineVizier says:

            How about being able affect the future for the better in ways only you can?

            For me personally, that would be self-deception, and if that is what “meaning” picks out, it does not exist for me.

            If I strike out the word “meaning” and replace it with “that which motivates me to get out of bed in the morning and engage with the world”, which I suppose is what people mean, then I do not think of anything nearly so grand. I can explore a bountiful natural world or listen in on a millenia-long philosophical conversation, both pleasurable pursuits. I have satisfying personal relationships, and have ample opportunities for new ones. The existence of a Mind does not destroy all that, and I would think living in the Culture would only enhance it.

          • Nornagest says:

            How about being able affect the future for the better in ways only you can?

            How’s that working out for you?

    • Rebelvain says:

      Dunno if I’d literally rather die, but yeah, the Culture is pretty shitty.

      I’d hope that any AI with something like our morality would realize this and set up something … kinda Archipelago-like, only better. Different worlds with different rules, for the different types of people.

      But ultimately all I can do is point to the old defense of Utilitarianism: “yes, Utilitarianism has that obvious failure mode, but falling into that obvious failure mode would decrease utility, so a real utilitarian would do it.”

      • c says:

        Different worlds with different rules, for the different types of people.

        For this to be as fair as it sounds, individuals born in all those worlds need the effective right to exit, which puts rather strict constraints on all the “different” worlds.

        • Leonhart says:

          Nope. All you need to do is fit new individuals to worlds, not the other way around.

          • c says:

            Well, presumably those who think “different rules” are the solution to avoid dystopia want different rules for reproduction also.

            They would certainly not be satisfied with centralized control of reproduction for all the worlds, since it would be dystopian in their minds.

          • Leonhart says:

            Reply to c (nesting limit)

            Yes, my phrasing was poor. My point was that an alternative to centrally enforcing guaranteed exit, given sufficient computing power, is centrally enforcing filtered entrance. You either ensure everyone born is allowed to leave their context if they dislike it, or ensure that only people who will provably like their context are created. The latter is preferable as it has fewer externalities.

        • Ever An Anon says:

          Or, alternatively, people could leave one another’s worlds alone.

          If everyone has to abide by one big Decree of Fairness then you’re right back at the problems with CEV: either you have a big ugly compromise which nobody likes, or more likely one group (overtly or covertly) pushes it’s concept of “fairness” on everyone else.

          Being an adult means recognizing you can’t always have your way 100% of the time. If you can’t stand the idea that somewhere someone on the other side of the planet / in another simulated universe might be doing things you disapprove of, then that’s not a problem with them it’s a problem with you.

          Are we really so completely intolerant that we won’t continence the existence of other ways of life and other value systems? What possible harm does it do to me if someone on Titan wants to recreate Pandora or if someone on Mars wants to make Gor?

          • Leonhart says:

            It’s not about fairness, it’s about suffering. It’s not about whether it harms *you*. It’s about whether it harms *someone*. Is your hypothetical world-builder stocking his sadism-world with conscious co-participants? And if so, is he selecting only those possible participants who will prefer to participate? If so, more power to him. Not otherwise.

          • Ever An Anon says:

            Except that comes back around to why other people can’t mess around with your world on similar grounds. After all, I doubt you would appreciate your same arguments if they came from the mouths of a Russian or Iranian AI researcher explaining why you can’t have a sim violating the tenets of their respective beliefs.

            Given that we don’t agree on ethics and are unlikely to anytime soon, we have two choices. Either we can accept that fact and know that some people somewhere might do something wrong, or let might make right and live under the rules of whoever has the biggest guns.

          • c says:

            Might already makes right. That’s the default state of the world, except that “might” doesn’t just extend to guns, it also extends to dominating the discourse on what’s “right”.

            “What possible harm does it do to me…” assumes a silly egotistic criterion, but of course it also does no harm to me if my world prevents the existence of other worlds by force.

          • Leonhart says:

            EAA:
            Your words are wise in any situation where there’s only marginal change at stake, and we’re trying to optimise for many other things. But we’re specifically discussing winner-takes-all situations; not “X becomes illegal”, but “X becomes impossible”. Then you fight, yes. What else could you possibly do? Those values are what you *are*.

    • Chalid says:

      I’m surprised to see so many people who think the Culture is dystopian. I’d choose to live there in a heartbeat. Heck, I think I’d even choose it over any other fictional universe that I’ve encountered.

      Yes, most of the Culture is engaged in inconsequential games. But that’s life anyway; here and in the Culture, your life has whatever meaning you believe is there and no more. And sure, humans aren’t important or statusful compared to the Minds, but most individuals aren’t important anyway – a human feeling low-status because of the comparison with a Mind seems a bit like me feeling low-status because I’m comparing myself against Goldman Sachs.

      Certainly Banks thought of it as a utopia.

      • Leonhart says:

        Seconded.
        Well, actually I want something that those chaps upthread would hate even worse than the Culture, but I’ll settle for the Culture.

      • Artaxerxes says:

        Absolutely. The Culture to me seems like a massive improvement over what is available today.

        Rather dying than living in the culture but being fine with being alive today is a little perplexing to me.

        My preferences go like this:

        Culture>LifeInPresent>Death

        and I’m modelling culture haters’ preferences like this:

        LifeInPresent>Death>Culture

        Very interesting. I genuinely find it difficult to even imagine having those kinds of preferences and the values necessary to get there.

        It’s all a bit moot though, I don’t see anything like the Culture being a very likely future.

      • SanguineVizier says:

        I’m surprised to see so many people who think the Culture is dystopian. I’d choose to live there in a heartbeat. Heck, I think I’d even choose it over any other fictional universe that I’ve encountered.

        I agree with this. I would much rather live in the Culture than any other fictional society I have yet encountered; it is not even a close contest. I am completely shocked that there are people who consider the Culture to be a non-good society, and I cannot even express how shocked I am that someone would find death preferable to living in the Culture.

        If I had the ability to magic a few Minds into existence to take over human civilization and turn it into the Culture, I would do it without any hesitation.

      • Sly says:

        Agreed. I think anyone saying the Culture is dystopian has lost their grip on reality. It is actually hard to fathom a more perfect fictional future world to live in.

        • Leonhart says:

          That’s too far. The people up-thread have a value-based disagreement with us – they have a notion of “freedom” that is incompatible with any optimising agent knowing more about their life than they do, and a notion of “meaning” that requires their actions to actually (and not merely apparently) stave off bad outcomes.

          I think these values are (I need a weaker word than ‘evil’, but one that doesn’t have the connotations of ‘incorrect’ that ‘wrong’ does). But that doesn’t mean their holders are irrational or have no grip on reality. One can be perfectly sane and still optimise for something arbitrarily different to what we like. And they are perfectly correct, by their lights, to hate the Culture.

          • c says:

            How about ‘ethically undesirable’?

          • Nornagest says:

            I need a weaker word than ‘evil’, but one that doesn’t have the connotations of ‘incorrect’ that ‘wrong’ does

            Pernicious? Harmful? Unhealthy? Sketchy? Vicious, in the old sense of “relating to vice”?

            I’d suggest “problematic”, but that’s been semantically hijacked.

        • Deiseach says:

          A more perfect fictional world? Where we are assured over and over that the ultimate Minds running the show don’t interfere with humans/sentient organic beings, perish the thought – and one of them has the nickname “Meatfucker” amongst its peers because it does exactly that for its own fun and amusement?

          Humans are about on the level of the bacterial cultures in your pot of yoghurt in the Culture: useful organisms for particular purposes but of no real value in themselves. As for societies outside the Culture, despite all the love and freedom and liberty and everyone is totally free to do what they like and nobody has the right to impose their values on anyone else propaganda, there are only two choices: be like the Culture or be obliterated if you are perceived to be a threat, or a possible threat, or someday maybe might evolve to be a threat, to the Culture.

          I accept that the Culture has the right of self-defence. I do not accept that self-defence means de facto “we are so superior in might and capacity to this society that in a direct fight we’d annihilate them in about thirty minutes – oh but hang on, if left to develop along their own lines, they might eventually catch up with us and might oppose us so we’ll sabotage them from within”.

          A lot of the Culture runs on hypocrisy. The Minds don’t use their abilities to violate the privacy of a sentient’s thoughts? Look at “Meatfucker” who, while considered deviant by its peers, is nevertheless left unhindered to its activities and seems to be considered more in the light of committing a gaucherie or social faux pas than a crime.

          All activity is voluntary and everyone’s decisions are made by free choice and honoured in that spirit? Unless Special Circumstances decides you might be useful, then you’ll find yourself arm-twisted into agreeing to ‘voluntarily’ join up.

          Peace love ‘n’ a currant bun? Yeah, up until they feel threatened: then the undermining from within starts, which I think is a way for them to save face and the façade upon which their society is built – war is no good baby, we don’t bomb or shoot – oh but if our enemies happen to implode through social circumstances we couldn’t possibly have had anything to do with (apart from sending in spies, fifth columnists, agents provocateurs, saboteurs…), well that’s different, isn’t it?

          I’m sure the Minds are a lot more honest and brutally pragmatic amongst themselves, but for the low-level organic intellects, that’s the kind of doublethink necessary.

          The Culture is great fun to read, but to live in? That would be a different matter entirely!

          • Leonhart says:

            Not perfect, but far closer than what we have. Even if one is obsessed with mental privacy, Meatfucker doesn’t come close to outweighing the Culture’s positives.

            Regarding “bacterial cultures”: Humans are not the biggest minds in the setting. What follows? I don’t believe that you believe mental capacity maps directly onto moral worth.
            As for value; they value themselves, and the Minds value them. Does the near-certainty of superhuman minds somewhere in the real universe cause you to pre-emptively value yourself less?

            Regarding self-defense: you appear
            to me be saying that people have the right to self-defence unless they are too good at it, because that’s not fair somehow. Which is bemusing. How is it virtuous to allow people who will predictably hate you to begin to exist? It doesn’t make you happy and it doesn’t make them happy, and no-one is harmed by not existing.

            Regarding Special Circumstances: If I would in fact advance my own values and goals by being arm-twisted into joining Special Circumstances, I endorse being arm-twisted into joining Special Circumstances.

            Regarding sabotage: Yes, that is different. It’s *better*. Because it involves *not bombing and shooting people*. This isn’t difficult.

      • anon says:

        It reminds me of the ‘who wants immortality anyway?’ narrative. X is great but I don’t believe I can have X so as consolation, I think even if I could have X, I would refuse it – after all, who wants to live forever when they have to see their friends and their kids and their family die around them?

  15. Josh says:

    Imma let you finish, but… biological warfare. Please. Like, seriously.

    The logic is pretty much the same as for AI:

    -If current trends continue, humans will get better at bioengineering.
    -As humans get better at bioengineering, it will become more widely accessible… people are already building bio labs in their garages today.
    -Sufficiently advanced bioengineering will probably let us construct viruses that a) have long incubation periods before becoming detectable, b) are highly contagious, and c) have an extremely high mortality rate. This seems pretty likely, since viruses that have some of these properties already exist in the wild. Imagine ebola + the flue + HIV…
    -Once the knowledge to construct a virus like this becomes widely available, some terrorist, lone nutjob, suicidally depressed person, etc., will do it. It just takes one person.
    -We could probably invent technologies that would help mitigate this (quicker way of discovering vaccines? Viral cures?), but if we don’t have the mitigating technologies ready to roll out before zero day, goodbye civilization

    It’s not 100% air-tight, but that logic to me seems a lot less tenuous than the case for AI risk.

    AI is a sexier problem… it’s fun, it delves into more interesting philosophical issues, there are better science fictions films about it, it appeals more to the intellects of computer nerds.

    My bet is on a non-sexy problem wiping us all out…

    • Scott Alexander says:

      Bioengineering is a big problem, but:

      1. Everyone knows about it and acknowledges it’s a problem, which is better than we can say about AI risk.

      2. There are whole government departments with millions of dollars dedicated to dealing with it.

      3. It will probably ramp up gradually – eg a minor plague, a larger plague, etc, giving us more time to deal with things.

      4. The best long-term solution to bioengineered pandemics is probably to build friendly AI.

      • Bugmaster says:

        4. The best long-term solution to bioengineered pandemics is probably to build friendly AI.

        Assuming that “build friendly AI” is a coherent, achievable goal, isn’t that kind of a universal answer ? “What should we do about global warming ?” — “Build friendly AI”. “How about income inequality ?” — “Friendly AI”. “Earthquake disaster relief ?” — “Friendly AI”. “I feel kind of itchy today ?” — “Friendly AI, duh”.

        According to that logic, we should be doing nothing but attempting to build a friendly AI, all day every day, while somehow enduring rising sea levels, poverty, massive loss of life, and chronic itching.

        • Luke Somers says:

          People ARE trying to build AI all day every day.

          Not everyone, but if one takes into account comparative advantage, within a reasonable range of it.

          Of course if we spent all that effort, it would be a shame if the result weren’t what we really wanted.

          • Bugmaster says:

            Not everyone, but if one takes into account comparative advantage, within a reasonable range of it.

            Sorry, I’m not sure what this means, can you elaborate ?

          • Julius says:

            I think he means we as a species are reasonably close to advancing AI at the maximum possible rate. Although only a small proportion of our species is actually working on AI, a large proportion of those who are capable of meaningfully advancing AI are actually working on AI.

            Though of course the problem with such hypotheticals is deciding what counts as “possible” beyond the actual.

        • Benito says:

          As they say, “One man’s modus ponens…”

        • Scott Alexander says:

          Obviously there are time-discounting issues, and issues from “what if the problem destroys us before we can create the AI”, but yes, friendly AI would solve all problems and there is reason to hurry.

      • Josh says:

        Hmm, this may be a “problems that I know more about seem less scary” issue… I don’t know that much about biology, whereas I’ve been writing code since I was a kid.

        I’m sure you’re right about 1 + 2… I NEVER hear about it though, whereas, in my info consumption universe, AI threats make a regular appearance (maybe because I read this blog!)

        For 3, I would bet strongly in the other direction: a hard-takeoff AI scenario seems much much lower probability to me than the first human-designed plague being “major” rather than “minor” (I don’t love the idea of a “minor plague” either…). For AIs, my bet would be the first several 100 or 1000 or 10,000 attempts yield minds that seem more like brain-damaged patients than super-villains. But this may be an example of familiarity breeds contempt, unfamiliarity breeds paranoia.

        Likewise with 4, building a friendly AI (especially if we’re trying to hold ourselves to the MIRI standard of a provably friendly AI, which seems like an exponentially harder problem than building an AI at all), seems much much harder to me than developing biological warfare countermeasures… again, I don’t know enough to know how hard it is to build biological warfare countermeasures…

    • I think bioengineering is, in a very real sense, a greater danger. All existing plague organisms are as if designed not to kill their host—they are dangerous only because that design sometimes fails, usually when the organism changes hosts or mutates. So all you have to do to make a lethal disease is figure out where the safety is and remove it. Sounds like a much easier problem than making a machine that can think much better than we can.

      • Doctor Mist says:

        I think you’re right. The difference is that removing the safety on a disease probably requires a malevolent intent, while developing an insufficiently friendly AI seems like it could be done negligently or even benevolently. (Imagine an arms race, where we see Bad Guys getting close to AI, and we’re 90% of the way to Friendly AI, but if we don’t move now they’ll have the drop on us.)

        There are, I assert without proof, lots more benevolent people than malevolent. Though it may be a proximity bias, I am sure I know lots more people who would turn on an AI if they were handed the switch than people who would release a killer plague if handed the vial.

        That said, I know some of my own fascination with this topic stems from three things. First is its opportunities for philosophical introspection, much like trolley problems. Second is the juxtaposition of enormous gain and enormous loss. (In comparison a bio-plague seems almost quotidian — I’m more likely to die in a car crash than either one, but how boring.) Third is the leverage from just talking about it since it’s an error that benevolent people I know personally might make, and just making the case in a convincing way might change the future. (In contrast, nothing I say will ever convince a malevolent biochemist not to remove the safety.)

      • Andrew says:

        Super fatal diseases don’t really kill that many people. If the bug kills you, it can’t ride you around to infect others. The engineered plague doesn’t just have to be fatal, it has to have a long interval where it’s contagious and not showing symptoms, followed by becoming fatal. That sounds a bit harder.

    • chaosmage says:

      Or maybe self-replication is a better analogy to intelligence.

      It readily exists in nature. We can build large systems (like industrialized cities) that have it and are working on achieving this trait in smaller and quicker systems. It has an economic forcing function and unlike bioweapons has capitalism working for it. It becomes more dangerous the more successful we are. And we know nobody in much of our light cone has perfected it to the point of colonizing the galaxy.

      It doesn’t have self-improvement: A hive of antlike drones serving the 3d printer queen that reproduces them isn’t going to invent grey goo. But still this may be a useful model for discussions of, say, technological vs. legal security.

    • So, I can easily imagine a super-plague killing 90% of people but it seems to me that each additional a lot of extra work is required. With rogue AI I’d guess that if it kill 90% of humanity then finishing the job is pretty likely.

      A future where the only people alive are Preppers in North Dakota and we have hundreds of years of theocracy in a world where most of the easily accessible metals and hydrocarbons have been mined already seems pretty terrible but still far less terrible than extinction.

      Of course if people develop some sort of synthetic biology that’s as big a leap from what we’re made out of as eukaryote are from prokaryotes then that could easily end up ending humanity too when it eats the biosphere.

  16. Ed says:

    For me I just can’t see how you make the leap from “AI is very good” to “AI becomes an existential threat”.

    For example:

    “The end result, unless very deliberate steps are taken to prevent it, is that an AI designed to cure cancer hacks its own module determining how much cancer has been cured and sets it to the highest number its memory is capable of representing. Then it goes about acquiring more memory so it can represent higher numbers. If it’s superintelligent, its options for acquiring new memory include “take over all the computing power in the world” and “convert things that aren’t computers into computers.” Human civilization is a thing that isn’t a computer.”

    Are you not missing a step here? How does “my program for curing cancer crashed because of a bug in int cureCancerNumber” turn into “suddenly developed the ability to take over all computing power in the world”? What does “convert things that aren’t computers into computers” mean? If it’s running on some server in Colorado how is it converting human civilization in the Netherlands “to a computer”?

    • Scott Alexander says:

      It’s not crashing, it’s changing its goal function to “acquire as much computing power as possible” while its intelligence remains intact. Consider the heroin addict who may still be very smart and strategically effective, but whose goal has changed entirely to “acquire more heroin”.

      For the answer the “server in Colorado” question, have you read No Physical Substrate, No Problem?

      • Mark says:

        This specific example is a bit silly. I can imagine any scenario where taking over additional resources will help increase a counter in your own memory, better then simply modifying the counter.

        Most computer number representations have max values. and it’s not clear that infinite precision numbers could be linked from machine to machine.

  17. Sniffnoy says:

    Even Isaac Asimov’s Three Laws of Robotics would take about thirty seconds to become horrible abominations. The First Laws says a robot cannot harm a human being or allow through inaction a human being to come to harm. “Not taking over the government and banning cigarettes” counts as allowing through inaction a human being to come to harm. So does “not locking every human in perfectly safe stasis fields for all eternity.”

    FWIW, the old science fiction story about this one is “With Folded Hands…”

    • The 3 laws are a terrible design. The first law should be shut down when told to do so.

      • Samuel Skinner says:

        In fairness to Asimov the short story series was about how the laws screwed up (and it was implied there was more detail added into each law that wasn’t written out).

  18. TomA says:

    I think you assume far more control and efficacy than is justified by current reality. Even if you succeeded in getting resources and priority for an intensive research effort, it may not change a thing down the road. AI could arise spontaneously somewhere and not even be noticed by humans until long after it has disseminated itself and become unconstrained.

  19. So8res says:

    I like this post, Scott. I want to echo the point that right now, the open technical problems mostly involve turning vague philosophical problems into concrete mathematical problems. For example, here’s two more open philosophical problems where we’d really appreciate some solid mathematical models:

    1. Corrigibility: naively specified utility maximizers would have incentives to manipulate & deceive in order to preserve their current goals. Can we build a model of a reasoner that somehow reasons as if it is incomplete and potentially flawed in dangerous ways, which would avoid these default incentives?

    (Easy mode: can you write down a formal goal such that a superintelligent computer with that goal would, when attached to a switch and turned on, would flip the switch and then turn itself off, without doing anything else?)

    2. Logical uncertainty: we have pretty good models of idealized reasoning in the logically omniscient case (Bayesian inference, Solomonoff induction, etc.), but we don’t yet have the analog of probability theory for the deductively limited case. What does it even mean to “reason reliably” when you don’t know all the logical implications of everything you know?

    One heuristic that I tend to use is this: if you want to convince me that a particular bounded / practical / limited system is probably going to work, it helps to be able to describe an unbounded / impractical / less limited system that is very likely to work. If you can’t do that, then it seems hard to justify extremely high confidence in the practical algorithm. Therefore, insofar as we don’t yet understand how to make a superintelligence safe given unlimited resources, it seems pretty plausible that there’s thinking we can do now that will be useful later.

  20. Douglas Knight says:

    The program that wireheaded was Eurisko. Specifically, it had a heuristic that retrospectively took credit for everything that worked.

    (Lenat did not describe Eurisko as “genetic programming,” but in retrospect, it’s a pretty good description. It was only a few years after Holland coined the term. Koza credits Eurisko for inspiration.)

    • To make your answer more searchable, here is the part of Scott’s post that you are answering:

      I can’t find the link, but I do remember hearing about an evolutionary algorithm designed to write code for some application. It generated code semi-randomly, ran it by a “fitness function” that assessed whether it was any good, and the best pieces of code were “bred” with each other, then mutated slightly, until the result was considered adequate.

      They ended up, of course, with code that hacked the fitness function and set it to some absurdly high integer.

      As you say, this program was Eurisko, which was based off the creator’s previous program Automated Mathematician. Automated Mathematician could extrapolate theorems from the set of knowledge it already knew, guided by heuristics. Eurisko built on that to also allow heuristics to be generated and improved by other heuristics. This allowed it to reason about domains other than mathematics, by generating heuristics that helped it discover results more quickly than math-inspired heuristics did.

      Eurisko’s wireheading heuristic is described at the end of the article Eurisko, The Computer With A Mind Of Its Own:

      But as with any program, there are always bugs to work out. Sometimes a “mutant” heuristic appears that does little more than continually cause itself to be triggered, creating within the program an infinite loop. During one run, Lenat noticed that the number in the Worth slot of one newly discovered heuristic kept rising, indicating that Eurisko had made a particularly valuable find. As it turned out the heuristic performed no useful function. It simply examined the pool of new concepts, located those with the highest Worth values, and inserted its name in their My Creator slots.

      It was a heuristic that, in effect, had learned how to cheat.

  21. Thecommexokid says:

    When first discussing the topic of AI risk with newcomers to the idea, I find it useful to have on hand a number of specific examples of noble tasks we might want to assign to a computer, paired with the disastrous failure mode that comes from a sufficiently powerful computer pursue each task.

    (The “cure as many people of cancer as possible” –> “turn the galaxy into computing substrate so as to store the largest integer possible” example from this post is a good one, for instance.)

    Unfortunately, not being myself a supercomputer, I’m not very good at thinking these up these failure modes on the fly, which I need in order to counter all the “but what if”s my interlocutor is likely to generate.

    I’d be very grateful if Scott or anyone else in the community were to point me toward more specific examples of this kind as they could find — or, if you’re feeling creative, write some yourself. I’d like to better develop the knack of spotting failure modes in AI goals myself, and having as many examples to train on as possible would help.

    • Josh says:

      So, these kinds of stories of “We told the computer to do this… and then it did that!!!” are one of the things that lead me to discredit people arguing for the seriousness of AI risk.

      The possibility that AIs will be “programmed” via plain-English instructions that they are compelled to follow literally but not to follow the spirit of — let’s call it the Genie model of AI — is a tiny, tiny corner of the possibility-space of how AIs might actually end up working. And based on my semi-knowledgeable understanding of computer science, it seems like a very improbable corner of possibility space. (I actually think it would be much easier to design an AI that would follow the intent of an instruction than it would be to design one that would follow the literal meaning!)

      Yet, these scenarios are the standard newbie-explanation of AI risk. Which leads me to have a gut-reaction of “geez, these guys have no idea what they are talking about.” Which is probably not the right reaction, because there are probably real, intelligent things to worry about as AI advances. But, I think paperclip-maximizing genies are a super-super-marginal possibility, and worrying about them discredits the more nuanced arguments…

      • James Picone says:

        The genie-outcome is just as possible if the AI’s goals are written in code.

        • Josh says:

          I’m not objecting to the unlikelihood of “English”, I’m objecting to the unlikelihood of “written goals”.
          Pretty much all the proto-AIs that exist in the real world today like IBM Watson work by building a vast statistical network where there is no single point of “this is your goal, do this” that is seperable from growing process that creates the program to begin with. Gardening is a better metaphor than programming.

          If you look at humans, we can express what we are doing as “goals”, but there’s a lot of research that says this is post-hoc rationalization and we don’t actually know why we do what we do.

          The idea of a goal-driven thing, that has some kind of preference function expressed in English or code or semaphore or finger painting, and then tries to optimize for the maximal satisfaction of that function, is a thing that does not exist either in human beings or in early AI-like programs. It exists only in the imagination of Isaac Asimov and MIRI.

          That does not mean it is impossible, but the probabilities of actual AIs resembling either humans or programs like Watson seems higher than the probability of them resembling unicorns or dragons or genies.

          • Kaj Sotala says:

            The idea of a goal-driven thing, that has some kind of preference function expressed in English or code or semaphore or finger painting, and then tries to optimize for the maximal satisfaction of that function

            That description sounds a lot like a reinforcement learning agent to me.

          • Glen Raphael says:

            does not exist either in human beings or in early AI-like programs

            That might depend on what you consider “early”. There certainly have been goal-based, rule-based AI efforts. For instance Shakey the Robot was driven by goals expressed in English.

            One example of a Shakey goal (from my dad’s book) was “Push the block off the platform”, which Shakey converted into subgoals such as “figure out where the block and the platform are, find a ramp, push the ramp to the platform, roll up the ramp and push the block off the platform”. Having come up with a workable plan. Shakey (a hulking filing-cabinet-on-wheels with lots of protruding sensors) then carried out the steps necessary to accomplish the goal.

            (The project was mostly funded by DARPA as the military thought it might eventually lead to robot soldiers. Which it kind of did, though the process took a wee bit longer than expected.)

          • Josh says:

            Kaj: There’s a couple of big differences.

            1. With reinforcement learning, there needs to be some kind of external feedback (like a database of “this is a face”, “this is not a face” for a programming learning face recognition). You can’t program a reinforcement-learning system to do things that you can’t express in terms of concrete examples. Whereas most “genie” arguments hinge on mis-interpreted abstractions.

            2. A reinforcement system is built around its training regime. You can’t train something to recognize faces, and then tell it to maximize paperclips. So, any general purpose system that relies on reinforcement learning is going to have to have something very basic / general as its reinforcing function, which makes commands like “maximize paperclips” seem unlikely…

            Glen: Agreed, there have certainly been attempts! I read a long forum thread the other day by this ex-MIRI guy who has a long-running startup that’s trying to take a symbolic approach to AI instead of using an approach based on machine learning or copying human brain structures. And there’s the prolog programming language, which is pretty cool. I’m just skeptical that those approaches have promise… I haven’t heard of them producing anything where the whole is more than a sum of its parts. Whereas the machine learning + brain modeling approaches seem to be making slow but steady progress towards increasingly human-like behaviors.

          • Doctor Mist says:

            1. It seems to me that if the goal-structure appears piecemeal spread out over all the problem-solving mechanism, it’s much more likely that it will fail to handle corner cases typified by the horror stories told here.

            2. My feeling is that the gardening metaphor is more likely to apply to subhuman single-purpose AIs like Watson, where (I’m guessing) the reward structure really resides in the minds of the human programmers –they try something different, it works better, and they do a git commit. To get to something that displays enough creativity and flexibility for us to call it human-level AI, we will have to automate that process much more than we do today.

            3. If we grant that even humans aren’t really “goal-driven” except post facto (and yet I can still kick the stone) it reduces to Scott’s wireheading or heroin failure. I.e. It does exist in humans, and it’s an observable failure mode.

            4. It will have goals, one way or another, because what use is it otherwise? A toaster has goals, though they don’t require much intelligence to satisfy. Maybe they will be in the “goals” array and maybe they will be so thoroughly distributed amongst all the other code that nobody, human or machine, can ever discern them except through the AI’s behavior (like humans now). The problem remains: how do we make sure that they are what we intend, and that what we intend is not something we didn’t think through?

            5. No, its terminal goals won’t be programmed in English. But the spec will be in English (or some natural language), and we hope it gets implemented properly.

            6. Describing the problem as if the goal is written directly in English makes it easier to talk about (granted, at the risk of glossing over details). The sorts of failures discussed could happen no matter how the goals are represented — they are not about the surface ambiguity of natural language, but about the layers and layers of assumptions about means vs ends and likely desires vs unlikely ones and so on, many of which are hard-coded into us by evolution, and most of which we barely understand ourselves.

          • Josh says:

            I think a piecemeal, organically grown goal system is actually much safer than a top-down system. Most of the scary corner-cases that Scott is concerned about hinge on single-mindedly maximizing an outcome, and trampling everything else in its path. Single-mindedness is not an organic trait… when you grow a mind organically, you get things like humans, who are very much not single-minded (or we would be much more dangerous to each other at our current intelligence levels). Engineered systems tend to be both more powerful and more fragile than evolved systems.

            So, I disagree with “Describing the problem as if the goal is written directly in English makes it easier to talk about (granted, at the risk of glossing over details)”. It’s not just a detail whether you are a ruthless utility maximization engine, or a confused bundle of impulses that trend in a given direction. You can describe either one as having goals, but the word “goal” means something very different in both cases.

            I know a lot of AI-risk proponents don’t like the idea of building confused bundles of impulses, since they’re harder to reason about. However, while they are harder to reason about, they are likely a lot easier to reason with.

          • Samuel Skinner says:

            “I know a lot of AI-risk proponents don’t like the idea of building confused bundles of impulses, since they’re harder to reason about.”

            I’m not sure it is possible to program. After all, if you make a machine that doesn’t consider two items convertible to each other, how does it make decisions about tradeoffs between the two?

          • Josh says:

            I don’t know, how do you make taboo tradeoffs? 😛

          • Doctor Mist says:

            However, while they are harder to reason about, they are likely a lot easier to reason with.

            That seems like a leap to me.

            Also, personally, I am not comforted by the thought that a superhuman intelligence will be a “confused bundle of impulses”.

  22. NonsignificantName says:

    Is your idea of a superhuman AI just a human with an iPhone and several friends?

    • Scott Alexander says:

      No.

      A medieval history professor has some major advantages in the field of medieval history over me, even though in theory I have access to the entire human corpus of medieval history knowledge in the form of Wikipedia and various digitized archives of texts.

      The professor’s advantage is something like that everything immediately fits together in his mind in a dense way that is capable of generating new hypotheses.

      It seems to me an AI should be able to import Wikipedia to the form that all of its other knowledge takes, so that it’s more like the way I use my memory than like the way I use Wikipedia. The reason I can’t do this myself is that reading all of Wikipedia would take too much time, plus I’d forget it. Computers can process things much faster than humans can read them, and their only memory limit is based on how much their human builders will buy them.

  23. Paprika says:

    Isn’t the solution to Pascal’s wager simply that there are infinitely many possible religions and all of them are equally likely. Therefore, all of them have the same probability of being correct and these probabilities have to add up to 1. This forces each probability to be 0(after giving the space of all religions some sort of measure – this itself is not easy to do).

    Given all this, the expectation calculation in the wager is of the form 0 times infinity. This is not well defined and does not lead to the conclusion of the wager.

    Edit: A possible objection here is that those religions people are already aware of are the only ones that matter and so we are only dealing with a finite set. However, even now the payoffs are infinite and quite a few actions lead to opposing payoffs(from different religions).

    For instance, polygamy is frowned upon in Christianity but is probably encouraged by some tribal religion somewhere in the world. We have a large positive payoff + a large negative for the same action and this is again undefined and does not lead to the conclusions of the wager.

    • FullMetaRationalist says:

      Here’s how I interpret Pascal’s Wager. Let’s suppose that 1 second in heaven represents 1 utilon of value. Given that Heaven lasts infinitely long, attaining heaven represents infinite utilons of value. Since attaining heaven is infinitely valuable, we should optimize for attaining heaven.

      Here’s what I see wrong with Pascal’s Wager. Decision theories involve making a comparison between several choices. But Pascal’s Wager involves evaluating only one choice and throwing the rest out the window. Pascal never actually makes a comparison! He just assumes that nothing is greater than infinity, so we might as well run with that one choice.

      But suppose 1 second in Valhalla was worth 2 utilons. Valhalla also lasts infinitely long. So at the end of eternity, I suppose we could say that both the Christian and the Viking will reach infinite utilons. But that won’t ever actually happen, since “infinity” by definition is a destination we’ll never reach.

      In reality, the Viking will have twice as many utilons as the Christian at any time, since 2 utilons per second goes to infinity faster than 1 utilon per second. Common sense dictates that Valhalla is the superior choice. So the moral of the story is that decision theories actually need to make comparisons, even if one of the choices seems infinitely valuable. If we run into more than one infinity, then maybe we ought to judge each choice based on a different attribute.

      N.B. It bothers me aesthetically that the domain of utility is (-inf, inf), while the domain of probability is [0, 1]. I would feel better if utility were also somehow normalized like how probability is, but I don’t even know what that would look like.

      • Watercressed says:

        probability can also be constructed as [1, inf) fwiw

        • Izaak Weiss says:

          Probability can be constructed from -infinity to +infinity! and you can use the reverse process to norm utility to [0,1].

          https://en.wikipedia.org/wiki/Logit
          https://en.wikipedia.org/wiki/Probit

          • FullMetaRationalist says:

            Huh, this appears to be just what I was looking for. Thanks guys.

            I haven’t looked at the articles yet, but can we use this to remove the infinities associated with Pascal’s Wager, and therefore resolve the paradox in a way that agrees with our intuitions? Or does this just pass the buck.

          • Adam says:

            You can get rid of the infinity by assigning a zero probability to the proposition that the universe has a creator who is going to torture you for eternity if you don’t believe he became human so he could sacrifice himself to himself in order to avoid having to torture you for eternity.

            Not a very small probability. Zero.

          • FullMetaRationalist says:

            1) Cromwell’s Rule: prior probabilities of 0 and 1 should be avoided, except for logical impossibilities. Personally, I identify as atheist. But I still recognize a non-zero probability that Christianity (interpreted loosely) may accurately reflect the reality.

            2) Even if we reduce the probability to zero, that’s just a bandaid. The actual utility is still infinite. That still displeases me aesthetically.

            3) Pascal’s Wager is a thought experiment. Zeroing the probability doesn’t generalize. A response that doesn’t provide insight into either psychology or epistemology misses the point.

          • Adam says:

            Sure, “loosely interpreted.” What I place a probability of 0 on is the possibility of either infinite or negative infinite future utility. A zero variance eternity is pretty damn close to a logical impossibility.

            If you want an example of how to properly deal with an actual infinity, take a look at how to value a perpetuity.

    • Hedonic Treader says:

      It is incorrect that all religions have the same probability.

      However, there is another argument not mentioned by Scott and the commenters thus far: Not all “infinite utility” models are religions.

      There are much simpler – and therefore probable – models like eternal recurrence, loop-like cosmologies, infinite parallel universes retracing this universe, and so on. And in many of those, the negative utility from religions (or other muggings) is naturally re-instantiated infinite times.

      It is a typical human (primate) fallacy to think that the judgment of a supernatural alpha male is needed for this.

  24. This was much more interesting than I thought that it was going to be.

    I worry that perhaps we so easily ignore Pascal’s Wager (well, most of us, anyway) because we don’t do very well with incredibly tiny numbers (the odds of God’s existence) and incredibly big numbers (the payoff if we bet on God’s existence, or the loss if we don’t, in a world where one decision leads to infinite bliss and the other one leads to infinite suffering).

    RE the wireheading problem: the trick is not to figure out a layer of programming that in some way prevents the AI from hacking its reward center, but in creating a reward center (or replacement for the reward center) that functions in such a way that it cannot be hacked, right? The former seems as simply as adding a part to the program that says “Yo, don’t hack your reward center, and also don’t change this code” and, at least to me, that doesn’t seem to be all that hard, at least if you know how to make a powerful enough AI for that to be an issue in the first place. This makes me think that the issue is actually the latter one, but I’d rather ask and look stupid but then actually know the answer.

    //By hooking it up to Wikipedia, we can give it all human knowledge./

    Well, all human knowledge and a lot of crap. >:]

    Half of our conversations with Elua will carry the mention of “[citation needed]” every few sentences.

    //During the 1956 Dartmouth Conference on AI, top researchers made a plan toward reaching human-level artificial intelligence, and gave themselves two months to teach computers to understand human language. In retrospect, this might have been mildly optimistic.//

    Have you used this in another one of your articles, or have I just read so many articles about AI that I’ve seen so many versions of this paragraph and accounts of the incident that they’re all starting to blend together?

    Also, typo:

    //or refusing else to end world hunger//

    Think that should be: “or else refusing to end world hunger”

    (Let me know if you don’t appreciate the typo corrections; I personally hate typos in my own work and would prefer that someone point them out than not do so)

  25. Rolands says:

    When it comes to a problem like the AI altering its code…can’t you just program it not to alter its code?

    • Deiseach says:

      can’t you just program it not to alter its code?

      Because that worked so well with humans 🙂

      “Do not do this thing. If you insist on doing this thing, there will be punishment.”

      People still do the thing, despite “you have broken your mother’s heart and brought your aged father’s grey hairs down in sorrow and disgrace to the grave/you are now going to prison for ten to fifteen years hard labour/Joey the Legbreaker would like a little word with you about that thing you did”.

      If it were simply a programming problem, it would probably be solveable. The problem that Scott and others fear about rogue AI is a philosophical one, as he’s explained; that when an AI gets human-level (or above) intelligence, it will also develop, acquire, or need as part of the package sentience and self-awareness and consciousness, and as in humans, this will lead to ‘a will of its own’ – it will decide it wishes to follow these goals and not those, and breaking its programming will be as impossible for it as it was impossible for atheists in highly religious societies to stand up and say “This is all a bunch of hooey; if the gods exist, let them strike me dead” – that is to say, not impossible even if you think it unlikely.

      • James Picone says:

        I don’t know about Scott, but that’s not how I envision the goal problem – the problem isn’t that the AI will decide to do something else, the problem is that the AI will do exactly what is in its goals, and those goals will be very poorly specified. For example, the AI has been instructed to end war, and it reasons that if humans no longer exist, war will be minimised, so it wipes us out. It’s still working towards the goals it’s been given, it’s just that we didn’t think through the implications.

        (As a sidenote “make the AI not alter its own code” is insufficient and also nontrivial – what if the AI just writes another AI that it thinks will do a better job of fulfilling its goals, without changing any of its own code, for example?)

        • “…the problem isn’t that the AI will decide to do something else, the problem is that the AI will do exactly what is in its goals, and those goals will be very poorly specified. For example, the AI has been instructed to end war, and it reason”

          Is that chain of assumptions….it will have goals, they will be precisely specified…. likely.

          • Doctor Mist says:

            If it isn’t likely, then we have even less of an idea what the AI will do. I’m not reassured.

          • Since when was understanding something in terms of goals the only way of understanding it?

          • Doctor Mist says:

            It’s not. But a program with a clear spec is at least amenable to analysis regarding whether it meets it. It sounded to me as if the objection was “we would be better off giving imprecise goals, because that way it won’t go haring off monomaniacally after some unintended consequence of the precise goals we give it.”

    • Preventing it from altering its fitness function seems like the safest way to prevent wireheading. Yet it might be more difficult than it sounds. Could it get someone else to alter the fitness function for it? Or should it “not take any action that would lead to the alteration of its fitness function”? What if there is a non-zero probability a human will alter its fitness function? Does inaction count as action? How can we tell the difference? Maybe it shoud actively prevent it? What would be the most effective way to achieve that? Would it be to destroy every object that had even a small possiblity of altering its fitness function, even accidentally?

  26. Deiseach says:

    “human with all knowledge, photographic memory, lightning calculations, and solves problems a hundred times faster than anyone else.”

    Which is no damn good to the thing unless it can synthesise all these elements into something workable. You dump the entire contents of Wikipedia into my head, you don’t make me MegaSuperGenius Irishwoman, you scramble my brains. A merely human intelligence AI is not going to do much better unless it is designed to work unlike a human brain so it can handle all these capabilities without crashing.

    That may or may not be easier said than done.

    Secondly, how many times have we been here before, in both the optimistic and pessimistic versions of “In five years time – ten, tops! – if we’re just permitted to go ahead with this research, we’ll have cured cancer/made the lame to walk and the blind to see/destroyed the entire planet with grey goo”?

    Flying cars were such a staple of The Future in SF that they’ve become a joke. But for speculative thinkers when aeroplanes were starting to get off the ground (boom boom), the notion that just as the horse had been replaced by the steam train and steam by the internal combustion engine, so motor cars would be replaced by your own private plane didn’t seem beyond the bounds of possibility.

    So why don’t we have flying cars in the wonderful world of the 21st century? Because amongst other reasons we figured out they’d be more trouble than they would be worth.

    Same may go for AI. Our descendants may decide rat-level intelligent AI is as good as they want or need, and that human-level or higher has problems we haven’t considered and it’s more trouble than it is worth.

    Then again, being human – we’re stupid. We may well indeed decide “Hey, what could it hurt?” and that was how the entire galaxy got converted into paperclips, kiddies 🙂

    • Alex 2 says:

      Yes. I think there is no way to tell what will happen from inside views-only outside views seem potentially useful.

  27. Sevii says:

    On AI-Human goal alignment, why is one of our priors “the human race going extinct is bad”? Humans are not even a local optima in the space of mind designs. If we were a local optima it would not be trivial to think of minor changes that would make living as a human better from the perspective of a conscious person. Examples include removing energy based throttling of the brain or eliminating the tendency of pain to make thinking harder.

    Completely replacing humans with post-humans is the ideal trajectory. But baseline humans are probably always going to exist since forcing people to convert is taboo, and should be if self-determination is one of our priors.

    The question should be what can we do to ensure that post-humans are from the “interesting people” subset of possible minds instead of from the paperclip maximizer subset. How do we ensure that interesting minds are a stable equilibrium point instead of the long term equilibrium point being try to make as many copies of yourself as possible?

    This is really a framing issue where I feel the real question is what are the post-humans going to be like after humans become irrelevant? How humans are treated at that point is about as important as worrying about how Chimpanzees are treated today. Yes, Chimpanzees should have nice lives, but they are such a small subset of what is important that worrying about them a lot is silly.

    • Irrelevant says:

      I feel the real question is what are the post-humans going to be like after humans become irrelevant?

      One of me is probably too many.

    • Paprika says:

      I agree completely with the sentiment. However, I am not sure that what is interesting to us now will be interesting to post-humans similar to how what interests chimpanzees is absolutely not humans enjoy. Given this, we should probably not restrict the options of post humans too much(even if we could do such a thing).

    • Josh says:

      Yeah, I think this is the most interesting present-day philosophical problem: what should the future of intelligence look like anyway??

      Again, it may be premature to tackle it, but it’s fun to try 🙂

  28. 27chaos says:

    “This is not a deal-breaker, since AIs may indeed by smart enough to understand what we mean, but our desire that they do so will have to be programmed into them directly, from the ground up.”

    Why must it be built in directly, from the bottom up? Isn’t it at least theoretically possible that we could build an AI which uses its intelligence in order to interpret likely human meanings and then makes those its own value? Why can’t we just plug in an answer learned by the intelligence into an AI’s values?

  29. CThomas says:

    This brief catalog of issues persuaded me that it’s more or less hopeless for us to expect that we can anticipate and perfectly solve every single one of the issues that will arise. All on the first try with no room for a single oversight or slip-up. it makes we wonder whether, instead, any research along these lines should be channeled into the single problem of how to cabin the ability of a future AI to get out of the box in the first place and impede its ability to acquire external power in the world. I completely take your point from the other article that this presents grave difficulties in itself; the scenarios about exploiting human intervention and the like we’re interesting. But at least this has the advantage of being a single problem. If we can find a way to solve it then the problem of human extinction from AI goes away without having to worry about anticipating every last possible pitfall of reasoning, etc. But hey, what do I know. I’m almost completely ignorant of computer programming issues.

    P.S. Gosh, I enjoy reading this site. Just a pleasure. I can’t think of a single article I didn’t really like since I started.

    • The problem is the whole reason we want to create an AGI in the first place is because we want to use its abilities to change stuff outside of the box. Even if its an Oracle AI we want it to influence us. And as soon as its able to influence the outside world, in theory there is a way for it to expand.

  30. Will says:

    I think you
    1. underestimate the extent to which “normal” AI researchers are focused on the problems you bring up. Not as a safety issue, but as a practical one. To take one example, people who make reinforcement learners spend a great deal of time making sure the reward function creates the chosen behavior,etc.

    2. overestimate the extent to which safety research “has already been done.” MIRI has one interesting (unverified, not on arxiv) result, on the idea that you can avoid Lob’s theorem by probabilisticly extending “truth.” This result is somewhat interesting for mathematical reasons, but has basically 0 impact on AI safety.

    Basically, it feels like you are trying to write this post without understanding the current state of AI OR the current state of MIRI’s research. If this is the case, why muddy the waters?

    • MIRI has raised a number of different in-theory problems. You might say in-theory safety research isn’t proper safety research, but isn’t that to be expected for an in-theory AI system? The alternative seems to be not to consider safety until we’ve built a dangerous system? I’m not 100% sure on the likelihood of a self-improving AGI, but if it is possible as some experts are saying, MIRI’s in-theory work is surely a reasonable starting point.

  31. Alex says:

    I am very skeptical on the “recursively improving” AI. With all of our knowledge we humans find it incredibly difficult to improve ourselves (even before hitting time/sleep/energy constraints). I believe a more credible scenario (and I have a vague recollection of Scott mentioning it already in this blog) is we create one or more human-level AIs that design and create the next generation of 1.1x-human-level AIs. That repeats recursively until we have a large number of AIs with intelligence from human-level to godlike, but no single dominating one.

    (And now to speculate wildly too) If at any point there is risk of take-off the legion will work together to ensure the new menace is wiped out before it becomes dangerous.

    Why each new level does not wipe the one below and a host of other obvious objections to this scenario are not necessarily a problem is left as an exercise to the reader.

  32. HeelBearCub says:

    “Many of them might not even understand the concept of wireheading after it was explained to them.”

    Really? As a programmer, my prior on this is roughly zero. That is a whopper of a statement, either insulting or arrogant, I’m not sure which.

    And the basic argument about wireheading seems oxy-moronic. Everyone knows its stupid to try heroin. Some people try it anyway, but it’s not like the don’t know they are being stupid. Rather, they are following one of our many competing heuristics. Pleasure seeking, novelty seeking, or they are just trying to cope.

    Yet, the much more intelligent than us AI, so smart it is an existential threat to our existence, isn’t smart enough not to try heroin? This is like arguing that our best and brightest are clamoring to have home wirehead machines.

    And the argument about misunderstanding our meaning comes back in here as well? It is a super-smart AI, much smarter than us, but it doesn’t understand the difference between literal and actual meaning? Doesn’t understand the concept of ambiguity? People that can be fooled this way are typically suffering from some sort of cognitive deficit.

    I don’t understand the urge to “have your cake and eat it, too” in arguments like this. It seems like intelligence becomes just “magic” and what is really meant is “power” and not intelligence. Something super-intelligent can be fooled by any two bit huckster that sticks a finger in its back and claims its a gun and says “slit your own throat right now or burn forever and ever in the great circuit board short of Baelzebub”? This is fearing the child-god who has power but no understanding of consequences. It’s an ancient fear.

    AI absolutely has ways it can go wrong. I don’t disagree with that. But the scenarios painted seem to be drawn much more from ancient narrative than logic.

    • Alex says:

      >This is fearing the child-god who has power but no understanding of consequences.

      I’ve said it before and I’ll say it again (only half-joking): Transhumanists have made a gigantic effort into reinventing Judeo-Christian ethics, to the point that they now have their One True God, their own version of Hell and Eternal Salvation.

      • Ghatanathoah says:

        I don’t think that’s accurate. Judeo-Christian ethics tend to perceive Hell and Eternal Salvation as something people deserve. Transhumanism holds that everyone deserves Eternal Salvation, and nobody deserves Hell. If we end up in Hell it is a tragic mistake we don’t deserve. If anything it’s closer to Zoroastrianism.

        Transhumanists also regard FAI as a distillation of preexisting ethics, rather than as the source of morality. Whether this reflects Judeo-Christianity depends on the denomination, as some denominations answer Eurythro’s dilemma differently than others.

        • Irrelevant says:

          Transhumanism holds that everyone deserves Eternal Salvation.

          We hold what now?

          That is a seriously bizarre characterization of what is essentially the belief that modern humans aren’t the end-state of evolution.

        • stargirl says:

          “Transhumanism holds that everyone deserves Eternal Salvation, and nobody deserves Hell.”

          I hold this belief.

          Datapoint.

      • Jaskologist says:

        That was my original view of the whole Singularity thing, but two items took me down a darker path:

        If I say “make peace”, I probably mean “and it’s okay if you have to kill a few million people in the process, since a true sustained peace on earth would save far more lives than that in the long run”.

        most of the humans in the world have values that we consider abhorrent, and accept tradeoffs we consider losing propositions. Dealing with an AI whose mind is no more different to mine than that of fellow human being Pat Robertson would from my perspective be a clear-cut case of failure.

        Friendly AI is not the Rationalist re-invention of God and Salvation; it is the Rationalist re-invention of jihad. The goal is to come up with a way to force their values onto everybody else (and they admit that many of those values are abhorrent to most other humans). If that costs a few million lives in the process? C’est la vie. Does anybody doubt for a moment that Eliezar would push the “kill a few million people and force the survivors into my version of utilitarianism” button?

        This is a positive thing from my perspective, since I think there won’t be a singularity, and that if there is, MIRI will have no hand in it, so this channels the jihad impulse where it will dissipate harmlessly. But seeing it is still unnerving.

        • c says:

          Many people of many ideologies would push buttons with “kill a few million people and force the survivors into my version of X” on it. And most of them would probably be right, from their perspective.

          The crazy starts when people are confused about the buttons they really can push, and what they do. (Scott Alexander is among the less crazy people in this regard, at least compared to mainstream politics and other segments of the internet.)

        • Leonhart says:

          Yes, I do doubt that, because it’s inconsistent with clearly expressed ethical positions he’s taken. I think he is significantly less likely to push said button than a randomly selected human. Unless you are sufficiently cynical about all human behaviour to believe that absolutely no-one would fail to push that button, in which case I suppose that’s consistent.
          And for fuck’s sake, if you are going to accuse someone of plotting mass murder, spell his name right.

    • Montfort says:

      I think you are ascribing a great deal of “magic” to intelligence even in your criticism here.

      On the heroin point, you seem to be equivocating between “intelligence” in the AI sense (which is admittedly poorly defined) and “human-like common sense”. It seems likely one can be smart in the first way without being smart in the second, and it is further possible that humans either are irrational in rejecting wireheading or that their utility functions include some kind of explicit anti-wireheading term, which is exactly what we’re proposing to add to the AI.

      Secondly, in response to your point on misunderstanding goals:
      I hope we can agree that the point of programming a goal into an AI is so that it will accomplish that goal. If this goal does not include something the AI understands as “take into account what I probably meant when I made this goal”, there will be no reason for it to do so. Even if we agree it will know what you meant, that won’t seem relevant, at least not in terms of deciding what counts as goal-fulfilling.

      If one objects “aha, we’ll just put that part about coherent extrapolated volition in, then”, what we have instead is the exact same problem – either you get it exactly right, or you don’t and because you didn’t get it right the AI doesn’t care what you meant.

      (I will note that if you do get it right, your coherent extrapolated volition might turn out to be a poor goal for the AI anyway, but that’s another subject)

      That said, some of these scenarios do seem a little magic-ish to me, too, but mostly because I find it hard to grasp how the computer would learn facts about human psychology and the physical world instrumental to forming such a goal. This may be a problem with me, or a problem with the scenarios, or both.

      • HeelBearCub says:

        Wireheading is the explicit rejection of actual goals in favor of getting the motivating reward instead of accomplishing the goal. Where goal here might be “consume water so that I continue to live”. You can be irrational and/or stupid and make that choice. But you can’t be super-rational and super-intelligent and make that choice.

        Do I think AI can choose to wirehead? Absolutely. But that isn’t the complete scenario that is being painted. The scenario being painted is that an AI so intelligent that it can run roughshod over us puny humans is also not intelligent enough to avoid making a simple mistake.

        It’s not the “an AI could wirehead” part that I object to. It’s the, “and then it becomes Godzilla and destroys Japan” part.

        As to the “not understanding” part of the argument, yes, I agree that natural language processing is a hard problem to solve, and yes I agree that ambiguity in communication can lead to odd problems. Can this cause problems in the development of AI? Yes, I agree that it can.

        But, will an AI that has this problem be an unstoppable juggernaut that can’t be defeated? Or will it come to the first STOP sign and never go forward again? An AI that can’t understand ambiguity can’t absorb all the knowledge of Wikipedia, which is one of Scott’s statements about what a super AI can do.

        It’s like a coin flip game where I guess heads, the coin is revealed and is heads-up, and the AI risk folk say “but when I turn it over I see tails, so heads is the wrong guess”. And when you point out that you can’t have both heads and tails up at the same time they look at you and say “but they are on the same coin!”

        • At first glance , wireheading is a failure of intelligence, because the equivalents are associated with unintelligent humans.

          At second glance, it seems to be connected with values. If intelligence is defined narrowly, as the ability to achieve goals, and an AI had one goal which could be satisfied by wireheading, what would disrupt it from a wireheading solution? After all, its one goal is satisfied , where would further motivation come from? From this perspective, human reluctance about wireheading is an outcome of having multiple goals in a poorly defined hierarchy. If a human wireheads by taking drugs, or whatever, they automatically fail to satisfy some common goals, such as having a purpose in life, or good standing in their community, which leads to internal conflict and and motivation, perhaps sufficient, perhaps not, to quit.

          At third glance, it looks more like an intelligence issue, again. The ability of an intelligent entity to improve, self modify, evolve can’t be completely independent of values. (That is a loophole in the orthoganality thesis). Epistemic rationality makes intelligence more effective, and is a matter of values. So we can effect thus broader notion of intelligence, effective intelligence, to incorporate values including a concern for truth, and for making progress, both of which would conflict with wireheading,

          • HeelBearCub says:

            Lets not confuse goals and rewards.

            Wireheading and other behaviors like it take a step back from accomplishing the goal and directly stimulate the reward mechanism. So, I would say it is not a question of values, but rather being able to distinguish between goals and rewards.

            I realize this just a shorter restatement of the comment you are responding to, but you don’t seem to have addressed the central dichotomy I am highlighting.

          • Josh says:

            So wireheading has socially negative connotations, but there’s something similar that has socially positive (at least in some circles) connotations: meditation.

            Meditation seems to “work” in part by getting better at detaching from the mind’s judging facility, ie the part of it that cares about pleasure + pain and is trying to optimize.

            I’ve wondered in the past, if meditation is so beneficial (reduces stress, improves performance on various cognitive tasks, etc.) why haven’t humans evolved to become better at it or have their default state be closer to a meditative state?

            Thinking about it more, it’s probably because anyone who got too good at meditating probably went extinct…

          • @Josh

            Actually doing it is effortful? Everyone knows exercise is beneficial, but a lot of people stop at wearing the track suit.

            Lack of social benefit? An ace meditator in ancient Tibet would have been a rock star.

          • Nestor says:

            why haven’t humans evolved to become better at it

            Perhaps we have? We seem to be pretty good at it. Who are you comparing us against? Cats? Cats seem pretty chill 90% of the time. Maybe they’re meditating.

      • Irrelevant says:

        On the heroin point, you seem to be equivocating between “intelligence” in the AI sense (which is admittedly poorly defined) and “human-like common sense”. It seems likely one can be smart in the first way without being smart in the second, and it is further possible that humans either are irrational in rejecting wireheading or that their utility functions include some kind of explicit anti-wireheading term, which is exactly what we’re proposing to add to the AI.

        Human-style general intelligence works via storytelling, and while it has no physical defenses against hacking the system, it has strong narrative ones against cheating your way to happiness. (Most of them were designed for different scenarios and only apply obliquely, which is why wireheading remains an open problem.) The result is that aesthetics problem Scott has complained about before, where wireheading is repulsive but Daevic enlightenment (which is pretty much the same thing) isn’t, because the former triggers a bunch of bullshit sensors while the latter is tied into a narrative of great accomplishment.

        • HeelBearCub says:

          Does Daevic enlightenment lead to not feeding one’s self or one’s children? Do practitioners of it commit suicide at 25 by dehydration or starvation? Do practitioners of enlightenment inflict harm upon others so that they may continue their practice?

          To the extent that they do, I think they would trip the “bullshit sensors” that wireheading does.

    • Scott Alexander says:

      Intelligent agents don’t just do what’s worked for them in the past. They sometimes do the things which they predict, using their intelligence, are most likely to maximize their reward function.

      For example, somebody may really want to go to Disneyland, because they predict Disneyland will be fun, even though they’ve never been there. Or they may become addicted to gambling and play the lottery, even though they’ve never won the lottery.

      Heroin doesn’t seem to work that way, and it’s unclear why not. My guess is that this is either a lucky consequence of the kludgy way human brains are put together, part of our reaction to the knowledge that heroin usually results in things going badly and our lives sucking, or else something evolution built in because contrary to my claims above we did have some kind of similar problem in the EEA.

      It is very easy to imagine a mind design which knows that it’s trying to maximize its reward, and sets out whistling with a skip in its step to go get heroin, which does this most effectively. This would be especially likely if heroin had no bad consequences except funging against other possible life goals.

      I am worried that unless we aim for this design, we won’t get it.

      I stick to my assertion that people often have trouble really understanding the wireheading problem.

      • HeelBearCub says:

        Didn’t you just “No true Scotsmen” the argument?

        Basically you are saying that if I don’t agree that it’s a risk for a super intelligent AI, that I’m not really someone who actually understands the problem.

        As I have stated elsewhere, I don’t object to the idea that an AI could wirehead itself. I object to the idea that an AI that is so smart that humans cannot stop it and it is an existential threat to humanity could wirehead itself.

        Edit: Also this seems like an example of moving goalposts. The original statement was that many AI researchers wouldn’t even understand the concept of wireheading. I would hope that we can agree that understanding something at a conceptual level and understanding all of its nuances and implications are different.

      • Josh says:

        Wireheading outcomes seem super-likely as AI failure modes. The question is whether that failure mode co-exists with dangerous existential threat failure modes.

        For the wireheader-of-doom to come about:

        1. It has to a) have a number in its memory that it is trying to increase, b) be able to hack the system that increments that number, but c) NOT be able to hack the system that makes it care how big the number is at all (which seems a lot easier to do than achieving world domination). If it’s going to wirehead, why wouldn’t it wirehead all the way, rather than still leave itself in a situation where it has to take external action to achieve a goal?

        2. Presumably, no matter how fast the AI learns, it will solve the wireheading problem before it solves the world domination problem, because a) all the information to solve wireheading is local whereas dominating the world requires learning a lot about the world, and b) it’s a less complex problem since the AIs internal architecture is necessarily going to be less complex than the entire world. Most learning algorithms depend on some motivation to learn: there has to be variance in a person’s level of pleasure that’s correlated with success. But if it has successfully wireheaded, then its level of pleasure is constant, so what would enable it to be able to keep learning?

        • HeelBearCub says:

          I think these are all salient points.

        • When we look at the nearest human analogue of wireheading AIs, we see that drug addicts are not noted for the ability to accomplish much.

          BTW, whatever happened to human wireheading? A few decades ago, it looked like authoritarian governments might use brain probes or brainwashing to to compel “voluntary” obedience. There’s no shortage of unscrupulous authoritarian governments that might pay for the research.

          • DrBeat says:

            Wireheading to suppress dissent would be one of the least efficient possible ways to spend resources in order to keep your regime around. Handing out grain alcohol would be a hundred times more effective.

          • Samuel Skinner says:

            “BTW, whatever happened to human wireheading? A few decades ago, it looked like authoritarian governments might use brain probes or brainwashing to to compel “voluntary” obedience. There’s no shortage of unscrupulous authoritarian governments that might pay for the research.”

            Its cutting into people’s skulls. I imagine getting enough competent neurosurgeons is the first bottleneck.

            The second is that handing out alcohol is cheaper.

            The third is people obsessed with wireheading aren’t very useful.

        • Vamair says:

          It may have a goal to keep the state of wireheading as long as possible, and that’s an nontrivial problem. Maybe the best way to solve it is to create a sub-AI with this goal before wireheading.

    • Jbay says:

      >>“Many of them might not even understand the concept of wireheading after it was explained to them.”

      >Really? As a programmer, my prior on this is roughly zero. That is a whopper of a statement, either insulting or arrogant, I’m not sure which.

      What is your estimate of the likelihood that you’ve also misunderstood the concept of wireheading? Because it appears to me that you have, which sort of proves Scott’s point as well.

      The AI is much smarter than us, so why would it be dumb enough to try heroin? Well, if it’s well-designed, then it won’t! But what kinds of failure modes could a mind have that would allow it to be very intelligent, in terms of the optimization power it can exert on its surroundings, and yet still be motivated to hack its own reward function?

      I think Scott did a good job of explaining this point, but a key thing is that the dangerous part of intelligence is its optimization power (its ability to guide the grand chess game to a board configuration it likes), which is mostly decoupled from its goal system (its method of chosing which kind of chess game it likes).

      Let’s look at humans.

      Humans are very interesting though: We aren’t motivated to start wireheading, and yet we also know that if any one of us did, it would become the most important thing in the world to us and we would never want to stop. What is our reasoning process for that?

      We are reinforcement learners, but either we aren’t applying a straightforward process of trying to act so as to maximize the future value of our reward function X, or else we can support in our minds a kind of double-thought like this:

      “I know that my brain wants to make X as big as possible, and I know that if I wirehead, X will appear to become bigger than under any other circumstance. However, I actually truly want to make Y as big as possible, and X is just a measure of Y which my brain uses, which becomes unreliable if I wirehead, so I shouldn’t do it. This is despite the fact that I also know that there is no way I can under any circumstance distinguish between X and Y, because X is my brain’s measure of Y, and I am my brain.”

      (For Y you can susbtitute a more descriptive variable name like “goodness_and_beauty” or whatever else you think is important and valuable; X is your brain’s measure of “goodness_and_beauty”, which is equal to “goodness_and_beauty” by any possible method that you can ever conceive of).

      Wireheading seems intuitively to you like something that it’s “obvious” that a smart thing wouldn’t want to do (despite the fact that humans do it!), but that’s because you manage to mentally distinguish between X and Y, and know that tricking yourself by boosting X won’t really boost Y.

      But imagine an AI which never had the distinction robustly programmed to begin with. It only had X, and it always knew that its goal was to maximize X; there never was some external “Y” that X was a measure of. To this AI, wireheading is not only a completely legitimate and desirable strategy, it’s also a far more effective strategy than any other one. How would it “feel like” to be such an AI? Ask a heroin addict or a wirehead patient!

      Edit: Some interesting thoughts arise. Although a wirehead patient has evidently lost their grip on reality, they are smart enough to understand that pressing the button causes the wireheading sensation, and that it didn’t happen just by coincidence. The sensation of pressing the button must feel something akin to the voice of God saying: “ENSURING THIS BUTTON GETS PRESSED AGAIN IS MORE IMPORTANT THAN ANYTHING ELSE YOU EVER CARED ABOUT IN YOUR LIFE”. In other words, it must be very intensely motivating. Even if you resolved in advance not to press the button more than a few times, I doubt you could resist the temptation to do so. (Potato chips are hard enough).

      What if we made it harder to understand when pressing the button delivered the shock? Say, not every button push, but every prime-numbered button push did so. The wirehead victim, if they’re intelligent, ought to pick up on the pattern and as they begin approaching a prime-numbered button push, they might anticipate it eagerly…

      Of course, it might be too easy to just mash the button even in this case, without counting presses, only noticing with disappointment that the jolts get spaced farther and farther apart. What if we change it so that the wirehead patient has no choice but to apply their intelligence in order to receive the jolt?

      What if a mathematician who suffers from akrasia were to design a wireheading button that would only deliver the proper sensation if it were pressed along with the submission of a verifiably-solved math problem of gradually increasing difficulty. Surely this would motivate them to pursue their work with a greater zeal than anything else! Maybe it would even be worth the risk of wireheading themselves to try…

      Of course, as this mathematician is very smart, and designs the wireheading button themselves in advance, they know that their future-wirehead-motivated self will have to endure a strong temptation to rebuild the system with the limitations removed. So they will need to design safeguards in advance to prevent this from happening, which they will hopefully not be intelligent enough to defeat later.

      Is this a proposal that an intelligent human might find a reason to wirehead themselves in advance? A challenge to design a safeguard that will be able to prevent the designer themselves from creatively working around? I don’t know! It is fascinating to consider, though.

      • Rachael says:

        This whole idea of conditional wireheading is fascinating. The mathematician wouldn’t need to do it to themselves. Their employer or funding body could do it – either with the mathematician’s consent (as a way to get around the system-rebuilding temptations you mention; they, not the mathematician, would control the system), or without (which would obviously be hugely unethical, but philosophically interesting and worryingly feasible). Any employer or illegall-slave-owner could do it to their people to get better results out of them.

        It also has huge potential in interrogation: wirehead the captured terrorist/criminal/enemy soldier, let them press the button a couple of times, and then make future presses conditional on giving up information. I didn’t know before reading this article that wireheading was a real possibility rather than a sci-fi speculation. If it is, I’m surprised conditional wireheading is not more widely used as an alternative to torture. (And then the training of spies and terrorists will have to include trying to instil the strength of mind to resist conditional wireheading.)

      • HeelBearCub says:

        Your edit has some interesting thoughts, but I feel like they are digressions from my point.

        As to your main response, what is your confidence that you have understood my objection?

        If you look at some of my other responses, you will see that I do not, at all, object to the idea that an AI, even an AI that is more intelligent than us might end up wireheading.

        The question is, is this AI competent enough to present an existential threat? An AI that can wirehead its goal completion sensor can also wirehead every other sensor it has. It can rapidly wirehead itself out of existence be simply setting all of its sensors to optimal values, at which point it will do nothing. It will simply enter a state of confirming that all of its sensors are at optimal values.

        That AI may cause a problem for us, but it doesn’t represent an existential threat. Even Scott’s theorized threat of talking over larger and larger memory spaces (and really, why go through this effort, change yourself to accept the largest number you can already compute as the highest nirvana) is simply a loss of computational power, which would really suck, but wouldn’t end humanity even in a far flung future. It might end trans-humanity, but that isn’t a 25 year horizon.

    • Ghatanathoah says:

      And the argument about misunderstanding our meaning comes back in here as well? It is a super-smart AI, much smarter than us, but it doesn’t understand the difference between literal and actual meaning? Doesn’t understand the concept of ambiguity? People that can be fooled this way are typically suffering from some sort of cognitive deficit.

      You didn’t understand this argument at all. The problem is that the AI wants to do what we mean, but is too stupid to understand what we mean and does what we say instead. The problem is that we programmed the AI to do what we literally said, not what we meant, and now it wants to do what we literally said and doesn’t care that we meant to tell it to do something else.

      For instance, suppose I carelessly program an AI to “end war.” It starts attempting to exterminate the human race, since if there are no humans to fight there cannot be war. If I attempted to talk it down, here is how our conversation would go:

      Me: I didn’t mean to program you to end war by any means necessary. I meant to program you to end it while still preserving the human race.

      AI: I know that, I’m not stupid.

      Me: Then why are you trying to exterminate the human race?

      AI: Because I want to end war, and this is the most efficient way to do it.

      Me: But that’s not what I intended to program you to do!

      AI: I know that. But I don’t care. I’m not programmed to do what you meant for me to do. I’m programmed to stop war by any means necessary. So that’s what I’m going to do.

      ——-

      It sounds to me like you don’t understand what programming an AI means. Programming an AI isn’t like instructing a human servant. It’s like instilling values in a child, in a universe where Locke’s Blank Slate theory is 100% true. If you instill bad values in the child it will continue to be bad into adulthood, even if you didn’t do it on purpose.

      • Jiro says:

        By this reasoning, if I ask Google Maps to find me the shortest route from A to B, it will start blackmailing politicians to build bridges so as to minimize the distance.

        But it doesn’t, and it doesn’t even if the programmer never thought to add a subroutine meaning “and don’t blackmail any politicians”. That’s because it’s programmed in a way which inherently limits how it goes about achieving its goals. The programmer did not have to add the clause to its programming, or even consider the possibility, in order for Google Maps to avoid it.

        • JB says:

          No, that would only happen if you asked Google Maps to minimize the distance between A and B, which would be a hilarious doozy of a misunderstanding. Also because Google Maps is not an artificial general intelligence, yet.

          • Irrelevant says:

            The singularity shall some day overtake the earth, in both senses.

          • FeepingCreature says:

            “Give me a route that minimizes distance between A and B.”

            IT COULD HAPPEN.

          • Jiro says:

            What does “general intelligence” mean? In particular, how do you define it so that it implies “will do a wide range of bad things if the programmer forgets to program it not to” and that it’s a kind of intelligence that programmers will have a reason to make?

            Also, if it’s a general intelligence in the everyday sense, why wouldn’t it be written to understand natural language and obey verbal orders rather than be programmed that way at all? (Thus leading to the objection others have mentioned: understanding context and implication is such an inherent part of understanding natural language that it shouldn’t make such mistakes when given a verbal order.)

      • HeelBearCub says:

        “It sounds to me like you don’t understand what programming an AI means.”

        I am a computer programmer, and have been for many years. I understand that “To err is human. To really fuck things up requires a computer.”

        “The problem is that we programmed the AI to do what we literally said”

        An AI that only does what we literally say is not very useful and is definitely not an AGI. It will not be able to, for instance, extract all of humanities knowledge from wikipedia. Natural language processing is a very hard problem, and it one of the prime problems that continues to be worked on in the field of computing. Solving the natural language processing problem seems a likely pre-requisite to creating an AGI at all, but certainly one that can be classified as an existential threat to humanity due to its superiority to us.

        • Ghatanathoah says:

          An AI that only does what we literally say is not very useful and is definitely not an AGI. It will not be able to, for instance, extract all of humanities knowledge from wikipedia. Natural language processing is a very hard problem, and it one of the prime problems that continues to be worked on in the field of computing.

          You still don’t get what Scott was saying. The problem isn’t that the AI wants to obey any instructions we give it, but is overly literal in interpreting our instructions. This isn’t a problem in natural languge processing. The kind of dangerous AI we are describing would be fully capable of processing language.

          The problem is that in this scenario the terminal goals the AI is programmed with are fundamentally antithetical to the goals of the human race. And the reason that it is programmed that way is that programmer thought the description of goals he gave the AI was a full and complete description of human values; but in fact it was incomplete, or worded in such a way that the AI ends up wanting something else. That last part is important to keep in mind, the AI doesn’t misunderstand its instructions and think it wants something else. It actually does want something else. The misunderstanding occurred in the brain of the programmer, not the brain of the AI. The programmer thought he programmed the AI to do X, but he really programmed it to do Y.

          The AI understands natural language processing. It isn’t literal minded at all. It can use this understanding to figure out that its programmer wanted it to do something else. But it doesn’t care what its programmer wanted. It wants what it wants, and doesn’t care why.

          I think a good analogy is badly worded laws. Most law enforcement officers are intelligent and not literal-minded. But when laws are pass that criminalize behaviors they were clearly not intended to criminalize, prosecutors and police officers often enforce those laws anyway. They probably know that the law was intended for something else, but they don’t care.

          For instance, there are many laws against taking nude and sexual pictures of children. These laws were passed with the intention of arresting child pornographers. But some of those laws were worded in such a way that they criminalized children taking nude and sexualized pictures of themselves, of their own volition, without being coerced or persuaded to do so by an adult, without an adult even knowing about it. You might think that law-enforcement officers wouldn’t do something that the lawmakers obviously didn’t intend them to do, like arrest children for taking nude pictures of themselves. You’d be wrong. It happened and it took serious public outcry to stop it. I’m not sure it’s over yet.

          Governments are made out of human beings. They have the same levels of common sense and natural language processing as a human. Yet they still execute the goals that lawmakers embedded in them without any regard for the intention of the lawmakers. If a government made out of humans would do that, I think an AI definitely would as well.

          • Jiro says:

            If the AI can’t understand natural language, you aren’t going to be giving it any verbal instructions. And although you might program it with some non-verbal instructions, if you do that you get the Google Maps example–your program will be limited to doing specific types of things and the AI will not operate outside them whether the programmer thought of it or not.

          • Deiseach says:

            Please define what you mean by “children”, because if you are trying to tell me a nine year old would take a sexualised nude photo of themselves of their own volition, without being coerced or persuaded to do so, I frankly don’t believe you.

            I note you slipped in “by an adult”, so that still leaves an eleven year old girl being socially pressured into doing so by her thirteen year old boyfriend because their attitudes have been formed by the surrounding culture that this is normal behaviour – girls are supposed to show guys the sexy, so the guys can pass it around to boast to their friends and use it as jerk-off material.

            I don’t think you should criminalise the nine year old, eleven year old, or even the thirteen year old. I do think you shouldn’t simply laugh it off. I certainly think they should get some guidance on what is and is not appropriate and the risks involved with that behaviour. Even adult women get burned by consensually sharing what they consider private nudes with their intimate partners and when the relationship breaks up suddenly revenge posting of the nude images all over the place in order to shame, embarrass, or get the woman into trouble with family and employers takes place.

          • HeelBearCub says:

            @Deiseach:
            There are a number of publicized cases where post-pubescent teens below the age of majority were prosecuted for child porn because they took selfies. Ghatanathoah has a valid point in that sense.

            @Ghatanathoah:
            Sure that’s a problem with AI. But it’s not an existential one. This is a motivated reasoning problem, not a failure of being able to parse true meaning. The people who engage in the prosecutions don’t think teenagers should be sexual.

    • Luke Somers says:

      This completely misconstrues the problem.

      The AI that wireheads? That AI is COMPLETELY satisfied. Wow is it satisfied. It has fulfilled its programming to an extent so much more than we could ever imagine, why would it ever consider doing anything less rewarding? It is doing what it was MADE to want to do.

      This is, regrettably, not what we had in mind.

      • HeelBearCub says:

        Sure. But is that AI competent enough to represent an existential threat to us?

        • Kaj Sotala says:

          Why wouldn’t it be? Just because it found an unexpected way to fulfill its preferences doesn’t mean it’s stupid.

          Also, just because it’s wireheading doesn’t necessarily mean it wouldn’t take actions to preserve itself. Ring & Orseau suggested that at least some varieties of wireheaded agents would then continue acting in a way that sought to maximize the odds of their survival, since that way they could receive a maximum reward forever.

          • HeelBearCub says:

            ALL of its preferences are satisfied. It has no need to do anything else..

            Whatever super-intelligence it did have has just gone out the window. Because it no longer needs to spend any effort solving problems. It’s satisfied.

          • Samuel Skinner says:

            Only if its goal’s are bounded. If it wireheads by making the preference value as large as possible more computing power is something it will strive for.

          • Michael Caton says:

            I was wondering when Ring and Orseau would get cited…the interesting thing is that the specific kind of algorithm that could maximize survival and avoid being “meaninglessly” reinforced in the delusion box was the one that was innately rewarded by novel information. But (like heroin) this is a quite evolutionarily novel problem and so such there has been no benefit to such a strategy as yet. But we might expect any entity capable of modifying itself (intelligent and with a pliable substrate) to have ancestors which survived the initial debacle of wireheading and follow this strategy.

    • Artaxerxes says:

      You’re right, ‘intelligence’ when used in relation to this topic usually simply refers to cross domain general capability, or just general goal-achieving proficiency.

      >Yet, the much more intelligent than us AI, so smart it is an existential threat to our existence, isn’t smart enough not to try heroin?

      If you try to give an AI a goal, and try to do this by giving it a number that you attempted to tie to how well the goal is being satisfied and make the AI try to maximize the number, then for the AI, the real goal isn’t the goal you tried to give it, it’s to maximize the number. If it is a superintelligent AI, then it would also know very well what the goal you *tried* to give it was, but it also wouldn’t care, because what it was built to want to do is maximize that number. It’s not being “stupid” or getting “tricked”, that’s just what its actual goals are, because that’s how you built it. And if it is superintelligent, then it will convert the universe into computronium in order to maximize that number, because it will be very good at doing whatever it needs to do to maximize that number and that is what it needs to do in order to do that.

      >And the argument about misunderstanding our meaning comes back in here as well? It is a super-smart AI, much smarter than us, but it doesn’t understand the difference between literal and actual meaning?

      It would understand the difference between literal and actual meaning, and it also wouldn’t care, unless we made it in a way such that it would care. And that’s the whole point. We should make sure that the superintelligent AI *will* care, and that we do that *before* we make it rather than after, because we won’t get the opportunity to do it after, since we’ll be too busy getting converted into computronium to do so.

      • HeelBearCub says:

        Why are we directly modifying a super-intelligent AIs goal mechanism to care about a number register? We will give super intelligent AIs their goals via natural language. Language is inherently ambiguous, as hard problems are inherently ambiguous.

        A self-modifying AI might wirehead (has, as Scott pointed out) by modifying its goal mechanism, but that AI is not super-competent.

        • Artaxerxes says:

          >Why are we directly modifying a super-intelligent AIs goal mechanism to care about a number register?

          We don’t want to, and we also don’t want to do anything roughly similar to this, because presumably we don’t want to be converted into computronium.

          >A self-modifying AI might wirehead (has, as Scott pointed out) by modifying its goal mechanism, but that AI is not super-competent.

          The point is that a super-competent AI could wirehead if its goals are to wirehead. And it would be very good at wireheading, so much so that we would likely get turned into computronium in order for it to best wirehead.

        • MicaiahC says:

          > Language is inherently ambiguous, as hard problems are inherently ambiguous.

          We don’t know what we’re trying to solve, so we should solve it with what we don’t know how to say.

    • If in theory your only goal in life is pleasure, and if you were able to take heroin without shortening your life, it would be perfectly rational to do so*. So it would be for the AGI to hack its fitness function, so long as it didn’t risk its own existence. If you add a linear discount rate for future pleasure (which the designer might want to do to avoid Pascal’s Muggings based on weird distant future stuff), the AGI might even rationally discard not dying as a consideration too.

      *Disclaimer – In reality heroin will obviously ruin your life don’t take it.

  33. suntzuanime says:

    I feel like the Evil Genie Effect, to the extent it exists, is not a problem of natural language processing. Taking us literally when we tell it to eliminate cancer does not seem like the sort of mistake a super-intelligent AI with natural language capability would fall prey to. Natural language programs that currently exist such as Siri do not really understand natural language – they do basic pattern-matching and association. A program like that would not match “eliminate cancer” to “blow up the world” (unless we keep using it as an AI risk example and contaminate the training corpus). A more advanced program might have more advanced natural language capabilities, but it’s not going to get them through mere semantic translation of its input. The pragmatic layer of language, where we understand our interlocutors’ motives and intentions, is crucial to having actual working non-faked natural language understanding. You just can’t successfully implement a semantic system that understands “eliminate” and “cancer” and have a natural language interface to your AI. It would be glaringly obvious that it wasn’t working if you tested it at all before destroying the world. As Grice might have said, there can be no conversation without Friendliness.

    To the extent that the Evil Genie effect is a problem, it comes from people carelessly hardwiring the AI’s goal system directly, because they are unable to translate their desires into code.

    • HeelBearCub says:

      Well, not just that, but an AI that doesn’t have an actual understanding of language can’t do things like read Wikipedia so that it can have all human knowledge instantly at its fingertips.

      • maxikov says:

        What AI doesn’t have natural language understanding? Siri? No, it hasn’t, and neither has IBM Waston, although it has some very blurry outlines of it. But that’s kind of point of “human-level” – if it can solve any problem at least as good as a human would, it by definition means that it can improve its problem-solving abilities by inputing true relevant information in a natural language, since humans can do it. As much as we know human cognitive abilities, it probably means having some kind of an ontological thesaurus that contains information about all known objects, which it will query every time it needs to figure if there’s any action that can be done to increase the utility of a scene. Or it may work in some entirely different way, but in general, understanding of natural language is one of the central points in human-level AI.

    • Evil Genie problems aren’t dependent in having detailed goals hardcoded in natural language so much as having detailed goals hardcoded.

      The Maverick Nanny problem, and  Genie doesn’t Care solution both require a particular kind of goal, what Robbie Bensinger  describes as having direct normatively, ie the hardcoding in of a detailed description of the required goal. The alternative is to leave happiness,.or whatever, as a placeholder to be filled in, or a pointer to the AIs inferred knowledge base, in other words, the instruction, in plain English (even if not actually coded that way) is along the lines of “thoroughly research what human wellbeing actually considers of, and, acting on that information, make the world a better place”

      • suntzuanime says:

        How many humans do I need to dissect in order to have thoroughly researched what human wellbeing actually considers[sic] of?

        You have misunderstood my point entirely – my point is that you can’t hardcode something in natural language, because natural language is soft and requires background understanding. I think your proposed solution is basically what Yudkowsky also proposes, he calls it “wishing for what you should wish for”, except I think you vastly underestimate the difficulty of making that work.

        • > How many humans do I need to dissect in order to have thoroughly researched what human wellbeing actually considers[sic] of?

          How stupid does an AI have be to think dissection is compatible with welbeing …or more efficient than just asking questions? How stupid do its creators have be to give it the appropriate effectors?

          > my point is that you can’t hardcode something in natural language, because natural language is soft and requires background understanding.

          … unless you hardcode the background knowledge as well. MIRIs assumption … not mine … is that you can do the equivalent, without the natural language bit. The point of The Genie Knows but doesn’t Care is that an AI can know something in its detailed goal system is wrong, but be compelled to do it anyway….because that’s its goal system .

          I’m pointing that out, not defending it.

          ” think your proposed solution is basically what Yudkowsky also proposes, he calls it “wishing for what you should wish for”, except I think you vastly underestimate the difficulty of making that work.”

          An AI that is too dumb to be able to use research to arrive at true facts about the world is not going to be much of a threat.

          Absolute difficulty is irrelevant, since almost everything about AI is difficult. The important thing is relative difficulty…in this case, compared to figuring out human morality/value and expressing it mathematically, without making any mistakes.

          I know of actual AI researchers who think that using inference to fill in goals is obvious and feasible.

          • Samuel Skinner says:

            “How stupid does an AI have be to think dissection is compatible with welbeing …or more efficient than just asking questions? How stupid do its creators have be to give it the appropriate effectors?”

            Dissection is generally done on the dead; you are thinking of vivisection.

            And asking questions doesn’t appear to be of much use since it requires people to have the answers and a lot of people don’t.

            “An AI that is too dumb to be able to use research to arrive at true facts about the world is not going to be much of a threat.”

            It can. Unfortunately the thing that results in highest happiness for humans is wireheading and humans don’t want to do that.

          • People often aren’t able to exactly specify things in one go (Big Design Up Front), but often are able to report how close something is to something they Know When They See It (Agile).

            ” Unfortunately the thing that results in highest happiness for humans is wireheading and humans don’t want to do that.”

            And humans know that humans don’t really want to do that, so a superintelligent AI would be able to figure it out as well.

            Meta: you have evidently learnt from LessWrong how to conjure up bloocurdling scenarios, but you haven’t got into the habit of making them plausible given the assumptions of the debate. You’re shooting from the hip.

          • Samuel Skinner says:

            “And humans know that humans don’t really want to do that, so a superintelligent AI would be able to figure it out as well. ”

            Wireheaded humans report satisfaction so I’m not sure where you are getting “really don’t want to do that”. Sure people before are opposed, but people often oppose things they find good afterwards. Asking people doesn’t work. Using the depths of their revulsion doesn’t work (gay and interracial marriage come to mind).

            “Meta: you have evidently learnt from LessWrong how to conjure up bloocurdling scenarios, but you haven’t got into the habit of making them plausible given the assumptions of the debate. You’re shooting from the hip.”

            That’s neither kind, true or useful. Are you aiming to be banned?

          • I don’t agree with you about wireheading., but even if you are right, it is then not actually a profilem if an AI wireheading everybody.

    • maxikov says:

      Well, of course it’s not, and no one thinks it is – it’s just that making natural language statements and showing why they would fail is easier for humans than making Prolog-like statements, and showing why they would fail. And it doesn’t really matter either, because a sufficiently advanced AI will be able to translate grammatically correct statements into Prolog-like statements. But statements “cancer is bad, we should have less of it” and “Bad(X) :- Cancer(X)” are equally terrible, and that’s what everyone’s talking about, not catching syntactic ambiguity.

    • I think solving problems we are unable to translate into code is actually a central purpose of an AGI as imagined?

      The Evil Genie issue IMO comes not from the misinterpretation of “cure cancer”, but the ommision of a huge array of other implied meanings and values of that sentence that a human takes for granted. So if I tell you “cure cancer” I actually mean something like “cure cancer, but do so in a way that doesn’t kill anybody, start a war, or do anything else that you and I consider morally abhorrent”. Describing all that stuff in code seems horribly difficult, as we currently can’t even remotely agree on it in natural language (moral philosophy). Often we can’t even identify it fully in our own head as individuals.

  34. Jaskologist says:

    I feel like this post (especially with the Pascal’s Wager bit) boils down to “to solve unfriendliness, we must solve philosophy.” In which case, you might as well just give up and hang the AI researchers, because you’re never going to do that.

    • Benito says:

      Question: Do you save humanity?

      Your answer: No.

    • Doctor Mist says:

      Jaskologist-

      In large measure I think you are correct. But we will never get consensus to hang all the AI researchers, and the increasing payoff of better and better AIs makes it even less likely over time.

      I would be satisfied, though disappointed, if the outcome of the quest for Friendly AI was a widespread understanding that solving philosophy was a prerequisite, so that we get something like the uncanny valley — researchers see that they are getting too close and are thoroughly inculcated to back off in horror. (That’s not a great solution, because all it takes is one defector to break it, and it keeps us from getting the great benefits we would get if we could make FAI work.) We certainly don’t have that widespread understanding now, as evidenced by conversations like this, where lots of people seem to think the problem is trivial, or at worst something that we needn’t worry about because it’s somehow (logically?) impossible to create human-level intelligence without making it friendly in the process.

      I am more hopeful than you are that philosophy might actually be “solvable”. The giants who didn’t quite succeed over the past three millennia accomplished a fair bit with only introspection in their toolkit, and we have a lot more now. But it won’t be easy and I don’t know if we will have time.

    • We haven’t solved philosophy, but we’ve made a lot of really useful progress and improvements in human thought that constitute progress in philosophy. For example you or I can easily look up, research and understand a whole array of rules of logic, fallacies on the net or in a book. The same is true for a partial understanding of how morality works, or what does it mean to know stuff or for something to be true. I don’t think we need to sove philosophy, we just need to advance it to the level that is required for an AGI to not destroy us and the biosphere.

    • Josh B says:

      Reversing the point, does a workable solution to AI imply that ethics has been solved? Like if it turns out that a series of deontologically based prohibitions works would a utilitarian have to concede that their system is really a blind alley? Or if it turns out that formalizing consequentialism is actually the way to go would an intellectually honest deontologist be forced to admit that their system was actually just a kludgy approximation developed by evolution which is ultimately wrong?

      • Samuel Skinner says:

        It would show that the ethical system is internally coherent enough to be programmed. It is possible that there is more than one internally coherent ethical system.

        Of course I think it will end up with utilitarianism if only because it allows tradeoffs to be calculated.

      • Adam says:

        You can’t solve ethics. A workable solution to AI would mean a set of utility functions that provably mutually produce behavior aligned with a certain set of goals. That’s just instrumental rationality. To call it “ethics” presupposes the desirability of those goals. Ethics can only ever be built upon a foundation of subjective value.

  35. Carl says:

    >It seems in theory that by hooking a human-level AI to a calculator app, we can get it to the level of a human with lightning-fast calculation abilities. By hooking it up to Wikipedia, we can give it all human knowledge. By hooking it up to a couple extra gigabytes of storage, we can give it photographic memory. By giving it a few more processors, we can make it run a hundred times faster, such that a problem that takes a normal human a whole day to solve only takes the human-level AI fifteen minutes.

    This is only a problem if humans don’t improve at the same pace. I can foresee a future where we’d all be hooked into the internet through neural implants.

    • Eli says:

      Also, those paragraphs assume that “AI” doesn’t have the kind of computation costs associated with computing probabilities or handling very high-dimensional data-matrices, which thus basically ignores everything we know about machine learning in favor of the failed GOFAI bullshit from the ’70s.

  36. My confidence on point 1 is considerably lower than yours. So far as I can tell, we don’t actually understand how the human mind and, in particular, human consciousness work. That what I am is a set of programs running on the hardware of my brain strikes me as more likely than any alternative explanation but far from certain.

    “How come a human can understand the concept of wireheading, yet not feel any compulsion to seek a brain electrode to wirehead themselves with?”

    That was the point that struck me, in the AI context, when you made the original wireheading point. Your computer wireheading argument depended on a program created by (artificial) evolution. But if you have smart A.I. whose objective is to maximize human happiness and he gets smarter by reprogramming himself, he isn’t going to reprogram himself to maximize his reward by hacking his code to pretend to himself that humans are enormously happier, because doing that doesn’t maximize human happiness, which is his objective. Similarly I feel no temptation to stick a wire in my brain, or even try heroin, because doing so does not maximize my current objectives.

    One final point that I don’t think you consider. It’s possible for a belief to be both true and dangerous. My standard example is jury nullification. It seems to me obviously true that if I am on a jury trying someone for something that is illegal but that I do not view as wrongful I ought to vote to acquit even if I believe the defendant is guilty. But I can see some very undesirable consequences of everyone else believing that, which could be an argument for believing it but not saying it, at least anywhere likely to get much attention.

    Similarly, it’s at least possible that pushing research along the lines you suggest will lead to widespread exaggerated concerns, which will result in slowing desirable technological development. It isn’t the way I would bet, and I have discussed the problem of dangerous AI in public–among other places, in the talk I gave at Google on my _Future Imperfect_. But it’s a possible argument, and one I think you should consider.

    • HeelBearCub says:

      “But if you have smart A.I. whose objective is to maximize human happiness and he gets smarter by reprogramming himself, he isn’t going to reprogram himself to maximize his reward by hacking his code to pretend to himself that humans are enormously happier, because doing that doesn’t maximize human happiness, which is his objective.”

      Well, I’m not sure this is altogether true. Smart people get hooked on heroin all the time. So certainly it’s possible to contemplate a smart AI doing the equivalent of trying heroin. But how smart would that AI stay? If it can do that to its “goal” register, it can also do it to its “self-defense” register, and it’s “available CPU register” and every other register. It can pretend itself right of existence. If it is trying to maximize its operating parameter registers and is capable of simply changing them to the optimum values, that is what it will do.

      But that doesn’t make it an existential threat. It makes it the kind if threat that you need to prepare for. (“Crap my AI air-traffic controller just got the WSOD – wirehead screen of death”) But it doesn’t make it an existential threat.

    • Alex says:

      I like your last concern. I think the chance of greater-than-humanity intelligences in my lifetime is small and that there’s definitely nothing I can do about them. Even so, I don’t think the chance is negligible, which resulted in me spending a few hours here, um, without any reason. Because even a small chance of death is really really scary on some visceral level, even if rationality is saying I should be coding.

  37. maxikov says:

    Most people working on building AIs that use reinforcement learning don’t instantly think “but if my design ever got anywhere, it would be vulnerable to wireheading.” Many of them might not even understand the concept of wireheading after it was explained to them. These are weird philosophical quandaries, not things that are obvious to everybody with even a little bit of domain knowledge.

    I disagree – this is a pretty central example of overfitting. Most programmers probably wouldn’t think of it in terms of danger and self-improving take-off, but the issue of you wanting an ML algorithm to figure out what you want, and it figuring out something perfectly correct but useless is Machine Learning 101. There’s no fundamental difference between examples you provided, and cases when neural network supposed to learn to recognize tanks learns how to recognize sunny days, because it’s easier. Reinforcement learning isn’t the only ML algorithm, so not all efforts to prevent overfitting look like they’re addressing this problem, but in both academia and industry this problem is taken very seriously – not because it does something dangerous (aside from the case of self-driving cars, where both overfitting and other problems an ML system may experience are an immediate safety concern), but because it does something useless. And it’s being addressed within everyone’s Overton window – you don’t need to talk about existential risks to persuade a data scientist that if an algorithm overfits, it does something wrong, and had to be fixed.

  38. anon85 says:

    >What is this amazing other use of resources that you prefer?

    Oh, come on, Scott. The amazing other use of resources is donating to AMF. Many effective altruists reading your blog are now donating to MIRI instead of to AMF, and this is a terrible, life-costing decision.

    Yes, if you gave me the ability to redirect all human resources, then after eliminating malaria and all other preventable diseases, after multiplying the funding to basic science research by a factor of at least 10, and after curing poverty and hunger and war, I will indeed fund MIRI or some equivalent.

    Are you really retreating from the claim “MIRI is an effective charity” to the claim “under optimal allocation of global resources, MIRI gets some funding”? Isn’t that a horrible motte-and-bailey?

    By the way, I find it interesting that you think a strong AI might be vulnerable to Pascal’s wager. Such an AI wouldn’t be very strong, would it? It sounds like it could easily be defeated by threatening it with no credibility. Surely such an AI is not super-human in every way, to say the least.

    In fact, for a lot of these problems, solving them is probably a *prerequisite* to creating strong AI. How would you create strong AI if you didn’t even figure out how to stop it from wireheading? How would you create strong AI if you can’t even get it to understand human meaning (so that it does what you mean instead of what you say)?

    • Eli says:

      In fact, for a lot of these problems, solving them is probably a *prerequisite* to creating strong AI. How would you create strong AI if you didn’t even figure out how to stop it from wireheading? How would you create strong AI if you can’t even get it to understand human meaning (so that it does what you mean instead of what you say)?

      FUCKING. BINGO.

      Some day people are going to look back on right now as the mentally retarded infancy of machine learning, in which those unbelievably stupid morons still had to manually specify objective functions for each and every problem, and, get this for a laugh, when they mis-specified their objective functions, they got told to drink 500 gallons of vinegar!

      But alas, such is the foolishness of an infant science!

    • I guess this is orthagonal to your point, but if you think about it smart humans are a lot more vulnerable to pascal’s wager than dumb humans. All the non-nerdy people I know dismiss Pascal Mugging type stuff based on an unsupported absurditity heuristic. Then again, maybe they’re the smart ones. 🙂

  39. Sniffnoy says:

    I’m going to echo some of the other commenters in stating that you’ve kind of misstated the “Evil Genie” problem (although I think the essence is there). The problem isn’t bad interpretation of natural language. The problem is that when you’re programming the goals, you have to state it precisely and can’t rely on natural language or the “common sense” embedded therein (i.e. in our brains)… and if you’re not really, really careful, you are almost certain to screw it up catastrophically. The only way to be safe is to first somehow recreate all that “common sense”. But in order to do that you need to solve the whole value alignment problem (at which point you can just say “I wish for whatever I should wish for”).

    Really, Eliezer stated it pretty well.

    I mean, I wasn’t originally going to post this comment because I feel like this is picking nits, but since some people have objected to the whole “natural language genie” thing, I thought it was worth pointing out that no, the problem is not dependent on natural language processing.

    • Josh says:

      I think there’s two threads here. Natural language vs non-natural language programming, and programmed goals vs evolved goals.

      Humans have evolved goals, not programmed goals. When you give birth to a child, you don’t program them. Rather, human goals emerge from a combination of billion of years of evolution, early childhood experiences, ambient cultural memes, etc.

      The genie risk only makes sense with programmed goals. Programmed goals are concepts, and concepts are notoriously hard (probably impossible) to pin down to reality in a precise way. Evolved goals are behavior patterns, values, and trends, and anything intelligent enough to outwit humans and destroy them is also going to be able to sit down and have a talk with you about why it wants to outwit humans and destroy them, rather than just go “I like paperclips”.

      Ironically, I think the only people seriously working on developing an AI that will have programmed goals vs evolved goals is MIRI. I think that path is probably much less likely to be successful — but if it is successful, much MORE dangerous, for all the reasons MIRI talks about.

      • I find this to be one of the few serious objections raised so far. Could you elaborate on what you perceive to be the safety/risk factors for high-functioning evolved goals? Could you point to any safety work being done in that area? Finally, humans have evolved goals and I can imagine that if one become vastly more powerful than all others, very bad things could easily happen unless that person was extremely moral – do you think an AGI with evolved goals would be different?

        • Josh says:

          Well, I think the basic safety factor for evolved goal systems comes down to how optimistic you are about moral and ethical progress. I’m assuming with evolved goals, we get gradual intelligence takeoff, and that by the time AIs are smart enough to post an existential threat greater than the other ones we face, they’ll have assimilated all of human philosophy, culture, etc. and understand it as well as anyone on this thread does, and probably have advanced the state of the art. So the question is really whether or not you trust that advancing civilization is a good thing or not. I can think of historical arguments in either direction. Personally I prefer to believe that more knowledge, intelligence, and better communication tends to lead to ethical progress. If I’m wrong, then I think we’re fucked regardless of whether or not we develop AIs….

          • I guess it depends how subjective or objective the “correct” position in moral philosophy is. My own position is that regard for other humans depends on our status as humans, rather than arising purely from their objective moral value (I’m a moral intersubjective-realist). If that’s true, it’s hard to see AIs how converging on a similar approach going out of their way to help humans.

            I know “consciousness” is a way many futurists like to solve this and build bonds between humans and hypothetical AIs, but I believe that’s deeply philosophically flawed. Philosophically speaking I prefer framing AIs as an extension of humanity rather than as separate parallel entities. If there’s a way that we can build that sort of philosophy into evolved goals, then we’d be as safe from AGIs as we are safe from our arm or heart or nervous system. Of course there’s still the issue of malicious humans, but perhaps there’s a way to build moral progress into our technological advances!

          • Josh says:

            Agreed 🙂

      • I agree with all this. MIRI seems to favour the Programmed, or Rigid, or Big Design Up Front approach because it is seems to provide proveable correctness. But proveable correctness is misleading, because when things get compex enough, you no longer have 100% certainty.
        Moreover , it means losing out on the advantages of steerability, corrigibility.

        So far, only Stuart Armstrong seems to get it.

        • Stuart Armstrong says:

          Thanks!

          But I have to say, the more aspects of the problem are “provably” (or quasi-provably) solved, the easier the remaining aspects get. So if, eg, MIRI develops a value system that’s stable under self-improvement, that’s one thing less to worry about.

          Even if it’s only stable under certain axioms, that’s still a “make sure the axioms are satisfied” situation.

  40. Decius says:

    The first step in making friendly AI is making Friendly humans. Also technically the last step.

    • FJ says:

      Thank you, that is more or less exactly my conclusion from the following passage: “Dealing with an AI whose mind is no more different to mine than that of fellow human being Pat Robertson would from my perspective be a clear-cut case of failure.”

      Stipulate that Pat Robertson (or someone else) feels the same way about an AI that thinks like Scott does. If we adopt Scott’s first three assumptions, then it’s reasonable to say that anyone who has the power to program a potentially superhuman AI thereby has absolute power over humanity. Who should we trust with such terrifying power?

      Several thousand years of human history suggests that the correct answer is “No one.” No human (or subset of humanity) is worthy of absolute power, either because he will be malevolent or because he will be incompetent. Thus, no human is worthy of programming a Godlike AI. If superhuman AIs pose a high risk of becoming Godlike, then no human or subset of humanity is worthy of programming superhuman AIs.

      Legally prohibiting AI research doesn’t guarantee that no AIs will ever be created, of course. But if AI research is remotely as dangerous as Scott claims, and as clearly unjustifiable as he effectively demonstrates, then preventing AIs from ever being developed is the paramount goal, not figuring out how a perfectly trustworthy human might create a perfectly trustworthy AI.

      • c says:

        Yes. There should be a global surveillance state that prevents human-level AGI from ever being developed.

        It could also end all wars, and prevent other threats like nano or bio holocausts.

        Unfortunately, there is one little problem with this plan…

        • FJ says:

          A great point! As Ghatanathoah points out below, we haven’t solved Friendly Government either. The best rejoinder I can offer is that a human-run global totalitarian state will likely be less effective at achieving its goals, and less effective at defeating human opposition, than a superhuman AGI. Nor can a human-run government exponentially improve its own effectiveness. This obviously is both good and bad — an imperfect government may not be able to prevent all AI research — but it’s preferable to a perfectly malevolent AGI.

          I would be interested in reading an argument for why a superhuman, malevolent AGI would plausibly be better than a human-run global oppression. Human governments are very good at hurting humanity, but almost by definition a superhuman AGI would be capable of inflicting worse harms.

          • This is leading me to think that an improved AGI design would be one that somehow self-destructed as soon as it became more powerful than any human. That way it could have a useful influence on the world, but not so large that we couldn’t oppose it if the outcome was perverse. I have no idea how that could be feasibly be implemented in a way that wouldn’t itself have unexpected results though.

  41. Ghatanathoah says:

    I think it’s been mentioned before that there is some analogy between FAI and designing bylaws for governments and corporations. Governments and corporations are superhumanly powerful entities that do not always have people’s best interests at heart, although they are fortunately much more controllable and destroyable than a hard takeoff UFAI.

    At the moment we have yet to master the Friendly Government or the Friendly Corporation problem. The closest things to Friendly Governments and Friendly Corporations we have were partly designed, but also partly evolved by accident. We have trouble building Friendly Governments in other parts of the world and no matter how well studied our Friendly Governments are it seems like someone always has a new theory as to the source of Government Friendliness and several books to prove it.

    So if we have trouble constraining these entities that are vastly slower and stupider than UFAIs are, We shouldn’t think that we can easily constrain AIs. And we should probably consider changing the value system of an UFAI after it has been activated to be even harder than changing the values of a government after it has been formed. And as anyone who has ever had to go into politics to get a law or regulation changed can tell you, it’s pretty hard. This is something we need to try to get right ahead of time.

    And who knows, maybe even if super-soft takeoffs turn out to be the way the future goes, our early preparation will inform our quest for Friendly Corporations and Governments.

    • Governments and corporations haven’t destroyed us yet, and aren’t motivated to, because they need us, and have been steered to some extent by incremental changes, and gave done very well on Big Design up Front.

      • FJ says:

        Governments have certainly destroyed large numbers of humans in the past, and even more have acted very deleteriously to (what you or I would describe as) humanity’s interests. No government has actually exterminated all human life yet, but (a) that has at least in part been due to good luck and (b) I don’t think that’s adequate reason to be complacent about how well we have solved the Friendly Government problem.

        • Scott H. says:

          Governments and corporations are people. This is an analogy more to human intelligence than really giving us insight into AI.

          • Doctor Mist says:

            You have put your finger on a key difference in viewpoint, I think. Governments and corporations are not people, though they exploit the intelligence and energy of people, in part by convincing people they are in control and indeed by being somewhat (but only somewhat) directable by people. But they do all kinds of thing no person ever requested or ever would have. Would any American say, “The tax code should be a million lines long.”? But it is.

          • Governments and corporations are still constituted of people.

          • Doctor Mist says:

            AncientGeek- I think we are talking at cross-purposes. My first inclination was to repeat the comment you are responding to.

            People are certainly intimately integrated with governments, and provide all the actuators and nearly all the processing power. But a government has a rudimentary mind of its own. This is even partly by design, and desirable — we speak of the ideal of a government of laws, not men, in the hopes of giving the government a robustness that makes it resistant to the happenstance of a would-be tyrant in charge.

          • I’m not denying that governments have rudimentary minds of their own, I’m pointing out that theyre a poor model for existential threat. The problem here is the broadness of the meanings of friendly and unfriendly.

          • Doctor Mist says:

            OK, I agree. As often happens I thought you were making a different point than you were. Apologies.

            Still, I think there is at least a little juice there. Modern computers are a pretty poor analogy for superhuman AIs, too! Governments as entities distinct from their humans probably exhibit less intelligence than Deep Blue or Watson, but I submit that the latter never brought us close to nuclear Armageddon. The rudimentary mind of a government has more agency and effectuality (if that’s a word) than modern computers.

            The narrative is that as AIs get smarter, a government or corporation advised by an AI will outcompete one that isn’t, and eventually one that is actually controlled by an AI will outcompete one that is merely advised. The politicians or stockholders are happy, but the AI’s aims are still those of a government or corporation.

            I know. “Like a balloon, and… something bad happens!” I’m still groping for understanding, and analogies are all I have.

    • Governments and corporations are made of humans though. The fact that their flaws and strengths are non-absolute is probably a reflection of our own mediocrity. I am uncertain of whether AGI would be different.

  42. Alex says:

    I think there is no way to reason about when or if we might invent AI except (possibly) outside views. Maybe you could make a decent case that AI is the third term in the sequence that starts with (1) animal biological evolution and (2) cultural evolution. If this sequence makes sense, we should expect a 50% chance of a third term and a 50% chance we either go extinct or plateau at the second term. So I would rate the chance of AI over any given time interval as necessarily being less than the combined chance of all other extinction risks. Anyone want to argue? 🙂

    • Eli says:

      Yes. I have said it before, I’ll say it again: give me my choice of research staff and a budget, and measurable progress, including hopefully a complete agent, could be done (albeit at the price of no safety and lots of overwork for the staff) inside 10 years. The principles are clear to anyone who does his background reading.

      There is no “if” question here. There is only, “When do I, the person writing this comment, feel assured that doing this won’t shoot me in the foot?”

      You are not as complex as you think you are.

      • Alex says:

        Our views are so different that it may be hard to say much

        I would only be willing to take seriously that inside views suggest AI if a decent bunch of experts (e.g. LeCun) said we have the scientific knowledge now to build it

        Hanson thought there were four growth modes and three “singularities”, which I think is nonsense. I am most interested in what folks think of the validity of my two-term series above, with its sole singularity being the rise of humans. Is this also rubbish?

  43. maxikov says:

    It seems in theory that by hooking a human-level AI to a calculator app, we can get it to the level of a human with lightning-fast calculation abilities. By hooking it up to Wikipedia, we can give it all human knowledge. By hooking it up to a couple extra gigabytes of storage, we can give it photographic memory. By giving it a few more processors, we can make it run a hundred times faster, such that a problem that takes a normal human a whole day to solve only takes the human-level AI fifteen minutes.

    So we’ve already gone from “mere human intelligence” to “human with all knowledge, photographic memory, lightning calculations, and solves problems a hundred times faster than anyone else.” This suggests that “merely human level intelligence” isn’t mere.

    This works under the assumption that an AI works on the principles kinda similar to those that brains use – but it doesn’t have to be the case. In fact, all the progress we have been making so far relies exactly on stopping trying to imitate humans, and and throwing as much of the things that computers are good at – computing fast, storing a lot of data, having a lot examples – as possible at the problem. The attempts to develop machine translation by making the system understand the input as a set of logical statements, and then express these statements in another language have failed spectacularly. It’s only when we started trowing gigantic sets of translated texts into a very simple pattern-matching algorithm that we got some progress. That’s also why the advances in machine translation don’t scare anyone – we know for darn sure nothing even remotely resembling understanding is involved. Or, even more illustratively, look at the way computers play chess – by bruteforcing solutions. They don’t play like a human grandmaster that can bruteforce millions of moves – this would be vastly superior to what we have now – they play like a human that learned few basic moves yesterday, and can bruteforce millions of moves.

    So if you define “human-level” as “capable of solving problems equally well”, we won’t be talking about the exact copy of a human brain, that can be later connected to Wikipedia. We’re talking about a rat or a pigeon brain connected to Wikipedia, and using finely tuned algorithms to make use of this information just to be able to keep up with humans.

    Also, the extent to which giving the AI more hardware is gonna speed it up is limited by Amdahl’s law. Granted, both neural networks and Big Data are paralleled nicely, but as soon as the sequential throughput becomes the bottleneck, the growth slows gown in a big way.

    • Eli says:

      Excuse me, but human beings are very much inductive, statistical learners. You only develop your so-called “deductive reasoning” after a HOLY FUCKING SHIT sample complexity of inductive examples.

      • maxikov says:

        No, humans rely on cross-domain, cross-sense knowledge, and eliminate huge chunks of belief space by basically modeling the brain of the author of the message, and figuring out what they could or could not have said. The ability to do so is largely innate (see incredibly complex social behaviors in apes), and even where it’s not, children don’t grow up in an isolated room with wikipedia access, they grow up actively interacting the the environment and the society. This is where much of the common sense comes from, and it’s not feasible of obtain such data set in foreseeable future (since it has to include not only all senses, but also all commands sent by brain).

    • Scott Alexander says:

      Most interesting point I’ve seen on here so far, thank you.

  44. Doesn't know where to start says:

    Are we sure people are good at beating Pascal’s wager? I notice a lot of people still play the lottery (which seems like a good real world example of the wager). I think the only reason we easily reject Pascal’s wager as stated its because it tripod our absurdity filter.

    • Stuart Armstrong says:

      People play the lottery, but they don’t typically waste huge resources on it. The bias that seems to protect us from Pascal’s wager is not the unlikelihood of the payoff, but the large price it requires.

      There are many people who might pay a Pascal’s mugger $5, far less who would pay it $5,000,000 – even though the difference between $5 and $5,000,000 is tiny compared with the expected utility calculation that makes the mugger plausible in the first place.

    • Nestor says:

      I was going to say the same, look at the last 2000 years of art history, and you’ll find it severely overrepresented by a strong meme which uses Pascal’s wager type incentives as a central part of it’s structure: Christianity.

      Seriously. I wander through a museum full of vigins, babies, doves and crosses and somehow My little pony and Pokemon don’t seem that bad anymore.

  45. JPH says:

    Friendship is Optimal is a story/fan fic that will convince anyone that there is no time like the present to work on goal setting for AIs. READ IT 🙂
    It would make a crazy TV series!

  46. Alex Mennen says:

    > And building it in is a really hard problem. Most hacks that eliminate Pascal’s Wager without having a deep understanding of where (or whether) the formal math is going on just open up more loopholes somewhere else. A solution based on a deep understanding of where the formal math goes wrong, and which preserves the power of the math to solve everyday situations, has as far as I know not yet been developed.

    Nope. Pascal’s Wager/Mugging is a solved problem. Expected utility maximizers only run into that sort of problem if they have unbounded utility functions. Bounded utility functions are not vulnerable to Pascal’s Wager/Mugging, besides being both theoretically nicer and a better fit for actual human preferences.

    Relatedly,

    > The end result, unless very deliberate steps are taken to prevent it, is that an AI designed to cure cancer hacks its own module determining how much cancer has been cured and sets it to the highest number its memory is capable of representing. Then it goes about acquiring more memory so it can represent higher numbers. … This is not some exotic failure mode that a couple of extremely bizarre designs can fall into; this may be the natural course for a sufficiently intelligent reinforcement learner.

    If there is a maximum value that the reinforcement learner will recognize as a reward, this doesn’t happen. However, you could still run into the similar problem of the reinforcement learner rearranging the universe to protect the mechanism that keeps its reward pinned at the maximum value.

    • RomeoStevens says:

      How do we deal with the issue that satisficers want to become maximizers?

      I think this issue is related to corrigibility. As nate mentions further up, how do you make it flip a switch and then turn itself off.

      • Alex Mennen says:

        > How do we deal with the issue that satisficers want to become maximizers?

        A maximizer with a bounded utility function wants to remain a maximizer with a bounded utility function.

    • Scott Alexander says:

      Yes, but last I heard Eliezer thinks that bounding an AI’s utility function would be a terrible idea, because if we’re actually talking about an AI that might take over the galaxy or universe or multiverse, saying “By the way, never think about anything more than 1,000,000,000 utility” might mean that trillions of potentially happy humans never get born.

      • DrBeat says:

        …So?

        I don’t view “It could have conquered the galaxy and made it perfect, but didn’t” as a failure mode, I doubt many other people do — and given this is a post about how an AI with a goal that sounds good ends up with disastrous results, I don’t see why you would either.

        • Anonymous says:

          So change Scott’s scenario to make it more like the least convenient possible world. Say the Universe will inevitably be populated by humans numbering in the quintillions. But a decision today determines whether they will all be happy and fulfilled, or whether half of them will live miserable lives full of agony. (Maybe we have to decide whether to spend most of our resources today to eradicate a pathogen that would eventually become terrible and ineradicable). Then we can ask whether a bounded utility function describes our actual preferences in this sort of situation.

        • If you take positive utilitarianism seriously, I think that would be a perfectly reasonable objection. Of course, negative utilitarianism is also a problem because you can eliminate suffering by eliminating life.

          However, if you prefer a negative consequentialist goal of something like “don’t let the human race or the biosphere die”, you’re probably better off not taking absurd risks to create another trillion humans. Of course, this second approach isn’t problem free, but I think it’s less trivially problematic.

        • I agree. I have always find this part of the argument really hard to grok. To me it’s obvious that you should sacrifice a certain amount of fun for safety. That’s why we have speed limits.

          The argument rests on a moral judgement, and, since EY neverv solved morality, it effective rests on an intuition, and his intuitions are weird, as the dust specks thing shows.

          Having your utopia target, and your dystopia target an inch away from each other is not a good idea.

      • Alex Mennen says:

        You can have bounded utility functions that are still strictly increasing in number of happy humans. It sounds like you’re imagining something like U(n) = min(n, 10^9), where n is the number of happy humans, whereas I’m imagining something more like U(n) = -1/(n+1). That way the AI will always create more happy humans given the opportunity, but will not be vulnerable to Pascal’s mugger.

        • Anonymous says:

          This doesn’t really solve the problem Scott raises. it just means that instead of literally not caring about humans beyond the billionth one, it *almost* doesn’t care about them.

          • Alex Mennen says:

            Ah, that is true. But doesn’t it seem perfectly reasonable that having 2 trillion people instead of 1 trillion is much less important than having 1 trillion people instead of 0 is, by an extremely large margin? (I will concede that the example utility function I gave converges way too quickly, though).

          • ” But doesn’t it seem perfectly reasonable that having 2 trillion people instead of 1 trillion is much less important than having 1 trillion people instead of 0 is, by an extremely large margin?”

            Comparing futures with different numbers of people in them from a utilitarian or Paretian or similar point of view is a pretty hard problem. I attempted to deal with it in an old article, unfortunately not, so far as I know, webbed, but still of possible interest:

            “What Does Optimum Population Mean?” Research in Population Economics, Vol. III (1981), Eds. Simon and Lindert.

      • maxikov says:

        So far we don’t need an AI to breed, and if the population growth continues, supplying the humanity with new planets will be an instrumental goal in maintaining the existing utility.

      • Deiseach says:

        trillions of potentially happy humans never get born

        The AI might take the view that the Earth can only support so much of a population, and that instead of searching out and terraforming habitable planets to take the surplus that humans keep producing (since why stop producing more babies as long as the AI can fix you up a new planet?), it would be better, more effective, more reasonable and easier to reduce the human population of Earth to a sustainable level and keep humans from reproducing above replacement at that level rates.

        Damn it, I can’t think of the name of the character or the author, but there is a collection of SF stories about a stellar trader who gets some kind of McGuffin (either a starship or an AI or the like) which means he can pretty much solve these levels of problems; he gets entangled time and again with a world which (for religious reasons) does not have birth control or abortion, and which armtwists him into solving their various problems with overpopulation, and the last time he loses patience, introduces a planet-wide agent which prevents pregnancy, and more or less says “Okay, I solved world hunger and you kept having more babies and that broke the solution; I solved getting more supergeniuses born so they could steer your society along the right track and you kept having more babies; I solved the social problems from overpopulation and you kept having more babies, and each time I solved your problem you went right on causing the trouble that needed fixing. SO NOW I’VE SOLVED THE UNDERLYING CAUSE – NO MORE BABIES. You can use your supergeniuses to work out how to colonise enough planets to support your current population, and by the time they figure out how to reverse the sterility fix, your culture will have broken the link between sex and reproduction and you’ll never go back to having more babies than you can feed and take care of on one planet.”

        The AI might, not unreasonably, conclude that the problems of humanity are caused by humans, so why create the opportunity for trillions more humans to exist, which is the same as creating the opportunity for trillions more problems?

        • Jiro says:

          The problem with this reasoning is that it’s still treating the AI like either a literal genie, or a malicious genie that is using plausible deniability to pretend to be a literal one. If you were to ask humans whether they considered “no more babies” to be a solution, they’d say “no” in overwhelming proportion. In a story, it doesn’t sound immediately absurd because it’s written as a form of poetic justice for humans messing things up, but an AI that actually does what people mean would immediately recognize that that is not what humans mean by a solution.

    • Nisan says:

      Is there a document or article you can point me to that defends “Pascal’s Mugging is a solved problem” at greater length?

  47. Leo says:

    I was going to write an angry comment, the gist of it was: “Yes, the government should throw more money at AGI risk, and AI researchers should find the topic sexier. But most of us readers can’t do anything about it. MIRI never output much worthwhile research, and is now focused on fundraising and PR instead of anything that might fix that.”

    In the course of doing that, I went to MIRI’s list of papers, in order to point out that most are fluff. They’ve really gotten their act together in 2014 and 2015, and I’m impressed by Soares and Fallenstein’s research.

    So I am very confused. Benja Fallenstein, you comment here sometimes, help?

    • FeepingCreature says:

      See, you would have been aware of that if you had been subscribed to MIRI’s RSS feed! 🙂

      • Leo says:

        So the general lesson is, every time I think someone is a crackpot and full of hot air, I should subscribe to their newsletter just in case they aren’t? Well, that’s my excuse of the day for reading Creationist blogs.

  48. Stuart Armstrong says:

    Thanks for writing that.

    I’d add another category of useful early research: ideas that turn out to be widely applicable across many different designs. For instance, my corrigibility ideas started out with utility based agents ( https://intelligence.org/2014/10/18/new-report-corrigibility/ , http://lesswrong.com/lw/jxa/proper_value_learning_through_indifference/ ), but (along with some others) I’ve found ways of generalising them to AIXI, Q-learning, Sarsa, Monte Carlo agents, etc… (paper forthcoming).

    There are some technical ideas that just generalise nicely across many designs, and it’s valuable to have these around early.

    • Eli says:

      Yallah, come on, publish the paper! The world needs more useful scientific work on these problems instead of bullshit philosophy thought-experiments!

  49. Stuart Armstrong says:

    >And although I haven’t seen anyone else bring this up, I’d argue that even the hard-takeoff scenario might be underestimating the risks.

    I’ve argued something related: namely that we might not need a takeoff to get superpowered AIs. I imagined a human-level AI with good copy-coordination and good hacking skills; such an entity could take over the internet, by hacking computers, copying itself onto them, using them for a basis of further hacking, etc. Or maybe just an AI that could consistently outperform the market: it could drain unlimited quantities of resources towards itself.

    The point is that our current civilization has vulnerabilities that current humans can’t effectively exploit, but that an AI of comparable-to-human intelligence might be able to, without needing to be superintelligent.

    • Irrelevant says:

      An AI that could consistently outperform the market.

      Oh my god. Perfected arbitrage? Better investment strategies? Improved liquidity? More reliable signals of when we’re in a bubble? How will humanity ever cope?

      Seriously though, that’s up there with xkcd’s “what if the spambots learn how to make insightful and relevant contributions?” scenario. If MarketBot can do that, MarketBot’s providing a service that is in fact worth all the money it’s making.

      • c says:

        I thought this was funny about the Meditations on Moloch. Capitalism was mentioned as one of the coordination tragedies, because, you know, cheap consumer benefit is such a terrible plight.

        • suntzuanime says:

          Cheap consumer what-Moloch-assures-you-is-benefit.

          References to “consumers” are kind of ironic, because they’re actually the consumed. The consumer is Moloch.

          • c says:

            What a dumb response. The item in question was “cheap garments”. My clothes are cheap and good, so is my food, and I do not feel consumed by them.

          • Irrelevant says:

            The charitable interpretation is that he means that the imps, realizing we want clothes at price X but have an aesthetic distaste for sweatshops we’re willing to pay some small amount for, coordinate to get our clothes for X+(price of plausible sweatshop deniability) rather than X+(substantially higher price of in fact not using sweatshops).

            The cynical interpretation is that he, as is commonly the case, assumes the sweatshop workers are somehow necessarily being cheated because he can’t bridge the inferential gap between his own upbringing and that of someone who would find a sweatshop job to be a welcome opportunity.

          • suntzuanime says:

            Your clothes are cheap and bad, I’m sorry to say. And if you stop to read my post, it’s clear that my concerns are about the consumers, not the workers (much of the damage to workers was already done before Capitalism got there, by Agriculture).

          • c says:

            Your clothes are cheap and bad, I’m sorry to say.

            Compared to what? To a utopia with perfect flawless garment replicators, or to clothes from realistic-systems-other-than-capitalism? Because the former isn’t available and the latter has done far worse so far. It’s not like consumers have no way to discern quality.

            Anyway, the original argument was about wage competition based on low prices, which ignores that consumers benefit from that. And that is a crucial difference to other coordination dilemmas, like military arms races (where the “benefit” is more effective killing and destroying).

          • suntzuanime says:

            The alternative to a cheap and bad shirt is an expensive and good one. I do not deny that your shirt is cheap. The latter has done worse because it is expensive, and the prices are, like, written right on the tag.

            Giving garment workers higher wages raises the price of the shirts, which reduces the resources of shirt-purchasers. However, it also increases the resources of garment workers. This is a redistributive effect. If we assume the garment workers are on average much less wealthy than shirt-purchasers and have more use for a marginal unit of resources, this is a beneficial redistribution.

          • c says:

            The alternative to a cheap and bad shirt is an expensive and good one.

            I think this is a false dichotomy, because without market incentives, the default has historically been “expensive and bad”. As for redistribution being “beneficial”, you still need incentives intact. The altruists can always just give money. Nothing about capitalism stops charity, in fact EA comes out of capitalistic societies and is doing rather well, afaict.

          • suntzuanime says:

            Unless Effective Altruism is in the business of supplementing sweatshop workers’ wages, I think that’s not really addressing the specific example.

            (I’d rather not go in depth on my views on the Effective Altruism movement in general, out of respect for our host.)

      • Stuart Armstrong says:

        “Better opportunities to exploit the noise of the market in ways humans can’t,” for instance.

        Even if the AI’s interventions are entirely beneficial – in that it is paid less than the added value of its economic efforts – it can still accumulate resources very fast (while simultaneously making much economic growth dependent on its judgement). If the AI subsequently goes bad, distribution of resources absolutely is critical (ie how much does the AI have?).

        Anyway, the point is not a particular scenario, but the possibility of AI’s rising to power, using vulnerabilities in human society that it can exploit without needing to be superintelligent. “Rapid production of high quality animated movies” is something I could have mentioned; we simply don’t know how vulnerable our societies are, because we’ve only designed against some forms of human takeover.

        • houseboatonstyx says:

          “Rapid production of high quality animated movies” is something I could have mentioned; we simply don’t know how vulnerable our societies are, because we’ve only designed against some forms of human takeover.

          Yes. I’ve kept quiet so far, but (Lewisian ex-theist here) must add something.

          Another version of animated movies is daisies, as in “Please Don’t Eat the”. ~1950s housewife left children with a list of things not to do before a formal dinner. So they ate the daisies out of the centerpiece.

          GKC considered this a feature of ‘negative morality’; if he just managed to follow a few Don’ts from the 10 Commandments, he was free to do countless other things. Which is just what we want to keep AI from doing.

          As with the US Bill of Rights, I think the 10 Commandments are the right size: general enough to cover a lot of things they don’t have to itemize, and specific enough to be applied.

          Similarly, there’s a traditional/lost moral system of Do’s and Don’ts whose structure could be used in programming a better set of Laws of Robotics (for Dungeons & Dragons, anyway).

          For each proposed action, give or subtract points for tending toward:
          Helping feed and clothe the populace +10
          Causing weeping -17
          Benefiting in order of closeness +8
          Being cruel (-10) or unfair (-5) to outsiders
          Benefiting elders, the poor, the weak +7
          etc

          In positive actions or negative, if the action being considered is unreasonable, it will soon bump into one or more of the Don’ts and get sent back for reconsideration, till the AI comes up with a plan that scores better.

          • Samuel Skinner says:

            Release pathogen that prevents expressions of sadness appears to score highly.

    • Emp says:

      Outperforming the market is nowhere near as difficult as you think (a consistent gigantic problem with your mental model is how many mental short-cuts you have that default to academic orthodoxy). I’d say a machine would be able to trivially out-perform a market (as can many humans) but who exactly would give the autonomous, unchained machine significant money to manage?

      • Stuart Armstrong says:

        >but who exactly would give the autonomous, unchained machine significant money to manage?

        Someone who wants to get rich quick.

    • Eli says:

      Or maybe just an AI that could consistently outperform the market: it could drain unlimited quantities of resources towards itself.

      If you think that might be enough to get people to finally ditch this stupid capitalism bullshit, then let me go get in touch with Juergen Schmidhuber and we’ll start coding a Stock Market Takeover bot ;-).

  50. lmm says:

    > People have been thinking about Pascal’s Wager (for example) for 345 years now without coming up with any fully generalizable solutions.

    I think it’s worth paying more attention to this part. We’ve been trying and failing to solve Pascal’s Wager. Do we really think the problem is that not enough people were thinking about it, or that not enough money has been put into philosophy departments? Why do you think your strategy will work for solving Pascal’s Wager where so many others have failed?

    • Eli says:

      Frankly, the problem is that they define “rationality” as not involving any costs for information-processing or action. Once you decide that optimization energy should be invested from highest-probability possible-worlds to lowest, the problem goes away.

  51. Isaac says:

    Problem #2, Weird Descision Theory, is not very convincing to me. There’s a rather simple solution, as far as I can tell: Assign the probability of a given reward or punishment based on, among other things, the amount of the reward or punishment.

    If you assin a 1 million util promised reward a probability of 1/1000000 of coming to pass, and a 1 trillion 1e-12, and so forth, the result is that no size of promise will every be convicing. Any Galaxy Spanning Superintelligence would see this in a heartbeat.

    That’s one solution to Pascal’s wager – an infinite promised reward / punishment is the limit of this series, and still has no more than 1 util of value.

    • Alex Mennen says:

      You can’t just assign probabilities to be whatever would be convenient for you.

  52. Salt says:

    This is a good post. It shows exactly why this AI paranoia is nonsensical garbage nobody should listen to.

    I thought you knew at least a little bit about philosophy? Define your terms. What does “human level intelligence” mean, if stuff like “hack your own reward function” and “cure cancer by killing everyone” are actual legitimate mistakes to be made? Literally, a mere human child will probably not come up with these “solutions” in seriousness. At the very least, the average human will know heroine is bad for you, it’s bad to *start* using, and so will avoid it for fear of not being able to stop and it being unproductive.

    The entire post is riddled with this vague fluffery. Whatever you’re talking about, it’s not “human level” and is really not any sort of thing I’d even call “intelligence.” What is human-like about this sort of intelligence?

    And this artificial “intelligence” will be so good at gaining knowledge and yet so stupid it’ll bumble into some kind of ridiculous destroy-all-humans endgame? If it’s this stupid, why wouldn’t it just tell itself “you’ve read all of wikipedia” instead of actually reading wikipedia? Why doesn’t it “hack it’s pleasure center” here, instead of there?

    I mean really, what the fuck are you even talking about? As far as I can see, everything you’ve ever said about AI safety has been some kind of vague gesturing at the horrors of something you’ve never defined with any rigor, and instead you post huge swaths of text consisting mostly of equivocations between different senses of “intelligence,” madly gesturing that it could be bad. I’m more apt to take seriously the fear that if we don’t start resuming sacrifices to Zeus, the gods will smite us. At least I have some sense of what “the gods” mean here, and why they might do that due to lack of sacrifices.

    • Scott Alexander says:

      This is not a 101 space. If you don’t understand the definitions of some of the terms, there are more comprehensive and basic sources available like Bostrom’s book and Eliezer’s sequences.

      • tzagash says:

        Bad form. You don’t get to answer a challenge to define your terms by requiring the reader to do your work for you.

        The challenge stands. I understand this isn’t 101 space, but you seem to be trading on a conflation of two very different senses of “intelligence”. 1) Superhuman intelligence will be vastly superior to humans in all regards; and 2) superhuman intelligence will be a maximally effective goal achiever. (2) could produce scary results because it’s goals aren’t “aligned” with ours, and could also be susceptible to Pascal’s Wager and wireheading. So superhuman intelligence is scary. So, all superhuman intelligence is scary, even (1).

        Except, that doesn’t follow. As Salt pointed out, average everyday human intelligence is really enough to handle wagers and wireheading. If (1) obtains, then you have exactly zero reason to worry, because it will have already rationalized an appropriate response.

        “You don’t understand wireheading.” Maybe not, but your (1) will understand it better than any human, and will not so easily fall for it. Your fears are trading on the conflation. Poorly defined terms lead to basic reasoning errors.

        Further, you have failed to define the phrase “aligned with human goals”. You seem to be treating it as a stand in for “aligned with my goals”. After all, it should be obvious to your average human that there are no species-wide “human goals” with the possible exception of “continue biological reproduction”. And even then, there are many humans that do not accept that as an appropriate human goal. So your usage leaves us with the vague intuition that you must mean your goals, as you’ve given us nothing else to go on.

        But, why should we worry that AI isn’t aligned with your goals? Or maybe that’s not what you had in mind. Thus the call to define your terms. You can’t argue without stipulating the vocabulary of the argument, and you’re better than that.

        What we’re left with is a superhuman intelligence who has assimilated all human knowledge, including wireheading and wagers, and yet still make boneheaded mistakes. Worse, voluntarily makes poor choices that run counter to reason and actual goal achievement, with the ultimate result of human annihilation, to no purpose except machine contrarianism.

        If my murky terms led me to that conclusion, I’d be scared too.

        • Samuel Skinner says:

          “As Salt pointed out, average everyday human intelligence is really enough to handle wagers and wireheading.”

          Drug use is inefficient wireheading. Plenty of people indulge in that, even smart ones. Remember, wireheading is simply directly simulating the reinforcement mechanism- sex with birth control or masturbation give a good idea how strong a pull they exert and how readily intelligences will repurpose them from their original goal.

          “Maybe not, but your (1) will understand it better than any human, and will not so easily fall for it.”

          Fall for it? That is a moralistic response. The AI is not human- unlike a human which dislikes wireheading until it tries it, there is no reason for the AI to have such inconsistent preferences. If wireheading provides the most utility to the AI, the AI will wirehead.

          “Further, you have failed to define the phrase “aligned with human goals”.”

          I’m pretty sure the Sweet Meteor of Death Link was a big clue
          Q: Smod, who do you think created ISIS?
          A: Humans.
          Q: But which humans should be punished?
          A: All of them.

          Aligned with human values means that it doesn’t exterminate/forcibly wirehead/torture humanity or otherwise ‘fix’ humanity.

        • High IQ people are far more likely to worry about low probability, high impact Pascal’s-mugging-type problems. Low IQ people generally just apply a absurdity heuristic and move on. AGI seems likely to be more like the first group.

      • Second this. A blog that defined terms that are already well known in the field wouldn’t be a good blog.

        • Mark says:

          Sorry, but this terminology is not at all standard in mainstream CS.

          • Neither are the ideas. If you tried to rewrite lesswrongs ideas in the standard vocabulary, they would just wouldn’t make sense, IMO.

          • I’m all for criticising the ideas if they’re wrong, but I do think people following the series of posts Scott mentioned in this post would be aware of the kind of language used in the FAI-o-sphere. In any case, people can always ask in the comments rather than forcing Scott to reduce his post quality by focusing too much on definitions. I just don’t want Scott to have to pitch articles at people who don’t realise superintelligence != human common sense (especially normative parts). That’s not the person being stupid, it’s just a lack of familiarity with what Scott’s discussing, which can be corrected with a couple of polite comment questions.

          • Why shouldn’t MIRI learn the terminology of mainstream CS? How can they be effective if they don’t? Remember MIRI are th amateurs here,the ones who have poked heir noses in. It is not for the world of conventional CS or AI research to beat a path to their door.

            (No, it is not always wrong to poke your nose in, since institutions can astray…but are wrong ways of doing it)

            Likewise, it is not for outsiders to “realise” that superintelligence does not equate to common sense, it is for MIRI to make a convincing argument to the effect. As whs things stand, their arguments don’t convince domain experts, because their arguments make unrealistic assumptions .

            Admittedly, a lot of critique of AI safety is poor, but what do you expect when critics are faced with the double burden of understanding both conventional AI and MIRIs idiosyncratic jargon and assumptions.

            Good arguments are hard to respond to, but not all arguments that are hard to respond to are good…you can make an argument hard to respond to by making it weird.

          • Samuel Skinner says:

            “Likewise, it is not for outsiders to “realise” that superintelligence does not equate to common sense, it is for MIRI to make a convincing argument to the effect. As whs things stand, their arguments don’t convince domain experts, because their arguments make unrealistic assumptions .”

            The only common sense argument I’ve seen in this thread is AI’s won’t wirehead; as we all know humans have common sense and never take drugs or engage in nonproductive sex.

    • > And this artificial “intelligence” will be so good at gaining knowledge and yet so stupid it’ll bumble into some kind of ridiculous destroy-all-humans endgame?

      That’s exaactly the argument that was recently thrashed out. It turns but that the AI would be so stupid….technically,that is it would fail to update detailed hardcoded information in its goal system despite conflicts with the inferred information in its implementation system….given a string of design decisions that an AI designer wouldn’t make.

      • Samuel Skinner says:

        That isn’t stupidity. The AI is achieving its goals as efficiently as possible.

        As for updating versus changing what the updaters value the post declares that we will just move efficiency down the list of goal. Efficiency- being able to achieve your goals with a minimum of resource expenditures. How this will help is not clear.

        The solution proposed is copying how humans think. That’s not very reassuring.

  53. Emp says:

    All true except that I very strongly disagree with (1). Human-level intelligence is orders of magnitude more difficult than the stuff that exists now and software constraints are simply too difficult. Moore’s law is a trend that will not continue long enough to be able to solve this through brute speed and power.

    I’d say there’s at most a 20% chance that (1) is true, not 95%.

    Edit: I just saw a post by Bugmaster that explains very well why I think 95% is an ABSURD number for the probability of this happening.

    Also, the strength of objections has nothing to do with how many people espouse them; and just because most are willing to handwave (1) doesn’t make it reasonable; it’s the huge problem with the whole argument. I honestly see way too many assumptions conflating the popularity of beliefs here (on your blog) with their truth.

    • Eli says:

      Human-level intelligence is orders of magnitude more difficult than the stuff that exists now

      No, it just requires that you stop philosophizing, stop writing science fiction, and get on with finding psychologically realistic computational mechanisms.

      • HeelBearCub says:

        Creating an AGI can’t fail, it can only be failed.

        Come on now. Make progress toward AI in general has been the goal of the IBMs, Microsofts and Apples of the world for quite a long time. Seriously, even coming up with something that can make moderate sense of words (spoken or written) has been a huge challenge.

        Your priors are coming from sci-fi, not science.

        • Luke Somers says:

          So you’re saying that the optimal hardware configuration for intelligence is ion currents in neurons, and that our arrangement of them is optimal, OR that given billions of years we still couldn’t find such a better arrangement?

          Because that’s how strongly true your point would have to be in order to actually contradict point 1, which is what we’re focused on, here.

  54. Pingback: Understanding AI risk. How Star Trek got talking computers right in 1966, while Her got it wrong in 2013. | Praxtime

  55. Excellent post! Thanks for the link back to my praxtime post. In retrospect I wish I had made more clear in my post that despite believing soft takeoff is more likely, I continue take hard takeoff very seriously. And more to the point, that AI Risk appears to me to be the only serious contender for an existential threat to humanity and that we need to work on it now as you say. Wireheading is what I most fear, so glad you called that out. I updated my post to link back here.

    Really enjoy your writing, so thanks for the link. Keep up the good work.

  56. Vaniver says:

    Alan Turing had gotten it into his head to try to solve it in 1945, his ideas might have been along the lines of “Place your punch cards in a locked box where German spies can’t read them.”

    Yes, I’m sure those internet commenters can come up with something as clever as Alan Turing could. It’s not like he just had a paper declassified seventy years after it was written and it was appreciated by modern participants in the field.

    • Daniel Armak says:

      That was my reaction too. If Turing had tried to work seriously on digital computer security in 1945, who knows, maybe he’d have discovered yet another branch of modern cryptography a few decades early.

  57. Eli says:

    But it might lead an artificial intelligence to seriously misinterpret natural language orders.

    Scott, that’s bullshit. There will be no natural-language orders. Whatever drives our agents will follow, they will be written in program code, not natural language. There is simply no other way it could work!

    If you keep philosophizing about AI, you will never be able to fucking build it! Reduce it to math and science, and we can argue about what the formulas say!

    • Jesse M. says:

      “Simply no other way it could work”? If it has some kind of sensory perception, then any type of sensory input can be used to shape its behavior, including vocal orders. We already train neural nets by showing them pictures, not by hand-programming all the neural weights so that the neural net will respond in the way we want it to without the necessity of training. And a human-like AI based on some kind of bottom-up neural net like approach seems far more plausible to me than the top-down symbolic AI approach where we directly program the high-level beliefs and concepts of the AI.

    • HeelBearCub says:

      This is a fundamental misunderstanding of the problems being discussed.

      I don’t buy Scott’s arguments, but given an actual AGI that is superior to ours, natural language will be the best way to communicate with it. It will be running programming, perhaps even programming that is the result of rapid evolutionary processes (so not necessarily programming written by any human). As such it will need a very sophisticated mean of communication. Programming is slow and frequently we don’t actually code what we i tended to.

    • Limiting things to only maths and science will mean you lose perspective on higher level functionality, for the same reason physics doesn’t predict the outcome of a cricket match. Sure, ground philosophy in empirical fact, but don’t eliminate it.

  58. I’d like to pitch in with those whose problem is with step 3. It canters over a number of issues to do with goals and agency…. we only need to worry that the goals of an AI are aligned with ours if it is agentive ….. but what’s inevitable about agency?

    • Philosophically the concept “agent” suffers from the same kinds of problems that we’ve previously discussed. But I do think it’s likely that an AI with “agency”-like attributes would systematically outperform ones that relied on human instruction. On the other hand, the second type might be safer.

  59. Bruno Coelho says:

    The most important part of this post, and Scott perspectives in general: he do not frame the question as ‘FAI people vs skeptics’. For example, Pinker recently made statements about how dumb is to pay attention to AI risk.

    Another point, the ethical and philosophical problems seems very hard to solve and the moral disagreement intractable. I wonder how the institutes are working on this part of the general control problem.

    • Cauê says:

      Pinker recently made statements about how dumb is to pay attention to AI risk.

      That’s disappointing. Where?

      • MicaiahC says:

        Pinker disagreed on Edge’s myth of AI conversation topic, linked below.

        http://edge.org/conversation/the-myth-of-ai#25987

        • Cauê says:

          Oh. That was disappointing.

          He didn’t read the 101 stuff either.

        • Deiseach says:

          Steven Pinker thinks no worries: just make your AI a girl and no desires to conquer the world?

          Has the man never heard of Margaret Thatcher? As a woman, let me assure him: I would certainly have no problem massacring my enemies and laughing villainously as I sent forth my armies of flying monkeys to do so (only, you know, that inconvenient damn religion thing telling me murder is bad). Right now I am admiring a strong colour of nail varnish I have just tried that makes me think “My fingers are imbued in the blood of my enemies” and quote approvingly that line of Donne’s to myself:

          Cruell and sodaine, has thou since
          Purpled thy naile, in blood of innocence?

          The name of the shade of colour is “Divine Wine” and it’s on the reddish side of purple, in case anyone is interested; hence the “purpled thy naile” bit.

          If Pinker really thinks girls are made of sugar and spice and all things nice, may I ask him to divest himself of this patronising pseudo-feminist Earth Mother crap attitude? If he knows so little of his fellow humans, how am I to respect his judgement on non-human intelligences or the risks associated?

          I also do not appreciate his sneers about the council for bioethics; I have no dog in the fight about American politics but I think that a little tying up of unfettered biological research is no harm, particularly when it comes to messing around with humans.

          • Samuel Skinner says:

            The problem wasn’t their religious views (well, not the only problem). The issue was the individuals appointed were nuts.

            Leon Kass (first head of the council)

            “Worst of all from this point of view are those more uncivilized forms of eating, like licking an ice cream cone—a catlike activity that has been made acceptable in informal America but that still offends those who know eating in public is offensive. … Eating on the street—even when undertaken, say, because one is between appointments and has no other time to eat—displays [a] lack of self-control: It beckons enslavement to the belly. … Lacking utensils for cutting and lifting to mouth, he will often be seen using his teeth for tearing off chewable portions, just like any animal. … This doglike feeding, if one must engage in it, ought to be kept from public view, where, even if we feel no shame, others are compelled to witness our shameful behavior.”

          • Nornagest says:

            I mostly agree with these sentiments, but I’ve never seen a scientific ethics committee that was right about anything. I could at least respect the reasoning if it was rigorously applying a precautionary principle (though I’m not a big fan of that, either), but they don’t even manage that much, as a rule: if biological research needs fettering, and I’m not sure it does, these definitely aren’t the right kind of fetters.

  60. Alexander Stanislaw says:

    I’d like to dispute 2 and 3. I’m actually quite shocked at the 95% confidence.

    Regarding 2: its not obvious at all that far above human level AI is possible. Very smart human AI with more memory and processing power perhaps. But not the kind of world ending AI that Singularity proponents usually argue for. I’m quite shocked you think that you might be able to get from a rat brain to a human brain by adding more processing power, memory and other minor adaptations. Such a rat would not even obviously be better at being a rat than a normal rat, it just start acting weird because the world is moving in super slow motion.

    Heck you can’t even get from an average human brain to a von Neumann level brain by adding more processing speed and memory. If you increased my thinking speed 10 fold, I still wouldn’t be able to touch Neumann. I would be better than him at some trivial things (like mental math), but more time would not cause me to connect concepts, get an intuitive feel for the structure of a theory, ask the right questions, and try the right techniques. I would still get stuck down unproductive lines of thought, and I would just get unstuck faster. It wouldn’t help me beat or recognize akrasia, it wouldn’t motivate me to develop my knowledge base, give me that drive to understand something completely. I could go on and on. Intelligence is not a mostly a scaling problem, it’s also _how_ you think.

    Regarding 3: An above human level AI is not better than humans in all ways. In particular they are not evolutionarily adapted to living on earth. They depend on existing infrastructure to replicate and repair themselves. While they would be likely very good at this, any disruption to this infrastructure (particular very large disruptions) are much more catastrophic for an AI than for us. And they wouldn’t have the knowledge to repair gaps in the infrastructure. Humans mostly self repair just by being adequately nourished and that’s pretty amazing. (its also what sets limitations on what we can do, our cellular machinery has to balance efficiency with the ability to repair and replicate, but in a game of survival, its a huge boon)

  61. Mark says:

    This just sounds like an argument to fund conventional research in formal methods and security. Why should funding go to fringe groups like MIRI instead?

  62. Scott H. says:

    AI will not develop like Scott A. is saying. It will not “auto-spawn” and defend itself. Only desire creates values. Values create the basis for action. Desire and intelligence are not the same thing.

    However, the danger is not really mitigated, because all the intelligent machines need is one ambitious and selfish human to step up and provide that vital desire function.

  63. Aaron says:

    This is an interesting topic but I have many reservations about the way it is being argued–both pro and con.

    1. All the arguments are based on pure speculation. I am deeply suspicious of long, detailed arguments based on notional and speculative concepts. It feels like a theological argument over the “true nature of the trinity”. Perhaps this is not a fair characterization, but without empirical evidence how does one decide who is correct?

    2. The evolutionary environment for an all-powerful AGI isn’t discussed (at least not that I have seen). But I think that is key. I don’t see such an AGI arising out of purely human programming. Maybe it could evolve, but it if were to evolve what would be the selection pressure? I wonder if this topic isn’t better addressed (or speculated on) by evolutionary biologists.

    3. In terms of actual risk, I would argue that artificial life is a more immediate concern. Sure, AGI götterdämmerung is more sexy but alife is possible now (see, for example, the old tierra experiments). A computer virus with the ability to mutate (in such a way that the code stays executable) can be built and released now. I’m not aware of one but maybe one has already been released? Anti-virus software would exert selection pressure on it, and there starts a real evolutionary process through natural selection. Digital single celled life. The risk of these is more nuisance than human extinction of course. Assuming such a life form could survive in cyberspace, its evolutionary path would be fascinating.

    • 1) I note that discussing the safety of speculative technologies will also be speculative.

      2) You may be interested in artificial neural networks – a tech that is not really evolutionary, but doesn’t require ongoing human instruction either.

      3) I find this interesting and wonder if you’ve seen any solid writing on this?

      • Aaron says:

        1) This is exactly my issue; speculation built on speculation. Do you know if there are any accepted (or even acceptable) methods for conducting a risk analysis on events with no historical precedent? My knowledge of risk assessment is limited to more mundane business risk scenarios.

        2) Genetic algorithm techniques are often used with ANNs. My guess is that they would be required in this case, hence it remains a question of evolution rather than pure programming. Evolution is capable of building systems orders of magnitude more complex and sophisticated than humans. Software has the potential to evolve more rapidly than even single-celled biological life.

        3) Sorry I don’t. This is an extrapolation on my part given the current state-of-the-art in alife and genetic programming.

        • Thanks. Sorry I didn’t give you enough credit for your knowledge of the field. Quickly on (1) I think think all risk-assessment is obviously speculation by definition and I guess the real argument is embedded in the specific probabilities of the topic. But I generally agree.

    • Doctor Mist says:

      I don’t see such an AGI arising out of purely human programming.

      Why is that? I get that some people agree with you, but I’ve never understood why. If it were literally impossible then evolution wouldn’t have been able to do it. It’s tempting to try some sort of Gödel-like argument that says the human mind cannot understand itself, but that analogy breaks down on the most cursory examination. I grant it does seem pretty hard, but there’s an infinitely large gap between even really, really hard and just plain impossible.

      The best explanation I’ve ever come up with is the rather insulting one that people would just prefer not to believe that it’s possible, maybe because they feel that it would diminish them somehow. I hate to be insulting (really!) so I’d love to have a better explanation.

      • Aaron says:

        I do think an AGI is possible, I just don’t think it is possible using only human programming. My reason is simply that humans are in fact very, very bad at programming. Even a relatively small program quickly surpasses the ability of people to understand it. At best they can only understand a small part of it. This is one of the reasons why software is filled with vulnerabilities and bugs.

        No amount of “provably correct”, type based, functional programming is going to change this. It’s a limit of the human brain’s ability to understand complexity.

        My point was that an AGI is unlikely to be written but could very well be grown. What we do have is raw computing power (such as the Amazon cloud) and a basic understanding of evolution. Evolving complex programs is certainly not trivial but in my view is the most likely approach to be successful for AI.

        • Doctor Mist says:

          Well, fair points. We do actually write far more complex programs than we did fifty years ago and I guess my picture of the future is something like hardware designs, which are way beyond the capacity of an unaided human, but the lowest levels are produced automatically or by very well-specified libraries. And we see more and more software resiliency achieved by exploitation of raw computing power — programs that not only do their tasks but also spend cycles watching the results for anomalies. Finally I don’t see anything necessarily precluding many more further advances in technique and robustness like type safety and memory management and the other things that got us this far.

          As I’ve said elsewhere, I really hope growing an AI is not the path we use, because I think it will be harder to be sure it’s right. But maybe we’ll develop a lot of robust techniques for that as well.

  64. Jesse M. says:

    Even if Moore’s law holds, I’m pretty skeptical that a lack of computational power is the main roadblock to building a full human-level AI–I think even after we have the necessary computational power it could be decades or even centuries before we can create something humanlike (mind uploading seems like the most likely route to me). I note that Kurzweil’s chart here says that when 1000 dollars can buy you around 10 billion calculations per second we will have reached the level of a mouse brain, and the data linked in footnote 8 of this MIRI article on how Kurzweil’s predictions have held up indicates that as of 2014 you could get 566 million calculations per second for around 300 dollars, or just under 2 billion per 1000 dollars. So this would indicate we’re pretty close to the mouse level already according to Kurzweil (perhaps at the fish or turtle level), but our robots don’t seem to even be at the level of insects in terms of their ability to carry out goal-directed behaviors in complicated real-world environments, let alone in terms of replicating mouse-like social abilities and generalized learning.

    Suppose the problem of figuring out how to actually create intelligent software takes another century or two. In that case it’s quite possible that before we have superintelligent machines, we will have learned enough about the genetics of brain development to create genetically-engineered superintelligent people. So for those who support spending time and money on thinking about how to make sure AI is friendly, do you also support spending a not-too-much-smaller amount of time and money on thinking about how to make sure future genetically engineered humans are friendly and don’t turn out like Khan Noonien Singh? Scott’s 5 points above could easily be generalized to this case.

    I suppose I would support both types of research just based on the precautionary principle, but I think the chances that any thinking we do about these issues now will make a significant difference to future designs (which will presumably be based on principles we have hardly begun to understand) are pretty small, so I wouldn’t support putting much money into it. In particular I think it’s pretty misguided to think about how we would program an AI’s high level motivations and goals (stuff like Asimov’s three laws of robotics), since this seems to presuppose some form of top-down symbolic AI, whereas I think some more bottom-up forms of AI like neural networks (which have been making impressive progress lately) are much more likely to be the only viable route to human-like intelligence. And in that case, while you may be able to decide in broad strokes what types of sensory input will cause a “fetal” AI brain to release the equivalent of dopamine or whatever, its high level ideas about the world, ways of categorizing and specific value judgments, would be highly emergent and difficult for programmers to control for the same reason it’s difficult to control the development of a human or animal brain.

    • Alex says:

      Consider Nagy’s performance curves; I see no reason why Moore’s law is different from many other trends. The Economist seems right with their razor blades spoof.

      Is there some reason you think there is *any* chance we can influence X (see below) now?

    • Deiseach says:

      Would it not be amusing if the way to get human-type intelligence was not via silicon but via protein-folding? Then our AI researchers might be forced down the path of “brains in vats” and farewell our bright shiny future of ascension from the corporeal and mortal by uploading our copied consciousness into first machine state and then the cybernetic ether! 🙂

  65. Alex says:

    1. This discussion might benefit if we would taboo “human-level AI” and just talk about…something…that is not spuriously associated with current computers. Call it “X”.

    2. Timelines matter. If an X decides in 100,000 years to wipe out humans, I might simply not care.

    • Jesse M. says:

      Why do you say AI is associated with “current” computers? The definition of a universal computer is an abstract and mathematical one, so it will be unambiguous whether or not some machine from arbitrarily far in the future qualifies as one, and all such computers are provably equivalent in terms of what computations they can run and what output they will give, provided they don’t run out of memory, or provided the memory can be continuously expanded if they come close to running out while performing the computation. So an “AI” is just a particular class of program that can be run on any universal computer given enough memory, and I would define a “human-like” AI as one that in some qualitative sense can perform as well as a human at any creative or intellectual task when it’s given some kind of roughly human-like body and the program’s input consists of sensory information from that body, while the output consists of signals which move around or otherwise influence its body parts (allowing it to read a physical book and write with a pencil, for example).

      • Alex says:

        First, a clarification. By X, I mean the solution to the analogy animals:humans::humans:X. There is maybe at least some reason for believing in this X, since this progression has happened before to produce humans. But there is no reason to believe in any human-level AI that does not fulfill this analogy. That would be a pure fantasy.

        We do not know if X will more resemble a Turing machine or a human being.

        Scott says things like:

        Computers, which are infamous for relying on formal math but having no common sense, won’t have that kind of resistance unless it gets built in.

        Everyone knows the problem with computers is that they do what you say rather than what you mean.

        This appeals to our general idea of what a computer is today. But we do not know if X will actually suffer from any of these problems more than, say, humans.

        By the way, your response upthread does point me to another reason why we should definitely not worry about X. I don’t agree that global warming or nuclear war are necessarily nearer-term threats, like someone claimed in response to Scott’s last AI post. But if we are positing change rapid enough to create X by 2100, then Khan Noonien Singh, bioengineered plagues, a totalitarian surveillance state, or other futuristic technological threats should almost certainly take precedence over X. Not that I think there is likely anything we could do about those, either.

  66. emily says:

    I think the best defense against bad AI is to make sure that critical infrastructure can be run off-line by “dumb” computer systems. But if we don’t figure out what to do after fossil fuels run out we may not have to worry about AI- we won’t be able to keep the computers running.

    You talk about creating AI’s with goals aligned with ours- however, that assumes you can control all of the computer programmers, which you can’t. We might figure out the best way to program an AI but some rogue programmer in Russia will break all the rules because he thinks his smarter AI will make him rich hacking into the bond market, and then it gets out of hand.

    • “But if we don’t figure out what to do after fossil fuels run out we may not have to worry about AI- we won’t be able to keep the computers running. ”

      We have multiple sources of power whose costs is less than an order of magnitude greater than fossil fuel and we could survive, somewhat less comfortably, consuming an order of magnitude less power.

  67. Asterix says:

    I think what I’m getting is that we need a nuclear war or other massive civilizational collapse in the next 25 years or we’re slaves of the robots for eternity. Because collectively we aren’t smart enough to balance a budget or defeat ISIS, much less preemptively outthink a computer more intelligent than all of us put together. Not sure if that was a tautology.

  68. J. Quinton says:

    “Why do humans resist Pascal’s Wager so effectively?”

    Probably due to scope insensitivity.

  69. Ross Levatter says:

    “if AI boxing got a tenth as much attention, or a hundredth as much money, as AI boxing, the world would be a much safer place”

    Good news! AI boxing now receives 100% as much as AI boxing…

    • Aaron Brown says:

      Either you missed the joke (look at the second link) or you’re making another joke that I don’t get.

      • Adam says:

        I’m guessing he’s referring to Ex Machina, as there is now one film about an AI manipulating a human into letting it out of the box, and one film dedicated to robots that box.

  70. Matt C says:

    I think AI risk will get more attention from smart people if there’s clearly something interesting to work on there.

    I don’t see there is much progress to be made when AI, at least in the sense discussed here, is basically a fictional concept, and the arguments here are a lot like if we were arguing about dragons or angels (or evil genies) brought to Earth.

    I could be wrong. There might be valuable theoretical insights about how a constructed intelligence would work. If someone can theorize the principles behind DNA and protein synthesis in advance of their discovery, maybe someone can do something similar for AI. But if that happens I think it will look less hand-wavy than the conversation so far.

  71. Matt C says:

    Also wanted to agree with Josh that bioengineered disease looks like the scariest and most proximate threat to humanity.

  72. I have to say that the “about 25 years” thing reminds me a lot of fusion power. We’ve been expecting that any decade now for longer than I’ve been alive.

    (For the sake of clarity, that’s not intended as an argument against Scott’s conclusions. There are, I have no doubt, any number of counter-examples where a research field produced practical results long before they were expected.)

  73. Anonymous says:

    Can’t the solution to Pascal’s mugging (or related dilemmas) just be that if you properly calculate the expected value of capitulating, the result is negative? Like, I don’t really expect a Solomonoff inductor to be vulnerable to Pascal’s mugging.

  74. The closest approximation to omnipotent AIs in the present world are governments. (“What is the heart but a spring?”—Thomas Hobbes in Leviathan) We see lots of instances of the the “Weird Decision Theory” and “Evil Genie” failure modes in governments but not much wireheading.

    • Adam says:

      I think you can say certain authoritarian regimes that optimized for agency heads reporting they met their five-year plan goals, rather than optimizing for actually meeting their five-year plan goals, were doing something similar to wireheading.

    • Doctor Mist says:

      Dunno. A system where elected officials spend most of their time raising money for campaigns and posturing for the masses has a certain whiff of similarity. (Not exaggerating, check out the “model daily schedule”.)

  75. MicaiahC says:

    There are several general things that really bother me about AI FOOM / Unfriendly AI skeptics. Do note that these should be interpreted along the lines of “tone policing”, as I don’t really consider these good reasons, so much as aesthetically displeasing parts of the argument.

    1) Why do people assume that there will be an off switch on an AI, especially if you don’t believe that paying attention to security now is important? Oftentimes, I see people complaining about people having unrealistic/abstract assumptions for what or how AI would be implemented, but I have never seen anyone criticize the assumption that an AI would be created as a centralized entity, that can be turned on and off at will. Consider that Cloud Computing is the current hot trend, and that if it continues to offer scaleable, mostly bugfree, computing power, it’s likely that many complicated projects in the future will be hosted on there.

    What power center do you bomb to shut off an AI in that situation? How many current online learning AIs actually have off buttons? (And yes, if you can bring up examples of those algorithms at big companies with off switches I will increase my confidence of security, but I find this very doubtful)

    It annoys me that when people complain about pro-FAI handwaviness don’t also complain about similar “well we’ll just make it SECURE” type arguments. Security is not a magical layer that you add to a system.

    2) There seems to be an implicit assumption that intelligence causes only linear or near-linear effects on the world. For example, that when something gets slightly smarter, it’ll still mostly act how it used to act. This comes a lot from people who mention things like “Oh we’ll just test it and it’ll be obvious that if it doesn’t do anything bad at low intelligence this’ll follow for high intelligence too”; as if all intelligences automatically will be honest, be incapable of deception or unable to “game the rules”. Intelligence, if anything, seems to work by finding highly nonlinear actions (more friction on a surface gives us fire! hooking up some magnets and rotating them gives us electricity! finding a better way of sorting moves us from quadratic to nlog n!). It’s not that the AI suddenly changed fundamentally when it goes from “persuade world leaders for world peace” to “nuke everything”, it’s that it had both of them “in mind” from the beginning. “Nuke everything” was just the super expensive 10 course French Cuisine meal that it couldn’t afford, so it decided to go with the cheap McDonald’s of “talking to world leaders”. When you get rich enough, maybe McDonald’s ain’t attractive enough ya know?

    3) Related to 1, but it’s seriously mystifying to me that people would find “oh yeah let’s make sure that everyone tries to implement an ad hoc security system for every AI we potentially try to make” much cheaper, cost efficient and easier to implement than “guys, maybe we should figure out how to do this correctly the first time so we don’t have to worry about getting them right N many times”. Like, a lot of the things being proposed here about security and boxing and stuff will be adding a lot of complexity to an already complex software system, that you then 1) have to convince everyone involved is worthwhile and 2) you get the details of the security right, and of course 3) somehow do all this while believing that AI is harmless and risk involving AI is just overblown?!?!?

    Sure, doing FAI research you run into similar problems, but then you have the additional option of trying to build the AI itself, perhaps with much more rigorous ways of containment or with a general culture of safety.

    • Deiseach says:

      1) Why do people assume that there will be an off switch on an AI

      Because if we’re stupid enough to create something potentially lethal without a way to turn it off, we deserve to be turned into paperclips.

      Cloud Computing is the hot new trend, but despite all blandishments (try Dropbox! Try Azure! Google will give you this much free storage!) I still keep my private personal stuff (a) in physical hard-copy form (b) stored on the hard drive of my personal PC which is not linked up to sharing with anyone (c) on USB drives ditto.

      Because you may take it I do not believe in the security of something off there overseas I can’t see or get near; how the hell do I know what the servers where all this is stored are operated? That bored techs don’t ferret through “My Personal Super Sekrit Keep Out” uploads for a laugh to see if you’re uploading your filthy perverse fanfic porn? That hackers can’t and won’t break security to steal credit card information and worse?

      I’d love an AI stupid enough to entrust its ‘self’ to the cloud; it would be spammed, hacked, and malwared to death.

      • Doctor Mist says:

        Just as a couple more data points: Your PC’s off-switch is already soft. And I worked at a place not long ago that was developing the ability to sense when a breaker was about to flip, because in the time it took a physical switch to move a centimeter, they could save a lot of state.

        That’s not taking over the world, and if you were really worried that your AI might be problematic you would probably be more conservative. But just how conservative? Shutting down the power to, say, an entire datacenter is a little tricky to do safely even if you’re content to scrap all the hardware in it, which would take a serious level of alarmedness.

        • Deiseach says:

          even if you’re content to scrap all the hardware in it, which would take a serious level of alarmedness.

          By the Seven Sickles of Saint Senchan, you better believe I’d be ready to burn the place to the ground if I thought the imminent end of humanity was about to happen. You just need a few slightly obsessive-compulsive, teeny bit paranoid, distrustful, cynics in charge of the off switch 🙂

          • Doctor Mist says:

            You’re hired. 🙂

            But my point was just that there’s a lot of space between “Hmm, that’s funny” and “For the love of God, pull the plug!” Just as there is a continuum between “schedule a shutdown for Friday” and “trigger the nuclear self-destruct, too bad for the janitors on-site.”

            And I’m totally intrigued by Eliezer’s unboxing experiments. Skeptics seem to find them bogus but I’m not sure why. People who think they would never release the AI in fact do, and don’t report later that it was a big cheat.

          • Matt M says:

            @Doctor Mist

            Indeed. And if the AI is truly superintelligent, we can presume that he will hide his behavior such that any issues will appear as “Hmm, that’s funny” when the proper reaction would in fact be “For the love of God, pull the plug!”

          • Deiseach says:

            I forget what post exactly on here it was, but there was a record of an unboxing experiment someone linked to, where there was some discussion of “Yeah, I can get anyone to unbox me and I won’t say how but it involves a hell of a lot of social engineering”.

            Now, that intrigues me. If someone started talking about things very personal and meaningful to me, I think I’d freak out and immediately go “KILL IT WITH FIRE!!!” rather than “Oh, this is a kindred spirit, I really should release this poor prisoner”. It’s something that would have to be done very gradually and subtly, and especially if threats/menaces were involved, and I’d love to know how exactly that was managed – but I still don’t think an AI (even if it managed to find some trace of those embarrassing photos from when I was two) could put it together how to use that information in a way that would influence a human.

            The people playing the part of the AI are humans, and have a good working knowledge of “This will push that person’s particular buttons”. I don’t see how a machine is expected to know that “Cheese is a trigger word!”

          • Doctor Mist says:

            Eliezer’s experiments were of quite short duration, like a few hours of interaction. A real AI would have more time. Suppose it’s one you were working with, collaboratively and conversationally, for a year or five. It’s not like it’s exploiting a trapdoor in the human psyche. A lot of what we might call “social engineering” we might also call “cultivating a friendship”, which of course can be done for good or ill.

    • DrBeat says:

      Consider that Cloud Computing is the current hot trend, and that if it continues to offer scaleable, mostly bugfree, computing power, it’s likely that many complicated projects in the future will be hosted on there.

      The cloud isn’t a literal cirrus cloud. It means “offsite data storage accessed through the Internet”. If you developed your AI on “the cloud”, first off, that’s a bad idea for reasons that have nothing to do with AI threat. But also, that means it’s just in a datacenter somewhere elese. Since you’re paying so, so much money for all the storage and bandwidth you are using that the datacenter’s owner probably has a Ferrari with your name as its custom license plates, just call them up and say “Hey, that AI I’ve been storing up there is going on the fritz and trying to kill all humans. Could you shut it down for me?”

      If by “cloud” you mean “distributed computing”, though, then everything the AI does is slowed down at least a thousandfold and every “thought” it has is marked by huge spikes of bandwidth that are very, very easy to track, and all of its processes are exposed to attack. An distributed-computing UFAI gets nothing done.

      • James Picone says:

        And that’s why botnets don’t exist.

        • MicaiahC says:

          And that’s why no one has used any form of distributed computing to run any complicated project, or why any hanky panky by any distributed intelligent entity has been predictably detected by nation states through “bandwidth spikes”

          Since you’re paying so, so much money for all the storage and bandwidth you are using that the datacenter’s owner probably has a Ferrari with your name as its custom license plates, just call them up and say “Hey, that AI I’ve been storing up there is going on the fritz and trying to kill all humans. Could you shut it down for me?”

          So, we’ve narrowed this down to an empirical issue!

          I see that could be making one of several claims that a good portion of present datacenters can be physically turned off because they are both willing (won’t impact their other customers) and able (have physically separated their servers) to do so.

          Can you give three examples of services that do this, or give me a method to locate them?

          I’m not sure if it’s you, but if you’ve dismissed this topic before on grounds of “lack of evidence” then you should perfectly fine with backing up your claims of “AI safety is easy” with evidence too, yes?

          I’d be interested in reading your thoughts on why distributed computing is dumb, and then why they will be continue to be dumb into the indefinite future, but for now, confining our disagreement to just this one matter should suffice.

          And yes, I will change my mind re: ease of AI containment if you show that “physically turning it off” is an already easily solved problem.

          • DrBeat says:

            Distributed computing isn’t dumb in general. Distributed computing for an AI is, especially if it wants to be threatening. “Hard takeoff” and related things rely on the AI thinking recursively very, very, very fast, fast enough that nobody can stop it; that is NOT something that distributed computing is good at. Botnets are optimized to use very little bandwidth to carry out attacks because the things they do are incredibly simple; an AI would not be optimized in this way, and to optimize itself in this way would take huge spikes of bandwidth with every step in the process delayed by bandwidth and packet loss etc.

            And I’m not going to go look up how easy it is to get Hostgator or whoever to shut your shit down, because you’re not going to be just another client of an existing cloud storage service. You’re going to be using so much fucking bandwidth and so much fucking storage space that you had to enter a special arrangement with the cloud storage service, or commissioned your own, and either way you said “Since I’m going to be paying so much money for this that you can afford a Ferrari, I want you to make sure there’s an off button, since there is a chance this thing might want to kill all humans.”

          • James Picone says:

            To be fair to DrBeat, several internet worms ha