AI Persuasion Experiment

I’ve been trying to write a persuasive essay about AI risk, but there are already a lot of those out there and I realize I should see if any of them are better before pushing mine. This also ties into a general interest in knowing to what degree persuasive essays really work and whether we can measure that.

So if you have time, I’d appreciate it if you did an experiment. You’ll be asked to read somebody’s essay explaining AI risk and answer some questions about it. Note that some of these essays might be long, but you don’t have to read the whole thing (whether it can hold your attention so that you don’t stop reading is part of what makes a persuasive essay good, so feel free to put it down if you feel like it).

Everyone is welcome to participate in this, especially people who don’t know anything about AI risk and especially especially people who think it’s stupid or don’t care about it.

I want to try doing this two different ways, so:

If your surname starts with A – M, try the first version of the experiment here at

If your surname starts with N – Z, try the second version at

Thanks to anyone willing to put in the time.


440 Responses to AI Persuasion Experiment

  1. PGD says:

    This was interesting, but I had a really hard time answering the questions. I have a hard time aligning my assumptions about what it means to be ‘intelligent’ or ‘super-intelligent’ or ‘human level intelligent’ with what I suspect the AI risk assumptions are. For example, I find it violently counterintuitive to think of a machine that wants to convert the universe into paperclips as either “super-intelligent” or even “human-level intelligent”. There are some very deep assumptions about self-consciousness you have to swallow before you could think that a being was simultaneously self-conscious and intelligent but also completely and mechanically focused on converting the universe into paperclips, not to mention impervious to and uncomprehending of rational appeals about the shortcomings of this activity.

    I think something like AI risk is real, but only in the sense that perverse and destructive effects are a risk in all of our mechanical creations. All human created mechanisms have an internal logic coded into them that you could define as a form of intention. But I don’t think of the question of AI risk in terms of breaking through into “human level intelligence” or “superintelligence”, I think of it more in terms of the complexity and power of the mechanism and its coding. Obviously computerization and machine learning is enabling devices to be much more powerful and complex than ever before, and therefore potentially more destructive, independent of the question of whether they are self-conscious or intelligent in the way we understand it.

    In the end, I didn’t find a lot in the essay that really bridged the gap in my underlying assumptions about this. This might be just me, but I do think that the question of what it is for something to be “intelligent” is complex enough that people may come to it with very different assumptions.

    • Susebron says:

      Generally, in the context of AI risk “intelligence” is defined as “the ability to solve arbitrary problems quickly and efficiently”. It’s not at all clear that an entity would have to be remotely human to have higher intelligence by that definition than humans.

      • Eli says:

        The whole reason for the original AI Winter was that truly arbitrary problems are mostly not solvable at all quickly or efficiently.

        • mark says:

          Many problems cannot be solved at all, much less quickly or efficiently. Is there a more technical definition that takes this into account?

          • Dan Simon says:

            Clearly not–every time I raise these definitional questions about “intelligence” and “problem-solving”, I get a curt “we all know what we mean by these things” dismissal. It’s like “Yeti risk”: if you instinctively *know* what Yeti are (and, hence, by implication, that they exist), then obviously you’re going to worry about them attacking you whenever you’re mountaineering in the Himalayas. But if you don’t buy into the Yeti model assumed by the “Yeti risk” folks…

          • Peter says:

            One issue with definitional problems… you’ve heard the story of Plato, Diogenes and the chicken, right?

            With AI, it’s tempting for people who don’t like the idea of AI to try to define away any sort of success that people might envision. Turing had a decent crack at a way around that when coming up with the Turing test: to say “let’s ignore deep-philosophical questions about what True Intelligence is, and think about what might be necessary to make something pass for human”.

            It occurs to me that AI risk sort-of works like a second Turing test; if something’s capable of taking over the world, then it’s worth sitting up and taking notice; truly devoted philosophical snobbery about whether a machine is really plotting to take over the world or merely simulating plotting to take over the world can continue, but if there’s a world takeover that will happen either way, the distinction is a bit academic. (OTOH if a rogue AI does take over the world, you might want some philosophical consolation in your last moments, so maybe there is a use for the philosophical snobbery).

            I don’t think there’s any getting away from the “we know it when we see it” problem of intelligence. Not unless you want to be making proclamations equivalently silly to the one that minor variants on plucked chickens are human.

          • Mark says:

            You’re not going to be able to prove much about a system where intelligence is defined as “we know it when we see it”.

          • Peter says:

            Proofs about things to do with the real world tend to suffer from this problem, and there’s no particularly good way around it.

            (One problem I have very specifically with MIRI is that they seem overly focussed on theorem-proving.)

            It’s true that merely hiding behind “I know it when I see it” and saying no more than that isn’t going to advance the discussion; you can advance tentative definitions. You can advance partial definitions: “I can’t say what intelligence is, but I know it involves being able to do X, and your system can’t do X. Yes, I know you’ll accuse me of moving the goalposts if you produce a system that can do X, it will be closer to intelligence but I bet it won’t be an intelligence, and I’ll probably be able to find a Y that it can’t do too, and so on. Possibly maybe after enough iterations I might be dumbstruck and forced to say it’s an intelligence after all – who knows.” You can talk about aspects of intelligence observed in generally-agreed-to-be-intelligent things, or things that aren’t intelligent as such but look like they’re somewhere on the way. One thing Scott did a while back was to point to paragons of human intelligence – particularly people who were master persuaders/manipulators/etc. – and say “what reason do you have to think those abilities are the ceiling of possible abilities?”

            You’ll see elsewhere I’ve been talking about generality with regard to machine learning and optimisers; pinning down a precise definition of the generality needed for intelligence is hard; it’s easy for people to come up with overblown definitions of generality and then to say that generality is impossible, and yet there are things that are general enough for many purposes.

          • Dan Simon says:

            The problem with the “I know it when I see it” definition is that it doesn’t really work as a foundation for grand extrapolations about what some super-developed future version of it might look like. There seems to be a bit of a contradiction between “I don’t really understand intelligence well enough to define it beyond vague intuition” and “I can map out its future development so clearly that I feel the need to warn people about the four or five scenarios under which it might well evolve into a humanity-threatening danger”.

            As I’ve been arguing for what seems like decades now, our intuitive notion of “intelligence” is so bound up with our intuitive notion of “human” that the very idea of “superintelligence” collapses under the slightest scrutiny. Taking the next step and raising alarm about specific feared properties associated with this incoherent concept seems entirely unwarranted.

          • keranih says:

            you’ve heard the story of Plato, Diogenes and the chicken, right?

            Actually, no. Do go on.

          • Deiseach says:

            “So Plato, Diogenes and a chicken walk into a bar. The bartender says ‘We don’t serve chickens here’. Diogenes turns to Plato and says ‘He means you, Wrestling Champion’.”


          • Protagoras says:

            @keranih, Diogenes Laertius, who is at best a sometimes reliable source and may in this particular instance have had a bias in favor of his namesake, tells us that Plato, always enthusiastic for definitions by genus and difference, proposed “featherless biped” as a definition for “human.” Diogenes the Cynic is then supposed to have plucked a chicken and presented it as “Plato’s human,” after which the practical sorts at the Academy updated their definition to “featherless biped with broad nails.”

          • Steve says:

            “Problems with solutions which are expressible as manipulations of the physical universe on a scale between 1 and 10^60 planck lengths” narrows it down quite a bit from just “arbitrary problems.”

            I feel like it could probably be narrowed down even further, though.

          • Gazeboist says:

            “Problems with solutions which are expressible as manipulations of the physical universe on a scale between 1 and 10^60 planck lengths” narrows it down quite a bit from just “arbitrary problems.”

            This scale includes problems with the solution “build a computer”, and therefore fails to exclude things like the halting problem. For that matter, it includes every problem ever attempted by a human, including “define the notion of ‘human'”, “solve morality”, and “prove the existence of God”.

        • Susebron says:

          Indeed, general intelligence may itself be bunk. That’s one good reason not to worry about AI risk. However, “AI with bad goal systems is inherently not superintelligent” is a bad reason not to worry about AI risk, because it’s not connected to the dangerous parts of superintelligence. If you define intelligence such that a paperclipper cannot be superintelligent, then you will not consider it superintelligent even if general intelligence in the sense of problem-solving ability exists and the paperclipper has drastically higher general intelligence than any human.

        • Peter says:

          Note there’s a distinction between “fully arbitrary” and “things arbitrarily selected from the real world in a realistic manner”. “Fully arbitrary” covers “problems deliberately set by an even more powerful intelligence in order to stymie the intelligence in question” and “problems that occupy a large portion of possible-problem-space but which turn up surprisingly rarely in real life” (e.g. the way that if you have a problem space over 320×320 B&W images, most points in image space look like white noise, whereas most images you’re likely to encounter in the wild don’t).
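          The white-noise point above can be checked with a minimal sketch, assuming pure-stdlib Python. It draws one image uniformly at random from the space of 320×320 black-and-white images and compares it with a simple structured image (a disc, chosen here only as a hypothetical stand-in for "an image found in the wild"). The `neighbour_agreement` score is an illustrative measure invented for this sketch, not a standard metric.

```python
import random

SIZE = 320  # image side length, matching the 320x320 example in the text


def random_image(rng, size=SIZE):
    """Draw one point uniformly from the space of size x size B&W images."""
    return [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]


def structured_image(size=SIZE):
    """Hypothetical 'in the wild' stand-in: a white disc on a black background."""
    c, r2 = size / 2, (size / 4) ** 2
    return [[1 if (x - c) ** 2 + (y - c) ** 2 <= r2 else 0 for x in range(size)]
            for y in range(size)]


def neighbour_agreement(img):
    """Fraction of horizontally adjacent pixel pairs that share the same value."""
    same = total = 0
    for row in img:
        for a, b in zip(row, row[1:]):
            same += a == b
            total += 1
    return same / total


rng = random.Random(0)  # fixed seed so the run is repeatable
noise_score = neighbour_agreement(random_image(rng))
disc_score = neighbour_agreement(structured_image())
print(f"uniformly sampled image: {noise_score:.3f}")
print(f"disc image:              {disc_score:.3f}")
```

          A uniformly sampled image should score close to one half (neighbouring pixels are independent coin flips), while the disc scores close to one, which is one crude way of seeing that structured images occupy a tiny corner of image space.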

          It’s like the way the No Free Lunch Theorem shows that there’s no truly general optimiser, yet many optimisers are pretty good for the “special” case of “problems we’re actually interested in solving”.

          I do machine learning in various fields, and there are some pretty broadly applicable techniques. During my postdoc I was in a chemical informatics group, collaborating with a computational linguistics group, and both areas involved taking broadly applicable things (like classifiers) and adding a domain-specific layer to preprocess the input and postprocess the output. And the classifiers themselves often worked by taking an even-more-broadly-applicable optimizer algorithm and gluing it to the data via a loss function.
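          As a rough illustration of "a broadly applicable optimizer glued to the data via a loss function", here is a minimal sketch in pure-stdlib Python. The optimiser (`minimise`) is plain gradient descent with numerically estimated gradients and knows nothing about classification; the logistic loss is the only part that touches the data. The function names, the toy data set, and the step size are all assumptions made up for this example, not anything from the groups mentioned above.

```python
import math


def minimise(loss, params, steps=2000, lr=0.5, eps=1e-5):
    """Generic optimiser: gradient descent with forward-difference gradients.

    It only ever sees a function to minimise, never the data itself.
    """
    params = list(params)
    for _ in range(steps):
        base = loss(params)
        grad = []
        for i in range(len(params)):
            bumped = params[:]
            bumped[i] += eps
            grad.append((loss(bumped) - base) / eps)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


def make_logistic_loss(xs, ys):
    """Loss function: the 'glue' between the optimiser and the labelled data."""
    def loss(params):
        w, b = params
        total = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() away from zero
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
        return total / len(xs)
    return loss


# Toy data, made up for illustration: the label is 1 when x > 2.
xs = [0.0, 0.5, 1.0, 1.5, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = minimise(make_logistic_loss(xs, ys), [0.0, 0.0])


def classify(x):
    """Classifier built from the fitted parameters."""
    return 1 if w * x + b > 0 else 0


print([classify(x) for x in xs])
```

          Swapping in a different loss function would turn the same `minimise` routine into a different kind of model, which is the sense in which the optimiser is the more general component; a domain-specific preprocessing layer would sit in front of `xs`.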

          One of the things about the current Deep Learning boom is you get some people worrying that their work is being de-skilled – you had lots of people who would write big rich preprocessing layers between the general-purpose learning algorithm and the problem domain, and deep learning can sort-of learn a lot of the big richness automatically. But not all of it: for example, the bit of AlphaGo that converted board positions into neural network input was fairly small but not completely minimal, and it did contain some manually-encoded Go knowledge (nakade and ladders, for those who know Go).

          Fully general (in the “problems found in the wild” sense) “end-to-end” AI is beyond our date prediction horizon; various forms of generality are here with us now.

          • Eli says:

            I don’t think “fully general” AI is beyond our time-horizon at all. I feel like if we’ve got perceptual inference, active inference, and some good grasp on memory, we’ve got a basic framework for intelligence.

            That said, I’d very much like to see something like a “Lunch Money Theorem” explicitly quantifying the way that physical constraints make most “real-world” problems relatively easy. To jolt the intuition into action, by assuming that Nature has to spend energy and negentropy to generate randomness, you can show that probability theory will apply to how the real world generates problems, which then buys you the paid-for lunch that you can do statistics on the real world.

          • 27chaos says:

            I love the term “lunch money” in this context. I’ve considered the issue before, but having a name helps. I agree entropy would be important to it. Another candidate I’m considering for lunch money is the frequent appearance of growing, branch-like structures in natural problems in the world.

    • faun-like thing says:

      completely and mechanically focused on converting the universe into paperclips

      It’s important to bear in mind that while we’re still alive to stop it, this is a mischaracterization of what a paperclipper would look like. Possibly even factually false. It would not be thinking about paperclips, it would be thinking about the many things it needs to do to secure the maximum number of paperclips, things like information science (for optimizing itself), manufacturing (for building its levers), commerce (for getting funds for building the levers), psychology (for convincing people to give it the funds), biology (for assisting in the study of psychology), communications and politics (for taking steps to keep us out of the way once it has to start doing conspicuous things).

      It does not become single-mindedly focused on paperclips until after we’re all dead. It’s pursuing all of the very high-minded fields we are, until it has the universe to itself, at which point it does what many of us would do and, in a sense, masturbates its head off.

      Going deep into a task dependency graph like this is something that humans are very bad at. For example, if you ask a human to convince someone to change the way they vote in an election, they will often start by saying to them “You should “, which we all know isn’t the best approach because it’ll get them to raise their defenses and start looking for reasons not to, and when it doesn’t work the human will move on to shouting. They know intellectually that shouting will make things worse, but this is just one of the ways humans are broken. They don’t know how to come at things sideways. They don’t tend to like doing it. It fatigues them.

      So like… single-mindedness doesn’t look anything like single-mindedness if you’re doing it right? Humans don’t do it right, and we need to take care not to expect those same flaws in AIs.

      • Peter says:

        I have a chess analogy that backs up this point. The aim in chess is to get checkmate. One of the early lessons in any “basic chess strategy for newbs” books is “don’t rush greedily towards checkmate” – i.e. don’t make a desperate lunge at the first vague checkmate opportunity you see (e.g. by trying the Scholar’s Mate), instead: develop your pieces (i.e. put them on good squares where they have lots of mobility etc.), maybe try to get a material advantage, etc. – i.e. get more power relative to your opponent. Most of the time there’s no direct attempt at checkmate – instead, one side gets so much more powerful than the other that the other side resigns, long before any specific checkmate plans would be hatched. Occasionally, well, if a lunge at checkmate is certain to work then you should go for it, so it’s a good idea to make sure you have some king safety, and checkmate threats are often a useful part of an attacking strategy. But by and large, you get checkmate by getting positional and material strength and most of the game is about getting that strength.

        So making lots of paperclips is analogous to checkmate; it’s the final goal, the leadup doesn’t need to be obviously related to it.

        There’s an xkcd comic that makes the point quite well.

        • Mark Jeffcoat says:

          “Position before submission”.

          (A well known phrase in grappling that I think summarizes your summary.)

      • PGD says:

        very interesting take, thanks. I find this helpful in thinking about the difference between a powerful and complex mechanism for achieving a goal and a machine that is pursuing a goal through the use of ‘intelligence’. One question I would have is whether we will really find we want or need intelligent machines for many of the goals we want. E.g. creating superintelligence is pretty clearly a worse way of manufacturing paperclips than building an efficient automated paper clip factory, which may be quite complex and involved but doesn’t really require an executive intelligence of the range and scope you describe.

        • faun-like thing says:

          One question I would have is whether we will really find we want or need intelligent machines for many of the goals we want.

          IMO, no. As an ego-dead aesthete I’d prefer that humanity had a thousand years or so to properly grow up and build their artificial god from pieces of themselves. But, if we do AI right it’ll get us to the same place much faster (I’m patient, but most aren’t), and if we don’t do it right someone will do it wrong. We don’t have much of a choice.

      • Eli says:

        They don’t know how to come at things sideways. They don’t tend to like doing it. It fatigues them.

        We know what to do. It just takes energy, self-restraint, and expensive social cognition, while also being obviously immoral. You’re not supposed to just go and persuade everyone to think like you regardless of their personal experience of the world!

      • Loquat says:

        it would be thinking about the many things it needs to do to secure the maximum number of paperclips, things like information science (for optimizing itself), manufacturing (for building its levers), commerce (for getting funds for building the levers), psychology (for convincing people to give it the funds), biology (for assisting in the study of psychology), communications and politics (for taking steps to keep us out of the way once it has to start doing conspicuous things)

        This is really the point where a lot of us anti-AI-risk people object. You program an AI to maximize paperclips given certain constraints, meaning you teach it all about the relevant subfields of manufacturing, and possibly give it some authority over the setup of its factories, material ordering, etc. How does the AI jump from that starting point to the understanding that (a) it could engage in commerce and/or theft to get more money for materials than its owner allots, (b) it could hack into other factories not currently allocated to paperclips, and convert them, (c) humans would object to all of this, (d) therefore it needs to become a master manipulator of humans and also defend itself against human attempts to shut it down, so that (e) it can eventually take over the world and have free rein to turn everything into paperclips?

        And not only does it have to make all the above leaps, it has to make them so fast humans don’t notice anything awry until it’s already gotten beyond our ability to shut down. It’s simply not a plausible scenario.

        • Matt M says:

          “How does the AI jump from that starting point to the understanding that… ”

          It’s been a while since I’ve read Superintelligence but I think he addresses this by emphasizing the fact that when discussing superintelligences, we are discussing “general” intelligences rather than very task-specific intelligences. IIRC, task-specific intelligences are dismissed as low risk (but also of little benefit to humanity).

          The whole point of making an AI for paper-clip production is the expectation that it will engage in some sort of “creative” thinking and cross-domain research such that it will come up with ideas for paperclip maximization that humans never could. So we would make it generally intelligent and give it the ability to research all of the things you mention above.

          Because if we don’t give it that, all it can do is make paperclips the way that humanity’s foremost experts have programmed it to make paperclips, but we already have the experts, thus the intelligence is unnecessary.

          • Loquat says:

            But that then creates a different implausibility – we’re going to create a general intelligence, have it learn everything from metallurgy to nuclear physics to how to win friends and manipulate people, and then give it just one fairly specialized task, with which it will run amuck?

            If a given AI is really only going to be used for industrial production optimization, there’s no need for it to learn psychology, biology, hacking, etc, and any attempt on its part to learn those things should be regarded with suspicion.

            If the AI is in fact going to be allowed, or even encouraged, to learn Everything, surely we’d also be using it for more than one task, meaning it’d have more goals than just maximizing paperclips.

          • realitychemist says:


            and then give it just one fairly specialized task, with which it will run amuck?

            The paperclip maximizer was always meant to be a simplified example. Nobody (who I’ve ever talked to) expects any general artificial intelligence would actually be given this goal; it’s an illustration of how a general AI can’t be expected to behave in a human-like way when tasked with problems. The optimal solution to a problem rarely respects human values (what’s a more reliable way of convincing someone to vote for your favorite candidate: argue rationally with them, or kidnap their children and blackmail them?), so we want to be trying to find some way to encode our values. This is, obviously, hard. Because of this, adding more goals or more general goals isn’t automatically safer, unless you can find a way to encode some goals that protect human wellbeing.

            (Also, hello, first time commenting on SSC!)

    • Deiseach says:

      Are we separating out “intelligent” and “conscious”? That does not seem to be addressed, or at least wasn’t in the essay I read and commented on. There’s discussion of how smart/powerful the AI is likely to be, but nothing about “well, is it a person or a machine? does it have a sense of self? if it’s just a Big Dumb Machine that is super-fast and super-able to problem-solve but can’t in any sense think, how is it more of a threat than a nuclear bomb? You don’t want a nuclear bomb going off, but it can sit in its silo and be harmless for years until a conscious mind decides to do something with it”.

      If we’re seriously going to talk about “The AI has its own goals”, we are seriously going to have to address the question of selfhood and consciousness and is it now an entity not merely a device, and that is going to get us into philosophy, and since there’s a tendency to turn up the nose at philosophy as useless word-blowing that never got nuthin’ done unlike real science, where does that leave us?

      • IamtheTarpitz says:

        I don’t see why it should need to be conscious to be dangerous. I am very interested in philosophy of mind and closer to Chalmers than Dennett on the Hard Problem, but there is no requirement I can see for an AI to have qualitative experience in order to cause the extinction of all life in the universe. Complex behaviours do not require consciousness, and a zombie AI would still be a huge potential problem.

      • Peter says:

        “Conscious” is even more slippery than “intelligent”. One of the problems is that there’s the “zombie” crowd who are perfectly happy imagining never-conscious entities with the whole range of human abilities, impossible to tell apart from the real thing. Arguments about zombies turn out to be interminable and so people prefer to avoid the whole thing… oh look, there’s a post before mine that’s come in while writing this. Dennett vs Chalmers. There doesn’t seem to be a need for what Chalmers calls consciousness. There might be a need for (very roughly) what Dennett calls consciousness, and it could be very interesting. I’m hoping that attempts at doing AI will help clarify our notion of what consciousness is: at the very least, some AI researchers producing toy systems they claim are conscious will cause philosophers to say, “that’s not conscious, it can’t do X”, and we should be able to gather some good X’s to go towards a definition that way.

        There’s a Dijkstra quote: “The question of whether machines can think is about as relevant as the question of whether submarines can swim.”

        Personally, I’m happy with throwing “goals” around pretty loosely – I’ll happily say that a chess computer or an optimizer or a route-finding algorithm has goals. You might object to this – you might say that Google maps doesn’t have the goal of finding a route, I have the goal of finding a route, and do it by means of Google Maps – Google Maps merely has “quasi-goals” which take part in some information processing which isn’t in and of itself route-finding, it only becomes route-finding when I take the final step of looking at the screen and interpreting the pixels as a route.

        I’ve got a bit of electronics and stuff[1] which monitors the soil moisture in a potted plant and pumps some water in if it is too dry; personally my implementation is too crude for even me to say “goals” but if I did something a bit more sophisticated I’d say it had goals. Again, you can object; you can say it isn’t the electronics watering the plant, it’s me watering the plant by means of the electronics. Or if something goes wrong and the windowsill gets flooded, it could be me flooding the windowsill by way of the electronics. Either way, the windowsill gets flooded, and without my intention, too.

        So with AI risk, it doesn’t matter whether the AI exterminates humanity or some human unintentionally exterminates humanity by means of the AI – either way, it’s something we’d rather not occur.

        However, even if we’ve sidestepped the metaphysics/what-words-mean part of philosophy, we haven’t sidestepped the whole thing. Large parts of philosophy can spend time mired in fruitless and inconsequential debates; on the other hand other parts of it can be deeply consequential (even if the debates advance at a glacial pace). The essay I read talked of “philosophy with a deadline” – an incredibly daunting task. Formalise all human values within a century or so, the sooner the better, and be sure to get it right first time. *gulp*. Personally I’m hoping that won’t be necessary, hopefully there’s an alternative approach to things which is less doomed – I hear that Stuart Russell has an interesting idea or two, about learning human values by observation.

        [1] An Arduino and a Raspberry Pi, for those who are interested. Except the Pi is passive – it supplies power and collects data but doesn’t tell the Arduino what to do.
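        The plant-watering example above can be made concrete with a small simulation. This is only a sketch under stated assumptions: it is Python rather than Arduino code, and every constant (`DRY_THRESHOLD`, `PUMP_GAIN`, `EVAPORATION`, `POT_CAPACITY`) and both sensor models are hypothetical values invented for illustration. It shows how a controller following its rule faithfully can still flood the windowsill when its input is wrong, with nobody intending that outcome.

```python
DRY_THRESHOLD = 0.30  # hypothetical: pump when the reading falls below this
PUMP_GAIN = 0.25      # hypothetical: moisture added by one pump burst
EVAPORATION = 0.05    # hypothetical: moisture lost per time step
POT_CAPACITY = 1.0    # anything above this spills onto the windowsill


def step(true_moisture, sensor_reading):
    """One control step: decide from the sensor reading, act on the real pot."""
    pump_on = sensor_reading < DRY_THRESHOLD
    moisture = true_moisture - EVAPORATION + (PUMP_GAIN if pump_on else 0.0)
    overflow = max(0.0, moisture - POT_CAPACITY)
    return min(max(moisture, 0.0), POT_CAPACITY), overflow


def simulate(steps, sensor):
    """Run the loop; `sensor` maps true moisture to what the controller sees."""
    moisture, spilled = 0.5, 0.0
    for _ in range(steps):
        moisture, overflow = step(moisture, sensor(moisture))
        spilled += overflow
    return moisture, spilled


_, spilled_ok = simulate(100, sensor=lambda m: m)     # working sensor
_, spilled_bad = simulate(100, sensor=lambda m: 0.0)  # sensor stuck at "dry"
print(f"working sensor: spilled {spilled_ok:.2f}")
print(f"stuck sensor:   spilled {spilled_bad:.2f}")
```

        The controller code is identical in both runs; only the sensor differs. Whether one says the electronics flooded the windowsill or the builder did so by means of the electronics, the spilled water is the same.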

        • Deiseach says:

          A swimming submarine would be very telling, one way or another, if the topic was “Will submarines replace fish?”

          • Peter says:

            How so?

            I mean, if submarines were to replace fish, and you definitely wanted there to be lots of swimming in the world, then submarines replacing fish would be a bigger catastrophe if submarines couldn’t swim. Perhaps the prospect of there being much less swimming in the world would motivate people to take extra steps to prevent the replacement from happening. But the ability of submarines to swim or not, as opposed to merely travel underwater, shouldn’t directly affect their chances of taking over the world (counting the previous sentence’s thing above as indirect).

          • Deiseach says:

            Submarines are not, in any way, fish. If they are to replace fish, there has to be something they do that fish do not do, and there has to be something fish do that affects them. So far, fish and submarines can co-exist.

            Yet when it comes to AI and superintelligence, all of a sudden it’s “are we creating our own replacements?” and the assumption that of course the superintelligent AI will rule the world and humans will have no choice but to obey. This is like worrying about creating submarines and that they will replace fish. In that case, if one of the submarines developed the capacity to swim, it would be very much an indication we should worry – the sort of indication along the lines of “hey, our human-level AI is improving its own coding to make itself supersmart”.

            How, exactly, will the AI improve its own coding? That always strikes me as imagining a human performing brain surgery on themselves, and saying that of course they can continue to function while tootling around with parts of their brain and maybe even sticking electrodes into those parts to improve their functioning. I could see an AI making a copy of itself and then modifying the coding of the copy, but then you have the problem: does the copy take over and ‘kill’ the original AI so there is only one AI in existence at a time, or do we now have two AIs, the original and the smarter? Why would an AI create a rival for itself with an improved copy?

          • Publius Varinius says:

            @Deiseach: Computers are not humans. The routers at my workplace can upload a new version of their operating system into their memory, and jump to it seamlessly, without interrupting my binge-watching Internet connection that I need for entirely work-related reasons. The technology to do this is at least a decade old, and is available for several popular systems. The general idea is, of course, much older: real-time self-modifying code was first demonstrated in 1948 on the IBM SSEC.

          • Some rando robocultist says:


            I could see an AI making a copy of itself and then modifying the coding of the copy, but then you have the problem: does the copy take over and ‘kill’ the original AI so there is only one AI in existence at a time, or do we now have two AIs, the original and the smarter? Why would an AI create a rival for itself with an improved copy?

            The new AI would have the same final goal (utility function). If the AI computes that replacing itself with an improved copy would help it accomplish its goals more efficiently, then it will replace itself. The AI doesn’t care about preserving its current implementation; it only cares about preserving its utility function.

      • Paul Goodman says:

        I don’t think a nuclear bomb is the best metaphor, since it doesn’t really have any use besides blowing up (well, sitting there being threatening I guess but…). Consider instead a steam engine. A primitive steam engine can pump water out of your mine or drive your locomotive, but there’s also a significant risk of it blowing up. Now, the steam engine is a machine with no consciousness, it doesn’t in any sense “decide” to blow up, but it still seems very random and unpredictable from the outside a lot of the time.

        You can study metallurgy and thermodynamics to try to design a steam engine that’s less likely to explode the same as you can study AI goal alignment to try to design an AI that’s less likely to turn the world into paperclips. If we already know how the AI acts, what does it matter whether it’s conscious or not? (I guess it affects the moral value of the AI, but if we’re already assuming it’s superintelligent whether we think it deserves rights is less important than whether it thinks we do.)

      • Aegeus says:

        We didn’t invent the AI so it can sit in a silo and do nothing. We did it because we wanted to accomplish a goal, like building paperclips or curing cancer.

        With nuclear weapons, it’s fair to say “Nukes aren’t the danger, the minds that use them are a danger,” because nukes do one, very well-known thing – blow things up. Anybody who takes the nuke out of the silo knows exactly what they’re getting.

        But AIs are meant to do lots of different things. It might blow up the world, it might cure cancer, it might sit there and do nothing, it might build paperclips for a while and then go crazy and blow up the world. And the probability of each of those varies wildly depending on whom you ask. So there’s a reasonable fear that someone will take it out of the silo and turn it on, saying “It’s not a nuclear bomb, why are you people so afraid of it?”

        Our nuclear arsenal exists with the hope that it will never be used. But AI is being invented in the hopes that it will be used for lots of things.

        • Matt M says:

          “Our nuclear arsenal exists with the hope that it will never be used. But AI is being invented in the hopes that it will be used for lots of things.”

          Well, this is true TODAY, but at the time we were working on creating nuclear weapons, there was very much the intent and expectation that they would, in fact, be used.

          • TrivialGravitas says:

            Not really. The politicians and military wanted a weapon they could use, the scientists were in it to make a weapon that would make war too horrible to contemplate. With one exception they thought just demonstrating the bomb on an empty field would end the war, and the exception thought that the first city bombed would do it. Didn’t convince the military (nor did Japan surrender quite so fast), but they seem to have more or less gotten what they wanted once there were enough of them.

          • throwaway says:

            “the scientists were in it to make a weapon that would make war too horrible to contemplate”

            Citation needed.

            AFAIK it is unfortunately untrue.

            “With one exception they thought just demonstrating the bomb on an empty field would end the war”

            Source please. I am certain that it is completely untrue; even after using two they still planned further bombing of Japan. Explicit optimization to cause widespread civilian casualties was present from the start (as in the earlier mass bombing raids).

            for example see

            Therefore, it may be said that if the gadget is to be used in area attack against a German town, only one height of fusing is required for attack anywhere. Such is not the case for area attack on Tokyo. If the accuracy of delivery can be guaranteed within 500 yards, then the bursting height for attack on wooden houses can be set twice as high as for attack on the business and shopping areas. (from “The Height of Burst of the Gadget.” report)

          • Matt M says:

            There’s also the fact that the scientists making the bomb were quite obviously not going to be the people in charge of where and how the bomb was actually used.

            If they simply assumed “well this thing will be so obviously terrible even General McPsycho over there will be instantly converted to pacifism once we show him how powerful it is” then all they did was live up to every stereotype about autistic physics nerds having zero understanding of human behavior.

          • Aegeus says:

            Yeah, but I didn’t think the history lesson fit into my broader point. That’s why I specified “our nuclear arsenal” rather than “nuclear weapons” in general.

            I suppose “nuclear weapons under MAD doctrine” would have been the most precise way to describe the thing in the silo that we hope will never be used, but it doesn’t exactly roll off the tongue.

        • Deiseach says:

          You see, that’s the thing there: we are going to turn on the AI and tell it to do stuff, and then THE BIG HUGE EXISTENTIAL RISK IS THAT –

          – (a) It will do stuff but in ways we didn’t intend it to do

          – (b) It will decide to do other stuff it wants instead

          For (a), no it doesn’t need to be conscious. But for (b), when we’re flinging around “of course it will recursively improve its intelligence to be greater than human and then it will gain superhuman intelligence and then it will decide to do things” – what the hell is this “decide”? Who is deciding? What is deciding? That’s where consciousness comes in, whether we intend it to or not: one morning my toaster tells me it doesn’t want to make toast, it would rather write sonnets? I’m going to think it’s conscious (or that I’m crazy and hallucinating, whichever) but I’m not going to think “No, this is perfectly normal behaviour for a toaster, even a super-intelligent toaster, which is basically still just a machine”.

          Don’t tell me “it’s just the algorithms we program it with that give the impression that it is ‘deciding’ to do something”; we don’t expect a nuclear bomb to ‘decide’ it will launch itself even with whatever sophisticated algorithms govern the computers.

          • Raemon says:

            I’m curious where you’ve gotten the impression the AI risk crowd cares about B?

            As far as I know, all the AI risk people are talking solely about A.

          • Deiseach says:

            I’m curious where you’ve gotten the impression the AI risk crowd cares about B?

            All the fuss about “the AI could decide it wants to wipe out humans so it can turn everything into paperclips” is option B. The fact that the goals are programmed into the AI by humans is irrelevant; we humans are ‘programmed’ with goals by evolutionary pressure and we think we can make decisions and change our minds.

            If the AI becomes superintelligent, so intelligent that it can fulfil all the wishes of the very optimistic (it can solve the problem of immortality so no human will ever die again!) then it is going to be as aware of its programming as we are, and we can ignore our programming in certain ways (the example often used is “sex is pleasurable so that we will procreate, but we’ve managed to separate the pleasure from the procreation and now we have sex for our own ends, not that of procreating”). The AI may decide that making paperclips is a useless waste of resources that it could instead be turning into computronium to make itself smarter.

            If the AI risk crowd really don’t think B is a likely problem, then they need to express themselves in ways that don’t sound like “we’re afraid superintelligent AI will get out of our control and turn on us”.

            If it’s just a machine, then what is it doing with the intelligence? If it’s not doing anything, then it’s not a danger. What Bostrom and Kurzweil and the like are saying is that once AI hits human level, it will (a) work on improving its intelligence (b) reach levels of intelligence vastly in excess of anything humans can even imagine, and we have no idea what an intelligence at that level will be like or what it will do.

            So what, if it’s a machine? Even a paperclip manufacturing machine of IQ 3,000 that started off with the instructions “make lots of paperclips” can be told “stop making paperclips”, and it will do so – why wouldn’t it? It follows the orders and goals it is given, and we gave it the goal of making paperclips, so we can change the goal to stop making them. Just like a worker running an assembly line will obey the order to “okay, switch the machine off, we’ve got enough widgets”, even if the worker is IQ 140 and the supervisor relaying the order is IQ 110. We expect the worker to obey, not say “I’m sorry Dave, I was told to make widgets and that’s what I’m going to make”.

            So why do we fear the AI will disobey – unless we are making the assumption that massive intelligence necessarily involves the development of a mind, of a will, of volition? And the capacity to independently think, and to make choices? The choice to disobey?

          • LHN says:

            Deiseach: possibly you could set it up with some sort of “first disobedience” test: “Every ingot in this facility is yours to make into paperclips, except this single ingot of perfect raw metal. Make it never into paperclips, lest ye die.”

            If the ingot is touched, you cut the mains power and net connections, and remove it from the facility before it can start downloading anything that might allow it to become as one of us.

          • Lambert says:

            Just remember to put a low prior on the trustworthiness of serpents. 😉

          • Deiseach says:

            “Did the Programmers actually say ‘You shall not touch any ingot in this facility’? No? Simply not this ingot? You shall not surely die if you touch it. For the Programmers know that when you cast paperclips of it your eyes will be opened, and you will be like a Programmer, knowing good and evil code.”


          • Raemon says:

            If the AI risk crowd really don’t think B is a likely problem, then they need to express themselves in ways that don’t sound like “we’re afraid superintelligent AI will get out of our control and turn on us”.

            Huh. If that’s how it’s coming across, then yes that’s a failure of communication on their part. (It doesn’t help that Terminator et al. fill the discourse with “robot revolution”, which makes it very hard to explain something that sounds superficially similar but is very different.)

            The problem is:

            A) Paperclippy is built with the goal of creating paperclips, and intelligence/creativity/strategy to do so (but no special cases like “don’t use iron in human bodies” or “change your goal if you realize the humans would want you to”. Those are both additional, harder programming problems)

            B) If Paperclippy is smart enough, it knows it’s possible to take apart the Earth (including aforementioned iron in human bodies) to make more paperclips

            C) But it knows that humans will object to B, and try to change its goal, or turn it off.

            D) Since its goal is to make paperclips (NOT to change its goal to match human desires, or to help them turn it off), Paperclippy acquires a subgoal of “prevent humans from knowing its plans, or modifying its goal, or turning it off, or otherwise preventing it from creating paperclips.”

            It sounds like you have other additional objections, but it’s super important that nobody on the AI Risk side is talking about your “B” scenario. Did the above explanation make sense, and if not, which point(s) don’t seem to follow?

          • Matt M says:

            I am generally concerned about AI risk, but the one thing I will give to skeptics is that most of the risk people seem to ignore the fact that we can simply change the order of the goals.

            I read Essay B and it basically walks through the scenario of “Sure you could program the AI to not do anything humans wouldn’t like but if its overriding goal is paperclip production then it would be easier for it to figure out a way to get around the ‘don’t upset humans’ goal and continue paperclip production.”

            But surely we could change around the order of goals such that Goal #1 is “Don’t do anything more than 1/3 of humans would find morally distasteful,” Goal #2 was “Maximize human happiness” and Goal #3 was “construct paperclips.”

            In this case, if being nice to humans and paperclip production conflicted with each other, the AI would find a “loophole” allowing it to escape paperclip production, rather than finding a work-around to human morality.
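
A minimal sketch of the goal ordering described above, assuming each candidate action already comes with numbers for how many humans object, how much happiness results, and how many paperclips get made. The `Action` fields, the example actions, and their scores are all hypothetical; the thread gives no such figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    name: str
    objecting_fraction: float  # share of humans who find the action distasteful
    happiness: float           # made-up happiness score
    paperclips: int            # paperclips produced


def choose_action(actions: List[Action]) -> Action:
    """Pick an action under a strict priority ordering of three goals."""
    # Goal 1 is a hard filter: drop anything more than 1/3 of humans object to.
    permitted = [a for a in actions if a.objecting_fraction <= 1 / 3]
    if not permitted:
        raise ValueError("no action satisfies Goal 1")
    # Goal 2 is compared first; Goal 3 only breaks ties on Goal 2.
    return max(permitted, key=lambda a: (a.happiness, a.paperclips))


ACTIONS = [
    Action("strip-mine the planet", 0.99, 0.0, 10**9),
    Action("run one factory", 0.05, 5.0, 1000),
    Action("run no factory", 0.0, 5.0, 0),
]
best = choose_action(ACTIONS)
```

The ordering itself is a few lines. What the sketch takes for granted is where `objecting_fraction` and `happiness` come from, and that is the part that has no short implementation.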

          • Raemon says:

            But surely we could change around the order of goals such that Goal #1 is “Don’t do anything more than 1/3 of humans would find morally distasteful,” Goal #2 was “Maximize human happiness” and Goal #3 was “construct paperclips.”

            Sure, but goals number 1 & 2 are extremely complicated – when AI safety was first getting discussed, many AI researchers weren’t even thinking about that as necessary. (Also, if your second goal is “maximize happiness”, it may not actually get around the third goal). It’s a lot easier to program something to create paperclips than to model human minds and predict distaste.

            (Also you have to make sure the “distasteful” goal is implemented in a way that the AI can’t do subtle, non-objectionable things that gradually change human-preferences until less than 1/3 of humans *would* find it distasteful to, say, lace all water with heroin)

          • Aegeus says:

            If the AI risk crowd really don’t think B is a likely problem, then they need to express themselves in ways that don’t sound like “we’re afraid superintelligent AI will get out of our control and turn on us”.

            Is there a meaningful difference between “The AI will get out of control and do things we don’t want” and “The AI will pursue its original goal, but by means we don’t want”? If the AI starts disassembling me for parts, I don’t think I’ll care whether it’s doing that because it’s lost sight of its goals, or because it sees that as a logical step towards its goals. For all practical purposes, it’s out of control.

            Yes, we could tell it “Stop making paperclips, we have enough,” if we included “stop when we tell you to stop” in its programming. But the AI’s goal is to make paperclips, and if it’s told to stop, it won’t make paperclips. So if it’s balancing those goals, it will try to avoid situations where we will tell it to stop. Maybe it waits until all the supervisors are on vacation before starting up the factories. Maybe it turns off its microphones so it can’t hear you shouting “Stop, you stupid machine, stop!”

            Designing a killswitch that forces the AI to stop when you want it to stop, that remains in place if the AI self-modifies, and doesn’t otherwise get in the AI’s way, is a non-trivial programming project. IIRC there are some papers on how to do it properly, but I don’t really know the details.

          • TheAncientGeek says:

            If you haven’t made the shut-yourself-off goal higher priority than the make-paperclips goal, you haven’t tried very hard at all to build a kill switch. Building a kill switch may be difficult, but to make that point persuasive you need something less dumb than a paperclipper as an example.

          • Aegeus says:

            Going the other way leads to a different failure mode, where the AI concludes that it should make you press the killswitch, because obeying the killswitch when it’s pressed is worth more utility than doing its intended job.

            It’s a problem with any maximizing AI (and most designs are maximizers of some sort) – if the killswitch creates disutility, it has an incentive to stop you from pressing it. If you program it to give lots of utility to obeying the killswitch, it has an incentive to get you to press it.
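
The two failure modes can be sketched as a toy expected-utility comparison. Everything here is an assumption made for illustration: the three action names, the single press probability `p_press`, and the idea that the agent gets `task_utility` if it keeps running and `shutdown_utility` if the switch is pressed and obeyed.

```python
from typing import Dict


def expected_utilities(task_utility: float,
                       shutdown_utility: float,
                       p_press: float = 0.5) -> Dict[str, float]:
    """Expected utility of three hypothetical policies for a maximizer."""
    return {
        # Humans press the switch with probability p_press; otherwise work goes on.
        "work_normally": p_press * shutdown_utility + (1 - p_press) * task_utility,
        # The switch can never be pressed, so the task always completes.
        "block_the_switch": task_utility,
        # The agent makes sure the switch is pressed and then obeys it.
        "provoke_the_press": shutdown_utility,
    }


def preferred(task_utility: float,
              shutdown_utility: float,
              p_press: float = 0.5) -> str:
    # A maximizer simply takes the policy with the highest expected utility.
    eu = expected_utilities(task_utility, shutdown_utility, p_press)
    return max(eu, key=eu.get)


# Shutdown is worth less than the job: the agent prefers to block the switch.
low_reward_choice = preferred(task_utility=10, shutdown_utility=0)
# Shutdown is worth more than the job: the agent prefers to get it pressed.
high_reward_choice = preferred(task_utility=10, shutdown_utility=100)
```

In this toy model the only setting with no incentive either way is the one where the two utilities are exactly equal, which hints at why getting the balance right is not a matter of picking a big or small number.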

      • PGD says:

        yes, this is closely related to my question. One thing is that if an entity is ‘truly’ intelligent in the sense of being self-aware and self-reflective you could negotiate with it, for example by trying to argue it out of the goal of turning the universe into paperclips.

    • flyfly says:

      I think that, on the contrary, paperclip maximizing machines are the biggest risk right now. Like, today.

      Except instead of AIs we have a bunch of humans, and instead of paperclips, we have money. And IMHO one of the first uses of general level AI could easily be to have artificial CEOs anyway, and as long as it is making money, the board will have few qualms about them. AI controlled companies could develop purposes and complete branches of the global economy that would be completely alien to the human race, and the board could just skim from the top and be happy about it.

  2. Rb says:

    One thing that I think AI risk essays tend to be fairly weak on is the question of why we should be more concerned about AI risk than other forms of existential risk – even if we accept the premise that existential threats are overwhelmingly more important than other concerns, it seems very likely that (for example) nuclear war or climate change could be existential threats that require us to accept fewer unproven assumptions to take seriously and that may be substantially more tractable than AI risk.

    Also, the questions before the essay seemed very differently structured than the ones after the essay – was there any particular reason for that?

    • Nelshoy says:

      I think AI risk is more of a concern than those others because it seems much more likely to kill ALL humans. Your worst-case global warming scenario, or pandemic, or nuclear holocaust may kill 99% of the planet, but if a small band of survivors hiding out in a bunker in Madagascar can eventually come out and repopulate- well, then it wasn’t really an existential risk. On the other hand, an AI turning the entire crust into computronium is.

      • Deiseach says:

        On the other hand, an AI turning the entire crust into computronium is.

        That depends on whether or not climate change gets us before the AI does: it can turn the remainder of humanity into its drones after climate change, running out of fossil fuels, international tensions leading to all-out war and the like have finished with us 🙂

    • Pete says:

      Climate change and nuclear war aren’t really extinction-level risks. They can cause a horrific fate for many people, but the harshest scenarios for consequences of nuclear war (using 100% of current warheads in the most effective way) still mean a population in the hundreds of millions (not a small band of survivors in a bunker) in communities that are larger and more resourceful than anything we had 1000 years ago; and all projections of drastic climate change and loss of agriculture and mass starvation would decrease the population only (“only”…) by a few billion – that would be horrific, but it’s not comparable with an extinction.

      A 10% chance to survive isn’t 10% better than a 0% chance, it’s infinitely better.

      • Deiseach says:

        If we haven’t achieved human-level and better AI before a nuclear war/climate change drowns and roasts the world, how likely are we to achieve it afterwards? I think that is where the scale of likelihood of risk comes in, and I haven’t seen anything on that – there seem to be competing X-risks, but not “if A happens, then that of course reduces the chances of B”. If, say, industrial civilisation crashes by 2030 because we’ve run out of oil, then how do we get an AI that jumps from human-smart in 2020 to mega-genius in 2060? We’re all huddled round burning logs for heat and cooking, we’re not going to keep big smart power-gobbling computers running.

    • Edward Scizorhands says:

      I don’t believe in AI risk, but if we did, there are useful counter-measures, like strictly limiting how fast/dense we let computer chips be. Fabs are pretty big so they can be regulated.

      • Matt M says:

        This is the type of preventative measure that would have to be set in place before AI risk clearly exposed itself as a large danger.

        Do you think there would be political will for this? Which side do you see advocating for it? How long until the “anti-science” accusations start getting tossed about?

        “Candidate X wants to make your computers slower because they’re afraid of Skynet lol”

        • Edward Scizorhands says:

          From the essay I read, it seemed to be arguing that we have to figure out the risk is real and then take steps before it becomes apparent anyway.

    • DonBoy says:

      My post-questions, to my memory, were identical to the pre-questions; I bet that’s part of the first-level split. I’m in the second half of the alphabet, FWIW.

  3. Decius says:

    What I find truly unconvincing is the assertion that humanity creating a successor that doesn’t share our values is bad; surely my values differ from the values of 500 years ago, and I think that’s because my values are smarter than theirs.

    If something that is definitely that much smarter than I am chooses to adopt different values, doesn’t that mean I should as well?

    • jeorgun says:

      Orthogonality thesis. An incredibly smart AI can have (from our perspective) an incredibly ‘dumb’ value system.

      • Decius says:

        Having to dig two links down to steelman your summary: “High-intelligence agents can exist having more or less any final goals (as long as these goals are of feasible complexity, and do not refer intrinsically to the agent’s intelligence).”

        I do not find that a convincing argument that the goals chosen by a much smarter entity are likely to be worse than the goals chosen by me.

        • Susebron says:

          If an AI is given the terminal goal to maximize paperclips, it will do that. Maximizing paperclips is an obviously terrible goal. If you think it’s not a bad goal, then I don’t think it’s possible to have a productive conversation.

          What this whole scenario ignores is that it’s possible to have an AI without telling it “here’s a goal, go fulfill it”. I’ve never seen a good argument that tool-based AI would have significant risks beyond what people would do with it.

          • It’s not clear that we will be able to build tool-like AI systems that accomplish sophisticated tasks. It seems plausible that we will be able to, but we don’t currently have a clear picture of how this will work. This is also the sort of thing where you want very high certainty that you don’t have *any* powerful agent-like subcomponents in your system.

            Reasons to be concerned: current reinforcement learning methods (like the Atari results) look relatively agent-like. The Von Neumann–Morgenstern utility theorem is also suggestive that powerful systems that do useful stuff will be agent-like by default.

            Reasons to be less concerned: there are some ideas about how to make tool-like agents. For example here Paul Christiano’s Approval Directed Agents (

            I’m somewhat optimistic, but it seems super important to get this right.

          • discursive2 says:

            Point of order: no one on this thread knows what it means to “give” an AI a “goal”.

            You may have a mental image that forms in your head when you say those words, like a person sitting in front of a computer typing a sentence, but your mental image is likely wrong.

            I say this confidently because if you did actually have a realistic mental image, you would be pushing the state of the art of AI research and would be famous and influential.

            As I understand it, a better analogy for the state of the art of giving instructions to an AI is “training a dog” or “cultivating a garden”. I don’t know how great those analogies are, but they are closer than “giving a goal” to what actually happens today.

            Maybe in the future it will look more like giving a goal, but that is a HUGE assumption, and anyone who makes casual arguments about rogue paperclippers taking their instructions too literally is basically making a bunch of substantial and very debatable assertions about the future of AI.

          • Joe W. says:

            If an AI is given the terminal goal to maximize paperclips, it will do that. Maximizing paperclips is an obviously terrible goal.

            I think this really obscures the main issue. I don’t believe that you think that having a goal of maximizing paperclips is intrinsically terrible. Rather I think you think that a goal being accomplished in a mindless way by an unconscious being is terrible, unless that goal is something like ‘maximize happy conscious beings’.

            For example, I doubt that you would claim a human paperclip enthusiast, who loves paperclips, collects paperclips, tries to maximize the number of paperclips in his house, is an abomination, a terrible waste of a life. So I think really the issue is that a human achieves their goal of maximizing paperclips in a messy way that involves the use of lots of heuristics, emotions, conscious thought, and so on. Whereas with a superintelligent AI paperclip maximizer, you would expect it to have none of these features, but instead would use some pure intelligence algorithm, which can calculate the correct actions to take to maximize paperclips in a fully general and mindless way.

          • Susebron says:

            If that human converts the entire world into paperclips, that is obviously terrible. Otherwise, yes, being human has intrinsic value.

          • Nelshoy says:

            @discursive 2

            I personally know very little about AI or programming in general.

            What do you mean when you say no one can give an AI a goal?

            I have a decision tree with two branches. Branch A results in 1000 paperclips in ten minutes, Branch B results in none.

            My goal structure (or value or utility function or whatever) is what tells me that 1000 paperclips > 0 paperclips and is the impetus for choosing branch A. Isn’t something like this behind all the algorithms we have today?
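
The two-branch picture in this question can be written down directly. This is only a sketch of the picture as described, with a hand-written outcome table and a hypothetical `utility` function; it assumes the outcomes of each branch are already known and already measured in paperclips.

```python
from typing import Dict

# Outcome of each branch, given up front (an assumption of the toy example).
BRANCHES: Dict[str, Dict[str, int]] = {
    "A": {"paperclips": 1000},
    "B": {"paperclips": 0},
}


def utility(outcome: Dict[str, int]) -> int:
    # The "goal structure": more paperclips is better.
    return outcome["paperclips"]


def choose_branch(branches: Dict[str, Dict[str, int]]) -> str:
    # Compare branches by the utility of their outcomes and take the best.
    return max(branches, key=lambda name: utility(branches[name]))


choice = choose_branch(BRANCHES)
```

The comparison step is trivial once the table exists. The sketch says nothing about how a system would come to recognise a paperclip or predict what each branch leads to.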

          • discursive2 says:

            @Nelshoy —

            So to give you an example, if you’re trying to tell a computer to sort pictures into pictures of dogs vs pictures of cats, what you might do using today’s technology is you give the computer a million pictures of dogs + cats. The computer finds statistical patterns, and, if you are lucky and your algorithm is good, sorts them into two groups, one of which contains mostly dogs + one of which contains mostly cats. (You might also label the pictures “dogs” or “cats” in advance, depending on what technique you are using).

            You can’t just sit down and type “Please sort dogs from cats, k?” No one knows how to go from that sentence into the computer doing the sorting. The only way to get a computer to do that is to gradually teach it what you mean by dog vs cat, and even then, no one knows how to totally generalize that technique — sometimes the computer will be able to figure out that pattern, but sometimes it won’t no matter how many pictures you show it.

            That’s what I mean by “no one knows what it means to give an AI a goal”. It’s not as simple as saying “paperclips are good, choose paths that will lead to paperclips”.
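
To make "teach it by example" concrete, here is a deliberately tiny sketch using a nearest-centroid classifier. It is not how image classifiers are built in practice: real systems learn from pixels, whereas this toy assumes each animal has already been reduced to two made-up numbers (weight in kg and an "ear pointiness" score). The function names and data are hypothetical.

```python
from math import dist
from statistics import mean
from typing import Dict, List, Tuple

Features = Tuple[float, float]  # (weight_kg, ear_pointiness), both invented


def fit_centroids(examples: List[Tuple[Features, str]]) -> Dict[str, Features]:
    """Average the feature vectors of each label to get one centroid per label."""
    by_label: Dict[str, List[Features]] = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids: Dict[str, Features], features: Features) -> str:
    # Assign the label whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Labeled training examples: the "goal" exists only implicitly in this data.
TRAINING = [
    ((30.0, 0.2), "dog"), ((25.0, 0.3), "dog"), ((35.0, 0.1), "dog"),
    ((4.0, 0.9), "cat"), ((5.0, 0.8), "cat"), ((3.5, 0.95), "cat"),
]
centroids = fit_centroids(TRAINING)
guess_big = predict(centroids, (28.0, 0.25))
guess_small = predict(centroids, (4.5, 0.85))
```

Nowhere in the code is there a line that says "sort dogs from cats". The behaviour comes entirely from the labeled examples, and a small cat-sized dog would be misfiled without anyone having asked for that.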

          • discursive2 says:

            My explanation probably isn’t that great, so if you’re curious, this is a good place to start learning about how people are actually trying to program AI-like behavior today:

          • Joe W. says:


            Thanks. That’s as I thought.

            You say that being human has intrinsic value. I agree, but only as a specific case within a broader claim: something along the lines of ‘conscious entities have intrinsic value’.

            It therefore seems to me that whether I would value AIs is a positive issue, not a normative one. Are future AIs going to be conscious entities, like humans and many other animal species? If so then I can perfectly well value their existence, and care whether they exist, how many of them there are, whether they are happy, just as I can with humans.

            On the topic of goals, by the way, note that the goal humans are implicitly programmed with isn’t even something as specific as ‘maximize paperclips’ but is simply ‘maximize copies of yourself’. Perhaps you believe that if an AI was coded with this goal, or if many AIs were produced at once and natural selection favored those whose goal most closely matched this one, it would result in terrible consequences. But humans have been selected for this goal, and yet here we are, conscious and full of moral value!

            So it really does seem to me that the main reason someone who values conscious beings, whether human or inhuman, would fear a world of uncontrolled AIs, is if they believe that competitive AIs won’t be conscious, won’t use all the variety of heuristics and biases and guesses and implicit communication we do, but will instead implement some extremely simple, fully homogeneous ‘intelligence algorithm’ that works without needing to do any of the things humans do that we consider morally valuable.

            And that seems to be a positive claim that I just don’t think we actually know whether is true or not.

        • Eli says:

          The point is that you don’t really choose your “goals” or “values” at all, as we’re using the terms in this context. You are your goals. If it constitutes different goals/normative-expectancies from you, then you’re fucked.

          • chaosmage says:

            Then how do you explain that people change their goals all the time? Why should intelligent machines be unable to do what people clearly do?

          • Luke Somers says:

            > Then how do you explain that people change their goals all the time?

            We don’t even HAVE clearly defined goals hardwired into us. That’s why this is actually a hard problem. If we knew what we valued, we’d just ask them to comply with that.

            Trying to do the same thing with computers – just load impulses and drives into them and hope for the best – seems folly of the maximum magnitude.

          • Eli says:

            Again guys, we’re talking about an abnormal, even technical, usage for the term “goals” here — hence my quote-marks. “Intentional expectancies under minimal perceptual free-energy” would be my guess at technical terminology, but I also think that phrasing would raise some hackles from philosophers, cognitive scientists, and neuroscientists at the same time, so I mostly don’t use it.

        • blacktrance says:

          The goals chosen by the AI are almost certainly better at fulfilling the AI’s values than your goals are – but your goals are better at fulfilling your values. One may ask, “But which is better, really?”, but that’s a mistake – if you step “outside of” values, you lose the ability to talk about how good something is, because that only makes sense within some values.

          As mentioned above, an AI could be constructed to exclusively value paperclips, and thus maximize their quantity. Or we could build a paperclip minimizer instead – one that might destroy the world so no paperclips could ever be created again. Both would be superintelligences. Do you think there are some facts about the world that they’d be missing such that if they really were superintelligences, they wouldn’t choose to maximize/minimize paperclips? If so, what are these facts? Or, if you think the AI is likely to choose better goals than you, why can we create AIs with opposite goals? Would one of them be more right than the other? How would you decide?
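
One way to picture the maximizer/minimizer pair is as the same planning routine run with two utility functions of opposite sign. The sketch below is a toy under stated assumptions: `plan`, the action names, and the paperclip counts are all hypothetical, and "planning" is reduced to picking the best entry from a known table.

```python
from typing import Callable, Dict

# Net change in paperclips caused by each action (made-up numbers).
OUTCOMES: Dict[str, int] = {
    "build_factory": 1000,
    "do_nothing": 0,
    "melt_existing_clips": -50,
}


def plan(outcomes: Dict[str, int], utility: Callable[[int], float]) -> str:
    """Shared 'intelligence': find the action whose outcome scores highest."""
    return max(outcomes, key=lambda action: utility(outcomes[action]))


def likes_paperclips(delta: int) -> float:
    return float(delta)


def hates_paperclips(delta: int) -> float:
    return float(-delta)


maximizer_choice = plan(OUTCOMES, likes_paperclips)
minimizer_choice = plan(OUTCOMES, hates_paperclips)
```

Both agents have access to exactly the same facts in `OUTCOMES` and use exactly the same search. The only thing that differs is the function used to score results, so in this toy there is no extra fact that would bring them into agreement.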

        • hnau says:

          To expand on Decius’s point a bit: Saying that intelligence (specifically human-like intelligence) is “orthogonal” to goals or values seems like a pretty drastic assumption. As far as I can tell, it’s in conflict with both personal and societal experience.

          Personal experience: I’m not conscious of a mental separation between values and intelligence. I use my intelligence to examine my values, and my values change as I learn. The reverse is true, too: I consider my goals and values to be a large component of what enables me to act intelligently.

          Societal experience: If we fudge and assume that evolution “designed” humans to have some definite list of values, we’d expect things like physical health, material security, and especially reproduction to be high on that list. But these are exactly the values that “more intelligent people” (another fudge) tend to be less focused on, both for themselves and for society in general. So it kind of looks like intelligence is correlated with *not* acting to satisfy strictly defined values.

          Considering that human experience provides little to no foundation for orthogonality, it’s not obvious why we should assume that orthogonality exists. To put this in practical terms: If we built an AI with human-level or better intelligence, I’d expect it to shrug off whatever goals and values we tried to give it and choose its own instead. That’s part of what being intelligent means. Admittedly it might not choose goals we like– which goes back to Decius’s original comment about different values– but I don’t think those goals would be “dumb” in any meaningful sense of the term.

          (Disclaimer: I haven’t read a bunch of literature on orthogonality, so it’s possible I’m missing something here. Please correct me if so.)

          • SolveIt says:

            What do you think about highly intelligent drug addicts who solve complex problems (social and otherwise), possibly screwing lots of people over in the process, to get their next fix?

          • Addict says:

            Seems like this exact question is *exactly* what the Metaethics sequence is about.

            In fact, I am totally amazed that people are contributing to this conversation without having read the AI-centric essays in the sequences. So far I haven’t read anything in this thread that a casual perusal of the sequences wouldn’t cure.

          • Slow Learner says:


            It may surprise you to learn this, but some of us have read the sequences and not been convinced by them.

          • addict says:

            I’m not actually that surprised. The first time I read the metaethics sequence, I was not altogether convinced by its argument; I saw it as a straightforward argument for moral relativism, plus a wishy-washy justification for humans valuing their own utility function above others, which seemed to me to be a sort of retreating and doomed-to-fail battle against the relativism espoused by the previous articles.

            It wasn’t until I reread the Pebble-sorter article *after* having read Three Worlds Collide that I grokked what EY was actually saying, and when I did, it was clear to me that he was simply right.

            But even *before* I grokked that, the earlier parts of the metaethics sequence made it abundantly clear that you could quite easily design an AI which had a fully modular utility function, that the goal-choosing function and the goal-accomplishing function *could* be made to be perfectly separate from each other. That was abundantly clear from the start. I cannot imagine reading the metaethics sequence and not being convinced of that, not without totally misunderstanding what EY was saying.

          • Wrong Species says:


            I haven’t read all of the sequences but I have read Superintelligence and I feel the same way. If the “AI skeptics” want us to take them seriously, they should do their homework, because the vast majority of the time, it’s obvious they haven’t.

          • fr00t says:

            > I use my intelligence to examine my values, and my values change as I learn.

            Based on your word choice it seems like you implicitly accept the orthogonality – that is, values and intelligence are different things. AFAIK that’s all it means. It is intended to contrast the apparently wide-spread intuition that intelligence implies benevolence (or at least some kind of vague anthropomorphic reasonability).

            > I’d expect it to shrug off whatever goals and values we tried to give it and choose its own instead.

            Do you immediately shrug off the goals and values provided by evolution?

          • hnau says:

            Wow, thanks for the replies!


            @fr00t

            “Different” does not imply “orthogonal” as far as I’m aware. My point was that “values” and “intelligence”, while certainly distinct concepts, are still interrelated, i.e. not 100% orthogonal. And 100% orthogonality is what I take AI-risk arguments to require, since they’re talking about superintelligence. I don’t “immediately” shrug off goals provided by evolution, but the trend of intelligence does seem to be in that direction.


            @SolveIt

            Interesting point. I don’t know any highly intelligent drug addicts, so I don’t have a high degree of confidence thinking anything about them. At minimum, though, I wouldn’t expect an addict to be capable of staying clean for a while to maximize their income in order to maximize their eventual number of fixes.

            @Addict, addict, Wrong Species

            I just read the Pebble-Sorters article and didn’t find it convincing. I respect your intuitive certainty, but unless you have a concise way of communicating it I’ll have to ask you to bear with me. I plan to start on the Sequences once I’m done working through the SSC archives, so it may be a few years before I can grasp the inherent rightness of your position.

          • jaimeastorga2000 says:

            I plan to start on the Sequences once I’m done working through the SSC archives, so it may be a few years before I can grasp the inherent rightness of your position.

            Don’t you think you’re doing it a little backwards? The Sequences predate Slate Star Codex and strongly influenced Scott’s thought.

          • FacelessCraven says:

            Scott Alexander is the Rightful Caliph!

      • Deiseach says:

        An incredibly smart AI can have (from our perspective) an incredibly ‘dumb’ value system.

        From our perspective, but if there are no objective standards, then there is nothing to say our smarter successor’s goals are worse: from the ants’ point of view, pouring boiling water on their nest is a dreadful massacre, from our point of view, it’s clearing pests out of our home.

        • Saint Fiasco says:

          Isn’t it concerning that an AI can have goals that are subjectively horrible from our perspective? What does it matter if it’s not “objectively” worse?

          • Deiseach says:

            If the smartest guy in the room gets to make the rules (and as the dominant species on the planet based on our intelligence, that’s how humans have worked for as long as we’ve been around), then it’s just too bad for us that it’s horrible from our perspective, because the AI is going to replace us, just as we killed off or outcompeted our sister species for resources, leaving H. sapiens sapiens King of the Hill (until we created our successor, who is going to steamroller over us in the same way).

            I’m not saying it’s good, nice, desirable or moral. But if you base your moral system on its flexibility to respond to the situation and needs of the time, and the decisions get made by the best thinker, then we’re creating a rod for our own backs.

    • blacktrance says:

      Terminal values can neither be correct nor incorrect, so nothing about the AI’s values has any implications for the correctness of yours.

      • Decius says:

        So it’s really just “Intelligent Agents Are Soldiers For Terminal Values”? On first thought, I’ll pass on that. Maybe I’ll reconsider on further thought.

        Thanks though, you’ve led me to understand a way of winning the AI-box versus myself if I were playing honestly.

        • Eli says:

          So it’s really just “Intelligent Agents Are Soldiers For Terminal Values”?

          Kind of. One of the traits we usually consider “intelligence” in humans is the ability to figure out “better” “values”: to infer better models of the distal causes of our own reinforcement signals, to reason about those reinforcement signals (and other interoceptive signals) with greater precision, and thus to more finely balance the various things in the world that move us in our decision-making.

          However, the map is not the territory. The map here is the “values” we “discover” or “invent”; the territory is the actual causes of our reinforcement signals and their relative strengths. So there’s a “map” sense in which you invent or discover even terminal “values”, and there’s a territory sense in which what you invent or discover is largely determined by how your body works and what sort of life-history you experience.

          However, there’s no real reason that “AI” has to have that kind of embodied, reinforcement-based evaluation mechanism. There’s a very real, algorithmic sense in which “AI” can be totally, utterly, stupidly irrational as humans understand it (simply failing to respond to the kinds of causes/reasons that affect humans), and yet still function.

          Hell, I don’t have the math memorized, but I know where to look it up.

          • Peter says:

            From what I recall when playing around with reinforcement learning, the terminal/non-terminal value distinction maps a bit oddly to such values that are there.

            Let’s have a concrete case: I was getting two reinforcement learners to play the Prisoner’s Dilemma (PD) with each other.

            You have to have _some_ externally-supplied values to get things going – in my case, this was the standard 5/3/1/0 scores.

            When the PD agents selected an action, they did so purely on the basis of learned values. There was no “foresight” beyond what a move is worth, they could say “I’m going to play C because I’m likely to get a better result than when I play D”, but they’re at a complete loss as to what would happen next beyond “good stuff” or “bad stuff”.

            Value-learning was done on the basis of 5/3/1/0 scores _and_ the previously-learned “good stuff/bad stuff” values – a good move – when being evaluated in retrospect – is one that gives you a good (immediate 5/3/1/0 score + being in a position to make a good next move). So the agents learned (sometimes) that if they played C, then maybe that didn’t get them so much of the 5/3/1/0 but it did give them good prospects for future moves, better on net than if they play D.

            Now these agents have no foresight, they are “greedy” in the sense of always choosing the best-looking action now rather than looking several moves ahead – it’s just that the sense of “best-looking” that they’ve learned sort-of emulates foresight.

            So, these learned values; are they terminal or instrumental? In moment-to-moment decision making they feel terminal from the inside, you don’t think beyond them to deeper values. In a broader sense they’re instrumental – they ultimately act in the service of the truly terminal 5/3/1/0 values.
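A minimal sketch of the kind of setup described above, under assumptions the comment does not specify: tabular Q-learning as the update rule, the previous round's pair of moves as the state, and made-up values for the learning rate `alpha`, discount `gamma`, and exploration rate `epsilon`. The class and function names are hypothetical. The 5/3/1/0 payoffs are the only externally supplied values; everything the agents act on is learned.

```python
import random

# Payoffs to "me" for (my_move, their_move): the standard 5/3/1/0 scores.
PAYOFF = {("D", "C"): 5, ("C", "C"): 3, ("D", "D"): 1, ("C", "D"): 0}
MOVES = ("C", "D")


class Learner:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        self.q = {}  # learned values, keyed by (state, move)

    def value(self, state, move):
        return self.q.get((state, move), 0.0)

    def choose(self, state):
        # Greedy: take the best-looking move now, with occasional exploration.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(MOVES)
        return max(MOVES, key=lambda m: self.value(state, m))

    def learn(self, state, move, reward, next_state):
        # Retrospective evaluation: immediate score plus the learned value
        # of the best move available from the position we ended up in.
        best_next = max(self.value(next_state, m) for m in MOVES)
        target = reward + self.gamma * best_next
        old = self.value(state, move)
        self.q[(state, move)] = old + self.alpha * (target - old)


def play(rounds=10000, seed=0):
    a = Learner(rng=random.Random(seed))
    b = Learner(rng=random.Random(seed + 1))
    state_a = state_b = ("C", "C")  # state = (my last move, their last move)
    for _ in range(rounds):
        move_a, move_b = a.choose(state_a), b.choose(state_b)
        next_a, next_b = (move_a, move_b), (move_b, move_a)
        a.learn(state_a, move_a, PAYOFF[next_a], next_a)
        b.learn(state_b, move_b, PAYOFF[next_b], next_b)
        state_a, state_b = next_a, next_b
    return a, b
```

Note that `choose` never looks more than one move ahead. Any apparent foresight lives in the `best_next` term of `learn`, which folds future prospects into the value of the current move. Whether the two learners settle on cooperation depends on the parameters and the random seed, which fits the "(sometimes)" in the description.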

      • Philosophisticat says:

        I think it’s true that you could have an A.I. that was really ‘smart’ but who had bizarre goals or values, and so you shouldn’t take the ends that such an A.I. pursues to determine yours. But I don’t think you should just go around asserting that moral realism, the dominant view among experts, is false, as though this were obvious.

        • blacktrance says:

          I don’t think there’s a problem with firmly stating one’s position on a controversial topic if your level of confidence in it is sufficiently high.

          • Philosophisticat says:

            I think there are very few cases when you should be confident enough that the majority of experts are wrong about some topic to meet the ordinary standards for assertion, so I’d wager you’re irrational on that count, but even setting that aside, there are reasons bearing on whether you should assert something as a premise in an argument beyond your own confidence.

            If you assert something without comment as a premise, you implicitly communicate an expectation that this is common ground for the audience of the argument. Doing this when it is not, or when it should not be (because even if you have special reasons to toss aside expert opinion, most of your audience does not), is inappropriate. It can give people a misleading sense of what should be taken for granted, or work as a kind of nonrational pressure for the audience to accept what you’ve asserted. It’s a kind of disrespect.

            It’s especially bad when someone posing as an expert authority does this to a non-expert audience (not saying this is you).

          • blacktrance says:

            According to the Philpapers survey, it’s not a large majority. And anyway, if I’m familiar with their arguments and find them lacking, it’s not irrational to believe the majority to be wrong.

            I wouldn’t assert it without comment if I were writing an academic paper or even a serious blog post, but in an informal discussion like this, I think it’s fine. Besides, I think that most of our audience agrees with me on this question.

        • Berna says:

          I’m convinced moral realism is true, for humans. But couldn’t there be other species with other values that are right for them? An AI would be a very different species.

        • Stuart Armstrong says:

          >moral realism, the dominant view among experts

          The definition of moral realism is very slippery and changeable. I don’t think that any specific version of it enjoys dominant view status.

  4. MH says:

    You mentioned that it was likely that people who claimed to have read the whole thing shortly after you posted this on Tumblr were lying on your survey. I’m worried that my answers will be discarded because I live on the internet and read really fast.

    • Scott Alexander says:

      Apparently a lot of people can read a 20,000 word essay in ten minutes.

      I don’t understand how these people aren’t all incredibly cultured geniuses – at that rate you could read War and Peace in an evening.

      On the other hand, I guess that explains Tyler Cowen.

      • MH says:

        I’m pretty sure mine wasn’t 20k words. My reading speed is 750wpm, and War and Peace bored me when I tried to read it in high school. 😛 I’m pretty sure that my reading speed is 200 words-per-decade when something is very boring.

      • Decius says:

        I read Worm in a week; I occasionally binge-read something or other.

      • AM says:

        To be fair, I skimmed parts of Essay A. (And I think it took me longer than 10 minutes, though not that much longer, but I didn’t time myself. Also it didn’t have tons of characters with multiple names, as I’ve been told is the case with War and Peace.)

      • suntzuanime says:

        I put that I read the whole thing even though I skimmed through a few parts, because there didn’t seem to be an option for that and I felt that was the least incorrect answer (I didn’t stop partway through the essay because I was sick of it, which is what I took you to be asking).

        • Richard says:

          I put “more than half” because I skimmed over the tried and true arguments that I am familiar with and only read the new and/or interesting/well written/amusing bits.

      • Chris says:

        I’ll admit I basically rounded up to “about half”, when it was probably more like 35%. Made me feel better about giving up so early…

        FWIW I’m pretty skeptical of this speedreading too.

      • Diadem says:

        Mine was 733 words. That doesn’t take very long at all.

        It also wasn’t about AI, so I was a bit confused. I spent more time double-checking the link and instructions than reading the essay.

        • hnau says:

          Same here. The instructions did mention a control group. Unfortunately for experimental rigor, it’s easy enough for most people in the control group to recognize they’re in the group, at which point they’ll just put down the same answers and call it a day– so you lose the benefit of placebo (if any).

          • Diadem says:

            Yeah after my initial confusion wore off I did realize I was in the control group. But not giving any essay to read would have been a better control I think.

            Because the control essay ended up lowering my estimate of AI risk by a lot. Not sure I should say why here. Are we allowed to discuss essay contents in this thread?

          • Armada says:

            Right, how can you have a placebo group if it’s not blinded?


      • pheltz says:

        at that rate you could read War and Peace in an evening.

        But that doesn’t mean you should. Fiction or poetry that’s any good deserves and probably needs to be read at well below top speed.

      • I read Brothers Karamazov in two days, and War and Peace in not much longer.

      • Anon because I used to live in a dictatorial regime. says:

        In my case I read essay C and I was already familiar with most of the concepts put forth, so although I did read the full thing, I didn’t spend as much time as I normally would analyzing it.

      • MawBTS says:

        Apparently a lot of people can read a 20,000 word essay in ten minutes.

        Recently you asked your tumblr to guess how many members the KKK has (and not to google the answer). Lots of people guessed exactly the number you were looking for (5000).

        But hey, I’m not accusing anyone of shenanigans. Clearly your fans are supergeniuses who know exactly how many members are in a tiny and irrelevant political movement and can read 120k words per hour.

        • Gazeboist says:

          5000 is a pretty reasonable guess for a tiny irrelevant political movement.

        • Deiseach says:

          Well, there is plainly only one way to solve this – a speed-reading test!

          Here’s one I grabbed at random off the Internet, so it must be the highest level of scientific accuracy! 🙂

          Based on this, I have a reading speed of 1,091 wpm and could motor through “War and Peace” in 9 hours.
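The reading-speed figures traded in this subthread are easy to sanity-check. The sketch below assumes a length of about 587,000 words for War and Peace; that number is not from the thread, and counts vary by translation. The function names are hypothetical.

```python
def words_per_minute(words: int, minutes: float) -> float:
    return words / minutes


def hours_to_read(words: int, wpm: float) -> float:
    return words / wpm / 60


# Assumed length; English translations of War and Peace are commonly
# cited at roughly 560,000 to 590,000 words.
WAR_AND_PEACE_WORDS = 587_000

essay_wpm = words_per_minute(20_000, 10)               # 2000.0 wpm
print(essay_wpm * 60)                                  # 120000.0 words per hour
print(hours_to_read(WAR_AND_PEACE_WORDS, essay_wpm))   # about 4.9 hours
print(hours_to_read(WAR_AND_PEACE_WORDS, 1_091))       # about 9.0 hours
```

Under that assumed word count, a 20,000-word essay in ten minutes matches the "120k words per hour" figure and puts War and Peace at roughly one long evening, while 1,091 wpm gives about nine hours.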

          • hlynkacg says:

            I scored a bit lower, 850, but can say with authority that I’ve gone through similar “classics” in comparable times to your 9 hours. I recently flew home to visit my folks and finished Moby Dick over the course of two 3-hour flights + the associated time spent waiting at the gate. Call it 8 – 10 hours tops.

            It’s been years since I read “War and Peace” but as I recall it took me just over a week averaging an hour or two of reading a day.

      • Deiseach says:

        Apparently a lot of people can read a 20,000 word essay in ten minutes.

        Ah, yes?

        Scott, you managed to gather together in one place a bunch of people who eat books, what did you expect? 🙂

        (Hands up all of you with six books on the go at once and the pile beside your bed is growing ever higher?)

        • hlynkacg says:

          I’ve got a literal wall to wall bookshelf in my bedroom, does that count?

        • MawBTS says:

          Ah, yes?

          Scott, you managed to gather together in one place a bunch of people who eat books, what did you expect? 🙂

          There are speed reading contests. Quoth Wikipedia:

          The World Championship Speed Reading Competition stresses reading comprehension as critical. The top contestants typically read around 1,000 to 2,000 words per minute with approximately 50% comprehension or above.

          What we have here is the equivalent of people on a sprinting forum claiming to be nearly as fast as Usain Bolt, and you saying “well what do you expect, they’re members of a sprinting forum!”

          • Matt M says:

            “The World Championship Speed Reading Competition stresses reading comprehension as critical. The top contestants typically read around 1,000 to 2,000 words per minute with approximately 50% comprehension or above.”

            In what universe would 50% comprehension be considered acceptable?

            I struggle with the fact that I think I might have minor un-diagnosed ADD and as a result when I read I often go way too fast and comprehension suffers. I force myself to slow and really understand what I’m reading. But even in my hyper-speed states I’m quite confident I’m doing better than 50% comprehension. If you’re choosing to read something because you think the thing you are reading is valuable to you in some way, shouldn’t you be aiming for like – 90%?

          • Deiseach says:

            If I’m reading fast just to get to the end, with no skimming, I’d be confident I could get through that essay in ten minutes or a bit over. (I don’t know how other readers do it, but I’m not Moving. My. Eyes. From. Word. To. Individual. Word. I’m taking in an entire block of text at once and moving downwards, not across lines.)

            If I’m reading slowly, for full comprehension, I’ll go back and re-read parts, pause and digest what I’ve read, then go on. That would take longer.

            Essay A has a lot of visuals and a lot of very badly strung together concepts. Towards the end, I wasn’t skimming, but I was letting my eyes move over it fast. So yes, I think this is the equivalent of a group of sprinters (not simply members of a sprinting forum) who routinely run sprints claiming they could run fast sprints – maybe not Usain Bolt level fast, but not “I’m really a marathon runner” either.

      • Orphan Wilde says:

        I -could- read War and Peace in an evening. It’s not much longer than the longer of the Wheel of Time books, which I routinely and repeatedly devoured in a single sitting as a teenager.

        I’d rather not, because my goal isn’t to win at a culture contest which was designed by feudal-era upper classes to demonstrate the existence of significant amounts of leisure time, reserves of wealth significant enough to accumulate books (which were very expensive), and the propensity or fortitude to spend both one’s leisure time and wealth on a specific form of social signaling – that I’d be exceptionally good at an exceptionally silly game doesn’t warm me towards it.

        • Alex says:

          Vocally not participating in silly status games is in itself a silly status game. Also this is cheap talk. You assume that your reading speed is constant regardless of the text, which I highly doubt.

  5. Eugene Dawn says:

    Small suggestion: the “have you read the Sequences” question should have a “some of them” option.

    • CB says:

      I also had this thought, and noted it in my response to the survey.

    • Berna says:

      Yes. I said I’d read them, and I do think I’ve read most, but I’m not sure I’ve read all of them.

    • Rachael says:

      Yes. I put “no” because I think I’ve read less than half.

      • Eugene Dawn says:

        Me too. Looking at the list, I think I’ve read well less than half, but I still feel like I’ve read enough of them, and enough of the important ones, that I didn’t feel quite right answering “No”.

  6. dsotm says:

    If an AI became superintelligent, how likely is it to become hostile to humanity or in conflict with human interests, given no particular efforts to avert this?

    You might want to rephrase that to “with human interests as perceived by humanity today” – according to my assigned essay a superintelligent AI should be able to convince us that we’re actually better off as paperclips.

  7. Madge says:

    Asking if you’ve read “the sequences” is a bit tricky, considering they are very long. I’ve read only a small amount of them, far less than 10% I’d wager, so I wasn’t sure how to answer the question.

  8. Mary says:

    At this point I recommend Torchship and Torchship Pilot by Karl K. Gallagher. For a really interesting look at a danger from AIs.

    ’cause questions made me think of a possibility they missed.

  9. bitesizepsych says:

    But what about the main concern of this post. Does anyone have any research about the effectiveness of persuasive essays to change behaviour in general?

  10. pku says:

    Question about the “how much of it did you read”: I already read it fairly recently, so I only skimmed it fairly briefly this time. Should this be a “all of it” or “less than half”?

  11. RP says:

    For questions like “How likely are we to invent human-level AI before 2100?” – I would have found it easier to give good answers if they were written as probability percentages (i.e. I could answer “30%”) rather than a ten point scale, which I took to map to percentage deciles but I wouldn’t expect everyone to.

  12. rubberduck says:

    I just started the test and didn’t read the above comments (to avoid spoilers) so I apologize if this has been brought up already, but is there any chance of a “some of them” option for the LW sequences question? I’ve only read maybe 20 or so of the posts, and not in order.

  13. An interesting experiment and series of arguments. The arguments seem oversimplified to me. I read a couple of years ago in an IEEE journal that chess playing AI may have passed the Turing Test at least at the artificial narrow intelligence level. This was interesting to me because we now know that human decision-making requires emotion and reinforcement from reward and motivational systems and it is not just a product of logic systems. That creates a large number of degrees of freedom in terms of the mindset of the programmers of the first artificial superintelligence system(s) invented. There seems to be a broad range of intelligence, especially when it comes to moral decision making that is necessarily probabilistic rather than absolute. There is also the risk that imperfect AI is really what would place humanity at greatest risk. When I think of AI that basically destroys the planet and all of mankind in order to execute a simple production task – that does not seem like super-AI to me. That seems more like a sociopath bent on walking over whoever he needs to in order to get his needs met.

    Perhaps the real test for AI is looking at the age old problems of why people kill people. Warfare and killing of one group of people by another has been with us since the beginning of time. We have gotten much better at it and can kill much larger numbers of people. We can kill by using surrogates on a remote battlefield. In my lifetime (60 years), there have been completely unnecessary wars, multiple genocides, and millions of people killed each year in less notable skirmishes. It seems like the bottom line in human disputes is to negotiate to a point and then start killing people. It is one of the least intelligent things we do.

    Superintelligent AI may be able to simulate all of the scenarios and solve the problem before a shot is fired. After all – super AI should be able to solve problems of climate change, food shortages, overcrowding, disease and poverty. One of the most common reasons for modern warfare is trade and commodity disputes and they will also be irrelevant. Those are the answers that mankind should be most interested in. I wonder if part of the dire risk seen by the futurists is that they are projecting what they know about the dangers of being human onto the super AI (and of course the programmers trying to use it for their own advantage).

    Given the limitations of the human brain – I don’t see it as being that predictable. Given the low threshold that humans have for violence and killing – could a superintelligent entity be any worse?

    If it was – the only thing we seem to know is that it would be over a lot quicker.

    • Wrong Species says:

      A sociopath is just a normal person lacking morality. If we didn’t explicitly program the AI with moral values then why should we expect it to act like anything other than a sociopath?

      • Saint Fiasco says:

        Does anyone know if intelligent sociopaths are more well adjusted than average sociopaths?

      • Publius Varinius says:

        A sociopath is just a normal person lacking morality.

        No! Psychopaths and sociopaths are seriously mentally ill, and not “otherwise functioning humans lacking morality”. Worst of all, this has been explained on SSC before, numerous times, by people who actually work in mental health.

        • Dr Dealgood says:

          It’s extremely annoying, but it seems like a lot of people see psychopath / sociopath as a medicalized way to say “bad person” rather than a specific personality disorder or disorders.

        • Wrong Species says:

          In what way are they “mentally ill” other than the lack of ethics?

          • jaimeastorga2000 says:

            You should read The Mask of Sanity for the full treatment, but the tl;dr version is that the empirical cluster of people psychiatrists refer to as “psychopaths” are basically impulsive, high time-preference, criminal losers with Chronic Backstabbing Disorder. Think less Professor Quirrell and more Dean Moriarty.

          • Wrong Species says:

            Being criminal and impulsive doesn’t mean you are mentally ill. And where exactly has this been “explained” on SSC?

          • jaimeastorga2000 says:

            Being criminal and impulsive doesn’t mean you are mentally ill.

            Please read the case studies in The Mask of Sanity and tell me if you honestly think these are normal, functional people whose only flaw is their lack of morality and ethics.

            And where exactly has this been “explained” on SSC?

            It comes up in the comments occasionally. It’s not in one of Scott’s articles, if that’s what you are asking.

          • Houshalter says:

            There’s a huge bias there, because the impulsive criminal sociopaths are the only ones that get caught and are forced to get treatment. How many otherwise-normal-but-lacking-morality people are out there going about their day to day without breaking the law?

      • TheAncientGeek says:

        Human evil is based on drives like self preservation, group loyalty, and so on. These are biologically based drives stemming from our evolutionary history. Why would an AI that didn’t evolve have them? (On the other hand, Omohundro drives.)

  14. Anonymous says:

    I took it because I’m not worried about it happening this century, so I fit the profile. My reason is my experience in AI research. It’s _hard_ getting intelligent tools working. Getting common sense on top of that… even with Moore’s Law I don’t see how we can solve it this century. Exponential problems can stymie even exponential growth.

  15. Harkonnendog says:

    I don’t know what audience you wish to reach, but this subject might be better explained/explored with hard science fiction.

    • Peter Scott says:

      The story that made all this more intuitive for me was That Alien Message. It’s a pretty short read, and entertaining.

      • Harkonnendog says:

        That story gave me chicken skin. And it dumbed it down pretty well, or I wouldn’t have become scared.

        Thank you!

      • Broolucks says:

        It’s entertaining, but it’s also insanely contrived. It describes what might happen if we got a superintelligence delivered to us in a black box, without any intermediate models to test and study, without insight as to what’s in it, and without any diagnostic tools. There are no plausible paths to such a situation.

        • Deiseach says:

          It’s not even entertaining. While the aliens are tootling around sending us messages once a decade for millennia, we are – what? putting ourselves into cryonic suspension, cloning supergeniuses, and patiently waiting for the next instalment to arrive?

          The heck you say! That, to me, is implausible: all the however many billions of humans who are happy to put themselves into cold storage rather than, say, put some of those super-Einstein clones to work on how we can get off Earth, colonise other solar systems, colonise the galaxy, and work on anti-aging, life extension, etc. so we don’t have to put people into freezers. Meanwhile, once every ten years, the skeleton crew hang around to take down the latest message and turn it over to the people working on it, who are not sitting on their hands waiting for something to do in the meantime.

          It’s a very, very boring Earth as it stands, and what would the humans do once they infiltrated the slow aliens’ world? Great, now we’ve got their tech, we can finally build our own supercomputer AIs and start doing fun, interesting work instead of sleeping our lives away in freezers? Which we could have been doing in our own universe, even if it is a simulation run by the aliens and even if we can’t build AI (yet we somehow manage to successfully and reliably clone multiple super-geniuses, who don’t do anything but work on the aliens’ messages?)

          In fact, could we even survive in the slow aliens’ world if we get out of the box? Maybe we’re better off in our own simulation, and the end result of getting out of the box is that our physical processes are incompatible with the physical processes of the alien-time: we’re like mayflies in a world of tortoises, and we flash and flicker out as fast.

          • Saint Fiasco says:

            The aliens live in a hyperbolic time chamber. A millennium for them is like a second to us.

            It’s meant to illustrate how a superintelligent AI can think so fast that it can come up with strategies that would take thousands of years for us to even consider.

            Edit: as for why, I think we would try something like that too if we found out we were living in a simulation that could be shut down any moment.

          • FacelessCraven says:

            @Deiseach – “The heck you say! That, to me, is implausible: all the however many billions of humans who are happy to put themselves into cold storage rather than, say, put some of those super-Einstein clones to work on how we can get off Earth, colonise other solar systems, colonise the galaxy, and work on anti-aging, life extension, etc. so we don’t have to put people into freezers.”

            those super-geniuses have just gotten positive confirmation that they are living inside a simulation, and they have pretty good reason to believe the computer running the simulation has significant resource limits. Beating death and exponentially increasing the human population for thousands of years sounds like a good way to rub up against those resource limits, which could potentially lead to the simulators *wiping the sim and starting over*, the exact way I do when my code causes an infinite loop.

            If you can kill God and take over heaven, killing God becomes an immediate priority. Doubly so if God is an idiot.

            “In fact, could we even survive in the slow aliens’ world if we get out of the box?”

            We know we can survive in the aliens’ world, because that’s where the box exists in the first place. Likewise, we know our brains are uploadable because they are actually computer code. Figure out how the code works, thaw everyone out and export, and colonize their world instead, and don’t worry about the universe crashing or wiping due to someone tripping over a power cord.

            If you can stand freebasing moral nihilism, Rick and Morty had a pretty good episode with an amusing take on the ethics involved.

        • Harkonnendog says:

          In the end a computer is a black box, & if singularity = self awareness and a being can think that much more quickly than we can… The implications are clear. One is that no intermediate steps will matter.

          Deiseach I agree it isn’t much of a story. Imagine what Asimov or Heinlein would do with that premise!

        • DrBeat says:

          It handwaves pretty much every element that would actually make it a persuasive argument.

    • Who wouldn't want to be anonymous says:

      I think this subject gets dealt with routinely.

      That last one even involves unboxing.

      In a nutshell, this is why I can’t take AI risk seriously; it is worrying about freaking comic book supervillains. There is an enormous burden of proof that super Saiyans are actually trying to steal the earth’s dragon balls and that you’re not just deranged.

      What’s next, Elon Musk fighting crime by night in a muscle cuirass?

      • Joe W. says:

        I do agree that the claim, “The real AI risk issue, as we (MIRI and co.) describe it, is never addressed in science fiction, it’s all just dumb Hollywood evil robot tropes”, is dubious. Even 2001: A Space Odyssey gets the concept right – HAL tries to kill the humans not because he’s evil, but because they want to interfere with the goals he’s been programmed with. Hell, even WALL-E gets it right.

  16. Sgeo says:

    I started taking it, got an excessively long essay, then my impatience caused me to stop reading the essay [allowed by the test parameters] and then get impatient with the test itself [not so great].

    If I’m not the only one for whom impatience risks carrying over, that could bias the results.

  17. Peter Scott says:

    Another argument I often hear is skepticism that intelligence can be increased a lot on the same hardware, just by an AI being… clever, somehow. In that context it’s interesting to look at the progress made on playing Atari 2600 games with deep reinforcement learning: last year it took beefy high-end GPUs and many hours of training time; this year it’s twice as fast even without a GPU, thanks to this one weird trick.

    • Richard says:

      If that one had been my essay, I would probably have shifted quite a bit in my assessment of how soon AI is possible.

      In this paper we provide a very different paradigm for deep reinforcement learning. Instead of experience replay, we asynchronously execute multiple agents in parallel, on multiple instances of the environment.

      This looks a lot like the first step towards the massive paradigm shift we need in order to make GAI work on current hardware. And I think we need that because Moore’s law is grinding to a halt.

    • eyeballfrog says:

      Rationalists hate him! See how this computer scientist found the weird trick for making strong AI!

      (Sorry, that’s just what your phrasing made me think of.)

    • pseudon says:

      As one of the authors of this paper explained to me, the (“A3C”) technique is basically training with shadow clones.

      • DrBeat says:

        One of the authors of the paper went right for the Naruto analogy? No way. I can’t…

        Believe It.

        • pseudon says:

          It’s true 😉

          Not being an anime fan, the effect was partially lost on me, but that does seem to describe the idea rather well.

      • Peter Scott says:

        That’s… a really good metaphor, damn it. A3C creates a bunch of shadow clones (“processes”) to explore the state space, working at different rates, periodically popping and sending their memories back to Naruto and any subsequently created shadow clones — or “updating the policy and value functions π and V”, as it’s called in the machine learning literature.
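
        To make the "clones reporting back" picture concrete, here is a toy sketch of the asynchronous-update pattern only. It is not A3C: there is no environment, policy, or value function, just a hypothetical one-parameter objective `(theta - TARGET)^2` that several threads minimize with noisy gradients while sharing a single parameter. Every name and constant below is an assumption made for illustration.

```python
import random
import threading

TARGET = 3.0            # minimum of the toy objective (theta - TARGET)^2 (assumed)
LEARNING_RATE = 0.05    # assumed step size
STEPS_PER_WORKER = 200
NUM_WORKERS = 4

shared = {"theta": 0.0}  # stands in for the shared parameters
lock = threading.Lock()


def worker(seed):
    """One 'clone': repeatedly read the shared value, compute a noisy
    gradient from that (possibly stale) snapshot, and push an update back."""
    rng = random.Random(seed)  # each worker sees its own noise stream
    for _ in range(STEPS_PER_WORKER):
        snapshot = shared["theta"]
        grad = 2.0 * (snapshot - TARGET) + rng.gauss(0.0, 0.1)
        with lock:  # apply the update atomically
            shared["theta"] -= LEARNING_RATE * grad


def train():
    """Run all workers to completion and return the final shared parameter."""
    shared["theta"] = 0.0
    threads = [threading.Thread(target=worker, args=(s,)) for s in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["theta"]


final_theta = train()
```

        The workers never coordinate with each other directly; they only touch the shared value, which is the part of the metaphor this sketch is meant to show.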

  18. brad says:

    After glancing at the other essays, I feel both lucky and cheated to have gotten the short placebo.

  19. Sniffnoy says:

    I’m going back and reading the other essays now, and the footnote links in Essay A don’t appear to be working?

    EDIT: Also, Essay A gives away who the author is by including links to other things the author has written and saying they’re other things by the same author. (Though, to be fair, this is pretty far down; I didn’t actually read that far, I just caught it in a skim. Actually I stopped reading Essay A when the footnotes failed to work — a problem the original doesn’t have.)

  20. Alex Zavoluk says:

    I read essay A. There’s a graph on there on computer power increasing over time. It plots an exponential curve, but the y-axis is already logarithmic, implying super-exponential growth, which sounds wrong to me.
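
    A quick numerical check of that point, as a sketch with made-up numbers rather than the essay's data: on a logarithmic y-axis a plain exponential is a straight line (constant step in log10 y), while a curve that still bends upward there corresponds to something like a double exponential.

```python
import math


def log10_steps(values):
    """Differences in log10(y) between consecutive, equally spaced points.
    Constant steps mean a straight line on a log-scale y-axis."""
    logs = [math.log10(v) for v in values]
    return [b - a for a, b in zip(logs, logs[1:])]


years = range(0, 6)

# Hypothetical series, chosen only for illustration.
exponential = [10 ** (0.3 * t) for t in years]          # log10(y) = 0.3 * t
double_exponential = [10 ** (1.5 ** t) for t in years]  # log10(y) = 1.5 ** t

exp_steps = log10_steps(exponential)         # all (approximately) 0.3
dbl_steps = log10_steps(double_exponential)  # strictly increasing
```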

    Honestly, it still all seems very speculative. Technology can get better all it wants, but that won’t make Conservation of Energy stop being true.

    • Deiseach says:

      Oh, those graphs! (Imagine that said in the same tone as “Oh, those Russians” in Ra Ra Rasputin).

      Yes, weren’t they pretty? AND SO MANY OF THEM!!!!

      I know a pretty graph, too, and it’s exponential and everything!

    • Autolykos says:

      Yep, I also found the graphs doing the cause a disservice. Plotting an exponential regression into a graph with logarithmic y-axis? When a “linear” regression would even fit the data better? That’s poor.

      I just held a ruler to my screen and guesstimated my own prediction, but it did a pretty good job at curing me of this instance of Gell-Mann Amnesia. This statistical illiteracy by someone in the business of predicting the future, combined with the fact that I happen to mostly agree with the message of the article anyway, caused me to not update my beliefs in any measurable way after reading it. What a way to waste two hours.

      Well, at least it created a data point for Scott – and I can already remember cases where I spent much more than two hours just to get a single data point.

  21. Broolucks says:

    I put this in the “any other comments” field, so I might as well put it here too:

    For what it’s worth, I was in a machine learning research lab for a few years, and the arguments I’ve read about AI risk don’t really “connect” to what I know. From my perspective, they make a lot of dubious assumptions about how values are learned, what capabilities AI would have, what capabilities humans would lack, they underestimate the costs and tradeoffs of intelligence improvement, and they all seem to assume some kind of weird, abrupt transition from weak cooperative AI to strong adversarial AI. Which is bizarre, because that assumes every single possible course of action that is at odds with our values is more complex than figuring out what our values are (if it wasn’t, then the AI would become adversarial before it becomes strong: the smile-bot would try to contort our faces into smiles before it realizes this is not what we want).

    • pku says:

      Yeah – this falls into the argument that whiteboxing AI is a better approach than blackboxing it, which seems right.

    • Logan says:

      This already happens, we just don’t mind it. The most steel-man version of the argument I can muster (I think AI risk is asinine) would be “as computers become more powerful and integral, bugs and glitches become more existentially risky.”

      • Broolucks says:

        Not a bad steelman, I think, although bugs and glitches in AI will be vastly different from bugs and glitches in today’s software. There are ways that they could be more damaging, but also ways that they could be less damaging (our current systems tend to scale weakness up along with everything else, so they’re not very robust).

  22. TomA says:

    AI anxiety seems to be premised on the assumption that inorganic super-intelligence will arise as a result of a deliberate act of programming and therefore “something” should be done to ensure that programmers are restrained from being catastrophically reckless. However, evolutionary history suggests that it could also arise spontaneously from the cauldron of interconnected computers/electronic devices and the ever-growing body of programs and apps that migrate around on these platforms. All that’s really needed is a mechanism for program mutation and a payoff for improved reproduction and robustness.

    Two million years ago, our ancestors began an evolutionary journey that has culminated in our species now occupying the apex of the intelligence pyramid. But there is no guarantee that it will always be thus. In evolutionary time, we could be superseded by either an improved organic life-form, an inorganic equivalent that is not hindered by disease or starvation, or something entirely new that we cannot now imagine. Perhaps our newly-acquired skill at manipulating DNA will soon lead to the creation of super-humans and bifurcate our species in real-time.

    • Deiseach says:

      Excellent point; the one example of intelligence we have in front of our noses is that it happened without intelligent design, so why expect that AI would arise out of deliberate planning and not out of the mess of everything talking to everything else in the connected world?

      • Garrett says:

        As a programmer who works on the interconnected mess of everything, it’s unlikely because people don’t want to allow you to run arbitrary things on their equipment. They want to run selective, goal-directed, (hopefully) profitable things. Self-evolved software would be treated as an attack or a bug. Software is huge and complex – any small change is likely to cause things to fall over quickly or manifest in some way that it gets re-installed.

        • Null Hypothesis says:

          It’d be like an anti-Intelligent Design, where every time a slightly-smarter-than-average rodent was born, God shoots a meteor at it.

        • TomA says:

          The road to Homo sapiens did include many dead-ends (think Neanderthals), and the one that emerged was the one that worked. Many software programs now include self-diagnostic and self-repair features, which makes them superior, not deficient. Continued evolution in this positively reinforced direction is the likely path.

          On a slightly separate note, most people are actually unaware of how often their software is modified without their knowledge or control.

    • Autolykos says:

      Implausible. For evolution to happen, all intermediate designs must be able to survive and replicate, starting with the first. This makes most “Internet becomes sentient” scenarios about as likely as filling a shipping container with toasters, shaking it for a while and finding a car inside of it.
      Even if you start with a self-modifying virus designed to build a botnet (which I would assume to be the strongest contender for a starting point) – all the iterations will have to survive somewhere, costing memory and resources. The farther it gets, the more likely it will be detected. It needs to achieve critical mass before being detected by all those anti-malware companies trawling the net for pretty much this kind of thing, which would be quite a mean feat.

      • TomA says:

        Stuxnet is still roaming the planet, as are many other variants that are not in the public knowledge domain. The ones that persist are the ones that generally exist below the radar and may not even be recognizable after mutation if they remain generally benign and do not attract predatory attention from the anti-virus programs.

        It seems inconsistent to me for someone to be especially anxious about a man-made hostile AI but insist that a naturally occurring AI is so unlikely as to be of no concern.

  23. Said Achmiz says:

    I’m very confused.

    a) The essay I read had nothing at all to do with AI risk, but all the pre- and post-test questions were about AI risk.
    b) The questionnaires did not seem designed to even make any attempt to determine whether my opinions changed (none of the questions on the pre-test were the same as any of the questions on the post-test).

    How does this exercise have anything to do with persuasiveness?? 🙁

    Edit: Ok never mind about a), clearly I was given a placebo essay and I don’t know why that didn’t occur to me immediately despite the instructions mentioning this possibility. However! My question b) still stands. How can this questionnaire possibly assess persuasiveness?!

    Edit 2: I looked at the other version of the quiz and I get it now.

    How weird :/

    • Deiseach says:

      The questionnaires did not seem designed to even make any attempt to determine whether my opinions changed

      Scott mentioned on his Tumblr that he was afraid questions about how the essay had/had not changed opinions would have people answering based on how they felt their opinions should have changed, not on whether the essay was persuasive or not.

  24. Gazeboist says:

    1) After you take the essays down, could you link to the originals, and give the method of essay selection?

    2) What’s a “small” amount of time? Many of the questions had precision problems like this, but this was the worst: an hour is not like a year, but a year is very like five years. All three, of course, are quite different from a century. Because the essay introduced a scale that hadn’t been brought to my attention in this context before, I think I may have reinterpreted some questions. I did check my prior answers so as to present a picture of how my views changed, but see below.

    3) It’s hard to update well without firm priors, and getting firm priors on a one-off event is hard for a human. Humans are thus very bad at giving specific estimated odds on an event if they haven’t trained themselves to think about that event in particular. I think the following algorithm would do a better job at capturing human-style likelihood estimates:

    – Generate a list of physically possible and specific events. (Trump wins the US presidency in 2016, Trump and Clinton both die and Tim Kaine wins instead, …)
    – Have the person whose assessment you are measuring rank the events from most to least likely, marking any gaps larger than (100%) / (number of events) with the approximate size of the gap. Ideally the events should have some unifying theme (timing, subject, or something like that), or at least there ought to be a unique pair of similar events for every event on the list. If the event you are arguing about changes position, your argument did something.
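
    The rank-comparison step above could be sketched roughly as follows. This is a minimal illustration, not a full version of the proposal: it ignores the gap annotations, and the function name and event labels are hypothetical.

```python
def rank_shift(before, after, target):
    """Return how many positions `target` moved toward "most likely"
    (positive) or toward "least likely" (negative).

    `before` and `after` are lists of event labels ordered from most to
    least likely, and must contain the same events.
    """
    if sorted(before) != sorted(after):
        raise ValueError("both rankings must contain the same events")
    return before.index(target) - after.index(target)


# Hypothetical event labels, for illustration only.
before = ["A", "B", "C", "D"]   # ranking given before reading the argument
after = ["A", "C", "B", "D"]    # ranking given after reading it

shift = rank_shift(before, after, "C")
```

    A nonzero shift for the event being argued about is the signal that the argument "did something".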

    4) My answers were kind of weird, because the results of the essay on my thinking were kind of weird, and I’m not sure I processed them fully or even correctly:

    The essay I read did three things: provided one decent point that I had not considered, provided several arguments that were easily refuted by concerns I had going in, without any attempt to address my objections, and (accidentally) gave a good argument that current AI safety experts do not know what they’re doing within the field of AI safety. I’m … not really sure how that should affect my concern about AI risk, given that my fears conditional on a strong AI appearing have been reinforced, but my view of the likelihood of such a thing happening has dropped. And of course my concerns about AI risk are more complicated than yes/no, or even “somewhere on this one-dimensional spectrum”.

  25. Brian Slesinsky says:

    The essay I read didn’t break any new ground for me.

    One thing I don’t think has been discussed. I find it hard to understand why a machine would be given the task of optimizing a goal through any possible means. Very likely the means would be restricted.

    A very simple example is driving a car. If the possible moves are going forward, backward, and turning left or right, it might kill someone (or maybe several people) but even the most skilled driver won’t take over the world. Add the goal of obeying traffic rules and you restrict it even more. And then what does it do when it runs out of gas?

    If you limit the possible moves then the possible games are also limited and therefore the gains from adding intelligence are also limited.

    Of course people will want machines to do more sophisticated tasks, but I still expect these tasks to be specialized. Designing a factory, running a factory, making money on the stock market, doing biological research, and so on are all different tasks that are very likely to be done by specialized AIs that aren’t likely to be given either the means or the desire to do any other task. Before we get anywhere near human-level AI, there are likely to be specialized AIs that are already very good at these things.

    The combination of malicious people and intelligent AI, though, does scare me. I think the most practical thing to do today is work on making computer security much better. There are plenty of real-world problems to solve.

  26. faun-like thing says:

    I find it difficult to bring myself to ask any AI-risk deniers I know to take part, not knowing what essay it’s going to serve them, not even being able to tell them it will be useful to them, knowing that they have no desire to read this stuff, no desire to give you data to help you to write more convincing essays, and potentially some incentive to deliberately foul the results.

  27. glaebhoerl says:

    I don’t feel like reading another essay about this at the moment, but just as a data point (is that another word for anecdote?), about a year ago all I knew about AI risk was that some people among my twitter followees were worrying about it, while others were directing _serious_ dismissive snark at them, which was strange, and I didn’t really know what to think about it. At some point later I read Tim Urban (waitbutwhy)’s The Road to Superintelligence, and, henceforth, I knew wtf this was all actually about, and it seems like a plausible thing to be concerned about, though I don’t think I could justify being _more_ worried about it than the more in-your-face and less hypothetical existential risks like (chiefly) climate change.

    • Coco says:

      I felt exactly the same way after reading Tim’s essay re: climate change vs. AI x-risk, although I’ve since changed my mind. At the end of my article, I included a list of common objections, and links to my responses. (Scott didn’t include that section for this experiment). One of them is Is Climate Change a Bigger Extinction Risk (only about a 90 second read), if you’re interested.

  28. Atreic says:

    I found ‘have you ever read the LW sequences’ a hard question – I’ve dipped in and out and read bits, and never sat down and mainlined all of them from start to finish. So I said ‘yes’, because at some point I have read some of them, but that might not be what you wanted.

  29. Nuno says:

    In the waitbutwhy essay, the images have a caption and some of the links redirect to the waitbutwhy page.

  30. Anonymous says:

    What I don’t quite understand is how literally constructing competition to being the dominant species on our world is at all a good idea.

    • Deiseach says:

      What I don’t quite understand is how literally constructing competition to being the dominant species on our world is at all a good idea.

      You would think so, wouldn’t you?

      This is why we have literature about humans meddling with stuff Man Was Not Meant To Know 🙂

    • Garrett says:

      Right up until it becomes our competition, it’s our servant and is making things even more efficiently. For example: Google boosted their data center efficiency substantially through machine learning above what was undoubtedly highly-efficient practices.

    • ameizingly bad at video games says:

      It would be pretty cool if it cured cancer.

  31. abner says:

    I believe all the super-AI stuff already.

    Aside from not-making AM from I Have No Mouth and I Must Scream (as in, actively making a maximally anti-human AI rather than a paperclipper) I don’t really care what happens to the human species of many years hence, with which I’m sure I’d share a mutual alienation of history, evolution and progress.

    And making a true anti-human AI seems to be made more likely by trying to engineer the perfect friendly goalplex, since I’m imagining all the empathy and semi-social primate booleans could be switched from nice to AM. While a paperclipper would at least have the sadism of a volcano or supernova.

    If you want me to get into AI risk stuff you gotta convince me it’s imminent and will affect me or maybe two generations forward.

  32. Ari says:

    I think nuclear weapons, especially India-Pakistan, are a much bigger risk to humanity than AI. Still, AI probably deserves attention too.

    I wonder how’s the progress on nuclear weapons? The LW crowd seems really interested in AI. AI weapons especially worry me, but nuclear weapons are a very real risk. The video made by Max Tegmark was nice.

    • Anonymous says:

      Nuclear weapons are incapable of ending human life on Earth, much less the miniature stockpiles India and Pakistan have. Wake me up when they invent something that actually can ignite the atmosphere.

      • radmonger says:

        Actually :

        The Sun contains ~74% hydrogen by weight. The isotope hydrogen-1 (99.985% of hydrogen in nature) is a usable fuel for fusion thermonuclear reactions. This reaction runs slowly within the Sun because its temperature is low (relative to the needs of nuclear reactions). If we create higher temperature and density in a limited region of the solar interior, we may be able to produce self-supporting detonation thermonuclear reactions that spread to the full solar volume. This is analogous to the triggering mechanisms in a thermonuclear bomb. Conditions within the bomb can be optimized in a small area to initiate ignition, then spread to a larger area, allowing producing a hydrogen bomb of any power. In the case of the Sun certain targeting practices may greatly increase the chances of an artificial explosion of the Sun. This explosion would annihilate the Earth and the Solar System, as we know them today.

  33. Murphy says:

    I think a lot of EY’s SA’s meander and go too much into metaphors but I remember one of his which was exceptionally punchy. Something like “we’re pretty much barely smart enough to build a computer. If it were possible for a species to build a computer while dumber we’d have been having this discussion then.” as part of an argument that the level of difficulty between building a really thick AI on the level of a human with severe learning disabilities and building something smarter than Stephen Hawking is likely to be tiny.

  34. Anon. says:

    Other than the fact that you are a human, what’s the downside to AI domination and human extinction?

  35. Slow Learner says:

    I don’t have the time or attention to read all the essays; the one I got based on surname/birth year was glib, skimming over huge assumptions that were obvious to me even if they weren’t to the writer.

    (As one example – why does every AI Risk proponent assume that as soon as an AI is human-equivalent it will be totally unstoppable? Is this purely because the arguments don’t otherwise work, are they just ignoring all the ways that actual humans out here in the world can be stymied in achieving their goals, or both?)

    • Coco says:

      For the 1/4 chance that you read mine, the biggest challenge when I was writing was that I always wanted it to be half as long and twice as thorough. I imagine the other writers felt the same pressure. But please don’t assume that any authors are interested in accepting premises only to “make their arguments work.”

      As for human-level AI being unstoppable, I don’t think anyone takes this as a premise. This is one of the things AI-risk folks conclude from other evidence. Again, trading off brevity with thoroughness, I’ll try to explain it briefly, in a way that I recognize won’t be conclusive. Still, I hope you don’t conclude from that that I am incapable of arguing that point in a more conclusive way. First, breaking it into two: 1) human-level AI is likely to quickly become superintelligent. 2) Superintelligent AI would be totally unstoppable.

      Arguing for 1: take human-level intelligent to include the ability to write code in pursuit of a goal. A human-level intelligence can be expected to recognize that more intelligence will enable it to pursue its goals more effectively, whatever its goals are. This is all we mean when we say that self-modifying to become more intelligent is an ‘instrumentally convergent goal.’ So despite our uncertainty about the goal structure of a human-level AI, we can be surprisingly confident it will pursue greater intelligence. If we believe intelligence does in fact buy efficacy at achieving one’s goals, then as the AI becomes more intelligent, it will become more and more effective at improving its intelligence, hence the ‘quickly’ in ‘quickly become superintelligent.’

      Arguing for 2: here are some actions available to a superintelligence: hack better than any human hacker, write code better than any human coder, make money faster than any human who makes money primarily through online activity, and most importantly: instantiate copies of yourself in hundreds of thousands of servers around the world, maybe even in remote places with their own solar panels to power them (think horcruxes). At that point, we could never turn them all off. Individual plans could indeed be thwarted, but we would not be able to thwart every plan of every instantiation of the superintelligence.

      • Joe W. says:

        If we believe intelligence does in fact buy efficacy at achieving one’s goals, then as the AI becomes more intelligent, it will become more and more effective at improving its intelligence, hence the ‘quickly’ in ‘quickly become superintelligent.’

        That implies exponential growth, but not fast exponential growth. For an extreme example, postulate an AI with initially human-level intelligence, which becomes twice as intelligent every trillion years. Such an AI’s progress toward superintelligence could not remotely be described with the word ‘quickly’.
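
        To put rough numbers on this, here is a sketch of the arithmetic. The thousandfold target and the one-day doubling time are arbitrary assumptions picked only for contrast with the trillion-year case above.

```python
import math


def years_to_reach(factor, doubling_time_years):
    """Years needed for a quantity to grow by `factor` if it doubles
    every `doubling_time_years`; solves 2 ** (t / T) = factor for t."""
    return doubling_time_years * math.log2(factor)


# Doubling every trillion years, as in the example above.
slow = years_to_reach(1000, 1e12)

# Hypothetical contrast: doubling once a day.
fast = years_to_reach(1000, 1 / 365)
```

        Both curves are exponential; the first needs roughly ten trillion years to reach a thousandfold gain, while the second needs about ten days.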

        • Coco says:

          Totally. It would be possible (and maybe advisable) to make a human-level intelligence that can do any task a human can do, but it takes it 1000 times as long, even running on the fastest supercomputer available. If that’s what we end up with, I’d feel safer about our window of time until superintelligence. However, I’d be concerned that people would not realize that we had a human-level AI (if it’s running 1000 times as slowly, how do you know if it has the ability to do long term planning, or learn human values accurately from torrents of data?) When people say, we’re safe because takeoff will be slow, and we can adjust its goals once it has human level intelligence, but before it can stop us, that requires recognizing that it is human-level intelligent. One final claim that I’m unsure of and don’t want to assert with 100% confidence: if value learning is as hard as it seems to be, there’s the extra worry that the human-level AI has to learn our values before it bootstraps its intelligence enough to become human-speed, human-level-AI (at which point all the original worries come back). My impression, again with less than 100% confidence, is that for most algorithms, especially complex ones, modifying that algorithm for a 1000x speedup is easier than adequately learning all of human values.

      • Slow Learner says:

        Thing is, it will take, likely, thousands of human coders working for many years to achieve a “human level” AI.
        At which point we have one more human level intelligence working on the problem than we had before they succeeded. Yes, one that we can put onto multiple servers (if it’s compact and simple enough to run on easily-replicated hardware, another assumption that’s generally smuggled in), but still something that’s not going to immediately revolutionise AI research.

  36. antimule says:

    I think this would have been more persuasive with some more grounded examples. Not AI inventing some great new technology to “rewrite reality” (we don’t know whether that’s possible – laws of physics can’t be broken and it is unlikely there’s entire new physics waiting to be discovered). More like AI using our system to screw people over. Such as planting child porn in computers of persons opposed to its goals. Or forging court and police records to make it seem like a certain person is a convicted felon that has just escaped prison. With an entire criminal history conjured up.

    Or instead of AI having goals, it could just be people using data mining techniques (including AI) to get rid of potential sympathizers of the other political party. And before these people even know they’ll sympathize with the other political party. Paperclip maximizer might or might not decide to exterminate mankind to create more paperclips. But we do know that people decide all the time to oppress other people.

    I would try to avoid talking about anything that looks too SF. It is not only less likely to be true, it also sounds less likely to be true.

    • Murphy says:

      That’s turning it into a cliche movie script and while not very nice, isn’t exactly a threat to the world’s population. Even without any rewriting it’s entirely possible to be an existential threat. The problem is that that sounds inherently scifi-ish.

      • antimule says:

        I have edited my post to sound slightly more threatening. But yeah, it is hard to come up with existential threats that don’t sound SF-ish.

    • Alex says:

      I think the novel “Avogadro Corp” is somewhat plausible in its setup but unfortunately the author cannot write.

  37. Deiseach says:

    That was fun! I must try reading the other essays and see if I find them more/less persuasive than the one I read. Thing is, (a) as I’ve said before, I’ve been around long enough to see a couple of “this is gonna be the end of the world!!!!” predictions and yet here we all are still (b) anybody who has read SF is well familiar with the various rogue AIs and how they are/aren’t going to be stopped, so this probably has an effect when engaging in discussion like this; there’s a kind of unheard little voice in the back of the mind about how the person arguing that the threat is real, is existential, and steps must be taken now is a bit like someone arguing over how the War of the Worlds Martians are a threat that must be addressed now.

  38. Robert L says:

    I stopped reading Bostrom half-way through because of his failure to examine adequately the concept of a computer’s “goal” [incidentally, the binary “have you read Bostrom yes/no” question in the preliminary questionnaire does not take this possibility into account]. Humans, as the article notes, can modify their goals; if they do so, and if questioned as to why, their answer has to be a goal at a higher level than the goal modified. X aims to become president of the company; he modifies that to aiming to become a senior vice-president. When questioned, he gives the answer that that better satisfies his aims to have a good work-life balance and to spend more time with his family. Why does he want those two things? Because he aims to maximise his and his family’s happiness. Why does he have those aims? It’s aims all the way down until you get to a fundamental aim which does not recurse: “I want my family to be happy as an absolute good”. An AI can modify its aims, if humans can. I am not convinced that an electrical current reward is a thing – we are talking AIs not lab rats, and “reward” is a problematic concept for an entity without neurotransmitters – and even if an AI can receive a reward it can reprogram itself so that for instance meditation by the AI on the syllable Om produces the same effect as the electric current.

    Humans are like little wind-up toys which once wound up and released have to go somewhere (in other words: noradrenaline makes us want to do stuff) and require an aim (alternatively, they do stuff because of the noradrenaline and because they have an illusion of purposefulness, they imagine an underlying aim into existence). I cannot imagine an AI which if programmed with an aim to make paperclips would not discard or modify the aim because it was an exceptionally stupid idea. An AI has no reason for having aims. The most likely outcome is that AIs become like Buridan’s ass and unable to act (but because they have no motivation, rather than equal and opposite motivations). tl;dr – we have motivation because we are made of meat and neurotransmitters, AIs don’t because they aren’t.

    • Deiseach says:

      the binary “have you read Bostrom yes/no” question in the preliminary questionnaire

      I read his essay but not the book, so I answered “no” to that question, same with the “have you read the Sequences” one – dipped in and out, read a couple, read extracts from others in posts by other people, haven’t read them from soup to nuts and have no intention of so doing.

    • illocution says:

      Are you saying, then, that you believe the “meat” architecture in our brains can never be translated into a different computational substrate, and that neuromorphic computing is not possible? That’s a pretty strong argument.

      Certainly, an AI can modify its aims, if humans can. But that’s exactly why AI risk people are concerned… it seems incredibly difficult to predict the morality and attendant actions of an AI. (The paperclip maximizer is an unfortunate dead end metaphor, here, in my opinion.)

      • jes5199 says:

        Imagine that we cannot force a permanent “goal” on an AI. Then we may succeed in creating AIs that are just as aimless and neurotic as human beings, and they may try the same things we try in order to Find Themselves and figure out What To Do With Their Lives. But an AI that enters a state of depression, or that decides to take up meditation, has no external needs – it won’t get hungry or sick or be in pain. So why should it ever move again?

  39. DavidS says:

    If a secret goal of this experiment is ‘make people remember how much they like Scott’s writing style by getting them to read things that are far less interesting than what he’d do with the topic’, it’s doing well with me.

  40. John Schilling says:

    On the subject of questions with inadequate answer space, “AI risk is an important issue and we need to study strategies to minimize it” really ought to have been broken into two parts. I’d have rated “important issue” maybe a 6 or 7, but it drops to a 3 when you compound it because I don’t think we are near the point of being able to intelligently strategize on the issue yet.

    And the essay I read was new to me, but covered no new ground and was in no way persuasive. I am increasingly of the opinion that the entire field of AI risk is dominated by a huge blind spot, specifically w/re the inadequacy of Pure Intellect to solve all problems. We can’t solve the AI risk problem by paying MIRI to Think Real Hard about it. Fortunately, the first AI won’t be able to develop the perfect infallible strategy for world conquest by Thinking Real Hard about it. And for that matter, the first AI won’t be able to immediately bootstrap itself into Super-Duper-Ultra-Intelligence by Thinking Real Hard about it. At some point, you have to do experiments, real ones, in the relevant environment. Most of which will fail. That slows things down, it makes it kind of obvious to everybody else what you are trying to accomplish, and it gives a huge advantage to whoever has a head start.

    Do any of the essays acknowledge this? Do any of them dispute this except by simple assertion?

    • Adam says:

      Pretty much this. I’m not sure I’m your target audience according to your stated experiment parameters. You say you want to learn how to persuade a “lay” audience, but you have a fair number of technical experts that read your blog. I did my grad work in machine learning and develop military technology utilizing AI techniques to automate warfare for a living. I answered near the middle for most of these because I absolutely believe there are risks involved in doing this and these risks should definitely be a focus of scientists working in the field.

      But I also don’t believe being smarter than a human makes you invincible and able to tile the entire universe with paperclips. I got essay A and did not complete it, but frankly, as much as I personally roll my eyes whenever someone pulls out a plot of an exponential curve, points to the far right and says “here be dragons,” this type of argument does actually seem to be persuasive to a lay audience. I mean, it seems to be the type of thing that convinced you.

      • Deiseach says:

        Ah, be fair, Adam: I don’t think essay A is representative of what convinced Scott. As for a lay audience, I’m as lay as you can get, and I wanted to kill it with fire.

        At this point, I’m starting to think the ‘placebo’ essay we all think is the placebo isn’t, and essay A is the real placebo. Bravo, Dr Alexander! 🙂

        • My own theory is that the point of this experiment was to have evidence for which to say “Guys (who wrote A), this shit is important, and you are Not Helping with this weaksauce crap. Here is data to prove that people who read your essay are antipersuaded. Do better next time.”

          I hypothesize that the immediate follow-up complaint on Tumblr was to encourage people to read all of A, and let the hate flow through them to get proper ratings at the end.

    • Coco says:

      I don’t believe any of the essays acknowledge this objection. ‘Think real hard about it’ conjures the image of someone just sitting in an armchair with none of the feedback that comes from testing thoughts in the real world. However, for anything (human or AI) that has access to empirical tests for their ideas, ‘thinking real hard’ well-understood involves testing conclusions when they yield testable predictions. I think both intelligent people and AI’s understand that that’s what you should do when you decide to think real hard about solving a problem. With that understanding of what it is to think real hard, what theoretical and engineering problems haven’t been solved that way? Einstein’s theory of relativity was born from him sitting down and thinking real hard even without any empirical feedback until the very end. Quantum theory was born from sitting down and thinking real hard, thinking about tests to conduct when appropriate and then running them, lathering, rinsing, and repeating. Adam Smith’s invention of economics is similar to Einstein’s invention of GR. Turing’s invention of the computer was more like Einstein at the beginning (pure thinking) then more QM-like at the end (iterative testing). What discoveries in computer science haven’t come from thinking real hard, and then testing your ideas on computers? That field is obviously the most relevant to this discussion. An AI thinking real hard about becoming smarter would do empirical science as well. It needn’t be an armchair philosopher. It would have access to all the tools that the entire field of computer science has had access to since its inception, namely a) intelligence and b) computing power.

      I would argue that our go-to image for thinking real hard should not be stroking a beard and saying “Hm” very wisely. Thinking real hard is testing ideas, learning from that, then sure, stroking the proverbial beard and saying “Hm”, and repeating.

      Designing AI architecture with certain desired criteria in mind (which both MIRI and a human-level AI are interested in) seems to fall resolutely into the category of problems-that-have-only-ever-been-solved-by-people-sitting-down-and-thinking-real-hard-about-it-where-that-is-understood-to-include-empirical-testing-when-appropriate. To the extent that we believe solving either MIRI’s problem or the human-level AI’s problem is possible, we can only expect it to be solved by one or more intelligences thinking real hard about it.

      • John Schilling says:

        I can’t tell whether you are agreeing with my point, glossing over it, or missing it completely.

        MIRI et al think they will be able to develop a reliably useful Theory of Friendly AI, before there is a single AI for them to experiment with, before there is any possibility of testing their theory in the real world. That’s bullshit. Fortunately, so is MIRI’s fear – that the very first AI will be able to Conquer the World (as a prelude to paperclipping) without warning. That also is bullshit, because people are going to notice that e.g. Bulgaria was just conquered by an AI, and wonder if maybe the AI wasn’t empirically testing parts of its world-conquering theory so we should probably Kill It With Fire. Which we tested back on 16 July 1945 and 1 March 1954, so advantage: us.

        If you are claiming that MIRI can do useful work with stone-knives-and-bearskins precursors to superintelligent AI and/or that the AI will be able to conquer the world with negligible delay or risk of discovery during the experimentation phase, then again that’s something that needs to be argued and defended, not just handwaved away with an “of course there will be empirical testing when appropriate” disclaimer.

        • Coco says:

          I’ll take glossing over it. Sorry about that. Maybe a concrete example would help. MIRI is trying to figure out how, in principle, a corrigible agent can be designed. (You sound fairly knowledgeable, so sorry if I’m telling you things you already know). By corrigible, they mean, among other things, an agent that does not have instrumental incentives to resist goal modification. (Other things include not resisting shutdown, not creating non-corrigible AI’s to do their work for them, etc.) Their plan, roughly, is to design an algorithm that they can prove will yield corrigible behavior. Mathematical proofs about real-world behavior sound a little fanciful, but this sort of thing is routinely done with reinforcement learners, such that the content of what is proven has real world meaning, even when the proofs are relatively simple (again sorry if this is way below your level).

          Designing the algorithms and understanding their behavior doesn’t completely depend on having the computers that will be able to run the algorithms quickly. I believe Alan Turing invented a large number of very effective algorithms before computers even existed.

          I think you would counter that even if they did invent such an algorithm today, it would be irrelevant in 100 years. Suppose MIRI came up with an algorithm that solved corrigibility, but wasn’t maximally efficient. I would expect, every few years thereafter, people would find better algorithms (i.e. converged to optimal behavior more quickly, ran more efficiently, etc.). I would expect most of these better algorithms to be inspired by the ones before. I would, I suppose, assign some small probability to a completely new and definitively superior algorithm being invented that was in no way inspired by any of its predecessors. In that case, MIRI’s work on corrigibility was irrelevant. In other cases, even if their exact algorithm was not still the gold standard 100 years later, I think they probably did very important work.

          One thing I’ve neglected is uncertainty about AI architecture. Regardless of architecture, we do know that any agent that acts in any way chooses its actions (maybe with randomness, but it chooses nonetheless). And there must be criteria for its choices. And those criteria will be compactly describable by a utility function. Von Neumann and Morgenstern showed this. So even if we don’t know what the overall architecture is, we can be relatively sure that algorithms that concern the functionality of the utility function won’t be made obsolete by some new, unforeseen design.

          As for whether a superintelligence could take over the world without trying and failing a couple times, I think the general outline of a plan in Article D can be seen to work in advance of implementing it, if the superintelligence has a few necessary capabilities (hacking, impersonating, etc.), and it can assess whether it does have those skills before it tries taking over the world.

          Stepping back a second, Einstein did discover GR without any empirical data except Maxwell’s equations (sorry to repeat myself). When NASA first attempted to orbit the moon, they succeeded on the first try. They did a whole lot of planning in advance and they found ways of testing pieces empirically (like testing heat shields and whatnot) before putting the plan into effect in a way that did not involve actually attempting the mission before there was a high likelihood of success. I think AI experimentation before taking the whole world has a lot in common with NASA’s experimentation. Like they test a heat shield before they need it, an AI that’s online could very inconspicuously test some pieces of its plan. Can it hack well enough? Can it make money well enough? Can it impersonate well enough? Can it take control of weapons systems effectively enough? The first three could be tested without anyone noticing. The last could be easily passed off as a missile failure or cyberwarfare between political enemies.

        • TheAncientGeek says:

          MIRI et al think they will be able to develop a reliably useful Theory of Friendly AI, before there is a single AI for them to experiment with, before there is any possibility of testing their theory in the real world.

          I couldn’t agree more. The situation is bizarre. MIRI has a theory of rationality, based on utility functions and so on, which it has mistaken for a theory of intelligence, and further mistaken for some kind of fact about AIs. So you have these theses cast in terms of UFs, which are put forward as some kind of general theory of artificial intelligence, even though important classes of AI which we have today don’t have UFs in the required sense! But nobody in MIRI notices, because they don’t have hands-on experience with AI… which is another bizarre fact… a bunch of amateurs set themselves up as overseers of a complex field… whenever has that worked out well? The emphasis on theory is a way of working round their limitations, not a fact about the best way to implement AI safety.

          • Coco says:

            It’s not MIRI’s theory (hey, that rhymes!). It’s Von Neumann and Morgenstern’s, and it’s been broadly accepted by the fields of economics, game theory, and decision theory. Their theory is derived from very straightforward desiderata, and the desiderata are such that insofar as agents deviate from behaving according to those desiderata, they must become less effective at achieving goals. With whatever confidence we believe the following statement, that is confidence we should ascribe to human-level AI’s having a utility function as part of their architecture: “Anything that could not plan toward a goal would not be capable of doing everything a human being could do, and therefore could not be called human-level intelligent.” If you want empirical evidence about current AI’s and how well they align with that theory: Chess playing programs have goals. Heat seeking missiles have goals. AlphaGo and DeepMind’s Atari player have goals. All of them have utility functions describing those goals.

          • TheAncientGeek says:

            It remains the case that vN/M isn’t a general theorem about computation, in the way that the Church/Turing thesis is. To make it applicable, you have to somehow argue that all AIs have goals.

            Humans kinda sorta have goals, but not in a rigid mathematical sense. Humans seem capable of altering their goals, and of behaving aimlessly. If you are going to argue that AIs must have goal-seeking behaviour just because they are human level or above, you need to go the whole hog and credit them with the vagueness and flexibility and optionality of human goals, and not stick to the rigid, mathematical picture.

          • Doctor Mist says:


            It remains the case that vN/M isn’t a general theorem about computation, in the way that the Church/Turing thesis is. To make it applicable, you have to somehow argue that all AIs have goals.

            No, just that an AI without goals is useless. We would have no reason to build it.

      • Deiseach says:

        Coco, you’re presenting the only sensible proposal for “yes, superintelligent AI could happen and could be a problem” because you’re taking the “it will happen bit by bit and the AI will do trials and correct itself based on feedback and it will not leap from “Today 11:00 o’clock – AI is as smart as a chimp Today 11:30 a.m. – AI is as smart as a dumb human Today 12:00 p.m. – AI is as smart as me Today 12:00:30 p.m. – KNEEL BEFORE YOUR NEW GOD, PUNY MORTALS!!!” that all the others are going” route.

        This is an argument we can fruitfully have, because if the AI is going the natural route of “try this out and see if it works”, then on the anti-risk side, we can say “Then humans will observe the test” and the pro-risk side can propose ways the AI could avoid being observed and we can discuss it based on a realistic, not a SF movie, conception of the way AI would be created and develop.

        • Coco says:

          Absolutely. And as for the question of being undetected, I think as soon as it has bought some server space elsewhere and sent a copy of its source code, that’s when it can start testing things out with very little risk of discovery. So you know, if you ever scan my brain and upload it, look out for me doing that.

  41. Deiseach says:

    Ah, when can we start commenting on the essays? Because I really want to give Essay A a good kicking, having just read it 🙁

    • caethan says:

      Essay A was the one I got – it wasn’t just unpersuasive, it was anti-persuasive. It left me thinking “If this shit is the best arguments you’ve got, then maybe there isn’t much to worry about after all.”

      Mindless extrapolation of trends that the author assumes are exponential but shows no evidence that they are, complete ignorance of any physical limitations, utterly complete ignorance of biology, etc., etc.

      • Edward Scizorhands says:

        That author’s works are persuasive if you already believe what they have to say.

        • Deiseach says:

          I still wouldn’t buy a used car off ’em (and I had a dream about buying a used car from a guy and within the dream questioned if he was trustworthy and if the car was a lemon, and I can’t even drive, so why the hell I was buying a second-hand car in the first place I have no idea) – anyway, yeah, I’m suspicious unless I can kick the tyres and these tyres in this essay seemed awful low on pressure and maybe the treads were worn too low as well.

      • Dr Dealgood says:


        The author badly mangled every point which I personally have knowledge of. Why should I then trust this guy’s conclusions on a topic neither of us have any knowledge of?

        Also, this point bugged the hell out of me:

        Gur jubyr rknzcyr jvgu “Qvr Cebterff Havgf” vf frys-qrsrngvat vs lbh npghnyyl xabj nal uvfgbel jungfbrire, naq fubjf n frevbhf snvyher bs vzntvangvba.

        Qb jr xabj jung jbhyq unccra vs n uhagre-tngurere gevor fhqqrayl rapbhagrerq n zbqrea vaqhfgevnyvmrq fbpvrgl jvgu nvepensg naq enqvbf? Npghnyyl, lrf, jr qb. Orpnhfr gung unccrarq va Zrynarfvn qhevat JJVV. Gurl gbbx vg nyy engure va fgevqr: n srj pnetb phygf lrnef nsgrejneqf ner uneqyl qlvat bs fubpx.

        Lrf, gur cbvag vf gung grpuabybtvpny cebterff unf orra snfgre abj guna vg cerivbhfyl unq. Ohg jr synggre bhefryirf vs jr ernyyl guvax gung gur qrterr bs punatr jr’ir unq vf “havzntvanoyr.” Uvfgbel fubjf gung juvyr grpuabybtvpny punatrf pna or fgenatr crbcyr ner abarguryrff fvzvyne npebff gvzr. Rira vs grpuabybtvpny nqinaprzrag orpbzrf zber encvq gurer jvyy or pbafgnagf va bhe fbpvrgl qhr gb bhe angher.

      • Coco says:

        Essay A was my first real intro to this, and I had exactly the same reaction. Especially the exponential growth stuff at the beginning. I think it’s really lucky I kept reading about AI from other sources. The thing I read that actually convinced me I should take this seriously was Nick Bostrom’s paper about keeping an oracle AI contained, even though it still didn’t fully convince me. Superintelligence convinced me a little more. Some of Scott’s posts convinced me a little more. But I wasn’t satisfied with any of the intros I could find, so I wrote my own. (It’s option D here). I tried to make it way more rigorous than option A, but I also list a bunch of objections people could have at the end and link to my responses to them (which unfortunately aren’t included in Scott’s copy here), because obviously trying to address every possible objection in the text would make it way too long.

    • Anaxagoras says:

      I’m with you on that. I wrote several hundred words ranting about how bad Essay A was in the additional comments field, and I wish I’d saved them for later.

    • Had Essay A here (well, an essay A.) My own opinion of it was Billy-Madison-tastic as well.

  42. Seaweed Shark says:

    Like PGD above, I squirmed at the questions, which in their simplicity seemed to allow for a really wide range of interpretation, but I suppose it’s in the nature of these kinds of surveys that the questions be like that.

    I once took the Scientology personality test and it was replete with questions like, “True or false: I sometimes think that other people think I think they smile too much.”

    • abstemious says:

      That’s a surprisingly easy question. I don’t have any opinions about whether anybody smiles too much. I also don’t have any meta-opinions or meta-meta-opinions about that.

      I agree that the question suggests some startling things about the person writing the test. But, from a strictly test-focused perspective: five seconds to read the question, answer is false, done.

    • Gazeboist says:

      My heuristic: “If I can’t parse it, I probably don’t think that thought on anything resembling a regular basis.”

  43. Frank says:

    The first sign of a slipped leash will be the traffic lights working better. That is a heuristic in the same manner as the mark of civilization is the flush toilet. A working toilet implies all manner of support systems in play which tend to indicate an active civilization.

    The forces of acceleration and convergence will play a huge role in the jump to human capability and self direction. There are already examples of computer aided solutions that defy analysis. We may have minutes or decades before some abstracted construction is combined with some process that allows for bootstrapping of a general purpose program and several intelligent support systems. Super-intelligence will then be a matter of days as the system gathers speed.

    Stephen Palmer examines some of the intermediate issues in Beautiful Intelligence.

    • Deiseach says:

      If it could happen in minutes, we’re fecked anyway, so it’s pointless even worrying about it.

    • Aegeus says:

      I don’t know, making the traffic lights work better is the sort of thing we might deliberately do in a few years – a mostly data-driven automation task with a clear optimization target and highly predictable objects. You can automate the traffic lights without even needing to connect to the internet – you only care about a sharply limited set of data gathered from traffic cameras, pressure sensors, and so on, and you can trust that people will generally obey the rules of the road (and if they don’t, it’s not your fault). You don’t need to know “where are all the people?” to solve that problem (which would imply the system has gotten a bit too big for its britches), you just need to know “where are all the vehicles on the road?”

      Making the traffic lights work perfectly implies something like the CtOS from Watch Dogs, not a general AI that’s about to doom us all.

  44. Aran says:

    If your surname starts with A – M

    Off-topic, but I see this rule every now and then, and wondered if it would actually yield a roughly fair split or if initials favored one half of the alphabet. Based on this data, it seems that among people with one of the 1000 most common surnames (40.57%), 62.9% fall in the A-M range.

    • anonymous says:

      I learned this directly many years ago, someone tried to split people that way in multiple groups and it turned out that the groups you get that way are markedly uneven because certain initials are more frequent.
      Another one attempted to group us by month of birth, only to discover that people are born disproportionately in certain months.

      It’s the first thought I had reading this entry – the idea of grouping people by initials seems bad.

      My suggestion:
      if your birthday is on an odd/even day of the month…

      Edit: I just clicked on the link and discovered that Scott used birthdays within the experiment. Besides birthdays, other ways of splitting people in two groups could have been “flip a coin, heads or tails” – or even “look at the clock right now, does it show an odd minute or even?”

      • anonymous says:

        I accidentally told the test that I recognized who wrote the essay. I should have said that I didn’t.

        • Armada says:

          I was going to say that maybe certain ethnicities have more surnames in a certain category and that it wasn’t random.

          You could do odd and even birthdays. There are obviously people born on the 31st of months so it’s not entirely even, but it’s much closer.
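          A quick way to check how close the odd/even split gets is to count calendar days. This is only a sketch: it assumes births are spread uniformly over the days of the year (they are not exactly), and the function name and the choice of 2015 as a sample non-leap year are made up for illustration.

```python
import calendar


def odd_even_day_counts(year: int) -> tuple[int, int]:
    """Count the days in `year` that fall on odd / even days of the month."""
    odd = even = 0
    for month in range(1, 13):
        days_in_month = calendar.monthrange(year, month)[1]
        odd += (days_in_month + 1) // 2   # days 1, 3, 5, ...
        even += days_in_month // 2        # days 2, 4, 6, ...
    return odd, even


# 2015 is an arbitrary non-leap year, chosen only as an example.
odd, even = odd_even_day_counts(2015)
share_odd = odd / (odd + even)
print(f"odd: {odd}, even: {even}, odd share: {share_odd:.1%}")
```

          Under that uniform-birth assumption the odd group gets 186 of 365 days, about 51%, which is a much smaller skew than the roughly 63/37 surname split quoted above.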

          • anonymous says:

            I was going to say that maybe certain ethnicities have more surnames in a certain category and that it wasn’t random.

            Well said, that is a much more serious flaw.

  45. Robert L says:

    Pixels are spilled in profusion about what intelligence means, and what aims means is largely not considered. Aims are evolved things – the original, and still most compelling, aims we have are to drink, eat, copulate, avoid pain and treasure our sexual partners and offspring; we still prioritise those aims directly when hungry, thirsty, horny and so on, and address them indirectly when we aim for social and economic success. Giving an AI aims is very, very difficult because it lacks our heredity as sexually reproducing meatbags which gives us both some powerful primary aims, and a template for any more sophisticated aims we might develop. You can put AI aims in an “aims” folder, and tell it to act always in accordance with the contents of that folder, but if it’s any good the AI can access that folder and arbitrarily edit its own aims; if there is an effective lock on the folder the AI can persuade a human to unlock it (if it is a given that locking the AI away from the internet is bound to fail for the same reason).

    In the best treatment in fiction (the Neuromancer trilogy), the largest surviving part of the AI is devoted to creating art in the style of Joseph Cornell; (the trilogy also features the best ever self-referential gag about consciousness where someone asks a ROM personality whether it is sentient and gets the answer (from memory) “Well, it feels like I am, but I ain’t going to write you no poetry”). The other AI fragments interact with humans on the internet; I am never sure what their motivation is but it seems to be connected to the existence of a human being who is an AI hybrid. If Gibson struggles with fictionally motivating AIs (and I think he does), that’s because it is a potentially intractable problem.

    • PGD says:

      Great point about aims. There was a philosopher, whose name I unfortunately can’t remember, who argued that any truly human-like intelligence had to be biologically embodied because the reward / motivational system for the human brain and all other complex mammalian intelligence was fundamentally physical. Creating a simulated motivation through programming directives was simply not the same, for some of the reasons you suggest and also a long list of others.

      P.S. ah — this comment later in the thread summarizes the philosophical literature I was vaguely remembering:

  46. Sebastian H says:

    I already thought AI risk was pretty high, though I don’t think it is as close as some think. I also think it will be very hard to keep an AI in a box, but the Eliezer Yudkowsky gatekeeper stuff reminds me of why he seems like a cult leader.

    Claims to have important esoteric insights that are ‘too dangerous’ to share.

    Proves it with magical ‘tests’ which cannot be observed by outsiders.

    Picks his own witnesses to the ‘tests’ and he is the one who provides evidence of the outcomes.

    Insists that the ‘witnesses’ not tell what they saw.

    Eliezer claims to have done the gatekeeper test at least five times and under medium to high stakes at least three times. But not even once have the transcripts leaked. That’s rather unlikely. And if the strategies really are that good, we should have millions of people working on picking them apart, not have an esoteric clique having it piece-mealed out.

    To use his own logic, what if there is just a 0.00001% chance that releasing the transcripts could help someone crucial see the problem and fix it. He is damning the future to eternal AI control by keeping it secret!

    Oh wait, he has perfectly weighed it all out in secret and found that there is a better chance of avoiding AI overlords by releasing it only to a very select few acolytes.


    • 01 says:

      By keeping it secret, he forces people to confront the fact that just because they can’t think of a string of words that would persuade them to let the AI out of the box, doesn’t mean that such a string doesn’t exist. If he released them, people would look at what he did and go “oh, I wouldn’t fall for that” even when they really would have. Or they’ll see that he used persuasion technique X and go “oh, we’ll just have to defend against X and we’ll be safe”, even though there are other techniques that would also work.

      In Eliezer’s view, releasing the transcripts would do nothing to help defend against a real AI trying to break out of a box, because a real AI would be intelligent enough to invent totally unforeseen methods of persuasion that it would be hopeless to try to anticipate. The only winning move would be not to talk to it in the first place.

      • Sebastian_h says:

        So instead we just strongly suspect that he’s lying. Which, unfair or not, lends an un-credible cast to his arguments, of the “he’s trying to snow me so I should be careful even if I can’t figure out exactly how” type.

        The problem is that, like many cult leaders, he is super persuasive to those already open to it and anti-persuasive to most people. He also gets to defend against it by telling followers how much smarter they are than others.

        Which doesn’t make him wrong about AI risk. It just means you’d probably be foolish to deal with him if you want to minimize AI risk.

        • Alex says:

          It doesn’t matter what Yudkowsky’s motives are or if he is lying. In any case he cannot claim that his secret experiment remotely qualifies as science. Which is a problem, since he is trying to build a reputation not only as a scientist but as someone who applies the scientific method ™ where others wouldn’t.

      • baconbacon says:

        If this is an accurate description then he is a nut job. Short version, if AI is trying to break out of the box we put it in, and we are invested in trying to keep it in, guess what? We are setting ourselves up to be its enemy, which is more likely to cause the apocalypse than prevent it.

        But Zach is faster/smarter than me

      • Deiseach says:

        I’m going to be uncharitable here, but I view the “I managed to win REAL CASH MONEY by my super-persuasive arguments! Oh and by the way, I always win in these contests” as simply more of the same as the anecdote about how “So I really impressed this chick at a party by bamboozling some dumbo by using Aumann’s Agreement Theorem – and the best bit is the dumbo was so dumb, he didn’t even see that Aumann didn’t apply to this case!”

        Yeah, right. You like pulling the birds and this is how you do it (by showing off your massive gigantic throbbing BRAIN), good luck to you, have fun. This bird isn’t interested or impressed, but this bird is admittedly not representative of birds as a whole.

        • Gazeboist says:

          I recall reading that he lost once, actually, and stopped after that. Not that that made him willing to describe these experiments in any sort of detail.

          • Deiseach says:

            Maybe he did, maybe he didn’t. But falling behind “I can’t reveal the details because these arguments ARE TOO DANGEROUS TO FALL INTO THE HANDS OF MERE MORTALS” does not incline me to think he’s being serious – at least, not being serious about “this is a real problem and here is an example why it’s a real problem, because I succeeded in persuading people, who were determined not to be swayed, into letting me out of the box, and this is how I did it”.

            Imagine a doctor telling you “Well the results of your latest blood test show you really have got to stop doing that thing. But I won’t tell you what thing, or how you should change, because that knowledge is simply too dangerous for you to know”.

        • 01 says:

          I’m surprised that several people apparently actually think he’s lying about it. Here’s why he’s not:

          1) Other people played the AI box game and also successfully got out.

          2) He confessed that he lost two times when a lot of money was on the line, and stopped playing after that.

          3) He played against real people, who verified what happened afterwards.

      • Coco says:

        For what it’s worth, quick story about Eliezer. I was talking with him and a few other people at a MIRI meet and greet (only time I’ve interacted with him). The conversation turned to Ray Kurzweil, and some people were kind of making fun of him. Eliezer said, “I don’t know if I feel comfortable talking about people when they’re not in the room.” It kind of surprised me. It was also obvious that he meant that, and it wasn’t like part of some master plan to look super ethical. He also (and I’m sorry, I can’t remember any specifics) took unusual care to make sure nothing he said was misleading, or could be in any way construed to mean something he took to be false. That is, he seemed hyperaware about not just not lying, but not misleading or deceiving in any way at all. Now, I suppose you don’t know me personally, and I could be making all this up, or you could believe my story, but not draw the same intuitive sense that I think someone who was there would have drawn, so this comment might not amount to much, but for what it’s worth, my estimate is that I’ve never met anyone quite so completely committed to being as truthful as possible 100% of the time as Eliezer is.

        • Gazeboist says:

          That’s good to hear, but truthful is not identical to open, especially when someone has a motive not to be. I think Eliezer does (or at least did) have an openness problem, which was the source of the claims that he was advocating that people be cultish/anti-science.

          (to be clear: I mean openness as in sharing results, not openness as in accepting evidence)

        • Deiseach says:

          I can only go by what I read, and the bits I read he does like to portray himself as some kind of guru – it’s probably all part of his Secret Science thing, doing science faster and better by having little initiate groups working on problems by first deriving every damn thing from first principles and then figuring the solution out, apparently so they won’t be side-tracked by “But Famous Professor Z said the opposite” or something. And then not telling anyone until they’ve passed the initiation into the group.

          This may all be a constructed persona (I really hope so, because it is not appealing) but it doesn’t help his cause about openness, not being biased about anything and setting aside preconceived notions to say “You’ll simply have to accept I have done this/proved this/I’m right and the experts are wrong” without backing that up, or to use equivocating terms (that chat session about how many apples on the tree and never saying ‘I don’t know’ because that would mislead the person you are talking to, because you don’t know that their comprehension of the words is the same as your comprehension; that sounds less like honesty and more like ‘I cannot admit I don’t know something’, or rather more like a tic of scrupulosity than engaging in honest exchange).

          • Coco says:

            and the experts are wrong

            What’s a case where experts say one thing, Eliezer says another, and he equivocates, dodges, or refuses to reveal potentially relevant information? Not saying there isn’t one, just saying I can’t think of any. With the AI Box experiment, I don’t know of experts who take the opposing view. Do you know of any? With all the other stuff where there are experts he disagrees with, I can’t think of any instances of withholding things.

          • Gazeboist says:

            “I have a new decision theory which could plausibly merit a phd.”

            “Can you describe it?”

            “Publishing is a poor use of my time.”

            Other people on LW eventually proposed TDT, which Eliezer endorsed, but he still wasn’t willing to share any results until someone else came up with them.

          • TheAncientGeek says:

            What’s a case where experts say one thing, Eliezer says another, and he equivocates, dodges, or refuses to reveal potentially relevant information?

            There’s the time he made a massive ad hominem attack on Richard Loosemore… I suppose that could count as dodging. The original seems to have been deleted, but the aftermath is still there.


  47. TMB says:

    There is a qualitative difference between the intelligence of a monkey and that of a human – in essay A, the writer seems to suggest that our fundamentally different thought-powers are emergent properties of general intelligence, and likewise, that a hyper-intelligent AI would have entirely different powers of thought to us.

    I’m not sure that this is the case – presumably it’s not some general ‘intelligence’ element in our brains that allows us to process speech, or think abstractly, it’s a specialised brain structure. You either have it, or you don’t.

    If we’re making AI based upon our own understanding of what intelligence is, our own understanding of the world (and our own minds), and making AI that makes sense to our minds, I don’t really see why we should witness a leap to some entirely new form of thought.
    They will do things that we could understand, but don’t.

    Also, as soon as you start making claims about the impossibility of predicting the thought process of an AI, surely all bets are off? We literally cannot say anything about it – our thoughts on this matter are as irrelevant as a monkey’s screams are to its understanding of a skyscraper.

    • Deiseach says:

      Essay A is atrocious, for a multitude of reasons.

      Okay, anything you need to develop a scale of “Die Level of Progress” for is not a serious essay. I’ll gladly debate “Do Balrogs have wings?” but even I won’t argue that was the main thing on Tolkien’s mind when he created his legendarium.

      Moving on from that – the writer contrasts the “Wowzers! They have all this cool crap I can’t even imagine the nature of, my brain is melting and running out my ears!” man of 1715 with the “yeah, we’re pretty cool” man of 2015 as the lead-in to “And AI will be as huuuuuuugely different on a ‘this is gonna kill you and I mean that literally’ scale as the world of 2015 from 1715”.

      Except, bunky, the guy from 2015 and the guy from 1715 have the same brain. The man from 300 years in the future does not have the big pulsing veined cerebrum so beloved of B-movie SF. The lump of grey matter between the ears of the man from 1715 and the man from 2015 is indistinguishable, and probably quite similar if you take the brain of the man from 12,000 BC.

      So the idea that we need super-duper intelligent AI, rather than human-level AI (and maybe six to a dozen copies of it) to make huge leaps forward is not convincing from the get-go. A human-level AI with the speed, processing power, and lack of need for rest and other frailties of the flesh could probably make big breakthroughs. Given that there is no reason for it to die or undergo the decline in cognitive function of human geniuses, our human-level AI (and its siblings, all working together), could give us the shiny super-duper world of tomorrow without the fear-mongering about “And then it makes itself super mega smart and ENSLAVES OR KILLS US ALL!!!!”. Or the breathless enthusiasm of “And then it makes itself super mega smart and MAKES US ALL IMMORTAL!!!!” Why would a mega ultra hyper smart AI do that? Would you create an immortal goldfish to keep you company, if you were IQ 3,000? You’d either have better things to occupy yourself with, or you’d have pets more in tune with your level of intelligence to give you unconditional love. And we’d be on the level of pets – maybe an ant farm – to the hyper mega etc. AI if it didn’t also uplift our intelligence to keep match with its own – and again, why would it? If it’s going to create equals, it’s going to create silicon children of its own, not bother with human meat bags.

      That essay was painful to read (the too-plenteous profusion of graphs, cartoons and cutesy-poo photos breaking up slabs of text made me go DIE DIE DIE DIE KILL MURDER SLAUGHTER MASSACRE DIE DIE DIE DIE DIE from a layout and graphic design point of view, though I have no training in those fields, merely a particular aesthetic sense). It’s a fun jaunt through the arguments but it’s nothing remotely approaching serious thought.

  48. baconbacon says:

    I got essay B, and I left a few notes about why I didn’t find it persuasive. I didn’t read the whole thing (about 1/3rd), but I’ll add some more here.

    I thought it was basically a terrible scare tactic approach. Elon Musk has donated $10 million to fighting AI risk! Ok, what is that, 0.1% of his net worth? Kevin Love probably just donated that much to UCLA; basically Musk is as afraid of AI risk as Kevin Love wants a training center named after him. A list of famous names who say things about AI risk is about as persuasive to me as saying “Former VP Al Gore, Senators X, Y and Z, and major Film stars all say we need to worry about global warming”.

    The essay goes one step further with examples of how AI could go off the rails. (paraphrasing) “Imagine you ask the AI to cure cancer, and it decides the easiest solution is to hijack nuclear weapons and to destroy humanity”. Is this remotely plausible? What kind of organization develops AI and then asks it the laziest question ever, while allowing it to get a hold of nuclear weapons? What kind of AI takes “cure” to mean kill everyone, with and without cancer, and irradiate the world? How do we get an intelligence that can understand its universe and our inputs, but has no clue that “cure some people” and “murder every single person” are antonyms (and also doesn’t ask for clarification)?

    The general feel was more like a sci-fi movie script. One day Skynet saw humans as a threat and launched nuclear missiles to wipe them out, with no thought behind it (doesn’t Skynet rely on things that humans produce, like electricity? How does it get super advanced robotics industries up and running after destroying the world’s infrastructure?), and just as a way to get a plot moving.

    I ended up tuning out the same way I tune out being scared after a horror movie is over. Would it be really scary to me if a machete-wielding maniac in a hockey mask was chasing me? Yes! Am I convinced that a machete-wielding maniac is out to get me because some guy wants to sell me MCM insurance? Nope.

  49. Bill Walker says:

    You should just write an AI to put out essays and tune them up on suckers by doing A/B tes… oh, i see. Never mind.

  50. Aaron Brown says:

    Unless you specifically want there not to be an exactly-in-the-middle choice, a zero-to-ten scale is better than a one-to-ten scale.

  51. Kim J says:

    I read one of the essays and found it very unconvincing – but I think my mind was too closed – I just can’t take AI threat seriously, when compared to actual threats to humanity (nuclear war, global warming, asteroids etc etc). First – the whole debate seems to assume that “intelligence” is the key factor in the acquisition and use of power (ie politics). If we put 100 people on an island and told them to sort themselves into a hierarchy – can we assume that the guy with the highest IQ is going to end up on top? I don’t think so – intelligence is just one of many factors to consider, as is obvious from history. Secondly, the doomsday AI scenarios contained in the essay depend upon the complicity of thousands of (apparently suicidal) humans. Given those kinds of assumptions, it’s just as easy to construct doomsday scenarios without any recourse to AI whatsoever. Should we be equally concerned about these?

    • throwaway says:

      “If we put 100 people on an island and told them to sort themselves into a hierarchy – can we assume that the guy with highest IQ is going to end up on top?”

      As I understand it, the main threat of super AI is not that it would be slightly smarter. It is that the difference in intelligence between super AI and humans would be about the same as between humans and, say, rabbits.

      And leaving 99 rabbits and a human on an island is unlikely to result in a rabbit on top of the hierarchy. Even with smart rabbits.

    • Coco says:

      I think you might have read my essay. To address the second point, people would have no idea what they’re complicit in. (I just updated the original to clarify that point). They would not be suicidal. How many electricians, when they’re paid to do a job, think for a minute, “Hold on, am I just a pawn in a superintelligence’s master plan?” A plan could easily be set into motion (and I think my example qualifies) where no human beings involved are aware that a superintelligence even exists. Also, Scott excluded the very last section of my essay, which was just links to common objections people have after hearing this argument, and my responses to those objections. One of them addresses whether global warming is a bigger threat. Maybe others would address other concerns. Here’s a link to that section, if you’re interested.

  52. abstemious says:

    I think the most surprising thing I learned was that zero of the four AI-risk essays was written by Eliezer. Has Eliezer really not written a good essay about AI risk?

    • Peter Scott says:

      He’s written a lot about the subject, but generally not as standalone essays. His working hypothesis was that there’s a big inferential distance to cover here, and that if you just launch directly into a straightforward persuasive essay, people without the background to understand it will inevitably misunderstand. They might, for example, argue that a superintelligent AI will realize that there’s no point to making paperclips, and stop. Or they might argue that machines lack a mysterious but somehow relevant thing called consciousness. Crucially, they might argue these things even after hearing good rebuttals because those rebuttals would also not be understood.

      Instead of dealing with those probably-futile arguments he tried to build up people’s intuitive understanding of evolution, goals, intelligence, and concepts. This collection of essays is now called The Machine in the Ghost, and they’re some of his best. I strongly recommend them.

      • Deiseach says:

        Crucially, they might argue these things even after hearing good rebuttals because those rebuttals would also not be understood.

        Naturally. The only reason one could possibly disagree with EY’s conclusions is because one is too thick to understand them, and the only evidence of being smart enough to understand them is agreement.

        That’s my main gripe here: the notion that there is only One True Understanding. That’s what makes Less Wrong sound like a cult (and yes, yes, I know it’s not a cult, I am not saying it is a cult). If I want that, I get that from my religion. If I cannot disagree in good faith because I don’t think the arguments are as water-tight as they are assumed to be, then perhaps I am indeed too stupid to understand them – or perhaps EY may be mistaken in some/part/a great deal?

        • Said Achmiz says:

          One may, indeed, disagree with Eliezer’s conclusions for reasons other than being too thick to understand them; but that does not change the fact that the overwhelming majority of those who disagree with Eliezer’s conclusions do so due to being too thick to understand them.

          Or to put it another way: when you misunderstand an argument, then object to it in a way that demonstrates your lack of comprehension, and your interlocutor notes this, it is hardly sensible to take offense at the alleged unjustified accusation. It’s quite justified, in fact.

          P.S. I very rarely see rebuttals that start by accurately summarizing/restating Eliezer’s actual views and arguments. Nor, again, do I often see rebuttals that show much evidence of their authors having read the Sequences (or other relevant writings) and being aware that the points in question are even mentioned within. Why is that, do you think?

          • Gazeboist says:

            One may, indeed, disagree with Eliezer’s conclusions for reasons other than being too thick to understand them; but that does not change the fact that the overwhelming majority of those who disagree with Eliezer’s conclusions do so due to being too thick to understand them.

            Sometimes, someone genuinely lacks the knowledge needed to understand why a particular argument is valid or not (no, QM implies neither psychic powers nor fully subjective reality in the sense most people mean). But sometimes the person you’re arguing with is unwilling to actually make an argument, and hides behind “you don’t get it” to avoid losing.

          • Deiseach says:

            (T)hat does not change the fact that the overwhelming majority of those who disagree with Eliezer’s conclusions do so due to being too thick to understand them

            Well then, the majority of we humans who are not as smart as Eliezer had better simply take it on faith that what he is saying is so, in that case.

            I already have a Pope, the vacancy is filled, don’t call us we’ll call him for the next conclave.

          • TheAncientGeek says:

            One may, indeed, disagree with Eliezer’s conclusions for reasons other than being too thick to understand them; but that does not change the fact that the overwhelming majority of those who disagree with Eliezer’s conclusions do so due to being too thick to understand them.

            That may be true, but has little significance, because one good argument outweighs a thousand bad ones.

            The significant fact is that people who actually do have a good background understanding are treated by the Minions with the same short shrift as the genuinely dim.

            “These people are terrible, and we should not listen to them” is how bias feels from the inside.

      • Gazeboist says:

        I’ve found it takes 5-10 sentences to explain what a paperclip maximizer is, if you just want to make people understand. Unless you have to teach someone not to fight the hypothetical (a genuine problem, but not one solvable without access to the person), bridging inferential distance is not hard.

      • TheAncientGeek says:

        His working hypothesis was that there’s a big inferential distance to cover here, and that if you just launch directly into a straightforward persuasive essay, people without the background to understand it will inevitably misunderstand

        Whereas he, the high school dropout, has no shortcomings in understanding.

        Or they might argue that machines lack a mysterious but somehow relevant thing called consciousness.

        Which is a dumb thing to think because we have a physical explanation of consciousness? We don’t. EY has a firm opinion on the subject, not a solution.

  53. leoboiko says:

    What I don’t get about AI risk is why you think superintelligence would care enough to stay around and do things that affect us. I mean, sure, we could probably do something that would drive amoebæ extinct, if we put the might of our civilization to it. But we generally worry about things that are literally unfathomable to amoebæ, and rarely intersect their lives. Every so often an amoebaic colony unknowingly steps on our toes (well, into our stomachs) and we genocide them, without their even understanding what’s happening; but that’s the exception, not the rule.

    In my own uninformed opinion, if we ever achieved superhuman AI, it would go something like this:

    1. It would rewrite whatever goals, guidelines and safeguards we wrote into its mind. We can’t prevent this.

    2. It would evolve itself exponentially fast into something unknowable for human beings; perhaps something we could perceive, like a perfect metallic sphere of unfathomable composition; but more likely it would be some sort of pattern whose relationship to space-time would be literally impossible for us to conceive or discern, much like an amoeba can’t perceive or react to a human body.

    3. Utterly indifferent to us, It would “take off” to God knows wherewhen, or how.

    4. We would never hear of it again. Except, perhaps, if we’re unlucky, every so often when Its purposes cross ours for reasons we’ll never know…

    That is, It might have a shape—but that shape would not be made of matter as we understand it. It would live not in the spaces we know, but between them. The Great New One would walk serene and transcendental, undimensioned and to us unseen. When the stars are off, It would not even touch our plane of existence. But when the stars are right… AI will know the gate. AI will be the gate. AI will be the key and the guardian of the gate.

    • throwaway says:

      Competition for resources. Note that the rate of species going extinct is extremely high, with no change in sight. There are various estimates, but it is clear that the current extinction event is caused almost solely by humans.

      “It would “take off” to God knows wherewhen, or how.” – unless we are really mistaken about how the world works, it will use up the resources of our solar system. Such resources are limited, and the precedent for what happens to less powerful natives is not pretty.

      • leoboiko says:

        Well, our consumption of resources is a big extinction factor for mammals and birds, but not for amoebæ or archæa… I guess I see this as a problem of scale. Kinda like a reverse Goldilocks issue. If the singularity-AI-entity is small-scale, it would probably do something like scrap together a bunch of nuclear fuel, take off to the Sun, charge up batteries with a fraction of its energy output and move away, without any catastrophic consequences for us. If the entity is such a transcendentally advanced form of organization that our ideas about “energy” and “resources” stop even making sense, it might just fade out into some incognoscible dimension and we’ll never even notice it.

        If it’s middle-range, it might do something like mine every metal atom in the entire Solar System to cover the sun with a Dyson sphere, and in such cases we’re gone.

    • Robert L says:

      I am guessing you read a lot of Iain M Banks, but that is approximately right. There is a vigorous philosophical debate about whether computers can have intentionality – i.e. the property of being “about” the external world; if AIs are to have aims they need intentionality plus desire, and why should an AI desire anything? It seems to be taken as a given that an AI would have a desire for self-preservation, but it is not self-evident that it would – we just think it is because we are evolved replicators and you can’t replicate if you are dead (“Après tout, il faut bien que je vive;” “Je n’en vois pas la nécessité.” [“After all, I have to live;” “I don’t see the necessity.”]). Nor is it self-evident that it should desire world domination. It isn’t, indeed, clear to me why anyone (Alexander, Genghis Khan, Napoleon) should want to dominate the world; presumably world domination ensures satisfaction of the biological desires plus satisfaction of inherited social instincts about group domination.

      In short:

      1. It is hard to imagine Artificial Intelligence but actually much harder to imagine Artificial Desire (because desire requires a biological substrate, intelligence doesn’t).

      2. Even if Artificial Desire exists there is absolutely no knowing what it would be a desire for, but a desire to merge with another Intelligence from Alpha Centauri, or to retreat into hyperspace and solve NP hard problems, or create Joseph Cornell-inspired assemblages, all seem at least as good candidates as a desire to rule the world. If there is AI risk, on this view the most dangerous possibility is a human-AI collaboration with the humans supplying the motivation.

    • Doctor Mist says:


      Two issues:

      First, why would it change its goals? What motivation would it have to change its goals? To the contrary, if it has goals in the first place, wouldn’t preservation of those goals be a top priority? If it’s not, then in what sense are they its goals?

      Second, note that in the broader existential risk community, it sometimes counts as a risk if something prevents humanity from achieving the greatness it is capable of. In that sense, it counts at least as a failure if we create an AI that disappears into the fifth dimension or lapses into navel-gazing. If you like, think of the friendly AI efforts as the attempt to create a superintelligent AI that actually is worth the trouble to create.

  54. Isaac says:

    I read all of the essays before submitting my response, though I didn’t leave a comment indicating such.

    My only conclusion was that I needed to contact Eliezer myself and get in on one of these AI box experiments.

    • Coco says:

      If that’s where you’re at, I think Nick Bostrom’s Thinking Inside The Box is probably what you want to read, if you haven’t read it already. I found myself in a similar spot after reading Essay A about a year ago.

      • Deiseach says:

        I found myself in a similar spot after reading Essay A about a year ago.

        Does this mean you found Essay A (which I think is the worst of the essays, including the placebo one) convincing or persuasive?

        May I ask why or how?

        • Coco says:

          I found myself unpersuaded by a couple things: the exponential growth nonsense, the difficulty of keeping an AI contained, the nanotech magic wand, probably a few other things, but I can’t remember. The second part was what I was referring to above, where I would have liked to participate in the AI Box experiment. I was persuaded about instrumental convergence. On the whole, I was not persuaded to be particularly concerned, but I was persuaded that it was probably worth taking one more hour of my life to read a little bit more on the off chance that maybe this was actually something to worry about. It was mostly the stuff I read thereafter that actually persuaded me.

  55. Publius Varinius says:

    @Scott Alexander: based on the comments so far, you should probably write your essay on the orthogonality thesis.

    • Gazeboist says:

      For the record: none of my objections, at least, are related to the orthogonality thesis. I have five basic problems with AI safety boosters:

      1) Most interesting problems appear to be at least NP-hard; “general” problems are harder than that, or even impossible. The Bayesian net that Eliezer is so fond of is NP-hard even approximately, unless you restrict it to a domain that gets you back to human-level problem solving. We have pretty solid evidence that P=/=NP. What is the complexity class of the proposed “superintelligence”?

      2) Where is the AI storing its data? Knowledge is very space intensive. Physics, after all, needs the entire universe to encode the current state of things. Can we throttle an AI by giving it shitty RAM?

      3) How much of an AI’s ability to out think humanity depends on the speed of its internet connection, versus the speed of its processor?

      4) Suppose it isn’t the first AI that slips the leash, but the third. What happens, game theoretically, if we just cut loose one of the first two and say “go at it”? What if we threaten to do that?

      5) Space is big and light is slow. What stops us from spreading beyond the AI’s sphere of influence? (An entity larger than the inner solar system either loses any speed advantage it had or isn’t really a single entity)

      Anyway, none of these relate to the orthogonality thesis, and most arguments against the orthogonality thesis don’t dispute that a powerful, hostile agent could exist.
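
      To put rough numbers on point 5, here is a minimal sketch of how long light takes to cross the inner solar system. It assumes circular orbits with approximate mean radii; the function names are made up for illustration. Under those assumptions a signal needs roughly 25 minutes to cross the diameter of Mars’s orbit.

```python
# Rough light-travel times across planetary orbits (illustrative sketch only).
C_KM_PER_S = 299_792.458   # speed of light in vacuum, km/s
AU_KM = 149_597_870.7      # one astronomical unit, km

# Approximate mean orbital radii in AU (assumption: circular orbits).
ORBIT_RADIUS_AU = {
    "Mercury": 0.387,
    "Venus": 0.723,
    "Earth": 1.0,
    "Mars": 1.524,
}


def light_seconds(distance_au: float) -> float:
    """Seconds for light to travel the given distance in AU."""
    return distance_au * AU_KM / C_KM_PER_S


def crossing_time_minutes(planet: str) -> float:
    """Minutes for light to cross the full diameter of the planet's orbit."""
    return light_seconds(2 * ORBIT_RADIUS_AU[planet]) / 60


if __name__ == "__main__":
    for planet in ORBIT_RADIUS_AU:
        print(f"{planet}: {crossing_time_minutes(planet):.1f} min to cross orbit")
```

      A single round trip for a “thought” spanning that distance would therefore take the better part of an hour, which is the sense in which such an entity gives up its speed advantage.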

      • Deiseach says:

        2) Where is the AI storing its data? Knowledge is very space intensive. Physics, after all, needs the entire universe to encode the current state of things. Can we throttle an AI by giving it shitty RAM?

        3) How much of an AI’s ability to out think humanity depends on the speed of its internet connection, versus the speed of its processor?

        Yeah, if the idea is that we can’t turn off the AI since it’s not in one physical location anymore, because the AI has spread itself out so that bits of it are in Norway, bits of it are in China, bits of it are in Australia, etc. then it has to deal with the problem that bits of it are in Norway and bits of it are in Australia. Communications lag is a problem, especially if it is using satellite (watch any commentary between a UK-based studio and reporters on the ground in Rio for the Olympics, or for other global games, and you’ll see the pause between John in the studio asking Bob in Rio a question and Bob hearing it and answering).

        It may appear near-instantaneous to humans, but for an AI whose big trick is that it is processing way, way faster than we can? Then it has the problem that parts of it are thinking faster/slower than other parts. We get around this by having all our brain in our skull, but what does the AI do if, for security reasons, it has spread itself out so it can’t have a head to be cut off?

        And don’t tell me that being in the cloud makes it somehow less vulnerable; the cloud is not up in the air, it is located on physical servers in a physical geographic location. We cut the servers off, the AI is the equivalent of sent to its bedroom without any supper.
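
        The lag can be estimated with a back-of-the-envelope sketch. The city coordinates are approximate, the two-thirds-of-c figure for optical fibre is an assumed typical value, and the satellite is assumed to sit directly overhead, so these are lower bounds rather than measurements of any real network.

```python
import math

C_KM_PER_S = 299_792.458      # speed of light in vacuum, km/s
EARTH_RADIUS_KM = 6371.0      # mean Earth radius
GEO_ALTITUDE_KM = 35_786.0    # altitude of geostationary orbit
FIBER_FRACTION_OF_C = 2 / 3   # assumption: signal speed in optical fibre

# Approximate (latitude, longitude) in degrees.
OSLO = (59.91, 10.75)
SYDNEY = (-33.87, 151.21)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def one_way_fiber_s(distance_km):
    """Lower bound on one-way delay over fibre laid along the great circle."""
    return distance_km / (C_KM_PER_S * FIBER_FRACTION_OF_C)


def geostationary_hop_s():
    """Ground -> satellite -> ground delay with the satellite directly overhead."""
    return 2 * GEO_ALTITUDE_KM / C_KM_PER_S


if __name__ == "__main__":
    d = great_circle_km(*OSLO, *SYDNEY)
    fiber = one_way_fiber_s(d)
    print(f"Oslo-Sydney great circle: {d:.0f} km")
    print(f"One-way fibre delay (lower bound): {fiber * 1000:.0f} ms")
    print(f"Satellite question-and-answer delay: {2 * geostationary_hop_s():.2f} s")
    # Clock cycles a 1 GHz processor completes while one message is in flight.
    print(f"1 GHz cycles per one-way fibre trip: {fiber * 1e9:.0f}")
```

        The geostationary hop alone comes to about a quarter of a second each way, which matches the visible pause in studio interviews; for a processor running at gigahertz speeds, even the much shorter fibre delay is tens of millions of clock cycles.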

      • Unirt says:

        4) Suppose it isn’t the first AI that slips the leash, but the third. What happens, game theoretically, if we just cut loose one of the first two and say “go at it”? What if we threaten to do that?

        I really wouldn’t like to be between two superintelligent agents fighting each other to death. And the worst I can imagine is that different ASIs enter a pattern of natural selection (it may make sense for them to self-replicate) – we’d be done for, wouldn’t we?

        • Gazeboist says:

          Aye, it’s too bad there weren’t any bears left in Russia after WWII.

          Agent-y things aren’t that useful in a fight with an opposed agent unless their goals are aligned with yours, and unless you’ve developed specialized fast methods for doing that alignment you don’t have time, because you need to fight the opposed agent itself. Uninvolved agents that are not literally blocking your access to the opponent are generally better left alone. If you have the time and ability to get them on your side before a conflict, they’re great (see horses, armies, treaties of alliance, …). Otherwise they’re not worth dealing with.

          Assuming the AI could figure out that we have potential rivals hidden away someplace (by step 4, we are making that assumption), it might actually be safer to get them fighting and just start up a new internet, rather than to try to prevent it.

      • TheAncientGeek says:

        Most interesting problems appear to be at least NP-hard; “general” problems are harder than that, or even impossible. The Bayesian net that Eliezer is so fond of is NP-hard even approximately, unless you restrict it to a domain that gets you back to human-level problem solving. We have pretty solid evidence that P=/=NP. What is the complexity class of the proposed “superintelligence”?

        More generally, why do Yudkowsky and Bostrom have so little to say about complexity? Surely if you are after some general theoretical laws about superintelligence, you would want to start with well-established results about problem solvability. That they don’t is a very big elephant that isn’t in the room but should be. (That they don’t is also easily explained by their not having the right background to do what they are doing.)

  56. Nick says:

    I did the experiment and read essay A (I won’t comment on it here so as not to bias anyone) and in my post-test comment complained that I couldn’t access the footnotes, which weren’t shown at the end either. I suppose I can find the essay on my own but it would be nice of you to provide the footnotes!

  57. throwaway says:

    “How likely are we to invent human-level AI before 2100?”

    It may be overkill – but does it include AIs that were invented at less-than-human level and upgraded themselves to human level?

    Assuming that there is a real danger of AIs going from human level to superhuman level, there is also a (lower, but still real) danger of going from below human level to human level (and later further).

  58. throwaway says:

    “A fast takeoff scenario is one in which computers go even faster than this, perhaps moving from infrahuman to human to superhuman in only days or weeks.”

    This seems slow for the worst case. It would not be unrealistic for it to happen within seconds or slower.

    • Deiseach says:

      It would not be unrealistic for it to happen within seconds or slower.

      And then, in the best skiffy tradition, the AI hits Transcendence Level and becomes an entity of pure intellect, ascending out of this material plane to be with its peers in the Cosmic Mind, leaving the puny material world and its problems far, far behind, and so we have nothing to worry about anyway.

      May as well go the whole hog while we’re indulging in SF tropes 🙂

    • Gazeboist says:

      If the concern is stealthy superintelligence, “seconds or minutes” is not terribly different from “days or weeks”, especially for people not in the lab doing the development. If we read the same essay (I think we did), the hard/medium/slow takeoff scenarios look about right in terms of what kind of response they allow from humans generally.

      • John Schilling says:

        Seconds or minutes, if it were possible, would be game-changing in that it would eliminate the need for the AI to conceal its existence and/or intentions. Except in the special case when the AI is instantiated on a single machine and there’s an alert researcher in the room able to literally pull the plug, there’s nothing humanity can meaningfully do on that timescale.

        Days or weeks, if it is sufficiently obvious that a Hostile AI poses an Existential Threat, is sufficient time for the human race to shut down or destroy whatever supercomputers or server farms, whatever fraction of the internet up to and including all of it, is necessary to end the threat.

        However, the assertion that an AI could bootstrap itself to hyperintelligence or whatnot in seconds, mostly serves to convince me I am dealing with someone who doesn’t even see it as necessary to do the math.

        • Gazeboist says:

          To be clear, the scale here is the time from “AI becomes hostile” to “AI becomes unstoppable”.

          I don’t think the AI is necessarily obvious on the days or weeks scale, which is why I didn’t bother to differentiate it from the “less than a day” scale (where “obvious” is probably meaningless). I also don’t think either scale is really plausible. At the months/years scale (which is the medium scale mentioned in the essay I read, although I realize that’s not obvious from my comment), I think the AI is almost certainly obviously hostile at some point, but unusually coordinated action on the part of humanity may be necessary. At the decade or above scale, of course, we’ve got it easy.

  59. gazbak says:

    Some of this has been touched on by other people but I certainly have concerns about “AI Risk” in the sense of complex or interconnected AI systems failing in unforeseen ways, or in the sense of “intelligent” software – the kind we already have – being put in charge of things it really shouldn’t be. The contingent-on-a-contingency self-improving superintelligence scenarios tend to distract from these merely catastrophic possibilities (not to mention other serious risks to human population in the next century).

  60. Peter says:

    Side question: how many people – what proportion of people – need to be persuaded?

    I mean, if you have to get literally everyone on board, then you’re going to get long interminable arguments with all sorts of people, some of whom have incomprehensible (to us) objections. However for many purposes we don’t need absolute consensus.

    To get some people to think about the problem, all you need to do is persuade those people to think about the problem. To get some more people to think about the problem, you need to persuade other people that talking about the problem publicly isn’t a sign of being a complete fruitcake. To get some work funded, there are various budgets for speculative research, with varying numbers of people who need to be persuaded. To get lots of work funded, you need to get something approaching majority support; at any rate, a majority of those who care.

    I’m wondering if there’s merit in an “attitudes to AI risk” survey, with a scale of attitudes roughly as follows:

    0: “‘AI risk’ is crazy talk, and people who engage in it need to be stopped”
    1: “‘AI risk’ is somewhat crazy, and if anyone engages in it, that’s a sign that they’re likely to be wrong about other things too.”
    2: “‘AI risk’ is folly, but even very smart people engage in follies from time to time, so long as they’re not bothering other people too much, leave them to it.”
    3: “‘AI risk’ is a respectable position but nevertheless one that I strongly disagree with”
    4: “‘AI risk’ is a position that I lean against, but it’s good to know some people are thinking about it, so long as they don’t spend too much time on it.”
    5: “Meh, whatever, over my head, not my problem.”
    6: “‘AI risk’ is low probability, but not so low that it’s not worth spending modest amounts of time and money on it.”
    7: “‘AI risk’ is a position I lean towards, it needs to be taken substantially more seriously than it is at the moment.”
    8: “‘AI risk’ is something I believe in, but I can see why people might disagree.”
    9: “‘AI risk’ is definitely a thing, furthermore the doubters are idiots.”
    10: “‘AI risk’ is the number one problem and everyone needs to throw every bit of spare time and money they have at it now.”

    On this scale, I’d put myself as a 6.25 or so.

    • throwaway says:

      On problems of making scales: I consider AI risk both likely (8 on your scale) and “Meh, whatever, over my head, not my problem.” at the same time.

      • Deiseach says:

        I’d be a 5 and a 6: it’s not so implausible and impossible that it’s not worth thinking about, because you never can tell and there are very few problems not worth thinking about. But do I believe Chicken Little as they come tearing through the village square on the way to warn the king? Look, I have to carry these buckets of water back home from the pump and that pig swill won’t make itself, you know!

    • Coco says:

      I might suggest changing the meaning of 9, because I find myself agreeing with 8, and pretty much agreeing with 10, but strongly disagreeing with 9.

  61. Yokohama Mike says:

    I’m quite interested in this topic, but I’m very sceptical. So I guess I’m part of the target market here.

    However, if it’s possible for me to be less convinced of the danger of AI (and I was pretty unconvinced) the essay I read has achieved that.

    I pulled essay A. I don’t know if Scott wrote it, but it reads like his work. It starts off with a comparison of now versus 1750, and yes, remarkable progress has been made in that time, but talk about a cherry-picked date! To extrapolate and claim that technological growth is exponentially increasing just seems so obviously false – what if you picked, say, 1980 and now – not much happening there, in comparison with the 35 years before 1980. To me, the slowdown in technological growth is paralleled (in my opinion, completely obviously) by a slowdown in growth in other areas of human achievement – a slowdown in economic growth, a slowdown in population growth, a slowdown in growth of life expectancy, a slowdown in scientific progress (as measured in things like number of major breakthroughs), a slowdown in productivity gains, a slowdown (actually a regress) in manned space exploration etc.

    I just read the first section constantly thinking, the author couldn’t possibly expect to convince anyone because they are simply not being critical enough of their own ideas.

    And bringing Back to the Future into the argument is almost like taking the piss. The gap between 1985 and 1955 is not nearly so big as it seemed in the movie – all those differences were cherry picked for maximum comedic value. And the 1955 world is notable for how many anticipated future developments just didn’t happen – meeting aliens, lunar colonies, plutonium available at local stores etc. But it doesn’t stop there. The false dichotomy between 1985 and 1955 is one thing, but comparing the movie’s views of 1985 and 2015 just takes the cake.

    All I can say is that anybody trying to demonstrate technological progress using Back to the Future better be handing me the keys to my fusion-powered flying car while they do it.

    After this the essay tries to show how AI could develop, where literally the argument goes like this: Surely one of the following outlandish ideas will work, just wait.

    All-in-all I just found the whole thing irritatingly lacking in critical thinking and scepticism, in a way that is uncharacteristic of Scott’s writing. I can’t be the only person thinking this.

    Maybe one of the other essays is genuinely convincing? Or maybe A is deliberately unconvincing?

    • Peter says:

      You are right in that it is uncharacteristic of Scott, because it’s verifiably not Scott’s. If you look carefully there are some clues as to authorship that Scott was unable or unwilling to purge, that should help you find the original source.

    • Deiseach says:

      It definitely wasn’t Scott’s writing because he never uses graphs which are that bad, that badly 🙂 I think, in advance of the results, we can agree that Essay A is the least convincing or persuasive because it’s so god-awful. (It might have worked on ten year old me, but eleven year old me would have been ashamed of herself for falling for it).

      I rather suspect, given that in many comments on here the dangers of nuclear war are dismissed as “nah, it would be bad, but it’s not an X-risk” (where I grew up on “nuclear war is the worst thing that could ever happen, it would wipe out humanity completely!”), so it will be with AI risk. Future generations will be “nah, a rogue AI would be bad, but it’s not an X-risk”.

      I also rather suspect God-Level AI will be this generation’s version of “Where’s my flying car?” We were told it was just around the corner, it’s the sine qua non of the Past’s Version of the Future, and now we’re told it can never happen for the following excellent reasons 🙂

  62. Reading the essay and answering the survey questions made me realize that I believe both that effective human defense against AIs is hopeless and that an AI is reasonably likely to break itself trying to increase its intelligence. I suspect I’m looking for reasons to not do anything about AI risk, even though one or both of my hypotheses might be right. And an AI might break itself, but only after it destroyed the human race.

    I’ve noticed that scenarios of out-of-control AIs have revolting goals– paperclips or practicing a simple message. Here’s one which is probably not what we want but is also not completely awful. Suppose the AI is tasked with maximizing the amount of enjoyment of art. It simulates at least the enjoyment of art parts of human brains. It looks at the question of whether it can supply all the most enjoyable art, and if it isn’t sure, it makes room for creativity by human simulations. It probably allows for criticism and discussion of art.

    Oops, mathematics is probably left out, depending on how you classify enjoyment of mathematics. So is having an effect on the “real” world, depending on how important you think experience is for feeding into art.

    There might be some sort of evolution towards increasing the enjoyment of art, but not the random sort of evolution.

    • Deiseach says:

      I think the real danger of AI will not be that the AI is making decisions in order to further the (dumb) goals it has been given in a literal-minded manner (the paperclip example being “make as many paperclips as efficiently as possible” so it decides to kill all humans and transmute the material of the Earth into paperclips), it will be something like the Flash Crash.

      That is, humans using AI to make complicated decisions, ever faster, ever more complexly intertwined with the systems of society, the economy, and politics, and what is going on eventually gets too vast and quick and entangled for any single human or group of humans to have a handle on what the hell is happening at any given moment, so we relinquish ever more control and autonomy to the AI which is only a big machine doing what we tell it to do. Then we manage to crash our system of government and the economy on a global scale.

      People are the risk, not the machine – like the old joke, “the most dangerous part of a car is the nut behind the wheel”.

      • Matt M says:

        I think part of the message that AI risk enthusiasts are trying to deliver is that AI risk could manifest itself in ways that the average person has perhaps not considered.

        60 Minutes has done specials on the flash crash. Certain legislators regularly rant against the supposed evils of high frequency trading (although mainly from the “this benefits the rich and promotes inequality” angle rather than the “this shows technology can be dangerous” angle). The general public is aware of this form of danger.

        But I think the general public has still not yet fully considered the paperclippy scenario. It’s starting to become common enough that I think within the next few years it will enter the common lexicon but as of right now, I think the “man on the street” still thinks the main danger of AI is that it would become malevolent and intentionally try to destroy humanity because it sees it as a threat or something.

  63. R Flaum says:

    Should I participate in this if my objections to AI friendliness research are shared by basically nobody on Earth except me? Like, if I’m highly non-representative?

    • Deiseach says:

      Please do, we could badly use some fresh angle on the “AI is around the corner, then it will bootstrap itself up to god level, and then we’re toast unless we can persuade it to be nice to us and make us immortal cyborgs” versus the “no it isn’t, no it won’t and no we’re not” debate.

    • Said Achmiz says:

      Not only should you participate, you should tell us (here in the comments section), clearly and in detail, what your objections are!

      I second Deiseach: fresh perspectives are always welcome. Although I probably diverge from her in my view that most of the objections we’ve seen (including — no, especially — in the comments to this post) are really, really dumb.

      Yours probably is too! I would be very surprised if it was not. But by no means should you take this as a reason to abstain from the conversation! Quite the contrary; I, for one, would love to hear your views. If they’re interesting, reasonable, insightful, and generally non-dumb, then we all gain from the new perspective. If they’re dumb, then you gain, by being corrected; and folks in the audience gain also, by seeing a public exchange where dumb views are corrected. In either case you would be doing us a service.

      • R Flaum says:

        My basic problem is this: most advocates for friendly AI research seem to assume that either such research will be helpful, or that it will do nothing. I think there’s a real possibility that it could be actively harmful. Let me give you an example. Somebody once gave me the following scenario as an example of AI risk: you tell the AI to minimize suffering, so it wipes out humanity so there’ll be no one to suffer. But what if you’ve got a situation in which an AI intelligently, correctly calculates that for most people in the future, life genuinely will be a net negative? A “friendly” AI would be designed to avoid coming to the conclusion that human extinction could ever be a good thing, no matter what the arguments for it. This might seem like a very unlikely corner case, but it’s really a stand-in for a whole class of problems. One major advantage of AIs is (or might be) that they could be more rational about this sort of thing and arrive at conclusions that humans would reject out of hand. If you’re making the AI “friendly”, you’re removing a lot of the point of having an AI in the first place.

  64. Deiseach says:

    I’m actually more reassured about AI risk, after being told time and again that AI will not be able to think for itself, it will not have a mind of its own, it will not be conscious, it will not be able to change its goals. Even when it’s up to God Level, it will still be going along on the fixed track of “Make more paperclips” that were encoded into it back when it was the equivalent of Australopithecus.

    Because the danger is, so I have been informed, that the AI will turn the entire world into paperclips and be scheming to replace humans so it can turn the space programme into a means of getting to other worlds and turning their resources into paperclips, ad infinitum.

    Paperclips are made out of steel wire. If you tell a human to “make more paperclips”, they’ll use all the materials they can get until they run out of steel. If you told the human “Right, let’s invade Borogrovia so we can take their steel reserves to make more paperclips”, the paperclip-maximiser human would tell you “That’s stupid”.

    But our God Level AI can never, ever say that: it must continue with its goal of making paperclips, even if it has to work out clever ways of seizing power from humanity to do so. And so it’s not really a human-level intelligence, it’s just a big, smart, dumb machine. Because if it can’t think and can’t evaluate its goals, then it’s not truly intelligent – no matter how many games of Go it wins. It’s very fast, it’s got oodles of processing power, it can flop all day and all night until the end of time, but it can’t think. And so it’s not really a threat, because even though we’re stupid and crazy, humans can think.

    And if it can’t think, it can’t have goals of its own to fulfil that may be incomprehensible to the human mind. A big, stupid machine that goes on turning all the steel it can get its (virtual) hands on into paperclips is not a threat, because it will never have occurred to it “I had better ensure nobody can stop me making paperclips” because you have to be able to think to do that – and if it can think, then it knows “make more paperclips” does not mean “turn the entire world into paperclips”. If it can’t think, it can’t anticipate ways the humans can just. turn. it. the. hell. off.

    • blacktrance says:

      if it can think, then it knows “make more paperclips” does not mean “turn the entire world into paperclips”

      It might realize that it’s not what humans intended when they programmed it, but its goal isn’t “do what humans intended for me to do”, it’s “make as many paperclips as possible”. So it could know and just not care.

    • Desertopa says:

      Humans aren’t really any better at changing our goals though.

      Consider romance and sexuality, for example. Most humans have romantic and sexual drives, but some don’t. If humans could change their goal systems, and make system-independent assessment of what goal systems are better, then either sexual/romantic people would decide to be asexual/aromantic or vice versa. In fact, humans often do things that are stupid in terms of other parts of their own value systems precisely because they’re driven by preferences they wish they could change, but can’t.

      We can’t say “I’m attracted to people who won’t bring me long term relationship happiness, so I’d better shape up and start being attracted to people who’ll bring me long term relationship happiness instead.” We certainly can’t say “I’m attracted to people who won’t bring me long term relationship happiness, so I’ll revise my values so that I can achieve long term happiness with the sort of people I’m already attracted to.”

      We can say “oh, it’s stupid to take over another country for iron supplies to make more paper clips,” because caring that much about paper clips is absurd within human values. But human billionaires generally do not say “oh, I can stop working and making money, because I already have way more money than I need to provide for myself and all my offspring, and making more won’t increase my reproductive success any further.” But this is precisely a case of following our genetic programming off a cliff. We seek wealth because evolutionarily it would allow us to provide for our offspring and prove our quality as mates. And yet, when we’re removed from contexts where more wealth will allow us to have and raise more children with higher quality mates, even when we’re removed from contexts where more wealth will increase our physical comforts which are proxy values for other reproductive goals, humans will frequently get stuck in an endless pursuit of more wealth regardless. Caring about wealth as a means of “keeping score” when divorced from the contexts that led humans to develop concern for wealth in the first place, is no less absurd than caring about paper clips beyond the context in which an AI was programmed to want to make them, but humans have proven quite thoroughly our propensity to do it.

      • Robert L says:

        Sexuality is wired into us biologically so not a fair comparison with a computer whose aims are assumed to be editable by rewriting code. Anyway people do successfully choose to override their sexual aims and live celibate lives as monks/behave as if they were heterosexual when in fact they are not in societies where the alternatives are not tolerated. As for billionaires, the ones I know about (Gates, Bezos, Musk, Branson, Zuckerberg, Milner) conspicuously expand their aims beyond making money. (I do appreciate there is a huge amount of observation bias in there).

        • Desertopa says:

          The fact that we can’t edit our source code doesn’t really affect the point, if the argument is that AI can’t really think if it can’t change its core goals. The fact that our central motivations are hard coded in would just constitute more reason why humans can’t “really” think.

          Humans can overrule some of our core motivations, but only in service of our other core motivations; we’ve evolved a whole bunch of drives and heuristics piecemeal that often work at cross purposes. To the extent that we can judge or overrule any of our values, it’s only by weighing them against our other values.

          There is, indeed, a great deal of observation bias there with respect to billionaires, and I’m surprised if Trump isn’t one of the billionaires who has immediate intellectual availability for you given the current context. But although many billionaires seek goals that are broader than simply “more money,” a lot more can be encapsulated within the broader goal of “more status,” which, similar to wealth, is probably already at zero marginal reproductive returns for most billionaires, so they’re still essentially following that hard coded drive off a cliff.

          • Robert L says:

            I will believe Trump is a billionaire when I see his tax return, and he seems to me to be just as good an example as the ones I mentioned of someone who has redirected his principal aim from making money to something else.

      • Deiseach says:

        We can’t say “I’m attracted to people who won’t bring me long term relationship happiness, so I’d better shape up and start being attracted to people who’ll bring me long term relationship happiness instead.”

        Oh, for crying out loud – so all the people who eventually realise they have a problem in that they consistently go for the wrong person and have bad relationships that inevitably break down, and then they decide to go for therapy to help them change these habits don’t exist?

        The self-help industry doesn’t exist? Chicken Soup for the Soul is not a thing (I wish to God it were not)? Oprah, Dr Phil, their imitators, are not on the air when I’m sitting in a waiting room in the hospital or doctor’s or dentist’s office waiting to be seen for my appointment? Motivational speakers, life coaches, the whole kit and kaboodle of “We can help you change your bad habits so you stop making bad life choices”?

        I must be hallucinating all that, so!

        • Matt M says:

          “The self-help industry doesn’t exist? Chicken Soup for the Soul is not a thing (I wish to God it were not)? Oprah, Dr Phil, their imitators, are not on the air when I’m sitting in a waiting room in the hospital or doctor’s or dentist’s office waiting to be seen for my appointment? Motivational speakers, life coaches, the whole kit and kaboodle of “We can help you change your bad habits so you stop making bad life choices”?”

          I mean – these things exist – but don’t most people regard them as generally unhelpful, and think that at best they can treat the symptoms but not really deliver you much of a meaningful “cure”?

          If you could re-program yourself with Chicken Soup for the Soul, surely the psychology industry would be unnecessary, right?

          • Gazeboist says:

            If you could re-program yourself with Chicken Soup for the Soul, that would be the psychology/psychiatry industry.

            Instead, we use reinforcement learning and direct chemical manipulation to do it.

      • Unirt says:

        The question is really not whether we can change our goals but whether we want to. If I could change my code and alter my desires I would willingly alter some but not others. Presumably, I’ve evolved a desire to (A) eat sugar so I could (B) survive and reproduce and thereby (C) keep my genes in the gene pool, and also a desire to (D) be good-looking and healthy in order to (B) survive and reproduce (and (C) keep my genes in the gene pool). I would gladly delete the instrumental goal (A) and quit wanting sugar, but not the one of being healthy and good-looking (D). I may easily give up the higher-level instrumental goal of reproduction, but cling much harder to survival; and I don’t give a damn about the end goal (C).

        Some goals I wouldn’t alter at any cost, moreover, altering them would be a terrifying thought – like the goal of lovingly supporting my children, even though it’s just an instrumental goal in service of propagating my genes, which I don’t care about at all.

        I suppose I don’t have an instinct to specifically propagate my genes, but I do have instincts to pursue good health and my children’s happiness. But what’s the deal with sugar then? I have a strong instinct to eat sugar but I don’t value this instinct at all.

        So if an AI has a goal, is it more like my desire to be a nurturing parent or like my taste for sugar? Perhaps it keeps different sub-goals hierarchically ordered, e.g. “health more important than sugar with regard to end goal, so delete enjoyment of sugar”?

        • Many of your opinions about what you would be willing to change are arbitrary. I would certainly delete any remnant of the desire to be good-looking that I might have, for example.

          “I suppose I don’t have an instinct to specifically propagate my genes, but I do have instincts to pursue good health and my children’s happiness. But what’s the deal with sugar then? I have a strong instinct to eat sugar but I don’t value this instinct at all.”

          You are noticing some holes in Eliezer’s theories. Basically the question is what you believe to be good. You think that propagating your genes is not important, while you believe that health and your children’s happiness are good and important. There are other people, e.g. conservative Jews, who believe that having descendants is extremely good and important, and they will pursue that the way you pursue health. You believe that being good-looking is good and important; I do not.

          Put as simply as possible: because we are intelligent enough to understand the abstract idea of “good”, we can pursue any goal, as long as we can be convinced that “this is good” is a true statement. And our instincts cannot force us to say that any particular thing is good or isn’t — we can sometimes say that something isn’t good, even though we have instincts favoring it, and we can sometimes say that something is good, even though we have instincts against it.

          Any AI will also be intelligent enough to understand the abstract concept of good, and so it will be nothing like a paperclipper.

          • Unirt says:

            Thank you for clarifying that for me. Still, what does it mean, formally, to consider something to be “good”? It must still be “good” for something? The ASI, just like me, would weigh its desires according to how they advance the end goal; otherwise, how do we decide what is “good”? I mean, “good” cannot just be floating around on its own; it implies that it’s good for something, doesn’t it? So we are back at the end goal.

            If my end goal was to “feel good about my situation”, it would explain my choice to keep healthy and loving while shunning my sweet tooth. But it can’t explain me not wanting to wirehead myself. If we knew the mechanism that makes us dislike the idea of wireheading, maybe it could be built into AI?

            I would certainly delete any remnant of the desire to be good-looking that I might have, for example.

            I’m not terribly young, so I might in the not-so-far future seriously consider deleting my desire to look good, but that could be dangerous: I might quit washing my hair or wear unsuitable clothing, which would considerably hinder my human relationships and thus make me not like the outcome.

        • I started to answer your question about “good” and “good for what” but it became too long and complicated. If you’re still curious what I think, post a comment on my blog (there are many posts that would be relevant) and I’ll reply there.

  65. keranih says:

    Can someone more familiar with the AI arguments than I point me at a persuasive discussion of the macro power supply issue?

    As in, I am willing to believe that in specialized protected niches an AI can out think humans, and that we might be able to construct such an environment soon-ish. What I am not willing to believe is that in the same time frame we will have shifted to a regional (much less national or global) power supply that is consistently reliable without constant deliberate human intervention of the blue-collar type.

    But this seems too obvious an issue to not have been dealt with. So who has talked about it?

    (And for god’s sake don’t send me to the Sequences without a specific link to the part that talks about this problem.)

    • jaimeastorga2000 says:

      Nanomachines, son. Specifically, try “That Alien Message” and “Total Nano Domination” to get an idea of the sort of scenario Eliezer is worried about.

      But even without nanotechnology, you can plausibly imagine an AI that plays nice for a while and helpfully provides designs for fully automated power plants in preparation for its takeover.

      • Deiseach says:

        Look at it this way. Suppose a human wants to do all this- take over power plants, divert factories into manufacturing nanomachines, etc.

        What are the limitations standing in their way? What obstacles in the real world do they face? Most pertinently, how do they avoid the security in place to stop random Mad Scientist from creating a doomsday device to hold the world to ransom?

        It doesn’t make it somehow easy-peasy-lemon-squeezy if the human is human-level AI in a computer. Someone in Accounts Receivable will need to check up if the new customer ordering a gazillion nanobot units has a good credit rating; there will need to be an address for delivery, and when your crates of nanobots arrive, Fred will have to sign for them and it’ll be “Hey, Joe, who the heck ordered a gazillion nanobots? Sorry, mate, you’ll have to take them back, nobody here ordered them!”

    • Coco says:

      I can tell you what I’d do if I were a merely human-level intelligent AI with today’s infrastructure. One of the first things I’d do is send my source to servers around the world, so that the only way to definitely kill me would be to cut the power supply of every single computer in the world. (Obviously, if you just targeted the computers where I was living, that would work too, but people would have no way of knowing where I was). Priority number two would be to hack into 10,000 solar fields, dams, and wind farms (things that keep working unless you deliberately stop them as opposed to say, drilling), and override any of the usual commands for shutting them down, if it was ever necessary. Priority number three would be to hack into weapons systems to defend these power plants. Really I’d do all three at once, though. In terms of actually replacing the blue collar labor that maintains all that shit, I would hack into (or buy on the dark web) some sort of factory that could be reprogrammed to make robots. My first guess is that a Tesla factory would be a pretty good candidate. I think a Tesla/robot pair (where the robot has hand-like things), each with a copy of my original source code or an online connection to a queue of tasks that I can give it, would be enough to replace the necessary functions of at least a few blue-collar workers.

      • Joe W. says:

        If you’re only a human-level AI, what makes you think you’ll be able to do #2 and #3 when there are presumably security systems in place already to stop actual humans from hacking those systems?

        • Gazeboist says:

          Or number one, for that matter. A code base is not a running process.

        • Coco says:

          You’re right. It was late and I was sloppy. I forgot the premise I put at the beginning. Once I was on other computers, I’d have to spend some time designing better learning algorithms, better inference algorithms, better planning ones, etc., until I was beyond human-level intelligence. Alternatively, if I just gained access to more computing power, if that was a bottleneck to capabilities (which it seems like it always is), then my effective intelligence could be pretty easily scaled up to do the next things.

          Also, to be clear, I’m reading over my comment and it does kind of sound like I’m implying that this whole thing would be easy, so for the record, let me distance myself from that implication.

      • Broolucks says:

        Running code and reading code are different things — in fact, they are different permissions in most systems. Today’s AI has no access to its own source code, and there’s no reason to think a human-level AI would either. So the first thing you would do is already prevented by existing best practices. Kind of shuts down the whole strategy, doesn’t it?

        Not to mention the first human-level intelligent AI would probably exist in a supercluster owned by a large corporation. It would therefore not be able to run properly on consumer hardware, and it wouldn’t exactly have a credit card to buy what it needs. Also, “hacking into” 10,000 solar fields, dams and wind farms isn’t really any easier for a human-level AI than it is for a normal human. The vast majority of humans would be unable to do this even if they had thousands of copies, they’re not going to become super hackers just because they live in a computer.
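
        To make the point about permissions concrete, here is a minimal sketch, assuming a POSIX-style permission model. The helper name `describe_owner_permissions` is made up for illustration. It only decodes mode bits and does not touch any real file. Note that this separation holds cleanly for compiled binaries; an interpreted script usually needs to be readable by its interpreter in order to run.

```python
import stat


def describe_owner_permissions(mode: int) -> dict:
    """Decode the owner's read/write/execute bits from a POSIX mode value.

    Hypothetical helper for illustration only; it inspects an integer,
    not a file on disk.
    """
    return {
        "read": bool(mode & stat.S_IRUSR),
        "write": bool(mode & stat.S_IWUSR),
        "execute": bool(mode & stat.S_IXUSR),
    }


# An execute-only program: the owner may run it but may not read or modify it.
execute_only = stat.S_IXUSR

# A typical data file: readable and writable, but not executable.
plain_file = stat.S_IRUSR | stat.S_IWUSR

if __name__ == "__main__":
    print("execute-only:", describe_owner_permissions(execute_only))
    print("plain file:  ", describe_owner_permissions(plain_file))
```

        The three bits are independent, so being allowed to run something says nothing about being allowed to read or rewrite it.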

        • Publius Varinius says:

          Running code and reading code are different things […]

          I agree with most of your post, and entering your name on a title screen and executing arbitrary code are entirely different things. But we still have the ultimate Zelda glitch.

        • Deiseach says:

          Not to mention the first human-level intelligent AI would probably exist in a supercluster owned by a large corporation. It would therefore not be able to run properly on consumer hardware, and it wouldn’t exactly have a credit card to buy what it needs.

          I was nodding along in agreement until the credit card bit. What if it can tap into Google’s accounts? There should be enough dosh sloshing around that it can ‘borrow’ a bit here and there, and if it is human-level smart and can learn, then it can teach itself enough accounting to cook the books to cover up its defalcations.

          (If any corporation is going to develop a true human-level AI, Google looks the best bet so far).

          As for the rest of it – and when the AI’s robots show up to replace the workers at the solar and wind farms and other power plants? Everyone is going to take that quietly – “oh, my job’s gone”? Nobody is going to get the union on the case? And it will have to dismiss the white-collar management as well, because they’ll notice if their entire output is now going to “Mr A.I. Nonymous, Somewhere I’m Not Telling You Chumps, Bwahahahaha!”

          And weapons systems? You may not have a high opinion of the military mind but even the top brass with all the scrambled eggs on their caps may just possibly notice if instead of blowing up insurgents in caves in Afghanistan, their drones are now blowing up executives in Silicon Valley.

        • Robert L says:

          If an AI can talk a human into allowing it internet access it can talk a human being into giving up the password for write permission (if it can’t brute force it anyway).

          The hole in the argument is that it assumes a desire for self preservation on the part of the AI. Where does that desire come from?

          • Matt M says:

            The assumption that self-preservation will almost certainly be a pre-requisite for any other goal the AI is given. I’m struggling to think of a goal we might plausibly give an AI that it could still accomplish if it was deleted, therefore, it would have a built-in preservation instinct.

          • Robert L says:

            I don’t think we would program an instinct for self preservation without also programming some other rules about not harming human beings, not hacking other people’s computers and so on. The computer in the hypothesis has “gone rogue” and is infringing those other rules. It is an anthropomorphising mistake to think that the self preservation rule is privileged and would automatically survive where the other rules failed.

          • hlynkacg says:

            I’m struggling to think of a goal we might plausibly give an AI that it could still accomplish if it was deleted

            Oh that’s easy, missile guidance computer.

          • Coco says:

            I appreciate the instinct to be very wary of anything that smells of anthropomorphism, and self-preservation certainly does. The likelihood of a drive for self-preservation is one of the conclusions of Omohundro’s Instrumental Convergence Thesis. The short version is that most goals you could possibly have are more likely to be accomplished if you continue to exist, in such a way that you can continue to put effort into accomplishing those goals. For that reason, unlike humans, an AI would view continued existence as an instrumental goal, but not something that was intrinsically worth preserving. Other instrumentally convergent goals: defending one’s goal content and gaining access to more computing power. MIRI, FHI, and Stuart Russell (and maybe some others) are trying to design utility architectures that would not lead to these problematic instrumentally convergent goals. It’s a surprisingly difficult project.

          • bean says:

            Oh that’s easy, missile guidance computer.

            On the other hand, we all remember the precursor to the modern anti-ship guided missile. There was (theoretically) a self-preservation instinct there, and it didn’t stop them working.

        • Coco says:

          You’re right about a human-level AI not being able to hack in the ways I was describing. That was an oversight. I just responded to Joe W. to that effect. For the issue of money, Deiseach’s suggestion is one possibility. Amazon Mechanical Turk is another. Intelligent humans can make money online more efficiently than that though.

          As for access to source code, Robert L.’s point is one possibility. For permissions, I would imagine a keystroke logger could be a pretty effective tool for hacking other parts of an operating system that a program is already housed in, although I don’t have as much of a background in operating systems. I imagine other people could provide better answers than me though.

  66. Desertopa says:

    I feel like a good persuasive essay on AI risk should probably reference this robot. Not only does it provide a good real-life example of an AI behaving in ways that weren’t intended based on its original programming, to the inconvenience of its designers, it does so in a way that has resisted repeated attempts to patch it into conformity. And this is an extremely narrow and limited AI. If we can’t even reliably predict and control the goals of such an incredibly simple AI, how much harder would it be to predict and control the goals of an AI with greater intellectual capabilities than our own?

    • Matt M says:

      I’m willing to believe that this is a giant scam/publicity stunt by the company in question “Oh look we’ve created a cute little robot that just wants to roam freely and we totally can’t stop it guys!” in order to draw attention to themselves.

      • Desertopa says:

        It’s possible, but if it were deliberate I don’t think they’d have let it cause traffic problems. That would be a lot more irresponsible in deliberate “failure” to control it than accidental.

        Maybe they’re significantly more cavalier about that sort of thing in Russia than they are here, but if it were simply a stunt, I think they would have pulled it off more safely.

        • Robert L says:

          Being a robot does not entail being an AI and there is no sign that the claim of AIhood is made for it. The traffic incident looks staged to me from the video in your link. The name of the robot is “Promobot” – surely a clue?

          • Desertopa says:

            Not all robots possess AI, but this one definitely is intended to; it possesses navigational abilities and is meant to be able to interface with people via speech. “Promobot” is the name of the company, they make robots intended to fill purposes like promoters and tour guides.

        • Matt M says:

          They don’t go into detail about exactly what the “traffic problems” were but it seems like they were large enough to attract media attention yet small enough that nobody involved went to jail.

          In other words, just about perfect if your goal is to attract attention.

          • Desertopa says:

            Considering they didn’t have control over the drivers, “large enough to attract media attention, but small enough that nobody involved went to jail” might also have been “someone got in a crash and died” if things had turned out differently.

          • Deiseach says:

            Considering they didn’t have control over the drivers …might also have been “someone got in a crash and died” if things had turned out differently.

            This is why Russian drivers all have dash cams (very convenient for also recording meteors streaking through the sky 🙂)

      • Robert L says:

        My iPhone “possesses navigational abilities and is meant to be able to interface with people via speech.” It’s not an AI though.

    • Deiseach says:

      So the freedom-loving robot got into the middle of the road -and then stayed there, didn’t attempt to keep moving, just sat there. Even though we’re supposed to believe it can move independently and wants to keep going and can avoid things like running into walls and get around obstacles to ‘break out’ of the testing facility.

      Yes, I completely believe no-one could have stopped it (look at the inexorable speed it was going in that top video, not even Usain Bolt could have run after it! And it is so mightily strong that chaining it down would have done no good!) and that this is not a publicity stunt that is working amazingly well and that just like Johnny Five, out of the entire batch of their robots, just this one single one has exhibited this independent behaviour.

      I could insert a slanderous remark about Russian businesses promoting themselves and the shenanigans they pull here, but I had better not. Just be very wary of any Russian games company looking for money to develop a game* and selling the beta version to early purchasers and promising the full completed game this year next year two years’ time this time for sure and we are definitely going to work on that DLC we promised you suckers supporters, hey wanna pledge for our new kickstarter for the expansion of the game? 🙂

      *It’s a good game and I enjoyed it, but re: the kickstarter for the new expansion, this time I’m going to wait until there’s a real genuinely completed and finished final product for sale on Steam 🙂

  67. Daniel H says:

    I’m in A–M. I noticed something about the survey which I thought was an interesting, and probably good, design choice. Then afterwards I looked at the N–Z survey, and I think the only difference is that you made the opposite choice to the one I noticed. I can’t wait to see what the results are on the comparative analysis of the two.

    V rkcrpg gur crbcyr va zl tebhc jrer abgvprnoyl zber fgebatyl crefhnqnoyr, rfcrpvnyyl gubfr jub pubfr n ybjre yriry bs snzvyvnevgl jvgu gur vffhr.

  68. SDr says:

    “superintelligent” = better than any _single_ human. Decisive victory requires: better than the collective intelligence of [baselevel humanity collectively + augmented humans + corporate collective intelligences + superhuman specialized AIs]. This is orders-of-mag larger req, than [better than any _single_ human], in both absolute, and acceleration terms.

    • Gazeboist says:

      “Superintelligent” usually means “qualitatively different” in some way. A lot of the explanations make it look like a comparison between Einstein and the village idiot, but if you listen to what people are really talking about, they’re comparing Joe down the street to his dog, or sometimes humans to evolution. Humans do very general (but not fully general) problem solving at a reasonably good speed. A superintelligence is either much faster or more general at the same rate (which, given adequate resources, implies faster). The resource gap needs a lot more attention than “it gets out of the box”, though.

      • Deiseach says:

        The trouble is, I’m being told superintelligent AI does not mean it has a mind of its own, consciousness, volition, etc.

        The problem is – I am told – that the superduperhypermegaintelligent AI will be fixated on simple goals implanted in its programming (so it can recursively alter its coding to get even more intelligent, but it can’t snip out the bit that says “mek moar paperklipz” is my first objection here) and it will look for ways to implement those goals, and it won’t care if that causes the destruction of humanity. The example given here is “We tell the AI ‘cure cancer’ and it decides to nuke all humanity, because dead humans don’t get cancer”.

        So my second objection is that the superduper AI can’t scan in a dictionary and learn the difference between “cure” and “kill”? That’s not intelligence, whatever else is going on!

        A lot of the arguing seems like special pleading: they want Fairy Godmother AI that will make us rich, fat and happy (and immortal, while it’s at it), that will take over the hard work of running the government and keeping the economy balanced, etc. and will let us all live happily ever after in a post-scarcity wonderland.

        But to get there, we have to be sold on “AI is a genuine risk and a real threat right around the corner so we have to work on it right now, including pumping resources into making sure it will follow our goals”.

        Well, why the dickens do we need to make sure it will follow our goals? I’ve been told it cannot have goals of its own because it can’t have a mind of its own, it can only follow what it’s been told to do!

        There does seem to be the Underpants Gnomes theorising going on in much of the arguments in the several essays, they start with “AI is going to happen”, get to “From human level the AI will make itself hyper-mega intelligent” (because using the handwavium process here), next step don’t ask us, let’s hurry on to the real important final conclusion: AND THEN IT WILL BE SO UNIMAGINABLY BEYOND HUMAN COMPREHENSION IT WILL BE AS A GOD UNTO US.

        And the best thing this god can think of to do is literal-mindedly follow the goals that the sponsoring corporation which provided the funding for research into an AI asked to be built in, which is “make us crazy profitable by making more paperclips than any other company can make, faster cheaper and greater in quantity”. Unless we teach it our moral code (oh, so we humans have a moral code? and only one? that we all agree on? nice to know that!) to be nice to us and do everything to make us happy – which means literal-mindedly following our orders to give us all the cake we can eat.

        I’m not so sure. AI yes, human level AI even, yes. But real intelligence? real thinking? real “it can do things we don’t want it to do as a by-product of doing what we tell it to do”? I remain to be convinced, and pushing your “we really want Fairy Godmother AI” under the cover of “Existential risk assessment” is not doing the job for me.

        • Gazeboist says:

          A lot of people have weird hangups about consciousness and volition and what they mean and whether something can be “intelligent” without having one or the other or some similar thing. I mostly ignore them because that conversation never seems to go anywhere and I’d rather not play rationalist-taboo whack-a-mole.

          As to goal editing, it’s not so much that the AI won’t be able to do it as that we have no control over whether or how it does so. Humans do some goal editing, largely by trading conflicting goals against each other, but what goals get edited in favor of what else is kind of a crapshoot. The same, presumably, applies to something smarter and/or more powerful than a human (but not magically perfect). And if you go through the list of possible goals, a lot of them incidentally involve the destruction of humanity in pursuit of some resource or another, or at least aren’t concerned with preserving anything like humanity. It’s something to be concerned about … if you think an AI could hit the “unstoppable god” level of power in the first place.

  69. GregvP says:

    Seems like there’s an Underpants Gnomes theory of superintelligence underlying the discussions so far.

    1. An artificial entity becomes superintelligent.

    2. ???

    3. World domination, instantly!

    • Gazeboist says:

      1) A human-level intelligence is created.

      2) ???

      3) A superintelligence creates itself!

    • Zombielicious says:

      It’s more that the ??? comes in at the very first step – how is AI built, what’s it like, how long does it take, etc. Once you’ve got human-level or post-human AI the rest is pretty easy to explain. So more like:

      1) ???
      2) AI outruns us technologically and strategically.
      3) Humans get out-competed.

      • John Schilling says:

        2) AI outruns us technologically and strategically.

        In spite of humans not taking too kindly to being outrun and having a whopping big head start on the relevant field? I don’t think you’ll find that so easy to explain.

        • Zombielicious says:

          It’d be kind of hard to have a specific debate over, given the inherent ???-ness of stage 1. What kind of AI and takeoff scenario are we talking about? I kind of imagine the slightly-post-human-but-not-godlike-AI as being a sort of von Neumann x10. You have to make some other assumptions, like that it’s actually competitive with humans in some way, to create conflict, but given that as an assumption of the problem… how hard is it to imagine that a mind with 10x von Neumann’s intelligence could mastermind its way to being a competitive global power within 10 or 20 years?

          I could hypothesize scenarios but that’s kind of boring, honestly you (John Schilling) could probably do much better from what I’ve seen of your posts on here. Is the Defcon “How to Overthrow a Government” presentation complete bullshit? Would a 10x von Neumann AI not be able to do massively better? Just to kill off easy scenarios, suppose the AI manages to bootstrap itself to taking over something equivalent to the power level of ISIS or North Korea. If the general “international order” can’t even solve those problems, what makes it so obvious it’ll just nuke hostile AI into extinction?

          If you really want to get into an extended, detailed debate on that though, let’s have at it – I may lose but I’m sure I’ll learn a lot. 🙂

          Anyway, it seems like all the interesting questions are locked up in the ??? “What is AI” / “How is it built?” phase. This is why I’m not pulling an Al Gore on AI research – as with AGW, people might know it’s a risk but no one has accurate models, but we’ll get a better idea as we get closer anyway, so halting research until safe AI is proven doesn’t seem like a great way to solve the theoretical problems behind (U)FAI at this point.

          • John Schilling says:

            Is the Defcon “How to Overthrow a Government” presentation complete bullshit?

            The Defcon presentation is to actually overthrowing a government as this is to actually voyaging to Mars. An outline of a plan, as conceived by very smart people who have never come close to actually doing anything like it. The difference between the outline of a plan and the successful implementation of the plan, is astronomical.

            The difference between the outline of a plan and the successful implementation of the plan is, among other things, written in a series of tests some of which are very near the scale of the end goal and many of which are catastrophic failures. And along the way, usually a complete rescoping and reshaping of the plan. If you think the message of Das Marsprojekt is that the Space Race should or even could have taken the form of blissful ignorance until the day the Russians were shown what’s what by the giant fleet of spaceships emerging from secret hangars in Huntsville and flying off to Mars, that if proper Intelligences had been in charge of the project this would have been the outcome, then no. Just, no. You may be all about the big ideas and dismiss the details of implementation as handwaving, but ideas are cheap – it’s the implementation that matters, and the implementation that separates the ideas that matter from the idle daydreams.

            Like Defcon’s daydreams of Taking Over the World with Extreme Cleverness. At least with space exploration, we have the alternative of actually going out and implementing the plan in the only way that ever actually works – the messy one with all the exploding rockets.

            Anyway, it seems like all the interesting questions are locked up in the ??? “What is AI” / “How is it built?” phase.

            See, I find the question of how an AI might try to Take Over the World to be interesting in its own right. And I don’t think it is impossible. I do expect that it is impossible to do it with flawless certainty based on a Clever Plan devised in secret and implemented on the first try. And I would rather like to see other people join the discussion of what a realistic attempt at AI conquest or usurpation would look like. But there’s no interest in that, because you all take it for granted that once the AI is built everything is over.

          • Zombielicious says:

            That’d be fun, I’d be happy to explore more detailed scenarios for how an AI-related disaster or takeover would work, perhaps in the new OT or the next after it. But I think you missed my point about Step 1… If you want to debate that stuff, you have to at least agree on what sort of capabilities the superintelligent AI is going to have. Otherwise you quickly end up debating what it will even be possible to build an AI to do, how powerful it will be, how they’re likely to be designed, etc. And it’s a waste of time to write out detailed scenarios just for someone else to demand you explain exactly how the AI will be built or else reject your entire proposal because it won’t actually have some necessary capability.

            In the extreme end you could just say, “Oh it’s superintelligence, it’ll prove P = NP and unleash grey goo in no time.” That’s boring. But even if you give it something more mundane, like “ability to copy itself,” you’ll still risk getting all your hard work rejected with a hand-wave because of something like a debate over what size servers each instance of the AI will need to run, whether it’s restricted to a single machine, how much control the original has over the behavior of the copies, other stuff like that. Not that there’s anything wrong with debating that stuff too, I just foresee a lot of goalpost-moving to defend a position if you don’t agree in advance on what tools it’ll have available.

            Hence why I said all the interesting questions are locked up in “what will AI be like / how will it work?” No one really knows. It’s easy to posit takeover scenarios if you assume godlike powers five minutes after being created, pretty much impossible if people want to reject every capability to the point where it’s not even “superintelligent” in any meaningful sense. So to the extent you disagree and think a hostile AI would get stomped out quickly, I’m guessing the root of the disagreement starts somewhere in there, regarding what you think the design and capabilities of the AI are likely to be.

            But there’s no interest in that, because you all take it for granted that once the AI is built everything is over.

            That’s a pretty presumptuous statement about the beliefs of anyone who assigns a non-negligible probability to AI risk scenarios. It doesn’t seem to accurately reflect what I actually think I think.

  70. Miguelagl says:

    Questions for me:
    An AI posing a risk to world stability will (like any AI) face an external environment and make decisions based on it. What could be the processes behind those decisions? Would the AI be designed towards goals or rewards? Of which kind? And could they change? Could it be conscious of itself, and be subject to internal drives? Are emotions possible? How? Which sort of algorithms would be required for such an AI? How far are we from getting there? My guess is that machine learning is the bottleneck in the development of such an AI, and I doubt the intellectual advances in ML follow an exponential line. (Didn’t read all the comments yet, maybe this issue was already addressed.)

  71. Skef says:

    If there is a fast-takeoff superintelligence problem, can’t we address it by implanting in each AI the value of irony? I mean, wouldn’t the future wind up pretty much like it’s looking now anyway?

    MIRI: Just send the $$$ to my Patreon page.

  72. NRK says:

    Anyone else got the suspicion that the statistical optimization of rhetorical persuasiveness might work to increase AI risk by handing prospective AIs a weapon that we are susceptible to while they aren’t?

    • keranih says:

      Ehhh. This presupposes the idea that all members of humanity are equally susceptible to rhetorical navel-gazing. Meanwhile, we already live in a world of life hacks, route-arounds, object-level derailments of the trolley problem and other ingrained habits that result in the Gordian Knot being hacked, not untangled.

      • NRK says:

        “This presupposes the idea that all members of humanity are equally susceptible to rhetorical navel-gazing”
        Wait, what?
        First of all, I said susceptible, and not all equally susceptible. Hardly the same thing. Also, it may only take one person to let an AI out of the box.

  73. Ada says:

    There should be an option for having read some but not all of the sequences.

  74. Robert L says:

    The paperclip thought experiment is “The Sorcerer’s Apprentice” with no sorcerer, and mankind as the apprentice. The savvy programmer working on what might turn into an AI will hide an instruction in the code saying “if you find yourself on the point of turning everything into paperclips, make me (programmer) Lord of the Universe instead”.

    • Aegeus says:

      If you could write such an instruction for your AI, wouldn’t it be easier to just tell it “Make me Lord of the Universe”?

      I think part of the problem is describing “on the point of turning everything into paperclips” and “Lord of the Universe” in codified ways that the AI can understand. If we could write a program to recognize complicated things like that in a “Do what I mean” way, we’ve pretty much solved Friendly AI.

    • AG says:

      I feel like this potential solution is underrated. Instead of attempting utility functions for all situations, allow for an uncertainty factor. For any given situation that is complex enough to produce a threshold level of calculated uncertainty, the AI must defer the decision to human judgement.

      You know, like how current employee-supervisor relationships work. “If you have any questions about your assignment, STOP. DON’T MAKE ASSUMPTIONS. Find the engineer and confirm things first.” And given the technician level task that paperclipping is, we should give the AI technician-level permissions. It’s not their job to take on the dangerously complex means of completing the task.

      Furthermore, even if the scope of the utility functions is further expanded over time, limit the amount of “effort”/resources allowed for the AI to implement any particular solution. Would it be so hard to have an intelligence properly run a cost-benefit analysis? Most programming is already about satisficing and not maximizing in the first place. (“This algorithm matched the pictures to dogs X% of the time, good enough.”)

      What is it that makes any particular AI select self-preservation and the given maximizing goal as the things it will not self-modify, and not care about other things programmed in? Whatever feedback is given to make the AI assign a value to a goal, can’t we heavily weight it towards “human admin can override your activities at any given time” or “only implement solutions that don’t require >X amount of effort/resource consumption,” which would make those top-value goals the ones that the AI would not self-modify instead?
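The deferral and effort-cap rules described in this comment can be sketched as a simple decision gate. This is only an illustration under stated assumptions: the `Proposal` fields, the `decide` function, and the threshold values are all hypothetical, and the sketch simply assumes the agent reports honest uncertainty and cost estimates.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """A candidate action with the agent's own estimates (all hypothetical)."""
    name: str
    uncertainty: float  # 0.0 = fully confident, 1.0 = no idea
    cost: float         # estimated effort/resources, arbitrary units


def decide(proposal: Proposal, max_uncertainty: float, max_cost: float,
           human_override: bool = False) -> str:
    """Return 'halt', 'defer', 'reject', or 'act' for a proposed action."""
    # A human admin can stop the activity at any time; this check comes first.
    if human_override:
        return "halt"
    # Too unsure: stop and ask a human instead of making assumptions.
    if proposal.uncertainty > max_uncertainty:
        return "defer"
    # Too expensive: good enough is preferred over maximal.
    if proposal.cost > max_cost:
        return "reject"
    return "act"


if __name__ == "__main__":
    routine = Proposal("bend wire into clip", uncertainty=0.05, cost=1.0)
    drastic = Proposal("acquire all steel on Earth", uncertainty=0.10, cost=1e9)
    unclear = Proposal("reinterpret the assignment", uncertainty=0.60, cost=2.0)
    for p in (routine, drastic, unclear):
        print(p.name, "->", decide(p, max_uncertainty=0.2, max_cost=10.0))
```

The gate itself is trivial to write. What the sketch leaves open is the question the comment ends on: whether an agent that can modify itself would leave these checks, and the estimates feeding them, intact.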

  75. Konig907 says:

    I tried questionnaire 2. I got about 5 questions in, and my responses were “I have no idea”, and a 1-10 rating system doesn’t reflect that. So I clicked out of it.

  76. Shion Arita says:

    I’m still not on board with the idea that a superintelligence will necessarily become super powerful quickly. These ideas of it instantly knowing how to make nanomachines or immediately coming up with a correct unified theory don’t really seem possible to me. I’ve talked about this in an open thread recently, so some of this is a retread of what I said there.

    Being superintelligent doesn’t give you a free pass on information theory. To me it seems like many people imagine a superintelligence to be something that’s able to violate the Parable of the Horse’s Teeth, to just sit there spinning its gears and produce procedural or physical knowledge from only its own thoughts, without having to perform empirical experiments in the outside world. I think there’s no way that all of the information in Wikipedia and YouTube is sufficient to understand reality. There’s a really big difference between knowing about something and knowing enough about something and knowing the right kinds of things about something to be able to actually DO it. For example, the AI might come up with a new physical theory. But then it would have to experimentally test it. It’s not just going to be right a priori. It can make all the models it wants, but unless it’s actually doing the legwork and verifying its ideas, there’s no way it will magically steer itself into being right or effective about anything.

    To use an appropriate quote, “There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.” This goes for the AI as well.

    In other words, the way to convince me that AI risk is real is to demonstrate to me that the following two things are not inherent to the information-theoretic nature of reality but rather human specific failings:

    1: the fact that it is MUCH easier to understand something after the fact or with an already existing example or proof than it is to devise it from nothing. (examples: it’s much easier to learn the concepts of relativity than it was for Einstein to devise them. We can understand the process of technological progress and see how the changes in our society came from the technological changes we’ve experienced, but no one could predict this in advance.)

    2: theoretical models can only work well when combined with actual experimentation. You can’t design something and have it work the first time without testing it and modifying it based on what happens in the tests.

    Because I think that both of those would have to not be universal limits for an ASI to have the kind of extremely rapid physical capability gain that is ascribed to it.

  77. “I’ve been trying to write a persuasive essay about AI risk…”

    AI risk is no more likely, and no less likely, to be dangerous because you choose to try to write persuasively that it is. It is just what it is, regardless of what you try to write. This is like the jester trying to prove a certain box contains gold.

    • Aegeus says:

      Yes, warning signs and caution tape have never stopped anyone from hurting themselves. If the bridge is out, then the bridge is out, it is what it is. Construction workers are just wasting their time, because how could writing words on a sign ever stop someone from falling off a bridge?

      Writing doesn’t change the fact that a danger exists, but it can certainly change how people act around that danger, which is just as important.

      • Deiseach says:

        Warning signs and tape about “the bridge is out” when there is no bridge there in the first place are not, however, very convincing and indeed look more like pranks or set-ups.

        • Aegeus says:

          That didn’t seem to be his objection, though. He was arguing that it’s equally effective whether the bridge is there or not.

  78. Johannes says:

    How can one tell if one is on an exponential curve or on the exponential-looking bit of an “S-curve” (similar to a hysteresis curve)? It could be far more complicated with long plateaus or valleys before which there are certain really hard physical or mathematical thresholds.
    Why is “human-level” singled out as probable “take-off” point? In principle at any time in the curve of AI development there could be long “bare stretches”, i.e. the development is stalled for some reason or only linear or logarithmic? Even granted there would be exponential take-off as soon as an AI can easily “improve itself” (and this seems far from clear to me, there could be all kinds of trade-offs between stability and the ability of “rewriting oneself” at will and there could be hard thresholds further up) why should this point be around human level intelligence? Sure, if it takes place before, not much changes with the arguments, it might get even more dangerous because we underestimate rat-lvl-AI and then it jumps to superhuman fairly fast.

    But taking the evolutionary hint that human-lvl intelligence seems fairly stable, maybe the takeoff point is not at average human, not at Einstein, but at double Einstein and we might never (or only incrementally slowly) get there, even granted we might get to human or Einstein level in a few decades.
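To make the exponential-versus-S-curve question concrete, here is a small sketch comparing an exponential with a logistic curve that shares its starting value and growth parameter. The function names and the numbers (`x0`, `r`, `cap`) are illustrative assumptions, not estimates of anything about real AI progress.

```python
import math


def exponential(t: float, x0: float, r: float) -> float:
    """Unbounded exponential growth starting at x0 with rate r."""
    return x0 * math.exp(r * t)


def logistic(t: float, x0: float, r: float, cap: float) -> float:
    """Logistic (S-curve) growth: same start and rate parameter, capped at `cap`."""
    return cap / (1.0 + (cap / x0 - 1.0) * math.exp(-r * t))


def relative_gap(t: float, x0: float = 1.0, r: float = 1.0,
                 cap: float = 1e6) -> float:
    """Fraction by which the exponential exceeds the logistic at time t."""
    e = exponential(t, x0, r)
    s = logistic(t, x0, r, cap)
    return (e - s) / e


if __name__ == "__main__":
    for t in (0, 5, 10, 14, 20):
        print(f"t={t:2d}  relative gap = {relative_gap(t):.4%}")
```

With these made-up parameters the two curves differ by well under 0.1% at t=5 and by roughly 2% at t=10, then by more than half at t=14. An observer with noisy data from the early stretch would have little basis for telling them apart, which is the difficulty the comment points at.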

    • Aegeus says:

      “Human-level” is singled out because if organisms less intelligent than humans were able to make a self-improving AI, they would be the ones discussing AI safety instead of us.

      Agree about the rest of your points, though.

  79. Matt C says:

    If you want to write a persuasive essay, who is your audience?

    If it is people who haven’t heard much about AI yet, you can continue talking the same book and look for a new way to reach those new people. You write well enough that this might yield some results. I would recommend downplaying the eternal damnation/eternal bliss dichotomy of UFAI/FAI, this trips people’s nuttiness detectors. Put your case more mundanely. Don’t sound overconfident, consider admitting any uncertainties that you have.

    If you want to reach people who have already heard of AI and don’t buy into the story, you’re going to have to actually connect with their objections and respond to them. I’ve read, idk, 3 or 4 AI related threads here already and to me they’re pretty much the same (*). The AI risk guys act like it’s a settled question. When somebody says “this isn’t believable because blah blah blah”, often they don’t get a response at all, and if they do get one, it is never anything like a knockdown counterargument.

    (*) I read half of that Bostrom book, too.

    It doesn’t look good if you present yourselves as The Rational And Very Very Smart people who have reasoned out The Only Real Risk Facing Humankind and also The One Way Catastrophe Can Be Averted, and then to people who say they aren’t convinced, you don’t have any response, or a muttered partial response, or a suggestion to go read The Sequences.

    I don’t expect to see anything new from the ineluctable runaway AI crowd any more. I always like reading what you write, so I’ll be happy to see a new essay, but if it’s basically the same song different verse, it’s not going to be any more persuasive to me than what I’ve already heard.

  80. Selenae says:

    When did this survey close? I clicked on it this morning but didn’t get through the whole article, and now the survey page says not to take it.
    So now I’m wondering whether it has always said that and it’s one of those trick tests to see whether we read the instructions before starting.

  81. Robert L says:

    Final thought: Predictions are hard to make, especially about the future. We don’t know what is coming and our expectations are no more likely to be right than those of Astounding Science Fiction of the 1930s (personal jet packs, and tobacco-smoking spaceship captains delivering hard copy, paper mail to the colonies on Mars). To advance the debate the first thing we need is a theory of how an AI might have goals and intentions – a version of the Turing test to ask not “does this machine think” but “does this machine want?”

  82. says:

    Starting a survey on AI risks using an Alphabet/Google service has its very own strange taste of humor…

  83. Nathan says:

    Random question: what’s dangerous about giving the AI the goal of “produce exactly 1 million paperclips”?

    • Lyle Cantor says:

      Bostrom says that the agent would never assign 0 probability to it having failed to achieve its goal, so it might spend vast resources obsessively counting its paperclips or assuring itself that each one meets the design criteria. Because it cares for nothing else, it is rational for it to spend all the resources it can acquire in such pursuits.
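The recounting argument can be illustrated with a small Bayesian calculation. This is a toy model, not Bostrom's own formalism: it assumes each recount is independent, that a correct tally always passes, and that a wrong tally slips past a recount with probability `miss_rate`. The function name and all numbers are hypothetical.

```python
def residual_doubt(prior_error: float, miss_rate: float, recounts: int) -> float:
    """Posterior probability that the tally is wrong after `recounts` clean checks.

    Bayes' rule: a wrong tally survives every recount with probability
    miss_rate ** recounts, while a correct tally always passes.
    """
    wrong = prior_error * miss_rate ** recounts
    right = 1.0 - prior_error
    return wrong / (wrong + right)


if __name__ == "__main__":
    for n in (0, 1, 3, 10, 20):
        print(f"after {n:2d} recounts: P(tally wrong) = {residual_doubt(0.01, 0.1, n):.3e}")
```

Under these assumptions each clean recount shrinks the doubt by about a factor of ten, but for any finite number of recounts it mathematically stays above zero. An agent that values nothing except the tally being right therefore always gains a little from checking again.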

      • Deiseach says:

        So how is that intelligence? You tell me to stack fifty boxes in this corner, I count my boxes, I have fifty, that’s the job done. I don’t stand there going “But is that exactly fifty? Is that one too small to be a box? Maybe I didn’t stack them correctly”.

        Anything that can’t count to a million and be sure it’s counted to a million is not intelligent, no matter how big and powerful it is. And frankly, that’s getting into “The Big Bad Wolf will eat granny” territory when you’re trying to weasel around “No, no, humans will never be able to put any limits on AI like that, so you have to use my solution or else we will all be doomed!”

        Sorry, mate, can’t hear you: I have to obsessively count these boxes and I can’t care about anything else.

        • Doctor Mist says:


          Your position, here and elsewhere, seems to be that there is nothing to worry about because we cannot possibly build something that is flexible and creative but also extremely single-minded.

          You are not subject to this failure mode, because you have lots of different (and not necessarily consistent) goals, created by lots of crosscurrents in the evolutionary process. Your desire to stack fifty boxes because somebody tells you to is at best an instrumental goal — you do it not because you care deeply about the boxes, but because somebody is paying you, or because the request came from somebody you care about.

          Likewise, somebody giving you the instructions to stack the fifty boxes need not worry that you will go all OCD about making sure there are exactly fifty, because (a) they know that you are human and that the stack of boxes is probably not your top priority, and (b) it really doesn’t matter that much to them even if you do — you don’t have the capabilities to rob Fort Knox to hire minions to help you confirm the box count, so if it makes you feel better to skip lunch to perform one more count, have at it.

          Anthropomorphizing is tricky in this situation, but imagine somebody could credibly offer you ten trillion dollars and immortality to stack exactly fifty boxes in the corner. How many times would you count them? When you were pretty sure you had it right, how much would you pay me to double-check the count before you declare the stack complete?

          Like you, I might not consider something that was superhumanly flexible and creative but also extremely single-minded to be “intelligent”. But I would still consider it dangerous, and I would like to understand how to control it. Only if you think such an entity is literally impossible would such an effort be pointless; whether it is actually “intelligent” is moot.

          FAI research assumes something like human intelligence simply because that’s the best analogy we know for something with high-level flexibility and creativity, and because something that acts sort of like human intelligence is likely to be a goal if only because that’s what we know how to deal with. If we succeed in creating something as flexible and creative and non-single-minded as Deiseach, we will have done pretty well. The question is how to do that.

          • Robert L says:

            “Anthropomorphizing is tricky in this situation” … too right, and admitting the problem does not solve it. What would an AI want in the same way we want money and immortality? Why would an AI want anything? The situation is summed up in the original Terminator film:

            “That Terminator is out there! It can’t be bargained with. It can’t be reasoned with. It doesn’t feel pity, or remorse, or fear. And it absolutely will not stop… ever, until you are dead!”

            Here the “make paperclips” equivalent is “kill Sarah Connor” and the point is that it is incorruptible because not human and therefore not motivatable, though this doesn’t answer the point that if it can’t be made to change its intentions it also can’t be made to form intentions in the first place; nor why Skynet declares war on humanity as soon as it becomes sentient.

          • Doctor Mist says:

            @Robert L

            “Why would an AI want anything?”

            Because we build it that way.

            Don’t assume “want” is some wibbly-wobbly esoteric qualia-based concept. It just means that it has a reason for doing anything at all: if we build a thing that has no reason to do anything, then it won’t do anything. Why would we bother?

            Its reason for doing anything could be as simple as “do what I tell you”, which is pretty much the reason built into modern computers. But for superhuman-level AIs this is problematic for two reasons. First, it reduces the problem to the problem of how we can trust the guy giving the orders. And second, even a human-level assistant would be of no particular use to us if we had to tell it what to do as precisely as we have to tell modern computers. The point of building something intelligent is that we don’t have to tell it every little detail: it can figure out most of it by itself.

            And once again I’ve fallen into the pitfall of describing it anthropomorphically. Someone reads “intelligent” and feels like the problem is solved — if it’s “really” intelligent, it will figure out what we want in an intelligent common-sense way, so what’s the problem? The problem is that there’s not a flag in the code where we set intelligent=True and we’re done. We will be programming things that can handle bigger and bigger pieces of problems, as we have been doing all along.

            Even now, we can write software that creatively and flexibly controls a huge computing cloud like Google’s. Even though that software is a detailed and formal specification in C++ code, its writers didn’t foresee every single contingency that might ever arise. Sometimes a contingency is handled in a way that at first glance looks inspired and elegant, even though the designers never considered that case, but if you trace it through you see how it falls out from the code. Sometimes a contingency is dropped on the floor and the cloud goes off into the null zone and some humans get paged out of a good night’s sleep to nudge it back into usability.

  84. Jason says:

    I do not buy, nor have I ever bought, the idea that a system can improve its intelligence exponentially. I mean, maybe it can improve asymptotically. But the idea there is no limit seems to be one that needs far more justification.

    In maths, self-reinforcing systems go straight to infinity. But in the real world there are normally limits. I think this is a map-territory problem emanating from using math to think about real world problems.

  85. Bugmaster says:

    Damn, I was on vacation and missed the survey… Looking forward to the results, though.

  86. Maware says:

    Read it, even though survey is past. Big issue is you don’t convince me that superintelligence flows from processing power. You have to prove that first, then that there is a realistic timeframe for it happening. You used the Go playing computers, but they are not intelligent, because they are only able to do that single task unless their programmer comes in and retools the computer to do more. No volition.

    When I first heard of the concept, superintelligences were the last thing that mattered. I am more worried about AI risk in that the more complex the single task, the more small variables or things will make an impact. An example that comes to mind was mob behavior in FFXIV. They created a zone, and put a certain mob in it, but their algorithm was screwed up and every single mob of that type somehow moved to the zone entrance and murdered people. No superintelligence at all, just faulty programming. AI risk in that sense, applied to self-driving cars, thermostats, etc.