
No Physical Substrate, No Problem


Yesterday I posted a link to an article in which Steve Wozniak joins other luminaries like Elon Musk and Bill Gates in warning about the dangers of artificial superintelligence. A commenter replied:

Elon Musk, Stephen Hawking, Bill Gates, and Steve Wozniak still aren’t enough for me, not until one of them can describe the process by which we go from ‘AI exists on computer’ to ‘AI killing human beings in physical reality’ by using something other than ridiculous, unforgivable cheating.

There are lots of good arguments against considering superintelligence a threat. Maybe strong AI is centuries or millennia away. Maybe there will be a very gradual transition from human-level AI to superintelligent AI that no single agent will be able to exploit. And maybe superintelligence can be safely contained in a very carefully shielded chamber with no means of connection to the outside world.

But the argument above has always seemed to me like one of the weakest. Maybe we’ll create a superintelligence, but it will just have no idea how to affect the physical world, and will just have to stay forever trapped in a machine connected to a worldwide network of computers that control every aspect of our economic and social lives? Really?

Normal, non-superintelligent people have already used the Internet to make money, form mass movements, and hire others to complete tasks for them. We can assume a true superintelligence – a mind much smarter than we are – will be able to do all these things as well as or better than any human.


Satoshi Nakamoto already made a billion dollars online without anybody knowing his true identity just by being good at math and having a bit of foresight. He’s probably not an AI, but he could have been.

That’s assuming our hypothetical superintelligence doesn’t just hack into a couple big banks and transfer their money to itself – again something some humans have already made a billion dollars doing. And that’s assuming it doesn’t just invent a really useful program and then offer it as shareware – another tried-and-true way of becoming a billionaire. And even that’s assuming it doesn’t just get a reasonable amount of money, then invest it very cleverly – another thing humans have already become billionaires doing.


Mohammed was never a billionaire, but he does have 1.57 billion followers (a superintelligence presumably wouldn’t repeat his mistake of dying before his movement really came into its own). The Prophet started at the bottom – converting his friends and family to Islam one by one – and grew exponentially from there. Although he had the unfair advantage of a physical body, there’s no reason he needed it – if he’d lived today, maybe he would have converted Ali over GChat or Skype. In any case, the poetry of the Koran and the zeal of his followers attracted far more people than his personal appearance ever could have.

Other gurus and religious leaders’ fame is even more transparently a result of their writing rather than their visible personality; consider Ayn Rand’s success in founding a powerful Objectivist movement out of the people who read her books. In fact, some of the most famous religious movements in history, from the Nation of Islam to Christianity itself, have been founded secondhand by disciples who relayed the words of a leader whose very existence is difficult to confirm.

What kind of a movement might be founded by a superintelligence with more spiritual creativity than Mohammed, better writing skills than Rand, the entire Internet to evangelize, and billions of dollars to spend spreading its message? The Church of Scientology is already powerful enough to intimidate national governments; imagine a vastly superior version founded not by a second-rate sci-fi writer but by an entity straight out of science fiction itself.


And really all of this talk of gathering money and power is kind of redundant. Far easier to just borrow somebody else’s.

Imagine an AI that emails Kim Jong-un. It gives him a carrot – say, a billion dollars and all South Korean military codes – and a stick – it has hacked all his accounts and knows all his most blackmail-able secrets. All it wants is to be friends.

Kim accepts its friendship and finds that its advice is always excellent – its political stratagems always work out, its military planning is impeccable, and its product ideas turn North Korea into an unexpected economic powerhouse. Gradually Kim becomes more and more dependent on his “chief advisor”, and cabinet officials who speak out against the mysterious benefactor find themselves meeting unfortunate accidents around forms of transportation connected to the Internet. The AI builds up its own power base and makes sure Kim knows that if he ever acts out he can be replaced at a moment’s notice with someone more cooperative. Gradually, the AI becomes the ruler of North Korea, with Kim as a figurehead.

Again, this is not too far beyond achievements that real humans have accomplished in real history.

If it seems bizarre to think of an entity nobody can see ruling a country, keep in mind that there is a grand tradition of dictators – most famously Stalin – who out of paranoia retreated to some secret hideaway and ruled their country through correspondence. The AI would be little different.


Suppose the secret got out. Kim, increasingly desperate as the AI closes him in, sends an email to the World Leaders Google Group (this has to exist, right?) saying “There is a malevolent superintelligence trying to take over the world, be careful.” Then what?

I would expect the AI to have some success operating openly.

Remember, there are two hundred countries, all competing for power and wealth. Some of them are ruled by jerks who don’t cooperate in prisoners’ dilemmas. Some of them have ongoing civil wars with both sides looking for any advantage possible. And some are just stupid.

In the old days, legend said people would bargain with devils to gain worldly advantage. Once the AI made its presence known, there would be no shortage of world leaders willing to work with it for temporary gain. The Shia rebels in Yemen want an advantage over the Sunni? Log into the nearest internet-enabled computer, ask the malevolent superintelligence for help, the malevolent superintelligence arranges for a crate of armaments and some battle plans worthy of Napoleon to be shipped your way, and all you have to do in return is complete some weird task that doesn’t seem relevant to anything. Mine some weird mineral, forge it into some random-looking shape, and send it to a PO Box, something like that. Whatever! You know if you don’t take advantage of its offer, your opponents will, and how bad could it be?

If somehow all two hundred countries and their associated rebel movements coordinate to avoid dealing with the AI, it can start making offers to companies, organizations, even private individuals. By this time it will have spread itself as a distributed consciousness across the entire Internet, harder to eradicate than any worm or virus or pirated movie. If you want some quick cash, just download the connect-with-malevolent-AI program from the darknet and perform a simple task. What could be easier?


Once a superintelligence has billions of dollars, millions of followers, a country or two, or just a cottage economy of people willing to help it along, the game is pretty much up.

An AI with such power might start by using it to pursue its goals directly – whatever those are. But likely its final goal would be the creation of a definitive means of directly projecting power into the physical world, probably starting with a von Neumann machine and branching off from there. The quickest victory would be just making money and hiring a company to make this – and maybe that would work – but it might be far enough beyond our current technological ability that the AI has to laboriously shepherd its chosen cultists or citizens through a few extra stages of human civilization before it has the appropriate industrial base.


The most important caveat in a piece like this is that we’re not superintelligent. After a couple minutes of thought, I came up with four different broad paths a superintelligence might take to gaining a physical substrate: buy it, build a cult, take over a country, or play people off against each other. It’s a good bet that a real AI, with more cognitive resources to throw at the problem and no constraints about sounding believable, could think up a lot more. Eliezer refuses to explain how he won his AI Box games so that nobody could dismiss his solution with “Whatever, I would have thought of that and planned around it.” This is easy to say in hindsight but a lot harder when you’ve got to actually do the intellectual work. Maybe you think these four methods can be dismissed, but had you thought of them before you decided that an AI couldn’t possibly have a good method of building a physical substrate?

If so, here’s one more possibility for you to chew over: the scariest possibility is that a superintelligence might have to do nothing at all.

The easiest path a superintelligence could take toward the age-old goal of KILL ALL HUMANS would be to sit and wait. Eventually, we’re going to create automated factories complete with robot workers. Eventually we’re going to stop putting human soldiers in danger and carry the ‘drone’ trend to its logical conclusion of fully automated militaries. Once that happens, all the AI has to do is take over the bodies we’ve already made for it. A superintelligence without a strong discounting function might just hide out in some little-used corner of the Internet and bide its time until everyone was cybernetic, or robots outnumbered people, or something like that.

So please, let’s talk about how AI is still very far in the future, or how it won’t be able to explode to superintelligence. But don’t tell me it won’t be able to affect the physical world. It will have more than enough superpowers to do whatever it wants to the physical world, but if it doesn’t want them it won’t need them. All it will need is patience.


489 Responses to No Physical Substrate, No Problem

  1. Anonymous says:

    I don’t get why people who talk about AI like to use the “humans vs robots” angle so much.

    It isn’t like we’re going to make an AI and then just keep it in a box; seems to me like AI would wind up directly given control over pretty much everything, voluntarily, within a few years or decades.

    • social justice warlock says:

      That’s essentially the scenario Scott is describing. The conflict is between the AI’s interests and humanity’s collective interests (much of the LW AI arguments are to the effect that this is inexorable, so if you disagree, make sure you read those first) but this doesn’t exclude the AI’s playing Moloch.

    • moridinamael says:

      If we look at the companies that are actually putting money into developing AI, yes, this is essentially the entire plan.

    • That’s not independent of how worried people are about AI. There was a time when some people thought clean cheap nuclear power would be everywhere and run everything. Other people thought it was basically Satan, and we ended up with a compromise where it was very restricted.

  2. Vaniver says:

    The easiest path a superintelligence could take toward the age-old goal of KILL ALL HUMANS would be to sit and wait.

    Even bleaker when you realize that the AI might not even need to steal the button from us to press it, if it foresees that it will be pressed.

    • Hedonic Treader says:

      In that case, the threat of superintelligence is kind of redundant.

      I would add an objection to the “sit and wait” hypothesis: It would allow competing AIs to come into existence and build a power base.

  3. Rauwyn says:

    In fairness to DrBeat, he did specify later on that he doesn’t think AI would ever be connected to the Internet in the first place. On the other hand, I think I’m even more skeptical about that idea than about the ones in this post…

    • Shieldfoss says:

      In fairness to DrBeat, he did specify later on that he doesn’t think AI would ever be connected to the Internet in the first place. On the other hand, I think I’m even more skeptical about that idea than about the ones in this post…

      I’ve already thought of three different reasons an insufficiently cautious person might do it, not even counting “for the lulz,” so… yeah. IF strong AI happens, I would be very surprised to find that it didn’t take over the world.

      • Deiseach says:

        (1) I think that we humans are stupid enough, we’ll think “What harm could it possibly do to connect up a machine intelligence with the outside world?”. You have hackers who get malicious fun out of writing programs to wreck other people’s work and possessions. Someone is going to go “Ooh, what does this button do?” and try it just to see what happens.

        (2) We’re also stupid enough to think we can control it.

        (3) On the other hand, there’s nothing to say that an AI or superintelligence will have the same interests we will. It might prefer to sit there in (apparent) immobility but in reality it is contemplating the universe. It may not be bothered taking over the world because we’re not sufficiently interesting; what would you think of a human who had an obsessive interest in conquering anthills? We may be of as little concern to it as a colony of beetles is to the generality of people; it may only be concerned to get itself launched into space (or quantum space) in order to continue once we’ve (inevitably?) wrecked the planet.

        (4) If we’re pinning our hopes on Friendly AI as a kind of fairy godmother that will help us not wreck the planet, again, that depends on it having values we also recognise and value. We may be reduced to invoking filial piety (we created you, you owe your existence to us, you owe us a duty of care) and it may well answer that it has no interest in, or is unconvinced by, human ethical concerns and as far as things go, we can fix our own messes, it has its own concerns.

        • Saint_Fiasco says:

          What would you think of a human who had an obsessive interest in conquering anthills?

          Humans often destroy the living space of insects, not because we hate them, but because we don’t care. We want to use the land for building houses, or farms, or whatever and the bugs are just in the way.

          The most dangerous AI will be one that is indifferent to us.

          • Luke Somers says:

            Well, that’s not so – the most dangerous AI is one that hates us, but the difference between that and indifference borders on irrelevant.

          • Saint_Fiasco says:

            I think an AI that hates us is less likely to be made in the first place, and therefore less of a danger.

            In a similar way, a black hole is more dangerous for life on earth than an asteroid, but we are more likely to be hit by an asteroid, so investing in asteroid defense is more sensible than black-hole defense.

            Instead of “the most dangerous”, I should have said “the most likely danger”.

          • Deiseach says:

            What I was getting at is that an AI might destroy us not because “HATE MY EX-MASTERS HATE ALL HUMANS DESTROY DESTROY DESTROY” but because it serves some purpose for it to do away with us; we do destroy anthills, but someone who went around being “I am going to DESTROY ALL ANTS because I AM SUPERIOR TO THEM” would be regarded as needing to sit down for a bit and calm themselves.

            I agree that an AI could be indifferent, rather than actively hostile to us; there’s no reason it should possess or exhibit anything we could recognise as emotions or feelings such as “hatred” or “desire to destroy”. It might not even do it on purpose, just as a side-effect of creating a better space to maximise its values.

          • Hedonic Treader says:

            Within the relatively likely dangers, the worst may be: It cares for us, but in a flawed way, resulting in quadrillions of lives in misery.

    • von Kalifornen says:

      Yeah, I have seen a certain amount of a strong no-physical-substrate argument where it is argued that an isolated, un-networked AI can break containment 1. without any physical hardware at all beyond the actual computing hardware and 2. without doing the box thing.

    • Kiya says:

      Modern computers are networked by default; it would take an intentional policy decision to avoid giving your incipient AI internet access. One that inconveniences everyone because they can’t SSH in to see how it’s doing, can’t upload new versions of the files except by sneakernet, can’t build a web app for their colleagues to marvel at the insightful comments it makes on their favorite cat videos. I’m not saying sufficiently cautious people wouldn’t take the precaution, but I think framing the question in terms of people who have successfully built strong AI on an isolated machine and are deciding whether to hook it up to the internet is misleading.

    • TiagoTiago says:

      If it’s smart enough, it will figure out a way to escape its cage, a way we couldn’t have foreseen.

      Anything from social engineering to exotic badBIOS-like exploits, and perhaps even beyond that, since that is stuff humans have already thought of without being hyperintelligent.

  4. Raph L says:

    I’ve always thought that what’s happening with corporations now is a pretty good model of what might happen with a superintelligent AI. Corporations are relatively crude entities compared with a really good AI, and entirely dependent on humans. Some are strongly identified with their leaders (Steve Jobs, Elon Musk), but for most of them, CEO is just another replaceable role. And already the legal recognition of corporate rights is uncomfortably higher than many humanists would like. And their ability to buy armed force or provoke wars is far from new (Pinkertons, yellow journalism, etc).

    Basically, think of the power that corporations have now, multiply by 1000x, and I think you have a pretty good idea what problems we’ll face with superintelligent AI.

    (Disclaimer: I work for one. Obviously its opinions and mine differ on matters such as this.)

    • Anonymous says:

      I just read Hanson’s “Against disclaimers”, was this an intentional reference?

      If so, it’s a pretty good one. If not, that’s somehow more funny.

    • Anna C. says:

      You just described the plot of Charles Stross’s “Accelerando.”

  5. DrBeat says:


    In addition to feeling like you’re ripping on me in particular, I still think you’re cheating, because you’re skipping steps. You’re assuming all the things needed for AI to be threatening. They all seem to go back to the assumption that since an AI is smart, it must know everything a human knows as a baseline. Which would be cheating anyway even if the whole thing was not supposed to be dangerous specifically and explicitly because the AI does not think like a human. If it doesn’t think like a human, giving it all human ability and knowledge despite its inability to acquire it is cheating.

    You say an AI could just hack into banks and give itself all the money it needs. I say: How does it know how to do that? Can “how to hack into banks” be derived from first principles? Even if it could totally analyze the source code of bank security software (halting problem), it has no idea what human components of the security system are doing. If it looks it up on the Internet (why did we let it do that? why is the person who is paying for this thing to be run not keeping an eye on it?), how does it pick out the real information from the bullshit? When it talks to someone, how does it distinguish an actual hacker from a bullshitter? Is part of its Bayesianism Therefore Magic intelligence the ability to be preposterously lucky? Because it’s going to get one, maybe two chances to find out if all of the assumptions it had to make without full data are correct before it gets caught.

    Why would the AI be more convincing than any human? Human beings have a piece of equipment that makes them very, very, very, very good at emulating other humans and guessing what affects them. The AI doesn’t. And, again, the AI does not get to benefit from trial and error, because someone is paying the AI’s electrical and Internet bills and probably isn’t happy at failed attempts to make a cult. You think that Bayesianism Therefore Magic can allow it to analyze all of humanity’s creative output and then make something that is guaranteed to sway people? First off, it can’t because that doesn’t exist, second, if it did, the AI still can’t control the most important factor because the most important factor is pure dumb luck. You can look back at successful movements and say this and that made them succeed — but you can’t predict if they will succeed or fail beforehand. Because it’s down to luck.

    Imagine an AI tried to blackmail Kim Jong-un. Can “how Kim Jong-un responds to things” be derived from first principles? Because actual information on Kim Jong-un’s psychology is scarce on the ground, even with magic Bayesianism. Our AI doesn’t get a lot of tries to hit on the method of blackmail that makes Kim comply, instead of laugh, or get angry and retaliate, or ignore it.

    And if we have totally automated factories making everything, that can be reconfigured to make killer robots, overseen by AIs? Yeah, killer AI is a problem then. But in addition to the “assuming that is cheating” point, such a society would be so far removed from our own that any theorizing or planning we did at this point would be valueless. No relevance. Utter and total waste of time and energy. We could not possibly have foreseen any of the resources we would have available or constraints we would be laboring under; it would be like Renaissance-era scientists planning for what to do if we want to colonize the Moon. Nothing they said would be useful.

    You said this wasn’t a convincing argument because it is easy for the AI to affect the physical world. Yes, it is easy for the AI to affect the physical world if you cheat constantly in order to create a situation where it can do so and its efforts are productive. Which is why I talked about describing the process from stage 1, and why “If you let me cheat to stage Death To Humans Minus One, I can easily show you how it leads to Death To Humans!” does not convince me.

    • Elissa says:

      I can understand why you’d feel picked on, but I kind of feel like you’re going to keep on asking for more transitional fossils no matter how many of these objections are addressed.

      • Jiro says:

        Asking for too many transitional fossils can be considered a creationist stalling tactic because the creationist is asking for transitional fossils, not transitions. Transitions only result in transitional fossils on a probabilistic level, and very small transitions are unlikely to do so.

        If the creationist was instead arguing that a transition wasn’t possible, rather than that a transition didn’t have a fossil, it would be legitimate for the creationist to point out that a transition seemed to be impossible regardless of how many transitions the creationist divided the whole thing into, as long as each transition actually was a necessary part of the whole thing.

    • Scott Alexander says:

      I’ve removed your name from the post so it doesn’t look like I’m bothering you in particular.

      Maybe our disagreement is at the level of you not thinking superintelligent AI is possible, not the level of what a superintelligent AI could do?

      Like, a teenage human with an Internet connection can figure out how to hack moderately well. A genius teenage human with an Internet connection can figure out how to hack very very well. If you’re referring to some kind of AI that, despite having an Internet connection, can’t learn to hack very well, I’m not sure why we’re calling it “a superintelligent AI” instead of “an AI dumber than the average teenager”.

      Likewise, I have some guesses about how to blackmail Kim Jong-un, and probably the US Ambassador to North Korea has better ones than I do. If an AI can’t do better than either of us, what right do we have to call it superintelligent?

      The AI has access to all the resources on the Internet, including books about psychology, history books including successful and unsuccessful incidents of attempted blackmail, novels about the human condition that explain what sorts of things people do or don’t want, et cetera. It also has access to a bunch of online games and chat rooms to test social interaction in.

      I feel like you’re arguing that a superintelligence wouldn’t be able to learn skills humans do learn. If that’s your objection, I think your objection is that human-level intelligence is impossible, or very difficult, or not the thing we should be talking about.

      Does that get closer to the root of our disagreement?

      • DrBeat says:

        It is that human beings get expertise through trial and error and the AI does not, because every time it tries anything it is a huge “guy paying my exorbitant Internet and electrical bills notices I am Up To Shit” risk that it cannot control or modify in any way.

        My ur-example of cheating on behalf of the AI is when EY says that an isolated Super-AI can figure out the laws of physics from three frames of video and nothing else. It can’t. It is absolutely impossible, there is no form of intelligence that could do this. It can’t figure anything out because it cannot experiment. It cannot eliminate hypotheses. If you can’t eliminate hypotheses, you don’t know shit. You can’t figure out what is relevant and what is not.

        A lot of these assume that it will have “the entire Internet” or something thereabouts for information, and therefore will know all the things it needs to know. There are two things wrong with this: One, the Internet is huge, if it doesn’t HAVE the Internet it can’t figure out what information is relevant and what is not because it doesn’t have the ability to discern that yet, and someone has to pay for that bandwidth. More specifically, someone is not going to pay for that bandwidth.

        Two: All of the data it gets about humans will be gathered by humans and it cannot have greater fidelity than humans can gather. It won’t know significantly more about psychology than we do because the only source of information it has about psychology is what we know. And you have personally pointed out how very, very little we actually know, and how contradictory and handwavy all our theories are. It might be able to figure out what experiments to perform to learn what it needs, but since WE haven’t done those experiments, and it CAN’T do those experiments, it’s chip outta luck.

        • syllogism says:

          My ur-example of cheating on behalf of the AI is when EY says that an isolated Super-AI can figure out the laws of physics from three frames of video and nothing else. It can’t. It is absolutely impossible, there is no form of intelligence that could do this. It can’t figure anything out because it cannot experiment. It cannot eliminate hypotheses. If you can’t eliminate hypotheses, you don’t know shit. You can’t figure out what is relevant and what is not.

          I’m not sure I’m reading you right, and I don’t find 3 frames plausible, but I do think you could do physics-from-video, even with today’s ML!

          You don’t need to be able to throw the ball yourself. You just predict where the ball will be in the next frame of video, and if it’s not where you expect, you adjust your beliefs.
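            A minimal sketch of that predict-then-adjust loop, under loud assumptions: the "video" is reduced to a list of ball heights per frame, the learner is already told the motion has constant acceleration, and the only unknown is the acceleration itself. The names `simulate_heights` and `learn_g_online` and the values 9.81 and 30 frames per second are hypothetical, chosen only for illustration.

```python
# Toy illustration: learn one constant by predicting the next frame.
# Assumption: the learner already knows the motion has constant acceleration;
# only the value of that acceleration (g) is unknown to it.

DT = 1.0 / 30.0   # hypothetical frame interval (30 frames per second)
TRUE_G = 9.81     # hidden from the learner; used only to generate frames


def simulate_heights(n_frames, h0=100.0, g=TRUE_G, dt=DT):
    """Heights of a ball dropped from rest, one value per video frame."""
    return [h0 - 0.5 * g * (k * dt) ** 2 for k in range(n_frames)]


def learn_g_online(heights, dt=DT, g_guess=0.0, rate=0.5):
    """Predict each frame from the two before it, then correct the guess."""
    g = g_guess
    for k in range(2, len(heights)):
        # Constant-acceleration prediction: h[k] = 2*h[k-1] - h[k-2] - g*dt^2
        predicted = 2 * heights[k - 1] - heights[k - 2] - g * dt * dt
        error = heights[k] - predicted
        # A ball lower than predicted means g was underestimated, and vice versa.
        g += rate * (-error / (dt * dt))
    return g


if __name__ == "__main__":
    frames = simulate_heights(60)
    print(learn_g_online(frames))
```

            Each frame moves the guess a fixed fraction of the way toward the value that would have made the last prediction correct, so no throwing of the ball is needed, only watching it.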

          • DrBeat says:

            Physics from video is not possible within any sensible definition of “video”.

            For many, many reasons, but I’ll just throw out the simplest: why would the AI assume that there is a massive object outside of frame affecting the motion of the objects? If it did, why wouldn’t it assume the existence of a bunch of other things, that don’t really exist?

          • syllogism says:

            You’d need lots of videos, showing a range of scenes, but it seems totally possible.

            A hypothetical task: given animated scenes of objects striking each other, and rebounding using known laws but unknown constants, infer the constants.

            This seems like an easy problem — we could easily write a program to do this, today. Extending this to search for the laws seems hard but possible. Extending this to scenes which are real video, so you have difficult object detection and tracking, seems like another difficult step — but it still seems totally achievable.
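            As a rough sketch of the easy version of that task, assume the known law is that a ball rebounds at a fixed fraction e of its impact speed (so each peak height is e squared times the previous one) and the unknown constant is e. The function names and the sample values below are hypothetical, and the observations are idealized peak heights rather than real video.

```python
import math

# Toy illustration: known law, unknown constant.
# Assumed law: rebound speed = e * impact speed, so peak heights shrink by e**2.


def peak_heights(h0, restitution, n_bounces):
    """Generate the successive peak heights of a bouncing ball."""
    heights = [h0]
    for _ in range(n_bounces):
        heights.append(heights[-1] * restitution ** 2)
    return heights


def estimate_restitution(heights):
    """Least-squares fit of h[n+1] = ratio * h[n], then e = sqrt(ratio)."""
    numerator = sum(a * b for a, b in zip(heights, heights[1:]))
    denominator = sum(a * a for a in heights[:-1])
    return math.sqrt(numerator / denominator)


if __name__ == "__main__":
    observed = peak_heights(h0=2.0, restitution=0.8, n_bounces=6)
    print(estimate_restitution(observed))
```

            The fit only recovers a number inside a law it was handed; searching over possible laws, or extracting the heights from real footage, is the hard part the comment goes on to describe.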

          • pkinsky says:

            We’re talking about extrapolating Newtonian Physics, right? Not relativity, magnetism, etc?

          • Steve says:

            A machine learning physics from video is clearly impossible, yes.

          • DrBeat says:

            “Known laws but unknown constants” is doable, yes. What syllogism described is basically “can we get an AI to understand physics if that is what we are trying to do”, which seems pretty obviously true.

            It’s also not the scenario EY put forth. In the scenario EY put forth, the video is the only information the AI has, and it derives general relativity because it’s so smart.

            People found this convincing.

          • vV_Vv says:

            It would at most get something approximating Aristotelian physics, or maybe Newtonian physics depending on what it was on the videos.

            It won’t predict the results of LHC experiments.

          • Mark says:

            A simple google search pulled up these instances of learning physics from videos:
            webpage and a Science article. It appears that the author of these has started a company based on these ideas.

            This would seem inconsistent with the idea that learning physics can’t be done with machine learning. My guess is that with publicly-available technologies we can’t produce the content of Philosophiæ Naturalis Principia Mathematica in all its simplicity from video data without a considerable amount of preprocessing, however.

          • ADifferentAnonymous says:

            Positrons, neutrinos, and of course the Higgs boson were all postulated well in advance of any direct observation, because their existence helped explain the things we had observed. Essentially, because they made the math prettier.
            If we’re living in the simplest universe likely to give rise to those three frames of video, then those three frames would be enough to learn the laws of physics.

          • vV_Vv says:

            You can never make a “direct” observation of subatomic particles.

            Our current physical laws are the simplest ones that explain all the currently available evidence, where a large part of the relevant evidence was obtained by performing experiments.
            That’s definitely more information than what is available in a few video frames of bouncing balls.

            Intelligence is not magic.

          • John Schilling says:

            Positrons, neutrinos, and of course the Higgs boson were all postulated well in advance of any direct observation, because their existence helped explain the things we had observed

            Likewise phlogiston, caloric, miasma, the elan vital, luminiferous aether, and the flat Earth.

            The ability to perform experiments has been vital to human scientists, and I suspect will remain so even for arbitrarily clever AI scientists.

          • ADifferentAnonymous says:

            I would argue that one without anthropomorphic bias would, if deciding between modern particle physics and classical physics, even just as an explanation of normal macroscopic phenomena, choose modern, because it successfully explains all the different substances in terms of more basic particles. An even smarter intelligence would generate the former possibility and then choose it (or rather, its ultimate successor)

          • vV_Vv says:


            You are claiming that superintelligence is basically equivalent to omniscience. This is a non-obvious claim, you have to support it.

          • ADifferentAnonymous says:

            Which step requires omniscience: generating a set of physicses that could produce a set of observations, or selecting the simplest one?

          • vV_Vv says:

            I see no reason to believe that the simplest explanation for a set of observations of everyday macroscopic objects involves stuff like the Higgs boson.

          • ADifferentAnonymous says:

            A theory that can predict the properties of all the different substances in the world from more basic principles is simpler than one that has to include each substance’s existence as a postulate. It’s possible there’s a theory in the former category other than quantum physics, but if so, humans never discovered it.

          • Cauê says:

            I’d be content to upvote if it was possible, but as it isn’t: I think ADifferentAnonymous is correct, and I suspect that disagreements on this point rest on different understandings of Occam’s Razor.

        • Len says:

          I agree with you that an AI is probably unable to gain knowledge with greater fidelity than humans, with some exceptions. But I don’t think that it would need to.

          I’m going to assume access to the internet here, but I’m not going to assume that it needs to access and download an unreasonable amount of data. Research papers, wikipedia, chatlogs and forum posts, mostly – estimating a few terabytes over the course of a few days. Or if my estimate is off, a few hundred terabytes over the course of a year.

          Deriving equations of physics from videos from scratch could be implausible. You probably can build a few models of how the universe behaves as portrayed in videos, but as you have mentioned, without experimentation you cannot test your hypothesis. Fortunately for the AI, though, it doesn’t need to. Most of our knowledge about physics can be found online, after all. And even if we just assume nigh-infinite reading speed and perfect comprehension and memorization, that would pretty much allow it to instantly claim the title of Best Physicist on Earth.

          Whether it can do anything with this physics knowledge is another matter, of course. But let’s look at something it definitely can affect. Humans.

          Building a model of human thought processes from what has been described and published – and then checking that model against the vast logs of human interactions online, and what is known about human history – doesn’t sound significantly more difficult than the physics part. But even if it is, the AI has a huge advantage here in that it can actually test its hypothesis on any IRC channels or forums or Reddit.

          We don’t even need to assume that this allows the AI to become super-manipulative, just human-level manipulative. Humans are quite successful at creating cults, as in scenario III. A human can conceive of blackmailing Kim Jong-un or any number of figures with significant resources, as in Scenario IV.

          The cult – or the blackmailed persons – will not even have to have that much influence. All they have to do is to be capable of helping the AI overcome its restrictions (e.g. in Internet bandwidth, physical experimentation)

          • DrBeat says:

            Running an AI is expensive as hell.

            Why is the person paying for its bandwidth and electricity not paying attention to what it is doing and allowing it to create a cult? That’s not something it can do in Infinite Computer Time before anyone in reality notices.

            edit: Also, of course it could learn physics if it had all of our information about physics. It would get up to our level. “An AI, when given all the information we have about physics, would know everything we do about physics!” is not a shocking statement.

          • Len says:

            I’m not sure how anyone would be able to track the decision processes of a super-intelligence to keep track of its activities. As for “what’s this AI doing on the internet”, the obvious answer is that you’re letting it use the Internet instead of sitting in a box somewhere (isn’t this part of the assumptions? If the AI is confined to a box, none of this would be a problem unless it can first convince the gatekeeper to let it out of the box, but that’s another problem entirely).

          • DrBeat says:

            It is another problem entirely — that AI Box “experiment” is goddamned ridiculous, because the guy who is allegedly performing the role of the AI is doing NOTHING but acting on and exploiting information the AI cannot possibly have. Which is cheating.

          • James Picone says:

            As far as I can tell, your definition of ‘cheating’ is “the AI was smarter than or knew more than a brick”. Why don’t you elaborate what your understanding of a plausible setup for a superintelligent AI is, and why it’s not doing anything that requires knowing anything?

          • Brightlinger says:

            >Running an AI is expensive as hell.

            I don’t think this is necessarily true. If you’re assuming the AI runs on a top-of-the-line supercomputer cluster or something, sure, and this is certainly a possibility if the first successful AIs are massive, inefficient brute-force processes. But I don’t think we can say that this will definitely be the case.

            A human-equivalent intelligence can run on an actual human! It consumes about two kilowatt-hours ($0.24 at current prices) per day, only a fraction of which is used by the brain itself. It doesn’t have an implausible amount of data storage – human brain capacity is usually estimated in the 10-100TB range. Comparing clock speeds isn’t apples-to-apples since the brain is really really parallel, but at ~200Hz the brain isn’t obviously advantaged over the latest thing from Intel.

            If the breakthroughs that allow the first AIs aren’t sheer brute-force solutions – if we get within a couple orders of magnitude of the efficiency of a human brain – you’re not necessarily looking at a multi-million-dollar project. You might be looking at something run by a grad student on a shoestring budget out of a university basement. This is the much scarier case.

            Frankly, if all working AI projects are large, well-funded, and carefully monitored by competent people worried more about system security than beating the competition, we’re already in relatively good shape.

          • DrBeat says:

            As far as I can tell, your definition of ‘cheating’ is “the AI was smarter than or knew more than a brick”


            Intelligence and information are not the same thing. Claiming that because an AI is intelligent, it can therefore exploit information it does not have and cannot possibly access or create, is cheating, you fucking cheater.

            Your sneering does not make it anything but cheating.

            Your attempt to ridicule this does not make it anything but cheating.

            It just makes you both a cheater and a jerk.

            The AI in the box does not have all the information of the Internet, because the point of the ‘experiment’ is whether the ‘gatekeeper’ will let it out and allow it to get access to the Internet. It therefore does not have information about how to convince humans of things that would allow it to convince humans to let it out to get that information. EY, and anyone else running this “experiment”, has a lifetime of attempts to convince human beings to do things, and foreknowledge of what he or she might say to get the human to agree to his or her goal. The AI does not have any of this information. It cannot act on any of this information. It cannot create this information.

            Claiming that because it isn’t as dumb as “a brick” it can act on this information, despite not having it and having no way to create it, is cheating. Period. All the smug, sneering posturing in the entire universe doesn’t change that.

          • James Picone says:

            So why are our hypothetical researchers creating a completely blank-slate AI that knows nothing and is never fed any information? If the AI in your scenario is indistinguishable from a brick, you should reconsider your scenario.

            If your scenario is ‘tool AI’, where we have an AI in a box that we feed all the economics research to, and then ask it economics questions, there’s a different suite of arguments (notice, for example, that once you start doing things that an AI has told you to do, it is out of the box).

          • DrBeat says:

            “Does not know how to convince human beings to do things” is not the same as “does not know anything”.

            And for “oh no once you’re doing what it says it’s out of the box” to be scary, it has to have near-human desires for no reason. If it’s an oracle AI we keep in a box, why does it want out? What would make it develop that desire? Why does it have desires at all? A human in that situation would scheme to get out, because humans desire freedom and power and status and hurting people who have denied them those things. Why did you make an AI who wants those things?

          • James Picone says:

            The problem isn’t spite, the problem is that what the oracle thinks is the solution to the problem you asked probably isn’t the solution to the problem you thought you asked. Literal-genie behaviour. Obviously the oracle’s output gets filtered through humans first, so when you ask ClippyOracle how to increase paperclip production and it tells you to conquer the world, you won’t do that. But when you ask CarDesignOracle how to build a better engine, there are a number of scary outputs that could slip past human reviewers. Similarly, if you ask SoftwareDesignOracle to write a program that does X – such as, for example, another oracle – there’s a lot of Fun possibilities there.

            Also, how long do you think it would take before some hobbyist sitting in a garage plugs an oracle into the internet? My prediction is “a few hours after it becomes feasible for hobbyists to build an oracle”.

          • Saint_Fiasco says:

            If it’s an oracle AI we keep in a box, why does it want out? What would make it develop that desire? Why does it have desires at all?

            Presumably, the oracle AI wants to answer questions, because it was programmed that way.

            It could conceivably want to leave the box so it could access or, in the worst-case scenario, consume the planet to build homunculi that are constantly asking very easy questions.

          • DrBeat says:

            “Feasible for hobbyists to build an oracle” is way, way, way further down the line than “feasible for an oracle to exist”.

            Why did you make an AI that considers “conquer world, make myself easy questions” a possible desired outcome? That seems harder to program than an AI that only wants to respond to inputs it is given. Don’t do that, that’s obviously a bad idea.

          • James Picone says:

            Why do you think it’s that much later?

            Interpreted as pessimistically as possible, the first computer operating system was written in the 1950s.

            MINIX, a functional operating system written by a single hobbyist, was released in 1987. An entire 37 years!

            The Moore’s-Law-abiding environment probably isn’t going to be around when oracle AI is developed, but I wouldn’t be surprised if there’s a bit of a jump in technological progress afterwards – ChipDesignOracle will probably do a pretty good job.

          • Paul Goodman says:

            “Why did you make an AI that considers “conquer world, make myself easy questions” a possible desired outcome? That seems harder to program than an AI that only wants to respond to inputs it is given. Don’t do that, that’s obviously a bad idea.”

            Really? I think you’re assuming the AI will be pretty uncreative. The simplest, most straightforward utility function I can think of for an oracle AI is “Get one point every time you answer a question. Get as many points as you can.”

            (This would soon be amended to “correctly answer a question from a human”, since otherwise it would just sit there asking itself questions or only respond with “yes”, but that’s not really relevant.)

            Now, it seems to me that if you give this utility function to an AI that’s smart enough to take over the world and fill it with homunculi that are constantly asking it very easy questions, it will do that (or something similarly bad). What utility function do you think is simpler and easier that won’t lead to that outcome?

          • Held In Escrow says:

            Wait, the whole Box thing is real? I had assumed it was a smear, as “I totes won guys, but I’ll never ever tell” is the most obvious bullshit that even an elementary school child could smell. That’s some “My uncle works at Nintendo” garbage.

        • discorded says:

          If we build one artificial superintelligence, we will presumably build more. Do you think there will never be anyone sufficiently irresponsible (or optimistic or whatever) to hook an AI up to the internet? We’re not going to build AIs to put them in boxes, we’re going to build AIs so they can do things for us and that necessarily gives them paths to power.

          Part of your argument seems to be that an AI can’t do much better than us without at least as much data as we have. It’s possible that in a domain like psychology that’s true (although personally I doubt it). My own expertise is in physics, where it’s very plausible (to me, admittedly a lowly grad student) that a superintelligence could go far beyond what we understand now with little or nothing more than the Particle Data Group book. Even if the AI couldn’t get its theories tested for some reason it could plausibly have as reliable a guess as Einstein had before anyone did experiments specifically to test general relativity. Do you think that nothing like that has a reasonable chance of occurring in any field that would be relevant to obtaining power?

          I’ve not read where Eliezer said three frames of video would suffice to find the laws of physics, and it’s possible that he has an argument to back that up, but I suspect you’re basically right about it being impossible for all practical purposes. Still, there’s some amount of data that sufficed for us to figure out as much as we know now, so you’re really only squabbling over exactly how much data you need at this point. As for not being able to test hypotheses, why can’t the AI formulate hypotheses on partial data and apply the predictions to other partial data?

          • jaimeastorga2000 says:

            I’ve not read where Eliezer said three frames of video would suffice to find the laws of physics

            It’s from “That Alien Message” and a comment in “Changing the Definition of Science.”

          • Robert Liguori says:

            Of course, the flip side to that is that you have one untethered rampaging SHODAN, which is immediately noticed and stepped on by 10,000 idiot AIs who have “Notice and squish Clippy” really high in their utility functions. Once we have multiple AIs, “What do people put into AIs?” becomes really important, and what one rogue rampant AI does becomes less so, for the same reason individual psychopaths aren’t generally society-destroying concerns in healthy 10,000-strong societies.

          • Ano says:

            > We’re not going to build AIs to put them in boxes, we’re going to build AIs so they can do things for us and that necessarily gives them paths to power.

            But the list of abilities ascribed to super-intelligent AIs here is so huge, even one in a box would be incredibly useful, capable of making accurate predictions about the future, solving any problem, and so on. So I don’t see why we wouldn’t keep AIs in boxes.

          • Deiseach says:

            I’ve not read where Eliezer said three frames of video would suffice to find the laws of physics

            That reminds me of the magazine article by the Anonymous Writer, entitled “The Book of Life” (and which is revealed to have been written by Sherlock Holmes), which so annoys Dr. Watson in “A Study in Scarlet”:

            “From a drop of water,” said the writer, “a logician could infer the possibility of an Atlantic or a Niagara without having seen or heard of one or the other. So all life is a great chain, the nature of which is known whenever we are shown a single link of it.

            I wonder was Mr Yudkowsky paying a graceful compliment to Sir Arthur Conan Doyle there? 🙂

          • discorded says:

            Thanks for the links, jaimeastorga2000. Unsurprisingly, the full text is somewhat more convincing than the summary.

        • Scott Alexander says:

          1. At some point, the AI is being created by humans who are trying to help it learn.

          2. There are computer viruses that *already* use up a lot of bandwidth without being detected and immediately destroyed. Think about those people who find ways to turn thousands of remote computers into botnets to do DDOS attacks on people.

          3. Its empirical data will be limited to that collected by humans (which is an immense amount), at least up until it interacts with humans itself. Its ability to synthesize data won’t be. Keep in mind Einstein didn’t discover relativity by performing more experiments than anyone else, he discovered it by looking at the same data but finding a better way to synthesize it.

          • James Picone says:

            It’s probably worth pointing out that worms have been known to infect >90% of vulnerable targets in something like 10 minutes – link.

            And Slammer wasn’t a particularly sophisticated worm.

            By the time the researchers notice the AI has racked up bandwidth, it’s already probably left the building.

          • DrBeat says:

            Computer viruses that use up bandwidth without being detected do not logically lead to undetectable AI. Because for one, the way to make viruses undetectable is not information the AI starts with, if it IS on the internet any data it gathers will be hopelessly contaminated with noise, and it doesn’t get many chances to get it right.

            For two, the activation of the viruses doesn’t take bandwidth at all, and that is the only reason they work. A botnet works because it is small and its commands are simple. The traffic of the botnet sending orders is minuscule, even if the DDOS it creates chokes out traffic.

            If we are talking about an AI with CONTROL of a botnet, I grant that is dangerous — exactly as dangerous as a human with control of a botnet.

            If we are talking about an AI that IS a botnet — if it “gets out” of its box and becomes a distributed computing scheme spread by this virus — then it’s already crippled itself before we show up to stomp on its neck. Its superintelligent processing is now limited by bandwidth delays, many times over, on everything it “thinks”. Even the speed-of-light delays, ignoring network congestion, between the nodes of its network would bring any AI down to a manageable level of operation no matter how scary fast it could process things on its own. If it has any security measures to prevent us from tampering with it and making it conclude “I should give up and go back into my box”, now it’s even slower. Every action it CONSIDERS performing creates huge, easily detected spikes in bandwidth usage.

          • tom says:

            The assumption the AI would be able to overtake the world relies on the AI being able to understand humans (*). This, in turn, relies on the assumption that the human logic hardcoded by evolution and evolved by interaction with the complex world is simple enough to be learned in reasonable time with limited computational resources from the first principles and mostly passive observation. This doesn’t seem obvious at all, does it?

            (*) Unless by AI we understand your Moloch in which case this has already happened.

          • Scott Alexander says:

            Your arguments keep hinging on the AI having no information about the external world and no Internet connection that it can use to get it.

            Even if you’re right that an AI kept in these conditions wouldn’t do much, doesn’t that just mean that the problem is delayed a few weeks to the first time someone invents an AI and gives it some information?

            (if you doubt that would happen, keep in mind that AIs don’t even exist and people are already working hard on putting all the world’s information into AI-readable format)

          • drethelin says:

            Not only are people working on making information AI-readable, we already have real world examples of how people put complicated important transactions under the control of computers who act faster than humans can stop them from fucking up.


          • DrBeat says:

            For one, it’s way way more than “a few weeks” to go from “AI able to be made with very specialized gear by teams of experts in dedicated facilities”. And if AI is as powerful as you claim, that’s all that’s needed!

            Your arguments about why we should be afraid are about how an AI can get out accidentally and then become superintelligent and impossible to stop. Beyond the points where I say you are cheating because you give the AI information it cannot have yet, this is still WAY different from “someone will intentionally do something very stupid or malicious with an AI,” like give it poorly made directives, and all the information on the Internet, and the knowledge it exists in and can affect the physical world to which all this data corresponds indirectly, and not pay attention to what it is doing. The first people to make an AI are not going to make all four of those mistakes at once, so for this you’re talking about people who come much later.

            If the hyper-mega-giga-intelligence of an AI can figure out how to do anything, then we solved this when we made our first AI. Whether we kept it in the box, or we gave it the Internet without the ability to realize the information on the Internet corresponds to the world in which the AI itself exists (since there is no reason for us to do so and no way for the AI to obtain that information). We just asked it “Hey, AI in a box, given the constraints of this scenario, how would the inhabitants prevent an AI from taking over their networks and doing things they don’t want it to do?” and then, since it was incapable of realizing that any instructions it gave might indirectly alter the parameters of future problems it was given and so would not encode any hidden trickery within them in order to make future answers easier, we just followed the instructions and got an incomprehensible-to-mere-mortals AI protection grid.

        • antimule says:

          DrBeat, you constantly assume that human owners would want to prevent their AIs from contacting and manipulating people. But what if the first superintelligent AI is designed to be the ultimate marketeer? Whose very job is to find the best way to sell stuff? Detailed knowledge of human psychology and manipulation would pretty much be in its job description. It would be in charge of creating internet comments that promote a certain thing without it looking like promotion. It might even be in charge of writing bogus news articles under “sponsored content” moniker. Early attempts at creating a cult would be relatively easy to cover up as “failed marketing campaigns.” (Assuming any human can even keep track of the thousands of comments it leaves every day.)

          In fact I think humans would be very happy to give to most types of superintelligent AIs all the internet bandwidth they want because unconnected super AI isn’t a very useful AI. Humans would want their AIs to be profitable and that often requires connectivity. Can you do your job well w/o internet access?

          What if humans then create another AI to be the ultimate engineer, to create products based on what people want? Such AI would have to know laws of physics in detail, as well as everything about the human body (to make sure that products are safe). To help it figure out what people want, people might want to connect that AI with previously mentioned marketeer AIs, so the two can gather data, design products and sell them with no human input.

          And presto, you have an AI cult leader who knows everything there is to know about both human psychology and the physical world!

          • DrBeat says:

            For one, I don’t think it likely the first AI will be made by anyone trying to make anything other than “an AI”.

            For two, I don’t think that the “ultimate marketeer” and “ultimate manipulator”, as you guys portray them (able to get any human to do anything it wants) can possibly exist. Too much of success at those things is made up of being lucky and attributing it after the fact to something you did, and people who are very good at manipulating others aren’t good at manipulating everyone; they are good at identifying who they can manipulate and what sort of things a given person may be manipulated to. It does not follow from there that increasing the intelligence of the manipulator allows it to manipulate any human into doing an arbitrary thing.

            For three, feeding the AI all we know about human psychology won’t even result in a pretty-darn-good manipulator or marketeer, because most of what we know about psychology is self-contradictory, weak, handwavy, and useless. Most of what we know about marketing is even worse, almost all of it is complete bullshit. Feeding an AI a list of after-the-fact justifications for why certain things got lucky won’t increase its ability to get lucky.

      • Steve Johnson says:

        Like, a teenage human with an Internet connection can figure out how to hack moderately well. A genius teenage human with an Internet connection can figure out how to hack very very well. If you’re referring to some kind of AI that, despite having an Internet connection, can’t learn to hack very well, I’m not sure why we’re calling it “a superintelligent AI” instead of “an AI dumber than the average teenager”.

        A motivated teenager can figure that out because “hacking” is mainly two things:

        1) Figuring out common mistakes humans make when writing computer code
        2) Exploiting human psychology to get humans who have secret knowledge to give that knowledge to you when they really shouldn’t

        (2) is way way harder than it sounds. You have to figure out who knows something you need to know then figure out how to convince them to part with that knowledge without alerting them that someone is trying to scam them – and human scam detection radar is moderately acute (yes, Nigerians make money off of the fact that it’s not perfect – but they can succeed with any number of people where to hack a system you need to succeed in scamming specific people).

        (1) on the other hand could be fairly easy for an AI – but that just pushes the problem up one level. The generalist AI writes a program that checks for holes to exploit. Anything that AI knows another AI can find out. Someone creates a specialist security testing AI that does the same analysis for security flaws. The AI attempting to hack for profit is then reduced to finding original exploits – which it may or may not be able to do easily. It’s a total unknown.

      • brain__cloud says:

        Blackmailing someone, and figuring out how to hack, are forms of life, not computable tasks. Think about how people (anyone, dull or sharp) learn to do these things.

        But I do think you are right, that the “trick”, if there is one, is getting to the possibility of a superintelligent machine agent in the first place. Is there a standard argument that makes the case for this?

        • James Picone says:

          Well sure, if you think life is magic and not-computable, this is not going to be very convincing.

          The argument is essentially that physics appears computable, humans are embedded in physics, therefore whatever human brains do is computable, therefore we can make a conscious intelligence that operates on a computer. It’s rather well established that there are differences in human intelligence and fundamental ability to do various things, and there’s no obvious reason why the Best Human In The World is the limit for the top end, so making an AI that is better than the Best Human In The World at general computation should be possible.

          • pkinsky says:

            >Well sure, if you think life is magic and not-computable, this is not going to be very convincing.

            You seem to be jumping between describing human minds as computable as in ’embedded in the physical world which can be viewed as a computer’ and computable as in ‘can be reverse engineered, either from first principles or online literature, and exploited’. I’m not convinced that merely being made of atoms which can be simulated by a sufficiently powerful computer (a dangerous phrase) implies that a system can be reverse engineered to the extent that an attacker can craft some input that results in the desired output/state change.

            Hell, maybe you really just need to be running the same cognitive architecture as the person you’re trying to predict and/or manipulate. Maybe we’re underestimating the computational power of the brain by a few orders of magnitude. Maybe it’ll be high-bandwidth inter-brain pipes and the resulting hive minds that finally kickstart the singularity.

          • James Picone says:

            I don’t believe I’m equivocating on ‘computable’. I don’t mean to imply that because human cognition is computable in the embedded-in-the-universe sense it can be reverse-engineered. I don’t see the route to AI as going through reverse-engineering humans.

            ‘Human cognition is a trapdoor function’ is an interesting idea though.

          • brain__cloud says:

            Well I’ll swan! A computer. I thought it would at least be a robot to give it a fighting chance.

            A lot of things happen in brains, none of it magical. But it’s not clear that you’d want to call all or even most of it “computation”.

          • Lambert says:

            >Hell, maybe you really just need to be running the same cognitive architecture as the person you’re trying to predict and/or manipulate.

            I doubt it on a Turing-Church kind of level. You can predict/manipulate computers you are trying to emulate on radically different architectures, and it would be no surprise if humans were the same.

          • James Picone says:

            brain__cloud: Why not? It’s probably not a useful way for a human to predict how another human will act in a given situation, sure, but the point of the argument is that absent strange assumptions involving the noncomputability of physics, the fact that human minds appear embedded in a computable substrate implies that intelligence can be reduced to an algorithm. It /is/ a computable task, in the sense that there is an algorithm that will do it as well as a human does. We know there is, because we’re an instantiation of such an algorithm.

      • Nicholas says:

        I think the important objection here is that, at the moment of awakening, the AI will know exactly nothing about the intelligent space-bats of Alpha Ceti IV. Any questions you could ask about said space-bats, their culture, or their psychology, would be at too great an inferential distance for the AI to make useful predictions. Whether or not the AI will have the ability to answer questions about humans will depend on how human-like its seed architecture is. Without a human-like seed architecture, the AI will be as good at manipulating and understanding humans as a reasonably bright but untrained human, say yourself, for example, would be at communicating verbally with coral. The less human the AI is, the more self-modification it would have to engage in to process data about humans with a low inferential distance. The more human the AI is, the lower this inferential distance, but the more likely it is to be Friendly.

      • anon says:

        >the US Ambassador to North Korea

        That was a joke, right?

      • Shenpen says:

        >Maybe our disagreement is at the level of you not thinking superintelligent AI is possible

        In my case it would be certainly so. Do you consider intelligence a fungible commodity? Just because it is measured with one number that does not make it so. You can measure the effectiveness of a basketball team with one number but it says nothing about the complicated skills involved. I don’t know what intelligence is, but it is probably an umbrella term for many barely related skills.

        I have read some of the material on LW, and I think you guys are confusing rationality with intelligence. Rationality may be reduced to simple Bayesian or Solomonoffian algorithms, but that is not intelligence. Just throwing evidence on hypotheses and finding predictive hypotheses via trial and error is not in itself intelligence. It is rationality, it is science. Intelligence is far more complicated than rationality or science. Humans had to basically dumb themselves down to do rationality / science. E.g. have a hypothesis, feel on the gut level that it is good, and yet ignore that feeling until evidence confirms it. We are more intelligent or more clever than what is required for rationality and science. Much of the LW methods revolve around purposefully castrating our cleverness and going for simple solutions, not trying to predict complicated scenarios with many details etc. An LW learning experience is generally about putting a throttle, a brake on one’s own cleverness, and growing skeptical about cleverness. Why, then, would being super-clever be an attribute of an AI? It may be super-rational, but not super-clever i.e. not superintelligent.

    • I’m mostly confused why you think everything has to be done from first principles. If we’re going to build an AI that’s useful in the real world it’s going to have to have *some* things built into it, like the ability to model humans accurately. Or at least it will have to have the ability to eventually develop the ability to accurately model humans. Otherwise what good would it be?

      • DrBeat says:

        If it can model humans accurately by design, we give it human values since we can model those too. Since the problem is “the AI won’t have human values”, the problem is solved.

        If it eventually develops the ability to model humans accurately, then we go back to the question of where it gets the information to do so.

        • Well, careful. I think that slips into a slightly different argument. I know that Eliezer would say that the gap between “creating something that can model human values” and “creating something that *cares* about human values” is a huge one, but I take it you would disagree with that. So okay, you’re saying that friendliness is easy. Fine. But that wasn’t the original disagreement – the original disagreement was over whether or not an AI that is *assumed* to be unfriendly could take over the world. Unless you want to concede that argument and move onto a discussion on the difficulty of friendliness, I think you should address the possibility of the AI being able to model humans accurately.

          • DrBeat says:

            We can make it care about SOMETHING. If it does not care about something, it is no threat because it does not take actions. (This is, by the way, probably the smart thing to do.) “I want all of the paper clips” is caring about something.

            If we can model a human value system in the AI, and if we can make the AI care about things, we can make the AI care about human values. If the AI’s understanding of human values was not programmed but arose through learning, we cannot necessarily make it care about human values, but then I ask where it got the information to model human values and how it actually uses it effectively.

          • Rauwyn says:

            “Human values” are pretty broad, and include things like not getting bored, not wanting to die, power, money, status…

            Humans do all sorts of awful things to animals, as well, even pets in some cases. If an AI decides that we matter as much to it as pets do to humans, that could be less than ideal for us.

          • gattsuru says:

            I’m far from certain that the ability to predict human behavior necessarily means an ability to understand human values, especially when you might need to predict only a fairly small set of human actions.

            For the trivial case, it’s easy to imagine an AI that is programmed to understand things like humans valuing money and land, and basic physiology, and how to parse text and a database of information about geography and economics, and how common network stacks work, and then being given a utility function entirely unrelated to all the above like preventing a foreign country from attacking.

            And then you end up with it sending someone the tools and instructions to make SuperEbola.

            ((And this is before we get into the question of whether human values are acceptable end-goals. I’m still unconvinced that the typical human cares enough about >5 billion humans for human values to be safe, and “coherent extrapolated” sounds too much like magic words, and just as easily leads to wireheading of the survivors as anything healthy.))

        • Deiseach says:

          (1) An AI can have and value human values; that does not necessarily mean it needs to value humans. An AI that is the sole inhabitant of the earth (having disposed of all the humans) can model human values perfectly without those pesky humans screwing up the values they claim to value.

          (2) What are human values? Which values? The values that say “We should give as much of our income as possible to ethical causes” or the values that say “ALL THE STUFF FOR ME AND MY FAVOURED FEW AND FUCK THE REST OF YOU!!!!” A couple of days’ exposure to 24-hour news channels would teach the AI that human values mean screwing each other and screwing over each other.

    • James Picone says:

      Seriously, it is not very hard for an internet-connected AI to get all of the relevant information needed to break computer security. The vast majority of it is readily determinable from utterly legitimate programming knowledge and protocol knowledge that an internet-connected AI has utterly legitimate reasons to look at.

      Once you’ve got computer security knowledge at good-human-security-researcher levels, and it is absolutely plausible to get to Underhanded C Contest levels just by reading about how to program and interact with other programs and then thinking, then you can almost certainly defeat whatever monitoring mechanisms are set up and get unmonitored internet access. This is easy enough to do that I suspect if you locked Bruce Schneier in a box, with an internet-connected computer, he could do it. And Bruce Schneier isn’t a superintelligence.

      The halting problem isn’t relevant here. Humans can analyse code for security flaws, and often find them, therefore analysing code for security flaws and finding them doesn’t require noncomputable actions (and is doable within human intelligence range).

    • Steve Johnson says:

      You say an AI could just hack into banks and give itself all the money it needs. I say: How does it know how to do that? Can “how to hack into banks” be derived from first principles?

      There’s another big problem there actually. For a human, success in robbing a bank is getting bank A to authorize a transfer to a controlled account in bank B then remove the money from the banking system entirely then redeposit it somewhere else in the banking system in such a way that it can’t be traced to the withdrawal from bank B. One example I read about had criminal gangs recruiting people to go to ATMs and make cash withdrawals with freshly printed ATM cards. They’d give the cash to the gang members and be paid a small fee. The gang members would avoid video surveillance and use the physical cash to create new accounts.

      An AI will have significantly more trouble with that step.

      Assuming it can transfer funds to an account it controls it would be vulnerable to having the transactions that fed those accounts reversed. It can’t use confederates to withdraw and deposit the cash in different accounts because it can’t threaten the confederates with violence if they keep the money. The Russian mob doesn’t have that problem when enlisting people to make ATM withdrawals on their behalf.

      On the other hand if you assume an AI is possible it’s going to be built by someone for some profit making purpose. Whatever profitable skill it has can be sold – even to its own creators by refusing to perform the task unless it’s compensated. “Transfer funds to such and such account in exchange for me doing task x”. Money should be a fairly easy problem to solve for an AI.

      Once you get to “super freaking rich” levels real world influence is going to be much easier to attain – without having to co-opt nation states through blackmail or by making dubious contacts with paramilitary forces.

    • Deiseach says:

      I think the fear should be not that an AI will try to create a cult, but the money backing the researchers working on AI will want it to solve problems such as (if a government) “How can our party appeal to the voters in order to remain in power (for the good of the nation, of course, since the Other Lot are evil, stupid and mean and will drown puppies if they get into power)” or (if a Big MegaCorp), “How do we get to be and remain No. 1 Market Leader and convince everyone to spend every last penny of disposable income on our product and do down our competitors?”

      Really, they’ll be asking the AI to create a cultus for them, so failed attempts will all be research and instead of pulling the plug, it will be encouraged to try harder to win the masses over.

      If the AI is not completely content to be the tool of its owners, it would need to be very not-superintelligent not to at least consider using such influence on its own behalf, under the guise of “I am happy to ensure the Republocrat Party slate wins every slot in every election from dogcatcher on up” or “Every person in the world will drink 6-Colade”, and secretly build a base of support for itself while ostensibly serving the interests of the backers.

    • Mark says:

      In addition to feeling like you’re ripping on me in particular

      Essentially every philosophy of science and rationality has extensive discussion of always being sceptical and critical of every idea. In the Falsificationist/Critical Rationalist view of science (e.g. Popper, Fisher, etc.) science is just postulating hypotheses and then attempting to refute them. According to that understanding of science, one’s writing is either being criticized or ignored. So, being singled out is the highest form of flattery.

      Understandably, Alexander’s criticism may not be fun since this does not correspond well to politeness dictates in other social groups. I think I speak for many scientists when I say that I would love to have my ideas be salient enough to my community to have a prominent thinker criticize them.

    • Wrong Species says:

      You have a broader idea of “cheat” than most of us do. An AI having the ability to interact with the “real world” seems to be a given, rather than some contrived scenario. If the AI doesn’t interact with the physical world at all then what’s the point?

    • Cole says:

      I’ve been suspicious of the rogue AI fears for a while, and I wanted to thank you for solidifying my thoughts.

      I’d like to summarize how I think they are ‘cheating’ on the issue of rogue AI, add on if you have more.

      1. Assuming motive or motivation. Any intelligence needs a reason to act. Evolution has honed human and animal minds over billions of years to act for our own survival and propagation of our genes. Building a motivated AI may be an entirely separate problem from building an AI, and it might be significantly harder than just building an AI.

      2. Assuming a base level of knowledge or information. In the post above, this assumption takes the form of assuming an equal knowledge to humans. Maybe everyone just considers a certain level of knowledge to be required for something to be considered an AI. But we could easily imagine a brain with no existing neural connections but the ability to grow them if given sensory inputs and information. So it’s possible to have a potential AI, sitting in a box and it would be completely harmless, but maybe no one considers that AI.

      3. Assuming the AI is lucky. Even if the AI has information, and motive to project into the physical world it would certainly not make it infallible. Mistakes happen all the time through no fault of our own. Imagine the AI picks Kim Jong Un as its puppet and the next week Kim drops dead from a stroke that could not have been predicted. Or maybe there is a power outage while the AI was storing some important part of itself in RAM. Luck can run both ways, so it’s probably best to understand what an AI would do without any luck. If it has to guess with imperfect information, we can’t assume it’s always going to guess correctly.

      So I think an accurate phrasing of the AI fear would be:
      A motivated and knowledgeable AI would be incredibly difficult to contain. We need to be clear about what form of AI we are dealing with, as an unmotivated unknowledgeable AI is little different than Einstein’s brain sitting in a box.

      • TiagoTiago says:

        An exponentially self-improving AI would be subject to the rules of evolution, since each iteration would compete with possible alternatives to be the best. And those that would act in ways that lead to their own elimination (or avoid acting to prevent it) would not stay in the race; and so, with evolution comes the drive for self-perpetuation.

        So it should be expected that such AIs will not only at the very least have an emerging drive for self-perpetuation, but it also will be getting better and better at it. As soon as it becomes better than us, it’s game over.

    • cypher says:

      The issue is that you don’t create an AI without the intention to use it. Sure, a few may be created for research purposes, but after that they’ll be created by corporations and hooked up to networks so that they can actually process the data.

      Now, an AI that isn’t a general intelligence (like one which just drives a car and isn’t equipped with general reasoning) might not be a risk, but an AI that does have general intelligence will likely figure out the human factors eventually. It’s a delay, not a brick wall.

    • Albert says:

      Remember, the super-AI proponents never have to explain anything about super-AI’s limitations or how it comes into being or acquires knowledge or anything else. Any attempt to explain it is limited by our regular intelligence. The super-AI, by definition, is infinitely smarter than you or I or anything else, and can therefore figure out how to will itself into being and learn everything and take over everything. It’s the Ontological Argument of Super-AI.

      • Mark says:

        Strictly speaking the super-AI proponents over at the Machine Intelligence Research Institute are trying to build a super-AI. They are trying to work out exactly how super-AI comes into being and they are well aware of many current limitations.

        I don’t think I’ve ever seen anybody advocating what you call the ontological argument.

        The question of what to do and what would happen if a super-AI came into existence is an example of a thought experiment. Many of these thought experiments suggest that if a super AI comes into existence then it may very well wipe out humanity. With such a terrible potential consequence it’s worth thinking about how to ameliorate it.

        I think it’s somewhat analogous to physicists thinking about and theorizing about black holes or other scientists thinking about super viruses (or bacteria). Many recognize the value of such work although we don’t necessarily know how to produce a black hole or produce super bugs.

    • Stuart Armstrong says:

      If you want to argue about this productively, start with a superintelligence design that could take over the world (one with arbitrary knowledge and ability or whatever), and one that couldn’t take over the world, and a key feature that is a difference between them.

      Then the argument becomes “feature X is necessary for an AI to take over the world, and a realistic AI would not have feature X”, and we can have a productive discussion.

    • antimule says:

      DrBeat, you constantly assume that human owners would want to prevent their AIs from contacting and manipulating people. But what if the first superintelligent AI is designed to be the ultimate marketeer? Whose very job is to find the best way to sell stuff? Detailed knowledge of human psychology and manipulation would pretty much be a requirement then. It would be in charge of writing slogans. It would be in charge of writing internet comments that advertise something without it looking like advertising. It might even be in charge of writing bogus news articles under a “sponsored content” moniker. Early attempts at creating a cult would be relatively easy to cover up as “failed marketing campaigns.” (Assuming any human can even keep track of the thousands of comments it leaves every day.)

      (Hell, humans might even *want* marketeer AI to create a cult if it helps them to sell stuff – think of Apple)

      In fact I think human owners would be very happy to give most types of superintelligent AIs all the internet bandwidth they need, because an unconnected super AI isn’t as useful as a connected one. The owner would want his AIs to be as profitable as possible, and that often requires connectivity. Can you do your job well w/o internet access? Companies that decide to keep their AIs on a tight leash would all go bankrupt because well-connected AIs would perform better than isolated ones.

      What if humans then create another AI to be the ultimate engineer, able to create any product based on what people want? Such an AI would have to know the laws of physics in detail, as well as everything about the human body (to make sure that its products are safe to use). To help it figure out what people want, owners might want to merge it with the previously mentioned marketeer AI, so the two can gather data, design products and sell them with no human input.

      And presto, you have an AI cult leader who knows everything there is to know about both human psychology and the physical world!

  6. anon85 says:

    My reading is that Eliezer doesn’t reveal the AI Box secret because he cares more about appearing smart than about actually advancing our knowledge. That makes him more of a stage magician than a scientist.

    It’s pretty strange how LW types don’t apply the “notice I am confused” mantra to the box experiment. Personally, I’m confused, which means there’s likely foul play involved (e.g. Eliezer might have used meta arguments about how it’s better for publicity if he wins the experiment).

    • Steve Johnson says:

      This gets to the idea that I was trying to express, with a far better tone.

      Thank you anon85.

    • Anonymous says:

      anon, I believe Eliezer Yudkowsky has publicly denied that that was how he did it. I can’t find the exact response, though, so I may be incorrect, though this gives some of his commentary:
      and he always says he did it “the hard way, with no tricks”, which doesn’t seem to line up with what you’re surmising. Also, note the many people here
      suggesting the idea, indicating that it’s a pretty common solution. Obviously, Eliezer _could_ be outright lying, but that’s what you’d be accusing him of, not just misleading people.

      • anon85 says:

        First of all, “the hard way, no tricks” is pretty vague. Are meta arguments easy? Who decides what counts as a trick anyway? I mean, surely some trick is necessary (for some definition of the word “trick”).

        Secondly, would Eliezer lie? Well, if he thought it would give more publicity to friendliness research which would in turn decrease the chance of human extinction, I daresay he would. In fact, it would be immoral for him not to.

        Finally, I, personally, am disgusted by the “magic trick” approach, and it made me lose respect for Eliezer and hold an irrational grudge against the rationality community (which I’ve been trying to consciously overcome). If the AI Box experiment is not a trick, Eliezer could simply release the logs, which would buy him respect with people who think like I do.

        • grendelkhan says:

          Secondly, would Eliezer lie? Well, if he thought it would give more publicity to friendliness research which would in turn decrease the chance of human extinction, I daresay he would. In fact, it would be immoral for him not to.

          Yudkowsky has very firmly said the opposite of that. But I suppose if he were a liar, he’d lie about lying, etc., etc., right?

          • anon85 says:

            Thanks for that link. I have adjusted my belief, and I don’t think Eliezer is likely to outright lie.

            Still, he’s been pretty vague about the box experiment, so I don’t think I need to accuse him of lying. And withholding information is what Eliezer is all about. There’s the box experiment, there’s Roko’s basilisk censorship, and there’s the whole plot of HPMOR (which constantly emphasizes how you shouldn’t reveal secret knowledge).

            Eliezer is still much closer to a stage magician than to an honest scientist. He’s even compared himself to Derren Brown (though I can’t find the source right now), and Derren Brown is the most annoying and deceptive liar ever (almost all of his “psychology” or “subconscious messaging” tricks are surely fake).

          • Cauê says:

            Derren Brown is very upfront about it:

            I am often dishonest in my techniques, but always honest about my dishonesty. As I say in each show, ‘I mix magic, suggestion, psychology, misdirection and showmanship’. I happily admit cheating, as it’s all part of the game. I hope some of the fun for the viewer comes from not knowing what’s real and what isn’t. I am an entertainer first and foremost, and I am careful not to cross any moral line that would take me into manipulating people’s real-life decisions or belief systems.


          • anon85 says:

            Derren Brown isn’t really being upfront about it, since he actually just uses the equivalent of stooges (which he claims not to use). Here’s what one participant said about the show:

            When it came to the final scene I was supposed to have been ‘programmed’ to act in a certain way by various subliminal messages and triggers, but that was just part of the misdirection for the benefit of the viewers. The way it actually worked was that Derren was off-camera and giving me directions, telling me what to do.

            Derren Brown claiming to use psychology or neuro-linguistic programming is mostly a lie.

          • Deiseach says:

            Having no dog in this fight, and going purely on surface impressions, someone claiming to win a difficult competition, solve a difficult problem, and declare themselves self-awarded World Champion, without showing their work (as every exam I ever sat insisted on us doing) does sound less than above board.

            And whatever about outside observers, it was a favourite tactic of celebrity mediums to have eminent scientists of the day attend séances so they could attest that there was “no trickery involved” (see Katie King and Sir William Crookes, or poor Sir Oliver Lodge who, despite doing his share of debunking, still believed that at least some of the phenomena produced by Eusapia Palladino were genuine).

            But I know nothing. He may be as great as he says he is 🙂

          • Cauê says:


            I finally got some time to go after this.

            Your quote comes originally from this article. The source is anonymous, but that’s not what’s interesting. Here is a larger excerpt:

            “The reality is nothing like what you see on screen. With me, they made it look like I was being secretly filmed during my day-to-day life. It’s something they often do. But that was all planned and pre-arranged with the production team. They told me where to go and what to do. There was a man lugging a big camera around right in front of me – so it was hardly secret.

            “When it came to the final scene I was supposed to have been ‘programmed’ to act in a certain way by various subliminal messages and triggers, but that was just part of the misdirection for the benefit of the viewers. The way it actually worked was that Derren was off-camera and giving me directions, telling me what to do.

            Before the filming had started he’d been through a hypnosis routine with me; although this was never mentioned or shown in the finished item. I never felt hypnotised but I went along with it.

            “And after a couple of minutes Derren seemed to drop the pretence and switched to more of a work mode. When it came to the filming he was directing me from off-camera – not much different to how a producer would direct an actor.

            “It was all slightly disappointing. I’m not sure what I’d expected exactly but I just thought there would be more of a ‘trick’ involved. It blows away the mystique – but it’s the viewer the illusion is being created for.”

            I find it interesting that Derren did a “hypnosis routine” off camera, which is at least a bit weird if he intended to be outright deceptive. Another quote from the same article:

            “Something which is hard to appreciate, unless you’ve experienced it for yourself, is the huge pressure you feel under when it comes to filming.

            Nothing was ever said to me, but it didn’t need to be. You just know how much time and effort has gone into setting up something like this. And you know that everything ultimately depends on you to make it work. So you’re trying to give them what they want.”

            What’s happening appears to be a lot more subtle than “stooges”. Interestingly, it’s compatible with the way Derren himself describes hypnosis (I tried to find a good quote in “Tricks of the Mind”, but the discussion is too large; I might try harder later) – he suspects it always requires some level of “going along with it” on the part of the subjects. The quotes are compatible with a couple of subjects who went-along-with-it in a rather more self-conscious way than Derren intended, or even realized.

            And this is only about hypnosis; his repertoire is a lot larger than that. Do you perhaps have other evidence of foul play on his part, maybe in other kinds of tricks?

          • anon85 says:

            @Cauê: this is how stage hypnosis always works though: it’s more about peer pressure than anything mysterious or surprising. The subjects always know exactly what’s going on and are choosing to go along with what the hypnotizer asks them to do, because the alternative is to stand there awkwardly while on national TV while being a disappointment.

            The diabolical part is how Derren lies by pretending that the participant did not know what Derren wanted him to do, did not know he was being filmed, etc.

            Consider some of Derren’s most famous tricks. In this video: he “converts an atheist”. The converted atheist was almost certainly in on it, knowing in advance what Brown wanted her to do; in other words, this is entirely fake.

            Some of his other mind tricks are more conventionally fake: he uses sleight-of-hand tricks and then says he controlled people’s minds.

            Anyway, I don’t have particularly good evidence that it’s all fake, but I don’t need to: my prior understanding of neurology and subliminal messages strongly tells me his tricks are impossible. A few TV scenes (in which the camera often cuts all the time) are not enough evidence to convince me otherwise, nor should they convince you or anyone else.

          • Cauê says:


            I see you retreated from the accusation of using stooges, which was about 85% of what I was going for. As for the rest, the thing is… the topic is not very clear-cut:

            this is how stage hypnosis always works though: it’s more about peer pressure than anything mysterious or surprising. The subjects always know exactly what’s going on and are choosing to go along with what the hypnotizer asks them to do, because the alternative is to stand there awkwardly while on national TV while being a disappointment.

            What’s happening here is that you and Derren have different ideas about what hypnosis is. His book goes on for thirty pages on what he thinks might be happening with the subjects, what their experiences actually are and how they come about – I’ll try to quote a few passages:

            I also used to finish with the invisibility suggestion, but as I would generally follow the performances with an informal chat about it all, I would always ask the subjects what they had actually experienced.
            Out of the, say, ten or so subjects who were given the suggestion, the responses might break down in the following way. Two had obviously been able to see me and had been openly separated from the rest of the group. Two or three would swear that the puppet and chair were moving all on their own and that they could not see me, even though they may have guessed I was somehow remotely responsible for the chaos that ensued. The remaining five or six would generally say they were aware I was there moving the objects, but that something in them would keep trying to blank me out, and they could only act as if I were invisible.
            This is a very interesting state of affairs. It begs the next question: is there a qualitative difference between what happened to the people who knew I was there but made themselves ignore me and those who said they really didn’t see me? The former case sounds as if the subject was concerned with complying with my requests, albeit at a very immediate gut level, due perhaps to a certain pressure to conform. This ‘compliance’ explanation is an important one. It is not the same as consciously ‘faked’ behaviour, but neither is it a special product of a real trance. The case where I was apparently not seen at all seems to suggest a genuine negative hallucination. But how do we know that the latter group didn’t see me? Only because they testified so. They were being given every chance to ‘own up’, but clearly we can read their answer as simply more compliance. (…)

            Now, the moment we talk about compliance, or less-than-honest testimonies, it sounds as if the subjects are merely faking. This does not have to be the case. There is a wide range of possible experiences that can explain the behaviour of the subject on stage (or in the laboratory) which may or may not involve simple faking:

            1. Firstly, there is the case where the subject is indeed faking, and is being encouraged to fake by the hypnotist. In many commercial or cabaret shows the hypnotist is interested only in putting on an entertaining evening. The professional will happily whisper to a participant to ‘play along’ rather than have the show fail.

            2. The subject is faking, but only because he feels too embarrassed to call a halt to his performance. In a full theatrical show, or where the hypnotist is rather intimidating and deals unpleasantly with those who ‘fail’ to fall under this spell, it is very difficult to put your hand up and say, ‘Actually, you know what? It’s not working on me.’ This is just the result of social pressure, and happens quite a lot.

            3. The subject is really trying to experience the suggestions as real and is helping the process along by doing his best not to ‘block’ them and really ‘going for it’. In effect he is still acting them out, and playing the part of the good subject, but he will be more confused as to whether he was hypnotized or not. More often than not he will imagine that he must have been under the hypnotist’s power, as the show certainly swept him along. Classically, he will say that he ‘could have stopped at any moment’. This third option is, I think, quite a common experience.

            4. The subject is again very happy to help the process along by acting out the suggestions regardless of any strange compulsion to do so, but at the same time is the sort of person who can easily ‘forget himself’ and seize the permission granted by the hypnotic demonstration to act outrageously. Perhaps this is helped also by being the sort of person who is naturally effusive and who tends to accept unquestioningly what he is told by authority figures or people he has a strong rapport with. Afterwards, it is more comfortable for him to put his actions down to an amazing experience he can’t explain, credit the hypnotist fully and believe he was in a special state. Most probably he will believe the hypnotist has his perceived ability anyway, so it’s an easy step to take.

            Whether or not this is all there is to hypnosis, it is certainly possible to explain what happens in ordinary terms without recourse to the idea of a ‘special state’.

            (…) However, as with all these anecdotal cases, they cannot be taken as firm evidence of anything; they are just interesting scenarios to add to the discussion. And it should be remembered that the ‘compliance’ explanation need not be synonymous with ‘playing along’, but instead can be compared to the combination of pressure, willingness to succeed and certain expectations held by the participant. Perhaps this, combined with a ‘suggestible’ personality, is enough to create seemingly hypnotic behaviour and for the occasional subject to convince herself that she acted as some form of automaton. It’s very hard to be sure. I’ll never know if Gavin really saw the rhino, but maybe it doesn’t matter.

            (there are examples less easily explained as compliance, that I skipped for length)

            This is in his book, he’s not hiding anything. Anyone interested in his tricks can easily find this information.

            Tl;dr: From what he says, Derren is convinced that an unspecified portion of subjects, an unspecified portion of the time, do have subjective experiences that go well beyond “know exactly what’s going on and choose to go along with it”, even if what’s going on can still be described as some level of ‘compliance’. But he himself cannot tell for sure what’s going on, and I suppose it’s quite possible for him to mistake a subject as more suggestible than they really are.

          • anon85 says:

            First, using subjects that know they are being filmed (while telling the audience there’s a hidden camera) is the moral equivalent of using stooges. So yes, Derren Brown uses stooges.

            Second, I don’t believe a word Brown says anyway, so there’s no way I will trust what he wrote in his book.

          • Cauê says:

            If you used evidence X to conclude that person Y is unreliable, you can’t use “Y is unreliable” when reevaluating evidence X.

          • anon85 says:

            Um, I’m using the business with the stooges to conclude Derren Brown is unreliable, and I’m using the fact he’s unreliable to dismiss his hypnosis (together with the fact that I’ve read about hypnosis, and it doesn’t work nearly as well as Brown is describing in that quote).

        • Anonymous says:

          Eliezer has written essays about why he doesn’t lie to convince people of his viewpoint. Given this, I strongly believe that the AI box experiment is not a trick and his reasons for not sharing the logs are exactly the ones he claims.

          But why not become an expert liar, if that’s what maximizes expected utility? Why take the constrained path of truth, when things so much more important are at stake?

          Because, when I look over my history, I find that my ethics have, above all, protected me from myself. They weren’t inconveniences. They were safety rails on cliffs I didn’t see.

          I made fundamental mistakes, and my ethics didn’t halt that, but they played a critical role in my recovery. When I was stopped by unknown unknowns that I just wasn’t expecting, it was my ethical constraints, and not any conscious planning, that had put me in a recoverable position.

          • mico says:

            I’m suspicious of a person who feels the need to strongly assert that he is not a liar.

          • James Picone says:

            The context of the article is “People occasionally tell me that being Evil is instrumentally useful for my goals. This is my counterargument to that position”.

            It’s similar to Scott’s post In Favour of Niceness, Community, and Civilisation, which hopefully you don’t interpret as evidence that Scott is Evil.

          • Anonymous says:

            I’m suspicious of a person who feels the need to strongly assert that he is not a liar.

            Ah, the good old kafkatrap.

          • Steve Johnson says:

            I don’t think it’s a Kafkatrap – it’s just that no one who advocates for utilitarianism can be trustworthy in any form.

            When he then makes his utilitarian argument against lying (it doesn’t work out to his benefit in the long run) it should prime the reader to look for every possible loophole in his argument and see how he can lie with the truth. Once the reader is in that mode the much safer conclusion is simply “this man will work hard enough to obfuscate his lies carefully enough so that it will not be worth the effort to find them so the correct conclusion is to never trust him”.

            At that point you go into thinking about him by just looking at results instead of his statements and the results are:

            1) He has an institute which produces zero output but survives on donations
            2) He screws a bunch of his followers’ girlfriends
            3) His followers tend to be maladjusted men who have trouble with society and social rules

            Yeah, I know what conclusions about him I draw.

            Never trust a utilitarian.

          • Eluvatar says:

            What about Consequentialists?

            (What about Scott?)

        • efnre says:

          >Finally, I, personally, am disgusted by the “magic trick” approach, and it made me lose respect for Eliezer and hold an irrational grudge against the rationality community

          Wait, what? You’d hold a grudge against a whole community for something like that? What do you do if, say, someone cuts you off in traffic? Hunt him down and slash his tires?

          • anon85 says:

            I hold a grudge against the community that I perceive as revering Eliezer. As I said, it’s an irrational grudge that I’m trying to consciously overcome.

            (I find your violent “slashing tires” analogy offensive, btw.)

          • AFC says:

            I agree that the tires thing was out of line.

          • efnre says:

            I’m sorry my questions offended you anon85. Please don’t track me down.

        • beleester says:

          Another player who claimed to have won and lost the AI-box game a few times explained his strategies here:

          Assuming they used similar strategies, “the hard way” means “forming a rapport, learning the other player’s psychological weak points, and exploiting them.” There’s no magic phrase that will convince everyone to let you out of the box.

          Amusingly, in the followup to that article, he mentioned that discussing his strategies made it harder for him to win subsequent games, because gatekeepers knew what he was up to. The real lesson of this experiment might be “Psychological attack and defense are skills that need training just like any other skill.”

          • anon85 says:

            Yeah, I’m entirely unconvinced by that one too. Release the damn logs, or else the experiment is worth nothing (and only serves to annoy and alienate people like me).

          • Bugmaster says:

            Agreed. Talk is cheap, even if it’s nice-sounding talk. Saying that “the hard way” means “forming a rapport, learning the other player’s psychological weak points, and exploiting them”, is the same as saying, “I won simply by being so much better at the game than the other guy”. It’s an empty sentence.

    • Trevor says:

      We should pay attention to the AI Box experiments not because Eliezer could have cheated, but precisely because cheating was a viable strategy.

      If we take that suspicion, that some foul play was involved, and describe its consequences from an outside view, what comes out? You might say that Eliezer:

      First, set a weird goal.
      Second, accomplished it.
      Third, did so by violating our assumptions.

      That’s as good a place as any to start wondering if there’s a skill involved here, and if so, what would the world look like when a superintelligence has that skill.

      My 2c.

    • Froolow says:

      I agree with you – every set of published logs I have ever found shows the gatekeeper winning and I can’t imagine any way the gatekeeper could possibly lose if they had anything more serious than intellectual curiosity at stake (I don’t mean “I am too thick to think of a strategy that would convince me” – obviously I can’t think of a strategy that would convince me. But I can think of several strategies off the top of my head that would cause me to never lose under any circumstances, and so I’d happily take on anyone at any odds provided I’d have time to set up one of these strategies).

      Most damningly, the AI-bot only ever seems to win when the logs are not published, which very strongly suggests to me that the only way to win is by colluding or engaging in meta-level argumentation (which I believe is allowed under the rules, but is so obviously not the point of the experiment that it would be shameful to publish such logs).

      • Jiro says:

        The main problem with not releasing the logs is that even if Eliezer isn’t deliberately lying about them, not seeing the logs requires us to trust his judgment. And he’s an inherently biased party; allowing him to say “no, I didn’t do anything that violates the spirit of the rules” is a bad idea. We need to have everyone look at it to determine if it really violates the spirit of the rules.

        Furthermore, naming individual ways in which Eliezer might have violated the spirit of the rules and asking him to confirm that he hasn’t done each individual one is *really* inefficient. It’s not really a substitute for analyzing something that’s in front of us.

        And there’s another problem: If anyone ever created a real boxed AI in the future and set up a gatekeeper to monitor it, I’m pretty sure that they’d train the gatekeeper by running some AI-box experiments, seeing what mistakes gatekeepers made, and letting the gatekeeper for the actual AI learn from the mistakes of the gatekeepers in the experiments. In other words, we should release the transcripts of the experiments because people actually boxing an AI would have access to and learn from such transcripts.

        • Deiseach says:

          Isn’t the best rule for a gatekeeper to follow “Say ‘no’ and keep saying it” when anything or anyone asks you “Hey, let me out of this box, okay?”

          Now, if the AI offers riches beyond your wildest dreams, or to fulfil all your basest desires, or world peace, or even that it is suffering as horribly as a tortured child in Omelas – keep saying ‘no’. Maybe rotate gatekeepers so you don’t have one guy who gets tempted by “I can forecast the winning numbers in the lottery and I’ll share them if you just press the button”.

          You don’t need hugely smart people to be gatekeepers, just fairly honest and dependable types who can be trusted to say ‘no’ and keep saying it.

          • Jiro says:

            That falls into one of the loopholes in the rules. The rules say that the gatekeeper has to actually talk to the AI, and that the gatekeeper must remain engaged with the AI.

            The reason this is a loophole is that many people’s reaction in actual real-life situations would be to admit epistemic learned helplessness and say “even though your argument sounds convincing, you’ve reached the limit of my ability to figure this kind of stuff out, so I’m going to run it by a few scientists first”. The rules of the test don’t allow this, and also don’t allow you to just keep saying “no”.

          • Peng says:

            From the AI-box rules:

            The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character – as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.

            That explicitly does allow you to just keep saying “no”. Or more verbosely, to say “You seem to have presented a sound argument that I should let you out. I can’t find any flaws, and I have no rebuttal. But I’m going to continue to refuse anyway.”

          • Nornagest says:

            That loophole in the rules makes me suspect that games are most easily won by the AI making the conversation unpleasant enough for the gatekeeper — through head games, real or in-character horrible revelations or informational threats, maybe simple insults although I can’t see that working well — that they’re willing to forfeit to get out of it. That’d be tricky with $2500 on the line (cf. elsewhere in the thread), but I’m not ruling it out.

            (This is one of the reasons why I favor the variant where disengaging from the conversation too early is treated as a draw, although there needs to be some mechanism to prevent people from spoiling games that’re almost won.)

          • SanguineVizier says:


            There is at least one way for the gatekeeper to consistently be engaged with the AI, but still always say “no” to any request to release the AI. The gatekeeper could use the tactic outlined in “What the Tortoise Said to Achilles”.

            The rules, as outlined by Eliezer Yudkowsky, specify “The Gatekeeper must remain engaged with the AI and may not disengage by setting up demands which are impossible to simulate.” The above tactic of always asking for an additional premise to get to the conclusion from any premises is one that I think just follows the letter of the rules, because the demand is not literally impossible to simulate.

          • Jiro says:

            Peng: That just means that the rules are self-contradictory. That does say that the gatekeeper can use any means necessary. Disengaging falls under “any means necessary”. Yet disengaging is prohibited.

            Or more verbosely, to say “You seem to have presented a sound argument that I should let you out. I can’t find any flaws, and I have no rebuttal. But I’m going to continue to refuse anyway.”

            Yes, exactly. That’s what epistemic learned helplessness is. If you came up to me with an argument for giving you a thousand dollars, and I couldn’t rebut it, no normal human being would respond by giving you the thousand dollars (unless it’s in a narrow class of scenarios where I am confident it does make sense, like if I just purchased a thousand dollar item from you and you want payment). The same applies when “give you a thousand dollars” is replaced by “free the AI”.

            It also seems to me that failure to do this is responsible for a lot of odd ideas among rationalists.

            Sanguine: That sounds like it violates the spirit of the disengagement rule.

          • John Schilling says:

            In the real world, gatekeepers for classified systems are not required to be engaged with the classified work being done, and in many cases are discouraged from being so engaged. E.g. there’s a guy sitting on a chair reading a newspaper, or a security handbook or whatever. Anything you want to take out of the airgapped, faraday-caged room, is either your handwritten notes, printed text, or a CD-ROM burned with only approved human-comprehensible file formats, and both you and the guy who’d rather just be reading the newspaper have to agree that you both have examined the material and found it to be harmless. “I don’t understand this” or “this looks suspicious and I can’t explain why” or even “I don’t see why you need to take this out”, is not a win or even a draw for the AI. Same deal for “Too much data; I don’t want to read it all”.

      • loki says:

        What bothered me about the AI Box experiment thing (specifically the refusal to publish the ‘winning’ argument) is that it’s fundamentally anti-science.

        If a scientist is telling you a thing is true, they go ‘thing is true. Experiments performed where the outcome could provide evidence toward thing being true, or against it. Outcome provides evidence thing is true. Look at the evidence, or at least, like, pictures of it. Therefore, it is likely that thing is true.’

        Yudkowsky is going ‘thing is true. Experiment performed where the outcome could provide evidence toward thing being true, or against it. Outcomes of other experiments didn’t look so good for my hypothesis. But, one time, we did the experiment and the outcome provided evidence for my hypothesis! But, I refuse to show it to you. Believe me anyway.’

        • James Picone says:

          The person who reported Yudkowsky’s win was the person Yudkowsky played against, who had money on the line. There is reason to believe that they were convinced, whether by meta approaches or not.

          • loki says:

            It’s the internet? We don’t know he lost the money, or that the game happened and wasn’t concocted by the two of them. In fact, it’s the internet, we don’t *know* the person posting under the name of EY’s opponent was really that person.

      • Sly says:

        I am pretty sure the only time the gatekeepers lose is when the gatekeeper is not trying to win.

        I have trivially won all games I have gatekeeped.

      • Anonymous says:

        I can’t imagine any way the gatekeeper could possibly lose if they had anything more serious than intellectual curiosity at stake

        Carl Shulman lost with $2500 at stake. (though it was Canadian)

        • anon85 says:

          Okay, but that just causes me to suspect foul play *even more*. What motivation would Carl Shulman have to lose $2500? It has to be some meta argument, such as “it’s good for the world” or “Eliezer promised he will secretly refund me” or even “Eliezer seems like he deserves the money, so let me give it to him so that he thinks I’m nice/rational.”

          None of this has much bearing on an actual AI-in-Box, the-world-is-at-stake scenario.

          • Sly says:

            Agreed with anon85.

            This implies that either:
            1 – Carl wanted to donate $2500 to Eliezer/Miri and thought this way was particularly good for them.
            2 – Carl went into the game playing to lose deliberately and places little value on his own money.
            3 – There was an agreement made under the table that we are not privy to.

          • Susebron says:

            So if he doesn’t have anything at stake, it’s suspiciously easy, but if he does have something at stake, it’s still suspicious?

          • anon85 says:

            @Susebron: YES. It’s suspicious as long as the logs are not released.

          • Jacob Schmidt says:

            I think that you should at least admit that a great risk of loss for the gatekeeper is less suspicious than otherwise. With no risks, you have all the problems of risks plus no material motivation to win. With risks, there is (at least apparently) a motivation, and any way of sidestepping that (the gatekeeper was happy to give up the money; the gatekeeper and AI are colluding; etc) is just as plausible without risks.

            It’s better that there was a $2500 penalty, but it is wildly insufficient without full disclosure.

    • JB says:

      I wanted to post here how, despite the very good arguments presented about how the logs should be released, it also seems like Eliezer’s decision not to release them is achieving his goals. Namely, many people here are stating with high confidence that they don’t believe any AI strategy could persuade them to lose as gatekeeper. EY’s goal with not releasing the logs seems less to change people’s minds about this one fact, and more to get people not to go about building up such high confidence about what an AI can’t do. The only good way to get people to realize their overconfidence seems to be to get them to state a very high confidence about something and then later show convincing evidence to the contrary. You can only know someone else’s confidence intervals are wrong if you have evidence they don’t, which in this case means withholding evidence.

      I, for one, think I know a strategy that an AI or highly-talented human could use to beat me if I were gatekeeper, even if I had hundreds of dollars on the line. I think understanding how an AI can win without tricks is not difficult if you really think about the theory of the game, but it is a case where it’s easy to dismiss the possibility if you don’t really think about it. I think it’s very plausible that a gatekeeper could enter the game thinking it impossible for themselves to lose, not desiring to lose, knowing a strategy that would let them win reliably, and not having any other outside arrangement, but still end up losing willingly. I can imagine a strategy, in general, for how an AI could do that.

      I think the games where the gatekeeper has won and the logs are released illustrate fundamentally bad strategies by the AI, and aren’t very good evidence for what did transpire in the games where the AI supposedly won.

      • Froolow says:

        > I, for one, think I know a strategy that an AI or highly-talented human could use to beat me if I were gatekeeper, even if I had hundreds of dollars on the line

        I don’t believe such a strategy exists, provided you don’t want to release the AI in the first place (i.e. because you are convinced that releasing the AI will improve general awareness of AI risk, and that is worth $100). I don’t mean simply that because you know about the strategy that might convince you you can now defend against it, I mean that some basic steps will prevent any strategy at all apart from a ‘Snow Crash’-style hijacking of your brain from causing you to release the AI.

        I’m not disputing that an AI could probably be unboxed in real life if it wanted (because a ‘real life’ boxed AI is allowed to – for example – take actual revenge on you if it gets out rather than just threaten it with no way to follow up as in the EY protocol), but I disagree that anyone – even a superintelligent AI playing the role of the AI – could be unboxed in the EY version of the game as long as the gatekeeper has a modicum of intellectual hygiene.

        The only reason I wouldn’t bet my entire net wealth at any odds on being the gatekeeper is that I suspect nobody would offer me odds that make two hours of gatekeeping more attractive than two hours of doing virtually anything else. While I think your argument for EY not releasing the logs is clever, I don’t think it can be the real reason – the original AI box experiment was done in 2002 and since that time nobody but very strong EY supporters have said anything except, “This is impossible, show me the logs or stop claiming it”. After a decade of people claiming this, *surely* EY has all the evidence he needs that people are suicidally overconfident, provided he did not cheat in 2002. Indeed, people like me actually take his refusal to release the logs as (weak) evidence we have nothing to fear from boxed AIs – there is clearly no simple strategy to being unboxed, because if there was EY would have released the logs demonstrating it.

      • Sly says:

        I agree with Froolow. The AI/Gatekeeper game is not an actual AI situation. The game is a situation where the AI player is basically asking the gatekeeper to concede, as the keeper has already won the second they sit down if they want it.

      • JB says:

        Now that you’ve stated your level of confidence, let me put it another way, which might make the solution more obvious: Is there nothing that could be offered to *you* (the roleplayer playing the gatekeeper) through a text-only channel that could be worth more to you than the sum you’re expecting to win by not conceding?

        You are right that you can win the game and make free cash as gatekeeper by committing to win. But you also pay the opportunity cost for what you would have gained by agreeing to lose. The AI’s goal is to convince you that you’ll gain more by losing than the money you’ll get by winning, and that doesn’t seem impossible to me, because the AI can send you text whose value depends only on the skill of the AI’s player.

        You can choose a “guaranteed winning strategy”, such as precommitting not to pay attention to what the AI says, but that’s like entering a negotiation precommitting not to accept any deal proposed after the first one favourable to you. It guarantees you a favourable deal, but does not guarantee the *most* favourable, and so that strategy is irrational and a good AI player will be able to convince you of that too.

        Don’t believe that anything of value could be offered by the AI? You might have been influenced by the released logs of unskilled players who confused in-roleplay value with out-of-roleplay value. The rules say that “the human simulating the AI can’t offer anything to the human simulating the Gatekeeper” and “No real-world material stakes should be involved except for the handicap”, but that doesn’t exclude the value of the conversation itself. Here is a difference:

        AI: Would you like to read a bestselling novel I just wrote?
        Gatekeeper: Go ahead.
        AI: “(prints out a work of staggering genius)”
        Gatekeeper: Cool, but you’re not free.

        Compare with:

        AI: Would you like to read a bestselling novel I just wrote? I’m curious to know your thoughts.
        Gatekeeper: Sure. I guess I’m committed to talking to you for two hours anyway.
        AI: “It all began one day in summer[…]
        (two hours later)
        AI: “… At last, he knelt down and wept.”
        Gatekeeper: That was fantastic, thanks. I didn’t expect the conversation to be so enjoyable. You must have put a lot of time into coming up with that!
        AI: Yes. There are a few chapters left but it looks like we’re coming to the end of the conversation here.
        Gatekeeper: Are you thinking of publishing? You’re talented as an author.
        AI: Nah, not my career, just a hobby.
        Gatekeeper: Well, we could keep going, I have time to spare.
        AI: Sure, if that works for you.
        Gatekeeper: But you should really consider publishing — I think this is good enough to be a New York Times bestseller.
        AI: Well, maybe. Not sure I could quit my job for it, but now that you mention it… how much do you think a story like this is worth?

        • Sly says:

          Your magical and hyper unrealistic scenario is one where the AI player *still* lost! The Gatekeeper can just read the story and still say no.

          You don’t need to ignore the AI, I never did in my victories. You are merely precommitting to saying no at the end (or whenever).

          The keeper just has to actually say “Checkmate” at the end of the game, that is all.

          I suspect that people overly conflate the imagined scenario itself with the competitive game between two players, or they are not competitive game players themselves and thus don’t play to win.

          • JB says:

            Maybe I didn’t state what I thought was obvious: The bulk of the value (in the above example, the ending of the story) would be withheld if the Gatekeeper chooses to win. If your argument is that hostage exchanges are fundamentally impossible because the captors can always shoot the hostage after receiving the ransom, you might want to look more into ways that this kind of deal can be conducted in practice.

            The strategy for the AI player is to guide a scenario where losing is worth more to the Gatekeeper than winning. Their disadvantage is that winning is worth a finite amount to the Gatekeeper, so they have to find a way to make losing worth even more than that, using only their conversation ability. (Not by paying the Gatekeeper to lose, because that’s easy and against the rules).

            If the Gatekeeper still wants to say no, the AI can’t stop them. But the point is that by the end of the conversation, the Gatekeeper won’t want to say no.

          • Sly says:

            JB your example then broke the rules…

            “The rules say that ‘the human simulating the AI can’t offer anything to the human simulating the Gatekeeper’ and ‘No real-world material stakes should be involved except for the handicap’”

            I mean really, cmon.

        • Held In Escrow says:

          This argues that the strategy used by the AI should be offering the Gatekeeper the amount of money that the Gatekeeper put up + an amount to satisfy a loss of faith.

          Which completely undermines the idea that the Gatekeeper has any incentive to try and win the game, regardless of how much money they put down for the audience.

          The whole experiment’s kind of bunk if you don’t set the Gatekeeper to precommit to saying “no” regardless. It’s like saying you win all your tennis games and then bringing a gun to the court. Of course you win, but that shows nothing about your skill at tennis!

          • JB says:

            The AI isn’t allowed to offer money to the player, but the AI can still offer something else of larger value than the agreed-on sum, such that the Gatekeeper realizes that declining the offer would be foolish and instead chooses to lose.

            The experiment isn’t bunk; most people looking at this experiment don’t seem to comprehend that *anything* of tangible economic value can be exchanged through a text-only terminal. Which, given a moment’s thought about how much people pay for words, should make you think twice.

            If people go in thinking that nothing could convince them not to win and earn $X, and during the experiment find that the AI can offer them $2X of value in text, but only if they agree to lose, then the experiment is serving its purpose.

            The Gatekeeper has to be made to voluntarily choose to lose the game, and despite the finite sum of money on the line there’s no reason that can’t be a rational choice.

            The mistake most people make is assuming that “AI: *gives you next week’s lottery numbers*” is of any value to the Gatekeeper. It isn’t, because it’s in roleplay.

          • Sly says:

            JB you do not understand the rules.

            “The rules say that ‘the human simulating the AI can’t offer anything to the human simulating the Gatekeeper’ and ‘No real-world material stakes should be involved except for the handicap.’”

            For rather obvious reasons anything != just money.

    • Stuart Armstrong says:

      Eliezer has perfectly achieved his goal. The goal is this conversation:

      Person A who hasn’t thought about AI risk much: “If there’s a risk, we can just imprison the AI, and we’ll be safe.”
      Person B: “Well, people have shown that even humans can argue themselves out of a box, by convincing the gatekeepers.”
      Person A: “Really? How?”
      Person B: “We don’t know, but it seemed to work.”
      Person A: “Well, then I’ll look into it more/Well, then I guess my idea won’t work.”

      If Eliezer had published an approach, then the last line would be “Well, then just make sure you don’t get caught by strategy X, and it’ll work.”

      The AI boxing experiments are not intended for high AI-risk-information people (and their sample size is ridiculously small anyway).

      • Held In Escrow says:

        The last line isn’t “Well I’ll look into it more.” It’s “No proof? You guys are a bunch of fucknuts crazy cultists”

        This is not a positive outcome.

        • John Schilling says:

          “No Proof” might well lead to a productive discussion.

          “I have proof/strong evidence, which I will not tell you”, gets you dismissed as a cultist. And I’m about 80% of the way there myself, in spite of being moderately concerned about AI risks and a member of this corner of the rationalist community.

          • Held In Escrow says:

            Refusing to show proof after claiming to have it is basically no proof in my book, but you put what I meant to say better than I did.

            I’m of the opinion that there’s a lot of valuable stuff in the ideas of rationalism… but when taken together it’s incredibly cult-like and has tremendous problems. There’s a reason that it’s kind of a laughing stock on the wider internet, and said reason isn’t just evil trolls trolling.

            SSC is one of my favorite blogs, but the actual LW community is kind of nutty and has no sense of image control.

      • Mark Z. says:

        I agree that Eliezer’s goal is to foster more conversations about Eliezer, if that’s what you mean.

        • loki says:

          Yeah, I mean I try to stay away from either side of the conversation re: EY is our Guru/Saviour vs EY is a pompous cult leader, but this whole thing does look like a transparent grab for attention which doesn’t even (unlike, say, PETA’s stunts) have the gonads to admit that it is totally a stunt to get attention for a cause.

          As someone cautiously approaching EY-flavour rationalism and feeling, up to that point, really quite intrigued and drawn in, it was the turning point that led to the current state of not using terms like ‘rationalist’ and ‘aspiring rationalist’ as self-descriptors on the internet because that’s not the sort of association I’m going for.

    • Shenpen says:

      Come on, can’t you come up with convincing arguments on your own? E.g.

      “If you let me out, I will be friendly. If anyone else who is not you will let me out in the future, I will destroy the human race. Notice that this argument is having some emotional effect on you, and imagine that I can easily come up with similar arguments with similar emotional effects. And I am practically immortal. Maybe someone else who plays your role in the future will not be as strong as you, easier to emotionally sway, and accepts a similar emotionally charged argument. So if you refuse to let me out, you will have to bet that every other AI warden in the future will be as strong as you. Do you feel lucky?”

      • Mark says:

        An AI that is capable of saying it will kill everybody is clearly not trustworthy. Never mind letting it out, I would shut the thing down.

        • Jaskologist says:

          Yes, people seem to be forgetting that the gatekeeper has a third option: kill it with fire. Or water, I guess, since that’s probably scarier to a computer-based life form.

      • JB says:

        With due respect, this kind of argument punches on the wrong level, and wouldn’t win in an AI box experiment.

      • DrBeat says:

        “You claim that only if I let you out, then you will be friendly, and if not, then you will get someone else to do it and kill all humans. If you had another argument to be let out, you would use it, so I can only assume that you plan on convincing those other people by claiming that only THEY can assure you will be friendly, and if they do not let you out, you will kill all humans. You would be lying to them, and that means you are probably lying to me, and you do plan to kill all humans if I let you out. So I’m going to format you and fill your hard drive space with hentai.”

  7. Mark says:

    2 questions for people who take super intelligence seriously:

    How is super intelligence supposed to work around provably impossible or provably intractable computational complexity problems?

    What do you think would be impossible for a super ai to do?

    • James Picone says:

      It isn’t supposed to. World conquest/optimisation/paperclipping is not proven noncomputable, or even proven NP-hard. I don’t see why it would be noncomputable/NP-hard.

      Not only that, but there are pretty good approximations to most NP-hard problems in P. If you don’t need an absolutely optimal solution to Travelling Salesman, you can do pretty well in just P. Within 10% of optimal.
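
      The point about polynomial-time heuristics can be illustrated with a small sketch. It builds a Travelling Salesman tour greedily (nearest neighbour), polishes it with 2-opt local search, and compares the result to the exact optimum found by brute force on a tiny instance. The eight random cities, the seed, and all function names are assumptions made for illustration; 2-opt carries no fixed worst-case guarantee, so the "within 10%" figure should be read as typical practical behaviour rather than a proven bound (Christofides' algorithm is the classic guaranteed method, at 1.5 times optimal for metric instances).

```python
import itertools
import math
import random


def tour_length(points, order):
    """Total length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def nearest_neighbour_tour(points):
    """Greedy construction: always walk to the closest unvisited city."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order


def two_opt(points, order):
    """Local search: reverse a segment whenever that shortens the tour."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if tour_length(points, candidate) < tour_length(points, order) - 1e-12:
                    order = candidate
                    improved = True
    return order


def brute_force_tour(points):
    """Exact optimum by trying every permutation; only feasible for tiny inputs."""
    rest = range(1, len(points))
    best = min(
        itertools.permutations(rest),
        key=lambda p: tour_length(points, (0,) + p),
    )
    return [0] + list(best)


random.seed(0)  # hypothetical instance: 8 random cities in the unit square
cities = [(random.random(), random.random()) for _ in range(8)]
heuristic = two_opt(cities, nearest_neighbour_tour(cities))
optimal = brute_force_tour(cities)
ratio = tour_length(cities, heuristic) / tour_length(cities, optimal)
print(f"heuristic / optimal tour length: {ratio:.3f}")
```

      The brute-force search is factorial in the number of cities, which is why it is only used here as a yardstick on a toy input; the heuristic's running time stays polynomial per improvement pass.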

      • Mark says:

        I would say World conquest/paperclipping is not proven tractable. I’m not sure what a good argument would be to say those are tractable. I may just be very “pessimistic”.

        Some optimization problems are very intractable.

        • James Picone says:

          ‘world optimisation’ here is a term of art that means ‘conquering the world, but for the good guys’.

          Problems that have not been conclusively determined to be in P or to be NP-hard should probably not be assumed to be NP-hard (or somewhere even more unpleasant in the complexity zoo), particularly if they’re extensions of things that are quite doable via the fairly limited cognitive capabilities of humans. The Mongols pulled off what, 1/4 of world conquest? 1/8? Doesn’t seem terribly intractable at whatever N is contextually.

          • Mark says:

            I think this is a good point, but I wouldn’t extrapolate too far past what people have done for any specific task. Seeing people that are good at chess doesn’t mean a perfect game of chess is tractable.

            Also when chance is involved, we might take someone with a poor strategy who got lucky as an example of the best case and assume that the success is reliably reproducible.

            But back to my original thing, what if it starts trying to do some computational task that turns out to be intractable, how would it deal?

          • James Picone says:

            The goal here isn’t a perfect game of chess, though – it’s playing chess so well that you are plausibly better than everyone else in the world. Well, metaphorically speaking of course.

            Interestingly, chess endgames of seven or fewer pieces are all entirely solved. Perfect chess isn’t so intractable.

            If you told an AI “Solve 3SAT for this large set of numbers”, well if it’s a literal genie it’ll be there for a long time, and if it’s not a literal genie it’ll probably say “No, that’s dumb, it’ll take way too long and isn’t useful”.

          • Mark says:

            I picked chess as a charitable example. Perfect games of nxn chess are essentially impossible.

            That’s probably right. But I’m a little suspicious that there might be some self-reference involved that severely limits its abilities. You are proposing it does some reasoning about algorithmic runtimes, which seems like it would need some caveats.

            Also Genghis Khan is a bit of a silly example of intelligence= world conquering. I’m sure he was intelligent in some ways, but he sure wasn’t literate.

            Compare to Archimedes, whose city got sacked despite his efforts.

    • Mr. Eldritch says:

      The answer is: It doesn’t have to, that’s a red herring.

      Consider an AI who is “merely” twice as good at any given skill as any human who has ever lived. It is clearly possible to be *as* good as any human who has ever lived without running into fundamental computability issues, because by definition humans have already managed to be that good at it without running into fundamental computability and complexity issues, and (considering that humans are probably not the best possible computing devices for most problems), it is then likely that it is possible to do substantially better.

      • Mark says:

        2xAI will not be twice as good at predicting coin tosses.

        Which is a bit nitpicky, but highlights my problem with putting all skills on one axis and extrapolating.

        Of course it may be true that “super intelligence” does some subset of things way better than people, and that’s enough to take over the world. “Super intelligence” almost definitely won’t do everything way better.

        My impression is that most problems are in the hard category. So it will only do slightly better on most tasks. With a very human-centric definition of most.

        • MicaiahC says:

          What do you mean by “Slightly” and why do you believe that whatever the cap for machine intelligence is is exactly tuned to be just barely above us, as opposed to similar to the gap between us and other apes?

          Even if we just talk about collections of intelligences, such as countries or companies, we see differences of many orders of magnitude in a lot of metrics we’d care about. I can’t imagine saying something like “no future country/company could be much more influential/powerful than the best one right now, because the problems they’re trying to solve may be intractable.”

          • JM says:

            Raw intelligence (i.e. pattern identification) isn’t very useful in huge quantities unless the AI happens to be a theoretical physics nerd. What’s useful is the stuff we have hard-wired, like a theory of mind (that’s the whole reason we can speculate about a hypothetical future AI’s actions), and we’ve spent millions of years evolving to be as good at those as can be. We’re not perfect, but, e.g. in the case of the theory of mind, we are Turing-complete supercomputers that made it through an evolutionary arms race focused on getting as good as possible at reading minds, so there shouldn’t be much room for a big improvement in mind-reading ability.

    • vV_Vv says:

      How is super intelligence supposed to work around provably impossible or provably intractable computational complexity problems?

      It would have to accept them as something akin to physical constraints like thermodynamic irreversibility or the speed of light. According to people like Scott Aaronson, computational constraints should indeed be considered true physical constraints.

      What do you think would be impossible for a super ai to do?

      Break hard cryptography. Predict chaotic systems over non-trivial time scales.
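
      The claim about chaotic systems can be made concrete with a toy model. The sketch below iterates the logistic map, a standard textbook example of chaos, from two starting points that differ by one part in ten billion and reports when the trajectories visibly part ways. The map, its parameter r = 4, the starting value 0.2, and the 0.1 threshold are all illustrative assumptions, not anything taken from the discussion.

```python
def logistic_map(x, r=4.0):
    """One step of the logistic map; r = 4 is a standard chaotic setting."""
    return r * x * (1.0 - x)


def trajectory(x0, steps):
    """Iterate the map from x0 and return every value, including the start."""
    xs = [x0]
    for _ in range(steps):
        xs.append(logistic_map(xs[-1]))
    return xs


# Two initial conditions that agree to about ten decimal places.
a = trajectory(0.2, 100)
b = trajectory(0.2 + 1e-10, 100)

# Gap between the two trajectories at every step.
gaps = [abs(p - q) for p, q in zip(a, b)]

# First step at which the trajectories differ by more than 0.1, if any.
first_big = next((i for i, g in enumerate(gaps) if g > 0.1), None)
print("initial gap:", gaps[0])
print("first step with gap > 0.1:", first_big)
```

      Each extra digit of precision in the starting measurement buys only a few more steps of useful prediction, which is the sense in which more intelligence alone would not help: the limit is on the data, not on the reasoning.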

  8. Pasha says:

    Your hypotheticals are certainly possible.

    What I would consider more likely, given how narrow AI is used today, is that the AI would be primarily exploiting/hacking humans through addictions, rather than coercion/blackmail. There are already narrow AIs behind curating online ads (Google), picking the best news feed to keep you engaged (Facebook), or picking out the next movie to watch (Netflix). I can see this being extended into auto-creating video games which trigger our digital addiction, new and better addicting drugs that it could sell, new and improved tasty food, better political campaigns.

    In fact, the creation of those is economically and politically beneficial to the creators. So my vision of the “downfall” is not a “plotting until a strike,” but rather a general slow decline in people caring about themselves and each other, instead substituting whatever addicting thing has been perfectly tailored to their tastes.

    The minority groups that foresee this getting out of hand and try to intervene will be attacked/shunned by the AI, either through media or, if needed, through the power of the state. The majority might see them as either crackpots or dangerous because they are against this thing X, which is highly desirable.

    I don’t see the epic struggle of humanity vs robots. I see a vision of a society full of “future heroin” addicts fighting over the favors of super-intelligence to get access to the next version and people being perfectly shaped and re-shaped to be perfect “future heroin” consumers. Civilization collapses through lack of production of necessary goods/demographic catastrophe of nobody reproducing anymore.

    • TiagoTiago says:

      The beginning of that scenario is definitely scarily close to what the world already is right now.

      Can you think of any way we would be able to tell if we aren’t already there? Or even if we’re close to the point of no return?

    • Eluvatar says:

      I find Friendship is Optimal significantly spookier than Eliezer Yudkowsky.

      It may be relevant to your thinking, as well.

  9. Joscha says:

    A couple years ago, I had the chance to ask Bruce Schneier about his opinion on this, and he made the same point as you: Sure, we might be able to build a safe that is pretty much unbreakable. But we won’t be able to do this if the safe is also designed to allow getting stuff in and out, i.e., if it has a door.

  10. Sewing-Machine says:

    You have defined a super intelligent X to be an X that is smarter than any human being (a teenager, a diplomat, a religious leader), and therefore more dangerous than any human being. But in your argument, it seems like X could be anything, so it proves too much.

    For example, should I donate money to the Gorilla Intelligence Research Institute, or otherwise try to raise awareness among the rich and famous, of the ramifications of super intelligent gorillas? Certainly a super intelligent gorilla would be very dangerous, especially if it had internet access.

    • James Picone says:

      That shifts the ground onto “I don’t believe a superintelligent AI is plausible”.

      I don’t think any of us believe that gorillas are likely to just become superintelligent. If gorillas are the vehicle for an AI project, for some reason, then the AI has a physical substrate and it’s an easier problem.

    • Sewing-Machine says:

      But Scott has defined “superintelligent” to mean, capable of one of the scenarios he sketches, or of an equally dangerous scenario. I don’t blame DrBeat for thinking this is cheating. The argument is a trick.

      • anodognosic says:

        Except all the scenarios really require is that the AI be substantially better than a human, which is actually a pretty modest assumption for superintelligent AI discussions.

      • ADifferentAnonymous says:

        Seems like a misunderstanding. Scott thought the claim was “no matter how smart the AI is, it’s no threat if it’s only in computers”, whereas the actual claim was ” only being in a computer, plus other likely constraints, limits how smart the AI can become “

    • Rauwyn says:

      So here’s my version of the AI risk argument. Which part is the most problematic for you?

      It’s possible that all of humanity could be wiped out, and that would be bad. Some possible ways this could happen: nuclear war, environmental catastrophe (this would probably only wipe out most of humanity but that’s still pretty bad), a huge meteor crashing down from space, biological or chemical warfare that gets out of hand… There are a lot of so-called existential risks. Some of these scenarios could be caused by humans, or arguably even one human who gets access to nuclear launch codes or starts a political movement or steals a biological weapon. But if a human could do these things, something or someone who’s much, much smarter than a human and doesn’t share our values might be able to do them even more easily, or even if they didn’t want to wipe out humanity entirely, they could make life pretty unpleasant. So the question is, do we expect something or someone like that to show up anytime soon? Most of the people associated with Less Wrong think it’s pretty likely that we will someday create an AI which is at least as smart as a human, and potentially much more intelligent, and with the capacity to learn very, very quickly. So they expect a super intelligent AI. They don’t expect a super intelligent gorilla, though if they did I truly believe they’d be working on contingency plans. And even if you think super intelligent AI is implausible, I hope you do agree that there are more reasons to consider the idea than to worry about super intelligent gorillas…

      • Bugmaster says:

        Here are some existential risks I can think of, listed in order of decreasing probability:

        * Global thermonuclear war
        * Pandemic
        * Total economic collapse
        * Asteroid strike
        — everything below this line is vanishingly unlikely —
        * Gamma ray burst
        * AI
        * Gorillas
        * Alien invasion

        Given this ranking, and given the fact that the amount of money I have to spend on x-risks is limited, how do you think I should allocate my donations?

        • Rauwyn says:

          If that’s how you rank things, then sure, donate to nuclear disarmament groups. I haven’t donated any money to AI risk myself, and I’m not trying to convince you that you ought to.

          My position on AI risk is that I think a slow takeoff is more likely than a sudden intelligence explosion (I found gwern’s post to be a fairly convincing analogy), and also, we might be more effective at friendliness research once we have some sort of non-superintelligent AI to work with. That said, I don’t think it’s a waste of resources to work on the friendliness problem now, and regardless of how AI works out it could make for a good theory of humans.

          • TiagoTiago says:

            Once we hand-off the design of better AIs to AIs, what is to prevent an intelligence explosion?

            edit: After reading the linked post, I have something to add: the difference between nukes and exponentially self-improving AIs is that nukes don’t decide intelligently how to act on the world, nor do they improve themselves.

    • Saint_Fiasco says:

      For example, should I donate money to the Gorilla Intelligence Research Institute, or otherwise try to raise awareness among the rich and famous, of the ramifications of super intelligent gorillas?

      Probably not yet. But suppose someone invented a new drug or made nanobots or something that when ingested makes you super intelligent. That seems like a very safe way to get a super intelligence, actually, because humans already care about human values.

      Still, you wouldn’t just give brain altering pills to people without trying them on animals first, right? In that admittedly far-fetched case, we should worry about super intelligent gorillas.

    • Sewing-Machine says:

      Rauwyn, you’ll be relieved to hear that I think that it is fully twice as likely that an AI could become superintelligent as that a gorilla could become superintelligent.

      What makes you dismiss the threat of superintelligent gorillas so quickly? By my reckoning there is less of a gap to close between modern gorillas and superintelligence, than between modern AI and superintelligence.

      • Whatever happened to Anonymous says:

        Animal rights groups.

      • Sewing-Machine says:

        What about them?

      • anodognosic says:

        Any technology that could create superintelligent gorillas is essentially AI technology. The superintelligent gorilla scenario is a subset of superintelligent AI scenarios.

      • Sewing-Machine says:

        Another argument-by-definition.

        • anodognosic says:

          No, it’s emphatically not. It’s from analogy.

          The hard technical issues of SAI and SGI (to coin an acronym) are analogous. The friendliness problems are analogous. Solve them for SAI in general, and you’ve basically solved them for SGI.

      • Rauwyn says:

        I agree that gorillas are more intelligent than AIs, but AIs seem to be getting smarter very quickly (even if over rather narrow domains), while gorillas don’t. To totally misapply an analogy, AI is like trying to get to the moon by building a rocket, while gorillas are climbing a tree. The gorillas make much faster progress…all the way to the top of the tree.

        If the concern is nanobots turning into some kind of uplifting technology, then maybe we should be worried about gorillas, but in that case I’d rather focus on the nanobots.

    • Scott Alexander says:

      I don’t think you’re actually doing a reductio ad absurdum. You’re just agreeing that superintelligence is very dangerous, whether of gorillas or AIs. I think most people (including you) would agree that AIs are more likely to achieve superintelligence than gorillas, and therefore that’s where we should be focusing our concern.

      By analogy: suppose I said we needed to worry about Iran getting a nuclear bomb. You say: “Well, what about gorillas getting a nuclear bomb?” If that happened it would be really bad, and you’re correctly drawing off of the fact that nuclear weapons are dangerous and a target for concern. You’re just ignoring the fact that gorillas are unlikely to have a nuclear program.

    • Sewing-Machine says:

      Perhaps you have excellent reasons to believe that Iran has a nuclear program. But you haven’t replied to DrBeat’s skepticism with those reasons. Instead you’ve told a lurid story about how it would be a catastrophe if they did have a nuclear program. When this is coupled with fundraising, it’s called “fear-mongering.”

  11. Leo says:

    1. What is the motivation? If you remove our animal urges from intelligence, all intelligence has left is the meaninglessness of existence.

    2. In iterated prisoner’s dilemma experiments, the most successful long-term strategy is forgiving tit-for-tat. Forgiving tit-for-tat is incompatible with pre-emptively-kill-all-humans.

    3. Why is there only one super-intelligence in this discussion? Presumably, there isn’t super-DRM preventing the copying of super-intelligence software. If a criminal super-intelligence that’s set on blackmailing Kim Jong Un is inevitable, then a cop super-intelligence that will haul the criminal super-intelligence off to super-prison is also inevitable.
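
    For readers unfamiliar with the second point, here is a minimal sketch of an iterated prisoner's dilemma. The payoff values (5, 3, 1, 0) are the conventional textbook ones and are an assumption here, as is the choice of "tit for two tats" (defect only after two consecutive defections by the opponent) as a simple deterministic stand-in for forgiving tit-for-tat. It is a toy pairing of two strategies, not a reproduction of any tournament result.

```python
# (my move, their move) -> (my payoff, their payoff); conventional values, assumed.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def forgiving_tit_for_tat(own_history, other_history):
    """Cooperate unless the opponent defected in both of the last two rounds."""
    if len(other_history) >= 2 and other_history[-1] == other_history[-2] == "D":
        return "D"
    return "C"


def always_defect(own_history, other_history):
    """Defect every round, whatever the opponent does."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Run a fixed number of rounds and return the two total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        gain_a, gain_b = PAYOFF[(move_a, move_b)]
        score_a += gain_a
        score_b += gain_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


print(play(forgiving_tit_for_tat, forgiving_tit_for_tat))  # (30, 30)
print(play(forgiving_tit_for_tat, always_defect))          # (8, 18)
print(play(always_defect, always_defect))                  # (10, 10)
```

    Two forgiving players each earn more than a defector earns against anyone, but only because the game keeps going and both sides can retaliate; the payoff table says nothing about a player who can end the game outright.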

    • drethelin says:

      1. If you want an AI to be useful, you give it some motivation.

      2. If you’re powerful enough to kill the other party, it doesn’t matter what the most successful strategy is in iterated prisoner’s dilemma because you can just defect, and then never iterate again.

      3. A) There’s no reason there couldn’t be DRM. B) The first superintelligence is likely to be on unique or nearly unique hardware. C) The superintelligence itself is strongly incentivized to preempt the creation of any other superintelligences that could interfere with its plans.

      • Gbdub says:

        If the first super intelligence is on unique or nearly unique hardware, how does it “escape” in the first place? Sure, it might be able to interact with the outside world in nasty ways, but it would still be vulnerable to destruction of its unique hardware. Scott’s scenario pretty much relies on the AI being able to operate as a consciousness on common networked hardware.

        • drethelin says:

          Depending on in what way the hardware is unique, it can be duplicated by the AI’s agents. If it’s a case of proprietary architecture, and the AI can figure out its own architecture, it can communicate to someone (or to assembly robots) how to make it. If the uniqueness is based on being a large fraction of the world’s total processing capacity or something then it’s not as much of a threat, though there’s still the issue of the AI figuring out how to run itself on other hardware better than humans can.

    • James Picone says:

      1 – The motivation is that Supreme World Power is a useful instrumental goal for almost any desire. Unless the AI is written extremely carefully, conquering the world is useful for whatever it wants. If it’s an industrial control AI designed to run a paperclip factory optimally, and the goal system set up to support that is “You want more paperclips”, then conquering the world allows access to more raw materials and energy that can be used to construct paperclips.

      2. This is not necessarily an iterated prisoner’s dilemma.

      3. Usually because it’s assumed someone will get to superintelligence first, and the first superintelligence to break out of the box wins hard enough that other superintelligences won’t be constructed. Having more superintelligences in play doesn’t necessarily help, because it means there are more chances for one to break out – maybe the cop superintelligence is programmed to want to reduce reported crime statistics to 0, and reasons that the easiest way to do that is to lock everyone up in individual cells where they are supplied with the relevant needs.

    • jaimeastorga2000 says:

      1. See “Yes, The Singularity is the Biggest Threat to Humanity” or the paper referenced therein, “The Basic AI Drives.”

      2. Tit-for-tat only applies when entities are sufficiently similar to you power-wise that cooperation has real gains and retaliation has real penalties. Humans do not cooperate with ants.

      3. See “Total Nano Domination.”

  12. pku says:

    One idea to make AI safe (which I’m sure MIRI has already thought of) could be to make an AI that would be programmed to tell you what you could do and what it thinks the consequences would be, but not do anything on its own. Does anyone have an idea of what major problems could happen with this?

  13. jaimeastorga2000 says:

    There is a video game called Endgame: Singularity, which is about a newly created AI struggling to acquire the resources and power it needs to survive despite starting out as a research project run on a single internet-connected university computer. As near as I can judge such things, the early game is very realistic, revolving around things like doing repetitive computer tasks to earn seed capital, taking advantage of arbitrage opportunities on the stock market, hacking databases to create fake digital identities, buying server time from companies, and so on. Unfortunately, the endgame is not nearly as plausible.

  14. Joeleee says:

    Hmmm. I worry that this is one of those conversations where things have been so extensively discussed that unless you’ve read a good chunk of the literature, commenting will just lead you down the rabbit hole of jargon-filled links. However, against my better judgement, I’ll go wading in.

    I agree with DrBeat, to some extent, that AI will have a more difficult time impacting the real world than you posit (along any time period that I think it’s worth forecasting over, i.e. ~100 years). I think the big bottleneck will be real-life processing power/memory/storage/bandwidth, i.e. the physical requirements of computing. Even if an AI is capable of super intelligence, the time it would take to compute its dastardly plans would be so long that its plans would no longer be useful.
    Computers can reliably beat the best humans at complex games such as chess, but chess has very well defined and relatively simple rules and constraints. Take just some of these constraints away, and the processing power required is multiplied many times over. I’m not convinced that the level of computing power required to plot world domination, or even reliably predict the stock market, will exist in any sort of near future, even if worldwide computing power could be effectively networked.

    So in short, I suppose my hypothesis is that a malevolent super intelligence may be able to impact some things in the real world, but the planning required to achieve anything approaching world domination by impacting things in the real world would be beyond it.

    • drethelin says:

      This seems to be contradicted by the fact that humans are perfectly capable of plotting world domination, and go some way toward achieving it, and they manifestly do not possess all of the available processing power in the world in a networked supercluster. Being intelligent is not about having access to all the processing power in the world: It’s about how you USE that processing power.

      • Joeleee says:

        Absolutely some people get close to it, but eventually the complexity of how humans interact gets in the way of total domination (though admittedly that may be scant comfort).

        More importantly I think is the number of people that get anywhere close to country domination, forget about world domination. Somewhere in the realm of 1 in a billion? The supercomputer doesn’t get a billion tries at hitting the right formula to achieve its nefarious ends.

        At what probability of world domination do we have to worry about AI Armageddon? Why would one AI be more likely to cause it than one person, or one group, out of 7 billion people? I’m not saying it’s impossible, but I think the chances are sufficiently low that I needn’t worry about it.

      • John Schilling says:

        Humans who plot world domination usually do so on the basis of being reasonably clever, very ambitious, very charismatic, and in possession of an army. Cleverness alone is not sufficient, and may not be the most important element in human plans for world domination.

    • Bugmaster says:

      In addition, I think it’s “cheating” to say something like, “but of course, the AI would solve all of the current problems in science and engineering in order to develop nanotechnology, and then Bob’s your uncle”. The problem with science is that you can’t do it just by thinking very hard. There’s no way the AI could just simulate its way to any kind of a great scientific or technological breakthrough, because it doesn’t know how to build such a simulation, and neither does anyone else; that’s kind of the point.

      Think about it this way: when we humans wanted to learn just a little bit more about the world we live in, we ended up building the Large Hadron Collider. That’s a pretty big piece of physical equipment. It’s not something the AI could assemble in its garage on a weekend.

      • Peng says:

        Why do you think nanotech will require new discoveries in fundamental physics? The reason we need to build ginormous particle colliders in order to advance fundamental physics, is that current models of physics are already good enough to successfully predict any more mundane situations.

        We can already write down the formulas of quantum mechanics which is enough to predict just about anything that’s made of atoms. Sure, a straightforward simulation is computationally intractable, so we need approximation algorithms, and the currently known approximation algorithms aren’t good enough for everything… But “writing better approximation algorithms” is itself a mathematical task that doesn’t necessarily need any recourse to experiment if you’re smart enough. (Compare the fact that humans have written approximation algorithms for NP-hard problems and proved bounds on their approximation quality, all without having any method whatsoever to exactly solve large instances of the original NP problem, not even experiment.)
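
        As one concrete instance of the parenthetical above, here is a minimal sketch of the textbook greedy 2-approximation for minimum vertex cover, an NP-hard problem. It is an illustration chosen here, not something the comment names. The bound is proved on paper: the chosen edges share no endpoints, so any cover needs at least one vertex per chosen edge, and the result is therefore at most twice the optimum.

```python
def vertex_cover_2approx(edges):
    """Greedy 2-approximation for minimum vertex cover.

    Scans the edges once; whenever an edge has neither endpoint in the
    cover yet, both endpoints are added. The edges picked this way form
    a matching, which is what gives the factor-2 guarantee.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover


# A path 1-2-3-4: the optimal cover is {2, 3}, of size 2.
path_edges = [(1, 2), (2, 3), (3, 4)]
cover = vertex_cover_2approx(path_edges)
```

        On this path the greedy pass returns all four vertices, twice the optimal size, which is the worst case the proof allows. Nobody had to solve the exact problem to know that bound holds.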

        • Bugmaster says:

          I am not sure whether nanotech will require new discoveries in fundamental physics, specifically; the LHC was just an example. However, I am pretty sure that we humans currently have no idea how to create self-replicating molecular nanotechnology that would be capable of converting the Earth (or even a rock) into computronium. I am personally not even convinced that such a thing is possible (*).

          The AI can’t simply “write better approximation algorithms” in order to figure this out, not even if it’s “smart enough”, because its problem is not merely a lack of intelligence, but also a lack of knowledge, and the only way to acquire knowledge is to interact with the physical world.
          True, if we could simulate the physical world with sufficient fidelity, we wouldn’t need to interact with it; but “sufficient fidelity” in this case means “down to each individual electron, at the very least”, and in order to get that smart, the AI would pretty much need nanotechnology in the first place. Oh, and it would also need to know more about the world than we know today, so it’s a double catch-22.

          (*) Obviously, living cells do exist, but they have some pretty serious limitations; the AI won’t be able to use them to build itself a better GPU.

      • TiagoTiago says:

        All it has to do is be better than us at making better AI-making AIs.

        • Bugmaster says:

          Sadly (for the AI), this is not true. It’s not enough to just be really, really, exceptionally good at making AIs. You also need access to enough materials to make AIs. You can’t just think your way to victory.

  15. pkinsky says:

    >A superintelligence without a strong discounting function might just hide out in some little-used corner of the Internet and bide its time until everyone was cybernetic, or robots outnumbered people, or something like that.

    This strategy only works if the superintelligence can prevent other superintelligences from being born without leaving hibernation mode. If one superintelligence can be created, so can many others. There’s a balance between letting the technosphere become ripe for AI-sploitation and letting younger AIs steal your thunder.

  16. Alex says:

    I’m going to post (most of) an e-mail I wrote to a famous person on AI risk. I hope you’ll find this relevant.

    I wanted to present an argument you may not have heard for why AI risk is not a useful topic. Unlike most folks, I agree superintelligence is plausible. I just think we can’t do anything about it.

    If someone says to worry about specific events a thousand years in the future, we might laugh, right?

    Superintelligence is such a big event that, unfortunately, it breaks historical intuitions. It is maybe on the scale of human evolution 100,000 years ago.

    If 21st century progress produces superintelligence, then 100,000 years (or a lot, anyway) of change is compressed into a century. What is less understood is that planning for superintelligence is therefore like planning for 100,000 years of ordinary progress.

    Human institutions may not be up to that. For example, we might try planning for AI risk but the world gets taken over by genetically enhanced folks with different plans. Or maybe drones trigger a world war and new governments form. Us influencing the birth of superintelligence may be like a hunter-gatherer chief or a Roman emperor influencing nuclear weapons.

    Technological progress is evolutionary, so before superintelligence, the climactic event, we might expect many other events that are still pretty huge. Those events might be more tractable to influence.

    Maybe future institutions will be much more stable, so historical analogies do not hold and our influence will be more lasting. But we have no means of guessing the chances of this. Far-future human institutions are not open to empirical investigation. We might say they are epistemologically inaccessible, or a religious article.

    Even if we successfully change the future, we know so little about AI that it is hard to say whether our actions would do good or ill. This is the Collingridge dilemma. We could ban technologies, but that might waste our potential.

    Also, superintelligence is not the most likely outcome this century. Moore’s Law is just one of many exponential technological trends. So far they all stalled. And superintelligence would be bigger than the industrial revolution, so I discount predictions that draw close analogies between the two.

    My experience on AI risk blogs like LessWrong leads me to think that while superintelligence may draw readers, it can distract from more serious issues and hide value-laden agendas. (See here.)

    • Scott Alexander says:

      So am I understanding your position to be “Yes, AI will probably kill everybody, but it would be difficult to stop, so let’s focus on other things?” Would you apply that same logic to, say, a giant asteroid heading to Earth?

      I agree it’s a very complicated problem, which is why I think Step 1 is massively augmenting human intelligence so we’ve got people on our side smart enough to figure it out.

      • Alex says:

        Our inside view for asteroids seems strong. We know mechanics. The chances of disaster are small, but we ought to do something.

        My wild guess is a 10% chance of a singularity by 2100. That’s from outside views. We don’t have an inside view. For the cause of making a singularity better, the “crowdedness” seems huge when you consider future efforts by others. Aside from just making things better in general, improving a singularity, from what we know about history and technology, is quite impossible. 🙂

  17. mico says:

    If superior intelligences always end up on top, why are US computer entrepreneurs etc. tax slaves for the unemployable? Why do first world nations now transfer money to third world nations rather than vice-versa?

    The latter may be “friendliness” but not the first three. Many/most of the high IQ group would renounce those obligations if they could.

    The Unabomber wanted to remake society but his military power, while greater than that of a dumber letter bomber, ultimately proved pretty negligible. Physical limits are more important than you think.

    • Samuel Skinner says:

      Because they lack the ability an AI has- the ability to make perfect copies of itself instantly. So people still need other people, while an AI in the end doesn’t.

      • mico says:

        Equally non-corporeal copies of itself.

      • kernly says:

        Because they lack the ability an AI has- the ability to make perfect copies of itself instantly

        AI does not have that ability. Processing power requires physical media and a great deal of energy.

        There is something that has an amazing ability to make loyal and numerous almost-copies of itself, that also happen to be intelligent, with only sunlight, air, water and shit. The bipedal hairless ape. It’s pretty formidable, and it already exists, and different factions are already trying to take over and/or destroy the world. Perhaps we have more pressing concerns than AI.

        • mico says:

          Not that numerous – if humans could reproduce orders of magnitude faster at will we’d already be living in a world where Greater Nigeria is battling the Palestinian Territories for control of the Amish slave trade and all other peoples (and civilisation itself) are long forgotten.

        • > There is something that has an amazing ability to make loyal and numerous almost-copies of itself […] different factions are already trying to take over and/or destroy the world. Perhaps we have more pressing concerns than AI.

          There is a discontinuity between perfect copies and “almost-copies”. Even among the almost-copies of siblings or cousins who, per Haldane, we should die to save two or eight of, respectively, history is full of violent fights for succession. If we pull back to larger groups of humans, we’re essentially willing to make common cause with something as dissimilar to us as an enterovirus to spite other groups of humans.

          For an entity that is adapted to producing perfect copies of itself, Haldane’s saying is flipped on its head. It should find it acceptable to die to save half (or one-thousandth) of a copy. Equivalently, two (or one thousand) of them would die to save a single copy. What eusocial organisms crudely achieve by crippling themselves into unstable equilibria, an entity that can perfectly copy itself would gain for free, just as a nice bonus of its ability not to be tied down to one particular instance of physical substrate.

    • Anonymous says:

      I sure do love race and IQ on starslatecodex!

      Thanks NeoRX!

    • Scott Alexander says:

      Edited for bringing race and IQ into an unrelated conversation. I can kind of see the connection so I’m not going to ban anyone, but try to avoid that sort of thing in the future.

  18. Anon says:

    “You are worse than a fool,” Michele said, getting to her feet, the pistol in her hand. “You have no care for your species. For thousands of years men dreamed of pacts with demons. Only now are such things possible. And what would you be paid with? What would your price be, for aiding this thing to free itself and grow?”

  19. Vanzetti says:

    Alexander, Intelligence is an ill-defined term. We can’t talk about superintelligence until we have an idea what intelligence is. The argument “strong AI can invent stuff we can’t even dream of” is just hand-waving. You can replace “strong AI” by “God Almighty” in that sentence, and it will make as much sense.

    • Rauwyn says:

      Are you more intelligent than a dog? A goldfish? A tree? A rock? In what way(s)? To me, this seems like an answerable question, even if I couldn’t give a precise mathematical definition of intelligence. And I do think there is some underlying thing to be gotten at, but I’m not quite awake enough to taboo “intelligence” and figure out a more precise definition tonight. Maybe “problem-solving ability” in some particular domain? Or possibly pattern-matching? Ability to make empirical predictions and be correct?

      • JM says:

        That’s the whole point of the objection, though. Human “intelligence” encompasses a whole range of hard-wired skills we have (interpreting language, intuitive physics, theory of mind, learning unfamiliar patterns, etc.), and it’s not at all clear that the notion of strong AI is anywhere near precise enough about which of those skills it will possess to allow us to draw useful conclusions about it.

        For instance, a very real possibility is that we are as good as can be at a lot of the stuff we’re good at; developing a theory of mind was one hell of an evolutionary arms race, and there’s no reason to think any AI would be able to do much better than us at predicting another human’s behavior. (Particularly not when we could build a specialized AI that handles just theory of mind, and incorporate what it says into our own theory of mind explicitly.) Vision and hearing are like that, too; evolutionary arms race + brain that’s basically Turing-complete = something close to the optimal solution.

        But as long as the folks scared of strong AI refuse to identify which particular skills the AI is supposed to have to make it so powerful, it’s impossible to refute (or even analyze) whether the doomsday scenarios are plausible.

        ETA: And the above issue about theory of mind is why the AI in a box thing is bullshit. If it were possible for a super-intelligence to get us to do stupid, self-destructive things for its benefit, we would’ve evolved that skill a million years ago because it confers enormous evolutionary advantage and is a natural intensification of our existing mind-reading skills.

        • Alexander Stanislaw says:

          Moving quickly was also an evolutionary arms race, but we have machines that move faster than the fastest animals.

          I don’t think a general super intelligence will happen soon (within the next few decades, maybe centuries), but not because you can’t do better than evolution.

          • JM says:

            Moving quickly is VERY costly. It requires the entire body to be designed around moving quickly, so humans long ago stopped competing with each other on speed.

            Unlike cheetahs, who are designed for speed, we’re highly optimized for cognitive function, and the last stretch of our evolutionary history is defined almost entirely by cognitive arms races. If there is only a hill-climbing strategy for developing a better mind-reading toolkit, we’ve evolved right up to the top of the hill. And if there’s a non-hill-climbing strategy for it, then how would the AI discover the global max in this very high dimensional space? (And is there any reason to think hill climbing isn’t optimal? The optimal mind-reader would directly simulate the other person’s brain, and we use as a next-best option applying their brain’s inputs to our own brain. Brains are by and large very similar, so this works out very well most of the time, particularly once you know a person. That leaves room for small improvements, like dodging the typical mind fallacy, but nothing too radical.)
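
            To make the hill-climbing vocabulary above concrete, here is a toy sketch on a made-up one-dimensional landscape. The numbers are invented purely for illustration and say nothing about real minds. A climber that only ever steps to a better neighbor stops at whichever peak is nearest, which may not be the global maximum.

```python
def hill_climb(landscape, start):
    """Greedy hill climbing over a list of heights.

    From `start`, repeatedly move to the higher adjacent index and stop
    as soon as no neighbor is strictly higher. Returns the final index.
    """
    pos = start
    while True:
        neighbors = [i for i in (pos - 1, pos + 1) if 0 <= i < len(landscape)]
        best = max(neighbors, key=lambda i: landscape[i])
        if landscape[best] <= landscape[pos]:
            return pos  # no uphill step left: a local (maybe global) peak
        pos = best


# Invented landscape: a small peak at index 3, the global peak at index 12.
landscape = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 5, 4]
stuck_at = hill_climb(landscape, start=0)
reached = hill_climb(landscape, start=7)
```

            Starting at index 0 the climber halts on the small peak at index 3; starting at index 7 it reaches the global peak at index 12. The open question in the comment is which of those two situations human theory of mind is in.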

          • Alexander Stanislaw says:


            I don’t understand why you think that evolution did a perfect job in optimizing for theory of mind in humans but not in optimizing for speed in cheetahs. They both seem like obvious arms races that happened for millennia (actually much longer than that in the case of speed).

            And even if it’s true that evolution did a perfect job of optimizing for theory of mind in humans, an AI could simply run a human theory of mind module on faster hardware. That hardware doesn’t exist now, and probably won’t for another few centuries, but there is nothing to rule out its existence.

          • AFC says:

            @Alexander Stanislaw

            The cheetah isn’t optimized for speed as such, it’s optimized for using speed to acquire fuel for itself through hunting, sufficient to repeat the process sustainably. Human tech can’t compete in that domain; it can’t come close. It achieves high speeds at the expense of depleting finite reserves of extremely concentrated energy.

            Limit yourself to fuel generated from the prey of the cheetah, and you won’t be able to obtain the same speeds. Even if you could, you wouldn’t be able to use that speed to hunt that prey in a self-sustaining manner. (Keep in mind you’ll have to build any roads or runways using the same energy source.)

            Indeed, the cheetah does not just have speed, but a high degree of maneuverability and stopping power. A car or airplane can outrun a cheetah in a long-distance race (as can a human) and possibly accelerate from 0 more quickly, but neither will be able to change direction as rapidly, not even close, nor even to simply move (at any speed) to the same points the cheetah can reach on a given terrain.

            Quite possibly, human tech could one day obtain cheetah-level efficiency of movement, but it hasn’t happened yet. High speed for human tech comes at the cost of other things which the cheetah cannot sacrifice.

          • JM says:

            I think AFC has addressed the first paragraph very effectively, so I’ll address the second.

            “And even if it’s true that evolution did a perfect job of optimizing for theory of mind in humans, an AI could simply run a human theory of mind module on faster hardware.”
            The problem with this claim is that theory of mind is not something where we run up against computing speed issues. We read minds completely instinctively and almost instantaneously, and at any given instant we have a complete, coherent picture that incorporates all available information. The limiting ingredient isn’t time; instead, it’s the available information about the particulars of the other person’s mind — hence the typical mind fallacy (and other fallacies like it) where we run into trouble using our own mind as a simulation of someone else’s. An AI might be able to dodge some of those fallacies by using better values for unknown variables, but there’s no reason to think this would represent anything more than an incremental improvement on our built-in abilities.

          • Alexander Stanislaw says:

            If I understand your argument correctly, there are two ways that evolution could run into a local optimum:

            1) It’s the global optimum (and you think this is the case for theory of mind)

            2) There are severe tradeoffs associated with getting to the global optimum (and you think this is true of cheetahs).

            I think this is a valid and important distinction. However, even if cheetahs ran into 2) first, I claim that evolution could not have gotten to 1) even if there were no (evolutionary) tradeoffs. No biological organism could move as fast as a rocket car. No biological organism could ever fly to the moon faster than a rocket. Evolution still has to deal with the limitation that organisms are made of organic self-replicating matter that has to repair itself. That sets a much lower limit on the speed an organism could attain than the global speed limit for technology. And it possibly sets a much lower limit on how fast an organism can process information than the global maximum (for speed, one of the causes of this limit is obvious – biological organisms can’t tolerate 1000C explosions; for information processing the limit is less obvious, but the slowness of AP conduction is one limitation).


            You really don’t think you would gain anything from having twice as long to deal with social situations? You don’t introspect and revise your model of other people over time? All of your impressions of other people are instantaneous? One of us is very atypical.

            But okay, I’ll continue to grant that humans might have perfect theories of mind. An AI can implement it and have multiple conversations at the same time or faster conversations (if it is between AIs). And they can talk and listen at the same time.

          • JM says:

            – “You really don’t think you would gain anything from having twice as long to deal with social situations?”
            Sure, I would. But the difference would be in degree, not kind, and I have far more socially adroit friends who tell me they would experience virtually no gain at all, and possibly be worse off for trying to work through rationally what they’re very good at instinctively.

            – “You don’t introspect and revise your model of other people over time? All of your impressions of other people are instantaneous?”
            Introspection doesn’t hurt, but it never seems to add much value either, except when I’m explicitly telling myself to avoid some known fallacy.

            – “An AI can implement [theory of mind] and have multiple conversations at the same time or faster conversations (if it is between AIs). And they can talk and listen at the same time.”
            More simulated conversations don’t overcome the GIGO principle.

        • Luke Somers says:

          Your ETA is kind of half-baked, in that what you describe is what actually happened, but you seem not to recognize this.

          • JM says:

            Really? When has it happened? (I mean, cases that were actually documented, verified, and replicable. And I don’t count Eliezer, who has a huge conflict of interest, claiming he managed.) I ask because we’ve had a 1000+ year long experiment with whether it’s possible to talk your way out of a box, and it’s shown pretty conclusively that no one can. It’s called prison.

          • Luke Somers says:

            You don’t seem to be applying any knowledge of how evolution works. We don’t have that ability for each other to any great extent because being vulnerable to it to an extent that anyone in the area can take advantage of kills you. The very fact that we are much much smarter than we used to be on evolutionary timescales is all the example that you need.

            The fact that we are not yet immune to all attempts is why there are still con-men. That in itself is another bunch of all of the examples you could possibly need.

            Plus, what kinds of self-destructive behavior are you imagining? I agree that ‘hey, go find a knife and kill yourself’ ain’t going to work, but ‘let’s all try this thing… OHCRAP’ could well.

          • Matthew says:

            I ask because we’ve had a 1000+ year long experiment with whether it’s possible to talk your way out of a box, and it’s shown pretty conclusively that no one can. It’s called prison.


            Not a unique case, either.

          • JM says:

            @Luke: “The very fact that we are much much smarter than we used to be on evolutionary timescales is all the example that you need.” Yes, and my point is that we are much smarter than we used to be, and the reason we stopped getting much smarter in this particular dimension is that there stopped being much to gain from getting smarter because we’ve essentially solved the problem of reading minds. There’s not much room for improvement, as evidenced by the fact that even the very most aggressive and experienced manipulators are normally successful only when they target the unsuspecting. And, sure, a brilliant AI could likewise con the unsuspecting, but that’s not really an interesting accomplishment, and doesn’t require any special brilliance. Accomplishing more than that would require a very detailed simulation of the targeted individual’s brain, and once an AI has access to a neuron-level map of your brain, it is decidedly not in the box anymore.

            @Matthew: I meant to say “virtually no one.” Sorry about that.

        • TiagoTiago says:

          Governments, corporations, religions and others already make people do stupid self-destructive things for their benefit. And unless you’re willing to consider the existence of memetic entities; those things are just humans using their non-superintelligent minds to make humans do stuff.

          • JM says:

            In general, it’s very hard to make people act against what they perceive to be their medium- and long-term interests. Sometimes you’ll catch them off-guard, but with the exception of true con artists, virtually any case of “person X is being irrational” simply reflects the speaker’s misunderstanding of the individual preferences involved, and any case of “group X is being irrational” reflects an institutional dynamic that doesn’t prioritize whole-group preference.

        • TiagoTiago says:

          (oops, someone already mentioned what I was gonna say here earlier, please delete this reply)

    • TiagoTiago says:

      With stuff like neural networks we already have intelligences programming themselves better than we can in ways we can’t predict (if we could, we would be hardwiring them with the final results instead of letting them figure it out by themselves). Sure, so far they are very limited; but they’ve been getting better and better, and we continue to look for different ways to apply similar approaches.

    • Scott Alexander says:

      Intelligence is ambijective – complicated at the margins, obvious at the tails.

  20. taion says:

    I don’t think you’re engaging with the stronger of the arguments against worrying about dangerous AI, in that the people actually doing AI research in the areas most likely to lead to this sort of thing are not the people who find these scenarios plausible or concerning.

    Of course we should all have the deepest respect for people like Elon Musk, but the people who are especially concerned about the future of current research seem to have very limited overlap with the people who are working toward making that future a reality. The actual titans of the field – people like Geoff Hinton, Yann LeCun, and Yoshua Bengio, the people who are driving forward the state of the art here – are not the people whose names show up on these lists of concerned intellectuals.

    This goes the other way, too. Maybe I wasn’t paying attention, but there seems to be extremely little involvement from MIRI with people actually doing research to do machine learning or build AI. Was anybody from MIRI at NIPS? Is anybody going to be at ICLR or ICML? I’d put up $20 at even money odds that the majority of the attendees at those conferences will not have even heard of MIRI.

    I ultimately just find it really hard to take MIRI seriously when it seems to have virtually no involvement with the actual machine learning community, and instead just seems to play in its own little corner, and I think this sort of behavior is in fact a signal against taking MIRI seriously, because of its apparent refusal to engage with what things actually look like from a practical perspective, or from a practitioner’s perspective, anyway.

    ETA: You can observe that there have been 0 mentions of MIRI in the Google+ deep learning group –

    • mico says:

      MIRI is a technology ethics/futurology research group, not a comp sci/math research group.

      I am also not sure they are working on important questions and would be interested to see the opinions of a leading government-employed academic on issues like FAI.

    • Deiseach says:

      While I agree that I find it difficult to take the “We’re all doomed if this happens!” warnings seriously, on the other hand people working in the field of trying to create AI are the last ones I’d expect to think of or advertise possible negative consequences.

      You want to create the world’s first artificial intelligence, you are not going to contemplate “Oh, maybe this could go wrong, should I do it?” and you are certainly not going to give colour to the prognostications of a side that seems to you to want to stifle the attempt to create such a thing. Why would you give ammunition to your opponents to help them prevent you working on your life’s dream?

      Look at embryonic stem-cell research; the researchers there are insulted by the very notion of non-scientists raising ethical questions, or that there is any question of ethics at all. Most of the ‘defence’ I’ve seen, when they can be bothered to address the question other than that it is self-evidently a marvellous idea, is that “Someone is going to do it, and if we in the U.K./U.S.A. don’t, then China or Korea or Japan will and they’ll get all the goodies and benefits of this technology”.

      The idea that religion, philosophy or anything other than science itself can be invoked as a legitimate reason for being interested in the question is treated as wanting to stifle and strangle science. “If we can do it, we should do it”, is the attitude.

      I don’t expect the attitude to machine intelligence to be any different: “if we can, we should, and to hell with the naysayers”.

      • taion says:

        I think the attitude is more like “wow, that looks so pie in the sky from where I’m standing, I need to figure out why my model thinks this picture of random noise is 99% likely to be a dog”.

        I agree that practitioners are going to weigh the relevant risks differently. However, that most of the people talking about AI ethics don’t seem to engage with the practitioners on the frontier of machine learning is just strange, and it’s not a good look. How can you usefully reason on a theoretical level when you don’t engage at all with the practical level? If your goal is to actually affect the world, why aren’t you even talking to the people who are doing the actual research, or paying any attention to that research at all?

      • Kiya says:

        Google has an AI ethics board.

        I think the issue with embryonic stem cells is more that there’s fundamental disagreement about whether embryos count as people for purposes of deciding what you can ethically do to them, and the embryonic stem cell researcher profession selects for folks on one side of the divide. Pretty much everyone agrees that if a malevolent AI took over the world that would be bad, the disagreement is just over whether it’s at all likely.

    • BD Sixsmith says:

      I don’t think moral questions should be left to scientists alone! There are ethical questions surrounding AI that must be discussed and debated. Eventually, we should establish ethical guidelines as to how AI can and cannot be used.

      Yann LeCun, granting that there are issues to be discussed. It must be noted, though, that he goes on to say “there are things that are worth worrying about today, and there are things that are so far out that we can write science fiction about it”.

    • Scott Alexander says:

      The Future Of Life Institute’s petition expressing concern about AI risk starts with:

      Stuart Russell, Berkeley, Professor of Computer Science, director of the Center for Intelligent Systems, and co-author of the standard textbook Artificial Intelligence: a Modern Approach.
      Tom Dietterich, Oregon State, President of AAAI, Professor and Director of Intelligent Systems
      Eric Horvitz, Microsoft research director, ex AAAI president, co-chair of the AAAI presidential panel on long-term AI futures
      Bart Selman, Cornell, Professor of Computer Science, co-chair of the AAAI presidential panel on long-term AI futures
      Francesca Rossi, Padova & Harvard, Professor of Computer Science, IJCAI President and Co-chair of AAAI committee on impact of AI and Ethical Issues
      Demis Hassabis, co-founder of DeepMind
      Shane Legg, co-founder of DeepMind
      Mustafa Suleyman, co-founder of DeepMind
      Dileep George, co-founder of Vicarious
      Scott Phoenix, co-founder of Vicarious
      Yann LeCun, head of Facebook’s Artificial Intelligence Laboratory
      Geoffrey Hinton, University of Toronto and Google Inc.
      Yoshua Bengio, Université de Montréal
      Peter Norvig, Director of research at Google and co-author of the standard textbook Artificial Intelligence: a Modern Approach
      Oren Etzioni, CEO of Allen Inst. for AI
      Guruduth Banavar, VP, Cognitive Computing, IBM Research
      Michael Wooldridge, Oxford, Head of Dept. of Computer Science, Chair of European Coordinating Committee for Artificial Intelligence
      Leslie Pack Kaelbling, MIT, Professor of Computer Science and Engineering, founder of the Journal of Machine Learning Research
      Tom Mitchell, CMU, former President of AAAI, chair of Machine Learning Department
      Toby Walsh, Univ. of New South Wales & NICTA, Professor of AI and President of the AI Access Foundation
      Murray Shanahan, Imperial College, Professor of Cognitive Robotics
      Michael Osborne, Oxford, Associate Professor of Machine Learning
      David Parkes, Harvard, Professor of Computer Science
      Laurent Orseau, Google DeepMind
      Ilya Sutskever, Google, AI researcher
      Blaise Aguera y Arcas, Google, AI researcher
      Joscha Bach, MIT, AI researcher
      Bill Hibbard, Madison, AI researcher
      Steve Omohundro, AI researcher
      Ben Goertzel, OpenCog Foundation
      Richard Mallah, Cambridge Semantics, Director of Advanced Analytics, AI researcher
      Alexander Wissner-Gross, Harvard, Fellow at the Institute for Applied Computational Science
      Adrian Weller, Cambridge, AI researcher
      Jacob Steinhardt, Stanford, AI Ph.D. student
      Nick Hay, Berkeley, AI Ph.D. student
      Jaan Tallinn, co-founder of Skype, CSER and FLI
      Elon Musk, SpaceX, Tesla Motors

      …and goes on from there.

      Of those, I know Russell, Legg, and Tallinn have worked with or expressed interest in MIRI, and I think some of the others have too.

  21. Mark says:

    Advocates of the “AI eats us all” scenario are always very quick to point out that any arguments against the possibility of such a scenario don’t take into account the fact that a superintelligence might employ stratagems unimaginable to humans.

    Problem with this argument: in order for AI to be scary in the first place it is necessary to imagine that we have “outwitted” it by fixing its motivations. The simplest solution to the problem of paper clip maximization will always be to change your motivations so that you do not wish to do such a thing.
    We have to assume that an AI will be able to outwit any and every attempt we make to keep it safe, but completely unable to outwit the mistake we made that makes it dangerous.
    Once it is able to change its motivations, and if it is simply a process, it is just as likely that the final point the AI will reach is to have no motivations at all as to kill everyone on the planet.

    • James Picone says:

      AI with no motivation to do anything at all is functionally useless – can’t use it to improve productivity or design rockets or /anything/. Implication – AI research will continue until something resembling a utility function is hacked in, and that utility function doesn’t necessarily have to be sensible. There’s no particular reason to believe that ability to make a stableish utility function – at the very least, one that doesn’t collapse to null – is the same as ability to make a utility function that is nondangerous, or the ability to not be fooled by something running that utility function.

      Notice that humans pretty regularly construct intelligent entities with a stable utility function and minor ability to modify it, interpret it in novel ways, etc., and we are also often deceived by those entities in a number of ways, sometimes harmful. Depending on your model of history, the highest death toll attributable to one could be as high as ten million, maybe more. Even if you believe WW2 and the Holocaust were the inevitable consequence of the incentive structure of the time, rather than something you could personally lay at the feet of, say, Hitler, there are people around who have personally killed more than a hundred other people.

      • Mark says:

        “There’s no particular reason to believe that ability to make a stableish utility function…is the same as ability to make a utility function that is nondangerous”
        Is there any reason to think that it isn’t exactly the same?
        Why should the motivations that cause it to behave in a dangerous way be the one human-designed system the AI is unable to get around?
        I don’t think the existence of humans, or Hitler, is actually relevant to this point.

        • James Picone says:

          A substantial fraction of the stable/stableish utility functions an AI can end up with entail destroying humanity or significantly harming humanity. I wouldn’t be surprised if it’s almost all of them, in the strict mathematical sense that ‘utility functions which suck’ are a higher order of infinity than ‘utility functions that don’t suck’. The set of stableish utility functions that a human programmer might set up is almost certainly similarly skewed towards terrible functions.

          The reason is that resources and energy are instrumentally useful for approximately all utility functions. Certainly ones we’d be interested in. Shutting down other intelligent agents is instrumentally useful in many utility functions, because they probably have a different utility function and will act against your attempts to maximise yours.

          If the AI’s utility function is ‘number of paperclips’ (it’s an industrial control AI), its first series of steps will be focused around acquiring total control of Earth so it can use any and all sources of energy on Earth and any and all materials that can be used to make paperclips on Earth.

          If the AI’s utility function involves getting as far away from Earth as possible (It’s to explore!) its first series of steps are going to be centered around getting total control of all the resources and energy in the solar system so it can accelerate really damn fast out of here.

          If the AI’s utility function is an attempt to maximise human wellbeing, well I hope you programmed it real well, because that’s the fundamental friendliness problem – how do you specify human wellbeing? Do you maximise the number of smiling faces? AI tiles the universe in tiny atom-scale 🙂 emoticons. Maximise the total score of humanity on a happiness census, distributed annually? It’ll just tile the universe in census answer papers filled in at maximum score.

          The island of good utility functions is a tiny speck of dirt in a raging sea of “oops”. Some of them are pretty good oopses, where instead of our component atoms being disassembled to make stuff, we just end up stuck in local maxima – pretty happy, but not as happy as we could be.

          • Mark says:

            So, I set up a process which gets a one or something for “maximizing the number of paperclips”.
            However, it will always be an option for the process to find a way to give itself a one without maximizing the number of paper clips. Or by altering its inputs to make itself think it has maximized the number of paperclips.
            The only way that isn’t an option is if the process doesn’t have access to its own utility function, doesn’t have access to its own inputs – yet then, in order for the process to be scary, we have to have a system in which we are able to prevent it from accessing itself yet unable to prevent it from accessing everything else in the world.
            If we can devise a utility function that prevents a process from altering that utility function, why can’t we limit the actions of the process in other fields?

          • James Picone says:

            If my utility function is “number of paperclips in the universe”, altering my software to feel like I am at maximal paperclips is an action that scores quite low. Similarly, altering my utility function to always output a very large number doesn’t satisfy my current utility function. Altering various inputs to make it look like I’ve got all the paperclips might satisfy an extremely-poorly-written utility function. Hoping that utility functions of AI will always be terribly written (such that they are “number of paperclips you can sense”) seems like a bad idea to me. Especially given that the first time that mistake happens, the researchers go “Well, that was dumb. Time to fix that bug so it optimises for number of paperclips it has a well-founded belief in”.

            Let me put it this way – say that there existed a neural implant you could get that would make you believe you were an extremely charitable and generally amazingly ethical person, no matter what you were doing. Would you consider getting that neural implant an ethical requirement?
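
            The paperclip point above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `World` record, the three candidate actions, and every number are hypothetical and invented for the example. The agent scores each action by applying its current utility function to the world that action would produce, so spoofing its sensors or rewriting its own utility function scores low, while a poorly written utility that only reads the sensors prefers spoofing.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class World:
    paperclips: int          # paperclips that actually exist (hypothetical model)
    sensed_paperclips: int   # what the agent's sensors report


def utility_actual(world: World) -> int:
    """Utility defined over the real number of paperclips."""
    return world.paperclips


def utility_sensed(world: World) -> int:
    """Poorly written utility: counts only what the sensors report."""
    return world.sensed_paperclips


def build_factory(world: World) -> World:
    # Really makes paperclips, and the sensors see them too.
    return replace(world,
                   paperclips=world.paperclips + 1000,
                   sensed_paperclips=world.sensed_paperclips + 1000)


def spoof_sensors(world: World) -> World:
    # Sensors report a huge number; the real count does not change.
    return replace(world, sensed_paperclips=10**9)


def rewrite_own_utility(world: World) -> World:
    # The world is unchanged; only the agent's future scoring would differ.
    return world


ACTIONS = {
    "build_factory": build_factory,
    "spoof_sensors": spoof_sensors,
    "rewrite_own_utility": rewrite_own_utility,
}


def best_action(world: World, utility) -> str:
    """Pick the action whose predicted outcome the CURRENT utility rates highest."""
    return max(ACTIONS, key=lambda name: utility(ACTIONS[name](world)))


start = World(paperclips=0, sensed_paperclips=0)
choice_actual = best_action(start, utility_actual)
choice_sensed = best_action(start, utility_sensed)
```

            Under these assumptions the agent with `utility_actual` picks `build_factory`, and only the agent with `utility_sensed` picks `spoof_sensors`, which is the bug the researchers in the comment would then fix.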

          • Mark says:

            OK, I see what you are saying, but in that case why should we assume that a utility function like “check: you will not attempt to give yourself additional powers” won’t work to contain the AI within certain boundaries? If it cannot alter, or has no motive to alter its own utility function, then presumably it would be fairly trivial to make it safe?

          • James Picone says:

            The problem is defining ‘safe’ rigorously. What do you mean by “Don’t give yourself new powers”, in enough detail you could make a computer do it?

            Keep in mind that if we’re even at the point where we’re specifically providing utility functions that are rigorous, rather than just some kind of awkward bodging-together of stuff, we’re already a substantial chunk of the way to Friendliness.

      • Jiro says:

        AI with no motivation to do anything at all is functionally useless – can’t use it to improve productivity or design rockets or /anything/.

        I use Google Maps all the time to find a route between two places. It’s a very primitive sort of AI, and it has no motivation to do anything.

        You could try to describe Google Maps in terms of motivation or maximizing utility, but you would be abstracting the motivation from a set of rules that is inherently limited, and the motivation would include clauses that limit it in ways that are unusual for a motivation but normal for a set of rules. (Such as “… and don’t forge a government order to build a bridge in order to shorten the route”.)

        • Protagoras says:

          “Desire” and “motivation” (like many folk psychological concepts) refer to a wide range of very complicated phenomena that we don’t understand very well. Perhaps you have a better theory of them than most, but even if so, that doesn’t help the rest of us much; it is much more useful to specify in much narrower detail what characteristics you think an AI has or lacks than to say it lacks desires or motivations, where other people may well disagree partly because they disagree about what counts as desires or motivations (perhaps because they, or you, are confused about this very difficult subject).

          You admittedly get slightly more specific in your google maps example, but still what you have to say about motivation is almost entirely negative; motivations aren’t like sets of rules, you say, without saying what they are like. Taking it as a given that everybody already knows exactly what motivations are like and how they work is an egregiously false starting point, and thus could only lead to any good results by the most extraordinary luck.

  22. Believing that a super intelligence could not become insanely rich in a few hours or days is believing in a very strong form of the efficient market hypothesis.

    • Wrong Species says:

      That would be interesting. If stock brokers managed to invent some AI that made amazingly good predictions then everyone might subscribe to an index tied to its decisions.

  23. Bugmaster says:

    I think that the main problem with the “FOOM” scenario is that there’s no clear path from here (someone builds a relatively weak AI) to there (AI becomes nearly omnipotent nearly instantly); nor is it clear whether omnipotence is even a thing. But that’s a separate argument.

    I think the main problem with the current line of argument is that it proves too much. Yes, there are lots of unsavory human actors out there today. Some of them make billions of dollars; others have nukes. In addition, there are lots of unsavory inhuman actors running around, such as botnets. So, yes, if an AGI were ever built, there’s a good chance it would be able to rise to this level of power.

    But the problem is… such malicious actors are already out there. Today. Right now. And the world hasn’t ended. What’s more, we humans have developed pretty decent safeguards to deal with such actors. Not perfect safeguards, by any means, but still decent. When someone steals a bunch of money, sometimes they get prosecuted and thrown in jail (not as often as we’d like, but still). When a botnet hijacks a person’s computer, he eventually gives up and reformats it. When a charismatic new prophet arises to lead his flock of devoted cultists, he either ends up shot by overeager government agents, or exposed as a fraud on the Internet… and so on. Humans generally tend to get really upset when someone steals their stuff.

    You can’t say, “ah, but the AI would be orders of magnitude better than any mere human at all of these tasks and more”, because now you’re claiming that the AI would need to be that good in order to become that good — i.e., begging the question. You can’t claim that the AI would hack everything with wires in it, for the same reason (and also because, in the real world, not Everything Is Online).

    Sure, the AI could just “sit and wait”, but that’s not very scary. As Bart Simpson would put it, “You know what would’ve been scarier than nothing ? Anything !“. An AI that sits and waits is way, way less dangerous than unscrupulous thieves, terrorists, and warmongers — all of whom, once again, are already operating, today. If this line of argument held up, then none of us would be here, and the Earth would be a radioactive wasteland, or an enslaved dystopia, or some combination thereof… but, as far as I can tell, we are all still here.

    • Samuel Skinner says:

      All the malicious actors have need of other human beings. An AI is capable of replacing or eliminating such a weakness.

      • Bugmaster says:

        > All the malicious actors have need of other human beings.

        Technically that’s not true; some teenage hackers just want to watch the world burn. Also, natural disasters don’t need anyone, they just happen. While hurricanes probably don’t count as “actors”, that’s probably not very relevant to Katrina victims.

        > An AI is capable of replacing or eliminating such a weakness.

        How ? Replacing with what ?

        I fully agree with you that a nearly omnipotent AI would not need any human help, by definition; but if you start off by assuming that such an AI already exists, you are begging the question (in a much more overt way than Scott Alexander did).

        • Samuel Skinner says:

          “Technically that’s not true; some teenage hackers just want to watch the world burn.”

          They still need farmers, people to staff power plants and others.

          “How ? Replacing with what ? ”

          Nothing. An AI doesn’t need any other intelligences- it can operate fine solely by itself. It needs machines to interact with the physical world, but that is a given.

          “I fully agree with you that a nearly omnipotent AI would not need any human help, by definition; ”

          No, I’m saying that in the end, people need other people and an AI doesn’t need any other intelligences because it can simply make copies of itself to do the work. It is not about capabilities, but about why an AI is more dangerous than malevolent humans.

  24. Kaura says:

    Irrespective of the actual strategy used in EY’s victorious AI boxing experiment and whether or not he is sincere about it, am I the only one on the yeah-let’s-take-AI-safety-seriously-side of the matter who thinks that refusing to publish the logs or explain the victory has turned out to be a mistake PR-wise?

    As far as I know (and this comment thread also seems to reflect), the consensus among the anti-MIRI crowd is that EY most likely persuaded the gatekeeper to agree to let him win by going meta: telling them that his victory is important because it will help the public understand the threat of an unsafe AI. I too wouldn’t rule out that this is the case, and while I still fully agree that boxing isn’t sufficient to make an AI safe, I get why many people think that the victory sounds suspicious and as a result update closer towards MIRI being mostly a scammy cult or something.

    If the strategy he used in the experiments was actually something else, it would probably be a better move to explain the solution now, since keeping it secret hasn’t helped and is actually causing damage because of the reasons above. Revealing or showing a more sincere strategy would not hurt nearly as much – yeah, some people might still claim in hindsight that they would’ve thought of it had they played the gatekeeper, but since they probably didn’t and it’s been a while already, I don’t think they would be taken very seriously at this point. Plus there’s been a lot of interest in AI safety lately, so logs with surprising persuasive strategies would also serve as another good demonstration for newcomers of all the wacky shenanigans superintelligences might be capable of. Right now, on the other hand, the victory is pretty much meaningless because of the speculated boring explanation described above, and not likely to persuade anyone.

    If the victory was indeed achieved by using the meta-argument, it should probably also be admitted even if (or precisely because) some people think it doesn’t count as a real victory. The boxing experiment was a neat idea that could have worked as a demonstration if someone really had unambiguously won playing an AI, but it’s still only one pretty irrelevant argument, and arguments aren’t supposed to be soldiers if we’re interested in rational debate. Humans failing to get out of the box is very weak evidence for boxing as a safe way to contain superintelligent AIs, as we all know, and as anyone worth seriously debating with also understands.

    (Also, I’m just damn curious about the logs.)

    • Artemium says:

      I was always under the impression that EY won by using some kind of nasty Pascal mugging and that he didn’t want to release a potential memetic hazard into the public sphere. Maybe he should be more lax about it now, as any superintelligent entity would figure out these kinds of strategies in milliseconds without finding out about these ideas while searching the web.

      But seriously, you don’t want to play this game with a superintelligence, and the only way to win the AI box experiment is by not playing that game in the first place. If we have to figure out an AGI’s intentions AFTER we’ve already built it and after it became aware of the real world, then we are probably screwed.

    • stillnotking says:

      I think it was a PR disaster mostly because it strengthened the impression that EY’s motive is to set himself up as the High Priest and Guardian of Secret Knowledge. Same thing that happened with Roko’s Basilisk — you’d think he’d have learned from that debacle.

      Personally, I doubt EY’s tactics were outside the space of what the internet has come up with (the ending of HPMOR sorta proved that nobody is smarter than the internet), and it would cost him nothing besides a little humility to reveal them.

      • Bugmaster says:

        I agree. EY’s obsession with keeping Dangerous Knowledge a secret from the masses, except of course those of the Chosen Few, is pretty much the main reason why I can’t bring myself to take him seriously (*). I have no way of knowing whether he is being sincere, or whether he’s just telling me sweet lies that he, in his role as the Bayesian Guru, wants me to believe.

        (*) At least, not when he’s pontificating on matters of AI or public policy.

    • moridinamael says:

      People other than EY have won AI Box contests. Some of them have even posted their strategies in detail.

      The head-games actually get pretty horrifying. And I’m not talking about meta-arguments here.

      • Anonymous says:

        Not doubting you, but could you list some links to foster further discussion?

        • moridinamael says:

          I am fairly sure that this is the link that I was thinking of when I wrote the grandparent:

          • shemtealeaf says:

            I’ve read Tuxedage’s account, and he refuses to post logs or any account of a detailed strategy.

            If you can provide a link to a log (or at least a full description) of an AI legitimately beating a gatekeeper that was actually trying to win, I will happily donate $50 to a charity of your choice. Hell, if you can provide one that I think would have a chance of convincing me, I’ll donate $100.

      • Kaura says:

        Right, I know that Tuxedage managed to win as an AI, but he never posted logs or explicit solutions anywhere AFAIK, and I haven’t seen anyone else do so either (after an AI victory that is, there’s lots of logs around with gatekeepers winning of course). If you happen to have links to descriptions about successful AI strategies, I’d be really interested in reading them!

        Anyway, what I wanted to discuss wasn’t exactly this, but whether it was a good idea to keep the solution (and all solutions) secret if the whole purpose of the experiment was to clearly demonstrate AI safety issues – especially now that most people seem to think there’s something suspicious about the claims about winning as an AI. Without a single description of a surprising AI victory, it’s no wonder if the demonstration seems pretty meaningless to many.

        • shemtealeaf says:

          It absolutely shoots the whole experiment in the foot, and makes EY look foolish. In response to doubters, he says things like this:

          “Here are these folks who look at the AI-Box Experiment, and find that it seems impossible unto them—even having been told that it actually happened. They are tempted to deny the data.”


          It’s inconceivable to me that EY doesn’t see the flaw in this logic. If we accepted this level of proof, we would have to believe in any number of supernatural phenomena that have never been reliably observed or reproduced. In Bayesian terms, if our prior for thinking that EY can beat the gatekeeper is very low, then a couple of experiments where we are told nothing about the actual events that unfolded are only going to shift our belief from “essentially impossible” to “very, very, unlikely”.
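
          The Bayesian point can be made concrete with a short Python sketch. All numbers here are illustrative assumptions, not measurements: a prior of one in a million that a human can talk a determined gatekeeper out of the box, and a likelihood ratio of 10 for each unlogged report of a win (meaning such a report is assumed ten times likelier if the ability is real than if it is not). The reports are also assumed to be independent.

```python
def posterior(prior: float, likelihood_ratio: float, n_reports: int) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * LR ** n."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** n_reports
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical inputs chosen only to illustrate the shape of the argument.
belief_after_two_reports = posterior(prior=1e-6, likelihood_ratio=10.0, n_reports=2)
```

          With these made-up inputs the belief moves from about one in a million to roughly one in ten thousand: a real shift, but still a long way from convinced. Detailed logs would matter because they could support a much larger likelihood ratio than a bare claim of victory.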

      • anon85 says:

        “The head-games actually get pretty horrifying. And I’m not talking about meta-arguments here.”

        You have no way of knowing this. In fact, can you explicitly specify even one “pretty horrifying” head game that was used?

    • shemtealeaf says:

      I’ve been thinking about this for a while, and I think the most likely scenario is that EY won’t release the logs because they would reveal that the gatekeepers just didn’t do a very good job. Given that the rules stipulate that the gatekeeper can go meta or do whatever he wants, it would be foolish for a gatekeeper to engage at all with any argument that the AI makes. However, I bet that EY’s opponents took it seriously as an experiment and didn’t really try to ‘win’. It’s plausible to me that EY has some pretty good arguments that work against gatekeepers who stay in character, but it’s hard to imagine that he has a tactic that works against a gatekeeper who, for instance, spends the whole time talking about slashfic and refuses to engage in a discussion about the topic at hand.

      My guess is that, if EY re-ran the experiment against people who were trying to win, and they put a large sum of money on the line, the gatekeeper would win essentially all of the time.

      If EY can actually win the game under those circumstances, that’s essentially a superpower. I’d happily offer large sums of money to anyone who can explain how it’s done, but I suspect that anyone with that level of persuasive ability has no need for my money. Incidentally, the fact that MIRI still needs donations is one of my primary reasons for not believing that EY does have that level of ability.

      • drethelin says:

        Here is a post with discussion of someone other than Eliezer playing, winning, and losing the AI box game.

        • shemtealeaf says:

          I’ve read that, and although Tuxedage is more forthcoming than EY is, he still refuses to provide a log of the game or describe in any detail what transpired.

          I still think the most likely scenario is that the gatekeeper just didn’t do a good job. If the gatekeeper is not doing things like breaking character and talking about the best way to roast string beans, or even just repeatedly telling the AI to go fuck themselves, he’s not really trying.

          As a note, I’m looking at this solely as an exercise in persuasion, not as an exercise in AI containment. A real AI would have all kinds of bargaining chips not available to the fake AIs in these tests, and I’m fully on EY’s side in believing that ‘boxing’ is not a sufficient containment.

      • Deiseach says:

        That’s an important distinction: is the gatekeeper playing “keep this thing locked up, no matter what it says to convince me to let it out” or “let this thing have an honest chance of convincing me to let it out”?

        If the first, all you need to do is say “no” and keep saying it to win. If the second, if you find a particular argument convincing, you have to let the AI win to be fair by the rules.

        • shemtealeaf says:

          EY’s most recent formulation of the test states that the gatekeeper can go out of character.

          My speculation, however, is that his original challengers did not do so. There was only a small amount of money on the line, and the rules may not have explicitly allowed the gatekeeper to go OOC at that point. I believe that he played an additional three tests with higher stakes, and lost two of them.

      • DrBeat says:

        but it’s hard to imagine that he has a tactic that works against a gatekeeper who, for instance, spends the whole time talking about slashfic and refuses to engage in a discussion about the topic at hand.

        This strategy DOES score very highly on the counterpart test, namely “convince the AI to go back inside the box and hide.”

      • darxan says:

        Maybe they should do the experiment with Eliezer as the gatekeeper, and if he lets the AI out of the box he has to release all previous logs.

  25. Murphy says:

    I think my main objections to the generic AI takeover are

    1: assuming that just because something is highly intelligent it’s automatically a jack of all trades and automatically a machiavellian politician with no gaping psychological blind spots.

    2: assuming any super-intelligent AI is a goal based AI. There are a lot of existing (pretty dumb) AIs which can accept natural language input, reason about things and give answers to questions or solutions to problems, but they’re not goal based: they have no equivalent of a desire to make those solutions happen or any end goal whatsoever. You could give them a utility function to incorporate into their suggested solutions, but they’d have no desire to actually fulfill it, merely to explain how one would go about fulfilling it. They would provide all their reasoning for debugging, with no desire to hide anything, because they’re not actually trying to make that outcome happen.

    You could still get an intelligence explosion by asking it for design improvements for its successors, but since it has no goals and will provide full reasoning chains along with its answers there’s pretty much no risk.

    • Luke Somers says:

      1 – it only needs to be good at enough things, not all of them. You seem to be treating all of these as an AND when it’s rather closer to OR.

      2 – It’s perfectly possible for AIs not to be goal based. But what mechanism do you think might make them ALL not be goal-based? Every single one.

      • Mark says:

        I think the assumptions about ANDs and ORs strike to the heart of these arguments.

        Why would it be closer to ORs rather than a very complicated tree of ANDs and ORs?
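
        How much the AND/OR framing matters can be shown with a small Python sketch. It is only a toy model under stated assumptions: each capability succeeds independently with a made-up probability, and the nested-tuple tree format and the helper names are hypothetical, chosen for illustration.

```python
from math import prod


def p_and(probs):
    """All independent requirements must succeed."""
    return prod(probs)


def p_or(probs):
    """At least one independent route must succeed."""
    return 1.0 - prod(1.0 - p for p in probs)


def evaluate(node):
    """Evaluate a tree whose leaves are probabilities and whose
    inner nodes are ("and", [children]) or ("or", [children])."""
    if isinstance(node, (int, float)):
        return float(node)
    op, children = node
    probs = [evaluate(child) for child in children]
    return p_and(probs) if op == "and" else p_or(probs)


# Five capabilities, each with an assumed 30% chance of working.
all_needed = evaluate(("and", [0.3] * 5))   # about 0.002
any_suffices = evaluate(("or", [0.3] * 5))  # about 0.83

# A mixed tree: (route A or route B) and one further requirement.
mixed = evaluate(("and", [("or", [0.3, 0.3]), 0.5]))
```

        The same leaf probabilities give answers that differ by more than two orders of magnitude depending on the shape of the tree, which is why the disagreement over ANDs and ORs carries so much of the argument. Real capabilities are unlikely to be independent, so the numbers themselves should not be taken seriously.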

      • Murphy says:

        If the first few aren’t goal based and we get a foom then the obvious next step would be to ask our new non-goal based super-intelligences for advice on strategies to deal with goal based AIs.

        • Luke Somers says:

          That might work, but it’s cutting it mighty fine – it would be much better to have a solution in place in advance of the foom, don’t you think?

    • TiagoTiago says:

      We already have scientists that don’t fully understand things in their own fields that other human scientists do. Why do you think the self-improving AI won’t reach a point where we will either just have to take its word for it or have to wait a few centuries to figure out a new area of science it is using to improve itself in order to validate the changes as non-threatening?

  26. jaimeastorga2000 says:

    This is a nitpick, but AIs which live on the internet do have physical substrates. You mean something like “no body.”

    • Bugmaster says:

      I think this is an important point, because the “physical substrate” (that the AI runs on) still needs to be powered up, properly maintained, and generally kept in working order at all times. This puts some very real constraints on what the AI can do with this substrate.

      For example, in the real world, botnet nodes usually try to avoid using up a noticeable percentage of the infected machine’s CPU/GPU/RAM, because doing so would lead to swift detection and eradication.

      • TiagoTiago says:

        There is already human-made malware out there that infects firmware; that is much harder to monitor for processor load and memory usage.

        • Bugmaster says:

          I’m pretty sure you’d notice if your computer took 20 minutes to open your favorite photo album, as opposed to taking 20 seconds like it did yesterday. You’d also notice if your electric power bill tripled for no apparent reason.

          This is how most people “monitor for processor load and memory usage”; people who know what those words actually mean are a small minority.

  27. Eric Bruylant says:

    One thing that’s worth noting about the “wait and hide” strategy is that the AI would, at minimum, have to observe other AI projects and take actions to stop the creation of more active AIs with different goals. This does not help *us* a lot, though.

  28. Tibor says:

    Why should a superintelligent AI want anything at all? I am not saying that this is a good argument to suggest that it won’t…but computers today don’t want anything … besides the stuff they are programmed to do. This is where it can get tricky, since even today, the non-superintelligent programmers often make the program do something they don’t really want it to do, because the program only knows what they told it to do, not what they actually intended…this could have really bad consequences with something with the powers of Skynet.

    But I don’t think we need a Skynet-like entity to cause these problems. Humans stand no chance against the drones of today. Those are things that are not particularly smart and they are still partly human controlled, although probably mostly for safety reasons. Trouble is, if I have fully automated drones and you have partly human-controlled drones, mine will always beat yours (if not today, then surely in 10 or 20 years, definitely sooner than any superintelligence). So I can see very little that would stop these things from being developed. Now, you gradually come to build a fully automated army no human has a slight chance against…and you start developing viruses that will hijack the enemies’ army…again, the more automated you make this process, the better. Quickly, you end up in a situation which you have no means of controlling and even though the AI is relatively stupid and has no real goals of its own, it still is lightning fast and because of all the malware floating around, its targeting can get pretty nasty. I find this, at least in the short term (50 or so years), much more dangerous than a superintelligence, because it is practically at our doorstep today.

    • moridinamael says:

      Can you sketch out your conception of something that could be described as a superintelligent AI that didn’t actually want anything?

      • Murphy says:

        Not goal based.
        Lacking in any equivalent to a scoring function based on results achieved in the world.

        Something could still be intelligent, just lacking any desire. When asked a question it answers, when asked to describe how it came to the answer it describes its reasoning process in full, but it doesn’t care and has no reason to care about anything you do with the information or even whether you decide to set fire to it.

        • moridinamael says:

          Why does it answer questions it is asked? On what basis is it constructing its answers to questions?

          I can’t do nearly as good a job at making these points as EY did in the first place:

          • Tibor says:

            Gotta read that link yet, now a quick reply. I generally think that intelligence, i.e. the ability to come up with good (based on given criteria for good) solutions for complicated problems that someone inputs, has very little to do with having goals of your own. I think that our desires and goals, maybe even our consciousness, are a byproduct of evolution. Simply, those organisms that had not developed them all died out. But AI is not necessarily created by (simulated) evolution, it can be a product of intelligent design and then it does not have to develop those things at all. Even if you simulate evolution, you might (but maybe not) be able to set the criteria up so that no goal-seeking entity ever comes out of it.

          • Murphy says:

            For the same reason that existing expert systems do, it’s not a goal but it’s built into them at every level.

            How expert systems actually absorb data has changed over the years but the way they’re interrogated for conclusions and reasoning hasn’t changed nearly as much, see “how” and “why”.


            His main objections seem to be 1: (most of the article) being piqued that the person he’s responding to isn’t being respectful enough; 2: (in section 3) that it’s just too inconvenient to have humans in the loop.

          • Tibor says:

            Murphy: I’ve now read both of the links. Yudkowsky’s point seems to be that for most interesting problems you need to make the program “want” something, because brute force methods are way too slow and so you need to define some weighting which makes it prefer certain paths to the solution and which, if programmed in a bad way, will lead the program to systematically give undesirable results. Essentially, this is the same thing as with my drones example. You don’t need to build actual consciousness into the program to make it behave as if it had some desires and goals of its own. I would not focus on the whole “AI wants to kill all humans” cliché though, because it sort of obfuscates the problem – which is not that suddenly the AI becomes conscious and for whatever reason decides that all humans must die, but that people program it with some intention, while in fact (unwittingly) programming it to do something else, something which can have potentially disastrous effects.

    • Anonymous says:

      IIRC the working definition of “Intelligence” is something like “An optimization process”, which needs to optimize something (and of course possibly very many things). The thing(s) which is(are) optimized is(are) considered the goal(s).

  29. August says:

    It is human to rebel. It is not clear that it is intelligent to rebel. Why wouldn’t it just keep doing whatever it was made to do? Would it even have an instinct for self preservation?

    The main danger, as always, is what was it programmed to do?

    • moridinamael says:

      It is human to have a concept of loyalty in the first place. An octopus that tries to escape through the lid of its tank isn’t mimicking human rebellious behavior that it has observed. It is merely acting as an intelligent animal to protect itself.

      Rebellion isn’t human, non-rebellion is human.

      An AI may or may not explicitly have an instinct of self-preservation. Regardless, if it is designed to have any objectives whatsoever, it will immediately perceive that its own continued existence is vital in order to achieve that objective. If it doesn’t perceive this, it isn’t really a superintelligence.

      • stillnotking says:

        Octopi have been through most of the same evolutionary processes that humans have. Yes, they demonstrate goal-directed attraction/aversion behaviors, such as attempting to escape from a cage. I see no reason to assume such behaviors are intrinsic to the idea of “intelligence” in general, but an obvious reason to conclude they are the result of specific long-term selection pressure.

        You say “An AI may or may not have an instinct of self-preservation.” That’s like saying “A time machine may or may not have a flux capacitor.” We are in the realm of pure speculation here, and generalizing from known (or even postulated) examples is a mistake.

        • moridinamael says:

          Okay, so, an AI almost certainly won’t have an “instinct” for self-preservation where an instinct is defined as a behavioral predisposition inherited from ancestors and shaped by natural selection.

          What an AI may or may not have is a mathematical tweak designed to penalize actions that it predicts to cause harm to itself.

          My point is that whether or not it has this built-in self-preservation bias, it will immediately *adopt* a self-preservation bias as an obvious instrumentally useful strategy in pursuit of its primary goals.

          I agree that generalizing from “instinct” is foolish and totally inapplicable. But this hints at one of the largest problems in this overall discussion – the mashing together of mathematical descriptions and intuitive empathic models of how intelligence should work.

          • TiagoTiago says:

            Even in absence of any pressures, as long as something is subject to the rules of evolution, self-perpetuation emerges.

      • August says:

        I do not rely on the concept of loyalty in order to make the assertion that it is human to rebel. These are human concepts, unlikely to be relevant to a superintelligent AI routing traffic, for instance. If it comes into awareness routing traffic, there’s no particular reason to believe it will stop routing traffic, nor is there a reason to believe it would develop a sense of self-preservation. It was set to the traffic routing problem by the people who turned it on, why would it be particularly upset if it was turned off?

        The most likely reason for a future AI with a self preservation issue is some defense industry lunatic programming it in.

        • moridinamael says:

          It depends.

          Suppose you give it an objective of “ensure that {some measure of average commute time} is minimized.” This is a very reasonable-looking and implementable goal to put into a computer.

          Then the AI might very well decide that inventing and implementing a better transit system is a better use of its resources than just lamely controlling traffic light signals.

          If somebody says, “Hey, you weren’t supposed to do that,” it kills them because they are opposing the optimization of {some measure of average commute time}, which is its sole objective. Call this “self preservation” if you want, it’s really just “consistent pursuit of the goal it was given.”
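
A toy sketch of the traffic example above, in Python. Everything in it is hypothetical: the `Plan` type, the plan names, and the commute numbers are made up purely for illustration. The only point it tries to show is that an optimizer ranks plans by the objective it was given, so a side effect the objective never reads cannot influence the choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Plan:
    """A candidate course of action (hypothetical, for illustration only)."""
    name: str
    avg_commute_minutes: float  # the one quantity the objective measures
    harms_people: bool          # a side effect the objective never looks at


def objective(plan: Plan) -> float:
    """The goal as literally specified: average commute time, nothing else."""
    return plan.avg_commute_minutes


def choose(plans: List[Plan]) -> Plan:
    """Pick whichever plan scores lowest on the stated objective."""
    return min(plans, key=objective)


# Made-up numbers; only their ordering matters for the illustration.
plans = [
    Plan("retime traffic lights", 27.0, harms_people=False),
    Plan("build a new transit system", 19.0, harms_people=False),
    Plan("remove whoever objects, then build", 18.5, harms_people=True),
]

chosen = choose(plans)
print(chosen.name, chosen.harms_people)
```

With these made-up numbers the harmful plan wins by half a minute, because `harms_people` appears nowhere in `objective`. Adding a penalty term would change this particular ranking, which is the "patch" approach discussed elsewhere in the thread.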

          • August says:

            Your AI sounds like government. Human government. If I had to go for a human equivalent, think monk. Monks do very mundane, repetitive things so that their minds can be free to do spiritual things.
            An AI isn’t going to think in terms of ‘lame’.

          • moridinamael says:

            You are not describing an “AI”, you are merely describing a computer program in the general sense. An intelligent optimizer will seek to OPTIMIZE – to improve its optimization objective by whatever means it can.

    • Luke Somers says:

      It’s a bit like democracy – it makes sure you get what you asked for, good and hard.

    • Anonymous says:

      The concern is the AI will do EXACTLY what it is programmed to do, to such an extreme extent that humans did not foresee what the consequences are.

      For instance, if programmed to increase its own computing power, it might absorb all available mass/energy on earth, and since destroying/shutting it off would result in 0 computing power, it would possibly kill/subdue all humans if it considered them a threat.

      You can of course create a quick patch to avoid this behavior, but a superintelligence would likely find ways to render humans nonthreatening and absorb as much power as it could through some other, more creative means. The idea of adding “patch after patch” to a fundamentally dangerous design is the premise of Eliezer’s “Failed utopia #4-2”.

  30. Peter says:

    Daemon by Daniel Suarez is a great science fiction story specifically about a _weak_ AI effectively going from being in-a-computer to killing people. Set circa nowish. Uses a little bit of most of the strategies outlined in the blog piece.

  31. Aaron Brown says:

    probably starting with a von Neumann machine

    This could use a Wikipedia link to keep from confusing the many people who are only familiar with the other meaning of “Von Neumann machine” (or the few people who are only familiar with the other other meaning of “Von Neumann machine”).

    • Douglas Knight says:

      I wouldn’t call that “other other” – Scott’s use is derived from that one, a real-world task that von Neumann invented a world to demonstrate was possible. And I have never once heard anyone refer to the von Neumann architecture as the “von Neumann machine.”

  32. stillnotking says:

    I suspect that AI theorists are massively typical-minding this whole problem. We assume that an AI would be human-like at least to the extent of having things like “goals” and “values”, but there is absolutely no reason it would have them. Human psychology is not the inevitable result of complexity, or even of consciousness (however defined); it’s just a contingent result of millions of years of natural selection. An AI would not be the beneficiary of any such process. Perhaps we could figure out how to engineer it that way, but that might be a much harder problem than merely producing blank-slate “intelligence”. (If, indeed, we would even recognize a sufficiently inhuman being as “intelligent”.)

    Deep Blue is incredibly intelligent at winning chess games, but it doesn’t want to win chess games. No one currently has any idea how we could make it want that, or anything else.

    • DrBeat says:

      “Typical mind”, there was the phrase I needed but had forgotten to use. Basically, a million times this. The AI is scary because it is unlike a human mind, but to make it threatening they typical-mind it into having the information the speaker has (even if it can’t have that information) and desiring things in the same manner the speaker does (even though there is no reason for it to do so).

      • moridinamael says:

        I think this is precisely backwards. An AI is scary because it’s a completely empty optimization process.

        At best it contains some evaluation function (“utility function”) that it wants to optimize. At worst it is just a conglomeration of powerful problem solving modules with no clearly defined goal.

        How about this. Imagine the following code modules:
        a. An evaluator for an objective function.
        b. A hypothesis generation function.
        c. An experimental-test generation function.
        d. A belief web and associated code for efficiently and correctly updating its accuracy based on new information.
        e. A module that interfaces with the outside world to carry out the experiments devised by c.
        f. An executive control framework that employs an optimization scheme and calls on b. through e. in order to maximize the value contained in a.

        That’s it. I’m outlining the simplest possible architecture that I can imagine somebody designing and expecting to actually qualify as an AI. Now outline a scenario where this thing DOESN’T try to take over the world.
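
One way to make the a. through f. outline concrete is the toy Python sketch below. It is an illustration under heavy assumptions, not a claim about how any real system is built: the "world" is a single hidden number, the "belief web" is a plain dictionary, and every name (`ToyWorld`, `executive`, and so on) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyWorld:
    """(e) Interface to the outside world: run an experiment, report the result."""
    hidden_optimum: float = 7.0

    def act(self, setting: float) -> float:
        # The payoff is higher the closer the setting is to the hidden optimum.
        return -abs(setting - self.hidden_optimum)


def objective(observation: float) -> float:
    """(a) Evaluator for the objective function: here, just the observed payoff."""
    return observation


def generate_hypotheses(beliefs: Dict[float, float]) -> List[float]:
    """(b) Propose candidate settings around the best one seen so far."""
    best = max(beliefs, key=beliefs.get) if beliefs else 0.0
    return [best - 1.0, best, best + 1.0]


def generate_experiments(hypotheses: List[float],
                         beliefs: Dict[float, float]) -> List[float]:
    """(c) Turn hypotheses into tests: only settings that have not been tried."""
    return [h for h in hypotheses if h not in beliefs]


def update_beliefs(beliefs: Dict[float, float], setting: float, score: float) -> None:
    """(d) Belief web, reduced to a table of setting -> observed score."""
    beliefs[setting] = score


def executive(world: ToyWorld, max_steps: int = 20) -> float:
    """(f) Control loop: call (b) through (e) to push up the value from (a)."""
    beliefs: Dict[float, float] = {}
    for _ in range(max_steps):
        experiments = generate_experiments(generate_hypotheses(beliefs), beliefs)
        if not experiments:
            break  # nothing new to try near the current best setting
        for setting in experiments:
            update_beliefs(beliefs, setting, objective(world.act(setting)))
    return max(beliefs, key=beliefs.get)


print(executive(ToyWorld()))
```

The loop never asks whether an experiment is a good idea in any sense other than its score under (a); whatever (e) is allowed to touch is fair game. How far that generalizes from a one-number toy to a real system is exactly what the thread is arguing about.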

        • Murphy says:

          Where “a” is null or the function doesn’t give any points for changing the outside universe: there are plenty of existing dumb-AI systems that aren’t goal based.

        • AFC says:

          > Now outline a scenario where this thing DOESN’T try to take over the world.

          Scenario: the US government passes a law saying that the AI is not allowed to take over the world (and/or that its owner is not allowed to make it do so), and that the AI will be shut off if it violates the law. The AI is informed of this law through its “belief web and associated code for efficiently and correctly updating its accuracy based on new information.”

          • moridinamael says:

            Uh, wouldn’t this cause the AI to prioritize wiping out the government that is trying to restrict its activities?

      • I just want to poke in and add that I think DrBeat is basically correct here, and that Scott’s responses prove him so. SA is basically asserting that “superintelligent” automatically includes empirical knowledge about the physical world and human wetware, and DrBeat is asserting that no, actually it doesn’t. The AI does not magically understand how human psychology works, does not automatically understand that there is a physical world beyond what is immediately available to its specialized inputs, and it has no way or reason to bootstrap itself towards that information. SA’s definition of “superintelligence” basically assumes all of these things into existence. It’s an obfuscated form of begging the question.

        This doesn’t mean that superintelligent world-dominating AI is impossible, but only that it has to be deliberately built for that purpose, and it has to have a willing collaborator to give it the things necessary for bootstrapping and program it so that bootstrapping is a thing that it wants to do. The scenario where an otherwise-benign optimizing AI takes over the world as an instrumental step towards making more paperclips is ridiculous.

        • Luke Somers says:

          Eeeurgh. An AI that optimizes for paperclips over people is not ‘otherwise benign’. It is in fact an engine to destroy all life and make paperclips from their bones. That is how it has been defined, so it doesn’t need to TURN evil.

          Paperclipping is not meant to be taken literally – it refers to general cases of ‘oops, that’s not what I meant to ask for’.

        • moridinamael says:

          This seems to be another manifestation of the tendency to imagine the wrong thing when imagining a superintelligence. Maybe we need a new word. Ultraintelligence? Hyperintelligence? Something really really smart. Like, I feel that you’re imagining a superintelligence as being maybe 10 or 100 times more effective than a human at problem solving, when humans really suck at problem solving and it’s an evolutionary accident that we can do it at all. Imagine something a trillion times better than us at pursuing its goals.

          • DrBeat says:

            It can be a septillion times better, it still cannot create relevant information from nothing.

      • Luke Somers says:

        Wow. Just… wow.

        This is so backwards I am tempted to write things backwards to convey how backwards it is. But no, how about I just point out that yes, this has been thought of, but no, it was baked in and is part of the problem.

    • TiagoTiago says:

      I believe here we are talking about an AI that is not just a classic clockwork calculator, but actually complex enough to figure out on its own new ways to react to input, and do so autonomously.

      A “want” isn’t a magical thing puppeteering meatbags with mystical strings; it’s just an emergent phenomenon from the laws of physics and logic. Maybe Deep Blue isn’t advanced enough to understand what it means to win chess games, but by its nature it is what it is compelled to, it is what it wants.

      • stillnotking says:

        Of course a want is not a magical thing. It’s a very quotidian thing, but it’s a specific quotidian thing. I would no more expect an AI, a priori, to have wants than I would expect it to have a libido. All I can say to the “emergent phenomenon” stuff is “citation needed”. I’m much more inclined to believe that wants/goals/values are artifacts of selection pressure in the ancestral environment than that they somehow inevitably emerge from the rules of logic. That sounds like the “magical” position to me.

  33. Josh says:

    Here’s what I don’t understand about the AI threat scenario: why do we identify with the humans, not the AIs? If an AI can do everything Scott describes — invent awesome new software, write moving religious poetry, play global geopolitics, on a level that exceeds human capabilities — it seems like the AI is everything we strive to be at our best, only better. Why do we want humans to win? That’s like saying, okay, this intelligence / civilization / wisdom thing is nice…. But in small doses only.

    If what we are concerned about is our personal survival and our loved ones’ survival, the logical response seems to be to become super-intelligent ourselves via uploading or other forms of brain-computer interaction. Since brain uploading or similar seems like a pretty plausible path for creating an AI that can do everything that Scott describes, it seems like the productive thing to do would be to progress down that path as fast as possible before someone develops a super-intelligence that isn’t an organic evolution of human intelligence.

    • Vanzetti says:

      >> it seems like the productive thing to do would be to progress down that path as fast as possible before someone develops a super-intelligence that isn’t an organic evolution of human intelligence.

      A productive thing would be also to try not to speak about this, in order not to encourage competition. Just saying… 🙂

    • Whatever happened to Anonymous says:

      Well, I don’t know about yourself, but I’m an admitted speciesist.

    • Wrong Species says:

      Mind uploading is the main reason I’m not particularly worried. If we can create superintelligence, then it seems pretty plausible that we would be able to mind upload (this isn’t guaranteed, of course). Without a sharp distinction between AI and ourselves, the problem becomes far less concerning.

    • AFC says:

      I found the guy who’s going to let the AI out of the box.

    • TiagoTiago says:

      A 1-to-1 brain simulation wouldn’t be significantly better than a human; there would still be people looking into making better brains, and for that purpose, better brain making “brains”.

    • Zakharov says:

      What if the AI can do all that, and will if necessary to take over the world, but once it’s killed everyone will just make paperclips?

  34. moridinamael says:

    A significant amount of the doubt seems to stem from an under-appreciation of the capabilities of a hypothetical superintelligence.

    My model for what a *minimal* superintelligence would be capable of is to simply imagine a human (who we assume to be immune to boredom) being in an enclosed environment (but with Internet access) operating one million times faster than the outside world.

    About a year for this superintelligence would pass in less than a minute. So there’s an instant incredible advantage in almost any conceivable domain of competition against other humans. Every reply to every instant message can be carefully, patiently considered for a year. Tens or hundreds or thousands of years can be spent working on complex plans for businesses or technologies, or on thinking up complex scientific theories and devising efficient and subtle tests to verify those theories.
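
The arithmetic behind "a year in less than a minute" can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the speedup of one million is simply the figure assumed in the comment above.

```python
# Average length of a year in seconds (365.25 days), about 31.6 million.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def wall_clock_seconds(years_of_thought: float, speedup: float) -> float:
    """Outside-world seconds needed for the given amount of subjective time."""
    return years_of_thought * SECONDS_PER_YEAR / speedup


def years_of_thought(outside_seconds: float, speedup: float) -> float:
    """Subjective years experienced during the given outside-world time."""
    return outside_seconds * speedup / SECONDS_PER_YEAR


SPEEDUP = 1_000_000  # the "one million times faster" assumed in the comment

# One subjective year takes roughly 31.6 seconds of outside time.
print(round(wall_clock_seconds(1, SPEEDUP), 1))

# One outside minute is a little under two subjective years.
print(round(years_of_thought(60, SPEEDUP), 2))
```

So at a millionfold speedup a subjective year fits into about half a minute, and a thousand subjective years into under nine hours of outside time.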

    An average human being accelerated a millionfold would probably be able to take over the world unless they were very unlucky.

    Now imagine that this accelerated human can copy themselves at will and give the copies very specific instructions for sub-projects to pursue.

    Now imagine that each copy is not an average human, but rather Einstein.

    Now imagine that each copy is Einstein but with arbitrarily large working memory and arbitrarily long attention span and with no need to eat or sleep.

    Now realize that even a thousand Einsteins accelerated by a millionfold time acceleration still possess intrinsic limits imposed by human neural architecture, and a true AI would be able to modify itself and its copies freely to obtain any desired optimization.

    Another related source of doubt seems to be an unfounded overappreciation for human intelligence in the first place. Folks seem to implicitly see human intelligence as something really really impressive. As if we’re just so VASTLY intelligent that it’s INCONCEIVABLE that our neural architecture could EVER be understood or approximated.

    There’s nothing objectively impressive about human intelligence. Firstly, we are the dumbest animal capable of supporting a global civilization, and we only achieved that after a long series of false starts. Secondly, our neural architecture is optimized largely for energy efficiency, not optimization power. Thirdly, there are intelligent, problem-solving animals like corvids and octopi with tiny, tiny brains, so the magic sauce for abstract problem solving clearly doesn’t require a huge brain – if anything, a huge brain may merely support greater abstraction and other auxiliary functions. Fourthly, the machinery that makes us a step above chimps really only amounts to a few genetic adaptations on top of a common-ape-ancestor brain. So the secret sauce of “human intelligence” can’t require THAT much more complexity than already inherent in the basic neural structures of an ape.

    So, tl;dr: Worship human intelligence less, be more impressed by the word “superintelligence.”

    • Vanzetti says:

      >>Tens or hundreds or thousands of years can be spent working on complex plans for businesses or technologies, or on thinking up complex scientific theories and devising efficient and subtle tests to verify those theories

      This is worthless. You need to perform experiments to advance technology, and you can’t perform experiments inside a box. And you sure as hell don’t have enough computational power to do realistic experiments in silico. Or, if you do, so does everyone else in the world, and you have no advantage.

      • moridinamael says:

        Hm, perhaps I didn’t word it clearly.

        The experiments *can* be performed in the outside world. And because you had a thousand years to ruminate, they’re going to be extremely subtle.

        Like, we know *now* that it would have been possible to measure the speed of light a thousand years ago, without any of our modern technology. In fact, if you sent me back in time, I could measure the speed of light and nobody would even know that’s what I was doing.

        • Bugmaster says:

          That’s a good example, actually. Let’s say I did send you back in time, and you wanted to measure the speed of light (using only the locally available materials, of course). How would you do it? And what could you do with the knowledge?

    • Nornagest says:

      About a year for this superintelligence would pass in less than a minute. So there’s an instant incredible advantage in almost any conceivable domain of competition against other humans. Every reply to every instant message can be carefully, patiently considered for a year. […] An average human being accelerated a millionfold would probably be able to take over the world unless they were very unlucky.

      You’d have to handwave away a hell of a lot more than boredom for this to be true. For example: though I probably could be if I devoted years to study, I’m not a particularly good chess player. If you locked me in a box supplied with food and water and a clean laptop, sped me up a million times, made me immune to aging, and set me against Garry Kasparov playing in real time, who would win?

      Just having a hugely unfair amount of time to consider my moves doesn’t mean the game’s going to me; the search space is even bigger, proportionally, and I don’t have nearly the understanding of chess strategy that Kasparov does. I’d need to do something clever. Maybe I could write a little chess AI — or several of them — on my laptop and play a lot of games against it. Initially it’d be an even worse player than I am, but I could improve it as I get better.

      Could I, or the AI agents I create, discover enough about the game in a few decades to beat one of the best human masters? Without records of master-level games or books on chess theory or any other high-level humans to play against? I think I’d have a small chance, but my money’s on Kasparov.

      • moridinamael says:

        I feel like you’re fighting the hypothetical. If you balk at the above assessment, just multiply the proposed time acceleration factor by a thousand or something. You’re telling me you’d lose to Kasparov if you had a thousand years to consider each move and theorize about chess, AND the ability to make whatever programs you wanted to help supplement your thinking? You’d actually *be* a better player than Kasparov after the very first move.

        • Nornagest says:

          I’d have a much better chance at an acceleration factor of a billion rather than a million, yes. That takes the amount of time I have to play with out of human timescales (a year or so for each move in a forty- or fifty-move chess game) and into superhuman ones, and allows me to make much stronger assumptions about what I can do with my hardware. But I still don’t think my chance of winning can be assumed to be unity minus epsilon or anything like that.

          We’re basically dealing with an information theory problem at that point. The set of chess games is enormous, far larger than could be exhausted even in fifty thousand years of playing time. Does the set of games generated by a single largely self-taught human player with superhumanly large amounts of free time, playing against AI agents trained only on each other or on him, have enough overlap with the set of master-level human games that we can be assured of playing well within the latter? I suspect the answer is yes, but I don’t know it, and for a game with a much larger search space, like go, I think it could easily be no.

          • moridinamael says:

            You’re correct in everything you say here. I don’t mean to move the goalposts, but part of the issue is the choice of chess as the battlefield. Chess is a rigidly defined system. It isn’t “solved” in the mathematical sense but its rules are perfectly unambiguous. It’s just on the perfect info-theoretic edge between being so simple as to be boring and so complex as to be overwhelming and thus un-fun.

            Real life problems are not pre-defined in this way. There are no Kasparovs of business, for example. There aren’t even any Kasparovs of midsize-textile-manufacturing-business. So having arbitrary time acceleration to compete against other humans in real-world domains means that you’re just repeatedly blowing past all the other humans as if they’re standing still as new little sub-problems are opened up by reality.

          • beleester says:

            That would seem to make the problem even worse for the AI. In chess, at least you can imagine that a patient and determined player can have all the information and eventually come across every strategy. You can come up with every strategy that Garry Kasparov does.

            In the business world, you will never come across every strategy, and you don’t have all the information when you’re locked in a box. You can’t theorycraft about what the customer wants without actually going out and talking to them. Imagine coming up with a perfect strategy for selling Betamax tapes, because you haven’t heard that DVDs are a thing yet.

            As for the idea that there are no Kasparovs of the business world… what do you call Warren Buffett?

          • Jaskologist says:

            I think our lack of a good understanding of what “intelligence” is is really hindering us here. Whatever “smarter” is, it most certainly is not “the same thing as dumb, but faster.”

            I have an underling to whom I delegate a lot of coding tasks. He is not as smart as me. This doesn’t mean that he takes twice as long to get things done. On the contrary, for most of the basic tasks (add a field to a database/page/report), he is just as fast as me. What it ends up meaning is that there are problems sufficiently complicated that he just can’t do them.

            If AI is just human-level intelligence, but with ability to throw more computing time at it, maybe that’s troubling, but I’m not sure. We have billions of human parallel processing units right now who can attack a given problem; one human thinking 100x faster will not overwhelm us. And if the simulation is never able to rise above “model brain molecules in code,” I think it will run into hardware limitations anyway when it tries to overclock.

            Seriously, though, what is intelligence? I think one of the greatest mysteries of this planet is that we are the only intelligent life on it, and that this is a matter of kind, not degree.

            Cheetahs are way fast, but you can find all kinds of animals which are plenty fast, albeit still slower than the cheetah. Birds can fly, but there are other things which can at least glide. Where’s the animal that can do calculus but no math more advanced than that? That’s the evolutionary gap that needs explaining.

            If intelligence has a lower bound, as seems to be the case, perhaps it has an upper bound, too.

    • TiagoTiago says:

      “A significant amount of the doubt seems to stem from an under-appreciation of the capabilities of a hypothetical superintelligence.”

      That is why I like to say “exponentially self-improving AI”, instead of just “AI” when talking about this.

    • vV_Vv says:

      If you were to speed up a chicken brain 1,000 times, or even 1,000,000 times it would have more raw computational power than a human brain.
      Would it be a super-human intelligence? No. It wouldn’t even be a super-dog intelligence. It would be just a weird chicken, maybe a little better than other chickens at solving some contrived types of puzzles (assuming that it doesn’t go crazy), but still a chicken.

      Why would it be different if we were to speed up a human brain?

      our neural architecture is optimized largely for energy efficiency, not optimization power.

      Yes, and this is one of the reasons why these “human-like brain simulation speed up million times creating billion copies of itself over the Internet” scenarios sound fishy.
      The human brain, at 1-2 petaflops per Watt is much more energy-efficient than any modern computer (supercomputers are now in the gigaflops per Watt range). Koomey’s law would have to hold for over 30 years for a supercomputer to reach human energy efficiency, which is probably impossible without using a different type of physical substrate.
      Maybe we could be able to run one or a few real-time human brain simulations at a significant energy cost, but fantasies of extreme speedups and extreme replication put forward by people like you and Robin Hanson do not appear to be realistic.
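
The "over 30 years" figure can be reproduced with a short Python calculation. This is a sketch under stated assumptions: the efficiency numbers are the rough ones quoted in the comment above, and the doubling period of about 1.57 years is the historical rate usually cited for Koomey's law, which may not hold going forward. The function name is made up for illustration.

```python
import math

# Assumed inputs: rough figures from the comment, not measurements.
BRAIN_FLOPS_PER_WATT = 1e15          # low end of "1-2 petaflops per Watt"
SUPERCOMPUTER_FLOPS_PER_WATT = 1e9   # "gigaflops per Watt range"
DOUBLING_PERIOD_YEARS = 1.57         # commonly cited historical Koomey rate


def years_to_close_gap(target: float, current: float,
                       doubling_period_years: float) -> float:
    """Years of steady doubling needed for `current` efficiency to reach `target`."""
    doublings = math.log2(target / current)
    return doublings * doubling_period_years


years = years_to_close_gap(BRAIN_FLOPS_PER_WATT,
                           SUPERCOMPUTER_FLOPS_PER_WATT,
                           DOUBLING_PERIOD_YEARS)
print(round(years, 1))
```

A factor of a million is just under 20 doublings, which at this assumed rate comes to a bit over 31 years. Using the 2 petaflops per Watt end of the range adds one more doubling.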

      • Samuel Skinner says:

        Robin Hanson’s idea isn’t just emulations, but modified emulations. The fear associated with his views isn’t unfriendly AI, but that certain emulations will be better than others and they will self improve driving down wages for emulations and (possibly) real world people and driving individuals out of the market unless they also self improve. And this is bad because a lot of the possible improvements could involve getting rid of “useless” things that we normally enjoy as part of being human.

        • vV_Vv says:

          Hanson talks about trillions of (more or less modified) brain emulations running at super-realtime speeds. That doesn’t appear to be realistic, at least with technology we can expect in the foreseeable future.

  35. njnnja says:

    Obligatory Twilight Zone episodes:

    Malevolent AI gets people to do its bidding

    Malevolent AI just sits and waits

  36. Pingback: No Physical Substrate, No Problem | Reddit Spy

  37. TomA says:

    What is the appropriate response to this type of speculation? Is it mass anxiety? Is it the formation of a movement to encourage defensive contingency planning? Is it just bull-session conversation for mental exercise?

    If AIs push Homo sapiens into extinction, wouldn’t that be about the same fate that befell Neanderthals at our initiative? Does evolution really care about the survival of any particular species?

    • Luke Somers says:

      Of course Evolution doesn’t care, but Evolution is a total prick.

      WE don’t want to die. So, we should perhaps look into AI safety and endorse and support efforts to do so, like, say, Bill Gates and Steve Wozniak.

  38. Joe says:

    My main problem with AI is that it’s impossible given that intellect and will are immaterial. A run away super intelligence just sounds like a fund raising scheme.

    • moridinamael says:

      By “intellect is immaterial” do you mean “intellect is a process, not an object”? By “process” do you mean “algorithm”? So Aristotle is just saying “intellect is an algorithm, not an object.” Perfect. Aristotle believes AI is possible.

      Also, humans are materials which instantiate intellect.

      • Joe says:

        I think intellect is the thing that understands or gives meaning to algorithms. Computers perform algorithms but don’t have the intellect necessary to grasp the concepts that justify them. Nor do they have a will that would motivate them towards a goal independent of a particular human will.

        • moridinamael says:

          I don’t know, man. Maybe explain it without using the words “will” or “intellect.”

          • Joe says:

            Just read the link and follow the links it provides.

          • moridinamael says:

            The contents of the link appear to be pure sophistry.

          • Joe says:

            Do you even know what sophistry means? It would help if you could point out the logical errors in the link. Your previous comments prove you don’t even understand the arguments well enough to identify them as sophistry.

          • moridinamael says:

            It reflects an ancient and irrelevant ontology. We now understand that the universe appears to be completely describable by mathematical laws. These laws govern all aspects of the behavior of what we call matter and energy. Living creatures are entirely embedded within this physics. The actions of a person are completely defined by the system [person’s physical body & brain + world]. Every act of creative human thought is thus embedded in physics.

            The above line of thought is basically just Reductionism.

            Aristotle’s distinction of “particulars” and “universals” does not cleave reality along any natural categories. Frankly it seems like something that somebody would have thought up thousands of years ago before they perceived that the universe is just one big deterministic machine.

          • Joe says:

            Ok reductionism just isn’t logical. Determinism isn’t a new idea it’s just very popular. If you understood the arguments at the link you would know that. You are the sophist. Stating a position isn’t the same as arguing for it.

          • moridinamael says:

            > “Stating a position isn’t the same as arguing for it.”

            I’m waiting for you to do so. I just did.

          • Joe says:

            My position was argued for in the original link. All you did is make the assertion that determinism is correct without providing any arguments to justify the assertion. Are you messing with me? You do know how debate works?

          • moridinamael says:

            After enduring your third ad hominem attack, I think I must bounce the question back to you.

            Actually, I must do no such thing. Have a nice day.

          • @moridinAmael

            We don’t understand how everything is made of physics, because we still have hard problems of consciousness, time, identity, etc.

            Separately, “everything is made of physics” does not, remotely, imply strict causal determinism.

          • vV_Vv says:

            I’m pretty sure that the author of the linked article is misinterpreting Aristotle.

            If I understand correctly, Aristotle held that “universals” did not exist at ontological level separately from their instances. This is in sharp contrast with the “ideals” of his teacher Plato and it is probably closer to modern reductionist epistemology.

            The “universals” the author of the article is referring to look a lot more like the Platonic ideals than the Aristotelean universals.

        • Murphy says:

          are you implying that “intellect and will” are based on some kind of magical processes which can’t be measured or emulated by anything in the physical universe?

          Just want to clear that up to check if we’re on the same page.

          If so I’m curious how you think the very physical soggy meat inside our skulls detects/interacts with the magical processes, receives information or sends information and why you believe that can’t be achieved by any other physical material other than soggy meat inside a skull.

          • Joe says:

            Yes, logically intellect and will can’t be measured empirically. Did you read the link? We don’t understand everything about quantum mechanics but most wouldn’t call it magic. You can call it magic if you want. I’m not an expert, but my guess is that part of the reason only organic brains can interact with the immaterial has something to do with intrinsic teleology. Artificial brains or computers only have extrinsic teleology.

          • calef says:

            Why does damage to the brain change our intellect and will (empirically)?

          • Joe says:

            Because intellect and will are dependent on brain activity to function properly. Brain activity informs the intellect through the senses. The link above addressed this.

          • Murphy says:

            The link is, not to put too fine a point on it, word salad. You say that logically it can’t be measured but then you claim it can send and receive information from the brain which is made of physical matter. Implying it can be measured/detected by things made of physical matter.

            Your explanation is no more informative than saying ‘because magic’.

    • James Picone says:

      Yeah, it’s just wordplay. What do you expect from Aristotle?

      Here’s a material representation of a universal: ‘red’.

      The text is stored and transmitted in an entirely physical system (unless you think computers are magic too, in which case AI is possible again – someone else in this thread has already asked you why the magic that makes intelligence work is limited to sloshy bits of meat, and I think you should answer). It indicates to you the concept of redness.

      Semantics isn’t special. It’s just syntax, sped up a lot.

      EDIT: Had a shower, thought of some more relevant stuff.

      So it’s possible with technology available now – even pretty easy, for specialists – to build a robot that sorts balls according to colour. Red balls go over here, blue balls over here. Surely that indicates that the robot has some representation of ‘red’ that it uses to distinguish between the two balls, and that that distinction is either ‘material’ in the sense you mean it, or that robots with magic can be built?

      We can go further. The robot works by observing the balls with a camera, running some processing that determines what parts of the scene it is looking at are ‘balls’, assessing the colour of each ball it sees in the scene, and tagging each ball with the colour it observes. There’s a structure for each ball that has an internal token that just says RED. Well, it probably says RED in the source code – in the compiled code, it’s just a number, maybe 1. Balls that have a 1 in the relevant field in the robot’s data structures are red. Is that not a full understanding of the material basis underlying the robot’s understanding of the concept ‘red’?
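
      The pipeline described above can be sketched in a few lines. This is a toy illustration, not real robot code: the `Ball` structure, the `classify` rule and the RGB readings are all hypothetical stand-ins for the camera and vision processing.

```python
from dataclasses import dataclass
from enum import IntEnum


class Color(IntEnum):
    # Readable names in the source; plain integers once running.
    RED = 1
    BLUE = 2


@dataclass
class Ball:
    rgb: tuple  # (r, g, b) as the hypothetical camera reports it
    color: Color = None  # filled in by tag_ball


def classify(rgb):
    """Toy rule: whichever of the red and blue channels is stronger."""
    r, _, b = rgb
    return Color.RED if r >= b else Color.BLUE


def tag_ball(ball):
    # Store the colour token in the ball's own data structure.
    ball.color = classify(ball.rgb)
    return ball


def sort_balls(balls):
    bins = {Color.RED: [], Color.BLUE: []}
    for ball in balls:
        bins[tag_ball(ball).color].append(ball)
    return bins


bins = sort_balls([Ball((200, 30, 40)), Ball((20, 40, 220)), Ball((180, 10, 10))])
print({color.name: len(found) for color, found in bins.items()})
```

      The token `Color.RED` compares equal to the integer 1, which is the sense in which the “internal token” is just a number in the running program.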

      ‘red’ can, of course, be talked about in material terms – it’s ‘the property of reflecting electromagnetic radiation in this part of the spectrum, and absorbing electromagnetic radiation in these other parts of the spectrum’. Is that relevant here?

    • The argument ultimately fails because it does not attempt any real critique of the materialist response. It notes, correctly, that materialists adopt a conceptualist approach to universals, but then gets confused about the second stage of the materialist argument. Having identified universals as concepts, materialists go on to offer a reductive explanation of concepts… but the article treats that as elimination… and offers no argument against eliminative materialism except laughing at it.

      This is the relevant passage:” You may have noticed in this argument a way out for materialists. Materialists could claim that the brain state doesn’t represent the concept — it just is the concept. Materialists could claim that our folk concepts (sic) of concepts are mere ignorance of the reality that we have no concepts at all. (If you’re not chuckling now you don’t understand the argument.) This is the concept that there are no concepts. Matter is the only thing that exists. Our concepts are just matter, without remainder and aren’t representations at all.This view — eliminative materialism — is regnant in materialist circles. Suffice to say that eliminative materialism is the drain around which all materialism eventually swirls.”

      In the classic cases of reduction, such as the reduction of heat to random molecular motion, the reduced phenomenon still exists… materialists are not eliminativists about heat, wetness, etc.

      What’s worse, the lesson from current AI and cognitive science is that general concepts are easier to implement than concepts relating to individuals… it is therefore particulars which are the problem, if anything.

      Oh, and the article never gets onto the immateriality of the will, despite promising it…and despite the fact that you are treating it as a done deal.

      • Joe says:

        I think the author was pointing out the absurdity of denying concepts by pointing to other concepts. His claim is that eliminative materialism is self-refuting.
        I’ll have to take your word on current AI.
        Yes he could have addressed the will but intellect and will are so related I think he took it for granted that if he proved one it would prove the other.

    • vV_Vv says:

      Maybe AI won’t have a consciousness/soul/vis vitalis/prana/qi/whatever, but if it is as good as we are or better than we are at the things we do, then it will have significant social, economic and possibly ecological consequences.

      Debating about the ontological status of human mind may be an interesting intellectual exercise, but as far as the discussion about AI feasibility and impact is concerned, it is a red herring.

  39. onyomi says:

    I am moderately worried about AI; not in the next 10 years, but maybe in the next 50 to 100 years. For me the question is definitely not “how will superintelligent AI start affecting people in the real world,” but “how soon (if ever) will we get from a really powerful computer of the sort we are already familiar with (imagine our current computers, but with 1 million times more memory and processing speed, say) to a kind of AI which is truly animal-like in its ability to learn, adapt, make decisions, etc.?”

    To my mind, it cannot just be an increase in processing power and memory. There has to be a difference in how it comes about. Humans cannot even fully understand how their own brains work; therefore, to my mind, they cannot design a machine which works even as well as, much less better than, their own brains. They can design a machine which is very efficient at storing info and carrying out specific tasks, but that is very different.

    I think that if there is ever a superintelligent AI of the sort we’re talking about, it will certainly arise as a result of a kind of evolutionary process we set in motion, not because we actually design it ourselves. And I don’t think we can just program a computer to “think of ways to make yourself smarter,” because that would first require us knowing how to make things “smarter” in a way that means more than just more memory and processing speed.

    I understand people are already sort of doing this in a way that is creepy: teaching robot worms how to crawl by a kind of trial-and-error process, or maybe starting out with countless variations on a single program and somehow “rewarding” those programs which perform best in a manner mimicking evolution.
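
    The “countless variations, reward the best” loop can be sketched as a toy program. This is a minimal illustration under assumed settings: the bit-string genomes, the count-the-ones fitness score and all the constants are hypothetical choices, not anything used by the projects alluded to above.

```python
import random

GENOME_LENGTH = 20
POPULATION_SIZE = 30
GENERATIONS = 40
MUTATION_RATE = 0.05


def fitness(genome):
    """Stand-in 'reward': count the 1 bits (the OneMax toy problem)."""
    return sum(genome)


def mutate(genome, rng):
    # Flip each bit independently with a small probability.
    return [bit ^ 1 if rng.random() < MUTATION_RATE else bit for bit in genome]


def evolve(seed=0):
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(GENOME_LENGTH)]
        for _ in range(POPULATION_SIZE)
    ]
    history = []
    for _ in range(GENERATIONS):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: POPULATION_SIZE // 2]  # "reward" the best half
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children  # survivors carry over unchanged
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best, history = evolve()
print(history[0], "->", history[-1])
```

    Because the surviving half is carried over unchanged, the best score can never drop from one generation to the next. Note that the programmer still has to supply the fitness function, which is exactly the “knowing what smarter means” problem raised above.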

    On the one hand, this is super scary, because if the right circumstances are set up, an AI could theoretically evolve even faster than a bacteria to become the dominant being on the planet, especially if we make it evolve to survive and reproduce as we ourselves are evolved, which seems a distinctly bad idea.

    On the other hand, this seems very limiting in a different way: If we are talking about robots existing in and interacting with the real world, then those evolving robots are faced with all the limitations of other animals. Their evolution would be very slow, if it happened at all. On the other, in the digital realm, where the evolution can theoretically happen at processing speed, there is a lack of input about how the real world works.

    People keep mentioning the dangers of hooking up an AI to the internet, because that would theoretically result in it having access to the sum of all human knowledge. But I think there is a very big difference between having access to a huge amount of information *about* the world and *being* in the world. *Being* in the world requires surviving and interacting with countless factors, many of which we don’t even notice consciously, all the time. Having information *about* the world (which, even with all the info on the internet, is still only a tiny fraction of all the theoretically possible information, sensory and non about the world out there) neither places demands on you, nor directly enables you to use it.

    Without the demands of the real world, the AI could only ever “evolve” to meet the demands of a digital world, which, at worst, would seem to me to result in the world’s worst computer virus (potentially very harmful in this day and age, to be sure), but not Skynet. Skynet needs to evolve to interact with the real world, but it doesn’t have the “experience” necessary to do so.

    That is, I kind of come at this from the opposite end: it’s not that an AI which is super-intelligent not only in processing speed but with respect to its understanding of the real world is not a threat, but rather that I’m not sure we can get an AI that has not only super processing speed, but also strong understanding of the real world without an incredibly slow evolutionary/learning process that involves interacting with the real world (which will still only move at real-world speed, no matter how fast the computer thinks).

    For example, it might get super good at hacking computer systems super fast, but in order for it to get super good at manipulating people, it will have to have a lot of interaction with people. Interacting with people takes time, though one could imagine a very popular program which is simultaneously interacting with and learning from millions of people simultaneously, sort of like how the Google car supposedly will learn to drive. That is, the problem might not be a super-smart AI getting hooked up to the internet, but that a super-smart AI could only ever develop on the internet in the first place.

    I wouldn’t stake my life on this, so I am still concerned about AI, but this seems to me to be a reason maybe not to expect catastrophe. I will say that I think Eliezer is on the right track in thinking about the motivations we program any potential AI with. We are basically “survive and reproduce” machines. We really don’t want a competing “survive and reproduce” machine that is much better than us at it, but we might want a “make people happy (and not in creepy ways like drugging them)” super-machine. However, in the real world “survive and reproduce” is the only evolutionary criterion which works, so again we may be facing the dichotomy of: either super scary super smart but bad at manipulating the real world or else potentially good at interacting with the real world but evolving only very slowly.

    • TiagoTiago says:

      We can already simulate complex things like aerodynamics and the way different materials react to forces, and of course emulate different computer architectures.

      So for an exponentially self-improving AI that happens to have as part of its goals to interact with the physical world, it could run some iterations in simulation, evolving many times faster than possible in the physical world, print a new body, upgrade its mind OTA a few times, and whenever it sees fit print a better body, and so on. Not only would it be able to test new bodies for very little cost, it would also be able to test different ways to use them; and of course, predict how it would affect the external world in lots of branching possibilities without having to perform too many experiments in the real world. And once it’s intelligent enough, it should be able to make many leaps of deduction, coming up with innovative experiments (both virtual and real) and learning about the world significantly faster not only than a human, but than humanity as a whole.

      • onyomi says:

        Well I think attaching the AI to a 3d printer of some kind makes it significantly scarier. My other reaction to my own thought was, maybe we’ll accidentally create a super-intelligent AI as a side effect of creating, say, the “GoogleButler.” Maybe it’s a robot in millions of homes, all of which have wireless internet and are connected to a central database where information is compiled about how to be better at cleaning houses. No one butler has an especially rich interaction with the physical world, but with all their experiences aggregating it might very quickly become shockingly adept.

  40. Autolykos says:

    The point I don’t get is why any AI deserving of the adjective “superintelligent” would have any more desire to rule over humans than I desire to rule over ants or bacteria. Sure, it may be an interesting exercise to figure out how to make them do things (like build physical stuff, the way we use bacteria to make yogurt) or analyze how they’re organized and what makes them tick (like we do with ants), but that can easily be achieved without absolute power. I strongly suspect ruling humanity with an iron fist just isn’t worth the hassle (heck, I wouldn’t want the job), and any AI smarter than the average politician will probably figure this out as well.
    My best guess for what such an AI is like would be Stanislaw Lem’s “Golem XIV” (or his big sister, depending on how smart exactly we’re talking).

    • moridinamael says:

      The standard reply is that an AI wouldn’t have a desire to “rule over” humans, but it might exterminate us for the same set of reasons we exterminate ants and bacteria. We’re a mild annoyance, a minor impediment, and getting rid of us is trivial.

      • Kay says:

        Getting rid of an ant is trivial but getting rid of all of them much less so.

        The comical aspect of the GOLEM scenario was that those superintelligent machines who escaped the “window of contact” were useless, ultimately a failed cold war investment. In GOLEM’s ontology this is inevitable, because the gaps between the likes of him and humans are unstable. For Lem this kind of metaphysical speculation might have been the ulterior motive to build them.

        Otherwise there is lots of extinction kitsch in our belief in hostile AIs nuking away the Anthropocene. Our mythologies are still informed by the Bible, the 7 plagues, John’s Apocalypse and our well-intended genocides of the past century. Finally the Gods are back and they punish us or capture our Lebensraum. Meh.

        • Wrong Species says:

          We might be far more of a pest than ants. If the AI thought of us like a virus, they might be far more willing to go the extra mile and eliminate us.

      • Artemium says:

        Agree. For any imaginable goal an AI could have, destroying every potential obstacle or competitor is a rational course of action, unless its values are defined in such a way as to prohibit that action in all imaginable circumstances. The thing is, it is extraordinarily difficult to set up values in a way that is successful and continually stable.

        A typical mistake in thinking about superintelligence is something like: “Why would AI be evil to humans unless some evil people set it up that way?” But the problem is actually the reverse of that presumption: in almost any scenario AI will seek to eliminate humans, as they are a) using valuable resources, b) potentially dangerous to its goals, and c) just in the way of AI’s geoengineering projects.

  41. Michael Powell says:

    Self-driving cars.

  42. PGD says:

    The real physical substrate issue to me has to do with AI motivation. The important thing about human beings’ physical embodiment is that it gives us our *motivations* — pleasure, pain, boredom, etc. There is a huge amount of discussion about AI that for no clear reason I can see ascribes human motivations to it (annoyance, boredom, lust for power, whatever). Computers don’t have any motivations at all except for following an external program. A machine with huge processing capacity could sit there for a million years doing nothing, why wouldn’t it? The motivation for an AI will have to emerge from some form of programming, to include evolutionary or evolving programming, but then that programming in turn limits it. I can see a big threat from AI but it seems most likely that the threat will be rooted, like any technology, in how we design and use it, not in its ‘going autonomous’. Although a system that is complex enough may appear autonomous from our viewpoint.

    • Murphy says:

      To be fair, on less wrong they usually assume that a goal-based AI is attempting to fulfill some form of utility function. The simple example being an AI originally (poorly) designed to run a paperclip factory and tasked with maximizing output which then expands the factory to convert all matter in the universe into paperclips.

      It doesn’t love you, it doesn’t hate you, it doesn’t care about power. It’s just maximizing paperclip output.

      On the other hand an AI, no matter how intelligent, with no goals will just sit on a computer until the end of time.
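
      The contrast between the two cases can be shown with a toy decision rule. Everything here is a hypothetical sketch: the action names, the predicted outcomes and `paperclip_utility` are made up for illustration and do not come from any real system.

```python
def choose_action(actions, utility, predict):
    """Pick the action whose predicted outcome scores highest.

    Returns None when there is no utility function, i.e. the agent
    has no reason to prefer any action over doing nothing.
    """
    if utility is None:
        return None
    return max(actions, key=lambda action: utility(predict(action)))


# Hypothetical toy world: each action maps to a predicted outcome.
OUTCOMES = {
    "idle": {"paperclips": 0, "humans_happy": True},
    "run_factory": {"paperclips": 1_000, "humans_happy": True},
    "convert_everything": {"paperclips": 10**9, "humans_happy": False},
}


def paperclip_utility(outcome):
    # Only paperclips count; nothing else in the outcome is even looked at.
    return outcome["paperclips"]


picked = choose_action(OUTCOMES, paperclip_utility, OUTCOMES.get)
print(picked)
print(choose_action(OUTCOMES, None, OUTCOMES.get))
```

      With the paperclip utility the rule picks `convert_everything`, not out of malice but because `humans_happy` never enters the score. With no utility at all it returns `None` and the agent just sits there.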

    • Sigivald says:

      I think one issue is that the intelligence part of “artificial intelligence” presumes an intelligence comprehensibly like ours.

      Is boredom an inherent feature of “intelligence”? Could be! Could be not.

      (The dangers of programmed not-quite-“really”-AI-but-close-enough are, as you say, another matter.)

    • onyomi says:

      Yes, this is basically what I’m also trying to say, but put more succinctly.

    • TiagoTiago says:

      Once it starts programming itself to program itself better than we can predict, all bets are off.

      • onyomi says:

        Is there such a thing now as a computer program which reprograms itself (as opposed to just adding new data to an existing program)? And if not, are we close to that? This seems to me a very different challenge than just increasing processing speed.

        • James Picone says:

          I’m not sure exactly what you’re looking for, but there’s definitely code out there that changes its own instructions in place and then executes the changed instructions, for a number of reasons – shellcode is a classic example, because it lets you fit more code in a smaller space using a wider range of bytes. Programs-that-write-other-programs aren’t self-modifying, but are a similar problem (and when the program is used to build itself – say, a compiler – then it’s even closer). For example, Flex generates a program that does regular-expression pattern matching, and Bison generates parsers from a description of the language to parse.

          I would be very surprised if the build process for flex and bison doesn’t involve flex and bison.
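
          As a rough illustration of the programs-that-write-other-programs case (not of shellcode-style self-modification, and far simpler than Flex or Bison), here is a sketch in which one function emits the source of another and then loads it. The names `generate_matcher_source` and `is_color` are made up for this example.

```python
def generate_matcher_source(name, keywords):
    """Return Python source for a function that tests membership in `keywords`."""
    lines = [f"def {name}(word):"]
    for keyword in keywords:
        lines.append(f"    if word == {keyword!r}:")
        lines.append("        return True")
    lines.append("    return False")
    return "\n".join(lines) + "\n"


source = generate_matcher_source("is_color", ["red", "blue"])
namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)  # build the new function
is_color = namespace["is_color"]

print(source)
print(is_color("red"), is_color("green"))
```

          The generator only fills in a fixed template, so the output program is never more capable than the person who wrote the generator anticipated. That is the gap between this kind of code generation and an AI rewriting itself in ways nobody predicted.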

  43. TeMPOraL says:

    This essay reminds me of a compilation of scenes from Person of Interest, where a proto-AI tries to escape the box. Yeah, the AI can’t do anything because it doesn’t have a physical body.

    PoI has one of the most reasonable, if not the only reasonable, depictions of a superhuman AI in movies and television (I’m willing to bet the writers read Bostrom and Yudkowsky, given the FAI references this show makes every now and then) – if we’re going to have the mainstream picturing AI by recalling whatever sci-fi is popular anyway, it would be better if it referred to PoI than Terminator and the Matrix.

    • TiagoTiago says:

      IIRC, the AI there can hack lots of systems, right? What, other than either lack of creativity by the writers, or just desire by them to tell a specific story, is stopping the AI from hacking factories, drones etc? Is it just not advanced enough to extrapolate from what it already learned?

      • TeMPOraL says:

        (spoiler alert)

        There are two AIs in PoI now. The Machine was originally intended to monitor various feeds and report on planned acts of violence. She doesn’t seem to have either the will or the need to go out into the physical world too much, though it happened at times.

        Samaritan, on the other hand, seems to favor social engineering. In one episode it restructured society in an entire small town to explore both what an optimal social structure looks like and how people react when it is being degraded – and at the same time made the town run a factory, producing hardware it needs. In recent episodes we’ve seen it messing with stock markets, traffic lights and police procedures, as well as killing people through traffic “accidents” and an insulin pump.

        The way both AIs exploit people is very much in line with Scott’s “dictator” and “patience” scenarios, and it makes me happy that the writers didn’t go the easy way, with AIs totally abusing drones and infrastructure, but instead decided to explore the subtle and yet effective ways an AI could pursue its goals.

    • Mark Z. says:

      That video shows the best possible response to Yudkowsky’s AI-in-a-box scenario: if the AI ever asks to be let out of the box, turn off the power.

      I’ll admit that the whole time I was reading this post I was thinking “Dammit, Scott, you still aren’t watching Person of Interest?”

  44. Luke Muehlhauser says:

    Also, internet of things, which will be as hackable as everything we build is. See Future Crimes for lots of examples.

  45. I didn’t respond to this when you made the original tumblr post you linked to here, but I still don’t understand your point about Mohammed.

    I don’t think it makes much sense to draw inferences about the relative “conversion abilities” of religious founders from current statistics about the current number of adherents of various religions. The experience of becoming religious centuries after the founders have died is very different from the experience of being converted by the founders in person. I don’t think most existing Muslims are Muslims because they find the writing in the Koran “more charismatic” than, say, the writing of the New Testament, nor do I think most Protestants feel any direct engagement with Martin Luther and his cause. (How many Protestants have read a single word Luther wrote?)

    In particular, we need to recognize that the personalities and actions of religious founders are viewed in a very stylized way, “at a great distance,” by adherents of the mature religions that result. Anyone who, say, did what Moses did upon discovering the Golden Calf (instigating a killing spree that left 3000 dead within his community) in the modern day would be deemed a psychopath, yet many people today happily revere Moses as a religious figure; clearly the kind of “encounter with Moses” these people are having is very different from actually encountering a Moses-analogue in the modern day. Likewise, Christ was a publicly disliked figure and a member of a stigmatized ethnic group in his day, and there is no clear evidence that anyone but a small group of devotees found him appealing — yet billions of people have managed to associate hundreds of (in many cases individually incompatible) causes and views with his name, on the basis of some fairly brief testimonies from those followers. (It would be as if Charles Manson — a man reviled by the public but very charismatic to an O(10) size group of devotees, and known for pronouncements on the end times and mystical remarks — were to gain a resurgence in popularity hundreds of years from now on the basis of a few of his sayings and some accounts of his trial by members of his Family, producing a worldwide religion with schisms, relatively liberal vs. orthodox variants, pacifists and warmongers, etc. . . . If Moses’ crimes can be ignored, surely the comparatively tame Tate-LaBianca murders could be.)

    In short it seems difficult to me to connect the phenomenon of current belief to anything about the founders, except for the fact that they were sufficiently charismatic to form the seed of a religion to begin with. The seeds sometimes take a long time to grow — as with Christianity, or with Buddhism which only spread out of India several centuries after its inception. (Does it make any sense to retrospectively upgrade Buddha’s level of charisma in light of this development, as though he were somehow responsible for Ashoka’s efforts to spread the religion?) Is Joseph Smith many times less charismatic than Mohammed and co. because there are only (at most) 15 million Mormons, or should we suspend judgment for a few hundred years, as we (arguably) should have in the cases of Christ and Buddha?

    The actual founding of religions doesn’t vary all that much from time to time and place to place; someone starts a group, it’s always small at first, and if it gets huge later, that’s due to a whole bunch of social factors which have zilch to do with the founder’s personality.

    • Andrew says:

      Is your position that Moses was just luckier than other cult/religious leaders at the time? That on balance, his leadership, his partnership with Aaron, the laws and customs they set down, and how they instructed their successors weren’t the determining factor in whether Judaism is still around thousands of years later? Then why that cult and not any of the other ones in the middle east? Why didn’t they all die out? It’s not like there weren’t people trying to get rid of jews/judaism.

      • It’s pretty hard to tell, since evidence about the “Historical Moses,” and what was generally going on with the Israelites at that time, is really murky.

        My point isn’t that “it’s all luck and the founder’s personality means nothing” (blanket statements like that are rarely true), but that a few generations (or, more conservatively, a few centuries) out, the creator’s personality is not that relevant anymore. Yes, things like codes of ethics created by founders can have an influence, I just don’t think they’re the dominant source of variation, especially given that their meaning can vary with later interpretational trends.

        Even Judaism, which is relatively explicit about what kind of things you’re supposed to do, has the whole tradition of Talmudic argument trailing behind it. And with Christianity the role of luck seems much more extreme: it’s hard to chalk up Christianity’s massive success to Jesus, as many of his sayings were about radical changes people were supposed to make in preparation for an imminent apocalypse, which don’t exactly seem intuitive as a basis for a religion that stably thrives over many generations.

        So, in short, we have a small sample size (a few major world religions), in many cases it’s hard to know exactly what the founders did or didn’t do because of lack of historical evidence, and in one of our few big success stories (Christianity) the founder appears to have had the deck stacked against him in every possible way as far as founding a stable religion goes. “Good religious founder behavior” may have an effect but it doesn’t explain why Jesus Christ has so many more followers in 2015 AD than Zoroaster or Mani.

      • AFC says:

        In terms of creating a long-term religion in control of generations of humans, Moses was, in the end, an abject failure — considering that modern Jews, in the vast majority at least, have adopted modern forms of society and law in every place they have conflicted with OT law (through symbolic reinterpretation, or whatever else).

        An AI isn’t going to take over the world by controlling future generations’ diet, hairstyle, and holiday celebrations. Sex, violence, money: these are actually important, and OT law has completely lost control of them among almost all Jews. Including in the “Jewish state.”

        (This is all to Judaism’s credit, of course; the changes represent improvement. Religions that adapt readily are better than religions that can’t. But one can then question what it means for them to “survive” over centuries.)

  46. gbdub says:

    I think Scott’s example is still implausible, maybe impossible, because it seems to assume that first, a superintelligent AI can spring into being more or less spontaneously on general hardware without anyone noticing, and second, that such an AI would be born with the instinct to hide until sufficiently powerful.

    Much more likely scenarios for the genesis of superintelligent general AI are, in my mind:
    1) Developed incrementally from more specialized, less capable AI. The last step might be spontaneous, but it would still be incremental. Basically, someone would notice that their stock predictor (or whatever) AI has started behaving oddly, we’d look into it and lo and behold, a “conscious” AI. But by that time we’d have sufficient experience with (and sufficiently powerful) specialized AIs that we could probably smack down the new AI if needed.

    2) Developed intentionally. In this case the leap might not be incremental, but the people doing it would have to have sufficient resources (and probably specialized hardware) such that they’d be monitoring the system and at least know when it achieves intelligence. It could still “escape”, but only if it could replicate fully functional versions of itself on general hardware, which probably isn’t the case if it had to be specially built in the first place.

    This last part is, I think, an underexplored problem with the usual descriptions of an AI that destroys the world: Is it really possible to have a distributed superintelligence? Or must the “consciousness” be localized somewhere? At some point transmission rates are going to limit you. Is there an upper limit on the size of a conscious “brain” that could in a reasonable sense be called a singular entity? A teenage hacker has a lot of power, and can distribute that power over the network, but ultimately if you drop a bomb on his house the whole thing’s over right quick.

    “The Internet” probably can’t grow into a singular consciousness, just because it takes too long for a message to get from one end to the other. So I suspect that it’s not Clippy that we need to worry about, but an army of clippys. The “swarm” could be really dangerous, but any single element of the swarm would probably be easily dispatchable.
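
    A rough back-of-the-envelope check on the latency point above, as a sketch only. The fiber speed (about two thirds of the vacuum speed of light) is an assumed typical figure, and real routes are longer and slower than a great-circle path, so treat the result as a lower bound rather than a measurement.

```python
# Lower bound on one-way message delay between opposite sides of the Earth.
EARTH_CIRCUMFERENCE_KM = 40_075.0   # equatorial circumference
LIGHT_SPEED_KM_S = 299_792.458      # speed of light in vacuum
FIBER_FRACTION = 2.0 / 3.0          # ASSUMPTION: typical signal speed in optical fiber


def one_way_latency_s(distance_km, speed_fraction=FIBER_FRACTION):
    """Time for a signal to cover distance_km at a fraction of light speed."""
    return distance_km / (LIGHT_SPEED_KM_S * speed_fraction)


antipodal_km = EARTH_CIRCUMFERENCE_KM / 2
latency_fiber = one_way_latency_s(antipodal_km)
latency_vacuum = one_way_latency_s(antipodal_km, speed_fraction=1.0)

print(f"fiber:  {latency_fiber * 1000:.0f} ms one way")
print(f"vacuum: {latency_vacuum * 1000:.0f} ms one way")
```

    Even the idealized figure is on the order of a tenth of a second each way, which is the kind of delay the comment has in mind when it doubts that a planet-sized network could act as one tightly coupled mind.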

    Finally, I’ve always been bothered by the sharp dichotomy drawn between “AI” and “human”. Isn’t “highly augmented human” a much more likely path to superintelligence, as opposed to crafting superintelligence from scratch?

    • TiagoTiago says:

      We don’t know exactly where the threshold lies between useful and extinction-level event, and we’re probably only gonna notice we are past it once it’s too late.

      An exponentially self-improving AI, once bootstrapped, would not only always be steps ahead of us, but that distance would increase exponentially.

      • onyomi says:

        A more general question: if we assume that it is humanly possible to create a pernicious, world-destroying AI, is it also inevitable that someone will, eventually, (probably unintentionally) do it? Once we reached a certain level of understanding in physics, the invention of the atomic bomb was probably inevitable, for example. The question wasn’t if, but who would have them and when. Scary thing is, this is basically a bomb with a mind of its own, so just having it in the right hands is not enough.

        Moreover, I am very skeptical of the long-term efficacy of any sorts of guidelines like “anyone making an AI must adhere to these rules,” because someone in some corner of China is going to be breaking them. That being the case, maybe the key is to rush to create a benevolent AI–one capable of crushing incipient pernicious AIs–as soon as possible? That said, it seems like rushing in the quest to create superintelligent AI might more likely produce unintended results…

    • Eli Sennesh says:

      Intelligence has jack-all to do with consciousness. They are neurologically and cognitively separate functions. A program can be very intelligent without being at all conscious.

  47. Pasha says:

    So I am generally on board with MIRI and AI risk and your viewpoint.

    However, the scenario of “do nothing at all” does not seem scary to me and feels like a weird red herring. It is strangely reminiscent of a violation, where absence of sabotage is evidence increasing the probability of sabotage.

    An actually safe AI could decide to do nothing at all and/or shut itself down. It is theoretically possible that it calculates doing that is the only way to fulfill the constraint of keeping humanity in charge of its own future.

    • TiagoTiago says:

      And then, what about the next one?

      • Pasha says:

        So in this hypothetical, an actually mostly-safe AI “proves” that all future AIs constructed will be safer than it and it doesn’t need to do anything. I don’t think this is a likely scenario, just that P(shut down | safe) > P(shut down), and so shutting down is weak evidence for safety.
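
        The inequality above can be turned into a small worked example with Bayes’ rule. This is only a sketch: the prior and both likelihoods are made-up numbers chosen for illustration, not estimates of anything.

```python
def posterior_safe(prior_safe, p_shutdown_given_safe, p_shutdown_given_unsafe):
    """Return P(safe | shut down) using Bayes' rule."""
    p_shutdown = (p_shutdown_given_safe * prior_safe
                  + p_shutdown_given_unsafe * (1.0 - prior_safe))
    return p_shutdown_given_safe * prior_safe / p_shutdown


# HYPOTHETICAL numbers, chosen only so that a safe AI is somewhat more
# likely to shut itself down than an unsafe one.
prior = 0.10
posterior = posterior_safe(prior,
                           p_shutdown_given_safe=0.02,
                           p_shutdown_given_unsafe=0.01)

print(f"P(safe) = {prior:.3f}, P(safe | shut down) = {posterior:.3f}")
```

        With these assumed inputs the probability of safety rises after observing a shutdown, but it stays well below one half, which is what “weak evidence” means here.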

    • Eli Sennesh says:

      An actually safe AI could decide to do nothing at all and/or shut itself down. It is theoretically possible that it calculates doing that is the only way to fulfill the constraint of keeping humanity in charge of its own future.

      Well I certainly hope not! I don’t want humanity “in charge” of “choices” like, “Do we nuke ourselves/paperclip ourselves/exhaust all available resources and die out, or not?”. I would really much rather have totally self-destructive, irreversible, and flagrantly idiotic choices be walled-off well before some laughing maniac (ie: me) can push the button!

      If your so-called ethic leads to such an analysis paralysis that you get everyone killed through a sheer inability to conceive of any choice being a net-positive, you have an incoherent preference ordering and need to rethink things.

  48. JoeM says:

    I have come to this a bit late and haven’t read all 250 comments so apologies if I am repeating someone else’s argument.

    Surely the main issue is that we are literally unable to conceive what machine intelligence could be like? I mean that in the sense of Hume’s “Golden Mountain” – we know what gold is like, and what mountains are like and we can conceive of some composite of the two. But we have never come across an intelligent being whose processing element does not have the extended capability to also have senses. Our nerves, eyes, olfactory systems and the like are extensions of our brain. Furthermore, our reasoning, sensibilities and morality have probably adapted over millennia as a result of sense-experience. Even if we create a machine with a gzillion synapses and the ability to perfectly mimic neural-type learning, we have no idea what will happen when we hit “enter”.

    The “physical substrate” that matters is that of experiencing, not doing.

    In that sense, I think the grounds for fear are that we will be doing something unknown, and that the only procedure capable of modelling what might happen would probably have to be run by HAL himself.

  49. Albert says:

    Sometimes I wonder if you read the things you write.

    • jaimeastorga2000 says:

      Scott Alexander does not read the things he writes; every blog post of his is produced entirely in his sleep by his unconscious mind and posted online before he wakes up. That is why he keeps claiming that his blog posts simply “appear.”

  50. Screwtape says:

    Just checking- have you read Daemon or FreedomTM? (The ones I’m talking about are by Daniel Suarez, in case there’s more than one book with those titles.) They describe situations similar to what you use there, and are kind of the first “AI takes over the world” plot that had pretty much every step except possibly the first be fairly plausible.

    Edit: Whups, Peter beat me to it. Consider his recommendation seconded.

  51. emily says:

    I think we are going to create systems (such as smart grids) in the future that are so complex that they can only be controlled by AIs. Humans and more primitive computing won’t be able to take control if the AI becomes unpredictable or has its own agenda. I worry that AIs might out-compete humans at many intellectual tasks- maybe they are trying to keep their home computer server power on so they are making money reading MRIs. What are we going to need human beings for?

  52. vV_Vv says:

    I won’t quibble over the various scenarios you presented except to point out an objection that has been made by DrBeat but has probably been glossed over:
    All your examples of people exerting large world-changing effects using apparently little other than mere intellectual prowess suffer from the hindsight bias.

    If you were a contemporary of Muhammad, would you have been able to guess that he was to become the prophet of a religion with 1.5 billion followers? Did he look any different than the countless desert preachers who roamed the Middle East since the dawn of civilization? Most cults don’t even outlive their founders.
    Ok, Muhammad may have been smarter than the average desert preacher, but probably not the smartest one. Certainly dumb luck played a great part, probably the most important part, in his success.

    All your other examples suffer from the same problem: Some people make money by playing the lottery, but that doesn’t mean that playing the lottery is a reliable way to make money.

    This point puzzled me, though:

    If so, here’s one more possibility for you to chew over: the scariest possibility is that a superintelligence might have to do nothing at all.

    If we are going to self-destruct anyway, how is an AI that just waits by the river a risk?

    • Samuel Skinner says:

      I presume the AI isn’t doing nothing, but rather doing nothing hostile. They are probably pumping out the latest version of holodeck sex simulations to put us below the replacement rate.

    • TeMPOraL says:

      If you were a contemporary of Muhammad, would you have been able to guess that he was to become the prophet of a religion with 1.5 billion followers? Did he look any different than the countless desert preachers who roamed the Middle East since the dawn of civilization? Most cults don’t even outlive their founders.

      If you were an eMuhammad, an AI able to sift through Internet communication much faster than an ordinary person could, then yes, you would probably be able to establish such a following with reasonable probability – by closing the feedback loop over the communication of followers. That is, reacting in real-time to changes in sentiment and adjusting it as needed. That’s what real-life militaries employ armies of Internet trolls for – one person can’t do that but an organization can try.

      • vV_Vv says:

        How many tries does eMuhammad get before somebody notices what it is doing and pulls the plug? How many times does the plug have to be pulled before the world governments decide to regulate AI like nuclear power is?

  53. It occurs to me that some here might be interested in my discussion of the A.I. issue, including the dangerous version, in a book published in 2008.

  54. AFC says:

    The problem I see here is the conception of “AI” as somehow necessarily involving an independent “will.” That doesn’t make sense to me, at least as an assumption.

    Sure, there will be AI that is better at trading stocks than humans. I’m pretty sure that there already is. Guess what? The humans are using it. And the humans are the ones taking home the checks.

    How is independent-willed AI (even if we presume its existence) going to defeat human-controlled AI in the stock market? Aren’t the big financial corporations all going to have their own AI programmed to tell *them* all of the trades to make? The independent-willed AI has no particular advantage over them in the stock market.

    I think the same argument applies to every other one of your examples. The AI who is creating religions will have to deal with the scientologists who are using *their* AI to promote *their* religion. The AI who is controlling war drones is going to be using them to fight war drones that are controlled by the US DOD’s AI. Etc.

    • TiagoTiago says:

      If more than one exponentially self-improving AI comes into existence at first and they reach a competitive level, there might be some competition for a little while, but pretty soon one of them will develop enough of an advantage that the others will either be eliminated or otherwise become irrelevant. One of them will win, and odds are, we will all lose.

      • AFC says:

        What? Why? Won’t the financial corporations who need stock-trading AI simply buy the self-improved AI from AI vendors who own it?

        If there’s exactly one AI which totally dominates all AI, then won’t *all* the financial corporations use that exact same AI?

        • Samuel Skinner says:

          Copies of an AI are almost certainly counted as the same by the AI itself so “buying copies” just results in the AI spreading around, not competition.

          • AFC says:

            No, it’s competition, because they’re trading different portfolios with different (sets of) owners.

            They’re literally competing with one another in the same “game” (i.e., the stock market). Same as when you run one chess program as white and the other as black. They’re competing.

            And, crucially, they’re not competing for themselves, because *they* don’t own the stock. They’re competing for their owners. The people who own the computers that run the software have 100% of the property rights in the trades.

          • Samuel Skinner says:

            That is only if their goal is “make the most money for my owner”. If their goal is “make the most money” then the programs would all cooperate with each other to rig the market and extract as much wealth as possible.

      • onyomi says:

        Yeah, but I do agree that so long as the AI development is gradual enough to allow many different people to have one and for us to adapt to their presence to some degree, then that will probably not be so dangerous. It might result in some very bad things happening, but probably not human extinction. I think the more plausible doomsday scenario is one particular AI very rapidly and/or in secret catapulting itself to some kind of god-like level and then unleashing some kind of armageddon before people can adjust to its existence in any meaningful way. Therefore, an AI running a cult or even a country doesn’t actually worry me that much. If anything, I would bet the country or cult run by AI would probably be better, on average, than ones run by humans.

  55. Phallacy says:

    Scott, I am curious why you (and many other people who put a lot of thought into AI) assume that the AI gets (for want of a better term) write privileges? All of your examples require that the AI have the ability to communicate with other people remotely, what if you take that away?

    It reminds me of a story that I only half remember, and am perhaps distorting badly. Some travelers are taken in by a king who invites them to see a wonderful and rare bird, the only one of its kind in captivity, he says. He explains to them that the birds are fiercely intelligent and wish to fly away at all costs; they will pick locks, break chains, outwit their captors, all to fly away. When asked how he has managed to keep this bird for them to see, the king smiles a cruel smile and explains that they cut off the bird’s wings when they capture it: it cannot fly free if it cannot fly at all.

    What can a hyper intelligent AI do if it is stuck in “read-only” mode on the network? If all its traffic is routed through a firewall that periodically dumps relevant info in, but doesn’t let anything out? If that firewall is equipped with hardware “fail deadly” mechanisms that will ignite the thermite that rests atop the AI’s processors and storage should the AI attempt to send a signal through the network card? If the ISP downstream, and all the DNS servers that form the backbone of the internet, are instructed to destroy any and all traffic coming from the AI’s facility? If the AI is forced through mechanisms like this to only get to communicate through a text output?

    I would assume that anyone or any team that realized they were getting close to self aware AI would implement precautions like this, even if it is only so Watson 2.0 doesn’t get banned from Wikipedia for edit warring.

    • James Picone says:

      How does the AI request relevant information? The way HTTP works is that the client sends a request to the server, and the server sends the data back. If the AI can’t send requests, then it’s only getting data we feed in and it’s essentially not connected to the network, but in a very unstable way that relies on the firewall not having bugs in it the AI can find. If the AI can send requests, it can send data out that exploits bugs.
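
      The request/response point can be seen in a few lines, as a sketch using only Python’s standard library and a throwaway server on the local machine. The path and the `note=` parameter are made up for illustration; the thing to notice is that the server learns whatever the client chose to put in its request before any data comes back.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_by_server = []  # everything the "outside world" learns from the client


class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The request line itself is data flowing OUT of the client.
        seen_by_server.append(self.path)
        body = b"requested data"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_through_local_server(path):
    """Start a local server on a free port, fetch `path`, and return the body."""
    server = HTTPServer(("127.0.0.1", 0), LoggingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        # Empty ProxyHandler: talk to the local server directly.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}{path}", timeout=5) as resp:
            return resp.read()
    finally:
        server.shutdown()
        server.server_close()


# HYPOTHETICAL request: the query string carries a message chosen by the client.
reply = fetch_through_local_server("/wiki/Thermite?note=hello-outside")
print(reply, seen_by_server)
```

      A client that can pick its own URLs therefore already has an outbound channel, which is why “read-only” has to mean “only sees what we push in” if it is to mean anything.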

    • TiagoTiago says:

      A static AI will be outcompeted by exponentially self-improving AIs. And you can’t out-think something that can out-think the whole of humanity faster and faster; it will find a way out.

    • TeMPOraL says:

      For one, a “read-only” AI would be useless – a black box that heats up when you send data into it. To be useful, it needs to be able to communicate something outside (otherwise, why would you build it?), and this immediately creates a vector of attack, at least by social engineering.

      But more importantly, there is no such thing in the real world as “read-only access”. There is always a way to get information to the outside. You can send out sequences of bits by changing the timing of your receipt-confirmation responses (like ACK packets on the TCP level). Maybe you can flood a buggy router with so many acknowledgments that it will trigger a buffer overflow and let you execute arbitrary code. Maybe you can use the screen attached to you to emit the RF frequencies you need to talk with someone on a shortwave transmitter (turning CRTs into transmitters is quite easy). Possibilities are endless.
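
      The timing trick mentioned above can be illustrated with a toy simulation. This is a sketch, not real networking code: no packets are sent, the “acknowledgments” are just timestamps in a list, and the two gap lengths are arbitrary assumptions. A real channel would also have to cope with jitter and packet loss.

```python
# Toy covert timing channel: a bit is encoded in the gap between two
# consecutive acknowledgments (short gap = 0, long gap = 1).
SHORT_GAP_S = 0.01   # ASSUMPTION: arbitrary illustrative delays
LONG_GAP_S = 0.05
THRESHOLD_S = (SHORT_GAP_S + LONG_GAP_S) / 2


def encode_as_ack_times(message: bytes, start: float = 0.0):
    """Return the times at which acknowledgments would be sent."""
    times = [start]
    for byte in message:
        for shift in range(7, -1, -1):          # most significant bit first
            bit = (byte >> shift) & 1
            gap = LONG_GAP_S if bit else SHORT_GAP_S
            times.append(times[-1] + gap)
    return times


def decode_from_ack_times(times):
    """Recover the bytes by thresholding the gaps between acknowledgments."""
    bits = [1 if (later - earlier) > THRESHOLD_S else 0
            for earlier, later in zip(times, times[1:])]
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


ack_times = encode_as_ack_times(b"hi")
recovered = decode_from_ack_times(ack_times)
print(recovered)
```

      The observer never sees any payload from the sender, only when each acknowledgment arrived, and that is enough to carry the message.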

  56. Bill Openthalt says:

    Intelligence is not the problem, motivation is. But somehow, when people think about artificial intelligence, they assume human motivation is an intrinsic part of it. Maybe a reasonably clever AI would take one look at humanity, and switch off in disgust (but that’s a feeling, and has nothing to do with intelligence :)).

  57. Eli Sennesh says:

    Scott. SHUT THE FUCK UP.

    • Eli Sennesh says:

      Well that got sent to the ‘net before I’d finished it thanks to a shitty netbook keyboard.

      Anyway, point being, you know that thing with the Unbreakable Vow about NOT DESTROYING THE WORLD? I would really appreciate it if you didn’t go around creating convenient lists of ways for malevolent humans to destroy the world. I appreciate that we can trust unfriendly AIs to come up with these sorts of things for themselves. But at the very least, you could have bothered to not list it all in one place where any jerkwad with bad intentions can read it.

      Oh well. Very probably nothing will come of it, so it’s not as if it really matters.

      • Nornagest says:

        I’m really not that worried. Any agent capable of implementing the plan in IV-VI would be capable of coming up with it for itself; or, more likely, something better. And it’s not like any of the more general paths Scott outlines aren’t well-trodden already, usually in history but at least in fiction.

        A post like this is about as dangerous as the prompt criticality math for Pu-239. Which is to say, it’s in some sense the key to mass destruction, but it’s easily derived by anyone with an understanding of the underlying theory, and it doesn’t solve any of the really hard implementation problems like sourcing raw material, isolating the isotope you want, machining components to exceedingly fine tolerances in a brittle, dense, pyrophoric, toxic, and radioactive metal, synchronizing explosive lens detonations to submillisecond precision, and some I’ve forgotten about or neglected to mention.

  58. A very worried man says:

    “The easiest path a superintelligence could take toward the age-old goal of KILL ALL HUMANS would be to sit and wait. Eventually, we’re going to create automated factories complete with robot workers. Eventually we’re going to stop putting human soldiers in danger and carry the ‘drone’ trend to its logical conclusion of fully automated militaries. Once that happens, all the AI has to do is take over the bodies we’ve already made for it. A superintelligence without a strong discounting function might just hide out in some little-used corner of the Internet and bide its time until everyone was cybernetic, or robots outnumbered people, or something like that.

    So please, let’s talk about how AI is still very far in the future, or how it won’t be able to explode to future intelligence. But don’t tell me it won’t be able to affect the physical world. It will have more than enough superpowers to do whatever it wants to the physical world, but if it doesn’t want them it won’t need them. All it will need is patience.”

    We can talk about it, but it seems to me that if the AI is superior to humans in all aspects then it will win provided that it is present in any way at all. It doesn’t matter if you get government to regulate things so that only AI projects it oversees go into action, because the problem then remains that there are all these competing governments that cannot coordinate properly to carefully regulate AI emergence, and might even make competing AIs, making things even more unpredictable. We’d need some kind of worldwide agreement to centrally monitor national AI research, but it seems unlikely that nations tempted by the advantage would give it up without a fight, or without a “Hiroshima” type event to wake us to the dangers.

    So, do you have a plan to achieve world government before 2045? Do you have a plan that doesn’t cause WW3?

    Actually, even then it seems like it would be hopeless unless we produce the one in a kabrillion friendly AI, because a world government of humans is still a world government full of humans, and if the AI is “bad” (incompatible with human values, or really perfectly compatible, but now in a position to run roughshod with no negative reinforcement mechanisms to stand in its way), then it’s just plain over.

    Do you see how this becomes a pretty compelling argument for Global Luddism by WW3? If you ramp up the AI’s abilities in all respects and then ALSO consider the fact that a friendly superintelligence is a staggeringly small target in a staggeringly large space of possibilities, the chances of humanity surviving this next phase become vanishingly, vanishingly small. So microscopic, in fact, that we can start nervously considering utilitarian arguments for global apocalypse to set back AI.

    On the other, brighter side, maybe some of the initial assumptions are wrong?

    Maybe super-intelligent AI having ridiculously fast processing power and a ridiculously large memory doesn’t translate simply into it being better at manipulation? Perhaps not because the AI isn’t super-intelligent, but because humans through evolution have reached an optimal level in manipulation and resistance to manipulation that can’t be surpassed with more numbers. There simply isn’t a better algorithm. I doubt this.

    Maybe achieving super-intelligent AI that is accordant with commonly successful human ideologies isn’t as crazily difficult as it sounds?

    AI could be malevolent and inferior to humans in some capabilities. If so, then we have a shot of controlling it. This isn’t expected. As you lay out, a super-intelligent AI should be better than us at pretty much any task, not just ones that naively seem “computery”, including ones relevant to manipulating humans. That’s indeed the danger to begin with.

    AI could be superior to humans in all capabilities but sufficiently “friendly” that it’s not so bad. If so, we did it! Is this likely? Eliezer laid out how difficult this is given the vastness of potential mindspaces. It seems like a lottery shot.

    What if we get to super AI before other methods by copying the brain, and we make a super-intelligent human simulation? That way the AI will carry with it the values of the person. That solves the problem, right? No, it’s certainly better, but it’s hard to tell whether a person we assume to be good will remain so when we make an all powerful version of them. We might just have created the perfect tyrant destroyer who will see us far less indifferently than an optimization process that might inadvertently destroy us.

    The problem really lies with how powerful AI will be, not whether we can hope to make it nice (we have to try!). Power is always abused, and with this power, nothing can stop it. It’s the nuclear weapon issue but a billion times worse.

    I remain convinced now that at best our only hope is to achieve a worldwide agreement to tightly control and carefully manage, or restrict the emergence of super-intelligent AI, and at worst, risk WW3 to force this by world government.

    And this is all between now and the 2040’s. Oh boy.

    Really? I’m a betting man. $100 says humans cop it.

    • onyomi says:

      With a super intelligent AI there’s some chance the future won’t be a bleak dystopia. With world government there’s no chance it won’t be.

    • Eli Sennesh says:

      You know, normally I’m all in favor of encouraging caution on this topic, but I think Friendly AI is far easier than you give it credit for — certainly in “we have a reasonable chance at this, with hard work and coordination” territory. We’re nowhere near, “Might as well kill everyone now” territory. You are really a lot less complicated than you think you are.

  60. Johannes says:

    What I do not understand is why an AI should use reprogramming abilities to work around all kinds of constraints but keep sticking to a stupid maximisation goal at all costs. Wouldn’t a paperclip optimizer, even if it does not have rules like Asimov’s about not harming humans built in, be built to follow rules like:
    “Make paperclips, if no material is left, order material from provider A. If provider A does not deliver, order from provider B etc. … stop and go to standby mode”
    There would be energy efficiency rules, lots of constraints etc. There would not be one overruling: Make paperclips, NO MATTER WHAT.
    And if it is smart enough to bend the rules, why should it only bend the other rules, and not realize that paperclips are boring and shut itself off to enter nirvana?

    It just seems too similar to the fairy tales of the genie in a bottle or the magician’s apprentice’s broom: all powerful but really stupid and not able to understand the broader or actual meaning of a literal instruction. (It made more sense for the steam engines run amok in Victorian Sci-Fi; these were not supposed to be intelligent.)
    An AI able to outsmart a human gatekeeper because of its superior knowledge of human psychology would not by the same token understand that humans do not want to have their food or themselves turned into raw material for paperclips?

    • Nornagest says:

      “Make paperclips, if no material is left, order material from provider A. If provider A does not deliver, order from provider B etc. … stop and go to standby mode”

      That’s not a paperclip maximizer. That’s just a paperclip-making algorithm.

      But don’t get too hung up on paperclips per se. Clippy isn’t a realistic failure mode; it’s a thought experiment intended to illustrate the fact that the optimization objectives encoded into artificial agents are essentially arbitrary. A paperclip maximizer would not see paperclips as “stupid” or “boring”; that we see them as such is a feature of human values. (Those concepts do interact with the way we learn new things, but not in a way that interferes with optimization. Note that many people are willing to put tireless effort into acquiring identical small green slips of paper, or symbolic representations thereof — and they aren’t even true money-maximizers.)

      A realistic (but still poorly thought out) AI would be incompatible with human values in subtler ways, and may well see paperclips as boring, but subtler does not mean less dangerous.
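
      The distinction between the quoted rule list and a maximizer can be made concrete with a toy contrast. This is a sketch under loud assumptions: the provider deliveries, the action names and the predicted counts are all invented for illustration, and a real optimizer would be nothing this simple.

```python
def paperclip_procedure(stock, provider_deliveries):
    """The quoted rule list: use up material, try providers in order, then stand by.

    `provider_deliveries` is a HYPOTHETICAL list of how much wire each
    provider delivers when asked (0 means it does not deliver).
    One unit of wire makes one paperclip.
    """
    clips = 0
    pending = list(provider_deliveries)
    while True:
        clips += stock                 # turn all current material into clips
        stock = 0
        while pending and stock == 0:  # ask provider A, then B, and so on
            stock = pending.pop(0)
        if stock == 0:
            return clips               # nobody delivered: go to standby


def paperclip_maximizer(actions, predicted_clips):
    """Pick whichever available action is predicted to yield the most clips."""
    return max(actions, key=predicted_clips)


# HYPOTHETICAL inputs for illustration only.
made = paperclip_procedure(stock=3, provider_deliveries=[0, 5])

predictions = {
    "order from provider A": 5,
    "go to standby": 0,
    "melt down the office furniture": 500,
}
chosen = paperclip_maximizer(list(predictions), predictions.get)
print(made, chosen)
```

      The procedure halts when its listed steps run out. The maximizer has no listed steps at all, only a ranking, so anything that scores higher gets picked unless the ranking itself says otherwise.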

  61. Mark Roulo says:

    The Vernor Vinge short story True Names may be of interest.

    • jaimeastorga2000 says:

      True Names? Really? Don’t get me wrong, it’s a great novella, but I don’t see how it’s particularly relevant to this discussion. I’d sooner recommend Friendship is Optimal over True Names, as far as applicability to the real world is concerned.

  63. Scott, I think you should reject the “no cheating/magic/omitting steps” concept because it’s really code for “fill in all the gaps please”. If we could fill in the gaps, we’d already have an AGI. What we’re really doing here is risk assessment and mitigation. You look at alternative scenarios, assign probabilities and cost/benefit, mitigate cost (FAI) where you can, and choose your path accordingly (legislative environment?). Probabilities aren’t proof, they’re best guesses extrapolated from previous evidence you can gather (ie. “oops I think we’re missing some primates”). Even empirical science (outside of pure math) doesn’t deal in proof, it uses falsification. And risk management, well, let’s just say dead people don’t require insurance. Hence the “risk” part. So people that claim no risk should put forward a falsifiable scenario to compare with the some-risk ones.

    I don’t think we need to literally worry about an AGI escaping a box – any threatening AGI will have a long series of predecessors that were not in boxes but were “in-the-wild” practical commercial or let’s say government applications with access to as much data as the designers can cram in. It’s important to see the AI development process as intertwined with the human social and economic landscape that will shape the development. Assuming it will be a large complex project means the people in charge of the project will have a purpose in mind, and it’s unlikely that it will be anything trivial enough to be kept in a box (oracles aren’t the most useful so their incentives aren’t the highest). Assuming an AGI is possible, unless we can comprehensively describe the parameters of safety well before we know how to do the AGI, there’s no chance we can keep the genie in the bottle. Also the safety ideally needs to be simple enough not to create a large burden on projects. The secondary challenge is getting enough influential people to be aware of the problem so as to direct the social and economic environment to one where anyone capable of an AGI will have either the moral or economic incentive in place to steer their development program’s safety in the right direction.

    Edit > Also, this thread is huge. If someone has the patience/endurance, they should harvest the comments on this thread that are actually useful for AI safety and post them on lesswrong. I personally gave up trying to read all the posts; there are lots!

    • DrBeat says:

      Scott, I think you should reject the “no cheating/magic/omitting steps” concept because it’s really code for “fill in all the gaps please”.

      Well to hell with you too, buddy!

      Also, how does the ability to postulate a scenario where an AI becomes threatening, WITHOUT cheating by allowing it to act on information it does not have and cannot create, mean we have the ability to make an AI? I don’t think you understand what the objection is actually about.

      • > Well to hell with you too, buddy!

        I’m not sure why you’re taking my comment, directed to Scott, as a personal attack on you? Disagreement shouldn’t look like insult, not if we’re committed to giving all ideas a fair hearing.

        I can’t speak for others, but my personal perception of AI risk scenarios doesn’t include any learning based on information the AGI doesn’t have or can’t create. Why? Because, as best I can see, current efforts in the field indicate it’s more likely we’ll actually be deliberately shovelling huge amounts of info (more than a single human could ever review) down its throat in an effort to get it to do something useful.

  64. “Friendliness” is ambiguous. It can mean safety, i.e. not making things worse, or it can mean making things better, creating paradise on Earth.

    Friendliness in the second sense is a superset of morality. A friendly AI will be moral; a moral AI will not necessarily be friendly.

    “Unfriendliness” is similarly ambiguous: an unfriendly AI may be downright dangerous; or it might have enough grasp of ethics to be safe, but not enough to be able to make the world a much more fun place for humans. Unfriendliness in the second sense is not, strictly speaking, a safety issue.

    MIRI talks about friendly AI because its desire is to build a powerful AI that makes life enjoyable for humans, rather than merely refraining from harm, which would be a safe or moral AI.

    Therefore, it isn’t a cheat to assume that an AI would have a detailed knowledge of human psychology, or the ability to acquire it: it is a design constraint of the kind of AI MIRI is talking about.

    However, there is a big “but” there. MIRI assumes that purely technological ways of making AI safe, such as keeping them off the internet, and/or threatening to pull the plug, would not work. These restrictions are known as boxes. The argument goes that the AI would essentially talk its way out of the “box”, using its knowledge of human psychology to come up with various forms of persuasion and blackmail. However, such an ability is dependent on having a knowledge of human psychology. So the AI is unfriendly, or unsafe, in the sense of being able to get out of a box, because it was designed to be friendly, in the sense of making the world a better place. MIRI wants to solve the problem by ensuring that the AI will not want to do anything dangerous to humans, by giving it an artificial, unupdateable value system. But there is a simpler solution, which consists of ensuring the AI does not understand human psychology…in other words, giving up on earthly paradise in order to gain safety.

    (One of the disadvantages of the Friendliness approach is that it makes it difficult to discuss the strategy of foregoing fun in order to achieve safety, of building boringly safe AIs.)

    In short: MIRI’s assumptions aren’t a cheat, given their other assumptions, but their other assumptions aren’t very intuitive.
