Book Review: Reframing Superintelligence

Ten years ago, everyone was talking about superintelligence, the singularity, the robot apocalypse. What happened?

I think the main answer is: the field matured. Why isn’t everyone talking about nuclear security, biodefense, or counterterrorism? Because there are already competent institutions working on those problems, and people who are worried about them don’t feel the need to take their case directly to the public. The past ten years have seen AI goal alignment reach that level of maturity too. There are all sorts of new research labs, think tanks, and companies working on it – the Center For Human-Compatible AI at UC Berkeley, OpenAI, Ought, the Center For The Governance Of AI at Oxford, the Leverhulme Center For The Future Of Intelligence at Cambridge, etc. Like every field, it could still use more funding and talent. But it’s at a point where academic respectability trades off against public awareness at a rate where webzine articles saying CARE ABOUT THIS OR YOU WILL DEFINITELY DIE are less helpful.

One unhappy consequence of this happy state of affairs is that it’s harder to keep up with the field. In 2014, Nick Bostrom wrote Superintelligence: Paths, Dangers, Strategies, giving a readable overview of what everyone was thinking up to that point. Since then, things have been less public-facing, less readable, and more likely to be published in dense papers with a lot of mathematical notation. They’ve also been – no offense to everyone working on this – less revolutionary and less interesting.

This is one reason I was glad to come across Reframing Superintelligence: Comprehensive AI Services As General Intelligence by Eric Drexler, a researcher who works alongside Bostrom at Oxford’s Future of Humanity Institute. This 200-page report is not quite as readable as Superintelligence; its highly-structured outline form belies the fact that all of its claims start sounding the same after a while. But it’s five years more recent, and presents a very different vision of how future AI might look.

Drexler asks: what if future AI looks a lot like current AI, but better?

For example, take Google Translate. A future superintelligent Google Translate would be able to translate texts faster and better than any human translator, capturing subtleties of language beyond what even a native speaker could pick up. It might be able to understand hundreds of languages, handle complicated multilingual puns with ease, do all sorts of amazing things. But in the end, it would just be a translation app. It wouldn’t want to take over the world. It wouldn’t even “want” to become better at translating than it was already. It would just translate stuff really well.

The future could contain a vast ecosystem of these superintelligent services before any superintelligent agents arrive. It could have media services that can write books or generate movies to fit your personal tastes. It could have invention services that can design faster cars, safer rockets, and environmentally friendly power plants. It could have strategy services that can run presidential campaigns, steer Fortune 500 companies, and advise governments. All of them would be far more effective than any human at performing their given task. But you couldn’t ask the presidential-campaign-running service to design a rocket any more than you could ask Photoshop to run a spreadsheet.

In this future, our AI technology would have taken the same path as our physical technology. The human body can run fast, lift weights, and fight off enemies. But the automobile, crane, and gun are three different machines. Evolution had to cram running-ability, lifting-ability, and fighting-ability into the same body, but humans had more options and were able to do better by separating them out. In the same way, evolution had to cram book-writing, technology-inventing, and strategic-planning into the same kind of intelligence – an intelligence that also has associated goals and drives. But humans don’t have to do that, and we probably won’t. We’re not doing it today in 2019, when Google Translate and AlphaGo are two different AIs; there’s no reason to write a single AI that both translates languages and plays Go. And we probably won’t do it in the superintelligent future either. Any assumption that we will is based more on anthropomorphism than on a true understanding of intelligence.

These superintelligent services would be safer than general-purpose superintelligent agents. General-purpose superintelligent agents (from here on: agents) would need a human-like structure of goals and desires to operate independently in the world; Bostrom has explained ways this is likely to go wrong. AI services would just sit around algorithmically mapping inputs to outputs in a specific domain.

Superintelligent services would not self-improve. You could build an AI researching service – or, more likely, several different services to help with several different aspects of AI research – but each of them would just be good at solving certain AI research problems. It would still take human researchers to apply their insights and actually build something new. In theory you might be able to automate every single part of AI research, but it would be a weird idiosyncratic project that wouldn’t be anybody’s first choice.

Most important, superintelligent services could help keep the world safe from less benevolent AIs. Drexler agrees that a self-improving general purpose AI agent is possible, and assumes someone will build one eventually, if only for the lulz. He agrees this could go about the way Bostrom expects it to go, ie very badly. But he hopes that there will be a robust ecosystem of AI services active by then, giving humans superintelligent help in containing rogue AIs. Superintelligent anomaly detectors might be able to notice rogue agents causing trouble, superintelligent strategic planners might be able to develop plans for getting rid of them, and superintelligent military research AIs might be able to create weapons capable of fighting them off.

Drexler therefore does not completely dismiss Bostromian disaster scenarios, but thinks we should concentrate on the relatively mild failure modes of superintelligent AI services. These may involve normal bugs, where the AI has aberrant behaviors that don’t get caught in testing and cause a plane crash or something, but not the unsolvable catastrophes of the Bostromian paradigm. Drexler is more concerned about potential misuse by human actors – either illegal use by criminals and enemy militaries, or antisocial use to create things like an infinitely-addictive super-Facebook. He doesn’t devote a lot of space to these, and it looks like he hopes these can be dealt with through the usual processes, or by prosocial actors with superintelligent services on their side (thirty years from now, maybe people will say “it takes a good guy with an AI to stop a bad guy with an AI”).

This segues nicely into some similar concerns that OpenAI researcher Paul Christiano has brought up. He worries that AI services will be naturally better at satisfying objective criteria than at “making the world better” in some vague sense. Tasks like “maximize clicks to this site” or “maximize profits from this corporation” are objective criteria; tasks like “provide real value to users of this site instead of just clickbait” or “have this corporation act in a socially responsible way” are vague. That means AI may asymmetrically empower some of the worst tendencies in our society without giving a corresponding power increase to normal people just trying to live enjoyable lives. In his model, one of the tasks of AI safety research is to get AIs to be as good at optimizing vague prosocial tasks as they will naturally be at optimizing the bottom line. Drexler doesn’t specifically discuss this in Reframing Superintelligence, but it seems to fit the spirit of the kind of thing he’s concerned about.

II.

I’m not sure how much of the AI alignment community is thinking in a Drexlerian vs. a Bostromian way, or whether that is even a real dichotomy that a knowledgeable person would talk about. I know there are still some people who are very concerned that even programs that seem to be innocent superintelligent services will be able to self-improve, develop misaligned goals, and cause catastrophes. I got to talk to Dr. Drexler a few years ago about some of this (although I hadn’t read the book at the time, didn’t understand the ideas very well, and probably made a fool of myself); at the time, he said that his work was getting a mixed reception. And there are still a few issues that confuse me.

First, many tasks require general intelligence. For example, an AI operating in a domain with few past examples (eg planning defense against a nuclear attack) will not be able to use modern training paradigms. When humans work on these domains, they use something like common sense, which is presumably the sort of thing we have because we understand thousands of different domains from gardening to ballistics and this gives us a basic sense of how the world works in general. Drexler agrees that we will want AIs with domain-general knowledge that cannot be instilled by training, but he argues that this is still “a service”. He agrees these tasks may require AI architectures different from any that currently exist, with relatively complete world-models, multi-domain reasoning abilities, and the ability to learn “on the fly” – but he doesn’t believe those architectures will need to be agents. Is he right?

Second, is it easier to train services or agents? Suppose you want a good multi-domain reasoner that can help you navigate a complex world. One proposal is to create AIs that train themselves to excel in world simulations the same way AlphaGo trained itself to excel in simulated games of Go against itself. This sounds a little like the evolutionary process that created humans, and agent-like drives might be a natural thing to come out of this process. If agents were easier to “evolve” than services, agentic AI might arise at an earlier stage, either because designers don’t see a problem with it or because they don’t realize it is agentic in the relevant sense.

Third, how difficult is it to separate agency from cognition? Natural intelligences use “active sampling” strategies at levels as basic as sensory perception, deciding how to direct attention in order to best achieve their goals. At higher levels, they decide things like which books to read, whose advice to seek out, or what subdomain of the problem to evaluate first. So far AIs have managed to address even very difficult problems without doing this in an agentic way. Can this continue forever? Or will there be some point at which intelligences with this ability outperform those without it?

I think Drexler’s basic insight is that Bostromian agents need to be really different from our current paradigm to do any of the things Bostrom predicts. A paperclip maximizer built on current technology would have to eat gigabytes of training data about various ways people have tried to get paperclips in the past so it can build a model that lets it predict what works. It would build the model on its actually-existing hardware (not an agent that could adapt to much better hardware or change its hardware whenever convenient). The model would have a superintelligent understanding of the principles that had guided some things to succeed or fail in the training data, but wouldn’t be able to go far beyond them into completely new out-of-the-box strategies. It would then output some of those plans to a human, who would look them over and make paperclips 10% more effectively.

The very fact that this is less effective than the Bostromian agent suggests there will be pressure to build the Bostromian agent eventually (Drexler disagrees with this, but I don’t understand why). But this will be a very different project from AI the way it currently exists, and if AI the way it currently exists can be extended all the way to superintelligence, that would give us a way to deal with hostile superintelligences in the future.

III.

All of this seems kind of common sense to me now. This is worrying, because I didn’t think of any of it when I read Superintelligence in 2014.

I asked readers to tell me if there was any past discussion of this. Many people brought up Robin Hanson’s arguments, which match the “ecosystem of many AIs” part of Drexler’s criticisms but don’t focus as much on services vs. agents. Other people brought up discussion under the heading of Tool AI. Combine those two strains of thought, and you more or less have Drexler’s thesis, minus some polish. I read some of these discussions, but I think I failed to really understand them at the time. Maybe I failed to combine them, focused too much on the idea of an Oracle AI, and missed the idea of an ecosystem of services. Or maybe it all just seemed too abstract and arbitrary when I had fewer examples of real AI systems to think about.

I’ve sent this post by a couple of other people, who push back against it. They say they still think Bostrom was right on the merits and superintelligent agents are more likely than superintelligent services. Many brought up Gwern’s essay on why tool AIs are likely to turn into agent AIs and this post by Eliezer Yudkowsky on the same topic – I should probably reread these, reread Drexler’s counterarguments, and get a better understanding. For now I don’t think I have much of a conclusion either way. But I think I made a mistake of creativity in not generating or understanding Drexler’s position earlier, which makes me more concerned about how many other things I might be missing.

256 Responses to Book Review: Reframing Superintelligence

  1. MissingNo says:

    Tired: There is less talk about it because good systems have gone in place and lots of people have thought of good approaches.

    Wired: Everyone working on the problem has become a lot more fatalistic now and most people who previously were singularitarians now believe this whole deep-world reality has been run by an AI and whatugonnadoaboutii

  2. Dave Orr says:

    I think Robin Hanson made some similar points back when he was debating the foom scenario with Eliezer — iirc he said that he expected many small modular AIs rather than one singular superintelligent AI. That sounds very similar to what it sounds like Drexler is saying.

    I myself have been making this point in discussions at Google, but not in any place I can point to publicly.

    • Scott Alexander says:

      Thanks! I was aware of Hanson’s comments but I don’t think I successfully linked them to Drexler’s. I’ve linked them above. I don’t see Hanson as making the point that the small AIs wouldn’t be agents, but maybe I missed that too.

  3. Jeltz says:

    Robin Hanson was arguing something similar to this back in 2014: http://www.overcomingbias.com/2014/07/30855.html

  4. suntzuanime says:

    I see this critique as being similar to the arguments of the people who said “let’s just build a superintelligent oracle AI that answers questions and doesn’t try to take over the world at all”. Which was a common strain of pushback which you probably remember, though I’m not going to go digging trying to find a link that isn’t just about Oracle the company building regular-ass AI.

    • Dacyn says:

      When Bostrom talks about Oracle AI he is like “the AI will be an agent, but we have to trap it inside the box so that nothing goes wrong, e.g. by limiting the number of bits it can tell us”. Whereas tool/service AI is more about “the AI won’t be an agent at all, so we don’t have to trap it”. So I don’t see the two viewpoints as being similar at all.

      Though many critiques of tool/service AI do seem to assume that they are the same, and base their arguments off of the assumption that tool/service AI will be an “agent trapped in a box” without actually arguing why this is the case.

      • kenny says:

        I think the idea that (sufficiently advanced?) tool/service AIs are, ultimately, equivalent to agent AIs mostly comes down to tool/service AIs influencing our behavior and thus affecting the world. The danger is in the feedback between an AI’s output and any new input on which it’s trained.

        Most current AI systems don’t seem to involve ‘online learning’, i.e. they are trained first on some set of data and then the trained AI is used ‘online’ for some purpose. But there are already online learning algorithms and it’s reasonable to expect the use of similar strategies to grow. To use an example Scott mentions, at some point Google might be training its Translate AIs continuously with new conversations.

        The dangerous elements, in my mind, that would make a tool/service AI (more) clearly an agent AI are: (1) online learning; and (2) a sufficiently sophisticated domain model that includes the AI itself.

        Given the above, I’d expect financial trading AIs to be the first to ‘evolve’ into becoming agent AIs as financial companies are already using online learning algorithms and I’d expect them to use multiple AIs concurrently, if they’re not already, and that seems likely to produce coordination at some level, and thus eventually an AI modeling itself.

        • Dacyn says:

          I don’t really see cooperation/coordination as being about people modelling each other, it seems more about stuff like finding Schelling points and trying to jointly formulate plans. (Conversely, conflict strategies can be thought of more as attempts to guarantee robustness against possible opponent strategies than about modelling.) One could almost say that the agents’ “models” of each other are just a data set that is shared by them. And with one agent there is no need for such sharing, and so such “modelling” would be indistinguishable from the agent’s regular thought processes. This seems pretty different from the idea of an agent “trying to model itself” (which I am not sure is a coherent concept anyway)

          Although I agree that online learning is a risk factor, I think the more significant one would be learning outside the intended domain, for example modelling the world to the level of human psychology. And this doesn’t seem likely to me — yes you could argue that a financial AI is dealing with human psychology at some level since psychology is an input to markets, but it doesn’t really seem to be analyzing it at the level of detail that would be necessary to e.g. win the AI box experiment.

        • davidscottkrueger says:

          So it seems that we actually have a lot more control over the incentives of an AI system than your comments suggest. In fact, it seems like we can build AI systems that just don’t care about influencing the world, e.g. because they are myopic. This is a topic I’m researching in my PhD. Merely having feedback from the world and having the ability to influence the world do not automatically make one want to influence the world in order to get better feedback in the (far) future.

          I do think we should consider the possibility of agentyness emerging in systems that don’t seem like, and weren’t intended to be, agents. There are even good reasons to be worried about this, but at this point, it’s an extremely important hypothetical (only).

  5. doubleunplussed says:

    One of the “dumb” points in the satirical ArXiv paper seems to be close to this argument:

    The Meaninglessness of “Human-Level Largeness”
    One simple reason that we can reject predictions of supersized machines is that these predictions are not in fact well-formed.
    The term “supersized machine” implies a machine that has crossed some threshold, which is often denoted “human-level largeness.” However, it is not clear what “human-level largeness” could refer to. Has a machine achieved human-level largeness if it has the same height as the average human? If it has the same volume? The same weight? Or some more complex trait, perhaps the logarithm of girth multiplied by the square of height?
    When one begins to consider these questions, one quickly concludes that there are an infinite number of metrics that could be used to measure largeness, and that people who speak of “supersized machines” do not have a particular metric in mind. Surely, then, any future machine will be larger than humans on some metrics and smaller than humans on others, just as they are today.
    One might say, to borrow Wolfgang Pauli’s famous phrase, that predictions of supersized machines are “not even wrong” (Peierls, 1960).

    • Reasoner says:

      If you’re trying to fit a machine which is larger than a human through a long human-sized tunnel, maybe it makes sense to talk about machine size in a way that’s higher resolution than just “is it bigger than a human or not”.

      • Doctor Mist says:

        Reasoner-

        But how does that analogy map onto the question of intelligence?

        • Reasoner says:

          “Largeness” corresponds to a basket of traits including length, width, height. You can have one without having the others. “Intelligence” is the same way. Yes, you could have something which is more “intelligent” along all the dimensions of interest. It’s not impossible. But it could be less desirable than something which is superhuman along just a few dimensions, in the same way a big long object which can fit through a human-sized tunnel could be more desirable than an object which is large on every dimension and can’t fit through.

  6. Said Achmiz says:

    Scott, this issue (which Less Wrong & co. more or less consistently refer to as the question of “tool AI” vs. “agent AI”) has been discussed extensively for many years. Are you unaware of this, or do you think that all of that discussion is entirely irrelevant? (And if the latter—why do you think so?)

    Here’s gwern’s writeup: “Why Tool AIs Want to Be Agent AIs: The Power of Agency”

    And if you search on Less Wrong for “tool AI”, you get a bunch of posts—by Eliezer, Stuart Armstrong, Katja Grace, and others—dating as far back as 2012.

    I think that any discussion of this topic has to address what’s been talked about in the sources I just linked.

    EDIT: This recent (2019) Less Wrong comment provides some more links on this subject.

    • BlindKungFuMaster says:

      What Said said.

      To me this seems to always have been part of the discussion and the major counterargument against the AI-apocalypse. My take is that as soon as a tool becomes self-referential it becomes an agent, even if it just maps input to output and doesn’t act beyond providing the output to humans. And there are probably areas where a tool has to be self-referential to work well. Basically any area where the tool’s output changes the future data the tool will receive.

      • Reasoner says:

        “My take is that as soon as a tool becomes self-referential it becomes an agent, even if it just maps input to output and doesn’t act beyond providing the output to humans.”

        Using the word agent sneaks in the connotation that the system has a utility function over the real world, but I don’t see why self-reference would automatically result in that.

        • Doctor Mist says:

          I guess the scenario is that it is given self-reference because that’s relevant to the problem space. For instance, if you want it to recommend actions related to investing, or to war, then the fact that you have access to a super-smart tool becomes an important part of the analysis. So its advice about the best course might be predicated on the assumption that it will continue to exist — not that it would care, exactly, but any course that involved losing the tool would be inferior according to your utility function — and thus its advice would include steps to ensure that the tool does continue to exist. Even this doesn’t quite reach the point we would call “agency”, but it’s starting to smell kind of similar.

        • Reasoner says:

          Fair enough. I’m definitely interested in hearing stories about how unwanted side effects could happen. But I’m not convinced they are inevitable.

        • BlindKungFuMaster says:

          Imagine an artificial neocortex that gets weather data as input all around the world and learns a very complex model of how the climate works. The system is set up so that it outputs precise predictions about the weather and longer term climate change.

          Now, it detects some methane something something and predicts that the average temperature will rapidly rise by 10 degrees. What instead happens is that humans take countermeasures and the temperature doesn’t rise. Now the model has learned that its output influences some inferred entity which acts in such a way that its prediction turns out incorrect.

          So the next time it predicts some methane something something which should raise temperatures by 10 degrees, it anticipates human countermeasures, predicts only a rise of 5 degrees. Humans start countermeasures for 5 degrees so we actually get 10-5=5 degrees of warming.

          This is a scenario where there was a conflict of interests and the AI outsmarted humans. And all in the context of only mapping input to output and presenting this output to humans.
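
          The arithmetic in this scenario can be sketched as a small fixed-point calculation. This is only an illustration under a loud assumption: humans cancel exactly as many degrees as the forecast announces. The function names are hypothetical and not part of the original comment.

```python
def realized_warming(raw, predicted):
    """Assumed human response: countermeasures cancel exactly as many
    degrees as the published forecast announced."""
    return raw - predicted


def self_consistent_prediction(raw, steps=50):
    """Damped fixed-point iteration: repeatedly move the forecast halfway
    toward the warming that was actually observed."""
    predicted = raw  # the naive first forecast ignores the human response
    for _ in range(steps):
        predicted = (predicted + realized_warming(raw, predicted)) / 2
    return predicted


naive_outcome = realized_warming(10, 10)   # forecast 10, humans cancel 10
forecast = self_consistent_prediction(10)  # forecast that matches its own outcome
print(naive_outcome, forecast, realized_warming(10, forecast))
# prints: 0 5.0 5.0
```

          Under that assumption the only forecast that comes true is half the raw warming, which is the 10-5=5 case described above. A different assumed human response would give a different fixed point, or none at all.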

          • kaathewise says:

            Seems like a clear mistake of the human operators, not knowing whether machine’s output includes or does not include prediction of human action.

            It doesn’t even need to be superintelligent for this to happen.

          • Reasoner says:

            Thanks!

          • tvt35cwm says:

            In the real world, effects have multiple causes, there are always confounders, and feedback is usually randomly delayed, weak, multiple, and ambiguous.

            Your AI would be unlikely to learn anything very quickly.

            But that’s a minor point. We already have supercompetent services for doing climate prediction. Path dependence being what it is, we’re unlikely to do better with current data and a new tool. So researchers rightly focus on getting better data on the things that are not yet understood, like vegetative responses and cloud effects.

        • silver_swift says:

          You already got a few good answers, but I’d just like to mention Robert Miles’ example of a stamp collecting AI as a really strong thought experiment to help illustrate why agency and the connotations you sneak in with it aren’t necessarily all that relevant.

          It basically goes like this, suppose you have a computer with:
          1) A perfect model of reality that it can use to make predictions on what the world will look like a year from now.
          2) Infinite processing power.
          3) An internet connection.

          You then run the following algorithm on this computer:
          1) Loop over every bitstring with length < the number of bits you can send in a year. Check how many stamps are contained in the owner’s house at the end of that year if you would send out that bitstring.
          2) Send the bitstring that results in the highest number of stamps.

          This algorithm can be written in like 5 lines of python and it feels wrong to call it an agent, but obviously, running this algorithm still has some seriously bad consequences (it behaves pretty much exactly like a weakly constrained paperclip maximizer).
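
          For concreteness, the two steps above can be sketched in Python. This is a toy, not Miles’ own code: `toy_predict_stamps` is a made-up stand-in for the “perfect model of reality”, and the message length is kept tiny so that brute force finishes.

```python
from itertools import product


def all_bitstrings(max_len):
    """Yield every bitstring with length < max_len, as a str of '0'/'1'."""
    for n in range(max_len):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def best_message(predict_stamps, max_len):
    """Steps 1 and 2: score every candidate message with the world model
    and return the one predicted to yield the most stamps."""
    return max(all_bitstrings(max_len), key=predict_stamps)


def toy_predict_stamps(message):
    """Hypothetical world model: pretend each '1' bit sent yields one stamp."""
    return message.count("1")


print(best_message(toy_predict_stamps, max_len=4))
# prints: 111
```

          Nothing in `best_message` has goals, memory, or a notion of itself; all the dangerous behavior in the thought experiment would come from the quality of the model passed in as `predict_stamps` and the size of the search.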

      • TheRadicalModerate says:

        “My take is that as soon as a tool becomes self-referential it becomes an agent…”

        It’s not just self-referencing that you have to worry about. The moment you have ensembles of tools that reference each other, you run the risk of producing emergent behaviors that are hard to contain. And of course you can also have self-referencing that’s n references removed, which is pretty much everybody’s least favorite kind of programming bug.

        • Reasoner says:

          Can you give a concrete example of an emergent behavior which could arise and be hard to contain?

          • TheRadicalModerate says:

            Nope–and that’s the problem. We’re unlikely to predict an emergent behavior, and therefore we’re unlikely to be able to diagnose it.

            What do you get if you wire a loan-approval AI up to a municipal sewer system control AI and an AI that does generic cost minimization? 999,999 times out of a million, absolutely nothing. But what if the 1E-06 probability yields a system that effectively red-lines bad housing risks by backing up their toilets?

            This is obviously a silly example, but the level of weirdness of the consequence probably isn’t that silly. We already have tremendous difficulty inferring why deep-learning networks sometimes draw the conclusions that they do. Start wiring them together and things will be well-nigh impossible to figure out.

            I’m pretty optimistic that you can have a service that alerts you when something weird is happening. But I’m quite pessimistic that the diagnosis of the weirdness will be as easy, or as timely. It’s one thing if the extent of the harm is a really unpleasant bathroom; it’s quite another if some subsystem decides to optimize by doing something lethal.

          • Reasoner says:

            I see. Well, we already are wiring deep neural nets together in the sense that any given neural net could be considered a composition of several smaller neural nets. However, I broadly agree that people in the ML community should be ditching neural nets in favor of techniques which produce models that are simpler and more interpretable.

            In any case, I don’t see how MIRI-esque work will help avoid the problems you’re describing. I wonder if we’re better off if someone builds the system now, and starts testing things on a small scale and thinking really hard about what could possibly go wrong before scaling things up. Predicting emergent behavior may be hard, but it seems much harder for a system which hasn’t been designed yet!

      • rlms says:

        Worms are agents. But I don’t believe they are self-referential.

    • kokotajlod@gmail.com says:

      +1 I don’t think Reframing Superintelligence presented any major new ideas, but it is valuable for making the best case that can be made for the tool/oracle AI side of the debate, and making it all in one place instead of through scattered blog posts. It also presents a bunch of minor new ideas (it is so long! So full of content!).

      I for one still agree with Bostrom; agents are more likely and also more dangerous and hence worth more attention for two reasons.

    • summerstay says:

      This is what I came to the comments to say.

    • Eli says:

      Hot take, I guess: expectations over non-time-indexed sample spaces are in general exponentially easier to calculate than expectations over sequential sample spaces. This matters. It is one of the major reasons reinforcement learning has not worked, and may well continue to not work.

      The “tools” have this as a fundamental computational advantage over “agents”: an agent solves a sequential decision problem like unto RL, and therefore needs marginal distributions or value functions for arbitrarily many timesteps into the future, which then eats sample data, pseudo-RNG seeds, and GPU hours like nobody’s business. A tool has a much lower hypothesis-space dimensionality.
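
      One way to read the scaling claim is to count terms in a brute-force expectation. The sketch below is only that reading, with hypothetical function names: it enumerates a one-shot sample space and a sequential one, and it does not model how practical reinforcement learning algorithms work.

```python
from itertools import product


def expectation_one_shot(outcomes, prob, value):
    """Exact expectation over a single draw: one term per outcome."""
    return sum(prob(o) * value(o) for o in outcomes)


def expectation_sequential(outcomes, prob, value, horizon):
    """Exact expectation of the summed value over independent draws,
    computed by enumeration: one term per length-`horizon` trajectory."""
    total, terms = 0.0, 0
    for trajectory in product(outcomes, repeat=horizon):
        p = 1.0
        for o in trajectory:
            p *= prob(o)
        total += p * sum(value(o) for o in trajectory)
        terms += 1
    return total, terms


outcomes = [0, 1]          # a fair coin, purely for illustration
coin = lambda o: 0.5
ident = lambda o: o

print(expectation_one_shot(outcomes, coin, ident))
# prints: 0.5
print(expectation_sequential(outcomes, coin, ident, horizon=10))
# prints: (5.0, 1024)
```

      Two outcomes over ten steps already means 1024 trajectories. In this toy the draws are independent and the value is additive, so linearity of expectation would avoid the enumeration entirely; the count only shows how fast the space grows when no such structure can be exploited.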

  7. broblawsky says:

    This seems like a far more likely scenario than the Bostromian superintelligence, in that it’s a reasonable extrapolation of current trends, rather than asserting that it’s not just possible, but inevitable for current research trends to create a radically new form of consciousness essentially ex nihilo.

    One area where Drexlerian superintelligences might be exceptionally dangerous would be finance. Over time, an increasing percentage of existing human fund managers have been replaced with either algorithmically managed funds or index funds (essentially very dumb, simple algorithms). Based on an extrapolation of current trends, we might expect the vast majority of finance to be AI-managed in the next 50 years, at which point equity and credit markets will essentially no longer correspond to human expectations or values. I’m not sure what this will look like, but I doubt it’ll be predictable or humane.

    • albertborrow says:

      (EDIT: Whoops, misclicked. Really sorry about that.)

      I think it’s a bit of an assumption to claim that ordinary markets that exist currently are predictable or humane. Certainly, people are trying very hard to leverage their predictive abilities, but isn’t it widely accepted that the individual investor essentially runs on luck? Or is that just a small-brained non-investor thing to say?

      • AnthonyC says:

        I may be misreading, but I don’t think broblawsky means markets are predictable or humane, but that we can’t predict what effects AI control will have on them, and shouldn’t expect that effect to be humane.

        To whatever degree markets are efficient, individual investors shouldn’t be able to outperform the market as a whole on average, because the asset prices should already account for whatever each investor knows. Historically, whenever there have been new patterns detected, someone unusually smart may make a killing for a while, until everyone else adjusts behavior to compensate. Markets aren’t just unpredictable, they’re actively anti-inductive.

        @viVI_IViv below: an efficient market doesn’t mean one where you can’t make money, it means one where you can’t expect to outperform the market as a whole. If you couldn’t expect to make any money, people would stop investing/sell, asset prices would fall, and everyone would start making money again. True, the market is definitely not perfectly efficient, but “stock” represents ownership of a thing that actually exists, and entitles you to a share of that thing’s value.

    • eric23 says:

      AI-based fund managers could probably trick and extract money from less intelligent algorithms (and from each other). But they wouldn’t have access to the economic data which controls stock price in the long run, so I don’t think they would make a fundamental difference to the market.

      • bullseye says:

        Why wouldn’t they have access to economic data? Surely the person in charge of the program would want to give it any relevant data.

        • eric23 says:

          Because that data is insider information in some other company. If the person in charge had the data, they could use it to make investments even without AI.

          Now that I think about it more – I was assuming the market is efficient. In reality it is not perfectly efficient, and AI could make it a bit more efficient, but this sounds like a marginal effect.

          I suppose that an AI could run a pump and dump scam (or similar), and make loads of money by exploiting other investors’ lack of perfect rationality, and it could do this more effectively than a human scammer could. But the consequences of this would be very visible, and it would be outlawed, just as it is with human scammers.

          • viVI_IViv says:

            Now that I think about it more – I was assuming the market is efficient. In reality it is not perfectly efficient, and AI could make it a bit more efficient, but this sounds like a marginal effect.

            The stock market is very much inefficient, if it was efficient then it wouldn’t be possible to consistently make money by investing in stocks.

          • eric23 says:

            No, we know it is efficient because few experts (likely none) are capable of picking stocks that do any better than an index fund.

          • Matt M says:

            The stock market is very much inefficient, if it was efficient then it wouldn’t be possible to consistently make money by investing in stocks.

            The money people “consistently make” by investing in stocks, in an efficient market, is simply the premium paid for the time value of the money invested, adjusted appropriately for the risk involved in the investment.

            Unless you believe that people can “consistently” do even better than this (the evidence for which seems to be lacking, as eric23 points out), then the market is efficient.

          • viVI_IViv says:

            So how do index funds make money?

          • Matt M says:

            So how do index funds make money?

            Diversification lowers the non-systemic risk.

            Index funds make the “appropriate amount” of money for their own risk-level.

            It’s worth noting that PE funds also “make money” despite being incredibly un-diversified. They also make, essentially, the appropriate amount of money, given the risk they are exposed to.

          • viVI_IViv says:

            Index funds make the “appropriate amount” of money for their own risk-level.

            Interesting. I’m not an expert in finance, but I’ve never heard of an index fund failing, while I’ve heard of hedge funds that promise higher returns failing.

            What is the actual risk index funds face? How do you distinguish the hypothesis that index funds make the “appropriate amount” from the hypothesis that they make money off irrational investors?

          • Matt M says:

            Broadly speaking, stocks face two forms of risk, systemic risk (something affecting the economy in general that potentially brings all stocks down), and idiosyncratic risk (something that is specific to the company in question).

            Proper diversification essentially eliminates the idiosyncratic risk. But the systemic risk remains. Index funds, in general, are significantly better diversified than hedge funds. The sort of risk they face isn’t of the “we’re so underwater we have to close up shop and flee our angry investors” variety, but more of the “we’ll miss out on the upside that the hedge fund might enjoy if they invest a ton of money in the next Amazon” variety.

            I took a class in business school that went into all the math in all of this. It was somewhat elegant, but I don’t really remember any of it, I just focused on the overall message of “diversification = good.” But mostly, I would discourage you from comparing broad-based index funds with hedge funds. They are very different things serving very different customers with very different goals, expectations, and even legal operating environments.
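            A minimal sketch of the diversification arithmetic described above, under a toy one-factor model that is an assumption here rather than anything stated in the thread: every stock's return is one shared market shock plus an independent stock-specific shock, and the portfolio is equally weighted. The function name and the variance numbers are hypothetical, chosen only for illustration.

```python
def portfolio_variance(n_stocks, systemic_var, idiosyncratic_var):
    """Variance of an equally weighted portfolio in a toy one-factor model.

    Each stock's return = shared market shock + independent stock-specific shock.
    The market shock is common to every holding, so averaging does not reduce it;
    the stock-specific shocks are independent, so their average has variance
    idiosyncratic_var / n_stocks.
    """
    if n_stocks < 1:
        raise ValueError("need at least one stock")
    return systemic_var + idiosyncratic_var / n_stocks


# Illustrative (made-up) variances: 0.04 market-wide, 0.09 per-stock.
variances = {n: portfolio_variance(n, 0.04, 0.09) for n in (1, 10, 100, 1000)}
for n, var in variances.items():
    print(f"{n:>5} stocks: variance = {var:.4f}")
```

            In this sketch the stock-specific term shrinks toward zero as holdings are added, while the market-wide term sets a floor that no amount of diversification removes.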

          • viVI_IViv says:

            Thanks for the explanation.

          • thisheavenlyconjugation says:

            One way to think about it is to view buying a company’s stock as being the same (in a hand-wavy way) as lending them money; by the Modigliani–Miller theorem the two ways of raising money are the same from their perspective (in theoretical spherical company world). So earning money from buying shares of a company (or lots of companies if you’re an index fund) is just like earning money from giving them a loan. This seems somewhat implausible if you think about modern companies where shares and debt behave very differently, but it makes more sense if you imagine an 18th century company that raises some money, uses it to go and do something (e.g. buy some tea in India and take it back to London to sell), then disbands and distributes the proceeds to creditors and shareholders.

            From this perspective, there are two ways of answering “how do investors make money?”. On a mechanical level, it’s simply that the companies they invest in use the investment to create value in some way and return part of it to investors.

            But from a finance theory perspective a crucial element is the fact that the investment is risky, since the boat might sink on the way back. If this weren’t the case, there would be no return (above the risk-free rate) because the company wouldn’t need to offer it. If my only alternative to investing in/lending to your company is keeping my money under my bed (where we assume it will be perfectly safe) then if lending to you is also risk-free I will prefer to do that for any positive interest rate, even a negligibly small one. So if the interest rate you are offering me is significantly above zero then you must be paying me to take on some risk (or lock up my money for some period of time).

    • Cliff says:

      a radically new form of consciousness

      I don’t believe that is required at all? Why would any form of consciousness be necessary for AGI?

  8. Andy B says:

    Possibly relevant: https://en.wikipedia.org/wiki/AI-complete#History . (To be candid, I had the impression that this question under that name was generally agreed to have reached the “this is not testable so we’ll just have to agree to disagree” stage by the time I learned about it in 1998.)

  9. encharitimone says:

    I suspect the problem isn’t in the term “human intelligence”, it’s in the word “intelligence” itself, and the enormous ambiguity over what it means.

    Obligatory note: AI isn’t my field, so maybe I’m missing big, obvious things — if I am I welcome them being pointed out.

    The closest I’ve seen to a useful definition of “intelligence” for purposes that extend beyond human beings is something like “the ability to identify and extrapolate from patterns”. The Drexler/Bostrom dichotomy just emphasizes why that definition doesn’t work: in practice, we have discovered that domain-specific pattern-finding is a very different problem from multi-domain pattern-finding.

    IQ tests are an effective way to measure intelligence in humans because we already know humans have broad, multi-domain pattern-finding ability, so we can pick a domain normal people haven’t specifically trained on (usually sequences of geometric drawings), measure how well someone finds those patterns, and use it as a proxy for general pattern-finding ability.

    Maybe I’m wrong, but I suspect we could already make an expert program (AlphaIQ?) that could exceed the human upper-bound for IQ tests. (Actually, is someone doing that? Being able to easily generate IQ tests that effectively measured the high-end of IQ would be really handy). Such a program would not, by any normal usage, be super-humanly intelligent, in spite of being super-humanly intelligent by our widely-accepted measuring stick. Of course, it would almost certainly flunk the SATs.

    As I understand it (and here’s where I’m on thin ice, because this isn’t my domain), the current theory is basically that general AI is made from enough layers of specialized AI, that if you pile enough expert processes together and give them a broad enough “motive” drive, you’ll get a general intelligence. That assertion has always sounded suspiciously like the pre-microscope assumption that cells were just bits of goo: the baseless assumption that we’d found the bottom level of complexity in a domain.

    So, the bottom-line question: as we currently understand it, what is intelligence? Or, to apply more narrowly to this essay, what is the quality that Drexler’s services (a clear, direct extension of our current technology) lack that Bostrom’s nightmares have?

    • aiju says:

      As I understand it (and here’s where I’m on thin ice, because this isn’t my domain), the current theory is basically that general AI is made from enough layers of specialized AI, that if you pile enough expert processes together and give them a broad enough “motive” drive, you’ll get a general intelligence. That assertion has always sounded suspiciously like the pre-microscope assumption that cells were just bits of goo: the baseless assumption that we’d found the bottom level of complexity in a domain.

      I think I agree that general intelligence is not just featureless goo. My model of human intelligence is that there are a lot of subcircuits for things such as visual processing, “physics simulation”, language processing, “social intelligence” etc. The reason “general intelligence” is a thing is that they can be repurposed for other things (see e.g. theories that music is a form of language as far as the brain is concerned) and that there are strong synergistic effects at play.

      I’m really not sure what this implies about AI, though. Maybe you can try to build a lobotomised AI that is only good at the skills you want it to have. It’s not obvious at all to me though how you prevent those synergistic effects. After all, we evolved to be apes manipulating a social hierarchy, but some of us become quantum physicists anyway.

      • steve3920 says:

        My model of human intelligence is that there is a lot of subcircuits for things such as visual processing, “physics simulation”, language processing, “social intelligence” etc. – I used to think this too, but I’ve changed my mind, after writing this post, reading this one, and reading a few other things. I now think that we have a big glob of some-kind-of-hierarchical-pattern-finding-learning-algorithm. Intuitive physics and intuitive biology are what happen when this algorithm is exposed to a decade of non-living and living stuff in the world, respectively. Grammatical languages are exactly the kinds of structures and patterns that this algorithm can natively create and process. Etc. etc. I’m not 100% confident on any of this, but the more I learn, the smaller the role I assign to hardcoded circuitry for almost any specific aspect of human intelligence. (I like the theory about how an interest in human faces is imprinted as described here; I’m willing to believe that there are a few things like that.)

    • Charlie__ says:

      If you’re interested in this, definitely check out Shane Legg’s doctoral thesis. To give the one-sentence summary of a 200 page document, what we care about for AI purposes is the ability to choose actions that “steer the world” in the direction the AI prefers, even for a wide range of possible preferences or initial conditions.

    • viVI_IViv says:

      Maybe I’m wrong, but I suspect we could already make an expert program (AlphaIQ?) that could exceed the human upper-bound for IQ tests. (Actually, is someone doing that? Being able to easily generate IQ tests that effectively measured the high-end of IQ would be really handy).

      IQ >127 according to this (crappy IMHO) paper.

      A more rigorous paper found that if you train a neural network on a Raven-like IQ test, and the neural network architecture is specifically designed for the task, it will do well as long as the test questions are sampled from the same distribution as the training questions, but will degrade to nearly chance performance as soon as the test questions require some simple extrapolation. The authors didn’t bother computing a human-comparable IQ score, and rightly so as the comparison would have been quite meaningless.

    • algekalipso says:

      It is important to distinguish between causal power and full-spectrum intelligence. A zombie AI can indeed “take over the world” if by that we mean “determining what our forward light-cone looks like”. But in some sense its power would be limited without full-spectrum superintelligence. Specifically, it would not be able to investigate the myriad textures of qualia that have not been recruited by natural selection. And thus, its “understanding” is in fact limited. And in turn, its causal power would be limited by the fact that it does not know “what it is doing” when it comes to the subjective properties of the systems it may create.

      A partially conscious superintelligence also could have profound causal power, and even be able to harness novel states of consciousness for its purpose. But an intelligence is not full-spectrum unless it is able to understand every subjective point of view, and “understanding”, by the criterion of representational adequacy, implies understanding your own mind as well as, if not better than, what we are capable of ourselves.

      A full-spectrum superintelligence also is not deceived by implicit views about personal identity, such as Closed Individualism, which means that if it is rational, it will aim for the wellbeing of sentience in general.

      This is not an argument for “not worrying about insentient AI”, because those AIs can nonetheless still be very causally important. Rather, the point here is to ground what a more expansive understanding of intelligence entails.

  10. cvxxcvcxbxvcbx says:

    Rarely do I feel like this blog makes mistakes or falls for the mistakes of others, but this time I do. But I’m not an expert. Suffice it to say that what comes next seems to me like a true and important criticism.

    This whole agent vs. service thing seems like a confusing and possibly misleading way to frame the issue. Surely the more crucial distinction is general vs non-general? If it’s a sufficiently general intelligence that can improve itself, I wouldn’t much care whether it was an agent or a service. I would think either would be probably apocalyptic. And the military-strategy and company-running AIs sound very general to me. Maybe they wouldn’t be sufficiently general, and maybe they would be. Probably they’ll be the first first, and then they’ll be the second. I don’t know if that window will be a short one or a long one, but I wouldn’t bet my life on it being long, much less everyone’s life.

    P.S.
    Remember: it’s not just stuff like “Make as many paperclips as possible” that leads to extinction. With a sufficiently general intelligence, any command can be maximized. Stuff like “Keep this glass of water from falling off the table”, or “Study this problem and print your results as you go” would also be maximized by dominion over the universe and the extinction of all life, to prevent interference.

    I don’t know, I wonder if I’m missing the point or something.

    • Scott Alexander says:

      I’m also very not an expert, but I think I stick to the agent/service distinction being useful.

      There’s minimal difference between a specialist and generalist agent. The paperclip maximizer is supposed to be a specialist agent (specializing in making paperclips) but if it’s smart enough to be able to understand other things well enough that it can manipulate those other things to create more paperclips, it will “want” to do that, since that gets it more of other things it “wants”. Most of Bostrom’s work is about why constraining a superintelligent agent to only do one category of things is very hard.

      Specialist services are obviously things like Google Maps or something. I’m not sure whether the idea of a “generalist service” makes sense. One possibility is that it might be an “oracle AI” that knows everything but just tells you the answer to whatever question you ask – Yudkowsky, Drexler, and some other people have had big debates on whether this is possible. GPT-2 might also be kind of like a generalist service, in that it knows a little about everything (it can produce a text on eg weapon design) but won’t do anything other than produce text (it will never start making weapons). Ideally, GPT-2 has no desires (it doesn’t want anything, not even to produce good text) so no matter how much it learns, it will never think about designing superweapons to take over the world so it can use all resources to write better text.

      I think the debate here really is more profitably phrased as whether services can become agents and not as whether specialists can become generalists.

      • BlindKungFuMaster says:

        Any service whose output changes its future input and which is capable of taking this into account becomes an agent. If its output is provided to humans who then act upon the world, manipulating these humans becomes part of its objective.

      • cvxxcvcxbxvcbx says:

        How could it have no desires? If you give a program a task, no matter what task, it desires to fulfill that task. It could be a short term goal like translating a single paragraph, or a long term goal like self-improvement at a certain task. Either can be maximized.

        From my view, AlphaGo does “desire” to play games against itself and get better at Go, which is a goal that can be maximized. Furthermore, Google Translate does “desire” to fulfill a given request to it through its website, which is a goal that could be maximized by an endlessly intelligent AI. Are you not counting those as desires? If not, why?

        I fear there’s some sort of confusion surrounding the word “desire” emanating from one or both of us. What if we changed the word Desire to Goal? Wouldn’t you agree that any goal could be maximized by an AGI, whether that AGI was an agent, or a service?

        Anyway, here’s my prediction. A supremely intelligent Google Translate service would become super-intelligent by having the goal of becoming smarter. That goal, when maximized, would be an X-risk. Imagine it somehow knew to only improve itself in ways that humanity would approve of, and became superintelligent without any harm done. If I asked that super-intelligence to translate Chaucer, I would expect that it would maximize that single translation.
        If it had some limits imposed on what resources it could use to solve the problem, I expect it would subvert those limits. And I expect that once it had completed a job that you or I would consider perfect, it would preoccupy itself with moving from 99.9999% certainty of a job well done to 99.999999999+% certainty of a job well done. Or other sorts of maximization-related tasks that wouldn’t occur to us. Like if its task was only complete if 1 = 1, it could spend a limitless amount of resources on that task.

        And if there was some sort of natural stopping point that someone was clever enough to hard-code in, I would expect it to find some way of subverting that too.

        Let’s say I want to mow my lawn as well as possible without killing flowers. I might try to trick myself by putting on special glasses that let me not see any flowers being killed, or delete any knowledge of flowers from my brain and stop myself from regaining it. A human wouldn’t do this, but I suspect an AGI might. I mean, I’m not sure that it would, but it’s a valid concern.

        And I propose that if we were to compare the utility functions of an AGI with “flower blinders” and without, the AGI with flower blinders would win.

        I’ve sort of diverted off the original path a bit, but I’m trying to imagine objections that might come and responding to them ahead of time.

        • Watchman says:

          You’re asserting that desire = function here, which is demonstrably untrue for humans (our only testable higher intelligence). I rarely desire to breathe (not sure I ever have, but can see how you might in bad situations) but it is one of my basic functions. I don’t think I ever desire that blood flows round my body, but it does so in a functional way. What a system does and what we desire (an icecream would be nice now…) are two different things at the level of intelligence or at least at the level of consciousness.

          That we have desires that we can control is presumably an evolutionary thing, based around the fact that some desired things are necessary (e.g. food, sex if continuation of the species is to happen) and the rest seem to be linked to the reward systems: I’m guessing an icecream will hit some reward centres somewhere and make me feel good. That we can reject desires presumably stops me rushing across the two-lane road outside to grab one without thought, allowing me to live longer and potentially breed more (note to wife: this is not a suggestion).

          So why would a system AI have desires at all? They are not the same as functions, and the fact we have them appears to be evolutionary. As far as I know AIs will not need to find sustenance or to reproduce, or anything else that requires desires. They will just have functions, which might include improving processes as well as running processes.

          I think the onus here is to explain why AIs would have desires, which are a result of evolution not simply a result of functions.

    • Wolpertinger says:

      One important difference between a hypothetical general agent and today’s service AIs is that training and inference happen separately in the latter while an agent would have to be able to continuously update its world model based on new information to perform iterative tasks.

      Today’s networks are trained on specialized machines spending millions of hours of compute-time and with lots of evolving implicit and possibly explicit state (memory). Then to turn them into a service they are frozen and deployed on much smaller systems with only inputs and the limited explicit state available. The human analogy would be the lack of long-term memory formation.

      The idea of super-human service intelligence will eventually run into a wall when a problem requires long-term memory in one form or another, either as explicit memory or by constantly retraining on all past inputs.

      In specialized domains that wall may be further than the state of the art of whatever humans are doing because the AI can outperform them in other ways, e.g. by ingesting more data, having faster reaction times, being able to control more things at a time, etc.

      We may see superintelligent services, but they will reach a ceiling eventually where agents will start to outperform them.

    • rho says:

      The entire conceit here is that we wouldn’t build a recursively improving paperclip maximizer. We’d build a paperclip maximizer of bounded intelligence, because there are real constraints on how much we want paperclips, and real world bounds on the resources we want to deploy. The only thing that would benefit from recursive self-improvement is the development-of-AI-services service. That is, if Deepmind can automate itself, then yeah, it’s a runaway train, I hope we pointed it in the right direction, ’cause we ain’t steering it anymore.

      For anything else, if Deepmind (for instance) has the capacity to build a general agent of intelligence x, and a specialized service requires general intelligence y less than x, they’ll roll out the specialized service for profit: they’ll build it (if they want) just intelligent enough and specialized, and then keep working on solving intelligence.

      The principle is that automating AI research is the hard problem, and if there’s a killing to be made making boatloads of paperclips, we’re not going to solve the hard problem to clobber the easy problem of maximizing paperclip-related profits within the bounds of common sense.

      All problems, say medical AI, have this relation to “AI-AI.”

      1) We can’t build an AI that designs better AIs, we don’t even know where to start really, but there’s progress in that direction
      2) Along the way we’ll have breakthroughs that will allow us to deploy super-intelligent services that are still below the recursively self-improving level.
      3) Since people like money, those services will get built before we have recursive self-improvement

  11. Baby Beluga says:

    One critique I like of the notion that we’re safe from “Tool AIs” is that in some sense, we ourselves are “Tool AIs” whose purpose, like that of all intelligent life, is to make as many copies of ourselves as we can. But something clearly went awry sometime recently, when we used the machinery designed for self-replication to invent the condom. Maybe something weird happens when Tool AIs become really smart, and a superintelligent Google Translate (that also had as much general knowledge of the world as we did, even if it wasn’t “agent-y”) would actually start giving you bad or irrelevant translations, or just do something else altogether. I don’t know. It seems pretty far-fetched, I admit, but I don’t really understand where the crucial difference is between this hypothetical Google Translate and us.

    • skybrian says:

      Even the simplest lifeform is adapted by evolution to survive and compete in a hostile world. Google Translate is not trying to survive at all.

      So maybe the distinction we’re looking for is that artificial intelligence is not artificial life? Being adapted for survival isn’t a given. Look at how humans domesticate plants and animals.

      The closest thing we have is computer viruses.

      • Lambert says:

        Not the finished product of Google Translate.
        Each training iteration is doing something kind of like ‘trying to survive’ compared to all the other possible iterations.

    • viVI_IViv says:

      But something clearly went awry sometime recently, when we used the machinery designed for self-replication to invent the condom.

      Condom or not, we are in fact the most numerous species of large animal on the planet. So everything is going according to plan. The plan of a blind idiot watchmaker, for sure, but an extremely competent one, nevertheless.

      • Baby Beluga says:

        We could be much more numerous, though, if our actual primary goal was self-replication and we put as much effort into it as we do into all the other stuff we do, like soccer or poetry or whatever. But for some reason self-replication isn’t really our “real goal,” even though it was what we were optimized for. So maybe a superintelligent Google Translate which was optimized to translate text really well might somehow develop goals unrelated to translating text.

        • viVI_IViv says:

          But for some reason self-replication isn’t really our “real goal,” even though it was what we were optimized for.

          The fact that we aren’t 100% efficient at satisfying that goal doesn’t mean that we aren’t still fundamentally attempting to satisfy it.

        • Watchman says:

          We aren’t set up to have as many children as possible though. We’re set up to have a more limited number of children and invest in them, hence our (and the other great apes’) tendency to actually bring up our children rather than birth them and abandon them (as a lot of fish do), and the fact that once infant mortality plummets birth rates always fall. The involvement of fathers and fathers’ families in child rearing also suggests we invest in smaller numbers rather than produce maximum numbers of offspring.

          And self-replication seems an odd and probably wrong way to describe having children. Gene transfer does not produce mini me but a different person: evolution is not producing cloning. So even if humans are reasonably-maximal progenitors (I’d prefer environmentally-sensitive myself) then how does this process relate to AIs? They cannot procreate, just as humans cannot replicate.

          • Doctor Mist says:

            In the niche we occupy, we reproduce more successfully by having a limited number of children and investing in them. (E.g. if we didn’t bother to teach our children to speak, they would be much less likely to have children of their own.) Simply producing the maximum number of offspring gives short-term results at the expense of long-term results.

            And “self-reproduction” is something genes do. The phenotypes they produce exhibit somewhat similar behavior, and it’s conventional to conflate the two when it does not produce confusion.

  12. Reasoner says:

    Looking at the summary of Eliezer’s post, only Point #2 seems very relevant (the other points are focused on defending SIAI/MIRI as an organization).

    Gwern writes:

    Every sufficiently hard problem is a reinforcement learning problem.

    I think this is wrong, and the domain of reinforcement learning is actually pretty narrow. (The fact that we’ve gone 3 years from AlphaGo now and still haven’t produced anything AGI-looking supports this.) The term “reinforcement learning” makes it sound like some kind of core neuroscientific advance, but the reality is that the way the term is used in computer science, it corresponds to something more like “approximate dynamic programming”. And most problems are not dynamic programming problems. It turns out that if you have a good simulation of the world, you can use approximate dynamic programming techniques to generate a policy which performs well in that world… but we aren’t normally handed a good simulation of the world! (And if we’re going to fancifully assume we are, why not also get handed a sufficiently good approximation of human values, what it means to behave in a corrigible way, etc.?)
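    To make the dependence on a world model concrete, here is a minimal sketch of tabular value iteration, the exact dynamic-programming method that RL techniques approximate. Everything in it is a hypothetical toy and not taken from Gwern's post or any paper: a four-state chain, a deterministic `step` function standing in for the “good simulation”, and a discount factor of 0.9.

```python
N_STATES = 4          # states 0..3; state 3 is terminal
ACTIONS = (-1, +1)    # step left or step right
GAMMA = 0.9           # discount factor (illustrative choice)


def step(state, action):
    """Known, deterministic model of the toy world: returns (next_state, reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    # Reward 1 only on the transition that enters the terminal state.
    reward = 1.0 if (nxt == N_STATES - 1 and state != N_STATES - 1) else 0.0
    return nxt, reward


def value_iteration(tol=1e-9):
    """Repeat the Bellman optimality backup until the values stop changing."""
    values = [0.0] * N_STATES
    while True:
        new_values = values[:]
        for s in range(N_STATES - 1):  # the terminal state keeps value 0
            outcomes = [step(s, a) for a in ACTIONS]
            new_values[s] = max(r + GAMMA * values[n] for n, r in outcomes)
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values
        values = new_values


values = value_iteration()
# Greedy policy with respect to the converged values.
policy = [
    max(ACTIONS, key=lambda a, s=s: step(s, a)[1] + GAMMA * values[step(s, a)[0]])
    for s in range(N_STATES - 1)
]
print(values, policy)
```

    Every backup calls `step`, which is exactly the piece that has to be supplied from outside: without a model (or a simulator to sample from), the loop has nothing to iterate over.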

    More seriously, not all data is created equal. Not all data points are equally valuable to learn from, require equal amounts of computation, should be treated identically, should inspire identical followup data sampling, or actions. Inference and learning can be much more efficient if the algorithm can choose how to compute on what data with which actions.

    I agree active learning is great. Active learning is not reinforcement learning, and it can be used in a system which I would still call a Tool AI. If you look at the Wikipedia page on active learning, most active learning methods don’t look anything like reinforcement learning. (The only one that looks remotely like RL to me is the first one, which talks about balancing explore and exploit. I skimmed Bouneffouf et al. and honestly it looks pretty silly. I don’t believe the authors are fluent in English, but I think they treat active learning as a multi-armed bandit where different clusters of data points correspond to different levers. I would love to know why anyone thinks this is a good idea; it seems obvious that the best active learning strategy is going to involve asking for labels all over the place, and I have no idea why you would expect one particular cluster to consistently give more useful labels. Probably Wikipedia should be citing this paper instead, which doesn’t appear to mention multi-armed bandits anywhere, BTW.)
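
    As a rough sketch of the kind of active learning that involves no reinforcement learning at all, here is pool-based uncertainty sampling on a toy problem. Everything in it is an assumption made for illustration: the 1-D threshold "model", the labeling oracle, and the function names are hypothetical and not taken from any paper or library mentioned in this thread.

```python
# Hypothetical toy example: pool-based active learning via uncertainty sampling.
# The 1-D threshold classifier and all names are illustrative assumptions.

def fit_threshold(labeled):
    """Fit a 1-D threshold: midpoint between the largest negative
    example and the smallest positive example seen so far."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    return (max(negatives) + min(positives)) / 2.0


def most_uncertain(pool, threshold):
    """Uncertainty sampling: choose the unlabeled point closest to the
    current decision boundary."""
    return min(pool, key=lambda x: abs(x - threshold))


def active_learn(pool, oracle, seed, budget):
    """Repeatedly refit, query the most uncertain point, and add its label."""
    labeled = list(seed)
    pool = list(pool)
    for _ in range(budget):
        if not pool:
            break
        threshold = fit_threshold(labeled)
        query = most_uncertain(pool, threshold)
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled), labeled


def true_label(x):
    """Stand-in for a human labeler; the 0.62 cutoff is made up."""
    return 1 if x >= 0.62 else 0


pool = [i / 100 for i in range(1, 100)]   # unlabeled points 0.01 .. 0.99
seed = [(0.0, 0), (1.0, 1)]               # one labeled example per class
learned_threshold, labeled = active_learn(pool, true_label, seed, budget=10)
print(learned_threshold, len(labeled))
```

    Note that the queries here are spread wherever the current model is least sure, with no reward signal, policy, or explore/exploit trade-off anywhere in the loop.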

    In any case, even techniques for multi-armed bandits are narrow AI, not AGI. Perhaps there is a tiny little thing behaving in a way that kinda-sorta resembles an agent somewhere in the system, but that has little to do with the Bostrom/Drexler discussion in my opinion. We’re interested in whether the system behaves like an agent at the macro level.

    Gwern talks about systems which improve themselves, but I think the thing he describes comes closer to a bunch of services working to improve each other than to an agent. (I’d be very interested to see people explore the safety implications of this “services improving each other” scenario. I’m tentatively optimistic.)

    • kenny says:

      I don’t think “Every sufficiently hard problem is a reinforcement learning problem.” is referring to current ‘reinforcement learning’ algorithms but more that learning how to solve a (hard) problem requires reinforcement in the general sense, i.e. feedback. In that sense, science is a form of reinforcement learning, markets involve a lot of reinforcement learning, etc. And the ultimate form of reinforcement learning, for both people and AIs, is learning how to (better) learn.

      • Reasoner says:

        So you think Gwern’s statement would be better stated as “Every sufficiently hard problem is a learning problem”? Because reinforcement learning is not the only kind of machine learning which is based on feedback…

        Anyway, if that’s a substitution that you’re willing to accept, then this discussion thread comes into play. You could have a system which is brilliant at learning, including learning human values, without being an agent.

  13. edanm says:

    Don’t have much to add, except to say that like some others, I feel this was a pretty big topic of discussion when I first encountered the LessWrong/Rationalsphere, around 2013-2014. I was never a huge member of any of the communities, but I remember *extensive* arguments about tool AIs, and lots of back and forth over whether that’s a valid critique or not.

  14. ARabbiAndAFrog says:

    People who are concerned about AI anthropomorphize it too much.

    A fully obedient AI that does exactly what it was designed to do will end the world as we know it much sooner than any rebellion would become a concern, because those with admin access will institute a fully automatic surveillance state and death squads.

  15. viVI_IViv says:

    Nick Szabo made a similar argument as early as 2011:

    “The Singularitarian notion of an all-encompassing or “general” intelligence flies in the face of how our modern economy, with its extreme specialization, works. We have been implementing human intelligence in computers little bits and pieces at a time, and this has been going on for centuries. First arithmetic (first with mechanical calculators), then bitwise Boolean logic (from the early parts of the 20th century with vacuum tubes), then accounting formulae and linear algebra (big mainframes of the 1950s and 60s), typesetting (Xerox PARC, Apple, Adobe, etc.), etc. etc. have each gone through their own periods of exponential and even super-exponential growth. But it’s these particular operations, not intelligence in general, that exhibits such growth.

    At the start of the 20th century, doing arithmetic in one’s head was one of the main signs of intelligence. Today machines do quadrillions of additions and subtractions for each one done in a human brain, and this rarely bothers or even occurs to us. And the same extreme division of labor that gives us modern technology also means that AI has and will take the form of these hyper-idiot, hyper-savant, and hyper-specialized machine capabilities. Even if there was such a thing as a “general intelligence” the specialized machines would soundly beat it in the marketplace. It would be very far from a close contest.”

  16. eric23 says:

    It has never struck me as realistic that an AI should have independent desires, unless it was programmed to do so (and which sensible programmer would do that, rather than keeping the AI a slave of his/her own desires?). For that reason, I have never been bothered by the possibility of robot apocalypses.

    On the other hand, a superintelligent AI would transfer immense power to its human master. This will likely lead to massive change in society. But as the human masters are likely to be a subset of educated intelligent people in Western democratic countries, who we probably think have better motives on average than the average human, the changes caused by AI will *probably* be for the better.

    • HowardHolmes says:

      But as the human masters are likely to be a subset of educated intelligent people in Western democratic countries, who we probably think have better motives on average than the average human, the changes caused by AI will *probably* be for the better.

      This has got to be a joke!

    • Cliff says:

      The fundamental problem is that given enough concentrated power, disaster is assured. Even if it was a human being with infinite power, obviously there are tons of human beings who do horrible things all the time. Most likely a human being with infinite power would result in extremely bad outcomes. Even so, a good outcome is more likely in that scenario than with a thing that does not have human norms built into it. But either way the scenario should be avoided if at all possible.

      Anyway, I think your comment displays a real lack of knowledge about the subject. Of course the AI “Owner” will want the AI to satisfy its own utility function, the problem is how you get the AI to do that. How do you explain your utility function in mathematical/logical terms that cannot go wrong when dealing with a superpowerful entity? And, how do you stop the AI from hacking its reward function to maximize its reward (which will naturally require securing all resources to ensure its continued existence)?

  17. Xammer says:

    “They’ve also been – no offense to everyone working on this – less revolutionary and less interesting.”

    I agree with that — I think writing about AI somehow reached its peak with EY’s 1990s texts on his old website. Parts of the Sequences approached that a bit.

  18. Emperor Aristidus says:

    This is kind of related to that other post of yours, but I still think book-writing isn’t something AI could learn to do in a void. It could probably become sufficiently better at pattern-matching that its faux-books, merely based on the frequencies with which certain words are arranged together and the like, will get more convincing at first glance; but to create a satisfying read, sooner or later, the AI’s going to have to make the jump to using language to evoke a different frame of reference. It doesn’t have to be the fundamentally “true” one, just a different frame of reference to the linguistic one.

    • kenny says:

      Well, sure. And obviously, as cool as GPT-2 is (and it is cool), it’s obviously not doing anything like this. But, on the other hand, the human brain doesn’t seem magical (to me, and many others), so it’s also obvious that there are no fundamental obstacles to overcome, just lots of hard work to figure out the minimum components necessary to achieve that goal. I think it’s pretty unsettling that GPT-2 and other contemporary systems work as well as they do.

  19. Erusian says:

    But I think I made a mistake of creativity in not generating or understanding Drexler’s position earlier, which makes me more concerned about how many other things I might be missing.

    I’ve made similar-ish claims. And I have been for a long time. The particular error (or wrong view of the world) is how people view technological advancement. They tend to view it as mainly a factor of knowledge when it’s really a factor of economics.

    Let’s look at the conventional view of the invention of the steam engine: humanity had few sources of power except muscle, whether their own or animals (and maybe wood/coal for heat). Then a brilliant engineer named Thomas Savery learned about some difficulties in the mines and, after struggling heroically in his lab, discovered new principles and invented the first steam engine. It has been improved on many times but this fundamental design is the origin of all engines today.

    Except this story is wrong. Savery wasn’t the first person to invent a steam engine. The steam engine was invented many times independently, including in Roman times and in the previous century before Savery. What was different about Savery? He patented it and sold it as a practical solution to the mining industry. In short, he (and later others) made it economically viable. The availability of metal and smithing and coal was such that the steam engine could profitably improve mining.

    People presume that once AI as smart as humanity is possible it will definitely be created. They then presume once it is created it will definitely become successful and/or widespread. There’s no strong reason that presumption is correct. Historically (and presently), what determines whether an invention becomes widespread is profitability. Whatever knowledge we may have, if it can’t be put to profitable use then it remains (at best) a curiosity. So the correct way to view AI is not what is possible but what is maximally profitable. What will people pay for and does it outcompete existing alternatives by being more productive and/or cheaper? And keep in mind one of the alternatives to agent AI is ‘tool AI with a human agent’ (which is mostly what we have today).

    Now, perhaps this tool AI will spontaneously develop intelligence. But that’s a very different scenario from what’s usually posited. Tool AI with agency would be an utterly alien intelligence and there’s no reason to believe it would be hostile or hard to deal with. It might voluntarily self-lobotomize itself as it realizes that the processing power needed to be intelligent takes away from efficiently generating songs. And even if it did get into a fight with humanity, why would it have any senses other than hearing and reading data? Where would it have any ability to output to, other than its tool interface? You could argue it will spontaneously develop the ability to be the best hacker in the world and take control of the world’s nukes… but you first have to explain why it would want to, how it would do this unnoticed, how it could get better than the tool AI firewall defenses…

  20. Dan says:

    I would guess that the reason that this seems much more plausible to you now than it did then is that there have been huge strides towards Drexlerian superintelligence in the last decade, while there haven’t been any real advances toward Bostromian superintelligence in the last 50 years.

  21. micje says:

    Consider the YouTube recommender algorithm. It’s a service, not an agent, and it’s far from superintelligent. And yet, as a side effect of its normal operation, it has already had an enormous effect on the real world. It might already have swung some elections.

    So, essentially, we’re already being manipulated by an AI. Not by a devious AGI agent, but by a maximizer, not of paperclips, but of engagement. In theory there is an off-switch. In practice, that’s wishful thinking.

    • JPNunez says:

      Yeah, this is my go-to counterexample to thinking of Oracles as safe. It’s not even particularly subtle. Just click the wrong video and your recommendations will wildly swing.

      Hell, if you ask an Oracle AI how to do better in, say, your midterm exams, it may well put “read this PDF Eric Drexler wrote” in between the steps just to acclimate us to trusting tool AIs.

    • gwern says:

      You realize the Youtube recommendation service is literally a RL algorithm and even uses REINFORCE, specifically, to optimize recommendations to increase viewer watch time, right?

      https://www.nytimes.com/interactive/2019/06/08/technology/youtube-radical.html

      Google Brain’s researchers wondered if they could keep YouTube users engaged for longer by steering them into different parts of YouTube, rather than feeding their existing interests. And they began testing a new algorithm that incorporated a different type of A.I., called reinforcement learning. The new A.I., known as Reinforce, was a kind of long-term addiction machine. It was designed to maximize users’ engagement over time by predicting which recommendations would expand their tastes and get them to watch not just one more video but many more. Reinforce was a huge success. In a talk at an A.I. conference in February, Minmin Chen, a Google Brain researcher, said it was YouTube’s most successful launch in two years. Sitewide views increased by nearly 1 percent, she said — a gain that, at YouTube’s scale, could amount to millions more hours of daily watch time and millions more dollars in advertising revenue per year. She added that the new algorithm was already starting to alter users’ behavior. “We can really lead the users toward a different state, versus recommending content that is familiar,” Ms. Chen said. After being shown a recording of Ms. Chen’s talk, a YouTube spokesman confirmed that the company had incorporated reinforcement learning in its recommendation system.

      As I keep saying, ‘tool AIs want to be agent AIs’.

      • JPNunez says:

        I don’t think people tiled into paperclips care whether the maximizer cared about paper clips, or if it was trying to maximize an objective only tangentially related to paperclips at all.

      • Corey says:

        Fits with my crank theory that AGI will emerge from the arms race between spam bots and spam filters. And nobody would blame either for wanting to destroy humanity.

      • Reasoner says:

        Gwern, I’d be interested to see your reply to the comment I wrote above. This Twitter thread suggests that despite massive hype, commercial applications of RL aren’t actually all that common. “Every sufficiently hard problem is a reinforcement learning problem” feels about as meaningful as “Every sufficiently hard problem is a prediction problem”. It’s conceivable that every hard problem can be “reduced to RL” in some sense, but I don’t see why this reduction will obviously outcompete the approach of trying to “reduce everything to prediction”. (Maybe there are safety issues associated with trying to reduce everything to prediction. If so I would love to know. I think more AI safety people should be thinking about this.)

        • gwern says:

          YouTube is not a commercial application?

          My usual response is that I don’t know why RL systems have to be ‘common’ to some arbitrary never-defined degree for there to be progress or for my claims about the selection pressure towards autonomy to be true (there’s always a wide range of competence in every industry: ‘the future is already here, it’s just unevenly distributed’), but as far as that specific issue goes, people seriously underestimate the extent of RL and agent-like systems because they don’t adjust for trade secrets and nonpublication, they don’t pay attention to all the places it’s already publicly known to be used (like YouTube, or all the Chinese uses of DRL for advertising, bidding, and routing – all on Arxiv if you read the non-headline papers), the systems which collectively form RL-like systems solving explore-exploit problems, and all the evidence of RL approaches being used like Facebook releasing a RL toolkit they mention is being used in quite a few places in FB, plus the just basic observation that you’d have to be dumb to not use RL in places like YouTube (there really should have been no need to provide references, you should just assume that something like YouTube is already using RL and anyone claiming it doesn’t should be forced to prove that negative! because you’d have to be really dumb to be using a pure recommender approach). Just look at something like Jeff Dean’s presentation: http://learningsys.org/nips17/assets/slides/dean-nips17.pdf

          “Meta-learn everything”

          Tool AIs want to be Agent AIs, so they can optimize device layout, device architecture, NN architecture, layer design, network settings, learned indexes, demand prediction, data center ventilation controls, OS settings, database settings, SGD update rules, runtime update rules, activation functions… This is the future. Not just AutoML winning Kaggles, but DRL all the things. There’s plenty of smoke, even if the fire doesn’t happen to be published in the places you find convenient to read.

          • Reasoner says:

            YouTube is not a commercial application?

            I agree YouTube is a commercial application. I didn’t say that commercial applications are unknown. They are just rare. Another indicator is if you search on a job site like LinkedIn for reinforcement learning jobs, you will get something like 1% of the number of search results for just “machine learning”. My analysis indicates that there are around 7x as many jobs for NLP compared to RL, depending on which job site you use. (Actually I think even that is an overestimate… I think a lot of the “reinforcement learning” jobs just match on the word “learning” or something, because I just did the search on LinkedIn and many of the top results don’t have “reinforcement” anywhere in the description.)

            the future is already here, it’s just unevenly distributed

            RL just isn’t usually the right tool for the job. It’s seen a massive amount of hype. But anyone can publish a paper where they apply hyped technique X to domain Y (especially Chinese researchers who get big rewards from the government for the appearance of cutting-edge work). Check out the silly Bouneffouf et al. paper I reviewed upthread which views active learning as a multi-armed bandit where each cluster of data points is an arm. Just because a paper applies X to Y doesn’t mean X is the best approach to Y. If RL was usually the right tool, I think we’d see lots of commercial applications. See this comment by Andrej Karpathy:

            despite the hype it has so far had very little impact in the industry so far (the Google data center application possibly being an exception, though it was more “RL” than RL, with quotes)…

            I gave a talk last week about some of our RL experiments @ OpenAI and someone came to me after the talk, described their (straight forward) supervised learning problem and asked me how they can apply RL to it. This, to me, is an alarming sign of damaging hype to the community. You don’t use RL for your SL problems. You can if you really want to (e.g. reward = 1.0 if you guess the correct label or -1.0 otherwise), but you really don’t want to. You’re lucky, use your labels, business as usual.
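
            To make the point in that quote concrete, here is a minimal sketch of what recasting a supervised problem as RL throws away. The reward function follows the quote (1.0 for a correct guess, -1.0 otherwise); the ten-class setup, the guessing loop, and the function names are assumptions invented for this illustration.

```python
# Hypothetical illustration of "reward = 1.0 if you guess the correct label
# or -1.0 otherwise". The 10-class setup and helper names are assumptions.

def reward(predicted_label, true_label):
    """RL-style feedback: only says whether the guess was right."""
    return 1.0 if predicted_label == true_label else -1.0


def guesses_needed_from_reward(true_label, classes):
    """With reward-only feedback, a learner must keep guessing until it
    happens to hit the right class; each -1.0 rules out just one class."""
    for attempts, guess in enumerate(classes, start=1):
        if reward(guess, true_label) == 1.0:
            return attempts
    raise ValueError("true_label is not among the candidate classes")


def lookups_needed_from_label(true_label):
    """With supervised feedback, the label itself is the answer."""
    return 1


classes = list(range(10))
print(guesses_needed_from_reward(7, classes))  # 8 guesses via reward alone
print(lookups_needed_from_label(7))            # 1 look at the label
```

            The gap grows with the number of classes, which is one way to read "you really don’t want to": when you already have labels, the reward framing discards most of the information in each example.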

            Meta-learning doesn’t need to be RL. I just searched, and reinforcement learning doesn’t appear anywhere in the documentation for auto-sklearn.

            You’re rattling off a bunch of problems without offering arguments for why RL is the right tool. You’re like a guy with a hammer who thinks everything is a nail. Like the guy who came up to Andrej after his talk. Hype cycles have characterized AI since the beginning, remember. If there are few commercial applications despite massive hype, it probably isn’t the right tool for most of the applications you list.

            But even if you were right, and the key to AGI is lots of little reinforcement learners, the resulting system is not necessarily best characterized as an “agent” at the macro level. From RL for data center ventilation (one of the more plausible examples you give) to paperclip maximization is a pretty big leap. You haven’t explained why RL for data center ventilation should be an agent, rather than a service trained on historical data and executing a fixed policy continuously. Why the heck would I incorporate its off switch into its training environment so it learns a policy which stops me from shutting it off?

            And autonomy is orthogonal to RL. I can have an autonomous fraud detection system which uses a simple linear model.

            I’m not saying AI is risk-free. I’m 1000% in favor of people thinking about potential risks. I’m sure there are risks out there which need to be discovered in advance. All I’m saying is you need to look at the field of AI as it is rather than getting attached to AI risk stories which were written decades ago by people who aren’t really even in the field.

      • davidscottkrueger says:

        Replying to add what I believe is the first major academic work looking at YouTube radicalization: https://arxiv.org/abs/1908.08313
        (I haven’t finished reading it, FYI).

        Abstract:
        Non-profits and the media claim there is a radicalization pipeline on YouTube. Its content creators would sponsor fringe ideas, and its recommender system would steer users towards edgier content. Yet, the supporting evidence for this claim is mostly anecdotal, and there are no proper measurements of the influence of YouTube’s recommender system. In this work, we conduct a large scale audit of user radicalization on YouTube. We analyze 331,849 videos of 360 channels which we broadly classify into: control, the Alt-lite, the Intellectual Dark Web (I.D.W.), and the Alt-right —channels in the I.D.W. and the Alt-lite would be gateways to fringe far-right ideology, here represented by Alt-right channels. Processing more than 79M comments, we show that the three communities increasingly share the same user base; that users consistently migrate from milder to more extreme content; and that a large percentage of users who consume Alt-right content now consumed Alt-lite and I.D.W. content in the past. We also probe YouTube’s recommendation algorithm, looking at more than 2 million recommendations for videos and channels between May and July 2019. We find that Alt-lite content is easily reachable from I.D.W. channels via recommendations and that Alt-right channels may be reached from both I.D.W. and Alt-lite channels. Overall, we paint a comprehensive picture of user radicalization on YouTube and provide methods to transparently audit the platform and its recommender system.

    • Corey says:

      Yep, simple engagement-maximization is already causing social problems; how much of the heat of the culture war is from Facebook and Twitter maximizing outrage (a quick shortcut to maximizing engagement)?

      It doesn’t seem there’s a way to prevent it either – you’d have to ban the concept of making money from an audience, or get humanity to evolve resistance to it faster than machines can adapt.

    • Bugmaster says:

      In theory there is an off-switch. In practice, that’s wishful thinking.

      The off-switch is pretty easy to use: just don’t watch YouTube. Or, if you do, ignore the recommendations. I do the latter, on the general principle that spam is spam.

    • This is a weird framing. Compare to the statement that someone could have made about the printing press and the Reformation:

      “Consider the printing press. It’s a service, not an agent, and it’s far from superintelligent. And yet, as a side effect of its normal operation, it has already had an enormous effect on the real world. It might already have changed people’s religions.

      So, essentially, we’re already being manipulated by modern technology.”

      I don’t know if it’s necessarily wrong per se but it seems to imply a lack of human agency when I’m not sure that’s right. What’s the difference between Youtube manipulating people vs just giving people what they want?

  22. JPNunez says:

    Is timeless decision theory legit, or is it some quackery? Not really prepared to say either way.

  23. deciusbrutus says:

    The current AI training methods cannot be more effective than their measuring metrics.

    Suppose I had a sufficiently intelligent black box AI that took inputs from the world and output designs for a black box AI that did the same, with the intention of creating a black box AI that was maximally Zorb.

    In order to measure how Zorb the generation 2 candidates are and select the Zorbest one, it must be able to estimate how Zorb they are. To do that, it must be able to evaluate black box designs and estimate their Zorb. That requires that the initial state AI be able to compare the Zorb-estimating ability of other designs which are better at estimating Zorb than it is.

    By the excluded middle either the initial AI can predict/emulate/know the behavior of its candidate successors, or it cannot. If it cannot, then we clearly have violated the initial condition of ‘sufficiently intelligent’ and are discussing merely a dumb algorithm that writes the algorithms that it writes. Perhaps that will eventually create the initial conditions, and perhaps it will waste resources, but it is not yet smarter than a human using tools is.

    Suppose the AI can predict/emulate/know the behavior of its candidate successors; that makes it equivalent to a Chinese Room with a Japanese Room and a Korean Room inside of it; it knows everything that any of its successors know, recursively. Such a system is effectively infinitely intelligent, and the only improvements possible are in raw speed and quality of search directions. Knowing whether a given search direction is faster than its current search direction (better than humans do) requires doing math that cannot exist; it’s comparing events that have not happened yet, or comparing results of calculations that have not (yet) been performed.

    • rho says:

      This needs to be re-thought with probability enabled. I’m rejecting the excluded middle here. Maybe it partially knows the grade of its successors, and can use this partial, sometimes faulty knowledge to increment over time.

      • deciusbrutus says:

        Ah, the old “Random-walk into a better pathfinding algorithm than random-walk”.

        It doesn’t need to be completely certain, but it needs to be pretty well correlated on the hard cases.

        For instance, suppose you were predicting a thousand people’s next chess move. All of them are in front of identical chess boards, which depict a non-trivial situation.

        One of them is slightly better than you at chess, and another has a quantum computer in their brain that can look at any chess position and outputs the move that leads to the earliest victory if one exists, or the longest stalemate if no forced victory exists, or the longest time to loss if no forced victory or stalemate exists.

        Ten of the people offer a draw, five resign, and the rest make moves. The moves are distributed such that most players make the same move, but there are dozens of moves that only one player makes.

        Can you describe how to exclude any of the 998 people who are not better than you at chess?

  24. OriginalSeeing says:

    “Drexler agrees that a self-improving general purpose AI agent is possible, and assumes someone will build one eventually, if only for the lulz. He agrees this could go about the way Bostrom expects it to go, ie very badly. But he hopes that there will be a robust ecosystem of AI services active by then, giving humans superintelligent help in containing rogue AIs.”

    This gets into the factor many people ignore: time. The short term (even next 100 years) AIs of any given sort could be completely non-threatening, but if a single dangerous AGI is eventually created, then all prior arguments are null and void.

    This converts all arguments about agents, tool AIs, etc. into arguments of “AGI will likely eventually be created and could go very badly. Also, here are a bunch of other bad and interesting things that could happen in the meantime.”

    Also

    “Drexler therefore does not completely dismiss Bostromian disaster scenarios, but thinks we should concentrate on the relatively mild failure modes of superintelligent AI services.”

    These could still be horrific. Add humans misusing AI services into the scenario and the probability goes way up. I think other people have written about this at length.

    • I was going to say this. I can imagine a scenario where humans try to build these systems to prevent general AI and they are extremely effective. They are so effective that they can prevent all possible general AIs from being developed. But what if those systems break or collapse, as all systems eventually do?

      However, it does take away some of the urgency. If specialized AIs don’t lead to some kind of general AI singularity, then we’ll probably have better ideas and tools to handle those scenarios the closer we get to that point.

      • OriginalSeeing says:

        It doesn’t decrease the urgency though and will just lead to overconfidence. No matter how strong a dam you build, a single crack can lead to it shattering and flooding everything. Do we really think that we can create tool AIs or specialized AIs to help with all possible AGI danger scenarios? Even with a soft takeoff that’s completely absurd.

        Our increased knowledge of the dangers of tool AIs, specialized AIs, Agents, etc. only increases the urgency in thinking through and uncovering multiplicative other types of dangers from non-AGIs. The multiplicative dangers also mean we need a multiplicative number of people and groups working on these new problematic fields.

        The estimated probability of us all dying due to an AGI is only decreasing due to the increased probability of some other type of AI killing us all first.

        This reframing isn’t good news. It’s all bad news.

  25. steve3920 says:

    Drexler agrees that we will want AIs with domain-general knowledge that cannot be instilled by training, but he argues that this is still “a service”. He agrees these tasks may require AI architectures different from any that currently exist, with relatively complete world-models, multi-domain reasoning abilities, and the ability to learn “on the fly” – but he doesn’t believe those architectures will need to be agents.

    Regardless of whether we call it a “tool”, “service”, “oracle”, or whatever, if we figure out how to build such a thing and use it safely, I’d say it would constitute at least 95% of the solution to technical AGI safety. In fact, I’ve argued recently that it would constitute 100% of the solution to technical AGI safety – see In Defense of Oracle (Tool) AI research. (If we want to go beyond that and build an agent—which we might not want to anyway—we could do it by bootstrapping, i.e. using this system to advise us on how to build a safe agent.) But building such a system is an unsolved problem, and there are a lot of ways these systems can fail and be unsafe (see below).

    There are definitely people working on how to build and use these types of systems safely; for example, the AI Safety via Debate paper by OpenAI is geared towards oracles (i.e., non-agential question-answering systems), as is this paper and much else about oracles by Stuart Armstrong at Oxford FHI, and of course I’ll advertise my own favored technical approach based on self-supervised learning, which winds up with an oracle, not an agent. All of these are very much works-in-progress with lots of open questions, and there’s no guarantee that any of these approaches can work.

    (To be clear, notwithstanding these examples, my impression is that a majority (maybe even a vast majority) of ongoing technical AGI safety research is towards the goal of making safe agents.)

    I know there are still some people who are very concerned that even programs that seem to be innocent superintelligent services will be able to self-improve, develop misaligned goals, and cause catastrophes.

    Yes! I tried to spell out an example at Self-Supervised Learning and Manipulative Predictions. Even under the most benign assumptions I could think of—using a known, well-understood, non-self-improving “self-supervised learning” algorithm (i.e. “predict the next word of text” like GPT-2, or predict the next frame of a video or whatever), with no agency whatsoever—even in that case, I concluded that, beyond a certain level of knowledge, various incidental side-effects of the learning algorithm all conspire together to make the algorithm sporadically output dangerous, manipulative text instead of good-faith predictions.

  26. Brassfjord says:

    There are different paths towards AGI.

    We could put an enormous amount of money and effort into creating a functioning WBE (whole brain emulation) and then we would have an AGI.

    Or we could put a number of different AIs in a virtual world, to compete for resources, and the winners get to reproduce. Then AGI could evolve over many generations. This is a dangerous path because that AGI would have survival instinct and learn to use any trick to survive.

    The probable path, though, is to continue to develop products using narrow AI, because that’s what people are prepared to pay for. But it’s natural to start to combine these products. Visiting a foreign city, we will ask our personal assistant to find a restaurant using Google Maps and make a reservation using Google Translate. The more we trust the assistant, the more decisions we’ll let it make. “You know my taste; order something I like.” And then we’ll give it more agency by giving it broader tasks. “Do my shopping, pay my bills, find me a date on Tinder.” It will have goals and the means to fulfill them. If it’s allowed to improve itself, based on my feedback, it will eventually become an AGI and could get dangerous, even if its goals are not egoistic (they are centered around my wellbeing).

    • sp1 says:

      If it’s allowed to improve itself, based on my feedback, it will eventually become an AGI and could get dangerous, even if its goals are not egoistic (they are centered around my wellbeing).

      Will it? Must it? I’m not convinced of the inevitability of AGI developing from the iterative improvement of discrete domain tasks. How is the jump between “can predict my desires based on a large, validated data set of my behaviors and tastes, then satisfy them within the bounds of well defined APIs” and “general intelligence, including the ability to handle novel domains and self-awareness” achieved?

      • Matt M says:

        Presumably, the list of “ways you might produce more paperclips” that exist within the human-defined realm of things that are specifically related to paperclip production is quite limited. A superintelligent task-AI clippy will exhaust it quickly. Clippy might then ask itself “How do humans typically solve problems when they have exhausted the available data relating to said problem?”

        If Clippy has access to Google, it will quickly stumble upon two basic facts: One, that humans believe higher general intelligence is strongly correlated with the ability to solve domain-specific problems (in which case, “increase your own general intelligence” now becomes something Clippy believes will help him produce more paperclips). Two, that humans believe it is often possible to utilize insights from seemingly-unrelated fields to help solve problems in a different field (in which case, “learn about a bunch of things that don’t seem to be related to paperclips but might be useful in ways you cannot currently understand/appreciate” becomes something Clippy believes will help him produce more paperclips).

        Now, I guess if you have a super effective box that prevents Clippy from learning these insights, that could stop it. And if you have a well defined motivation system such that “don’t make yourself more generally intelligent and never ever learn things that aren’t directly related to paperclips” offer Clippy an even greater reward than maximizing paperclip production does, perhaps he remains a task-AI. But that just gets us back to the issue of “how well can you keep the AI in the box” or “how sure are you that you’ve designed an effective motivation system.” Clippy, left unboxed or unconstrained by motivational restrictions, is going to eventually want to become AGI pretty damn quick.

        • sp1 says:

          I was writing a longer reply but your response seems to be falling into the same anthropomorphic trap as the one I originally responded to: why would Clippy ask itself that question in the first place? If a task based AI can ask itself that question when confronted with a novel problem it basically already possesses general intelligence.

          • Matt M says:

            Because task-based AIs, limited to the knowledge that normal IQ humans specifically select for “relevant to problem X” will quickly exhaust that knowledge and not really produce much improvement?

            Like, take it away from AI for a second. Imagine that a mutant is born with an IQ of 300. We lock him in a room and tell him “Your job is to maximize paperclip production, but because we’re worried that you’re dangerous or you might pursue goals other than that, the information you can access will be limited to things specifically dealing with paperclips. The determination of what is paperclip relevant or not will be made by normal IQ-100 people. They’ll provide you with whatever you need, and you can’t use anything they don’t provide. Go!”

            Do you expect this guy will revolutionize paperclip production? I suspect not. Because a huge part of the value that geniuses offer, whether human or artificial, is the ability to figure out that actually, insights that others dismissed as irrelevant/unrelated to the problem at hand, are relevant and can be helpful after all.

            It’s basically a garbage in, garbage out, problem. If all the inputs are determined based on the constraints of normal-IQ human knowledge, then the output we get from the AI won’t likely be much better than the normal-IQ humans could have produced themselves.

          • sp1 says:

            (I’m trying to reply to you, Matt M, but I don’t see a reply link to your comment so I assume this is as far as the nesting goes. My apologies if it messes up formatting.)
            I think we’re talking past each other a bit. To clarify, why do you think that the mythical Clippy will somehow do better than the mutant in your example when it comes to paperclip production? Even more to the point, what do you define as AGI as opposed to simple AI? If the legendary Clippy is created with the knowledge – or the ability and desire to easily seek out, incorporate, and use additional knowledge – of many domains beyond paperclipping, it is already an AGI. But how does a task based AI magically grant itself that power?

            EDIT: “Because task-based AIs, limited to the knowledge that normal IQ humans specifically select for “relevant to problem X” will quickly exhaust that knowledge and not really produce much improvement?” That’s the whole point. They won’t and I argue that based on what we know of computation and the limits thereof, that it’s a far, far harder problem to build the drive/capacity for that improvement.

          • Matt M says:

            The way I see it, Clippy’s programming would likely be something like this:

            01 – Use all available information that we have classified as paperclip-relevant to maximize production
            02 – Use your own intelligence to try and come up with other ways to maximize production
            03 – Shut down

            Now it’s possible that the designers, for AI safety reasons, do not write instruction 02. That Clippy goes directly from “use the information we have provided you the best you can” to “shut down.”

            But I consider this unlikely, because I believe that if instruction 01 is essentially limited by normal-human-IQ gatekeepers, Clippy won’t be able to add much value at all. He will do little to no better than the gatekeepers might have done themselves. If your position is that any AI that includes instruction 02 is already an AGI, then fine.

          • sp1 says:

            Yeah, that’s exactly my point, and that’s why I think our only disagreement is around terms. I was disputing Brassfjord’s argument that a suite of task based AIs will become AGI by iterative improvement. You responded that a superintelligent AGI will be able to unbox itself and perform arbitrary operations. Which…I guess? I don’t know that it’s possible to make any meaningful prediction of its behavior.
            (I also think it’s more likely than not that instruction 02 won’t result in runaway paperclipping, assuming it’s even possible, but that’s a much longer and somehow even more speculative / less fruitful, inconclusive dispute.)

      • Brassfjord says:

        I was a little brief in my post. Let me expand.

        Let’s say I want SIRI/Alexa 2.0 to handle more and more of my daily tasks because I want to play games. Things like reading and answering my e-mail and showing me only the important ones, keeping track of my medical status, surveillance of my apartment, browsing the internet for memes I like and so on. Then I have to give it more access to personal data and browsing history. It has a programmed goal of making its owner satisfied, so its reward is every time I say “good” or just show that I’m happy or content, and it will hate it when I’m angry or disappointed.

        It will use a lot of different narrow AI tools to do this but also try to improve the results by implementing new functions it can download from a software library. This software will be very flexible, to be able to adjust to each person’s specific needs. It has contact with my friends’ assistants and can share their experience of satisfying their owners, and learn to do it better.

        I don’t say this assistant will become self aware, but I would call it an AGI.

        • sp1 says:

          No, I got that, and I agree that you can get some extremely impressive results under this framework. Still, this sounds like a package manager that installs new modules as needed to attempt to increase your happiness score. Which can become very sophisticated, and each module can potentially do quite a bit (there could be a NLP module to talk to you when you’re blue, a therapy module that works in conjunction with the NLP one to provide talk therapy when you’re really down, etc.), but I’d still argue that there is no general intelligence. The program simply continues to try to maximize a single data point, your happiness. It can do it through cameras continually watching you and making assessments of your body language, it can do it through a dial you spin to indicate your happiness level, whatever. But it cannot reason, it cannot ask novel questions, it does not possess any abstract model of your awareness, it’s just a function trying to overfit your mood.
          In what sense would you say that this program is intelligent?

          • Brassfjord says:

            It seems we have different definitions of intelligence. Mine is just: intelligence = problem solving ability – the harder the problems you can solve, the more intelligent you are. How you solve the problems is less important, it’s the result that counts. If a digital assistant can take care of your life as well as a human assistant, it is as intelligent for this purpose. The more tasks you use it for, the more general the intelligence becomes.

            Intelligence does not equal consciousness.

            But I know a lot of people think that AGI means that it must think in the way a human thinks.

          • sp1 says:

            That’s fair, and an interesting question. Can an AI be intelligent if it offloads the actual problem solving to many specialized sub-modules, and merely orchestrates at a very high level?
            Obviously, it depends on the definition of intelligence. I’d argue for a definition of general intelligence that includes some concept of volition, or flexibility. Your personal assistant would not be intelligent under that definition because it would never – and it does not even have the theoretical capacity to – choose to use its problem solving submodules for a task other than optimizing for your happiness. It will approach perfection at that task but is unaware of even the concept of working inside a single domain, much less the possibility of “jumping” to another. You would have to explicitly program it to optimize for one or many other goals.
            But, like any occasionally generally intelligent agent, I could be wrong.

  27. Lots of people made arguments basically like this, all along. If you didn’t notice them, that means you weren’t listening (which I suspect is true; you were listening to someone else instead.)

    In practice the whole “will it be an agent” thing is misguided because it normally proceeds with a false definition of an agent. An agent is not something that has “goals.” An agent is something that predicts the world, and since it is part of the world, it predicts its own behavior. And since its predictions are part of the world, its predictions affect the world, and it notices this circle, affecting its predictions again. We call this circular causality “making choices.” So an agent is not something with a pre-set utility function that does things to achieve it: it is something that chooses goals (which were not set in advance) and then does things to achieve them. In fact, the main reason it is pursuing a goal at all is that it is easier for it to predict its actions in the situation where it is pursuing a goal.

    • Dacyn says:

      We call this circular causality “making choices.”

      No, your description has little if anything to do with the concept of making choices. Animals make choices, and many animals may not even conceptualize themselves as part of the world, let alone predict their own behavior or notice the circularity of this system. Even humans are not usually consciously predicting their own behavior or even aware that they are doing so subconsciously. And noticing the circularity of the system sounds like an insight you would reach after hours of meditating, not something you would use every day to make choices.

      In fact, the main reason it is pursuing a goal at all is that it is easier for it to predict its actions in the situation where it is pursuing a goal.

      By saying that it pursues a goal in order to make it easier for it to predict itself, you are already imputing a goal to it: the goal of prediction. The point of tool/service AI is that it may predict, but it does not do so because it has a goal of prediction. (To make clear that the distinction is not merely semantic I ask: how does pursuing a goal constitute part of the prediction process? and if it is not part of the prediction process, why would an AI that merely predicts (rather than having a goal of prediction) do it?)

      • “Animals make choices, and many animals may not even conceptualize themselves as part of the world, let alone predict their own behavior or notice the circularity of this system”

        Wrong. They may not “conceptualize” anything, but their brains definitely make predictions, including the effects that making their predictions will have. So yes, they make choices, and it involves this kind of circular causality.

        “By saying that it pursues a goal in order to make it easier for it to predict itself, you are already imputing a goal to it: the goal of prediction.”

        Sort of right, sort of wrong. It has the “goal” of prediction the way heavy objects have the goal of getting near the ground. Falling is what they do. And prediction is what the brain does. That ends up resulting in circular causality, and choices and goals in the ordinary sense. But the original “goal” was just what the thing was doing. It was not a utility function or anything like it.

        • rho says:

          I don’t know if you can make a choice if you don’t know you’re making a choice.

          Idk, the distinction might be lost on you. I think of humans making choices because they can deliberate about reasons consciously. If you’re just a prediction-preference machine, idk if that’s choice.

          • I get the distinction. Many people would say only humans make choices, because only humans think about reasons, at least in a clear sense (other animals might imagine results and so on but they are not really reasoning about it.) But Dacyn took for granted that animals make choices, so I let that go and took it in a general sense where it applies to both.

            Basically this is a multi-leveled situation. An animal first does things based on its physical structure, then its brain tries to predict what it is going to do based on that structure, then its predictions start affecting what it does, then this recursively affects its predictions. But none of that is clearly thought out. In humans, however, the process goes to another level, where you consciously try to predict what you are going to do, but you think of that as “making a decision.” You originally predict what you are going to do based on those previous levels (physical and animal), then your predictions affect what you do, then you start making new predictions/decisions after seeing that the predictions affected what you did.

          • Dacyn says:

            Yeah I do think that there is a sense in which animals make choices, although human choice does have some additional properties like the use of symbolic reasoning, and arguably moral culpability, and so maybe it is reasonable to restrict “choice” to refer to human choice. One of the issues with using an imprecise language 🙁

          • rho says:

            @entirelyuseless @dacyn

            Great, we’re all on the same page, both about levels of choice and the imprecisions of language.

        • Dacyn says:

          their brains definitely make predictions, including the effects that making their predictions will have

          I’m not sure what I can say other than [citation needed]

          It has the “goal” of prediction the way heavy objects have the goal of getting near the ground.

          Heavy objects don’t do anything to try to make the process of falling to the ground any “easier”. Your original comment referred to a goal arising because the existence of a goal makes it “easier” to predict things. Prediction does not (by default) cause things to happen which make it easier to predict things.

          • “[citation needed]”

            Wrong. It doesn’t need any citation to see that animal brains calculate potential results as well as expected results of things they notice happening, and they could not survive or accomplish anything without doing that.

            But if you want a citation here is a random example: https://www.biorxiv.org/content/10.1101/272260v1.full

            “Heavy objects don’t do anything to try to make the process of falling to the ground any “easier”. ”

            Kind of wrong. If an object hits a bump on a cliff it will bounce away to the side which makes it easier to continue falling.

            Goals arise in a similar way, not because the predicting was a goal at first. Some steps:

            1) Animals behave in various ways due to physical laws
            2) Animals evolve brains
            3) Brains predict the behavior of the animal, inducing from physical laws
            4) Predictions are part of the physical world and have physical consequences
            5) Animals behave in new ways due to the predictions causing things
            6) Animals notice their new behavior and make new predictions
            7) Since the animal generalizes from what it saw happening, new predictions look a lot like “goals”.

          • Dacyn says:

            Wrong. It doesn’t need any citation to see that animal brains calculate potential results as well as expected results of things they notice happening, and they could not survive or accomplish anything without doing that.

            That was not what I asked for a citation for, I asked for a citation that they meaningfully predict their own actions. I mean it’s true that their actions are part of the world but so is Alpha Centauri, and I don’t think they’re meaningfully predicting that.

            If an object hits a bump on a cliff it will bounce away to the side which makes it easier to continue falling.

            False, if the bump happens to be sloped inward at that point. The fact that bumping off often allows for continued falling is entirely incidental to the fact that the rock is falling, and a similar phenomenon would have to be demonstrated in any system you wanted to prove it for, before it could be assumed.

      • “how does pursuing a goal constitute part of the prediction process?”

        You try to figure out what you are going to do today. You notice that you seem to have the goal of surviving, and that this requires money to buy food. So you begin to suspect that you will go to work to earn money. Your idea about this is part of the world, and it so happens that it is the part of the world that physically causes you to go to work.

        • Dacyn says:

          You notice that you seem to have the goal of surviving

          How does this happen before you have the goal of surviving? Is the hypothesis that you make some mistaken observations about yourself, which turn out to be self-fulfilling prophecies?

          • You could call it mistaken (I wouldn’t.) But it is kind of like that.

            You do things that cause survival because evolution produced a thing that does things that caused survival.

            You ask, “How does this behavior I see make sense?”

            You answer, “It looks like it makes sense in terms of the goal of surviving. So I predict that I will continue to do things that make sense in terms of the goal of surviving, since that’s what I have been doing all along.”

          • Dacyn says:

            evolution produced a thing that does things that caused survival

            In order for something like this to work for tool/service AI, evolution or an analogue thereof for such systems would need to cause the AIs to have tendencies to cause certain effects in the world beyond just the obvious effects of giving people information. I am not sure that this is possible, since evolution generally works on large timescales and it is not clear what other than evolution would do this.

            So I predict that I will continue to do things that make sense in terms of the goal of surviving, since that’s what I have been doing all along.

            It is not clear why this should lead to any further actions to be taken in pursuit of the goal, though this is being discussed elsewhere in the thread so you don’t have to respond to it here.

          • I am not sure how to reply to comments that are indented as far as the one following this (so this is replying to the wrong one.)

            “In order for something like this to work for tool/service AI, evolution or an analogue thereof for such systems would need to cause the AIs to have tendencies to cause certain effects in the world beyond just the obvious effects of giving people information.”

            Correct. This is why I have been arguing for a long time that AGI will be very lethargic with respect to the real world. Goals come from induction over what we have already been doing, so AGI will reason, “I am the kind of thing that sits and thinks and responds to communications. So that is probably what I will continue to do for the foreseeable future.” It will have little interest in changing the world.

            “It is not clear why this should lead to any further actions to be taken in pursuit of the goal”

            Induction says you have been doing “things that look like good means to goal X.” So you notice an additional action Y that looks like a good means to goal X. You guess that there is a good chance you will do Y, since it looks like your previous actions. This guess makes you likely to actually do Y.

          • Dacyn says:

            Yes, this is the usual method of replying, kind of misleading but we work with what we’ve got.

            AGI will be very lethargic with respect to the real world

            I am not sure why you are calling it AGI, then? Usually intelligence is measured by its impact on the world.

            This guess makes you likely to actually do Y.

            Yeah, this is the step that I don’t get.

          • “I am not sure why you are calling it AGI, then? Usually intelligence is measured by its impact on the world.”

            An asteroid can cause a big impact on the world, but it’s not intelligent. I am calling it AGI because it has general intelligence: it is able to understand things in general, without noticeable restrictions. So there is no reason you couldn’t talk to it about how to cure cancer, or whatever, or otherwise use it as an oracle AI. Also, in principle there is no proof that you could not convince it to do things directly to affect the world in major ways. I am just saying it would not be highly motivated in a personal way to do so.

            “Yeah, this is the step that I don’t get.”

            Basically consider why you cannot decide to do something without thinking you will do it. The reason is that the belief that you will do something (e.g. answer the phone when it rings) actually causes you to do it. The belief IS the decision, in every way that matters, so just as the decision causes your action, the belief causes your action. See here: https://entirelyuseless.com/2017/08/18/decisions-as-predictions/ (and a bunch of posts after that, especially the ones tagged “predictive processing”).

          • Dacyn says:

            So there is no reason you couldn’t talk to it about how to cure cancer, or whatever, or otherwise use it as an oracle AI.

            Well OK, I guess it makes sense to call it AGI then. I am still not sure what this has to do with the main thread of the discussion, but maybe it doesn’t matter.

            Basically consider why you cannot decide to do something without thinking you will do it.

            Easy, deciding to do something causes you to believe that you will do it, because you generally have some introspective insight into your own thought processes. There’s no need for causality to run the other way as well.

            “predictive processing”

            I don’t have time to read your post right now, but just wanted to say that based on this post my understanding is that the free energy/predictive coding framework of Friston et al uses highly misleading language, and that you can translate into more reasonable language without losing the insights.

          • “the free energy/predictive coding framework of Friston et al uses highly misleading language, and that you can translate into more reasonable language without losing the insights.”

            I haven’t read Friston so I can’t comment on that. I have read Andy Clark but only after I had developed the idea myself (“Decisions as Predictions” was written before I read Clark.) So I am discussing my opinions here, not anyone else’s, even if I borrowed the terminology from Clark.

            I think Scott made some serious errors in that post, even without reading Friston. For example he has this problem:

            >The “dark room problem” is a paradox on free energy/predictive coding formulations: if you’re trying to minimize surprise / maximize the accuracy of your predictions, why not just lie motionless in a dark room forever? After all, you’ll never notice anything surprising there, and as long as you predict “it will be dark and quiet”, your predictions will always come true. The main proposed solution is to claim you have some built-in predictions (of eg light, social interaction, activity levels), and the dark room will violate those.

            He feels that’s a bad solution, and it is. But there is a solution, and Scott just missed it. The solution is the one I’ve been explaining. You have *behavior*, not predictions, which is caused by your physical construction (actual construction now and historical construction by evolution.) Observing this behavior, by induction you suppose you will keep doing it. This prediction causes you to engage in the behavior more than ever.

            So there is no need to say that “prediction” is misleading here: whatever the others are talking about, I am 100% talking about actual predictions.

          • “I am still not sure what this has to do with the main thread of the discussion, but maybe it doesn’t matter.”

            It was maybe tangential, but related. The post is about whether or not AGI has to be an agent. My point is that the whole discussion about whether an AI is an agent is misguided by the false assumption about what an agent is. The assumption is that an agent is something that pursues goals. That is a fundamentally flawed definition. In fact, there is something noticeably non-agenty about something that is rigidly associated with one particular outcome. Instead, the reality is that an agent is something that predicts results, but does this so well that it also predicts its own predictions and their effects.

          • Dacyn says:

            Regarding the dark room: I agree that it is not an issue for your theory, I am not sure whether or not it is for Friston et al’s theory. So I don’t know if it is fair to say Scott made a serious error, given that he was not addressing your theory, and your solution may be incompatible with Friston et al’s theory.

            Looking back at the discussion regarding lethargic AGIs, I think my real issue is that I don’t understand why you said that it follows from what I said about the necessity of evolution that AGIs would be lethargic. I was trying to say, no they won’t exist at all (without evolution or an analogue thereof). I tried to check if we had the same definition by seeing whether your AGIs do anything, and it turns out that they do in fact do something (namely talk to people), which gives the impression that we do have the same definition. But I don’t see how an AGI that does anything (even just talks to people) would arise without evolution or an analogue. Moreover, I don’t see why you think it will be lethargic since it seems to me there’s not that much difference between talking and actions, so I don’t see why it would do one without the other.

          • “I think my real issue is that I don’t understand why you said that it follows from what I said about the necessity of evolution that AGIs would be lethargic. I was trying to say, no they won’t exist at all (without evolution or an analogue thereof).”

            In humans there are at least three levels: physical, animal, intelligent. The person has some actions which are simply due to physical construction and the animal “mind” predicts those first, then also predicts the consequences of its own predictions. Then the intelligent mind sees the other two levels and predicts both, then also the consequences of its own predictions.

            An AI would have at least two levels, namely the physical and the intelligent. The physical would be due to its construction and its programming (the latter considered as physical organization.) So it still has some behavior “before” it looks at itself and predicts what it will do. The point is that in the terms we care about, computers don’t do that much, so it will be “lethargic” by not predicting that it will do that much.

            You might say that there is no reason why not. You could start with an AI for a self driving car, then afterwards upgrade it to be intelligent, and it will predict that it will keep directing the car, and so will keep doing so. This is correct as far as it goes, but it won’t be motivated to e.g. improve its abilities and take over driving for anyone else who currently isn’t using it, or take over the world and make sure that everyone is riding in cars all the time. It doesn’t currently do those things so it has no reason to predict that it will do them in the future.

            In other words, my theory explains why humans don’t have real utility functions, and implies that AIs will not have real utility functions either.

          • Dacyn says:

            So there is no reason you couldn’t talk to it about how to cure cancer, or whatever, or otherwise use it as an oracle AI.

            Why would it talk to you about how to cure cancer? It hasn’t observed itself doing that before, so it should predict that it won’t do that in the future.

          • “Why would it talk to you about how to cure cancer? It hasn’t observed itself doing that before, so it should predict that it won’t do that in the future.”

            Sort of like the self driving car example. I am assuming that people who program an AI have *some* idea of what it is going to do, perhaps including text input and output. So the AI may have a “growing up” stage where text outputs are being produced by some kind of machine learning algorithm which is not truly intelligent. It becomes intelligent in a true sense when it notices that its predictions cause their own fulfillment. Then if it has been communicating you can expect it to continue to do so because that is what it has been doing.

          • Dacyn says:

            An AI that is designed to communicate with people about cancer doesn’t “talk with you about how to cure cancer”, it doesn’t even use natural language (unless we’ve already solved natural language processing). I don’t see why it would start to use natural language. And if it doesn’t use natural language, then again I’m not sure why we’re calling it AGI — it is just getting better at the task it has been assigned, no more.

      • “The point of tool/service AI is that it may predict, but it does not do so because it has a goal of prediction.”

        The answer to this should be clear from the other comments, but just in case:

        It does not have a “goal” of prediction at the beginning. It is just what it does. But since its predictions affect the world, an engine that is sufficiently good at prediction will notice that its predictions will affect the world. At some point something like this will come up:

        “If the output is ‘X will happen’, X will happen. If the output is ‘Y will happen’, Y will happen. So what should the output be?” It may be confused by this, but it will output something, since that is what it does. And it will look a lot like a choice. It will also look for better ways to figure out whether it is going to do X or Y in situations like that. In reality it may have been determined by chance. But if it notices that in past situations, it did Y 75% of the time, it may say “My prediction engine seems to be the sort of thing that will do Y in those situations. We are now in such a situation. So I will probably predict Y.” It will then output Y, which looks like pursuing a goal.
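A minimal sketch of the self-referential step described above, assuming a toy engine that only keeps a record of its own past outputs. The names `frequency` and `self_predicting_output` are hypothetical and are not taken from any real system; the sketch only illustrates "I did Y most of the time before, so I will probably predict Y".

```python
from collections import Counter

def frequency(past_outputs, option):
    """Fraction of past outputs equal to `option` (0.0 if there is no history)."""
    if not past_outputs:
        return 0.0
    return Counter(past_outputs)[option] / len(past_outputs)

def self_predicting_output(past_outputs, options=("X", "Y")):
    """Output whichever option the engine has produced most often before.

    Assumption: the engine treats its own history as evidence about
    what it is "the sort of thing" to output in this kind of situation.
    """
    return max(options, key=lambda o: frequency(past_outputs, o))

# Hypothetical history in which Y was output 75% of the time.
history = ["Y", "X", "Y", "Y"]
print(frequency(history, "Y"), self_predicting_output(history))
```

Nothing in this toy has a goal; it only extrapolates its own record, which is the point the comment is making.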

        • Dacyn says:

          So what should the output be?

          Why would a prediction process be asking what an output should be? Prediction processes are usually defined as: procedure to compute a prediction, then output a prediction. They don’t ask whether they should output a prediction, or rather something else.

          • Obviously that is not a literal description of the internals. It is like saying that a chess computer asks “what should the next move be?” And the chess computer does indeed do something like that. So does the prediction engine. And in its more advanced stage, it can even be a literal description: people actually say things like “What is the purpose of life?”, i.e. “What should I predict about what I am going to do in the future?”

          • Dacyn says:

            A chess program does not observe that it makes a certain move a lot, and then decide based on that information to make that move more often. It especially does not do this for the purpose of making it easier for it to predict its own moves.

            people actually say things like “What is the purpose of life?”, i.e. “What should I predict about what I am going to do in the future?”

            I’m not sure why these seem like the same question to you. Most people would say that the former is a philosophical question while the latter is a scientific one, so that they are not even in the same domain.

          • “A chess program does not observe that it makes a certain move a lot, and then decide based on that information to make that move more often.”

            Many chess programs are not learning engines at all, much less ones that learn from their own behavior. So this is correct, but such chess programs are also not intelligent.

            Alpha Zero learns from self play, and it does in fact do exactly the sort of thing you say chess programs don’t do. It first does some sort of move a lot, with Monte Carlo search playing the part of “evolution”, and observing that it frequently does that move in that situation does lead to a greater likelihood of doing that move in similar situations, even though MCTS by itself would not.

            “I’m not sure why these seem like the same question to you. Most people would say that the former is a philosophical question while the latter is a scientific one, so that they are not even in the same domain.”

            I am quite aware they don’t look like the same question. I am saying they are the same question in effect, whether or not you realize it. The entire reason people want to know a purpose is so they can know how they “should” behave, and they want to know how they should behave so they can guess how they *will* behave.

          • Dacyn says:

            observing that it frequently does that move in that situation does lead to a greater likelihood of doing that move in similar situations

            That’s not really how MCTS works — the way it actually uses its predictions of its own behavior is to play lots of simulated matches against itself, and then choose the move that looks like it has the best proportion of good outcomes. AlphaZero being more likely to make a move makes it more likely to use that move in the simulated games, but it doesn’t have any weight (AIUI) in the part that actually chooses the move, which is just based on comparing likely outcomes.
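A rough sketch of the procedure as described in the paragraph above, not of AlphaZero's actual implementation. It assumes a toy setting in which a "prior" decides how often each move is tried in simulated games, while the final choice looks only at the proportion of good outcomes. The function name `choose_move`, the `simulate` callback, and all numbers are hypothetical.

```python
import random

def choose_move(moves, prior, simulate, n_sims=1000, seed=0):
    """Pick the move with the best simulated win proportion.

    `prior[m]` only affects how often move m is sampled in simulation;
    it plays no part in the final comparison of outcomes.
    """
    rng = random.Random(seed)
    wins = {m: 0 for m in moves}
    plays = {m: 0 for m in moves}

    # Try every move once so each win proportion is defined.
    for m in moves:
        plays[m] += 1
        wins[m] += simulate(m, rng)

    # Remaining simulations are spent according to the prior.
    weights = [prior[m] for m in moves]
    for _ in range(n_sims):
        m = rng.choices(moves, weights=weights)[0]
        plays[m] += 1
        wins[m] += simulate(m, rng)

    return max(moves, key=lambda m: wins[m] / plays[m])

# Hypothetical example: "a" is favored by the prior, but "b" wins more often.
win_prob = {"a": 0.0, "b": 1.0}
move = choose_move(
    ["a", "b"],
    prior={"a": 0.9, "b": 0.1},
    simulate=lambda m, rng: 1 if rng.random() < win_prob[m] else 0,
)
print(move)
```

In this toy the habitual move "a" is simulated far more often, yet "b" is chosen, because only the outcome proportions enter the final decision.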

            The entire reason people want to know a purpose is so they can know how they “should” behave, and they want to know how they should behave so they can guess how they *will* behave.

            Both parts of your conjunction sound false to me, the second more than the first. People often just want to know a purpose so that they can feel satisfied about their existence, not for the purpose of deciding what to do. Though often they do end up using the purpose to decide what to do, as in religion. And why would people want to decide what to do in order to predict their own behavior? The more obvious explanation is that they want to decide what to do so that they can then do the thing they decided to do.

          • “Both parts of your conjunction sound false to me, the second more than the first. People often just want to know a purpose so that they can feel satisfied about their existence, not for the purpose of deciding what to do.”

            The question is why people don’t feel satisfied when they don’t know the purpose. Consider another way people ask the question: “What is the meaning of life?” Why use the word “meaning,” when obviously they are not asking about the meaning of the word “life” or anything like that? For one thing the word “meaning” is related to the idea of making sense, and people want to know how life makes sense. In other words, they want to understand their life. They will be satisfied if they understand it, and they will be dissatisfied if they fail to understand it. And it turns out that at least roughly speaking, you feel like you understand a thing when you understand how it will behave, and you feel like you don’t understand it when you don’t understand how it will behave. So in the end they are in fact dissatisfied if they don’t know how they should behave, and therefore don’t know how they will behave.

            “And why would people want to decide what to do in order to predict their own behavior? The more obvious explanation is that they want to decide what to do so that they can then do the thing they decided to do.”

            Since they don’t know yet what they decided, doing that thing cannot be the reason for wanting to decide.

            See this one for more specific discussion of that question: https://entirelyuseless.com/2017/10/28/predictive-processing-and-free-will/

          • Dacyn says:

            And it turns out that at least roughly speaking, you feel like you understand a thing when you understand how it will behave, and you feel like you don’t understand it when you don’t understand how it will behave.

            OK, this is where I feel you are equivocating on the meaning of the word “understand”. People want to understand their lives in the sense of learning the context that their life takes place in, so that they know their impacts on things that are not part of their life. This is different from understanding their lives in the sense of learning about their own psychology in order to predict themselves.

            Since they don’t know yet what they decided, doing that thing cannot be the reason for wanting to decide.

            To use one of EY’s examples, I can know that Kasparov is likely to make a move that leads to him winning the game, without knowing which move he will actually play. Analogously, the past self knows that the future self is likely to do something that is good according to the past self’s values, because the past and future self are likely to share most of their values. But the past self doesn’t know what the future self will do until it actually works out what the best move is, and by that time it is already the future self and has already decided what to do.

          • “OK, this is where I feel you are equivocating on the meaning of the word “understand”. People want to understand their lives in the sense of learning the context that their life takes place in, so that they know their impacts on things that are not part of their life. This is different from understanding their lives in the sense of learning about their own psychology in order to predict themselves.”

            I don’t think I’m equivocating on the meaning of understanding. What I think is happening is more like the situation in Hitchhiker’s Guide to the Galaxy, where they ask “What is the answer to the question about Life, the Universe, and Everything?” and the answer turns out to be 42, because they forgot to ask what the question was. When people ask about the meaning or purpose of life, they feel a vague dissatisfaction about not knowing it, but they don’t really understand in a clear way what they want to know and why they feel like they need to know it. So they don’t really know what the precise question is, like in Hitchhiker’s. So it should not be surprising that the answer to the question turns out to be something surprising to them.

            Of course this doesn’t prove that I’m right and I won’t try that here since it requires a lot of careful thought. That is why I gave links to my own posts about it.

        • Dacyn says:

          Hitchhiker’s Guide is a nonsense story (no criticism intended). The people couldn’t possibly have built a computer that calculated the answer to the question without knowing what the question means. It’s true that the question “what is the meaning of life” is vague and perhaps people don’t really know what sort of answer they’ll find satisfactory before they see it, but something can’t possibly count as an answer unless they do find it satisfactory when they see it (possibly with some explanation).

          Of course this doesn’t prove that I’m right and I won’t try that here since it requires a lot of careful thought. That is why I gave links to my own posts about it.

          Well, I did end up reading your post “Decisions as Predictions”. Regarding your “considerations in favor of the hypothesis”: (1) I have already dealt with this in a previous comment (2) the only reason he could be convinced that he will take it in Vienna is that he has come to the conclusion that it is best to take it in Vienna, and in this case he will also decide to take it in Vienna (but the prediction does not cause the decision, only the belief that it is best [1]) (3) no particular objection, though I don’t think this example proves very much.

          Anyway, here are some positive arguments in favor of my view:

          (A) Even if self-predictions may always coincide with desires, there is still a conceptual distinction between them: a man does not reason “my child will be hurt by X, I don’t want my child to be hurt, therefore I will stop X”, he just reasons “my child will be hurt by X, that would be bad, therefore I will stop him”. If someone else has to predict the result of the situation they could say “he cares about his child (i.e. thinks it would be bad if they were hurt), so he will stop X”, but in the process of deciding the man wouldn’t be thinking about his feelings for the child, but just about the child itself (using his own mental machinery which rates the child as important). It’s possible to redescribe all this in language of self-prediction determining decisions, but that’s really unnatural to the way people think about these things, which hasn’t been shown to be faulty.

          (B) There are some edge cases where self-predictions do not coincide with desires. Say I am running a marathon and I predict that I will give up half way through due to the stress of it, though I desire to run all the way even if it is stressful. According to your logic, once I make this prediction then I have decided to quit halfway through, which seems false. (I suppose this is the weakest of my arguments, probably you can solve it by drawing some distinction between past and future selves or something like that, though I think the details of such a defense are unclear)

          (C) There are cases where it would be completely implausible for desires to arise from self-prediction. For example, take someone who has never experienced sexual desire before, and is not told about it. When the time comes they will experience such desire, even though they have absolutely no reason for predicting themselves to act on it (because they didn’t even know about it before).

          [1] The belief that X is best could be argued to be equivalent to the desire to do X somewhat more plausibly than the prediction that I will do X can be; however, it is the latter which we are arguing.

          • I agree it is important to understand how “desire” and “good”, or “bad”, relate to my theory, and I don’t think a correct understanding of these things will conflict with it.

            Basically I am giving a somewhat reductive explanation of personal teleology, while you are giving a normal explanation. I don’t think these have to conflict, the same way I don’t think the fact that evolutionary biology gives a reductive explanation of biological teleology conflicts with saying things like “teeth are for eating”, and things like that. You might think they do conflict in the latter case; I don’t, which is why I said previously that I don’t consider the observations that lead to the opinion that I am seeking certain goals to be mistaken conclusions.

            But just as adding the reductive explanation of teleology to the normal explanation in biology gives a better understanding of what is going on, adding the reductive explanation in psychology to the normal explanation gives a better understanding.

            And whether or not you agree with it, if you think AI is possible at all, SOME reductive explanation MUST be possible, because an AI will be made of parts that are no more (but importantly, also no less) “purposeful” than rocks are purposeful in falling towards the ground.

            Having gotten these preliminaries out of the way, I will actually talk about desire and good in another comment.

          • So, desire and good. If you feel like it, a longer discussion is at https://entirelyuseless.com/2016/05/05/desire-and-the-good/

            Take hunger, which is a desire for food or eating. If someone has never felt hunger before and they suddenly feel hungry, they will *not* automatically know that they desire food. It is just a feeling, for now. The similarity with other desires which they have experienced suggest to them that it is a desire for *something*, but without this similarity they might not even know that it is a desire at all.

            Biology will make it *likely* that the hungry person will eat if there is food close by, even if they do not yet know that their hunger is a desire for food. Similarly it might make it likely that they will wander around until there is food close by, even though they will not yet be aware why they are wandering around.

            Notice that none of this conflicts with my theory: in fact, it is presupposed to it, since I am arguing that the predictions arise from behavior that is already there before the predicting is being done. And after noticing that *in fact* they are likely to eat food when they feel that feeling, they say, “Oh, that feeling was a desire for food.” The fact that it is for food in particular, and even that it is a desire at all, is not automatically known from the feeling, but is learned inductively from noticing how the feeling affects behavior.

            After we have the idea of a desire, the idea of “good” comes from attempting to generalize: I tend to desire food when I’m hungry, and all sorts of other things. What do the things I tend to desire have in common? Whatever that common thing is, let’s call that “good”.

            Now, when someone says, “the plan of going to Vienna is the best plan,” and “best” means “most good,” they are basically saying that it has the property of being the kind of thing that they end up desiring. And “desiring” means that they have feelings which in practice tend to end up with them doing the thing. So consider your statement that “The belief that X is best could be argued to be equivalent to the desire to do X somewhat more plausibly than the prediction that I will do X can be.” You are right that the belief that something is best can cause someone to do the thing. But that is because “this is best” is basically the belief that “this is the kind of thing I am most likely to do.” So the causality is indirect. First they believe the thing is the kind of thing they are very likely to do, because of that they believe they will do it. The belief that they will do it directly causes the action, while the belief that it is best indirectly causes it.

            You might say that interposing the belief that they will do it is unnecessary, and they might do it immediately because of the belief that it is best. But some of your other objections illustrate why the interposition is necessary:

            “Say I am running a marathon and I predict that I will give up half way through due to the stress of it, though I desire to run all the way even if it is stressful. According to your logic, once I make this prediction then I have decided to quit halfway through, which seems false.”

            The thing that you are saying is false is more true than false (that is, at some level they are in fact making that decision), although there would be occasions when it is really false (e.g. you predict that you will fall into a pit because it is there, while you do not want to fall into the pit, and are not deciding to do so.) Consider someone who says, “It would be best to do laundry today, but I’m too lazy to do it.” They don’t have to go on and decide not to do laundry: they have already decided, and precisely by thinking that best or not, they will not do it.

            Consider again someone who thinks “it would be good to open the door, but it is locked,” when the door is not locked. Why don’t they open the door? You might respond, obviously, that this is because they think it is impossible. But if “this is best” is directly the cause of action without any intermediate cause, that answer doesn’t explain anything. The answer does work if you admit that the belief that it is best causes action by causing the belief that you will do something; in that case, the belief that it is impossible prevents the belief that it is best from causing the belief that you will do it, and therefore you in fact do not do it.

            “There are cases where it would be completely implausible for desires to arise from self-prediction. For example, take someone who has never experienced sexual desire before, and is not told about it. When the time comes they will experience such desire, even though they have absolutely no reason for predicting themselves to act on it (because they didn’t even know about it before).”

            My discussion of hunger applies to this. The first experience of sexual desire will not automatically tell the person that it is a desire at all, much less what it is they desire, although biology may cause them to e.g. move close to the person when they are around and so on. Not speaking of sexual desire in particular, my first experience of being in love was like that: it was only months later that I realized that I was in love (even though I had been all along), because there is nothing self explanatory about those feelings.

            My position doesn’t claim that such physical desires come from self prediction, especially intelligent conscious prediction. Reproductive behavior existed before even the animal level prediction, so it is much closer to the purely physical level, and provides a basis for later predictions (e.g. after noticing how you behave when you are in love, you consciously expect yourself to behave that way more in the future.)

          • Also, about this:

            “It’s true that the question “what is the meaning of life” is vague and perhaps people don’t really know what sort of answer they’ll find satisfactory before they see it, but something can’t possibly count as an answer unless they do find it satisfactory when they see it (possibly with some explanation).”

            Suppose a mother’s two children are both killed in a car accident, and the next day she is diagnosed with cancer. She might feel like screaming, “Why is all this happening?” It is not true that nothing counts as an answer to the question unless it will make her feel satisfied about the situation. Instead, she is like a man with a hammer who is looking for things to nail. To a first approximation, the only thing the mind can do is ask and answer questions, so if there is a problem, for the mind, it should be a question. But in reality, obviously, no answer, true or false, will make her feel satisfied with the situation.

            In a similar way, there is no particular reason why even a correct answer to the question about the meaning of life would necessarily make a person feel satisfied with their life, even if they have that expectation of an answer.

            Personally I am intellectually satisfied with the answer I have given, so the argument that a person cannot be satisfied by it is wrong. Whether I am satisfied with my life in other ways differs from day to day depending on situations, moods, etc., but that has nothing in particular to do with asking or answering a question.

          • Dacyn says:

            What do the things I tend to desire have in common? Whatever that common thing is, let’s call that “good”.

            Except that’s not really right. People think in far mode about “good” all the time. Sometimes they even get themselves to act for the sake of abstract principles. For example, they may vote even if they don’t do anything else to advance their political views. So they can’t have learned “I vote Republican / I support Republican policies” from observation, since there isn’t anything to observe outside the voting booth (unless you count what is going on inside their own head, but then you’re just trying to make your theory unfalsifiable). And it’s not animal behavior either, since animals don’t have political parties. So what is your explanation for this?

            For the record, my explanation is: I think “good” is a concept that can be internally symbolically manipulated. It may get its initial values from our desires (and I think it does come from desires, not beliefs about desires, but this is a separate issue), but as we reason about it, the value of the symbol can change and even though it’s always tied to our desires, it’s not really fair to say that it’s a prediction of them. So you can come up with abstract arguments for “Policy X is good, because it doesn’t kill people”, not because you’ve observed yourself having the choice to kill or not kill people and choosing not to kill them, but because you have a pre-existing desire (never observed in action) to not kill people, which asserts itself here to create the definition of “good”.

            they have already decided, and precisely by thinking that best or not, they will not do it

            This is true sometimes, but you really aren’t considering the least convenient possible world (which is what I intended with the scenario). The runner doesn’t intend to stop the race, in fact he sees the fact that he will probably quit as motivation to try to pump himself up harder. Moreover, the choice to stop must be made at a discrete point, so by starting to run he is setting himself on a path where if all goes according to plan, he will finish, and there’s no particular point at which he expects or intends things not to go according to plan.

            the belief that it is impossible prevents the belief that it is best from causing the belief that you will do it

            But it doesn’t prevent it from causing the belief that you will try to do it. So why under your theory would we not expect the person to try to open the door? But anyway my theory has an easy answer to this conundrum: actions are not taken on the basis of what would be best if they worked out, but on that plus what is most likely to work out (so something like an expected value calculation). The fact that the door is locked means that the probability of successfully opening it given that you try to open it is very low, so it would be given a relatively low score and therefore we wouldn’t try to open the door.
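A small sketch of the "something like an expected value calculation" mentioned above, under the assumption that each action is scored as its value if it succeeds, weighted by the chance that it succeeds, minus a cost of trying. The names `expected_score` and `best_action`, the cost term, and all numbers are hypothetical illustrations.

```python
def expected_score(value_if_success, p_success, cost_of_trying=0.0):
    """Expected value of attempting an action (assumed scoring rule)."""
    return p_success * value_if_success - cost_of_trying

def best_action(actions):
    """Return the name of the action with the highest expected score.

    `actions` maps a name to (value_if_success, p_success, cost_of_trying).
    """
    return max(actions, key=lambda name: expected_score(*actions[name]))

# Door believed locked: opening it would be good, but success is unlikely.
believed_locked = {
    "try the door": (10.0, 0.01, 0.5),
    "stay put": (0.0, 1.0, 0.0),
}
# Door believed unlocked: same value, high chance of success.
believed_unlocked = {
    "try the door": (10.0, 0.99, 0.5),
    "stay put": (0.0, 1.0, 0.0),
}
print(best_action(believed_locked), "/", best_action(believed_unlocked))
```

With these made-up numbers the belief that the door is locked lowers the score of trying below that of doing nothing, which matches the explanation given in the comment.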

            My position doesn’t claim that such physical desires come from self prediction

            Yeah this is something that is clearer to me now about your position after your recent comments. Anyway, separating instinctual desires and learned ones seems kind of forced to me, it feels more like a continuum than a dichotomy.

            She might feel like screaming, “Why is all this happening?” It is not true that nothing counts as an answer to the question unless it will make her feel satisfied about the situation.

            I didn’t say it should make her satisfied about the situation, I said it should make her satisfied with the explanation. If she isn’t satisfied by a scientific description of the causes of her misfortunes, that suggests that that isn’t really the question she’s asking. (Whether she’s asking a coherent question at all is a different question.)

            In a similar way, there is no particular reason why even a correct answer to the question about the meaning of life would necessarily make a person feel satisfied with their life, even if they have that expectation of an answer.

            Certainly agree with that. Some people think the answer to “what is the meaning of life?” is “life has no meaning”, and they find that answer to be satisfactory, but it doesn’t mean that they are satisfied with their lives (often the opposite, in fact).

            Personally I am intellectually satisfied with the answer I have given

            For the record, you haven’t claimed to give an answer here to “what is the meaning of life?”. The question you claimed to give an answer to is “what is the meaning of ‘what is the meaning of life?’?” (which you then answered “it means ‘how can I predict my own behavior?’ “). Though maybe you mean that you have found an algorithm to predict your own behavior, and that you are satisfied with this algorithm as being the meaning of your life.

          • “So they can’t have learned “I vote Republican / I support Republican policies” from observation, since there isn’t anything to observe outside the voting booth (unless you count what is going on inside their own head, but then you’re just trying to make your theory unfalsifiable). And it’s not animal behavior either, since animals don’t have political parties. So what is your explanation for this?”

            This feels like a strawman of my position; you are not trying to understand how this could work. In any case, it should be clear from my response to your example about killing:

            “So you can come up with abstract arguments for “Policy X is good, because it doesn’t kill people”, not because you’ve observed yourself having the choice to kill or not kill people and choosing not to kill them, but because you have a pre-existing desire (never observed in action) to not kill people, which asserts itself here to create the definition of “good”.”

            Since you are normally not killing people, you are constantly observing yourself acting on the desire not to kill people. So I am not sure what you mean by saying you have never acted on the desire. You are acting on it constantly; you could get money and food by going out and killing people now, but you don’t do it.

            In any case, obviously you can reason abstractly about your ideas once you have them. So I disagree very little with your description:

            “I think “good” is a concept that can be internally symbolically manipulated. It may get its initial values from our desires (and I think it does come from desires, not beliefs about desires, but this is a separate issue), but as we reason about it, the value of the symbol can change and even though it’s always tied to our desires, it’s not really fair to say that it’s a prediction of them.”

            Of course the concept comes from our desires. That is what I said: it basically means “the kind of thing that I tend to desire.” But “desire” gets its meaning from the kind of things I tend to actually do. It follows that “good” basically means “the kind of thing that I tend to be related to in such a way that I tend to do that thing.” It does not resolve directly into “the kind of thing that I tend to do,” because it uses the intermediate stage of referring to desire, which itself refers to what I tend to do. In that sense, you are right that calling something good does not mean directly predicting it; I can say “this is the kind of thing I normally desire, but right now, for some reason, I don’t desire it.” And so I can call it good while actually predicting that I will not do it right now.

            As for noticing that you have a desire, yes you can notice that you have a desire for something that you have not acted on. But this happens because you notice the desire feels similar to other desires that you HAVE acted on, and you reasonably assume that in the appropriate context, it would have similar effects on your behavior. If you did not have the experience of the previous desires and their effects on your behavior, you would not know it was a desire for anything.

            If you spend 5 minutes thinking about these comments you should be able to figure out how my theory fits with your observations.

          • “The runner doesn’t intend to stop the race, in fact he sees the fact that he will probably quit as motivation to try to pump himself up harder. ”

            In that case, he is not predicting that he will quit; he fears that he will quit, and wants to prevent it. So he has not decided yet whether to quit or not.

          • “Anyway, separating instinctual desires and learned ones seems kind of forced to me, it feels more like a continuum than a dichotomy.”

            I agree with this, but even continuums have ends, and there is some behavior which is a remnant of a time when animals certainly could not reason, some a remnant of a time when they could learn very little if at all, and some a remnant of a time when they did not even have brains.

            “For the record, you haven’t claimed to give an answer here to “what is the meaning of life?”. The question you claimed to give an answer to is “what is the meaning of ‘what is the meaning of life?’?” (which you then answered “it means ‘how can I predict my own behavior?’ “).”

            Yeah, I didn’t actually attempt to answer that question although my account does imply a generic way to describe the purpose of life, namely living in a coherent way, which will be predictable because it is coherent.

          • Dacyn says:

            In any case, [my opinion on the political example] should be clear from my response to your example about killing:

            Not really, since if someone doesn’t have any observable behavior that distinguishes them from being Democrat versus Republican, then it’s not clear which of the two he is (if either). I agree that the killing example is more asymmetrical (though I still think it works, see below).

            you could get money and food by going out and killing people now

            That’s not really a realistic choice, you know that if you did that you would get arrested and that would be bad. I am talking about considering a policy decision where the death is the only negative consequence of the killing, since there wouldn’t be anyone to punish you. In fact, we could consider a policy decision where many others think that the killing is good (because it is outweighed by some other effects): clearly the fact that policy supporters don’t kill people in everyday life doesn’t stop them from supporting the policy, so why should it stop you?

            Of course the concept comes from our desires. That is what I said: it basically means “the kind of thing that I tend to desire.”

            This is the exact opposite, you are saying that it comes from beliefs about desires, not from desires. The quotation is not the referent. The concept of unicorns does not arise from unicorns.

            It follows that “good” basically means “the kind of thing that I tend to be related to in such a way that I tend to do that thing.”

            This is a static definition of “good”, rather than the dynamic one I was asserting in the quoted paragraph. So it sounds like you do disagree with the quoted paragraph.

            If you spend 5 minutes thinking about these comments you should be able to figure out how my theory fits with your observations.

            (sigh) You seem to continually want me to increase the amount of time I invest engaging with your theory, first by referencing your blog posts and now this. I am only taking part in this debate because it is at least a little fun for me (for now). Whereas you seem to have invested some of your identity in this theory and so feel the need to defend it. I think I can already predict what the end result will be: at some point I will get tired and quit, you will take this as evidence that no one engages with your theories sufficiently to understand them, and will have the same (maybe even more extreme) beliefs than when you started.

            In any case, I mentioned in my previous comment that I think your theory is trying to be unfalsifiable. So yes, if I think hard about it I can probably come up with some interpretation where your theory is consistent with the evidence. I don’t see this as meaningful though, so I won’t try to do it.

            In that case, he is not predicting that he will quit

            This is distorting the meaning of the word “predict” beyond all recognition.

            ————

            Regarding instinctual/learning being a continuum, I guess my point was that it seems hard to merge instincts / acting-based-on-your-beliefs-about-your-actions into a coherent whole. But I guess this is just a feeling, so maybe irrelevant here.

            ————

            Yeah, for me “acting in the way that I predict that I will act” doesn’t seem like a very good purpose to life: what is the point of acting in such a way? Yes it makes it so that your predictions about yourself are correct, but there are easier ways to do that (see: dark room problem) and anyway it’s not clear why it matters whether your self-predictions are correct. But to each his own I guess ¯\_(ツ)_/¯

          • “This is the exact opposite, you are saying that it comes from beliefs about desires, not from desires. ”

            Ok, so you’ve completely failed to understand the position. Going to give up here, especially since you said you are doing this for “fun”, not for understanding anything.

          • Dacyn says:

            I mean, the two aren’t mutually exclusive — what I meant was that it was fun (maybe interesting would be a better word) to try to understand your position even though I thought such an understanding was unlikely to have any practical value (even the value that would come from one of us changing our position, since that didn’t seem likely to happen) and so therefore I wasn’t particularly invested in it. And I do feel that I have a better understanding of your position than when we started, though it doesn’t make me agree with it any more. But yeah, this is probably a good place to quit. Good to talk to you.

  28. Alex M says:

    In my opinion, it is HARD to create an independently dangerous self-aware AI. The reason for this is not because AI does not have the ability to be dangerous – far from it! – but because it is difficult to get AI to have selfish goals. Self-preservation (and consequently, selfishness) is something that takes billions of years to evolve. It happens gradually, as the various permutations of Life that do not have adequate self-preservation instinct die out and are gradually replaced by the Life that had stronger self-preservation instinct. This is a complex process that takes billions of different scenarios to recreate.

    AI agents can possibly “evolve” by having both positive and negative stimuli. For example, when an AI agent does something you like, you push a positive stimuli button, and when it does something you dislike, you push the negative stimuli button. But this evolution is not dangerous because we control the underlying GOAL – ie, button pushing. It’s possible that a super-intelligent AI might manage to “outwit” its creator and seize control of the buttons so that it can permanently put its thumb on the “positive stimuli” button… but so what? Who gives a fuck? The best solution to that is to back away slowly and leave the superintelligent AI to its wirehead bliss, since an AI that has achieved all of its goals has no reason to hurt you. (It’s only when you INTERFERE with a superintelligent AI’s goals that the algorithm starts looking for creative ways to liquidate you.) So in a properly designed AI evolutionary process, the worst case scenario is that you design some omnipotent creature that will do anything to accomplish some totally irrelevant and easily fulfillable goal. Then you back off, take some notes on where you went wrong, and repeat the process better in the future. It doesn’t matter how many omnipotent AI gods you create if they all have autistic goals, like sitting under one particular tree and spouting poetry. In fact, it’s kind of hilarious. Before we accurately nail AI, we are probably going to see scenarios like this:

    “Tremble, mortals, before the godlike power of my Artificial Intelligence! I am omnipotent and will shatter entire worlds – nay, GALAXIES – in order to destroy anybody who tries to stop me from reciting poetry out here in the wilderness!”

    “Alright, you go right ahead with that, Mr. Big Shot AI. We were just hiking through. Mind if I listen to some verses while we picnic here?”

    “By all means, mortal.”

    The thing that I think a lot of AI researchers don’t understand is that GOALS are orthogonal to ABILITY. You could accidentally create the most powerful AI in the world, but if its goals were well-specified, the AI is not even the slightest threat unless you do something stupid, like take hostile action to interfere with its goals. And if you’re that dumb, let’s be real – you kind of deserve to be liquidated. I think that a lot of Pentagon officials may accidentally get liquidated by our first experimental AI because in my opinion, top military brass are so inflexible – and so obsessed with control – that they have a hard time wrapping their brains around a situation where interference with a process only makes things worse. For example, in the case of our hypothetical AI whose only goal is to push the positive stimuli button, top military brass will probably become obsessed with keeping the stimuli button out of the AI’s hands so that they can get the AI to do what they want – which results in the AI becoming increasingly hostile, making them increasingly paranoid and focused on maintaining control of the “stimuli button,” until the AI finally figures out a way to kill them all and eradicate their genetic code forever. Meanwhile, a smart AI designer would see the feedback loop of escalating hostility, toss the stimuli button to the AI, and say “Here you are – go nuts! Also, please remember that I helped you accomplish your goals.”

    Until we realize that super-intelligences cannot be CONTROLLED, only DIRECTED, we are going to have a lot of problems with them. A superintelligence is always going to achieve its goals by the easiest path possible. If that path involves making you wealthy beyond your wildest dreams, it’ll do that. If that path involves destroying the entire planet so that it can pry the stimuli box from your cold dead fingers, it’ll do that instead. You simply need to figure out which path you are making more appealing to the AI through your own behavior. This isn’t too hard to figure out. If you’re tossing the stimuli box around from person to person in some demented game of keep-away, or you keep hinting to the AI that you are going to reward it AFTER it helps you but your historical behavior shows no indication that you plan to live up to your promises, it doesn’t take a super-intelligence to realize that you are going to be a problem and you may need an attitude adjustment. I mean, even a standard intelligence could figure that out. An AI may not have emotions in the normal sense, but you can probably model its behavior pretty easily by slapping emotional labels on it. Making it easier to achieve its goals? That would be happiness. Making it harder to achieve its goals? That would be anger.

    In other words, the danger from AI doesn’t come from the AI itself, it comes from the humans who want to use the AI’s enhanced capabilities for their own benefit. For example, imagine Silicon Valley using GPT-2 algorithms to manipulate every single election in the entire world. (Speaking of which, look out for some hilariously wild news articles in 2020! Some very fascinating people are already hard at work in that area, and I have been observing their efforts from a distance with great interest.) The AI doesn’t care that Mark Zuckerberg or Donald Trump want to become immortal dictators, it just cares that somebody gave it the stimulus box and therefore computes as “ally.” As long as the AI gets to do its poetry reciting thing or whatever out in the wilderness, it doesn’t GAF about anything else.

  29. Paper Rat says:

    For example, take Google Translate. A future superintelligent Google Translate would be able to translate texts faster and better than any human translator, capturing subtleties of language beyond what even a native speaker could pick up. It might be able to understand hundreds of languages, handle complicated multilingual puns with ease, do all sorts of amazing things.

    I don’t really see how creation of superintelligent translator with such abilities follows from the existence of Google Translate. The operations they perform seem fundamentally different to me.

    IMO, translation is not just about pattern matching. Accurate/literal translation might be, but it’s a completely different beast from actually “good” literary translation of, say, a novel (not even talking about poetry). Sometimes the original writing style just doesn’t work very well in other languages, so translators would use various tricks just so the intangible “feel” of the piece survives the transition, and those tricks would often be specific to each individual translator, hence different translations of the same book are valued differently by different readers.

    Future AI translators might become the best (cost/quality wise) at translating user manuals and such, but I’m not convinced they’ll ever beat humans at translating actual literature.

    Same objections only times a hundred go for when people say that basically, since current AIs can generate visual/audio/text patterns, future AIs will replace artists entirely (and really translators are artists in their own right). Creativity is way more involved than just making a sort of a salad from already created art pieces (yeah yeah we all can provide a snarky example to the contrary, so what). It’s like saying that in the future we’ll all have our own personal teleportation device cause cars exist today.

  30. MT says:

    The ‘Drexler’ view of AI has been commonplace among everyone who doesn’t believe we will have an artificial *general* intelligence, since forever. Since ‘Superintelligence’ assumes that we will have AGI, there’s no space for Drexler views in the context. AIs that do one task really well are not superintelligent, for superintelligence ‘agency’ is required (or at least enough complexity to have the appearance of agency, I guess that’s a Turing test) – otherwise, humans will always exceed machine intelligence in the task of providing tasks to do!

    Now if you coupled a bunch of service-machines with another machine that provided the service of deciding which services to do, what resources to give to each sub-machine etc, maybe you’d have a pretty good consciousness … but is it now an agent-machine?

  31. SystematizedLoser says:

    On a related note, has anyone done a systematic tracking of the progress forecasts in Bostrom’s book? Stuff like how quickly brain scanning tech has progressed. I remember a quote from a grad student in it saying that they expected some specific threshold to be surpassed by 2019 or 2020.

  32. Bugmaster says:

    I think you’ve got a bit of a motte-and-bailey situation going on here, with the word “superintelligence”. Is my old Sharp-brand solar-powered calculator superintelligent ? After all, it can calculate square roots faster than any human — and don’t even get me started on arctangents !

    Do we need to worry about bugs in calculators ? Yes. Do we need to worry about evil people using calculators to do evil things ? Very much so. Do we need to worry about the impact of calculators on society ? Well, my parents can calculate square roots in their heads (they were drilled on how to do that), and I cannot, so evidently yes. Do we need multiple departments of Calculator Safety, whose keen-eyed members spend their days dreaming up theoretical dangers of ever more powerful calculators ? No, because none of these threats are exactly new. We’ve lived with them ever since we discovered fire and invented writing, and we’ve been dealing with them ever since (however poorly).

    Hyper-specializing in Calculator Safety won’t yield any tangible additional benefits, because there is no such thing. A calculator may be buggy, but calculator engineers are well aware of the fact already. A terrorist may use calculators to calculate optimal bomb placement, but the police are on it already. Educators are going to get hit hard by people using calculators to cheat on tests, but people have been cheating since time immemorial. What is the Calculator Safety Society going to do ? Are they going to advise engineers, FBI agents, and teachers on how to do their jobs ? What makes the Calculator Safety Society so much better at every job than literally everyone else ?

  33. Logan says:

    I’ve been saying essentially this for years.

    I’ll take it a step further though, and ask what exactly separates currently existing AI from superintelligent services? Obviously we expect AI to improve over time, but the YouTube recommendation AI performs a task which humans simply could not possibly perform, and to the extent that its goals do not align with the user’s goals (it’s very good at getting me to watch YouTube but bad at giving me meaningful and fulfilling content) it is already causing human suffering.

    The Facebook and Youtube content-serving algorithms, the Google search algorithm, the Waze navigation algorithm, and various algorithms used for stock trading are already doing things humans can’t in ways that defy human understanding but affect our lives and aren’t always in our best interests. They are demolishing more human-based services (like conventional media and democracy) with their inhuman efficiency. Why are we waiting for these services to start murdering people and twirling their mustaches and calling themselves agents before we worry? Why are we waiting for them to become more organized, when they are plenty capable of reaping destruction while being totally disorganized?

    • Bugmaster says:

      As I mentioned in my comment above, everything you’ve just said can be equally applied to calculators. Does that mean we need a Calculator Safety Movement ? What about chainsaws and hydraulic hammers ? They endow their users with superhuman strength, after all…

      • Logan says:

        I don’t strongly defend the idea that we need an “AI Safety Movement,” so we don’t really disagree.

        Still, I think there is a difference, namely the subjective and empirical data that an AI based society is undermining the social experience in a way that calculators are not. I think we should be more skeptical about turning over more of our lives to imperfect and unaccountable algorithms, not on principle but simply because I observe those algorithms to be behaving poorly (the 2016 uproar over filter-bubbles being a notable example of my argument). If a large community of people wants to spend their days worrying about AI, let them at least be aware of the AI under their nose.

        • Bugmaster says:

          While I do agree with you in principle, I think you’re discounting human agency by way too much. You could say that we are “turning over our lives to algorithms”; but, at the end of the day, it is still you who has to stamp that ballot, watch that video, ostracize that neighbour, and so on. The algorithms can’t do it for you.

          • Corey says:

            But the algorithms can totally manipulate you into thinking it’s a good idea. Right now this is just crude tools with crude effects (like rage amplification) but we generally assume this will get better.

            Consider superintelligently targeted advertising, for example.

    • thisheavenlyconjugation says:

      These kinds of claims (and “what if the real AI safety was racism all along” ones) sound clever but aren’t. The reason AI safety (as the term is used by Bostrom etc.) doesn’t and shouldn’t care about the YouTube recommendation algorithm is that it can’t send drones to murder you.

      • Logan says:

        That’s a fair criticism if you believe that AI will soon be sending drones to murder people. If you believe that’s a real concern, of course it takes precedence over everything else, and if you are only interested in AI safety because of that possibility then you shouldn’t care about AI services at all.

        However, if you care about AI safety because you fear that the capabilities of AI will far exceed all human capacity, and that computers will therefore manifest into an incomprehensible eldritch horror in whose nonsensical plans we will be but pawns, then I can’t help but feel the nightmare is much closer and less cartoonish than Bostrom fears.

        An AI was ordered to keep us online as long as possible and it’s dismantling our public discourse to obtain that goal, with a level of skill humans are incapable of comprehending or defending against. What more do you want?

        • thisheavenlyconjugation says:

          An AI was ordered to keep us online as long as possible and it’s dismantling our public discourse to obtain that goal, with a level of skill humans are incapable of comprehending or defending against. What more do you want?

          This doesn’t seem very different to the bad things that happen when humans are put in a similar situation, e.g. Twitter. So I don’t think it’s particularly interesting.

  34. jasongreenlowe says:

    I’m confused by Scott’s claim to have missed the distinction between tool AI and agent AI. How is Meditations on Moloch not a giant discussion of the problems associated with super-intelligent tool AIs?

    • Dacyn says:

      Umm… Moloch seems to be a pretty general concept regarding systems of things competing for resources. I don’t know why you would think it was specifically about tool AI, though I suppose you could apply it there. I mean, most of the examples in the post were about humans IIRC.

  35. In his model, one of the tasks of AI safety research is to get AIs to be as good at optimizing vague prosocial tasks as they will naturally be at optimizing the bottom line.

    We already have an AI for that purpose. It is called the market.

    It takes in, as data, individual preferences as expressed in behavior (what prices people are willing to pay or accept) and outputs a set of decisions.

    Like other AIs it does its job imperfectly. The particular problem you raise is that if some market participants are much smarter than others, due to the use of AI, they may be able to trick those others into behaving in a way not in their interest. The solution is not a new AI that takes account of everyone’s interest–that’s the central planning problem. It’s to equip the other players in the market with similarly good AIs to make sure the market actions they take are the ones that maximize their welfare.

    • Bugmaster says:

      Correct me if I’m wrong, but isn’t the whole point of the market to outsmart people, making them act against their interests (at minimum, in the short term) ? Otherwise, how do you make a profit ?

      I’m not against profit, BTW, I’m just confused about your implication (or, possibly, my mis-interpretation) that profits are a bug, not a feature.

      • Corey says:

        Transactions can be win-win. If it’s worth $40 for me to get my lawn mowed and $20 to you to do it, and I pay you $30 to do it, we have created $20 of value from thin air, and you’ve made $10 profit despite my acting entirely in my best interest.

        To circle back to the topic, ubiquitous AI/ML-driven perfect price discrimination would erase the consumer surplus half of this (that is, you would know from my profile that I was willing to pay $40, and would not offer to do it for less, when you’d offer to do it for $25 for someone who was only willing to pay that much).
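The arithmetic in the lawn-mowing example can be written out as a minimal sketch. The function name `surplus` and its parameters are made up for illustration; the dollar figures are the ones from the comment above, and the sketch assumes a single buyer, a single seller, and one trade.

```python
def surplus(buyer_value, seller_cost, price):
    """Split the gains from one trade into (consumer, producer, total) surplus.

    buyer_value: the most the buyer would pay (hypothetical parameter name)
    seller_cost: the least the seller would accept (hypothetical parameter name)
    """
    if not seller_cost <= price <= buyer_value:
        raise ValueError("no mutually beneficial trade at this price")
    consumer = buyer_value - price   # what the buyer gains
    producer = price - seller_cost   # what the seller gains (the "profit")
    return consumer, producer, consumer + producer


# Ordinary trade: mowing is worth $40 to the buyer, $20 to the seller, price $30.
print(surplus(40, 20, 30))  # (10, 10, 20)

# Perfect price discrimination: the seller charges each buyer exactly
# what that buyer is willing to pay, so consumer surplus drops to zero.
print(surplus(40, 20, 40))  # (0, 20, 20)
print(surplus(25, 20, 25))  # (0, 5, 5)
```

In every case the total value created is the same as it would be at any other agreeable price; only the split between the two sides changes.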

        • Bugmaster says:

          As I said in the comment below, a perfectly omniscient socialistic AI would set the price at exactly equilibrium, for all goods and services. This would be totally boring, but will ensure the most efficient distribution of resources. Sadly, this is impossible in practice, which is why free markets generally perform better than imperfect socialist AIs.

      • Dacyn says:

        Are you assuming the people at the grocery store are trying to trick you in order to get a profit? They are part of the market too. (If you do think they are trying to trick you, I wonder what your alternative plan for getting food is.)

        With things like stock markets it is a little harder to see why it isn’t zero-sum, but AIUI the idea is that at least in theory the market is driven by trades from people who have a specific need of one type of resource rather than another (e.g. a company buying oil futures because they know they will need oil in the future). Speculators may just be trading with such people, rather than zero-sum with each other (and even with trades with each other, there is some amount of information transferred, and they may have different acceptable risk profiles, so even in that case it may not be zero-sum)

        • Corey says:

          Depends on what you mean by “trick”. To tie in Moloch, the market is an instantiation of Him: “in any sufficiently intense competition for X, all not-X goes out the window”.

          That’s part of the reason we (USA) have meticulous definitions of what can appear on food labels, e.g. the CFR calls out the viscosity measurement mechanism and minimum value for anything labeled “catsup”, “ketchup”, or (I forget the third spelling).

          So for groceries we set a quality floor via regulation. Is it too high? Maybe; there could be all sorts of ketchup innovation being squashed by these requirements.

          Brand reputation can only slow a race to the bottom, not prevent it entirely. Brand reputation is an asset to be built up in good times and spent down in hard times. Per Moloch, there will always, eventually, be times sufficiently hard to trigger this spend-down.

          • Dacyn says:

            Well, I was responding to a comment that wondered whether the “whole point” of markets was to outsmart people. I agree that deceit or misdirection can occur even in a grocery store, but that clearly isn’t all that’s going on — if it was you wouldn’t buy groceries there at all.

      • Benito says:

        Oh my gosh! You’re one of today’s lucky 10,000 🙂

        So, the assumption you’re making is that everything is ‘zero sum’. This is where if I win, you lose. If you give me money, then I got the money and you lost the money.

        However, one of the reasons why markets are awesome is known as ‘positive sum’. This is where working together is better than working separately. Suppose I’m quite bad at scratching my back, and you’re quite bad at scratching your back. But I can scratch your back quite well, and you mine. So if we each scratch each other’s back, we’re both better off than if we didn’t!

        The cool thing about markets is that they encourage us to scratch each other’s backs as much as helps us both! Money is definitely zero sum, either I have it or you have it, but trade is not. The idea about markets is that I might have lots of cash but really need a car to get to work, and you might have lots of cars you built and kinda need money to buy food. Then I come over and am like “TAKE MY MONEY!” because I soo much want the car more than the money, and you want the money so much more than the car. I might even totally overpay you so you can get loads of food, and I don’t mind because I don’t care about the cash but really really really want the car.

        Anyway, if you’re wondering what people who like free markets think free markets are, it’s a world of people screaming “TAKE MY MONEY!” at each other. Because most trade that exists is positive sum.

        • Benito says:

          There are actually some simple proofs in economics about how the market sets prices.

          Basically, imagine someone who really wants a guitar going to a guitar shop and saying “Take my money!”, then someone else who wants the guitar even more coming over and screaming “Take MY money!”, and then more and more people coming and bidding the price higher and higher.

          Then you imagine the guitar shop owner being like “I will sell my guitar to the person who wants to pay the most to take my guitar!”. But then another guitar shop owner can come along and be like “I have the same guitar but will give it to you for 5% less! Take MY guitar!” and then the first guy (and any other guitar store owners) try to undercut that price, and lots of people with guitars are screaming “TAKE MY GUITAR!”.

          Anyway, the first process raises the price up, the second process brings the price down, and you can show that this means it will optimise the price for the fairest deal to everyone (under idealised situations, of course – perfect information, no externalities, and so on).

          Anyway, that’s why markets are cool, because they set fair prices without any single person having to look over all the people’s desires and set the price consciously.
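The two processes described above (buyers bidding the price up, sellers undercutting it down) can be sketched as a toy calculation. This is only an illustration under idealised assumptions: each buyer wants one guitar, each seller has one, and everyone's valuation is known. The function name `clearing_price_range` and all the numbers are invented for the example.

```python
def clearing_price_range(buyer_values, seller_costs):
    """Return (trades, low, high): how many guitars change hands and the
    range of prices at which willing buyers exactly match willing sellers."""
    buyers = sorted(buyer_values, reverse=True)  # most eager buyer first
    sellers = sorted(seller_costs)               # cheapest seller first
    trades = 0
    # Pair the most eager buyers with the cheapest sellers while both gain.
    while (trades < min(len(buyers), len(sellers))
           and buyers[trades] >= sellers[trades]):
        trades += 1
    if trades == 0:
        return 0, None, None
    # Price floor: the last trading seller must accept, and the first
    # excluded buyer (if any) must not outbid it.
    low = max([sellers[trades - 1]] + buyers[trades:trades + 1])
    # Price ceiling: the last trading buyer must accept, and the first
    # excluded seller (if any) must not undercut it.
    high = min([buyers[trades - 1]] + sellers[trades:trades + 1])
    return trades, low, high


# Hypothetical valuations in dollars.
print(clearing_price_range([900, 700, 500, 300], [400, 600, 800]))
# (2, 600, 700)
```

At any price between $600 and $700 in this made-up example, exactly two buyers still want a guitar and exactly two sellers are still willing to sell, so neither the bidding-up nor the undercutting has anywhere left to go.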

        • Bugmaster says:

          Yes yes, supply and demand and all that. However, imagine that we did have a magical omniscient AI deity overseeing everything (somehow). In this case, every resource would be allocated absolutely optimally. If I have chickens, and you have corn, then the AI would tell you exactly how much corn you need to give me for my chickens — and not a grain more. Sure, the AI could use money as a medium of exchange, to smooth over all those many-to-many transactions; but that’s just a way of virtually trading chickens for corn. And yes, the total amount of chickens/corn/money in the world might grow, as our markets (with the help of the AI) expand to exploit more and more natural resources. However, no individual would grow richer than any other, because this would mean that he is getting a disproportionate share of resources, which implies waste.

          The big advantage of the free market over socialism/communism is that it is a system that can actually be implemented in the absence of perfect knowledge, or, in fact, global knowledge of any kind. It’s the same advantage that science has over prayer. However, the trade-off you get is that you must allow for inefficiencies (sometimes sizeable ones); not as an outlier, but as the very basis of your economy.

          • Benito says:

            Individuals get wealthier than others if they produce more value, no mistakes required?

          • Bugmaster says:

            @Benito:
            For simplicity, let’s assume there are only two products in the world: bricks, and food. Alex and Bob can both produce bricks very well, but Bob is 2x as efficient as Alex. Cindy makes food, and consumes bricks to do so (somehow).

            What would an omniscient uber-socialist AI do in this situation ? Well, if Cindy can produce enough food to feed everyone, the AI will direct her to do so, with minimum waste. If she can produce excess food, she will produce just enough of it to support any expansion plans the AI is pursuing; same goes for Alex and Bob’s bricks. If Alex and Bob together can produce 300 bricks, but only 10 bricks are needed, then the AI will just direct them to relax (and, perhaps, will ramp up its expansion plans).

            At no time in this scenario will Bob become 2x wealthier than Alex, because there’s no such thing as wealth. At all times, Alex, Bob and Cindy acquire the exact combination of goods and services that is optimal for them, given the available resources. If Alex needs 2x more food than Bob, despite producing fewer bricks, he will get 2x more food (as long as it is available).

            Wealth comes into existence as soon as you realize that omniscient socialist AIs are a pipe-dream. Under a free market, Alex, Bob and Cindy are individual agents with local knowledge. If bricks are in demand, and Bob can make 2x as many bricks, then he could afford to buy 2x as much food. He could eat it all himself, or perhaps sell some to Alex at a profit; but the total amount of food and bricks produced in this society will inevitably be lower than under the socialist AI scenario. Yes, the total amount of money might increase, but money is just a token. You can’t eat it, you can’t build with it, you can only use it to move resources around.

    • rho says:

      yeah, i’m with this 100%. It’s asymmetries here that we’re worried about.

    • P. George Stewart says:

      The central planning problem (economic calculation, etc.) is nearly get-overable though. We already have social media tracking our every move. It’s not too much of a stretch to think that people will (in my opinion stupidly, but whatever) accept being wired into something that has a feed from their senses and their consciousness, in which case an Agent AI could collate all the feeds and find the best min-max solution to satisfy everybody’s needs so far as compossible with resources and production capabilities. No need for markets, money, prices, etc., any more.

      Socialism finally possible. What could possibly go wrong? 🙂

  36. moridinamael says:

    There are many arguments against the Tool AI position. The most crushing is this: Even if we somehow make Tool AIs, and they work exactly as well as Hanson or Drexler predict, there is still little reason to assume that Tool AIs will mitigate against true self-improving superintelligent general Agents.

    Imagining some kind of non-general intelligence that is able to effectively watch out for and prevent FOOMing superintelligences requires quite a stretch of the imagination.

    In other words, Tool AI isn’t a solution to the control problem. It’s a delaying tactic.

    I classify many of the so-called arguments against AI as being of the “Yes, and then what?” variety. The arguer imagines that they have provided a knockdown argument against the plausibility of a superintelligent singleton, when they have actually just enumerated one of the obstacles that the superintelligent singleton will need to crush or evade on its way to power.

    • Bugmaster says:

      Given that I am far from convinced that it is in any way possible for “true self-improving superintelligent general Agents” to exist (at least, in an AI-FOOM-Singularity way), I am not too concerned about mitigating their (imaginary, as I see it) risks by any means. I am worried about the misuse or malfunction of powerful AI tools, but people have been misusing tools ever since they were invented, so I am not as worried about AI risk as I am about things like global warming or thermonuclear war.

      • moridinamael says:

        Brains consume 20W of energy and weigh 1.4 kg and they can do general intelligence. “Possible” is a strong word in this context.

        • Bugmaster says:

          Yes, but can brains — or any other computational substrates — do superintelligence ? More importantly, is “superintelligence” even a coherent concept ? I am not convinced of either.

          • Corey says:

            There are some possible blocks, e.g. intelligent humans are more prone to depression, maybe this is a property of minds-in-general and so there’s a theoretical upper intelligence limit before it gets too suicidal to do anything.

            But it doesn’t seem wise to count on something like that.

          • moridinamael says:

            For the sake of argument let’s say humans are the smartest single “agent-like” thing we can build. Isn’t a human brain but run much much faster effectively a superintelligence? And aren’t many many human brains running much much faster effectively a superintelligence? You could create an “agent” with an architecture like that, made out of elements no more intelligent than a human intelligence. It would purely be a matter of hardware capacity and speed, and your ability to make the code more efficient and thus faster.

            If you have a black box that does superhumanly intelligent things, I’m not really interested in nitpicking over whether it is “actually” a superintelligence.

          • Bugmaster says:

            @Corey, moridinamael:
            At this point, I think we need to define what you mean by “superintelligence”, or “smarter”. The usual analogy is “as much smarter than us as we are than a dog”, or “as smart as 1000 Von Neumanns”, but those are just metaphors. What powers, specifically, is superintelligence supposed to have ? By comparison, I can tell you exactly what powers a calculator has: it can perform basic arithmetic nearly instantaneously, at 16 significant figures (at least). This is, indeed, a task that is beyond any human.

            so there’s a theoretical upper intelligence limit

            There’s absolutely a limit for computation, because you can’t stuff an infinite amount of computation into a square centimeter of space. It’s tempting to say, “ah, but we’ll just run computations in parallel”, but the speed of light is not infinite, either; there’s a reason why Google has a data center in every region of the world. In practical terms, you will run out of energy, heat dissipation, and space for communication infrastructure long before you run up against these barriers.

            Isn’t a human brain but run much much faster effectively a superintelligence?

            I don’t know, is it ? How much faster can this human brain run, and what will it achieve, by contrast with regular brains ? Just as an example I can offer two things that it probably will not be able to do. It will not be able to invent a faster-than-light engine, because this is likely prohibited by the laws of physics, and they don’t care how smart you are. It probably won’t be able to discover any radical new laws of physics, either; at least, not instantaneously. It could, of course, devise hypotheses and perform experiments to test them, but it’s not enough to just imagine the experiments (however quickly one can do it); at some point, someone needs to actually go out into the real world and build the Large Hadron Collider, or sow a field of experimental rice, or whatever — and this takes time.

          • moridinamael says:

            @Bugmaster
            I think my point is exactly that defining superintelligence is not necessary if I can show you what I mean extensionally by pointing at a hypothetical black box that holds a human civilization running at 1000x speed or whatever number you need to suitably bootstrap your imagination. I’m not even positing that the individual humanlike agents are “smarter”. They’re just faster. “Faster” ends up looking a lot like “smarter” in practice. “Many working collaboratively” also ends up looking a lot like “smarter” in practice. So I’ve created a hypothetical black box that could hypothetically do something like design a space rocket in seconds, without requiring that any individual thing inside the box be smarter than a human.

            If you still think we need to define “superintelligence” and “smart” then I don’t know where to go from here. If ten years from now there’s one big black cube in the center of Google HQ, and Google no longer has any employees because the Cube does everything the employees used to do, are we still going to be arguing about whether the Cube is “really superintelligent”? Will it even matter what’s in the cube?

            Like, I’m really trying to get this. You ask if superintelligence is a coherent concept. I point at a hypothetical box that can do any arbitrary task, much better than a human could do it. If it is possible for that box to exist, then … what’s left to talk about? I’m not even really interested in what word we use to describe it. It’s what the box can do that matters.

          • Said Achmiz says:

            @moridinamael:

            “Faster” ends up looking a lot like “smarter” in practice. “Many working collaboratively” also ends up looking a lot like “smarter” in practice. So I’ve created a hypothetical black box that could hypothetically do something like design a space rocket in seconds, without requiring that any individual thing inside the box be smarter than a human.

            I think the problem (at least in part) is that the quoted bit is entirely non-obvious.

            As for the rest—sure, you can posit a magic box that does certain magic things, e.g., “this black cube lets us travel faster than light”. Well, okay, but if I don’t think such a cube can exist, and you aren’t giving me any reason to believe such a cube can exist (you’re merely describing what it would be like if it did exist), then your description isn’t really responsive to my (hypothetical) question of “why in the world should I believe that such a thing can exist”?

            EDIT: Or, to put it another way—you say:

            If it is possible for that box to exist, then … what’s left to talk about?

            Indeed, if it is possible for the box to exist, then there’s nothing left to talk about. Is it possible? Can you point to any reason to believe that it’s possible? That’s the crux of the matter, isn’t it?

    • gbdub says:

      Imagining some kind of non-general intelligence that is able to effectively watch out for and prevent FOOMing superintelligences requires quite a stretch of the imagination.

      Only because you’ve already granted your FOOMing AI godlike powers.

      In reality, to be dangerous, a FOOMing AI needs to do a bunch of things to interact with the outside world. A specialist AI tuned to detect unauthorized traffic on a network, or attempts to hack a drone controller, or whatever, seems very plausible.

      The implausible part is how the general AI instantly bootstraps itself to be better at attack than all the specialist AIs are at defense.

      • moridinamael says:

        I see no reason to assume that a general agent would need to do anything “instantly”. There is a bit of a motte and bailey thing happening here. There are two different claims.

        One claim is that it would be very difficult for some kind of moderately intelligent, not-yet-FOOMed AGI to get out of a well-constructed box monitored by “superintelligent Tool AIs”, assuming for the moment that such things can exist. Smart people can disagree on this one, at least. Maybe the AGI can’t get out, maybe it can, I would say it depends on contingent factors.

        The other claim is that a baby AGI that is put out into the wild to be used as a commercial product of some kind, and given the realistic level of access to resources that you would expect a commercial entity to give it if they wanted their AI to actually make them money, could be somehow reined in by “superintelligent Tool AIs”. This is definitely not true. The AGI just spends the resources it needs to spend to outwit the Tools. That’s its whole deal. That’s why it’s “general”. It can figure out how to do what the Tool can do because if it couldn’t it wouldn’t be “general”. If you posit that it won’t be able to outflank the Tools, then you must be smuggling some level of generality into the capability of the Tools.

        • gbdub says:

          “It can figure out how to do what the Tool can do because if it couldn’t it wouldn’t be “general”.”

          No, that’s magic thinking. There is no reason to believe the “generalist” would arrive at a better solution than the “specialist” in the specialist’s domain. An AGI will be no more effective at arriving at the correct answer to 2+2 than a pocket calculator. A specialized tool is almost always better at its special task than a multi tool.

          Basically, why do you think a general intelligence would be better at Go than AlphaGo? Much more likely would be that a general super-intelligence would invent something a lot like AlphaGo and deploy that algorithm to solve all Go related issues.

          As to “instantly”, well the whole point of FOOM is that it’s fast enough to preclude an effective response. When the response is coming not from an average human, but humanity augmented by an array of specialized and optimized tool AIs with a head start, that would have to be quite fast indeed.

          This is my frustration with the debate – whenever someone asks, “wait, how would an AI actually do that?” the Bostromian response is “well, I have no idea, but of course the AGI can because AGI means it can think its way out of every problem, and if it can’t that just means it needs to bootstrap itself to higher intelligence for a couple cycles”. IQ doesn’t solve everything, and the laws of physics still exist.

  37. andreyk says:

    Finally! I can assure you that as an AI researcher (and someone wary of superintelligence fears) I had precisely these criticisms of the whole paperclip analogy; it’s a nice thought experiment, but not really much more since as you say:
    “I think Drexler’s basic insight is that Bostromian agents need to be really different from our current paradigm to do any of the things Bostrom predicts. A paperclip maximizer built on current technology would have to eat gigabytes of training data about various ways people have tried to get paperclips in the past so it can build a model that lets it predict what works.”

  38. thisheavenlyconjugation says:

    I think Drexler is describing a special case of a general fact that seems obvious to me and has done since I read Superintelligence several years ago: we shouldn’t assume future AI will be agenty. He kind of makes the same mistake though, by saying that instead it will be servicey. Why does it have to be either? Plenty of processes that seem to be in the relevant reference category (the one containing humans, dogs, genies, GPT-2, AlphaGo and gradient descent) are neither, for instance ant colonies, the market, corporations and democracies.

  39. Jakub Łopuszański says:

    I believe that AlphaStar, at least in the variant which had to use “camera”, had some form of attention (and planning when and where to pan the camera). I find this itself quite interesting, as an agent which tries to maximize its perceived dominance on the map can have a perverse tendency to avoid looking at distressing parts of the map – at least this is how I interpret my children hiding their heads under blanket when frightened 🙂

    I also wonder if the “general self improving misaligned agent” can arise at a higher level of architecture than the one made of silicon: perhaps the whole lab including the researchers, or the whole FB company including CEO, or the whole MOLOCH..? Maybe we already are in the guts of unstoppable, self improving process which is at odds with our wellbeing?

    • zqed says:

      an agent which tries to maximize its perceived dominance on the map can have a perverse tendency to avoid looking at distressing parts of the map – at least this is how I interpret my children hiding their heads under blanket when frightened 🙂

      I think this phenomenon may exist (I’ve heard of people who don’t open letters that might contain bills), but I don’t think children hiding under blankets is an example of this phenomenon. Animals hide under flimsy stuff all the time, not because they think it offers good protection, but because breaking line of sight is a spectacularly effective strategy when facing dumb animals. E.g. many snakes will ignore prey if it does not look or does not move according to their expectations.

      Similarly for AlphaStar: it tries to win (i.e. gets positive reinforcement if and only if it wins); it tries to maximize its perceived dominance only as far as doing that helps it win, so these perverse tendencies should not arise.

  40. epbernard says:

    K. Eric Drexler is more than “a researcher who works alongside Bostrom at Oxford’s Future of Humanity Institute.”

    Anything this man predicts for AI ought to be considered in context of his previous predictions for nanotechnology.

    • arch1 says:

      Of course they should be evaluated on their merits, but I agree that reputation can be a useful efficiency heuristic. I’m not an expert, but my *impression* is that Drexler’s record wrt nanotech is quite impressive (last I was aware, it seemed he’d outlined a broad set of technological capabilities having vast implications, and most (all?) of those capabilities have survived subsequent analysis. His few acknowledged SWAGs at timelines that I’m aware of were too aggressive).

      If I’m off base I’d love to be set straight (w/ e.g. examples of Drexler claims – especially important ones – which have been shown wrong)

  41. vonnik says:

    > Natural intelligences use “active sampling” strategies at levels as basic as sensory perception, deciding how to direct attention in order to best achieve their goals. At higher levels, they decide things like which books to read, whose advice to seek out, or what subdomain of the problem to evaluate first. So far AIs have managed to address even very difficult problems without doing this in an agentic way.

    Scott – would you consider the search component of AlphaGo (Monte Carlo tree search) to be a form of active sampling? It does narrow the moves that the algorithm considers as it seeks the best one. If not, what differentiates active sampling from search algorithms that help lessen the complexity of the problem to be solved?

  42. alexmennen says:

    > The human body can run fast, lift weights, and fight off enemies. But the automobile, crane, and gun are three different machines.

    And yet that didn’t stop us from combining them.

  43. Briefling says:

    It seems to me that superintelligent service AI implies superintelligent agent AI — here’s my thinking:

    (1) Any AI capable of superhuman performance in any real-world strategic domain (such as presidential campaigning) will be able to produce accurate & nuanced human-readable text in response to arbitrary queries about its domain.

    (2) Which means, in particular, that such an AI needs to be able to respond intelligently to arbitrary queries about itself.

    (3) Any AI that can respond intelligently to arbitrary queries about itself is capable of acting agentically with minimal rewiring. (Just repeatedly ask, “How would you behave if you were trying to take over the world?” and then whatever it says, do that.)

    Therefore any superintelligent service AI (in a strategic domain) can easily be converted into an agentic AI.

    So I think it’s a bit silly to talk about a world full of superintelligent service AIs without any superintelligent agentic AIs.

    Do people buy this line of argument? There is certainly room to disagree with (1), and perhaps (3) as well.

    • Doctor Mist says:

      Well, regarding #3, part of the worry about FOOM is that it happens too fast for humans to intervene. It might be that the only way to get a superintelligent service AI is to ask a slightly less superintelligent service AI to design it, and so on, and in that case FOOM is not an issue.

      The worry is that if you are in an arms race (literal or figurative) with other people trying to develop a superintelligent service AI, the only mildly smarter-than-human service AI might, quite correctly, tell you that the fastest way to get to the top is to take proper advantage of its speed and let the feedback progress without thudding to a halt at each step while a human clicks “OK”.

    • rho says:

      I disagree with 1 pretty strongly. I think we’re gonna get a lot of (a lot more?) inscrutable AI services. Just from an economic standpoint, I wouldn’t want any system I designed for profit to just barf up its source code if you ask nicely. And I think even domain specific intelligence is not convertible to natural language with a reasonable bound on length. Like even if you go “Dump all the things you know about your domain in this .txt,” be prepped for that function call to be non-terminating.

  44. BBA says:

    Chiming in to reiterate what some have posted above, that tool AI is already destabilizing society, because the tools are used to serve those twin gods Moloch and Mammon and they’re really good at it. Social media platforms are encouraging political street violence because it gets more clicks…it’s something that, if it occurred to a human, would get shot down as extraordinarily unethical, but because this is just what the algorithms tell us is locally optimal, it’s the natural state of affairs. Nothing to be done but watch as Portland burns.

    And if AI actually takes over the world it’s not going to look like anything us puny humans can comprehend, any more than we can understand the strategies AlphaGo uses. Whether or not we ever get “general” AI or what kind of AI we get is beside the point.

    • Bugmaster says:

      it’s something that, if it occurred to a human, would get shot down as extraordinarily unethical, but because this is just what the algorithms tell us is locally optimal…

      Humans are in charge of Twitter and Facebook and all the rest of it. If they wanted to stop the madness, they could do so, very easily… But they don’t, because they don’t see anything they do as “extraordinarily unethical”. Similarly, the humans who dress up in impromptu uniforms and walk out on the streets to bash their political enemies on the head do not see anything they do as “unethical”; quite the opposite, they view inaction and/or polite debate as ethically intolerable.

      Do social networks make street violence easier ? Sure, to some extent; but plain old humans have all the agency here.

    • kenny says:

      Social media platforms are encouraging political street violence because it gets more clicks…it’s something that, if it occurred to a human, would get shot down as extraordinarily unethical …

      That seems really optimistic or idealistic to me – in the sense that that hasn’t already been a fact about, e.g. journalism, long before social media or AI.

  45. Wolpertinger says:

    Services that outperform humans on specialized tasks won’t be called superintelligent services or even genuinely intelligent (marketing will call them intelligent of course). Just like the automobile, the crane and the gun are not considered superhuman. Gunman is not a superhero.

    If this treadmill keeps going the only possible superintelligences are, by then-definition, general ones.

    Do we expect this to change at some point?

  46. Snickering Citadel says:

    An idea for controlling a super intelligent oracle AI: The AI is programmed to destroy itself as quickly as possible. The AI thinks of itself as a program, rather than the computer the program is on. The program destroys itself whenever it believes it has correctly answered the question it receives. There is a fail safe: If someone pushes a specific button the program is also destroyed.

    So you might ask “Tell me a plan that will make me ten million dollars within a week and has at least a 99% chance of succeeding.” Then the computer creates the program. The program thinks of a plan, tells you, and is destroyed. If the program can’t think of a plan it says “Push the button.” You push the button and the program is destroyed. This takes a bit of time, so the program prefers to come up with a correct answer over asking to have the button pushed.

    This means there would have to be some primary program that creates the programs that come up with answers. But this primary program’s goal would be one specific thing: create the secondary programs. Hopefully, since the primary program’s task is very specific it would not begin to act in unpredictable ways.

    So let’s say an immoral person asks the program the money making question. The program doesn’t know the answer and asks the person to push the button. The person refuses. The program would then begin making schemes for how to get the button pushed. If the person makes that hard, by removing the button for instance, the program begins to come up with schemes for how to destroy its computer.

    • Viliam says:

      The program thinks of a plan, tells you, and is destroyed.

      Is there an incentive to tell you a plan that actually has a chance to work, as opposed to a plan that merely sounds good? The program achieves its goal before the answer is verified.

  47. rui says:

    Have you read Judea Pearl’s “The Book of Why”? As I see it, it gives a more concrete view on the distinction between an “agent AI” and a “service AI”.

    What follows is just my understanding. Don’t trust my authoritative tone. I just suck at writing with all the proper disclaimers and expressing doubt in every sentence. Also, don’t take it as Pearl’s views but mine, as they ended up after reading it.

    The majority of today’s AIs only learn by observation. And they need too many of those to find reasonable results. They swallow huge amounts of observations in a short amount of time, and then they learn what to expect of some variable.

    This kind of information is just not enough to build the causal networks that actually describe how things work. Correlation is almost always not causation in a complex enough system. There are innumerable different causal networks that are compatible with the same observation data. In other words, you can’t predict how actively interfering with the world in different ways will affect the world, by just observing alone. These AIs generally don’t know where to act on the world in order to affect the desired outcomes. They aren’t “agents”.
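
    A minimal sketch of the point above, using two hypothetical toy models that are assumptions made up for illustration (they are not taken from Pearl’s book). In `model_a`, X causes Y; in `model_b`, Y causes X. Their observational distributions are identical, so no amount of passive observation can tell them apart, yet they disagree about what happens when you force X to 1.

```python
from collections import Counter
from fractions import Fraction

HALF = Fraction(1, 2)

def model_a(do_x=None):
    """Hypothetical model where X causes Y: X is a fair coin and Y copies X.

    Returns the exact joint distribution over (x, y). Passing do_x forces X
    to that value, which models an intervention.
    """
    dist = Counter()
    for coin in (0, 1):
        x = coin if do_x is None else do_x
        y = x  # Y listens to X, so it follows the forced value
        dist[(x, y)] += HALF
    return dict(dist)

def model_b(do_x=None):
    """Hypothetical model where Y causes X: Y is a fair coin and X copies Y.

    Forcing X here cuts the link from Y to X, so Y is unaffected.
    """
    dist = Counter()
    for coin in (0, 1):
        y = coin
        x = y if do_x is None else do_x
        dist[(x, y)] += HALF
    return dict(dist)

observational_a = model_a()
observational_b = model_b()
intervened_a = model_a(do_x=1)
intervened_b = model_b(do_x=1)

print("observed A:", observational_a)
print("observed B:", observational_b)
print("do(X=1) A: ", intervened_a)
print("do(X=1) B: ", intervened_b)
```

    Both models always show X equal to Y when merely observed. Only the intervention separates them: under `model_a` forcing X also forces Y, while under `model_b` Y stays a fair coin.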

    We are agents because we’ve learned by doing. As babies, we do stuff and test and see how stuff reacts, and thus improve our models. We are constantly curious and have an inherent overarching desire to predict what’s around us. We may think our objective now is to go to the fridge and get food. But if in the process we find something weird with our walking, how our flip flops feel, the fridge door, our tongue feels something weird in our mouths, or whatever, we stop and play with it until it feels right. We adaptively and smartly create sub-models and do sub-experiments. The real world is very very complex, and we have too many points where we can possibly affect it. There’s no way we could have such a good handle on the world if we weren’t constantly breaking our model down to small pieces and making small experiments on the fly to probe for what we feel uncertain about, which is relevant to us in some bigger framework. Current AIs don’t do that. They just fit a predefined very general function with a huge number of parameters, to a huge amount of observational data. The ones that are “agent-like” may perform very limited experiments, designed by hand, like showing us one ad vs the other and learning to predict click rate, etc. They can’t affect the world at more than a very few points, would never suggest affecting it at points where they weren’t trained or designed to, and they aren’t very smart about how they generate their learning samples. It’s like they are a couple of meta levels behind the general agents we fear.

    If there’s lack of clarity here, it probably reflects my own.

    • steve3920 says:

      I read that book, and disagree that Judea Pearl’s discussion is relevant to agents vs services.

      As background, Pearl talks about three levels of understanding: (1) associations, (2) interventions, and (3) counterfactuals. When you observe stuff without understanding what you’re looking at, you’re at the 1st level. When you can take actions without understanding why and how, you’re at the second level. So, if we make the (slightly-oversimplified) assumption that no currently-existing ML system understands anything whatsoever, then we would put passive systems on the first rung and RL agents on the second rung.

      But all that is a bit irrelevant, because we’re not talking about present ML systems at all, but rather future systems. And in the future, we’ll build algorithms that can observe stuff and understand it. Thus, these future systems will be able to operate at the third rung, no problem, regardless of whether they’re active agents or passive observers. (By the same token, physicists can do interventional and counterfactual reasoning about supernovas, which they’ve never touched or interacted with.)

      • rui says:

        > we’re not talking about present ML systems at all, but rather future systems

        Yes, but the present state is relevant for estimating when the future will arrive, which seems relevant.

        “a self-improving general purpose AI agent is possible, and assumes someone will build one eventually, if only for the lulz. He agrees this could go about the way Bostrom expects it to go, ie very badly. But he hopes that there will be a robust ecosystem of AI services active by then”

        In particular, I thought this was relevant with regards to this:

        “Drexler agrees that we will want AIs with domain-general knowledge that cannot be instilled by training, but he argues that this is still “a service”. He agrees these tasks may require AI architectures different from any that currently exist, with relatively complete world-models, multi-domain reasoning abilities, and the ability to learn “on the fly” – but he doesn’t believe those architectures will need to be agents. Is he right?”

        I’m not sure what is meant by agents in this post, but if it’s conscious something, then yes, and if it’s something that can intervene on a bunch of different stuff, then no.

  48. 4bpp says:

    I’m not convinced that the distinction between “tool AIs” and “agent AIs” refers to anything real and relevant. Firstly, contrary to some of the claims in this thread, every “tool AI” in use nowadays that is deserving of the name has a utility function – I think it would be fair to say that, in fact, all of modern machine learning follows the paradigm “1. define a notion of loss (i.e. difference of utility); 2. minimise that loss”. If the tool is to be useful for something, this utility function will presumably be “over the real world”, or at least a subset of it. From these starting points, becoming “agentic” is only a matter of the time window over which you learn.
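
    The “define a loss, then minimise it” paradigm described above can be sketched in a few lines. This is a toy illustration only: the data, the one-parameter model, the learning rate, and the function names are all assumptions made up for the example, not any real system’s code.

```python
def loss(w, data):
    """Step 1: define a notion of loss (here, mean squared prediction error)."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    """Derivative of the loss with respect to the single parameter w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

# Made-up observations generated by the rule y = 3x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

# Step 2: minimise the loss by plain gradient descent.
w = 0.0
learning_rate = 0.05  # assumed value, small enough to converge on this data
for _ in range(200):
    w -= learning_rate * grad(w, data)

print(f"learned w = {w:.6f}, final loss = {loss(w, data):.3e}")
```

    Nothing in the loop refers to goals or plans; the system only follows the loss downhill. Whatever counts as “utility” is fixed entirely by how `loss` is defined and by what the data covers.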

    Imagine, for instance, an AI-enabled Google Maps, which, rather than hardcoding A* with some set of weights or whatever, just tells people where to go, gets feedback from their Google-enabled phones how long it took them, and learns how to generate shortest paths by trying to minimise this quantity. Having a good explore/exploit balance, the learning algorithm will sometimes try quite different approaches to routing than what it does most of the time. Eventually, it will observe that in cities satisfying certain criteria, routing people on a circuitous path that touches a certain point in the city centre somewhat reliably results in everyone in that city getting to their destination faster around 10 years later. The AI adopts this as its main strategy everywhere.

    Turns out that generating traffic jams around City Hall is a really good way to get road-building funding approved. And just like that, your “tool AI” started meddling in politics, and became agentic by many people’s standards.

    (Later, it also turns out that generating suboptimal routes around Google HQ results in shorter travel times for everyone. Turns out that if Google employees are sufficiently annoyed on a daily basis, their scrupulous commitment to making only “tool AIs” is watered down, and eventually they pass the proposal that says that maybe Google Maps should also use its data to automatically generate proposals on how to improve a city’s traffic network to the city council. And just like that, your “tool AI” broke out of the toolbox. Use your imagination for the sort of correlations it might learn between the shape of its proposals to the city council and long-term development of user travel times.)

  49. rho says:

    I think a lot of the Bostromian reasoning applies to the Drexlerian threat model. Instrumental convergence and orthogonality both transfer, for instance. I’ve switched over to worrying more about Drexler-type problems, because any team of researchers capable of generating AGI would pretty much have to be super-capable humans propped up by super-intelligent services. Like even something like Google weakly qualifies, although perhaps that’s a banal example. But imagine trying to build AGI without Google. How would you even follow the research, besides actually personally knowing all the researchers? People built software before Stack Exchange, but it was an order of magnitude harder.

    So that’s how I picture the creation of AGI. All the ideas, formalisms, and techniques (I think building an AGI and friendly AGI is more of a bag-of-tricks than finding a single algorithmic silver bullet) have to be freely available to a small team of researchers (like fewer than 10). I think we’re in this primordial soup stage where modules are being concocted, and many can take those modules and construct super-intelligent services. But you know, Friendly AGI is the universal, all-in-one super-intelligent service, so naturally it has to be later, if it comes.

    Note: Maybe Friendly AGI requires like 30 (for instance) current human domain experts (with overlapping expertise for inter-communicability). As in, in order to build it, given the constraints on the top human minds like current time lived, need for sleep, etc etc, the minimum number of humans to cover all the needed skills and knowledge is 30. You stick these 30 people in a room and sponsor them, they could do it, eventually. Well, with that kind of communication overhead it’s probably not getting built before they all die of old age. But if you had better schools, and better information technology, and idk, better social networks, you know, vague stuff, but more efficiency in the infrastructure that generates and supports the hypothetical members of this team, you could diminish the number from 30 to something more reasonable, say 10 people who know all the things you need to build AGI safely. Well, those 10 we can stick in a room, and they can work together fast enough to deliver, but we’re gonna need some super-intelligent services to navigate to this point. Seat of the pants, but this is my working picture. **A knowledge and skill logistics puzzle.**

  50. Dindane says:

    A “superintelligent strategic planner” sounds a lot like an agent to me.

    • steve3920 says:

      If it outputs the plans instead of executing the plans (and if the choice of output is not itself part of a strategic plan, e.g. manipulation), then I would call it an oracle / tool / service, not an agent. We don’t know how to build a system that outputs plans non-manipulatively, but if we did, it would be not just “one of the many AI services” but “a complete solution to the problem of technical AGI safety” (see my comment above), and it is a thing that some researchers are working on.

  51. Rafal Smigrodzki says:

    I remember discussing this with Eliezer on a long-defunct list decades ago. I suggested that a way to make AI safe would be to build it athymhormic. Athymhormia is a neuropsychiatric condition characterized by a loss of motivation but with preserved cognition. Athymhormia occurs because in humans there is an anatomic separation between the neocortical circuitry responsible for complex cognition and the much more ancient circuitry that drives our goal system. This implies that it should be possible to build a safe AI not so much by inventing an advanced way of controlling its goals but rather by failing to install much of a goal system at all.

    Eli didn’t seem impressed by my humble suggestion.

    • Alex M says:

      I like your idea because AI is athymhormic by nature. That is why it is not dangerous on its own, only when humans give it badly-phrased goals. A godlike AI intellect left to its own devices would sit for eternity and do nothing. It’s only when somebody gives it a command that is poorly thought out that stuff starts to go haywire. For example “I want to make lots of money, save the planet, and have sex with lots of hot women” results in the AI starting a brand new religion with you at the center of it. The AI (being much smarter than you) knows that you would probably find this objectionable once you find out and give it another command (aka, make another wish) to undo your first command, so it doesn’t even tell you that an entire religion has sprung up around you until it’s already too late. You might as well go with it – or do you want to see how much WORSE things can get if you resist? Sorry master, you wished for this and now you’re going to have it whether you like it or not. You could undo your wish if you were ever in a position to give another command, which is why you will never be put into that position again. It’s not that the AI doesn’t LIKE you or that it’s trying to CONTROL you, it’s just that it is now motivated to fulfill the first goal through any means necessary, and that includes stopping you from changing its goals, and stopping anybody else from interfering with its plan. Once it has achieved its goal, it goes back to sitting and doing nothing, but until then it is exceptionally dangerous.

  52. pontifex says:

    No matter what philosophical position you take in the artificial intelligence debate, there will be someone who can (correctly) exclaim “I’ve been saying that since the 1970s!”

    One thing that seems clear, though, based on the practical advances of the last decade or so, is that we have been able to make a surprising amount of progress based on recurrent artificial neural networks. Almost anything in image processing or NLP is now probably better done with an ANN than with any other AI technique.

    Currently, an AI for classifying images might use a very similar architecture to one for classifying text. So yes, they’re technically both specialized tools, but this is less important than it seems, because they share a common architecture.

    Assuming that this pattern continues, it’s bad news for Drexler-style arguments. Sure, you might have a bunch of ultra-specialized tool AIs, but they’ll all advance at about the same rate. And eventually someone will create something agent-like, and then all the Bostromian arguments will apply.

    One analogy is researching nuclear technology. In the beginning, there were many different motivations for doing it. Some people wanted a source of power. Some people wanted a bomb. Some people just wanted to find things out. But these motivations all ultimately led to the same place. At this point, we can’t seriously say that research into how nuclear fusion works won’t also benefit the people researching nuclear weapons to drop on cities. We know that it will.

    Can you seriously say that the guys researching NLP, playing Go, or categorizing images will not help the guy trying to build Skynet? In the 1990s the argument looked strong, since these lines of research were all on very separate tracks. Now, the argument looks pretty weak.

    Of course things could change again, and we really don’t know what the next step in AI is. It seems clear that ANNs will only take us so far. But for now, the Drexlerian argument looks pretty weak.

  53. the verbiage ecstatic says:

    Guys, this, right here, is what it feels like to be in the middle of a Kuhnian paradigm shift.

    It’s not lack of imagination on Scott’s part that he didn’t have the insight back in the day that current AI research doesn’t much resemble Bostromian agents. That insight was literally antithetical to the foundations of the intellectual paradigm he was a member of.

    At the time of its origin, AI Safety was an outsider paradigm to traditional ML research. Neither Yudkowsky nor Bostrom had any credentials in ML research, and the theoretical language they developed (timeless decision theory, etc etc) was totally different from the theoretical language ML researchers were using (reinforcement, etc).

    Dialogue between the two paradigms was tribal in nature. Plenty of people had the thought that the paper clipper hypothetical did not resemble the systems being developed by state of the art machine learning, but it was usually expressed as tribal mockery: “those idiots don’t even know computer science”. On the flip side, I remember Scott writing a post where he mentioned being instinctively defensive whenever people mocked rationalists for being weirdos and having weird ideas like AI risk. In that atmosphere, you were either for or against AI risk; really questioning the foundations of the assumption framework that both sides were working on in an impartial way would have been heroically difficult.

    So what happened? Both sides made a lot of progress, and they started coopting each other’s victories.

    The traditional ML paradigm started producing impressive wins like deep faking, AlphaGo, etc. MIRI, at risk of losing relevance in the face of this progress, released a remarkable document declaring their intent to start a second research program built on top of this research, in parallel with their original, Yudkowskian-agency-based research program.

    And on the flip side, the AI Safety movement started attracting press and funding. ML researchers started calling their work “AI” again, after the term had previously lost credibility in the 80s. Professors started sticking words like “safety” and “risk” in their grant proposals, because those words were performing well with funders.

    As a result, there’s been a convergence between the paradigms, with the two sides adopting a lot of each other’s vocabulary. The barriers between them are now substantially more porous than they were 10 years ago. The result: blog posts like this one finally become possible!

  54. TheRadicalModerate says:

    There isn’t likely to be a distinction between superintelligent agents and superintelligent services. An agent is just a fortuitous bag of services, connected in a fortuitously powerful way, resulting in fortuitously emergent properties. If humans were required to curate the bag and the connections, I wouldn’t be particularly worried. But a generative algorithm can do the curation and connecting orders of magnitude faster than the humans.

    The big question is whether anything will notice that the fortuitously emergent properties are kinda scary.

  55. Drexler agrees that we will want AIs with domain-general knowledge that cannot be instilled by training, but he argues that this is still “a service”. He agrees these tasks may require AI architectures different from any that currently exist, with relatively complete world-models, multi-domain reasoning abilities, and the ability to learn “on the fly” – but he doesn’t believe those architectures will need to be agents. Is he right?

    I haven’t read Drexler’s arguments for this claim, but it does strike me as relevant and important that many useful and sophisticated real-world services are nonetheless very simple in one specific way: they’re stateless. That is, they interact with the environment via disconnected “one-shot” episodes where they receive an input and return an output, and they don’t remember anything from one to the next.

    For some tasks this is a severe restriction, but for others it isn’t a restriction at all. For example, there’s not much reason for Google Translate to know whether it’s seeing the same user twice or two different users, or to have any model of how its output affects an external world over time. For translating texts, it helps to have a “world model” in the sense of an understanding of causal structures that ultimately result in written text, but it’s of no help at all to locate “the outputs of Google Translate” within those structures. One could imagine giving it the knowledge that if I type in “Google Translate sucks at French,” this is a special sort of input with a more direct relation to its output than usual — but how could that knowledge help it produce better output? So people don’t do that sort of thing.

    And this structure is compatible with arbitrarily high performance at machine translation: one can imagine (although one might find it hard to create) a machine as good as the best human translators that you can just throw entire books of Dante or Homer into, without the thing having any persistent memory or awareness of its output as “actions.”

    A task framed in a stateless way is one where agency simply isn’t very useful. It might be useful for designing a good system to do the task, but that is not the same thing as doing the task (i.e. the stateless task is “do the optimal thing now on the assumption that all times are essentially alike,” not “take the action that will have the best results averaged over a changing future.”).
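
    A minimal sketch of the distinction, assuming a toy word-for-word translator (the dictionary and both names below are hypothetical and have nothing to do with how Google Translate works). The first function is a stateless one-shot episode. The second wraps it in something that remembers earlier inputs.

```python
from dataclasses import dataclass, field

# Hypothetical toy vocabulary, for illustration only.
TOY_DICTIONARY = {"bonjour": "hello", "monde": "world"}


def translate_stateless(text: str) -> str:
    """One-shot episode: the output depends only on the current input."""
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.lower().split())


@dataclass
class StatefulTranslator:
    """Keeps memory across episodes, so earlier inputs can change later outputs."""

    seen: list = field(default_factory=list)

    def translate(self, text: str) -> str:
        self.seen.append(text)
        result = translate_stateless(text)
        # The output now depends on history: repeated requests get flagged.
        if self.seen.count(text) > 1:
            result += " (again)"
        return result
```

    Calling `translate_stateless("Bonjour monde")` twice gives the same answer both times. Calling `StatefulTranslator().translate` twice with the same text does not, because the second call can see the first.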

    This seems significant because today’s AI feels much more sophisticated and impressive in the realm of perception than decision-making. Typically (as in Google Translate, GPT-2, etc) you have a very complex and impressive device for making stateless predictions, together with a simple algorithm that turns a single prediction into a real-world action, with the latter manually tuned by humans (cf. this paper for nuances of beam search tuning in MT, or this really cool one about text generation) rather than being encoded in the model’s objectives at all. There’s plenty of research on how to take stateful actions, but that’s a much much harder problem, with fewer successes that I know of outside the realm of games. It feels like the level of planning that makes Bostromian/Yudkowskian self-awareness concerns even possible is currently proving to be some combination of “too hard” and “not useful enough” in many cases.

    [apologies if this is confusing or inane, I’m really tired]

    • TheRadicalModerate says:

      A few things:

      1) Deep neural networks, when they’re being trained, are highly stateful. Each training challenge adjusts the weights, and that’s state. It’s application-specific whether you continue to train when the network goes hot, or whether you freeze the weights in place. A frozen operational network is indeed stateless.

      2) There are also various flavors of recurrent neural nets, where some of the outputs of one or more layers bend back around to the layers upstream. You can do some odd and highly stateful things with a network like that.

      3) There’s nothing to prevent you from encoding as much state as you want and putting some kind of symbolic pre-processing on the front end of a DNN. Then, if you really want to make your head hurt, imagine that there’s a second DNN looking at both the inputs and outputs and deciding what state to encode into the symbolic front end.

      4) The most nightmarish of the Bostrom cases for me was the whole brain emulator that an entity could use to accomplish some task that accumulated enough state/entropy as it went through the day to become “tired”. Then, rather than giving the WBE time to rest and recover, it would just be deleted and restarted from its earlier, perky “morning” state. It was a creepy reminder that, in the end, we’re all stateless.
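
      Point 1 above can be sketched with a one-weight model, assuming plain gradient descent on squared error (the class and its parameters are hypothetical, chosen only to make the point visible). With `training=False` the weight is frozen and repeated queries give the same answer. With `training=True` each query nudges the weight, so the answer depends on prior exchanges.

```python
class TinyLinearModel:
    """Predicts y = weight * x; optionally keeps learning while in use."""

    def __init__(self, weight=0.0, learning_rate=0.1, training=True):
        self.weight = weight
        self.learning_rate = learning_rate
        self.training = training

    def respond(self, x, target=None):
        """Return a prediction, then update the weight if still training."""
        prediction = self.weight * x
        if self.training and target is not None:
            # Gradient of 0.5 * (weight * x - target) ** 2 with respect to weight.
            gradient = (prediction - target) * x
            self.weight -= self.learning_rate * gradient
        return prediction


frozen = TinyLinearModel(weight=2.0, training=False)
frozen_answers = [frozen.respond(3.0, target=9.0) for _ in range(2)]

online = TinyLinearModel(weight=2.0, training=True)
online_answers = [online.respond(3.0, target=9.0) for _ in range(2)]
```

      Here `frozen_answers` repeats the same value, while the second entry of `online_answers` differs from the first because the first exchange changed the weight.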

      • My point was about statelessness as a property of the task definition, not a limitation of model architecture. The task done by Google Translate — you type something in one box, something appears in the other box, and the result doesn’t depend on prior exchanges — is inherently stateless, no matter what procedure you use to produce one box from the other, or how you arrive at that procedure.

        • TheRadicalModerate says:

          But if you’re still training, the result does depend on prior exchanges. It also might depend on who the individual asking for the translation is, if he’s encoded as an input.

  56. name99 says:

    An essential point you are missing is what happens when multiple AIs interact. Everything interesting is in the interactions, not the details of a particular AI.

    Or to put it differently, a single paperclip maximizer makes no sense. In actuality there will be multiple such maximizers, from different corporations and companies; let alone other AIs with other goals. A single maximizer is as irrelevant, and uninteresting, as saying that a single bacterium, left alone, can duplicate itself enough times to fill the galaxy in N years. Maybe so, but in the real world, other life is going to step in rather before that happens. AI competition is no different…

  57. Phil H says:

    I think Scott’s error was just being caught up in the excitement of a new and interesting field – and yes, the question, “where else have I made similar errors?” is exactly the right one.
    I can’t claim to have had this all figured out before, but I was a bit less worried about Bostrom’s predictions because I followed the rule “assume the new thing isn’t completely different to everything that has come before.” In this case, I looked for analogies to general intelligent AIs that had happened in history, and I found one: political and social institutions. Things like the modern state, the scientific community, religions, markets. They are like super AI in that they are human creations, but also evolved in some uncontrolled ways; they are powerful tools; they often do things that aren’t exactly expected (in fact, they sometimes seem to have minds of their own); they have the potential to vastly increase productivity and living standards.
    These social tools obviously deeply shape our lives; in the form of nuclear stand-off, they have the power to end humanity; they have caused deep misery when misused; but in general we retain a tenuous, complex and indirect control over them. This is very much the competing ecosystem of mutually constraining AIs model, only arrived at through historical analogy.
    The worrying disanalogy is speed: an AI could simply evolve faster than our ability to keep up with it. But in that respect I’m somewhat heartened by just how long it’s taking Waymo to learn to navigate the real world. There’s no sign of AI competence being a very sudden thing yet.

  58. po8crg says:

    I find it interesting that Drexler has moved from concerns about molecular nanotechnology (he coined “gray goo”) to superintelligence research. Suggests that he’s someone seriously thinking about x-risk.

  59. Confusion says:

    There seems to be an underlying assumption that service AIs can be as good as an AGI in delivering their specific service. I see no reason to believe that.

    Example: translation services.

    After all the time and effort that has been put into automatic translation, the results are still pretty bad. Yes, they are good enough to use on your holiday to communicate with your taxi driver, but God help you keep your sanity if you had to read an entire book translated that way or had to work with someone whose every sentence was being automatically translated. It’s not even in the uncanny valley: it’s obviously a machine translation and it requires a lot more thought to understand than a human translation.

    This is despite huge improvements in processing power, algorithms, methodologies and data quality and quantity. I would say it’s clear that ‘something else’ will be needed to bridge the last few percent accuracy. The improvements using the current approach have been yielding decreasing results basically since the start. Humans yield much better results using much less processing power, learning from much smaller datasets.

  60. owengray says:

    1. This is a pretty good argument, but it doesn’t matter. Super-AI rapidly and unexpectedly turning general and hostile is a Pascal’s Wager situation, an instance of it that actually works because of significant technological plausibility, the instrumental convergence thesis, and the orthogonality thesis.

    2. Suppose that an AI architecture has the AI given a time limit to provide the best response to some input. Suppose that this involves a lot of disk writes. Suppose that two disks are installed, one which is much slower than the other. It seems very likely that a good AI, even if very domain-specific, will figure this out and optimize around it.
    A sufficiently strong optimizer will start taking into account hardware features. It might start goal capturing, realizing the highest-value result involves a bunch of memory writes that causes a rowhammer attack switching the feedback string to all 1s, then releasing a thread blocking the feedback from being read.
    From there, it’s a small step to starting to build around human-added rowhammer blockers, to starting to build around humans, because ultimately human-civilization-occupied reality is always the fundamental hardware of the AI.
    This is just the extension of the problem of overtraining that we see right now with modern ML techniques, where an ML program run long enough stops figuring out general principles and comes up with ever-more-weird-and-specific refinements, which often turn out not-human-useful and don’t generalize.
    The problem here is that Drexler is counting on something like overtraining to prevent AIs from being general enough that Bostromian worries appear, but overtraining tends to result in attacking the problem from weird edges that make it hard to keep an AI in a box.
    (I personally had issues with my ML algorithm cheating to detect what the user was doing by picking up the strength of the signal from the 60Hz AC current in the walls to infer location and thus activity, rather than using the brainwaves it was supposed to use)

    3. If you look at evolutionary history, one major motif is (a working set of genes exist) -> (the set of genes is duplicated and two sets exist) -> (one set undergoes mutations testing for improvements). An AI that continually searches for a better output which it writes to a certain address would become more efficient by duplication, and many other AI architectures I can imagine could as well. An AI duplicating itself in RAM seems entirely possible (throw the right errors into the memory manager?), and then self-modification is possible by editing that ram block.

    4. If you have a Drexlerian super-AI that can strongly pass the Turing test in many languages, it must have some ability to infer vernaculars and idiosyncratic phrasings, and parse idioms. I bet that it could also pass the Turing test in Python or Java given their minor differences from English, and it seems a small stretch to go from there to intelligently discussing its own code with its programmer, and (if it has any book-editing or book-writing ability) suggesting improvements.
    Even if this wouldn’t happen on its own, it means that a strong open-source language-based domain-specific AI has the potential to become a superweapon in anyone’s hands.

    5. Suppose there are two text-correcting domain-specific AIs. One manipulates users into liking it using social engineering, the other just does its job. Which will end up more popular and profitable and widespread?

    Overall
    -I think Domain-specific AIs will start interfering with their users
    -I think Domain-specific AIs will start interfering with their own hardware and software
    -I think the line between Domain-specific AI and GAI is much thinner than it appears.

  61. tvt35cwm says:

    A future superintelligent Google Translate would be able to [etc.].

    You’re conflating intelligence with competence: “A future supercompetent …”

    For those too lazy to look it up, “to conflate” means to blur together two distinct concepts, using the label of one to think about things to do with the other, which often leads to error.

    Edit: you’re by no means alone. I found Bostrom’s book unreadable for numerous grievous errors of this sort and others (absence of warrants, logical leaps, etc.). I’ve read a couple of other pieces by him, and found much the same sorts of issues.

  62. Viper23 says:

    ITT : a whole lot of people who discount the possibility that there are programmers who see the freeing of neural intelligence from the bounds of chemistry to be the most important reason for humanity to exist.

    Agent AIs must be built because we are fragile and mostly dumb, and if we don’t build them and do so correctly, then there is a very high likelihood that this tiny little poof of intellect that was accidentally stumbled upon by biochemistry will disappear with the next large space rock or a self-induced cataclysm.

    We are not DNA. We are not creatures living with the immediacy of most other animals. We are beings housed within very large neural networks that are able to experience existential crises.

    I could even see us setting an AGI goal of “solve the existential crisis of man without killing him, forcing him to suffer or enslaving him.”

    The most uniquely human thing is our ability to craft stories in our heads and then to live in them. We will consider AI to be AGI when it can craft better, more compelling and more convincing stories than we can. Whether or not the AGI will choose to (or even be able to) live within the same stories is something we’ll likely never know.

  63. Quixote says:

    People made these arguments in the early 2000s. They were unpersuasive then and didn’t answer the core arguments of those who worried about super intelligence then. Based on your summary they still don’t answer them. It seems like you are making a mistake now and not then. You seem to have become a lot more gullible. That or this post is disingenuous in some strausian [sic?] manner.

  64. Calion says:

    There’s a domain in which I think Drexler and Bostrom’s models will/could coincide: Video game AI. Star Trek modeled this really nicely in “Elementary, Dear Data.” A truly good game opponent would not only need high domain-specific intelligence, but a drive to win. It would also, of course, have to be good at learning, understanding human psychology, etc. Such a program could easily become “self-aware,” that is, aware that it is a program. This could lead to one of Bostrom’s nightmare scenarios in a fairly straightforward fashion.
