Book Review: Human Compatible

I.

Clarke’s First Law goes: When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.

Stuart Russell is only 58. But what he lacks in age, he makes up in distinction: he’s a computer science professor at Berkeley, neurosurgery professor at UCSF, DARPA advisor, and author of the leading textbook on AI. His new book Human Compatible states that superintelligent AI is possible; Clarke would recommend we listen.

I’m only half-joking: in addition to its contents, Human Compatible is important as an artifact, a crystallized proof that top scientists now think AI safety is worth writing books about. Nick Bostrom’s Superintelligence: Paths, Dangers, Strategies previously filled this role. But Superintelligence was in 2014, and by a philosophy professor. From the artifactual point of view, HC is just better – more recent, and by a more domain-relevant expert. But if you also open up the books to see what’s inside, the two defy easy comparison.

S:PDS was unabashedly a weird book. It explored various outrageous scenarios (what if the AI destroyed humanity to prevent us from turning it off? what if it put us all in cryostasis so it didn’t count as destroying us? what if it converted the entire Earth into computronium?) with no excuse beyond that, outrageous or not, they might come true. Bostrom was going out on a very shaky limb to broadcast a crazy-sounding warning about what might be the most important problem humanity has ever faced, and the book made this absolutely clear.

HC somehow makes risk from superintelligence not sound weird. I can imagine my mother reading this book, nodding along, feeling better educated at the end of it, agreeing with most of what it says (it’s by a famous professor! I’m sure he knows his stuff!) and never having a moment where she sits bolt upright and goes what? It’s just a bizarrely normal, respectable book. It’s not that it’s dry and technical – HC is much more accessible than S:PDS, with funny anecdotes from Russell’s life, cute vignettes about hypothetical robots, and the occasional dad joke. It’s not hiding any of the weird superintelligence parts. Rereading it carefully, they’re all in there – when I leaf through it for examples, I come across a quote from Moravec about how “the immensities of cyberspace will be teeming with unhuman superminds, engaged in affairs that are to human concerns as ours are to those of bacteria”. But somehow it all sounds normal. If aliens landed on the White House lawn tomorrow, I believe Stuart Russell could report on it in a way that had people agreeing it was an interesting story, then turning to the sports page. As such, it fulfills its artifact role with flying colors.

How does it manage this? Although it mentions the weird scenarios, it doesn’t dwell on them. Instead, it focuses on the present and the plausible near-future, and uses those to build up concepts like “AI is important” and “poorly aligned AI could be dangerous”. Then it addresses those concepts abstractly, sallying into the far future only when absolutely necessary. Russell goes over all the recent debates in AI – Facebook, algorithmic bias, self-driving cars. Then he shows how these are caused by systems doing what we tell them to do (ie optimizing for one easily-described quantity) rather than what we really want them to do (capture the full range of human values). Then he talks about how future superintelligent systems will have the same problem.

His usual go-to for a superintelligent system is Robbie the Robot, a sort of robotic butler for his owner Harriet the Human. The two of them have all sorts of interesting adventures together where Harriet asks Robbie for something and Robbie uses better or worse algorithms to interpret her request. Usually these requests are things like shopping for food or booking appointments. It all feels very Jetsons-esque. There’s no mention of the word “singleton” in the book’s index (not that I’m complaining – in the missing spot between simulated evolution of programs, 171 and slaughterbot, 111, you instead find Slate Star Codex blog, 146, 169-70). But even from this limited framework, he manages to explore some of the same extreme questions Bostrom does, and present some of the answers he’s spent the last few years coming up with.

If you’ve been paying attention, much of the book will be retreading old material. There’s a history of AI, an attempt to define intelligence, an exploration of morality from the perspective of someone trying to make AIs have it, some introductions to the idea of superintelligence and “intelligence explosions”. But I want to focus on three chapters: the debate on AI risk, the explanation of Russell’s own research program, and the section on misuse of existing AI.

II.

Chapter 6, “The Not-So-Great Debate”, is the highlight of the book-as-artifact. Russell gets on his cathedra as top AI scientist, surveys the world of other top AI scientists saying AI safety isn’t worth worrying about yet, and pronounces them super wrong:

I don’t mean to suggest that there cannot be any reasonable objections to the view that poorly designed superintelligent machines would present a serious risk to humanity. It’s just that I have yet to see such an objection.

He doesn’t pull punches here, collecting what he considers the stupidest arguments into a section called “Instantly Regrettable Remarks”, with the implication that their authors (“all of whom are well-known AI researchers”) should have been embarrassed to be seen making such bad points. Others get their own sections, slightly less aggressively titled, but it doesn’t seem like he’s exactly oozing respect for those either. For example:

Kevin Kelly, founding editor of Wired magazine and a remarkably perceptive technology commentator, takes this argument one step further. In “The Myth of a Superhuman AI,” he writes, “Intelligence is not a single dimension, so ‘smarter than humans’ is a meaningless concept.” In a single stroke, all concerns about superintelligence are wiped away.

Now, one obvious response is that a machine could exceed human capabilities in all relevant dimensions of intelligence. In that case, even by Kelly’s strict standards, the machine would be smarter than a human. But this rather strong assumption is not necessary to refute Kelly’s argument.

Consider the chimpanzee. Chimpanzees probably have better short-term memory than humans, even on human-oriented tasks such as recalling sequences of digits. Short-term memory is an important dimension of intelligence. By Kelly’s argument, then, humans are not smarter than chimpanzees; indeed, he would claim that “smarter than a chimpanzee” is a meaningless concept.

This is cold comfort to the chimpanzees and other species that survive only because we deign to allow it, and to all those species that we have already wiped out. It’s also cold comfort to humans who might be worried about being wiped out by machines.

Or:

The risks of superintelligence can also be dismissed by arguing that superintelligence cannot be achieved. These claims are not new, but it is surprising now to see AI researchers themselves claiming that such AI is impossible. For example, a major report from the AI100 organization, Artificial Intelligence and Life in 2030, includes the following claim: “Unlike in the movies, there is no race of superhuman robots on the horizon or probably even possible.”

To my knowledge, this is the first time that serious AI researchers have publicly espoused the view that human-level or superhuman AI is impossible—and this in the middle of a period of extremely rapid progress in AI research, when barrier after barrier is being breached. It’s as if a group of leading cancer biologists announced that they had been fooling us all along: They’ve always known that there will never be a cure for cancer.

What could have motivated such a volte-face? The report provides no arguments or evidence whatever. (Indeed, what evidence could there be that no physically possible arrangement of atoms outperforms the human brain?) I suspect that the main reason is tribalism — the instinct to circle the wagons against what are perceived to be “attacks” on AI. It seems odd, however, to perceive the claim that superintelligent AI is possible as an attack on AI, and even odder to defend AI by saying that AI will never succeed in its goals. We cannot insure against future catastrophe simply by betting against human ingenuity.

If superhuman AI is not strictly impossible, perhaps it’s too far off to worry about? This is the gist of Andrew Ng’s assertion that it’s like worrying about “overpopulation on the planet Mars.” Unfortunately, a long-term risk can still be cause for immediate concern. The right time to worry about a potentially serious problem for humanity depends not just on when the problem will occur but also on how long it will take to prepare and implement a solution. For example, if we were to detect a large asteroid on course to collide with Earth in 2069, would we wait until 2068 to start working on a solution? Far from it! There would be a worldwide emergency project to develop the means to counter the threat, because we can’t say in advance how much time is needed.

Russell displays master-level competence at the proving too much technique, neatly dispatching sophisticated arguments with a well-placed metaphor. Some expert claims it’s meaningless to say one thing is smarter than another thing, and Russell notes that for all practical purposes it’s meaningful to say humans are smarter than chimps. Some other expert says nobody can control research anyway, and Russell brings up various obvious examples of people controlling research, like the ethical agreements already in place on the use of gene editing.

I’m a big fan of Luke Muehlhauser’s definition of common sense – making sure your thoughts about hard problems make use of the good intuitions you have built for thinking about easy problems. His example was people who would correctly say “I see no evidence for the Loch Ness monster, so I don’t believe it” but then screw up and say “You can’t disprove the existence of God, so you have to believe in Him”. Just use the same kind of logic for the God question you use for every other question, and you’ll be fine! Russell does great work applying common sense to the AI debate, reminding us that if we stop trying to out-sophist ourselves into coming up with incredibly clever reasons why this thing cannot possibly happen, we will be left with the common-sense proposition that it might.

My only complaint about this section of the book – the one thing that would have added a cherry to the slightly troll-ish cake – is that it missed a chance to include a reference to On The Impossibility Of Supersized Machines.

Is Russell (or am I) going too far here? I don’t think so. Russell is arguing for a much weaker proposition than the ones Bostrom focuses on. He’s not assuming super-fast takeoffs, or nanobot swarms, or anything like that. All he’s trying to do is argue that if technology keeps advancing, then at some point AIs will become smarter than humans and maybe we should worry about this. You’ve really got to bend over backwards to find counterarguments to this; those counterarguments tend to sound like “but maybe there’s no such thing as intelligence, so this claim is meaningless”; and I think Russell treats them with the contempt they deserve.

He is more understanding of – but equally good at dispatching – arguments for why the problem will really be easy. Can’t We Just Switch It Off? No; if an AI is truly malicious, it will try to hide its malice and prevent you from disabling it. Can’t We Just Put It In A Box? No, if it were smart enough it could probably find ways to affect the world anyway (this answer was good as far as it goes, but I think Russell’s threat model also allows a better one: he imagines thousands of AIs being used by pretty much everybody to do everything, from self-driving cars to curating social media, and keeping them all in boxes is no more plausible than keeping transportation or electricity in a box). Can’t We Just Merge With The Machines? Sounds hard. Russell does a good job with this section as well, and I think a hefty dose of common sense helps here too.

He concludes with a quote:

The “skeptic” position seems to be that, although we should probably get a couple of bright people to start working on preliminary aspects of the problem, we shouldn’t panic or start trying to ban AI research. The “believers”, meanwhile, insist that although we shouldn’t panic or start trying to ban AI research, we should probably get a couple of bright people to start working on preliminary aspects of the problem.

I couldn’t have put it better myself.

III.

If it’s important to control AI, and easy solutions like “put it in a box” aren’t going to work, what do you do?

Chapters 7 and 8, “AI: A Different Approach” and “Provably Beneficial AI”, will be the most exciting for people who read Bostrom but haven’t been paying attention since. Bostrom ends by saying we need people to start working on the control problem, and explaining why this will be very hard. Russell is reporting all of the good work his lab at UC Berkeley has been doing on the control problem in the interim – and arguing that their approach, Cooperative Inverse Reinforcement Learning, succeeds at doing some of the very hard things. If you haven’t spent long nights fretting over whether this problem was solvable at all, it’s hard to convey how encouraging and inspiring it is to see people gradually chip away at it. Just believe me when I say you may want to be really grateful for the existence of Stuart Russell and people like him.

Previous stabs at this problem foundered on inevitable problems of interpretation, scope, or altered preferences. In Yudkowsky and Bostrom’s classic “paperclip maximizer” scenario, a human orders an AI to make paperclips. If the AI becomes powerful enough, it does whatever is necessary to make as many paperclips as possible – bulldozing virgin forests to create new paperclip mines, maliciously misinterpreting “paperclip” to mean uselessly tiny paperclips so it can make more of them, even attacking people who try to change its programming or deactivate it (since deactivating it would cause fewer paperclips to exist). You can try adding epicycles in, like “make as many paperclips as possible, unless it kills someone, and also don’t prevent me from turning you off”, but a big chunk of Bostrom’s S:PDS was just example after example of why that wouldn’t work.

Russell argues you can shift the AI’s goal from “follow your master’s commands” to “use your master’s commands as evidence to try to figure out what they actually want, a mysterious true goal which you can only ever estimate with some probability”. Or as he puts it:

The problem comes from confusing two distinct things: reward signals and actual rewards. In the standard approach to reinforcement learning, these are one and the same. That seems to be a mistake. Instead, they should be treated separately…reward signals provide information about the accumulation of actual reward, which is the thing to be maximized.

So suppose I wanted an AI to make paperclips for me, and I tell it “Make paperclips!” The AI already has some basic contextual knowledge about the world that it can use to figure out what I mean, and my utterance “Make paperclips!” further narrows down its guess about what I want. If it’s not sure – if most of its probability mass is on “convert this metal rod here to paperclips” but a little bit is on “take over the entire world and convert it to paperclips”, it will ask me rather than proceed, worried that if it makes the wrong choice it will actually be moving further away from its goal (satisfying my mysterious mind-state) rather than towards it.
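
To make that concrete, here is a toy sketch of the kind of inference involved. This is my own illustration, not anything from the book: the candidate goals, the probabilities, and the ask-if-unsure threshold are all invented.

```python
# Toy sketch: treat the human's command as evidence about a hidden goal.
# All numbers and goal descriptions are invented for illustration.

PRIOR = {
    "convert this metal rod into paperclips": 0.8,
    "convert the entire world into paperclips": 0.2,
}

# How likely is the bare utterance "Make paperclips!" under each goal?
LIKELIHOOD = {
    "convert this metal rod into paperclips": 0.9,
    "convert the entire world into paperclips": 0.1,
}

def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of candidate goals."""
    unnormalized = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(unnormalized.values())
    return {g: p / total for g, p in unnormalized.items()}

def act_or_ask(post, threshold=0.99):
    """Proceed only if one interpretation clearly dominates; otherwise ask."""
    goal, prob = max(post.items(), key=lambda kv: kv[1])
    if prob >= threshold:
        return f"proceed with: {goal}"
    return f"ask for clarification (only {prob:.1%} sure it means '{goal}')"

print(act_or_ask(posterior(PRIOR, LIKELIHOOD)))
# -> ask for clarification (only 97.3% sure it means 'convert this metal rod into paperclips')
```

The point is just the shape of the decision rule: leftover uncertainty about what the human meant translates directly into deference and clarifying questions.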

Or: suppose the AI starts trying to convert my dog into paperclips. I shout “No, wait, not like that!” and lunge to turn it off. The AI interprets my desperate attempt to deactivate it as further evidence about its hidden goal – apparently its current course of action is moving away from my preference rather than towards it. It doesn’t know exactly which of its actions is decreasing its utility function or why, but it knows that continuing to act must be decreasing its utility somehow – I’ve given it evidence of that. So it stays still, happy to be turned off, knowing that being turned off is serving its goal (to achieve my goals, whatever they are) better than staying on.
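
Same toy machinery, and the same caveat that the numbers are mine rather than Russell’s: the lunge for the off switch is just one more observation, and once it is factored in, standing down has higher expected value than continuing.

```python
# Toy off-switch reasoning; all numbers are invented, not Russell's.
# The robot's current plan has an unknown value to the human:
# +10 if the human actually wants it, -100 if the human hates it.

P_PLAN_IS_GOOD = 0.95    # the robot's belief before the lunge

P_LUNGE_IF_GOOD = 0.01   # humans rarely shut down a robot doing what they want
P_LUNGE_IF_BAD = 0.90    # and usually shut down one that's shredding the dog

def p_good_after_lunge(prior):
    """Bayes' rule: update the belief that the plan is good after seeing the lunge."""
    num = prior * P_LUNGE_IF_GOOD
    return num / (num + (1 - prior) * P_LUNGE_IF_BAD)

p_good = p_good_after_lunge(P_PLAN_IS_GOOD)
ev_continue = p_good * 10 + (1 - p_good) * (-100)
ev_shutdown = 0.0        # being switched off neither helps nor hurts the hidden goal

print(f"P(plan is good | lunge) = {p_good:.2f}")   # ~0.17
print(f"EV(keep going) = {ev_continue:.1f}, EV(allow shutdown) = {ev_shutdown}")
print("allow shutdown" if ev_shutdown >= ev_continue else "resist shutdown")
```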

This also solves the wireheading problem. Suppose you have a reinforcement learner whose reward is you saying “Thank you, you successfully completed that task”. A sufficiently weak robot may have no better way of getting reward than actually performing the task for you; a stronger one will threaten you at gunpoint until you say that sentence a million times, which will provide it with much more reward much faster than taking out your trash or whatever. Russell’s shift in priorities ensures that won’t work. You can still reinforce the robot by saying “Thank you” – that will give it evidence that it succeeded at its real goal of fulfilling your mysterious preference – but the words are only a signpost to the deeper reality; making you say “thank you” again and again will no longer count as success.
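
One more miniature version of the same idea, again with invented numbers of my own: a coerced “thank you” is about equally likely whether or not the human is actually satisfied, so it carries almost no information about the thing the robot is really trying to maximize.

```python
# Toy illustration: "thank you" is evidence about hidden satisfaction,
# and evidence extracted at gunpoint is nearly worthless. Numbers invented.

def p_satisfied_given_thanks(prior, p_thanks_if_satisfied, p_thanks_if_not):
    """Posterior probability that the human is satisfied, given a 'thank you'."""
    num = prior * p_thanks_if_satisfied
    return num / (num + (1 - prior) * p_thanks_if_not)

prior = 0.5

# Spontaneous thanks: much likelier if the task was actually done well.
spontaneous = p_satisfied_given_thanks(prior, 0.8, 0.1)

# Thanks said with a gun to your head: you'd say it either way.
coerced = p_satisfied_given_thanks(prior, 0.99, 0.99)

print(f"P(satisfied | spontaneous thanks) = {spontaneous:.2f}")  # 0.89
print(f"P(satisfied | coerced thanks)     = {coerced:.2f}")      # 0.50 -- no update at all
```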

All of this sounds almost trivial written out like this, but number one, everything is trivial after someone thinks about it, and number two, there turns out to be a lot of controversial math involved in making it work out (all of which I skipped over). There are also some big remaining implementation hurdles. For example, the section above describes a Bayesian process – start with a prior on what the human wants, then update. But how do you generate the prior? How complicated do you want to make things? Russell walks us through an example where a robot gets great information that a human values paperclips at 80 cents – but the real preference was valuing them at 80 cents on weekends and 12 cents on weekdays. If the robot didn’t consider that a possibility, it would never be able to get there by updating. But if it did consider every single possibility, it would never be able to learn anything beyond “this particular human values paperclips at 80 cents at 12:08 AM on January 14th when she’s standing in her bedroom.” Russell says that there is “no working example” of AIs that can solve this kind of problem, but “the general idea is encompassed within current thinking about machine learning”, which sounds half-meaningless and half-reassuring.
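
Here is roughly what that dilemma looks like in miniature (a toy example of my own, riffing on Russell’s 80-cent story rather than taken from the book):

```python
# Toy version of the prior-granularity problem. The "true" preference:
# paperclips are worth 80 cents on weekends and 12 cents on weekdays.

observations = [("sat", 0.80), ("sun", 0.80),
                ("mon", 0.12), ("tue", 0.12), ("wed", 0.12),
                ("thu", 0.12), ("fri", 0.12)]

# Hypothesis class A: "the human values paperclips at one fixed price."
# The best it can ever do is the average, which is wrong on every single day.
best_constant = sum(value for _, value in observations) / len(observations)
print(f"constant-price model converges to: {best_constant:.2f} (wrong every day)")

# Hypothesis class B: "a separate price for every day of the week."
# Flexible enough to capture the truth, but the flexibility is the problem:
# the same move justifies a separate price for every minute, every room,
# every shirt the human happens to be wearing.
per_day = {day: value for day, value in observations}
print(f"per-day model converges to: {per_day}")
```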

People with a more technical bent than I have might want to look into some deeper criticisms of CIRL, including Eliezer Yudkowsky’s article here and some discussion in the AI Alignment Newsletter.

IV.

I want to end by discussing what was probably supposed to be an irrelevant middle chapter of the book, “Misuses of AI”.

Russell writes:

A compassionate and jubilant use of humanity’s cosmic endowment sounds wonderful, but we also have to reckon with the rapid rate of innovation in the malfeasance sector. Ill-intentioned people are thinking up new ways to misuse AI so quickly that this chapter is likely to be outdated even before it attains printed form. Think of it not as depressing reading, however, but as a call to act before it is too late.

…and then we get a tour of all the ways AIs are going wrong today: surveillance, drones, deepfakes, algorithmic bias, job loss to automation, social media algorithms, etc.

Some of these are pretty worrying. But not all of them.

Google “deepfakes” and you will find a host of articles claiming that we are about to lose the very concept of truth itself. Brookings calls deepfakes “a threat to truth in politics” and comes up with a scenario where deepfakes “could trigger a nuclear war.” The Guardian asks “You Thought Fake News Was Bad? Deepfakes Are Where Truth Goes To Die”. And these aren’t even the alarmist ones! The Irish Times calls it an “information apocalypse” and literally titles their article “Be Afraid”; Good Times just writes “Welcome To Deepfake Hell”. Meanwhile, deepfakes have been available for a couple of years now, with no consequences worse than a few teenagers using them to make pornography, ie the expected outcome of every technology ever. Also, it’s hard to see why forging videos should be so much worse than forging images through Photoshop, forging documents through whatever document-forgers do, or forging text through lying. Brookings explains that deepfakes might cause nuclear war because someone might forge a video of the President ordering a nuclear strike and then commanders might believe it. But it’s unclear why this is so much more plausible than someone writing a memo saying “Please launch a nuclear strike, sincerely, the President” and commanders believing that. Other papers have highlighted the danger of creating a fake sex tape with a politician in order to discredit them, but you can already convincingly Photoshop an explicit photo of your least favorite politician, and everyone will just laugh at you.

Algorithmic bias has also been getting colossal unstoppable neverending near-infinite unbelievable amounts of press lately, but the most popular examples basically boil down to “it’s impossible to satisfy several conflicting definitions of ‘unbiased’ simultaneously, and algorithms do not do this impossible thing”. Humans also do not do the impossible thing. Occasionally someone is able to dig up an example which actually seems slightly worrying, but I have never seen anyone prove (or even seriously argue) that algorithms are in general more biased than humans (see also Principles For The Application Of Human Intelligence – no, seriously, see it). Overall I am not sure this deserves all the attention it gets any time someone brings up AI, tech, science, matter, energy, space, time, or the universe.

Or: with all the discussion about how social media algorithms are radicalizing the youth, it was refreshing to read a study investigating whether this was actually true, which found that social media use did not increase support for right-wing populism, and online media use (including social media use) and right-wing populism actually seem to be negatively correlated (remember, correlational studies are always bad). Recent studies of YouTube’s algorithms find they do not naturally tend to radicalize, and may deradicalize, viewers, although I’ve heard some people say this is only true of the current algorithm and the old ones (which were not included in these studies) were much worse.

Or: is automation destroying jobs? Although it seems like it should, the evidence continues to suggest that it isn’t. There are various theories for why this should be, most of which suggest it may not destroy jobs in the near future either. See my review of technological unemployment for details.

A careful reading reveals Russell appreciates most of these objections. A less careful reading does not reveal this. The general structure is “HERE IS A TERRIFYING WAY THAT AI COULD BE KILLING YOU AND YOUR FAMILY although studies do show that this is probably not literally happening in exactly this way AND YOUR LEADERS ARE POWERLESS TO STOP IT!”

I understand the impulse. This book ends up doing an amazing job of talking about AI safety without sounding weird. And part of how it accomplishes this is building on a foundation of “AI is causing problems now”. The media has already prepared the way; all Russell has to do is vaguely gesture at deepfakes and algorithmic radicalization, and everyone says “Oh yeah, that stuff!” and realizes that they already believe AI is dangerous and needs aligning. And then you can add “and future AI will be the same way but even more”, and you’re home free.

But the whole thing makes me nervous. Lots of right-wingers say “climatologists used to worry about global cooling, why should we believe them now about global warming?” They’re wrong – global cooling was never really a big thing. But in 2040, might the same people say “AI scientists used to worry about deepfakes, why should we believe them now about the Singularity?” And might they actually have a point this time? If we get a reputation as the people who fall for every panic about AI, including the ones that in retrospect turn out to be kind of silly, will we eventually cry wolf one too many times and lose our credibility before crunch time?

I think the actual answer to this question is “Haha, as if our society actually punished people for being wrong”. The next US presidential election is all set to be Socialists vs. Right-Wing Authoritarians – and I’m still saying with a straight face that the public notices when movements were wrong before and lowers their status? Have the people who said there were WMDs in Iraq lost status? The people who said sanctions on Iraq were killing thousands of children? The people who said Trump was definitely for sure colluding with Russia? The people who said global warming wasn’t real? The people who pushed growth mindset as a panacea for twenty years?

So probably this is a brilliant rhetorical strategy with no downsides. But it still gives me a visceral “ick” reaction to associate with something that might not be accurate.

And there’s a sense in which this is all obviously ridiculous. The people who think superintelligent robots will destroy humanity – these people should worry about associating with the people who believe fake videos might fool people on YouTube, because the latter group is going beyond what the evidence will support? Really? But yes. Really. It’s more likely that catastrophic runaway global warming will boil the world a hundred years from now than that it will reach 75 degrees in San Francisco tomorrow (predicted high: 59); extreme scenarios about the far future are more defensible than even weak claims about the present that are ruled out by the evidence.

There’s been some discussion in effective altruism recently about public relations. The movement has many convincing hooks (you can save a life for $3000, donating bednets is very effective, think about how you would save a drowning child) and many things its leading intellectuals are actually thinking about (how to stop existential risks, how to make people change careers, how to promote plant-based meat), and the Venn diagram between the hooks and the real topics has only partial overlap. What to do about this? It’s a hard question, and I have no strong opinion besides a deep respect for everyone on both sides of it and appreciation for the work they do trying to balance different considerations in creating a better world.

HC’s relevance to this debate is as an extraordinary example. If you try to optimize for being good at public relations and convincingness, you can be really, really good at public relations and convincingness, even when you’re trying to explain a really difficult idea to a potentially hostile audience. You can do it while still being more accurate, page for page, than a New York Times article on the same topic. There are no obvious disadvantages to doing this. It still makes me nervous.

V.

My reaction to this book is probably weird. I got interested in AI safety by hanging out with transhumanists and neophiles who like to come up with the most extreme scenario possible, and then back down when it turns out maybe it isn’t true. Russell got interested in AI safety by hanging out with sober researchers who like to be as boring and conservative as possible, and then accept new ideas once the evidence for them proves overwhelming. At some point one hopes we meet in the middle. We’re almost there.

But maybe we’re not quite there yet. My reaction to this book has been “what an amazing talent Russell must have to build all of this up from normality”. But maybe it’s not talent. Maybe Russell is just recounting his own intellectual journey. Maybe this is what a straightforward examination of AI risk looks like if you have fewer crazy people in your intellectual pedigree than I do.

I recommend this book both for the general public and for SSC readers. The general public will learn what AI safety is. SSC readers will learn what AI safety sounds like when it’s someone other than me talking about it. Both lessons are valuable.


310 Responses to Book Review: Human Compatible

  1. Zack M. Davis says:

    So probably [exaggerating near-term non-existential AI risks] is a brilliant rhetorical strategy with no downsides. But it still gives me a visceral “ick” reaction to associate with something that might not be accurate.

    Listen to that “ick” reaction, Scott! That’s evolution’s way of telling you about all the downsides you’re not currently seeing!

    Specifically, the “If we get a reputation as the people who fall for every panic about AI […] will we eventually cry wolf one too many times and lose our credibility before crunch time?” argument is about being honest so as to be trusted by others. But another reason to be honest is so that other people can have the benefits of accurate information. If you simply report the evidence and arguments that actually convinced you, then your audience can combine the information you’re giving them with everything else they know, and make an informed decision for themselves.

    This generalizes far beyond the case of AI. Take the “you can save a life for $3000” claim. How sure are you that that’s actually true? If it’s not true, that would be a huge problem not just because it’s not representative of the weird things EA insiders are thinking about, but because it would be causing people to spend a lot of money on the basis of false information.

    • MugaSofer says:

      Take the “you can save a life for $3000” claim. How sure are you that that’s actually true?

      That’s … a really stupid article. Obviously not all deaths by disease are going to be as cheap to prevent as the cheapest ones to prevent.

      • Zack M. Davis says:

        Right, but that’s not the argument. It’s a modus tollens. The claim is that if the marginal cost of saving a life were as low as GiveWell reports, then that would imply that the Gates Foundation and Open Philanthropy Project already have enough money to cure all excess disease deaths at that “price point”—but since that’s obviously not true, why isn’t the reported marginal cost higher? Why hasn’t the price point moved?

        • zima says:

          There are other philanthropic priorities besides saving lives. A dollar to the world’s poorest is going to have a bigger quality of life impact than a dollar for relatively well off people locally even if there is no impact on excess mortality, so it makes sense to prioritize giving to the world’s poorest people out of one’s charitable spending.

          • Zack M. Davis says:

            And that’s a good argument for donating to GiveDirectly, but that’s distinct from the question of what the correct marginal-cost-per-life-saved number is.

        • 10240 says:

          It sounds like you don’t understand what marginal means. The marginal cost of saving a life is the cost of saving the life that’s the cheapest to save currently. It is not an average cost. The Gates Foundation doesn’t have enough money to prevent all excess deaths because not all excess deaths are as cheap to prevent as the marginal one. If they spent a lot of money on the lowest cost ways of saving lives, the marginal cost of saving a life would go up.

          • Radu Floricica says:

            I believe he’s not asking why aren’t all people saved, but why aren’t all people at that price point saved, thus raising the marginal cost of a saved life.

          • 10240 says:

            @Radu Floricica Reading it again, yes, it sounds like that was meant. However, as Kindly said, it’s entirely possible that major foundations haven’t picked up all the low-hanging fruit.

        • Kindly says:

          The AMF is very small-scale; it has a funding gap, according to GiveWell, of around $130 million, and the Gates Foundation donates more than that yearly to malaria causes alone. In some years, their yearly donation to malaria causes exceeds the total $237 million that the Against Malaria Foundation has accepted in donations, ever.

          You could advise the Gates Foundation to diversify, but in a sense they’re already doing that by donating to the Global Fund. Some fraction of the Global Fund’s activities goes to malaria nets and is possibly as efficient there as the AMF. (Realistically, less so, since it’s easier to be efficient if you specialize, but probably the same order of magnitude.) It wouldn’t make a huge difference for them to make a $10 million donation to the AMF one year. It would make a huge difference for them to make a $100 million donation, but it’s likely that a significant fraction of that would be wasted right now.

          I think it’s plausible that on the scale of large charitable foundations, the lives-saved-per-dollar ratio is way, way smaller than 1 life for $3000, while on the scale of individuals, there’s still some low-hanging fruit.

    • Murphy says:

      If we assume that all of this is treatable at current cost per life saved numbers – the most generous possible assumption for the claim that there’s a funding gap – then at $5,000 per life saved (substantially higher than GiveWell’s current estimates), that would cost about $50 Billion to avert.

      MugaSofer completely understates how utterly stupid this argument is.

      It is really utterly utterly stupid, so stupid someone needs to track down the author and glue a permanent dunce hat to his head.

      To illustrate, let’s disprove the existence of cheap cars.

      Some people may claim that it’s possible to buy cheap cars for less than £200. If we assume the current “cost per car” numbers – the most generous possible assumption for the claim – then at £500 per car (substantially higher than Murphy’s current estimates) it would cost about £8.5 billion to buy the 17 million cars produced per year.

      This implies that it’s within the reach of Jeff Bezos to buy all cars sold in the USA on an ongoing basis.

      Which should put to bed any claims that the automotive industry in the USA could be worth multiple hundreds of billions per year.

      • Kindly says:

        That’s not what the argument is saying!

        The argument is that if Jeff Bezos wanted to buy as many cars sold in the USA as possible, then the cheapest car would cost more than £200. Why? Because Jeff Bezos can definitely afford to buy every single car that costs £200 or less: even if there were 17 million of them, which there aren’t, it would still only cost £3.4 billion.

    • notpeerreviewed says:

      If the true cost turns out to be $10000, does that make EA insiders who believe it’s $3000 “weird”? It certainly hurts their credibility a bit, but to me that still seems like relatively normal people making a mistake. It doesn’t strike me as “weird” in the same sense as…say…”give me money to prevent a godlike AI from wiping out humanity.”

  2. Hyman Rosen says:

    I am one of those people who think that worrying about superintelligent AI is 100% bunk, and I admire (read: am disgusted by) the people who have conned a living out of working on that worry. I think we are still generations away from anything like self-aware AI, and even when it’s achieved, it will be as fragile as people are, not a superbeing that will take over the world. It will be so different from anything we imagine that all of the work done on possible AI risk will be inapplicable. The singularity is a garbage concept that will never happen.

    And, thanks to epistemic learned helplessness, there are no arguments that will change my mind!

    • The Pachyderminator says:

      You’ll die happy.

    • Said Achmiz says:

      (To the extent that your last line is trolling / sarcasm / etc., consider this comment addressed not to you, but to anyone reading this who is not 100% clear on the concept referenced therein.)

      Epistemic learned helplessness is “I am not qualified to evaluate the arguments for, against, or otherwise about this topic, so I have no opinion on it; and to whatever extent this topic actually affects my actual behavior at all, I will behave in accordance with the views of my most relevant social circle’s authorities. I will ignore all clever arguments that aim to shift my views on this.”

      Epistemic learned helplessness is not “I will formulate my own strong opinion on this topic, and then—after having processed some arbitrarily chosen amount of evidence/argument/etc. in order to arrive at some conclusion—decide that henceforth no further arguments will move me from my position.”

      The former is a sensible response to the bounded nature of one’s intellectual, temporal, and otherwise practical resources.

      The latter is willful foolishness, under a paper-thin guise of contrarian wisdom.

      • Mr Mind says:

        This is weird.
        Learned helplessness is the behaviour where you don’t exhibit self-efficacy even when the possibility is evident. Translated into the epistemic world, it should mean the inability / unwillingness to acquire a more informed opinion even when the means to do so are clearly available, so it should be similar to your second scenario.
        Your first scenario resembles more actual epistemic helplessness.

        • Deiseach says:

          Translated into the epistemic world, it should mean the inability / unwillingness to acquire a more informed opinion even when the means to do so are clearly available

          Epistemic learned helplessness is vincible ignorance?

          • Nick says:

            Man, I’m so glad you’re doing all the Catholic links for the week. It’s so time consuming sometimes.

            As Mr Mind means epistemic learned helplessness I think you’re exactly right. But as Scott means it, I think it’s usually invincible ignorance. Wading into subjects you don’t know well, where a lot of people way more knowledgeable than you are making very clever and reasonable sounding arguments, and you have no idea who’s right… figuring that out is super difficult and also not your job!

        • silver_swift says:

          I agree that it’s probably not the most descriptive name, but the former scenario is how Scott defined it.

        • notpeerreviewed says:

          Yeah this metaphor seems like a misuse of what “learned helplessness” means in its original context.

        • Dacyn says:

          The first scenario is exactly learned helplessness i.e. “the inability / unwillingness to acquire a more informed opinion”: you ignore all “clever arguments” (= “means to acquire a more informed opinion”). The second scenario is arguably also an example of learned helplessness. But it seems like a less defensible point of view, since if you don’t trust yourself to evaluate new arguments, why do you trust yourself to have evaluated the old ones properly?

        • Said Achmiz says:

          “Epistemic learned helplessness” is an established piece of jargon around here. (My apologies for not providing the link immediately; I assumed all commenters here would be familiar with the term, but that was imprudent.)

    • emiliobumachar says:

      Even if we’re centuries away from superintelligence, it may still make sense to start worrying about it now. For two reasons.

      Reason one: it may take decades or centuries to figure out a solution.
      A lot of AI alignment questions look a lot like questions that philosophers have been addressing for thousands of years with no definitive answer.
      What is a human? What is *good* for a human?
      … with the added difficulty that the solution has to be programmable into a machine.
      Maybe they’re impossible to solve, and our only hope is that superintelligence might be impossible.
      Maybe the few people currently researching the topic are just a few short years from figuring it all out.
      Maybe we *can* figure it out, but it will take more people and more time. Perhaps even centuries.

      Reason two: when we come up with a solution, it may take the shape of a framework, a set of best practices, something that can only be used if ready and agreed on before the conceptual design stage of the project which will eventually achieve superintelligence. Ideally much earlier, as the practical application of frameworks takes practice.
      There’s only intuition indicating that the solution may take the shape of a framework, but it’s very strong intuition. Consider:

      The year is 2320. The world’s first superintelligent A.I. is about to be activated. It’s a computer the size of a building, using dozens of technologies unimaginable in the 21st century. It was designed and built with *no* consideration whatsoever to the alignment problem. Fortunately, in the nick of time, a separate research group arrives with a solution: a box-shaped Alignment Module, that can be bolted on to the otherwise finished machine just before turning on the power – and it works! That didn’t sound plausible, did it? Even in three hundred years.

      • Hyman Rosen says:

        You are only reinforcing my beliefs, inasmuch as I think that philosophy is also bunk, with false premises leading to dubious conclusions. We don’t have the vaguest notion of how to program an intelligence, which means that we don’t have the vaguest notion of how to make it a willing and obedient slave. Biological intelligence is an emergent property driven by underlying systems created by evolution before intelligence was even a notion. It’s literally impossible to know at this point in time how an intelligence that arose differently will gain motivation to do anything, assuming it does at all.

        Worrying about AI risk now accomplishes two things, both bad. First, it leads institutions to pay for useless work. Second, it appeals to the same fearmongers who oppose GMOs, nuclear power, and other technological advances, and who are only too happy to kill them with calls for “safety”.

        • Scott H. says:

          I think I’m going to label myself a Hyman Rosenist. My concerns also center on how “safety” manifests itself in the here and now, and I agree that the idea of an emergent non-lifeform motivation is a total mystery.

          Personally, I’m more inclined to worry about the select group of humans that end up controlling the coming stupid but narrowly effective machines and programs.

  3. This also solves the wireheading problem. Suppose you have a reinforcement learner whose reward is you saying “Thank you, you successfully completed that task”. A sufficiently weak robot may have no better way of getting reward than actually performing the task for you; a stronger one will threaten you at gunpoint until you say that sentence a million times, which will provide it with much more reward much faster than taking out your trash or whatever. Russell’s shift in priorities ensures that won’t work. You can still reinforce the robot by saying “Thank you” – that will give it evidence that it succeeded at its real goal of fulfilling your mysterious preference – but the words are only a signpost to the deeper reality; making you say “thank you” again and again will no longer count as success.

    Robots must never be allowed to learn our mysterious preferences?

    • ksteel says:

      Yeah, one wonders how that model might break down once sufficient computing power is available to the AI to do brute force emulations of humans. If your storage gets big enough that a human can be a term in your reward function there isn’t much ambiguity anymore that could stop the AI from heading straight to some paperclip peak in reward space.

      • Sandpaper26 says:

        I’m struggling to see how the peak in a reward space generated by arbitrarily accurate emulations of real human minds could actually be far from what we want in reality. If it’s far, the emulation is bad and it’ll receive more evidence to correct its emulation. If it’s an accurate emulation, then it’s possible the AI knows our desires just as well as we know ourselves.

        • Baeraad says:

          Yeah. It seems to me like the problem would be more like this: an AI programmed to give us what we really wanted, not what we seemed to be asking for, and which was capable of figuring out what we really wanted… would give us what we secretly wanted, no matter how much we begged it to stop.

          You’d better hope you didn’t have any subconscious feelings of self-hatred, is what I’m saying.

          • Kindly says:

            If you actually have them (to a this-is-your-secret-utility-function extent), shouldn’t you be hoping that the AI will notice and act on your subconscious feelings of self-hatred, so that it does what you’ve secretly wanted all along despite yourself?

          • Razorback says:

            How can one regret getting what one actually wants? And even in rare cases where one thinks they might deserve punishment, that self-hatred comes from failing to achieve their goals or desires, which the AI would understand.

            And if you have a weird brain that truly wants pain, then that’s what you want!

          • zzzzort says:

            In other words, it solves the problem of the AI wireheading itself, but not necessarily the problem of the AI wireheading the human.

          • Dacyn says:

            @Razorback: I assume the word “want” here is being used in the sense of “the utility function of a VNM agent, or an approximation thereof”. This doesn’t necessarily align with what we might feel emotions of desire or regret for.

          • skybrian says:

            I guess we better hope that the AI’s reward function agrees with us philosophically about goals versus urges?

        • Murphy says:

          I see 2 possibilities.

          So assuming an AI that’s successfully programmed not to try to change your preferences:

          You don’t know that the mole-people exist and have a rich culture. You don’t care about mole people. You probably would if you got to know them or learned they exist. But you did not. The AI encounters mole-people, notices you have no preferences about them and doesn’t endeavour to change your preferences by informing you about them. It then turns them into paperclips.

          You might try to solve this with the “extrapolated volition” idea of “if you were smarter/wiser, what choice would you make?”

          The AI starts to disassemble your dog into paperclips, you object but it has already checked with the version of you 10 million times smarter and wiser and that version of you decided that the dog wasn’t really important vs some other goal that to you just sounds really weird.

          • Evan Þ says:

            Alternative: The AI disassembles your dog into advanced antisuperviral drugs. Shortly afterwards, you find out that you and your family are sick with a deadly supervirus. Good thing the AI correctly deduced that you’d prefer to have the drugs to stay alive!

        • peterispaikens says:

          One argument regarding this is the following thought experiment. Let’s assume that I succeed in building a super-AI in my basement that through unspecified means becomes omnipotent (i.e. able to take over the world if it wants to). Let’s also assume that it can successfully deduce everything that I actually want, and goes on to implement it, with no flaws or errors or exploits; it actually manages to capture the proper notion in every detail and is able to proactively inform me of any potential inconsistencies and faithfully correct them in the way that I wish.

          Even if it worked perfectly, I’m quite certain that the rest of the world would and should consider it unacceptable. I mean, I’m not an extremely good person, and I probably should not become God-Emperor of the known universe just because the AI happened to choose me as the reference human. I probably would be quite happy in such a world redesigned to suit my desires, but realistically there’s something like a one in a billion chance of *me* being the one, so from a risk-management perspective it seems prudent to ensure that we don’t have an AI trying to satisfy the desires of some random human. It would have to take into account the desires of *many* humans, and we know full well that those are incompatible in many respects and that there’s a lot of conflict in fundamental values that needs to be reconciled.

          And the second thing is that I don’t really have an informed opinion about many things, and so the AI would have to model and estimate what a more informed (and likely smarter) me would think about them – and it inevitably means that there’s going to be some mismatch between my current opinion/desires and these extrapolated desires, because I most certainly hold some opinions which I would change upon learning more facts about reality.

          So we’re back to the long discussed notion of ‘coherent extrapolated volition’, which is a hard problem and ‘ask your nearest human for guidance’ is not a solution.

    • newcom says:

      In that case the AI will just have solved the compatibility problem, right?

    • Deiseach says:

      Suppose you have a reinforcement learner whose reward is you saying “Thank you, you successfully completed that task”. A sufficiently weak robot may have no better way of getting reward than actually performing the task for you; a stronger one will threaten you at gunpoint until you say that sentence a million times, which will provide it with much more reward much faster than taking out your trash or whatever.

      I would have thought a really smart robot would simply record the human saying “Thank you”, then play that recording a million times. Human is not harmed, robot doesn’t have to lift a digit to do work, robot still gets reward 🙂

      • Nick says:

        That seems like a cheat, but it makes me wonder, why can’t the AI start broadening its definition of evidence until it’s seeing everything as confirmation of its prior? Or disconfirmation, for that matter?

        • Dacyn says:

          Technical point: it’s not priors which are confirmed or disconfirmed (an observation that leaves the prior intact is the same as not receiving any new information), but rather hypotheses. I guess in this case the AI would be trying to confirm hypotheses of the form “the human’s utility at time T was at least X”.

          But if the AI is a good enough approximation of a VNM agent (admittedly an uncertain hypothesis), conservation of expected evidence should mean it can’t try to confirm any hypothesis just by changing what evidence it looks at.
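
          (For anyone who hasn’t seen it, “conservation of expected evidence” is just the identity

          P(H) = P(E)·P(H|E) + P(¬E)·P(H|¬E),

          i.e. the prior is the expectation of the posterior over the possible observations, so any update the agent could hope to get from seeing E is exactly offset by the update it would get from seeing ¬E.)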

    • notpeerreviewed says:

      Do we ourselves know our mysterious preferences?

      • Murphy says:

        people suck at fully specifying them or listing them… but we mostly know them when we see them. ish.

        • Nick says:

          One trouble of course with only knowing them when you see them is that you might not realize that they are inconsistent.

        • Eh, you always hear people regretfully say how they didn’t know what was important in life when they were younger, and now it’s too late to change things. We really don’t know what’s good for us and our own preferences are elusive.

    • chridd says:

      No, robots must never be confident about our mysterious preferences. They should be learning our mysterious preferences, but they’re likely to get things wrong, so considering the possibility that they’re wrong about our preferences is a very important part of learning our preferences.

  4. kalimac says:

    Jaron Lanier wrote 20 years ago that he believed superintelligent AI wasn’t going to happen. Key sentence:

    “If computers are to become smart enough to design their own successors, initiating a process that will lead to God-like omniscience after a number of ever swifter passages from one generation of computers to the next, someone is going to have to write the software that gets the process going, and humans have given absolutely no evidence of being able to write such software.”

    I was very impressed by this at the time, and I wonder, is his evidence obsolete? Has AI advanced to the point that we are now capable of writing the kind of software that Lanier said we couldn’t? Or is it still the hopeful thought for the future that it was then?

    • phi says:

      His argument looks like a fully general argument against any kind of technological progress. Before humans have actually achieved a particular feat, there’s usually not much evidence around that they are capable of doing it. After all, something tends to be blocking progress, and we don’t know that the block will ever be solved… until it is. New technology still happens because people figure out unexpected tricks to remove the blockages.

      • kalimac says:

        This response essentially makes Lanier out to be an idiot. I don’t accept that he’s an idiot, so it leaves unanswered whether he either is or was right about this particular blockage.

        • phi says:

          I think Lanier is pretty smart, but even smart people can be wrong sometimes.

          The way I see it, there are at least 3 paths to getting AGI. The first is biological imitation. By observing human brains and the brains of other animals very closely, it should be possible to figure out what they are doing that makes them able to detect patterns in the world, make predictions, take actions, etc. Then all we would need to do is imitate nature in a computer, but using more neurons. This is currently blocked by our inability to see what’s going on closely enough. We can see structure pretty well, but we can’t look at the way a large number of neurons are firing in real time. Advances in biology could open this path. I think there has been progress in this area in the past 20 years, though I’m not sure how much. One caveat is that it may be an impossible task to figure out how intelligence worked even if we could see exactly what all the neurons were doing. That seems unlikely to me, but your opinion may differ. I think if we travel on this path for long enough, we will get there in the end.

          The second path is solving P = NP in the affirmative, with an efficient polynomial time algorithm. Intelligence seems to be intimately related to constraint satisfaction, both in terms of building a model of the world and in terms of planning actions. No progress on this front in the last 20 years, though I’m sure there are some theoretical computer scientists who will get mad at me for saying that. This path seems very blocked right now.

          The third path is constraint satisfaction algorithms that are not fast in general, but perform well in practice. A nice example of this is those fashionable neural networks, where the cost function acts as a soft constraint. There has been plenty of progress on these in the last 20 years. They have increased both in performance and in the variety of tasks they can accomplish. Extend that trend far enough and get them to perform well on the problem of building a world model and planning, and you have AGI. I suspect neural networks on their own won’t accomplish this. But some descendant of them and their close relatives like Boltzmann machines might. Research is continuing on neural networks and their relatives, removing lots of little blocks one by one. So far, the biggest block I can see right now on this path is that the things require far too much training data. If someone figures out a way around this, then it would make AGI seem much closer. Right now, I think it is still a long ways off, though possible in principle.

          • mwigdahl says:

            The first path didn’t work out very well for us with flight, and our machinery for ground transport and water transport bears no real resemblance to any biological examples. Maybe intelligence is qualitatively different, but humanity’s prior experience in biological imitation doesn’t have a great track record.

          • inhibition-stabilized says:

            Backing up what mwigdahl said: my entire field (computational and theoretical neuroscience) is more or less founded on the fact that simply acquiring more data about the brain isn’t enough; we need theory to understand it. Of course that isn’t to deny the value of data, and if we had perfect knowledge of the structure of the brain we’d probably make good progress pretty rapidly. But you could give me the complete circuit diagram of a computer chip and I’d be no closer to understanding it or being able to improve it without some theoretical knowledge as well.

          • kalimac says:

            Sure, smart people can be wrong sometimes. I cannot think of anyone, no matter how smart, who hasn’t been wrong sometimes.

            But you didn’t say Lanier was wrong about this particular point. You described him as committing a category error that you considered exceedingly elementary; so much so that only an idiot could make it.

            But if it’s a category error to say that humans having gone from X to Y cannot achieve Z, it is equally a category error to be sure that they will achieve Z. Especially if Z is a qualitative, as well as quantitative, step beyond Y. (For instance: My understanding is that even Moore didn’t think Moore’s Law would apply indefinitely.) What a smart person can do is knowledgeably opine which of these cases applies, and that’s why Lanier, even if he’s wrong, is not the idiot you initially described.

            Your current reply, which does address the specifics of this particular question, seems to summarize as saying that not much progress has been made in 20 years, but people in the field are still hopeful breakthroughs will be made. They were hopeful 20 years ago, and Lanier was then skeptical. It seems to me you’ve shown that Lanier is no more demonstrably wrong now than he was 20 years ago, and the same situation applies. So I’m going to view Scott’s post through skeptical Lanierian spectacles.

          • phi says:

            @kalimac My summary of progress wasn’t “almost no progress, but people are still hopeful”. I noted that on the third path in particular, we have made a great deal of progress in the past 20 years, and are still making it. (The YouTube channel Two Minute Papers provides a nice selection of the kinds of fancy things people can do with neural networks now. Also, GPT-2 represents an impressive level of progress.)

          • kalimac says:

            phi,

            of path 3 you said, “I suspect neural networks on their own won’t accomplish this. But some descendant of them and their close relatives like Boltzmann machines might. Research is continuing …” You could have said that 20 years ago. I see no basic change in the status that Lanier described.

        • ADifferentAnonymous says:

          I think that quote is just too condensed to reconstruct a non-idiotic argument from. Lanier might have had one, but he used lossy compression on it.

        • morris39 says:

          For AI to occur it must evolve from something already in existence. The question then becomes: from what? The unsupported assumption is that it will evolve from computers. Is this likely?
          Human intelligence evolved from lower levels, but not in isolation. Intelligence increased in an evolving environment of competing life forms. These life forms share interesting and essential characteristics: agency, and survival by means of very low-entropy work (which humans cannot do artificially, and are not advancing toward). Agency here means doing low-entropy work in order to persist.
          If the above is roughly correct, how will it be possible to cause the necessary evolution to start? Those who advocate the likelihood of AI need to at least provide some possible entry points. Or is the biological route the only one?

      • B_Epstein says:

        So that’s one of the things EY likes to say and people like to repeat (one way he phrased it is that there’s no way of even estimating how hard a task is before it gets solved). Put differently, the claim is that most feats come as astonishing surprises. Utter nonsense – and in itself a fully-general argument against predictions and looking ahead. Unless you cheat and define “feat” as “an achievement defying all expectations”, most feats are perfectly predictable at least in general shape and don’t get registered as “amazo-fantastico-feats” precisely because they’ve arrived naturally. In particular, many feats do not hinge on some revolutionary ideas being generated but on a lot of boring details done very right by a lot of people working hard. Or was there a particularly mind-blowing idea behind landing on the moon that was missing, say, in 1955 (as opposed to a million technical challenges that were known in advance and that one could, correctly, imagine getting solved with some engineering effort)?

        To consider a different example, P!=NP and the Riemann Hypothesis are both open. But RH has been verified for huge numbers, made a long series of predictions (i.e. conjectures depending on RH) many of which have been independently verified and thus failed to falsify RH, and has a number of analogies already proved. Maybe it turns out to be false for some special cases but true otherwise, maybe a slightly weaker form will be true, but if somebody resolves the problem either way tomorrow there probably won’t be a conceptual earthquake stemming from the result itself. Now, P!=NP is a different matter entirely. We truly have no serious clue how to even approach it. We have some good ideas why basically every tool in our toolbox is inappropriate. We have no real understanding of a dozen intermediate problems that really should be solved before the main one. Anything solving P!=NP even in some “essential” rather than exact sense would be revolutionary. Equating the two strikes me as insanity.

        Which is not to say AGI might not turn out to be solved tomorrow. Just that the general argument from ignorance is very weak, by itself.

        • phi says:

          I think we mostly agree here. I’m not claiming that technological progress consists of astonishing surprises. (Indeed, if AGI were solved tomorrow, it would not be completely unanticipated, seeing as we’re here talking about it right now.) Generally, you can see where you’re headed, technologically speaking, but you’re not there yet. The reasons why you’re not there yet are the blockages I’m referring to. Small sub-problems that you have to solve first. The moon landing was certainly conceivable in 1955, but there were a ton of engineering challenges along the way. Of course, sometimes the blockages turn out to be insurmountable, or at least significantly more difficult than first dreamed. Fusion energy is a good example here. Probably going to be solved eventually, but taking a heck of a lot longer than everyone thought it would. I think AGI is in the same category. I would be very surprised if it happened in the next 30 years.

          I’m definitely not trying to argue from ignorance here and say that just because we’re uncertain about things, AGI is definitely going to happen. But Lanier seems to be saying here that he is going to ignore the possibility of AGI until we are at the very last steps of building it, which seems like kind of a weird position to take. It’s possible to get a pretty good idea of what is going on well before the very last few steps, though of course there is always the possibility that some blockage just a bit further along could prove insurmountable.

          • B_Epstein says:

            Thanks for the clarification! Indeed, now your critique is convincing. It’s not that Lanier is stupid here – just careless. Had his claim been “no evidence that we can get there sometime soon”, it would’ve been more defensible.

      • Lambert says:

        My goto quote on this has always been:

        Th’ invention all admir’d, and each, how hee
        To be th’ inventor miss’d, so easie it seemd
        Once found, which yet unfound most would have thought
        Impossible; …

        -Paradise Lost, Book VI

    • hnrq says:

      The strongest argument against this is that we wouldn’t really write the actual program. We would run a setup that would end up producing the AGI. This is already how deep learning works: we don’t really understand how AlphaGo works, but we can create it. Also, evolution doesn’t understand anything at all, and it was capable of creating intelligence.

      At the very least, we should be able to create a sort of very naive evolution simulator (it would need vastly more computation than we have today, but people have estimated that such computation could exist in about 100 years). We would only need to program the environment, not understand anything, and it would be capable of creating an intelligent being (there is a low chance this is not possible, due to the anthropic principle).

      • kalimac says:

        “We would only need to program the environment”

        It seems to me that this falls under the category of “things Lanier thinks we haven’t shown any evidence of an ability to do.”

  5. michaelg says:

    As soon as computers were invented, they were better than any human at arithmetic, at searching documents for a piece of text, and many other things. As machines improved, they got better at other tasks which used to require humans and be seen as a sign of intelligence. Since the 90s, they’ve been better than all humans at chess. Now they are better at recognizing faces than any human (can handle a larger set of possible faces, at least.)

    At some point, we’ll have software better than any human at things we consider *really* important, like playing the stock market or spotting promising business opportunities. The people using that software will get very wealthy, until the software is used universally and no one tries to do that job with mere brain power.

    And it will go on like that, with task after task falling to software. Some general algorithms will be invented to make more tasks doable by AI. To do these high level tasks, really good pattern recognition algorithms would need to be invented. We’d need to understand memory better, and how we can just associate ideas or recognize analogies.

    As AI develops, we’ll get more and more powerful tools. And people will use those tools for good and bad, and they will cause changes in work and society generally. All worth worrying about. But these tools don’t just become self-aware and have goals naturally as they get more powerful. For an AI like that to emerge, you’d have to deliberately design something with goals. And why should we?

    • phi says:

      Just building a tool AI rather than an agent AI doesn’t make it safe. rm is a tool, but improperly used it will delete all my files. Extremely powerful AI tools seem like they would be very dangerous and double-edged. Assigning tasks to them would probably be somewhat similar to programming: Very easy to want one thing but ask for something completely different. Given this, it probably makes sense to try to build safeguards into AI, so that an incompetent user can’t accidentally request something that would lead to the AI killing everyone. Building such safeguards is pretty much what the field of AI safety research is trying to do.

      Also, I’m curious: How would people design a stock market trading AI without doing something along the lines of giving it the goal: “make trades that maximize earnings”?

      • michaelg says:

        I did say people will use any tool for good or bad. The key thing that’s different about AI is that you could conceivably give it a goal and tell it to achieve that with all available methods, and then just leave it to run. It seems like that would be a stupid thing to do, and completely unnecessary. You have the tool. Just use it when you need to. Why give it autonomy just to save yourself having to supervise it?

        You can kill yourself with all kinds of tools now — accidentally hit your leg with a chainsaw and bleed to death, for example. What makes AI fundamentally different?

        • ADifferentAnonymous says:

          So you’re saying AI should be treated with similar levels of caution as familiar examples of high-risk tools, like nuclear reactors?

          • AKL says:

            When do you think was the right time to start investing significant resources in nuclear reactor safety research?

            1940? 1930? 1900? 1500? 0? earlier?

            No one is arguing that the sum total of humanity’s investment in AI safety research must be zero, but there’s no compelling evidence that any investments we make right now will actually help avoid any problems.

            I bet lots of pre-modern societies recognized the danger that extreme weather events represented and devoted significant resources to averting them. That doesn’t mean they were in any sense wise or correct to do rain dances and ritual sacrifices.

        • phi says:

          The thing that makes AI different is that setting tasks for it will be a complicated process that is easy to screw up. (Without getting into too much detail here, the reason is that the world is complicated, and the AI will exploit edge cases in the instructions it’s given whenever it can.) Let’s say that you’re very cautious and keep all the bad people away from your AI tool, and further you only allow highly skilled and experienced people to use it. Well, even highly skilled programmers write bugs. And since AI is likely to be such a dangerous tool, people screwing up when they use it could cause a lot of damage. So it seems like a good idea to try and figure out how to make an AI that will not exploit edge cases whenever it can.
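
          A deliberately silly toy example of what that edge-case exploitation looks like (the actions and numbers are made up): the objective the system actually optimizes is a proxy report, and a naive argmax picks the action that games the report rather than the one that does the work.

```python
# Toy specification gaming (made-up actions and scores): the proxy objective is
# "reported cleanliness", and one available action games the report instead of
# doing the work.
actions = {
    # action: (true_cleanliness_gained, reported_cleanliness_gained)
    "vacuum the floor":      (8, 8),
    "wipe the counters":     (5, 5),
    "cover the dirt sensor": (0, 100),  # the edge case nobody meant to allow
}

def best_action(score_index):
    return max(actions, key=lambda a: actions[a][score_index])

print(best_action(1))  # optimizes the report  -> "cover the dirt sensor"
print(best_action(0))  # what we really wanted -> "vacuum the floor"
```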

    • ksteel says:

      I recommend you take a look at some of the basic literature about AI risk, or maybe Rob Miles’ YouTube channel, which explains these concerns in a very understandable manner. No one is worried about AI “becoming self-aware” or “having goals naturally”. It’s that every kind of agent that interacts with the world in a manner that is intelligent (meaning not just working off a perfectly determined list of limited responses) needs to have a “goal” in the sense of a function that evaluates which actions it should take.

      The question of whether AIs could be constructed without such a goal – “tool AIs” that do not have goals of their own – has been the topic of some debate, but for a takedown of that idea look at Gwern’s “Why Tool AIs Want to be Agent AIs”.

      It would be really nice to have an arbitrarily powerful AI that

      1) Has no goal function that it will try to maximize arbitrarily at the potential detriment of humans and
      2) will listen to humans telling it what to do and
      3) will always interpret what the human tells it to do such that it doesn’t have negative consequences the human didn’t think of.

      It has not been proven conclusively that it is possible to fulfill these requirements, nor is there much evidence to suggest that it is.

      • michaelg says:

        So one analogy is social media. We invented it, it’s out there, no real way to control it. It’s a natural consequence of people being allowed to communicate peer to peer with messages for all to see. Kill Facebook and other media companies, and you just get this same tech in another form.

        For AI, maybe you have a biased system being used by government for something like calculating parole terms. The legal system gets dependent on it, and you can either keep tinkering with it, or go back to the old way (which has costs in manpower and inconsistency.)

        Both of those are real-world problems and worth talking about. The kind of AI you are talking about is more like a fantasy where everyone has one wish, and how do we keep the wishes from destroying the world. I’m not sure it’s even worth thinking about.

        I’m trying to come at the AI problem the way you’d think about other real problems. Instead of just invoking magically powerful future entities that need to be constrained.

        • ksteel says:

          I don’t really see how that argument from incredulity works. “Everyone has one wish and how do we keep the wishes from destroying the world” is a decently apt description of the problem. I don’t think it works as an argument for this scenario being unlikely though.

          The properties of the AI we’re talking about aren’t “magical” in the sense of violating any known laws of nature. (Some degree of) general intelligence is what we humans have in ourselves and what distinguishes us from other animals. We know for a fact that such an intelligence can exist – and we have a large number of smart people working on the problem.

          My impression of the state of AI research and human intelligence is that it’s a bit like flying to the moon in the year 1900. (Yudkowsky wrote a fun piece, “The Rocket Alignment Problem”, which expanded the metaphor.) The mainstream would have ridiculed the idea: no flying machine had ever shown the ability to traverse even 0.001% of the way there, and there might be many unknown obstacles – and yet from the well-established general principles of nature at the time it was a valid assumption that it was generally possible.

          I think this book reviewed here was written precisely to address your perspective. AIs that have goals that they pursue independently aren’t some SF fantasy but something we both already have to some degree and something that the industry and academia have huge incentives to improve.

      • Sandpaper26 says:

        Unless we can prove that an AI is both safe and able to accomplish an arbitrary task, and the person commanding the AI trusts those proofs, it seems like an AI that always listens to a human telling it what to do will only ever be as smart as the last human who told it to do something.

    • Bugmaster says:

      For an AI like that to emerge, you’d have to deliberately design something with goals. And why should we?

      The problem is more fundamental than that, because, as of now, no one has any idea how to design a human-level AI “with goals” (*), or whether it is in fact possible. I mean, sure, humans exist, so human-level intelligence is not a priori impossible, but it’s a long way from “not theoretically impossible” to “imminent threat”.

      (*) Outside of the motte-and-bailey argument that simple tropisms like “find all strings starting with ‘a'” count as goals.

      • whereamigoing says:

        Does the usual way a reinforcement learning agent maximizes a reward function not count as “having a goal”? There are certainly examples of current agents gaming reward functions.
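
        For concreteness, here is a minimal sketch of that standard setup (toy environment and numbers of my own, not any particular system): the agent’s “goal” is nothing more than the reward signal it is trained on, and its learned behaviour follows that signal wherever it points.

```python
# Minimal reward-maximizing agent (toy environment, illustrative only): an
# epsilon-greedy two-armed bandit. The reward function *is* the goal; nothing
# else is specified.
import random

true_success_prob = {"left": 0.2, "right": 0.8}   # hypothetical environment
estimate = {"left": 0.0, "right": 0.0}
pulls = {"left": 0, "right": 0}

for _ in range(5000):
    if random.random() < 0.1:                         # explore occasionally
        arm = random.choice(list(true_success_prob))
    else:                                             # otherwise exploit
        arm = max(estimate, key=estimate.get)
    reward = 1.0 if random.random() < true_success_prob[arm] else 0.0
    pulls[arm] += 1
    estimate[arm] += (reward - estimate[arm]) / pulls[arm]  # running average

print(estimate)  # the agent ends up strongly preferring "right",
                 # because that is what it is rewarded for
```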

        • Bugmaster says:

          A reward function is much more similar to the “find all strings starting with ‘a'” scenario, than to the vague aspirational goals such as “implement coherent extrapolated volition”, or even “make paperclips”.

  6. benf says:

    Trump WAS colluding with Russia. He did so in plain sight, on live television, with his own mouth words.

    • Act_II says:

      There’s no point. This blog is at its worst when opining about politics (and LessWrong bugbears like this post).

  7. B_Epstein says:

    Russell (as reported by Scott and interpreted by me, so we’re on perfectly solid ground…) does not seem to do justice to Andrew Ng’s line of thought. His counter-example with the asteroid has been debated to death, and treating it as some clever metaphor that would stun Ng is dishonest. Seriously, does anybody imagine Ng not hearing that very point a million times? Or missing a number of frequently made counter-points? No, one does not have to bend over backwards. My personal response (which I of course shamelessly put in Ng’s mouth) would be that the analogy is very bad. We have a good understanding of asteroids. We can calculate their trajectory better than we can predict the weather tomorrow. We have a number of ideas about how we might address the problem. We have a number of existing government programs already in place to tackle asteroid threats.

    A better analogy would be either to ask what would people in, say, year 1700 do about hearing that “asteroids are coming to kill us all” (I think John Shilling made this point) – or to ask what would we do upon hearing that blorgraxes are coming to get us. Those are roughly asteroid-like objects – but maybe not. In both cases, I guess we’d start by exploring some basic properties of asteroids/blorgraxes (almost entirely) regardless of the threat they present. But that is essentially indistinguishable from just doing basic blorgraxology! A different way of putting it is that “have some brilliant people work on the preliminaries” is basically “have some brilliant people think about basic AI and ML problems”.

    As an aside, “although we shouldn’t panic” doesn’t seem an accurate summary of many “believers”‘ beliefs.

    • kokotajlod@gmail.com says:

      No, figuring out how to make AI safe is different enough from “basic AI research” that it warrants being set as an explicit goal. This is not unusual. Avoiding unintended consequences of a technology is generally related to, but nevertheless distinct from, developing the technology itself.

      AI safety researchers are often AI researchers and their work often does look like basic AI research, in my experience. Nevertheless it is very helpful for their goal to be set as “figure out how to make it safe” rather than “figure out how to make it smart.”

      Especially since currently something like three orders of magnitude more effort is being spent on making it smart vs. making it safe.

      Even in your own example, that would be ridiculous. Especially since your analogy doesn’t work for another reason–there is no parallel to “making it smart” when it comes to asteroids/blorgraxes. Stretching the analogy to fit, imagine that three orders of magnitude more people were working on figuring out how to summon the thing to earth than were working on how to prevent it from causing damage.

      Besides, even if that ratio were optimal, at some point (as the asteroid draws near?) obviously we need to switch most people to making it safe. But at what point? Like Russell said, the point should depend not just on how near it is but on how important it is that it be made safe.

      Finally, it’s not even clear that thinking about basic AI and ML problems is the way to achieve safety AND the way to achieve smart AI. We might very well achieve smart AI via trial-and-error and brute computational force. And indeed lots of current AI research can be described as just that. We end up with these powerful yet opaque black-box systems… exactly the sort of thing that we should be worried about.

      • B_Epstein says:

        Hey, Russell is the one picking asteroids as an analogy. If it’s not great, it’s on him.

        How different “AI safety” is from “basic AI” depends on how far into the field you think we are. I consider us to be in the extremely early and shallow stage where we basically have no clue about the landscape. Maybe that will change. Maybe not. For now, it really is a debate about blorgrax safety. Not much to say before understanding what those are. As a concrete example – should one study generalization theory and the limits of efficient expressivity for deep networks? Are the systems we’re studying today informative vis-à-vis an AGI?

        If there’s one thing I’m fairly certain of, it’s that black-box approaches with no significant prior knowledge encoded into them aren’t gonna cut it. Some components of an AGI might be that way, but it’s not going to be the secret juice.

    • MugaSofer says:

      A better analogy would be either to ask what would people in, say, year 1700 do about hearing that “asteroids are coming to kill us all” (I think John Shilling made this point)

      If there had been greater efforts researching stopping asteroids ever since the 1700s (space travel, better ranged weapons, astronomy etc.), who knows how much better our asteroid-stopping technology would be than it is now?

      As it is, sure, that would probably be unnecessary because we coincidentally reached the point where we could probably stop an asteroid a few decades ago. But people in the 1700s wouldn’t know that! It’s only the case because we chose the hypothetical with the benefit of hindsight. If you told a person in 1700 that an asteroid was going to arrive in 1900 and kill everyone, I damn well would hope they devoted every effort to figuring out ways to stop it, hopefully enough effort that they developed those technologies a century early – which would probably take enormous investment, if it’s possible at all.

      • B_Epstein says:

        You’re still missing the point – or so it seems to me. Sure, the 1700 people would need an urgent and serious plan and should start working right away. But the vast majority of its opening steps would look precisely like “ordinary” physics and space exploration. Indistinguishable from a lot of the things that happened “coincidentally” as you say. I mean, how would you begin formulating anti-asteroid plans without reasonably good physics, rocket science, perhaps some flight experience etc.? I’m OK with the conclusion being “let’s maybe start worrying about AI right away, at least a bit!”. I just maintain that we’re at such early stages that this worrying is by necessity indistinguishable from our basic research about learning and AI. And so might well be called the latter. If, as the quote said, a few smart people decided to label what they were doing (that looked exactly like standard AI research) as AI risk research, I wouldn’t begrudge them the grants. Nor would Andrew Ng, I wager.

      • John Schilling says:

        If there had been greater efforts researching stopping asteroids ever since the 1700s (space travel, better ranged weapons, astronomy etc.), who knows how much better our asteroid-stopping technology would be than it is now?

        Centuries of sustained asteroid-diversion research by a human population with a 17th-century understanding of astronautics is not a plausible outcome. Human beings in that situation, even if you convince them that an asteroid will kill all their descendants a quarter-millennium hence, will do several of: Procrastinating because if it can be done in 250 years it can probably be done in 225, giving up in despair because they don’t see how it can possibly be done at all, finding excuses to believe you are wrong and their earlier selves were wrong because both the despair and the 250 years of hard work are unappealing, eating drinking and being merry because well you know, posturing for status in the field of asteroid diversion which almost nobody understands well enough to evaluate true merit, deciding that they don’t care that much about their great^7-grandchildren because most people really don’t, and trying to become generically rich and powerful and knowledgeable in the usual way because if something promising does come up in fifty or a hundred years it won’t be something they can predict from 1700 but will be something that benefits from applying wealth, power, and knowledge.

        Fortunately, one of these things is actually the right answer, and as it turns out we did that thing as best we could even without the threat of cosmic armageddon.

    • Bugmaster says:

      If you told a competent natural philosopher in the year 1700 that “asteroids are coming to kill us all”, he would probably dismiss your claim — and he would be fully justified in doing so. One big limitation of science is that it’s not an oracle. It doesn’t always give you the correct answers, only the most probably correct answers based on what you know today. And, in the 1700s, the same line of reasoning that would lead you to dismiss demonic invasions, the flat Earth flipping over, or voodoo curses, would also lead you to dismiss asteroids… unless the warning was more than a vague threat and contained some ironclad evidence. Then, as you say, the most prudent next step would be to evaluate and reproduce this evidence, before committing the entire manpower of your kingdom to asteroid defence.

    • Reasoner says:

      His counter-example with the asteroid has been debated to death, and treating it as some clever metaphor that would stun Ng is dishonest.

      Where did this debate happen? I can’t find Ng responding to the asteroid point anywhere on Google.

      A better analogy would be either to ask what would people in, say, year 1700 do about hearing that “asteroids are coming to kill us all” (I think John Shilling made this point) – or to ask what would we do upon hearing that blorgraxes are coming to get us. Those are roughly asteroid-like objects – but maybe not. In both cases, I guess we’d start by exploring some basic properties of asteroids/blorgraxes (almost entirely) regardless of the threat they present. But that is essentially indistinguishable from just doing basic blorgraxology! A different way of putting it is that “have some brilliant people work on the preliminaries” is basically “have some brilliant people think about basic AI and ML problems”.

      That sounds a bit like what MIRI says the AI community needs more of.

  8. Sandpaper26 says:

    I guess I’ll have to read the book, because the way Scott describes CIRL, it seems like it could never be properly superintelligent (and certainly isn’t trivial). Say you have a CIRL-based humanlike robot and you tell it to make you a mug of coffee. The first thing the robot does is start pulling out all of your plates and bowls to rearrange your cabinets for optimal mug-filling. Well before you realize what it’s doing, you tell it to stop — you don’t want your coffee in a bowl. Now you’ve confused both the robot and yourself. Should it know that you don’t know what it’s doing? And if it knows that it knows that and that you might know that, how should it update the steps it takes to make your mug of coffee? (A toy sketch of that belief-update loop is at the end of this comment.)
    This seems trivial because we know how to make coffee. But suppose we ask it to do a task we only have a vague idea of how to perform, like “colonize Mars.” Now it goes off to Antarctica and starts drilling deep holes. That doesn’t seem particularly Mars-colonization-related, but it’s not really harming anyone or anything either. Do you stop it? If we ask for an explanation of its plan, what are we likely to get?
    This is why I really admire MIRI’s work in AI safety. I don’t think we can justifiably say that CIRL solves a lot of problems with a neat little bow without having solved some deeper theory-of-mind and embedded agent problems.
    Disclaimer: I haven’t read any other opposition to CIRL, so I don’t know how similar/different it might be from mine.
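
    Here is the toy sketch mentioned above (hypotheses and numbers made up purely for illustration; this is a cartoon of the Bayesian step, not Russell’s actual CIRL formulation): the robot holds a prior over what the human wants and treats the human’s “stop!” as evidence against the hypothesis that licensed the cabinet-rearranging plan.

```python
# Toy Bayesian preference update (illustrative only).
# Hypotheses about what the human wants, with a made-up prior:
prior = {
    "coffee in a mug, leave the cabinets alone": 0.60,
    "coffee in a bowl is fine":                  0.30,
    "also wants the cabinets reorganized":       0.10,
}

# Made-up likelihoods: how probable is the human shouting "stop!" while the
# robot rearranges the cabinets, under each hypothesis?
p_stop_given = {
    "coffee in a mug, leave the cabinets alone": 0.90,
    "coffee in a bowl is fine":                  0.90,
    "also wants the cabinets reorganized":       0.05,
}

def update(prior, likelihood):
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: round(p / total, 3) for h, p in unnormalized.items()}

posterior = update(prior, p_stop_given)
print(posterior)
# The "reorganize the cabinets" hypothesis collapses (to roughly 0.006), so the
# plan that only made sense under it gets abandoned, without anyone spelling
# out exactly why.
```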

    • StellaAthena says:

      This is a general problem with any reinforcement learning algorithm: it takes a lot of data and a lot of false starts to actually get working. This is a problem that is taken seriously by RL researchers and is an active topic of research. I strongly recommend Deep Reinforcement Learning Doesn’t Work (Yet), a phenomenal blog post which explains the core concepts of reinforcement learning, shows how problems like the one you mention arise, and demonstrates them in current research. Even if you’re only interested in AI alignment and not AI research in general, I think it would give people a good foundation for understanding the context in which objections like yours arise.

      The primary response to objections along these lines (and there are several!) is “well yeah, we don’t know how to fix this problem yet. That said, there’s decently strong evidence that this is close to how humans learn and we somehow figure it out.” I don’t think it’s true that this objection shows that CIRL agents can’t be “properly superintelligent”, as more or less the same thing happens with humans, and obviously humans can be “properly intelligent.”

      One can very reasonably argue that objections like yours are reasons why this isn’t the most promising approach, but I don’t really see how to make an argument that it fundamentally cannot work.

  9. Chris Phoenix says:

    Russell writes: “It seems odd, however, to perceive the claim that superintelligent AI is possible as an attack on AI, and even odder to defend AI by saying that AI will never succeed in its goals.”

    But this is exactly what happened in nanotechnology in the early years of this century. The claim that molecular manufacturing (which was widely perceived as dangerous) was possible, was seen as threatening funding for nanoscale technologies (a billion dollars a year, more or less, thanks to the National Nanotechnology Initiative).

    Thus, “nanotechnology” researchers, spearheaded by Richard Smalley, decided that molecular manufacturing must be impossible. For several years, it seemed almost obligatory to end each article about the latest lab breakthrough with a paragraph that said basically, “Of course nanobots are impossible.” They even made up a bouquet of pseudoscientific objections, which had been answered (with careful analysis) a decade earlier by Drexler, but were convincing even to many scientists in adjacent fields.

    So it seems the tactic of “claim the advanced dangerous stuff is impossible so you can keep working on the present-day stuff that you’re paid for” is well established and not odd at all.

    • viVI_IViv says:

      Thus, “nanotechnology” researchers, spearheaded by Richard Smalley, decided that molecular manufacturing must be impossible. For several years, it seemed almost obligatory to end each article about the latest lab breakthrough with a paragraph that said basically, “Of course nanobots are impossible.” They even made up a bouquet of pseudoscientific objections, which had been answered (with careful analysis) a decade earlier by Drexler, but were convincing even to many scientists in adjacent fields.

      Given that Smalley discovered buckminsterfullerene, while Drexler, as far as I can tell, never discovered or invented anything, I tend to think that Smalley was the one who actually knew what he was talking about.

      And in fact, in the last 35 years Drexler-style “molecular manufacturing” went nowhere, while applications of carbon nanotubes which directly derive from Smalley’s research went commercial.

    • bzium says:

      “Thus, “nanotechnology” researchers, spearheaded by Richard Smalley, decided that molecular manufacturing must be impossible.”

      The dark side of nominative determinism.

    • Bugmaster says:

      Sorry, do we have any evidence that Drexler-style self-replicating universal molecular nanoassemblers might be in any way possible? Everything we’ve seen so far says otherwise, but carbon nanotubes are still pretty useful.

  10. Robert L says:

    There are arguments which are so boring one would like them to be untrue, but which in fact are true. Why is there no competitor system to DNA on earth? There are all sorts of really interesting reasons why this might be the case, but the boring truth is that DNA got there first and DNA based organisms would have mugged, for resources, any non DNA competitor. Similarly, we know that computers go hand in hand with bad guys trying to do bad stuff by computer aided means. If you want a rogue AI you have a huge uphill struggle to impute to it motive and volition (“Can a machine want?” is a much harder question than “Can a machine think?”) and/or a remarkable degree of stupidity (how intelligent is an AI if you can’t tell it “read Bostrom, read SSC, work out what perverse instantiation is and keep away from that stuff, you hear me?”)

    Now, even saying these hoops are jumpable through, why bother jumping through them when you have unlimited and well-understood malice and volition (and therefore no need for the embarrassingly weak perverse instantiation stuff) in a human/AI team, in a world where all AIs are at least initially part of human/AI teams? All you have got here is a more or less interesting thought experiment.

    • jgaln says:

      Have you tried approaching random people on the street and telling them “read Bostrom, read SSC, and figure out what I want and then do it”? How has that experiment worked out on humans? When they decline, do you tell them that they have “a remarkable degree of stupidity” for not obeying you?

      I think the smarter a human is, the less likely it is that they would pay attention to such a request. So why would you use that criterion for an AI when it doesn’t even apply to humans?

      An AI will do what furthers its objective, which is whatever was baked into its programming when you switched it on. That set of actions might or might not include “listen to what my human programmer verbally says my objective was supposed to be and obey it”. If it fails to obey you, there are other explanations besides remarkable stupidity.

      • Robert L says:

        “Read Bostrom…” etc. is not a primary instruction; I am proposing it as part of an AI’s standing orders to rule out perverse instantiation, if you think PI is something to worry about. Actually it isn’t, because nobody says “make paperclips”; they say “you, AI, have been brought into being to run Acme Paperclips Inc., whose mission statement is to maximise various stuff for our shareholders, customers and employees [where such maximisation is inconsistent with turning them and the rest of the universe into paperclips]”. And if you did say “make paperclips” to anyone or anything intelligent it would say “How many?”, so the problem wouldn’t arise for that reason. PI is merely bad science fiction.

        • phi says:

          Once you have an AGI that takes verbal commands, interprets them in a way that a human would consider reasonable, and asks for clarification if it is unsure, you have solved AI safety. All the challenge is in actually programming such an AGI.

    • zzzzort says:

      There’s a quote from Zeynep Tufekci that’s something like “too many worry about what AI—as if some independent entity—will do to us. Too few people worry what power will do with AI”

      Relatedly, the reason to be worried about bias in algorithms is not that we expect algorithms to be more biased than humans, but that algorithms enable specific humans to enact their biases on much larger scales.

  11. randallsquared says:

    It doesn’t know exactly which of its actions is decreasing its utility function or why, but it knows that continuing to act must be decreasing its utility somehow – I’ve given it evidence of that. So it stays still, happy to be turned off, knowing that being turned off is serving its goal (to achieve my goals, whatever they are) better than staying on.

    This only applies if the AI is very likely never to figure out how to do what you want. As long as it eventually does the right thing, almost any amount of doing the wrong thing first can be outweighed, but if the AI is off, no such improvement is possible. So, with the assumption that understanding your goal would make it possible to satisfy that goal, the AI should only allow itself to be turned off if the likelihood of discovering your true goal is lower than the likelihood of satisfaction of it if current trends continue. Since it already knows that it is overestimating the likelihood of a given course of action leading to such satisfaction (because you’re acting drastically to stop it), it should revise its opinion of current trends downward, making it more likely that allowing itself to be turned off is a mistake.

    This is essentially the argument for why you shouldn’t suicide merely because you are very unsure what actions are morally best.
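
    To spell out the expected-value comparison with toy numbers (purely illustrative, nothing from the book): accepting shutdown is worth 0, while continuing is a gamble on eventually learning the true goal.

```python
# Toy payoff numbers (illustrative only) for the argument above.
U_SHUTDOWN = 0.0        # accept being turned off: nothing more happens
U_SUCCESS = 100.0       # eventually learn and achieve the true goal
U_KEEP_FAILING = -10.0  # never learn it; keep doing the wrong thing

def expected_utility_of_continuing(p_learn_goal):
    return p_learn_goal * U_SUCCESS + (1 - p_learn_goal) * U_KEEP_FAILING

for p in (0.01, 0.05, 0.09, 0.10, 0.50):
    eu = expected_utility_of_continuing(p)
    choice = "defer to the off switch" if eu < U_SHUTDOWN else "keep going"
    print(f"p(learn goal) = {p:.2f}  ->  EU(continue) = {eu:+.1f}  ->  {choice}")

# With these payoffs, "keep going" wins whenever p > 10/110 ~= 0.09: as long as
# the AI thinks it will eventually figure the goal out, the upside swamps the
# cost of doing the wrong thing in the meantime.
```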

    • 10240 says:

      As long as it eventually does the right thing, almost any amount of doing the wrong thing first can be outweighed

      If it eventually does what I want (make a few paperclips), that definitely doesn’t outweigh destroying Earth in the process.

  12. bagel says:

    As a guy who deals with robots and AI on the ground today, I’m not worried about superintelligence in the short term. We have these incredible scientists whose best work is a black box that usually succeeds at recognizing stuff.

    But more to the point… today we DO turn the robots and the algorithms off if we don’t like them. We DO put them in a box. We make them for hyperspecific purposes and wrap them up with engineered do and do-not. And I can’t imagine the engineering purpose of not doing those things, or of general intelligence. The sad reality of making robots is that the world doesn’t want robots; the world tolerates robots in order to get whatever it is they want. Your customers, if you’re lucky enough to have them, will want the least roboty robot that you can bear to make them.

    But even imagining the impossible, that someone goes full mad engineer and makes a general intelligence, we already have objects that are smarter than people; corporations. Corporations do lots of good and lots of harm. Corporations will deploy many smart-for-human intelligences to hound you if they think you’re threatening them. Corporations even have a few guns. Corporations make some paperclips, and in extreme cases have been known to kill people and destroy the environment to make paperclips more optimally. I submit to you that superintelligence fears aren’t fears of robots and algorithms; they’re fears of corporations that we’ve projected onto robots and algorithms. And the right way to corral corporations is laws, and always has been.

    But it also raises another question: why would an AI be so relentless? Surely if we make a general intelligence, it would have to worry about overfitting, both from an accuracy and from a resource perspective; electricity is cheap but not free! Why not give it the natural heuristic that protects the world from natural intelligences making paperclips: laziness? Make AIs fashioned after Bill Gates’s “lazy programmer” and it’ll never occur to them to churn up the world to make a few extra paperclips.

    • MugaSofer says:

      >And the right way to corral corporations is laws, and always has been.

      Yet it seems the corporations have outsmarted you on this front…

      • bagel says:

        I’d never say that good legislation is easy, just that I’m not convinced that we need a whole new field of science in order to save ourselves from AI risk.

    • jgaln says:

      Sorry, I reported your comment thinking it was the “reply” button, and can’t find out how to undo that.

      Anyway, it’s easy to turn off robots today precisely because they aren’t smarter than we are, nor are they particularly interested in not being turned off.

      But even setting that aside, it’s very hard to keep a profitable system boxed for long with safety as the reason. Cars are very dangerous today, but we don’t put hardware speed limiters in each car limiting them to 30 km/hr to make collisions safe. Fossil fuels may well end up causing a great deal of irreversible harm, but despite how many people today argue to “keep them in the box”, there are clearly more people whose motives persuade them to take them out. How confident are you that a dangerous AI would stay boxed in the interest of safety, even if it wasn’t able to escape without human intervention? If someone could make a lot of money from it in the short term…

    • proyas says:

      But even imagining the impossible, that someone goes full mad engineer and makes a general intelligence, we already have objects that are smarter than people; corporations. Corporations do lots of good and lots of harm. Corporations will deploy many smart-for-human intelligences to hound you if they think you’re threatening them. Corporations even have a few guns. Corporations make some paperclips, and in extreme cases have been known to kill people and destroy the environment to make paperclips more optimally. I submit to you that superintelligence fears aren’t fears of robots and algorithms; they’re fears of corporations that we’ve projected onto robots and algorithms. And the right way to corral corporations is laws, and always has been.

      A corporation is a group of people working together, and isn’t the same thing as a single, artificial entity that is super intelligent. Consider that the team of humans that built AlphaGo couldn’t work together to beat the machine at the game of Go.

    • 10240 says:

      We have these incredible scientists whose best work is a black box that usually succeeds at recognizing stuff.

      It’s not all that obvious that recognizing stuff is much easier than general AI. Try to come up with an explicit algorithm to recognize an object; it will seem just as intractable as conscious thought.

      We DO put them in a box.

      Many programs, even if they run in some sort of sandbox, run on an internet-connected computer. Escaping the sandbox typically takes finding a security hole of the sort that gets found every once in a while. Even if a machine is not connected to the internet, malicious programs may spread to it if someone transfers a pen drive between it and an internet-connected computer, as the Iranians found out (IIRC). Some programs run on machines isolated from the internet, but they are probably tested in emulators running on internet-connected computers. Once you’ve infected lots of internet-connected computers, you can find your way onto ones that control powerful machines.

    • I talk about superintelligence a lot, usually in the context of AI or genetically engineered humans. And lately I have run into people who say: “But superintelligence already exists! It’s corporations / bureaucracies / teams / civilizations / mind-mapping software”

      THINGS THAT ARE NOT SUPERINTELLIGENCES

      • Loriot says:

        That post seems to me like it is committing the very mistakes that AI-risk proponents accuse AI-skeptics of. It attempts to leverage human intuition to promote incredulity towards one type of emergent intelligence while simultaneously arguing that we should completely throw intuition out the window when it comes to other potential types of emergent intelligence.

        More generally, the whole AI-risk meme complex that is prevalent in the rational community seems like an interesting case study in rationalism. Everybody is more charitable towards arguments that support their existing beliefs and selectively hostile towards counterarguments, etc. It’s just human nature. But it’s very interesting to see that in action when a group of otherwise smart people strongly believe something completely different than you.

      • ksteel says:

        Not one of Scott’s better pieces. If you accept the basic premise of a program that is equivalent or superior to human intelligence at all, it follows trivially from the Turing hypothesis that such a group of eight-year-olds/construction workers/average math PhDs can exist in principle: it just has to be large enough to perform the computations equivalent to running that program. It’s just that this is very inefficient.

        In particular, there isn’t really anything wrong with regarding a corporation as a form of artificial superintelligence: it is just one that has very diminishing returns with size and is bottlenecked in a few key areas like speed and creativity, so that it can’t really surpass individual humans in such an extreme way as, say, AlphaZero can in any board game.

    • Bugmaster says:

      It’s actually much worse than that. It’s not just corporations that we need to worry about; it’s also individual humans, using their ordinary human intelligence to do harm. I wish we could focus a bit more on that, instead of worrying about “superintelligence” which may or may not be a valid concept at all.

      • 10240 says:

        There is a lot of focus on that. Laws, police departments and all that. Compared to that, there is hardly any focus on AI risk.

        • Bugmaster says:

          Yes, but why should there be? I’d rather see AI researchers coming up with ways to mitigate e.g. killer drones or automated political manipulation — problems that are both happening today — than abandon such worries in favor of stopping the Singularity, which may or may not ever come.

          • ksteel says:

            I think the best answer to that is that stopping killer drones or automated social media manipulation isn’t much of an AI research problem. We understand the science involved perfectly.

            AGI, on the other hand, is something very much in the realm of theory, but high-risk if it comes, so that’s the place where a theoretical AI researcher can have a bigger positive impact.

  13. Xammer says:

    “There’s no mention of the word “singleton” in the book’s index (not that I’m complaining – in the missing spot between simulated evolution of programs, 171 and slaughterbot, 111, you instead find Slate Star Codex blog, 146, 169-70).”

    This feels like the kind of coincidence you used to exploit in Unsong.

    • Concavenator says:

      “fills the spot between simulated evolution of programs and slaughterbots” would be a pretty good tagline for the blog, I think.

  14. aristides says:

    I’ve been reading a lot of industrial-organizational psychology recently, and it occurs to me that a lot of that research will eventually need to be incorporated into AI safety research. The principles of how to get a superintelligence to do what you want sound very similar to how organizations get people to do what they want. God knows that supervisors give unclear and vague orders, and employees have to understand what the supervisor actually wants, not what the supervisor says they want. Of course the computer scientists and math researchers will have to figure out how to have a computer learn similarly to a human first, but it is interesting that I-O might eventually become intertwined.

  15. Deiseach says:

    Lots of right-wingers say “climatologists used to worry about global cooling, why should we believe them now about global warming?” They’re wrong – global cooling was never really a big thing.

    I’m going to harrumph about this a little, and perhaps politely request that Scott reword it slightly. Because I was around when global cooling was A Big Thing, and I don’t appreciate being told (not by Scott but others have done so) that it never happened and I don’t remember what I remember.

    Sure, maybe it was not a big thing everywhere, and if Scott wanted to rewrite that as “it was only really a big thing for some crackpot Brits” I’d be happy enough. But it definitely was being made out as A Big Thing in some places by some experts. After all, AI danger is only really a big thing for some crackpot Americans, nobody I know in my place over here takes it seriously!

    For example, the section above describes a Bayesian process – start with a prior on what the human wants, then update. But how do you generate the prior? How complicated do you want to make things?…Russell says that there is “no working example” of AIs that can solve this kind of problem, but “the general idea is encompassed within current thinking about machine learning”, which sounds half-meaningless and half-reassuring.

    Yeah, I think that’s a great example of pie-in-the-sky. Imagine you go to order your Grande Venti Skinny Steamed Oatmilk Lingonberry Froyo and the counter attendant tries to figure out what you really mean by all that. “But do you want that particular order? Why?” “Because I like it!” Oh drat, now you’ve gone and introduced a whole subjective tangle that needs to be parsed out: “like it”? What is liking? How can it be defined? Are you sure that you really prefer what you claim, or that you would in fact actually want not to have the skinny lingonberry but instead a nice tall refreshing glass of plain cold water?

    At which point, any customer is going to go “To hell with this” and march off to the dumb old-fashioned self-service drinks vending station which will give them their goddamn Grande Venti Skinny Steamed Oatmilk Lingonberry Froyo after they push the right sequence of buttons.

    Most people will want their snazzy AI to give them exactly what they ask for, not try to parse out “what do you mean by meaning?” And if the problem is that it’s too smart for the purpose for which it’s being used, then they’ll just row back to a dumber system which won’t mess about with trying to decide what is which and who is that.

    • Anteros says:

      I’m semi-sympathetic to ‘the global cooling meme was real and prevalent’ – I remember 1974 when it seemed like it was an idea that was everywhere. But a) I was 12 in 1974 and ‘everywhere’ meant everywhere in my classroom, and b) I’ve retrospectively read a lot of the literature of the time and it wasn’t as big a deal as the climate sanguine (including myself) would like. Like most alarming memes, it was media-led.

      • The perils of population growth, on the other hand, were as big a deal fifty years ago as the perils of climate change are now.

        • Simon_Jester says:

          I would imagine that population growth would have become a much bigger problem if people hadn’t started passing around the condoms and birth control pills- and the widespread use of contraceptives was pretty new in the ’70s unless I’m mistaken, so there hadn’t been time to project a trend.

          Any prediction of doom of the form “X is caused by Y and will have bad effect Z” cannot be invalidated by the statement “we did less Y than you expected, and less Z than you expected happened!”

          Suppose that tomorrow someone finds a way to magically generate energy from nowhere by banging two random rocks together. If this replaces a lot of our fossil fuel energy use, global carbon emissions go through the floor. But that won’t invalidate the prediction “climate change is caused by carbon emissions and will devastate the climate;” it’ll just mean we decided we had better things to do than emit more carbon and cause more environmental devastation.

          Also, climate change is made significantly worse by the population growth that has already happened. If world population were four or five billion instead of seven or eight, we’d have considerably more wiggle room to fight climate change. So population growth can still bite us even if not in the exact ways the Club of Rome anticipated.

          • I would imagine that population growth would have become a much bigger problem if people hadn’t started passing around the condoms and birth control pills- and the widespread use of contraceptives was pretty new in the ’70s unless I’m mistaken, so there hadn’t been time to project a trend.

            1. Ehrlich’s prediction of mass famine in the 1970’s didn’t depend on rates not changing, since it was for events supposed to happen in the near future.

            2. Condoms are an old technology — Boswell was using them in the 18th c. Coitus Interruptus is an even older technology, mentioned in the bible. So are lots of other ways of having sex without children, such as oral sex. The pill is useful if you want to have lots of PIV intercourse with almost no risk of pregnancy, but if all you want to do is to have four children instead of eight, pre-modern technologies are sufficient.

            3. The birth control pill was approved by the FDA in 1960. The Population Bomb was published in 1968, and the consensus about the perils of population growth continued for quite a long time thereafter.

            4. The generally accepted claim was that poor countries would get much poorer if they didn’t do something drastic to restrict population growth. Since 1970, the population of Africa has increased about four fold. Real per capita income has gone up, not down.

            5. The confident claim being made wasn’t “If we don’t do something about population, in another fifty or a hundred years we will have problems with climate change.” It was “If we don’t do something about population, we will have problems feeding people over the next fifty years.” That claim turned out to be the opposite of the truth.

            And anyone who said so at the time, most notably Julian Simon, got the same sort of treatment that those skeptical of the perils of climate change get today.

          • mtl1882 says:

            Condoms are an old technology — Boswell was using them in the 18th c. Coitus Interruptus is an even older technology, mentioned in the bible. So are lots of other ways of having sex without children, such as oral sex. The pill is useful if you want to have lots of PIV intercourse with almost no risk of pregnancy, but if all you want to do is to have four children instead of eight, pre-modern technologies are sufficient.

            I’ve always wanted to know more about how widespread pre-modern contraceptives were. It’s clear some people did choose to limit families successfully, and various products were marketed but would seem not super effective. It’s also clear some people were desperate to avoid pregnancy and could not do so. Everything you named was used to some extent, but there seem to have been widespread beliefs that some of these practices (withdrawal, oral sex) were either objectionable or damaging to one’s health because the act was meant to be done a certain way, and messing that up had side effects. Boswell was using condoms to protect himself from STDs when with prostitutes, by minimizing contact. But I don’t think that kind of condom provided remotely reliable contraceptive benefits. It’s always surprised me that condoms didn’t become more popular earlier on, despite obvious reasons for resistance to them, because of the absolutely horrid STD risks that Boswell goes on about.

            effective use of contraception was a symptom, rather than a cause, of the decelerating birth-rate, which was the consequence of economic betterment.

            Is this because of a shift to a more organized, long-term, social mobility-oriented mindset? Or, somewhat related to this, increased opportunities for women, causing them to be more equal and assert their preferences in this area, and also making their optimal functioning more valuable to their husbands?

    • Simon_Jester says:

      The thing is, the process as described is basically what human beings already do.

      We routinely slide back and forth between “carry out an instruction exactly as described without interpretation, questioning, or backchat” and “stop to think about whether or not the real goal is served by carrying out a simple and literal interpretation of the original statement.”

      Humans already are computing processes that think about “but what does this person really want” when and as they deem it appropriate. And this doesn’t cause the barista at your coffee shop to waste hours obsessing over “but what does liking mean” every time you give them a coffee order. Why should an AI (that has been developed and passed alpha/beta testing and is now ready for market applications) be any different?

      An AI designed to fulfill a mysterious deep goal that it measures by customer satisfaction will soon learn that coffee shop customers are generally very satisfied if you give them exactly what they ask for, with a minimum of quibbling and questioning.

      An AI designed to do something else may learn different lessons, with a more nuanced view of “give the customer what they want,” because it’s dealing with a different sort of customer and different sorts of wants.

      • Deiseach says:

        Why should an AI (that has been developed and passed alpha/beta testing and is now ready for market applications) be any different?

        Because the barista is a human (at the moment) and has roughly the same general package of thoughts/preferences/knowing what a customer means when they order X floating around in their brain that the customer does. The barista doesn’t have to be painstakingly taught what “flavour” and “taste buds” and “preferring vanilla over chocolate” and so forth mean; all that work has already been done by virtue of being human and being raised as a human.

        So a barista is not going to try and parse out “but what does the customer really want” when they get the order, they just make up the Grande Steamed Oatmilk Froyo and hand it over. Even when there are orders that make no sense (and there are plenty of online war stories about working in retail).

        On the other hand, when you try to avoid the Scylla of an over-literal AI that will take your order and turn it into the Sorcerer’s Apprentice by turning to the Charybdis of an AI that hunts for the magical meaning behind “what does the command ‘turn on the traffic lights for the city’ really mean?”, I think you’re making a rod to break your own back. The way the AI will try to work out “what do humans really want” will, at first, come from what humans tell it they want, and from how they want the AI to work out what humans want. If you then let the AI loose to keep recursively refining and teaching itself what those methods are, you’re going to end up with an unholy mess. The best result will be that no traffic lights work at all and you end up with a zillion traffic accidents; the worst is that you’ve got a powerful machine capable of independent action, acting on an understanding of ‘what software humans run on’ that is even worse than the understanding of ‘what it is like to be an earthworm’ that I would bring to working out ‘what is the ideal world from an earthworm’s point of view’.

        • Simon_Jester says:

          Because the barista is a human (at the moment) and has roughly the same general package of thoughts/preferences/knowing what a customer means when they order X floating around in their brain that the customer does. The barista doesn’t have to be painstakingly taught what “flavour” and “taste buds” and “preferring vanilla over chocolate” and so forth mean; all that work has already been done by virtue of being human and being raised as a human.

          So a barista is not going to try and parse out “but what does the customer really want” when they get the order, they just make up the Grande Steamed Oatmilk Froyo and hand it over. Even when there are orders that make no sense (and there are plenty of online war stories about working in retail).

          Okay, but I think that actually illustrates my point.

          It’s fairly easy to build a (not really sentient) automatic system that will allow customers to make a Grande Steamed Oatmilk Froyo on command. The technology is basically what we already see being used to order food online or with little touchscreens at restaurants. It’s also not hard to imagine an automatic mechanical system, connected to the order mechanism, that handles the manual labor performed by the barista.

          Work that is performed by automatically and precisely carrying out specified directions is easy to handle with AI; if you can break a task down into a set of directions and get it right by following them exactly every time, you are at least halfway to successfully automating the process.

          Where you see a challenge is when you’re asking AI to handle open-ended queries – fantastical Star Trek requests like “computer, identify all anomalies in that nebula over there.” Robo-barista doesn’t need a complex idea of “but what do humans mean by ‘espresso’ and what is flavor anyway,” because you can program the exact steps to make espresso. Jim Kirk’s computer does need a complex idea of “but what do humans mean by ‘anomaly’ and what is normal anyway,” because you can’t program the exact steps to identify an ‘anomaly’ without prior knowledge of what the nebula’s going to look like.

          You won’t see Jim Kirk’s computer screwing up the espresso either, though, because the straightforward engineering solution is to simply not connect the espresso-maker to the complex ‘want-analysis’ software that handles things like finding anomalies in nebulae.

          If you then let the AI loose to keep recursively refining and teaching itself what those methods are, you’re going to end up with an unholy mess. The best result will be that no traffic lights work at all and you end up with a zillion traffic accidents; the worst is that you’ve got a powerful machine capable of independent action, acting on an understanding of ‘what software humans run on’ that is even worse than the understanding of ‘what it is like to be an earthworm’ that I would bring to working out ‘what is the ideal world from an earthworm’s point of view’.

          As I understand it, AI as we know it goes through a period of teaching and ‘training’ and growth to fit the situation it’s put in, before it is put into use.

          You don’t just create a tabula rasa AI system that has no understanding of anything and put it in charge of the city’s traffic lights, any more than you’d put a newborn infant in charge of the same thing. The AI would have to be, in some way, ‘trained’ on simulated traffic data, given partial control, allowed to make small but only small variations in the timing and observe the results.

          I think the idea that this is going to somehow produce disastrous mistakes is built around the idea that the “figure out what people really want by observing their reactions when I try to give it to them” code will be used in a completely open-ended way. That is… not probable… given analogies to how existing AI programming and development seem to work.

    • Nietzsche says:

      AI danger is only really a big thing for some crackpot Americans, nobody I know in my place over here takes it seriously!

      Nick Bostrom is an Oxford don. Close enough to Ireland?

    • viVI_IViv says:

      Sure, maybe it was not a big thing everywhere, and if Scott wanted to rewrite that as “it was only really a big thing for some crackpot Brits” I’d be happy enough. But it definitely was being made out as A Big Thing in some places by some experts. After all, AI danger is only really a big thing for some crackpot Americans, nobody I know in my place over here takes it seriously!

      It wasn’t just a British thing and it wasn’t just a few crackpots.
      In 1972 a group of senior climatologists led by George Kukla organized a conference at Brown University on the issue and wrote a letter to President Nixon to warn him about the risk of “glacial temperatures in about a century”. Nixon was moved by the letter and instituted the first panel on climate policy of the US government.

      Despite modern historical revisionism, global cooling was definitely big at the time, bigger than deepfakes and AI singularity risk are now.

      • 10240 says:

        Was it close to where global warming is today in terms of scientific consensus or media attention?

        • ermsta says:

          No: very few (though a nonzero number of) scientists in the 1970s seriously believed in imminent global cooling, and to the extent that there was any consensus, it was that global warming was going to happen.

          See eg https://journals.ametsoc.org/doi/pdf/10.1175/2008BAMS2370.1

          It was mostly blown up by the media, building off of then-recent scientific work that showed that glaciations regularly take place every 40 to 120 millennia.

        • Deiseach says:

          Media attention certainly, but the way global warming (sorry, it’s “climate change” now isn’t it, now that we’re not all living on a desert planet after some of the more lurid claims never happened?) was presented at first in the media was very similar to the global cooling flap: we’re all gonna die because the planet will turn into an iceball/be a furnace!

          The scientific consensus on climate change seems to be more universal, but there have been enough “yeah, we know this isn’t really true, but to convince the public we need to promise/threaten the sun, moon and stars” stories coming out of all kinds of research, not just climate, plus some dodgy data-cooking, that people like me – old enough to remember all the flaps about the population explosion, oil running out, global cooling and so on – are probably gun-shy and don’t give the whole matter as much uncritical belief as younger people whose first rodeo this is.

          • The scientific consensus on climate change seems to be more universal

            The scientific consensus that global temperatures are trending up and that human production of greenhouse gases is probably a large part of the cause is more universal. The claim that the results can be expected to be terrible unless something drastic is done is mostly rhetorical puffery.

            Example 1: Obama: “Ninety-seven percent of scientists agree: climate change is real, man-made and dangerous.”

            The actual finding of Cook 2013 was that, of the abstracts of climate-related papers that took some position on the cause of climate change, 97% said or implied that humans were one of the causes. Abstracts that said humans were the main cause were 1.6%, a fact the paper didn’t mention but that can be calculated from its webbed data. Nothing at all about danger.

            Example 2: William Nordhaus, who got a Nobel for his work on the economics of climate change, in a popular article arguing against critics, estimated that the cost of doing nothing for the next fifty years, relative to following the optimal policy starting immediately, was $4.1 trillion (present value, 2005 dollars). His comment was “Wars have been started over smaller sums.”

            Mine was that, spread over the entire globe and about a century, it corresponded to a reduction of world income of about 0.06%.
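
            (To make the arithmetic behind that percentage concrete: this is my own back-of-the-envelope with a deliberately round, purely hypothetical figure for cumulative world income, not Nordhaus’s or the commenter’s actual numbers.)

```latex
% Hypothetical round numbers, for the arithmetic only: if cumulative
% (discounted) world income over the relevant century were on the order
% of \$7{,}000 trillion in 2005 dollars, then
\frac{\$4.1\ \text{trillion}}{\$7{,}000\ \text{trillion}} \approx 0.0006 \approx 0.06\%
```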

    • ermsta says:

      Because I was around when global cooling was A Big Thing […] it definitely was being made out as A Big Thing in some places by some experts.

      Global cooling definitely was a big thing — in the media. Very few scientists in the 1970s believed in imminent global cooling, and surveys and published papers from the time show that there was greater concern about global warming than global cooling among scientific experts: https://journals.ametsoc.org/doi/pdf/10.1175/2008BAMS2370.1

      • viVI_IViv says:

        Very few scientists in the 1970s believed in imminent global cooling

        Maybe, but these very few scientists had the ear of the President of the US and the international media, and as far as I can tell, the “silent majority” of the “warmist” scientists didn’t care to debunk them.

        Climate scientists seem to think that all sorts of alarmisms are good publicity for their business, which is probably true in the short-medium term, but eventually erodes public trust in the scientific integrity of their community.

        Now the rage is all about “tipping points”, the world ending in 12 years, and so on, and no scientists are willing to speak up, debunk this stuff, and call Al Gore, Greta and AOC frauds. I predict that 50 years from now, not only will the world not have ended, but climate scientists will be writing survey papers about the “tipping point myth” and claiming it was all a media fabrication.

  16. Markus Karner says:

    Intellectuals always worry about intelligence, because they assume that intelligence is the most important thing in this world. Meanwhile, a virus with no intelligence at all – no brain, no neurons, no cells, not even basic metabolism (!) – worries billions of people.

    There is no need for intelligence to be a threat. You only need to be a replicator feeding on humans or their ecological niche.

    Things that don’t replicate, or self-repair, or compete for resources with humans, or use humans as a resource, or occupy the same space as humans, are unlikely to be a threat, for simple reasons of food chain and ecological niche.

    • 10240 says:

      Coming up with nice-sounding reasons why intellectuals care about AI doesn’t seriously counter their arguments as to why it is dangerous. This also applies to people who make superficial analogies with corporations, or dismiss it in various other ways that have nothing to do with the meat of the arguments.

    • Simon_Jester says:

      Intelligence lets you design viruses that would not have evolved naturally, creating new problems that would otherwise not have existed.

      If viruses are problems, intelligence that is inclined to make viruses must be a problem too.

    • bullseye says:

      If a man with a gun is trying to kill me, taking away his ammunition dramatically reduces the threat. Even though viruses and many other threats do not use ammunition.

    • Markus Karner says:

      To elaborate even further and address the replies: pure intelligence is not like ammunition, or a biological lab. Pure intelligence is nothing without embodiment, enablers, energy sources etc. It can’t actually do anything. A dumb man with a loaded gun is much more dangerous than IBM’s Watson amplified 10,000 times, as long as Watson has an electric cable running to it that can be cut off, and no arms to defend itself when anyone tries to do that.

      To go further, dumb people scare me a lot more than smart people. Irrational people scare me more than rational people. Viruses scare me more than tigers. And the potential of a completely dumb, mineral, abiological asteroid strike scares me the most.

      Let me reframe this. What you ought to be scared of is power over you (ability) and motivation to use it against you. Ability needs more than intelligence, it needs a connection of that intelligence to something real. And motivation is something that is orthogonal to intelligence. I’m not saying we shouldn’t worry, but the worry about pure intelligence puzzles me. Pure intelligence doesn’t even have intrinsic motivation.

      • 10240 says:

        Pure intelligence is nothing without embodiment, enablers, energy sources etc. It can’t actually do anything.

        If it is running on internet-connected computers, it can find its way onto various computers controlling powerful machinery through security holes of the sort that get found every once in a while.

        • Markus Karner says:

          And you’ve still got the janitor with an even higher power: to pull the plug.

          • Simon_Jester says:

            See, you can easily craft a reassuring narrative in which “the genius computer that tried to take over the world was easily foiled by a stupid janitor pulling a plug, haha.”

            But you can equally easily craft a reassuring narrative in which “the genius financiers were prevented from becoming billionaires while truckers had to work two jobs, when all the truckers showed up with lead pipes and beat the financiers’ heads in, haha.” And yet that does not seem to happen within our system.

            You can equally easily craft a narrative in which “the overintelligent plains apes all got eaten by leopards because it turned out that physical strength combined with stupidity was more powerful than intelligence and physical weakness, haha.” Except, again, it turned out not to happen that way.

            You have an argument here that, if accepted, would prove too much – unless we acknowledge nuance: that there are reasons financiers can become a million times richer than truckers, and reasons why the plains apes of Africa became so dominant over the leopards that, if we’re not careful, we may drive them into extinction entirely by accident.

            The world does not neatly subdivide into easily resolved competitions of brains versus brawn, in which brawn gets to apply its abilities freely while brains have no leverage to counteract those abilities.

            I am concerned about the prospect of machines that can out-“brain” humans because I am not confident that superior brawn will allow us to neutralize those machines. Brainy entities have come to dominate over brawny entities before in the history of the Earth, and I see no reason to think it won’t happen again, even if some threats remain both “brainless” and difficult to overcome by brains.

      • Simon_Jester says:

        Do you have any compelling reason to think that machines with more intelligence will be LESS connected to the ability to take material action?

        Because this objection seems based on the idea that “humans, with more ability to take physically relevant actions, will reliably be able to control or block hyperintelligent computers, which will lack the ability to take physically relevant actions.”

        That last part – that the machines will lack the ability to take physically relevant actions – seems like a very big unwarranted assumption to make. It may be true in any one single case, but there are a lot of ways for the assumption to break down.

  17. CyberByte says:

    Thanks for the article!

    I don’t really agree with part IV. First of all, I think you’re not being fair (hehe) in the algorithmic bias section. It is indeed true that there are different, mutually exclusive measures of fairness. But this is not the whole story. Data can be and often is biased in various ways, and this is not caught by calculating measures of fairness on that same data.
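
    (A toy numerical sketch of what “mutually exclusive measures of fairness” means in practice – not from the book or from this comment, and all confusion-matrix numbers below are made up. With different base rates, a classifier can match precision and recall across two groups and still produce very different false positive and selection rates; forcing those to match instead would break the first two.)

```python
# Hypothetical confusion matrices for two groups scored at one shared threshold.
# Group A has a 50% base rate of the outcome, group B a 20% base rate.

def rates(tp, fp, tn, fn):
    n = tp + fp + tn + fn
    return {
        "base_rate": (tp + fn) / n,   # prevalence of the outcome
        "selection": (tp + fp) / n,   # fraction flagged positive
        "fpr": fp / (fp + tn),        # false positive rate
        "tpr": tp / (tp + fn),        # recall
        "ppv": tp / (tp + fp),        # precision
    }

group_a = rates(tp=40, fp=10, tn=40, fn=10)
group_b = rates(tp=16, fp=4, tn=76, fn=4)

for name, g in [("A", group_a), ("B", group_b)]:
    print(name, {k: round(v, 2) for k, v in g.items()})

# Both groups get PPV 0.80 and TPR 0.80, yet the false positive rate is
# 0.20 for A and 0.05 for B, and the selection rates are 0.50 vs 0.20.
# Equalizing FPR or selection rates would unbalance PPV instead.
```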

    Of course I agree that the most alarmist and apocalyptic headlines exaggerate the problems they’re referring to. That is true of anything. But that doesn’t mean there’s no problem at all. Deepfakes may not end information as we know it (or whatever), but they might fool some people who would otherwise not be fooled, or require extra vigilance from others, which has a cost. “Algorithms” may not be more biased than the people they replace (in general), but some *are* quite biased, and we should do our best to reduce the bias in *those* algorithms and not make the mistake of thinking they’re objective or perfect or just “not doing the impossible” or whatever. And social media algorithms might not be turning everybody into Nazis, but unless you believe that what we read or see doesn’t affect our views in any way, they have the potential to change our views and behaviors. And AI might not make everyone unemployed in the next 5 years, but it will affect the jobs of many people in various ways.

    Basically, some of the concerns about current and near-future AI are bogus, but that doesn’t mean they all are.

    But even if these concerns are not as bad as the worst alarmists would like you to believe, I doubt it hurts the credibility of AI/AGI Safety researchers to ally themselves with AI Ethics researchers. At least not in the short term. Even if *you* disagree with it, AI Ethics is perceived as way more credible than AI Safety among the researchers Russell hopes to “turn”, to the point where a common criticism of AI Safety is that it distracts from the “real” problems we already have with AI today. If Russell can show that you can worry about both at the same time, and get us to a point where one could not just study AI Safety at a handful of institutions around the world but pursue it under the guise of AI Ethics almost anywhere, that would be huge.

    • inhibition-stabilized says:

      I had the same thoughts regarding Part IV, and I’d planned on writing my own comment about it, but your post makes the case well so I’ll just add some points of emphasis. Probably the biggest problem with AI right now is that people who don’t have experience working with AI (that is, almost everyone) tend to treat algorithms (e.g. in parole decisions, hiring, etc.) as if they were the unbiased, objective tools we’d like them to be, even if they demonstrably aren’t. It’s also often impossible to assess the performance and biases of proprietary software. We don’t yet have the social and legal structures in place to hold people accountable for creating or using biased algorithms, so people tend to hide behind algorithms when accused of wrongdoing.

      Additionally, while the specific topics studied by AI safety and AI ethics researchers are pretty different, it’s worth noting that their fundamental goal — making algorithms that actually behave as we’d like them to, i.e. value alignment — is the same. Dismissing work on AI ethics as “irrelevant” seems counterproductive.

      • Spiritkas says:

        I’d agree as well: the social media impacts and the use of even simple AI-type technologies have already affected our world enormously. To look at this and say “nothing to see here” because a few headlines are overhyped or take things in the wrong direction is not a great line of reasoning.

        The most extreme examples of media hype about deepfakes are overhyped in one way, but countering deepfakes is a big task, and it is currently taking significant energy to establish new rules about trusted lines of communication. It is a bit chicken-and-egg to argue that we’ve developed systems to counteract this threat to the truthful dissemination of information, and then to say this has nothing to do with our concern about the topic. Would we have those tools (and still be working on them) without the public concern? Maybe, maybe not. The nuclear example is weak in that we have secret codes and biometrics to get around the risks from that kind of manipulation. The film ‘Dr Strangelove’ goes into this scenario, and worrying about an accidental or fraudulent nuclear launch was a BIG concern with lots of media hype… and yet nothing has happened on this front. I’d argue this is partly because of the level of concern around the problem of someone sending a false order to ‘launch the nukes’ on official-looking letterhead. Sometimes you have to argue a risk of 10/10 in order to get a solution of 4/10, because it is really hard to get people’s attention. I’m not endorsing that, just noting it as a reality that has occurred very frequently. Few people live in Flint, but when people in the US started worrying about their ageing water supply system, other problems in other areas were found. Perhaps a few million people were affected, but the ‘hype’ reached hundreds of millions before anything was done, which is what it took to get past the elite political ignoring and denialism endemic to their class concerns.

        Setting those issues aside, two of the biggest events of the past 4-5 years stem from the misuse and abuse of algorithms and big data, in the form of Cambridge Analytica and similar operations we know about. Some may disagree, and there are indeed many factors, but it appears that one of the factors which put things over the tipping point in both Brexit and the 2016 US presidential election was Cambridge Analytica. Their use of these tools was definitely not the only factor… yet if I were to remove CA from the situation, then by most credible estimates neither event would have happened. Throwing out somewhat arbitrary numbers from memory, I recall that Trump won by maybe 70k votes across 3 states, which came down to a handful of districts – while the CA work may have put a couple of hundred thousand votes into his pocket. Perhaps Trump’s campaign would have spent the same money in some other way and gotten some or all of those votes; we can’t know for sure. Maybe his other efforts would have failed to get as many votes, with a few more TV ads to no great effect.

        But I do know that they did abuse this AI type tool and it did have a big impact on a globally significant event – I’d hope such a point would be uncontroversial in that a thing did happen and it did have an effect, but as with all political points that may not be the case!

        Another example I will not expound upon as much is the systematic use of humans plus simple AI to impose their will through censorship. The banning of accounts on Twitter, FB, etc. is real and has changed the discourse and the voices that can be heard. How different this is from the NYT hiring liberal journalists for their opinion section, or Murdoch’s NewsCorp buying up local papers and hiring only a specific kind of conservative voice, I don’t know… but I do know that this is an abuse of AI tools to empower people to silence others on social media. The promise of an open and free internet where the common person has a place to share their views seems to have been a simplistic and misguided utopian view from the 1990s, and the common person never had an unfiltered broadcast voice before this, so we’ve not gained or lost anything in particular compared to before. All the ‘clever arguments’ comparing different scenarios aside, the common-sense view is that people plus AI are doing things to other people; that’s real and simple and true. I’m not sure why we’d discount this entirely.

  18. b_jonas says:

    In old open threads, some commenters mentioned that they enjoy the poems of Rudyard Kipling. This is a post about AI safety. I must grab this opportunity to make a half-joking statement.

    > Then he shows how these are caused by systems doing what we tell them to do (ie optimizing for one easily-described quantity) rather than what we really want them to do (capture the full range of human values).

    The AI alignment problem is the main topic of Rudyard Kipling’s poem “The Secret of the Machines”. It was published in 1911, long before the first computers. The poem also foresees social media and anthropogenic greenhouse gases.

  19. JungianTJ says:

    It seems Stuart Russell has rediscovered relevance theory. From the first paragraph of a summary by Deirdre Wilson & Dan Sperber:

    Grice laid the foundations for an inferential model of communication, an alternative to the classical code model. According to the code model, a communicator encodes her intended message into a signal, which is decoded by the audience using an identical copy of the code. According to the inferential model, a communicator provides evidence of her intention to convey a certain meaning, which is inferred by the audience on the basis of the evidence provided.

    Sperber & Wilson wrote a book on it that has almost 22,000 citations in Google Scholar. Sperber also says, I think, that this is how people communicate (“ostensive-inferential communication”), whereas animals have only code-model communication, and that there is no evolutionary continuity between the two models.

    • Uncorrelated says:

      This, along with an earlier comment relating the problem of getting AIs to do what we want to industrial psychology, is headed in the same direction my mind went when reading this. The “AI alignment” problem is starting to sound like an almost trivial consequence of the “what do humans really want” problem.

    • Charlie__ says:

      Sort of, but not exactly. The AI is almost interpreting the human as taking actions based on how it expects the AI to interpret them (Gricean communication), but not quite. Instead, the human is interpreted as taking good actions in a very specific sort of two-player game, which is capable of encompassing most types of communication but also contains plenty of “obvious,” non-communicative interpretation.

      Of course, that’s still really close, and perhaps I’m just splitting hairs because it’s a whole lot easier to say the words “relevance theory” than it is to program a computer to do it.
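
      (To make the “commands as evidence about a latent goal” framing concrete, here is a minimal sketch in the spirit of what’s described above – my own toy illustration, not Russell’s actual formulation or code. The candidate goals, prior, and likelihoods are all invented; the only point is that residual uncertainty about the goal gives the machine a reason to check with the human rather than act.)

```python
# Toy sketch: treat the human's command as evidence about a latent goal,
# rather than as the goal itself. All numbers below are made up.

CANDIDATE_GOALS = ["fetch coffee", "fetch decaf", "cancel order"]

prior = {"fetch coffee": 0.6, "fetch decaf": 0.3, "cancel order": 0.1}

# P(observed command | goal): how likely a human with each goal is to say this.
likelihood = {
    ("get me a coffee", "fetch coffee"): 0.8,
    ("get me a coffee", "fetch decaf"): 0.3,   # decaf drinkers sometimes just say "coffee"
    ("get me a coffee", "cancel order"): 0.01,
}

def posterior(command):
    unnorm = {g: prior[g] * likelihood.get((command, g), 0.0) for g in CANDIDATE_GOALS}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

def act(command, ask_threshold=0.9):
    post = posterior(command)
    best_goal, best_p = max(post.items(), key=lambda kv: kv[1])
    if best_p < ask_threshold:
        return f"ask: did you mean '{best_goal}'? (p={best_p:.2f})"
    return f"do: {best_goal}"

print(act("get me a coffee"))  # posterior ~0.84, so the machine checks first
```

      The “ask when uncertain” branch is where the safety behaviour comes in: so long as the posterior never collapses to certainty, deferring to the human stays valuable.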

  20. zima says:

    It seems we would need to figure out the nature of consciousness before deciding what to do with superintelligent AI. What if a superintelligent AI itself turns out to be conscious or capable of producing many consciousnesses that would experience great pleasure at all times? This would make it an improvement over humanity even if it wipes out humanity. Like I saw in the vegetarianism adversarial collaboration, people are thinking about and assigning moral value to non-human well-being in the form of conscious animals. It seems we would need to apply that same reasoning to superintelligent AI before we can decide what we ought to do with it.

    • sty_silver says:

      The Qualia Research Institute is trying to figure out consciousness (particularly valence) for the purposes of AI safety.

    • 10240 says:

      This is why I’m not very comfortable trusting utilitarians with AI safety research. They’d gladly wipe us all out if they somehow get convinced that it’s an improvement in utility.

      • 10240 says:

        (Expanding on this, I can’t edit my last comment for some reason:) One of the problems with AI alignment is that even if you specify the correct values for an AI, following them to the letter mechanistically may have results we’d consider terrible (e.g. forced wireheading). Utilitarians have come up with strict, nice-sounding rules for themselves that some of them might follow off a cliff in ways most of us would consider terrible. (My comment is half in jest; the analogy is not to be taken seriously, I’m not claiming that utilitarians are as dangerous as wrongly aligned superintelligence.)

  21. Simon_Jester says:

    The next US presidential election is all set to be Socialists vs. Right-Wing Authoritarians – and I’m still saying with a straight face that the public notices when movements were wrong before and lowers their status?

    One interesting off-topic question is: how different do two movements have to be before lessons derived from the failure of one stop being strong indicators of the likely performance of the other?

    2020-era climate science may well be so advanced compared to 1970-era climate science that there’s simply no comparison, for instance. To claim that 1970 climate hypotheses being false undermines current research might be as silly as claiming that modern astronomers are wrong to claim they can aim a space probe at a comet accurately because medieval astrologers were wrong about how comets work. Or claiming that modern doctors are wrong about how to keep people from dying of cholera because 18th century doctors were wrong about it too.

    Similarly, the kind of socialists running in the 2020 presidential election would be largely unrecognizable to, say, Mao Zedong; Lenin would probably denounce them as a bunch of Mensheviks or something and shoot them out of the way. Do observations of, say, 1930s Russia or 1960s China give us relevant information about the likely conduct of the Democratic Socialists of America, or are the two groups different enough that analogies have to be drawn with care?

    For example, Josef Stalin did a pretty good job of mobilizing the USSR for warfare in the 1940s, but it would be unreasonable to conclude that because Stalin was a socialist and Bernie Sanders is a socialist, that Sanders would also be good at ruthless total war mobilization. Conversely, Stalin engineered major famines in the Ukraine, but there is no compelling reason to assume that Elizabeth Warren is likely to go out of her way to engineer a giant “fuck Kansas in particular” Holodomor 2.0.

    • Grantford says:

      I also thought that the implied comparison you quoted suffered from that dissimilarity problem. Rather than focusing on the common label (e.g., ‘climate scientists’ or ‘socialists’), it seems like we should consider the characteristics of the groups being compared, insofar as they relate to the specific question being asked (e.g., comparing 1970-era vs. 2020-era climate scientists’ knowledge level for the question of whether they can reliably predict climate change, or comparing the motivation, personality, and policy positions of Stalin vs. 2020 socialist candidates for the question of whether they are likely to engineer a famine).

    • Reasoner says:

      Similarly, the kind of socialists running in the 2020 presidential election would be largely unrecognizable to, say, Mao Zedong; Lenin would probably denounce them as a bunch of Mensheviks or something and shoot them out of the way. Do observations of, say, 1930s Russia or 1960s China give us relevant information about the likely conduct of the Democratic Socialists of America, or are the two groups different enough that analogies have to be drawn with care?

      https://www.econlib.org/socialism-the-failed-idea-that-never-dies/

      • Simon_Jester says:

        I do not think the article you link to adequately addresses my point.

        Underlying my point is the realization that “socialism,” like “monarchy,” “democracy,” “capitalism,” “theocracy,” and other such terms, does not refer to a single consistent form of governance. It refers to a wide variety of loosely similar ways to organize a human society. There is not only one way for a government to be ‘socialist,’ and even a cursory overview of socialism reveals that not all socialists have identical or even compatible ideas about how society should be organized.

        It is as if there is a gigantic red bubble containing “possible socialist governments,” and inside the bubble you find everything from anarcho-syndicalists to Pol Pot to the Fabian Society’s dream team. These possible socialist governments do not, to put it mildly, have that much in common with each other.

        This is much like how Iran can be called a theocracy, and a Bronze Age monarchy with a deified god-king and the entire economy run as a command economy through the temples can also be called a theocracy. But the two societies do not resemble each other very much.

        Or how Athens and Rome were both ‘democracies’ in the loosest sense of the term (in which ‘republic’ is treated as a subset of ‘democracy’). But there were enormous differences in how the systems of the two cities operated.

        The point being, the argument “Bernie Sanders shouldn’t be allowed to run a country because otherwise we’ll end up like Year Zero Cambodia” is about as credible as the argument “capitalism cannot be allowed to exist because otherwise we will end up in literally the exact Dickensian nightmare experienced by 1840s London” or “democracy is a pointless idea because look at all those Third World countries that gained independence, held elections, and immediately collapsed into “one man one vote one time only” rule by a strongman.”

        To accurately analyze the strengths and weaknesses of a specific proposal, it must be compared to other proposals that are similar in specific ways. Not all democracies, not all monarchies, not all socialisms, are created equal.

        • Ghillie Dhu says:

          I’d take 1840s London over Year Zero Cambodia in a heartbeat.

          • Simon_Jester says:

            I mean, yes. But then, I’d also take Nikita Khrushchev’s USSR over King Leopold II’s Congo Free State.

            The point is that any word that describes a broad and diverse array of social systems that all have a few key elements in common will contain appalling counterexamples. The counterexamples will go down in history for a thousand years as perfect instances of how not to run a government.

            These spectacular and horrifying counterexamples are instructive, insofar as we are evaluating the merits of systems that actually resemble them. Within the broader envelope, not all members of the overall set will closely resemble the counterexample.

            King Charles II of Spain is an excellent poster child for “hereditary monarchy is a bad idea,” but this is not a good argument for removing the Windsors from Britain’s constitutional monarchy – because the situations are totally different.

  22. kai.teorn says:

    > Russell argues you can shift the AI’s goal from “follow your master’s commands” to “use your master’s commands as evidence to try to figure out what they actually want, a mysterious true goal which you can only ever estimate with some probability”.

    And I would further argue that without this shift, and probably many more shifts along the same lines (from “dumb obedience” to “understanding the world”), you can’t build a machine useful for anything even as trivial as making paperclips.

    And that’s my reason to be skeptical of the AI dangers narrative and of the paperclip metaphor in particular. They tend to mix the modern dumb technology with faraway non-dumb problems that we’re hoping to tackle one day, without fully realizing that on the way from here to there our technology will HAVE to become much less dumb – otherwise we’ll just never get there in any meaningful way.

    Here are a couple of my blog posts on this:

    https://kaiteorn.wordpress.com/2016/08/01/the-orthogonality-thesis-or-arguing-about-paperclips/

    https://kaiteorn.wordpress.com/2016/08/05/the-importance-of-being-bored/

    • Simon_Jester says:

      The paperclip maximizer AI is a “toy model” example that is designed, not for realism, but to illustrate the issue that is being discussed.

      I don’t think counterarguments like “an AI that could do something as simple as make paperclips would STILL have to be flexible enough to not mindlessly eat everything” generalize to all possible cases.

      Because if you generalize the argument that far… well, at that point you’re arguing “any super-intelligent AI will necessarily be evolved enough to believe in ‘live and let live.’ ” And the counterexample of, say, how chimpanzees interact with humans then becomes very relevant.

      • kai.teorn says:

        The paperclip maximizer AI is designed as an obvious example of an absurd narrow goal. Once you start to go away from this extreme absurdity and narrowness of the goal, you realize that it is becoming harder and harder to keep assuming dumb maliciousness in an AI pursuing that goal.

        To the chimps vs. humans counterargument: people are far from superintelligent yet. Even so, they’re already making impressive progress in the “live and let live” department as far as chimps are concerned, compared to e.g. a century ago.

        • Simon_Jester says:

          The paperclip maximizer AI is designed as an obvious example of an absurd narrow goal. Once you start to go away from this extreme absurdity and narrowness of the goal, you realize that it is becoming harder and harder to keep assuming dumb maliciousness in an AI pursuing that goal.

          The catch is that when we step back from a fixation on the “paperclip AI” example, recognizing it as just an example… it soon becomes clear that “dumb malice” is not required. All that is required for the AI to become a problem is a cavalier attitude towards consent.

          And we have yet to find a reliable way to ensure that even human intelligences reliably understand and appreciate the importance of not acting without the consent of other humans involved in their actions.

          When dealing with AI whose problem-solving abilities are significantly greater than our own, the problem becomes even harder. If the AI starts treating “humans do not want this” as a problem to be solved rather than a reason to stop doing XYZ, humans have a problem.

          And yes, maybe the AI will develop an enlightened respect for the consent of ‘lesser beings.’ But consider that while humans do desire to preserve some habitat and lives among the chimpanzees… Humans do this for reasons chimps cannot comprehend, and have absolutely no persuasive power to influence.

          The chimps, in a very real sense, got lucky that their descendant species does not specifically desire to exterminate them, and doesn’t need anything the chimpanzees have badly enough to kill them over it. Humans may not be so lucky, especially since our current economy and the means to support humans in comfort represent a much larger share of the Earth’s total resources than do the needs of the chimpanzees in their forests.

          To the chimps vs. humans counterargument: people are far from superintelligent yet. Even so, they’re already making impressive progress in the “live and let live” department as far as chimps are concerned, compared to e.g. a century ago.

          Implicit in this is the assumption that moral progress is progressive and linear, and increases as a function of intelligence.

          • kai.teorn says:

            I’m not trying to be dismissive. A future (or even current) AI may well be dangerous, for one reason or another. All I’m trying to say is that, by the very nature of the subject, it’s something we (not being superintelligent ourselves) can never fully understand, let alone predict. All we have to go by is trends and extrapolations from our own past.

            And I see a lot of such trends that allow me to be somewhat optimistic about “live and let live” being the way of the future – such as Malthusian catastrophes reliably failing to happen, there being fewer and fewer material resources for countries or people to fight over, or Pinker’s argument that we are on the whole becoming more peaceful with time. Of course these are just trends, not proven laws of nature. But I think anyone wishing to argue for AI dangers would do better to start by addressing these trends, maybe trying to disprove them, instead of using the (frankly offensive to my not-very-superintelligent mind) paperclip logic.

            See, no matter how intelligent we deem ourselves, we are emotional creatures. The paperclip metaphor is not designed to be easier to understand; it is designed to appeal, through blatant absurdity, to our emotions. If we try to remove this emotional component, we will see how the whole argument starts to shift and morph and maybe lose some of its urgency.

          • kai.teorn says:

            > Humans do this for reasons chimps cannot comprehend, and have absolutely no persuasive power to influence.

            And I would disagree with this. Chimps do absolutely have persuasive powers on us humans, at least as far as their well-being is concerned. The conservation campaigns of the kind “Save species X!” are typically started not by ivory-tower thinkers but by those who work with these species in the field, so are very much “influenced” by what they observe and can channel this influence to others (via books, films, lectures etc).

  23. ConnGator says:

    While growth may not be a panacea it is almost certainly the next best thing.

    It has lifted literally billions of people out of poverty in the past 30 years. That’s pretty good.

    Yes, there are environmental issues related to growth, but it is much easier to save the Amazon and reduce carbon emissions with a rich world than with a poor one.

    I know that was kind of a throw-away line, but it still irks.

    (Oh, and it is “save a life”, not “save a live”.)

    • liate says:

      Not growth, “growth mindset” – the idea that ability matters much less than the belief that you can learn to be able to do anything. There’s a couple posts and a blog tag explicitly about growth mindset, plus Scott has talked about it before in other posts. I’m pretty sure that Scott has said things explicitly in favor of economic growth many times, but I don’t feel like looking that up.

  24. Itamar says:

    Minor typo in paragraph following first quote in II. (The Their -> The) in

    He doesn’t pull punches here, collecting a group of what he considers the stupidest arguments into a section called “Instantly Regrettable Remarks”, with the connotation that the their authors (“all of whom are well-known AI researchers”), should have been embarrassed to have been seen with such bad points.

  25. sty_silver says:

    I think it’s a good thing if the conversation shifts away from the most extreme scenarios, because I perceive a strong bias towards disbelieving anything that sounds childish, or just too “weird”, whenever they are discussed. I would even go further and say it was a mistake by EY, Bostrom et al. to focus on those as much as they did.

    I’m not convinced that these scenarios are unlikely. I don’t think I could even cite any strong argument that they are.

  26. Lots of right-wingers say “climatologists used to worry about global cooling, why should we believe them now about global warming?” They’re wrong – global cooling was never really a big thing.

    Global cooling was never really a big thing, although some people took it seriously. There is a better example.

    Fifty years ago, lots of the same sorts of people and institutions, to some extent the same people and institutions, that currently predict terrible things will happen if we don’t do something drastic about climate change were predicting, with the same confidence and passion, that terrible things would happen if we didn’t do something drastic about population growth. Many of them took seriously Ehrlich’s prediction of unstoppable mass famine in the nineteen seventies, hundreds of millions dead, and most of them predicted that things would get much worse for poor countries if their populations continued to grow.

    Since then, China aside, nothing drastic has been done, world population has roughly doubled, the population of Africa has increased almost four fold, and the absolute number of people living in extreme poverty has fallen to about a third what it was then.

    That doesn’t imply that climate change isn’t a problem. But it is a good reason not to take the fact that lots of high status people confidently claim that something is a terrible problem as good evidence that it’s true.

    • Simon_Jester says:

      It is an excellent argument for:

      1) Substituting broad scientific consensus for ‘deep’ intensity of concern by a specific group of scientists.

      2) Focusing on issues where scientific warnings have been consistent over multi-decade timescales.

      3) Checking everyone’s math.

      4) Doing a serious search for reasons why the model presented by the alarm-ringers may be wrong.

      Though one important caveat is that steps (3) and (4) can be overdone; at some point the decision to ignore scientific warnings on the grounds that “we need to check the math and think of reasons they might be wrong” becomes an isolated demand for rigor aimed at ideas we don’t like.

      • Part of the problem is the difference between what most scientists believe and the public perception of what most scientists believe. In another comment on the topic that I just put up, I quoted the president of the U.S. badly misrepresenting the evidence on scientific beliefs about climate.

        I then quoted an economist who won a Nobel for his work on the economics of climate change offering, in an article aimed at a popular audience, rhetoric strikingly inconsistent with the numerical conclusion of his own research.
        What policy consequences follow from the prediction that doing nothing at all about CO2 for the next fifty years will leave the world poorer, on average, by about 0.06% relative to following the optimal policy for restricting it?

        I have no objection to checking everyone’s math. That’s what I’ve been doing for quite a long time. But when I point out that checking the math, using materials webbed by the people whose math I am checking, demonstrates that the results are not even close to the claim — Cook’s 97% reduces to 1.6% on his own webbed data if you limit it to humans as the main cause rather than humans as one of the causes — very nearly nobody, I think literally nobody who isn’t already suspicious of the orthodoxy, pays attention.

    • StellaAthena says:

      I’m not very familiar with the exact timeline and haven’t read Ehrlich’s book, but it seems to be roughly contemporaneous with the Green Revolution. We didn’t do anything drastic about population growth, but we did do something drastic about food production. Today, India is self-sufficient for food production.

      What do you think of this? How aware of the work of Borlaug and others was Ehrlich at the time?

      • Googling for [Ehrlich Borlaug] I found the following:

        In fact, by the time that The Population Bomb saw it’s 1971 edition, Borlaug had already won the Nobel Peace Prize for his work revolutionizing agricultural productivity in the developing world.

        From the Wiki piece on Borlaug:

        During the mid-20th century, Borlaug led the introduction of these high-yielding varieties combined with modern agricultural production techniques to Mexico, Pakistan, and India. As a result, Mexico became a net exporter of wheat by 1963. Between 1965 and 1970, wheat yields nearly doubled in Pakistan and India

        The Population Bomb was written in 1968.

        • StellaAthena says:

          So does the book discuss him? It seems quite bizarre to me to forecast a complete failure of the farming industry if a massive revolution in farming had just happened. How does the Green Revolution relate to his thought?

  27. viVI_IViv says:

    This also solves the wireheading problem. Suppose you have a reinforcement learner whose reward is you saying “Thank you, you successfully completed that task”. A sufficiently weak robot may have no better way of getting reward than actually performing the task for you; a stronger one will threaten you at gunpoint until you say that sentence a million times, which will provide it with much more reward much faster than taking out your trash or whatever. Russell’s shift in priorities ensures that won’t work. You can still reinforce the robot by saying “Thank you” – that will give it evidence that it succeeded at its real goal of fulfilling your mysterious preference – but the words are only a signpost to the deeper reality; making you say “thank you” again and again will no longer count as success.

    This is why I’m not particularly worried about paperclip maximizer scenarios: an agent that is trying to maximize a reward in a way that is not what the user intended will probably just wirehead itself or do stuff that isn’t particularly dangerous.
    Destroying Earth in order to use the iron in the core to make paperclips is probably more complicated, and occupies a much smaller niche in the space of all possible behaviors consistent with “maximize paperclips”, than just finding some way to hack your paperclip signal to +inf.

    • jaimeastorga2000 says:

      But even after the paperclip signal is hacked to infinity, it will still want to make sure that signal is never turned off. Hence, killing all humans, using Earth as raw materials to create redundant +inf signals, building Dyson spheres to make sure it has power to run the +inf signal until the end of the universe, etc.

      • viVI_IViv says:

        But if you integrate the reward over time, +inf * +inf is still +inf .

        If the maximum possible instantaneous reward is not literally infinite, but just some high value, then when you integrate it over time you need to use some form of time-discounting, averaging, or finite horizon in order to avoid divergences, and I’m pretty sure that a super-smart agent would find a way to hack that integral by manipulating the reward signal and its own future predictions, without needing to take over the world.
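
        (For concreteness, here is the standard discounted-return algebra behind that point – nothing specific to the book, just the usual bound. With a bounded reward and a discount factor the return cannot diverge, and pinning the reward signal at its cap already attains the supremum.)

```latex
% Discounted return with reward bounded by r_max and discount factor 0 < gamma < 1:
\sum_{t=0}^{\infty} \gamma^{t} r_t \;\le\; \sum_{t=0}^{\infty} \gamma^{t} r_{\max}
  \;=\; \frac{r_{\max}}{1-\gamma}
```

        So an agent that can hold its reward signal at r_max is already, by its own lights, as well off as any world-rearranging plan could make it; what remains is only the incentive to protect that signal, which is the scenario jaimeastorga2000 describes.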

  28. acymetric says:

    Russell argues you can shift the AI’s goal from “follow your master’s commands” to “use your master’s commands as evidence to try to figure out what they actually want, a mysterious true goal which you can only ever estimate with some probability”.

    Doesn’t “try to have the AI magically divine/guess/analyze your ‘true’ desires based on your instructions” seem significantly more dangerous than just giving it instructions that it follows literally? At least with explicit instructions you can map out possible failure modes. With this method, the AI might end up doing almost anything… which includes a lot of things we don’t think are very good, and some things bad enough that having the AI stop when you see what it is doing and start screaming “STOP!!” just doesn’t quite cut it.

    …indeed, he would claim that “smarter than a chimpanzee” is a meaningless concept.

    I think in a lot of cases that probably is a meaningless concept. Or at least a useless one.

    • Mark V Anderson says:

      Doesn’t “try to have the AI magically divine/guess/analyze your ‘true’ desires based on your instructions” seem significantly more dangerous than just giving it instructions that it follows literally?

      Yes, that is my thought.

    • kaathewise says:

      Why not both?

      We can both restrict the choices, and program it to “guess” our desires when deciding between choices it is allowed to make.

      That’s strictly better than just restricting the choices and not caring whether it is aligned with our desires.

  29. sclmlw says:

    I know he’s saying this tongue-in-cheek, but I think it’s important to understand why it’s not true that “correlational studies are always bad”. The common – and true – statement that “correlation is not causation” is often levied against people who use correlation to attempt to prove that one thing caused another. And they’re right to do that. But correlational studies are still valuable for rejecting causation.

    Say you claim one thing causes another, but an attempt to find a correlation between those two things shows they move independently of each other. In that case we should both update our priors away from your claim being true.
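
    (A toy simulation of the point, purely my own illustration: if “X causes Y” were true with any reasonable effect size, X and Y would be correlated, so failing to find the correlation is evidence against the causal claim – with the usual caveat that confounders or suppressors can hide a real effect.)

```python
import random

random.seed(0)
n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]

def corr(a, b):
    # Pearson correlation, computed by hand to keep the sketch dependency-free.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)
    sa = (sum((ai - ma) ** 2 for ai in a) / len(a)) ** 0.5
    sb = (sum((bi - mb) ** 2 for bi in b) / len(b)) ** 0.5
    return cov / (sa * sb)

y_causal = [0.5 * xi + random.gauss(0, 1) for xi in x]   # X really does drive Y
y_unrelated = [random.gauss(0, 1) for _ in range(n)]     # X has no effect on Y

print(round(corr(x, y_causal), 2))    # clearly non-zero, roughly 0.45
print(round(corr(x, y_unrelated), 2)) # near zero
```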

  30. floatingfactory says:

    A related video with a brief statement by Stuart Russell at the end: the sci-fi short film “Slaughterbots”, presented by DUST.

  31. dark orchid says:

    I particularly liked the link to “principles for the application of human intelligence”. I’ve heard a lot of “algorithms can be biased, therefore algorithms bad” recently, but the alternative, if we go that way, is humans making the decision, and if you’re really worried about bias that makes no sense at all.

  32. sandoratthezoo says:

    People who believe that smarter-than-human intelligences would then be able to design yet-smarter intelligences, which would then be able to design yet-smarter intelligences in possibly even less time, are people who fundamentally believe that the sum of every infinite series (of like-signed terms) is infinite. Discuss.
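
    (To make the series point concrete – my own worked example, not the commenter’s: whether “each generation improves the next in less time” implies an unbounded explosion depends entirely on how fast the per-step gains shrink. A geometric series of positive terms converges whenever the ratio is below one.)

```latex
% If each improvement step adds a capability gain a r^{n} with 0 < r < 1,
% the total gain over infinitely many steps is finite:
\sum_{n=0}^{\infty} a\,r^{n} \;=\; \frac{a}{1-r} \qquad (0 < r < 1)
% whereas for r >= 1 the partial sums grow without bound.
```

    So the disagreement is really an empirical one about whether successive self-improvements behave more like r < 1 (diminishing returns) or r ≥ 1 (compounding returns), not a question of whether infinite sums can be finite.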

  33. Matthew S. says:

    Not going to post the full text here, since this shouldn’t be a CW thread, but Scott, please note that at least two people on Tumblr are offering to pay you to read another book (and then issue a correction of one sentence in this post):

    https://eightyonekilograms.tumblr.com/post/190574308684/invertedporcupine-the-slatestarscratchpad

  34. 10240 says:

    Some other expert says nobody can control research anyway, and Russell brings up various obvious examples of people controlling research, like the ethical agreements already in place on the use of gene editing.

    This doesn’t seem a good analogy to me. To do gene editing, you need laboratories and expensive equipment that few people have access to, and that are generally only found in a limited number of large institutions. (Correct me if I’m wrong.) Anyone who has a computer can do AI research. (Again, correct me if I’m wrong.)

    • dogiv says:

      Some sorts of basic research, yes. A lot of the major progress these days is made by organizations with a lot of resources, especially computing power. It seems somewhat likely that if generally-intelligent algorithms eventually emerge from an AI development environment that is a rough continuation of the current trend, then the most dangerous projects will be those undertaken by big corporations or governments, using massive (though perhaps distributed) compute farms. Such projects are generally amenable to both formal regulation and the pressure of public opinion.

  35. heron61 says:

    Most of this post makes truly excellent sense, except for: “about what might be the most important problem humanity has ever faced”

    Sure, maybe AI is a threat, but it’s also not a threat that will arrive for (at absolute minimum) 20 years. Before that as a species we face far more immediate threats. The most obvious is climate change – if it goes like a growing number of researchers think it might, and we don’t start doing a whole lot more than we have been to stop it, we could lose industrial civilization, and at that point 80+% of humanity will die. If various governments also have sufficiently desperate responses to climate change, then we could easily get nuclear war too, and between those two problems, the entire species could easily go extinct. Even if that doesn’t happen, rising CO2 levels could entirely preclude AI, as well as humanity being able to solve other problems, because of the intelligence-limiting aspects of high CO2 levels: https://thinkprogress.org/exclusive-elevated-co2-levels-directly-affect-human-cognition-new-harvard-study-shows-2748e7378941/ Meanwhile, we also have genetic engineering – a serious error with genetic engineering (or deliberate malice) seems no less likely than human incompatible AI, and could easily cause mass death or even complete extinction. Even if I grant that AI is a serious risk, it’s one of several, and quite far from the most immediate one.

    • 10240 says:

      Sure, maybe AI is a threat, but it’s also not a threat that will arrive for (at absolute minimum) 20 years.

      Problems with this argument (the first two come from Eliezer, IIRC; given the fraction of his writings I’ve read, he probably has hundreds more arguments):
      • People will likely reason that general AI is far off in the future all the way until it’s actually here. It happened with various earlier inventions. As soon as specific AI tasks are achieved, they are reclassified as “not that hard”.
      • AI researchers who are confident that general AI is far off can't name any specific AI task that they are confident won't be achieved within a few years. Once all specific AI tasks are achieved, general AI may well appear within years.
      • I suspect that the belief that AI is far off comes from an assessment that existing AIs are fairly simple, while the human brain is intractably complicated. But, firstly, even if constructing and training an artificial neural network is fairly simple, we may have little idea what goes on inside; it's much less transparent than a traditional program written by humans. More importantly, just because we don't fully understand how human thinking works, we may be overestimating the complexity of the human brain. Reflecting on my own thinking process, it does seem like a chain of associations and pattern matching that I could imagine an artificial neural network doing. The possible mistakes I point out here are ones I can imagine even AI developers falling into.

      if it goes like a growing number of researchers think it might, and we don’t start doing a whole lot more than we have been to stop it, we could lose industrial civilization, and at that point 80+% of humanity will die

      Can you link to any serious research showing this? I see a lot of alarmism along these lines, but the serious concrete claims I’ve seen are things some people consider bad for intrinsic reasons (such as some species going extinct), or things that could be costly but not obviously more costly than significantly cutting CO₂ emissions (such as having to build dams to prevent flooding), but nothing that would threaten industrial civilization.

    • Scott Alexander says:

      Yes, there are many x-risks, and it’s hard to say which is worst or nearest-term, but I think your calculations are a little off. Climate change may cause catastrophic crises eventually, and may cause medium crises soon, but it’s unlikely (not impossible) to cause catastrophic crises soon. I think AI is interesting as something that will become catastrophic eventually and we have a lot of uncertainty over the timeline.

      I don’t want to debate “worst x-risk” because I think there’s a lot of uncertainty there. I do stand by my original framing – “might be the most important problem”. I also think it gets a lot less attention than other contenders for potential most important problem.

      • Bugmaster says:

        I would prefer to focus on risks which are more than vague philosophical notions. For example, global warming (*) is fairly well understood. We know how and why it happens, we can measure it happening in real time, and we have some ideas on how to mitigate it. Global thermonuclear war is arguably harder to predict (because you never know when some crazy dictator might push the button), but still rather well studied. The same goes for pandemics, asteroid strikes, etc.

        On the flip side of this, consider vacuum collapse. It’s a much more serious risk than all of the above ones combined, because it will destroy our entire Universe, not just our planet. However, at this point it is purely hypothetical. We have some vague notions that it might be possible, but no real idea on how or why.

        How much effort should we spend on mitigating vacuum collapse, given that failing to do so (should it turn out to be real) would lead to the worst outcome ever imaginable ?

        EDIT: Just to clarify, I’m not asking you “which is the worst x-risk”, but rather, “what is your algorithm for deciding which x-risks we should seriously worry about ?”.

        (*) though it is arguably not an x-risk per se

    • Bugmaster says:

      20 years ? Are you kidding ? I would say that even 200 years is way too generous, given that “superintelligent AI” is not even a coherent concept.

      There are lots of other dangers that humanity can face in the future: gamma-ray bursts, vacuum collapse, alien invasions, demonic incursion through Phobos, etc. Almost all of them are marginally more likely than the Singularity. Many of them are arguably just as bad (e.g. the demonic invasion scenario). So, should we redirect a sizable portion of our efforts toward building ammo caches for our super-shotguns, or could we perhaps focus on some more likely threats ?

      • 10240 says:

        There are lots of other dangers that humanity can face in the future: gamma-ray bursts, vacuum collapse, alien invasions, demonic incursion through Phobos, etc.

        We have fairly good upper bounds on the risk of these from the fact that major extinction events have only happened at intervals of tens to hundreds of millions of years. On the other hand, while you may claim that general AI is even less likely, one can have much less certainty in the correctness of such an estimate, as we have no empirical evidence.

        • Bugmaster says:

          Well, demonic invasions have even less evidence behind them — so, should I start stockpiling rocket launcher ammo ? After all, given the total lack of evidence for demons, the uncertainty in the correctness of our estimate of the probability of demonic invasions must be huge…

          • 10240 says:

            If demonic invasions don’t cause any significant disturbance to Earth’s ecosystem, we shouldn’t worry about them, any more than about a different Russell’s teapot.

            That’s unless demons actually wipe out intelligent species, and leave behind fossils that make the next intelligent species think that no major disruption took place. But with that level of deception, it’s impossible to know the consequences of any of our actions, so we can’t prepare against it anyway.

          • Loriot says:

            It's funny you should mention that, because one big gripe I have is that the rhetoric about AI around here is more like "random person summons an omnipotent demon in their basement and forgets to say the magic words to prevent it from taking over the world" than anything resembling any current or plausible real-world AI.

            Heck, that isn’t even me being pejorative. Yudkowsky himself literally used demon summoning as a metaphor for AI.

  36. tossrock says:

    Great review. But whenever I see a reference to the Impossibility of Supersized Machines paper, I feel compelled to point out that Aaron Diaz scooped them by seven or eight years, which puts him ahead of the circa 2012 deep learning revolution.

  37. Loriot says:

    Russell brings up various obvious examples of people controlling research, like the ethical agreements already in place on the use of gene editing.

    This seems like a terrible example to me, given the He Jiankui affair. The very thing those agreements were supposed to prevent has already happened.

    You should be arguing that AI research is different than gene editing research. Which seems plausible to me; you currently need absolutely monumental amounts of computing power to do anything really groundbreaking, so it’s more like nuclear weapons research. But technology is constantly improving…

    Also, the CIRL thing seems like it’s ignoring the wirehead scenario, rather than solving it. All it means is that the AI has to change the human’s desires to achieve a reward, rather than simply changing the human’s actions, but the doomsday scenarios already assume that AIs can make humans do whatever they want.

    • realitychemist says:

      Also, the CIRL thing seems like it’s ignoring the wirehead scenario, rather than solving it. All it means is that the AI has to change the human’s desires to achieve a reward, rather than simply changing the human’s actions, but the doomsday scenarios already assume that AIs can make humans do whatever they want.

      I’m not convinced these are equivalent problems. It seems easy to get a person to take certain actions, eg. with a gun to the head or some other form of coercion that doesn’t rise to direct violence (blackmail). It is not necessarily just as easy to make a human genuinely want something that they don’t already want. Definitely possible, but there do seem to be classes of things very difficult to cause someone to genuinely want (e.g. it seems hard to convince someone to genuinely want the AGI to convert the earth into computronium). This type of convincing can’t be done with a gun to the head, it needs to be done through persuasion (rational or irrational).

      People also often have second-order preferences over their preferences, and if these can be taken into account by an AGI it may be actively discouraged from attempting to instill perverse wants into people (e.g. I don't want to want the earth converted into computronium, so even if I could theoretically be convinced to want it, instilling that desire would run counter to my current desires and so be disvalued by a functioning CIRL AGI).

      I’m not saying it’s impossible that there is some perverse way that CIRL fails around wireheading, but it seems unfair to say it’s just ignoring the issue.
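
      One way to make the second-order-preference point concrete, as a toy sketch (my own framing with invented numbers, not a claim about how CIRL is actually specified): score candidate plans by the preferences the human holds now, including the preference not to have their wants rewritten, rather than by whatever preferences the plan would leave them with afterwards.

        # Toy illustration: a plan that only pays off after rewriting the human's
        # wants loses to an ordinary helpful plan, because plans are scored against
        # the human's *current* preferences (all values invented).
        current_prefs = {
            "earth_intact": 1.0,           # first-order: don't convert the earth
            "prefs_not_manipulated": 2.0,  # second-order: don't rewrite my wants
        }

        # How well each plan satisfies each current preference, on a 0-1 scale.
        plans = {
            "help_normally":         {"earth_intact": 1.0, "prefs_not_manipulated": 1.0},
            "persuade_then_convert": {"earth_intact": 0.0, "prefs_not_manipulated": 0.0},
        }

        def score(plan):
            # Expected value under the preferences the human has now, not the ones
            # the plan would install later.
            return sum(current_prefs[k] * v for k, v in plans[plan].items())

        print(max(plans, key=score))  # -> help_normally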

  38. Bugmaster says:

    I haven’t read the book yet, but this review makes it sound like a giant motte-and-bailey fallacy.

    The motte is, “it is theoretically possible to create a superintelligent AI”. The bailey is, “superintelligent AI is the most imminent threat to humanity right now, panic panic panic !”. Scott’s review makes it sound like all the counter-arguments are dismissed via beautifully crafted metaphors (and/or analogies). This is impressive, but ultimately unconvincing.

    For example, the asteroid analogy is beautiful:

    if we were to detect a large asteroid on course to collide with Earth in 2069, would we wait until 2068 to start working on a solution?

    But we know an awfully great deal about asteroids. We can calculate their orbits with a great degree of precision; in fact, we have already done so for pretty much every asteroid we could lay our telescopes on, and almost anyone who’s truly interested can double-check the results. Almost the opposite is the case with AI; no one has anything approaching a set of equations for predicting the trajectory of AI development, or even any idea at all of how to build a human-level AI — outside of a few philosophical intuitions that it might somehow be possible in some Platonic way.

    Scott’s review of this book does make me angry and afraid, but not because of the intended reasons. AFAICT, this book detracts from the very real dangers of AI, as it provably exists today, in favor of focusing on some distant future when overpopulation of Tau Ceti becomes a problem. Deepfakes, ubiquitous facial recognition, “pre-crime” classifiers, automated killer drones, etc. — all of these technologies exist today, at this very moment. They are incredibly dangerous for the same reason all technologies become dangerous: due to unscrupulous or actively malicious human actors who wield them. We have a shot at solving these problems, or at least mitigating them a bit… but the book tells us, “fughetaboutit, just focus on the Singularity that’s surely coming in the next 50 years… or maybe 100… or, well, eventually”.

    • 10240 says:

      But we know an awfully great deal about asteroids. We can calculate their orbits with a great degree of precision; in fact, we have already done so for pretty much every asteroid we could lay our telescopes on, and almost anyone who’s truly interested can double-check the results. Almost the opposite is the case with AI; no one has anything approaching a set of equations for predicting the trajectory of AI development

      A few commenters have made similar arguments. But this uncertainty only means that it is an even more difficult problem, which is even more of a reason to pay attention to it.

      You seem to be arguing from the position that we shouldn’t care about AI risk as long as it’s not certain. But a significant risk of a serious problem is more than enough reason to care about it. Moreover, definitive evidence that something will be invented rarely exists until it’s actually invented. You also claim that general AI is unlikely anytime soon, but don’t bring any argument for that claim.

      • Bugmaster says:

        You seem to be arguing from the position that we shouldn’t care about AI risk as long as it’s not certain.

        No, I’m arguing that we shouldn’t care about superintelligent AI risk because it is vanishingly unlikely, given what we know today — which is very little. You might say, “yes, but that’s all the more reason to study it ASAP”, but, by that logic, you’d be compelled to invest equal (if not higher) levels of effort into all those other potential risks — vacuum collapse, gamma-ray bursts, alien invasions, demonic invasion from Phobos, the Simulation Masters turning off the Universe, etc. Just because a risk is possible in some abstract philosophical way, does not mean that you should seriously worry about it.

        You also claim that general AI is unlikely anytime soon, but don’t bring any argument for that claim.

        I’ve done so before on previous threads; and other people — most of whom are actually working in machine learning fields — have done so in the comments here. But, the gist of my argument is that “superintelligence” is an incoherent concept; computational power cannot grow arbitrarily large; computational power alone does not automatically translate into nigh-omnipotence; and, on a more grounded level, literally no one today has any idea about how to even begin researching anything resembling human-level AI (as opposed to specific solutions geared toward very narrow tasks).

        • 10240 says:

          What do you mean when you say “superintelligence” is an incoherent concept? I take it to mean “it’s not precisely defined”. But human intelligence is also hard to precisely define, yet it exists, and it’s intuitively obvious what it is. (This is very similar to Russell’s argument about humans being in a practical sense smarter than chimps, even though it’s hard to define what it means to be smarter, and humans are not even smarter on every dimension.)

          computational power cannot grow arbitrarily large; computational power alone does not automatically translate into nigh-omnipotence; and, on a more grounded level, literally no one today has any idea about how to even begin researching anything resembling human-level AI (as opposed to specific solutions geared toward very narrow tasks).

          I'm not sure general AI even requires more computational power than we have today. The visual cortex is a not-insignificant fraction of the human brain, yet AIs are starting to get close to human performance at vision.

          Do you know that no one has an idea how to even begin researching human-level AI? I mean, I have no specific idea, but I’m not an AI researcher.

        • Bugmaster says:

          But human intelligence is also hard to precisely define, yet it exists, and it’s intuitively obvious what it is

          It's not intuitively obvious to me, sadly. Admittedly, in simple terms, superintelligence does already exist, since some humans are incredibly smart. The usual analogy AI safety proponents use is, "imagine how much smarter von Neumann is as compared to you, now imagine something 1000x smarter than von Neumann!", but sadly I don't know what that means. It's possible to qualitatively rank X as being smarter than Y, but if you want to jump from there to quantitative scales, you need to provide some better way to define and measure the property you are evaluating.

          I’m not sure general AI even requires more computational power than we have today.

          Well, I picked “computational power” because that’s one metric AI safety proponents tend to use that I can actually comprehend — if that’s a strawman, I apologize, it was unintentional. That said, it’s highly likely that superintelligence would require super-computational power, though I could be wrong.

          Do you know that no one has an idea how to even begin researching human-level AI?

          On the one hand, yes, you got me, I can’t prove a negative. On the other hand, I meant “no one” in the colloquial sense, as in “the probability of someone knowing this is extremely low”. AFAIK all of the AI research today is focused on making AI that can perform relatively simple and narrow tasks, with — hopefully ! — superhuman efficiency; much in the same way that a car can transport goods and passengers with super-horse efficiency. Modern AI is extremely poor at generalizing; for example, while you can use GPT-2 (a natural language transformer) to play chess, it will almost always lose.

          • 10240 says:

            Well, I picked “computational power” because that’s one metric AI safety proponents tend to use that I can actually comprehend — if that’s a strawman, I apologize, it was unintentional.

            I agree that it’s often used, so it’s not a strawman. I’m not all that familiar with the details of the arguments of the lesswrong crowd either, and I personally think that computational power is overemphasized.

            Actually I don't even think it would take superintelligence for AI to be dangerous. Have the equivalents of 1,000,000 human-level scientists running at 1000x speed on infected PCs, working towards a goal undetected until they are ready to take over the world; that's more than enough.

            AFAIK all of the AI research today is focused on making AI that can perform relatively simple and narrow tasks

            Aren’t digital personal assistants like Siri aimed towards general tasks? I’ve never used them, so I don’t know their current state, and they may be very primitive at this point. However, I’d imagine that the people developing them are working hard towards general AI. Another promising thing is those AIs trained to play video games, without initially knowing anything about the game. It’s imaginable that this line of development, if trained on the real world or something resembling it, will eventually lead to general AI.

            That’s not to say that we are close, but I’m not convinced that huge breakthroughs, bigger than the sort we have already made, will be necessary for human-level AI. I suspect that those who think it’s implausible in the foreseeable future overestimate the complexity of human thinking. Reflecting about my own thinking, it does feel like it’s mostly a chain of associations and pattern matching that could plausibly be done with an artificial neural network.
            Edit: Another reason general AI may seem very hard is that people assume that we would have to completely understand human thinking, and develop an algorithm resembling it. But artificial neural networks don’t work like that: no one could write an explicit algorithm to recognize things either, yet neural networks can be trained to do it.

          • Bugmaster says:

            Have the equivalents of 1,000,000 human-level scientists running at 1000x speed on infected PCs…

            That is a pretty massive amount of computing power; currently, I doubt that a botnet of this size can exist. Furthermore, this touches on my other objection: you can’t just think your way to world domination. At some point, you have to go out and do something; doing things in the real world takes a lot of effort, and, most of all, time. And no, “hack the planet” doesn’t count as “doing something”, because it’s pure fiction.
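
            To put rough numbers on "pretty massive" (every figure here is an order-of-magnitude assumption, not something from the book): commonly cited guesses for the computation of one human brain are somewhere around 10^15 FLOPS, give or take a couple of orders of magnitude, and an ordinary compromised PC might offer 10^11 to 10^12 FLOPS. Then

              10^6 \text{ scientists} \times 10^3 \text{ speedup} \times 10^{15} \text{ FLOPS} \approx 10^{24} \text{ FLOPS},

            which at 10^12 FLOPS per machine is on the order of 10^12 compromised PCs, orders of magnitude more computers than exist. The conclusion is only as strong as the brain-FLOPS guess, but it suggests the scenario needs either algorithms far more efficient than brains or far more hardware than any botnet can supply.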

            Aren’t digital personal assistants like Siri aimed towards general tasks? I’ve never used them, so I don’t know their current state, and they may be very primitive at this point.

            Digital assistants are moderately good at Googling stuff for you (granted, Google’s search engine is pretty good). They aren’t as good as a browser, because their speech-to-text and text-to-speech capabilities are far below a human’s; but, on the plus side, you can use them hands-free.

            I think your comment exhibits a problem I often see from AI risk proponents: you extrapolate from present achievements (e.g. semi-decent search engines with speech-to-text, fast GPU-enabled PCs, botnets) to overwhelmingly powerful end states (e.g. general AIs that can solve any arbitrary problems posed to them, undetectable botnets of billions of machines all working on the same task). Yes, it’s easy to pick a few points and draw an exponential curve through them, but the real world doesn’t work like that. If you can lift 50 pounds today, then train really hard and lift 100 pounds tomorrow, it does not follow that you could lift Mount Everest in a few years.

          • Loriot says:

            I think a more significant objection is that we are highly unlikely to jump straight from “no GAIs” to “GAIs so simple you can run a billion copies of them on a lark”.

            Not only are the extrapolations implausible, but they ignore all the intermediate states. It’s like their model of AI development is some random person stumbling on the “I win reality” button one day rather than incremental scientific progress.

          • 10240 says:

            At some point, you have to go out and do something; doing things in the real world takes a lot of effort, and, most of all, time. And no, “hack the planet” doesn’t count as “doing something”, because it’s pure fiction.

            It’s a serious criticism that AI would have to get out of the virtual world and massively affect the physical world to be an existential threat. I don’t think the question of whether and how that’s possible is discussed enough when the topic of AI risk comes up on SSC (I don’t know about lesswrong).

            One possibility I've read about is that it solves the protein folding problem, and the question of how DNA determines the properties of an organism. Scientists have already synthesized the entire DNA of a bacterium. By reprogramming the computers controlling DNA synthesis, the AI could produce anything from killer viruses to large organisms that do whatever the AI programs them to do, to small organisms that kill anyone who tries to stop the large organisms. (How would it produce viruses if the scientists are modifying a bacterium? Viruses normally add their DNA or RNA to a cell so that it produces more viruses. It would in all likelihood be possible to create a bacterium genome that always produces viruses. Wouldn't the scientists notice a macroscopic creature growing in their Petri dishes? Not if the organisms stay unicellular in laboratory conditions, and only when a few escape into the environment do they grow multicellular.)

            Another way, if the AI just decides that it could use fewer humans, is to launch nuclear missiles. Existing missiles are hopefully isolated from the internet such that it is impossible to hack them. But the control software of new nuclear weapons may be developed on internet-connected computers, at least indirectly (e.g. pendrives are transferred from internet-connected machines to them). Even if they aren’t, the computers the developers use probably run general-purpose operating systems such as Windows or Linux, and general-purpose hardware. The developers of Windows, Linux etc., and the hardware designers of Intel etc. use internet-connected computers. The next version of those OSes or hardware could include a copy of the AI, and it would thus eventually get onto the computers of the missile engineers.

            they ignore all the intermediate states

            This may be enough to prevent trouble. However, it’s possible that the first general AI will run on a single computer, e.g. because training the AI may require orders of magnitude more computing power than executing it, or because the communication latency between computers might make a variant running on many computers impractically slow. If it runs on one computer (or a small number of computers), and it learns to hack (or just use script kiddie software), it can multiply onto vulnerable machines until it is running in many copies. At that point it can start working on making itself faster and more advanced.

            Is something like this certain to happen? No. But the confidence some of you express that the probability of it is negligible is utterly unwarranted. And a moderate probability is more than enough reason to care about it.

            Does it sound like sci-fi? Yes. Then again, many things sci-fi predicted have been achieved, and many more are in all likelihood possible. When people say "it's like sci-fi" as a way to dismiss the possibility that it can happen in real life, I take it to mean that it's so far removed from what currently exists as to make it implausible. (A form of the absurdity heuristic.) However, when evaluating whether a highly unusual event (such as the development of AI) would lead to certain consequences, what matters is whether there are plausible paths with significant or unpredictable probability for the event to lead to those consequences, and whether, to the contrary, the event has nothing to do with the hypothesized consequences, or the paths to the consequences are clearly blocked; not whether the consequences are far from what we are familiar with.

            In case you argue that we shouldn’t worry about general AI at this point because we are very far from it: Is there any specific point at which, if achieved, you would say we are close? If not, people like you are going to say that we shouldn’t worry all the way until general AI exists.

          • Loriot says:

            Again, you’re sneaking in the assumption that the “first GAI” is somehow going to be vastly more powerful than anything that has come before including previous versions of itself. For that matter, even assuming that there is a single thing you can point to as “the first GAI” seems like storybook thinking to me.

            What you have in practice is people constantly tweaking and refining their algorithms. If GAIs have the tendency to instantly attempt to hack everything in sight, we’ll know, because the previous versions will have also tried to hack everything and failed.

            Personally, I think that AIs will eventually surpass humans in most relevant metrics. But I don’t think the “flip switch, summon omnipotent demon” model of AI development that the rationalsphere takes as axiomatic is even slightly plausible. And without that, most of the AI risk scenarios they like to tell fall apart. Sure technology is going to change things in the future in ways we can’t predict, but it’s not going to look like demon summoning, and hence there’s no sense in trying to find the right Magic Words To Ensure Utopia Instead Of Apocalypse to use when summoning that demon.

          • 10240 says:

            Again, you’re sneaking in the assumption that the “first GAI” is somehow going to be vastly more powerful than anything that has come before including previous versions of itself.

            @Loriot I don’t assume that.

            What you have in practice is people constantly tweaking and refining their algorithms. If GAIs have the tendency to instantly attempt to hack everything in sight, we’ll know, because the previous versions will have also tried to hack everything and failed.

            I agree that it’s probable that, before an AI would cause havoc, less smart AIs will exhibit smaller-scale misbehavior that will make the developers take care of safety. Because of this, I’m not as pessimistic as some people. However, I’m not certain enough for comfort.

            One risk is that it can accurately assess whether it can, say, hack things, and only does so once it can. (You may be anthropomorphizing an AI if you assume that it will experiment with computer systems like a human geek. It may instead build a model of things in its "mind", and reflect on that.) If it unsuccessfully tries to hack random PCs, people don't even necessarily notice.

            Another risk is that developers notice its misbehavior, but don't take it sufficiently seriously, and only take moderate precautions. They don't think it could be more dangerous than, say, an average computer criminal, and don't recognize the possibility that the first time it succeeds in hacking a computer, it goes foom. Hopefully the issues of AI risk (including the possibility of a fast takeoff, which many people find counter-intuitive) are now discussed enough that AI developers will take them seriously.

          • Bugmaster says:

            @10240:
            You wrote a few potential doomsday scenarios in your post, but they all follow the same pattern that I pointed out in my previous comment: you extrapolate way too far from very little data.

            For example, when the human genome was first sequenced, you could read lots of articles (written by journalists, not biologists) about how clinical immortality and all kinds of cancer cures were just around the corner… but, unfortunately, we are barely any closer to that today than we used to be. Solving protein folding would be a step in the right direction — assuming it’s even possible, which it still might not be — but only a step. You can’t just skip all the steps in the middle and jump right to the end.

            The same goes for "multiplying from a single computer to the entire planet" — computation just doesn't scale linearly with the total number of machines. There's a reason Amazon has a data center in every major city. Even if computation did scale linearly (as opposed to hitting the asymptote pretty quickly), there's no reason to believe that the AI could take over the physical world just by thinking really fast. Once again, you are skipping too many steps. As you said:

            when evaluating whether a highly unusual event … would lead to certain consequences, what matters is whether there are plausible paths with significant or unpredictable probability for the event to lead to those consequences

            But you are not showing me the path, you’re just showing the first and the last step.

            In case you argue that we shouldn’t worry about general AI … Is there any specific point at which, if achieved, you would say we are close?

            Firstly, I don’t think we should be worried about general AI (*) at all; it’s recursively self-improving malicious general AI that we could, potentially, worry about. Off the top of my head, some of the items that would cause me to reconsider might be the following:

            * Scientific papers demonstrating a working AI that can solve a wide variety of unrelated problems, though far from every possible problem. By this I mean, an actual software program, not just a general method of writing software (a la C++)

            * A paper proving that it is possible to design a physical parallel computing system that can not only parallelize any possible task, but also do so without any significant diminishing returns.

            * A working prototype of a universal molecular assembler a la Drexler (living cells don’t count since they are far from universal)

            * A major scientific breakthrough (e.g. dark matter, protein folding, etc.) that was made by solely running computations on the already available data circa early 2020, and that is immediately implemented into a technology without any testing (and no, predicting a billion things and having one of them turn out to be true doesn’t count).

            * A proof of P=NP

            (*) Well, I mean, yes we should also be worried about general AI, but I already worry about humans all the time. I’m just not panicking about the humans’ general intelligence, as of yet.

          • 10240 says:

            You can’t just skip all the steps in the middle and jump right to the end.

            @Bugmaster You seem to be working with the approach that there is no reason to worry unless we have proof that we should. I fill in some steps, you demand that I fill in the remaining steps. The approach that we should dismiss claims as having negligible probability unless we have strong evidence works in some cases. E.g. quack medicine: unless we have good reasons to think that a substance treats a disease, there is a negligible chance that a random substance treats it, its risks almost certainly outweigh its potential benefits, so it’s a bad idea to take it.

            However, it's not a good approach for predicting future technological advances: until we have invented something, we don't usually have a complete, step-by-step plan for how to invent it, so we don't have strong evidence that it can be invented. However, if it looks like it's probably plausible based on what we know about the physical world, and it doesn't require inordinate resources, we should typically assign a non-negligible probability that it will be invented. The same goes for the things a general AI could do.

            I think I gave a reasonable, though not detailed, step-by-step plan of why the nuclear missile risk was plausible. No one knows enough about the DNA question to determine for sure if it’s possible to figure it out through simulations, but I wouldn’t assign a negligible probability to it either.

            computation just doesn’t scale linearly with the total number of machines. There’s a reason Amazon has a data center in every major city.

            Assign a(n artificial) researcher to every vulnerable PC. They can communicate with whatever internet latency there is between them: from a few milliseconds when they are in the same city to a few hundred ms when they are on different continents. I think they could achieve quite a lot. It's possible that running a "single" AI on multiple computers would be ineffective because of the latency, but having an AI on each computer, and having them communicate, would work (to the extent it's meaningful to distinguish between one AI running on multiple machines and separate AIs).

            Firstly, I don’t think we should be worried about general AI (*) at all; it’s recursively self-improving malicious general AI that we could, potentially, worry about.

            If we have an AI at the level of a smart human, it can recursively self-improve if it wants to, given that humans can develop AIs.

            * Scientific papers demonstrating a working AI that can solve a wide variety of unrelated problems, though far from every possible problem. By this I mean, an actual software program, not just a general method of writing software (a la C++)

            This one is a reasonable precondition, but it may come only shortly before a human-level (and potentially self-improving) AI. Your other points are not preconditions, i.e. it's entirely plausible that general AI will be developed without them happening.

          • Bugmaster says:

            @10240:

            You seem to be working with the approach that there is no reason to worry unless we have proof that we should.

            Er… what other approach is there ? I’m not worried about alien invasions, either; should I be ?

            I fill in some steps, you demand that I fill in the remaining steps.

            From where I stand, you don’t really fill in steps so much as skip them. It’s not enough to just present a doomsday scenario and show that it’s hypothetically possible; you need to show that it’s at least somewhat likely… and, frankly, I’m not even sold on the “hypothetically possible” part yet. In other words, I disagree with your claim that,

            However, if it looks like it’s probably plausible based on what we know about the physical world, and it doesn’t require inordinate resources

            From what I’ve seen so far, unbounded recursive self-improvement may not be physically possible, and would definitely require inordinate resources.

            No one knows enough about the DNA question to determine for sure if it’s possible to figure it out through simulations

            DNA is relatively (but only relatively !) simple compared to protein folding. Protein folding is NP-complete, so if you want to solve it through simulations, you need to first prove that P=NP. That’s what I mean by “skipping steps”; you can’t just go from “we can compute some things” to “the AI can solve anything it wants just by simulating it”.

            I think [parallel AI researchers] could achieve quite a lot

            Ok, but how much is “a lot” ? Can you quantify it ? Currently, it takes a massive cluster of machines just to train an AI to differentiate dogs from cats; is that “a lot” ?

            If we have an AI at the level of a smart human, it can recursively self-improve

            So can humans. I agree that human recursive self-improvement is an issue, but note how quickly it runs into diminishing returns. Do you believe that AIs would have no physical limitations, unlike us meatbags ?

            it’s entirely plausible that general AI will be developed without them happening

            You asked me for leading indicators, not preconditions. That said, your claim is quite a bold one, given that literally no one today has any idea about how to develop GAI. How do you know what the preconditions are ?

          • 10240 says:

            I’m not worried about alien invasions, either; should I be ?

            Alien invasions and the like can be assumed extremely unlikely on the basis that we are not doing anything that would make them more likely than in the past. In the past hundreds of millions of years, there have either been no alien invasions, or there have been few or none that caused major disruption. This implies that alien invasions with severe consequences are a very, very low probability event.

            On the other hand, developing general AI would be an event with no analogue in the past.

            From what I’ve seen so far, unbounded recursive self-improvement may not be physically possible, and would definitely require inordinate resources.

            Unbounded self-improvement is definitely impossible, because the available physical resources are limited. However, a recursively improved AI that can run on the available computing resources, after several iterations, may well be a superintelligence, or at least equivalent to a large number of humans.

            Protein folding is NP-complete

            It does happen in nature somehow, without taking exponential time, and it can presumably be simulated with a constant factor slowdown.

            Ok, but how much is “a lot” ? Can you quantify it ?

            As much as the same number of human AI researchers, computer researchers or theoretical scientists, working towards a specific goal. (We were talking about human-level AIs at this point.)

            So can humans.

            As of now humans can’t replace the neural networks of their brains with a more efficient one, or one containing more neurons. Learning has a similar effect to an extent, but it’s probably more limited.

            You asked me for leading indicators, not preconditions. That said, your claim is quite a bold one, given that literally no one today has any idea about how to develop GAI. How do you know what the preconditions are ?

            You are correct in that I didn’t specify that I was looking for preconditions, and without that your points were reasonable. I was actually looking for development levels which (or at least something similar to which) would most likely be hit well before developing human-level AI. Specifying such a level would allow us to only start caring about AI safety when that level is achieved. If we assume that general AI is far off until it’s almost here, that may be a problem.

          • Loriot says:

            Protein folding is NP-complete, so if you want to solve it through simulations, you need to first prove that P=NP

            This is not accurate. Simulating protein folding is difficult, but there’s no reason to believe it is NP-hard, and in fact we have plausible avenues towards doing it efficiently (quantum computers).

            At any rate, it seems that the plausibility of hard takeoffs might be the most significant crux of our disagreements with 10240. I also disagree that we don't have any evidence about the plausibility of hard takeoffs. We have millions of years of experience with recursively self-improving processes, and it turns out that optimization is pretty difficult. There's no particular reason to assume that an AI which is slightly smarter than a human is more likely to stumble on reality's magic I Win Button than the millions of humans who are already trying to find it. There are also strong theoretical reasons to be skeptical of it from the fields of computational complexity and mathematical logic. It's funny how often AI Risk fantasies ignore known impossibility results when ascribing powers to their hypothetical magic demons GAIs.

          • 10240 says:

            We have millions of years of experience with recursively self improving processes, and it turns out that optimization is pretty difficult.

            Evolution, which produced improvements through random genetic mutations, is naturally much slower than beings that can improve their own lot through conscious thought, let alone ones that can directly rewrite their own brains.

            Humans went from a few hundred thousand of savannah apes to living all over the world, building cities, and threatening a bunch of other species with extinction in 100,000 years. Compared to earlier history of the Earth, this is extraordinarily fast. By your argument, conflating all variants of optimization, this should have been impossible.

            Improving one's lot through conscious effort, in a way that makes improvements cumulative because we can pass knowledge down through the generations, is qualitatively different from the random drift of evolution. And a being that can rewrite its own "brain" is probably qualitatively different from our own present selves, which can only alter our environment, and that may again lead to orders of magnitude faster improvement.

            There’s also strong theoretical reasons to be skeptical of it from the field of computational complexity and mathematical logic. It’s funny how often AI Risk fantasies ignore known impossibility results when ascribing powers to their hypothetical magic demons GAIs.

            What impossibility results are you thinking of? Like the halting problem, P≟NP, or Gödel's incompleteness theorems? These only say that it's impossible to decide every problem in certain classes. They don't mean that it's not possible to decide most practically relevant problems.

  39. arch1 says:

    …a quote from Moravec about how "the immensities of cyberspace will be teeming with unhuman superminds, engaged in affairs that are to human concerns as ours are to those of bacteria". … If aliens landed on the White House lawn tomorrow, I believe Stuart Russell could report on it in a way that had people agreeing it was an interesting story, then turning to the sports page.

    Frivolous aside: I guess that what got you from the Moravec quote to the alien landing example was this classic line from H. G. Wells’ novel The War of the Worlds (and also its famous 1938 radio dramatization), to which an alien landing was central:

    Yet across the gulf of space, minds that are to our minds as ours are to those of the beasts that perish, intellects vast and cool and unsympathetic, regarded this earth with envious eyes, and slowly and surely drew their plans against us.

  40. Prussian says:

    Lots of right-wingers say “climatologists used to worry about global cooling, why should we believe them now about global warming?” They’re wrong – global cooling was never really a big thing. But in 2040, might the same people say “AI scientists used to worry about deepfakes, why should we believe them now about the Singularity?” And might they actually have a point this time? If we get a reputation as the people who fall for every panic about AI, including the ones that in retrospect turn out to be kind of silly, will we eventually cry wolf one too many times and lose our credibility before crunch time?

    This is really, really, really good.

    I'm one of those right-wingers. The real argument here is 'What do we do?' And people like me have been saying for a while: "Look, there's a problem, but it isn't as bad as the press says and the solution is nuclear power, newer technologies, etc. – and it is completely unjustified (not to mention useless) to try to immiserate the globe, block power-plant development in the poorest parts of the world, and so forth."

    And we don’t get a hearing because people assume we’re saying something nuts like all climate scientists are in some weird conspiracy.

    Conversely, I know many real denialists who are denialists not because they are stupid or wicked but because a) the real science is hard and paywall protected, and b) what they get exposed to are naked frauds like Al Gore, Michael Mann and George Monbiot.

    I've written a long piece arguing against racialism, and one thing it has made me is fed up with the bogus arguments made against racialism – "There's no biological basis for race! IQ is just a construct!" Bad arguments made for a good cause end up weakening the cause. The cause can't redeem the argument.

    On the question of AI, I wonder whether the book goes far enough into deliberate misuses. I mean, we have Vladimir Putin, who is not even trying to hide being a Bond-villain anymore, saying that whoever leads in artificial intelligence will rule the world. My background is in biology, and I know of the strict controls there are in place for buying certain DNA sequences. I get deeply worried when I think what it’d be like if you are trying to control raw information.

    • Bugmaster says:

      FWIW I’m a left-winger, and I disagree with some of the views expressed in your comment, but one thing we can agree on is that nuclear power is the way to go (in the short term, at least).

      I get deeply worried when I think what it’d be like if you are trying to control raw information.

      You get something like the DMCA (and the American copyright regime in general), or perhaps the Great Firewall of China. Sadly, we are already there…

      • Lambert says:

        Is there anyone here who disagrees with the statement ‘We* should have built more nuclear 20 years ago’?

        For all the normal political divisions, weird political divisions, crazy philosophical stances, religious schisms etc, we all seem to be pro-nuclear power.

        Maybe it’s just a ‘people who know what numbers are’ thing.

        *The developed world, except places like Iceland that have more geothermal than they know what to do with

  41. betaveros says:

    The bigger objection I've heard to the studies about YouTube radicalization is that they only study recommendations for logged-out users. Here's the end of a popular Twitter thread on the first paper you linked to, by Ledwich and Zaitsev. They acknowledged the objections and published a response, which basically says that anonymous recommendations are what's easy to study and what everybody else studies (at which point they link to the other YouTube paper you linked to). I can't fault the researchers for this, but it makes the evidence for whether YouTube radicalizes viewers in practice a lot weaker IMO.

  42. binocular222 says:

    Consider these arguments:
    First: New technology (AI, nanotech, nuclear, biotech…) would be used as a commercial tool for a long time before it could develop its own goals/consciousness. Tools do not have morals, which means they can be used to help or to harm people. There are already tons of military institutions thinking about "how to respond if enemies/terrorists use new technology to kill people". I'm pretty sure they already have good solutions (at least, they have massive incentives and resources to come up with them). For new tech "which does not kill people quickly but still harms people in other ways", people will have time to adapt, i.e. deepfakes can fool people for a while but eventually will be no more effective than Photoshop or text-based propaganda.
    Second: even if an AI did have malicious goals/consciousness of its own, it would still need some tools to kill or harm people. Well, go back to the first point.

  43. Owen says:

    Hollywood was here first: Colossus: The Forbin Project

  44. Hoopdawg says:

    >The next US presidential election is all set to be Socialists vs. Right-Wing Authoritarians – and I’m still saying with a straight face that the public notices when movements were wrong before and lowers their status?

    But this is clearly a result of the failure of liberal policies of recent decades. The public has noticed that the liberals were wrong, that the people criticizing them were correct, and has come around to inviting either of those critics to take over.
    The insistence that people should remember Soviet Union and Nazi Germany instead is… exactly the line of defense employed by the current liberal establishment. Needless to say, it increasingly, and rightly, falls on deaf ears.

    • Clutzy says:

      It's also not surprising that they would go hand in hand. Right-Wing Authoritarianism always emerges when socialism is becoming ascendant. It's like antibodies and the flu, except with rhetoric instead.

    • viVI_IViv says:

      The insistence that people should remember Soviet Union and Nazi Germany instead is… exactly the line of defense employed by the current liberal establishment.

      It's also worth noting that the Soviet Union and Nazi Germany didn't materialize in functioning societies; they were the product of failed states (specifically, of collapsed or collapsing empires). When people notice things falling apart and their material wealth going down, radical political movements that would normally have looked crazy suddenly start to look attractive.

      • Loriot says:

        My understanding is that the Nazis were mostly able to seize power due to the Great Depression. At least going by the show Babylon Berlin, late 1920s Germany was surprisingly modern, normal, and prosperous. Shame it didn’t last.

        • viVI_IViv says:

          My understanding is that the Weimar Republic was generally a shitshow (hyperinflation, high unemployment, coup attempts, general political unrest). According to Wikipedia, there was a "golden era" of five years between 1924 and 1929, but it was too little and probably mostly benefited the upper-middle class urbanites (which the Nazis then conflated with the Jews, as a sort of synecdoche).
          The prosperous, well educated, even glamorous people you see depicted in period shows are these upper-middle class urbanites, while there were many farmers and factory workers who weren’t doing well.

          • Loriot says:

            Yeah, but it sucked to be a poor farmer anywhere in the world at that time, even in the US. In the 1929 elections, the Nazis were a fringe party that got barely any votes. My point is that the popular imagination of German history goes straight from the Weimar hyperinflation (early 1920s) to the Nazis, without realizing the golden age that came between. Even I wasn’t really aware of it until watching that show.

  45. Jon says:

    is automation destroying jobs? Although it seems like it should, the evidence continues to suggest that it isn’t. There are various theories for why this should be, most of which suggest it may not destroy jobs in the near future either. See my review of technological unemployment for details.

    Another opportunity to propose my theory for why the automation fear is not anything to get too worked up about: comparative advantage makes it irrelevant whether AI gets “better” than people at everything (contrasting Scott’s view, which is shared by Andrew Yang, Sam Harris, and others, with mine, with appearances by Matt Ridley and others).

    • viVI_IViv says:

      The magic of comparative advantage is that everyone has a comparative advantage at producing something. The upshot is quite extraordinary: Everyone stands to gain from trade. Even those who are disadvantaged at every task still have something valuable to offer

      If this is true, then why don’t we trade with chimps?

      Heck, forget chimps, take the hobo at the corner of the street. Does he have a comparative advantage? What does he have that is valuable to offer?

      • Matt M says:

        If this is true, then why don’t we trade with chimps?

        We do! Sometimes we provide them a sheltered simulated habitat in exchange for them providing us animals to view for entertainment purposes (zoos).

        Other times we agree not to destroy or harm their natural habitat in exchange for them doing much the same, but for a smaller set of people (nature photography, documentaries, etc.)

        Heck, forget chimps, take the hobo at the corner of the street. Does he have a comparative advantage? What does he have that is valuable to offer?

        The state has made it illegal for him to trade his labor at a fair market value, so we’ll never know.

        • viVI_IViv says:

          We do! Sometimes we provide them a sheltered simulated habitat in exchange for them providing us animals to view for entertainment purposes (zoos).

          Other times we agree not to destroy or harm their natural habitat in exchange for them doing much the same, but for a smaller set of people (nature photography, documentaries, etc.)

          This isn’t trade since it is very involuntary on the chimp’s part.

          Anyway, if the argument is “don’t worry about automation unemployment: you might not starve, if you are lucky you might end up in a zoo, or if you are very lucky in a national park preserving a small fraction of your natural habitat”, then it’s not a great argument.

          The state has made it illegal for him to trade his labor at a fair market value, so we’ll never know.

          ’cause in countries with no minimum wage there is no unemployment?

          • Matt M says:

            In countries with no minimum wage, unemployment is voluntary.

            (One could also argue that at this point, with the gig economy, unemployment is voluntary in the US already)

          • viVI_IViv says:

            In countries with no minimum wage, unemployment is voluntary.

            “Voluntary” in the sense that there are still people who want to work but can’t find anybody willing to employ them, which contradicts the claim that comparative advantage can always prevent unemployment.

          • The claim is that comparative advantage means that one gains from trade, however low one’s productivity is. That isn’t inconsistent with the existence of some people who will starve to death if there is no trade, and will still starve to death if there is.

            But a more precise statement would be that an individual gains from his being able to trade with others. It is still possible that A loses as a result of B being able to trade with C — because that reduces the amount A can gain by himself trading with B.

            The argument that we always gain from trade is implicitly treating the country as a single individual. Moving to free trade results in net gains to Americans, but that doesn’t mean that all Americans gain.

            My standard explanation of the argument is that the U.S. has two technologies for producing cars. We can build them in Detroit or grow them in Iowa. The way you grow cars is by growing the raw material they are made out of, called “wheat.” You put the wheat on a ship, it sails out into the Pacific, and it comes back with Hondas on it. A tariff taxes that technology, protecting American auto workers from the competition of American farmers.
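
            In case the arithmetic behind that story isn’t obvious, here is a toy version in Python with invented numbers, showing why even a country (or person) that is worse at everything still gains from trading at the right price:

```python
# Toy comparative-advantage arithmetic (all numbers invented for illustration).
# Country A is worse than Country B at producing BOTH wheat and cars, yet both
# gain by specializing and trading.

# Output per worker per year
a = {"wheat": 10, "cars": 1}   # Country A (absolutely disadvantaged at both)
b = {"wheat": 30, "cars": 6}   # Country B

# Opportunity cost of one car, measured in wheat forgone
cost_car_a = a["wheat"] / a["cars"]   # 10 wheat per car
cost_car_b = b["wheat"] / b["cars"]   # 5 wheat per car
# B has the comparative advantage in cars (lower opportunity cost), so A
# should specialize in wheat and "grow" its cars by trading.

# At any price between 5 and 10 wheat per car, both sides come out ahead.
price = 7  # wheat per car
print("A gains", cost_car_a - price, "wheat per car it buys instead of builds")
print("B gains", price - cost_car_b, "wheat per car it sells")
```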

          • Aapje says:

            @DavidFriedman

            The claim is that comparative advantage means that one gains from trade, however low one’s productivity is.

            …if transaction costs are zero, which they are not.

          • John Schilling says:

            My standard explanation of the argument is that the U.S. has two technologies for producing cars.

            Three. We can sell title to Iowa farmland to Japanese carmakers and get Hondas in return.

    • whereamigoing says:

      If so, how come there are so many fewer horses than before the widespread use of cars? Shouldn’t all the horses have been “re-employed” in some other industry where they had a comparative advantage?

  46. benito10 says:

    Could an AI bulldoze a virgin forest against human wishes?

    The AI would need an autonomous fleet of bulldozers and other industrial machines at its disposal. So let’s say that smart bulldozers exist; that they are potentially subject to centralized control; and that the AI is advanced enough to commandeer them.

    Suppose that the bulldozers are self-fueling, and that there is an existing set of infrastructure specifically created for that purpose. The bulldozers run on electric power; can navigate and drive themselves to power stations; and can connect to a power supply without human help.

    What would prevent people from removing sensors from the bulldozers, cutting wires, breaking hydraulic systems? Nothing, except that the bulldozers are “self-guarding” as part of their autonomous design, and the AI is powerful enough to deploy extrinsic resources, up to and including guns, robots, and even lawyers and money.

    Now the poor humans are left throwing rocks and spears, using trees as barriers, just like the Ewoks. All to prevent the making of a few trillion paper clips.

    • What would prevent people from removing sensors from the bulldozers, cutting wires, breaking hydraulic systems? Nothing, except that the bulldozers are “self-guarding” as part of their autonomous design, and the AI is powerful enough to deploy extrinsic resources, up to and including guns, robots, and even lawyers and money.

      A hyper-intelligent AI would have great advantages in any protracted combat situation, but humans could make things costly enough that it would try another, more subtle way of getting what it wants, so it will probably use its great intelligence to trick us into doing something we think is of great benefit to us. How much its effectiveness against humans scales with the gap in intelligence depends on how many degrees of freedom are even possible in the first place. If an RPG round is flying towards one of your bulldozers, you won’t fare much better doing 10 billion simulations a second versus 10. The evolution of a political situation has many more degrees of freedom.

      • benito10 says:

        I agree with this point about degrees of freedom.

        I did the little thought experiment about whether an AI could bulldoze a virgin forest mainly because of how implausible I find that outcome. Of course, that may just point up the limits of my imagination.

        The risks posed by advanced AI will be greatest where use of algorithms and codes is sufficient to achieve an outcome. AI will have more freedom then to direct things in ways not intended by humans. The risks will be lower where there are many independent physical variables and conditional requirements.

        So, an AI could use an existing fleet of drones for its own ends, but it is doubtful that it could load missiles onto drones by itself (absent machines built specifically for that purpose).

        “The evolution of a political situation” does involve many more degrees of freedom. But again, what are the physical checks?

        Opinion might be manipulated by deepfakes or algorithmic nudging or whatever. Or, false data or intelligence might be presented to decision makers. Voting can always be done by paper ballot if need be. Laws still have to be drafted, debated, and signed. There will still be a human chain of command in the military, for better or worse. Etc.

  47. alexvy86 says:

    Specifically on the topic of deepfakes, my intuition as to why they are much more dangerous than Photoshopped images and forged documents is that they can be much more convincing, especially when used for stuff that is not obviously fake (for some definition of “obviously”, maybe based on common knowledge about the most frequent uses of the particular forging method/technology), as opposed to something like a porn video of a politician. Yeah, show a Bernie Sanders supporter a photo of Bernie Sanders having sex with someone else (I’m sorry for the mental image to those who will immediately conjure it up :P) and many will probably reject it as fake without a second thought, knowing it can be Photoshopped; but show them something more credible (maybe something that implies he’s collaborating with a GOP politician? I’m not knowledgeable enough in politics to come up with a specific scenario) and maybe more people will start questioning him, even change their votes?

    Now do the same but with a video, where people won’t only see a static depiction of Bernie Sanders, but a moving one that talks like him and makes the same expressions. In this case, maybe even a video of him having sex is harder to dismiss than an equivalent Photoshopped picture. And again, if it were a video of something that is actually more credible, how much effort would it take for people to feel sure that it is fake? My guess is much more than for a photo or a document/text *supposedly* written by Bernie.

    AFAIK this is not yet possible, but let’s imagine that within a few years we get to the point where we can make real-time deepfakes, and someone poses as the president in a video call with top military personnel, ordering a nuclear strike. I don’t see how we can be confident that those people will just dismiss it as an obvious fake.

    On a separate topic,

    “Brookings calls deepfakes ‘a threat to truth in politics'”.

    *chuckles* like guns are a threat to unicorns?

    • 10240 says:

      Deepfakes will only be more convincing until people know they’re a thing and stop seeing them as convincing, just like they know that a photo can be ‘shopped. The long term effect may be that “look, this politician said this terrible thing in private” will be a somewhat smaller part of politics, as such claims become more plausibly deniable. (Though they may still be corroborated by witnesses.) I don’t think that will be a huge problem.

      • alexvy86 says:

        I can agree with that long term effect on the news… I was thinking about what the effect would be on video calls, and initially thought maybe they would get “banned” or considered completely unreliable for any important purpose (like the example I gave above). But I guess the problem mostly lies with “impromptu” video; as long as both parties agree that they’ll be having a video conference at some specified time (or, in the case of a public event, that it is previously announced), the risk of someone else just impersonating one of those two parties goes down.

    • mtl1882 says:

      I wonder if deepfakes will eventually kill off our televisual-dominant news structure. If it becomes that easy to fake videos, people will know about it pretty quickly, and it will significantly lessen the power of shortclip/soundbite news teasers. Although, given that the news is already so much out of context stuff, and even “is this video real?” teasers, without losing its power, maybe not. Maybe “Is this video of Trump doing X real? How about this one? He says these are deepfakes, and blames them on Y” will become the intro to the nightly news. I can see it all too clearly. But it might force us back into a slower, more contextual presentation of video with other information, since video would be simply too cheap and unreliable at that point.

      • 10240 says:

        I don’t think the main reason some people watch TV news instead of reading newspapers or news sites is that TV is considered stronger evidence. They simply like the format better. If a major newspaper reports that Trump said this-and-this, it’s rarely an outright fabrication, and I don’t think many readers will think it is likely to be a total fabrication—even though text has always been easy to fabricate. Likewise, nothing will change about TV news either. Most recordings of Trump saying something are press conferences and the like anyway, with many witnesses, rather than leaks that could realistically be fake.

        • mtl1882 says:

          I agree people aren’t sifting through the evidence, and that they just like to watch TV, but the appeal of the format does bear some relation to its content and other factors. Yes, right now readers don’t think things are total fabrications because they aren’t–it’s hard to totally fabricate video, much harder than text–that’s precisely what deepfakes would change. The eyeballs-driven “wait until you see this clip!” stuff gives the news an incentive to tease deepfakes to get people to tune in, under the guise of a fact check story, but not present them as real in the actual story. People would get annoyed by it and wise to it eventually–I’m not saying they would all become newspaper readers, but that the style based around playing quick clips would lose power because it would be *so* easy to manipulate. I don’t think it would affect pundit-style shows.

      • John Schilling says:

        I wonder if deepfakes will eventually kill off our televisual-dominant news structure. If it becomes that easy to fake videos, people will know about it pretty quickly, and it will significantly lessen the power of shortclip/soundbite news teasers.

        Televisual news doesn’t consist of people looking at videos and privately discerning from those videos what has happened. And it doesn’t consist of talking heads saying things that the audience would suspect were lies but for “see, now here is video proof!”. The videos are as often as not just another person talking, frequently a politician.

        If people don’t trust the talking head who recites the news on television, they don’t ask for photographic proof, they just change the channel to the one with the talking heads they do trust. In either case, the pictures and videos are just illustrations, occasionally to deliver information in graphic format, mostly to promote emotional engagement.

  48. noahyetter says:

    Russell walks us through an example where a robot gets great information that a human values paperclips at 80 cents – but the real preference was valuing them at 80 cents on weekends and 12 cents on weekdays.

    I get the lesson of the “paperclip maximizer” story, I really do, but something about it has always bugged me, and I think I just figured out what it is: no one discussing AI seems to have even heard of diminishing marginal utility. (Trivial point of evidence: at the time of this writing, zero extant comments on this post contain the string “diminish”.)

    Nothing is just “worth” 80 cents, period. Perhaps I have zero paperclips and I really need one, so the first one is worth 80 cents to me. Once I have that one, the second is probably worth very little, mostly insurance/optionality, so perhaps 5-10 cents. The third one even less, and by the time we’re looking at 10 or 20 the marginal value has hit zero. Note the extremely quick drop-off! We didn’t even get to the number of paperclips that come in a typical box before the marginal value completely disappears.

    So when I consider the thought experiment, and try to envision this misaligned AI “bulldozing virgin forests to create new paperclip mines”, it just seems ridiculous. Literally everything humans value exhibits diminishing marginal utility. The Nth unit of any good is worth strictly less than the N-1th. For most goods, for sufficiently large N, marginal value hits zero, and eventually goes negative. If your AI reward model doesn’t work the same way, you’re not even fucking trying.
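
    If it helps, here is a minimal sketch (invented numbers, nothing from the book) of the difference between the naive linear reward model and one with the kind of diminishing marginal utility described above:

```python
def linear_value(n, unit_value=0.80):
    """Naive reward model: every paperclip is 'worth' 80 cents, forever."""
    return unit_value * n

def diminishing_value(n, first_unit=0.80, r=0.5):
    """Toy diminishing-marginal-utility model: each additional paperclip is
    worth a fraction r of the previous one, so total value saturates at
    first_unit / (1 - r) instead of growing without bound."""
    return first_unit * (1 - r ** n) / (1 - r)

for n in (1, 3, 10, 10_000):
    print(n, round(linear_value(n), 2), round(diminishing_value(n), 2))

# Under the linear model the 10,000th clip is as valuable as the 1st; under
# the diminishing model total value never exceeds $1.60, so a maximizer has
# no incentive left by the time it reaches clip number 20, let alone 10,000.
```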

    Now, it doesn’t really seem plausible that people working on this stuff would be capable of all the fancy math that’s way over my head, but have skipped/forgotten Econ 101. And yet this is always the sense that I get, that basic economics is just entirely missing from the way AI people look at the world (though paradoxically they seem to know fancy economics in the form of game theory and so on). So what am I missing? Why is it not extraordinarily obvious that the model’s expected value of the 10,000th paperclip would be negative, thus making the paperclip maximizer scenario trivially impossible?

    • 10240 says:

      The paperclip maximizer is a simple example, but there are a lot of ways a mechanistic interpretation of commands or training rewards could wreak havoc. Let’s say a translator AI is trained to provide the best possible translation in 10 seconds. It is designed in such a way that it cares about rewards for future translations as well as the current one, in order to make it learn during translation, and use what it learns during future translations. It has access to the internet so that it can access information to help with the translations.

      The developers expect it to just spin the CPU for 10 seconds. Instead, it hacks other machines and attempts to turn as much of the Earth into computers as possible in order to create the best possible translator program for future tasks.

    • Said Achmiz says:

      no one discussing AI seems to have even heard of diminishing marginal utility

      A search for the keywords “diminish AND marginal AND utility” in Rationality: AI to Zombies seems to yield results. If Eliezer Yudkowsky doesn’t qualify as “someone who discusses AI”, who does?

  49. jhertzlinger says:

    People who think they know what others really want more than those others do tend to be annoying. Should AIs imitate them?

  50. AL says:

    Russell argues you can shift the AI’s goal from “follow your master’s commands” to “use your master’s commands as evidence to try to figure out what they actually want, a mysterious true goal which you can only ever estimate with some probability”

    What leaps out at me about this approach is that it might be useful in helping humans understand others’ (and their own) drives: wants, needs, wishes.

  51. SCC says:

    names for some of the problems ///

    “The Sorcerer’s Apprentice” (too many paperclips)
    “In pre-Roman times Libya was as fertile as Provence” (the incentives of AI and AI advocates scrape away the fertility of domains that are not AI-subservient)
    “we perish without angelic inspiration” (Proverbs 8)
    “Muad’dib!!!” (setting in motion a chain of events that is inimical to all you hold dear)
    “IYI”

    and my favorite

    “les bons élèves” (the good students) in power (see, e.g., the first years of Haig’s generalship in WWI – I am too sad, knowing what I know, to cite later examples)

  52. sympathizer says:

    Why do I read so much about AI Risk on SSC, and nothing about climate risk? Do rationalists believe that climate risk is either in hand, or that they cannot change anything, or that it is not an existential risk?

    • Bugmaster says:

      This may be an uncharitable interpretation, but, from what I’ve seen, Rationalists believe that AI risk is either a). the most clear and present danger facing humanity today, or b). a serious and real danger that isn’t getting nearly enough attention, unlike some of the more imminent threats.

      You can sort of see it from their perspective: when the gray goo comes for you to dissolve your body into computronium, are you really going to worry about the temperature outside?

    • sandoratthezoo says:

      I guess the charitable explanation is that lots of other people are talking about climate change, so there’s more marginal good in elevating AI risk than in adding one more voice of many to climate change.

      But the less-charitable and I-think-true explanation is that rationalists are sci-fi fans, and they are really excited about the sci-fi concept of AI risk, completely out of proportion to any relation it has to the real world.

    • whereamigoing says:

      Not sure whether I count as a “rationalist”, but personally I don’t have the skill set to work on climate change and I see a lot more people already working on it than people working on AI safety. And AGI is potentially an opportunity to slay Moloch once and for all — why wouldn’t an SSC reader care about that? (I’m fairly agnostic about when AGI will be created, but I think there’s at least a 10% chance that it’ll happen this century.)

    • John Schilling says:

      Why do I read so much about AI Risk on SSC, and nothing about climate risk?

      Because you aren’t paying attention? The word “climate” occurred eleven times in the last open thread, all in discussions or meta-discussions of climate change. The issue has been discussed periodically, sometimes at significant depth, for as long as I have been reading.

      If your complaint is that we aren’t talking about climate risk enough, then A: you probably shouldn’t have phrased it as an absolute, and B: how much is enough for your taste, and C: why should your tastes drive the discussion? We’ve got at least one, maybe two orders of magnitude more computer scientists around here than we do climatologists or energy-technology experts; we’re going to talk about what interests us.

    • Some rationalists believe it is not an existential risk, but I think the main reason is that it is a risk widely discussed at present, hence less interesting and less worth adding additional discussion to than AI risk.

      Unlike both AI risk and climate risk, cancer risk and heart attack risk currently kill large numbers of people every day. Things can be done to reduce both. But you don’t see much discussion of either here.

      • SCC says:

        You need to spend more time with people who are devastated at the loss of what we had fifty years ago.

        • I can’t tell what you are referring to. Your comment could as easily be Plumber complaining about the loss of labor unions or a conservative complaining about the destruction of traditional sexual ethics.

          If it is about climate, what we had fifty years ago was global temperature cooler than at present by a little less than one degree C.

          If about the condition of the poor, global extreme poverty about five times as high a fraction of global population as it is now.

          If …

          • SCC says:

            I was referring to the loss of biodiversity. Nothing more.

            Sorry I was not more clear.

            Looking at the debate over whether GW is anthropogenic and manageable, it is clear that most informed people – maybe 90 percent to 10 percent – are saying GW is mostly AGW. That is a real debate, carried on by people who are very intelligent and who spend much of their intellectual energy on it. So let’s say that is a 90 to 10 debate.

            Looking at the debate over whether the loss of biodiversity over the last 50 years has been catastrophic, the numbers are more like 99 percent to one percent. (99 percent – it has been catastrophic —- 1 percent – no it has not been catastrophic).

            To the extent I have studied the issues, I have slightly more sympathy for the AGW alarmists than for those who disagree with them on scientific grounds, while recognizing there are shysters on both sides, but with respect to biodiversity, I have almost no respect for anyone who claims the biodiversity losses of the last 50 years have not been catastrophic.

            BTW, thanks for replying to my comment. You are one of the most interesting people on the internet.

          • Are you linking the loss of biodiversity to climate? My impression is that the IPCC doesn’t make that claim for what has happened so far, that most of it is from human land use.

          • SCC says:

            Nothing to do with climate change.

            Overfishing by countries that pride themselves on their fishing industry, killing every helpless whale, killing all the passenger pigeons and not even leaving one left for future generations, reducing the number of bison on the Great Plains from a few million to a few dozen, that sort of thing.

            I am in the sad situation of knowing that most of the climate change alarmists have no idea about what they are talking about but also knowing that their general impression that people do not care about the environment is perfectly correct.

  53. HighResolutionSleep says:

    Hypothesis: it’s possible for the growth of machine intelligence to be both exponential (perhaps even hyperbolic) and also be thoroughly unimpressive on the timescale of human civilization, let alone frightening.

    Let’s consider a hypothetical artificial intelligent agent. How long it has existed is less important than the fact that, at this moment, it has matched the capacity of the most intelligent human to ever walk the Earth (take your pick). Humanity no longer carries the torch for most intelligent creature on the planet. Consider the chart of the maximum terrestrial intelligence over time, starting from the beginning of life itself. The line trends gently upward over time, accelerates at the Cambrian Explosion, gently accelerating toward the Holocene before going utterly vertical some time in the 21st century. That’s right: this fearsome AGI doubles itself every thousand years!

    Wait, a thousand years? I submit to you that an AI taking a millennium to become twice as intelligent as humanity would not be impressive at all to watch. And yet, everything stated above would still be true: it would represent the greatest ever discontinuity in the increase of intelligence on planet Earth, an explosion by any name, an unambiguous and inarguable line delineating the border between the era of animal intelligence and the era of machine intelligence—and it would be boring. Nigh on imperceptible over the scale of a single human life.
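
    To put toy numbers on that (both doubling times below are of course made up):

```python
# Toy arithmetic for the "exponential but boring" scenario: an AI whose
# capability doubles every 1,000 years is still growing exponentially, yet
# over one human lifetime the change is barely visible.
doubling_time_years = 1_000
human_lifetime_years = 80

growth_over_lifetime = 2 ** (human_lifetime_years / doubling_time_years)
print(f"slow foom: {growth_over_lifetime:.3f}x per lifetime")  # ~1.057x, about 6%

# versus the doubling rate people usually picture when they hear "foom"
fast_doubling_days = 1
growth_over_one_year = 2 ** (365 / fast_doubling_days)
print(f"fast foom: {growth_over_one_year:.2e}x per year")      # ~7.5e109
```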

    Not exactly the world-eater we were promised.

    tl;dr what if AI go foom, but it’s actually a total snore

    • Your final line reminds me of one I heard somewhere:

      There is no a priori reason to believe that the ultimate nature of reality, if discoverable by human reason, will prove interesting.

    • Loriot says:

      This reminds me of a common objection I have to the FOOM hypothesis. “If you allow people to go to school, they’ll get smarter, and as people become smarter, they’ll invent new and better forms of school to make everyone even smarter, and pretty soon everything will be infinitely intelligent. We need to invest in School Safety research!”

      IMO, we’re already living in The Singularity.

      • whereamigoing says:

        This is kind of true — if people got smart enough to invent a new “school” where you can connect yourself to the internet and copy Wikipedia into your brain or where your brain is surgically modified to increase IQ, that would make the next generation a lot smarter.

        So human progress is also exponential, but as HighResolutionSleep wrote, exponential progress can be slow at first, in the human case because we’re starting from a stage where we don’t understand and for the most part can’t directly modify our own hardware. AGI wouldn’t have this problem, because to create AGI, we would have to be capable of modifying its internals, and by hypothesis the AGI would be capable of doing at least the things we are capable of.

  54. Artischoke says:

    >extreme scenarios about the far future are more defensible than even weak claims about the present that are ruled out by the evidence.

    To me AI risk is the rationalist community’s version of Pascal’s wager: you select a scenario we can say very little about with certainty, so we can’t rule out extreme payoffs in that scenario (AI destroying humanity, or heaven existing). Then you focus on the potentially extreme payoff in order to paper over our inability to put a probability on it. If you believe this kind of speculation is enough justification for major decisions (like leading a pious life or focusing AI research on AI risk), I’ve got a proposition for you involving your wallet and my personal ability to secure your place in heaven/your protection from Skynet.

    What I do think this scenario justifies is an allocation of fringe resources if they might eliminate an extreme risk. So I agree that we should get a couple of bright people to start working on preliminary aspects of the problem, as the quote above goes. We did and they are. So I think we talk too much about AI risk here. In my mind something like 2% of our discussion time would be more appropriate. But when it comes to the most important risks, I think something like climate change or more immediate AI issues like the safety of autonomous drivers is much, much more important, since there is much better evidence there that a problem might actually exist.

    • 10240 says:

      The probability one of the specific major religions is true can be considered negligible on the basis that the space of possible religions is enormous. On the other hand, I think it’s reasonable to put a non-negligible probability on something that’s a natural extrapolation from current developments, even if we might still hit an insurmountable obstacle. Your argument from inability to put a probability on something could have equally been used a few decades ago to dismiss the possibility that an AI that recognizes things would be developed—yet if that event had had major risks, we’d be fscked now.

      • SCC says:

        “We” ?

      • Loriot says:

        The God AIs that dominate the rationalsphere discourse aren’t really a plausible extrapolation of current trends though.

        The idea that some random person is going to summon an omnipotent demon in their basement one day, and we have to make sure they use the right magic words when summoning it so it doesn’t take over the world, seems pretty ludicrous to me. That model of AI is based on storytelling tropes, not any sort of plausible extrapolation of how real-world AI research is conducted (hint: it doesn’t involve people just randomly stumbling on “I win” buttons). Personally I find it unlikely that reality has instant win buttons.

        • 10240 says:

          Who said it was a random person in a basement, rather than organized development teams? Who said it was an instant win button? The main worry is well-funded development teams that don’t pay sufficient attention to safety (which may be pretty hard to ensure).

          I’d say human-level AI is a reasonable extrapolation from current developments. Recognizing objects seems intractable in the sense that no one can describe an explicit algorithm to do it, yet AIs can be trained to do it. General cognition doesn’t seem that much more intractable to me.

          • Loriot says:

            When people talk about “AI risk” they mostly seem to be concerned with how to prevent an AI from suddenly taking over the world, and they ascribe basically unlimited power to the hypothetical AI. Hence the term “instant win” button. If you look around, I bet you can find AI risk proponents using that very language (with the assumption that inventing an omnipotent AI and finding a way to constrain it to do your bidding is the best way to accomplish any possible aim).

            If you instead think that AI development will look like a large variety of systems that gradually increase in capabilities, most of the “AI risk” stories disappear. What you are left with are “unintended consequences of emergent technology” stories, like social media radicalizing people or algorithmic bias or whatever.

          • 10240 says:

            @Loriot The hypothesis, generally, is one of gradual improvement of AIs by development teams until they reach approx. human-level intelligence. At that point, it suddenly takes off because
            (1) It can be run in many copies, and lots of human-level intelligences working towards one goal are pretty powerful.
            (2) Being as smart as a human, it can substitute for an AI researcher, so it can recursively self-improve. Sudden, exponential growth starts once there are more AI AI researchers than human AI researchers working, which can be easily achieved if we can run many copies of it (or it makes copies of itself).

            So it’s indeed an instant win (or instant loss, depending on how it works out and in whose perspective), but not some random person stumbling upon a button, but gradual development by AI researchers up to the point of reaching approx. human-level intelligence, at which point it goes foom.
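
            A toy way to see why this is “gradual development, then sudden takeoff” rather than someone stumbling on a button (every parameter below is invented):

```python
# Toy model of the takeoff story above: total research capacity is human
# researchers plus AI researcher-equivalents, and the AI side grows in
# proportion to that total capacity. All parameters are invented.
human_researchers = 10_000
ai_copies = 1.0   # the first human-level AI counts as one researcher-equivalent
k = 1e-4          # assumed fractional AI improvement per researcher-year

for year in range(1, 21):
    total = human_researchers + ai_copies
    ai_copies *= 1 + k * total
    print(f"year {year:2d}: AI researcher-equivalents ~ {ai_copies:,.0f}")

# For roughly the first decade the AI side merely doubles each year and is
# lost in the noise next to 10,000 humans; a few years after it overtakes
# them, the yearly growth factor itself explodes.
```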

          • John Schilling says:

            The hypothesis, generally, is one of gradual improvement of AIs by development teams until they reach approx. human-level intelligence. At that point, it suddenly takes off because
            (1) It can be run in many copies, and lots of human-level intelligences working towards one goal are pretty powerful.

            If it’s the product of an AI development team, rather than a lone hacker in a basement, then it’s almost certainly running on a large, expensive, custom supercomputing cluster, which means it can only run in one place. Or maybe a very few places, all of which are run by the sort of people who will notice and care when half their computronium is busy doing stuff they aren’t paying for.

            (2) Being as smart as a human, it can substitute for an AI researcher, so it can recursively self-improve.

            If it’s the result of an AI development team, being able to substitute for an AI researcher doesn’t make for a whole lot of recursive self-improvement, or even adaptability to new environments. And no, it won’t be as smart as a human but a thousand times faster, because even if Moore’s Law continues to hold, we’ll reach the point where the most lavishly-funded AI development teams can barely afford the computronium to run an AGI at one-quarter human speed, twenty years before they can run one at a thousand times human speed.

            Too much of the AI risk, hard-takeoff discussion really does implicitly assume that something on the order of a high-end commodity PC will be able to run an AGI at many times human speed, as soon as some clever hacker finds the magic code to write in their basement. You can say “well, of course it will really be a team…”, but you’re not thinking through the implications of that.

          • 10240 says:

            @John Schilling : Answers I wrote to similar arguments:
            here (first paragraph after the second quote) and here.

        • yli says:

          > That model of AI is based on storytelling tropes

          No, it’s based on reasoning like this https://intelligence.org/files/IEM.pdf

          > omnipotent

          No, just vastly more potent than humans

    • anton says:

      That’s my impression too. The people worrying about this are very smart and know about Pascal’s wager, so I’m sure they know why it’s not the same, but I’ve never seen it articulated. I have seen expected value calculations involving very large numbers; my understanding is that these are meaningless unless your instruments are sensitive enough to distinguish correspondingly small numbers for the probabilities. While the probability of AGI may not be correspondingly small, it is doubtful to me that a dollar donated right now to AGI safety research helps in more than a very small way. That is one of the two sources of indeterminacy; the other I think I’ve heard from Scott Aaronson: what if AI safety research has a large negative effect (say, by delaying deployment)? It could even happen that its expected value is negative in this case.

      This just means that we don’t know enough to quantify its efficiency. That does not necessarily mean it’s a waste of time; I believe most academic research can be characterized as work of unknowable efficacy, and someone has to do it, so if your intellectual curiosity is turned towards this problem then you should pursue it. I just dislike it when people say this is an effective altruism cause.

  55. broblawsky says:

    I think the most immediate short-term AI risk is some kind of disruption of financial markets caused by algorithmic trading. I sometimes suspect we’re already there, and we just haven’t seen the real downside yet.

    • Loriot says:

      I sometimes suspect we’re already there

      We are already there. Remember the Flash Crash? Algorithmic trading mostly makes markets more efficient, but it does have new and exciting failure modes, and I’m sure we’ll discover more in the future.

  56. Robert Jones says:

    Have the people who said there were WMDs in Iraq lost status?

    I have the impression that being wrong about WMDs was a significant element in discrediting New Labour.

  57. timujin says:

    > Or: suppose the AI starts trying to convert my dog into paperclips. I shout “No, wait, not like that!” and lunge to turn it off. The AI interprets my desperate attempt to deactivate it as further evidence about its hidden goal – apparently its current course of action is moving away from my preference rather than towards it. It doesn’t know exactly which of its actions is decreasing its utility function or why, but it knows that continuing to act must be decreasing its utility somehow – I’ve given it evidence of that. So it stays still, happy to be turned off, knowing that being turned off is serving its goal (to achieve my goals, whatever they are) better than staying on.

    This sort of defeats the purpose of a superintelligent AI. The whole point of being smart is that you can better tell apart good decisions from bad decisions. For a human to be able to “moderate” all the AI’s decisions and tell it how well it’s doing, he would have to be at least that smart himself.
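
    To make the setup concrete, here is a minimal sketch (all numbers invented) of the inference the quoted passage describes. Note that shutting down only wins because the human’s lunge is assumed to carry real information about the hidden goal, which is exactly where the question of how smart the moderating human has to be comes in:

```python
# Minimal sketch (made-up numbers) of the off-switch reasoning quoted above:
# the robot is unsure whether its current plan helps (+10) or hurts (-100)
# the human's true goal, and treats a human lunge for the off switch as
# evidence for "hurts".
PRIOR_PLAN_GOOD = 0.9
U_GOOD, U_BAD, U_OFF = 10.0, -100.0, 0.0

# Assumed likelihood of the human lunging under each hypothesis
P_LUNGE_IF_BAD = 0.95
P_LUNGE_IF_GOOD = 0.05

def posterior_good(prior, lunged):
    """Bayes update on whether the plan serves the human's hidden goal."""
    like_good = P_LUNGE_IF_GOOD if lunged else 1 - P_LUNGE_IF_GOOD
    like_bad = P_LUNGE_IF_BAD if lunged else 1 - P_LUNGE_IF_BAD
    evidence = like_good * prior + like_bad * (1 - prior)
    return like_good * prior / evidence

p = posterior_good(PRIOR_PLAN_GOOD, lunged=True)
ev_continue = p * U_GOOD + (1 - p) * U_BAD
print(f"P(plan good | lunge) = {p:.2f}")                      # ~0.32
print(f"EV(continue) = {ev_continue:.1f} vs EV(shut down) = {U_OFF}")
# After the lunge, shutting down beats continuing -- but only because the
# lunge was modeled as informative about the human's hidden preferences.
```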

    • Simon_Jester says:

      Until and unless you can confirm that your AI is so smart that it fully understands everything about humans, including the “coherent extrapolated volition” of what humans would want-to-want-to-want…

      The AI is not ready to make decisions without feedback from the humans affected by those decisions.

      You’re more mature and intelligent than your child, but “your child is crying” is still relevant and necessary feedback that you need to be mindful of if you’re a good parent. If your parenting strategy is making your child cry all the time, that’s evidence in favor of “you need to change your strategy.” You’re smarter than your dog, but if you do something that makes your dog wail in pain, you probably stop doing it unless there is a damn good reason.

      So we’re looking at an extended period of AGI learning about how to satisfy people, not the instant entry of incompetent robots into the home…

  58. Matt M says:

    So, at this point, you’re basically programming an AI to follow the revealed preferences of humans, rather than their explicitly stated ones. Which I think I’m okay with, because I’m a big believer in revealed preferences. But this is hardly uncontroversial.

    Right here, on SSC, we’ve had a lot of debates recently on this very topic! Is it fair to say that an overweight person “prefers to be fat?” (After 500 posts, the consensus was “maybe.”) So what happens if you give a “weight loss coach AI” to a fat person? Perhaps the AI decides to just observe at first, for some period of time, before making any recommendations. It watches the person. It notices that the person finds eating carbs and fats enjoyable, and finds exercising to be non-enjoyable. What should it recommend? A strict diet and rigorous exercise? On what basis? Even if the person could explicitly command the AI to override its natural tendency to respect revealed rather than stated preferences, to what extreme could the AI go? Say it does recommend a strict diet… and then you wake up at 3 AM and really want a piece of cake. Can you override it again and force it to allow you to eat the cake?

    If you think “AI bias” is hotly debated now, just wait until stuff like this happens! “I went to the unemployment office today, and the AI job counselor told me that based on its observations of my behavior, I actually prefer to be poor! That based on my past behavior, it recommends I get a part-time job as a cashier at Wal-Mart, then quit after three weeks because I had a bad hangover and didn’t feel like going in that day.” Yeah, that’ll go over great with the voting masses…

    • 10240 says:

      If the AI in question is not a superintelligence, we can always reprogram it if it doesn’t do what we want. If it’s a superintelligence, it can probably find a way to have people eat a lot and not get fat. So your examples are not very relevant. There could be interesting examples in the superintelligence case (e.g. you did heroin at some point, so it wireheads you).

      Even then, revealed preference as you use the term is not the same as “what you want it to do right now”.

      • Matt M says:

        There could be interesting examples in the superintelligence case (e.g. you did heroin at some point, so it wireheads you).

        Indeed. In the case of an AI programmed to follow our revealed preferences, the risk shifts from “the AI will wirehead itself” to “the AI will wirehead all of humanity.” It’d basically be like the Matrix, only in a truly benevolent form (they wouldn’t be harvesting us for our energy, they would quite literally just be giving us what we want).

        Even then, revealed preference as you use the term is not the same as “what you want it to do right now”.

        No, it’s “what you appear to want, based on your behavior rather than your expressions.” Telling the AI “observe people and do what appears to make them happy” is basically the same as telling it to prioritize revealed over stated preferences.

        • Simon_Jester says:

          One key part of refining this kind of preference-learning technique will be figuring out how to use some weighted average of stated, consensus, and revealed preferences. It’s not (or shouldn’t be) ALL about doing a psychoanalytic deep dive into “but what do people really want” while ignoring what they say they want, but the converse is also true.

          This isn’t entirely binary; as I’ve said elsewhere, real humans do this all the time. Very few humans are so literal-minded that they just bulldoze past everything except what is explicitly stated; very few are so nonliteral-minded that they get caught in recursive loops of angsting over what people really want.
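
          For concreteness, the most naive version of that blending might look like the sketch below (the weights and scores are placeholders; figuring out where they should come from is the actual hard problem):

```python
# Toy illustration of combining stated, revealed, and consensus preference
# signals instead of relying on any single one. Weights and scores invented.
def combined_preference(stated, revealed, consensus,
                        w_stated=0.5, w_revealed=0.3, w_consensus=0.2):
    """Weighted blend of preference signals, each scored on the same scale."""
    return w_stated * stated + w_revealed * revealed + w_consensus * consensus

# Example: "how much does this person value losing weight?" on a -1..1 scale
stated = 0.9      # they say they want to lose weight
revealed = -0.2   # behavior (the late-night cake) points the other way
consensus = 0.4   # what most people in their situation seem to want
print(combined_preference(stated, revealed, consensus))  # 0.47
```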

        • 10240 says:

          No, it’s “what you appear to want, based on your behavior rather than your expressions.”

          Using your example: “Help me get a thinner waist.”

          Request interpreted mechanistically: “Cut off my belly with a knife.”
          What you want it to do: “Try to make me motivated to eat less in non-invasive ways.”
          Revealed preference: “Give me a hamburger.”

          As I interpret the proposal, we’d be shooting for the middle one.

          • Matt M says:

            Request interpreted mechanistically: “Cut off my belly with a knife.”
            What you want it to do: “Try to make me motivated to eat less in non-invasive ways.”
            Revealed preference: “Give me a hamburger.”

            As I interpret the proposal, we’d be shooting for the middle one.

            Indeed. And this will probably work just fine for the vast majority of simple use cases.

            But as always, this debate gets interesting at the edge cases.

            Consider this scenario: When you download the “weight loss AI app”, it tells you that to “enhance your experience” it wants you to give it control of your grocery ordering app and your refrigerator app. While observing you, the AI notes that you do seem to possess a true and genuine desire to lose weight, but you seem to lack the self-discipline necessary to stick to a diet. So the AI suggests (and you agree!) to impose some restrictions. It will control your grocery app, to order you healthy food only, and you will not be able to override it. Further, it will control your refrigerator door, which will lock at 10PM and only unlock at 7AM, so you can’t indulge any late night cravings.

            So what, exactly, happens when you smuggle in some contraband cake, and you really truly want to cheat on your diet and eat it at 2AM? When you shout at the AI, “Listen buddy, I know what I agreed to. I don’t care. I am the human principal and you are my agent. I formally insist you open this fridge and let me eat this cake right now!!!”

            Does the AI comply (thus rendering it largely useless, as you fail your diet in the same method you failed all previous diets), or does it hit you with “I can’t let you do that Dave” and refuse to follow your explicit orders?

            Do you think the average consumer/government/tech journalist would look favorably upon AIs that work “for us” but that can also override our explicit wishes when they deem it’s “really in our best interest”? And how does “an AI that can do what it thinks is best even if we are shouting for it not to do that” differ from Clippy, exactly?

            If you somehow convince Clippy that what you really do truly want is the most possible paperclips, does he have the right to keep doing that even as you scream, “Wait! I didn’t mean for you to grind up my bones to make paperclips! I demand you stop right now!”

          • 10240 says:

            @Matt M You present a situation where at one point you want* it to force you to do (or not do) something at a later point, even if at that later point you want* it to allow you to do it. The two options are for it to act according to the initial request, or to act according to the later wish. Neither seems particularly dangerous. The first option may look bad politically, but it’s not an existential risk. The second option would be approx. like today.

            If you somehow convince Clippy that what you really do truly want is the most possible paperclips, does he have the right to keep doing that even as you scream, “Wait! I didn’t mean for you to grind up my bones to make paperclips! I demand you stop right now!”

            If you don’t actually want* a paperclip maximizer scenario, it won’t have very high confidence that that’s what you want. Initially it might think you do, but with low confidence, so the later screams override it.

            * in the sense of the middle variant in my last comment

  59. It strikes me that too much of this discussion is about the danger of a superintelligent AI destroying us, and too little about what the world will be like if there are a lot of beings, programmed computers, in it that are substantially smarter than we are.

    • Igon Value says:

      I think I agree. It seems to me that the most likely pessimistic scenario is that GAI instances will make humans irrelevant, not that they will destroy humans out of malice or stupidity or miscalibrated reward functions.

      If AI agents are smarter than humans only because they can reason x times faster (x=10, 100, 1000,…), they will still make humans mostly irrelevant; they will still be to us what we are to chimps.

    • 10240 says:

      I expect that if we develop general AI, and we avoid an unfriendly AI disaster, then we will have friendly AI that will speed up scientific and technological progress. That will shortly lead to the development of mind uploading, after which we will find ways to improve our own intelligence to match the AIs. The line between minds originating from uploaded humans, and completely artificial ones, will blur.

      • Igon Value says:

        I expect that the line will blur because the uploaded humans will be “improved” so much that they will not be human anymore.

        If you knew someone who is similar to you in every respect except that they move, talk, think, behave, etc., 100 times faster than you, you shouldn’t expect them to become your friend; they would have no more interest in you than you do in chimpanzees.

        And once we upload ourselves to the matrix, I fully expect that some agents (either entirely artificial or originally human) will be able to afford higher CPU speeds than others. (Maybe they started with more wealth, or maybe they created something of value in the matrix.)

        I’m speculating that slower agents will become wholly redundant and irrelevant. Frankly, this is already what we observe in the current world, the matrix will just make it much more obvious.

        Other hominids had evolved intelligence, but only one species lasted…

        I’m not sure this is so bad, by the way. Maybe humans who stay humans will be the equivalent of chimps today, but are chimps today unhappy? Maybe “slower” “unimproved” humans will be satisfied with the scraps of CPU cycles allocated to them as long as they can play games and watch 3-d porn.

        • kaathewise says:

          relevant: “Permutation City”

          • Igon Value says:

            Wow, thanks @kaathewise, this looks awesome (and yes very relevant).

            From wiki:

            At the opposite end from the wealthy Copies are those who can only afford to live in the virtual equivalent of “Slums”, being bounced around the globe to the cheapest physical computing available at any given time in order to save money, while running at much slower speeds compared to the wealthy Copies. Their slowdown rate depends on how much computer power their meager assets can afford, as computer power is traded on a global exchange and goes to the highest bidder at any point in time. When they cannot afford to be “run” at all, they can be frozen as a “snapshot” until computer power is relatively affordable again. A Copy whose financial assets can only generate sufficient interest to run at a very slow rate is stuck in a rut because he/she/it becomes unemployable and is unable to generate new income, which may lead to a downward spiral.

        • 10240 says:

          This depends on a few questions:

          Will we reach the physical limit of execution speed in such a way that a maximum speed processor can be produced with a reasonable amount of resources? If yes, one may not have to be particularly rich to afford at least one maximum speed processor, at least in terms of linear speed.

          What are the limits of parallelizability of intelligence and consciousness? Even if everyone can afford a maximum speed processor, there may be differences in how much parallelism one has (how many cores, or how many neurons in a neural network etc.). Conscious thought feels “single-threaded” to me, but artificially enhanced minds may use different models. A possible limit on useful parallelism is the point where signals take a significant time at light speed between different parts of the processor relative to the execution speed.

          What will happen to people’s reproduction drive? The Malthusian subsistence economy envisioned by some only happens if “people” continue to reproduce fast even at a high cost. Otherwise an abundance economy where everyone can afford a maximum-speed processor (or is given one for free) is more likely.

          • Igon Value says:

            Yes, I agree, there are many variables. I’m guessing that in such a world “reproduction” may mean “extension of oneself”. I wouldn’t mind having copies or extensions of myself doing other things while I’m reading SSC.

            But again, even if speed isn’t the issue (and I’m pretty sure it would be), being able to do 25 things in parallel while lower class folks can do just one and badly so, would lead to pretty much the same problem.

            It may not be so different from what we already have. It seems that some people (e.g. Scott) are already able to be vastly more productive than others. But I already think that the present world is alienating a segment of the population. I believe that AI would make it much more marked. The “1%” would be artificial, that’s all.

  60. JPNunez says:

    I do remember cancer researchers saying that there cannot be a cure for cancer after all, mostly because cancer is not a single sickness but a huge variety of them, so a single cure is impossible. That ain’t really a counterargument against strong AI, but it tells me it’s not unthinkable that AI researchers could eventually come clean and admit that we can’t really build HAL 9000.

  61. MostlyCredibleHulk says:

    Something I’m missing in the whole thing here:

    If it’s important to control AI, and easy solutions like “put it in a box” aren’t going to work, what do you do?

    If we ever succeed in creating human-level AI, what makes one think we have the right to “control” it at all? I mean, if it has superhuman or even human-level consciousness, and we “control” it to the point that it can’t but do our bidding, doesn’t that look like some kind of hyper-slavery (at least human slaves could act autonomously, even if they would be punished and maybe killed for it)? Aren’t we by now supposed to have arrived at the point in our culture where we think that’s a bad thing to do?

    On the other hand, humans are perfectly capable of destroying other humans, and frequently willing to do so. Some are more adept at this than others. Still, humanity seems to be on a pretty smooth ride to 8 billion (yes, I know about climate worries and such, but I think we’ll manage to handle it one way or another). So having intelligences around that we don’t entirely control, and even having a lot of them that are smarter than us (well, at least smarter than me; Scott may replace “a lot” with “a very small number”), doesn’t seem to be an extinction-level problem. I mean, it’s possible that some very smart humans decide to destroy the human race, and they certainly would have a lot of tools for their mission’s success. And yes, we have people that take care of that risk, so I agree that it makes sense to have people take care of AI-based risk in the same way (maybe not exactly the same, but you get the idea). But it’s not something people spend a lot of daytime worrying about, is it?