When is it Optimal to Launch a Friendly AI?

Suppose we have coded what we believe to be a Friendly AI. We then face a moment like the one in the film Pi: “12:50, press Return.” We have three options: launch the AI now, keep improving it and launch it later, or never launch it at all. Why wouldn’t we launch right away, or perhaps ever? Maybe we made a mistake in the code somewhere, or maybe we don’t need to launch and would be better off not chancing it.

Ideally, we’d have unlimited time to make sure we got the code right, but in practice we don’t have that luxury: the longer we wait, the greater the risk that an existential event (including an unFriendly AI launch from a different AI project) occurs in the meantime. Our hesitation could save us, but it could also spell our doom.

So, when should we push the button?
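One way to get a feel for the tradeoff is a back-of-the-envelope expected-value calculation. The sketch below is a toy model, not a real analysis: it assumes a fixed yearly probability of some other existential event, an initial probability that our code still contains a catastrophic flaw, and a rate at which further review drives that probability down. Every number in it is a placeholder assumption.

```python
# Toy model of the launch-timing tradeoff. All numbers are illustrative
# assumptions, not estimates of anything.
import math

V_GOOD = 1.0       # payoff if a correct Friendly AI is launched
V_BAD = -1.0       # payoff if a flawed AI launches, or another catastrophe strikes first
HAZARD = 0.03      # assumed yearly probability of some other existential event
P_BUG0 = 0.5       # assumed initial probability that the code is still flawed
REVIEW_RATE = 0.2  # assumed yearly rate at which further review removes flaws

def expected_value(delay_years: int) -> float:
    """Expected payoff of launching after delay_years more years of review."""
    p_survive = (1 - HAZARD) ** delay_years                # we must reach the launch date
    p_bug = P_BUG0 * math.exp(-REVIEW_RATE * delay_years)  # flaws get found over time
    value_at_launch = (1 - p_bug) * V_GOOD + p_bug * V_BAD
    return p_survive * value_at_launch + (1 - p_survive) * V_BAD

best = max(range(51), key=expected_value)
print(f"Best delay under these assumptions: {best} years "
      f"(expected value {expected_value(best):.3f})")
```

The only point of the toy model is that the answer swings with the assumed hazard rate and with how quickly review actually catches flaws, neither of which we know well. That is exactly what makes the question hard.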

Artificial Intelligence, Cognitive Biases, and Global Risk

If you have not read “Cognitive Biases Potentially Affecting Judgment of Global Risks” and “Artificial Intelligence as a Positive and Negative Factor in Global Risk”, I recommend reading them. They are excellent book chapters by SIAI Research Fellow Eliezer Yudkowsky, forthcoming in the edited volume Global Catastrophic Risks from Oxford University Press (Nick Bostrom and Milan Cirkovic, eds.). If you do not have time to read both, read the conclusion of the first, reproduced below, and the second in its entirety.

Conclusion of “Cognitive Biases Potentially Affecting Judgment of Global Risks”:

Why should there be an organized body of thinking about existential risks? Falling asteroids are not like engineered superviruses; physics disasters are not like nanotechnological wars. Why not consider each of these problems separately?

If someone proposes a physics disaster, then the committee convened to analyze the problem must obviously include physicists. But someone on that committee should also know how terribly dangerous it is to have an answer in your mind before you finish asking the question. Someone on that committee should remember the reply of Enrico Fermi to Leo Szilard’s proposal that a fission chain reaction could be used to build nuclear weapons. (The reply was “Nuts!” – Fermi considered the possibility so remote as to not be worth investigating.) Someone should remember the history of errors in physics calculations: the Castle Bravo nuclear test that produced a 15-megaton explosion, instead of 4 to 8, because of an unconsidered reaction in lithium-7: They correctly solved the wrong equation, failed to think of all the terms that needed to be included, and at least one person in the expanded fallout radius died. Someone should remember Lord Kelvin’s careful proof, using multiple, independent quantitative calculations from well-established theories, that the Earth could not possibly have existed for so much as forty million years. Someone should know that when an expert says the probability is “a million to one” without using actuarial data or calculations from a precise, precisely confirmed model, the calibration is probably more like twenty to one (though this is not an exact conversion).

Any existential risk evokes problems that it shares with all other existential risks, in addition to the domain-specific expertise required for the specific existential risk. Someone on the physics-disaster committee should know what the term “existential risk” means; should possess whatever skills the field of existential risk management has accumulated or borrowed. For maximum safety, that person should also be a physicist. The domain-specific expertise and the expertise pertaining to existential risks should combine in one person. I am skeptical that a scholar of heuristics and biases, unable to read physics equations, could check the work of physicists who knew nothing of heuristics and biases.

Once upon a time I made up overly detailed scenarios, without realizing that every additional detail was an extra burden. Once upon a time I really did think that I could say there was a ninety percent chance of Artificial Intelligence being developed between 2005 and 2025, with the peak in 2018. This statement now seems to me like complete gibberish. Why did I ever think I could generate a tight probability distribution over a problem like that? Where did I even get those numbers in the first place?

Skilled practitioners of, say, molecular nanotechnology or Artificial Intelligence, will not automatically know the additional skills needed to address the existential risks of their profession. No one told me, when I addressed myself to the challenge of Artificial Intelligence, that it was needful for such a person as myself to study heuristics and biases. I don’t remember why I first ran across an account of heuristics and biases, but I remember that it was a description of an overconfidence result – a casual description, online, with no references. I was so incredulous that I contacted the author to ask if this was a real experimental result. (He referred me to the edited volume Judgment Under Uncertainty.)

I should not have had to stumble across that reference by accident. Someone should have warned me, as I am warning you, that this is knowledge needful to a student of existential risk. There should be a curriculum for people like ourselves; a list of skills we need in addition to our domain-specific knowledge. I am not a physicist, but I know a little – probably not enough – about the history of errors in physics, and a biologist thinking about superviruses should know it too.

I once met a lawyer who had made up his own theory of physics. I said to the lawyer: You cannot invent your own physics theories without knowing math and studying for years; physics is hard. He replied: But if you really understand physics you can explain it to your grandmother, Richard Feynman told me so. And I said to him: “Would you advise a friend to argue his own court case?” At this he fell silent. He knew abstractly that physics was difficult, but I think it had honestly never occurred to him that physics might be as difficult as lawyering.

One of many biases not discussed in this chapter describes the biasing effect of not knowing what we do not know. When a company recruiter evaluates his own skill, he recalls to mind the performance of candidates he hired, many of whom subsequently excelled; therefore the recruiter thinks highly of his skill. But the recruiter never sees the work of candidates not hired. Thus I must warn that this paper touches upon only a small subset of heuristics and biases; for when you wonder how much you have already learned, you will recall the few biases this chapter does mention, rather than the many biases it does not. Brief summaries cannot convey a sense of the field, the larger understanding which weaves a set of memorable experiments into a unified interpretation. Many highly relevant biases, such as need for closure, I have not even mentioned. The purpose of this chapter is not to teach the knowledge needful to a student of existential risks, but to intrigue you into learning more.

Thinking about existential risks falls prey to all the same fallacies that prey upon thinking-in-general. But the stakes are much, much higher. A common result in heuristics and biases is that offering money or other incentives does not eliminate the bias. (Kachelmeier and Shehata (1992) offered subjects living in the People’s Republic of China the equivalent of three months’ salary.) The subjects in these experiments don’t make mistakes on purpose; they make mistakes because they don’t know how to do better. Even if you told them the survival of humankind was at stake, they still would not thereby know how to do better. (It might increase their need for closure, causing them to do worse.) It is a terribly frightening thing, but people do not become any smarter, just because the survival of humankind is at stake.

In addition to standard biases, I have personally observed what look like harmful modes of thinking specific to existential risks. The Spanish flu of 1918 killed 25-50 million people. World War II killed 60 million people. 10^7 is the order of the largest catastrophes in humanity’s written history. Substantially larger numbers, such as 500 million deaths, and especially qualitatively different scenarios such as the extinction of the entire human species, seem to trigger a different mode of thinking – enter into a “separate magisterium”. People who would never dream of hurting a child hear of an existential risk, and say, “Well, maybe the human species doesn’t really deserve to survive.”

There is a saying in heuristics and biases that people do not evaluate events, but descriptions of events – what is called non-extensional reasoning. The extension of humanity’s extinction includes the death of yourself, of your friends, of your family, of your loved ones, of your city, of your country, of your political fellows. Yet people who would take great offense at a proposal to wipe the country of Britain from the map, to kill every member of the Democratic Party in the U.S., to turn the city of Paris to glass – who would feel still greater horror on hearing the doctor say that their child had cancer – these people will discuss the extinction of humanity with perfect calm. “Extinction of humanity”, as words on paper, appears in fictional novels, or is discussed in philosophy books – it belongs to a different context than the Spanish flu. We evaluate descriptions of events, not extensions of events. The cliche phrase end of the world invokes the magisterium of myth and dream, of prophecy and apocalypse, of novels and movies. The challenge of existential risks to rationality is that, the catastrophes being so huge, people snap into a different mode of thinking. Human deaths are suddenly no longer bad, and detailed predictions suddenly no longer require any expertise, and whether the story is told with a happy ending or a sad ending is a matter of personal taste in stories.

But that is only an anecdotal observation of mine. I thought it better that this essay should focus on mistakes well-documented in the literature – the general literature of cognitive psychology, because there is not yet experimental literature specific to the psychology of existential risks. There should be.

In the mathematics of Bayesian decision theory there is a concept of information value – the expected utility of knowledge. The value of information emerges from the value of whatever it is information about; if you double the stakes, you double the value of information about the stakes. The value of rational thinking works similarly – the value of performing a computation that integrates the evidence is calculated much the same way as the value of the evidence itself. (Good 1952; Horvitz et al. 1989.)
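To make that linearity concrete, here is a minimal sketch in code; the two-action decision table and the probability in it are arbitrary assumptions chosen only for illustration. Scaling every payoff in the table by two scales the expected value of perfect information by exactly two.

```python
# A toy two-action, two-state decision problem illustrating that the expected
# value of perfect information scales linearly with the stakes. The payoffs and
# probability below are made-up assumptions.

P_STATE = 0.3  # assumed probability that the risky state of the world obtains

def evpi(payoffs: dict) -> float:
    """Expected value of perfect information for a {(action, state): payoff} table."""
    actions = {a for a, _ in payoffs}
    states = {s for _, s in payoffs}
    p = {True: P_STATE, False: 1 - P_STATE}  # states are keyed as True/False here
    # Best expected payoff if we must act now, without further information:
    value_without = max(sum(p[s] * payoffs[(a, s)] for s in states) for a in actions)
    # Expected payoff if we could learn the true state before choosing:
    value_with = sum(p[s] * max(payoffs[(a, s)] for a in actions) for s in states)
    return value_with - value_without

base = {("act", True): -10, ("act", False): 5,
        ("wait", True): 0, ("wait", False): 0}
doubled = {k: 2 * v for k, v in base.items()}
print(evpi(base), evpi(doubled))  # the second number is exactly twice the first
```

The same linearity is the chapter’s point about rational thinking: the larger the stakes, the more a computation that correctly integrates the evidence is worth.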

No more than Albert Szent-Gyorgyi could multiply the suffering of one human by a hundred million can I truly understand the value of clear thinking about global risks. Scope neglect is the hazard of being a biological human, running on an analog brain; the brain cannot multiply by six billion. And the stakes of existential risk extend beyond even the six billion humans alive today, to all the stars in all the galaxies that humanity and humanity’s descendants may some day touch. All that vast potential hinges on our survival here, now, in the days when the realm of humankind is a single planet orbiting a single star. I can’t feel our future. All I can do is try to defend it.