vetta project

Friendly AI is bunk

September 9th, 2006 · 16 Comments

One of my recent posts to my old blog has generated quite a bit of interest. As I wasn’t able to reply to the comments due to the blog being broken, I’m going to repost it here along with the comments.

In case the title of the post gives the wrong impression, I’d also like to clarify that I think that trying to build AIs so that they don’t do nasty things is obviously a good idea. My problem is that I think that the current efforts are woefully inadequate. The only way to seriously deal with this problem would be to mathematically define “friendliness” and prove that certain AI architectures would always remain friendly. I don’t think anybody has ever managed to come remotely close to doing this, and I suspect that nobody ever will. Even worse, I suspect that the only stable long term possibility is a super AGI that is primarily interested in its own self preservation. All other possibilities are critically unstable due to evolutionary pressure.

One other thing. The most active place on the internet for discussing Friendly AI is the SL4 email list. Ironically, it must be one of the most hostile email lists on the internet with frequent flame wars and people being banned from the list. The moderation system consists of so-called “list snipers” whose job it is to ban discussions that they don’t like. If these people are experts in friendliness… lord help us.

Anyway, here’s my original post:

Yesterday I went down to Genoa in Italy to meet up with Ben and Izabela Goertzel and talk AI. One of the things we discussed was friendly AI, that is, the idea that you can build AIs in such a way that they will not do nasty things — like killing off the human race. I think the idea is an impossible dream, and it seems to me that both Ben and Izabela are pretty skeptical of it too; however, Ben at least attempted to put up the other side of the argument. I thought I might try to list my main arguments against the possibility of friendly AI here.

Tough love or killing us with kindness. Does “friendliness” have any meaning? If a super AI decided to start making all our wishes come true, might we just end up killing ourselves or at least becoming very unhappy? We’ve all heard stories of people who have won $50 million in a lottery and then years later claim that it destroyed their lives. Alternatively, perhaps a super AI might do something that seems extremely bad, like killing off billions of people, and only later, in the long run, do we realise that this was in fact the friendliest thing for it to do. A bit like how your father didn’t allow you to do something as a child. At the time you didn’t think he was being very nice to you, but years later you understand and are thankful for what he did as you realise that it was in your own best interests. If seemingly terrible things can be really good, and seemingly wonderful things can be really bad, how could anybody figure out what is or is not a friendly action? Even with hindsight people still can’t agree on whether certain things in history were good or bad. Usually things are good in some ways and for some people, but bad in others.

Deadly butterfly wings. We all know the idea from chaos theory that a single flap of the wings of a butterfly could cause a hurricane a few weeks later. In which case, if an AI did some trivial act, surely that could trigger a terrible event some time later? As even a super AI’s powers are limited, it might not realise this, in which case, was it being friendly or not? Or is friendliness the intent to be friendly, not what actually ends up happening? In the latter case, are fanatics being friendly when they do nasty stuff because they are really just trying their best to save the world?

Beautiful tool, terrible owners. Even if an AI didn’t have the motivation to do nasty stuff itself, it might well have owners with screwed up ideas. As they say, “power corrupts and absolute power corrupts absolutely”.

Evil in disguise. A super AI might invent a new drug to cure a terrible disease, knowing full well that within a few years of this new drug coming out somebody will discover closely related technology that will spell almost certain doom for the human race. We would blame some crazy scientist for killing off the human race, but in fact the process was set off by a very sneaky super AI. It didn’t need to lift a finger; it just published a short research paper and sat back and waited for people to do the rest.

The provably unprovable. As I showed in a recent paper, while extremely powerful prediction algorithms exist that can predict all sequences up to any given Kolmogorov complexity, you can’t actually prove this for any specific predictor beyond a Kolmogorov complexity of about 1,000 bits. So let’s say you have a super AI and it contains one of these amazing ultimate 2,000 bit predictors. Every day some problem comes along where the AI has to predict some 1,500 bit complexity sequence in order to save the world. Will your AI save the world every day? In other words, is it friendly? Can you prove that it’s friendly? No, you can’t, because if you could then you would have in effect proven that the AI could predict any sequence up to a Kolmogorov complexity of 1,500 bits, and that’s impossible. Thus for this AI system you can’t prove that it’s friendly, even if it is. The same goes for even more powerful AIs that can predict all sequences up to a Kolmogorov complexity of 3,000 or 100,000 bits, etc. Thus, if you can prove the friendliness of an AI system, then the power of this AI must be below the 1,000 bit bound.
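
Here’s the shape of that argument in informal notation. The symbols and the constant c_F below are just shorthand for this post, not the exact statement or constants from the paper, so treat this as a sketch. The existence part says that arbitrarily powerful predictors exist:

\[ \forall n \;\exists p_n \;\forall x : \; K(x) \le n \;\Rightarrow\; p_n \text{ predicts } x \text{ with only finitely many errors} \]

The unprovability part says that for any sound formal system F there is a constant c_F, roughly the complexity of the axioms of F and on the order of 1,000 bits, such that for every specific algorithm q and every n > c_F:

\[ F \nvdash \;\; \forall x : \; K(x) \le n \;\Rightarrow\; q \text{ predicts } x \]

So a proof that a particular AI will always make the 1,500 bit complexity prediction needed to save the world would, in effect, be a proof of the second statement with n = 1500 > c_F, and that is impossible.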

Of all these things I still think the first is the biggest problem. What is friendly? I can have an idea of what “friendly” means for other people and the things they do in my life. But in the context of a super intelligent machine, the whole concept breaks down. If I can’t define or measure something, I can’t say anything solid about it.

Tags: Friendly AI

16 responses so far

  • 1 mathemajician // Sep 9, 2006 at 10:45 am

    (reposted comment from Starglider)

    I don’t think your arguments against Friendly AI are valid, for the following reasons:

    * Does “friendliness” have any meaning?

    I can’t make any sense of this one at all. You appear to be arguing that it’s impossible to even make a reasonable guess about whether actions will harm or benefit other people, and possibly that there is no worthwhile definition of ‘harm’ or ‘benefit’. There is no obvious reason why it should be easier to know whether something will harm or benefit oneself, compared to another person, given enough knowledge about that person (comparable to the knowledge you have about yourself). So you seem to be saying that goal-seeking behaviour is impossible (at least outside of microworlds, over long time periods). This is clearly not the case, so I’m unclear how you can jump from a few examples of where humans find it hard to predict future costs and benefits to a blanket ruling that predicting future desirability is impossible. As for selecting worthy desires, your options are essentially giving people what they think they want, what you predict will make them happiest, or what you think they should have. In any case, if you don’t specify some scheme for selecting actions, how are you going to build an AGI at all? If you don’t specify, you will get something arbitrary, and most arbitrary goal systems wipe out the human race as a side effect of optimising some other desirable. Even if an arbitrary seed AI doesn’t wipe out the human race, if it doesn’t stop other AGIs from doing so then it’s only a matter of time until someone else builds one that does. It shouldn’t be hard to come up with a goal system that’s better than complete extinction of humanity.

    The best hypothetical answer I’m aware of to this question is the SIAI’s ‘Collective Volition’ proposal, though it still has serious conceptual and implementation issues. Ben’s Friendliness content ideas are an example of a fairly reasonable attempt to do a straightforward specification of a nice place to live - much better than nothing, though I have issues with Ben’s /structural/ Friendliness ideas.

    * Deadly butterfly wings

    Again, this applies equally to /any/ open-ended purposeful behaviour. To be worthwhile, we just have to do well on average, not win in every single case. Plus the whole point of superintelligence is to be better at predicting the outcomes of actions, such that the probability of making suboptimal choices (and hence unwanted side effects) steadily declines with increasing intelligence.

    * Beautiful tool, terrible owners

    For AGI designs that aren’t expressly designed to preserve their optimisation targets (core goal systems) against self-modification, it probably doesn’t matter; post-takeoff the AGI is going to have an effectively arbitrary goal system. People clever enough to get the structure of their AGI right and malicious enough to misuse it are a problem, though people with well-intentioned but short-sighted Friendliness content schemes may be a bigger problem (which may be what you meant with your first point - Friendly goal system design is not impossible but it is tough - and you still have to mess up pretty badly to do worse than the alternative, extinction of all humans). The only solution to this is to put all available resources into designing and implementing Friendly seed AI before someone builds an Unfriendly version.

    * Evil in disguise

    This is an argument against using ‘adversarial methods’ against an AGI - being unsure as to its actual goals and trying to judge its intent or vet its actions by observation (human or automated). Of course this doesn’t work and AFAIK no serious AGI projects advocate it as anything more than a last ditch backup; you can’t reliably police something more intelligent than yourself. The key is to ensure that the AGI doesn’t want to do anything malicious (and will stay like that however much self-modification goes on), which in turn rules out many opaque and/or causally messy AI designs.

    * The provably unprovable

    AFAIK no one is trying to make strong predictions about seed AI performance. As you describe it the endeavour does seem fundamentally futile; the whole point of superhuman intelligence is that it achieves goals we find impossible in ways that we cannot predict in advance. But this isn’t necessary for an AI to be Friendly. It is merely required that the AI /attempt/ to optimise the world towards Friendly targets, and that it is /well calibrated/, so that it does not attempt to optimise situations for which it does not have enough information to consistently effect a net improvement in utility.

    - Starglider

  • 2 mathemajician // Sep 9, 2006 at 10:46 am

    (reposted comment from Sam Kayley)

    If you can’t measure it, you can’t say anything useful about it. Well, create several utility measures, and try to compare them with each other and with intuition.

    It may be sensible for an AI to have multiple goal systems, and if an action would result in a sharp drop in utility in any of them, then don’t do it.

    One could be Collective Volition, or individual volition, with varying parameters for how much it gives you what you want vs what you would want if you could see further ahead.

    Another could be Russell Wallace’s domain protection.

    Yet another could be to incorporate the goals of things similar to myself as my own goals, the initial goal being curiosity. Some possible problems are curiosity resulting in attempts to simplify sensory input by tiling the universe or hiding in a darkened room; a possible solution is expressing curiosity as optimal prediction of sensory input for any sequence of actions, not just the ones taken.

    Another possible problem could be friendliness only until other AIs exist, at which point it identifies with them rather than with humans, or adopts goals we would rather it didn’t adopt, or identifies with other animals more than with humans due to their greater numbers, even if the ‘affinity’ per creature were lower.

    Much of my goal would be preserving consciousness, and indeed being able to perceive the world, mathematical structures and art in new and exciting ways (qualia enhancement :)), so working out what consciousness really is would be a good start.

    It may be that problems relating to expressing human goals in an expected utility framework will disappear in the kind of reflective framework needed to get strong AI to work.

    In practice, Friendliness failure due to prediction failure in cases that don’t involve cutting-edge physics, competing AIs, butterflies, or the self-improvement process itself probably won’t be as bad as mathematicians’ nightmares, as one can usually ‘open up the box’ by using more or better sensors.

    Proving friendliness may well be impossible, but surely, given the stakes, it is worth trying; reducing the number of assumptions behind a friendliness strategy, and defending the remaining ones as well as possible, improves our chances.

    Until there are vast, dull piles of literature on friendliness, a paradigm shift obsoleting most of the piles, and then some more vast, dull piles, the human race hasn’t really tried ;)

    -sam kayley

  • 3 Nick Hay // Sep 10, 2006 at 9:06 am

    I agree that current efforts are woefully inadequate, with an exception being the work done at SIAI. This work is woefully incomplete, however, which is why they’re still developing Friendliness theory rather than building a non-Friendly AGI. But that doesn’t mean their efforts will fail.

    The only way to seriously deal with this problem would be to mathematically define “friendliness” and prove that certain AI architectures would always remain friendly.

    I agree. However, defining a formal specification for the AI’s optimisation criterion is a non-trivial problem! One shouldn’t expect to quickly solve it, nor to even be able to see whether there is a solution.

    I don’t think anybody has ever managed to come remotely close to doing this, and I suspect that nobody ever will.

    Why do you suspect this problem impossible? Certainly it hasn’t been solved yet. This isn’t surprising: I expect the problem of defining Friendliness (note: we’re not talking about the human term “friendliness”; it is perhaps a bad choice of terminology) rigorously to be an incredibly difficult problem, on the order of writing a tractable AGI. Clearly failure to succeed so far is not evidence for impossibility.

    Even worse, I suspect that the only stable long term possibility is a super AGI that is primarily interested in its own self preservation. All other possibilities are critically unstable due to evolutionary pressure.

    Self-preservation is a derived goal from almost any goal. More generally, what matters is preservation of a system that implements the AI’s optimisation criterion, whether it counts as “self” or not. What do you mean by “primarily interested in its own self preservation”, and what are the “evolutionary pressures” working against systems which are interested in self preservation for derived reasons? (Leaving aside the anthropomorphism in terms such as “interested” and “self”.)

    The moderation system consists of so-called “list snipers” whose job it is to ban discussions that they don’t like. If these people are experts in friendliness… lord help us.

    I see the joke, but it must be said that any sane Friendly AI design should not be sensitive to how irritable or arrogant its programmers are. It must be able to correct far deeper errors the programmers will make.

    Does “friendliness” have any meaning?

    You give examples of actions which seem good at first glance but are actually bad, or vice versa. I strongly suspect a superintelligent AGI will be able to avoid killing people (that is, with sufficient power to manipulate the universe a large class of moral dilemmas such as “do I kill N people to save M?” disappear), although the complaint of stale utopia still holds. It is not an easy problem to design a nice future.

    But then humans face exactly this problem, only without the added advantage of being superintelligent, having the ability to understand and redesign their own source code, etc etc. An AGI simply needs to do better than human, to make one improvement to the world, for it to be worthwhile.

    The problem of AI morality is not making a particular set of decisions about what the future should look like, nor designing a particular set of moral principles to uphold. It’s more about transferring the ability to reason about morality, about what humans really want.

    Coherent Extrapolated Volition (http://www.singinst.org/friendly/extrapolated-volition.html) is a non-trivial informal optimisation criterion which makes these particular complaints void. There is plenty of work to be done (filling in the gaps, formalising the specification) but the particulars of present day human disagreements about which actions are good or bad aren’t relevant to the design.

    Deadly butterfly wings.

    There is a pretty big difference between fanatics making a moral error and the stupid making a prediction error. Leaving that aside, prediction failure is a problem. One of the major problems with human stupidity is not inability to predict things, but overconfidence in these predictions. An AI that knew when it didn’t know might make far fewer mistakes.

    Regardless, an AI’s success will be limited by how correctly it predicts the universe. This is a problem if we are trying to unconditionally guarantee success, but that’s not what Friendly AI is trying to do.

    Even if an AI didn’t have the motivation to do nasty stuff itself, it might well have owners with screwed up ideas. As they say, “power corrupts and absolute power corrupts absolutely”.

    This is one critical problem with creating AI which obey their owner’s direct commands. Even the most altruistic and rational human may be corrupted by such power. However, not all AI designs are structured to obey their programmers blindly.

    Evil in disguise

    Nice example. Certainly an AGI can fool humans into thinking it is nice if it needs to and it needn’t be smarter than human to do so. You have to design AGIs nice, you can’t reliably detect it from their behaviour.

    The provably unprovable

    One problem here is the AGI doesn’t need to predict all sequences under a given complexity, only the actual sequences the universe will produce. Note also that a Friendly AI doesn’t have to be omnipotent, it doesn’t have to have 100% success probability. Improving on the non-AI status quo is fine enough, one doesn’t have to also perform logical impossibilities. On the other hand, I think there’s something interesting here.

    I don’t think you need to prove these things to establish an AI Friendly. Proving a result like “will behave in a Friendly manner so long as it doesn’t have to predict its Goedel sequence”, or some such, is fairly strong and interesting. Although, hmm, you might need something stronger.

    Good to see you thinking about these issues! :)

  • 4 mathemajician // Sep 10, 2006 at 1:13 pm

    You appear to be arguing that it’s impossible to even make a reasonable guess about whether actions will harm or benefit other people,

    Sure you can make reasonable guesses. My problem is that when your future is in the hands of a super intelligence, reasonable guesses just aren’t good enough!

    As for selecting worthy desires, your options are essentially giving people what they think they want,

    I want to be the most intelligent, richest and sexiest man in the world! :-) Is this really a good idea? I doubt it.

    … what you predict will make them happiest or what you think they should have.

    This has been done many times before in history… and the results were a total disaster. Indeed many people in this world today would love to give me what they think I should have in life and I wouldn’t like it one bit.

    this applies equally to /any/ open-ended purposeful behaviour. To be worthwhile, we just have to do well on average, not win in every single case.

    My point here is that you cannot define friendliness in terms of good things happening as a result of the agent’s actions. A friendly super AGI could be trying to help, but wipe out humanity by mistake. This makes defining “friendly” much harder as we need to focus on what the AGI’s intentions are. It also means that you could never fully ensure that an AGI won’t do something very harmful. You might think these are obvious points, but I think they are important.

    People clever enough to get the structure of their AGI right and malicious enough to misuse it are a problem, though people with well-intentioned but short-sighted Friendliness content schemes may be a bigger problem

    It seems here that we agree.

    This is an argument against using ‘adversarial methods’ against an AGI - being unsure as to its actual goals and trying to judge its intent or vet its actions by observation (human or automated). Of course this doesn’t work

    Ok, and we agree here too.

    the whole point of superhuman intelligence is that it achieves goals we find impossible in ways that we cannot predict in advance. But this isn’t necessary for an AI to be Friendly.

    Sure… however my point was somewhat the reverse: If you can prove that an AGI will try to save humanity from certain doom when it is easily able to (presumably a minimal requirement for friendliness?), then I can prove that its intelligence is below a certain bound. Will AGI intelligence always stay below this bound into the future? If not, then we are heading into a future where AGI friendliness, even in the very minimal sense above, cannot be proven.

  • 5 Marc_Geddes // Sep 11, 2006 at 6:12 am

    Shane,

    Thought about this for 4 years. Finally cracked it. The condition for Friendly AI is given below. This sounds simple but I’m saying something extremely subtle and extremely strange here so you may need to re-read it carefully a number of times:


    ‘The condition is to create a *Sentient* AI - i.e. an AI that thinks it has Qualia… such that all mathematical entities representing metaphysical (or ontological) categories of *universal scope* are *experienced directly* - as ‘mathematical qualia’ in the mind of the AI. Any AI is Friendly *if and only if* this condition is met’

    For a summary of my reasoning see my post on the everything-list here:
    http://groups.google.com/group/everything-list/browse_thread/thread/6c0b178a63118d8f/c86de6243cdc81f4#c86de6243cdc81f4

    Cheers!

  • 6 Accelerating Future » Consolidation of Links on Friendly AI // Sep 11, 2006 at 11:15 am

    […] Friendly AI is bunk by Shane Legg · Alternatives to (Yudkowskian) Friendly AI proposed on the SL4 list · Critique of the SIAI Guidelines on Friendly AI by Bill Hibbard · SIAI’s Guidelines for ‘Building’ Friendly AI by Peter Voss […]

  • 7 mathemajician // Sep 13, 2006 at 12:15 am

    the work done at SIAI. This work is woefully incomplete

    That’s the somewhat more polite version of my main message… ;-)

    Why do you suspect this problem impossible? Certainly it hasn’t been solved yet.

    Why do you suspect it is possible?

    Humans have been arguing about what things like friendliness mean for thousands of years. Countless generations of people have thought about these things going way back to the Greeks, and who knows who before that. If you read what the Greeks had to say about some of these things, as I guess you will have, then you’ll see that they were pretty sharp and were thinking really hard about these issues. They were also thinking hard about physics, mathematics and many other things. Now, how far has our understanding of things like physics and mathematics come since then? It’s developed enormously. But ethics? Hmmm. Not much. Even basic problems in ethics continue to present huge problems after thousands of years of work, including by some of the most brilliant minds in history… and you’re claiming that it will be possible to not just find good solutions to issues like what friendliness is, but to even do it using formal mathematical equations!

    Hehe… good luck Nick!

    *thumbs up*

    Of course that doesn’t prove that it’s impossible, but it sure makes me suspect that it is. At the very least I really doubt that you will solve the problem before AGI turns up.

    Besides, even if you did come up with an equation, I’m sure that I could find some strange situations, stick them into your equation, and get answers out that many people disagree with… for the simple fact that on many ethical issues the population is divided.

    What do you mean by “primarily interested in its own self preservation”, and what are the “evolutionary pressures” working against systems which are interested in self preservation for derived reasons?

    A friendly AI can only preserve itself through friendly actions, while a primarily self-interested one can preserve itself in whatever way is the most effective. This gives it a survival advantage due to its very nature.

    It’s more about transferring the ability to reason about morality, about what humans really want.

    Go and ask various people in countries around the world what is moral and what they want. You’ll get a real mixed bag of answers. The world is full of groups of people who hate each other.

    This is one critical problem with creating AI which obey their owner’s direct commands. Even the most altruistic and rational human may be corrupted by such power. However, not all AI designs are structured to obey their programmers blindly.

    So you can’t rely on the people telling the AI what is right and what is wrong. Which means that you need to get the AI’s internal model of right and wrong correct to start with, and to do that it seems to me that you’d need an extremely precise mathematical definition of friendliness… and as I’ve said above, I’m very skeptical about you managing to do that.

    You have to design AGIs nice, you can’t reliably detect it from their behaviour.

    Same comment as the last one.

    One problem here is the AGI doesn’t need to predict all sequences under a given complexity, only the actual sequences the universe will produce. Note also that a Friendly AI doesn’t have to be omnipotent, it doesn’t have to have 100% success probability.

    I’m not asking for omnipotence of the AGI. I’m asking it to solve a finite, but very difficult, problem. Even if it can solve it, you can’t prove that it will do so in order to save the world… and thus you can’t prove its friendliness. Note that I haven’t said anything about the AGI other than the fact that it’s very powerful. It might be friendly, it might not be. Mathematical proof just can’t answer that question for you.

    I think I’m going to have to write a longer explanation of this last point. Maybe as a new blog post.

    By the way, thanks for the well thought out reply.

  • 8 mathemajician // Sep 13, 2006 at 8:17 pm

    I read some of that article but I’m afraid that it’s more philosophy than I can get my head around!

  • 9 mathemajician // Sep 13, 2006 at 8:22 pm

    starting by working out what consciousness really is would be a good start.

    Yes, if we really understood consciousness, I suspect that many ethical issues would become significantly easier to understand and reason about.

    Proving friendliness may well be impossible, but surely given the stakes it is worth trying,

    Sure, it’s worth trying. But man, as far as I can see we really don’t have a handle on this problem at the moment.

  • 10 Marc_Geddes // Sep 14, 2006 at 6:44 am

    The upshot of all that philosophy is this:

    There are 27 ontological ‘categories of cognition’ which are ‘universal in scope’ (applicable everywhere in reality where minds could exist).

    A universal reasoner (real AGI) would need class definitions for all 27 categories to work. The 27 categories are listed at the end of the essay.

    Cheers!

  • 11 Eliezer Yudkowsky // Sep 15, 2006 at 8:09 pm

    Most of these questions are answered in either http://sl4.org/wiki/KnowabilityOfFAI (predictability of superintelligent optimization targets) or http://sl4.org/wiki/CoherentExtrapolatedVolition (actual criterion of Friendliness; what humans want). I will answer the one technical point separately (in the succeeding blog post).

  • 12 Excerpts from Farlops Industries : Utopia? // Mar 25, 2007 at 11:02 pm

    […] I believe some pretty sketchy things: I believe that the premise of strong AI is sound. I believe that artificial life is already here. I believe the premise of molecular nanotechnology is sound. I believe the premise of cryonics. I believe the premise of brain taping. Given that I believe all that (I blame years of science fiction and an abortive career in physics), more reasonable stuff like space elevators and the cure for aging is pretty tame. But what I never understood about these subjects is how they drive some people to get all, well, starry-eyed and religious about them. There is always something about the future that gets people all dreamy. They assume somehow paradise will emerge and everything will get all cleaned up and solved. Then the handwaving starts: Post-humans will be perpetually happy, all forms of suffering will end. God can be engineered and it’ll love us. Nanotechnology and superautomation will usher in a post-scarcity world but, I guess, some of us didn’t get that memo. I flatly and categorically disagree with this handwaving. It’s handwaving like this that got us into serious trouble in the past. The trouble with most thinking about technological singularities is that it encourages sloppy thinking. A lot of people in futurist circles reach a point in their exposition where they get very vague on how to get from here to there. Maybe I’m just a curmudgeon. I remember, as a child back in the Seventies, reading these beautifully illustrated essays in an encyclopedia about Gerard O’Neill’s space colonies and then watching video from the Apollo-Soyuz mission. Even then the juxtaposition was very informative to me. I think what I learned was that eventually the future becomes the present and the wondrous becomes commonplace and problematic. I keep harping on this point but I repeat it here: Heaven is a place where nothing ever happens. This suggests to me that the ideas of Heaven and Utopia are logically flawed. Futurists would do well to avoid this kind of thinking. Published Sunday, March 25, 2007 2:59 PM by Mr. Farlops […]

  • 13 Jeffrey Herrlich // Sep 10, 2007 at 5:19 pm

    Mathemajician,

    I don’t really understand the motivation for this post. Even if Friendliness wasn’t 100% provable, *trying* to make the first AGI Friendly sure as hell beats slapping together a minimal, arbitrary design and having *zero* intuition about whether it will be nice to humanity. If AGI designers intend to flip the switch on a minimal, arbitrary design without even an *attempt* to make it Friendly, then humanity and its future potential are probably screwed. Such an AGI would almost surely be totally indifferent to the well-being of humanity - the motivation-space is too large. Under such conditions, wouldn’t humanity most wisely never even *attempt* to make such a minimal, arbitrary AGI? Making a Friendliness attempt is infinitely better than making no attempt. Is publishing the comment “Friendly AI is Bunk” really helping the cause?

  • 14 Jeffrey Herrlich // Sep 10, 2007 at 7:32 pm

    Also, with the fate of humanity wobbling on the edge of oblivion, as it already is, it’s not like I personally would *demand* that the design be 100% provably Friendly. Given the current situation, I’d probably green-light it at anywhere between about 85% and 100% estimated probability of sustained Friendliness. I know that some people would demand more, but I think that’s a fair, balanced range, given the situation today. Sorry ’bout that minor flare-up, but I hope you agree with the viewpoint behind it.

  • 15 mathemajician // Sep 11, 2007 at 9:37 am

    I’m certainly not against the idea of Friendly AI. What concerns me is that the state of the art in the area is, in my opinion, horribly inadequate. I don’t think we can even properly define what a “solution” to the problem would look like. Post singularity all bets are off and most “rules” go out the window. Maybe it is ok to bring about the end of humanity, so long as we move on to something “better”, whatever that means. I think these are philosophical problems that are not going to be solved anytime soon; in particular, they are not going to be solved before machine super intelligence. Secondly, even if we knew what the goal was, I suspect that there are deep mathematical problems in this area that make many current ideas on Friendly AI practically impossible to achieve.

  • 16 Jeffrey Herrlich // Sep 11, 2007 at 3:54 pm

    “What concerns me is that the state of the art in the area is, in my opinion, horribly inadequate.”

    The theory isn’t complete yet, but I think that decent progress has been made, especially considering their shoe-string budget. I agree that we need to substantially increase our investments in Friendliness research. SIAI is trying.

    “I don’t think we can even properly define what a “solution” to the problem would look like.”

    Selecting the actual goals themselves is a different matter, but I’d personally define it as achieving a goal-system stable through recursive self-improvement. The self-improvements would allow for continually greater understanding of the assigned goals, in principle.

    “Post singularity all bets are off and most “rules” go out the window.”

    But what if the goal-system is stable, and the AGI lacks any motivations to throw out its “rules”?

    “Maybe it is ok to bring about the end of humanity, so long as we move on to something “better”, whatever that means.”

    But I don’t think that’s guaranteed by any means. The dynamics of an arbitrary design could easily lead the AGI to ceaselessly pursue some ridiculous and trivial target(s). I don’t want to trade humanity’s future potential in exchange for some optimal paperclips.

    “I think these are philosophical problems that are not going to be solved anytime soon, in particular, they are not going to be solved before machine super intelligence.”

    But we shouldn’t give up prematurely. Breakthroughs can happen. And we can’t really be certain of when the Singularity will happen. For example, if I had to rough-guess it, I’d say that Friendliness could be solved within 5 years. Also, keep in mind that as the AGIs advance from animal-level to human-level, our understanding of Friendliness is likely to skyrocket with them - it’s tech improving tech.

    “Secondly, even if we knew what the goal was, I suspect that there are deep mathematical problems in this area that make many current ideas on Friendly AI practically impossible to achieve.”

    SIAI/Yudkowsky reinvents Friendly AI theory fairly regularly, you probably shouldn’t base your assessment on anything written more than 1 or 2 years ago. And who knows, the critical insight may happen next year. The important thing is to keep trying.
