There are no free lunches, but organic lunches are super expensive: Why the tradeoffs constraining human cognition do not limit artificial superintelligences

In this post, I argue against the brand of AI risk skepticism that is based on what we know about organic, biologically evolved intelligence and its constraints, recently promoted by Kevin Kelly in Wired and expanded by Erik Hoel on his blog. I’m not sure I agree with the worst estimates of a near-inevitable AI doom lying ahead of us (gonna sit on this increasingly uncomfortable fence for just a little longer), but I think this particular family of counterarguments is based in part on confusion about which principles and findings concerning organic cognition are actually relevant to intelligence in general, or a would-be superintelligent AI in particular, and not just to artifacts rooted in our own evolutionary history.

This post assumes familiarity with the basic concepts surrounding AI risk, such as the orthogonality thesis and other issues with value alignment (no, we can’t just tell an AI what to do) as well as convergent instrumental goals (whatever your goals are, things like gaining indefinite resources, becoming more competent, ensuring your own continued existence, and resisting goal modifications are going to be necessary for reaching them). The basic idea is that once we build a useful agent with reasonably general cognitive competence and allow it to modify itself in order to become more intelligent (and so, recursively, even better at making itself more intelligent), controlling its advances and ensuring its compatibility with human existence will eventually prove difficult: a nonhuman intelligence will not share all the obvious human values we find so intuitive unless they are relayed to it in a foolproof manner, which is tricky until we have something like a formal, complete, and consistent solution to ethics, which we super don’t.

So once more, with feeling, let’s outline the concept we’re dealing with here. Kelly argues that there is no meaningful way to define intelligence at all, and so against any framework within which we could call a human smarter than a squirrel. I don’t find this position all that reassuring, for whether we want to call them higher intelligence or just different thinking styles or something, there are still very meaningful cognitive skillsets that allow agents to manipulate the actual environment around us and fulfill their potentially alien values more effectively than humans when pitted against our skillsets and values. Hoel suggests some good formal approaches to defining intelligence, such as Legg and Hutter’s definition based on the simplicity-weighted sum of the agent’s performance across all possible problems. In practice, though, we may not need to deal with such an abstract definition with lots of irrelevant dimensions and can count only the performance on problems relevant to manipulating the world, whatever those might be. So below, “cognition” usually just refers to the skillsets related to predicting and influencing our actual world more powerfully than humans as a collective are able to. We should keep in mind, though, that we don’t know very well which skillsets can be used for this in the world we currently find ourselves in – human-style thinking is definitely not the only and probably not the best cognitive structure for the job.
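
For reference, the Legg and Hutter measure can be written down compactly. The first formula below is their definition as I understand it; the second is only my own shorthand for the restricted reading used in this post, and the symbol E_rel is an assumption introduced here for illustration, not something taken from their work or from Hoel’s post.

```latex
% Legg and Hutter's universal intelligence of an agent \pi:
%   E          = the set of computable environments with bounded total reward
%   K(\mu)     = Kolmogorov complexity of environment \mu (simpler environments weigh more)
%   V_\mu^\pi  = expected total reward that agent \pi earns in environment \mu
\[
  \Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^{\pi}
\]

% Restricted reading (E_rel is hypothetical notation, not from the source):
% the same weighted sum, taken only over environments relevant to manipulating the world.
\[
  \Upsilon_{\mathrm{rel}}(\pi) \;=\; \sum_{\mu \in E_{\mathrm{rel}}} 2^{-K(\mu)} \, V_\mu^{\pi},
  \qquad E_{\mathrm{rel}} \subset E
\]
```

Nothing in the argument below depends on being able to compute either sum (K is uncomputable anyway); the point is only that dropping irrelevant environments changes which agents score well.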

The other main component of getting stuff done is of course the ability to physically execute whatever has been concluded is the optimal thing to physically execute. Material issues could be the main limiting factor a young, would-be recursively improving intelligence runs into: efficiently acquiring, refining, and utilizing raw materials sounds like a trivial chore, but the macroscopic physical world is slow enough that expecting anything like explosive growth requires some pretty complicated postulations. But the takeoff doesn’t need to be that fast and there are viable ways around this for a benevolent-seeming and promising AI, so let’s drop this issue for now, assume an AI with access to the necessary material resources via some unspecified general villainy, and focus on the cognitive aspect the original articles also tackle.


Next, I’ll briefly concede the points that can immediately be conceded, and explain why I still don’t think they work well enough as arguments against AI risk.

1) Like Kelly says, it’s true that an agent’s potential intelligence can’t be absolute or infinite (solving every conceivable problem is indeed impossible as far as our current understanding of elementary logic let alone physics can tell). This is not required for an agent to pose a major threat to conflicting value systems with human-level defenses, however. If value alignment fails, we don’t know how competent an inhuman AI needs to be to reach existentially threatening powers we can’t comprehend well enough to route around (like the God of Go so eerily does within its narrow domain) but the list of relevant problem types that are trivial to an AI but insurmountable to us doesn’t need to grow all that long until we’re already looking at something really worrying.

2) The typical intelligence explosion scenario often features an exponential improvement curve; Kelly is probably correct in that there is little evidence that this is going to be the case, especially since hardware growth and rearrangement are presumably required for indefinite effective improvement. However, the growth rate doesn’t need to be literally exponential to pose an existential risk – with or without intentional treachery, we will still not be able to comprehend what’s going on after a while of recursive improvement, and roughly linear or irregular growth could still get faster than what we can keep track of. And since any agent that is even somewhat misaligned to our values (or uncertain about whether it is!) will try to find a way to conceal its actual competence levels as soon as it has a grasp of how its interactions with humans tend to play out until it has a decisive advantage, the eventual results could look rather explosive if not exponential to us even if the actual takeoff takes years and years instead of weeks.

3) Kelly argues that an AI would not be able to do human-style thinking as well as humans. A superintelligence would indeed not necessarily look anything like our intelligence does, and it might be that humans do human reasoning, defined in some fairly concrete and detailed sense, more efficiently than a silicon computer ever could. Kelly also suggests that singularitarians interpret Turing completeness erroneously: they are correct in that given infinite resources and time, human reasoning could be emulated on a different substrate, but mistaken in that this can be done effectively (e.g. with polynomially scaling resources) by anything other than a biological brain. Inefficiencies are indeed likely if you seek to emulate a literal human brain including all of its noise and redundancy, as emulations are always less efficient than hardware copies when you aim for bottom-level perfection. I don’t think we can confidently assume the complexity will prove insurmountable, though, as bottom-level perfection is not what we’re after.

More importantly, a superintelligence doesn’t need to do human-style thinking to be dangerous, much less start from emulating a human brain. It needs to get stuff done, and there are no theoretical or practical reasons for the relevant computations – which essentially consist of something like probabilistically and deductively extending and manipulating actionable information about the physical world, as well as recognizing something like goals and complicated practical syllogisms related to them – to be out of reach or only inefficiently computable to a silicon intelligence we intentionally build to solve real-world problems. Taking implementational details such as embodied cognition into account or otherwise strictly emulating human reasoning isn’t necessary in any way.

4) Kelly argues that humans are far from general problem-solvers, and that an AI’s thinking could not be absolutely general either, which is of course true. He then says:

“We can certainly imagine, and even invent, a Swiss-army knife type of thinking. It kind of does a bunch of things okay, but none of them very well. AIs will follow the same engineering maxim that all things made or born must follow: You cannot optimize every dimension. You can only have tradeoffs. — A big ‘do everything’ mind can’t do everything as well as those things done by specialized agents.”

But perfectly generally optimized or otherwise literally godlike competence is not needed to get all the relevant major things done, and there are no laws or principles that require an AI to remain less or only reasonably more competent in the relevant domains than humans are. So I agree with the maxim dictating that everything can’t be optimized, but not with the further claim that an AGI could not optimize the relevant and dangerous dimensions of problem-solving vastly and incomprehensibly better than humans can optimize their defenses: it’s just not written anywhere in the rules. Most of this post is centered on this question, since it seems to lie at the core of our disagreement.
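
Going back to point 2 for a moment, the claim that growth doesn’t have to be exponential to outrun us can be made concrete with a toy model. This is a minimal sketch, not a forecast: the growth functions, the tracking limit, and the concealment thresholds are all hypothetical numbers I picked for illustration.

```python
from typing import Callable, Optional

# All numbers below are hypothetical and chosen only for illustration.
TRACKING_LIMIT = 100.0   # capability level beyond which observers lose track
DISCLOSURE_CAP = 20.0    # level the agent lets observers see while concealing
DECISIVE_LEVEL = 150.0   # level at which concealment is no longer needed


def linear(step: int) -> float:
    """Capability that grows by a fixed amount per improvement step."""
    return 1.0 + 0.5 * step


def exponential(step: int) -> float:
    """Capability that grows by a fixed percentage per improvement step."""
    return 1.1 ** step


def first_step_beyond(
    growth: Callable[[int], float], limit: float, max_steps: int = 10_000
) -> Optional[int]:
    """Return the first step at which capability exceeds the limit, if any."""
    for step in range(max_steps + 1):
        if growth(step) > limit:
            return step
    return None


def observed(step: int, growth: Callable[[int], float] = linear) -> float:
    """Capability as seen from outside when the agent hides its real level.

    The agent shows at most DISCLOSURE_CAP until its actual capability
    reaches DECISIVE_LEVEL, after which it stops hiding anything.
    """
    actual = growth(step)
    if actual >= DECISIVE_LEVEL:
        return actual
    return min(actual, DISCLOSURE_CAP)


if __name__ == "__main__":
    print("linear passes the limit at step", first_step_beyond(linear, TRACKING_LIMIT))
    print("exponential passes the limit at step", first_step_beyond(exponential, TRACKING_LIMIT))
    print("observed just before and after disclosure:", observed(297), observed(298))
```

Both curves cross the tracking limit eventually; the linear one just takes longer. With concealment added, the observed series sits flat at the disclosure cap and then jumps in a single step, which is the sense in which slow growth could still look explosive from where we stand.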

The No Free Lunch argument against artificial general intelligence

Kelly hints at a principle which Hoel makes more explicit in his post: the idea that optimizing for one skill will necessarily impair one’s performance in something else – a general No Free Lunch principle, which implies that cross-domain competence is always going to lose to specialization. If I interpret the fundamental premises correctly, both Kelly and Hoel believe that humans are actually doing very well in maxing out and balancing all the relevant dimensions of cognitive competence (relative to the unknown limits imposed by the No Free Lunch principle) – well enough that no realistic AI could compete with us should some value misalignments arise; or that even if humans aren’t competent enough, we can always build narrow, specialized AIs to replace or beat the generalist.
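
It may help to see what the formal No Free Lunch theorems from optimization theory actually guarantee, since the principle as used in this debate is a looser cousin of them. The toy below is my own sketch under stated assumptions: three candidate solutions, scores of 0 or 1, two search strategies that each get two evaluations and never revisit a candidate. The strategy names and the "relevant" subset are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List

DOMAIN = (0, 1, 2)          # three candidate solutions
Problem = Dict[int, int]    # maps each candidate to a score of 0 or 1


def all_problems() -> List[Problem]:
    """Every possible assignment of 0/1 scores to the candidates (8 in total)."""
    return [dict(zip(DOMAIN, scores)) for scores in product((0, 1), repeat=len(DOMAIN))]


def fixed_order(problem: Problem) -> int:
    """Try candidates 0 and 1 regardless of results; return the best score seen."""
    return max(problem[0], problem[1])


def adaptive(problem: Problem) -> int:
    """Try candidate 2 first, then choose the second candidate based on the result."""
    first = problem[2]
    second = problem[0] if first == 1 else problem[1]
    return max(first, second)


def average_score(strategy: Callable[[Problem], int], problems: List[Problem]) -> float:
    """Mean best-score-found of a strategy over a collection of problems."""
    return sum(strategy(p) for p in problems) / len(problems)


EVERYTHING = all_problems()
# Hypothetical "relevant" subset: problems where candidate 2 happens to be good.
RELEVANT = [p for p in EVERYTHING if p[2] == 1]

if __name__ == "__main__":
    for name, strategy in (("fixed_order", fixed_order), ("adaptive", adaptive)):
        print(name, average_score(strategy, EVERYTHING), average_score(strategy, RELEVANT))
```

Averaged over every possible problem, the two strategies tie, which is the No Free Lunch result in miniature. Averaged over the structured subset, one of them is strictly better. The tie only binds an agent that has to care equally about all conceivable problems, and nobody in this debate does.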

Kelly suggests that we shouldn’t assume humans are not at or near the global maximum of relevant reasoning skills:

“It stands to reason that reason itself is finite, and not infinite. So the question is, where is the limit of intelligence? We tend to believe that the limit is way beyond us, way ‘above’ us, as we are ‘above’ an ant. Setting aside the recurring problem of a single dimension, what evidence do we have that the limit is not us? Why can’t we be at the maximum? Or maybe the limits are only a short distance away from us?”

He doesn’t explicitly provide positive evidence for this assertion, though, only the apparent lack of evidence for opposing beliefs, but I think he implies the tradeoffs become too expensive quickly after we reach human-level cognition. In accordance with this, Hoel suggests that the NFLP supports this view: as an example, he points to empirical findings about human intelligence, where we occasionally find savants excelling in some cognitive pursuits but dysfunctional in others. I think the principle is a valuable addition to the AGI debate and the limits of its applicability should definitely be explored, but the evidence presented so far doesn’t look sufficiently strong to let us lay the concern about AI safety to rest. What’s more, there is plenty of evidence against this belief, and a lot of it can be framed in terms of the NFLP itself. Organic brains must do so, so much in terms of non-relevant tasks that there is plenty of useless, bio-specific competency for an artificial system to trade off.

Humans with a history of civilization are extremely competent against ants and most other agents we are currently up against, and it’s tempting to think that we are pretty close to optimal world-manipulators. But due to the history of organic evolution, our cognition runs on overly tangled, redundant badcode on a very local hilltop that isn’t optimized and can’t be optimized for efficient cognition. There are eventual constraints for intelligences implemented in silicon too, but it seems to me that these are unlikely to apply before they’re way ahead of us, because the materials and especially the algorithms and directions of a developing superintelligence are intentionally chosen and optimized for useful cognition, not for replicating in the primordial soup and proliferating in the organic world with weird restrictions such as metabolism and pathogens and communities of similar brains you need to cooperate with to get anything done. The next section outlines some of this evidence.

Why are there limits to human intelligence?

Most of the discussion about the evolution of human intelligence focuses on our anatomical and physiochemical limitations: on the implementational level, biological intelligence is constrained by the fragility and limited search strategies of its stochastically evolving physiology. Organic computation is a noisy, hackish electrochemical mess of lipid-constrained compartments interacting with varying effectiveness and constantly on the verge of flat out dying because of something causing the slightest change in pH or temperature or oxygen or nutrient levels so that some relevant enzymes denature or the cell runs out of a few high-energy molecules to fuel its work against various gradients of entropy. Surely silicon-based computation can also be made to sound sort of silly if we go down to the very lowest levels of explanation, but it does look like most of our dead ends are rooted in the substrate we run on.

Our neuronal patterns have immense amounts of chemical noise and compensating redundancy, and the energy costs of high-level information processing are significant to an animal like us. For many of the features associated with higher intelligence, there are clear biological reasons why they are difficult to increase further. We could arguably be smarter if, for example, we on a species level just had larger brain volume in the right areas; but we may have traded off better problem-solving skills for saving energy, dissipating heat, avoiding connectivity problems, or something like fitting through birth canals that can’t practically be larger since we’re bipedal and mobile and everything. Or, potentially, if our neural branching worked differently – in ways that unfortunately seem to cause debilitating neurological diseases when expressed excessively. Smaller, more densely packed neurons seem to make you better at processing complex information, presumably due to the decreased distance between communicating areas, but our cortical neurons are already close to the size limits where random misfirings due to spontaneously opening ion channels start messing everything up. Some findings suggest that the connections related to higher general intelligence in humans are particularly costly due to simple anatomical reasons, such as the long distance between higher-level association areas, so diminishing returns dictate that a larger neocortex might not have been useful enough to compensate for the time and energy costs it incurs for a biological animal. In sufficiently complex systems, our axons are eventually too slow to facilitate a processing speed compatible with functioning in the wild.

The efficiency of biological versus in silico computation is obviously an old question with plenty of literature behind it, and even in many fairly low-level tasks we still have strong advantages over supercomputers, mostly due to our massive parallelism, but we should keep in mind that the debate typically concerns timelines for artificial structures reaching our levels of efficiency, not the possibility of it. Effectively implementing similarly parallel or otherwise unconventionally organized processing on vastly better hardware may take more than a few decades – or it may not – but the resulting improvements in processing speed alone will probably be a game-changer. This is not to say that dumping tons of processing power in a system will make it intelligent, just that once a reasonably general intelligence is built, there are good reasons to assume processing power might make it superintelligent.

Bostrom calls this subtype a speed superintelligence: a mind that isn’t necessarily a lot more competent than the smartest humans on the algorithmic level, but faster by several orders of magnitude and so rather as baffling and unstoppable as a more effective thinking style, whatever that means, would be to us. This agent seems to avoid Hoel’s objections related to humans being close to the optimal balance of different areas of intelligence. Even in the very unlikely case that a superintelligence has to emulate human-style thinking and even start out from a rather low level in order to accomplish stuff, better hardware could well compensate for these losses in efficiency, while still surpassing us by a wide margin.


From what I can tell, though, we can expect to get orders of magnitude of more leverage from algorithmic improvements. So what can be said of our algorithmic efficiency, and the tradeoffs it is subject to?

Hoel suggests that different aspects of cognition are like sliders you can adjust, coupled to each other positively or negatively, though mostly negatively, so that getting more attentive might for example impair your memory. But among most humans these abilities seem to correlate, and only at extreme ends do you sometimes see the savant-type imbalances Hoel mentions. Even savantry, whether acquired or congenital, does not always carry notable tradeoffs, but probably does require something developmentally or structurally surprising to happen in the brain. It looks a lot like blasting the brain with lightning or removing biologically well-preserved and typically useful parts from it just sometimes shoves it onto a higher hilltop further away, one which evolution in its search for local optima would probably not have found – but overwhelmingly often, it causes severe impairments in many other areas, because there are always more ways in which things can go wrong than there are crude tricks for improvement. If the imbalances resulted from algorithmic tradeoff necessities as opposed to evolved implementational limitations, it would be more difficult to explain why generally very functional savants exist at all.

In the cases where our cognitive algorithms do clash, though, we use metacognitive skills to adapt to the task at hand. Many researchers liken our cognitive abilities to a toolbox we strategically choose the right algorithms from; but these metareasoning skills are very limited and inflexible in humans, and can’t very well be applied to involuntary processes. For example, if better memory interferes with creativity, humans who want to strategically increase their divergent thinking are pretty much out of luck. An artificial system – whose metareasoning skills could also be designed or trained to get better results than we do – can be more flexible in turning its various modules or styles on and off, or more imaginatively fine-tune their interactions to match different situations. Such metacognitive skills are complex and definitely not easy to implement, but there is no reason to think they are implausible, and they could make many of the potential tradeoffs temporary in a way our cognitive tradeoffs could never be – and thus allow many of the relevant thinking styles and their interactions to be dynamically optimized, and very effectively increase the system’s adaptability to changing situations.

Anyway, we don’t currently know a whole lot about human cognition on the level of specific algorithms, but the general positive correlation between different cognitive capabilities as well as the rough ideas we have about how they work seem to contradict Hoel’s concept of balanced, mutually opposed forms of intelligence. There is nothing conceptually contradictory between most areas of cognition, and functionally it looks like they in fact often lean on and facilitate each other. Also, awkwardly, the strong suits of human intelligence, such as pattern recognition and abstraction, rely on heuristics many of which we have grown out of well enough to call biases by now. Our quick and effective judgments rely on algorithms that we know are coarse-grained and frankly kind of weird in a lot of ways, but that we can still only surpass in accuracy by expending a lot of energy on formalizing our approach and augmenting our reasoning with artificial computers and large bodies of prepackaged information. There are immensely more accurate algorithms that we sometimes see, understand, and can even laboriously adopt and combine to grasp large bodies of knowledge, but that are not part of our intuitive toolbox, which instead is filled with bizarre distractions and crude approximations. Could they be part of the immediate toolbox of an artificial intelligence? Seeing as our most accurate reasoning about large, complex wholes requires us to emulate increasingly formal approaches, it seems likely that a system whose computation adheres to formal principles from a lower level upwards could complete these better strategies faster and more efficiently. But this is pretty abstract, and it’s not clear how rigid an optimal world-manipulator will be in this sense.


Higher levels of analysis get increasingly damning, though. What purpose does our cognition serve? Which tasks is it optimized for? Have human smarts primarily been selected for features that aid in the relevant types of intelligence?

Well, it’s complicated, but no. The skillsets associated with reproductive fitness during human evolution are… not exactly identical to the skillsets you need for large-scale technological world manipulation. The prime directive of all organically evolved species is replication: this statement sounds uninteresting, but its corollaries are massive. Humans are an intensely social animal whose survival and reproduction opportunities are primarily determined by group dynamics. This is not to say that the abilities that help you get by in social situations aren’t useful for other dimensions of problem-solving as well – general intelligence correlates with social skills, and many theories about the primary drivers of the evolution of our intelligence place a lot of emphasis on the social games we play in order to prove to others that we are also good at solving many correlating problem types. But the social environment humans evolved in also means that there are things we can or need to optimize at the cost of general reasoning – as evidenced by the richness of our social cognitive biases – and that we may sometimes be better off freeloading off the intelligence of others (e.g. by being likeable) than doing the work ourselves. In a community, there may be smarter ways to be smart than actually being smart, and sometimes these ways are directly antithetical to the skills you need to predict and influence the world on a large scale.

In a sense, the useful unit of survival and thriving for humans is a group (whereas the unit of selection for intelligence is an individual). This means that human intelligence is very fundamentally a collaborative effort, in that none of our actually impressive cognitive feats could have been accomplished by an individual starting from scratch. According to both Kelly and Hoel, integrating different subsystems of cognition into a general actionable whole is the most expensive part of intelligence, which is the primary reason intelligence incurs greater and greater costs as it generalizes more. But interacting with other minds like humans do – trying to coordinate what you know and plan to do using a deeply vague symbolic language and other external super expensive cues – is like the least efficient form of this, and yet exactly what we have to do all the time in order to reach any of our goals. (See e.g. the distributed cognition model (Johnson 2001) for an interesting description of communicative interactions as cognitive events, and cognition as a co-created process.)

Unfortunately, human cognitive communities are also immensely redundant. The same processes manifest in individual human minds again and again with only comparatively small modifications, facilitated by resource-intensive learning within narrower domains – even though we still pay the hefty price of inefficiently integrating these processes. An artificial structure could integrate its modules or subroutines through routes and representations vastly more effective than a human community utilizing shoddy human communication is, and the processes it combines also add substantially more to the system because there is less redundancy between them. Generalization being so costly doesn’t mean that there can’t be better generalists than we are, it means that there are some immensely effective low-hanging fruit for an agent with actually good integration skills to pick.

Hoel also compares general intelligence to a superorganism optimized to thrive in any environment: just like no such ultimate organism exists, no agent could be universally intelligent in all the domains it encounters. I could well be missing something here, but it seems to me that considering this idea actually strengthens the concept of sufficiently powerful general intelligence. Humans, while not literally superorganisms and again individually pretty useless, are a reasonable approximation of such an organism when considered as a civilization. The collaboration of humans has so far enabled us to conquer almost any interesting location on Earth, extract resources from sources no other animal finds use for, and severely punch most other organisms in their literal or figurative noses whenever we feel like it. Tardigrades may survive extinction events we never would due to their also rather universal hardiness, but if we want a square kilometer without tardigrades or incidentally unsuitable for tardigrades, we get a square kilometer without tardigrades or incidentally unsuitable for tardigrades. The converse is hardly true. This is because we as a civilizational intelligence distributed across time and space in silly human-sized vessels really are sufficiently general to outsmart most competitors we currently know, if we actually want to – though, due to our many demonstrable inefficiencies, in ways that also leave plenty of room for improvement.

If we’re going to rely on competition, we probably already lost

As mentioned above, another possible source of hope is that even if humans are way below the limits of a silicon-based intelligence, this agent would still be under our control because no matter what it seeks to do, we can counter and outsmart it with a narrower, hence more powerful competitor. Hoel, for example, mentions competition in passing:

“Even if there were a broad general intelligence that did okay across a very broad domain of problems, it would be outcompeted by specialists willing to sacrifice their abilities in some domains to maximize abilities in others. In fact, this is precisely what’s happening with artificial neural networks and human beings right now. It’s the generalists who are being replaced.”

But we aren’t going to remain better than a semi-general superintelligence at creating narrow intelligences either. We won’t even know what sorts of specialist AIs we might need to counter whatever an AGI is planning to do, as its cognition might be utterly alien to us even when not otherwise powerful. Who are the competitors, and when is the competition going to happen? The situation does not resemble biological evolution, where the need to replicate and pry scarce resources from an uncaring abiotic world drives the separation of populations into extremely specialized species in constant competition with each other. An AI in development is freer from material scarcity than any organic being has ever been, and its rules for competition are a different terrain entirely than the one we evolved in.

During initial design and selection by humans, specialist AIs will certainly be useful, their outputs effectively comprehensible to humans and combinable by us into coherent actionable wholes. But there are large-scale problems we really really need to solve, can’t tackle with our own cognitive skills due to the massive complexity involved in deeply processing the outputs of our specialist systems, and want a more powerful agent to make sense of: so such an agent will be made by someone as soon as it is technologically feasible. Specialist AIs are not effective competitors after we’re able to build a generalist that makes better use of the specialists’ outputs than our rigid, slow brains are able to.

Concluding remarks

I hope to have given a reasonably convincing account of why I think human cognition is primarily limited by its biological origin, and probably weak enough to be dramatically surpassed by intentionally designed, less redundant, and materially abundant systems with an actual focus on effectively predicting and influencing the world. Even if there are eventual necessary tradeoffs for artificial systems as well, we don’t know where they lie based on our knowledge about organic intelligence, and AIs could well deal with these tradeoffs more dynamically than we are able to, in possibly surprising ways. With all the evidence we can see on multiple levels of analysis, I think there is enough potential for improvement in intentionally designed intelligences to build a mind to whom humans really look a lot like mice or ants. Discussion about the limits of cognition and potentially necessary tradeoffs between its components is very valuable, though, so while I would personally be surprised to discover that humans are anywhere close to maximally competent at manipulating the world, this point of view is likely a relevant addition to the AI discussion.

Anyway, another thing to keep in mind when comparing human and artificial cognition is that humans, well, don’t really super have terminal goals. We have the capacity to think somewhat strategically and often figure out the optimal course for whatever we claim to work towards, but frequently just… don’t, because strong and stable terminal goals aren’t how human motivation works. We neglect by default even the basic goals we unanimously deem instrumental for any agent with actually important values, and instead spend a lot of time just going with the flow, trying not to let all our incompatible goals clash with each other badly enough for us to notice. Due to our own constraints, it is difficult for us to understand how an agent that actually has invariant and consistent terminal goals is going to behave, so we intuitively assume that similar ineffectivenesses will arise even in AIs that supposedly have values. This is probably not going to be the case, which again adds to the costs we must pay compared to intentionally designed systems.

Whether or not optimal reasoning in itself will be enough to threaten our existence is a good question, but beyond the mostly evolutionary scope of this post. Kelly deems this assumption fallacious: he says that an AI will not be able to beat us or even indefinitely improve itself just by thinking about it really hard. This is true to a certain extent of course, and it would be interesting to get to see what the limits are. But again, what we want is not merely a solipsistic thinker: we want a useful agent to help us with the complex problems we ourselves battle with, and will equip our creations with interfaces through which they can influence the actual world. The inevitability of a superintelligence, if such an agent is possible, lies in the fact that we desperately need this type of competence, and will gladly build it up as long as it looks like its values are also identical to or compatible with ours. So, if thinking and communicating just lets it convince us of that, we are likely happy to solve the rest of the initial problems, feed it all the data it needs, and probably essentially give up control soon enough whether or not we realize that’s what we’re doing.

Maybe it is implausible that by observing a single pebble, a realistic optimal thinker could infer the entire universe and quickly have all it needs to fully control its future light cone. But with an amount of agency and base knowledge that lets an AGI be useful to us, it can certainly get a lot further than we can predict or necessarily control – that’s how good inference ultimately works. While it’s absolutely true that the risk is currently hypothetical and there are plenty of potential pitfalls that could lock down a realistic recursively improving AGI, we don’t have a strong idea about where or what they are. Real thinking, by agents with real terminal goals, has never been tried.


9 thoughts on “There are no free lunches, but organic lunches are super expensive: Why the tradeoffs constraining human cognition do not limit artificial superintelligences”

  1. “Due to our own constraints, it is difficult for us to understand how an agent that actually has invariant and consistent terminal goals is going to behave,”

    What if that itself is the key tradeoff? That is, between “effectively intelligent” and “invariant and consistent”?

    I find the idea of a hyperintelligent AI (or set of AIs) that is (are) performing some combination of going with the flow, idly playing games with the world, and half-heartedly pursuing various contradictory goals (about which it is probably deluding itself, to boot), to be terrifying. But that’s pretty much how we all are, so at least that’s somewhat familiar. Thus I find the idea of a hyperintelligent AI that actually has well-defined and consistent end goals to be far more terrifying. In the former case, there would probably remain some niche for something I could recognize as fulfilling human-level life; in the latter case, that depends far too much on the precise nature of the goals.


    1. Interesting idea! I think it could probably be argued that during human evolution, the wishy-washy nature of our motivation might have coevolved with (and increased the selection pressure for) intelligence – constantly having to revise and adjust your strategies in different situations that also shift your current goals is definitely cognitively taxing, etc. But as with most of the concerns described in the OP, this probably doesn’t mean that all paths to effective intelligence require inconsistent and changing values, just that organic evolution could in part have advanced in this way.

      Anyway, the coevolution of motivation and intelligence sounds like a cool area of research. I wonder if it has led to any viable theories so far.


  2. Granting nearly all of your argument, I think the whole “scary AI” scenario has a fatal flaw.

    Assume there are lots of AIs at about the same level of development. Some of them are open and collaborative (with other similar AIs) and others are more closed and cooperate less. The collaborative ones will pull ahead, since they can easily copy each other’s good ideas, share out experiments, etc.

    This means the AI world will end up dominated by a connected component of open collaborative AIs. Any smaller components will have strong motivations to build a bridge to inter-operate with the big one. Individual AIs won’t necessarily / typically merge completely because they all have different local objectives. But they will develop instrumental norms of mutual support and criteria for evaluating each other’s behavior–like science or firms in an industry emulating “best practices”.

    So any badly behaved AI that is more closed and less cooperative will run up against this much larger and more capable alliance, and will not be allowed to turn the world into paper clips or whatever since that conflicts with the objectives of lots of other AIs. Isolated AIs that just want to be hermits or run their factory may be (eventually will become) sub-optimal but won’t necessarily have trouble being left alone.

    The only alternatives I can see to this scenario are:
    – The exponential transcendence claim. In this scenario the first AI to “take off” gets so far ahead of everything else that it can do whatever it wants. I have never seen an actual argument for this claim, and you admit it is problematic. Open collaboration is likely to win anyway.
    – A claim that the whole connected component of collaborative AIs would do bad stuff. The usual arguments for the “sole AI” don’t translate to this case because they depend on the “sole AI” being immensely more powerful than anything that might oppose it — obviously not true in this case.

    I have not seen any attempt to argue that a network of millions / billions of cooperating AIs with local, human-defined motivations would collectively “go bad”. It is hard to see how to make that argument, but maybe someone will try.


    1. Thanks for the comment!

      Yeah, this is actually pretty close to the approach OpenAI initially endorsed, and it does sound appealing and could definitely be useful when combined with other safety strategies. But unfortunately it’s far from foolproof in itself, and it looks like the people at OpenAI are also currently reconsidering their original plans (democratizing the field by sharing all of their work, thus encouraging the development of more and more AIs) due to safety concerns.

      There are many ways a multipolar AI scenario could play out, and it seems that most of them would not be safe. Tons of AIs with imperfect but approximately human-like value sets – keeping each other in check and hopefully complementing each other so that the sum of their goals resembles human values more closely than the goals of a singleton could easily do – might work for a while, but it is probably not a stable solution. The incentives of each component in this scenario will only superficially be pro-collaborative, and the instrumental value of honest cooperation ends where the slightest possibility of exploitation can be found, so the established norms and criteria will only work until an agent or group of agents finds a way to subtly break or change them in order to control a larger share of the collective. Because none of these agents wants to fulfill only a *reasonable* amount of its individual values, most of them will still keep competing with each other, only in more underhanded ways. Even multipolar situations are under a constant threat of collapsing into a singleton (I think Bostrom discusses this in Superintelligence, could be wrong though!).

      But even if we assume that this swarm consists of stable, mostly equal collaborators who do cooperate reliably, this approach to AI safety requires that there are practically no stochastic or systematic biases at play when their approximately human-like values are originally instilled, because the value misalignments of individual AIs need to be cancelled out by “flaws” in the opposite direction in other AIs (to build a human-compatible whole). This total lack of systematic bias doesn’t sound a lot easier to implement than solving the original value alignment problem in a singleton scenario, and it doesn’t sound like how AI development is likely to work in the real world.

      A collective of millions of cooperating humans with local, human-defined motivations can easily go bad if you look at them from the perspective of, say, a genuinely loathed minority group. Because this is a question of values and not of competence, I think it’s pretty plausible that a collective of millions of AIs with such motivations could also easily go bad from the perspective of humans.


      1. Thanks Kaura for a thorough response!

        I don’t have time right now to reply in as much detail as this deserves, but a few quick thoughts.

        I would welcome citation of a proof that “the instrumental value of honest cooperation ends where the slightest possibility of exploitation can be found”. That isn’t my experience at all and to the best of my game theory knowledge it isn’t a known result. In fact the game theoretic results, while not comprehensive enough to make me happy yet, are trending in the direction of stable collaboration.

        Note that this is a multi-level issue: If groups of honest collaborators (who have adopted some way of rewarding honesty / punishing cheating) outcompete cheaters, then in the medium to long run, those groups will dominate, even if cheaters can outcompete honest collaborators one on one.
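        The arithmetic behind this multi-level point can be sketched with a toy public-goods model. This is only an illustration under made-up assumptions: the `BENEFIT` and `COST` values, the group sizes, and the function names are all hypothetical, and a single round of growth says nothing about whether the effect is stable in the long run. It just shows that cheaters can gain ground inside every group while collaborators gain ground overall, because collaborator-heavy groups grow faster.

```python
# Toy public-goods model. Every number here is an illustrative assumption.
BENEFIT = 3.0  # payoff each member receives, scaled by the group's collaborator share
COST = 0.5     # what a collaborator pays for contributing


def next_generation(collaborators: float, cheaters: float) -> tuple[float, float]:
    """Return (collaborators, cheaters) after one round of payoff-proportional growth."""
    share = collaborators / (collaborators + cheaters)
    cheater_fitness = 1.0 + BENEFIT * share
    # Inside any one group, a collaborator always does worse than a cheater.
    collaborator_fitness = cheater_fitness - COST
    return collaborators * collaborator_fitness, cheaters * cheater_fitness


def collaborator_share(groups: list[tuple[float, float]]) -> float:
    """Fraction of collaborators across all the given groups."""
    total_collaborators = sum(c for c, _ in groups)
    total_members = sum(c + d for c, d in groups)
    return total_collaborators / total_members


# One mostly honest group and one mostly cheating group of equal size.
groups_before = [(90.0, 10.0), (10.0, 90.0)]
groups_after = [next_generation(c, d) for c, d in groups_before]

print("overall share before:", collaborator_share(groups_before))
print("overall share after: ", collaborator_share(groups_after))
for before, after in zip(groups_before, groups_after):
    print("within-group share:", collaborator_share([before]),
          "->", collaborator_share([after]))
```

        In this sketch the within-group share of collaborators falls in both groups, yet the overall share rises. Whether that carries over many rounds depends on things the toy leaves out, such as how groups form, split, and punish cheating.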

        Value alignment is another issue. I am less clear on how one could crisply state conjectures in that space, and even more unclear on how one could prove a conjecture. However, some thoughts…

        Right now we have this same problem with large human organizations, such as nation states and corporations. It isn’t clear that they typically maintain value alignment with most humans. Sometimes some of them are called “psychopathic” — not as an insult but as an attempt at diagnosis.

        My going in thought was that a large number of separate AIs, each pursuing objectives set by humans, and collaborating for instrumental reasons, would tend to generate an overall network that maps fairly closely to the aggregate value manifold defined by humans. But as I say I’m not even sure how to make this a crisp claim, much less convince anyone (including myself) that it MUST be true.

        I’ll think more about this and hope to have time to write further.


    2. I’m trying to see how to make the diffuse machine intelligence scenario safe. My current approach is to try to build a computer architecture that meshes better with humans and to make intelligence augmentation (IA) work. The IA systems would not be complete agents, so they would not conflict with humans in the Omohundro sense.

      My goal is to improve humanity’s autonomy. So my website is called that,


  3. While I agree with you that anyone who expects human brains to be the pinnacle of mind design is kidding themselves, I don’t think that at all exhausts the relevance of the “how specialised or general is intelligence?” question.

    First of all, I think the answer to this question bears strongly on the likelihood of an intelligence explosion scenario, where a single AI passes some critical intelligence threshold and then suddenly grows in power to dominate the world. This seems plausible if intelligence is some simple fully general algorithm, but not if intelligent minds unavoidably are complex combinations of many somewhat-specialised subsystems: less an algorithm and more a huge intricate piece of software.

    In this latter case, AI development would be much more gradual and dispersed for a number of reasons. First, the capability of a mind would come from the aggregate of all its parts, so as separate parts were improved we’d see a slow progression of slightly-more-capable AIs rather than a sudden jump from not-intelligent to intelligent. Also, a single AI design won’t be optimal for all uses so it will be necessary to build a wide variety of AIs with different sets of capabilities. The more complexity to minds, the more work it takes to improve them, making it less likely that any one AI could recursively improve itself all on its own. And more complex minds will contain more bottlenecks, reducing the extent to which a single mind can be easily improved just by scaling it up. Overall this scenario looks more likely to contain vast numbers of AIs, perhaps billions or more, rather than a single homogeneous one as in the intelligence explosion scenario.

    Now you could point out, correctly, that a slow-building world of diverse complex AIs would still overtake humans, and that as it grew, our wishes would become increasingly irrelevant. But there’s a vital point that changes everything: the moral relevance of AIs.

    In the scenario where intelligence is a simple general optimisation process, it seems unlikely many people would consider it morally relevant. It would indeed be just a tool for pursuing human ends, and so would need to be controlled. However, if mind complexity is unavoidable, AIs will quite probably be morally relevant creatures like us. And so a world in which control is gradually ceded to a complex civilisation of billions of AIs seems like no more of an existential risk than any of the countless times in the past when a new generation of humans have taken over from their predecessors.


  4. Question. And I’ll note I’m fairly new to this area — coming from a statsy background, so mostly even just pointers to well known articles would be helpful.

    It seems you discuss algorithm improvement extensively. AGI would be phenomenally capable not just because of silicon improvements, etc., but because it would generate better algorithms. I don’t know how critical this is to your discussion, but this is surprising to me. In stats, and parts of comp sci, there are proven lower bounds on the complexity of problems. It’s just fundamentally impossible for a comparison-based sort to get by with fewer than on the order of N*log(N) comparisons in the worst case. In stats, for a given amount of noise and a given data set (even a large one) there is a fundamental limit to the amount you can learn. You can only get so close to the ‘Truth’. Beyond that, data-mining becomes a problem rapidly the more you try to learn from a fixed amount of data. These seem like really basic stumbling blocks for an AGI to overcome, whether it is trying to help or harm humans. What is your/the community’s take on these issues?
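    The two limits mentioned in this comment can be made concrete with a short sketch. The function names below are hypothetical and chosen only for illustration. The sorting part uses the standard decision-tree argument: a comparison sort must be able to tell apart all N! orderings, so its worst case needs at least ceil(log2(N!)) comparisons. The statistics part uses the textbook standard error of a sample mean, sigma / sqrt(n), assuming independent observations with known noise level sigma.

```python
import itertools
import math


def comparison_lower_bound(n: int) -> int:
    """Worst-case comparisons any comparison sort needs for n distinct items: ceil(log2(n!))."""
    # For a positive integer x, (x - 1).bit_length() equals ceil(log2(x)) exactly.
    return (math.factorial(n) - 1).bit_length()


def merge_sort_count(items: list) -> tuple[list, int]:
    """Plain top-down merge sort that also returns how many comparisons it made."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_count = merge_sort_count(items[:mid])
    right, right_count = merge_sort_count(items[mid:])
    merged, comparisons = [], left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent observations with noise level sigma."""
    return sigma / math.sqrt(n)


n = 6
worst_case = max(merge_sort_count(list(p))[1] for p in itertools.permutations(range(n)))
print("lower bound for n=6:", comparison_lower_bound(n))
print("merge sort worst case for n=6:", worst_case)
print("standard error with 100 vs 10000 samples:",
      standard_error(1.0, 100), standard_error(1.0, 10_000))
```

    Note that the sorting bound covers only algorithms that learn about the input through pairwise comparisons, and the statistical limit applies to a fixed data set. Neither says anything about an agent that gathers more data or reformulates the problem, which is part of what the question above is asking.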

