Compassion as knowledge

In this post, I’ll defend a fairly ridiculous theory of metaethics and moral progress I low key believe at the moment, which takes the following form:

1. No matter what our goals and values are, we should act according to our best and honest assessment of reality – the alternative is behaviour based on willfully delusional beliefs.

2. Subjective experiences, such as valenced emotions and other motivations, are all parts of reality, the same as any other natural event.

3. Knowledge of subjective phenomena is different from the third-person knowledge we have of other events in that subjective experiences can only be known privately, by having them in the first person – just being able to describe them in public terms is not yet knowing or understanding them in a sense that’s relevant to their nature.

4. We can’t actually have all the subjective experiences, but by interpreting public signals, we can use third-person models to approximate the norms and behaviour the motivations would lead to if aggregated into a vector which maximizes the self-interest of every subject.

5. Even if we can’t know or understand a truth that would influence our behaviour, as long as we can know or understand how it would influence us, we should act according to this influence if we want to avoid basing our behaviour on willful delusion.


6. To avoid basing our behaviour on willful delusion, we should base it on our best and honest assessment of the self-interest of every conscious being.

This gives us a natural way to derive an “ought” from what “is”, with no need for metaphysical morality goo or moral truths as distinct from other truths. Moral progress is empirical progress, both in the capacity humans have for understanding other minds and in the knowledge we have of the consequences of our norms and behaviour: fallible and tedious to acquire, but still meaningfully improving. The spirit of this approach obviously becomes quite utilitarian, but it is not necessarily the case that the best implementation of morality for humans uses a utilitarian framework – for instance, this approach alleviates the problems that some forms of contractarianism have with animals and other non-rational minds by automatically protecting the self-interest of every subject, and so makes it a more palatable solution. Our moral system of choice should be expected to change as we learn more about the world, even though its purpose and justification stays the same.

Next, I’ll briefly justify the premises above one by one, and try to clarify their implications a bit.

1. We act according to our best and honest assessment of reality, or according to willfully delusional beliefs.

This first premise is hopefully pretty uncontroversial, and out of these two options, I’m sure most people would also generally like to avoid choosing the latter. Acting rationally requires that we at least make a ¾-assed attempt to honestly understand the world, and if we ever find ourselves outright denying or fighting beliefs that we know correspond to reality, we’re probably stunting our ability to effectively interact with it in one way or another.

Obviously we all have false beliefs that we sometimes defend even to the point of delusion, but these are usually supplementing details in more complex stories which we still largely believe to be true even under honest inspection, if only due to our personal experiences and other mix-and-match evidence. We also try to keep the number of our indefensibly false beliefs reasonably low, and avoid letting them interact with the actual reality in order to avoid uncomfortable clashes and overt cognitive dissonance. Other things roughly equal, a more sincere picture has to be better no matter what your initial goals are, as delusion implies that you are mistaken about the nature and so the consequences of some real events or behaviour. But let’s drop this question for now, and take a look at the rest of the components.

2. Subjective experiences, such as valenced emotions and other motivations, are all components of reality, the same as any other natural event.

When trying to understand reality, what is the status of subjective experiences? Conscious content is usually fleeting and intangible, but this doesn’t make it less real than the stuff we think of as objective. If you subscribe to some form of monism in which conscious events are interactions and properties of whatever stuff the rest of the natural world is made out of, or if you simply accept that there is a meaningful whole you can call reality which includes mental states supervenient on brain activity or other computations, the subjective experiences of others are quite obviously as natural and real as a dog you have heard barking in your neighbour’s yard but never seen.

A position that denies the relevance of subjective states as components of reality would require bizarre definitions for both of the key terms (and ignore the massive causal influence that such states have on the world, depending on your theory of mental causation). Where first-person experiences differ from third-person events is not in how real they are, but in the kind of knowledge we can typically have about them.

3. Knowledge of subjective phenomena is different from third-person knowledge in that subjective experiences can only be known by having them.

An understanding of the subjective is both private and ephemeral. Other people can’t know how you feel when you hear a surprising joke the way they can know what the weather around them is like – and as the joke loses its initial novelty, the knowledge of what its first appearance felt like slips away even from you. You may recognize it later if a very similar experience occurs again (if you’re one of the lucky people destined to hear not one but two original jokes during their lifetime), but a capacity to recognize something is not the same as knowing or understanding it.

Experiences can be transmitted to others by using verbal reports and other, subtler public cues: depending on the empathetic capabilities of the receiver, these cues can trigger a simulation that lets them understand the speaker to an extent. But merely having a description of what someone is going through in terms that evoke nothing internal is not understanding it, just like describing a shade of red to a person who has never seen colours, or to a full aphantasiac, will not cause them to know what the shade looks like – unless done in a manner that eventually succeeds in acutely evoking the image in their private mind, which might be theoretically possible but beyond our abilities. (Though again, they may gain the ability to recognize it later, or some other kind of propositional knowledge about the shade.)

As mentioned above, having gone through a subjective state in the past is not automatically enough for understanding it either. A faithful memory of what a given experience was like can sometimes be retained and later recalled, especially if the event repeats frequently, but typically reminiscing about past conscious content is not meaningfully similar to the original experience. Claiming to know what your favourite coffee tastes like when you’re not currently drinking it is usually not unfounded: you can probably conjure up a mental taste that resembles the actual perception, heavily dampened but in ways you may be calibrated to compensate for. I think this counts as knowledge of the experience instead of just a propositional memory of the taste being “pleasant” or “just right”, with no understanding of the sensations involved. But aphantasia comes in degrees, and sufficiently complex emotions remain beyond the understanding of typical humans who are not experiencing them at a given moment. In contrast to coffee, the memories you have of intense despair most likely don’t correspond very well to what the experience was like: it is now only accessible through the lens of you eventually surviving it, for one, reduced by hindsight and pollyannaism to an easily avoidable fluke of bad luck, and perhaps modified to emphasize your commendable resourcefulness when enduring and overcoming it. You are no longer the context that could accommodate the original feeling. Depressed people are often dismayed by the empathy gap in people who have recovered and seem to recall depression as a trivially beatable slump, because most stably recovered people are incapable of reliving deeply depressive qualia.

The crucial part here is that to have an accurate picture of someone’s motivations or valenced emotions such as distress, you must acutely simulate them – and so experience what they do, which necessarily also means that you are similarly swayed by their self-interest. To know what someone’s unbearable headache is like includes wanting the headache to stop, just like they do. And universally, to understand more about the motivations of other sentient beings is to also adopt more of the motivations.

4. We can’t actually have all the subjective experiences, but we can approximate the norms and behaviour they would lead to if aggregated into a vector which maximizes the self-interest of every subject.

It’s unclear whether the epistemic difference can be overcome with better tools once we understand the mind a bit better. In Mindmelding, William Hirstein suggests some ways in which our access to others’ subjective states could be improved with future neuroscience, and explores the philosophical implications of such a private-to-public transformation. Faithfully experiencing the mental states of others will probably indefinitely remain absolutely tedious, though – short of a full long-term meld where brains could eventually come to share the same representations directly, brain-to-brain communication is still bound to symbols and interpretations due to the differences in how separately developed brains must represent their content.

Considering the difficulty of deeply understanding even one other subject, a human mind could hardly be connected to, or even plausibly able to process, the content of every mind in existence. (Willow in Buffy the Vampire Slayer season 6 disagrees, and draws an understandable conclusion.) But if an entity actually understood all the existing subjective states, its natural self-interest would be the sum of the self-interests of every sentient being. So, if you wanted to act on the basis of the information you could get by running such an idealized empathetic simulation, you could always approximate this general will by models of internal states based on public data, which often do get you actionable results.

I’m not going to describe my current understanding of this vector in any detail at all, but I want to emphasize that 1) we are nowhere near a good, sufficient approximation of it, 2) we are still way nearer than we act like we are, and 3) getting more knowledge about the general will is itself a large part of what it must dictate in a situation like this, in which it is still largely unclear to agents that theoretically would heed it if only they understood it. There are already many ways in which we can, with some uncertainty but a sufficiently high probability, move the world slightly closer to its optimal state. We know that in evolution and in psychology, the bad tends to be stronger than the good, and the motivations to stop a specific form of suffering are typically both more intense and less easily modifiable than the motivations to gain a specific pleasure – which helps us weigh our efforts. We lack a moral theory of everything, but can easily identify some promising directions for future research, most of which will be solid and empirical.

In addition to improving our capacity for genuine empathy, what else would be included in this research project? Even the basic building blocks of the general will are poorly understood: self-interest is not limited to explicit and acknowledged preferences, and may sometimes have contradictory components. Even if a being has an explicit preference to do something that causes them distress, they act against their self-interest if they follow this preference and are perpetually in a state they don’t want to be in. A large part of knowing what is right will be about first reconciling inconsistencies in the self-interests of individuals, which of course is even harder in the case of beings with no verbal and minimal conceptual abilities. Perhaps self-interest will always remain a slightly fuzzy concept, despite our best attempts to coherentize it – but this doesn’t mean that it’s meaningless, or that we couldn’t do a better job by orders of magnitude with realistically achievable future research. So for better or worse, moral knowledge is now tied to the complex but ultimately tangible progress we can make in understanding fields like psychology, biology, game theory, and economics, as well as some more specific interdisciplinary or philosophical questions, such as the most sensible way to think about personal identity and the subjective cost of modifying our preferences.

5. Even if we can’t know or understand a truth that would influence our behaviour, as long as we can know or understand how it would influence us, we should act according to this influence if we want to avoid basing our behaviour on willful delusion.

It seems quite clear that even when you can’t know or understand a fact that would influence your behaviour, knowing more about the way it would influence you by information from other, indirect sources is a rational basis for just acting according to the influence anyway. Of course it would be nice to also get to have the correct beliefs, but this is not always possible, nor necessary to avoid delusional and poorly informed behaviour which arguably is the actual problem.

Consider a situation in which a friend of yours is responsible for inspecting the hygiene and safety of a new cafeteria, but has, say, signed a temporary agreement to not leak any specifics about their investigation until it’s properly finished. You have no access to direct knowledge about what the kitchen is like or what the hygiene practices are, but when you mention to your friend that you are planning to visit the place tomorrow, they gasp and say: “Trust me, you super duper do NOT want to eat there.” Ignoring this advice because you don’t understand or have access to the specific reasons, and just going to the cafeteria hoping for a pleasant and safe lunch experience, would be delusional. You have a functional approximation of the reality of this cafeteria. You can also have a functional approximation of reality related to first-person facts.

6. To avoid basing our behaviour on willful delusion, we should base it on our best and honest assessment of the self-interest of every conscious being.

This is the conclusion, which still leaves us with the option of just intentionally choosing to act on a false worldview. Maybe acting on a false worldview is good. The whole structure does sound uncomfortable and somehow wrong: how could it return impartial care towards every sentient being even if you have your own initial values that don’t resemble such concerns at all? (Wow, uh, riddle me that.)

Maybe I simply don’t care about knowing more about the subjective experiences of others. I could claim that the relevant aspects of these events are limited to their third-person effects, and that I have no reason to acknowledge their first-person nature because it is unimportant to me, like learning more about some obscure tropical plants is. The problem with this is that being opposed to learning more about them is not the same as merely finding the information irrelevant, and would at best make me sound like I need to explicitly deny the existence of some obscure but perfectly real tropical plants in order to maintain my worldview. That would be unhinged.

Still, a normative weak point in this theory is establishing the desirability of acting on a better understanding of first-person reality in the first place. Choosing true beliefs was initially encouraged on the basis that behaviour based on delusions won’t get you to your goals – but now these goals are suddenly not good enough anyway! I admit that this was a bit underhanded, and that if you have a set of personal values that is incompatible with protecting every consciousness, understanding instead of denying the reality of other motivations is indeed more likely to prevent you from reaching your initial goals. So this is fair. The claim I am making is simply that failures of genuine empathy are failures to understand the world, whereas successes of genuine empathy necessarily bring you closer to considering the entire space of motivations that real sentient beings have when you choose your norms and behaviour.

There are a number of consequences and other questions left open by this approach, many of which are related to classic arguments against utilitarianism and pretty easily defeatable. For instance, the seemingly excessive demandingness that simple utilitarianism is often criticized for also applies to this system, but the same counterarguments work in this case too. No one can act in a perfectly impartially considerate way even most of the time in their personal life, so aren’t we just facing a demotivating choice between acting 99.6% deluded and acting 99.4% deluded? Of course not, since a human cannot function very well in the long run unless they direct a lot of resources to seemingly irrelevant fluff that increases their personal wellbeing, which means that it is in accordance with the general will. It’s very hard to find the actually optimal amount of seemingly irrelevant fluff that actually maximizes your impact, and it’s true that few of us would ever commit even to this since it would still undoubtedly require substantial sacrifice, but 37.4% delusion already looks a lot more appealing than 99.6% delusion. Suggestions that this view leads to an unacceptable tyranny of the majority situation, or that the existence of past or future preferences poses a significant problem to it, also seem to be pretty easy to dissolve.

I’m sure there are some reasonable objections that question the concept of knowledge I use here, or the role of subjective experiences in understanding reality. So far I haven’t thought of a convincing one, though, so while I’m not quite, uh, deluded enough to claim this is the definitive theory of metaethics and moral progress, I think this is close to what a satisfactory solution could look like. Not caring about the self-interest of others is only possible by denying reality, and that is as close as we can get from “is” to “ought”: the only unclear “ought” left is why we ought to know more about what is true. Any account of moral truths would leave us in the same situation, with the escape hatch of just insistently denying them, just like we can deny empirical third-person truths. But that doesn’t mean they aren’t true.


What does the trajectory of AGI safety look like?


No one has a convincing estimate for when the first artificial general intelligence that effectively matches human cognition will be developed, and whether or how quickly such an entity could be improved or allowed to improve into an optimizer that humans can no longer control, modify, or generally prevent from satisfying its potentially strange initial goals. Due to the massive gap between the narrow AIs that now beat humans in a steadily increasing number of cognitive environments (though mostly using data-hungry learning approaches that quite fundamentally seem to handle the complexity and variability of reasonable real world behaviour poorly) and general intelligences capable of meaningfully interpreting and influencing the world, I don’t think we would be justified in expecting a defining breakthrough literally any day now. Still, many machine learning experts do give estimates in the ballpark of a few decades when asked when they expect the first AGIs to reach human-level abilities in the relevant cognitive dimensions. What’s more, the inconsistencies in their predictions (such as huge differences depending on how a given question is phrased) show that their responses are probably based more on vague intuitions than on reliable evidence. So, the level of risk awareness and safety precautions in the currently active AGI research and development groups could already define a great deal of the trajectory these events will follow.

With this in mind, it’s surprising how little attention is being paid to the state of safety measures in ongoing projects. It’s true that Google’s DeepMind, the apparent leader in the field right now, has expressed concerns about the possibility of unpredictable accidents resulting from advanced machine learning, and also published some research on the matter. OpenAI, the other major team drawing ample media attention to their achievements lately, has also always been explicitly concerned about the large-scale disasters a misaligned AGI could bring about. However, there are many other groups as well that actively seek to build a general intelligence. These projects are typically smaller, but they are numerous and cover a pretty wide variety of different approaches – some of which are still underexplored – so it is not wholly implausible that one of them will eventually still get to a decisive milestone first. The likelihood of this is further increased when detailed research results of the trailblazer projects are available to other groups, lowering the cost of entering and keeping up with the game. Also, even explicit safety concerns can be insufficient, and it’s crucial that work on safety will be carried out continuously alongside new advances in AI.

The good news is, Seth Baum from the Global Catastrophic Risk Institute recently announced a survey aiming to characterize all ongoing AGI R&D projects, assessing them based on a variety of attributes directly or indirectly related to safety, such as size, nationality, stated goals, and explicit interest in AGI risk. The classification is based on publicly available data from a multitude of sources and includes 45 active projects mostly in academic and corporate institutions, 23 of which are located in the US. (Due to the massive amount of deep learning research going on right now, the survey didn’t take into account DL projects with no explicit intentions to develop a general intelligence, even though it has been suggested that powerful DL algorithms initially designed for narrower purposes could suffice to form an effective AGI architecture.)

The working paper is available here; my summary of it here will not be very detailed, since the results are presented very concisely in the paper itself (pages 17-31). Baum’s paper mostly sticks to explicitly stated information and is thus pretty light on interpretations, so in this post, I’m going to use it as a primary source and pointer, but spend most of my time expanding on its findings using other material or (hopefully not entirely unfounded) guesswork. I will focus on two key questions I think will have the largest impact: the competitiveness of the research and development process, and the interest the groups have in ensuring their product is safe, i.e. value-aligned with or controlled by human developers. Due to the high uncertainty involved in AGI development and all the numerous features of different research cultures that might influence its safety, I definitely don’t expect to be correct about everything here; I’m just bringing my knife to the generally confused gunfight that is AGI timeline prediction.

1. How competitive will the AGI race become?

A major fear in the AGI safety community is that security concerns will be prematurely buried as soon as the exceptionally powerful first-mover advantage in this area becomes clearer to research groups with competing interests, especially when it starts to look like the defining breakthroughs are right around the corner. As Baum points out, a competitive environment coupled with low trust strengthens basic game-theoretical concerns about each group maximizing their own progress while worsening the expected outcomes of everyone. Instead of shutting down and thoroughly revising every project that shows slightly worrying behaviour, even a very cautious group in an intensely competitive situation needs to consider whether a more careless project is going to get ahead of them if they do slow down, i.e. whether they should keep going even in the face of some uncertainty in order to prevent an AI with potentially greater flaws from gaining a decisive advantage. Competition is also part of what makes just putting a strict limit on an AGI’s influence such an inadequate strategy: in a world where superintelligences become feasible, a singleton sovereign could well be the only stable outcome, and a limited, probably friendly AGI won’t become one if a riskier but clearly more effective candidate is available.
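
The basic game-theoretic worry described above can be sketched as a two-group toy game. This is only an illustration with payoff numbers I made up (they are not from Baum’s paper or any other source), and the names PAYOFFS, best_response and nash_equilibria are hypothetical helpers: each group picks “careful” or “rush”, and the code checks which combination of choices is stable.

```python
from itertools import product

# Hypothetical payoffs (higher is better) for two AGI groups, A and B.
# Keys are (A's choice, B's choice); values are (A's payoff, B's payoff).
# The numbers are invented purely to illustrate the structure of the dilemma.
PAYOFFS = {
    ("careful", "careful"): (3, 3),
    ("careful", "rush"): (0, 4),
    ("rush", "careful"): (4, 0),
    ("rush", "rush"): (1, 1),
}
STRATEGIES = ("careful", "rush")


def best_response(opponent_choice, player):
    """Return the strategy that maximizes `player`'s payoff (0 = A, 1 = B)
    given the other group's choice."""
    def payoff(own):
        profile = (own, opponent_choice) if player == 0 else (opponent_choice, own)
        return PAYOFFS[profile][player]
    return max(STRATEGIES, key=payoff)


def nash_equilibria():
    """List the profiles where each group's choice is a best response to the other's."""
    return [
        (a, b)
        for a, b in product(STRATEGIES, repeat=2)
        if best_response(b, 0) == a and best_response(a, 1) == b
    ]


equilibria = nash_equilibria()
print("Stable outcomes:", equilibria)
print("Payoffs there:", PAYOFFS[("rush", "rush")],
      "versus mutual caution:", PAYOFFS[("careful", "careful")])
```

With these made-up numbers, rushing is each group’s best response whatever the other does, so the only stable outcome is the one where both rush, even though both would be better off if both were careful. Real development races involve more groups, uncertainty and repeated interaction, so this captures only the bare shape of the concern.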

The overall competitiveness of the development process is determined by a myriad of factors – a few candidates (such as the number of groups, their relative capabilities, and the information they have about other projects) have previously been identified by Armstrong et al. in a very simple game-theoretic model, but like everything related to social psychology, the factors are probably hard to identify and interact in unknown ways. As a rule of thumb, it should nevertheless be better to have fewer serious projects going on, ideally with similar values and dissimilar capabilities, which makes it relatively cheaper for each group to invest in safety precautions. Armstrong’s model also suggests that it is always safer that no group has information about the others; as the authors note, in the actual world this is complicated by the need to establish trust and safety collaboration between different groups.

1.1. How many serious competitors will there be?

While the number of ongoing projects found by Baum is pretty high, most of them are small or medium-sized with three exceptions: DeepMind (the Google group behind AlphaGo and more recently AlphaZero), OpenAI (the project launched by Musk and Altman explicitly to mitigate the catastrophic societal risks of AGI) and the Human Brain Project (a major EU project aiming among other things to simulate the human brain). As a vague definition, a serious competitor is one that influences the behaviour of other groups that roughly understand its work or is seen by most of them as a relevant rival; everyone would describe DeepMind as influential in this sense, probably not yet pay a lot of attention to most of the medium-sized projects going on right now, and definitely not lose sleep over my chatbot even if I went around telling everyone I’m absolutely trying to build an AGI for reals (but maybe if I make one that quotes Gödel, Escher, Bach…).

Baum’s methodology uses group size as a proxy for capability. The task of going through each project and rating their actual potential would indeed be considerable and yield wildly uncertain results even if done by someone more immersed in machine learning than me, but there are some general impressions I have that feel a bit more accurate than relying solely on project size. It is obvious that DeepMind and OpenAI, especially the former, have the most impressive track records in solving recent machine learning challenges that the general public finds interesting. Whether or not this will in itself mean something substantial on the path to genuine AGI, due to their high profile they also seem like the most attractive career options for talented researchers. I’m assuming that progress in this field is limited more by exceptional talent than other resources, so a positive feedback loop in their future capability is likely.

But which other projects could challenge them? An obvious suspect is the Human Brain Project, the third project categorized as large. Despite its size and considerable resources, though, the HBP has other research interests apart from brain simulation, and seems to have shifted its focus to better neuroinformatics tools after a pretty rough start featuring heavy external and internal criticism and no convincing plans for reaching their initial goal of a simulated intelligent whole brain. I’m also skeptical in general about the potential of brain simulations as a road to AGI. I could not find an unambiguous definition of what the project counts as a human brain sim, but the term has often referred to low-level modelling of neural cells or even subcellular processes functionally interacting with each other roughly as their biological counterparts do, which can plausibly be expected to bring about intelligent behaviour even with no clear directions for how its higher-level cognition is supposed to work. (This was also the initial approach of HBP’s sister project and predecessor, the Blue Brain Project, best known for modelling the connectome of a single rat neocortical column on an IBM Blue Gene supercomputer in 2006. I don’t know what said neocortical column is up to these days, so I’m guessing significant advances haven’t followed yet.)

Such a cell-level simulation encompassing a whole brain dreamed of by many a philosopher of mind would be massively too costly for us to realize, and indefinitely remain generally wasteful due to translation costs between different substrates, even if we knew how individual neurons work and interact with each other well enough to try (check out the research on C. elegans neuron-level sims, and remember that this is a system of a mere 302 neurons). On the other hand, to model the interactions of larger brain areas in less resource-intensive and more modifiable abstractions, you need a lot more than we currently have in terms of detailed and accurate knowledge about these interactions, if you want to replicate a brain’s cognitive functions and not just build entirely toothless approximations. In neurobiology, especially as it relates to cognition, we don’t really have precise controls of different variables we can mess with until we understand the system in some neat non-chaotic sense we could build something solid on. We have a replication crisis just like everyone else and ten thousand well-funded headaches.

There are several other projects that also approach the issue mostly through low-level modelling of human cognition, and my suspicions about inefficiency when compared to architectures that scrap human cognition almost entirely extend to them as well. The gains in AI capability are likely to be overwhelmingly based on algorithmic redesign instead of replicating human biology with increasing processing power: despite their value as basic research tools for cognition and neurophysiology, brain sims loyal to biology have next to no track record or evident medium-term promise in producing intelligent behaviour, while inhuman algorithms elsewhere show faster progress in narrow but still considerable tasks and maybe even plausible paths to increasing generality. If machine learning approaches fail to generalize and accommodate more complex motivational systems in a couple of decades with no solutions in sight, the slow and steady advance of systems based on neuroscience may become relevant again, but I would not bet on them right now or count them as serious competitors to inhuman machine learning when it comes to AGI.

In addition to the strictly neuro-based projects, there are some smaller groups I would quite confidently dismiss, as well as some groups that have been doing their thing for a couple or even several decades based on one cog-sci theory or another but with slim (or clearly only narrowly intelligent) results and few apparent ways to improve. Most groups don’t go into a lot of detail about their research in public, though, and it can definitely not be ruled out e.g. that some old established projects with sufficiently functional theoretical frameworks will be able to benefit from modern machine learning in surprising ways. This, combined with the rest of the major tech corporations also investing in AGI lately, makes it reasonable to expect at least a couple of the current projects to become serious competitors to DeepMind and OpenAI.

What’s more, unless AGI projects are remarkably short-lived, their number is also steadily rising at the moment: out of the 45 active projects, 5 launched in 2017, 4 in 2016, and 5 in 2015. So, roughly a third of the existing projects have started during the past three years, and by the end of this year we may have five or so projects more. While not all of them will be capable, of course, the chance of another DeepMind grows quite quickly if there is no reason to assume that the trend will die down in the near future. The initial resource limitations of smaller, competent teams with interesting approaches may also be overcome if larger companies seek to take them under their wings, which is what happened to DeepMind and could easily happen to other similarly prominent groups. (Facebook was also interested in buying DeepMind shortly before it was acquired by Google – maybe they luck out next time. Gulping down young but exceptional groups sounds like an effective way to acquire more talent, which makes it easier for the demonstrably best projects to get the resources they need soon enough.)

Another factor that could lead to a larger future number of serious competitors is the fact that safety precautions are again bound to slow down a group’s progress, so smaller projects could see the costly risk avoidance shown by the leading groups as a miscalculation and a weakness exploitable for them to get in the game anyway. Some talented researchers restricted by safety concerns they happen to deem unreasonable may choose to search for a more liberal research environment, which the market is likely to eventually provide them. Other prestigious corporations could probably match the attractiveness of the two current leaders when it comes to recruiting exceptional researchers, and also top it by promising less of that avoiding-planetary-disasters jazz and more exciting opportunities once promising paths to AGI start to become clearer. Microsoft, Baidu, and Facebook all have projects devoted to AGI that currently seem to pay no specific attention to control and value alignment, which they could start using as a selling point to recruit researchers who lean towards skepticism on rogue superintelligences. So could smaller projects, of course – while the capacity and resources they provide might on their own mostly not be interesting enough to attract tons of talent, building an exceptionally free, risk-seeking culture could conceivably be. Please don’t actually do this.

1.2. How motivated will the groups be to compete?

Baum notes that many of the projects have convergent values: goals that are best described as humanitarian or intellectual are clearly dominant among the stated missions. This consensus of benevolent goals is nice but partially illusory – it’s not like most institutes can costlessly just say they don’t know or aren’t actually super interested in what their inventions can be used for. Since the aim of the survey is to collect public, explicitly stated information, Baum takes these statements at face value in his coding, but notes that they may not reflect the actual missions of these projects. For instance, the largely humanitarian image of corporate projects could simply be a reflection of what motivations are seen as acceptable in the eyes of the general public, so little more than a marketing gimmick. I agree with this caveat, and in fact worry that it is necessarily underemphasized in this paper due to its inherently trusting methodology.

Another thing I’m wondering about is how much such roughly convergent goals would ultimately even reduce competition, or in other words, how much of a group’s motivation to compete is actually rooted in their high-level mission. Some of it certainly is, especially since it plays a part in selecting the people who end up working on a given project, but there are still going to be layers upon layers of personal motivations under the endorsed values of any institute. Most saliently, only the winners will end up with the considerable amounts of fame, fortune, and all that sweet going down in history stuff available to whichever team ends up creating the first AGI, at least for the two weeks we have after that before it’s weird runaway optimization process swallowing everything we know o’clock. So the idea that similar utility functions reduce competition is obviously legitimate, but even if we assume that every group is being honest, mostly ignoring obvious implicit goals like monetary profit, and agreeing on what things like “improving the world” look like, I would not interpret agreement about group-level goals as identical or even closely related utility functions.

A better sign is the interconnectedness of different groups, which means that the social gains involved can be distributed a bit more evenly across the community independently of which group eventually makes it. Baum notes that many organizations all around the world share contributors, advisors, or parent organizations, which both reduces competition and makes it easier to increase attention to risks, as safety awareness can spread more easily in a tighter network.

Of course, collaboration can also accelerate the development itself by reducing redundant work and putting every group in a better position to assess what the good ideas are. This is worrying, but still probably better than acceleration resulting from a more competitive situation, particularly if the collaboration happens in a context that emphasizes transparency for the sake of safety. This may just be a tradeoff we have to accept. For instance, it looks like Apple – apparently lagging behind in AI development and talent acquisition due to its secretive culture – was sufficiently attracted by greater cooperation opportunities last year to join the Partnership on AI, a consortium founded by rivalling Western tech giants to ensure that AI has positive societal outcomes. While the Partnership is mostly focused on more predictable concerns related to narrow AIs, such as consumer privacy, it also includes groups such as Oxford’s Future of Humanity Institute, the home of academic AGI doom ‘n’ gloom. The Partnership hasn’t been very active so far, so I guess we will have to wait and see what this amounts to, but the main takeaway is simply that the need to collaborate will also make it easier to coordinate responsible research.

In sum, the situation right now seems somewhat less competitive than what I expected before reading the paper, and there are some promising avenues for increasing safety collaboration: as we can see in the case of Apple and the Partnership on AI, there may be ways for the leading groups to modify the incentive landscape for the rest of the projects in a less competitive direction. Still, it would be naive to expect a perfectly nice collaborative environment to prevail indefinitely on its own as we get closer to actual AGIs. Perhaps isolation in comparison to collaboration is expensive right now when there’s still a huge and unknown amount of foundational work to do and dead ends to check out, but eventually, when the goals get more specific and getting ahead of everyone else will be less likely to be just a temporary victory, the apparent relative payoff of secrecy and competitiveness will grow. I don’t know how many people in the AI safety community are primarily working on finding ways to influence these indirect factors in AGI development, and perhaps it isn’t the best use of resources right now, but it is probably more feasible to prevent the formation of a hypercompetitive culture than to tear one down once it’s already being established.

1.3. Military connections

Considering the geopolitical implications of a strong military AGI, Baum also looks at the military connections of the existing R&D groups. I’ll add some comments about this issue in this section, because in addition to their inherently, uh, hostile nature, military projects seem pretty likely to become intensely competitive if two or more rivalling nations decide to invest in them. A Space Race style era of overt competition and fast-paced technological innovation could prove catastrophic if the aim is a massively effective general intelligence. Such an environment would also generally be volatile and increase the likelihood of a fast takeoff instead of gradual, controlled development, and also incentivize equipping the product with a great deal of concrete power as soon as possible. It seems very likely that the culture in military R&D groups would also be dangerously disconnected from the rest of the field with its social connections and potential for collaboration on safety.

The survey identified nine (mostly academic) projects with military funding, eight of which are located in the US. Only four explicitly had no such connections, and 32 projects were listed as unspecified. Baum acknowledges the possibility of covert projects, but finds no direct reasons to worry about the issue at the moment:

“– the modest nature of the military connections of projects identified in this survey suggests that there may be no major military AGI projects at this time — the projects identified with military connections are generally small and focused on mundane (by military standards) tactical issues, not grand ambitions of global conquest. This makes it likely that there are not any more ambitious secret military AGI projects at this time.”

From what I can tell, the discussion about AI safety in military contexts right now is also mostly about narrow AIs in control of autonomous weapon systems or other pretty specific tasks, which is reasonable considering that these systems are right around the corner even with existing technology, their worst-case consequences could also be practically irreversible, and slowing down narrow AI development in the military could also push military AGIs further away into the future. However, while the harms from autonomous weapons and the dramatically reduced cost of violence they bring about could be immense both in international conflicts and domestic control, these scenarios could still more typically be survivable for human values and allow for the eventual recovery of the afflicted societies. This may not be the case when we’re talking about superintelligences, so while they are certainly a more speculative threat than killer drones, military AGIs are a distinct hazard that needs explicit attention.

While Baum’s scarce evidence of serious military projects is reassuring, there are some reasons to believe that this is going to change in the future. Interest in increasingly sophisticated military AIs – which obviously is growing among all major world powers – is likely to gradually shift towards more generally intelligent systems because that’s a rational and appealing trajectory once such systems start to seem possible at all. For example, this quite recent CNAS report based on translations of Chinese documents mentions some explicit “singularity” thinking in the Chinese military (PLA), on top of the already very general-sounding tasks their recently boosted AI strategy aims for. The report is also concerned about PLA’s willingness to relinquish control of their AIs to gain an advantage:

“– the PLA’s speculation on the potential of a singularity in warfare does raise the question of whether the U.S. emphasis on human intuition and ingenuity might be appropriate for the immediate future but perhaps infeasible for aspects of future warfare that may occur at machine speed. There inevitably will be contexts in which keeping a human fully in the loop becomes a liability, and the type and degree of ‘meaningful’ or supervisory human control that is feasible or appropriate will remain a critical issue.”

Of course, interest does not yet mean serious effort, a coherent strategy, or quickly scaling capability. The attractiveness of such military projects to a sufficient pool of talented researchers is unclear to me, for one, and talk is cheap. But other world powers eventually escalating with similar plans of increasingly general military intelligences is a quite realistic scenario, and it could be the beginning of a death spiral I’m not looking forward to, particularly in light of how much military research has historically accelerated other technologies.

(For obvious infohazard reasons I considered not mentioning the report above, but I expect this blog to reach a vastly larger number of people in AGI safety than people relevant to military activity. The latter are also likely to get the information soon enough from other sources since the report is in no way obscure, so the value of slightly increasing the knowledge the safety community can work with is probably higher than that of ignoring it. Don’t tell anyone else though unless they know the secret AGI safety handshake.)

It has previously been suggested that military projects could in fact be safer than a larger number of private organizations working on AGI with potentially shoddy research policies. However, coordinating safe research between the non-military groups we’re now looking at seems more feasible than between hypothetical rivalling government projects, so on balance I suspect that military AGI should indeed be discouraged if possible. Not knowing that much about the history of warfare, though, I don’t have any interesting ideas about this apart from the usual platitudes of world peace being, by and large, a good idea.

How similar is this problem to the threat of nuclear weapons? There are some analogies between nuclear warfare and a hastily developed military superintelligence, such as the irreversible horribleness of the worst-case scenarios involved and the difficulty of international regulation. However, AGIs may be harder to control even during development and testing, so even a successful ban just on using them isn’t going to be sufficient, and they also have the potential to directly and massively improve the global future (unlike nuclear weapons, arguably), which means that regulation attempts will likely become even more complicated, i.e. ineffective, than nuclear disarmament plans have historically been. Effective countermeasures will perhaps involve indirect means like getting more cultures and researchers to personally understand the hazards, if we have to accept that international military research policies have a questionable track record and may be even less applicable to this problem than to other research. If we assume exceptional talent is scarce, it could also counterintuitively be good to have some soft competition between corporations, as this would encourage them to attract more researchers and so leave government projects with less talent.

In conclusion, military AGIs make a lot of sense, are built following different incentives and cultures than the rest of AGI R&D, have recently been hinted at in practice, will probably be funded quite well, are at most as easy to regulate as the last doomsday invention we totally only survived because of anthropic selection, and form one of the worst race-to-the-bottom scenarios we can imagine. Just because they don’t super exist yet doesn’t mean today isn’t a perfect day to start vaguely worrying about them.

2. How much do current AGI projects care about safety?

Probably the most important question explored in the paper is the extent to which ongoing projects are concerned about the risks associated with poorly controlled or misaligned AGI. Low competitiveness won’t help that much if the winner or winners are still not interested in safety precautions, after all. (Though a longer development process makes even disinterested groups more likely to become interested in safety issues, as they have more time to notice and analyze the ways in which things go wrong, of course.) Baum categorizes each project as active, moderately active, dismissive, or unspecified concerning safety: it turns out that the vast majority of groups have not expressed any concerns about major risks an inhumanly capable AGI might involve.

The good news is that, as mentioned above, both of the groups that seem like the strongest candidates to lead AGI development are clearly interested in the safety of their products – particularly OpenAI, of course, which was explicitly founded as a response to these worries. In this section, I’ll start out with a quick look at the concerns these groups have, and then outline the safety concerns among the rest of the projects, trying to figure out what the interested groups have in common and how successful the AGI safety community has been so far at getting their message through.

2.1. What do the safety concerns of the leading R&D groups look like?

While most people agree that increasingly general artificial intelligence will change things dramatically and potentially for the worse, what they typically think of is not a technical malfunction in reward optimizing combined with capability gains, but things like sophisticated citizen surveillance bringing about a dystopian society, automation rendering humans purposeless, or intelligent systems otherwise diminishing the meaning and quality of the average human life in unpredictable ways because the people using them are careless or simply have bad intentions. In addition to distinguishing between how near-term narrow AIs might change our lives and how a superintelligent AGI might, we should also separate the hazards stemming from people using a superintelligence in harmful ways from the risk of technical control failures. An uncontrolled superintelligence is not a special case of AGIs facilitating short-sighted or selfish societal planning; it’s as close to that as a devastating natural disaster is. The methods and the people equipped to handle these problems are so different that it doesn’t make a lot of sense to blur them into the same conversation, as the results from social science and ethics will have next to no bearing on the control problems, and vice versa. Both of them include disastrous and pretty much irreversible scenarios that require active attention, however.

Like those of the general public, the overwhelming majority of safety concerns expressed by the research groups fall in the social ethics category. The asymmetrical concern is very understandable – we have plenty of examples of smaller technological innovations unpredictably changing society because people choose to use them in malicious or weird ways, but no strong examples of harm caused by uncontrolled technology that itself resembles an agent and not just a tool. Control issues seem like an outlandish concern to most people, which might also cause researchers to be less worried about superintelligent agents at least in public, and instead demonstrate the need for careful progress through less theoretical examples related to societal changes (even if they personally are worried about technical accidents as well).

Perhaps exemplifying this, the ethics and security board demanded by DeepMind as a part of its deal with Google is rumoured to be quite focused on technical hazards, though notoriously secretive – while the recently founded, more public-facing separate ethics team, DeepMind Ethics & Society, is true to its name and works on the social and economic changes related to DeepMind’s research. (They do also mention control problems, collaborate with Nick Bostrom, etc.; it’s just clearly not their main interest.) Assuming that whatever the internal board does is indeed mostly focused on the technical risks, a dual strategy where societal issues are discussed out in the open while technical safety matters remain internal seems like a pretty good approach to me, especially if there still is collaboration on technical safety with other groups.

OpenAI’s idea was initially very different from this. Like their name implies, the team started out with the aim of publishing most of their research in order to democratize AGI development, and so prevent it from optimizing for values that only a few people would agree with. This garnered criticism from Bostrom and other safety researchers, who essentially pointed out that universal AGI development chances are less like voting and more like everyone having access to nuclear weapons. They still have a lot of resources publicly available, but apparently they’re now reassessing the optimal levels of openness – what this means is not clear yet, but it is at least good to see they’re willing to reconsider potentially harmful plans based on external feedback. 

As far as existing work goes, both DeepMind and OpenAI have recently published articles on technical safety as a concern distinct from societal issues like fairness and privacy: for instance, their joint paper Deep Reinforcement Learning from Human Preferences discusses relating complex goals to RL agents by involving a human in the learning process, Safely Interruptible Agents by DeepMind and MIRI/FHI develops the theoretical foundations of preventing interruption avoidance in reinforcement learners, Concrete Problems in AI Safety features authors from Google Brain and OpenAI discussing potential accidents during learning and goal attainment, and DeepMind’s AI Safety Gridworlds delves further into practical examples of properties needed to ensure the safety of intelligent agents.

What is the significance of these publications? We don’t know how much work on safety is enough, but I’m pretty sure it’s going to be more than a few papers of similar impact per year. The low ratio of AI safety research to innovative AI publications doesn’t match the magnitude of the risk. But I think there are reasons these papers warrant some optimism, and they are mostly indirect: they show us and everyone else working on AGI that top-tier researchers view the hazards as an acceptable thing to address instead of dismissing them in the face of their apparent absurdity. This is crucial, since to competent observers working with increasingly dangerous AIs, the need for thorough safety precautions should only become clearer with time as researchers get to observe all the concrete but counterintuitive ways in which their not-yet-general inventions keep going wrong. Right now it may sound laughable even to a somewhat cautious researcher that an AGI could really, actually, say, pretend to share our values until it has embedded itself into the most crucial components of our global infrastructure and only then let us realize that its terminal goal was dat sweet paperclip lightcone all along. But after observing AIs that learn similarly deceptive behaviour in complex game environments, when they aren’t yet quite clever enough to get away with it, even a skeptical researcher would probably reconsider their default optimism – as long as they aren’t too prejudiced or deafened by the public opinion to notice what’s going on. (I’m guessing this would also be a required component in making the general public take AGI safety seriously: concrete examples of stuff like reward hacking and treacherous turns in increasingly complex environments are almost certainly going to be more convincing than even perfectly good abstract arguments with little in terms of empirical salience (more on this below).)

2.2. How interested in AI safety are research groups in general, and how well has the AI safety movement fared?

Considering that AI worries of the LessWrong flavour have been around as an increasingly coherent movement for a couple of decades, it is a bit disheartening to see how few of the research groups have addressed them at all: 15 out of 45 projects, three of which only moderately. In addition to the clusters identified by Baum (academic projects with intellectualist goals inactive on safety, corporate projects with humanitarian goals active on safety), I tried to find some traits that could help predict a group’s stance. For instance, newer projects seem to be a bit more active: out of the 24 projects started since 2010, more than half were listed as interested in safety.* Maybe some of the recent projects, just like OpenAI, have actually been launched because of the increasing concern about AI risk. In the general case, i.e. unless you’re Elon Musk, this is probably not a good approach due to the acceleration of competition it could cause, but I don’t expect it to be super common. Older projects are also more likely to be calcified and less reflective regarding the ultimate impact of their work, so a culture shift such as AGI safety entering the mainstream can more easily be missed, whereas new projects are more directly formed and influenced by the changed culture.

There seems to be a small difference between projects in the US and projects elsewhere. Out of the 23 US groups, only six were explicitly interested in safety measures at all (two of which only moderately) and two – Numenta, based on Jeff Hawkins’s somewhat pop-sciencey framework described in On Intelligence, and Victor, which I’m not familiar with – openly dismiss major risks, calling them “not a threat to humanity” and “crazy talk”, respectively. Out of the 22 projects outside the US, nine have specified at least some safety concerns (one only moderately), and no one has directly dismissed them. The difference is tiny for a sample size like this so it probably doesn’t mean much, but is there a plausible reason for projects in the US to be less worried about AI risk? It’s not that these groups are less likely to have heard of the problem: if anything, the majority of LW-adjacent activity and AI safety organizations are located in the US, and the projects close to them – geographically and so probably culturally – could in fact be more frequently exposed to their arguments. Perhaps the groups outside of the US are more likely to encounter them in a more academic and traditionally serious form? We already know that Bostrom’s 2014 book Superintelligence has been quite effective at making the right people worried – but maybe a reader who associates it with the historically weird aspects of LW is predisposed to dismiss it, whereas someone who hasn’t already experienced the less approachable argumentation style back in the day is more likely to give a chance to Bostrom’s thorough explanations that even carry Oxford’s prestigious logo.
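As a rough sanity check on how little this US/non-US gap can tell us, here is a minimal sketch of a two-sided Fisher’s exact test on the counts quoted above (6 of 23 US projects versus 9 of 22 projects elsewhere with at least some stated safety concern). The choice of test, the function name and the variable names are assumptions added for illustration; nothing like this appears in Baum’s paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper for illustration: rows are regions, columns are
    "some safety concern stated" vs. "none stated or dismissive".
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that k of the col1 projects fall in row 1
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Add up every table that is at least as unlikely as the observed one
    # (small relative tolerance so floating-point ties are counted)
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Counts taken from the paragraph above
us_interested, us_other = 6, 23 - 6
non_us_interested, non_us_other = 9, 22 - 9

p_value = fisher_exact_two_sided(us_interested, us_other,
                                 non_us_interested, non_us_other)
print(f"US: {us_interested / 23:.0%} interested, "
      f"elsewhere: {non_us_interested / 22:.0%} interested, "
      f"two-sided p = {p_value:.2f}")
```

With counts this small, the test comes nowhere near conventional significance thresholds, which fits the intuition that the gap is better treated as a prompt for speculation than as evidence.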

This would be consistent with how loving to criticize LessWrong’s PR strategy is a neglected human universal. I agree that the culture has probably been off-putting to many people (at least one project here is led by a person I recall getting in pretty intense fights about AI risk with LessWrongers), but as always, it’s hard to compare to counterfactuals. Maybe a community with a more agreeable presentation could have leveraged similar initial resources for a better result, but then again, maybe the controversies and notoriously strange discussion norms were necessary to attract the attention of a sufficient pool of contributors whose work the academic publications can now build on. Post-Superintelligence, however, it seems that now that the cause has the attention of the general public and can more easily publish traditionally reputable or otherwise prestigious work affiliated with serious-looking institutions, doing so is getting us the best results.

(In addition to academic cred, what else did Superintelligence have that previous material on the subject didn’t? I think one of its merits was providing examples e.g. of genuinely surprising behaviour in actual algorithms – I remember being quite fascinated by this bit on evolvable hardware. To get a better picture, I browsed the Goodreads reviews of the book, but mostly found a confusing number of complaints about how dry, boring and overly abstract it is. On the flipside, the positive reviewers appreciated the detail and precision of Bostrom’s analysis, as well as the lack of sensationalism and the modesty he hedges his arguments with.)

All in all, particularly since the most prominent groups are on board, I’m actually tempted to say the safety movement has fared pretty well considering how far-fetched its central concerns sound to the average person. Again, a disaster scenario based only on extrapolated estimates of capability trends and a bunch of other equally intangible arguments we haven’t seen at work empirically is not easy to convince people of, especially if it also matches a trope that is common but often poorly explained in fictional works. But if we assume that practically all groups have heard of the issue and some of them also believe that a disruptive superintelligence is possible in principle, why have they not addressed it? Here are some suggestions, ranked very roughly by how common I think they are:

• Confidence that the necessary precautions will be easy: many groups that have commented on the matter suggest something like this, so it’s likely that some of the groups listed as unspecified also simply believe that safety is trivial. This belief can appear in conjunction with practical plans for safety measures, or just with an assumption that since AGI is hard and still pretty far away, the researchers talented enough to build it will also be talented enough to easily implement the necessary precautions or the optimal motivational systems, even if we can’t think of them yet.

• Socially determined threat responses: as described in Yudkowsky’s recent essay, there is no obvious mechanism that makes it socially acceptable to just react to the risk in the current environment where everyone else seems so remarkably chill about it. (I’m very slightly more optimistic about this changing, since AI safety has been gaining a lot of traction lately, the largest groups are demonstrating that they don’t see the issue as crackpottery, and I believe we have more time than Yudkowsky probably does since I’m not quite as worried about the recent machine learning advances, but this is not an endorsement of inaction.)

• Intuitive humility: in particular, academic projects with no intentions to develop their product into a massively useful general intelligence (rather than just a totally neat cognitive science research tool) might not see their project as dangerous or even all that impactful apart from intellectual curiosity. They’re not trying to change the entire world with their product, at least until decades of political negotiations someone else will eventually take care of, and improvement above the human level may not be a goal these projects have in the first place. This stance neglects many important risks if the AGI still has a motivation system of some sort, which it probably will if it’s going to be used even for anything just academically interesting.

• Impact neglect: sort of related to the idea above, but more specifically, a failure to intuitively feel how irreversible or massive the disasters caused by a rogue superintelligence could be. Many people agree that even weird technical accidents are possible but place them in the same mental category as, say, small-scale international conflicts or even just regular industry accidents where a dozen people are hurt but the situation is under control again in a few hours and we can learn from it and move on. But of course, any AGI worth its salt has the foresight to play nice for a while. It seems that if we have a slowly improving human-level intelligence with goals incompatible with ours, it’s more likely to look perfectly benevolent and totally useful re world peace/perfect healthcare/post-scarcity economy until it has a definite advantage than it is to start messing with our stuff at all at any point where it can still be stopped.

• Secrecy or PR: Baum suggests that the survey might also understate the attention actually paid to safety among R&D groups, since groups in the unspecified category might just not be vocal about their concerns even though they are aware of the issues. This is possible since it makes sense for groups to want to avoid associations with dangerous scenarios, but sounds generally pretty irresponsible – an environment where each group can trust the others to pay attention to safety seems more desirable, and can only be built by active signalling of safety precautions. Taking AGI safety seriously probably entails collaboration on safety matters as well (though such collaboration could in principle happen without leaving public traces, but this sounds pretty inconvenient).

3. What does the survey miss?

Finally, one interesting question is the nature of the projects that a survey like this would miss. Baum points out that his number of active projects is a lower bound, not necessarily an accurate picture. It’s true that in addition to the incentives to misrepresent various less virtuous-sounding goals in order to attract funding, talent, and goodwill, there are also many conceivable reasons for a group to not reveal anything to the public about their project’s existence at all. Many of these reasons imply goals that people generally would be reluctant to accept, such as disagreeable corporate practices or hostile purposes. Also, secrecy is again to some extent in conflict with sufficient safety precautions, which usually include engaging with safety researchers, trying to influence the field in general in the direction of responsible research, and explicitly fostering a noncompetitive culture of trust and collaboration. Hidden R&D prioritizes other things, typically just a competitive advantage, at the expense of basic cooperative values.

But could there also be projects that hide their existence for altruistic reasons? This is possible, I guess: again, just the knowledge of a higher number of groups could increase competition, even if no detailed information about a given group’s approach were available to others. So, maybe there are some extremely safety-conscious groups that have decided to play it safe and not even mention their existence anywhere in order to avoid adding pressure on the field. How wise this approach is depends on how well such groups can internally ensure the safety of their project: in most cases, I expect the value of collaborating on safety research to exceed the value of staying quiet. This is also why I don’t think there could be many such projects. Still, out of politeness towards these commendably paranoid hypothetical groups, I’m going to stop speculating about them and just hope that they know what they’re doing.

Anyway, all things considered it seems to me that projects potentially missed by this survey would generally be less cooperative, humanitarian, and safety-conscious than the ones it found: it’s not likely that their nature is better than that of the public projects, and somewhat likely that it is worse. However, AGI is such a difficult field of research that isolated projects should be dramatically less able than public high-profile groups to attract enough talent or resources to warrant serious concerns. Considering also how few of the known groups have reacted to AGI risk with even moderate interest, there are better things to work on right now than worrying about potential hidden groups.

4. Conclusions

It looks like getting a higher percentage of R&D groups on board with safety concerns at all is a goal that is both pretty tractable right now and crucial in ensuring the long-term safety of AGI development. Outreach-type work is hard to do efficiently and indeed sometimes indistinguishable from just shoving the responsibility on someone else’s shoulders, but we’re looking at a situation where the institutes working on AGI are still relatively cooperative with surprisingly strong social and economic ties, alliances coordinating research policies are being formed, and direct AGI arms races such as major military projects may not have started yet. Still, there is a lot going on already, and a number of events likely in the near future that could trigger the interests of active R&D groups to diverge significantly and break the well connected network we’re seeing now, making it more effortful to ensure that groups care about adopting whatever security measures anyone comes up with.

Increasing the chance that safety measures are developed alongside the actual projects by the AGI groups themselves will also quite straightforwardly advance these measures faster on the lowest level, or at least make it more likely that groups notice alarming situations and pause their projects until they have consulted others. This doesn’t mean that people concerned about the risks shouldn’t also work directly on the problems of course, just that the social aspects of the issue are currently being neglected, and people with the relevant comparative advantages are unlikely to have better opportunities in the future to influence the future of AGI safety. If only a handful of all the projects globally accept the need for precautions, nothing guarantees that the direct work safety organizations currently do will ever be implemented when it is required in practice; if all the important groups do, however, the concrete work on safety will be connected to the progress of AGI research, and the higher-level principles involved will be more efficient to implement even if they’re developed by outsiders.

As a more actionable ending to my mostly pretty vague post, here are some examples of more or less obvious tasks that I suspect could be valuable based on what we now know:

• Gathering and curating a library of examples of actual AIs showing unexpected behaviour and strange solutions, such as reward hacking and tactics that rely on deceiving humans, since a growing body of empirical evidence will probably feel more salient than abstract arguments both to researchers and to the general public. Bonus points for associating it with a reputable institute, minus points for overly preachy vibes, exaggeration, etc.

• Looking more closely into why certain groups are interested in AGI risk while others aren’t, possibly contacting researchers in groups that have addressed AGI safety to figure out what convinced them. Then doing more of the things that did, if applicable. While governance of AGI research is valuable especially when international, the end results also largely depend on researchers themselves taking the risk seriously, since no policy will be able to cover every dangerous case.

• For anyone with relevant expertise in areas such as international cooperation, Chinese politics and machine learning cultures, or technology forecasting, the Future of Humanity Institute at Oxford just announced a program on AI governance and they’re looking for applicants. They say team members will have lots of freedom regarding hours, research areas, possibly even remote working opportunities.

• For one-shot ideas, GoodAI’s General AI challenge just opened a round looking for submissions on solving the AI race. It’s open until May and welcomes stuff from policy proposals to more general roadmaps or meta stuff, but with an emphasis on actionable strategies. Write a good thing and then submit the good thing; other people can learn from it.

• Making sure that organizations concerned with technical safety issues are also represented in whatever conferences and consortia we see discussing AI ethics and safety right now; while most of these will probably be pretty cosmetic, the alternative is isolation and missing out on potentially important connections which are being formed now.

• Regarding potential future military AGIs, too much explicit attention to the arms race aspect should probably be avoided and any sort of public awareness-raising campaign is almost certainly worse than ineffective, but indirect ways of slowing down the development can perhaps be found. There are activists and experts seeking to regulate various narrow AIs in warfare; supporting them might be worth it even for people who primarily care about AGI risks.

• If you’re Nick Bostrom’s parents, please call him and tell him you’re proud of him. This absolute legend is sitting on every AI safety advisory board everywhere I look, and that’s after writing the book that caused some of the most serious people in the world to take action in order to prevent the end of the world. What a dude.


*These numbers are based on Baum’s classification. I’m not sure I agree about the status of Susaro, the originally-US-currently-UK based project which was labeled active on the safety front – Susaro currently has little information on their website, but is led by Richard Loosemore, who has quite clearly dismissed concerns about catastrophic AI risk in the past. His views seem to be based on the idea that the only cognitive architectures capable of human-level intelligence or beyond will trivially be friendly or controllable due to necessary features of their motivational system design, so Susaro’s attention to safety might ultimately be meant to refer to Loosemore’s confidence that an actually smart AGI will necessarily also be smart enough to know what we want it to do, and safely just do it. Hopefully I’m wrong, though!

Baum also categorizes the Human Brain Project as not active on the safety front, since its ethics program is focused on the procedure and not the consequences of research. I did, however, find a small subproject, the Foresight Labs in the UK, focused on “identifying and evaluating the future impact of new knowledge and technologies generated by the HBP”. While not articulating what their specific worries are, the group is interested in new technologies leading to unpredictable and uncontrollable outcomes, and also mentions fears related to translating artificial intelligence research into practice (including human-level AI, though not AI exceeding human capabilities). This sounds like they could be responsive to worries about a misaligned brain sim, though currently their concern level is moderate at best.

The second sleep of ectotherms

[TL;DR: The existence and intensity of subjective consciousness in ectothermic animals probably depends heavily on their constantly fluctuating body temperature, and is therefore pretty unstable even when they are awake.]

[Epistemic status: Oh boy here we go]


Even animals that are capable of subjective experiences are not equally conscious at all times. Naively, our own consciousness may seem like a state we inhabit more or less monotonously unless we’re fast asleep or deliberately messing with it, but sometimes it’s pretty noticeable that this is not the case. The degree to which we are conscious actually varies a lot depending on what we’re doing or experiencing at a given time; keeping track of this is difficult, though, because reflecting on your own experience often returns you to a more intensely conscious state, so that you might never properly notice you were somewhere else. Some people, notably Dennett and Drescher, even suggest that subjective awareness is more or less the result of intentional self-reflection and so does not exist at all in minds that are incapable of metacognition. I think some versions of this are totally possible, just not likely enough to justify moral indifference towards beings that demonstrate no such abilities; a more plausible alternative seems to be that there really is a magnitude or loudness to whatever we are feeling at a given time, and that probing our internal experiences mostly just makes them more intense since we attend to them more closely.

The most significant changes in the intensity of our consciousness normally occur during sleep and dreaming, though. When we’re asleep, our experiences vary from full unconsciousness to states that subjectively almost resemble wakefulness: but even though the events we go through in dreams are often bizarre enough to justify immensely strong and vivid emotional reactions, dream qualia are usually less intense and emotionally salient than what we would feel if similar situations occurred while we’re awake. (Otherwise, suffering in dreams would also be a much worse moral disaster than it currently seems to be.) When our subjective experience becomes stronger, like during particularly alarming nightmares, we tend to wake up — unless it’s trained in lucid dreaming, a sleeping brain can’t sustain very intense levels of consciousness.

There are various theories about what exactly the physiological functions of sleep are, but currently it looks like it primarily facilitates enhanced glymphatic waste clearance and energy store replenishment in the brain as well as synaptic pruning and other tasks related to connectivity regulation. Because consciousness is almost certainly dependent on extensive and metabolically costly brain activity that is incompatible with these tasks, sleeping reduces it to a fraction of its normal intensity, and at times even shuts it down completely.

Temperature as a determinant of consciousness

Even though there are many other chemical and behavioural ways in which we can somewhat intentionally modify the quality and intensity of our mental state, we — as endothermic animals with uniformly tropical internal temperatures and hence pretty stable facilities for metabolism-related molecular kinetics — practically never experience the other necessary factor that should massively affect an animal’s consciousness levels: large shifts in body temperature and their effects on metabolism and neural activity.

Ectotherms are animals that rely primarily on external warmth to maintain useful levels of body heat, and therefore tend to cycle through a wide range of temperatures depending on how thermally stable their environment is. This group includes the overwhelming majority of animals on Earth, some of which are cognitively pretty sophisticated. If we accept that these animals are directed by processes that are also experienced subjectively — which I think is likely at least when it comes to most cephalopods, reptiles, amphibians, and some fishes — their subjective life must be significantly affected by temperature, perhaps similarly to how our consciousness levels are affected by cycles of sleep and wakefulness.

Even though brain activity can probably not be allowed quite as much variability as, say, digestion or growth, many findings support the idea that an ectotherm’s brain still works very differently in cold and warm environments. In some reptiles at least, optimal cognitive performance can only be seen when tests are conducted well above room temperature: when closer to 30°C, tortoises seem to show unexpected maze solving strategies, learning by example, and the formation of long-term memories (despite historically underperforming in cooler laboratory tests requiring these skills). Interestingly, the effects of temperature on ectotherm behaviour are not limited to the immediate short-term consequences of enzyme kinetics, but can also direct development and fixed long-term behaviour. For instance, honeybees reared in higher temperatures have improved short-term memory even when the initial temperature differences during their development are equalized later on; in another study, they showed an increased probability to dance, earlier onset of foraging behaviour, and increased engagement in removing dead colony members. It is sometimes unclear which changes can be attributed to adaptive developmental acclimation and which ones are best understood as simple deficiencies.
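To get a rough feel for the size of the short-term kinetic effect, here is a minimal sketch using the standard Q10 rule from physiology, where a rate is multiplied by a fixed factor for every 10°C of warming. The function name, the Q10 value of 2, and the 30°C reference point are illustrative assumptions rather than figures from the studies mentioned above, and real neural processes almost certainly don't follow one tidy coefficient.

```python
def relative_rate(temp_c, reference_c=30.0, q10=2.0):
    """Rate of a temperature-dependent process relative to its rate at reference_c.

    Uses the Q10 rule: the rate is multiplied by q10 for every 10 °C of warming.
    q10=2.0 and reference_c=30.0 are illustrative assumptions, not measured values.
    """
    return q10 ** ((temp_c - reference_c) / 10.0)


# Compare a chilly morning, room temperature, and a warm basking spot.
for temp_c in (10, 20, 30):
    print(f"{temp_c} °C: {relative_rate(temp_c):.2f}x the rate at 30 °C")
```

Under these assumptions, a process would run at about a quarter of its basking-temperature speed at 10°C and at half speed at 20°C. That only covers raw kinetics, and says nothing about the developmental effects described above.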

So, assuming phenomenal consciousness is related to cognitive processes and supervenient on neural metabolism (which obviously is pretty plausible), a reptile or other ectotherm waking up in a cool place might experience the world in a minimally conscious, dream-like state until normal body temperature is restored, or even lack subjective experience altogether if the environment is chilly enough. Furthermore, due to differences in nervous system development, even an individual’s capacity to be conscious in the first place could be permanently increased if its development takes place in the higher tolerable end of the natural temperature variation in its habitat.

Anecdotally, my tortoise (the total cutie pictured both above and below) typically reacts in what looks like a mostly reflexive manner after spending a long time in a cooler area — for instance, automatically retracting as a response to seeing a shadow or being lifted, things that normally don’t frighten it at all anymore. It mostly shows its more personal, learned, less robotic behaviours after spending a while in warmer temperatures: thoroughly inspecting and actively following interesting objects, ignoring threats that have a history of not actually causing harm, crossing obstacles and long distances in an intentional manner, and sampling objects it figures are potential new food items (a bafflingly large category, especially in a species that naturally just relies on grass and things indistinguishable from grass). This is obviously a hunch based on my individual observations and interpretations, but I find it entirely believable that the former behaviours could be conducted in a sleepwalking state with little or no conscious experience, whereas the latter category might require more advanced cognitive processes — including the ones that bring about sentience.


I’m currently pretty confident that this principle applies to most ectotherms whose behaviour is complex enough to respond to different temperatures in interesting ways, if they are significantly conscious in the first place of course. Arthropods, which comprise the overwhelming majority of animal individuals and biomass on Earth, also show major behavioural changes in things like feeding rates, mating and communication, muscle output, and sensory perception as a response to temperature changes. I’m not confident about the picture I currently have of arthropod consciousness, but in his recent report on consciousness and moral patienthood, Luke Muehlhauser gives a 10-25% personal estimate of fruit flies being conscious in a morally relevant way depending on the definition. I would give it a slightly lower but definitely not insignificant chance.

Since consciousness is likely to be necessary or at least useful for many classes of behaviour and cognition, one could intuitively expect ectotherms in colder climates to have adjusted to their environment by systems that facilitate consciousness even when their general metabolism and growth often work very slowly, and optimal body temperatures can only be entered for a few hours each day (often by purposeful basking). This may not be the case if sentience is built on or mostly serves cognitive purposes such as enhancing attention, memory, or reactions to complex stimuli. Due to the significantly smaller number and diversity of organisms in cold climates as opposed to equatorial regions, there is just a lot less going on, so in the relative absence of previously unencountered predators, extremely diverse food items, or generally high ecological complexity, conscious responses may no longer be as crucial. Maybe a common viper only experiences a couple of hours of vaguely sentient time a day when the general buzz in its surroundings also peaks, and then gets by with reflexes the rest of the time. It just took me an hour to brew my morning coffee. I super understand that life up here in the North gets a bit sluggish.

However, it’s also possible that subjectively experienced information processing really is important enough that cold-dwelling ectotherms have developed something like cognitive cold-hardiness in order to preserve whatever processes also bring about sentience. It also seems to me that, say, the muscles of any given ectothermic species perform sufficiently well in the temperatures it is adapted to, but work even faster when the temperature rises above this (probably with tradeoffs that would be suboptimal for the organism as a whole in the long run). So is it also possible that there is a similar overshoot in consciousness levels, when an animal reaches a temperature that is higher than normally optimal but that also increases some sentience-related aspects of neural activity so that the animal’s consciousness actually becomes significantly more intense than it normally is? Hopefully not. This sounds pretty wild. Nevertheless, due to how little we know about sentience and its relation to metabolism in ectotherms, I don’t think it should be ruled out immediately.

Obviously the idea of temperature-dependent consciousness and its corollaries have major implications for wild-animal suffering even in the absence of such a hypersentience mechanism. It is unclear how global warming will ultimately affect animal populations and how much suffering each trajectory can be expected to cause; but if we only look at the immediate direct effects, warmer temperatures could mean that most of the animals in the biosphere will soon spend more time being sentient, or just in more intensely conscious states, which means that the sum of global suffering could increase massively without anyone ever even noticing what’s going on.

There are no free lunches, but organic lunches are super expensive: Why the tradeoffs constraining human cognition do not limit artificial superintelligences

In this post, I argue against the brand of AI risk skepticism that is based on what we know about organic, biologically evolved intelligence and its constraints, recently promoted by Kevin Kelly on Wired and expanded by Erik Hoel in his blog. I’m not sure I agree with the worst estimates of a near-inevitable AI doom lying ahead of us (gonna sit on this increasingly uncomfortable fence for just a little longer), but I think this particular family of counterarguments seems in part to be based on confusion about which principles and findings concerning organic cognition are actually relevant to intelligence in general, or a would-be superintelligent AI in particular, and not just to artifacts rooted in our own evolutionary history.

This post assumes familiarity with the basic concepts surrounding AI risk, such as the orthogonality thesis and other issues with value alignment (no, we can’t just tell an AI what to do) as well as convergent instrumental goals (whatever your goals are, things like gaining indefinite resources, becoming more competent, ensuring your own continued existence, and resisting goal modifications are going to be necessary for reaching them). The basic idea is that once we build a useful agent with reasonably general cognitive competence and allow it to modify itself in order to become more intelligent (and so, recursively, even better at making itself more intelligent), controlling its advances and ensuring its compatibility with human existence will eventually prove difficult: a nonhuman intelligence will not share all the obvious human values we find so intuitive unless they are related to it in a foolproof manner, which is tricky until we have something like a formal, complete, and consistent solution to ethics, which we super don’t.

So once more, with feeling, let’s outline the concept we’re dealing with here. Kelly argues against a meaningful way to define intelligence altogether, so against a framework within which we could call a human smarter than a squirrel. I don’t find this position all that reassuring, for whether we want to call them higher intelligence or just different thinking styles or something, there are still very meaningful cognitive skillsets that allow agents to manipulate the actual environment around us and fulfill their potentially alien values more effectively than humans when pitted against our skillsets and values. Hoel suggests some good formal approaches to defining intelligence, such as Legg and Hutter’s definition based on the simplicity-weighted sum of the agent’s performance across all possible problems. In practice, though, we may not need to deal with such an abstract definition with lots of irrelevant dimensions and can count only the performance on problems relevant to manipulating the world, whatever those might be. So below, “cognition” usually just refers to the skillsets related to predicting and influencing our actual world more powerfully than humans as a collective are able to. We should keep in mind, though, that we don’t know very well which skillsets can be used for this in the world we currently find ourselves in – human-style thinking is definitely not the only and probably not the best cognitive structure for the job.
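For reference, Legg and Hutter’s measure is usually written along the following lines. The notation is theirs rather than anything from Kelly’s or Hoel’s posts, and the restricted set R in the second formula is only a hypothetical shorthand for “problems relevant to manipulating the world”, not something either author defines.

```latex
% Legg and Hutter's universal intelligence of an agent \pi:
% E is the set of computable environments, K(\mu) is the Kolmogorov complexity
% of environment \mu, and V_\mu^\pi is the expected total reward \pi earns in \mu.
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^{\pi}

% Hypothetical restriction to a subset R \subseteq E of world-relevant problems:
\Upsilon_R(\pi) \;=\; \sum_{\mu \in R} 2^{-K(\mu)} \, V_\mu^{\pi}
```

The weight 2^{-K(μ)} is what makes the sum “simplicity-weighted”: simpler environments count for more. Restricting the sum to R simply drops the dimensions that have no bearing on getting things done in our actual world.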

The other main component of getting stuff done is of course the ability to physically execute whatever has been concluded is the optimal thing to physically execute. Material issues could be the main limiting factor a young, would-be recursively improving intelligence runs into: efficiently acquiring, refining, and utilizing raw materials sounds like a trivial chore, but the macroscopic physical world is slow enough that expecting anything like explosive growth requires some pretty complicated postulations. But the takeoff doesn’t need to be that fast and there are viable ways around this for a benevolent-seeming and promising AI, so let’s drop this issue for now, assume an AI with access to the necessary material resources via some unspecified general villainy, and focus on the cognitive aspect the original articles also tackle.


Next, I’ll briefly concede the points that can immediately be conceded, and explain why I still don’t think they work well enough as arguments against AI risk.

1) Like Kelly says, it’s true that an agent’s potential intelligence can’t be absolute or infinite (solving every conceivable problem is indeed impossible as far as our current understanding of elementary logic let alone physics can tell). This is not required for an agent to pose a major threat to conflicting value systems with human-level defenses, however. If value alignment fails, we don’t know how competent an inhuman AI needs to be to reach existentially threatening powers we can’t comprehend well enough to route around (like the God of Go so eerily does within its narrow domain) but the list of relevant problem types that are trivial to an AI but insurmountable to us doesn’t need to grow all that long until we’re already looking at something really worrying.

2) The typical intelligence explosion scenario often features an exponential improvement curve; Kelly is probably correct in that there is little evidence that this is going to be the case, especially since hardware growth and rearrangement are presumably required for indefinite effective improvement. However, the growth rate doesn’t need to be literally exponential to pose an existential risk – with or without intentional treachery, we will still not be able to comprehend what’s going on after a while of recursive improvement, and roughly linear or irregular growth could still get faster than what we can keep track of. And since any agent that is even somewhat misaligned to our values (or uncertain about whether it is!) will try to find a way to conceal its actual competence levels as soon as it has a grasp of how its interactions with humans tend to play out until it has a decisive advantage, the eventual results could look rather explosive if not exponential to us even if the actual takeoff takes years and years instead of weeks.

3) Kelly argues that an AI would not be able to do human-style thinking as well as humans. A superintelligence would indeed not necessarily look anything like our intelligence does, and it might be that humans do human reasoning, defined in some fairly concrete and detailed sense, more efficiently than a silicon computer ever could. Kelly also suggests that singularitarians interpret Turing completeness erroneously: they are correct in that given infinite resources and time, human reasoning could be emulated on a different substrate, but mistaken in that this can be done effectively (e.g. with polynomially scaling resources) by anything other than a biological brain. Inefficiencies are indeed likely if you seek to emulate a literal human brain including all of its noise and redundancy, as emulations are always less efficient than hardware copies when you aim for bottom-level perfection. I don’t think we can confidently assume the complexity will prove insurmountable, though, as bottom-level perfection is not what we’re after.

More importantly, a superintelligence doesn’t need to do human-style thinking to be dangerous, much less start from emulating a human brain. It needs to get stuff done, and there are no theoretical or practical reasons for the relevant computations – which essentially consist of something like probabilistically and deductively extending and manipulating actionable information about the physical world, as well as recognizing something like goals and complicated practical syllogisms related to them – to be out of reach or only inefficiently computable to a silicon intelligence we intentionally build to solve real-world problems. Taking implementational details such as embodied cognition into account or otherwise strictly emulating human reasoning isn’t necessary in any way.

4) Kelly argues that humans are far from general problem-solvers, and that an AI’s thinking could not be absolutely general either, which is of course true. He then says:

“We can certainly imagine, and even invent, a Swiss-army knife type of thinking. It kind of does a bunch of things okay, but none of them very well. AIs will follow the same engineering maxim that all things made or born must follow: You cannot optimize every dimension. You can only have tradeoffs. — A big ‘do everything’ mind can’t do everything as well as those things done by specialized agents.”

But perfectly generally optimized or otherwise literally godlike competence is not needed to get all the relevant major things done, and there are no laws or principles that require an AI to remain less or only reasonably more competent in the relevant domains than humans are. So I agree with the maxim dictating that everything can’t be optimized, but not with the further claim that an AGI could not optimize the relevant and dangerous dimensions of problem-solving vastly and incomprehensibly better than humans can optimize their defenses: it’s just not written anywhere in the rules. Most of this post is centered on this question, since it seems to lie at the core of our disagreement.

The No Free Lunch argument against artificial general intelligence

Kelly hints at a principle which Hoel makes more explicit in his post: the idea that optimizing for one skill will necessarily impair one’s performance in something else – a general No Free Lunch principle, which implies that cross-domain competence is always going to lose to specialization. If I interpret the fundamental premises correctly, both Kelly and Hoel believe that humans are actually doing very well in maxing out and balancing all the relevant dimensions of cognitive competence (relative to the unknown limits imposed by the No Free Lunch principle) – well enough that no realistic AI could compete with us should some value misalignments arise; or that even if humans aren’t competent enough, we can always build narrow, specialized AIs to replace or beat the generalist.
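A toy illustration may help pin down what the formal version of this principle does and doesn’t say. The sketch below assumes that the principle alludes to the No Free Lunch theorem for search (Wolpert and Macready), which is my reading rather than something either post spells out. The three searchers, the four-point domain, and the value set are all made up for the example.

```python
from fractions import Fraction
from itertools import product

VALUES = (0, 1, 2)   # possible objective values (toy assumption)
DOMAIN_SIZE = 4      # number of candidate solutions (toy assumption)
BUDGET = 2           # evaluations each searcher may spend


def left_to_right(f, budget):
    """Fixed-order searcher: evaluates points 0, 1, ... in order."""
    return max(f[x] for x in range(budget))


def right_to_left(f, budget):
    """Fixed-order searcher: evaluates the last points first."""
    return max(f[x] for x in range(len(f) - 1, len(f) - 1 - budget, -1))


def adaptive(f, budget):
    """Adaptive searcher: picks the next point based on the last value it saw."""
    visited = [0]
    while len(visited) < budget:
        unvisited = [x for x in range(len(f)) if x not in visited]
        visited.append(unvisited[0] if f[visited[-1]] > 0 else unvisited[-1])
    return max(f[x] for x in visited)


def average_score(searcher, functions):
    """Mean best-value-found over a collection of objective functions."""
    return Fraction(sum(searcher(f, BUDGET) for f in functions), len(functions))


# Every possible function from the domain to VALUES, as a tuple of outputs.
all_functions = list(product(VALUES, repeat=DOMAIN_SIZE))
# A "specialised" world: only non-decreasing functions.
increasing = [f for f in all_functions if list(f) == sorted(f)]

for name, searcher in [("left_to_right", left_to_right),
                       ("right_to_left", right_to_left),
                       ("adaptive", adaptive)]:
    print(name, average_score(searcher, all_functions),
          average_score(searcher, increasing))
```

Averaged over every possible function, all three searchers score exactly 13/9, which is the formal no-free-lunch result in miniature. On the restricted family of non-decreasing functions, `right_to_left` does strictly better than `left_to_right`. So the theorem constrains performance averaged over all conceivable problems, and says much less about how well an agent can do on the structured subset of problems that actually matter.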

Kelly suggests that we shouldn’t assume humans are not at or near the global maximum of relevant reasoning skills:

“It stands to reason that reason itself is finite, and not infinite. So the question is, where is the limit of intelligence? We tend to believe that the limit is way beyond us, way ‘above’ us, as we are ‘above’ an ant. Setting aside the recurring problem of a single dimension, what evidence do we have that the limit is not us? Why can’t we be at the maximum? Or maybe the limits are only a short distance away from us?”

He doesn’t explicitly provide positive evidence for this assertion, though, only the apparent lack of evidence for opposing beliefs, but I think he implies the tradeoffs become too expensive quickly after we reach human-level cognition. In accordance with this, Hoel suggests that the NFLP supports this view: as an example, he points to empirical findings about human intelligence, where we occasionally find savants excelling in some cognitive pursuits but dysfunctional in others. I think the principle is a valuable addition to the AGI debate and the limits of its applicability should definitely be explored, but the evidence presented so far doesn’t look sufficiently strong to let us lay the concern about AI safety to rest. What’s more, there is plenty of evidence against this belief, and a lot of it can be framed in terms of the NFLP itself. Organic brains must do so, so much in terms of non-relevant tasks that there is plenty of useless, bio-specific competency for an artificial system to trade off.

Humans with a history of civilization are extremely competent against ants and most other agents we are currently up against, and it’s tempting to think that we are pretty close to optimal world-manipulators. But due to the history of organic evolution, our cognition runs on overly tangled, redundant badcode on a very local hilltop that isn’t optimized and can’t be optimized for efficient cognition. There are eventual constraints for intelligences implemented in silicon too, but it seems to me that these are unlikely to apply before they’re way ahead of us, because the materials and especially the algorithms and directions of a developing superintelligence are intentionally chosen and optimized for useful cognition, not for replicating in the primordial soup and proliferating in the organic world with weird restrictions such as metabolism and pathogens and communities of similar brains you need to cooperate with to get anything done. The next section outlines some of this evidence.

Why are there limits to human intelligence?

Most of the discussion about the evolution of human intelligence focuses on our anatomical and physicochemical limitations: on the implementational level, biological intelligence is constrained by the fragility and limited search strategies of its stochastically evolving physiology. Organic computation is a noisy, hackish electrochemical mess of lipid-constrained compartments interacting with varying effectiveness and constantly on the verge of flat out dying because of something causing the slightest change in pH or temperature or oxygen or nutrient levels so that some relevant enzymes denature or the cell runs out of a few high-energy molecules to fuel its work against various gradients of entropy. Surely silicon-based computation can also be made to sound sort of silly if we go down to the very lowest levels of explanation, but it does look like most of our dead ends are rooted in the substrate we run on.

Our neuronal patterns have immense amounts of chemical noise and compensating redundancy, and the energy costs of high-level information processing are significant to an animal like us. For many of the features associated with higher intelligence, there are clear biological reasons why they are difficult to increase further. We could arguably be smarter if, for example, we as a species just had larger brain volume in the right areas; but we may have traded off better problem-solving skills for saving energy, dissipating heat, avoiding connectivity problems, or something like fitting through birth canals that can’t practically be larger since we’re bipedal and mobile and everything. Or, potentially, if our neural branching worked differently – in ways that unfortunately seem to cause debilitating neurological diseases when expressed excessively. Smaller, more densely packed neurons seem to make you better at processing complex information presumably due to the decreased distance between communicating areas, but our cortical neurons are already close to the size limits where random misfirings due to spontaneously opening ion channels start messing everything up. Some findings suggest that the connections related to higher general intelligence in humans are particularly costly due to simple anatomical reasons, such as the long distance between higher-level association areas, so diminishing returns dictate that a larger neocortex might not have been useful enough to compensate for the time and energy costs it incurs for a biological animal. In sufficiently complex systems, our axons are eventually too slow to facilitate a processing speed compatible with functioning in the wild.

The efficiency of biological versus in silico computation is obviously an old question there is plenty of literature about, and even in many fairly low-level tasks we still have strong advantages over supercomputers mostly due to our massive parallelism, but we should keep in mind that the debate typically concerns timelines for artificial structures reaching our levels of efficiency, not the possibility of it. Effectively implementing similarly parallel or otherwise unconventionally organized processing on vastly better hardware may take more than a few decades – or it may not – but the resulting improvements in processing speed alone will probably be a game-changer. This is not to say that dumping tons of processing power in a system will make it intelligent, just that once a reasonably general intelligence is built, there are good reasons to assume processing power might make it superintelligent.

Bostrom calls this subtype a speed superintelligence: a mind that isn’t necessarily a lot more competent than the smartest humans on the algorithmic level, but faster by several orders of magnitude and so rather as baffling and unstoppable as a more effective thinking style, whatever that means, would be to us. This agent seems to avoid Hoel’s objections related to humans being close to the optimal balance of different areas of intelligence. Even in the very unlikely case that a superintelligence has to emulate human-style thinking and even start out from a rather low level in order to accomplish stuff, better hardware could well compensate for these losses in efficiency, while still surpassing us by a wide margin.


From what I can tell, though, we can expect to get orders of magnitude more leverage from algorithmic improvements. So what can be said of our algorithmic efficiency, and the tradeoffs it is subject to?

Hoel suggests that different aspects of cognition are like sliders you can adjust, coupled to each other positively or negatively, though mostly negatively, so that getting more attentive might for example impair your memory. But among most humans these abilities seem to correlate, and only at extreme ends do you sometimes see the savant-type imbalances Hoel mentions. Even savantry, whether acquired or congenital, does not always carry notable tradeoffs, but probably does require something developmentally or structurally surprising to happen in the brain. This looks a lot like blasting the brain with lightning or removing biologically well preserved and typically useful parts from it just sometimes shoves it onto a higher hilltop further away which evolution in its search for local optima would probably not have found – but overwhelmingly often, it causes severe impairments in many other areas, because there are always more ways in which things can go wrong than there are crude tricks for improvement. If the imbalances resulted from algorithmic tradeoff necessities as opposed to evolved implementational limitations, it would be more difficult to explain why generally very functional savants exist at all.

In the cases where our cognitive algorithms do clash, though, we use metacognitive skills to adapt to the task at hand. Many researchers liken our cognitive abilities to a toolbox we strategically choose the right algorithms from; but these metareasoning skills are very limited and inflexible in humans, and can’t very well be applied to involuntary processes. For example, if better memory interferes with creativity, humans who want to strategically increase their divergent thinking are pretty much out of luck. An artificial system – whose metareasoning skills could also be designed or trained to get better results than we do – can be more flexible in turning its various modules or styles on and off, or more imaginatively fine-tune their interactions to match different situations. Such metacognitive skills are complex and definitely not easy to implement, but there is no reason to think they are implausible, and they could make many of the potential tradeoffs temporary in a way our cognitive tradeoffs could never be – and thus allow many of the relevant thinking styles and their interactions to be dynamically optimized, and very effectively increase the system’s adaptability to changing situations.

Anyway, we don’t currently know a whole lot about human cognition on the level of specific algorithms, but the general positive correlation between different cognitive capabilities as well as the rough ideas we have about how they work seem to contradict Hoel’s concept of balanced, mutually opposed forms of intelligence. There is nothing conceptually contradictory between most areas of cognition, and functionally it looks like they in fact often lean on and facilitate each other. Also, awkwardly, the strong suits of human intelligence, such as pattern-recognition and abstraction, rely on heuristics many of which we have grown out of well enough to call biases by now. Our quick and effective judgments rely on algorithms we know are coarse-grained and frankly kind of weird in a lot of ways, but which we can still only surpass in accuracy by expending a lot of energy on formalizing our approach and augmenting our reasoning with artificial computers and large bodies of prepackaged information. There are immensely more accurate algorithms that we sometimes see, understand, and can even laboriously adopt and combine to grasp large bodies of knowledge, but that are not part of our intuitive toolbox which instead is filled with bizarre distractions and crude approximations. Could they be part of the immediate toolbox of an artificial intelligence? Seeing as our most accurate reasoning about large, complex wholes requires us to emulate increasingly formal approaches, it seems likely that a system whose computation adheres to formal principles from a lower level upwards could complete these better strategies faster and more efficiently. But this is pretty abstract, and it’s not clear how rigid an optimal world-manipulator will be in this sense.


Higher levels of analysis get increasingly damning, though. What purpose does our cognition serve? Which tasks is it optimized for? Have human smarts primarily been selected for features that aid in the relevant types of intelligence?

Well, it’s complicated, but no. The skillsets associated with reproductive fitness during human evolution are… not exactly identical to the skillsets you need for large-scale technological world manipulation. The prime directive of all organically evolved species is replication: this statement sounds uninteresting, but its corollaries are massive. Humans are an intensely social animal whose survival and reproduction opportunities are primarily determined by group dynamics. This is not to say that the abilities that help you get by in social situations aren’t useful for other dimensions of problem-solving as well – general intelligence correlates with social skills, and many theories about the primary drivers of the evolution of our intelligence place a lot of emphasis on the social games we play in order to prove to others that we are also good at solving many correlating problem types. But the social environment humans evolved in also means that there are things we can or need to optimize at the cost of general reasoning – as evidenced by the richness of our social cognitive biases – and that we may sometimes be better off freeloading off the intelligence of others (e.g. by being likeable) than doing the work ourselves. In a community, there may be smarter ways to be smart than actually being smart, and sometimes these ways are directly antithetical to the skills you need to predict and influence the world on a large scale.

In a sense, the useful unit of survival and thriving for humans is a group (whereas the unit of selection for intelligence is an individual). This means that human intelligence is very fundamentally a collaborative effort, in that none of our actually impressive cognitive feats could have been accomplished by an individual starting from scratch. According to both Kelly and Hoel, integrating different subsystems of cognition into a general actionable whole is the most expensive part of intelligence, which is the primary reason intelligence incurs greater and greater costs as it generalizes more. But interacting with other minds like humans do – trying to coordinate what you know and plan to do using a deeply vague symbolic language and other external super expensive cues – is like the least efficient form of this, and yet exactly what we have to do all the time in order to reach any of our goals. (See e.g. the distributed cognition model (Johnson 2001) for an interesting description of communicative interactions as cognitive events, and cognition as a co-created process.)

Unfortunately, human cognitive communities are also immensely redundant. The same processes manifest in individual human minds again and again with only comparatively small modifications, facilitated by resource-intensive learning within narrower domains – even though we still pay the hefty price of inefficiently integrating these processes. An artificial structure could integrate its modules or subroutines through routes and representations vastly more effective than a human community utilizing shoddy human communication is, and the processes it combines also add substantially more to the system because there is less redundancy between them. Generalization being so costly doesn’t mean that there can’t be better generalists than we are, it means that there are some immensely effective low-hanging fruit for an agent with actually good integration skills to pick.

Hoel also compares general intelligence to a superorganism optimized to thrive in any environment: just like no such ultimate organism exists, no agent could be universally intelligent in all the domains it encounters. I could well be missing something here, but it seems to me that considering this idea actually strengthens the concept of sufficiently powerful general intelligence. Humans, while not literally superorganisms and again individually pretty useless, are a reasonable approximation of such an organism when considered as a civilization. The collaboration of humans has so far enabled us to conquer almost any interesting location on Earth, extract resources from sources no other animal finds use for, and severely punch most other organisms in their literal or figurative noses whenever we feel like it. Tardigrades may survive extinction events we never would due to their also rather universal hardiness, but if we want a square kilometer without tardigrades or incidentally unsuitable for tardigrades, we get a square kilometer without tardigrades or incidentally unsuitable for tardigrades. The converse is hardly true. This is because we as a civilizational intelligence distributed across time and space in silly human-sized vessels really are sufficiently general to outsmart most competitors we currently know, if we actually want to – though, due to our many demonstrable inefficiencies, in ways that also leave plenty of room for improvement.

If we’re going to rely on competition, we probably already lost

As mentioned above, another possible source of hope is that even if humans are way below the limits of a silicon-based intelligence, this agent would still be under our control because no matter what it seeks to do, we can counter and outsmart it with a narrower, hence more powerful competitor. Hoel, for example, mentions competition in passing:

“Even if there were a broad general intelligence that did okay across a very broad domain of problems, it would be outcompeted by specialists willing to sacrifice their abilities in some domains to maximize abilities in others. In fact, this is precisely what’s happening with artificial neural networks and human beings right now. It’s the generalists who are being replaced.”

But we aren’t going to remain better than a semi-general superintelligence at creating narrow intelligences either. We won’t even know what sorts of specialist AIs we might need to counter whatever an AGI is planning to do, as its cognition might be utterly alien to us even when not otherwise powerful. Who are the competitors, and when is the competition going to happen? The situation does not resemble biological evolution, where the need to replicate and pry scarce resources from an uncaring abiotic world drives the separation of populations into extremely specialized species in constant competition with each other. An AI in development is freer from material scarcity than any organic being has ever been, and its rules for competition are a different terrain entirely than the one we evolved in.

During initial design and selection by humans, specialist AIs will certainly be useful, their outputs effectively comprehensible to humans and combinable by us into coherent actionable wholes. But there are large-scale problems we really really need to solve, can’t tackle with our own cognitive skills due to the massive complexity involved in deeply processing the outputs of our specialist systems, and want a more powerful agent to make sense of: so such an agent will be made by someone as soon as it is technologically feasible. Specialist AIs are not effective competitors after we’re able to build a generalist that makes better use of the specialists’ outputs than our rigid, slow brains are able to.

Concluding remarks

I hope to have given a reasonably convincing account of why I think human cognition is primarily limited by its biological origin, and probably weak enough to be dramatically surpassed by intentionally designed, less redundant, and materially abundant systems with an actual focus on effectively predicting and influencing the world. Even if there are eventual necessary tradeoffs for artificial systems as well, we don’t know where they lie based on our knowledge about organic intelligence, and AIs could well deal with these tradeoffs more dynamically than we are able to in possibly surprising ways. With all the evidence we can see on multiple levels of analysis, I think there is enough potential for improvement in intentionally designed intelligences to build a mind to whom humans really look a lot like mice or ants. Discussion about the limits of cognition and potentially necessary tradeoffs between its components is very valuable, though, so while I would personally be surprised to discover that humans are anywhere close to maximally competent at manipulating the world, this point of view is likely a relevant addition to the AI discussion.

Anyway, another thing to keep in mind when comparing human and artificial cognition is that humans, well, don’t really super have terminal goals. We have the capacity to think somewhat strategically and often figure out the optimal course for whatever we claim to work towards, but frequently just… don’t, because strong and stable terminal goals aren’t how human motivation works. We neglect by default even the basic goals we unanimously deem instrumental for any agent with actually important values, and instead spend a lot of time just going with the flow, trying not to let all our incompatible goals clash with each other badly enough for us to notice. Due to our own constraints, it is difficult for us to understand how an agent that actually has invariant and consistent terminal goals is going to behave, so we intuitively assume that similar ineffectivenesses will arise even in AIs that supposedly have values. This is probably not going to be the case, which again adds to the costs we must pay compared to intentionally designed systems.

Whether or not optimal reasoning in itself will be enough to threaten our existence is a good question, but beyond the mostly evolutionary scope of this post. Kelly deems this assumption fallacious: he says that an AI will not be able to beat us or even indefinitely improve itself just by thinking about it really hard. This is true to a certain extent of course, and it would be interesting to get to see what the limits are. But again, what we want is not merely a solipsistic thinker: we want a useful agent to help us with the complex problems we ourselves battle with, and will equip our creations with interfaces through which they can influence the actual world. The inevitability of a superintelligence, if such an agent is possible, lies in the fact that we desperately need this type of competence, and will gladly build it up as long as it looks like its values are also identical to or compatible with ours. So, if thinking and communicating just lets it convince us of that, we are likely happy to solve the rest of the initial problems, feed it all the data it needs, and probably essentially give up control soon enough whether or not we realize that’s what we’re doing.

Maybe it is implausible that by observing a single pebble, a realistic optimal thinker could infer the entire universe and quickly have all it needs to fully control its future light cone. But with an amount of agency and base knowledge that lets an AGI be useful to us, it can certainly get a lot further than we can predict or necessarily control – that’s how good inference ultimately works. While it’s absolutely true that the risk is currently hypothetical and there are plenty of potential pitfalls that could lock down a realistic recursively improving AGI, we don’t have a strong idea about where or what they are. Real thinking, by agents with real terminal goals, has never been tried.

Do social animals suffer more?

[Epistemic status: Very speculative!! Not science!! Armchair evo-psych is bad for you etc., but there are some important questions we currently don’t have a better way to try to answer, so.]

[TL;DR: The intensity of suffering, an evolved motivational state, is likely to vary even between species with generally similar levels of sentience. I describe four principles which could suggest that as a result of their evolutionary history, social animals typically suffer more than asocial ones.]


There’s a lot of research going on about sentience and moral patienthood: which creatures are phenomenally conscious and to what extent is one of the first things to consider when figuring out what exactly deserves our moral consideration. However, consciousness itself is arguably a neutral property, much like existing as a material object is a neutral property. Even if a creature has subjective experiences, if these experiences solely consist of being aware of stuff – with no desires, aversions, or other subjectively felt motivations towards anything – it’s not really good or bad that such creatures exist or that things happen to them. Only the capacity to experience states with emotional valence makes something a moral patient (unless you insist on consciousness itself as a terminal value, which some people do of course – I think it’s aesthetically interesting and okay I guess, but distinct from morally important properties, which need to be tied to hedonic tone or motivations or preferences to make sense).

If our aim is to minimize suffering in some conventionally defined sense, it is obviously not enough to know if an animal is conscious. If we accept that life or even consciousness doesn’t necessarily imply a capacity to suffer, we need to estimate the extent to which the animal reacts to stress with subjective distress. Most mobile creatures produced by the brutal evolutionary processes we’re familiar with show clear behavioural signs of nociception when physically hurt, such as avoidance and attempts to disrupt the sensory pain signal if possible; and the closer an animal is to our own physiological, behavioural, and taxonomical type, the greater is also the probability that these signs really do imply subjective suffering as well, instead of just reflexive or mechanical reactions (this blatantly anthropocentric line of evidence is far from conclusive, of course, it’s just that it’s almost all the evidence we currently have).

However, assuming that suffering is a product of evolutionary processes, there are good reasons to believe that the intensity of subjective suffering varies between species just like other evolved properties do: according to their historical usefulness during the unique evolution of a given population. Even if every tetrapod has four limbs, different environments and niches have formed different uses and adaptations for these limbs. The capacity to suffer is more fundamental than that and its uses are probably more unified, but slightly different adaptations are to be expected, depending on what sort of things an animal is motivated to do and what kind of an environment it has been shaped by.

This seems likely because contrary to the standard biology textbook view, suffering is more than just a signal of a harmful situation. Intense suffering especially is primarily a motivational state that facilitates not only direct avoidance of harmful acts and environments but also complex decisions under threat or risk, long-term learning, social investment and bonding, competition and communicating, all depending on the other aspects of an animal’s evolutionary history, cognition, and lifestyle.

Behaviourally and, uh, anecdotally, it seems that humans have the capacity to suffer a lot. A defining feature of our species is the immensely complicated social behaviour we develop when surrounded by other people, and it has probably shaped our subjective experience more than any other aspect of our cognition has. So, in this post, I try to pin down some principles and hunches that suggest that a social evolutionary history in particular could produce species that suffer intensely – though significant suffering is still probably present in all conscious animals – and then take a brief look at the implications of this possibility.

The extended homeostasis of social animals

Suffering as a motivational state is typically the mental component of an animal’s homeostatic regulation, i.e. the processes that keep all the relevant physiological variables between healthy parameters. Most things that threaten your homeostasis in a way that humans have historically been able to survive when motivated to do so will cause some kind of suffering: thirst when your blood volume starts to drop, pain when a wound opens and leaves you vulnerable to pathogens and blood loss, sickness when you have ingested toxins and need to expel them. When the threat isn’t currently actual but can pretty reliably be predicted to come true unless you take physiological or behavioural precautions, your species will evolve predictive homeostatic processes. Many of these predictive processes are cognitive or emotional in nature, e.g. people often feel distress in darkness and high places – things that cause absolutely no damage in themselves, but correlate with future homeostatic disturbances.

Among social animals that habitually rely on others to survive and thrive, predictive homeostasis is extended to social relations as well, so that an individual without sufficient relationships suffers from loneliness and other emotional disturbances. Not all social relationships are homeostatically maintained: the drive to acquire social status probably doesn’t really settle around a set point or anything, as it has more to do with mating opportunities than with survival. Social belonging, on the other hand, can somewhat accurately be defined as the part of social relationships that is indeed homeostatic – maintained by feedback loops within a certain dynamic range, where a lack of it leads to negative emotions, and an excess is quite naturally dropped due to time constraints and/or social stress.

As the number of things you need to consciously attend to when maintaining your homeostasis increases, so does the probability that something is missing, which plausibly leads to more suffering. In a community, your wellbeing becomes directly tied to the wellbeing of others, which again increases the number of things that can go wrong: not only do you care about how others treat you to ensure your direct wellbeing, their interests are now inherently important to you too, so that you feel some of their pain even when it’s directly irrelevant to you. Empathy, especially its affective aspects, is a major mechanism by which this extension of homeostatic suffering becomes possible, since it motivates you to make sure your companions survive and thrive as well, and in a silly metaphorical sense moves you closer to becoming a single organism (only with multiple simultaneous consciousnesses, and so increased maximum suffering levels).

Suffering and contingent social commitment

A lot of human suffering comes in the form of worry or grief over lost social bonds. Evo-psych hypotheses about the origins of social grief are based on the utility found in maintaining close relationships and seeking reunion on pain of distress (e.g. Archer 1998). When an important bond is permanently broken and little to no chance of reunion remains, the normally useful reaction becomes temporarily maladaptive. Prolonged, intense, and public displays of grief probably serve a signalling purpose as well, providing evidence that you’ll emotionally commit to maintaining a social bond: this can only apply to animals whose social attachments are contingent and based on reciprocity, individual recognition, and familiarity, whereas eusocial animals (primarily social insects) may not need to experience such loyalty towards specific individuals. The exact evolutionary processes at play are poorly understood, but it remains likely that most other cognitively advanced, conditionally social animals also experience emotional separation distress, and that the accompanying behaviour aids an individual’s commitment to maintaining social bonds.

All of this should work in synchrony with the social homeostasis model sketched above. Indeed, Hofer (1984) found two distinct behavioural patterns in nonhuman animals separated from their companions. An immediate, acute reaction to a specific loss appears as distress, searching, preoccupation, and even aggression. This reaction quite naturally helps an animal to reunite with its lost companion should it still be possible. Another reaction develops afterwards or simultaneously but over a longer time period, and involves passivity, inactivity, and disturbances in biological rhythms, presumably in the absence of familiar sensory regulators provided by the lost companion or group. This is probably not directly adaptive in itself, but a byproduct of the otherwise useful state of being able to consistently rely on cues from others (possibly persisting again as an exaptation due to signalling or other indirectly adaptive reasons).

Some of human grief can also be modelled as a combination of these two processes, but might there be a difference between the typical separation distress that many social animals feel, and the cognitively heavy, temporally complex pain that human social suffering involves? Some intuitions suggest that animal suffering, even when subjectively experienced, is qualitatively different from human suffering since most animals lack the psychological layers of future-directed worry, advanced processing and rumination, and the resulting elements of subtle despair and hopelessness that intense human suffering typically involves. I’m not sure how likely this is regarding suffering in general, but I do think long-term social suffering is at least greater in humans, who rely on personal social commitments more than most other animals do. There are tons of unexplored nuances both in human grief and animal separation distress, but the strongest function may simply be that by making social relations part of the necessary conditions we feel miserable without, we successfully blackmail ourselves into seeking company, and also prove ourselves loyal to others in the same predicament.

Violence: probably literally the worst thing

An obvious source of distress to social animals is intraspecific violence, which to the victim is likely to differ dramatically from other kinds of tissue damage. For an asocial animal, violence is not really an applicable concept: literal violence requires social intentionality of some sort. Much like a shark attacking a human isn’t really violent (just uh, various other kinds of suboptimal), for asocial animals conspecifics and other animals alike are basically forces of nature that may or may not harm you according to their non-negotiable whims. It’s typically useful to fear these things and of course suffer when damaged by them – but embedded in a social lifestyle where the risk of game-theoretically regulated intentional harm from others is possible but not inevitable, and often dependent on your communication and the community around you, suffering serves more functions than that. So new layers of intense suffering have developed to organize and guide individuals in violent populations – in addition to tissue damage, violence causes purely psychological harms like terror, long-term anxiety, disgust and distrust, hate, extreme despair, and of course vengefulness and the perpetuation of conflicts, again depending on the species in question.

Now, we can’t directly compare suffering levels even between humans attacked by other humans and humans attacked by lions or harmed by tornadoes. We do seem to fear and avoid intentional violence significantly more than other sources of harm, though: guns, murderers, and terrorists cause widespread panic and behavioural changes, whereas similar non-intentional harms are easier to bear, more quickly forgotten, and rarely get people to instantly rally around political causes or radically change their habits or anything. A stronger argument for violence feeling worse than non-social harms is that post-traumatic stress disorder – presumably the long-term consequence of going through something maximally upsetting and horrifying while equipped with a predisposing genetic makeup – is disproportionately often seen in humans after interpersonal harm, as opposed to accidents, natural disasters, and especially diseases (Kessler 1998). There are other possible explanations for this depending on the actual etiology of PTSD, but the simplest explanation seems to be that violence is indeed worse than other forms of damage, suffering-wise.

For social animals that have a grasp of humans as intentional agents, it is possible that humans hurting them is also experienced as violence of some kind. This grasp doesn’t necessarily mean that they have a solid theory of mind or anything, just that they model humans as agents a bit like their conspecifics – something you can somewhat personally trust or distrust based on external cues, and possibly communicate with. Animals that are known to exhibit PTSD-like behaviour after human mistreatment include dogs, elephants, chimps, and possibly cetaceans, all animals with complex and relatively personal, communicative social structures. Most of the research on animal PTSD is based on captive animals and human mistreatment, so we currently don’t know what conditions typically lead to similar pathologies in the wild, if any – but being social and somewhat cognitively advanced again seems like a prerequisite for this type of suffering. Even if not all of them process violence as intensely as humans do, it seems plausible that for these vulnerable animals, it’s also more traumatizing than other kinds of tissue damage. Since it is such a powerful way to build hierarchies and organize group behaviour, violence and its threat play a part in the life of most other social creatures as well, and they probably add a few extra layers of stress and suffering to every unstable social situation even among less cognitive animals.

Also, the best way to reduce the lethality of intraspecific violence is probably having a clear signal of submission, i.e. a credible display of sufficiently intense pain; among social organisms, showing suffering is a straightforward way to signal many other things as well, such as a need for help from allies when challenged. Asocial animals receive no benefits from displaying their suffering and typically have no purposeful external signals for communicating injuries or pain – on the contrary, being able to conceal your injuries as best as you can is crucial when calling for help is not even a comprehensible option for you and showing weakness typically leaves you vulnerable to predators. Social animals, on the other hand, usually do have signals for suffering – and since suffering more intensely makes your signals stronger and more credible, suffering more in these situations has also been adaptive to an extent.

Having friends: an exciting opportunity to suffer more than you otherwise could have afforded to

What else does suffering give you in a social environment? If your species is mostly prosocial, potentially a lot. When ill or injured, an animal feels long-term pain and distress, which discourages it from using and stressing damaged body parts and makes it keep still and use the available energy to recover – all of which also effectively prevents it from seeking food, shelter, or other necessities. Therefore, a member of an asocial species faces a straightforward survival tradeoff: prolonged and intense suffering, while protective, is also severely limited by the animal’s need to actively gather resources and defend itself. When you’re a social animal surrounded by basically sympathetic and reciprocal companions and relatives, however, this tradeoff could become slanted towards a greater intensity of suffering. If others can temporarily take care of your resource needs and protect you from threats, it suddenly becomes possible to spend a lot more time resting and recovering – as long as you’re in the right motivational state to do so, e.g. preoccupied with how unbearable your current existence is.

This is a more fundamental mechanism than any of the others above. Just having more potential homeostatic disturbances doesn’t necessarily mean they are experienced as more frequent or more intense suffering: maybe internal motivation levels are roughly calibrated between species so that where a social animal feels extreme agony over separation from its companions, an asocial animal can afford to be more sensitive to hunger or thirst since it naturally lacks the things social animals are debilitated without, and so feels similarly intense suffering under even a milder starvation threat. Maybe violence is the worst thing that can happen to a social animal, but asocial animals again experience similar fear and pain from natural causes, which social animals just have to rank as lower pains on a basically similar gradient in order to stay functional. But having prosocial companions could shift the absolute cap of your species’ suffering just by allowing individuals to wallow in all-consuming pain and misery without simply dying of hunger in a couple of days. (Friendship is magic.)

The usefulness of this hypothesized system varies a lot between different species. Clever and reasonably adaptive animals, such as humans, have some ways to protect an individual from harm and many to bring them suitable food and water when necessary. Elephants – while smart, prosocial, and exceptionally good at weighing several tons and so protecting weak herd members from predators – are grazers and browsers with a nutritionally unimpressive diet. This makes it immensely difficult for others to bring an injured individual all the food it needs, so at least some level of activity needs to be maintained even when ill (one should hope this means that an elephant’s maximum amount of physical suffering can’t be as intense and devastating as it sounds like to us). A good but heartbreaking rule of thumb might be that whenever we hear an uplifting story about an animal taking care of its weak or injured companion, we’re also looking at a species capable of experiencing the worst feelings of suffering in the biosphere. Maybe.

Implications and conclusions

Should we conclusively find that an animal’s natural degree of social behaviour is a good predictor of how much it suffers in various situations – both social and nonsocial – we would obviously have better tools for building policies and other solutions to effectively reduce suffering. Future research confirming similar conclusions could direct our attempts to improve animal welfare: for example, seafood is currently estimated to be one of the most suffering-dense protein sources to consume due to the small size of fish (which leads to a low meat/consciousness ratio) compared to cattle or pigs – but since the large herbivorous mammals typically grown as livestock are very social, their capacity to suffer may be greater quite independently of their other cognitive capabilities, which might eventually turn out to outweigh their large size. Chickens, unlike fish, have a very social lifestyle, which combined with their small size would make them one of the absolute worst animal-based foods to eat. Still, I’m wary of this approach to animal welfare now that veganism is heavily trending anyway (I hope? At least in Finland?) and our knowledge base is so severely lacking. It’s probably best to just ride the wave and focus on advocating better plant-based protein sources as well as in vitro meat as soon as it becomes a real option.

What about wild-animal suffering? Ecosystems whose fauna primarily consist of solitary herbivores may be more desirable than systems with lots of social animals even in the absence of predators, as social animals may react to other inevitable disturbances with greater suffering. When designing interventions to aid animals in the wild, social animals should possibly be prioritized, and long-term ecoengineering solutions developed for these species in particular. Other people have written at length about possible utopian interventions to manage suffering in wild ecosystems, and while it is currently unknown how feasible these goals are and what the relevant timescales could realistically look like, more research on the nature of suffering and the differences between species is probably useful before choosing any interventions becomes relevant, to make sure we actually prioritize reducing suffering.

A practical limitation to making use of these principles, should they turn out to be true, is that even minimally social animals must usually have ways to communicate and get along with conspecifics in order to mate, and many otherwise asocial animals still care for and invest in their offspring for a while. So, while some animals are clearly exceptional in their social bonding and commitment, purely asocial animals can’t really be found to use as points of comparison, and some of the principles above may apply to a varying extent to most sexually reproducing animals. Another complicating factor is that in the case of many animals, sexes are dimorphic so that females are typically more social than males, who may even live entirely alone. Is a significant sex difference in suffering plausible? There are a lot of confounders here, but human data says yeah maybe – gender differences in sociability are comparatively small in humans, though, and so is the difference between experienced intensity of pain in women and men, so the signal isn’t exactly clear.

Anyway, to reiterate, there are four main mechanisms that could cause a social evolutionary history to produce species that suffer more than otherwise similar asocial ones: 1) the extended homeostasis principle based on the fact that more things can go wrong (hence feel bad) for a naturally social animal simply due to the increased number of things to keep tabs on, 2) social commitment, which is purposefully fueled by psychological pain such as grief, worry, and empathetic pain, 3) purposeful violence, which only happens among social animals and plausibly feels subjectively worse than other kinds of tissue damage due to complicated signalling and group organizing things, and 4) the fact that fully utilizing the rest-and-recovery functions of suffering when physically injured or ill only becomes possible when your resource gathering needs can temporarily be covered by friends and you can afford to stay preoccupied with the pain. Due to the hypothetical nature of these principles, they are probably not super relevant to practical ethics or policy decisions or anything really until we know more, but maybe consider forever being extra nice to dogs, the blessed animal we purposely bred for maximum personal sociability, cooperation, dependency, and companionship. Thank you.

A brief history of humans trying to pretend that suffering is actually OK: Analogies between religious theodicy and secular justifications

[Epistemic status: I have no deep background in theology or philosophy of religion, so this isn’t meant to be a very comprehensive or detailed picture, just scratching the surface based on a few papers and lectures. Expect some major oversimplifications and a couple of misunderstandings.]

[TL;DR: Theodicy: do not do the thing.]


Theodicy was originally the religious project to justify, explain, or at least find ways to accept the intuitively unacceptable suffering we paradoxically see in a world supposedly ruled by a benevolent, omnipotent deity. Recently the concept has metaphorically been expanded to also encompass a more general, secular version of itself: the age-old human tradition of seeking meaning in or justifications for suffering in general, not just because these explanations are required by some theistic ontology. There are a lot of similarities in how people try to justify suffering within these two frameworks (though the projects seem to fail for different reasons) and the religious search for a viable theodicy has certainly influenced the justifications we now see even in reasonably secular cultures, but I suppose it’s fair to assume that most of the motivation is rooted in a deeper, more universal need for a coping mechanism, not so much in some lingering influence of specific religious memes.

Theodicy is distinct from defending theism against a fundamental logical incompatibility between God and evil, and much more interesting, especially from a secular point of view. We, too, are beings who to a great extent seem to tolerate evils we could at least potentially eradicate, so I guess in a sense we have almost as much to explain as a hypothetical benevolent, omnipotent deity has. The purpose of this post is to examine typical secular theodicies by comparing them to existing theophilosophical attempts and their critiques (obviously in the light of a secular ontology), because the large body of work surrounding religious theodicy could shed some light on the secular approaches as well.

Importantly, the consensus currently seems to be that no satisfying religious theodicy has actually been found, and that anti-theodicies – various explicit flat-out refusals to explain, justify, or even forgive God, especially prevalent among Jewish theophilosophers post-WWII – are the closest a theist can get to a solution. The project of theodicy itself is often seen as rotten and immoral; many go as far as to assert there can be no morally sufficient reasons for God to permit a world as evil as ours. The Finnish philosopher Sami Pihlström, for instance, argues that morality is more fundamental than metaphysics – no matter how mysterious the ways in which deity so-and-so works, or how feeble our rational capacities, we should have enough confidence in our moral sense to abandon a project this bizarre and instead take suffering and its victims seriously even if we subscribe to theism. And if anti-theodicy is the primary way theists have to deal with suffering, if even a fundamentally incomprehensible, all-powerful entity can’t really save the idea that suffering in itself is ultimately meaningful somehow, what hope can a secular morality have for preserving it?

Secular theodicies: some requirements

Anyone who has ever earnestly advocated the abolition or dramatic reduction of global suffering in almost any social setting has probably met some major resistance and a colourful bunch of common-or-garden theodicies. Some of them are rooted in low-level misunderstandings, such as the notion that pain as a physiological process is a necessary warning signal (so our current levels of overall suffering are somehow optimal), or that abolishing suffering is necessarily basically equivalent to wireheading, or that prolonged boredom or existential dread isn’t really suffering or will for some other reason be preserved and intensified when the robotic abolitionists get their inhuman project off the ground and nothing will feel meaningful to anyone ever again. But even when people are roughly on the same page regarding these issues, the idea of reducing the biosphere’s overall suffering sounds extremely alarming to many – probably due to its unintuitiveness and the immensely important role that suffering has historically played in our emotional meaning-making machinery. Dissecting this discomfort is useful both instrumentally and theoretically: in order to effectively advocate reducing suffering we obviously need to understand the counterpoints, and even more importantly, these counterpoints could eventually indicate something we’re currently missing about the functions of suffering.

All in all, though, it seems that comparing all the apparently futile religious theodicies with secular justifications for suffering mostly just reveals how weak the enterprise in general is. If a natural framework could reasonably justify the suffering we see in the world, centuries upon centuries of theodical philosophy would not have been needed in the first place, or they probably would at least have resulted in stronger conclusions than the ones we’re currently stuck with – basically yielded some acceptable general justifications disguised as religious ones. Even more damning than the lack of viable options is the conclusion accepted by many modern theophilosophers that it is immoral and possibly downright bizarre to even try, because the evil in our world is so evidently so bad that no benevolent God could ever be able to justify its existence.

So what would a viable secular theodicy need to explain? Among other criteria, religious theodicies can be classified according to the range of evils they tackle (Trakakis 2008). Why must there be any suffering at all? Why must there be purposeful evil, or naturally occurring accidental suffering? Is the current amount of suffering also necessary or justified? Is there a justification for every single instance of harm? All of these questions can be applied when searching for a secular theodicy as well: any sufficient justification for not reducing suffering will need to respond to these points (except perhaps to the last one, since micromanaging individual instances of suffering isn’t currently feasible for humans, so some collateral damage may be necessary).

Another perspective that usually has to be addressed (again according to Trakakis) concerns the nature of the benefits suffering is supposed to result in. In a theistic ontology, the potential benefits are different than in a secular one, of course, but some relevant principles remain. Suffering should at least be causally or logically connected to the resulting goods: if we want to argue that horrifying pain builds character, we should be fairly confident that it really does so, that similar character-building properties can’t easily be found elsewhere (with less of the, you know, horrifying pain), or better yet, that the suffering is absolutely necessary as a foundation for an ideal character. If this condition is satisfied, we now need to assess whether the benefits gained are somehow greater than the suffering endured: this is a tall order, for imagine the greatness of character that is needed to compensate even for the fairly typical everyday atrocities in history or in the present. Even if you could make this case for some humans, which I don’t think you could tbh, consider the pain felt by animals with no capacity for anything like character-building (if I find fifty righteous fruit flies tho).

The greater good approach

This brings us to the most common approach to theodicy, which probably covers the vast majority of both religious and secular justifications for suffering. The main point is simple: something really is worth all the suffering we endure, and suffering is likely to be the only way for us to achieve it. Candidates for this good include virtue or character, personal growth, close social relations, artistic inspiration, a sense of meaning, and even positive emotions in general – in a secular ontology, people will probably glare at you unless you can give an explanation of what exactly this benefit is and how it’s supposed to be related to suffering and also worth it; in a theistic one, you have the bonus option of just trying to convince us that there surely is such a benefit, it’s just mysterious like that, and also adding something about how souls need to be forged in the crucible of magic suffering in order to become worthy of the heavenly afterlife or something. Neither of these has so far been a satisfactory response to anything but fairly mundane or trivial pains on a scale from stubbed toes to genocide. Pihlström protests against any attempt, religious or otherwise, to justify intense suffering from the outside in this manner: if suffering does indeed result in something sufficiently valuable to make it worthwhile, it should only be up to the victim to decide whether or not it really does – other approaches trivialize the evil and the victim. This makes epistemic sense, as we don’t really have the subjective knowledge to assess the intensity of anyone else’s suffering. If we did, though, and if the benefits were gained by someone else, a utilitarian case could be made for justified suffering even when the victim doesn’t super agree.

Some suffering obviously does lead to good things, even to stuff that’s quite clearly worth it all. Maybe some kind of a contrast between, say, sadness and happiness really does enhance the overall experience. And maybe a genuine chance of failure and disappointment really makes it feel more meaningful to strive for nice things in life. And close and committed social relationships probably do require that you feel some distress when you lose a loved one. However, this is entirely consistent with accepting that too much of a bad thing is in fact a very bad thing, and that there are forms of suffering that are entirely unacceptable in relation to the benefits they result in. Many kinds of distress actually make you a worse person: being in pain and stressed out makes it harder to focus on anything except your own personal survival and well-being, often even after the situation improves. Surviving a hardship makes you less empathetic to other people going through it later on, and so on.

There are many ways to assess this approach empirically, which is what any secular morality needs to do of course. Whatever the benefits are, they probably don’t scale ad infinitum with the suffering we experience; otherwise we would find people just advocating MAXIMUM SUFFERING, which maybe we do, I don’t know. This and common decency suggest that the current, horrifying amount of global suffering has not satisfactorily been proven optimal and hence justified, and that even if there are some hardships we need to go through in order to grow as human beings or something, people being brutally murdered or billions of sentient animals dying of thirst and infections everywhere all the time are not necessary properties of a world even if we also want it to have grown human beings. Also, the fact is that any benefit brought by distress can only be determined afterwards. People avoid intense pain, and wholeheartedly approve of others avoiding intense pain, even when the post hoc narrative just sometimes is that it was all worth it in the end. This looks a lot like the benefits of non-trivial suffering are mostly accidental, a net negative, and suffering isn’t a reliable way to gain anything valuable at all (with some specific, typically low-intensity exceptions – in which the suffering usually is more of a byproduct than the actual cause of the benefit). The dramatic ways in which the victims differ throughout the biosphere further reduce the odds that current suffering levels are fine: it seems extremely implausible that there is some suffering-benefit tradeoff that applies to every animal taxon, or otherwise renders all of the suffering we know about somehow acceptable.

The agency/free will approach

Another common approach is based on agency or free will: religious theodicies of this type either tend to claim that it’s logically impossible to be sentient or good without the possibility of evil (i.e. wanting to harm others), which doesn’t fly for multiple reasons this margin is too narrow to contain, or that good can only be meaningful if it’s a genuine choice, or that free will is otherwise more important than other beings not suffering (again for soul-forging purposes probably or because people need to make an active choice to remain close to God or something). Typically this also vaguely implies that God isn’t the direct source of the evil we see, and hence not really responsible for it: suffering only exists because humans misuse their agency.

From a secular point of view, I’m not sure what to make of it – I don’t think people place a lot of value on folks in general being able to kill each other and just not choosing to do so. I guess people do see value in freely choosing to be good when it’s just about them, but as evidenced by the self-centered nature of this judgment, this has more to do with virtue signalling and moral competition than with freedom-to-cause-or-not-cause-harm as a value. I also don’t think this applies to many major forms of harm; I, for one, have never congratulated myself for not severely beating people up in the subway, or for not having any desire to do so.

The secular version of this theodicy is sort of a subtype of the greater good approach above. So what goods would we lose if, starting tomorrow morning, people were unable to significantly harm each other for no good reason? I’m not even sure this would reduce our overall autonomy. In a sense, a great deal of violence is already rooted in impaired agency – people rarely choose to lead a life of, say, gang violence or war, as long as there are reasonable and realistic alternatives (building a life of order out of such chaos is extremely difficult but people still tend to prefer to attempt this when given a chance, whereas choosing a life of absolute chaos when living comfortably is extremely easy, yet few people choose to do so). Of course, there are disagreements about what kinds of suffering you are justified to cause as a necessity to preserve e.g. your social autonomy, but again, the evil or suffering itself isn’t needed for you to be autonomous. (The concept of autonomy and genuine agency in a social environment running on human brains is, in any case, probably too muddled to provide anything useful here.)

Another shortcoming of this approach is that a lot of suffering is still caused by diseases and natural disasters; so maybe you inexplicably want people to be able to maim each other at will (though they should still be stopped, and also they belong in prison afterwards, let’s not be unreasonable here), but there’s tons of suffering besides human evil. This is also a counterargument to Alvin Plantinga’s free will defense.

But autonomy is often invoked as a justification for suffering in the other direction as well: since people tend to place some value on their past suffering, and a lot of it has very genuinely been valuable to them, someone wanting to reduce or abolish suffering threatens many of the things they currently find meaningful, or the struggles and choices they made to be able to get through it. I don’t see why this isn’t a reasonable justification for some hardships and pains: again, if there are painful things people generally are glad to go through, or if there is an apparent relationship between these things and positive outcomes later on, maybe these forms of suffering shouldn’t be eradicated; but maybe an alternative should still be offered for people who would rather choose not to go through them, you know, because of the autonomy stuff and all. Also, this is again not a plausible argument for intense suffering, or credible in the presence of burning children, as Rabbi Greenberg more eloquently put it. Also also, animal suffering is not properly justified by this theodicy any better than by the more general greater good approach above: even Darwin lamented the suffering of wild animals and found it irreconcilable with the concept of a benevolent God, and didn’t seem to glorify the freedom of wild creatures in the midst of it all.

The “Best of all possible worlds” approach

This theodicy is also pretty well-known, presented by Leibniz in the 1700s, and it’s pretty much exactly what it says on the tin – out of all possible universes, God chose the one with the best conditions and actualized it, and since he is obviously good and reasonable, everything’s basically fine by definition. Moving on without comment, a common secular analogy is rooted in the powerlessness of mankind: if there is no God, there’s also no way for anyone to directly make things better without the possibility of everything backfiring horribly. There may be terrible things going on in the world, but there’s no way we can help it – this is the best we can do.

The solution here, it seems to me, is to tirelessly gather more information and power, not shrug and turn your back to a world full of unimaginable distress you could at least help alleviate. I know, I know, there are massive coordination problems we haven’t really solved and fixing even most of the ways in which the world is bad currently looks like an intractable project, but at the same time everything is making some sort of progress and there are people doing a lot of good with whatever they’ve got and the change is slow but we’re making the intractable very tractable in surprising ways all the time. This entire theodicy is a lazy excuse mostly and y’all know it.

Minor theodicies, other directions, and conclusions

There are a lot of approaches in religious theodicy that aren’t really transferable to a secular framework, such as all the Original Sin stuff and the related karmic explanations, all of which mainly try to shift the responsibility onto us mortals – uninteresting now that we already accept it. There are also some justifications that are mostly just seen in secular contexts, such as wanting our experiences to be authentic or real in some usually poorly defined but intuitively natural sense, and thus wanting to retain distress almost as a terminal value because it’s part of the authentic human or animal #lifestyle. This is horrible and fundamentally incoherent with everything but I get it, there’s a chance that while carelessly getting rid of some traditional human stuff you throw away something valuable as well. None of these seem to fare better than the ones described above when asked to respond to all of the reasonable requirements.

What I’m interested in right now is suffering as a social motivator, though. As mentioned above, it’s plausible that the implicit fear of intense social distress is such a major part of human social dynamics that abolishing it or allowing it to become voluntary would change the way we have to approach human relationships and require us to strengthen other sources of emotional commitment. There are close social bonds without super notable suffering even when the bond eventually breaks, but at the same time, the most distressing events of a typical first world life are social losses of different kinds, and this might be something people will generally want to retain for complicated sentimental and social reasons. Again, this is not going to lead to a satisfying theodicy even if we only wanted a narrow, anthropocentric one, but I think the relationship between suffering and social bonds is worth investigating before the hypothetical future where abolitionism or dramatic reduction of suffering becomes feasible.

Anyway, I realize that most of the rejections above are based on pretty intuitive moral judgments about what an acceptable justification should look like, and some people will obviously find them more persuasive than I do. I would kind of like to do more research on the subject and write up a more rigorous analysis of it, though probably focusing on the secular justifications to an even greater extent, since a deeper understanding of the religious approaches doesn’t seem very useful after this point. But it seems like the reasons people so strongly oppose reducing suffering aren’t very well understood right now: many of the individual arguments are trivially kind of weak, but the discomfort remains. Clarifying this issue and some related concepts could be really useful in understanding human values.

On mind-reading

I feel that explicit communication of preferences and emotions is frequently a bit overrated as an ideal habit. Obviously, clear and open communication is invaluable in most intentional social situations, but it’s also a common (and less frequently addressed) failure mode to undervalue being understood effortlessly on an intuitive level, where you need to explicate as little as possible.

The subcultures I vaguely identify and interact with tend to be especially fond of explicit communication over mind-reading. This could be because many people roughly in this category (nerdy, analytic, thing-oriented) would seem to be somewhat below average at intuitively reading other people, which could make it more difficult to see how well mind-reading works when it works, and in some cases because empathy and related concepts are disvalued as a result of this (and even seen as fundamentally opposed to systemizing and rationality). Dichotomies such as the empathizing/systemizing divide in Baron-Cohen’s work on autism contribute to these attitudes, and I’m guessing it’s not implausible that there’s something to this divide in how the human brain works, but these thinking styles being inherently neurofunctionally antithetical to each other to the extent that empathizing should deserve its irrational reputation isn’t something I would bet a lot of money on (except possibly on the level of individual situations).

However, in many social environments I hang out in both online and in person, the culture has developed a firm appreciation of explicit communication while half-ignoring that explicit communication sometimes is actually genuinely worse than the nonverbal, gut-level understanding it enhances and replaces, that it certainly takes more effort from one or both of the parties in many situations, and that many people would probably benefit from cultivating and trusting their skills in intuitive empathy more than from being told that communicating every preference explicitly is the only good way to build and maintain healthy relationships (and expecting anything else is ridiculous and just causes silly problems to irrational people who expect some sort of magical mind-reading from others).

This doesn’t mean that all functional relationships require high levels of empathy, of course, and ideally the more empathetic people should of course accommodate those who require more verbal information about other people’s internal states. But in close relationships especially, you may run into a major compatibility issue where one person expects their intuitive signals to be understood because empathizing is a fundamental and important aspect of how they think, and the other person kind of scoffs at this and genuinely believes that the more empathetic party is demanding impossible, supernatural levels of mind-reading – again, because this is how their thinking generally, kind of fundamentally works. And this may not always be solved just by increasing explicit communication, because it in turn will quickly exhaust the person who possibly has spent most of their life not needing to describe their basic emotions and preferences to other people, and this is a form of labor that really really drains their energy. (I have on a few occasions been super exhausted by people who have wanted to have this great and healthy explicit communication thing with me, and I haven’t seen what the root of the problem was until years later, because of course explicit communication in every situation is the most important mark of a healthy relationship, and it would be silly to expect anyone to read my mind, right?)

In conclusion, the way discussing every issue explicitly is valued over everything else prevents many people from seeing that a close relationship they are trying to build with someone might just never work as well as it would with someone else because of this difference. Lots of explicit communication is not always a sign that your relationship is great or even functional. It isn’t what’s valuable in itself; being able and willing to respect each other’s preferences is. Lacking this, looking at the relationship and going “yup, gotta increase verbal communication” is sometimes a patch to fix something that wouldn’t have to be broken in the first place. Similarly, trying to improve your empathy levels to fix this may also not work out, depending on the extent to which empathy is part of your congenital personality (and I’m sure many (most?) subcultures also demand exhausting accommodations from the people who would prefer very explicit emotional sharing – it’s just not something I run into as often as I see the anti-empathy sentiment described here). I’m not sure I have a good solution at hand, but respecting other thinking styles and even trying them out to the extent that you can will probably not hurt, as unsatisfying and insufficient as it sounds.