Introduction
No one has a convincing estimate of when the first artificial general intelligence that effectively matches human cognition will be developed, or of whether and how quickly such an entity could be improved, or allowed to improve itself, into an optimizer that humans can no longer control, modify, or generally prevent from satisfying its potentially strange initial goals. Given the massive gap between the narrow AIs that now beat humans in a steadily increasing number of cognitive environments (though mostly using data-hungry learning approaches that seem fundamentally ill-suited to the complexity and variability of real-world behaviour) and general intelligences capable of meaningfully interpreting and influencing the world, I don't think we would be justified in expecting a defining breakthrough literally any day now. Still, many machine learning experts, when asked when they expect the first AGIs to reach human-level abilities in the relevant cognitive dimensions, give estimates in the ballpark of a few decades. At the same time, the inconsistencies in their predictions (such as huge differences depending on how a given question is phrased) suggest that their responses are based more on vague intuitions than on reliable evidence. So the level of risk awareness and the safety precautions taken in the currently active AGI research and development groups could already define a great deal of the trajectory these events will follow.
With this in mind, it's surprising how little attention is being paid to the state of safety measures in ongoing projects. It's true that Google's DeepMind, the apparent leader in the field right now, has expressed concerns about the possibility of unpredictable accidents resulting from advanced machine learning, and has also published some research on the matter. OpenAI, the other major team drawing ample media attention to their achievements lately, has likewise always been explicitly concerned about the large-scale disasters a misaligned AGI could bring about. However, many other groups are also actively seeking to build a general intelligence. These projects are typically smaller, but they are numerous and cover a pretty wide variety of approaches, some of which are still underexplored, so it is not wholly implausible that one of them will eventually reach a decisive milestone first. The likelihood of this is further increased when detailed research results from the trailblazing projects are available to other groups, lowering the cost of entering and keeping up with the game. Also, even explicit safety concerns can be insufficient: it's crucial that work on safety is carried out continuously alongside new advances in AI.
The good news is, Seth Baum from the Global Catastrophic Risk Institute recently announced a survey aiming to characterize all ongoing AGI R&D projects, assessing them based on a variety of attributes directly or indirectly related to safety, such as size, nationality, stated goals, and explicit interest in AGI risk. The classification is based on publicly available data from a multitude of sources and includes 45 active projects mostly in academic and corporate institutions, 23 of which are located in the US. (Due to the massive amount of deep learning research going on right now, the survey didn’t take into account DL projects with no explicit intentions to develop a general intelligence, even though it has been suggested that powerful DL algorithms initially designed for narrower purposes could suffice to form an effective AGI architecture.)
The working paper is available here; my summary will not be very detailed, since the results are presented very concisely in the paper itself (pages 17-31). Baum's paper mostly sticks to explicitly stated information and is thus pretty light on interpretation, so in this post I'm going to use it as a primary source and pointer, but spend most of my time expanding on its findings using other material or (hopefully not entirely unfounded) guesswork. I will focus on two key questions I think will have the largest impact: the competitiveness of the research and development process, and the interest the groups have in ensuring their product is safe, i.e. value-aligned with or controlled by its human developers. Given the high uncertainty involved in AGI development and the numerous features of different research cultures that might influence its safety, I definitely don't expect to be correct about everything here — just bringing my knife to the generally confused gunfight that is AGI timeline prediction.
1. How competitive will the AGI race become?
A major fear in the AGI safety community is that safety concerns will be prematurely buried as soon as the exceptionally powerful first-mover advantage in this area becomes clearer to research groups with competing interests, especially once it starts to look like the defining breakthroughs are right around the corner. As Baum points out, a competitive environment coupled with low trust strengthens the basic game-theoretical worry that each group will maximize its own progress while worsening the expected outcome for everyone. Instead of shutting down and thoroughly revising every project that shows slightly worrying behaviour, even a very cautious group in an intensely competitive situation needs to consider whether a more careless project will get ahead of it if it slows down, i.e. whether it should keep going even in the face of some uncertainty in order to prevent an AI with potentially greater flaws from gaining a decisive advantage. Competition is also part of what makes simply putting a strict limit on an AGI's influence such an inadequate strategy: in a world where superintelligences become feasible, a singleton sovereign could well be the only stable outcome, and a limited, probably friendly AGI won't become one if a riskier but clearly more effective candidate is available.
The overall competitiveness of the development process is determined by a myriad of factors — a few candidates (such as the number of groups, their relative capabilities, and the information they have about other projects) have previously been identified by Armstrong et al. in a very simple game-theoretic model, but like everything related to social psychology, they are generally hard to pin down and interact in unknown ways. As a rule of thumb, it should nevertheless be better to have fewer serious projects going on, ideally with similar values and dissimilar capabilities, which makes it relatively cheaper for each group to invest in safety precautions. Armstrong's model also suggests that it is always safer if no group has information about the others' capabilities; as the authors note, in the actual world this is complicated by the need to establish trust and safety collaboration between different groups.
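As a rough illustration of why fewer competitors should mean less risk, here is a toy Monte Carlo sketch in the spirit of the Armstrong et al. race model, though much cruder than their actual formulation: every parameter below (skimp_prob, skimp_bonus, enmity) is made up for illustration, and the information effects the paper analyzes are not modelled at all.

```python
import random

def disaster_rate(n_teams, n_trials=50_000, skimp_prob=0.5,
                  skimp_bonus=0.2, enmity=0.5):
    """Toy race model (my own simplification, not Armstrong et al.'s):
    each team independently either invests in safety or skimps on it;
    skimping adds a capability bonus, and if a skimping team builds the
    first AGI, a disaster happens with probability `enmity`."""
    disasters = 0
    for _ in range(n_trials):
        best_score, winner_skimped = -1.0, False
        for _ in range(n_teams):
            skimped = random.random() < skimp_prob
            score = random.random() + (skimp_bonus if skimped else 0.0)
            if score > best_score:
                best_score, winner_skimped = score, skimped
        if winner_skimped and random.random() < enmity:
            disasters += 1
    return disasters / n_trials

for n in (1, 2, 5, 10, 20):
    print(f"{n:2d} teams -> disaster rate ~{disaster_rate(n):.3f}")
```

The disaster rate climbs with the number of teams for a simple reason: the more competitors there are, the likelier it is that the eventual winner is one of the teams that cut corners, which is the basic race-to-the-bottom dynamic the model formalizes.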
1.1. How many serious competitors will there be?
While the number of ongoing projects found by Baum is pretty high, most of them are small or medium-sized with three exceptions: DeepMind (the Google group behind AlphaGo and more recently AlphaZero), OpenAI (the project launched by Musk and Altman explicitly to mitigate the catastrophic societal risks of AGI) and the Human Brain Project (a major EU project aiming among other things to simulate the human brain). As a vague definition, a serious competitor is one that influences the behaviour of other groups that roughly understand its work or is seen by most of them as a relevant rival; everyone would describe DeepMind as influential in this sense, probably not yet pay a lot of attention to most of the medium-sized projects going on right now, and definitely not lose sleep over my chatbot even if I went around telling everyone I’m absolutely trying to build an AGI for reals (but maybe if I make one that quotes Gödel, Escher, Bach…).
Baum’s methodology uses group size as a proxy for capability. The task of going through each project and rating their actual potential would indeed be considerable and yield wildly uncertain results even if done by someone more immersed in machine learning than me, but there are some general impressions I have that feel a bit more accurate than relying solely on project size. It is obvious that DeepMind and OpenAI, especially the former, have the most impressive track records in solving recent machine learning challenges that the general public finds interesting. Whether or not this will in itself mean something substantial on the path to genuine AGI, due to their high profile they also seem like the most attractive career options for talented researchers. I’m assuming that progress in this field is limited more by exceptional talent than other resources, so a positive feedback loop in their future capability is likely.
But which other projects could challenge them? An obvious suspect is the Human Brain Project, the third project categorized as large. Despite its size and considerable resources, though, the HBP has other research interests apart from brain simulation, and seems to have shifted its focus towards better neuroinformatics tools after a pretty rough start featuring heavy external and internal criticism and no convincing plans for reaching its initial goal of a simulated, intelligent whole brain. I'm also skeptical in general about the potential of brain simulations as a road to AGI. I could not find an unambiguous definition of what the project counts as a human brain sim — the term has often referred to low-level modelling of neural cells or even subcellular processes functionally interacting with each other roughly as their biological counterparts do, which can plausibly be expected to bring about intelligent behaviour even with no clear account of how its higher-level cognition is supposed to work. (This was also the initial approach of the HBP's predecessor and sister project, the Blue Brain Project, best known for modelling the connectome of a single rat neocortical column on the Blue Gene supercomputer in 2006. I don't know what said neocortical column is up to these days, so I'm guessing significant advances haven't followed yet.)
Such a cell-level simulation encompassing a whole brain, dreamed of by many a philosopher of mind, would be massively too costly for us to realize, and would indefinitely remain wasteful due to the translation costs between different substrates, even if we knew how individual neurons work and interact with each other well enough to try (check out the research on C. elegans neuron-level sims, and remember that this is a system of a mere 302 neurons). On the other hand, to model the interactions of larger brain areas in less resource-intensive and more modifiable abstractions, you need far more detailed and accurate knowledge about those interactions than we currently have, if you want to replicate a brain's cognitive functions and not just build entirely toothless approximations. In neurobiology, especially as it relates to cognition, we don't really have precise control over the variables we can mess with, nor do we understand the system in some neat non-chaotic sense we could build something solid on. We have a replication crisis just like everyone else and ten thousand well-funded headaches.
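To put some rough numbers on "massively too costly": here is a back-of-the-envelope sketch with my own assumptions (the neuron and synapse counts are standard textbook figures, the per-synapse cost is loudly optimistic), not anything taken from the HBP's plans.

```python
# Crude Fermi estimate of the compute needed for a cell-level whole-brain sim.
# All parameters are assumptions for illustration, chosen to be charitable.
neurons          = 8.6e10   # ~86 billion neurons in a human brain
synapses_per     = 1e4      # order-of-magnitude synapses per neuron
updates_per_sec  = 1e3      # ~1 ms timestep for a simple spiking model
flops_per_update = 10       # very optimistic cost per synaptic update

total_flops = neurons * synapses_per * updates_per_sec * flops_per_update
print(f"~{total_flops:.1e} FLOP/s just for synaptic updates")  # roughly 1e19 FLOP/s
```

Even with these charitable numbers the synaptic updates alone land close to 10^19 FLOP/s, and detailed compartmental or subcellular models multiply that by several orders of magnitude, which is the sense in which the translation cost between substrates never really goes away.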
There are several other projects that also approach the issue mostly through low-level modelling of human cognition, and my suspicions about inefficiency when compared to architectures that scrap human cognition almost entirely extend to them as well. The gains in AI capability are likely to be overwhelmingly based on algorithmic redesign instead of replicating human biology with increasing processing power: despite their value as basic research tools for cognition and neurophysiology, brain sims loyal to biology have next to no track record or evident medium-term promise in producing intelligent behaviour, while inhuman algorithms elsewhere show faster progress in narrow but still considerable tasks and maybe even plausible paths to increasing generality. If machine learning approaches fail to generalize and accommodate more complex motivational systems in a couple of decades with no solutions in sight, the slow and steady advance of systems based on neuroscience may become relevant again, but I would not bet on them right now or count them as serious competitors to inhuman machine learning when it comes to AGI.
In addition to the strictly neuro-based projects, there are some smaller groups I would quite confidently dismiss, as well as some groups that have been doing their thing for a couple or even several decades based on one cog-sci theory or another but with slim (or clearly only narrowly intelligent) results and few apparent ways to improve. Most groups don’t go into a lot of detail about their research in public, though, and it can definitely not be ruled out e.g. that some old established projects with sufficiently functional theoretical frameworks will be able to benefit from modern machine learning in surprising ways. This, combined with the rest of the major tech corporations also investing in AGI lately, makes it reasonable to expect at least a couple of the current projects to become serious competitors to DeepMind and OpenAI.
What's more, unless AGI projects are remarkably short-lived, their number is also rising steadily at the moment: out of the 45 active projects, 5 launched in 2017, 4 in 2016, and 5 in 2015. So roughly a third of the existing projects have started during the past three years, and by the end of this year we may have five or so more. While not all of them will be capable, of course, the chance of another DeepMind grows quite quickly if there is no reason to assume the trend will die down in the near future. The initial resource limitations of smaller, competent teams with interesting approaches may also be overcome if larger companies take them under their wings, which is what happened to DeepMind and could easily happen to other similarly prominent groups. (Facebook was also interested in buying DeepMind shortly before it was acquired by Google — maybe they luck out next time. Gulping down young but exceptional groups sounds like an effective way to acquire more talent, which makes it easier for the demonstrably best projects to get the resources they need soon enough.)
Another factor that could lead to a larger number of serious competitors in the future is that safety precautions are, again, bound to slow down a group's progress, so smaller projects could see the costly risk avoidance shown by the leading groups as a miscalculation and an exploitable weakness that lets them get into the game anyway. Some talented researchers restricted by safety concerns they happen to deem unreasonable may choose to search for a more liberal research environment, which the market is likely to eventually provide. Other prestigious corporations could probably match the attractiveness of the two current leaders when it comes to recruiting exceptional researchers, and could even top it by promising less of that avoiding-planetary-disasters jazz and more exciting opportunities once promising paths to AGI start to become clearer. Microsoft, Baidu, and Facebook all have projects devoted to AGI that currently seem to pay no specific attention to control and value alignment, which they could start using as a selling point to recruit researchers who lean towards skepticism on rogue superintelligences. So could smaller projects, of course – while the capacity and resources they provide might not on their own be interesting enough to attract tons of talent, building an exceptionally free, risk-seeking culture could conceivably be. Please don't actually do this.
1.2. How motivated will the groups be to compete?
Baum notes that many of the projects have convergent values: goals that are best described as humanitarian or intellectual are clearly dominant among the stated missions. This consensus of benevolent goals is nice but partially illusory — it’s not like most institutes can costlessly just say they don’t know or aren’t actually super interested in what their inventions can be used for. Since the aim of the survey is to collect public, explicitly stated information, Baum takes these statements at face value in his coding, but notes that they may not reflect the actual missions of these projects. For instance, the largely humanitarian image of corporate projects could simply be a reflection of what motivations are seen as acceptable in the eyes of the general public, so little more than a marketing gimmick. I agree with this caveat, and in fact worry that it is necessarily underemphasized in this paper due to its inherently trusting methodology.
Another thing I'm wondering about is how much such roughly convergent goals would ultimately even reduce competition, or in other words, how much of a group's motivation to compete is actually rooted in its high-level mission. Some of it certainly is, especially since the mission plays a part in selecting the people who end up working on a given project, but there are still going to be layers upon layers of personal motivations under the endorsed values of any institute. Most saliently, only the winners will end up with the considerable amounts of fame, fortune, and all that sweet going-down-in-history stuff available to whichever team ends up creating the first AGI, at least for the two weeks we have after that before it's weird-runaway-optimization-process-swallowing-everything-we-know o'clock. So the idea that similar utility functions reduce competition is legitimate, but even if we assume that every group is being honest, mostly ignore obvious implicit goals like monetary profit, and agree on what things like "improving the world" look like, I would not interpret agreement about group-level goals as implying identical or even closely related utility functions.
A better sign is the interconnectedness of the different groups, which means that the social gains involved can be distributed a bit more evenly across the community regardless of which group eventually makes it. Baum notes that many organizations around the world share contributors, advisors, or parent organizations, which both reduces competition and makes it easier to increase attention to risks, as safety awareness can spread more easily in a tighter network.
Of course, collaboration can also accelerate the development itself by reducing redundant work and putting every group in a better position to assess which ideas are good. This is worrying, but still probably better than acceleration resulting from a more competitive situation, particularly if the collaboration happens in a context that emphasizes transparency for the sake of safety. This may just be a tradeoff we have to accept. For instance, it looks like Apple — apparently lagging behind in AI development and talent acquisition due to its secretive culture — was sufficiently attracted by greater cooperation opportunities last year to join the Partnership on AI, a consortium founded by rivalling Western tech giants to ensure that AI has positive societal outcomes. While the Partnership is mostly focused on more predictable concerns related to narrow AIs, such as consumer privacy, it also includes groups such as Oxford's Future of Humanity Institute, the home of academic AGI doom 'n' gloom. The Partnership hasn't been very active so far, so I guess we will have to wait and see what this amounts to, but the main takeaway is simply that the need to collaborate will also make it easier to coordinate responsible research.
In sum, the situation right now seems somewhat less competitive than what I expected before reading the paper, and there are some promising avenues for increasing safety collaboration: as the case of Apple and the Partnership on AI shows, there may be ways for the leading groups to modify the incentive landscape for the rest of the projects in a less competitive direction. Still, it would be naive to expect a perfectly nice collaborative environment to prevail indefinitely on its own as we get closer to actual AGIs. Perhaps isolation is expensive compared to collaboration right now, when there is still a huge and unknown amount of foundational work to do and dead ends to check out, but eventually, once the goals get more specific and getting ahead of everyone else is less likely to be just a temporary victory, the apparent relative payoff of secrecy and competitiveness will grow. I don't know how many people in the AI safety community are primarily working on finding ways to influence these indirect factors in AGI development, and perhaps it isn't the best use of resources right now, but it is probably more feasible to prevent the formation of a hypercompetitive culture than to tear one down once it has already become established.
1.3. Military connections
Considering the geopolitical implications of a strong military AGI, Baum also looks at the military connections of the existing R&D groups. I'll add some comments about this issue in this section, because in addition to their inherently, uh, hostile nature, military projects seem pretty likely to become intensely competitive if two or more rivalling nations decide to invest in them. A Space Race style era of overt competition and fast-paced technological innovation could prove catastrophic if the aim is a massively effective general intelligence. Such an environment would generally be volatile, increase the likelihood of a fast takeoff instead of gradual, controlled development, and incentivize equipping the product with a great deal of concrete power as soon as possible. It also seems very likely that the culture in military R&D groups would be dangerously disconnected from the rest of the field, with its social connections and potential for collaboration on safety.
The survey identified nine (mostly academic) projects with military funding, eight of which are located in the US. Only four explicitly had no such connections, and 32 projects were listed as unspecified. Baum acknowledges the possibility of covert projects, but finds no direct reasons to worry about the issue at the moment:
“– the modest nature of the military connections of projects identified in this survey suggests that there may be no major military AGI projects at this time — the projects identified with military connections are generally small and focused on mundane (by military standards) tactical issues, not grand ambitions of global conquest. This makes it likely that there are not any more ambitious secret military AGI projects at this time.”
From what I can tell, the discussion about AI safety in military contexts right now is also mostly about narrow AIs in control of autonomous weapon systems or other pretty specific tasks. This is reasonable, considering that such systems are right around the corner even with existing technology, that their worst-case consequences could be practically irreversible, and that slowing down narrow AI development in the military could also push military AGIs further into the future. However, while the harms from autonomous weapons and the dramatically reduced cost of violence they bring about could be immense both in international conflicts and in domestic control, these scenarios would still typically be survivable for human values and allow for the eventual recovery of the afflicted societies. This may not be the case when we're talking about superintelligences, so while they are certainly a more speculative threat than killer drones, military AGIs are a distinct hazard that needs explicit attention.
While Baum's scarce evidence of serious military projects is reassuring, there are some reasons to believe that this is going to change in the future. Interest in increasingly sophisticated military AIs — which is obviously growing among all major world powers — is likely to gradually shift towards more generally intelligent systems, because that's a rational and appealing trajectory once such systems start to seem possible at all. For example, this quite recent CNAS report based on translations of Chinese documents mentions some explicit "singularity" thinking in the Chinese military (PLA), on top of the already very general-sounding tasks their recently boosted AI strategy aims for. The report is also concerned about the PLA's willingness to relinquish control of their AIs to gain an advantage:
“– the PLA’s speculation on the potential of a singularity in warfare does raise the question of whether the U.S. emphasis on human intuition and ingenuity might be appropriate for the immediate future but perhaps infeasible for aspects of future warfare that may occur at machine speed. There inevitably will be contexts in which keeping a human fully in the loop becomes a liability, and the type and degree of ‘meaningful’ or supervisory human control that is feasible or appropriate will remain a critical issue.”
Of course, interest does not yet mean serious effort, a coherent strategy, or quickly scaling capability. The attractiveness of such military projects to a sufficient pool of talented researchers is unclear to me, for one, and talk is cheap. But other world powers eventually escalating with similar plans of increasingly general military intelligences is a quite realistic scenario, and it could be the beginning of a death spiral I’m not looking forward to, particularly in light of how much military research has historically accelerated other technologies.
(For obvious infohazard reasons I considered not mentioning the report above, but I expect this blog to reach a vastly larger number of people in AGI safety than people relevant to military activity. The latter are also likely to get the information soon enough from other sources since the report is in no way obscure, so the value of slightly increasing the knowledge the safety community can work with is probably higher than that of ignoring it. Don’t tell anyone else though unless they know the secret AGI safety handshake.)
It has previously been suggested that military projects could in fact be safer than a larger number of private organizations working on AGI with potentially shoddy research policies. However, coordinating safe research between the non-military groups we’re now looking at seems more feasible than between hypothetical rivalling government projects, so on balance I suspect that military AGI should indeed be discouraged if possible. Not knowing that much about the history of warfare, though, I don’t have any interesting ideas about this apart from the usual platitudes of world peace being, by and large, a good idea.
How similar is this problem to the threat of nuclear weapons? There are some analogies between nuclear warfare and a hastily developed military superintelligence, such as the irreversible horribleness of the worst-case scenarios involved and the difficulty of international regulation. However, AGIs may be harder to control even during development and testing, so even a successful ban merely on using them isn't going to be sufficient, and they also have the potential to directly and massively improve the global future (unlike nuclear weapons, arguably), which means that regulation attempts will likely become even more complicated, i.e. ineffective, than nuclear disarmament plans have historically been. Effective countermeasures will perhaps involve indirect means like getting more cultures and researchers to personally understand the hazards, if we have to accept that international military research policies have a questionable track record and may be even less applicable to this problem than to other research. If we assume exceptional talent is scarce, it could also counterintuitively be good to have some soft competition between corporations, as this would encourage them to attract more researchers and so leave government projects with less talent.
In conclusion, military AGIs make a lot of sense, are built following different incentives and cultures than the rest of AGI R&D, have recently been hinted at in practice, will probably be funded quite well, are at most as easy to regulate as the last doomsday invention we totally only survived because of anthropic selection, and form one of the worst race-to-the-bottom scenarios we can imagine. Just because they don’t super exist yet doesn’t mean today isn’t a perfect day to start vaguely worrying about them.
2. How much do current AGI projects care about safety?
Probably the most important question explored in the paper is the extent to which ongoing projects are concerned about the risks associated with a poorly controlled or misaligned AGI. Low competitiveness won't help that much if the winner or winners are still not interested in safety precautions, after all. (Though a longer development process makes even uninterested groups more likely to become interested in safety issues, as they have more time to notice and analyze the ways in which things go wrong, of course.) Baum categorizes each project as active, moderately active, dismissive, or unspecified concerning safety: it turns out that the vast majority of groups have not expressed any concern about the major risks an inhumanly capable AGI might involve.
The good news is that, as mentioned above, both of the groups that seem like the strongest candidates to lead AGI development are clearly interested in the safety of their products — particularly OpenAI, of course, which was explicitly founded as a response to these worries. In this section, I'll start with a quick look at the concerns these two groups have, and then outline the safety concerns among the rest of the projects, trying to figure out what the interested groups have in common and how successful the AGI safety community has been so far at getting its message through.
2.1. What do the safety concerns of the leading R&D groups look like?
While most people agree that increasingly general artificial intelligence will change things dramatically and potentially for the worse, what they typically think of is not a technical malfunction in reward optimization combined with capability gains, but things like sophisticated citizen surveillance bringing about a dystopian society, automation rendering humans purposeless, or intelligent systems otherwise diminishing the meaning and quality of the average human life in unpredictable ways because the people using them are careless or simply have bad intentions. In addition to distinguishing between how near-term narrow AIs might change our lives and how a superintelligent AGI might, we should also separate the hazards stemming from people using a superintelligence in harmful ways from the risk of technical control failures. An uncontrolled superintelligence is not a special case of AGIs facilitating short-sighted or selfish societal planning; it's as close to that as a devastating natural disaster is. The methods and the people equipped to handle these problems are so different that it doesn't make a lot of sense to blur them into the same conversation, as the results from social science and ethics will have next to no bearing on the control problem, and vice versa. Both, however, include disastrous and pretty much irreversible scenarios that require active attention.
Like those of the general public, the overwhelming majority of safety concerns expressed by the research groups fall into the social ethics category. This asymmetry is very understandable — we have plenty of examples of smaller technological innovations unpredictably changing society because people choose to use them in malicious or weird ways, but no strong examples of harm caused by uncontrolled technology that itself resembles an agent rather than just a tool. Control issues seem like an outlandish concern to most people, which might also lead researchers to appear less worried about superintelligent agents, at least in public, and to instead demonstrate the need for careful progress through less theoretical examples related to societal change (even if they personally are worried about technical accidents as well).
Perhaps exemplifying this, the ethics and security board DeepMind demanded as part of its deal with Google is rumoured to be quite focused on technical hazards, though it is notoriously secretive – while the recently founded, more public-facing ethics team, DeepMind Ethics & Society, is true to its name and works on the social and economic changes related to DeepMind's research. (They do also mention control problems, collaborate with Nick Bostrom, etc., it's just clearly not their main interest.) Assuming that whatever the internal board does is indeed mostly focused on technical risks, a dual strategy where societal issues are discussed out in the open while technical safety matters remain internal seems like a pretty good approach to me, especially if there is still collaboration on technical safety with other groups.
OpenAI's idea was initially very different from this. As their name implies, the team started out with the aim of publishing most of their research in order to democratize AGI development, and so prevent it from optimizing for values that only a few people would agree with. This garnered criticism from Bostrom and other safety researchers, who essentially pointed out that giving everyone the ability to develop AGI is less like voting and more like everyone having access to nuclear weapons. They still have a lot of resources publicly available, but apparently they're now reassessing the optimal level of openness – what this means is not clear yet, but it is at least good to see they're willing to reconsider potentially harmful plans based on external feedback.
As far as existing work goes, both DeepMind and OpenAI have recently published articles on technical safety as a concern distinct from societal issues like fairness and privacy: for instance, their joint paper Deep Reinforcement Learning from Human Preferences discusses communicating complex goals to RL agents by involving a human in the learning process, Safely Interruptible Agents by DeepMind and MIRI/FHI develops the theoretical foundations for preventing interruption avoidance in reinforcement learners, Concrete Problems in AI Safety features authors from Google Brain and OpenAI discussing potential accidents during learning and goal attainment, and DeepMind's AI Safety Gridworlds delves further into practical examples of the properties needed to ensure the safety of intelligent agents.
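To make the first of these a bit more concrete, here is a minimal sketch of the reward-modelling idea behind Deep Reinforcement Learning from Human Preferences: fit a reward function to pairwise comparisons of trajectory segments using a Bradley-Terry likelihood. This is only the preference-fitting half of the method; the deep networks, the RL loop, and the actual human are replaced with toy stand-ins of my own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each trajectory segment is summarised by a feature vector, and a
# simulated "human" (a hidden true reward) says which of two segments it
# prefers. We fit a linear reward model by maximizing the Bradley-Terry
# likelihood over these comparisons, as in the preference-learning step of
# Christiano et al. (2017); everything else about their method is omitted.
dim, n_pairs = 5, 2000
true_w = rng.normal(size=dim)                   # stand-in for human preferences
seg_a = rng.normal(size=(n_pairs, dim))         # summed features of segment A
seg_b = rng.normal(size=(n_pairs, dim))         # summed features of segment B
prefs = (seg_a @ true_w > seg_b @ true_w).astype(float)   # 1 if A is preferred

w, lr = np.zeros(dim), 0.05
for _ in range(500):
    # P(A preferred) under the Bradley-Terry model on predicted segment returns
    p_a = 1.0 / (1.0 + np.exp(-(seg_a - seg_b) @ w))
    grad = (seg_a - seg_b).T @ (prefs - p_a) / n_pairs
    w += lr * grad                              # gradient ascent on log-likelihood

print("correlation with the true reward weights:",
      round(float(np.corrcoef(w, true_w)[0, 1]), 3))
```

In the actual paper the learned reward is then optimized by a standard deep RL algorithm while the human keeps labelling freshly generated segment pairs, so the reward model and the policy improve together.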
What is the significance of these publications? We don't know how much work on safety is enough, but I'm pretty sure it's going to be more than a few papers of similar impact per year; the low ratio of AI safety research to innovative AI publications doesn't match the magnitude of the risk. But I think there are reasons these papers warrant some optimism, and they are mostly indirect: they show us and everyone else working on AGI that top-tier researchers view the hazards as an acceptable thing to address instead of dismissing them for their apparent absurdity. This is crucial, since to competent observers working with increasingly dangerous AIs, the need for thorough safety precautions should only become clearer with time, as researchers get to observe all the concrete but counterintuitive ways in which their not-yet-general inventions keep going wrong. Right now it may sound laughable even to a somewhat cautious researcher that an AGI could really, actually, say, pretend to share our values until it has embedded itself into the most crucial components of our global infrastructure and only then let us realize that its terminal goal was dat sweet paperclip lightcone all along. But after observing AIs that learn similarly deceptive behaviour in complex game environments, while they aren't yet quite clever enough to get away with it, even a skeptical researcher would probably reconsider their default optimism – as long as they aren't too prejudiced or too deafened by public opinion to notice what's going on. (I'm guessing this would also be a required component in making the general public take AGI safety seriously: concrete examples of things like reward hacking and treacherous turns in increasingly complex environments are almost certainly going to be more convincing than even perfectly good abstract arguments with little empirical salience; more on this below.)
2.2. How interested in AI safety are research groups in general, and how well has the AI safety movement fared?
Considering that AI worries of the LessWrong flavour have been around as an increasingly coherent movement for a couple of decades, it is a bit disheartening to see how few of the research groups have addressed them at all: 15 out of 45 projects, three of which only moderately. In addition to the clusters identified by Baum (academic projects with intellectualist goals inactive on safety, corporate projects with humanitarian goals active on safety), I tried to find some traits that could help predict a group's stance. For instance, newer projects seem to be a bit more active: out of the 24 projects started since 2010, more than half were listed as interested in safety.* Maybe some of the recent projects, just like OpenAI, have actually been launched because of the increasing concern about AI risk. In the general case, i.e. unless you're Elon Musk, this is probably not a good approach due to the acceleration of competition it could cause, but I don't expect it to be super common either. Another explanation is that older projects are more likely to be calcified and less reflective regarding the ultimate impact of their work, so a culture shift such as AGI safety entering the mainstream can more easily pass them by, whereas new projects are more directly formed and influenced by the changed culture.
There seems to be a small difference between projects in the US and projects elsewhere. Out of the 23 US groups, only six expressed explicit interest in safety measures at all (two of them only moderately), and two — Numenta, based on Jeff Hawkins's somewhat pop-sciencey framework described in On Intelligence, and Victor, which I'm not familiar with — openly dismiss major risks, calling them "not a threat to humanity" and "crazy talk", respectively. Out of the 22 projects outside the US, nine have specified at least some safety concerns (one only moderately), and none has directly dismissed them. The difference is tiny for a sample size like this, so it probably doesn't mean much, but is there a plausible reason for projects in the US to be less worried about AI risk? It's not that these groups are less likely to have heard of the problem: if anything, the majority of LW-adjacent activity and AI safety organizations are located in the US, and the projects close to them — geographically and so probably culturally — could in fact be more frequently exposed to their arguments. Perhaps the groups outside of the US are more likely to encounter those arguments in a more academic and traditionally serious form? We already know that Bostrom's 2014 book Superintelligence has been quite effective at making the right people worried — but maybe a reader who associates it with the historically weird aspects of LW is predisposed to dismiss it, whereas someone who never experienced the less approachable argumentation style back in the day is more likely to give a chance to Bostrom's thorough explanations, which even carry Oxford's prestigious logo.
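For what it's worth, a quick significance check on the counts above (treating moderate interest as interest, matching the tallies) backs up the "probably doesn't mean much" reading; a minimal sketch:

```python
from scipy.stats import fisher_exact

# 2x2 table built from the counts in the text: rows are US / non-US projects,
# columns are (some stated safety interest, none specified).
table = [[6, 23 - 6],    # US: 6 of 23 projects
         [9, 22 - 9]]    # non-US: 9 of 22 projects
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio ~{odds_ratio:.2f}, two-sided p ~{p_value:.2f}")
```

The two-sided p-value comes out far above any conventional significance threshold, so with samples of this size the US/non-US gap really is indistinguishable from noise.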
That speculation would also be consistent with how loving to criticize LessWrong's PR strategy is a neglected human universal. I agree that the culture has probably been off-putting to many people (at least one project here is led by a person I recall getting into pretty intense fights about AI risk with LessWrongers), but as always, it's hard to compare against counterfactuals. Maybe a community with a more agreeable presentation could have leveraged similar initial resources for a better result, but then again, maybe the controversies and notoriously strange discussion norms were necessary to attract the attention of a sufficient pool of contributors whose work the academic publications can now build on. Post-Superintelligence, however, now that the cause has the attention of the general public and can more easily produce traditionally reputable or otherwise prestigious work affiliated with serious-looking institutions, doing so seems to be getting the best results.
(In addition to academic cred, what else did Superintelligence have that previous material on the subject didn’t? I think one of its merits was providing examples e.g. of genuinely surprising behaviour in actual algorithms — I remember being quite fascinated by this bit on evolvable hardware. To get a better picture, I browsed the Goodreads reviews of the book, but mostly found a confusing number of complaints about how dry, boring and overly abstract it is. On the flipside, the positive reviewers appreciated the detail and precision of Bostrom’s analysis, as well as the lack of sensationalism and the modesty he hedges his arguments with.)
All in all, particularly since the most prominent groups are on board, I'm actually tempted to say the safety movement has fared pretty well considering how far-fetched its central concerns sound to the average person. Again, a disaster scenario based only on extrapolated estimates of capability trends and a bunch of other equally intangible arguments we haven't seen at work empirically is not easy to convince people of, especially if it also matches a trope that is common but often poorly explained in fictional works. But if we assume that practically all groups have heard of the issue and that some of them also believe a disruptive superintelligence is possible in principle, why have they not addressed it? Here are some suggestions, ranked very roughly by how common I think they are:
• Confidence that the necessary precautions will be easy: many groups that have commented on the matter suggest something like this, so it's likely that some of the groups listed as unspecified also simply believe that safety is trivial. This belief can appear in conjunction with practical plans for safety measures, or just with an assumption that since AGI is hard and still pretty far away, the researchers talented enough to build it will also be talented enough to easily implement the necessary precautions or the optimal motivational systems, even if we can't think of them yet.
• Socially determined threat responses: as described in Yudkowsky’s recent essay, there is no obvious mechanism that makes it socially acceptable to just react to the risk in the current environment where everyone else seems so remarkably chill about it. (I’m very slightly more optimistic about this changing, since AI safety has been gaining a lot of traction lately, the largest groups are demonstrating that they don’t see the issue as crackpottery, and I believe we have more time than Yudkowsky probably does since I’m not quite as worried about the recent machine learning advances, but this is not an endorsement of inaction.)
• Intuitive humility: in particular, academic projects with no intention of developing their product into a massively useful general intelligence (rather than just a totally neat cognitive science research tool) might not see their project as dangerous or even all that impactful beyond intellectual curiosity. They're not trying to change the entire world with their product, at least not before decades of political negotiations that someone else will eventually take care of, and improvement above the human level may not be a goal these projects have in the first place. This stance neglects many important risks if the AGI still has a motivation system of some sort, which it probably will if it's going to be used for anything even just academically interesting.
• Impact neglect: sort of related to the idea above, but more specifically, a failure to intuitively feel how irreversible or massive the disasters caused by a rogue superintelligence could be. Many people agree that even weird technical accidents are possible, but place them in the same mental category as, say, small-scale international conflicts or even just regular industrial accidents where a dozen people are hurt but the situation is under control again in a few hours and we can learn from it and move on. But of course, any AGI worth its salt has the foresight to play nice for a while. It seems that if we have a slowly improving human-level intelligence with goals incompatible with ours, it's more likely to look perfectly benevolent and totally useful re world peace/perfect healthcare/post-scarcity economy until it has a definite advantage than it is to start messing with our stuff at any point where it can still be stopped.
• Secrecy or PR: Baum suggests that the survey might also understate the attention actually paid to safety among R&D groups, since groups in the unspecified category might just not be vocal about their concerns even though they are aware of the issues. This is possible since it makes sense for groups to want to avoid associations with dangerous scenarios, but sounds generally pretty irresponsible — an environment where each group can trust the others to pay attention to safety seems more desirable, and can only be built by active signalling of safety precautions. Taking AGI safety seriously probably entails collaboration on safety matters as well (though such collaboration could in principle happen without leaving public traces, but this sounds pretty inconvenient).
3. What does the survey miss?
Finally, one interesting question is the nature of the projects that a survey like this would miss. Baum points out that his number of active projects is a lower bound, not necessarily an accurate picture. It’s true that in addition to the incentives to misrepresent various less virtuous-sounding goals in order to attract funding, talent, and goodwill, there are also many conceivable reasons for a group to not reveal anything to the public about their project’s existence at all. Many of these reasons imply goals that people generally would be reluctant to accept, such as disagreeable corporate practices or hostile purposes. Also, secrecy is again to some extent in conflict with sufficient safety precautions, which usually include engaging with safety researchers, trying to influence the field in general in the direction of responsible research, and explicitly fostering a noncompetitive culture of trust and collaboration. Hidden R&D prioritizes other things, typically just a competitive advantage, at the expense of basic cooperative values.
But could there also be projects that hide their existence for altruistic reasons? This is possible, I guess — again, just the knowledge of a higher number of groups could increase competition, even if no detailed information about a given group’s approach were available to others. So, maybe there are some extremely safety-conscious groups that have decided to play it safe and not even mention their existence anywhere in order to avoid adding pressure on the field. How wise this approach is depends on how well such groups can internally ensure the safety of their project: in most cases, I expect the value of collaborating on safety research to exceed the value of staying quiet. This is also why I don’t think there could be many such projects. Still, out of politeness towards these commendably paranoid hypothetical groups, I’m going to stop speculating about them and just hope that they know what they’re doing.
Anyway, all things considered it seems to me that projects potentially missed by this survey would generally be less cooperative, humanitarian, and safety-conscious than the ones it found — it’s not likely that their nature is better than that of the public projects, and somewhat likely that it is worse. However, AGI is such a difficult field of research that isolated projects should be dramatically less able than public high-profile groups to attract enough talent or resources to warrant serious concerns. Considering also how few of the known groups have reacted to AGI risk with even moderate interest, there are better things to work on right now than worrying about potential hidden groups.
4. Conclusions
It looks like getting a higher percentage of R&D groups on board with safety concerns at all is a goal that is both pretty tractable right now and crucial to ensuring the long-term safety of AGI development. Outreach-type work is hard to do efficiently and indeed sometimes indistinguishable from just shoving the responsibility onto someone else's shoulders, but we're looking at a situation where the institutes working on AGI are still relatively cooperative, with surprisingly strong social and economic ties, alliances coordinating research policies are being formed, and direct AGI arms races such as major military projects may not have started yet. Still, there is a lot going on already, and a number of events likely in the near future could cause the interests of active R&D groups to diverge significantly and break the well-connected network we're seeing now, making it more effortful to ensure that groups care about adopting whatever safety measures anyone comes up with.
Increasing the chance that safety measures are developed alongside the actual projects, by the AGI groups themselves, will also quite straightforwardly advance these measures faster on the lowest level, or at least make it more likely that groups notice alarming situations and pause their projects until they have consulted others. This doesn't mean that people concerned about the risks shouldn't also work directly on the problems, of course, just that the social aspects of the issue are currently being neglected, and people with the relevant comparative advantages are unlikely to get better opportunities later on to influence the trajectory of AGI safety. If only a handful of all the projects globally accept the need for precautions, nothing guarantees that the direct work safety organizations currently do will ever be implemented when it is required in practice; if all the important groups do, however, the concrete work on safety will be connected to the progress of AGI research, and the higher-level principles involved will be more efficient to implement even if they're developed by outsiders.
As a more actionable ending to my mostly pretty vague post, here are some examples of more or less obvious tasks that I suspect could be valuable based on what we now know:
• Gathering and curating a library of examples of actual AIs showing unexpected behaviour and strange solutions, such as reward hacking and tactics that rely on deceiving humans, since a growing body of empirical evidence will probably feel more salient than abstract arguments both to researchers and to the general public. Bonus points for associating it with a reputable institute, minus points for overly preachy vibes, exaggeration, etc.
• Looking more closely into why certain groups are interested in AGI risk while others aren’t, possibly contacting researchers in groups that have addressed AGI safety to figure out what convinced them. Then doing more of the things that did, if applicable. While governance of AGI research is valuable especially when international, the end results also largely depend on researchers themselves taking the risk seriously, since no policy will be able to cover every dangerous case.
• For anyone with relevant expertise in areas such as international cooperation, Chinese politics and machine learning cultures, or technology forecasting, the Future of Humanity Institute at Oxford just announced a program on AI governance and they’re looking for applicants. They say team members will have lots of freedom regarding hours, research areas, possibly even remote working opportunities.
• For one-shot ideas, GoodAI's General AI Challenge just opened a round looking for submissions on solving the AI race. It's open until May and welcomes anything from policy proposals to more general roadmaps or meta-level work, but with an emphasis on actionable strategies. Write a good thing and then submit the good thing; other people can learn from it.
• Making sure that organizations concerned with technical safety issues are also represented in whatever conferences and consortia we see discussing AI ethics and safety right now; while most of these will probably be pretty cosmetic, the alternative is isolation and missing out on potentially important connections which are being formed now.
• Regarding potential future military AGIs, too much explicit attention to the arms race aspect should probably be avoided and any sort of public awareness-raising campaign is almost certainly worse than ineffective, but indirect ways of slowing down the development can perhaps be found. There are activists and experts seeking to regulate various narrow AIs in warfare; supporting them might be worth it even for people who primarily care about AGI risks.
• If you’re Nick Bostrom’s parents, please call him and tell him you’re proud of him. This absolute legend sitting on every AI safety advisory board everywhere I look, and that’s after writing the book that caused some of the most serious people in the world to take action in order to prevent the end of the world. What a dude.
Footnotes
*These numbers are based on Baum's classification. I'm not sure I agree about the status of Susaro, the originally-US, currently-UK-based project which was labeled active on the safety front – Susaro currently has little information on its website, but is led by Richard Loosemore, who has quite clearly dismissed concerns about catastrophic AI risk in the past. His views seem to be based on the idea that any cognitive architecture capable of human-level intelligence or beyond will trivially be friendly or controllable due to necessary features of its motivational system design, so Susaro's attention to safety might ultimately refer to Loosemore's confidence that an actually smart AGI will necessarily also be smart enough to know what we want it to do, and safely just do it. Hopefully I'm wrong, though!
Baum also categorizes the Human Brain Project as not active on the safety front, since its ethics program is focused on the procedure and not the consequences of research. I did, however, find a small subproject, the Foresight Labs in the UK, focused on “identifying and evaluating the future impact of new knowledge and technologies generated by the HBP”. While not articulating what their specific worries are, the group is interested in new technologies leading to unpredictable and uncontrollable outcomes, and also mentions fears related to translating artificial intelligence research into practice (including human-level AI, though not AI exceeding human capabilities). This sounds like they could be responsive to worries about a misaligned brain sim, though currently their concern level is moderate at best.