Inside Google’s big AI shuffle — and how it plans to stay competitive, with Google DeepMind CEO Demis Hassabis

Today, I’m talking to Demis Hassabis, the CEO of Google DeepMind, the newly created division of Google responsible for AI efforts across the company. Google DeepMind is the result of an internal merger: Google acquired Demis’ DeepMind startup in 2014 and ran it as a separate company inside its parent company, Alphabet, while Google itself had an AI team called Google Brain.
Google has been showing off AI demos for years now, but with the explosion of ChatGPT and a renewed threat from Microsoft in search, Google and Alphabet CEO Sundar Pichai made the decision to bring DeepMind into Google itself earlier this year to create… Google DeepMind.
What’s interesting is that Google Brain and DeepMind were not necessarily compatible or even focused on the same things: DeepMind was famous for applying AI to things like games and protein-folding simulations. The AI that beat world champions at Go, the ancient board game? That was DeepMind’s AlphaGo. Meanwhile, Google Brain was more focused on what’s come to be the familiar generative AI toolset: large language models for chatbots, editing features in Google Photos, and so on. Merging the two meant a culture clash and a big structural decision, all with the goal of being more competitive and faster to market with AI products.
And the competition isn’t just OpenAI and Microsoft — you might have seen a memo from a Google engineer floating around the web recently claiming that Google has no competitive moat in AI because open-source models running on commodity hardware are rapidly evolving and catching up to the tools run by the giants. Demis confirmed that the memo was real but said it was part of Google’s debate culture, and he disagreed with it because he has other ideas about where Google’s competitive edge might come into play.
Of course, we also talked about AI risk and especially artificial general intelligence. Demis is not shy that his goal is building an AGI, and we talked through what risks and regulations should be in place and on what timeline. Demis recently signed onto a 22-word statement about AI risk with OpenAI’s Sam Altman and others that simply reads, “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” That’s pretty chill, but is that the real risk right now? Or is it just a distraction from other more tangible problems like AI replacing a bunch of labor in various creative industries? We also talked about the new kinds of labor AI is creating — armies of low-paid taskers classifying data in countries like Kenya and India in order to train AI systems. We just published a big feature on these taskers. I wanted to know if Demis thought these jobs were here to stay or just a temporary side effect of the AI boom.
This one really hits all the Decoder high points: there’s the big idea of AI, a lot of problems that come with it, an infinite array of complicated decisions to be made, and of course, a gigantic org chart decision in the middle of it all. Demis and I got pretty in the weeds, and I still don’t think we covered it all, so we’ll have to have him back soon.
Alright, Demis Hassabis, CEO of Google DeepMind. Here we go.
This transcript has been lightly edited for length and clarity.
Demis Hassabis, you are the CEO of Google DeepMind. Welcome to Decoder.
Thanks for having me.
I don’t think we have ever had a more perfect Decoder guest. There’s a big idea in AI. It comes with challenges and problems, and then, with you in particular, there’s a gigantic org chart move and a set of high-stakes decisions to be made. I am thrilled that you are here.
Glad to be here.
Let’s start with Google DeepMind itself. Google DeepMind is a new part of Google that is constructed from two existing parts of Google. There was Google Brain, which was the AI team we were familiar with as we covered Google, run by Jeff Dean. And there was DeepMind, which was your company that you founded. You sold it to Google in 2014. You were outside of Google. It was run as a separate company inside that holding company Alphabet structure until just now. Start at the very beginning. Why were DeepMind and Google Brain separate to begin with?
As you mentioned, we started DeepMind back in 2010, a long time ago now, especially in the age of AI, so that’s sort of like prehistory. Myself and the co-founders, coming from academia and seeing what was going on there — things like deep learning had just been invented — realized that a lot of great progress could be made with a focused effort on general learning systems, and also by taking some ideas from neuroscience and how the brain works. We were big proponents of reinforcement learning, and we could see GPUs and other hardware were coming online. So we put all those ingredients together back in 2010. We had this thesis that we’d make fast progress, and that’s what happened with our initial game systems. And then, we decided in 2014 to join forces with Google because we could see that a lot more compute was going to be needed. Obviously, Google had — and still has — the most compute in the world. That was the obvious home for us to be able to focus on pushing the research as fast as possible.
So you were acquired by Google, and then somewhere along the way, Google reoriented itself. It turned into Alphabet, and Google became a division of Alphabet. There are other divisions of Alphabet, and DeepMind was one of them, outside of Google proper. That’s just the part I want to focus on here at the beginning, because there was what Google was doing with Google Brain, which is a lot of LLM research. I recall, six years ago, Google was showing off LLMs at Google I/O, but DeepMind was focused on winning the game [Go] and protein folding, a very different kind of AI research wholly outside of Google. Why was that outside of Google? Why was that in Alphabet proper?
Part of the agreement when we were acquired was that we would pursue pushing forward research into general AI, or what’s sometimes called AGI: a system that, out of the box, can operate across a wide range of cognitive tasks and basically has all the cognitive capabilities that humans have.
And also using AI to accelerate scientific discovery — that’s one of my personal passions, and it explains projects like AlphaFold that I’m sure we’re going to get back to. But also, from the start of DeepMind, and actually prior to even DeepMind starting, I believed that games were a perfect testing and proving ground for developing AI algorithms efficiently and quickly: you can generate a lot of data, and the objective functions are very clear — obviously, winning games or maximizing the score. There were a lot of reasons to use games in the early days of AI research, and that was a big part of why we were so successful and why we were able to advance so quickly with things like AlphaGo, the program that beat the world champion at the ancient game of Go.
Those were all really important proof points for the whole field, really, that these general learning techniques would work. And of course we’ve done a lot of work on deep learning and neural networks as well. Our specialty, I suppose, was combining that with reinforcement learning to allow these systems to actively solve problems, make plans, and do things like win games. In terms of the differences, we always had that remit to push the research agenda and advance science. That was very much the focus we were given and very much the focus I wanted to have. The internal Google AI teams like Google Brain had slightly different remits, a bit closer to product and obviously to the rest of Google, infusing Google with amazing AI technology. And we also had an applied division that was introducing DeepMind technology into Google products, too. But the cultures were quite different, and the remits were quite different.
From the outside, the timeline looks like this: everyone’s been working on this for ages, we’ve all been talking about it for ages. It is a topic of conversation for a bunch of nerdy journalists like me, a bunch of researchers, we talk about it in the corner at Google events. Then ChatGPT is released, not even as a product. I don’t even think Sam [Altman] would call it a great product when it was released, but it was just released, and people could use it. And everyone freaked out, and Microsoft releases Bing based on ChatGPT, and the world goes upside down, and Google reacts by merging DeepMind and Google Brain. That’s what it looks like from the outside. Is that what it felt like from the inside?
That timeline is correct, but it’s not these direct consequences; it’s more indirect in a sense. Google and Alphabet have always run like this. They let many flowers bloom, and I think that’s always been the way Larry [Page] and Sergey [Brin] set up Google from the beginning. It served them very well, and it’s allowed them to organically create incredible things and become the amazing company it is today. On the research side, I think it’s very compatible with doing research, which is another reason we chose Google as our partner back in 2014. I felt they really understood what fundamental, blue sky, ambitious research was, and they were going to enable us to be super ambitious with our research. And you’ve seen the results of that, right?
“…AI has entered a new era.”
By any measure — AlphaGo, AlphaFold, more than 20 Nature and Science papers, and so on — we were able to deliver amazing cutting-edge research by all the normal metrics one would use. But in a way, what ChatGPT and the large models and the public reaction to that confirmed is that AI has entered a new era. And by the way, it was a little bit surprising for all of us at the coalface, including OpenAI, how viral that went, because we — and some other startups like Anthropic and OpenAI — all had these large language models, and they were roughly the same capabilities.
And so, it was surprising, not so much what the technology was because we all understood that, but the public’s appetite for that and obviously the buzz that generated. And I think that’s indicative of something we’ve all been feeling for the last, I would say, two, three years, which is these systems are reaching a level of maturity now and sophistication where it can really come out of the research phase and the lab and go into powering incredible next-generation products and experiences and also breakthroughs, things like AlphaFold directly being useful for biologists. And so, to me, this is just indicative of a new phase that AI is in of being practically useful to people in their everyday lives and actually being able to solve really hard real-world problems that really matter, not just the curiosities or fun, like games.
When you recognize that shift, then I think that necessitates a change in your approach as to how you’re approaching the research and how much focus you’re having on products and those kinds of things. And I think that’s what we all came to the realization of, which was: now was the time to streamline our AI efforts and focus them more. And the obvious conclusion of that was to do the merger.
I want to just stop there for one second and ask a philosophical question.
Sure.
It feels like the ChatGPT moment that led to this AI explosion this year was really rooted in the AI being able to do something that regular people could do. I want you to write me an email, I want you to write me a screenplay, and maybe the output of the LLM is a C+, but it’s still something I can do. People can see it. I want you to fill out the rest of this photo. That’s something people can imagine doing. Maybe they don’t have the skills to do it, but they can imagine doing it. All the previous AI demos that we have gotten, even yours, AlphaFold, you’re like, this is going to model all the proteins in the world.
But I can’t do that; a computer should do that. Even a microbiologist might think, “That is great. I’m very excited that a computer can do that because I’m just looking at how much time it would take us, and there’s no way we could ever do it.” “I want to beat the world champion at Go. I can’t do that. It’s like, fine. A computer can do that.”
There’s this turn where the computer is starting to do things I can do, and they’re not even necessarily the most complicated tasks. Read this webpage and deliver a summary of it to me. But that’s the thing that unlocked everyone’s brain. And I’m wondering why you think the industry didn’t see that turn coming because we’ve been very focused on these very difficult things that people couldn’t do, and it seems like what got everyone is when the computer started doing things people do all the time.
I think that analysis is correct. I think that is why the large language models have really entered the public consciousness: because it’s something the average person — the proverbial “Joe Public” — can actually understand and interact with. And, of course, language is core to human intelligence and our everyday lives. I think that does explain why chatbots specifically have gone viral in the way they have. Even though I would say — and of course I’d be biased in saying this — I think things like AlphaFold have actually had the most unequivocally beneficial effects of any AI on the world so far, because if you talk to biologists, there are a million researchers and medical researchers who have used AlphaFold now — I think that’s nearly every biologist in the world. Every Big Pharma company is using it to advance their drug discovery programs. I’ve had dozens of Nobel Prize-winner-level biologists and chemists talk to me about how they’re using AlphaFold.
So a certain set of all the world’s scientists, let’s say — they all know AlphaFold, and it’s affected and massively accelerated their important research work. But of course, the average person in the street doesn’t even know what proteins are or why they matter for things like drug discovery. Whereas with a chatbot, everyone can understand: this is incredible. It’s very visceral to get it to write you a poem or something — everybody can understand, process, and measure that against what they themselves do or are able to do.
It seems like that is the focus of productized AI: these chatbot-like interfaces or these generative products that are going to make stuff for people, and that’s where the risk has been focused. But even the conversation about risk has escalated because people can now see, “Oh, these tools can do stuff.” Did you perceive the same level of scrutiny when you were working on AlphaFold? It doesn’t seem like anyone thought, “Oh, AlphaFold’s going to destroy humanity.”
No — but there was a lot of scrutiny; it was just in a very specialized area, right? With renowned experts — and actually, we did talk to over 30 experts in the field, from top biologists to bioethicists to biosecurity people. And our partners — we partnered with the European Bioinformatics Institute to release the AlphaFold database of all the protein structures — guided us as well on how this could be safely put out there. So there was a lot of scrutiny, and the overwhelming conclusion from the people we consulted was that the benefits far outweighed any risks, although we did make some small adjustments based on their feedback about which structures to release. But again, it was in a very expert domain. And just going back to your first question about the generative models, I do think we are right at the beginning of an incredible new era that’s going to play out over the next five, 10 years.
Not only in advancing science with AI but in terms of the types of products we can build to improve people’s everyday lives, billions of people in their everyday lives, and help them to be more efficient and to enrich their lives. And I think what we’re seeing today with these chatbots is literally just scratching the surface. There are a lot more types of AI than generative AI. Generative AI is now the “in” thing, but I think that planning and deep reinforcement learning and problem-solving and reasoning, those kinds of capabilities are going to come back in the next wave after this, along with the current capabilities of the current systems. So I think, in a year or two’s time, if we were to talk again, we are going to be talking about entirely new types of products and experiences and services with never-seen-before capabilities. And I’m very excited about building those things, actually. And that’s one of the reasons I’m very excited about leading Google DeepMind now in this new era and focusing on building these AI-powered next-generation products.
Let’s stay in the weeds of Google DeepMind itself, for one more turn. Sundar Pichai comes to you and says, “All right, I’m the CEO of Alphabet and the CEO of Google. I can just make this call. I’m going to bring DeepMind into Google, merge you with Google Brain, you’re going to be the CEO.” How did you react to that prompt?
It wasn’t like that. It was much more of a conversation between the leaders of the various relevant groups and Sundar about the inflection point we’re seeing, the maturity of the systems, what could be possible with those in the product space, how to improve experiences for our billions of users, how exciting that might be, and what all of that requires in totality: the change in focus, the change in the approach to research, and the combination of resources required, like compute. So there was a big collection of factors to take into account that we all discussed as a leadership group, and the conclusions from that resulted in actions, including the merger and also the plans for the next couple of years and what the focus of that merged unit should be.
Do you perceive a difference being a CEO inside of Google versus being a CEO inside of Alphabet?
It’s still early days, but I think it’s been pretty similar because, although DeepMind was an Alphabet company, it was very unusual for an “other bet,” as they call them — an “alpha bet” — in that we were already very closely integrated and collaborating with many of the Google product area teams and groups. We had an applied team at DeepMind whose job it was to translate our research work into features in products by collaborating with the Google product teams. And so, we’ve had hundreds of successful launches already over the last few years, just quiet ones behind the scenes. In fact, many of the services or devices or systems that you use every day at Google will have some DeepMind technology under the hood as a component. So we already had that integrative structure, and then, of course, what we were famous for was the scientific advances and the gaming advances, but behind the scenes, there was a lot of bread and butter work going on that was affecting all parts of Google.
We were different from other bets where they have to make a business outside of Google and become an independent business. That was never the goal or the remit for us, even as an independent bet company. And now, within Google, we’re just more tightly integrated in terms of the product services, and I see that as an advantage because we can actually go deeper and do more exciting and ambitious things in much closer collaboration with these other product teams than we could from outside of Google. But we still retain some latitude to pick the processes and the systems that optimize our mission of producing the most capable and general AI systems in the world.
There’s been reporting that this is actually a culture clash. You’re now in charge of both. How have you structured the group? How is Google DeepMind structured under you as CEO, and how are you managing that culture integration?
Actually, it turns out that the cultures are a lot more similar than has perhaps been reported externally. And in the end, it’s actually been surprisingly smooth and pleasant because you’re talking about two world-class research groups — two of the best AI research organizations in the world — with incredible talent on both sides and storied histories. As we were thinking about the merger and planning it, we were looking at a document where we listed the top 10 breakthroughs from each group. When you take that in totality, it’s something like 80 to 90 percent of the breakthroughs over the last decade that underpin the modern AI industry, from deep reinforcement learning to transformers, of course. It’s an incredible set of people and talent, and there’s massive respect for both groups on both sides. And there was actually a lot of collaboration at a project level ongoing over the last decade.
Of course, we all know each other very well. I just think it’s a question of focus and a bit of coordination across both groups: what are we going to focus on, where does it make sense for the two teams to collaborate, and where can we de-duplicate efforts that basically overlap? Fairly obvious stuff, to be honest, but it’s important as we move into this new phase, where AI is entering more of an engineering phase that requires huge resources — compute, engineering, and other things. Even as a company the size of Google, we’ve got to pick our bets carefully, be clear about which arrows we’re going to put our wood behind, and then massively deliver on those things. So I think it’s part of the natural course of evolution given where we are in the AI journey.
That thing you talked about, “We’re going to combine these groups, we’re going to pick what we’re doing, we’re going to de-duplicate some efforts.” Those are structure questions. Have you decided on a structure yet, and what do you think that structure will be?
The structure’s still evolving. We’re only a couple of months into it. We wanted to make sure we didn’t break anything, that it was working. Both teams are incredibly productive, doing super amazing research, but also plugging in to very important product things that are going on. All of that needs to continue.
You keep saying both teams. Do you think of it as two teams, or are you trying to make one team?
No, no, for sure it’s one unified team. I like to call it a “super unit,” and I’m very excited about that. But obviously, we’re still combining that and forming the new culture and forming the new grouping, including the organizational structures. It’s a complex thing — putting two big research groups together like this. But I think, by the end of the summer, we’ll be a single unified entity, and I think that’ll be very exciting. And we’re already feeling, even a couple of months in, the benefits and the strengths of that with projects like Gemini that you may have heard of, which is our next-generation multimodal large models — very, very exciting work going on there, combining all the best ideas from across both world-class research groups. It’s pretty impressive to see.
You have a lot of decisions to make. What you’re describing is a bunch of complicated decisions and then, out in the world, how should we regulate this? Another set of very complicated decisions. You are a chess champion, you are a person who has made games. What is your framework for making decisions? I suspect it is much more rigorous than the other ones I hear about.
“Chess is basically decision-making under pressure with an opponent.”
Yes, I think it probably is. If you play a game like chess that seriously — effectively professionally — all through your childhood, as I did from the age of four, I think it’s very formative for your brain. The problem-solving and strategizing in chess — I find it a very useful framework for many things, including decision-making. Chess is basically decision-making under pressure with an opponent, and it’s very complex, and I think it’s a great thing. I advocate for it being taught at school, as part of the school curriculum, because I think it’s a really fantastic training ground for problem-solving and decision-making. But then, I think the overarching approach is more the scientific method.
All my training in doing my PhD and postdocs and so on — obviously I did it in neuroscience, so I was learning about the brain — also taught me how to do rigorous hypothesis generation and testing and then update based on empirical evidence. The whole scientific method, as well as the chess planning, can both be translated into the business domain. You have to be smart about how to translate that; you can’t be academic about these things. And often, in the real world, in business, there’s a lot of uncertainty and hidden information that you don’t know — in chess, obviously all the information is there for you on the board. You can’t just directly translate those skills, but I think, in the background, they can be very helpful if applied in the right way.
How do you combine those two in some decisions you’ve made?
There are so many decisions I make every day, it’s hard to come up with one now. But I tend to try and plan out and scenario-plan many, many years in advance. I’ll tell you the way I try to approach things: I have an end goal. I’m quite good at imagining things — that’s a different skill, visualizing or imagining what a perfect end state would look like, whether that’s organizational or product-based or research-based. And then, I work back from the end point and figure out what all the steps would be required, and in what order, to make that outcome as likely as possible.
So that’s a little bit chess-like, right? In the sense that you have some plan you would like to get to — checkmating your opponent — but you’re many moves away from that. So what are the incremental things one must do to improve your position and increase the likelihood of that final outcome? I’ve found it extremely useful to do that search process from the end goal back to the current state you find yourself in.
Let’s put that next to some products. You said there’s a lot of DeepMind technology in a lot of Google products. The ones that we can all look at are Bard and then your Search Generative Experience. There’s AI in Google Photos and all this stuff, but focused on the LLM moment, it’s Bard and the Search Generative Experience. Those can’t be the end state. They’re not finished. Gemini is coming, and it will probably improve both of those, and all of that will happen. When you think about the end state of those products, what do you see?
The AI systems around Google are not just in the consumer-facing things but also under the hood in ways you may not realize. For example, one of the very first things we applied our AI systems to was the cooling systems in Google’s data centers — enormous data centers — reducing the energy those cooling systems use by nearly 30 percent, which is obviously huge if you multiply that across all of the data centers and computers they have. So there are actually a lot of things under the hood where AI is being used to improve the efficiency of those systems all the time. But you’re right, the current products are not the end state; they’re actually just waypoints. And in the case of chatbots and those kinds of systems, ultimately, they will become these incredible universal personal assistants that you use multiple times during the day for really useful and helpful things across your daily life.
“…today’s chatbots will look trivial by comparison to what I think is coming in the next few years.”
From what books to read, to recommendations on live events, to booking your travel, to planning trips for you, to assisting you in your everyday work. I think we’re still far away from that with the current chatbots, and I think we know what’s missing: things like planning and reasoning and memory, and we are working really hard on those things. I think what you’ll see in maybe a couple of years’ time is that today’s chatbots will look trivial by comparison to what I think is coming in the next few years.
My background is as a person who’s reported on computers. I think of computers as somewhat modular systems. You look at a phone — it’s got a screen, it’s got a chip, it’s got a cell antenna, whatever. Should I look at AI systems that way — there’s an LLM, which is a very convincing human language interface, and behind it might be AlphaFold that’s actually doing the protein folding? Is that how you’re thinking about stitching these things together, or is it a different evolutionary pathway?
Actually, there’s a whole branch of research going into what’s called tool use. This is the idea that these large language models or large multimodal models, they’re expert at language, of course, and maybe a few other capabilities, like math and possibly coding. But when you ask them to do something specialized, like fold a protein or play a game of chess or something like this, then actually what they end up doing is calling a tool, which could be another AI system, that then provides the solution or the answer to that particular problem. And then that’s transmitted back to the user via language or pictorially through the central large language model system. So it may be actually invisible to the user because, to the user, it just looks like one big AI system that has many capabilities, but under the hood, it could be that actually the AI system is broken down into smaller ones that have specializations.
And I actually think that probably is going to be the next era. The next generation of systems will use those kinds of capabilities. You can think of the central system as almost a switch statement that you effectively prompt with language, and it routes your query or question or whatever you’re asking to the right tool to solve it or provide the solution for you, and then transmits that back in a very understandable way — again, through the best interface really: natural language.
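To make that “switch statement” idea concrete, here is a minimal sketch in Python of a central model routing queries to specialist tools. Everything in it — the `generate` stub, the tool names, the crude keyword matching — is a hypothetical stand-in for illustration, not a real Google DeepMind API.

```python
# Minimal tool-routing sketch: a central "language model" picks a specialist
# tool, the tool produces a result, and the central model phrases it back.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    name: str
    description: str            # shown to the router when choosing a tool
    run: Callable[[str], str]   # the specialist system (e.g. a folding model)


def fold_protein(query: str) -> str:
    # Placeholder for a call into a structure-prediction system.
    return f"[predicted structure for: {query}]"


def analyze_chess(query: str) -> str:
    # Placeholder for a call into a game engine.
    return f"[suggested move for position: {query}]"


TOOLS: Dict[str, Tool] = {
    "protein_folding": Tool("protein_folding", "predict 3D protein structures", fold_protein),
    "chess": Tool("chess", "analyze chess positions and suggest moves", analyze_chess),
}


def generate(prompt: str) -> str:
    # Stand-in for the central large language model. A real system would call
    # a served model; this stub does crude keyword matching on the query line
    # so the example runs end to end.
    if "Reply with the single best tool name" in prompt:
        query_line = next(l for l in prompt.splitlines() if l.startswith("Query:")).lower()
        if "protein" in query_line or "fold" in query_line:
            return "protein_folding"
        if "chess" in query_line:
            return "chess"
        return "none"
    return prompt


def answer(user_query: str) -> str:
    # 1. The central model acts like a switch statement: pick a tool (or none).
    menu = "\n".join(f"- {t.name}: {t.description}" for t in TOOLS.values())
    choice = generate(
        f"Tools available:\n{menu}\n\nQuery: {user_query}\n"
        "Reply with the single best tool name, or 'none'."
    ).strip()

    # 2. A specialist system solves the specialized sub-problem.
    result = TOOLS[choice].run(user_query) if choice in TOOLS else user_query

    # 3. The result goes back to the user through the natural-language layer.
    return generate(f"Explain this result plainly for the user: {result}")


print(answer("Fold the protein with this amino acid sequence: MKTAYIAK"))
```

To the user it looks like one system; under the hood, the specialist does the work and the language layer handles the conversation.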
Does that process get you closer to an AGI, or does that get you to some maximum state and you got to do something else?
I think that is on the critical path to AGI, and that’s another reason, by the way, I’m very excited about this new role and actually doing more products and things because I actually think the product roadmap from here and the research roadmap from here toward something like AGI or human-level AI is very complementary. The kinds of capabilities one would need to push in order to build those kinds of products that are useful in your everyday life like a universal assistant requires pushing on some of these capabilities, like planning and memory and reasoning, that I think are vital for us to get to AGI. So I actually think there’s a really neat feedback loop now between products and research where they can effectively help each other.
I feel like I had a lot of car CEOs on the show at the beginning of it. I asked all of them, “When do you think we’re going to get self-driving cars?” And they all said five years, and they’ve been saying five years for five years, right?
Yes.
I’m going to ask you a version of that question about AGI, but I feel like the number has gotten smaller recently with people I’ve talked to. How many years until you think we have AGI?
I think there’s a lot of uncertainty over how many more breakthroughs are required to get to AGI, big, big breakthroughs — innovative breakthroughs — versus just scaling up existing solutions. And I think it very much depends on that in terms of timeframe. Obviously, if there are a lot of breakthroughs still required, those are a lot harder to do and take a lot longer. But right now, I would not be surprised if we approached something like AGI or AGI-like in the next decade.
In the next decade. All right, I’m going to come back to you in 10 years. We’re going to see if that happens.
Sure.
That’s not a straight line, though. You called it the critical path, that’s not a straight line. There are breakthroughs along the way that might upset the train and send you along a different path, you think.
“…research is never a straight line. If it is, then it’s not real research.”
Research is never a straight line. If it is, then it’s not real research. If you knew the answer before you started it, then that’s not research. So research and blue sky research at the frontier always has uncertainty around it, and that’s why you can’t really predict timelines with any certainty. But what you can look at is trends, and we can look at the quality of ideas and projects that are being worked on today, look at how they’re progressing. And I think that could go either way over the next five to 10 years where we might asymptote, we might hit a brick wall with current techniques and scaling. I wouldn’t be surprised if that happened, either: that we may find that just scaling the existing systems resulted in diminishing returns in terms of the performance of the system.
And actually, that would then signal that some new innovations were really required to make further progress. At the moment, I think nobody knows which regime we’re in. So the answer is that you have to push on both as hard as possible: both the scaling and engineering of existing systems and existing ideas, and investing heavily in exploratory research directions that you think might deliver innovations that could solve some of the weaknesses in the current systems. That’s one advantage of being a large research organization with a lot of resources: we can bet maximally on both of those directions. In a way, I’m agnostic on the question of “do we need more breakthroughs, or will existing systems just scale all the way?” My view is that it’s an empirical question, and one should push both as hard as possible. Then the results will speak for themselves.
This is a real tension. When you were at DeepMind in Alphabet, you were very research-focused, and then the research was moved into Google, and Google’s engineers would turn it into products. You can see how that relationship worked. Now, you’re inside of Google, and Google is under a lot of pressure as a company to win this battle. Those are product concerns. Those are “make it real for people and go win in the market” concerns. There’s a leaked memo that went around, purportedly from inside Google. It said the company had no moat and that open-source AI models, or leaked models, would run on people’s laptops and outpace the company, because the history of open computing suggests it will outpace a closed-source competitor. Was that memo real?
“I think that memo was real.”
I think that memo was real. I think engineers at Google often write various documents, and sometimes they get leaked and go viral. I think that’s just a thing that happens, but I wouldn’t take it too seriously. These are just opinions. I think it’s interesting to listen to them, and then you’ve got to chart your own course. I haven’t read that specific memo in detail, but I disagree with its conclusions. There’s obviously open source and publishing, and we’ve done tons of that in the history of DeepMind. I mean, AlphaFold was open sourced, right? So we obviously believe in open source and supporting research and open research. That’s a key part of the scientific discourse, which we’ve been a huge part of. And so has Google, of course — publishing transformers and other things, and TensorFlow. Look at all the things we’ve done.
We do a huge amount in that space. But I also think there are other considerations that need to be weighed as well — obviously commercial ones, but also safety questions about access to these very powerful systems. What if bad actors can access them — actors who maybe aren’t that technical, so they couldn’t have built the systems themselves, but who can certainly reconfigure a system that is out there? What do you do about those things? I think that’s been quite theoretical until now, but it’s really important from here all the way to AGI, as these systems become more general, more sophisticated, and more powerful. The question of how one stops bad actors from using these systems for malicious purposes they weren’t intended for is going to be very important.
That’s something we increasingly need to come up with answers to. But just back to your question: look at the history of what Google and DeepMind have done in terms of coming up with new innovations and breakthroughs — multiple, multiple breakthroughs over the last decade or more. I would bet on us, and I’m certainly very confident that that will continue and will actually be even more true over the next decade in terms of us producing the next key breakthroughs, just like we did in the past.
Do you think that’s the moat: we invented most of this stuff, so we’re going to invent most of the next stuff?
I don’t really think about it as moats, but I’m an incredibly competitive person. That’s maybe another thing I got from chess, and many researchers are, too. Of course, they’re doing it to discover knowledge, and ultimately that’s what we are here for — to improve the human condition. But we also want to be first to do these things, and to do them responsibly and boldly. We have some of the world’s best researchers — I think we have the biggest collection of great researchers anywhere in the world — and an incredible track record. There’s no reason why that shouldn’t continue in the future. In fact, I think our new organization and environment might be conducive to even more and faster-paced breakthroughs than we’ve had in the past.
You’re leading me toward risk and regulation. I want to talk about that, but I want to start with a different spin on it. You’re talking about all the work that has to be done. You’re talking about deep reinforcement learning and how that works. We ran a gigantic cover story in collaboration with New York Magazine about the taskers who are actually doing the training, who are actually labeling the data. There’s a lot of labor conversation around AI along the way. Hollywood writers are on strike right now because they don’t want ChatGPT to write a bunch of scripts. I think that’s appropriate.
But then there’s a new class of labor that’s being developed where a bunch of people around the world are sitting in front of computers and saying, “Yep, that’s a stop sign. No, that’s not a stop sign. Yep, that’s clothes you can wear. No, that’s not clothes you can wear.” Is that a forever state? Is that just a new class of work that needs to be done for these systems to operate? Or does that come to an end?
I think it’s hard to say. I think it’s definitely a moment in time, given the current systems and what they require at the moment. For our part — and I think you quoted some of our researchers in that article — we’ve been very careful to pay living wages and to be very responsible about how we do that kind of work and which partners we use. We also use internal teams as well. So actually, I’m very proud of how responsible we’ve been with that type of work. But going forward, I think there may be ways that these systems, especially once you have millions and millions of users, can effectively bootstrap themselves. Or one could imagine AI systems that are capable of actually conversing with themselves or critiquing themselves.
This would be a bit like turning language systems into a game-like setting, which of course we’re very expert in and have been thinking about, where these reinforcement learning systems — different versions of them — can actually rate each other in some way. It may not be as good as a human rater, but it’s actually a useful way to do some of the bread and butter rating, and then maybe calibrate it by checking those ratings with a human rater at the end, rather than getting human raters to rate everything. So I think there are lots of innovations I can see coming down the line that will help with this and potentially mean there’s less requirement for all of this to be done by human raters.
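Here is a minimal sketch of that calibration idea, assuming a hypothetical AI rater and a human spot-check step (neither reflects any real DeepMind pipeline): the model rates everything cheaply, humans rate only a small sample, and the gap between the two tells you how much to trust the cheap ratings.

```python
# Cheap model ratings everywhere, expensive human ratings on a spot-check sample.
import random
from statistics import mean


def model_rate(response: str) -> float:
    # Stand-in for an AI rater scoring a response from 0 (bad) to 1 (good).
    return min(1.0, len(response) / 100)   # toy heuristic for illustration


def human_rate(response: str) -> float:
    # Stand-in for a human rater; in practice this is the scarce, expensive step.
    return 0.8


def rate_with_spot_checks(responses: list[str], sample_frac: float = 0.05):
    # 1. The AI rater does the bread-and-butter rating of everything.
    model_scores = {r: model_rate(r) for r in responses}

    # 2. Only a small random sample also goes to human raters, for calibration.
    sample = random.sample(responses, max(1, int(len(responses) * sample_frac)))
    gaps = [abs(model_scores[r] - human_rate(r)) for r in sample]

    # 3. The mean gap says how far the AI rater drifts from human judgment;
    #    if it grows too large, increase the human sample or retrain the rater.
    return model_scores, mean(gaps)


scores, drift = rate_with_spot_checks([f"response {i} " * i for i in range(1, 101)])
print(f"rated {len(scores)} responses; mean drift from humans on sample: {drift:.2f}")
```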
But you think there are always human raters in the mix? Even as you get closer to AGI, it seems like you need someone to tell the computer if it’s doing a good job or not.
Let’s take AlphaZero as an example — our general game-playing system that ended up learning, by itself, how to play any two-player game, including chess and Go. It’s interesting what happened there: we set up the system so that it could play against itself tens of millions of times. So, in fact, it built up its own knowledge base. It started from random, played itself, bootstrapped itself, trained better versions of itself, and played those off against each other in sort of mini-tournaments. But at the end, you still want to test it against the human world champion or something like that, or an external computer program built in a conventional way, so that you can calibrate your own metrics, which are telling you these systems are improving according to these objectives or metrics.
But you don’t know for sure until you calibrate it against an external benchmark or measure. And depending on what that is — a human rater or human benchmark — a human expert is often the best thing to calibrate your internal testing against, to make sure your internal tests are actually mapping to reality. And again, that’s something quite exciting about products for researchers: when you put your research into products and millions of people are using them every day, that’s when you get real-world feedback. There’s no way around that, right? That’s the reality, and that’s the best test of any theories or any system you’ve built.
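For readers who want the shape of that loop, here is a deliberately toy, self-contained sketch of self-play bootstrapping with a promotion tournament and an external calibration step. The `ToyPolicy` class and its numbers are invented for illustration and bear no resemblance to AlphaZero’s actual networks or training.

```python
# Toy self-play loop: start from random, train on your own games, promote a
# candidate only if it wins a mini-tournament against the incumbent.
import random


class ToyPolicy:
    """Stand-in policy whose single parameter, `skill`, loosely tracks strength."""

    def __init__(self, skill: float = 0.0):
        self.skill = skill

    def play(self, opponent: "ToyPolicy") -> int:
        # +1 if self wins, -1 if the opponent wins (noisy, skill-biased coin flip).
        edge = self.skill - opponent.skill
        return 1 if random.random() < 0.5 + 0.1 * edge else -1

    def train_on(self, results: list[int]) -> "ToyPolicy":
        # Toy "training": nudge skill upward in proportion to self-play volume.
        return ToyPolicy(self.skill + 0.01 * len(results))


def train_loop(generations: int = 10) -> ToyPolicy:
    best = ToyPolicy()                                   # starts essentially random
    for _ in range(generations):
        games = [best.play(best) for _ in range(1_000)]  # self-play data
        candidate = best.train_on(games)

        # Mini-tournament: the candidate must clearly beat the incumbent.
        wins = sum(candidate.play(best) == 1 for _ in range(400))
        if wins / 400 > 0.55:
            best = candidate
    return best


final = train_loop()
# Internal metrics only say the system improved against copies of itself;
# grounding that in reality means testing against an external benchmark such
# as a human champion or a conventional engine (not modeled in this toy).
print("final toy skill:", final.skill)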
Do you think that work is rewarding or appropriate, the labeling of data for AI systems? There’s just something about that, which is, “I’m going to tell a computer how to understand the world so that it might go off in the future and displace other people.” There’s a loop in there that seems like it’s worth more just moral or philosophical consideration. Have you spent time thinking about that?
Yeah, I do think about that. I don’t really see it like that. What raters are doing is being part of the development cycle of making these systems safer, more useful for everybody, and more helpful and more reliable. So I think it’s a critical component. In many industries, we have safety testing of technologies and products; today, the best we can do for AI systems is to have human raters. In the future, over the next few years, I think we need a lot more research — and I’ve been calling for this, and we are doing this ourselves, but it needs more than just one organization — into great, robust evaluation benchmarks for capabilities, so that we know that if a system passes these benchmarks, then it has certain properties and it’s safe and reliable in those particular ways.
Right now, I think many researchers in academia and civil society and elsewhere have a lot of good suggestions for what those tests could be, but I don’t think they are robust or practical yet. They’re basically theoretical and philosophical in nature, and they need to be made practical so that we can measure our systems empirically against those tests, which then gives us some assurances about how the system will perform. I just think human rating is required in the volumes it is now because we don’t have these kinds of independent benchmarks yet — partly because we haven’t rigorously defined what those properties are. I mean, it’s almost a neuroscience and psychology and philosophy area as well, right? A lot of these terms haven’t been defined properly, even for the human brain.
You’ve signed a letter from the Center for AI Safety — OpenAI’s Sam Altman and others have also signed this letter — that warns against the risk from AI. And yet, you’re pushing on, Google’s in the market, you’ve got to win, you’ve described yourself as competitive. There’s a tension there: needing to win in the market with products and “Oh boy, please regulate us because raw capitalism will drive us off the cliff with AI if we don’t stop it in some way.” How do you balance that risk?
It is a tension. It’s a creative tension. What we like to say at Google is we want to be bold and responsible, and that’s exactly what we’re trying to do and live out and role model. So the bold part is being brave and optimistic about the benefits, the amazing benefits, incredible benefits, AI can bring to the world and to help humanity with our biggest challenges, whether that’s disease or climate or sustainability. AI has a huge part to play in helping our scientists and medical experts solve those problems. And we’re working hard on that and all those areas. And AlphaFold, again, I’d point to as a poster child for that, what we want to do there. So that’s the bold part. And then, the responsible bit is to make sure we do that as thoughtfully as possible with as much foresight as possible ahead of time.
Try and anticipate what the issues might be if one were successful, ahead of time — not in hindsight. Perhaps that is what happened with social media, for example: it’s this incredible growth story, and obviously it’s done a lot of good in the world, but then it turns out, 15 years later, that there are some unintended consequences to those types of systems as well. I would like to chart a different path with AI, because I think it’s such a profound, important, and powerful technology. We have to do that with something as potentially transformative as AI. It doesn’t mean no mistakes will be made. It’s very new — with anything new, you can’t predict everything ahead of time — but I think we can try and do the best job we can.
“It’s very new. You can’t predict everything ahead of time, but I think we can try and do the best job we can.”
That’s what signing that letter was for: to point out that — while I don’t think it’s likely, and I don’t know the timescales — it’s something we should consider, too, in the limit: what these systems can do and might be able to do as we get closer to AGI. We are nowhere near that now, so this is not a question of today’s technologies or even the next few years’. But at some point — and given the technology is accelerating very fast — we will need to think about those questions, and we don’t want to be thinking about them on the eve of them happening. We need to use the time now, the next five, 10, whatever it is, years, to do the research and the analysis and to engage with various stakeholders — civil society, academia, government — to figure out, as this stuff is developing very rapidly, what the best way is of maximizing the benefits and minimizing any risks.
And that includes mostly, at this stage, doing more research into these areas, like coming up with better evaluations and benchmarks to rigorously test the capabilities of these frontier systems.
You talked about tool use for AI models: you ask an LLM to do something, and it goes off and asks AlphaFold to fold the protein for you. Combining systems like that, integrating systems like that — historically, that’s where emergent behaviors appear, where things you couldn’t have predicted start happening. Are you worried about that? There’s not a rigorous way to test that.
Right, exactly. I think that’s exactly the sort of thing we should be researching and thinking about ahead of time: as tool use becomes more sophisticated and you can combine different AI systems together in different ways, there is scope for emergent behavior. Of course, that emergent behavior may be very desirable and extremely useful, but it could also potentially be harmful in the wrong hands — in the hands of bad actors, whether that’s individuals or even nation-states.
Let’s say the United States and the EU and China all agree on some framework to regulate AI, and then North Korea or Iran says, “Fuck it, no rules.” And that becomes a center of bad actor AI research. How does that play out? Do you foresee a world in which that’s possible?
Yeah, I think that is a possible world. This is why I’ve been talking to governments — the UK and US mostly, but also the EU — about how whatever regulations or guardrails or tests transpire over the next few years should ideally be international, with international cooperation around those safeguards and international agreement around the deployment of these systems and other things. Now, I don’t know how likely that is given the geopolitical tensions around the world, but that is by far the best state, and I think it’s what we should be aiming for if we can.
If the government here passes a rule that says, “Here’s what Google is allowed to do, here’s what Microsoft is allowed to do. You are in charge, you are accountable,” then you can say, “All right, we’re just not running this code in our data center. We are not going to have these capabilities; it’s not legal.” But if I’m just a person with a MacBook, would you accept some limitation on what a MacBook could do because the threat from AI is so scary? That’s the thing I worry about. Practically, if you have open-source models and people are going to use them for weird things, are we going to tell Intel to restrict what its chips can do? How would we implement that such that it actually affects everyone — and not just “we’re going to throw Demis in jail if Google does stuff we don’t like”?
I think those are the big questions that are being debated right now, and I do worry about that. On the one hand, there are a lot of benefits to open-sourcing: it accelerates scientific discourse, lots of advances happen there, and it gives access to many developers. On the other hand, there could be some negative consequences if there are bad individual actors who do bad things with that access, and that proliferates. And I think that’s a question for the next few years that will need to be resolved, because right now, I think it’s okay — the systems are not that sophisticated or powerful, and therefore not that risky.
But I think, as systems increase in their power and generality, the access question will need to be thought about by governments — how they want to restrict, control, or monitor that is going to be an important question. I don’t have any answers for you, because I think this is actually a societal question that requires stakeholders from right across society to come together and weigh up the benefits with the risks.
Google’s own work — you said we’re not there yet, but Google’s own work in AI has certainly had some controversy associated with it around responsibility, around what the models can and can’t do. There’s a famous “Stochastic Parrots” paper from Emily Bender, Timnit Gebru, and Margaret Mitchell that led to a lot of controversy inside of Google and to Gebru and Mitchell leaving the company. Did you read that paper and think, “Okay, this is correct. LLMs are going to lie to people, and Google will be responsible for that”? And how do you think about that now, with all of the scrutiny?
Yeah, look, with the large language models — and I think this is one reason Google has been very responsible with this — we know that they hallucinate and can be inaccurate. That’s one of the key areas that has to be improved over the next few years: factuality and grounding and making sure they don’t spread disinformation, those kinds of things. That’s very much top of mind for us, and we have many ideas for how to improve it. DeepMind’s Sparrow language model, which we published a couple of years ago, was an experiment into just how good we can get factuality and rules adherence in these systems. It turns out we can maybe make it an order of magnitude better, but that sometimes comes at the expense of the lucidity or creativity of the language model, and therefore its usefulness.
So it’s a bit of a Pareto frontier where, if you improve one dimension, you reduce the capability in another dimension. And ideally, what we want to do in the next phases and the next generations of systems is combine the best of both worlds — keep the creativity and lucidness and funness of the current systems but improve their factuality and reliability. And we’ve got a long way to go on that. But I can see things improving, and I don’t see any theoretical reason why these systems can’t get to extremely high levels of accuracy and reliability in the next few years.
When you’re using the Google Search Generative Experience, do you believe what it says?
I do. I sometimes double-check things, especially in the scientific domain, where I’ve had very funny situations — all of these models do this — where you ask them to summarize an area of research, which I think would be super useful if they could do it, and then say, “Well, what are the key papers I should read?” And they come up with very plausible-sounding papers with very plausible author lists. But then, when you go and look into it, it turns out they’re just the most famous people in that field, or the titles of two different papers combined together. Of course, they’re extremely plausible as a collection of words. I think what needs to happen there is for these systems to understand that citations and papers and author lists are a unitary block rather than a word-by-word prediction.
There are interesting cases like that where we need to improve. And since we, of course, want to advance the frontiers of science, that’s a particularly interesting use case that we would like to improve and fix — for our own needs as well. I’d love these systems to be able to summarize for me “here are the top five papers to read about a particular disease,” or something like that, to quickly onboard you in a particular area. I think that would be incredibly useful.
I’ll tell you, I googled my friend John Gruber, and SGE confidently told me that he pioneered the use of a Mac in newspapers and invented WebKit. I don’t know where that came from. Is there a quality level, a truthfulness level that you need to hit before you roll that out to the mass audience?
Yeah, we think about this all the time, especially at Google, because of the incredibly high standards Google holds itself to on things like search — which we all rely on every moment of every day, really — and we want to get toward that level of reliability. Obviously, we’re a long, long, long way away from that at the moment — not just us, but anybody with their generative systems. But that’s the gold standard. And actually, things like tool use can come in very handy here: you could, in effect, build these systems so that they fact-check themselves, perhaps even using search or other reliable sources to cross-reference their facts, just like a good researcher would. It also means having a better understanding of the world. What are research papers? What entities are they?
So these systems need to have a better understanding of the media they’re dealing with. And maybe we also give these systems the ability to reason and plan, because then they could potentially turn that on their own outputs and critique themselves. Again, this is something we have a lot of experience with in games programs. They don’t just output the first move you think of in chess or Go; you actually plan and do some search around that and then back up. And sometimes they change their minds and switch to a better move. You could imagine some process like that with words and language as well.
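A minimal sketch of that “don’t just play the first move” idea applied to language: sample several candidate answers, score each with a critic (which, in a real system, might cross-reference retrieved sources), and return the best rather than the first. The `draft` and `critique` functions below are hypothetical stand-ins, not any model’s real API.

```python
# Best-of-n with a critic: generate several candidates, keep the best-scoring one.
def draft(question: str, n: int) -> list[str]:
    # Stand-in for sampling n candidate answers from a language model.
    return [f"candidate answer {i} to: {question}" for i in range(n)]


def critique(question: str, answer: str) -> float:
    # Stand-in for a critic model (or a fact-checking pass over retrieved
    # sources) scoring an answer from 0 to 1.
    return (hash(answer) % 100) / 100   # arbitrary toy score for illustration


def answer_with_search(question: str, n_candidates: int = 8) -> str:
    candidates = draft(question, n_candidates)
    scored = [(critique(question, c), c) for c in candidates]
    # Like backing up a game-tree search: return the best-scoring line,
    # not the first one generated.
    return max(scored)[1]


print(answer_with_search("Summarize the key papers on protein structure prediction"))
```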
There’s the concept of model collapse — that we’re going to train LLMs on LLM-generated data, and that’s going to go in a circle. When you talk about cross-referencing facts, I think about Google going out on the web and trying to cross-reference a bunch of stuff, but maybe all that stuff has been generated by LLMs that were hallucinating in 2023. How do you guard against that?
We are working on some pretty cool solutions to that. I think the answer — and this is an answer to deepfakes as well — is to do some encrypted, sophisticated watermarking that can’t be removed easily or at all, and it’s probably built into the generative models themselves, so it’s part of the generative process. We hope to release that and maybe provide it to third parties as well as a generic solution. But I think the industry and the field need those types of solutions, where we can mark generated media — be that images, audio, perhaps even text — with some kitemark that says to the user and to future AI systems that it was AI-generated. And I think that’s a very, very pressing need right now for near-term issues with AI, like deepfakes and disinformation. But I actually think a solution is on the horizon now.
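Hassabis doesn’t describe the scheme itself, but one well-known family of approaches in the research literature builds the watermark into sampling: nudge the generator toward a key-dependent “green” subset of tokens, then statistically test for that bias later. The toy sketch below shows only the detection side and is not necessarily what Google DeepMind is building.

```python
# Toy statistical text watermark: key-dependent "green" tokens and a detector.
import hashlib


def is_green(prev_token: str, token: str, key: str = "secret") -> bool:
    # Key-dependent pseudorandom split of the vocabulary: in any given context,
    # roughly half of all tokens land in the "green" set.
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0


def detect(tokens: list[str], key: str = "secret") -> float:
    # Fraction of green tokens. Ordinary text hovers near 0.5; text whose
    # sampler was nudged toward green tokens at generation time sits
    # noticeably higher, which a statistical test can flag.
    hits = sum(is_green(prev, tok, key) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(1, len(tokens) - 1)


print(detect("the quick brown fox jumps over the lazy dog".split()))
```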
I had Microsoft CTO and EVP of AI Kevin Scott on the show a few weeks ago. He said something very similar. I promised him that we would do a one-hour episode on metadata. So you’re coming for that one. If I know this audience, a full hour on metadata ideas will be our most popular episode ever.
Okay, sounds perfect.
Demis, thank you so much for coming on Decoder. You have to come back soon.
Thanks so much.