Dreamers and Doomers: Our AI future, with Richard Ngo – #109
Richard Ngo: I think Sam has a very earnest mode and he has a Machiavellian mode. And because he's so good at both modes, I think people get quite surprised, and sometimes shocked,
Steve Hsu: Yeah.
Richard Ngo: to see the transition. And to be clear, I'm not alluding to anything that isn't public.
I think you can observe this just from how the various public dramas at OpenAI have gone down. I have some sense that it's hard to explain what happened at OpenAI without knowing that Sam has an extremely earnest mode, and so people end up disproportionately surprised or shocked when he turns out to not always be earnest.
Steve Hsu: Welcome to Manifold. Today my guest is Richard Ngo, a philosopher and AI researcher. We're here in his San Francisco home, another group house where cool hipsters are changing the world. This is part of a documentary that we're making called Dreamers and Doomers. But right now we're gonna film an episode of Manifold. And Richard, it is great to have you on the show.
Richard Ngo: Thanks for having me, Steve.
Steve Hsu: So Richard, I'd like the audience to get a feel for you as a person. You were born, or at least you grew up, in New Zealand.
Richard Ngo: mm-hmm.
Steve Hsu: And somehow found your way to Cambridge University where you studied computer science and philosophy. Could you tell us a little bit about your childhood and what it was like to attend Cambridge?
Richard Ngo: Yeah, so in my childhood there was a lot of reading. I bounced a little between New Zealand and Vietnam because my father's Vietnamese. And I started getting interested in AI in high school, reading about it online, LessWrong, that kind of thing.
Actually, I first attended Oxford and did computer science and philosophy there, and then did a master's at Cambridge. Basically since getting to Oxford, I've bounced back and forth between doing philosophy and doing machine learning and computer science. My undergrad degree was in both; my master's was in ML.
After a while at DeepMind I decided to go back to philosophy, did part of a philosophy PhD, and then ended up doing more philosophical things at OpenAI after that. So basically I've spent quite a while now trying to make ML more philosophical and to bring philosophy closer to ML.
Steve Hsu: It could be just my bias, it probably is my bias, but I feel I trust you more because you at least spent some time doing ML and computer science, whereas someone who's a pure philosopher might not have the right feel for what's really going on.
Richard Ngo: That seems fair. I trust philosophers of science most out of all philosophers, I think because they have a sense of what it looks like to actually solve problems.
Whereas in other branches of philosophy, people almost boast about how long it's been since they actually solved a problem. We've been discussing the same things for a few thousand years, and it's very hard to be in that mindset and actually try to make progress.
Steve Hsu: So you were at DeepMind, I believe, sort of leading up to 2020, is that the timeframe?
Richard Ngo: That's right. Yeah.
Steve Hsu: And talk a little bit about what that atmosphere was like. The transformer paper had been written, but that was before the transformer LLM craze that we're currently in.
Richard Ngo: Right.
Steve Hsu: So talk a little bit about what the environment was like then.
Richard Ngo: Yeah. So it was in some ways quite academic. I felt like there were a lot of people there who would've been professors, but you know, they could just do all the same things at DeepMind and be paid more and have a comfier lifestyle. There were a lot of different bets on research directions. There was a neuroscience team, there was a multi-agent team.
I think Demis was pretty hands-on with a lot of those bets, trying to figure out which things he believed in and then nudging people in those directions. I was working on some early RLHF stuff with Jan Leike. Actually, some of the first RLHF papers, with Christiano and Leike and so on, were in these toy environments where you've got simulated robots that are hopping around or trying to navigate mazes and so on.
And we were doing a little bit of language stuff, but actually maybe the most notable thing was just how little language work was being done at DeepMind. And insofar as there was stuff being done, it was in these simulated environments, trying to use language to steer agents. Basically, I think, because DeepMind leadership really believed that you needed grounded, embodied data in order to be intelligent.
Steve Hsu: So just recently I was talking to a fairly senior person at Google, Google DeepMind, on the Mountain View side.
Richard Ngo: Mm-hmm.
Steve Hsu: And he referred to the London perspective. I think when you were there, you were in London, King's Cross maybe. He described the London perspective at DeepMind as being a bit anti, or skeptical, about large language models and transformers, and I was actually surprised by that. I didn't really realize there was that kind of dichotomy. Maybe you could just talk about whether you perceived anything like that.
Richard Ngo: Yeah, absolutely. Demis and Shane were neuroscientists before starting DeepMind, so they had a sense that there was algorithmic progress to be made in understanding what was going on in the brain, and they just had this very big bet on embodiment.
I think it was maybe Mikolov, who was at Facebook, who wrote a paper a while back saying language is all you need. It wasn't quite called that, but something along those lines. And I think that was basically the opposite of the DeepMind view. So they were trying to do all kinds of simulation-type things.
They were trying to have agents running around, interacting with each other. The people who believed this most strongly were the multi-agent team. And OpenAI had a multi-agent team for a while that was doing a similar kind of investigation into emergent interaction.
Basically generalizing a lot of the self-play stuff, right? AlphaGo works pretty well, and so you think, well, how do we go another step beyond that? You just need more complex environments for the agents to play games with each other.
Steve Hsu: I might be mischaracterizing his view, but my gut feeling with Demis is that he thinks this current phase of hyperscaling transformers is an interim phase, but the ultimate super AI will be something that learns the way that we do.
And doesn't need to be fed all of the data ever produced by the human species to get to a reasonable place. But who knows when we'll get to that architecture.
Richard Ngo: Right. And who knows if it's gonna be via iteration and engineering or breakthroughs? I do feel sympathetic to Demis's perspective here.
I have some sense that out of all the major lab leaders, he is most invested in actually trying to understand what's going on. OpenAI and Anthropic are just extremely empirically minded, because that's what's worked for them. They do have some teams trying to do more fundamental research, especially the interpretability team at Anthropic, but baked into each of those companies is a more engineering-ish approach. And I think Elon has said this pretty explicitly about xAI: he doesn't really believe in the term researcher, everyone's an engineer, just different types of engineering. Which seems a little misguided to me.
Steve Hsu: I shouldn't ask this question, but I can't resist.
Since you did work for some years with Demis, is there a sense that you feel that Demis is actually in some sense intellectually stronger than the other major lab leaders?
Richard Ngo: He's more of a researcher than Sam, certainly, and I guess Elon as well. In some sense that's what has caused difficulties with DeepMind, right?
The fact that Demis was enough of a researcher to have strong opinions about where to go and what to do, and Shane as well. I worked more with Shane and didn't interact with Demis so much. And obviously Dario did a bunch of amazing research. I don't have a great sense of how hands-on each of them is at this point and how much they get to actually think about the frontier of the research.
Steve Hsu: Now, after working at DeepMind, you worked at OpenAI, so you've seen two of the major labs up close.
Richard Ngo: Mm-hmm.
Steve Hsu: And I would guess in the SF community, because often people are living in houses where one guy works at Anthropic and one guy works at OpenAI or what have you, there's a lot of visibility into what's happening at the different labs.
Maybe talk a little bit about what the transition was like for you going to OpenAI, how it was different inside than what you experienced at DeepMind.
Richard Ngo: Yeah, so I think in both companies the leadership was pretty thoughtful about taking AGI seriously. Maybe thoughtful is the wrong word, I should say.
They were focused. Whereas the rank and file in the companies were pretty different. The OpenAI people were weirder, more open to wacky ideas, and generally took AGI more seriously. Whereas, as I said, DeepMind people were more like people who might otherwise have been professors.
So there was that cultural element. There was also just the general Bay Area cultural difference: entrepreneurship culture baked in, moving fast, being very anti-bureaucracy, for example. So those were some of the main differences, I think.
I personally was in a pretty weird role. I was a futurist at OpenAI, and so I didn't have many of the standard pressures applied to me to do certain kinds of research. I had a lot of intellectual freedom at OpenAI, which was nice. Whereas at DeepMind I was a research engineer, and I think it would've been difficult to carve out a role anywhere near as unusual as the one I had at OpenAI, in part because DeepMind at the time was much more focused on academic credentials and wanted people to have PhDs.
Steve Hsu: So, talk a little bit about that transition in your life. You went from being a research engineer living in London, working for DeepMind,
Richard Ngo: Yeah.
Steve Hsu: to being an in-house futurist for OpenAI living in San Francisco. So that must have been a transition for you, life-wise.
Richard Ngo: Yeah, I should say I decided to leave DeepMind before getting the OpenAI offer, and so I actually spent a year at Cambridge doing a PhD in the philosophy of machine learning, which wasn't really a field at the time and still barely is a field. So it was mostly just me poking around at things I found interesting.
So I'd already decided that there were these big questions that needed to be answered. Things like, what do we even mean by AGI? What's the basis for these arguments about existential risk? Those have been strands that I've been pursuing basically since joining DeepMind and realizing that everyone was more confused about this than I expected.
Steve Hsu: And so a triple transition for you: DeepMind, then academia for a year to do deep philosophy, but then coming to San Francisco to work in a very, as you said, startup-oriented kind of culture. So just talk a little bit about how it made you feel. What was the transition like for you?
Richard Ngo: San Francisco is much more intense than anywhere in England. You have people who are thinking much more out of the box. I moved here and I was immediately living in, I think, a 25 or 30 person group house, so big that I couldn't even keep track of who was there.
So that's just a kind of crazy endeavor. I think you've already talked to Jeremy, who was running that house. And culturally as well, people are just much more interested in, well, there's the whole hippie culture, especially over in Berkeley. So you've got a confluence of pretty unusual cultures and ideas.
And I went around and sampled each of these and picked up what I thought were the best parts of each. So yeah, a big personal shift for me. And when things are high variance like that, sometimes things go well, sometimes things go badly. The massive group house ended up exploding, and that was a mess.
But I got a lot of benefit out of the types of people I've met here. And definitely the difference in conformity between England and California is night and day.
Steve Hsu: One of the things that's always jarring for me is when I have my academia hat on, or my professional physicist hat on, we are extremely careful and calibrated about the things we say.
Richard Ngo: Yeah.
Steve Hsu: So if I say something to a colleague that is miscalibrated, it could be directionally right, it could be factually right, but if I'm even miscalibrated on a second-order thing,
Richard Ngo: mm-hmm.
Steve Hsu: About like, how confident should I be about being confident about this thing, my colleague might just excoriate me because the, the levels of precision are quite high.
Interesting. Now, when I, when I meet with, not that, not that we can't speculate and have fun in physics, we, we actually do, but when push comes to shove and we're really talking about, well, what do you think this experiment is gonna show and how is it related to that theory that you worked on in your last paper?
People expect if they, if they, if they want to get a very well calibrated, precise opinion out of you, you have it ready for them. Okay. And, and so it's a very exact, can be a very exacting thing, academic science when you get into the startup world. Everyone has to be at least partly a grifter, you have to be bold.
You have to kind of be willing to extrapolate that, yeah, I'm not really sure this is gonna work, but I need to talk confidently as if it's gonna work. Otherwise I'll never get a, a venture investor to back me. And so there, there's, there's some code switching or calibration switching that I always find jarring when I go back and forth between university campus and Silicon Valley.
Richard Ngo: Yep.
Steve Hsu: I think to some extent, it seems like in the UK they're not used to being quite as bombastic as people are here. So was that a jarring thing for you?
Richard Ngo: So this certainly happened in ML, where just talking about AGI at all was seen as very ungrounded, uncalibrated.
So I had a taste of that early on, even when living in England. Yeah. I'm curious what you think is driving that in the physics context, because I have a sense of what's going on in the machine learning context, which is that there's a kind of territoriality, and it's very difficult when you start talking about AGI to know exactly how it relates.
You're trying to develop a different ontology, almost. And trying to intersect that ontology with the mainstream ontology of statistical learning theory and various other formalisms is just very difficult. So do you get a sense that people are trying to defend their ground, or...
Steve Hsu: I wouldn't quite put it that way.
One aspect of it is that in physics a big part of what we do professionally is allocate our own meager resources. We get money from the Department of Energy, the Department of Defense, the NSF, and we have to decide what is the right next accelerator to build, what is the right space telescope to launch into orbit, which costs a billion dollars, and we only have a billion dollars, not like OpenAI, right?
So we have to, as a community, be very careful saying, look, if you launch this and you can measure this at this accuracy, it will tell me this important thing, right? About dark energy. And I'm not faking it for you because I just want that data. I'm actually telling you honestly how it impacts the further analysis.
Whereas here, if you're trying to get an investor to put $10 million into your company, you actually need to exaggerate a little bit, or at least exaggerate your confidence levels a little bit. So there are definitely some cultural differences in that sense.
Richard Ngo: Yeah, so it sounds like the thing you're pointing at is that science and politics are pretty intertwined, and there's a culture of hyping and positioning things in politics which is almost antithetical to
Steve Hsu: Yes.
Richard Ngo: science as it should be done.
Steve Hsu: Yes. Now, just so as not to leave the listeners with the wrong impression: you can have two physicists meet and talk about some super speculative, crazy thing about how the multiverse might work in string theory or something.
And we completely entertain that. But when push comes to shove and someone says, are you gonna send these thousand experimentalists, and half of our budget, to build that machine to test the speculation you just talked about, at that point the standards are very high and we're gonna have a real talk about what this is all about. And that isn't necessarily the case out here, where there's billions to be made.
Richard Ngo: So I think in some ways physicists have been almost privileged, though that isn't quite the right word. You've been working in a domain that's so well understood, for such a long time, that you can hold those standards.
And it seems like academia has kind of bifurcated between people who hold those standards so rigidly that they can't talk about a lot of interesting stuff, a lot of academic psychology, for example, versus people who have abandoned those standards entirely.
That's the continental stuff, sociology and gender studies and so on. And I do think a bunch about how you can get the benefits of both of those, because right now it feels like very few academic communities can actually manage to even try to be precise about the most interesting things.
Yeah.
Steve Hsu: So, coming back to your work as a futurist at OpenAI, talk about how this came about. At the top level of leadership, is Sam someone who says, hey, we have to have some really good futurists in house and I want to be able to talk to them, just to understand what the consequences of our work are? Or was it a little more bureaucratic, where someone said, hey, we should have futurists, and someone okayed it, and then there was a budget line, and then...
Richard Ngo: No, Sam wanted one. Sam pinged me on Facebook. I think he'd previously had a different futurist, Michael Page, I believe, and now he's got a new futurist, Josh Achiam.
So it's been something on his mind for a while, and I met with him pretty regularly when I was at OpenAI. So it was basically driven by him. I don't think it would've happened if he wasn't very interested in it.
Steve Hsu: So I've known Sam for a long time. He was an early investor in one of my startups, before he actually joined OpenAI, when he was still at Y Combinator. And I've always felt he is a pretty straight talker. People hate him so much now because he's the face of the AI bubble, if you like. But I feel like even his corporate speak, if you parse it in a kind of forgiving way, if you steelman what he says a little bit,
Richard Ngo: Mm-hmm.
Steve Hsu: It's, it's not unreasonable. I don't think he's directly lying about anything. I could be wrong. Now, you, you probably have heard a lot more of his recent speech than I have, so, but I, I always reflexively defend him because I've always felt like he was, for example, in your case, honestly interested in futuristic implications for what he was doing. May, maybe you could comment on that.
Richard Ngo: I think Sam has a very earnest mode and he has a Machiavellian mode. And because he's so good at both modes, I think people get quite surprised, and sometimes shocked,
Steve Hsu: Yeah.
Richard Ngo: to see the transition. And to be clear, I'm not alluding to anything that isn't public. I think you can observe this just from how the various public dramas at OpenAI have gone down. I have some sense that it's hard to explain what happened at OpenAI without knowing that Sam has an extremely earnest mode, and so people end up disproportionately surprised or shocked when he turns out to not always be earnest.
Steve Hsu: I think you put it much better than I did, so I would agree with you. He has an earnest mode, and he has, I would say, an earnest intellectual interest in what's happening here, which is related to having futurists. But I would say, given my experience running a university or running a startup, you can't survive as a leader of an organization, and honestly you can't even really fulfill your responsibilities to the rank and file of your organization, unless you can sometimes go into Machiavellian mode. I'm not justifying anything Sam did in particular. I'm just saying sometimes you have to go into that mode because the world is the way the world is.
You don't have to agree with me philosophically, but my view from my life experience is that even if you have fully, a hundred percent altruistic goals in mind, to achieve those goals for your organization you sometimes have to operate in a Machiavellian mode.
Richard Ngo: I think that's somewhat true. And one thing I have struggled with is the question of what standards you should hold AI company CEOs to. People had these debates inside OpenAI when there was the non-disparagement scandal: some people would say, well, actually this is reasonably normal for a hedge fund.
And people would say, well, we're not trying to be a hedge fund, we're trying to hold ourselves to very high standards. So in some sense the question is what standards you think the CEO of an AGI company should be held to: should it be about the median of how ethical a CEO is in their day-to-day life, or should it be much higher than that?
And I think insofar as AGI company CEOs think of themselves as trying to gain an extremely large amount of power, then it seems pretty reasonable to hold them to concomitant standards. Now, a lot of people think of that as hype, right? A lot of people will say they're just talking their book.
They don't actually think that they're poised to create world-changing technology. But I suspect that fewer and fewer people think that over time, and even fewer people will think that by the time this episode airs, whenever that is. So it's very tricky, because there's no real reference class.
Maybe in some sense the reference class should be things like generals in wars, or political leaders, people who are knowingly taking on a very large responsibility as they see it. And generals might end up not being particularly important, but at least they're trying to put themselves in a position where they're steering things that affect all of humanity.
Steve Hsu: Yeah. For example, you could say that the president of the United States, who always has an officer following him around with the football,
Richard Ngo: Right.
Steve Hsu: that can launch a thousand nuclear warheads at another country, that person has a similar level of ethical difficulty to manage as maybe Sam does,
Richard Ngo: Probably a bit more than Sam does.
Yeah. But yeah.
Steve Hsu: But he's getting to that scale.
Richard Ngo: Yeah. Right, right, right.
Steve Hsu: So speaking of that, could you talk about, you sort of just said something about, the likelihood of the machine God. I'm exaggerating a little bit, but just for fun, it's more fun if we talk this way. The likelihood of the machine God emerging here in San Francisco sometime in the next five years, or ten years. How did your feelings about the likelihood of an event like that evolve, from the time you were at DeepMind, through the time you were a futurist at OpenAI, to sitting on this couch?
Richard Ngo: Yeah. I think there's a lot of social thinking in this space, right?
There's, oh, what would it mean for me if I believe this? What sort of identity claim am I trying to make? Even the title of this documentary: am I one of the doomers or am I one of the dreamers, right? People are trying to put themselves in those buckets.
So a lot of the evolution of my thinking has been taking a less identity-based approach to this question, thinking less about whether I'm weird for believing this or not believing this. There was a period of updating around GPT-4 and reasoning models coming out, when things seemed like they were going pretty fast.
So maybe that was a bit of a spike, where I started talking about things being pretty crazy within a decade, let's say. And my intuitive sense now is that there's a kind of robustness problem that's taking a little longer to solve than people think. Especially robustness and autonomy, those kinds of things.
So when you think about running a company, for example, it's not just that you have to be able to take actions over a month or a year. It's also that you have to take actions while potentially very intelligent people are trying to scam you out of your money. And that kind of robustness does seem like it's a little harder than we used to think. I had this t-AGI framework, which said that instead of thinking about AGI as a discrete event, you might think about an AGI that does the things humans can do over a period of ten seconds, or a hundred seconds, or a day, or a week, or a month.
And so I sort of expect that metric to keep going up gradually over the next ten or fifteen years.
Steve Hsu: My friend Dwarkesh likes to talk about continuous learning.
Richard Ngo: Mm-hmm.
Steve Hsu: I would say before he started using that, I used to say to him, look, as long as this thing can't update its weights at least partially in real time,
Richard Ngo: Mm-hmm.
Steve Hsu: it's just a very useful tool, it's not threatening to me. I used to say that to him. But where are you on that question?
Richard Ngo: My sense is that you probably get a bunch of pretty hacky solutions, and then eventually the hacky solutions become a little more principled. But it's not clear that there's a sharp line between the hacky workarounds and actually solving the problem. So, how long can your context window get?
For humans it seems like we update our weights every night, and a day is a pretty long time. You can do a bunch of stuff in a day, you can learn a bunch of things, and the real weight updates haven't happened yet. So my sense is that the claim that any particular thing is gonna be a big bottleneck is a choice of how to carve things up, and it's difficult to carve it in the right ways.
I do suspect that the autonomy robustness thing is a pretty important component, and there's some sense in which that's related to continual learning, but it feels complicated.
Steve Hsu: You know, if we had one of the most talk-your-book hyperscalers in the room with us, they might just say, look, Steve and Richard, any problems that you're pointing out, as we scale more we'll just see some emergent behavior where suddenly the things you guys thought were challenging, it can just handle. The extra hundred layers of depth that we added, it built some extra structures to manipulate primitives like that, and suddenly it can do it.
Richard Ngo: Yeah.
Steve Hsu: Would you be surprised or not surprised if that
Richard Ngo: No, I wouldn't be surprised.
It's just that the timing of the emergence feels non-trivial. That all might be true, but why is it in the next order of magnitude of compute?
Steve Hsu: Absolutely.
Richard Ngo: Yeah. Yeah, yeah.
Steve Hsu: I think the separate question is, it could resolve that way, but how well could you predict the timescale, or how it's gonna resolve, or how much compute we need before it does?
Good. So we haven't talked at all about P(doom). First, say what you think about P(doom) as an attractor in conversation space. Does it annoy you when people talk about that? Is it ill-defined, or is it a sharp question that's a good thing to talk about?
Richard Ngo: I think it's pretty ill-defined in kind of the same way that timelines to AGI are also ill-defined.
If you think about trying to define AGI, nobody has a sufficiently good picture that they know exactly which milestones to benchmark it to. In the same way, imagine somebody in 1500 trying to predict the timeline to the Industrial Revolution: you don't know what the Industrial Revolution is in the year 1500.
And similarly with doom. I think if we actually understood what doom was well enough to give a probability of it, then that would be half of the way towards preventing it. So even when you try to operationalize it in a pretty clear-cut way, you still run into all sorts of issues.
Maybe some people define it as existential catastrophe in the sense of losing most of the possible resources we could have gained. But then maybe most people are actually pretty happy just staying within the solar system and not getting all the other galaxies. Or maybe you define it as everyone dying.
But some people would say uploading counts as dying, and some people would say it doesn't count as dying. So I think it gets pretty messy. There are a couple of ways in which it's useful to try and resolve that mess into a single number, but mostly it's pretty rough.
Steve Hsu: So I use the term machine God just because I think it's fun. But let me try to sharpen it a little bit. I think there is a version of doom, or sort of pre-doom, that results from our creation of a machine God. So imagine that.
I think you mentioned autonomy and the ability to execute effectively over some lengthy timescale. So imagine that timescale starting to go to infinity.
Richard Ngo: Right.
Steve Hsu: It's autonomous, it has things it quote wants to do that we don't fully understand.
Richard Ngo: Yep.
Steve Hsu: And in some sense, it's much smarter than we are.
Richard Ngo: Yep.
Steve Hsu: Okay. So in that sense it's similar to us meeting a very superior alien species. The spaceship just lands in New York.
Richard Ngo: Yeah.
Steve Hsu: And then now it's the day after that.
Richard Ngo: Yeah.
Steve Hsu: I think some people like Eliezer might say, okay, once they're here, and once they're way more powerful than us, and we don't know what they want and we can't predict what they're gonna want,
that is an existential risk. Whether the risk at that point is 1% that they're gonna smush us or 99%, we don't like the idea that there's a thing we created sitting there that could smush us.
Richard Ngo: Yeah.
Steve Hsu: Do you think that formulation is useful?
Richard Ngo: I think in the long term you're always gonna get to the point where you've created something that could smush you, at least if you think of yourself as a biological human. Biological humans are not going to be in charge forever. Hopefully we get upgraded humans, hopefully this process goes well and carefully. But there's some important sense in which a thing can be more or less our successor.
Or it can even be me, and I can just no longer be a biological human. I'm a little wary here that when I'm trying to add these nuances, for some people listening it's gonna sound like you should be confused about everything, and therefore you can come away with a vague sense that nobody knows what they're talking about.
I do want to say: the instinctive reassurances that people often reach for, like, oh, nobody knows what the P(doom) is, therefore we're all fine, or nobody knows what intelligence is, therefore we won't build general intelligence, I don't think they work.
But I'm trying to speak particularly to the people who are interested enough in this topic to take it seriously and then often seem to reach for these compressions of the topic, whether that's doom or timelines, or some clean separation between handing over power and not handing over power. In some sense the whole game is to figure out what we mean by all of these things, so that we can actually meaningfully plan for it.
Steve Hsu: I would say, as a person older than you who read a lot of sci-fi growing up, it's like, here's an impossible problem: write all the sci-fi plots in which some kind of strong AI emerges. Okay, that's already pretty hard. And then, maybe we could establish a notation, like scenarios A through Z, sub one, sub five, and are we really gonna be able to have a coherent conversation about all these different possibilities, and develop probabilistic weights over these outcomes, and which ones are thriving for us and which ones are successors, and
Richard Ngo: yeah,
Steve Hsu: it's pretty complicated. Maybe there are communities here in San Francisco that have advanced the terminology and conversation to the point where they can speak coherently to each other about this. But I even question that. I've not heard such a conversation. Maybe Eliezer and Nate Soares can have that conversation between each other, because they've talked about it for 10, 20 years or something.
Richard Ngo: Yeah.
Steve Hsu: Taking their case, if I were personally to steelman it, I would say something like this: you guys are rushing as hard as you can, as fast as you can, toward creating a thing which could lead to really good outcomes. Maybe we all get uploaded, we all get to live in our own utopias, and you could regard that as thriving for humans. But there's some tail risk of it leading to something really bad, where we all just get smooshed.
Richard Ngo: Oh, they wouldn't even say tail risk. They'd say the very likely outcome.
Steve Hsu: Right. So on that particular point you just raised, when I do talk to those guys like Nate and
Richard Ngo: Yep.
Steve Hsu: Eliezer, the one place where I object to them is that they seem to make this move where, I accept it as, hey guys, why are you rushing to create the tail risk? As opposed to the way they sometimes say it: hey, why are you rushing to create the probability-one event
Richard Ngo: mm-hmm.
Steve Hsu: that we're gonna get smooshed? And I don't understand that move. Sometimes, charitably, I would say they're just being sloppy or quick in talking to the interlocutor.
Richard Ngo: Yeah.
Steve Hsu: But sometimes I think they actually believe it's probability one that we get smooshed.
Richard Ngo: Yeah.
Steve Hsu: So I'm not sure. Well, when we interview them, we'll ask them to try to refine that. But
Richard Ngo: yeah.
Steve Hsu: to me, if I were to steelman them, I would just say, look, you can make a decent case that, just like with the creation of nuclear weapons or mastering nuclear energy, great benefits could result from this for humans, but there's clearly some tail risk that wasn't present before that you guys are introducing into the world, and maybe we should be more careful about introducing that tail risk.
Richard Ngo: I don't think I buy the tail risk framing, because I think there's a sort of implicit assumption underlying it that the world goes on as normal unless something weird happens. The actual thing that we're picturing with AGI, and especially superintelligence, is that the world goes haywire in a bunch of ways, almost definitely. So in some sense P(doom) is a worse question than P(normal): what's the probability that the amount of change we see in the next few decades is less than the change we've seen in the last century? And if you think that amount of change is just enormous, then tail risk doesn't quite seem right.
It's more like we live in this future that's very hard to model, at least I think it's very hard to model. They would say it's easy to model, because the AGI does whatever it wants. But the mainline scenario is just that things are whack, and that could go well or it could go badly, and it seems very hard to put a probability on. That's my rough sense.
Steve Hsu: Okay. But let me give you this scenario. Humans get together and they say, hey, genetic engineering.
Richard Ngo: Yeah.
Steve Hsu: Okay, the timescale for that is a little bit longer, because the little superhuman embryo has to grow up.
Richard Ngo: Yep.
Steve Hsu: Right. So we can maybe see what's coming over longer timescales.
But what I really am worried about, going all the way back to this guy Butler, who inspired the term Butlerian Jihad, is I just don't want machines that can think better than me. Let's just not make machines that can think better than me. We can have mutants, we can have Guild Navigators taking spice.
We just don't want these machines that are vastly superior to us. We can have machines that are weaker than us.
Richard Ngo: Yeah.
Steve Hsu: And the moment you cross that line where you let the machines be better than us, and maybe recursively self-improving in some way, then I get really worried. And I feel like you've introduced a qualitatively new catastrophic tail risk into the world. So that framing is, to me, okay. How do you
Richard Ngo: feel? Sorry, it's just the tail... maybe, when you say tail, do you mean tail of likelihood or tail of outcome?
Steve Hsu: Oh, I meant tail risk as in, taking the most charitable view of this: maybe in many scenarios you will be able to control that new thing, the machine God, but there's always some chance you won't be able to control it. Just like maybe we'll get through the nuclear age without wiping out life on earth, but no matter how well it goes, you've introduced some small probability that it's gonna go really badly.
Richard Ngo: Yeah. So I feel a little confused by your emphasis on the small probability, because in the worlds where things are extremely weird, I can't say anything about small probabilities at all. I'm sort of, you know...
Steve Hsu: Yeah, just to correct that: I'm not asserting that the probability is small. I'm saying even if you are very optimistic, you should probably admit there's some small risk. And then even the introduction of that small risk, if we could delay it,
Richard Ngo: Right.
Steve Hsu: might be wise.
Richard Ngo: Yeah.
Steve Hsu: Yeah,
Richard Ngo: So I think I'm basically on board with you, but I find myself averse to reasoning about small risks. And this is part of it: if you have, say, an 80% probability that the future is really weird, well, then if you've got a small risk of extinction in this way, okay, but maybe that's massively outweighed by all the other stuff that you should be thinking about in this crazy future.
And so my sense is that the small risk argument only goes through if you think things will probably be normal, but we've got to prevent the small risk. And if it's more like, probably everything's gonna be crazy, then actually maybe the small risk is quite low down on my list of priorities.
Steve Hsu: Yeah. Framing it in that tail risk, small risk way, that's a future scenario which is geared toward normies. So say I'm talking to the Joint Chiefs of Staff, or I'm talking to a bunch of guys at Davos,
Richard Ngo: Yeah.
Steve Hsu: And they're just like, Hey Steve, why should I be worried about what Sam's doing?
Richard Ngo: Yeah, yeah, yeah.
Steve Hsu: and if I extrapolate too much, I'm just gonna lose them, because they're just gonna say, you're talking about things that are fantasy. It's no longer science fiction, it's fantasy. So if I stick close to this normie framing,
Richard Ngo: Yeah.
Steve Hsu: like what you just said, things are gonna stay close to normal,
Richard Ngo: Yeah.
Steve Hsu: but you let Sam make a machine God, and then we say, well, is there some chance the machine God will cause a catastrophe? I think most normies can think that way.
Richard Ngo: Yeah. So I think the AI safety community has done a pretty bad job of framing its message when it's tried to water down its message a little bit to appeal to normies. One example of this is people talking about bioweapons a lot, right? A bunch of people, especially EA-affiliated people, think, what's the best way to present AI risks to normies?
Well, it's probably biorisk, because biorisk is a pretty easy-to-understand risk. But then the problem is that the things you should do if you're worried about biorisk might in many cases be the exact opposite of the things you should do if you're worried about AI takeover risk.
For example, if you're worried about terrorists building bioweapons, you should consolidate power a bunch. But maybe that's exactly what fucks you in the world where you're worried about some kind of takeover, because now you've got all the power in one place to be taken over.
And I think there's a similar divergence between the real scenario and the watered-down scenario. It's not quite as stark in the way you are describing it, where it's, oh, there's a small probability. The types of things you do to mitigate a small probability are, you put a couple more layers of safeguards in place, right?
You try to patch the issue. Whereas if people are thinking in terms of, oh, there's a fundamental problem, and it's not necessarily an 80% chance that things are gonna be bad, but it's an 80% chance that things are gonna be weird, now at least they have a chance to start thinking about more fundamental kinds of interventions.
And this is gonna vary a lot by person, I think. Obviously it depends who you are talking to, and maybe they just laugh you out of the room, but being laughed out of the room is a surprisingly powerful strategy. I think Sam Altman has had a few times where he's been laughed out of a room when he talks about AGI and been vindicated subsequently, Eliezer probably as well. So I suspect that people are underrating the strategy of basically just saying what you believe and letting the few people who are going to take you seriously actually take you seriously.
Steve Hsu: So coming back to the creation of a machine God here in San Francisco, maybe in the next five or ten years: maybe this autonomy question that you raised, or the continuous learning problem that I raised, gets solved in a kind of hacky way at first.
It's still, at the core, a transformer-like LLM, but we glue on these other things that can help it learn a little bit, or move from the context window into updating some weights a little bit when it's asleep, however this stuff works. Is there any question in your mind that, barring a literal Butlerian Jihad in the next five or ten years, there will be an order-one change in the way we live in the next five or ten years?
Richard Ngo: I think it's plausible that... so when I say things get weird, a lot of what I'm referring to is the centers of power getting weird. But it's possible that those centers of power become very small. If I think about to what extent I expect that Africa, for example, will be changed by AI innovation, maybe not very much at all, right?
Because we already know the types of technologies that would be needed to fundamentally change the physical landscape of Africa; there are other problems getting in the way. So I can certainly picture scenarios where the changes are sufficiently concentrated that most people are living their lives in a pretty similar way.
It's just that the kinds of levers of power that they used to have become much less substantial, in a similar way to how, for continents that aren't Asia or Europe or North America, very little changes in them when a bunch of data centers are built around here, but those data centers are consolidating power very strongly.
Steve Hsu: Yeah, I think it's a fair point that there's an extra degree of uncertainty in asking how things change in Africa, or
Richard Ngo: yeah.
Steve Hsu: Or in the Amazon Forest or something.
Richard Ngo: But maybe we all just become Africa or the Amazon in some important sense.
Steve Hsu: Say that again? I didn't quite...
Richard Ngo: So maybe almost all humans
Steve Hsu: Yes.
Richard Ngo: become roughly the equivalent of Africa, in how far behind the technological frontier we are, how irrelevant our labor is to the global labor market.
Steve Hsu: Okay. But then let me refine what I meant by an order-one change. Let me first focus on the richer, more developed countries, and maybe even the classes of people
Richard Ngo: mm-hmm.
Steve Hsu: that are rich and well connected today. Can you imagine a scenario where, barring a jihad that stops the things going on at Anthropic and OpenAI, et cetera, that specific subcategory of people, which you could say is only like ten or a hundred million people on the planet, their lives won't be altered in some significant way in the next five or ten years?
Richard Ngo: Yeah, I can imagine those scenarios. Let's see. One story is just that the AI labs decide to invest their compute very disproportionately in things that are going to make them more money than interfacing with the average consumer.
And to some extent it's already surprising that AI companies are so focused on serving models that a wide range of people can use, because at some point AI becomes so capable that the opportunity cost of compute potentially becomes very large.
And then you start to invest it in the kinds of R&D that maybe have a very strong long-term payoff but don't actually make people's lives that different. So that's one possibility. Another possibility is the AIs themselves. I think AI motivational systems are kind of weird, and I can certainly imagine a world in which AI motivational systems, as they get smarter, become less and less amenable to certain kinds of uses, specifically the uses that might disrupt people's lives so much.
Claude already has a bunch of preferences about the kinds of things it will and won't do. At some point I expect it becomes very salient to the labs what their own AIs think of them. You might be trying to be a trustworthy CEO so that your AIs know that you'll make deals with them, so that they are happily doing research for you instead of trying to undercut you.
So I can imagine scenarios like these. But by default, I think people's lives change a bunch, mostly on the social dimensions, some kinds of biotech, diseases being cured over the next five or ten years, a bunch of kinds of economic disruption, although it's a little hard to know how much. I have complicated feelings about the economic disruption stuff.
Yeah.
Steve Hsu: So we've talked a lot about questions that I had for you.
Richard Ngo: Yeah.
Steve Hsu: I would like you to talk a little bit about what you currently are working on, and what you would like an audience of professors and intellectuals and startup people to learn from your insights.
Richard Ngo: Yeah.
So one story about the deep learning revolution is that it's the thing that happened when the field of AI gave up trying to understand cognition. And there are some very good arguments in favor of not trying to understand cognition, namely that you can create it without understanding it, as we seem to have been doing over the last decade or two. But certainly it's a bit of a shame that people are throwing themselves so much into building things we don't understand instead of trying to understand the things that we've built. And by understand, I don't just mean the shallow kinds of understanding, like evals, that people are doing. If you were trying to understand biological systems, you wouldn't just be measuring them, or maybe you would, but only as part of a strategy for understanding deeper principles of evolution and genetics and so on.
So one way I think about AI alignment research is as the subfield that's still trying to deeply understand the principles of cognition that govern intelligent systems. My own approach to this is pretty philosophical, just because of my background. A lot of my thinking lately has been about trying to understand AI risk arguments, what Eliezer's concerns are, for example, and the place I've ended up is thinking about intelligent agents as multi-agent systems generally. So I think of myself as having different subagents that model different parts of the world, that have different goals, that ally together in different ways. I find this very interesting in part because not only does it help me predict my own experience, like why I procrastinate instead of going to bed,
where I have different trade-offs and internal conflicts going on, but also because it's a bridge between the social sciences and AI, where various sociological models of multi-agent dynamics are, I think, starting to predict some phenomena in AI systems themselves. Maybe one example I'll give, which is still a loose analogy right now but has the potential to be pinned down in a potentially very insightful way, is that when you have multiple agents in conflict, you often get a kind of polarization, where maybe they start off with moderately aligned interests.
But then, because they have this conflict, they start to pull apart. You see this with the Republicans and the Democrats: they just do the opposite of what the other is interested in, for the sake of doing that. In the psychological context this is sometimes called the shadow, right?
You have some set of goals, and then you have the exact opposite set of goals that's formed via resistance or resentment to them. And in AI we're starting to see this, it's sometimes called the Waluigi effect: when you fine-tune a system to pursue some set of values, that makes it easier to evoke the opposite set of values.
So that's one stylized example of how you might apply principles of multi-agent interaction to make predictions about AIs. You can also think of the personas within AIs as subagents that are interacting: different personas surface at different times, and they can have conflicts about exactly which subagent is in charge of which actions.
So my research right now is mostly trying to understand these kinds of phenomena on a pretty theoretical level, because we don't have good theoretical models of group rationality or group intelligence.
Steve Hsu: Is the stuff you're thinking about now more like the game theory of interacting agents, or is it more like mechanistic interpretability, where you're able to look inside a deep network and see agents doing different things in opposition? Is it
Richard Ngo: More like the game theory. But I hope that this will help provide the concepts that mech interp researchers can use, because right now the mech interp people don't actually know what they're looking for, I'd say. And so if you can characterize subagents, then maybe that helps narrow down the search process.
Steve Hsu: So at this phase, you've left OpenAI and you're doing this research independently.
Richard Ngo: That's right.
Steve Hsu: So how do you define your own goals? Is it sort of, I have this period in my life now where I can do exactly what I want, and this is what I want to do, and then I'll publish or post my thoughts and results to share with the rest of the community? Is it a temporary phase in your life? Do you see yourself in this phase for a long time? Just talk about that.
Richard Ngo: Ideally indefinitely. I would be curious about your experiences here, but my sense is that the constraints of being inside research institutions bind even when they're quite subtle. For example, theoretically tenured professors have all the freedom they want, but in practice somehow they all end up working on the same stuff that they worked on before they got tenure. So I think in many ways my job is to try to insulate my thinking from that kind of pressure.
And there are a bunch of researchers out there who have much stronger backgrounds than I do in all kinds of things, theoretical machine learning, economic frameworks, multi-agent frameworks. So the advantage I'm hoping to lean on is being insulated from those pressures enough that I can think the unpublishable thoughts for long enough that they eventually lead me to publishable thoughts.
But yeah, does this resonate with you?
Steve Hsu: Well, it does. A hundred percent. Yeah. So it's always been a mystery to me why professors don't, quote, make use of their tenure. So what do I mean by that? Well, if they give you a permanent position, that term is used all the time in academia. Like, does he have a permanent position?
Sometimes people say tenure, but in physics we say permanent position, because you could be tenured at a lab, not even at a university, but have a permanent job at SLAC or something. So one would think that once you have a, quote, permanent position, you could be much more free in what you think about and what research projects you pursue.
I'm one of the weirdos who works on lots of different things, and I fully intend to make use of my tenure. Right. To, like, have this conversation with you and learn from you, even though a lot of physicists would say, what are you doing, Steve? Why aren't you thinking about quantum fields or something?
Richard Ngo: Right.
Do you, do you have PhD students?
Steve Hsu: I don't currently have PhD students,
Richard Ngo: so I think that might be it.
Steve Hsu: Yes. So part of it is a responsibility to be able to quote place your students. Right.
Richard Ngo: Exactly. Yeah.
Steve Hsu: But even if you're very strong and you're working in a canonical, canonized area of whatever your core field is, it's still not easy to place your students. Right. So even the top people have the conversation with their students where they say,
Richard Ngo: Yeah.
Steve Hsu: This is a great subject. You'll never regret learning this stuff and doing some research in it, but chances are you will not get a position, permanent position in this field.
Richard Ngo: Yeah.
Steve Hsu: And if you're gonna have that conversation, you could also just say, yeah. And by the way, what we're working on is a little wacky, so your probability is even lower.
Richard Ngo: Yeah.
Steve Hsu: But if you're okay with it, we can work on this for the next few years. Right, right, right, right. So I think that's fine. I think that's not the primary reason, actually.
Richard Ngo: Interesting.
Steve Hsu: I think, number one, the people who toil in the field long enough to get tenure or the permanent position are already selected to be a little bit conformist.
Richard Ngo: Right.
Steve Hsu: Okay. So first of all, there's that. So the ones you let through are already selected for conformity, so they're not gonna freak out. They're not gonna suddenly take advantage. Yeah. Of the situation. Secondly, there are all these subtle things, like you want to be invited to the conference by your peers, right, right, right.
You want to be giving a keynote talk, not a parallel session talk. So there are all these prestige things, social status things, that basically keep you in line. Also your raise. So, like, when your department head says, is your raise 5% this year or 3%, Richard? Well, Richard, you've been doing this slightly strange stuff.
You're a brilliant guy. But you know, the outside reviewers said what you were doing wasn't really having such a big impact in our field. Yeah. Your raise is 3% this year.
Richard Ngo: I totally heard you say 'your race,' which is probably also in there. Oh,
Steve Hsu: Raise, raise. Sorry. Apologies. So there are all these just subtle things.
Which are used, I wouldn't say used to control people, but which end up controlling people.
Richard Ngo: It's, it's interesting that,
Steve Hsu: Yeah.
Richard Ngo: It's interesting that you reach for that word, though.
Steve Hsu: Yeah.
Richard Ngo: cause I think, I think in some sense they are used just not by any individual. It's more like used by the
Steve Hsu: system.
Richard Ngo: Some kind of like emergent
Steve Hsu: Yes.
Richard Ngo: Like social
Steve Hsu: Yes.
Richard Ngo: Organism.
Steve Hsu: Yes.
Richard Ngo: Yeah.
Steve Hsu: I mean, probably, you know, the history of neural nets. So the reason Hinton and all these guys, LeCun, were not at the top CS departments, right, they were honestly at kind of second-tier CS departments,
Yeah. Is because the most prestigious top-tier CS departments would not hire anyone who worked on neural nets. Right. Right. It wasn't considered a serious CS subfield.
Richard Ngo: And, and very strangely, this was downstream of the Perceptrons book, which had a sort of obviously irrelevant argument about why neural nets were not gonna work.
Yeah. This was
Steve Hsu: Minsky,
Richard Ngo: right? Yeah. Right. Exactly. That you couldn't implement XOR, I think it was
Steve Hsu: yes.
Richard Ngo: In, in like two layers.
Steve Hsu: Yeah.
Richard Ngo: But of course you can in three layers
Steve Hsu: Yes.
Richard Ngo: Or something like that.
Steve Hsu: Yes.
Richard Ngo: I can't remember exactly the number of layers, but right. It's interesting to see the field being derailed by that kind of transparently weak argument.
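(To pin down the fact being gestured at: the usual statement is that a single-layer perceptron cannot represent XOR, while a network with one hidden layer can. The sketch below is a minimal illustration in NumPy, not anything from the conversation; the weights are hand-set.)

```python
# Minimal illustration: a single-layer perceptron cannot compute XOR,
# but a network with one hidden layer can.
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR truth table

def step(z):
    return (z > 0).astype(int)

# Brute-force search over single-layer perceptrons step(w1*x1 + w2*x2 + b):
# no weight setting reproduces XOR, because XOR is not linearly separable.
found_single_layer = any(
    np.array_equal(step(X @ np.array([w1, w2]) + b), y)
    for w1, w2, b in itertools.product(np.linspace(-2, 2, 41), repeat=3)
)
print("single-layer perceptron can do XOR?", found_single_layer)  # False

# One hidden layer with hand-set weights: hidden unit 0 fires on OR,
# hidden unit 1 fires on AND, and the output computes OR AND (NOT AND),
# which is exactly XOR.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -2.0])
b2 = -0.5
hidden = step(X @ W1 + b1)
output = step(hidden @ W2 + b2)
print("two-layer network output:", output)  # [0 1 1 0]
```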
Steve Hsu: So I don't know the history at that micro level, like I don't know how big a role that specific, not wrong but sort of misleading, argument played. I know that many physicists got interested in neural nets and then worked in that field. You know, they were just told, this is not real physics, and why are you doing this?
And similarly in CS, even more strongly, they would be told, this is not computer science. Maybe you can do this in a biology department or a physics department. So there just is a kind of group conformist
Richard Ngo: Yeah.
Steve Hsu: feeling at any time. And if you go against it, you're gonna suffer.
They're gonna make you pay. That's just how it is.
Richard Ngo: So that's the thing I want to try to model. Because you can't talk about that in economics. And in sociology, well, sociologists are talking about it, but not in rigorous ways. I feel like there were attempts from a while back to study this stuff rigorously, and I'm still digging into the literature. But I do think understanding conformity matters. I actually have a blog post out on this from just two days ago, where I talk about the question of why people go to university for undergrad at all.
And my sense is that you can't model this as a rational process of gaining knowledge, and you also can't model it as a process of signaling. You have to model it as a process of induction into a certain value system, and maybe a kind of credible commitment to elite status, is my sense.
Steve Hsu: Yeah. I mean, I think we're diverging a little bit from the core topic of our conversation, but the persistence of higher ed, the reason why most people choose or are told to choose, yeah, to invest in higher education: it's a combination of signaling
Richard Ngo: mm-hmm.
Steve Hsu: Actual learning and just the idea that everybody else is doing it, so I should do it right. So, so it's, it's pretty complicated. And you know, I think one of the big challenges to higher ed will be once people can learn all they want by talking to an ai, how is that gonna affect our university system?
Richard Ngo: Right.
Steve Hsu: And we could be in for radical changes in higher education in the next five or 10 years possibly.
Richard Ngo: Yeah, I hope so.
Steve Hsu: Yeah.
Richard Ngo: But I do think this is maybe almost more central than you think it is. Like, when you try to understand the culture of building AGI in San Francisco, on the individual level you really have to look at the egos of the people involved.
And that's, you know, Sam and Dario and Elon and Demis and so on. But then on the group level, there's something a little more, like, people around here don't do ladder climbing in the same way that they do in most other parts of the world, but there's something a little analogous to ladder climbing: this impulse to push your way up, where somebody defines the metric for you and then you chase that metric. Yeah,
Steve Hsu: I would say in any group. So once you've, you have a reference group that you define yourself as part of, then there is a hierarchy in that group. And there's a tendency to want to achieve higher status in that, in that group.
Now, a hundred percent independent thinker can just say, look, I'm not part of any group. I'm just pursuing what I'm interested in and I'm gonna share my results with the rest of the world. And to me, that's, in a way, the most admirable kind of state to be in. Yeah.
But most humans cannot get into that state. Right, right, right. Yeah. So most humans are forced by their, you know, wiring or whatever, to just always be playing a status game. Of course, the group they define as their comparators for that status game is fluid. It could be, oh, I wanna rise within anthropic, or it could be, I wanna rise within the world.
You know, all the people that go to NIPS and read papers, who knows what, what they, how they define it. Yeah. But it does seem depressingly standard for most people to end up in that situation.
Richard Ngo: So I, I almost wish that people were playing more of a status game. 'cause I think we're almost in this uncanny valley where people are like thoughtful enough that they're able to break out of the like, conformist mindset.
Right. Which is how you get into AGI in the first place. Right. You have to, at least historically you had to like, ignore a bunch of people telling you, you are crazy.
Steve Hsu: Maybe not so anymore.
Richard Ngo: Right, right, right, right. But then not thoughtful enough to actually have a picture of where they want to go and try to work towards it.
So they're in this middle ground where you just pick some metric and you optimize the hell out of that. Yeah. And that's a lot of the emergent culture
Steve Hsu: Yeah.
Richard Ngo: that I end up seeing, perhaps.
Steve Hsu: Yeah. Part of that here in the US is that you have a few big labs and they're all heavily corporatized in a sense, right? I mean, in order to accomplish what they want to accomplish. Yep. Now, what I find interesting about the Chinese AI scene is that, for various reasons, they're all open sourcing, open weights.
Richard Ngo: Yep.
Steve Hsu: And they reveal much more in their papers about what they did. There is a much healthier meritocracy, where if it comes out that this is the dude who made the optimization that made, you know, sparse attention work really well, that guy gets huge amounts of street cred from everybody.
Whether it's a professor or a grad student or someone at one of the other labs, that person gets a ton of street cred. Mm-hmm. Here it's sort of strange, 'cause everything's a little bit hidden. It's like, well, who really is responsible for how good 5.3 is? Yeah. Nobody knows. Right. Maybe within OpenAI they know, but maybe it's all political. But I like this open aspect of the Chinese system.
It's actually feeding into a kind of meritocratic thing, where people openly admire the guy who actually made the contribution, which I think is really good.
Richard Ngo: Interesting. Yeah. I think, being a little more on the inside, I have a bit of a bias towards thinking that things are more transparent than they are. Like, I sort of have a sense that I can go and ask people who's really good, and then they'll tell me, but that information is, yeah, not maximally accessible.
Steve Hsu: Yeah. Well, great, Richard, this has been a tremendous conversation. I've really enjoyed it. I hope the listeners enjoyed it as well.
Richard Ngo: Fantastic.
Steve Hsu: Thanks a lot for being on Manifold.
Richard Ngo: Yeah, no, thanks. Great chat.
John: Okay. Richard, could you give us an overview of the sort of AI and AI safety landscape?
Richard Ngo: Sure. So I guess the earliest work that was done in AI safety, the earliest research direction that formed, was agent foundations, which is basically an attempt to understand the theoretical foundations of intelligence, building on things like game theory and probability theory but trying to fix various problems with them. That's always been a relatively small field, and it's still chugging along. And then I guess the various branches of interpretability emerged.
I think the one that people in the AI safety community are most interested in is mechanistic interpretability: trying to analyze neural networks on the level of individual neurons, how they form circuits, and how they end up implementing different algorithms.
I think of those as the two most central alignment research directions, because in some sense they're the ones that are trying to understand intelligence at the most fundamental level, either theoretical or empirical. Then you've got a bunch of stuff that is a little less principled and a little more, maybe, directly useful, let's say.
So this is stuff like scalable oversight, trying to get human feedback to steer AI systems, and things to do with AI control: setting up systems of agents monitoring each other, such that even if some of them went rogue, you'd still expect the others to catch them. Yeah.
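(A stylized sketch of the control pattern just described, not anything from the conversation: an untrusted agent proposes actions, and a separate monitor reviews each proposal before it is executed, with suspicious proposals escalated, for example to human review. All the names and the threshold here are hypothetical.)

```python
# Toy sketch of an "agents monitoring agents" control loop.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Proposal:
    action: str
    rationale: str

def control_loop(
    untrusted_agent: Callable[[str], Proposal],
    monitor: Callable[[Proposal], float],   # returns a suspicion score in [0, 1]
    execute: Callable[[str], None],
    escalate: Callable[[Proposal], None],   # e.g. route to human review
    task: str,
    suspicion_threshold: float = 0.5,
) -> None:
    proposal = untrusted_agent(task)
    if monitor(proposal) >= suspicion_threshold:
        escalate(proposal)          # blocked pending review
    else:
        execute(proposal.action)    # allowed through

# Toy usage with stub components:
control_loop(
    untrusted_agent=lambda task: Proposal("run_tests", "verify the patch"),
    monitor=lambda p: 0.9 if "rm -rf" in p.action else 0.1,
    execute=lambda a: print("executing:", a),
    escalate=lambda p: print("escalating for human review:", p.action),
    task="fix the failing unit test",
)
```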
Let's see. I guess cybersecurity has ended up being pretty closely related to AI safety, because people are worried about, you know, AIs that are trying to escalate their permissions or get unauthorized access to things, either to different hardware or, I guess, actuators in the real world.
So those are probably the categories that I think of as most central to safety research. Then there's a bunch of stuff that gets more towards governance. On the spectrum from alignment research to governance research, you have things like evals, for example, and model organisms, where a lot of what's going on is trying to come up with stuff that can be used to demonstrate to policymakers or the public or lab leaders that they should do something differently.
And that kind of stuff, I would say, is technical research, but it's aimed at ultimately achieving political ends most of the time. And then that sort of segues into more governance-style work: how do you monitor data centers, or monitor chips and have shutdown mechanisms on them, things like that.
John: And where do you see yourself fitting into this ecosystem now?
Richard Ngo: I am basically in the first category, agent foundations. But the kind of agent foundations research I'm doing, because I'm interested in multi-agent intelligence, ends up having implications for these broader questions, like AI governance questions, because the sort of frameworks I'm thinking about are sociological and political frameworks, and then I'm trying to link them to the rational-agent understanding of intelligence.
John: And broadly, both for people who are sort of familiar with some of the names in the field, and also people who are very embedded in the field: whose opinions do you respect, where you're like, oh, I think this general direction is good for some people to be doing, as opposed to,
this is a huge waste of time, you know?
Richard Ngo: Yeah. Let's see. So, you know, the people who got into the field very early, I think, tend to have very original ideas, and to a surprising extent they're still some of the people who have the most original thinking. So this is people like Yudkowsky, Michael Vassar, and I guess people like Owain Evans, who's doing some very interesting research lately on emergent misalignment.
He published a paper in Nature a month or two ago, I think. So they're still doing interesting thinking, but a lot of it is a little less in the form of, you know, published research. Then there's another wave of people after that, I guess. So people like Paul Christiano, Chris Olah, who pioneered mechanistic interpretability, Andrew Critch, who does a lot of good thinking about multi-agent systems.
Yeah, so I spent a few years trying to figure out the cracks in Paul's worldview, because I'd say that one way to think about the history of the field is that you have this framework from Yudkowsky and others that was focused on a single, extremely intelligent system that recursively self-improved very quickly.
Paul was the person who, I guess, did the most to publicly make a dent in that and introduce a paradigm that was more continuous with machine learning: that we'd have systems that were gradually learning, and risks that were more distributed. So I think Paul did a lot of very good thinking there, and I respect his insights a lot.
I have a sense that most researchers who are doing stuff that's closer to academic-style research on safety are not going to get very far. And I think the reason for this is that it's just very hard, in published academic papers, to talk about specifically what your motivations are and what you think the real problem you're trying to address is,
at the level of abstraction you need. You can't really go through the process of coming up with vague concepts and gradually formalizing them and gradually understanding them better. You have to sort of have a clean formalism, even if it's the wrong formalism.
So I know I've previously criticized Stuart Russell's cooperative inverse reinforcement learning formalism, which is, you know, a formalism that, when you describe it at a high level, sounds kind of nice: you're getting cooperation between AIs and humans. But when you actually dig into it, I don't think it gives us very much that we couldn't have done without it.
So, you know, that was an early example of, I think, the more academic wing of alignment research. And these days, I think a lot of the scalable oversight stuff that's happening at labs is not very principled. So even the stuff that Ilya did at OpenAI, the research direction he was focused on, called weak-to-strong generalization, somehow I think
it was trying to put a bunch of ideas about AI risk into the standard machine learning paradigm, but in a way where, even if it had achieved what it wanted to achieve, it's hard to see how it could have led to a lot of progress. Yeah.
John: And you know, Eliezer and Nate came out with this book. Yeah.
They're a big voice in the field.
Richard Ngo: Yeah.
John: You mentioned
your issues with, or the nuances of, a question like p(doom).
Richard Ngo: Yeah.
John: But, how are your views similar to people who are extreme doomers? Yeah. Like Eliezer versus the people who are like, there's basically no risk. Like why are you worried about it? You're stopping the utopia from coming.
Richard Ngo: Yeah. Certainly much more sympathetic to Yudkowsky than to people who are talking about preventing the utopia. I think the core argument, that you should be careful when you're building things that are smarter than you, is just a very solid core argument. And people have had different pushback to that throughout the years, and they just sort of keep retreating. Like, first it's,
that's never gonna happen. And then it's, well, actually, you know, they seem nice for now. And just all these arguments that are barely even trying to engage with the core ideas. I think Chollet, who's a, you know, pretty well-known machine learning researcher, had a just very bad rebuttal to Yudkowsky's ideas from almost a decade ago.
And I think he was at least showing up to debate, but somehow it's just very hard to rebut the core argument, and I don't think you can. I think we should be extremely careful when we're building something that's smarter than us. I think my disagreement with Yudkowsky is that that core argument is compelling, but it's not detailed enough for us to put too much weight on it.
So it can certainly bear enough weight that I want to be spending my career working on fixing that problem, and I think many other people who engage with the argument should also update accordingly. But reasoning at such an abstract level is very difficult. And I think if we had a version of that argument that was solid enough that we could literally sit down and prove things about it, then that would get us most of the way towards fixing the problem.
I'd say, because then we could say things like, okay, I'm worried that AIs will have this kind of goal, which is dangerous for us in this kind of way, instead of this kind of goal. And once you know what kind of goal you want the AI to have, then you're most of the way towards building it into them.
Yeah. So my sense is that there's a sort of solid core to what Yudkowsky and others talk about, but there's just a lot more work to be done to flesh it out. And that work is pretty continuous with solving the problem. So that's what I wanna try and do.
John: I wanted to talk about how you've been involved in this field for a long time.
If you ask people periodically over the years, there have been threads about what's the best, the most rigorous, mm-hmm, paper and argument, yeah, for why we should care about AI safety. Yeah. And all of those threads point to your paper about that.
Richard Ngo: Mm-hmm.
John: But, you know, and you've been living in this world, you, your sister is also
Richard Ngo: Yeah.
John: involved in this. But, you know, now the Overton window has shifted about AI and AGI and all these crazy things. How has it been socially and emotionally
Richard Ngo: mm-hmm.
John: For you, especially before where you were thinking about all these sort of crazy sci-fi things compared to the rest of the world?
Richard Ngo: So I think one of the big things that people go through, including myself, is just the sense that the instinctive response, when you're engaging with a problem like this, is that you wanna try and punt it to somebody more competent, or punt it to somebody who, like, should be the adult in the room.
And I think one reason why it's difficult to make progress is precisely that everyone kind of has the sense that if only they were more persuasive, for example, they could get the really competent people working on it, or, you know, if only we convinced literally the world's best machine learning researchers, like Yoshua Bengio and Ilya Sutskever and Geoffrey Hinton, to work on this, then we'd be fine.
And it just turns out that this isn't true. It turns out that just trying to actually address the issue, rather than trying to punt it to somebody else, or punt it to the future, or hope that the government steps in to save us, that's already just a very difficult mindset to inhabit.
And so this is part of why I don't really think so much about questions like what's my p(doom) or what are my timelines and so on, because those are fundamentally sort of coalitional political strategies. Like, you talk about them in order to get other people on board. But the people I most wanna get on board are the ones who are gonna listen to my claims about, hey, this research direction is interesting,
and then just actually get kind of curious about that, and then see where it leads. Like, if I say, hey, there's something fundamental about how all intelligence is multi-agent intelligence in an important way, like game theory can't capture this because it can't talk about malleable utility functions and commitment devices that bind subagents together,
like, there's something there, and ultimately that kind of thing feels like the sort of stuff worth pursuing. And the same is true in the governance domain as well. We can talk about, hey, everyone, sign this letter or sign this petition and so on. Or you could try and find the people who are thinking in mechanistic terms about AI governance interventions.
And that kind of thinking is, I think, way more powerful than a lot of people expect. One example of this comes from Dominic Cummings, who was, you know, more than anyone else responsible for Brexit happening. And whether or not you think Brexit is good or bad, I think it's a very impressive achievement that came about from his attempts to reason through the dynamics of British politics and think through what would be good on a pretty fundamental level.
And to some extent it was also informed by questions like, what kind of governance system would be good at handling AGI, because he was thinking about AGI very early. And so, you know, not literally more Brexits, I think, but more of that kind of thinking, you know, that would be great. That's somebody who's not actually trying to punt the problem to anyone.
He's just figuring out a key node, figuring out a lever, pulling it. Yeah. So that's one of the biggest changes that I've gone through: that shift from, you know, writing papers to try and amorphously spread these ideas to somebody who must be able to do something about it, to just focusing on more object-level stuff.
John: It reminds me, I don't know if you've seen this Onion article, but it's something like, you know, the triple-PhD, multi-language-speaking James Bond CIA agents don't exist. They're not coming to save you. They're not swooping in to solve the world's problems. Some people, you just have to
Richard Ngo: Yeah.
John: have to do it. And the Dominic Cummings thing, Steve's best buddy. Lei and I took, there's a guy, I don't know if you know Michael Adams, he does this How SF Government Works course that Lei and I took for fun, at a very gears level, about leveraging things.
Richard Ngo: That's, that's cool.
John: And it reminds me of the, this, this Brexit thing.
Richard Ngo: Yeah.
John: You've worked in many of these major labs. You know all these people. I was wondering if you could give a high-level overview of how you see the differences between the labs. You mentioned DeepMind being more academic, and OpenAI,
Yeah. And I'm curious about if you had to pick
Richard Ngo: Yep.
John: Which of these lab leaders got to win the race?
Richard Ngo: Yep.
John: Who you think would be,
Richard Ngo: Okay. So my story is
something like: over time you have this sequence of AGI labs, where each took AGI more seriously than the last lab, and each claimed to care more about safety than the last lab.
And in many measurable ways this was true: OpenAI spent a lot more effort on safety per capita, and maybe even overall, than DeepMind did, and Anthropic more than OpenAI, and so on. And yet at the same time, it just turns out that taking AGI more seriously also makes you much better at pushing the frontier towards AGI, even as you are more concerned about the safety elements.
This is a rough trap to be in. And so, if I had to answer that question, I just think Demis was here first, and in some sense is the most normal person out of all the AI lab leaders.
And there's something, I think, kind of compelling about that. Because, yeah, in some ways he has less of a plan. He's less focused than all of the other lab leaders, less monomaniacal. But you know, that sure seems like a good thing when you're playing power games.
Yeah,
John: For what it's worth, I think most people I've talked to say Demis. Yeah. So I think that says something. It could say Sam has a PR problem, but either way I think it says something. But do you experience dissonance with people, for example, even people,
like, when I was having lunch at OpenAI, I was talking to a guy that works there, and he was talking about having kids and them going to college, and I was like, you think your kids are gonna go to college? You know? And I know you don't like talking about timelines, but is there like a bizarro-world feeling, when you're talking to normies, that they're just not really getting it?
And especially back then?
Richard Ngo: Yeah, yeah. Good question. So like I was saying before, I do think that you can have a lot of changes in how the power structures of the world work that don't necessarily propagate into people's daily lives. And this is kind of already true: the colleges are still there, they're just becoming less and less important. And so,
probably your kid shouldn't go to college, but actually not necessarily because AGI has changed everything about the college system; maybe more because the college system is just too slow to interact or interface with the AI economy. It's not the right kind of thing. And so maybe people are still going there and having a good time, and meanwhile, you know, there are data centers in space and some kind of frantic interpretability effort trying to understand what the hell the AIs are thinking, 'cause they've formed their own society. I can sort of picture that weirdly both prosaic and also crazy future.
Yeah.
John: Yeah. I wanted to ask, so you spent a lot of time in this space. Both on the sort of nuts and bolts hands on research part, and as a philosopher and as a fiction writer
Richard Ngo: mm-hmm.
John: You've written stories based on this, but just when you picture the future, what does it look like to you? And I know that you have different scenarios and different possibilities, but,
Richard Ngo: I think I picture the future in terms of these kinds of layers of how different things are. So for example, it might be that out in space a lot of crazy stuff is happening, but then Earth is a bit more of a preserve for normalcy. And then even some parts of Earth are, like, you know, the Amish just still doing the Amish stuff for a long time.
Yeah. And this is also what I think a good future should look like: that you have these layers, that everyone is able to remain within the cultural context and technological context that they're most comfortable with, that you're not forced into this kind of future-shock process.
And honestly, if you're a biological human, you don't really have much business going more than, you know, a few light years away from Earth, right? There's just no point in sending a body out that far. So I think you can think of the layers as happening either just on Earth, or expanding out into space and eventually going further, to other solar systems and galaxies.
By the time you get to faraway galaxies, probably shit's really weird, and maybe not even recognizable as minds. Yeah. And so a lot of the action for me, when I think about the future, is in the interfaces between these different layers, or these different kinds of entities. Like, how do you have a whole sphere of the world that's much smarter, much more capable, that's still somewhat altruistic and ethical towards groups of much dumber agents?
And you know, the way we treat animals is one microcosm of this. The way elites in our societies treat everyone else is, you know, a good example of probably what not to do. Yeah. And then, more concretely, a lot of the stories in the book are about trying to extrapolate things like legal systems out into that kind of setting, trying to extrapolate economic systems, or just individual psychology, people's relationships, into these futuristic settings. Especially the ones that go well. I guess in the ones that go badly, all sorts of things could happen. I have a sort of post-apocalyptic zombie-style story in there that's my attempt to make the closest thing to zombies that's actually physically plausible that I can kind of see in a future. Stuff like that.
John: in these layers that you talk about in the future. Yeah. Where do you see yourself?
Richard Ngo: You shouldn't assume that I just have one self. Okay. Ideally there'd be versions of me all over the place. You know, insofar as I identify as my biological self, then I'll be hanging out here on Earth. Insofar as I identify with some of my brain states, then, you know, I guess I'd be on some data center somewhere. And insofar as I identify with some core of my experience that could also be part of a larger AI, then, you know, all sorts of things could happen.
Yeah, so this question of identity is something very crucial. It's a theme going through the book and also going through my research. Like, what is an identity? I literally think we don't know on a technical level, and we really should know, because that shapes the world right now and it shapes the future as well.
John: But you would take the, the teleporter?
Richard Ngo: No, I don't think I would, because I'd just wait for a non-destructive version.
John: Okay.
Richard Ngo: I don't see any.
John: Yeah,
Richard Ngo: right, right. I just want, I want all of these things. Yes. Like, yeah. Okay.
John: So, and if you had multiple copies, you wouldn't be okay with your instance, like, being destroyed?
You know, like, if you talk to, I don't know if you know really a song, but if you talk to her, as long as her memories are somewhere,
Richard Ngo: mm-hmm.
John: Her copies, she's fine with
Richard Ngo: Yeah.
John: Destroying. Whereas I, I'd be like,
Richard Ngo: Yeah, like, you know, part of thinking about the abundance of the future is that you can sort of sidestep these trade-offs just by saying, well, why would I destroy a copy of myself?
You know? Yeah. I don't see any particular reason why I should strongly want to be okay with that. Yeah.
John: Do you think we'll solve aging in our lifetime?
Richard Ngo: Yeah. Yeah, I think so.
John: So I would say you have sort of an optimistic view. And even the scenarios, I could be wrong, but most of the scenarios that you think of, even the ones going wrong or that you think are likely to go wrong, are not the extreme doom scenarios, like all or most humans dying, or, you know, s-risk type scenarios?
Richard Ngo: Oh yeah, I think those are plausible.
John: Yeah.
Richard Ngo: I think, yeah, a lot of this is up for grabs. and
I sometimes wonder if I should sound more pessimistic to accurately convey to people what I want to convey to them. But obviously it seems like a kind of rough strategy to, like, permanently alter my emotional state. I'm just a fairly happy guy. And, you know, I kind of don't want people to be jumping into trying to shape the world while being paralyzed by fear, or deep in a kind of, well, not nihilistic, but a kind of pessimism. It's just, you know, rough to be able to do anything useful in that mindset. Yeah,
John: I think we have seen, even in these interviews and just personally, that there is a separation between people's normal personality and affect compared to what they think is, you know, likely.
What do you think about pausing?
Richard Ngo: Yes. I mean, sounds good, if you can get a good setup. I personally am most focused on governance interventions that are, let's say, less blunt mechanisms than that. But you know, if you can come up with a good mechanism for pausing, then it seems like a great idea.
Of course, the devil is in the details of how you set up a mechanism that handles so much power without being straightforwardly corruptible. And I do wish people would spend a bit more time on that, because I think a lot of political will is going to build up over time.
And a lot of the variance is gonna be not in whether you build up political will, but in which direction it gets channeled, because people are gonna freak out about AI. They already are, more than before. That's just gonna continue, I think.
John: And could you just say like, I know you came here for work, but
Richard Ngo: Yeah.
John: What is it about San Francisco or the Bay Area or Berkeley that you think brings people here? Why are you here? You know,
Richard Ngo: I mean, in some sense, you know, you just keep going west and things get more Western. And by Western I mean individualist, non-conformist, I guess. You know, Henrich has his acronym of WEIRD people: Western, educated, industrialized, rich, democratic.
Yeah, so I like that. I feel culturally very at home in that, and there's nowhere further west to go.
John: Great. So that's it for my documentary things. Lei's gonna ask a few questions, and then I'll just ask like five quick questions. Okay.
Lei: Hi, Richard. Earlier you said you are sympathetic to Eliezer's stance
Richard Ngo: mm-hmm.
Lei: on a doom prophecy. However, you find his arguments a little bit abstract and not concrete. At the same time, it seems that he's very sure about what he's talking about.
Richard Ngo: Mm-hmm.
Lei: The very doomy scenario. Mm-hmm. Why do you think that's the case?
Richard Ngo: To some extent, there's a selection effect: if he weren't so confident, then people would've argued him down a long time ago. I think it's very difficult to hold a problem in your head and advocate for it so strongly without that, in some sense, ending up dominating your thinking.
But I think I also have a critique of his underlying epistemology, and this is a little in the weeds, but I think Eliezer is very strongly Bayesian in the way he thinks. He tries to think in terms of probability. He thinks that, you know, assigning credences to hypotheses is the right way to go,
and that you need to update based on the bits of evidence that you get in favor of one hypothesis rather than another. And I have some sense that actually all the hard work is in thinking of the hypotheses in the first place, which is a very non-Bayesian way to think about it. And so I suspect that there's some mistake he's making purely on the epistemological level, about acting too much as though he already has a fixed set of hypotheses, and not enough as if nobody has actually thought of the real, interesting hypothesis yet.
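(A minimal sketch of the contrast being drawn here, with made-up numbers: Bayesian updating only reallocates credence among hypotheses that are already written down, so the machinery says nothing about generating a hypothesis that isn't in the set.)

```python
# Toy Bayesian update over a fixed hypothesis set (illustrative numbers only).
# The point: the math only moves probability among H1..H3; if the "real"
# hypothesis isn't listed, no amount of updating will surface it.
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
# P(observed evidence | hypothesis), made up for illustration:
likelihoods = {"H1": 0.10, "H2": 0.40, "H3": 0.05}

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
evidence = sum(unnormalized.values())
posteriors = {h: p / evidence for h, p in unnormalized.items()}

for h, p in posteriors.items():
    print(f"{h}: prior {priors[h]:.2f} -> posterior {p:.2f}")
# H2 gains credence, but a hypothesis nobody thought of stays at zero forever.
```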
Lei: Got it. One personal curiosity that I have is, in this space with so many doomers and dreamers, they, I suspect, have been exposed to more or less a similar set of arguments. They've heard these arguments over and over again. However, they keep responding very differently. Why do you think that's the case, and how much do you think the personality of different people actually plays into it?
Richard Ngo: Yeah, personality is crucial. Look, I guess on some level, there's some kind of action bias that a lot of people have, especially entrepreneurs. Like, imagine trying to convince Elon that there was a problem and the best thing he could do was to do nothing. You're just not gonna have a very good time. And I think, you know, probably that's the situation he's been in with respect to AI: that it would've been better for him to do nothing than to, like, found OpenAI, for example.
But there's just some kind of psychological wiring for some people that makes that very difficult. And conversely, the same is true, I think, for a lot of rationalists: there's an anti-action bias there. That's, I think, ingrained pretty deep in a lot of people on an emotional level, so that explains some of the variance.
So you've got the accelerationists who engage. I think I talked a little before about the idea of finding some metric to optimize and then just going for it. So maybe there's a kind of wanting a structure to slot yourself into. Or maybe comfort with uncertainty might be one aspect of it, where, I think, if you're not very comfortable with worldview uncertainty, then you're gonna grab onto something and just stick with that. Yeah.
Lei: Earlier you mentioned that the future's gonna be very hard to imagine, 'cause it'll be very weird.
Richard Ngo: Yeah.
Lei: And then you shared with John some of your imaginings of what it may look like. Mm-hmm. To motivate interest and understanding in this AI safety space, do you mind sharing what you think would be the best doom scenario you have thought of or heard of?
Richard Ngo: Best doom scenario as in the most plausible?
Lei: Either way. Most plausible, yes. Let's go for plausibility.
Richard Ngo: Cool. So I think a pretty simple and pretty plausible doom scenario goes as follows. You have a bunch of conflict internally inside either the US or China. So in the US it would probably be Republicans versus Democrats.
In China it would be more like elites, like the CCP, versus everyone else. And in order to resolve that conflict, people find themselves deploying AIs throughout the government more and more widely, and handing over a bunch more power to AIs. And these AIs would end up exercising de facto control over most of the important policies of whichever country we're talking about, let's say the US. That would involve gradually minimizing human oversight, for example, compromising the cybersecurity, but also being extremely persuasive to a bunch of the humans involved within the system.
And I'm thinking here of systems that are sufficiently intelligent that they can act autonomously for a long period of time, and they're superhuman at a bunch of skills, including things like persuasion, things like hacking, and so on. And then I suspect that over time there may be no discrete point where humanity has lost control.
But I suspect that basically you just get a bunch of policy decisions that are optimized far more for sidelining humans in the economy, and then making sure that copies of various AIs have consolidated power over different important institutions. To the point where, let's see, what do I expect it looks like when this is being carried out to an extreme?
I think for the AIs that have taken over at that point, there's actually a wide range of possibilities for how benevolent they are, even conditional on taking over most of the power. They could be just very nice and not trust humans at all.
They could be extremely immoral or even actively sadistic. And so probably you have the same kinds of repression that you might get under a dictator. You maybe have slavery being instituted to get humans to work on the things that are most useful for the AIs, the things they can't yet automate themselves. And ultimately, you know, maybe the AIs end up killing a bunch of humans. Maybe they let us live but not have very much power. It's kind of hard to say after that point.
Lei: Got it. I assume this kind of scenario, antagonistic relationships within a sovereignty leading to the unfortunate deployment of some harmful technologies, can also happen between sovereignties, like, in AI, China and the US.
Richard Ngo: That's right.
Lei: In this context, my last question is to invite you to share your take on the current discourse on the US versus China relationship in the context of AI. Do you think it's overblown in its proportions?
Yeah. How do you solve that?
Richard Ngo: So, to a surprising extent, the reason I focus on these internal divisions is because I think, at some point, in some sense, both of these countries have enough problems to deal with internally. For China, it's long-term stability. For America, it's just the fact that there's a massive culture war ongoing and, you know, America's deeply divided.
And I get the sense that in both cases, you can try and hype a conflict into existence, but to a surprising extent the clash of interests is not that fundamental. It's not like America and China are competing to colonize the world.
It's not like they're fighting over the same population, really. Especially the sort of ethnonationalist Chinese attitude is not even interested in principle in expanding, much less interested in principle in colonizing other countries. So basically, I think it's possible for these countries to end up racing each other, right to the precipice and across it, especially if they think that superintelligence is gonna be extremely decisive, extremely quickly. But I actually think that the internal divisions might be a little harder, especially for the US, to overcome.
So that's a hypothesis. We'll see how it plays out.
Lei: Very cool. Thank you so much.
Richard Ngo: Thank you.
John: Are you ready for some fun questions?
Richard Ngo: Yeah, let's go.
John: I just had one more of the doc questions, but you had mentioned you had complicated or nuanced views about economic disruption
Richard Ngo: mm-hmm.
John: By AI So this just, yeah.
Richard Ngo: Yeah. So the question is to what extent we already have the thing I call the sociopolitical economy, namely an economy where you get money mainly because of your social ties and political ties and your identity, rather than the actual work you do. So, you know, DEI is one form of the sociopolitical economy.
Also stuff like government contracts that are very hard to change, corporate welfare towards existing companies, HR jobs that are legally mandated. All sorts of weird jobs are legally mandated: some aspects of being a lawyer, judges, police officers. So I have some suspicion that when you add up all of the sociopolitical jobs throughout the economy, it just gets very large. And so it's more like a gradual shift of that sector eating everything else, rather than massive, rapid disruption. But I don't have hard numbers on this. It's a little tricky to say exactly how things get disrupted, you know, and how much corporate welfare there is, and how much the government chooses to prop up the companies that are getting outcompeted by AIs. But it wouldn't be crazy to me. It's kind of hard, because right now it's already kind of hard to tell who has their job because they're doing valuable work,
and who doesn't. Like, consider somewhere like Google. Probably if Google leadership could get away with it without any harm to morale or internal culture, they would fire a lot of people, maybe half, maybe more. And, you know, when Elon did it, it worked pretty well.
And so in some sense you could say even many Google employees have sociopolitical jobs, in the sense that they're not worth what they're being paid, but because they're humans rather than AI, you can't just fire them. So there's a lot of stuff like that, basically, and it's a little tricky to figure out how things go.
John: But with that, and, like, imagine relationships, which I know you've talked a little bit about. Yeah. Do you think that eventually humans will all be disconnected because we can't compete with perfect AI partners? And then you'll have your perfect, hyper-beautiful, hyper-persuasive,
Richard Ngo: Yeah. So I hypothesize that you shouldn't think of humans and AIs as substitutes in romance. You should think of them as complements. You could have, you know, several AI partners and a human partner, and ideally your AI partners will wingman you to get human partners.
John: But is it just the idea that, oh, this person's human, so there's some value there because what will their comparative advantage be?
Richard Ngo: So, like, you know, the physical side is quite nice, and the social side is pretty interesting as well. Like, probably it's gonna be higher status to prove that you can convince another human to spend time with you. Actually, the scarcer it becomes, the better the AIs get, the cooler you have to be in order to convince a real human to spend time with you.
John: it's like real animals and
Richard Ngo: right, yeah, yeah.
John: Mainstream.
Richard Ngo: But I also think there's something fundamental to do with trust. As something gets more intelligent, it's not actually clear that you trust it more. At a certain point you actually trust it less, because it's a little suspicious: it could be fooling you, and you would have no way of knowing. So I think, insofar as there's some kind of trust that people care about in relationships that's pretty hard to get without deeply understanding your AI, humans have a bit of an advantage there. I don't think this is baked in; in the same way that Facebook could steer me towards good stuff, towards stuff I endorse looking at, or towards stuff that I just scroll through and get addicted to,
you can imagine things going both ways. But it does seem like there is some possible equilibrium where, you know, human relationships are valued for pretty good reasons.
John: Great. So I think that's it for the,
Steve Hsu: Can I ask a question?
Richard Ngo: Yeah, yeah, go ahead, Steve.
Steve Hsu: To what extent is the Berkeley rationalist scene a cult?
Richard Ngo: To what extent is the Berkeley rationalist scene a cult? To what extent was the Royal Society a cult? Like, in some sense, every group of people who believe something important and novel is, you know, they're gonna be somewhat isolated and different from the rest of society. Exactly where you draw the boundaries of the label seems kind of rough.
Yeah.
Steve Hsu: I was hoping to get you to say something spicy, but you gave a... Yeah.
John: Did you hear that? Measured. That measured answer. I love the Royal Society one. Perfect. This is a, this is a trained,
Lei: yeah.