The Global AI Race: Z.ai and the View from Beijing — #96

Zixuan Li: We just did our own stuff inside China, and this year, with the popularity of DeepSeek and Qwen and other open-source models, we are opening up, providing more open-source models, and this gave us the opportunity to let the world know about us.

Welcome to Manifold. My guest today is Li Zixuan. He is with us from Beijing. He is an employee of Z.ai, which is one of the leading AI companies in China. Li grew up in China, but he studied in the United States at Carnegie Mellon University, and now he is in the middle of one of the hottest fields in the world, which is large language model AI. Li, welcome to the show.

Zixuan Li: Hi, Steve. Glad to be here and share more about Z.ai and the development of the Chinese LLM industry.

Steve Hsu: Great, great. I was just saying to you before we started recording that if you're in tech, and specifically if you're in AI in the United States, everybody is always talking about the AI competition with China: Nvidia chips, Huawei Ascend chips, is DeepSeek better than OpenAI? All these kinds of questions. But we seldom get to hear from someone who's really in the industry on the Chinese side. The only guest I've had so far on my show with that background is someone who was a researcher at DeepSeek and is now in the United States. So I think my audience will be super interested.

Zixuan Li: Yeah, that's great.

Steve Hsu: Great. So tell us a little bit about yourself. Where did you grow up and go to school? What did you want to do when you were young, and how did you end up in AI?

Zixuan Li: Okay. I will start from undergrad, because I studied law and journalism. Before that I did math competitions, but then I watched some movies about lawyers and finally chose to become a lawyer. After undergrad, I decided it wasn't very fun. I didn't like working with Word and Excel; I loved working with AI. So I decided to join a legal tech company creating software for lawyers to search cases. If you know LexisNexis or Westlaw, I was building similar things for Chinese lawyers. After three to four years, I decided to pursue more solid technology.

So I studied in the United States, at MIT and also Carnegie Mellon, and learned some basic machine learning, before the era of the current generative AI. After that, I decided it was time to go back to China. But what I'd learned was basically traditional machine learning.

So I'm not capable of doing the frontier research here; I do the supporting stuff, for example product design, marketing, and also collaboration, because I'm the right person to comprehend all these things and communicate them to the audience. So I joined Z.ai last year, coming back from the United States.

At that time we hadn't done anything related to overseas business, and half of our models were proprietary, so there was no way we could cooperate with US-based companies, inference providers, and benchmark companies. We just did our own stuff inside China. This year, with the popularity of DeepSeek and Qwen and other open-source models, we are opening up, providing more open-source models, and this gave us the opportunity to let the world know about us.

Yeah, so here I am. I'm also the owner of our X account and Z.ai chat. If you have used chat.z.ai, it is a platform similar to ChatGPT. I'm the product leader of this project.

Steve Hsu: Got it. So,

Steve Hsu: If I have my history right, Z.ai was originally one of the more famous generative AI companies in China, right? At a time when DeepSeek was pretty much unknown, Z.ai was one of maybe five leading AI companies. Do I have the history right?

Zixuan Li: Yes, and we are more than that. We started our GLM series models back in 2021. I believe it was around the time of GPT-2. So we started very early, and we did trillion-parameter training. At that time, our model was like 1.75 trillion parameters. It's bigger than Kimi K2, actually. But we did this before we understood how to shrink the size or how to find high-quality data. Still, we did the exploration very early, and you can see the first GLM series paper on arXiv; it dates back to 2021.

And we have like,

Steve Hsu: oh, sorry.

Zixuan Li: Yeah, we became very famous in '23. We open-sourced our GLM-130B, a 130-billion-parameter model, and also the ChatGLM 6-billion-parameter model. At that time we were leading in China. But then came the rise of Alibaba and ByteDance; they have more money and more resources, and they created things like Qwen and Doubao. And there's DeepSeek. DeepSeek is very rich because they're a quant company; they have GPUs and money. So 2024 was very hard for us, and we tried to reshape our training methodology and our team, and finally created something last month.

Steve Hsu: Right. So in China you have quite a mix. You listed a bunch of the leading AI companies in China, and some of them are part of gigantic corporations, more analogous to Google or Amazon. Some are startups like Z.ai. So you guys must be one of the smallest leading generative AI companies in China.

Is, is that true?

Zixuan Li: Yes, yes, that's true.

Steve Hsu: So are you like a hundred people? Few hundred people?

Zixuan Li: I think 500, because the company includes the research and engineering teams, the product team, and also sales, because sales is very hard in China. You have to directly contact each enterprise or developer. It's not like the ecosystem in the United States, where you just let users subscribe to your software and the money just comes in.

Yeah. So for us, we need to do a lot of product solutions, and more on the engineering side, where you transform the LLMs into real solutions or software and deploy them on the existing systems of these enterprises.

Steve Hsu: So, in terms of core technical staff really involved in model training, post-training, safety research, those kinds of core technical things, it's no more than a few hundred people, right?

Zixuan Li: Yes.

Steve Hsu: But now the model you just released, which is I think GLM 4.5, that is a state-of-the-art model, right? On the major benchmarks, it's pretty much competitive with Qwen, Kimi K2, DeepSeek, and even the Western models. Is that fair?

Zixuan Li: Yeah, I think it's fair, and it's a very balanced model. When we look at DeepSeek, R1 is very solid in reasoning and coding, especially front-end coding. And Qwen has separate models, right? They have instruct models, thinking models, and a coder. But for GLM 4.5, we tried to incorporate all the capabilities in a single model.

So we let it do the ARC thing. You mentioned ARC in the list: ARC means agentic reasoning and coding. We let GLM study all the agentic things, coding things, and reasoning things, combined into one model. That's our competitive advantage compared to other companies.

Steve Hsu: Right. Talk a little bit about what people would call the model card for GLM 4.5. How many parameters, what size, what kind of data was used in training? Is there anything special about the architecture that's significantly different from, say, DeepSeek or Qwen? Maybe you could just talk about those aspects of the model.

Zixuan Li: Okay, so we have two models, 4.5 and 4.5 Air. 4.5 is a 355-billion-parameter model, and Air is quite small: a 106-billion-parameter model. They use similar data, so you can think of 4.5 as a scaling-up of 4.5 Air. The data we used, which we detailed in our tech report, follows a fairly traditional training pipeline. But the most interesting part is how we train the agentic capabilities. Between the pre-training phase and the post-training phase, we insert a phase called mid-training. In this mid-training, we deliver high-quality data like repository-level code, math competition solutions, and also agentic trajectories.

So you have this multiple tool calling and function use combined with reasoning. We use all these kinds of data in the mid-training phase to try to let the model speak the same language as math competition winners, or as someone who speaks the language of agentic tool use.

So before the post-training, the model has already gained the capabilities for things like reasoning or agentic tool use, and our post-training becomes very easy, because we only need to give the answer to the model to let it inference correctly; we don't have to worry about whether it understands what it is doing. As for the architecture, our model is deeper, so it has more layers compared to K2 and DeepSeek, based on our own experiments. But that's not an innovation; it's just based on experiment. The true innovation behind this model is the data and the methodology, and also that it's a hybrid reasoning model.

So, DeepSeek V3.1 was just released today. Looking at that model, it's also a hybrid reasoning model, trying to combine all the things in a single model.
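
[Editor's note] The three-phase recipe described here, pre-training, then a mid-training stage of repository-level code, math solutions, and agentic trajectories, then post-training, can be pictured as a data-mixture schedule. A minimal sketch follows; the phase names come from the conversation, but the mixture weights and dataset names are hypothetical, not GLM-4.5's actual recipe.

```python
# Illustrative three-phase training schedule with a "mid-training" stage.
# Weights and dataset names are made up for the sketch.
PHASES = [
    ("pre-training",  {"web_text": 0.7, "code": 0.2, "books": 0.1}),
    ("mid-training",  {"repo_level_code": 0.4,
                       "math_competition_solutions": 0.3,
                       "agentic_trajectories": 0.3}),
    ("post-training", {"sft_answers": 0.6, "rl_rollouts": 0.4}),
]

def tokens_per_source(phase_name: str, total_tokens: float) -> dict:
    """Split a phase's token budget across its data sources by weight."""
    mixture = dict(PHASES)[phase_name]
    return {src: w * total_tokens for src, w in mixture.items()}

for name, _ in PHASES:
    print(name, tokens_per_source(name, 1e12))
```

The point of the middle phase is that the model already "speaks the language" of agentic trajectories before post-training begins, so post-training only has to align answers, not teach the skill.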

Steve Hsu: When it comes to the architectural speed-ups and efficiency optimizations that DeepSeek did, you know, back around V3, things like the KV cache optimizations, is that now universally adopted by the Chinese companies? Because DeepSeek was very open about all the stuff they did. Have their innovations been broadly adopted by the other Chinese companies?

Zixuan Li: Yes, yes, that's correct. Both Kimi K2 and GLM 4.5, I think, learned a lot from the DeepSeek V3 architecture. In our training, we only chose the best pipeline and architecture for performance; we did a lot of experiments. And for our RL training, we used our internal innovation called slime. All of this is based on very scientific experiments, and DeepSeek V3 contributed a lot more compared to other companies or entities.

Steve Hsu: Right. And on a per-token efficiency level, a per-token cost, how do you compare? Is there rough parity now between all the best open-source models coming from China?

Zixuan Li: I think it depends on the activation parameters. If you have similar activation parameters, it's nearly the same. But on the training side, I think the most impressive thing about DeepSeek V3 is how they made it less costly in the training phase, and we try to use more techniques to lower that cost.

But for the inference cost, I think it's basically the same. And for Z.ai chat, we provide services globally, and that reduces our cost because the machines keep running around the clock. You don't have to buy a lot of GPUs sized only for Chinese users; you can accommodate more user requests globally.
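
[Editor's note] The "activation parameters" point can be made concrete: for a mixture-of-experts model, per-token inference compute scales with the activated parameters, not the total. A rough sketch, using the standard approximation of about 2 FLOPs per activated parameter per forward token; the parameter counts are approximate public figures, quoted only for illustration.

```python
def flops_per_token(activated_params: float) -> float:
    # Rough rule of thumb: a forward pass costs ~2 FLOPs per
    # activated parameter per token (one multiply + one add).
    return 2.0 * activated_params

# (total, activated) parameter counts; approximate public figures.
MODELS = {
    "GLM-4.5":     (355e9, 32e9),
    "GLM-4.5-Air": (106e9, 12e9),
    "DeepSeek-V3": (671e9, 37e9),
}

for name, (total, active) in MODELS.items():
    gflops = flops_per_token(active) / 1e9
    print(f"{name}: ~{gflops:.0f} GFLOPs/token, "
          f"{active / total:.0%} of weights active")
```

On this crude measure, models with similar activated-parameter counts land at similar per-token inference cost, which is what Li is saying.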

Steve Hsu: So I had a guest on a few episodes ago who is an AI researcher at Google, and when I had him on, it was right when there was a lot of publicity about Meta and Mark Zuckerberg hiring people away from OpenAI, from Google, from other organizations, and paying what were reported at the time as super huge compensation packages. My guest said he didn't think there were any secrets once you get to a certain level within these companies. In his case, he was talking about the US companies, so Google DeepMind competing with OpenAI, competing with Anthropic, et cetera. So maybe there's some reason for Mark to pay so much money to hire these people, but he personally didn't feel it necessarily made sense. Another thing that was reported about those hires at Meta, for their artificial superintelligence team (that's what it's called), is that over half of those people, maybe two-thirds, were Chinese, maybe even from China: sort of superstar recruits that were paid reported hundred-million-dollar packages.

I'm curious what this looked like to people in China, in Beijing or in Hangzhou. What did people there think of Mark Zuckerberg paying so much money for that talent? Because I'm guessing very similar talent exists where you live, in Beijing or in Hangzhou. So what was the reaction in China to this?

Zixuan Li: That's a great question. I believe the super brains in China and the US have different mindsets and different skill sets, because most of the researchers at Z.ai haven't gone abroad. They study here, at Tsinghua University, and work for a Chinese company.

For OpenAI and Meta, I think those researchers got their PhDs overseas in the United States, maybe at MIT or Carnegie Mellon. The researchers here read a lot of papers, and we don't believe it's worth a hundred million dollars to hire those people. For training a GLM 4.5-level model, we think those people are too expensive.

Steve Hsu: Yeah, let me say it in a very provocative way, because first of all, GLM 4.5 is way better than anything Meta has ever produced. And if their next model is as good as 4.5, it's already a huge leap for Meta. But probably that hundred-million-dollar package pays for almost the whole team at Z.ai.

Am I wrong?

Zixuan Li: I think for the most expensive person, yeah, the package could cover all the researchers here.

Steve Hsu: Yeah.

Zixuan Li: But I think training and doing research are different things. Maybe Mark Zuckerberg is chasing the next-generation LLM. They have more GPUs to do experiments compared to us, and they're not just doing training, because when you look at the average engineer, the software engineer and machine learning engineer, they don't have packages like that. All the people with millions or tens of millions of dollars are researchers. They don't actually train the model.

Yeah, they do research and predict the future. At Z.ai, I think we lack those people. We have people like that, but not as many as at Meta. But for training models, it's a different story. For training models, I think there are SOPs, and you just need to do the experiments, look at the papers, see what everybody else is doing, and you have to do a lot of dirty work.

The data: how you clean the data, how you gather the data. I think that's not what Meta is paying their scientists for.

Steve Hsu: So for people like that, who you sort of referred to as researchers who are maybe seeing where the field is going rather than actually engineering the training of the next model, looking a little bit further into the future: which open-source model companies in China do you think are strongest in that category, in that area?

Zixuan Li: For open source, I would choose DeepSeek, and for closed source I would go with ByteDance.

Steve Hsu: Okay.

Zixuan Li: Yep. ByteDance is underrated globally only because it's not open source. They're kind of the Google of China; they can earn money from everywhere. Google doesn't have to worry about whether it earns money from Gemini, but other companies like OpenAI and Anthropic need to earn money from their AI services.

Steve Hsu: Got it. Now, for the top five or so generative AI labs in China, do you feel the government is actually helping you, or do you feel at Z.ai that, hey, we're just a startup and we have to survive on our own, and we could easily not survive if we don't execute? Where do you feel the strategic support from the government impacting you? Is it in data center access, or in money, or what kind of support do you actually get?

Zixuan Li: I think we get support, but not as much as the US government or US citizens might believe, because we are a company: we have to earn money. The government doesn't bring customers to us. Maybe it gives us more exposure to resources, but we have to earn those resources on our own. The government provides a venue for you, and you need to compete with other companies, like Kimi and Qwen. For the data centers, or even for the money, why should they finance you? You have to prove it.

Steve Hsu: Right. That,

Zixuan Li: I think it's similar to the US, because OpenAI also gets support from the US government. We get support, but that support is not in a very solid form. Money isn't directly given to us just because we are the leading open-source company. We have to do a lot of things to prove ourselves.

Steve Hsu: Right. I mean, this is speculation, but I imagine a truly strategic company like SMIC, or maybe the hardware part of Huawei, can get some direct support from the government. But my impression is that for you guys and Kimi and DeepSeek, you're all just competing, almost like companies in the US are competing. You're not really getting reliable support from the government.

Zixuan Li: Yeah, I think one of the reasons, from my point of view, is that we are not a hardware company, so there are no import controls over access to our resources. If some resources were controlled by other entities, we might need some other support, but currently we are just doing the software side, the AI side, and it seems the government doesn't have to put much effort into our products or services.

Steve Hsu: So when the DeepSeek moment happened, I think it was early 2025, earlier this year, the perspective from tech people in the US was: oh, now DeepSeek has suddenly become the darling of the government. Liang Wenfeng was invited to Beijing; I guess he met Xi Jinping or something.

Everyone here in America thought: oh, these guys only spent $5 or $6 million on their main training run, and they did it on maybe A100s they had already bought, using the quant trading money, but now they'll get infinite hardware support, data center support from the government, et cetera. What do you think about that? Do you think the government is picking a national champion like that and really supporting them in a way they don't support Z.ai or Kimi? How different do you think the situations are?

Zixuan Li: I think DeepSeek is very hot in terms of government-level support. I think it is the champion, but not with infinite resources, because we don't have infinite resources here. Maybe DeepSeek doesn't have to worry about commercialization. I think that's very true: DeepSeek is an AI lab, not really a company. The government can make sure that they can lose money, just do the training, and provide the best models in the world, competing with proprietary models. So the government can make sure DeepSeek doesn't have to worry about this, and DeepSeek doesn't need customers.

Its only paying customers are the API users. For the chatbot, nobody pays; it is totally free. And for local deployment or projects, DeepSeek doesn't have any service like that. Other companies use the DeepSeek open-source models to do the deployment and serve enterprises, but DeepSeek doesn't do this on their own.

Steve Hsu: Like putting all its effort into the training, and the government wants to make sure they can keep going.

Got it. So they're the special case. Now, Qwen has support because they're part of Alibaba, but the really excellent models that have to survive on their own as companies are you guys, Z.ai and Kimi, right? Is there another one? It's probably just you two, right? The other ones are part of much bigger companies that make money doing lots of other stuff. Is that correct?

Zixuan Li: Yes. I just saw a graph listing the tiers of Chinese AI labs, especially in the open-source domain. Qwen and DeepSeek are the first level, and Kimi and Z.ai are the second level; other companies like Step and also RedNote are coming up. We believe it's a hard time for us because we're competing with giant companies like ByteDance and Alibaba. We need to get financing, and we also need to acquire customers from their customer base, right?

Yeah. So the competition between models is one thing, and the competition for customers is another. You need to get the trust of the customer, you need to provide solutions and pricing strategies. There's a lot of stuff beyond the models.

Steve Hsu: You know, just to reassure you, I'll tell you that over the summer I had some conversations with some of the most senior people at OpenAI. They're stressed out too, because they're aware that the open-source models can cannibalize their revenue from below. Also, they have to compete against some big monopolies like Google. So they're under a lot of pressure too. Everybody actually feels they're under pressure. Of course, some people are luckier than others, but still, I'm telling you, even the top people at OpenAI feel tremendous pressure. It's an amazing competition that's happening right now.

Zixuan Li: Yeah, that's so true. I've worked 20 hours a day for the past two months, because we only have 13K followers on X and DeepSeek has like 100 times more. So I need to use my effort to help the training team make the model more famous, so that we can get more resources and more customers.

Because when I post an X post without promotion, without DMing anyone, it only gets 1K views. So even our followers cannot see what we are doing right now, and I need to work very hard to build this Z.ai chat, to let people try out the model freely.

Steve Hsu: I think in the West, people know about DeepSeek, but after DeepSeek the level of awareness drops tremendously. So even for Qwen, K2, you guys, there's very little awareness. I use all your models because I'm just curious about the quality level. I sort of cycle through all of the Chinese open-source models just to get a sense of the quality level.

And I find it quite impressive. Maybe the best OpenAI performance is a little bit better, but I think the gap is not really very big, and it really depends a lot on what you're trying to do.

Zixuan Li: Yeah. DeepSeek R1, I think, was a moment that we cannot replicate, and even DeepSeek cannot replicate, because it was the first time people saw the chain of thought coming out of a model, not human-made,

Steve Hsu: Mm-hmm.

Zixuan Li: and also the performance. I think it's equivalent to oh one at that time. So equivalent to the top territory models. And we believe that there is a gap between the top open source model and the top proprietary models currently. But like in the earlier 2025, the R-1 really impress the world

Steve Hsu: Mm-hmm.

Zixuan Li: by surprising a lot of people. Even the CEOs of US companies, a lot of CEOs, were freaking out over this model.

Steve Hsu: Yeah. I've never tried the ByteDance models. Do you feel they are better than the open-source models in China?

Zixuan Li: I think they're kind of like Gemini: they incorporate vision capabilities and thinking. I think the coding is not comparable to the top open-source models, but in other areas, like role playing and vision, it is pretty close to Gemini 2.5 Pro.

Steve Hsu: Mm-hmm.

Zixuan Li: They're earning money with this model deep double seed, 1.6 and 1.6 thinking.

Steve Hsu: So one of the big differences between the perspective here and in China, in my experience, is what we in Silicon Valley would call being AGI-pilled, P-I-L-L-E-D, AGI-pilled, which means that you think AGI or ASI is right around the corner, and there could be a very rapid takeoff to reach that level.

Whereas I think in China, fewer people think it's right around the corner and there will be a fast takeoff. So I'm curious what you think about that, because it seems like people in China are just more pragmatic about it. They are building models which are almost as good as the US models, but the whole attitude, the worldview of what's going to happen in the next three or five years, seems quite different to me. But I'm curious what you think.

Zixuan Li: Yeah, I agree with you. We are more pragmatic, and even though we are chasing AGI right now, there's no unified definition of AGI currently. Does AGI mean a model that can do IMO problems, or that can replace lawyers or financial analysts? There's no definition. For us, we are chasing AGI by incorporating more human-like capabilities, like agentic tool use.

And currently we are working on multi-agent systems, and we believe that for AGI you need to connect to the physical world. None of the models can do this currently. For agentic stuff, making slides or a deep research report doesn't mean AGI. You need to operate like a human.

Maybe one day we can realize this by tool-calling a robot hand or something like that, but people need to work on it. It's not near for us.

Steve Hsu: So you might know that my main job is as a theoretical physicist. I do research in theoretical physics, and I have access to some of the leading specialized models, coming from some of the top labs: specialized models for science or for physics, which are analogous to the IMO models, right, but for science and physics. And so I'm able to submit research queries to these specialized models, which have essentially unlimited token budgets. And what they're able to do is actually really impressive. They are right on the verge: I actually think that probably in the next year or two, you'll see some good scientific papers, theoretical physics papers, in which, effectively, a co-author is a model, because the model is actually able to do some non-trivial stuff.

The same way it can solve an IMO problem: if a researcher gives it a well-posed problem, it can actually make a significant contribution now to, in this case, physics research. It's really impressive what you can do. Typically these are pipelines where you have a large number of tokens, you have different instances of the model proposing different solutions, and then you have verifiers that are checking the solutions to make sure they make sense, or checking the citations, looking at the references. That whole pipeline, which consumes a huge number of tokens, generates really amazing stuff at the moment.

You have to be an actual leading researcher to be able to understand the output, to tell: this is junk, this is junk, but this is actually something original and really good. But we're right around the corner, I think, from starting to see papers in which an AI actually made a significant contribution to the research.
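
[Editor's note] The propose-and-verify pipeline Steve describes can be sketched as a best-of-n loop: several model instances propose candidates, a verifier scores each, and the best-scoring candidate wins. The two stub functions below are hypothetical stand-ins for real LLM calls; in a production pipeline each would be a sampled completion and a verification pass.

```python
def propose_solution(problem: str, instance_id: int) -> str:
    # Stand-in for a sampled LLM completion; each instance would
    # propose a different candidate solution.
    return f"candidate {instance_id}: sketch of a solution to {problem!r}"

def verify(candidate: str) -> float:
    # Stand-in for a verifier pass (citation checks, consistency
    # checks, dimensional analysis); here a deterministic toy score.
    return (sum(map(ord, candidate)) % 101) / 100.0

def best_of_n(problem: str, n: int = 8) -> str:
    # Fan out n proposers, score every candidate, keep the best.
    candidates = [propose_solution(problem, i) for i in range(n)]
    return max(candidates, key=verify)

print(best_of_n("anomalous dimension of a toy operator"))
```

The token cost grows linearly in n, which is why these pipelines consume such large budgets: the quality gain comes from sampling widely and filtering hard, not from any single generation.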

Zixuan Li: Yeah, and we can also see DeepSeek is making their prover, and ByteDance is making a prover, trying to solve math problems. That's great. And we are working on similar things, not a prover, but we are doing AI for science as well.

Steve Hsu: Mm-hmm.

Zixuan Li: We believe that for AGI, a specialized model is not enough. You need to integrate all the capabilities into several models: not a single one, but maybe several models that can perform all the tasks while chasing AGI. In that way, I think Grok Heavy and GPT-5 are somewhat underrated, because they are trying to improve the intelligence level, even though some people will say they're not as good as 4o in terms of role playing or EQ. I think the EQ dropped a little bit. But for AGI, we need work like that. At least they're chasing something beyond current human intelligence and capabilities.

Steve Hsu: So, I, I wanna pivot to a topic which is a little bit sensitive. So I don't want you to talk about Z if it's confidential information about Z. But maybe just talk about the industry in general, like what it looks like from the perspective of Chinese AI companies. And, and the issue is the AI chips.

So, as you know, the US for a while was not letting Nvidia sell any of its good stuff. Now maybe the H20 is coming back. We also know chips are being smuggled in; even Blackwell GPUs are being smuggled into China. So I'm curious, again, not from Z.ai's perspective, don't say anything you shouldn't, but from the perspective of the companies in your space trying to get enough compute to do really big training runs: do they feel worried about the supply of chips? Do they feel they can switch over to Huawei Ascend chips? Or maybe there are just enough Nvidia chips around? What is the pre-training and RL training situation right now for companies like yours?

Zixuan Li: Okay, I'll start with Huawei chips, because from my point of view, Huawei's chips can replace some inference tasks

Steve Hsu: Mm-hmm.

Zixuan Li: and they're doing quite well, maybe at 80% of the level of Nvidia chips. But for training, I think it's not mature yet. Or even if Huawei is mature, those AI companies are not prepared to do the training and coordinate all the GPUs together. So for training, it's still in the experimental phase, and for inference, I think it's quite robust; some companies are already using it widely, and there are a lot of cloud services. Most AI companies don't directly use GPUs; they use cloud services. On the training side, I think chips are not enough. But for us, we have to make our company profitable, so we need to assess how many chips we need; it's not that more is better. We try to keep it at a level we can afford. But for companies like ByteDance and Qwen, who have nearly unlimited resources, maybe it's a huge test for them to get more chips.

But for us, yeah, we just buy the chips we can afford currently, and we try to improve the algorithms. That's why algorithms and open source are so important in China: we try to do the training more efficiently and more effectively.

Steve Hsu: Right. So for you and, say, Kimi, being more like startups, your main constraint is money: how much you can actually spend. When you talk about Alibaba's Qwen or ByteDance, those are very profitable businesses; it's basically a question of how much the CEO wants to allocate to AI training.

You know, it could be a huge amount, but then they might actually encounter the supply issues for Nvidia training chips. So is that a real constraint right now, or are there enough smuggled Nvidia chips in China that ByteDance and Qwen can still do whatever they want?

Zixuan Li: I think they cannot do whatever they want, but they have some tactics, or some contracts with companies. And ByteDance is a huge global company; they have a relationship with the US government as well, right, through the negotiations on TikTok and other stuff. I'm not sure whether there's a specific agreement between ByteDance and the US government or other parties, but they have different avenues to solve problems. Compared to us: we can just accept things. But for companies like Qwen and ByteDance, if they can do some legal stuff, and I believe it's not illegal, companies of their size have more power and can build more formal connections to the authorities and other entities.

Steve Hsu: And do you think DeepSeek is in a category all their own? Like, do you think DeepSeek is GPU-limited, or do you think that's not their main problem right now?

Zixuan Li: I think DeepSeek is on our side. Yeah, it's similar to us. And I think it's very sensitive right now, because everyone's looking at DeepSeek. Not everyone's looking at ByteDance, because most people consider ByteDance a video company, or a shopping company. Not many people consider Qwen or ByteDance a real competitor against the top US AI companies.

But DeepSeek is different.

Steve Hsu: Mm-hmm.

Zixuan Li: I believe the US government is trying to stop any chips going to DeepSeek.

Steve Hsu: yeah.

Zixuan Li: The leadership at DeepSeek, they have a long-term vision, so they predicted the future. They bought the chips before everything happened, and I think they're very smart, and they're trying to use more advanced algorithms to train models. I believe the reason for DeepSeek's success is their infrastructure and their data collection process. They have very high-quality data that nobody else has, because I've seen the job descriptions for their data labeling people: a very high standard. The bar is very high; you have to know everything about AI, plus some domain knowledge. So they're trying to use other things to compensate for the lack of GPUs.

Steve Hsu: So one part of the story that I think people talk about a lot in the United States is that when companies, whether it's OpenAI or X or Meta, want to put up new data centers, it's very hard in the United States, or even in the West, to get the electrical power for those data centers, because our grid hasn't grown at all in the last 20 years. The amount of electricity the US produces hasn't really changed much in 20 years, so adding some gigantic data centers to the grid is actually very difficult. Compare that to China, where the amount of electrical power is growing every year by a large amount. It seems like it's easier for China to power big data centers, and you have individual provinces and individual cities trying to align with the government. They know the government likes AI, so each city or each region is putting up its own data centers. So they're putting up lots of physical buildings with plenty of power, and then they have to get chips to put in there.

Does it seem, from your perspective, like there's possibly even an oversupply of data center capacity in China because of this? Like, everybody's rushing to try to do the same thing. I think Xi even made a comment like this at one of the party congresses, saying, why is everybody investing in EVs and data centers?

It's too much; some people should do other things. I thought Xi actually said something like this. So does that reflect reality? Do you sense something like that happening in China?

Zixuan Li: First of all, companies like Z.ai and Kimi don't build data centers the way Meta or X do. So it's not our task. As I said before, we use cloud services, which makes it easier for us to do inference or training. There are companies and organizations building data centers, but their services are not provided to us; they're provided to other giant companies who need LLMs, because there's some local deployment, and sometimes we have to fine-tune our models

Steve Hsu: Mm-hmm.

Zixuan Li: and then provide them to some giant companies, like oil companies or telecom companies.

So these companies need a lot of inference power and resources. So I cannot say exactly whether it is oversupplied or undersupplied, but the supply pipeline is not what it looks like in the United States, because in the United States all the major models are proprietary. Only OpenAI and Google can supply the model.

But here we open source the model, so other people can use our models to deploy and provide services. So the data centers are decentralized; they're not controlled by the AI companies. We have inference providers who can provide services with Qwen, with GLM, and we cannot trace these services. They're everywhere.

Steve Hsu: Got it. Yeah. The way you're describing it, a lot of this capacity maybe is going into inference, and if it's inference, the Huawei chips are actually good enough. So maybe these new data centers are all gonna be full of Huawei Ascends, just running open source models, doing inference for customers.

Is that the picture?

Zixuan Li: And also inference providers are co-designing with Huawei,

Steve Hsu: Mm-hmm.

Zixuan Li: so it's not just standard chips; there are different types of chips for different tasks, sometimes for reasoning tasks, for MoE architectures. So you need to build different chips. Maybe for some tasks the chips are undersupplied, and maybe for others they're oversupplied.

Yeah. But I can't guarantee that's the case for all scenarios.

Steve Hsu: Got it. Good. So maybe that's enough about hardware. Do you have any forecasts, any non-obvious things that you think are gonna come out of the open source AI competition in China in the next few years? Like, can you predict where things are gonna go?

Zixuan Li: I think we'll continue to do open source and contribute more to the global community, because we are not there yet. For the AGI thing, I think it's not a competition between the US and China; it's a human thing. Globally we need to collaborate on the key tasks, like how to do reasoning, how to do chain-of-thought (CoT). I think DeepSeek really contributed a lot to the community by letting people understand how to do reinforcement learning, how to incorporate CoT inside a model. And for us, we are trying to do similar things and provide more to the community. We expand the cake, and we can all get more.
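As a concrete illustration of the technique Li credits DeepSeek with popularizing, here is a toy sketch of a rule-based reward for reinforcement learning that encourages chain-of-thought: the response earns a format reward for reasoning inside think tags, plus an accuracy bonus if the final answer is correct. The tag convention and weights are illustrative assumptions, not any lab's actual recipe.

```python
import re

def cot_reward(response: str, correct_answer: str) -> float:
    """Toy CoT reward: format bonus for <think>...</think>, accuracy bonus for the answer."""
    reward = 0.0
    m = re.search(r"<think>(.*?)</think>\s*(.*)", response, re.DOTALL)
    if m:
        reward += 0.5                 # format reward: model reasoned before answering
        final = m.group(2).strip()
        if final == correct_answer:
            reward += 1.0             # accuracy reward on the final answer
    return reward

# Example: full reward for correct, well-formatted output; partial for format only.
print(cot_reward("<think>2 + 2 = 4</think> 4", "4"))   # 1.5
print(cot_reward("4", "4"))                            # 0.0 (no reasoning trace)
```

In an actual RL loop these scalar rewards would drive a policy-gradient update over sampled responses; the point here is only the shape of the signal.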

So, as I said, we open source the model, and we can only get maybe 5% or 10% of all the services related to GLM, because now you can run GLM on your desktop. You can use GLM with Fireworks, Novita AI, or other US-based providers. We don't earn money from them, but I think it's necessary. And to compete, you need to do a lot of things.

Steve Hsu: If you go to a party in San Francisco, or a dinner, you might have on one side some guy who's very senior at OpenAI, some guy who's from Anthropic, another guy who's from Google. And of course there are some things which are really secret, but there's still plenty of leakage of information between all of these.

You know, they're all competing with each other, but there's still a lot of leakage of information between the engineers, and even senior-level people, at these companies. I'm curious how it feels in China. Like, you go to some event and maybe there are a bunch of other AI people there. Are there any secrets between what the Qwen people are doing and what the K2 people are doing, or do the engineers just talk to each other?

Zixuan Li: I think most information is confidential. We don't really have parties in China, you know; we only have friend gatherings or similar things. And for Kimi K2, I was impressed by their trillion-parameter model. I believe nobody at Z.ai knew they were building something that big. Maybe we knew they were doing a foundation model, but we hadn't expected it to be that big, or to have performance like that. And they were trying to use Kimi K2 inside Claude Code. All that stuff was confidential.

Steve Hsu: Okay. Well, it sounds like you guys are more careful than the guys here, because I think there's more leakage here between the labs than what you described. For the audience, the example you gave: K2, a really excellent model from Kimi that was recently released, has a trillion parameters, which is big, as big as any model, I think. And you guys are around 360 billion, is that right?

Zixuan Li: 355.

Steve Hsu: Right. So I guess you were saying, before K2 came out, nobody knew it was going to be quite that big,

Zixuan Li: Yeah.

Steve Hsu: it was a secret until it actually came out. Got it.

Zixuan Li: Yeah. And also for GLM-4.5, I think nobody knew it would be an agentic model. Maybe some people were expecting us to provide a powerful model, but they hadn't figured it out, because in 2024, I think some companies were misled by LMArena, Chatbot Arena at that time. They tried to improve their scores on that arena, but neglected that we need something like Claude 3.5, like the useful models.

So all the models at that time were trained purely for general use, or chat.

Steve Hsu: Mm-hmm.

Zixuan Li: But with the popularity of Claude 3.5, we understood that we have to do more things than chat. And with DeepSeek we realized, okay, it's time to do reasoning, and we can let the model think for 10 minutes to get an answer to an IMO-level question.

Yeah. So we shifted, and "we" is not just referring to Z.ai, but to AI researchers generally. We change our mindset every month; we learn every month, every step. We also learned a lot from Kimi K2, which was just two weeks ahead of us. We looked at Claude Code: oh, that's amazing, and Claude Code made agentic coding famous. We also looked at Manus and Genspark, thinking about how we can build a model that can do the same thing as an agent. Manus uses Claude Sonnet with a lot of engineering on top, and we tried to replicate the trajectory and let the model learn from that trajectory, so the model can call a function, get the result back, continue to think, and then use another tool. So currently, when you log in to the Z.ai chat,

our AI Slides feature, it is a single-agent

Steve Hsu: Mm-hmm.

Zixuan Li: architecture. It can use one model with several tools to complete a lot of tasks.
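The loop Li describes, call a tool, read the result, think again, answer, can be sketched in a few lines. Everything here (the stand-in model, the tool registry) is hypothetical scaffolding for illustration, not Z.ai's actual agent stack.

```python
def fake_model(messages):
    """Stand-in for an LLM call: returns either a tool request or a final answer."""
    last = messages[-1]["content"]
    if "tool_result" not in last:
        return {"tool": "search", "args": {"query": "GLM agent"}}
    return {"answer": "done: " + last}

# Tool registry: in a real system these would hit search APIs, code runners, etc.
TOOLS = {
    "search": lambda query: f"tool_result: 3 hits for '{query}'",
}

def run_agent(user_prompt, max_steps=5):
    """Single-agent loop: model may call tools repeatedly before answering."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        out = fake_model(messages)
        if "answer" in out:                            # model decided it is finished
            return out["answer"]
        result = TOOLS[out["tool"]](**out["args"])     # execute the requested tool
        messages.append({"role": "tool", "content": result})
    return "max steps reached"
```

The design choice Li mentions, one model plus several tools rather than multiple cooperating agents, corresponds to keeping a single `messages` history that every tool result is appended to.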

Steve Hsu: You mentioned that you're working 20 hours a day. At Z.ai and other similar companies, is 996 still the rule, even for AI engineers and AI researchers?

Zixuan Li: I think there's no regulation or policy on that. But I need to communicate with US-based partners, so I need to work very hard. In the daytime I manage the product team and the growth team, and at night I collaborate with US-based providers, leaderboards, and even the scientists. So there's a lot of stuff going on.

Steve Hsu: Do you, do you think like within the company though, the AI, the AI engineers and AI researchers are, are they actually working the equivalent of 9, 9, 6, or is it less intense than that?

Zixuan Li: I think it's close, but there is no policy. They're working very hard. It all depends on the work. There's a deadline, right? So they have to allocate their time. Maybe they go on a vacation, come back, and work 20 hours a day, or maybe they give up all their vacation and work 10 hours a day.

Yeah. It depends on the schedule, the tasks they have to finish. There's no one pushing them to stay in the office and get back home at nine. Maybe they invented something really automated and they can go back at 5:00 PM. It all depends on their skills and how they perform the tasks.

Steve Hsu: Great. Well, we're right at about an hour, so maybe we should start wrapping up. What do you think is the biggest misconception that, let's say, technical people who follow AI in the US have about what is happening in AI in China?

Zixuan Li: I think we are normal people. There's no secret sauce. We just follow the science. We do experiments, we read papers. There's no hype or governmental thing that can really accelerate the process. So we welcome any invitations like this podcast, or other communication channels. We believe that with more communication, we can understand each other better.

And I do love the party scene, because I lack information, and we are willing to exchange information. Not the confidential information, but how we handle things, how we look at things. I think that's critical for innovation.

Steve Hsu: You know, if you are ever in San Francisco, I'll hook you up. The scene in San Francisco maybe would seem very strange to you. There are people living in group houses, even people in their thirties, for example, who like to live in a group house, and maybe everybody in the house is an AI engineer or researcher or something.

Maybe they're working at a startup, maybe one's working at Google, another is working at Anthropic, and they're all living in the same house, throwing parties and even hackathons on the weekend. So it's a crazy scene. It's almost like, I don't know if you're familiar with the 1960s in America, there's just lots of crazy energy in Silicon Valley, especially San Francisco and Berkeley, crazy energy around AI. That's actually what I'm referring to when I talk about these parties and the information flow between different labs. It's very special.

Zixuan Li: Yeah, I love it. I truly love it. But in China we have more home bars.

Steve Hsu: Okay. Okay,

Zixuan Li: Home bars, and we also have an AGI bar here in Beijing. So if you visit China, I will invite you to the AGI bar, where people just randomly talk to strangers. We don't have many houses that include people from Qwen or ByteDance and Z.ai, but we love the houses or bars where you can meet strangers.

Steve Hsu: yeah.

I'm gonna be in Beijing, I think in January, and I'm gonna be visiting another physicist who is at Tsinghua, but he also works with ByteDance on applying the models to math and physics. So maybe if we go out together, you'll meet some people from some of the other companies, and from academic AI.

So we'll definitely do that.

Zixuan Li: Yeah, if you wanna meet more people, I can arrange this and invite you to some other places. Because our company, our lab, originated from Tsinghua University, we're very near Tsinghua.

Steve Hsu: Great, 'cause I think I'll probably spend at least a week in your area, so it should be fun.

Zixuan Li: And yeah, back to that point: we are normal people,

Steve Hsu: Okay,

Zixuan Li: no mystery, no Eastern mystery. Some people believe, oh, there are some tricks or other things. I think we are just normal people, and following

Steve Hsu: I wouldn't say people here think there are tricks, or that you're not normal people. But one thing people realize is that because China is so big and the students study really hard, there's a very big population of people who can read and understand the AI papers, right? So when a new model comes out with some new innovation, the KV cache or something, there's a very big pool of people in China, I think, who can read those papers and understand them.

In the US, it's mostly people who went to the top AI schools, I mean PhDs and such, who can follow all this stuff. But I think there's an increasingly large set of people in China who can do this. That's my theory. But I don't think it's because people are different.

That's just, it's just there are a lot of smart people in China.

Zixuan Li: Yeah, they care about AI. A lot of them begin coding at maybe six or seven.

Steve Hsu: Yep.

Zixuan Li: There are a lot of hackathons organized for high school students. I'm really impressed. Last year I joined a hackathon at a high school. There were people using LlamaIndex, LangGraph, LangChain to build agents, and maybe this year, different kinds of agentic workflows instead of predefined workflows. So they're very interested in this kind of stuff. Not just the models, but also the architecture, the frameworks. There are a lot of people contributing currently.

Steve Hsu: Great. Well, let's, let's leave it at that. I, I know you're very busy, so I, I'll let you get back to work, but it's been great to have you on the podcast and, I hope I get to see you when I visit Beijing.

Zixuan Li: Thank you. Thank you very much. Thanks again for the invitation. I think it was a great opportunity to talk to you and the audience, to share more, and to let people know more about both the US side and the Chinese side.

Steve Hsu: Great.

Creators and Guests

Stephen Hsu
Host
Steve Hsu is Professor of Theoretical Physics and of Computational Mathematics, Science, and Engineering at Michigan State University.
© Steve Hsu - All Rights Reserved