Meta's 24k H100 Cluster Capex/TCO and BoM Analysis
A $1.47 Billion TCO, a Full BoM Analysis of the H100 Chassis and Infiniband Networking, and an Opex Analysis of Colocation and Electricity Costs
Welcome to an in-depth analysis of the Capex/TCO and BoM of Meta's 24k H100 cluster. In this comprehensive breakdown, we dive into the Bill of Materials (BoM) of Meta's 24,576 H100 cluster. We examine the capital expenditure (Capex) broken down by major line item, such as H100s, CPUs, DDR5 memory, and Infiniband switches. Additionally, we explore operational expenses (Opex), including colocation and electricity costs. Finally, we look at the total cost of ownership and the TCO per GPU-hour for this cluster.
Disclaimer: Only publicly available information is used and analyzed in this analysis. Estimates may be wrong. The main sources are SemiAnalysis public articles and fs.com.
Background
In early March 2024, Meta announced two 24,576-GPU H100 clusters to be used by their GenAI teams for workloads such as Llama 3 training. For the compute fabric, one of the 24k clusters is networked with RoCE Ethernet while the other is networked with Infiniband. This analysis covers only the 24k Infiniband cluster; we may analyze the RoCE Ethernet cluster in a future post. You can learn more about the different network fabrics in an H100 compute cluster here.
To avoid steep OEM markups, Meta decided to partner with an ODM to design and manufacture its own H100 chassis. Because Meta buys at such high volume, this allows them to negotiate directly with component manufacturers such as Intel and Nvidia while amortizing their R&D over 43,750 H100 chassis in 2024. Furthermore, the same chassis can be used for their MI300X compute servers too.
When you design an NDR Infiniband network above 2,048 H100s, a 2-tier rail-optimized fat tree is no longer possible due to the limited number of ports on each QM9700 Infiniband switch. You need to move to either a 3-tier folded Clos topology or a Dragonfly+ topology. For their 24,576 H100 cluster, Meta decided to go with the standard 3-tier folded Clos topology. The port math behind that 2,048 ceiling is sketched below.
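To see where the ceiling comes from, here is a minimal sketch in Python. The only input it assumes is the QM9700's published spec of 64 NDR 400G ports: a non-blocking 2-tier fat tree tops out at k²/2 endpoints, while 3 tiers gives k³/4.

```python
# Back-of-envelope: maximum endpoints of a non-blocking folded-Clos
# (fat-tree) network built from k-port switches.
def max_endpoints(ports: int, tiers: int) -> int:
    if tiers == 2:
        return ports ** 2 // 2   # leaves split ports half down, half up
    if tiers == 3:
        return ports ** 3 // 4   # classic 3-tier folded-Clos capacity
    raise ValueError("sketch only covers 2- and 3-tier fat trees")

PORTS = 64  # QM9700 NDR port count
print(max_endpoints(PORTS, 2))  # 2,048  -> the 2-tier ceiling cited above
print(max_endpoints(PORTS, 3))  # 65,536 -> comfortably fits 24,576 H100s
```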
Capital Expenditure
Below we estimate the capex of the H100 cluster based on publicly available data and the gross margins of server ODMs such as Quanta Computer. As you can see, the H100s dominate the BoM at 65.8%, while the CPUs are only 1.75% of cost. That 1.75% is still $15.97 million, but it is nothing compared to Nvidia's cut.
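As a consistency check on those shares, here is a quick sketch backing out the implied totals. This is pure arithmetic on the $15.97M CPU figure and the 1.75%/65.8% percentages above; the per-GPU number that falls out is an implication, not a quoted price.

```python
# Implied capex totals from the shares quoted above.
cpu_cost = 15.97e6   # CPU spend, 1.75% of the BoM
cpu_share = 0.0175
gpu_share = 0.658    # H100 share of the BoM

total_bom = cpu_cost / cpu_share   # ~$912.6M total cluster BoM
gpu_cost = gpu_share * total_bom   # ~$600.5M of H100s
per_gpu = gpu_cost / 24_576        # ~$24.4k implied per H100

print(f"total BoM:  ${total_bom/1e6:,.1f}M")
print(f"H100 spend: ${gpu_cost/1e6:,.1f}M (~${per_gpu:,.0f}/GPU)")
```

Reassuringly, that ~$600.5M of H100s plus the $192.5M of Infiniband networking discussed later lands within rounding distance of the $791.5M NVIDIA-revenue figure in the TCO section.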
The source for this table is all publicly available information, mainly from SemiAnalysis and fs.com.
Note that Meta did not use Cedar Fever-7 modules but instead opted for standard ConnectX-7 modules. You can learn more about how they could have saved $86.4 million by using Cedar Fever-7 in our analysis below.
Operational Expenses
The two main significant opex items are colocation space and electricity. Even though Meta has plenty of fully owned datacenter campuses, we used the standard market price for colocation, since the datacenter colocation and datacenter construction industries are supply constrained right now, which makes Meta's internal datacenter buildouts more expensive as well. If you subtract the colocation provider's gross margin, Meta's datacenter space cost would probably be close to $80/kW/month. This cluster draws 39-40 megawatts, and there is approximately ZERO colocation space in the world with that much readily available power. You either need to greenfield a completely new project, which takes 4-5 years, or partner with vendors that already hold contracts for this colocation space, the way Microsoft has partnered with CoreWeave.
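A minimal sketch of what that colocation line item looks like over the cluster's life. The $80/kW/month internal-cost figure is from above; the $150/kW/month market rate is an illustrative assumption for what a supply-constrained colocation provider might charge, not a quoted price.

```python
# Colocation is priced per kW of critical IT load per month.
cluster_kw = 40_000   # ~40 MW cluster, per the text above
months = 4 * 12       # 4-year lifespan

for label, rate_per_kw_month in [("internal cost", 80), ("assumed market", 150)]:
    total = cluster_kw * rate_per_kw_month * months
    print(f"{label}: ${total/1e6:,.1f}M over 4 years")
# internal cost:  $153.6M
# assumed market: $288.0M
```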
For electricity cost, we used the same rate as SemiAnalysis and assumed a 90% electricity utilization rate. The reason we assume such a high utilization rate is that these H100 compute servers usually run at peak power around the clock crunching numbers. Opex takes up approximately 29.27% of the total cost of ownership. The reason we assume only a 4-year lifespan is that, since Nvidia has moved to a 1-year datacenter accelerator release cadence, by 2028 the B100, X/R100, Y100, and Z100 will have been released. There is also a strong argument that by 2028, electricity and datacenter space will be so expensive to rent and build that it will be impractical to keep running H100s.
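Here is a hedged sketch of the electricity math, backing out the implied all-in $/kWh from this post's own numbers: the ~40 MW draw and 90% utilization above, plus the $0.159/GPU-hour electricity figure quoted in the TCO section below.

```python
GPUS = 24_576
HOURS = 4 * 365 * 24                  # 4-year lifespan, 35,040 hours

cluster_kwh = 40_000 * 0.90 * HOURS   # ~1.26B kWh at 90% utilization
elec_total = GPUS * HOURS * 0.159     # ~$136.9M over 4 years

print(f"electricity spend:   ${elec_total/1e6:,.1f}M")
print(f"implied all-in rate: ${elec_total/cluster_kwh:.3f}/kWh")  # ~$0.109
```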
At Meta/Microsoft/OpenAI/AWS/Google scale, the size of a training cluster deployment is not constrained by capital but by how much readily available power exists at a single datacenter campus. Once you pass a 10,000 H100 cluster (~18 megawatts), your target metric becomes "maximizing intelligence per picojoule" instead of "maximizing intelligence per dollar". We will write more about this in a future analysis of the limiting factors preventing larger training deployments.
Total Cost of Ownership
We group the total cost of ownership by major line item. Since Meta needs to pay NVIDIA for both the H100s and the Infiniband networking, NVIDIA's revenue from this cluster is $791.5 million (53.81% of TCO). Notice that electricity is only 9.32% of TCO, but as discussed above, improving energy efficiency means you can deploy a larger cluster within the same number of megawatts.
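Those two figures pin down the headline number in the title; a one-line sanity check:

```python
# If $791.5M is 53.81% of the total cost of ownership, the TCO follows.
tco = 791.5e6 / 0.5381
print(f"implied TCO: ${tco/1e9:.2f}B")  # ~$1.47B, matching the title
```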
We also need to factor in Meta's weighted average cost of capital (WACC) of 9.33%. The reason we factor this in, even though Meta has more than enough money in its cash reserves, is that Meta could instead be earning 4.5%/year by simply parking that money in risk-free US Treasury bills. The market risk premium and Meta's beta also factor into its 9.33% WACC.
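For readers who want the mechanics, below is a minimal CAPM-style sketch of the cost-of-equity side of WACC. The 4.5% risk-free rate is from above; the beta and market-risk-premium values are illustrative assumptions chosen to land near 9.33%, not Meta's actual inputs. A full WACC calculation would also blend in the after-tax cost of debt by capital weights.

```python
# CAPM: cost_of_equity = risk_free + beta * market_risk_premium.
# For an equity-heavy company like Meta, WACC ~= cost of equity.
risk_free = 0.045   # US Treasury yield cited above
beta = 1.20         # assumed equity beta (illustrative)
mrp = 0.040         # assumed market risk premium (illustrative)

cost_of_equity = risk_free + beta * mrp
print(f"cost of equity ~ {cost_of_equity:.2%}")  # 9.30%, near the 9.33% WACC
```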
Before adding in the cost of capital, we see that the TCO per H100-hour of this cluster works out to $1.494/hour, of which $0.918/hour goes towards NVIDIA and $0.159/hour goes towards electricity.
If you include the cost of capital of $0.214/hour, the TCO per H100-hour goes up to $1.689/hour.
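These per-hour splits are just each 4-year cost bucket divided by total GPU-hours; a short sketch reproducing them from the dollar figures elsewhere in this post:

```python
GPUS = 24_576
GPU_HOURS = GPUS * 4 * 365 * 24   # ~861M GPU-hours over the 4-year life

buckets = {
    "NVIDIA (H100s + Infiniband)": 791.5e6,   # from the TCO section
    "electricity":                 136.9e6,   # from the opex sketch above
}
for name, cost in buckets.items():
    print(f"{name}: ${cost / GPU_HOURS:.3f}/GPU-hour")
# NVIDIA:      ~$0.919/GPU-hour (the $0.918 above, within rounding)
# electricity: ~$0.159/GPU-hour
```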
Furthermore, we see that $192.5 million goes towards Infiniband networking. NVIDIA's networking division has insane gross margins; if Meta switched to RoCEv2 Ethernet from Arista, like in their other 24k H100 cluster, they could easily be saving close to $70 million.
Thank you for joining us in this detailed exploration of Meta's 24k H100 cluster, covering everything from the BoM to the total cost of ownership. For more in-depth analyses and updates on the latest in tech and business, make sure to subscribe to our newsletter. Your support enables us to keep delivering high-quality content directly to your inbox. Stay tuned!
Update (May 4, 2024): Add Meta’s WACC Cost of Capital of 9.33% to the TCO
Would love to connect on this: http://linkedin.com/in/rwangsf