H100 GPU scarcity worsens: prices up, big labs control supply chains
Original: GPU shortage is worse than ever.
Deep summary
This post makes a market and access claim rather than a technical one: H100 GPU pricing has not declined from its 2022–2023 levels, and on-demand cloud availability remains constrained. The author asserts that hyperscalers and frontier AI labs have locked up capacity through multi-year commitments, crowding out academic and independent researchers. The concern is structural: if compute access concentrates among a handful of well-capitalized actors, then the diversity of research directions, reproducibility, and the ability to independently verify lab claims all degrade.
The linked artifact does not appear to contain a paper, dataset, or benchmark; it is a social-media amplification of an infrastructure access concern. No architectural details, training runs, or experimental results are associated with the post. The claim rests on spot-market pricing observations and anecdotal availability reports rather than a systematic study, so there is no methodology to evaluate in the conventional sense.
From a technical-ecosystem standpoint the concern is grounded. H100 SXM5 nodes were listing at roughly $25,000–$35,000/month per 8-GPU node on spot markets in mid-2023; reports through early-to-mid 2025 show sustained or higher pricing rather than the decay one would expect from normal capacity build-out. Reserved-instance commitments at the hyperscalers (AWS, Azure, GCP) and direct OEM purchases by OpenAI, Google DeepMind, Meta, and xAI reportedly account for most of the available accelerator supply. That supply is itself bounded by TSMC CoWoS packaging capacity, which is the binding constraint for HBM3-attached dies. AMD MI300X and Intel Gaudi 3 exist as alternatives, but software ecosystem maturity (CUDA dependency in most training stacks) limits substitutability for most practitioners.
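To compare the monthly node figures above with the per-GPU-hour rates that cloud providers usually quote, the range can be converted with simple arithmetic. The sketch below assumes a month of about 730 hours and full utilization of all 8 GPUs; the function name `per_gpu_hour` is illustrative, not taken from the post.

```python
# Convert a monthly price for a multi-GPU node into a per-GPU-hour rate.
# Assumption: 730 hours per month (8760 hours per year / 12), node used continuously.
HOURS_PER_MONTH = 730


def per_gpu_hour(monthly_node_price: float, gpus_per_node: int = 8) -> float:
    """Return the implied price of one GPU for one hour."""
    return monthly_node_price / (gpus_per_node * HOURS_PER_MONTH)


# Endpoints of the mid-2023 range quoted in the summary.
low = per_gpu_hour(25_000)
high = per_gpu_hour(35_000)
print(f"${low:.2f} to ${high:.2f} per GPU-hour")  # $4.28 to $5.99 per GPU-hour
```

Under these assumptions the quoted range works out to roughly $4 to $6 per GPU-hour; lower utilization would raise the effective rate.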
The broader implication for ML research is real: serious fine-tuning of frontier-scale models (70B+ parameters) requires thousands of H100-equivalent accelerator-hours, and pretraining requires far more, at prices that are prohibitive or on capacity that is simply unavailable on-demand. Shared academic compute programs (NAIRR pilot, NSF ACCESS) partially address this but are heavily oversubscribed. Open-weights models from Meta (Llama series) and Mistral partially decouple research from pretraining access, but evaluation, ablation, and post-training work still require substantial GPU time. The post does not quantify the shortage numerically or cite supply-chain data, which limits its evidential weight; it is better read as a practitioner sentiment signal than as a rigorous market analysis.
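For a rough sense of the scale involved, accelerator-hours can be estimated with the common approximation that training a dense transformer costs about 6 FLOPs per parameter per token. The sketch below is a back-of-the-envelope calculation, not a figure from the post: the peak throughput (989 TFLOP/s, the published dense BF16 figure for H100 SXM), the 40% utilization, and both token counts are assumptions chosen for illustration.

```python
# Back-of-the-envelope GPU-hour estimate using the ~6 * N * D FLOPs approximation.
# All defaults are assumptions for illustration, not values from the post.

def training_gpu_hours(
    params: float,
    tokens: float,
    peak_flops: float = 989e12,  # assumed H100 SXM dense BF16 peak, FLOP/s
    mfu: float = 0.4,            # assumed model FLOPs utilization (hypothetical)
) -> float:
    """Estimate GPU-hours to train `params` parameters on `tokens` tokens."""
    total_flops = 6 * params * tokens
    effective_flops_per_sec = peak_flops * mfu
    return total_flops / effective_flops_per_sec / 3600


# Hypothetical workloads for a 70B-parameter model.
pretrain_hours = training_gpu_hours(70e9, 2e12)  # 2T-token pretraining run
finetune_hours = training_gpu_hours(70e9, 1e9)   # 1B-token fine-tuning run

print(f"pretraining: ~{pretrain_hours:,.0f} GPU-hours")
print(f"fine-tuning: ~{finetune_hours:,.0f} GPU-hours")
```

Under these assumptions a single fine-tuning run lands in the hundreds of GPU-hours and a pretraining run in the hundreds of thousands, so a research project with repeated runs and ablations quickly reaches the thousands of hours and beyond described above.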