The storage and cycling of soil organic carbon (SOC) are governed by multiple co-varying factors, including climate, plant productivity, edaphic properties, and disturbance history. Yet it remains unclear which of these factors are the dominant predictors of observed SOC stocks, globally and within biomes, and how the role of these predictors differs between observations and process-based models. Here we use global observations and an ensemble of soil biogeochemical models to quantify the emergent importance of key state factors – namely, mean annual temperature, net primary productivity, and soil mineralogy – in explaining biome- to global-scale variation in SOC stocks. We use a machine-learning approach to disentangle the role of covariates and elucidate their individual relationships with SOC, without imposing expected relationships a priori. Although the relationships between SOC and covariates are qualitatively similar in observations and models, their magnitude and degree of non-linearity vary substantially among the models and observations. Models appear to overemphasize the importance of temperature and primary productivity (especially in forests and herbaceous biomes, respectively), while observations suggest a greater relative importance of soil minerals. This mismatch is also evident globally. However, observations and model outputs agree in select individual biomes – namely, temperate deciduous forests and grasslands, where SOC stocks show stronger relationships with temperature and productivity, respectively. This approach highlights the biomes with the largest uncertainty and mismatch with observations, which can be targeted for model improvement. Understanding the role of dominant SOC controls, and the discrepancies between models and observations, globally and across biomes, is essential for improving and validating process representations in soil and ecosystem models used for projections under novel future conditions.