Data Labeling Businesses, The Refineries Behind AI’s Gold Rush, Are Cashing In

If data centers are the gold mine and the data is the gold, then data labelers are the refineries making that gold shine. Data labeling — the unglamorous work of teaching AI models what’s what — has quietly become one of the fastest-growing areas of AI. And it’s ushering in a new kind of digital labor.
Not Hotdog: While tech giants burn billions racing to build better AI models, data labeling companies handle the tedious work of organizing, annotating, and validating the massive datasets these systems need. Mercor, a startup connecting companies like OpenAI and Meta with domain experts to train AI models, is approaching $450M in annualized run-rate revenue less than two years after its founding, and is now eyeing a $10B valuation. The surge in demand stems from a fundamental problem with large language models (LLMs): they train on petabytes of scraped internet data that is riddled with bad information and lacks specialized domain knowledge.
- Mercor connects domain experts (such as doctors and lawyers) with AI labs like OpenAI, which recently hired 100+ ex-bankers to train its models on financial tasks.
- Uber is joining the trend, paying drivers for “digital tasks” like uploading menus or recording audio between rides, with earnings varying based on time commitment.
Train a Robot To Fold, And You Get Clean Laundry For a Lifetime
Meta’s $14.3B investment for a 49% stake in Scale AI earlier this summer sent shockwaves through the industry, validating data labeling as mission-critical infrastructure. The AI training dataset market is projected to balloon from $1.64B in 2023 to $14.42B by 2033. But the future of data training extends beyond text-based models — the real explosion is coming from demand for robotics training data. Unlike LLMs, which can scrape the internet for training material, robots need humans to film themselves performing basic tasks like loading dishwashers, folding laundry, or making espresso.
- Startups like Encord, Micro1, and Scale AI are seeing massive demand surges, with Encord reporting 4x more volume for robotics data compared to last year.
- Pay rates range from $25 to $50 an hour for simple tasks and reach up to $150 an hour for highly technical work like handling surgical equipment.
Four decades of success: Innodata, a 37-year-old publicly traded data engineering company, has become Wall Street’s unlikely AI darling with a nearly 300% run over the past year, and now sports a $2.4B market cap on the back of 79% year-over-year revenue growth. Management is positioning aggressively for the robotics wave, stating in its latest earnings report that agentic AI is driving significant advancements in robotics and that the market for simulation training data and evaluation services could ultimately surpass the current LLM training data market. After all, someone still has to teach the robots how to learn.