AI Data Brokers
How a Quiet Industry Became the Backbone of Artificial Intelligence
Most people think AI is trained on the internet. Articles. Books. Social media posts. That is only part of the story.
A much larger and quieter industry feeds AI systems the data that actually matters. Behavioral data. Location data. Purchase histories. Demographic profiles. Risk scores. Entire digital lives.
These companies are data brokers. They have existed for decades. AI just made them powerful.
They do not build models. They do not deploy products. But they decide what AI systems learn about people. And they do it almost entirely out of public view.
What Data Brokers Actually Do
A data broker is a company that collects personal information, combines it, and sells access to it. That data comes from many places. Websites. Mobile apps. Loyalty programs. Public records. Purchased lists from other brokers (UTK 2025; TechTarget 2024).
Most people never interact with these companies directly. There is no account. No consent screen. No clear opt-out.
Brokers assemble profiles that include where you live, what you buy, where you travel, who you resemble statistically, and how you are likely to behave. Those profiles are then sold to advertisers, insurers, lenders, political campaigns, and now AI developers (Gartner 2025; Proton 2025).
You are not the customer.
You are the product.
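The combining step is simpler than it sounds. Here is a minimal sketch in Python, assuming three unrelated sources that happen to share an email address. Every record, field name, and the `build_profiles` helper is invented for illustration. Real brokers use fuzzier matching and far more sources.

```python
from collections import defaultdict

# Hypothetical records from three unrelated sources (all data invented).
# The shared key is an email address; this is an assumption for the sketch.
loyalty_program = [
    {"email": "Pat@Example.com", "purchases": ["prenatal vitamins", "unscented lotion"]},
]
mobile_app = [
    {"email": "pat@example.com", "home_zip": "37916", "visits": ["clinic", "pharmacy"]},
]
public_records = [
    {"email": "pat@example.com ", "age_range": "25-34", "homeowner": False},
]


def build_profiles(*sources):
    """Merge records from every source into one profile per shared identifier."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            # Normalize the key so trivially different spellings still match.
            key = record["email"].strip().lower()
            for field, value in record.items():
                if field != "email":
                    profiles[key][field] = value
    return dict(profiles)


profiles = build_profiles(loyalty_program, mobile_app, public_records)
print(profiles["pat@example.com"])
```

No single source says much. The merged record does.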
Why AI Changed Everything
Traditional data brokerage focused on targeting. Ads. Credit risk. Insurance pricing. AI introduced a new demand. Training data.
Modern AI systems need structured, real-world information to learn how people behave. Public text is not enough. Models built to predict behavior tend to improve when they ingest detailed behavioral datasets. That is where brokers come in.
A new class of AI-focused data brokers now packages curated datasets for machine learning. Some provide labeled data for fine-tuning models. Others specialize in industry-specific datasets like healthcare, finance, or customer behavior (RadHash 2024).
These companies sit between human life and machine intelligence. They decide what patterns models see.
That is real power.
Real Examples Hiding in Plain Sight
Some data brokers are large and familiar. Acxiom. Experian. Equifax. They maintain massive consumer databases used across industries (Webopedia 2025).
Others operate quietly. Mobilewalla collects location data from mobile devices and builds detailed movement profiles. Reports show this data has been used to track political activity and sensitive populations (Mobilewalla 2024).
Then there are brokers built specifically for AI. They do not advertise to consumers. They sell datasets directly to companies training models. Their customers are enterprise teams. Their products feed training pipelines.
You will never see their names.
But their data shapes AI behavior.
Consent Is Mostly Fiction
In the United States, there is no comprehensive federal law regulating data brokers. Thousands of firms operate with minimal oversight. Most people do not know who holds their data or how it is used (Kaspersky 2025).
Even when privacy laws exist, compliance is weak. A recent study of California’s CCPA found that many brokers failed to honor access or deletion requests. Some demanded additional personal data just to process a request, creating new risks (van Kempen et al. 2025).
The system assumes people will protect themselves.
But it hides the map.
When Data Becomes a Security Issue
The risks go beyond advertising.
Data brokers now hold information that matters at a national level. Genetic data. Biometric records. Movement histories. Behavioral predictions.
When 23andMe entered bankruptcy, experts warned that its genetic database could be sold as an asset. That data is valuable for medical AI research. It is also dangerous if misused by insurers, employers, or governments (Time 2025).
Clearview AI already demonstrated how scraped biometric data could be weaponized. In Europe, the company has faced regulatory fines and a criminal complaint over its unauthorized facial recognition database, which was sold to law enforcement and private clients (Reuters 2025).
AI makes these datasets more valuable.
And more dangerous.
Why This Industry Shapes the Future
AI systems increasingly make decisions that affect real lives. Hiring. Lending. Healthcare. Policing. Insurance.
The behavior of those systems depends on the data they learn from. Data brokers control that input layer. They influence which patterns are reinforced and which disappear.
This is not neutral.
It is not transparent.
And it is barely regulated.
A small number of companies now sit between people and the systems that judge them. Most voters have no idea this layer exists.
What Comes Next
Regulation will eventually arrive. But slowly. And unevenly.
Until then, AI data brokers will continue to grow in influence. They will shape models without public accountability. They will monetize personal lives at machine scale. And they will quietly decide how AI understands humanity.
The future of AI will not be determined only by labs or algorithms.
It will be shaped by whoever controls the data.
Right now, that is an industry almost no one is watching.
References
UTK. (2025). What are data brokers. University of Tennessee Knowledge Base.
TechTarget. (2024). Data broker definition.
Gartner. (2025). Data brokerage market overview.
Proton. (2025). What data brokers know about you.
RadHash. (2024). AI training data marketplaces.
Webopedia. (2025). Major data broker companies.
Mobilewalla. (2024). FTC and investigative reporting summaries.
Kaspersky. (2025). Data broker privacy risks.
van Kempen, M. et al. (2025). CCPA compliance analysis.
Reuters. (2025). EU criminal complaints against Clearview AI.
Time. (2025). 23andMe bankruptcy and genetic data concerns.




