Open-Source AI Adoption 2026: 5.6M Projects vs Real Deployment

Stanford counts 5.6M open-source AI projects. We scanned 50M+ domains for what's deployed: Botpress leads the open-source frameworks at 1,558 domains, vs 52,682 for the closed OpenAI API.

Open-source AI adoption is enormous in supply and concentrated in production. Stanford counts 5.6 million open-source AI projects, and 63% of organizations say they use open-source AI in some form — yet across the 50 million live domains we scan, the leading open-source AI framework in production, Botpress, runs on 1,558 sites, against 52,682 for the closed OpenAI API. That ratio, roughly 34 to 1, is the real shape of open-source AI adoption in 2026.

I run the detection infrastructure at TechnologyChecker, so when Stanford's 2026 AI Index reported 5.6 million AI-related open-source projects on GitHub, alongside Hugging Face model uploads that have more than tripled since 2023, I pointed our crawler at the open-source stack to see which of those projects actually run in production. The picture is sharper, and more useful, than the raw project count suggests: a huge, fast-growing supply, and a visible production layer led by one open-source framework against a field of closed APIs.

This isn't a hype-versus-reality story. The open-source AI commons is real, growing fast, and now drawing billion-dollar commitments. It's a story about where open-source AI runs, who actually deploys it, and how that compares to the closed alternatives most of the web reaches for first.

Key findings — our May 2026 detection scan (50M+ domains), the Stanford AI Index 2026, and recent reporting:

  • Stanford counts 5.6M open-source AI projects on GitHub, but only 206,880 (3.7%) have 10 or more stars. Hugging Face model uploads have more than tripled since 2023.
  • Botpress is the most-deployed open-source AI framework we detect: 1,558 active domains, matched to 899 companies (a 57.7% LinkedIn match rate).
  • Open-source AI deployment is an indie phenomenon: 64% of those companies have 10 or fewer employees; 81% have 50 or fewer.
  • Closed APIs still dominate the visible web: OpenAI is detectable on 52,682 domains — about 34× Botpress — and MIT Sloan finds closed models take roughly 80% of all model usage.
  • The momentum is open: Hugging Face passed 2 million public models, and in May 2026 IBM and Red Hat pledged $5 billion to open-source AI.

Vast warehouse of identical sealed crates with one opened and glowing teal, illustrating millions of open-source AI projects versus the few actually deployed

What counts as open-source AI — and what did we scan for?

One open door spilling teal light beside a padlocked closed door, representing open-source versus closed AI

"Open-source AI" covers two overlapping things. First, open models — weights you can download and run yourself, like Llama, Mistral, DeepSeek, and the open checkpoints on Hugging Face. Second, open tooling — the frameworks that wire those models into products: Hugging Face's transformers, LangChain and LlamaIndex for orchestration, Ollama for running models locally, and Gradio or Streamlit for quick app front-ends. Closed AI is the opposite end: hosted APIs from OpenAI, Anthropic, and Google that you call but can't inspect or self-host.

Dimension | Open-source AI | Closed AI
Examples | Botpress, Llama, Mistral, DeepSeek | OpenAI, Anthropic, Google Gemini
How you run it | Self-host or fork | Call a hosted API
Performance | ~90% of closed at release (MIT) | Leading edge
Share of model usage | ~20% | ~80% (MIT)
What we detect | Botpress on 1,558 domains | OpenAI on 52,682 domains
Best for | Control, cost, privacy | Speed to ship

Our detection engine fingerprints what a site actually runs by reading its public signals — HTTP response headers, JavaScript imports, DNS records, TLS certificates, and HTML patterns — across more than 50 million domains. It's the same method behind our work on the most popular platforms by category and which AI crawlers sites block in robots.txt. So the test was simple: search 50 million live domains for each open-source AI signature, and count where it's genuinely in production. We don't count GitHub stars or download tallies. We count deployments.
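To make the idea of fingerprinting concrete, here is a minimal sketch of signature matching over a page's HTML and response headers. It is not our production engine: the `Signature` class, the `detect` function, and both example patterns (including the `example-chat.test` host) are hypothetical names invented for illustration, and a real engine would also weigh DNS and TLS signals.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Signature:
    """One technology fingerprint: regexes over HTML and response headers."""
    name: str
    html_patterns: list = field(default_factory=list)
    header_patterns: dict = field(default_factory=dict)


# Hypothetical signatures for illustration only; these are not real vendor patterns.
SIGNATURES = [
    Signature(
        "example-chat-widget",
        html_patterns=[r"<script[^>]+src=[^>]*cdn\.example-chat\.test/"],
    ),
    Signature(
        "example-api-gateway",
        header_patterns={"x-powered-by": r"example-gateway"},
    ),
]


def detect(html, headers, signatures=SIGNATURES):
    """Return the names of signatures with at least one matching signal."""
    # Header names are case-insensitive, so normalize them before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    found = []
    for sig in signatures:
        html_hit = any(re.search(p, html, re.IGNORECASE) for p in sig.html_patterns)
        header_hit = any(
            re.search(pattern, lowered.get(header, ""), re.IGNORECASE)
            for header, pattern in sig.header_patterns.items()
        )
        if html_hit or header_hit:
            found.append(sig.name)
    return found


page = '<html><script src="https://cdn.example-chat.test/widget.js"></script></html>'
print(detect(page, {}))
```

A backend library that never emits a script tag or header simply produces no match here, which is the blind spot discussed later in this post.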

Does 5.6 million open-source AI projects mean mass adoption?

Iceberg with a small glowing tip above the waterline and a huge mass submerged below, representing millions of projects versus tiny visible deployment

The number everyone is quoting from the Stanford AI Index 2026 counts creation: AI-related repositories pushed to GitHub, models and datasets uploaded to Hugging Face. Stanford puts the count of AI-related GitHub projects at roughly 5.6 million in 2025, up from 1,549 in 2011 and still growing 23.7% year over year.

Stanford AI Index 2026, Figure 1.5.1 — the number of AI-related GitHub projects rose from 1,549 in 2011 to 5.58 million in 2025 (Source: Stanford AI Index 2026, data from GitHub)

But a repository is not a deployment, and Stanford's own figures show how concentrated the real engagement is. Filter for projects with at least 10 stars — a low bar for community interest — and the 5.6 million narrows to 206,880, about 3.7% of the total. The report is blunt about why: most repositories "consist of personal or experimental work and receive minimal attention." Hugging Face tells the same story from the model side — roughly half of all models on the hub have fewer than 200 total downloads, and the top 0.01% account for 49.6% of all downloads, per Hugging Face's spring 2026 state-of-open-source report (which also clocked 13 million users, more than 2 million public models, and verified accounts from over 30% of the Fortune 500).

The upload curve is what makes this market move. Hugging Face model uploads more than tripled between 2023 and 2025, reaching 332,000 in a single quarter, while dataset uploads grew fourfold.

Stanford AI Index 2026, Figure 1.5.6 — quarterly Hugging Face model uploads more than tripled from 2023 to 2025 to reach 332,000, with datasets reaching 153,000 (Source: Stanford AI Index 2026, data from Hugging Face)

A fresh upload or a Fortune 500 sign-up means someone created something — not that it ships to customers. That's the trap with star counts and upload charts: LangChain and Ollama are among the most-starred AI repositories anywhere, which tells you about developer interest, not production adoption. The question that matters for anyone selling into this market or forecasting it is simpler: how much is actually running in front of real users? That needs deployment data, not supply data.

Are AI agents behind the open-source AI boom?

Robotic arms piling crates onto a huge stack while a small teal-lit shipping doorway at the base stays the same size, showing AI agents inflating open-source supply faster than deployment

The reason the project count is exploding is no mystery: AI agents now write a fast-growing share of open-source code. GitHub's Octoverse 2025 report clocked a new developer joining roughly every second — 36 million arrived in 2025, led by India — and about 60% of the year's fastest-growing projects were AI-focused, names like cline/cline, vllm-project/vllm, and astral-sh/uv. More than 1.1 million public repositories now pull in a large-language-model SDK, up 178% year over year.

Agents don't just spin up new repos; they commit to existing ones. An independent analysis of 40.3 million public pull requests from 2022 to 2025 found AI agents now take part in 14.9% of them. The agent frameworks are themselves open-source and increasingly self-authored — Aider's maintainers report that more than 70% of Aider's own code is now written by Aider. Anthropic's 2026 Agentic Coding Trends Report puts it plainly: the primary human role in building software is becoming orchestration — directing agents and judging their output.

Most of these agents are open-source projects you can clone today:

Open-source coding agent | Org / repo | What it does
OpenHands | All-Hands-AI/OpenHands | Full agent environment: browses, runs a shell, opens PRs
SWE-agent | princeton-nlp/SWE-agent | Research agent that resolves GitHub issues (SWE-bench)
Aider | Aider-AI/aider | Git-native terminal pair programmer; 70%+ self-written
Cline | cline/cline | VS Code agent with per-step human approval
OpenCode | sst/opencode | Terminal-native, provider-agnostic
Goose | block/goose | Block's extensible, MCP-native agent
Codex CLI / Gemini CLI | openai/codex, google-gemini/gemini-cli | Vendor-built open-source terminal agents

Here's why this matters for the gap we measured. Every one of these agents accelerates supply — more repos, more model uploads, more merged pull requests — yet they run in terminals, CI pipelines, and editors, never on a public root domain. So AI agents widen the divide this post is about: the open-source project count is inflating faster than ever, while the slice that reaches detectable production grows far more slowly. GitHub maintainers even have a name for the downside — "AI slop," the flood of low-quality auto-generated contributions that now takes more time to review than to produce. A growing share of those 5.6 million projects were never going to ship. Increasingly, they weren't written by a human either.

Why can't web crawlers see most open-source AI?

Server towers behind a frosted glass wall with only a chat bubble visible, showing how backend open-source AI runs server-side

Web detection reads what a site reveals to a browser: HTTP headers, JavaScript, DNS, TLS, and HTML. That puts a whole class of open-source AI out of view by design. transformers, LangChain, and LlamaIndex are Python and TypeScript libraries, and Ollama is a local model server; all of them run inside an application's backend. When a company builds a support agent on LangChain calling an Ollama-hosted model, the visitor's browser sees a chat widget and some HTML, never the orchestration underneath. There's no header, script tag, or DNS record that announces "LangChain ran here." The same is true for backend automation generally, which is why detecting self-hosted tools takes specialized methods, a problem we dug into in our guide on finding companies that run n8n.

Fishbone diagram of why open-source AI is invisible to web detection: backend libraries with no browser signature, platform-subdomain hosting, AI agents running in terminals and CI, and orchestration hidden behind a chat widget

App frameworks have the same blind spot for a different reason: tools like Streamlit publish to platform subdomains such as *.streamlit.app rather than the root domains a crawler indexes. So this analysis measures open-source AI through the browser-facing signatures that are observable — conversational frameworks like Botpress and the closed-API widgets they compete with. It's a deliberately conservative lens: it captures the production layer that touches real users and treats server-side infrastructure as out of scope rather than guessing at it. When those backend systems start exposing public endpoints, our crawler will see them — and we'll report it.
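The platform-subdomain blind spot can be sketched in a few lines. This is an illustration under assumptions, not our crawler's configuration: `PLATFORM_SUFFIXES`, `is_platform_hosted`, and `split_crawl_targets` are hypothetical names, and the suffix list is a small example rather than a complete inventory.

```python
# Hosting suffixes where apps live on a platform subdomain rather than the
# owner's root domain. Illustrative list only, not a real crawler config.
PLATFORM_SUFFIXES = (".streamlit.app", ".hf.space")


def is_platform_hosted(hostname):
    """True when the host is a subdomain of a known app-hosting platform."""
    host = hostname.lower().rstrip(".")
    return host.endswith(PLATFORM_SUFFIXES)


def split_crawl_targets(hostnames):
    """Separate root-domain targets from platform-hosted apps."""
    root, platform = [], []
    for host in hostnames:
        (platform if is_platform_hosted(host) else root).append(host)
    return root, platform


root, platform = split_crawl_targets(
    ["acme-support.example", "demo-bot.streamlit.app", "Team-Chat.hf.space"]
)
print(root, platform)
```

A crawl keyed on root domains would index only the first host in this example; the other two belong to the platform's domain, not to the company that built the app.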

How much open-source AI is actually deployed? Botpress leads at 1,558 domains

Open-Source AI in Production 2026: Botpress 1,558 vs OpenAI's 52,682

Across TechnologyChecker's May 2026 scan of 50M+ live domains, Botpress is the most-deployed open-source AI framework, detected on 1,558 domains. The closed OpenAI API leads the detectable AI web at 52,682 domains — roughly 34 times more — and Google's closed Dialogflow trails the open-source leader at 1,349. The figures cover browser-observable production signatures; backend AI libraries that run server-side are measured separately.

Source: TechnologyChecker detection database (50M+ domains) · May 2026 crawl

AI tool | Active domains (production)
OpenAI (closed API) | 52,682
Botpress (open source) | 1,558
Dialogflow (closed) | 1,349
  • Botpress is the most-deployed open-source AI framework we detect: 1,558 active domains
  • The closed OpenAI API leads the detectable AI web at 52,682 domains — about 34x Botpress
  • Google's closed Dialogflow (1,349) trails the open-source leader despite enterprise distribution

The clearest read on real open-source AI deployment is Botpress, the open-source conversational-AI platform. We detect it active on 1,558 domains, and 899 of those — a 57.7% match rate — resolve to a company in our firmographic data. That's a large enough sample to say something true about who actually ships open-source AI to the open web.

The chart puts that in context against the closed field. Botpress (1,558) runs neck-and-neck with Google's closed Dialogflow (1,349), while the OpenAI API towers over both at 52,682 — about 34 times Botpress. The visible production layer, in other words, is a contest between one open-source framework and a field of closed APIs, and the closed APIs are winning it on convenience: a hosted snippet is the fastest way to put AI on a website.
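The headline ratios are simple to reproduce from the counts quoted above. The snippet below only restates the published figures; the variable names are ours.

```python
# Active-domain counts from the May 2026 scan, as quoted in the text.
domains = {"OpenAI": 52_682, "Botpress": 1_558, "Dialogflow": 1_349}
matched_companies = 899  # Botpress domains resolved to a company record

openai_vs_botpress = domains["OpenAI"] / domains["Botpress"]
botpress_vs_dialogflow = domains["Botpress"] / domains["Dialogflow"]
match_rate = matched_companies / domains["Botpress"]

print(f"OpenAI / Botpress: {openai_vs_botpress:.1f}x")
print(f"Botpress / Dialogflow: {botpress_vs_dialogflow:.2f}x")
print(f"Company match rate: {match_rate:.1%}")
```

The first ratio comes out at about 33.8, which the post rounds to 34. The second is about 1.15, so Botpress sits roughly 15% ahead of Dialogflow.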

Who actually deploys open-source AI?

Open-Source AI Adoption by Company Size 2026: 64% Have 10 or Fewer Staff

Open-source AI adoption in production is overwhelmingly a small-team phenomenon. Of the 899 companies TechnologyChecker matched to a live Botpress deployment (May 2026), 64% have 10 or fewer employees and 81% have 50 or fewer. Only about 5% are enterprises of 1,000+ staff. Botpress is the lone open-source AI framework with a detectable footprint at scale; Hugging Face, LangChain and Ollama leave no web-observable signature.

Source: TechnologyChecker detection database (50M+ domains) · May 2026 crawl

Company size (employees) | Share of detected deployments
1–10 | 64%
11–50 | 17%
51–200 | 9%
201–500 | 3%
501–1,000 | 2%
1,000+ | 5%
  • 64% of companies running open-source AI in production have 10 or fewer employees
  • 81% have 50 or fewer staff: open-source AI deployment is an indie and small-agency phenomenon
  • Only ~5% are enterprises of 1,000+ employees, the inverse of the closed-API adoption curve

The company profile behind those 1,558 Botpress deployments is the opposite of the enterprise AI narrative. Nearly two-thirds (64%) of the matched companies have 10 or fewer employees, and 81% have 50 or fewer. Only about 5% are enterprises of 1,000-plus staff. This is a builders-and-agencies phenomenon: indie developers, boutique automation shops, and small software teams that self-host because they can, want control over their stack, and don't want to pay per-seat for a closed platform.
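The cumulative shares follow directly from the size bands in the table. This short calculation uses only the published percentages; the variable names are ours.

```python
from itertools import accumulate

# Share of matched Botpress companies per employee band (percent), from the table.
size_bands = [
    ("1-10", 64),
    ("11-50", 17),
    ("51-200", 9),
    ("201-500", 3),
    ("501-1,000", 2),
    ("1,000+", 5),
]

labels, shares = zip(*size_bands)
cumulative = list(accumulate(shares))  # running total, smallest band first

for label, running in zip(labels, cumulative):
    print(f"up to band {label}: {running}%")
```

The running total reaches 81% at the 11-50 band, so that figure describes companies with 50 or fewer employees, and the bands sum to 100%.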

Geography tells the same story. The United States leads at roughly 23% of detected deployments, but the next two, Spain at 6.9% and India at 6%, both over-index relative to their usual share of B2B software adoption. Those are exactly the markets where self-hosted, no-license-fee tooling wins on cost.

Open-Source AI Adoption by Country 2026: US Leads, Spain and India Over-Index

Production open-source AI adoption is globally distributed and cost-driven. Of companies TechnologyChecker matched to a live Botpress deployment (May 2026), about 23% are US-based, followed by Spain (6.9%) and India (6%). Both over-index relative to their usual share of B2B SaaS adoption, and both are markets where self-hosted, no-license-fee tooling wins on cost. The UK, France, Canada and Germany round out the top seven.

Source: TechnologyChecker detection database (50M+ domains) · May 2026 crawl

Country | Share of detected deployments
United States | 23%
Spain | 6.9%
India | 6%
United Kingdom | 4.5%
France | 3.8%
Canada | 3.1%
Germany | 3%
  • The US leads open-source AI deployment at ~23% of detected companies
  • Spain (6.9%) and India (6%) over-index versus their usual B2B SaaS share — self-hosting wins on cost
  • No single country dominates: the top 7 account for roughly half of all detected deployments

The industry mix closes the loop. IT services and consulting lead at 8.4%, followed by technology and internet (6.9%), higher education (4.7%), and software development (4.4%). These are the firms that build chatbots for other people and need a framework they can fork and host — plus the research and teaching labs that self-host on principle. The companies deploying visible open-source AI are, overwhelmingly, the ones who write software for a living.

Open-Source AI Adoption by Industry 2026: IT Consultancies Lead Deployment

The industries deploying open-source AI in production are the ones that build software for others. Of companies TechnologyChecker matched to a live Botpress deployment (May 2026), IT services and consulting leads at 8.4%, followed by technology and internet (6.9%), higher education (4.7%), and software development (4.4%). The pattern fits the firmographics: small technical shops and agencies that fork and self-host a framework for client chatbots, rather than enterprises buying a closed platform.

Source: TechnologyChecker detection database (50M+ domains) · May 2026 crawl

Industry | Share of detected deployments
IT Services & Consulting | 8.4%
Technology & Internet | 6.9%
Higher Education | 4.7%
Software Development | 4.4%
Marketing Services | 3.7%
Business Consulting | 2.9%
Real Estate | 2.6%
Financial Services | 2.6%
  • IT services and consulting lead open-source AI deployment at 8.4% of detected companies
  • Software-building industries lead: IT services, technology, higher education, and software development fill the top four
  • Higher education's 4.7% share reflects research and teaching labs self-hosting open frameworks

It fits Stanford's other 2026 finding neatly: organizational AI adoption is near-universal in surveys, but agentic, build-it-yourself deployment is still in the single digits. The agents are being built by small teams, one client at a time.

Are open models catching up to closed AI?

Open-Source AI Model Market 2025-2026: $19B Climbing to $23B

The open-source AI model market is projected to grow from $19.05 billion in 2025 to $23.08 billion in 2026, a roughly 21% year-over-year increase. The expansion is driven by enterprise demand for vendor-neutral solutions, regulatory pressure for transparency, and the rise of edge and on-device AI. The market value sits alongside Linux Foundation findings that 63% of organizations already use open-source AI and 89% use some open source in their AI stack — even though that adoption is nearly invisible to web-based technology detection.

Source: Market.us · 2025-2026

Year | Open-source AI model market (USD billions)
2025 | $19.05B
2026 (projection) | $23.08B
  • Open-source AI model market grows from $19.05B (2025) to $23.08B (2026), ~21% year over year
  • Growth is driven by demand for vendor-neutral models, transparency rules, and edge AI
  • 63% of organizations already use open-source AI, per Linux Foundation research

Three independent lenses point at the same picture, and that's what makes it credible. Survey data says adoption is broad: the Linux Foundation's research on the economic impacts of open-source AI found that 89% of organizations use some open source in their AI stack and 63% use open-source AI specifically. Usage data says it's narrower than the surveys imply: MIT Sloan reports that open models reach about 90% of closed-model performance at release, yet users still pick closed models from OpenAI, Anthropic, and Google for roughly 80% of actual model usage. And our web detection shows the visible production footprint concentrated in one open-source framework on 1,558 domains.

You can watch the usage side of that gap move in real time. Cloudflare's 1.1.1.1 resolver tracks which generative-AI services the world actually reaches for — and the leaderboard stays dominated by closed assistants, even as open-weight challengers like DeepSeek climb it.

Live: most-used generative AI services by 1.1.1.1 DNS traffic. Refreshed daily — a real-world proxy for which AI tools people actually use. Watch for open-weight players like DeepSeek rising against the closed assistants that lead the ranking.

The money is flowing into open source regardless. The open-source AI model market is projected to grow from $19.05 billion in 2025 to $23.08 billion in 2026, about 21% year over year, per market sizing reported by Yahoo Finance. The throughline across all three lenses is consistent: open-source AI is adopted broadly but lightly, used heavily by a technical minority, and deployed visibly by a focused set of teams — while the easiest path to "AI on your website" in 2026 is still pasting a closed-API snippet, and most of the web takes the easy path.

Where is open-source AI adoption heading?

Road-style timeline of open-source AI momentum from 2025 to 2026: Hugging Face passes 2 million models, AI agents in GitHub pull requests, MIT finds open models near parity, Linux Foundation expands agent standards, IBM and Red Hat commit $5B

The closed-API lead is real today, but the people building these systems expect it to narrow. Hugging Face CEO Clément Delangue calls the current pattern a transition, not the destination. "Companies are using API because they haven't built yet the capabilities, the trust, the ability to do AI themselves," he told Acquired, predicting that thousands of companies will eventually build and run their own models rather than rent a handful of closed ones.

The capital and the standards are lining up behind that thesis. In May 2026, IBM and Red Hat pledged $5 billion to open-source AI tooling and infrastructure, calling open source "the catalyst for innovation" that lets businesses avoid vendor lock-in, as reported by American Bazaar. The same month, the Linux Foundation's Agentic AI Foundation added 43 members as enterprises and governments standardized on open agent protocols. MIT Sloan's finding that open models already reach about 90% of closed-model performance removes much of the technical excuse to stay closed.

One leading indicator is already measurable on the open web: how fast sites are adopting the open standards those agents read — llms.txt, ai.txt, and agent manifests.

Live: AI-agent standards adoption. Real-time adoption of llms.txt, ai.txt, and similar agent-readable conventions across the top 200K domains — the open scaffolding the next wave of AI agents will read.
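For readers who want to spot-check a single domain, here is a rough sketch of how a check for agent-readable files might look. It is an assumption-laden illustration, not the method behind the tracker above: `candidate_urls` and `looks_like_llms_txt` are hypothetical helpers, and the heuristic relies on the llms.txt proposal's convention that the file opens with a Markdown H1. Fetching is left to whatever HTTP client you prefer.

```python
from urllib.parse import urlunsplit

# Well-known paths for agent-readable files named in the text.
AGENT_FILES = ("/llms.txt", "/ai.txt")


def candidate_urls(domain):
    """Build the HTTPS URLs to probe for one root domain."""
    return [urlunsplit(("https", domain, path, "", "")) for path in AGENT_FILES]


def looks_like_llms_txt(status, content_type, body):
    """Heuristic: a 200 non-HTML response whose first non-blank line is an H1.

    Rejecting HTML guards against soft-404 pages served with status 200.
    """
    if status != 200 or "html" in content_type.lower():
        return False
    first = next((line.strip() for line in body.splitlines() if line.strip()), "")
    return first.startswith("# ")


print(candidate_urls("example.com"))
print(looks_like_llms_txt(200, "text/plain", "# Example Project\n\n> Summary"))
```

Keeping the classifier a pure function of status, content type, and body makes it easy to test without touching the network.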

None of the self-hosted backend deployments behind that scaffolding show up in a web crawl yet. But it's the leading edge of the next adoption curve — and when those self-built, self-hosted systems start shipping to real users, our detection is where the shift will first become measurable.

What the deployment gap means for builders, sellers, and analysts

For three audiences, the project-versus-deployment gap has concrete consequences:

  • If you're building an open-source AI product: your real competition for deployment is the one-line closed-API snippet, not the other open-source frameworks. The teams who self-host are small, technical, and cost-sensitive — price, forkability, and good self-host docs matter more than enterprise features. The Botpress profile is your addressable market.
  • If you sell into AI builders: GitHub stars and Hugging Face downloads are vanity metrics for prospecting. Half of HF models see fewer than 200 downloads. Detectable signatures, plus firmographics, get you closer to companies that have actually shipped. For the macro picture, see our AI adoption trends analysis and 2026 technology forecasts.
  • If you're forecasting the market: don't read "5.6 million projects" as a deployment curve. Supply is exploding while the visible production footprint stays focused and concentrated. Those are two different curves, and conflating them is how analysts overstate near-term adoption every cycle.

Methodology and data notes

  • Source: TechnologyChecker's proprietary detection database — 50M+ domains, multi-signal fingerprinting (HTTP headers, JavaScript signatures, DNS, TLS, HTML patterns). Figures reflect the May 2026 crawl and count only active detections — a current, live signature on the domain.
  • Firmographics: company size, industry, and country come from joining detected domains to LinkedIn company records. The Botpress sample is 899 matched companies of 1,558 detected domains (57.7% match rate). United States totals combine casing variants in the source data.
  • Scope: web-crawl detection sees signatures present in a domain's public response on its root domain. Backend libraries and runtimes that never reach the browser (Hugging Face transformers, LangChain, LlamaIndex, Ollama) run server-side, and apps built with tools like Streamlit are usually hosted on platform subdomains rather than root domains, so we measure both separately rather than reporting web-deployment counts here. Read the deployment figures as comparative and directional: a read on the browser-facing production layer, not a census of all open-source AI.
  • Why we don't plot a monthly trend: our crawl is monthly with varying coverage, so per-month deltas reflect crawl batches, not real installs and removals. The cumulative active counts here are the reliable figures.
  • External figures: the GitHub project count (5.6M; 206,880 with ≥10 stars) and Hugging Face upload growth are from the Stanford HAI AI Index 2026, §1.5 "Open-Source AI Software" (Figures 1.5.1 and 1.5.6, reproduced above); Hugging Face hub statistics from Hugging Face's spring 2026 state-of-open-source report; the AI-agent figures from GitHub Octoverse 2025 and an independent 40.3M-PR analysis; adoption survey figures from Linux Foundation Research; the closed-usage share from MIT Sloan; the Clément Delangue quote from his Acquired interview; the IBM and Red Hat pledge as reported by American Bazaar (May 28, 2026); market sizing as reported by Yahoo Finance. The two Stanford charts are reproduced from the publicly available AI Index 2026 report for reference.
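The firmographic join described in the notes above can be sketched as a domain-keyed lookup. This is a simplified illustration, not our pipeline: `normalize_domain`, `match_companies`, the `website` and `employees` fields, and the sample records are all hypothetical.

```python
from urllib.parse import urlsplit


def normalize_domain(value):
    """Lower-case a URL or hostname; strip scheme, path, and a leading 'www.'."""
    value = value.strip().lower()
    host = urlsplit(value if "//" in value else "//" + value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def match_companies(detected_domains, company_records):
    """Join detected domains to company records; return matches and match rate."""
    index = {normalize_domain(rec["website"]): rec for rec in company_records}
    keys = {normalize_domain(d) for d in detected_domains}  # de-duplicate first
    matched = {key: index[key] for key in keys if key in index}
    rate = len(matched) / len(keys) if keys else 0.0
    return matched, rate


# Made-up sample data for illustration.
detected = ["shop.example", "www.agency.example", "lab.example"]
records = [
    {"website": "https://www.shop.example/", "employees": 8},
    {"website": "agency.example", "employees": 40},
]
matched, rate = match_companies(detected, records)
print(sorted(matched), f"{rate:.1%}")
```

In this toy example two of three domains resolve to a company. The same division applied to the real scan, 899 of 1,558, gives the 57.7% match rate reported above.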

Frequently asked questions

Is open-source AI actually being adopted in 2026? Yes: massively at the level of projects (Stanford counts 5.6 million on GitHub, though only 3.7% have 10 or more stars) and broadly in surveys (63% of organizations report using open-source AI, per the Linux Foundation). In visible web production, the most-deployed open-source AI framework we detect across 50M+ domains, Botpress, appears on 1,558 sites, versus 52,682 for the closed OpenAI API.

What's the most-deployed open-source AI framework on the web? Botpress, the open-source conversational-AI platform, leads the open-source frameworks we can detect at 1,558 active domains across our 50M-domain scan, matched to 899 companies.

Who actually deploys open-source AI? Small, technical teams. About 64% of companies running Botpress in production have 10 or fewer employees, and 81% have 50 or fewer. IT consultancies, software shops, and agencies, disproportionately in the US, Spain, and India, dominate the mix.

Why do closed APIs like OpenAI show up on so many more sites? Convenience. A hosted API ships a browser-facing snippet you can paste onto a page in minutes, while open-source frameworks have to be self-hosted by a technical team. MIT Sloan finds closed models still take roughly 80% of model usage even though open models reach about 90% of their performance — the gap is effort, not capability.

Are open models catching up to closed ones? On capability, largely yes: MIT Sloan reports open models reach about 90% of closed-model performance at release. Adoption is following more slowly, but the investment is moving — IBM and Red Hat pledged $5 billion to open-source AI in May 2026, and Hugging Face has passed 2 million public models.


Want to know which companies run a specific AI stack right now? TechnologyChecker detects 40,000+ technologies across 50M+ domains, with firmographics and real-time alerts. Start with our 2026 technology trends report or browse the most popular platforms by category.