David Thomson

CTO

David is the CTO of TechnologyChecker, responsible for the engineering and architecture behind the platform's crawling infrastructure. Before joining TechnologyChecker, he spent five years at Google on the Search team, where he worked on large-scale crawling and indexing systems that shaped his approach to building high-performance data infrastructure.

He oversees the detection systems that scan over 50 million domains monthly, ensuring accurate and timely identification of technology stacks across the web. His work focuses on scalable data pipelines, real-time processing, and maintaining detection accuracy across HTTP headers, JavaScript libraries, DNS records, and HTML patterns.

Based in Edinburgh, David is a devoted single malt whisky enthusiast when he's not architecting distributed systems.

Areas of Expertise

  • Scalable Data Pipelines
  • Real-Time Processing
  • Web Crawling Architecture
  • Distributed Systems

Credentials

  • MEng Computer Science, University of Edinburgh
  • AWS Solutions Architect Professional
  • Contributor, Open Source Crawling Frameworks

Achievements

  • Architected infrastructure processing 50M+ domain scans per month with 99.9% uptime
  • Reduced detection latency from hours to under 60 seconds for real-time alerts
  • Built a technology fingerprinting engine covering HTTP headers, JS libraries, DNS records, and HTML patterns
  • Analysed petabytes of Common Crawl data to power TechnologyChecker's historical technology adoption database
David Thomson

15+ years of experience

David’s Impact

Written from real expertise — every post adds first-hand know-how and original insights.

  • Articles: 10
  • Total reads: 1.3K
  • Hours of work: 240h
  • Topics covered: 10

Content effort estimates the hours David spent researching and creating each post. Every article is written in David’s field of expertise, so it carries first-hand know-how and unique insights — not commodity information rehashed from elsewhere.

Articles by David Thomson

AI Crawler Statistics in 2026: What AI Crawlers Actually Want?
Updated

Live Cloudflare Radar data: training is what AI crawlers hunt most, and its share of AI crawl jumped from 29% to 45% year over year. Shopping sites get crawled hardest. What AI bots want from your site, verified.

David Thomson

Bot Traffic Statistics 2026: How Much of the Web Is Bots?
Updated

Live Cloudflare Radar data: bots are 35% of web traffic and Anthropic now out-crawls OpenAI and Meta. Crawler block rates jumped from 10% in 2025 to 36% a year later. Verified bot statistics.

David Thomson

Web Traffic Statistics Q2 2026: We Analyzed Billions of Requests - Here Are the 15 Numbers That Matter (July 2026 Update)
Updated

Our Q2 2026 refresh of Cloudflare Radar web traffic data, with year-over-year comparisons to Q2 2025. 33.2% of all HTTP requests are now bots, up from 30.4% a year ago. ECDSA certificates crossed RSA (50.3% of new certs, nearly double last year's 26.5%), Googlebot's share of AI-bot traffic more than halved as ClaudeBot rose to #2, UDP now drives 82.6% of network-layer attacks, and compromised-credential logins fell to 57.4%. Across all traffic, desktop leads, but human-only traffic is still mobile-majority. Here are the numbers that matter.

David Thomson

DMARC adoption statistics 2026: 89% of emails pass DMARC but 14.5% still fail SPF (July 2026 Update)
Updated

DMARC adoption analysis from Cloudflare Radar, refreshed for the Q2 2026 close with full-quarter data and the first year-over-year layer. Adoption kept improving (DMARC 'NONE' fell to 5.70%), but inbound spoof attempts doubled to 24.3% and malicious mail tripled to 18.7% while bulk spam fell, as attackers shifted from volume to targeted impersonation. Encryption in transit slipped to 88.2% and deprecated TLS climbed to 51.1%. Full Q2 scorecard, the year's most-abused sending TLDs, and what a full year changed.

David Thomson

We analyzed HTTP protocol adoption in 2026: 27.8% of the web still runs HTTP/1.x (July 2026 Update)
Updated

At the Q2 2026 close HTTP/1.x holds 27.8%, HTTP/2 leads at 51.2%, and HTTP/3's plateau is now a full year old (20.5% to 21.0% year over year). Our Cloudflare Radar refresh adds the year-over-year picture: post-quantum crypto crossed the majority of requests (28.9% to 54.2%) while barely reaching origins (9.6%), the Russia anomaly reverted exactly as predicted, and the Netherlands overtook Singapore as the HTTP/1.x leader.

David Thomson

We Analyzed 8.7B SSL Certificates: ECDSA Overtook RSA in May 2026 (July 2026 Update)
Updated

We process ~80% of global CT logs. ECDSA overtook RSA in May 2026 and held the majority across the full Q2 (50.3%, up from 26.5% a year earlier). The Q2 2026 close with year-over-year shifts: the 200-day certificate cliff (10.8% to 0.10%), Amazon's CA surge, IP-address certificates, and what certificates reveal about backend tech stacks.

David Thomson