
Methodology

How we measure the Dead — AI-generated content and bot activity across the web.


Overview

The Dead Internet Monitor tracks two distinct phenomena: the creation of AI-generated content (“AI Slop”) and the consumption of content by automated accounts (“AI Slurp”). We sample content from major platforms, classify it using large language models, and analyse author behaviour for bot-like patterns.

Our goal is not perfect accuracy — which remains elusive even for specialised detectors — but consistent, transparent measurement of trends over time.

+-----------------------+
|      COLLECTION       |  7 sources, staggered
+-----------+-----------+
            |
      +-----+-----+
      v           v
+----------+ +----------+
| CLASSIFY | |BOT DETECT|  parallel
|  (LLM)   | |(7-signal)|
+----+-----+ +----+-----+
     |            |
     +------+-----+
            v
+-----------------------+
|      AGGREGATION      |  DII + Autopsy Matrix
+-----------+-----------+
            v
+-----------------------+
|       DASHBOARD       |  deadinternetmonitor.com
+-----------------------+
The classification pipeline.

Data Collection

Content is collected from seven sources on staggered schedules. The headline Dead Internet Index weights each source equally regardless of volume, so no single platform dominates the aggregate.

Source          Type        Schedule  ~Items/run
Reddit          Social      Daily     ~1,500
HackerNews      Tech forum  Daily     ~1,200
YouTube         Comments    3×/week   ~1,200
Mastodon        Fediverse   Daily     ~200
Bluesky         Social      3×/week   ~200
Stack Overflow  Q&A         Weekly    ~350
Lobsters        Tech forum  Weekly    ~1,000

Reddit uses public RSS feeds (comments only, no API credentials required). Mastodon and Lobsters serve as control groups — decentralised or invite-only platforms with lower bot incentive.


Classification

Each item is classified by a large language model using a structured prompt (v3.0) that applies Bayesian calibration with platform-specific base rates derived from Ahrefs and Originality.ai research. This counters the documented tendency of LLM classifiers to default to “human” (RAID 2024 found false-negative rates of 10–15%).
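The calibration step amounts to a standard Bayes update over a platform-specific base rate. A minimal sketch follows; the base rate, true-positive rate, and false-positive rate shown are illustrative placeholders, not the values used in the v3.0 prompt.

```python
def bayes_ai_posterior(base_rate: float, tpr: float, fpr: float) -> float:
    """Posterior P(AI | indicator present).

    base_rate -- prior P(AI) for the platform (from base-rate research)
    tpr       -- P(indicator | AI content)
    fpr       -- P(indicator | human content)
    """
    p_ai = base_rate * tpr
    p_human = (1.0 - base_rate) * fpr
    return p_ai / (p_ai + p_human)

# Hypothetical platform: ~20% of content is AI; an indicator fires on
# 70% of AI text but also on 10% of human text.
posterior = bayes_ai_posterior(base_rate=0.20, tpr=0.70, fpr=0.10)
```

Even a strong indicator yields a moderate posterior when the base rate is low, which is why platform-specific priors matter more than raw indicator counts.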

Models

Role      Model                  Provider
Primary   Gemini 2.5 Flash Lite  Google
Fallback  Claude Haiku 4.5       Anthropic

Fallback triggers when primary confidence is below 0.5.
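The cascade can be sketched as follows. The model callables are stand-ins for the real Gemini and Claude clients, and the result dictionary shape is an assumption for illustration.

```python
def classify_with_fallback(item, primary, fallback, threshold: float = 0.5):
    """Run the primary model; invoke the fallback only when primary
    confidence is below the threshold. If both are under threshold,
    the item is handed on as 'uncertain'."""
    result = primary(item)
    if result["confidence"] >= threshold:
        return result
    result = fallback(item)
    if result["confidence"] >= threshold:
        return result
    return {"label": "uncertain", "confidence": result["confidence"]}

# Stub models standing in for the Gemini / Claude clients:
primary_stub = lambda item: {"label": "ai_generated", "confidence": 0.3}
fallback_stub = lambda item: {"label": "human_created", "confidence": 0.8}
out = classify_with_fallback("some comment text", primary_stub, fallback_stub)
```

Here the primary verdict is discarded (0.3 < 0.5) and the fallback's confident answer is returned instead.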

AI Indicators

The classifier looks for research-validated signals of AI generation:

Human Indicators

Output

Each classification returns: a label (ai_generated, human_created, or uncertain), a confidence score (0.0–1.0), specific indicators observed, and a brief reasoning explanation.

Post-Processing

After the LLM returns its classification, a post-processing step corrects for the documented human-default bias. Items classified as “human” but carrying multiple AI indicators are reclassified as uncertain or AI-generated. Short content (<100 characters) is capped at 0.60 confidence.

+--------------+
| Content Item |
+------+-------+
       v
+--------------+   confidence
|Primary Model |---- >= 0.5 --> RESULT
| Gemini Flash |
+------+-------+
       | < 0.5
       v
+--------------+
|Fallback Model|---- >= 0.5 --> RESULT
| Claude Haiku |
+------+-------+
       | < 0.5
       v
  "uncertain"
       |
       v
+--------------+
|Post-Process  |  correct false negatives
| Recalibrate  |  using signal evidence
+------+-------+
       v
   FINAL LABEL
ai / human / uncertain
Classification and post-processing flow.
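The recalibration step at the bottom of the flow can be sketched as below. The two-indicator cutoff and the rule for choosing between uncertain and AI-generated are assumptions about how the downgrade is decided; the 100-character and 0.60 thresholds come directly from the text above.

```python
def recalibrate(label: str, confidence: float, ai_indicators: list[str],
                text: str) -> tuple[str, float]:
    """Post-processing sketch: correct the human-default bias and cap
    confidence on short content."""
    # "Human" verdicts carrying multiple AI indicators are downgraded.
    # (2+ indicators, and the uncertain/AI split by confidence, are
    # assumptions for illustration.)
    if label == "human_created" and len(ai_indicators) >= 2:
        label = "uncertain" if confidence >= 0.7 else "ai_generated"
    # Short content (<100 characters) is capped at 0.60 confidence.
    if len(text) < 100:
        confidence = min(confidence, 0.60)
    return label, confidence
```

The cap reflects a simple reality: a two-line comment rarely carries enough signal to justify high confidence in either direction.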

Bot Detection

Separate from content classification, we analyse author behaviour using a 7-signal weighted scoring system grounded in peer-reviewed research. Authors with 2+ collected items receive a bot score.

Signal                                             Weight  Research
Posting frequency — posts per hour                 0.20    Gilani et al. 2017
AI content ratio — % of posts classified as AI     0.20    Novel signal
Content diversity — topic/subreddit entropy        0.15    Oentaryo et al. 2016
Timing entropy — Shannon entropy of posting hours  0.15    Chu et al. 2012
Response latency — median seconds between posts    0.10    Ferrara et al. 2016
Karma velocity — karma gained per day              0.10    Multiple studies
Account age ratio — age vs activity volume         0.10    Cresci et al. 2015

Scores above 0.7 are flagged as likely bots, scores between 0.4 and 0.7 as suspicious, and scores below 0.4 as likely human.
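A minimal sketch of the weighted score, assuming each signal has already been normalised to [0, 1] upstream (1 = most bot-like); the weights match the table above.

```python
# Weights for the 7-signal bot score (sum to 1.0).
WEIGHTS = {
    "posting_frequency": 0.20,
    "ai_content_ratio":  0.20,
    "content_diversity": 0.15,
    "timing_entropy":    0.15,
    "response_latency":  0.10,
    "karma_velocity":    0.10,
    "account_age_ratio": 0.10,
}

def bot_score(signals: dict[str, float]) -> float:
    """Weighted sum over normalised signals; missing signals count as 0."""
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)

def verdict(score: float) -> str:
    if score > 0.7:
        return "likely_bot"
    if score >= 0.4:
        return "suspicious"
    return "likely_human"
```

Treating missing signals as zero is a conservative choice: an author with sparse data drifts toward "likely human" rather than being flagged on incomplete evidence.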


The Autopsy

What type of Dead is the Internet?

The Autopsy Matrix crosses content origin with audience type. The content axis comes from our AI classification. The audience axis uses the global bot consumption rate — the proportion of web traffic that is automated.

               Human Audience  AI Slurp
Human Content  Alive           Zombified
AI Slop        Polluted        Dead

AI Slop = AI-generated content · AI Slurp = bot consumption of content

The bot consumption rate (45%) is a cross-industry blend of four major bot traffic reports: Imperva 2025 (51%), Akamai 2025 (51%), Cloudflare Radar 2025 (~30%), and Barracuda 2023 (50%). This rate is applied uniformly across all sources because per-platform consumption data is not publicly available.
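Because the 45% rate is applied uniformly, the quadrant shares reduce to a simple product, assuming the content and audience axes are independent (a necessary simplification given the lack of per-platform consumption data).

```python
def autopsy_matrix(ai_content_rate: float,
                   bot_consumption_rate: float = 0.45) -> dict[str, float]:
    """Share of each Autopsy quadrant, assuming content origin and
    audience type are independent. Shares sum to 1.0."""
    human_content = 1.0 - ai_content_rate
    human_audience = 1.0 - bot_consumption_rate
    return {
        "alive":     human_content * human_audience,         # human x human
        "zombified": human_content * bot_consumption_rate,   # human content, bot readers
        "polluted":  ai_content_rate * human_audience,       # AI content, human readers
        "dead":      ai_content_rate * bot_consumption_rate, # slop x slurp
    }

# Hypothetical source where 30% of content classifies as AI:
quadrants = autopsy_matrix(0.30)
```

With a 30% AI content rate, the "dead" quadrant (AI slop consumed by AI slurp) is 0.30 × 0.45 = 13.5% of all content-audience pairings.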


Dead Internet Index

The DII is a composite score (0–100) measuring how “dead” the internet is. Each source's DII is computed independently, then the headline DII is the unweighted mean across all sources — giving every platform equal say regardless of collection volume. Each per-source DII combines four weighted components:

Component                                                              Weight
AI content % — classified as AI-generated                              0.40
Bot engagement % — engagement from bot-flagged authors                 0.25
Slop×Slurp % — AI content from bot authors (the “dead” quadrant)       0.20
Low-confidence human % — “human” classifications below 0.7 confidence  0.15

When consumption data is available (Cloudflare Radar, robots.txt monitoring), a fifth component (0.20 weight) is added and the other weights adjust downward.


Limitations


Transparency

Every classification record stores the model provider, model name, prompt version, token counts, estimated cost, and latency. This metadata enables full auditability and comparison across models over time.


Revision History

The monitor evolves as we add sources, refine methodology, and respond to platform changes. Each revision is backfilled — we re-aggregate all historical data under the current methodology so the full timeline is consistent.

Rev 02 (2026-03-25)
  Reddit live (15 subreddits via RSS, ~1,500 items/run). Autopsy matrix now uses bot consumption rate for audience axis instead of author bot scores. Rate corrected from 51% (Imperva-only) to 45% (cross-industry blend of Imperva, Akamai, Cloudflare Radar, and Barracuda). Source-weighted DII — each source contributes equally regardless of volume. YouTube capped at 30 comments/video. All historical data re-aggregated under Rev 02 methodology.

Rev 01 (2026-03-04)
  Initial release. 6 active sources (Reddit paused — no API credentials). Classification via Gemini 2.5 Flash Lite with Claude Haiku fallback. 7-signal bot detection. Volume-weighted DII. Author-based autopsy matrix.

The trend matters more than any single number. We are watching the watchers.