Add Google Analytics tracking and auto-publish stats to Wikipedia #18

@xinbenlv

Original Request

Add Google Analytics to the page, and add auto-reporting to Wikipedia Page "WP:WikiLoop_DoubleCheck/stats" (fix link for me), which depends on /fr proper cookie consent bar setup and /fr Google Analytics ID and API authentication for reporting

Agent's Two Cents (could be wrong)

Everything below is the AI agent's best guess based on the current codebase.
Take with a grain of salt — the original request above is the only thing that came from a human.

Wikipedia page link fix: The correct link is likely Wikipedia:WikiLoop DoubleCheck/Statistics, since the WP: shortcut is an alias for the Wikipedia: namespace. The exact subpage name (/stats in the request versus /Statistics here) should be confirmed by the requester.

Problem / Motivation

WikiLoop DoubleCheck has zero usage analytics. There's no visibility into how many people use the tool, which features are popular, or where users drop off. Additionally, the Wikipedia community has no easy way to see the project's impact. Publishing stats to a Wikipedia project page would increase transparency and help with community engagement and Wikimedia grant reporting.

Proposed Solution

Two parts:

  1. Client-side GA4 tracking on the web app, gated behind cookie consent
  2. Server-side cron job that queries GA Data API + MongoDB stats and publishes a formatted summary to a Wikipedia project page via the MediaWiki API

Architecture Diagram

┌─────────────────────────────────────────────────────┐
│                    Web Browser                       │
│                                                      │
│  ┌──────────┐    ┌──────────────┐                   │
│  │ Consent  │───▶│ GA4 gtag.js  │──▶ Google         │
│  │ (opt-in) │    │ (G-XXXXXX)   │   Analytics       │
│  └──────────┘    └──────────────┘                   │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│                 Server (Cron Job)                     │
│                                                      │
│  ┌──────────────┐    ┌────────────────┐             │
│  │ GA Data API  │    │   MongoDB      │             │
│  │ (pageviews,  │    │ (judgements,   │             │
│  │  sessions)   │    │  users, revs)  │             │
│  └──────┬───────┘    └───────┬────────┘             │
│         │                    │                       │
│         └────────┬───────────┘                       │
│                  ▼                                    │
│  ┌──────────────────────────────┐                   │
│  │  Format Wikitext Template    │                   │
│  │  (stats table, charts, etc.) │                   │
│  └──────────────┬───────────────┘                   │
│                 │                                    │
│                 ▼                                    │
│  ┌──────────────────────────────┐                   │
│  │  MediaWiki API               │                   │
│  │  action=edit                 │                   │
│  │  Wikipedia:WikiLoop_         │                   │
│  │  DoubleCheck/Statistics      │                   │
│  └──────────────────────────────┘                   │
└─────────────────────────────────────────────────────┘

Dependencies & Potential Blockers

  • Depends on Add GDPR-compliant cookie consent bar #16 — Cookie consent bar must be implemented first (GA cannot load without consent)
  • Depends on Set up Google Analytics ID and API credentials for stats reporting #17 — GA4 Measurement ID and service account credentials must be provisioned
  • Bot account or OAuth token — Writing to Wikipedia requires either a registered bot account or the server using an authorized user's OAuth token. A bot account is preferred for automated edits (see Wikipedia:Bots)
  • Wikipedia bot approval — Automated edits to Wikipedia may require bot approval if running frequently. Low-frequency edits (daily/weekly) from a logged-in user account may be acceptable without formal bot status
  • The @google-analytics/data npm package is needed for server-side GA API queries
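
As a rough illustration of the server-side GA query, the sketch below only builds the request object that would be handed to the `runReport` method of the `@google-analytics/data` client. The function name, the `lagDays` and `windowDays` parameters, and the choice of metrics are assumptions for illustration, not something taken from the codebase. Note that the Data API addresses a numeric GA4 property ID, which is different from the `G-XXXXXX` Measurement ID used by the browser snippet.

```typescript
// Hypothetical request shape: only the fields this sketch uses.
interface RunReportRequestSketch {
  property: string;
  dateRanges: { startDate: string; endDate: string }[];
  metrics: { name: string }[];
}

// Build a report request for a fixed-length window that ends `lagDays` ago,
// leaving GA time to finish processing the most recent days.
function buildUsageReportRequest(
  propertyId: string, // numeric GA4 property ID (assumed to come from config)
  lagDays = 2,
  windowDays = 30,
): RunReportRequestSketch {
  return {
    property: `properties/${propertyId}`,
    dateRanges: [
      {
        // Date ranges are inclusive, hence the -1.
        startDate: `${lagDays + windowDays - 1}daysAgo`,
        endDate: `${lagDays}daysAgo`,
      },
    ],
    metrics: [
      { name: "screenPageViews" },
      { name: "sessions" },
      { name: "totalUsers" },
    ],
  };
}
```

Keeping the request builder separate from the API call makes the date window easy to unit-test without service account credentials.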

How to Validate

  • GA4 tracking snippet loads ONLY after cookie consent opt-in
  • Page views, button clicks (Review, Revert, Skip) tracked as GA4 events
  • Server cron job runs on schedule (daily or weekly) without errors
  • Wikipedia stats page is updated with formatted wikitext containing:
    • Total judgements (all-time and this month)
    • Active reviewers count
    • Top reviewed wikis
    • Page views / sessions from GA (if available)
  • Wikipedia edit is flagged as a bot edit and its edit summary includes attribution
  • Stats page renders correctly on Wikipedia with proper wikitext formatting
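
One way the formatted wikitext in the checklist above could be produced is sketched below. The `ProjectStats` shape, its field names, and the caption text are hypothetical; the real values would come from the MongoDB and GA queries. The GA rows are emitted only when the numbers are available, matching the "if available" item.

```typescript
// Hypothetical stats shape; field names are assumptions for illustration.
interface ProjectStats {
  totalJudgements: number;
  judgementsThisMonth: number;
  activeReviewers: number;
  topWikis: { wiki: string; judgements: number }[];
  pageViews?: number; // from GA, optional
  sessions?: number; // from GA, optional
}

// Render the stats as a standard wikitext table.
function formatStatsWikitext(stats: ProjectStats, asOf: string): string {
  const rows: [string, string][] = [
    ["Total judgements (all-time)", String(stats.totalJudgements)],
    ["Judgements this month", String(stats.judgementsThisMonth)],
    ["Active reviewers", String(stats.activeReviewers)],
    [
      "Top reviewed wikis",
      stats.topWikis.map((w) => `${w.wiki} (${w.judgements})`).join(", "),
    ],
  ];
  if (stats.pageViews !== undefined) {
    rows.push(["Page views (GA)", String(stats.pageViews)]);
  }
  if (stats.sessions !== undefined) {
    rows.push(["Sessions (GA)", String(stats.sessions)]);
  }

  const lines = [
    '{| class="wikitable"',
    `|+ WikiLoop DoubleCheck statistics as of ${asOf}`,
    "|-",
    "! Metric !! Value",
  ];
  for (const [label, value] of rows) {
    lines.push("|-", `| ${label} || ${value}`);
  }
  lines.push("|}");
  return lines.join("\n");
}
```

Values are written as plain digits so the output does not depend on the server's locale; a label or value containing a pipe character would need escaping before being placed in a cell.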

Scope Estimate

large

Key Files/Modules Likely Involved

  • packages/web/index.html or packages/web/src/App.vue — GA snippet injection (conditional on consent)
  • packages/web/src/composables/useConsent.ts — consent check before loading GA (from Add GDPR-compliant cookie consent bar #16)
  • packages/server/src/cron/ — new stats reporting cron job
  • packages/server/src/lib/ — new GA Data API client and Wikipedia stats publisher
  • packages/server/src/db/models/ — aggregate queries for judgement/user stats

Rough Implementation Sketch

Part 1: GA4 Client Tracking

  • Add gtag.js loading in web app, gated behind useConsent() composable
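  • Track key events: page_view, review_judgement, revert_action, login, sign_up (login and sign_up are GA4 recommended event names; review_judgement and revert_action would be custom events)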
  • GA Measurement ID from GOOGLE_ANALYTICS_ID env var (or hardcoded since it's public)
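
A minimal sketch of the consent gating for Part 1 follows. How `useConsent()` exposes its state is not known yet, so the sketch takes a plain boolean; `createGa4Loader`, `appendScript`, and the injected `dataLayer` array are hypothetical names chosen so the logic can run outside a browser. Nothing is queued or requested from Google until consent is granted, and the loader runs at most once.

```typescript
// Callback that injects a <script> tag for the given URL (browser-specific).
type AppendScript = (src: string) => void;

// Returns a handler to call whenever the consent state changes.
// The handler returns true only on the call that actually loads GA4.
function createGa4Loader(
  measurementId: string,
  appendScript: AppendScript,
  dataLayer: unknown[],
): (consented: boolean) => boolean {
  let loaded = false;
  return function onConsentChange(consented: boolean): boolean {
    if (!consented || loaded) return false;
    loaded = true;

    // Mirrors the standard gtag snippet, which pushes the raw `arguments`
    // object (not an array) onto the data layer.
    function gtag(..._args: unknown[]): void {
      dataLayer.push(arguments);
    }
    gtag("js", new Date());
    gtag("config", measurementId);

    appendScript(
      `https://www.googletagmanager.com/gtag/js?id=${encodeURIComponent(measurementId)}`,
    );
    return true;
  };
}
```

In the web app, `appendScript` would create an async script element and append it to `document.head`, and `dataLayer` would be `window.dataLayer` (initialized to an empty array if missing). The returned handler could then be called from a watcher on the consent state.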

Part 2: Wikipedia Stats Auto-Publisher

  • New cron job in packages/server/src/cron/statsPublisher.ts
  • Query MongoDB for: total judgements, monthly active reviewers, top wikis, total reverts
  • Query GA Data API for: page views, sessions, user count (last 30 days)
  • Format into wikitext table/template
  • Publish to Wikipedia via MediaWiki API action=edit using bot/OAuth credentials
  • Run weekly (e.g., every Sunday at 00:00 UTC)
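
The publish step of Part 2 could look roughly like the sketch below, which only assembles the form fields for a MediaWiki `action=edit` request and the cron expression for the weekly schedule. The function and constant names and the summary text are assumptions. The CSRF token is assumed to have been fetched beforehand with `action=query&meta=tokens` from an authenticated session.

```typescript
// Standard five-field cron expression: minute 0, hour 0, any day of month,
// any month, day-of-week 0 (Sunday). The scheduler must run it in UTC.
const WEEKLY_SUNDAY_MIDNIGHT = "0 0 * * 0";

// Assemble the POST fields for a MediaWiki action=edit request.
function buildEditParams(
  title: string,
  wikitext: string,
  csrfToken: string,
): Record<string, string> {
  return {
    action: "edit",
    format: "json",
    title,
    text: wikitext,
    summary: "Automated weekly statistics update", // placeholder wording
    bot: "1", // mark as a bot edit; needs an account with the bot right
    assert: "user", // fail instead of editing anonymously if the login expired
    token: csrfToken, // kept last, after the potentially large text field
  };
}
```

The fields would be sent as an `application/x-www-form-urlencoded` POST body to the wiki's `api.php` endpoint. Whether `assert` should be `user` or `bot` depends on the outcome of the bot account question below.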

Open Questions

  • Wikipedia page name: Is Wikipedia:WikiLoop_DoubleCheck/Statistics correct, or should it be Wikipedia:WikiLoop DoubleCheck/stats or another subpage?
  • Bot account: Should we register a dedicated bot account (e.g., User:WikiLoopBot) or use an existing user's OAuth token for the edits?
  • Stats granularity: Daily snapshots? Monthly summaries? Rolling 30-day window?
  • GA alternatives: Would a privacy-friendly alternative like Plausible or Umami be preferred? Both are designed to work without cookies, so they typically don't need a cookie consent banner in most jurisdictions
  • Wikitext template: Should we create a reusable Wikipedia template, or just raw wikitext tables?

Potential Risks or Gotchas

  • Current privacy policy explicitly says "No tracking data, analytics cookies" — must be updated alongside this change
  • Wikipedia bot edits without approval can get the account blocked
  • GA Data API results lag behind: GA4 can take roughly 24 to 48 hours to finish processing data, so stats won't be real-time
  • If the Wikipedia stats page doesn't exist yet, the first edit creates it — ensure proper categorization
  • The server needs MediaWiki API write credentials separate from user OAuth (bot password or dedicated OAuth consumer)

Related Issues

Labels: NeedsNewDependencies (implementation requires adding new packages or external dependencies), enhancement (new feature or request), p2 (medium priority)