Impact Dashboard Pipeline Spec
Impact Dashboard Pipeline Spec
Scope
Defines how /impact/ is rendered from generated data and which scripts/workflows own data refresh.
Page Integration
- Page file:
_pages/impact.md - Include shell:
_includes/impact-dashboard.html - Client logic:
assets/js/impact-dashboard.js - Client styles:
assets/css/impact-dashboard.css
The page is include-driven (no inline chart logic in impact.md).
Data Inputs
Impact dashboard builder reads:
_publications/*.md_data/scholar_metrics.json_data/map_data.jsondata/altmetric/raw/*.csv
Impact reach builder reads:
data/impact/exports/news_mentions_clean.json(preferred) or.csv- Tranco top-domain list + historical date snapshots (network fetch with local cache fallback)
Generated Outputs
Builder script:
scripts/build-impact-dashboard-data.py
Reach script:
scripts/build-impact-reach-data.py
Default output directory:
data/impact/
Generated files:
impact_dashboard.jsonimpact_reconciliation.jsonexports/*.jsonexports/*.csvreach/outlet_reach.jsonreach/outlet_reach.csvreach/reach_metadata.jsonreach/time_adjusted_mentions_reach.jsonreach/time_adjusted_mentions_reach.csvreach/time_adjusted_outlet_reach.jsonreach/time_adjusted_outlet_reach.csvreach/tranco_snapshots_used.jsonreach/tranco_snapshots_used.csv
Historical Tranco lookup guardrails (default behavior):
- Uses cache first (
reach/.cache/tranco_cache_index.jsonand cached zip files). - Applies request pacing:
--historical-date-api-delay-ms(default250). - Applies per-run lookup cap:
--historical-max-date-lookups(default250,0disables cap). - Uses bounded nearest-date probing:
--historical-window-days(default10).
Dataset catalog links in impact_dashboard.json are absolute site paths under /data/impact/exports/.
Altmetric parse normalization (in scripts/build-impact-dashboard-data.py):
- CSV decode order is
utf-8-sig, thencp1252, thenlatin-1, with UTF-8 replacement fallback only if all fail. Mention Dateis parsed directly when present; forX Post, missing dates are inferred fromExternal Mention ID(Twitter/X snowflake timestamp).mention_urlis emitted as the best direct link (rawMention URLor inferredhttps://x.com/i/web/status/<id>for X).- Exports retain provenance fields for downstream QA/use:
mention_url_rawinferred_mention_urldetails_page_url(Altmetric details page)mention_link(best available display link)mention_link_source(mention_url,inferred_x_status,altmetric_details_url)
Build Hooks
Local preview hook:
scripts/local_preview.command- Default path skips impact data regeneration for faster UI/content iteration.
- Data-refresh path: run
./scripts/local_preview.command --with-datato executepython3 scripts/build-impact-dashboard-data.pybeforejekyll build. - Uses committed reach datasets (no Tranco API calls during preview).
Deploy hook:
.github/workflows/deploy_site.yml- Runs publication metadata validation before data generation/build:
node scripts/qa/validate-publication-tags.mjsnode scripts/qa/validate-publication-method-tags.mjs
- Runs
python3 scripts/build-impact-dashboard-data.pybefore Jekyll build. - Verifies committed reach datasets exist before Jekyll build.
Reach refresh hook:
.github/workflows/refresh_impact_reach_data.yml- Scheduled weekly (
cron: 17 6 * * 1) and manual (workflow_dispatch). - Runs
python3 scripts/build-impact-reach-data.pywith conservative historical lookup defaults. - Commits and pushes updated
data/impact/reach/*outputs only when values changed.
This keeps deploy/build deterministic while still refreshing API-backed reach data on a controlled cadence.
Runtime Dependencies
Loaded by include:
- Chart.js (CDN)
- topojson-client (CDN)
- chartjs-chart-geo (CDN)
Client data-fetch behavior:
assets/js/impact-dashboard.jsloads dashboard JSON with default browser fetch caching (no explicitcache: "no-store").assets/js/impact-dashboard.jsalso loads reach datasets for the outlet impact ranking:/data/impact/reach/time_adjusted_mentions_reach.json/data/impact/reach/reach_metadata.json
_includes/impact-dashboard.htmlappends?v=20260220204831to JSON/CSS/JS URLs to force fresh assets after each build/deploy.
World topojson fetch order in client:
https://cdn.jsdelivr.net/npm/world-atlas@2/countries-110m.jsonhttps://unpkg.com/world-atlas@2/countries-110m.json
Notes
Mention countriesuses a log-scaled choropleth ramp to reduce US dominance.Attention mixincludes a local scope toggle (All channelsvsExclude social) that only re-renders the donut + sparkline strip.Citing authors (WOS)remains unchanged as a bubble map.Outlet impact rankinguses a horizontal bar chart:- Rank: top normalized outlets by impact score (
mentions * mean publish-time reach / 100) - Excludes syndicated/proxy domains from the ranked score
- Color: reach level (mean publish-time reach score)
- Tooltip: domains observed, mentions, reach score, publication spread, rank coverage
- Rank: top normalized outlets by impact score (
Estimated Impressionsside panel uses a model-based proxy per outlet:- Visits model:
visits = 500M * rank^-0.62(rank from time-adjusted Tranco snapshot) - Per-mention article-share bounds:
0.02%(low),0.05%(mid),0.10%(high) - Displays low/high ranges as horizontal intervals with a mid-point marker
- Uses a logarithmic x-axis so lower-impression outlets remain readable
- Uses the same top-outlet set and order as outlet impact ranking
- Uses same syndication/proxy exclusions as outlet impact ranking
- Visits model:
- Reconciliation and export tables are rendered client-side from generated JSON.