Audit table — the human-readable summary of corpus statistics (Files / Sentences / Word tokens / ccVersions / Rule sentences / etc.). Each row references the canonical YAML key its value comes from.
This is the only producer stage that appears in the published Quarto site (it’s the most reader-friendly view); stages 00–04 run in the kernel but are hidden from the navbar.
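The row-to-key pairing described above can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation: the `stats` dictionary, the dotted key names, and the helpers `lookup` and `build_audit_table` are all hypothetical stand-ins for the real YAML and its keys.

```python
from functools import reduce

# Hypothetical corpus statistics, standing in for the loaded YAML.
stats = {"corpus": {"files": 3, "sentences": 120, "word_tokens": 2400}}

# Each audit row pairs a display label with the dotted YAML key it reads from.
AUDIT_ROWS = [
    ("Files", "corpus.files"),
    ("Sentences", "corpus.sentences"),
    ("Word tokens", "corpus.word_tokens"),
]


def lookup(mapping, dotted_key):
    """Resolve a dotted key such as 'corpus.files' against nested dicts."""
    return reduce(lambda node, part: node[part], dotted_key.split("."), mapping)


def build_audit_table(mapping, rows):
    """Return (label, key, value) triples, one per audit row."""
    return [(label, key, lookup(mapping, key)) for label, key in rows]


for label, key, value in build_audit_table(stats, AUDIT_ROWS):
    print(f"{label:<12} {value:>6}  <- {key}")
```

Keeping the key next to the label means a reader can trace any displayed value back to its source, and a missing key fails loudly with a `KeyError` instead of silently showing a stale number.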
12. Canonical HEADLINE sheet
Re-uses prompt_analysis.headline_numbers(), the same function the consumer notebooks call. The call passes alt_df (for the composite-directiveness range and the per-version mood_marker_pct extremes) and the per-sentence parquet (for parquet-level threat / causal / rule counts), so that the producer’s audit covers the full HEADLINE contract.
"""Compute and display the canonical HEADLINE dict.Prints as YAML so the values are visible in this notebook (saving a copy in thecell output) without requiring another tool."""import os, sys, pathlib, importlibsys.path.insert(0, ".")import pandas as pdimport yaml as _yaml_here = pathlib.Path.cwd().resolve()PROJECT_ROOT =next( (p for p in [_here, *_here.parents] if (p /"prompt_pipeline.py").is_file()),None,)if PROJECT_ROOT isNone:raiseRuntimeError(f"Could not find prompt_pipeline.py walking up from {_here}. ""Run from inside the claude-prompts-analysis repo." )if pathlib.Path.cwd() != PROJECT_ROOT: os.chdir(PROJECT_ROOT)import prompt_analysisimportlib.reload(prompt_analysis) # pick up edits without restarting the kernelfrom prompt_analysis import ( load_yaml, build_alt_df, headline_numbers, qualitative_phrases, bind_inline_vars, use_deterministic_ids,)# Replace random Altair / Styler IDs with a deterministic counter so re-runs# produce byte-identical .ipynb outputs (no UUID churn in `git diff`).use_deterministic_ids()data = load_yaml()alt_df = build_alt_df(data)parquet = pd.read_parquet("sentences_classified.parquet")HEADLINE = headline_numbers(data, alt_df=alt_df, parquet=parquet)PHRASES = qualitative_phrases(HEADLINE, alt_df=alt_df, parquet=parquet)# Make every formatted figure available as a plain-name variable for inline {python} expressions in the audit-table cell below.globals().update(bind_inline_vars(HEADLINE, PHRASES))print(_yaml.safe_dump(HEADLINE, sort_keys=False, default_flow_style=False))
Live corpus statistics — these are the canonical values for every prose mention across the notebooks; any number that disagrees gets reconciled to these. Every figure below is computed live from the YAML.
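One way to picture the reconciliation rule is a small check that compares numbers quoted in prose against the canonical values. This is only a sketch: `find_mismatches` is a hypothetical helper, and the keys and values are made-up placeholders, not actual HEADLINE entries.

```python
def find_mismatches(canonical, quoted):
    """Return {key: (quoted_value, canonical_value)} for every disagreement."""
    return {
        key: (value, canonical.get(key))
        for key, value in quoted.items()
        if canonical.get(key) != value
    }


# Placeholder values for illustration only.
canonical = {"files": 3, "sentences": 120}
quoted = {"files": 3, "sentences": 118}

mismatches = find_mismatches(canonical, quoted)
print(mismatches)
```

Any key reported here would be corrected in the prose to match the canonical value, never the other way around.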