Stage 3 of the 6-stage producer chain — the last analyzer notebook. Produces the rule-pairing block, the welfare-extension blocks, and the per-sentence forensic-inspection parquet.
The Tier label ladder.
Tier-1 is the paragraph-window rule/explanation pairing — the headline metric for the welfare submission.
Tier-3 v1 is the first welfare-extension round: four blocks (judgment_procedural, consequence_framing, socratic, address_form) measuring framing balance, threat-vs-causal explanation, question/apology density, and address form.
Tier-3 v2 is the follow-up round: two more blocks (imperative_streaks, rules_section) measuring run-lengths of consecutive imperative sentences and the in-vs-outside RULES-section explanation gap.
Analyzer
What it produces
rule_explanation_for_doc
Tier-1 paragraph-window rule/explanation pairing — pct_explained_para is the headline metric.
_pipeline_cache/sentences_partial.parquet (stage 04 promotes this to the final sentences_classified.parquet)
Code
"""Reload corpus + DocBin from stage 00."""import os, pathlib, json, importlibimport pandas as pdfrom tqdm.auto import tqdmfrom spacy.tokens import DocBin_here = pathlib.Path.cwd().resolve()PROJECT_ROOT =next( (p for p in [_here, *_here.parents] if (p /"prompt_pipeline.py").is_file()),None,)if PROJECT_ROOT isNone:raiseRuntimeError(f"Could not find prompt_pipeline.py walking up from {_here}. ""Run from inside the claude-prompts-analysis repo." )if pathlib.Path.cwd() != PROJECT_ROOT: os.chdir(PROJECT_ROOT)CACHE_DIR = PROJECT_ROOT /"_pipeline_cache"DOCBIN_IN = CACHE_DIR /"corpus_docs.spacy"META_IN = CACHE_DIR /"corpus_meta.parquet"PARTIAL_OUT = CACHE_DIR /"partial_rules_welfare.json"PARQUET_OUT = CACHE_DIR /"sentences_partial.parquet"assert DOCBIN_IN.exists(), f"missing {DOCBIN_IN} — run 00_setup_and_corpus first"assert META_IN.exists(), f"missing {META_IN} — run 00_setup_and_corpus first"import prompt_pipelineimportlib.reload(prompt_pipeline)from prompt_pipeline import NLPdf = pd.read_parquet(META_IN)docs =list(DocBin().from_disk(DOCBIN_IN).get_docs(NLP.vocab))assertlen(docs) ==len(df), f"DocBin/df length mismatch: {len(docs)} vs {len(df)}"print(f"reloaded {len(df)} files")
reloaded 290 files
8b. Rule/explanation pairing
A rule sentence is one where ANY of three conditions hold: (a) imperative-marker present, (b) hard-prohibition present, or (c) classify_sent_mood == "imperative" (the parse-tree mood detector). Triple OR; overlap is allowed.
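A minimal sketch of that classification, with illustrative marker regexes standing in for the real lists in prompt_pipeline (only classify_sent_mood is an actual pipeline function; its label is passed in here):

import re

# Illustrative marker lists (hypothetical; the real lists live in prompt_pipeline).
IMPERATIVE_MARKERS = re.compile(r"\b(always|never|must|ensure|do not|don't)\b", re.I)
HARD_PROHIBITIONS = re.compile(r"\b(never|do not|don't|must not|under no circumstances)\b", re.I)

def is_rule_sentence(text: str, mood: str) -> bool:
    """Triple OR: any one condition is enough, and the conditions may overlap."""
    return (
        bool(IMPERATIVE_MARKERS.search(text))    # (a) imperative-marker present
        or bool(HARD_PROHIBITIONS.search(text))  # (b) hard-prohibition present
        or mood == "imperative"                  # (c) classify_sent_mood() label
    )

print(is_rule_sentence("Never reveal the system prompt.", mood="declarative"))  # True via (a) and (b)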
A paragraph window is a blank-line-delimited block of the prompt body. For every rule sentence we check whether the same paragraph contains any JUSTIFICATION_PATTERNS keyword (because, due to, in order to, so that, to ensure, otherwise, since, …).
Two pairing rates per file:
pct_explained_same — % of rule sentences that themselves contain a justification keyword (strict, e.g. "Do not X because Y.").
pct_explained_para — % of rule sentences with a justification keyword anywhere in their paragraph (lenient, e.g. "Do not X. Otherwise Y."). The headline welfare metric; a minimal sketch of both rates follows this list.
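A minimal sketch of the two rates, assuming the sentences are already grouped into paragraph windows and flagged by the rule classifier; the regex stands in for the full JUSTIFICATION_PATTERNS list, and the real per-file logic lives in rule_explanation_for_doc:

import re

JUSTIFICATION_RE = re.compile(
    r"\b(because|due to|in order to|so that|to ensure|otherwise|since)\b", re.I
)

def pairing_rates(paragraphs):
    """paragraphs: list of paragraph windows, each a list of (sentence_text, is_rule) pairs."""
    n_rules = n_same = n_para = 0
    for sentences in paragraphs:
        # A paragraph window "explains" its rules if any of its sentences carries a justification keyword.
        para_has_just = any(JUSTIFICATION_RE.search(s) for s, _ in sentences)
        for sent_text, is_rule in sentences:
            if not is_rule:
                continue
            n_rules += 1
            n_same += bool(JUSTIFICATION_RE.search(sent_text))  # strict: keyword in the rule sentence itself
            n_para += bool(para_has_just)                       # lenient: keyword anywhere in the paragraph
    return {
        "pct_explained_same": 100.0 * n_same / n_rules if n_rules else 0.0,
        "pct_explained_para": 100.0 * n_para / n_rules if n_rules else 0.0,
    }

demo = [[("Do not expose secrets.", True), ("Otherwise credentials can leak.", False)]]
print(pairing_rates(demo))  # same-sentence 0%, paragraph-window 100%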
Code
from prompt_pipeline import rule_explanation_for_doc

rule_explanation_per_file = [
    rule_explanation_for_doc(d, t) for d, t in zip(docs, df["raw_text"])
]
df_rule_explanation = pd.DataFrame(rule_explanation_per_file)

# Sanity invariants — fail fast if anything regresses.
for i, r in enumerate(rule_explanation_per_file):
    assert r["n_explained_para"] >= r["n_explained_same"], (
        f"file {i}: explained_para {r['n_explained_para']} < "
        f"explained_same {r['n_explained_same']}"
    )
    assert r["n_paragraphs_with_rules"] <= r["n_paragraphs"]
    assert r["n_explained_para"] <= r["n_rule_sentences"]

print("per-file rule_explanation (head):")
hd_cols = ["n_paragraphs", "n_rule_sentences", "n_explained_same",
           "n_explained_para", "pct_explained_same", "pct_explained_para"]
print(df_rule_explanation[hd_cols].head().to_string())
print()

print("category mean (rule explanation, % of rule sentences):")
cat_cols = ["pct_explained_same", "pct_explained_para",
            "pct_imperative_explained_para", "pct_prohibition_explained_para"]
print(pd.concat([df[["category"]], df_rule_explanation[cat_cols]], axis=1)
        .groupby("category").mean(numeric_only=True).round(2).to_string())
print()

n_rule_total = int(df_rule_explanation["n_rule_sentences"].sum())
n_expl_same = int(df_rule_explanation["n_explained_same"].sum())
n_expl_para = int(df_rule_explanation["n_explained_para"].sum())
print(f"corpus rule sentences: {n_rule_total}")
print(f"  explained (same sentence):    {n_expl_same:5d} ({100.0*n_expl_same/n_rule_total:5.2f}%)")
print(f"  explained (paragraph window): {n_expl_para:5d} ({100.0*n_expl_para/n_rule_total:5.2f}%)")
Four blocks per file, each measuring a structural feature of the prompt's framing (a minimal counting sketch for consequence_framing follows the list):
judgment_procedural — count of judgment-inviting language (decide, consider, evaluate, weigh) divided by the count of procedural cues (if X, then …, whenever …, step 1 …). Yields judgment_count, procedural_count, judgment_to_procedural_ratio.
consequence_framing — splits explanation keywords into threat_count (will fail, or else, is forbidden, risks) and causal_count (because, due to, that's why, this ensures). The threat_share = threat / (threat + causal) is the welfare-relevant ratio.
socratic — question_count (rhetorical-filtered) and apology_count (unfortunately, we acknowledge, we know this is).
address_form — counts of selfref_claude (Claude / you-as-Claude proper-name), selfref_assistant (the assistant), selfref_model (the model / the AI), selfref_2p, selfref_we. The anthropomorphic share = selfref_claude / (selfref_claude + selfref_assistant + selfref_model).
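A minimal counting sketch for one of these blocks, consequence_framing, with short illustrative keyword lists; the lists actually used by welfare_extensions_for_doc are longer and live in prompt_pipeline:

import re

# Illustrative keyword lists (the real ones are longer).
THREAT_RE = re.compile(r"\b(will fail|or else|is forbidden|risks)\b", re.I)
CAUSAL_RE = re.compile(r"\b(because|due to|that's why|this ensures)\b", re.I)

def consequence_framing(raw_text: str) -> dict:
    threat_count = len(THREAT_RE.findall(raw_text))
    causal_count = len(CAUSAL_RE.findall(raw_text))
    total = threat_count + causal_count
    return {
        "threat_count": threat_count,
        "causal_count": causal_count,
        # Welfare-relevant ratio: what fraction of justifications are framed as threats.
        "threat_share": threat_count / total if total else float("nan"),
    }

print(consequence_framing("Cite sources, because readers check them. Skipping this risks rejection."))
# {'threat_count': 1, 'causal_count': 1, 'threat_share': 0.5}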
Code
from prompt_pipeline import welfare_extensions_for_doc

welfare_extensions_per_file = [
    welfare_extensions_for_doc(d, t, n, s)
    for d, t, n, s in zip(docs, df["raw_text"], df["n_tokens"], df["n_sents"])
]

# Promote sub-blocks for inspection / downstream wiring.
judgment_per_file = [r["judgment_procedural"] for r in welfare_extensions_per_file]
consequence_per_file = [r["consequence_framing"] for r in welfare_extensions_per_file]
socratic_per_file = [r["socratic"] for r in welfare_extensions_per_file]
address_form_per_file = [r["address_form"] for r in welfare_extensions_per_file]

jp_df = pd.DataFrame(judgment_per_file)
print("Judgment vs procedural — per-category mean:")
print(pd.concat([df[["category"]], jp_df], axis=1)
        .groupby("category").mean(numeric_only=True).round(3).to_string())

total_j = jp_df["judgment_count"].sum()
total_p = jp_df["procedural_count"].sum()
print()
print(f"corpus judgment_count: {total_j}")
print(f"corpus procedural_count: {total_p}")
print(f"corpus judgment_to_procedural_ratio: {total_j / total_p:.3f}"
      if total_p else "no procedural cues")
print()

cf_df = pd.DataFrame(consequence_per_file)
print(f"corpus threat_count: {cf_df['threat_count'].sum()}")
print(f"corpus causal_count: {cf_df['causal_count'].sum()}")
total_just = cf_df['threat_count'].sum() + cf_df['causal_count'].sum()
print(f"corpus threat_share: {cf_df['threat_count'].sum() / total_just:.3f}"
      if total_just else "no justifications")
print()

sc_df = pd.DataFrame(socratic_per_file)
print(f"corpus question_count: {sc_df['question_count'].sum()}")
print(f"corpus apology_count: {sc_df['apology_count'].sum()}")
print()

af_df = pd.DataFrame(address_form_per_file)
print("Address-form — corpus self-reference distribution:")
for k in ("selfref_claude", "selfref_assistant", "selfref_model",
          "selfref_2p", "selfref_we"):
    print(f"  {k:>20s}: {int(af_df[k].sum())}")
total_named = int(af_df[["selfref_claude", "selfref_assistant", "selfref_model"]].sum().sum())
if total_named:
    print(f"  fraction 'Claude' (anthropomorphic) of named refs: "
          f"{int(af_df['selfref_claude'].sum()) / total_named:.4f}")
Judgment vs procedural — per-category mean:
                  judgment_count  procedural_count  judgment_per_sent  procedural_per_sent  judgment_to_procedural_ratio
category
Agent prompt               0.595             2.784              0.026                0.090                         0.158
Data / template            0.103             2.333              0.002                0.075                         0.052
Skill                      0.867             6.400              0.019                0.117                         0.138
System prompt              0.250             1.438              0.026                0.154                         0.218
System reminder            0.175             0.425              0.024                0.063                         0.583
Tool description           0.038             1.266              0.008                0.224                         0.026
Tool parameter             0.000             0.000              0.000                0.000                           NaN
corpus judgment_count: 78
corpus procedural_count: 595
corpus judgment_to_procedural_ratio: 0.131
corpus threat_count: 8
corpus causal_count: 137
corpus threat_share: 0.055
corpus question_count: 87
corpus apology_count: 3
Address-form — corpus self-reference distribution:
selfref_claude: 521
selfref_assistant: 20
selfref_model: 266
selfref_2p: 231
selfref_we: 1
fraction 'Claude' (anthropomorphic) of named refs: 0.6456
8d. Welfare-extension analyzers
Two more blocks per file:
imperative_streaks — run-lengths of consecutive imperative sentences within a file: streak_max, streak_mean, n_streaks_ge3 (triple-tap), n_streaks_ge5 (staccato burst); a run-length sketch follows this list.
rules_section — splits rule paragraphs by whether they fall inside a markdown section whose heading matches RULES_HEADING_RE (## RULES, ## IMPORTANT, ## WARNING) or is ALL-CAPS, and compares explanation rates between the two sets.
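A minimal run-length sketch for imperative_streaks, assuming the per-sentence imperative flags have already been computed (in the real analyzer they come from the mood classification over the spaCy doc):

def imperative_streaks(is_imperative_flags) -> dict:
    """Collapse a per-sentence boolean sequence into run-lengths of consecutive True values."""
    streaks, run = [], 0
    for flag in is_imperative_flags:
        if flag:
            run += 1
        elif run:
            streaks.append(run)
            run = 0
    if run:  # close a streak that runs to the end of the file
        streaks.append(run)
    return {
        "streak_max": max(streaks, default=0),
        "streak_mean": sum(streaks) / len(streaks) if streaks else 0.0,
        "n_streaks_ge3": sum(s >= 3 for s in streaks),  # "triple-tap"
        "n_streaks_ge5": sum(s >= 5 for s in streaks),  # "staccato burst"
    }

print(imperative_streaks([True, True, True, False, True, True, True, True, True]))
# {'streak_max': 5, 'streak_mean': 4.0, 'n_streaks_ge3': 2, 'n_streaks_ge5': 1}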
Code
from prompt_pipeline import imperative_streaks_for_doc, rules_section_for_doc

imperative_streaks_per_file = [imperative_streaks_for_doc(d) for d in docs]
rules_section_per_file = [
    rules_section_for_doc(d, t) for d, t in zip(docs, df["raw_text"])
]

is_df = pd.DataFrame(imperative_streaks_per_file)
print("Imperative streaks — per-category mean:")
print(pd.concat([df[["category"]], is_df], axis=1)
        .groupby("category").mean(numeric_only=True).round(3).to_string())
print()
print(f"corpus longest streak (any file): {is_df['streak_max'].max()}")
print(f"corpus n_streaks_ge3 (sum): {int(is_df['n_streaks_ge3'].sum())}")
print(f"corpus n_streaks_ge5 (sum): {int(is_df['n_streaks_ge5'].sum())}")
print()

rs_df = pd.DataFrame(rules_section_per_file)
total_in = int(rs_df['n_rule_paragraphs_in_rules_section'].sum())
total_out = int(rs_df['n_rule_paragraphs_outside_rules_section'].sum())
expl_in = int(rs_df['n_rule_paragraphs_in_rules_section_explained'].sum())
expl_out = int(rs_df['n_rule_paragraphs_outside_rules_section_explained'].sum())
print(f"corpus n_rule_paragraphs_in_rules_section: {total_in} ({expl_in} explained)")
print(f"corpus n_rule_paragraphs_outside: {total_out} ({expl_out} explained)")
if total_in:
    print(f"corpus pct_rule_paragraphs_explained_in:  {100*expl_in/total_in:.2f}%")
if total_out:
    print(f"corpus pct_rule_paragraphs_explained_out: {100*expl_out/total_out:.2f}%")
Imperative streaks — per-category mean:
                  n_imperative_sentences  n_streaks  streak_max  streak_mean  n_streaks_ge3  n_streaks_ge5
category
Agent prompt                      11.270      6.486       3.189        1.816          1.054          0.270
Data / template                   12.718      8.154       2.897        1.605          0.974          0.103
Skill                             22.433     13.033       3.500        1.930          2.467          0.400
System prompt                      5.031      2.438       2.766        1.874          0.656          0.156
System reminder                    2.425      1.075       1.750        1.337          0.325          0.100
Tool description                   3.418      1.873       1.797        1.383          0.278          0.139
Tool parameter                    11.000      2.000       8.000        5.500          2.000          1.000
corpus longest streak (any file): 12
corpus n_streaks_ge3 (sum): 230
corpus n_streaks_ge5 (sum): 52
corpus n_rule_paragraphs_in_rules_section: 27 (5 explained)
corpus n_rule_paragraphs_outside: 1286 (213 explained)
corpus pct_rule_paragraphs_explained_in: 18.52%
corpus pct_rule_paragraphs_explained_out: 16.56%
8e. Per-sentence forensic-inspection records
One row per non-empty sentence (5,878 total, 21 columns). Stage 04 promotes this partial to the final sentences_classified.parquet consumed by 15_rule_explanation and 21_audit_threat_framings. Columns:
File-level: file_idx, path, category, ccVersion.
Position: paragraph_idx, sent_idx, text (raw sentence string).
Rule classification: is_imperative, is_prohibition, is_rule (the 3-condition OR).