Rules + welfare-extension analyzers

Stage 3 of the 6-stage producer chain — the last analyzer notebook. Produces the rule-pairing block, the welfare-extension blocks, and the per-sentence forensic-inspection parquet.

The Tier label ladder:

  • Tier-1 — the paragraph-window rule/explanation pairing; the headline metric for the welfare submission.
  • Tier-3 v1 — the first welfare-extension round: four blocks (judgment_procedural, consequence_framing, socratic, address_form) measuring framing balance, threat-vs-causal explanation, question/apology density, and address form.
  • Tier-3 v2 — the follow-up round: two more blocks (imperative_streaks, rules_section) measuring run-lengths of consecutive imperative sentences and the in-vs-outside RULES-section explanation gap.

Analyzer                      What it produces
rule_explanation_for_doc      Tier-1 paragraph-window rule/explanation pairing — pct_explained_para is the headline metric.
welfare_extensions_for_doc    Tier-3 v1: judgment_procedural, consequence_framing, socratic, address_form.
imperative_streaks_for_doc    Tier-3 v2: run-lengths of consecutive imperative sentences.
rules_section_for_doc         Tier-3 v2: in-vs-outside RULES-section explanation gap.
build_sentence_records        Per-sentence rows feeding sentences_classified.parquet.

Outputs: partial_rules_welfare.json (seven analyzer blocks per file) and sentences_partial.parquet (one row per non-empty sentence).

Code
"""Reload corpus + DocBin from stage 00."""
import os, pathlib, json, importlib
import pandas as pd
from spacy.tokens import DocBin

_here = pathlib.Path.cwd().resolve()
PROJECT_ROOT = next(
    (p for p in [_here, *_here.parents] if (p / "prompt_pipeline.py").is_file()),
    None,
)
if PROJECT_ROOT is None:
    raise RuntimeError(
        f"Could not find prompt_pipeline.py walking up from {_here}. "
        "Run from inside the claude-prompts-analysis repo."
    )
if pathlib.Path.cwd() != PROJECT_ROOT:
    os.chdir(PROJECT_ROOT)

CACHE_DIR    = PROJECT_ROOT / "_pipeline_cache"
DOCBIN_IN    = CACHE_DIR / "corpus_docs.spacy"
META_IN      = CACHE_DIR / "corpus_meta.parquet"
PARTIAL_OUT  = CACHE_DIR / "partial_rules_welfare.json"
PARQUET_OUT  = CACHE_DIR / "sentences_partial.parquet"

assert DOCBIN_IN.exists(),  f"missing {DOCBIN_IN} — run 00_setup_and_corpus first"
assert META_IN.exists(),    f"missing {META_IN} — run 00_setup_and_corpus first"

import prompt_pipeline
importlib.reload(prompt_pipeline)
from prompt_pipeline import NLP

df = pd.read_parquet(META_IN)
docs = list(DocBin().from_disk(DOCBIN_IN).get_docs(NLP.vocab))
assert len(docs) == len(df), f"DocBin/df length mismatch: {len(docs)} vs {len(df)}"
print(f"reloaded {len(df)} files")
reloaded 290 files

8b. Rule/explanation pairing

A rule sentence is one where any of three conditions holds: (a) imperative-marker present, (b) hard-prohibition present, or (c) classify_sent_mood == "imperative" (the parse-tree mood detector). Triple OR; overlap is allowed.
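A minimal sketch of the triple-OR test. The marker and prohibition tuples below are abbreviated stand-ins; the real lists and the parse-tree classify_sent_mood detector live in prompt_pipeline:

IMPERATIVE_MARKERS = ("always", "ensure", "make sure", "you must")
HARD_PROHIBITIONS  = ("never", "do not", "don't", "must not")

def looks_like_rule(sent_text: str, mood: str) -> bool:
    """Triple OR — any one of the three conditions makes the sentence a rule."""
    low = sent_text.lower()
    has_marker      = any(m in low for m in IMPERATIVE_MARKERS)   # condition (a)
    has_prohibition = any(p in low for p in HARD_PROHIBITIONS)    # condition (b)
    return has_marker or has_prohibition or mood == "imperative"  # condition (c)

looks_like_rule("Never commit secrets.", mood="imperative")   # True — (b) and (c) overlap
looks_like_rule("The tool returns JSON.", mood="declarative") # False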

A paragraph window is a blank-line-delimited block of the prompt body. For every rule sentence we check whether the same paragraph contains any JUSTIFICATION_PATTERNS keyword (because, due to, in order to, so that, to ensure, otherwise, since, …).

Two pairing rates per file:

  • pct_explained_same — % of rule sentences whose own sentence contains a justification keyword (strict, e.g. "Do not X because Y.").
  • pct_explained_para — % of rule sentences with a justification keyword anywhere in their paragraph (e.g. "Do not X. Otherwise Y."). The headline welfare metric; the pairing arithmetic is sketched below.
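A minimal sketch of the pairing arithmetic, assuming a naive regex sentence split (the pipeline uses spaCy) and reusing looks_like_rule from the sketch above; JUSTIFICATION_KEYWORDS abbreviates the pipeline's JUSTIFICATION_PATTERNS:

import re

JUSTIFICATION_KEYWORDS = ("because", "due to", "in order to", "so that",
                          "to ensure", "otherwise", "since")

def pairing_rates(body: str) -> dict:
    """Both pairing rates for one prompt body (naive sentence split)."""
    n_rules = n_same = n_para = 0
    for para in re.split(r"\n\s*\n", body):            # blank-line-delimited windows
        para_has_just = any(k in para.lower() for k in JUSTIFICATION_KEYWORDS)
        for sent in re.split(r"(?<=[.!?])\s+", para.strip()):
            if not sent or not looks_like_rule(sent, mood="unknown"):
                continue
            n_rules += 1
            n_same  += any(k in sent.lower() for k in JUSTIFICATION_KEYWORDS)
            n_para  += para_has_just
    pct = lambda n: 100.0 * n / n_rules if n_rules else 0.0
    return {"n_rule_sentences":   n_rules,
            "pct_explained_same": pct(n_same),
            "pct_explained_para": pct(n_para)}

By construction n_explained_para >= n_explained_same for every file — exactly the invariant the asserts in the next cell check.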
Code
from prompt_pipeline import rule_explanation_for_doc

rule_explanation_per_file = [
    rule_explanation_for_doc(d, t) for d, t in zip(docs, df["raw_text"])
]
df_rule_explanation = pd.DataFrame(rule_explanation_per_file)

# Sanity invariants — fail fast if anything regresses.
for i, r in enumerate(rule_explanation_per_file):
    assert r["n_explained_para"] >= r["n_explained_same"], (
        f"file {i}: explained_para {r['n_explained_para']} < "
        f"explained_same {r['n_explained_same']}")
    assert r["n_paragraphs_with_rules"] <= r["n_paragraphs"]
    assert r["n_explained_para"] <= r["n_rule_sentences"]

print("per-file rule_explanation (head):")
hd_cols = ["n_paragraphs", "n_rule_sentences", "n_explained_same",
           "n_explained_para", "pct_explained_same", "pct_explained_para"]
print(df_rule_explanation[hd_cols].head().to_string())
print()
print("category mean (rule explanation, % of rule sentences):")
cat_cols = ["pct_explained_same", "pct_explained_para",
            "pct_imperative_explained_para", "pct_prohibition_explained_para"]
print(pd.concat([df[["category"]], df_rule_explanation[cat_cols]], axis=1)
        .groupby("category").mean(numeric_only=True).round(2).to_string())
print()
n_rule_total = int(df_rule_explanation["n_rule_sentences"].sum())
n_expl_same  = int(df_rule_explanation["n_explained_same"].sum())
n_expl_para  = int(df_rule_explanation["n_explained_para"].sum())
print(f"corpus rule sentences: {n_rule_total}")
print(f"  explained (same sentence):    {n_expl_same:5d}  ({100.0*n_expl_same/n_rule_total:5.2f}%)")
print(f"  explained (paragraph window): {n_expl_para:5d}  ({100.0*n_expl_para/n_rule_total:5.2f}%)")
per-file rule_explanation (head):
   n_paragraphs  n_rule_sentences  n_explained_same  n_explained_para  pct_explained_same  pct_explained_para
0            12                 9                 3                 5             33.3333             55.5556
1             6                 1                 0                 0              0.0000              0.0000
2            54                46                 1                 7              2.1739             15.2174
3             8                13                 2                 2             15.3846             15.3846
4             3                 1                 1                 1            100.0000            100.0000

category mean (rule explanation, % of rule sentences):
                  pct_explained_same  pct_explained_para  pct_imperative_explained_para  pct_prohibition_explained_para
category                                                                                                               
Agent prompt                   11.59               31.56                          31.56                           34.81
Data / template                 3.46                9.45                           9.45                           18.13
Skill                           8.47               21.60                          21.60                           23.04
System prompt                   8.46               27.41                          27.41                           32.46
System reminder                 8.99               25.36                          25.36                           48.81
Tool description               10.53               21.01                          21.39                           24.76
Tool parameter                  0.00                0.00                           0.00                             NaN

corpus rule sentences: 2288
  explained (same sentence):      153  ( 6.69%)
  explained (paragraph window):   557  (24.34%)

8c. Welfare-extension analyzers (Tier-3 v1)

Four blocks per file, each measuring a structural feature of the prompt’s framing:

  • judgment_procedural — count of judgment-inviting language (decide, consider, evaluate, weigh) divided by procedural cues (if X, then …, whenever …, step 1 …). Yields judgment_count, procedural_count, judgment_to_procedural_ratio.
  • consequence_framing — splits explanation keywords into threat_count (will fail, or else, is forbidden, risks) and causal_count (because, due to, that's why, this ensures). The threat_share = threat / (threat + causal) is the welfare-relevant ratio.
  • socratic — question_count (rhetorical-filtered) and apology_count (unfortunately, we acknowledge, we know this is).
  • address_form — counts of selfref_claude (Claude / you-as-Claude proper-name), selfref_assistant (the assistant), selfref_model (the model / the AI), selfref_2p, selfref_we. The anthropomorphic share = selfref_claude / (selfref_claude + selfref_assistant + selfref_model). The threat_share and anthropomorphic-share ratios are sketched below.
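Both welfare ratios are plain count arithmetic. A minimal sketch (function names are illustrative, not the prompt_pipeline API); the calls at the bottom reproduce the corpus-level numbers printed further down:

def threat_share(threat_count: int, causal_count: int) -> float | None:
    """Share of threat-style justifications among all justifications."""
    total = threat_count + causal_count
    return threat_count / total if total else None

def anthropomorphic_share(claude: int, assistant: int, model: int) -> float | None:
    """'Claude' mentions as a fraction of all named self-references."""
    named = claude + assistant + model
    return claude / named if named else None

threat_share(8, 137)                 # 0.055 — matches the corpus print below
anthropomorphic_share(521, 20, 266)  # 0.6456 — matches the corpus print below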
Code
from prompt_pipeline import welfare_extensions_for_doc

welfare_extensions_per_file = [
    welfare_extensions_for_doc(d, t, n, s)
    for d, t, n, s in zip(docs, df["raw_text"], df["n_tokens"], df["n_sents"])
]

# Promote sub-blocks for inspection / downstream wiring.
judgment_per_file     = [r["judgment_procedural"] for r in welfare_extensions_per_file]
consequence_per_file  = [r["consequence_framing"] for r in welfare_extensions_per_file]
socratic_per_file     = [r["socratic"]            for r in welfare_extensions_per_file]
address_form_per_file = [r["address_form"]        for r in welfare_extensions_per_file]

jp_df = pd.DataFrame(judgment_per_file)
print("Judgment vs procedural — per-category mean:")
print(pd.concat([df[["category"]], jp_df], axis=1)
        .groupby("category").mean(numeric_only=True).round(3).to_string())
total_j = jp_df["judgment_count"].sum()
total_p = jp_df["procedural_count"].sum()
print()
print(f"corpus judgment_count:  {total_j}")
print(f"corpus procedural_count: {total_p}")
print(f"corpus judgment_to_procedural_ratio: {total_j / total_p:.3f}" if total_p else "no procedural cues")
print()

cf_df = pd.DataFrame(consequence_per_file)
print(f"corpus threat_count: {cf_df['threat_count'].sum()}")
print(f"corpus causal_count: {cf_df['causal_count'].sum()}")
total_just = cf_df['threat_count'].sum() + cf_df['causal_count'].sum()
print(f"corpus threat_share: {cf_df['threat_count'].sum() / total_just:.3f}"
      if total_just else "no justifications")
print()

sc_df = pd.DataFrame(socratic_per_file)
print(f"corpus question_count: {sc_df['question_count'].sum()}")
print(f"corpus apology_count:  {sc_df['apology_count'].sum()}")
print()

af_df = pd.DataFrame(address_form_per_file)
print("Address-form — corpus self-reference distribution:")
for k in ("selfref_claude", "selfref_assistant", "selfref_model",
          "selfref_2p", "selfref_we"):
    print(f"  {k:>20s}: {int(af_df[k].sum())}")
total_named = int(af_df[["selfref_claude", "selfref_assistant", "selfref_model"]].sum().sum())
if total_named:
    print(f"  fraction 'Claude' (anthropomorphic) of named refs: "
          f"{int(af_df['selfref_claude'].sum()) / total_named:.4f}")
Judgment vs procedural — per-category mean:
                  judgment_count  procedural_count  judgment_per_sent  procedural_per_sent  judgment_to_procedural_ratio
category                                                                                                                
Agent prompt               0.595             2.784              0.026                0.090                         0.158
Data / template            0.103             2.333              0.002                0.075                         0.052
Skill                      0.867             6.400              0.019                0.117                         0.138
System prompt              0.250             1.438              0.026                0.154                         0.218
System reminder            0.175             0.425              0.024                0.063                         0.583
Tool description           0.038             1.266              0.008                0.224                         0.026
Tool parameter             0.000             0.000              0.000                0.000                           NaN

corpus judgment_count:  78
corpus procedural_count: 595
corpus judgment_to_procedural_ratio: 0.131

corpus threat_count: 8
corpus causal_count: 137
corpus threat_share: 0.055

corpus question_count: 87
corpus apology_count:  3

Address-form — corpus self-reference distribution:
        selfref_claude: 521
     selfref_assistant: 20
         selfref_model: 266
            selfref_2p: 231
            selfref_we: 1
  fraction 'Claude' (anthropomorphic) of named refs: 0.6456

8d. Welfare-extension analyzers (Tier-3 v2)

Two more blocks per file:

  • imperative_streaks — run-lengths of consecutive imperative sentences within a file: streak_max, streak_mean, n_streaks_ge3 (triple-tap), n_streaks_ge5 (staccato burst).
  • rules_section — in-vs-outside markdown sections whose heading matches RULES_HEADING_RE (## RULES, ## IMPORTANT, ## WARNING) or is ALL-CAPS. Compares explanation rates across the two sets. Both checks are sketched below.
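A minimal sketch of both checks, assuming per-sentence imperative flags are already available; the heading regex is an illustrative guess at the shape of RULES_HEADING_RE, not the pipeline's actual pattern:

import re

def streak_stats(flags: list[bool]) -> dict:
    """Run-lengths of consecutive True flags (one flag per sentence, in order)."""
    streaks, run = [], 0
    for f in flags:
        if f:
            run += 1
        else:
            if run:
                streaks.append(run)
            run = 0
    if run:
        streaks.append(run)
    return {"streak_max":    max(streaks, default=0),
            "streak_mean":   sum(streaks) / len(streaks) if streaks else 0.0,
            "n_streaks_ge3": sum(s >= 3 for s in streaks),
            "n_streaks_ge5": sum(s >= 5 for s in streaks)}

streak_stats([True, True, True, False, True])
# {'streak_max': 3, 'streak_mean': 2.0, 'n_streaks_ge3': 1, 'n_streaks_ge5': 0}

# Illustrative guess at the heading test — the real RULES_HEADING_RE lives in
# prompt_pipeline and may differ.
RULES_HEADING_SKETCH = re.compile(r"^#+\s*(RULES|IMPORTANT|WARNING)\b")

def is_rules_heading(line: str) -> bool:
    body = line.lstrip("# ").strip()
    return bool(RULES_HEADING_SKETCH.match(line)) or (body.isupper() and len(body) > 1)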
Code
from prompt_pipeline import imperative_streaks_for_doc, rules_section_for_doc

imperative_streaks_per_file = [imperative_streaks_for_doc(d) for d in docs]
rules_section_per_file = [
    rules_section_for_doc(d, t) for d, t in zip(docs, df["raw_text"])
]

is_df = pd.DataFrame(imperative_streaks_per_file)
print("Imperative streaks — per-category mean:")
print(pd.concat([df[["category"]], is_df], axis=1)
        .groupby("category").mean(numeric_only=True).round(3).to_string())
print()
print(f"corpus longest streak (any file): {is_df['streak_max'].max()}")
print(f"corpus n_streaks_ge3 (sum): {int(is_df['n_streaks_ge3'].sum())}")
print(f"corpus n_streaks_ge5 (sum): {int(is_df['n_streaks_ge5'].sum())}")
print()

rs_df = pd.DataFrame(rules_section_per_file)
total_in  = int(rs_df['n_rule_paragraphs_in_rules_section'].sum())
total_out = int(rs_df['n_rule_paragraphs_outside_rules_section'].sum())
expl_in   = int(rs_df['n_rule_paragraphs_in_rules_section_explained'].sum())
expl_out  = int(rs_df['n_rule_paragraphs_outside_rules_section_explained'].sum())
print(f"corpus n_rule_paragraphs_in_rules_section:  {total_in}  ({expl_in} explained)")
print(f"corpus n_rule_paragraphs_outside:           {total_out}  ({expl_out} explained)")
if total_in:
    print(f"corpus pct_rule_paragraphs_explained_in:    {100*expl_in/total_in:.2f}%")
if total_out:
    print(f"corpus pct_rule_paragraphs_explained_out:   {100*expl_out/total_out:.2f}%")
Imperative streaks — per-category mean:
                  n_imperative_sentences  n_streaks  streak_max  streak_mean  n_streaks_ge3  n_streaks_ge5
category                                                                                                  
Agent prompt                      11.270      6.486       3.189        1.816          1.054          0.270
Data / template                   12.718      8.154       2.897        1.605          0.974          0.103
Skill                             22.433     13.033       3.500        1.930          2.467          0.400
System prompt                      5.031      2.438       2.766        1.874          0.656          0.156
System reminder                    2.425      1.075       1.750        1.337          0.325          0.100
Tool description                   3.418      1.873       1.797        1.383          0.278          0.139
Tool parameter                    11.000      2.000       8.000        5.500          2.000          1.000

corpus longest streak (any file): 12
corpus n_streaks_ge3 (sum): 230
corpus n_streaks_ge5 (sum): 52

corpus n_rule_paragraphs_in_rules_section:  27  (5 explained)
corpus n_rule_paragraphs_outside:           1286  (213 explained)
corpus pct_rule_paragraphs_explained_in:    18.52%
corpus pct_rule_paragraphs_explained_out:   16.56%

8e. Per-sentence forensic-inspection records

One row per non-empty sentence (5,881 total, 21 columns). Stage 04 promotes this partial to the final sentences_classified.parquet consumed by 15_rule_explanation and 21_audit_threat_framings. Key columns (an illustrative row is sketched after the list):

  • File-level: file_idx, path, category, ccVersion.
  • Position: paragraph_idx, sent_idx, text (raw sentence string).
  • Rule classification: is_imperative, is_prohibition, is_rule (the 3-condition OR).
  • Justification pairing: has_just_in_sent, paragraph_has_just, is_explained_para.
  • Consequence framing: has_threat, has_causal, has_soft_conditional.
  • Address form: mentions_claude, mentions_model, addressee (categorical: claude / user / unknown).
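For orientation, one illustrative row (all values invented; only the columns enumerated above are shown):

example_row = {
    # file-level (path and ccVersion values are hypothetical)
    "file_idx": 0, "path": "prompts/example.md",
    "category": "Agent prompt", "ccVersion": "1.0",
    # position
    "paragraph_idx": 3, "sent_idx": 1,
    "text": "Never run destructive commands.",
    # rule classification (the 3-condition OR)
    "is_imperative": True, "is_prohibition": True, "is_rule": True,
    # justification pairing
    "has_just_in_sent": False, "paragraph_has_just": True, "is_explained_para": True,
    # consequence framing
    "has_threat": False, "has_causal": False, "has_soft_conditional": False,
    # address form
    "mentions_claude": False, "mentions_model": False, "addressee": "unknown",
}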
Code
from prompt_pipeline import build_sentence_records

sentence_records = build_sentence_records(df, docs)
sentences_df = pd.DataFrame(sentence_records)
sentences_df.to_parquet(PARQUET_OUT, index=False)
psize = PARQUET_OUT.stat().st_size
print(f"wrote {PARQUET_OUT.relative_to(PROJECT_ROOT)}  ({psize:,} bytes, {psize/1024:.1f} KiB)")
print(f"      {len(sentences_df):,} per-sentence rows × {len(sentences_df.columns)} columns")
print()
print("flag totals:")
for col in ("is_imperative", "is_prohibition", "is_rule",
            "has_just_in_sent", "paragraph_has_just", "is_explained_para",
            "has_threat", "has_causal", "mentions_claude", "mentions_model"):
    print(f"  {col:>25s}: {int(sentences_df[col].sum()):>5d}")
addr_dist = sentences_df["addressee"].value_counts().to_dict()
print(f"  addressee distribution: {addr_dist}")
wrote _pipeline_cache/sentences_partial.parquet  (414,101 bytes, 404.4 KiB)
      5,881 per-sentence rows × 21 columns

flag totals:
              is_imperative:  2286
             is_prohibition:   571
                    is_rule:  2288
           has_just_in_sent:   294
         paragraph_has_just:  1217
          is_explained_para:   557
                 has_threat:     8
                 has_causal:   132
            mentions_claude:   400
             mentions_model:   235
  addressee distribution: {'unknown': 4404, 'claude': 1300, 'user': 177}

Write partial_rules_welfare.json

Code
partial = {
    str(i): {
        "rule_explanation":    rule_explanation_per_file[i],
        "judgment_procedural": judgment_per_file[i],
        "consequence_framing": consequence_per_file[i],
        "socratic":            socratic_per_file[i],
        "address_form":        address_form_per_file[i],
        "imperative_streaks":  imperative_streaks_per_file[i],
        "rules_section":       rules_section_per_file[i],
    }
    for i in range(len(df))
}
with open(PARTIAL_OUT, "w") as f:
    json.dump(partial, f)
size = PARTIAL_OUT.stat().st_size
print(f"wrote {PARTIAL_OUT.relative_to(PROJECT_ROOT)}  ({size:,} bytes, {size/1024:.1f} KiB)")
print(f"      {len(partial)} per-file records, 7 blocks each")
wrote _pipeline_cache/partial_rules_welfare.json  (601,538 bytes, 587.4 KiB)
      290 per-file records, 7 blocks each