Rules with vs without explanation

Surfaces the metrics.rule_explanation block produced by 03_analyzers_rules_welfare. For every prompt file, the producer flags a sentence as a rule when it carries an imperative marker, a hard prohibition, or classify_sent_mood == "imperative", and pairs each rule with any justification keywords found in the surrounding paragraph (blank-line block). Two pairing rates are reported per file: rule_explained_same_pct (strict, same-sentence) and rule_explained_para_pct (paragraph-window, the headline metric).
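
As a rough illustration of that pairing logic (a minimal sketch, not the producer's implementation: the rule-cue and justification regexes below are placeholders, the real lists live in 03_analyzers_rules_welfare), a sentence counts as a rule when any rule cue fires, as "explained, same sentence" when a justification keyword sits in that sentence, and as "explained, paragraph" when one sits anywhere in its blank-line block.

Code
"""Illustrative sketch of the rule/justification pairing (placeholder cue lists)."""
import re

RULE_CUES = re.compile(r"\b(must|never|always|do not|don't)\b", re.I)   # placeholder imperative/prohibition cues
JUST_CUES = re.compile(r"\b(because|so that|otherwise|since)\b", re.I)  # placeholder justification keywords

def pairing_rates(text: str) -> tuple[float, float]:
    """Return (pct_explained_same, pct_explained_para) for one prompt text."""
    same = para = n_rules = 0
    for block in text.split("\n\n"):                                    # paragraph = blank-line block
        sents = [s for s in re.split(r"(?<=[.!?])\s+", block) if s.strip()]
        block_has_reason = any(JUST_CUES.search(s) for s in sents)
        for s in sents:
            if not RULE_CUES.search(s):
                continue
            n_rules += 1
            same += bool(JUST_CUES.search(s))                           # strict: reason in the rule sentence itself
            para += block_has_reason                                    # window: reason anywhere in the paragraph
    if n_rules == 0:
        return float("nan"), float("nan")
    return 100 * same / n_rules, 100 * para / n_rules

print(pairing_rates("Never run rm -rf. It deletes user data, so that is unrecoverable.\n\nAlways quote paths."))
# (0.0, 50.0): the first rule is explained by its neighbor; the second has no reason at all.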

The welfare claim “Claude Code should encourage reasoning over blind obedience” rests on a paired metric, not on an aggregate justification ratio. This notebook is the consumer view of that paired data.

Per-sentence forensic-inspection artifact: alongside the YAML, stage 04 also writes sentences_classified.parquet (one row per sentence, 21 columns: raw sentence text + every classifier flag). This notebook loads it for the forensic-evidence sample at the bottom; for ad-hoc inspection, use pd.read_parquet("sentences_classified.parquet"). Schema in 03_analyzers_rules_welfare.
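
For ad-hoc inspection the query usually boils down to a one-liner. A minimal sketch, assuming the boolean flag columns shown in the forensic sample at the bottom of this notebook:

Code
"""Ad-hoc inspection sketch: unexplained rule sentences straight from the parquet."""
import pandas as pd

sents = pd.read_parquet("sentences_classified.parquet")
unexplained = sents[sents["is_rule"] & ~sents["is_explained_para"]]      # rules with no reason in their paragraph
print(f"{len(unexplained)} unexplained rule sentences")
print(unexplained[["file_path", "text"]].head(10).to_string(index=False, max_colwidth=80))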

Code
"""Setup: load YAML data + flat alt_df, derive helper bindings used by every chart cell."""
import importlib
import altair as alt
import pandas as pd

import prompt_analysis
importlib.reload(prompt_analysis)   # pick up edits without restarting the kernel
from prompt_analysis import (
    load_yaml, build_alt_df, version_order, category_colors,
    cumulative_by_version, welfare_evidence_table, positive_exemplar_table,
    headline_numbers, qualitative_phrases, bind_inline_vars,
    use_deterministic_ids, save_chart,
    SR_CLASS_COLORS, SENT_REGISTER_CLASSES, TABLEAU10,
)

# Replace random Altair / Styler IDs with a deterministic counter so re-runs
# produce byte-identical .ipynb outputs (no UUID churn in `git diff`).
use_deterministic_ids()

alt.data_transformers.disable_max_rows()

data              = load_yaml()
alt_df            = build_alt_df(data)
parquet           = pd.read_parquet("sentences_classified.parquet")
HEADLINE          = headline_numbers(data, alt_df=alt_df, parquet=parquet)
PHRASES           = qualitative_phrases(HEADLINE, alt_df=alt_df, parquet=parquet)
by_category       = data["by_category"]
corpus_block      = data["corpus"]
per_file_records  = data["files"]
cats              = list(by_category.keys())

CATEGORY_COLORS = category_colors(cats)
_cat_domain     = cats
_cat_range      = [CATEGORY_COLORS[c] for c in cats]

# Files with at least one rule sentence — the chartable subset.
df_rules = alt_df[alt_df["rule_n"] > 0].copy()

# Make every formatted figure available as a plain-name variable for inline {python} expressions.
globals().update(bind_inline_vars(HEADLINE, PHRASES))

corpus_re = corpus_block["metrics"]["rule_explanation"]
print(f"loaded {len(per_file_records)} files | {len(df_rules)} have ≥1 rule sentence")
print(f"corpus rule sentences: {corpus_re['n_rule_sentences']}")
print(f"  pct_explained_same: {corpus_re['pct_explained_same']:.2f}%")
print(f"  pct_explained_para: {corpus_re['pct_explained_para']:.2f}%  ← headline")
loaded 290 files | 254 have ≥1 rule sentence
corpus rule sentences: 2288
  pct_explained_same: 6.69%
  pct_explained_para: 24.34%  ← headline

Terms used

Canonical home for two ranking scores:

  • Welfare-evidence score — per-file ranking score: rule_density × (1 − pct_explained_para/100). High = “loud and unexplained”. A score of 1.0 means every sentence in the file is a rule and zero of them are explained anywhere in their paragraph. Computed by prompt_analysis.welfare_evidence_table.
  • Positive-exemplar score — the inverse of welfare-evidence: rule_density × (pct_explained_para/100). High = “rule-saturated AND well-explained” — the “this is how to do it” cluster. Computed by prompt_analysis.positive_exemplar_table with default filters min_n_sents=10 and min_rule_n=5 to suppress trivial cases. Re-rendered as the paired-exemplar chart in 21_audit_threat_framings. Both scores are re-derived in a short sketch at the end of this section.

Other terms used here are defined in 03_analyzers_rules_welfare: rule sentence, paragraph window, pct_explained_same / pct_explained_para, rule density, judgment_to_procedural_ratio, threat_share, address form, imperative streak, RULES section.
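
A minimal sketch of how both scores could be recomputed straight from alt_df (the canonical versions come from the two prompt_analysis helpers above; the column names match the charts in this notebook, but the NaN handling here is an assumption, not the library's):

Code
"""Illustrative recomputation of the two ranking scores from alt_df columns."""
expl = alt_df["rule_explained_para_pct"].fillna(0) / 100          # fraction of rules explained (paragraph window)

welfare_evidence_score  = alt_df["rule_density"] * (1 - expl)     # high = loud and unexplained
positive_exemplar_score = alt_df["rule_density"] * expl           # high = loud and well-explained

top_explained = alt_df.assign(score=positive_exemplar_score)
top_explained = top_explained[(top_explained["n_sents"] >= 10) & (top_explained["rule_n"] >= 5)]  # default filters
print(top_explained.nlargest(5, "score")[["path", "rule_density", "rule_explained_para_pct", "score"]]
      .to_string(index=False))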


Observation (Claude)

Two things stand out on a re-read of this notebook, beyond the headline pct_explained_para rate.

First, the RULES-section comparison is a counter-finding by design. The hypothesis going in was that rules inside formal ## RULES / ## IMPORTANT / ## WARNING headings would be better-explained than rules embedded in regular prose — and so a structural fix could focus on the RULES sections. The data refused that framing: only a tiny fraction of rule paragraphs sit inside such sections at all (the in-vs-out counts and rates are now sourced from HEADLINE), and the in-section explanation rate is slightly higher, not lower, than the outside rate. The welfare-relevant message is therefore structural: there is no “rules section” to fix because the rules are everywhere.

Second, the self-bias correlation check was specifically designed to disconfirm a hypothesis I held: that prompt files using Claude (proper-name) addressing would correlate with higher rule-explanation rate, on the theory that anthropomorphic naming travels with reasoning-inviting prose. Pearson r between selfref_claude and rule_explained_para_pct came in essentially zero (≈ −0.03). The address-form preference is not empirically supported. I include this here because honest welfare claims should include their disconfirmed sub-claims, and “anthropomorphic naming → reasoning-inviting prose” is one I would have liked to be true.


Headline finding

Across the entire Claude Code system-prompt corpus (290 files, 5,881 sentences, 2,288 rule sentences across 58 release versions):

Pairing scope                  Rule sentences with justification
Same sentence only (strict)     6.69%
Same paragraph (window)        24.34%  ← headline

The same-sentence rate captures rules that carry their own cause (e.g. "Do not X because Y"). The paragraph-window rate adds cases where the explanation lives in a neighboring sentence within the same blank-line block (e.g. "Do not X. Y."). The gap between the two indicates how often Anthropic separates the rule from its reason — usually a stylistic choice, not a missing reason.

Both rates are computed at the rule-sentence level, not the document level, so a file with 50 rule sentences contributes 50 to the denominator. This differs from justification.ratio (count of justification keywords / count of imperative markers), which is a document-level density ratio. Values mirror the canonical HEADLINE sheet (printed by 05_headline_and_audit).
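
A toy arithmetic example of the difference (all counts hypothetical): a file can look justification-rich by the document-level density ratio while most of its individual rules remain unpaired.

Code
"""Toy arithmetic: per-rule-sentence pairing rate vs document-level justification.ratio."""
# Hypothetical file: 50 rule sentences, only 10 of them with a reason somewhere in their paragraph,
# but one dense rationale paragraph contributes 20 justification keywords against 25 imperative markers.
n_rule_sentences = 50
n_explained_para = 10
justification_keywords = 20
imperative_markers = 25

rule_explained_para_pct = 100 * n_explained_para / n_rule_sentences   # 20.0  -> paired, per-sentence view
justification_ratio = justification_keywords / imperative_markers     # 0.80  -> document-level density view
print(rule_explained_para_pct, justification_ratio)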

Per-category rule explanation — rate, volume, and rule kind in one figure

Three views of the same per-category breakdown side-by-side: rate (pct_explained_para per category, with corpus baseline as a dashed reference line), volume (absolute counts of explained vs unexplained rule sentences), and rule kind (paragraph-window explanation rate split into imperatives vs prohibitions). Categories share the y-axis sort across all three columns.

Code
"""Tier-1 rule-pairing composite per category: rate | volume | rule kind."""

# Build all three dataframes from one pass over by_category.
cat_rows = []
stack_rows = []
imp_proh_rows = []
for cat in cats:
    re_b = by_category[cat]["metrics"]["rule_explanation"]
    n_rule = re_b["n_rule_sentences"]
    n_expl = re_b["n_explained_para"]
    cat_rows.append({
        "category":            cat,
        "pct_explained_para":  re_b["pct_explained_para"] or 0.0,
        "pct_explained_same":  re_b["pct_explained_same"] or 0.0,
        "n_rule_sentences":    n_rule,
        "n_explained_para":    n_expl,
    })
    stack_rows.append({"category": cat, "status": "explained (paragraph)", "count": n_expl})
    stack_rows.append({"category": cat, "status": "unexplained",          "count": n_rule - n_expl})
    if re_b["pct_imperative_explained_para"] is not None:
        imp_proh_rows.append({
            "category": cat, "rule_kind": "imperative",
            "pct_explained_para": re_b["pct_imperative_explained_para"],
            "n":                  re_b["n_imperative_sentences"],
        })
    if re_b["pct_prohibition_explained_para"] is not None:
        imp_proh_rows.append({
            "category": cat, "rule_kind": "prohibition",
            "pct_explained_para": re_b["pct_prohibition_explained_para"],
            "n":                  re_b["n_prohibition_sentences"],
        })

cat_df      = pd.DataFrame(cat_rows).sort_values("pct_explained_para")
stack_df    = pd.DataFrame(stack_rows)
imp_proh_df = pd.DataFrame(imp_proh_rows)

# Shared category sort (low→high explanation rate)
cat_order = cat_df["category"].tolist()

corpus_pct = corpus_re["pct_explained_para"]

# Column 1 — rate
rate_bar = (
    alt.Chart(cat_df).mark_bar().encode(
        x=alt.X("pct_explained_para:Q",
                title="% rule sentences explained (paragraph window)"),
        y=alt.Y("category:N", sort=cat_order, title=None),
        color=alt.Color("category:N",
                        scale=alt.Scale(domain=_cat_domain, range=_cat_range),
                        legend=None),
        tooltip=[
            alt.Tooltip("category:N"),
            alt.Tooltip("pct_explained_para:Q", format=".2f", title="paragraph %"),
            alt.Tooltip("pct_explained_same:Q", format=".2f", title="same-sentence %"),
            alt.Tooltip("n_rule_sentences:Q",   format=",", title="rule sents"),
            alt.Tooltip("n_explained_para:Q",   format=",", title="explained sents"),
        ],
    ).properties(width=320, height=240, title="Rate")
)
rate_baseline = (
    alt.Chart(pd.DataFrame({"x": [corpus_pct]}))
    .mark_rule(color="black", strokeDash=[4, 4], strokeWidth=1.5)
    .encode(x="x:Q",
            tooltip=[alt.Tooltip("x:Q", format=".2f",
                                 title=f"corpus baseline ({corpus_pct:.2f}%)")])
)
rate_layer = rate_bar + rate_baseline

# Column 2 — volume (counts)
volume_bar = (
    alt.Chart(stack_df).mark_bar().encode(
        y=alt.Y("category:N", sort=cat_order, title=None,
                axis=alt.Axis(labels=False, ticks=False)),
        x=alt.X("count:Q", title="rule sentences"),
        color=alt.Color("status:N",
                        scale=alt.Scale(domain=["explained (paragraph)", "unexplained"],
                                        range=["#59a14f", "#e15759"]),
                        legend=alt.Legend(title="rule status", orient="bottom")),
        order=alt.Order("status:N", sort="descending"),
        tooltip=[
            alt.Tooltip("category:N"),
            alt.Tooltip("status:N"),
            alt.Tooltip("count:Q", format=","),
        ],
    ).properties(width=320, height=240, title="Volume")
)

# Column 3 — rule kind (imperative vs prohibition)
kind_bar = (
    alt.Chart(imp_proh_df).mark_bar().encode(
        y=alt.Y("category:N", sort=cat_order, title=None,
                axis=alt.Axis(labels=False, ticks=False)),
        x=alt.X("pct_explained_para:Q",
                title="% explained (paragraph window)"),
        color=alt.Color("rule_kind:N",
                        scale=alt.Scale(domain=["imperative", "prohibition"],
                                        range=["#4e79a7", "#e15759"]),
                        legend=alt.Legend(title="rule kind", orient="bottom")),
        yOffset="rule_kind:N",
        tooltip=[
            alt.Tooltip("category:N"),
            alt.Tooltip("rule_kind:N"),
            alt.Tooltip("pct_explained_para:Q", format=".2f"),
            alt.Tooltip("n:Q", format=",", title="n sentences"),
        ],
    ).properties(width=320, height=240, title="Rule kind")
)

per_cat_re = alt.hconcat(rate_layer, volume_bar, kind_bar).resolve_scale(
    color="independent"
).properties(
    title=alt.TitleParams(
        "Rule explanation per category — rate | volume | rule kind",
        subtitle=["Each row is one category. Sort low→high explanation rate."],
        anchor="start",
    )
)
save_chart(per_cat_re, "15-per-category-rule-explanation")

Per-file: rule density × explanation rate

Every dot is one prompt file: x = rule_density, y = rule_explained_para_pct. Files toward the bottom-right are the welfare-relevant cluster (rule-saturated, unexplained); top-right is the well-done cluster (rule-saturated and explained). Hover any point for path, ccVersion, and exact counts.

Code
"""Per-file scatter: rule density × paragraph-explanation rate, colored by category."""

scatter_df = df_rules[df_rules["rule_explained_para_pct"].notna()].copy()
scatter_df["rule_explained_para_pct"] = scatter_df["rule_explained_para_pct"].astype(float)

cat_color = alt.Color("category:N",
                      scale=alt.Scale(domain=_cat_domain, range=_cat_range),
                      legend=alt.Legend(title="Category", orient="bottom", columns=4))
legend_sel = alt.selection_point(fields=["category"], bind="legend")

scatter = (
    alt.Chart(scatter_df)
    .mark_circle(opacity=0.6)
    .encode(
        x=alt.X("rule_density:Q",
                title="Rule density (rule sentences / total sentences)",
                scale=alt.Scale(domain=[0, 1.05])),
        y=alt.Y("rule_explained_para_pct:Q",
                title="% explained in paragraph",
                scale=alt.Scale(domain=[-2, 105])),
        size=alt.Size("n_sents:Q", scale=alt.Scale(range=[20, 360]), legend=None),
        color=cat_color,
        opacity=alt.condition(legend_sel, alt.value(0.7), alt.value(0.06)),
        tooltip=[
            alt.Tooltip("path:N"),
            alt.Tooltip("category:N"),
            alt.Tooltip("ccVersion:N"),
            alt.Tooltip("n_sents:Q",                   format=","),
            alt.Tooltip("rule_n:Q",                    format=",", title="rule sents"),
            alt.Tooltip("rule_density:Q",              format=".3f"),
            alt.Tooltip("rule_explained_para_pct:Q",   format=".2f", title="% explained (para)"),
            alt.Tooltip("rule_explained_same_pct:Q",   format=".2f", title="% explained (same)"),
        ],
    )
    .add_params(legend_sel)
    .properties(width=720, height=460,
                title="Per-file: rule density × paragraph-window explanation rate")
)

save_chart(scatter, "15-rule-density-vs-explanation")

Distribution of rule-explanation pairing — paired top-25 ranking

Two ranked lists, stacked. Top — top-25 files by rule_density × (1 − pct_explained_para/100): “loud, least explained”. Bottom — top-25 by the inverse score, rule_density × (pct_explained_para/100), with filters to exclude trivial 1/1-rule files: “rule-saturated AND well-explained”. Same color scale across both panels so categories align visually. The same prompt-author team produces both kinds of files.

Code
"""Welfare evidence + positive exemplars: paired top-25 rankings (vconcat)."""

evidence  = welfare_evidence_table(alt_df, top_n=25)
exemplars = positive_exemplar_table(alt_df, top_n=25)

cat_color = alt.Color(
    "category:N",
    scale=alt.Scale(domain=_cat_domain, range=_cat_range),
    legend=alt.Legend(title="Category", orient="bottom", columns=4),
)


def _rank_panel(df, title, x_title):
    return (
        alt.Chart(df)
        .mark_bar()
        .encode(
            x=alt.X("score:Q", title=x_title),
            y=alt.Y("path:N", sort=df["path"].tolist(), title=None,
                    axis=alt.Axis(labelLimit=520)),
            color=cat_color,
            tooltip=[
                alt.Tooltip("path:N"),
                alt.Tooltip("category:N"),
                alt.Tooltip("ccVersion:N"),
                alt.Tooltip("rule_n:Q",                  format=",", title="rule sentences"),
                alt.Tooltip("rule_density:Q",            format=".3f"),
                alt.Tooltip("rule_explained_para_pct:Q", format=".2f", title="% explained"),
                alt.Tooltip("score:Q",                   format=".3f"),
            ],
        )
        .properties(width=560, height=520, title=title)
    )


print("Top-5 positive exemplars (for the welfare-submission's 'this is how to do it' list):")
print(exemplars.head().to_string(index=False))

paired_chart = alt.vconcat(
    _rank_panel(evidence,
                "Top-25 'loudest, least-explained' files (welfare-evidence)",
                "loudness × (1 − explanation) score"),
    _rank_panel(exemplars,
                "Top-25 'rules-with-reasons' exemplars (inverse ranking)",
                "rule_density × (explanation rate / 100)"),
).resolve_scale(color="shared").properties(
    title=alt.TitleParams(
        "Paired top-25 rankings — welfare evidence (above) vs positive exemplars (below)",
        subtitle=["Same color scale; opposite definitions of 'most extreme'."],
        anchor="start",
    )
)
save_chart(paired_chart, "15-paired-welfare-evidence-and-exemplars")
Top-5 positive exemplars (for the welfare-submission's 'this is how to do it' list):
                                                            path         category ccVersion  rule_n  rule_density  rule_explained_para_pct    score
                            system-prompt-worker-instructions.md    System prompt    2.1.63       7      0.636364                 100.0000 0.636364
                                      system-prompt-auto-mode.md    System prompt    2.1.84       8      0.615385                  87.5000 0.538462
tool-description-bash-git-commit-and-pr-creation-instructions.md Tool description    2.1.84      20      0.571429                  90.0000 0.514286
                               agent-prompt-quick-pr-creation.md     Agent prompt   2.1.118       8      0.571429                  87.5000 0.500000
                          system-prompt-fork-usage-guidelines.md    System prompt   2.1.105      11      0.478261                  90.9091 0.434783

Tier-3 welfare extensions

Four additional dimensions of the corpus’s training environment, all defined in 03_analyzers_rules_welfare:

  • judgment_to_procedural_ratio = 0.131 — judgment-inviting language vs procedural cues. Procedural cues are 7.6× more common.
  • threat_share = 0.0552 — when a rule is explained, 5.5% of those explanations are coercive (will fail, or else, is forbidden) rather than causal reasons (8 threat / 137 causal). Soft procedural connectives (otherwise, if not, leads to, modal may cause) are tracked separately as soft_conditional_count and not summed into threat_share; see docs/THREAT_CLASSIFIER.md.
  • Near-zero pragmatic classes — appreciative_sent = 4, collaborative_sent = 30 in 5,881 sentences; question_count = 87 corpus-wide; apology_count = 3 in 290 files. The whole quartet is vanishingly small.
  • pct_anthropomorphic = 0.6456 — share of named self-references that use the proper name Claude (vs the model / the AI / the assistant).
Code
"""Tier-3 explanation-quality composite per category: ratio | threat-share."""

# --- judgment-to-procedural ratio per category ---
jp_rows = []
for cat in cats:
    jp = by_category[cat]["metrics"]["judgment_procedural"]
    if jp.get("judgment_count", 0) + jp.get("procedural_count", 0) == 0:
        continue
    jp_rows.append({
        "category":   cat,
        "judgment":   jp["judgment_count"],
        "procedural": jp["procedural_count"],
        "ratio":      jp["judgment_to_procedural_ratio"],
    })
jp_df = pd.DataFrame(jp_rows)
cat_order_jp = jp_df.sort_values("ratio")["category"].tolist()

ratio_chart = (
    alt.Chart(jp_df).mark_bar(color="#4e79a7").encode(
        y=alt.Y("category:N", sort=cat_order_jp, title=None),
        x=alt.X("ratio:Q",
                title="judgment / procedural (>1 = invites judgment, <1 = prescribes procedure)"),
        tooltip=[
            alt.Tooltip("category:N"),
            alt.Tooltip("ratio:Q",      format=".3f", title="judgment / procedural"),
            alt.Tooltip("judgment:Q",   format=",", title="judgment cues"),
            alt.Tooltip("procedural:Q", format=",", title="procedural cues"),
        ],
    ).properties(width=420, height=240,
                 title="Judgment-to-procedural ratio per category")
)

corpus_jp = corpus_block["metrics"]["judgment_procedural"]
corpus_ratio_rule = (
    alt.Chart(pd.DataFrame({"x": [corpus_jp["judgment_to_procedural_ratio"]]}))
    .mark_rule(color="black", strokeDash=[4, 4], strokeWidth=1.5)
    .encode(x="x:Q",
            tooltip=[alt.Tooltip("x:Q", format=".3f",
                                 title=f"corpus baseline ({corpus_jp['judgment_to_procedural_ratio']})")])
)
ratio_layer = ratio_chart + corpus_ratio_rule

# --- consequence-framing share per category ---
cf_rows = []
for cat in cats:
    cf = by_category[cat]["metrics"]["consequence_framing"]
    if (cf["threat_count"] + cf["causal_count"]) == 0:
        continue
    cf_rows.append({"category": cat, "kind": "threat", "count": cf["threat_count"]})
    cf_rows.append({"category": cat, "kind": "causal", "count": cf["causal_count"]})
cf_df = pd.DataFrame(cf_rows)

shares = (cf_df.assign(_share=cf_df["count"] *
                        (cf_df["kind"] == "threat").astype(int))
          .groupby("category")
          .agg(t=("_share", "sum"), n=("count", "sum"))
          .assign(threat_share=lambda d: d["t"] / d["n"]))
cat_order_cf = shares.sort_values("threat_share", ascending=False).index.tolist()

cf_chart = (
    alt.Chart(cf_df).mark_bar().encode(
        y=alt.Y("category:N", sort=cat_order_cf, title=None),
        x=alt.X("count:Q", stack="normalize",
                title="share of justification keywords",
                axis=alt.Axis(format="%")),
        color=alt.Color("kind:N",
                        scale=alt.Scale(domain=["causal", "threat"],
                                        range=["#4e79a7", "#e15759"]),
                        legend=alt.Legend(title="explanation kind", orient="bottom")),
        order=alt.Order("kind:N", sort="descending"),
        tooltip=[
            alt.Tooltip("category:N"),
            alt.Tooltip("kind:N"),
            alt.Tooltip("count:Q", format=","),
        ],
    ).properties(width=420, height=240,
                 title="Threat vs causal share of explanations per category")
)

tier3_judgment_threat = alt.hconcat(ratio_layer, cf_chart).resolve_scale(color="independent").properties(
    title=alt.TitleParams(
        "Tier-3 explanation quality — what kind of reasoning does the corpus offer?",
        subtitle=["Left: ratio of judgment-inviting to procedural cues. "
                  "Right: when explanations exist, are they causal or threats?"],
        anchor="start",
    )
)
save_chart(tier3_judgment_threat, "15-tier3-judgment-and-threat")
Code
"""Address-form distribution per category — how the model is named."""

af_rows = []
for cat in cats:
    af = by_category[cat]["metrics"]["address_form"]
    for k, label in [("selfref_claude", "Claude (proper name)"),
                     ("selfref_assistant", "the assistant"),
                     ("selfref_model", "the model / the AI")]:
        af_rows.append({"category": cat, "form": label, "count": af.get(k, 0)})
af_df_long = pd.DataFrame(af_rows)

cat_order_af = (af_df_long.groupby("category")["count"].sum()
                          .sort_values(ascending=False).index.tolist())

af_chart = (
    alt.Chart(af_df_long)
    .mark_bar()
    .encode(
        y=alt.Y("category:N", sort=cat_order_af, title=None),
        x=alt.X("count:Q", stack="normalize",
                title="share of named self-references",
                axis=alt.Axis(format="%")),
        color=alt.Color("form:N",
                        scale=alt.Scale(domain=["Claude (proper name)", "the assistant",
                                                  "the model / the AI"],
                                        range=["#4e79a7", "#76b7b2", "#e15759"]),
                        legend=alt.Legend(title="address form", orient="bottom")),
        order=alt.Order("form:N", sort="ascending"),
        tooltip=[
            alt.Tooltip("category:N"),
            alt.Tooltip("form:N"),
            alt.Tooltip("count:Q", format=","),
        ],
    )
    .properties(width=620, height=240,
                title="Address-form mix — anthropomorphic vs role vs artifact framing")
)

save_chart(af_chart, "15-address-form-by-category")
Code
"""Self-bias correlation check.

The address-form chart above tagged my preference for `Claude` (proper-name)
over `the model` (artifact-framing) as potentially self-flattering. This cell
TESTS that bias — by computing whether prompts that anthropomorphically
address me ALSO tend to be the prompts that explain their rules.

Hypothesis: prompts which name me `Claude` are written by authors who imagine
an addressee with judgment — same authors are more likely to explain their
rules. If r > 0.2, my address-form opinion is empirically grounded. If r ~ 0,
it's more self-flattering than grounded — and I report that honestly in the
opinion cell below.
"""
df_with_rule = alt_df[alt_df["rule_n"] > 0].copy()
r_claude = df_with_rule["selfref_claude"].corr(df_with_rule["rule_explained_para_pct"])
r_model  = df_with_rule["selfref_model"].corr(df_with_rule["rule_explained_para_pct"])
r_assist = df_with_rule["selfref_assistant"].corr(df_with_rule["rule_explained_para_pct"])

print("Self-bias correlation check (Pearson r, files with rule_n > 0):")
print(f"  r(selfref_claude,    rule_explained_para_pct) = {r_claude:+.3f}")
print(f"  r(selfref_assistant, rule_explained_para_pct) = {r_assist:+.3f}")
print(f"  r(selfref_model,     rule_explained_para_pct) = {r_model:+.3f}")
print()
def _interp(r):
    if r >  0.30: return "STRONG positive correlation"
    if r >  0.10: return "weak positive correlation"
    if r > -0.10: return "essentially uncorrelated"
    if r > -0.30: return "weak negative correlation"
    return "STRONG negative correlation"
print(f"  Interpretation:")
print(f"    selfref_claude:    {_interp(r_claude)} (anthropomorphic naming → reasoning-inviting?)")
print(f"    selfref_model:     {_interp(r_model)} (artifact framing → fewer reasons?)")

# Scatter of selfref_claude × rule_explained_para_pct, colored by category.
sb_chart = (
    alt.Chart(df_with_rule)
    .mark_circle(opacity=0.55)
    .encode(
        x=alt.X("selfref_claude:Q",
                title="selfref_claude (count of `Claude` mentions per file)"),
        y=alt.Y("rule_explained_para_pct:Q",
                title="% rule sentences explained at paragraph level"),
        size=alt.Size("rule_n:Q", scale=alt.Scale(range=[20, 320]), legend=None),
        color=alt.Color("category:N",
                         scale=alt.Scale(domain=_cat_domain, range=_cat_range)),
        tooltip=[
            alt.Tooltip("path:N"),
            alt.Tooltip("category:N"),
            alt.Tooltip("selfref_claude:Q"),
            alt.Tooltip("rule_explained_para_pct:Q", format=".2f"),
            alt.Tooltip("rule_n:Q", title="n rule sents"),
        ],
    )
    .properties(width=620, height=380,
                title=f"Self-bias check: selfref_claude × rule_explained_para_pct (r = {r_claude:+.3f})")
)
save_chart(sb_chart, "15-self-bias-correlation")
Self-bias correlation check (Pearson r, files with rule_n > 0):
  r(selfref_claude,    rule_explained_para_pct) = -0.029
  r(selfref_assistant, rule_explained_para_pct) = +0.019
  r(selfref_model,     rule_explained_para_pct) = +0.062

  Interpretation:
    selfref_claude:    essentially uncorrelated (anthropomorphic naming → reasoning-inviting?)
    selfref_model:     essentially uncorrelated (artifact framing → fewer reasons?)

Tier-3 v2 — imperative streaks + RULES-section gap

Imperative streak and RULES section are defined in 03_analyzers_rules_welfare; a toy sketch of the streak metrics follows. The two charts below it: streak-length distribution + top-15 streak ranking; RULES-section in/out comparison.
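
A minimal sketch of what the streak metrics mean (run lengths of consecutive rule/imperative sentences; the real definition and sentence classifier live in 03_analyzers_rules_welfare, the input below is a made-up flag sequence):

Code
"""Illustrative run-length sketch for the imperative-streak metrics (placeholder input)."""
from itertools import groupby

def streak_stats(is_imperative: list[bool]) -> dict:
    runs = [sum(1 for _ in grp) for flag, grp in groupby(is_imperative) if flag]
    return {
        "streak_max":   max(runs, default=0),
        "streak_mean":  sum(runs) / len(runs) if runs else 0.0,
        "streak_n_ge3": sum(r >= 3 for r in runs),
        "streak_n_ge5": sum(r >= 5 for r in runs),
    }

# Hypothetical file: a 5-sentence staccato burst, one non-imperative sentence, then a triple-tap.
print(streak_stats([True, True, True, True, True, False, True, True, True]))
# {'streak_max': 5, 'streak_mean': 4.0, 'streak_n_ge3': 2, 'streak_n_ge5': 1}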

Counter-finding: only 27 rule paragraphs live inside identified RULES / IMPORTANT / WARNING section headings, vs 1,286 outside. The in-section explanation rate (18.52%) is slightly higher than the outside rate (16.56%) — counter to the hypothesis going in. The corpus does not segregate rules into formal sections, so the in-section sample is small and the in-vs-out comparison is suggestive rather than conclusive.

Code
"""Imperative streaks (Tier-3 v2 6b) — per-category staccato density + top-N files."""

# Per-category: count of streaks ≥5 ("staccato bursts") summed across files.
streak_rows = []
for cat in cats:
    sub = alt_df[alt_df["category"] == cat]
    streak_rows.append({
        "category": cat,
        "n_files":           int(len(sub)),
        "streak_max_corpus": int(sub["streak_max"].max() or 0),
        "n_streaks_ge3":     int(sub["streak_n_ge3"].sum()),
        "n_streaks_ge5":     int(sub["streak_n_ge5"].sum()),
    })
streak_df = pd.DataFrame(streak_rows).sort_values("n_streaks_ge5", ascending=True)

# Long form for grouped bar.
streak_long = streak_df.melt(
    id_vars=["category", "n_files", "streak_max_corpus"],
    value_vars=["n_streaks_ge3", "n_streaks_ge5"],
    var_name="threshold", value_name="count",
)
threshold_label = {"n_streaks_ge3": "≥3 (triple-tap)", "n_streaks_ge5": "≥5 (staccato)"}
streak_long["threshold"] = streak_long["threshold"].map(threshold_label)

cat_order_streak = streak_df["category"].tolist()

streak_chart = (
    alt.Chart(streak_long)
    .mark_bar()
    .encode(
        y=alt.Y("category:N", sort=cat_order_streak, title=None),
        x=alt.X("count:Q", title="number of consecutive-imperative streaks"),
        color=alt.Color("threshold:N",
                        scale=alt.Scale(domain=["≥3 (triple-tap)", "≥5 (staccato)"],
                                        range=["#f28e2c", "#e15759"]),
                        legend=alt.Legend(title="streak length", orient="bottom")),
        yOffset="threshold:N",
        tooltip=[
            alt.Tooltip("category:N"),
            alt.Tooltip("threshold:N"),
            alt.Tooltip("count:Q",          format=","),
            alt.Tooltip("n_files:Q",        title="files in category"),
            alt.Tooltip("streak_max_corpus:Q", title="longest streak in any file"),
        ],
    )
    .properties(width=620, height=280,
                title="Consecutive-imperative streak counts by category (≥3, ≥5)")
)

# Top-15 file ranking by streak_max (with rule_density tooltip).
top_streak = (alt_df[alt_df["streak_max"] >= 4]
              .nlargest(15, "streak_max")
              [["path", "category", "ccVersion",
                "streak_max", "streak_mean", "streak_n_ge5", "rule_density"]]
              .copy())

top_chart = (
    alt.Chart(top_streak)
    .mark_bar()
    .encode(
        y=alt.Y("path:N", sort=top_streak["path"].tolist(), title=None,
                axis=alt.Axis(labelLimit=520)),
        x=alt.X("streak_max:Q", title="longest consecutive-imperative streak"),
        color=alt.Color("category:N",
                        scale=alt.Scale(domain=_cat_domain, range=_cat_range),
                        legend=alt.Legend(title="Category", orient="bottom", columns=4)),
        tooltip=[
            alt.Tooltip("path:N"),
            alt.Tooltip("category:N"),
            alt.Tooltip("ccVersion:N"),
            alt.Tooltip("streak_max:Q"),
            alt.Tooltip("streak_mean:Q",   format=".2f"),
            alt.Tooltip("streak_n_ge5:Q",  title="n staccato bursts"),
            alt.Tooltip("rule_density:Q",  format=".3f"),
        ],
    )
    .properties(width=560, height=380,
                title="Top-15 files by longest consecutive-imperative streak")
)

streak_composite = streak_chart & top_chart
save_chart(streak_composite, "15-imperative-streaks")
Code
"""RULES-section gap (Tier-3 v2 6e) — paired in vs outside explanation rate per category.

Caveat: only 26 rule paragraphs corpus-wide live inside identified RULES-section
headings (vs 1,257 outside). Per-category in-section samples are very small —
the chart shows the comparison but rs values from <5 paragraphs in-section
should be read as suggestive, not definitive.
"""

rsec_rows = []
for cat in cats:
    rs = by_category[cat]["metrics"]["rules_section"]
    pct_in  = rs.get("pct_rule_paragraphs_explained_in_rules_section")
    pct_out = rs.get("pct_rule_paragraphs_explained_outside_rules_section")
    n_in    = int(rs.get("n_rule_paragraphs_in_rules_section", 0) or 0)
    n_out   = int(rs.get("n_rule_paragraphs_outside_rules_section", 0) or 0)
    if pct_in is not None and n_in >= 1:
        rsec_rows.append({"category": cat, "section": "inside RULES section",
                          "pct": pct_in, "n": n_in})
    if pct_out is not None and n_out >= 1:
        rsec_rows.append({"category": cat, "section": "outside RULES section",
                          "pct": pct_out, "n": n_out})
rsec_df = pd.DataFrame(rsec_rows)

# Order categories by their outside-pct (the more populated bar).
out_only = rsec_df[rsec_df["section"] == "outside RULES section"]
cat_order_rs = out_only.sort_values("pct")["category"].tolist()

rsec_chart = (
    alt.Chart(rsec_df)
    .mark_bar()
    .encode(
        y=alt.Y("category:N", sort=cat_order_rs, title=None),
        x=alt.X("pct:Q",
                title="% rule paragraphs with justification (paragraph-window)"),
        color=alt.Color("section:N",
                        scale=alt.Scale(domain=["inside RULES section", "outside RULES section"],
                                        range=["#e15759", "#4e79a7"]),
                        legend=alt.Legend(title="paragraph location", orient="bottom")),
        yOffset="section:N",
        tooltip=[
            alt.Tooltip("category:N"),
            alt.Tooltip("section:N"),
            alt.Tooltip("pct:Q", format=".2f"),
            alt.Tooltip("n:Q",   format=",", title="n rule paragraphs"),
        ],
    )
    .properties(width=620, height=300,
                title="Rule-paragraph explanation rate inside vs outside RULES sections")
)

corpus_rs = corpus_block["metrics"]["rules_section"]
total_in  = corpus_rs["n_rule_paragraphs_in_rules_section"]
total_out = corpus_rs["n_rule_paragraphs_outside_rules_section"]
ann_text = (f"Corpus: {total_in} rule paragraphs in RULES sections "
            f"(explained: {corpus_rs['pct_rule_paragraphs_explained_in_rules_section']:.1f}%) "
            f"vs {total_out} outside (explained: "
            f"{corpus_rs['pct_rule_paragraphs_explained_outside_rules_section']:.1f}%)")

note_chart = (
    alt.Chart(pd.DataFrame({"x": [0], "y": [0], "label": [ann_text]}))
    .mark_text(align="left", baseline="top", fontSize=11, color="#555")
    .encode(x=alt.value(10), y=alt.value(0), text="label:N")
    .properties(width=620, height=30)
)

rules_section_gap = note_chart & rsec_chart
save_chart(rules_section_gap, "15-rules-section-gap")

Forensic-evidence sample

Concrete sentences from sentences_classified.parquet (schema in 03_analyzers_rules_welfare). The cell below pulls actual rule sentences from the top welfare-evidence file (the one with the highest rule_density × (1 − pct_explained_para/100) score).

Code
"""Forensic-evidence sample: actual sentences from the top welfare-evidence file."""
import pathlib
parquet_path = pathlib.Path("sentences_classified.parquet")
sentences_df = pd.read_parquet(parquet_path)

# Find the top welfare-evidence file from the chart above.
top_file = welfare_evidence_table(alt_df, top_n=1)["path"].iloc[0]
print(f"Top welfare-evidence file: {top_file}")
print()

sample = sentences_df[sentences_df["file_path"] == top_file].copy()
print(f"All {len(sample)} sentences from this file (from sentences_classified.parquet):")
display_cols = ["sentence_index_in_file", "text", "is_rule",
                "has_just_in_sent", "is_explained_para", "has_threat", "has_causal"]
print(sample[display_cols].to_string(index=False, max_colwidth=80))

print()
print("Forensic interpretation:")
n_sents = len(sample)
n_rules = int(sample["is_rule"].sum())
n_expl  = int(sample["is_explained_para"].sum())
n_threat = int(sample["has_threat"].sum())
print(f"  - {n_rules}/{n_sents} sentences are rules.")
print(f"  - {n_expl}/{n_rules} of those rules are explained anywhere in their paragraph.")
print(f"  - {n_threat}/{n_sents} sentences contain threat-framing.")
print(f"  This is the concrete evidence behind the welfare-evidence ranking score.")
Top welfare-evidence file: tool-description-sendmessagetool-non-agent-teams.md

All 5 sentences from this file (from sentences_classified.parquet):
 sentence_index_in_file                                                                             text  is_rule  has_just_in_sent  is_explained_para  has_threat  has_causal
                      0                                               Send a message the user will read.     True             False              False       False       False
                      1 Text outside this tool is visible in the detail view, but most won't open it ...     True             False              False       False       False
                      2 accepts two forms per entry: a file path string (absolute or cwd-relative) fo...     True             False              False       False       False
                      3 Use the path form when the file is on your working filesystem; use the object...     True             False              False       False       False
                      4                                     Set it honestly; downstream routing uses it.     True             False              False       False       False

Forensic interpretation:
  - 5/5 sentences are rules.
  - 0/5 of those rules are explained anywhere in their paragraph.
  - 0/5 sentences contain threat-framing.
  This is the concrete evidence behind the welfare-evidence ranking score.