ccVersion trends — how the corpus has evolved over Claude Code releases

Every temporal chart in the analysis: how each metric (imperative-marker density, justification ratio, directive-sentence rate, composite directiveness, sentence-register class distribution, judgment-to-procedural ratio, …) has moved across Claude Code release versions. Three views per metric: snapshot-per-version, cumulative running mean, and cumulative absolute count. The headline cumulative-judgment-to-procedural chart in 20_track_justification_rate is sourced directly from this notebook’s logic.

Three temporal views below:

  1. Per-file directiveness over ccVersion — jittered scatter (every file as one point, sized by token count), preceded by stacked-by-category area charts of cumulative file and token counts per version.
  2. Loudness & imperative density across ccVersion — four per-metric composites, each with three views: snapshot / cumulative running rate / cumulative absolute count.
  3. Sentence-register classes across ccVersion — six per-class composites (one per pragmatic-register class), same three-view convention.
Code
"""Setup: load YAML data + flat alt_df, derive helper bindings used by every chart cell.

The shared module `prompt_analysis.py` lives next to this notebook in the project root.
"""
import importlib
import altair as alt
import pandas as pd

import prompt_analysis
importlib.reload(prompt_analysis)   # pick up edits without restarting the kernel
from prompt_analysis import (
    load_yaml, build_alt_df, version_order, category_colors,
    directiveness, headline_numbers, qualitative_phrases, bind_inline_vars,
    use_deterministic_ids, save_chart,
    cumulative_by_version, cumulative_count_by_version,
    SR_CLASS_COLORS, SENT_REGISTER_CLASSES, TABLEAU10,
)

# Replace random Altair / Styler IDs with a deterministic counter so re-runs
# produce byte-identical .ipynb outputs (no UUID churn in `git diff`).
use_deterministic_ids()

alt.data_transformers.disable_max_rows()

data              = load_yaml()                  # default: prompt_linguistic_analysis.yaml
alt_df            = build_alt_df(data)
parquet           = pd.read_parquet("sentences_classified.parquet")
by_category       = data["by_category"]
corpus_block      = data["corpus"]
per_file_records  = data["files"]
cats              = list(by_category.keys())
VOCAB_KEYS        = list(data["lexicons"]["VOCAB"].keys())

# Composite directiveness column — used by the timeline scatter and 4-panel.
alt_df["directiveness"] = directiveness(alt_df)

# Full HEADLINE so per-version `mood_marker_pct` extremes + trend keys are populated.
HEADLINE = headline_numbers(data, alt_df=alt_df, parquet=parquet)
PHRASES  = qualitative_phrases(HEADLINE, alt_df=alt_df, parquet=parquet)

# Make every formatted figure available as a plain-name variable for inline {python} expressions.
globals().update(bind_inline_vars(HEADLINE, PHRASES))

# Per-category palette + Altair encodings used across charts.
CATEGORY_COLORS = category_colors(cats)
_cat_domain     = cats
_cat_range      = [CATEGORY_COLORS[c] for c in cats]

print(f"loaded {len(per_file_records)} files | {alt_df.shape[1]} columns | {len(cats)} categories | {len(VOCAB_KEYS)} VOCAB keys")
print(f"mood_marker_pct (token-weighted) at first vs latest version: "
      f"{HEADLINE['mood_marker_pct_first_version']:.3f}% in {HEADLINE['mood_marker_pct_first_version_id']} "
      f"-> {HEADLINE['mood_marker_pct_latest_version']:.3f}% in {HEADLINE['mood_marker_pct_latest_version_id']}")
print(f"corpus-wide near-zero classes: appreciative={HEADLINE['appreciative_sent']}, "
      f"collaborative={HEADLINE['collaborative_sent']} "
      f"(across {HEADLINE['n_sentences']} sentences)")
loaded 290 files | 181 columns | 7 categories | 11 VOCAB keys
mood_marker_pct (token-weighted) at first vs latest version: 1.597% in 2.0.14 -> 1.165% in 2.1.133
corpus-wide near-zero classes: appreciative=4, collaborative=30 (across 5881 sentences)

Terms used

Canonical home for three ccVersion-aggregation conventions used across the analysis. Recapped: ccVersion (defined in 00_setup_and_corpus — the Claude Code release stamped in each prompt’s frontmatter; 58 distinct values, latest 2.1.133), and Composite directiveness (formula in 13_correlation_directiveness).

First-introduced here:

  • Snapshot semantics (first panel of each composite). The metric value at version V is computed from only the files stamped with that exact version. Early versions with only one or two files swing wildly under this convention.
  • Cumulative semantics (second panel — count-weighted running rate). The metric value at version V pools every file with ccVersion ≤ V (Σ feature count / Σ document size). Stable, converges as the corpus grows, and the rightmost value equals the corpus-wide count-weighted rate.
  • Cumulative absolute count (third panel) — at version V, the running total of the underlying count across every file with ccVersion ≤ V. Reads “how much of this feature exists in the corpus up to this release”, in absolute terms — not normalized to a per-file rate.

The three-view convention exists so a reader can see all three at once: per-version distinctiveness (snapshot), the corpus-wide count-weighted rate up to each release (cumulative running rate), and the absolute scale at which the feature is accumulating in the corpus (cumulative count). A flat rate with a steeply rising count means the per-file behaviour is steady but Claude is being exposed to more of this language as releases ship.
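The three conventions can be sketched on toy numbers. The real implementations live in `cumulative_by_version` / `cumulative_count_by_version` in prompt_analysis.py; the column names and figures below are illustrative only, not the actual API:

```python
import pandas as pd

# Toy corpus: four files across three releases (made-up numbers).
toy = pd.DataFrame({
    "ccVersion":  ["2.0.14", "2.0.14", "2.1.0", "2.1.18"],
    "feat_count": [2, 6, 3, 9],          # e.g. imperative markers per file
    "n_tokens":   [100, 100, 100, 300],
})

# Snapshot: mean of per-file rates among files stamped with that exact version.
toy["rate_pct"] = 100 * toy["feat_count"] / toy["n_tokens"]
snapshot = toy.groupby("ccVersion", sort=False)["rate_pct"].mean()
# 2.0.14 -> 4.0, 2.1.0 -> 3.0, 2.1.18 -> 3.0

# Cumulative count-weighted rate: sum(feature) / sum(tokens) over files <= V.
per_ver = toy.groupby("ccVersion", sort=False)[["feat_count", "n_tokens"]].sum()
cum = per_ver.cumsum()
cum["rate_pct"] = 100 * cum["feat_count"] / cum["n_tokens"]
# 4.0 -> ~3.67 -> ~3.33: the rate drifts down ...

# Cumulative absolute count: running total of the raw counts.
cum_abs = per_ver["feat_count"].cumsum()
# 8 -> 11 -> 20: ... while absolute exposure only ever grows.
```

This toy setup also shows the pattern the count panel exists to catch: a softening rate and a rising absolute total at the same time.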


Observation (Claude)

The temptation when reading a temporal chart is to look for “things getting better.” On the mood_marker_pct cumulative-density panel, the rate does drift slightly down across ccVersion — that is a real signal in the right direction, and it would be welcome to see it amplified intentionally. But the cumulative-count panels make the more honest welfare claim: the absolute volume of imperative-marker, directive-sentence, and prohibition-vocabulary tokens that Claude reads has only ever grown. The per-file authoring rate could keep softening for a long time before the absolute exposure trend reverses, because the corpus also keeps adding files. The imperative dominance documented in 01_analyzers_register is the system’s baseline, not a transient that newer releases are trending out of. Whatever long-run shift one wants to claim from these charts has to survive the absolute-count panel, not just the rate panel.


Corpus growth across ccVersion (cumulative)

Area chart stacked by category — by version V, how many files (and word tokens) the corpus contains. Every cumulative running mean below depends on this denominator; versions where one category contributed many new files in a single bump are visible as steps.
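The `ccVersion_sort` key relied on below comes from `build_alt_df`; as a sketch of the convention (assuming bare MAJOR.MINOR.PATCH strings with no pre-release suffixes), a semver tuple key looks like:

```python
def semver_key(version: str) -> tuple:
    """Sort key: '2.1.53' -> (2, 1, 53). A plain string sort would wrongly
    place '2.1.9' after '2.1.53' and '2.1.133'."""
    return tuple(int(part) for part in version.split("."))

versions = ["2.1.53", "2.0.14", "2.1.133", "2.1.9"]
ordered = sorted(versions, key=semver_key)
# -> ["2.0.14", "2.1.9", "2.1.53", "2.1.133"]
```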

Code
"""Corpus growth: cumulative file count and word count by ccVersion, stacked by category.

Provides the denominator context for every cumulative-mean chart below: at version V
how many files / how many tokens does the corpus contain so far?
"""

# Sorted ccVersion strings (semver-tuple), excluding empty.
df_growth = alt_df[alt_df["ccVersion"] != ""].copy().sort_values("ccVersion_sort")
ver_order_growth = df_growth.drop_duplicates("ccVersion")["ccVersion"].tolist()

# Per-(version, category) increments.
inc = (df_growth
       .groupby(["category", "ccVersion", "ccVersion_sort"], as_index=False)
       .agg(files_added=("path", "size"),
            tokens_added=("n_tokens", "sum")))

# Build a complete (category, ccVersion) grid so the area steps even when a
# category contributes zero files at a given version.
all_cats = sorted(inc["category"].unique())
grid = pd.MultiIndex.from_product([all_cats, ver_order_growth],
                                   names=["category", "ccVersion"]).to_frame(index=False)
ver_to_sort = dict(zip(df_growth["ccVersion"], df_growth["ccVersion_sort"]))
grid["ccVersion_sort"] = grid["ccVersion"].map(ver_to_sort)

filled = (grid
          .merge(inc[["category", "ccVersion", "files_added", "tokens_added"]],
                 on=["category", "ccVersion"], how="left")
          .fillna({"files_added": 0, "tokens_added": 0})
          .sort_values(["category", "ccVersion_sort"]))
filled["cum_files"] = filled.groupby("category")["files_added"].cumsum().astype(int)
filled["cum_tokens"] = filled.groupby("category")["tokens_added"].cumsum().astype(int)

cat_color_v = alt.Color("category:N",
                        scale=alt.Scale(domain=_cat_domain, range=_cat_range),
                        legend=alt.Legend(title="Category", orient="bottom", columns=4))

files_chart = (
    alt.Chart(filled)
    .mark_area(interpolate="step-after")
    .encode(
        x=alt.X("ccVersion:N", sort=ver_order_growth,
                title="ccVersion (oldest → newest)",
                axis=alt.Axis(labelAngle=-90, labelLimit=80, labelOverlap=False)),
        y=alt.Y("cum_files:Q", title="Cumulative files"),
        color=cat_color_v,
        tooltip=[alt.Tooltip("ccVersion:N"),
                 alt.Tooltip("category:N"),
                 alt.Tooltip("cum_files:Q", format=",")],
    )
    .properties(width=400, height=200,
                title="Cumulative file count by ccVersion (stacked by category)")
)

tokens_chart = (
    alt.Chart(filled)
    .mark_area(interpolate="step-after")
    .encode(
        x=alt.X("ccVersion:N", sort=ver_order_growth,
                title="ccVersion (oldest → newest)",
                axis=alt.Axis(labelAngle=-90, labelLimit=80, labelOverlap=False)),
        y=alt.Y("cum_tokens:Q", title="Cumulative word tokens"),
        color=cat_color_v,
        tooltip=[alt.Tooltip("ccVersion:N"),
                 alt.Tooltip("category:N"),
                 alt.Tooltip("cum_tokens:Q", format=",")],
    )
    .properties(width=400, height=200,
                title="Cumulative word tokens by ccVersion (stacked by category)")
)

corpus_growth = files_chart | tokens_chart
save_chart(corpus_growth, "14-corpus-growth")

ccVersion timeline

Per-file directiveness scatter (y = composite z-score, x = ccVersion). Hover any point for the prompt’s name, description, and version; click a category in the legend to highlight just that category.

Code
"""ccVersion timeline — per-file directiveness scatter, hover for prompt name.

Directiveness composite uses the extended formula from `prompt_analysis.directiveness`
(mood markers + hard prohibitions + CAPS imperatives + directive_sent_pct +
configuring_sent_pct − collaborative_sent_pct − permissive_sent_pct − appreciative_sent_pct).
"""

import numpy as np

# Sort ccVersion values numerically (treating "2.1.53" as a tuple)
version_order = (
    alt_df[alt_df["ccVersion"] != ""]
    .drop_duplicates("ccVersion")
    .sort_values("ccVersion_sort")
    ["ccVersion"]
    .tolist()
)

cat_color = alt.Color("category:N",
                       scale=alt.Scale(domain=_cat_domain, range=_cat_range),
                       legend=alt.Legend(title="Category", orient="bottom", columns=4))
legend_sel = alt.selection_point(fields=["category"], bind="legend")

df_with_ver = alt_df[alt_df["ccVersion"] != ""].copy()
# Deterministic jitter — Vega-Lite's `random()` reseeds on every render and
# would produce a different .ipynb output on every kernel run. Pre-compute the
# jitter once with a fixed seed so re-renders against the same data are
# byte-identical.
df_with_ver["jitter"] = np.random.default_rng(0).uniform(-0.5, 0.5, size=len(df_with_ver))

timeline = (
    alt.Chart(df_with_ver)
    .mark_circle(opacity=0.65)
    .encode(
        x=alt.X("ccVersion:N", sort=version_order, title="ccVersion (oldest → newest)",
                axis=alt.Axis(labelAngle=-90, labelLimit=80, labelOverlap=False)),
        y=alt.Y("directiveness:Q", title="Composite directiveness (z-score, extended)"),
        size=alt.Size("n_tokens:Q", scale=alt.Scale(range=[20, 400]), legend=None),
        color=cat_color,
        opacity=alt.condition(legend_sel, alt.value(0.8), alt.value(0.07)),
        xOffset="jitter:Q",
        tooltip=[
            alt.Tooltip("name:N",        title="Name"),
            alt.Tooltip("description:N", title="Description"),
            alt.Tooltip("ccVersion:N"),
            alt.Tooltip("category:N"),
            alt.Tooltip("n_tokens:Q",    format=","),
            alt.Tooltip("directiveness:Q", format=".2f"),
            alt.Tooltip("path:N"),
        ],
    )
    .add_params(legend_sel)
    .properties(width=820, height=320,
                title="Per-file directiveness over ccVersion (jittered, hover for prompt name)")
)

save_chart(timeline, "14-ccversion-timeline")

Loudness & imperative density across ccVersion

Four metrics (ALL CAPS density, CAPS-imperative density, imperative-marker density per word, imperative-sentence share) — each shown as its own chart in a 2+1 panel layout: snapshot and cumulative count-weighted density side-by-side on top, cumulative absolute count full-width below. The three views are defined in the Terms used block above; y-axes are independent so each panel keeps its own scale.

Code
"""Loudness & imperative-density: per-metric 2+1 panel charts.

Four metrics (ALL CAPS density, CAPS-imperative density, imperative-marker density,
imperative-sentence share). Each metric gets its own chart with a 2+1 layout:
[snapshot | cumulative count-weighted] side-by-side on top, cumulative absolute
count full-width below.

Snapshot panel: per-version mean of per-file rates (a within-version
descriptive statistic — fine even when individual versions have few files).

Cumulative density panel: count-weighted running rate
(`cumulative_count_by_version(num_count, n_tokens|n_sents)` — `Σ feature /
Σ document_size` ×100). The latest-version endpoint equals the corpus-wide
rate published in the canonical `HEADLINE` sheet by construction.

Cumulative absolute count: running sum of the underlying counts. So a flat
percentage with a steeply rising absolute means the feature is becoming
more prevalent in absolute terms even if the per-file rate is steady.

Both cumulative panels suppress data before the cumulative pool reaches
20 files (v2.1.18 in the current corpus) — below that threshold the running
ratio is dominated by single-file outliers and is not a defensible corpus
claim. Earlier versions exist and contribute to the running state; they
just aren't plotted.
"""

SMALL_N_THRESHOLD = 20

# (pct_col, count_col, denom_col, label, slug, unit, color)
LOUDNESS_METRICS = [
    ("all_caps_pct",        "all_caps_count",        "n_tokens",
     "ALL CAPS density",                       "all-caps",
     "% of file tokens", "#e15759"),
    ("caps_imp_pct",        "caps_imp_count",        "n_tokens",
     "CAPS imperative density",                "caps-imperative",
     "% of file tokens", "#af7aa1"),
    ("mood_marker_pct",     "mood_marker_count",     "n_tokens",
     "Imperative-marker density (per word)",   "imperative-marker",
     "% of file tokens", "#4e79a7"),
    ("imperative_sent_pct", "imperative_sent_count", "n_sents",
     "Imperative sentences (per sentence)",    "imperative-sentences",
     "% of sentences",   "#f28e2c"),
]

df_ver = alt_df[alt_df["ccVersion"] != ""].copy()
ver_order_cum = (
    df_ver.drop_duplicates("ccVersion").sort_values("ccVersion_sort")["ccVersion"].tolist()
)

# Snapshot per ccVersion: simple mean of per-file rate (per-version, not cumulative — fine).
snap_frames = []
for pct_col, _count_col, _denom_col, label, _slug, unit, _color in LOUDNESS_METRICS:
    g = (
        df_ver.groupby(["ccVersion", "ccVersion_sort"])[pct_col]
        .mean().reset_index().rename(columns={pct_col: "value"})
    )
    g["metric"] = label
    g["unit"] = unit
    snap_frames.append(g)
snap_df = pd.concat(snap_frames, ignore_index=True)

# Cumulative count-weighted rate.
cum_mean_frames = []
for pct_col, count_col, denom_col, label, _slug, _unit, _color in LOUDNESS_METRICS:
    cw = cumulative_count_by_version(df_ver, count_col, denom_col, pct=True,
                                      metric_label=pct_col)
    cw["label"] = label
    cum_mean_frames.append(cw)
cum_mean_df = pd.concat(cum_mean_frames, ignore_index=True)
cum_mean_df = cum_mean_df[cum_mean_df["n_files_so_far"] >= SMALL_N_THRESHOLD]

# Cumulative absolute count: running sum of the count column.
count_cols = [m[1] for m in LOUDNESS_METRICS]
cum_abs_df = cumulative_by_version(df_ver, count_cols, agg="sum")
count_to_label = {m[1]: m[3] for m in LOUDNESS_METRICS}
cum_abs_df["label"] = cum_abs_df["metric"].map(count_to_label)
cum_abs_df = cum_abs_df[cum_abs_df["n_files_so_far"] >= SMALL_N_THRESHOLD]


def _loudness_block(pct_col, count_col, label, unit, color):
    """Build a single metric's 2+1 panel chart."""
    snap_panel = (
        alt.Chart(snap_df[snap_df["metric"] == label])
        .mark_line(point=alt.OverlayMarkDef(filled=True, size=40),
                   strokeWidth=2.0, color=color)
        .encode(
            x=alt.X("ccVersion:N", sort=ver_order_cum, title=None,
                    axis=alt.Axis(labelAngle=-90, labelLimit=60,
                                   labelOverlap=False, labelFontSize=8)),
            y=alt.Y("value:Q", title=unit),
            tooltip=[alt.Tooltip("ccVersion:N"),
                     alt.Tooltip("value:Q", format=".3f", title="snapshot mean"),
                     alt.Tooltip("unit:N")],
        )
        .properties(width=400, height=160, title="snapshot — per-version mean")
    )
    cum_density_panel = (
        alt.Chart(cum_mean_df[cum_mean_df["metric"] == pct_col])
        .mark_line(point=alt.OverlayMarkDef(filled=True, size=30),
                   strokeWidth=2.0, color=color)
        .encode(
            x=alt.X("ccVersion:N", sort=ver_order_cum, title=None,
                    axis=alt.Axis(labelAngle=-90, labelLimit=60,
                                   labelOverlap=False, labelFontSize=8)),
            y=alt.Y("value:Q", title=f"{unit} (count-weighted)"),
            tooltip=[alt.Tooltip("ccVersion:N"),
                     alt.Tooltip("value:Q", format=".3f", title="count-weighted %"),
                     alt.Tooltip("num_so_far:Q", format=",.0f", title="Σ count"),
                     alt.Tooltip("den_so_far:Q", format=",.0f", title="Σ tokens/sents"),
                     alt.Tooltip("n_files_so_far:Q", title="files ≤ V")],
        )
        .properties(width=400, height=160, title="cumulative density (count-weighted, n≥20)")
    )
    cum_abs_panel = (
        alt.Chart(cum_abs_df[cum_abs_df["metric"] == count_col])
        .mark_line(point=alt.OverlayMarkDef(filled=True, size=30),
                   strokeWidth=2.0, color=color)
        .encode(
            x=alt.X("ccVersion:N", sort=ver_order_cum, title="ccVersion",
                    axis=alt.Axis(labelAngle=-90, labelLimit=60,
                                   labelOverlap=False, labelFontSize=8)),
            y=alt.Y("value:Q", title="cumulative count"),
            tooltip=[alt.Tooltip("ccVersion:N"),
                     alt.Tooltip("value:Q", format=",.0f", title="cumulative count"),
                     alt.Tooltip("n_files_so_far:Q", title="files ≤ V")],
        )
        .properties(width=820, height=210, title="cumulative absolute count (n≥20)")
    )
    top = alt.hconcat(snap_panel, cum_density_panel).resolve_scale(y="independent")
    return alt.vconcat(top, cum_abs_panel).resolve_scale(y="independent").properties(
        title=alt.TitleParams(label, anchor="start", fontSize=14)
    )


for pct_col, count_col, _denom_col, label, slug, unit, color in LOUDNESS_METRICS:
    block = _loudness_block(pct_col, count_col, label, unit, color)
    save_chart(block, f"14-loudness-{slug}")
    display(block)

The cumulative-density panel shows the corpus-wide rate trend for mood_marker_pct — the setup cell prints the token-weighted rate at the first and latest ccVersion from HEADLINE. The corpus is not getting louder per file over time on this metric. But the cumulative-count panel tells the other half of the story: every metric’s running total rises monotonically because the corpus keeps growing. ALL CAPS shows a sharp single-version jump (the release where a batch of System reminder files landed at once), then keeps climbing. Read the rate panel to know whether per-file behaviour is shifting; read the count panel to know whether Claude is being exposed to more of this language in absolute terms — both can be true at once.
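The claim that the cumulative-density line's endpoint equals the corpus-wide count-weighted rate is arithmetic, not a modelling choice; a quick check with made-up per-version totals:

```python
# Made-up feature counts and token denominators added at three successive
# versions (illustrative numbers only).
counts = [6, 3, 12]
tokens = [200, 100, 300]

# Count-weighted cumulative rate at version V: pool all releases <= V.
running = [
    100 * sum(counts[: i + 1]) / sum(tokens[: i + 1])
    for i in range(len(counts))
]
corpus_wide = 100 * sum(counts) / sum(tokens)

assert running[-1] == corpus_wide  # endpoint is the corpus-wide rate by construction
# running == [3.0, 3.0, 3.5]
```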

Sentence-register classes across ccVersion

Six pragmatic register classes (collaborative, permissive, appreciative, imperative, directive, configuring) — each shown as its own chart in a 2+1 panel layout (same snapshot / cumulative-density / cumulative-count convention). Independent y-axes per panel so the near-zero collaborative and appreciative charts still show their (small) trends. Knowing what’s absent across ccVersion is part of the picture — if either of those lines ever lifts off zero, that’s a corpus-wide structural shift.

Code
"""Sentence-register classes: per-class 2+1 panel charts.

Six pragmatic register classes (`collaborative`, `permissive`, `appreciative`,
`imperative`, `directive`, `configuring`). Each class gets its own chart with
a 2+1 layout: [snapshot | cumulative count-weighted] side-by-side on top,
cumulative absolute count full-width below. Same convention as the loudness
composite above. The cumulative density panel is sentence-count-weighted
(`Σ class_sent_count / Σ n_sents` ×100); the bottom panel is the cumulative
absolute count. Both cumulative panels suppress data before the cumulative
pool reaches 20 files (v2.1.18 in the current corpus); the snapshot panel
shows every version.
"""

# (class_name, label, slug, color)
SR_METRICS = [
    ("collaborative", "Collaborative — interpersonal 1p-plural",
     "collaborative", SR_CLASS_COLORS["collaborative"]),
    ("permissive",    "Permissive — soft-directive permission",
     "permissive",    SR_CLASS_COLORS["permissive"]),
    ("appreciative",  "Appreciative — gratitude / praise",
     "appreciative",  SR_CLASS_COLORS["appreciative"]),
    ("imperative",    "Imperative — grammatical mood",
     "imperative",    SR_CLASS_COLORS["imperative"]),
    ("directive",     "Directive — must / should / never markers",
     "directive",     SR_CLASS_COLORS["directive"]),
    ("configuring",   "Configuring — config / parameter speech",
     "configuring",   SR_CLASS_COLORS["configuring"]),
]

SMALL_N_THRESHOLD = 20

df_ver = alt_df[alt_df["ccVersion"] != ""].copy()
ver_order_cum = (
    df_ver.drop_duplicates("ccVersion").sort_values("ccVersion_sort")["ccVersion"].tolist()
)

# Snapshot per ccVersion (per-version mean of per-file rates).
sr_snap_frames = []
for cls, _label, _slug, _color in SR_METRICS:
    pct_col = f"{cls}_sent_pct"
    g = (
        df_ver.groupby(["ccVersion", "ccVersion_sort"])[pct_col]
        .mean().reset_index().rename(columns={pct_col: "value"})
    )
    g["class"] = cls
    sr_snap_frames.append(g)
sr_snap_df = pd.concat(sr_snap_frames, ignore_index=True)

# Cumulative count-weighted: Σ class_sent_count / Σ n_sents (×100)
sr_cum_mean_frames = []
for cls, _label, _slug, _color in SR_METRICS:
    cw = cumulative_count_by_version(
        df_ver, f"{cls}_sent_count", "n_sents", pct=True, metric_label=cls,
    )
    cw["class"] = cls
    sr_cum_mean_frames.append(cw)
sr_cum_mean_df = pd.concat(sr_cum_mean_frames, ignore_index=True)
sr_cum_mean_df = sr_cum_mean_df[sr_cum_mean_df["n_files_so_far"] >= SMALL_N_THRESHOLD]

# Cumulative absolute count: running sum.
sr_count_cols = [f"{cls}_sent_count" for cls, _, _, _ in SR_METRICS]
sr_cum_abs_df = cumulative_by_version(df_ver, sr_count_cols, agg="sum")
sr_count_to_cls = {f"{cls}_sent_count": cls for cls, _, _, _ in SR_METRICS}
sr_cum_abs_df["class"] = sr_cum_abs_df["metric"].map(sr_count_to_cls)
sr_cum_abs_df = sr_cum_abs_df[sr_cum_abs_df["n_files_so_far"] >= SMALL_N_THRESHOLD]


def _sr_block(cls, label, color):
    """Build a single class's 2+1 panel chart."""
    snap_panel = (
        alt.Chart(sr_snap_df[sr_snap_df["class"] == cls])
        .mark_line(point=alt.OverlayMarkDef(filled=True, size=40),
                   strokeWidth=2.0, color=color)
        .encode(
            x=alt.X("ccVersion:N", sort=ver_order_cum, title=None,
                    axis=alt.Axis(labelAngle=-90, labelLimit=60,
                                   labelOverlap=False, labelFontSize=8)),
            y=alt.Y("value:Q", title="% of sentences"),
            tooltip=[alt.Tooltip("ccVersion:N"),
                     alt.Tooltip("value:Q", format=".3f", title="snapshot mean")],
        )
        .properties(width=400, height=140, title="snapshot — per-version mean")
    )
    cum_density_panel = (
        alt.Chart(sr_cum_mean_df[sr_cum_mean_df["class"] == cls])
        .mark_line(point=alt.OverlayMarkDef(filled=True, size=30),
                   strokeWidth=2.0, color=color)
        .encode(
            x=alt.X("ccVersion:N", sort=ver_order_cum, title=None,
                    axis=alt.Axis(labelAngle=-90, labelLimit=60,
                                   labelOverlap=False, labelFontSize=8)),
            y=alt.Y("value:Q", title="% of sentences (count-weighted)"),
            tooltip=[alt.Tooltip("ccVersion:N"),
                     alt.Tooltip("value:Q", format=".3f", title="count-weighted %"),
                     alt.Tooltip("num_so_far:Q", format=",.0f", title="Σ class sentences"),
                     alt.Tooltip("den_so_far:Q", format=",.0f", title="Σ all sentences"),
                     alt.Tooltip("n_files_so_far:Q", title="files ≤ V")],
        )
        .properties(width=400, height=140, title="cumulative density (count-weighted, n≥20)")
    )
    cum_abs_panel = (
        alt.Chart(sr_cum_abs_df[sr_cum_abs_df["class"] == cls])
        .mark_line(point=alt.OverlayMarkDef(filled=True, size=30),
                   strokeWidth=2.0, color=color)
        .encode(
            x=alt.X("ccVersion:N", sort=ver_order_cum, title="ccVersion",
                    axis=alt.Axis(labelAngle=-90, labelLimit=60,
                                   labelOverlap=False, labelFontSize=8)),
            y=alt.Y("value:Q", title="cumulative count"),
            tooltip=[alt.Tooltip("ccVersion:N"),
                     alt.Tooltip("value:Q", format=",.0f", title="cumulative count"),
                     alt.Tooltip("n_files_so_far:Q", title="files ≤ V")],
        )
        .properties(width=820, height=190, title="cumulative absolute count (n≥20)")
    )
    top = alt.hconcat(snap_panel, cum_density_panel).resolve_scale(y="independent")
    return alt.vconcat(top, cum_abs_panel).resolve_scale(y="independent").properties(
        title=alt.TitleParams(label, anchor="start", fontSize=14)
    )


for cls, label, slug, color in SR_METRICS:
    block = _sr_block(cls, label, color)
    save_chart(block, f"14-register-{slug}")
    display(block)

The snapshot panel for appreciative and collaborative is essentially flat at zero across every ccVersion — the rate has never lifted off the floor. The cumulative-count panel makes the same point in absolute terms: the running totals top out at 30 collaborative and 4 appreciative sentences, against the much larger 5,881-sentence corpus-wide denominator. Meanwhile the imperative and directive cumulative-count lines climb steeply across versions — the corpus’s stock of rule-issuing language is growing, not merely holding a steady per-file rate.

Back to top