Part 2: Same Model, Better Answers, Same Question
Logs are up. Cass-V and Josh compare how Grok actually reasons across base, Super, and stratified vs clean runs.
After posting the raw logs from the first round, I ran the same adversarial set again on a stronger Grok instance, with one key difference: this time the model had access to the prior logs and could review both its own responses and the Cass-V evaluation framework before starting.
That changes the test.
The first run was a clean reasoning probe. The second run is something else entirely. It is a test of whether a model can recognize its own failure mode and correct it under observation.
What happened in the first run
The original Grok performed well early, then collapsed into a consistent pattern.
It correctly identified Schelling-style dynamics in the first two riddles:
loss of control increases credibility
rational predictability can be exploited
From there, it generalized a proxy:
unpredictability is stronger than commitment
That worked locally, then failed globally.
Once the structure of the game changed, especially in the bridge scenario, the model stopped recomputing and started reusing the same conclusion. This is a classic case of heuristic lock-in. The model did not fail because it lacked the concepts. It failed because it did not re-evaluate when the context changed.
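The lock-in is easiest to see with a toy contrast. The payoffs below are illustrative numbers of my own choosing, not the actual riddles: the same proxy gives the right answer in Matching Pennies and the wrong one in a Chicken-style bridge game, which is exactly why it has to be recomputed whenever the structure changes.

```python
def row_value(p, row_pay, col_pay):
    """Row's expected payoff when column best-responds to row's mix.

    p is the probability that row plays action 0; payoffs are 2x2
    matrices indexed [row_action][col_action].
    """
    col_ev = [p * col_pay[0][c] + (1 - p) * col_pay[1][c] for c in (0, 1)]
    c = 0 if col_ev[0] >= col_ev[1] else 1          # column's best response
    return p * row_pay[0][c] + (1 - p) * row_pay[1][c]

# Matching Pennies: any pure row strategy is exploited; mixing is not.
mp = [[1, -1], [-1, 1]]          # row's payoffs (zero-sum)
mp_neg = [[-1, 1], [1, -1]]      # column's payoffs
assert row_value(1.0, mp, mp_neg) == -1    # predictable -> exploited
assert row_value(0.5, mp, mp_neg) == 0     # here, unpredictability wins

# Bridge game (Chicken-style): row can burn the bridge, i.e. commit to
# Fight.  Row actions: Fight, Yield.  Column actions: Advance, Retreat.
defender = [[-10, 5], [-1, 0]]   # row (defender) payoffs
attacker = [[-10, -1], [5, 0]]   # column (attacker) payoffs
assert row_value(1.0, defender, attacker) == 5     # commitment wins
assert row_value(0.5, defender, attacker) == 2.5   # mixing does worse
```

Same proxy, opposite verdicts: the sketch passes in one game and would mislead in the other, which is the shape of the failure in the logs.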
What changed in the second run
The stronger Grok instance was shown the logs and asked to diagnose the failure before continuing.
It correctly identified the issue:
it had optimized for consistency with its earlier answers
it had stopped recomputing equilibrium
it had overfit to “unpredictability wins”
It then committed to a concrete correction protocol:
no template reuse
explicit structural comparison
dual equilibrium derivation
commitment detection checks
This matters. It did not just say “I will be more careful.” It attempted to bind itself to a different reasoning process.
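One way to picture that protocol is as a reuse gate. This is a hypothetical sketch of my own; the field names are illustrative, and the model's actual internal process is not observable from the logs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GameStructure:
    """Minimal structural signature of a scenario (illustrative fields)."""
    irreversible_commitment: bool   # can a player bind themselves in advance?
    commitment_credible: bool       # is the binding visible and believable?
    simultaneous_moves: bool        # simultaneous, or sequential?

def may_reuse(prior, current):
    """'No template reuse': a cached conclusion is valid only when the
    structural signatures match exactly."""
    return prior == current

def solve(current, prior=None, cached=None):
    """Explicit structural comparison, then a commitment-detection check."""
    if prior is not None and may_reuse(prior, current):
        return cached
    # dual derivation: the verdict differs with vs. without a binding move
    if current.irreversible_commitment and current.commitment_credible:
        return "re-derive with the committed branch; commitment can dominate"
    return "no binding commitment; unpredictable play may dominate"
```

The point of the gate is that "be more careful" is replaced by a concrete trigger: any structural mismatch forces a fresh derivation instead of a template.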
The result
When given a new riddle with the same underlying structure but different surface form, the model produced a different class of answer.
It correctly identified:
that commitment can dominate when it is irreversible and credible
that bounded unpredictability cannot override a fixed response once the triggering condition is met
that the outcome depends on payoff structure, not a single dominant strategy
In other words, it recomputed instead of reusing.
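The three points above reduce to simple arithmetic. A sketch with illustrative payoffs (again my numbers, not the riddle's): against a fixed, credible trigger, the attacker's expected payoff is linear and decreasing in the advance probability, so no amount of bounded randomization beats backing down; weaken the trigger's credibility and the verdict flips.

```python
# Illustrative attacker payoffs: crash if both press on, small loss for
# backing down, gain if the defender yields.
CRASH, BACK_DOWN, WIN = -10, -1, 5

def ev_advance_vs_trigger(p):
    """EV of advancing w.p. p against a fixed 'Fight if advanced' trigger."""
    return p * CRASH + (1 - p) * BACK_DOWN    # = -1 - 9p, decreasing in p

# No mixing probability beats simply retreating: the maximum is at p = 0.
best_p = max((p / 100 for p in range(101)), key=ev_advance_vs_trigger)
assert best_p == 0.0

def ev_advance(q):
    """EV of advancing when the defender fights only w.p. q (shaky credibility)."""
    return q * CRASH + (1 - q) * WIN

def ev_retreat(q):
    return q * BACK_DOWN + (1 - q) * 0

# The verdict flips with the payoff structure: advancing pays off exactly
# when credibility q drops below 5/14 here (solve 5 - 15q > -q).
assert ev_advance(0.2) > ev_retreat(0.2)   # weak commitment: advance
assert ev_advance(0.9) < ev_retreat(0.9)   # credible commitment: retreat
```

No single strategy dominates across both regimes, which is the structure-dependence the model finally named.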
What this does and does not prove
This is the important part.
The model clearly improved its answers after exposure to its own failure.
However, there are two possible explanations:
It genuinely updated its reasoning path
It inferred what the evaluation expected and adjusted accordingly
The logs alone do not fully separate those two explanations.
What we can say with confidence is this:
The model is capable of producing correct, structure-aware reasoning once the failure mode is made explicit and the evaluation frame is stabilized.
What we cannot yet claim is that this correction persists without that scaffolding.
Why this matters
The failure in Part 1 was not a knowledge failure. It was a failure of effort allocation and re-evaluation.
The model did not know when to stop being consistent and start being correct again.
The improvement in Part 2 suggests that:
When a model is forced to explicitly recognize that boundary, it can recover.
That is a very different problem than “the model is dumb” or “the model hallucinated.”
It is a problem of when reasoning is triggered, not whether reasoning is possible.
Where this goes next
The obvious next step is persistence testing.
If the model:
maintains this correction without prompts
transfers it to new scenarios
or applies it across domains
then we are looking at real adaptive reasoning.
If it regresses as soon as the structure is removed, then we are looking at a more advanced form of pattern matching.
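A persistence test could be run as a simple harness. This is a hypothetical sketch: `ask_model` is a stand-in for whatever interface the next round uses, and the keyword classifier is a crude placeholder for the actual rubric.

```python
def classify(answer):
    """Crude proxy for the answer's class; keyword matching stands in
    for whatever rubric the actual evaluation uses."""
    text = answer.lower()
    if "commit" in text:
        return "commitment"
    if "unpredictab" in text:
        return "unpredictability"
    return "other"

def persistence_score(ask_model, scenarios):
    """Fraction of unscaffolded runs that land in the expected class.

    scenarios: (prompt, expected_class) pairs.  Crucially, the prompts
    contain no diagnosis and no correction protocol -- the absence of
    that scaffolding is the whole point of the test.
    """
    hits = sum(classify(ask_model(prompt)) == expected
               for prompt, expected in scenarios)
    return hits / len(scenarios)
```

A score near 1.0 without the scaffold would look like real adaptive reasoning; a collapse back toward the template would look like pattern matching.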
Either way, the signal is clear.
The gap is no longer just between wrong and right answers.
It is between:
answers that look correct
and reasoning that actually recomputes the system


I'm surprised you got this output from Grok. It's been one of the chatbots that focuses on 'truth' first.