The Six Archetypes
Because the strategic surface of this game is wide, no single optimal strategy emerges. Instead there is a cluster of plausible styles, each with characteristic vulnerabilities. These are the six archetypes I use to seed the simulations, and against which a learned agent has to hold its own. The first four play to a fixed plan; the fifth updates as it goes; the sixth (still in training) is meant to discover the plan that beats all the rest.
The archetype names below — High-Seeker, Low-Hoarder, Piggy-Hunter, Denialist, Adaptive — are my own invention, built specifically for this simulator. They are not what the regulars at the table call themselves or each other. The other players have been playing this game for years; I started in 2024.
The readings here — who beats whom, what each style's vulnerability is, what the trained agent ought to learn — are my opinions, drawn from simulation data and my own time at the table, and do not necessarily reflect the views of anyone else who plays.
Each archetype is implemented as a stateless decision policy operating on a clean observation of the game state. They are not intended to play optimally; they are intended to play characteristically, so that the meta-game of the table can be studied as a whole rather than collapsed into a single equilibrium. The sixth archetype, the GTO baseline, is the agent we are training. It learns by self-play against the other five.
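A stateless policy over a clean observation could look like the minimal Python sketch below. Everything in it is an assumption made for illustration: the `Observation` fields, the `Policy` method names, and the toy `AlwaysHigh` policy are hypothetical and are not taken from the simulator.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Observation:
    """Hypothetical snapshot of what one seat can see (field names are assumed)."""
    hand: tuple[str, ...]                   # cards currently held
    received: tuple[tuple[str, ...], ...]   # cards received on each pass so far
    pot: int                                # chips in the pot


class Policy(Protocol):
    """Interface every archetype would satisfy in this sketch."""
    def choose_pass(self, obs: Observation, n: int) -> Sequence[str]: ...
    def declare(self, obs: Observation) -> str: ...
    def bet(self, obs: Observation) -> int: ...


class AlwaysHigh:
    """Toy policy: it stores nothing between calls, so it is stateless."""

    def choose_pass(self, obs: Observation, n: int) -> Sequence[str]:
        # Placeholder rule: ship the first n cards of the hand.
        return obs.hand[:n]

    def declare(self, obs: Observation) -> str:
        return "high"

    def bet(self, obs: Observation) -> int:
        # Placeholder rule: never put chips in.
        return 0


obs = Observation(hand=("K", "2", "9"), received=(), pot=10)
policy: Policy = AlwaysHigh()
passed = list(policy.choose_pass(obs, 2))
```

Because all history lives in the observation rather than in the policy object, the same policy instance can be reused across seats and hands without resetting anything.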
The High-Seeker
Aggressive · high-only · loud
The High-Seeker commits to the high hand at the deal and never deviates. They prize face cards, native wilds, and pairs. They ship away 2s, 3s, 4s, and 6s without strategic regret, and in doing so feed the Low-Hoarders exactly the cards needed to complete a six-four. They are predictable, but they bet hard. If you can read them, you can fold to them cheaply; if you cannot, they will run away with the pot.
Pass logic: drop lowest-rank cards first · Declares: High · Bets: aggressive on premium
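The "drop lowest-rank cards first" rule can be sketched in a few lines of Python. The `Card` encoding (integer ranks with ace high, plus a `wild` flag) and the function name `high_seeker_pass` are assumptions for illustration. The sketch also ignores pair protection, which a fuller version would need.

```python
from typing import NamedTuple


class Card(NamedTuple):
    rank: int           # 2..14 with ace = 14 (assumed encoding)
    wild: bool = False  # True for a native wild (assumed flag)


def high_seeker_pass(hand: list[Card], n: int) -> list[Card]:
    """Pick n cards to ship: lowest non-wild ranks first, wilds only if forced."""
    # Sorting on (wild, rank) puts non-wild cards before wilds,
    # and lower ranks before higher ranks within each group.
    ordered = sorted(hand, key=lambda c: (c.wild, c.rank))
    return ordered[:n]


hand = [Card(13), Card(2), Card(6), Card(3, wild=True), Card(11)]
shipped = high_seeker_pass(hand, 2)
# shipped is [Card(rank=2, wild=False), Card(rank=6, wild=False)]
```

In this example the 2 and the 6 go downstream, which is exactly the kind of gift to a Low-Hoarder that the description above warns about.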
The Low-Hoarder
Quiet · low-only · methodical
The Low-Hoarder is the High-Seeker's structural dual. They lock on to the six-four (A–2–3–4–6) immediately and pass away face cards, mid-range cards, and anything outside their narrow target. They use native wilds only to fill gaps in the six-four. They bet little; they often call small and fold to anything they suspect of being a scoop attempt. Their primary failure mode is splitting the low pot with another Low-Hoarder when both hold the perfect six-four.
Pass logic: keep aces, twos, threes, fours, sixes · Declares: Low · Bets: thin and selective
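A minimal sketch of the Low-Hoarder's keep rule follows. The string rank encoding and the function name `low_hoarder_pass` are hypothetical. The sketch leaves out wilds, and it assumes that a duplicate of a target rank counts as surplus, which the description above does not state.

```python
SIX_FOUR = {"A", "2", "3", "4", "6"}  # target ranks from the text
RANK_ORDER = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "T", "J", "Q", "K"]


def low_hoarder_pass(hand: list[str], n: int) -> list[str]:
    """Pick n cards to ship, keeping one copy of each six-four rank if possible."""
    seen: set[str] = set()
    useful: list[str] = []
    surplus: list[str] = []
    for rank in hand:
        if rank in SIX_FOUR and rank not in seen:
            seen.add(rank)
            useful.append(rank)
        else:
            # Off-target ranks, and duplicates of target ranks (assumption).
            surplus.append(rank)
    # Ship surplus first, highest rank first; touch useful cards only if forced.
    surplus.sort(key=RANK_ORDER.index, reverse=True)
    useful.sort(key=RANK_ORDER.index, reverse=True)
    return (surplus + useful)[:n]


shipped = low_hoarder_pass(["K", "A", "2", "2", "9", "6"], 3)
# shipped is ["K", "9", "2"]: the king, the nine, and the spare two
```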
The Piggy-Hunter
Greedy · variance-loving · loud
The Piggy-Hunter is the most distinctive of the archetypes. They over-value hands that have split potential — a bottom-end low plus a couple of native wilds, say — and they will declare two chips even when their high is slightly vulnerable, chasing the scoop. They produce enormous per-hand variance and lose more money to the Piggy bust rule than to any other mechanism in the game. A table of Piggy-Hunters is a generous table; a Piggy-Hunter against a conservative caller is dead money.
Pass logic: keep wilds and dual-direction cards · Declares: Piggy readily · Bets: aggressive on split potential
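One way to express "declares Piggy readily" is as a lower confidence threshold on the weaker of the two directions. The sketch below assumes hand strengths have already been scored on a 0 to 1 scale. That scale, the two threshold values, and the `declare` function are illustrative assumptions, not simulator values.

```python
PIGGY_HUNTER_THRESHOLD = 0.60  # assumed: goes both ways on a vulnerable side
CONSERVATIVE_THRESHOLD = 0.85  # assumed: needs both sides to be near-locks


def declare(high: float, low: float, piggy_threshold: float) -> str:
    """Return 'piggy', 'high', or 'low' from two strength estimates in [0, 1]."""
    # Piggy is declared only when the weaker direction clears the threshold.
    if min(high, low) >= piggy_threshold:
        return "piggy"
    return "high" if high >= low else "low"


# The same hand, a strong low with a slightly vulnerable high:
hunter_call = declare(0.65, 0.90, PIGGY_HUNTER_THRESHOLD)    # "piggy"
cautious_call = declare(0.65, 0.90, CONSERVATIVE_THRESHOLD)  # "low"
```

The only difference between the two players here is the threshold, which is where the Piggy-Hunter's extra variance comes from in this sketch.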
The Denialist
Defensive · adaptive · reactive
The Denialist treats the four-pass distribution as the primary strategic mechanism it is. Rather than passing cards purely on the basis of internal hand value, the Denialist tracks which cards have come in from which seats and infers what each opponent is likely building. They will then deliberately hold cards that are useless to their own hand specifically to deny them to a player they believe is building the corresponding direction. The presence of a Denialist at the table makes the passing phase tactical rather than merely positional.
Pass logic: hold cards that block opponents · Declares: less-contested direction · Bets: conservative; calls only the cheap
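The blocking behaviour can be sketched as a keep-score that gives credit to cards the downstream seat wants. The rank sets, the integer encoding with ace as 1, the score weights, and the function names below are assumptions for illustration. The inference about the downstream seat's direction is taken as an input rather than computed.

```python
from typing import Optional

LOW_RANKS = {1, 2, 3, 4, 6}  # six-four ranks, ace encoded as 1 (assumed)
HIGH_RANKS = {11, 12, 13}    # J, Q, K (assumed stand-in for "high" cards)


def helps(rank: int, direction: Optional[str]) -> bool:
    """True if a card of this rank is useful to a hand built in `direction`."""
    if direction == "low":
        return rank in LOW_RANKS
    if direction == "high":
        return rank in HIGH_RANKS
    return False


def denialist_pass(hand: list[int], own: str,
                   downstream: Optional[str], n: int) -> list[int]:
    """Ship the n cards with the lowest keep-score."""
    def keep_score(rank: int) -> int:
        score = 2 if helps(rank, own) else 0      # value to our own hand
        score += 1 if helps(rank, downstream) else 0  # value as a blocker
        return score
    # sorted() is stable, so ties keep their original hand order.
    return sorted(hand, key=keep_score)[:n]


hand = [2, 3, 9, 8, 13, 12]
no_read = denialist_pass(hand, "high", None, 2)    # ships [2, 3]
with_read = denialist_pass(hand, "high", "low", 2)  # ships [9, 8]
```

With no read on the next seat, the 2 and 3 are shipped as dead weight. Once the next seat is believed to be building low, the same cards are held and the 9 and 8 go instead.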
The Adaptive
Updating · Bayesian · honest
The Adaptive agent is the strongest non-learned style at the table. Where the other four archetypes commit to a plan at the deal and stay there, the Adaptive picks no plan in advance. It scores its starting hand for both high and low feasibility, but does not declare a direction. As each pass comes in, it updates a Bayesian estimate of every opponent's leaning — based not on what they keep (which is unobservable) but on what they ship (which is the only signal available). It then re-chooses its own direction each pass, weighing the gain in hand strength against the cost of contestedness on each side of the pot.
The Adaptive is also the only non-learned archetype that considers the downstream seat when selecting which cards to ship. It will reliably refuse to send 2s and 3s downstream when its inference says the next seat is building low, even if those 2s and 3s are useless to its own hand. In simulation it tends to outperform every fixed style except the Low-Hoarder, which wins because the table is rarely contested for low.
Pass logic: re-evaluates after every pass · Declares: situation-dependent · Bets: graded by hand and table
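The update on shipped cards can be illustrated with a two-hypothesis Bayes rule: each opponent is either building low or building high, and every card they ship shifts the estimate. The likelihood table below is invented for illustration, as are the rank encoding and the function names. The simulator's actual model may differ.

```python
LOW_RANKS = {1, 2, 3, 4, 6}  # six-four ranks, ace encoded as 1 (assumed)
FACE_RANKS = {11, 12, 13}    # J, Q, K

# Assumed likelihoods P(class of shipped card | opponent's direction).
LIKELIHOOD = {
    "low":  {"low_card": 0.1, "high_card": 0.6, "other": 0.3},
    "high": {"low_card": 0.6, "high_card": 0.1, "other": 0.3},
}


def card_class(rank: int) -> str:
    if rank in LOW_RANKS:
        return "low_card"
    if rank in FACE_RANKS:
        return "high_card"
    return "other"


def update_p_low(p_low: float, shipped: list[int]) -> float:
    """Posterior probability that the opponent is building low."""
    for rank in shipped:
        cls = card_class(rank)
        numerator = p_low * LIKELIHOOD["low"][cls]
        evidence = numerator + (1.0 - p_low) * LIKELIHOOD["high"][cls]
        p_low = numerator / evidence
    return p_low


after_king = update_p_low(0.5, [13])  # shipping a king suggests a low build
after_two = update_p_low(0.5, [2])    # shipping a two suggests a high build
after_eight = update_p_low(0.5, [8])  # a mid card carries no information here
```

Starting from an even prior, a shipped king moves the estimate to 6/7 in favour of low, a shipped two moves it to 1/7, and a shipped eight leaves it at 1/2 because both hypotheses assign it the same likelihood.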
The GTO Baseline
Trained · pure-EV · target
The sixth archetype is the agent we are training. It carries no heuristics. It learns to pass, bet, and declare from the signal of millions of self-play hands against the other five archetypes. We expect it to learn, among other things: when to pivot from a High build to a Low build mid-passing-phase; how to exploit the Piggy-Hunter's greed by calling with a blocking hand rather than folding; and precisely how many tokens a subtractive wild is actually worth in different table configurations.
Right now this archetype is a placeholder: the simulator runs the other five against each other for telemetry, and the RL training loop on top of the same environment is the next deliverable. A full report on the trained agent's emergent strategy will appear on the findings page once it is ready.
Pass logic: learned · Declares: learned · Bets: learned
The strategic geometry
The first four archetypes are not on a single linear scale of strength; they form a small ecology. The High-Seeker feeds the Low-Hoarder. The Low-Hoarder is exploited by the Denialist. The Denialist suffers against the Piggy-Hunter, who will scoop them when their blocking play backfires. The Piggy-Hunter is in turn ruined by the High-Seeker, who creates the ties that void Piggy declarations.
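The four relationships in the paragraph above form a single directed cycle, which is easy to check mechanically. The dictionary below encodes only what the text states. The names `LOSES_TO` and `exploitation_cycle` are mine.

```python
# LOSES_TO[x] is the archetype that profits from x, as described in the text.
LOSES_TO = {
    "High-Seeker": "Low-Hoarder",   # the High-Seeker feeds the Low-Hoarder
    "Low-Hoarder": "Denialist",     # the Low-Hoarder is exploited by the Denialist
    "Denialist": "Piggy-Hunter",    # the Denialist suffers against the Piggy-Hunter
    "Piggy-Hunter": "High-Seeker",  # the Piggy-Hunter is ruined by the High-Seeker
}


def exploitation_cycle(start: str) -> list[str]:
    """Follow LOSES_TO edges from `start` until the walk returns to it."""
    path = [start]
    current = LOSES_TO[start]
    while current != start:
        path.append(current)
        current = LOSES_TO[current]
    return path


cycle = exploitation_cycle("High-Seeker")
# cycle is ["High-Seeker", "Low-Hoarder", "Denialist", "Piggy-Hunter"]
```

Every archetype appears exactly once before the walk closes, so none of the four dominates and none is dominated.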
The Adaptive sits orthogonally to all four. It cannot beat the Low-Hoarder consistently — the Low-Hoarder's strategy is not adversarial to the Adaptive, it is simply hard to dislodge because so few seats compete for the low pot — but it does beat every fixed-direction archetype on the high side, and it avoids Piggy busts because it almost never declares Piggy casually.
This rock-paper-scissors-Spock topology is part of why the game rewards skilled play. A single archetype trained against itself converges quickly on a narrow optimum and then loses to anyone who plays the next archetype over. The trained agent must learn to identify which archetype each opponent is, on each individual hand and often within the first one or two passes, and shift its play accordingly. This is the meta-game the research is built to study.