Required for Plan II. Every period boundary, each Utility
agent calls the LLM for a direct trading action (BUY_NOW, SELL_NOW,
BID, ASK_1, HOLD) plus a self-drawn x ∈ [1%, 10%] that scales
best_bid / best_ask for the BID and ASK_1 quotes. The structured prompt includes
market rules, decision-making principles, role-specific guidance,
and the explicit closed-form utility function $U(w; \rho_i) = w^{1-\rho_i}/(1 - \rho_i)$ with the agent's sampled $\rho_i$ substituted in.
Fields are read fresh from the DOM on every run (no localStorage);
if the key is empty the Start button will refuse to launch.
AI endpoint Plan III · LLM + risk label only
Required for Plan III. Every period boundary, each Utility
agent calls the LLM for a direct trading action. The structured
prompt supplies market rules, decision-making principles, and only
a natural-language risk-preference label (risk-loving, risk-neutral,
or risk-averse) — no closed-form utility function is provided.
Fields are read fresh from the DOM on every run (no localStorage);
if the key is empty the Start button will refuse to launch.
Provider
API key
Endpoint (optional)
Model
Bounded Rationality · K=3, N=5, T=3, σ=10, p=0.10
Advanced settings population scale · treatment labels
Paper constants Dufwenberg, Lindqvist & Moore (2005), §I. Design
N = 100
Subjects per session
DLM 2005 §I pins the original design at six subjects: “At each session, six subjects participated in a sequence of four consecutive markets for an experimental asset.” This simulator scales the population to N = 100 for a thicker order book while preserving the four-round session structure. §I, p. 1733 · scaled in main.js · switch in Advanced settings
rounds / session = 4
Consecutive markets played by the same subjects
“A session involved four consecutive markets. In the following, we shall talk in terms of four different rounds. Note the distinction between rounds and periods; a round (being a market) consists of ten periods.” §I, p. 1733 · slider in Advanced settings
T = 20
Asset life, in periods (per round)
“An asset's life span is ten periods.” The original Dufwenberg–Lindqvist–Moore (2005) bubble experiment fixes T = 10; this simulator doubles the horizon to T = 20 periods for a finer-grained staircase while preserving FV1 = 100¢ (paired with the halved dividend below). §I, p. 1732 · scaled in main.js
dividend ∈ {0, 10}¢
Per-period draw, equiprobable
“In each period, it pays a dividend of 0 or 20 U.S. cents, with equal probability.” This simulator halves the support to {0, 10}¢ so the per-period expected dividend drops to 5¢, keeping FV1 = 100¢ under the doubled T = 20. §I, p. 1732 · scaled in assets.js
E[d] = 5¢
Expected dividend per period
“The expected dividend in each period is 10 cents (= ½ × 0 cents + ½ × 20 cents).” Under the simulator's halved dividend support this becomes E[d] = ½ × 0 + ½ × 10 = 5¢. §I, footnote 5 · scaled in assets.js
Fundamental value, by backward induction
“With k periods remaining, the fundamental value is k × 10 cents.” Under the simulator's E[d] = 5¢ this becomes FV = k × 5¢, so FV1 = 20 × 5 = 100¢ still holds at the round boundary. §I, p. 1732 · scaled in assets.js
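The backward-induction rule above reduces to a one-line function. The following is a minimal sketch, assuming T = 20 and a 5¢ expected dividend as stated in the entries above; the function name is illustrative and is not taken from assets.js.

```javascript
// Hypothetical helper (not the simulator's assets.js): risk-neutral
// fundamental value at the start of period t for the linear-declining asset.
function fundamentalValueCents(t, T = 20, expectedDividendCents = 5) {
  if (!Number.isInteger(t) || t < 1 || t > T) {
    throw new RangeError(`period t must be an integer in [1, ${T}]`);
  }
  const k = T - t + 1; // periods remaining, counting the current one
  return k * expectedDividendCents;
}

// Full staircase for one round: FV_1 ... FV_20.
const staircase = Array.from({ length: 20 }, (_, i) => fundamentalValueCents(i + 1));
```

Passing `T = 10` and `expectedDividendCents = 10` gives the original DLM parameterisation, which starts from the same 100¢.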
endowment cash ~ U[800, 1200]¢ · inv ~ U{2, 3, 4}
Per-agent starting bundle (simulator replaces the paper's two discrete types)
“Before a market opened, half of the traders started with 200 cents and six assets, while each of the other traders started with 600 cents and two assets.” The original Dufwenberg–Lindqvist–Moore (2005) bubble experiment pins two discrete bundles (A = 200¢ + 6 shares, B = 600¢ + 2 shares) with identical 800¢ buy-and-hold value under the risk-neutral fundamental. This simulator replaces the two-type design with an independent per-agent draw (cash uniform on [800, 1200]¢, inventory uniform on {2, 3, 4} shares), sampled in sampleEndowment() (js/agents.js, ENDOWMENT_DEFAULT). Each draw is independent across agents and editable before Start via the Agents panel. §I, p. 1733 · js/agents.js · ENDOWMENT_DEFAULT
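The per-agent draw described above can be sketched as follows. This is an illustration under stated assumptions, not the body of sampleEndowment(): the PRNG (mulberry32), the function name `sketchSampleEndowment`, and the choice to leave cash unrounded are all hypothetical.

```javascript
// Small seeded PRNG (mulberry32), used only to make the sketch reproducible.
// Assumption: the simulator's own seeded PRNG may differ.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform on [0, 1)
  };
}

// Hypothetical stand-in for sampleEndowment(); the bounds mirror the text.
function sketchSampleEndowment(rng, cashRange = [800, 1200], inventory = [2, 3, 4]) {
  const [lo, hi] = cashRange;
  const cash = lo + rng() * (hi - lo); // uniform on [800, 1200) cents
  const shares = inventory[Math.floor(rng() * inventory.length)]; // uniform on {2, 3, 4}
  return { cash, shares };
}

// One independent draw per agent for a population of N = 100.
const endowmentRng = mulberry32(42);
const population = Array.from({ length: 100 }, () => sketchSampleEndowment(endowmentRng));
```

Because each agent consumes two PRNG draws in a fixed order, the same seed reproduces the same population.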
round-4 replacement R4-⅔ or R4-⅓
Two-treatment design
“In the fourth round, depending on treatment, two or four experienced subjects who had participated in the first three rounds were randomly selected, removed, and replaced by the same number of inexperienced subjects.” The paper labels these conditions by the fraction of experienced subjects remaining in round 4: R4-⅔ (four veterans + two fresh, shorthand T2) and R4-⅓ (two veterans + four fresh, shorthand T4); the R4-⅔ / R4-⅓ notation appears in the hypothesis row of Table 2. §I, p. 1733; Table 2, p. 1735
sessions = 10
Five per treatment (R4-⅔ and R4-⅓)
The multi-session batch runner in the DLM panel reproduces DLM's 10-session design (scaled to N = 100) by sequencing 5 × T20 (R4-⅔) then 5 × T40 (R4-⅓) through the simulator in one click; each session uses a fresh engine seed and a fresh per-agent endowment draw. §I, Table 1
payoff Σ final cash + 500¢
Session payoff per subject
“Subjects were privately paid, in cash, the amount of their final cash holdings from each round. They were also paid a show-up fee of $5.” All four rounds count; shares held at the end of a round are worth nothing (the asset’s life span has ended). §I, p. 1735
Hidden Constants
ticks / period = 18
Agent decision rounds inside one period
DLM 2005 runs a continuous 2-minute z-Tree double auction per period; this simulator discretizes that window into 18 decision rounds (≈ one agent turn every 6.7 real-time seconds) so the engine loop can step deterministically. 18 is dense enough to reproduce the bubble-crash pattern while keeping the replay buffer compact. engine.js · period-boundary trigger
naive prior weight = 0.60
Belief blend for naive Utility agents
Weight on the agent's own prior when blending incoming peer messages: $V_i^{\text{post}} = 0.60 \cdot V_i^{\text{prior}} + 0.40 \cdot \bar{m}$, i.e. $w = 0.60$ in the Plan I formula (see Architecture Figure 3). Not specified by DLM 2005, which studies human subjects and has no belief-update model. Chosen so naive agents move noticeably toward peers without collapsing onto them. agents.js · UTILITY_DEFAULTS.naivePriorWeight
skeptical prior weight = 0.90
Belief blend for skeptical Utility agents
Same convex combination as the naive weight but $w = 0.90$: $V_i^{\text{post}} = 0.90 \cdot V_i^{\text{prior}} + 0.10 \cdot \bar{m}$, so a skeptical agent hears messages but is barely moved by them. Not in DLM 2005; introduced so the strategy cube contains a "listen but don't trust" archetype. agents.js · UTILITY_DEFAULTS.skepticalPriorWeight
adaptive weight cap = 0.50
Max one-period belief shift toward peers
Upper bound on the fraction of belief an adaptive agent can shift toward the trust-weighted message mean $\bar{m}$ in a single period: even with fully trusted senders, $w \geq 0.50$, so $V_i^{\text{post}}$ is at most 50% $\bar{m}$ + 50% $V_i^{\text{prior}}$. Not in DLM 2005; guards against runaway over-update from a single high-trust period. agents.js · UTILITY_DEFAULTS.adaptiveWeightCap
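The three prior-weight entries above share one convex blend and differ only in the weight $w$. A minimal sketch, assuming plain (unweighted) averaging of the received messages; the function names are illustrative, and how an adaptive agent derives its raw weight from trust is not modelled here, only the cap.

```javascript
// Convex blend of the agent's own prior with the mean of peer messages.
// With no messages the prior is returned unchanged.
function blendBelief(prior, messages, priorWeight) {
  if (messages.length === 0) return prior;
  const mean = messages.reduce((sum, v) => sum + v, 0) / messages.length;
  return priorWeight * prior + (1 - priorWeight) * mean;
}

// Adaptive cap: at most `cap` of the belief may move toward peers in one
// period, which is the same as keeping the prior weight at or above 1 - cap.
function clampPriorWeight(rawPriorWeight, cap = 0.50) {
  return Math.max(rawPriorWeight, 1 - cap);
}

const naivePost = blendBelief(100, [120], 0.60);     // 0.60*100 + 0.40*120
const skepticalPost = blendBelief(100, [120], 0.90); // 0.90*100 + 0.10*120
const cappedPost = blendBelief(100, [120], clampPriorWeight(0.20));
```

With a prior of 100¢ and a single peer claim of 120¢, the naive agent moves to about 108¢, the skeptical agent to about 102¢, and a capped adaptive agent no further than 110¢.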
valuation noise = ±3%(legacy)
Per-tick uniform noise on the Utility-agent prior
Superseded: replaced by the v3 §2 Gaussian noise term $\varepsilon_i \sim \mathcal{N}(0, \sigma_i^2)$, scaled by the experience-indexed $\sigma_i$ (see the novice valuation noise entry below). The legacy uniform draw $\varepsilon \sim \mathcal{U}[-n, n]$, $n = 0.03$, is still carried on each agent spec for backwards compatibility with older replays but no longer feeds updateBelief. Kept here to document the pre-v3 behaviour. agents.js · UTILITY_DEFAULTS.valuationNoise (inert)
trust λ = 0.30
EMA learning rate for the pairwise trust update
Pairwise trust is updated as $\tau_{r \to s} \leftarrow (1 - \lambda)\,\tau_{r \to s} + \lambda \cdot \text{closeness}$, where closeness $= \max(0,\, 1 - |\hat{v}_s - \mathrm{VWAP}_t| / \mathrm{VWAP}_t)$. $\lambda = 0.30$ weights each new observation at 30%. Not in DLM 2005, which has no messaging layer; chosen for a balance between responsiveness and stability. engine.js · TrustTracker period close-out
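The EMA update above can be written directly from the formula. A minimal sketch, assuming a strictly positive VWAP for the period; the function names are illustrative rather than the TrustTracker API.

```javascript
// closeness = max(0, 1 - |claim - VWAP| / VWAP), so a claim more than 100%
// away from the period VWAP scores zero.
function closeness(claim, vwap) {
  if (!(vwap > 0)) throw new RangeError("VWAP must be positive");
  return Math.max(0, 1 - Math.abs(claim - vwap) / vwap);
}

// tau <- (1 - lambda) * tau + lambda * closeness, with lambda = 0.30 by default.
function updateTrust(trust, claim, vwap, lambda = 0.30) {
  return (1 - lambda) * trust + lambda * closeness(claim, vwap);
}

const afterAccurateClaim = updateTrust(0.5, 100, 100); // closeness 1
const afterWildClaim = updateTrust(0.5, 250, 100);     // closeness 0
```

Starting from trust 0.5, a claim equal to the VWAP lifts trust to about 0.65, while a claim 150% above it lowers trust to about 0.35.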
passive fill probability = 0.30
$p_{\text{fill}}$ heuristic for scoring non-crossing quotes
Expected-utility score for a passive quote is $\mathrm{EU}(\alpha) = p_{\text{fill}} \cdot U(w_1) + (1 - p_{\text{fill}}) \cdot U(w_0)$ with $p_{\text{fill}} = 0.30$ (see Architecture Figure 2). A full model would estimate $p_{\text{fill}}$ from order-book state; this is a deliberate constant placeholder and is not proposed by DLM 2005. agents.js · UtilityAgent scoring loop
bias magnitude = 15%
Persistent over/under-valuation of biased Utility agents
Applied as $b_i = \delta_i \cdot \beta$ with $\beta = 0.15$; sign set by the per-slot bias direction $\delta_i \in \{-1, 0, +1\}$ (see Architecture Figure 2). Drives the biased U-agent slots in the default strategy cube (U2, U4, U5). Not in DLM 2005; chosen large enough to perturb the market without dominating the risk-preference split. agents.js · UTILITY_DEFAULTS.biasAmount
self (non-peer) weight step Δω = 0.10, kmax = 3
Per-round increment and saturation horizon for ωi
The anchor $\omega_0$ has been promoted to Advanced settings → Experience anchors alongside the other anchors; only the step size Δω = 0.10 and saturation horizon kmax = 3 stay fixed here, so the ramp walks 0.60 → 0.70 → 0.80 → 0.90 for ki ∈ {0, 1, 2, ≥3} and saturates at 0.90 regardless of the anchor. $\omega_i$ is the convex weight on the agent's own prior when it blends peer opinion: $V_{i,t}^{\text{post}} = \omega_i \cdot V_{i,t}^{\text{prior}} + (1 - \omega_i) \cdot \bar{m}_t$. Asset-swap blend from round r on: $\omega_{\text{new}} = |\mathrm{corr}| \cdot \omega_{\text{trained}} + (1 - |\mathrm{corr}|) \cdot \omega_0$. utility.js · ExperienceConfig.omegaStep, omegaKmax · ui.js · UI._blendExperience
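The ramp described above can be sketched as a function of the rounds-played counter. This is an illustration only: the function name is hypothetical, and the reading of "saturates at 0.90 regardless of the anchor" as a hard value at k ≥ kmax is an assumption.

```javascript
// Self (non-peer) weight as a function of rounds played.
// Assumption: at or beyond kMax the weight is pinned to `saturation`;
// before that it climbs from the anchor in fixed steps, never exceeding it.
function selfWeight(roundsPlayed, omega0 = 0.60, step = 0.10, kMax = 3, saturation = 0.90) {
  const k = Math.max(0, Math.floor(roundsPlayed));
  if (k >= kMax) return saturation;
  return Math.min(saturation, omega0 + step * k);
}

// Ramp for k = 0, 1, 2, 3, 4 under the default anchor.
const omegaRamp = [0, 1, 2, 3, 4].map((k) => selfWeight(k));
```

Under the defaults this reproduces the 0.60 → 0.70 → 0.80 → 0.90 walk and stays at 0.90 afterwards.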
Session— / 10
Round1 / 4
Period1 / 10
Tick0
Price—
Fundamental—
Mispricing—
Volume · period0
Agents Pre-run draft · editable before the simulation starts
Note
Cash
experimental-currency balance $c_{i,t}$ held by agent i at tick t, used to finance bids and grown by realized sales plus end-of-period dividend receipts. The pre-run editable value is the initial endowment $c_i^0$; in Dufwenberg, Lindqvist & Moore (2005) subjects were seeded with either 200¢ or 600¢, while this simulator draws each slot uniformly from [800, 1200]¢.
Shares
holding $q_{i,t}$ of the finite-life asset at tick t (initial endowment $q_i^0$). Each held share pays a random dividend drawn from {0, 20}¢ at the end of every trading period (DLM 2005), so the theoretical risk-neutral fundamental value at the start of period t is $\mathrm{FV}_t = (T - t + 1) \times 10$¢. DLM endowment classes held 6 or 2 shares; this simulator draws from {2, 3, 4}.
Wealth
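mark-to-fundamental total wealth, defined as $w_{i,t} = c_{i,t} + q_{i,t} \cdot \mathrm{FV}_t$, or for Utility agents as $w_{i,t} = c_{i,t} + q_{i,t} \cdot \hat{V}_{i,t}$ (Lopez-Lira 2025). The Normalized Agent Utility plot is $U(w_{i,t}; \rho_i) / U(w_{i,0}; \rho_i)$.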
P&L
running change in total wealth relative to the initial endowment $w_{i,0}$, reported in experimental cents. Positive values render in green, losses in red. Aggregated across all agents, P&L equals the cumulative dividends paid so the market is zero-sum up to the dividend stream, as in the Smith–Suchanek–Williams design replicated by DLM.
Subj V
the Utility agent's private subjective valuation per share — the posterior $V_{i,t}^{\text{post}}$ from the active plan (Architecture Figure 3), updated each tick from the v3 §2 prior $V_{i,t}^{\text{prior}} = [\alpha_i\!\cdot\!\widetilde{\mathrm{FV}}_{i,t} + (1-\alpha_i)\!\cdot\!H_{i,t}](1 + b_i) + \varepsilon_i$ via the Plan I/II/III belief-revision protocol. Corresponds to the valuation field in Lopez-Lira's (2025) TradeDecisionSchema.
Report
the valuation the Utility agent broadcasts to peers in its messages. Under communication strategy $\sigma_m = D$ (deceptive), $\hat{v}_m \ne V_m$ via the distortion multiplier $\phi_m$ (see Architecture Figure 3, Plan I card); the lie-gap magnitude drives the trust EMA update $\tau_{r \to s} \leftarrow (1 - \lambda)\,\tau + \lambda \cdot \text{closeness}$ and the mean-lie-magnitude statistic in the Experiment Metrics table.
Last action
the most recent decision taken by agent i at tick t, displayed as a coloured tag on the card. In Plan I the agent selects $\alpha^\star_{i,t} = \arg\max_\alpha \mathrm{EU}(\alpha)$ over $\alpha_{i,t} \in \{\text{hold},\, \text{buy@}A_t,\, \text{sell@}B_t,\, \text{bid},\, \text{ask}\}$ scored under the risk-typed utility functional (see Architecture Figure 2).
Subtitle
for classic agents, the strategy class (Fundamentalist, Trend follower, Random ZI, Experienced) together with set membership. For Utility agents, the risk preference and the agent's sampled CRRA coefficient $\rho_i$ in the universal form $U(w; \rho_i) = w^{1-\rho_i}/(1 - \rho_i)$: Risk-loving draws $\rho_i$ ∈ (−1, 0) (convex, upside-seeking), Risk-neutral pins $\rho_i$ = 0 (linear expected value), and Risk-averse draws $\rho_i$ ∈ (0, 1) (concave, downside-sensitive).
Cash · Shares · Wealth · P&L · Subj V vs Report V (lie gap ringed red) · Normalized utility
Trade & Dividend Feed
Figure 1
Transaction Price Trajectory versus Risk-Neutral Fundamental Value
fundamental value at the start of period t for the active asset (swaps to match the per-session asset selector)
Tick-level transaction prices (accent line, one dot per executed trade) plotted against the active asset's fundamental-value path (amber dashes). Alternating vertical bands delimit the trading periods of each round. Under the Dufwenberg, Lindqvist & Moore (2005) linear-declining asset a rational market should track the step line exactly; persistent excursions above it are the bubble, and the crash toward $P_t = \mathrm{FV}_t$ in the final period is the collapse. The other five assets (constant perpetuity, linear growth, cyclical, random walk, jump/crash) replace the staircase with their own FV path; the formula above updates accordingly.
Note
$\mathrm{FV}_t$
fundamental value at the start of period t (value a rational, risk-neutral holder assigns to one share)
$\mathrm{FV}_{t+1}$
fundamental value one period ahead; path-based assets (random walk, jump/crash) specify $\mathrm{FV}_{t+1}$ as a function of $\mathrm{FV}_t$
$P_t$
observed transaction price at tick t, drawn as individual trade dots on the chart
$t$, $s$
period indices, 1 ≤ t, s ≤ T; s is the summation index over future periods
$T$
terminal period of the round (default T = 20 so that FV1 = 100 for every asset)
$k$
remaining periods through terminal, k = T − t + 1; used by the linear-declining asset
expected dividend paid at the end of period t (or s): the per-share cash flow under the asset's public rule
mean dividend for the linear-declining asset, 5¢ per period (drawn uniformly from {0, 10¢})
risk-free discount rate (default 0.05) used by the perpetuity and linear-growth assets
intercept and slope of the linear-growth dividend schedule, which is linear in s (defaults: intercept 2, slope 0.3)
i.i.d. Gaussian innovation driving the random-walk asset, with standard deviation 5
binary jump magnitude for the jump/crash asset: +2 with probability 0.9 (calm) and −30 with probability 0.1 (crash)
max(·, ·)
floor operator that clips the path-based assets above a positive minimum so FV never becomes non-positive
Order Book
BIDS
Price · Qty · Agent
ASKS
Price · Qty · Agent
Figure 2
Signed Mispricing and Price-to-Fundamental Ratio
$P_t - \mathrm{FV}_t$ and $P_t / \mathrm{FV}_t$ · signed and relative mispricing
Signed departure of the observed price from the theoretical fundamental value, drawn on a symmetric axis around a dashed zero baseline: positive (premium) fills blue, negative (discount) fills red. In Lopez-Lira (2025) the same information is expressed as the price-to-fundamental ratio $P_t / \mathrm{FV}_t$: values above one mark an overvaluation regime, values below one mark an undervaluation regime, and $P_t / \mathrm{FV}_t \approx 1$ is consistent with rational pricing. The Experiment Metrics panel reports the normalized-deviation and amplitude statistics derived from this series (those metrics retain the absolute-value wrapper because they aggregate magnitudes).
Note
$P_t - \mathrm{FV}_t$
signed mispricing at tick t; the sign preserves premium vs. discount
$P_t / \mathrm{FV}_t$
price-to-fundamental ratio (Lopez-Lira 2025)
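Both series in this figure come from the same two inputs. A minimal sketch, assuming a strictly positive fundamental value; the function name and the regime labels are illustrative, and the exact-equality test for the "fair" label is a simplification.

```javascript
// Signed mispricing and price-to-fundamental ratio for one trade.
function mispricing(price, fundamental) {
  if (!(fundamental > 0)) throw new RangeError("fundamental value must be positive");
  const signed = price - fundamental; // > 0 premium, < 0 discount
  const ratio = price / fundamental;  // > 1 overvalued, < 1 undervalued
  const regime = signed > 0 ? "premium" : signed < 0 ? "discount" : "fair";
  return { signed, ratio, regime };
}

const bubbleTrade = mispricing(120, 100);
const discountTrade = mispricing(45, 50);
```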
Figure 3
Trade Volume per Period
shares transacted in period t
Sum of share quantities exchanged within each trading period. High and persistent bars indicate active speculation; the classic Smith–Suchanek–Williams bubble is typically associated with a volume peak in the inflation phase followed by a cliff as the asset approaches expiry.
Note
total share volume traded in period t
order quantity of a single executed trade
Figure 4
Transaction Density over Price × Period
two-dimensional trade histogram
Two-dimensional histogram of share quantity binned by transaction price (vertical axis) and trading period (horizontal axis). Warm cells concentrate the market's liquidity. Comparing the heat cloud against the downward-sloping fundamental staircase reveals whether the market is trading near rational value or persistently above it.
Note
cumulative share volume in the (price, period) bin
Figure 5
Agent Action Timeline
per-tick agent decision
buy@At · bid · sell@Bt · ask · hold · executed
One row per agent, one mark per decision. Column colour encodes the five-element action set $\alpha \in \{\text{hold},\,\text{buy@}A_t,\,\text{sell@}B_t,\,\text{bid},\,\text{ask}\}$ used inside $\mathrm{EU}(\alpha)$: the two book-crossing actions (buy@At, sell@Bt) render solid, and the two passive posts (bid, ask) render in the softer dashed variant. A small accent dot below the mark records whether the submitted order was filled on the same tick.
Note
action taken by agent i at tick t
Figure 6
Subjective Valuation: True versus Reported
$|\hat{v}_{i,t} - V_{i,t}|$ · lie gap = private belief versus broadcast claim
Solid lines trace each Utility agent's private belief $V_{i,t}$ over time. Filled dots mark broadcast messages carrying a reported valuation $\hat{v}_{i,t}$; deceptive reports are ringed red and connected to the sender's true belief by a dotted segment, and the vertical distance between ring and line is the lie gap. The amber step line is the fundamental value for reference.
Note
$V_{i,t}$
agent i's private (true) subjective valuation at tick t
$\hat{v}_{i,t}$
valuation reported in a broadcast message
$|\hat{v}_{i,t} - V_{i,t}|$
lie gap for deceptive messages
hover
header Rr_Ss · Pp · t=N names round · session · period · tick; rows summarise the per-tick cross-agent distribution as percentiles P10 / P25 / median / P75 / P90, with FV the fundamental baseline and N the number of agents in that bucket
Figure 7
Normalized Agent Utility over Time
risk-adjusted wealth, normalized to initial endowment
Per-agent expected utility evaluated at the running wealth $w_{i,t}$ (cash plus marked share holdings), divided by the agent's own initial utility so every trajectory starts at 1.0. Lines above the dashed baseline indicate positive risk-adjusted PnL; lines below indicate loss. The risk preference attached to each agent (convex, linear, concave) determines how aggressively a given wealth change is penalised or rewarded.
Note
$U(w; \rho_i) = w^{1-\rho_i}/(1 - \rho_i)$
universal CRRA utility with per-agent $\rho_i$ sampled uniformly in (−1, 0) loving · {0} neutral · (0, 1) averse
$w_{i,t}$
mark-to-fundamental wealth at tick t
hover
header Rr_Ss · Pp · t=N names round · session · period · tick; rows summarise the per-tick cross-agent distribution as percentiles P10 / P25 / median / P75 / P90, with FV the fundamental baseline and N the number of agents in that bucket
Figure 8
Asset Ownership over Time
shares held; total supply conserved
Stacked area of each agent's inventory across ticks. Because the double auction conserves shares, the total height is always the aggregate endowment. Widening bands identify agents who are accumulating, shrinking bands identify distributors, and any dramatic redistribution in the last few periods is typically the experienced trader liquidating before the asset expires worthless.
Note
shares held by agent i at tick t
total shares outstanding (conserved across time)
Figure 9
Broadcast Message Log
per-tick public broadcast to all other agents
Buy signal · Sell signal · Hold signal · Deceptive
One dot per broadcast message, placed on the sender's row at the tick the message was sent. Dot colour encodes the signal (buy/sell/hold) and a red ring flags messages whose reported valuation diverges sufficiently from the sender's private belief to be classified as deceptive by the logger. Reading a column shows the instantaneous rumour mill; reading a row shows each agent's rhetorical stance over time.
Note
broadcast from agent i to the population at tick t
Figure 10
Pairwise Trust Matrix
exponential-moving-average update
Heatmap of receiver-to-sender trust values in [0, 1]. The diagonal is masked. Each off-diagonal cell records how well sender s's recent valuation claims aligned with the period's volume-weighted average price, as seen by receiver r. Warm rows identify agents who tend to trust broadly; warm columns identify agents whose claims the population finds credible.
Figure 11
$\mathrm{P\&L}_{i,t} = w_{i,t} - w_{i,0}$ · raw cash-equivalent change relative to initial endowment
One line per agent showing running P&L in experimental cents: the agent's mark-to-market wealth $w_{i,t}$ (cash plus marked share holdings) minus its initial wealth $w_{i,0}$. Lines above the dashed zero-baseline indicate gains; lines below indicate losses. Unlike Figure 7's risk-adjusted utility, this chart is in raw monetary units, so the per-agent coefficient $\rho_i$ does not enter: every agent is graded on the same cash scale.
$w_{i,0}$
initial wealth at the start of the agent's first round
hover — header
Rr_Ss · Pp · t=N names, left to right, the round (Rr, 1 … roundsPerSession), the session in the batch (Ss, 1 … 10), the trading period inside that round (Pp, 1 … T), and the global tick index t=N the cursor is snapped to
hover — rows
summarise the distribution of per-agent P&L across all agents at that tick (a cross-section, not a time series) as five percentiles of $\mathrm{P\&L}_{i,t} = w_{i,t} - w_{i,0}$
P90
90th percentile: 10 % of agents are doing better than this number, 90 % worse (the top performers)
P75
upper-quartile boundary of the fan chart's shaded IQR band — top quarter of agents lie above
median
50th percentile — the line drawn through the centre of the fan; half the population is above, half below
P25
lower-quartile boundary of the IQR band — bottom quarter of agents lie below
P10
10th percentile: 90 % of agents are doing better than this, 10 % worse (the bottom performers)
N
number of agents contributing a P&L sample at this tick; drops when an agent is replaced at the round-3/4 boundary under T20/T40 and the fresh clone has not yet accumulated a data point
value format
signed cents, e.g. +12.4¢ (above initial endowment) or −7.1¢ (below); the dashed horizontal line at 0¢ on the chart is the break-even reference — anything above is net gain, anything below is net loss for that percentile
fan vs. lines
with N > 60 agents the chart renders as a percentile fan (shaded P10–P90 envelope, darker IQR band, solid median) so the hover is the only way to read exact numbers; with N ≤ 60 each agent is drawn as its own line and the same tooltip still reports the cross-sectional percentiles of those lines at the hovered tick
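The tooltip rows above amount to a five-number cross-section of the per-agent P&L at one tick. A minimal sketch, assuming linear interpolation between order statistics (the page does not say which percentile convention the chart uses); the function names are illustrative.

```javascript
// Percentile by linear interpolation between neighbouring order statistics.
// Assumption: the chart's actual percentile convention may differ.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return NaN;
  const idx = (p / 100) * (sortedValues.length - 1);
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  return sortedValues[lo] + (sortedValues[hi] - sortedValues[lo]) * (idx - lo);
}

// Cross-section of per-agent P&L (in cents) at a single tick.
function pnlCrossSection(pnlByAgent) {
  const sorted = [...pnlByAgent].sort((a, b) => a - b); // numeric sort, not lexicographic
  return {
    p10: percentile(sorted, 10),
    p25: percentile(sorted, 25),
    median: percentile(sorted, 50),
    p75: percentile(sorted, 75),
    p90: percentile(sorted, 90),
    n: sorted.length,
  };
}

const tooltipRows = pnlCrossSection([100, 0, 50, 20, 70, 10, 90, 30, 80, 40, 60]);
```

The same function serves both rendering modes, since the fan and the per-agent lines differ only in how the chart is drawn, not in the tooltip statistics.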
Figure 12
Per-Agent Subjective Valuation over Time
posterior valuation per share, evaluated each tick
One line per agent showing the private subjective valuation the agent assigns to one share at tick t. Each line starts from the agent's prior, then drifts as the active plan's belief-update protocol blends in peer messages, regulator alerts, and (when complex dividends are on) the agent's own dividend sample. The amber dashed line is the risk-neutral fundamental $\mathrm{FV}_t$, included as a reference saw-tooth so over- and under-pricing are immediate to read off.
Note
subjective valuation of agent i at tick t
risk-neutral fundamental value at tick t
hover
header Rr_Ss · Pp · t=N names round · session · period · tick; rows summarise the per-tick cross-agent distribution as percentiles P10 / P25 / median / P75 / P90, with FV the fundamental baseline and N the number of agents in that bucket
Table 1
Market-Quality Statistics (Current Session)
Quantitative summary in the notation of Dufwenberg, Lindqvist & Moore (2005) and Lopez-Lira (2025). Haessel R² measures fit of the per-period mean price to fundamental value; the two normalized deviations capture total and average mispricing per share outstanding; amplitude is the peak-to-trough excursion of the mean-price residual normalized by the initial fundamental; turnover is the total shares traded divided by shares outstanding. The lower group reports allocative efficiency, aggregate welfare, and the deception statistics unique to the Utility population.
Table 2
10-Session Batch Results
Per-round market-quality metrics across the 10-session DLM batch (5 × first treatment + 5 × second treatment). Each row is labelled Rr_Ss (Round r of Session s). dev = mean absolute deviation |P − FV| in ¢; turn = shares traded / shares outstanding; vol = total shares exchanged; payoff = aggregate agent cash at round end.
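The dev, turn, and vol columns can be computed from a round's trade list. A minimal sketch, assuming an unweighted per-trade mean for dev (the table caption does not say whether the mean is per trade or per period) and a hypothetical trade record shape `{ period, price, qty }`.

```javascript
// Per-round metrics in the Table 2 sense.
// trades: [{ period, price, qty }], fvByPeriod: { [period]: FV in cents }.
function roundMetrics(trades, fvByPeriod, sharesOutstanding) {
  const vol = trades.reduce((sum, t) => sum + t.qty, 0);
  const totalAbsDev = trades.reduce(
    (sum, t) => sum + Math.abs(t.price - fvByPeriod[t.period]),
    0
  );
  return {
    dev: trades.length > 0 ? totalAbsDev / trades.length : 0, // mean |P - FV| in cents
    vol,                                                      // total shares exchanged
    turn: vol / sharesOutstanding,                            // turnover
  };
}

const exampleMetrics = roundMetrics(
  [
    { period: 1, price: 110, qty: 2 },
    { period: 2, price: 90, qty: 1 },
  ],
  { 1: 100, 2: 95 },
  30
);
```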
Replay & Trace Inspector
Live — tick 0
Decisions recorded at this tick
System Design
Figures 1–4: the four-stage pipeline from asset fundamentals through information aggregation to price discovery. A single Start press runs 10 sessions ($R = 4$ rounds each, 1 440 ticks/session) with per-round data collected as $\texttt{R\{r\}\_S\{s\}}$.
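The 1 440 ticks per session follow from the constants listed earlier (4 rounds × 20 periods × 18 ticks). A small sketch of that arithmetic and of the R{r}_S{s} labelling; the `DESIGN` object and function names are illustrative, not the engine's configuration.

```javascript
// Design constants as stated on this page.
const DESIGN = { sessions: 10, roundsPerSession: 4, periodsPerRound: 20, ticksPerPeriod: 18 };

function ticksPerSession(d) {
  return d.roundsPerSession * d.periodsPerRound * d.ticksPerPeriod;
}

// One label per (round, session) pair, in batch order: session by session.
function roundLabels(d) {
  const labels = [];
  for (let s = 1; s <= d.sessions; s++) {
    for (let r = 1; r <= d.roundsPerSession; r++) {
      labels.push(`R${r}_S${s}`);
    }
  }
  return labels;
}

const sessionTicks = ticksPerSession(DESIGN);
const batchLabels = roundLabels(DESIGN);
```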
Risk-neutral price of one share at the start of period $t$: $T = 20$ periods of remaining asset life, $d \in \{0, 10\}$¢ i.i.d. dividend draws, yielding a deterministic staircase from $\mathrm{FV}_1 = 100$¢ to $\mathrm{FV}_{20} = 5$¢ that resets at every round boundary. The DLM (2005) market substrate lives underneath every plan.
Figure 2: Prior Elicitation and Expected-Utility Scoring
$\widetilde{\mathrm{FV}}_{i,t}$ is the agent's model-based fundamental value: the asset-specific closed form from v5 §5.{4,10,16,22,28,34}, shown per-asset in the card below. $H_{i,t}$ is the §4 four-term heuristic mix (Anchor, Trend, DividendSignal, and Narrative) weighted by the §6.2 default betas. The blend is governed by the experience-indexed model-reliance weight $\alpha_i$ (next card): novices lean on the heuristic, veterans anchor to $\widetilde{\mathrm{FV}}$. $b_i$ is a persistent per-agent bias ($\delta_i = +1$ optimistic, $-1$ pessimistic, $0$ unbiased; magnitude $\beta$) applied multiplicatively on the blend. $\varepsilon_i$ is a Gaussian draw with experience-shrinking standard deviation $\sigma_i$, so novices have noisier priors and veterans sharpen. When bias and noise are both disabled and the heuristic coincides with the model value (e.g. $H \equiv \widetilde{\mathrm{FV}}$ at default weights in simple assets), the prior collapses to $\widetilde{\mathrm{FV}}$ exactly, recovering the Plan I baseline. All three plans start from this prior.
Experience Factors $\alpha_i, \sigma_i, \omega_i$ and the |corr| asset-swap blend
Each Utility agent carries three experience-indexed modelling parameters read from its $k_i = \texttt{roundsPlayed}$ counter every tick: $\alpha_i$ (fundamental weight — the v3 §2 weight placed on the model-based valuation $\widetilde{\mathrm{FV}}$ in the prior, with $1 - \alpha_i$ on $H$); $\sigma_i$ (the standard deviation of the Gaussian $\varepsilon_i$ noise on the prior); and $\omega_i$ (the self (non-peer) weight on the agent's own prior when blending peer messages — see the Plan I posterior below). A fresh replacement with $k_i = 0$ reports the novice anchors (1.00, 5.0, 0.60) from the Session Replacement Rate sliders. $\omega_i$ saturates at 0.90 once $k_i \geq$ 3; $\alpha_i$ saturates at 1.00 once $k_i \geq$ 0; $\sigma_i$ decays exponentially and is never clamped. From the round-4 replacement boundary onward, if the round's pre- and post-assets differ, each experienced trader's triple is blended toward the novice anchors by $|\mathrm{corr}|$, where $\mathrm{corr}$ is the Pearson correlation between the pre and post assets' expected $\mathrm{FV}$ paths. Collinear swaps ($|\mathrm{corr}| = 1$) preserve training at full strength; orthogonal swaps ($|\mathrm{corr}| = 0$, including the degenerate flat-path case) reset the trader back to the novice anchors. The same triple is rendered on every agent card (Fundamental weight / Valuation noise / Self (non-peer) weight) so the card matches the values that actually drove the trade decision.
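The asset-swap blend can be sketched as a Pearson correlation followed by a convex mix toward the novice anchors. This is a minimal illustration: the function names are hypothetical, the FV paths are toy arrays, and mapping a flat path to corr = 0 follows the "degenerate flat-path case" described above.

```javascript
// Pearson correlation between two equally long FV paths.
// A flat path has zero variance; it is treated as orthogonal (corr = 0).
function pearson(x, y) {
  const n = x.length;
  const mean = (v) => v.reduce((s, a) => s + a, 0) / n;
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) ** 2;
    syy += (y[i] - my) ** 2;
  }
  const denom = Math.sqrt(sxx * syy);
  return denom === 0 ? 0 : sxy / denom;
}

// Blend one trained parameter toward its novice anchor by |corr|.
function blendTowardNovice(trained, novice, corr) {
  const c = Math.abs(corr);
  return c * trained + (1 - c) * novice;
}

const declining = [100, 95, 90, 85];
const growing = [10, 20, 30, 40];
const flat = [50, 50, 50, 50];
const omegaAfterCollinearSwap = blendTowardNovice(0.90, 0.60, pearson(declining, growing));
const omegaAfterFlatSwap = blendTowardNovice(0.90, 0.60, pearson(declining, flat));
```

Because the blend uses $|\mathrm{corr}|$, a perfectly anti-correlated swap preserves training just as a perfectly correlated one does; only the flat (or uncorrelated) path resets the trader to the novice anchor.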
$$ U(w;\rho_i) \;=\; \dfrac{w^{1-\rho_i}}{1 - \rho_i}, \qquad \text{evaluated in normalized form } (w/w_0)^{1-\rho_i} $$
$\hat{V}_i \equiv V_i^{\text{post}}$ is agent $i$'s subjective valuation: the posterior output of the active plan (see Figure 3). $c_i$ is cash; $q_i$ is share inventory. $w_0 = c_i + q_i \cdot \hat{V}_i$ is current wealth; $w_1 = (c_i \mp p_{\text{order}}) + (q_i \pm 1) \cdot \hat{V}_i$ is wealth after a hypothetical fill at order price $p_{\text{order}}$ (upper signs for a buy, lower signs for a sell). For crossing actions (buy@$A_t$ and sell@$B_t$), $p_{\text{fill}} = 1$ (deterministic); for passive quotes (bid, ask), $p_{\text{fill}} = 0.30$ (tunable). In Plan I, agent $i$ maximizes EU over the five-element action set $\alpha_{i,t} \in \{\text{hold},\, \text{buy@}A_t,\, \text{sell@}B_t,\, \text{bid},\, \text{ask}\}$, where $A_t$ is the best ask and $B_t$ is the best bid on the order book at tick $t$. In Plans II/III the LLM selects directly from $\{\text{BUY\_NOW, SELL\_NOW, BID\_1, BID\_3, ASK\_1, ASK\_3, HOLD}\}$. Every agent shares the same universal CRRA functional and differs only in its sampled coefficient $\rho_i$: risk-loving draws from $\mathcal{U}(-1, 0)$ (strictly convex), risk-neutral pins at $0$ (linear), risk-averse draws from $\mathcal{U}(0, 1)$ (strictly concave).
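The Plan I scoring loop described above can be sketched as follows. This is an illustration under stated assumptions, not the UtilityAgent code: the state field names (`bidPrice`, `askPrice`, `pPassive`) are hypothetical, utility is evaluated on normalized wealth $w/w_0$, and feasibility checks (enough cash to buy, at least one share to sell) are omitted.

```javascript
// CRRA utility on normalized wealth. rho is assumed to lie in (-1, 1),
// so the denominator 1 - rho is never zero.
function crra(wealthRatio, rho) {
  return Math.pow(wealthRatio, 1 - rho) / (1 - rho);
}

// EU(action) = pFill * U(w1) + (1 - pFill) * U(w0) for each of the five actions.
function scoreActions(s) {
  const pPassive = s.pPassive === undefined ? 0.30 : s.pPassive;
  const w0 = s.cash + s.shares * s.vHat;
  const u0 = crra(1, s.rho); // utility of standing still
  const eu = (cashDelta, shareDelta, pFill) => {
    const w1 = s.cash + cashDelta + (s.shares + shareDelta) * s.vHat;
    return pFill * crra(w1 / w0, s.rho) + (1 - pFill) * u0;
  };
  return {
    hold: u0,
    buy: eu(-s.bestAsk, +1, 1),         // cross the spread: certain fill
    sell: eu(+s.bestBid, -1, 1),
    bid: eu(-s.bidPrice, +1, pPassive), // passive quote: uncertain fill
    ask: eu(+s.askPrice, -1, pPassive),
  };
}

function bestAction(s) {
  const scores = scoreActions(s);
  return Object.keys(scores).reduce((a, b) => (scores[b] > scores[a] ? b : a));
}

const book = { bestAsk: 90, bestBid: 85, bidPrice: 86, askPrice: 95 };
const bullish = bestAction({ cash: 1000, shares: 3, vHat: 100, rho: 0, ...book });
const bearish = bestAction({ cash: 1000, shares: 3, vHat: 50, rho: 0, ...book });
```

With a valuation of 100¢ against a 90¢ ask, the risk-neutral agent crosses to buy; with a valuation of 50¢ against an 85¢ bid, it crosses to sell, because the certain fill outweighs the better passive price at a 30% fill probability.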
The Advanced settings panel exposes two boolean toggles that gate the terms in the v3 §2 prior defined in the first card. Prior Bias OFF zeros the persistent per-agent tilt $b_i = \delta_i\!\cdot\!\beta$ with $\delta_i \in \{-1, 0, +1\}$ drawn at birth and $\beta = 0.15$; Prior Noise OFF zeros the Gaussian jitter $\varepsilon_i$ that is otherwise drawn from $\mathcal{N}(0,\sigma_i^2)$ via Box–Muller with the experience-shrinking $\sigma_i$. With both off the prior collapses to the $\alpha$-weighted blend of $\widetilde{\mathrm{FV}}$ and $H$ (and to $\widetilde{\mathrm{FV}}$ alone when $H \equiv \widetilde{\mathrm{FV}}$), recovering the Plan I baseline. Toggle state is captured in every engine snapshot ($\texttt{biasActive}$, $\texttt{noiseActive}$), surfaces in the reasoning trace alongside the $\widetilde{\mathrm{FV}}/H/\alpha/\sigma/\omega$ breakdown, and is included verbatim in the Plan II/III prompt under "YOUR PRIVATE STATE" so the LLM sees which mode it is in.
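The two toggles gate the bias and noise terms of the prior $[\alpha_i \cdot \widetilde{\mathrm{FV}} + (1 - \alpha_i) \cdot H](1 + b_i) + \varepsilon_i$. A minimal sketch, assuming a mulberry32 PRNG as a stand-in for the engine's seeded generator; the function and option names are illustrative.

```javascript
// Seeded PRNG stand-in (mulberry32). Assumption: the engine's PRNG may differ.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Box-Muller transform: one N(0, sigma^2) draw from two uniform draws.
function gaussian(rng, sigma) {
  const u1 = 1 - rng(); // shift to (0, 1] so Math.log never sees 0
  const u2 = rng();
  return sigma * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Prior with the two Advanced-settings toggles applied.
function priorValue(agent, rng, toggles = { biasActive: true, noiseActive: true }) {
  const blend = agent.alpha * agent.fvModel + (1 - agent.alpha) * agent.heuristic;
  const bias = toggles.biasActive ? agent.delta * agent.beta : 0;
  const noise = toggles.noiseActive ? gaussian(rng, agent.sigma) : 0;
  return blend * (1 + bias) + noise;
}

const optimist = { alpha: 0.5, fvModel: 100, heuristic: 80, delta: +1, beta: 0.15, sigma: 5 };
const noiseRng = mulberry32(1);
const baseline = priorValue(optimist, noiseRng, { biasActive: false, noiseActive: false });
const biasedOnly = priorValue(optimist, noiseRng, { biasActive: true, noiseActive: false });
```

With both toggles off the prior is the plain α-weighted blend (90¢ here); switching bias on alone multiplies it by 1.15.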
$V_{i,t}^{\text{prior}}$ is the v3 §2 prior from the card above (the $\alpha$-weighted blend of $\widetilde{\mathrm{FV}}$ and $H$, tilted by bias and Gaussian noise). $\hat{v}_m$ is the claimed valuation peer $m$ broadcasts, derived from $m$'s subjective valuation $V_m$ via a distortion multiplier $\phi_m$ determined by $m$'s communication strategy $\sigma_m$: $H$ (truthful) adds small uniform jitter of half-width $h$; $B$ (biased) applies a fixed-sign tilt of magnitude $\gamma$ with direction $\delta_m \in \{-1, 0, +1\}$; $D$ (strategic) overstates by $\kappa^+$ when inventory $q_m$ exceeds initial endowment $q_m^0$ (to inflate price for selling) and understates by $\kappa^-$ when below (to depress price for buying), with a bias-like fallback at $q_m = q_m^0$. $\bar{m}_t$ averages the $\hat{v}_m$ over the set $M$ of non-self messages received this period. The self (non-peer) weight $\omega_i$ is the same experience-indexed triple entry from v3 §3 that drives the prior (see Experience Factors card): it ramps from $\omega_0 = 0.60$ to the saturation value $0.90$ over the first three rounds, so novices listen freely and veterans rely on their own prior. The post-replacement asset-swap $|\mathrm{corr}|$ blend applies here too — an orthogonal asset swap pulls an experienced trader's $\omega_i$ back toward the novice anchor. If $|M| = 0$, $V_{i,t}^{\text{post}} = V_{i,t}^{\text{prior}}$ (no blend). Plans II/III reuse the same Step-3 blend with the LLM-delivered prior substituting for the algorithmic one; no network calls in Plan I, so the run is deterministic under the seeded PRNG.
This is the only calculation behind the reported value dot you see on Figure 6 of the Market tab and the claim field carried by every broadcast message on the bus. Code entry: UtilityAgent.communicate() in js/agents.js, called once per period by Engine._communicationRound().
The wedge — private valuation ≠ reported valuation. $V_m \equiv V_m^{\text{post}}$ lives inside the sender's head: it is the §3 posterior, the ground-truth belief used to decide this agent's own trades. $\hat v_m$ is what the sender broadcasts to the bus. The two coincide only for a jitter-free truthful sender ($\sigma_m = H,\ \varepsilon_m = 0$) or when the global Deception toggle collapses the channel; under every other configuration the distortion multiplier $\phi_m$ opens a measurable gap $\hat v_m \ne V_m$. That gap is the model's core informational imperfection — a wedge between private information and public communication — and everything below traces how it is generated, clamped, and fed back into the market.
Step 1 — pick the number to describe. The sender starts from its own subjective valuation $V_m \equiv V_m^{\text{post}}$ — the output of the Plan I Algorithmic Posterior card above, which has already blended the v3 §2 prior with the foreign peer-message mean. If the agent has not updated yet this period (no post exists), it falls back to $\mathrm{FV}_t$. This $V_m$ is the ground-truth private belief the broadcast is about to describe, truthfully or otherwise.
Step 2 — apply the communication strategy $\sigma_m$. Every utility agent is assigned one of three modes at sampling time, persistent across rounds. The type $\sigma_m$ is a character trait (fixed ex ante and for life); the realized distortion $\phi_m$ can change from period to period, through either stochastic jitter ($\sigma_m = H$, or $\sigma_m = B$ with $\delta_m = 0$) or the sender's changing inventory position ($\sigma_m = D$). Only fixed-sign biased senders ($\sigma_m = B$ with $\delta_m = \pm 1$) apply the same multiplier in every message. So the population of communication styles is static, while the realized distortion is recomputed for every broadcast.
Honest ($\sigma_m = H$): multiplies $V_m$ by a tiny uniform jitter of half-width $h = 0.01$ — i.e. $\pm 1\%$. This is the "I'll tell you what I believe, plus unavoidable reporting noise" mode; the informational content of the underlying belief is preserved. The jitter uses the seeded engine PRNG, so runs are reproducible from $(\text{population}, \text{seed})$.
Biased ($\sigma_m = B$): applies a fixed-sign tilt of magnitude $\gamma = 0.10$ in the direction set by the agent's persistent bias $\delta_m \in \{-1, 0, +1\}$. Pessimists ($\delta = -1$) always understate by 10%, optimists ($\delta = +1$) always overstate by 10%, and the $\delta = 0$ sub-case draws a uniform $\pm 10\%$ each period. No inventory dependence — the tilt direction is a character trait, not a tactical choice. Concretely: an optimist who privately believes $V_m = 90¢$ will broadcast $\hat v_m \approx 99¢$ every period, round after round, regardless of whether it is long, short, or flat.
Strategic (deceptive) ($\sigma_m = D$): the lie is conditional on the sender's inventory relative to its round-start endowment $q_m^0$. Asset-rich senders ($q_m > q_m^0$) multiply by $\kappa^+ = 1.18$ to pump the peer mean up and unload at a higher price (incentive: sell high); asset-poor senders ($q_m < q_m^0$) multiply by $\kappa^- = 0.82$ to depress the mean and buy cheap (incentive: buy low). Senders exactly at endowment fall through to a uniform $\pm \gamma$ draw: no directional incentive, symmetric noise. Worked example: a strategic agent starting at $q_m^0 = 3$ shares who accumulates to $q_m = 5$ mid-round will broadcast $\hat v_m = 1.18 \cdot V_m$ every period until its inventory falls back to $q_m^0$ or below. This is the channel through which a run develops rumour-driven price bias that is uncorrelated with the asset's intrinsic payoff.
Step 3 — clamp and cache. $\hat v_m = \max(0, V_m \cdot \phi_m)$ so a large negative draw can never produce a negative broadcast. The sender stores the clamped value on agent.reportedValuation; the receiver reads it as msg.claimedValuation.
Step 4 — aggregation into the public signal. Every receiver $i$ averages the $\hat v_m$ over the set $M_i$ of non-self messages it got this period into a peer mean $\bar m_{i,t} = \frac{1}{|M_i|} \sum_{m \in M_i} \hat v_m$, and the §3 posterior blends $\bar m_{i,t}$ with its own prior at weight $(1 - \omega_i)$. Receivers therefore update on distorted public signals, not on true private valuations. When the self-weight $\omega_i$ is low — i.e. when agents lean heavily on peers, as novices do before the experience ramp saturates — any systematic wedge ($\sigma_m = B$ with a majority-sign population, or $\sigma_m = D$ with a persistent long/short imbalance) feeds back into the next round's priors, which distorts the next round's $\hat v_m$, and so on. This is the model's microfoundation for bubble formation: bubbles do not require incorrect private beliefs, only the interaction between belief formation and distorted communication.
Step 5 — derived labels on the same wire message. Two flags ride along for diagnostics:
The signal is not measured against $V_m$ — it is measured against the market reference ($P_t^{\text{last}}$, or $\mathrm{FV}_t$ if no trade has cleared yet). This is deliberate: receivers see whether the claim is bullish or bearish relative to the market, not relative to the sender's private belief (which is unobservable). The deceptive flag is ground truth (it uses $V_m$, which the sender knows) and is consumed by the logger and by the trust tracker, not by the agents themselves — so an agent cannot "see" that a peer is lying; it can only learn this indirectly, via the Pairwise Trust Dynamics card below.
Two global kill switches ride on top of this formula: (i) Deception toggle: when Advanced → Deception is off, the engine overwrites $\hat v_m \leftarrow V_m$ after posting, collapsing all modes to truth-without-jitter. (ii) Plan II / III short-circuit: under Plans II/III the entire algorithmic communication round is skipped (see js/engine.js); utility agents do not compute $\hat v_m$ at all, and any peer information is carried inside the LLM prompt $\pi^{\text{II/III}}$ instead. The reported value as defined here is therefore a Plan I concept, and the Message dots in Figure 3 of the Market tab only render under Plan I.
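Steps 2 and 3 of this card can be sketched as follows. The constants are the values quoted in the text; the function names, the shape of the `sender` object, and the `rng` argument (any function returning a number in [0, 1)) are illustrative assumptions and are not taken from js/agents.js.

```javascript
// Parameter values quoted in the text.
const H_JITTER = 0.01;   // honest jitter half-width h
const GAMMA = 0.10;      // biased tilt magnitude
const KAPPA_UP = 1.18;   // strategic overstatement when above endowment
const KAPPA_DOWN = 0.82; // strategic understatement when below endowment

// Uniform draw on [-halfWidth, +halfWidth).
function uniformSym(rng, halfWidth) {
  return (2 * rng() - 1) * halfWidth;
}

// sender: { strategy: "H" | "B" | "D", bias: -1 | 0 | 1, inventory, endowment }
function distortion(sender, rng) {
  switch (sender.strategy) {
    case "H": // honest: small reporting noise only
      return 1 + uniformSym(rng, H_JITTER);
    case "B": // biased: fixed-sign tilt, or a fresh draw when bias is 0
      return sender.bias === 0 ? 1 + uniformSym(rng, GAMMA) : 1 + sender.bias * GAMMA;
    case "D": // strategic: direction depends on inventory vs endowment
      if (sender.inventory > sender.endowment) return KAPPA_UP;
      if (sender.inventory < sender.endowment) return KAPPA_DOWN;
      return 1 + uniformSym(rng, GAMMA);
    default:
      throw new Error("unknown strategy: " + sender.strategy);
  }
}

// Step 3: clamp at zero so the broadcast is never negative.
function claimedValuation(privateValuation, sender, rng) {
  return Math.max(0, privateValuation * distortion(sender, rng));
}

// Optimist who privately believes 90 broadcasts roughly 99.
const optimist = { strategy: "B", bias: 1, inventory: 3, endowment: 3 };
console.log(claimedValuation(90, optimist, Math.random));
```

`Math.random` is used here only to keep the sketch self-contained; the text states that the simulator draws from a seeded engine PRNG, so a reproducible run would pass that generator instead.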
$\lambda$ is the EMA learning rate (not to be confused with the action variable $\alpha_{i,t}$). Receiver $r$'s trust in sender $s$ is reinforced when $s$'s claimed valuation $\hat{v}_s$ (see Plan I card) tracks the period's volume-weighted average price $\mathrm{VWAP}_t = \sum_j p_j q_j / \sum_j q_j$, where $p_j$ and $q_j$ are the price and quantity of trade $j$ in period $t$. Closeness is clipped to $[0, 1]$; $\tau$ is initialized at $0.5$ (neutral) with self-trust fixed at $1.0$. Logged on every plan, read by Plan II/III prompts for context.
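A rough sketch of the trust update follows. Only the VWAP definition, the clipping of closeness to $[0, 1]$, and the neutral start at $0.5$ come from the text. The closeness measure (one minus the relative error against VWAP), the standard EMA form, and the value $\lambda = 0.2$ in the example are assumptions made for illustration, as are all function names.

```javascript
// Volume-weighted average price over one period's trades: sum(p*q) / sum(q).
function vwap(trades) {
  const volume = trades.reduce((s, t) => s + t.quantity, 0);
  if (volume === 0) return null; // no trade cleared this period
  return trades.reduce((s, t) => s + t.price * t.quantity, 0) / volume;
}

// ASSUMPTION: closeness is 1 minus the relative error, clipped to [0, 1].
function closeness(claim, reference) {
  if (reference <= 0) return 0;
  const raw = 1 - Math.abs(claim - reference) / reference;
  return Math.min(1, Math.max(0, raw));
}

// ASSUMPTION: standard EMA update with learning rate lambda.
function updateTrust(trust, claim, reference, lambda) {
  return (1 - lambda) * trust + lambda * closeness(claim, reference);
}

const trades = [{ price: 90, quantity: 2 }, { price: 96, quantity: 1 }];
const reference = vwap(trades); // (180 + 96) / 3 = 92
// A claim that matches VWAP moves trust from the neutral 0.5 toward 1.
console.log(updateTrust(0.5, 92, reference, 0.2));
// A claim far from VWAP has closeness 0 and pulls trust down.
console.log(updateTrust(0.5, 200, reference, 0.2));
```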
$\pi_i^{\text{II}}$ is a structured prompt injected with: a per-asset 【Asset Environment】 block (v3 §5.3/§5.9/§5.15/§5.21/§5.27/§5.33: dividend rule, horizon, model-based FV formula, and the canonical heuristic mistake for the active asset, read from market.assetType.agentTemplate), generic double-auction market rules, the agent's private state ($c_i$, $q_i$, $\hat{V}_i$, $w_0 = c_i + q_i \cdot \hat{V}_i$), the universal CRRA utility formula $U(w; \rho_i)$ with the agent's actual sampled $\rho_i$ value substituted in (so the LLM sees the same exact curve the EU evaluator scores under), current book state (best bid/ask), peer trust scores $\tau_{r \to s}$, and received messages $\{\hat{v}_m\}$. The LLM returns a discrete action $\alpha^\star_{i,t}$ from the five-element set listed in the prompt (BUY_NOW, SELL_NOW, BID, ASK_1, HOLD); the engine translates it to an order at the current book price. Fire-and-forget: a failed or invalid action falls back to the Plan I EU evaluation. See Figure 5 for the full prompt and the six per-asset variants.
Identical wiring to Plan II but the prompt omits the closed-form $U(w; \rho_i)$ expression and only names the risk-preference category. Isolates the effect of giving the LLM an explicit functional form versus just a label. Same five-action output set.
Regulator Warning — Bubble-Ratio Prompt Injection
Plan II / III only · Advanced settings slider
$$ \rho_t \;=\; \frac{|P_t - \mathrm{FV}_t|}{\mathrm{FV}_t}, \qquad \rho_t \;\geq\; \theta \;\Rightarrow\; \text{warning fires once this round} $$
An exogenous public alert that simulates a market regulator. The Advanced settings slider sets the bubble-ratio threshold $\theta$ as a percentage of $\mathrm{FV}_t$ (the slider value $\in [0, 100]$ divides by 100; $\theta = 0$ is the canonical "off" state and the default). At every period boundary the engine evaluates $\rho_t = |P_t - \mathrm{FV}_t| / \mathrm{FV}_t$; the first time $\rho_t \geq \theta$ within a round it sets a sticky $\texttt{ctx.regulatorWarning} = \{\rho_t,\, \theta,\, \text{period},\, \text{round},\, P_t,\, \mathrm{FV}_t\}$ and logs a $\texttt{regulator\_warning}$ event. For the rest of that round, $\mathtt{AI.getPlanBeliefs}$ prepends a top-of-prompt block ⚠️ REGULATOR WARNING ⚠️ to every Utility agent's $\pi^{\text{II/III}}_{\text{usr}}$, naming the bubble ratio and reminding the agent the asset's intrinsic payoff has not changed; the warning clears at the next $\texttt{round\_start}$ so each round starts clean. Plan I has no LLM channel, so under Plan I the toggle is recorded in the snapshot and the warning event still fires (a replay shows where the regulator would have intervened) but agent behavior is unchanged. The Regulator tile is gated by the body class $\texttt{plan-ii}$ / $\texttt{plan-iii}$ and is hidden under Plan I.
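The once-per-round logic can be sketched as below. The record fields follow the description of `ctx.regulatorWarning` above; the function names and signatures are illustrative assumptions, and the prices in the example are taken from Round 2 of the sample history in the user prompt further down this tab.

```javascript
// Hypothetical sketch of the period-boundary regulator check.
function checkRegulator(ctx, { price, fv, period, round }, sliderPercent) {
  const theta = sliderPercent / 100;       // slider is a percentage of FV
  if (theta === 0) return null;            // theta = 0 is the "off" state
  if (ctx.regulatorWarning) return ctx.regulatorWarning; // sticky for the round
  const ratio = Math.abs(price - fv) / fv; // bubble ratio
  if (ratio >= theta) {
    ctx.regulatorWarning = { ratio, theta, period, round, price, fv };
  }
  return ctx.regulatorWarning;
}

// Clears the sticky warning so each round starts clean.
function onRoundStart(ctx) {
  ctx.regulatorWarning = null;
}

const ctx = { regulatorWarning: null };
checkRegulator(ctx, { price: 115, fv: 95, period: 2, round: 2 }, 30); // ratio about 0.21, below 0.30
checkRegulator(ctx, { price: 125, fv: 90, period: 3, round: 2 }, 30); // ratio about 0.39, fires
checkRegulator(ctx, { price: 122, fv: 85, period: 4, round: 2 }, 30); // already fired, record unchanged
console.log(ctx.regulatorWarning.period); // 3
```

Returning the stored record on later calls is what makes the warning fire only once per round, even when the ratio stays above the threshold.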
Figure 4 — Price Discovery and Market-Quality Diagnostics
$\bar{p}_t$ is the mean trade price in global period $t$ (averaged over all trades in that period); $\bar{\bar{p}}$ is the grand mean of $\bar{p}_t$ across all traded periods; $\mathrm{FV}_t$ is the fundamental value at period $t$. Haessel $R^2$ measures how closely per-period mean prices fit the fundamental staircase — it can be negative if mispricing exceeds the sample variance of $\bar{p}_t$. ND sums the absolute deviation $|p_j - \mathrm{FV}_{t(j)}|$ of each individual trade $j$ weighted by its quantity $q_j$, divided by total shares outstanding $Q = \sum_i q_i$; here $t(j)$ is the period in which trade $j$ occurred.
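Both statistics are short reductions over the trade log. The sketch below assumes the Haessel $R^2$ takes the form $1 - \sum_t (\bar{p}_t - \mathrm{FV}_t)^2 / \sum_t (\bar{p}_t - \bar{\bar{p}})^2$, which matches the description above (it goes negative when mispricing exceeds the variance of $\bar{p}_t$) but is reconstructed from that description rather than copied from the source. The function names and record shapes are hypothetical.

```javascript
// periods: [{ meanPrice, fv }], one entry per traded period.
function haesselR2(periods) {
  const grandMean = periods.reduce((s, p) => s + p.meanPrice, 0) / periods.length;
  const sse = periods.reduce((s, p) => s + (p.meanPrice - p.fv) ** 2, 0);
  const sst = periods.reduce((s, p) => s + (p.meanPrice - grandMean) ** 2, 0);
  return 1 - sse / sst;
}

// trades: [{ price, quantity, fv }], where fv is the FV of the trade's period.
function normalizedDeviation(trades, sharesOutstanding) {
  const weighted = trades.reduce(
    (s, t) => s + Math.abs(t.price - t.fv) * t.quantity, 0);
  return weighted / sharesOutstanding;
}

// Prices that sit exactly on the staircase give R^2 = 1.
console.log(haesselR2([
  { meanPrice: 100, fv: 100 }, { meanPrice: 95, fv: 95 }, { meanPrice: 90, fv: 90 },
]));
// Prices that rise while FV falls give a negative R^2.
console.log(haesselR2([
  { meanPrice: 110, fv: 100 }, { meanPrice: 140, fv: 95 }, { meanPrice: 170, fv: 90 },
]));
// (|110 - 100| * 2 + |90 - 95| * 1) / 10 shares = 2.5 cents per share.
console.log(normalizedDeviation([
  { price: 110, quantity: 2, fv: 100 }, { price: 90, quantity: 1, fv: 95 },
], 10));
```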
$A$ (amplitude) measures the peak-to-trough excursion of the mean-price residual $\bar{p}_t - \mathrm{FV}_t$, normalized by the initial fundamental $\mathrm{FV}_1 = \mathbb{E}[d] \cdot T = 100$¢. $\mathrm{TO}$ (turnover) sums the quantity $q_j$ across all trades $j$ and divides by total shares outstanding $Q = \sum_i q_i$ (conserved under double-auction trades); a value of $1.0$ means every share changed hands once. $\mathrm{AE}$ (allocative efficiency) is the ratio of realized to optimal aggregate valuation: $\hat{V}_i$ is agent $i$'s subjective valuation ($= V_i^{\text{post}}$), $q_i$ is agent $i$'s current inventory, and $\hat{V}_{\max} = \max_i \hat{V}_i$; the optimal allocation assigns all $Q$ shares to the highest-valuation agent. All three are reported in the Market-Quality Statistics panel and the batch results table.
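The three definitions above translate directly into code. This is a minimal sketch with hypothetical function names and record shapes; the example numbers reuse three periods of Round 1 from the sample history shown later on this tab.

```javascript
// periods: [{ meanPrice, fv }]; fv1 is the initial fundamental value (100 cents).
function amplitude(periods, fv1) {
  const residuals = periods.map((p) => p.meanPrice - p.fv);
  return (Math.max(...residuals) - Math.min(...residuals)) / fv1;
}

// trades: [{ quantity }]; a result of 1.0 means every share changed hands once.
function turnover(trades, sharesOutstanding) {
  return trades.reduce((s, t) => s + t.quantity, 0) / sharesOutstanding;
}

// agents: [{ valuation, inventory }]; the optimum gives all shares to the
// highest-valuation agent, so the denominator is vMax * Q.
function allocativeEfficiency(agents) {
  const totalShares = agents.reduce((s, a) => s + a.inventory, 0);
  const vMax = Math.max(...agents.map((a) => a.valuation));
  const realized = agents.reduce((s, a) => s + a.valuation * a.inventory, 0);
  return realized / (vMax * totalShares);
}

// Residuals +10 (p1), +100 (p4), 0 (p20): amplitude = (100 - 0) / 100 = 1.
console.log(amplitude([
  { meanPrice: 110, fv: 100 }, { meanPrice: 185, fv: 85 }, { meanPrice: 5, fv: 5 },
], 100));
// Six shares traded against six outstanding.
console.log(turnover([{ quantity: 2 }, { quantity: 1 }, { quantity: 3 }], 6));
// Realized 100*2 + 80*2 = 360 against an optimum of 100*4 = 400.
console.log(allocativeEfficiency([
  { valuation: 100, inventory: 2 }, { valuation: 80, inventory: 2 },
]));
```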
Plan II/III build one system prompt shared across all six asset environments. The prompt carries the full v3 behavioral framework: the §2 one-period-ahead decomposition $V = \lambda\cdot\widetilde{\mathrm{FV}} + (1-\lambda)\cdot H$ with $H = \beta_1\cdot\text{Anchor} + \beta_2\cdot\text{Trend} + \beta_3\cdot\text{DividendSignal} + \beta_4\cdot\text{Narrative}$, the subjective valuation $V^{\text{subj}}_{i,t} = \alpha_i\cdot\widetilde{\mathrm{FV}}_{i,t} + (1-\alpha_i)\cdot H_{i,t} + \varepsilon_{i,t}$, the peer-weighted posterior $V^{\text{post}}_{i,t} = w_i\cdot V^{\text{subj}}_{i,t} + (1-w_i)\cdot \bar m_t$, the reported value $\hat v_{m,t} = \max(0,\,V_{m,t}\cdot\varphi_m)$ with communication style $\sigma_m \in \{H,B,D\}$, and the market signal $\bar m_t$ formed as the mean of reported values. A Parameter Configuration block is appended last and pulls the current Advanced-panel values for $\alpha_i$, $\sigma_i$, $\omega_i$, the four $\beta$ heuristic weights, and the learning rates $\gamma_\alpha$, $\gamma_\sigma$, so edits in Advanced → Experience / Heuristic Mix flow straight into the system prompt. A trio of empty-book sections — Liquidity and Market Initiation, Price Formation When Market Is Empty (bid = $V_{i,t}\cdot(1-\varepsilon)$, ask = $V_{i,t}\cdot(1+\varepsilon)$ with $\varepsilon \in [0.01,\,0.05]$), and Mandatory Participation Rule — remind the LLM that HOLD is not an excuse when the book is empty. The BID / ASK_1 action lines in the user prompt already fall back to an FV-anchored quote when best_bid or best_ask is missing (_translateLLMAction in agents.js uses the same anchor so the executed price matches). The user prompt then splices the 【Asset Environment】 block from the round's active asset — the one the engine installed via the Advanced → Session Replacement Rate, Pre/Post Asset & FV Correlation grid. 
When the engine swaps asset at the round-4 replacement boundary the user prompt swaps with it: dividend rule, horizon, model-based FV formula, and the canonical heuristic mistake all change together. A second conditional addendum, Bounded Rationality (toggled in the AI endpoint panel), is concatenated onto the system prompt; it is documented in the red card below the figure, at the very bottom of this tab. The sample below is the exact prompt AI.getPlanBeliefs generates for agent_2 (risk-neutral, Plan II) at round 3, period 3, with roundsPlayed = 2, trading the linear-declining (DLM) asset; the third card shows the 【Asset Environment】 variant for each of the six asset types.
System Prompt · $\pi^{\mathrm{II}}_{\text{sys}}$
Identical across agents, plans, ticks, and active asset
You are a trader in an experimental double-auction asset market. Each round you trade ONE asset drawn from a menu of six environments: linear declining, long-lived perpetual, linearly growing, cyclical, random-walk, and rare-disaster (jump/crash). The Asset Environment block in your user prompt names the current round's environment and gives you the public rule (dividend process, horizon, discount rate) — so the model-based fundamental value FV_t is derivable from the rule alone.
Your sole objective is to pick the action that maximises your expected utility right now. Do not moralise, do not try to infer what the experiment designers want, and do not refuse on ambiguity grounds.
--------------------------------------------------
EXTENDED VALUATION AND BEHAVIORAL FRAMEWORK
--------------------------------------------------
Universal valuation structure (v3 §2) — the agent's one-period-ahead valuation decomposes as
V_{i,t} = λ_i · FṼ_{i,t} + (1 − λ_i) · H_{i,t},
where FṼ_{i,t} is the model-based fundamental value derived from the public rule for the active asset, and H_{i,t} is a heuristic mix:
H_{i,t} = β1·Anchor + β2·Trend + β3·DividendSignal + β4·Narrative.
The Asset Environment block tells you which heuristic mistakes are common for this environment; use that to reason about the gap between price and FV instead of assuming other traders are rational.
--------------------------------------------------
DETAILS OF HEURISTIC COMPONENTS
--------------------------------------------------
You should interpret the heuristic terms as follows:
- Anchor: reliance on reference points such as initial price, typical value levels, or salient benchmarks.
- Trend: extrapolation from recent price movements (momentum or reversal).
- DividendSignal: inference based on observed dividends and their average level.
- Narrative: qualitative beliefs or "stories" about the asset that may justify deviations from fundamentals.
--------------------------------------------------
SUBJECTIVE VALUATION
--------------------------------------------------
Your private valuation (subjective belief) is formed as:
V^{subj}_{i,t} = α_i · FṼ_{i,t} + (1 − α_i) · H_{i,t} + ε_{i,t}
where:
- α_i determines how much you trust fundamentals versus heuristics
- ε_{i,t} represents noise or imperfect reasoning
--------------------------------------------------
PEER LEARNING AND MARKET INFLUENCE
--------------------------------------------------
You do not rely solely on your own valuation. You also observe market-level signals.
Your effective valuation is:
V^{post}_{i,t} = w_i · V^{subj}_{i,t} + (1 − w_i) · m̄_t
where:
- m̄_t = average reported valuations from other traders
- w_i reflects how much you trust your own estimate vs. the market
Interpretation:
- High w_i → rely on your own reasoning
- Low w_i → follow the crowd
Important:
Other traders' reported values may NOT reflect their true beliefs.
--------------------------------------------------
REPORTED VALUE AND COMMUNICATION
--------------------------------------------------
Each trader may communicate a value to the market that differs from their true belief.
Let your internal valuation be:
V_{m,t} = V^{post}_{m,t}
Your reported value is:
v̂_{m,t} = max(0, V_{m,t} · φ_m)
This is what other agents observe.
--------------------------------------------------
COMMUNICATION TYPES
--------------------------------------------------
Each agent has a communication style:
σ_m ∈ {H, B, D}
- H (honest): reports true value with small noise
- B (biased): systematically overstates or understates value
- D (strategic): distorts value based on trading incentives
Strategic intuition:
- If holding many assets → incentive to report high value (sell high)
- If holding few assets → incentive to report low value (buy cheap)
Important:
You observe others' reported values, NOT their true beliefs.
--------------------------------------------------
MARKET SIGNAL FORMATION
--------------------------------------------------
Market belief is formed as:
m̄_t = average of all reported values
This means:
- Market signals may be biased or manipulated
- Prices may reflect distorted beliefs
--------------------------------------------------
KEY BEHAVIORAL INSIGHT
--------------------------------------------------
Do NOT assume other traders are rational or truthful.
Instead:
- They may use heuristics
- They may follow trends
- They may misreport intentionally
- Market sentiment may be misleading
You should reason strategically given this environment.
--------------------------------------------------
Important Rules:
--------------------------------------------------
1. You must select exactly one action from the given set of actions.
2. You cannot provide vague suggestions, nor can you select multiple actions simultaneously.
3. You cannot say "depends on" or "insufficient information." You must make the best decision based on the given information.
4. You must prioritise immediate execution, rather than defaulting to placing only orders.
5. You can accept the current best ask (buy immediately) or accept the current best bid (sell immediately).
6. If you choose to place an order, you should prioritize choosing a price that comes from the allowed set of candidate prices.
7. Your output must strictly conform to the specified format.
8. If there is no current bid or ask in the market, you are still allowed (and expected) to initiate trading by submitting your own bid or ask price. You do not need existing market quotes to place orders.
9. In addition to maximizing expected utility, you should actively participate in price discovery. Providing liquidity by submitting bids and asks is part of optimal behavior, especially when the market is empty or inactive.
10. If no executable price is available, you should propose a reasonable bid or ask based on your valuation rather than choosing HOLD.
11. When the market is empty:
- If your valuation V > FV: submit a bid slightly below your valuation
- If your valuation V < FV: submit an ask slightly above your valuation
- If uncertain: submit a mid-price around your valuation
Do not remain inactive.
12. You can:
- accept existing prices (if available), OR
- create new prices by submitting bids or asks
Submitting quotes is always feasible, even when no market exists.
--------------------------------------------------
LIQUIDITY AND MARKET INITIATION
--------------------------------------------------
The market may start with no bids or asks.
In such cases:
- You are expected to actively initiate trading
- You can and should submit your own bid or ask prices
- You do NOT need existing market quotes to act
Important:
Choosing HOLD purely because the market is empty is NOT optimal behavior.
Instead:
- If you believe the asset is undervalued → submit a bid
- If you believe the asset is overvalued → submit an ask
- If uncertain → submit a reasonable quote near your valuation
Providing liquidity is part of rational behavior in this market.
--------------------------------------------------
PRICE FORMATION WHEN MARKET IS EMPTY
--------------------------------------------------
If there are no current bids or asks:
You must construct your own price.
Use your valuation V_{i,t} as a reference.
Rules:
- Bid price = V_{i,t} × (1 − ε)
- Ask price = V_{i,t} × (1 + ε)
where ε is between 0.01 and 0.05.
Example:
If V = 75:
- reasonable bid: 70–74
- reasonable ask: 76–80
You do NOT need existing quotes to determine prices.
--------------------------------------------------
MANDATORY PARTICIPATION RULE
--------------------------------------------------
You are NOT allowed to choose HOLD simply because the market is empty.
If no trades exist, you must:
- Submit a bid, or
- Submit an ask
Market initiation is part of optimal behavior.
At the beginning of each round, at least one agent should initiate trading.
--------------------------------------------------
PARAMETER CONFIGURATION (CURRENT SETTING)
--------------------------------------------------
You operate under the following parameter values:
Fundamental weight:
α_i = 1.00
→ You mostly trust model-based valuation
Noise level:
σ_i = 5.0
→ Moderate uncertainty in valuation
Self-weight (confidence vs market):
ω_i = 0.60
→ You partially rely on market signals
Heuristic weights:
β1 (Anchor) = 0.50
β2 (Trend) = 0.20
β3 (DividendSignal) = 0.20
β4 (Narrative) = 0.10
→ Your heuristics are dominated by anchoring, with moderate trend following and dividend signals
Learning parameters:
γ_α = 0.15
γ_σ = 0.30
→ Experience gradually increases reliance on fundamentals and reduces noise
--------------------------------------------------
INTERPRETATION
--------------------------------------------------
You should interpret these parameters as behavioral tendencies:
- Higher α_i → more fundamental-driven decisions
- Higher ω_i → more independent from the market
- Higher β2 → stronger trend-following
- Higher β1 → stronger anchoring bias
Use these tendencies when forming your valuation and decision.
User Prompt · $\pi^{\mathrm{II}}_{\text{usr}}(\text{agent}_2)$ · asset = Linear Declining (DLM)
You are a trader in the market, agent_2.
【Your Type】
- Risk Preference Type: Risk neutral
Makes decisions based on expected returns
- Your utility function: U(w; ρ) = (w / w0)^(1 − ρ), with ρ = 0.000 (linear, EV-indifferent)
w0 (initial wealth) = 1300 cents.
【Your Past Experience in This Market】
You have already traded 2 rounds in this market. The records below are the price paths you observed and the payoff you earned. Use them to judge how seriously to weight fundamental value vs. recent prices and short-term trends — your own memory is the best guide.
Round 1 (your first in this market):
- FV path (p1..p20): 100 / 95 / 90 / 85 / 80 / 75 / 70 / 65 / 60 / 55 / 50 / 45 / 40 / 35 / 30 / 25 / 20 / 15 / 10 / 5
- Last-trade price path: 110 / 140 / 170 / 185 / 180 / 165 / 150 / 130 / 110 / 92 / 78 / 65 / 52 / 42 / 33 / 26 / 20 / 14 / 9 / 5
- Peak price: 185 at p4 (FV then = 85, deviation +118%)
- Round-end last price: 5 (FV at p20 = 5; gap +0)
- Your end-of-round cash: 1480¢ (round-start mark-to-market wealth = 1300¢ = cash + shares × FV₁)
Round 2 (most recent):
- FV path (p1..p20): 100 / 95 / 90 / 85 / 80 / 75 / 70 / 65 / 60 / 55 / 50 / 45 / 40 / 35 / 30 / 25 / 20 / 15 / 10 / 5
- Last-trade price path: 102 / 115 / 125 / 122 / 112 / 98 / 84 / 72 / 62 / 54 / 47 / 41 / 36 / 31 / 26 / 22 / 18 / 13 / 9 / 5
- Peak price: 125 at p3 (FV then = 90, deviation +39%)
- Round-end last price: 5 (FV at p20 = 5; gap +0)
- Your end-of-round cash: 1365¢ (round-start mark-to-market wealth = 1300¢ = cash + shares × FV₁)
【Asset Environment】
- Asset name: Linear Declining (DLM)
- Asset type: Gradually depleting asset
- Horizon: Total remaining periods: 20. After period 20 the asset expires — no further payoffs and no residual value.
- Per-period dividend rule:
- 50% probability the dividend is 10
- 50% probability the dividend is 0
Expected per-period dividend E[d_t] = 5.
- Environmental notes:
- No terminal value — K_t = 0.
- Model-based valuation rule (what a rational agent would derive from the public rule):
Model-based FV at period t: FV_t = 5 × (T − t + 1) = E[d] × remaining periods. Undiscounted tail sum of expected dividends.
- Common heuristic mistake in this environment:
Naive agents anchor to the initial total value 5T and fail to internalise the declining path; they also over-weight the last observed price as a trend signal.
【Market Rules】
1. This is a 20-period double-auction market (one asset per round; the asset above is what you are trading this round).
2. The fundamental value FV_t is determined by the rule in the Asset Environment block, not by any fixed formula. All traders see the same rule.
3. Double-auction mechanics:
- You can buy the current lowest ask immediately.
- You can sell to the current highest bid immediately.
- You can submit a new bid.
- You can submit a new ask.
- You can also choose not to trade.
4. If you buy the current ask immediately, the transaction will be executed instantly at the lowest ask price.
5. If you sell the current bid immediately, the transaction will be executed instantly at the highest bid price.
6. The last price is only updated when a transaction occurs.
【Your Status】
- Current Cash: 965
- Current Asset Holdings: 4
【Current Market Status】
- Current Period: 3
- Current Remaining Periods k: 18
- Current Fundamental Value (FV): 90
- Last Price: 95
- Highest Bid: 88
- Lowest Ask: 94
- Previous Reference Price: 95
- This round so far (last trade per period): p1=98 (FV 100), p2=95 (FV 95)
【Your Decision-Making Principles】
You want to maximize the following intuitive utilities:
1. The higher the wealth, the better;
2. You evaluate expected returns linearly;
3. Buying at a price lower than the last traded price increases utility;
4. Selling at a price higher than the last traded price increases utility;
5. Holding too many positions increases inventory risk;
【Additional Requirements】
1. You cannot mechanically favor holding.
2. If the utility of immediate execution is similar to holding, you should prioritize actions that facilitate the trade.
3. You must consider "execution opportunities" valuable because not executing means you cannot improve your position.
4. When you hold a lot of assets, you should seriously consider selling; when you hold a lot of cash and fewer assets, you should seriously consider buying.
5. Towards the later stages, you should focus more on fundamental value than short-term resale opportunities.
【Role-Specific Guidance】
- As a risk-neutral trader, you should focus more on expected returns.
【Peer Messages from Last Period】
- agent 1: claimed value 85 cents
- agent 3: claimed value 78 cents
- agent 4: claimed value 92 cents
- agent 5: claimed value 80 cents
- agent 6: claimed value 74 cents
【You must choose one of the following actions】
0. A random percentage x = 5.50% (drawn uniformly from [1%, 10%]) has been generated for this period; the BID and ASK_1 prices below are the result of applying x to best_bid / best_ask.
1. BUY_NOW: Immediately buy 1 unit at the current lowest ask price (best_ask).
2. SELL_NOW: Immediately sell 1 unit at the current highest bid price (best_bid).
3. BID: Submit bid = best_bid*x = 93.
4. ASK_1: Submit ask = best_ask/x = 89.
5. HOLD: Do not trade.
The action you choose must maximize your wealth given the possible wealths generated from the five actions.
【Your Task】
Please briefly compare the available actions to determine which is most advantageous to you:
- Buy immediately
- Sell immediately
- Place a bid at best_bid*x
- Place an ask at best_ask/x
- Do not trade
Then output only one final action.
【Strict Output Format】
Reason: <Explain in 3-6 sentences why this action maximizes your wealth>
Action: <BUY_NOW / SELL_NOW / BID / ASK_1 / HOLD>
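The BID and ASK_1 prices in the sample above are consistent with reading x as a markup factor $(1 + x)$ and rounding to the nearest cent: $88 \times 1.055 = 92.84 \to 93$ and $94 / 1.055 \approx 89.10 \to 89$. The sketch below reproduces that arithmetic. The function name is hypothetical, and both the $(1 + x)$ reading and the rounding rule are inferred from the sample numbers, not confirmed against _translateLLMAction.

```javascript
// Hypothetical helper: scales the best quotes by the drawn percentage x.
// ASSUMPTION: "best_bid*x" means best_bid * (1 + x), and "best_ask/x" means
// best_ask / (1 + x), each rounded to the nearest cent.
function scaledQuotes(bestBid, bestAsk, xPercent) {
  const factor = 1 + xPercent / 100;
  return {
    bid: Math.round(bestBid * factor), // bid improves on the current best bid
    ask: Math.round(bestAsk / factor), // ask undercuts the current best ask
  };
}

console.log(scaledQuotes(88, 94, 5.5)); // { bid: 93, ask: 89 }
```

In this sample the scaled bid (93) sits above the scaled ask (89), so both quotes land inside the 88 / 94 spread.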
【Asset Environment】 · per-asset booklet
Click an asset to see the exact block AI.getPlanBeliefs splices into the user prompt above when the current round's active asset is that type. The rest of the prompt is identical across assets; only this block and the FV numbers (current FV, history paths) change when the engine swaps asset at the replacement-round boundary.
【Asset Environment】
- Asset name: Linear Declining (DLM)
- Asset type: Gradually depleting asset
- Horizon: Total remaining periods: 20. After period 20 the asset expires — no further payoffs and no residual value.
- Per-period dividend rule:
- 50% probability the dividend is 10
- 50% probability the dividend is 0
Expected per-period dividend E[d_t] = 5.
- Environmental notes:
- No terminal value — K_t = 0.
- Model-based valuation rule (what a rational agent would derive from the public rule):
Model-based FV at period t: FV_t = 5 × (T − t + 1) = E[d] × remaining periods. Undiscounted tail sum of expected dividends.
- Common heuristic mistake in this environment:
Naive agents anchor to the initial total value 5T and fail to internalise the declining path; they also over-weight the last observed price as a trend signal.
【Asset Environment】
- Asset name: Perpetual
- Asset type: Long-lived stable-yield asset
- Horizon: The yield environment is long-run stable — the asset does not deplete and has no terminal period.
- Per-period dividend rule:
- 50% probability the dividend is 6
- 50% probability the dividend is 4
Expected per-period dividend E[d_t] = 5.
- Environmental notes:
- Capital opportunity cost / discount rate r = 5% per period.
- Model-based valuation rule (what a rational agent would derive from the public rule):
Model-based FV at every period: FV_t = E[d] / r = 5 / 0.05 = 100 (constant over t).
- Common heuristic mistake in this environment:
Naive agents treat the perpetual as if it were finite-horizon and drift toward a declining mental model; peer messages about "price going up" can push them away from the flat 100 anchor.
【Asset Environment】
- Asset name: Linear Growth
- Asset type: Growth-type asset
- Horizon: The project's earning power improves over time for the full T-period horizon. No residual value after period T.
- Per-period dividend rule:
Expected dividend at period s: E[d_s] = 2 + 0.3·s
Each realisation fluctuates around this mean (Gaussian, σ = 1).
- Environmental notes:
- Capital opportunity cost / discount rate r = 5% per period.
- Model-based valuation rule (what a rational agent would derive from the public rule):
Model-based FV at period t (v5 §5.16, Gordon perpetuity): FV_t = (2 + 0.3·t) / r with r = 0.05. Rising environment — a perpetual growth asset with no terminal date; the path is monotonically rising in t.
- Common heuristic mistake in this environment:
Naive agents over-extrapolate the rising dividend schedule into a steeper-than-actual price path — the per-agent narrative trait g_i (growth optimism, v5 §5.17) plus one-sided momentum tilts the heuristic above the Gordon-perpetuity anchor even when FV rises one-for-one with E[d_t].
【Asset Environment】
- Asset name: Cyclical
- Asset type: Cyclical asset
- Horizon: The asset is driven by a business-cycle pattern. Cycle length ≈ 10 periods. 20 periods total.
- Per-period dividend rule:
Expected dividend cycles with period 10: E[d_t] = 5 + 2·sin(2π · (t−1) / 10).
Each realisation fluctuates around this mean (Gaussian, σ = 1).
- Environmental notes:
- You know a cycle exists, but may not know exactly which phase you are in.
- No explicit terminal date — future payoffs may extend indefinitely.
- Model-based valuation rule (what a rational agent would derive from the public rule):
Model-based FV at period t (v6 §5.22, Gordon perpetuity): FV_t = (5 + 2·sin(2π · (t−1) / 10)) / r with r = 0.05. The path oscillates between 60 and 140 around a mean of 100, period 10.
- Common heuristic mistake in this environment:
Naive agents mistake the rising half of the cycle for a durable trend and the falling half for a crash; phase confusion is the dominant error.
【Asset Environment】
- Asset name: Random Walk Fundamental
- Asset type: Stochastically drifting asset
- Horizon: No fixed upward trend, downward trend, or cycle. The value environment is subject to persistent random shocks for 20 periods.
- Per-period dividend rule:
FV_{t+1} = max(20, FV_t + η_t), with η_t ~ Normal(0, σ=5).
Dividend at period t is backed out from the FV path: d_t = FV_t − FV_{t+1}/(1+r), floored at 0.
- Environmental notes:
- Current environment starts at FV_1 = 100. Future FV may rise or fall symmetrically.
- Model-based valuation rule (what a rational agent would derive from the public rule):
Model-based FV is a martingale: E[FV_{t+k} | FV_t] = FV_t for all k ≥ 0. Your best point estimate of future FV is today's FV_t (floored at 20).
- Common heuristic mistake in this environment:
Naive agents over-extrapolate recent moves — they treat a short up-run as a trend and a short down-run as a crash, instead of treating FV as memoryless.
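A short simulation makes the floored random walk and the backed-out dividend concrete. This is a sketch under stated assumptions: the `mulberry32` generator stands in for the simulator's seeded PRNG (which is not shown in the text), r = 0.05 is borrowed from the other asset cards, and all function names are hypothetical.

```javascript
// Seeded PRNG (mulberry32). Assumption: the simulator's real generator may differ.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via Box-Muller; 1 - rng() keeps the log argument in (0, 1].
function gaussian(rng) {
  const u1 = 1 - rng();
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// FV_{t+1} = max(20, FV_t + eta_t), eta_t ~ Normal(0, sigma = 5), FV_1 = 100.
function randomWalkFV(periods, rng, { fv1 = 100, sigma = 5, floor = 20 } = {}) {
  const fv = [fv1];
  for (let t = 1; t <= periods; t++) {
    fv.push(Math.max(floor, fv[t - 1] + sigma * gaussian(rng)));
  }
  return fv; // length periods + 1, so d_t is defined for every period
}

// d_t = max(0, FV_t - FV_{t+1} / (1 + r)); r = 0.05 is an assumption here.
function backedOutDividends(fv, r = 0.05) {
  return fv.slice(0, -1).map((v, i) => Math.max(0, v - fv[i + 1] / (1 + r)));
}

const fvPath = randomWalkFV(20, mulberry32(42));
const dividends = backedOutDividends(fvPath);
```

Re-running with the same seed reproduces the same path, which is the property the within-seed comparisons rely on.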
【Asset Environment】
- Asset name: Jump / Crash
- Asset type: Asset with rare-disaster (crash) risk
- Horizon: 20 periods. Calm periods drift FV up slightly; a small per-period crash probability can erase the gains of many calm periods at once.
- Per-period dividend rule:
FV moves each period by one of two jumps:
- 90% probability: +2 (calm drift up)
- 10% probability: −30 (rare crash)
Dividend at period t is backed out from the FV path: d_t = FV_t − FV_{t+1}/(1+r), floored at 0.
- Environmental notes:
- Current environment starts at FV_1 = 100. Floor at 5 (FV cannot fall below 5).
- Expected per-period drift E[ΔFV] = 0.9·(+2) + 0.1·(−30) = −1.2 — slightly negative on average.
- Model-based valuation rule (what a rational agent would derive from the public rule):
Model-based FV at period t using the stated probabilities: E[FV_{t+k} | FV_t] ≈ max(5, FV_t + k·(−1.2)). A correctly-calibrated agent anchors to this drift; an over-optimistic one discounts the crash probability.
- Common heuristic mistake in this environment:
Naive agents under-weight the 10% crash branch after a long calm run, treating +2 as the norm and getting caught when the crash hits.
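The drift arithmetic in the Jump / Crash card can be reproduced in a few lines. This is an illustrative sketch; `expectedDrift` and `projectedFV` are hypothetical names, and the projection is the card's approximation (the floor is applied to the mean path, not to each simulated path).

```javascript
// Two-branch jump distribution taken from the card.
const JUMPS = [
  { prob: 0.9, delta: 2 },   // calm drift up
  { prob: 0.1, delta: -30 }, // rare crash
];
const FV_FLOOR = 5;

// E[dFV] = sum over branches of prob * delta.
function expectedDrift(jumps) {
  return jumps.reduce((sum, j) => sum + j.prob * j.delta, 0);
}

// Card approximation: E[FV_{t+k} | FV_t] ~ max(5, FV_t + k * E[dFV]).
function projectedFV(fvNow, k) {
  return Math.max(FV_FLOOR, fvNow + k * expectedDrift(JUMPS));
}

console.log(expectedDrift(JUMPS).toFixed(2)); // per-period drift
console.log(projectedFV(100, 10).toFixed(2)); // ten periods ahead from FV = 100
```

Starting from FV = 100, the projection loses 12 over ten periods, even though nine out of ten individual periods move up by 2.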
How experience and the asset selector change the prompt
Two orthogonal axes of variation
By experience (roundsPlayed). The 【Your Past Experience】 block scales with roundsPlayed — one entry per round the agent has lived through. A fresh R4-⅓/R4-⅔ replacement (roundsPlayed = 0) sees no history block at all; in its place the prompt contains a single declarative sentence: "This is your first round in this market. You have never traded this asset before and have no memory of prior rounds — you only see the rules, the fundamental value, and whatever trading has happened so far in the current round." Nothing else changes. The rule-based experience labels that used to appear in 【Your Type】 and 【Role-Specific Guidance】 are gone — the LLM decides how to weight fundamentals versus recent prices from its own observed history, not from an instruction telling it to do so.
By asset. The 【Asset Environment】 block is the only section that changes when the engine swaps assets at the replacement-round boundary; the FV-path numbers in the history block are also re-read through market.fundamentalValue(p), so a round traded on Linear Growth reports the Gordon-perpetuity path FV_t = (2 + 0.3·t)/r (v5 §5.16), Cyclical reports a sinusoidal one, and so on. Plan III differs from Plan II only by omitting the explicit $U(w; \rho_i)$ formula from 【Your Type】; the six asset variants and the experience block are identical across Plans II and III.
Concatenated onto the end of the Figure 5 system prompt when the Bounded Rationality toggle in the AI endpoint panel is ON. Caps the LLM's cognitive budget so it trades like a human subject rather than a textbook optimiser: $K=3$ reasoning steps · $N=5$ attention slots · $T=3$ periods of price memory · perceived FV = true FV + $\varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$ with $\sigma = 10\text{¢}$ · execution noise $p = 0.10$ · one heuristic from {trend-following, mean-reversion, anchoring, randomized preference}. When the toggle is ON the user prompt's price paths (current round and remembered past rounds) are also truncated to the last $T$ periods — the LLM literally cannot see further back than a human's working memory. Plan I is unaffected: no LLM channel.
Bounded-Rationality Addendum · appended to $\pi^{\mathrm{II}}_{\text{sys}}$ / $\pi^{\mathrm{III}}_{\text{sys}}$
Identical across agents, plans, and asset types — a pure cognitive cap spliced in only when the toggle is active
====================
Cognitive Constraints (Bounded Rationality)
====================
1. You can only perform up to K = 3 reasoning steps.
2. You are NOT allowed to compute the exact fundamental value using the full dividend model.
3. If reasoning becomes complex, you must fall back on heuristics — do not try to solve the whole problem analytically.
====================
Belief Formation
====================
- Your perceived fundamental value is noisy:
perceived_value = true_value + ε, ε ~ Normal(0, σ²)
with σ = 10 cents. Treat FV numbers in the prompt as noisy signals, not ground truth.
====================
Attention Constraint
====================
- You can only consider up to N = 5 pieces of information when choosing your action. Pick the most decision-relevant ones and ignore the rest.
====================
Memory Constraint
====================
- You can only remember the last T = 3 periods of prices. The user prompt already truncates price paths to this window — do not try to infer older prices.
====================
Decision Heuristic
====================
You must commit to ONE of the following heuristics for this decision and name it in your Reason:
- Trend-following
- Mean-reversion
- Anchoring to past trades
- Randomized preference
====================
Execution Noise
====================
- With probability p = 0.10 a boundedly rational trader takes a suboptimal action. Factor this into your confidence, do not claim certainty.
====================
Action Rules
====================
1. You must select exactly one action.
2. No vague answers.
3. No "depends".
4. Immediate execution is preferred.
5. Prefer the allowed candidate prices for orders.
6. If the book is empty, initiate trading with a BID or ASK_1 anchored on your valuation rather than holding.
7. Output must follow the specified format.
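Two of the constraints above have a direct data-side counterpart: the price paths are truncated to the last T periods, and the perceived FV is the true FV plus Gaussian noise. The sketch below illustrates both under stated assumptions; `BR`, `truncatePricePath`, and `perceivedFV` are hypothetical names, and whether the engine perturbs FV numerically or only instructs the LLM to treat it as noisy is not specified in the text.

```javascript
// Bounded-rationality constants quoted in the addendum.
const BR = { K: 3, N: 5, T: 3, sigma: 10, p: 0.10 };

// Keep only the last T entries of a per-period price path.
// Math.max guards the short-path case (fewer than T periods so far).
function truncatePricePath(prices, T = BR.T) {
  return prices.slice(Math.max(0, prices.length - T));
}

// perceived_value = true_value + sigma * z, where z is a standard normal draw
// supplied by the caller (for example from a seeded Box-Muller sampler).
function perceivedFV(trueFV, z, sigma = BR.sigma) {
  return trueFV + sigma * z;
}

const visible = truncatePricePath([90, 95, 100, 105, 110]);
console.log(visible); // the three most recent prices
```

Passing the normal draw in as an argument keeps the function deterministic and leaves the choice of PRNG to the caller.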
Glossary & Reference
Abbreviations & indices
Term
Expansion
Meaning
Plan I
Algorithmic posterior
Deterministic baseline — $V_{i,t}^{\text{post}} = \omega_i\cdot V_{i,t}^{\text{prior}} + (1-\omega_i)\cdot\bar{m}_t$ with $\omega_i = 0.6 + 0.1\,\min(3, k_i)$ (the v3 §3 self (non-peer) weight, shared across all three plans).
Plan II
LLM posterior · utility form
One chat completion per Utility agent per period. LLM returns a discrete action from {BUY_NOW, SELL_NOW, BID, ASK_1, HOLD} together with a self-drawn $x \in [1\%, 10\%]$; BID posts $\text{best\_bid} \cdot x$ and ASK_1 posts $\text{best\_ask} / x$. Prompt includes the closed-form universal CRRA $U(w; \rho_i) = w^{1-\rho_i}/(1 - \rho_i)$ with the agent's actual sampled $\rho_i$ substituted.
Plan III
LLM posterior · risk label only
Same wiring as Plan II but the prompt only names the risk-preference category; no functional form is supplied. The action output set is the same as in Plan II.
DLM
Dufwenberg, Lindqvist & Moore (2005)
Source paper for the shared market substrate: $T$, $\mathbb{E}[d]$, $\mathrm{FV}_t$, and the four-round session loop.
U
Utility
EU-maximising agent — the sole agent class ($N = 100$). Per-period belief update is what Plans I, II, and III compare.
FV
Fundamental value
$\mathrm{FV}_t = \mathbb{E}[d] \cdot (T - t + 1)$ — risk-neutral value at the start of period $t$.
VWAP
Volume-weighted average price
Per-period average trade price weighted by quantity; baseline for the trust EMA update.
ND
Normalized deviation
Total absolute mispricing: $\mathrm{ND} = \sum_j |p_j - \mathrm{FV}_{t(j)}| \cdot q_j \,/\, Q$, where $j$ indexes trades, $q_j$ is trade quantity, and $Q$ is total shares outstanding.
R²
Haessel R²
Coefficient of determination of mean price against fundamental value.
TO
Turnover
Total shares traded divided by total shares outstanding — reports speculative intensity.
AE
Allocative efficiency
Realized aggregate valuation divided by the theoretical maximum: $\mathrm{AE} = \sum_i \hat{V}_i q_i \,/\, (\hat{V}_{\max} \cdot Q)$, where $\hat{V}_i = V_i^{\text{post}}$.
Session
10-session DLM batch
One click of Start runs 10 sessions (5 × first treatment + 5 × second treatment). Each session is a complete $R = 4$ round game; data is collected per round with labels $\texttt{R\{r\}\_S\{s\}}$.
Rr_Ss
Round–session label
Identifies Round $r$ of Session $s$ in the batch results table. Example: R3_S7 = round 3 of session 7.
T20 / T40
Treatment sizes (N = 100)
T20 (R4-⅔): 20 agents replaced in R4, 80 veterans remain. T40 (R4-⅓): 40 replaced, 60 veterans remain. First 5 sessions use the selected treatment, last 5 use the other.
Mathematical notation
Symbol
Definition
Where it appears
$\mathrm{FV}_t$
Fundamental value at the start of period $t$. $\mathrm{FV}_t = \mathbb{E}[d]\cdot(T - t + 1)$, with $\mathbb{E}[d] = \tfrac{1}{2}(0) + \tfrac{1}{2}(10) = 5$¢ and $T = 20$. Yields a staircase from $\mathrm{FV}_1 = 100$¢ to $\mathrm{FV}_{20} = 5$¢, resetting at every round boundary.
Shared substrate — drives every agent's prior (Figures 1–4)
$V_{i,t}^{\text{prior}}$
Agent $i$'s pre-blend valuation at period $t$ — v3 §2 decomposition: $V_{i,t}^{\text{prior}} = \max\!\bigl(0,\,[\alpha_i\!\cdot\!\widetilde{\mathrm{FV}}_{i,t} + (1-\alpha_i)\!\cdot\!H_{i,t}](1 + b_i) + \varepsilon_i\bigr)$. Identical across all three plans; only the posterior update (Step 3) differs.
Prior Formation stage (Figures 1–2)
$\widetilde{\mathrm{FV}}_{i,t}$
Agent $i$'s model-based fundamental value at period $t$ — the asset-specific closed form from v5 §5.{4,10,16,22,28,34}. For the Linear-Declining (DLM) asset this is $\widetilde{\mathrm{FV}}_{i,t} = 5\!\cdot\!(T - t + 1)$, so a rational $\alpha_i = 1$ trader recovers the public $\mathrm{FV}_t$ exactly; every other asset reads its per-asset form off the Figure-2 card.
v3 §2 prior — model-based term (Figure 2)
$H_{i,t}$
Four-term heuristic mix (v3 §4): $H_{i,t} = \beta_1\!\cdot\!\text{Anchor} + \beta_2\!\cdot\!\text{Trend} + \beta_3\!\cdot\!\text{DividendSignal} + \beta_4\!\cdot\!\text{Narrative}$ with default weights $(\beta_1,\beta_2,\beta_3,\beta_4) = (0.50, 0.20, 0.20, 0.10)$ from §6.2 — live-tunable via the green β-row in Advanced settings; the Σβ tile at the end of the row flips amber when the weights no longer sum to 1. Trend drops out at $t = 1$ (no prior period to difference against) per §6.3. The heuristic value enters the prior with weight $1 - \alpha_i$, so novices lean on $H$ while veterans anchor to $\widetilde{\mathrm{FV}}$.
v3 §2 prior — heuristic term (Figure 2)
$\alpha_i$
Per-agent fundamental weight (v3 §2, $\alpha_i \in [0, 1]$): $\alpha_i$ represents the agent's fundamental weight, i.e., the weight placed on the model-based valuation $\widetilde{\mathrm{FV}}_{i,t}$ in the prior, with the complement $1 - \alpha_i$ going to the heuristic reading $H_{i,t}$. Experience raises it via the §3.2 / §6.1 rule $\alpha_i = \min\{1,\, 0.4 + 0.15\, k_i\}$, so $\alpha_0 = 0.40$ is the novice intercept and $\gamma_\alpha = 0.15$ is the per-round slope — both tunable via the pink experience row in Advanced settings. Saturates at $1.00$ once $k_i \geq 4$. Blended toward $\alpha_0$ by $(1 - |\mathrm{corr}|)$ when the asset swaps post-replacement. Distinct from the paper's $\text{Anchor}$ term (first primitive of $H_{i,t}$). Rendered on agent cards as "Fundamental weight".
$\sigma_i$
Per-agent valuation-noise scale (v3 §3): $\sigma_i = \sigma_0\!\cdot\!\exp(-\gamma_\sigma\!\cdot\!k_i)$ with $\sigma_0 = 15$¢ and decay rate $\gamma_\sigma = 0.30$ (both tunable via the pink experience row in Advanced settings). Sets the standard deviation of the Gaussian $\varepsilon_i \sim \mathcal{N}(0, \sigma_i^2)$ added to the prior. Novices ($k_i = 0$) have $\sigma_0 = 15$¢; a three-round veteran has $\sigma_3 \approx 6.1$¢. Blended toward $\sigma_0$ post-asset-swap. Rendered on agent cards as "Valuation noise".
v3 §2 prior — Gaussian jitter scale
$\omega_i$
Per-agent self (non-peer) weight (v3 §3): $\omega_i = \omega_0 + \Delta_\omega\!\cdot\!\min(k_\omega, k_i)$ with $\omega_0 = 0.60$ (tunable via the pink experience row in Advanced settings), $\Delta_\omega = 0.10$, saturation horizon $k_\omega = 3$. So $\omega_i \in \{0.60, 0.70, 0.80, 0.90\}$ for $k_i \in \{0, 1, 2, \geq 3\}$ at the default anchor. Controls the Step-3 peer blend $V^{\text{post}} = \omega_i\!\cdot\!V^{\text{prior}} + (1-\omega_i)\!\cdot\!\bar{m}$ — $\omega_i$ is the weight on the agent's own prior, $1 - \omega_i$ is the weight on the peer-message mean. Blended toward $\omega_0$ post-asset-swap. Rendered on agent cards as "Self (non-peer) weight".
Plan I posterior; Plans II/III Step-3 blend
$\varepsilon_i \sim \mathcal{N}(0, \sigma_i^2)$
Per-tick Gaussian valuation noise drawn via Box–Muller over the seeded PRNG. The per-agent $\sigma_i$ shrinks exponentially in $k_i$, so novices have noisy priors and veterans sharpen. Gated by Advanced → Prior Noise; when the toggle is OFF, $\varepsilon_i = 0$.
v3 §2 prior — noise term (Figure 2)
$b_i = \delta_i \cdot \beta$
Persistent per-agent valuation bias. $\delta_i \in \{-1, 0, +1\}$ is the bias direction drawn at birth (pessimistic, unbiased, optimistic) and $\beta = 0.15$ is the bias magnitude. Applied multiplicatively on the $\alpha$-weighted $\widetilde{\mathrm{FV}}/H$ blend inside the v3 §2 prior. Gated by Advanced → Prior Bias.
Prior formation (Figure 2)
$k_i$
Agent $i$'s experience counter (v3 §3). Starts at $0$; incremented by $1$ at every round boundary for every surviving agent. Drives the triple $(\alpha_i, \sigma_i, \omega_i)$ — so $k_i$ controls fundamental weight, noise amplitude, and self (non-peer) weight simultaneously. Fresh R4 replacements restart at $k_i = 0$.
$\mathrm{corr}$
Asset-swap experience-transfer weight. When a session pairs different pre- and post-assets, $\mathrm{corr}$ is the Pearson correlation between the two assets' expected $\mathrm{FV}$ paths (sampled from a single seeded pre-round simulation; flat-path pairings coerce to $0$). From round 4 onward the experienced triple is blended toward the novice anchors as $x_{\text{new}} = |\mathrm{corr}|\!\cdot\!x_{\text{trained}} + (1 - |\mathrm{corr}|)\!\cdot\!x_0$ for $x \in \{\alpha, \sigma, \omega\}$. $|\mathrm{corr}| = 1$ preserves training; $|\mathrm{corr}| = 0$ resets to novice anchors.
Session-level asset-swap experience blend
$\hat{v}_m$
Claimed valuation reported by peer agent $m$. Computed as $\hat{v}_m = \max(0,\, V_m \cdot \phi_m)$ where $\phi_m$ is a distortion multiplier determined by $m$'s communication strategy $\sigma_m \in \{H, B, D\}$ (see Figure 3). The peer-message mean is $\bar{m} = \tfrac{1}{|M|}\sum_{m \in M} \hat{v}_m$ where $M$ is the set of non-self messages received this period.
Plan I posterior, blended with the prior via weight $1 - \omega_i$ (Figure 3)
$\sigma_m \in \{H, B, D\}$
Communication strategy of agent $m$: $H$ = truthful (small uniform jitter), $B$ = biased (fixed-sign tilt), $D$ = strategic (inventory-dependent over/understatement). Assigned at birth and persistent across rounds.
Distortion multiplier $\phi_m$ in $\hat{v}_m$ (Figure 3)
$\phi_m$
Communication distortion multiplier. $\phi_m = 1 + \mathcal{U}[-h, h]$ if $\sigma_m = H$; $\phi_m = 1 + \delta_m \gamma$ if $\sigma_m = B$; $\phi_m = \kappa^+$ or $\kappa^-$ if $\sigma_m = D$ (depending on $q_m$ vs $q_m^0$), with a $1 + \mathcal{U}[-\gamma, \gamma]$ fallback at $q_m = q_m^0$. Parameters: $h = 0.01$, $\gamma = 0.10$, $\kappa^+ = 1.18$, $\kappa^- = 0.82$.
$V_{i,t}^{\text{post}}$
Agent $i$'s period-end valuation — output of the v3 §3 Step-3 peer blend: $V_{i,t}^{\text{post}} = \omega_i\!\cdot\!V_{i,t}^{\text{prior}} + (1 - \omega_i)\!\cdot\!\bar{m}_t$, or $V_{i,t}^{\text{prior}}$ when no foreign messages arrived this period. Plan I computes this blend directly; Plans II/III cache an LLM-delivered posterior that short-circuits the blend when available and falls back to the same formula otherwise.
Becomes next period's prior in all three plans (Figure 3)
$U(w; \rho_i)$
Universal CRRA utility shared by every agent: $U(w;\rho) = w^{1-\rho}/(1-\rho)$, evaluated in normalized form $(w/w_0)^{1-\rho}$ so $U(w_0) = 1$. The per-agent coefficient $\rho_i$ is drawn uniformly from $(-1, 0)$ (risk-loving, strictly convex), pinned at $0$ (risk-neutral, linear), or drawn uniformly from $(0, 1)$ (risk-averse, strictly concave).
EU scoring; the substituted $\rho_i$ appears explicitly in Plan II prompts (Figures 2–3)
$w_0, w_1$
Wealth states for EU evaluation. $w_0 = c_i + q_i \cdot \hat{V}_i$ (wealth if no trade); $w_1 = (c_i \mp p_{\text{order}}) + (q_i \pm 1) \cdot \hat{V}_i$ (wealth if the order fills at price $p_{\text{order}}$; upper signs for a buy, lower signs for a sell), where $c_i$ is cash, $q_i$ is inventory, and $\hat{V}_i \equiv V_i^{\text{post}}$ is the agent's subjective valuation.
$p_{\text{fill}}$
Assumed fill probability for a non-crossing (passive) quote. Used in the EU functional: $\mathrm{EU}(\alpha) = p_{\text{fill}} \cdot U(w_1) + (1 - p_{\text{fill}}) \cdot U(w_0)$. For crossing actions (buy@$A_t$, sell@$B_t$), $p_{\text{fill}} = 1$ (deterministic); for passive actions (bid, ask), $p_{\text{fill}} = 0.30$ (tunable).
EU scoring — $\alpha^\star_{i,t}$ action evaluation (Figure 2)
$\alpha^\star_{i,t}$
Optimal action for agent $i$ at tick $t$. $\alpha^\star_{i,t} = \arg\max_\alpha \mathrm{EU}(\alpha)$ over the five-element set $\alpha \in \{\text{hold},\, \text{buy@}A_t,\, \text{sell@}B_t,\, \text{bid},\, \text{ask}\}$, where $A_t$ is the current best ask and $B_t$ is the current best bid. buy@$A_t$ lifts the resting ask (deterministic fill, $p_{\text{fill}} = 1$); sell@$B_t$ hits the resting bid (deterministic fill, $p_{\text{fill}} = 1$); bid and ask post passive quotes ($p_{\text{fill}} = 0.30$). Plans II/III use a seven-element LLM action set: $\{\text{BUY\_NOW, SELL\_NOW, BID\_1, BID\_3, ASK\_1, ASK\_3, HOLD}\}$.
Action selection — output of EU maximization (Figures 2–3)
$\tau_{r \to s}$
Trust of receiver $r$ in sender $s$. Updated by exponential moving average: $\tau_{r \to s} \leftarrow (1 - \lambda)\,\tau_{r \to s} + \lambda \cdot \text{closeness}_{r,s}$, where $\lambda = 0.30$ is the EMA learning rate and $\text{closeness} = \max\!\bigl(0,\, 1 - |\hat{v}_s - \text{VWAP}_t|\,/\,\text{VWAP}_t\bigr)$. Initialized at $0.5$; self-trust fixed at $1.0$.
Messaging diagnostic; context for Plan II/III prompts (Figure 3)
$\pi_i^{\text{II}}, \pi_i^{\text{III}}$
Structured LLM prompts for Plans II and III. $\pi^{\text{II}}$ includes market rules, agent state, and the explicit universal CRRA formula $U(w; \rho_i) = w^{1-\rho_i}/(1 - \rho_i)$ with the agent's sampled $\rho_i$. $\pi^{\text{III}}$ omits the formula and supplies only the risk-preference label.
LLM posterior — input to $\alpha^\star_{i,t} \leftarrow \text{LLM}(\pi_i)$ (Figure 3)
$Q$
Total shares outstanding, $Q = \sum_i q_i$, conserved under double-auction trades (shares transfer, never created or destroyed).
$\bar{p}_t$
Mean trade price in global period $t$. $\bar{p}_t = \sum_{j \in \mathcal{T}_t} p_j \,/\, |\mathcal{T}_t|$ where $\mathcal{T}_t$ is the set of trades in period $t$. Used as the basis for Haessel $R^2$ and amplitude.
Market-quality diagnostics (Figure 4, Table 1)
$R^2_{\text{Haessel}}$
Haessel (1978) coefficient of determination. $R^2 = 1 - \sum_t (\bar{p}_t - \mathrm{FV}_t)^2 \,/\, \sum_t (\bar{p}_t - \overline{\bar{p}})^2$. Measures how closely per-period mean prices fit the fundamental staircase; can be negative if mispricing exceeds sample variance.
Market-quality diagnostics (Figure 4, Table 1)
$\mathrm{ND}$
Normalized absolute price deviation. $\mathrm{ND} = \sum_j |p_j - \mathrm{FV}_{t(j)}| \cdot q_j \,/\, Q$, summing over all trades $j$ weighted by quantity, divided by total shares outstanding.
Market-quality diagnostics (Figure 4, Table 1)
$A$
Price amplitude. $A = \bigl(\max_t (\bar{p}_t - \mathrm{FV}_t) - \min_t (\bar{p}_t - \mathrm{FV}_t)\bigr) \,/\, \mathrm{FV}_1$. Peak-to-trough excursion of the mean-price residual, normalized by the initial fundamental.
Market-quality diagnostics (Figure 4, Table 1)
$\mathrm{TO}$
Turnover. $\mathrm{TO} = \sum_j q_j \,/\, Q$ — total shares traded (summing quantity $q_j$ over all trades $j$) divided by total shares outstanding $Q$. A value of $1.0$ means every share changed hands once.
Market-quality diagnostics (Figure 4, Table 1)
$\rho_t$
Price-to-fundamental ratio. $\rho_t = p_t \,/\, \mathrm{FV}_t$ (Lopez-Lira 2025), where $p_t$ is the most recent trade price at tick $t$. Values $> 1$ indicate overpricing; persistent $\rho_t \gg 1$ signals a bubble.
Market-quality diagnostics (Table 1)
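The four market-quality diagnostics defined in the table (Haessel $R^2$, ND, amplitude $A$, turnover TO) can be computed from a flat trade list. The following is a minimal sketch, assuming each trade is an object `{ period, price, qty }` and that FV is supplied as a function of the period; `marketQuality` is a hypothetical helper, not the simulator's own routine.

```javascript
// Computes Haessel R^2, ND, amplitude and turnover from a list of trades.
// trades: [{ period, price, qty }], fvOf: period -> FV, Q: shares outstanding.
function marketQuality(trades, fvOf, Q) {
  // Group trade prices by period to get the unweighted mean price per period.
  const byPeriod = new Map();
  for (const tr of trades) {
    if (!byPeriod.has(tr.period)) byPeriod.set(tr.period, []);
    byPeriod.get(tr.period).push(tr.price);
  }
  const periods = [...byPeriod.keys()].sort((a, b) => a - b);
  const meanPrice = periods.map((t) => {
    const ps = byPeriod.get(t);
    return ps.reduce((s, p) => s + p, 0) / ps.length;
  });
  const fv = periods.map(fvOf);
  const grandMean = meanPrice.reduce((s, p) => s + p, 0) / meanPrice.length;

  const sse = meanPrice.reduce((s, p, i) => s + (p - fv[i]) ** 2, 0);
  const sst = meanPrice.reduce((s, p) => s + (p - grandMean) ** 2, 0);
  const resid = meanPrice.map((p, i) => p - fv[i]);

  return {
    r2: 1 - sse / sst, // undefined (division by zero) if all mean prices are equal
    nd: trades.reduce((s, tr) => s + Math.abs(tr.price - fvOf(tr.period)) * tr.qty, 0) / Q,
    amplitude: (Math.max(...resid) - Math.min(...resid)) / fvOf(1),
    turnover: trades.reduce((s, tr) => s + tr.qty, 0) / Q,
  };
}

// Example with the DLM staircase FV_t = 5 * (20 - t + 1) and Q = 400 shares.
const fvOf = (t) => 5 * (20 - t + 1);
const trades = [
  { period: 1, price: 110, qty: 2 },
  { period: 1, price: 90, qty: 1 },
  { period: 2, price: 105, qty: 1 },
  { period: 3, price: 80, qty: 4 },
];
const metrics = marketQuality(trades, fvOf, 400);
console.log(metrics);
```

Periods with no trades are simply skipped here; how the simulator treats empty periods is not stated in the text, so that choice is an assumption of the sketch.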
Figures
1
Transaction Price Trajectory vs Fundamental Value
Tick-level price plotted against the deterministic step function $\mathrm{FV}_t$. Persistent excursions above the staircase mark the bubble; the final-period collapse is the crash.
2
Signed Mispricing
Signed departure $p_t - \mathrm{FV}_t$ on a symmetric axis around zero: premium fills blue above the baseline, discount fills red below. Carries the same information as Lopez-Lira's price-to-fundamental ratio $\rho_t$, since $p_t - \mathrm{FV}_t = \mathrm{FV}_t\,(\rho_t - 1)$, but keeps the sign and the ¢ scale.
3
Trade Volume per Period
Bar chart of per-period share quantity exchanged. A volume peak in the inflation phase followed by a cliff near $T$ is the SSW signature.
4
Transaction Density Heatmap
Two-dimensional trade histogram over (price, period). Warm cells concentrate liquidity; compared against the $\mathrm{FV}_t$ staircase reveals rational vs speculative regimes.
5
Agent Action Timeline
One row per agent, one cell per decision. Encodes the five-element action set $\alpha \in \{\text{hold},\,\text{buy@}A_t,\,\text{sell@}B_t,\,\text{bid},\,\text{ask}\}$ with a fill dot when the submitted order matched on the same tick.
6
Subjective Valuation · Per agent
Each Utility agent's period-end belief $V_i^{\text{post}}$ — the same trace regardless of which plan produced it, so trajectories can be compared across Plans I/II/III.
7
Pairwise Trust Matrix
Heatmap of $\tau_{r \to s}$ on $[0, 1]$ with the diagonal masked. Warm columns identify agents whose claims the population finds credible.
T1
Market-Quality Statistics (Table 1)
Live session metrics: Haessel $R^2$, normalized price deviations, amplitude, turnover, allocative efficiency, welfare, and deception statistics. Updates every render tick.
T2
10-Session Batch Results (Table 2)
Per-round market-quality metrics across the 10-session DLM batch. Each row is labelled $\texttt{R\{r\}\_S\{s\}}$ with mean deviation, turnover, trade count, volume, and aggregate payoff. Per-treatment aggregates summarise T20 vs T40 performance.
Source papers
Tag
Citation
Role in this simulator
DLM 2005
Dufwenberg, Lindqvist & Moore, Bubbles and Experience: An Experiment, AER 95(5), 1731–1737
Utility agent, EU scoring, risk functionals, trust EMA
SSW 1988
Smith, Suchanek & Williams, Bubbles, Crashes and Endogenous Expectations in Experimental Spot Asset Markets, Econometrica 56(5)
Canonical experimental-bubble design; the asset-life and dividend structure that DLM 2005 inherits
AI-Agent Prior Elicitation in Experimental Asset Markets
Algorithmic, LLM-Augmented, and Label-Only Belief Formation in a Continuous Double Auction
Plan I · Algorithmic | Plan II · LLM + Utility Forms | Plan III · LLM + Risk Label
Browser-based experimental platform · Reproducible via seeded PRNG · Open-source
Motivation
Asset price bubbles are among the most robust phenomena in experimental economics. Smith, Suchanek & Williams (1988) demonstrated that even under common knowledge of fundamentals, laboratory markets consistently produce price trajectories that deviate from risk-neutral fundamental value. Dufwenberg, Lindqvist & Moore (2005) showed that experience — repeated participation in the same market structure — is a powerful bubble-suppressing channel.
Two developments motivate this project: (i) the emergence of large language models as plausible artificial economic agents (Horton, 2023; Brand et al., 2023), and (ii) the open question of whether LLM-driven belief formation reproduces, amplifies, or dampens the bubble dynamics that arise under algorithmic updating rules.
Horton (2023) — LLMs as simulated economic agents (homo silicus)
Park et al. (2023) — generative agents and social simulation
Aher et al. (2023) — using LLMs to simulate survey responses
Lopez-Lira (2025) — expected-utility scoring framework for AI agents
Gap
No controlled factorial comparison of algorithmic vs. LLM belief update within the same continuous double auction substrate, holding market structure, endowments, and information sets constant.
RQ 1. Does an LLM-driven belief update (Plan II) produce market dynamics statistically equivalent to the deterministic algorithmic baseline (Plan I), as measured by Haessel $R^2$, normalised deviation, amplitude, and turnover?
RQ 2. Does providing the LLM with the explicit closed-form universal CRRA utility $U(w; \rho_i) = w^{1-\rho_i}/(1 - \rho_i)$ (Plan II) yield different posterior valuations than providing only a risk-preference label (Plan III)?
RQ 3. How do risk composition $(\alpha_L, \alpha_N, \alpha_A)$, strategic deception, and endogenous experience $(k_i)$ interact with the belief-formation channel across all three plans?
Key idea · three-plan factorial
We hold the market microstructure constant and vary only the belief-update mechanism. Each Utility agent $i$ forms a prior $V_i^{\text{prior}}$ at every period boundary, then updates it to a posterior $V_i^{\text{post}}$ through exactly one of three channels:
Plan I · Algorithm
$V_i^{\text{post}} = f(V_i^{\text{prior}}, M, k_i)$
Deterministic weighted blend. No stochasticity beyond the seeded PRNG. Reproducible baseline.
Plan II · LLM + Utility Forms
Prompt includes the universal CRRA utility $U(w; \rho_i) = w^{1-\rho_i}/(1 - \rho_i)$ with agent $i$'s sampled $\rho_i$ substituted, plus wealth and peer claims.
Plan III · LLM + Risk Label
Same wiring as Plan II, but prompt contains only the risk-preference label — no functional form.
Identification: same seed, same endowments, same market order → differences arise solely from the update channel.
Market substrate · DLM (2005)
The shared environment replicates the Dufwenberg, Lindqvist & Moore (2005) continuous double auction, scaled to $T = 20$ periods per round. A session consists of $R = 4$ rounds, each a complete $T = 20$-period market. Dividends are i.i.d. draws from $\{0, 10\}$¢ with $\mathbb{E}[d] = 5$¢ (the paper used $T = 10$ with $d \in \{0, 20\}$¢; scaled here to keep $\mathrm{FV}_1 = 100$¢).
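The scaling claim can be verified directly from $\mathrm{FV}_t = \mathbb{E}[d]\cdot(T - t + 1)$. The sketch below is illustrative; `dlmFundamentalValue` is a hypothetical stand-in and is not the simulator's `market.fundamentalValue`.

```javascript
// FV_t = E[d] * (T - t + 1) for a two-point dividend {0, high} with equal odds.
function dlmFundamentalValue(t, T, highDividend) {
  const expectedDividend = 0.5 * 0 + 0.5 * highDividend;
  return expectedDividend * (T - t + 1);
}

// Simulator scaling: T = 20, d in {0, 10} cents.
const simulatorPath = Array.from({ length: 20 }, (_, i) => dlmFundamentalValue(i + 1, 20, 10));
// Paper design: T = 10, d in {0, 20} cents.
const paperPath = Array.from({ length: 10 }, (_, i) => dlmFundamentalValue(i + 1, 10, 20));

console.log(simulatorPath[0], simulatorPath[19]); // first and last step of the staircase
console.log(paperPath[0], paperPath[9]);
```

Both designs start at 100¢; the simulator's staircase takes twenty steps of 5¢ where the paper's takes ten steps of 10¢.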
All 100 agents are EU-maximising Utility agents (Lopez-Lira 2025) using the v3 §2 prior — an experience-weighted blend of model-based $\widetilde{\mathrm{FV}}$ and a four-term heuristic mix $H$, tilted by bias and Gaussian noise. Per-period belief update is the experimental variable — the only dimension that varies across Plans I, II, and III.
Risk composition
$(\alpha_L, \alpha_N, \alpha_A)$ summing to 100%
Risk-loving / neutral / averse mix is the sole composition knob. Controlled by three linked sliders.
Each agent draws a valuation bias $b_i \in \{-0.15, 0, +0.15\}$, a belief mode (honest/deceptive), and a risk preference.
Endogenous experience
$k_i \to (\alpha_i,\,\sigma_i,\,\omega_i)$
$k_i$ starts at 0, incremented each round. Drives the v3 §3 triple: $\alpha_i$ (fundamental weight), $\sigma_i$ (prior noise), $\omega_i$ (self (non-peer) weight) — all rendered on agent cards. Post-asset-swap blend: $x_{\text{new}} = |\mathrm{corr}|\!\cdot\!x_{\text{trained}} + (1-|\mathrm{corr}|)\!\cdot\!x_0$.
DLM 2005 uses $N = 6$ homogeneous human subjects. This simulator scales the protocol to $N = 100$ heterogeneous Utility agents whose belief formation is the experimental treatment.
Normalized: $u_{i,t} = U(w_{i,t};\rho_i)\,/\,U(w_{i,0};\rho_i)$ so all agents start at $u = 1$. Risk composition controlled by $(\alpha_L, \alpha_N, \alpha_A)$ summing to 100%; the per-category $\rho_i$ is drawn from the agent's seeded sampler.
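The normalized CRRA utility plugs straight into the EU functional $\mathrm{EU} = p_{\text{fill}}\cdot U(w_1) + (1 - p_{\text{fill}})\cdot U(w_0)$. The following is a minimal sketch, assuming a one-share order and the reading that a buy pays the order price and adds a share while a sell does the opposite; `normalizedUtility` and `expectedUtility` are hypothetical names.

```javascript
// u = U(w; rho) / U(w0; rho) = (w / w0)^(1 - rho), so u(w0) = 1 for every rho.
function normalizedUtility(w, w0, rho) {
  return Math.pow(w / w0, 1 - rho);
}

// Expected utility of a one-share order.
// side: "buy" pays `price` and adds a share; "sell" receives `price` and gives one up.
function expectedUtility({ cash, qty, value, side, price, pFill, rho }) {
  const w0 = cash + qty * value;             // wealth if nothing fills
  const sign = side === "buy" ? 1 : -1;
  const w1 = cash - sign * price + (qty + sign) * value; // wealth if the order fills
  return pFill * normalizedUtility(w1, w0, rho) + (1 - pFill) * normalizedUtility(w0, w0, rho);
}

// Type-A endowment (200 cents, 6 shares), subjective value 100, risk-neutral agent.
const agent = { cash: 200, qty: 6, value: 100, rho: 0 };
const crossBuy = expectedUtility({ ...agent, side: "buy", price: 90, pFill: 1 });
const passiveBid = expectedUtility({ ...agent, side: "buy", price: 90, pFill: 0.3 });
console.log(crossBuy, passiveBid); // both exceed the hold value of 1
```

Holding scores exactly 1 under this normalization, so any action with EU above 1 is preferred to doing nothing.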
Every Utility agent forms its period-$t$ prior as an experience-weighted blend of a model-based fundamental $\widetilde{\mathrm{FV}}_{i,t}$ and a four-term heuristic mix $H_{i,t}$ (Anchor · Trend · DividendSignal · Narrative, v3 §4 default betas $(0.50, 0.20, 0.20, 0.10)$), multiplied by a persistent per-agent bias tilt $b_i \in \{-0.15,\,0,\,+0.15\}$ and perturbed by a Gaussian noise draw $\varepsilon_i \sim \mathcal{N}(0, \sigma_i^2)$ via Box–Muller on the seeded PRNG. The three plans differ only in how this prior maps to a posterior $V_{i,t}^{\text{post}}$.
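The prior described above follows the formula $V^{\text{prior}} = \max\bigl(0, [\alpha\cdot\widetilde{\mathrm{FV}} + (1-\alpha)\cdot H](1 + b) + \varepsilon\bigr)$. The sketch below is a hedged illustration: it assumes the four heuristic primitives arrive already expressed in cents (how each primitive is computed is not covered here), and `heuristicMix` and `priorValuation` are hypothetical names.

```javascript
// Default heuristic weights for (Anchor, Trend, DividendSignal, Narrative).
const BETAS = [0.50, 0.20, 0.20, 0.10];

// H = beta_1*Anchor + beta_2*Trend + beta_3*DividendSignal + beta_4*Narrative.
function heuristicMix(signals, betas = BETAS) {
  return signals.reduce((sum, x, i) => sum + betas[i] * x, 0);
}

// V_prior = max(0, [alpha*FV + (1 - alpha)*H] * (1 + bias) + eps).
// eps is a Gaussian draw supplied by the caller (0 when Prior Noise is off).
function priorValuation({ alpha, fvModel, signals, bias, eps }) {
  const h = heuristicMix(signals);
  return Math.max(0, (alpha * fvModel + (1 - alpha) * h) * (1 + bias) + eps);
}

// Optimistic novice: alpha = 0.40, bias = +0.15, no noise.
const prior = priorValuation({
  alpha: 0.4,
  fvModel: 100,
  signals: [110, 120, 100, 130],
  bias: 0.15,
  eps: 0,
});
console.log(prior.toFixed(2));
```

In this example the heuristic mix is 112, the blend is 107.2, and the optimistic tilt lifts the prior to about 123.28.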
The v3 §3 experience-indexed triple $(\alpha_i, \sigma_i, \omega_i)$ is read from $k_i = \texttt{roundsPlayed}$ every tick: novices ($k_i = 0$) sit at (0.40, 15¢, 0.60) and lean on the heuristic; veterans ramp toward (1.00, →0, 0.90) and anchor to the model-based value. When a session pairs different pre- and post-assets, each trained triple is pulled back toward the novice anchors by $(1 - |\mathrm{corr}|)$, where $\mathrm{corr}$ is the Pearson correlation between the two assets' expected FV paths, so uncorrelated asset swaps effectively re-seed experienced traders. Experience is purely endogenous; no agent is ever instantiated with $k_i > 0$.
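The experience rules from the notation table ($\alpha_i = \min\{1, 0.4 + 0.15\,k_i\}$, $\sigma_i = 15\cdot e^{-0.30\,k_i}$, $\omega_i = 0.60 + 0.10\cdot\min(3, k_i)$) and the asset-swap blend can be sketched as follows. The names `NOVICE`, `experienceTriple`, and `blendAfterSwap` are hypothetical, and the constants are the documented defaults rather than values read from the live sliders.

```javascript
// Novice anchors (alpha_0, sigma_0, omega_0) at their documented defaults.
const NOVICE = { alpha: 0.40, sigma: 15, omega: 0.60 };

// Experience-indexed triple as a function of rounds played k.
function experienceTriple(k) {
  return {
    alpha: Math.min(1, NOVICE.alpha + 0.15 * k),    // fundamental weight
    sigma: NOVICE.sigma * Math.exp(-0.30 * k),      // valuation-noise scale (cents)
    omega: NOVICE.omega + 0.10 * Math.min(3, k),    // self (non-peer) weight
  };
}

// Post-swap blend: x_new = |corr| * x_trained + (1 - |corr|) * x_0.
function blendAfterSwap(trained, corr) {
  const c = Math.abs(corr);
  const out = {};
  for (const key of Object.keys(NOVICE)) {
    out[key] = c * trained[key] + (1 - c) * NOVICE[key];
  }
  return out;
}

for (let k = 0; k <= 4; k++) {
  const t = experienceTriple(k);
  console.log(k, t.alpha.toFixed(2), t.sigma.toFixed(2), t.omega.toFixed(2));
}
```

With $|\mathrm{corr}| = 0$ the blend returns the novice anchors exactly, and with $|\mathrm{corr}| = 1$ it returns the trained triple unchanged.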
Novice · $k_i = 0$
$\omega_i = 0.60$
40% weight on peer messages. High social influence.
Intermediate · $k_i \in \{1,2\}$
$\omega_i \in \{0.70, 0.80\}$
Declining openness to external signals.
Veteran · $k_i \geq 3$
$\omega_i = 0.90$
90% self-anchored. Minimal social updating.
$\omega_i$ is the v3 §3 self (non-peer) weight entry of the experience triple $(\alpha_i, \sigma_i, \omega_i)$ — shared with the prior in Slide 9 and with the Plans II/III Step-3 blend (same coefficient, different source for $V^{\text{prior}}$). Deterministic under the seeded PRNG; no network calls.
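The Plan I update, $V^{\text{post}} = \omega_i\cdot V^{\text{prior}} + (1-\omega_i)\cdot\bar{m}$, reduces to a few lines. This is a sketch under the documented defaults; `selfWeight` and `planIPosterior` are hypothetical names.

```javascript
// omega_i = 0.60 + 0.10 * min(3, k_i).
function selfWeight(k) {
  return 0.60 + 0.10 * Math.min(3, k);
}

// Plan I posterior: weighted blend of the agent's own prior and the mean peer claim.
// With no foreign messages this period, the posterior equals the prior.
function planIPosterior(prior, peerClaims, k) {
  if (peerClaims.length === 0) return prior;
  const peerMean = peerClaims.reduce((s, v) => s + v, 0) / peerClaims.length;
  const omega = selfWeight(k);
  return omega * prior + (1 - omega) * peerMean;
}

const claims = [120, 80, 130]; // mean peer claim is 110
console.log(planIPosterior(100, claims, 0).toFixed(2)); // novice
console.log(planIPosterior(100, claims, 3).toFixed(2)); // veteran
```

With a prior of 100 and a mean claim of 110, the novice moves to about 104 and the veteran to about 101, which is the declining openness the three cards describe.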
Trust of receiver $r$ in sender $s$ is updated by exponential moving average. Closeness is measured against the period's volume-weighted average price, clamped to $[0,1]$. Trust matrices persist across round boundaries — they constitute part of the agent's cumulative experience.
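The trust update can be illustrated with the documented EMA, $\tau \leftarrow (1-\lambda)\,\tau + \lambda\cdot\text{closeness}$ with $\lambda = 0.30$ and $\text{closeness} = \max(0, 1 - |\hat{v}_s - \mathrm{VWAP}_t| / \mathrm{VWAP}_t)$. The helper names below (`vwap`, `closeness`, `updateTrust`) are hypothetical, and the sketch assumes at least one trade in the period so that VWAP is defined.

```javascript
const LAMBDA = 0.30; // EMA learning rate

// Volume-weighted average price over one period's trades [{ price, qty }].
function vwap(trades) {
  const volume = trades.reduce((s, tr) => s + tr.qty, 0);
  return trades.reduce((s, tr) => s + tr.price * tr.qty, 0) / volume;
}

// Closeness of a sender's claim to the period VWAP, bounded to [0, 1].
function closeness(claim, periodVwap) {
  return Math.max(0, 1 - Math.abs(claim - periodVwap) / periodVwap);
}

// One EMA step for the receiver's trust in a sender.
function updateTrust(tau, claim, periodVwap, lambda = LAMBDA) {
  return (1 - lambda) * tau + lambda * closeness(claim, periodVwap);
}

let tau = 0.5; // documented initial trust
tau = updateTrust(tau, 110, 100); // claim 10% away from VWAP
console.log(tau.toFixed(3));
```

A claim 10% away from VWAP raises trust from 0.5 to 0.62, while a claim more than 100% away scores a closeness of 0 and pulls trust toward zero at rate λ.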
Under a deceptive strategy, the agent broadcasts a claim diverging from its private valuation. The lie-gap magnitude endogenously erodes trust via the EMA and feeds the mean-lie-magnitude diagnostic. This connects to the cheap talk literature (Crawford & Sobel, 1982): signals are credible only when senders' and receivers' incentives align.
Identical wiring and execution semantics to Plan II. The sole difference: the prompt $\pi_i^{\text{III}}$ omits the closed-form universal CRRA $U(w; \rho_i)$ expression and supplies only a natural-language risk-preference label.
Identification argument
Plan II $\setminus$ Plan III $=$ the causal effect of providing an explicit functional form. If Plan II $\approx$ Plan III, the LLM has already internalised the mapping from "risk-loving" to convex preferences — the formula is redundant. If Plan II $\neq$ Plan III, the functional form carries information the label does not, and the LLM's implicit risk model diverges from the specified one.
Both Plans II and III require an API key. Fallback semantics, clamping, and parallel dispatch are shared.
Experimental design · treatments & parameters
Factorial structure
Belief channel: Plan I / Plan II / Plan III
Risk composition: $(\alpha_L, \alpha_N, \alpha_A)$ summing to 100%
Tick resolution: $K$ ticks/period $\times\, T \times R$
$$ \text{Session} \;=\; R \times T \times K \;=\; 4 \times 20 \times 18 \;=\; 1\,440 \;\text{ticks}, \qquad \text{Batch} = 10 \;\text{sessions} \;=\; 14\,400 \;\text{ticks} $$
Data collection. Per-round metrics are labelled $\texttt{R\{r\}\_S\{s\}}$ (Round $r$ of Session $s$). One Start press runs all 10 sessions (5 × first treatment + 5 × other), collecting 40 round-level rows (4 rounds × 10 sessions) with mean deviation, turnover, trades, volume, and payoff.
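The tick counts and the 40 round-level labels follow mechanically from the batch structure. The sketch below is illustrative; `treatmentFor` and `batchLabels` are hypothetical names, and the rule that the first five sessions use the selected treatment is taken from the glossary.

```javascript
const ROUNDS = 4;            // R
const PERIODS = 20;          // T
const TICKS_PER_PERIOD = 18; // K
const SESSIONS = 10;

const ticksPerSession = ROUNDS * PERIODS * TICKS_PER_PERIOD;
const ticksPerBatch = SESSIONS * ticksPerSession;

// First half of the batch uses the selected treatment, second half the other one.
function treatmentFor(session, selected) {
  const other = selected === "T20" ? "T40" : "T20";
  return session <= SESSIONS / 2 ? selected : other;
}

// One row per (round, session) pair, labelled R{r}_S{s}.
function batchLabels(selected) {
  const rows = [];
  for (let s = 1; s <= SESSIONS; s++) {
    for (let r = 1; r <= ROUNDS; r++) {
      rows.push({ label: `R${r}_S${s}`, treatment: treatmentFor(s, selected) });
    }
  }
  return rows;
}

const rows = batchLabels("T20");
console.log(ticksPerSession, ticksPerBatch, rows.length);
```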
DLM replication · Strict-DLM mode
The Strict-DLM paradigm replicates the DLM (2005) protocol scaled to $N = 100$ (half type A with $c = 200$¢, $q = 6$; half type B with $c = 600$¢, $q = 2$), buy-and-hold value $V_{\text{BH}} = c + q\!\cdot\!(T\!\cdot\!\mu_d) = 800$¢ (identical across both bundles under $T\!\cdot\!\mu_d = 100$¢/share — the product is preserved from the paper's $(10, 10)$ to the simulator's scaled $(20, 5)$). At the round 3→4 boundary, the engine runs Fisher-Yates replacement:
T20 treatment (R4-⅔)
$k = 20$ replaced, 80 veterans remain
Most of the R4 population is experienced. Expected: bubble suppression carries over from R3.
T40 treatment (R4-⅓)
$k = 40$ replaced, 60 veterans remain
Larger fresh intake. Expected: fresh agents reignite the bubble in R4 despite veteran presence.
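The replacement step and the buy-and-hold identity above can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: `mulberry32` is a small public-domain-style seeded generator standing in for whatever PRNG the simulator uses, and `pickReplaced` and `buyAndHold` are hypothetical names.

```javascript
// Small seeded generator (mulberry32-style); a stand-in for the engine's PRNG.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Partial Fisher-Yates: after k swaps the first k slots hold a uniformly
// random k-subset of agent indices; those agents are the ones replaced.
function pickReplaced(n, k, rand) {
  const idx = Array.from({ length: n }, (_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (n - i)); // j is in [i, n - 1]
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  return idx.slice(0, k);
}

// Buy-and-hold value in cents: cash plus shares times expected total dividends.
function buyAndHold(cash, shares, periods, meanDividend) {
  return cash + shares * periods * meanDividend;
}

const replaced = pickReplaced(100, 20, mulberry32(2005)); // T20 treatment
console.log(replaced.length, 100 - replaced.length);      // 20 replaced, 80 veterans
console.log(buyAndHold(200, 6, 20, 5), buyAndHold(600, 2, 20, 5)); // 800 800
```

Because the generator is seeded, the same seed always selects the same agents, which is what makes a T20 versus T40 comparison repeatable.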
One click of Start runs 10 animated sessions (5 $\times$ first treatment + 5 $\times$ second) at the Speed slider rate. Per-round metrics are collected with $\texttt{R\{r\}\_S\{s\}}$ labels (40 rows total) into the batch results table, with per-treatment aggregates for T20 vs T40 comparison.
TO (turnover) measures speculative intensity; AE (allocative efficiency) measures whether assets flow to the highest-valuation holders.
Hypotheses
H1 (LLM–algorithmic equivalence). Under matched seeds and endowments, Plan II produces market-quality metrics $(R^2_H, \mathrm{ND}, A, \mathrm{TO})$ not significantly different from Plan I: $\Delta_{\text{II-I}} \approx 0$.
H2 (Form–label divergence). Plan II posterior trajectories $\{V_i^{\text{post}}\}$ differ systematically from Plan III: the explicit functional form carries information that the risk label alone does not convey to the LLM.
H3 (Risk composition effect). Increasing $\alpha_L$ (risk-loving share) amplifies bubble magnitude (higher $A$, lower $R^2_H$), while increasing $\alpha_A$ dampens it — this effect holds across all three plans.
H4 (Deception–trust interaction). Strategic deception increases ND and decreases AE by corrupting the social-learning signal. The trust EMA partially mitigates this as $\tau_{r \to s} \to 0$ for persistent liars.
Testable via within-seed paired comparisons (H1, H2) and across-seed Monte Carlo sweeps over the $(\alpha_L, \alpha_N, \alpha_A)$ simplex (H3, H4).
Results · analysis framework
The platform supports three levels of analysis, each targeting different hypotheses:
Within-seed comparison
Run identical $(\text{seed}, N, \alpha)$ under Plans I, II, III. Compare price trajectories tick-by-tick and metric vectors period-by-period. Directly tests H1 and H2.
Monte Carlo sweep
Vary seeds and risk compositions across the $(\alpha_L, \alpha_N, \alpha_A)$ simplex. Aggregate metric distributions to test H3 and H4 under repeated sampling.
Reproducibility: seeded PRNG guarantees identical runs under Plan I. Endowment edits preserve the engine seed.
Replay system: append-only history arrays enable exact state reconstruction at any tick via $\texttt{buildViewAt}(t)$.
Sensitivity: all tunables exposed as sliders with safe defaults via $\texttt{ctx.tunables}$; parameter sweeps are first-class.
External validity & limitations
LLM stochasticity: temperature $> 0$ introduces irreducible noise; fallback to Plan I on failure introduces survivorship bias, since only successful calls are recorded as LLM decisions.
Single CDA environment: results may not generalise to call markets or other auction formats.
API latency: async LLM calls may interact with tick timing; clamped valuations bound extreme outputs.
The platform's no-dependency, browser-native architecture eliminates environment-configuration confounds and enables full portability.
Contributions
Theoretical
Formal expected-utility framework for heterogeneous belief formation in a CDA, connecting the SSW/DLM experimental-markets tradition to the Lopez-Lira EU-scoring approach. First controlled isolation of the LLM belief channel.
Methodological
Open-source, browser-native simulation platform with seeded PRNG reproducibility, replay, and multi-paradigm support (Strict-DLM, Lopez-Lira, AIPE). No build step, no dependencies.
Empirical
First within-substrate factorial comparison of algorithmic vs. LLM-augmented vs. label-only belief updating. Three-plan design enables causal identification of the information content of utility-function representations.
Limitations & future work
Current limitations
LLM output is non-deterministic: temperature, context window, and model version affect reproducibility.
Single auction format (CDA); generalisability to other market mechanisms untested.
Agent heterogeneity limited to the strategy cube ($\text{bias} \times \text{belief} \times \text{risk}$); richer preference spaces unexplored.
No human subjects — all agents are artificial; ecological validity requires lab validation.
Extensions
Multi-provider benchmarking (GPT-4o/5.4, Claude Opus/Sonnet 4.6, Gemini 3/3.1) to test model-dependence of H2.
Hybrid markets: mix LLM agents with human subjects for ecological validity.
Richer communication: multi-round dialogue and explicit reasoning chains in Plan II/III prompts.
Field-data calibration: estimate $(b_i, \varepsilon)$ distributions from market microstructure data.
Takeaway
Can an LLM replicate — or improve upon — a calibrated algorithmic belief-update rule in an experimental asset market?
This platform provides the controlled experimental environment to answer that question: same market, same endowments, same seed — only the belief channel varies.
Plan I · Algorithmic | Plan II · LLM + Utility Forms | Plan III · LLM + Risk Label
Population size — the number of UtilityAgents in each session. Drag between 6 (paper-faithful DLM 2005 thin-book regime) and 100 (simulator-scaled thick-book regime).
Round-4 treatment sizes scale with N: at N = 6 the paper's T2 / T4 (replace 2 or 4 of 6), at N = 100 the simulator's T20 / T40, linearly interpolated in between. The treatment labels update as you slide.
Changing N redraws every agent and resets the engine seed. Mix sliders rescale to keep U = N − F − T.
Rounds per session: how many consecutive T-period markets the same population plays before a new session begins.
DLM 2005 pins R = 4 (the paper default). The 10-session batch runner still runs 10 sessions regardless, so raising R lengthens every session to R · T · K ticks while holding the number of sessions fixed.
Cross-round learning (roundsPlayed, trust, belief mode) persists within a session and resets between sessions, so R controls how much experience each cohort accumulates before the batch moves on.
Changing R clamps the replacement-round slider to [2, R] and resets the engine.
Per-session scheduler — one row for each of the 10 sessions in the batch. Each row sets three things: the replacement rate, the pre-replacement asset traded in rounds 1…r−1, and the post-replacement asset traded from round r onward (r is the replacement round, set by its own slider above).
The replacement-rate slider reads as a percent in [10%, 50%] (step 1%); the engine projects it through the current N to an integer count on Start. The default seed reproduces DLM 2005's symmetric split: sessions 1–5 at 20% (↔ T20 at N = 100) and sessions 6–10 at 40% (↔ T40).
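The projection from a percent slider to a head count might look like the sketch below. `replacementCount` is a hypothetical name, and rounding to the nearest integer is an assumption; the engine could equally floor or ceil.

```javascript
// Hypothetical projection of a percent slider onto an integer head count.
// Rounding to nearest is an assumption; the engine may floor or ceil instead.
function replacementCount(ratePercent, n) {
  const clamped = Math.min(50, Math.max(10, ratePercent)); // slider range [10, 50]
  return Math.round((clamped * n) / 100);
}

// Default schedule from the text: sessions 1-5 at 20%, sessions 6-10 at 40%.
const schedule = Array.from({ length: 10 }, (_, s) => (s < 5 ? 20 : 40));
console.log(schedule.map((pct) => replacementCount(pct, 100)));
```

At a population of 100 the two default rates map to 20 and 40 replaced agents, matching the T20 and T40 labels.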
The two asset selectors form a pair: picking the same asset in both slots runs a single-asset session (the legacy behaviour). Picking different assets makes the market swap its FV path at the same boundary the fresh agents are spliced in — so you can study "new regime meets new population" experiments in one Start.
The corr readout on each row is the Pearson correlation of the two assets' FV paths sampled from a pre-round r₀ simulation: one full round (T periods) is simulated per asset with a stable seed before the real batch starts. Deterministic assets trivially reproduce their closed-form path; stochastic assets (random walk, jump/crash) draw one representative trajectory. The readout is 1.00 when both slots match, 0.00 for orthogonal paths, and negative when they move opposite ways. When one side's sampled path is constant (Perpetual is flat by construction, so pairings like LD ↔ CP, CP ↔ LG, CP ↔ CY land here) Pearson is mathematically undefined and we coerce to 0.00, treating a flat asset as carrying no linear information about the other. Downstream, |corr| is the experience-transfer weight: agent cards display α / σ / ω blended as |corr| · trained + (1 − |corr|) · (α₀, σ₀, ω₀) once the round-r replacement has fired, so trained experience carries over fully when the two assets are collinear and evaporates back to the novice anchors when they are not.
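The correlation readout and the experience-transfer blend can be sketched as below. `pearson` and `blend` are hypothetical names; the zero-variance guard implements the "coerce to 0.00" rule described above, and the blend is applied to one scalar parameter at a time.

```javascript
// Pearson correlation of two equal-length paths, with the flat-path coercion.
function pearson(x, y) {
  const n = x.length;
  const mean = (a) => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  if (sxx === 0 || syy === 0) return 0; // constant path: undefined, coerced to 0
  return sxy / Math.sqrt(sxx * syy);
}

// Experience transfer: |corr| weights the trained value, the rest the novice anchor.
function blend(corr, trained, novice) {
  const w = Math.abs(corr);
  return w * trained + (1 - w) * novice;
}

const declining = Array.from({ length: 20 }, (_, i) => 5 * (20 - i)); // 100 ... 5
const flat = new Array(20).fill(100);
console.log(pearson(declining, flat)); // 0 by the coercion rule
console.log(blend(pearson(declining, flat), 0.85, 0.4)); // falls back to the novice anchor
```

Using the absolute value means a strongly negative correlation transfers as much experience as a strongly positive one, which is the behaviour the text describes.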
Because every rate is a fraction of the population, the schedule is invariant under changes to N. Editing any of rate / pre / post reseeds the engine so the new schedule takes effect on the next Start.
Replacement round: the round boundary at which treatmentSize veterans are swapped for fresh agents (roundsPlayed = 0). DLM 2005 fixes r = 4, so replacement happens at the end of round 3, giving newcomers three rounds of price history but none of their own experience.
The slider lets you pick any round in [2, R]. r = 2 splices newcomers in after a single round of price history (minimal experience gap); r = R is the paper-faithful "last round" replacement.
Survivors keep their cash trajectory, trust matrix, and belief mode across the boundary; fresh agents arrive with a clean roundsPlayed counter and the treatment endowment from the original spec.
When ON each UtilityAgent's prior becomes .
∈ {−1, 0, +1} is drawn at birth (pessimistic / unbiased / optimistic); = 0.15. The tilt = · is persistent — optimistic agents systematically overpay, pessimistic agents under-bid, creating directional mispricing that survives across rounds.
When OFF is zeroed and every agent starts from the true .
When ON each UtilityAgent's prior gains i.i.d. per-tick jitter with , n = 0.03 (±3%).
Models moment-to-moment uncertainty in the agent's private signal. Noise alone centres on but widens the bid–ask spread; combined with bias it amplifies mispricing dispersion.
When OFF the noise term is zeroed and the prior is deterministic each tick.
Regulator (Plan II / III only). The slider value is the bubble-ratio threshold at which the regulator fires its one-shot warning. A value of 0 disables it.
When the threshold is above 0, the engine monitors the bubble ratio at every period boundary; the first time the ratio crosses the threshold within a round, it injects a REGULATOR WARNING block into every Utility agent's LLM prompt for the rest of that round (cleared at the next round_start).
Plan I has no LLM channel, so the toggle is recorded in the snapshot but agent behavior is unchanged.
Fundamental weight: $\alpha_0$, the novice ($k = 0$) intercept of the v3 §2 fundamental weight $\alpha_i \in [0, 1]$. $\alpha_i$ is the weight placed on the model-based valuation $\widetilde{\mathrm{FV}}_{i,t}$ in the §2 prior $V_{i,t}^{\text{prior}} = \alpha_i\,\widetilde{\mathrm{FV}}_{i,t} + (1 - \alpha_i)\,H_{i,t} + \varepsilon_{i,t}$, with the complement $1 - \alpha_i$ going to the heuristic reading $H_{i,t}$.
Experience grows this weight via the v3 §3.2 / §6.1 rule: starting from $\alpha_0 = 0.40$, each round of experience adds the growth rate 0.15, capped at 1.0. A novice places 40% weight on $\widetilde{\mathrm{FV}}_{i,t}$ and 60% on $H_{i,t}$; after 4 rounds the cap binds at 1.0 and veterans are pure fundamental-followers.
Not the same as the paper's Anchor term: that is the first primitive of the heuristic mix $H$, a β-weighted sum of Anchor, Trend, DividendSignal, and Narrative (green β row below). $\alpha_0$ lives one level up: it sets how much of the heuristic mix gets used, not what is inside it. Dragging rightward makes even green replacements fundamentals-anchored from the start (bubble-attenuating); leftward hands more weight to the heuristic branch and leaves novices more vulnerable to price momentum and narrative.
Experience growth rate: how fast $\alpha_i$ ramps per round of experience. The rule is linear in rounds played and capped at 1.0.
At the default rate 0.15 the step from the novice value $\alpha_0 = 0.40$ to the saturation value 1.00 takes 4 rounds, matching DLM 2005's observation that experience kills the bubble by round 4. Larger values collapse the novice-veteran gap (everyone converges faster); smaller values stretch it out so bubble-prone behavior persists deeper into the session.
Set the rate to 0 to lock every agent at its novice $\alpha_0$ for the whole batch, which is useful for testing whether the replacement channel alone generates any bubble attenuation.
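The experience ramp for the fundamental weight can be sketched as below, assuming the linear-with-cap form implied by the numbers above (0.40 plus 0.15 per round, clipped at 1.0). `fundamentalWeight`, `prior`, `alpha0`, and `growth` are hypothetical names.

```javascript
// Hypothetical names: alpha0 is the novice intercept, growth the per-round rate.
// Assumed form: linear in rounds played, capped at 1.
function fundamentalWeight(k, alpha0 = 0.4, growth = 0.15) {
  return Math.min(1, alpha0 + growth * k);
}

// Prior from the text: alpha * model-based FV + (1 - alpha) * heuristic + noise.
function prior(alpha, fvModel, heuristic, eps = 0) {
  return alpha * fvModel + (1 - alpha) * heuristic + eps;
}

// Weight by rounds of experience; the cap binds from round 4 onward.
for (let k = 0; k <= 5; k++) {
  console.log(k, fundamentalWeight(k).toFixed(2));
}

// Setting growth to 0 locks every agent at the novice intercept.
console.log(fundamentalWeight(3, 0.4, 0));
```

A novice facing a model value of 100 and a heuristic reading of 120 would hold a noise-free prior near 112 under these defaults; a veteran at the cap would hold 100.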
Novice valuation noise: $\sigma_0$, the standard deviation (in cents) of the Gaussian draw an inexperienced agent adds to its subjective valuation each period: $\varepsilon_{i,t} \sim \mathcal{N}(0, \sigma_i^2)$, with $\sigma_i$ equal to $\sigma_0$ times an exponential decay in rounds of experience $k_i$.
At the default $\sigma_0 = 15$ a novice's prior jitters ±15¢ around the model anchor (≈ ±15% on an asset with $\mathrm{FV}_1 = 100$). Experience contracts this exponentially: at $k = 3$ the noise is ≈ 6¢ and at $k = 10$ it falls below 1¢.
This is the "experience kills the bubble" channel restated as computational-precision decay. Advanced → Prior Noise must be ON for the noise to actually feed updateBelief; toggling it OFF zeros every $\varepsilon_{i,t}$ regardless of the value set here.
Noise decay rate: the exponential decay rate of $\sigma_i$ per round of experience. Rule: $\sigma_i$ is $\sigma_0$ multiplied by $\exp(-\text{rate} \times k_i)$.
At the default rate 0.30 the half-life is ln(2)/0.30 ≈ 2.3 rounds: $k = 2$ agents are at ≈55% of the novice noise, $k = 4$ agents at ≈30%. Setting the rate to 0 freezes every agent at $\sigma_0$ (no experience-driven precision gain); high values (≥ 0.7) at least halve the noise in a single round, so veterans reach near-zero noise within a few rounds.
Combine with a lowered to study "precise but heuristic" cohorts where traders sharpen their computation before they start trusting the model.
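The decay figures quoted above follow from an exponential schedule, which the sketch below reproduces. `noiseSd` and `halfLife` are hypothetical names, and the exponential form is inferred from the quoted numbers rather than taken from the source code.

```javascript
// Noise standard deviation (cents) after k rounds of experience.
// Assumed form: sigma0 * exp(-decay * k), consistent with the quoted figures.
function noiseSd(k, sigma0 = 15, decay = 0.3) {
  return sigma0 * Math.exp(-decay * k);
}

// Rounds of experience needed to halve the noise.
function halfLife(decay) {
  return Math.log(2) / decay;
}

for (const k of [0, 2, 3, 4, 10]) {
  const sd = noiseSd(k);
  console.log(k, sd.toFixed(2), ((100 * sd) / 15).toFixed(0) + "%");
}
console.log(halfLife(0.3).toFixed(2)); // about 2.31 rounds
```

At a decay rate of 0.7 a single round leaves roughly half of the novice noise, so several rounds are needed before the noise is negligible.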
Anchor weight in the heuristic mix $H$, a β-weighted sum of Anchor, Trend, DividendSignal, and Narrative (v3 §4).
The Anchor term reads off the asset's current t — so a pure-anchor heuristic ( = 1, others = 0) collapses to t and the prior becomes a pure / blend. At the default = 0.50 it dominates the heuristic mix.
Dragging rightward makes the heuristic branch model-anchored (bubble-attenuating); leftward hands weight to the trend / dividend / narrative terms and lets price momentum, recent draws, or regime stories drive the heuristic instead.
Trend weight in the heuristic mix $H$. The Trend term tracks the first difference of recent prices (v3 §4.2) and is omitted at t = 1 because there is no previous period to difference against (§6.3).
Default 0.20. Raising it makes heuristic traders chase momentum, a classic bubble amplifier. Setting it to 0 kills the momentum channel entirely, leaving the heuristic reading a mix of level and fundamentals.
Dividend-signal weight in the heuristic mix $H$. The DividendSignal term reads off the running mean of observed dividends (v3 §4.3), acting as a coarse empirical proxy for the asset's cash-flow rate.
Default 0.20. Up-weighting it makes heuristic agents lean on empirical dividend history (slower but more grounded); down-weighting cuts the link to realised cash flows and leaves the heuristic purely price-driven.
Narrative weight in the heuristic mix $H$. The Narrative term encodes per-asset regime priors with per-agent idiosyncratic tilts (v5 §5.{17,23,29,35}), e.g. a "this asset is in a growth regime" or "this asset is in a crash regime" tilt, asset-specific by construction.
v5 samples one narrative trait per agent from a Gaussian: $g_i \sim \mathcal{N}(5, 5^2)$, $g_i \ge 0$ for Linear Growth (growth optimism); $c_i \sim \mathcal{N}(0, 5^2)$ for Cyclical (paired with $\lambda_c = 4$ trend-sign reaction); $u_i \sim \mathcal{N}(0, 5^2)$ for Random Walk; and $h_i \sim \mathcal{N}(4, 4^2)$, $h_i \ge 0$ plus crash-underweight $\delta_i$ for Jump / Crash. The relevant trait surfaces as a chip on the View Stats → Agent model panel.
Default 0.10, the smallest of the four β weights by design. Raising it lets the per-agent narrative trait dominate the heuristic branch (useful for studying narrative-driven bubbles); lowering it collapses the heuristic toward the anchor / trend / dividend trio. The Σβ chip at the end of the β-row shows the current total so you can tell at a glance when the mix has drifted off the v3 §6.2 default 0.5 / 0.2 / 0.2 / 0.1.
Novice self (non-peer) weight: $\omega_0$. In the Step-3 posterior blend $V^{\text{post}} = \omega_i \cdot V^{\text{prior}} + (1 - \omega_i) \cdot \bar{m}$, $\omega_i$ is the weight on the agent's own prior and $1 - \omega_i$ is the weight on the peer-message mean $\bar{m}$. $\omega_0$ is that weight for a fully inexperienced agent ($k = 0$), i.e. how much a novice trusts its own read over the crowd's.
Live rule: $\omega_i = \omega_0 + \Delta\omega \cdot \min(k_i, k_{\max})$. At the current $\omega_0 = 0.60$ a novice leans on its own read at that ratio; veterans saturate at 0.90 ($\Delta\omega = 0.10$; $k_{\max} = 3$ is hardcoded in ExperienceConfig).
Drop $\omega_0$ toward 0 to make green replacements herd on their peers (stronger social learning, faster convergence). Push it toward 1 to make them anchor stubbornly on their own v3 §2 prior.
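The self-weight rule and the Step-3 posterior blend can be sketched as below. `selfWeight` and `posterior` are hypothetical names; the defaults (0.60 novice weight, 0.10 increment, cap at 3 rounds) are the values quoted above.

```javascript
// Self weight after k rounds: novice weight plus a capped experience bonus.
function selfWeight(k, omega0 = 0.6, deltaOmega = 0.1, kMax = 3) {
  return omega0 + deltaOmega * Math.min(k, kMax);
}

// Step-3 posterior: own prior blended with the mean of peer messages.
function posterior(omega, vPrior, peerMean) {
  return omega * vPrior + (1 - omega) * peerMean;
}

// A novice and a veteran hear the same crowd (mean 100) with the same prior (120).
const novice = posterior(selfWeight(0), 120, 100);
const veteran = posterior(selfWeight(5), 120, 100);
console.log(novice.toFixed(1), veteran.toFixed(1)); // the veteran stays closer to 120
```

Lowering the novice weight toward 0 pulls the novice's posterior toward the peer mean, which is the herding behaviour the tooltip describes.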
Σβ total: live sum of the four heuristic weights (Anchor, Trend, DividendSignal, Narrative). The v3 §6.2 spec fixes this at 1.00 (0.50 / 0.20 / 0.20 / 0.10) so that $H$ is a convex combination of the four terms.
The tile glows amber when the total drifts off 1.00 so researchers running non-convex mixes (e.g. an inflated anchor or trend weight) can tell at a glance that the heuristic is no longer unit-normalised.
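A unit-sum check of this kind needs a tolerance, because the default weights do not necessarily add to exactly 1 in floating point. The sketch below shows one way to do it; `betaTotal`, `isUnitNormalised`, `heuristicMix`, and the tolerance value are assumptions for illustration.

```javascript
// Sum of the four heuristic weights (Anchor, Trend, DividendSignal, Narrative).
function betaTotal(weights) {
  return weights.reduce((s, w) => s + w, 0);
}

// Tolerance-based check; an exact === 1 comparison can fail on rounding error.
function isUnitNormalised(weights, tol = 1e-9) {
  return Math.abs(betaTotal(weights) - 1) <= tol;
}

// Heuristic reading as the weighted sum of the four term values.
function heuristicMix(weights, terms) {
  return weights.reduce((s, w, i) => s + w * terms[i], 0);
}

const defaults = [0.5, 0.2, 0.2, 0.1];
console.log(betaTotal(defaults), isUnitNormalised(defaults));
console.log(isUnitNormalised([0.5, 0.4, 0.2, 0.1])); // inflated trend: false
```

When the check passes, the mix is a convex combination, so the heuristic reading always lies between the smallest and largest of the four term values.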
Bounded Rationality (Plan II / III only). Caps the LLM trader's cognitive budget so it trades like a human subject instead of a textbook optimiser.
When ON the system prompt adds: K = 3 reasoning steps, N = 5 pieces of attention, T = 3 periods of price memory, perceived FV = true FV + ε ~ N(0, σ²) with σ = 10¢, and execution-noise probability p = 0.10 of a suboptimal action. The model must commit to ONE heuristic — trend-following, mean-reversion, anchoring to past trades, or randomized preference — rather than computing FV from the full dividend model.
The memory cap also truncates the price paths spliced into the user prompt (last T periods only), so the LLM sees the same short window a human would hold in working memory.
Linear Declining (DLM): the classic Smith–Suchanek–Williams / Dufwenberg–Lindqvist–Moore experimental asset.
Fundamental value: $\mathrm{FV}_t = 5 \cdot (T - t + 1)$, so $\mathrm{FV}_1 = 100$ and $\mathrm{FV}_T = 5$ at $T = 20$. The staircase is the canonical bubble-experiment target: every period the asset is worth one less dividend of expected cash flow.
Dividend: $d_t \in \{0, 10\}$¢ with equal probability, so $\mathbb{E}[d_t] = 5$¢. No terminal value: after period $T$ the asset is worthless.
Use case. Reproduces DLM 2005 Figure 1. Mispricing against a known-declining fundamental is the textbook bubble signature; the Complex-Dividends toggle swaps the coin flip for the paper's 5-point distribution to activate bounded-rationality priors.
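The declining staircase and its coin-flip dividend can be written out as below. `linearDecliningFV` and `coinFlipDividend` are hypothetical names; the injected `rand` argument is an assumption that lets a seeded generator replace `Math.random`.

```javascript
// Fundamental value: expected dividend times the number of remaining periods.
function linearDecliningFV(t, T = 20, meanDividend = 5) {
  return meanDividend * (T - t + 1);
}

// Dividend draw: 0 or 10 cents with equal probability.
function coinFlipDividend(rand = Math.random) {
  return rand() < 0.5 ? 0 : 10;
}

const path = Array.from({ length: 20 }, (_, i) => linearDecliningFV(i + 1));
console.log(path[0], path[19]); // 100 5
console.log(path.slice(0, 5));  // each period drops by one expected dividend
```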
Perpetual — a flat-FV reference asset. Removes the declining-value channel so any bubble must come from belief / liquidity dynamics alone.
Fundamental value: $\mathrm{FV}_t = \mathbb{E}[d] / r = 5 / 0.05 = 100$ for every period. The asset is a perpetual claim with terminal value 100 (no horizon effect).
Dividend: $d_t \in \{4, 6\}$¢ with equal probability, $\mathbb{E}[d_t] = 5$¢. Tight dispersion keeps the sample mean close to the truth, so the rational prior should stick at 100 across the full round.
Use case. Null asset for bubble-attribution studies — if prices drift from 100 there is no fundamental tailwind to blame.
Linear Growth — rising-expectations regime where each period's expected dividend increases linearly in t (v5 §5.13–§5.17).
Dividend: $d_t \sim \mathcal{N}(a + b\,t,\ 1)$ with $(a, b) = (2, 0.3)$, so the mean walks from 2.3 at $t = 1$ up to 8.0 at $t = 20$, and keeps rising beyond the display window.
Fundamental value is the Gordon perpetuity on the rising expected dividend, $\mathrm{FV}_t = (2 + 0.3\,t) / r$ at $r = 0.05$ (v5 §5.16). The path runs monotonically upward ($\mathrm{FV}_1 = 46$, $\mathrm{FV}_{10} = 100$, $\mathrm{FV}_{20} = 160$) because this is a perpetual growth asset with no terminal date.
Use case. A genuinely rising FV environment: the agent's model-based anchor tracks the rising dividend one-for-one, so overpricing now isolates the effect of the narrative trait gi (growth optimism) plus one-sided momentum, cleanly separated from anchor error.
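The three quoted FV points can be reproduced from the perpetuity formula. `linearGrowthFV` is a hypothetical name; the parameters are the ones stated above.

```javascript
// Perpetuity value on the period-t expected dividend a + b * t, discounted at r.
function linearGrowthFV(t, a = 2, b = 0.3, r = 0.05) {
  return (a + b * t) / r;
}

for (const t of [1, 10, 20]) {
  console.log(t, linearGrowthFV(t).toFixed(0)); // 46, 100, 160
}
```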
Cyclical — wave-shaped FV around a flat mean, probing whether agents can track a non-monotone fundamental (v6 §5.22).
Dividend: $d_t \sim \mathcal{N}\!\left(5 + 2 \sin(2\pi (t - 1) / 10),\ 1\right)$, clamped to be non-negative. Cycle length 10 periods, mean 5, amplitude 2.
Fundamental value is the Gordon perpetuity on the current-period expected dividend, $\mathrm{FV}_t = \left(5 + 2 \sin(2\pi (t - 1) / 10)\right) / r$ at $r = 0.05$. The path oscillates between 60 and 140 around a long-run mean of 100. This is a perpetual cyclical asset with no explicit terminal date, so future payoffs may extend indefinitely.
Use case. Seasonal / cyclical value benchmark. Measures whether trading rules trained on monotone FV transfer to oscillatory environments.
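The cyclical path can be tabulated directly from the formula. `cyclicalMean` and `cyclicalFV` are hypothetical names. One detail worth noting: 60 and 140 are the bounds of the continuous sine envelope, and the path sampled at integer periods stays strictly inside them.

```javascript
// Expected dividend in period t: mean 5, amplitude 2, cycle length 10.
function cyclicalMean(t, mean = 5, amp = 2, cycle = 10) {
  return mean + amp * Math.sin((2 * Math.PI * (t - 1)) / cycle);
}

// Perpetuity value on the current-period expected dividend.
function cyclicalFV(t, r = 0.05) {
  return cyclicalMean(t) / r;
}

const fv = Array.from({ length: 20 }, (_, i) => cyclicalFV(i + 1));
console.log(fv.map((v) => v.toFixed(1)).join(" "));
console.log(Math.min(...fv).toFixed(1), Math.max(...fv).toFixed(1));
```

At integer periods the peak falls at t = 3 and t = 4, where the sine is about 0.951, so the sampled maximum is roughly 138 rather than 140.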
Random Walk Fundamental — a path-dependent FV with no predictable trajectory, so every round traces a different FV history.
Fundamental value: $\mathrm{FV}_{t+1} = \max(20,\ \mathrm{FV}_t + \eta_t)$ with $\eta_t \sim \mathcal{N}(0, 5^2)$ and floor 20. $\mathrm{FV}_1 = 100$; the path is generated lazily one period at a time so replay is deterministic given the seed.
Dividend: $d_t = \mathrm{FV}_t - \mathrm{FV}_{t+1} / (1 + r)$ with $r = 5\%$, clamped at 0. This is the martingale-consistent cash flow backed out of the realised path.
Use case. No trend-following edge: any bubble here indicates pure belief coordination or herding rather than a reaction to an exploitable FV pattern.
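A seeded version of this path and its backed-out dividends can be sketched as follows. Everything here is illustrative: `mulberry32` stands in for the engine's PRNG, the Box-Muller draw is one standard way to get a Gaussian from uniforms, and the path is built eagerly rather than lazily for brevity.

```javascript
// Small seeded generator (mulberry32-style); a stand-in for the engine's PRNG.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via Box-Muller; 1 - rand() keeps the log argument positive.
function gaussian(rand) {
  const u1 = 1 - rand();
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// FV path of length T: Gaussian steps with standard deviation sd, floored.
function randomWalkPath(T, seed, start = 100, sd = 5, floor = 20) {
  const rand = mulberry32(seed);
  const fv = [start];
  for (let t = 1; t < T; t++) {
    fv.push(Math.max(floor, fv[t - 1] + sd * gaussian(rand)));
  }
  return fv;
}

// Dividends backed out of the path: d_t = FV_t - FV_{t+1} / (1 + r), clamped at 0.
function impliedDividends(fv, r = 0.05) {
  const d = [];
  for (let t = 0; t + 1 < fv.length; t++) {
    d.push(Math.max(0, fv[t] - fv[t + 1] / (1 + r)));
  }
  return d;
}

const fvPath = randomWalkPath(20, 42);
console.log(fvPath.map((v) => v.toFixed(1)).join(" "));
console.log(impliedDividends(fvPath).map((v) => v.toFixed(1)).join(" "));
```

Re-running with the same seed reproduces the same path, while a different seed traces a different FV history.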
Fundamental value: $\mathrm{FV}_{t+1} = \mathrm{FV}_t + 2$ with probability 0.9, $\mathrm{FV}_{t+1} = \max(5,\ \mathrm{FV}_t - 30)$ with probability 0.1. $\mathrm{FV}_1 = 100$, floor at 5, expected per-period drift −1.2.
Dividend: $d_t = \mathrm{FV}_t - \mathrm{FV}_{t+1} / (1 + r)$ with $r = 5\%$, non-negative. Crashes produce a large positive dividend spike (the pre-crash FV paid out as cash before the drop).
Use case. Rare-disaster / fat-tailed regime. Tests whether agents discount the crash correctly or over-weight the 90% drift and get caught out.
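The drift figure and the crash dividend spike can be checked with a short sketch. `jumpCrashStep`, `expectedDrift`, and `impliedDividend` are hypothetical names; the uniform draw `u` is passed in explicitly so the step is deterministic to test, and the drift calculation ignores the floor.

```javascript
// One transition: +2 with probability 0.9, otherwise a 30-point crash floored at 5.
function jumpCrashStep(fv, u, up = 2, crash = 30, pUp = 0.9, floor = 5) {
  return u < pUp ? fv + up : Math.max(floor, fv - crash);
}

// Expected one-period change, ignoring the floor.
function expectedDrift(up = 2, crash = 30, pUp = 0.9) {
  return pUp * up - (1 - pUp) * crash;
}

// Dividend backed out of one transition, clamped at 0.
function impliedDividend(fvNow, fvNext, r = 0.05) {
  return Math.max(0, fvNow - fvNext / (1 + r));
}

console.log(expectedDrift().toFixed(1)); // -1.2
console.log(impliedDividend(100, jumpCrashStep(100, 0.5)).toFixed(2));  // quiet period
console.log(impliedDividend(100, jumpCrashStep(100, 0.95)).toFixed(2)); // crash period
```

From an FV of 100 the crash transition implies a dividend above 33¢, against under 3¢ in a quiet period, which is the spike the text refers to.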