<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Ryan Tolone</title>
<link>https://ryan-tolone.com/projects/</link>
<atom:link href="https://ryan-tolone.com/projects/index.xml" rel="self" type="application/rss+xml"/>
<description>Research and engineering across ML, causal inference, poker AI, and trading.</description>
<generator>quarto-1.9.37</generator>
<lastBuildDate>Sat, 25 Apr 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>Deep CFR for 5-Card PLO Heads-Up</title>
  <link>https://ryan-tolone.com/projects/plo5-deepcfr/</link>
  <description><![CDATA[ 




<section id="in-plain-english" class="level2">
<h2 class="anchored" data-anchor-id="in-plain-english">In plain English</h2>
<p><strong>PLO</strong> stands for <em>Pot-Limit Omaha</em>, a poker variant similar to Texas Hold’em except each player gets four hole cards instead of two and must use exactly two of them combined with three from the board. <strong>5-card PLO</strong> is the same thing but with <em>five</em> hole cards. PLO is the most popular high-stakes cash game variant outside of no-limit Hold’em, and the 5-card version is famously the most mathematically complex form of poker — there are far more possible hands and the equities between hands run much closer, so the strategy involves much more nuance.</p>
<p>This project teaches a neural network to play heads-up 5-card PLO at near-equilibrium strength, using the same Deep CFR algorithm I built up to in the <a href="../hunl-deepcfr/">HUNL project</a> (the no-limit Hold’em version). PLO5 has <em>no published Deep CFR work</em> — even Pluribus and Libratus stop at Hold’em — so this is original research-grade implementation, not a port of someone else’s code.</p>
<p>The headline isn’t a final exploitability number (the training is still running and won’t finish for weeks of compute). The headline is <strong>engineering</strong>: I diagnosed a 12–25× performance gap between what I had and what’s needed to finish in reasonable time, applied a series of optimizations that delivered a 58× speedup on the dominant cost, and then <em>honestly reported</em> that the encoder is no longer the bottleneck and the remaining work has to come from a different lever (multi-process traversal). It’s the kind of profile-driven optimization story that’s more useful than a glossy result, because it shows what scaling really looks like.</p>
</section>
<section id="why-plo5" class="level2">
<h2 class="anchored" data-anchor-id="why-plo5">Why PLO5</h2>
<p>Heads-up no-limit Hold’em was covered in Stage 4. 4-card PLO is in the literature. <strong>5-card PLO</strong> — five hole cards, choose-2 in your hand combined with choose-3 from the board — has <em>no published Deep CFR work</em> and adds two non-trivial complications over PLO4:</p>
<ol type="1">
<li><strong>Hand-evaluator combinatorics</strong>: <code>C(5,2) × C(5,3) = 100</code> two-and-three combos per (hand, board) pair. The PLO5 evaluator (<code>plo5_evaluator.py</code>) computes all 100 and takes the best — this is not a <code>phevaluator</code> call, it’s its own thing; a minimal sketch follows this list.</li>
<li><strong>Encoder cost</strong>: 5-card hands push state-feature dimensionality up, and naive equity computation per state explodes.</li>
</ol>
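<p>For concreteness, here is a minimal sketch of the choose-2 × choose-3 evaluation from point 1 — <code>rank5</code> is a hypothetical stand-in for the project’s actual 5-card ranker, and lower-rank-is-stronger is an assumption of this sketch, not necessarily how <code>plo5_evaluator.py</code> orders hands:</p>
<pre><code># Sketch only: exhaustive C(5,2) x C(5,3) = 100 combo evaluation.
# rank5() is a hypothetical 5-card ranker; lower value = stronger hand here.
from itertools import combinations

def plo5_best_rank(hole, board, rank5):
    """hole: 5 hole cards, board: 5 board cards; use exactly 2 + 3."""
    best = None
    for h2 in combinations(hole, 2):          # C(5,2) = 10
        for b3 in combinations(board, 3):     # C(5,3) = 10
            r = rank5(h2 + b3)                # 100 evaluations per (hand, board)
            if best is None or r &lt; best:
                best = r
    return best</code></pre>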
<p>That second complication is what dominated Stage 6 engineering.</p>
</section>
<section id="the-encoder-optimization-that-mattered" class="level2">
<h2 class="anchored" data-anchor-id="the-encoder-optimization-that-mattered">The encoder optimization that mattered</h2>
<p>In the K=500 smoke profile, the original encoder ran at <strong>2,620 µs / encode</strong>. A K=10,000 traversal at 300 iters would have taken <strong>100 days</strong>. So I rebuilt the hot path:</p>
<ul>
<li><strong>Opp-value cache</strong> keyed by canonical board → distribution over opponent ranges; populated lazily, persists across the iter (sketched after this list)</li>
<li><strong>Numpy vectorization</strong> of the per-combo equity rollouts</li>
<li><strong>Classify memoization</strong> of board-class lookups (flush / straight / paired structures)</li>
</ul>
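<p>A minimal sketch of how the cache and memoization could be wired together — <code>canonical_fn</code>, <code>compute_fn</code>, and the placeholder <code>classify_board</code> body are hypothetical stand-ins, not the encoder’s actual API:</p>
<pre><code># Sketch only: lazily populated opp-value cache + memoized board classes.
from functools import lru_cache
import numpy as np

class OppValueCache:
    """Keyed by canonical board; populated lazily; persists across one iter."""
    def __init__(self, canonical_fn, compute_fn):
        self.canonical_fn = canonical_fn   # board to hashable canonical key
        self.compute_fn = compute_fn       # (board, opp_range) to np.ndarray
        self.table, self.hits, self.misses = {}, 0, 0

    def values(self, board, opp_range):
        key = self.canonical_fn(board)
        if key in self.table:
            self.hits += 1
        else:
            self.misses += 1
            # one vectorized equity rollout covers the whole opponent range
            self.table[key] = self.compute_fn(board, np.asarray(opp_range))
        return self.table[key]

@lru_cache(maxsize=None)
def classify_board(canonical_key):
    """Memoized board-class lookup (flush / straight / paired structure).
    Placeholder body; the real classifier lives in the encoder."""
    return canonical_key</code></pre>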
<p>K=500 diagnostic results, post-optimization:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th></th>
<th style="text-align: right;">µs / encode</th>
<th>cache hits / misses</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Cold cache (warm-up pass)</td>
<td style="text-align: right;">79.0</td>
<td>525 / 367</td>
</tr>
<tr class="even">
<td>Warm cache (steady state)</td>
<td style="text-align: right;"><strong>44.8</strong></td>
<td>892 / 0</td>
</tr>
<tr class="odd">
<td>Pre-optimization baseline</td>
<td style="text-align: right;">2,620</td>
<td>n/a</td>
</tr>
</tbody>
</table>
<p><strong>58× steady-state encoder speedup</strong>, <strong>99.39% opp-value cache hit rate</strong> across a 1.46M-decision-point traverse, GPU forwards down to <strong>0.8% of traverse wall</strong>. Cache works.</p>
</section>
<section id="the-honest-readout-encoder-is-no-longer-the-bottleneck" class="level2">
<h2 class="anchored" data-anchor-id="the-honest-readout-encoder-is-no-longer-the-bottleneck">The honest readout: encoder is no longer the bottleneck</h2>
<p>After the optimization, a single K=500 traverse pass took 703.7 s for 1,461,675 decision points = <strong>481 µs / query</strong>. Of that, the encoder is ~45 µs (warm). The other <strong>~436 µs / state is Python overhead in the traversal loop itself</strong>:</p>
<ul>
<li>generator <code>yield</code>/<code>send</code>/stack save-restore across a 2.3M-call hot path</li>
<li>frozen-dataclass <code>PLO5State</code> reconstruction in <code>apply_action</code></li>
<li><code>legal_action_mask</code> recomputation</li>
<li>reservoir-buffer writes for traverser nodes</li>
<li>numpy mask/sigma allocations in the inner loop</li>
</ul>
<p><strong>GPU is essentially idle during traversal (0.8% of wall in forwards).</strong> Adding bigger nets or larger batches would barely move iter wall.</p>
</section>
<section id="component-level-iter-budget-at-k500" class="level2">
<h2 class="anchored" data-anchor-id="component-level-iter-budget-at-k500">Component-level iter budget at K=500</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>component</th>
<th style="text-align: right;">wall</th>
<th style="text-align: right;">%</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>traverse × 2 players</td>
<td style="text-align: right;">1,407.4 s</td>
<td style="text-align: right;"><strong>98.0%</strong></td>
</tr>
<tr class="even">
<td>aux rollouts × 2,000</td>
<td style="text-align: right;">1.5 s</td>
<td style="text-align: right;">0.1%</td>
</tr>
<tr class="odd">
<td>V rollouts × 2,000</td>
<td style="text-align: right;">1.4 s</td>
<td style="text-align: right;">0.1%</td>
</tr>
<tr class="even">
<td>train R × 800 (× 2)</td>
<td style="text-align: right;">9.6 s</td>
<td style="text-align: right;">0.7%</td>
</tr>
<tr class="odd">
<td>train S × 1,000 (× 2)</td>
<td style="text-align: right;">11.8 s</td>
<td style="text-align: right;">0.8%</td>
</tr>
<tr class="even">
<td>train V × 500 (× 2)</td>
<td style="text-align: right;">3.2 s</td>
<td style="text-align: right;">0.2%</td>
</tr>
<tr class="odd">
<td>TOTAL</td>
<td style="text-align: right;"><strong>23.9 min/iter</strong></td>
<td style="text-align: right;">100%</td>
</tr>
</tbody>
</table>
<p>Extrapolated to K=10,000 × 300 iters: ~<strong>100 days</strong> vs.&nbsp;the 4–8 day target. <strong>12–25× too slow.</strong> The encoder lever is exhausted.</p>
</section>
<section id="the-remaining-levers-ranked" class="level2">
<h2 class="anchored" data-anchor-id="the-remaining-levers-ranked">The remaining levers, ranked</h2>
<table class="caption-top table">
<colgroup>
<col style="width: 23%">
<col style="width: 30%">
<col style="width: 23%">
<col style="width: 23%">
</colgroup>
<thead>
<tr class="header">
<th>Approach</th>
<th style="text-align: right;">Speedup</th>
<th>Cost</th>
<th>Risk</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>(A) Multi-process traversal (8 workers)</td>
<td style="text-align: right;">6–8×</td>
<td>1–2 days; needs IPC for state batching</td>
<td>Memory amplification; pickling cost</td>
</tr>
<tr class="even">
<td>(B) Reduce K from 10k → 2k</td>
<td style="text-align: right;">5×</td>
<td>trivial</td>
<td>Slower convergence per iter</td>
</tr>
<tr class="odd">
<td>(C) Reduce action grid 10 → 5–6 slots</td>
<td style="text-align: right;">2–3×</td>
<td>small game/encoder change</td>
<td>Loses pot-fraction granularity</td>
</tr>
<tr class="even">
<td>(D) Iterative explicit-stack traversal</td>
<td style="text-align: right;">1.5–2×</td>
<td>half-day refactor</td>
<td>Code complexity</td>
</tr>
</tbody>
</table>
<p>The current direction is (A) + (B) in tandem: 8-worker MP traversal at K=2,000.</p>
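<p>A minimal sketch of lever (A), assuming the existing single-process traverse loop can be wrapped as a picklable <code>run_traversals(seed, k)</code> that returns its buffer samples — the pickling cost flagged in the table above is exactly this return path:</p>
<pre><code># Sketch only: split the K traversals per iteration across worker processes.
from multiprocessing import Pool

def traverse_parallel(run_traversals, k_total=2000, n_workers=8, base_seed=0):
    per_worker = k_total // n_workers
    jobs = [(base_seed + w, per_worker) for w in range(n_workers)]
    with Pool(processes=n_workers) as pool:
        chunks = pool.starmap(run_traversals, jobs)   # one result chunk per worker
    # merge worker-local samples back into the main reservoir buffers
    return [sample for chunk in chunks for sample in chunk]</code></pre>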
</section>
<section id="equity-pretraining-for-warm-starts" class="level2">
<h2 class="anchored" data-anchor-id="equity-pretraining-for-warm-starts">Equity pretraining for warm starts</h2>
<p>Stage 6d also pretrains the V/aux head on a 50k-state equity dataset (<code>stage6d_equity_dataset.npz</code>) generated by Monte-Carlo rollouts on canonical (hand, board) pairs. The pretrained checkpoint (<code>stage6d_equity_pretrained.pt</code>) lets the v3 ensemble start with non-random equity priors instead of bootstrapping them in the first 20 iters. The “BB60” variant in the v3 logs is this pretrained-init run.</p>
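<p>A minimal sketch of the warm start, assuming PyTorch and hypothetical array names (<code>features</code>, <code>equity</code>) inside the <code>.npz</code> — the real dataset layout and V/aux architecture may differ:</p>
<pre><code># Sketch only: supervised equity pretraining for the V/aux head.
import numpy as np
import torch

data = np.load("stage6d_equity_dataset.npz")
x = torch.tensor(data["features"], dtype=torch.float32)
y = torch.tensor(data["equity"], dtype=torch.float32).unsqueeze(1)

v_net = torch.nn.Sequential(
    torch.nn.Linear(x.shape[1], 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 1),
)
opt = torch.optim.Adam(v_net.parameters(), lr=1e-3)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x, y), batch_size=4096, shuffle=True)

for epoch in range(20):                       # a few passes over the 50k states
    for xb, yb in loader:
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(v_net(xb), yb)
        loss.backward()
        opt.step()

torch.save(v_net.state_dict(), "stage6d_equity_pretrained.pt")</code></pre>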
</section>
<section id="reference-benchmark-kuhn-poker-exploitability" class="level2">
<h2 class="anchored" data-anchor-id="reference-benchmark-kuhn-poker-exploitability">Reference benchmark — Kuhn poker exploitability</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/plo5-deepcfr/kuhn_exploitability.png" class="img-fluid figure-img"></p>
<figcaption>Kuhn poker — CFR+ vs.&nbsp;reference, exact convergence to ε ≈ 0.</figcaption>
</figure>
</div>
<p>Stage 0 sanity check from the bottom of the arc: the tabular CFR+ implementation reproduces the published Kuhn equilibrium exactly. Every later stage’s pipeline is built on this foundation; if the floor is wrong, the ceiling is decoration.</p>
</section>
<section id="what-it-demonstrates" class="level2">
<h2 class="anchored" data-anchor-id="what-it-demonstrates">What it demonstrates</h2>
<ul>
<li>Engineering a research project that can’t fit in RAM or wall-time at the obvious settings, and re-architecting until it does</li>
<li>Profile-first optimization: not “I optimized the encoder” but “<em>here’s the 58× the cache buys, here’s the 9% of iter wall that’s left, here’s why we now need MP</em>”</li>
<li>Knowing when to stop optimizing one lever and switch to another</li>
<li>Custom evaluator for a game with no off-the-shelf solver</li>
<li>Equity pretraining as a warm-start technique for neural CFR</li>
</ul>


</section>

 ]]></description>
  <category>Reinforcement Learning</category>
  <category>Game Theory</category>
  <category>Poker AI</category>
  <guid>https://ryan-tolone.com/projects/plo5-deepcfr/</guid>
  <pubDate>Sat, 25 Apr 2026 00:00:00 GMT</pubDate>
  <media:content url="https://ryan-tolone.com/projects/plo5-deepcfr/kuhn_exploitability.png" medium="image" type="image/png" height="90" width="144"/>
</item>
<item>
  <title>Switchback Experiments on a Simulated Marketplace</title>
  <link>https://ryan-tolone.com/projects/switchback/</link>
  <description><![CDATA[ 




<section id="in-plain-english" class="level2">
<h2 class="anchored" data-anchor-id="in-plain-english">In plain English</h2>
<p>Imagine Uber wants to test a small price change. The obvious experiment: flip a coin for each rider — half see the new price (treatment), half see the old price (control). After a few weeks, compare conversion rates. Whichever arm did better wins.</p>
<p><strong>This is wrong on a marketplace, and it’s wrong in a way that fools almost everyone.</strong> When a treated rider books, they tie up a driver — a driver that <em>would have</em> served the next control rider. Treatment doesn’t just affect the treatment group; it eats into the control group’s experience. The two arms aren’t independent. The conversion gap you measure is much larger than the actual effect of the change, because control’s number is artificially depressed.</p>
<p>I wanted to <em>prove</em> this with numbers, not just describe it. So I built a simulated rideshare marketplace where I knew the true effect of the price change exactly (because I picked it), then ran both the naive coin-flip experiment and the production-standard fix — a <strong>switchback design</strong>, where the entire marketplace flips between old and new prices in time blocks. Comparing them against ground truth shows the naive design is <strong>208% biased</strong> and switchback recovers the true effect within 11%.</p>
<p>Switchback experiments are how Uber, Lyft, DoorDash, Instacart, and Airbnb actually run pricing tests. They’re rarely covered in coursework. This project is the worked walkthrough.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/switchback/01_bias_vs_recovery.png" class="img-fluid figure-img"></p>
<figcaption>Bias vs.&nbsp;recovery: naive A/B vs.&nbsp;switchback (W = 30m / 120m).</figcaption>
</figure>
</div>
</section>
<section id="headline-result" class="level2">
<h2 class="anchored" data-anchor-id="headline-result">Headline result</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Design</th>
<th>Mean τ̂</th>
<th>Bias</th>
<th>% of true τ</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Ground truth</td>
<td>+0.00796</td>
<td>—</td>
<td>—</td>
</tr>
<tr class="even">
<td>Naive A/B</td>
<td>+0.02449</td>
<td>+0.01653</td>
<td><strong>+208%</strong></td>
</tr>
<tr class="odd">
<td>Switchback (W = 30m)</td>
<td>+0.01051</td>
<td>+0.00256</td>
<td>+32%</td>
</tr>
<tr class="even">
<td>Switchback (W = 120m)</td>
<td>+0.00881</td>
<td>+0.00085</td>
<td>+11%</td>
</tr>
</tbody>
</table>
<p>200 Monte Carlo replicates per design; each replicate is a 30-day simulated marketplace at λ = 5/min, N = 22 drivers, mean trip = 15 min, baseline conversion 0.30, per-rider lift τ = 0.03.</p>
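<p>The mechanics of the switchback arm assignment, as a minimal sketch — the column names and the fair coin per window are assumptions of this sketch, not the simulator’s exact interface:</p>
<pre><code># Sketch only: whole-market switchback assignment + window-mean estimator.
import numpy as np
import pandas as pd

def switchback_estimate(riders: pd.DataFrame, w_minutes: float, seed: int = 0):
    """riders needs 'arrival_min' and 'converted'; the whole market flips per window."""
    rng = np.random.default_rng(seed)
    window = (riders["arrival_min"] // w_minutes).astype(int)
    arm = rng.integers(0, 2, size=window.max() + 1)   # 0 = control, 1 = treatment
    riders = riders.assign(window=window, treated=arm[window.to_numpy()])
    # window-level conversion means, then difference of averages across arms
    by_win = riders.groupby(["window", "treated"])["converted"].mean().reset_index()
    return (by_win.loc[by_win.treated == 1, "converted"].mean()
            - by_win.loc[by_win.treated == 0, "converted"].mean())</code></pre>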
</section>
<section id="two-estimands-only-one-of-which-matters" class="level2">
<h2 class="anchored" data-anchor-id="two-estimands-only-one-of-which-matters">Two estimands, only one of which matters</h2>
<p>The naive A/B isn’t merely noisy — it’s answering the wrong question. It estimates the <em>conditional effect on a treated rider holding supply at the mixed-arm operating point</em> (≈ 0.025), which lines up with the +0.0245 we observe. But the launch decision depends on the <em>equilibrium</em> effect: treatment for everyone vs.&nbsp;control for everyone, which is +0.008, because in an all-treatment world supply is more depleted than in the mixed world. The two estimands diverge whenever supply is finite.</p>
</section>
<section id="window-length-bias-variance-tradeoff" class="level2">
<h2 class="anchored" data-anchor-id="window-length-bias-variance-tradeoff">Window-length bias-variance tradeoff</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/switchback/02_window_tradeoff.png" class="img-fluid figure-img"></p>
<figcaption>RMSE-optimal W ≈ 8× mean trip duration.</figcaption>
</figure>
</div>
<table class="caption-top table">
<thead>
<tr class="header">
<th>W (min)</th>
<th>Bias</th>
<th>Std</th>
<th>RMSE</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>5</td>
<td>+0.00880</td>
<td>0.00196</td>
<td>0.00902</td>
</tr>
<tr class="even">
<td>30</td>
<td>+0.00243</td>
<td>0.00163</td>
<td>0.00292</td>
</tr>
<tr class="odd">
<td><strong>120</strong></td>
<td>+0.00085</td>
<td>0.00164</td>
<td><strong>0.00184</strong></td>
</tr>
<tr class="even">
<td>240</td>
<td>+0.00057</td>
<td>0.00190</td>
<td>0.00199</td>
</tr>
</tbody>
</table>
<p>Bias decays roughly geometrically in W; variance is roughly flat at this horizon. RMSE-optimal W ≈ 8× mean trip duration — that’s a generalizable heuristic, but the lesson is don’t import a fixed W from another company. Re-derive it at your own operating point.</p>
</section>
<section id="carryover-diagnostic" class="level2">
<h2 class="anchored" data-anchor-id="carryover-diagnostic">Carryover diagnostic</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/switchback/03_carryover.png" class="img-fluid figure-img"></p>
<figcaption>Within-window carryover by previous-arm.</figcaption>
</figure>
</div>
<p>Within each window I bin riders by position from window-start and compare conversion rates across the four <code>(this_arm, prev_arm)</code> regimes. Control windows that follow a treatment window have depressed conversion in the early bins — drivers are still busy from the prior treatment regime. The signal is small (~0.4pp) but consistent.</p>
</section>
<section id="variance-estimation" class="level2">
<h2 class="anchored" data-anchor-id="variance-estimation">Variance estimation</h2>
<p>For W = 30 on 30 simulated days (1,440 windows, 216k riders), three SE estimators agree closely — naive per-rider, cluster-robust by window, and a 2,000-rep block bootstrap. The within-window dependence is weak enough at this λ that cluster-robust ≈ i.i.d., but I implemented all three so the diagnostic exists.</p>
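<p>A minimal sketch of the three estimators, assuming a rider-level frame with <code>converted</code>, <code>treated</code> (0/1), and <code>window</code> columns — hypothetical names, not the project’s schema:</p>
<pre><code># Sketch only: i.i.d., cluster-robust-by-window, and block-bootstrap SEs.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def standard_errors(riders: pd.DataFrame, n_boot=2000, seed=0):
    iid = smf.ols("converted ~ treated", data=riders).fit()
    clus = smf.ols("converted ~ treated", data=riders).fit(
        cov_type="cluster", cov_kwds={"groups": riders["window"]})
    # block bootstrap: resample whole windows to respect within-window dependence
    rng = np.random.default_rng(seed)
    blocks = {w: g for w, g in riders.groupby("window")}
    keys = np.array(list(blocks))
    taus = []
    for _ in range(n_boot):
        draw = pd.concat(blocks[k] for k in rng.choice(keys, size=len(keys)))
        m = draw.groupby("treated")["converted"].mean()
        taus.append(m[1] - m[0])
    return iid.bse["treated"], clus.bse["treated"], float(np.std(taus))</code></pre>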
</section>
<section id="power-analysis-the-practical-cost-of-doing-this-honestly" class="level2">
<h2 class="anchored" data-anchor-id="power-analysis-the-practical-cost-of-doing-this-honestly">Power analysis: the practical cost of doing this honestly</h2>
<p>Because the equilibrium τ is small (+0.008), even the right-design experiment is power-constrained. With switchback SE ≈ 0.0019 at W = 120, the 80%-power MDE at α = 0.05 is roughly <code>2.8 × SE ≈ 0.0053</code>. The true effect sits just above the MDE — a 30-day switchback would detect it but with limited margin. A practitioner should plan for <strong>6–8 weeks</strong> to shrink the SE further. Naive power calculations using the per-rider effect would tell you a few days suffices. They’d be wrong.</p>
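<p>The arithmetic behind that MDE, for reference — with α = 0.05 two-sided and 80% power the multiplier is z(0.975) + z(0.80) = 1.96 + 0.84 ≈ 2.8:</p>
<pre><code># Sketch only: minimum detectable effect from a standard error.
from scipy.stats import norm

def mde(se, alpha=0.05, power=0.80):
    return (norm.ppf(1 - alpha / 2) + norm.ppf(power)) * se

print(mde(0.0019))   # ~0.0053, just under the +0.008 equilibrium effect</code></pre>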
</section>
<section id="sign-flip-symmetry" class="level2">
<h2 class="anchored" data-anchor-id="sign-flip-symmetry">Sign-flip symmetry</h2>
<p>If contamination is the real cause, flipping the sign of τ should flip the sign of the bias. It does:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>τ parameter</th>
<th>Equilibrium τ</th>
<th>Naive bias</th>
<th>Switchback bias</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>+0.030</td>
<td>+0.0075</td>
<td>+0.0167</td>
<td>+0.0009</td>
</tr>
<tr class="even">
<td>−0.030</td>
<td>−0.0114</td>
<td>−0.0148</td>
<td>−0.0002</td>
</tr>
</tbody>
</table>
<p>The asymmetry in equilibrium τ is itself interesting — a price <em>increase</em> produces a larger-magnitude equilibrium effect than a comparable price decrease, because freed-up supply partially offsets a lift but reinforces a depression. Marketplaces amplify negative effects and damp positive ones.</p>
</section>
<section id="supply-scaling-sanity-check" class="level2">
<h2 class="anchored" data-anchor-id="supply-scaling-sanity-check">Supply-scaling sanity check</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/switchback/04_supply_scaling.png" class="img-fluid figure-img"></p>
<figcaption>Bias vanishes when supply is abundant.</figcaption>
</figure>
</div>
<p>The contamination bias vanishes as N → ∞. Confirms the mechanism is supply-side, not anything else.</p>
</section>
<section id="stack" class="level2">
<h2 class="anchored" data-anchor-id="stack">Stack</h2>
<ul>
<li>Discrete-event simulator (event-driven, not time-stepped) — <code>numpy</code>, <code>scipy</code>, <code>statsmodels</code></li>
<li>Cluster-robust SEs by window, block bootstrap on dependent data</li>
<li>Phases: zero-effect validation → main A/B vs.&nbsp;switchback Monte Carlo → variance + carryover → window sweep → figures</li>
<li>End-to-end runtime ≈ 6 minutes single-core</li>
</ul>
</section>
<section id="what-it-demonstrates" class="level2">
<h2 class="anchored" data-anchor-id="what-it-demonstrates">What it demonstrates</h2>
<ul>
<li>Identifying bias from interference / SUTVA violation</li>
<li>Implementing a real production technique used at every marketplace company</li>
<li>Bias-variance tradeoffs in experimental design under dependent data</li>
<li>Reading a result honestly: the naive A/B isn’t “noisy,” it’s answering a different question</li>
</ul>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li>Bojinov, Simchi-Levi &amp; Shephard (2023), <em>Design and Analysis of Switchback Experiments</em>, <strong>Management Science</strong>.</li>
<li>DoorDash engineering blog, <em>Switchback Tests and Randomized Experimentation Under Network Effects</em>.</li>
</ul>


</section>

 ]]></description>
  <category>Causal Inference</category>
  <category>Experimentation</category>
  <category>Simulation</category>
  <guid>https://ryan-tolone.com/projects/switchback/</guid>
  <pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate>
  <media:content url="https://ryan-tolone.com/projects/switchback/01_bias_vs_recovery.png" medium="image" type="image/png" height="81" width="144"/>
</item>
<item>
  <title>Crypto Strategy Discovery: Robust BTC &amp; ETH Research</title>
  <link>https://ryan-tolone.com/projects/crypto-research/</link>
  <description><![CDATA[ 




<section id="in-plain-english" class="level2">
<h2 class="anchored" data-anchor-id="in-plain-english">In plain English</h2>
<p>A <strong>trading strategy</strong> is a rule for when to buy and sell — for example, “buy Bitcoin when its 50-day average crosses above its 200-day average, sell when it crosses below.” A <strong>backtest</strong> runs that rule against historical prices to see what the P&amp;L would have been.</p>
<p>The dirty secret: it is <em>trivially</em> easy to invent a backtest that looks profitable but would lose money in real life. You try a hundred different rules, pick the best one, and report it as if you’d discovered it. You don’t include trading fees. You optimize the rule’s parameters on the same data you’re testing it on. Each of these adds a little bit of “lookahead” or “selection” bias, and stacked together they turn random noise into a Sharpe-2 strategy on paper.</p>
<p>This project hunts for <em>real</em> edges in Bitcoin and Ethereum trading by deliberately designing the experiment to fail when no real edge exists. The eight rules below are not technical curiosities — they are the difference between a project that produces honest answers and one that produces wishful thinking. Several of the strategies looked great in early phases and then died on the frozen test split, <em>which is the project working correctly.</em></p>
</section>
<section id="anti-overfitting-principles" class="level2">
<h2 class="anchored" data-anchor-id="anti-overfitting-principles">Anti-overfitting principles</h2>
<ol type="1">
<li><strong>Frozen out-of-sample (OOS) holdout.</strong> The final ~30% of history is never used for parameter selection or strategy choice. It is touched exactly once, at the end of each phase.</li>
<li><strong>Walk-forward analysis.</strong> Parameters are re-fit on rolling windows; only <em>next-window</em> returns are recorded. No single-point fits.</li>
<li><strong>Deflated Sharpe Ratio (DSR).</strong> Every reported Sharpe is deflated by the number of trials run, following Bailey &amp; López de Prado (2014). A nominal Sharpe of 1.5 across 50 trials is <em>not</em> a discovery. A sketch of the computation follows this list.</li>
<li><strong>Realistic frictions.</strong> 10 bps per side (20 bps round-trip) + 5 bps slippage on every trade.</li>
<li><strong>Parameter robustness.</strong> A strategy is only accepted if a <em>neighborhood</em> of parameters works — not a single sweet spot.</li>
<li><strong>Minimum-trades guard.</strong> Strategies with &lt; 30 trades on the test window are rejected for lack of statistical power.</li>
<li><strong>Concentration check.</strong> If &gt; 40% of profit comes from &lt; 5% of trades, the strategy is flagged as fragile.</li>
<li><strong>Buy-and-hold benchmark.</strong> Risk-adjusted outperformance vs.&nbsp;buy-and-hold, not absolute return, is the bar.</li>
</ol>
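<p>A minimal sketch of the DSR computation as I read Bailey &amp; López de Prado (2014) — the project’s own implementation may differ in detail, and the inputs here (per-period returns of the selected strategy plus the Sharpe ratios of every trial run) are assumptions of this sketch:</p>
<pre><code># Sketch only: Deflated Sharpe Ratio (probability the observed Sharpe beats
# the expected best-of-N Sharpe under the null of zero skill).
import numpy as np
from scipy.stats import norm, skew, kurtosis

EULER_GAMMA = 0.5772156649

def deflated_sharpe(returns, trial_sharpes):
    returns = np.asarray(returns, dtype=float)
    sr = returns.mean() / returns.std(ddof=1)          # per-period Sharpe
    n = len(trial_sharpes)
    # expected maximum Sharpe among n zero-skill trials
    sr0 = np.std(trial_sharpes, ddof=1) * (
        (1 - EULER_GAMMA) * norm.ppf(1 - 1 / n)
        + EULER_GAMMA * norm.ppf(1 - 1 / (n * np.e)))
    g3, g4 = skew(returns), kurtosis(returns, fisher=False)
    t = len(returns)
    denom = np.sqrt(1 - g3 * sr + (g4 - 1) / 4 * sr**2)
    return norm.cdf((sr - sr0) * np.sqrt(t - 1) / denom)</code></pre>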
</section>
<section id="phase-progression" class="level2">
<h2 class="anchored" data-anchor-id="phase-progression">Phase progression</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Phase</th>
<th>Focus</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1–3</td>
<td>Trend, mean-reversion, breakout primitives</td>
</tr>
<tr class="even">
<td>4</td>
<td>Candidate selection on validation</td>
</tr>
<tr class="odd">
<td>6</td>
<td>Carry / funding-rate signals</td>
</tr>
<tr class="even">
<td>7</td>
<td>On-chain features (active addresses, whale flows)</td>
</tr>
<tr class="odd">
<td>8</td>
<td>Meta-ensemble of phase-1–7 survivors</td>
</tr>
<tr class="even">
<td>9</td>
<td>Hybrid strategies blending vol-regime gating with carry/momentum</td>
</tr>
<tr class="odd">
<td>10</td>
<td>Production candidate — final OOS evaluation</td>
</tr>
</tbody>
</table>
</section>
<section id="oos-results" class="level2">
<h2 class="anchored" data-anchor-id="oos-results">OOS results</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/crypto-research/BTC_holdout.png" class="img-fluid figure-img"></p>
<figcaption>BTC holdout — strategy survives the frozen test split.</figcaption>
</figure>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/crypto-research/ETH_holdout.png" class="img-fluid figure-img"></p>
<figcaption>ETH holdout — comparable behavior on the cross-asset test.</figcaption>
</figure>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/crypto-research/phase10_production.png" class="img-fluid figure-img"></p>
<figcaption>Phase 10 production candidate.</figcaption>
</figure>
</div>
</section>
<section id="what-it-demonstrates" class="level2">
<h2 class="anchored" data-anchor-id="what-it-demonstrates">What it demonstrates</h2>
<ul>
<li>Frozen-holdout discipline that catches lookahead bias <em>by design</em> — found and killed multiple “promising” strategies whose edge collapsed on the OOS split</li>
<li>DSR as a routine reporting metric, not a footnote</li>
<li>Cross-sectional analysis (BTC vs.&nbsp;ETH) to test whether an “edge” is asset-specific or generalizes</li>
<li>Honest reports: every phase has a <code>REPORT.md</code> with what survived, what didn’t, and <em>why</em> — including the dead ends</li>
</ul>
<p>The point of the project isn’t the equity curves. It’s that the equity curves you see survived a process designed to murder them.</p>


</section>

 ]]></description>
  <category>Trading Research</category>
  <category>Crypto</category>
  <category>Backtesting</category>
  <guid>https://ryan-tolone.com/projects/crypto-research/</guid>
  <pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate>
  <media:content url="https://ryan-tolone.com/projects/crypto-research/BTC_holdout.png" medium="image" type="image/png" height="72" width="144"/>
</item>
<item>
  <title>LEAP Trading Strategy: Leveraged Long-Dated Options Backtest</title>
  <link>https://ryan-tolone.com/projects/leap/</link>
  <description><![CDATA[ 




<section id="in-plain-english" class="level2">
<h2 class="anchored" data-anchor-id="in-plain-english">In plain English</h2>
<p>A <strong>LEAP</strong> (“Long-term Equity AnticiPation”) is a long-dated stock option — typically a call option that expires 1–3 years out. Because options give you leverage, buying LEAPs is a way to get something like 3–5× the upside of a stock for a fraction of the capital. Retail finance Twitter loves them: “Why buy 100 shares of QQQ when you can buy a LEAP and get the same dollar exposure for a quarter of the cost?”</p>
<p>The pitch is correct that LEAPs are leveraged. The pitch is <strong>wrong</strong> that the leverage is free. Options decay over time even when the stock is flat (theta), they get crushed when volatility drops (vega), and rolling them when they expire costs spread and slippage every cycle. A naive LEAP strategy can underperform just <em>holding the stock</em> over long horizons, and the drawdowns when volatility spikes can be brutal — while the stock is making new highs, your LEAP basket can still be down 60%.</p>
<p>This project asks the question carefully: across many combinations of how-deep-in-the-money, how-long-until-expiry, and how-often-you-rebalance, are there <em>any</em> LEAP strategies that risk-adjust above just buying and holding the underlying? The answer turns out to be qualified — yes, but only in a small allocation inside a mostly-stock portfolio, and at the cost of much larger drawdowns. The all-LEAP and self-funded “infinite money glitch” designs popular online don’t survive an honest backtest.</p>
</section>
<section id="designs-tested" class="level2">
<h2 class="anchored" data-anchor-id="designs-tested">Designs tested</h2>
</section>
<section id="designs-tested-1" class="level2">
<h2 class="anchored" data-anchor-id="designs-tested-1">Designs tested</h2>
<ul>
<li><strong>Fixed-deposit LEAPs</strong> — $1k every 2 weeks into a fixed moneyness/tenor LEAP</li>
<li><strong>Continuous DCA</strong> vs.&nbsp;<strong>lump + DCA</strong> vs.&nbsp;<strong>self-funded</strong> (no fresh deposits after year 1)</li>
<li><strong>Blended portfolios</strong> — stock + LEAP, stock + LEAP rolled, stock + LEAP held-to-vertical</li>
<li><strong>Barbell</strong> — small allocation to long-dated LEAPs + larger cash buffer</li>
<li><strong>Improvements</strong>: drawdown-stable variants that exit on volatility regime change</li>
</ul>
</section>
<section id="headline-self-funded-one-year-deposit-only-1k-2wks-for-year-1-then-ride" class="level2">
<h2 class="anchored" data-anchor-id="headline-self-funded-one-year-deposit-only-1k-2wks-for-year-1-then-ride">Headline: self-funded one-year-deposit-only ($1k / 2wks for year 1, then ride)</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/leap/leap_self_funded.png" class="img-fluid figure-img"></p>
<figcaption>Self-funded sweep — moneyness × tenor × IRR / max drawdown.</figcaption>
</figure>
</div>
<p>The middle panel is what matters. <strong>Self-funded LEAPs underperform stock-only on $/year</strong> across most moneyness × tenor cells once you factor in friction. The “wins” are concentrated in deep-OTM long-tenor cells — exactly the cells with the worst path-dependence and the largest drawdowns.</p>
<p>The right panel is the IRR delta vs.&nbsp;continuous DCA. <strong>DCA wins</strong> in 14 of 21 cells, often by 5–10 percentage points. The intuition is mechanical: continuous DCA averages your cost basis through volatility regimes and sells less of the long-vega exposure into vol crashes.</p>
</section>
<section id="drawdown-stability" class="level2">
<h2 class="anchored" data-anchor-id="drawdown-stability">Drawdown stability</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/leap/leap_self_funded_dd.png" class="img-fluid figure-img"></p>
<figcaption>Self-funded drawdown profiles.</figcaption>
</figure>
</div>
<p>LEAP drawdowns are not just <em>bigger</em> than stock drawdowns — they are <em>differently shaped</em>. Stock drawdowns mean-revert; deep-ITM LEAP drawdowns become permanent capital loss when realized vol crushes during the holding window. The chart shows the period where buy-and-hold is making new highs and the LEAP basket is still down 60%.</p>
</section>
<section id="drip-dca-sweep" class="level2">
<h2 class="anchored" data-anchor-id="drip-dca-sweep">Drip-DCA sweep</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/leap/leap_drip_sweep.png" class="img-fluid figure-img"></p>
<figcaption>Drip-DCA over moneyness × tenor — IRR heatmaps.</figcaption>
</figure>
</div>
<p>For each (moneyness, tenor) cell I run a parameter sweep on the drip rate. The optimum rate is <em>not</em> a single point — it varies sharply with moneyness, which means a strategy chosen on one moneyness band will not generalize.</p>
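<p>The sweep skeleton, as a minimal sketch — the grid values and <code>backtest_irr()</code> are hypothetical stand-ins; the real scripts save their own grids and summary CSVs:</p>
<pre><code># Sketch only: moneyness x tenor x drip-rate sweep with a saved grid.
from itertools import product
import pandas as pd

MONEYNESS  = [0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]   # strike / spot (assumed grid)
TENORS_M   = [12, 18, 24]                          # months to expiry
DRIP_RATES = [0.25, 0.5, 1.0]                      # fraction of deposit into LEAPs

def run_sweep(backtest_irr):
    rows = []
    for m, tenor, drip in product(MONEYNESS, TENORS_M, DRIP_RATES):
        irr, max_dd = backtest_irr(moneyness=m, tenor_months=tenor, drip=drip)
        rows.append({"moneyness": m, "tenor": tenor, "drip": drip,
                     "irr": irr, "max_dd": max_dd})
    grid = pd.DataFrame(rows)
    grid.to_csv("drip_sweep_grid.csv", index=False)
    return grid</code></pre>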
</section>
<section id="blended-portfolio-where-the-realistic-wins-are" class="level2">
<h2 class="anchored" data-anchor-id="blended-portfolio-where-the-realistic-wins-are">Blended portfolio: where the realistic wins are</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/leap/leap_portfolio_v2.png" class="img-fluid figure-img"></p>
<figcaption>Stock + LEAP blends with rolling vs.&nbsp;held-to-vertical.</figcaption>
</figure>
</div>
<p>The realistic finding from the project: <strong>a small LEAP allocation inside a mostly-stock portfolio risk-adjusts modestly above stock-only</strong>, especially in regimes where realized vol stays below 25%. The all-LEAP and self-funded designs do not. Drawdowns are 75–84% on the blended versions vs.&nbsp;~33% on stock-only — that’s the cost of the IRR uplift to 23–26% vs.&nbsp;9.4% for stock-only.</p>
</section>
<section id="what-it-demonstrates" class="level2">
<h2 class="anchored" data-anchor-id="what-it-demonstrates">What it demonstrates</h2>
<ul>
<li>Real options backtest on historical chain data, not synthetic Black-Scholes pricing</li>
<li>Honest about tail risk: when the asymmetry of leverage cuts the wrong way, it cuts very deep</li>
<li>Sweep design that catches single-cell optima before they become “discoveries”</li>
<li>Comparing strategies on <strong>risk-adjusted</strong> terms, not headline IRR</li>
</ul>
</section>
<section id="stack" class="level2">
<h2 class="anchored" data-anchor-id="stack">Stack</h2>
<ul>
<li>Python — <code>numpy</code>, <code>pandas</code>, <code>matplotlib</code> for analysis</li>
<li>Historical option chain data + Monte Carlo for the synthetic regime stress tests</li>
<li>~30 distinct experiment scripts, each saving its grid + summary CSV alongside the figure</li>
</ul>


</section>

 ]]></description>
  <category>Trading Research</category>
  <category>Options</category>
  <category>Backtesting</category>
  <guid>https://ryan-tolone.com/projects/leap/</guid>
  <pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate>
  <media:content url="https://ryan-tolone.com/projects/leap/leap_self_funded.png" medium="image" type="image/png" height="46" width="144"/>
</item>
<item>
  <title>Deep CFR for Heads-Up No-Limit Hold’em</title>
  <link>https://ryan-tolone.com/projects/hunl-deepcfr/</link>
  <description><![CDATA[ 




<section id="in-plain-english" class="level2">
<h2 class="anchored" data-anchor-id="in-plain-english">In plain English</h2>
<p>I’m teaching a neural network to play <strong>heads-up no-limit Texas Hold’em</strong> (the two-player version of poker that’s been the long-standing benchmark for AI in games of imperfect information) at near-equilibrium strength.</p>
<p>A few things make this hard. Poker isn’t chess — you can’t see your opponent’s cards, so the optimal strategy is <em>probabilistic</em> (sometimes bluff, sometimes don’t, in carefully tuned proportions). The number of possible game situations is astronomically large. And the way you “solve” poker isn’t by predicting moves — it’s by computing a <strong>Nash equilibrium</strong>, the strategy that no opponent can exploit. The standard algorithm for this is <strong>Counterfactual Regret Minimization (CFR)</strong>, and the modern neural variant is <strong>Deep CFR</strong> (Brown et al.&nbsp;2019).</p>
<p>Systems like Libratus and Pluribus have beaten top professionals at no-limit Hold’em, but their published code is incomplete. So I’m building the whole thing from scratch in six stages, climbing from toy poker games up to the real one:</p>
<blockquote class="blockquote">
<p>Kuhn (3 cards, 1 round) → Leduc (6 cards, 2 rounds) → Leduc-3 (3 players) → Limit Hold’em → <strong>No-Limit Hold’em (this stage)</strong> → 5-card PLO</p>
</blockquote>
<p>This stage is the no-limit version. Each stage validates the algorithm on a smaller game before scaling up — if Kuhn doesn’t reach exact equilibrium, no-limit definitely won’t. The result here is a 200-iteration training run on the full game (52-card deck, 100 big-blind stacks, all betting actions): 17.86 hours of compute, no NaN/Inf in any loss, and a checkpoint-averaged final policy ready for the next stage.</p>
</section>
<section id="context-technical" class="level2">
<h2 class="anchored" data-anchor-id="context-technical">Context (technical)</h2>
<p>Stage 4 of a six-stage neural-CFR research arc. The goal of this stage is a complete Deep CFR blueprint — game logic, encoder, networks, training loop, evaluation — that serves as the substrate for Stage 5 (depth-limited online search) and Stage 6 (the PLO5 port).</p>
</section>
<section id="algorithm-external-sampling-deep-cfr" class="level2">
<h2 class="anchored" data-anchor-id="algorithm-external-sampling-deep-cfr">Algorithm — external-sampling Deep CFR</h2>
<ul>
<li>Traverser recurses on <strong>all</strong> legal action slots at own nodes; opponent samples one action from current sigma; chance samples one outcome.</li>
<li>Regret target at traverser node: <code>q(I,a) − Σ σ(a|I) q(I,a)</code> over legal slots, exact via subtree recursion (sketched after this list).</li>
<li><strong>R-net retrained from scratch</strong> each iteration (Brown 2019 spec). The Phase-1 Leduc sanity check showed a 1.25× exploitability improvement over warm-start, so I left it.</li>
<li>S-net warm-started across iterations; <strong>checkpoint-averaged</strong> over the last 20 snapshots (iters 10, 20, …, 200) at eval time.</li>
<li>V-net auxiliary (predicts expected utility from viewpoint features); trained per spec but not consumed by regret loss after Phase 1’s ESCHER investigation showed V-bootstrap diverged at this scale.</li>
<li>Linear-t weighting on replay-buffer regression.</li>
</ul>
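<p>A minimal sketch of one traversal, spelling out the regret target from the second bullet — the state/buffer API here is a hypothetical stand-in for the project’s own game classes:</p>
<pre><code># Sketch only: external-sampling Deep CFR traversal with exact regret targets.
import numpy as np

def traverse(state, traverser, sigma, r_buffer, s_buffer, rng, t):
    if state.is_terminal():
        return state.utility(traverser)
    if state.is_chance():
        return traverse(state.sample_chance(rng), traverser,
                        sigma, r_buffer, s_buffer, rng, t)

    legal = state.legal_action_mask()
    probs = sigma(state)                        # current policy; zero on illegal slots
    if state.to_act() == traverser:
        # recurse on ALL legal slots: exact subtree values q(I, a)
        q = np.zeros(len(legal))
        for a in np.flatnonzero(legal):
            q[a] = traverse(state.apply_action(a), traverser,
                            sigma, r_buffer, s_buffer, rng, t)
        v = float(np.dot(probs, q))             # Σ σ(a|I) q(I, a)
        regrets = np.where(legal, q - v, 0.0)   # regret target per slot
        r_buffer.add(state.features(), regrets, weight=t)   # linear-t weighting
        return v
    # opponent node: record σ for the S-net, then sample one action
    s_buffer.add(state.features(), probs, weight=t)
    a = rng.choice(len(legal), p=probs)
    return traverse(state.apply_action(a), traverser,
                    sigma, r_buffer, s_buffer, rng, t)</code></pre>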
</section>
<section id="the-batched-sigma-scheduler-phase-2a.5" class="level2">
<h2 class="anchored" data-anchor-id="the-batched-sigma-scheduler-phase-2a.5">The batched sigma scheduler (Phase 2a.5)</h2>
<p>Profiling on the naive recursive traversal showed <strong>86% of per-iter time in single-sample GPU forwards</strong> through <code>SigmaCache._flush</code>. The fix:</p>
<ul>
<li>K concurrent generator-trajectories per iter</li>
<li>Each yields <code>(infoset_key, features, legal_mask)</code> when it needs σ</li>
<li>A scheduler collects pending yields per round, batches them into one GPU forward (~200 queries typical, 5,000+ in early iters), caches results, resumes</li>
<li>GPU forwards per iter drop from O(K × queries) to O(rounds)</li>
<li>Bit-equivalence verified at the buffer-statistic level vs.&nbsp;the unbatched reference</li>
</ul>
<p><strong>Result: 3.9× traversal speedup, 17.86h actual vs.&nbsp;25h projection.</strong> Determinism preserved via per-trajectory RNGs (<code>master + iter + player + traj_idx</code>).</p>
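<p>The batching idea in a minimal sketch — the generators and the batched forward are hypothetical stand-ins for <code>SigmaCache</code> and the real trajectory code:</p>
<pre><code># Sketch only: collect pending σ requests from K generator-trajectories,
# answer them with one batched forward, and resume each generator.
import numpy as np

def run_batched(trajectory_gens, sigma_net_batch):
    """Each generator yields (features, legal_mask) when it needs σ and is
    resumed with the σ vector via .send(). sigma_net_batch does one forward
    over the stacked batch."""
    pending, live = {}, {}
    for i, g in enumerate(trajectory_gens):
        try:
            pending[i] = next(g)                  # advance to the first σ request
            live[i] = g
        except StopIteration:
            pass
    while pending:
        idxs = list(pending)
        feats = np.stack([pending[i][0] for i in idxs])
        masks = np.stack([pending[i][1] for i in idxs])
        sigmas = sigma_net_batch(feats, masks)    # ONE forward for all requests
        pending = {}
        for i, s in zip(idxs, sigmas):
            try:
                pending[i] = live[i].send(s)      # resume until its next request
            except StopIteration:
                live.pop(i)</code></pre>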
</section>
<section id="hunl-game" class="level2">
<h2 class="anchored" data-anchor-id="hunl-game">HUNL game</h2>
<ul>
<li>52-card deck, 100 BB stacks (200 chips), <code>phevaluator</code> showdown</li>
<li>4 streets, 7 canonical action slots with per-state legal mask:
<ul>
<li>preflop SB first: {F, C, raise-to-4/5/6/7, AI}</li>
<li>preflop re-raise: {F, C, 4×/5× bet-faced, AI}</li>
<li>flop/river: {check/call, 0.33pot, 0.75pot, 1.5pot, AI}</li>
<li>turn: {check/call, 0.5pot, 1.0pot, AI}</li>
</ul></li>
<li>Card abstractions from Stage 3: 50 preflop, 1,000 flop, 200 turn, 200 river buckets — k-means on equity features</li>
</ul>
</section>
<section id="networks-hyperparameters" class="level2">
<h2 class="anchored" data-anchor-id="networks-hyperparameters">Networks &amp; hyperparameters</h2>
<ul>
<li>3 networks per player (V, R, S) × 2 players = 6 total</li>
<li>4 hidden × 512 units, LayerNorm + ReLU, Linear out, float32</li>
<li>~1.33M params each, ~8M total</li>
</ul>
<pre><code>T=200, K=10,000
n_v=500, n_r=800 (from-scratch), n_s=1000
batch 4096, Adam lr 1e-3
buffer caps: R 500k, S 500k, V 200k
snapshot S every 10 iters; V/R every 50 iters
seed 42</code></pre>
<p>The spec called for 5M / 5M / 2M buffer caps, but in the first attempt those caps saturated host RAM at iter 2 (Python hit 23 GB, triggered swap, and the training phase slowed 8×). Killed and restarted at 10× smaller caps; ran cleanly at ~12 GB Python RAM, no swap pressure.</p>
</section>
<section id="training-results" class="level2">
<h2 class="anchored" data-anchor-id="training-results">Training results</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/hunl-deepcfr/stage4_training_curves.png" class="img-fluid figure-img"></p>
<figcaption>200-iter HUNL training curves: V-loss decreases, R-loss increases, S-loss stable.</figcaption>
</figure>
</div>
<table class="caption-top table">
<thead>
<tr class="header">
<th>metric</th>
<th style="text-align: right;">value</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>wall time</td>
<td style="text-align: right;"><strong>17.86h</strong></td>
</tr>
<tr class="even">
<td>iterations completed</td>
<td style="text-align: right;">200 / 200</td>
</tr>
<tr class="odd">
<td>per-iter wall (mean / min / max)</td>
<td style="text-align: right;">321 / 175 / 989 s</td>
</tr>
<tr class="even">
<td>traversal mean / training mean</td>
<td style="text-align: right;">224 / 98 s</td>
</tr>
<tr class="odd">
<td>NaN/Inf in any loss</td>
<td style="text-align: right;"><strong>no</strong></td>
</tr>
</tbody>
</table>
<p><strong>Curve interpretation:</strong></p>
<ul>
<li><strong>V-loss</strong> monotonically decreased ~33% (19,700 → 13,300). V predicts terminal utility from state features — a regression task that converges cleanly as the buffer fills.</li>
<li><strong>R-loss increased</strong> 19,700 → 27,900 over training. Counter-intuitive but expected: R is <em>retrained from scratch</em> each iter to fit the <em>instantaneous</em> regret target <code>q − v</code>. As agents become more sophisticated, the regret targets become more diverse, and fitting them with a fresh 4×512 net on 500k samples gets harder. R-loss going up is consistent with the network correctly tracking a moving target — what would be alarming is R-loss going up <em>and</em> exploitability going up together. They don’t.</li>
<li><strong>S-loss</strong> stable, in spec.</li>
</ul>
</section>
<section id="what-it-demonstrates" class="level2">
<h2 class="anchored" data-anchor-id="what-it-demonstrates">What it demonstrates</h2>
<ul>
<li>Implementing Brown et al.&nbsp;(2019) Deep CFR end-to-end without published code</li>
<li>Profile-driven optimization: identifying the GPU-forward bottleneck and engineering a batched scheduler with bit-equivalence guarantees</li>
<li>Honest reading of training curves: knowing when increasing loss is <em>fine</em></li>
<li>Memory engineering: catching swap-thrashing, bisecting buffer caps to stable RAM</li>
<li>Determinism under concurrency</li>
</ul>
</section>
<section id="next-stages" class="level2">
<h2 class="anchored" data-anchor-id="next-stages">Next stages</h2>
<ul>
<li><strong>Stage 5</strong>: depth-limited online search at decision time (DeepStack-style continual re-solving)</li>
<li><strong>Stage 6</strong>: port the whole pipeline to <strong>5-card PLO</strong> with composition-dependent encoders — see the <a href="../plo5-deepcfr/">PLO5 project</a></li>
</ul>


</section>

 ]]></description>
  <category>Reinforcement Learning</category>
  <category>Game Theory</category>
  <category>Poker AI</category>
  <guid>https://ryan-tolone.com/projects/hunl-deepcfr/</guid>
  <pubDate>Wed, 08 Apr 2026 00:00:00 GMT</pubDate>
  <media:content url="https://ryan-tolone.com/projects/hunl-deepcfr/stage4_training_curves.png" medium="image" type="image/png" height="93" width="144"/>
</item>
<item>
  <title>Polymarket Research Toolkit</title>
  <link>https://ryan-tolone.com/projects/polymarket/</link>
  <description><![CDATA[ 




<section id="in-plain-english" class="level2">
<h2 class="anchored" data-anchor-id="in-plain-english">In plain English</h2>
<p><strong>Polymarket</strong> is a website where people bet real money on real-world questions: “Will Trump win the 2024 election?”, “Will Bitcoin be above $100k by year-end?”, “Will the Fed cut rates next meeting?” Each question has two sides — YES and NO — and the prices fluctuate between $0 and $1 based on what the market thinks the probability is.</p>
<p>If a market is mispriced — for example, NO is trading at $0.10 but the event has been almost certain for weeks — there’s potential profit in buying the cheap side. The question is: are these mispricings real, persistent, and tradeable after fees? Or do they look real in a backtest because the backtester is lying to you?</p>
<p>This project is a toolkit for answering that honestly. It does three things in order:</p>
<ol type="1">
<li><strong>Scrapes every public number</strong> Polymarket exposes — every market, every historical price tick, every order book snapshot. Plus Kalshi (a US-regulated competitor) for cross-venue comparison.</li>
<li><strong>Tests trading ideas</strong> against that historical record with a backtester deliberately designed to <em>fail</em> when no real edge exists.</li>
<li><strong>Scans live</strong> for the few signals that survive the test, so they can actually be traded.</li>
</ol>
<p>The interesting findings turned out to be negative — the most promising-looking strategy collapsed when tested honestly, for a specific data-quality reason explained below. That’s the project working as intended.</p>
</section>
<section id="anti-overfit-methodology" class="level2">
<h2 class="anchored" data-anchor-id="anti-overfit-methodology">Anti-overfit methodology</h2>
<p>Every result is structured to fail loudly when no real edge exists:</p>
<ul>
<li><strong>Walk-forward only.</strong> Strategies see prefixes of price series, never the future.</li>
<li><strong>Discovery / test split</strong> at the universe level — the calibration strategy is fit on the first half of resolved markets and scored on the second.</li>
<li><strong>Deflated Sharpe.</strong> When you test N strategies, the best-of-N is inflated by selection. Deflate by N before claiming anything (Bailey &amp; López de Prado).</li>
<li><strong>Conservative cost model.</strong> 1% taker fee + 0.5% half-spread per leg.</li>
<li><strong>Trade-count floor.</strong> Anything with fewer than 100 holdout trades is reported as “no signal yet,” not as a result.</li>
</ul>
</section>
<section id="strategy-suite" class="level2">
<h2 class="anchored" data-anchor-id="strategy-suite">Strategy suite</h2>
<table class="caption-top table">
<colgroup>
<col style="width: 45%">
<col style="width: 54%">
</colgroup>
<thead>
<tr class="header">
<th>Strategy</th>
<th>Hypothesis</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>extreme_price_decay</code></td>
<td>Buy NO when YES collapses near close — fade late confidence</td>
</tr>
<tr class="even">
<td><code>favorite_hold</code></td>
<td>Buy YES when YES is persistently ≥ 0.95 near close</td>
</tr>
<tr class="odd">
<td><code>longshot_bias</code></td>
<td>Short the longshot — buy NO at 0.85–0.95</td>
</tr>
<tr class="even">
<td><code>complementary_arb</code></td>
<td>YES + NO &lt; $1 — needs the live book</td>
</tr>
<tr class="odd">
<td><code>mean_reversion</code></td>
<td>Fade single-bar 10c spikes mid-life</td>
</tr>
<tr class="even">
<td><code>calibration_edge</code></td>
<td>Data-driven, fit on first half of universe only</td>
</tr>
</tbody>
</table>
</section>
<section id="honest-empirical-findings" class="level2">
<h2 class="anchored" data-anchor-id="honest-empirical-findings">Honest empirical findings</h2>
<ul>
<li><strong><code>complementary_arb</code> looked great in train, collapsed in test.</strong> Investigation: the training “edge” was a forward-fill artifact. Bar-resolution price history shows YES + NO summing to anything between 0.5 and 1.7 because each leg’s prints don’t share timestamps. After bucketing to the hour and inner-joining, real imbalances beyond ~2c essentially never appear in bar data. <strong>The arb strategy can only work against the live book.</strong> Found <em>because</em> the test split was frozen.</li>
<li><strong>Calibration analysis</strong> at the 24h horizon shows the 0–10% YES band actually resolves YES ~11% of the time (vs.&nbsp;2.4% priced) — enough sample to be suggestive, not enough to bet on. Watch this band as more data accumulates. A sketch of the reliability computation follows this list.</li>
<li><strong>Bar-data limitations.</strong> Hourly bars are too coarse for any real microstructure work; live websocket feeds are needed for liquidity / spread strategies.</li>
</ul>
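<p>A minimal sketch of the reliability computation behind the calibration finding — column names are assumptions, not the toolkit’s schema:</p>
<pre><code># Sketch only: bin resolved markets by priced YES probability at a fixed
# horizon and compare priced vs. realized frequency per band.
import numpy as np
import pandas as pd

def reliability_table(markets: pd.DataFrame, n_bins=10):
    """markets needs 'yes_price_24h_before' in [0, 1] and 'resolved_yes' (0/1)."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    band = pd.cut(markets["yes_price_24h_before"], bins, include_lowest=True)
    table = markets.groupby(band).agg(
        n=("resolved_yes", "size"),
        priced=("yes_price_24h_before", "mean"),
        realized=("resolved_yes", "mean"))
    table["gap"] = table["realized"] - table["priced"]
    return table</code></pre>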
</section>
<section id="stack" class="level2">
<h2 class="anchored" data-anchor-id="stack">Stack</h2>
<ul>
<li><code>requests</code> + retry/rate-limit aware HTTP client; SQLite for markets / prices / books</li>
<li>Walk-forward engine with deflated Sharpe; reliability tables and Brier / log loss for calibration</li>
<li>Live-scan loop for complementary-pair edges</li>
<li>Six sprint reports + a research memo documenting the dead ends as carefully as the live ones</li>
</ul>
</section>
<section id="what-it-demonstrates" class="level2">
<h2 class="anchored" data-anchor-id="what-it-demonstrates">What it demonstrates</h2>
<ul>
<li>Treating a backtest as a hypothesis test, not a marketing screenshot</li>
<li>The discipline of letting your own strategies fail</li>
<li>Microstructure thinking: knowing the difference between bar data and the book</li>
</ul>


</section>

 ]]></description>
  <category>Trading Research</category>
  <category>Prediction Markets</category>
  <category>Backtesting</category>
  <guid>https://ryan-tolone.com/projects/polymarket/</guid>
  <pubDate>Mon, 30 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>No-Bust 21st Century Blackjack — Monte Carlo + CDZ⁻ Solver</title>
  <link>https://ryan-tolone.com/projects/blackjack-cdz/</link>
  <description><![CDATA[ 




<section id="in-plain-english" class="level2">
<h2 class="anchored" data-anchor-id="in-plain-english">In plain English</h2>
<p>California has weird gambling laws. To get around the prohibition on banked house games, casinos invented variants of blackjack with twisted rules — the most famous of which is <em>No Bust 21st Century Blackjack</em>. It’s blackjack, but several rules are different in ways that look small and turn out to matter a lot.</p>
<p>The biggest change: <strong>busting (going over 21) doesn’t always lose.</strong> If both you and the dealer bust, whoever is closer to 21 comes out ahead — the dealer still wins when the dealer is closer, but when <em>you’re</em> closer you <em>push</em> (get your bet back) instead of losing. That single rule shift means hitting on a hand that would normally be a clear stand can suddenly be correct, because busting carries an option value that doesn’t exist in standard blackjack. Other tweaks — surrender legal at any decision point, special rules after splitting aces, an unusual dealer-bust side bet — pile on top.</p>
<p>If you walk into a California card room and play with the basic strategy you learned from a Vegas chart, you’re playing the wrong game. The chart is wrong. The right strategy depends not just on your hand and the dealer’s up-card but on the <em>exact composition of cards left in the shoe</em> (the technical term is <strong>CDZ⁻</strong>, “composition-dependent zero-memory”), and no published blackjack table covers this rule set.</p>
<p>I built two things: a <strong>Tkinter GUI Monte Carlo simulator</strong> that plays out millions of hands with multi-process workers, and a <strong>CDZ⁻ exact solver</strong> that derives the EV-optimal action for any (hand, dealer up-card, deck composition) combination by full subtree expansion. Together they let you see whether a given configuration of bet, deck count, penetration, and side-bet inclusion is actually +EV, in this game, and what the optimal play looks like at every decision point.</p>
</section>
<section id="the-game-rules" class="level2">
<h2 class="anchored" data-anchor-id="the-game-rules">The game (rules)</h2>
<ul>
<li><strong>No-bust comparison rule</strong> (the namesake): when <em>both</em> player and dealer bust, dealer-closer-to-21 wins, <strong>player-closer-to-21 pushes</strong> (player saves the bet), tied → dealer wins. So busting isn’t terminal in the usual sense (a sketch of this settle rule follows the list).</li>
<li><strong>Surrender legal at any decision point</strong> — initial 2-card, mid-hand after any number of hits, on split sub-hands, after split-and-hit. Costs half the bet. Not legal after doubling, not on a post-split-aces sub-hand.</li>
<li><strong>Split aces special rule</strong>: each post-split-aces sub-hand receives <strong>exactly one</strong> draw card, then stands. If that draw card is also an ace and <code>max_splits</code> not reached and the chart action is “split,” the sub-hand is re-split.</li>
<li><strong>Configurable <code>max_splits</code></strong> (default 3 → up to 4 sub-hands; set to 1 for double-deck).</li>
<li><strong>Double-after-split (DAS)</strong> hardcoded on for non-ace splits.</li>
<li><strong>Buster side bet</strong>: pays on dealer-bust by card-count (3–4 cards 2:1, 5 cards 4:1, 6 cards 16:1, 7 cards 50:1, 8+ cards 200:1).</li>
</ul>
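<p>The both-bust settle rule from the first bullet, as a minimal sketch — standard outcomes outside the both-bust case are elided, and the function name is hypothetical:</p>
<pre><code># Sketch only: the no-bust comparison when BOTH player and dealer bust.
def settle_both_bust(player_total, dealer_total, bet):
    """Both totals are over 21; the smaller total is closer to 21."""
    assert player_total &gt; 21 and dealer_total &gt; 21
    if player_total &lt; dealer_total:
        return 0.0        # player is closer to 21: push, bet returned
    return -bet           # dealer closer to 21, or tied: dealer wins</code></pre>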
<p>These rule shifts make stock blackjack basic strategy <em>wrong</em>, sometimes by several EV percentage points per hand. The simulator’s reason to exist is solving the right strategy for <em>this</em> game.</p>
</section>
<section id="what-it-does" class="level2">
<h2 class="anchored" data-anchor-id="what-it-does">What it does</h2>
<ul>
<li><strong>CDZ⁻ exact solver</strong>: composition-dependent strategy solving — for each (player hand composition, dealer up-card, deck composition), compute the EV-optimal action (hit / stand / double / split / surrender) by full subtree expansion. CDZ⁻ means the strategy accounts for what’s left in the shoe but makes each decision with zero memory — only the current hand and up-card, with no peeking at future cards.</li>
<li><strong>Numba-JIT’d hand play</strong> for the simulation loop. Hand-by-hand replay through the solved chart at simulator throughput, not interpreter throughput.</li>
<li><strong>Multi-process Monte Carlo</strong> with a configurable number of workers — each plays a fresh shoe, results aggregated for variance estimation.</li>
<li><strong>Tkinter GUI</strong> for live experimentation: configure rules, deck count, bet sizing, splits, DAS, surrender, buster bet — see EV per hand, hourly EV at a chosen pace, ROR for given bankrolls.</li>
</ul>
</section>
<section id="implementation-notes" class="level2">
<h2 class="anchored" data-anchor-id="implementation-notes">Implementation notes</h2>
<ul>
<li><strong>8-deck shoe default</strong>, configurable down to 2-deck (forces <code>max_splits = 1</code> to match house rules).</li>
<li><strong>Penetration handling</strong>: shoe reshuffled at configurable penetration depth; the solver re-derives the chart at the post-penetration composition.</li>
<li><strong>Surrender logic</strong>: separate code path because surrender’s legality interacts with double, split-aces, and the no-bust rule in non-obvious ways. Edge cases verified against published CDZ tables for non-California variants then extended.</li>
<li><strong>Buster bet EV</strong>: computed analytically per dealer up-card from the conditional bust-card-count distribution. Exposed in the GUI alongside the main-bet EV so a player can see whether the side bet is +EV or −EV in their chosen composition.</li>
<li><strong>One-click launchers</strong> (<code>run_sim.bat</code> / <code>run_sim.sh</code>) that auto-install dependencies on first run — the simulator ships to non-developer testers as a working binary, not a setup project.</li>
</ul>
</section>
<section id="why-this-is-worth-doing" class="level2">
<h2 class="anchored" data-anchor-id="why-this-is-worth-doing">Why this is worth doing</h2>
<p>The standard published blackjack tables are wrong for this game. The no-bust rule alone changes the optimal stand-vs-hit threshold for stiff hands against high dealer cards, because busting carries a saved-bet option value. Surrender-at-any-decision-point creates a continuation-value calculation that doesn’t exist in standard rule sets. And the buster bet is a side-game with composition-sensitive EV that the casino doesn’t post.</p>
<p>Solving this isn’t a paper exercise — it’s the difference between playing the game at +EV (with proper composition-dependent strategy and selective buster betting in penetrated shoes) vs.&nbsp;the −EV outcome of stock-rule basic strategy.</p>
</section>
<section id="stack" class="level2">
<h2 class="anchored" data-anchor-id="stack">Stack</h2>
<ul>
<li>Python 3.10+ — <code>numpy</code>, <code>numba</code>, <code>matplotlib</code></li>
<li>Tkinter for the live GUI</li>
<li>Multi-process worker pool for Monte Carlo</li>
<li>Full memo (<code>nobust21_sim.md</code>) covering rules, GUI, every variable, and the variable/function index — written so a non-developer card-room player can use it</li>
</ul>
</section>
<section id="what-it-demonstrates" class="level2">
<h2 class="anchored" data-anchor-id="what-it-demonstrates">What it demonstrates</h2>
<ul>
<li>Composition-dependent solving from scratch (no off-the-shelf for this rule set)</li>
<li>Multi-process Monte Carlo with seedable RNG per worker</li>
<li>Numba JIT compilation of the inner play loop with measured speedup over pure Python</li>
<li>A shippable end-user tool (one-click launcher, GUI) — not just a research notebook</li>
</ul>


</section>

 ]]></description>
  <category>Simulation</category>
  <category>Game Theory</category>
  <guid>https://ryan-tolone.com/projects/blackjack-cdz/</guid>
  <pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>LSTM-Driven Poker Analytics &amp; Bluff Prediction Platform</title>
  <link>https://ryan-tolone.com/projects/poker-LSTM/</link>
  <description><![CDATA[ 




<section id="in-plain-english" class="level2">
<h2 class="anchored" data-anchor-id="in-plain-english">In plain English</h2>
<p>When someone makes a big bet in poker, they’re either <strong>bluffing</strong> (their hand is weak and they want you to fold) or <strong>value-betting</strong> (their hand is strong and they want you to call). Telling the difference is the entire game. Skilled players use timing, bet sizing, board texture, and their opponent’s history of plays to make educated guesses.</p>
<p>This project asks: can a neural network learn to tell the difference, given the same information a human player has? I scraped over 7,000 hands from real-money games hosted on PokerNow.club (a popular site for hosting private online games, with blinds from $0.25/$0.50 to $2/$5), engineered features that capture how each hand played out — bet sizes relative to the pot, decision times (a human takes longer when the decision is close), board texture (paired? flush-draw? Ace on board?), positional context — and trained an <strong>LSTM</strong> (a type of recurrent neural network designed for variable-length sequences) to predict, at the moment a player makes a big bet, whether it’s a bluff or a value bet.</p>
<p>The final test AUC is <strong>0.77</strong>, meaning that, given one random bluff and one random value bet from hands it has never seen, the model ranks the bluff as the more likely bluff about 77% of the time. The interesting part isn’t just the number — it’s <em>which features the model relies on</em>, which gives a quantitative picture of what tells human players are actually leaking at low-to-mid stakes.</p>
</section>
<section id="technical-introduction" class="level2">
<h2 class="anchored" data-anchor-id="technical-introduction">Technical introduction</h2>
<p>The system processes over 7,000 hands (with blinds from $0.25/$0.50 up to $2/$5) to engineer advanced features—such as bet ratios, log-transformed decision times, comprehensive board evaluations with Ace detection, and dynamic positional metrics. A custom LSTM model, utilizing dynamic bucketing to manage variable-length sequences, was developed to predict whether the villain’s betting action is a bluff or a value bet, achieving a test AUC of 0.77.</p>
</section>
<section id="output" class="level2">
<h2 class="anchored" data-anchor-id="output">Output</h2>
<p>Below is a screenshot from the model evaluation dashboard displaying the confusion matrix, ROC curve, and feature importance chart:</p>
<p><img src="https://ryan-tolone.com/projects/poker-LSTM/confusion_matrix.png" class="img-fluid" alt="confusion matrix"> <img src="https://ryan-tolone.com/projects/poker-LSTM/feature_importance.png" class="img-fluid" alt="feature importance chart"> <img src="https://ryan-tolone.com/projects/poker-LSTM/roc_curve.png" class="img-fluid" alt="roc curve"></p>
</section>
<section id="models-techniques-used" class="level2">
<h2 class="anchored" data-anchor-id="models-techniques-used">Models &amp; Techniques Used</h2>
<ul>
<li><strong>LSTM Network with Dynamic Bucketing</strong>: Processes variable-length sequences of poker actions.</li>
<li><strong>Bidirectional LSTM Layers</strong>: Capture context from both past and future actions in the sequence (a minimal model sketch follows this list).</li>
<li><strong>Advanced Feature Engineering</strong>: Incorporates bet ratios, decision times (log-transformed), board evaluations (with Ace detection), and positional metrics.</li>
<li><strong>Cross-Validation &amp; Class Balancing</strong>: Ensures robust model performance despite class imbalance (52% bluffs).</li>
</ul>
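<p>A minimal sketch of what such a model can look like in TensorFlow/Keras, assuming padded sequences of per-action feature vectors. The layer sizes, dropout rate, and <code>N_FEATURES</code> are illustrative stand-ins, not the project’s actual configuration.</p>
<pre><code>import tensorflow as tf
from tensorflow.keras import layers

N_FEATURES = 16   # per-action features: bet/pot ratio, log decision time, board flags, ...

model = tf.keras.Sequential([
    layers.Input(shape=(None, N_FEATURES)),   # variable-length action sequences
    layers.Masking(mask_value=0.0),           # ignore padded timesteps
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(32)),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(32, activation="relu",
                 kernel_regularizer=tf.keras.regularizers.l1_l2(1e-5, 1e-4)),
    layers.Dense(1, activation="sigmoid"),    # P(bluff) at the big-bet decision point
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])</code></pre>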
</section>
<section id="training" class="level2">
<h2 class="anchored" data-anchor-id="training">Training</h2>
<ul>
<li><strong>Data Preprocessing</strong>: Raw hand histories are cleansed, features are engineered, and sequences are built per hand. Numerical features are standardized and categorical features are one-hot encoded.</li>
<li><strong>LSTM Model Training</strong>: The model is trained using a combination of Bidirectional LSTMs, dropout, batch normalization, and L1/L2 regularization. Training is optimized via early stopping and learning rate reduction with cross-validation.</li>
<li><strong>Dynamic Bucketing</strong>: Instead of padding all sequences to a global maximum, hands are bucketed by similar sequence lengths to reduce wasted computation and improve training efficiency (sketched below).</li>
</ul>
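<p>A sketch of the bucketing idea under the same assumptions: group hands into a few length buckets and pad each bucket only to its own maximum. The bucket boundaries here are arbitrary; in practice they come from the sequence-length histogram.</p>
<pre><code>import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def bucket_by_length(sequences, labels, boundaries=(4, 8, 16)):
    """Group (sequence, label) pairs by length, then pad each bucket to its own max."""
    buckets = {b: ([], []) for b in (*boundaries, float("inf"))}
    for seq, y in zip(sequences, labels):
        for b in buckets:                     # smallest boundary that fits this hand
            if len(seq) &lt;= b:
                buckets[b][0].append(seq)
                buckets[b][1].append(y)
                break
    batches = []
    for seqs, ys in buckets.values():
        if seqs:
            maxlen = max(len(s) for s in seqs)   # pad only to the bucket's own max
            X = pad_sequences(seqs, maxlen=maxlen, dtype="float32", padding="post")
            batches.append((X, np.asarray(ys)))
    return batches</code></pre>
<p>Each padded bucket can then be passed to <code>model.fit</code> in turn (or wrapped in a generator), so short hands never pay for the longest hand’s padding.</p>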
</section>
<section id="requirements" class="level2">
<h2 class="anchored" data-anchor-id="requirements">Requirements</h2>
<ul>
<li>Python 3.8+</li>
<li>TensorFlow 2.x</li>
<li>Pandas, NumPy, Scikit-Learn</li>
<li>Matplotlib, Seaborn (for visualization)</li>
</ul>


</section>

 ]]></description>
  <category>LSTM</category>
  <category>Deep Learning</category>
  <category>Time Series</category>
  <category>Poker</category>
  <guid>https://ryan-tolone.com/projects/poker-LSTM/</guid>
  <pubDate>Sat, 08 Mar 2025 00:00:00 GMT</pubDate>
  <media:content url="https://ryan-tolone.com/projects/poker-LSTM/pokerbluff.png" medium="image" type="image/png" height="79" width="144"/>
</item>
<item>
  <title>Pickleball Vision: CV-Driven Match Analytics</title>
  <link>https://ryan-tolone.com/projects/pickleball/</link>
  <description><![CDATA[ 




<section id="in-plain-english" class="level2">
<h2 class="anchored" data-anchor-id="in-plain-english">In plain English</h2>
<p>Tennis broadcasts have shot tracking. Major League Baseball has Statcast. <strong>Pickleball</strong>, the fastest-growing sport in the US, has nothing — match footage is just video, with no automated stats overlaid.</p>
<p>This project takes a fixed-camera video of a pickleball match and turns it into an <strong>annotated broadcast</strong> with player tracking, ball tracking, court geometry, a top-down minimap, ball speed in mph, per-player movement distance in feet, and shot count. All of it is computed automatically from raw video — no sensors, no manually placed cameras, no Hawk-Eye-style installation. Just whatever phone or DSLR is filming the match.</p>
<p>The pieces are well-known computer vision tools assembled carefully: <strong>YOLOv8</strong> detects players and the ball; a fine-tuned <strong>ResNet50</strong> finds the court’s lines and corners; the corners give a <strong>homography</strong> (the math that converts between “pixels in the video” and “feet on the actual court”). Once you have that homography, every other measurement — speed, distance, minimap position — is just geometry.</p>
<p>The hard part isn’t the detection. It’s making the <em>court geometry</em> trustworthy under realistic camera angles, occlusion from players, and varying lighting. Without that, the speeds are made up. So most of the engineering is in the iterative homography refinement — the part that makes every “23 mph” number on the scoreboard <em>true</em>.</p>
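<p>A minimal sketch of that pixel-to-feet step with OpenCV, assuming the four court corners have already been located. The corner pixel coordinates and the frame rate below are placeholders; the real pipeline gets the corners from the detection and refinement stages described later.</p>
<pre><code>import cv2
import numpy as np

COURT_W_FT, COURT_L_FT = 20.0, 44.0           # regulation pickleball court, in feet

# placeholder corner pixels (top-left, top-right, bottom-right, bottom-left in the frame)
corners_px = np.float32([[412, 180], [1508, 176], [1716, 1040], [204, 1044]])
corners_ft = np.float32([[0, 0], [COURT_W_FT, 0],
                         [COURT_W_FT, COURT_L_FT], [0, COURT_L_FT]])

H, _ = cv2.findHomography(corners_px, corners_ft)

def px_to_ft(points_px):
    """Project pixel coordinates onto the top-down court plane (feet)."""
    pts = np.float32(points_px).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

def speed_mph(p0_px, p1_px, fps=30.0):
    """Ball speed from two consecutive frames: feet per frame -&gt; miles per hour."""
    d_ft = np.linalg.norm(px_to_ft([p1_px])[0] - px_to_ft([p0_px])[0])
    return d_ft * fps * 3600.0 / 5280.0</code></pre>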
</section>
<section id="whats-in-the-output" class="level2">
<h2 class="anchored" data-anchor-id="whats-in-the-output">What’s in the output</h2>
<ul>
<li><strong>Player boxes</strong> — only the players actually on court, filtered from raw YOLO <code>person</code> detections (spectators dropped via court-geometry containment + minimum-track-length thresholding)</li>
<li><strong>Ball box + trail</strong> — smoothed and gap-interpolated trajectory with a fading trail</li>
<li><strong>Court keypoints</strong> — 12-point grid (4 horizontal lines × 3 columns) regressed by a fine-tuned ResNet50</li>
<li><strong>Minimap (top-right)</strong> — top-down 20×44 ft court showing each player’s foot position and the ball location, projected via homography</li>
<li><strong>Scoreboard (top-left)</strong> — running shot count, current and max ball speed (mph), per-player speed and total distance (ft)</li>
<li><strong>Shot markers</strong> — flash on screen when the ball is struck (velocity reversal near a player; the heuristic is sketched after this list)</li>
</ul>
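<p>The shot-marker heuristic can be sketched as follows, assuming ball and player positions have already been projected into court feet. The distance and velocity thresholds are illustrative, not the tuned values.</p>
<pre><code>import numpy as np

def detect_shots(ball_ft, players_ft, near_ft=6.0, min_flip=2.0):
    """ball_ft: (T, 2) ball positions in feet; players_ft: (T, P, 2) player positions."""
    shots = []
    v = np.diff(ball_ft, axis=0)                       # per-frame velocity in feet
    for t in range(1, len(v)):
        flipped = v[t - 1, 1] * v[t, 1] &lt; 0            # direction reversal along the court length
        strong = abs(v[t, 1] - v[t - 1, 1]) &gt;= min_flip
        near = np.min(np.linalg.norm(players_ft[t] - ball_ft[t], axis=1)) &lt;= near_ft
        if flipped and strong and near:
            shots.append(t)                            # frame index of the strike
    return shots</code></pre>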
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/pickleball/pickleball.png" class="img-fluid figure-img"></p>
<figcaption>Pickleball CV output frame — player boxes, ball trail, court keypoints, minimap, scoreboard.</figcaption>
</figure>
</div>
</section>
<section id="court-keypoint-accuracy-is-the-hard-problem" class="level2">
<h2 class="anchored" data-anchor-id="court-keypoint-accuracy-is-the-hard-problem">Court keypoint accuracy is the hard problem</h2>
<p>A trained ResNet keypoint regressor can be 30–100 px off on unfamiliar camera angles. Player tracking is easy; <em>getting the homography right</em> is what makes every downstream metric (speed, distance, minimap projection) trustworthy. The pipeline applies multiple refinement strategies in priority order:</p>
<ol type="1">
<li><strong>Manual override</strong> — if <code>input_videos/keypoints.json</code> exists, use it directly. Most accurate option for fixed-camera shots.</li>
<li><strong>4-boundary detection</strong> — locate the 2 baselines + 2 sidelines (the strongest court features) by clustering Hough segments and picking the <em>extreme</em> y/x clusters. Inner lines (NVZ, net) are deliberately ignored to avoid the common “snap to net” failure mode. The 4 corners → homography → all 12 canonical keypoints.</li>
<li><strong>Model-prior snapping</strong> — fall back to per-row line snapping driven by the ResNet’s prediction.</li>
<li><strong>Iterative pixel-level optimization</strong> — runs after either (2) or (3): back-project every white-line pixel into court-feet, assign each to its nearest grid line, refit, recompute the homography. Typically converges to ~0.2 ft mean residual. One refinement pass is sketched after this list.</li>
</ol>
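<p>One pass of that iterative refit, in simplified form. <code>GRID_X_FT</code>, <code>GRID_Y_FT</code>, the snapping rule, and the RANSAC threshold are illustrative; the real pipeline works against its 12-point canonical grid and iterates until the residual stops improving.</p>
<pre><code>import cv2
import numpy as np

GRID_X_FT = [0.0, 10.0, 20.0]           # vertical court lines (values illustrative)
GRID_Y_FT = [0.0, 15.0, 29.0, 44.0]     # horizontal court lines (values illustrative)

def refine_once(H, line_px):
    """line_px: (N, 2) white-line pixel coordinates. Returns an updated pixel-&gt;feet homography."""
    pts_ft = cv2.perspectiveTransform(
        np.float32(line_px).reshape(-1, 1, 2), H).reshape(-1, 2)
    snapped = pts_ft.copy()
    for i, (x, y) in enumerate(pts_ft):
        dx = min(abs(x - gx) for gx in GRID_X_FT)
        dy = min(abs(y - gy) for gy in GRID_Y_FT)
        if dx &lt;= dy:                    # closer to a vertical line: snap its x coordinate
            snapped[i, 0] = min(GRID_X_FT, key=lambda gx: abs(x - gx))
        else:                           # closer to a horizontal line: snap its y coordinate
            snapped[i, 1] = min(GRID_Y_FT, key=lambda gy: abs(y - gy))
    H_new, _ = cv2.findHomography(np.float32(line_px), snapped, cv2.RANSAC, 0.5)
    return H_new</code></pre>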
</section>
<section id="auto-labelled-training-data" class="level2">
<h2 class="anchored" data-anchor-id="auto-labelled-training-data">Auto-labelled training data</h2>
<p>To grow the keypoint training set without hand-labelling:</p>
<pre><code>python tools/generate_training_data.py --stride 6 --max-frames 60</code></pre>
<p>Extracts every Nth frame, runs the 4-boundary detector + iterative refinement, and saves frame JPGs + LabelMe-format JSONs that match the existing dataset format so they can be merged and used to retrain the ResNet for better generalization. Frames where boundary detection fails (heavy player occlusion) are <em>skipped</em>, not given a bad label — generating bad labels would silently degrade the next training round.</p>
</section>
<section id="architecture" class="level2">
<h2 class="anchored" data-anchor-id="architecture">Architecture</h2>
<pre><code>trackers/
  player_tracker.py    YOLO + court-aware filtering
  ball_tracker.py      anchor-based linker (high-conf seeds, forward/backward
                       propagation with per-frame max-step constraint)
  motion_ball.py       frame-diff fallback for frames YOLO misses
court_line_detector/
  court_line_detector.py  ResNet50 -&gt; 12 keypoints (24 floats)
  refine.py               Hough-line snapping + 4-corner homography refit
mini_court/
  court_geometry.py    canonical keypoint -&gt; feet, homography
  mini_court.py        top-down renderer
analytics/
  shot_detector.py     velocity-reversal-near-player heuristic
  speed.py             per-player and ball speeds in mph (via homography)</code></pre>
</section>
<section id="improvements-over-the-original" class="level2">
<h2 class="anchored" data-anchor-id="improvements-over-the-original">Improvements over the original</h2>
<ul>
<li><strong>Higher inference resolution</strong>: <code>imgsz=1280</code> (up from YOLO default 640) gives ~100% ball detection vs ~60%. First run is slow because YOLO runs every frame; detections cached to <code>tracker_stubs/</code>. Subsequent runs skip inference unless <code>--no-cache</code>.</li>
<li><strong>Anchor-based ball linker</strong>: replaced naive nearest-neighbor frame-to-frame association with high-confidence detection seeding + forward/backward propagation under a per-frame max-step constraint (the ball cannot teleport). Halves the ID-switch rate on noisy chunks.</li>
<li><strong>Court-aware player filter</strong>: drops <code>person</code> detections that fall outside the court polygon and tracks shorter than min-frames threshold. Spectators no longer pollute the metrics in stadium footage.</li>
<li><strong>Iterative homography refit</strong>: the white-line pixel back-projection loop. Previously fixed at the model’s first-pass prediction; now self-corrects to ~0.2 ft mean residual.</li>
<li><strong>Pure-Python test suite</strong>: 20 unit tests covering geometry, smoothing, shot detection, speed math — run without torch / YOLO weights, finish in well under a second. Catches regressions in the analytics pieces without paying for full inference.</li>
</ul>
</section>
<section id="stack" class="level2">
<h2 class="anchored" data-anchor-id="stack">Stack</h2>
<ul>
<li>Python 3.10+, <code>ultralytics</code> (YOLOv8), <code>pytorch</code>, <code>opencv-python</code>, <code>numpy</code>, <code>pandas</code></li>
<li>Trained ball detector (<code>models/yolo5_last.pt</code>) on hand-labelled pickleball footage</li>
<li>Trained court keypoint model (<code>models/keypoints_model.pth</code>) — ResNet50 backbone + regression head</li>
</ul>
</section>
<section id="what-it-demonstrates" class="level2">
<h2 class="anchored" data-anchor-id="what-it-demonstrates">What it demonstrates</h2>
<ul>
<li>Multi-stage CV pipeline where each stage is testable in isolation</li>
<li>Homography-driven measurement: every ft / mph number is geometry, not guesswork</li>
<li>Auto-labelling loop that knows when to refuse to label</li>
<li>Caching strategy that makes iterative work tractable on a single machine</li>
</ul>


</section>

 ]]></description>
  <category>Computer Vision</category>
  <guid>https://ryan-tolone.com/projects/pickleball/</guid>
  <pubDate>Fri, 07 Feb 2025 00:00:00 GMT</pubDate>
  <media:content url="https://ryan-tolone.com/projects/pickleball/pickleball.png" medium="image" type="image/png" height="81" width="144"/>
</item>
<item>
  <title>ORB Algorithmic Day-Trading System</title>
  <link>https://ryan-tolone.com/projects/orb-trading/</link>
  <description><![CDATA[ 




<section id="in-plain-english" class="level2">
<h2 class="anchored" data-anchor-id="in-plain-english">In plain English</h2>
<p>Day-traders have a strategy called the <strong>Opening Range Breakout</strong> (ORB). The idea: in the first 15–60 minutes of the trading day, the stock makes a high and a low. If the price later breaks <em>above</em> that opening high, you go long (bet it keeps rising). If it breaks <em>below</em> the opening low, you go short. The bet is that early-session momentum continues.</p>
<p>ORB is the kind of strategy you see all over finance YouTube, usually with a screenshot of one good month. The honest version is much less impressive: tested across years of TQQQ (a 3× leveraged Nasdaq ETF), pure ORB works on some days, gets stopped out on most, and slowly bleeds equity in months where the market just chops sideways without trending.</p>
<p>The interesting question isn’t “does ORB work” — it’s “<strong>on which days does ORB work</strong>?” Some days have the kind of one-directional momentum ORB needs; other days have nothing of the sort. If we could predict the difference at 9:45 AM (right after the opening range completes), we’d only take the trade when conditions favor it.</p>
<p>That’s a machine-learning problem. Given the morning’s features — pre-market range, gap from yesterday’s close, VIX, sector strength, day-of-week, etc. — predict whether <em>this</em> day belongs to the trade-this regime or the skip-it regime. I trained an XGBoost classifier on this, used it as a gate on the underlying ORB strategy, and got a <strong>+19.1% annualized return improvement</strong> vs.&nbsp;running ORB unfiltered. The gated version preserves the upside of trend days and sits flat through the chop.</p>
</section>
<section id="system" class="level2">
<h2 class="anchored" data-anchor-id="system">System</h2>
</section>
<section id="system-1" class="level2">
<h2 class="anchored" data-anchor-id="system-1">System</h2>
<ul>
<li><strong>Data pipeline (SQL)</strong> — minute-bar TQQQ feature engineering: pre-market range, prior-day close-to-open gap, ATR-normalized opening range, sector-relative strength, VIX regime, day-of-week, position in week, time-since-last-stop-out</li>
<li><strong>ORB simulator</strong> — runs the strategy across multiple interval sizes (5/15/30/60-min ORB) plus a no-trade scenario for benchmark; produces per-day P&amp;L tagged with the feature snapshot at decision time</li>
<li><strong>XGBoost gating model</strong> — predicts whether <em>today’s</em> feature snapshot belongs to a profitable-ORB regime, with hyperparameter tuning and time-series cross-validation (a walk-forward training sketch follows this list)</li>
<li><strong>Composite strategy</strong> — only takes the ORB signal when the gating model fires positive; falls through to flat otherwise</li>
</ul>
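<p>A sketch of the gate under assumed column names and hyperparameters (<code>orb_days.csv</code>, the feature list, and the 0.55 threshold are all placeholders): fit XGBoost on day-level features, validate with a walk-forward split so no fold trains on the future, and only take the breakout when the gate fires.</p>
<pre><code>import pandas as pd
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

df = pd.read_csv("orb_days.csv", parse_dates=["date"]).sort_values("date")
features = ["premkt_range", "gap_vs_close", "atr_norm_or", "vix", "sector_rel", "dow"]
X, y = df[features], (df["orb_pnl"] &gt; 0).astype(int)   # 1 = ORB was profitable that day

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05, subsample=0.8)

for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model.fit(X.iloc[train_idx], y.iloc[train_idx])    # each fold trains strictly on the past
    p = model.predict_proba(X.iloc[test_idx])[:, 1]
    print("fold AUC:", round(roc_auc_score(y.iloc[test_idx], p), 3))

# composite rule: take the ORB breakout only on days the gate fires
take_trade = model.predict_proba(X)[:, 1] &gt; 0.55       # threshold tuned on validation data</code></pre>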
</section>
<section id="result" class="level2">
<h2 class="anchored" data-anchor-id="result">Result</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/orb-trading/orb_graph_plsupdate.png" class="img-fluid figure-img"></p>
<figcaption>Equity curve: gated ORB vs.&nbsp;buy-and-hold TQQQ.</figcaption>
</figure>
</div>
<p><strong>+19.1% annualized return improvement</strong> over unfiltered ORB on the test split. The gated version clears buy-and-hold on a Sharpe basis, and crucially preserves performance in regimes where naked ORB blows up (Q4 2022 trendless / mean-reverting environment).</p>
</section>
<section id="improvements-post-original" class="level2">
<h2 class="anchored" data-anchor-id="improvements-post-original">Improvements (post-original)</h2>
<p>The original 2024 build was a single-model gated ORB. Subsequent improvements:</p>
<ul>
<li><strong>Time-series cross-validation</strong> instead of standard CV — the original had look-ahead leakage from random folds across non-stationary feature distributions</li>
<li><strong>Feature drift monitoring</strong> — KL divergence between training-period and recent-period feature distributions, with a model-refit trigger when drift exceeds a threshold (a minimal version is sketched after this list)</li>
<li><strong>Cost model</strong> — added per-trade slippage + half-spread costs at TQQQ realistic levels; the IRR uplift held but the Sharpe improvement compressed, which is the honest signal</li>
<li><strong>Multi-interval ensemble</strong> — instead of fitting one gating model per interval, train a single model that predicts the <em>best</em> interval for the day (multinomial), then trade that one</li>
<li><strong>SQL pipeline reorg</strong> — moved from per-day feature recomputation to incremental updates keyed on the latest bar; cut the daily feature build from minutes to seconds</li>
</ul>
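<p>The drift check can be sketched with plain NumPy, assuming day-level feature frames for the training window and the recent window. The bin count and the 0.25 trigger are illustrative.</p>
<pre><code>import numpy as np

def kl_drift(train_col, recent_col, bins=20, eps=1e-9):
    """KL(recent || train) for one feature, over a shared binning."""
    edges = np.histogram_bin_edges(np.concatenate([train_col, recent_col]), bins=bins)
    p, _ = np.histogram(recent_col, bins=edges)
    q, _ = np.histogram(train_col, bins=edges)
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def needs_refit(train_df, recent_df, threshold=0.25):
    """Trigger a model refit when any feature's distribution has drifted past the threshold."""
    return any(kl_drift(train_df[c].to_numpy(), recent_df[c].to_numpy()) &gt; threshold
               for c in train_df.columns)</code></pre>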
</section>
<section id="stack" class="level2">
<h2 class="anchored" data-anchor-id="stack">Stack</h2>
<ul>
<li>Python 3.10+ — <code>xgboost</code>, <code>pandas</code>, <code>numpy</code>, <code>scikit-learn</code></li>
<li>SQLite for the feature store (Postgres-ready schema)</li>
<li>Backtest reporting in Jupyter</li>
<li>TQQQ minute-bar history via <code>financeds</code> (custom data layer I built for this and the LEAP project)</li>
</ul>
</section>
<section id="what-it-demonstrates" class="level2">
<h2 class="anchored" data-anchor-id="what-it-demonstrates">What it demonstrates</h2>
<ul>
<li>Treating the strategy and the gating model as separable problems</li>
<li>Honest ML evaluation: time-series CV, drift monitoring, cost model</li>
<li>A working composite strategy, not just a backtest screenshot</li>
</ul>
</section>
<section id="caveats" class="level2">
<h2 class="anchored" data-anchor-id="caveats">Caveats</h2>
<ul>
<li>TQQQ-specific. The feature distribution and the optimal ORB interval don’t transfer cleanly to underlying QQQ or to single-name stocks; the gating model would need to be re-fit per asset</li>
<li>Survivorship-free underlying (TQQQ has been listed continuously since 2010), so no listing-bias correction is needed; transferring to a name with corporate actions would require those adjustments</li>
</ul>


</section>

 ]]></description>
  <category>Financial Machine Learning</category>
  <category>Algorithmic Trading</category>
  <guid>https://ryan-tolone.com/projects/orb-trading/</guid>
  <pubDate>Mon, 11 Nov 2024 00:00:00 GMT</pubDate>
  <media:content url="https://ryan-tolone.com/projects/orb-trading/orb_graph_plsupdate.png" medium="image" type="image/png" height="117" width="144"/>
</item>
<item>
  <title>CNN-Based Age Prediction System</title>
  <link>https://ryan-tolone.com/projects/age-pred/</link>
  <description><![CDATA[ 




<section id="in-plain-english" class="level2">
<h2 class="anchored" data-anchor-id="in-plain-english">In plain English</h2>
<p>Given a photo of a person’s face, predict their age. It’s a classic computer-vision benchmark and the kind of project most ML practitioners build at some point — the interesting parts are the loss function (age is continuous, so this is regression, not classification), the dataset bias (most face datasets skew young and Western), and the deployment surface (a model that lives in a Jupyter notebook isn’t a product).</p>
<p>I trained a <strong>convolutional neural network</strong> (a ResNet10 architecture, in PyTorch) on the UTK dataset — about 9,000 face images labeled with age, gender, and ethnicity. The model achieves an average prediction error of <strong>±4 years</strong> on held-out test images. Then I wrapped it in a <strong>Streamlit</strong> web UI so anyone can drop in a photo and see the prediction live, instead of needing to run the model from a notebook.</p>
<p>The hyperparameter tuning improved accuracy by 28% over the initial baseline — most of that came from learning-rate scheduling and proper regularization, not from fancy architecture changes.</p>
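<p>For concreteness, the training-loop pieces that mattered look roughly like the sketch below. The model, data loaders, <code>num_epochs</code>, and <code>evaluate</code> helper are placeholders (the project itself uses its ResNet10 and UTK loaders); the point is the combination of weight decay and a plateau-based learning-rate schedule.</p>
<pre><code>import torch
import torch.nn as nn

criterion = nn.L1Loss()                                   # mean absolute error, in years
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min",
                                                       factor=0.5, patience=3)

for epoch in range(num_epochs):
    model.train()
    for images, ages in train_loader:                     # placeholder DataLoader
        optimizer.zero_grad()
        loss = criterion(model(images).squeeze(1), ages.float())
        loss.backward()
        optimizer.step()
    val_mae = evaluate(model, val_loader)                 # placeholder validation routine
    scheduler.step(val_mae)                               # cut the LR when validation error plateaus</code></pre>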
</section>
<section id="technical-introduction" class="level2">
<h2 class="anchored" data-anchor-id="technical-introduction">Technical introduction</h2>
<p>This project leverages a CNN-based approach to predict age from facial images using PyTorch and the UTK dataset. Utilizing a ResNet10 architecture, the model processes over 9,000 images to achieve an average prediction error of ±4 years. Through extensive hyperparameter optimization—including learning rate scheduling and regularization—the model’s accuracy improved by 28%, and a Streamlit UI was developed for real-time demographic analysis.</p>
</section>
<section id="output" class="level2">
<h2 class="anchored" data-anchor-id="output">Output</h2>
<p>Here is a screenshot from the Streamlit UI demonstrating real-time age prediction:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/age-pred/streamlitui.png" class="img-fluid figure-img"></p>
<figcaption>Age Prediction UI Screenshot</figcaption>
</figure>
</div>
</section>
<section id="models-used" class="level2">
<h2 class="anchored" data-anchor-id="models-used">Models Used</h2>
<ul>
<li><strong>ResNet10 CNN Architecture</strong> for age prediction<br>
</li>
<li><strong>PyTorch</strong> for model training and inference<br>
</li>
<li><strong>Streamlit</strong> for deploying a real-time user interface</li>
</ul>
</section>
<section id="training" class="level2">
<h2 class="anchored" data-anchor-id="training">Training</h2>
<ul>
<li><strong>CNN Model Training</strong>
<ul>
<li>Includes data preprocessing, model training, and hyperparameter tuning.</li>
</ul></li>
</ul>
</section>
<section id="requirements" class="level2">
<h2 class="anchored" data-anchor-id="requirements">Requirements</h2>
<ul>
<li>python 3.8+</li>
<li>pytorch</li>
<li>torchvision</li>
<li>pandas</li>
<li>numpy</li>
<li>streamlit</li>
<li>matplotlib or seaborn (for visualization)</li>
</ul>


</section>

 ]]></description>
<category>CNNs</category>
  <guid>https://ryan-tolone.com/projects/age-pred/</guid>
  <pubDate>Tue, 17 Sep 2024 00:00:00 GMT</pubDate>
  <media:content url="https://ryan-tolone.com/projects/age-pred/lebron.png" medium="image" type="image/png" height="105" width="144"/>
</item>
<item>
  <title>Ethereum Smart Contract for NFT Generation &amp; Minting</title>
  <link>https://ryan-tolone.com/projects/pimpin-pandas/</link>
  <description><![CDATA[ 




<section id="in-plain-english" class="level2">
<h2 class="anchored" data-anchor-id="in-plain-english">In plain English</h2>
<p>An <strong>NFT</strong> (“non-fungible token”) is a unique digital item — usually a piece of art — whose ownership is recorded on a blockchain. NFT collections are typically generated <em>programmatically</em>: artists design a small number of “traits” (different hats, eyes, backgrounds, accessories), and a script combines them randomly to produce thousands of unique pieces. The collection is then put on a <strong>smart contract</strong> — a small program living on the Ethereum blockchain — that lets people pay to “mint” (claim) one of the pieces.</p>
<p>This was a 10,000-piece NFT collection (“Pimpin’ Pandas”) I designed end-to-end:</p>
<ul>
<li><strong>Image generation pipeline</strong> in Python that combined hundreds of trait layers into 10,000 unique pandas with no duplicates and 99.9% metadata integrity (every piece’s recorded traits actually match its image). A minimal version of the composition loop is sketched after this list.</li>
<li><strong>Ethereum smart contract</strong> in Solidity (the language used for Ethereum programs), implementing the <strong>ERC-721</strong> standard (the technical spec for unique-item NFTs). Optimizations to the contract reduced the gas fees buyers paid to mint by 15%.</li>
<li><strong>Minting UI</strong> so non-technical buyers could connect a wallet and claim a panda without needing to interact with the contract directly.</li>
</ul>
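<p>A minimal version of the composition loop, for illustration only: the trait names, directory layout, and counts are placeholders, and the actual pipeline also enforces trait rarity weights and extra integrity checks.</p>
<pre><code>import json, random
from pathlib import Path
from PIL import Image

LAYERS = ["background", "body", "eyes", "hat", "accessory"]   # composited bottom to top
LAYER_DIR, OUT_DIR = Path("traits"), Path("output")
OUT_DIR.mkdir(exist_ok=True)

def sample_traits(rng):
    """Pick one PNG variant per trait layer."""
    return tuple((layer, rng.choice(sorted(p.stem for p in (LAYER_DIR / layer).glob("*.png"))))
                 for layer in LAYERS)

def render(traits, out_path):
    canvas = None
    for layer, variant in traits:
        img = Image.open(LAYER_DIR / layer / f"{variant}.png").convert("RGBA")
        canvas = img if canvas is None else Image.alpha_composite(canvas, img)
    canvas.save(out_path)

rng, seen, metadata, i = random.Random(42), set(), [], 0
while i &lt; 10_000:
    traits = sample_traits(rng)
    if traits in seen:                        # reject duplicates and resample
        continue
    seen.add(traits)
    render(traits, OUT_DIR / f"panda_{i}.png")
    metadata.append({"id": i, "traits": dict(traits)})    # metadata mirrors the rendered image
    i += 1
(OUT_DIR / "metadata.json").write_text(json.dumps(metadata, indent=2))</code></pre>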
<p>The collection successfully facilitated over 1,000 mint transactions.</p>
</section>
<section id="technical-introduction" class="level2">
<h2 class="anchored" data-anchor-id="technical-introduction">Technical introduction</h2>
<p>This project involves designing and deploying an Ethereum smart contract to generate and mint NFTs, supporting over 10,000 unique ERC-721 compliant assets. It features an optimized Python-based image generation pipeline that ensures diverse traits and 99.9% metadata integrity. Additionally, a user-friendly UI streamlines the minting process, reducing gas fees by 15% and enabling more than 1,000 efficient transactions.</p>
</section>
<section id="output" class="level2">
<h2 class="anchored" data-anchor-id="output">Output</h2>
<p>Here is a display of some of the NFTs minted:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ryan-tolone.com/projects/pimpin-pandas/pandas.png" class="img-fluid figure-img"></p>
<figcaption>NFT Minting UI Screenshot</figcaption>
</figure>
</div>
</section>
<section id="technologies-used" class="level2">
<h2 class="anchored" data-anchor-id="technologies-used">Technologies Used</h2>
<ul>
<li><strong>Ethereum Smart Contract</strong> (Solidity) for NFT generation and minting<br>
</li>
<li><strong>Python-based Image Generation Pipeline</strong> for creating unique NFT assets<br>
</li>
<li><strong>User Interface</strong> for streamlined minting and transaction management</li>
</ul>
</section>
<section id="deployment-documentation" class="level2">
<h2 class="anchored" data-anchor-id="deployment-documentation">Deployment &amp; Documentation</h2>
<ul>
<li><p><strong>Smart Contract Deployment</strong></p></li>
<li><p><strong>Image Generation Pipeline Documentation</strong></p></li>
</ul>
</section>
<section id="requirements" class="level2">
<h2 class="anchored" data-anchor-id="requirements">Requirements</h2>
<ul>
<li>Node.js and npm (for smart contract development)<br>
</li>
<li>Solidity compiler (e.g., via Hardhat or Truffle)<br>
</li>
<li>python 3.8+<br>
</li>
<li>web3.py (or similar library for blockchain interaction)<br>
</li>
<li>pandas, numpy (for image pipeline processing)</li>
</ul>


</section>

 ]]></description>
  <category>Crypto</category>
  <category>Algorithmic Art Generation</category>
  <guid>https://ryan-tolone.com/projects/pimpin-pandas/</guid>
  <pubDate>Fri, 15 Dec 2023 00:00:00 GMT</pubDate>
  <media:content url="https://ryan-tolone.com/projects/pimpin-pandas/pimpinpandasgif.gif" medium="image" type="image/gif"/>
</item>
</channel>
</rss>
