Months of L3 Orderbook Data From Prediction Markets Is Up for Grabs — Here’s Why That Matters
TL;DR
A researcher on Reddit’s r/algotrading has accumulated months of Level 3 orderbook data across major prediction market platforms and is asking the community how best to release it publicly. L3 data — the deepest, most granular view of market microstructure — is extremely rare in the prediction market space. The post sparked 40 comments and significant community interest, signaling real demand for this kind of dataset. If released properly, it could become a foundational resource for algo traders, researchers, and market microstructure analysts.
What the Sources Say
The single but substantive source here is a Reddit post in r/algotrading, which has generated notable engagement (53 upvotes, 40 comments) for a niche data topic.
The poster claims to have collected months of L3 orderbook data across “major prediction markets” — though the specific platforms aren’t named in the title, the tool URLs referenced alongside the post point to Polymarket, Kalshi, and Limitless Exchange as the relevant players in this space.
What Is L3 Data, and Why Does It Matter?
To understand why this is a big deal, a quick primer:
- L1 data: Best bid/ask (the top of the book)
- L2 data: The full order book with price levels and quantities
- L3 data: Every individual order event — placements, cancellations, fills — tied to specific order IDs
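The distinction becomes concrete when you replay an L3 feed: because every event carries an order ID, the full book can be rebuilt at any point in time. A minimal sketch of that replay loop, using hypothetical field names (the actual schema of this dataset is unknown):

```python
from collections import defaultdict

def replay_l3(events):
    """Rebuild per-price-level depth from a stream of L3 order events.

    Each event is assumed (hypothetically) to carry: order_id,
    side ('bid'/'ask'), price, size, and type ('place'/'cancel'/'fill').
    """
    orders = {}                 # order_id -> (side, price, remaining size)
    depth = defaultdict(float)  # (side, price) -> total resting size

    for ev in events:
        oid = ev["order_id"]
        if ev["type"] == "place":
            orders[oid] = (ev["side"], ev["price"], ev["size"])
            depth[(ev["side"], ev["price"])] += ev["size"]
        elif ev["type"] in ("cancel", "fill"):
            side, price, size = orders.pop(oid)
            removed = ev.get("size", size)  # partial fills carry a size
            depth[(side, price)] -= removed
            if removed < size:              # partially filled order keeps resting
                orders[oid] = (side, price, size - removed)
    return dict(depth)
```

With L2 data you only ever see the aggregated `depth` output; L3 gives you the `events` stream that produced it.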
L3 data is the holy grail for market microstructure analysis. It lets you reconstruct the full history of how liquidity was built and consumed, identify spoofing or layering patterns, build accurate execution simulations, and backtest strategies that depend on queue position. In traditional finance, L3 feeds cost thousands of dollars per month and are typically only accessible to institutional players.
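Queue position is a good example of something only L3 data can give you: with per-order IDs you can compute how much resting size sat ahead of a given order at its price level when it arrived. A sketch under price-time priority, again with hypothetical field names rather than the dataset's actual schema:

```python
def queue_ahead(events, my_order_id):
    """Estimate resting size ahead of one order at its price level.

    Assumes price-time priority and hypothetical L3 fields:
    order_id, type ('place'/'cancel'/'fill'), side, price, size.
    """
    # Pass 1: locate our order's placement and its price level.
    for i, ev in enumerate(events):
        if ev["order_id"] == my_order_id and ev["type"] == "place":
            placed_at, side, price = i, ev["side"], ev["price"]
            break
    else:
        raise ValueError("order placement not found in event stream")

    # Pass 2: net size still resting at that level from earlier events.
    sizes = {}
    for ev in events[:placed_at]:
        oid = ev["order_id"]
        if ev["type"] == "place":
            if (ev["side"], ev["price"]) == (side, price):
                sizes[oid] = ev["size"]
        elif oid in sizes:  # cancels/fills free up queue ahead of us
            sizes[oid] -= ev.get("size", sizes[oid])
    return sum(s for s in sizes.values() if s > 0)
```

This is exactly the quantity a fill-probability model needs and exactly what L1/L2 feeds throw away.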
In prediction markets, this kind of data is essentially non-existent publicly. Most retail participants are working with basic price feeds or at best L2 snapshots. That’s what makes this dataset potentially significant.
Community Consensus
The r/algotrading community’s response was enthusiastic. While the full comment thread details aren’t surfaced in the source package, the engagement level (40 comments on a niche technical post) suggests the community sees real value here and has opinions on the best distribution method.
The core question posed ("how should I release it?") implies the researcher is committed to making this data public. The debate is about format and infrastructure, not whether to release it at all.
The Release Method Debate
The reference to Academic Torrents as a potential distribution channel is telling. Academic Torrents is a BitTorrent-based platform specifically designed for sharing large scientific datasets. It’s free, decentralized, and built for exactly this use case: large files that need to be distributed reliably without a single point of failure or ongoing hosting costs.
This framing suggests the dataset is substantial in size — likely gigabytes of tick-by-tick order event data — and that the researcher is thinking about this more like a research contribution than a casual data dump.
No Contradictions, But Plenty of Open Questions
Since we’re working from a single source, there aren’t conflicting accounts. But the community discussion likely surfaced important unresolved questions:
- Which markets specifically? Polymarket (crypto-settled, Polygon blockchain), Kalshi (CFTC-regulated, USD), and Limitless Exchange have very different microstructures. Data from all three in one package would be extraordinarily valuable.
- What time period? “Months” is vague — pre-election data from late 2024/early 2025 prediction markets would be particularly interesting given the trading volume spikes around major political events.
- What format? Raw order events vs. reconstructed snapshots vs. pre-processed OHLCV bars are very different products for very different use cases.
- Licensing? Academic Torrents supports open licensing, but prediction market data ownership is murky when scraped from decentralized protocols.
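The format question is the most consequential of these, because the conversion only goes one way: bars can always be derived from raw events, but raw events can never be recovered from bars. A sketch of that derivation, rolling fill events into fixed-interval OHLCV bars (field names are hypothetical):

```python
def fills_to_ohlcv(fills, bar_seconds=60):
    """Aggregate L3 fill events into fixed-interval OHLCV bars.

    Each fill is assumed (hypothetically) to carry:
    ts (unix seconds), price, size.
    Returns {bar_start_ts: {open, high, low, close, volume}}.
    """
    bars = {}
    for f in sorted(fills, key=lambda f: f["ts"]):
        start = f["ts"] - f["ts"] % bar_seconds   # bucket timestamp
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"open": f["price"], "high": f["price"],
                           "low": f["price"], "close": f["price"],
                           "volume": f["size"]}
        else:
            bar["high"] = max(bar["high"], f["price"])
            bar["low"] = min(bar["low"], f["price"])
            bar["close"] = f["price"]             # last fill in the bucket
            bar["volume"] += f["size"]
    return bars
```

Releasing raw events plus a converter like this serves both audiences; releasing only bars serves neither well.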
Pricing & Alternatives
Here’s the landscape of the platforms whose data is apparently involved, plus the proposed distribution channel:
| Platform | Type | Regulation | Settlement | Data Accessibility |
|---|---|---|---|---|
| Polymarket | Decentralized | Unregulated (offshore) | USDC (Polygon) | On-chain data available, but L3 requires custom infrastructure |
| Kalshi | Centralized | CFTC-regulated (US) | USD | Limited public data; institutional access unclear |
| Limitless Exchange | Decentralized | Unregulated | Crypto | On-chain data available |
| Academic Torrents | Distribution only | N/A | N/A | Free hosting for large datasets |
Pricing note: None of the platforms publish official pricing for data access. Polymarket and Limitless have on-chain data that’s theoretically public but requires significant infrastructure to collect and normalize. Kalshi, being a regulated exchange, likely has tighter controls on data redistribution. The fact that this researcher collected it themselves — rather than licensing it — suggests they built custom data collection pipelines.
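The hard part of such a pipeline is usually normalization: each venue reports order events with its own field names and conventions, and a cross-venue dataset is only useful once they're mapped onto one schema. A minimal sketch of that adapter layer (the per-venue field names below are invented for illustration, not taken from any platform's actual API):

```python
# Hypothetical per-venue field mappings; real API payloads will differ.
FIELD_MAPS = {
    "venue_a": {"id": "order_id", "px": "price", "qty": "size", "action": "type"},
    "venue_b": {"orderId": "order_id", "price": "price", "amount": "size", "event": "type"},
}

def normalize(venue, raw):
    """Translate a venue-specific event dict into a common L3 schema."""
    mapping = FIELD_MAPS[venue]
    out = {canonical: raw[native] for native, canonical in mapping.items()}
    out["venue"] = venue   # tag provenance so venues stay distinguishable
    return out
```

If the released dataset is already normalized this way, it would save every downstream user from rebuilding these adapters themselves.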
For context, comparable L3 data from traditional crypto exchanges (like Binance or Coinbase) would typically run $500–$2,000+/month through commercial data providers, though the source package doesn’t provide specific pricing comparisons for prediction market data products.
The Bottom Line: Who Should Care?
Algorithmic traders — If you’re building execution strategies or market-making bots for prediction markets, L3 data lets you backtest with realistic assumptions about queue dynamics and fill probabilities. This is the difference between a strategy that looks good on paper and one that actually works.
Quantitative researchers — Prediction markets are increasingly used as indicators of real-world probability assessments. Understanding the microstructure — who’s providing liquidity, how spreads behave around events, whether informed trading is detectable — has academic value beyond pure trading.
Market microstructure analysts — Prediction markets are a fascinating edge case: relatively thin order books, event-driven volatility, and a mix of sophisticated arb traders and retail punters. L3 data from these venues has never really been studied at scale.
Crypto/DeFi researchers — Polymarket and Limitless run on blockchain infrastructure. The interplay between on-chain settlement mechanics and off-chain order matching (if applicable) is understudied territory.
Anyone building prediction market infrastructure — If you’re working on tooling, analytics dashboards, or execution infrastructure for these platforms, historical L3 data is invaluable for testing.
What Should Actually Happen Here?
The community response and the Academic Torrents mention suggest the most likely outcome is a torrent-based release with some documentation. The ideal release would include:
- Schema documentation — What fields, what timestamps, what event types
- Coverage metadata — Which markets, which contracts, exact date ranges
- A small sample — So researchers can validate the format before downloading gigabytes
- Clear licensing — CC0 or CC-BY would maximize usability for both commercial and academic use
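Schema documentation pays off most when it is machine-checkable, so researchers can validate the sample file before committing to the full download. A sketch of what a published schema plus a tiny validator might look like (every field name here is a hypothetical stand-in, since the dataset's actual schema hasn't been published):

```python
# Hypothetical event schema a release might document; actual fields unknown.
SCHEMA = {
    "ts_ns": int,       # event timestamp, nanoseconds since epoch
    "venue": str,       # e.g. "polymarket", "kalshi"
    "market_id": str,   # contract identifier
    "order_id": str,
    "type": str,        # "place" | "cancel" | "fill"
    "side": str,        # "bid" | "ask"
    "price": float,
    "size": float,
}

def validate(event):
    """Check one event record against the documented schema."""
    missing = [k for k in SCHEMA if k not in event]
    wrong_type = [k for k, t in SCHEMA.items()
                  if k in event and not isinstance(event[k], t)]
    return not missing and not wrong_type
```

Shipping something like this alongside the torrent turns "schema documentation" from prose into a contract the data can be tested against.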
If this data is as described and gets released properly, it’ll likely become a reference dataset in the prediction market research community almost immediately. There simply isn’t anything else like it publicly available.
The r/algotrading community clearly agrees — this is one of those rare moments where a single person sitting on a unique dataset can make a genuine contribution to open financial research.