The Floor
Your workspace. Begin here.
What is Thresh?
Thresh is a free, browser-based tool for collecting Reddit data. Point it at any subreddit, configure what you want to gather, and it hands you a clean dataset with a complete record of how it was collected. No accounts, no API keys, no code. Everything runs in your browser and stays on your machine. Built for researchers, journalists, and anyone who believes public discourse is worth studying carefully.
Quick Start
Enter a subreddit name and start collecting data in seconds. No API keys, no code, no setup.
How It Works
Recent Collections
No collections yet. Start by threshing a subreddit.
Who Uses This — And How
Public Health Researcher
"What are people in r/mentalhealth talking about this month?"
score to find what resonates most
Winnow: Run Identify themes to map dominant concerns
Glean: Export CSV with anonymized usernames for IRB-ready analysis
Journalist
"What questions are people asking in r/personalfinance about student loans?"
num_comments for the biggest conversations
Winnow: Run Extract questions to find what people need answered
Glean: Provenance.txt gives your editor a transparent methodology section
Graduate Student
"I need to compare discourse in r/science vs. r/conspiracy for my thesis."
upvote_ratio to see consensus vs. division
Winnow: Run Sentiment analysis on each, then a Custom prompt comparing tone
Glean: Two exports, each with its own provenance — cite both in your methods section
Community Organizer
"What are residents saying in our city's subreddit about the new transit plan?"
Thresh
Configure and collect Reddit data. No API key needed.
Harvest
Browse and explore your collected data.
No Data Yet
Collect some Reddit data first, then come back here to explore it.
Glean
Export your data with full provenance documentation.
Nothing to Export
Collect some data first, then return here to export it.
Winnow
Separate signal from noise. Start with the built-in word frequency table (free, instant), then optionally use Claude AI for deeper analysis.
No Data to Analyze Yet
Head over to Thresh to collect posts from a subreddit. Once you have a collection, come back here to explore word patterns and run AI analysis.
About
Ethics, methodology, and how this tool works.
What Is This?
The Threshing Floor is a free, open-source tool for collecting and exporting Reddit data. It is designed for public health researchers, journalists, civic technologists, and anyone who believes public discourse is worth measuring.
It runs entirely in your browser. There is no server, no database, no account to create. Your data stays on your machine.
How It Works
Reddit's public pages serve JSON data alongside HTML. The Threshing Floor fetches this public data through a lightweight proxy (to handle browser security restrictions), then lets you browse, filter, and export it.
No Reddit API key is required. No authentication of any kind. The data collected is limited to what any person could see by visiting Reddit in a web browser.
The workflow follows four steps:
- Thresh — Enter a subreddit and collection parameters
- Harvest — Browse and search your collected data
- Winnow — Analyze with word frequency and optional AI
- Glean — Export as CSV or JSON with full provenance
Your Data & Storage
Everything is stored in your browser only. There is no server database, no account system, and no cloud sync. Specifically:
- Collections (posts, comments, configuration) are saved in your browser's
localStorage. - Anthropic API key (if you use Claude AI analysis) is stored in
localStorage. It is never sent to any server except Anthropic's API directly. - Rate limit state and subreddit cache are also stored locally in your browser.
This means your data does not sync across browsers or devices. If you switch browsers, your collections will not follow.
Clearing Your History
To erase all Thresh data from your browser:
- Open your browser's Settings (or press
Ctrl+Shift+Delete/Cmd+Shift+Delete) - Navigate to Privacy & Security → Clear browsing data
- Select "Cookies and site data" (this includes localStorage)
- To clear only Thresh: go to your browser's Developer Tools (
F12), open the Application tab, expand Local Storage, find the Thresh site, and delete individual keys or click Clear All
This will remove all saved collections, your API key (if stored), rate limit state, and cached subreddit data. It cannot be undone.
The Rate Limit Gauge
The Rate Limit gauge at the bottom of the sidebar tracks how many requests remain in your current Reddit rate limit window. Reddit allows 100 requests per minute to its public JSON endpoints.
- Gold bar — plenty of requests remaining. Normal operation.
- Yellow bar (below 30%) — requests are running low. Consider pausing collection.
- Red pulsing bar (below 10%) — critical. Thresh will pause automatically if the limit is reached.
- Cooldown timer — if you hit the limit, a countdown appears showing when requests resume. The collect button is disabled until the cooldown expires.
The rate limit resets automatically each minute. Under normal use (25–100 posts per collection), you will rarely see it drop below gold.
Ethical Considerations
- Re-identification risk: Even with usernames removed, unique writing styles or specific details in posts may allow re-identification. Consider this when publishing findings.
- IRB guidance: If you are conducting academic research, consult your Institutional Review Board about whether your data collection constitutes human subjects research.
- Reddit's Terms: This tool accesses publicly available data. Please review Reddit's Terms of Service and API Terms regarding data collection and use.
- Consent: Reddit users post publicly, but they may not expect their posts to be aggregated and analyzed. Handle data with care and respect.
- Default anonymization: Exports anonymize usernames by default. You can disable this, but consider the implications before doing so.
Provenance
Every export includes a provenance.txt file documenting exactly how the data was collected: the subreddit, sort method, time filter, number of posts, date of collection, and any filters applied. This is the seal on every bundle — it gives you the language for a methods section, a transparency report, or a replication attempt.
AI Analysis (Optional)
The Winnow page offers optional AI-powered analysis using Claude (by Anthropic). To use this feature, you need your own Anthropic API key. Your key is stored only in your browser's local storage and is sent directly to Anthropic's API — it is never stored on any server.
Deploying Your Own
The Threshing Floor deploys to Cloudflare Pages with zero configuration:
- Fork the repository on GitHub
- Connect it to Cloudflare Pages
- Set the build output directory to
public - Deploy
That's it. No environment variables, no build step, no dependencies to install.
Citation
Thomas, J.E. (2026). The Threshing Floor: A browser-based tool for Reddit data collection and export. https://github.com/jethomasphd/The_Threshing_Floor
A Jacob E. Thomas artifact. Built with deliberate attention.