
Local AI benchmark queue and test methodology.

A public plan for TokenByte hardware and workflow tests: exact machine, model, settings, result, limitation, evidence, and buying verdict.

Current evidence status: the rows below are queued tests, not completed benchmark results. TokenByte should not use them as performance claims until the matching evidence files, screenshots, settings, and measured values are published.
Public test queue

What is planned, what is measured, and what is still missing.

This page separates planned tests from completed evidence. A queued row is a promise of method, not a performance claim.

Structured benchmark queue

The public data file keeps the queue honest: each test has a status, required evidence, target metrics, and the guide it will update.

  • RTX 3090 ComfyUI workflow
    Machine: RTX 3090 / 24GB VRAM
    Workload: Image generation, upscale chain, batch output
    Metrics to publish: VRAM use, render time, resolution, workflow file, failure notes
    Status: Queued
    Evidence needed: Workflow screenshot, exact graph, seed/settings, VRAM capture, output sample, failure notes
  • Mac Mini local model run
    Machine: Mac Mini + external SSD
    Workload: Summaries, transcripts, small local LLM prompts
    Metrics to publish: Tokens/sec, memory pressure, model name, prompt set, output quality
    Status: Queued
    Evidence needed: Model name, quantization, app version, prompt set, memory capture, output notes
  • Storage stress test
    Machine: NVMe / external SSD
    Workload: Model library loading, output folder growth, backup flow
    Metrics to publish: Capacity used, load time, heat, reliability notes
    Status: Queued
    Evidence needed: Drive model, enclosure, folder size, transfer notes, heat notes, backup result
  • Automation folder watcher
    Machine: Mac Mini or local PC
    Workload: File summaries and Markdown output
    Metrics to publish: Files processed, runtime, error rate, model cost/privacy notes
    Status: Queued
    Evidence needed: Script/config, file count, runtime log, error cases, before/after output sample
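
The data file's schema is not shown on this page. As a minimal sketch, assuming each row is a JSON object with hypothetical keys `test`, `status`, `evidence_needed`, and `evidence_attached`, a reader could separate rows that can back a claim from rows that are still queued, and list what each row is missing:

```python
import json

# Hypothetical excerpt of the public JSON log. Key names and lowercase status
# values are assumptions for illustration; the real schema is not documented here.
LOG_TEXT = """
[
  {"test": "RTX 3090 ComfyUI workflow", "status": "queued",
   "evidence_needed": ["workflow screenshot", "exact graph", "seed/settings",
                       "VRAM capture", "output sample", "failure notes"],
   "evidence_attached": []},
  {"test": "Mac Mini local model run", "status": "queued",
   "evidence_needed": ["model name", "quantization", "app version",
                       "prompt set", "memory capture", "output notes"],
   "evidence_attached": []}
]
"""


def split_by_status(rows):
    """Split rows into those that may back a claim (measured) and the rest."""
    measured = [r for r in rows if r["status"] == "measured"]
    pending = [r for r in rows if r["status"] != "measured"]
    return measured, pending


def missing_evidence(row):
    """List required evidence items that are not attached to the row yet."""
    attached = set(row.get("evidence_attached", []))
    return [item for item in row["evidence_needed"] if item not in attached]


rows = json.loads(LOG_TEXT)
measured, pending = split_by_status(rows)
for row in pending:
    print(f"{row['test']}: {row['status']}, missing {len(missing_evidence(row))} evidence items")
```

Under these assumptions, both sample rows print as queued with six evidence items missing, so neither may be quoted as a result.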
Benchmark fields

Every result needs receipts.

TokenByte benchmark posts should include enough context for a reader to reproduce or reject the result.

Hardware

CPU, GPU, VRAM, RAM, storage, OS, power, cooling, and any unusual constraints.

Software

Model names, app versions, drivers, workflow files, quantization, and settings.

Run data

Speed, memory, VRAM, temperature, output size, time-to-result, and failure modes.

Verdict

Who should buy, who should skip, cheaper alternatives, and what to upgrade first.
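
One way to enforce these four field groups before a post goes out is a completeness check. This is a hypothetical sketch, not TokenByte's own tooling: the `BenchmarkPost` dataclass, the `REQUIRED` key names, and `missing_fields` are invented for illustration, and the keys simply mirror the lists above.

```python
from dataclasses import dataclass, field


@dataclass
class BenchmarkPost:
    # Hypothetical structure: one dict per field group described on this page.
    hardware: dict = field(default_factory=dict)
    software: dict = field(default_factory=dict)
    run_data: dict = field(default_factory=dict)
    verdict: dict = field(default_factory=dict)


# Hypothetical key names mirroring the Hardware, Software, Run data, and Verdict lists.
REQUIRED = {
    "hardware": ["cpu", "gpu", "vram", "ram", "storage", "os", "power", "cooling"],
    "software": ["model", "app_version", "driver", "workflow_file", "quantization", "settings"],
    "run_data": ["speed", "memory", "vram", "temperature", "output_size",
                 "time_to_result", "failure_modes"],
    "verdict": ["buy", "skip", "cheaper_alternatives", "upgrade_first"],
}


def is_blank(value):
    """Treat None and empty strings, lists, or dicts as missing."""
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def missing_fields(post):
    """Return {group: [missing keys]} for every group that is incomplete."""
    gaps = {}
    for group, keys in REQUIRED.items():
        values = getattr(post, group)
        absent = [k for k in keys if is_blank(values.get(k))]
        if absent:
            gaps[group] = absent
    return gaps


post = BenchmarkPost(hardware={"gpu": "RTX 3090", "vram": "24GB"})
gaps = missing_fields(post)
print(sorted(gaps))
```

A post that records only the GPU and VRAM still reports gaps in all four groups, which signals that a reader could not yet reproduce or reject the result.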

Measurement protocol

How a queued row becomes a published result.

A row may influence buying advice only after its status moves from Queued to Measured in the public JSON log.

Minimum evidence gate

  • Record the exact hardware: CPU, GPU, VRAM, RAM, storage, OS, driver, cooling, and power constraints.
  • Record the exact software: app version, model or workflow name, quantization, settings, seed when relevant, and prompts or input files.
  • Run the same workload at least three times when the result is a speed, time, or memory claim.
  • Publish the failure notes, not only the best run. A setup that crashes, swaps, overheats, or silently degrades output is not a clean win.
  • Attach screenshots, output samples, workflow files, logs, or photos before changing a buying verdict.
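
The repeat-and-keep-failures rules can be sketched as a small timing harness. This is a minimal sketch under stated assumptions: `run_repeated` and the demo workload are hypothetical, it measures wall-clock time only (memory and VRAM need separate capture tools), and reporting the median with min/max is one reasonable way to avoid best-run reporting, not a rule stated on this page.

```python
import statistics
import time


def run_repeated(workload, runs=3):
    """Time a zero-argument callable several times, keeping every outcome.

    `workload` stands in for a real job such as a render or a prompt run
    (hypothetical here). Failed runs are recorded, not silently dropped.
    """
    if runs < 3:
        raise ValueError("speed, time, or memory claims need at least three runs")
    timings, failures = [], []
    for i in range(runs):
        start = time.perf_counter()
        try:
            workload()
        except Exception as exc:  # keep the failure note instead of discarding the run
            failures.append(f"run {i + 1}: {type(exc).__name__}: {exc}")
            continue
        timings.append(time.perf_counter() - start)
    summary = {"runs": runs, "successful": len(timings), "failures": failures}
    if timings:
        # Median plus min/max describes the spread rather than only the best run.
        summary.update(
            median_s=statistics.median(timings),
            min_s=min(timings),
            max_s=max(timings),
        )
    return summary


calls = {"n": 0}


def flaky_job():
    """Demo workload that fails on its second call (simulated)."""
    calls["n"] += 1
    if calls["n"] == 2:
        raise RuntimeError("out of memory")
    sum(range(100_000))


result = run_repeated(flaky_job, runs=3)
print(result["successful"], result["failures"])
```

With the simulated failure, the summary reports two successful runs and keeps the failure note for run 2, which is what the page asks to publish alongside the timings.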

Claims allowed by status

  • Queued
    Allowed claim: TokenByte plans to test this workload and has listed the required evidence.
    Not allowed yet: Speed claims, winner language, product rankings, or completed-review scores.
  • Measured
    Allowed claim: The row has original measurements and evidence files attached to the public log.
    Not allowed yet: Broad claims outside the tested hardware, model, settings, or workflow.
  • Retest
    Allowed claim: Older data exists, but the recommendation needs a new run before changing buying advice.
    Not allowed yet: Using stale numbers as current buying proof.
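
The status rules can also be expressed as a check. In this sketch the `Status` enum, `may_change_buying_advice`, and `can_promote` are hypothetical names, and the promotion rule only combines two items from the evidence gate (every listed evidence item attached, and at least three runs for speed, time, or memory claims); the real log may apply more criteria.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical encoding; the real log may store plain strings.
    QUEUED = "queued"
    MEASURED = "measured"
    RETEST = "retest"


def may_change_buying_advice(status: Status) -> bool:
    """Only Measured rows may support buying advice; Queued and Retest may not."""
    return status is Status.MEASURED


def can_promote(evidence_needed, evidence_attached, runs, timed_claim=True):
    """Return True if a Queued or Retest row meets this sketch's gate for Measured."""
    # Every listed evidence item must be attached.
    if set(evidence_needed) - set(evidence_attached):
        return False
    # Speed, time, or memory claims need at least three runs of the same workload.
    if timed_claim and runs < 3:
        return False
    return True


print(may_change_buying_advice(Status.RETEST))  # False
print(can_promote(["VRAM capture", "failure notes"], ["VRAM capture"], runs=3))  # False
```

Keeping the gate as a pure function of the row's evidence and run count makes it easy to re-check a Retest row after its new run is logged.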