Vercel MCP hosting
Use this page to operate the direct Vercel-hosted fpf-memory MCP endpoint.
Current state
Canonical MCP endpoint:
Canonical status endpoint:
The direct Vercel origin is the only hosted endpoint documented for clients. The runtime uses the official MCP SDK directly and emits Vercel Build Output API files without an intermediate framework deployer.
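For orientation, a minimal sketch of what a direct runtime like this can look like, assuming the official TypeScript MCP SDK's streamable-HTTP transport in stateless mode. The real fpf-memory handler, tool registrations, and module layout are not shown on this page and may differ.

```ts
// Hypothetical sketch only: a Vercel Node function serving MCP directly
// with the official SDK, no framework deployer in between.
import type { IncomingMessage, ServerResponse } from "node:http";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

function buildServer(): McpServer {
  const server = new McpServer({ name: "fpf_memory", version: "0.0.0" });
  // ...register the six public tools here (omitted)...
  return server;
}

// Vercel invokes this per request; stateless transport, no session ids.
export default async function handler(req: IncomingMessage, res: ServerResponse) {
  const server = buildServer();
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  res.on("close", () => {
    transport.close();
    server.close();
  });
  await server.connect(transport);
  // Vercel's Node runtime may already have parsed the JSON body; pass it through if so.
  await transport.handleRequest(req, res, (req as any).body);
}
```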
Validation snapshot on 2026-05-04:
The 75-call mixed sample still showed read/query tail spikes, so treat p95 latency as a release-gate metric to keep watching rather than as a fixed platform constant.
Vercel setup
The Vercel project runs the direct MCP runtime as a Vercel function.
The repo-root `vercel.json` pins GitHub preview builds to `bun run vercel:origin:build`, which stages the hosted spec, manifest, and snapshot, creates the Vercel Build Output API bundle, and runs the bundle-size guard.
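As one illustration of what the bundle-size guard has to do (the actual `bench:vercel:function-size` script may be implemented differently), the sketch below sums the emitted function directory and fails well below Vercel's 250 MB limit. The 230 MB threshold is an assumed safety margin, not a documented value.

```ts
// Hypothetical function-bundle size gate over the Build Output artifact.
import { readdir, stat } from "node:fs/promises";
import { join } from "node:path";

const FUNC_DIR = ".vercel/output/functions/index.func";  // deployment artifact, see constraints below
const LIMIT_BYTES = 230 * 1024 * 1024;                    // assumed gate under the 250 MB uncompressed limit

async function dirSize(dir: string): Promise<number> {
  let total = 0;
  for (const entry of await readdir(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name);
    total += entry.isDirectory() ? await dirSize(path) : (await stat(path)).size;
  }
  return total;
}

const bytes = await dirSize(FUNC_DIR);
console.log(`${FUNC_DIR}: ${(bytes / 1024 / 1024).toFixed(1)} MB`);
if (bytes > LIMIT_BYTES) {
  console.error("function bundle exceeds the size gate");
  process.exit(1);
}
```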
Known direct-origin constraints:
- The local prebuilt Vercel function bundle is about 211 MB, close enough to Vercel's 250 MB function bundle limit that `bun run bench:vercel:function-size` remains a release gate.
- Vercel functions can read the bundled `hosted/FPF-Spec.md` and seed the bundled snapshot into `/tmp`; mutable runtime artifacts and logs must use `/tmp`.
- Preview deployments may be protected by Vercel Authentication; smoke the production alias or use an automation bypass token.
- The generated `.vercel/output/functions/index.func` directory is the deployment artifact; keep the build-output shape covered by `bun run vercel:origin:build`.

`/api/fpf/status` is a plain JSON freshness endpoint. It reports `upstreamRef`, `sourceHash`, `publishedAt`, `specBytes`, and whether the bundled snapshot matches the bundled spec.
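A minimal freshness probe against that endpoint might look like the following. The base URL is a placeholder, and the exact name of the snapshot-matches-spec flag is an assumption, since only the other fields are listed above.

```ts
// Hypothetical freshness probe for the /api/fpf/status endpoint.
const ORIGIN = process.env.FPF_ORIGIN ?? "https://example.vercel.app"; // placeholder, not the real origin

const res = await fetch(`${ORIGIN}/api/fpf/status`);
if (!res.ok) throw new Error(`status endpoint returned ${res.status}`);

const status = (await res.json()) as {
  upstreamRef: string;
  sourceHash: string;
  publishedAt: string;
  specBytes: number;
  snapshotMatchesSpec?: boolean; // assumed field name for the "snapshot matches spec" flag
};

console.log(status.upstreamRef, status.sourceHash, status.publishedAt, status.specBytes);
if (status.snapshotMatchesSpec === false) {
  throw new Error("bundled snapshot does not match the bundled spec");
}
```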
Cost comparison
Sources checked on 2026-05-04:
- Vercel pricing: Pro is $20/month plus additional usage, with a $20 included usage credit. Vercel Functions list 4 hours of active CPU, 360 GB-hours of provisioned memory, and 1 million invocations included before usage pricing (a worked sizing example follows this list).
- Vercel function limits: Node functions have a 250 MB uncompressed bundle limit, 2 GB memory on Hobby, up to 4 GB on Pro/Enterprise, and function usage is billed on active CPU time plus provisioned memory time.
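To make the included allowances concrete, here is a back-of-the-envelope check under assumed traffic. The invocation count, per-call CPU time, memory size, and wall time below are illustrative, not measured values for this deployment.

```ts
// Illustrative only: does an assumed workload fit inside the listed Pro inclusions?
const invocationsPerMonth = 200_000; // assumed traffic
const activeCpuMsPerCall = 40;       // assumed average active CPU per call
const memoryGb = 2;                  // assumed provisioned memory
const wallMsPerCall = 120;           // assumed average wall time per call

const activeCpuHours = (invocationsPerMonth * activeCpuMsPerCall) / 1000 / 3600; // ~2.2 h vs 4 h included
const gbHours = (invocationsPerMonth * wallMsPerCall * memoryGb) / 1000 / 3600;  // ~13.3 GB-h vs 360 GB-h included

console.log({ invocationsPerMonth, activeCpuHours, gbHours }); // all under the listed inclusions
```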
Comparison:
Current pick: use the direct Vercel origin as canonical. Keep bundle-size, smoke, Q&A, and latency checks as release gates.
LLM support cost comparison
This is separate from hosting cost. The hosted MCP runtime can answer deterministically without an LLM. Only `query_fpf_spec` and `ask_fpf` can use the optional synthesis layer, and compact route answers often stay deterministic.
Sources checked on 2026-05-04:
- LM Studio for Teams: the Community tier is free for both personal and work use; the local server runs on your own hardware.
- LM Studio free for work announcement: LM Studio removed the separate commercial-license requirement for work use.
- xAI Models and Pricing: xAI recommends `grok-4.3` for API callers, bills token usage, and charges server-side tools separately.
- Pi model registry for Grok 4.3: lists `grok-4.3` at $1.25 / 1M input tokens, $2.50 / 1M output tokens, and $0.20 / 1M cache-read tokens.
- VentureBeat Grok 4.3 launch coverage: reports the same $1.25 / $2.50 per 1M token pricing, with higher pricing above 200,000 input tokens.
- EIA Electric Power Monthly table 5.3: U.S. residential electricity averaged 17.65 cents/kWh in February 2026.
Cost model:
Example monthly electricity cost for local LM Studio at 17.65 cents/kWh:
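One illustrative calculation, with the power draw and duty cycle assumed rather than measured; only the electricity rate comes from the source above.

```ts
// Illustrative only: local LM Studio electricity cost at the listed residential rate.
const ratePerKwh = 0.1765;    // 17.65 cents/kWh (EIA, Feb 2026)
const averageDrawWatts = 350; // assumed GPU + host draw while serving
const hoursPerDay = 4;        // assumed active inference time
const daysPerMonth = 22;      // assumed working days

const kwhPerMonth = (averageDrawWatts / 1000) * hoursPerDay * daysPerMonth; // 30.8 kWh
const costPerMonth = kwhPerMonth * ratePerKwh;                              // ~$5.44

console.log(`~${kwhPerMonth.toFixed(1)} kWh/month, ~$${costPerMonth.toFixed(2)}/month`);
```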
Example Grok 4.3 API cost:
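An equally rough API-side calculation using the listed per-token prices; the monthly call volume and token counts are assumptions.

```ts
// Illustrative only: Grok 4.3 API spend at the listed token prices.
const inputPerM = 1.25;           // $ per 1M input tokens
const outputPerM = 2.5;           // $ per 1M output tokens
const callsPerMonth = 5_000;      // assumed synthesis calls
const inputTokensPerCall = 2_000; // assumed prompt plus retrieved context
const outputTokensPerCall = 400;  // assumed answer length

const inputCost = ((callsPerMonth * inputTokensPerCall) / 1_000_000) * inputPerM;    // $12.50
const outputCost = ((callsPerMonth * outputTokensPerCall) / 1_000_000) * outputPerM; // $5.00

console.log(`~$${(inputCost + outputCost).toFixed(2)}/month before cache-read discounts`);
```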
Operational comparison:
Current pick: keep deterministic answers as the hosted default. Use LM Studio for local synthesis while developing. Add a hosted xAI/Grok provider only if synthesized prose becomes a real production requirement; otherwise it adds cost without improving the deterministic FPF evidence surface.
Smoke test
Run the hosted HTTP smoke against the canonical Vercel origin:
The smoke verifies (a raw-request sketch of the first two checks follows this list):
- `initialize` returns server `fpf_memory`
- `tools/list` returns the six public tools only
- `get_fpf_index_status` reports a fresh `local_vectorless` snapshot
- `query_fpf_spec` returns `route:project-alignment`
- `GET` with `Accept: text/event-stream` returns SSE or a valid method rejection
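A stripped-down version of the first two checks, issued as raw streamable-HTTP JSON-RPC calls. The origin URL is a placeholder, and the protocol version shown is just one the SDK accepts; the real smoke script uses its own client.

```ts
// Hypothetical raw-HTTP version of the initialize and tools/list smoke checks.
const ORIGIN = process.env.FPF_ORIGIN ?? "https://example.vercel.app"; // placeholder

async function rpc(body: object) {
  const res = await fetch(ORIGIN, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      // Streamable HTTP servers expect both accept types on POST.
      accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res;
}

await rpc({
  jsonrpc: "2.0",
  id: 1,
  method: "initialize",
  params: {
    protocolVersion: "2025-03-26",
    capabilities: {},
    clientInfo: { name: "smoke", version: "0.0.0" },
  },
}); // expect serverInfo.name === "fpf_memory" in the result

await rpc({ jsonrpc: "2.0", id: 2, method: "tools/list", params: {} });
// expect exactly the six public tools in result.tools
```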
Benchmark
Run the benchmark after the smoke test passes:
Useful benchmark knobs:
- `--clients <n>` controls concurrent MCP clients.
- `--requests <n>` controls measured tool calls, excluding setup and warmup.
- `--warmup <n>` controls unreported warmup tool calls.
- `--scenario mixed|query|read|discovery|status` controls the operation mix.
- `--format json|markdown` controls stdout format.
Treat a benchmark as invalid if any measured call fails, returns `isError=true`, reports a stale snapshot, or exposes the wrong public tool surface.
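Those validity rules can be written down as a small predicate over the benchmark report. The report shape below is an assumption; only the rules themselves come from this page.

```ts
// Hypothetical report shape; field names are assumptions, the rules are from the text above.
interface BenchReport {
  failures: number;       // measured calls that errored at the transport level
  isErrorResults: number; // tool results that came back with isError=true
  snapshotFresh: boolean; // from get_fpf_index_status during the run
  publicTools: string[];  // tool names seen in tools/list
}

const EXPECTED_TOOLS = 6; // the hosted surface is six public tools

function benchmarkIsValid(report: BenchReport): boolean {
  return (
    report.failures === 0 &&
    report.isErrorResults === 0 &&
    report.snapshotFresh &&
    report.publicTools.length === EXPECTED_TOOLS
  );
}
```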
For cold-start, idle, and soak checks, run repeated bench-lite samples instead of a single burst:
Series-specific knobs:
- `--iterations <n>` controls how many samples to run.
- `--interval-ms <n>` controls the idle wait between samples; use `0` for fast local validation.
- `--format json|jsonl` controls whether stdout is one final JSON summary or JSON lines with per-iteration records plus a final summary.
- `--output <path>` writes the same JSON or JSONL payload to disk.
Each series iteration defaults to `--requests 12 --clients 1 --warmup 0`; pass the normal benchmark knobs to override the bench-lite workload.
Run the Q&A benchmark as the correctness gate for hosted deployment:
The Q&A gate accepts `status: "degraded"` only when the answer exposes the expected retrieval candidates in `candidateIds`, keeps committed ids empty, and reports low confidence. Deterministic citations, relations, and constraints remain valid evidence in degraded answers. A synthesis failure reported with `status: "ok"` is a benchmark failure.
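Read one way, the acceptance rule for a single answer could be encoded like this. The answer shape and the confidence threshold are assumptions beyond the field names quoted above.

```ts
// Hypothetical encoding of the degraded-answer acceptance rule.
interface QaAnswer {
  status: "ok" | "degraded";
  candidateIds: string[];   // retrieval candidates exposed by the answer
  committedIds: string[];   // ids the answer actually committed to
  confidence: number;       // 0..1
  synthesisFailed: boolean; // whether the optional synthesis layer errored
}

const LOW_CONFIDENCE = 0.5; // assumed threshold for "confidence stays low"

function answerPasses(a: QaAnswer, expectedCandidates: string[]): boolean {
  if (a.status === "ok") {
    // A synthesis failure reported as "ok" is a benchmark failure.
    return !a.synthesisFailed;
  }
  // "degraded" is acceptable only under all three conditions.
  const exposesExpected = expectedCandidates.every((id) => a.candidateIds.includes(id));
  return exposesExpected && a.committedIds.length === 0 && a.confidence < LOW_CONFIDENCE;
}
```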