Forge LLMs is available through Forge's Early Access Program (EAP). EAP grants selected users early testing access in exchange for feedback. APIs and features in EAP are experimental, unsupported, subject to change without notice, and not recommended for production use; sign up here to participate.
For more details, see Forge EAP, Preview, and GA.
The following limits apply for each installation of your app when using the Forge LLMs API:
| Resource | Limit | Description |
|---|---|---|
| Context window size in tokens | 200000 | The maximum combined number of tokens a model can accept as input and generate as output in a single request. |
| Requests per minute | 100 | The maximum number of prompts that can be sent to any model within any one-minute window. |
| Inference time in minutes | 5 | The maximum time a model can process and generate responses before a timeout occurs, assuming the Async events API is used with a specified timeout equal to or greater than 5 minutes. Otherwise, the specified or default timeouts apply. |
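To stay under the requests-per-minute limit, an app can throttle its own calls before sending them. The sketch below is a generic sliding-window limiter, not part of the Forge API; the class name `MinuteRateLimiter` and its methods are illustrative assumptions, and how you actually invoke a model is up to your app code.

```javascript
// Illustrative client-side guard for a per-minute request limit.
// This is a generic sliding-window limiter, not a Forge API.
class MinuteRateLimiter {
  constructor(maxPerMinute) {
    this.maxPerMinute = maxPerMinute;
    this.timestamps = []; // send times recorded within the last 60 seconds
  }

  // Returns true if a request may be sent now; false if the limit is hit.
  tryAcquire(now = Date.now()) {
    // Keep only timestamps from the last minute, then check capacity.
    this.timestamps = this.timestamps.filter((t) => now - t < 60_000);
    if (this.timestamps.length >= this.maxPerMinute) {
      return false;
    }
    this.timestamps.push(now);
    return true;
  }
}

// Match the documented limit of 100 prompts per minute per installation.
const limiter = new MinuteRateLimiter(100);
```

Before each model call, check `limiter.tryAcquire()` and queue or delay the prompt when it returns `false`, rather than letting the platform reject the request.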