Last updated Jan 23, 2026

Forge LLM limits

Forge LLMs is now available as a preview feature.

Preview features are deemed stable; however, they remain under active development and may be subject to shorter deprecation windows. Preview features are suitable for early adopters in production environments.

We release preview features so partners and developers can study, test, and integrate them prior to General Availability (GA). For more information, see Forge release phases: EAP, Preview, and GA.

The following limits apply to each installation of your app when using the Forge LLMs API:

Resource	Limit	Description
Context window size in tokens	200000	The maximum number of tokens a model can reference as input plus the tokens it subsequently generates.
Requests per minute	100	The maximum number of prompts that can be sent to any model in any given minute.
Inference time in minutes	5	The maximum time a model can spend processing and generating a response before a timeout occurs, assuming the Async events API is used with a specified timeout equal to or greater than 5 minutes. Otherwise, the specified or default timeout applies.
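
To stay within these limits, an app can check them on its own side before sending a prompt. The following is a minimal TypeScript sketch, not part of the Forge LLMs API: `SlidingWindowLimiter`, `fitsContextWindow`, and `effectiveInferenceTimeoutMs` are hypothetical helper names, and the sketch assumes that the request limit is counted over a rolling 60-second window per installation, and that your app estimates token counts itself. Only the numeric limits come from the table above.

```typescript
// Limits copied from the table above.
const REQUESTS_PER_MINUTE = 100;
const CONTEXT_WINDOW_TOKENS = 200000;
const MAX_INFERENCE_MS = 5 * 60 * 1000;

// Hypothetical helper: tracks request timestamps in a rolling window.
// Assumption: the platform counts requests over any 60-second span.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns 0 if a request may be sent at `now`, otherwise the
  // number of milliseconds to wait before the next slot frees up.
  msUntilAllowed(now: number): number {
    // Drop timestamps that have left the window.
    while (this.timestamps.length > 0) {
      const oldest = this.timestamps[0];
      if (oldest === undefined || oldest > now - this.windowMs) break;
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.maxRequests) return 0;
    const oldest = this.timestamps[0];
    return oldest === undefined ? 0 : oldest + this.windowMs - now;
  }

  // Records a request at `now` if the window has room for it.
  tryAcquire(now: number): boolean {
    if (this.msUntilAllowed(now) > 0) return false;
    this.timestamps.push(now);
    return true;
  }
}

// Hypothetical helper: the context window covers the prompt tokens
// plus the tokens the model is allowed to generate.
function fitsContextWindow(promptTokens: number, maxOutputTokens: number): boolean {
  return promptTokens + maxOutputTokens <= CONTEXT_WINDOW_TOKENS;
}

// Hypothetical helper: inference is capped at 5 minutes; a shorter
// configured timeout (specified or default) applies instead.
function effectiveInferenceTimeoutMs(configuredTimeoutMs: number): number {
  return Math.min(configuredTimeoutMs, MAX_INFERENCE_MS);
}
```

A caller might create one `SlidingWindowLimiter(REQUESTS_PER_MINUTE, 60000)` per installation, call `tryAcquire(Date.now())` before each prompt, and queue or delay the prompt when it returns `false`. Passing the clock value in as an argument keeps the helper easy to test.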
