Last updated Jan 23, 2026

Forge LLM limits

Forge LLMs is now available as a preview feature.

Preview features are deemed stable; however, they remain under active development and may be subject to shorter deprecation windows. Preview features are suitable for early adopters in production environments.

We release preview features so partners and developers can study, test, and integrate them prior to General Availability (GA). For more information, see Forge release phases: EAP, Preview, and GA.

The following limits apply to each installation of your app when using the Forge LLMs API:

| Resource | Limit | Description |
| --- | --- | --- |
| Context window size | 200,000 tokens | The maximum number of tokens a model can take as input plus the tokens it subsequently generates. |
| Requests per minute | 100 | The number of prompts sent to any model in any given minute. |
| Inference time | 5 minutes | The maximum time a model can spend processing a request and generating a response before a timeout occurs. This limit applies when the Async events API is used with a specified timeout equal to or greater than 5 minutes; otherwise, the specified or default timeout applies. |
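As an illustration only, the following TypeScript sketch shows one way an app could check itself against the request-rate and context-window limits before it calls a model. It is not part of the Forge LLMs API. The names `SlidingWindowLimiter` and `fitsContextWindow` are hypothetical. Prompt token counts are assumed to come from your own estimate. The sliding window is a conservative reading of "any given minute". Also, in-memory state is not guaranteed to be shared across concurrent function invocations, so a real app would need to keep the request log in shared storage for the check to cover the whole installation.

```typescript
// Per-installation limits from the table above (Forge LLMs preview).
const MAX_CONTEXT_TOKENS = 200_000;
const MAX_REQUESTS_PER_MINUTE = 100;
const WINDOW_MS = 60_000;

/**
 * Hypothetical sliding-window limiter: allows at most `limit` requests in any
 * `windowMs`-long interval. Timestamps can be passed in explicitly so the
 * logic is deterministic; it assumes successive `nowMs` values never decrease.
 * State lives in memory only (see the caveat in the lead-in).
 */
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  // Start times (ms) of requests still inside the window, oldest first.
  private readonly timestamps: number[] = [];

  constructor(limit: number = MAX_REQUESTS_PER_MINUTE, windowMs: number = WINDOW_MS) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Drop requests that are now at least one full window old.
  private evictExpired(nowMs: number): void {
    while (this.timestamps.length > 0 && nowMs - this.timestamps[0]! >= this.windowMs) {
      this.timestamps.shift();
    }
  }

  /** Records the request and returns true if it fits in the current window. */
  tryAcquire(nowMs: number = Date.now()): boolean {
    this.evictExpired(nowMs);
    if (this.timestamps.length >= this.limit) {
      return false;
    }
    this.timestamps.push(nowMs);
    return true;
  }

  /** Milliseconds until another request would be allowed (0 if allowed now). */
  msUntilNextSlot(nowMs: number = Date.now()): number {
    this.evictExpired(nowMs);
    if (this.timestamps.length < this.limit) {
      return 0;
    }
    // The oldest request leaves the window at timestamps[0] + windowMs.
    return this.timestamps[0]! + this.windowMs - nowMs;
  }
}

/**
 * The context window covers input plus generated tokens, so the prompt and
 * the output budget must fit in it together.
 */
function fitsContextWindow(
  promptTokens: number,
  maxOutputTokens: number,
  contextLimit: number = MAX_CONTEXT_TOKENS,
): boolean {
  return promptTokens + maxOutputTokens <= contextLimit;
}
```

In this sketch, calling code would check `fitsContextWindow` before sending a prompt. If `tryAcquire` returns `false`, it would wait `msUntilNextSlot()` milliseconds before retrying, rather than sending the request and letting the platform reject it.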
