Last updated Jan 23, 2026

Forge LLM limits

The following limits apply to each installation of your app when using the Forge LLMs API:

| Resource | Limit | Description |
| --- | --- | --- |
| Context window size (tokens) | 200,000 | The maximum number of tokens a model can reference and subsequently generate, combined. |
| Requests per minute | 100 | The maximum number of prompts that can be sent to any model in any given minute. |
| Inference time (minutes) | 5 | The maximum time a model can process and generate responses before a timeout occurs, assuming the Async events API is used with a specified timeout equal to or greater than 5 minutes. Otherwise, the specified or default timeouts apply. |
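Because the requests-per-minute limit is counted per installation, an app that fans out many prompts may want to throttle itself rather than absorb rejected calls. Below is a minimal illustrative sliding-window guard; the `RateLimiter` class and its behavior are assumptions for demonstration, not part of the Forge API:

```typescript
// Sliding-window rate limiter: allows at most `limit` calls per `windowMs`
// milliseconds. Hypothetical helper, not provided by Forge.
class RateLimiter {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs: number) {}

  // Returns true and records the call if it may proceed now; false otherwise.
  tryAcquire(now: number = Date.now()): boolean {
    // Discard timestamps that have fallen outside the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.limit) {
      return false;
    }
    this.timestamps.push(now);
    return true;
  }
}

// Matching the documented limit: 100 prompts per 60-second window.
const llmLimiter = new RateLimiter(100, 60_000);
```

Before invoking a model, the app would check `llmLimiter.tryAcquire()` and queue or delay the prompt when it returns `false`, instead of sending a request that would be rejected.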
