Forge LLMs is available through Forge's Early Access Program (EAP). EAP grants selected users early testing access in exchange for feedback. APIs and features in EAP are experimental, unsupported, subject to change without notice, and not recommended for production use; sign up here to participate.
For more details, see Forge EAP, Preview, and GA.
The following limits apply for each installation of your app when using the Forge LLMs API:
| Resource | Limit | Description |
|---|---|---|
| Context window size in tokens | 200000 | The maximum combined number of tokens a model can accept as input and generate as output in a single request. |
| Requests per minute | 100 | The maximum number of prompts that can be sent to any model within any one-minute window. |
| Inference time in minutes | 5 | The maximum time a model can process and generate responses before a timeout occurs, assuming the Async events API is used with a specified timeout equal to or greater than 5 minutes. Otherwise, the specified or default timeouts apply. |
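To stay under the requests-per-minute limit, an app can throttle its own calls before sending them. The sketch below is a generic sliding-window limiter, not part of the Forge API; the class name `MinuteRateLimiter` and its methods are illustrative assumptions, and how you actually invoke a model is up to your app code.

```javascript
// Illustrative client-side guard for a per-minute request limit.
// This is a generic sliding-window limiter, not a Forge API.
class MinuteRateLimiter {
  constructor(maxPerMinute) {
    this.maxPerMinute = maxPerMinute;
    this.timestamps = []; // send times recorded within the last 60 seconds
  }

  // Returns true if a request may be sent now; false if the limit is hit.
  tryAcquire(now = Date.now()) {
    // Keep only timestamps from the last minute, then check capacity.
    this.timestamps = this.timestamps.filter((t) => now - t < 60_000);
    if (this.timestamps.length >= this.maxPerMinute) {
      return false;
    }
    this.timestamps.push(now);
    return true;
  }
}

// Match the documented limit of 100 prompts per minute per installation.
const limiter = new MinuteRateLimiter(100);
```

Before each model call, check `limiter.tryAcquire()` and queue or delay the prompt when it returns `false`, rather than letting the platform reject the request.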