Forge LLMs is now available as a preview feature.
Preview features are deemed stable; however, they remain under active development and may be subject to shorter deprecation windows. Preview features are suitable for early adopters in production environments.
We release preview features so partners and developers can study, test, and integrate them prior to General Availability (GA). For more information, see Forge release phases: EAP, Preview, and GA.
The following limits apply to each installation of your app when using the Forge LLMs API:
| Resource | Limit | Description |
|---|---|---|
| Context window size in tokens | 200000 | The maximum combined number of tokens a model can reference as input and then generate as output. |
| Requests per minute | 100 | The number of prompts sent to any model in any given minute. |
| Inference time in minutes | 5 | The maximum time a model can process and generate responses before a timeout occurs, assuming the Async events API is used with a specified timeout equal to or greater than 5 minutes. Otherwise, the specified or default timeouts apply. |
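
An app may want to guard against these limits on its own side before sending a prompt. The following is a minimal sketch, not part of the Forge LLMs API: `SlidingWindowLimiter` and `fitsContextWindow` are hypothetical helpers, and only the numeric limits come from the table above. It assumes a sliding one-minute window (the platform's actual windowing is not stated here) and that the app already has token counts from some tokenizer.

```typescript
// Limits copied from the table above. Everything else is a hypothetical helper.
const CONTEXT_WINDOW_TOKENS = 200_000;
const REQUESTS_PER_MINUTE = 100;
const WINDOW_MS = 60_000;

// Sliding-window limiter: keeps the timestamps of requests made within the window.
// Assumption: the platform counts requests over a rolling minute.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number = REQUESTS_PER_MINUTE, windowMs: number = WINDOW_MS) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if it fits in the current window.
  tryAcquire(nowMs: number = Date.now()): boolean {
    // Drop timestamps that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => nowMs - t < this.windowMs);
    if (this.timestamps.length >= this.maxRequests) {
      return false;
    }
    this.timestamps.push(nowMs);
    return true;
  }
}

// Checks that the prompt tokens plus the requested output tokens fit in the context window.
function fitsContextWindow(promptTokens: number, maxOutputTokens: number): boolean {
  return promptTokens + maxOutputTokens <= CONTEXT_WINDOW_TOKENS;
}
```

A caller would check `limiter.tryAcquire()` and `fitsContextWindow(...)` before each prompt and queue or trim the request when either returns `false`. Because the limits apply per installation, a limiter held in memory only approximates the real count if the app runs in more than one invocation at a time.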