Handling Rate Limits

Hitting issues with rate limits? We've got you covered!

In this guide we will look at handling:

situations where you actually hit a rate limit (i.e. HTTP 429); and
dynamic rate limiting (figuring out when you can make your next request, from a successful request).

Provider support

Rate limit handling is a new feature in Prism.

We currently only support Anthropic, but we're working to add support for more providers.

If you are keen to contribute, take a look at the issues tab on Github - implementing rate limits for a provider is a great first contribution!

The ProviderRateLimit value object

Throughout this guide, we'll talk about the ProviderRateLimit value object.

Each ProviderRateLimit has four properties:

name - the name given to that rate limit by the provider - e.g. "input-tokens"
limit - the current limit set on your API key by the provider - e.g. for input-tokens, perhaps 80000
remaining - how many you have left - e.g. for input-tokens if you have used 30000 out of your 80000 limit - this will be 50000
resetsAt - a Carbon instance with the date and time at which remaining will reset to limit

Handling a rate limit hit

Prism throws a PrismRateLimitedException when you hit a rate limit.

You can catch that exception, gracefully fail and inspect the rateLimits property which contains an array of ProviderRateLimits.

php

use EchoLabs\Prism\Prism;
use EchoLabs\Enums\Provider;
use EchoLabs\Prism\ValueObjects\ProviderRateLimit;
use EchoLabs\Prism\Exceptions\PrismRateLimitedException;

try {
    Prism::text()
        ->using(Provider::Anthropic, 'claude-3-5-sonnet-20241022')
        ->withPrompt('Hello world!')
        ->generate();
}
catch (PrismRateLimitedException $e) {
    /** @var ProviderRateLimit $rate_limit */ 
    foreach ($e->rateLimits as $rate_limit) {
        // Loop through rate limits...
    }
    
    // Log, fail gracefully, etc.
}

Figuring out which rate limit you have hit

In a simple world, they'd only be one rate limit.

However most providers implement various rate limits (e.g. request, input tokens, output tokens, etc.) and provide you with information on all of them on all requests, regardless of which you have hit.

For simple rate limits like "requests", the remaining property on ProviderRateLimit will be 0 if you have hit it. These are easy to find:

php

use EchoLabs\Prism\ValueObjects\ProviderRateLimit;
use Illuminate\Support\Arr;

try {
    // Your request
}
catch (PrismRateLimitedException $e) {
    $hit_limit = Arr::first($e->rateLimits, fn(ProviderRateLimit $rate_limit) => $rate_limit->remaining === 0);
}

For less simple rate limits like input tokens, the remaining property may not be zero. For instance, if you have 5,000 input tokens remaining and submit a request requiring 6,000 tokens, you'll be rate limited but remaining will still show 5,000.

Here, you may need to implement some logic to approximate how many tokens your request will use before sending it, and then test against that:

php

use EchoLabs\Prism\ValueObjects\ProviderRateLimit;
use Illuminate\Support\Arr;

try {
    // Your request
}
catch (PrismRateLimitedException $e) {
    $input_token_limit = Arr::first($e->rateLimits, fn(ProviderRateLimit $rate_limit) => $rate_limit->name === 'input-tokens');

    if ($input_token_limit < $your_token_estimate) {
        // Handle
    }
}

To help with approximating input token usage, we plan to implement Anthopic's token counting endpoint in a future release.

For providers that don't have a token counting endpoint, you could either roll your own token counter or use something like tiktoken if you are comfortable calling out to Python.

Once you know which rate limit you have hit, you'll want to ensure your app does not continue making requests until after the ProviderRateLimit resetsAt property.

If you aren't sure where to start with that, check out the What should you do with rate limit information section below.

Dynamic rate limiting

Prism adds the same rate limit information to every successful request:

php

use EchoLabs\Prism\Prism;
use EchoLabs\Enums\Provider;
use EchoLabs\Prism\ValueObjects\ProviderRateLimit;

$response = Prism::text()
    ->using(Provider::Anthropic, 'claude-3-5-sonnet-20241022')
    ->withPrompt('Hello world!')
    ->generate();
    
/** @var ProviderRateLimit $rate_limit */ 
foreach ($response->responseMeta->rateLimits as $rate_limit) {
    // Handle
}

Armed with that information, you'll probably want to update your app's rate limiter(s).

What should you do with rate limit information?

You'll likely want to implement a rate limiter within your app. Thankfully Laravel, as always, makes this very easy!

You should take a look at the rate limiting docs, and if you are firing requests from your queue, check out the job middleware docs.

You should implement a rate limiter / job middleware for each of the provider rate limits your application typically hits.

Handling Rate Limits ​

Provider support ​

The ProviderRateLimit value object ​

Handling a rate limit hit ​

Figuring out which rate limit you have hit ​

Dynamic rate limiting ​

What should you do with rate limit information? ​