> ## Documentation Index
>
> Fetch the complete documentation index at: https://friendli.ai/docs/llms.txt
>
> Use this file to discover all available pages before exploring further.

# Autoscaling

> Configure autoscaling for Friendli Dedicated Endpoints to automatically adjust GPU replicas based on traffic and latency thresholds.

Friendli Dedicated Endpoints provide autoscaling that automatically adjusts computational resources based on your traffic patterns, helping you optimize both performance and costs.

[Figure: Autoscaling Config]

## How Autoscaling Works

* **Minimum Replicas**:
  * When set to 0, the endpoint enters sleeping status during periods of inactivity, which minimizes costs.
  * When set to a value greater than 0, the endpoint keeps at least that number of replicas active at all times.
* **Maximum Replicas**: The upper limit on the number of replicas that can be created to handle increased traffic.
* **Cooldown Period**: Measured in seconds. If no requests are received during this period, the endpoint transitions to sleeping status.

## Scaling Policies

We highly recommend using the **Default** autoscaling type, as it performs reliably for most workloads. With other configurations, performance degradation or unexpected charges may occur if you don't fully understand your workload characteristics.

* **Default** (Recommended): The best choice for the majority of users. It operates reliably across most workloads with no configuration required, leveraging our internal expertise to balance performance and cost.
* **Request count**: An advanced option for users who have a deep understanding of their workload characteristics and require granular control over scaling behavior.
  * Because you define the number of requests a single worker handles, cost prediction becomes more straightforward.
  * This method can serve as a foundation for your own custom autoscaling logic: you can change the threshold dynamically via an API to target custom metrics.

## Benefits of Autoscaling

* **Cost Optimization**: Pay only for the resources you need for your workload.
* **Performance Management**: Handle traffic spikes efficiently.
* **Resource Efficiency**: Maintain optimal resource utilization for your workload.
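
As a rough illustration of how the minimum replicas, maximum replicas, cooldown period, and **Request count** threshold could interact, the Python sketch below computes a target replica count. It is not Friendli's actual scaling algorithm and does not call any Friendli API. The names `AutoscalingConfig`, `desired_replicas`, and `requests_per_replica` are hypothetical, and the sketch assumes that an endpoint only sleeps when the minimum is 0 and that one replica stays awake while the cooldown period has not yet elapsed.

```python
import math
from dataclasses import dataclass


@dataclass
class AutoscalingConfig:
    """Hypothetical container for the settings on this page (not a Friendli type)."""
    min_replicas: int
    max_replicas: int
    cooldown_seconds: int
    requests_per_replica: int  # assumed per-worker threshold for "Request count"


def desired_replicas(config: AutoscalingConfig,
                     in_flight_requests: int,
                     seconds_since_last_request: float) -> int:
    """Return an illustrative replica target under the assumptions stated above."""
    idle = in_flight_requests == 0
    cooled_down = seconds_since_last_request >= config.cooldown_seconds
    # Assumption: sleeping (0 replicas) is only possible when min_replicas is 0.
    if config.min_replicas == 0 and idle and cooled_down:
        return 0
    # Replicas needed so that no worker exceeds the request threshold.
    needed = math.ceil(in_flight_requests / config.requests_per_replica)
    # While awake, keep at least one replica, or min_replicas if that is larger.
    floor = max(config.min_replicas, 1)
    return min(max(needed, floor), config.max_replicas)


config = AutoscalingConfig(min_replicas=0, max_replicas=4,
                           cooldown_seconds=300, requests_per_replica=8)
print(desired_replicas(config, 0, 600))   # 0: idle past the cooldown, so sleeping
print(desired_replicas(config, 0, 10))    # 1: idle, but still within the cooldown
print(desired_replicas(config, 20, 0))    # 3: ceil(20 / 8)
print(desired_replicas(config, 100, 0))   # 4: capped at max_replicas
```

Because the threshold is expressed as requests per replica, the upper bound on cost follows directly from `max_replicas`, which is the predictability the Request count policy is described as offering.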