# Dedicated Create Endpoint

> Create a Friendli Dedicated Endpoint deployment for a Hugging Face model via the API. Specify GPU type, replica count, and model configuration.

Create a Dedicated Endpoint deployment for a Hugging Face model.
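
The following Python sketch assembles a `DedicatedEndpointCreateBody` payload and checks the bounds stated in the schema below: `simplescale.replicas` >= 1, `minReplica` >= 0, and `maxReplica` <= 10. The helper `build_create_body`, its parameter names, and the example project ID and model repo are hypothetical. Only the JSON field names come from the schema.

```python
from typing import Any, Optional

# Top-level fields marked `required` in DedicatedEndpointCreateBody.
REQUIRED_FIELDS = ("projectId", "name", "instanceOptionId", "advanced", "hfModelRepo")


def build_create_body(
    project_id: str,
    name: str,
    instance_option_id: str,
    hf_model_repo: str,
    *,
    skip_special_tokens: bool,
    add_special_tokens: bool,
    hf_model_repo_revision: Optional[str] = None,
    replicas: Optional[int] = None,
    min_replica: Optional[int] = None,
    max_replica: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a create-endpoint body (hypothetical helper, not part of any SDK)."""
    body: dict[str, Any] = {
        "projectId": project_id,
        "name": name,
        "instanceOptionId": instance_option_id,
        # Inside `advanced`, only the two tokenizer flags are required.
        "advanced": {
            "tokenizer_skip_special_tokens": skip_special_tokens,
            "tokenizer_add_special_tokens": add_special_tokens,
        },
        "hfModelRepo": hf_model_repo,
    }
    # Reject empty strings for the required fields.
    missing = [field for field in REQUIRED_FIELDS if not body[field]]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    if hf_model_repo_revision is not None:
        body["hfModelRepoRevision"] = hf_model_repo_revision

    if replicas is not None:
        # EndpointSimplescaleConfig: `replicas` has a minimum of 1.
        if replicas < 1:
            raise ValueError("simplescale.replicas must be >= 1")
        body["simplescale"] = {"replicas": replicas}

    policy: dict[str, int] = {}
    if min_replica is not None:
        # AutoscalingPolicy: `minReplica` has a minimum of 0.
        if min_replica < 0:
            raise ValueError("autoscalingPolicy.minReplica must be >= 0")
        policy["minReplica"] = min_replica
    if max_replica is not None:
        # AutoscalingPolicy: `maxReplica` has a maximum of 10.
        if max_replica > 10:
            raise ValueError("autoscalingPolicy.maxReplica must be <= 10")
        policy["maxReplica"] = max_replica
    if policy:
        # Omitted keys fall back to the schema defaults (minReplica 0, maxReplica 1).
        body["autoscalingPolicy"] = policy

    return body
```

For example, `build_create_body("my-project-id", "my-endpoint", "ShbPuOs4tfGb", "your-org/your-model", skip_special_tokens=True, add_special_tokens=False, replicas=1)` targets one NVIDIA A100 80GB instance with a single replica. The tokenizer flag values are placeholders, not recommended settings.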

To make a successful request, you must provide a **Personal API Key** (e.g., `flp_XXX`) in the **Bearer Token** field.
See the [authentication section](/openapi/introduction#authentication) of the introduction page to learn how to obtain this key, and generate a new key on the [API keys settings page](https://friendli.ai/suite/~/setting/keys).
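
The key goes in the `Authorization` header as `Bearer <key>`, following the `token` security scheme (`scheme: bearer`) in the spec below. This sketch uses Python's standard `urllib.request` to prepare a `POST` to `https://api.friendli.ai/dedicated/beta/endpoint` without sending it, and adds the optional `X-Friendli-Team` header. The function name and the placeholder key, team ID, and body values are illustrative assumptions.

```python
import json
import urllib.request
from typing import Any, Optional

# Server URL and path taken from the OpenAPI spec on this page.
CREATE_ENDPOINT_URL = "https://api.friendli.ai/dedicated/beta/endpoint"


def make_create_request(
    api_key: str, body: dict[str, Any], team_id: Optional[str] = None
) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated create-endpoint request."""
    headers = {
        # Bearer authentication with a Personal API Key (e.g. flp_XXX).
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if team_id is not None:
        # Optional header: run the request as the given team.
        headers["X-Friendli-Team"] = team_id
    return urllib.request.Request(
        CREATE_ENDPOINT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen`. A `200` response carries a `DedicatedEndpointStatus` JSON object. Non-2xx responses such as `400`, `422`, or `500` are raised as `urllib.error.HTTPError`.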

<Info>
  This API is currently in **Beta**.
  While we strive to provide a stable and reliable experience, this feature is still under active development.
  As a result, you may encounter unexpected behavior or limitations.
  We encourage you to provide feedback to help us improve the feature before its official release.

  * [Feature request & feedback](mailto:support@friendli.ai)
  * [Contact support](mailto:support@friendli.ai)
</Info>


## OpenAPI

````yaml https://github.com/friendliai/friendli-openapi/raw/refs/heads/main/openapi.yaml post /dedicated/beta/endpoint
openapi: 3.1.0
info:
  title: Friendli Suite API Reference
  description: This is an OpenAPI reference of Friendli Suite API.
  termsOfService: https://friendli.ai/terms-of-service
  contact:
    name: FriendliAI Support Team
    email: support@friendli.ai
  version: 0.1.0
servers:
  - url: https://api.friendli.ai
security: []
tags:
  - name: Serverless.Chat
  - name: Serverless.ToolAssistedChat
  - name: Serverless.Messages
  - name: Serverless.ChatRender
  - name: Serverless.Completions
  - name: Serverless.Token
  - name: Serverless.Audio
  - name: Serverless.Model
  - name: Serverless.Knowledge
  - name: Dedicated.Chat
  - name: Dedicated.Messages
  - name: Dedicated.ChatRender
  - name: Dedicated.Completions
  - name: Dedicated.Embeddings
  - name: Dedicated.TextClassification
  - name: Dedicated.Token
  - name: Dedicated.Image
  - name: Dedicated.Audio
  - name: Dedicated.Endpoint
  - name: Container.Chat
  - name: Container.Messages
  - name: Container.Completions
  - name: Container.TextClassification
  - name: Container.Token
  - name: Container.Image
  - name: Container.Audio
  - name: Cost
  - name: Dataset
  - name: File
paths:
  /dedicated/beta/endpoint:
    post:
      tags:
        - Dedicated.Endpoint
      summary: Create a new endpoint
      description: Create a new endpoint and return its status
      operationId: dedicatedCreateEndpoint
      parameters:
        - name: X-Friendli-Team
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            description: ID of team to run requests as (optional parameter).
            title: X-Friendli-Team
          description: ID of team to run requests as (optional parameter).
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/DedicatedEndpointCreateBody'
      responses:
        '200':
          description: Successfully created the endpoint.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DedicatedEndpointStatus'
              examples:
                Example:
                  value:
                    status: INITIALIZING
                    createdAt: '2025-01-01T00:00:00Z'
                    updatedAt: '2025-01-01T00:00:00Z'
                    phase: DOWNLOADING_MODEL
        '400':
          description: Bad Request
        '422':
          description: Unprocessable Entity
        '500':
          description: Internal Server Error
      security:
        - token: []
components:
  schemas:
    DedicatedEndpointCreateBody:
      properties:
        projectId:
          type: string
          title: Project ID
          description: The ID of the project that owns the endpoint.
        name:
          type: string
          title: Name
          description: The name of the endpoint.
        instanceOptionId:
          type: string
          title: Instance Option ID
          description: |-
            The ID of the instance option.

            Available options:
            - 1x NVIDIA A100 80GB: `ShbPuOs4tfGb`
            - 2x NVIDIA A100 80GB: `mrAHuYt7T40o`
            - 4x NVIDIA A100 80GB: `JkNob0NMdoF3`
            - 8x NVIDIA A100 80GB: `sYH4kHmAcA5P`
            - 1x NVIDIA H100: `TwD5AqnBSVN0`
            - 2x NVIDIA H100: `zfTutSiLn0Hq`
            - 4x NVIDIA H100: `lfkRz5G48REc`
            - 8x NVIDIA H100: `GUA4qYFmsYz8`
            - 1x NVIDIA H200: `LnK1wTaKc7WO`
            - 2x NVIDIA H200: `Tu6GjBnfHPe4`
            - 4x NVIDIA H200: `OhTzYtZuomzI`
            - 8x NVIDIA H200: `ahBzWtOuomsI`
            - 1x NVIDIA B200: `8GiQTLKfJNOr`
            - 2x NVIDIA B200: `brTZGIuYgVrs`
            - 4x NVIDIA B200: `AFoZMFXZnAdD`
            - 8x NVIDIA B200: `drbc6G9FxJWZ`
        advanced:
          $ref: '#/components/schemas/EndpointAdvancedConfig'
          title: Advanced
          description: The advanced configuration of the endpoint.
        simplescale:
          anyOf:
            - $ref: '#/components/schemas/EndpointSimplescaleConfig'
            - type: 'null'
          title: Simple Scale
          description: The simple scaling configuration of the endpoint.
        autoscalingPolicy:
          anyOf:
            - $ref: '#/components/schemas/AutoscalingPolicy'
            - type: 'null'
          title: Auto Scale Policy
          description: The auto scaling configuration of the endpoint.
        hfModelRepo:
          type: string
          title: HF Model Repo
          description: The Hugging Face repository ID of the model.
        hfModelRepoRevision:
          anyOf:
            - type: string
            - type: 'null'
          title: HF Model Repo Revision
          description: The Hugging Face commit hash of the model repository.
        initialVersionComment:
          anyOf:
            - type: string
            - type: 'null'
          title: Initial Version Comment
          description: The comment for the initial version.
      type: object
      required:
        - projectId
        - name
        - instanceOptionId
        - advanced
        - hfModelRepo
      title: DedicatedEndpointCreateBody
      description: Dedicated endpoint create request.
    DedicatedEndpointStatus:
      properties:
        status:
          $ref: '#/components/schemas/InferenceDeploymentStatus'
          title: Status
          description: The current status of the endpoint deployment.
        errorCode:
          anyOf:
            - $ref: '#/components/schemas/InferenceDeploymentErrorCode'
            - type: 'null'
          title: Error Code
          description: Error code if deployment failed.
        createdAt:
          type: string
          format: date-time
          title: Created At
          description: When the endpoint was created.
        updatedAt:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
          description: When the endpoint was last updated.
        phase:
          anyOf:
            - type: string
              enum:
                - REQUESTING_VIRTUAL_MACHINE
                - DOWNLOADING_MODEL
                - ENGINE_INITIALIZING
            - type: 'null'
          title: Phase
          description: The current phase of the endpoint.
      type: object
      required:
        - status
        - createdAt
      title: DedicatedEndpointStatus
      description: Dedicated endpoint status.
    EndpointAdvancedConfig:
      properties:
        max_batch_size:
          anyOf:
            - type: integer
            - type: 'null'
          title: Max Batch Size
        tokenizer_skip_special_tokens:
          type: boolean
          title: Tokenizer Skip Special Tokens
        tokenizer_add_special_tokens:
          type: boolean
          title: Tokenizer Add Special Tokens
        max_token_count:
          type: integer
          title: Max Token Count
          default: 2560
        enable_content_logging:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Enable Content Logging
        max_input_length:
          anyOf:
            - type: integer
            - type: 'null'
          title: Max Input Length
      type: object
      required:
        - tokenizer_skip_special_tokens
        - tokenizer_add_special_tokens
      title: EndpointAdvancedConfig
      description: Endpoint advanced config.
    EndpointSimplescaleConfig:
      properties:
        replicas:
          type: integer
          minimum: 1
          title: Replicas
      type: object
      required:
        - replicas
      title: EndpointSimplescaleConfig
      description: Simple scaling options.
    AutoscalingPolicy:
      properties:
        minReplica:
          type: integer
          minimum: 0
          title: Minimum Replica
          description: >-
            The minimum number of replicas. Setting `minReplica` to 0 allows the
            endpoint to sleep when idle, reducing costs. The minimum value is 0.
          default: 0
        maxReplica:
          type: integer
          maximum: 10
          title: Maximum Replica
          description: >-
            The maximum number of replicas the endpoint can scale up to. The
            maximum value is 10.
          default: 1
        cooldownPeriod:
          type: integer
          title: Cooldown Period
          description: >-
            Determines how long the endpoint waits before scaling down after the
            last request.
          default: 300
      type: object
      title: AutoscalingPolicy
      description: Autoscaling policy.
    InferenceDeploymentStatus:
      type: string
      enum:
        - UNKNOWN
        - INITIALIZING
        - RUNNING
        - UPDATING
        - SLEEPING
        - AWAKING
        - FAILED
        - STOPPING
        - TERMINATING
        - TERMINATED
        - READY
      title: InferenceDeploymentStatus
      description: Status of inference deployment.
    InferenceDeploymentErrorCode:
      type: string
      enum:
        - WORKLOAD_INIT_UNKNOWN_ERROR
        - WORKLOAD_INIT_SETTINGS_ERROR
        - WORKLOAD_INIT_GRPC_ERROR
        - WORKLOAD_INIT_MANIFEST_NOT_FOUND_ERROR
        - WORKLOAD_INIT_MANIFEST_TYPE_ERROR
        - WORKLOAD_INIT_DOWNLOAD_ERROR
        - WORKLOAD_INIT_INVALID_TOKEN_ERROR
        - WORKLOAD_INIT_CANNOT_ACCESS_REPO_ERROR
        - WORKLOAD_INIT_HF_WANDB_API_ERROR
        - WORKLOAD_INIT_INSUFFICIENT_DISK_ERROR
        - INFERENCE_ENGINE_UNKNOWN_ERROR
        - INFERENCE_ENGINE_INVALID_ARGUMENT_ERROR
        - INFERENCE_ENGINE_MEMORY_ERROR
        - INFERENCE_ENGINE_METERING_CLIENT_CONFIG_ERROR
      title: InferenceDeploymentErrorCode
      description: Error code of a failed inference deployment.
  securitySchemes:
    token:
      type: http
      description: >-
        When using the Friendli Suite API, you must provide a
        **Personal API Key** for authentication and authorization.


        For more details, see the
        [authentication section](https://friendli.ai/docs/openapi/introduction#authentication).
      scheme: bearer

````
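
A successful call returns a `DedicatedEndpointStatus` object. Only `status` and `createdAt` are required in it, and `phase` and `errorCode` may be `null`. The Python sketch below turns such a payload into a short summary line and rejects status values outside the `InferenceDeploymentStatus` enum. The function name and output format are assumptions for illustration. It does not poll or call any other endpoint.

```python
import json

# Values of the InferenceDeploymentStatus enum in the spec above.
KNOWN_STATUSES = frozenset({
    "UNKNOWN", "INITIALIZING", "RUNNING", "UPDATING", "SLEEPING", "AWAKING",
    "FAILED", "STOPPING", "TERMINATING", "TERMINATED", "READY",
})


def summarize_status(raw: str) -> str:
    """Summarize a DedicatedEndpointStatus JSON payload (illustrative helper)."""
    payload = json.loads(raw)
    status = payload["status"]  # required by the schema
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    summary = status
    phase = payload.get("phase")  # nullable, e.g. DOWNLOADING_MODEL
    if phase:
        summary += f" (phase: {phase})"
    error_code = payload.get("errorCode")  # nullable; set when deployment fails
    if error_code:
        summary += f" [error: {error_code}]"
    return summary
```

Applied to the example `200` response in the spec, it returns `INITIALIZING (phase: DOWNLOADING_MODEL)`.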