
# Deploy with Hugging Face Models

> Deploy Hugging Face models on Friendli Dedicated Endpoints. Step-by-step tutorial covering model selection, endpoint creation, and first inference call.

export const RoundedBorderBox = ({children, caption}) => <div className="rounded-border-box">
    {children}
    {caption && <p className="text-sm text-gray-700 dark:text-gray-400">{caption}</p>}
  </div>;

This tutorial walks you through deploying a Hugging Face model on Friendli Dedicated Endpoints—from creating an endpoint to sending your first request. We'll use [meta-llama-3-8b-instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as the example model.

## Prerequisites

* A Friendli Suite account with access to [Friendli Suite > Dedicated Endpoints](https://friendli.ai/suite/~/dedicated-endpoints)
* A Hugging Face account with access to the [meta-llama-3-8b-instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model

## Step 1: Create a New Endpoint

1. Log in to your [Friendli Suite](https://friendli.ai/suite) account and navigate to [Dedicated Endpoints](https://friendli.ai/suite/~/dedicated-endpoints).
2. If you haven't already, start the free trial for Dedicated Endpoints.
3. Create a new project, then click the 'New Endpoint' button.
4. Fill in the basic information:

* Endpoint name: Choose a unique name for your endpoint (e.g., "My New Endpoint").

5. Select the model:

<RoundedBorderBox>
  <img alt="Hugging Face Model Search" src="https://mintcdn.com/friendliai/SRK7vx0X1v_2rjkU/static/images/guides/dedicated-endpoints/tutorial/hugging-face/hf-model-search.png?fit=max&auto=format&n=SRK7vx0X1v_2rjkU&q=85&s=ba5c891b4eebb66ee7a9f14fae0b01f8" width="1654" height="1270" data-path="static/images/guides/dedicated-endpoints/tutorial/hugging-face/hf-model-search.png" />
</RoundedBorderBox>

* Model Repository: Select "Hugging Face" as the model provider.
* Model ID: Enter "meta-llama/Meta-Llama-3-8B-Instruct" as the model ID. Once the search bar loads the list, click the top result that exactly matches the repository ID.

<Info>
  By default, the endpoint pulls the latest commit on the model's default branch. You can select a specific branch, tag, or commit instead.

  If you're using your own model, see [Format Requirements](/guides/dedicated-endpoints/faq#format-requirements) first.
</Info>

6. Select the instance:

<RoundedBorderBox>
  <img alt="Select instance" src="https://mintcdn.com/friendliai/SRK7vx0X1v_2rjkU/static/images/guides/dedicated-endpoints/tutorial/hugging-face/gpu-selection.png?fit=max&auto=format&n=SRK7vx0X1v_2rjkU&q=85&s=48ec49424afa8e60b2d4d82345c576c7" width="1626" height="1332" data-path="static/images/guides/dedicated-endpoints/tutorial/hugging-face/gpu-selection.png" />
</RoundedBorderBox>

* Instance configuration: Choose a suitable instance type based on your performance requirements. We suggest 1x A100 80G for most models.

<Info>For large models, some instance options may be disabled because they do not have enough VRAM to run the model.</Info>

<RoundedBorderBox>
  <img alt="Low Memory Warning" style={{ maxWidth: "400px", width: "-webkit-fill-available" }} src="https://mintcdn.com/friendliai/SRK7vx0X1v_2rjkU/static/images/guides/dedicated-endpoints/tutorial/low-mem-warning.png?fit=max&auto=format&n=SRK7vx0X1v_2rjkU&q=85&s=45082768b576bb5fcb19c54c17f06b5f" width="1644" height="796" data-path="static/images/guides/dedicated-endpoints/tutorial/low-mem-warning.png" />
</RoundedBorderBox>
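To get a feel for why some instances are ruled out, a back-of-the-envelope estimate of the weight memory can help. The sketch below is only an approximation, not how Friendli Suite decides which options to disable: it assumes 16-bit weights, treats "8B" as 8 billion parameters, and ignores the KV cache, activations, and runtime overhead, all of which need additional VRAM. The function names are hypothetical helpers for illustration.

```python
# Rough weight-memory estimate. Ignores KV cache, activations, and runtime
# overhead, so the real VRAM requirement is higher than the number printed here.

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def estimate_weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Return the approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def weights_fit(num_params: float, vram_gb: float, dtype: str = "bf16") -> bool:
    """True if the weights alone fit in the given VRAM (necessary, not sufficient)."""
    return estimate_weight_memory_gb(num_params, dtype) < vram_gb


# Assumption: roughly 8 billion parameters, as the "8B" in the model name suggests.
llama3_8b_params = 8e9

print(estimate_weight_memory_gb(llama3_8b_params))  # 16.0
print(weights_fit(llama3_8b_params, vram_gb=80))    # True
# A hypothetical 70-billion-parameter model would not fit on a single 80 GB GPU.
print(weights_fit(70e9, vram_gb=80))                # False
```

By this estimate, the example model's weights take about 16 GB, which leaves ample headroom on a 1x A100 80G instance.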

7. Edit the configurations:

<RoundedBorderBox>
  <img alt="Autoscaling Config" style={{ maxWidth: "600px", width: "-webkit-fill-available" }} src="https://mintcdn.com/friendliai/SRK7vx0X1v_2rjkU/static/images/guides/dedicated-endpoints/tutorial/autoscaling-config.png?fit=max&auto=format&n=SRK7vx0X1v_2rjkU&q=85&s=6485338094ff440a4e7b06dffdac5a45" width="1576" height="1058" data-path="static/images/guides/dedicated-endpoints/tutorial/autoscaling-config.png" />
</RoundedBorderBox>

<br />

<RoundedBorderBox>
  <img alt="Engine Config" style={{ maxWidth: "600px", width: "-webkit-fill-available" }} src="https://mintcdn.com/friendliai/SRK7vx0X1v_2rjkU/static/images/guides/dedicated-endpoints/tutorial/engine-config.png?fit=max&auto=format&n=SRK7vx0X1v_2rjkU&q=85&s=53cf6bb465fc8ffbccdeeda6db24c32e" width="1342" height="896" data-path="static/images/guides/dedicated-endpoints/tutorial/engine-config.png" />
</RoundedBorderBox>

* Autoscaling: By default, autoscaling ranges from 0 to 2 replicas. With a minimum of 0, the deployment sleeps when it's not in use, which reduces cost.
* Advanced configuration: Some LLM options, including the batch size and token configurations, are adjustable. For this tutorial, we'll leave them as-is.

8. Click 'Deploy' to create a new endpoint.

## Step 2: Test the Endpoint

1. Wait for the deployment to be created and initialized. This may take a few minutes.

<Note>
  You can check the status with the indicator under the endpoint's name.
</Note>

<RoundedBorderBox>
  <img alt="Initializing Endpoint" src="https://mintcdn.com/friendliai/SRK7vx0X1v_2rjkU/static/images/guides/dedicated-endpoints/tutorial/initializing.png?fit=max&auto=format&n=SRK7vx0X1v_2rjkU&q=85&s=71920abcd1401de94ae4d657f87b26ac" width="1100" height="360" data-path="static/images/guides/dedicated-endpoints/tutorial/initializing.png" />
</RoundedBorderBox>

2. In the "Playground" section, enter a sample input prompt (e.g., "What is the capital of France?").
3. Click the right arrow button to send the inference request.

<RoundedBorderBox>
  <img alt="Playground" src="https://mintcdn.com/friendliai/SRK7vx0X1v_2rjkU/static/images/guides/dedicated-endpoints/tutorial/playground.png?fit=max&auto=format&n=SRK7vx0X1v_2rjkU&q=85&s=9e3d168b0039f80f501e8df40e5cc122" width="1096" height="618" data-path="static/images/guides/dedicated-endpoints/tutorial/playground.png" />
</RoundedBorderBox>

4. Use the "Metrics" and "Logs" sections to monitor the endpoint.

<RoundedBorderBox>
  <img alt="Metrics" src="https://mintcdn.com/friendliai/SRK7vx0X1v_2rjkU/static/images/guides/dedicated-endpoints/tutorial/metrics.png?fit=max&auto=format&n=SRK7vx0X1v_2rjkU&q=85&s=98c944e50b88401145573b28a49a2d9b" width="2652" height="2014" data-path="static/images/guides/dedicated-endpoints/tutorial/metrics.png" />
</RoundedBorderBox>

<br />

<RoundedBorderBox>
  <img alt="Logs" src="https://mintcdn.com/friendliai/SRK7vx0X1v_2rjkU/static/images/guides/dedicated-endpoints/tutorial/logs.png?fit=max&auto=format&n=SRK7vx0X1v_2rjkU&q=85&s=99d861a49d09b0736b23a83955f39efa" width="1106" height="766" data-path="static/images/guides/dedicated-endpoints/tutorial/logs.png" />
</RoundedBorderBox>

## Step 3: Send Requests by Using cURL or Python

1. As described in our [API docs](/openapi/model-apis/chat-completions), you can send requests with the following code:

<CodeGroup>
  ```python OpenAI Python SDK theme={null}
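  import os

  from openai import OpenAI

  # Point the OpenAI client at the Friendli Dedicated Endpoints API.
  client = OpenAI(
      api_key=os.getenv("API_KEY"),
      base_url="https://api.friendli.ai/dedicated/v1",
  )

  chat_completion = client.chat.completions.create(
      model="YOUR_ENDPOINT_ID",  # the ID of the endpoint you deployed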
      messages=[
          {
              "role": "user",
              "content": "Tell me how to make a delicious pancake"
          }
      ]
  )
  print(chat_completion.choices[0].message.content)
  ```

  ```python Friendli Python SDK theme={null}
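  import os

  from friendli import SyncFriendli

  # Authenticate with the API key stored in the API_KEY environment variable.
  client = SyncFriendli(
      token=os.getenv("API_KEY"),
  )

  chat_completion = client.dedicated.chat.complete(
      model="YOUR_ENDPOINT_ID",  # the ID of the endpoint you deployed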
      messages=[
          {
              "role": "user",
              "content": "Tell me how to make a delicious pancake"
          }
      ]
  )
  print(chat_completion.choices[0].message.content)
  ```

  ```sh curl theme={null}
  curl -X POST https://api.friendli.ai/dedicated/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $API_KEY" \
    -d '{
      "model": "YOUR_ENDPOINT_ID",
      "messages": [
        {
          "role": "user",
          "content": "What is the capital of France?"
        }
      ],
      "max_tokens": 200,
      "top_k": 1
    }'
  ```
</CodeGroup>

2. To change the model or most other endpoint settings later, click the update button.
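
Because the default autoscaling range starts at 0 replicas, a request may arrive while the endpoint is asleep. This tutorial does not describe how the API responds while the endpoint wakes up, so the sketch below simply assumes the first call can fail or time out and retries it with exponential backoff. `call_with_retry` is a hypothetical helper, not part of any SDK, and the flaky function stands in for a real request.

```python
import time


def call_with_retry(fn, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call fn() until it succeeds, waiting base_delay * 2**i seconds between tries."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            # Broad catch for illustration only; real code should narrow this
            # to the connection and timeout errors raised by the client in use.
            if i == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** i)


# Stand-in for a real request: fails twice, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("endpoint not ready")
    return "ok"


# Record the waits instead of actually sleeping so the example runs instantly.
waits = []
result = call_with_retry(flaky_request, sleep=waits.append)
print(result, waits)  # ok [2.0, 4.0]
```

With the OpenAI Python SDK example above, the wrapped call would be `call_with_retry(lambda: client.chat.completions.create(...))`, with the delays tuned to how long your endpoint takes to initialize.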
