> ## Documentation Index
> Fetch the complete documentation index at: https://friendli.ai/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Deploy with Hugging Face Models

> Deploy Hugging Face models on Friendli Dedicated Endpoints. Step-by-step tutorial covering model selection, endpoint creation, and first inference call.

This tutorial walks you through deploying a Hugging Face model on Friendli Dedicated Endpoints, from creating an endpoint to sending your first request. We'll use [meta-llama-3-8b-instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as the example model.

## Prerequisites

* A Friendli Suite account with access to [Friendli Suite > Dedicated Endpoints](https://friendli.ai/suite/~/dedicated-endpoints)
* A Hugging Face account with access to the [meta-llama-3-8b-instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model

## Step 1: Create a New Endpoint

1. Log in to your [Friendli Suite](https://friendli.ai/suite) account and navigate to the [Dedicated Endpoints](https://friendli.ai/suite/~/dedicated-endpoints) page.
2. If you have not already done so, start the free trial for Dedicated Endpoints.
3. Create a new project, then click the 'New Endpoint' button.
4. Fill in the basic information:
   * Endpoint name: Choose a unique name for your endpoint (e.g., "My New Endpoint").
5. Select the model:

   * Model Repository: Select "Hugging Face" as the model provider.
   * Model ID: Enter "meta-llama/Meta-Llama-3-8B-Instruct" as the model ID. Once the search bar loads the list, click the top result that exactly matches the repository ID. By default, the endpoint pulls the latest commit on the model's default branch. You can instead select a specific branch, tag, or commit.

   If you're using your own model, check [Format Requirements](/guides/dedicated-endpoints/faq#format-requirements) for requirements.

6. Select the instance:

   [Figure: Select instance]

   * Instance configuration: Choose an instance type based on your performance requirements. We suggest 1x A100 80G for most models. For large models, some options may be disabled because they do not have enough VRAM to run the model.

   [Figure: Low Memory Warning]

7. Edit the configurations:

   [Figure: Autoscaling Config]

   * Autoscaling: By default, autoscaling ranges from 0 to 2 replicas. This means the deployment sleeps when it is not in use, which reduces cost.
   * Advanced configuration: Some LLM options, including the batch size and token configurations, can be changed. For this tutorial, we'll leave them as-is.

8. Click 'Deploy' to create a new endpoint.

## Step 2: Test the Endpoint

1. Wait for the deployment to be created and initialized. This may take a few minutes. You can check the status with the indicator under the endpoint's name.

   [Figure: Initializing Endpoint]

2. In the "Playground" section, enter a sample prompt (e.g., "What is the capital of France?").
3. Click the right arrow button to send the inference request.

4. Use the "Metrics" and "Logs" sections to monitor the endpoint.

   [Figure: Metrics]

## Step 3: Send Requests Using cURL or Python

1. As described in our [API docs](/openapi/model-apis/chat-completions), you can send requests with the following code:

   ```python OpenAI Python SDK
   import os
   from openai import OpenAI

   # Point the OpenAI client at the Friendli Dedicated Endpoints API.
   client = OpenAI(
       api_key=os.getenv("API_KEY"),
       base_url="https://api.friendli.ai/dedicated/v1",
   )

   chat_completion = client.chat.completions.create(
       model="YOUR_ENDPOINT_ID",  # replace with your endpoint ID
       messages=[
           {
               "role": "user",
               "content": "Tell me how to make a delicious pancake"
           }
       ]
   )
   print(chat_completion.choices[0].message.content)
   ```

   ```python Friendli Python SDK
   import os
   from friendli import SyncFriendli

   client = SyncFriendli(
       token=os.getenv("API_KEY"),
   )

   chat_completion = client.dedicated.chat.complete(
       model="YOUR_ENDPOINT_ID",  # replace with your endpoint ID
       messages=[
           {
               "role": "user",
               "content": "Tell me how to make a delicious pancake"
           }
       ]
   )
   print(chat_completion.choices[0].message.content)
   ```

   ```sh curl
   curl -X POST https://api.friendli.ai/dedicated/v1/chat/completions \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $API_KEY" \
     -d '{
       "model": "YOUR_ENDPOINT_ID",
       "messages": [
         {
           "role": "user",
           "content": "What is the capital of France?"
         }
       ],
       "max_tokens": 200,
       "top_k": 1
     }'
   ```

2. To change the model or almost any other setting of the endpoint, click the update button.
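
If you would rather not install an SDK, the curl request in Step 3 can be approximated with only the Python standard library. The sketch below is illustrative, not an official client: the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response parsing assumes the raw JSON has the same `choices[0].message.content` shape that the SDK examples read.

```python
import json
import urllib.request

# URL taken from the curl example in Step 3.
FRIENDLI_CHAT_URL = "https://api.friendli.ai/dedicated/v1/chat/completions"


def build_chat_request(endpoint_id, prompt, api_key, max_tokens=200):
    """Build (but do not send) a chat completion request.

    Hypothetical helper: mirrors the headers and JSON body of the curl example.
    """
    payload = {
        "model": endpoint_id,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        FRIENDLI_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the reply text.

    Assumes the response body follows the choices[0].message.content layout.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example usage (requires a running endpoint and a valid API key):
# import os
# req = build_chat_request("YOUR_ENDPOINT_ID", "What is the capital of France?", os.environ["API_KEY"])
# print(send_chat_request(req))
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without touching the network, which is useful while the endpoint is still initializing.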