Introduction
This guide walks you through deploying Friendli Container as an Amazon EKS add-on to enable real-time inference on your Kubernetes cluster. By running Friendli Container in your EKS environment, you benefit from the Friendli Engine's speed and resource efficiency. We'll cover how to configure GPU nodes, install the add-on, and create inference deployments using Kubernetes manifests. This tutorial is easier to follow with the eksctl and AWS CLI tools; please visit the eksctl documentation and the AWS CLI homepage for installation guides.
General Workflow
- Add GPU Node Group: Create a GPU-enabled node group in your EKS cluster with instances like g6.xlarge or g5.2xlarge.
- Configure Friendli Container EKS add-on: Subscribe to the Friendli Container add-on from the AWS Marketplace and configure IRSA for license validation.
- Create Friendli Deployment: Deploy your model using FriendliDeployment custom resource.
- Run Inference: Send inference requests to your deployed model.
Prerequisites
- AWS account with permissions for EKS, IAM, EC2 operations
- eksctl and AWS CLI tools installed and configured
- kubectl configured to access your EKS cluster
- (Optional) Hugging Face token if deploying gated/private models. Hugging Face token docs
1. Add GPU Node Group to your EKS Cluster
You need an active Amazon EKS cluster. To create one, consult the Amazon EKS documentation on creating an EKS cluster. The Friendli Container EKS add-on requires Kubernetes version 1.29 or later. The following NVIDIA devices and EC2 instance types are supported:
| Supported NVIDIA Device | AWS EC2 Instance Type |
|---|---|
| B200 | P6 instances |
| H200 | P5 instances |
| H100 | P5 instances |
| A100 | P4 instances |
| L40S | G6e instances |
| A10G | G5 instances |
| L4 | G6 instances |
Make sure the following EKS add-ons are installed on your cluster:
- Amazon VPC CNI
- CoreDNS
- kube-proxy
- Amazon EKS Pod Identity Agent
- Open Amazon EKS console and choose the cluster that you want to create a node group in.
- Select the “Compute” tab and click “Add node group”.
- Configure the new node group by entering the name, Node IAM role, and other information. You can click “Create recommended role” to create an IAM role. Click “Next”.
- On the next page, select “Amazon Linux 2023 (x86_64) Nvidia” for AMI type.
- Select the appropriate instance type for the GPU device of your choice.
- Suggested instance type for this tutorial is g6.2xlarge.
- Configure the disk size. It should be large enough to download the model you want to deploy.
- Suggested disk size for this tutorial is 100 GB.
- Configure the desired node group size.
- Go through the rest of the steps, review the changes and click “Create”.
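Alternatively, the GPU node group can be created from the CLI with eksctl. The node group name below is illustrative; replace `<CLUSTER>` and `<REGION>` with your cluster name and region, and adjust the instance type, node count, and volume size to your setup:

```shell
eksctl create nodegroup \
  --cluster <CLUSTER> \
  --region <REGION> \
  --name friendli-gpu-nodes \
  --node-type g6.2xlarge \
  --nodes 1 \
  --node-volume-size 100
```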
2. Configure Friendli Container EKS add-on
- Open Amazon EKS console and choose the cluster that you want to configure.
- Select the “Add-ons” tab and click “Get more add-ons”.
- Scroll down and under the section “AWS Marketplace add-ons”, search and check “Friendli Container”, and click “Next”.
- Click “Next”, review your settings, and click “Create”.
- For pricing details, see Friendli Container on AWS Marketplace.
- For trials, custom offers, and other inquiries, please contact Friendli.
Replace <REGION> with the AWS region where you created the cluster and <CLUSTER> with the EKS cluster name.
This allows the service account named default in the default namespace to exercise the AWSMarketplaceMeteringFullAccess policy on your behalf. See the AWS documentation on IAM roles for service accounts (IRSA) to learn more.
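As a sketch, the IRSA setup described above can be done with eksctl; the service account name and namespace follow the defaults mentioned in this guide, and `<CLUSTER>`/`<REGION>` are placeholders for your cluster name and region:

```shell
# Associate an OIDC provider with the cluster (required for IRSA;
# safe to run if one already exists)
eksctl utils associate-iam-oidc-provider \
  --cluster <CLUSTER> --region <REGION> --approve

# Annotate the "default" service account in the "default" namespace with
# a role carrying the AWSMarketplaceMeteringFullAccess policy
eksctl create iamserviceaccount \
  --cluster <CLUSTER> --region <REGION> \
  --namespace default --name default \
  --attach-policy-arn arn:aws:iam::aws:policy/AWSMarketplaceMeteringFullAccess \
  --override-existing-serviceaccounts --approve
```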
3. Create Friendli Deployment
You need to be able to use the “kubectl” CLI tool to access your EKS cluster. Consult this guide from AWS for more details.
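If kubectl is not yet configured for your cluster, the AWS CLI can generate the kubeconfig entry for you (replace the placeholders with your region and cluster name):

```shell
aws eks update-kubeconfig --region <REGION> --name <CLUSTER>
```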
To deploy a private or gated model from the Hugging Face model hub, you need to create a Hugging Face access token with “read” permission. Then create a Kubernetes secret:
kubectl create secret generic hf-secret --from-literal=token=YOUR_TOKEN_HERE
- The “token:” section under spec.model.huggingFace refers to the Kubernetes secret you created for storing the Hugging Face access token. If accessing your model does not require an access token, you can omit the “token:” section entirely.
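For reference, a FriendliDeployment manifest saved as friendlideployment.yaml might look like the following. This is an illustrative sketch only: the apiVersion, model repository placeholder, and exact field layout are assumptions based on this guide; consult the add-on documentation for the authoritative schema.

```yaml
# Illustrative sketch: apiVersion and field names are assumptions,
# not the authoritative CRD schema.
apiVersion: friendli.ai/v1
kind: FriendliDeployment
metadata:
  name: my-friendli-deployment
spec:
  model:
    huggingFace:
      repo: <MODEL_REPO>      # Hugging Face model ID
      token:                  # omit if the model is public
        secretName: hf-secret
        secretKey: token
  numGPUs: 1                  # omit on GPU Operator-enabled clusters
  nodeSelector:
    eks.amazonaws.com/nodegroup: <NODE GROUP NAME>
  resources:
    requests:
      cpu: "4"
      memory: 16Gi
    limits:
      cpu: "8"
      memory: 32Gi
```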
- In the example above, the node selector is eks.amazonaws.com/nodegroup: <NODE GROUP NAME>. Replace the node selector value to match the name of your node group.
- CPU and memory resource requirements are tuned for the g6.2xlarge instance; you may need to edit those values if you used a different instance type.
If your cluster has the NVIDIA GPU Operator installed, you need to put the “nvidia.com/gpu” resource in the “requests:” and “limits:” sections, as GPU nodes will advertise the “nvidia.com/gpu” resource alongside ordinary resources like “cpu” and “memory”. In that case, you can omit “numGPUs” from your FriendliDeployment. Below is the equivalent of the example above for a GPU Operator-enabled cluster.
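As a sketch of the difference (field names are assumptions, as above), the resources section would request the GPU explicitly and “numGPUs” would be dropped:

```yaml
# Sketch for a GPU Operator-enabled cluster: "numGPUs" is omitted and
# the GPU is requested via the "nvidia.com/gpu" resource instead.
  resources:
    requests:
      cpu: "4"
      memory: 16Gi
      nvidia.com/gpu: "1"
    limits:
      cpu: "8"
      memory: 32Gi
      nvidia.com/gpu: "1"
```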
kubectl apply -f friendlideployment.yaml
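Once the deployment is ready, you can send it an inference request. The Friendli Engine serves an OpenAI-compatible HTTP API; the service name and port below are assumptions, so check the Services created in your cluster (kubectl get svc) for the actual values:

```shell
# Forward a local port to the deployment's service
# (service name and port are assumptions for this sketch)
kubectl port-forward svc/<FRIENDLI_SERVICE> 8000:8000 &

# Send a chat completion request to the OpenAI-compatible endpoint
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```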