# Friendli Docs

> Let your team focus on building great AI products. FriendliAI will make sure your AI runs fast, affordably, and reliably at scale. Leave the hassle of AI inference to FriendliAI.

## Docs

- [Changelog](https://friendli.ai/docs/changelog.md): Track the latest FriendliAI product updates, new model releases, pricing changes, deprecations, and feature announcements across all endpoints.
- [CUDA Compatibility](https://friendli.ai/docs/guides/container/cuda-compatibility.md): Friendli Engine runs on CUDA-enabled NVIDIA GPUs, so it requires a specific CUDA version and a GPU with a compatible CUDA compute capability.
- [Deploy Friendli Container as Amazon EKS Add-on](https://friendli.ai/docs/guides/container/eks-quickstart.md): Deploy Friendli Container on Amazon EKS using the official AWS EKS Add-On. Set up GPU nodes, install the add-on, and run model inference.
- [Inference with gRPC](https://friendli.ai/docs/guides/container/inference-with-grpc.md): Run a gRPC inference server with Friendli Container and send requests using the Friendli Python SDK. Includes setup, configuration, and code examples.
- [Introducing Friendli Container](https://friendli.ai/docs/guides/container/introduction.md): Deploy generative AI models on your own infrastructure with Friendli Container. Full control over GPU resources, networking, and scaling.
- [Observability for Friendli Container](https://friendli.ai/docs/guides/container/monitoring.md): Observability is an integral part of DevOps. To support it, Friendli Container exports internal metrics in the Prometheus text format.
- [Optimizing Inference with Policy Search](https://friendli.ai/docs/guides/container/optimizing-inference-with-policy-search.md): Boost inference throughput by up to 2x for MoE and quantized models by running execution policy search in Friendli Container for production.
- [Quantization](https://friendli.ai/docs/guides/container/quantization.md): Learn how to serve pre-quantized models or perform online quantization with Friendli Container to reduce memory and speed up inference.
- [QuickStart: Friendli Container Trial](https://friendli.ai/docs/guides/container/quickstart.md): Get started with Friendli Container trial. Access the registry, configure your secret, launch the container, and monitor with Grafana.
- [Running Friendli Container](https://friendli.ai/docs/guides/container/running-friendli-container.md): Step-by-step guide to running Friendli Container on your own machine. Configure environment variables, select GPU resources, and deploy models.
- [Running Friendli Container on SageMaker](https://friendli.ai/docs/guides/container/sagemaker-integration.md): Create a real-time inference endpoint in Amazon SageMaker with Friendli Container. Leverage Friendli Engine for faster, cost-efficient serving.
- [Serving MoE Models](https://friendli.ai/docs/guides/container/serving-moe-models.md): Serve Mixture of Experts (MoE) models like Mixtral 8x7B with Friendli Container. Covers policy search setup and multi-GPU Docker configuration.
- [Serving Multi-LoRA Models](https://friendli.ai/docs/guides/container/serving-multi-lora-models.md): Serve multiple LoRA-adapted LLMs simultaneously with Friendli Container without additional GPU resources. No retraining needed for task-specific models.
- [Data Privacy & Security](https://friendli.ai/docs/guides/data-handling.md): Learn how FriendliAI handles your data. Inference requests and responses are never used for training or shared with third parties.
- [Autoscaling](https://friendli.ai/docs/guides/dedicated-endpoints/autoscaling.md): Configure autoscaling for Friendli Dedicated Endpoints to automatically adjust GPU replicas based on traffic and latency thresholds.
- [Dataset Specifications and Upload Guide](https://friendli.ai/docs/guides/dedicated-endpoints/dataset.md): Upload and manage datasets for Friendli Dedicated Endpoints. Covers supported formats, size limits, splits, versioning, and the upload process.
- [Deploy with Hugging Face Models](https://friendli.ai/docs/guides/dedicated-endpoints/deploy-with-huggingface.md): Deploy Hugging Face models on Friendli Dedicated Endpoints. Step-by-step tutorial covering model selection, endpoint creation, and first inference call.
- [Deploy with W&B Models](https://friendli.ai/docs/guides/dedicated-endpoints/deploy-with-wandb.md): Deploy models from Weights & Biases artifacts on Friendli Dedicated Endpoints. Covers linking W&B projects, selecting artifacts, and running inference.
- [Endpoints](https://friendli.ai/docs/guides/dedicated-endpoints/endpoints.md): Manage Friendli Dedicated Endpoints deployments. Learn about endpoint lifecycle, GPU allocation, status monitoring, and configuration options.
- [Dedicated Endpoints FAQ and Troubleshooting](https://friendli.ai/docs/guides/dedicated-endpoints/faq.md): Answers to common questions about Friendli Dedicated Endpoints, including model compatibility, GPU requirements, billing, and troubleshooting tips.
- [Introducing Friendli Dedicated Endpoints](https://friendli.ai/docs/guides/dedicated-endpoints/introduction.md): Run custom or open-source generative AI models on dedicated GPU hardware with Friendli Dedicated Endpoints. No shared resources or infra management.
- [Serving LoRA Models](https://friendli.ai/docs/guides/dedicated-endpoints/lora-models.md): Learn how to deploy LoRA models from Hugging Face Hub to Friendli Dedicated Endpoints for efficient inference, including a quick guide for FLUX LoRA models.
- [Models](https://friendli.ai/docs/guides/dedicated-endpoints/models.md): Manage models for Friendli Dedicated Endpoints. Upload directly, or load from Hugging Face and Weights & Biases artifact repositories.
- [Multi-LoRA Serving](https://friendli.ai/docs/guides/dedicated-endpoints/multi-lora-serving.md): Enable Multi-LoRA serving on Friendli Dedicated Endpoints to run multiple LoRA adapters on a single base model without extra GPU resources.
- [Online Quantization](https://friendli.ai/docs/guides/dedicated-endpoints/online-quantization.md): Automatically quantize models to 4-bit or 8-bit precision at deploy time on Friendli Dedicated Endpoints. No pre-quantized checkpoint needed.
- [Pricing & Billing](https://friendli.ai/docs/guides/dedicated-endpoints/pricing.md): View Friendli Dedicated Endpoints pricing by GPU type. Covers supported instance types, per-second billing, and how autoscaling affects costs.
- [QuickStart: Friendli Dedicated Endpoints](https://friendli.ai/docs/guides/dedicated-endpoints/quickstart.md): Get started with Friendli Dedicated Endpoints. Create a project, pick a model, deploy an endpoint, and generate your first inference response.
- [Speculative Decoding](https://friendli.ai/docs/guides/dedicated-endpoints/speculative-decoding.md): Speed up LLM inference on Friendli Dedicated Endpoints with speculative decoding using proprietary draft models and N-gram token prediction.
- [Versioning](https://friendli.ai/docs/guides/dedicated-endpoints/versioning.md): Use endpoint versioning on Friendli Dedicated Endpoints to track deployment history, roll back to previous configurations, and update without downtime.
- [Integrations](https://friendli.ai/docs/guides/model-apis/integrations.md): Integrate Friendli Model APIs with LangChain, LiteLLM, LlamaIndex, and MongoDB for RAG, tool calling agents, and load balancing.
- [Introducing Friendli Model APIs](https://friendli.ai/docs/guides/model-apis/introduction.md): Get started with Friendli Model APIs to access popular AI models via API. No infrastructure setup or GPU management required.
- [Pricing & Billing](https://friendli.ai/docs/guides/model-apis/pricing.md): View Friendli Model APIs pricing per model. Compare token-based, time-based, and audio-based rates across usage tiers and free models.
- [QuickStart: Friendli Model APIs](https://friendli.ai/docs/guides/model-apis/quickstart.md): Get started with Friendli Model APIs in minutes. Explore popular AI models, experiment in a chat-style playground, and make your first API call with no setup required.
- [Tool Assisted API](https://friendli.ai/docs/guides/model-apis/tool-assisted-api.md): Use the Friendli Tool Assisted API to extend chat completions with built-in tools. Models can call web search and other tools automatically.
- [Multi‑modality](https://friendli.ai/docs/guides/multi-modality.md): Process text, images, audio, and video with Friendli multimodal APIs. Includes vision, transcription, and image generation endpoint guides.
- [OpenAI Compatibility](https://friendli.ai/docs/guides/openai-compatibility.md): Use the official OpenAI Python and Node.js SDKs with Friendli endpoints. Migrate existing OpenAI applications by changing the base URL and API key.
- [Friendli Documentation](https://friendli.ai/docs/guides/overview.md): FriendliAI documentation hub. Explore Model APIs, Dedicated Endpoints, and Container products with quickstart guides and API references.
- [Reasoning](https://friendli.ai/docs/guides/reasoning.md): Enable model-agnostic reasoning on Friendli endpoints. Extract chain-of-thought traces from any supported model without writing custom parsers.
- [Structured Outputs](https://friendli.ai/docs/guides/structured-outputs.md): Generate JSON outputs conforming to a schema using Friendli Structured Outputs. Works on all chat-capable models with response_format support.
- [Account Suspension](https://friendli.ai/docs/guides/suite/account-suspension.md): Find out why your Friendli Suite account was suspended, what access is restricted, and the steps to resolve billing or policy-related suspensions.
- [Billing & Payments](https://friendli.ai/docs/guides/suite/billing-payments.md): Understand Friendli Suite billing cycles, manage payment methods, view invoices, and learn how credits are applied before pay-as-you-go charges.
- [Credits](https://friendli.ai/docs/guides/suite/credits.md): Learn about Friendli Suite credit types, including promotional and purchased credits, their consumption order, expiration rules, and how to redeem promo codes.
- [Enterprise plan](https://friendli.ai/docs/guides/suite/enterprise-plan.md): FriendliAI's Enterprise plan offers reserved GPUs, higher API rate limits, private deployments, and enterprise-grade security and support.
- [How to Redeem a Promo Code](https://friendli.ai/docs/guides/suite/how-to-redeem-promo-code.md): Redeem a FriendliAI promotional code to add credits to your Friendli Suite account. Follow the step-by-step instructions on the credits page.
- [Personal API Keys](https://friendli.ai/docs/guides/suite/personal-api-keys.md): Create and manage Personal API Keys in Friendli Suite for API authentication. Covers key generation steps and usage in API requests.
- [Supported Models](https://friendli.ai/docs/guides/supported-models.md): Browse the full list of AI models available on FriendliAI, including open-source LLMs, vision models, and audio models across all endpoints.
- [Tool Calling](https://friendli.ai/docs/guides/tool-calling.md): Use OpenAI-compatible tool calling on Friendli endpoints. Broad model support, strict schema enforcement, and parallel tool call examples.
- [Build an agent with Gradio](https://friendli.ai/docs/guides/tutorials/build-an-agent-with-gradio.md): Build and deploy an AI agent with Friendli Model APIs and Gradio in under 50 lines of Python. Includes tool calling and chat UI setup.
- [Build an agent with LangChain](https://friendli.ai/docs/guides/tutorials/build-an-agent-with-langchain.md): Create an AI agent using LangChain and Friendli Model APIs with tool calling. Step-by-step tutorial with code examples in Python.
- [Chat docs with LangChain](https://friendli.ai/docs/guides/tutorials/chat-docs-with-langchain.md): Build a document chatbot with LangChain and Friendli using retrieval-augmented generation. Full RAG pipeline tutorial with embeddings and vector search.
- [Chat docs with MongoDB](https://friendli.ai/docs/guides/tutorials/chat-docs-with-mongodb.md): Build a RAG chatbot with Friendli, MongoDB Atlas, and LangChain. Store document embeddings in a vector database and generate context-aware answers.
- [Getting Started with EXAONE 4.0](https://friendli.ai/docs/guides/tutorials/getting-started-with-exaone-4.0.md): Deploy and run LG AI Research's EXAONE 4.0 models on Friendli Dedicated Endpoints. Covers authentication, inference, reasoning, and optimization.
- [Getting Started with Nemotron 3](https://friendli.ai/docs/guides/tutorials/getting-started-with-nemotron-3.md): Deploy and run NVIDIA Nemotron 3 on Friendli Dedicated Endpoints. Includes setup, API usage, chat completions, and performance benchmarks.
- [Go Playground with Next.js](https://friendli.ai/docs/guides/tutorials/go-playground-with-nextjs.md): Build a Go game AI playground using Next.js and the Vercel AI SDK with Friendli Model APIs. Full-stack tutorial with streaming responses.
- [RAG app with LlamaIndex](https://friendli.ai/docs/guides/tutorials/rag-app-with-llamaindex.md): Build a retrieval-augmented generation app with LlamaIndex and Friendli Engine. Index documents, query with semantic search, and generate answers.
- [Tool calling with Model APIs](https://friendli.ai/docs/guides/tutorials/tool-calling-with-model-apis.md): Implement tool calling with Friendli Model APIs. Tutorial covers defining tools, handling function calls, and building multi-turn agent loops.
- [Deploy from W&B Registry with Webhook](https://friendli.ai/docs/guides/tutorials/wandb-registry-with-dedicated-endpoints.md): Hands-on tutorial for launching and deploying LLMs using Friendli Dedicated Endpoints with Weights & Biases artifacts through webhook automation.
- [Use Claude Code with FriendliAI](https://friendli.ai/docs/integrate/agents/claude-code.md): Configure Claude Code to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use Cline with FriendliAI](https://friendli.ai/docs/integrate/agents/cline.md): Configure Cline to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use Cursor with FriendliAI](https://friendli.ai/docs/integrate/agents/cursor.md): Configure Cursor to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use Hermes Agent with FriendliAI](https://friendli.ai/docs/integrate/agents/hermes-agent.md): Configure Hermes Agent to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use Kilo Code with FriendliAI](https://friendli.ai/docs/integrate/agents/kilo-code.md): Configure Kilo Code to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use OpenClaw with FriendliAI](https://friendli.ai/docs/integrate/agents/openclaw.md): Configure OpenClaw to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use OpenCode with FriendliAI](https://friendli.ai/docs/integrate/agents/opencode.md): Configure OpenCode to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use Your Agent with FriendliAI](https://friendli.ai/docs/integrate/agents/overview.md): Set up your favorite coding agent with FriendliAI to run fast, cost-efficient, and reliable open-source models.
- [Build Fast with FriendliAI](https://friendli.ai/docs/integrate/overview.md): Connect your favorite coding agents and SDKs to FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [LangChain Node.js SDK](https://friendli.ai/docs/integrate/sdks/langchain/nodejs.md): Integrate FriendliAI with the LangChain Node.js SDK and use tool calling in your applications.
- [LangChain Python SDK](https://friendli.ai/docs/integrate/sdks/langchain/python.md): Integrate FriendliAI with LangChain Python SDK. Use ChatOpenAI with tool calling and connect to Model APIs or Dedicated Endpoints.
- [Linkup](https://friendli.ai/docs/integrate/sdks/linkup.md): Find and access high-quality web content with the Linkup API, and use the results with Friendli Model APIs.
- [LiteLLM](https://friendli.ai/docs/integrate/sdks/litellm.md): Use LiteLLM with FriendliAI to call Model APIs, Dedicated Endpoints, and fine-tuned endpoints. Includes setup, model selection, and streaming examples.
- [LlamaIndex](https://friendli.ai/docs/integrate/sdks/llamaindex.md): Integrate FriendliAI with LlamaIndex. Use the Friendli LLM class for chat completions and text completions with sync, async, and streaming support.
- [OpenAI Node.js SDK](https://friendli.ai/docs/integrate/sdks/openai/nodejs.md): Use the OpenAI Node.js SDK with FriendliAI endpoints. Migrate existing Node.js apps by changing the base URL. Covers chat, streaming, and tool calls.
- [OpenAI Python SDK](https://friendli.ai/docs/integrate/sdks/openai/python.md): Use the OpenAI Python SDK with FriendliAI endpoints. Migrate existing Python apps by changing the base URL. Covers chat, streaming, and tool calling.
- [Integrate Your SDK with FriendliAI](https://friendli.ai/docs/integrate/sdks/overview.md): Browse FriendliAI SDK integrations for OpenAI, LangChain, LlamaIndex, LiteLLM, Vercel AI SDK, and Weaviate. Pick the SDK that fits your stack.
- [Vercel AI SDK](https://friendli.ai/docs/integrate/sdks/vercel-ai.md): Use the Vercel AI SDK with FriendliAI for streaming chat UIs in Next.js and React. Connect to Model APIs or Dedicated Endpoints with minimal setup.
- [FriendliAI + Weaviate (Node.js)](https://friendli.ai/docs/integrate/sdks/weaviate/nodejs.md): Build RAG apps with FriendliAI and Weaviate in Node.js. Combine vector search with Friendli Engine inference to reduce hallucinations in responses.
- [FriendliAI + Weaviate (Python)](https://friendli.ai/docs/integrate/sdks/weaviate/python.md): Build RAG apps with FriendliAI and Weaviate in Python. Combine vector search with Friendli Engine inference for context-aware, grounded responses.
- [Container audio transcriptions](https://friendli.ai/docs/openapi/container/audio-transcriptions.md): Transcribe audio files to text using Friendli Container. Run speech-to-text models locally on your own GPU hardware with full data privacy.
- [Container chat completions](https://friendli.ai/docs/openapi/container/chat-completions.md): Send a conversation to Friendli Container and receive a chat completion response. Supports streaming, tool calls, and custom model parameters.
- [Container chat completions chunk object](https://friendli.ai/docs/openapi/container/chat-completions-chunk-object.md): Schema reference for the streamed chat completions chunk object returned by Friendli Container when streaming is enabled on the local endpoint.
- [Container completions](https://friendli.ai/docs/openapi/container/completions.md): Generate text completions from a prompt using Friendli Container. Run on your own hardware with full control over streaming and generation settings.
- [Container completions chunk object](https://friendli.ai/docs/openapi/container/completions-chunk-object.md): Schema reference for the streamed completions chunk object returned by Friendli Container when using the completions API with streaming enabled.
- [Container detokenization](https://friendli.ai/docs/openapi/container/detokenization.md): Convert token IDs back to text using Friendli Container. Decode tokenized model output into readable strings on your own infrastructure.
- [Container image edits](https://friendli.ai/docs/openapi/container/image-edits.md): Edit images with text prompts using Friendli Container. Upload an image and describe desired modifications to the self-hosted model on your GPUs.
- [Container image generations](https://friendli.ai/docs/openapi/container/image-generations.md): Generate images from text descriptions using Friendli Container. Run image generation models locally with configurable size and output parameters.
- [Container messages](https://friendli.ai/docs/openapi/container/messages.md): Use the Anthropic Messages-style API with Friendli Container. Send structured message payloads to your self-hosted model and receive responses.
- [Container messages chunk object](https://friendli.ai/docs/openapi/container/messages-chunk-object.md): Schema reference for the streamed messages chunk object returned by Friendli Container when using the Messages API with streaming enabled.
- [Container overview](https://friendli.ai/docs/openapi/container/overview.md): API reference for Friendli Container. Browse self-hosted inference endpoints for chat, completions, tokenization, images, and audio transcription.
- [Container text classification](https://friendli.ai/docs/openapi/container/text-classification.md): Classify text into categories using Friendli Container. Run text classification models on your own infrastructure with full data control and privacy.
- [Container tokenization](https://friendli.ai/docs/openapi/container/tokenization.md): Tokenize text into token IDs using Friendli Container. Run tokenization locally on your own infrastructure for pre-processing and token counting.
- [Add samples](https://friendli.ai/docs/openapi/dataset/add-samples.md): Add new data samples to an existing Friendli dataset split via the API. Append training or evaluation data entries to your dataset programmatically.
- [Create a new dataset](https://friendli.ai/docs/openapi/dataset/create-a-new-dataset.md): Create a new dataset in Friendli Suite via the API. Initialize a dataset resource to organize training and evaluation data with splits and versions.
- [Create a new split](https://friendli.ai/docs/openapi/dataset/create-a-split.md): Create a new split within a Friendli dataset via the API. Use splits to organize data into training, validation, and test partitions.
- [Create a new version](https://friendli.ai/docs/openapi/dataset/create-a-version.md): Create a new version of a Friendli dataset via the API. Snapshot your current dataset state to track changes and enable reproducible training runs.
- [Delete a version](https://friendli.ai/docs/openapi/dataset/delete-a-version.md): Delete a specific version of a Friendli dataset by ID via the API. Permanently removes the version snapshot and its associated metadata.
- [Delete a dataset](https://friendli.ai/docs/openapi/dataset/delete-dataset.md): Permanently delete a Friendli dataset by ID via the API. Removes the dataset along with all its splits, versions, and sample data.
- [Delete samples](https://friendli.ai/docs/openapi/dataset/delete-samples.md): Delete specific data samples from a Friendli dataset split via the API. Remove individual entries by sample ID from your training or evaluation data.
- [Delete a split](https://friendli.ai/docs/openapi/dataset/delete-split.md): Delete a split from a Friendli dataset by ID via the API. Permanently removes the split and all sample data it contains from the dataset.
- [Get dataset info](https://friendli.ai/docs/openapi/dataset/get-dataset-info.md): Retrieve metadata for a Friendli dataset by ID via the API. Returns the dataset name, modality, and creation and update timestamps.
- [Get split info](https://friendli.ai/docs/openapi/dataset/get-split-info.md): Retrieve metadata for a specific split within a Friendli dataset via the API. Returns the split name, sample count, and creation timestamp.
- [Get version info](https://friendli.ai/docs/openapi/dataset/get-version-info.md): Retrieve metadata for a specific version of a Friendli dataset via the API. Returns the version tag, creation date, and snapshot details.
- [List datasets](https://friendli.ai/docs/openapi/dataset/list-datasets.md): List all datasets in your Friendli Suite project via the API. Returns dataset IDs, names, split counts, and creation timestamps for each entry.
- [List samples](https://friendli.ai/docs/openapi/dataset/list-samples.md): List data samples in a Friendli dataset split via the API. Paginate through training or evaluation entries with sample IDs and content previews.
- [List splits](https://friendli.ai/docs/openapi/dataset/list-splits.md): List all splits within a Friendli dataset via the API. Returns split IDs, names, and sample counts for training, validation, and test partitions.
- [List versions](https://friendli.ai/docs/openapi/dataset/list-versions.md): List all versions of a Friendli dataset via the API. Returns version IDs, tags, and creation timestamps to track your dataset's revision history.
- [Dataset overview](https://friendli.ai/docs/openapi/dataset/overview.md): API reference for the Friendli Dataset API. Create, manage, and version datasets with splits and samples for model training and evaluation.
- [Update samples](https://friendli.ai/docs/openapi/dataset/update-samples.md): Update existing data samples in a Friendli dataset split via the API. Modify sample content by ID without deleting and re-adding entries.
- [Dedicated create endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/create.md): Create a Friendli Dedicated Endpoint deployment for a Hugging Face model via the API. Specify GPU type, replica count, and model configuration.
- [Dedicated delete endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/delete.md): Permanently delete a Friendli Dedicated Endpoint deployment by ID. This stops the endpoint and releases all associated GPU resources immediately.
- [Dedicated get endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/get-spec.md): Retrieve the full specification of a Friendli Dedicated Endpoint by ID, including model config, GPU type, replica count, and deployment settings.
- [Dedicated get endpoint status](https://friendli.ai/docs/openapi/dedicated/endpoint/get-status.md): Check the current status of a Friendli Dedicated Endpoint by ID. Returns the lifecycle state such as running, sleeping, initializing, or terminated.
- [Dedicated get endpoint version](https://friendli.ai/docs/openapi/dedicated/endpoint/get-version.md): Retrieve the version history of a Friendli Dedicated Endpoint by ID. View past configurations and rollback points for deployment tracking.
- [Dedicated list endpoints](https://friendli.ai/docs/openapi/dedicated/endpoint/list.md): List all Friendli Dedicated Endpoint deployments in your project. Returns endpoint IDs, statuses, model names, and GPU configurations.
- [Dedicated restart endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/restart.md): Restart a failed or terminated Friendli Dedicated Endpoint by ID. The endpoint re-initializes with the same model and GPU configuration.
- [Dedicated sleep endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/sleep.md): Put a Friendli Dedicated Endpoint into sleep mode by ID. The endpoint stops serving but retains its configuration for quick wake-up later.
- [Dedicated terminate endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/terminate.md): Terminate a running Friendli Dedicated Endpoint by ID. Stops all inference and releases GPU resources while preserving the endpoint configuration.
- [Dedicated update endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/update.md): Update a Friendli Dedicated Endpoint with a new model, GPU type, or replica count. Changes are applied as a new version in the deployment history.
- [Dedicated wake endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/wake.md): Wake up a sleeping Friendli Dedicated Endpoint by ID. The endpoint resumes serving with its previous model and GPU configuration intact.
- [Dedicated create endpoint from W&B artifact](https://friendli.ai/docs/openapi/dedicated/endpoint/wandb-artifact-create.md): Create a Friendli Dedicated Endpoint from a Weights & Biases artifact via the API. Deploy W&B-managed models directly to dedicated GPU hardware.
- [Dedicated audio transcriptions](https://friendli.ai/docs/openapi/dedicated/inference/audio-transcriptions.md): Transcribe audio files to text using your Friendli Dedicated Endpoint. Upload an audio file and receive a text transcription from the deployed model.
- [Dedicated chat completions](https://friendli.ai/docs/openapi/dedicated/inference/chat-completions.md): Send a conversation to your Friendli Dedicated Endpoint and receive a chat completion response. Supports streaming, tool calls, and JSON mode.
- [Dedicated chat completions chunk object](https://friendli.ai/docs/openapi/dedicated/inference/chat-completions-chunk-object.md): Schema reference for the streamed chat completions chunk object returned by Friendli Dedicated Endpoints when streaming is enabled.
- [Dedicated chat render](https://friendli.ai/docs/openapi/dedicated/inference/chat-render.md): Preview the final prompt text that your Friendli Dedicated Endpoint will send to the model. Useful for debugging chat templates and token counts.
- [Dedicated completions](https://friendli.ai/docs/openapi/dedicated/inference/completions.md): Generate text completions from a prompt using your Friendli Dedicated Endpoint. Supports streaming, token limits, temperature, and stop sequences.
- [Dedicated completions chunk object](https://friendli.ai/docs/openapi/dedicated/inference/completions-chunk-object.md): Schema reference for the streamed completions chunk object returned by Friendli Dedicated Endpoints when using the completions streaming API.
- [Dedicated detokenization](https://friendli.ai/docs/openapi/dedicated/inference/detokenization.md): Convert token IDs back to text using your Friendli Dedicated Endpoint. Decode tokenized output into a human-readable string for analysis.
- [Dedicated embeddings](https://friendli.ai/docs/openapi/dedicated/inference/embeddings.md): Generate text embedding vectors using your Friendli Dedicated Endpoint. Convert text into dense vector representations for search and similarity.
- [Dedicated image edits](https://friendli.ai/docs/openapi/dedicated/inference/image-edits.md): Edit images with text prompts using your Friendli Dedicated Endpoint. Upload an image and describe the desired modifications for the model to apply.
- [Dedicated image generations](https://friendli.ai/docs/openapi/dedicated/inference/image-generations.md): Generate images from text descriptions using your Friendli Dedicated Endpoint. Supports configurable image size, count, and generation parameters.
- [Dedicated messages](https://friendli.ai/docs/openapi/dedicated/inference/messages.md): Use the Anthropic Messages-style API on your Friendli Dedicated Endpoint. Send structured message payloads and receive assistant responses.
- [Dedicated messages chunk object](https://friendli.ai/docs/openapi/dedicated/inference/messages-chunk-object.md): Schema reference for the streamed messages chunk object returned by Friendli Dedicated Endpoints when using the Messages API with streaming.
- [Dedicated text classification](https://friendli.ai/docs/openapi/dedicated/inference/text-classification.md): Classify text into categories using your Friendli Dedicated Endpoint. Send text input and receive predicted labels with per-class probabilities.
- [Dedicated tokenization](https://friendli.ai/docs/openapi/dedicated/inference/tokenization.md): Tokenize text into token IDs using your Friendli Dedicated Endpoint. Useful for counting tokens and validating input length before inference.
- [Dedicated overview](https://friendli.ai/docs/openapi/dedicated/overview.md): API reference for Friendli Dedicated Endpoints. Browse inference, endpoint management, chat completions, embeddings, and image generation endpoints.
- [Complete file upload](https://friendli.ai/docs/openapi/file/complete-file-upload.md): Finalize a multipart file upload to Friendli Suite via the API. Call this endpoint after all file parts have been uploaded to mark the file as ready.
- [Get file download URL](https://friendli.ai/docs/openapi/file/get-file-download-url.md): Get a pre-signed download URL for a file stored in Friendli Suite via the API. Use the returned URL to download the file with standard HTTP clients.
- [Get file info](https://friendli.ai/docs/openapi/file/get-file-info.md): Retrieve metadata for a file stored in Friendli Suite by file ID via the API. Returns filename, size, upload status, and creation timestamp.
- [Initiate file upload](https://friendli.ai/docs/openapi/file/init-file-upload.md): Start a new file upload to Friendli Suite via the API. Returns a file ID and pre-signed upload URL to begin transferring your file data.
- [File overview](https://friendli.ai/docs/openapi/file/overview.md): API reference for the Friendli File API. Upload, download, and manage files used for model training and dataset preparation on Friendli Suite.
- [API Reference](https://friendli.ai/docs/openapi/introduction.md): Complete API reference for Friendli Suite. Explore endpoints for Model APIs, Dedicated Endpoints, Container, File, and Dataset APIs with HTTP examples.
- [Model APIs audio transcriptions](https://friendli.ai/docs/openapi/model-apis/audio-transcriptions.md): Transcribe audio files to text using Friendli Model APIs. Supports multiple audio formats with streaming and non-streaming responses.
- [Model APIs audio transcriptions chunk object](https://friendli.ai/docs/openapi/model-apis/audio-transcriptions-chunk-object.md): Schema reference for the streamed audio transcription chunk object returned by Friendli Model APIs during real-time transcription.
- [Model APIs chat completions](https://friendli.ai/docs/openapi/model-apis/chat-completions.md): Send a conversation to Friendli Model APIs and receive a chat completion response. Supports streaming, tool calls, and JSON mode.
- [Model APIs chat completions chunk object](https://friendli.ai/docs/openapi/model-apis/chat-completions-chunk-object.md): Schema reference for the streamed chat completions chunk object returned by Friendli Model APIs when streaming is enabled.
- [Model APIs chat render](https://friendli.ai/docs/openapi/model-apis/chat-render.md): Preview the final prompt text that Friendli Model APIs will send to the model. Useful for debugging chat templates and token usage.
- [Model APIs completions](https://friendli.ai/docs/openapi/model-apis/completions.md): Generate text completions from a prompt using Friendli Model APIs. Supports streaming, token limits, temperature, and stop sequences.
- [Model APIs completions chunk object](https://friendli.ai/docs/openapi/model-apis/completions-chunk-object.md): Schema reference for the streamed completions chunk object returned by Friendli Model APIs when using the completions streaming API.
- [Model APIs detokenization](https://friendli.ai/docs/openapi/model-apis/detokenization.md): Convert token IDs back to text using Friendli Model APIs. Decode tokenized output into a human-readable string for post-processing.
- [Model APIs messages](https://friendli.ai/docs/openapi/model-apis/messages.md): Use the Anthropic Messages-style API on Friendli Model APIs. Send structured message payloads and receive assistant responses.
- [Model APIs messages chunk object](https://friendli.ai/docs/openapi/model-apis/messages-chunk-object.md): Schema reference for the streamed messages chunk object returned by Friendli Model APIs when using the Messages API with streaming.
- [Model APIs overview](https://friendli.ai/docs/openapi/model-apis/overview.md): API reference for Friendli Model APIs. Browse chat completions, completions, tokenization, messages, and audio transcription endpoints.
- [Model APIs tokenization](https://friendli.ai/docs/openapi/model-apis/tokenization.md): Tokenize text into token IDs using Friendli Model APIs. Useful for counting tokens before sending inference requests to the model.
- [Model APIs tool assisted chat completions](https://friendli.ai/docs/openapi/model-apis/tool-assisted-chat-completions.md): Chat completions with built-in tool calling on Friendli Model APIs. The model automatically invokes tools like web search during generation.
- [Model APIs tool assisted chat completions chunk object](https://friendli.ai/docs/openapi/model-apis/tool-assisted-chat-completions-chunk-object.md): Schema reference for the streamed tool-assisted chat completions chunk object from Friendli Model APIs during tool-augmented generation.
- [Friendli Python SDK](https://friendli.ai/docs/sdk/python-sdk.md): Install and use the Friendli Python SDK to access Model APIs, Dedicated Endpoints, and Container. Covers chat, completions, streaming, and async usage.

## OpenAPI Specs

- [openapi](https://friendli.ai/docs/openapi.yaml)

## Optional

- [Website](https://friendli.ai)
- [Blog](https://friendli.ai/blog)
- [Models](https://friendli.ai/models)