# Friendli Docs

> Let your team focus on building great AI products. FriendliAI will make sure your AI runs fast, affordable, and reliable at scale. Leave the hassle of AI inference to FriendliAI.

## Docs

- [Changelog](https://friendli.ai/docs/changelog.md): Track the latest FriendliAI product updates, new model releases, pricing changes, deprecations, and feature announcements across all endpoints.
- [CUDA Compatibility](https://friendli.ai/docs/guides/container/cuda-compatibility.md): The Friendli Engine supports CUDA-enabled NVIDIA GPUs, which means it relies on a specific version of CUDA and necessitates proper CUDA compute compatibilities.
- [Deploy Friendli Container as Amazon EKS Add-on](https://friendli.ai/docs/guides/container/eks-quickstart.md): Deploy Friendli Container on Amazon EKS using the official AWS EKS Add-On. Set up GPU nodes, install the add-on, and run model inference.
- [Inference with gRPC](https://friendli.ai/docs/guides/container/inference-with-grpc.md): Run a gRPC inference server with Friendli Container and send requests using the Friendli Python SDK. Includes setup, configuration, and code examples.
- [Introducing Friendli Container](https://friendli.ai/docs/guides/container/introduction.md): Deploy generative AI models on your own infrastructure with Friendli Container. Full control over GPU resources, networking, and scaling.
- [Observability for Friendli Container](https://friendli.ai/docs/guides/container/monitoring.md): Observability is an integral part of DevOps. To support this, Friendli Container exports internal metrics in a Prometheus text format.
- [Optimizing Inference with Policy Search](https://friendli.ai/docs/guides/container/optimizing-inference-with-policy-search.md): Boost inference throughput by up to 2x for MoE and quantized models by running execution policy search in Friendli Container for production.
- [Quantization](https://friendli.ai/docs/guides/container/quantization.md): Learn how to serve pre-quantized models or perform online quantization with Friendli Container to reduce memory and speed up inference.
- [QuickStart: Friendli Container Trial](https://friendli.ai/docs/guides/container/quickstart.md): Get started with Friendli Container trial. Access the registry, configure your secret, launch the container, and monitor with Grafana.
- [Running Friendli Container](https://friendli.ai/docs/guides/container/running-friendli-container.md): Step-by-step guide to running Friendli Container on your own machine. Configure environment variables, select GPU resources, and deploy models.
- [Running Friendli Container on SageMaker](https://friendli.ai/docs/guides/container/sagemaker-integration.md): Create a real-time inference endpoint in Amazon SageMaker with Friendli Container. Leverage Friendli Engine for faster, cost-efficient serving.
- [Serving MoE Models](https://friendli.ai/docs/guides/container/serving-moe-models.md): Serve Mixture of Experts (MoE) models like Mixtral 8x7B with Friendli Container. Covers policy search setup and multi-GPU Docker configuration.
- [Serving Multi-LoRA Models](https://friendli.ai/docs/guides/container/serving-multi-lora-models.md): Serve multiple LoRA-adapted LLMs simultaneously with Friendli Container without additional GPU resources. No retraining needed for task-specific models.
- [Data Privacy & Security](https://friendli.ai/docs/guides/data-handling.md): Learn how FriendliAI handles your data. Inference requests and responses are never used for training or shared with third parties.
- [Autoscaling](https://friendli.ai/docs/guides/dedicated-endpoints/autoscaling.md): Configure autoscaling for Friendli Dedicated Endpoints to automatically adjust GPU replicas based on traffic and latency thresholds.
- [Dataset Specifications and Upload Guide](https://friendli.ai/docs/guides/dedicated-endpoints/dataset.md): Upload and manage datasets for Friendli Dedicated Endpoints. Covers supported formats, size limits, splits, versioning, and the upload process.
- [Deploy with Hugging Face Models](https://friendli.ai/docs/guides/dedicated-endpoints/deploy-with-huggingface.md): Deploy Hugging Face models on Friendli Dedicated Endpoints. Step-by-step tutorial covering model selection, endpoint creation, and first inference call.
- [Deploy with W&B Models](https://friendli.ai/docs/guides/dedicated-endpoints/deploy-with-wandb.md): Deploy models from Weights & Biases artifacts on Friendli Dedicated Endpoints. Covers linking W&B projects, selecting artifacts, and running inference.
- [Endpoints](https://friendli.ai/docs/guides/dedicated-endpoints/endpoints.md): Manage Friendli Dedicated Endpoints deployments. Learn about endpoint lifecycle, GPU allocation, status monitoring, and configuration options.
- [Dedicated Endpoints FAQ and Troubleshooting](https://friendli.ai/docs/guides/dedicated-endpoints/faq.md): Answers to common questions about Friendli Dedicated Endpoints, including model compatibility, GPU requirements, billing, and troubleshooting tips.
- [Introducing Friendli Dedicated Endpoints](https://friendli.ai/docs/guides/dedicated-endpoints/introduction.md): Run custom or open-source generative AI models on dedicated GPU hardware with Friendli Dedicated Endpoints. No shared resources or infra management.
- [Serving LoRA Models](https://friendli.ai/docs/guides/dedicated-endpoints/lora-models.md): Learn how to deploy LoRA models from Hugging Face Hub to Friendli Dedicated Endpoints for efficient inference, including a quick guide for FLUX LoRA models.
- [Models](https://friendli.ai/docs/guides/dedicated-endpoints/models.md): Manage models for Friendli Dedicated Endpoints. Upload directly, or load from Hugging Face and Weights & Biases artifact repositories.
- [Multi-LoRA Serving](https://friendli.ai/docs/guides/dedicated-endpoints/multi-lora-serving.md): Enable Multi-LoRA serving on Friendli Dedicated Endpoints to run multiple LoRA adapters on a single base model without extra GPU resources.
- [Online Quantization](https://friendli.ai/docs/guides/dedicated-endpoints/online-quantization.md): Automatically quantize models to 4-bit or 8-bit precision at deploy time on Friendli Dedicated Endpoints. No pre-quantized checkpoint needed.
- [Pricing & Billing](https://friendli.ai/docs/guides/dedicated-endpoints/pricing.md): View Friendli Dedicated Endpoints pricing by GPU type. Covers supported instance types, per-second billing, and how autoscaling affects costs.
- [QuickStart: Friendli Dedicated Endpoints](https://friendli.ai/docs/guides/dedicated-endpoints/quickstart.md): Get started with Friendli Dedicated Endpoints. Create a project, pick a model, deploy an endpoint, and generate your first inference response.
- [Speculative Decoding](https://friendli.ai/docs/guides/dedicated-endpoints/speculative-decoding.md): Speed up LLM inference on Friendli Dedicated Endpoints with speculative decoding using proprietary draft models and N-gram token prediction.
- [Versioning](https://friendli.ai/docs/guides/dedicated-endpoints/versioning.md): Use endpoint versioning on Friendli Dedicated Endpoints to track deployment history, roll back to previous configurations, and update without downtime.
- [Integrations](https://friendli.ai/docs/guides/model-apis/integrations.md): Integrate Friendli Model APIs with LangChain, LiteLLM, LlamaIndex, and MongoDB for RAG, tool calling agents, and load balancing.
- [Introducing Friendli Model APIs](https://friendli.ai/docs/guides/model-apis/introduction.md): Get started with Friendli Model APIs to access popular AI models via API. No infrastructure setup or GPU management required.
- [Pricing & Billing](https://friendli.ai/docs/guides/model-apis/pricing.md): View Friendli Model APIs pricing per model. Compare token-based, time-based, and audio-based rates across usage tiers and free models.
- [QuickStart: Friendli Model APIs](https://friendli.ai/docs/guides/model-apis/quickstart.md): Get started with Friendli Model APIs in minutes. Explore popular AI models, experiment in a chat-style playground, and make your first API call with no setup required.
- [Tool Assisted API](https://friendli.ai/docs/guides/model-apis/tool-assisted-api.md): Use the Friendli Tool Assisted API to extend chat completions with built-in tools. Models can call web search and other tools automatically.
- [Multi‑modality](https://friendli.ai/docs/guides/multi-modality.md): Process text, images, audio, and video with Friendli multimodal APIs. Includes vision, transcription, and image generation endpoint guides.
- [OpenAI Compatibility](https://friendli.ai/docs/guides/openai-compatibility.md): Use official OpenAI Python and Node.js SDKs with Friendli endpoints. Migrate existing OpenAI applications by changing the base URL and API Key.
- [Friendli Documentation](https://friendli.ai/docs/guides/overview.md): FriendliAI documentation hub. Explore Model APIs, Dedicated Endpoints, and Container products with quickstart guides and API references.
- [Reasoning](https://friendli.ai/docs/guides/reasoning.md): Enable model-agnostic reasoning on Friendli endpoints. Extract chain-of-thought traces from any supported model without writing custom parsers.
- [Structured Outputs](https://friendli.ai/docs/guides/structured-outputs.md): Generate JSON outputs conforming to a schema using Friendli Structured Outputs. Works on all chat-capable models with response_format support.
- [Account Suspension](https://friendli.ai/docs/guides/suite/account-suspension.md): Find out why your Friendli Suite account was suspended, what access is restricted, and the steps to resolve billing or policy-related suspensions.
- [Billing & Payments](https://friendli.ai/docs/guides/suite/billing-payments.md): Understand Friendli Suite billing cycles, manage payment methods, view invoices, and learn how credits are applied before pay-as-you-go charges.
- [Credits](https://friendli.ai/docs/guides/suite/credits.md): Learn about Friendli Suite credit types, including promotional and purchased credits, their consumption order, expiration rules, and how to redeem promo codes.
- [Enterprise plan](https://friendli.ai/docs/guides/suite/enterprise-plan.md): FriendliAI's Enterprise plan with reserved GPUs, higher API rate limits, private deployments, and enterprise-grade security and support.
- [How to Redeem a Promo Code](https://friendli.ai/docs/guides/suite/how-to-redeem-promo-code.md): Redeem a FriendliAI promotional code to add credits to your Friendli Suite account. Follow the step-by-step instructions on the credits page.
- [Personal API Keys](https://friendli.ai/docs/guides/suite/personal-api-keys.md): Create and manage Personal API Keys in Friendli Suite for API authentication. Covers key generation steps and usage in API requests.
- [Supported Models](https://friendli.ai/docs/guides/supported-models.md): Browse the full list of AI models available on FriendliAI, including open-source LLMs, vision models, and audio models across all endpoints.
- [Tool Calling](https://friendli.ai/docs/guides/tool-calling.md): Use OpenAI-compatible tool calling on Friendli endpoints. Broad model support, strict schema enforcement, and parallel tool call examples.
- [Build an agent with Gradio](https://friendli.ai/docs/guides/tutorials/build-an-agent-with-gradio.md): Build and deploy an AI agent with Friendli Model APIs and Gradio in under 50 lines of Python. Includes tool calling and chat UI setup.
- [Build an agent with LangChain](https://friendli.ai/docs/guides/tutorials/build-an-agent-with-langchain.md): Create an AI agent using LangChain and Friendli Model APIs with tool calling. Step-by-step tutorial with code examples in Python.
- [Chat docs with LangChain](https://friendli.ai/docs/guides/tutorials/chat-docs-with-langchain.md): Build a document chatbot with LangChain and Friendli using retrieval-augmented generation. Full RAG pipeline tutorial with embeddings and vector search.
- [Chat docs with MongoDB](https://friendli.ai/docs/guides/tutorials/chat-docs-with-mongodb.md): Build a RAG chatbot with Friendli, MongoDB Atlas, and LangChain. Store document embeddings in a vector database and generate context-aware answers.
- [Getting Started with EXAONE 4.0](https://friendli.ai/docs/guides/tutorials/getting-started-with-exaone-4.0.md): Deploy and run LG AI Research's EXAONE 4.0 models on FriendliAI Dedicated Endpoints. Covers authentication, inference, reasoning, and optimization.
- [Getting Started with Nemotron 3](https://friendli.ai/docs/guides/tutorials/getting-started-with-nemotron-3.md): Deploy and run NVIDIA Nemotron 3 on FriendliAI Dedicated Endpoints. Includes setup, API usage, chat completions, and performance benchmarks.
- [Go Playground with Next.js](https://friendli.ai/docs/guides/tutorials/go-playground-with-nextjs.md): Build a Go game AI playground using Next.js and the Vercel AI SDK with Friendli Model APIs. Full-stack tutorial with streaming responses.
- [RAG app with LlamaIndex](https://friendli.ai/docs/guides/tutorials/rag-app-with-llamaindex.md): Build a retrieval-augmented generation app with LlamaIndex and Friendli Engine. Index documents, query with semantic search, and generate answers.
- [Tool calling with Model APIs](https://friendli.ai/docs/guides/tutorials/tool-calling-with-model-apis.md): Implement tool calling with Friendli Model APIs. Tutorial covers defining tools, handling function calls, and building multi-turn agent loops.
- [Deploy from W&B Registry with Webhook](https://friendli.ai/docs/guides/tutorials/wandb-registry-with-dedicated-endpoints.md): Hands-on tutorial for launching and deploying LLMs using Friendli Dedicated Endpoints with Weights & Biases artifacts through webhook automation.
- [Use Claude Code with FriendliAI](https://friendli.ai/docs/integrate/agents/claude-code.md): Configure Claude Code to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use Cline with FriendliAI](https://friendli.ai/docs/integrate/agents/cline.md): Configure Cline to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use Cursor with FriendliAI](https://friendli.ai/docs/integrate/agents/cursor.md): Configure Cursor to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use Hermes Agent with FriendliAI](https://friendli.ai/docs/integrate/agents/hermes-agent.md): Configure Hermes Agent to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use Kilo Code with FriendliAI](https://friendli.ai/docs/integrate/agents/kilo-code.md): Configure Kilo Code to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use OpenClaw with FriendliAI](https://friendli.ai/docs/integrate/agents/openclaw.md): Configure OpenClaw to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use OpenCode with FriendliAI](https://friendli.ai/docs/integrate/agents/opencode.md): Configure OpenCode to use FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [Use Your Agent with FriendliAI](https://friendli.ai/docs/integrate/agents/overview.md): Set up your favorite coding agent with FriendliAI to run fast, cost-efficient, and reliable open-source models.
- [Build Fast with FriendliAI](https://friendli.ai/docs/integrate/overview.md): Connect your favorite coding agents and SDKs to FriendliAI for fast, cost-efficient, and reliable open-source model inference.
- [LangChain Node.js SDK](https://friendli.ai/docs/integrate/sdks/langchain/nodejs.md): Utilize the LangChain Node.js SDK with FriendliAI for seamless integration and enhanced tool calling capabilities in your applications.
- [LangChain Python SDK](https://friendli.ai/docs/integrate/sdks/langchain/python.md): Integrate FriendliAI with LangChain Python SDK. Use ChatOpenAI with tool calling and connect to Model APIs or Dedicated Endpoints.
- [Linkup](https://friendli.ai/docs/integrate/sdks/linkup.md): Find and access high-quality web content using the Linkup API, integrated with Friendli Model APIs for seamless interaction.
- [LiteLLM](https://friendli.ai/docs/integrate/sdks/litellm.md): Use LiteLLM with FriendliAI to call Model APIs, Dedicated Endpoints, and fine-tuned endpoints. Includes setup, model selection, and streaming examples.
- [LlamaIndex](https://friendli.ai/docs/integrate/sdks/llamaindex.md): Integrate FriendliAI with LlamaIndex. Use the Friendli LLM class for chat completions and text completions with sync, async, and streaming support.
- [OpenAI Node.js SDK](https://friendli.ai/docs/integrate/sdks/openai/nodejs.md): Use the OpenAI Node.js SDK with FriendliAI endpoints. Migrate existing Node.js apps by changing the base URL. Covers chat, streaming, and tool calls.
- [OpenAI Python SDK](https://friendli.ai/docs/integrate/sdks/openai/python.md): Use the OpenAI Python SDK with FriendliAI endpoints. Migrate existing Python apps by changing the base URL. Covers chat, streaming, and tool calling.
- [Integrate Your SDK with FriendliAI](https://friendli.ai/docs/integrate/sdks/overview.md): Browse FriendliAI SDK integrations for OpenAI, LangChain, LlamaIndex, LiteLLM, Vercel AI SDK, and Weaviate. Pick the SDK that fits your stack.
- [Vercel AI SDK](https://friendli.ai/docs/integrate/sdks/vercel-ai.md): Use the Vercel AI SDK with FriendliAI for streaming chat UIs in Next.js and React. Connect to Model APIs or Dedicated Endpoints with minimal setup.
- [FriendliAI + Weaviate (Node.js)](https://friendli.ai/docs/integrate/sdks/weaviate/nodejs.md): Build RAG apps with FriendliAI and Weaviate in Node.js. Combine vector search with Friendli Engine inference to reduce hallucinations in responses.
- [FriendliAI + Weaviate (Python)](https://friendli.ai/docs/integrate/sdks/weaviate/python.md): Build RAG apps with FriendliAI and Weaviate in Python. Combine vector search with Friendli Engine inference for context-aware, grounded responses.
- [Container audio transcriptions](https://friendli.ai/docs/openapi/container/audio-transcriptions.md): Transcribe audio files to text using Friendli Container. Run speech-to-text models locally on your own GPU hardware with full data privacy.
- [Container chat completions](https://friendli.ai/docs/openapi/container/chat-completions.md): Send a conversation to Friendli Container and receive a chat completion response. Supports streaming, tool calls, and custom model parameters.
- [Container chat completions chunk object](https://friendli.ai/docs/openapi/container/chat-completions-chunk-object.md): Schema reference for the streamed chat completions chunk object returned by Friendli Container when streaming is enabled on the local endpoint.
- [Container completions](https://friendli.ai/docs/openapi/container/completions.md): Generate text completions from a prompt using Friendli Container. Run on your own hardware with full control over streaming and generation settings.
- [Container completions chunk object](https://friendli.ai/docs/openapi/container/completions-chunk-object.md): Schema reference for the streamed completions chunk object returned by Friendli Container when using the completions API with streaming enabled.
- [Container detokenization](https://friendli.ai/docs/openapi/container/detokenization.md): Convert token IDs back to text using Friendli Container. Decode tokenized model output into readable strings on your own infrastructure.
- [Container image edits](https://friendli.ai/docs/openapi/container/image-edits.md): Edit images with text prompts using Friendli Container. Upload an image and describe desired modifications to the self-hosted model on your GPUs.
- [Container image generations](https://friendli.ai/docs/openapi/container/image-generations.md): Generate images from text descriptions using Friendli Container. Run image generation models locally with configurable size and output parameters.
- [Container messages](https://friendli.ai/docs/openapi/container/messages.md): Use the Anthropic Messages-style API with Friendli Container. Send structured message payloads to your self-hosted model and receive responses.
- [Container messages chunk object](https://friendli.ai/docs/openapi/container/messages-chunk-object.md): Schema reference for the streamed messages chunk object returned by Friendli Container when using the Messages API with streaming enabled.
- [Container overview](https://friendli.ai/docs/openapi/container/overview.md): API reference for Friendli Container. Browse self-hosted inference endpoints for chat, completions, tokenization, images, and audio transcription.
- [Container text classification](https://friendli.ai/docs/openapi/container/text-classification.md): Classify text into categories using Friendli Container. Run text classification models on your own infrastructure with full data control and privacy.
- [Container tokenization](https://friendli.ai/docs/openapi/container/tokenization.md): Tokenize text into token IDs using Friendli Container. Run tokenization locally on your own infrastructure for pre-processing and token counting.
- [Add samples](https://friendli.ai/docs/openapi/dataset/add-samples.md): Add new data samples to an existing Friendli dataset split via the API. Append training or evaluation data entries to your dataset programmatically.
- [Create a new dataset](https://friendli.ai/docs/openapi/dataset/create-a-new-dataset.md): Create a new dataset in Friendli Suite via the API. Initialize a dataset resource to organize training and evaluation data with splits and versions.
- [Create a new split](https://friendli.ai/docs/openapi/dataset/create-a-split.md): Create a new split within a Friendli dataset via the API. Use splits to organize data into training, validation, and test partitions.
- [Create a new version](https://friendli.ai/docs/openapi/dataset/create-a-version.md): Create a new version of a Friendli dataset via the API. Snapshot your current dataset state to track changes and enable reproducible training runs.
- [Delete a version](https://friendli.ai/docs/openapi/dataset/delete-a-version.md): Delete a specific version of a Friendli dataset by ID via the API. Permanently removes the version snapshot and its associated metadata.
- [Delete a dataset](https://friendli.ai/docs/openapi/dataset/delete-dataset.md): Permanently delete a Friendli dataset by ID via the API. Removes the dataset along with all its splits, versions, and sample data.
- [Delete samples](https://friendli.ai/docs/openapi/dataset/delete-samples.md): Delete specific data samples from a Friendli dataset split via the API. Remove individual entries by sample ID from your training or evaluation data.
- [Delete a split](https://friendli.ai/docs/openapi/dataset/delete-split.md): Delete a split from a Friendli dataset by ID via the API. Permanently removes the split and all sample data it contains from the dataset.
- [Get dataset info](https://friendli.ai/docs/openapi/dataset/get-dataset-info.md): Retrieve metadata for a Friendli dataset by ID via the API. Returns the dataset name, modality, and creation and update timestamps.
- [Get split info](https://friendli.ai/docs/openapi/dataset/get-split-info.md): Retrieve metadata for a specific split within a Friendli dataset via the API. Returns the split name, sample count, and creation timestamp.
- [Get version info](https://friendli.ai/docs/openapi/dataset/get-version-info.md): Retrieve metadata for a specific version of a Friendli dataset via the API. Returns the version tag, creation date, and snapshot details.
- [List datasets](https://friendli.ai/docs/openapi/dataset/list-datasets.md): List all datasets in your Friendli Suite project via the API. Returns dataset IDs, names, split counts, and creation timestamps for each entry.
- [List samples](https://friendli.ai/docs/openapi/dataset/list-samples.md): List data samples in a Friendli dataset split via the API. Paginate through training or evaluation entries with sample IDs and content previews.
- [List splits](https://friendli.ai/docs/openapi/dataset/list-splits.md): List all splits within a Friendli dataset via the API. Returns split IDs, names, and sample counts for training, validation, and test partitions.
- [List versions](https://friendli.ai/docs/openapi/dataset/list-versions.md): List all versions of a Friendli dataset via the API. Returns version IDs, tags, and creation timestamps to track your dataset's revision history.
- [Dataset overview](https://friendli.ai/docs/openapi/dataset/overview.md): API reference for Friendli Dataset API. Create, manage, and version datasets with splits and samples for model training and evaluation.
- [Update samples](https://friendli.ai/docs/openapi/dataset/update-samples.md): Update existing data samples in a Friendli dataset split via the API. Modify sample content by ID without deleting and re-adding entries.
- [Dedicated create endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/create.md): Create a Friendli Dedicated Endpoint deployment for a Hugging Face model via the API. Specify GPU type, replica count, and model configuration.
- [Dedicated delete endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/delete.md): Permanently delete a Friendli Dedicated Endpoint deployment by ID. This stops the endpoint and releases all associated GPU resources immediately.
- [Dedicated get endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/get-spec.md): Retrieve the full specification of a Friendli Dedicated Endpoint by ID, including model config, GPU type, replica count, and deployment settings.
- [Dedicated get endpoint status](https://friendli.ai/docs/openapi/dedicated/endpoint/get-status.md): Check the current status of a Friendli Dedicated Endpoint by ID. Returns the lifecycle state such as running, sleeping, initializing, or terminated.
- [Dedicated get endpoint version](https://friendli.ai/docs/openapi/dedicated/endpoint/get-version.md): Retrieve the version history of a Friendli Dedicated Endpoint by ID. View past configurations and rollback points for deployment tracking.
- [Dedicated list endpoints](https://friendli.ai/docs/openapi/dedicated/endpoint/list.md): List all Friendli Dedicated Endpoint deployments in your project. Returns endpoint IDs, statuses, model names, and GPU configurations.
- [Dedicated restart endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/restart.md): Restart a failed or terminated Friendli Dedicated Endpoint by ID. The endpoint re-initializes with the same model and GPU configuration.
- [Dedicated sleep endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/sleep.md): Put a Friendli Dedicated Endpoint into sleep mode by ID. The endpoint stops serving but retains its configuration for quick wake-up later.
- [Dedicated terminate endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/terminate.md): Terminate a running Friendli Dedicated Endpoint by ID. Stops all inference and releases GPU resources while preserving the endpoint configuration.
- [Dedicated update endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/update.md): Update a Friendli Dedicated Endpoint with a new model, GPU type, or replica count. Changes are applied as a new version in the deployment history.
- [Dedicated wake endpoint](https://friendli.ai/docs/openapi/dedicated/endpoint/wake.md): Wake up a sleeping Friendli Dedicated Endpoint by ID. The endpoint resumes serving with its previous model and GPU configuration intact.
- [Dedicated create endpoint from W&B artifact](https://friendli.ai/docs/openapi/dedicated/endpoint/wandb-artifact-create.md): Create a Friendli Dedicated Endpoint from a Weights & Biases artifact via the API. Deploy W&B-managed models directly to dedicated GPU hardware.
- [Dedicated audio transcriptions](https://friendli.ai/docs/openapi/dedicated/inference/audio-transcriptions.md): Transcribe audio files to text using your Friendli Dedicated Endpoint. Upload an audio file and receive a text transcription from the deployed model.
- [Dedicated chat completions](https://friendli.ai/docs/openapi/dedicated/inference/chat-completions.md): Send a conversation to your Friendli Dedicated Endpoint and receive a chat completion response. Supports streaming, tool calls, and JSON mode.
- [Dedicated chat completions chunk object](https://friendli.ai/docs/openapi/dedicated/inference/chat-completions-chunk-object.md): Schema reference for the streamed chat completions chunk object returned by Friendli Dedicated Endpoints when streaming is enabled.
- [Dedicated chat render](https://friendli.ai/docs/openapi/dedicated/inference/chat-render.md): Preview the final prompt text that your Friendli Dedicated Endpoint will send to the model. Useful for debugging chat templates and token counts.
- [Dedicated completions](https://friendli.ai/docs/openapi/dedicated/inference/completions.md): Generate text completions from a prompt using your Friendli Dedicated Endpoint. Supports streaming, token limits, temperature, and stop sequences.
- [Dedicated completions chunk object](https://friendli.ai/docs/openapi/dedicated/inference/completions-chunk-object.md): Schema reference for the streamed completions chunk object returned by Friendli Dedicated Endpoints when using the completions streaming API.
- [Dedicated detokenization](https://friendli.ai/docs/openapi/dedicated/inference/detokenization.md): Convert token IDs back to text using your Friendli Dedicated Endpoint. Decode tokenized output into a human-readable string for analysis.
- [Dedicated embeddings](https://friendli.ai/docs/openapi/dedicated/inference/embeddings.md): Generate text embedding vectors using your Friendli Dedicated Endpoint. Convert text into dense vector representations for search and similarity.
- [Dedicated image edits](https://friendli.ai/docs/openapi/dedicated/inference/image-edits.md): Edit images with text prompts using your Friendli Dedicated Endpoint. Upload an image and describe the desired modifications for the model to apply.
- [Dedicated image generations](https://friendli.ai/docs/openapi/dedicated/inference/image-generations.md): Generate images from text descriptions using your Friendli Dedicated Endpoint. Supports configurable image size, count, and generation parameters.
- [Dedicated messages](https://friendli.ai/docs/openapi/dedicated/inference/messages.md): Use the Anthropic Messages-style API on your Friendli Dedicated Endpoint. Send structured message payloads and receive assistant responses.
- [Dedicated messages chunk object](https://friendli.ai/docs/openapi/dedicated/inference/messages-chunk-object.md): Schema reference for the streamed messages chunk object returned by Friendli Dedicated Endpoints when using the Messages API with streaming.
- [Dedicated text classification](https://friendli.ai/docs/openapi/dedicated/inference/text-classification.md): Classify text into categories using your Friendli Dedicated Endpoint. Send text input and receive predicted labels with per-class probabilities.
- [Dedicated tokenization](https://friendli.ai/docs/openapi/dedicated/inference/tokenization.md): Tokenize text into token IDs using your Friendli Dedicated Endpoint. Useful for counting tokens and validating input length before inference.
- [Dedicated overview](https://friendli.ai/docs/openapi/dedicated/overview.md): API reference for Friendli Dedicated Endpoints. Browse inference, endpoint management, chat completions, embeddings, and image generation endpoints.
- [Complete file upload](https://friendli.ai/docs/openapi/file/complete-file-upload.md): Finalize a multipart file upload to Friendli Suite via the API. Call this endpoint after all file parts have been uploaded to mark the file as ready.
- [Get file download URL](https://friendli.ai/docs/openapi/file/get-file-download-url.md): Get a pre-signed download URL for a file stored in Friendli Suite via the API. Use the returned URL to download the file with standard HTTP clients.
- [Get file info](https://friendli.ai/docs/openapi/file/get-file-info.md): Retrieve metadata for a file stored in Friendli Suite by file ID via the API. Returns filename, size, upload status, and creation timestamp.
- [Initiate file upload](https://friendli.ai/docs/openapi/file/init-file-upload.md): Start a new file upload to Friendli Suite via the API. Returns a file ID and pre-signed upload URL to begin transferring your file data.
- [File overview](https://friendli.ai/docs/openapi/file/overview.md): API reference for Friendli File API. Upload, download, and manage files used for model training and dataset preparation on Friendli Suite.
- [API Reference](https://friendli.ai/docs/openapi/introduction.md): Complete API reference for Friendli Suite. Explore endpoints for Model APIs, Dedicated Endpoints, Container, File, and Dataset APIs with HTTP examples.
- [Model APIs audio transcriptions](https://friendli.ai/docs/openapi/model-apis/audio-transcriptions.md): Transcribe audio files to text using Friendli Model APIs. Supports multiple audio formats with streaming and non-streaming responses.
- [Model APIs audio transcriptions chunk object](https://friendli.ai/docs/openapi/model-apis/audio-transcriptions-chunk-object.md): Schema reference for the streamed audio transcription chunk object returned by Friendli Model APIs during real-time transcription.
- [Model APIs chat completions](https://friendli.ai/docs/openapi/model-apis/chat-completions.md): Send a conversation to Friendli Model APIs and receive a chat completion response. Supports streaming, tool calls, and JSON mode.
- [Model APIs chat completions chunk object](https://friendli.ai/docs/openapi/model-apis/chat-completions-chunk-object.md): Schema reference for the streamed chat completions chunk object returned by Friendli Model APIs when streaming is enabled.
- [Model APIs chat render](https://friendli.ai/docs/openapi/model-apis/chat-render.md): Preview the final prompt text that Friendli Model APIs will send to the model. Useful for debugging chat templates and token usage.
- [Model APIs completions](https://friendli.ai/docs/openapi/model-apis/completions.md): Generate text completions from a prompt using Friendli Model APIs. Supports streaming, token limits, temperature, and stop sequences.
- [Model APIs completions chunk object](https://friendli.ai/docs/openapi/model-apis/completions-chunk-object.md): Schema reference for the streamed completions chunk object returned by Friendli Model APIs when using the completions streaming API.
- [Model APIs detokenization](https://friendli.ai/docs/openapi/model-apis/detokenization.md): Convert token IDs back to text using Friendli Model APIs. Decode tokenized output into a human-readable string for post-processing.
- [Model APIs messages](https://friendli.ai/docs/openapi/model-apis/messages.md): Use the Anthropic Messages-style API on Friendli Model APIs. Send structured message payloads and receive assistant responses.
- [Model APIs messages chunk object](https://friendli.ai/docs/openapi/model-apis/messages-chunk-object.md): Schema reference for the streamed messages chunk object returned by Friendli Model APIs when using the Messages API with streaming.
- [Model APIs overview](https://friendli.ai/docs/openapi/model-apis/overview.md): API reference for Friendli Model APIs. Browse chat completions, completions, tokenization, messages, and audio transcription endpoints.
- [Model APIs tokenization](https://friendli.ai/docs/openapi/model-apis/tokenization.md): Tokenize text into token IDs using Friendli Model APIs. Useful for counting tokens before sending inference requests to the model.
- [Model APIs tool assisted chat completions](https://friendli.ai/docs/openapi/model-apis/tool-assisted-chat-completions.md): Chat completions with built-in tool calling on Friendli Model APIs. The model automatically invokes tools like web search during generation.
- [Model APIs tool assisted chat completions chunk object](https://friendli.ai/docs/openapi/model-apis/tool-assisted-chat-completions-chunk-object.md): Schema reference for the streamed tool-assisted chat completions chunk object from Friendli Model APIs during tool-augmented generation.
- [Friendli Python SDK](https://friendli.ai/docs/sdk/python-sdk.md): Install and use the Friendli Python SDK to access Model APIs, Dedicated Endpoints, and Container. Covers chat, completions, streaming, and async usage.

## OpenAPI Specs

- [openapi](https://friendli.ai/docs/openapi.yaml)

## Optional

- [Website](https://friendli.ai)
- [Blog](https://friendli.ai/blog)
- [Models](https://friendli.ai/models)