

April, 2026

Apr 16
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • Qwen/Qwen3-30B-A3B
Apr 15
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • zai-org/GLM-4.7
Apr 9
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • MiniMaxAI/MiniMax-M2.1
Apr 7
Model APIs

Model Release

We now support the following Model API.
  • zai-org/GLM-5.1
Apr 3
Model APIs

Pricing Update

We have changed the pricing model for Qwen/Qwen3-30B-A3B and DeepSeek-V3.1 to token-based pricing.
Apr 3
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • CohereAsrForConditionalGeneration (e.g., CohereLabs/cohere-transcribe-03-2026)
  • Gemma4ForConditionalGeneration (e.g., google/gemma-4-31B-it)
Apr 1
Model APIs

Model Release

We now support the following Model API.
  • openai/whisper-large-v3

March, 2026

Mar 18
Model APIs

Model Release

We now support the following Model API.
  • LGAI-EXAONE/K-EXAONE-236B-A23B

Pricing Update

We now support cached input pricing for the following Model API.
  • LGAI-EXAONE/K-EXAONE-236B-A23B
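Under token-based pricing with cached input support, tokens served from the prompt cache are billed at a lower rate than fresh input tokens. The sketch below shows how a per-request cost works out under this scheme; the rates used are hypothetical placeholders, not FriendliAI's actual prices (see the pricing page for those).

```python
def request_cost(input_tokens, cached_tokens, output_tokens,
                 input_rate, cached_rate, output_rate):
    """Cost in USD for one request under token-based pricing.

    Rates are USD per 1M tokens. Cached input tokens are billed at the
    (lower) cached rate instead of the full input rate.
    """
    fresh = input_tokens - cached_tokens
    return (fresh * input_rate
            + cached_tokens * cached_rate
            + output_tokens * output_rate) / 1_000_000

# Hypothetical rates for illustration only.
cost = request_cost(input_tokens=10_000, cached_tokens=8_000,
                    output_tokens=500,
                    input_rate=0.60, cached_rate=0.15, output_rate=2.20)
```

With a high cache-hit prompt like this one, most of the input bill moves to the cheaper cached rate.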
Mar 7
Model APIs

Model Release

We now support the following Model API.
  • deepseek-ai/DeepSeek-V3.2

Pricing Update

We now support cached input pricing for the following Model APIs.
  • MiniMaxAI/MiniMax-M2.1
  • zai-org/GLM-5
Mar 4
Dedicated Endpoints

Host KV Cache

We now support Host KV Cache. This extends KV capacity beyond GPU memory limits, allowing more tokens to be retained during inference. Read more

Speculative Decoding with a Draft Model

We now support speculative decoding by pairing the target model with a pre-trained draft model. The draft model proposes multiple candidate tokens verified by the target model, reducing decoding passes and improving throughput and latency. Available for a curated list of target models. Read more
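The verification step described above can be illustrated with a toy greedy-acceptance rule: the target model keeps the draft's proposals up to the first disagreement, then substitutes its own token there. This is a simplified sketch of the general idea, not FriendliAI's implementation.

```python
def accept_draft(draft_tokens, target_tokens):
    """Toy greedy speculative-decoding acceptance: keep draft tokens
    while the target agrees; at the first mismatch, take the target's
    token and stop the round."""
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)
        else:
            accepted.append(t)  # target's correction ends this round
            break
    return accepted

# Draft proposes 4 tokens; the target agrees on the first two, so one
# target forward pass yields three committed tokens instead of one.
out = accept_draft(["the", "cat", "sat", "on"],
                   ["the", "cat", "ran", "to"])
```

Because several tokens can be committed per target pass, throughput improves whenever the draft model is a good predictor of the target.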
Mar 1
Model APIs

Pricing Update

We now support cached input pricing for the following Model API.
  • MiniMaxAI/MiniMax-M2.5

February, 2026

Feb 28
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • LGAI-EXAONE/EXAONE-4.0.1-32B
Feb 20
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • LGAI-EXAONE/K-EXAONE-236B-A23B
Feb 19
Model APIs

Model Release

We now support the following Model API.
  • MiniMaxAI/MiniMax-M2.5
Feb 11
Model APIs

Model Release

We now support the following Model API.
  • zai-org/GLM-5

January, 2026

Jan 21
Dedicated Endpoints

New Model Family

Added support for the following new model family:
  • Glm4MoeLiteForCausalLM (e.g., zai-org/GLM-4.7-Flash)
Jan 20
Model APIs

Model Release

We now support the following Model API.
  • zai-org/GLM-4.7

Pricing Update

We have changed the pricing model for MiniMaxAI/MiniMax-M2.1 to token-based pricing.
Jan 16
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • deepseek-ai/DeepSeek-R1-0528
Jan 14
Model APIs

Model Release

We now support the following Model API.
  • MiniMaxAI/MiniMax-M2.1
Jan 2
Dedicated Endpoints

New Model Family

Added support for the following new model family:
  • ExaoneMoEForCausalLM (e.g., LGAI-EXAONE/K-EXAONE-236B-A23B)

December, 2025

Dec 31
Model APIs

Model Release

We now support the following Model API.
  • LGAI-EXAONE/K-EXAONE-236B-A23B
Dec 5
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • HunYuanVLForConditionalGeneration (e.g., tencent/HunyuanOCR)
  • MiniMaxM2ForCausalLM (e.g., MiniMaxAI/MiniMax-M2)
  • Gemma3TextModel (e.g., google/embeddinggemma-300m)
  • Phi4MMForCausalLM (e.g., microsoft/Phi-4-multimodal-instruct)
Dec 1
Model APIs

Model Release

We now support the following Model API.
  • deepseek-ai/DeepSeek-V3.1

November, 2025

Nov 27
Dedicated Endpoints

Feature Availability Update

Dedicated Endpoints’ Basic plan users can now access the following features that were previously available only to Enterprise plan users:
  • Request count auto-scaling: Scale endpoints based on request count. The request count scaling strategy adjusts worker numbers according to the total number of requests queued and in progress. Read more
  • Multi-LoRA serving: Serve multiple LoRA adapters simultaneously on a single endpoint, allowing you to use different fine-tuned models without additional GPU resources. Read more
  • Metrics: Track, monitor, and optimize your inference deployment.
  • Logs: Track logs and spot issues in real time.
Nov 21
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • FluxKontextPipeline (e.g., black-forest-labs/FLUX.1-Kontext-dev)
  • Olmo3ForCausalLM (e.g., allenai/Olmo-3-32B-Think)
  • LightOnOCRForConditionalGeneration (e.g., lightonai/LightOnOCR-1B-1025)
  • PaddleOCRVLForConditionalGeneration (e.g., PaddlePaddle/PaddleOCR-VL)
  • DeepseekOCRForCausalLM (e.g., deepseek-ai/DeepSeek-OCR)
Nov 7
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • Qwen3VLForConditionalGeneration (e.g., Qwen/Qwen3-VL-4B-Instruct)
  • Qwen3VLMoeForConditionalGeneration (e.g., Qwen/Qwen3-VL-30B-A3B-Instruct)
  • GraniteMoeHybridForCausalLM (e.g., ibm-granite/granite-4.0-h-small)
  • DotsOCRForCausalLM (e.g., rednote-hilab/dots.ocr)
Nov 1
Model APIs

Pricing Update

We have changed the pricing model for Qwen/Qwen3-235B-A22B-Instruct-2507 to token-based pricing.

September, 2025

Sep 15
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • Qwen3NextForCausalLM (e.g., Qwen/Qwen3-Next-80B-A3B-Instruct)
  • HunYuanDenseV1ForCausalLM (e.g., tencent/Hunyuan-MT-7B)
  • ApertusForCausalLM (e.g., swiss-ai/Apertus-8B-Instruct-2509)
  • SeedOssForCausalLM (e.g., ByteDance-Seed/Seed-OSS-36B-Instruct)
Sep 12
Dedicated Endpoints

Custom Chat Template Support

We now support custom chat formatting. You can paste or upload a custom Jinja template during endpoint creation. Read more
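As a rough illustration of what such a template looks like, here is a minimal ChatML-style Jinja template rendered locally with the jinja2 library. The tokens and layout are examples only; a real template must match the format the deployed model was trained with.

```python
from jinja2 import Template

# Minimal ChatML-style chat template (illustrative, not model-specific).
CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

rendered = Template(CHAT_TEMPLATE).render(
    messages=[{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
)
```

The `add_generation_prompt` flag appends the opening of the assistant turn so the model continues from there, mirroring the convention used by common chat templates.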

4-Bit Online Quantization Support

We now support 4-bit online quantization. By enabling this feature, you can efficiently run models on smaller instances with negligible quality impact. Read more
Sep 10
Model APIs, Dedicated Endpoints

Reasoning Parsing Support

We now support reasoning parsing. When this feature is enabled, the response provides a separate reasoning_content field rather than including the reasoning content in the content field. Read more
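On the client side, the split fields can be read straight off the message object. The sketch below assumes an OpenAI-compatible response shape with reasoning parsing enabled; the field values are made up for illustration.

```python
def split_reasoning(choice_message: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat message dict, assuming
    reasoning parsing is enabled so the endpoint returns a separate
    `reasoning_content` field alongside `content`."""
    return (choice_message.get("reasoning_content") or "",
            choice_message.get("content") or "")

# Illustrative shape of one assistant message with reasoning parsing on:
msg = {"role": "assistant",
       "reasoning_content": "The user asked for 2+2; compute it.",
       "content": "4"}
reasoning, answer = split_reasoning(msg)
```

With parsing disabled, `reasoning_content` is absent and the helper simply returns an empty reasoning string, so the same code handles both cases.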
Sep 8
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • K-intelligence/Midm-2.0-Base-Instruct
Sep 4
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • K-intelligence/Midm-2.0-Mini-Instruct
Sep 1
Dedicated Endpoints

B200 Hardware Support

We now support NVIDIA B200 GPUs alongside existing A100, H100, and H200 GPUs. Read more

August, 2025

Aug 22
Model APIs

New Built-in Integration with Linkup

New built-in web-search tool integration with Linkup has been added. Read more
Aug 22
Dedicated Endpoints

New Model Family

Added support for the following new model family:
  • GptOssForCausalLM (e.g., openai/gpt-oss-20b)
Aug 19
Dedicated Endpoints

New Auto-Scaling Type ‘Request count’ Added

Enterprise plan users can now choose to scale their endpoints based on request count. The request count scaling strategy adjusts worker numbers according to the total number of requests queued and in progress.
Aug 8
Model APIs

Increased Output Token Limits for Reasoning Models

We have increased the output token limits for reasoning models on Model APIs, allowing longer reasoning outputs to be generated.
Aug 8
Dedicated Endpoints

New Endpoint Feature ‘N-GRAM Speculative Decoding’

Users can now enable N-GRAM speculative decoding for their endpoints. For predictable tasks, this can deliver substantial performance gains. Read more
Aug 1
Model APIs

Model Release

We now support the following Model API.
  • Qwen/Qwen3-235B-A22B-Instruct-2507
Aug 1
Dedicated Endpoints

New Model Family

Added support for the following new model family:
  • HyperCLOVAXForCausalLM (e.g., naver-hyperclovax/HyperCLOVAX-SEED-Think-14B)

July, 2025

Jul 25
Dedicated Endpoints

New Endpoint Feature ‘Online Quantization’

Users can now quantize their model endpoints without any preparation and accelerate inference. Read more
Jul 14
Model APIs

Model Release

LG AI Research has partnered with FriendliAI to bring you the latest EXAONE 4.0 model. Read more
  • LGAI-EXAONE/EXAONE-4.0.1-32B
Jul 11
Model APIs

Model Release

We now support the following Model API.
  • deepseek-ai/DeepSeek-R1-0528
Jul 8
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • Dots1ForCausalLM (e.g., rednote-hilab/dots.llm1.inst)
  • Glm4vForConditionalGeneration (e.g., zai-org/GLM-4.1V-9B-Thinking)
  • KeyeForConditionalGeneration (e.g., Kwai-Keye/Keye-VL-8B-Preview)
  • HunYuanMoEV1ForCausalLM (e.g., tencent/Hunyuan-A13B-Instruct)
  • PhiMoEForCausalLM (e.g., microsoft/Phi-mini-MoE-instruct)
  • MiniMaxM1ForCausalLM (e.g., MiniMaxAI/MiniMax-M1-80k)
  • Ernie4_5_MoeForCausalLM (e.g., baidu/ERNIE-4.5-21B-A3B-Thinking)
  • Ernie4_5_ForCausalLM (e.g., baidu/ERNIE-4.5-0.3B-PT)
Jul 3
Dedicated Endpoints

New Model Family

Added support for the following new model family:
  • Exaone4ForCausalLM (e.g., LGAI-EXAONE/EXAONE-4.0.1-32B)
Last modified on May 12, 2026