

April, 2026

Apr 16
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • Qwen/Qwen3-30B-A3B
Apr 15
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • zai-org/GLM-4.7
Apr 9
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • MiniMaxAI/MiniMax-M2.1
Apr 7
Model APIs

Model Release

We now support the following Model API.
  • zai-org/GLM-5.1
Apr 3
Model APIs

Pricing Update

We have changed the pricing model for Qwen/Qwen3-30B-A3B and DeepSeek-V3.1 to token-based pricing.
Apr 3
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • CohereAsrForConditionalGeneration (e.g., CohereLabs/cohere-transcribe-03-2026)
  • Gemma4ForConditionalGeneration (e.g., google/gemma-4-31B-it)
Apr 1
Model APIs

Model Release

We now support the following Model API.
  • openai/whisper-large-v3

March, 2026

Mar 18
Model APIs

Model Release

We now support the following Model API.
  • LGAI-EXAONE/K-EXAONE-236B-A23B

Pricing Update

We now support cached input pricing for the following Model API.
  • LGAI-EXAONE/K-EXAONE-236B-A23B
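Under token-based pricing with cached input support, tokens served from the prompt cache are billed at a lower rate than fresh input tokens. The sketch below shows how a per-request cost works out under this scheme; the rates used are hypothetical placeholders, not FriendliAI's actual prices (see the pricing page for those).

```python
def request_cost(input_tokens, cached_tokens, output_tokens,
                 input_rate, cached_rate, output_rate):
    """Cost in USD for one request under token-based pricing.

    Rates are USD per 1M tokens. Cached input tokens are billed at the
    (lower) cached rate instead of the full input rate.
    """
    fresh = input_tokens - cached_tokens
    return (fresh * input_rate
            + cached_tokens * cached_rate
            + output_tokens * output_rate) / 1_000_000

# Hypothetical rates for illustration only.
cost = request_cost(input_tokens=10_000, cached_tokens=8_000,
                    output_tokens=500,
                    input_rate=0.60, cached_rate=0.15, output_rate=2.20)
```

With a high cache-hit prompt like this one, most of the input bill moves to the cheaper cached rate.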
Mar 7
Model APIs

Model Release

We now support the following Model API.
  • deepseek-ai/DeepSeek-V3.2

Pricing Update

We now support cached input pricing for the following Model APIs.
  • MiniMaxAI/MiniMax-M2.1
  • zai-org/GLM-5
Mar 4
Dedicated Endpoints

Host KV Cache

We now support Host KV Cache. This extends KV capacity beyond GPU memory limits, allowing more tokens to be retained during inference. Read more

Speculative Decoding with a Draft Model

We now support speculative decoding by pairing the target model with a pre-trained draft model. The draft model proposes multiple candidate tokens verified by the target model, reducing decoding passes and improving throughput and latency. Available for a curated list of target models. Read more
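The verification step described above can be illustrated with a toy greedy-acceptance rule: the target model keeps the draft's proposals up to the first disagreement, then substitutes its own token there. This is a simplified sketch of the general idea, not FriendliAI's implementation.

```python
def accept_draft(draft_tokens, target_tokens):
    """Toy greedy speculative-decoding acceptance: keep draft tokens
    while the target agrees; at the first mismatch, take the target's
    token and stop the round."""
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)
        else:
            accepted.append(t)  # target's correction ends this round
            break
    return accepted

# Draft proposes 4 tokens; the target agrees on the first two, so one
# target forward pass yields three committed tokens instead of one.
out = accept_draft(["the", "cat", "sat", "on"],
                   ["the", "cat", "ran", "to"])
```

Because several tokens can be committed per target pass, throughput improves whenever the draft model is a good predictor of the target.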
Mar 1
Model APIs

Pricing Update

We now support cached input pricing for the following Model API.
  • MiniMaxAI/MiniMax-M2.5

February, 2026

Feb 28
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • LGAI-EXAONE/EXAONE-4.0.1-32B
Feb 20
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • LGAI-EXAONE/K-EXAONE-236B-A23B
Feb 19
Model APIs

Model Release

We now support the following Model API.
  • MiniMaxAI/MiniMax-M2.5
Feb 11
Model APIs

Model Release

We now support the following Model API.
  • zai-org/GLM-5

January, 2026

Jan 21
Dedicated Endpoints

New Model Family

Added support for the following new model family:
  • Glm4MoeLiteForCausalLM (e.g., zai-org/GLM-4.7-Flash)
Jan 20
Model APIs

Model Release

We now support the following Model API.
  • zai-org/GLM-4.7

Pricing Update

We have changed the pricing model for MiniMaxAI/MiniMax-M2.1 to token-based pricing.
Jan 16
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • deepseek-ai/DeepSeek-R1-0528
Jan 14
Model APIs

Model Release

We now support the following Model API.
  • MiniMaxAI/MiniMax-M2.1
Jan 2
Dedicated Endpoints

New Model Family

Added support for the following new model family:
  • ExaoneMoEForCausalLM (e.g., LGAI-EXAONE/K-EXAONE-236B-A23B)

December, 2025

Dec 31
Model APIs

Model Release

We now support the following Model API.
  • LGAI-EXAONE/K-EXAONE-236B-A23B
Dec 5
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • HunYuanVLForConditionalGeneration (e.g., tencent/HunyuanOCR)
  • MiniMaxM2ForCausalLM (e.g., MiniMaxAI/MiniMax-M2)
  • Gemma3TextModel (e.g., google/embeddinggemma-300m)
  • Phi4MMForCausalLM (e.g., microsoft/Phi-4-multimodal-instruct)
Dec 1
Model APIs

Model Release

We now support the following Model API.
  • deepseek-ai/DeepSeek-V3.1

November, 2025

Nov 27
Dedicated Endpoints

Feature Availability Update

Dedicated Endpoints’ Basic plan users can now access the following features that were previously available only to Enterprise plan users:
  • Request count auto-scaling: Scale endpoints based on request count. The request count scaling strategy adjusts worker numbers according to the total number of requests queued and in progress. Read more
  • Multi-LoRA serving: Serve multiple LoRA adapters simultaneously on a single endpoint, allowing you to use different fine-tuned models without additional GPU resources. Read more
  • Metrics: Track, monitor, and optimize your inference deployment.
  • Logs: Track logs and spot issues in real time.
Nov 21
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • FluxKontextPipeline (e.g., black-forest-labs/FLUX.1-Kontext-dev)
  • Olmo3ForCausalLM (e.g., allenai/Olmo-3-32B-Think)
  • LightOnOCRForConditionalGeneration (e.g., lightonai/LightOnOCR-1B-1025)
  • PaddleOCRVLForConditionalGeneration (e.g., PaddlePaddle/PaddleOCR-VL)
  • DeepseekOCRForCausalLM (e.g., deepseek-ai/DeepSeek-OCR)
Nov 7
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • Qwen3VLForConditionalGeneration (e.g., Qwen/Qwen3-VL-4B-Instruct)
  • Qwen3VLMoeForConditionalGeneration (e.g., Qwen/Qwen3-VL-30B-A3B-Instruct)
  • GraniteMoeHybridForCausalLM (e.g., ibm-granite/granite-4.0-h-small)
  • DotsOCRForCausalLM (e.g., rednote-hilab/dots.ocr)
Nov 1
Model APIs

Pricing Update

We have changed the pricing model for Qwen/Qwen3-235B-A22B-Instruct-2507 to token-based pricing.

September, 2025

Sep 15
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • Qwen3NextForCausalLM (e.g., Qwen/Qwen3-Next-80B-A3B-Instruct)
  • HunYuanDenseV1ForCausalLM (e.g., tencent/Hunyuan-MT-7B)
  • ApertusForCausalLM (e.g., swiss-ai/Apertus-8B-Instruct-2509)
  • SeedOssForCausalLM (e.g., ByteDance-Seed/Seed-OSS-36B-Instruct)
Sep 12
Dedicated Endpoints

Custom Chat Template Support

We now support custom chat formatting. You can paste or upload a custom Jinja template during endpoint creation. Read more
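As a rough illustration of what such a template looks like, here is a minimal ChatML-style Jinja template rendered locally with the jinja2 library. The tokens and layout are examples only; a real template must match the format the deployed model was trained with.

```python
from jinja2 import Template

# Minimal ChatML-style chat template (illustrative, not model-specific).
CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

rendered = Template(CHAT_TEMPLATE).render(
    messages=[{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
)
```

The `add_generation_prompt` flag appends the opening of the assistant turn so the model continues from there, mirroring the convention used by common chat templates.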

4-Bit Online Quantization Support

We now support 4-bit online quantization. By enabling this feature, you can efficiently run models on smaller instances with negligible quality impact. Read more
Sep 10
Model APIs, Dedicated Endpoints

Reasoning Parsing Support

We now support reasoning parsing. When this feature is enabled, the response provides a separate reasoning_content field rather than including the reasoning content in the content field. Read more
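On the client side, the split fields can be read straight off the message object. The sketch below assumes an OpenAI-compatible response shape with reasoning parsing enabled; the field values are made up for illustration.

```python
def split_reasoning(choice_message: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat message dict, assuming
    reasoning parsing is enabled so the endpoint returns a separate
    `reasoning_content` field alongside `content`."""
    return (choice_message.get("reasoning_content") or "",
            choice_message.get("content") or "")

# Illustrative shape of one assistant message with reasoning parsing on:
msg = {"role": "assistant",
       "reasoning_content": "The user asked for 2+2; compute it.",
       "content": "4"}
reasoning, answer = split_reasoning(msg)
```

With parsing disabled, `reasoning_content` is absent and the helper simply returns an empty reasoning string, so the same code handles both cases.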
Sep 8
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • K-intelligence/Midm-2.0-Base-Instruct
Sep 4
Model APIs

Model Deprecation

We have deprecated the following Model API.
  • K-intelligence/Midm-2.0-Mini-Instruct
Sep 1
Dedicated Endpoints

B200 Hardware Support

We now support NVIDIA B200 GPUs alongside existing A100, H100, and H200 GPUs. Read more

August, 2025

Aug 22
Model APIs

New Built-in Integration with Linkup

New built-in web-search tool integration with Linkup has been added. Read more
Aug 22
Dedicated Endpoints

New Model Family

Added support for the following new model family:
  • GptOssForCausalLM (e.g., openai/gpt-oss-20b)
Aug 19
Dedicated Endpoints

New Auto-Scaling Type ‘Request count’ Added

Enterprise plan users can now choose to scale their endpoints based on request count. The request count scaling strategy adjusts worker numbers according to the total number of requests queued and in progress.
Aug 8
Model APIs

Increased Output Token Limits for Reasoning Models

We have increased the output token limits for reasoning models on Model APIs, allowing longer reasoning outputs to be generated.
Aug 8
Dedicated Endpoints

New Endpoint Feature ‘N-GRAM Speculative Decoding’

Users can now enable N-GRAM speculative decoding for their endpoints. For predictable tasks, this can deliver substantial performance gains. Read more
Aug 1
Model APIs

Model Release

We now support the following Model API.
  • Qwen/Qwen3-235B-A22B-Instruct-2507
Aug 1
Dedicated Endpoints

New Model Family

Added support for the following new model family:
  • HyperCLOVAXForCausalLM (e.g., naver-hyperclovax/HyperCLOVAX-SEED-Think-14B)

July, 2025

Jul 25
Dedicated Endpoints

New Endpoint Feature ‘Online Quantization’

Users can now quantize their model endpoints without any preparation and accelerate inference. Read more
Jul 14
Model APIs

Model Release

LG AI Research has partnered with FriendliAI to bring you the latest EXAONE 4.0 model. Read more
  • LGAI-EXAONE/EXAONE-4.0.1-32B
Jul 11
Model APIs

Model Release

We now support the following Model API.
  • deepseek-ai/DeepSeek-R1-0528
Jul 8
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • Dots1ForCausalLM (e.g., rednote-hilab/dots.llm1.inst)
  • Glm4vForConditionalGeneration (e.g., zai-org/GLM-4.1V-9B-Thinking)
  • KeyeForConditionalGeneration (e.g., Kwai-Keye/Keye-VL-8B-Preview)
  • HunYuanMoEV1ForCausalLM (e.g., tencent/Hunyuan-A13B-Instruct)
  • PhiMoEForCausalLM (e.g., microsoft/Phi-mini-MoE-instruct)
  • MiniMaxM1ForCausalLM (e.g., MiniMaxAI/MiniMax-M1-80k)
  • Ernie4_5_MoeForCausalLM (e.g., baidu/ERNIE-4.5-21B-A3B-Thinking)
  • Ernie4_5_ForCausalLM (e.g., baidu/ERNIE-4.5-0.3B-PT)
Jul 3
Dedicated Endpoints

New Model Family

Added support for the following new model family:
  • Exaone4ForCausalLM (e.g., LGAI-EXAONE/EXAONE-4.0.1-32B)
Last modified on May 12, 2026