# Multi-Modality

> Process text, images, audio, and video with Friendli multimodal APIs. Includes vision, transcription, and image generation endpoint guides.

Friendli supports multimodal workflows across text, image, audio, and video. \
Use the comprehensive guides below to get started with each modality.

## Quick Navigation

* [Image Generation](#image-generation) - Generate images from text prompts
* [Vision (Image Understanding)](#vision-image-understanding) - Analyze and understand images
* [Video Understanding](#video-understanding) - Process and analyze video content
* [Audio and Speech](#audio-and-speech) - Convert audio to text and analyze audio

### Image Generation

Transform text prompts into high-quality visuals with Friendli's image generation capabilities.

#### Representative Models

We support various trending image generation models including:

* [FLUX.1-dev](https://friendli.ai/models?baseModel=black-forest-labs/FLUX.1-dev)
* [FLUX.1-schnell](https://friendli.ai/models?baseModel=black-forest-labs/FLUX.1-schnell)
* [See all image generation models](https://friendli.ai/models?input=TEXT\&output=IMAGE)

#### API Usage

<CodeGroup>
  ```bash curl theme={null}
  curl -L -X POST "https://api.friendli.ai/dedicated/v1/images/generations" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $API_KEY" \
    --data-raw '{
      "model": "YOUR_ENDPOINT_ID",
      "prompt": "An orange Lamborghini driving down a hill road at night with a beautiful ocean view in the background.",
      "num_inference_steps": 10,
      "guidance_scale": 3.5
    }'
  ```

  ```python Friendli Python SDK theme={null}
  import os
  from friendli import SyncFriendli

  with SyncFriendli(
      token=os.environ.get("API_KEY"),
  ) as friendli:
      images = friendli.dedicated.image.generate(
          model="YOUR_ENDPOINT_ID",
          prompt="An orange Lamborghini driving down a hill road at night with a beautiful ocean view in the background.",
          num_inference_steps=10,
          guidance_scale=3.5
      )

      print(images.data[0].url)
  ```

  ```python OpenAI Python SDK theme={null}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.friendli.ai/dedicated/v1",
      api_key=os.environ.get("API_KEY"),
  )

  images = client.images.generate(
      model="YOUR_ENDPOINT_ID",
      prompt="An orange Lamborghini driving down a hill road at night with a beautiful ocean view in the background.",
      extra_body={
          "num_inference_steps": 10,
          "guidance_scale": 3.5
      }
  )

  print(images.data[0].url)
  ```
</CodeGroup>

<Info>
  `guidance_scale` is required when using Friendli Container. For more detail, please refer to the [Container API Reference](/openapi/container/image-generations).
</Info>
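The examples above only print the URL of the generated image. As a minimal sketch, the helper below saves that URL to a local file using only the standard library. The function name `save_image_from_url`, the output path, and the `.png` extension are illustrative assumptions, not part of the Friendli API. The sketch also assumes the returned URL can be downloaded without extra authentication headers.

```python
import shutil
import urllib.request
from pathlib import Path


def save_image_from_url(url: str, destination: str, timeout: float = 30.0) -> Path:
    """Download the file at `url` and write it to `destination` (hypothetical helper)."""
    path = Path(destination)
    # Create the target directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(url, timeout=timeout) as response, path.open("wb") as out:
        shutil.copyfileobj(response, out)
    return path


# Usage, continuing from either Python example above (file name is an assumption):
# save_image_from_url(images.data[0].url, "outputs/lamborghini.png")
```

Generated image URLs may be short-lived, so downloading right after generation is probably the safer choice.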

### Vision (Image Understanding)

Analyze and understand images using Friendli's vision capabilities.

#### Representative Models

We support various trending vision models including:

* **Qwen2.5-VL**
* **InternVL3**
* [See all vision models](https://friendli.ai/models?input=IMAGE\&output=TEXT)

#### Supported Image Formats

Friendli accepts image formats supported by the PIL library, including:

* JPEG (.jpeg and .jpg)
* PNG (.png)
* AVIF (.avif)

#### API Usage

<CodeGroup>
  ```python URL-based image theme={null}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.friendli.ai/dedicated/v1",
      api_key=os.environ.get("API_KEY"),
  )

  image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"

  completion = client.chat.completions.create(
      model="YOUR_ENDPOINT_ID",
      messages=[
          {
              "role": "user",
              "content": [
                  {
                      "type": "text",
                      "text": "What kind of animal is shown in the image?",
                  },
                  {"type": "image_url", "image_url": {"url": image_url}},
              ],
          },
      ],
      stream=False
  )

  print(completion.choices[0].message.content)
  ```

  ```python Base64-encoded image theme={null}
  import base64
  import os

  import requests
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.friendli.ai/dedicated/v1",
      api_key=os.environ.get("API_KEY"),
  )

  image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"
  image_media_type = "image/png"  # must match the actual format of the image (cat.png)
  image_base64 = base64.standard_b64encode(requests.get(image_url).content).decode(
      "utf-8"
  )

  completion = client.chat.completions.create(
      model="YOUR_ENDPOINT_ID",
      messages=[
          {
              "role": "user",
              "content": [
                  {
                      "type": "text",
                      "text": "What kind of animal is shown in the image?",
                  },
                  {
                      "type": "image_url",
                      "image_url": {
                          "url": f"data:{image_media_type};base64,{image_base64}"
                      },
                  },
              ],
          },
      ],
  )

  print(completion.choices[0].message.content)
  ```
</CodeGroup>
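The Base64 example hard-codes the media type, which is easy to get out of sync with the actual file. The sketch below derives the media type from the file extension and builds the `data:` URL. The names `MEDIA_TYPES`, `media_type_for`, and `to_data_url` are hypothetical helpers. The mapping covers only the extensions listed under Supported Image Formats and is not meant as an exhaustive list of what the API accepts.

```python
import base64
from pathlib import Path

# Extensions taken from the supported formats listed above (assumed mapping).
MEDIA_TYPES = {
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".avif": "image/avif",
}


def media_type_for(filename: str) -> str:
    """Return the media type for a filename, based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return MEDIA_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported image extension: {suffix!r}") from None


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a data URL suitable for the `image_url` field."""
    encoded = base64.standard_b64encode(image_bytes).decode("utf-8")
    return f"data:{media_type_for(filename)};base64,{encoded}"
```

The result can be passed as the `url` value of an `image_url` content part, for example `{"type": "image_url", "image_url": {"url": to_data_url(data, "cat.png")}}`.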

### Video Understanding

Process and analyze video content with Friendli's video understanding capabilities.

#### Representative Models

We support various video understanding models including:

* **Qwen2.5-VL**
* [See all video models](https://friendli.ai/models?input=VIDEO\&output=TEXT)

#### Video Requirements

* Videos must be hosted at publicly accessible URLs
* HTTPS URLs are recommended for security
* Consider video file size and processing time implications
* Some models may have specific resolution or duration requirements
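Some of these requirements can be checked locally before sending a request. The sketch below validates that a video URL is an absolute HTTP(S) URL and flags plain HTTP. The function name `check_video_url` is a hypothetical helper. It cannot confirm that the URL is publicly reachable, which only a real fetch would show.

```python
from urllib.parse import urlparse


def check_video_url(url: str) -> list[str]:
    """Return a list of warnings; raise ValueError if the URL is unusable."""
    parsed = urlparse(url)
    # Local paths and non-HTTP schemes cannot be fetched by a remote service.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an absolute HTTP(S) URL: {url!r}")
    warnings = []
    if parsed.scheme == "http":
        warnings.append("URL uses plain HTTP; HTTPS is recommended.")
    return warnings
```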

#### API Usage

By default, the video fetching timeout is 30 seconds. To increase the timeout, please [contact us](mailto:support@friendli.ai).

<CodeGroup>
  ```python Single Video Input theme={null}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.friendli.ai/dedicated/v1",
      api_key=os.environ.get("API_KEY"),
  )

  video_url = "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerFun.mp4"

  completion = client.chat.completions.create(
      model="YOUR_ENDPOINT_ID",
      messages=[
          {
              "role": "user",
              "content": [
                  {
                      "type": "text",
                      "text": "What's in this video?",
                  },
                  {
                      "type": "video_url",
                      "video_url": {"url": video_url},
                  },
              ],
          },
      ],
      temperature=0,
      max_tokens=100,
  )

  print(completion.choices[0].message.content)
  ```

  ```python Multi-Video Input theme={null}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.friendli.ai/dedicated/v1",
      api_key=os.environ.get("API_KEY"),
  )

  video_url_1 = "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerFun.mp4"
  video_url_2 = "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4"

  completion = client.chat.completions.create(
      model="YOUR_ENDPOINT_ID",
      messages=[
          {
              "role": "user",
              "content": [
                  {
                      "type": "text",
                      "text": "Describe the characters in each video concisely.",
                  },
                  {
                      "type": "video_url",
                      "video_url": {"url": video_url_1},
                  },
                  {
                      "type": "video_url",
                      "video_url": {"url": video_url_2},
                  },
              ],
          },
      ],
      temperature=0,
      max_tokens=100,
  )

  print(completion.choices[0].message.content)
  ```
</CodeGroup>
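Both examples build the same message shape: one text part followed by one `video_url` part per video. As a small sketch, the hypothetical helper `build_video_message` below assembles that structure for any number of URLs, so the order of the videos in the request always matches the order of the list.

```python
def build_video_message(prompt: str, video_urls: list[str]) -> dict:
    """Build a user message with one text part and one video_url part per URL."""
    content = [{"type": "text", "text": prompt}]
    for url in video_urls:
        content.append({"type": "video_url", "video_url": {"url": url}})
    return {"role": "user", "content": content}


# Usage with the client from the examples above:
# completion = client.chat.completions.create(
#     model="YOUR_ENDPOINT_ID",
#     messages=[build_video_message("What's in this video?", [video_url])],
# )
```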

### Audio and Speech

Convert audio files to text and perform various AI tasks with Friendli's audio capabilities.

#### Representative Models

We support various trending audio models including:

* **Whisper Large V3**
* **Qwen2-Audio**
* **Ultravox**
* [See all audio models](https://friendli.ai/models?input=AUDIO\&output=TEXT)

#### Supported Audio Formats

Our platform supports a wide range of audio formats compatible with the **librosa library**:

* **MP3** (.mp3)
* **WAV** (.wav)
* **FLAC** (.flac)
* **OGG** (.ogg)
* And many other standard audio formats

#### API Usage

By default, audio input is limited to 30 seconds. To enable longer audio inputs, please [contact us](mailto:support@friendli.ai).

<CodeGroup>
  ```bash curl theme={null}
  curl -X POST https://api.friendli.ai/dedicated/v1/audio/transcriptions \
    -H "Authorization: Bearer $API_KEY" \
    -H 'Content-Type: multipart/form-data' \
    -F file=@/path/to/audio/file.mp3 \
    -F model="YOUR_ENDPOINT_ID"
  ```

  ```python Friendli Python SDK theme={null}
  from friendli import SyncFriendli
  import os

  with SyncFriendli(
      token=os.getenv("API_KEY"),
  ) as friendli:
      # Open the file in a context manager so the handle is closed after the upload.
      with open("/path/to/file/audio.mp3", "rb") as audio_file:
          transcription = friendli.dedicated.audio.transcriptions.create(
              model="YOUR_ENDPOINT_ID",
              file=audio_file,
          )

      print(transcription.text)
  ```

  ```python OpenAI Python SDK theme={null}
  from openai import OpenAI
  import os

  client = OpenAI(
      base_url="https://api.friendli.ai/dedicated/v1",
      api_key=os.getenv("API_KEY"),
  )

  # Open the file in a context manager so the handle is closed after the upload.
  with open("/path/to/file/audio.mp3", "rb") as audio_file:
      transcription = client.audio.transcriptions.create(
          model="YOUR_ENDPOINT_ID",
          file=audio_file,
      )

  print(transcription.text)
  ```
</CodeGroup>
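Because audio input is limited to 30 seconds by default, it can help to check a file's duration before uploading it. The sketch below does this for WAV files using only the standard library `wave` module. The names `MAX_SECONDS`, `wav_duration_seconds`, and `is_within_limit` are hypothetical. Other formats such as MP3 or FLAC would need an audio library (for example librosa) to read their duration.

```python
import wave

MAX_SECONDS = 30.0  # default audio input limit stated above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_within_limit(path: str, max_seconds: float = MAX_SECONDS) -> bool:
    """Check whether a WAV file fits within the given duration limit."""
    return wav_duration_seconds(path) <= max_seconds
```

This is a client-side pre-check only. How the service itself measures the limit is not specified here.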

### API References

For detailed API specifications, refer to:

* [Image Generation API Reference](/openapi/dedicated/inference/image-generations)
* [Image/Video/Audio Understanding API Reference](/openapi/dedicated/inference/chat-completions)
* [Audio Transcriptions API Reference](/openapi/dedicated/inference/audio-transcriptions)
