# Inference with gRPC

> Run a gRPC inference server with Friendli Container and send requests using the Friendli Python SDK. Includes setup, configuration, and code examples.

This guide walks you through running a gRPC inference server with Friendli Container and interacting with it through the `friendli` SDK.

## Prerequisites

Install the `friendli` package, which provides the gRPC client SDK:

```sh theme={null}
pip install friendli
```

<Note>
  Ensure you have the `friendli` SDK version `1.4.1` or higher installed.
</Note>
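To confirm that the installed version meets this requirement, you can run a small check such as the sketch below. It uses only the standard library's `importlib.metadata`. `parse_version` and `meets_minimum` are hypothetical helpers written for this guide, and the parser compares only the numeric release segment, so pre-release or post-release suffixes are ignored.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Minimum SDK version required by this guide.
MIN_VERSION = (1, 4, 1)


def parse_version(text: str) -> tuple[int, ...]:
    """Return the numeric release segment, e.g. "1.4.1" -> (1, 4, 1).

    Parsing stops at the first component without leading digits, and trailing
    non-digit characters in a component are dropped, so "1.4.1rc1" -> (1, 4, 1).
    """
    parts: list[int] = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: tuple[int, ...] = MIN_VERSION) -> bool:
    """Compare release segments element by element using tuple ordering."""
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    try:
        installed = version("friendli")  # distribution name used by `pip install friendli`
    except PackageNotFoundError:
        print("friendli is not installed; run `pip install friendli`.")
    else:
        if meets_minimum(installed):
            print(f"friendli {installed} meets the minimum version.")
        else:
            print(f"friendli {installed} is too old; run `pip install -U friendli`.")
```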

## Starting Friendli Container with gRPC

To run Friendli Container with a gRPC server for completions, add the `--grpc true` option to the container arguments.
The server uses response-streaming gRPC, returning each completion as a stream of chunks. You can send requests to it with the `friendli` SDK.
To start Friendli Container with gRPC support, run the following command:

```sh theme={null}
export FRIENDLI_CONTAINER_SECRET="YOUR_FRIENDLI_CONTAINER_SECRET_flc_XXX"

# e.g. Running `NousResearch/Hermes-3-Llama-3.1-8B` on GPU 0 with a trial image.
docker run --gpus '"device=0"' -p 8000:8000 \
  -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  registry.friendli.ai/trial:latest  \
  --hf-model-name NousResearch/Hermes-3-Llama-3.1-8B \
  --grpc true
```

<Info>
  You can change the server port with the `--web-server-port` argument. If you do, update the container-side port in the `-p` option of `docker run` to match.
</Info>
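If you launch the container from a script, it helps to derive the published port and the server port from one value. The sketch below rebuilds the `docker run` command from this guide in Python. `build_docker_command` is a hypothetical helper, and it assumes that `--web-server-port` also sets the port the gRPC server listens on, as the note above describes for the server port.

```python
from __future__ import annotations

import os
import shlex


def build_docker_command(
    model: str,
    port: int = 8000,
    gpu: str = "0",
    image: str = "registry.friendli.ai/trial:latest",
) -> list[str]:
    """Assemble the `docker run` argv used in this guide.

    The same `port` is used for both sides of `-p` and for `--web-server-port`,
    so the published port always matches the server port.
    """
    hf_cache = os.path.expanduser("~/.cache/huggingface")
    return [
        "docker", "run",
        "--gpus", f'"device={gpu}"',  # same value as '"device=0"' in the shell example
        "-p", f"{port}:{port}",
        # With no "=value", Docker copies the variable from the current environment.
        "-e", "FRIENDLI_CONTAINER_SECRET",
        "-v", f"{hf_cache}:/root/.cache/huggingface",
        image,
        "--hf-model-name", model,
        "--web-server-port", str(port),
        "--grpc", "true",
    ]


if __name__ == "__main__":
    cmd = build_docker_command("NousResearch/Hermes-3-Llama-3.1-8B", port=8080)
    print(shlex.join(cmd))  # inspect first, then run with subprocess.run(cmd, check=True)
```

Running the printed command starts the container the same way as the shell example, except that requests must then target port 8080 instead of 8000.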

## Sending Requests with the Client SDK

Here is how to use the `friendli` SDK to interact with the gRPC server.
This example assumes that the gRPC server is running on `0.0.0.0:8000`.

<CodeGroup>
  ```python Default theme={null}
  from friendli import SyncFriendli

  client = SyncFriendli()

  stream = client.container.chat.complete(
      messages=[
          {"content": "You are a helpful assistant.", "role": "system"},
          {"content": "Hello!", "role": "user"},
      ],
      stream=True,  # Must be True: the gRPC server streams its responses
      top_k=1,
  )

  for chunk in stream:
      print(chunk.text, end="", flush=True)
  ```

  ```python Async theme={null}
  # For asynchronous operations, use the following code snippet:

  import asyncio
  from friendli import AsyncFriendli

  client = AsyncFriendli()

  async def run():
      stream = await client.container.chat.complete(
          messages=[
              {"content": "You are a helpful assistant.", "role": "system"},
              {"content": "Hello!", "role": "user"},
          ],
          stream=True,  # Must be True: the gRPC server streams its responses
          top_k=1,
      )

      async for chunk in stream:
          print(chunk.text, end="", flush=True)

  asyncio.run(run())
  ```
</CodeGroup>
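Both examples print chunks as they arrive. If you need the full reply as a single string, you can accumulate the chunks instead. The sketch below assumes, as the examples above do, that each streamed chunk exposes a `.text` attribute. `collect_text` and `collect_text_async` are hypothetical helpers, and the demo uses stand-in chunks so it runs without a server.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterable, Iterable
from types import SimpleNamespace


def collect_text(stream: Iterable) -> str:
    """Join the `.text` of every chunk from a synchronous stream."""
    # `or ""` guards against chunks whose text is None (an assumption, not documented behavior).
    return "".join(chunk.text or "" for chunk in stream)


async def collect_text_async(stream: AsyncIterable) -> str:
    """Join the `.text` of every chunk from a stream consumed with `async for`."""
    parts = []
    async for chunk in stream:
        parts.append(chunk.text or "")
    return "".join(parts)


# Stand-in chunks; with a live server, pass the object returned by
# client.container.chat.complete(..., stream=True) instead.
fake_chunks = [SimpleNamespace(text=t) for t in ("Hel", "lo", "!")]


async def fake_async_stream():
    for chunk in fake_chunks:
        yield chunk


print(collect_text(fake_chunks))  # Hello!
print(asyncio.run(collect_text_async(fake_async_stream())))  # Hello!
```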

## Properly Closing the Client

By default, the library closes the underlying HTTP and gRPC connections when the `client` is garbage-collected.
To close them deterministically, call the `.close()` method on the `SyncFriendli` or `AsyncFriendli` client, or use the client as a context manager (`with` or `async with`) so that it is closed when the block exits.

<CodeGroup>
  ```python Default theme={null}
  from friendli import SyncFriendli

  client = SyncFriendli()

  with client:
      stream = client.container.chat.complete(
          messages=[
              {"content": "You are a helpful assistant.", "role": "system"},
              {"content": "Hello!", "role": "user"},
          ],
          stream=True,  # Must be True: the gRPC server streams its responses
          top_k=1,
          min_tokens=10,
      )

      for chunk in stream:
          print(chunk.text, end="", flush=True)
  ```

  ```python Async theme={null}
  import asyncio
  from friendli import AsyncFriendli

  client = AsyncFriendli()

  async def run():
      async with client:
          stream = await client.container.chat.complete(
              messages=[
                  {"content": "You are a helpful assistant.", "role": "system"},
                  {"content": "Hello!", "role": "user"},
              ],
              stream=True,  # Must be True: the gRPC server streams its responses
              top_k=1,
          )

          async for chunk in stream:
              print(chunk.text, end="", flush=True)

  asyncio.run(run())
  ```
</CodeGroup>
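Leaving a `with` block closes the client much like calling `.close()` in a `finally` clause: the close runs whether the block finishes normally or raises. The sketch below demonstrates that guarantee with `DemoClient`, a hypothetical stand-in that only records whether `close()` ran, so it runs without a server or the `friendli` package.

```python
class DemoClient:
    """Hypothetical stand-in for SyncFriendli that records whether close() ran."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "DemoClient":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()  # runs on normal exit and when an exception escapes the block
        return False  # do not suppress the exception


def close_explicitly() -> DemoClient:
    client = DemoClient()
    try:
        pass  # send requests and consume the stream here
    finally:
        client.close()  # manual alternative to `with client:`
    return client


def close_on_error() -> DemoClient:
    client = DemoClient()
    try:
        with client:
            raise RuntimeError("simulated failure while streaming")
    except RuntimeError:
        pass  # the exception propagates, but the client is already closed
    return client


print(close_explicitly().closed, close_on_error().closed)  # True True
```

For `AsyncFriendli`, `async with` provides the same guarantee through the asynchronous context manager protocol (`__aexit__`).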
