
Kong AI Gateway + Llama 2


Lately I've been switching between multiple LLMs during development, so to manage multi-LLM setups and workflows I'm trying out Kong AI Gateway, starting with Llama 2 in a local environment.

Environment

TL;DR

1. Setting up Ollama

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama --net kong-net ollama/ollama 
Unable to find image 'ollama/ollama:latest' locally
latest: Pulling from ollama/ollama
c61820c274a0: Pull complete 
f40883caaaf0: Pull complete 
1e4c8a53437a: Pull complete 
Digest: sha256:8850b8b33936b9fb246e7b3e02849941f1151ea847e5fb15511f17de9589aea1
Status: Downloaded newer image for ollama/ollama:latest
57fcaf1525f8359e9560be5a12bbfd06d224989f9c4765c4a70efdfc2e542c3a

2. Installing Llama 2

docker exec -it ollama ollama run llama2
pulling manifest 
pulling 8934d96d3f08: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 3.8 GB
pulling 8c17c2ebb0ea: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 7.0 KB
pulling 7c23fb36d801: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏   59 B
pulling fa304d675061: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏   91 B
pulling 42ba7f8a01dd: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  557 B
verifying sha256 digest
writing manifest
success
curl http://localhost:11434/api/tags \
   -H "Content-Type: application/json"
{"models":[{"name":"llama2:latest","model":"llama2:latest","modified_at":"2025-12-04T20:24:30.409111008Z","size":3826793677,"digest":"78e26419b4469263f75331927a00a0284ef6544c1975b826b15abdaef17bb962","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"7B","quantization_level":"Q4_0"}}]}%

3. Setting up Kong Gateway

git clone https://github.com/Kong/docker-kong 
Cloning into 'docker-kong'...
remote: Enumerating objects: 4024, done.
remote: Counting objects: 100% (372/372), done.
remote: Compressing objects: 100% (146/146), done.
remote: Total 4024 (delta 313), reused 233 (delta 226), pack-reused 3652 (from 4)
Receiving objects: 100% (4024/4024), 24.62 MiB | 18.06 MiB/s, done.
Resolving deltas: 100% (2047/2047), done.
cd ./docker-kong/compose/
docker compose up -d
WARN[0000] ~/workspaces/docker-kong/compose/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion 
[+] Running 4/4
 ✔ kong Pulled                                                                                                                                                                                                   6.3s
   ✔ bdaec291fa41 Pull complete                                                                                                                                                                                  0.7s
   ✔ c58fa5fef022 Pull complete                                                                                                                                                                                  0.9s
   ✔ f8382b8ad185 Pull complete                                                                                                                                                                                  3.4s
[+] Running 3/3
 ✔ Volume compose_kong_prefix_vol  Created                                                                                                                                                                       0.0s
 ✔ Volume compose_kong_tmp_vol     Created                                                                                                                                                                       0.0s
 ✔ Container compose-kong-1        Started                                                                                                                                                                       0.4s
open http://localhost:8002

[Screenshot: Kong Manager OSS]

4. Configuring Gateway Services, Routes, and the AI Proxy Plugin

  # ./docker-kong/compose/config/kong.yaml
  # a very minimal declarative config file
- _format_version: "2.1"
+ _format_version: "3.0"
  _transform: true

+ services:
+   - name: llama2-api
+     url: http://host.docker.internal:11434
+
+ routes:
+   - name: llama2-api_llama2-chat
+     service: llama2-api
+     paths:
+       - "~/llama2/chat$"
+   - name: llama2-api_llama2-completions
+     service: llama2-api
+     paths:
+       - "~/llama2/completions$"
+
+ plugins:
+   - name: ai-proxy
+     route: llama2-api_llama2-chat
+     config:
+       route_type: llm/v1/chat
+       model:
+         provider: llama2
+         name: llama2
+         options:
+           llama2_format: ollama
+           upstream_url: http://host.docker.internal:11434/api/chat
+   - name: ai-proxy
+     route: llama2-api_llama2-completions
+     config:
+       route_type: llm/v1/completions
+       model:
+         provider: llama2
+         name: llama2
+         options:
+           llama2_format: ollama
+           upstream_url: http://host.docker.internal:11434/api/generate
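The declarative config above can also be applied at runtime through Kong's Admin API (`POST /plugins` on the default `:8001` port). A sketch of the equivalent ai-proxy payloads in Python; the field names mirror the `kong.yaml` above, and the helper function name is just for illustration:

```python
# Builds an ai-proxy plugin payload equivalent to the declarative
# kong.yaml above, suitable for POSTing to Kong's Admin API.
def ai_proxy_payload(route_name: str, route_type: str, upstream_path: str) -> dict:
    return {
        "name": "ai-proxy",
        "route": {"name": route_name},
        "config": {
            "route_type": route_type,
            "model": {
                "provider": "llama2",
                "name": "llama2",
                "options": {
                    "llama2_format": "ollama",
                    "upstream_url": f"http://host.docker.internal:11434{upstream_path}",
                },
            },
        },
    }

chat = ai_proxy_payload("llama2-api_llama2-chat", "llm/v1/chat", "/api/chat")
completions = ai_proxy_payload(
    "llama2-api_llama2-completions", "llm/v1/completions", "/api/generate")
# POST each payload to http://localhost:8001/plugins
# with Content-Type: application/json
```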
docker compose restart
WARN[0000] ~/workspaces/docker-kong/compose/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion 
[+] Restarting 1/1
 ✔ Container compose-kong-1  Started
open http://localhost:8002

[Screenshot: Kong Manager OSS]

5. Verification

curl -s -X POST localhost:8000/llama2/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": "What is Kong Gateway?"
    }]
  }'
{"model":"llama2","created":1764883688,"object":"chat.completion","choices":[{"finish_reason":"stop","message":{"role":"assistant","content":"Kong Gateway is an open-source service mesh that enables developers to manage and secure microservices in modern, distributed applications. It provides a centralized management platform for service discovery, traffic routing, monitoring, and security, allowing developers to build and operate complex, scalable systems with ease.\n\nKong Gateway is designed to work with popular containerization platforms like Docker and Kubernetes, as well as traditional infrastructure environments. It supports a wide range of protocols and data formats, including HTTP, gRPC, WebSocket, and more.\n\nKey features of Kong Gateway include:\n\n1. Service Discovery: Kong Gateway provides a flexible service discovery mechanism that allows services to be easily discovered and accessed by other services or clients.\n2. Traffic Routing: Kong Gateway can route traffic between services based on configuration rules, such as URL mapping, path mapping, and more.\n3. Monitoring: Kong Gateway provides built-in monitoring capabilities for services, including metrics and logs, allowing developers to track performance and troubleshoot issues.\n4. Security: Kong Gateway supports a wide range of security features, including authentication, authorization, rate limiting, and more, to protect services from unauthorized access and abuse.\n5. Integration with Other Tools: Kong Gateway can be integrated with other tools and platforms, such as Kubernetes, Docker, and Prometheus, to provide a comprehensive service management solution.\n\nOverall, Kong Gateway is a powerful tool for managing and securing microservices in modern applications, providing a centralized platform for service discovery, traffic routing, monitoring, and security."},"index":0}],"usage":{"completion_tokens":348,"total_tokens":374,"prompt_tokens":26}}%

References