LM Studio REST API (beta) | LM Studio Docs

Experimental

Requires LM Studio 0.3.6 or newer. The API is still under development, and endpoints are subject to change.

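In addition to its OpenAI compatibility mode, LM Studio provides its own REST API.

The REST API reports enhanced statistics such as tokens per second and time to first token (TTFT), along with rich information about each model: whether it is loaded, its maximum context length, its quantization, and more.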

Supported API endpoints

GET /api/v0/models - List available models
GET /api/v0/models/{model} - Get info about a specific model
POST /api/v0/chat/completions - Chat completions (messages → assistant response)
POST /api/v0/completions - Text completions (prompt → completion)
POST /api/v0/embeddings - Text embeddings (text → embedding vector)

🚧 This interface is under active development. Let us know which features matter to you via GitHub or email.

Start the REST API server

To start the server, run the following command:

lms server start

Pro Tip

You can run LM Studio as a service so that the server starts automatically without launching the GUI. See the headless mode documentation for details.

Endpoints

`GET /api/v0/models`

List all loaded and downloaded models.

Example request

curl http://localhost:1234/api/v0/models

Response format

{
  "object": "list",
  "data": [
    {
      "id": "qwen2-vl-7b-instruct",
      "object": "model",
      "type": "vlm",
      "publisher": "mlx-community",
      "arch": "qwen2_vl",
      "compatibility_type": "mlx",
      "quantization": "4bit",
      "state": "not-loaded",
      "max_context_length": 32768
    },
    {
      "id": "meta-llama-3.1-8b-instruct",
      "object": "model",
      "type": "llm",
      "publisher": "lmstudio-community",
      "arch": "llama",
      "compatibility_type": "gguf",
      "quantization": "Q4_K_M",
      "state": "not-loaded",
      "max_context_length": 131072
    },
    {
      "id": "text-embedding-nomic-embed-text-v1.5",
      "object": "model",
      "type": "embeddings",
      "publisher": "nomic-ai",
      "arch": "nomic-bert",
      "compatibility_type": "gguf",
      "quantization": "Q4_0",
      "state": "not-loaded",
      "max_context_length": 2048
    }
  ]
}
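The listing above can also be consumed programmatically. The following is a minimal sketch, not an official client: it assumes the server is reachable at `http://localhost:1234`, and the helper names (`fetch_models`, `group_by_type`, `loaded_ids`) are hypothetical. The `"loaded"` state value is also an assumption, since the sample response only shows `"not-loaded"`.

```python
import json
import urllib.request

# Assumption: the server listens on the default local address.
BASE_URL = "http://localhost:1234"


def fetch_models(base_url=BASE_URL):
    """Call GET /api/v0/models and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v0/models", timeout=10) as resp:
        return json.load(resp)


def group_by_type(listing):
    """Map each model "type" (e.g. llm, vlm, embeddings) to a list of model ids."""
    groups = {}
    for model in listing.get("data", []):
        groups.setdefault(model["type"], []).append(model["id"])
    return groups


def loaded_ids(listing):
    """Return ids of models whose state is "loaded" (assumed state value)."""
    return [m["id"] for m in listing.get("data", []) if m.get("state") == "loaded"]
```

With the server running, `group_by_type(fetch_models())` would group the model ids from the response by their `type` field.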

`GET /api/v0/models/{model}`

Get info about one specific model.

Example request

curl http://localhost:1234/api/v0/models/qwen2-vl-7b-instruct

Response format

{
  "id": "qwen2-vl-7b-instruct",
  "object": "model",
  "type": "vlm",
  "publisher": "mlx-community",
  "arch": "qwen2_vl",
  "compatibility_type": "mlx",
  "quantization": "4bit",
  "state": "not-loaded",
  "max_context_length": 32768
}
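One possible use of the `max_context_length` field is to check whether a request would fit in the model's context window before sending it. The sketch below is illustrative only: `model_url`, `fetch_model_info`, and `fits_context` are hypothetical helpers, the base URL is assumed, and the token counts must come from the caller.

```python
import json
import urllib.parse
import urllib.request


def model_url(model_id, base_url="http://localhost:1234"):
    """Build the URL for GET /api/v0/models/{model} (base URL is an assumption)."""
    return f"{base_url}/api/v0/models/{urllib.parse.quote(model_id)}"


def fetch_model_info(model_id, base_url="http://localhost:1234"):
    """Fetch the info object for one model and return it as a dict."""
    with urllib.request.urlopen(model_url(model_id, base_url), timeout=10) as resp:
        return json.load(resp)


def fits_context(model_info, prompt_tokens, max_new_tokens):
    """True if prompt plus requested output fits within max_context_length."""
    return prompt_tokens + max_new_tokens <= model_info["max_context_length"]
```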

`POST /api/v0/chat/completions`

Chat Completions API. Provide an array of messages and receive the next assistant response in the chat.

Example request

curl http://localhost:1234/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite-3.0-2b-instruct",
    "messages": [
      { "role": "system", "content": "Always answer in rhymes." },
      { "role": "user", "content": "Introduce yourself." }
    ],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": false
  }'

Response format

{
  "id": "chatcmpl-i3gkjwthhw96whukek9tz",
  "object": "chat.completion",
  "created": 1731990317,
  "model": "granite-3.0-2b-instruct",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Greetings, I'm a helpful AI, here to assist,\nIn providing answers, with no distress.\nI'll keep it short and sweet, in rhyme you'll find,\nA friendly companion, all day long you'll bind."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 53,
    "total_tokens": 77
  },
  "stats": {
    "tokens_per_second": 51.43709529007664,
    "time_to_first_token": 0.111,
    "generation_time": 0.954,
    "stop_reason": "eosFound"
  },
  "model_info": {
    "arch": "granite",
    "quant": "Q4_K_M",
    "format": "gguf",
    "context_length": 4096
  },
  "runtime": {
    "name": "llama.cpp-mac-arm64-apple-metal-advsimd",
    "version": "1.3.0",
    "supported_formats": ["gguf"]
  }
}
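The same request can be issued from Python, and the extra `stats` block read back alongside the reply. This is a minimal sketch using only the standard library; the function names are hypothetical, the base URL is assumed, and the payload simply mirrors the fields of the curl example above.

```python
import json
import urllib.request


def build_chat_request(model, messages, base_url="http://localhost:1234",
                       temperature=0.7, max_tokens=-1):
    """Build a POST request for /api/v0/chat/completions (non-streaming)."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,  # -1 mirrors the curl example above
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/v0/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)


def summarize(response):
    """Pick the assistant text and the performance stats out of a response."""
    stats = response["stats"]
    return {
        "text": response["choices"][0]["message"]["content"],
        "tokens_per_second": stats["tokens_per_second"],
        "time_to_first_token": stats["time_to_first_token"],
        "stop_reason": stats["stop_reason"],
    }
```

With a model available on the server, `summarize(send(build_chat_request("granite-3.0-2b-instruct", [{"role": "user", "content": "Introduce yourself."}])))` would return the reply text together with the reported statistics.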

`POST /api/v0/completions`

Text Completions API. Provide a prompt and receive a completion.

Example request

curl http://localhost:1234/api/v0/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite-3.0-2b-instruct",
    "prompt": "the meaning of life is",
    "temperature": 0.7,
    "max_tokens": 10,
    "stream": false,
    "stop": "\n"
  }'

Response format

{
  "id": "cmpl-p9rtxv6fky2v9k8jrd8cc",
  "object": "text_completion",
  "created": 1731990488,
  "model": "granite-3.0-2b-instruct",
  "choices": [
    {
      "index": 0,
      "text": " to find your purpose, and once you have",
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 9,
    "total_tokens": 14
  },
  "stats": {
    "tokens_per_second": 57.69230769230769,
    "time_to_first_token": 0.299,
    "generation_time": 0.156,
    "stop_reason": "maxPredictedTokensReached"
  },
  "model_info": {
    "arch": "granite",
    "quant": "Q4_K_M",
    "format": "gguf",
    "context_length": 4096
  },
  "runtime": {
    "name": "llama.cpp-mac-arm64-apple-metal-advsimd",
    "version": "1.3.0",
    "supported_formats": ["gguf"]
  }
}
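In the sample response above, `finish_reason` is `"length"` because generation stopped at the `max_tokens` limit rather than at the stop string. A caller might want to detect that case. The sketch below is one way to do so; `completion_payload` and `was_truncated` are hypothetical helper names, and the payload fields are taken from the curl example.

```python
import json


def completion_payload(model, prompt, max_tokens=10, stop="\n", temperature=0.7):
    """Return the JSON body for POST /api/v0/completions as bytes."""
    payload = {
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": False,
        "stop": stop,
    }
    return json.dumps(payload).encode("utf-8")


def was_truncated(response):
    """True if any choice ended because the token limit was reached."""
    return any(c.get("finish_reason") == "length" for c in response["choices"])


def completion_text(response):
    """Return the generated text of the first choice."""
    return response["choices"][0]["text"]
```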

`POST /api/v0/embeddings`

Text Embeddings API. Provide a text and receive an embedding vector that represents it.

Example request

curl http://127.0.0.1:1234/api/v0/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-nomic-embed-text-v1.5",
    "input": "Some text to embed"
  }'

Example response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        -0.016731496900320053,
        0.028460891917347908,
        -0.1407836228609085,
        ... (truncated for brevity) ...,
        0.02505224384367466,
        -0.0037634256295859814,
        -0.04341062530875206
      ],
      "index": 0
    }
  ],
  "model": "text-embedding-nomic-embed-text-v1.5@q4_k_m",
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
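Embedding vectors are typically compared with cosine similarity. As an illustration only, the sketch below pulls the vectors out of a response shaped like the one above and compares two of them; `extract_vectors` and `cosine_similarity` are hypothetical helpers, not part of the LM Studio API.

```python
import math


def extract_vectors(response):
    """Return the embedding vectors from a response, ordered by "index"."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

Vectors from two separate requests could be compared by passing each response through `extract_vectors` and then calling `cosine_similarity` on the results.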

Please report bugs by opening an issue on GitHub.
