LM Studio REST API (beta)

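Experimental

Requires LM Studio 0.3.6 or newer. This API is a work in progress, and its endpoints may change.

In addition to its OpenAI compatibility mode, LM Studio now has its own REST API.

The REST API reports enhanced statistics such as tokens per second and time to first token (TTFT), along with rich information about each model: whether it is loaded or not loaded, its maximum context length, its quantization, and more.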

Supported API Endpoints

GET /api/v0/models - List available models
GET /api/v0/models/{model} - Get info about a specific model
POST /api/v0/chat/completions - Chat completions (messages → assistant response)
POST /api/v0/completions - Text completions (prompt → completion)
POST /api/v0/embeddings - Text embeddings (text → embedding)

🚧 This interface is still under development. Please share your feedback on GitHub or by email.

Start the REST API server

To start the server, run the following command:

lms server start

Pro Tip

You can run LM Studio as a service so that the server starts automatically at boot without launching the GUI. Learn about headless mode.

Endpoints

`GET /api/v0/models`

List all loaded and downloaded models

Example request

curl http://localhost:1234/api/v0/models

Response format

{
  "object": "list",
  "data": [
    {
      "id": "qwen2-vl-7b-instruct",
      "object": "model",
      "type": "vlm",
      "publisher": "mlx-community",
      "arch": "qwen2_vl",
      "compatibility_type": "mlx",
      "quantization": "4bit",
      "state": "not-loaded",
      "max_context_length": 32768
    },
    {
      "id": "meta-llama-3.1-8b-instruct",
      "object": "model",
      "type": "llm",
      "publisher": "lmstudio-community",
      "arch": "llama",
      "compatibility_type": "gguf",
      "quantization": "Q4_K_M",
      "state": "not-loaded",
      "max_context_length": 131072
    },
    {
      "id": "text-embedding-nomic-embed-text-v1.5",
      "object": "model",
      "type": "embeddings",
      "publisher": "nomic-ai",
      "arch": "nomic-bert",
      "compatibility_type": "gguf",
      "quantization": "Q4_0",
      "state": "not-loaded",
      "max_context_length": 2048
    }
  ]
}
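As a rough illustration of how a client might consume this listing, the Python sketch below groups model ids by their `type` field and selects ids by `state`. The helper names (`fetch_models`, `group_by_type`, `ids_with_state`) and the `http://localhost:1234` base URL are assumptions made for this example, not part of the API.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_models(base_url="http://localhost:1234"):
    """GET /api/v0/models and return the decoded JSON body.

    Assumes an LM Studio server is reachable at base_url.
    """
    with urllib.request.urlopen(f"{base_url}/api/v0/models") as resp:
        return json.load(resp)


def group_by_type(listing):
    """Map each model "type" (e.g. "llm", "vlm", "embeddings") to its ids."""
    groups = defaultdict(list)
    for model in listing.get("data", []):
        groups[model["type"]].append(model["id"])
    return dict(groups)


def ids_with_state(listing, state):
    """Return the ids of models whose "state" field equals the given value."""
    return [m["id"] for m in listing.get("data", []) if m.get("state") == state]
```

With a server running, `group_by_type(fetch_models())` would give a quick overview of what is available, and `ids_with_state(listing, "not-loaded")` would list the models that are downloaded but not currently loaded.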

`GET /api/v0/models/{model}`

Get info about a specific model

Example request

curl http://localhost:1234/api/v0/models/qwen2-vl-7b-instruct

Response format

{
  "id": "qwen2-vl-7b-instruct",
  "object": "model",
  "type": "vlm",
  "publisher": "mlx-community",
  "arch": "qwen2_vl",
  "compatibility_type": "mlx",
  "quantization": "4bit",
  "state": "not-loaded",
  "max_context_length": 32768
}
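A minimal Python sketch of the same lookup follows. It builds the per-model URL and formats the fields shown in the response above into one line. The function names and the default base URL are hypothetical, and percent-encoding the model id is an assumption about what the server accepts; for the ids used on this page the encoding changes nothing.

```python
import json
import urllib.request
from urllib.parse import quote


def model_url(model_id, base_url="http://localhost:1234"):
    """Build the URL for GET /api/v0/models/{model}."""
    # Assumption: the id is sent as a single percent-encoded path segment.
    return f"{base_url}/api/v0/models/{quote(model_id, safe='')}"


def fetch_model(model_id, base_url="http://localhost:1234"):
    """Fetch and decode the info object for one model (needs a running server)."""
    with urllib.request.urlopen(model_url(model_id, base_url)) as resp:
        return json.load(resp)


def describe(info):
    """Format the fields from the sample response as a one-line summary."""
    return (
        f"{info['id']} [{info['type']}, {info['quantization']}, "
        f"{info['compatibility_type']}] state={info['state']} "
        f"ctx={info['max_context_length']}"
    )
```

For example, `describe(fetch_model("qwen2-vl-7b-instruct"))` would print a compact summary of the model when the server is running.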

`POST /api/v0/chat/completions`

Chat completions API. You provide an array of messages and receive the next assistant response in the chat.

Example request

curl http://localhost:1234/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite-3.0-2b-instruct",
    "messages": [
      { "role": "system", "content": "Always answer in rhymes." },
      { "role": "user", "content": "Introduce yourself." }
    ],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": false
  }'

Response format

{
  "id": "chatcmpl-i3gkjwthhw96whukek9tz",
  "object": "chat.completion",
  "created": 1731990317,
  "model": "granite-3.0-2b-instruct",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Greetings, I'm a helpful AI, here to assist,\nIn providing answers, with no distress.\nI'll keep it short and sweet, in rhyme you'll find,\nA friendly companion, all day long you'll bind."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 53,
    "total_tokens": 77
  },
  "stats": {
    "tokens_per_second": 51.43709529007664,
    "time_to_first_token": 0.111,
    "generation_time": 0.954,
    "stop_reason": "eosFound"
  },
  "model_info": {
    "arch": "granite",
    "quant": "Q4_K_M",
    "format": "gguf",
    "context_length": 4096
  },
  "runtime": {
    "name": "llama.cpp-mac-arm64-apple-metal-advsimd",
    "version": "1.3.0",
    "supported_formats": ["gguf"]
  }
}
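The request and response above could be handled from Python roughly as follows. This is a sketch using only the standard library: `build_chat_request`, `send`, and `reply_and_stats` are hypothetical helper names, and the base URL is assumed to be `http://localhost:1234`. The payload mirrors the curl example, and the parser reads only fields that appear in the sample response.

```python
import json
import urllib.request


def build_chat_request(model, system_prompt, user_prompt,
                       base_url="http://localhost:1234", temperature=0.7):
    """Prepare a POST /api/v0/chat/completions request (nothing is sent yet)."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
        "max_tokens": -1,
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/v0/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def reply_and_stats(response):
    """Return (assistant text, tokens_per_second, time_to_first_token)."""
    message = response["choices"][0]["message"]
    stats = response.get("stats", {})
    return (
        message["content"],
        stats.get("tokens_per_second"),
        stats.get("time_to_first_token"),
    )
```

A call such as `reply_and_stats(send(build_chat_request("granite-3.0-2b-instruct", "Always answer in rhymes.", "Introduce yourself.")))` would then return the reply together with the two performance figures.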

`POST /api/v0/completions`

Text completions API. You provide a prompt and receive a completion.

Example request

curl http://localhost:1234/api/v0/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite-3.0-2b-instruct",
    "prompt": "the meaning of life is",
    "temperature": 0.7,
    "max_tokens": 10,
    "stream": false,
    "stop": "\n"
  }'

Response format

{
  "id": "cmpl-p9rtxv6fky2v9k8jrd8cc",
  "object": "text_completion",
  "created": 1731990488,
  "model": "granite-3.0-2b-instruct",
  "choices": [
    {
      "index": 0,
      "text": " to find your purpose, and once you have",
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 9,
    "total_tokens": 14
  },
  "stats": {
    "tokens_per_second": 57.69230769230769,
    "time_to_first_token": 0.299,
    "generation_time": 0.156,
    "stop_reason": "maxPredictedTokensReached"
  },
  "model_info": {
    "arch": "granite",
    "quant": "Q4_K_M",
    "format": "gguf",
    "context_length": 4096
  },
  "runtime": {
    "name": "llama.cpp-mac-arm64-apple-metal-advsimd",
    "version": "1.3.0",
    "supported_formats": ["gguf"]
  }
}
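In the sample response above, `finish_reason` is `"length"` alongside a `stop_reason` of `"maxPredictedTokensReached"`, and the usage counts add up (5 + 9 = 14). A client might check both, as in the Python sketch below. The function name `summarize_completion` is hypothetical, and treating `"length"` as "the token limit was hit" is an inference from this one sample rather than a documented guarantee.

```python
def summarize_completion(response):
    """Pull the completion text and two sanity checks out of a response dict."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "text": choice["text"],
        # Inferred from the sample: "length" appears when max_tokens cut generation short.
        "hit_token_limit": choice["finish_reason"] == "length",
        # In both samples on this page, prompt + completion tokens equal the total.
        "usage_adds_up": (
            usage["prompt_tokens"] + usage["completion_tokens"]
            == usage["total_tokens"]
        ),
    }
```

If `hit_token_limit` is true, the caller could retry with a larger `max_tokens` or treat the text as a partial result.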

`POST /api/v0/embeddings`

Text embeddings API. You provide text and receive a representation of that text as an embedding vector.

Example request

curl http://127.0.0.1:1234/api/v0/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-nomic-embed-text-v1.5",
    "input": "Some text to embed"
  }'

Example response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        -0.016731496900320053,
        0.028460891917347908,
        -0.1407836228609085,
        ... (truncated for brevity) ...,
        0.02505224384367466,
        -0.0037634256295859814,
        -0.04341062530875206
      ],
      "index": 0
    }
  ],
  "model": "text-embedding-nomic-embed-text-v1.5@q4_k_m",
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
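A common use of embedding vectors is to compare two texts by cosine similarity. The Python sketch below shows one way to do that with vectors taken from two responses of the shape shown above. The helper names are hypothetical, and the sketch assumes each response's `data` entries carry `embedding` and `index` fields as in the example.

```python
import math


def embeddings_in_order(response):
    """Return the embedding vectors from a response, sorted by their "index"."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Given two decoded responses `r1` and `r2`, `cosine_similarity(embeddings_in_order(r1)[0], embeddings_in_order(r2)[0])` yields a value between -1 and 1, where higher values indicate more similar texts under the embedding model.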

To report a bug, please open an issue on GitHub.