Trying a local deployment of the microsoft/autogen project

I was a bit surprised when I read the official overview. It feels more like a distributed cluster of LLMs: one agent hands tasks down to several other agents, but unlike in the past, the agents at every level can communicate in both directions. That actually makes the word "level" I just used inaccurate; it is more like a team of peers.

A quick introduction to the project

Project repository: microsoft/autogen

Overview from the project README:

This project is a spin-off of FLAML.

AutoGen has since graduated from FLAML and become a project of its own.

AutoGen is a framework for developing LLM applications using multiple agents that converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human input, and tools.

AutoGen makes it possible to build next-generation LLM applications with minimal effort. It simplifies the orchestration, automation, and optimization of complex LLM workflows, maximizing the performance of LLM models and overcoming their weaknesses.

It supports diverse conversation patterns for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns, varying in conversation autonomy, number of agents, and agent conversation topology.

It provides a collection of working systems of varying complexity. These systems span a wide range of applications across domains and levels of difficulty, demonstrating how easily AutoGen supports diverse conversation patterns.

AutoGen provides an enhanced inference API as a drop-in replacement for openai.Completion or openai.ChatCompletion. It allows easy performance tuning, utilities such as API unification and caching, and advanced usage patterns such as error handling, multi-config inference, and context programming.

AutoGen is powered by collaborative research from Microsoft, Penn State University, and the University of Washington.

(screenshot 20231019125906)

I then followed the official documentation to complete the deployment with a local model (docs link).

Local deployment

Original text:

Clone FastChat

FastChat provides OpenAI-compatible APIs for its supported models, so you can use FastChat as a local drop-in replacement for OpenAI APIs. However, its code needs minor modification in order to function properly.

git clone https://github.com/lm-sys/FastChat.git
cd FastChat

Download checkpoint

ChatGLM-6B is an open bilingual language model based on General Language Model (GLM) framework, with 6.2 billion parameters. ChatGLM2-6B is its second-generation version.

Before downloading from HuggingFace Hub, you need to have Git LFS installed.

git clone https://huggingface.co/THUDM/chatglm2-6b
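
If Git LFS is not set up yet, a typical one-time setup looks like the following (the apt package name assumes a Debian/Ubuntu machine and is not part of the quoted guide; use your platform's package manager otherwise):

sudo apt-get install git-lfs
git lfs install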

Initiate server

First, launch the controller

python -m fastchat.serve.controller

Then, launch the model worker(s)

python -m fastchat.serve.model_worker --model-path chatglm2-6b

Finally, launch the RESTful API server

python -m fastchat.serve.openai_api_server --host localhost --port 8000

Normally this will work. However, if you encounter an error like this, commenting out all the lines containing finish_reason in fastchat/protocol/api_protocol.py and fastchat/protocol/openai_api_protocol.py will fix the problem. The modified code looks like:

class CompletionResponseChoice(BaseModel):
    index: int
    text: str
    logprobs: Optional[int] = None
    # finish_reason: Optional[Literal["stop", "length"]]

class CompletionResponseStreamChoice(BaseModel):
    index: int
    text: str
    logprobs: Optional[float] = None
    # finish_reason: Optional[Literal["stop", "length"]] = None
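
Before pointing AutoGen at the server, a quick sanity check (my own addition, not part of the quoted guide) is to query the OpenAI-compatible endpoints directly; the model list should include the worker registered above, e.g. chatglm2-6b:

curl http://localhost:8000/v1/models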

Interact with model using oai.Completion

Now the models can be directly accessed through the openai-python library as well as autogen.oai.Completion and autogen.oai.ChatCompletion.

from autogen import oai

# create a text completion request
response = oai.Completion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL", # just a placeholder
        }
    ],
    prompt="Hi",
)
print(response)

# create a chat completion request
response = oai.ChatCompletion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        }
    ],
    messages=[{"role": "user", "content": "Hi"}]
)
print(response)

If you would like to switch to different models, download their checkpoints and specify model path when launching model worker(s).

Interacting with multiple local LLMs

If you would like to interact with multiple LLMs on your local machine, replace the model_worker step above with a multi model variant:

python -m fastchat.serve.multi_model_worker \
    --model-path lmsys/vicuna-7b-v1.3 \
    --model-names vicuna-7b-v1.3 \
    --model-path chatglm2-6b \
    --model-names chatglm2-6b

The inference code would be:

from autogen import oai

# create a chat completion request
response = oai.ChatCompletion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        },
        {
            "model": "vicuna-7b-v1.3",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        }
    ],
    messages=[{"role": "user", "content": "Hi"}]
)
print(response)

To sum up, ChatGLM2 is simply exposed as an API endpoint that AutoGen can call; by the same reasoning, any model wrapped as an OpenAI-compatible API can be made available to AutoGen.
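
Beyond the raw oai.Completion / oai.ChatCompletion calls above, the same config_list can be handed to AutoGen's agent classes. Here is a minimal sketch of my own (assuming the pyautogen package of that era; the agent names and parameters are illustrative, and code execution is disabled to keep it simple):

from autogen import AssistantAgent, UserProxyAgent

# reuse the local ChatGLM2 endpoint exposed by FastChat
config_list = [
    {
        "model": "chatglm2-6b",
        "api_base": "http://localhost:8000/v1",
        "api_type": "open_ai",
        "api_key": "NULL",  # placeholder; the local server ignores it
    }
]

# assistant agent backed by the local model
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})

# user proxy that never asks for human input, does not execute code,
# and stops after the assistant's first reply
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=0,
)

user_proxy.initiate_chat(assistant, message="Say hi and briefly introduce yourself.")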

Putting it into practice

Install FastChat

(screenshot 20231019131637)

Install serve

(screenshot 20231019131730)
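
The install commands are only shown in the screenshots; they roughly correspond to installing FastChat from the cloned source tree together with its serving extras (the extras names follow FastChat's README and are an assumption about what was actually run here):

pip3 install --upgrade pip
pip3 install -e ".[model_worker,webui]"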

Next, start the local LLM.

Since my model path is /home/kers/lab/transformers/models/THUDM/chatglm2-6b/, the launch command is:

python -m fastchat.serve.model_worker --model-path /home/kers/lab/transformers/models/THUDM/chatglm2-6b/

Running into a problem

(screenshot 20231019132416: the error, with nvidia-smi output)

An error was reported: RuntimeError: The NVIDIA driver on your system is too old (found version 11020). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

In other words, my NVIDIA driver is too old for this PyTorch build. In the screenshot I printed the relevant information with nvidia-smi; it shows CUDA version 11.2, so let's update the driver.

(screenshot 20231019132554)

This driver version is 12.2, currently the latest.

(screenshot 20231019140738)

The update is complete.
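
After updating, a quick way to confirm that the installed PyTorch build and the new driver agree (my own sanity check, not output from the tools above) is:

import torch

# CUDA version this PyTorch wheel was built against
print(torch.__version__, torch.version.cuda)
# True once the driver is new enough for that build
print(torch.cuda.is_available())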