Bringing LLMs into an Application: One-Shot Output and Streaming Output (Chain of Thought)

篁瞑普 · 2025-6-1 18:24:28

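When building an application on a large model, some scenarios call for returning the result in one shot, as the prompt requires, while others call for showing the whole reasoning process followed by the final result.
The articles I found online about this all disagreed with one another, so I tried it myself. The code below is what worked, recorded here for reference.
One-shot output: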

from openai import OpenAI


def generate_huoshan(prompt):
    client = OpenAI(
        # Ark (Volcengine) API key, redacted here; in practice, read it from an environment variable
        api_key="**",
        base_url="https://ark.cn-beijing.volces.com/api/v3",
        # Deep-reasoning models can take a long time; set a generous timeout (30 minutes is recommended)
        timeout=1800,
    )
    response = client.chat.completions.create(
        model="deepseek-r1-250120",
        messages=[
            {"role": "system", "content": "You are a professional market research assistant who needs to accurately obtain retail price information for specified electronic products in a specific market"},
            {"role": "user", "content": prompt},
        ],
        max_tokens=1024,
        temperature=0.6,
        stream=False,
    )
    # With stream=False the complete answer arrives in a single response object
    answer = response.choices[0].message.content
    return answer.strip()

Streaming output:

from openai import OpenAI


def generate_huoshan(prompt):
    client = OpenAI(
        api_key="*",
        base_url="https://ark.cn-beijing.volces.com/api/v3",
        # Deep-reasoning models can take a long time; set a generous timeout (30 minutes is recommended)
        timeout=1800,
    )
    response = client.chat.completions.create(
        model="deepseek-r1-250120",
        messages=[
            {"role": "system", "content": "You are a professional market research assistant who needs to accurately obtain retail price information for specified electronic products in a specific market"},
            {"role": "user", "content": prompt},
        ],
        max_tokens=1024,
        temperature=0.6,
        stream=True,
    )
    for chunk in response:
        # Skip any chunk that carries no choices, so the index below cannot fail
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # Extract chain-of-thought content first
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            yield reasoning
            # print(f"[Reasoning] {reasoning}", end="\n", flush=True)
        # Then handle the final answer content
        elif delta.content:
            yield delta.content
            # print(f"[Final answer] {delta.content}", end="", flush=True)
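The generator above yields reasoning text and answer text through the same channel, so the caller cannot tell them apart. A minimal sketch of one way to keep that distinction, assuming the delta objects expose `reasoning_content` and `content` as in the code above. `tag_delta` and the `"reasoning"` / `"answer"` labels are hypothetical names for illustration, not part of any SDK:

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def tag_delta(delta) -> Optional[Tuple[str, str]]:
    """Return (kind, text) for one streamed delta, or None if it carries no text."""
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        return ("reasoning", reasoning)
    content = getattr(delta, "content", None)
    if content:
        return ("answer", content)
    return None


# Stand-in deltas for illustration; real ones come from chunk.choices[0].delta.
demo = [
    SimpleNamespace(reasoning_content="Compare prices...", content=None),
    SimpleNamespace(content="The price is 999."),
    SimpleNamespace(content=None),
]
tagged = [tag_delta(d) for d in demo]
```

Yielding the `(kind, text)` pair instead of the bare string would let the outer layer put the kind into its JSON payload, so the front end can render the two parts differently.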

The outer layer returns the stream like this:

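# Fragment from inside a view function. Response(..., mimetype=...) matches Flask's
# signature, so Flask is assumed here; it needs `import json` and `from flask import Response`.
# generate_text is not shown in this post; presumably it dispatches to a generator
# such as generate_huoshan above.
def generate_stream():
    try:
        for chunk in generate_text(model_name, prompt):
            # yield chunk
            yield json.dumps({"msg": "Success", "code": 200, "data": chunk}) + '\n'
    except Exception as e:
        yield json.dumps({"code": 500, "message": str(e)}) + '\n'  # Yield a JSON string
headers = {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    # Ask nginx not to buffer the response, so chunks reach the client immediately
    'X-Accel-Buffering': 'no',
}
return Response(generate_stream(), mimetype='text/event-stream', headers=headers)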
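The wrapper above declares `text/event-stream` but writes newline-delimited JSON rather than SSE-framed messages. That works when the front end reads the response body by hand, but a browser `EventSource` expects each message as `data:` lines followed by a blank line. A minimal sketch of that framing, in case it is wanted; `sse_event` is a hypothetical helper, not something from the original code:

```python
import json


def sse_event(payload: dict) -> str:
    """Frame one JSON payload as a Server-Sent Events message."""
    # An SSE message is one or more "data:" lines terminated by a blank line.
    # json.dumps emits no raw newlines by default, so one data line is enough.
    return "data: " + json.dumps(payload) + "\n\n"


frame = sse_event({"msg": "Success", "code": 200, "data": "hello"})
```

Inside `generate_stream`, the `json.dumps(...) + '\n'` expressions would be replaced by calls to such a helper. Whether to do so depends on how the front end consumes the stream.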

The front end then only needs to parse the stream accordingly.
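The parsing logic on the consuming side can be sketched as follows. A real front end would most likely be JavaScript; the same idea is shown in Python here to match the rest of the post. It assumes the server sends one JSON object per line, as the wrapper above does, and that network chunks may split a line at any point. `iter_json_lines` is a hypothetical name:

```python
import json
from typing import Iterable, Iterator


def iter_json_lines(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Reassemble newline-delimited JSON objects from arbitrary byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A network chunk may hold several lines, or only part of one.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    if buffer.strip():
        # Trailing object without a final newline.
        yield json.loads(buffer)


# Simulated network chunks that split one JSON line in the middle.
demo_chunks = [
    b'{"msg": "Success", "code": 200, "da',
    b'ta": "Hel"}\n{"msg": "Success", "code": 200, "data": "lo"}\n',
]
text = "".join(
    item["data"] for item in iter_json_lines(demo_chunks) if item.get("code") == 200
)
```

Buffering until a newline arrives is the important part: parsing each network chunk as if it were a complete JSON object fails as soon as a chunk boundary falls inside a line.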

