From Language Models to Language Agents

https://www.bilibili.com/video/BV1ju4y1e7Em/?vd_source=ae5444af4da3a86008dab51f4f1a3a37

https://ysymyth.github.io/papers/from_language_models_to_language_agents.pdf

Language Model

Great, but with several disadvantages:
● Stateless
● Ungrounded: no interaction with the world
● Limited knowledge: its knowledge is finite, largely because it is ungrounded

Language Agent

Pasted image 20241013085724.png

Part I: what internal mechanisms are needed?

Mechanism 1: Reasoning (ReAct)

Thought: do both reasoning and acting, similar to human behavior.
Plain RL (acting only) performs poorly because the agent does not know why it behaves the way it does (lack of reasoning).
Pasted image 20241013160103.png
Generalization UP
Alignment UP
Because the model learns the underlying reasoning, it learns the fundamentals rather than surface behavior.
Pasted image 20241013160310.png
Acting is essentially tool use, and humans work the same way.
The idea of ReAct: since acting and reasoning complement each other, use them together.
Pasted image 20241013160531.png

ReAct: Overview

Tasks: Question answering, Fact verification, Text game, Web Interaction (works very well across these tasks)
Learning: prompting / finetuning
Model: PaLM-540B / GPT-3
Synergy: Reasoning guides acting, acting supports reasoning
Benefits of ReAct:
○ Flexibility: diverse reasoning / interactive tasks
○ Generalization: strong few-shot / fine-tuning performances
○ Alignment: the human way of problem solving!

How to realize ReAct?

Pasted image 20241013161349.png
The prompt interleaves reasoning -> acting -> reasoning -> acting -> ..., just as a person would work through the problem.
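A minimal sketch of this loop, assuming caller-supplied placeholder functions `llm` (text-in, text-out completion) and `run_tool` (executes an action string such as Search[query] and returns an observation); this is an illustration, not the paper's actual implementation:

```python
# Minimal ReAct loop sketch: interleave Thought / Action / Observation steps.
# `llm` and `run_tool` are assumed placeholders supplied by the caller.

def react(question: str, llm, run_tool, max_steps: int = 8) -> str:
    prompt = f"Question: {question}\n"
    for step in range(1, max_steps + 1):
        # Reasoning: free-form thought about what to do next.
        thought = llm(prompt + f"Thought {step}:")
        prompt += f"Thought {step}: {thought}\n"

        # Acting: a structured action, e.g. Search[...] or Finish[answer].
        action = llm(prompt + f"Action {step}:")
        prompt += f"Action {step}: {action}\n"

        if action.startswith("Finish["):
            return action[len("Finish["):-1]  # extract the final answer

        # Grounding: execute the action and feed the observation back.
        observation = run_tool(action)
        prompt += f"Observation {step}: {observation}\n"

    return "no answer within the step budget"
```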

Without Act: Misinformation

Pasted image 20241013161606.png
Hallucination appears: the model starts making things up.

ReAct: Interpretable, Factual

Pasted image 20241013161755.png
Again, the design imitates how humans solve problems.

Act Only: Unable to Synthesize Final Answer

Pasted image 20241013161845.png

ALFWorld Example

Pasted image 20241013162107.png

ALFWorld Example: Reasoning is Key to Long-horizon Acting

Pasted image 20241013162144.png

Pasted image 20241013162153.png
Comparing the two, adding reasoning improves performance substantially. Don't the "think" steps read like a human's inner monologue?

ALFWorld Example: Human-in-the-loop Control

Pasted image 20241013162440.png

Pasted image 20241013162448.png
Manually editing the agent's reasoning can also correct its acting.

Finetuning > Prompting

Pasted image 20241013162650.png

ReAct: Summary

Pasted image 20241013162750.png
Wouldn't it be even better with a longer-term, more manipulable memory? This idea leads to the next mechanism.

Mechanism 2: Learning

(Reinforcement) Learning
Behavior -> Feedback -> Update -> Better Behavior

Learning: Feedback

Again, think about how a human performs this task.
Pasted image 20241013163405.png
Verbal feedback is far richer than a 0/1 right-or-wrong signal, and it is closer to how humans learn.

Learning: Update

Pasted image 20241013163541.png
Traditional Q-learning: the update is a parameter (value) update.
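For reference, the textbook tabular Q-learning update this refers to (standard form, not copied from the slide):

$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
$$

where $\alpha$ is the learning rate and $\gamma$ the discount factor.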

GPT-4's parameters cannot be modified by the user, so this kind of update is not available.

Reflexion: “Verbal” RL

Pasted image 20241013163958.png
We will look at one example and skip the rest; they all show very strong performance.
Pasted image 20241013164422.png
Experience must be compressed into knowledge before being given to GPT; only then does it learn. It has a hard time learning from the earlier raw text.
Pasted image 20241013164917.png

Pasted image 20241013164940.png
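A minimal sketch of the Reflexion ("verbal RL") loop, assuming caller-supplied placeholders `run_episode` (runs one attempt conditioned on past reflections and returns the trajectory text plus a success flag) and `llm` (text-in, text-out completion); not the authors' actual code:

```python
# Minimal Reflexion loop sketch: verbal feedback is compressed into short
# reflections that persist in memory across trials.

def reflexion(task: str, run_episode, llm, max_trials: int = 3) -> bool:
    memory: list[str] = []  # long-term memory of verbal reflections

    for _ in range(max_trials):
        # Act: one episode, conditioned on all lessons learned so far.
        trajectory, success = run_episode(task, reflections=memory)
        if success:
            return True

        # "Verbal" update: instead of a scalar reward and a gradient step,
        # compress the failed trajectory into a short lesson stored as text.
        lesson = llm(
            "The attempt below failed. In one or two sentences, explain what "
            "went wrong and what to do differently next time.\n"
            f"Task: {task}\nTrajectory:\n{trajectory}"
        )
        memory.append(lesson)

    return False
```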

Reflexion: Summary

Pasted image 20241013165027.png
Reflection works very well, but implementing it purely through prompting feels like a transitional solution; a better, more general approach may exist.

What is learned on one task only helps that task; it does not transfer to other tasks!

The structure of memory is a promising research direction!

The human brain encodes and retrieves knowledge, consolidating what it learns into itself, whereas GPT can only learn within the prompt and cannot modify its parameters.

Can we learn through prompting and update parameters at the same time?

reflection -> train the LM -> interact to acquire knowledge -> compress the knowledge into a small model (sketched below)
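A hedged sketch of that idea: turn the reflective agent's interaction traces into supervised data and fine-tune a smaller model. Every helper here (`collect_trajectories`, `to_finetune_examples`, `finetune`) is a hypothetical placeholder, not an existing API:

```python
# Sketch of the pipeline above: reflection -> train the LM -> interact to
# acquire knowledge -> compress it into a small model's parameters.
# All helper functions are hypothetical placeholders.

def distill_agent(tasks, reflective_agent, small_model):
    # 1) Let the large, reflection-capable agent interact with the environment.
    trajectories = collect_trajectories(reflective_agent, tasks)

    # 2) Keep successful (or reflected-and-corrected) episodes and turn them
    #    into (prompt, target) supervised examples.
    examples = to_finetune_examples([t for t in trajectories if t.success])

    # 3) Compress the interaction knowledge into the small model's parameters.
    return finetune(small_model, examples)
```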

Mechanism 3: Planning

ToT (Tree of Thoughts)
Pasted image 20241013235115.png
Hallucination: GPT can only predict later text from the preceding text, so if the first step is already wrong, everything that follows begins to hallucinate.
Pasted image 20241013235502.png

Pasted image 20241013235647.png

1. What is a thought?

Pasted image 20241014000440.png

2. How to generate thoughts?

Pasted image 20241014000810.png

Pasted image 20241014000817.png

3. How to evaluate thoughts?

Pasted image 20241014000829.png

Pasted image 20241014000838.png

Pasted image 20241014001003.png
Next come several tasks; see the original paper for those.
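A minimal sketch of Tree-of-Thoughts style breadth-first search over thoughts: generate several candidate next thoughts per state, score them with the LM, and keep only the most promising partial solutions. `propose_thoughts` and `score_thought` are assumed, LM-backed placeholders, not the paper's exact code:

```python
# Minimal Tree-of-Thoughts BFS sketch: generate, evaluate, and search over
# partial chains of thoughts instead of committing to one left-to-right sample.

def tot_bfs(problem: str, propose_thoughts, score_thought,
            depth: int = 3, breadth: int = 5, keep: int = 2) -> str:
    frontier = [""]  # each state is the chain of thoughts generated so far

    for _ in range(depth):
        # Generate: expand every state with several candidate next thoughts.
        candidates = [
            state + thought + "\n"
            for state in frontier
            for thought in propose_thoughts(problem, state, n=breadth)
        ]

        # Evaluate + search: score each candidate and keep only the most
        # promising ones (beam search), allowing wrong first steps to be pruned.
        candidates.sort(key=lambda s: score_thought(problem, s), reverse=True)
        frontier = candidates[:keep]

    return frontier[0]  # best chain of thoughts found
```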

ToT: Summary

Pasted image 20241014002110.png

Language Agent

Pasted image 20241014002219.png
An LLM agent can interact with the external world and also with its own memory,
whereas an RL agent can only interact with the external world.

Can a more systematic framework combine these ideas?

Part II: what external environments are needed?

Environment: how to be cheap, fast, yet useful?

Pasted image 20241014161907.png

Evaluation: how to be cheap, fast, yet high-quality?

Pasted image 20241014162036.png

Overview

Pasted image 20241014162301.png
Three interactive environments.