Extra: Reasoning and Agents
From CS224N Lecture 14: Reasoning and Agents
Reasoning (with Large Language Models)
What is Reasoning?
Using facts and logic to arrive at an answer.
Deductive Reasoning
Use logic to go from premises to a firm conclusion
Premise: All mammals have kidneys
Premise: All whales are mammals
Conclusion: All whales have kidneys
Inductive Reasoning
From observation, predict a likely conclusion
Observation: When we see a creature with wings, it is usually a bird
Observation: We see a creature with wings.
Conclusion: The creature is likely to be a bird
Abductive Reasoning
From observation, predict the most likely explanation
Observation: The car cannot start and there is a puddle of liquid under the engine.
Likely Explanation: The car has a leak in the radiator
Reasoning: Formal vs Informal
Formal Reasoning: Follows formal rules of logic along with axiomatic knowledge to derive conclusions.
Informal Reasoning: Uses intuition, experience, common sense to arrive at answers.
For most of this lecture, by “reasoning” we mean informal deductive reasoning,
often involving multiple steps
Reasoning in Large Language Models
Large language models are REALLY GOOD at predicting plausible continuations of text (Lecture 9) that respect constraints in the input (Lectures 10, 11) and align well with human preferences (Lectures 10, 11).
Chain-of-thought prompting
https://arxiv.org/pdf/2201.11903
Zero-shot CoT prompting
https://arxiv.org/pdf/2205.11916
Zero-shot (bottom left) asks the model for the answer directly, without any step-by-step reasoning prompt, which often leads to wrong answers (in the figure's example, the incorrect answer 8).
Few-shot gives the model a few examples of similar problems, so it can learn from them and generalize to the new problem.
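As a rough sketch (in Python), here is how the two prompt styles might be assembled; the word problem and the demonstration are illustrative assumptions, not taken from the slides:

```python
# Illustrative only: the example problem and the demonstration are assumptions.
QUESTION = (
    "The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. "
    "How many apples do they have?"
)

# Few-shot CoT: the demonstration includes the intermediate reasoning steps.
few_shot_cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.\n\n"
    f"Q: {QUESTION}\nA:"
)

# Zero-shot CoT: no demonstrations, just a trigger phrase that elicits step-by-step reasoning.
zero_shot_cot_prompt = f"Q: {QUESTION}\nA: Let's think step by step."
```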
CoT with "Self-consistency"
https://arxiv.org/pdf/2203.11171
Greedy decoding (top): a common decoding strategy that picks the most likely token at each step, producing a single reasoning path and hence a single solution; in the figure's example the model arrives at $14.
Self-consistency: instead of a single greedy decode, sample multiple different reasoning paths to build a diverse set of candidate answers. The paths may disagree (e.g., $18 and $26 in the figure); the final answer is then obtained by "marginalizing over" the reasoning paths, i.e., aggregating them (typically by majority vote).
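A minimal sketch of self-consistency, assuming hypothetical `generate` (samples one CoT completion) and `extract_answer` (pulls the final answer out of a rationale) callables supplied by the caller:

```python
from collections import Counter

def self_consistent_answer(prompt, generate, extract_answer, n_samples=10):
    """Sample several reasoning paths and return the majority-vote final answer."""
    answers = []
    for _ in range(n_samples):
        # Sample a diverse reasoning path (temperature > 0 rather than greedy decoding).
        rationale = generate(prompt, temperature=0.7)
        answers.append(extract_answer(rationale))
    # "Marginalize out" the reasoning paths: keep only the most frequent final answer.
    return Counter(answers).most_common(1)[0][0]
```

Majority vote over final answers is the simplest aggregation; it marginalizes over the reasoning paths in the sense described above.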
Benchmarks
"Ensemble" 是指将多次生成的不同推理路径进行组合,最后得出一个聚合的结果。
Least-to-Most prompting
https://arxiv.org/pdf/2205.10625
The core idea is to decompose a complex problem into a series of simpler subproblems and solve them one by one, improving the model's reasoning ability.
This "least-to-most" approach lets the model handle problems that are harder than the training examples; compared with standard Chain-of-Thought prompting, Least-to-Most prompting does better when the task is difficult.
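A minimal sketch of the two-stage procedure, assuming a hypothetical `generate` callable and a very naive line-based decomposition format:

```python
def least_to_most(question, generate):
    """Decompose a hard question into simpler subquestions, then solve them in order."""
    # Stage 1: ask the model to break the problem into subproblems (one per line).
    decomposition = generate(
        f"Break this problem into simpler subquestions, one per line:\n{question}"
    )
    subquestions = [line.strip() for line in decomposition.splitlines() if line.strip()]

    # Stage 2: answer each subquestion, appending earlier Q/A pairs to the context
    # so later (harder) subquestions can build on earlier answers.
    context = question
    answer = ""
    for sub_q in subquestions:
        answer = generate(f"{context}\nQ: {sub_q}\nA:")
        context += f"\nQ: {sub_q}\nA: {answer}"
    return answer  # the answer to the final subquestion answers the original problem
```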
Least-to-Most vs Chain of Thought
A comparison of the two prompting styles.
Feels a bit marginal??
Orca: Instruction-tuning small LMs with CoT Rationales
https://arxiv.org/pdf/2306.02707
- Collect a wide variety of instructions from the FLAN-v2 collection.
- Prompt GPT-4 or ChatGPT with these instructions along with a system message
- Finetune LLaMA-13B on outputs generated via ChatGPT + GPT-4; this step seems to also pass the teacher's thought process to the small model, so the small model learns to follow the large model's reasoning.
Finetuning Orca-13B: Orca's finetuning focuses on the CoT outputs of GPT-4 or ChatGPT. Its training data is not just direct question-answer pairs but also the detailed reasoning steps produced by GPT-4 or ChatGPT, which show step by step how to get from the problem to the final answer, with the emphasis on explaining the whole reasoning process.
Finetuning Vicuna-13B: Vicuna-13B, by contrast, is finetuned mainly on conversational data. It is trained on human dialogue so that it behaves more naturally in dialogue tasks. Although it also learns from large-model outputs, the focus is on conversation and interaction scenarios rather than on working through complex reasoning.
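A minimal sketch of what one Orca-style training example could look like; the field names and system message are illustrative assumptions, not the exact format from the paper:

```python
# Illustrative: build one Orca-style finetuning example from a FLAN-v2 instruction.
system_message = "You are a helpful assistant. Think step by step and justify your answer."
instruction = "..."          # an instruction sampled from the FLAN-v2 collection
teacher_response = "..."     # GPT-4 / ChatGPT output, including its step-by-step rationale

training_example = {
    "system": system_message,
    "prompt": instruction,
    # The student (LLaMA-13B) is trained on the full rationale, not just the final answer.
    "completion": teacher_response,
}
```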
Reasoning by Finetuning LMs on their own outputs?
https://arxiv.org/pdf/2312.06585
ReST-EM alternates between the following two steps (see the sketch after this list):
- Generate (E-Step): Given a reasoning problem, sample multiple solutions from the language model. Filter them with some (problem-specific) function (e.g., answer correctness for math problems)
- Improve (M-Step): Update the language model to maximize probability of filtered solutions, using supervised finetuning
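A minimal sketch of that loop, with `model.sample`, `problem.check_answer`, and `finetune` as hypothetical placeholders:

```python
def rest_em(model, problems, finetune, n_samples=8, n_iterations=3):
    """Illustrative ReST-EM loop: sample solutions, filter by correctness, finetune, repeat."""
    for _ in range(n_iterations):
        filtered = []
        # Generate (E-step): sample several solutions per problem, keep the correct ones.
        for problem in problems:
            for _ in range(n_samples):
                solution = model.sample(problem.prompt)      # hypothetical sampling API
                if problem.check_answer(solution):           # problem-specific filter
                    filtered.append((problem.prompt, solution))
        # Improve (M-step): supervised finetuning on the filtered (prompt, solution) pairs.
        model = finetune(model, filtered)                    # hypothetical finetuning helper
    return model
```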
Meaning it actually works!!
Can Language Models Reason?
CoT Rationales are often not faithful
https://arxiv.org/pdf/2307.13702
Are the rationales produced after the answer? That is, the model may just be presenting a so-called rationale to satisfy the human request, while not actually understanding anything.
Sometimes the rationale is wrong and yet the model still gets the answer right. Suspicious.
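One way such faithfulness can be probed is by truncating the rationale and checking whether the answer changes; a rough sketch with hypothetical `generate` and `extract_answer` callables (the exact perturbations in the paper differ):

```python
def truncation_sensitivity(question, rationale, generate, extract_answer):
    """Crude faithfulness probe: does the answer change when most of the CoT is removed?"""
    full = extract_answer(generate(f"{question}\n{rationale}\nSo the answer is"))
    lines = rationale.splitlines()
    truncated_rationale = lines[0] if lines else ""   # keep only the first reasoning step
    truncated = extract_answer(generate(f"{question}\n{truncated_rationale}\nSo the answer is"))
    # If the answer is identical with and without the reasoning, the rationale
    # probably was not driving the answer (i.e., it may be post-hoc).
    return full != truncated
```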
Reasoning vs Memorization: Using Counterfactuals
https://arxiv.org/pdf/2307.02477
Reasoning vs Memorization: Counterfactuals for Analogical Reasoning
https://arxiv.org/pdf/2308.16118
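The counterfactual idea can be illustrated with arithmetic in an unusual base (an assumption for illustration; the papers use a range of counterfactual task variants):

```python
def to_base(n, base):
    """Render a non-negative integer in the given base (digits 0-9 only, so base <= 10)."""
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(str(r))
        if n == 0:
            return "".join(reversed(digits))

def addition_prompt(a, b, base):
    # Default world: base 10. Counterfactual world: e.g., base 9.
    # A model that genuinely reasons about place value should handle both;
    # a model that memorized base-10 arithmetic will degrade sharply in the counterfactual setting.
    return (f"You are doing arithmetic in base {base}. "
            f"What is {to_base(a, base)} + {to_base(b, base)}? Give the answer in base {base}.")
```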
Language Model Agents
Some Terminology
An overview of the terminology.
Applications
Instruction following agents
Pre-LLMs
https://arxiv.org/pdf/1207.1420
Map instructions directly to action sequences (similar to machine translation)
https://www.cs.utexas.edu/~ml/papers/chen.aaai11.pdf
Derive structured plans from instruction-trajectory pairs
https://aclanthology.org/P09-1010.pdf
Use reinforcement learning to map instructions directly to actions
In 2024
https://arxiv.org/pdf/2106.01345
A Simple Language Model Agent with ReAct
https://arxiv.org/pdf/2210.03629
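A minimal sketch of a ReAct-style loop, with `generate`, `parse_action`, and `env.step` as hypothetical placeholders: the model interleaves free-form thoughts with actions, and each observation is appended back into the prompt.

```python
def react_agent(task, env, generate, parse_action, max_steps=10):
    """Interleave Thought / Action / Observation until the model issues a finish action."""
    prompt = f"Task: {task}\n"
    for _ in range(max_steps):
        step = generate(prompt + "Thought:")      # the model writes a thought followed by an action
        prompt += "Thought:" + step + "\n"
        action = parse_action(step)               # e.g. search[query], click[element], finish[answer]
        if action.name == "finish":
            return action.argument
        observation = env.step(action)            # execute the action in the environment
        prompt += f"Observation: {observation}\n"
    return "no answer within the step budget"
```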
Some popular benchmarks for LM agents
Training data for Language Model Agents
In-context: a small number of examples where humans demonstrate how to act on similar instructions.
Can agents explore their environments autonomously and generate high-quality synthetic demonstrations? In other words, can the AI explore on its own and generate suitable data, without relying on human-provided examples?
Use Exploration + Model Generated Data!
https://arxiv.org/pdf/2403.08140
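A rough sketch of the exploration-plus-filtering recipe, with all helpers hypothetical: the agent explores, a model relabels and filters the trajectories, and the surviving ones become synthetic demonstrations.

```python
def collect_synthetic_demos(env, policy, rollout, relabel_with_lm, judge_with_lm, n_episodes=100):
    """Explore the environment and keep model-filtered trajectories as synthetic demonstrations."""
    demos = []
    for _ in range(n_episodes):
        trajectory = rollout(policy, env)               # explore: act in the environment, record steps
        instruction = relabel_with_lm(trajectory)       # an LM writes the instruction this trajectory fulfils
        if judge_with_lm(instruction, trajectory):      # an LM filters out low-quality pairs
            demos.append({"instruction": instruction, "trajectory": trajectory})
    return demos
```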
Multimodality
So far, we’ve looked at using text-only language models for agents
This is intractable for real-world UIs with very long HTML
Can we instead operate directly over pixel space?
LLaVA
https://arxiv.org/pdf/2304.08485
Pix2Struct
https://arxiv.org/pdf/2210.03347
LM agents are an emerging application!
An LLM agent without a human prompt is basically nothing.
Long-term planning is really hard for agents.
This slide compares task success rates across models (GPT-3.5, GPT-3.5-CoT, GPT-4) under different experimental conditions. The rightmost bars show a clear improvement once more observations and an engineered action interface are used, but the result still falls well short of human performance.
A real interaction scenario.