Comparing Reasoning Patterns: OpenAI’s o1 Model vs. Test-time Compute Methods
The paper https://arxiv.org/pdf/2410.13639 examines how OpenAI's o1 model handles complex reasoning tasks, comparing it against test-time compute methods such as Best-of-N (BoN), Step-wise BoN, Agent Workflow, and Self-Refine.

The study evaluates o1 on math, coding, and commonsense reasoning tasks, aiming to characterize its reasoning patterns and potential limitations.
Key findings include:
- Top Performance: The o1 model performs best overall, especially on coding and math tasks, outperforming BoN, Step-wise BoN, and Agent Workflow.
- Domain-Specific Prompts: Agent Workflow, which uses task-specific prompts, improves reasoning over Step-wise BoN, especially on tasks that demand step-by-step planning.
- Six Reasoning Patterns Identified: Systematic Analysis, Method Reuse, Divide and Conquer, Self-Refinement, Context Identification, and Emphasizing Constraints. Divide and Conquer (DC) and Self-Refinement (SR) are the most common.
- Long-Context Inference Issues: Methods like Step-wise BoN often lose focus over long reasoning chains, underscoring how hard extended multi-step inference remains.
- Token Usage: Token consumption varies by task, with coding and math requiring more tokens than commonsense reasoning.
- Reward Model Weaknesses: Responses chosen by human judges beat those selected by reward models, showing room for improvement in the scoring used by search-based methods.
- Search Limits: Increasing the number of sampled responses in BoN helps only up to a point, after which performance plateaus or drops; reward model strength and response diversity are the real bottlenecks (see the sketch after this list).
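
To make the BoN mechanics concrete, here is a minimal Python sketch of Best-of-N sampling. The paper does not publish code, so `generate` and `reward` are toy stand-ins; a real implementation would wrap an LLM API and a learned reward model.

```python
import random

# Toy stand-ins for an LLM and a reward model; names and behavior are
# illustrative, not taken from the paper.
def generate(prompt: str) -> str:
    return f"candidate #{random.randint(0, 9999)} for: {prompt}"

def reward(prompt: str, response: str) -> float:
    # A real reward model scores response quality; random scores here
    # just keep the sketch self-contained and runnable.
    return random.random()

def best_of_n(prompt: str, n: int) -> str:
    """Best-of-N: sample n candidates, keep the highest-scoring one.

    The 'search limits' finding above: raising n helps only while the
    reward model can still separate good candidates from bad ones, so
    accuracy plateaus once its judgment becomes the bottleneck.
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))

print(best_of_n("Prove that the sum of two even numbers is even.", n=8))
```

Step-wise BoN applies the same select-the-best step repeatedly over intermediate reasoning steps, which is why it accumulates context and, per the findings above, tends to lose focus on long problems.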
The Six Reasoning Patterns
- Systematic Analysis (SA): Analyzes a problem's overall structure (inputs, outputs, and constraints) before choosing the best approach.
- Method Reuse (MR): Applies existing, well-known solutions to familiar problems (e.g., standard algorithms such as shortest path).
- Divide and Conquer (DC): Breaks a complex problem into smaller, manageable subproblems and solves them individually to build the final solution.
- Self-Refinement (SR): Iteratively reviews and refines the model's own reasoning to correct errors (see the sketch after this list).
- Context Identification (CI): Understands and summarizes the relevant context or background information needed to answer a query.
- Emphasizing Constraints (EC): Highlights and adheres to the task's specific constraints, ensuring the generated response meets strict requirements.
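
The Self-Refinement pattern (and the standalone Self-Refine baseline) boils down to a draft, critique, revise loop. Below is a minimal sketch under that assumption; `draft_answer`, `critique`, and `revise` are hypothetical stubs standing in for LLM calls.

```python
def draft_answer(prompt: str) -> str:
    return f"draft answer to: {prompt}"

def critique(prompt: str, answer: str) -> str:
    # A real system would ask the model to list concrete faults in its
    # own answer; an empty string signals the answer passed review.
    return "" if "revised" in answer else "step 2 is unjustified"

def revise(prompt: str, answer: str, feedback: str) -> str:
    return f"revised ({feedback} fixed): {answer}"

def self_refine(prompt: str, max_rounds: int = 3) -> str:
    """Iterate draft -> self-critique -> revision until the model finds
    no further faults or the round budget runs out."""
    answer = draft_answer(prompt)
    for _ in range(max_rounds):
        feedback = critique(prompt, answer)
        if not feedback:  # model judges its own answer acceptable
            break
        answer = revise(prompt, answer, feedback)
    return answer

print(self_refine("Prove that the sum of two even numbers is even."))
```

The round budget matters: unbounded refinement can oscillate, and the paper's findings suggest the loop is only as good as the model's ability to critique itself.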
Interesting notes:
- Agent Workflow improves performance by trimming unnecessary steps with domain-specific prompts (a prompt-template sketch follows these notes).
- On benchmarks like Collie, which impose strict output-format rules, o1 handles the constraints effectively and avoids format errors.
- Divide and Conquer (DC) and Self-Refinement (SR) are key in helping o1 perform better on complex tasks.
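
A rough illustration of how domain-specific prompts can trim a workflow: the templates and the `agent_workflow` helper below are invented for this sketch, not the paper's actual prompts.

```python
# Hypothetical per-domain prompt templates; the paper credits Agent
# Workflow's gains to steering the model through only the steps that
# matter for each task family.
DOMAIN_PROMPTS = {
    "math": "Restate the givens, pick a relevant theorem, then derive step by step.",
    "coding": "Clarify the I/O contract, outline the algorithm, then write the code.",
    "collie": "List every format constraint first; verify each one before answering.",
}

def agent_workflow(task: str, domain: str) -> list[str]:
    """Build the message sequence for one task: a domain-tuned system
    prompt constrains the model to the steps that domain needs."""
    system = DOMAIN_PROMPTS.get(domain, "Solve the task step by step.")
    return [f"system: {system}", f"user: {task}"]

print(agent_workflow("Sort intervals by overlap count.", "coding"))
```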
The paper concludes that o1's reasoning patterns, especially in math and coding, set it apart from simpler methods like BoN and Self-Refine. Although Agent Workflow is not as strong as o1, it delivers solid gains by steering tasks with domain-specific prompts. Future work could improve LLMs by strengthening reward models and search techniques, since weaknesses in both currently hold back the search-based methods.
Published: Oct 24, 2024 at 3:08 AM