Large language models (LLMs) have achieved impressive success across many fields, but their proficiency in understanding and resolving complex graph problems remains underexplored. To bridge this gap, we introduce GraphInstruct, a novel and comprehensive instruction-tuning dataset designed to equip language models with the ability to tackle a broad spectrum of graph problems using explicit reasoning paths. Utilizing GraphInstruct, we build GraphWiz, an open-source language model capable of solving various graph problem types while generating clear reasoning processes. To enhance the model's capability and reliability, we incorporate the Direct Preference Optimization (DPO) framework into the graph problem-solving context. The enhanced model, GraphWiz-DPO, achieves an average accuracy of 65% across nine tasks of varying complexity, surpassing GPT-4, which averages 43.8%. Moreover, we examine the delicate balance between training data volume and model performance, highlighting the risk of overfitting as data increases. We also explore the transferability of the model's reasoning ability across different graph tasks, indicating the model's adaptability and practical application potential. Our investigation offers a new blueprint and valuable insights for developing LLMs specialized in graph reasoning and problem-solving.
Figure 1: An example of solving the Connectivity task explicitly within natural language via LLMs.
Table 1: Overview of nine tasks in GraphWiz with problem definition, time complexity, graph type (weighted/directed), node size range, and task difficulty. |V| and |E| indicate the number of nodes and edges in the graph.
Figure 2: The overview of GraphInstruct collection process.
Based on the GraphInstruct dataset, we build GraphWiz, which is developed to optimize current LLMs' capabilities in solving graph problems with explicit reasoning paths. Our training methodology for GraphWiz is a two-phase process. The first phase, Mixed-Task Instruction Tuning, hones the model's ability to interpret and solve a wide range of graph problems. The second phase, DPO Alignment, further sharpens the model's reasoning by training it to distinguish between more and less effective problem-solving paths. This approach, particularly the use of DPO in the realm of graph problems, represents a pioneering step in teaching LLMs not just to output final answers, but to develop and follow a logical reasoning process that mimics expert problem-solving behavior. A minimal sketch of the DPO objective is given below.
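To make the DPO phase concrete, the snippet below sketches the standard DPO objective (Rafailov et al., 2023) applied to paired reasoning paths, where the "chosen" path reaches the correct answer and the "rejected" one does not. This is a minimal illustrative sketch: the function name, argument names, and the `beta` default are our own assumptions, not the exact implementation in this repository.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over paired reasoning paths.

    Each *_logps tensor holds the summed token log-probabilities of an
    entire reasoning path under either the trainable policy or the
    frozen reference (SFT) model. The loss pushes the policy to rank
    the correct ("chosen") path above the flawed ("rejected") one,
    while the reference log-ratios keep it close to the SFT model.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Larger margin => the chosen path is preferred more strongly
    # than it already was under the reference model.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()
```

One natural way to obtain such pairs, consistent with the description above, is to sample multiple reasoning paths per problem and label them by final-answer correctness; the better path becomes "chosen" and the flawed one "rejected".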
Figure 3: Training pipelines of GraphWiz.
Figure 4: A typical case of GraphWiz and GPT-4.
Here, we present a case study of a complex cycle detection problem involving an undirected graph with 89 nodes (Figure 4), a scale that poses substantial challenges even for human solvers. Detecting a cycle in such an extensive graph would require external aids or substantial time, as it is impractical to resolve purely through mental computation. We observe that GPT-4 outputs a very short incorrect answer, which may arise from the model's constraints in processing long inputs or from an erroneous interpretation of the graph's intricate structure. This limitation reflects the challenges that conventional LLMs encounter when adapting to graph-theoretic problems. In contrast, GraphWiz correctly detects a cycle and provides a clear, detailed reasoning path. This demonstration of GraphWiz's spatial reasoning and memory retention capabilities is significant: it indicates that the model has internalized the principles of graph theory to the extent that it can autonomously navigate and reason about large-scale, complex graph structures. This case study is a testament to the feasibility of employing GraphWiz for sophisticated graph problems. The model's ability to accurately resolve such complex tasks suggests substantial practical applications, offering a robust tool for researchers and practitioners.
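For reference, the exact procedure that such a cycle-detection query reduces to is a depth-first search with parent tracking. The sketch below is an illustrative baseline (the names and structure are ours, not the model's internal procedure), useful for verifying GraphWiz's natural-language reasoning against an exact algorithm:

```python
def has_cycle(n, edges):
    """Detect a cycle in an undirected simple graph with n nodes.

    An undirected graph contains a cycle iff DFS reaches an
    already-visited node that is not the immediate parent of the
    current node (a back edge).
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    visited = [False] * n
    for start in range(n):          # handle disconnected components
        if visited[start]:
            continue
        # Iterative DFS storing (node, parent) pairs, so the same
        # check scales to graphs far larger than the 89-node case
        # above without hitting recursion limits.
        stack = [(start, -1)]
        visited[start] = True
        while stack:
            node, parent = stack.pop()
            for nxt in adj[node]:
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, node))
                elif nxt != parent:
                    return True     # back edge to a non-parent node
    return False
```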
You may refer to related work that serves as a foundation for our framework and code repository, such as NLGraph and GraphQA. We also partially draw inspiration from MathOctopus. For more details about Graph+LLM, you are welcome to follow our survey and repository.
@article{chen2024graphwiz,
  title={GraphWiz: An Instruction-Following Language Model for Graph Problems},
  author={Nuo Chen and Yuhan Li and Jianheng Tang and Jia Li},
  journal={arXiv preprint arXiv:2402.16029},
  year={2024}
}