The anti-unification problem is concerned with finding a generalization for two input examples. The interesting generalizations are the least general ones. Informally, the problem can be described as: Given two concrete objects, find an expression that describes all their common features and uniformly represents their differences by variables. Such generalization problems arise in many areas of computer science and mathematics. An algorithm that computes least general generalizations for two input examples is called an anti-unification algorithm. In this work, we study the anti-unification problem for various theories. We develop algorithms that solve such problems, analyze their properties, study their computational complexity, demonstrate their usage on real world examples, and present a library of anti-unification algorithms which is implemented in Java.
Anti-unification algorithms are presented in a rule-based manner featuring a store that keeps track of all the differences at the input examples. Under certain circumstances, an algorithm that maintains such a store can be used to solve the matching problem as well as the problem of finding a join. Some of the algorithms developed in this work need a minimization step that requires to solve the matching problem. We show that those algorithms are self-minimizing in the sense that the matching problem can be solved by the anti-unification algorithm itself.
We develop anti-unification algorithms for unranked terms, nominal terms, and simply-typed lambda terms. For unranked terms, we consider first-order variables and higher-order variables. The anti-unification problem in such higher-order unranked theories is finitary but highly nondeterministic. Therefore, we impose a few natural restrictions on the computed generalizations so that we can establish a one-to-one correspondence between a generalization and a skeleton. (The skeleton consists of all those common parts of the input terms that are to be retained in the generalization.) A generalization can be computed in quadratic time for a given skeleton. This means that, for instance, if the skeleton is a constrained longest common subtree of the input terms, then both skeleton and generalization computation can be done in quadratic time.
In general, the problem of nominal anti-unification is of type zero, (i.e., a minimal complete set of generalizations does not always exist) but if the set of atoms permitted in generalizations is finite, then it becomes unitary. This is the anti-unification problem we consider for nominal terms. For simply-typed lambda terms, we restrict generalizations to be higher-order patterns so that a generalization computed for two arbitrary lambda terms is a unique pattern. (In higher-order patterns all the arguments of free variables are distinct bound variables.) We show an algorithm that computes it in quadratic time.
The implemented Java library provides anti-unification algorithms for all the theories discussed above. Such anti-unification problems arise, for instance, in proof generalization, in analogical reasoning, in detection of similarities in XML documents or in pieces of software code, etc. Therefore, our library can be a valuable ingredient for tools that need to solve such generalization problems.