Misplaced Pages

E-graph

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
Graph data structure

In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.

Definition and operations

Let Σ {\displaystyle \Sigma } be a set of uninterpreted functions, where Σ n {\displaystyle \Sigma _{n}} is the subset of Σ {\displaystyle \Sigma } consisting of functions of arity n {\displaystyle n} . Let i d {\displaystyle \mathbb {id} } be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of f Σ n {\displaystyle f\in \Sigma _{n}} to e-class IDs i 1 , i 2 , , i n i d {\displaystyle i_{1},i_{2},\ldots ,i_{n}\in \mathbb {id} } is denoted f ( i 1 , i 2 , , i n ) {\displaystyle f(i_{1},i_{2},\ldots ,i_{n})} and called an e-node.

The e-graph then represents equivalence classes of e-nodes, using the following data structures:

  • A union-find structure U {\displaystyle U} representing equivalence classes of e-class IDs, with the usual operations f i n d {\displaystyle \mathrm {find} } , a d d {\displaystyle \mathrm {add} } and m e r g e {\displaystyle \mathrm {merge} } . An e-class ID e {\displaystyle e} is canonical if f i n d ( U , e ) = e {\displaystyle \mathrm {find} (U,e)=e} ; an e-node f ( i 1 , , i n ) {\displaystyle f(i_{1},\ldots ,i_{n})} is canonical if each i j {\displaystyle i_{j}} is canonical ( j {\displaystyle j} in 1 , , n {\displaystyle 1,\ldots ,n} ).
  • An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
    • a hashcons H {\displaystyle H} (i.e. a mapping) from canonical e-nodes to e-class IDs, and
    • an e-class map M {\displaystyle M} that maps e-class IDs to e-classes, such that M {\displaystyle M} maps equivalent IDs to the same set of e-nodes: i , j i d , M [ i ] = M [ j ] f i n d ( U , i ) = f i n d ( U , j ) {\displaystyle \forall i,j\in \mathbb {id} ,M=M\Leftrightarrow \mathrm {find} (U,i)=\mathrm {find} (U,j)}

Invariants

In addition to the above structure, a valid e-graph conforms to several data structure invariants. Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes f ( i 1 , , i n ) , f ( j 1 , , j n ) {\displaystyle f(i_{1},\ldots ,i_{n}),f(j_{1},\ldots ,j_{n})} are congruent when f i n d ( U , i k ) = f i n d ( U , j k ) , k { 1 , , n } {\displaystyle \mathrm {find} (U,i_{k})=\mathrm {find} (U,j_{k}),k\in \{1,\ldots ,n\}} . The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.

Operations

This section needs expansion. You can help by adding to it. (June 2021)

E-graphs expose wrappers around the a d d {\displaystyle \mathrm {add} } , f i n d {\displaystyle \mathrm {find} } , and m e r g e {\displaystyle \mathrm {merge} } operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.

Equivalent formulations

An e-graph can also be formulated as a bipartite graph G = ( N i d , E ) {\displaystyle G=(N\uplus \mathrm {id} ,E)} where

  • i d {\displaystyle \mathrm {id} } is the set of e-class IDs (as above),
  • N {\displaystyle N} is the set of e-nodes, and
  • E ( i d × N ) ( N × i d ) {\displaystyle E\subseteq (\mathrm {id} \times N)\cup (N\times \mathrm {id} )} is a set of directed edges.

There is a directed edge from each e-class to each of its members, and from each e-node to each of its children.

E-matching

Let V {\displaystyle V} be a set of variables and let T e r m ( Σ , V ) {\displaystyle \mathrm {Term} (\Sigma ,V)} be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, T e r m ( Σ , V ) {\displaystyle \mathrm {Term} (\Sigma ,V)} is the smallest set such that V T e r m ( Σ , V ) {\displaystyle V\subset \mathrm {Term} (\Sigma ,V)} , Σ 0 T e r m ( Σ , V ) {\displaystyle \Sigma _{0}\subset \mathrm {Term} (\Sigma ,V)} , and when x 1 , , x n T e r m ( Σ , V ) {\displaystyle x_{1},\ldots ,x_{n}\in \mathrm {Term} (\Sigma ,V)} and f Σ n {\displaystyle f\in \Sigma _{n}} , then f ( x 1 , , x n ) T e r m ( Σ , V ) {\displaystyle f(x_{1},\ldots ,x_{n})\in \mathrm {Term} (\Sigma ,V)} . A term containing variables is called a pattern, a term without variables is called ground.

An e-graph E {\displaystyle E} represents a ground term t T e r m ( Σ , ) {\displaystyle t\in \mathrm {Term} (\Sigma ,\emptyset )} if one of its e-classes represents t {\displaystyle t} . An e-class C {\displaystyle C} represents t {\displaystyle t} if some e-node f ( i 1 , , i n ) C {\displaystyle f(i_{1},\ldots ,i_{n})\in C} does. An e-node f ( i 1 , , i n ) C {\displaystyle f(i_{1},\ldots ,i_{n})\in C} represents a term g ( j 1 , , j n ) {\displaystyle g(j_{1},\ldots ,j_{n})} if f = g {\displaystyle f=g} and each e-class M [ i k ] {\displaystyle M} represents the term j k {\displaystyle j_{k}} ( k {\displaystyle k} in 1 , , n {\displaystyle 1,\ldots ,n} ).

e-matching is an operation that takes a pattern p T e r m ( Σ , V ) {\displaystyle p\in \mathrm {Term} (\Sigma ,V)} and an e-graph E {\displaystyle E} , and yields all pairs ( σ , C ) {\displaystyle (\sigma ,C)} where σ V × i d {\displaystyle \sigma \subset V\times \mathbb {id} } is a substitution mapping the variables in p {\displaystyle p} to e-class IDs and C i d {\displaystyle C\in \mathbb {id} } is an e-class ID such that the term σ ( p ) {\displaystyle \sigma (p)} is represented by C {\displaystyle C} . There are several known algorithms for e-matching, the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.

Extraction

Given an e-class and a cost function that maps each function symbol in Σ {\displaystyle \Sigma } to a natural number, the extraction problem is to find a ground term with minimal total cost that is represented by the given e-class. This problem is NP-hard. There is also no constant-factor approximation algorithm for this problem, which can be shown by reduction from the set cover problem. However, for graphs with bounded treewidth, there is a linear-time, fixed-parameter tractable algorithm.

Complexity

  • An e-graph with n equalities can be constructed in O(n log n) time.

Equality saturation

This section needs expansion. You can help by adding to it. (June 2021)

Equality saturation is a technique for building optimizing compilers using e-graphs. It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.

Applications

E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3 and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers. In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates. E-graphs are also used in the Simplify theorem prover of ESC/Java.

Equality saturation is used in specialized optimizing compilers, e.g. for deep learning and linear algebra. Equality saturation has also been used for translation validation applied to the LLVM toolchain.

E-graphs have been applied to several problems in program analysis, including fuzzing, abstract interpretation, and library learning.

References

  1. (Willsey et al. 2021)
  2. (Willsey et al. 2021)
  3. (Goharshady, Lam & Parreaux 2024)
  4. (de Moura & Bjørner 2007)
  5. Moskal, Michał; Łopuszański, Jakub; Kiniry, Joseph R. (2008-05-06). "E-matching for Fun and Profit". Electronic Notes in Theoretical Computer Science. Proceedings of the 5th International Workshop on Satisfiability Modulo Theories (SMT 2007). 198 (2): 19–35. doi:10.1016/j.entcs.2008.04.078. ISSN 1571-0661.
  6. Zhang, Yihong; Wang, Yisu Remy; Willsey, Max; Tatlock, Zachary (2022-01-12). "Relational e-matching". Proceedings of the ACM on Programming Languages. 6 (POPL): 35:1–35:22. doi:10.1145/3498696. S2CID 236924583.
  7. Stepp, Michael Benjamin (2011). Equality saturation: engineering challenges and applications (PhD thesis). USA: University of California at San Diego. ISBN 978-1-267-03827-2.
  8. (Goharshady, Lam & Parreaux 2024)
  9. (Flatt et al. 2022, p. 2)
  10. (Tate et al. 2009)
  11. de Moura, Leonardo; Bjørner, Nikolaj (2008). "Z3: An Efficient SMT Solver". In Ramakrishnan, C. R.; Rehof, Jakob (eds.). Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science. Vol. 4963. Berlin, Heidelberg: Springer. pp. 337–340. doi:10.1007/978-3-540-78800-3_24. ISBN 978-3-540-78800-3.
  12. Rümmer, Philipp (2012). "E-Matching with Free Variables". In Bjørner, Nikolaj; Voronkov, Andrei (eds.). Logic for Programming, Artificial Intelligence, and Reasoning. Proceedings. 18th International Conference, LPAR-18, Merida, Venezuela, March 11–15, 2012. Lecture Notes in Computer Science. Vol. 7180. Berlin, Heidelberg: Springer. pp. 359–374. doi:10.1007/978-3-642-28717-6_28. ISBN 978-3-642-28717-6.
  13. (Flatt et al. 2022, p. 2)
  14. Detlefs, David; Nelson, Greg; Saxe, James B. (May 2005). "Simplify: a theorem prover for program checking". Journal of the ACM. 52 (3): 365–473. doi:10.1145/1066100.1066102. ISSN 0004-5411. S2CID 9613854.
  15. Joshi, Rajeev; Nelson, Greg; Randall, Keith (2002-05-17). "Denali: a goal-directed superoptimizer". ACM SIGPLAN Notices. 37 (5): 304–314. doi:10.1145/543552.512566. ISSN 0362-1340.
  16. Yang, Yichen; Phothilimtha, Phitchaya Mangpo; Wang, Yisu Remy; Willsey, Max; Roy, Sudip; Pienaar, Jacques (2021-03-17). "Equality Saturation for Tensor Graph Superoptimization". arXiv:2101.01332 .
  17. Wang, Yisu Remy; Hutchison, Shana; Leang, Jonathan; Howe, Bill; Suciu, Dan (2020-12-22). "SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra". arXiv:2002.07951 .
  18. Stepp, Michael; Tate, Ross; Lerner, Sorin (2011). "Equality-Based Translation Validator for LLVM". In Gopalakrishnan, Ganesh; Qadeer, Shaz (eds.). Computer Aided Verification. Lecture Notes in Computer Science. Vol. 6806. Berlin, Heidelberg: Springer. pp. 737–742. doi:10.1007/978-3-642-22110-1_59. ISBN 978-3-642-22110-1.
  19. "Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs (EGRAPHS 2022) - PLDI 2022". pldi22.sigplan.org. Retrieved 2023-02-03.
  20. Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-03-17). "Abstract Interpretation on E-Graphs". arXiv:2203.09191 .
    Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-05-30). "Combining E-Graphs with Abstract Interpretation". arXiv:2205.14989 .
  21. Cao, David; Kunkel, Rose; Nandi, Chandrakana; Willsey, Max; Tatlock, Zachary; Polikarpova, Nadia (2023-01-09). "babble: Learning Better Abstractions with E-Graphs and Anti-Unification". Proceedings of the ACM on Programming Languages. 7 (POPL): 396–424. arXiv:2212.04596. doi:10.1145/3571207. ISSN 2475-1421. S2CID 254536022.

External links

Program analysis
Key concepts A simple control-flow graph
Semantics
Types
Models
Analyses
Static
Dynamic
Formal methods
Concepts
Logics
Data structures
Tools
Constraint solvers
Lightweight
Proof assistants
Category: