A Trace-based Framework for Comparing String Matchers
Table of Contents
1 About
This page publishes files associated with my master's thesis. My thesis presents a trace-based framework that can compose string matchers and can compare matchers using three methods:
- Identify matcher
- Build table grouping trace-equivalent matchers
- Build an evolutionary tree over a set of string matchers
- (And an extra method not mentioned in the thesis) Calculate the average comparisons to text length.
The framework comes with string matchers generated by itself and string matchers from The Handbook of Exact String Matching Algorithms by Charras and Lecroq.
The trace-based framework is implemented in Scheme using the interpreter version 2.0.0.
2 News
- 1 Mar 2012: Uploaded dissertation and version 1.1 (includes my work with the Zhu-Takoaka string matcher)
- 24 Oct 2011: Evolutionary trees do not work in the Windows version. Please use the Linux version.
- 16 Sep 2011: Mentioned GNU Guile and Charras and Lecroq's Handbook of Exact String Matching Algorithms.
- 15 Sep 2011: Added abstract and more figures.
- 1 Sep 2011: Initial page
3 Thesis abstract
Many different string matchers exist, but not so much exists on how these string matchers differ from each other. We present a trace-based framework that reveals these differences through various methods of comparing string matchers. These methods help us investigate, understand, and build string matchers.
We introduce three methods of comparing string matchers: (1) identifying a single string matcher by finding a known string matcher that is trace-equivalent with the single string matcher; (2) grouping a set of string matchers into trace-equivalent groups; and (3, our original contribution) building an evolutionary tree over string matchers. We present our results from applying these methods on traced versions of known string matchers, string matchers from published papers, and matchers generated using our framework by combining string-matching concepts in various ways.
Furthermore, we describe the work leading up to our framework. We define string-matching algorithms such as Knuth, Morris and Pratt's. We describe string-matching concepts and the specializing of string matchers as introduced by Futamura. We describe the composition of string matchers by combining string-matching concepts like Queinnec and Geffroy's string-matching framework. Finally, we describe how to compare string matchers by their traces on a set of input, as introduced in Rohde's string-matching framework.
Finally, from our experiences with working on string matching, we introduce a new approach to the understanding of string-matching algorithms. The traditional approach consists of understanding string-matching algorithms one by one. In contrast, our new approach focuses on understanding string-matching concepts and how to combine these basic concepts to compose string matchers. The new approach uses our methods to understand the similarities and differences of composed, specialized, and known string matchers.
4 Dissertation
5 Slides from the presentation
6 Download framework
- trace-based framework 1.1 linux 32bit
- trace-based framework 1.0 linux 32bit
- trace-based framework 1.0 windows 32bit (Warning: building evolutionary trees does not work in this version)
7 Figures
- Evolutionary trees
- Over chosen string matchers
- Over all our string matchers
- Tables separating string matchers
- Identifying string matchers from the published papers
- knuth-morris-pratt-SIAM77-abcabcacab
- consel-danvy-IPL89-naive-approach
- consel-danvy-IPL89-still-naive-approach
- consel-danvy-IPL89-still-naive-ababc
- consel-danvy-IPL89-further-optimization
- consel-danvy-IPL89-further-optimization-abcabcacab
- queinnec-geffroy-WSA92-babar
- queinnec-geffroy-WSA92-foo
- soerensen-al-JFP96-fig-4-aab
- soerensen-al-JFP96-fig-11-aab
- soerensen-al-JFP96-fig-18
- amtoft-al-Jones02-left-to-right
- amtoft-al-Jones02-fig-2-aaa
- amtoft-al-Jones02-right-to-left
- amtoft-al-Jones02-fig-3-abb
- amtoft-al-Jones02-fig-3-abb-prune-duplicates
- ager-al-2002-ASIA-PEPM02-fig-3
- ager-al-2002-ASIA-PEPM02-fig-6
- ager-al-TOPLAS06-fig-1
- ager-al-TOPLAS06-fig-3
- ager-al-TOPLAS06-fig-4
- danvy-rohde-IPL06-sec-2
- danvy-rohde-IPL06-sec-3
- danvy-rohde-IPL06-sec-3-aba
- danvy-rohde-IPL06-sec-4
- Average trace lengths over text lengths