Staged Grammars
We have implemented grammars that are “staged”: a dataset is first parsed with regular expressions (REs), context-free grammars, or a combination of both, and the results are then re-parsed from a database or file with more computationally expensive algorithms.
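To make the idea concrete, here is a minimal Python sketch of a two-stage parse. It is not the grammar described in this post. The target structure is assumed, purely for illustration, to be a non-empty run of balanced parentheses, which is a context-free language that no regular expression can match exactly. Stage 1 is a single linear regex scan that over-approximates the target. Stage 2 runs the CYK algorithm, which is cubic in candidate length, on each candidate using a small grammar in Chomsky normal form. `staged_parse`, `cyk_accepts`, and the grammar tables are hypothetical names.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical target: non-empty balanced parentheses (context-free, not regular).
# Grammar in Chomsky normal form: S -> S S | L R | L A,  A -> S R,  L -> '(',  R -> ')'
BINARY_RULES: List[Tuple[str, str, str]] = [
    ("S", "S", "S"),
    ("S", "L", "R"),
    ("S", "L", "A"),
    ("A", "S", "R"),
]
TERMINAL_RULES: Dict[str, Set[str]] = {"(": {"L"}, ")": {"R"}}

# Stage 1: cheap, linear-time over-approximation (any run of 2+ bracket characters).
CANDIDATE_RE = re.compile(r"[()]{2,}")


def cyk_accepts(word: str, start: str = "S") -> bool:
    """Stage 2: CYK recognition, O(n^3) in the candidate length n."""
    n = len(word)
    if n == 0:
        return False
    # table[i][l] holds the nonterminals that derive word[i : i + l + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(TERMINAL_RULES.get(ch, set()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = table[i][length - 1]
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, b, c in BINARY_RULES:
                    if b in left and c in right:
                        cell.add(lhs)
    return start in table[0][n - 1]


def staged_parse(text: str) -> List[str]:
    """Run the expensive stage-2 parser only on the stage-1 candidates."""
    return [m.group(0) for m in CANDIDATE_RE.finditer(text) if cyk_accepts(m.group(0))]
```

For example, `staged_parse("ab(()) x )( y ()() z ((")` scans the whole string once, then runs CYK on only four short candidates and keeps `"(())"` and `"()()"`. The payoff comes from stage 1 discarding most of the input, so the cubic stage never sees the full dataset.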
Example: I parsed a 22,000,000-character file for structures that may or may not contain a specific secondary structure, and wrote the matches to a text file, which was then automatically re-parsed for a very specific structure. This second file in the hierarchy was approximately 100,000 characters long.
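The file hand-off in this example can be sketched roughly as follows. This is a minimal sketch, not the code used for the run above. It assumes that no structure spans a line break, so stage 1 can stream the large file line by line. It writes one match per line to the intermediate file and uses a hypothetical palindrome check as a stand-in for the expensive stage-2 parser. `stage1_to_file`, `stage2_from_file`, and `demo` are illustrative names.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable, List


def stage1_to_file(src: Path, dst: Path, pattern: re.Pattern) -> int:
    """Stream the large file line by line and write every match to dst, one per line.

    Assumes no target structure spans a line break.
    Returns the number of candidates written.
    """
    count = 0
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            for m in pattern.finditer(line):
                fout.write(m.group(0) + "\n")
                count += 1
    return count


def stage2_from_file(path: Path, is_specific: Callable[[str], bool]) -> List[str]:
    """Re-read the much smaller intermediate file and apply the expensive check."""
    with path.open() as fin:
        candidates = [line.rstrip("\n") for line in fin]
    return [c for c in candidates if is_specific(c)]


def demo() -> List[str]:
    # Hypothetical data: stage 1 keeps lowercase words, stage 2 keeps palindromes.
    with tempfile.TemporaryDirectory() as tmp:
        big = Path(tmp) / "input.txt"
        mid = Path(tmp) / "stage1_matches.txt"
        big.write_text("level noon abc\nracecar xyz\n")
        stage1_to_file(big, mid, re.compile(r"[a-z]+"))
        return stage2_from_file(mid, lambda s: len(s) >= 3 and s == s[::-1])
```

Keeping the stage-1 output on disk also means stage 2 can be rerun or replaced without repeating the cheap pass over the large file.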
The original 22-million-character file was parsed in about 2 seconds, while the second, 100,000-character file took about 55 minutes. Because these are linear parses, matching the second structure directly against the original file would have taken roughly 200 hours (220 times as much input, so about 220 × 55 minutes ≈ 12,100 minutes). A huge advantage!
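Assuming the stage-2 parse time grows linearly with input size, the extrapolation works out as:

$$
\frac{22{,}000{,}000\ \text{chars}}{100{,}000\ \text{chars}} = 220,
\qquad
220 \times 55\ \text{min} = 12{,}100\ \text{min} \approx 202\ \text{h}.
$$

The staged run takes about $2\ \text{s} + 55\ \text{min} \approx 55\ \text{min}$, which is roughly a 220× speedup.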
Several grammars, together with their results, can be chained and/or branched, working from less specific to highly specific patterns.
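One way to organize chained and branched grammars is as a tree of stages, where each stage sees only what its parent accepted. In this minimal sketch, `Stage`, `run_tree`, and the word and palindrome filters are all hypothetical stand-ins for real grammars. Each `accepts` predicate could be a regex, a CYK parser, or any more expensive matcher.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Stage:
    """One grammar in a hypothetical chain/branch hierarchy.

    Broad, cheap stages belong near the root; specific, expensive ones near the leaves.
    """
    name: str
    accepts: Callable[[str], bool]
    children: List["Stage"] = field(default_factory=list)


def run_tree(stage: Stage, candidates: List[str],
             results: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Filter candidates through `stage`, then pass only its survivors to each child."""
    kept = [c for c in candidates if stage.accepts(c)]
    results[stage.name] = kept
    for child in stage.children:  # branching: every child sees the same survivors
        run_tree(child, kept, results)  # chaining: rejected items never reach a child
    return results


def is_palindrome(s: str) -> bool:
    return s == s[::-1]


# Example hierarchy: words -> {palindromes, long words -> long palindromes}
tree = Stage("words", lambda s: re.fullmatch(r"[a-z]+", s) is not None, [
    Stage("palindromes", is_palindrome),
    Stage("long", lambda s: len(s) >= 6, [
        Stage("long_palindromes", is_palindrome),
    ]),
])
```

Calling `run_tree(tree, "level noon racecar abc banana 123".split(), {})` fills one result list per stage. The `"long_palindromes"` leaf is only ever run on the two words that already passed both `"words"` and `"long"`.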
I’ll post some example .exe files next month, once some grant work is completed.