I have completed a program that uses a system of grammars that learn from input and then use what they have learned to parse a set of unknown words, checking whether those words match the original set.
Three files are needed:
Learn Set: containing words, letters, numbers, names, etc.
Rule Set: empty until filled with automatically written rules.
Test Set: containing words, letters, numbers, names, etc.
The algorithm is roughly as follows, using the simplest examples I can muster.
1) The first grammar reads a learn.txt file using rule::= (“%^[+’a-zA-Z’#%’word’^”)
e.g. the learn file contains the word “hello”,
which the grammar casts into “word”.
2) At the same time, the program prints the head of the new rule: (“%^[‘
the body of the new rule: hello
and the tail of the new rule: ‘$%’x’^”)
and stores that result into x, e.g. rule::= (“%^[‘hello’$%’x’^”)
3) The third grammar then reads the new rule from rules.txt
and parses a test.txt file, in which each word either matches or doesn’t match,
placing the result into a file called results.txt.
In other words, the grammars read a file of words and form a set of rules from what they read, so that another grammar can use the new rules to parse a set of test words.
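The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the grammar engine itself: ordinary regular expressions stand in for the grammar notation, the file contents are passed in as strings rather than read from learn.txt, rules.txt and test.txt, and the names `learn_rules` and `check_words` are made up for the example.

```python
import re

# Stand-in for the first grammar's +'a-zA-Z' character class (an assumption).
WORD_PATTERN = re.compile(r"[a-zA-Z]+")


def learn_rules(learn_text):
    """Steps 1 and 2: read words and write one literal rule per distinct word."""
    rules = []
    for word in WORD_PATTERN.findall(learn_text):
        # head + body + tail, mirroring how the new rule is assembled
        rule = "^" + re.escape(word) + "$"
        if rule not in rules:
            rules.append(rule)
    return rules


def check_words(rules, test_text):
    """Step 3: apply every learned rule to every test word."""
    compiled = [re.compile(rule) for rule in rules]
    results = {}
    for word in test_text.split():
        results[word] = any(pattern.match(word) for pattern in compiled)
    return results


rules = learn_rules("hello")               # contents of the learn file
results = check_words(rules, "hello help") # contents of the test file
for word, matched in results.items():
    print(word, "matches" if matched else "does not match")
```

Here the learned rule only accepts the exact word it was built from, which is the simplest case described in the steps; the real grammars can presumably generalise further than this stand-in does.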
Work continues on several fronts. I have designed an application that reads my grammars from a file and modifies them as needed.
A sample rule set printed to a file looks like this:
rule –> “%^%^%”$%’x’^%4(%(%([‘T’$%’RHO’;’x’;’`\”A\”+x’.%)%|%(%([‘A’$%’RHO’;’x’;’`\”T\”+x’.%)%|%(%([‘G’$%’RHO’;’x’;’`\”C\”+x’.%)%|%(%([‘C’$%’RHO’;’x’;’`\”G\”+x’.%)%)%)%)%)%)[6’AGCT’#%’x’!%’stem’^”
(This simple grammar matches an RNA stem-loop structure where stem size = 4 and loop size = 6.)
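For readers who do not want to decode the rule by hand, here is a rough Python sketch of what a stem-loop check of this shape does. It is one reading of the rule, not a translation of the grammar engine: it assumes the rule reads 4 stem bases while building the complementary strand by prepending each complement (the "A"+x style actions), then reads 6 loop bases, then expects the built strand. It uses the same AGCT alphabet as the rule. The function name `match_stem_loop` and its parameters are hypothetical.

```python
# Complement table taken from the pairs that appear in the rule (T/A, G/C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def match_stem_loop(seq, stem_size=4, loop_size=6):
    """Return (stem, loop) if seq is stem + loop + closing strand, else None."""
    if len(seq) != 2 * stem_size + loop_size:
        return None
    stem = seq[:stem_size]
    loop = seq[stem_size:stem_size + loop_size]
    tail = seq[stem_size + loop_size:]
    if any(base not in COMPLEMENT for base in stem + loop):
        return None
    # Build the expected closing strand by prepending each complement,
    # so the last stem base pairs with the first base after the loop.
    expected = ""
    for base in stem:
        expected = COMPLEMENT[base] + expected
    return (stem, loop) if tail == expected else None


print(match_stem_loop("GGACTTTAAAGTCC"))
print(match_stem_loop("GGACTTTAAAGTCA"))
```

The first call finds a stem of GGAC closed by GTCC around the loop TTTAAA; the second fails because the last base no longer pairs with the first.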
Variables may be placed anywhere in the grammar and manipulated by the accompanying C++ code. Combined with ML and/or GA algorithms, this makes for an extremely powerful pattern-matching machine.
Research continues and is yielding positive results. Unfortunately, it’s not a priority to update all the details here on WordPress because I doubt it’s widely read anyway. An entire series of grammars has been packaged as Windows/Unix apps and will soon be available on my website for download as grantware.
A generous personal grant received in March 2012 will be used to purchase new computer equipment and software. The remainder will support new projects having to do with GA and advanced CS decision tree methods. Thank you for supporting this research.
I am working on combining Nevill-Manning-like GAs and context-free grammars (context-sensitive grammars later) in an experiment where the program selects rule sets from a database constellation, in an attempt to match looser patterns without losing too much of the original form.
I’ve also added a filter to certain RNA secondary structure pairing grammars. I’m quite pleased that a bioinformatician at the Craig Venter Institute (Rockville, MD) sent me C++ source code for the Nussinov-Jacobson algorithm which, with some modification, works perfectly as an adjunct to my intended application. This code will help verify and refine the results obtained from my grammatical algorithms for structural-consensus RNA secondary structures.
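For context, the core of a Nussinov-style algorithm is a small dynamic programme that finds the maximum number of nested base pairs in a sequence. The sketch below is the textbook recurrence written in Python for illustration; it is not the C++ code mentioned above, and it does not show how the filter is wired into the grammars. The function name `nussinov_pairs`, the `min_loop` parameter (minimum number of unpaired bases between paired ones) and the choice to allow only A-U and G-C pairs are assumptions made for the example.

```python
def nussinov_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (textbook Nussinov recurrence)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    # best[i][j] holds the best pair count for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


print(nussinov_pairs("GGGAAACCC"))
```

A pair count like this can serve as a sanity check on a structure proposed by another method: a predicted stem with more pairs than the maximum for that sequence would be suspect.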
We are juggling several software applications in bioinformatics, as well as some other more commercial apps not specifically associated with science research or mentioned at http://www.rnaparse.com. Research into the specific problems of RNA folding has led to the deployment of more general fast pattern matching algorithms and some “neural network-like” programs (for lack of a better word) that parse raw data several times over and pass the results between n data or text files while re-parsing, adding or removing information as needed. Some example executables may be made available in the coming months.