# What is an RNA pseudoknot?

A **pseudoknot** is a nucleic acid secondary structure containing at least two stem-loop structures in which half of one stem is intercalated between the two halves of another stem. Pseudoknots fold into knot-shaped three-dimensional conformations but are not true topological knots.

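For example, the sequence 5′-AAAACGAGGGGUUUUCGACCCC-3′ forms a simple pseudoknot whose pknot notation is ((((:::[[[[)))):::]]]]. Parentheses mark the AAAA·UUUU stem, square brackets mark the GGGG·CCCC stem, and colons mark unpaired bases. Because the bracketed stem opens before the parenthesized stem closes, the two stems overlap.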
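The overlap can be checked mechanically. The minimal sketch below assumes `(`/`)` and `[`/`]` are the only bracket types and treats every other character as unpaired; the helper names `parse_brackets` and `crossing_pairs` are hypothetical. It converts the notation into 0-based base pairs, confirms each pair is canonical (Watson-Crick or G·U wobble), and counts pairs of base pairs that cross. Each stem has four base pairs, so the notation needs four closing parentheses to balance the four opening ones.

```python
def parse_brackets(notation: str) -> list[tuple[int, int]]:
    """Return 0-based base pairs (i, j) from extended bracket notation.

    Assumes '(' / ')' and '[' / ']' mark the two stems; any other
    character (here ':') is an unpaired base.
    """
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for pos, ch in enumerate(notation):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            opener = closers[ch]
            if not stacks[opener]:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stacks[opener].pop(), pos))
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at positions {stack}")
    return sorted(pairs)


def crossing_pairs(pairs):
    """Return every pair of base pairs (i, j), (k, l) with i < k < j < l."""
    return [(p, q) for p in pairs for q in pairs if p[0] < q[0] < p[1] < q[1]]


# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

seq = "AAAACGAGGGGUUUUCGACCCC"
notation = "((((:::[[[[)))):::]]]]"
pairs = parse_brackets(notation)
assert len(seq) == len(notation)
assert all((seq[i], seq[j]) in CANONICAL for i, j in pairs)
print(len(pairs), len(crossing_pairs(pairs)))  # prints 8 16
```

Every one of the 4 × 4 combinations of a parenthesis pair with a bracket pair crosses, which is exactly what makes the structure non-nested.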

The structural configuration of pseudoknots does not lend itself well to computational detection because of its context-sensitive, or “overlapping”, nature. The base pairing in pseudoknots is not well nested: some base pairs “overlap” one another in sequence position, meaning there are pairs (i, j) and (k, l) with i < k < j < l. This makes pseudoknots impossible to predict with the standard method of dynamic programming, which uses a recursive scoring system to identify paired stems and consequently cannot detect non-nested base pairs in most circumstances, particularly for sequences longer than a few hundred nucleotides. The newer method of stochastic context-free grammars suffers from the same problem. Thus popular secondary structure prediction methods such as Mfold and Pfold will not predict pseudoknot structures present in a query sequence; they will only identify the more stable of the two pseudoknot stems.
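To see why nesting is built into the standard method, here is a sketch of the Nussinov base-pair-maximization algorithm, a simpler relative of the Zuker free-energy recursion used by Mfold. It is swapped in here only because it needs no energy tables; the function name `nussinov`, the `min_loop=3` hairpin constraint, and the canonical-pair set are assumptions for illustration. Each interval [i, j] is either shortened by leaving i unpaired, closed by pairing i with j, or split into two independent halves, so no traceback can ever produce two crossing pairs.

```python
# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> list[tuple[int, int]]:
    """Maximize the number of nested canonical pairs in seq (Nussinov DP).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    Returns 0-based (i, j) pairs from one optimal traceback.
    """
    n = len(seq)
    # best[i][j] = maximum number of nested pairs within seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # i left unpaired
            if (seq[i], seq[j]) in CANONICAL:
                score = max(score, best[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent halves
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score

    # Traceback: each step works inside one interval, so pairs never cross.
    pairs = []
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
        elif (seq[i], seq[j]) in CANONICAL and best[i][j] == best[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if best[i][j] == best[i][k] + best[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return sorted(pairs)


print(nussinov("AAAACGAGGGGUUUUCGACCCC"))
```

Run on the pseudoknot example, the output is guaranteed to be nested, so it can never contain pairs from both the AAAA·UUUU and the GGGG·CCCC stems at once (every such combination crosses).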

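It is possible to contrive situations in which dynamic-programming-like methods can identify pseudoknots, but these methods are not general and are extremely inefficient; for example, the Rivas and Eddy algorithm, which handles only a restricted class of pseudoknots, requires O(n^6) time and O(n^4) memory. The general problem of pseudoknot prediction has been shown to be NP-complete.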

We use grammatical, context-sensitive methods that take a different approach: they essentially ignore RNA molecular energy and search for specific configurations, which can then be verified by slower and more computationally expensive methods. A candidate configuration is found by matching, as a unit, one stem of some length with another stem of some length, such that the two halves of each stem must be joined by canonical base pairs. Loops are added to the grammar as appropriate. Rather than trying every possible fitting permutation, which is what makes the general problem NP-complete, these grammatical methods match a predefined system of stems and loops, and they do so in O(n) (linear) time and space. The final parsing stage takes the minimum free energy of the structure into account as a way of filtering out unlikely structures or structures that cannot exist in nature.
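A much-simplified sketch of the configuration-matching idea follows. The hypothetical `find_h_pseudoknots` scans for the H-type layout of the example (5′ half of stem 1, loop, 5′ half of stem 2, optional short spacer, 3′ half of stem 1, loop, 3′ half of stem 2) with fixed stem lengths and bounded loop lengths; all parameter names and default ranges are assumptions. It is a plain pattern scan, not the grammar described above, and it omits the free-energy filtering stage. Because stem and loop lengths are bounded by constants, the work per start position is constant, so the scan is linear in sequence length.

```python
from itertools import product

# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_matches(seq, left, right, length):
    """True if seq[left:left+length] pairs antiparallel with seq[right:right+length]."""
    return all(
        (seq[left + t], seq[right + length - 1 - t]) in CANONICAL
        for t in range(length)
    )


def find_h_pseudoknots(seq, stem1=4, stem2=4,
                       loop1=range(1, 4), loop2=range(0, 2), loop3=range(1, 4)):
    """Return (start, l1, l2, l3) for every H-type layout found in seq.

    l1: loop between the 5' halves of stems 1 and 2
    l2: spacer between stem 2's 5' half and stem 1's 3' half (may be 0)
    l3: loop between stem 1's 3' half and stem 2's 3' half
    """
    hits = []
    n = len(seq)
    for start in range(n):
        for l1, l2, l3 in product(loop1, loop2, loop3):
            s2_open = start + stem1 + l1        # 5' half of stem 2
            s1_close = s2_open + stem2 + l2     # 3' half of stem 1
            s2_close = s1_close + stem1 + l3    # 3' half of stem 2
            if s2_close + stem2 > n:
                continue  # layout runs past the end of the sequence
            if (stem_matches(seq, start, s1_close, stem1)
                    and stem_matches(seq, s2_open, s2_close, stem2)):
                hits.append((start, l1, l2, l3))
    return hits


print(find_h_pseudoknots("AAAACGAGGGGUUUUCGACCCC"))  # prints [(0, 3, 0, 3)]
```

On the example sequence the scan finds the single layout with 3-nt loops and no spacer; a fuller pipeline would then pass such candidates to an energy-based check.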