January 18, 2012

Some Theory on RNA Strings

Added page:

http://www.rnaparse.com/about.html

October 15, 2011

Current events

We are juggling several software applications in bioinformatics, along with some more commercial apps that are not specifically associated with science research or mentioned on www.rnaparse.com. Research into the specific problems of RNA folding has led to the deployment of more general fast pattern-matching algorithms and some “neural network-like” programs (for lack of a better term) that parse raw data several times over and pass the results between any number of data or text files, re-parsing and adding or removing information as needed. Some example .exe’s may be made available in the coming months.

James

July 16, 2011

Attenuators (cont’d), Pseudoknot application

Matching attenuation sequences has turned out to be a challenging problem. The simplest way I’ve discovered is to write a grammar in two parts: the first matches the first stem and loop, and the second checks for direct repeats in the correct positions. Furthermore, this specific configuration is complex (and thus rare in random data) and seems rare in genomic sequences.
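A minimal Python sketch of that two-part check, assuming the fixed geometry seen in the 20-nt sequences listed in this post (4-nt stem, 4-nt loop, 4-nt stem, 4-nt spacer, 4-nt repeat). The function names and the fixed widths are illustrative assumptions; this is a plain window scan, not the grammar used by the application.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_attenuator_like(seq, stem=4, gap=4):
    """Return (offset, window) for windows shaped a-x-b-x-a, b = revcomp(a)."""
    seq = seq.upper()
    width = 3 * stem + 2 * gap
    hits = []
    for i in range(len(seq) - width + 1):
        a = seq[i:i + stem]
        b = seq[i + stem + gap:i + 2 * stem + gap]
        tail = seq[i + 2 * stem + 2 * gap:i + width]
        # Part one: the first stem and loop (a pairs with b).
        if b != reverse_complement(a):
            continue
        # Part two: a direct repeat of a at the end of the window,
        # which lets b pair with the tail instead of the head.
        if tail == a:
            hits.append((i, seq[i:i + width]))
    return hits

print(find_attenuator_like("GTGGTCGGCCACAGGCGTGG"))
# [(0, 'GTGGTCGGCCACAGGCGTGG')]
```

Keeping the direct-repeat test as a separate second step mirrors the two-part layout described above: the stem check rejects most windows before the repeat is ever compared.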

I’ve located the following attenuator-like structures in two related genomes:

GTGGTCGGCCACAGGCGTGG
((((::::))))::::::::
::::::::((((::::))))
>emb|V01174.1|  Avian myelocytomatosis virus 5' LTR and gene for p96 polyprotein,
proviral DNA in Gallus gallus genomic DNA
Length=3780 GENE ID: 1491913 Amvgp1 | p110 [Avian myelocytomatosis virus]
459  GTGGTCGGCCACAGGCGTGG  478

>gb|AF033809.1|AF033809  Avian myelocytomatosis virus, complete genome
Length=3392 GENE ID: 1491913 Amvgp1 | p110 [Avian myelocytomatosis virus]
72  GTGGTCGGCCACAGGCGTGG  91

It turns out this configuration is more common in bacterial genomes.
MTB DS016976 et al. (note the several repeats throughout):

((((::::))))::::::::
::::::::((((::::))))
ACGCTGTCGCGTGCCGACGC
GCCGCAGACGGCTAAAGCCG
GCTGGCCGCAGCCAGCGCTG
GTGCACGCGCACAACGGTGC
CGGTGGTCACCGTCGGCGGT
GGTGTCGGCACCGGCCGGTG
GCTCGTCGGAGCCAAAGCTC
GCTCACCCGAGCGGCAGCTC
GGCGATTCCGCCGTCGGGCG
CCGCCCGGGCGGGGCGCCGC
CCGGCAATCCGGCGTGCCGG
TGTTCGGCAACAAGTATGTT
CGCAAACATGCGGGAGCGCA
GCGCGTCGGCGCGGGAGCGC
GCGGACAGCCGCGGGAGCGG
CGCGCGGCCGCGCTGCCGCG
CCCGGAGCCGGGCATCCCCG
CGCCCGGCGGCGCGCTCGCC
GGCCCCACGGCCACCAGGCC
CCGGCGTCCCGGGAGGCCGG
CGGGAAAGCCCGCCATCGGG
CGTGCTGGCACGAAGTCGTG
CGGCTCTGGCCGACATCGGC
GCCGCATCCGGCGGAGGCCG
GGCCTGATGGCCAATCGGCC
GGCGTGATCGCCAAGAGGCG
ACCGGCGCCGGTGATTACCG
TATATAGATATATAGATATA
TATATACATATACAAATATA
TATATATATATATAGATATA
ATATATAAATATAAATATAT
ATATATATATATAGATATAT
GCCGTTGCCGGCCTGGGCCG
GCCGAGGGCGGCTTCGGCCG
GCGCATAAGCGCGAGAGCGC
GCACTCCGGTGCTGCTGCAC
GCGCGTGAGCGCCCGGGCGC
CGGCGCGGGCCGGTTCCGGC
CGGCAGGAGCCGGCGCCGGC
TGGTACCGACCAGATTTGGT
GGCGCCAGCGCCTGGCGGCG
ACCGCATACGGTCCCAACCG
CCGCGGTTGCGGTAGGCCGC
GGCGCCGGCGCCGTCTGGCG
CGGTGGCAACCGTCTGCGGT
CGGCGATGGCCGGTCTCGGC
CCTTCGGTAAGGCAACCCTT
GCCGGGTGCGGCTGCTGCCG
CCGCGGTAGCGGATCACCGC
CCACCGGAGTGGTTGGCCAC
GGCTGGCTAGCCAGCCGGCT
CGACCGTTGTCGGGGCCGAC
CGGGACCACCCGAAGGCGGG
GCCGGAGTCGGCAGTGGCCG
GCGCCGCTGCGCAGCAGCGC
GCGCGGCGGCGCGTTGGCGC
GCGCCGGTGCGCCGCGGCGC
CATCACTTGATGAAATCATC
GTGCTGTTGCACGTTGGTGC
CCCGCCGCCGGGTGATCCCG
GGGAGCTATCCCCCGGGGGA
CGGTGGGCACCGCCCCCGGT
CGGTGGAAACCGAACCCGGT
GCGGGGACCCGCCGAGGCGG
GCGGCCAACCGCAACAGCGG
GCGCGGCAGCGCTGCTGCGC
CCGCCGGTGCGGTGTCCCGC
GCCGGATGCGGCTCCCGCCG
CCCGCACACGGGAGAGCCCG
CGCCCGTAGGCGAATCCGCC
CGGACCAGTCCGACCACGGA
CTGGATGTCCAGCGCGCTGG
GGCGCGGCCGCCTGCTGGCG
CGGCTGTGGCCGCGGTCGGC
ACCCTGGTGGGTACCAACCC
GTGCGTTAGCACCTCGGTGC
CGGCGCGCGCCGTTACCGGC
CGGCCGCAGCCGGGACCGGC
GCCGTGTGCGGCCAGTGCCG
CGGGCGTGCCCGCTAACGGG
CGACGGTGGTCGACACCGAC
GCGAGCGGTCGCGGCCGCGA
CGCGGAGTCGCGGGTGCGCG
CGGCCCAAGCCGAGCGCGGC
CGGTTCCGACCGGATCCGGT
CAGGCCGTCCTGGCGCCAGG
CCGGCGCACCGGCGAACCGG
CATCGTCGGATGGTTTCATC
CCACCAGGGTGGCCGTCCAC
//*************************
I've also developed a pseudoknot grammar application that allows the investigator to vary the lengths of stems 1 and 2 and loops 1, 2 and 3. It will be available as grantware for public download within the next 2-3 months. If you would like a beta copy, contact me.

June 12, 2011

Attenuators – preliminary results

Very roughly, attenuators switch between two different stem-loop systems depending on how a gene is to be expressed.

I have completed a grammar that locates potential attenuators in the form:

L = {axbxa}

where a and b are complements (b is the reverse complement of a, so the two can base-pair) and x is any run of nucleotides.
Note that two stems are possible: a-b and b-a.

The grammar first parses a stem and loop of some given size plus a tail.
Results are sent to a file, where each candidate is checked for a direct repeat between its first few nts. and its last few nts.
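The two passes can be sketched in Python as follows. This is only an illustration under stated assumptions: the stem length is fixed at 4, the loop and tail ranges are arbitrary defaults, candidates are kept in memory rather than written to a file, and the names stage_one, stage_two and config_lines are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def stage_one(seq, k=4, min_loop=3, max_loop=12):
    """First pass: find stem/loop candidates a-x-b with b = revcomp(a)."""
    out = []
    for i in range(len(seq) - 2 * k - min_loop + 1):
        target = reverse_complement(seq[i:i + k])
        for loop in range(min_loop, max_loop + 1):
            j = i + k + loop
            if seq[j:j + k] == target:
                out.append((i, loop))
    return out

def stage_two(seq, candidates, k=4, min_tail=3, max_tail=20):
    """Second pass: keep candidates whose tail ends in a direct repeat of a."""
    hits = []
    for i, loop in candidates:
        a = seq[i:i + k]
        b_end = i + 2 * k + loop
        for tail in range(min_tail, max_tail + 1):
            r = b_end + tail
            if seq[r:r + k] == a:
                hits.append((i, loop, tail))
    return hits

def config_lines(k, loop, tail):
    """Bracket strings for the two alternative stem/loop configurations."""
    a = "(" * k + ":" * loop + ")" * k + ":" * (tail + k)
    b = ":" * (k + loop) + "(" * k + ":" * tail + ")" * k
    return a, b

seq = "GAAGACATCTCATCTTCTGCCACTCAAAGAAG"
for start, loop, tail in stage_two(seq, stage_one(seq)):
    if start == 0:
        print(seq)
        print("\n".join(config_lines(4, loop, tail)))
```

For the first Hepatitis C example the hit at offset 0 has a 9-nt loop and an 11-nt spacer, and the two printed bracket lines correspond to configurations a and b.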

Two examples:

Hepatitis C virus 

GAAGACATCTCATCTTCTGCCACTCAAAGAAG

((((:::::::::)))):::::::::::::::  stem/loop + tail. configuration a
{{{{::::::::::::::::::::::::}}}}  repeat GAAG
:::::::::::::((((:::::::::::))))  head + stem/loop. configuration b
 s/r          s'              r'

Hepatitis C virus 

CCGGTGAGTACACCGGAATTGCCAGGACGACCGG

((((::::::::))))::::::::::::::::::  stem/loop + tail. configuration a
{{{{::::::::::::::::::::::::::}}}}  repeat CCGG
::::::::::::((((::::::::::::::))))  head + stem/loop. configuration b
 s/r          s'               r'

James F. Lynn


April 4, 2011

Staged Grammars

We have successfully implemented and coded grammars that are “staged”, meaning a dataset is first parsed with regular expressions, context-free grammars, or a combination of both, and the results are then re-parsed from a database or file with more computationally expensive algorithms.

Example: I parsed a 22,000,000-character file for certain structures that may or may not contain a specific secondary structure and sent the matches to a text file, which was automatically re-parsed for a very specific structure. File 2 in this hierarchy was approximately 100,000 characters.

The original 22-million-character file was parsed in ~2 seconds, while the second file of 100k characters was parsed in ~55 minutes. Since these are linear parses, matching the second structure directly against the original file (220 times the input) would have taken roughly 200 hours. A huge advantage!
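The extrapolation can be checked with a few lines of Python. This assumes parse time scales strictly linearly with input length, as stated above; the function name is hypothetical. With the quoted figures the direct parse comes out on the order of 200 hours, against under an hour for the staged run.

```python
def linear_estimate_hours(full_chars, sample_chars, sample_minutes):
    """Scale a measured parse time linearly up to a larger input."""
    scale = full_chars / sample_chars        # 220x for the figures above
    return sample_minutes * scale / 60

direct = linear_estimate_hours(22_000_000, 100_000, 55)
staged = (2 / 60 + 55) / 60   # 2 s first pass + 55 min second pass, in hours
print(f"direct: ~{direct:.0f} h, staged: ~{staged:.2f} h")
# direct: ~202 h, staged: ~0.92 h
```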

Several grammars coupled with their results may be chained and/or branched together, working from less-specific to highly specific patterns.
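A rough sketch of such a chain in Python, assuming a cheap regular-expression pass followed by a stricter check. The pattern, the hairpin test and all function names are made up for illustration, and results are passed between stages as generators rather than through files.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def stage_regex(pattern):
    """Cheap stage: emit every (non-overlapping) regex match in each input."""
    rx = re.compile(pattern)
    def stage(items):
        for text in items:
            for match in rx.finditer(text):
                yield match.group(0)
    return stage

def stage_predicate(check):
    """Stricter stage: keep only the items that pass a check function."""
    def stage(items):
        return (item for item in items if check(item))
    return stage

def chain(items, *stages):
    """Feed the output of each stage into the next, least specific first."""
    for stage in stages:
        items = stage(items)
    return list(items)

def closes_as_hairpin(candidate, stem=4):
    """True if the last `stem` bases are the reverse complement of the first."""
    head = candidate[:stem]
    return candidate[-stem:] == head.translate(COMPLEMENT)[::-1]

data = ["AAGGACTTTTGTCCAA", "AAGGAAAAAAAACCAA"]
print(chain(data,
            stage_regex(r"GG[ACGT]{4,8}CC"),
            stage_predicate(closes_as_hairpin)))
# ['GGACTTTTGTCC']
```

Because every stage has the same shape (an iterable in, an iterable out), a branch is just a second call to chain on the output of an earlier stage.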

I’ll post some example .exe’s next month after some grant work is completed.

February 17, 2011

New .exe’s

We have built several new applications for certain hard-to-parse structures, such as ‘kissing hairpins’ in the general form ([)(]), as well as ORF-finding software. Some of these will be released as grantware, while others remain private and in development. The following is an example of a kissing hairpin found with our new application. Parse time is linear at ~3000 nts/sec.
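The ([)(]) notation itself can be read mechanically. The Python sketch below is not the application's output and does not search sequences; it only parses a two-bracket structure string into base pairs and tests whether the square-bracket pairs link the loops of two separate hairpins. The function names and the test strings are assumptions for illustration.

```python
def parse_pairs(structure, brackets=("()", "[]")):
    """Return {open_char: [(i, j), ...]} using one stack per bracket type."""
    closer = {b[1]: b[0] for b in brackets}
    stacks = {b[0]: [] for b in brackets}
    pairs = {b[0]: [] for b in brackets}
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closer:
            key = closer[ch]
            if not stacks[key]:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs[key].append((stacks[key].pop(), pos))
        # any other character is treated as an unpaired position
    for key, stack in stacks.items():
        if stack:
            raise ValueError(f"unclosed {key!r} at position {stack[-1]}")
    return pairs

def is_kissing(structure):
    """True if every [ ] pair opens inside one ( ) hairpin and closes in another."""
    pairs = parse_pairs(structure)
    hairpins, kisses = pairs["("], pairs["["]
    def links_two_hairpins(k, l):
        opens_inside = any(i < k < j < l for i, j in hairpins)
        closes_inside = any(k < i < l < j for i, j in hairpins)
        return opens_inside and closes_inside
    return bool(kisses) and all(links_two_hairpins(k, l) for k, l in kisses)

print(is_kissing("([)(])"))                   # True
print(is_kissing("((..[[..))..((..]]..))"))   # True
print(is_kissing("(())[]"))                   # False
```

The crossing test (i < k < j < l) is what separates this form from ordinary nested stems, which a single stack can parse.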

December 31, 2010

DNA Direct Repeats

Demo of how a grammar may match repeating nts. in a DNA string. On my machine it processes 6 million nts in ~37 minutes.
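For comparison, direct repeats of a fixed length can also be located with a simple dictionary index of k-mers. The Python sketch below uses that swapped-in technique, not a grammar, and is not the code behind repeats.exe; the function name and the choice of k are assumptions.

```python
from collections import defaultdict

def direct_repeats(seq, k=8):
    """Map each k-mer that occurs more than once to its start offsets."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}

print(direct_repeats("GTGGTCGGCCACAGGCGTGG", k=4))
# {'GTGG': [0, 16]}
```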

http://www.rnaparse.com/Downloads/repeats.exe

December 8, 2010

New TCR app

http://www.rnaparse.com/Downloads.html

For parsing n=12-20 with or without filler: {{{{{{…}}}}}} – {{{{{{{{{{…}}}}}}}}}}

This tandem complement repeat finder runs in linear time, O(n), and has a current limit of 100 million bases.

Output is to screen and also to a file called dna_results.

November 9, 2010

TCR Demo available for download

Tandem complement repeats are examples of multiple-crossing structures (multi-context-sensitive languages). Examples are AGCT.TCGA (which is also a mirror repeat) and GCTC.CGAG (the 1st G pairs with the 5th-position C, the 2nd C pairs with the 6th-position G, and so on).

The download demo is limited to scanning 2000 bp at a time in 2 files and finding perfect TCRs of length 12, e.g. ATCGAT.TAGCTA
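A perfect TCR is easy to state in code: the second half of the window is the base-by-base complement of the first half, read in the same direction. The Python sketch below is a plain sliding-window scan under that definition, not the algorithm inside TCR6.exe; the function name and the filler parameter (a fixed number of arbitrary bases between the halves) are assumptions.

```python
PAIR = str.maketrans("ACGT", "TGCA")
BASES = set("ACGT")

def find_tcrs(seq, half=6, filler=0):
    """Return (offset, left, right) where right is the position-by-position
    complement of left, with `filler` arbitrary bases in between."""
    seq = seq.upper()
    width = 2 * half + filler
    hits = []
    for i in range(len(seq) - width + 1):
        left = seq[i:i + half]
        right = seq[i + half + filler:i + width]
        # Skip windows with ambiguity codes such as N, which would
        # otherwise translate to themselves and match trivially.
        if set(left) <= BASES and right == left.translate(PAIR):
            hits.append((i, left, right))
    return hits

print(find_tcrs("ATCGATTAGCTA"))                 # [(0, 'ATCGAT', 'TAGCTA')]
print(find_tcrs("AGCTTCGA", half=4))             # [(0, 'AGCT', 'TCGA')]
print(find_tcrs("GCTCNNCGAG", half=4, filler=2)) # [(0, 'GCTC', 'CGAG')]
```

Unlike a hairpin stem, the second half is not reversed, so the pairings cross one another instead of nesting.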

Create two simple text files called “DNA1” and “DNA2” and place them in the same directory as TCR6.exe.

http://www.rnaparse.com/Downloads.html

Experiments give O(n) time and space, at ~30 seconds per 1,000,000 characters.

October 31, 2010

DARPA Viral Mutation – Prophecy

DARPA Wants to Create Almanacs of Every Possible Virus Mutation

June 2nd, 2010
If ever a story belonged in the Kill Off category, this is it.
Via: Wired:
Right now, preparing for new viral threats means looking to the past, creating hypotheses based on how pathogens have changed before. Now Darpa wants to reverse that strategy: test every possible outcome, to create a prophetic almanac that warns of viral mutations and outbreaks in advance — giving scientists the chance to change the course of the future before illness strikes.
The Pentagon’s far-out research arm has been zeroing in on the danger of mutating pathogens, and the corresponding problem of drug resistance, for years now. The agency is already funding tobacco-based vaccine production, a seven-day plan to thwart biothreats, and prescient viral infection detectors. And they’ve even set their sights on psychic medics, with a 2007 program that sought to turn docs into all-knowing illness predictors.
Now, Darpa wants the powers of premonition to wipe out viral threats altogether. They’re hosting a workshop for a new program, called “Prophecy,” that’ll develop methods to predict the rate, location and likely mutations of viral agents.
First, the agency wants novel lab-based methods to reproduce “virus-host interactions,” in different environments. After that, researchers will sequence different viral genomes, and test how they adapt and change under diverse conditions.
Ideally, that’ll yield a host of algorithms, capable of accurately predicting “the rate, direction and phenotype of viral mutations.” From there, scientists will be able to develop appropriate attack strategies in the right geographic locations. Most notably, Darpa wants to see mere mortals outdo the forces of nature, by creating “high energy evolutionary boundaries” that keep genetic mutations at bay.
Even if Darpa’s program doesn’t result in omniscient predictive powers, the possibility of more accurately anticipating viral mutations would have widespread implications. Health agencies could prep for looming outbreaks, new vaccines could be fast-tracked — and if scientists do manage to thwart evolution, the threat of resistance to antibiotic and antiviral meds could be all but eliminated.
