Research meetings: Tuesdays, 1:30pm-3:30pm, MC 326.
A DNA single-strand consists of four different types of units called bases, strung together by an oriented backbone like beads on a wire. The bases are Adenine (A), Guanine (G), Cytosine (C) and Thymine (T), and A can chemically bind to an opposing T on another single strand, while C can similarly bind to G. Bases that can thus bind are called Watson/Crick (W/C) complementary, and two DNA single strands with opposite orientation and with W/C complementary bases at each position can bind to each other to form a DNA double strand in a process called base-pairing. To encode information using DNA, one can choose an encoding scheme mapping the original alphabet onto strings over {A, C, G, T}, and proceed to synthesize the obtained information-encoding strings as DNA single strands. A computation consists of a succession of bio-operations, such as cutting and pasting DNA strands, separating DNA sequences by length, extracting DNA sequences containing a given pattern, or making copies of DNA strands. The DNA strands representing the output of the computation can then be read out and decoded.
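As a concrete illustration of the encoding step, the sketch below shows one possible mapping of binary data onto strings over {A, C, G, T}, together with the computation of the Watson/Crick complementary strand. It is only a minimal sketch; the particular mapping is a hypothetical choice made for this example, not an encoding prescribed above.

```python
# A minimal sketch (hypothetical mapping) of encoding binary data as a DNA
# single strand, and of computing the Watson/Crick complementary strand
# (A<->T, C<->G, read in the opposite orientation).

ENCODE = {"00": "A", "01": "C", "10": "G", "11": "T"}   # hypothetical encoding scheme
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}   # W/C base pairing

def encode_bits(bits: str) -> str:
    """Map a binary string (even length) onto a DNA single strand."""
    return "".join(ENCODE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def wc_complement(strand: str) -> str:
    """Return the W/C complementary strand, read in the opposite orientation."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

if __name__ == "__main__":
    strand = encode_bits("0110110001")
    print(strand)                  # CGTAC
    print(wc_complement(strand))   # GTACG, the strand it would base-pair with
```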
Herein lies a wealth of problems to be explored, stemming from the fact that encoding information in DNA differs from encoding it electronically, and bio-operations differ from electronic computer operations. A fundamental difference arises from the fact that in electronic computing data interaction is fully controlled, while in a test-tube DNA-computer free-floating data-encoding DNA single strands can bind to each other because of W/C complementarity. Another difference is that, in DNA computing, a bio-operation usually consumes both operands. This implies that, if one of the operands either is involved in an illegal binding or has been consumed by a previous bio-operation, it is unavailable for the desired computation. Yet another difference is that, while in electronic computing a bit is a single individual element, in DNA experiments each submicroscopic DNA molecule is usually present in millions of identical copies. The bio-operations operate in a massively parallel fashion on all identical strands. However, this process is governed by the laws of chemistry and thermodynamics, and the output obeys statistical laws.
Differences like the ones mentioned above point to the fact that a new approach should be employed when analyzing bioinformation and biocomputation. This issue is addressed by the research projects below:
The long-term goal of this line of research is to understand and utilize the mathematical and computational properties of DNA-encoded information for data storage or computational purposes. The results of this research could play a significant role in the understanding of genomic information, as well as in the design of an error-free DNA computer which could become -- for certain applications -- superior to current electronic computers by providing huge information density, massive parallelism and high energy efficiency.
This project has as its goal the understanding of natural information processing by investigating the mathematics behind genomic information, by modeling biological information processing, and by studying naturally occurring self-assembly phenomena as mathematical processes.
The proposed research into the mathematical theory of bioinformation has several aims: to contribute to a deeper understanding of genomic information, to model and study computation as a biological process, and to understand self-assembly as computation. Its results could have implications for many fields, such as genomics, theory of computation, material science, DNA computing and nano-technology.
The objective of this approach to understanding natural information processing is to probe the power and limitations of nanocomputation by self-assembly.
It is becoming clear that computation plays an important role in understanding self-organization of matter, and several studies have in fact demonstrated that self-assembly can be used to compute. The practical importance of self-assembly comes from the rise of the field of nanotechnology, wherein the individual nano-components are too small to be manipulated by existing tools, and novel methods such as self-assembly have to be employed for the modelling, design and engineering of extremely complex structures at this scale.
In its simplest form, a self-assembly system consists of a set of so-called tiles that can bind to each other by prescribed binding rules to form complex structures. Self-assembly can thus model processes as different as inorganic crystal formation (each tile represents a molecule and the binding rules represent the chemical bonds that can form), and DNA computing with interacting DNA ``tiles''. A DNA tile is a complex two-dimensional DNA structure, for example rectangle-shaped, that has at its corners single strands capable of ``sticking'' to complementary ones. Depending on the corner sequences, the tiles can attach only to certain other tiles, contributing to forming a DNA lattice that can simulate a computation.
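The sketch below is a minimal, simplified model of such binding rules: tiles carry glue labels on their four sides, and a tile may attach at an empty position of the growing lattice only if every glue facing an already-placed neighbour matches. The tile types, glue names and attachment rule are illustrative assumptions for this example, not a faithful rendering of any particular experimental system.

```python
# A minimal sketch of a tile self-assembly step, loosely in the spirit of
# abstract tile-assembly models. Tiles carry glue labels on their four sides;
# a new tile may attach at an empty grid position if every glue it presents
# toward an already-placed neighbour matches that neighbour's facing glue.

from typing import Dict, NamedTuple, Tuple

class Tile(NamedTuple):
    north: str
    east: str
    south: str
    west: str

# Offsets to the four adjacent positions, paired with (my side, their side).
NEIGHBOURS = [((0, 1), "north", "south"), ((1, 0), "east", "west"),
              ((0, -1), "south", "north"), ((-1, 0), "west", "east")]

def can_attach(assembly: Dict[Tuple[int, int], Tile],
               pos: Tuple[int, int], tile: Tile) -> bool:
    """True if `tile` matches every placed neighbour of the empty position."""
    if pos in assembly:
        return False
    attached = False
    for (dx, dy), my_side, their_side in NEIGHBOURS:
        neighbour = assembly.get((pos[0] + dx, pos[1] + dy))
        if neighbour is None:
            continue
        if getattr(tile, my_side) != getattr(neighbour, their_side):
            return False          # a mismatched bond forbids attachment
        attached = True
    return attached               # must bind to at least one placed tile

# Example: a seed tile and one tile type that extends a row to the east.
seed = Tile("x", "a", "x", "x")
grow = Tile("x", "a", "x", "a")
assembly = {(0, 0): seed}
print(can_attach(assembly, (1, 0), grow))   # True: west glue 'a' matches
```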
The mathematical study of self-assembly poses many interesting and challenging questions, some related to classical tiling, dynamical systems and cellular automata, and others new to the self-assembly context.
We intend to continue research in the area of self-assembly by investigating the effects of topology on the complexity of the obtained structures. We propose to analyze the effect on nanocomputations of working with arbitrary neighbourhoods, including ``Swiss cheese'' neighbourhoods of a tile, where adjacent tiles are not neighbours while far-away tiles may be. We also intend to initiate a study of a complexity hierarchy of nanocomputations by self-assembly, based on intrinsic characteristics such as neighbourhood topology, and the complexity and number of types of ``bonds'' between individual tiles.
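As a small illustration of what varying the neighbourhood topology could mean, the sketch below parameterizes a tile's neighbourhood by an arbitrary set of relative offsets, including a hypothetical ``Swiss cheese'' neighbourhood that skips the adjacent cells; the specific offset sets are assumptions made only for this example.

```python
# A sketch (illustrative only) of generalizing a tile's neighbourhood from
# the usual adjacent-cell topology to an arbitrary set of relative offsets.

from typing import List, Tuple

Offset = Tuple[int, int]

# Standard adjacency: the four cells sharing an edge with the tile.
VON_NEUMANN: List[Offset] = [(0, 1), (1, 0), (0, -1), (-1, 0)]

# Hypothetical "Swiss cheese" neighbourhood: the adjacent cells are skipped,
# and bonds are formed instead with tiles two steps away along each axis.
SWISS_CHEESE: List[Offset] = [(0, 2), (2, 0), (0, -2), (-2, 0)]

def neighbours(pos: Offset, topology: List[Offset]) -> List[Offset]:
    """Positions whose tiles can bond with a tile placed at `pos`."""
    return [(pos[0] + dx, pos[1] + dy) for dx, dy in topology]

print(neighbours((0, 0), VON_NEUMANN))   # [(0, 1), (1, 0), (0, -1), (-1, 0)]
print(neighbours((0, 0), SWISS_CHEESE))  # [(0, 2), (2, 0), (0, -2), (-2, 0)]
```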
It is anticipated that this research will contribute to a deeper insight into the theoretical properties of nanocomputation by self-assembly, as well as to an understanding of the power and limitations of experimental nanocomputing techniques.
Comparing species based on portions of their genome has been identified as one of the most important challenges in the post-genomic era. Many methods and algorithms for determining how distinct different species are have been proposed in the literature. We compared different species based on their genomic signatures generated using the Chaos Game Representation (CGR) method. Chaos Game Representation is a method of visualizing a DNA sequence as a 2D image. 2D images corresponding to different DNA sequences can then be compared using various distance measures, resulting in quantitative comparisons of the originating DNA sequences.
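For readers unfamiliar with the construction, the sketch below computes a basic CGR: each base is assigned a corner of the unit square, successive points are taken as midpoints between the previous point and the corner of the current base, and the points are binned into a grid to form a 2D image. The corner assignment and grid size are common conventions assumed here for illustration, not parameters taken from our work.

```python
# A minimal sketch of the Chaos Game Representation (CGR) of a DNA sequence.

from typing import Dict, List, Tuple

CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(sequence: str) -> List[Tuple[float, float]]:
    """Chaos-game points of a DNA sequence, starting from the centre."""
    x, y = 0.5, 0.5
    points = []
    for base in sequence:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points

def cgr_image(sequence: str, grid: int = 64) -> List[List[int]]:
    """Bin the CGR points into a grid x grid matrix of counts (the 2D image)."""
    image = [[0] * grid for _ in range(grid)]
    for x, y in cgr_points(sequence):
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        image[row][col] += 1
    return image

img = cgr_image("ACGTGCATTACGGA" * 10, grid=8)
print(sum(sum(row) for row in img))   # 140 points, one per base
```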
We improved the sensitivity of the comparison methods by introducing image-based distance measures for comparing CGRs. We used this method to build a phylogenetic tree for 26 model organisms, based on their mitochondrial DNA, and obtained fewer anomalies than previous methods, including ClustalW.
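As an illustration of how two CGR images can be compared quantitatively, the sketch below normalises two count matrices and takes the Euclidean distance between them. This is only a simple baseline measure chosen for the example, not necessarily one of the image-distance measures used in our comparisons.

```python
# A sketch of one simple way to compare two CGR images: normalise each count
# matrix and take the Euclidean distance between the resulting matrices.

import math
from typing import List

def normalise(image: List[List[int]]) -> List[List[float]]:
    """Scale counts so the matrix sums to 1 (a relative-frequency picture)."""
    total = sum(sum(row) for row in image) or 1
    return [[count / total for count in row] for row in image]

def cgr_distance(img_a: List[List[int]], img_b: List[List[int]]) -> float:
    """Euclidean distance between two normalised CGR images of equal size."""
    a, b = normalise(img_a), normalise(img_b)
    return math.sqrt(sum((a[i][j] - b[i][j]) ** 2
                         for i in range(len(a)) for j in range(len(a[0]))))

# Toy 2 x 2 "images"; larger values mean less similar genomic signatures.
a = [[3, 1], [0, 4]]
b = [[2, 2], [1, 3]]
print(round(cgr_distance(a, b), 4))   # 0.25
```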
We also introduced the novel notion of a Molecular Distance Map, where each genome is represented by a ``genome-point'' whose (x, y)-coordinates are computed based on a matrix of relative distances between species. In the resulting plot in the Cartesian coordinate system, the distance between two genome-points corresponds to the relatedness of their corresponding genomes.
This visualization method showed a clustering of genome-points in the plane that is compatible with the known relatedness of species. The method has the potential to be used for the identification of new species based on any sufficiently long (> 2,000 bp) portion of their genome, by identifying the region in space to which the corresponding new genome-point belongs.
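One standard way to obtain such planar coordinates from a matrix of pairwise distances is classical multidimensional scaling (MDS); the sketch below illustrates this approach on a toy distance matrix. It is an illustrative choice made for this example, not necessarily the exact projection method used to produce the Molecular Distance Maps.

```python
# A sketch of classical multidimensional scaling (MDS): recover 2D
# coordinates whose mutual distances approximate a given distance matrix.

import numpy as np

def classical_mds(dist: np.ndarray, dims: int = 2) -> np.ndarray:
    """Embed an n x n distance matrix into `dims` dimensions."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j               # double-centred Gram matrix
    eigval, eigvec = np.linalg.eigh(b)
    order = np.argsort(eigval)[::-1][:dims]      # largest eigenvalues first
    coords = eigvec[:, order] * np.sqrt(np.maximum(eigval[order], 0.0))
    return coords                                 # one (x, y) row per genome

# Toy example: four "genomes", forming two close pairs far from each other.
dist = np.array([[0.0, 0.1, 0.9, 0.9],
                 [0.1, 0.0, 0.9, 0.9],
                 [0.9, 0.9, 0.0, 0.1],
                 [0.9, 0.9, 0.1, 0.0]])
points = classical_mds(dist)
print(np.round(points, 3))   # close genomes land near each other in the plane
```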