The construction of DNA codes using a computer algebra system

  • Niema Aboluion

    Student thesis: Doctoral Thesis


    Coding theory has several applications in Genetics and Bioengineering. This thesis concentrates on a specific application from Computational Biology. This concerns the construction of new DNA codes which satisfy certain combinatorial constraints, using an alphabet of four symbols. The interest in these codes arises because it is possible to synthesise short single strands of DNA known as oligonucleotides. The codes can be useful in the design of these oligonucleotides. For example, the codes are used in DNA computing, as bar codes in molecular libraries and in microarray technologies. The computer algebra system Magma, which deals successfully with coding theory computation, is applied initially to the construction of DNA codes sat- isfying a GC-content constraint and a minimum Hamming distance constraint. The constraints are specified to avoid unwanted hybridizations and to ensure uniform melting temperatures. Additionally, another constraint, known as a reverse-complement constraint, is added to further prevent unwanted hybridiza- tions. This additional constraint is studied using involutions in a permutation group. Codes constructed in this thesis are derived from linear codes over GF(4) and Z4 and additive codes over GF(4). Previous approaches to the construction of these codes are extended in several ways. Longer codes are constructed, the examination of cyclic and extended cyclic codes is more comprehensive, and cosets of codes are considered. In addition, attention is paid to the mapping from field or ring elements to the DNA nucleotides; different mappings can give different lower bounds. Further improvements have been made after the tech- niques of shortening and puncturing are applied to the table of best codes, and also by searching for codes in the tables that have an all-ones vector in their dual. The use of a database of best known linear codes is also considered. In many cases codes are obtained which are larger than the best codes currently known. In the case of codes of length greater than twenty, linear DNA codes have not been constructed previously and so all codes obtained are the best known re- sults. Generator polynomials are given for the codes constructed. Coset leaders are also given in cases where cosets of linear codes are used. Thus it is possible for the reader to construct the codes without repeating the work presented in the thesis. Additionally, files of codewords are available online when the codes constructed are the best known codes and have fewer than 50000 codewords.
    Date of Award2011
    Original languageEnglish


    • Genetic engineering
    • Biological systems

    Cite this