Data Mining for Selected Genes of Agronomic Interest from the Oil Palm Genome Using a Comparative Genomics Approach

  • Rozana Rosli

    Student thesis: Doctoral Thesis


    Data mining of genes related to agronomic traits and transcription factors in plant genomes is essential for providing breeders with information on target genes. Taking advantage of completed genome sequences and gene models available in public databases, the comparative genomic analysis of oil palm used here gives insight into a new informatics approach for mining genome data. In the case of oil palm, however, it is important to use well-annotated gene models and to choose the most relevant and closely related species for comparison. Several methods for identifying orthologs were assessed and then combined with various bioinformatic programs and tools to provide a powerful strategy for discovering genes associated with agronomic traits in oil palm. Three gene families, FA (fatty acid related), R (resistance) and DGAT (DIACYLGLYCEROL ACYLTRANSFERASE), were successfully identified, and the AFL [ABSCISIC ACID INSENSITIVE 3 (ABI3), FUSCA3 (FUS3) and LEAFY COTYLEDON2 (LEC2)] transcription factor gene family was also investigated. The characterization of six SAD (stearoyl-ACP desaturase), two FATA (oleoyl-ACP thioesterase) and four FATB (palmitoyl-ACP thioesterase) gene families will help to elucidate the gene regulatory networks involved in fatty acid composition and oil content in oil palm. A total of 61 R gene candidates with a predicted coiled-coil-NBS-LRR (CC-NBS-LRR) domain were detected. The analysis found that the oil palm genome contains three, two, two and two distinctly expressed functional copies of the DGAT1, DGAT2, DGAT3 and WS/DGAT genes, respectively. Moreover, this study was also able to identify orthologous sequences from 12 other important plant species, namely Arabidopsis thaliana, Brachypodium distachyon, Brassica napus, Elaeis oleifera, Glycine max, Gossypium hirsutum, Helianthus annuus, Musa acuminata, Oryza sativa, Phoenix dactylifera, Sorghum bicolor and Zea mays.
    The results of this study show that the bioinformatics framework produced significant output and improved the gene prediction task. In addition, the big data available in the omics era, such as RNA-seq data, provide valuable information for understanding gene activity and roles across a wide range of tissues and developmental stages, especially in the E. guineensis mesocarp and endosperm tissues.
    Date of Award: Aug 2018
    Original language: English
    Supervisor: Denis Murphy (Supervisor), Francis Hunt (Supervisor) & Jeroen Nieuwland (Supervisor)
