Automated recognition of retroviral sequences in genomic data - RetroTector©
2007 (English) In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 35, no 15, p. 4964-4976
Article in journal (Refereed) Published
Resource type
Text
Abstract [en]
Eukaryotic genomes contain many endogenous retroviral sequences (ERVs). ERVs are often severely mutated and therefore difficult to detect. A platform-independent (Java) program package, RetroTector© (ReTe), was constructed. It has three basic modules: (i) detection of candidate long terminal repeats (LTRs), (ii) detection of chains of conserved retroviral motifs fulfilling distance constraints and (iii) attempted reconstruction of original retroviral protein sequences, combining alignment, codon statistics and properties of protein ends. Other features are prediction of additional open reading frames, automated database collection, graphical presentation and automatic classification. ReTe favors elements >1000 bp long because it depends on the order of, and distances between, retroviral fragments. It detects single or low-copy-number elements. ReTe assigned a 'retroviral' score of 890-2827 to 10 exogenous retroviruses from seven genera, and accurately predicted their genes. In a simulated model, ReTe was robust against mutational decay. The human genome was analyzed in 1-2 days on a Linux cluster. Retroviral sequences were detected in divergent vertebrate genomes. Most ReTe-detected chains were coincident with RepeatMasker output and the HERVd database. ReTe did not report most of the evolutionarily old HERV-L-related and MaLR sequences, and is not yet tailored for single LTR detection. Nevertheless, ReTe rationally detects and annotates many retroviral sequences.
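As an illustration of module (ii), the sketch below shows one way that motif hits could be linked into a chain when they must appear in a fixed order and within a bounded distance of each other. It is not ReTe's implementation (ReTe is written in Java, and the abstract does not give its scoring or constraints). The names `Hit` and `best_chain`, the motif labels, the scores and the single gap window are all assumptions made for the example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    motif: str    # hypothetical motif label, e.g. "GAG"
    pos: int      # start position in the sequence (bp)
    score: float  # hypothetical match score


def best_chain(hits, order, min_gap, max_gap):
    """Return (total_score, chain) for the highest-scoring chain of hits.

    A chain is valid when motifs follow the order given in `order`
    (motifs may be skipped) and each consecutive pair of hits is
    separated by min_gap..max_gap bp. Assumed rules, for illustration.
    """
    rank = {m: i for i, m in enumerate(order)}
    hits = sorted((h for h in hits if h.motif in rank), key=lambda h: h.pos)
    if not hits:
        return 0.0, []

    best = [h.score for h in hits]  # best chain score ending at each hit
    prev = [None] * len(hits)       # back-pointer to the previous hit
    for j, hj in enumerate(hits):
        for i in range(j):
            hi = hits[i]
            gap = hj.pos - hi.pos
            in_order = rank[hi.motif] < rank[hj.motif]
            if in_order and min_gap <= gap <= max_gap:
                candidate = best[i] + hj.score
                if candidate > best[j]:
                    best[j] = candidate
                    prev[j] = i

    # Trace back from the best-scoring end point.
    k = max(range(len(hits)), key=best.__getitem__)
    total = best[k]
    chain = []
    while k is not None:
        chain.append(hits[k])
        k = prev[k]
    chain.reverse()
    return total, chain


# Usage with made-up hits: the distant PRO hit cannot join the chain.
example_hits = [
    Hit("GAG", 100, 5.0),
    Hit("POL", 2100, 7.0),
    Hit("ENV", 5000, 4.0),
    Hit("PRO", 90000, 9.0),
]
total, chain = best_chain(example_hits, ["GAG", "PRO", "POL", "ENV"], 100, 4000)
print(total, [h.motif for h in chain])
```

Because a chain needs several ordered hits spread over a distance, a scheme of this kind naturally favors longer elements, which is consistent with the abstract's remark that ReTe favors elements >1000 bp long.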
Place, publisher, year, edition, pages
2007. Vol. 35, no 15, p. 4964-4976
Keywords [en]
amino acid sequence, article, automated pattern recognition, computer program, data base, endogenous retrovirus, gene cluster, genetic analysis, genetic code, genetic screening, long terminal repeat, nonhuman, nucleotide sequence, priority journal, protein motif, sensitivity and specificity, sequence analysis, viral genetics, Algorithms, Animals, Endogenous Retroviruses, Genome, Human, Genomics, Humans, Mutation, Reproducibility of Results, Retroviridae Proteins, Software, Terminal Repeat Sequences, Eukaryota
National Category
Medical Biotechnology
Identifiers
URN: urn:nbn:se:mdh:diva-31744
DOI: 10.1093/nar/gkm515
ISI: 000249612300004
Scopus ID: 2-s2.0-34548590461
OAI: oai:DiVA.org:mdh-31744
DiVA, id: diva2:933736
2016-06-07 2016-06-07 2025-10-10 Bibliographically approved