
Journal of Computer Science & Systems Biology

Volume 2, Issue 1 (2009)

Research Article

Vectors and Integration in Gene Therapy: Statistical Considerations

Alessandro Ambrosi and Clelia Di Serio

In gene therapy, integration of the viral DNA genome into the host cell genome is a necessary step. Just a few years ago, retrovirus integration was believed to be random, and the chance of accidentally activating a gene was considered remote. It has since been shown that this process is not random and that different viruses may prefer to integrate in specific areas of the genome. Tumorigenesis observed in some gene therapy studies is suspected to be caused by the insertion process. Depending on whether the provirus integrates into genes or in the vicinity of their Transcription Start Sites (TSS), normal transcription can be enhanced or disrupted, thus inducing oncogenic mutations. This is called “insertional mutagenesis”. Investigating whether an area of the genome is favoured by retrovirus integration is therefore a crucial aspect of gene therapy. Such areas are called “Common Integration Sites” (CIS) or “hotspots”. In the paper we stress the importance of developing statistical procedures that lead to a unique definition of CIS rather than a “problem-related” definition. We propose some statistical solutions for the search for hotspots based on the “peaks height distribution”, which accounts within the null hypothesis for the possible non-random behaviour of the integrations.

Research Article

The Atomic Genetic Code

Lutvo Kurić

Modern science mainly treats the biochemical basis of sequencing in bio-macromolecules and processes in biochemistry. One can ask whether the language of biochemistry is an adequate scientific language to explain the phenomena in that science. Is there perhaps some other language, outside biochemistry, that determines how biochemical processes function and what the structure and organization of living systems will be? The research results provide some answers to these questions. They reveal that the process of sequencing in bio-macromolecules is conditioned and determined not only by biochemical principles but also by cybernetic and information principles.

Research Article

Performance Comparative in Classification Algorithms Using Real Datasets

Hanuman Thota, Raghava Naidu Miriyala, Siva Prasad Akula, K.Mrithyunjaya Rao, Chandra Sekhar Vellanki, Allam Appa Rao and Srinubabu Gedela

Classification is one of the most common data mining tasks, used frequently for data categorization and analysis in industry and research. Real-world data mining often deals with noisy information sources: data collection inaccuracy, device limitations, data transmission and discretization errors, and man-made perturbations frequently result in imprecise or vague data, which is called noisy data. Noisy data may decrease the performance of any classification algorithm. This paper deals with the performance of different classification algorithms and with the impact of a feature selection algorithm on the Logistic Regression classifier: how it controls the False Discovery Rate (FDR) and thus improves the efficiency of the Logistic Regression classifier.

Research Article

Web Based Theoretical Protein pI, MW and 2DE Map

Itaraju J. B. Brum, Daniel Martins-de-Souza, Marcus B. Smolka, José C. Novello and Eduardo Galembeck

Genome projects have provided a vast amount of information that still needs to be analyzed and interpreted. This would be impossible without well-adapted computational tools to help analyze the data collected so far. To address the need for analyzing proteomes, we developed a tool, implemented through CGI, that can simulate two-dimensional electrophoresis from a whole genome.

Review Article

Exploring Microbial Diversity Using 16S rRNA High-Throughput Methods

Fabrice Armougom and Didier Raoult

As a result of advancements in high-throughput technology, the sequencing of the pioneering 16S rRNA gene marker is gradually shedding light on the taxonomic characterization of the spectacular microbial diversity that inhabits the earth. 16S rRNA-based investigations of microbial environmental niches are currently conducted using several technologies, including large-scale clonal Sanger sequencing, oligonucleotide microarrays, and, particularly, 454 pyrosequencing that targets specific regions or is linked to barcoding strategies. Interestingly, the short read length produced by next-generation sequencing technology has led to new computational efforts in the taxonomic sequence assignment process. From a medical perspective, the characterization of the microbial composition of the skin surface, oral cavity, and gut in both healthy and diseased people enables a comparison of microbial community profiles and also contributes to the understanding of the potential impact of a particular microbial community.

Research Article

HLA Class I and II Binding Promiscuity of the T-cell Epitopes in Putative Proteins of Hepatitis B Virus

Vijai Singh, Indramani, Dharmendra Kumar Chaudhary and Pallavi Somvanshi

Hepatitis B is a worldwide human infectious disease caused by the hepatitis B virus. Its genome size is 3.215 kb. Immunoinformatics tools were used to predict epitopes from seven putative proteins, viz. polymerase, large-S and middle-S protein, S and X protein, precore/core protein, and core and E antigen. In total, 50 epitopes were predicted for MHC class I and 55 epitopes for MHC class II molecules. These epitopes showed the highest binding scores at the optimum threshold. The epitopes may be used as antigens for diagnosis and might also be helpful for designing a peptide-based subunit vaccine against hepatitis B virus.

Research Article

Role of the Cation-π Interaction in Therapeutic Proteins: A Comparative Study with Conventional Stabilizing Forces

Shanthi V, Ramanathan K and Rao Sethumadhavan

The cation-π interaction is an important, general force for molecular recognition in biological receptors. In this study, we analyzed the energy contribution resulting from cation-π interactions in a set of therapeutic proteins. The contribution of cation-π interacting residues was evaluated in terms of secondary structure involvement, solvent accessibility, stabilization centers, stabilizing residues and conservation score. The secondary structure of the residues involved in cation-π interactions shows that Arg and Lys prefer to be in strands. Among the π residues, Phe prefers to be in coil, Tyr in strand and Trp in helix. Among the cation-π interacting residues, Arg and Lys were in the exposed regions, Phe and Tyr in the partially buried region, and Trp in the fully buried region. Stabilization centers for these proteins showed that all five residues found in cation-π interactions are important in locating one or more such centers. The contribution of stabilizing residues to the cation-π interactions was also analyzed. Further, the study shows that 43 percent of the amino acid residues involved in cation-π interactions might be conserved in therapeutic proteins. The comparison between conventional and nonconventional interactions in the data set clearly depicts the significance of the cation-π interaction for the stability of therapeutic proteins. On the whole, the results presented in this work should be useful for understanding the contribution of the cation-π interaction to the stability of therapeutic proteins.

Research Article

A Probabilistic Approach to Study Yeast’s Gene Regulatory Network

Pinto F.R

Using only the transcription network structure information, a probabilistic model was developed that computes the probabilities with which a pair of genes responds simultaneously (SR) or differentially (DR) to a random network perturbation. Study of yeast’s transcription regulatory network in association with gene expression profiles shows that SR and DR probabilities are significantly associated with the distribution of strong co-expression. It is 100-fold more probable to observe co-expression when P(SR)»0.5 for a random perturbation of 3 transcription factors (TFs), allowing the perturbation to spread to a depth of 3 connections in the regulatory network. The model also predicts that positive co-expression enhancement is related to the proportion of common TFs (the number of TFs that regulate both genes in a pair divided by the total number of TFs that regulate at least one gene in the pair), and not to the absolute number. The relationship between the model-derived probabilities and other graph-theoretic measures used to analyse biological networks is discussed.
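The proportion of common TFs defined in the abstract above is the size of the intersection of the two regulator sets divided by the size of their union. A minimal Python sketch of that ratio follows; the function name `common_tf_proportion`, the TF labels, and the convention of returning 0.0 when neither gene has a known regulator are assumptions made for illustration, not taken from the paper.

```python
def common_tf_proportion(tfs_gene_a, tfs_gene_b):
    """Proportion of common TFs for a gene pair.

    Number of TFs regulating both genes divided by the number of TFs
    regulating at least one of them (intersection over union).
    """
    a, b = set(tfs_gene_a), set(tfs_gene_b)
    union = a | b
    if not union:
        # Assumed convention: no known regulators means no shared regulation.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical regulator sets for two genes.
ratio = common_tf_proportion({"TF1", "TF2", "TF3"}, {"TF2", "TF3", "TF4"})
print(ratio)  # 2 shared TFs out of 4 distinct TFs
```

Because the measure is a ratio, two genes sharing 2 of 4 regulators score the same as two genes sharing 10 of 20, which is the distinction the abstract draws between the proportion and the absolute number of common TFs.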

Research Article

Modeling Host-Cancer Genetic Interactions with Multilocus Sequence Data

Yao Li and Rongling Wu

Cancer susceptibility may be controlled not only by host genes and mutated genes in cancer cells, but also by the epistatic interactions between genes from the host and cancer genomes. We derive a novel statistical model for cancer gene identification by integrating the gene mutation hypothesis of cancer formation into the mixture-model framework. Within this framework, genetic interactions of DNA sequences (or haplotypes) between host and cancer genes responsible for cancer risk are defined in terms of quantitative genetic principles. Our model is founded on a commonly used genetic association design in which a random sample of patients is drawn from a natural human population. Each patient is typed for single nucleotide polymorphisms (SNPs) on normal and cancer cells and measured for cancer susceptibility. The model is formulated within the maximum likelihood context and implemented with the EM algorithm, allowing the estimation of both population and quantitative genetic parameters. The model provides a general procedure for testing the distribution of haplotypes constructed by SNPs from host and cancer genes and the linkage disequilibria of different orders among the SNPs. The model also formulates a series of testable hypotheses about the effects of host genes, cancer genes, and their interactions on cancer susceptibility. We carried out simulation studies to examine the statistical properties of the model. The implications of this model for cancer gene identification are discussed.

Research Article

Modeling the Chain Entropy of Biopolymers: Unifying Two Different Random Walk Models under One Framework

Wayne Dawson and Gota Kawai

Entropy plays a critical role in the long-range structure of biopolymers. To model the coarse-grained chain entropy of the residues in biopolymers, the lattice model or the Gaussian polymer chain (GPC) model is typically used. Both models use the concept of a random walk to find the conformations of an unstructured polymer. However, the entropy of the lattice model is a function of the coordination number, whereas the entropy of the GPC is a function of the root-mean-square separation distance between the ends of the polymer. This can lead to inconsistent predictions for the coarse-grained entropy. Here we show that the GPC model and the lattice model are consistent under transformations using the cross-linking entropy (CLE) model and that the CLE model generates a family of equations that includes these two models at important limits. We show that the CLE model is a unifying approach to the thermodynamics of biopolymers that links these incompatible models into a single framework, elicits their similarities and differences, and extends beyond them, allowing calculation of variable flexibility and incorporating important corrections such as the worm-like-chain model. The CLE model is also consistent with the contact-order model and, when combined with existing local pairing potentials, can predict correct structures at the minimum free energy.
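The contrast between the two entropies can be made concrete with the standard textbook forms of each model. These are the usual ideal-chain expressions, not the CLE equations of the paper, and the symbols are assumptions introduced here: $N$ is the number of segments, $z$ the lattice coordination number, $b$ the segment length, $r$ the end-to-end distance, and $k_B$ the Boltzmann constant.

```latex
% Lattice random walk: each of the N steps has z choices.
\Omega_{\text{lattice}} = z^{N}
\quad\Longrightarrow\quad
S_{\text{lattice}} = k_B \ln \Omega_{\text{lattice}} = N k_B \ln z

% Gaussian polymer chain: end-to-end distribution and resulting entropy.
P(r) \propto \exp\!\left(-\frac{3 r^{2}}{2 N b^{2}}\right)
\quad\Longrightarrow\quad
S_{\text{GPC}}(r) = S_{\text{GPC}}(0) - \frac{3 k_B r^{2}}{2 N b^{2}}
```

In these forms the lattice entropy depends only on $N$ and $z$, while the Gaussian-chain entropy depends on $r$ relative to the root-mean-square end-to-end distance $\sqrt{N}\,b$, which is the mismatch the abstract describes.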
