
Volume 7, Issue 1 (2016)

Short Communication

From Three Fishers: Statistician, Geneticist and Person to Only One Fisher: The Scientist

Millor Fernandes

Sir Ronald Fisher was the most important statistician of the twentieth century. He developed several theories that revolutionized statistics and genetics. Many scientists accused him of propagating errors because his new ideas were not correctly understood. Fisher also held a polemical and controversial social position, since he vigorously defended eugenics and tobacco consumption. In this paper we present the three Fishers: the statistician, the geneticist and the person. His contributions to science resulted from these three Fishers, who together made up a single Fisher: the scientist.

Research Article

Using Mathematical Method to Solve Gene Identification Research

Dong Lin and Chu Xiangfeng

To identify the "key" gene sequences that distinguish different biological types, we construct a gene-sequence screening model based on a two-way clustering algorithm. First, a primary model based on the FCM (fuzzy c-means) algorithm clusters similar samples, and a two-way clustering algorithm is then optimized to filter out the "key" gene sequences. Because an empirically chosen threshold gives inaccurate predictions, a threshold model based on bootstrap sampling is introduced to obtain the clusters, and the optimal threshold is selected at the highest confidence, with α = 0.05. Validation shows that the classification of gene coding intervals reaches a validity of 90% and an accuracy of 88%, a 50% improvement over the empirical-threshold algorithm. Because random noise, including fluctuations from introns, interferes with gene identification, the wavelet transform is introduced into DNA coding-region prediction to filter out this noise. To overcome the imprecision of existing coding-region prediction, we therefore build a DNA sequence coding-region prediction model based on the wavelet transform. With this model, the detection rate reaches 81%, 27% higher than the neural network method, and the prediction accuracy reaches 75%, 36% higher than Fourier analysis.
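The clustering step above starts from the FCM (fuzzy c-means) algorithm. The sketch below implements standard fuzzy c-means in plain Python and is meant only as an illustration. It does not reproduce the two-way clustering, the bootstrap threshold model or the wavelet filtering described in the abstract. The function name `fuzzy_c_means`, the fuzzifier `m = 2` and the toy points are our own assumptions, not the authors' settings.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, memberships), where memberships[i][j] is the degree to
    which point i belongs to cluster j; each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Centers are means of the points weighted by membership**m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / w_sum
                for d in range(dim)))

        # Memberships: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if min(dists) == 0.0:
                # Point sits exactly on a center: split membership among those centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                p = 2.0 / (m - 1.0)
                new_u.append([1.0 / sum((dists[j] / dists[k]) ** p for k in range(c))
                              for j in range(c)])

        change = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u
```

For example, calling `fuzzy_c_means` with two tight, well-separated groups of points and `c=2` should give each group a different highest-membership cluster. The cluster indices themselves depend on the random start.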

Research Article

Comparison of the Count Regression Models in Evaluation of the Effects of Hazelnut Harvest Season Variations on Pulmonary Aspergillus

Esin A and Emel U

Pulmonary aspergillosis has recently emerged as a worldwide health care problem, especially in patients with underlying lung disease. The objective of this study was to compare the Poisson and COM-Poisson regression models and to find the best-fitting model for determining the effect of the hazelnut harvest season on pulmonary aspergillosis. Data were obtained from the state hospital of our city over a two-year period, from September 2012 to August 2014, and a retrospective study was conducted. Respiratory specimens that showed repeated isolation of Aspergillus were included in the study; however, only one of the repeated samples was analysed. Cases were classified according to the revised definitions of the European Organization for Research and Treatment of Cancer/Invasive Mycosis Study Consensus Group (EORTC/MSG). Of 3457 patients, 36 were culture-positive. Poisson and Conway-Maxwell-Poisson (COM-Poisson) regression models were compared to determine the best-fitting model for the number of new pulmonary aspergillosis cases in the hazelnut harvest season. Dispersion, deviance and the Akaike Information Criterion (AIC) were used to identify the best-fitting model for the count data. The dispersion test showed that the under-dispersion was not significant, which indicates that the Poisson regression model is more appropriate than the COM-Poisson regression model for the pulmonary aspergillosis data. The deviance and AIC values confirm this result. In summary, according to these statistical tests, the Poisson regression model was the best-fitting model for the pulmonary aspergillosis data.
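The sketch below makes the model-comparison quantities concrete. It assumes a Poisson regression with a single 0/1 harvest-season indicator, fitted to monthly case counts. With only that indicator, the maximum-likelihood fit has a closed form: each group's fitted mean is its sample mean. The counts are made up for illustration and are not the study's data. The COM-Poisson fit and the formal dispersion test are not shown. A Pearson dispersion ratio well below 1 would point toward under-dispersion, which is the case the COM-Poisson model is designed to accommodate.

```python
import math


def fit_poisson_binary(counts, season):
    """Poisson regression log(mu) = b0 + b1 * season for a 0/1 indicator.

    With one binary covariate the MLE is closed-form: the fitted mean of
    each group is its sample mean (both means must be positive).
    """
    off = [y for y, s in zip(counts, season) if s == 0]
    on = [y for y, s in zip(counts, season) if s == 1]
    mu0, mu1 = sum(off) / len(off), sum(on) / len(on)
    b0 = math.log(mu0)
    b1 = math.log(mu1) - b0
    fitted = [mu1 if s == 1 else mu0 for s in season]
    return b0, b1, fitted


def poisson_diagnostics(counts, fitted, n_params=2):
    """Deviance, Pearson dispersion ratio and AIC of a fitted Poisson model."""
    # Deviance: 2 * sum[y*log(y/mu) - (y - mu)], with y*log(y/mu) = 0 when y = 0.
    deviance = 2.0 * sum(
        (y * math.log(y / mu) if y > 0 else 0.0) - (y - mu)
        for y, mu in zip(counts, fitted))
    # Pearson chi-square divided by residual df; values well below 1 suggest under-dispersion.
    pearson = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
    dispersion = pearson / (len(counts) - n_params)
    # Poisson log-likelihood: sum[y*log(mu) - mu - log(y!)].
    loglik = sum(y * math.log(mu) - mu - math.lgamma(y + 1)
                 for y, mu in zip(counts, fitted))
    aic = 2 * n_params - 2.0 * loglik
    return deviance, dispersion, aic


# Made-up monthly counts: 8 off-season months followed by 4 harvest-season months.
counts = [1, 0, 2, 1, 1, 0, 1, 2, 3, 4, 3, 2]
season = [0] * 8 + [1] * 4
b0, b1, fitted = fit_poisson_binary(counts, season)
deviance, dispersion, aic = poisson_diagnostics(counts, fitted)
print(f"rate ratio = {math.exp(b1):.2f}, dispersion = {dispersion:.3f}, AIC = {aic:.2f}")
```

In practice the COM-Poisson model would be fitted to the same counts with dedicated software. Its AIC would then be compared with the Poisson AIC computed here.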

Research Article

A Multilevel Modeling Analysis of the Determinants and Cross-regional Variations of HIV Testing in Ethiopia: Ethiopian DHS 2011

Tesfay Gidey Hailu

Background: Determinants of HIV testing operate at both the individual and the community level, but most studies in Ethiopia did not account for any clustering effect, so their estimates are likely to be biased. Methods: Given the hierarchical structure of the survey population (Ethiopian Demographic and Health Survey, EDHS 2011), a multilevel modeling approach was used. Results: About 4.07% (6.68%) of the total variation in ever having been tested for HIV was attributable to region-level factors, and 17.27% (18.45%) was attributable to cluster-level factors, among men (women). Conclusion: Random effects are useful for modeling intra-cluster correlation; that is, observations in the same cluster are correlated because they share common cluster-level random effects. This study can therefore help inform national efforts that target the populations who most under-utilize HIV testing services, and help identify key geographic areas for further investigation. Accordingly, the proportion of people tested for HIV would greatly improve if services were designed to strengthen health programs that advocate the benefits of HIV testing through mass media, integrate family planning services with HIV testing, concentrate on both men and women aged 20 to 34 years, and target the Somali region and the Nuwer ethnic group. Moreover, health facilities offering HIV testing services need to be distributed more efficiently among women residing in urban and rural areas.
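In a multilevel logistic model, one common way to express the "percentage of variation attributable to region- and cluster-level factors" is the variance partition coefficient. It uses the latent-variable convention that the individual-level residual variance equals π²/3. The sketch below assumes a three-level random-intercept logistic model, with individuals nested within clusters nested within regions. The function name and the example variances are hypothetical. The abstract does not state which formula the study used.

```python
import math


def variance_partition(region_var, cluster_var):
    """Share of latent-scale variance at the region and cluster levels.

    Assumes a three-level random-intercept logistic model whose
    individual-level residual variance is fixed at pi**2 / 3.
    """
    individual_var = math.pi ** 2 / 3
    total = region_var + cluster_var + individual_var
    region_share = region_var / total
    cluster_share = cluster_var / total
    return region_share, cluster_share


# Hypothetical random-intercept variances, not estimates from EDHS 2011.
region_share, cluster_share = variance_partition(0.2, 0.8)
print(f"region: {100 * region_share:.2f}%, cluster: {100 * cluster_share:.2f}%")
```

Some studies report the cluster-level figure cumulatively, as (region_var + cluster_var) / total. That quantity is the correlation between two individuals in the same cluster, so it is worth checking which convention a given paper uses.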

Research Article

Developing Prediction Models from Results of Regression Analysis: Woodpecker Technique

Alexander Goldfarb Rumyantzev, Ning Dong, Sergei Krikov, Olga Efimova, Lev Barenbaum and Shiva Gautam

Background: Developing medical prediction models remains time- and labor-consuming. We propose an approach in which information collected from published epidemiological outcome studies is quickly converted into prediction models. Methods: We used general expressions for regression models to derive prediction formulae that define the probability of the outcome and a relative risk indicator. The risk indicator (R) is calculated as a linear combination of the predictors weighted by their regression coefficients, and is then rescaled to a range of 0 to 10 for interpretability. The expression for the probability (P) of the outcome is derived from the general expressions for logistic regression and proportional hazards models. The intercept is calculated from the outcome rate in the population and the risk indicator assigned to a subject with the mean characteristics of the population (Ȓ). We also consider a linear expression in which the probability of the outcome is the product of the risk indicator and the ratio of the observed outcome rate to Ȓ. Results: These models were explored and compared in a numerical simulation and with real data from the NHANES dataset. All three expressions generate very similar predictions in the lower categories of the risk indicator. In groups with higher values of the risk indicator, the linear expression tends to predict a lower probability than the exponential expressions, and also a lower probability than observed. Conclusions: We demonstrated a simple technique (named Woodpecker™) that may allow a functional prediction model and risk stratification tool to be derived from the report of a clinical outcome study that used a multivariate regression model.
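The three expressions can be sketched as follows, under several assumptions:

- The linear predictor lp is the sum of the predictors weighted by their coefficients.
- R is a min-max rescaling of lp onto 0 to 10; the abstract does not give the exact rescaling.
- The logistic intercept is chosen so that a subject with mean characteristics receives the observed outcome rate.
- The proportional hazards version evaluates survival at one fixed follow-up time, with the baseline calibrated the same way.

The function names, coefficients and values are hypothetical. This is not the published Woodpecker™ implementation.

```python
import math


def linear_predictor(x, beta):
    """Sum of predictors weighted by their regression coefficients."""
    return sum(b * v for b, v in zip(beta, x))


def risk_indicator(lp, lp_min, lp_max):
    """Rescale a linear predictor onto 0..10 (min-max rescaling is an assumption)."""
    return 10.0 * (lp - lp_min) / (lp_max - lp_min)


def predict_probabilities(x, beta, x_mean, outcome_rate, lp_min, lp_max):
    """Return (logistic, proportional_hazards, linear) outcome probabilities.

    Each expression is calibrated so that a subject with the mean
    characteristics x_mean receives the observed outcome_rate.
    """
    lp = linear_predictor(x, beta)
    lp_mean = linear_predictor(x_mean, beta)

    # Logistic: intercept b0 = logit(rate) - lp_mean.
    b0 = math.log(outcome_rate / (1.0 - outcome_rate)) - lp_mean
    p_logistic = 1.0 / (1.0 + math.exp(-(b0 + lp)))

    # Proportional hazards at a fixed horizon: P = 1 - S_mean ** exp(lp - lp_mean),
    # where S_mean = 1 - rate is the survival of the mean subject.
    p_ph = 1.0 - (1.0 - outcome_rate) ** math.exp(lp - lp_mean)

    # Linear: P = R * rate / R_mean (can exceed 1 for very high R).
    r = risk_indicator(lp, lp_min, lp_max)
    r_mean = risk_indicator(lp_mean, lp_min, lp_max)
    p_linear = r * outcome_rate / r_mean
    return p_logistic, p_ph, p_linear
```

Consider a toy setting with `beta = [0.5, 1.2]`, `x_mean = [1.0, 0.5]`, an outcome rate of 0.1 and `lp_min, lp_max = 0.0, 3.0`. All three expressions return 0.1 for the mean subject. For the higher-risk subject `[3.0, 1.2]`, the linear expression gives the lowest probability. In this example that matches the pattern the abstract reports for high-risk groups.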

Research Article

Combining Prediction Models in a Linear Way: Results of Numeric Simulation

Alexander Goldfarb-Rumyantzev and Ning Dong

Background: Using standard expressions for logistic regression and proportional hazards models, together with data from published outcome studies, might allow prediction models and risk stratification tools to be generated in a more streamlined fashion. However, this might require combining models and adding or removing predictors. Here we examine the feasibility of this approach. Methods: The outcome of this simulation study is mortality. The simulation exercise was based on an imaginary population of 20,000 subjects whose mortality was completely determined by five variables in a specified logistic regression model. In the first simulation exercise, using the “full model”, we evaluated the option of combining the results of two separate studies (studies A and B), each based on a subset of the population. In the second simulation exercise, studies A and B were each based on a limited number of predictors. Each simulation was repeated 50 times. Results: Both simulation exercises demonstrated the robustness of the model and the feasibility of adding predictors to or removing them from the model. We also compared the results of the linear model with those of the more complex exponential model using all five predictors. In subjects with a lower risk indicator, the outcome of the linear model is similar to that of the logistic regression model and to the true outcome rate; however, the linear model underestimates the risk in the high-risk groups. By contrast, the logistic regression model agrees closely with the actual outcomes. This confirms our hypothesis that dropping or adding variables should not noticeably distort the prediction. Conclusions: Neither a simple linear combination of prediction models nor the addition or removal of predictors distorts the model, and predictions remain robust. Predictions of the linear model are similar to those of the exponential model, except that the linear model underestimates the outcome in the high-risk groups.
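The design of the imaginary population can be sketched as follows. There are 20,000 subjects with five predictors, and deaths are drawn from a specified logistic model. The population is then split into two non-overlapping studies, A and B. The intercept, coefficients, standard-normal predictors, 50/50 split and seeds are our assumptions, since the abstract does not report them. Refitting the models within each study and repeating the exercise 50 times (for example with seeds 0 to 49) is left out.

```python
import math
import random


def simulate_population(n=20000, intercept=-3.0, betas=(0.5, 0.4, 0.3, 0.2, 0.1), seed=1):
    """Imaginary population whose mortality is fully determined by a logistic model.

    Returns a list of (predictors, true_probability, died) tuples.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in betas]
        lp = intercept + sum(b * v for b, v in zip(betas, x))
        p = 1.0 / (1.0 + math.exp(-lp))
        died = rng.random() < p
        population.append((x, p, died))
    return population


def split_into_studies(population, seed=2):
    """Randomly split the population into two non-overlapping halves (studies A and B)."""
    rng = random.Random(seed)
    order = list(range(len(population)))
    rng.shuffle(order)
    half = len(order) // 2
    study_a = [population[i] for i in order[:half]]
    study_b = [population[i] for i in order[half:]]
    return study_a, study_b


population = simulate_population()
study_a, study_b = split_into_studies(population)
observed_rate = sum(died for _, _, died in population) / len(population)
print(f"A: {len(study_a)}, B: {len(study_b)}, observed mortality: {observed_rate:.3f}")
```

Because each subject's true probability is stored alongside the simulated outcome, predictions from any refitted or combined model can later be compared against both the true and the observed mortality.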

Research Article

Role of BMS and Infrastructure in Crude Death Rate and Infant Mortality Rate

Abdul Basit, Ishaque Ahmed Ansari and Anam Riaz

The aim of this study is to investigate the relationship of health-sector infrastructure and basic medical staff (BMS) with the infant mortality rate (IMR) and the crude death rate (CDR). A further aim is to describe the historical trend of health-sector infrastructure and BMS. The compound growth rate (CGR) and year-on-year (YoY) percentage change of BMS and infrastructure show a downward trend after 1995-96. One-way ANOVA shows that growth in infrastructure and basic medical staff differs from decade to decade. Similarly, regression analysis shows a linear relationship among IMR, CDR, basic medical infrastructure and BMS. The findings indicate that basic infrastructure and basic medical staff play an important role in reducing Pakistan's CDR and IMR. Compared with BMS, infrastructure plays a more significant role in the declining trend of CDR and IMR. This suggests that the government of Pakistan needs to increase the budget for health-sector infrastructure.
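CGR and YoY percentage change are simple to compute. The sketch below assumes the log-linear definition of CGR that is often used in economic statistics: fit ln y = a + b·t by least squares and take e^b − 1. The endpoint formula (last/first)^(1/periods) − 1 is a common alternative, and the abstract does not say which one was used. The series in the example is hypothetical.

```python
import math


def compound_growth_rate(series):
    """Log-linear CGR: fit ln(y_t) = a + b*t by least squares, return exp(b) - 1."""
    t = list(range(len(series)))
    z = [math.log(y) for y in series]
    t_mean = sum(t) / len(t)
    z_mean = sum(z) / len(z)
    slope = (sum((ti - t_mean) * (zi - z_mean) for ti, zi in zip(t, z))
             / sum((ti - t_mean) ** 2 for ti in t))
    return math.exp(slope) - 1.0


def yoy_change(series):
    """Year-on-year percentage change between consecutive values."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(series, series[1:])]


# Hypothetical yearly counts of health facilities (not data from the study).
facilities = [100.0, 110.0, 121.0]
print(compound_growth_rate(facilities), yoy_change(facilities))
```

For an exactly geometric series like the one above, the log-linear and endpoint definitions agree. On real, noisy series they can differ, so the definition used should be stated.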

Research Article

Comparison of Binary Models for the Associated Factors Affecting Recovery Status of Vesico-vaginal Obstetrics Fistula Patients: A Case of Mettu Hamlin Fistula Center, South West Ethiopia

Aboma Temesgen

Background: Obstetric fistula, or vaginal fistula, is a medical condition in which a fistula (hole) develops either between the rectum and the vagina or between the bladder and the vagina after severe or failed childbirth when adequate medical care is not available. It is the most tragic of the preventable childbirth complications in the developing world, because affected women are often abandoned by their husbands and families and forced to live in shame. Objective: The main objective of the study was to determine an appropriate binary model for the recovery status of vesico-vaginal fistula patients. The study also explores the factors affecting the recovery status of the patients during the study period. Methods: The study included 206 vesico-vaginal fistula patients with complete records who were treated at the Mettu Hamlin Fistula Center from November 2010 to June 2014. The chi-square test of association was used to explore the association between recovery status and the categorical independent variables. Several binary models were then fitted, and the Akaike information criterion (AIC) was used to select the most appropriate model for the recovery status of the patients. Results: The chi-square test of association showed that fistula width, fistula length and bladder size category were significantly associated with recovery status at the 5% level of significance. Among the candidate binary models, the logistic model was found to be the most appropriate. The fitted model also showed that fistula width, fistula length and bladder size category had a significant effect on the recovery status of the patients at the 5% level of significance.
Conclusion: The logistic regression model gave the best fit to the data. Patients whose fistula width and length fell in the three to five centimeter category were less likely to recover than patients whose fistula width and length were less than or equal to two centimeters. Similarly, patients in the "none" bladder-size category were less likely to recover than patients in the "fair" bladder-size category.
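The chi-square test of association used in the Methods can be sketched in plain Python for any r × c table of counts. The table in the example is invented for illustration: recovered versus not recovered, by two fistula-width categories. The test is shown without Yates' continuity correction. Only the 5% critical values for 1 to 5 degrees of freedom are hard-coded; a statistics library would normally supply the p-value.

```python
# Upper 5% critical values of the chi-square distribution for df = 1..5.
CRITICAL_5PCT = {1: 3.8415, 2: 5.9915, 3: 7.8147, 4: 9.4877, 5: 11.0705}


def chi_square_association(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def is_significant_5pct(stat, df):
    """True if the statistic exceeds the 5% critical value (df 1 to 5 only)."""
    return stat > CRITICAL_5PCT[df]


# Invented counts: rows are width <= 2 cm and 3 to 5 cm; columns are recovered, not recovered.
table = [[40, 10], [20, 30]]
stat, df = chi_square_association(table)
print(f"chi2 = {stat:.2f}, df = {df}, significant at 5%: {is_significant_5pct(stat, df)}")
```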

Research Article

Properties of Estimators in Exponential Family Settings with Observation-based Stopping Rules

Elasma Milanzi, Geert Molenberghs, Ariel Alonso, Michael GK, Geert Verbeke, Anastasios AT and Marie Davidian

Often, sample size is not fixed by design. A key example is a sequential trial with a stopping rule, where the decision to stop is based on what has been observed at an interim look. While such designs are used for time and cost efficiency, and hypothesis testing theory for them is well developed, estimation following a sequential trial is a challenging and still controversial problem. Progress has been made in the literature, predominantly for normal outcomes and/or for a deterministic stopping rule. Here, we place these settings in the broader context of outcomes that follow an exponential family distribution and of a stochastic stopping rule that includes a deterministic rule and a completely random sample size as special cases. We show that the estimation problem is usually simpler than is often thought. In particular, we establish that the ordinary sample average is a very sensible choice, contrary to commonly encountered statements. We study (1) the so-called incompleteness property of the sufficient statistics, (2) a general class of linear estimators, and (3) joint and conditional likelihood estimation. Apart from the general exponential family setting, normal and binary outcomes are considered as key examples. While our results hold for a general number of looks, for ease of exposition we focus on the simple yet generic setting of two possible sample sizes, N=n or N=2n.
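A small Monte Carlo sketch can show how the ordinary sample average behaves under such a stopping rule in the two-look setting (N = n or N = 2n) with normal outcomes. The stochastic rule used here is our assumption, chosen to match the description: stop after the first n observations with probability expit(α + β·K), where K is the sum of those observations. Setting β = 0 gives a completely random sample size, and a very large |β| approaches a deterministic rule. The function names and parameter values are hypothetical.

```python
import math
import random


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_sample_average(mu, n, alpha, beta, reps=20000, seed=3):
    """Monte Carlo mean of the ordinary sample average when N is n or 2n.

    Outcomes are N(mu, 1). After the first n observations the trial stops
    with probability expit(alpha + beta * K), K being their sum; otherwise
    n more observations are collected. Returns (mean estimate, fraction stopped).
    """
    rng = random.Random(seed)
    total, stopped = 0.0, 0
    for _ in range(reps):
        k = sum(rng.gauss(mu, 1.0) for _ in range(n))
        if rng.random() < expit(alpha + beta * k):
            estimate = k / n          # stopped at the interim look: N = n
            stopped += 1
        else:
            k += sum(rng.gauss(mu, 1.0) for _ in range(n))
            estimate = k / (2 * n)    # continued to the final look: N = 2n
        total += estimate
    return total / reps, stopped / reps


# beta = 0: completely random sample size; beta != 0: data-dependent stopping.
print(simulate_sample_average(0.5, 10, 0.0, 0.0))
print(simulate_sample_average(0.5, 10, -5.0, 1.0))
```

Comparing the first returned value with mu = 0.5 gives a rough idea of any bias the stopping rule induces at this sample size. The properties of the estimator are established by the theoretical results in the abstract, not by this simulation.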
