Nobutaka Kitamura*, Kouhei Akazawa, Kosuke Yoshihara
Abstract

Background: In DNA microarray experiments, the number of available samples is extremely small compared with the number of candidate genes. As a result, regression analyses may overfit the data. Various penalized regression models have been proposed to address this problem. In general, the validity of a regression model should be verified on a validation data set rather than on the training data used to construct the model. However, no programs are currently available to calculate the statistical properties (including precision, validity, and statistical power) of the Cox proportional hazards model regularized by various penalties, so the properties of these models are not sufficiently clear.

Methods: We wrote programs in the R language to calculate the statistical properties of the Cox proportional hazards model, including the statistical power based on the prognostic index, and conducted simulation experiments under various conditions for DNA microarray expression data with survival time.

Results: In many cases, the power on a validation set was greater for penalized methods than for stepwise methods, particularly when n < p (fewer samples than candidate genes). This tendency was most pronounced for penalized methods that combine the L1-norm and L2-norm penalties. We also tested our programs on actual microarray gene expression data with survival times to confirm their validity.

Conclusions: Our simulation programs for the Cox proportional hazards model regularized by various penalties are useful for planning DNA microarray studies and for evaluating the results of such studies.
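To make the idea of "statistical power based on the prognostic index" concrete, the following is a minimal sketch in Python (not the authors' R programs). It assumes one common definition: the prognostic index is the linear predictor of the Cox model, validation samples are split at its median into high-risk and low-risk groups, and power is the fraction of simulated validation sets in which a log-rank test separates the two groups at the 5% level. The function names, the exponential survival model, the censoring rate, and the use of the true coefficients as a stand-in for fitted penalized coefficients are all illustrative assumptions; the definition used in the study itself may differ.

```python
import numpy as np


def logrank_chi2(time, event, group):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    obs_minus_exp = 0.0
    variance = 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()                       # subjects still at risk at t
        n1 = (at_risk & group).sum()            # of those, in the high-risk group
        died = event & (time == t)
        d = died.sum()                          # events at t
        d1 = (died & group).sum()               # events at t in the high-risk group
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance


def estimate_power(beta_true, beta_fitted, n=100, n_sim=50, seed=0):
    """Fraction of simulated validation sets with a significant log-rank test.

    beta_fitted is a hypothetical stand-in for coefficients estimated on a
    training set (for example by a penalized Cox fit); no model is fitted here.
    """
    rng = np.random.default_rng(seed)
    p = len(beta_true)
    critical = 3.841  # 95th percentile of chi-square with 1 degree of freedom
    rejections = 0
    for _ in range(n_sim):
        X = rng.standard_normal((n, p))                      # simulated expression
        t_event = rng.exponential(1.0 / np.exp(X @ beta_true))
        t_cens = rng.exponential(2.0, size=n)                # assumed censoring
        time = np.minimum(t_event, t_cens)
        event = t_event <= t_cens
        prognostic_index = X @ beta_fitted
        high_risk = prognostic_index > np.median(prognostic_index)
        if logrank_chi2(time, event, high_risk) > critical:
            rejections += 1
    return rejections / n_sim


# Illustrative setting: 20 genes, of which 3 truly affect the hazard.
beta = np.zeros(20)
beta[:3] = 1.0
power = estimate_power(beta_true=beta, beta_fitted=beta)
print(f"estimated power: {power:.2f}")
```

In a fuller simulation, `beta_fitted` would come from fitting each candidate method (stepwise or penalized) to a separate training set, so that the estimated power reflects how well each method's coefficients generalize to validation data.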