Selen Uguroglu, Oznur Tastan, Judith Klein-Seetharaman and Sanford H. Leuba
As scientific reporting grows larger and more interdisciplinary, it is becoming more difficult to identify all the potentially relevant, citeable articles for the reference lists of publications such as scientific papers, reports, grant proposals and patent applications. Authors may miss citations or give inaccurate ones, which can hinder progress in a discipline and, on a personal level, distort the perceived importance and impact of an investigator's work. Given the emphasis on quantitative measures of productivity, including the number of literature citations, efforts are needed to help authors identify potentially relevant articles to cite. Prior work has analyzed the structure and characteristic features of citation networks and correlated these with other variables, such as country of origin, journal impact factor and open access status. These analyses have revealed problems such as underrepresentation of third-world countries, a high incidence of self-citation, and unsystematic quotation habits in review articles. With the exception of software for detecting gross plagiarism, however, no attempt has been made to develop a practical solution for identifying potentially relevant, citeable articles that may have been missed. Here, we use statistical methods to help retrieve relevant literature from existing publications. Specifically, we exploit the fact that publications reporting specific findings are typically quoted together, as grouped co-citations, in their respective contexts. Our approach automatically constructs co-citation rules by extracting overrepresented co-citations from manuscripts. This approach should help authors and reviewers identify potentially relevant, citeable articles.
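The idea of deriving co-citation rules from overrepresented grouped co-citations can be illustrated with a minimal sketch. The abstract does not specify which statistic is used, so this example assumes a simple lift score (observed co-occurrence relative to what independence would predict). The function names `cocitation_rules` and `suggest_missing`, the thresholds, and the toy reference identifiers are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def cocitation_rules(groups, min_support=2, min_lift=1.5):
    """Find reference pairs that are cited together more often than expected.

    groups: iterable of sets of reference ids, one set per citation context
            (for example, the references quoted together in one bracket).
    Returns a list of (ref_a, ref_b, co_count, lift) tuples.
    The lift statistic and both thresholds are assumptions of this sketch.
    """
    groups = [frozenset(g) for g in groups if g]
    n = len(groups)
    single = Counter()  # number of contexts in which each reference appears
    pair = Counter()    # number of contexts in which each pair appears together
    for g in groups:
        single.update(g)
        pair.update(combinations(sorted(g), 2))

    rules = []
    for (a, b), k in pair.items():
        if k < min_support:
            continue
        # lift = P(a and b) / (P(a) * P(b)); values above 1 mean overrepresentation
        lift = (k * n) / (single[a] * single[b])
        if lift >= min_lift:
            rules.append((a, b, k, lift))
    rules.sort(key=lambda r: (-r[3], r[0], r[1]))
    return rules


def suggest_missing(cited, rules):
    """Suggest references whose rule partner is cited but which are absent."""
    cited = set(cited)
    suggestions = set()
    for a, b, _, _ in rules:
        if a in cited and b not in cited:
            suggestions.add(b)
        elif b in cited and a not in cited:
            suggestions.add(a)
    return sorted(suggestions)


# Toy corpus: each set is one grouped co-citation from some manuscript.
toy_groups = [
    {"A", "B"}, {"A", "B"}, {"A", "B", "C"},
    {"C", "D"}, {"D", "E"}, {"E"},
]
toy_rules = cocitation_rules(toy_groups)
toy_suggestions = suggest_missing({"A"}, toy_rules)
```

In this toy corpus only the pair A and B passes both thresholds, so a manuscript that cites A without B would be prompted to consider B. A statistical test such as a hypergeometric test could replace lift without changing the overall structure.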