- Learning Phenotype Mapping for Integrating Large Genetic Data, Conference Paper, Chun-Nan Hsu, Cheng-Ju Kuo, Congxing Cai, Sarah A. Pendergrass, Marylyn D. Ritchie, Jose Luis Ambite, http://dl.acm.org/citation.cfm?id=2002902.2002906, BioNLP '11, Stroudsburg, PA, USA, Association for Computational Linguistics, 19–27, 978-1-932432-91-6, 2011, 2014-02-24 18:08:19, ACM Digital Library, Accurate phenotype mapping will play an important role in facilitating Phenome-Wide Association Studies (PheWAS), and potentially in other phenomics-based studies. The PheWAS approach investigates the association between genetic variation and an extensive range of phenotypes in a high-throughput manner to better understand the impact of genetic variations on multiple phenotypes. Herein we define the phenotype mapping problem posed by PheWAS analyses, discuss the challenges, and present a machine-learning solution. Our key ideas include the use of weighted Jaccard features and term augmentation by dictionary lookup. When compared to string similarity metric-based features, our approach improves the F-score from 0.59 to 0.73. With augmentation we show further improvement in F-score to 0.89. For terms not covered by the dictionary, we use transitive closure inference and reach an F-score of 0.91, close to a level sufficient for practical use. We also show that our model generalizes well to phenotypes not used in our training dataset., Proceedings of BioNLP 2011 Workshop,