Novel feature selection methods for construction of accurate epigenetic clocks
Journal: PLoS Comput Biol
Adam Li et al
Epigenetic clocks allow us to accurately predict the age and future health of individuals based on the methylation status of specific CpG sites in the genome and are a powerful tool to measure the effectiveness of longevity interventions. There is a growing need for methods to efficiently construct epigenetic clocks. The most common approach is to create clocks using elastic net regression modelling of all measured CpG sites, without first identifying specific features or CpGs of interest. The addition of feature selection approaches provides the opportunity to optimise the identification of predictive CpG sites. Here, we apply novel feature selection methods and combinatorial approaches including newly adapted neural network, genetic algorithms, and 'chained' combinations. Human whole blood methylation data of ~470,000 CpGs was used to develop clocks that predict age with R2 correlation scores of greater than 0.73, the most predictive of which uses 35 CpG sites for a R2 correlation score of 0.87. The five most frequent sites across all clocks were modelled to build a clock with a R2 correlation score of 0.83. These two clocks are validated on two external datasets where they maintain excellent predictive accuracy. When compared with three published epigenetic clocks (Hannum, Horvath, Weidner) also applied to these validation datasets, our clocks outperformed all three models. We identified gene regulatory regions associated with selected CpG as possible targets for future aging studies. Thus, our feature selection algorithms build accurate, generalizable clocks with a low number of CpG sites, providing important tools for the field.