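Mathematicians at Michigan Technological University have recently developed a new software method, the Ensemble Learning Approach (ELA), which compares the genetic makeup of different individuals and picks out disease-related genes. Using this software, the researchers can locate genes responsible for some inherited human diseases. They also identified 11 variants associated with type 2 diabetes, in the form of single nucleotide polymorphisms (SNPs). The study was published in the journal Genetic Epidemiology.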
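In a complex genetic disease such as type 2 diabetes, a mutation in a single gene may contribute to the disease, and several genes acting together may also cause it. Interactions among multiple genes have been hard to study in the past. Matching up roughly 500,000 genetic markers (SNPs) across the human genome and testing their combinations is computationally almost infeasible.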
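To make the scale concrete, here is a small count of the SNP pairs that a naive exhaustive two-locus scan would have to test. It takes the 500,000 figure from the text and uses Python's `math.comb`. The three-way count is included only to show how quickly higher-order combinations grow.

```python
from math import comb

n_snps = 500_000  # marker count cited in the article

# Distinct SNP pairs a naive exhaustive two-locus scan would test.
pairs = comb(n_snps, 2)
print(f"{pairs:,}")  # 124,999,750,000

# Distinct SNP triples: roughly another n/3-fold increase.
triples = comb(n_snps, 3)
print(f"{triples:.3e}")  # 2.083e+16
```

Because the pair count grows quadratically, every added order of interaction multiplies the work again. Methods that restrict the search to promising loci first avoid this blow-up.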
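ELA sidesteps this problem in two steps. It first narrows the search to candidate loci that may be linked to the disease. It then uses statistical modeling to determine which SNPs affect disease risk on their own and which act only in combination with other genes. To test the model on real data, the team analyzed 1,000 people in the United Kingdom: 500 patients with type 2 diabetes and 500 healthy controls. They found 11 SNPs that were jointly and significantly associated with type 2 diabetes (P = 0.01), although none of them was significant on its own. (Bioon.com)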
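The paper's baseline for "SNPs that act on their own" is a single-marker test. Below is a minimal sketch of one common form, the allelic chi-square test on a 2x2 table of allele counts in cases versus controls. It assumes a biallelic SNP and 1 degree of freedom. The function name is hypothetical, and the counts in the example are invented for illustration, not data from the study. With 500 cases and 500 controls, each group contributes 1,000 alleles.

```python
from math import erfc, sqrt


def allelic_chi2(case_alleles, control_alleles):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Each argument is (count of allele A, count of allele a) for one group.
    Returns (chi-square statistic, P-value).
    """
    table = [list(case_alleles), list(control_alleles)]
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_tot)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 1,000 alleles per group.
chi2, p = allelic_chi2((320, 680), (280, 720))
print(round(chi2, 3), round(p, 4))
```

Here the statistic is 80/21 (about 3.81), which falls just short of the usual 0.05 threshold. This is the kind of modest single-SNP signal that a multi-locus method tries to aggregate.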
Original source, as recommended by Bioon:
Genetic Epidemiology Volume 32 Issue 4, Pages 285 - 300
An ensemble learning approach jointly modeling main and interaction effects in genetic association studies
Zhaogong Zhang (1,2), Shuanglin Zhang (1,2), Man-Yu Wong (3), Nicholas J. Wareham (4), Qiuying Sha (1,*)
(1) Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
(2) Heilongjiang University, Harbin, China
(3) Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong, China
(4) Department of Public Health and Primary Care, University of Cambridge Institute of Public Health, Cambridge, United Kingdom
Complex diseases are presumed to be the results of interactions of several genes and environmental factors, with each gene only having a small effect on the disease. Thus, methods that can account for gene-gene interactions to search for a set of marker loci in different genes or across the genome and to analyze these loci jointly are critical. In this article, we propose an ensemble learning approach (ELA) to detect a set of loci whose main and interaction effects jointly have a significant association with the trait. In the ELA, we first search for base learners and then combine the effects of the base learners by a linear model. Each base learner represents a main effect or an interaction effect. The result of the ELA is easy to interpret. When the ELA is applied to analyze a data set, we can get a final model, an overall P-value of the association test between the set of loci involved in the final model and the trait, and an importance measure for each base learner and each marker involved in the final model. The final model is a linear combination of some base learners. We know which base learner represents a main effect and which one represents an interaction effect. The importance measure of each base learner or marker can tell us the relative importance of the base learner or marker in the final model. We used intensive simulation studies as well as a real data set to evaluate the performance of the ELA. Our simulation studies demonstrated that the ELA is more powerful than the single-marker test in all the simulation scenarios. The ELA also outperformed the other three existing multi-locus methods in almost all cases.
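The structure described in the abstract has three parts: candidate base learners that each encode a main effect or an interaction, a linear combination of the selected learners, and an importance score for each one. The small sketch below illustrates that structure under stated assumptions, not the authors' published algorithm:

- Genotypes are coded 0/1/2 (copies of the minor allele).
- An interaction learner is the product of two genotype codes.
- Learners are chosen by greedy forward-stagewise least squares on the residuals.
- "Importance" is each learner's share of the total residual-sum-of-squares reduction.

All function names are hypothetical, and the trait in the example is simulated.

```python
import random
from itertools import combinations


def center(v):
    """Subtract the mean so each learner enters without an intercept."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def candidate_learners(genotypes):
    """Hypothetical helper: main-effect and pairwise-interaction base learners.

    genotypes: one list per SNP, coded 0/1/2. Returns {name: centered vector}.
    """
    learners = {f"SNP{j}": center(g) for j, g in enumerate(genotypes)}
    for j, k in combinations(range(len(genotypes)), 2):
        prod = [a * b for a, b in zip(genotypes[j], genotypes[k])]
        learners[f"SNP{j}xSNP{k}"] = center(prod)
    return learners


def fit_ensemble(genotypes, y, n_steps=3):
    """Greedy forward-stagewise fit of a linear combination of base learners.

    Returns (model, importance, residuals), where model is a list of
    (learner name, coefficient) and importance sums to 1.
    """
    learners = candidate_learners(genotypes)
    resid = center(y)
    model, importance = [], {}
    for _ in range(n_steps):
        best = None
        for name, x in learners.items():
            sxx = sum(v * v for v in x)
            if sxx == 0:
                continue  # constant learner carries no information
            sxy = sum(a * b for a, b in zip(x, resid))
            gain = sxy * sxy / sxx  # drop in residual sum of squares
            if best is None or gain > best[0]:
                best = (gain, name, sxy / sxx)
        if best is None:
            break
        gain, name, beta = best
        resid = [r - beta * v for r, v in zip(resid, learners[name])]
        model.append((name, beta))
        importance[name] = importance.get(name, 0.0) + gain
    total = sum(importance.values()) or 1.0
    return model, {k: v / total for k, v in importance.items()}, resid


# Simulated example: a trait driven only by an SNP0 x SNP1 interaction.
rng = random.Random(1)
n, m = 400, 4
genotypes = [[rng.choice([0, 1, 2]) for _ in range(n)] for _ in range(m)]
y = [genotypes[0][i] * genotypes[1][i] + rng.gauss(0, 0.5) for i in range(n)]
model, importance, _ = fit_ensemble(genotypes, y, n_steps=3)
print(model)
print(importance)
```

In this simulation the interaction learner should be selected first, with a coefficient near 1. Forward stagewise fitting is used here only because it keeps the linear combination explicit in a few lines. A real analysis would also need a stopping rule and an overall significance test.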
In an application to a large-scale case-control study for Type 2 diabetes, the ELA identified 11 single nucleotide polymorphisms that have a significant multi-locus effect (P-value=0.01), while none of the single nucleotide polymorphisms showed significant marginal effects and none of the two-locus combinations showed significant two-locus interaction effects.
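The application reports one overall P-value (0.01) for the set of 11 SNPs. One common way to obtain a multi-locus P-value of this kind is a permutation test:

1. Shuffle the case/control labels.
2. Recompute the statistic on the shuffled data.
3. Count how often the shuffled value reaches the observed one.

The sketch below assumes that approach. It uses a simple sum-of-squared-covariances statistic chosen for illustration, not the ELA model itself. The function names and the simulated genotypes are hypothetical.

```python
import random


def score_sum_statistic(genotypes, status):
    """Hypothetical multi-locus statistic: sum over SNPs of cov(genotype, status)^2."""
    n = len(status)
    mean_s = sum(status) / n
    total = 0.0
    for g in genotypes:
        mean_g = sum(g) / n
        cov = sum((a - mean_g) * (b - mean_s) for a, b in zip(g, status)) / n
        total += cov * cov
    return total


def permutation_p_value(genotypes, status, n_perm=999, seed=0):
    """Permutation P-value: shuffle labels and compare statistics."""
    rng = random.Random(seed)
    observed = score_sum_statistic(genotypes, status)
    shuffled = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score_sum_statistic(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction: the observed labeling counts as one permutation.
    return (exceed + 1) / (n_perm + 1)


# Simulated data: 200 cases (1) and 200 controls (0) at 5 SNPs, where
# cases carry the minor allele somewhat more often.
rng = random.Random(7)
status = [1] * 200 + [0] * 200
genotypes = [
    [rng.choice([0, 1, 2, 2]) for _ in range(200)]
    + [rng.choice([0, 0, 1, 2]) for _ in range(200)]
    for _ in range(5)
]
p = permutation_p_value(genotypes, status, n_perm=199)
print(p)
```

The smallest attainable P-value is 1/(n_perm + 1), so reporting P = 0.01 requires at least 99 permutations.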