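On August 30, Nature Genetics reported a new algorithm designed by researchers at the University of Washington. The algorithm has been shown to be effective at calculating the copy number and sequence content of duplicated genomic sequences. The researchers named the method mrFAST, short for micro-read fast alignment search tool.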
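Segmental duplications in the human genome are thought to be linked to emotion and immune function. Diseases such as lupus, Crohn's disease, mental retardation, schizophrenia, color blindness, psoriasis, and age-related macular degeneration have all been associated with them. Duplicated segments often contain duplicated genes of unknown function, and the copy number of these segments differs from one individual to another. Determining the number, content, and location of duplicated segments is an important step toward understanding what gene copy-number variation means for health.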
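Alkan said, "The new algorithm, which uses next-generation DNA sequencing, provides for the first time an accurate count of variable copy number within duplicated segments." Kidd explained, "It can count whether a person has one, two, three, or more copies of a gene." Many standard genome analyses leave out the duplicated segments of the human genome because these sequences are not unique. In fact, "this kind of calculation is very difficult."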
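Kidd's description of counting one, two, three, or more copies can be illustrated with a read-depth calculation, a common way to turn mapped sequencing reads into a copy-number estimate. The sketch below is not the mrFAST implementation. It assumes that reads have already been mapped to every location they match, and that per-window read depths are available both for the region of interest and for unique control regions known to be present in two copies. The function name `estimate_copy_number` and all depth values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def estimate_copy_number(region_depths, control_depths, control_copies=2):
    """Estimate the absolute copy number of a region from read depth.

    region_depths:  per-window read depths inside the region of interest.
    control_depths: per-window read depths in unique regions assumed to be
                    present at `control_copies` copies (2 for autosomes).
    """
    if not region_depths or not control_depths:
        raise ValueError("depth lists must be non-empty")

    # Average depth contributed by a single copy of the sequence.
    depth_per_copy = mean(control_depths) / control_copies
    if depth_per_copy <= 0:
        raise ValueError("control depth must be positive")

    # Depth in the region, expressed in units of "one copy".
    return mean(region_depths) / depth_per_copy


# Made-up depths, for illustration only.
control = [30, 31, 29, 30]      # unique diploid windows, about 30x
duplicated = [61, 59, 60, 60]   # windows in a candidate duplication
estimate = estimate_copy_number(duplicated, control)
print(round(estimate))  # 4
```

With these made-up numbers, the region has twice the depth of the diploid controls, so the estimate is four copies. The calculation only makes sense in duplicated regions if reads are not discarded for mapping to more than one place, which is why the non-uniqueness mentioned above makes the problem hard.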
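Other scientists had studied the question before this work, but none had calculated specific copy numbers. For example, some studies indicated that certain people may resist HIV through an increased number of copies of a gene, but how many additional copies were involved remained unknown.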
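The study was supported by the 1000 Genomes Project, in which many research institutions around the world take part. The experimental samples came from the genomes of several hundred people from around the world.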
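Alkan and his team believe that copy-number variation contributes substantially to human diversity. The ability to measure the copy number of genomic segments accurately and systematically is important, particularly for mapping individual genomes and for understanding how the genome shapes a person's traits.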
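They see the next challenge as determining how segmental duplications vary in sequence content, and resolving the structure of these dynamic and important regions of the human genome.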
Recommended original source and abstract:
Nature Genetics Published online: 30 August 2009 | doi:10.1038/ng.437
Personalized copy number and segmental duplication maps using next-generation sequencing
Can Alkan1,2, Jeffrey M Kidd1, Tomas Marques-Bonet1,3, Gozde Aksay1, Francesca Antonacci1, Fereydoun Hormozdiari4, Jacob O Kitzman1, Carl Baker1, Maika Malig1, Onur Mutlu5, S Cenk Sahinalp4, Richard A Gibbs6 & Evan E Eichler1,2
1 Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA.
2 Howard Hughes Medical Institute, Seattle, Washington, USA.
3 Institut de Biologia Evolutiva (UPF-CSIC), Barcelona, Catalonia, Spain.
4 School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada.
5 Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
6 Baylor College of Medicine, Houston, Texas, USA.
Correspondence to: Evan E Eichler
Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable owing to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads, which allows for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy number differences. We estimate that, on average, 73-87 genes vary in copy number between any two individuals and find that these genic differences overwhelmingly correspond to segmental duplications (odds ratio = 135; P < 2.2 × 10⁻¹⁶). Our method can distinguish between different copies of highly identical genes, providing a more accurate assessment of gene content and insight into functional constraint without the limitations of array-based technology.