景先生毕设 | www.jxszl.com

Application Analysis of Biological Data Sets Suitable for Phenotype Prediction [word count: 8960]

2024-11-03 10:16 Editor: www.jxszl.com 景先生毕设

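Contents
Abstract (Chinese)
Keywords
Abstract (English)
Introduction
Literature Review
1 Materials and Methods
1.1 Data and Tools
1.1.1 Data Sources
1.1.2 GEMMA Software
1.2 Methods and Procedure
1.2.1 Multivariate Linear Mixed Model (mvLMM)
1.2.2 Generating Simulated Data
1.2.3 Choosing Evaluation Metrics
2 Results and Analysis
2.1 Effect of Prediction Sample Size on Prediction Results
2.2 Effect of Between-Phenotype Correlation Coefficient on Prediction Results
2.3 Effect of Number of Phenotypes on Prediction Results
2.4 Effect of Heritability on Prediction Results
2.5 Effect of Allele Frequency on Prediction Results
3 Discussion
3.1 Summary
3.2 Problems and Insights
Acknowledgments
References
Academic Achievements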
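Application Analysis of Biological Data Sets Suitable for Phenotype Prediction
Abstract
As modern genome sequencing technology advances, more and more genome sequences are being completed. Genome-wide sequencing makes it possible to dissect quantitative traits systematically, and it is widely used because it is easy to apply and gives high prediction accuracy. This thesis focuses on quantitative traits and identifies the characteristics of biological data that suit genome-wide prediction. We carry out the prediction with GEMMA, a software package based on the linear mixed model, and select five characteristics: the sample size used for prediction, the correlation coefficient between phenotypes, the number of phenotypes, heritability, and allele frequency. At each level of each characteristic, we simulate the biological data in 100 replicate experiments to reduce random error, and then obtain predicted phenotype values with GEMMA. We compare the predicted phenotypes with the simulated true phenotypes using four evaluation metrics: MAE, MSE, R², and the correlation coefficient between the predicted and original phenotypes. The results show that prediction performs best when the sample size used for prediction is 2000, the correlation coefficient between phenotypes is -0.9 or 0.9, the number of phenotypes is 10, heritability is 0.9, and the allele frequency is 0.48-0.5. Conclusion: genome-wide prediction improves as the sample size used for prediction grows, as the absolute value of the between-phenotype correlation coefficient and the heritability approach 1, as the number of phenotypes increases, and as the allele frequency rises. These findings offer some theoretical basis for genome-wide prediction on biological data.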
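The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions, with R² taken as the coefficient of determination 1 - SS_res/SS_tot and the correlation taken as Pearson's r; the thesis body may define them differently. The function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(y_true, y_pred):
    """Compare predicted phenotypes with true phenotypes.

    Assumed definitions (not confirmed by the thesis):
    MAE  = mean |pred - true|
    MSE  = mean (pred - true)^2
    R^2  = 1 - SS_res / SS_tot
    corr = Pearson correlation between true and predicted values
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)   # spread of true values
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)  # spread of predictions
    if ss_tot == 0 or ss_pred == 0:
        raise ValueError("R^2 and correlation are undefined for constant input")

    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot

    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    corr = cov / math.sqrt(ss_tot * ss_pred)

    return {"mae": mae, "mse": mse, "r2": r2, "corr": corr}


if __name__ == "__main__":
    # Hypothetical true and predicted phenotype values for four individuals.
    true_values = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 2.5, 4.0]
    print(evaluate(true_values, predicted))
```

Lower MAE and MSE and higher R² and correlation indicate better prediction. In a setup with 100 replicates, each metric would presumably be averaged across the replicates at each level.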
APPLICATION ANALYSIS OF BIOLOGICAL DATA SETS SUITABLE FOR PHENOTYPE PREDICTION
ABSTRACT
With the development of modern genome sequencing technology, more and more genome sequences have been completed. Genome-wide sequencing allows the quantitative traits of organisms to be analyzed systematically; it is easy to apply, gives high prediction accuracy, and is widely used. In this paper, we determine the characteristics of biological data suited to genome-wide prediction of quantitative traits. We use the GEMMA software, which is based on the linear mixed model, to carry out the prediction, and select sample size, correlation coefficient between phenotypes, number of phenotypes, heritability, and allele frequency as the five characteristics. At each level, the biological data were simulated 100 times to reduce random error. Predicted phenotype values were then obtained with GEMMA. We selected MAE, MSE, R², and the correlation coefficient between the predicted and original phenotypes as evaluation indicators to compare the predicted phenotypes with the simulated true phenotypes. The results show that phenotype prediction is best when the sample size used for prediction is 2000, the correlation coefficient between phenotypes is -0.9 or 0.9, the number of phenotypes is 10, the heritability is 0.9, and the allele frequency is 0.48-0.5. Conclusion: the larger the sample size involved in the prediction, the closer the absolute value of the correlation coefficient between phenotypes and the heritability are to 1, the greater the number of phenotypes, and the higher the allele frequency, the better the genome-wide prediction. This provides some theoretical basis for genome-wide prediction on biological data.
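To make the simulation idea concrete, here is a rough single-trait sketch of generating a phenotype with a target heritability from genotypes drawn at a given allele frequency. It is not the thesis's procedure, which simulates several correlated phenotypes and is not described in the abstract. The additive model, the normally distributed SNP effects, the Hardy-Weinberg genotype sampling, and the name `simulate_phenotype` are all assumptions made for illustration.

```python
import random
import statistics


def simulate_phenotype(n_samples, n_snps, allele_freq, h2, seed=0):
    """Simulate one additive quantitative trait (illustrative assumptions only).

    Genotypes are allele counts (0, 1 or 2) with each allele drawn
    independently at frequency `allele_freq`. Noise is scaled so that
    var(genetic) / var(phenotype) is approximately `h2`.
    """
    if not 0.0 < allele_freq < 1.0:
        raise ValueError("allele_freq must lie strictly between 0 and 1")
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must lie in (0, 1]")

    rng = random.Random(seed)

    # Two independent allele draws per SNP give a count of 0, 1 or 2.
    genotypes = [
        [int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
         for _ in range(n_snps)]
        for _ in range(n_samples)
    ]

    # Assumed effect-size distribution: standard normal for every SNP.
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
    genetic = [sum(x * b for x, b in zip(row, effects)) for row in genotypes]

    # Choose the noise variance so that var_g / (var_g + var_e) equals h2.
    var_g = statistics.pvariance(genetic)
    sd_e = (var_g * (1.0 - h2) / h2) ** 0.5
    noise = [rng.gauss(0.0, sd_e) for _ in range(n_samples)]

    phenotype = [g + e for g, e in zip(genetic, noise)]
    return genotypes, genetic, phenotype


if __name__ == "__main__":
    geno, genetic, pheno = simulate_phenotype(
        n_samples=2000, n_snps=20, allele_freq=0.48, h2=0.9, seed=1
    )
    realized = statistics.pvariance(genetic) / statistics.pvariance(pheno)
    print(f"realized heritability: {realized:.3f}")
```

Repeating such a simulation with different seeds, for example 100 times per level, gives the kind of replicate set over which evaluation metrics can be averaged.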

Original link: http://www.jxszl.com/jsj/sxtj/606750.html