Research on BiLSTM-Based Extraction of Research Methods in Digital Humanities
Contents
Abstract (Chinese)
Keywords
Abstract (English)
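Introduction
I. Literature Review
(1) Overview of Digital Humanities
(2) Current State of Digital Humanities Research in China and Abroad
(3) Review of Named Entity Recognition
(4) Review of Neural Networks
II. Research Design
(1) Research Objectives
(2) Research Framework
(3) Research Steps
(4) Main Research Methods
III. Experimental Process
(1) Research Objects and Data Sources
(2) Experimental Steps
IV. Results and Analysis
(1) Model Test Results
(2) Analysis of Statistical Results
(3) Summary
V. Conclusions
(1) Research Summary
(2) Outlook
Acknowledgements
References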
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Table 1
Table 2
Table 3
Table 4
Research on BiLSTM-Based Extraction of Research Methods in Digital Humanities
Abstract
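[Objective] In Chinese, the term for the humanities is short for "human culture". As the worldwide wave of digitization has intensified, computer and Internet technologies have gradually permeated humanities research and greatly improved the efficiency of every stage of it; digital humanities emerged from this development. To obtain an overview of the research methods used in the digital humanities, this study extracts entities that refer to research methods from a large body of text and counts the most popular methods. [Method] Following a machine-learning approach, a BiLSTM+CRF neural network model is used to extract research-method entities automatically. [Process] The titles, abstracts and keywords of articles published in three well-known digital humanities journals were downloaded, organized and merged into a single text file. The text was first preprocessed with Python's NLTK module; the BiLSTM+CRF model then automatically extracted entities referring to research methods; the model's accuracy was measured by precision, recall and F-value (PRF); and finally the most frequent research methods were counted. [Results] The model achieved a recall of 92.74%, a precision of 93.53% and an F-value of 0.9313, which suggests that the machine-learning extraction algorithm used in this experiment is broadly reliable. The most popular research methods include "digital", "geographic information system" and "artificial intelligence". [Significance] Errors are unavoidable in quantitative research, but the resulting data can provide an important basis for future qualitative research.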
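The abstract reports precision, recall and F-value but does not say how predicted entities are matched against the gold standard. The sketch below assumes exact entity-level matching over BIO-tagged token sequences, a common convention for named entity recognition that the source does not confirm; the tag label `METHOD` and the example sentence are hypothetical. It also checks that the reported F-value agrees with the reported precision and recall through F = 2PR/(P + R).

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        ends_entity = tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if ends_entity and start is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            # B- opens an entity; a stray I- is treated as the start of one
            start, label = i, tag[2:]
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1; span and type must match exactly."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical sentence: "geographic information system and artificial intelligence"
gold = [["B-METHOD", "I-METHOD", "I-METHOD", "O", "B-METHOD", "I-METHOD"]]
pred = [["B-METHOD", "I-METHOD", "I-METHOD", "O", "B-METHOD", "O"]]
print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5): the second entity is truncated

# Consistency check of the precision and recall reported in the abstract
reported_p, reported_r = 0.9353, 0.9274
print(round(2 * reported_p * reported_r / (reported_p + reported_r), 4))  # 0.9313
```

Under exact matching, a prediction that captures only part of a multi-word method name counts as both a false positive and a false negative, which is why the truncated second entity lowers precision and recall together.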
BiLSTM-BASED EXTRACTION OF RESEARCH METHODS IN DIGITAL HUMANITIES
ABSTRACT
In Chinese, the term for the humanities is short for "human culture". As the worldwide wave of digitization grows stronger, computer and Internet technologies have gradually penetrated humanities research and greatly improved the efficiency of every stage of it; digital humanities emerged from this development. To gain a general understanding of the research methods used in the digital humanities, this study extracts entities that refer to research methods from large-scale text and then identifies the most popular methods statistically. Following a machine-learning approach, a BiLSTM+CRF neural network model was used to extract research methods automatically. The titles, abstracts and keywords of articles published in three well-known digital humanities journals were downloaded, organized and combined into a single text file. The text was preprocessed with the Python module NLTK, the BiLSTM+CRF network then automatically extracted entities referring to research methods, and the model's accuracy was measured by precision, recall and F-value (PRF). Finally, the most frequently used research methods were counted. The model achieved a recall of 92.74%, a precision of 93.53% and an F-value of 0.9313, which indicates that the machine-learning extraction algorithm adopted in this experiment is broadly reliable. Popular research methods include "digital", "geographic information system" and "artificial intelligence". Errors are unavoidable in quantitative research, but the resulting data can provide an important basis for future qualitative research.
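In a BiLSTM+CRF tagger, the BiLSTM produces a score for every tag at every token, and the CRF layer adds tag-to-tag transition scores; at prediction time the best tag sequence is usually found with the Viterbi algorithm. The pure-Python sketch below shows only that decoding step and rests on assumptions not taken from the source: a hypothetical three-tag scheme (O, B, I), hand-picked scores instead of trained ones, and no start or end transition scores. Training the BiLSTM and the CRF is not shown.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Find the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][k]: score of tag k at token t (e.g. a BiLSTM output)
    transitions[i][j]: score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    # best[k]: score of the best path that ends in tag k at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Best previous tag for current tag j
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag
    last = max(range(n_tags), key=lambda k: best[k])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


TAGS = ["O", "B", "I"]  # hypothetical tag set
# Hand-picked scores; O -> I is invalid in BIO, so it gets a large penalty
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 0.0],    # from B
    [0.0, 0.0, 0.0],    # from I
]
emissions = [
    [2.0, 0.5, 0.0],
    [0.0, 0.5, 1.5],
    [1.0, 0.0, 0.5],
]
path, score = viterbi_decode(emissions, transitions)
print([TAGS[k] for k in path], score)  # ['O', 'B', 'O'] 3.5
```

Picking the highest-scoring tag at each token independently would give O, I, O, which is not a valid BIO sequence; the negative O-to-I transition score steers the decoder to O, B, O instead. This is the practical reason for placing a CRF layer on top of the BiLSTM rather than classifying each token separately.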