A Study of Semantic Similarity Computation Methods (Graduation Thesis)

2021-04-14 05:04

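Abstract

With the widespread use of computers and the spread of the Internet, information of all kinds is expanding rapidly. This growth brings convenience, but it also makes information harder to search. More and more people hope that a principled classification of word meanings can support business decisions and operations and yield economic or social benefits. In the real world, words are the most important carriers of information, so processing and analyzing word-sense similarity has become one of the active topics in data mining and information retrieval. Words in natural language are related in many ways, and practical applications sometimes need to express how similar two words are as a single number; this number is the semantic similarity. Research on semantic similarity is a natural outcome of the continued development of semantic theory and of ever deeper research into the computer processing of natural language. To study semantic similarity in depth, this thesis compares three similarity methods. The results offer useful guidance for text clustering, intelligent Web retrieval, question answering systems, web page deduplication, natural language processing, and many other fields.

The thesis mainly studies three methods of computing semantic similarity, with a focus on semantic similarity computation based on HowNet.

The results show that the HowNet-based semantic similarity computation presented in this thesis is reasonably fast and accurate.

Distinguishing feature of this thesis: it draws on a relatively large number of references.

Keywords: ontology model; HowNet-based; semantic similarity; cosine space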

Abstract

With the widespread use of computers and the growing popularity of the Internet, information of all kinds is expanding rapidly. The increase in the amount of information has brought convenience to people, but it has also made information harder to find. People increasingly want to classify word meanings in a principled way in order to support business decisions and management and to bring economic or social benefits. In the real world, words are the most important information carriers, so the processing and analysis of semantic similarity has become one of the hot topics in today's data mining and information retrieval technologies. There are many kinds of relationships between words in natural language, and practical applications sometimes need to measure how similar two words are with a single quantity. This quantity is the semantic similarity. The study of semantic similarity is inseparable from the continuous development of semantic theory and from the continuous deepening of research into the computer processing of natural language. In order to study semantic similarity in depth, this thesis compares three similarity methods. The results provide useful guidance for text clustering, intelligent Web retrieval, question answering systems, web page deduplication, natural language processing, and many other fields.

The thesis studies three methods of computing semantic similarity and focuses on the computation of semantic similarity based on HowNet.

The research results show that the HowNet-based semantic similarity computation presented in the thesis is reasonably fast and accurate.

A feature of this thesis is its relatively large number of references.

Key Words: Semantic similarity; HowNet; Word

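Chapter 1 Introduction

1.1 Research Significance

1.2 Research Status at Home and Abroad

1.3 Contents of This Thesis

Chapter 2 Theory of Similarity Computation

2.1 Basic Concepts of Word Similarity

2.2 VSM-Based Computation Methods

2.3 Ontology-Based Computation Methods

Chapter 3 HowNet-Based Similarity Computation

3.1 Structure of HowNet

3.2 Computation Method

Chapter 4 System Implementation

4.1 Conceptual Design

4.2 Implementation of the Computation

4.3 System Testing

Chapter 5 Summary and Outlook

References

Acknowledgements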

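Chapter 1 Introduction

1.1 Research Significance

Information is undoubtedly growing explosively. How to filter out junk and irrelevant material from this vast mass of information and easily find what one actually needs has become an open problem in the field of cloud information processing. In intelligent information retrieval, a search engine returns all content related to the keywords or phrases a user enters, but the results often contain many duplicates, and relying on keyword matching alone inevitably degrades result quality. Applying text similarity techniques to analyze the submitted query, infer the user's real need, and match the retrieved results by similarity greatly improves the user experience. These techniques can be applied to intelligent information retrieval, automatic question answering, duplicate text detection, text classification, text clustering, and other fields^[1]. In automatic question answering systems, as exchanges between enterprise customer service and users increase, answering a growing number of users manually consumes considerable human resources. Using text similarity computation in such systems to group and handle commonly asked similar questions is therefore very useful.

In the checking of scientific papers, some papers and academic works must be original and confidential, so similar content must not appear in them. Similarity computation is therefore needed to check the text and ensure that its similarity to existing work does not exceed a permitted limit. Beyond this, research on similarity computation has important application and research value in many other fields.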
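The limit-based check described above can be illustrated with a minimal sketch. It is not the method developed in this thesis: it assumes a plain vector space model with cosine similarity over word counts, whitespace tokenization, and an arbitrary limit of 0.8. The function names `cosine_similarity` and `exceeds_limit` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between whitespace-tokenized word-count vectors."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def exceeds_limit(submission: str, reference: str, limit: float = 0.8) -> bool:
    """Flag a submission whose similarity to a reference exceeds the limit.

    The default limit of 0.8 is an assumed value, not one given in the thesis.
    """
    return cosine_similarity(submission, reference) > limit
```

A word-count comparison of this kind only captures surface overlap. Two texts that express the same idea with different words would score low, which is the gap that semantic similarity measures are meant to address.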
