BLEU
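- BLEU measures precision
- Bilingual Evaluation Understudy
- Originally designed for machine translation (bilingual)
- W(machine-generated summary) in W(human reference summary)
- That is, how many of the words (and/or n-grams) in the machine-generated summary also appear in the human reference summary
- The closer a machine translation is to a professional human translation, the better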
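The precision idea in the bullets above can be sketched in a few lines. This is only the n-gram precision component, assuming whitespace tokenization and clipped counts; full BLEU also combines several n-gram orders and applies a brevity penalty. The helper names and the example sentences are made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also occur in the reference.

    Counts are clipped: a candidate n-gram is credited at most as many
    times as it occurs in the reference.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

# Illustrative sentences (not from the data used later in this post)
reference = "the cat is on the mat"
candidate = "the the the cat"
print(ngram_precision(candidate, reference, n=1))  # 0.75
```

Without clipping, the repeated "the" would push unigram precision to 1.0 even though the candidate is a poor match.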
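ROUGE
- ROUGE measures recall
- Recall-Oriented Understudy for Gisting Evaluation
- W(human reference summary) in W(machine-generated summary)
- That is, how many of the words (and/or n-grams) in the human reference summary also appear in the machine-generated summary
- It is the n-gram overlap between the system summary and the reference summary
- ROUGE-N, where N is the n-gram size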
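To make the precision/recall contrast concrete, here is a minimal sketch that computes both from the same unigram overlap. It assumes whitespace tokenization and clipped counts; the function name and sentences are hypothetical.

```python
from collections import Counter

def unigram_precision_recall(candidate, reference):
    """Return (precision, recall) from clipped unigram overlap counts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection keeps the minimum count per word (clipped matches)
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)  # divide by candidate length
    recall = overlap / max(sum(ref.values()), 1)      # divide by reference length
    return precision, recall

# Illustrative sentences: a short candidate against a longer reference
reference = "the cat sat on the mat near the door"
candidate = "the cat sat"
precision, recall = unigram_precision_recall(candidate, reference)
print(precision, recall)
```

Every candidate word is in the reference, so precision is 1.0, but the candidate covers only 3 of the 9 reference words, so recall is about 0.33.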
reference_text = """Artificial intelligence (AI, also machine intelligence, MI) is intelligence demonstrated by machines, in contrast to the natural intelligence (NI) displayed by humans and other animals. In computer science AI research is defined as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving". See glossary of artificial intelligence. The scope of AI is disputed: as machines become increasingly capable, tasks considered as requiring "intelligence" are often removed from the definition, a phenomenon known as the AI effect, leading to the quip "AI is whatever hasn't been done yet." For instance, optical character recognition is frequently excluded from "artificial intelligence", having become a routine technology. Capabilities generally classified as AI as of 2017 include successfully understanding human speech, competing at a high level in strategic game systems (such as chess and Go), autonomous cars, intelligent routing in content delivery networks, military simulations, and interpreting complex data, including images and videos. Artificial intelligence was founded as an academic discipline in 1956, and in the years since has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success and renewed funding. For most of its history, AI research has been divided into subfields that often fail to communicate with each other. These sub-fields are based on technical considerations, such as particular goals (e.g. "robotics" or "machine learning"), the use of particular tools ("logic" or "neural networks"), or deep philosophical differences. 
Subfields have also been based on social factors (particular institutions or the work of particular researchers). The traditional problems (or goals) of AI research include reasoning, knowledge, planning, learning, natural language processing, perception and the ability to move and manipulate objects. General intelligence is among the field's long-term goals. Approaches include statistical methods, computational intelligence, and traditional symbolic AI. Many tools are used in AI, including versions of search and mathematical optimization, neural networks and methods based on statistics, probability and economics. The AI field draws upon computer science, mathematics, psychology, linguistics, philosophy and many others. The field was founded on the claim that human intelligence "can be so precisely described that a machine can be made to simulate it". This raises philosophical arguments about the nature of the mind and the ethics of creating artificial beings endowed with human-like intelligence, issues which have been explored by myth, fiction and philosophy since antiquity. Some people also consider AI to be a danger to humanity if it progresses unabatedly. Others believe that AI, unlike previous technological revolutions, will create a risk of mass unemployment. In the twenty-first century, AI techniques have experienced a resurgence following concurrent advances in computer power, large amounts of data, and theoretical understanding; and AI techniques have become an essential part of the technology industry, helping to solve many challenging problems in computer science."""
Abstractive summarization
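# Abstractive summarization
from transformers import pipeline

# Number of whitespace-separated words in the source text
len(reference_text.split())
# Build the default summarization pipeline and keep the generated summary string
summarization = pipeline("summarization")
abstractve_summarization = summarization(reference_text)[0]["summary_text"]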
Abstractive output
In computer science AI research is defined as the study of "intelligent agents" Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving" Capabilities generally classified as AI as of 2017 include successfully understanding human speech, competing at a high level in strategic game systems (such as chess and Go)
Extractive summarization
# Extractive summarization
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

# Parse the text into sentences with sumy's English tokenizer
parser = PlaintextParser.from_string(reference_text, Tokenizer("english"))
# parser.document.sentences
summarizer = LexRankSummarizer()
# Ask LexRank for the 2 top-ranked sentences, then join them into one string
extractve_summarization = summarizer(parser.document, 2)
extractve_summarization = ' '.join([str(s) for s in list(extractve_summarization)])
Extractive output
Colloquially, the term "artificial intelligence" is often used to describe machines that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect. Sub-fields have also been based on social factors (particular institutions or the work of particular researchers).The traditional problems (or goals) of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception and the ability to move and manipulate objects.
Evaluating the abstractive summary with ROUGE
from rouge import Rouge
r = Rouge()
r.get_scores(abstractve_summarization, reference_text)
ROUGE output for the abstractive summary
[{'rouge-1': {'f': 0.22299651364421083,
'p': 0.9696969696969697,
'r': 0.12598425196850394},
'rouge-2': {'f': 0.21328671127225052,
'p': 0.9384615384615385,
'r': 0.1203155818540434},
'rouge-l': {'f': 0.29041095634452996,
'p': 0.9636363636363636,
'r': 0.17096774193548386}}]
Evaluating the extractive summary with ROUGE
from rouge import Rouge
r = Rouge()
r.get_scores(extractve_summarization, reference_text)
ROUGE output for the extractive summary
[{'rouge-1': {'f': 0.27860696251962963,
'p': 0.8842105263157894,
'r': 0.16535433070866143},
'rouge-2': {'f': 0.22296172781038814,
'p': 0.7127659574468085,
'r': 0.13214990138067062},
'rouge-l': {'f': 0.354755780824869,
'p': 0.8734177215189873,
'r': 0.22258064516129034}}]
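As a sanity check on the two outputs, the `f` values appear to be the harmonic mean of `p` and `r`. This is inferred from the numbers themselves, not from the library's documentation, and the helper name below is hypothetical. The sketch recomputes the ROUGE-1 F-score from the printed precision and recall.

```python
import math

def f_measure(p, r):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

# ROUGE-1 values copied from the two outputs above
abstractive = {"p": 0.9696969696969697, "r": 0.12598425196850394, "f": 0.22299651364421083}
extractive = {"p": 0.8842105263157894, "r": 0.16535433070866143, "f": 0.27860696251962963}

for name, scores in [("abstractive", abstractive), ("extractive", extractive)]:
    f = f_measure(scores["p"], scores["r"])
    # Compare against the reported value, allowing for small rounding differences
    print(name, round(f, 6), math.isclose(f, scores["f"], abs_tol=1e-6))
```

Both summaries have high precision and low recall. That is what one would expect here, because the "reference" passed to `get_scores` is the full source text, which is much longer than either summary.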
Interpreting ROUGE scores
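ROUGE is the fraction of overlapping words. ROUGE-N refers to overlapping n-grams. Specifically:
ROUGE-N = ( ∑r ∑s match(gram_s, c) ) / ( ∑r ∑s count(gram_s) )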
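I have tried to simplify the notation compared with the original paper. Suppose we are computing ROUGE-2, that is, bigram matches. The numerator ∑s loops over all bigrams in a single reference summary and counts how many times a matching bigram is found in the candidate summary (the one proposed by the summarization algorithm). If there is more than one reference summary, ∑r ensures we repeat the process over all of them.
The denominator simply counts the total number of bigrams in all reference summaries. This is the process for one document-summary pair. You repeat it for every document and average the scores, which gives you the ROUGE-N score. A higher score therefore means that, on average, the n-gram overlap between your summaries and the references is high.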
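The procedure described above can be sketched as follows. This is a minimal illustration, not the `rouge` package's implementation: it assumes whitespace tokenization and clipped match counts, and the function names and example sentences are hypothetical.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count the n-grams in a whitespace-tokenized, lowercased text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, references, n=2):
    """ROUGE-N recall of one candidate against one or more reference summaries."""
    cand = ngram_counts(candidate, n)
    matched = 0  # numerator: matches summed over all references
    total = 0    # denominator: n-grams summed over all references
    for reference in references:
        ref = ngram_counts(reference, n)
        matched += sum(min(count, cand[gram]) for gram, count in ref.items())
        total += sum(ref.values())
    return matched / total if total else 0.0

def mean_rouge_n(pairs, n=2):
    """Average ROUGE-N over (candidate, references) pairs, one pair per document."""
    scores = [rouge_n(candidate, references, n) for candidate, references in pairs]
    return sum(scores) / len(scores) if scores else 0.0

# Illustrative data: two documents, the first with two reference summaries
pairs = [
    ("the cat sat on the mat", ["the cat is on the mat", "a cat sat on the mat"]),
    ("police kill the gunman", ["police killed the gunman"]),
]
print(rouge_n(*pairs[0]))   # 7 matched bigrams out of 10 reference bigrams
print(mean_rouge_n(pairs))  # mean of 7/10 and 1/3
```

In the first pair, the candidate shares 3 of 5 bigrams with the first reference and 4 of 5 with the second, so ROUGE-2 is 7/10.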
Example:
S1. police killed the gunman
S2. police kill the gunman
S3. the gunman kill police
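S1 is the reference; S2 and S3 are candidates. Note that S2 and S3 each share exactly one bigram with the reference, so they get the same ROUGE-2 score even though S2 should be better. An additional score, ROUGE-L, handles this; the L stands for longest common subsequence. In S2, the first word and the last two words match the reference, so it scores 3/4, whereas S3 matches only the bigram, so it scores 2/4.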
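The numbers in this example can be checked with a short sketch. It assumes whitespace tokenization and divides the LCS length by the reference length, as the text does; the helper names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # table[i][j] = LCS length of a[:i] and b[:j]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def bigrams(tokens):
    """Set of adjacent token pairs."""
    return set(zip(tokens, tokens[1:]))

s1 = "police killed the gunman".split()  # reference
s2 = "police kill the gunman".split()    # candidate
s3 = "the gunman kill police".split()    # candidate

for name, cand in [("S2", s2), ("S3", s3)]:
    shared = len(bigrams(s1) & bigrams(cand))
    lcs = lcs_length(s1, cand)
    print(name, "shared bigrams:", shared, "LCS / reference length:", f"{lcs}/{len(s1)}")
```

Both candidates share one bigram ("the gunman") with S1, but the LCS is 3 for S2 and 2 for S3, which is why ROUGE-L ranks S2 higher.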