Multi-model integrated event extraction for aquatic animal disease prevention and control based on dynamic weight

Affiliations:

1. College of Information Engineering/Liaoning Provincial Key Laboratory of Marine Information Technology, Dalian Ocean University, Dalian 116023, China; 2. Key Laboratory of Environment Controlled Aquaculture (Dalian Ocean University), Ministry of Education, Dalian 116023, China

First author:

沙明洋 (SHA Mingyang), E-mail: 447412416@qq.com

Corresponding author:

张思佳 (ZHANG Sijia), E-mail: zhangsijia@dlou.edu.cn

CLC number:

TP391.41

Funding:

Open Project of the Key Laboratory of Environment Controlled Aquaculture, Ministry of Education (2021MOEKLECA-KF-05); Open Project of the State Key Laboratory of Computer Architecture (CARCH201921); General Program of Basic Scientific Research for Higher Education Institutions, Education Department of Liaoning Province (20220056); Liaoning Province Education Science "14th Five-Year Plan" Project (JG21DB076)

Abstract:

To improve the accuracy of event extraction for aquatic animal disease prevention and control, and to address problems that arise during extraction, such as ambiguous boundaries of domain-specific terms and excessively long event entities, this study introduces dynamic weighting into a multi-model ensemble event extraction method. The improved method uses two pre-trained models, ERNIE (enhanced representation through knowledge integration) and MacBERT (MLM as correction BERT), to learn the semantic information of the text. A gate module with dynamic weights fuses the features from the two models to enrich the semantic representation of the original text. The fused semantic information is passed to a bi-directional long short-term memory network (BiLSTM), and a conditional random field (CRF) constrains the output label sequence. To evaluate fusion performance, the ERNIE⊕MacBERT-CRF and ERNIE⊕MacBERT-BiLSTM-CRF models were used as control models, where ⊕ denotes fusion by simple addition and averaging. The proposed method reaches an F1-score of 74.15%, which is 20.02 percentage points higher than that of the classic BiLSTM-CRF model, indicating that it performs better on the extraction of aquatic animal disease prevention and control events.
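The abstract contrasts a gate module with dynamic weights against the ⊕ baseline (simple addition and averaging) but does not state the gate's formula. The sketch below is only an illustration under a common assumption: a sigmoid gate computed from the concatenated ERNIE and MacBERT vectors for one token. The function names `gate_fusion` and `average_fusion`, the parameters `weights` and `bias`, and the toy vectors are all hypothetical; in a trained model the gate parameters would be learned and the vectors would come from the two encoders.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def average_fusion(h_ernie, h_macbert):
    """Static baseline (the ⊕ operator): element-wise mean of both vectors."""
    return [(a + b) / 2.0 for a, b in zip(h_ernie, h_macbert)]


def gate_fusion(h_ernie, h_macbert, weights, bias):
    """Assumed dynamic-weight fusion for a single token.

    g     = sigmoid(W [h_ernie; h_macbert] + b)
    fused = g * h_ernie + (1 - g) * h_macbert

    `weights` has one row per output dimension; each row spans the
    concatenated input, so its length is 2 * len(h_ernie).
    """
    concat = list(h_ernie) + list(h_macbert)
    fused, gates = [], []
    for i, (a, b) in enumerate(zip(h_ernie, h_macbert)):
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)
        gates.append(g)
        fused.append(g * a + (1.0 - g) * b)
    return fused, gates


if __name__ == "__main__":
    # Toy 2-dimensional token representations (hypothetical values).
    h_ernie = [1.0, 2.0]
    h_macbert = [3.0, 6.0]
    # Hypothetical gate parameters; real ones would be learned in training.
    weights = [[0.5, 0.0, -0.5, 0.0], [0.0, 0.25, 0.0, -0.25]]
    bias = [0.0, 0.0]
    fused, gates = gate_fusion(h_ernie, h_macbert, weights, bias)
    print("gates:", gates)
    print("gated fusion:", fused)
    print("average fusion:", average_fusion(h_ernie, h_macbert))
```

With all gate parameters set to zero, every gate value is 0.5 and the gated result coincides with the ⊕ average, so under this assumed formulation the averaging baseline is a special case of the gate. The difference is that the gate can shift weight toward either encoder per token and per dimension.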

Table 1 Event argument label definition
Table 2 Event extraction output for grass carp haemorrhagic disease prevention and control
Table 3 Ablation experiment results
Table 4 Performance comparison of different fusion methods
Table 5 Comparison of extraction results for long event entities
Table 6 Comparison of extraction results for event entities with ambiguous boundaries
Fig.1 Model framework
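Cite this article

沙明洋, 张思佳, 傅庆财, 于红, 李枳錡, 喻文甫, 刘珈宁. Multi-model integrated event extraction for aquatic animal disease prevention and control based on dynamic weight[J]. Journal of Huazhong Agricultural University, 2023, 42(3): 80-87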

History
  • Received: 2022-09-30
  • Published online: 2023-06-20