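Bird song classification based on the Inception-CSA deep learning model
Authors: LI Huaicheng (李怀城), YANG Daowu (杨道武), WEN Zhifang (温治芳), WANG Yanan (王亚楠), CHEN Aibin (陈爱斌)
Affiliation:

College of Computer and Information Engineering/Institute of Applied Artificial Intelligence, Central South University of Forestry and Technology, Changsha 410004, China

About the author:

LI Huaicheng, E-mail: Refrain_lhc@163.com

Corresponding author:

CHEN Aibin, E-mail: hotaibin@163.com

CLC number:

TP183

Funding:

National Natural Science Foundation of China (62276276); Hunan Key Laboratory of Intelligent Logistics Technology project (2019TP1015); Hunan Provincial Postgraduate Research and Innovation Project (CX20210879)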


Inception-CSA deep learning model-based classification of bird sounds
Affiliation:

College of Computer and Information Engineering/Institute of Applied Artificial Intelligence, Central South University of Forestry and Technology, Changsha 410004, China

    Abstract:

    Bird sounds have diverse features, and most current convolutional neural network models based on a single receptive field struggle to learn this diversity from audio that contains complex background noise. To further improve the accuracy of identifying birds by sound, this study proposes a bird sound classification method based on the Inception-CSA deep learning model, consisting of three steps: preprocessing of bird audio samples, feature extraction, and classification. First, the bird sound samples are preprocessed into Mel spectrograms of identical size, which serve as the feature maps of the bird sounds. Next, the Inception-CSA model extracts features from these feature maps: the Inception module extracts multi-scale local time-frequency features, the CSA module computes global attention weights, and the outputs of the two are combined to obtain a stronger feature map, which is then downsampled by a max pooling layer. Finally, a fully connected layer performs the classification and yields the final result. A dataset built from the calls of 10 wild bird species recorded in natural environments in South China was used in the experiments to verify the effectiveness of the method. The proposed method reached an accuracy of 93.11% on this self-built dataset; compared with classification methods based on other classical models, the Inception-CSA-based method achieved higher accuracy with fewer model parameters.
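    The preprocessing step in the abstract, which turns recordings of different lengths into Mel spectrograms of one fixed size, can be illustrated with a minimal NumPy sketch. The abstract does not state the authors' settings, so everything specific here is an assumption chosen for illustration: the function names, the 22050 Hz sample rate, the 2-second clip length, the pad-or-truncate strategy, and the FFT, hop, and Mel-band sizes.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def fixed_size_log_mel(wave, sr=22050, clip_seconds=2.0, n_fft=1024, hop=512, n_mels=64):
    """Pad or truncate a waveform to a fixed length, then return a log-Mel
    spectrogram of shape (n_mels, n_frames). All defaults are assumed values."""
    target = int(sr * clip_seconds)
    wave = np.asarray(wave, dtype=np.float64)[:target]   # truncate long clips
    wave = np.pad(wave, (0, target - wave.size))         # zero-pad short clips
    n_frames = 1 + (target - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(n_fft)               # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T    # (n_mels, n_frames)
    return 10.0 * np.log10(mel + 1e-10)                  # decibel scale

# Two synthetic clips of different lengths produce feature maps of the same size.
short_clip = np.sin(2 * np.pi * 3000 * np.arange(11025) / 22050)
long_clip = np.random.default_rng(0).normal(size=60000)
spec_short = fixed_size_log_mel(short_clip)
spec_long = fixed_size_log_mel(long_clip)
```

    Fixing the clip length before framing is one simple way to guarantee equal-sized inputs for a convolutional model; resizing the spectrogram image afterwards would be another option.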

    Table 1 Confusion matrix analysis results
    Table 2 Comparison with experimental results of other classification network models
    Fig.1 Overall structure of the bird song classification method based on the Inception-CSA model
    Fig.2 Waveforms of calls of the silver-throated long-tailed tit in different situations
    Fig.3 Structure diagram of the Inception-CSA Block
    Fig.4 Structure diagram of the CSA module
    Fig.5 Changes in loss (A) and accuracy (B) during training
    Fig.6 Confusion matrix for the final validation set
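    For readers interpreting the confusion matrix items listed above, the following sketch shows how overall accuracy and per-class precision, recall, and F1 are commonly derived from a confusion matrix. It is an illustration only: the function name, the row/column convention (rows are true classes, columns are predictions), and the 3-class counts are assumptions, and the numbers are made up rather than taken from the paper.

```python
import numpy as np

def metrics_from_confusion(cm):
    """cm[i, j] = number of samples of true class i predicted as class j
    (assumed convention). Returns accuracy and per-class precision, recall, F1."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)                 # correctly classified samples per class
    predicted = cm.sum(axis=0)       # column sums: samples predicted as each class
    actual = cm.sum(axis=1)          # row sums: samples that truly belong to each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Toy 3-class matrix with made-up counts (not the paper's data).
toy = [[8, 1, 1],
       [2, 7, 1],
       [0, 2, 8]]
accuracy, precision, recall, f1 = metrics_from_confusion(toy)
```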
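Cite this article

LI Huaicheng, YANG Daowu, WEN Zhifang, WANG Yanan, CHEN Aibin. Bird song classification based on the Inception-CSA deep learning model[J]. Journal of Huazhong Agricultural University, 2023, 42(3): 97-104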

History
  • Received: 2022-09-19
  • Published online: 2023-06-20