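National Natural Science Foundation of China (31572375); Fundamental Research Funds for the Central Universities (2662016PY006); Fundamental Research Funds for the Central Universities (2262018JC033); Huazhong Agricultural University Dabeinong Young Scholars Promotion Program (2017DBN019)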
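To help pig producers better predict sow litter size traits, cull sows with poor fertility as early as possible, and raise the reproductive potential of the sow herd, production records covering total number of piglets born, number born alive, number of healthy piglets, number of piglets at 5 days of age, and number of piglets weighing more than 1 kg were processed and summarized with descriptive statistics. The Boruta package in R was used to select important features affecting litter size traits, such as breed, parity, and mating season. The litter size traits were then modeled with a traditional regression method (LR) and with several machine learning methods: decision tree (DT), K-nearest neighbor (KNN), and support vector machine (SVM). Finally, the machine learning methods were compared with the traditional regression method. The results showed that the R² of every regression method exceeded 0.71 (0.71-0.88) for total number of piglets born, number born alive, number of healthy piglets, number of piglets at 5 days of age, and number of piglets weighing more than 1 kg, which supports the feature selection. For all five traits, the SVM model was significantly better than the other machine learning models (P<0.05) and also better than the traditional regression method, and among these models the SVM model for the number of piglets weighing more than 1 kg performed best. Machine learning methods may therefore become a new approach for pig producers to select highly fertile sows early.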
Litter size is currently an important indicator of sow fertility and plays an important role in determining the total income of pig farms in China. Accurate prediction of litter size traits early in an animal's life would allow pig producers to adjust their management practices, cull poor sows early, and improve the reproductive ability of core sows. However, many factors influence a sow's litter size traits, and these factors also influence each other. Traditional prediction methods may not be powerful enough to capture such complex interactions while avoiding overfitting. In this case, machine learning algorithms, which learn from current data to predict an animal's future performance, offer promise. In this study, sow production data, including total number of piglets born (TNB), number born alive (NBA), number of healthy piglets (NHP), number of piglets at 5 days of age (N5D), and number of piglets weighing above 1 kg (NPWA1), were first processed and summarized with descriptive statistics. Then, the R package Boruta was used to select important features affecting the litter size traits of sows, such as breed, parity, mating season, farrowing season, gestation length, farrowing interval, and litter weight at birth. Finally, regression analysis was performed with a traditional linear regression (LR) method and three machine learning methods: decision tree (DT), K-nearest neighbor (KNN), and support vector machine (SVM). The model evaluation indexes, R² and mean squared error (MSE), were obtained by ten-fold cross-validation. The modeling methods were assessed with these indexes, and the best model was further checked with scatter plots on a portion of the original data. The results showed that the R² of all regression methods for TNB, NBA, NHP, N5D, and NPWA1 exceeded 0.71 (0.71-0.88), indicating that the selected features were appropriate.
In predicting TNB, NBA, NHP, N5D, and NPWA1, the SVM model was significantly better than the other machine learning methods (P<0.05) and also better than the traditional regression method. Among all models, the SVM model for NPWA1 performed best. Therefore, machine learning methods may become a new approach for pig producers to select high-fecundity sows early.
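
The abstract does not describe how the ten-fold cross-validation was implemented. As a rough illustration only, the sketch below shows how R² and MSE can be averaged over ten folds for a K-nearest-neighbor regressor. It is written in plain Python rather than the R workflow used in the study. The synthetic data, the two made-up features, the choice of k = 5, and all function names are assumptions for illustration and do not reproduce the authors' data or models.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the row indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(X_train, y_train, x, k=5):
    """Average the targets of the k training rows closest to x (Euclidean distance)."""
    neighbors = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(X_train, y_train)
    )
    return mean(target for _, target in neighbors[:k])


def cross_validate(X, y, predict, folds=10, seed=0):
    """Return the fold-averaged R2 and MSE of `predict` under k-fold cross-validation."""
    r2_scores, mse_scores = [], []
    for test_idx in k_fold_indices(len(X), folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        y_test = [y[i] for i in test_idx]
        y_hat = [predict(X_train, y_train, X[i]) for i in test_idx]
        r2_scores.append(r2(y_test, y_hat))
        mse_scores.append(mse(y_test, y_hat))
    return mean(r2_scores), mean(mse_scores)


# Synthetic stand-in data (assumption): two numeric features and a noisy linear target.
rng = random.Random(42)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(200)]
y = [2 * a + b + rng.gauss(0, 0.5) for a, b in X]

avg_r2, avg_mse = cross_validate(X, y, knn_predict, folds=10)
print(f"KNN, 10-fold CV: mean R2 = {avg_r2:.3f}, mean MSE = {avg_mse:.3f}")
```

Any other regressor with the same `predict(X_train, y_train, x)` signature could be passed to `cross_validate`, so several methods can be scored on identical folds before their R² and MSE are compared.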