Fast and nondestructive discrimination of quality level of Yongchuan Xiuya tea based on near infrared spectroscopy combined with artificial neural network algorithm

Near infrared spectroscopy (NIRS) was used to discriminate the quality level of Yongchuan Xiuya tea quickly and nondestructively. Samples of three quality levels of Yongchuan Xiuya tea were collected; their NIR spectra were scanned, spectral noise was removed by preprocessing, characteristic spectral intervals were screened by backward interval partial least squares, and principal component analysis was performed. Finally, the jump connection nets artificial neural network (J-BP-ANN) with three kinds of transfer functions was applied to establish models. The best preprocessing method was the combination of multiplicative scatter correction and the first derivative. Six characteristic spectral intervals were screened, accounting for 27.

Generally, the national standard sensory evaluation method (Gong et al., 2018) is used to evaluate the quality grade of tea (Zhu et al., 2017). Tea experts skilled in sensory evaluation score five evaluation factors of tea (shape, color, aroma, taste and leaf bottom) and then calculate the total score of the five factors to comprehensively evaluate the quality grade.
Sensory evaluation is a classic method, but it requires reviewers with a strong professional background, and the reviewers are vulnerable to the influence of many factors such as personal preferences, physical condition and environment. This makes the evaluation results subjective, and the method cannot meet the demand for rapid evaluation of tea quality grade. The wet chemical detection method uses a variety of chemical detection instruments, such as high performance liquid chromatography (HPLC) (Guo et al., 2020), gas chromatography (GC) (Fang et al., 2019), high performance liquid chromatography/mass spectrometry (HPLC-MS) (Xu et al., 2019) and gas chromatography/mass spectrometry (GC-MS) (Xu et al., 2018), to accurately determine the contents of components in tea of different quality levels (Zhou et al., 2014a, b), so as to evaluate the quality level. Although this method is objective, fair and accurate, it requires complex sample pretreatment before determination. The determination is time consuming and laborious, and a large number of chemical reagents are required, which puts great pressure on environmental protection. In addition, the detection cost is high, and real-time, nondestructive detection of tea quality level cannot be realized. Therefore, this method also has major defects and cannot be directly applied to the rapid discrimination of quality grade in the tea sales market. There is thus an urgent need to develop a convenient and objective new method to judge the quality level of Yongchuan Xiuya tea quickly and nondestructively, and to meet the detection needs of the tea market.
Near infrared (NIR) light is an electromagnetic wave with a wavelength in the range of 780-2526 nm (12,820-3,958 cm⁻¹), and near infrared spectroscopy (NIRS) mainly reflects the X-H chemical bond information in the sample (Lv, 2006). By establishing a prediction model, fast, nondestructive and digital detection of a component can be achieved. NIRS has been widely used in agriculture, petrochemical, textile, medicine and other industries (Guillemain et al., 2017; Malegori et al., 2017; Forina et al., 2015). At present, many scholars have used NIRS and a variety of chemometrics methods to predict the content of tea polyphenols, caffeine (Sahachairungrueng et al., 2022) and other components, to rapidly assess the quality of fresh tea leaves, and to discriminate tea varieties. In addition, NIRS combined with a variety of chemometrics methods was used to achieve rapid detection of the quality grade of black tea (Ren et al., 2020a, b) and of Huangshan Maofeng tea and Emei tea (He et al., 2022) with an accuracy rate of 97%. However, there has been no report on the rapid discrimination of Yongchuan Xiuya tea with different quality levels.
Current research on Yongchuan Xiuya tea mainly focuses on processing technology, aroma composition analysis, quality characteristics analysis and amino acid composition analysis (Yuan et al., 2011), and some research has also been carried out on its function (Fu et al., 2014), but there is no relevant research report on the application of near infrared technology to quickly identify the quality grade of Yongchuan Xiuya tea. Therefore, in this study, backward interval partial least squares (biPLS), principal component analysis (PCA) and jump connection nets artificial neural network (J-BP-ANN) were used to establish NIRS models for three quality levels of Yongchuan Xiuya tea, and the robustness of the model was verified with the prediction set samples, so as to provide a theoretical basis and scientific and technological support for rapid, nondestructive and objective discrimination of Yongchuan Xiuya tea with different quality levels.

Yongchuan Xiuya tea samples and its classification
A total of 90 Yongchuan Xiuya tea samples (Figure 1) processed between 27 March 2020 and 1 May 2020 were obtained from Chongqing Yunling Tea Technology Co., Ltd. There were 30 samples each of first-level Yongchuan Xiuya tea (chemical value set as 1.000), second-level Yongchuan Xiuya tea (chemical value set as 2.000) and third-level Yongchuan Xiuya tea (chemical value set as 3.000). The samples were randomly divided into two sets at a ratio of 2:1, giving 60 samples in the calibration set and 30 samples in the prediction set; the prediction set was used to test the robustness of the calibration set model.
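The 2:1 random division described above can be sketched as follows. This is only an illustration: the text does not say whether the split was stratified by quality level, so drawing 10 prediction samples from each level is an assumption, and the function name `split_calibration_prediction` is hypothetical.

```python
import numpy as np

def split_calibration_prediction(labels, n_prediction_per_class, seed=0):
    """Randomly split sample indices into calibration and prediction sets.

    Assumption: the split is stratified, i.e. the same number of prediction
    samples is drawn from every quality level. The paper only states that the
    samples were "randomly divided" at a ratio of 2:1.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    prediction = []
    for level in np.unique(labels):
        members = np.flatnonzero(labels == level)
        chosen = rng.choice(members, size=n_prediction_per_class, replace=False)
        prediction.extend(chosen.tolist())
    prediction = np.sort(np.array(prediction))
    calibration = np.setdiff1d(np.arange(labels.size), prediction)
    return calibration, prediction

# 30 samples per level, with chemical values 1.000, 2.000 and 3.000.
labels = np.repeat([1.0, 2.0, 3.0], 30)
cal_idx, pred_idx = split_calibration_prediction(labels, n_prediction_per_class=10)
print(len(cal_idx), len(pred_idx))  # 60 30
```

A fixed seed keeps the split reproducible, which makes later comparisons between preprocessing methods easier to interpret.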

Spectra collection
NIR spectra were obtained in reflectance mode using a Thermo Antaris II Fourier transform (FT) NIR spectrometer (Thermo Fisher Scientific, U.S.A.) equipped with an InGaAs detector, a quartz halogen lamp, and an integrating sphere accessory. The samples were placed in a sample cup (ø 30 mm) specifically designed for this application. For each sample, the Yongchuan Xiuya tea (10 g) was placed into the sample cup according to the procedure specified by the manufacturer. The spectral data were obtained from 10000 cm⁻¹ to 4000 cm⁻¹ at 3.857 cm⁻¹ intervals while rotating the sample cup 360° such that the entire sample was analyzed. Duplicates of each sample were scanned three times, and the average spectrum of each sample was employed in the following analysis.
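As a rough check on the acquisition settings, the sketch below builds the wavenumber axis and averages three repeated scans. It assumes the 1557 data points reported for each spectrum are evenly spaced over 4000-10000 cm⁻¹; the scans are synthetic stand-ins, not measured data, and `average_scans` is a hypothetical helper.

```python
import numpy as np

N_POINTS = 1557  # data points per spectrum, as reported in the data-analysis section

# Assumption: evenly spaced points covering 4000-10000 cm^-1 in ascending order.
wavenumbers = np.linspace(4000.0, 10000.0, N_POINTS)
step = wavenumbers[1] - wavenumbers[0]

def average_scans(scans):
    """Average repeated scans of one sample; scans has shape (n_scans, n_points)."""
    return np.asarray(scans, dtype=float).mean(axis=0)

# Synthetic stand-in for three repeated scans of one sample.
rng = np.random.default_rng(0)
band = np.exp(-((wavenumbers - 5200.0) / 300.0) ** 2)
scans = band + 0.01 * rng.standard_normal((3, N_POINTS))
mean_spectrum = average_scans(scans)

print(round(float(step), 3), mean_spectrum.shape)
```

Under this assumption the spacing comes out at about 3.856 cm⁻¹, close to the stated 3.857 cm⁻¹ interval.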

Spectral data analysis
1) The near infrared spectrum of each sample was converted into 1557 pairs of data points and saved in an Excel table. TQ Analyst 9.4.45, OPUS 7.0 and Matlab 2012a software were applied to analyze the spectral data.
2) To remove background and noise information from the spectra and improve the signal-to-noise ratio for modeling, no preprocessing (None), standard normal variate (SNV), multiplicative scatter correction (MSC), first derivative (FD), second derivative (SD) and their combinations were applied to the original spectra (Wang et al., 2022), and the best spectral preprocessing method was selected.
3) The biPLS method was used to divide all the pretreated spectral data equally into 22-25 spectral subintervals. One subinterval was left out at a time, and a partial least squares model was established with the remaining subintervals. The spectral subintervals retained when the root mean square error of cross validation (RMSECV) was the lowest were selected as the characteristic spectral subintervals reflecting the three quality levels of Yongchuan Xiuya tea. RMSECV was calculated as follows (Equation 1):

$$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n} (y_i' - y_i)^2}{n}} \tag{1}$$

Where $n$ is the number of samples in the calibration set, $y_i$ is the true value for sample $i$, and $y_i'$ is the value for sample $i$ predicted from the calibration set.
4) Principal component analysis (PCA) was performed on the obtained characteristic spectral data. With the principal components as the input values and the chemical values of Yongchuan Xiuya tea of the three quality grades as the output values, the back propagation-artificial neural network of jump connection nets (J-BP-ANN) method was used to establish NIRS models. The results were expressed by the determination coefficient of cross validation ($R_c^2$), the determination coefficient of prediction ($R_p^2$), the root mean square error of cross validation (RMSECV), and the root mean square error of prediction (RMSEP). The larger the $R_p^2$ and the lower the RMSEP, the better the prediction effect of the model (Wolfgang & Leopold, 2014).
For the prediction set, the root mean square error of prediction (RMSEP) was calculated as follows (Equation 2):

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n} (y_i' - y_i)^2}{n}} \tag{2}$$

Where $n$ is the number of samples in the prediction set, $y_i$ is the true value for sample $i$, and $y_i'$ is the predicted value for sample $i$.
$R^2$ was calculated as follows (Equation 3):

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i' - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{3}$$

Where $\bar{y}$ is the mean true value, $y_i$ is the true value for sample $i$, and $y_i'$ is the predicted value for sample $i$.
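The error metrics described above can be sketched in a few lines. The functions below assume the standard definitions of root mean square error and the coefficient of determination; the same `rmse` function gives RMSECV or RMSEP depending on whether cross-validated or prediction-set values are passed in. The sample values are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative values only: true quality levels and hypothetical predictions.
y_true = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.0, 0.9, 2.1, 3.0])
print(round(rmse(y_true, y_pred), 4), round(r_squared(y_true, y_pred), 4))
```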

Screening of spectral preprocessing methods
It can be seen from Figure 2 that the spectra had more absorption peaks in the long wave band (4000-7000 cm⁻¹), mainly reflecting the NIRS absorption information of water -OH and various components in the three quality grades of Yongchuan Xiuya tea. The NIRS information reflecting the quality grades was relatively weak and was likely to be covered by the absorption peaks of other components, which would greatly affect the prediction effect of the models (Chu, 2021). Therefore, before establishing the model, the spectra first had to be preprocessed to filter out as much noise information as possible and improve the signal-to-noise ratio. In this study, various spectral pretreatment methods were applied to the NIR spectra of Yongchuan Xiuya tea of the three quality grades, and partial least squares (PLS) was used to establish the NIRS models. Model performance was expressed by RMSECV and $R_c^2$: the higher the $R_c^2$ and the lower the RMSECV, the better the pretreatment method. The results of all the pretreatment models are shown in Table 1 and Figure 3.
It can be seen from Table 1 and Figure 3 that, among the models established with the nine spectral pretreatment methods, the NIRS model established with the original spectra was the worst ($R_c^2$ = 0.374, RMSECV = 1.502). Among the models established with a single pretreatment method, the NIRS model with MSC pretreatment performed better ($R_c^2$ = 0.614, RMSECV = 1.323), but its results were worse than those of the models established with combined pretreatment methods. Among the models established with combined pretreatment methods, the NIRS model established with the MSC + FD combination had the best results ($R_c^2$ = 0.685, RMSECV = 1.053). Compared with the model established with the original spectra, $R_c^2$ increased by 83.2% and RMSECV decreased by 29.9%.
It can be seen that preprocessing the original spectra before establishing the model was necessary and effectively eliminated some noise information, a finding consistent with previous conclusions (Li & Altaner, 2019). The best spectral preprocessing method in this study was the MSC + FD combination, but the prediction results were still poor and could not meet the requirements for accurate prediction of Yongchuan Xiuya tea of the three quality grades. Therefore, further screening of the characteristic spectral intervals was necessary to improve the prediction effect of the model.
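The preprocessing steps compared in this section can be sketched as follows. This is a minimal illustration, not the routine used by the analysis software: the first derivative is taken here as a plain finite difference (the text does not state the derivative algorithm or any smoothing window), MSC uses the mean spectrum as its reference, and the function names are hypothetical.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        # Fit spectrum = intercept + slope * reference, then undo offset and gain.
        slope, intercept = np.polyfit(reference, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected

def first_derivative(spectra, spacing=1.0):
    """Finite-difference first derivative along the wavenumber axis (assumed method)."""
    return np.gradient(np.asarray(spectra, dtype=float), spacing, axis=1)

def msc_fd(spectra, spacing=1.0):
    """Combined MSC + FD preprocessing."""
    return first_derivative(msc(spectra), spacing)

# Synthetic demo: one band shape seen with different offsets and gains (pure scatter).
axis = np.linspace(4000.0, 10000.0, 200)
band = np.exp(-((axis - 5200.0) / 400.0) ** 2) + 0.5 * np.exp(-((axis - 7000.0) / 300.0) ** 2)
offsets = np.array([[0.00], [0.10], [0.25]])
gains = np.array([[1.0], [1.3], [0.8]])
raw = offsets + gains * band
corrected = msc(raw)
print(np.allclose(corrected, corrected[0]))  # True: scatter differences removed
```

In this synthetic case the three raw spectra differ only by offset and gain, so MSC maps them onto one curve; real spectra also carry chemical differences, which are what the later modeling steps rely on.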

Screening of characteristic spectral intervals
Due to the overlapping, covering and cross effects of spectral information, it was necessary to further screen the spectral intervals closely related to the three quality levels of Yongchuan Xiuya tea to improve the prediction accuracy of the model. In this study, the biPLS method was used to screen the characteristic spectral intervals. RMSECV was the lowest when all spectral data were divided into 22 spectral subintervals, and the spectral subintervals retained at that point were taken as the characteristic spectral intervals. The results are shown in Table 2.
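The interval screening can be illustrated with a small backward elimination loop. This is a sketch under stated assumptions, not the implementation used in the study: the PLS model is a minimal single-response NIPALS-style routine written for this example, cross validation is assumed to be leave-one-sample-out, the number of latent variables (3) is arbitrary, and the data are synthetic.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model and return a prediction function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # no covariance left to model
            break
        w = w / norm
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        c = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)  # deflate X
        yr = yr - c * t           # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return lambda X_new: np.full(X_new.shape[0], y_mean)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return lambda X_new: (X_new - x_mean) @ coef + y_mean

def rmsecv(X, y, n_components=3):
    """Leave-one-sample-out RMSECV (the cross-validation scheme is an assumption)."""
    n = X.shape[0]
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        predict = pls1_fit(X[keep], y[keep], n_components)
        errors[i] = predict(X[i:i + 1])[0] - y[i]
    return float(np.sqrt(np.mean(errors ** 2)))

def bipls(X, y, n_intervals, n_components=3):
    """Backward interval PLS: drop one interval per round, keep the best subset seen."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)

    def columns(ids):
        return np.concatenate([intervals[k] for k in ids])

    active = list(range(n_intervals))
    best_ids, best_score = list(active), rmsecv(X[:, columns(active)], y, n_components)
    while len(active) > 1:
        # Try removing each remaining interval; keep the removal with the lowest RMSECV.
        trials = []
        for k in active:
            rest = [j for j in active if j != k]
            trials.append((rmsecv(X[:, columns(rest)], y, n_components), rest))
        score, active = min(trials, key=lambda item: item[0])
        if score < best_score:
            best_ids, best_score = list(active), score
    return best_ids, best_score

# Synthetic demo: only the third interval (columns 20-29) carries information about y.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 60))
y = X[:, 20:30].sum(axis=1) + 0.05 * rng.standard_normal(30)
selected, score = bipls(X, y, n_intervals=6)
print(selected, round(score, 3))
```

The returned interval indices are 0-based. Keeping the best subset seen over all rounds, rather than stopping at the first increase in RMSECV, is one common way to organize the search.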
It can be seen from Table 2 that, in the process of establishing the biPLS models, the lowest RMSECV was 0.627, with an $R_c^2$ of 0.773. At that point six spectral subintervals ([4, 6, 9, 12, 19, 22]) were used for modeling, and the corresponding spectral intervals were 4821.2-5091.2 cm⁻¹, 5368.9-5638.8 cm⁻¹, 6190.4-6460.4 cm⁻¹, 7011.9-7281.9 cm⁻¹, 8924.9-9191.1 cm⁻¹ and 9734.9-10000 cm⁻¹, respectively. The proportion of the characteristic spectral range in the total spectral range was 27.23%. The biPLS method can therefore screen the characteristic spectral intervals reflecting the three quality grades of Yongchuan Xiuya tea, which greatly reduced the amount of spectral data to be input into the model and improved its prediction accuracy. The $R_c^2$ of the best biPLS model was 12.8% higher than that of the best PLS model, and its RMSECV was 40.45% lower. Thus, further screening of the characteristic spectral intervals not only improved the prediction effect of the models for the three quality levels of Yongchuan Xiuya tea, but also reduced the amount of spectral data used for modeling. This reduces the complexity of the model and improves its robustness, which laid a solid foundation for the next step of establishing the artificial neural network model of Yongchuan Xiuya tea with three quality levels.
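The reported 27.23% can be checked with a short calculation. The sketch assumes the share is counted in data points, and that the 1557 points are split into 22 nearly equal subintervals with the extra points going to the lowest-numbered subintervals (the behavior of `numpy.array_split`); how the software actually distributes the remainder is not stated in the text.

```python
import numpy as np

N_POINTS = 1557                    # data points per spectrum
N_INTERVALS = 22                   # number of subintervals in the best biPLS run
SELECTED = [4, 6, 9, 12, 19, 22]   # 1-based subinterval numbers from the text

# Assumption: near-equal split, with the first 1557 % 22 = 17 subintervals
# holding one extra point each.
intervals = np.array_split(np.arange(N_POINTS), N_INTERVALS)
selected_points = sum(len(intervals[k - 1]) for k in SELECTED)
share = 100.0 * selected_points / N_POINTS
print(selected_points, round(share, 2))  # 424 27.23
```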

Principal component analysis
Before establishing the back propagation-artificial neural network of jump connection nets (J-BP-ANN) model, the amount of input data had to be reduced as far as possible, so principal component analysis of the sample spectra was carried out first. In this study, principal component analysis was conducted on the characteristic spectral intervals screened by the biPLS method. The cumulative contribution rates of the first five principal components are shown in Table 3.
It can be seen from Table 3 that, after principal component analysis of the selected characteristic spectral intervals, the contribution rates of the first five principal components decreased rapidly: the contribution rate of PC1 was 89.95%, the cumulative contribution rate of the first three principal components was 97.85%, and the cumulative contribution rate of PC1-PC5 was 99.14%. The first three principal components can therefore represent most of the information in the characteristic spectral intervals. The plots of PC1 vs PC2 vs PC3 for the three quality levels of Yongchuan Xiuya tea are shown in Figure 4 and Figure 5. It can be seen from Figure 4 and Figure 5 that, after principal component analysis of the characteristic spectral intervals selected by the biPLS method, the three quality levels of Yongchuan Xiuya tea showed certain clustering characteristics. Samples at the periphery of the distribution space were far apart from one another, indicating that the biPLS method gave good sample classification results and that the selected characteristic spectral intervals were representative. However, a small number of samples were still confused. This may be because biPLS is a linear method with limited classification ability. Therefore, the next step was to apply a nonlinear artificial neural network method to further classify Yongchuan Xiuya tea samples of the three quality levels.
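The contribution rates and the principal component scores used as network inputs can be computed as sketched below. This is an illustration on synthetic data (60 samples by 424 variables, with a few dominant latent directions), using mean centering and a singular value decomposition; any scaling applied by the original software is not stated in the text.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the mean-centered data.

    Returns the scores, the contribution rate (explained variance ratio) of each
    retained component, and the loading vectors.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)
    scores = U[:, :n_components] * s[:n_components]
    return scores, ratio[:n_components], Vt[:n_components]

# Synthetic stand-in for the selected spectral variables.
rng = np.random.default_rng(0)
latent = rng.standard_normal((60, 3)) * np.array([10.0, 3.0, 1.0])
loadings = rng.standard_normal((3, 424))
X = latent @ loadings + 0.05 * rng.standard_normal((60, 424))

scores, ratio, components = pca(X, n_components=3)
print(np.round(np.cumsum(ratio), 4))  # cumulative contribution rates of PC1-PC3
```

The columns of `scores` play the role of PC1, PC2 and PC3 in the score plots and in the model inputs.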

Establishment of J-BP-ANN model
With the first three principal components obtained from the above principal component analysis as the input variables and the chemical values of the three quality levels of Yongchuan Xiuya tea as the output, the near infrared spectral prediction model of quality levels was established using the jump connection nets artificial neural network (J-BP-ANN) method. Different transfer functions between the layers lead to different model prediction results. In establishing the J-BP-ANN model, the learning rate of the artificial neural network was set to 0.10, and three transfer functions were applied, namely the linear [0,1] function, the logistic function and the tanh function. The results are shown in Table 4.
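A jump connection network can be sketched as a back propagation network in which the inputs feed the output directly as well as through the hidden layer. The code below is only an illustration of that idea: the text gives the learning rate (0.10) and the transfer functions, but not the number of hidden neurons, the number of epochs or the exact layer layout, so those choices are assumptions, as is rounding the continuous output to the nearest quality level. The inputs are synthetic stand-ins for three principal component scores.

```python
import numpy as np

# Each entry: (activation, derivative expressed in terms of the activation output).
ACTIVATIONS = {
    "linear": (lambda z: z, lambda a: np.ones_like(a)),
    "logistic": (lambda z: 1.0 / (1.0 + np.exp(-z)), lambda a: a * (1.0 - a)),
    "tanh": (np.tanh, lambda a: 1.0 - a ** 2),
}

def train_jump_net(X, y, activation="tanh", n_hidden=5, learning_rate=0.10,
                   n_epochs=2000, seed=0):
    """Train a one-hidden-layer network with a direct input-to-output (jump) link."""
    rng = np.random.default_rng(seed)
    act, act_grad = ACTIVATIONS[activation]
    n, d = X.shape
    W1 = 0.1 * rng.standard_normal((d, n_hidden))  # input -> hidden
    b1 = np.zeros(n_hidden)
    W2 = 0.1 * rng.standard_normal(n_hidden)       # hidden -> output
    Wj = np.zeros(d)                               # jump connection: input -> output
    b2 = 0.0
    losses = []
    for _ in range(n_epochs):
        H = act(X @ W1 + b1)
        err = H @ W2 + X @ Wj + b2 - y
        losses.append(float(0.5 * np.mean(err ** 2)))
        # Back propagation of the mean squared error (full-batch gradient descent).
        delta = np.outer(err, W2) * act_grad(H)
        grad_W1, grad_b1 = X.T @ delta / n, delta.mean(axis=0)
        grad_W2, grad_Wj, grad_b2 = H.T @ err / n, X.T @ err / n, err.mean()
        W1 -= learning_rate * grad_W1
        b1 -= learning_rate * grad_b1
        W2 -= learning_rate * grad_W2
        Wj -= learning_rate * grad_Wj
        b2 -= learning_rate * grad_b2

    def predict(X_new):
        return act(X_new @ W1 + b1) @ W2 + X_new @ Wj + b2

    return predict, losses

# Synthetic stand-in for PC1-PC3 scores of 60 calibration samples.
rng = np.random.default_rng(1)
levels = np.repeat([1.0, 2.0, 3.0], 20)
pcs = 0.2 * rng.standard_normal((60, 3))
pcs[:, 0] += levels - 2.0
pcs = (pcs - pcs.mean(axis=0)) / pcs.std(axis=0)

predict, losses = train_jump_net(pcs, levels, activation="tanh")
# Assumption: a sample is assigned to the nearest quality level.
assigned = np.clip(np.rint(predict(pcs)), 1, 3)
accuracy = float(np.mean(assigned == levels))
print(round(losses[0], 3), round(losses[-1], 3), round(accuracy, 3))
```

Passing `activation="linear"` or `activation="logistic"` to `train_jump_net` allows the same kind of comparison between transfer functions on the same inputs.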
It can be seen from Table 4 that the prediction results of the J-BP-ANN models with the three transfer functions were different. The poorest was the model established with the linear [0,1] function ($R_p^2$ = 0.877, RMSEP = 0.136). This may be because Yongchuan Xiuya tea undergoes a large number of chemical reactions during fixing and subsequent processing, generating many chemical components. Its internal chemical composition is therefore very complex, so the near infrared spectral information obtained is also extremely complex and shows obvious nonlinear characteristics. As a result, the model established with the linear [0,1] transfer function gave the worst prediction results. In contrast, the logistic function and the tanh function have strong nonlinear characteristics, and the prediction results of the jump connection nets artificial neural network models established with these two transfer functions were better than those of the linear [0,1] function model. The logistic function is an S-shaped function; its model prediction results were slightly better than the linear function results, indicating that there are certain nonlinear factors in the spectral information. The tanh function is a hyperbolic tangent function, with which the model converges faster and needs fewer iterations. The prediction result of the tanh function model was the best among the three transfer functions, and its prediction model was also the most robust.
Abbreviations: $R_c^2$: the coefficient of determination for calibration; $R_p^2$: the coefficient of determination for prediction; RMSEP: root mean square error of prediction; RMSECV: root mean square error of cross validation.
The model with the best prediction accuracy was the artificial neural network model of Yongchuan Xiuya tea with three quality levels established with the tanh transfer function ($R_p^2$ = 0.942, RMSEP = 0.041). The established jump connection nets artificial neural network model classified all prediction set samples correctly, and the results for the prediction set are shown in Figure 6 and Figure 7.
It can be seen from Figure 6 and Figure 7 that when the 30 samples of the prediction set were used to test the robustness of the calibration set model, the predicted values were almost the same as the true values, with an absolute prediction deviation below 0.08. This indicates that the J-BP-ANN model with the tanh function had extremely high prediction accuracy, showed no overfitting, and can accurately predict the quality levels of Yongchuan Xiuya tea. The discrimination rates for the calibration set and the prediction set were both 100%, so the J-BP-ANN method with the tanh function can predict the quality levels of Yongchuan Xiuya tea quickly and accurately. The results for the prediction set samples are shown in Table 5.

Conclusion
1) MSC + FD was the best preprocessing method for removing NIRS noise information, and the biPLS method was then used to screen the characteristic spectral intervals reflecting the three quality levels of Yongchuan Xiuya tea. After principal component analysis, the cumulative contribution rate of the first three PCs was 97.85%, and these PCs were used as the input values. The J-BP-ANN model established with the tanh function gave the best results, with $R_c^2$ and RMSECV of 0.953 and 0.031, respectively.
2) The robustness of the calibration model was tested with the prediction samples, whose $R_p^2$ and RMSEP were 0.942 and 0.041, respectively, indicating that the calibration set model was robust and showed no overfitting. This study provides a scientific method to predict the quality levels of Yongchuan Xiuya tea quickly and accurately, and will provide solid technical support for the future development of near infrared spectrometers.