1 Introduction
Optical thin films are key components of laser systems, and their optical properties and laser-induced damage threshold (LIDT) directly affect their output energy[ Reference Xu, Zhu, Chai, Roshanzadeh, Boyd, Rudolph, Zhao, Chen and Shao 1 – Reference Xing, Fan, Huang, Cheng and Du 3 ]. Traditional preparation methods for laser thin films include electron-beam evaporation[ Reference Field, Galloway, Kletecka, Rambo, Smith, Gruzdev, Carr, Ristau and Menoni 4 – Reference Shuai, Liu, Zhao, Qiu, Li, Gong, Sun, Zhou, Jiang, Dai, Shao and Xia 6 ] and ion-beam sputtering[ Reference Malobabic, Jupe and Ristau 7 ]. Recently, plasma-enhanced atomic layer deposition (PEALD) has attracted attention because of its precise thickness controllability[ Reference Mahata, Byun, An, Choi, An and Kim 8 ], excellent conformality[ Reference Faraz, Knoops, Verheijen, van Helvoirt, Karwal, Sharma, Beladiya, Szeghalmi, Hausmann, Henri, Creatore and Kessels 9 ], low-temperature growth properties[ Reference Kim, Kim, Park, Jeong, Kim, Chung, Kim and Park 10 ] and high LIDT[ Reference Liu, Jensen, Ma and Ristau 11 ]. Furthermore, post-treatment annealing improves the properties of thin films grown via PEALD[ Reference Abromavičius, Kičas and Buzelis 12 ]. However, owing to the diversity and wide range of process parameters, process optimization and thin film performance improvement often require extensive, expensive and time-consuming experiments.
Back-propagation neural networks (BPNNs), a subset of machine learning, have shown potential for mapping the relationship between experimental parameters and material properties[ Reference Li, Chen, Xiong, Liu, Dou, Zhan, Zhu, Chu, Li and Ma 13 , Reference Liu, Yu, Wan, Shu, Sun, Gui and Xu 14 ]. This approach can identify underlying regularities in the training data by updating the internal weight parameters[ Reference Lininger, Hinczewski and Strangi 15 , Reference Xia, Hu, Chen and Li 16 ]. In recent years, researchers have begun to study the application of neural networks in the field of thin films to predict the growth rate[ Reference Kimaev and Ricardez-Sandoval 17 – Reference Bahramian 20 ], hydrophobicity[ Reference Gukeh, Moitra, Ibrahim, Derrible and Megaridis 21 ], permeate flux and foulant rejection[ Reference Fetanat, Keshtiara, Keyikoglu, Khataee, Daiyan and Razmjou 22 ]. Although these reports demonstrate the application of BPNNs in various thin films, studies on the properties of laser thin films are lacking. Furthermore, the adopted models were mainly shallow structures with single or double hidden layers. Shallow-structure neural networks can meet most modeling and prediction needs but may require a large number of neurons to accurately represent the relationship between the input and output[ Reference Montufar 23 ], which increases the likelihood of errors in models[ Reference Fetanat, Keshtiara, Keyikoglu, Khataee, Daiyan and Razmjou 22 ]. In 2022, Mengu et al. [ Reference Mengu, Rahman, Luo, Li, Kulce and Ozcan 24 ], while studying the emerging symbiotic relationship between deep learning and optics, reported the advantages of deep neural networks with three or more hidden layers in terms of approximation and generalization capability. 
However, as the number of hidden layers increases, deep neural networks may suffer from poor performance or training failure owing to issues such as vanishing/exploding gradients[ Reference Liu, Chen, Du, Jin and Shang 25 ]. Therefore, it is necessary to determine the optimal number of hidden layers for a given task.
In this study, we employ several BPNN models to establish the relationship between the annealing process and the properties of PEALD-HfO2 thin films for laser applications. First, by comparing the performance of BPNN models with different numbers of hidden layers, we find that the three-hidden-layer back-propagation neural network (THL-BPNN) performs best. The THL-BPNN model is then used to model and predict the relationship between the annealing process and the PEALD-HfO2 thin film properties, and it is compared with two other models. Finally, the THL-BPNN model is used to predict the LIDT of the PEALD-grown thin films and the properties of the PEALD-SiO2 thin films, which verifies its applicability. We believe that the THL-BPNN model can help predict the properties of other laser thin films.
2 Materials and methods
2.1 Data preparation
The HfO2 thin films used to construct the annealing process–thin film property relationship were grown on Si substrates using a commercial PEALD device (Picosun Advanced R200, Finland) with an integrated remote plasma source. HfO2 thin films were grown by alternating exposure to the precursor tetrakis-ethylmethylamino hafnium (Hf(N(CH3)(CH2CH3))4, TEMAH) and O2/Ar gas mixture plasma reactant at a deposition temperature of 150°C. The number of deposition cycles was 500, and the pulse sequence for each HfO2 growth cycle was as follows: TEMAH feeding (1.6 s), N2 purging (19 s), Ar/O2 mixture feeding (11 s) and Ar purging (10 s). The samples were then annealed in quartz tube annealing equipment (RS 80/300/11, Nabertherm) for 3 h. The annealing process included a combination of three atmospheres (vacuum, O2 and N2) and six annealing temperatures (300°C to 800°C in 100°C increments). For vacuum annealing, the pressure in the tubular annealing chamber was approximately $1\times {10}^{-4}$ Pa. For O2 and N2 atmosphere annealing, the gas flow rate was 150 SCCM for both O2 and N2. The HfO2 thin films were measured using an ellipsometer (Horiba Uvisel 2), and the thicknesses and refractive indices were extracted using the Tauc-Lorentz model in DeltaPsi2 software, neglecting the extinction coefficient (k). The O/Hf ratio of the HfO2 thin films was analyzed using X-ray photoelectron spectroscopy (XPS) (Thermo Scientific) with a monochromatic Al Kα (1486.6 eV) X-ray source. The data used to construct the annealing process–thin film property relationship consisted of 19 samples, including 1 as-deposited sample and 18 annealed samples.
The HfO2 thin film data used for LIDT modeling and prediction come from Ref. [ Reference Lin, Zhu, Song, Liu, Yin, Zeng and Shao 26 ] and include 12 samples treated with different annealing process parameters. Among them, six samples were annealed in an O2 atmosphere, and the other six samples were annealed in a N2 atmosphere. The annealing temperature ranged from 300°C to 800°C.
The SiO2 thin film data used for property modeling and prediction come from Ref. [ Reference Yin, Zhu, Zeng, Song, Chai, Shao, Zhang, Zhao, Li and Shao 27 ] and include 10 samples grown with different deposition process parameters. Among them, four samples were grown at different temperatures ranging from 50°C to 200°C, and six samples were grown with different precursor source exposure times ranging from 0.2 to 0.7 s.
Table 1 lists the detailed parameters of the datasets used to model and predict the properties of HfO2 and SiO2 thin films, including the refractive index, thickness and stoichiometric ratio. As the annealing temperature increases, the thickness of the HfO2 thin film decreases and the refractive index increases. For annealing in vacuum, O2 and N2, the thickness of the HfO2 thin films varies over the annealing temperatures in the ranges of 34.7–42.7, 38.5–49.1 and 36.3–46.7 nm, respectively, while the refractive index (at 355 nm) varies in the ranges of 1.99–2.24, 1.83–1.97 and 1.88–2.00, respectively. This means that the packing density of the HfO2 thin film increases with increasing annealing temperature[ Reference Lin, Zhu, Song, Liu, Yin, Zeng and Shao 26 ]. In addition, the O/Hf ratio of HfO2 thin films annealed in O2 fluctuates slightly around the ideal value of 2.0, whereas the O/Hf ratio of HfO2 thin films annealed in vacuum and N2 decreases with increasing annealing temperature.
*Note: 0, 1, 2 and 3 represent the as-deposited sample, O2, N2 and vacuum, respectively.
Table 2 lists the detailed parameters of the datasets used for LIDT modeling and prediction. Compared with PEALD-HfO2 thin films, PEALD-SiO2 thin films have lower absorption and impurity content. Furthermore, properties such as absorption, impurity content and stoichiometric ratio influence each other. Detailed relationships are described in Refs. [ Reference Lin, Zhu, Song, Liu, Yin, Zeng and Shao 26 , Reference Yin, Zhu, Zeng, Song, Chai, Shao, Zhang, Zhao, Li and Shao 27 ]. The LIDT was tested in one-on-one mode according to ISO 21254 using a Gaussian-shaped 3ω neodymium-doped yttrium aluminum garnet (Nd:YAG) laser (355 nm, 7.8 ns). The LIDT test was performed under normal incidence, and the maximum laser fluence with zero damage probability was taken as the LIDT. It is worth mentioning that the LIDT of HfO2 thin films is lower than that of SiO2 thin films, which is attributed to the fact that the bandgap of HfO2 is lower than that of SiO2.
*Note: 1 and 2 represent HfO2 samples and SiO2 samples, respectively.
2.2 Models
Six models, namely four BPNN models with different numbers of hidden layers (single-hidden-layer BPNN, double-hidden-layer BPNN, three-hidden-layer BPNN and four-hidden-layer BPNN), a support vector machine regression (SVR) model[ Reference Cortes and Vapnik 28 ] using a Gaussian kernel function and a linear regression (LR) model[ Reference Wang, Wu, Zheng, Zeng, Ding and Zhang 29 ], were used to establish the correlation between the annealing process and the refractive index, layer thickness and O/Hf ratio of PEALD-HfO2 thin films. Except for the LR model, which belongs to the category of linear regression fitting, the other models belong to the category of nonlinear regression fitting. All models performed regression fitting by training on a training set, tuning the modeling parameters to achieve the highest accuracy (i.e., lowest error) and then validating on a validation set. When constructing the relationship between the annealing process and the properties of the PEALD-HfO2 thin films, 6 samples were randomly selected as the validation set, and the remaining 13 samples (12 annealed samples and 1 as-deposited sample) were used as the training set. When predicting the LIDT of PEALD-grown thin films, 6 samples (3 HfO2 samples and 3 SiO2 samples) were randomly selected as the validation set, and the remaining 16 samples (9 HfO2 samples and 7 SiO2 samples) were used as the training set. When predicting the properties of PEALD-SiO2 thin films, the leave-one-out cross-validation method was adopted owing to limited data. For each test, one sample was used as a validation set, and the remaining samples were used as a training set until every sample was used as a validation set. Subsequently, the average performance deviation was calculated for each model.
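The two validation schemes described above (a random hold-out split and leave-one-out cross-validation) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the fixed seed and the choice to pin sample 0 (standing in for the as-deposited sample) to the training set are assumptions made for the example, not details taken from the study.

```python
import random

def holdout_split(n_samples, n_validation, always_train=(), seed=0):
    """Randomly pick validation indices; all other samples form the training set.

    always_train lists indices that must never be held out (an assumption here).
    """
    rng = random.Random(seed)
    candidates = [i for i in range(n_samples) if i not in always_train]
    validation = sorted(rng.sample(candidates, n_validation))
    training = [i for i in range(n_samples) if i not in validation]
    return training, validation

def leave_one_out(n_samples):
    """Yield (training_indices, validation_index), holding out each sample once."""
    for held_out in range(n_samples):
        training = [i for i in range(n_samples) if i != held_out]
        yield training, held_out

# 19 HfO2 samples: 13 for training, 6 for validation.
train_idx, val_idx = holdout_split(19, 6, always_train=(0,))

# 10 SiO2 samples: one fold per sample.
loo_folds = list(leave_one_out(10))
```

With leave-one-out, the per-fold deviations would then be averaged to give one figure per model, as described in the text.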
Figure 1 shows a schematic of the THL-BPNN model, including an input layer (layer 0), three hidden layers (layers 1–3) and an output layer (layer 4), with each layer containing one or more neurons. The number of neurons in the input and output layers was determined by the number of input and output variables in the dataset, whereas the number of neurons in the hidden layers was initially estimated using Equation (1) (an empirical formula) and finally determined by a global traversal search:
$$l=\sqrt{u+v}+a, \tag{1}$$
where u, v and l are the numbers of neurons in the input, output and hidden layers, respectively, and a is a random number between 1 and 10.
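As a rough illustration of how such an empirical rule narrows the search, the sketch below lists candidate hidden-layer sizes. It assumes the commonly used form l = sqrt(u + v) + a and, for simplicity, lets a take only integer values from 1 to 10; both the function name and the integer restriction are assumptions made for this example.

```python
import math

def hidden_neuron_candidates(u, v, a_min=1, a_max=10):
    """Candidate hidden-layer sizes from l = sqrt(u + v) + a (assumed form)."""
    base = math.sqrt(u + v)
    return [round(base + a) for a in range(a_min, a_max + 1)]

# Two inputs (annealing atmosphere and temperature) and one output property.
candidates = hidden_neuron_candidates(u=2, v=1)
```

The resulting range only provides a starting point; the final sizes are still chosen by the traversal search.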
The neurons receive input signals from the previous layer and generate output signals for the next layer[ Reference Xu, Zhang, Fu and Liu 30 , Reference Ma, Li, Liu, Zhang, Zhang, Zheng and Lu 31 ]. For example, the first neuron in layer 1 (from top to bottom), the circle where $h_{11}$ is located, receives the input signals $x=[x_1; x_2]$ from layer 0. Then x undergoes a linear transformation to give the weighted sum, z, which is expressed as follows:
$$z={w}^{\mathrm{T}}x+b={w}_1{x}_1+{w}_2{x}_2+b, \tag{2}$$
where $w=[w_1; w_2]\in {\mathrm{R}}^2$ is the weight vector between the neurons, and b $\in$ R is a bias.
Subsequently, z passes through a nonlinear activation function $f\left(\cdotp \right)$ [ Reference Qiu 32 ], and the output signal $h_{11}$ is generated as follows:
$$h_{11}=f(z). \tag{3}$$
These processes were performed for each neuron in each layer to form the final output signal, $y_1$ [ Reference Wiecha, Arbouet, Girard and Muskens 33 ]. In this way, an initial mapping from the input space to the output space is established through layer-by-layer information transfer.
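The layer-by-layer transfer described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the study: the layer sizes, the random initial weights and the linear output layer are assumptions, while the hyperbolic tangent in the hidden layers follows the text.

```python
import math
import random

def dense(inputs, weights, biases, activation=None):
    """One layer: weighted sum z = w.x + b per neuron, then optional activation."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

def init_layer(n_in, n_out, rng):
    """Random weights and biases in [-1, 1] (illustrative initialization)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

def forward(x, layers):
    """Pass x through tanh hidden layers and a linear output layer (assumed)."""
    signal = x
    for weights, biases in layers[:-1]:
        signal = dense(signal, weights, biases, math.tanh)
    weights, biases = layers[-1]
    return dense(signal, weights, biases)

rng = random.Random(0)                 # fixed seed so every run starts identically
sizes = [2, 4, 3, 2, 1]                # input, three hidden layers, output (hypothetical sizes)
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
y = forward([0.5, -1.0], layers)       # scaled inputs -> one predicted property
```

Fixing the seed mirrors the idea of holding the initialization state constant between runs so that differences in accuracy come from the architecture rather than from the starting weights.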
To further improve the mapping accuracy, a training loss was constructed at the output layer, and a training algorithm was selected to update the relevant parameters (weights w and biases b) using the chain rule[ Reference LeCun, Bengio and Hinton 34 ] until the loss or the number of iterations reached the preset threshold[ Reference Guo, Barrett, Wang and Lvovsky 35 ]. The Levenberg–Marquardt algorithm[ Reference Hagan and Menhaj 36 ] was used to solve the resulting nonlinear least squares problem. The hyperbolic tangent function was selected as the activation function for all hidden layers. The initialization state of each run was fixed to avoid interference from other factors.
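To make the Levenberg–Marquardt update concrete, the sketch below fits a deliberately tiny model, a single tanh neuron with one weight and one bias, to synthetic data. Everything specific here is an assumption for illustration: the data, the starting point, the initial damping value and the factor-of-10 damping schedule are not the settings used in the study, and a full network would use a larger Jacobian obtained by back-propagation.

```python
import math

def residuals(params, xs, ts):
    """Prediction minus target for the model tanh(w * x + b)."""
    w, b = params
    return [math.tanh(w * x + b) - t for x, t in zip(xs, ts)]

def jacobian(params, xs):
    """Rows of (d r / d w, d r / d b), obtained with the chain rule."""
    w, b = params
    rows = []
    for x in xs:
        slope = 1.0 - math.tanh(w * x + b) ** 2   # derivative of tanh
        rows.append((slope * x, slope))
    return rows

def sum_squares(res):
    return sum(r * r for r in res)

def levenberg_marquardt(xs, ts, params=(0.5, 0.0), mu=0.01, iterations=50):
    """Damped Gauss-Newton steps; mu shrinks on success and grows on failure."""
    params = tuple(params)
    loss = sum_squares(residuals(params, xs, ts))
    for _ in range(iterations):
        res = residuals(params, xs, ts)
        jac = jacobian(params, xs)
        # Entries of J^T J (2x2, symmetric) and J^T r (2x1).
        a11 = sum(jw * jw for jw, _ in jac)
        a12 = sum(jw * jb for jw, jb in jac)
        a22 = sum(jb * jb for _, jb in jac)
        g1 = sum(jw * r for (jw, _), r in zip(jac, res))
        g2 = sum(jb * r for (_, jb), r in zip(jac, res))
        # Solve (J^T J + mu I) delta = -J^T r with Cramer's rule.
        d11, d22 = a11 + mu, a22 + mu
        det = d11 * d22 - a12 * a12
        dw = (-g1 * d22 + g2 * a12) / det
        db = (-g2 * d11 + g1 * a12) / det
        trial = (params[0] + dw, params[1] + db)
        trial_loss = sum_squares(residuals(trial, xs, ts))
        if trial_loss < loss:          # accept the step and trust the model more
            params, loss, mu = trial, trial_loss, mu / 10.0
        else:                          # reject the step and damp more strongly
            mu *= 10.0
    return params, loss

# Synthetic targets generated from w = 1.5, b = -0.3 (illustrative only).
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ts = [math.tanh(1.5 * x - 0.3) for x in xs]
initial_loss = sum_squares(residuals((0.5, 0.0), xs, ts))
fitted, final_loss = levenberg_marquardt(xs, ts)
```

The damping term mu interpolates between a Gauss-Newton step (small mu) and a short gradient-descent step (large mu), which is why the method suits small least-squares problems such as the ones considered here.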
2.3 Model specification and evaluation
2.3.1 Variable scaling
Considering that different distribution ranges of the input and output values may lead to biased assessments, Equation (4) is used to scale the inputs and outputs of the data to [–1, 1]:
$$Y=\frac{\left({Y}_{\max}-{Y}_{\min}\right)\left(X-{X}_{\min}\right)}{{X}_{\max}-{X}_{\min}}+{Y}_{\min}, \tag{4}$$
where X is the input or output vector; Y is the corresponding scaled vector; $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of the input or output vector, respectively; and $Y_{\max}$ and $Y_{\min}$ are the maximum and minimum values after normalization, respectively.
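The scaling step can be sketched as a linear map from the observed range of a variable onto the target range. The helper names below are hypothetical, and the example uses the six annealing temperatures listed in Section 2.1; the inverse map is included because predictions made in the scaled space must be converted back to physical units.

```python
def scale_to_range(values, y_min=-1.0, y_max=1.0):
    """Linearly map values from [min(values), max(values)] to [y_min, y_max]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot scale a constant vector")
    span = x_max - x_min
    return [(y_max - y_min) * (x - x_min) / span + y_min for x in values]

def unscale(scaled, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Inverse map: recover values in the original units."""
    return [(y - y_min) * (x_max - x_min) / (y_max - y_min) + x_min for y in scaled]

temperatures = [300, 400, 500, 600, 700, 800]     # annealing temperatures in °C
scaled = scale_to_range(temperatures)
restored = unscale(scaled, min(temperatures), max(temperatures))
```

In practice the minimum and maximum should be taken from the training set only and reused for the validation set, so that no information from the held-out samples leaks into the scaling.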
2.3.2 Model evaluation metrics
The coefficient of determination ($R^2$)[ Reference Chicco, Warrens and Jurman 37 ] was used to evaluate the overall performance of each model. The average accuracy (AA) was used to evaluate the performance of each model on a validation set with only a single sample. The root mean square error (RMSE)[ Reference Chai and Draxler 38 ] was used to measure the deviation between the predicted and measured values. $R^2$ and RMSE take their standard forms:
$$R^2=1-\frac{\sum_{i=1}^n{\left({Y}_i-{T}_i\right)}^2}{\sum_{i=1}^n{\left({Y}_i-\overline{Y}\right)}^2},$$
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^n{\left({Y}_i-{T}_i\right)}^2},$$
where n is the size of the dataset; $Y_i$ and $T_i$ are the measured and predicted values of the ith sample in the dataset, respectively; and $\overline{Y}$ is the average of the measured values. A lower RMSE (close to 0) and higher $R^2$ and AA (close to 1) indicate smaller differences between the measured and predicted values.
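The three metrics can be computed as in the sketch below. The $R^2$ and RMSE functions follow their standard definitions; the AA function is an assumption, taken here as one minus the mean relative absolute error, since the exact expression is not reproduced in this text. The numbers in the example are made up purely to exercise the functions.

```python
import math

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((y - t) ** 2 for y, t in zip(measured, predicted))
    ss_tot = sum((y - mean) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((y - t) ** 2 for y, t in zip(measured, predicted)) / n)

def average_accuracy(measured, predicted):
    """Assumed definition: one minus the mean relative absolute error."""
    n = len(measured)
    return 1.0 - sum(abs(y - t) / abs(y) for y, t in zip(measured, predicted)) / n

measured = [1.0, 2.0, 3.0]       # made-up values, not data from the study
predicted = [1.0, 2.0, 4.0]
scores = (r_squared(measured, predicted),
          rmse(measured, predicted),
          average_accuracy(measured, predicted))
```

Unlike $R^2$, the AA as defined here remains well defined for a single sample, which is consistent with its use on one-sample validation sets.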
3 Results and discussion
3.1 Analysis of the number of hidden layers of the BPNN model
The influence of the number of hidden layers in the BPNN model on the modeling accuracy was studied using the measured refractive index, layer thickness and O/Hf ratio of the PEALD-HfO2 thin films treated with different annealing process parameters. The optimal number of neurons in each hidden layer was determined by a global traversal search on the training set, selecting the configuration with the lowest mean absolute error, and the resulting model was then applied to the validation set. For the refractive index and layer thickness datasets, the total number of neurons in each multi-hidden-layer BPNN model was kept equal to that of the single-hidden-layer BPNN model. For the O/Hf ratio dataset, because the optimal number of neurons in the single-hidden-layer BPNN model is only five, this value was set as the maximum number of neurons in each hidden layer of the multi-hidden-layer BPNN models. The modeling and prediction accuracies are shown in Figure 2. Overall, as the number of hidden layers increased from one to three, the gaps between the training and validation sets in both $R^2$ and RMSE decreased, indicating that the model moved from an inexact fit to an accurate fit. However, when the number of hidden layers was further increased to four, these gaps widened. This may be because the number of possible neuron combinations across the layers grows exponentially with the number of hidden layers, which offers potentially better solutions but also introduces the risk of overfitting. The only exception is the refractive index, for which the single-hidden-layer BPNN also performs well; this could be attributed to the small variation in this property and the simple relationship between the input and output.
With the three-hidden-layer BPNN model, the $R^2$ values of the refractive index, layer thickness and O/Hf ratio were higher than 0.90 in both the training and validation sets. The THL-BPNN model was selected for the follow-up study.
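The size of the search space in the traversal described above can be illustrated by enumerating the candidate layer configurations. The sketch covers the two constraints mentioned in the text: a fixed total number of neurons, and a cap on the neurons per layer. The function names and the total of six neurons are hypothetical; the cap of five matches the O/Hf ratio case.

```python
from itertools import product

def fixed_total_configurations(total_neurons, n_layers):
    """All ways to distribute total_neurons over n_layers, at least one per layer."""
    sizes = range(1, total_neurons + 1)
    return [c for c in product(sizes, repeat=n_layers) if sum(c) == total_neurons]

def capped_configurations(max_per_layer, n_layers):
    """All layer-size tuples with between 1 and max_per_layer neurons per layer."""
    return list(product(range(1, max_per_layer + 1), repeat=n_layers))

# With a cap of five neurons per layer, the count grows as 5 ** n_layers.
capped_counts = [len(capped_configurations(5, k)) for k in (1, 2, 3, 4)]

# With a hypothetical fixed total of six neurons spread over three layers.
three_layer_options = fixed_total_configurations(6, 3)
```

In a full search, each configuration would be trained and scored on the training set, and the one with the lowest mean absolute error would be retained.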
3.2 Comparison of the THL-BPNN model with other models
The performance of the THL-BPNN model was further evaluated and compared with the LR and SVR models. The refractive index, layer thickness and O/Hf ratio of the HfO2 thin films predicted by the three models were compared with the measured values, as shown in Figure 3 and Table 3. As shown in Figures 3(a), 3(d) and 3(g), the poor performance of the LR model on all three datasets indicates a nonlinear relationship between the annealing process and the thin film properties. As shown in Figures 3(b), 3(e) and 3(h), the SVR model obtains a better fit than the LR model on the layer thickness and O/Hf ratio datasets, but it still does not perform well enough on the refractive index dataset. As shown in Figures 3(c), 3(f) and 3(i), the predicted and measured values of most samples are in good agreement, particularly for the refractive index dataset, indicating that the THL-BPNN model has a high accuracy in modeling and predicting the relationship between the annealing process parameters and HfO2 thin film properties.
Table 3 lists the specific performance of all models on the training and validation sets. The THL-BPNN model performs best among the three regression models, with $R^2$ values not lower than 0.90 for the refractive index, layer thickness and O/Hf ratio datasets. High $R^2$ values and low RMSE values indicate that the THL-BPNN model can capture the patterns and extend them to unknown data. In short, the THL-BPNN model shows good stability in constructing the relationship between the annealing process and HfO2 thin film properties under several conditions.
3.3 Evaluation of the THL-BPNN model for other thin film applications
3.3.1 Prediction of the LIDT of PEALD-HfO2 and PEALD-SiO2 thin films
The LIDT value is a key specification for thin films used in laser systems[ Reference Pu, Liu, Wang, Pan, Chen and Liu 39 , Reference Du, Zhu, Shi, Liu, Sun, Yi and Shao 40 ]. Firstly, we analyzed the main factors affecting the LIDT. According to Ref. [Reference Lin, Zhu, Song, Liu, Yin, Zeng and Shao26], the main factors affecting the LIDT of HfO2 thin films are the C impurity content, N impurity content, absorption and O/Hf ratio. Pearson’s correlation coefficient was used to further analyze the correlation between the main influencing factors and the LIDT. The results shown in Figure 4 indicate that, except for the O/Hf ratio, which is positively correlated with the LIDT, all other parameters are negatively correlated with the LIDT. The change in the C and N impurity contents can be represented by the total impurity content. Likewise, for SiO2 thin films, factors affecting the LIDT include the total impurity contents, absorption and O/Si ratio. Then, we applied the THL-BPNN to the quantitative prediction of the LIDT based on these factors. The total impurity contents, absorption, stoichiometric ratio and type of thin film were fed into the THL-BPNN as input variables, and the LIDT was derived as the output variable.
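Pearson's correlation coefficient, used above to screen the factors affecting the LIDT, can be computed directly from its definition. The sketch below is a plain-Python illustration; the two short series are made up to show the sign convention and are not measurements from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

factor = [1.0, 2.0, 3.0, 4.0]                        # made-up factor values
rising = pearson(factor, [2.0, 4.0, 6.0, 8.0])       # moves with the factor
falling = pearson(factor, [8.0, 6.0, 4.0, 2.0])      # moves against the factor
```

A coefficient near +1 corresponds to the behavior reported for the O/Hf ratio, and a coefficient near -1 to that reported for the impurity contents and absorption. The coefficient captures only linear association, which is one reason a nonlinear model is still needed for quantitative prediction.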
The predicted and measured LIDT values of each sample are shown in Figure 5. The THL-BPNN model performs well on both the training and validation sets, with high accuracy and a prediction error smaller than the relative error of the LIDT measurement. The relative error of the damage probability is about ±15%, mainly due to uncertainties arising from the nonuniformity among the samples (3%), the measurement of the laser spot area (5%) and the fluctuation of the laser energy (5%)[ Reference Liu, Wei, Wu, Yu, Cui, Yi and Shao 41 ]. For the training set and validation set, the $R^2$ values are 1.00 and 0.97, respectively, and the RMSE values are 0.48 and 2.32, respectively. The results show that the THL-BPNN model is effective for predicting LIDT values of HfO2 and SiO2 thin films.
3.3.2 Prediction of other properties of PEALD-SiO2 thin films
SiO2 is the most common low-refractive-index material used for laser thin films in the ultraviolet to near-infrared wavelength region. It is of great significance to study the correlation between the properties of SiO2 thin films and the deposition parameters. Therefore, we applied the THL-BPNN model to evaluate the relationship between the deposition parameters and the properties of PEALD-SiO2 thin films. Figure 6 shows the excellent performance of the THL-BPNN model in predicting the properties of PEALD-SiO2 thin films on the validation set, including the refractive index, layer thickness and O/Si ratio. For most samples, the prediction deviation was smaller than the measurement error.
Table 4 lists the $R^2$, AA and RMSE values of the THL-BPNN model for SiO2 thin film properties. Except for the average $R^2$ value of the O/Si ratio on the training set, which is 0.81, the other values, including the average $R^2$ values of the refractive index and layer thickness on the training set and the AA values of the three properties on the validation set, are higher than 0.98. Although the THL-BPNN model did not perform sufficiently well on the O/Si ratio training set, it still provided accurate predictions on the corresponding validation set. This could be attributed to the successful learning of correlations by the THL-BPNN model through training. Therefore, the THL-BPNN model can be used to construct the relationship between the deposition parameters and PEALD-SiO2 thin film properties, thus proving the universality of the THL-BPNN model in studying the nonlinear relationship between the process parameters and thin film properties.
4 Conclusions
In this study, BPNN models with different numbers of hidden layers were used to establish the correlation between the properties of PEALD-HfO2 thin films and annealing parameters. For modeling, the annealing parameters, including the annealing atmosphere and temperature, were used as inputs, and measured thin film properties, including the refractive index, layer thickness and O/Hf ratio, were used as outputs. The data were split into two categories: a training set and a validation set. Firstly, BPNN models with different numbers of hidden layers were compared. The results demonstrated that as the number of hidden layers was increased to achieve higher accuracy on the training sets, the risk of overfitting also increased. Considering the fitting accuracy and model stability, the THL-BPNN model was adopted in a follow-up study. The performance of the THL-BPNN model was then compared with that of the LR and SVR models. The poor performance of the LR model on most datasets indicated that the effect of the two input features on the dependent output variable was nonlinear. The THL-BPNN model achieved a high accuracy of not less than 0.90 on all training and validation datasets, confirming that the THL-BPNN model outperforms the SVR model, which also belongs to the category of nonlinear regression fitting. Finally, the THL-BPNN model was used to predict the LIDT of PEALD-HfO2 and PEALD-SiO2 thin films, and the mapping relationship between deposition parameters and PEALD-SiO2 thin film properties was constructed. The modeling results showed that the predicted values are consistent with the measured values, proving that the THL-BPNN model is a reliable predictive learning-based model. We believe that the THL-BPNN model can be used to predict the properties of different types of thin films, thereby reducing the experimental cost of process optimization.
Acknowledgements
The authors express their appreciation to Wenyun Du and Zesheng Lin for their fruitful discussions. This work was supported by the Program of Shanghai Academic Research Leader (No. 23XD1424100), the CAS Project for Young Scientists in Basic Research (No. YSBR-081), the National Natural Science Foundation of China (No. 61975215) and the Science and Technology Planning Project of the Shanghai Municipal Science & Technology Commission (No. 21DZ1100400).