1. Introduction
Information theory is one of the most important branches of science and engineering and has attracted the attention of numerous researchers over the past seven decades. Within it, several information-theoretic divergence measures between two probabilistic models have been introduced and subsequently used in many fields, including statistics, engineering and physics. Among the most important information divergence measures are the Kullback–Leibler and chi-square divergence measures. These two information quantities have found many key applications in information theory, economics, statistics, physics and electrical engineering. In the literature, several extensions of the Kullback–Leibler and chi-square divergence measures have appeared during the last three decades. For pertinent details, one may refer to [5, 7, 11, 18].
The chi-square divergence has several extensions, such as the symmetric chi-square, triangular divergence, generalized chi-square and Balakrishnan and Sanghvi divergence measures. Each of these measures has its own properties and applications in different fields.
In this work, we first consider the chi-square ($\chi^2$) and generalized chi-square ($\chi_{\alpha}^2$) divergence measures and then propose the relative-$\chi_{\alpha}^2$ and two Jensen versions of $\chi_{\alpha}^2$ (Jensen-$\chi_{\alpha}^2$ and (p, w)-Jensen-$\chi_{\alpha}^2$) divergence measures. We further examine possible connections between the proposed information measures and also discuss some of their potential applications.
The proposed relative-$\chi_{\alpha}^2$ divergence, $D_{\alpha}^{\psi}(\,f:g)$, provides a measure of the difference between two probability distributions, f and g, weighted by a density function $\psi(x)$. The weight density function $\psi(x)$ allows the divergence to be tailored to specific features and characteristics of the data for the two models that are being compared.
The parameter α controls the sensitivity of the divergence to differences between f and g. For example, when α = 1, the divergence reduces to the $L_2$ distance, which measures the difference between f and g in terms of their squared deviations. When α = 0, the divergence measure reduces to half of the chi-square divergence measure. The weight density function $\psi(x)$ can be chosen to emphasize or de-emphasize certain regions of the data. For example, a weight function that down-weights the tails of the distributions could be used to make the divergence more robust to outliers. Alternatively, a weight function that emphasizes a particular region of the data could be used to highlight differences in that region.
Overall, the choice of α and the weight density function $\psi(x)$ can be tailored to suit the specific characteristics and features of the data for the two models that are being compared, allowing for greater sensitivity and flexibility in the comparison process. The $D_{\alpha}^{\psi}(\,f:g)$ measure has potential uses in various fields, as listed below:
• Statistics: It can be used in goodness-of-fit tests and model selection criteria, for example, chi-square divergence (α = 0) is commonly used in contingency table analysis.
• Machine learning: The proposed divergence measure can be used as a divergence measure in machine learning algorithms, such as clustering, classification and anomaly detection.
• Information theory: The proposed divergence can be used to measure the difference between probability distributions and to quantify the amount of information gained or lost in a data compression or transmission process.
• Signal processing: The $D_{\alpha}^{\psi}(\,f:g)$ divergence measure can be used to compare signal strengths in signal processing applications.
• Image processing: The proposed $D_{\alpha}^{\psi}(\,f:g)$ divergence measure can be used to compare image histograms and textures in image processing applications.
One of the main motivations behind the development of $D_{{\alpha}}^{\psi}(\,f:g)$ divergence is that it encompasses several popular divergence measures as special cases, including the symmetric chi-square, triangular divergence, generalized chi-square, and Balakrishnan and Sanghvi divergence measures. This property makes the $D_{{\alpha}}^{\psi}(\,f:g)$ divergence measure a versatile tool for comparing probability distributions in a variety of fields and facilitates the integration of different divergence measures into a unified framework.
Furthermore, it should also be noted that the proposed Jensen-$\chi_{\alpha}^2$ and (p, w)-Jensen-$\chi_{\alpha}^2$ divergence measures are extensions of $D_{{\alpha}}^{\psi}(\,f:g)$ measure based on a convex combination. These extensions allow for the incorporation of additional divergence measures into the framework, further increasing the flexibility and applicability of the method. By combining different divergence measures in a convex form, these Jensen-type divergence measures can provide a more comprehensive and nuanced comparison of probability distributions.
In addition, in this paper, we also establish a new generalized mixture density and specifically show that the proposed model provides optimal information under three different optimization problems associated with $\chi_{\alpha}^2$ divergence measure. Moreover, some results on these information measures and their connections to other well-known information measures are also provided.
First, a divergence measure between two density functions f and g on common support ${\cal X}$, known as the chi-square divergence, is defined as
Similarly, we can define $\chi^{2}(g:f).$
A generalized version of the $\chi^2$ divergence measure, denoted by $\chi_{\alpha}^2$, between two densities f and g, for $\alpha\geq 0$, considered by Basu et al. [5], is defined as
Balakrishnan and Sanghvi [4] introduced another version of the chi-square divergence in Eq. (1.1) as
where E denotes expectation taken with respect to density f on support ${\cal X}$, assuming it exists. This information measure is known as Balakrishnan–Sanghvi divergence measure.
Moreover, a symmetric version of chi-square divergence measure of the form
has been introduced by Le Cam [12]. Here, E denotes expectation under mixture density $h(x)=\frac{f(x)+g(x)}{2}$. The divergence measure in Eq. (1.4) is known as triangular divergence measure. Throughout this paper, we will suppress ${\cal X}$ in the integration with respect to X, unless a distinction becomes necessary.
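As a small illustration, the following Python sketch numerically evaluates the triangular divergence between two normal densities and checks its symmetry in f and g. Since Eq. (1.4) is not reproduced above, the sketch uses the usual Le Cam form $\chi_{T}^2(\,f:g)=\int \frac{(\,f(x)-g(x))^2}{f(x)+g(x)}\,{\rm d}x$, which should be checked against the paper's display.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def triangular_divergence(f, g, lo=-30.0, hi=30.0):
    """Le Cam (triangular) divergence, standard form:
    chi_T^2(f:g) = int (f(x) - g(x))^2 / (f(x) + g(x)) dx."""
    integrand = lambda x: (f(x) - g(x)) ** 2 / (f(x) + g(x))
    value, _ = quad(integrand, lo, hi)
    return value

f = norm(loc=0.0, scale=1.0).pdf   # density f
g = norm(loc=1.0, scale=2.0).pdf   # density g

# The triangular divergence is symmetric in its two arguments.
print(triangular_divergence(f, g), triangular_divergence(g, f))
```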
The rest of this paper is organized as follows. In Section 2, we first examine the connection between $\chi_{\alpha}^2$ divergence measure and q-Fisher information measure. Here, based on the $\chi_{\alpha}^2$ divergence measure, we introduce a relative-$\chi_{\alpha}^2$ divergence measure, which includes other well-known versions of chi-square divergence as special cases. We propose Jensen-$\chi_{\alpha}^2$ divergence measure in Section 3. We then show that Jensen-$\chi_{\alpha}^2$ divergence is a mixture of the proposed relative-$\chi_{\alpha}^2$ divergence measures. Further, we show that a lower bound for Jensen-$\chi_{\alpha}^2$ divergence can be given by Jensen–Shannon entropy measure. In Section 4, we first introduce (p, w)-Jensen-$\chi_{\alpha}^2$ divergence measure and then discuss some of its properties. Next, the relative-$\chi_{\alpha}^2$ divergence measure of escort and arithmetic densities are studied in Section 5. We then introduce $(p,\eta)$-mixture density in Section 6 and show that this mixture distribution involves optimal information under three different optimization problems associated with $\chi_{\alpha}^2$ divergence measure. In Section 7, we study the relative-$\chi_{\alpha}^2$ divergence measure of order statistics and mixed reliability systems. Next, in Section 8, we use a real example in image processing and present some numerical results in this regard in terms of Jensen-$\chi_{\alpha}^2$ divergence measure. We specifically show that this divergence could serve as a useful measure of similarity between two images. Finally, we make some concluding remarks in Section 9.
2. Relative-$\chi_{\alpha}^2$ divergence measure and connection between $\chi_{\alpha}^2$ divergence measure and q-Fisher information
In this section, we first show that the $\chi_{\alpha}^2$ divergence measure in Eq. (1.2) has a close connection to the $q$-Fisher information of the mixing parameter of a given arithmetic mixture distribution. Next, we introduce a relative-$\chi_{\alpha}^2$ divergence measure and show that it includes some of the well-known chi-square-type divergence measures as special cases.
2.1. Connection between $\chi_{\alpha}^2$ divergence measure and q-Fisher information
The q-Fisher information of a density function $f_\theta$ about parameter θ, defined by [14], is given by
where $\log_q (x)$ is the q-logarithmic function defined as
for more details, see [9, 15, 23]. Then, we have the following result.
Theorem 2.1. Let $f_1$ and $f_2$ be two density functions. Then, the q-information measure of mixing parameter p in the two-component mixture model
is given by
where $M_{\frac{1}{2}}(\cdot,\cdot)$ is the power mean with exponent $\frac{1}{2}$, defined as $M_{\frac{1}{2}}(x,y)=\left(\frac{x^{\frac{1}{2}}}{2}+\frac{y^{\frac{1}{2}}}{2}\right)^{2}$ for positive x and y.
Proof. From the mixture model in Eq. (2.3), we readily see that
Now, from the definition of q-Fisher information measure in Eq. (2.1), we find
which readily yields
as required.
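The q-logarithm appearing in Eq. (2.2) and used above is taken here in the standard Tsallis form $\log_q(x)=\frac{x^{1-q}-1}{1-q}$ for $q\neq 1$, which reduces to the natural logarithm as $q\to 1$; this convention is an assumption to be checked against Eq. (2.2). A minimal sketch:

```python
import numpy as np

def q_log(x, q):
    """Standard Tsallis q-logarithm: (x^(1-q) - 1)/(1 - q) for q != 1,
    and the natural logarithm in the limit q -> 1."""
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

x = np.array([0.5, 1.0, 2.0, 5.0])
print(q_log(x, q=0.999))  # close to np.log(x) for q near 1
print(np.log(x))
```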
2.2. Relative-$\chi_{\alpha}^2$ divergence measure
In this subsection, we introduce a relative-$\chi_{\alpha}^2$ divergence measure and show that it includes some of the well-known chi-square-type divergence measures as special cases. Further, we show that the special case of the proposed measure, when α = 0, is connected to the variance of density ratios.
Definition 2.2. Let f and g be two density functions on support ${\cal X}$. Then, a relative version of $\chi_{\alpha}^2$ divergence measure between f and g with respect to density function ψ on support ${\cal X}$, denoted by R-$\chi_{\alpha}^2$, for $\alpha\geq 0$, is defined as
provided the involved integral exists. In addition, the special case of R-$\chi_{\alpha}^2$ divergence measure, when α = 0, is of the form
Moreover, it is useful to note that $D_{{\alpha}}^{\psi}(\,f:g)$ reduces to $\chi_{\alpha}^2(\,f:g)$ when $\psi=f.$ It is easily seen from Eq. (2.6) that the R-$\chi_{\alpha}^2$ divergence measure can be expressed in terms of two expectations under the densities f and g as
From the definition of $D_{{\alpha}}^{\psi}(\,f:g)$, the weight density function, $\psi(x)$, can be utilized to assign varying degrees of importance to different regions of the dataset. For instance, a weight function that places less emphasis on extreme values can be employed to make the divergence measure more robust to outliers. On the other hand, a weight function that highlights a specific region of the data can be used to detect dissimilarities within that region of the data.
In general, $D_{{\alpha}}^{\psi}(\,f:g)$ divergence provides a flexible and powerful framework for assessing the differences between probability distributions in a wide range of applications. The parameters α and $\psi(x)$ can be adjusted to suit the specific characteristics and features of the data for the two models that are being compared, offering greater sensitivity and flexibility in the comparison process.
Remark 2.3. The relative-$\chi_{\alpha}^2$ divergence measure in Eq. (2.6) includes several well-known divergence measures as special cases:
(i) If α = 1, then $D_{\alpha=1}^{\psi}(\,f:g)=L_2(\,f:g)=\int \big(\,f(x)-g(x)\big)^2{\rm d} x$.
(ii) If $\psi(x)=f(x)$, then $D_{\alpha=0}^{\psi}(\,f:g)={\chi_{0}^2(\,f,g)}=\frac{\chi^2(\,f,g)}{2}$.
(iii) If $\psi(x)=g(x)$, then $D_{\alpha=0}^{\psi}(\,f:g)={\chi_{0}^2(g,f)}=\frac{\chi^2(g,\,f)}{2}$.
(iv) If $\psi(x)=p f(x)+(1-p) g(x)$, then $D_{\alpha=0}^{\psi}(\,f:g)=\frac{1}{2(1-p)^2}\chi^2\big(\psi:f\big)=\frac{1}{2p^2}\chi^2(\psi:g)$.
(v) If $\psi(x)= \frac{f(x)+g(x)}{2}$, then $D_{\alpha=0}^{\psi}(\,f:g)=\chi_{T}^2(\,f:g)$, where $\chi_{T}^2(\,f:g)$ is the triangular divergence defined in Eq. (1.4).
(vi) If $\psi(x)= \frac{f(x)+g(x)}{2}$, then $D_{\alpha=0}^{\psi}(\,f:g)= \chi_{\rm BS}^2(\,f:g)+\chi_{\rm BS}^2(g:f)$, where $\chi_{\rm BS}^2(\,f:g)$ is the Balakrishnan–Sanghvi divergence measure defined in Eq. (1.3).
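As a numerical illustration of case (i), the following sketch computes the $L_2$ distance between two equal-variance normal densities and compares it with the standard Gaussian closed form $\int(\,f-g)^2\,{\rm d}x=\frac{1}{\sigma\sqrt{\pi}}\big(1-{\rm e}^{-(\mu_1-\mu_2)^2/(4\sigma^2)}\big)$; this closed form is a well-known identity and is not taken from the paper.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu1, mu2, sigma = 0.0, 1.5, 1.0
f = norm(mu1, sigma).pdf
g = norm(mu2, sigma).pdf

# Case (i): alpha = 1 gives the L2 distance between the two densities.
l2_numeric, _ = quad(lambda x: (f(x) - g(x)) ** 2, -20, 20)

# Closed form for two equal-variance normal densities (standard Gaussian identity).
l2_closed = (1.0 / (sigma * np.sqrt(np.pi))) * (1.0 - np.exp(-(mu1 - mu2) ** 2 / (4 * sigma ** 2)))

print(l2_numeric, l2_closed)  # the two values agree to numerical precision
```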
Theorem 2.4. Let ψ be a density function. Then, $D_{\alpha=0}^{\psi}(\,f:g)$ divergence measure in Eq. (2.7) can be expressed as
Proof. From the definition of $D_{\alpha=0}^{\psi}(\,f:g)$, we have
as required.
3. Jensen-$\chi_{\alpha}^2$ divergence measure
In this section, we first introduce Jensen-$\chi_{\alpha}^2$ divergence measure and then establish some of its properties.
In fact, the Jensen-$\chi_{\alpha}^2$ divergence measure is an extension of $D_{{\alpha}}^{\psi}(\,f:g)$ that is constructed from a convex combination. This extension allows for the incorporation of additional divergence measures into the framework, further increasing the flexibility and applicability of the method.
Definition 3.1. Let $X_{1}, X_{2}$ and Y be random variables with density functions ${f}_{1},{f}_{2}$ and ψ, respectively. Then, the Jensen-$\chi_{\alpha}^2$ (J-$\chi_{\alpha}^2$) divergence measure, for $p\in (0,1)$, is defined as
Lemma 3.2. The J-$\chi_{\alpha}^2$ divergence measure in Eq. (3.1) is non-negative.
Proof. As $\phi(x)=x^2$ is a convex function, by using Jensen’s inequality, we readily find
where the last expression follows from the fact that
Theorem 3.3. A representation for ${\cal {J}}_{\alpha=0}^{\psi}\big({f}_{1},{f}_{2}; {\bf{P}}\big)$, based on variance of the ratio of densities, is given by
Proof. From the definition of ${\cal {J}}_{\alpha=0}^{\psi}\big({f}_{1},{f}_{2}; {\bf{P}}\big)$, we have
as required.
Theorem 3.4. Let the random variables $X_1$ and $X_2$ have density functions $f_1$ and $f_2$, respectively. Then, ${\cal {J}}_{\alpha}^{\psi}\big({f}_{1},{f}_{2}; {\bf{P}}\big)$ measure is a mixture of R-$\chi_{\alpha}^{2}$ divergence measures of the form
where $D_{\alpha}^{\psi}(\,f_i:f_T)$ is the divergence measure in Eq. (2.6), with $f_{T}=pf_1+(1-p)f_2$ being the two-component mixture density.
Proof. With $f_{T}=pf_1+(1-p)f_2$, we first find
On the other hand, with $k=pD_{\alpha}^{\psi}(\,f_1:f_T)+(1-p)D_{\alpha}^{\psi}(\,f_2:f_T)$, we also have
which establishes the required result.
Theorem 3.5. A connection between ${\cal {J}}_{\alpha=0}^{\psi}\big({f}_{1},{f}_{2}; {\bf{P}}\big)$ with $\psi=\frac{f_1+f_2}{2}$ and Balakrishnan–Sanghvi divergence measure is given by
where $f_{T}=pf_1+(1-p)f_2$ is the two-component mixture density.
Proof. With $f_{T}=pf_1+(1-p)f_2$ and from Part (vi) of Remark 2.3 and Theorem 3.4, we have
as required.
Theorem 3.6. We have
Proof. From the definition ${\cal {J}}_{\alpha}^{\psi}\big({f}_{1},{f}_{2}; {\textbf{P}}\big)$ in Eq. (3.1) and making use of the dominated convergence theorem, we have
as required.
We now extend the definition of Jensen-$\chi_{\alpha}^{2}$ divergence measure in Eq. (3.1) to the case of n + 1 random variables. Let $X_{1},\ldots,X_{n}$ and Y be random variables with density functions ${f}_{1},\ldots,{f}_{n}$ and ψ, respectively, and $p_{1},\ldots,p_{n}$ be non-negative real numbers such that $\sum_{i=1}^{n}p_{i}=1$. Then, the Jensen-$\chi_{\alpha}^{2}$ measure is defined as
The special case of Jensen-$\chi_{\alpha}^2$ divergence measure, when α = 0, has the representation
Corollary 3.7. The ${\cal {J}}_{\alpha}^{\psi}\big({f}_{1},\ldots,{f}_{n}; {\textbf{P}}\big)$ measure in Eq. (3.3) is a mixture of $D_{\alpha}^{\psi}$ measures in Eq. (2.6) of the form
Theorem 3.8. The ${\cal {J}}_{\alpha}^{\psi}\big({f}_{1},\ldots,{f}_{n}; {\textbf{P}}\big)$ measure in Eq. (3.3) is a mixture of $D_{\alpha}^{\psi}$ measures in Eq. (2.6) of the form
Proof. From Corollary 3.7 and making use of the identity ([21], pp. 95–96)
we obtain
as required.
Theorem 3.9. Let $f_i\geq \frac{\psi^{1-\alpha}}{2}, i=1,\ldots,n$. Then, a lower bound for ${\cal {J}}_{\alpha}^{\psi}\big({f}_{1},\ldots,{f}_{n}; {\textbf{P}}\big)$ is given by
where $JS_{{\textbf{P}}}(\,f_1,\ldots,f_n)$ is the Jensen–Shannon entropy; see [13].
Proof. From the assumption, Theorem 3.8 and by making use of the identity
and then setting $w_i=p_i$, $w_j=p_j$, $x_i=f_i(x)$, $x_j=f_j(x)$ and $\bar{x}_w=\sum_{i=1}^{n}p_i f_i(x)$ , we find
where the second inequality follows from the fact that $\log(x) \leq x-1$ for $x \gt 0$, and the last inequality follows from [3].
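For reference, the Jensen–Shannon entropy of Lin [13] appearing in the above lower bound is $JS_{\textbf{P}}(\,f_1,\ldots,f_n)=H\big(\sum_{i=1}^{n}p_i f_i\big)-\sum_{i=1}^{n}p_i H(\,f_i)$, with H the Shannon (differential) entropy. A minimal Python sketch for two normal densities:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def diff_entropy(f, lo=-30.0, hi=30.0):
    """Differential Shannon entropy H(f) = -int f(x) log f(x) dx (natural log)."""
    integrand = lambda x: (-f(x) * np.log(f(x)) if f(x) > 0 else 0.0)
    value, _ = quad(integrand, lo, hi)
    return value

def jensen_shannon(f1, f2, p):
    """Lin's Jensen-Shannon entropy: H(p f1 + (1-p) f2) - p H(f1) - (1-p) H(f2)."""
    mix = lambda x: p * f1(x) + (1 - p) * f2(x)
    return diff_entropy(mix) - p * diff_entropy(f1) - (1 - p) * diff_entropy(f2)

f1 = norm(0.0, 1.0).pdf
f2 = norm(2.0, 1.0).pdf
print(jensen_shannon(f1, f2, p=0.4))  # non-negative, by concavity of the entropy
```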
4. (p, w)-Jensen-$\chi_{\alpha}^2$ divergence measure
In this section, we first review the definition of (p, w)-Jensen–Shannon divergence measure. Then, we introduce (p, w)-Jensen-$\chi_{\alpha}^2$ divergence measure in a way similar to (p, w)-Jensen–Shannon divergence. Furthermore, we establish some results for this extended divergence measure. Let f and g be two density functions. Then, the Kullback–Leibler divergence between f and g is defined as
where log denotes the natural logarithm. The (p, w)-Jensen–Shannon divergence between two density functions $f_1$ and $f_2$, for w and p $\in (0,1)$, is defined as
where ${\bar{s}}=wp+(1-w)(1-p)$. For more details, one may refer to [16, 17].
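Since the displayed equation for the Kullback–Leibler divergence is not reproduced above, the following sketch uses its standard form $KL(\,f:g)=\int f(x)\log\frac{f(x)}{g(x)}\,{\rm d}x$ and also computes the mixing weight ${\bar{s}}=wp+(1-w)(1-p)$ used in the (p, w)-Jensen–Shannon construction.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def kl_divergence(f, g, lo=-20.0, hi=20.0):
    """Standard Kullback-Leibler divergence KL(f:g) = int f(x) log(f(x)/g(x)) dx."""
    integrand = lambda x: f(x) * np.log(f(x) / g(x))
    value, _ = quad(integrand, lo, hi)
    return value

f1 = norm(0.0, 1.0).pdf
f2 = norm(1.0, 1.5).pdf
print(kl_divergence(f1, f2))

# Mixing weight used in the (p, w)-Jensen-Shannon construction.
p, w = 0.3, 0.6
s_bar = w * p + (1 - w) * (1 - p)
print(s_bar)
```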
Definition 4.1. Let $X_{1}, X_{2}$ and Y be random variables with density functions ${f}_{1},{f}_{2}$ and ψ, respectively. Then, the (p, w)-Jensen-$\chi_{\alpha}^2$ divergence measure, for w and p $\in (0,1)$, is defined as
where ${\bar{s}}=wp+(1-w)(1-p)$.
Theorem 4.2. Let the random variables $X_1$ and $X_2$ have density functions $f_1$ and $f_2$, respectively. Then, ${\cal {J}}_{\alpha}^\psi\big({f}_{1},{f}_{2};{\textbf{P}},\textbf{w}\big)$ is a mixture of relative measures in Eq. (2.6) of the form
where $f_{{\bar{s}}}^{T}=(1-{\bar{s}})f_1+{\bar{s}}f_2$ is the two-component mixture density.
Proof. With $f_{{\bar{s}}}^{T}=(1-{\bar{s}})f_1+{\bar{s}}f_2$, we find
On the other hand, letting
and using the fact that
we find
Now, from the above results, we have
which establishes the required result.
From Definitions 3.1 and 4.1, we readily have the following Corollary.
Corollary 4.3. A connection between ${\cal {J}}_{\alpha}^\psi\big({f}_{1},{f}_{2};{\textbf{P}},\textbf{w}\big)$ and ${\cal {J}}_{\alpha}^\psi\big({f}_{1},{f}_{2};\textbf{w}\big)$ measures is given by
Theorem 4.4. We have
(i)
\begin{eqnarray*} \frac{-1}{2}\,\frac{\partial^2}{\partial w^2}{\cal {J}}_{\alpha}^\psi\big({f}_{1},{f}_{2};{\textbf{P}},\textbf{w}\big)= D_{\alpha}^{\psi}\left((1-p){f}_{1}+pf_2:p{f}_{1}+(1-p)f_2\right); \end{eqnarray*}
(ii)
\begin{eqnarray*} \frac{-1}{2}\,\frac{\partial^2}{\partial p^2}{\cal {J}}_{\alpha}^\psi\big({f}_{1},{f}_{2};{\bf{P}},\bf{w}\big)= D_{\alpha}^{\psi}\left((1-w){f}_{1}+wf_2:w{f}_{1}+(1-w)f_2\right)-D_{\alpha}^{\psi}\big(\,f_1:f_2\big). \end{eqnarray*}
Proof. From Theorem 3.6 and Corollary 4.3, we have
which proves Part (i). From Corollary 4.3 and using the facts that
and
we find
which proves Part (ii). Hence, the theorem.
5. $D_{\alpha}^{\psi}$ divergence measure of escort and arithmetic mixture densities
In this section, we examine $D_{\alpha}^{\psi}$ divergence measure of escort and arithmetic mixture densities.
5.1. $D_{\alpha}^{\psi}$ divergence measure of escort and generalized escort densities
The escort distribution is a key concept in nonextensive statistical mechanics and coding theory and is closely associated with Tsallis and Rényi entropy measures. Bercher [6] studied some connections between coding theory and the measure of complexity in nonextensive statistical mechanics in terms of escort distributions.
Let f be a density function. Then, the escort density with order η > 0, associated with f, is defined as
Theorem 5.1. Let f and g be two density functions and $f_{\eta}$ be the escort density corresponding to f. Then, for $0\leq\eta\leq1$ and $\psi(x)=f_{\eta}(x)$, we have
where $\beta=1-\eta(1-\alpha)$ and $G_{\eta}(\,f)$ is the information generating function of density f with order η defined as
Proof. From the definition of $D_{\alpha}^{\psi}(\,f:g)$ and the assumption that $\psi(x)=f_{\eta}(x)$, we have
where $\beta=1-\eta(1-\alpha),$ as desired.
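The escort density and the information generating function used in Theorem 5.1 are taken here in their standard forms, $f_{\eta}(x)=\frac{f^{\eta}(x)}{\int f^{\eta}(t)\,{\rm d}t}$ and $G_{\eta}(\,f)=\int f^{\eta}(x)\,{\rm d}x$; since the displayed definitions are not reproduced above, these conventions are assumptions. A small numerical sketch for an exponential density:

```python
import numpy as np
from scipy.stats import expon
from scipy.integrate import quad

def info_generating(f, eta, lo=0.0, hi=60.0):
    """Information generating function G_eta(f) = int f(x)^eta dx (standard form)."""
    value, _ = quad(lambda x: f(x) ** eta, lo, hi)
    return value

def escort_density(f, eta, lo=0.0, hi=60.0):
    """Escort density f_eta(x) = f(x)^eta / G_eta(f) (standard form)."""
    g_eta = info_generating(f, eta, lo, hi)
    return lambda x: f(x) ** eta / g_eta

f = expon(scale=2.0).pdf          # exponential density with mean 2
f_escort = escort_density(f, eta=0.5)

# The escort density integrates to one.
total, _ = quad(f_escort, 0.0, 60.0)
print(total)
```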
Next, let f and g be two density functions. Then, the generalized escort density, for $1 \gt \eta \gt 0$, is defined as
Let $\psi(x)=h_{\eta}(x)$. We then have
where $R_{\eta}(\,f:g)$ is the relative information-generating function between density functions f and g defined as
Theorem 5.2. A lower bound for $D_{\alpha=0}^{\psi}(\,f:g)$ in Eq. (5.5) is given by
where $f_\eta=\eta f+(1-\eta)g$ is the two-component mixture density, $\chi^2(.:.)$ is the chi-square divergence, and $ R_{\eta}(\,f:g)$ is as defined in Eq. (5.6).
Proof. From the definition of $D_{\alpha}^{\psi}(\,f:g)$ and the assumption that
for $0\leq \eta \leq1$, and using the geometric mean-arithmetic mean inequality between densities f and g given by
and the fact that $g(x)-f(x)=\frac{1}{1-\eta}(\,f_\eta(x)-f(x))$, we obtain
as required.
5.2. $D_{\alpha}^{\psi}$ divergence measure between two arithmetic mixture densities
In this subsection, we study $D_{\alpha}^{\psi}$ divergence measure between two arithmetic mixture densities. Consider two mixture density functions $f_m(x)=\sum_{i=1}^{n} p_i f_i(x)$ and $g_m(x)=\sum_{i=1}^{n} p_i g_i(x)$. Then, we have
Theorem 5.3. Let $f_1,\ldots,f_n$ be n density functions. Now, consider the probability mixing vector ${\textbf{P}}=(p_1,\ldots,p_n)$ and its corresponding negation probability vector
Then, we have the lower bound for $D_{\alpha=0}^{\psi}$ as
where $L=\sum_{1\leq i \lt j\leq n}\frac{(np_i-1)(np_j-1)}{(n-1)^2}\int\frac{f_i(x)f_j(x)}{\psi^2(x)}{\rm d}x.$ For more details about negation probability, see [22].
Proof. From the definition of $D_{\alpha=0}^{\psi}$ divergence measure between mixture densities $\sum_{i=1}^{n}p_i\,f_i$ and $\sum_{i=1}^{n}\bar{p}_i f_i$ and upon setting
we find
where the last inequality follows from the inequality between Kullback–Leibler and chi-square divergence measures. This proves the required result.
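The negation probability vector used in Theorem 5.3 is taken here in the usual form $\bar{p}_i=\frac{1-p_i}{n-1}$ (an assumption, since the displayed definition is not reproduced above); it is consistent with the factors $\frac{np_i-1}{n-1}=p_i-\bar{p}_i$ appearing in L. A quick sketch:

```python
import numpy as np

def negation(p):
    """Negation of a probability vector: bar(p)_i = (1 - p_i) / (n - 1) (usual form)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    return (1.0 - p) / (n - 1)

p = np.array([0.5, 0.3, 0.2])
p_bar = negation(p)

print(p_bar, p_bar.sum())               # still a probability vector (sums to one)
print((p.size * p - 1) / (p.size - 1))  # equals p - p_bar, the factors appearing in L
```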
6. Optimal information under $\chi_{\alpha}^2$ divergence measure
In this section, we first introduce the $(p,\eta)$-mixture density as a generalization of the arithmetic and harmonic mixture densities. Then, we examine the optimal information property of the $(p,\eta)$-mixture density. To this end, we consider optimization problems for the $\chi_{\alpha}^2$ divergence under three types of constraints. For more details about optimal information properties of some mixture distributions (arithmetic, geometric and $\alpha$-mixture distributions), one may refer to [2] and the references therein.
6.1. $(p,\eta)$-mixture density
Definition 6.1. Let $f_0$ and $f_1$ be two density functions. Then, a generalized mixture density, called the $(p,\eta)$-mixture density, is defined as
The $(p,\eta)$-mixture density provides arithmetic and harmonic mixture densities as special cases:
(i) If p = 0, then $f_m(x)=f_1(x)$.
(ii) If p = 1, then $f_m(x)=f_0(x)$.
(iii) If η = 1, then $f_m(x)=pf_0(x)+(1-p)f_1(x)$ is the arithmetic mixture density.
(iv) If η = 0, then $f_m(x)=\frac{{\big(\frac{p}{{f}_0(x)}+\frac{1-p}{{f}_1(x)}\big)^{-1}}}{\int \big(\frac{p}{{f}_0(x)}+\frac{1-p}{{f}_1(x)}\big)^{-1} {\rm d}x}$ is the harmonic mixture density.
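As an illustration of case (iv) above, the following sketch constructs the harmonic mixture density by numerically normalizing $\big(\frac{p}{f_0(x)}+\frac{1-p}{f_1(x)}\big)^{-1}$ for two exponential densities; it is a direct numerical transcription of the formula displayed in case (iv).

```python
import numpy as np
from scipy.stats import expon
from scipy.integrate import quad

def harmonic_mixture(f0, f1, p, lo=0.0, hi=80.0):
    """Harmonic mixture density: normalized (p/f0 + (1-p)/f1)^(-1), as in case (iv)."""
    unnormalized = lambda x: 1.0 / (p / f0(x) + (1 - p) / f1(x))
    const, _ = quad(unnormalized, lo, hi)
    return lambda x: unnormalized(x) / const

f0 = expon(scale=1.0).pdf
f1 = expon(scale=3.0).pdf
fm = harmonic_mixture(f0, f1, p=0.4)

total, _ = quad(fm, 0.0, 80.0)
print(total)  # integrates to one by construction
```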
6.2. Optimal information property of $(p,\eta)$-mixture density
Theorem 6.2. Let f, $f_0$ and $f_1$ be three density functions. Then, the solution to the optimization problem
is the $(p,\eta)$-mixture density with $\eta=\alpha$ and mixing parameter $p=\frac{1}{1+\lambda_0}$, where $\lambda_0 \gt 0$ is the Lagrangian multiplier.
Proof. We use the Lagrangian multiplier technique for finding the solution of the optimization problem in Eq. (6.1). Thus, we have
Now, differentiating with respect to f, we obtain
Setting Eq. (6.2) to zero, we get the optimal density function to be
where $p=\frac{1}{1+\lambda_0}$, as required.
Theorem 6.3. Let f, $f_0$ and $f_1$ be three density functions. Then, the solution to the optimization problem,
is the $(p,\eta)$-mixture density with mixing parameter p = w.
Proof. Making use of the Lagrangian multiplier technique in the same way as in the proof of Theorem 6.2, the required result is obtained.
Theorem 6.4. Let f, $f_0$ and $f_1$ be three density functions and $T_\alpha(X)=\frac{{f}(X)}{{f}_{1}^{1-\alpha}(X)}$. Then, the solution to the optimization problem,
is the $(p,\eta)$-mixture density with mixing parameter $p=\frac{1}{1+\lambda_0}$, where $\lambda_0 \gt 0$ is the Lagrangian multiplier.
Proof. Making use of the Lagrangian multiplier technique in the same way as in the proof of Theorem 6.2, the required result is obtained.
Now, we extend Theorem 6.2 to the case of n + 2 density functions.
Theorem 6.5. Let f, ${f}_0,\ldots,f_n$ be n + 2 density functions. Then, the solution to the optimization problem,
is the extended $(p,\eta)$-mixture density with $\eta=\alpha$ and mixing parameters $p_i=\frac{\lambda_i}{1+\sum_{j=0}^{n-1}\lambda_j}$, where $\lambda_i \gt 0$, $i=0,\ldots,n$, are the Lagrangian multipliers.
Proof. We use the Lagrangian multiplier technique for finding the solution to the optimization problem in Eq. (6.5). Thus, we have
Now, differentiating with respect to f, we obtain
Setting Eq. (6.6) to zero, we get the optimal density function to be
where
and $p_i=\frac{\lambda_i}{1+\sum_{j=0}^{n-1}\lambda_j}$, as required.
7. Relative-$\chi_{\alpha}^2$ divergence measure of mixed reliability systems
Consider a system with component lifetimes $X_1,\ldots,X_n$, which are independent and identically distributed (i.i.d.) with a common lifetime cumulative distribution function (c.d.f.) F and a probability density function (p.d.f.) f. Then, the system lifetime $T =\phi(X_1,\ldots , X_n)$, where ϕ is referred to as the system’s structure function, is connected to signature vector $\textbf{s}=(s_1,\ldots,s_n)$ through
where $X_{1:n},\ldots,X_{n:n}$ are the order statistics of component lifetimes and $n_i$ is the number of ways that component lifetimes can be arranged such that $T =\phi(X_1,\ldots, X_n)=X_{i:n}$; for more details, see [20]. Then, the reliability function of T can be expressed as a mixture of reliability functions of $X_{i:n}, i=1,\ldots,n$, as
Consequently, the corresponding p.d.f. of T is
where $f_{i:n}$ is the p.d.f. of $X_{i:n}$, given by
see [1].
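A minimal sketch of the mixture representation of the system density, $f_T(x)=\sum_{i=1}^{n}s_i f_{i:n}(x)$, is given below for an exponential baseline and an illustrative signature vector (any probability vector defines a mixed system). The order-statistic density is taken in its standard form $f_{i:n}(x)=\frac{n!}{(i-1)!(n-i)!}F^{i-1}(x)(1-F(x))^{n-i}f(x)$, since the displayed equation is not reproduced above.

```python
import math
from scipy.stats import expon
from scipy.integrate import quad

def order_stat_pdf(i, n, F, f):
    """Standard density of the i-th order statistic from n i.i.d. draws:
    f_{i:n}(x) = n!/((i-1)!(n-i)!) F(x)^(i-1) (1-F(x))^(n-i) f(x)."""
    c = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    return lambda x: c * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * f(x)

n = 3
baseline = expon(scale=1.0)
F, f = baseline.cdf, baseline.pdf

s = [0.0, 2 / 3, 1 / 3]  # illustrative signature vector (any probability vector works)
f_in = [order_stat_pdf(i, n, F, f) for i in range(1, n + 1)]

def f_T(x):
    """Mixed-system lifetime density f_T(x) = sum_i s_i f_{i:n}(x)."""
    return sum(si * fi(x) for si, fi in zip(s, f_in))

total, _ = quad(f_T, 0.0, 60.0)
print(total)  # integrates to one
```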
7.1. $D_{\alpha}^{\psi}$ measure for order statistics
Suppose $X_1,\ldots,X_n$ are i.i.d. variables from an absolutely continuous c.d.f. F and p.d.f. f, and $X_{1:n},\ldots,X_{n:n}$ are the corresponding order statistics.
Theorem 7.1. The $ D_{\alpha}^{\psi}$ divergence measure between densities $f_{i:n}$ and f is given by
where the random variables U and $U_{i:n}$ are uniform and $Beta(i, n-i+1)$ random variables on $(0,1)$ with density functions $f_U$ and $f_{U_{i:n}}$, respectively.
Proof. By using the definition of $D_{\alpha}^{\psi}$ divergence measure and the transformation $u=F(x)$, we obtain
as required.
Corollary 7.2. From Theorem 7.1, we readily deduce the following:
(i) If $\psi(x)=f(x)$, then
\begin{eqnarray*} D_{\alpha}^{\psi}(\,f_{i:n}:f)=\chi_{\alpha}^2(\,f_U:f_{U_{i:n}}). \end{eqnarray*}
(ii) If $\psi(x)=f_{i:n}(x)$, then
\begin{eqnarray*} D_{\alpha}^{\psi}(\,f_{i:n}:f)=\chi_{\alpha}^2(\,f_{U_{i:n}}:f_U). \end{eqnarray*}
From Corollary 7.2, it is immediately seen that under the imposed assumptions, $D_{\alpha}^{\psi}(\,f_{i:n}:f)$ divergence is free of the baseline distribution.
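The distribution-free character rests on the probability integral transform: $F(X_{i:n})$ follows a $Beta(i, n-i+1)$ law irrespective of the baseline F. A small simulation sketch illustrating this fact:

```python
import numpy as np
from scipy.stats import expon, norm, beta, kstest

rng = np.random.default_rng(1)
n, i, m = 5, 2, 20000   # sample size n, order index i, number of replications m

for dist in (expon(scale=2.0), norm(1.0, 3.0)):       # two different baselines
    samples = dist.rvs(size=(m, n), random_state=rng)
    u = dist.cdf(np.sort(samples, axis=1)[:, i - 1])  # F(X_{i:n}) for each replication
    # Kolmogorov-Smirnov check against Beta(i, n-i+1); the p-value should not be
    # small for either baseline, since the Beta law holds exactly.
    print(kstest(u, beta(i, n - i + 1).cdf).pvalue)
```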
Theorem 7.3. The $D_{\alpha}^{\psi}$ divergence measure between two density functions $f_{i:n}$ and $f_{j:n}$ is given by
Proof. By using the definition of $D_{\alpha}^{\psi}$ divergence measure and the transformation $u=F(x)$ in the same way as in the proof of Theorem 7.1, the required result is obtained.
In the special case when $\psi(x)=f_{i:n}(x)$, we find that
7.2. $D_{\alpha}^{\psi}$ measure for mixed systems
In this subsection, we examine the $D_{\alpha}^{\psi}$ divergence measure associated with mixed reliability systems.
Theorem 7.4. If $\psi(x)=f_{i:n}(x)$, then the $D_{\alpha}^{\psi}(\,f_T:f_{i:n})$ divergence measure is given by
Proof. From the assumption that $\psi(x)=f_{i:n}(x)$ and the definition of $D_{\alpha}^{\psi}(\,f_T:f_{i:n})$ measure, and making use of the transformation $u=F(x)$, we have
as required.
Theorem 7.5. Let $T_1$ and $T_2$ be the lifetimes of two mixed systems with signatures s and $\textbf{s}^{\prime}$ consisting of n i.i.d. components having common c.d.f. F and p.d.f. f. Then, if $\psi(x)=f(x)$, we have
where $f_{U_{i:n}}(u)$ is the p.d.f. of a beta distribution with parameters i and $n-i+1$.
Proof. From the assumptions made and using the transformation $u=F(x)$, we have
as required.
8. Application to image processing
In this section, we present an application of the Jensen-$\chi_{\alpha}^2$ measure in the framework of image quality assessment. For pertinent details about image quality assessment, see [10].
Figure 1 shows the original lake image, which consists of $512\times512$ cells, with the gray level of each cell taking a value in the interval $[0,1]$ (0 for black and 1 for white). It depicts the image labeled as X and three adjusted versions of it, labeled as $Y(=X+0.3)$ (increased brightness), $Z(=\sqrt{2\times X})$ (with increased contrast and gamma correction) and $W(=\sqrt{X})$ (gamma corrected). For pertinent details, see the EBImage package in the R software [19].
The extracted histograms with the corresponding empirical densities for images X, Y, Z and W are plotted in Figure 2.
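The adjustments described above are simple pixel-wise transforms. The following numpy sketch reproduces them on a generic gray-level array standing in for the lake image (a synthetic array is used here, and clipping to $[0,1]$ is an assumption about how out-of-range intensities are handled) and extracts the corresponding normalized histograms.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.beta(2.0, 2.0, size=(512, 512))   # stand-in for the 512 x 512 gray-level lake image

# Adjusted versions described in the text (clipping to [0, 1] is an assumption here).
Y = np.clip(X + 0.3, 0.0, 1.0)            # increased brightness
Z = np.clip(np.sqrt(2.0 * X), 0.0, 1.0)   # increased contrast with gamma correction
W = np.sqrt(X)                            # gamma corrected

# Empirical densities of the gray levels (normalized histograms).
bins = np.linspace(0.0, 1.0, 101)
hists = {name: np.histogram(img.ravel(), bins=bins, density=True)[0]
         for name, img in {"X": X, "Y": Y, "Z": Z, "W": W}.items()}
print({name: h[:3] for name, h in hists.items()})
```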
We can see from Figures 1 and 2 that the highest degree of similarity to the original image X is achieved by W, followed by Y, whereas Z has the highest degree of divergence from the original image X.
8.1. Nonparametric estimation of the Jensen-$\chi_{\alpha}^2$ divergence measure
Let $f_1$, $f_2$ and ψ be probability density functions. Suppose we draw independent and identically distributed random samples from each of these distributions, obtaining samples of sizes $n_1$, $n_2$ and $n_\psi$, respectively. Denote the resulting samples by $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ for $f_1$, $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ for $f_2$ and $X_1^{\psi}, \ldots, X_{n_\psi}^{\psi}$ for ψ.
To estimate the underlying probability density functions $f_1$, $f_2$ and ψ using kernel density estimation, we use the following estimators.
Let $\hat{f_1}(x)$ be the kernel density estimate of $f_1$, based on the sample $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$. Then, we have
where $K(\cdot)$ is a kernel function, typically chosen to be a symmetric probability density function, and $h_1$ is a bandwidth parameter that controls the smoothness of the estimate.
Similarly, let $\hat{f_2}(x)$ be the kernel density estimate of $f_2$, based on the sample $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$. Then, we can write
where $h_2$ is a bandwidth parameter for the kernel density estimate of $f_2.$ Finally, let $\hat{\psi}(x)$ be the kernel density estimate of ψ, based on the sample $X_1^{\psi}, \ldots, X_{n_\psi}^{\psi}$. Then, we have
where $h_\psi$ is a bandwidth parameter for the kernel density estimate of ψ.
For more details, see [8].
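A minimal sketch of the Gaussian-kernel estimates described above is given below; the estimated densities $\hat{f}_1$, $\hat{f}_2$ and $\hat{\psi}$ are then plugged into Eq. (3.1), whose integrated form is not reproduced here.

```python
import numpy as np

def gaussian_kde_1d(sample, bandwidth):
    """Kernel density estimate f_hat(x) = (1/(n h)) * sum_i K((x - X_i)/h)
    with the Gaussian kernel K(u) = exp(-u^2/2)/sqrt(2*pi)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    def f_hat(x):
        u = (np.asarray(x, dtype=float)[..., None] - sample) / bandwidth
        return np.exp(-0.5 * u ** 2).sum(axis=-1) / (n * bandwidth * np.sqrt(2 * np.pi))
    return f_hat

rng = np.random.default_rng(2)
sample1 = rng.normal(0.0, 1.0, size=500)    # sample from f1
f1_hat = gaussian_kde_1d(sample1, bandwidth=0.3)

x_grid = np.linspace(-4, 4, 9)
print(f1_hat(x_grid))  # estimated density values on a grid
```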
Using these estimates based on Gaussian kernel, $K(u)=\frac{1}{\sqrt{2\pi}}{\rm e}^{-\frac{u^2}{2}},$ we can compute the integrated nonparametric estimate of the Jensen-$\chi_{\alpha}^2$ measure, for $0 \lt p \lt 1,$ as
We have computed the Jensen-$\chi_{\alpha}^2$ information measure for each pair of adjusted images with respect to the original lake image, and the values are presented in Table 1. The results demonstrate that the Jensen-$\chi_{\alpha}^2$ divergence is an effective measure of similarity between each pair of adjusted images and the reference original image. Specifically, the Jensen-$\chi_{\alpha}^2$ divergence highlights the high degree of similarity between images Y and Z with respect to the original image (X). Furthermore, the results in Table 1 indicate that the comparison of images Z and W with respect to the reference image X results in low similarity. Therefore, the Jensen-$\chi_{\alpha}^2$ information measure can be considered an efficient criterion for comparing the similarity between each pair of adjusted images with respect to the reference image.
9. Concluding remarks
In this paper, by considering the $\chi_{\alpha}^2$ divergence measure, we have proposed the relative-$\chi_{\alpha}^2$, Jensen-$\chi_{\alpha}^2$ and (p, w)-Jensen-$\chi_{\alpha}^2$ divergence measures. We have first shown that the $\chi_{\alpha}^2$ divergence measure has a close relationship with the q-Fisher information of the mixing parameter of an arithmetic mixture distribution. We have then shown that the proposed relative-$\chi_{\alpha}^2$ divergence measure includes some other well-known versions of the chi-square divergence, such as the usual chi-square ($\chi^2$), generalized chi-square ($\chi_{\alpha}^{2}$), triangular and Balakrishnan–Sanghvi divergence measures, all as special cases. We have shown that the Jensen-$\chi_{\alpha}^2$ divergence is a mixture of relative-$\chi_{\alpha}^2$ divergence measures. A lower bound for the Jensen-$\chi_{\alpha}^2$ divergence has been obtained in terms of the Jensen–Shannon entropy measure. We have also introduced the (p, w)-Jensen-$\chi_{\alpha}^2$ divergence measure and have then established some of its properties. Further, we have studied the relative-$\chi_{\alpha}^2$ divergence measure of escort and arithmetic mixture densities. Next, we have introduced the $(p,\eta)$-mixture density, which includes arithmetic-mixture and harmonic-mixture densities as special cases. Interestingly, we have shown that the proposed mixture density possesses optimal information under three different optimization problems associated with the $\chi_{\alpha}^2$ divergence measure. We have also provided a discussion of the relative-$\chi_{\alpha}^2$ divergence measure of order statistics and mixed reliability systems. Finally, we have described an application of the Jensen-$\chi_{\alpha}^2$ measure in image processing.
In summary, in this paper, some extensions of the chi-square divergence measure, such as the relative-$\chi_{\alpha}^2$ ($D_{\alpha}^{\psi}$), Jensen-$\chi_{\alpha}^2$ and (p, w)-Jensen-$\chi_{\alpha}^2$ divergence measures, have been proposed. In particular, it has been shown that the relative-$\chi_{\alpha}^2$ divergence measure includes well-known divergence measures, such as the $L_2$, $\chi^2$, triangular, symmetric $\chi^2$, $\chi_{\alpha}^2$ and Balakrishnan–Sanghvi divergence measures, all as special cases, and provides a flexible and powerful divergence measure for comparing probability distributions in a wide range of problems. The choice of α and the weight function $\psi(x)$ can be tailored to suit the specific characteristics and features of the data for the models that are being compared, allowing for greater sensitivity and flexibility in the comparison process.
Furthermore, the proposed Jensen-$\chi_{\alpha}^2$ and (p, w)-Jensen-$\chi_{\alpha}^2$ divergence measures are extensions of $D_{{\alpha}}^{\psi}(\,f:g)$ that are based on a convex combination. These extensions allow for the incorporation of additional divergence measures into the framework, further increasing the flexibility and applicability of the method.
There are, of course, several aspects of the proposed information measures that require further study, with regard to both their theoretical and experimental analysis. Additionally, with the incorporation of the ideas of the relative-$\chi_{\alpha}^2$ and Jensen-$\chi_{\alpha}^2$ divergence measures, there is an opportunity to broaden and explore the discrete and cumulative versions of the established divergence measures, utilizing the properties of convexity or concavity. It will also be of great interest to study cumulative versions of these measures, and we plan to do this in our future work. Finally, there is also potential to extend the idea to the relative Fisher information measure. We are currently working on these problems and hope to report the findings in a future paper.
Acknowledgements
The authors express their sincere thanks to the Editor and the anonymous reviewers for their useful comments and suggestions on an earlier version of this manuscript, which resulted in this much improved version.
Competing interests
On behalf of all authors, the corresponding author states that there is no conflict of interest.