1 Introduction
Deep learning models, particularly those designed for vision-based tasks such as image classification, have proliferated across various domains, demonstrating remarkable performance in discerning intricate patterns in visual data. However, this widespread adoption comes with a caveat: these models are inherently susceptible to biases present in the training data (Buhrmester et al. 2021).
One infamous illustration of this bias is the “wolf in the snow” problem (Ribeiro et al. 2016), where convolutional neural networks (CNNs) erroneously identify a husky as a wolf due to the presence of snow in the background. This happened because they learnt to associate “snow” with “wolf” based on the training data. This vulnerability to undesirable bias becomes particularly significant when these models are used in high-stakes settings such as disease diagnosis (Müezzinoglu et al. 2023) and autonomous vehicle operation (Barea et al. 2019). In such scenarios, the consequences could be dire if, for example, a model incorrectly identifies a malignant condition as benign, or erroneously suggests it is safe to switch lanes into oncoming traffic due to misleading patterns in the training data. Such instances underscore the urgent need to address bias correction in CNNs to ensure their reliability and fairness in real-world applications.
Recent works have shown that it is possible to obtain the knowledge of a trained CNN in the form of a symbolic rule-set, more specifically as a stratified Answer Set Program (Padalkar et al. 2024b,c). The authors proposed a framework called NeSyFOLD, wherein the activation of filters in the final convolutional layer of the CNN serves as the truth value of predicates in the generated rule-set, offering valuable insights into the concepts learnt by the model and their relation to the target class to be predicted. The filters in the CNN are $n \times n$ real-valued matrices. These matrices capture the representation of various concepts in the images. The predicates are labeled as the concept(s) that their corresponding filters learn to identify in the images. Figure 1 illustrates the NeSyFOLD framework and the final rule-set that is generated for a train set containing images of the “bathroom,” “bedroom” and “kitchen” classes.
It is easy to scrutinize the rule-set generated by NeSyFOLD and find the biases that the CNN develops toward each class. The CNN’s filters, during training, learn the most appropriate concepts that would easily distinguish between the different classes of images in the train set. To a human, a distinction made based on concepts found by the CNN may be counter-intuitive. Moreover, the concepts learned may only be adequate for differentiating between the classes of images present in the current train set. Frequently, these concepts might not suffice for accurately classifying images of the same class sourced differently, where key patterns in the images may vary. For instance, images from the new source might be captured from a different angle, at a different time of day, or under varying weather conditions. Hence, a human with adequate domain knowledge can identify the biases that are “undesired” or “desired” such that after retraining, the model becomes more robust to differently sourced data. An example of this could be a doctor identifying the concepts that appear in the generated rule-set, that are positively linked to the target class “malignant” and then suggesting the undesired and desired concepts. Those concepts if unlearnt/learnt by the CNN filters can improve the performance of the CNN on classifying images from different sources when deployed.
We introduce the NeSyBiCor (Neuro-Symbolic Bias Correction) framework, to aid in correcting pre-identified biases of the CNN. The Answer Set Program (ASP) rule-set generated by NeSyFOLD, from the bias-corrected CNN serves to validate the effectiveness of the framework. The pre-identified biases are presented as semantic constraints based on concepts that should and should not be used to predict the class of an image. These concepts can be selected by scrutinizing the rule-set generated by NeSyFOLD. We map the undesirable/desirable semantic concepts to their corresponding vector representations learnt by the filters. Next, we retrain the CNN with a novel semantic similarity loss function which is a core component of our framework. The loss function is designed to minimize the similarity of the representations learnt by the filters with the undesirable concepts and maximize the similarity to the desirable ones. Once the retraining is complete, we use the NeSyFOLD framework again to extract the refined rule-set. Hence, our approach provides a way to refine a given ASP rule-set subject to some semantic constraints.
To summarize, our contributions are as follows:
1. We propose a novel framework, NeSyBiCor, for targeted bias correction in a CNN.
2. We introduce a semantic similarity loss for penalizing/reinforcing filters that learn undesirable/desirable concept representations.
3. We evaluate the framework on subsets of the Places (Zhou et al. 2017a) dataset.
2 Background
2.1 Convolutional neural networks
Convolutional Neural Networks (CNNs) are a sub-category of neural networks (NNs) well suited to pattern recognition in visual data, first introduced by LeCun et al. (1989). Their distinctive feature is the presence of convolutional layers which employ learnable filters (also called kernels) to extract spatial hierarchies of features from input data. These filters are designed to detect specific features, such as edges or textures, by applying a mathematical operation called convolution. This operation involves sliding the filter over the image and computing the dot product of the filter values and the original pixel values in the image. The result of this process is a feature map, which is a new representation of the image emphasizing the detected features. Each filter in a layer can produce a distinct feature map, collectively forming a complex representation of the input that assists the network in learning to classify images or recognize patterns efficiently. It has been shown that when a CNN is trained with images, the filters in its convolutional layers learn specific patterns. Moreover, the last layer filters learn to recognize high-level features such as objects or object parts. This has been used to increase the explainability of CNNs via analysis of the types of emergent patterns and the relations between them (Zhang et al. 2018, 2017a). In this work, we build upon the line of research where CNN filters are used as symbolic atoms/predicates in a rule-set for image classification (Townsend et al. 2021; Padalkar et al. 2024b,c).
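The sliding dot product described above can be illustrated with a minimal sketch in plain Python. The function name `conv2d_valid`, the toy image and the hand-written kernel are assumptions made for illustration only; in a trained CNN the kernel values are learnt. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    feature_map = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Dot product between the kernel and the image patch it currently covers.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 filter that responds to a dark-to-bright transition.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
```

The resulting feature map is non-zero only in the column where the filter straddles the edge, which is the sense in which a feature map "emphasizes" the detected feature.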
2.2 FOLD-SE-M and NeSyFOLD
2.2.1 FOLD-SE-M
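Default logic is a non-monotonic logic used to formalize commonsense reasoning. A default $D$ is expressed as:
$$D = \frac{A : \textbf{M} B}{\Gamma} \qquad (1)$$
Equation 1 states that the conclusion $\Gamma$ can be inferred if pre-requisite $A$ holds and $B$ is justified. $\textbf{M} B$ stands for “it is consistent to believe $B$.” Normal logic programs can encode a default theory quite elegantly (Gelfond and Kahl 2014). A default of the form:
$$\frac{\alpha_1 \land \alpha_2 \land \dots \land \alpha_n : \textbf{M} \lnot \beta_1, \textbf{M} \lnot \beta_2, \dots, \textbf{M} \lnot \beta_m}{\gamma}$$
can be formalized as the normal logic programming rule:
$$\gamma \text{ :- } \alpha_1, \alpha_2, \dots, \alpha_n, \text{not } \beta_1, \text{not } \beta_2, \dots, \text{not } \beta_m.$$
where the $\alpha$’s and $\beta$’s are positive predicates and not represents negation-as-failure. We call such rules default rules. Thus, the default
$$\frac{bird(X) : \textbf{M} \lnot penguin(X)}{flies(X)}$$
will be represented as the following default rule in normal logic programming: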
flies(X) :- bird(X), not penguin(X).
We call bird(X), the condition that allows us to jump to the default conclusion that X flies, the default part of the rule, and not penguin(X) the exception part of the rule.
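To make the default and exception parts concrete, the rule can be evaluated over a small set of ground facts. The following is only an illustrative sketch in Python, not an ASP solver: the fact base and the helper name `flies` are hypothetical, and negation-as-failure is modelled simply as "the fact is absent from the fact base".

```python
def flies(x, facts):
    """Evaluate flies(X) :- bird(X), not penguin(X). over a set of ground facts."""
    # Default part: bird(X) must hold.
    is_bird = ("bird", x) in facts
    # Exception part: `not penguin(X)` succeeds when penguin(X) cannot be established.
    not_penguin = ("penguin", x) not in facts
    return is_bird and not_penguin


# Hypothetical fact base used only for illustration.
facts = {("bird", "tweety"), ("bird", "pingu"), ("penguin", "pingu")}

tweety_flies = flies("tweety", facts)  # bird, no known exception
pingu_flies = flies("pingu", facts)    # bird, but the exception applies
```

Adding the fact `("penguin", "tweety")` later would retract the earlier conclusion about `tweety`, which is the non-monotonic behaviour that default rules are meant to capture.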
FOLD-SE-M (Wang and Gupta 2024) is a Rule Based Machine Learning algorithm. It generates a rule-set from tabular data, comprising rules in the form described above. The complete rule-set can be viewed as a stratified answer set program (a stratified ASP rule-set has no cycles through negation (Baral 2003)). It uses special abx predicates to represent the exception part of a rule where x is a unique numerical identifier. FOLD-SE-M incrementally generates literals for default rules that cover positive examples while avoiding covering negative examples. It then swaps the positive and negative examples and calls itself recursively to learn exceptions to the default when there are still negative examples falsely covered.
There are $2$ tunable hyperparameters, $ratio$ , and $tail$ . The $ratio$ controls the upper bound on the number of false positives to the number of true positives implied by the default part of a rule. The $tail$ controls the limit of the minimum number of training examples a rule can cover.
2.2.2 NeSyFOLD
Padalkar et al. (2024b) introduced a neurosymbolic framework called NeSyFOLD to extract a symbolic rule-set in the form of a stratified Answer Set Program from the last layer filters of a CNN. Figure 1 illustrates the NeSyFOLD framework.
2.2.3 Binarization of filter outputs
For each image, the output feature maps of the last layer filters in the CNN are collected and stored. Next, the norms of the feature maps are computed. A norm of a matrix here is simply adding the squared values of each element in the matrix and taking a square root of the output. Equation (2) depicts this operation for the feature map produced by the $k^{th}$ filter for the $i^{th}$ image. This creates a table of norms of dimensions equal to no. of images $\times$ no. of filters. Finally, these norm values for each filter are treated as the filter’s activation value and a weighted sum of the mean and standard deviation (equation (3)) of these values over all the rows (images) in the table determines the threshold of activation for each filter. $\alpha$ and $\gamma$ are hyperparameters in Equation (3) and $n$ is the number of images in the train set. $\theta _k$ is the threshold for the $k^{th}$ filter. This threshold is used to binarize each value. This creates a binarized vector representation of the image with a dimension equal to the number of filters in the last layer. Such vectors are computed for all images in the training set and collected in a binarization table (Figure 1 top-right). Next, the FOLD-SE-M algorithm takes the binarization table as input and outputs a raw rule-set (Figure 1 bottom-right) wherein each predicate represents a filter in the CNN’s last layer. The truth value of each predicate is determined by the binarized activation of the corresponding filter in the CNN when classifying a new image using the rule-set.
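The norm and threshold computations described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the NeSyFOLD implementation: the function names, the use of the population standard deviation, the strict `>` comparison and the example values of `alpha` and `gamma` are all choices made here for the sake of a runnable example.

```python
import math


def feature_map_norm(feature_map):
    """Square every element, sum, and take the square root (Frobenius norm)."""
    return math.sqrt(sum(v * v for row in feature_map for v in row))


def binarize(norm_table, alpha, gamma):
    """norm_table[i][k] is the norm for image i and filter k.

    Returns the per-filter thresholds and the binarization table.
    """
    n = len(norm_table)
    num_filters = len(norm_table[0])
    thresholds = []
    for k in range(num_filters):
        column = [norm_table[i][k] for i in range(n)]
        mean = sum(column) / n
        # Assumption: population standard deviation over the n train images.
        std = math.sqrt(sum((a - mean) ** 2 for a in column) / n)
        thresholds.append(alpha * mean + gamma * std)
    # Assumption: a filter counts as active when its norm exceeds the threshold.
    table = [
        [1 if norm_table[i][k] > thresholds[k] else 0 for k in range(num_filters)]
        for i in range(n)
    ]
    return thresholds, table


# Two images, two filters; values are made up for illustration.
norm_table = [[1.0, 4.0], [3.0, 4.0]]
thresholds, table = binarize(norm_table, alpha=1.0, gamma=0.0)
```

Each row of `table` is the binarized vector for one image, and each column corresponds to one last-layer filter, which is the shape of input the rule learner expects.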
Note that in the scope of our work, the abx/1 predicates (e.g. ab1(X), ab2(X), etc.) do not carry inherent semantic meaning but are crucial for the structural integrity and compact representation of the rule-sets generated. Each abx/1 predicate is found in the head of precisely one rule, and the corresponding rule body may contain semantically meaningful predicates linked to CNN filters or other auxiliary predicates. In essence, each abx/1 predicate in the rule-sets can be represented through a combination of conjunctions and disjunctions of predicates that possess semantic significance and are connected to the CNN’s filters. They are essential in simplifying the structure of the rule-sets.
2.2.4 Semantic labeling of predicates
Semantic segmentation masks of the images are used to map each filter to a set of concepts that it represents and thus the predicates in the rule-set can be given a semantic meaning. These semantic segmentation masks can be human annotated or those generated by using large foundation models such as SegGPT (Wang et al. 2023) or RAM (Zhang et al. 2023) in conjunction with the Segment Anything Model (Kirillov et al. 2023). Hence, the rule-set serves as a highly interpretable global explanation of the CNN’s decision-making process. The predicted class of the image is represented in terms of logical rules based on concepts that the CNN filters learn individually. Figure 2 illustrates the semantic labeling of a single predicate in the raw rule-set extracted by the FOLD-SE-M algorithm or, in other words, it maps the predicate in the raw rule-set with semantic concept(s) that its corresponding filter has learnt. The filters in the last convolutional layer are known to learn high-level concepts, such as objects or object parts (Zhou et al. 2015). First, the feature maps generated by a filter in the last convolutional layer for the top-m images that activate it are collected. The top-m images are selected according to the norm values of the feature maps generated by the filter for all images in the train set. Next, the feature maps are resized and masked onto the top-m images (Figure 2 top). Notice that by doing this, one can observe the concepts that the filter is looking at in each of the selected images. For example, the filter considered in Figure 2 is looking at beds in images.
The masked images are then overlapped onto the semantic segmentation masks and an Intersection over Union (IoU) score is calculated for each concept that is visible after the overlap (Figure 2, middle). Finally, the scores are aggregated over the top-m images and the label of the predicate that is associated with the filter under consideration is determined (Figure 2, bottom). We refer the reader to the NeSyFOLD paper by Padalkar et al. (2024b) for more details on the semantic labeling procedure and the NeSyFOLD framework itself.
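The IoU-based labeling step can be sketched roughly as below. This is a simplified illustration, not the NeSyFOLD procedure itself: the data layout (binary filter masks and per-pixel concept names), the function names, and the aggregation rule (sum the IoU scores over the top-m images and pick the highest) are assumptions; the actual aggregation and label selection are described in the NeSyFOLD paper.

```python
def iou(mask_a, mask_b):
    """Intersection over Union of two equally sized binary grids."""
    pairs = [(a, b) for row_a, row_b in zip(mask_a, mask_b) for a, b in zip(row_a, row_b)]
    intersection = sum(1 for a, b in pairs if a and b)
    union = sum(1 for a, b in pairs if a or b)
    return intersection / union if union else 0.0


def label_filter(filter_masks, segmentations):
    """Aggregate per-concept IoU over the top-m images and return the best concept.

    filter_masks[i]  : binary grid, 1 where the resized feature map covers image i
    segmentations[i] : grid of concept names (one per pixel) for image i
    """
    scores = {}
    for mask, seg in zip(filter_masks, segmentations):
        for concept in {name for row in seg for name in row}:
            concept_mask = [[1 if name == concept else 0 for name in row] for row in seg]
            scores[concept] = scores.get(concept, 0.0) + iou(mask, concept_mask)
    return max(scores, key=scores.get), scores


# Two toy 2x2 "images" (m = 2); values are made up for illustration.
filter_masks = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
segmentations = [
    [["bed", "bed"], ["floor", "floor"]],
    [["bed", "wall"], ["floor", "floor"]],
]
label, scores = label_filter(filter_masks, segmentations)
```

Here the filter's active region coincides with the "bed" pixels in both images, so the predicate for this filter would receive a bed-based label.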
3 Methodology
We demonstrate the utility of our framework using classes of images from the Places dataset (Zhou et al. 2017a). The dataset contains images of various indoor and outdoor places and scenes such as “kitchen,” “bathroom,” “beach,” “forest,” etc. We choose this dataset because the manually annotated semantic segmentation masks of the images are also readily available as part of a different dataset called ADE20k (Zhou et al. 2017b). The NeSyFOLD framework requires semantic segmentation masks for generating a meaningful rule-set with concept(s) as predicate names.
In Figure 3 we show a schematic diagram of the NeSyBiCor framework. As a running example, let us consider a CNN trained on $2$ classes of the Places dataset, namely “desert road” and “street.” NeSyFOLD generates a rule-set as shown in the blue box at the top left of Figure 3. First, the desired and undesired biases are conceptualized in terms of the concepts that are desired or undesired to appear as being positively associated with a target class, namely “desert road” or “street.” Notice that in rule 2, the “desert road” class is being predicted based on whether there is sky visible in the image. Let us say this is an undesired concept for classifying images of “desert road,” as is the road concept for the “street” class. Note that “street” images are not being classified by the road concept, meaning that the road concept is not positively linked to the “street” class. Also note that rule 5 has a predicate based on the road concept (road1/1), but it appears with a negation preceding it; hence it is not positively linked to the “street” class. However, it could happen that during retraining the CNN picks up on this concept, so we can provide the undesired concepts beforehand. Similarly, the desired concepts are also identified by scrutinizing the rule-set or through domain knowledge. The yellow box on the top-right of Figure 3 shows the Semantic ASP constraints that are constructed for each class with the undesired (red) and desired (blue) concepts. The objective is to convert the given ASP rule-set to a rule-set that satisfies the given constraints as closely as possible.
Since the ASP rule-set is generated from a CNN, the intrinsic connectionist knowledge of the CNN needs to be revised based on the symbolic semantic constraints. Hence, we developed the following procedure to convert the semantic concepts that are meaningful to the human, to vector representations, meaningful to the CNN to facilitate retraining.
3.1 Computing concept representation vectors
The first step is to obtain the concept representation vectors for each desired and undesired concept specified in the semantic constraints for each target class. Recall that a filter in the CNN is a matrix that detects patterns in the input image. For example, the filter associated with the predicate “sky1/1” in the rule-set is detecting some type of patterns in the sky (say, blue clear sky) and the filter associated with the “sky2/1” predicate is detecting some other types of patterns in the sky (say, evening sky) in the input “desert road” images. The output produced by a filter for a given image is again a matrix that can be flattened into a vector and treated as the representation of the patterns it has learnt to detect. Hence, to find the concept representation vector of sky, we first compute the individual filter representation vectors for all the predicates that have the concept sky in their name and are positively associated with the class “desert road.” In this case, there are only two, that is “sky1/1” and “sky2/1.” To compute their respective filter representation vectors, we find the top-10 images (inspired by NeSyFOLD) in the train set that these filters are most activated by and take the mean of all $10$ vectors produced as their outputs for these $10$ images. Finally, to compute the concept representation vector for the concept sky, we simply take the mean of the “sky1/1” and “sky2/1” filter representation vectors.
Repeating the above procedure for every undesired and desired concept yields their respective concept representation vectors.
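The two averaging steps above can be sketched in a few lines. This is a minimal illustration under assumptions: the helper names are hypothetical, the activation strength used to rank images is taken to be the L2 norm of the flattened output, and the example uses tiny two-dimensional vectors with `top_m=2` instead of real feature maps with the top-10 images.

```python
import math


def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def filter_representation(outputs, top_m=10):
    """Mean of one filter's flattened outputs over its top-m most activating images."""
    ranked = sorted(outputs, key=l2_norm, reverse=True)
    return mean_vector(ranked[:top_m])


def concept_representation(outputs_per_filter, top_m=10):
    """Mean of the filter representation vectors of all filters tied to one concept."""
    return mean_vector([filter_representation(o, top_m) for o in outputs_per_filter])


# Flattened outputs of two hypothetical sky filters over three train images.
sky1_outputs = [[4.0, 0.0], [2.0, 0.0], [0.1, 0.0]]
sky2_outputs = [[0.0, 2.0], [0.0, 4.0], [0.0, 0.5]]

r_sky1 = filter_representation(sky1_outputs, top_m=2)
r_sky = concept_representation([sky1_outputs, sky2_outputs], top_m=2)
```

Because all filters in the same layer produce feature maps of the same size, their flattened outputs have equal length and can be averaged element-wise.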
Note that, as the number of images in the train set increases, the time to find the top-10 images increases linearly. Hence, if the number of images in the train set is doubled then the time taken to find the top-10 images would also nearly double. In contrast, as the number of concepts increases, the time taken to compute all the concept representation vectors depends largely on the number of filters that are associated with different concepts. If there are desired/undesired concepts that have no filters associated with them in the rule-set, then no concept representation vectors are computed. Thus, adding more desirable/undesirable concepts does not always translate to greater computation time.
Next, we start the retraining of the CNN with the original cross-entropy loss $\mathcal{L_{CE}}$ and our novel Semantic Similarity loss $\mathcal{L_{SS}}$ .
3.2 Calculating the semantic similarity loss
We define the semantic similarity loss $\mathcal{L_{SS}}$ , for a train set with $N$ images and CNN with $K$ filters in the last convolutional layer, as follows:
The $cos\_sim$ function calculates the cosine similarity between two representation vectors. $\mathbf{r}_j^i$ is the filter representation vector obtained from the $i^{th}$ image’s $j^{th}$ filter output. $\mathbf{r}_b$ is the concept representation vector for some concept $b$ in the list of undesired concepts $\mathbf{B}$ . Similarly, $\mathbf{r}_g$ is the concept representation vector for some concept $g$ in the list of desired concepts $\mathbf{G}$ . $\lambda _B$ and $\lambda _G$ are hyperparameters.
The rationale behind the loss function is simple: the loss increases when the filter representation vectors closely resemble the undesired concept representation vectors and decreases when the filter representation vectors are closer to the desirable concept representation vectors. This approach is conceptually similar to the loss function used in word2vec (Mikolov et al. 2013), where the model’s objective is to maximize the similarity between a target word and its context words while minimizing similarity with randomly sampled words. Thus, as training progresses with any standard optimization technique such as Adam (Kingma and Ba 2015), the loss is minimized and the filters get pushed away from learning undesirable concepts and pushed toward learning desirable concepts. Note that the total loss is defined as $\mathcal{L_{TOTAL}} = \mathcal{L_{CE}} + \mathcal{L_{SS}}$ . Hence, the cross-entropy loss $\mathcal{L_{CE}}$ is also jointly minimized along with the semantic similarity loss $\mathcal{L_{SS}}$ , to maintain the classification performance.
An important observation to make here is that the semantic similarity loss might become negative as the first term tends to $0$ . The total loss is the sum of the semantic similarity loss and the cross-entropy loss. The hyperparameters $\lambda _B$ and $\lambda _G$ help to determine the influence of the semantic similarity loss toward the total loss. This is common practice in machine learning literature to control the influence of various loss terms in the total loss. Hence even if the semantic similarity loss is negative, due to the hyperparameters which are tuned on a validation set, the cross-entropy loss is not highly influenced which helps in maintaining the model’s classification performance. This is demonstrated by our experiments later in the paper.
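The behaviour of the semantic similarity loss described in this section can be checked numerically with a small sketch. The form used here, a $\lambda _B$-weighted sum of cosine similarities to undesired concept vectors minus a $\lambda _G$-weighted sum of cosine similarities to desired concept vectors, is an assumption that follows the prose; in particular, whether the terms are summed or averaged over images and filters is not specified here, and plain sums are used. An actual training loop would compute this in an automatic differentiation framework; plain Python is used only to show the arithmetic.

```python
import math


def cos_sim(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_similarity_loss(filter_vecs, undesired, desired, lambda_b, lambda_g):
    """filter_vecs[i][j] is the representation vector of filter j for image i."""
    bad = sum(cos_sim(r, rb) for image in filter_vecs for r in image for rb in undesired)
    good = sum(cos_sim(r, rg) for image in filter_vecs for r in image for rg in desired)
    # Similarity to undesired concepts is penalized; similarity to desired ones is rewarded.
    return lambda_b * bad - lambda_g * good


undesired = [[1.0, 0.0]]  # e.g. a hypothetical "sky" concept vector
desired = [[0.0, 1.0]]    # e.g. a hypothetical "ground" concept vector

# One image, two filters: one aligned with the undesired concept, one with the desired.
mixed = [[[1.0, 0.0], [0.0, 1.0]]]
loss_mixed = semantic_similarity_loss(mixed, undesired, desired, 5e-2, 1e-3)

# One image, one filter aligned with the desired concept only.
aligned = [[[0.0, 1.0]]]
loss_aligned = semantic_similarity_loss(aligned, undesired, desired, 5e-2, 1e-3)
```

In the second case the undesired term vanishes and the loss is negative, which is the situation discussed above; the small value of $\lambda _G$ keeps its magnitude small relative to a typical cross-entropy loss.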
3.3 Recalibrating the concept representation vectors
As the training progresses, the CNN filters may learn slightly different representations of undesirable/desirable concepts. For example, some filters might pick up a different pattern in the sky of the images that was not caught earlier when the training started. Now if this new pattern in the sky (say, dense clouds) is again deemed sufficient by the CNN to be a significant feature in distinguishing the “desert road” class from the “street” class, then this filter would show up in the rule-set as a “skyx/1” predicate. This would happen because the initially computed concept representation vector for sky does not encapsulate the representation for this new type of sky which became significant because the filters were pushed away from learning the other types of sky by the loss function. It is also possible that some new undesirable concept such as road for the “street” class shows up in the rule-set. In such a case, the new concept would not be mitigated as there was no filter capturing the representations for road initially. Hence, no concept representation vector for road is available to push the filters away from learning this new undesirable concept. Similarly, some new filters, learning known or new desirable concepts might also appear.
To solve this problem, we propose rectifying all the concept representation vectors for each class after every $k$ epochs during training. We do this by running the NeSyFOLD framework after every $k$ epochs and obtaining a new rule-set from the partially retrained CNN. We then compute the concept representation vectors for all the undesired and desired concepts again by considering all the predicates that appear in the newly generated rule-set, by following the algorithm described above. We then aggregate the new concept representation vectors with the old concept representation vectors by taking their mean, so that the new aggregated vectors encapsulate the information of the new patterns that were found. Hence, the filters can now be pushed away from/toward these new undesirable or desirable representations respectively. This way we can ensure that at the end of retraining, there is a greater chance that the new ASP rule-set produced from the CNN satisfies the semantic constraints posed against the initial rule-set.
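The aggregation step of the recalibration can be sketched as follows. The dictionary layout and the function name `recalibrate` are hypothetical. Taking the element-wise mean of old and new vectors follows the text; how a concept that appears for the first time is handled is an assumption made here (its newly computed vector is adopted as-is).

```python
def recalibrate(old_vectors, new_vectors):
    """Merge concept representation vectors after a rule-set has been re-extracted.

    Both arguments map a concept name to its representation vector.
    """
    merged = {concept: list(vec) for concept, vec in old_vectors.items()}
    for concept, new_vec in new_vectors.items():
        if concept in merged:
            # Mean of the old and new vectors, so new patterns are folded in.
            merged[concept] = [(o + n) / 2.0 for o, n in zip(merged[concept], new_vec)]
        else:
            # Assumption: a concept with no earlier vector takes the new one directly.
            merged[concept] = list(new_vec)
    return merged


# Vectors before retraining and after k epochs; values are made up for illustration.
old_vectors = {"sky": [1.0, 0.0]}
new_vectors = {"sky": [0.0, 1.0], "road": [2.0, 2.0]}
merged = recalibrate(old_vectors, new_vectors)
```

In a retraining loop this function would be called once every $k$ epochs, right after the new rule-set is extracted and the new concept representation vectors are computed.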
4 Experiments
We conducted experiments to address the following research questions:
Q1: How does our NeSyBiCor framework affect the initial rule-set?
Q2: How does the performance of the rule-set extracted from the bias-corrected CNN compare against the one extracted from vanilla CNN w.r.t. accuracy, fidelity and size?
Q3: What percentage of the covered examples are classified using undesired/desired concepts before and after applying the NeSyBiCor framework?
[Q1] Bias Corrected Rule-Set: The core idea of correcting the bias of the CNN and by extension, the rule-set, is that the images should be classified by rules that use concepts that are intuitive to humans and might be more representative of the class to which the image belongs. Hence, given constraints on the undesired and desired concepts to be learnt by the CNN, the ideal outcome should be that the undesired concepts are eradicated and desired concepts are infused into the revised rule-set.
4.1 Setup
We train a CNN on subsets of the Places (Zhou et al. 2017a) dataset. We selected 3 subsets of 3 classes, namely babek (“bathroom,” “bedroom,” “kitchen”), defs (“desert road,” “forest road,” “street”) and dedrh (“desert road,” “driveway,” “highway”), along with 3 subsets of 2 classes, namely babe (“bathroom,” “bedroom”), des (“desert road,” “street”) and deh (“desert road,” “highway”). We employed a VGG16 CNN, pre-trained on ImageNet (Deng et al. 2009), training over $100$ epochs with batch size $32$ . The Adam (Kingma and Ba 2015) optimizer was used, accompanied by class weights to address data imbalance. $L2$ regularization of $0.005$ spanning all layers and a learning rate of $5 \times 10^{-7}$ were adopted. A decay factor of $0.5$ with a $10$ -epoch patience was implemented. Images were resized to $224 \times 224$ . Each class has $5000$ images so we used $4000$ as the train set and $1000$ as the test set per class. Finally, we used NeSyFOLD to generate the rule-set.
Next, we manually identified the desired and undesired concepts for each class as shown in Table 1. We then used the NeSyBiCor framework on all the subsets listed above with the appropriate semantic constraints based on each class’s desired and undesired concepts. We used values of $5 \times 10^{-2}$ and $1 \times 10^{-3}$ for the $\lambda _B$ and $\lambda _G$ hyperparameters, respectively (after empirical evaluation on a validation set), while computing the semantic similarity loss. We retrained for $50$ epochs and recalibrated the concept representation vectors every $5$ epochs. NeSyFOLD was used on the retrained CNN to obtain the bias-corrected rule-sets for each subset.
4.2 Result
In Figure 4 we show the initial (RULE-SET 1 & RULE-SET 2) and final bias-corrected (RULE-SET 1* & RULE-SET 2*) rule-sets for $2$ of the $6$ subsets, namely des and defs. Due to the lack of space, we present the other rule-sets in the supplementary material (Padalkar et al. 2024a).
Recall that the undesired concepts for the “desert road” class are “sky” and “building.” In RULE-SET 1, rule $2$ uses the “sky1/1” predicate to determine if the image belongs to the “desert road” class. In the bias-corrected rule-set, RULE-SET 1*, there is no “sky” based predicate. Moreover, the only predicate positively linked with the “desert road” class is the “ground1_road1/1” predicate, which is based on the desired concept “ground” and refers to the corresponding filter learning a pattern comprising specific types of patches of “ground” and “road.” Note that the name of the predicate is determined by NeSyFOLD, which uses manually annotated semantic segmentation masks for the images. Ideally, it would be more appropriate for the predicate to be labeled as “sand1_road1” but we are limited by the available annotations. One could generate images that are masked with the receptive field of the corresponding filter and manually label the predicate based on the concepts that are visible in the image, as shown by Townsend et al. (2021).
Note that the rule-set is generated by the FOLD-SE-M algorithm from the binarized feature map outputs of the filters in the last convolutional layer of the CNN. Hence the rule-set mirrors the representations that are learnt by the CNN filters. As the retraining progresses, the filters are penalized for learning representations of the undesired concepts “sky” and “building,” in case of the “desert road” class. Thus at the end of the bias correction, very few/none of the filters learn representations of the “sky” or “building” concepts. The filters tend to learn representations for the desired concepts (e.g. “ground”) more. Thus naturally when the filters’ feature maps are binarized and the binarization table is obtained, it is the filters that have learnt these desired concepts that form better features and are selected by the FOLD-SE-M algorithm to appear in the rule-set. Hence the undesired concepts appear to have been “dropped” by the algorithm in the bias-corrected rule-set.
Similarly, looking at RULE-SET 2* it is apparent that the number of undesired predicates associated positively with a class is reduced w.r.t. RULE-SET 2. Also, notice that certain predicates that are capturing irrelevant concepts such as “person” are eradicated. It is clear by simply examining the rule-sets that in general, the rule-sets are more in line with the specified semantic constraints. Hence, the predictions being made are based on the concepts that the human finds more intuitive.
[Q2, Q3] Qualitative evaluation: Recall that, the rule-set acts as the final decision-making mechanism for any input image to the CNN. The input image is converted to a binarized vector of dimension equal to the number of filters in the last convolutional layer of the CNN. Each value in the binarized vector represents the activation (1) /deactivation (0) of the corresponding filter. Since each filter is mapped to a predicate in the rule-set, the truth value of the predicate is determined by the activation of the filter. The bias-corrected rule-set should be more faithful to the semantic constraints while sacrificing minimal accuracy, fidelity and interpretability.
4.3 Setup
We use the previously generated rule-sets for all subsets to classify the test images. In Table 2 we show the comparison between the accuracy (Acc.), fidelity (Fid.), number of unique predicates (Pred.) and rule-set size (Size) or total number of predicates between the initial and the bias-corrected rule-set for each of the $6$ subsets. The top three rows are $3$ -class subsets and the bottom three are $2$ -class subsets. We use rule-set size as a metric of interpretability. Lage et al. (2019) showed through human evaluations that as the size of the rule-set increases the difficulty in interpreting the rule-set also increases. We also qualitatively evaluate how well the NeSyBiCor framework performs in eradicating the undesired concepts while introducing desired ones. Hence, for a given rule-set, out of all training images that were classified by the rule-set as any one of the classes, we report the fraction of images that followed a decision path that involved an undesired predicate. We report this value under the % Undesired column in Table 2. If there was no undesired predicate in the decision path then we check for a desired predicate. This value is reported as % Desired in Table 2.
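The metrics above can be sketched as follows. Several points are assumptions made for illustration: fidelity is taken to be the agreement between the rule-set's predictions and the CNN's own predictions (this section does not define it), each covered image's decision path is represented simply as the set of concepts positively used along it, and all function and variable names are hypothetical.

```python
def agreement(predictions, reference):
    """Fraction of positions where two label sequences agree."""
    return sum(p == r for p, r in zip(predictions, reference)) / len(reference)


def path_percentages(paths, undesired, desired):
    """Return (% Undesired, % Desired) over the covered images.

    paths: one set of concept names per covered image.
    An image counts as desired only if its path has no undesired concept.
    """
    n_undesired = n_desired = 0
    for concepts in paths:
        if concepts & undesired:
            n_undesired += 1
        elif concepts & desired:
            n_desired += 1
    total = len(paths)
    return 100.0 * n_undesired / total, 100.0 * n_desired / total


# Made-up labels for four test images.
rule_set_pred = ["desert road", "street", "desert road", "desert road"]
cnn_pred = ["desert road", "street", "street", "desert road"]
ground_truth = ["desert road", "street", "desert road", "street"]

accuracy = agreement(rule_set_pred, ground_truth)  # rule-set vs. true labels
fidelity = agreement(rule_set_pred, cnn_pred)      # rule-set vs. CNN (assumed definition)

# Made-up decision paths for four covered images.
paths = [{"sky"}, {"ground", "road"}, {"ground", "sky"}, {"person"}]
pct_undesired, pct_desired = path_percentages(paths, {"sky", "building"}, {"ground"})
```

The last image in `paths` uses neither kind of concept, so the two percentages need not sum to 100; the remainder corresponds to concepts the human is indifferent toward.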
4.4 Result
The accuracy of the bias-corrected rule-set is comparable to that of the initial rule-set for all the $2$-class subsets. For the $3$-class subsets, the accuracy drops in two cases, babek and dedrh, but is higher for defs, showing no clear trend. This might be because, with a smaller number of classes, it is easier for the filters to learn alternate representations, which is exactly what the bias correction requires of them. As the number of classes increases, it is expected to become more difficult to optimize for accuracy while also learning and unlearning concepts. A similar reason applies to the lower fidelity values. Note that a loss of accuracy on the current data could mean a gain in accuracy on a dataset that is sourced differently, hence making the rule-set more robust.
The number of unique predicates and the rule-set size are consistently reduced in the bias-corrected rule-set, with an average reduction of approximately $65\%$ in the number of unique predicates and in rule-set size. Recall that within the NeSyFOLD framework, the FOLD-SE-M algorithm generates the final rule-set by processing the binarized filter outputs of all training images, organized into a structure known as the binarization table (refer to Figure 1). This process is akin to how a decision tree algorithm selects the most significant features that effectively segment the majority of the data. A reduction in the number of unique predicates within the rule-set indicates that fewer features are required to successfully differentiate the data while preserving accuracy. This reduction suggests that during retraining, the filters increasingly focus on learning more targeted representations (ideally the desired ones), which are distinctly relevant for their respective target classes.
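As a small worked example of the two size measures, the sketch below counts unique predicates and total predicates for a rule-set and computes the percentage reduction between two rule-sets. The representation of a rule-set as a list of rule bodies, the choice to count body predicates only, and the toy rule-sets are assumptions for illustration.

```python
from typing import List, Tuple


def rule_set_stats(rule_bodies: List[List[str]]) -> Tuple[int, int]:
    """Return (number of unique predicates, total number of predicates).

    Assumption: only predicates in rule bodies are counted, not rule heads.
    """
    size = sum(len(body) for body in rule_bodies)
    unique = len({pred for body in rule_bodies for pred in body})
    return unique, size


def percent_reduction(before: int, after: int) -> float:
    """Percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return 100.0 * (before - after) / before


# Hypothetical initial and bias-corrected rule-sets (bodies only).
initial = [["bed", "person"], ["sink", "person", "wall"], ["bed", "wall"]]
corrected = [["bed"], ["sink"]]

u0, s0 = rule_set_stats(initial)    # 4 unique predicates, size 7
u1, s1 = rule_set_stats(corrected)  # 2 unique predicates, size 2
print(percent_reduction(u0, u1), percent_reduction(s0, s1))
```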
Finally, the most relevant observation is that the % Undesired value is always lower for the bias-corrected rule-set, as expected, with an average reduction of $58\%$. This means that fewer images are classified by following a decision path that involves an undesired concept. Since we used a higher $\lambda _B$ than $\lambda _G$ while calculating the semantic similarity loss, the CNN focused more on learning representations that are dissimilar to the undesired concepts. The % Desired value is higher in all cases except for babe and deh, with an average increase of $35\%$. This means that more images are classified with at least one desirable concept in the decision path. Note that in the case of babe and deh, a lower % Desired value means that the images were being classified by some other concepts that the human is indifferent toward. In practical scenarios, the primary concern is to ensure images are not categorized based on undesired concepts, a capability effectively demonstrated by the NeSyBiCor framework in our experiments.
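To illustrate why a larger $\lambda _B$ shifts the emphasis toward unlearning undesired concepts, the following sketch uses a simplified stand-in for the semantic similarity loss. It is not the loss defined earlier in the paper: the cosine-similarity form, the vector representations and the function names are assumptions, and only the roles of the two weights are taken from the text.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def weighted_similarity_loss(
    filter_vec: Sequence[float],
    undesired_vecs: List[Sequence[float]],
    desired_vecs: List[Sequence[float]],
    lambda_b: float,
    lambda_g: float,
) -> float:
    """Assumed form: penalize similarity to undesired concepts (weight
    lambda_b) and reward similarity to desired concepts (weight lambda_g)."""
    bad = sum(cosine(filter_vec, c) for c in undesired_vecs)
    good = sum(cosine(filter_vec, c) for c in desired_vecs)
    return lambda_b * bad - lambda_g * good


undesired = [[1.0, 0.0]]  # hypothetical undesired concept vector
desired = [[0.0, 1.0]]    # hypothetical desired concept vector

# A filter aligned with the undesired concept is penalized with weight lambda_b.
loss_bad = weighted_similarity_loss([1.0, 0.0], undesired, desired, 2.0, 1.0)
# A filter aligned with the desired concept is rewarded with weight lambda_g.
loss_good = weighted_similarity_loss([0.0, 1.0], undesired, desired, 2.0, 1.0)
print(loss_bad, loss_good)
```

With these hypothetical weights, moving a filter away from the undesired concept lowers the loss by twice as much as moving it toward the desired concept, which is consistent with the larger drop observed in % Undesired.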
5 Related work
The problem of bias in machine learning is well known, and many efforts to mitigate it have been made in recent years, as surveyed by Mehrabi et al. (2021), who provide an in-depth overview of types of biases as well as bias mitigation techniques specific to different machine learning fields.
When considering visual datasets and CNNs, there are multiple distinct types of bias, as described by Torralba and Efros (2011): selection bias, framing bias, label bias, and negative set bias. Selection bias occurs if the selection of subjects for the datasets differs systematically from the population of interest; for example, a dataset may lack the representation of a certain gender or ethnicity (Buolamwini and Gebru 2018). Framing bias refers to both the selection of angle and composition of the scene when taking a photo as well as any editing done on the images. Label bias arises from errors in labeling the data in comparison to ground truth as well as from poor semantic categories. The latter is more common; for example, if a dataset contains two classes, “grass” and “lawn,” different labelers may assign different labels to the same image (Malisiewicz and Efros 2008). Lastly, negative set bias refers to the negative set, that is, what a dataset considers the “rest of the world” (e.g. “car” and “not-car”); if this set is unbalanced, it can negatively affect the classifier (Torralba and Efros 2011).
There are several works on bias detection in CNNs and neural networks in general. Zhang et al. (2017b) propose one such method for detecting the learnt bias of a CNN. They show how spurious correlations can be detected by mining attribute relationships in the CNN without the use of any extra data. For example, the CNN might mistakenly learn the “smiling” attribute to be correlated with the “black hair” attribute due to bias in the dataset. Their method can help detect such correlations. Serna et al. (2022) propose an approach to analyze the biases in neural networks by observing identifiable patterns in the weights of the model. Unlike our NeSyBiCor framework, these methods do not correct the identified biases.
Earlier forms of bias mitigation involved transforming problems or data to address bias or imbalance; over the years they became more specialized, resulting in algorithms that re-balance internal distributions of training data to minimize bias (Chakraborty et al. 2021). Various new approaches have also been proposed that endeavor to tackle bias either during training or by modifying the already trained model. For example, Zhao et al. (2017) present a method based on Lagrangian relaxation for collective inference to reduce bias. Many methods are generic and applicable across many types of ML models, such as the measure of decision boundary (un)fairness designed by Zafar et al. (2017) or the convex fairness regularizers for regression problems introduced by Berk et al. (2017).
Logic-based bias mitigation is a comparatively smaller and newer area that offers promising results. In NLP, Cai et al. (2022) present a model based on neural logic, Soft Logic Enhanced Event Temporal Reasoning, which utilizes t-norm fuzzy logics to acquire unbiased temporal common sense knowledge from text. Various methods of incorporating logical constraints to increase the fairness of the model, either before or after training, have also been proposed. Some of them are problem-specific, for example related to voting fairness (Celis et al. 2018); others are more general (Goh et al. 2016).
Similarities can also be seen in more general approaches to incorporating logical constraints in ML that are not explicitly related to bias mitigation, either during training, for example using custom property-based loss functions (Fischer et al. 2019; Ślusarz et al. 2023), or through modification of an existing model (Leino et al. 2022).
6 Conclusion and future work
We proposed a neurosymbolic framework called NeSyBiCor for correcting the bias of a CNN. In addition to correcting the bias of a CNN, our framework allows the user to fine-tune the bias based on general concepts according to their specific needs or application. To the best of our knowledge, this is the first method that performs bias correction by using the learnt representations of the CNN filters in a targeted manner. We show through our experiments that the bias-corrected rule-set is highly effective at avoiding the classification of images based on undesired concepts. It is also more likely to classify the images based on the desired concepts. The main component of the NeSyBiCor framework is the semantic similarity loss. It serves as a measure of similarity between the representations learnt by the filters and the representations of the undesired and desired concepts. Another benefit of the NeSyBiCor framework is that the bias-corrected rule-set is smaller in size, thus improving interpretability.
As part of future work, we intend to use tools such as those designed by Wang et al. (2023) and Zhang et al. (2023), which use vision foundation models for automatic semantic segmentation of images. These tools can help create semantic segmentation masks for images with custom names for the concepts that appear in the image. Currently, we use pre-annotated semantic segmentation masks, which are often limited in the variety of concepts annotated as well as in their labels. With regard to scalability, we plan to examine the effects of scaling to datasets with a large number of classes. The challenge here is that the rule-set generated by the NeSyFOLD framework that we employ tends to lose accuracy critically as the number of classes increases, due to the binarization of the filter activations.
Another factor that can diminish accuracy when utilizing the NeSyBiCor framework is the selection of inappropriate concepts, in particular marking as undesired a concept that is essential for the CNN to classify effectively. For example, if the concept “bed” is deemed undesired by the user, then the filters in the CNN will avoid learning the representations of beds in the images of bedrooms. Since a bed is a significant feature of most bedroom images, there might not be other features that could help the CNN distinguish the bedroom images from other images, hence hurting the accuracy of the CNN. We intend to investigate methods to maintain accuracy even if the selected concepts are essential.
Finally, the work we presented here may be used to extend implementations of loss functions based on Differentiable Logics: Fischer et al. (2019); Ślusarz et al. (2023); Daggitt et al. (2024). So far, the idea of compiling a loss function from an arbitrary logical formula (advocated in the above papers) was based on the assumption that during training, checking for property satisfaction is as easy as checking class labels. Therefore, logical loss functions can be easily combined with the standard cross-entropy loss function in backpropagation training. For example, the famous property of neural network robustness involves a trivial robustness decision procedure at training time (cf. Casadio et al. 2022). This paper gives an example of a neurosymbolic training scenario in which a property of interest cannot be decided at training time, and we had to resort to using CNN filters in order to replace a deterministic decision procedure. As the Differentiable Logic community continues to extend its range of applications (see e.g. Flinkow et al. 2024), solutions such as the ones presented here will be needed in order to approximate the decision procedures. We intend to investigate this line of research.
Competing interests
The author(s) declare none.