Interpretation of Machine Learning Models Using Methods Based on Galois Connections (dissertation and author's abstract, VAK RF specialty 00.00.00), Candidate of Sciences Eric George Parakal
- VAK RF specialty: 00.00.00
- Number of pages: 213
Dissertation Table of Contents (Candidate of Sciences Eric George Parakal)
Motivation
Aim and Objectives of the Research
Scientific Novelty
Approbation of the Research
Key Aspects to be Defended
Structure of the Dissertation
1 Related Work
1.1 Introduction
1.2 Taxonomies of Explainability
1.3 Concept-based Explanations
1.4 FCA and Pattern Structures for ML and XAI
1.5 Graph-based Representations of Text
1.6 Applying GNNs to Text Classification
1.7 Explaining GNNs via Significant Subgraphs
1.8 Informed ML
1.9 Summary and Gap Analysis
2 Theoretical Background
2.1 Introduction
2.2 Galois Connections
2.3 FCA
2.4 Pattern Structures
2.5 The gSOFIA Algorithm for Graph-based Pattern Mining
2.6 Theory and Implementation of CW
3 Inherently Explainable Document Classification via Concept Lattices: An FCA-based Approach
3.1 Introduction
3.2 Constructing Class-specific Concept Lattices
3.3 Baseline Aggregate Rule Classifier
3.4 Inherently Explainable Formal Concepts-based Classifier
3.5 Explanation Generation via Formal Concept Contributions
3.6 Results and Discussion
4 Inherently Explainable Document Classification via Stable Graph Patterns and Conceptual AMR Graphs: A Pattern Structures-based Approach
4.1 Introduction
4.2 Document-to-Graph Conversion
4.3 Mining Stable Graph Pattern Concepts
4.4 Aggregate Rule Classification
4.5 Explanation Generation
4.6 Results and Discussion
5 Explainable Document Classification via Concept Whitening and Stable Graph Patterns: A Hybrid Symbolic-Neural Approach
5.1 Introduction
5.2 Optimizing Concept Alignment via Graph Concepts
5.3 Suitability of Document Graphs for Concept Alignment
5.4 Design Choices for GNN Models in CW-based Document Graph Classification
5.5 Explainability Metrics for CW-based Document Graph Classification
5.6 Computational Cost Analysis
5.7 Results and Discussion
5.7.1 Ablation Analysis
5.7.2 Concept Sensitivity and Structure Analysis
5.7.3 Concept Alignment Visualization
5.7.4 Comparison with an Explainability-focused Baseline
Conclusion and Future Work
Summary of Contributions
Directions for Future Work
References
List of Figures
List of Tables
List of Algorithms
Author Contributions
Appendix: Перевод диссертации на русский язык (Russian Translation of the Dissertation)
Introduction to the Dissertation (from the Author's Abstract) on the Topic "Interpretation of Machine Learning Models Using Methods Based on Galois Connections"
Introduction
1 Related Work
2 Theoretical Background
3 Inherently Explainable Document Classification via Concept Lattices: An FCA-based Approach
4 Inherently Explainable Document Classification via Stable Graph Patterns and Conceptual AMR Graphs: A Pattern Structures-based Approach
5 Explainable Document Classification via Concept Whitening and Stable Graph Patterns: A Hybrid Symbolic-Neural Approach
Conclusion and Directions for Future Work
List of Figures
List of Tables
List of Algorithms
Author Contributions
Acknowledgments
First and foremost, I would like to thank God, whose grace and blessings have sustained me throughout this journey. All things are only possible through Him.
I am profoundly thankful to my parents, George Varghese and Susan George, for their unwavering and unconditional love, support, prayers, and understanding. Their belief in me has been a constant source of strength, without which I would have given up a long time ago. I also extend my heartfelt thanks to my grandmother, Lizzy Varghese, who has always kept me in her prayers. To my brother, Steve George—thank you for always standing by me.
I owe special thanks to my academic supervisor, Dr. Sergei O. Kuznetsov, for his insightful guidance, remarkable patience, continuous support, and for always pushing me to reach greater heights in my research. Through his mentorship, my eyes have truly been opened to what it means to be a researcher and to pursue science with depth, integrity, and full awareness.
I would also like to thank the members of the School of Data Analysis and Artificial Intelligence at the Faculty of Computer Science, HSE University, for providing a stimulating and supportive academic environment. Their expertise and feedback have greatly enriched this work.
To all who have contributed to this dissertation, whether directly or indirectly, I offer my sincere thanks.
Introduction
Motivation
Recent advances in the field of Machine Learning (ML), particularly those driven by neural networks, have enabled ML models to achieve performance that rivals or even surpasses that of humans across a broad spectrum of tasks, including image recognition [1, 2], natural language understanding [3, 4], speech processing [5, 6], and strategic gameplay [7, 8]. These models are typically composed of a large number of neural parameters and utilize deep, layered architectures that are capable of capturing highly non-linear relationships, thereby enabling them to generalize across a wide variety of tasks.
While the complexity of these models facilitates their superlative predictive power, it also renders their internal decision-making processes difficult to interpret, thereby making their decisions hard to explain or trust. The lack of transparency in how these models reach their conclusions limits their adoption in high-stakes domains such as healthcare, finance, and law, where the ability to explain a decision is often as critical as the decision itself. Accordingly, there is a growing consensus that explainability must be a core property of ML models, rather than simply a desirable side effect of their training process.
The lack of transparency has motivated the emergence of the field of Explainable Artificial Intelligence (XAI), which seeks to make model behavior more transparent and trustworthy. Interpretable ML models are those whose internal decision-making processes are transparent and whose decisions are explainable. The transparency of such models allows them to be termed white-box models. In contrast, ML models whose internal decision-making processes are opaque are termed black-box models.
Although there exists a wide array of explainability methods, a vital distinction [9] is the one between post-hoc explainability methods and inherently explainable models. Post-hoc explainability methods explain a model's decisions after it has been trained, whereas inherently explainable models incorporate explainability into their actual design. Inherently explainable models have some distinct advantages over post-hoc explainability methods. It was demonstrated in [10] that the predictive power of an ML model is not necessarily correlated with its explainability. Thus, it cannot be guaranteed that a post-hoc explainability method can generate coherent explanations for a model with high predictive power. [11] argues that inherently explainable models provide reliable explanations that reflect what the models actually learn, unlike those provided by post-hoc explainability methods.
Many explainability methods frame explanations in terms of the input features of the ML models they seek to explain. For example, [12] proposes various ways of perturbing input features across different models trained on diverse data types and observes the resulting changes in decisions to assess feature importance. [13] is a seminal work that also frames explanations around a model's input features, but does so by generating explanations in which the input features of a neural network are represented as abstract, high-level, significant concepts. Intuitively, a concept consists of perceptually similar examples of the input data that share some semantic meaning. A concept can be considered significant if its presence is influential for the decisions made by the model. The use of such concepts to explain models has led to the formation of a subfield of explainability methods known as concept-based explanations.
The notion of what exactly constitutes a concept in the subfield of concept-based explanations is ambiguous, and consequently, a concept does not have a formal mathematical definition. This ambiguity leads to problems such as being unable to identify relevant concepts and not being able to properly compare concepts regarding their impact on the predictions made by ML models. This dissertation addresses this foundational ambiguity by grounding the vague, abstract notion of a concept within the formal mathematical framework provided by methods based on Galois connections, such as Formal Concept Analysis (FCA) [14] and its extension, pattern structures [15].
FCA and pattern structures offer a principled approach to defining and organizing relationships between objects and their attributes using lattice theory. Concepts are structured as elements in partially ordered sets, enabling reasoning about relationships through algebraic operations such as closure, intersection, and subsumption. This dissertation proposes a series of inherently explainable models that leverage methods based on FCA and pattern structures to induce structured, concept-level representations of data that support transparent and formally justified classification decisions.
Document classification is chosen as the testbed for evaluating the proposed models for three reasons. First, it is a widely studied and practically relevant task that spans multiple domains, including legal, medical, and academic text processing, where explainability is critical. Second, documents can be naturally represented in forms suitable for FCA, such as binary keyword vectors, and similarly for pattern structures using document graphs derived from the text of the documents. Third, document classification often involves complex semantic phenomena that are difficult to capture with purely statistical models such as text embeddings, which makes it an ideal setting for evaluating concept-based explainability.
The central hypothesis of this dissertation is that methods based on Galois connections, such as FCA and pattern structures, can be used to develop inherently explainable document classification models without compromising predictive performance. Rather than treating explainability as an afterthought, the proposed approach incorporates it directly into the model's architecture through symbolic representations and algebraic operations such as closure, intersection, and subsumption that are applied to the model's training data. These operations enable classifiers to generate explanations that are consistent, transparent, and intelligible to human users.
In summary, this dissertation contributes to the ongoing shift within the field of XAI from post-hoc explainability toward inherently explainable models, formalized in this case through rigorous mathematical frameworks such as FCA and pattern structures. Document classification provides a suitable and demanding testbed for evaluating the models proposed in this dissertation. Ultimately, these models aim to overcome the longstanding trade-off [9] between explainability and predictive performance—a compromise that has traditionally required sacrificing one in favor of the other.
Aim and Objectives of the Research
The main aim of this dissertation is to create inherently explainable document classification models that generate formal, human-comprehensible explanations using methods based on Galois connections such as FCA and pattern structures. The objectives through which this aim is to be realized are as follows:
• Develop a document classification model based on FCA, where each document is represented as a binary keyword vector. Then, use the graph topology of concept lattices to perform classification and generate explanations in the form of interpretable keyword clusters.
• Extend this model using pattern structures by designing a custom document-to-graph conversion algorithm. This algorithm uses Abstract Meaning Representation (AMR) graphs [16] to construct document graphs from the text of each document.
• Implement a pattern structure-based model that uses document graphs to perform classification and to generate explanations through stable, significant graph patterns as well as semantically meaningful text fragments, using aggregate rules.
• Integrate Concept Whitening (CW) [17] into the pattern structure-based model, by replacing the aggregate rules with Graph Neural Networks (GNNs) for classifying the document graphs. CW is used to align the GNN's latent space with graph concepts derived via pattern structures.
• Evaluate a range of metrics to quantify the impact of explanations on classification performance, assess the quality of explanations, and measure the alignment between symbolic concepts and latent representations in the GNNs.
Scientific Novelty
This dissertation introduces the following novel contributions to the field of XAI:
• A unique document classification model that leverages the graph topology of class-specific lattices induced via FCA to classify other documents and generate keyword cluster-based explanations that are readily comprehensible to humans. By utilizing the topology of each class's concept lattice, the model effectively frames explanation generation as an edge prediction problem.
• A specialized algorithm that uses AMR graphs to convert documents into graph-based representations optimized for the graph intersection and subsumption operations used in pattern structures, which also circumvents some of the weaknesses of the gSOFIA algorithm [18] used here to mine concepts from document graphs.
• An innovative method for classifying document graphs using aggregate rules based on intersection and subsumption relations between graph patterns derived via pattern structures, and the input document graph. Each classification is paired with explanations highlighting key text fragments and influential graph patterns. The aggregate rules support multiple operational modes and penalty configurations, enabling flexible and adaptable classification.
• One of the first known applications of CW to graph classification—particularly, document graph classification—and the first to train CW using graph concepts that are automatically extracted via methods based on pattern structures.
Approbation of the Research
The results of this research have been published in the following peer-reviewed venues:
1. Eric George Parakal and Sergei O. Kuznetsov. "Intrinsically Interpretable Document Classification via Concept Lattices". In: Proceedings of the 10th International Workshop "What can FCA do for Artificial Intelligence?" co-located with the 31st International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2022), Vienna, Austria, July 23, 2022. Ed. by Sergei O. Kuznetsov, Amedeo Napoli, and Sebastian Rudolph. Vol. 3233. CEUR Workshop Proceedings. CEUR-WS.org, 2022, pp. 9-22. URL: https://ceur-ws.org/Vol-3233/paper2.pdf
2. Sergei O. Kuznetsov and Eric George Parakal. "Explainable Document Classification via Pattern Structures". In: Proceedings of the Seventh International Scientific Conference "Intelligent Information Technologies for Industry" (IITI'23). Ed. by Sergey Kovalev, Igor Kotenko, and Andrey Sukhanov. Cham: Springer Nature Switzerland, 2023, pp. 423-434. ISBN: 978-3-031-43789-2
3. Eric George Parakal et al. "Document Classification via Stable Graph Patterns and Conceptual AMR Graphs". In: Conceptual Knowledge Structures - First International Joint Conference, CONCEPTS 2024, Cádiz, Spain, September 9-13, 2024, Proceedings.
Ed. by Inma P. Cabrera, Sébastien Ferré, and Sergei A. Obiedkov. Vol. 14914. Lecture Notes in Computer Science. Springer, 2024, pp. 286-301. DOI: 10.1007/978-3-031-67868-4_19. URL: https://doi.org/10.1007/978-3-031-67868-4_19
4. Eric George Parakal et al. "Explainable Document Classification via Concept Whitening and Stable Graph Patterns". Manuscript under review at IEEE Access. 2025
In addition, the findings were presented at the following scientific events:
• The Scientific Conference of the Faculty of Computer Science, 2023, held at HSE University.
• The 7th Workshop on Computational Linguistics and Language Science, 2023, held at HSE University.
Key Aspects to be Defended
This dissertation defends the following core claims, each of which contributes to the theoretical and practical advancement of creating inherently explainable models:
• FCA can be used to construct document classifiers that are both explainable and effective, by using the graph topology of class-specific lattices to classify other documents and generate explanations. This demonstrates that FCA can identify patterns in data that are simultaneously useful for classification and explanation.
• Methods based on pattern structures can be successfully applied to document graphs derived using AMR graphs, in order to classify the document graphs and generate explanations for the classification decisions, using aggregate rules. This confirms that document graphs can be tailored for compatibility with pattern structures, and that pattern structures are effective in extracting meaningful graph patterns for both classification and explanation.
• CW can be successfully integrated with GNN-based document classifiers to align latent representations with semantically meaningful graph concepts derived via pattern structures, thereby creating a hybrid neural-symbolic model that preserves explainability without sacrificing predictive performance.
• The proposed models demonstrate that symbolic and neural-symbolic approaches formulated using Galois connections and realized using FCA and pattern structures are practically viable, and can offer transparent and semantically meaningful explanations while maintaining predictive performance comparable to standard black-box models.
Structure of the Dissertation
This dissertation is structured as follows: Chapter 1 reviews related work in concept-based explanation methods, the use of FCA and pattern structures in the field of XAI, and prior studies on representing texts as graphs, using GNNs for text classification, and generating explanations for GNN decisions in terms of significant subgraphs. Chapter 2 presents the theoretical background on Galois connections, FCA, pattern structures, the gSOFIA algorithm, and CW. Chapter 3 details the design and functioning of the inherently explainable document classification model based on FCA and the graph topology of concept lattices. Chapter 4 describes the document-to-graph conversion algorithm in detail, and how the pattern structure-based aggregate rule classifier model performs classification and produces paired explanations for the input document graphs. Chapter 5 focuses on integrating CW into GNNs that are trained to classify the document graphs, thereby enhancing interpretability by aligning latent representations with symbolic graph concepts extracted using methods based on pattern structures.
Chapter 1 Related Work
1.1 Introduction
This chapter surveys prior works that are the most relevant to the main research objectives. The focus lies at the intersection of XAI, FCA, pattern structures, and structured graph representations of textual data. Special attention is also given to how stochastic learning methods such as GNNs can operate on these structured graph representations while incorporating explainability. Here, informed ML emerges as a promising strategy for embedding explainability into the learning process. The discussion is organized around seven interrelated lines of research.
Section 1.2 introduces foundational taxonomies of explainability and considers how the models developed in this dissertation can be situated within these taxonomies. Section 1.3 outlines various concept-based explanation methods, including both early latent vector methods and more recent methods that integrate concepts into the model architecture. Section 1.4 discusses how FCA and pattern structures are applied to ML and XAI. Section 1.5 reviews structured graph representations of text, with a focus on AMR graphs and their advantages over other syntactic-based graph representations. Section 1.6 examines the application of GNNs to text classification tasks, while Section 1.7 surveys subgraph-based methods for explaining GNN predictions. Section 1.8 explores informed ML as a way to embed prior knowledge into models, thereby enhancing explainability. Together, these sections provide the theoretical and methodological foundation for the contributions that follow in this dissertation. Finally, Section 1.9 concludes with a synthesis of key findings and a discussion of the research gaps in the reviewed works that this dissertation addresses.
1.2 Taxonomies of Explainability
A growing body of work has sought to formalize the goals, dimensions, and evaluation criteria of explainability methods for ML models. [23] proposed a principled taxonomy based on the mode of evaluation. Application-grounded methods involve real tasks with domain experts. Human-grounded methods involve simplified tasks with ordinary users. Functionally-grounded methods forgo human evaluation entirely in favor of formal proxies such as sparsity or stability. The third category remains highly relevant for explainability methods that operate without annotated explanations, such as those presented in this dissertation, by providing guidance on how to justify and validate explainability claims in the absence of user studies. [24] introduces a taxonomy based on what is being explained: the model's processing, its internal representations, or explanations generated as part of the model's output. This complements other frameworks by distinguishing between explainability methods that emulate behavior, probe internal structure, or inherently produce explanations. Their discussion of the trade-off between explainability and completeness also aligns with the broader distinction discussed earlier, between post-hoc explainability methods and inherently explainable models.
[9] further extended these ideas by proposing a comprehensive taxonomy that classifies explainability methods along several directions. The first direction is the distinction between post-hoc explainability methods and inherently explainable models. The second important distinction is that between global and local explainability methods. Global explainability methods aim to provide insight into the model's overall decision-making process across its complete input space, often doing so by exposing the internal structure or rules of the model. Local explainability methods focus on explaining individual predictions, typically by analyzing the model's behavior near a specific input. The third direction distinguishes model-specific and model-agnostic explainability methods. Model-specific methods are tailored toward explaining the internal workings of a particular model and often exploit architectural details such as gradients in neural networks, whereas model-agnostic methods treat the model as a black-box model and explain its behavior based solely on observed inputs and outputs. [25] consolidates these taxonomies and offers practical guidance on how to select explainability methods based on model type, stakeholder needs, and intended use. The framework in [25] also highlights the trade-offs between explainability, fidelity, and generalizability, and illustrates how explanation types map to different user goals. For example, global transparency is essential in high-stakes decision-making settings, while local explanations may suffice for debugging or user-facing model interfaces.
Based on the taxonomies discussed above, all of the models proposed in this dissertation are inherently explainable. Depending on the specific model, the explanations they provide may be either global or local in scope. Likewise, the models incorporate either model-specific or model-agnostic explainability methods, depending on the architectural context and the form of symbolic alignment employed.
1.3 Concept-based Explanations
[13] was the paper that introduced the idea of concept-based explanation methods. It shifted the focus of explainability away from low-level input feature-based methods such as saliency maps [26, 27, 28], and instead aimed to quantify the influence of high-level, human-comprehensible concepts on a model's predictions. In this approach, a concept is represented by a set of perceptually similar data instances selected by a human. These instances are used to train a linear classifier to distinguish concept-aligned activations from random ones in the latent space of a trained model. The resulting direction in activation space, represented by a Concept Activation Vector (CAV), is then used to measure the directional sensitivity of a model's output with respect to that concept.
Formally, let $v_C^l \in \mathbb{R}^m$ be a unit CAV defined for a concept $C$ in layer $l$ of a model, and let $f_l(x)$ be the activations for an input $x$ at layer $l$. The directional derivative $S_{C,k,l}(x)$ measures the "conceptual sensitivity" of class $k$ to concept $C$ and is defined as

$$S_{C,k,l}(x) = \nabla h_{l,k}(f_l(x)) \cdot v_C^l \quad (1.1)$$

where $h_{l,k} : \mathbb{R}^m \to \mathbb{R}$ denotes the logit output of the model for an input $x$ for the class $k$ and layer $l$. Let $X_k$ denote all inputs that belong to class $k$; then the TCAV score for the concept $C$, class $k$, and layer $l$ is defined as

$$\mathrm{TCAV}_{Q_{C,k,l}} = \frac{\left|\{x \in X_k : S_{C,k,l}(x) > 0\}\right|}{\left|X_k\right|} \quad (1.2)$$
The TCAV score thus measures the proportion of samples in class k for which increasing the alignment with concept C increases the class score.
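To make the procedure concrete, the following is a minimal NumPy and scikit-learn sketch of the CAV derivation and of Equations (1.1) and (1.2); it is not taken from [13], and the activations, gradients, and dimensions are synthetic stand-ins rather than outputs of a real model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Fit a linear classifier separating concept vs. random activations at layer l;
    the unit normal of its decision boundary serves as the Concept Activation Vector."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = LogisticRegression(max_iter=1000).fit(X, y).coef_.ravel()
    return w / np.linalg.norm(w)

def tcav_score(grads_class_k, cav):
    """Eq. (1.2): fraction of class-k inputs whose class-k logit increases along the CAV.

    grads_class_k has shape (n_inputs, m): one gradient of h_{l,k}(f_l(x)) with respect
    to the layer-l activations per input x in X_k, so each row dotted with the CAV is
    the directional derivative S_{C,k,l}(x) of Eq. (1.1)."""
    sensitivities = grads_class_k @ cav
    return float(np.mean(sensitivities > 0))

# Synthetic stand-ins for layer-l activations and per-input gradients (m = 16).
rng = np.random.default_rng(0)
cav = compute_cav(rng.normal(1.0, 1.0, (50, 16)), rng.normal(0.0, 1.0, (50, 16)))
print(tcav_score(rng.normal(0.0, 1.0, (200, 16)), cav))  # close to 0.5 for random gradients
```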
Because the method is based entirely on post-hoc analysis of latent activations, it can be applied to any differentiable model without requiring architectural changes or retraining. However, the reliance on manually labeled concept examples introduces user bias, and the method assumes linear separability of concepts within the latent space. Moreover, since concepts are externally imposed and do not emerge from the training process itself, TCAV lacks integration with the model's internal reasoning. These shortcomings motivate a series of subsequent works aimed at automating concept discovery, validating concept completeness, and embedding concepts directly into model structure, thereby leading to more principled and scalable frameworks.
Building directly on the limitations of TCAV, Automated Concept-based Explanation (ACE) was proposed in [29], to eliminate the need for manually defined concept sets by automating their discovery. Rather than requiring a human to annotate examples corresponding to each concept, ACE clusters semantically meaningful image segments directly from the input data in an unsupervised fashion, thereby allowing concept explanations to be derived without predefined annotations. The motivation for ACE lies in addressing two critical problems in TCAV: the subjectivity of user-defined concepts and the scalability limitations introduced by manual concept curation.
The ACE pipeline begins by selecting a set of inputs from a single class and applying multiresolution superpixel decomposition to each image. This segmentation process is performed at several granularities, e.g., 15, 50, or 80 segments, to capture visual concepts at different levels of abstraction. Each segment is then passed through a pre-trained Convolutional Neural Network (CNN), and the activations at a selected internal layer are used to embed the segment into a latent space. Candidate concepts are obtained by clustering these segment embeddings using k-means clustering, with outlier removal to ensure within-cluster coherence. Each resulting cluster is treated as a candidate concept.
To evaluate the relevance of each cluster, ACE applies the same directional derivative-based sensitivity measure defined in Equation (1.1). That is, for each discovered concept cluster, a CAV is derived by fitting a linear classifier in the latent space. Then, the TCAV score is computed to quantify how often increasing alignment with that cluster's CAV leads to an increase in the class logit. Clusters that yield high sensitivity scores are retained as valid explanatory concepts.
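The clustering step can be illustrated with the short sketch below, which assumes the segment embeddings have already been extracted from a pre-trained CNN; the function name, the number of clusters, and the outlier-removal rule are illustrative choices rather than the exact settings of [29].

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_concepts(segment_embeddings, n_concepts=25, keep_closest=40):
    """Cluster CNN embeddings of image segments into candidate concepts (ACE-style).

    For each cluster, only the `keep_closest` segments nearest to the cluster centre
    are retained; distant segments are dropped as outliers to keep clusters coherent.
    Returns a list of index arrays, one per candidate concept."""
    km = KMeans(n_clusters=n_concepts, n_init=10, random_state=0).fit(segment_embeddings)
    concepts = []
    for c in range(n_concepts):
        idx = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(segment_embeddings[idx] - km.cluster_centers_[c], axis=1)
        concepts.append(idx[np.argsort(dists)[:keep_closest]])
    return concepts

# Synthetic segment embeddings standing in for activations of a pre-trained CNN.
emb = np.random.default_rng(1).normal(size=(500, 128))
clusters = candidate_concepts(emb, n_concepts=5, keep_closest=20)
# Each retained cluster would then be scored with the TCAV procedure sketched above:
# fit a CAV for the cluster and keep clusters whose TCAV score is clearly above chance.
```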
This approach significantly improves the objectivity and scalability of concept-based explanations by shifting the discovery process from human intuition to data-driven embedding and clustering. However, the semantic coherence of the extracted clusters is not guaranteed, and ACE remains confined to domains where perceptual similarity aligns well with conceptual similarity, such as vision. Furthermore, the method does not address the foundational challenge of defining what constitutes a concept beyond intuitive or statistical groupings in activation space, or perceptual similarity in the input domain.
Nevertheless, ACE represents an important evolution of concept-based explanation methods by demonstrating that concepts can be discovered rather than prescribed. This transition from hand-crafted to emergent concepts sets the stage for further refinements, including formal assessments of concept completeness and the integration of concept variables into the model itself.
ConceptSHAP was proposed in [30] to address a central limitation of early concept-based explanation methods: the lack of formal mechanisms to assess how faithfully a set of concepts explains a model's behavior. Whereas TCAV and ACE emphasize directional sensitivity or unsupervised discovery, ConceptSHAP introduces a game-theoretic approach to quantify the importance of each concept in terms of its contribution to a model's predictive performance when used as an intermediate representation.
Let $x_1, x_2, \ldots, x_n$ be training examples with labels $y_1, \ldots, y_n$. A pre-trained model $f$ can be decomposed into the following two parts: $\Phi(\cdot) \in \mathbb{R}^d$, which maps an input $x_i$ to an intermediate layer, and $h_y(\Phi(x_i))$, which calculates the probability of $x_i$ being predicted as label $y$ by $f$. Now suppose there is a set of $m$ concept vectors represented by unit vectors $c_1, \ldots, c_m$, each representing a linear direction in the activation space of $\Phi(\cdot)$. For each input $x$, a concept score $v_c(x)$ is defined such that the completeness of a set of concepts can be determined by the likelihood of being able to recover the model's predictions using only the concept scores. Stated formally, if the concept scores $v_c(\cdot)$ are sufficient statistics for the model's output, then there should exist some surrogate model $g$ such that $h(g(v_c(x))) \approx f(x)$. For the prediction model $f(x) = h(\Phi(x))$, the extent to which $g$ replicates $f$ defines the completeness of a set of concept vectors $c_1, \ldots, c_m$ as:

$$\eta_f(c_1, \ldots, c_m) = \frac{\sup_g \Pr_{x,y \sim V}\left[y = \arg\max_{y'} h_{y'}(g(v_c(x)))\right] - a_r}{\Pr_{x,y \sim V}\left[y = \arg\max_{y'} f_{y'}(x)\right] - a_r} \quad (1.3)$$

where $V$ denotes the set of validation data, $\sup_g \Pr_{x,y \sim V}\left[y = \arg\max_{y'} h_{y'}(g(v_c(x)))\right]$ is the best possible accuracy obtained by predicting the label given only the concept scores $v_c$, and $a_r$ is the accuracy of a random baseline model.
To attribute importance to individual concepts within the set, ConceptSHAP computes Shapley values [31] based on their marginal contributions to completeness. Given a set of concepts $C_S = \{c_1, \ldots, c_m\}$ and some completeness score $\eta$, the ConceptSHAP value $s_i$ for concept $c_i$ is given by:

$$s_i(\eta) = \sum_{S \subseteq C_S \setminus \{c_i\}} \frac{(m - |S| - 1)!\,|S|!}{m!} \left[\eta(S \cup \{c_i\}) - \eta(S)\right] \quad (1.4)$$

These scores reflect the average improvement in completeness achieved by including concept $c_i$, considering all possible subsets $S \subseteq C_S \setminus \{c_i\}$.
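A brute-force sketch of Equation (1.4) is shown below; it assumes a completeness function $\eta$ over concept subsets is supplied (for example, estimated via a trained surrogate $g$) and enumerates all subsets, so it is practical only for small concept sets. The toy completeness function is purely illustrative.

```python
from itertools import combinations
from math import factorial

def concept_shap(eta, concepts):
    """Shapley value of each concept with respect to a completeness score eta (Eq. 1.4).

    eta: callable mapping a frozenset of concepts to a completeness score.
    Enumerates all subsets, so it is only practical for a handful of concepts."""
    m = len(concepts)
    shap = {}
    for c in concepts:
        rest = [d for d in concepts if d != c]
        value = 0.0
        for r in range(m):
            for subset in combinations(rest, r):
                s = frozenset(subset)
                weight = factorial(m - len(s) - 1) * factorial(len(s)) / factorial(m)
                value += weight * (eta(s | {c}) - eta(s))
        shap[c] = value
    return shap

# Toy completeness score: each concept adds a fixed gain, capped at 1.0.
gains = {"c1": 0.5, "c2": 0.3, "c3": 0.1}
eta = lambda s: min(1.0, sum(gains[c] for c in s))
print(concept_shap(eta, list(gains)))  # c1 receives the largest Shapley value
```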
ConceptSHAP thus provides a principled, axiomatically grounded method for ranking concepts by their explanatory utility. It formalizes intuitive notions such as completeness and sufficiency—two criteria that are frequently invoked but rarely defined in earlier concept-based explanation methods. Nonetheless, the reliance on a trained surrogate introduces potential estimation bias, and the method assumes that concept representations are fixed and given, without addressing how to discover or validate them.
As a result, ConceptSHAP is best interpreted as a tool for evaluating and interpreting concept sets produced by other methods, such as ACE. It also marks a shift toward framing interpretability in terms of representational sufficiency, which is an idea that resonates strongly with the aims of this dissertation, particularly in formalizing hierarchical relationships among concepts through Galois connections, where the use of FCA and pattern structures makes the underlying notion of order explicit.
Concept Bottleneck Models (CBMs) were proposed in [32] as a method for making the decision-making process of neural networks both interpretable and controllable by design. In contrast to post-hoc explanation methods such as TCAV, ACE, and ConceptSHAP, CBMs enforce a specific intermediate representation consisting of user-defined concepts through which all predictions must pass. This architecture provides a transparent mechanism for understanding and intervening in the model's predictions.
A CBM decomposes the prediction process into two explicit stages: concept prediction and label prediction. Formally, given an input $x \in \mathbb{R}^d$ and a label to be predicted $y \in \mathbb{R}$, the training points for the model are defined as $\{(x^{(i)}, y^{(i)}, c^{(i)})\}_{i=1}^{n}$, with $c \in \mathbb{R}^k$ being a vector of $k$ concepts, where $c^{(i)} \in \mathbb{R}^k$ denotes the vector of $k$ ground-truth concept annotations for the $i$th example.
The prediction of a CBM is given by $\hat{y} = f(g(x))$, where $g : \mathbb{R}^d \to \mathbb{R}^k$ maps an input $x$ to the concept space and $f : \mathbb{R}^k \to \mathbb{R}$ maps the predicted concepts to the final prediction $\hat{y}$. The performance of the CBM is measured by how well $f(g(x))$ predicts $y$.
Several training schemes are proposed for optimizing the working of CBMs. In the independent setting, $f$ is trained using the ground-truth concepts $c$, while $g$ is trained separately to predict concepts from $x$. In the sequential setting, $g$ is trained first, and its predictions $\hat{c} = g(x)$ are then used to train $f$. In the joint setting, both $f$ and $g$ are trained end-to-end using a composite loss function:

$$\hat{f}, \hat{g} = \arg\min_{f,g} \sum_i \left[ L_Y\!\left(f(g(x^{(i)})); y^{(i)}\right) + \sum_j \lambda\, L_{C_j}\!\left(g(x^{(i)}); c_j^{(i)}\right) \right] \quad (1.5)$$

where $L_Y$ is the label prediction loss and $L_{C_j}$ is the loss for predicting concept $j$, with a weighting factor $\lambda$ controlling the trade-off.
CBMs permit test-time intervention, thereby allowing a user to overwrite the predicted concept vector $\hat{c}$ with manually specified values in order to simulate counterfactual scenarios. This enables counterfactual reasoning and model debugging in a way that standard black-box models do not support. While CBMs provide a high degree of explainability by design, they also require access to concept annotations during training, and their performance itself is sensitive to the quality and completeness of the concept set. Moreover, the space of representable functions is restricted to only those that can be expressed as compositions over concept variables, which limits expressivity in unconstrained settings.
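The joint objective of Equation (1.5) and a test-time intervention can be sketched in a few lines of PyTorch, as shown below; the module sizes are arbitrary, the per-concept sum is collapsed into a single binary cross-entropy term over the concept vector, and the data are random stand-ins, so this is only a schematic of the CBM training loop rather than the reference implementation of [32].

```python
import torch
import torch.nn as nn

d, k, n_classes, lam = 32, 6, 4, 0.5          # input dim, #concepts, #classes, lambda
g = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, k))   # x -> concept scores
f = nn.Linear(k, n_classes)                                        # concepts -> label logits
opt = torch.optim.Adam(list(g.parameters()) + list(f.parameters()), lr=1e-3)
label_loss, concept_loss = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()

def joint_step(x, y, c):
    """One optimisation step of the joint CBM objective (Eq. 1.5), concepts assumed binary."""
    c_hat = g(x)                               # predicted concept scores
    loss = label_loss(f(c_hat), y) + lam * concept_loss(c_hat, c)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

x = torch.randn(16, d)
y = torch.randint(0, n_classes, (16,))
c = torch.randint(0, 2, (16, k)).float()
joint_step(x, y, c)

# Test-time intervention: overwrite one predicted concept with its ground-truth value.
with torch.no_grad():
    c_hat = g(x)
    c_hat[:, 0] = c[:, 0]                      # intervene on concept 0
    y_intervened = f(c_hat).argmax(dim=1)      # labels under the counterfactual concepts
```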
CBMs also represent a structural shift in the design of explainable models by incorporating concepts not merely as post-hoc descriptors but as integral components of the predictive process. This architectural enforcement of intermediate semantics resonates with the aims of this dissertation, particularly in making structural relationships among concepts explicit. While CBMs rely on intermediate representations expressed as vectors, the work in this dissertation extends the idea by embedding concept structure within the model's representation space as partially ordered sets derived using FCA and pattern structures, thus providing a symbolic and mathematically grounded account of conceptual reasoning.
In the field of XAI, concept-based explanation methods favor explanations expressed as semantically meaningful, human-aligned abstractions, in contrast to older explainability methods that framed explanations as low-level, instance-specific feature attributions. Early works such as TCAV introduced the idea of quantifying a model's sensitivity to concepts using activation-space directions. Subsequent works such as ACE automated the discovery of such concepts via unsupervised clustering, while ConceptSHAP formalized the explanatory utility of a concept set through completeness and sufficiency, quantified using Shapley values. Then, CBMs integrated concepts directly into the model's architecture, which allowed counterfactual reasoning and test-time interventions. Despite these advances, most concept-based explanation methods still use vector-based representations and lack formal mechanisms to reason about the structure and interdependencies among concepts. They also typically assume that concepts are either predefined or statistically emergent, without offering a mathematical framework to capture their internal organization or relational semantics.
Through the use of mathematical frameworks such as FCA and pattern structures, the models developed in this dissertation move beyond isolated concept scoring and toward a lattice-based representation of conceptual relationships. Thus, explainability emerges not merely from salience or alignment, but from the structural properties of the concept space itself, thereby capturing hierarchical relationships in a principled manner.
1.4 FCA and Pattern Structures for ML and XAI
FCA offers a mathematically grounded approach to uncovering explainable structure in data by converting object-attribute relationships into concept lattices. While originally developed for exploratory data analysis, FCA has since evolved into a broader paradigm for symbolic ML, which is capable of modeling hypotheses, inducing conceptual hierarchies, and enabling data classification. A key strength of FCA lies in its ability to represent knowledge through minimal, algebraically closed units of meaning: formal concepts. This lends it immediate relevance to the goals of explainability, where transparent and structured reasoning is often preferable to opaque statistical approximation.
This section surveys eight works that illustrate how FCA has been applied across diverse ML applications while preserving its core interpretability. It begins with foundational work done in [33], which formalizes FCA as a learning model grounded in hypothesis spaces and classification. From there, it turns to [34], which extends FCA to the numerical domain via interval-based pattern structures, and [35], which applies FCA to induce conceptual hierarchies from syntactic co-occurrences in unstructured text. The section then considers the work done in [36], where concept lattices are used to create neural network architectures at design time. It also reviews the series of studies [37, 38, 39] that combine FCA with cooperative game theory to quantify attribute importance in concept-based learning. Finally, it concludes with the work done in [40], where FCA is used in a post-hoc manner to explain the predictions of black-box neural networks. Together, these works trace the integration of FCA and pattern structures into both symbolic and hybrid learning pipelines aimed at preserving explainability, a goal directly pursued by the models developed in this dissertation.
[33] is a seminal work that positions FCA as a full-fledged framework for supervised learning, unifying rule extraction, hypothesis formulation, and classification within a lattice-theoretic perspective. Rather than treating FCA as a purely descriptive tool, this work formalizes the notion of learning from positive and negative examples using formal concept intents, where hypotheses correspond to closed sets of attributes covering only one class. Central to this model is the idea of minimal counterexample-free hypotheses, which provide compact, interpretable representations of regularities in the data and support the idea of the most specific generalization. Beyond classical object-attribute contexts, the paper extends these ideas to pattern structures, enabling learning over complex data types like graphs or intervals through meet-semilattices. Notably, this pattern-based extension preserves core FCA semantics while broadening its applicability to real-world ML tasks.
The work also connects FCA to core learning constructs such as decision trees and version spaces, showing how concepts and Galois connections can subsume and generalize these structures. This leads to a formal correspondence between concept closures and classification paths, thereby offering a principled alternative to heuristic tree induction. Algorithmic considerations are addressed in depth, including tractable hypothesis generation under structural constraints and the use of projection operators to manage complexity in high-dimensional pattern spaces. Applications such as spam filtering and predictive toxicology illustrate the practical utility of the approach. Overall, the contributions of [33] establish FCA not only as a symbolic data analysis tool but as a theoretical and computational framework for explainable learning, laying the essential groundwork for later efforts to integrate FCA with structured data and hybrid symbolic-neural models.
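The following toy sketch illustrates the derivation operators, the closure they induce, and classification with counterexample-free (positive) hypotheses in the spirit of [33]; the context, the class split, and the brute-force enumeration are purely illustrative and do not reflect the algorithms used later in this dissertation.

```python
from itertools import combinations

# Toy formal context split by class: objects are documents, attributes are their keywords.
positive = {"d1": {"ball", "goal", "team"},     # documents of the target class
            "d2": {"ball", "team", "coach"}}
negative = {"d3": {"ball", "market"}}           # documents of the other classes

def intent(objects, context):
    """Derivation operator A': the attributes shared by all objects in A."""
    sets = [context[o] for o in objects]
    return set.intersection(*sets) if sets else set()

def extent(attrs, context):
    """Derivation operator B': the objects possessing every attribute in B."""
    return {o for o, a in context.items() if attrs <= a}

def closure(attrs):
    """Closure B'' in the positive context; closed sets are the intents of formal concepts."""
    return intent(extent(attrs, positive), positive)

# Candidate hypotheses: intents of subsets of positive examples (closed in the positive context).
candidates = {frozenset(intent(sub, positive))
              for r in range(1, len(positive) + 1)
              for sub in combinations(positive, r)}
# Keep only counterexample-free hypotheses: intents not contained in any negative description.
hypotheses = {h for h in candidates if not any(h <= neg for neg in negative.values())}

def classify(doc_attrs):
    """Predict the target class iff the document's description contains some hypothesis."""
    return any(h <= doc_attrs for h in hypotheses)

print(closure({"ball"}))                    # {'ball', 'team'}: smallest closed set containing "ball"
print(classify({"ball", "team", "fan"}))    # True: the hypothesis {'ball', 'team'} is contained
```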
In response to the limitations of FCA in handling numerical data, [34] proposed two mathematically equivalent methods for extracting interpretable patterns from gene expression datasets: one that uses interordinal scaling and the other based on interval-valued pattern structures. The first approach transforms numerical expression profiles into a dense binary context by introducing inequality-based attributes, e.g., "$m_i < 5$" or "$m_i > 4$", which enable conventional FCA algorithms to operate without any loss of information. However, the resulting context often becomes computationally expensive due to its size and density. To address this, a second method was introduced that directly models numerical values as intervals within a pattern structure framework. This approach replaces binary attributes with semilattice-structured interval descriptions and applies adapted FCA algorithms to discover closed patterns, i.e., concepts that are defined by shared expression ranges across biological conditions.
The interval pattern structure retains the formal rigor of FCA while improving both scalability and readability, particularly when the number of distinct numerical values is high. An important contribution is the introduction of an interestingness constraint that limits the width of intervals to prioritize patterns representing coherent gene groups. Experimental results on large-scale fungal gene expression data demonstrate that this method not only outperforms interordinal scaling in computational efficiency, but also yields biologically meaningful insights, including the discovery of co-expressed genes involved in nutrient remobilization and metabolic regulation. This work is among the earliest to demonstrate the practical value of FCA and pattern structures on real-valued scientific data, and it directly informs this dissertation's use of structured representations in explainable classification models.
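The interval-pattern similarity operation at the heart of [34] can be sketched as a componentwise convex hull of interval vectors, as below; the expression values and the interestingness threshold are invented for illustration.

```python
def interval_meet(desc_a, desc_b):
    """Meet of two interval-vector descriptions: componentwise convex hull.

    Each description is a list of (low, high) intervals, one per numerical attribute.
    The result is the most specific description subsuming both arguments."""
    return [(min(a_lo, b_lo), max(a_hi, b_hi))
            for (a_lo, a_hi), (b_lo, b_hi) in zip(desc_a, desc_b)]

def subsumes(general, specific):
    """A description subsumes another iff each of its intervals contains the corresponding one."""
    return all(g_lo <= s_lo and s_hi <= g_hi
               for (g_lo, g_hi), (s_lo, s_hi) in zip(general, specific))

# Two gene expression profiles over three conditions, as degenerate intervals [v, v].
g1 = [(5.0, 5.0), (7.2, 7.2), (1.1, 1.1)]
g2 = [(4.6, 4.6), (7.9, 7.9), (1.4, 1.4)]
common = interval_meet(g1, g2)           # [(4.6, 5.0), (7.2, 7.9), (1.1, 1.4)]
print(common, subsumes(common, g1))      # the meet subsumes both original profiles

# An interestingness constraint would discard patterns whose interval widths exceed a threshold.
max_width = 0.5
is_interesting = all(hi - lo <= max_width for lo, hi in common)
```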
[35] presents one of the earliest and most thorough applications of FCA to working with unstructured texts, demonstrating how concept lattices can be used to automatically induce hierarchical taxonomies from natural language corpora. Their method begins by creating a dependency-parsed corpus, from which syntactic relations such as verb-subject, verb-object, and verb-prepositional phrase pairs are extracted. These co-occurrence patterns are then treated as formal attributes describing domain-specific terms, which serve as the formal objects in the FCA context. By applying standard FCA to this data, the method constructs concept lattices that encode conceptual groupings, which are then transformed and pruned into taxonomic structures more suitable for ontological applications. A key innovation is the use of statistical weighting and smoothing techniques to filter noisy dependencies and address sparsity, which enables more robust extraction of contextually relevant generalizations.
Compared to hierarchical clustering algorithms on corpora from the tourism and finance domains, the FCA-based approach achieved higher recall and explainability, despite a modest trade-off in precision. Also, while the generated hierarchies are dependent on linguistic usage patterns and thus inherently corpus-bound, the authors emphasize their practical utility for bootstrapping domain ontologies. Their approach not only demonstrates the viability of FCA for text mining, but also provides a compelling case for its use in conceptual structure learning, which is a central idea of the symbolic document classification models developed later in this dissertation.
[36] proposed a novel method for constructing neural network architectures by leveraging the graph topology of the concept lattices derived from formal contexts. Each neuron in the hidden layers of the neural network corresponds to a formal concept, and the connections between them mirror the covering relation in the concept lattice, i.e., they preserve semantic structure within the neural network itself. Input neurons represent attributes, while output neurons represent class labels, creating a model in which classification flows through interpretable concept layers. Concept selection is guided by measures such as F1 score, as well as the purity and coverage of the concepts, thereby enabling the pruning of weak or redundant nodes and yielding sparse architectures with built-in explainability.
Two variants of the model were proposed: the first based on standard, i.e., antitone Galois connections, and the second based on monotone Galois connections that yield disjunctive formal concepts. The latter supports activation semantics more naturally aligned with neural computation, especially under thresholding activation functions. Experiments on UCI datasets such as the Breast Cancer [41], Credit Card Default [42], and Mammographic Mass [43] datasets show that concept lattice-based networks can approach or match the performance of standard neural network architectures while offering significantly more transparency. More importantly, the construction of the neural network is informed by domain-relevant patterns extracted prior to training, marking a shift from black-box architectures to symbolic, data-dependent design. This hybridization of symbolic structure and neural execution serves as a precursor to the neural-symbolic synthesis explored later in this dissertation.
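The construction in [36] can be caricatured as a feed-forward network whose connectivity is masked by a (here hand-made) concept lattice: hidden units stand for selected formal concepts, input units for attributes, and output units for classes. The toy lattice, the concept-to-class rule, and all dimensions below are illustrative simplifications rather than the authors' actual architecture.

```python
import torch
import torch.nn as nn

attributes = ["ball", "goal", "team", "market", "stock"]
# Hidden units = selected formal concepts, identified by their intents.
concept_intents = [{"ball", "goal"}, {"ball", "team"}, {"market", "stock"}]
classes = ["sport", "finance"]
concept_class = [0, 0, 1]              # class whose examples each concept covers

# Binary masks encoding the lattice structure: attribute -> concept, concept -> class.
m_in = torch.tensor([[1.0 if a in intent else 0.0 for a in attributes]
                     for intent in concept_intents])                  # (concepts, attributes)
m_out = torch.tensor([[1.0 if concept_class[c] == k else 0.0
                       for c in range(len(concept_intents))]
                      for k in range(len(classes))])                  # (classes, concepts)

class LatticeNet(nn.Module):
    """Feed-forward network whose connectivity is fixed by the concept lattice masks."""
    def __init__(self):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn_like(m_in) * 0.1)
        self.w2 = nn.Parameter(torch.randn_like(m_out) * 0.1)

    def forward(self, x):                        # x: (batch, attributes), binary indicators
        h = torch.relu(x @ (self.w1 * m_in).T)   # only lattice-permitted weights are active
        return h @ (self.w2 * m_out).T           # class logits

net = LatticeNet()
x = torch.tensor([[1.0, 0.0, 1.0, 0.0, 0.0]])    # a document containing "ball" and "team"
print(net(x))                                     # class logits (untrained, for illustration)
```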
A closely related line of research applies FCA to quantifying interpretability, by combining concept-based classification with power indices from cooperative game theory to assess the contribution of individual attributes. [37] introduces this idea in the context of the JSM-method, an FCA formulation of inductive reasoning in which minimal, counterexample-free intents serve as classification hypotheses. While such hypotheses are intrinsically interpretable, they do not convey the relative influence of their constituent attributes. To address this, the authors compute Shapley values over the coalitions of attributes present in an example, treating a coalition as a "winning" one if it entails a positive or negative hypothesis. The resulting per-object Shapley vector provides a local, model-specific explanation of the decision in terms of the contributions of its individual attributes.
References of the dissertation research, Candidate of Sciences Eric George Parakal, 2025
List of Tables
2.1 A formal context with 8 objects and 7 attributes
3.1 Classification results using the baseline aggregate rule classifier on the 10 Newsgroups dataset
3.2 Classification results using the explainable formal concepts-based classifier on the 10 Newsgroups dataset
3.3 Classification results using the baseline aggregate rule classifier on the BBC Sport dataset
3.4 Classification results using the explainable formal concepts-based classifier on the BBC Sport dataset
3.5 Formal concepts that contributed the most to classifying test documents by class (10 Newsgroups dataset)
3.6 Formal concepts that contributed the most to classifying test documents by class (BBC Sport dataset)
4.1 Document classification results of the baseline models for both datasets
4.2 Document classification results of the aggregate rule classifier under all penalties, operating in frequent subgraph mode, for both datasets
4.3 Document classification results of the aggregate rule classifier under all penalties, operating in default mode, i.e., using graph pattern concept intents, for both datasets
4.4 Document classification results of the aggregate rule classifier under all penalties, operating in filtered equivalence classes mode, for both datasets
4.5 Most significant frequent subgraph patterns for each class in the 10 Newsgroups dataset
4.6 Most significant graph pattern concept intents for each class in the 10 Newsgroups dataset
4.7 Most significant filtered equivalence class graph patterns for each class in the 10 Newsgroups dataset
4.8 Most significant frequent subgraph patterns for each class in the BBC Sport dataset
4.9 Most significant graph pattern concept intents for each class in the BBC Sport dataset
4.10 Most significant filtered equivalence class graph patterns for each class in the BBC Sport dataset
5.1 Macro-averaged F1 scores for document classification on the 10 Newsgroups dataset, obtained using black-box GNN models without the CW module, hybrid GNN-BERT models, and a standalone BERT model
5.2 Macro-averaged F1 scores for document classification on the BBC Sport dataset, obtained using black-box GNN models without the CW module, hybrid GNN-BERT models, and a standalone BERT model
5.3 Macro-averaged F1 scores for document classification on the 10 Newsgroups dataset, obtained using white-box GNN models with the CW module, evaluated over all four types of graph concepts
5.4 Macro-averaged F1 scores for document classification on the BBC Sport dataset, obtained using white-box GNN models with the CW module, evaluated over all four types of graph concepts
5.5 Macro-averaged F1 values of the CAP metric for the 10 Newsgroups dataset, obtained using black-box GNN models without the CW module, evaluated over all four types of graph concepts
5.6 Macro-averaged F1 values of the CAP metric for the BBC Sport dataset, obtained using black-box GNN models without the CW module, evaluated over all four types of graph concepts
5.7 Macro-averaged F1 values of the CAP metric for the 10 Newsgroups dataset, obtained using white-box GNN models with the CW module, evaluated over all four types of graph concepts
5.8 Macro-averaged F1 values of the CAP metric for the BBC Sport dataset, obtained using white-box GNN models with the CW module, evaluated over all four types of graph concepts
5.9 Highest concept alignment scores F(Q) for document classification on the 10 Newsgroups dataset using GNN models with the CW module, evaluated over all four types of graph concepts
5.10 Highest concept alignment scores F(Q) for document classification on the BBC Sport dataset using GNN models with the CW module, evaluated over all four types of graph concepts
List of Algorithms
1 DocToGraph
2 ModifyGraph
3 GraphPattToSent
Author Contributions
Intrinsically Interpretable Document Classification via Concept Lattices
Reference: Eric George Parakal and Sergei O. Kuznetsov. "Intrinsically Interpretable Document Classification via Concept Lattices". In: Proceedings of the 10th International Workshop "What can FCA do for Artificial Intelligence?" co-located with the 31st International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2022), Vienna, Austria, July 23, 2022. Ed. by Sergei O. Kuznetsov, Amedeo Napoli, and Sebastian Rudolph. Vol. 3233. CEUR Workshop Proceedings. CEUR-WS.org, 2022, pp. 9-22. URL: https://ceur-ws.org/Vol-3233/paper2.pdf
Contribution: Design and implementation of the model architecture and the various explainability metrics, and execution of the experiments presented in the paper.
Explainable Document Classification via Pattern Structures
Reference: Sergei O. Kuznetsov and Eric George Parakal. "Explainable Document Classification via Pattern Structures". In: Proceedings of the Seventh International Scientific Conference "Intelligent Information Technologies for Industry" (IITI'23). Ed. by Sergey Kovalev, Igor Kotenko, and Andrey Sukhanov. Cham: Springer Nature Switzerland, 2023, pp. 423-434. ISBN: 978-3-031-43789-2
Contribution: Design and implementation of the model architecture and the various explainability metrics, and execution of the experiments presented in the paper.
Document Classification via Stable Graph Patterns and Conceptual AMR Graphs
Reference: Eric George Parakal et al. "Document Classification via Stable Graph Patterns and Conceptual AMR Graphs". In: Conceptual Knowledge Structures - First International Joint Conference, CONCEPTS 2024, Cádiz, Spain, September 9-13, 2024, Proceedings. Ed. by Inma P. Cabrera, Sébastien Ferré, and Sergei A. Obiedkov. Vol. 14914. Lecture Notes in Computer Science. Springer, 2024, pp. 286-301. DOI: 10.1007/978-3-031-67868-4_19. URL: https://doi.org/10.1007/978-3-031-67868-4_19
Contribution: Design and implementation of the model architecture and the various explainability metrics, and execution of the experiments presented in the paper.
Explainable Document Classification via Concept Whitening and Stable Graph Patterns
Reference: Eric George Parakal et al. "Explainable Document Classification via Concept Whitening and Stable Graph Patterns". Manuscript under review at IEEE Access. 2025
Contribution: Adaptation and re-implementation of the CW method to work with document graphs and with graph concepts obtained using pattern structure-based methods, as well as the design of new experiments demonstrating the utility of document graphs in combination with graph neural networks.