Clustering in Feature-Rich Networks Using the Data Recovery Approach: dissertation and author's abstract topic, VAK RF specialty 05.13.17, Candidate of Sciences Shalileh Soroosh Ahmad

  • Shalileh Soroosh Ahmad
  • Candidate of Sciences
  • 2021, National Research University Higher School of Economics (Federal State Autonomous Educational Institution of Higher Education)
  • VAK RF specialty 05.13.17
  • Number of pages: 118
Shalileh Soroosh Ahmad. Clustering in Feature-Rich Networks Using the Data Recovery Approach: Candidate of Sciences dissertation: 05.13.17 - Theoretical Foundations of Computer Science. National Research University Higher School of Economics. 2021. 118 pp.

Table of contents of the dissertation, Candidate of Sciences Shalileh Soroosh Ahmad

Contents

Abstract

Acknowledgements

1 Introduction

1.1 The relevance and importance of research

1.2 Novelty of the obtained results

1.3 Publications and approbation of the research

1.4 The organization of thesis

2 Literature review

2.1 Network-only clustering methods

2.2 Early fusion methods

2.3 Simultaneous fusion methods

2.4 Late fusion methods

3 Methodologies

3.1 Sequential data recovery clusters extraction methods

3.1.1 Sequential methods at feature-rich networks

3.1.1.1 Motivation

3.1.1.2 Notation

3.1.1.3 Methodology

3.1.2 Sequential methods at feature-rich networks using similarity data

3.1.2.1 Motivation

3.1.2.2 Inner products as similarities

3.1.2.3 Notation

3.1.2.4 Methodology

3.2 Simultaneous data recovery clusters extraction methods

3.2.1 Simultaneous methods at feature-rich network

3.2.1.1 Motivation

3.2.1.2 Notation

3.2.1.3 Methodology

4 Experimental setting

4.1 Algorithms under comparison

4.2 Data sets

4.2.1 Real world data sets

4.2.2 Generating synthetic data sets

4.3 Data pre-processing techniques

4.4 Evaluation criteria

5 Experiments

5.1 Experimental comparison of the methods under consideration

5.1.1 Comparison of the methods over real-world data sets

5.1.2 Comparison of the methods over synthetic data sets with categorical features

5.1.3 Execution time of the methods under consideration over synthetic data sets with categorical features

5.2 Experimental validation of the proposed methods

5.2.1 Choosing the data standardization options

5.2.1.1 Investigating the impact data pre-processing techniques at small-size networks with quantitative features

5.2.1.2 Investigating the impact data pre-processing techniques at small-size networks with categorical features

5.2.1.3 Investigating the impact data pre-processing techniques at small-size networks combination of quantitative and categorical features

5.2.1.4 Conclusion on pre-processing techniques

5.2.2 Experimental results for the proposed methods at various feature scales

5.2.2.1 The proposed methods at synthetic networks with quantitative features at the nodes

5.2.2.2 The proposed methods at synthetic networks with categorical features at the nodes

5.2.2.3 The proposed methods at synthetic networks with combination of quantitative and categorical features at the nodes

6 Conclusions & future work

6.1 Conclusion

6.2 Future works

Bibliography

A Appendix

.1 The sequential clusters extraction at similarity data: notation of all modes

.2 The sequential clusters extraction at similarity data: methodology of all modes

B Appendix

List of Figures

List of Tables

Introduction of the dissertation (part of the author's abstract) on the topic "Clustering in Feature-Rich Networks Using the Data Recovery Approach"

Chapter 1

Introduction

Community detection is widely applied in areas ranging from sociology to biology to computer science. The corresponding data structure is a network, or graph, of objects called nodes, interconnected by pairwise relationships (edges). Our subject is a more complex data structure, the feature-rich network. Specifically, we consider networks in which a set of features is associated with the nodes. If the features are categorical, such a structure is usually referred to as a node-attributed network [13, 134]. Since we consider data sets in which the features are not necessarily categorical but may be quantitative or a combination of both, we refer to these data structures as "feature-rich" networks, following [63]. Figure 1.1a depicts the concept of a feature-rich network.

We define a community as a set of relatively densely interconnected nodes that are also similar in the feature space. Our goal is to extract such clusters from feature-rich networks. Figure 1.1b visualizes this goal.

(a) Feature-rich networks (b) Detected clusters (our goal)

Figure 1.1: The concept of feature-rich networks and our goal. Panel (a) visualizes the data structure, while panel (b) depicts our goal, which is to detect clusters/communities.

Formally, we can define our goal as follows. We define a feature-rich network, i.e., a network with features at the nodes, as $A = \{P, Y\}$ over an entity set $I$. Here $I$ is a set of network nodes of cardinality $|I| = N$; $P = (p_{ij})$ is an $N \times N$ matrix of mutual link weights between nodes $i, j \in I$; and $Y = (y_{iv})$ is an $N \times V$ matrix of feature values, so that entry $y_{iv}$ is the value of feature $v = 1, 2, \ldots, V$ at node $i \in I$. Our goal is to partition $I$ into $K$ crisp, non-overlapping communities $S = \{S_k\}_{k=1}^{K}$, where $K$ is the number of communities.
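The formal definition can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the helper names `make_feature_rich_network` and `is_crisp_partition`, the dictionary layout, and the toy data are hypothetical and do not come from the thesis. It stores the link matrix P and the feature matrix Y side by side and checks that a candidate set of communities is a crisp, non-overlapping partition of the node set.

```python
import numpy as np

def make_feature_rich_network(P, Y):
    """Bundle an N x N link-weight matrix P and an N x V feature matrix Y.

    Hypothetical helper: it only validates the shapes implied by the definition.
    """
    P = np.asarray(P, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if P.ndim != 2 or P.shape[0] != P.shape[1]:
        raise ValueError("P must be a square N x N matrix of link weights")
    if Y.ndim != 2 or Y.shape[0] != P.shape[0]:
        raise ValueError("Y must be an N x V matrix with one row per node")
    return {"P": P, "Y": Y}

def is_crisp_partition(clusters, n_nodes):
    """True if the clusters are non-empty, disjoint, and cover nodes 0..N-1."""
    seen = [i for cluster in clusters for i in cluster]
    all_non_empty = all(len(cluster) > 0 for cluster in clusters)
    return all_non_empty and sorted(seen) == list(range(n_nodes))

# Toy network (made-up data): 4 nodes, 2 quantitative features per node.
P = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
Y = [[0.9, 1.0],
     [1.1, 1.0],
     [1.0, 0.0],
     [5.0, 0.0]]
A = make_feature_rich_network(P, Y)
S = [[0, 1, 2], [3]]  # a candidate partition with K = 2 communities
```

Here `is_crisp_partition(S, 4)` holds, while a list such as `[[0, 1], [1, 2, 3]]` fails the check because node 1 appears in two communities.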

In the rest of this dissertation, we use "cluster" and "community" interchangeably. Likewise, "extraction" and "detection", when combined with "cluster" or "community" (e.g., cluster extraction), have precisely the same meaning.

Depending on the application, the proposed definition should be interpreted accordingly. For instance, in the context of social networks, a community could be a group of people who share similar political views, graduated from the same school or university, or share the same religious interests, and who follow each other in a social network such as Facebook or Twitter.

In the area of recommender systems, communities can be defined as groups of a platform's users who not only exchange information (links), say by sending direct messages or tagging each other, but are also characterized by their profile information or by the content of their messages (features). After detecting communities in such circumstances, a recommender system can recommend an appropriate set of services to each community.

Detecting clusters in education systems can also be beneficial. For example, detecting communities of students who work on the same projects or spend their spare time together (forming the network), along with their features, e.g., age, gender, parents' occupations, and marks, could bring significant insights into education and evaluation systems and methods, and might also reveal some causal relationships.

Detecting communities in daily routines, say by analyzing transportation networks, could also be advantageous: it can give top-level managers sufficient information to make proper decisions regarding the extension of the transportation systems, their maintenance, and so forth.

Moreover, detecting and analyzing the communities in a society could not only help a political party to increase its likelihood of success but also help prevent sociological disasters.

Detecting communities in sports activities, terrorist attacks, the spread of infection, and many more are other applications of community detection methods.

Recall that our goal in this research is to extract clusters in feature-rich networks. In this regard, one may claim that the above examples and their corresponding results are achievable by clustering the network alone or the features alone. Although this claim might be right in some circumstances, we can still justify the simultaneous use of the two data sources as follows. First, these two data sources naturally reflect different aspects of the phenomenon under consideration; thus, using both should yield more accurate results or different interpretations. Second, a large body of research has shown that using both data sources usually leads to more accurate results (see, for example, [125, 140]). Consequently, based on these two reasons, we assume in the current research that both data sources are given, and our goal is to detect the communities.

Conclusion of the dissertation on the topic "Theoretical Foundations of Computer Science", Shalileh Soroosh Ahmad

Chapter 6

Conclusions & future work

In this chapter, we first conclude the thesis and then outline several directions for future work.

6.1 Conclusion

In Section 3.1.1, using the conventional data recovery approach, we propose two similar methods, SEFNACs and SEFNACn, for community extraction in feature-rich networks. The methods differ in whether the network data entries are assumed to be summable across the whole link table (SEFNACs) or not (SEFNACn). In this way, we distinguish between cases in which the network data scales are the same for all network nodes and cases in which each node collects its linkage data independently. The methods are similar in that both (a) find clusters one by one and (b) add elements to a cluster one by one as well.

The methods proposed in Section 3.1.2 continue the line of research started in Section 3.1.1. We explore whether the doubly-greedy least-squares approach proposed in that subsection can be successfully applied to feature-rich networks in which the feature-related part is converted to a similarity matrix. Usually, similarity data are considered to be measured on the same scale, so that one can meaningfully compare and sum similarity values across the entire similarity matrix (summable mode). However, there can be situations in which similarity values in one column (or row) should not be compared with the values in another column (or row); this is the nonsummable mode. Applying these two assumptions to the two similarity matrices, the feature-generated one and the native one with link scores, we arrive at four different summability patterns, denoted in the thesis by ss, ns, sn, and nn, and, accordingly, at four different Iterative Community Extraction from Similarity data (ICESi) algorithms.

One of the theoretical advantages of SEFNAC and ICESi is a Pythagorean decomposition of the data scatter into the sum of the least-squares criterion and the individual cluster contributions. This property allows scoring the contribution of various elements of the found solutions to the data scatter, which can help interpretation [88]. Among the practical advantages is the competitiveness of the doubly-greedy approach, in terms of cluster recovery, against other computational procedures (see, for example, the experimental results in [7, 27, 94, 115]).
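For orientation, the best-known decomposition of this kind is the feature-only K-means case, written below in the notation of Chapter 1. It is given only as an analogue and is not the combined network-and-feature decomposition derived in the thesis. The symbol $c_{kv}$, the mean of feature $v$ within cluster $S_k$, is an assumption introduced for this illustration.

```latex
\sum_{i \in I} \sum_{v=1}^{V} y_{iv}^{2}
  \;=\;
  \underbrace{\sum_{k=1}^{K} |S_k| \sum_{v=1}^{V} c_{kv}^{2}}_{\text{individual cluster contributions}}
  \;+\;
  \underbrace{\sum_{k=1}^{K} \sum_{i \in S_k} \sum_{v=1}^{V} \left( y_{iv} - c_{kv} \right)^{2}}_{\text{least-squares criterion}},
\qquad
c_{kv} = \frac{1}{|S_k|} \sum_{i \in S_k} y_{iv}.
```

The left-hand side is the data scatter. Each term $|S_k| \sum_{v} c_{kv}^{2}$ can be read as the part of the scatter explained by cluster $S_k$, which is what makes per-cluster scoring possible.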

The SEFNAC and ICESi methods have some properties which distinguish them from many others.

Desirable properties: a) no restriction on the feature scale type; b) no restriction on the network data type; c) automatic determination of the number of clusters/communities; d) a Pythagorean decomposition of the combined data scatter into the sum of the individual clusters' contributions and the minimized criterion.

Less desirable properties: e) the data standardization is a necessary part of the methods, both for network data and feature/similarity data; f) slow computations; g) no advice regarding the constants balancing the relative contributions of two data sources, the network, and features.

Nevertheless, our experiments show that the SEFNAC methods are competitive against state-of-the-art algorithms on small-size and medium-size data. The SEFNAC methods are relatively robust against noise and unfavorable structure parameters such as the probability of inter-community links, which can be as high as 0.6, meaning that the proportion of inter-community edges may be comparable to or even greater than that of within-community edges.

In almost all settings for the SEFNAC methods, the best data standardization options in our experiments involve z-scoring of the feature data and the uniform shift transformation of the network data. The reason why z-scoring improves our results can be explained as follows. Recall that z-scoring leads to different feature ranges, in contrast to range standardization. Since our method starts from the most anomalous clusters, applying z-scoring increases the sensitivity of SEFNAC. The uniform shift subtracts a constant threshold from the link values. In contrast, the popular modularity transformation subtracts random noise, which may differ depending on the number of links at different nodes. Our result supports the view [88] that for flat network data, the subtracted value should be flat (constant) too.
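The standardization options discussed above can be sketched as follows. This is a minimal illustration, not the thesis code: the function names are hypothetical, the choice of the mean link weight as the default uniform-shift threshold is an assumption, and the modularity shift is written in its common form, subtracting the product of the row and column sums divided by the total.

```python
import numpy as np

def z_score(Y):
    """Center each feature and divide by its standard deviation."""
    Y = np.asarray(Y, dtype=float)
    std = Y.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    return (Y - Y.mean(axis=0)) / std

def range_standardize(Y):
    """Center each feature and divide by its range (max minus min)."""
    Y = np.asarray(Y, dtype=float)
    rng = Y.max(axis=0) - Y.min(axis=0)
    rng[rng == 0] = 1.0
    return (Y - Y.mean(axis=0)) / rng

def uniform_shift(P, threshold=None):
    """Subtract one constant from every link weight.

    Assumption: the mean link weight is used when no threshold is given.
    """
    P = np.asarray(P, dtype=float)
    if threshold is None:
        threshold = P.mean()
    return P - threshold

def modularity_shift(P):
    """Subtract p_i+ * p_+j / p_++, a value that varies with node link totals."""
    P = np.asarray(P, dtype=float)
    row = P.sum(axis=1, keepdims=True)
    col = P.sum(axis=0, keepdims=True)
    return P - row @ col / P.sum()

# Toy data (made up): three nodes, two features, a star-shaped network.
Y = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
P = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
Z, R = z_score(Y), range_standardize(Y)
U, M = uniform_shift(P), modularity_shift(P)
```

After range standardization every feature has range 1, whereas after z-scoring the ranges generally differ between features. The quantity subtracted by `uniform_shift` is the same for every entry, while the one subtracted by `modularity_shift` is larger for pairs of nodes with many links.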

It appears that the ICESi methods can be competitive too. On a restricted version of our real-world data set collection, and setting aside the DMoN and KEFRiN methods, they win in most cases over the remaining state-of-the-art methods, including in the nonsummable mode. They perform rather well on networks with categorical features, closely following the winner, SEFNACs, and even outperforming it at some data configurations (in the ss and sn modes). On synthetic networks with quantitative features, ICESisn obtains the best results among all ICESi methods, with ICESiss occasionally taking the lead.

The properties of the methods mentioned above determine the main directions for our last two methods, the KEFRiN methods. First of all, we had to improve the computational efficiency of the methods so that they are applicable to larger data sets. Thus, we adopt the k-means method, defining the cluster centers and the distances from them to cluster elements according to our clustering criterion, which takes both data sources into account. Furthermore, to alleviate the so-called curse of dimensionality, we also use the cosine distance in place of the conventional Euclidean distance. This leads to two versions of the KEFRiN method, namely KEFRiNe and KEFRiNc. This development has led us to two algorithms capable of handling tens of thousands of nodes, rather than the thousands that the SEFNAC and ICESi methods can handle.
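The idea of a k-means procedure whose distance combines both data sources can be sketched as below. This is a simplified illustration and not the KEFRiN implementation from the thesis: the function name `two_source_kmeans`, the balancing constants `rho` and `xi`, the explicit `init` list of seed nodes, and the use of within-cluster means as centers are all assumptions made for the example.

```python
import numpy as np

def _distances(X, centers, metric):
    """Distance from every row of X to every row of centers."""
    if metric == "cosine":
        Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
        Cn = centers / np.maximum(np.linalg.norm(centers, axis=1, keepdims=True), 1e-12)
        return 1.0 - Xn @ Cn.T
    diff = X[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=2)  # squared Euclidean distance

def two_source_kmeans(P, Y, init, metric="euclidean", rho=1.0, xi=1.0, n_iter=100):
    """Alternate assignments and center updates over features Y and links P.

    init lists the seed node indices, one per cluster (hypothetical interface).
    """
    P = np.asarray(P, dtype=float)
    Y = np.asarray(Y, dtype=float)
    seeds = list(init)
    c_feat, c_net = Y[seeds].copy(), P[seeds].copy()
    labels = None
    for _ in range(n_iter):
        # Combined distance: weighted sum over the two data sources.
        d = rho * _distances(Y, c_feat, metric) + xi * _distances(P, c_net, metric)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for k in range(len(seeds)):
            members = labels == k
            if members.any():
                c_feat[k] = Y[members].mean(axis=0)
                c_net[k] = P[members].mean(axis=0)
    return labels

# Toy data (made up): two triangles with well-separated features.
P = np.array([[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]])
Y = np.array([[1.0, 0.1], [0.9, 0.2], [1.1, 0.0],
              [0.1, 1.0], [0.0, 0.9], [0.2, 1.1]])
labels_e = two_source_kmeans(P, Y, init=[0, 3], metric="euclidean")
labels_c = two_source_kmeans(P, Y, init=[0, 3], metric="cosine")
```

On this toy network both metrics put nodes 0 to 2 in one cluster and nodes 3 to 5 in the other. Switching `metric` is the only difference between the two calls, which mirrors the distinction the text draws between the Euclidean and cosine variants.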

KEFRiNc performs well in our comparison over real-world data sets. Although it wins in only a few settings on the synthetic data sets, its overall performance is still quite acceptable. More importantly, given its fast execution time, it can be considered a decent solution for community extraction in feature-rich networks. KEFRiNe, however, is not as successful as its counterpart. Although it is as fast as KEFRiNc, it loses its effectiveness on most of the data sets, especially the large ones, due to the curse of dimensionality.

We can recommend the following guidelines for applying the methods proposed in this research. (A) When the number of clusters is unknown and the network under consideration has characteristics similar to those of our synthetic data sets, SEFNACs is the preferable solution. (B) In the same setting, if the user seeks a faster execution time, SEFNACn is recommended. (C) Applying ICESiss and ICESins could be an additional appropriate trial for networks with the characteristics mentioned above. (D) When the user knows the number of clusters, regardless of the characteristics of the network under consideration, KEFRiNc is the fastest and most robust solution that this research can recommend.

6.2 Future works

We see several directions for future work. Reformulating our proposed methods in a theory-driven framework is the most promising of them.

Another direction is to scrutinize the impact of different distance metrics, say, the Minkowski, Manhattan, or Mahalanobis distance, on the performance of our methods.

Reducing the execution time of the proposed sequential methods is another exciting direction for future development.

A further direction is studying and analyzing the size proportions between the network data and the feature data. Changing the currently equal values of the balancing constants may become necessary.

Applying the proposed simultaneous clustering strategy to feature-rich networks using similarity data should be considered as further future work.

Adopting online k-means or kernel k-means for the proposed simultaneous clustering methods is another possible direction. Moreover, applying these adapted methods to feature-rich networks using similarity data should also be considered.

Finally, investigating the impact of further clustering strategies, such as hierarchical and spectral strategies, on the performance of our proposed models can be considered a comprehensive and laborious direction for future research.¹

¹ During this Ph.D. study, we also investigated the SEFNACs clustering criterion with a hierarchical clustering strategy, namely the Louvain algorithm, and with a spectral decomposition algorithm, i.e., decomposing the two matrices using eigendecomposition. Although the preliminary results were not satisfactory, we prefer to postpone drawing conclusions until more systematic studies have been conducted.

List of references of the dissertation research, Candidate of Sciences Shalileh Soroosh Ahmad, 2021

Bibliography

[1] B. Abrahao, S. Soundarajan, J. Hopcroft, and R. Kleinberg. 2012. On the separability of structural classes of communities. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, Beijing, China, 624-632.

[2] S. Abu-El-Haija, N. Alipourfard, H. Harutyunyan, A. Kapoor, and B. Perozzi. 2018. Higher-Order Graph Convolutional Layer. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS). Curran Associates Inc., NY, United States, 1-8.

[3] L.M. Aiello, C. Cherifi, H. Cherifi, R. Lambiotte, P. Lio, and L.M. Rocha. Annual. Complex Networks and their Applications. Complex Networks. https://complexnetworks.org

[4] E. Akbas and P. Zhao. 2017. Attributed graph clustering: An attribute-aware graph embedding approach. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. ACM, Sydney, Australia, 305-308.

[5] L. Akoglu, H. Tong, B. Meeder, and C. Faloutsos. 2012. Parameter-free identification of cohesive subgroups in large attributed graphs. In Proceedings of the 12th SIAM International Conference on Data Mining (PICS). SIAM, Pacific-Asia, 439-450.

[6] Reda Alhajj. regular. Social Network Analysis and Mining (SNAM). Springer. https://www.springer.com/journal/13278

[7] R.C. Amorim and B. Mirkin. 2012. Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering. Pattern Recognition 45, 3 (2012), 1061-1075.

[8] D. Arthur and S. Vassilvitskii. 2006. k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. ACM, Philadelphia, PA, United States, 1027-1035.

[9] S.T. Barnard, A. Pothen, and H. Simon. 1995. A spectral algorithm for envelope reduction of sparse matrices. Numer. Lin. Algebra Appl 2, 4 (1995), 317-334.

[10] A. Baroni, A. Conte, M. Patrignani, and S. Ruggieri. 2017. Efficiently clustering very large attributed graphs. In 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) IEEE. IEEE/ACM, Sydney, Australia, 369-376.

[11] M. Berlingerio, M. Coscia, and F. Giannotti. 2011. Finding and characterizing communities in multidimensional networks. In 2011 International Conference on Advances in Social Networks Analysis and Mining. ACM, Hague, Netherlands, 490-494.

[12] V.D. Blondel, J.L. Guillaume, R. Lambiotte, and E. Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment 2008, 10 (2008), P10008.

[13] A. Bojchevski and S. Günnemann. 2018. Bayesian robust attributed graph clustering: Joint learning of partial anomalies and group structure. In Thirty-Second AAAI Conference on Artificial Intelligence. AAAI Press, California, USA, 1-10.

[14] S. Boyd and L. Vandenberghe. 2004. Convex optimization (1st ed.). Cambridge university press, GB.

[15] P. S. Bradley, K. P. Bennett, and A. Demiriz. 2000. Constrained k-means clustering. Microsoft Research, Redmond 20, 0 (2000), e-1.

[16] Z. Bu, G. Gao, H.J. Li, and J. Cao. 2017. CAMAS: A cluster-aware multiagent system for attributed graph clustering. Information Fusion 37 (2017), 10-21.

[17] Z. Bu, H. Li, J. Cao, Z. Wang, and G. Gao. 2019. Dynamic cluster formation game for attributed graph clustering. IEEE Trans. Cybern. 49, 1 (2019), 328-341.

[18] Z. Bu, H.J Li, C. Zhang, J. Cao, A. Li, and Y. Shi. 2019. Graph K-means based on leader identification, dynamic game, and opinion dynamics. IEEE Transactions on Knowledge and Data Engineering 32, 7 (2019), 1348-1361.

[19] C. Buchta, M. Kober, I. Feinerer, and K. Hornik. 2012. Spherical k-means clustering. Journal of statistical software 50, 10 (2012), 1-22.

[20] J. Cao, H. Wang, D. Jin, and J. Dang. 2019. Combination of links and node contents for community discovery using a graph regularization approach. Future Generation Computer Systems 91, 1 (2019), 361-370.

[21] S. Cao, W. Lu, and Q. Xu. 2015. Grarep: Learning graph representations with global structural information. In Proceedings of the 24th ACM international on conference on information and knowledge management. ACM, Melbourne, Australia, 891-900.

[22] S. Cavallari, V. W. Zheng, H. Cai, K. C. Chang, and E. Cambria. 2017. Learning community embedding with community detection and node embedding on graphs. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management, ACM. ACM, Sydney, Australia, 377-386.

[23] B.F. Chai, J. Yu, C.Y. Jia, T.B Yang, and Y.W. Jiang. 2013. Combining a popularity-productivity stochastic block model with a discriminative-content model for general structure detection. Physical review E 88, 1 (2013), p.012807.

[24] S. Chang, W. Han, J. Tang, G.J. Qi, C.C. Aggarwal, and T.S. Huang. 2015. Heterogeneous network embedding via deep architectures. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Sydney, Australia, 119-128.

[25] H. Cheng, Y. Zhou, and J.X. Yu. 2011. Clustering large attributed graphs: A balance between structural and attribute similarities. ACM Transactions on Knowledge Discovery from Data (TKDD) 5, 2 (2011), 1-33.

[26] Y. Zhou, H. Cheng, and J.X. Yu. 2010. Clustering large attributed graphs: An efficient incremental approach. In IEEE International Conference on Data Mining. Springer, Auckland, New Zealand, 689-698.

[27] M.M.T. Chiang and B. Mirkin. 2010. Intelligent choice of the number of clusters in k-means clustering: an experimental study with different cluster spreads. Journal of Classification 27, 1 (2010), 3-40.

[28] P. Chunaev. 2020. Community detection in node-attributed social networks: a survey. Computer Science Review 37 (2020), 100286.

[29] S. Citraro and G. Rossetti. 2020. Identifying and exploiting homogeneous communities in labeled networks. Applied Network Science 5, 1 (2020), 1-20.

[30] D. Combe, C. Largeron, M. Gery, and E. Egyed-Zsigmond. 2015. I-louvain: An attributed graph clustering method. In International Symposium on Intelligent Data Analysis. Springer, Konstanz, Germany, 181-192.

[31] D. Combe, C. Largeron, M. Gery, and E. Egyed-Zsigmond. 2015. I-louvain: An attributed graph clustering method. In Advances in Intelligent Data Analysis XIV, E. Fromont, T. De Bie, M. van Leeuwen (Eds.). Springer, Saint Etienne, France, 181-192.

[32] R.L. Cross and A. Parker. 2004. The hidden power of social networks: Understanding how work really gets done in organizations (1st ed.). Harvard Business Press, USA.

[33] J.D. Cruz and C. Bothorel. 2013. Information integration for detecting communities in attributed graphs. In Fifth International Conference on Computational Aspects of Social Networks. IEEE, Fargo, ND, USA, 62-67.

[34] J.D. Cruz, C. Bothorel, and F. Poulet. 2011. Entropy based community detection in augmented social networks. In International Conference on computational aspects of social networks (CASoN) IEEE. IEEE, Salamanca, Spain, 163-168.

[35] X. Cui and T.E. Potok. 2005. Document clustering analysis based on hybrid PSO+ K-means algorithm. Journal of Computer Sciences (special issue) (2005), 27-33.

[36] T.A. Dang and E. Viennet. 2012. Community detection based on structural and attribute similarities. In International conference on digital society (icds). Not published, Valencia, Spain, 7-12.

[37] T.A. Dang and E. Viennet. 2012. Community detection based on structural and attribute similarities. In International conference on Digital Society (icds). Valencia, Spain, 7-12.

[38] N. Dhanachandra, K. Manglem, and Y.J. Chanu. 2015. Image segmentation using K-means clustering algorithm and subtractive clustering algorithm. Procedia Computer Science 54 (2015), 764-771.

[39] W.E Donath and A.J. Hoffman. 1973. Lower bounds for the partitioning of graphs. IBM J. Res. Dev 17, 5 (1973), 420-425.

[40] P. Doreian, V. Batagelj, and A. Ferligoj. 2005. Generalized Blockmodeling. Cambridge University Press 25 (2005).

[41] H. Elhadi and G. Agam. 2013. Structure and attributes community detection: comparative analysis of composite, ensemble and selection methods. In Proceedings of the 7th Workshop on Social Network Mining and Analysis. ACM, Chicago, USA, 1-7.

[42] Reda Alhajj et al. Annual. Advances in Social Networks Analysis and Mining. IEEE/ACM. http://asonam.cpsc.ucalgary.ca/2021/

[43] E.V. Kovaleva and B. Mirkin. 2015. Bisecting K-means and 1D projection divisive clustering: A unified framework and experimental comparison. Journal of Classification 32, 3 (2015), 414-442.

[44] I. Falih, N. Grozavu, R. Kanawati, and Y. Bennani. 2017. Anca: Attributed network clustering algorithm. In International Conference on Complex Networks and their Applications. Springer, Lyon, France, 241-252.

[45] I. Falih, N. Grozavu, R. Kanawati, and Y. Bennani. 2018. Community detection in attributed network. In Companion Proceedings of The Web Conference 2018. International World Wide Web Conferences Steering Committee (Republic and Canton of Geneva, Switzerland), Lyon, France, 1299-1306.

[46] M. Fiedler. 1973. Algebraic connectivity of graphs. Czech. Math. J 23, 2 (1973), 298-305.

[47] F.M. Bianchi, D. Grattarola, and C. Alippi. 2020. Spectral clustering with graph neural networks for graph pooling. In International Conference on Machine Learning (PMLR). PMLR web, Long Beach, California, USA, 874-883.

[48] C. Gao and Z. Ma. 2018. Minimax rates in network analysis: Graphon estimation, community detection and hypothesis testing. Statist. Sci. 36, 1 (2018), 16-33.

[49] S. Gaucher and O. Klopp. 2019. Maximum likelihood estimation of sparse networks with missing observations. arXiv preprint (2019). arXiv:1902.10605

[50] M. Ghorbani, H.R. Rabiee, and A. Khodadadi. 2016. Bayesian Overlapping Community Detection in Dynamic Networks. arXiv preprint (2016). arXiv:1605.02288

[51] M. Girvan and M.E. Newman. 2002. Community structure in social and biological networks. Proc. Natl. Acad. Sci. U.S.A 99, 12 (2002), 7821-7826.

[52] B.L. Golden and D.R. Shier. Annual. Networks. Wiley Periodicals. https://onlinelibrary.wiley.com/journal/10970037

[53] A. Goldenberg, A.X. Zheng, S.E. Fienberg, and E.M. Airoldi. 2010. A survey of statistical network models. Now Publishers Inc 20, 1 (2010), 1-10.

[54] P.J. Green and B.W. Silverman. 1993. Nonparametric regression and generalized linear models: a roughness penalty approach (1 ed.). Chapman and Hall/CRC, USA.

[55] D. He, Z. Feng, D. Jin, X. Wang, and W. Zhang. 2017. Joint identification of network communities and semantics via integrative modeling of network topologies and node contents. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Press, Vancouver, Canada, 6-12.

[56] D. He, D. Jin, Z. Chen, and W. Zhang. 2015. Identification of hybrid node and link communities in complex networks. Nature Scientific Reports 5, 1 (2015), 1-14.

[57] M. Hoffman, D. Steinley, K.M. Gates, M.J. Prinstein, and M. J. Brusco. 2018. Detecting clusters/communities in social networks. Multivariate Behavioral Research 53, 1 (2018), 57-73.

[58] P.W. Holland, K.B. Laskey, and S. Leinhardt. 1983. Stochastic blockmodels: First steps. Social networks 5, 2 (1983), 109-137.

[59] Y. Hu, M. Li, P. Zhang, Y. Fan, and Z. Di. 2008. Community detection by signaling on complex networks. Physical Review E 78, 1 (2008), 16115.

[60] Y. Huang and H. Wang. 2016. Consensus and multiplex approach for community detection in attributed networks. In 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP. IEEE Community, Ottawa, Ontario, Canada, 425-429.

[61] L. Hubert and P. Arabie. 1985. Comparing partitions. Journal of Classification 2, 1 (1985), 193-218.

[62] INSNA. Annual. International Networks for Social Networks Analysis. INSNA. https://www.insna.org/#

[63] R. Interdonato, M. Atzmueller, S. Gaito, R. Kanawati, C. Largeron, and A. Sala. 2019. Feature-rich networks: going beyond complex network topologies. Applied Network Science 4, 1 (2019), 20-33. https://doi.org/10.1007/s41109-019-0111-x

[64] H. Ito, T. Komamizu, T. Amagasa, and H. Kitagawa. 2018. Community Detection and Correlated Attribute Cluster Analysis on Multi-Attributed Graphs. In EDBT/ICDT Workshops. ACM, Copenhagen, Denmark, 2-9.

[65] M.A. Javed, M. S. Younis, S. Latif, J. Qadir, and A. Baig. 2018. Community detection in networks: A multidisciplinary review. Journal of Network and Computer Applications 108, 1 (2018), 87-111.

[66] C. Jia, Y. Li, M.B. Carson, X. Wang, and J. Yu. 2017. Node attribute-enhanced community detection in complex networks. Scientific Reports 7, 1 (2017), 2626.

[67] C. Jia, Y. Li, M.B. Carson, X. Wang, and J. Yu. 2017. Node attribute-enhanced community detection in complex networks. Scientific Reports 7, 1 (2017), 1-15.

[68] H. Jia, S. Ding, and M. Du. 2017. A Nystrom spectral clustering algorithm based on probability incremental sampling. Soft Comput 21, 19 (2017), 5815-5827.

[69] D. Jin, J. He, B. Chai, and D. He. 2021. Semi-supervised community detection on attributed networks using non-negative matrix tri-factorization with node popularity. Frontiers of Computer Science 15, 4 (2021), 1-11.

[70] H. Jin, W. Yu, and S. Li. 2018. A clustering algorithm for determining community structure in complex networks. Physica A: Statistical Mechanics and its Applications 492 (2018), 980-993.

[71] D.R. Karger. 1993. Global Min-cuts in RNC, and Other Ramifications of a Simple Min-Cut Algorithm. In SODA. Society for Industrial and Applied Mathematics (Philadelphia, PA, United States), Austin, Texas, USA, 21-30.

[72] D.P. Kingma and J. Ba. 2014. A method for stochastic optimization. arXiv preprint (2014). arXiv:1412.6980

[73] T.N. Kipf and M. Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint (2016). arXiv:1609

[74] D.B. Larremore, A. Clauset, and C.O. Buckee. 2013. A network approach to analyzing highly recombinant malaria parasite genes. PLoS Computational Biology 9, 10 (2013), p.e1003268.

[75] E. Lazega. 2001. The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership (1st ed.). Oxford University Press, GB.

[76] J. Leskovec and R. Sosic. 2016. SNAP: A General-Purpose Network Analysis and Graph-Mining Library. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 1 (2016), 1. https://github.com/snap-stanford/snap/tree/master/examples/cesna

[77] P. Li, L. Huang, C. Wang, D. Huang, and J. Lai. 2018. Community detection using attribute homogenous motif. IEEE Access 6 (2018), 47707-47716.

[78] Y. Li, C. Jia, and J. Yu. 2015. Parameter-free community detection method based on centrality and dispersion of nodes in complex networks. Physica A: Statistical Mechanics and its Applications 438, 1 (2015), 321-334.

[79] L. Liu, L. Xu, Z. Wang, and E. Chen. 2015. Community detection based on structure and content: A content propagation perspective. In 2015 IEEE International Conference on Data Mining. IEEE Computer Society, Washington DC, USA, 271-280.

[80] S. Luo, Z. Zhang, Y. Zhang, and S. Ma. 2019. Co-association matrix-based multi-layer fusion for community detection in attributed networks. Entropy 21, 1 (2019), 95.

[81] X. Luo, Z. Liu, M. Shang, and M. Zhou. 2020. Highly-Accurate Community Detection via Pointwise Mutual Information-Incorporated Symmetric Non-negative Matrix Factorization. IEEE Transactions on Network Science and Engineering -, - (2020), -.

[82] J.B. MacQueen. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, CA, USA, 281-297.

[83] M. Maia, J. Almeida, and V. Almeida. 2008. Identifying user behavior in online social networks. In Proceedings of the 1st workshop on Social network systems. ACM, Europe, 1-6.

[84] O. Maqbool and H.A. Babri. 2004. The weighted combined algorithm: A linkage algorithm for software clustering. In Proceedings Eighth European Conference on Software Maintenance and Reengineering (CSMR). IEEE Computer Society, Tampere, Finland, 15-24.

[85] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5594 (2002), 824-827.

[86] B. Mirkin. 1987. Additive clustering and qualitative factor analysis methods for similarity matrices. Journal of Classification 4 (1987), 7-31.

[87] B. Mirkin. 2008. The iterative extraction approach to clustering. In A. Gorban (Ed.) Principal Manifolds for Data Visualization and Dimension Reduction. Springer, Berlin, Germany, 151-177.

[88] B. Mirkin. 2012. Clustering: A Data Recovery Approach (2nd ed.). CRC Press, USA.

[89] B. Mirkin and S. Nascimento. 2012. Additive spectral method for fuzzy cluster analysis of similarity data including community structure and affinity matrices. Information Sciences 183, 1 (2012), 16-34.

[90] I.B. Mohamad and D. Usman. 2013. Standardization and its effects on K-means clustering algorithm. Research Journal of Applied Sciences, Engineering and Technology 6, 17 (2013), 3299-3303.

[91] A. Monge and C. Elkan. 1997. An efficient domain-independent algorithm for detecting approximately duplicate database records. Citeseer 85 (1997), 4-20.

[92] A. Morvan, K. Choromanski, C. Gouy-Pailler, and J. Atif. 2017. Graph Sketching-based Massive Data Clustering. arXiv preprint (2017). arXiv:1703.02375

[93] N. Stanley, T. Bonacci, R. Kwitt, M. Niethammer, and P.J. Mucha. 2019. Stochastic block models with multiple continuous attributes. Applied Network Science 4, 1 (2019), 1-22.

[94] S. Nascimento, S. Casca, and B. Mirkin. 2015. A seed expanding cluster algorithm for deriving upwelling areas on sea surface temperature images. Computers & Geosciences 85 (2015), 74-85.

[95] J. Neville, M. Adler, and D. Jensen. 2003. Clustering relational data using attribute and link information. In Proceedings of the text mining and link analysis workshop, 18th international joint conference on artificial intelligence. -, San Francisco, 9-15.

[96] M.E. Newman. 2006. Modularity and community structure in networks. Proc. Natl. Acad. Sci. U.S.A 103, 23 (2006), 8577-8582.

[97] M.E. Newman. 2006. Modularity and community structure in networks. In Proceedings of the National Academy of Sciences 103(23). PNAS, USA, 8577-8582.

[98] M.E. Newman and A. Clauset. 2016. Structure and inference in annotated networks. Nature Communications 7, 1 (2016), 1-11.

[99] M.E. Newman and M. Girvan. 2004. Finding and evaluating community structure in networks. Phys. Rev. E 69, 2 (2004), 026113.

[100] M. J. Newman. 2016. SIAN source file. W DC University. https://www.nature.com/articles/ncomms11863

[101] A. Ng. 2011. Sparse autoencoder. CS294A Lecture notes 72 (2011), 1-19. Stanford University. https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf

[102] W. De Nooy, A. Mrvar, and V. Batagelj. 2004. Exploratory Social Network Analysis with Pajek (1st ed.). Cambridge University Press, GB.

[103] K. Nowicki and T.A.B. Snijders. 2001. Estimation and prediction for stochastic block-structures. Journal of the American Statistical Association 96, 455 (2001), 1077-1087.

[104] P. Doreian, V. Batagelj, and A. Ferligoj (Eds.). 2020. Advances in Network Clustering and Blockmodeling (1st ed.). John Wiley & Sons, USA.

[105] L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. Pagerank citation ranking: bringing order to the web. Technical Report, Stanford InfoLab -, - (1999).

[106] A. Papadopoulos, D. Rafailidis, G. Pallis, and M.D. Dikaiakos. 2015. Clustering attributed multi-graphs with information ranking. In Database and Expert Systems Applications. Springer, Turin, Italy, 432-446.

[107] L. Peel, D.B. Larremore, and A. Clauset. 2017. The ground truth about metadata and community detection in networks. Science advances 3, 5 (2017), e1602548.

[108] A. Pothen. 1997. Graph partitioning algorithms with applications to scientific computing. In Parallel Numerical Algorithms. Springer, -, 323-368.

[109] A. Pothen, H.D. Simon, and K.P. Liou. 1990. Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11, 3 (1990), 430-452.

[110] U.N. Raghavan, R. Albert, and S. Kumara. 2007. Near linear time algorithm to detect community structures in large-scale networks. Physical review E 76, 3 (2007), 036106.

[111] G. Rossetti. 2020. EVA source file. Italian National Research Council. https://github.com/GiulioRossetti/EVA

[112] M. Roux. 2015. A Comparative Study of Divisive Hierarchical Clustering Algorithms. arXiv preprint (2015). arXiv:1506.08977

[113] T. Semertzidis, D. Rafailidis, M.G. Strintzis, and P. Daras. 2015. Large-scale spectral clustering based on pairwise constraints. Inf. Process. Manag. 51, 5 (2015), 616-624.

[114] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad. 2008. Collective classification in network data. AI Magazine 29, 3 (2008), 93-106.

[115] S. Shalileh and B. Mirkin. 2020. A Method for Community Detection in Networks with Mixed Scale Features at Its Nodes. In International Conference on Complex Networks and Their Applications. Springer, Madrid, Spain, 3-14.

[116] O. Shchur, M. Mumme, A. Bojchevski, and S. Gunnemann. 2018. Pitfalls of graph neural network evaluation. arXiv preprint (2018). arXiv:1811.05868

[117] H. Shen, X. Cheng, K. Cai, and M.B. Hu. 2009. Detect overlapping and hierarchical community structure in networks. Physica A: Statistical Mechanics and its Applications 388, 8 (2009), 1706-1712.

[118] J. Shi and J. Malik. 2000. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22, 8 (2000), 888-905.

[119] T. Snijders. 2001. Lawyers Data Set. Siena. https://www.stats.ox.ac.uk/~snijders/siena/

[120] K. Steinhaeuser and N.V. Chawla. 2008. Community detection in a large real-world social network. In Social computing, behavioral modeling, and prediction. Springer, College Park, MD, USA, 168-175.

[121] D. Steinley. 2006. K-means clustering: a half-century synthesis. Brit. J. Math. Statist. Psych. 59, 1 (2006), 1-34.

[122] A. Strehl and J. Ghosh. 2002. Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of machine learning research -, - (2002), 583-617.

[123] H. Sun, F. He, J. Huang, Y. Sun, Y. Li, C. Wang, L. He, Z. Sun, and X. Jia. 2020. Network Embedding for Community Detection in Attributed Networks. ACM Transactions on Knowledge Discovery from Data (TKDD) 14, 3 (2020), 1-25.

[124] P.I. Sanchez, E. Muller, U.L. Korn, K. Bohm, A. Kappes, T. Hartmann, and D. Wagner. 2015. Efficient algorithms for a robust modularity-driven clustering of attributed graphs. In Proceedings of the 2015 SIAM International Conference on Data Mining. SIAM, Vancouver, Canada, 100-108.

[125] F. Tang, C. Wang, J. Su, and Y. Wang. 2020. Semidefinite programming based community detection for node-attributed networks and multiplex networks. Communications in Statistics-Simulation and Computation - (2020), 1-17.

[126] F. Tian, B. Gao, Q. Cui, E. Chen, and T.Y. Liu. 2014. Learning deep representations for graph clustering. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Press, Quebec, Canada, 100-106.

[127] T.M. Cover and J.A. Thomas. 2012. Elements of Information Theory (1st ed.). John Wiley and Sons, USA.

[128] A. Tsitsulin, J. Palowitch, B. Perozzi, and E. Muller. 2020. Graph clustering with graph neural networks. arXiv preprint (2020). arXiv:2006.16904

[129] G.F. Tzortzis and A.C. Likas. 2009. The global kernel K-means algorithm for clustering in feature space. IEEE transactions on neural networks 20, 7 (2009), 1181-1194.

[130] M. Vichi. 2008. Fitting semiparametric clustering models to dissimilarity data. Advances in Data Analysis and Classification 2, 2 (2008), 121-161.

[131] C. Wang, S. Pan, R. Hu, G. Long, J. Jiang, and C. Zhang. 2019. Attributed graph clustering: A deep attentional embedding approach. arXiv preprint (2019). arXiv:1906.06532

[132] D. Wang and Y. Zhao. 2019. Network community detection from the perspective of time series. Physica A: Statistical Mechanics and its Applications 522 (2019), 205-214.

[133] X. Wang, D. Jin, X. Cao, L. Yang, and W. Zhang. 2016. Semantic community identification in large attribute networks. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI'16. ACM, Arizona, USA, 265-271.

[134] Z. Xu, Y. Ke, Y. Wang, H. Cheng, and J. Cheng. 2012. A model-based approach to attributed graph clustering. In Proceedings of the 2012 ACM SIGMOD international conference on management of data (ACM). ACM, Arizona, USA, 505-516.

[135] J. Yang, J. McAuley, and J. Leskovec. 2013. Community detection in networks with node attributes. In IEEE 13th International Conference on Data Mining. IEEE Computer Society, Washington DC, USA, 1151-1156.

[136] T. Yang, R. Jin, Y. Chi, and S. Zhu. 2009. Combining link and content for community detection: a discriminative approach. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, Alexandria Virginia, USA, 927-936.

[137] W. Ye, L. Zhou, X. Sun, C. Plant, and C. Bohm. 2017. Attributed graph clustering with unimodal normalized cut. In M. Ceci, J. Hollmen, L. Todorovski, C. Vens, and S. Dzeroski (Eds.) Machine Learning and Knowledge Discovery in Databases. Springer, Skopje, Macedonia, 601-616.

[138] Z. Yin, M. Gupta, T. Weninger, and J. Han. 2010. A unified framework for link recommendation using random walks. In 2010 International Conference on Advances in Social Networks Analysis and Mining. IEEE Computer Society, San Francisco, CA, USA, 152-159.

[139] H. Zanghi, S. Volant, and C. Ambroise. 2010. Clustering based on random graph model embedding vertex features. Pattern Recognition Letters 31, 9 (2010), 830-836.

[140] Y. Zhang, E. Levina, and J. Zhu. 2016. Community detection in networks with node features. Electronic Journal of Statistics 10, 2 (2016), 3153-3178.

[141] H. Zhou. 2003. Distance, dissimilarity index, and network community structure. Phys. Rev. E 67, 6 (2003), 061901.

[142] H. Zhou and R. Lipowsky. 2004. Network Brownian motion: a new method to measure vertex-vertex proximity and to identify communities and subcommunities. In International Conference on Computational Science. AMC, Faro, Portugal, 1062-1069.
