A novel hybrid meta-heuristic algorithm based on black hole and harmony search for fuzzy clustering problem

Abstract: Fuzzy clustering problem (FCP) is a process of assigning data elements to clusters associated with membership values showing the level of degrees that the elements belong to the groups. It is an optimization problem whose constraints involve the membership degrees. The conventional approach to solve the FCP problem is using an iterative scheme to calculate the membership degrees and the centers until the stopping conditions hold. This approach, named as Fuzzy C-Means (FCM), was invented by Bezdek et al. (1984). Nonetheless, it was shown to converge to local optimum or the saddle points. This leads to a natural question of how to define an algorithm that is able to escape from local optimal point to improve solution quality. In this paper, a new hybrid meta-heuristic algorithm for FCP called BHHS which combines the power of existing meta-heuristic frameworks such as Black Hole and Harmony Search is proposed. These two algorithms cooperate and support each other. The Black Hole part of the algorithm has the goal to find good candidate solution while the Harmony Search part has the goal to generate new candidate solutions for Black Hole when the event horizon of the Black Hole occurs. In addition, new best solutions of these two components are exchanged with each other to improve the quality of the search. The experiments on the benchmark UCI Machine Learning datasets indicate that the proposed framework outperforms the recent state-of-art algorithms for this problem.

pdf7 trang | Chia sẻ: thanhle95 | Lượt xem: 509 | Lượt tải: 0download
Bạn đang xem nội dung tài liệu A novel hybrid meta-heuristic algorithm based on black hole and harmony search for fuzzy clustering problem, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên
ISSN 2354-0575 Khoa học & Công nghệ - Số 27/Tháng 9 - 2020 Journal of Science and Technology 47 A NOVEL HYBRID META-HEURISTIC ALGORITHM BASED ON BLACK HOLE AND HARMONY SEARCH FOR FUZZY CLUSTERING PROBLEM Pham Minh Chuan1,*, Le Hoang Son2, Vu Khanh Quy1, Nguyen Dinh Chien1 1 Hung Yen University of Technology and Education 2 VNU Information Technology Institute, Vietnam National University * Corresponding author: chuanpm@utehy.edu.vn Received: 15/06/2020 Revised: 06/08/2020 Accepted for publication: 12/09/2020 Abstract: Fuzzy clustering problem (FCP) is a process of assigning data elements to clusters associated with membership values showing the level of degrees that the elements belong to the groups. It is an optimization problem whose constraints involve the membership degrees. The conventional approach to solve the FCP problem is using an iterative scheme to calculate the membership degrees and the centers until the stopping conditions hold. This approach, named as Fuzzy C-Means (FCM), was invented by Bezdek et al. (1984). Nonetheless, it was shown to converge to local optimum or the saddle points. This leads to a natural question of how to define an algorithm that is able to escape from local optimal point to improve solution quality. In this paper, a new hybrid meta-heuristic algorithm for FCP called BHHS which combines the power of existing meta-heuristic frameworks such as Black Hole and Harmony Search is proposed. These two algorithms cooperate and support each other. The Black Hole part of the algorithm has the goal to find good candidate solution while the Harmony Search part has the goal to generate new candidate solutions for Black Hole when the event horizon of the Black Hole occurs. In addition, new best solutions of these two components are exchanged with each other to improve the quality of the search. The experiments on the benchmark UCI Machine Learning datasets indicate that the proposed framework outperforms the recent state-of-art algorithms for this problem. Keywords: Black Hole; Clustering quality; Fuzzy clustering problem; Harmony Search; Meta-heuristic algorithms. 1. Introduction In this note, we consider the fuzzy clustering problem (FCP) defined in equations (1-2) which has been used widely for pattern recognition, data compression, image segmentation, computer vision and many other fields [5; 11; 15; 17]. (1) Subject to (2) where N is the number of patterns to be clusters. Each pattern is represented in r dimensions; C is a fixed and known number of clusters (1< C< N); m is a scalar number, which is a tuning parameter that regulates the degree of fuzziness in the clustering process. X, which is a N x r matrix, is the pattern set in the feature space; V, which is a C x r matrix, is the unknown cluster center set; U, which is a N x C matrix, represents the membership degrees in which u kj shows the level of degree that X k belongs to cluster Vj; ||.|| denotes the Euclidean norm. By using the Lagranian method, Bezdek et al. (1984) showed an iteration scheme in an algorithm so-called Fuzzy C-Means (FCM) to calculate the membership degrees and centers as in equations (3-4). (3) (4) Nk ,...,1= , Cj ,...,1= Convergence of FCM to a local optimum or a saddle point was proved in this article. However, finding a global optimum of FCP appears to be very difficult. When both U and V are unknown, which is the general case, the objective function (1) is neither convex nor concave and usually has many local optima. This leads to a natural question of how to define an algorithm that is able to escape from local optimal point to improve solution quality. This ISSN 2354-0575 Journal of Science and Technology48 Khoa học & Công nghệ - Số 27/Tháng 9 - 2020 question gives the chance not only to apply meta- heuristic algorithm to solve local optimal issues but also the chance to apply meta-heuristic algorithm to find high quality solutions. In this paper, a new hybrid meta-heuristic algorithm for FCP called BHHS which combines the power of existing meta-heuristic frameworks such as the Black Hole and the Harmony Search algorithms is proposed. These two algorithms cooperate and support each other. The Black Hole part of the algorithm has the goal to find good candidate solution while the Harmony Search part has the goal to generate new candidate solutions for Black Hole when the event horizon of the Black Hole algorithm occurs. In addition, new best solutions of these two components are exchanged with each other to improve the quality of the search. The proposed work will be experimentally validated on the benchmark UCI Machine Learning datasets. 2. Related Work In this section, a brief overview of the literature regarding meta-heuristic approaches for the FCP problem is given. The FCP problem was first mathematically formulated by Dunn [7] who described this problem as a minimization problem subject to conditions on membership functions. Bezdek et al. [4] suggested a generalized version of this problem and proposed a well-known algorithm called Fuzzy C-Means (FCM), which is an iterative heuristic algorithm. The major disadvantage of FCM is easy trapping in local optima. In order to handle this limitation, meta-heuristic algorithms mostly Tabu Search (TS), Variable Neighborhood Search (VNS) and Particle Swarm Optimization (PSO) were used. Tabu Search: Delgado, Skármeta & Barberá [6] proposed two TS heuristic methods to generate trail solutions. The first one, named Move centroids, moves the centroids over the space and then calculates the new membership matrix. The second one, named Shake memberships, swaps the membership matrix to clusters and then calculates the new centroids. Liu, Yi, Wu, Ye & Chen [13] presented a TS based clustering approach called TS-Clustering to deal with the minimum sum-of- squares clustering problem. Variable Neighborhood Search: Belacel, Hansen & Mladenovic [2] proposed the Fuzzy J-Means (FJM) and VNS algorithms to escape from the local minima points during the search. The neighborhood of the current solution is defined by all possible centroid- to-entity relocations followed by corresponding changes of assignments. Moves are made in such neighborhoods until a local optimum is reached. Applications of VNS based fuzzy clustering for gene expression, categorical data, etc. can be found in [3]; Particle Swarm Optimization: Omran, Salman & Engelbrecht [14] proposed a new dynamic clustering approach based on binary PSO for image segmentation. Pang, Wang, Zhou & Dong [16] proposed a modified PSO called FPSO that redefines the position and velocity of particles to represent the fuzzy relation between variables. Izakian & Abraham [12] presented an improvement of FPSO called FCM-PSO. 3. Background theories of meta-heuristic approaches The black hole (BH) algorithm [10] is a population-based method. Like other population- based method, the algorithm starts with a set of candidate solutions, namely stars, distributed randomly in the search space. This population will be evolved with an evolution process and existing solutions are updated so the population is towards the optimal solution via certain mechanisms. In genetic algorithm, the evolution is done by the crossover and mutation operations. In particle swarm optimization, the evolution is done by moving each candidate around the search space using the global information namely gBest and the local information namely pBest. Main step of the BH algorithm are as follows: Step 1. Initialization of Star. Step 2. Evaluate Fitness of each Star. Step 3. Selecting Star with best Fitness as Black Hole. Step 4. Determining position of each Star based on their interactions, as in equation (5). (5) Step 5. Determining position of best Star as the Back Hole. Step 6. Stopping Criteria. Harmony search (HS) [8-9] is a meta-heuristic algorithm for optimization problems. It has the ability to exploit the new suggested solution synchronizing with exploring the search space in both intensification and diversification parallel optimization environment. This algorithm imitates the natural phenomenon of musicians’ behavior when they cooperate the pitches of their instruments together to achieve a fantastic harmony as measured by aesthetic standards. 4. The BHHS Method In this section, a hybrid way to combine Black Hole and Harmony Search algorithms in a new algorithmic framework called BHHS is proposed. It is clear that the two algorithms are population-based algorithm sharing the same characteristics. BH is ISSN 2354-0575 Khoa học & Công nghệ - Số 27/Tháng 9 - 2020 Journal of Science and Technology 49 able to converge to a good solution faster than HS because in each iteration all candidates are updated while in HS, only a new solution is generated. However, because of HS’s new solution generation mechanism, this algorithm is able to generate new diverse candidates better than the random generated candidates in BH. In this framework, BH has the goal to drive the search, while HS in addition to the ability of exploring the search space to find solutions itself, it also supports BH by providing new generated candidates. We now describe how to define the candidate solutions, and then how these two algorithms operate together in a new framework to search for good final solutions. In order to choose a suitable representation of the candidate solutions for BH and HS, the candidate cluster centers are encoded. The algorithm operates on the cluster centers and generates new solutions in each iteration. The choice of using cluster centers instead of the fuzzy matrix increases the performance regarding execution time and memory required because the number of objects is much large than the number of clusters. In our implementation, each cluster centers vector has a physical length of (C × R) where C is the number of centers and R is the dimension of each center. The solution vector is described as . Now we will describe the proposed algorithm. First, two sets of candidate solutions are generated to form the solution pools for BH and HS. Next, we will start with the BH algorithm. First, the objective function is evaluated for each candidate solution in BH’s population set, and then the best one is selected as the black hole. If the black hole changes its location, which means a new best solution is found, intensify the search by apply again equation (3) with only the old best solution (the old black hole), and the current best one (the current black hole). This is because intuitively we hope that in the direction between these two solutions new improved ones can be found. Next, update each star by using equation (3) to modify the centers and update the black hole if possible. If a star crosses the event horizon, replace this star by a random candidate solution in the population of HS algorithm. After each iteration of the BH algorithm with the main goal to find good candidate solution, the HS algorithm generates a new candidate for itself and also for the BH when event horizon occurs. The HS has the ability to escape the local optimal point and this algorithm never converges. Thus, a solution generated by HS is a good source of new candidate solution for BH. Table 1 describes the algorithm in details. Table 1. The BHHS algorithm 1. Generate two sets of candidate solutions to form two candidate populations for BH and HS. 2. Loop 3. BH part: a. Evaluate the objective function for each star, a candidate solution, in BH’s population. b. Select the candidate star that has the best fitness value as the black hole. If the location of the black hole is changed, intensify the search between the new black hole and the old one by using equation (3) until no new solution is found. c. Change the location of other normal stars according to equation (3). d. If a star reaches a location with lower cost than the black hole, exchange their locations. e. If a star crosses the event horizon of the black hole, replace it with a new candidate which is a random solution in the population of HS. 4. HS part: a. Generate a new candidate harmony x’. For each component 'ix i. With probability HMCR (harmony memory considering rate), pick the stored value from HM: ( ) 110 +×← hmsuii xx ),(int' . ii. With probability 1-HMCR, pick a random value within the allowed range. b. Perform additional work if the value in Step 2 came from HM. i. With probability PAR (pitch adjusting rate), change 'ix by a small amount . ii. With probability 1-par, do nothing. c. If 'x is better than the worst vector in HM’s population, replace the worst one by 'x 5. If the best candidate in BH is better than the best candidate in HS, replace the best one in HS by the best one in BH. 6. If the best candidate in HS is better than the best candidate in BH, replace the best one in BH by the best one in HS and update the black hole. 7. If a termination criterion (a maximum number of iterations or a sufficiently good fitness) is met, exit the loop. 8. End loop. 5. Results and Disscussions 5.1. Experimental environments Experimental tools: The proposed BHHS method is implemented in Matlab and executed un- der the environment of Intel Core i5-3210M CPU @2.50GHz, 4GB RAM and Windows 7, 64bit oper- ating system. It was tested against FCM [4], Fuzzy J-Means (FJM) [3], Variable Neighborhood Search (VNS) [2], Tabu Search (TS) [13], Fuzzy Particle Swarm Optimization (FPSO) [16] and FCM-PSO [12] and the standalone methods such as Harmony Search (HS) [9] and Black Hole (BH) [10]. Six benchmark experimental datasets from UCI Machine Learning Repository are used [1] (Table 2). A real dataset from the TSP-LIB database [18] is also taken. ISSN 2354-0575 Journal of Science and Technology50 Khoa học & Công nghệ - Số 27/Tháng 9 - 2020 Execution setting: The experimental results of over 10 independent runs with the stopping condition being either the loop of 10000 iterations or 1000 consecutive iterations without improvement of the best solution are calculated. Regarding the parameters, the number of solution candidates for BH is fixed to 10, while the number of solution candidates for HS is fixed to 20. The HCMR parameter is fixed to 0.9 and the PAR parameter is fixed to 0.3 following the suggestion in [9]. Evaluation criteria: the value of objective function (J) in equation (1) and the computational time (sec). Table 2. Descriptions of the datasets Dataset N. of clusters N. of features N. of objects Iris 3 4 150 Wine 3 13 178 Glass 6 9 214 Cancer 2 9 683 CMC 3 9 1473 Vowel 6 3 871 5.2. Discussions Firstly, the experimental results found by the proposed BHHS approach with those of the FPSO and FCM-PSO taken from [12] are given in Table 3. This table measures the results of three algorithms in the worst, the average and the best cases. It is obvious that the proposed algorithm completely outperforms one of the most current state-of-art algorithms for these set of experiments. The results also indicate that the proposed algorithm, BHHS, is able to explore the promising region as well as escape from the local minimum points in the search space which are two important requirements in any search algorithm. Secondly, the performances of the proposed algorithm regarding other owned implemented algorithms such as FJM, VNS, TS, BH and HM are presented in Tables 4 & 5. Table 4 presents the average after 10 runs of each algorithm while Table 5 presents the improvement by percentage of the results of our algorithm with respect to those found by other algorithms. The experimental results indicate that in average, the BHHS algorithm is able to provide competitive results or even better with respect to other algorithms. Thirdly, the average execution time of each implementation is described in Table 6. Regarding the running time factor, because of the high number of iterations and the number of the stars, BHHS requires more execution time than the standalone ones such as FJM or HM. FJM algorithm uses only basic operations so that its execution time is smallest. HM also stops quickly due to its simplicity of its evolution rules. BH and BHHS is a population- based approach like HM, however, each iteration requires the update of all candidate solutions so that they require more execution time than two previous approaches. VNS and TS are able to find good solutions however they require long execution time while not able to provide better results than those of HM. One reason that is because of the difficulty of the designing efficient neighborhood structures for non-convex continuous optimization problems. Fourthly, to further compare the performance of the proposed algorithm with other algorithms, an additional test on a real dataset taken from the TSP- LIB database [18] containing 3038 points in the plane is performed. Different values of the number of centers are verified. The results in Tables 7, 8 & 9 show that FPSO and FCM-PSO provide worst results with worst execution time. This is because these two algorithms require an update of the membership matrix which has the size N x C while other algorithms plays with matrices of size C x R which are typically much smaller. FJM stops quickly however the algorithm provides solutions as not good as other algorithms; the VNS which is based on FJM with long execution time only does a small improvement. The BHHS is able to find the best solution with acceptable running time. It is noted that when C increases which increases the hardness of the clustering problem, the average gap between the solutions found by FJM and VNS with those by BHHS is increased. The TS is able to find results close to those found by the BHHS however it requires more execution time. It is noted that the BHHS requires less running time than the standalone BH while provides better results. It can be interpreted by the contribution of the HM component which is also able to find good results in short execution time. That shows the advantage of the proposed algorithm regarding hard instances. Finally, Table 10 compares the results found by the random initialization (BHHS with Random Initialization) and FCM-initialization (BHHS + FCM Initialization). With FCM-Initialization, we first generate stars with random locations and then run the FCM algorithm to find some good initial candidates solution for the problem. The results show that again the BHHS algorithm is able to find good solutions with different starting candidate configurations. This is important because it proves the ability of escaping saddle points and bad local optimal points of the search space and converging to good local optimal points. ISSN 2354-0575 Khoa học & Công nghệ - Số 27/Tháng 9 - 2020 Journal of Science and Technology 51 Table 3. The comparative results with FPSO and FCM-PSO in term of objective function on UCI’s instances taken from [12] Instances FPSO FCM-PSO BHHS Worst Average Best Worst Average Best Worst Average Best Iris 69.72 67.39 66.26 62.96 62.55 62.19 61.16 61.02 60.97 Glass 87.37 86.97 86.26 73.11 72.64 72.23 68.75 67.29 66.10 Cancer 2750.1 2724.4 2704.6 2218.7 2190.5 2181.9 2102.42 2102.30 2102.02 Wine 12250.1 11528.8 11173.2 11218.0 10603.9 10411.7 10087.23 10087.09 10086.61 CMC 4190.1 4095.6 4025.2 3531.3 3485.6 3416.5 3241.03 3240.52 3240.06