Abstract:
Fuzzy clustering problem (FCP) is a process of assigning data elements to clusters associated with
membership values showing the level of degrees that the elements belong to the groups. It is an optimization
problem whose constraints involve the membership degrees. The conventional approach to solve the FCP
problem is using an iterative scheme to calculate the membership degrees and the centers until the stopping
conditions hold. This approach, named as Fuzzy C-Means (FCM), was invented by Bezdek et al. (1984).
Nonetheless, it was shown to converge to local optimum or the saddle points. This leads to a natural
question of how to define an algorithm that is able to escape from local optimal point to improve solution
quality. In this paper, a new hybrid meta-heuristic algorithm for FCP called BHHS which combines the
power of existing meta-heuristic frameworks such as Black Hole and Harmony Search is proposed. These
two algorithms cooperate and support each other. The Black Hole part of the algorithm has the goal to find
good candidate solution while the Harmony Search part has the goal to generate new candidate solutions
for Black Hole when the event horizon of the Black Hole occurs. In addition, new best solutions of these
two components are exchanged with each other to improve the quality of the search. The experiments on the
benchmark UCI Machine Learning datasets indicate that the proposed framework outperforms the recent
state-of-art algorithms for this problem.
7 trang |
Chia sẻ: thanhle95 | Lượt xem: 528 | Lượt tải: 0
Bạn đang xem nội dung tài liệu A novel hybrid meta-heuristic algorithm based on black hole and harmony search for fuzzy clustering problem, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên
ISSN 2354-0575
Khoa học & Công nghệ - Số 27/Tháng 9 - 2020 Journal of Science and Technology 47
A NOVEL HYBRID META-HEURISTIC ALGORITHM
BASED ON BLACK HOLE AND HARMONY SEARCH
FOR FUZZY CLUSTERING PROBLEM
Pham Minh Chuan1,*, Le Hoang Son2, Vu Khanh Quy1, Nguyen Dinh Chien1
1 Hung Yen University of Technology and Education
2 VNU Information Technology Institute, Vietnam National University
* Corresponding author: chuanpm@utehy.edu.vn
Received: 15/06/2020
Revised: 06/08/2020
Accepted for publication: 12/09/2020
Abstract:
Fuzzy clustering problem (FCP) is a process of assigning data elements to clusters associated with
membership values showing the level of degrees that the elements belong to the groups. It is an optimization
problem whose constraints involve the membership degrees. The conventional approach to solve the FCP
problem is using an iterative scheme to calculate the membership degrees and the centers until the stopping
conditions hold. This approach, named as Fuzzy C-Means (FCM), was invented by Bezdek et al. (1984).
Nonetheless, it was shown to converge to local optimum or the saddle points. This leads to a natural
question of how to define an algorithm that is able to escape from local optimal point to improve solution
quality. In this paper, a new hybrid meta-heuristic algorithm for FCP called BHHS which combines the
power of existing meta-heuristic frameworks such as Black Hole and Harmony Search is proposed. These
two algorithms cooperate and support each other. The Black Hole part of the algorithm has the goal to find
good candidate solution while the Harmony Search part has the goal to generate new candidate solutions
for Black Hole when the event horizon of the Black Hole occurs. In addition, new best solutions of these
two components are exchanged with each other to improve the quality of the search. The experiments on the
benchmark UCI Machine Learning datasets indicate that the proposed framework outperforms the recent
state-of-art algorithms for this problem.
Keywords: Black Hole; Clustering quality; Fuzzy clustering problem; Harmony Search; Meta-heuristic
algorithms.
1. Introduction
In this note, we consider the fuzzy clustering
problem (FCP) defined in equations (1-2) which
has been used widely for pattern recognition, data
compression, image segmentation, computer vision
and many other fields [5; 11; 15; 17].
(1)
Subject to
(2)
where
N is the number of patterns to be clusters. Each
pattern is represented in r dimensions; C is a fixed
and known number of clusters (1< C< N); m is a
scalar number, which is a tuning parameter that
regulates the degree of fuzziness in the clustering
process. X, which is a N x r matrix, is the pattern
set in the feature space; V, which is a C x r matrix,
is the unknown cluster center set; U, which is a N
x C matrix, represents the membership degrees in
which u
kj
shows the level of degree that X
k
belongs
to cluster Vj; ||.|| denotes the Euclidean norm.
By using the Lagranian method, Bezdek et al.
(1984) showed an iteration scheme in an algorithm
so-called Fuzzy C-Means (FCM) to calculate the
membership degrees and centers as in equations (3-4).
(3) (4)
Nk ,...,1= , Cj ,...,1=
Convergence of FCM to a local optimum or a saddle
point was proved in this article. However, finding a
global optimum of FCP appears to be very difficult.
When both U and V are unknown, which is the
general case, the objective function (1) is neither
convex nor concave and usually has many local
optima. This leads to a natural question of how to
define an algorithm that is able to escape from local
optimal point to improve solution quality. This
ISSN 2354-0575
Journal of Science and Technology48 Khoa học & Công nghệ - Số 27/Tháng 9 - 2020
question gives the chance not only to apply meta-
heuristic algorithm to solve local optimal issues but
also the chance to apply meta-heuristic algorithm to
find high quality solutions.
In this paper, a new hybrid meta-heuristic
algorithm for FCP called BHHS which combines
the power of existing meta-heuristic frameworks
such as the Black Hole and the Harmony Search
algorithms is proposed. These two algorithms
cooperate and support each other. The Black Hole
part of the algorithm has the goal to find good
candidate solution while the Harmony Search part
has the goal to generate new candidate solutions
for Black Hole when the event horizon of the
Black Hole algorithm occurs. In addition, new best
solutions of these two components are exchanged
with each other to improve the quality of the search.
The proposed work will be experimentally validated
on the benchmark UCI Machine Learning datasets.
2. Related Work
In this section, a brief overview of the
literature regarding meta-heuristic approaches for
the FCP problem is given. The FCP problem was
first mathematically formulated by Dunn [7] who
described this problem as a minimization problem
subject to conditions on membership functions.
Bezdek et al. [4] suggested a generalized version of
this problem and proposed a well-known algorithm
called Fuzzy C-Means (FCM), which is an iterative
heuristic algorithm. The major disadvantage of
FCM is easy trapping in local optima. In order to
handle this limitation, meta-heuristic algorithms
mostly Tabu Search (TS), Variable Neighborhood
Search (VNS) and Particle Swarm Optimization
(PSO) were used.
Tabu Search: Delgado, Skármeta & Barberá [6]
proposed two TS heuristic methods to generate
trail solutions. The first one, named Move
centroids, moves the centroids over the space and
then calculates the new membership matrix. The
second one, named Shake memberships, swaps the
membership matrix to clusters and then calculates
the new centroids. Liu, Yi, Wu, Ye & Chen [13]
presented a TS based clustering approach called
TS-Clustering to deal with the minimum sum-of-
squares clustering problem.
Variable Neighborhood Search: Belacel, Hansen &
Mladenovic [2] proposed the Fuzzy J-Means (FJM)
and VNS algorithms to escape from the local minima
points during the search. The neighborhood of the
current solution is defined by all possible centroid-
to-entity relocations followed by corresponding
changes of assignments. Moves are made in such
neighborhoods until a local optimum is reached.
Applications of VNS based fuzzy clustering for
gene expression, categorical data, etc. can be found
in [3];
Particle Swarm Optimization: Omran, Salman
& Engelbrecht [14] proposed a new dynamic
clustering approach based on binary PSO for
image segmentation. Pang, Wang, Zhou & Dong
[16] proposed a modified PSO called FPSO that
redefines the position and velocity of particles to
represent the fuzzy relation between variables.
Izakian & Abraham [12] presented an improvement
of FPSO called FCM-PSO.
3. Background theories of meta-heuristic approaches
The black hole (BH) algorithm [10] is a
population-based method. Like other population-
based method, the algorithm starts with a set of
candidate solutions, namely stars, distributed
randomly in the search space. This population will
be evolved with an evolution process and existing
solutions are updated so the population is towards
the optimal solution via certain mechanisms. In
genetic algorithm, the evolution is done by the
crossover and mutation operations. In particle
swarm optimization, the evolution is done by
moving each candidate around the search space
using the global information namely gBest and the
local information namely pBest. Main step of the
BH algorithm are as follows:
Step 1. Initialization of Star.
Step 2. Evaluate Fitness of each Star.
Step 3. Selecting Star with best Fitness as
Black Hole.
Step 4. Determining position of each Star
based on their interactions, as in equation (5).
(5)
Step 5. Determining position of best Star as
the Back Hole.
Step 6. Stopping Criteria.
Harmony search (HS) [8-9] is a meta-heuristic
algorithm for optimization problems. It has the
ability to exploit the new suggested solution
synchronizing with exploring the search space in
both intensification and diversification parallel
optimization environment. This algorithm imitates
the natural phenomenon of musicians’ behavior
when they cooperate the pitches of their instruments
together to achieve a fantastic harmony as measured
by aesthetic standards.
4. The BHHS Method
In this section, a hybrid way to combine Black
Hole and Harmony Search algorithms in a new
algorithmic framework called BHHS is proposed. It
is clear that the two algorithms are population-based
algorithm sharing the same characteristics. BH is
ISSN 2354-0575
Khoa học & Công nghệ - Số 27/Tháng 9 - 2020 Journal of Science and Technology 49
able to converge to a good solution faster than HS
because in each iteration all candidates are updated
while in HS, only a new solution is generated.
However, because of HS’s new solution generation
mechanism, this algorithm is able to generate new
diverse candidates better than the random generated
candidates in BH. In this framework, BH has the
goal to drive the search, while HS in addition to
the ability of exploring the search space to find
solutions itself, it also supports BH by providing
new generated candidates. We now describe how to
define the candidate solutions, and then how these
two algorithms operate together in a new framework
to search for good final solutions.
In order to choose a suitable representation of
the candidate solutions for BH and HS, the candidate
cluster centers are encoded. The algorithm operates
on the cluster centers and generates new solutions
in each iteration. The choice of using cluster
centers instead of the fuzzy matrix increases the
performance regarding execution time and memory
required because the number of objects is much large
than the number of clusters. In our implementation,
each cluster centers vector has a physical length of
(C × R) where C is the number of centers and R is
the dimension of each center. The solution vector is
described as .
Now we will describe the proposed algorithm.
First, two sets of candidate solutions are generated
to form the solution pools for BH and HS. Next,
we will start with the BH algorithm. First, the
objective function is evaluated for each candidate
solution in BH’s population set, and then the best
one is selected as the black hole. If the black hole
changes its location, which means a new best
solution is found, intensify the search by apply
again equation (3) with only the old best solution
(the old black hole), and the current best one (the
current black hole). This is because intuitively
we hope that in the direction between these two
solutions new improved ones can be found. Next,
update each star by using equation (3) to modify the
centers and update the black hole if possible. If a
star crosses the event horizon, replace this star by a
random candidate solution in the population of HS
algorithm. After each iteration of the BH algorithm
with the main goal to find good candidate solution,
the HS algorithm generates a new candidate for
itself and also for the BH when event horizon
occurs. The HS has the ability to escape the local
optimal point and this algorithm never converges.
Thus, a solution generated by HS is a good source
of new candidate solution for BH. Table 1 describes
the algorithm in details.
Table 1. The BHHS algorithm
1. Generate two sets of candidate solutions to form
two candidate populations for BH and HS.
2. Loop
3. BH part:
a. Evaluate the objective function for each star, a
candidate solution, in BH’s population.
b. Select the candidate star that has the best fitness
value as the black hole.
If the location of the black hole is changed, intensify
the search between the new black hole and the old one
by using equation (3) until no new solution is found.
c. Change the location of other normal stars according
to equation (3).
d. If a star reaches a location with lower cost than the
black hole, exchange their locations.
e. If a star crosses the event horizon of the black hole,
replace it with a new candidate which is a random
solution in the population of HS.
4. HS part:
a. Generate a new candidate harmony x’. For each
component 'ix
i. With probability HMCR (harmony memory
considering rate), pick the stored value from HM:
( ) 110 +×← hmsuii xx
),(int' .
ii. With probability 1-HMCR, pick a random value
within the allowed range.
b. Perform additional work if the value in Step 2 came
from HM.
i. With probability PAR (pitch adjusting rate), change
'ix by a small amount .
ii. With probability 1-par, do nothing.
c. If 'x is better than the worst vector in HM’s
population, replace the worst one by 'x
5. If the best candidate in BH is better than the best
candidate in HS, replace the best one in HS by the
best one in BH.
6. If the best candidate in HS is better than the best
candidate in BH, replace the best one in BH by the
best one in HS and update the black hole.
7. If a termination criterion (a maximum number of
iterations or a sufficiently good fitness) is met, exit the
loop.
8. End loop.
5. Results and Disscussions
5.1. Experimental environments
Experimental tools: The proposed BHHS
method is implemented in Matlab and executed un-
der the environment of Intel Core i5-3210M CPU
@2.50GHz, 4GB RAM and Windows 7, 64bit oper-
ating system. It was tested against FCM [4], Fuzzy
J-Means (FJM) [3], Variable Neighborhood Search
(VNS) [2], Tabu Search (TS) [13], Fuzzy Particle
Swarm Optimization (FPSO) [16] and FCM-PSO
[12] and the standalone methods such as Harmony
Search (HS) [9] and Black Hole (BH) [10].
Six benchmark experimental datasets from
UCI Machine Learning Repository are used [1]
(Table 2). A real dataset from the TSP-LIB database
[18] is also taken.
ISSN 2354-0575
Journal of Science and Technology50 Khoa học & Công nghệ - Số 27/Tháng 9 - 2020
Execution setting: The experimental results
of over 10 independent runs with the stopping
condition being either the loop of 10000 iterations
or 1000 consecutive iterations without improvement
of the best solution are calculated. Regarding the
parameters, the number of solution candidates for
BH is fixed to 10, while the number of solution
candidates for HS is fixed to 20. The HCMR
parameter is fixed to 0.9 and the PAR parameter is
fixed to 0.3 following the suggestion in [9].
Evaluation criteria: the value of objective
function (J) in equation (1) and the computational
time (sec).
Table 2. Descriptions of the datasets
Dataset N. of
clusters
N. of
features
N. of
objects
Iris 3 4 150
Wine 3 13 178
Glass 6 9 214
Cancer 2 9 683
CMC 3 9 1473
Vowel 6 3 871
5.2. Discussions
Firstly, the experimental results found by the
proposed BHHS approach with those of the FPSO
and FCM-PSO taken from [12] are given in Table 3.
This table measures the results of three algorithms
in the worst, the average and the best cases. It is
obvious that the proposed algorithm completely
outperforms one of the most current state-of-art
algorithms for these set of experiments. The results
also indicate that the proposed algorithm, BHHS,
is able to explore the promising region as well as
escape from the local minimum points in the search
space which are two important requirements in any
search algorithm.
Secondly, the performances of the proposed
algorithm regarding other owned implemented
algorithms such as FJM, VNS, TS, BH and HM
are presented in Tables 4 & 5. Table 4 presents
the average after 10 runs of each algorithm while
Table 5 presents the improvement by percentage of
the results of our algorithm with respect to those
found by other algorithms. The experimental results
indicate that in average, the BHHS algorithm is able
to provide competitive results or even better with
respect to other algorithms.
Thirdly, the average execution time of each
implementation is described in Table 6. Regarding
the running time factor, because of the high number
of iterations and the number of the stars, BHHS
requires more execution time than the standalone
ones such as FJM or HM. FJM algorithm uses
only basic operations so that its execution time is
smallest. HM also stops quickly due to its simplicity
of its evolution rules. BH and BHHS is a population-
based approach like HM, however, each iteration
requires the update of all candidate solutions so that
they require more execution time than two previous
approaches. VNS and TS are able to find good
solutions however they require long execution time
while not able to provide better results than those
of HM. One reason that is because of the difficulty
of the designing efficient neighborhood structures
for non-convex continuous optimization problems.
Fourthly, to further compare the performance of
the proposed algorithm with other algorithms, an
additional test on a real dataset taken from the TSP-
LIB database [18] containing 3038 points in the
plane is performed. Different values of the number
of centers are verified.
The results in Tables 7, 8 & 9 show that FPSO
and FCM-PSO provide worst results with worst
execution time. This is because these two algorithms
require an update of the membership matrix which
has the size N x C while other algorithms plays with
matrices of size C x R which are typically much
smaller. FJM stops quickly however the algorithm
provides solutions as not good as other algorithms;
the VNS which is based on FJM with long execution
time only does a small improvement. The BHHS is
able to find the best solution with acceptable running
time. It is noted that when C increases which
increases the hardness of the clustering problem,
the average gap between the solutions found by
FJM and VNS with those by BHHS is increased.
The TS is able to find results close to those found
by the BHHS however it requires more execution
time. It is noted that the BHHS requires less running
time than the standalone BH while provides better
results. It can be interpreted by the contribution of
the HM component which is also able to find good
results in short execution time. That shows the
advantage of the proposed algorithm regarding hard
instances.
Finally, Table 10 compares the results found
by the random initialization (BHHS with Random
Initialization) and FCM-initialization (BHHS +
FCM Initialization). With FCM-Initialization, we
first generate stars with random locations and then
run the FCM algorithm to find some good initial
candidates solution for the problem. The results
show that again the BHHS algorithm is able to find
good solutions with different starting candidate
configurations. This is important because it proves
the ability of escaping saddle points and bad local
optimal points of the search space and converging
to good local optimal points.
ISSN 2354-0575
Khoa học & Công nghệ - Số 27/Tháng 9 - 2020 Journal of Science and Technology 51
Table 3. The comparative results with FPSO and FCM-PSO in term of objective function
on UCI’s instances taken from [12]
Instances FPSO FCM-PSO BHHS
Worst Average Best Worst Average Best Worst Average Best
Iris 69.72 67.39 66.26 62.96 62.55 62.19 61.16 61.02 60.97
Glass 87.37 86.97 86.26 73.11 72.64 72.23 68.75 67.29 66.10
Cancer 2750.1 2724.4 2704.6 2218.7 2190.5 2181.9 2102.42 2102.30 2102.02
Wine 12250.1 11528.8 11173.2 11218.0 10603.9 10411.7 10087.23 10087.09 10086.61
CMC 4190.1 4095.6 4025.2 3531.3 3485.6 3416.5 3241.03 3240.52 3240.06