JOURNAL OF SCIENCE OF HNUE
Interdisciplinary Science, 2013, Vol. 58, No. 5, pp. 30-38
This paper is available online at
SEMANTIC QUANTITATION OF LINGUISTIC VALUES EMBEDDED
IN COMPLETE LINEAR HEDGE ALGEBRAS AND QUERYING
IN THE LINGUISTIC DATABASE
Nguyen Tan An1 and Nguyen Van Quyen2
1Faculty of Information Technology, Hanoi National University of Education
2Department of Post Graduate, Hai Phong University
Abstract. In fuzzy databases represented by linguistic values, linguistic terms carry
qualitative semantics. However, when linguistic terms must be computed with or
compared, their semantics need to be quantified. A suitable quantification method
makes the subsequent computations more efficient. When a database is embedded in a
Complete Linear Hedge Algebra, each base linguistic value may be accompanied by
emphasizing hedges, and the semantics of linguistic terms are quantified as
subintervals of the interval [0, 1]. This article studies the semantic quantification
of linguistic terms embedded in Complete Linear Hedge Algebras and its application
to querying databases that contain linguistic terms. The computation handles both
linguistic values and real numbers. The query results are quite reasonable.
Keywords: Computation with words, semantics of terms, hedge algebras, linguistic
representation model.
1. Introduction
Decision support systems increasingly tend to mimic the way humans process
information, namely processing collected data or information that is imprecise,
incomplete, uncertain and/or vague and is described by words of natural languages.
The processing of vague data represented by linguistic values usually follows one of
these approaches:
i) Using fuzzy set theory: each linguistic value is represented by a fuzzy set
characterized by a membership function, or by a membership function together with a
non-membership function.
Received March 2, 2013. Accepted June 5, 2013.
Contact Nguyen Tan An, e-mail address: nguyentanan@yahoo.com
ii) Using possibility theory: each linguistic value is assigned a possibility
distribution over the reference domain.
iii) Using hedge algebras: each linguistic database is embedded in a hedge algebra.
Each approach raises a series of subsequent problems, such as aggregating fuzzy
values, comparing fuzzy values, computing the distance between fuzzy values and
measuring the similarity between them. Consequently, the quality of the resulting fuzzy
systems differs depending on the approach applied.
When a linguistic database is embedded in a hedge algebra, each base linguistic
value may be accompanied by emphasizing hedges. To make computation convenient, the
semantics of linguistic terms are then quantified as subintervals of the interval [0, 1].
This paper presents the embedding of linguistic databases in a Complete Linear Hedge
Algebra (ComLin-HA) and the application of semantic quantitation to querying the data
in this setting.
The paper covers the following topics: ComLin-HA, the semantic quantitation of
linguistic values, an example of queries in a fuzzy database and, finally, some
conclusions.
2. Content
2.1. An overview of ComLin-HA
2.1.1. Hedge algebra (HA)
Definition 2.1. An algebra X = (X, G, H, ≤), where X is a term domain, G is a set of
primary terms, H is a set of linguistic hedges and ≤ is a semantic ordering relation
(SOR) on X, is called a hedge algebra (HA) if it satisfies the following axioms [1]:
(A1). Each hedge is either positive or negative with respect to each of the other
hedges, including itself;
(A2). If terms u and v are independent, then, ∀x ∈ H(u), we have x ∉ H(v). In
addition, if u and v are incomparable, i.e. u ≮ v and v ≮ u, then so are x and y, for
every x ∈ H(u) and y ∈ H(v);
(A3). If x ≠ hx, then x ∉ H(hx), and if h ≠ k and hx ≤ kx, then h′hx ≤ k′kx, for
all h, k, h′, k′ ∈ H and x ∈ X. Moreover, if hx ≠ kx, then hx and kx are independent;
(A4). If u ∉ H(v) and u ≤ v (u ≥ v), then u ≤ hv (u ≥ hv) for any h ∈ H.
2.1.2. ComLin-HA
A ComLin-HA is represented by a 7-tuple AX = (X, G, C, H, Σ, φ, ≤), where:
X is the term domain Dom(X), i.e. the set of all terms of X;
G = {0; c−; W; c+; 1} is the set of its primary terms, in which W is the neutral
concept positioned between the two basic terms c− and c+, i.e. 0 ≤ c− ≤ W ≤
c+ ≤ 1;
C = {c−; c+} is the set of basic words;
H = H− ∪ H+ is the set of hedges, regarded as unary operations on X;
The effect of h acting on x is expressed by the fact that either hx ≥ x or hx ≤ x.
If kx ≥ hx ≥ x or y ≥ hy ≥ ky for some x or y, we say that the effect of k is greater
than the effect of h and write k ≥ h.
H is divided into H+ and H−:
H+ = {h ∈ H : hc− ≤ c− or hc+ ≥ c+}
H− = {h ∈ H : hc− ≥ c− or hc+ ≤ c+}
In this paper, we write H− = {h0; h−1; . . . ; h−q} with h0 < h−1 < . . . < h−q and
H+ = {h0; h1; . . . ; hp} with h0 < h1 < . . . < hp, where the hi with i ≠ 0 are the
hedges. Sometimes h0 is replaced by I (identity): ∀x, Ix = x.
Σ is the operator such that Σx = sup H(x);
φ is the operator such that φx = inf H(x);
and ≤ is an order relation on X.
For example:
Let the set of basic words be C = {true; false}, that is, c+ = true and c− = false. Let
the set of hedges be H = {Very; More; Identity; Less; Possible}, in which Identity is the
identity operator, regarded as both positive and negative. For short, we write
H = {V; M; I; L; P}.
It is clear that:
Pfalse ≥ false and Ptrue ≤ true.
Lfalse ≥ false and Ltrue ≤ true.
Mfalse ≤ false and Mtrue ≥ true.
Vfalse ≤ false and Vtrue ≥ true.
Therefore H+ = {V; M; I} and H− = {I; L; P}.
H(true) = {true; Ltrue; Mtrue; LLtrue; LMtrue; MMtrue; . . .}
and
H(false) = {false; Lfalse; Mfalse; LLfalse; LMfalse; MMfalse; . . .}.
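The partition H = H+ ∪ H− in this example follows mechanically from the direction in which each hedge moves true. The Python sketch below is a minimal illustration, assuming each hedge's effect on c+ = true is recorded as +1 (h true ≥ true), −1 (h true ≤ true) or 0 for Identity; the names EFFECT_ON_TRUE, partition_hedges and generate_terms are hypothetical. It also enumerates hedge strings of H(x) up to a bounded number of hedges, in the spirit of the lists above.

```python
from itertools import product

# Effect of each hedge on c+ = true, as stated in the example:
# +1 means h(true) >= true, -1 means h(true) <= true, 0 marks the identity I.
EFFECT_ON_TRUE = {"V": +1, "M": +1, "I": 0, "L": -1, "P": -1}


def partition_hedges(effect_on_c_plus):
    """Split H into H+ (h c+ >= c+) and H- (h c+ <= c+).

    The identity satisfies both inequalities, so it lands in both sets,
    matching H+ = {V, M, I} and H- = {I, L, P} in the text."""
    h_plus = {h for h, e in effect_on_c_plus.items() if e >= 0}
    h_minus = {h for h, e in effect_on_c_plus.items() if e <= 0}
    return h_plus, h_minus


def generate_terms(primary, hedges, max_hedges):
    """List hedge strings of H(primary) with at most max_hedges hedges.

    Strings are written as in the text, e.g. "LMtrue" stands for L(M(true))."""
    terms = [primary]
    for n in range(1, max_hedges + 1):
        for combo in product(hedges, repeat=n):
            terms.append("".join(combo) + primary)
    return terms


h_plus, h_minus = partition_hedges(EFFECT_ON_TRUE)
print(sorted(h_plus), sorted(h_minus))
print(generate_terms("true", ["L", "M"], 2))
```

The enumeration treats every hedge string as a distinct term; it does not check whether each hedge acts properly, so it lists candidate strings rather than the canonical terms of H(x).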
2.2. Semantic quantitation of linguistic terms
2.2.1. Semantic quantitation of ComLin-HA
Definition 2.2. Let X be a HA. A function f : X → [0, 1] is called a semantic
quantitation function of AX if, ∀h, k ∈ H+ or ∀h, k ∈ H−, and ∀x, y ∈ X, we have:
$$\frac{|f(hx) - f(x)|}{|f(kx) - f(x)|} = \frac{|f(hy) - f(y)|}{|f(ky) - f(y)|}$$
2.2.2. Fuzziness measures of term x
Definition 2.3. Let X be a HA. A function fm : X → [0, 1] is called the fuzziness
measure of term x if:
(1). fm(c−) = θ and fm(c+) = 1 − θ, where θ ∈ (0, 1), and
$$\sum_{-q \le i \le p,\ i \ne 0} fm(h_i u) = fm(u), \quad u \in H(x).$$
(2). If x is a crisp concept, i.e. H(x) = {x}, then fm(x) = 0. Hence, fm(0) =
fm(W) = fm(1) = 0.
(3). ∀x, y ∈ X, ∀h ∈ H, we have
$$\frac{fm(hx)}{fm(x)} = \frac{fm(hy)}{fm(y)}.$$
That is, this ratio does not depend on x and y. It is denoted by µ(h) and is called the
fuzziness measure of hedge h.
Based on the structure of the algebra, the fuzziness measure fm of the term x ∈ X,
defined by an axiomatization, has the following properties [2-5]:
i) fm(hx) = µ(h)fm(x), ∀x ∈ X, and fm(x) = 0, ∀x ∈ LDom(X);
ii) fm(c−) + fm(c+) = 1;
iii) $\sum_{-q \le i \le p,\ i \ne 0} fm(h_i c) = fm(c)$, where c ∈ {c−; c+};
iv) $\sum_{-q \le i \le p,\ i \ne 0} fm(h_i x) = fm(x)$, x ∈ X;
v) $\sum_{-q \le i \le -1} \mu(h_i) = \alpha$ and $\sum_{1 \le i \le p} \mu(h_i) = \beta$, where
α, β > 0 and α + β = 1.
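Properties i) to v) can be checked numerically once fm and µ are fixed. The sketch below assumes hypothetical values fm(false) = 0.4, fm(true) = 0.6, µ(P) = 0.15, µ(L) = 0.25 and µ(M) = µ(V) = 0.3 (so α = 0.4 and β = 0.6); these numbers are illustrative only and do not come from the paper. A term hm . . . h1c is encoded as the tuple (hm, …, h1, c), and fm applies property i) once per hedge.

```python
from itertools import product
from math import isclose

# Hypothetical fuzziness measures (illustrative only); fm(c-) + fm(c+) = 1.
FM_PRIMARY = {"false": 0.4, "true": 0.6}
# Hypothetical hedge measures: P and L are negative, M and V are positive.
MU = {"P": 0.15, "L": 0.25, "M": 0.3, "V": 0.3}
NEGATIVE, POSITIVE = ("P", "L"), ("M", "V")


def fm(term):
    """Fuzziness measure of a term given as a tuple (h_m, ..., h_1, c).

    By property i), fm(h x) = mu(h) * fm(x), so fm is the product of the
    hedge measures and the measure of the primary term."""
    *hedges, c = term
    value = FM_PRIMARY[c]
    for h in hedges:
        value *= MU[h]
    return value


alpha = sum(MU[h] for h in NEGATIVE)
beta = sum(MU[h] for h in POSITIVE)

# Properties ii) and v).
assert isclose(FM_PRIMARY["false"] + FM_PRIMARY["true"], 1.0)
assert isclose(alpha + beta, 1.0)

# Properties iii) and iv): for every term x with up to two hedges, the
# measures of its children h x (identity excluded) add up to fm(x).
for c in FM_PRIMARY:
    for n in range(3):
        for hedges in product(MU, repeat=n):
            x = hedges + (c,)
            assert isclose(sum(fm((h,) + x) for h in MU), fm(x))

print(alpha, beta, fm(("V", "L", "true")))
```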
2.2.3. Algebraic signs of linguistic terms
We observe that the order-based semantics of terms implies that terms and hedges
have so-called semantic tendencies, which can be recognized as follows:
- As presented above, for x = c ∈ G, the comparability of hc and c implies that
H is partitioned into two sets: H+ = {h ∈ H : hc− ≤ c− or hc+ ≥ c+}, which
consists of the hedges that strengthen the semantic tendencies of both primary terms,
and H− = {h ∈ H : hc− ≥ c− or hc+ ≤ c+}, which consists of the hedges that weaken
these semantic tendencies. This leads to the interesting concept that the primary terms
and hedges have their own "algebraic" sign: sign(c+) = +1, sign(c−) = −1, sign(h) =
+1 for h ∈ H+, and sign(h) = −1 for h ∈ H−.
- The comparability of khx and hx also implies that k either increases or decreases
the effect of h. For instance, when x ≤ hx, the inequalities x ≤ hx ≤ khx state that k
increases the effect of h, while x ≤ khx ≤ hx state that k decreases the effect of h. We
write sign(k, h) = +1 in the former case and sign(k, h) = −1 in the latter; this value is
called the relative sign of k with respect to h. For example, it can be verified that
sign(V, L) = +1 while sign(L, V) = −1.
- If hh′x ≠ h′x, we say that the effect of h in the expression hh′x is proper. A string
hm . . . h1c, where hi ∈ H and c ∈ G, is said to be a canonical string representation of x
if x = hm . . . h1c and the effect of every hi in this expression is proper. It was proved
that the canonical representation of any term x is unique; hence we may define the
length of x, denoted by |x|, as the length of its canonical string hm . . . h1c. The
sign of a linguistic term x can then be defined by
$$\mathrm{Sgn}(x) = \mathrm{sign}(h_m, h_{m-1}) \times \cdots \times \mathrm{sign}(h_2, h_1) \times \mathrm{sign}(h_1) \times \mathrm{sign}(c)$$
The meaning of Sgn is expressed as follows:
(Sgn(hx) = +1) ⇒ (hx ≥ x) and (Sgn(hx) = −1) ⇒ (hx ≤ x). For example,
since Sgn(VLtrue) = sign(V, L)sign(L)sign(true) = −1, we have VLtrue ≤
Ltrue.
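The product formula for Sgn can be evaluated directly on a canonical string. The sketch below encodes hm . . . h1c as a tuple (hm, …, h1, c) and fills in only the relative signs stated in the text, sign(V, L) = +1 and sign(L, V) = −1; any other pair raises a KeyError rather than guessing a value. The table and function names are hypothetical.

```python
# Signs of the primary terms and hedges from the example in Section 2.1.2.
SIGN_PRIMARY = {"true": +1, "false": -1}
SIGN_HEDGE = {"V": +1, "M": +1, "L": -1, "P": -1}
# Relative signs sign(k, h): only the two pairs quoted in the text.
REL_SIGN = {("V", "L"): +1, ("L", "V"): -1}


def sgn(term):
    """Sgn of a canonical term (h_m, ..., h_1, c):
    sign(h_m, h_{m-1}) * ... * sign(h_2, h_1) * sign(h_1) * sign(c)."""
    *hedges, c = term
    result = SIGN_PRIMARY[c]
    if hedges:
        result *= SIGN_HEDGE[hedges[-1]]               # sign(h_1)
        for outer, inner in zip(hedges[:-1], hedges[1:]):
            result *= REL_SIGN[(outer, inner)]         # sign(h_{i+1}, h_i)
    return result


print(sgn(("V", "L", "true")))   # -1, hence VLtrue <= Ltrue
```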
On this basis, we have
Definition 2.4. Sgn : X → {−1, 0, 1} is defined as follows:
(1). Sgn(c−) = −1 and
$$\mathrm{Sgn}(hc^-) = \begin{cases} +\mathrm{Sgn}(c^-) & \text{if } hc^- < c^- \\ -\mathrm{Sgn}(c^-) & \text{if } hc^- \ge c^- \end{cases}$$
(2). Sgn(c+) = +1 and
$$\mathrm{Sgn}(hc^+) = \begin{cases} +\mathrm{Sgn}(c^+) & \text{if } hc^+ > c^+ \\ -\mathrm{Sgn}(c^+) & \text{if } hc^+ \le c^+ \end{cases}$$
(3). Sgn(h′hx) = −Sgn(hx) if h′ is negative for h and h′hx ≠ hx;
(4). Sgn(h′hx) = +Sgn(hx) if h′ is positive for h and h′hx ≠ hx;
(5). Sgn(h′hx) = 0 if h′hx = hx.
2.2.4. Semantic heredity – an essential meaning of linguistic hedges
An essential property of hedges is the so-called semantic heredity, which states
that the terms generated from a given term x by applying hedges must inherit the
(genetic) core meaning of x. This implies that hedges cannot change the essential
meaning of terms as expressed by the semantic order relation ≤, which results in the
following:
+ If the meaning of hx and kx is expressed by the order relationship hx ≤ kx,
h ≠ k, then no hedges h′ and k′ can change this semantic relationship, that is,
hx ≤ kx, hx ≠ kx ⇒ h′hx ≤ k′kx
+ Similarly, if the meaning of x and hx is expressed by either x ≤ hx or hx ≤ x,
then x ≤ hx ⇒ x ≤ h′hx, or hx ≤ x ⇒ h′hx ≤ x
Assuming that H− = {h0; h−1; . . . ; h−q} and H+ = {h0; h1; . . . ; hp}, where
h0 = I , the artificial hedge identity, and h0 < h−1 < . . . < h−q and h0 < h1 <
. . . < hp, the hedge heredity leads to the following results:
+ For Sgn(hpx) = −1, H(hpx) ≤ . . . ≤ H(h1x) ≤ x ≤ H(h−1x) ≤ . . . ≤
H(h−qx)
+ For Sgn(hpx) = +1, H(h−qx) ≤ . . . ≤ H(h−1x) ≤ x ≤ H(h1x) ≤ . . . ≤
H(hpx)
with the convention that H(h0x) = H(Ix) = {x}. In particular, we have:
+ 0 ≤ H(c−) ≤ W ≤ H(c+) ≤ 1;
+ The sets H(hjx), j ∈ [−q∧p], where [−q∧p] = {j | −q ≤ j ≤ p}, constitute a
partition of H(x), i.e. they are pairwise disjoint and
$$H(x) = \bigcup_{j \in [-q \wedge p]} H(h_j x)$$
The properties listed above already show that the term domains of linguistic
variables with such qualitative semantics of terms possess a rich order-based structure.
Therefore, hedge algebras can be regarded as formalized structures of the qualitative
semantics of term domains; in general, the meaning of a term represented in such a
formalized structure carries much more information than a fuzzy set alone.
2.2.5. Fuzziness measure of terms using fuzziness intervals
* Fuzziness interval of x ∈ X
For every term x ∈ X, the fuzziness interval of x is a subinterval of [0, 1] of length
fm(x), denoted by ℑfm(x), which is constructed by induction on the length of x as
follows:
(i) For x of length 1, i.e. x ∈ {c−; c+}, ℑfm(c−) and ℑfm(c+) are intervals that
constitute a partition of [0, 1] and satisfy the conditions that c− ≤ c+ implies
ℑfm(c−) ≤ ℑfm(c+), |ℑfm(c−)| = fm(c−) and |ℑfm(c+)| = fm(c+), where |ℑ(x)|
denotes the length of ℑ(x) and the notation U ≤ V means that x ≤ y for all x ∈ U
and y ∈ V.
(ii) Suppose that ℑfm(x) has been defined and |ℑfm(x)| = fm(x) for all x of length k.
Then the fuzziness intervals {ℑfm(hix) : i ∈ [−q∧p], i ≠ 0} are constructed so that they
constitute a partition of ℑfm(x), satisfy |ℑfm(hix)| = fm(hix), and form a linearly
ordered set whose order is induced by the order of the set {h−qx; h−q+1x; . . . ; hpx}
(Figure 1).
The fuzziness intervals of terms of length k are called fuzziness intervals of depth k,
or k-intervals for short. The set of all k-intervals is denoted by Jk, and we put
$J = \bigcup_{1 \le k < \infty} J_k$, the set of all fuzziness intervals. By definition, each fuzziness
interval in J is associated with exactly one term x and, conversely, each term x is
associated with exactly one fuzziness interval ℑfm(x) of depth |x|, where |x| denotes
the length of x. The set J has its own structure, defined by the inclusion relation and
the ordering relation between intervals of the same depth. That is, each fuzziness
interval has its own position in this structure, and this position represents a certain
meaning of the term associated with it.
Figure 1.
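The inductive construction of fuzziness intervals can be sketched as follows. This is a minimal illustration, not the authors' implementation, assuming H = {L; M} with h−1 = L and h1 = M, fm(false) = 0.4 and hypothetical hedge measures µ(L) = 0.4, µ(M) = 0.6. The relative signs of M and L are likewise assumptions, chosen by analogy with sign(V, L) = +1 and sign(L, V) = −1. At each step the parent interval is split in the order h−qx, …, h−1x, h1x, …, hpx, reversed when Sgn(hpx) = −1.

```python
# Hypothetical setting (assumptions, not the paper's data).
THETA = 0.4                      # fm(false) = fm(c-); fm(true) = 1 - THETA
MU = {"L": 0.4, "M": 0.6}        # mu(h); alpha = mu(L), beta = mu(M)
NEG, POS = ["L"], ["M"]          # h_{-1}, ..., h_{-q} and h_1, ..., h_p
SIGN_PRIMARY = {"false": -1, "true": +1}
SIGN_HEDGE = {"L": -1, "M": +1}
# Assumed relative signs sign(k, h), by analogy with sign(V, L) = +1, sign(L, V) = -1.
REL_SIGN = {("M", "M"): +1, ("M", "L"): +1, ("L", "M"): -1, ("L", "L"): -1}


def sgn(term):
    """Sgn of a canonical term given as a tuple (h_m, ..., h_1, c)."""
    *hedges, c = term
    s = SIGN_PRIMARY[c]
    if hedges:
        s *= SIGN_HEDGE[hedges[-1]]
        for outer, inner in zip(hedges[:-1], hedges[1:]):
            s *= REL_SIGN[(outer, inner)]
    return s


def children(term, lo, hi):
    """Split the interval (lo, hi) of `term` into the intervals of
    h_{-q} term, ..., h_{-1} term, h_1 term, ..., h_p term; the order is
    reversed when Sgn(h_p term) = -1. Returns {hedge: (lo, hi)}."""
    order = list(reversed(NEG)) + POS
    if sgn((POS[-1],) + term) == -1:
        order.reverse()
    width, start, parts = hi - lo, lo, {}
    for h in order:
        parts[h] = (start, start + MU[h] * width)   # length fm(h term)
        start += MU[h] * width
    return parts


def interval(term):
    """Fuzziness interval of (h_m, ..., h_1, c), built by induction on length."""
    *hedges, c = term
    lo, hi = (0.0, THETA) if c == "false" else (THETA, 1.0)
    x = (c,)
    for h in reversed(hedges):       # apply h_1 first and h_m last
        lo, hi = children(x, lo, hi)[h]
        x = (h,) + x
    return lo, hi


for t in [("true",), ("L", "true"), ("M", "true"), ("M", "L", "true"), ("L", "L", "true")]:
    print("".join(t), interval(t))
```

Under these assumptions, ℑ(Ltrue) ≈ [0.4, 0.64] is split into ℑ(MLtrue) ≈ [0.4, 0.544] followed by ℑ(LLtrue) ≈ [0.544, 0.64], because Sgn(MLtrue) = −1 reverses the order of the children.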
In [2, 4-6], the semantically quantifying mapping (SQM) υ : X → [0, 1], which
is induced by the given fuzziness measure or defined by the structure of J, has been
examined. For each term x ∈ X of length k, υ(x) is defined to be the value in ℑfm(x)
that is the common end-point of the (k+1)-intervals ℑ(h1x) and ℑ(h−1x) of Jk+1; it
is called the semantic value of the term x. This value divides the interval ℑ(x)
internally in the proportion α : β if Sgn(hpx) = +1, or in the proportion β : α if
Sgn(hpx) = −1.
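So, based on the structure of J, the SQM υ induced by fm can be computed as follows.
Let AX = (X, G, C, H, Σ, φ, ≤) be a free ComLin-HA and let fm(c−), fm(c+) and
µ(h) be the fuzziness measures of the primary terms c−, c+ and of the hedges h ∈ H,
respectively, satisfying Properties ii) and v). Then the mapping υ can be computed
recursively as follows, where sign(j) denotes the sign of the integer index j:
i) υ(W) = κ = fm(c−), υ(c−) = κ − αfm(c−) = βfm(c−), υ(c+) = κ + αfm(c+);
ii) for j ∈ [−q∧p], j ≠ 0,
$$\upsilon(h_j x) = \upsilon(x) + \mathrm{Sgn}(h_j x)\Big\{\sum_{i=\mathrm{sign}(j)}^{j} \mu(h_i)\, fm(x) - \omega(h_j x)\, \mu(h_j)\, fm(x)\Big\},$$
where
$$\omega(h_j x) = \tfrac{1}{2}\big[1 + \mathrm{Sgn}(h_j x)\, \mathrm{Sgn}(h_p h_j x)(\beta - \alpha)\big] \in \{\alpha, \beta\};$$
iii) υ(φc−) = 0, υ(Σc−) = κ = υ(φc+), υ(Σc+) = 1, and for all j ∈ [−q∧p], j ≠ 0,
we have:
$$\upsilon(\varphi h_j x) = \upsilon(x) + \mathrm{Sgn}(h_j x)\Big\{\sum_{i=\mathrm{sign}(j)}^{j - \frac{1}{2}(1 + \mathrm{Sgn}(h_j x))\,\mathrm{sign}(j)} \mu(h_i)\, fm(x)\Big\}$$
and
$$\upsilon(\Sigma h_j x) = \upsilon(x) + \mathrm{Sgn}(h_j x)\Big\{\sum_{i=\mathrm{sign}(j)}^{j - \frac{1}{2}(1 - \mathrm{Sgn}(h_j x))\,\mathrm{sign}(j)} \mu(h_i)\, fm(x)\Big\}.$$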
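Formulas i) and ii) translate into a short recursive function. The sketch below is self-contained and assumes a hypothetical setting: H = {L; M} with h−1 = L and h1 = M, fm(false) = κ = 0.4, µ(L) = α = 0.4, µ(M) = β = 0.6, and relative signs for M and L chosen by analogy with sign(V, L) = +1 and sign(L, V) = −1. With α ≠ β the weight ω(hjx) genuinely switches between α and β, so the results can be checked against the common end-points of the sub-intervals; for example, in this setting ℑ(Ltrue) = [0.4, 0.64] splits into ℑ(MLtrue) and ℑ(LLtrue) at 0.544. The function names are hypothetical.

```python
# Hypothetical setting (assumptions, not the paper's data).
FM_PRIMARY = {"false": 0.4, "true": 0.6}    # kappa = fm(c-) = 0.4
MU = {"L": 0.4, "M": 0.6}
NEG, POS = ["L"], ["M"]                     # h_{-1}, ..., h_{-q} and h_1, ..., h_p
ALPHA = sum(MU[h] for h in NEG)
BETA = sum(MU[h] for h in POS)
SIGN_PRIMARY = {"false": -1, "true": +1}
SIGN_HEDGE = {"L": -1, "M": +1}
REL_SIGN = {("M", "M"): +1, ("M", "L"): +1, ("L", "M"): -1, ("L", "L"): -1}  # assumed


def sgn(term):
    """Sgn of a canonical term (h_m, ..., h_1, c) by the product formula."""
    *hedges, c = term
    s = SIGN_PRIMARY[c]
    if hedges:
        s *= SIGN_HEDGE[hedges[-1]]
        for outer, inner in zip(hedges[:-1], hedges[1:]):
            s *= REL_SIGN[(outer, inner)]
    return s


def fm(term):
    """fm(h_m ... h_1 c) = mu(h_m) * ... * mu(h_1) * fm(c)."""
    *hedges, c = term
    value = FM_PRIMARY[c]
    for h in hedges:
        value *= MU[h]
    return value


def upsilon(term):
    """Semantic value of a term (h_j, ..., h_1, c) by formulas i) and ii)."""
    *hedges, c = term
    if not hedges:
        # i) v(c-) = kappa - alpha*fm(c-), v(c+) = kappa + alpha*fm(c+)
        kappa = FM_PRIMARY["false"]
        return kappa + SIGN_PRIMARY[c] * ALPHA * FM_PRIMARY[c]
    h, x = hedges[0], term[1:]
    j = POS.index(h) + 1 if h in POS else -(NEG.index(h) + 1)
    # h_i for i = sign(j), ..., j: the hedges on h_j's side, up to h_j itself
    same_side = POS[:j] if j > 0 else NEG[:-j]
    s = sgn(term)
    omega = 0.5 * (1 + s * sgn((POS[-1],) + term) * (BETA - ALPHA))  # alpha or beta
    bracket = sum(MU[k] for k in same_side) * fm(x) - omega * MU[h] * fm(x)
    return upsilon(x) + s * bracket


for t in [("false",), ("true",), ("L", "true"), ("M", "true"), ("M", "L", "true")]:
    print("".join(t), round(upsilon(t), 4))
```

Under these assumptions, υ(true) = κ + αfm(true) = 0.64, υ(Ltrue) = 0.544 and υ(Mtrue) = 0.784, each of which coincides with the common end-point of the two child intervals of the respective term.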
* The uncertainty k-similarity relation
Two values t[x], s[x] ∈ Dom(X) are called k-equal, denoted t[x] =k s[x], if one of the
following conditions is satisfied [6]:
(i) t[x] and s[x] denote the same symbol;
(ii) There exists a similarity class δk(u) of the similarity relation δk of level k such that
Omin,k(t[x]) ⊆ δk(u) and Omin,k(s[x]) ⊆ δk(u).
For any two values t[x] and s[x] of Dom(X), the criterion for testing the equality
t[x] =k s[x] is the following: there exists a similarity class δk(u) of level k of X such that
- if t[x] and s[x] are crisp values, then t[x], s[x] ∈ δk(u);
- if only one of the two values, say t[x], is a linguistic value, then υ(t[x]), s[x] ∈ δk(u);
- if both t[x] and s[x] are linguistic values, then υ(t[x]), υ(s[x]) ∈ δk(u).
To construct the similarity relation of level k, for an arbitrary k > 0, we use fuzziness
intervals of depth k′ > k such that each of the sets Xj,(k) contains at least 2 fuzziness
intervals of depth k′ of its own. It is proved that k′ > k suffices when p, q > 1;
otherwise, k′ > k + 1 is required.
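The k-equality test reduces to a membership check once the similarity classes and the semantic values are known. The sketch below is a simplification under stated assumptions: classes are given as half-open intervals (lo, hi] of the real domain, such as the level-2 classes (40, 55], (55, 85] and (85, 100] used in the example of Section 2.3, and linguistic values are mapped to numbers through a caller-supplied table of semantic values (here only υA,r(true) = 70 from that example). The names same_class and k_equal are hypothetical.

```python
from typing import Mapping, Sequence, Tuple, Union

Value = Union[float, str]          # crisp number or linguistic label
Interval = Tuple[float, float]     # similarity class (lo, hi], half-open


def same_class(a: float, b: float, classes: Sequence[Interval]) -> bool:
    """True if a and b lie in one common similarity class (lo, hi]."""
    return any(lo < a <= hi and lo < b <= hi for lo, hi in classes)


def k_equal(t: Value, s: Value, classes: Sequence[Interval],
            semantic: Mapping[str, float]) -> bool:
    """Hypothetical k-equality test t =_k s.

    Condition (i): identical symbols are k-equal.
    Condition (ii): crisp values are used as they are, linguistic values are
    replaced by their semantic value, and both must share a class."""
    if t == s:
        return True
    a = semantic[t] if isinstance(t, str) else t
    b = semantic[s] if isinstance(s, str) else s
    return same_class(a, b, classes)


# Level-2 classes and the semantic value of "True" from the example in Section 2.3.
CLASSES = [(40.0, 55.0), (55.0, 85.0), (85.0, 100.0)]
SEMANTIC = {"True": 70.0}

print(k_equal("True", 80, CLASSES, SEMANTIC))    # 70 and 80 share (55, 85]
print(k_equal(80, 90, CLASSES, SEMANTIC))        # different classes
```

For instance, k_equal("True", 80, …) holds because 70 and 80 both lie in (55, 85]; this is the comparison behind query ii) for John's ANSWER 2.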
2.3. An example of queries in fuzzy databases
Consider a relation scheme of pupils R = {#CHILD, NAME, ANSWER 1,
ANSWER 2} (Table 1).
For simplicity, we assume that the linguistic attributes have the same real domain,
the interval [0, 100], and the same term domain, modeled by the same linear hedge
algebra AX = (X, G, C, H, Σ, φ, ≤), where C = {false; true}, G = {0; false; W; true; 1},
H = {M; L}, in which M and L stand for More and Less respectively, and ≤ is a
semantics-based order. To quantify this hedge algebra, we assume that
fm(true) = fm(c+) = 0.6, fm(false) = fm(c−) = 0.4, µ(M) = 0.25 and µ(L) = 0.25.
Hence, we have α = β = 0.5.
Table 1. A relation scheme of pupils
#CHILD | NAME     | ANSWER 1  | ANSWER 2
1      | John     | More True | 80
2      | Peter    | 65        | True
3      | William  | 90        | 95
4      | Johnson  | Less True | 80
5      | Mary     | 60        | 70
6      | Claude   | 55        | More True
7      | Martin   | 90        | More True
8      | Mitchell | 65        | True
i) Find the pupils whose ANSWER 1 is More True and whose ANSWER 2 is More True.
ii) Find the pupils whose ANSWER 1 is More True and whose ANSWER 2 is True.
First, we calculate the semantic values of the following terms. Since α = β = 0.5,
υA,r(x) is the center of the fuzziness interval ℑr(x), where the subscript r indicates that
the respective notations take values in the real domain [a, b] of the attribute A in
question instead of in [0, 1]:
$$\upsilon_{A,r}(\text{true}) = [fm(\text{false}) + 0.5 \times fm(\text{true})] \times 100 = [0.4 + 0.5 \times 0.6] \times 100 = 70$$
$$\upsilon_{A,r}(L\text{true}) = [fm(\text{false}) + 0.5 \times fm(L\text{true})] \times 100 = [0.4 + 0.5 \times 0.25 \times 0.6] \times 100 = 47.5$$
$$\upsilon_{A,r}(M\text{true}) = [fm(\text{false}) + fm(L\text{true}) + 0.5 \times fm(M\text{true})] \times 100 = [0.4 + 0.25 \times 0.6 + 0.5 \times 0.25 \times 0.6] \times 100 = 62.5$$
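Since α = β = 0.5 here, each semantic value above is the midpoint of a fuzziness interval rescaled from [0, 1] to the attribute domain [0, 100]. A minimal sketch of that rescaling, taking ℑ(true) = [fm(false), fm(false) + fm(true)] and ℑ(Ltrue) = [fm(false), fm(false) + 0.25 × fm(true)] as read off the first two calculations; the helper names to_domain and midpoint are hypothetical.

```python
def to_domain(interval, a, b):
    """Map an interval of [0, 1] affinely onto the attribute domain [a, b]."""
    lo, hi = interval
    return a + (b - a) * lo, a + (b - a) * hi


def midpoint(interval):
    """Center of an interval (lo, hi)."""
    lo, hi = interval
    return (lo + hi) / 2


fm_false, fm_true, mu_l = 0.4, 0.6, 0.25            # values stated in the example
i_true = (fm_false, fm_false + fm_true)              # fuzziness interval of true
i_ltrue = (fm_false, fm_false + mu_l * fm_true)      # interval of Ltrue, length fm(Ltrue)

print(midpoint(to_domain(i_true, 0, 100)))           # approximately 70
print(midpoint(to_domain(i_ltrue, 0, 100)))          # approximately 47.5
```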
Now, we calculate certain 2-intervals of the domain [0, 100] as follows:
δ2,r(true) = θ2,r(true) = ℑr(Mtrue) ∪ ℑr(Ltrue)
= (70 − 0.25 × 0.60 × 100, 70 + 0.25 × 0.60 × 100] = (55, 85].
δ2,r(Mtrue) = ℑr(Mtrue) = (100 − 0.25 × 0.60 × 100, 100] = (85, 100].
δ2,r(Ltrue) = ℑr(Ltrue) = (0.40 × 100, 0.40 × 100 + 0.25 × 0.60 × 100] = (40, 55].
Result of query i): William. Result of query ii): John.
3. Conclusion
By using the hedge algebra approach, we can represent the complete semantic
structure of linguistic terms in the term domains of their respective linguistic variables
or linguistic attributes. The comparison operations are defined based on k-equality and
k-similarity, where k, the length of terms, indicates the degree of these equalities.
The uncertainty or fuzziness of these equalities lies in the fuzziness intervals of
linguistic terms of length k used to define these equalities.
Under such semantics of k-equality and k-similarity of linguistic data, we can
make queries in a fuzzy database represented by a hedge algebra. When the values are
crisp, they can be viewed as special cases of fuzzy data.
REFERENCES
[1] N.C. Ho and W. Wechler, 1990. Hedge algebras: An algebraic approach to
structure of sets of linguistic truth values. Fuzzy Sets and Systems 35, pp. 281-293.
[2] N.C. Ho and W. Wechler, 1992. Extended hedge algebras and their application to
fuzzy logic. Fuzzy Sets and Systems 52, pp. 259-281.
[3] N.C. Ho, 2003. Quantifying Hedge Algebras and Interpolation Methods in
Approximate Reasoning. Proc. of the 5th Inter. Conf. on Fuzzy Information
Processing, Beijing, March 1-4, pp. 105-112.
[4] N.C. Ho, N.V. Long, 2007. Fuzziness Measure on Complete Hedge Algebras and
Quantitative Semantics of Terms in Linear Hedge Algebras. Fuzzy Sets and Systems
158, pp. 452-471.
[5] N.C. Ho, H.V. Nam, T.D. Khang and L.H. Chau, 1999. Hedge Algebras, Lingui