Semantic quantitation of linguistic values embedded in complete Linear Hedge algebras and querying in the linguistic database

Abstract. In fuzzy databases represented by linguistic values, linguistic terms carry qualitative semantics. However, when linguistic terms have to be computed with or compared with one another, their semantics must be quantified. A suitable quantitation method makes the subsequent computation more efficient. When a database is embedded in a Complete Linear Hedge Algebra, each base linguistic value may be accompanied by emphasis hedges. In such cases, the semantics of linguistic terms are quantified as subintervals of the interval [0, 1]. This article concerns the semantic quantitation of linguistic terms embedded in Complete Linear Hedge Algebras and an application to querying a database that contains linguistic terms. The calculation handles both linguistic values and real numbers. The query results obtained in the example are reasonable.

JOURNAL OF SCIENCE OF HNUE, Interdisciplinary Science, 2013, Vol. 58, No. 5, pp. 30-38

Nguyen Tan An (1) and Nguyen Van Quyen (2)
(1) Faculty of Information Technology, Hanoi National University of Education
(2) Department of Post Graduate, Hai Phong University

Received March 2, 2013. Accepted June 5, 2013.
Contact: Nguyen Tan An, e-mail address: [email protected]

Keywords: Computation with words, semantics of terms, hedge algebras, linguistic representation model.

1. Introduction

Decision support systems increasingly tend to approximate the way humans process information. They must therefore handle collected data or information that is imprecise, incomplete, uncertain and/or vague, and that is described by words in natural language. Vague data represented by linguistic values are usually processed in one of the following ways:

i) Using fuzzy set theory: each linguistic value is represented by a fuzzy set characterized by a membership function, or by a membership function together with a non-membership function.
ii) Using possibility theory: each linguistic value is given a possibility distribution over the values of the reference domain.
iii) Using hedge algebras: the linguistic database is embedded in a hedge algebra.

Each method brings a series of subsequent problems, such as the aggregation of fuzzy values, the comparison of fuzzy values, computing the distance between fuzzy values and measuring the similarity between fuzzy values. Consequently, the quality of a fuzzy system depends on the method applied.

When the linguistic database is embedded in a hedge algebra, each base linguistic value may be accompanied by emphasis hedges. This offers a convenient way of computing: the semantics of linguistic terms are quantified as subintervals of the interval [0, 1].

This paper presents the embedding of linguistic databases in a Complete Linear Hedge Algebra (ComLin-HA) and the application of semantic quantitation to querying such data. Its main contents are: ComLin-HA, semantic quantitation of linguistic values, an example of queries in fuzzy databases and, finally, some conclusions.

2. Content

2.1. An overview of ComLin-HA

2.1.1. Hedge algebra (HA)

Definition 2.1. An algebra $X = (X, G, H, \le)$, where $X$ is a term domain, $G$ is a set of primary terms, $H$ is a set of linguistic hedges, and $\le$ is a semantic ordering relation (SOR) on $X$, is called a hedge algebra (HA) if it satisfies the following axioms [1]:
(A1) Each hedge is either positive or negative with respect to the others, including itself.
(A2) If terms $u$ and $v$ are independent, then, for all $x \in H(u)$, we have $x \notin H(v)$. In addition, if $u$ and $v$ are incomparable, i.e., $u \not< v$ and $v \not< u$, then so are $x$ and $y$, for every $x \in H(u)$ and $y \in H(v)$.
(A3)
If $x \neq hx$, then $x \notin H(hx)$, and if $h \neq k$ and $hx \le kx$, then $h'hx \le k'kx$, for all $h, k, h', k' \in H$ and $x \in X$. Moreover, if $hx \neq kx$, then $hx$ and $kx$ are independent.
(A4) If $u \notin H(v)$ and $u \le v$ ($u \ge v$), then $u \le hv$ ($u \ge hv$) for any $h \in H$.

2.1.2. ComLin-HA

A ComLin-HA is represented by a 7-tuple $AX = (X, G, C, H, \Sigma, \phi, \le)$, where:
- $X \in Dom(X)$, with $Dom(X)$ being the set of all terms of $X$;
- $G = \{0; c^-; W; c^+; 1\}$ is the set of its primary terms, in which $W$ is the neutral concept positioned between the two primary terms, i.e. $0 \le c^- \le W \le c^+ \le 1$;
- $C = \{c^-; c^+\}$ is called the set of basic words;
- $H = H^- \cup H^+$ is the set of hedges, regarded as unary operations on $X$;
- $\Sigma$ is the operator such that $\Sigma x = \sup H(x)$;
- $\phi$ is the operator such that $\phi x = \inf H(x)$;
- $\le$ is an order relation on $X$.

The effect of a hedge $h$ acting on $x$ is expressed by the fact that either $hx \ge x$ or $hx \le x$. If $kx \ge hx \ge x$ or $y \ge hy \ge ky$ for a certain $x$ or $y$, we say that the effect of $k$ is greater than the effect of $h$, and write $k \ge h$. $H$ is divided into $H^+$ and $H^-$:

$$H^+ = \{h \in H : hc^- \le c^- \text{ or } hc^+ \ge c^+\}, \qquad H^- = \{h \in H : hc^- \ge c^- \text{ or } hc^+ \le c^+\}.$$

In this paper, we write $H^- = \{h_0; h_{-1}; \ldots; h_{-q}\}$ with $h_0 < h_{-1} < \ldots < h_{-q}$, and $H^+ = \{h_0; h_1; \ldots; h_p\}$ with $h_0 < h_1 < \ldots < h_p$, where the $h_i$ are the hedges. Sometimes $h_0$ is replaced by $I$ (identity): $\forall x$, $Ix = x$.

For example, let the set of basic words be $C = \{\mathit{true}; \mathit{false}\}$, that is, $c^+ = \mathit{true}$ and $c^- = \mathit{false}$, and let the set of hedges be $H = \{\mathit{Very}; \mathit{More}; \mathit{Identity}; \mathit{Less}; \mathit{Possible}\}$, where Identity is the identity operator, which is regarded as both a positive and a negative hedge. For short, we write $H = \{V; M; I; L; P\}$. It is clear that:
$P\,\mathit{false} \ge \mathit{false}$ and $P\,\mathit{true} \le \mathit{true}$;
$L\,\mathit{false} \ge \mathit{false}$ and $L\,\mathit{true} \le \mathit{true}$;
$M\,\mathit{false} \le \mathit{false}$ and $M\,\mathit{true} \ge \mathit{true}$;
$V\,\mathit{false} \le \mathit{false}$ and $V\,\mathit{true} \ge \mathit{true}$.
Therefore $H^+ = \{V; M; I\}$ and $H^- = \{I; L; P\}$. Moreover,
$H(\mathit{true}) = \{\mathit{true}; L\,\mathit{true}; M\,\mathit{true}; LL\,\mathit{true}; LM\,\mathit{true}; MM\,\mathit{true}; \ldots\}$ and
$H(\mathit{false}) = \{\mathit{false}; L\,\mathit{false}; M\,\mathit{false}; LL\,\mathit{false}; LM\,\mathit{false}; MM\,\mathit{false}; \ldots\}$.

2.2. Semantic quantitation of linguistic terms

2.2.1. Semantic quantitation of ComLin-HA

Definition 2.2. Let $X$ be an HA. A function $f : X \to [0, 1]$ is called a semantic quantitation function of $AX$ if, for all $h, k \in H^+$ or for all $h, k \in H^-$, and for all $x, y \in X$, we have

$$\frac{|f(hx) - f(x)|}{|f(kx) - f(x)|} = \frac{|f(hy) - f(y)|}{|f(ky) - f(y)|}.$$

2.2.2. Fuzziness measures of term x

Definition 2.3. Let $X$ be an HA. A function $fm : X \to [0, 1]$ is called a fuzziness measure of the terms $x$ if:
(1) $fm(c^-) = \theta$ and $fm(c^+) = 1 - \theta$, where $\theta \in [0, 1]$, and $\sum_{-q \le i \le p,\, i \neq 0} fm(h_i u) = fm(u)$ for $u \in H(x)$.
(2) If $x$ is a crisp concept, i.e. $H(x) = \{x\}$, then $fm(x) = 0$. Hence, $fm(0) = fm(W) = fm(1) = 0$.
(3) For all $x, y \in X$ and all $h \in H$, we have

$$\frac{fm(hx)}{fm(x)} = \frac{fm(hy)}{fm(y)}.$$

That is, this ratio does not depend on $x$ and $y$. It is denoted by $\mu(h)$ and is called the fuzziness measure of the hedge $h$.

Based on the structure of the algebra, the fuzziness measure $fm$ of a term $x \in X$, defined by this axiomatization, has the following properties [2-5]:
i) $fm(hx) = \mu(h) fm(x)$, $\forall x \in X$, and $fm(x) = 0$, $\forall x \in LDom(X)$;
ii) $fm(c^-) + fm(c^+) = 1$;
iii) $\sum_{-q \le i \le p,\, i \neq 0} fm(h_i c) = fm(c)$, where $c \in \{c^-, c^+\}$;
iv) $\sum_{-q \le i \le p,\, i \neq 0} fm(h_i x) = fm(x)$, $x \in X$;
v) $\sum_{-q \le i \le -1} \mu(h_i) = \alpha$ and $\sum_{1 \le i \le p} \mu(h_i) = \beta$, where $\alpha, \beta > 0$ and $\alpha + \beta = 1$.

2.2.3. Algebraic signs of linguistic terms

The order-based semantics of terms implies that terms and hedges have so-called semantic tendencies, which can be recognized as follows:
- As presented above, for $x = c \in G$, the comparability of $hc$ and $c$ implies that $H$ is partitioned into two sets: $H^+ = \{h \in H : hc^- \le c^- \text{ or } hc^+ \ge c^+\}$, which consists of the hedges that increase both semantic tendencies of the primary terms, and $H^- = \{h \in H : hc^- \ge c^- \text{ or } hc^+ \le c^+\}$, which consists of the hedges that decrease these semantic tendencies. This leads to the interesting concept that the primary terms and hedges have their own "algebraic" sign: $\mathrm{sign}(c^+) = +1$, $\mathrm{sign}(c^-) = -1$, $\mathrm{sign}(h) = +1$ for $h \in H^+$, and $\mathrm{sign}(h) = -1$ for $h \in H^-$.
- The comparability of $khx$ and $hx$ also implies that $k$ either increases or decreases the effect of $h$. For instance, given $x \le hx$, the inequalities $x \le hx \le khx$ state that $k$ increases the effect of $h$, and $x \le khx \le hx$ state that $k$ decreases the effect of $h$. We write $\mathrm{sign}(k, h) = +1$ in the former case and $\mathrm{sign}(k, h) = -1$ in the latter. This is called the relative sign of $k$ with respect to $h$. For example, it can be verified that $\mathrm{sign}(V, L) = +1$ while $\mathrm{sign}(L, V) = -1$.
- If $hh'x \neq h'x$, we say that the effect of $h$ in the expression $hh'x$ is proper. A string $h_n \ldots h_1 c$, where $h_i \in H$ and $c \in G$, is said to be a canonical string representation of $x$ if $x = h_n \ldots h_1 c$ and the effect of every $h_i$ is proper in this expression. It was proved that the canonical representation of $x$ is unique for any term $x$; hence, we may define the length of $x$, denoted by $|x|$, as the length of the string $h_n \ldots h_1 c$.

Now, the sign of a linguistic term $x = h_n \ldots h_1 c$ can be defined by

$$\mathrm{Sgn}(x) = \mathrm{sign}(h_n, h_{n-1}) \times \ldots \times \mathrm{sign}(h_2, h_1) \times \mathrm{sign}(h_1) \times \mathrm{sign}(c).$$

The meaning of $\mathrm{Sgn}$ is expressed as follows:

$$(\mathrm{Sgn}(hx) = +1) \Rightarrow (hx \ge x) \quad \text{and} \quad (\mathrm{Sgn}(hx) = -1) \Rightarrow (hx \le x).$$

For example, since $\mathrm{Sgn}(VL\,\mathit{true}) = \mathrm{sign}(V, L)\,\mathrm{sign}(L)\,\mathrm{sign}(\mathit{true}) = -1$, we have $VL\,\mathit{true} \le L\,\mathit{true}$. On this basis, we have the following definition.

Definition 2.4. $\mathrm{Sgn} : X \to \{-1, 0, 1\}$ is defined as follows:
(1) $\mathrm{Sgn}(c^-) = -1$, and $\mathrm{Sgn}(hc^-) = +\mathrm{Sgn}(c^-)$ if $hc^- < c^-$, $\mathrm{Sgn}(hc^-) = -\mathrm{Sgn}(c^-)$ if $hc^- \ge c^-$;
(2) $\mathrm{Sgn}(c^+) = +1$, and $\mathrm{Sgn}(hc^+) = +\mathrm{Sgn}(c^+)$ if $hc^+ > c^+$, $\mathrm{Sgn}(hc^+) = -\mathrm{Sgn}(c^+)$ if $hc^+ \le c^+$;
(3) $\mathrm{Sgn}(h'hx) = -\mathrm{Sgn}(hx)$ if $h'$ is negative with respect to $h$ and $h'hx \neq hx$;
(4) $\mathrm{Sgn}(h'hx) = +\mathrm{Sgn}(hx)$ if $h'$ is positive with respect to $h$ and $h'hx \neq hx$;
(5) $\mathrm{Sgn}(h'hx) = 0$ if $h'hx = hx$.

2.2.4. Semantic heredity: an essential meaning of linguistic hedges

An essential property of hedges is the so-called semantic heredity, which states that the terms generated from a given term $x$ by using hedges must inherit or contain the (genetic) core meaning of $x$. This implies that hedges cannot change the essential meaning of terms expressed in terms of the semantic order relation $\le$, which results in the following:
+ If the meaning of $hx$ and $kx$ is expressed by the order relationship $hx \le kx$, $h \neq k$, then no hedges $h'$ and $k'$ can change this semantic relationship, that is, $hx \le kx \Rightarrow h'hx \le k'kx$.
+ Similarly, if the meaning of $x$ and $hx$ is expressed by either $x \le hx$ or $hx \le x$, then $x \le hx \Rightarrow x \le h'hx$ or $hx \le x \Rightarrow h'hx \le x$.

Assuming that $H^- = \{h_0; h_{-1}; \ldots; h_{-q}\}$ and $H^+ = \{h_0; h_1; \ldots; h_p\}$, where $h_0 = I$ is the artificial identity hedge, $h_0 < h_{-1} < \ldots < h_{-q}$ and $h_0 < h_1 < \ldots < h_p$, the hedge heredity leads to the following results:
+ For $\mathrm{Sgn}(h_p x) = -1$: $H(h_p x) \le \ldots \le H(h_1 x) \le x \le H(h_{-1} x) \le \ldots \le H(h_{-q} x)$.
+ For $\mathrm{Sgn}(h_p x) = +1$: $H(h_{-q} x) \le \ldots \le H(h_{-1} x) \le x \le H(h_1 x) \le \ldots \le H(h_p x)$,
with the note that $H(h_0 x) = H(Ix) = x$, by convention. In particular, we have $0 \le H(c^-) \le W \le H(c^+) \le 1$.
+ The sets $H(h_j x)$, $j \in [-q{\wedge}p]$, where $[-q{\wedge}p] = \{j \mid -q \le j \le p\}$, constitute a partition of $H(x)$, i.e.
they are disjoint and $H(x) = \bigcup_{j \in [-q{\wedge}p]} H(h_j x)$.

The properties listed above already show that term-domains of linguistic variables with such qualitative semantics possess a rich order-based structure. Hedge algebras can therefore be regarded as formalized structures of the qualitative semantics of term-domains. Note that, in general, the meaning of a term represented in such a formalized structure carries more information than a fuzzy set alone.

2.2.5. Fuzziness measure of terms using fuzziness intervals

* Fuzziness interval of $x \in X$

For every term $x \in X$, the fuzziness interval of $x$ is a subinterval of $[0, 1]$ of length $fm(x)$, denoted by $\Im_{fm}(x)$, which is constructed by induction on the length of $x$ as follows:
(i) For $x$ of length 1, i.e. $x \in \{c^+; c^-\}$, $\Im_{fm}(c^-)$ and $\Im_{fm}(c^+)$ are intervals which constitute a partition of $[0, 1]$ and satisfy the conditions that $c^- \le c^+$ implies $\Im_{fm}(c^-) \le \Im_{fm}(c^+)$, $|\Im_{fm}(c^-)| = fm(c^-)$ and $|\Im_{fm}(c^+)| = fm(c^+)$, where $|\Im_{fm}(x)|$ denotes the length of $\Im_{fm}(x)$ and the notation $U \le V$ means that $x \le y$ for all $x \in U$ and $y \in V$.
(ii) Suppose that $\Im_{fm}(x)$ has been defined with $|\Im_{fm}(x)| = fm(x)$ for all $x$ of length $k$. Then the fuzziness intervals $\{\Im_{fm}(h_i x) : i \in [-q{\wedge}p]\}$ are constructed so that they constitute a partition of $\Im_{fm}(x)$, satisfy $|\Im_{fm}(h_i x)| = fm(h_i x)$, and form a linearly ordered set whose order is induced by the order of the set $\{h_{-q} x; h_{-q+1} x; \ldots; h_p x\}$ (Figure 1).

The fuzziness intervals of terms of length $k$ are called intervals of depth $k$, or $k$-intervals for short. The set of all $k$-intervals of the terms of length $k$ is denoted by $J_k$, and we put $J = \bigcup_{1 \le k \le \infty} J_k$, which is the set of all fuzziness intervals.
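
The inductive construction in (i) and (ii) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: it assumes a two-hedge algebra with $H^- = \{L\}$ and $H^+ = \{M\}$, the values $fm(\mathit{false}) = 0.4$, $fm(\mathit{true}) = 0.6$ and $\mu(L) = \mu(M) = 0.5$ (chosen here so that the hedge measures sum to 1), and a relative-sign table `REL_SIGN` that is assumed for the purpose of the example. All function and variable names are hypothetical.

```python
# Sketch: fuzziness intervals of depth 1 and 2 for a two-hedge algebra.
# All parameter values and the relative-sign table are illustrative assumptions.

FM_BASE = {"false": 0.4, "true": 0.6}      # fm(c-), fm(c+); they sum to 1
MU = {"L": 0.5, "M": 0.5}                  # mu(h); they sum to 1
NEG = ["L"]                                # H^- listed as h_-1, ..., h_-q
POS = ["M"]                                # H^+ listed as h_1, ..., h_p
SIGN_BASE = {"false": -1, "true": +1}      # sign(c-), sign(c+)
SIGN_HEDGE = {"L": -1, "M": +1}            # sign(h)
# REL_SIGN[(k, h)] = sign(k, h): +1 if k increases the effect of h (assumed)
REL_SIGN = {("M", "M"): +1, ("M", "L"): +1, ("L", "M"): -1, ("L", "L"): -1}


def sgn(term):
    """Sgn of a term given as a tuple (h_n, ..., h_1, c)."""
    *hedges, base = term
    sign = SIGN_BASE[base]
    if hedges:
        sign *= SIGN_HEDGE[hedges[-1]]                 # sign(h_1)
        for outer, inner in zip(hedges, hedges[1:]):   # sign(h_i, h_{i-1})
            sign *= REL_SIGN[(outer, inner)]
    return sign


def fm(term):
    """fm(h_n ... h_1 c) = mu(h_n) * ... * mu(h_1) * fm(c)."""
    *hedges, base = term
    value = FM_BASE[base]
    for hedge in hedges:
        value *= MU[hedge]
    return value


def base_intervals():
    """Step (i): the intervals of c- and c+ partition [0, 1]."""
    theta = FM_BASE["false"]
    return {("false",): (0.0, theta), ("true",): (theta, 1.0)}


def child_intervals(term, interval):
    """Step (ii): split the interval of x among the terms h_i x."""
    order = NEG[::-1] + POS                  # h_-q, ..., h_-1, h_1, ..., h_p
    if sgn((POS[-1],) + tuple(term)) == -1:  # Sgn(h_p x) = -1 reverses the order
        order = order[::-1]
    result, left = {}, interval[0]
    for hedge in order:
        child = (hedge,) + tuple(term)
        right = left + fm(child)             # |interval of h_i x| = fm(h_i x)
        result[child] = (left, right)
        left = right
    return result


depth1 = base_intervals()
depth2 = {}
for term, interval in depth1.items():
    depth2.update(child_intervals(term, interval))

for term, (lo, hi) in sorted(depth2.items(), key=lambda item: item[1]):
    print(" ".join(term), round(lo, 3), round(hi, 3))
```

Under these assumptions the depth-2 intervals appear from left to right in the order M false, L false, L true, M true, which agrees with the ordering of $H(h_j x)$ given for $\mathrm{Sgn}(h_p x) = -1$ and $\mathrm{Sgn}(h_p x) = +1$ respectively.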
By definition, each fuzziness interval in $J$ is associated with exactly one term $x$ and, conversely, each term $x$ is associated with exactly one fuzziness interval $\Im_{fm}(x)$ of depth $|x|$, where $|x|$ denotes the length of $x$. The set $J$ has its own structure, defined by the inclusion and ordering relations between intervals of the same depth. That is, each fuzziness interval has its own position in this structure and represents a certain meaning of the term associated with it.

Figure 1.

In [2, 4-6], the semantically quantifying mapping (SQM) $\upsilon : X \to [0, 1]$, which is induced by the given fuzziness measure or defined by the structure of $J$, has been examined. For each term $x \in X$ of length $k$, $\upsilon(x)$ is defined to be the value in $\Im_{fm}(x)$ which is the common end-point of the $(k+1)$-intervals $\Im_{fm}(h_1 x)$ and $\Im_{fm}(h_{-1} x)$ of $J_{k+1}$; it is called the semantic value of the term $x$. This value divides the interval $\Im_{fm}(x)$ internally in the proportion $\alpha : \beta$ if $\mathrm{Sgn}(h_p x) = +1$, or in the proportion $\beta : \alpha$ if $\mathrm{Sgn}(h_p x) = -1$.

So, based on the structure of $J$, the SQM $\upsilon$ induced by $fm$ can be computed as follows. Let $AX = (X, G, C, H, \Sigma, \phi, \le)$ be a free ComLin-HA and let $fm(c^-)$, $fm(c^+)$ and $\mu(h)$ be the fuzziness measures of the primary terms $c^-$, $c^+$ and of the hedges $h \in H$, respectively, which satisfy properties ii) and v). Then the mapping $\upsilon$ can be computed recursively as follows, where $\mathrm{Sgn}(j)$ denotes the sign of the index $j$:

i) $\upsilon(W) = \kappa = fm(c^-)$, $\upsilon(c^-) = \kappa - \alpha fm(c^-) = \beta fm(c^-)$, $\upsilon(c^+) = \kappa + \alpha fm(c^+)$;

ii) for all $j \in [-q{\wedge}p]$,
$$\upsilon(h_j x) = \upsilon(x) + \mathrm{Sgn}(h_j x)\Big\{\sum_{i=\mathrm{Sgn}(j)}^{j} \mu(h_i) fm(x) - \omega(h_j x)\,\mu(h_j) fm(x)\Big\},$$
where $\omega(h_j x) = \frac{1}{2}\big[1 + \mathrm{Sgn}(h_j x)\,\mathrm{Sgn}(h_p h_j x)(\beta - \alpha)\big] \in \{\alpha; \beta\}$;

iii) $\upsilon(\phi c^-) = 0$, $\upsilon(\Sigma c^-) = \kappa = \upsilon(\phi c^+)$, $\upsilon(\Sigma c^+) = 1$, and, for all $j \in [-q{\wedge}p]$,
$$\upsilon(\phi h_j x) = \upsilon(x) + \mathrm{Sgn}(h_j x)\Big\{\sum_{i=\mathrm{Sgn}(j)}^{j-\mathrm{Sgn}(j)} \mu(h_i) fm(x) + \frac{1 - \mathrm{Sgn}(h_j x)}{2}\,\mu(h_j) fm(x)\Big\}$$
and
$$\upsilon(\Sigma h_j x) = \upsilon(x) + \mathrm{Sgn}(h_j x)\Big\{\sum_{i=\mathrm{Sgn}(j)}^{j-\mathrm{Sgn}(j)} \mu(h_i) fm(x) + \frac{1 + \mathrm{Sgn}(h_j x)}{2}\,\mu(h_j) fm(x)\Big\}.$$
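
The recursion in i) and ii) can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: it assumes a free algebra with only two hedges, $h_{-1} = L$ and $h_1 = M$, the illustrative values $fm(\mathit{false}) = 0.4$, $fm(\mathit{true}) = 0.6$, $\mu(L) = 0.4$, $\mu(M) = 0.6$ (so that $\alpha = 0.4 \neq \beta = 0.6$), and an assumed relative-sign table `REL_SIGN`. The names `upsilon`, `sgn`, `fm` and the constants are hypothetical.

```python
# Sketch: semantically quantifying mapping (SQM) for a two-hedge algebra.
# Parameter values and the relative-sign table are illustrative assumptions.

HEDGE = {-1: "L", 1: "M"}                  # index j -> hedge h_j
P = max(HEDGE)                             # index p of the strongest positive hedge
MU = {"L": 0.4, "M": 0.6}                  # mu(h)
FM_BASE = {"false": 0.4, "true": 0.6}      # fm(c-), fm(c+)
SIGN_BASE = {"false": -1, "true": +1}      # sign(c-), sign(c+)
SIGN_HEDGE = {"L": -1, "M": +1}            # sign(h)
REL_SIGN = {("M", "M"): +1, ("M", "L"): +1, ("L", "M"): -1, ("L", "L"): -1}

ALPHA = sum(MU[h] for j, h in HEDGE.items() if j < 0)   # sum over H^-
BETA = sum(MU[h] for j, h in HEDGE.items() if j > 0)    # sum over H^+


def sgn(term):
    """Sgn of a term given as a tuple (h_n, ..., h_1, c)."""
    *hedges, base = term
    sign = SIGN_BASE[base]
    if hedges:
        sign *= SIGN_HEDGE[hedges[-1]]
        for outer, inner in zip(hedges, hedges[1:]):
            sign *= REL_SIGN[(outer, inner)]
    return sign


def fm(term):
    """fm(h_n ... h_1 c) = mu(h_n) * ... * mu(h_1) * fm(c)."""
    *hedges, base = term
    value = FM_BASE[base]
    for hedge in hedges:
        value *= MU[hedge]
    return value


def upsilon(term):
    """Semantic value of a term, following formulas i) and ii)."""
    term = tuple(term)
    if len(term) == 1:                                   # formula i)
        kappa = FM_BASE["false"]
        if term[0] == "false":
            return kappa - ALPHA * FM_BASE["false"]
        return kappa + ALPHA * FM_BASE["true"]
    hedge, x = term[0], term[1:]                         # term = h_j x
    j = next(i for i, name in HEDGE.items() if name == hedge)
    step = 1 if j > 0 else -1                            # Sgn(j)
    s = sgn(term)                                        # Sgn(h_j x)
    total = sum(MU[HEDGE[i]] for i in range(step, j + step, step)) * fm(x)
    omega = 0.5 * (1 + s * sgn((HEDGE[P],) + term) * (BETA - ALPHA))
    return upsilon(x) + s * (total - omega * MU[hedge] * fm(x))   # formula ii)


for t in [("false",), ("true",), ("L", "true"), ("M", "true")]:
    print(" ".join(t), round(upsilon(t), 4))
```

With these assumed parameters, the value computed for `("true",)` is the common end-point of the intervals of L true and M true, and the value for `("M", "true")` divides the interval of M true in the proportion $\alpha : \beta$, as described above.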
* The uncertainty k-similarity relation

Two values $t[x]$, $s[x]$ in $Dom(X)$ are called $k$-equal, denoted $t[x] =_k s[x]$, if one of the following conditions is satisfied [6]:
(i) $t[x]$ and $s[x]$ denote the same symbol;
(ii) There exists a similarity class $\delta_k(u)$ of the similarity relation $\delta_k$ of level $k$ such that $O_{min,k}(t[x]) \subseteq \delta_k(u)$ and $O_{min,k}(s[x]) \subseteq \delta_k(u)$.

For any two values $t[x]$ and $s[x]$ of $Dom(X)$, the criterion for testing the equality $t[x] =_k s[x]$ is the following: there exists a similarity class of level $k$, $\delta_k(u)$, of $X$ such that
- if $t[x]$ and $s[x]$ are both crisp values, then $t[x], s[x] \in \delta_k(u)$;
- if only one of the two values, say $t[x]$, is a linguistic value, then $\upsilon(t[x]), s[x] \in \delta_k(u)$;
- if both $t[x]$ and $s[x]$ are linguistic values, then $\upsilon(t[x]), \upsilon(s[x]) \in \delta_k(u)$.

To compute the similarity classes of level $k$, for an arbitrary $k > 0$, we use fuzziness intervals of depth $k' > k$, chosen so that each of the sets $X_{j,(k)}$ contains at least two fuzziness intervals of depth $k'$ of its own. It has been proved that the condition $k' > k$ suffices when $p, q > 1$; otherwise, $k' > k + 1$ is required.

2.3. An example of queries in fuzzy databases

Consider a relation scheme of pupils $R = \{\#CHILD, NAME, ANSWER\ 1, ANSWER\ 2\}$ (Table 1). For simplicity, we assume that the linguistic attributes have the same real domain, which is the interval $[0, 100]$, and the same term-domain, which is modeled by the same linear hedge algebra $AX = (X, G, C, H, \Sigma, \phi, \le)$, where $G = \{\mathit{false}; \mathit{true}\}$, $C = \{0; W; 1\}$, $H = \{M; L\}$, in which $M$ and $L$ stand for More and Less respectively, and $\le$ is a semantics-based order. To quantify this hedge algebra, we assume that $fm(\mathit{true}) = fm(c^+) = 0.6$, $fm(\mathit{false}) = fm(c^-) = 0.4$, $\mu(M) = 0.25$, $\mu(L) = 0.25$. Hence, we have $\alpha = \beta = 0.5$.

Table 1. A relation scheme of pupils

#CHILD  NAME      ANSWER 1   ANSWER 2
1       John      More True  80
2       Peter     65         True
3       William   90         95
4       Johnson   Less True  80
5       Mary      60         70
6       Claude    55         More True
7       Martin    90         More True
8       Mitchell  65         True

i) Find the pupils whose answer 1 is More True and whose answer 2 is More True.
ii) Find the pupils whose answer 1 is More True and whose answer 2 is True.

First, we calculate the semantic values of the following terms. Note that, since $\alpha = \beta = 0.5$, $\upsilon_{A,r}(x)$ is the center of the fuzziness interval $\Im_r(x)$, where the subscript $r$ indicates that the respective notations take values in the real domain $[a, b]$ of the attribute $A$ in question, instead of in $[0, 1]$:

$\upsilon_{A,r}(\mathit{true}) = [fm(\mathit{false}) + 0.5 \times fm(\mathit{true})] \times 100 = [0.4 + 0.5 \times 0.6] \times 100 = 70$
$\upsilon_{A,r}(L\,\mathit{true}) = [fm(\mathit{false}) + 0.5 \times fm(L\,\mathit{true})] \times 100 = [0.4 + 0.5 \times 0.25 \times 0.6] \times 100 = 47.5$
$\upsilon_{A,r}(M\,\mathit{true}) = [fm(\mathit{false}) + fm(L\,\mathit{true}) + 0.5 \times fm(M\,\mathit{true})] \times 100 = [0.4 + 0.25 \times 0.6 + 0.5 \times 0.25 \times 0.6] \times 100 = 62.5$

Now, we calculate certain 2-intervals of the domain $[0, 100]$ as follows:

$\delta_{2,r}(\mathit{true}) = \theta_{2,r}(\mathit{true}) = \Im_r(M\,\mathit{true}) \cup \Im_r(L\,\mathit{true}) = (70 - 0.25 \times 0.6 \times 100,\ 70 + 0.25 \times 0.6 \times 100] = (55, 85]$.
$\delta_{2,r}(M\,\mathit{true}) = \Im_r(M\,\mathit{true}) = (100 - 0.25 \times 0.6 \times 100,\ 100] = (85, 100]$.
$\delta_{2,r}(L\,\mathit{true}) = \Im_r(L\,\mathit{true}) = (0.4 \times 100,\ 0.4 \times 100 + 0.25 \times 0.6 \times 100] = (40, 55]$.

Result for query i): William. Result for query ii): John.

3. Conclusion

By using the hedge algebra approach, we can represent the complete semantic structure of linguistic terms in the term-domains of their respective linguistic variables or linguistic attributes. The comparison operations are defined based on the $k$-equality and $k$-similarity relations, where $k$ is the length of terms, which indicates the degree of these equalities. The uncertainty or fuzziness of the uncertain equalities lies in the fuzziness intervals of the linguistic terms of length $k$ used to define these equalities. Under such semantics of $k$-equality and $k$-similarity of linguistic data, we can make queries in a fuzzy database represented by a hedge algebra.
When these values are crisp, they can be seen as particular cases of fuzzy data.

REFERENCES

[1] Nguyen, C. H. and Wechler, 1990. Hedge algebras: An algebraic approach to structure of sets of linguistic truth values. Fuzzy Sets and Systems 35, pp. 281-293.
[2] Nguyen, C. H. and Wechler, 1992. Extended hedge algebras and their application to fuzzy logic. Fuzzy Sets and Systems 52, pp. 259-281.
[3] N.C. Ho, 2003. Quantifying Hedge Algebras and Interpolation Methods in Approximate Reasoning. Proc. of the 5th Inter. Conf. on Fuzzy Information Processing, Beijing, March 1-4, pp. 105-112.
[4] N.C. Ho, N.V. Long, 2007. Fuzziness Measure on Complete Hedge Algebras and Quantitative Semantics of Terms in Linear Hedge Algebras. Fuzzy Sets and Systems 158, pp. 452-471.
[5] N.C. Ho, H.V. Nam, T.D Khang and L.H. Chau, 1999. Hedge Algebras, Lingui