
Advances in Information Mining, ISSN: 0975–3265, Volume 2, Issue 1, 2010, pp-08-12

A new fuzzy MADM approach used for finite selection problem
Muley A.A. and Bajaj V.H.*
*Department of Statistics, Dr. B. A. M. University, Aurangabad (M.S.)-431004, India
vhbajaj@gmail.com, aniket.muley@gmail.com

Abstract- This paper proposes a new approach to product configuration by applying the theory of Fuzzy Multiple Attribute Decision Making (FMADM), focusing on the uncertain and fuzzy requirements the customer submits to the product supplier. The proposed method can be used in e-commerce websites, where it makes it easy for a customer to find his preferred product according to the utility value with respect to all attributes. The main concern of this paper is the case in which the requirements the customer submits for the configuration of a television are vague. We further verify the validity and feasibility of the proposed method by comparing it with the Weighted Product Method (WPM). Finally, the television is taken as an example to demonstrate the proposed methods.
Keywords- MADM, Fuzzy, Triangular fuzzy number, T.V., Uncertainty

Introduction
Real-world problems often require a decision maker (DM) to rank discrete alternatives or, at least, to select the best one. The MADM theory was developed to help the DM solve such problems. MADM has been one of the fastest growing areas during the last decades, driven by changes in the business sector; Hwang & Yoon [1], Turban [4]. We focus on MADM as used in a finite 'selection' or choice problem, where it plays a most important role. Nowadays the television is common in every person's life, so we take the selection of a television configuration as an application. Since common people generally purchase the 21" size for home use, we choose this most common size. Mass customization, as a business strategy, aims at meeting diverse customer needs while maintaining near mass-production efficiency; it can realize both economies of scale and scope for an enterprise and has become a goal that companies pursue; Zhu & Jiang [7]. To reach this goal, companies are often forced to adopt a differentiation strategy, offering customers more product choices to meet the growing individualization of demand and taking a more customer-centric role. Configuration approaches based on rules usually depend on experts' experience to establish. Configuration is one of the most important ways to realize rapid product customization. But in business, particularly over the internet, a customer normally develops in his mind some sort of ambiguity when given a choice of similar products. The main concern here is that the customers' requirements with respect to the configuration of a television are vague. The television is taken as an example to further verify the validity and feasibility of the proposed method, compared with the WPM of Millar & Starr [2].

Framework of product configuration based on uncertain customer requirements
Each attribute has a finite set of possible values; a variant is defined by its attributes and attribute values. Together, all attributes and attribute values describe the

complete range of the product family. Products in the same product family vary according to different attributes and attribute values, so choosing a product can be considered as a process of choosing its attributes and attribute values. Generally, however, it is difficult for a customer to express his requirements in a clear and unambiguous way, often because he is not thoroughly familiar with the products the supplier offers. The requirements are therefore often vague and fuzzy, and the preference weight varies across the product attributes. We describe the customers' vague and uncertain requirements in the form of fuzzy numbers, using the representation methods of fuzzy set theory; this design also solves the configuration problem in an uncertain environment. As we know, different products have various attributes, but some attributes, such as color, shape, and so on, are not suitable to be represented as fuzzy numbers. These attributes are often clear in the customer's mind, and the customer can select the attribute value by viewing the virtual product model in a browser environment; in a realistic configuration system this is achieved by the customer directly selecting the corresponding attribute value he prefers. Using the theory of fuzzy MADM, the requirements the customer specifies for the corresponding product attributes can be regarded as an ideal product. Firstly, an uncertain attribute value specified by the customer is represented in the form of a triangular fuzzy number or an interval fuzzy number, which is the most common way to handle uncertain, imprecise problems. Moreover, since the attribute values of the alternate products offered for selection are determinate, usually definite and known, it is impossible to directly measure the distance or similarity degree between the ideal product the customer wants and the alternate products; the definite attribute values are therefore converted into fuzzy-number form so that the distance between two fuzzy numbers can be computed. When choosing a product from a number of similar alternatives, a customer


normally develops some sort of ambiguity, mainly for two reasons: firstly, how to make the final choice of product to purchase; and secondly, on what basis the other products will be rejected. To answer these questions, the customer may like to classify the products into different preference levels, preferably through some numerical strength of preference; Mohanty & Bhasker [3]. We adopt the triangular fuzzy number to represent the vague requirements provided by the customers, as shown in Fig. (1):

$$\mu_{\tilde{A}}(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ \dfrac{c-x}{c-b}, & b \le x \le c \\ 0, & x > c \end{cases} \quad (1)$$
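As a quick worked illustration of Eq. (1), the sketch below (in Python; the function and argument names are our own choices, not from the paper) evaluates the membership grade of a crisp value in a triangular fuzzy number such as the "around 1000 W" requirement used later in the case study.

```python
def triangular_membership(x, a, b, c):
    """Membership grade of x in the triangular fuzzy number (a, b, c), per Eq. (1)."""
    if x < a or x > c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge

# the customer's vague "around 1000 W" requirement from Table 2: (200, 1000, 2400)
print(triangular_membership(500, 200, 1000, 2400))  # 0.375
```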

Fuzzy MADM methodology
As we know, when a customer chooses his preferred product from many candidate products, he does so by comparing the different attributes that describe product performance in different aspects, and by ranking the products according to his subjective preference. The customer's requirements for products are usually uncertain and vague because he is unable to understand the product specifications comprehensively; on the other hand, the attribute values or specifications of the products offered by manufacturers are determinate and known. The model of fuzzy MADM was first introduced by Yang & Chou [6]. The general MADM model can be described as follows:
• Let $X = \{X_i \mid i = 1,2,\ldots,m\}$ denote a finite discrete set of $m\ (\ge 2)$ possible alternatives (courses of action, candidates);
• Let $A = \{A_j \mid j = 1,2,\ldots,n\}$ denote a finite set of $n\ (\ge 2)$ attributes according to which the desirability of an alternative is to be judged;
• Let $\omega = (\omega_1, \omega_2, \ldots, \omega_n)^T$ be the vector of weights, where $\sum_{j=1}^{n}\omega_j = 1$, $\omega_j \ge 0$, $j = 1,2,\ldots,n$, and $\omega_j$ denotes the weight of attribute $A_j$;
• Let $R = (r_{ij})_{m\times n}$ denote the $m\times n$ decision matrix, where $r_{ij}\ (\ge 0)$ is the performance rating of alternative $X_i$ with respect to attribute $A_j$.
Normally there are two types of attributes in a MADM problem: the first type is of 'cost' nature and the second type is of 'benefit' nature. Since the attributes are generally incommensurate, the decision matrix needs to be normalized so as to transform the various attribute values into comparable ones. A common normalization is given as

$$Z_{ij} = \frac{r_{ij} - r_j^{\min}}{r_j^{\max} - r_j^{\min}}, \quad i = 1,\ldots,m;\ j = 1,\ldots,n, \quad \text{for a benefit attribute} \quad (2)$$

$$Z_{ij} = \frac{r_j^{\max} - r_{ij}}{r_j^{\max} - r_j^{\min}}, \quad i = 1,\ldots,m;\ j = 1,\ldots,n, \quad \text{for a cost attribute} \quad (3)$$

where $Z_{ij}$ is the normalized attribute value and $r_j^{\max}$, $r_j^{\min}$ are given by

$$r_j^{\max} = \max(r_{1j}, r_{2j}, \ldots, r_{mj}), \quad j = 1,\ldots,n \quad (4)$$

$$r_j^{\min} = \min(r_{1j}, r_{2j}, \ldots, r_{mj}), \quad j = 1,\ldots,n \quad (5)$$

Let $Z = (Z_{ij})_{m\times n}$ be the normalized decision matrix. According to the SAW method, the overall weighted assessment value of alternative $X_i$ is

$$d_i = \sum_{j=1}^{n} Z_{ij}\,\omega_j, \quad i = 1,\ldots,m \quad (6)$$

where $d_i$ is a linear function of the weight variables, and the greater the value of $d_i$, the better the alternative $X_i$. The aim of MADM is to rank the alternatives or to determine the best alternative, i.e. the one with the highest degree of desirability with respect to all relevant attributes; so the best alternative is the one with the greatest overall weighted assessment value.
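A minimal sketch of the min-max normalization of Eqs. (2)-(5) and the SAW aggregation of Eq. (6) just described; the helper names are illustrative assumptions, not from the paper.

```python
def normalize(R, benefit):
    """Min-max normalize a decision matrix R per Eqs. (2)-(5).
    benefit[j] is True for a benefit attribute, False for a cost attribute."""
    m, n = len(R), len(R[0])
    Z = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [R[i][j] for i in range(m)]
        rmax, rmin = max(col), min(col)          # Eqs. (4) and (5)
        for i in range(m):
            if benefit[j]:
                Z[i][j] = (R[i][j] - rmin) / (rmax - rmin)   # Eq. (2)
            else:
                Z[i][j] = (rmax - R[i][j]) / (rmax - rmin)   # Eq. (3)
    return Z

def saw(Z, w):
    """Overall weighted assessment value d_i of each alternative, Eq. (6)."""
    return [sum(z * wj for z, wj in zip(row, w)) for row in Z]
```

Under this sketch, the index of the largest entry of `saw(normalize(R, benefit), w)` identifies the best alternative.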


The classic MADM techniques assume all $r_{ij}$ values are crisp numbers. In practical MADM problems, $r_{ij}$ values can be crisp and/or fuzzy data. Fuzzy MADM methods have been developed because of the lack of precision in assessing the performance ratings of alternatives with respect to an attribute; in them the $r_{ij}$ values are often represented by linguistic terms or fuzzy numbers. The configuration approach based on fuzzy MADM is now introduced in detail; the algorithm includes the following steps:
Step 1: Representation of fuzzy requirements. When choosing a product from a number of similar alternatives, a customer normally develops in his mind some sort of ambiguity; the vague requirements are represented as triangular fuzzy numbers as in Eq. (1).
Step 2: Similarity measure. In Step 1, the customer's requirements have been described as triangular fuzzy numbers with respect to the different product attributes. In this step we take the requirement vector as the ideal product the customer really wants, in order to measure its similarity degree to the existing product vectors, whose specification values are known and determinate. As we know, fuzzy numbers cannot be compared with crisp numbers directly; the crisp numbers must first be transformed into fuzzy-number form. For example, for a crisp number b, its triangular fuzzy form can be written as follows:

$$\tilde{b} = (b^L, b^M, b^U), \quad \text{where } b^L = b^M = b^U = b \quad (7)$$

The similarity measure between two triangular fuzzy numbers $\tilde{a} = (a^L, a^M, a^U)$ and $\tilde{b} = (b^L, b^M, b^U)$ can be calculated with Eq. (8); Xu [5]:

$$s(\tilde{a}, \tilde{b}) = \frac{a^L b^L + a^M b^M + a^U b^U}{\max\!\big((a^L)^2 + (a^M)^2 + (a^U)^2,\ (b^L)^2 + (b^M)^2 + (b^U)^2\big)} \quad (8)$$
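The sketch below implements the crisp-to-fuzzy embedding of Eq. (7) and the similarity measure of Eq. (8); running it on the ideal Watt requirement (200, 1000, 2400) from Table 2 against a crisp 500 W product reproduces the 0.2647 entry that appears later in Table 3. Function names are ours.

```python
def to_triangular(b):
    """Eq. (7): embed a crisp number b as the degenerate triangular number (b, b, b)."""
    return (b, b, b)

def similarity(a, b):
    """Eq. (8): similarity of triangular fuzzy numbers a = (aL, aM, aU), b = (bL, bM, bU)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = sum(ai * ai for ai in a)
    norm_b = sum(bi * bi for bi in b)
    return dot / max(norm_a, norm_b)

# ideal "Watt" requirement vs. a crisp 500 W product (cf. P3 in Table 1):
print(similarity((200, 1000, 2400), to_triangular(500)))  # 0.2647...
```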

In a realistic configuration system, discrete attributes are handled by the customer selecting the attribute value he prefers from the given alternate options. The similarity measure for this type of attribute is defined as follows:

$$s(a', b') = \begin{cases} 1, & a' = b' \\ 0, & a' \ne b' \end{cases} \quad (9)$$

Step 3: Construction of the Decision Matrix (DM). The calculated similarity measures between the alternate products and the ideal product can be concisely expressed in matrix format, called the decision matrix in MADM problems, in which columns indicate product attributes and rows indicate alternate products. Thus, the element $S_{ij}$ in Eq. (10) denotes the similarity degree of the i-th product to the ideal product with respect to the j-th attribute:

$$DM = \begin{pmatrix} S_{11} & S_{12} & \cdots & S_{1n} \\ S_{21} & S_{22} & \cdots & S_{2n} \\ \vdots & & S_{ij} & \vdots \\ S_{m1} & S_{m2} & \cdots & S_{mn} \end{pmatrix} \quad (10)$$

Step 4: Normalization. To eliminate the difference of dimension among the attributes, normalization is needed to transform the various attribute dimensions into non-dimensional ones. Here we adopt Eqs. (11) and (12) to normalize the fuzzy numbers:

$$\tilde{r}_i = \left( \frac{a_i}{c_i^{\max}},\ \frac{b_i}{b_i^{\max}},\ \frac{c_i}{a_i^{\max}} \wedge 1 \right) \quad \text{for a benefit attribute} \quad (11)$$

$$\tilde{r}_i = \left( \frac{c_i^{\min}}{c_i},\ \frac{c_i^{\min}}{b_i},\ \frac{c_i^{\min}}{a_i} \wedge 1 \right) \quad \text{for a cost attribute} \quad (12)$$

where $(\cdot)_i^{\max} = \max_i\{(\cdot)_i\}$ and $(\cdot)_i^{\min} = \min_i\{(\cdot)_i\}$.

Step 5: Ranking of the alternate products. The element $S_{ij}$ in the decision matrix reflects the closeness degree between the ideal product and the i-th alternate product with respect to the j-th attribute. In this step we use the SAW method, widely used in MADM, to calculate the utility value with respect to all attributes, from which the ranking order of the alternate products can be obtained; the product with the highest utility value can be considered the closest to what the customer requires. The utility value of the i-th alternate product can be calculated with Eq. (13).

$$U_i = \sum_{j=1}^{n} x_{ij}\,\omega_j, \quad i = 1,2,\ldots,m \quad (13)$$

The maximum utility value can be written as Eq. (14):

$$U_{\max} = \max_i \sum_{j=1}^{n} x_{ij}\,\omega_j, \quad i = 1,2,\ldots,m \quad (14)$$

Here we also compare with the WPM of Millar & Starr [2] to check the feasibility of the customer's requirement:

$$U_i = \prod_{j=1}^{n} x_{ij}^{\,\omega_j}, \quad i = 1,2,\ldots,m \quad (15)$$
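A small sketch contrasting the SAW utility of Eq. (13) with the WPM utility of Eq. (15); $U_{\max}$ of Eq. (14) is simply the largest value returned. Function names are our own.

```python
import math

def saw_utility(s, w):
    """Eq. (13): additive utility U_i = sum_j s_ij * w_j."""
    return sum(sij * wj for sij, wj in zip(s, w))

def wpm_utility(s, w):
    """Eq. (15): weighted-product utility U_i = prod_j s_ij ** w_j."""
    return math.prod(sij ** wj for sij, wj in zip(s, w))

def rank(products, utility, w):
    """Order product ids by utility, best first; the first entry attains Eq. (14)."""
    return sorted(products, key=lambda p: utility(products[p], w), reverse=True)
```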

Case study
In this section we take the television as an example to illustrate the method described above. Table 1 shows the televisions that could be configured for different customers with respect to the corresponding attributes.

Table 1 Configuration of Television
Sr. No. | Speakers | Watt | Channels | Price
P1      | 6        | 1800 | 200      | 10300
P2      | 2        | 110  | 100      | 9790
P3      | 5        | 500  | 200      | 11990
P4      | 4        | 1200 | 200      | 12400
P5      | 2        | 200  | 200      | 9400
P6      | 2        | 400  | 100      | 11490
P7      | 2        | 250  | 200      | 9300
P8      | 4        | 500  | 200      | 9900

Suppose that the ideal product the customer wants according to the above attributes, and the corresponding preference weights, are as shown in Table 2.

Table 2 The ideal product and attribute weights
Attributes | Ideal  | Lower | Upper  | Weight
Speakers   | 5      | 2     | 8      | 0.25
Watt       | 1000   | 200   | 2400   | 0.20
Channels   | 150    | 100   | 250    | 0.25
Price      | 10,000 | 9,000 | 12,000 | 0.30

The vector of the ideal product can be represented in the following triangular fuzzy number form:

$$\tilde{C} = [(2, 5, 8),\ (200, 1000, 2400),\ (100, 150, 250),\ (9000, 10000, 12000)]$$

The corresponding vector of attribute weights can be written as

$$\omega = (0.25, 0.20, 0.25, 0.30)$$

The decision matrix, which shows the similarity degree with respect to each attribute between the ideal television the customer desires and the candidate ones, is obtained by using Eqs. (8)-(12) and is shown in Table 3.

Table 3 Decision Matrix
Sr. No. | U1     | U2     | U3     | U4
P1      | 0.8333 | 0.6666 | 0.8333 | 0.9824
P2      | 0.3225 | 0.0582 | 0.5263 | 0.7380
P3      | 0.8064 | 0.2647 | 0.8333 | 0.8618
P4      | 0.6451 | 0.6329 | 0.8333 | 0.8333
P5      | 0.3225 | 0.1058 | 0.8333 | 0.8966
P6      | 0.3225 | 0.2117 | 0.5263 | 0.8970
P7      | 0.3225 | 0.1323 | 0.8333 | 0.8870
P8      | 0.6451 | 0.2647 | 0.8333 | 0.9443

The utility values of all candidate products with respect to all attributes are calculated by Eq. (13); the final results are given in Table 4.

Table 4 Utility value of each product configuration by SAW
P1     | P2     | P3     | P4     | P5     | P6     | P7     | P8
0.8447 | 0.5039 | 0.7214 | 0.7462 | 0.5791 | 0.5243 | 0.5815 | 0.7058

Table 4 presents the final utility values, with which the customer can rank the candidate products according to his preference over the attributes; the order showing the closeness degree to the customer's requirements is:

P1 > P4 > P3 > P8 > P7 > P5 > P6 > P2

Here we compare the above method with the WP method and check the feasibility of the customer's requirement calculated by Eq. (15); we get Table 5.

Table 5 Utility value of each product configuration by WPM
P1     | P2     | P3     | P4     | P5     | P6     | P7     | P8
0.8373 | 0.3560 | 0.6637 | 0.7398 | 0.4446 | 0.4558 | 0.4635 | 0.6452

P1 > P4 > P3 > P8 > P7 > P6 > P5 > P2

Due to the uncertainty of the customers' requirements and the fact that different algorithms may yield different results, in a realistic configuration system several products with the highest similarity degrees to the customer's requirements can be presented for the customer to choose from, so as to satisfy the requirements to the greatest degree.
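To make the case study reproducible, the following sketch recomputes Tables 4 and 5 from the Table 3 similarity values and the weight vector ω, using the SAW and WPM utilities of Eqs. (13) and (15); it reproduces, for example, 0.8447 and 0.8373 for P1 and the two rankings reported above. Variable names are ours.

```python
import math

weights = [0.25, 0.20, 0.25, 0.30]
table3 = {
    "P1": [0.8333, 0.6666, 0.8333, 0.9824],
    "P2": [0.3225, 0.0582, 0.5263, 0.7380],
    "P3": [0.8064, 0.2647, 0.8333, 0.8618],
    "P4": [0.6451, 0.6329, 0.8333, 0.8333],
    "P5": [0.3225, 0.1058, 0.8333, 0.8966],
    "P6": [0.3225, 0.2117, 0.5263, 0.8970],
    "P7": [0.3225, 0.1323, 0.8333, 0.8870],
    "P8": [0.6451, 0.2647, 0.8333, 0.9443],
}

saw = {p: sum(s * w for s, w in zip(row, weights)) for p, row in table3.items()}
wpm = {p: math.prod(s ** w for s, w in zip(row, weights)) for p, row in table3.items()}

print(round(saw["P1"], 4), round(wpm["P1"], 4))            # 0.8447 0.8373
print(" > ".join(sorted(saw, key=saw.get, reverse=True)))  # P1 > P4 > P3 > P8 > ...
print(" > ".join(sorted(wpm, key=wpm.get, reverse=True)))  # P1 > P4 > P3 > P8 > ...
```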

Conclusion
This paper proposes an approach to realize product-level configuration according to fuzzy and uncertain customer requirements by using the theory of fuzzy MADM. The television is taken as an example to demonstrate the feasibility of the proposed method for handling uncertain customer requirements. When the results of SAW and WPM are compared, we get the same preference order for our problem, and the optimal selection of television is P1.

References
[1] Hwang C.L. and Yoon K.P. (1981) Springer, Berlin.
[2] Millar D.W. and Starr M.K. (1969) Prentice Hall, Englewood Cliffs, New Jersey.
[3] Mohanty B.K. and Bhasker B. (2005) Decision Support Systems, 38, 611-619.
[4] Turban E. (1988) Macmillan, New York.


[5] Xu Z.S. (2002) Systems Engineering and Electronics, 124, 9-12.
[6] Yang T. and Chou P. (2005) Mathematics and Computers in Simulation, 68, 9-21.
[7] Zhu B. and Jiang P.Y. (2005) The International Journal of Product Development, 2, 155-169.



Advances in Information Mining, ISSN: 0975–3265, Volume 2, Issue 1, 2010, pp-13-17

Ant based rule mining with parallel fuzzy cluster
Sankar K.1 and Krishnamoorthy K.2
1Department of Master of Computer Applications, KSR College of Engineering, Tiruchengode, san_kri_78@yahoo.com
2Department of Computer Science and Engineering, SONA College of Technology, Salem, kkr_510@yahoo.co.in

Abstract- Ant-based techniques, in computer science, are those designed to take biological inspiration from the behavior of these social insects. Data clustering techniques are classification algorithms with a wide range of applications, from biology to image processing and data presentation. Since real ants perform clustering and sorting of objects among their many activities, we expect that a study of ant colonies can provide new insights for clustering techniques. The aim of clustering is to separate a set of data points into self-similar groups such that the points belonging to the same group are more similar than points belonging to different groups; each group is called a cluster. Data may be clustered using an iterative version of the Fuzzy C Means (FCM) algorithm, but the drawback of the FCM algorithm is that it is very sensitive to cluster center initialization because the search is based on the hill-climbing heuristic. The ant-based algorithm provides a relevant partition of the data without any knowledge of the initial cluster centers. In the past, researchers have used ant-based algorithms based on stochastic principles coupled with the k-means algorithm. The proposed system in this work uses the Fuzzy C Means algorithm as the deterministic algorithm for ant optimization. The proposed model is used after reformulation, and the partitions obtained from the ant-based algorithm were better optimized than those from randomly initialized Hard C Means. The proposed technique executes the fuzzy ants in parallel for multiple clusters, which enhances the speed and accuracy of cluster formation for the required system problem.

1. INTRODUCTION
Research in using the social insect metaphor for solving problems is still in its infancy. Systems developed using swarm intelligence principles emphasize distributiveness, direct or indirect interactions among relatively simple agents, flexibility and robustness [4]. Successful applications have been developed in the communication networks, robotics and combinatorial optimization fields.

1.1 ANT COLONY OPTIMIZATION
Many species of ants cluster dead bodies to form cemeteries and sort their larvae into several piles [4]. This behavior can be simulated using a simple model in which agents move randomly in space and pick up and deposit items on the basis of local information. The clustering and sorting behavior of ants can be used as a metaphor for designing new algorithms for data analysis and graph partitioning. The objects can be considered as items to be sorted; objects placed next to each other have similar attributes. This sorting takes place in two-dimensional space, offering a low-dimensional representation of the objects. Most swarm clustering work has followed the above model. In this work, there is implicit communication among the ants making up a partition, and the ants also have memory. However, they do not pick up and put down objects; rather, they place summary objects in locations and remember the locations that are evaluated as having good objective function values. The objects represent single dimensions of the multidimensional cluster centroids which make up a data partition.

1.2 CLUSTERING
The aim of cluster analysis is to find groupings or structures within unlabeled data [5]. The partitions found should result in similar data being assigned to the same cluster and dissimilar data assigned to different clusters. In most cases the data are real-valued vectors, and the Euclidean distance is one measure of similarity for such data sets. Clustering techniques can be broadly classified into a number of categories [6]. Hard C Means (HCM) is one of the simplest unsupervised clustering algorithms for a fixed number of clusters. The basic idea of the algorithm is to initially guess the centroids of the clusters and then refine them. Cluster initialization is crucial because the algorithm is very sensitive to it; a good choice for the initial cluster centers is to place them as far away from each other as possible. The nearest-neighbor rule is then used to assign each example to a cluster, new cluster centroids are calculated from the clusters obtained, and these steps are repeated until there is no significant change in the centroids. Hard clustering algorithms assign each example to one and only one cluster; this model is inappropriate for real data sets in which the boundaries between clusters may not be well defined. Fuzzy algorithms can partially assign data to multiple clusters, with the strength of membership in a cluster depending on the closeness of the example to the cluster center. The Fuzzy C Means (FCM) algorithm allows an example to be a partial member of more than one cluster. The FCM algorithm is based on


minimizing the objective function. The drawback of clustering algorithms like FCM and HCM, which are based on the hill-climbing heuristic, is that prior knowledge of the number of clusters in the data is required and they are significantly sensitive to cluster center initialization. The proposal of this work moves in the direction of constructing C fuzzy means clustering with ant colony optimization (parallel ant agents) to evolve efficient rule mining techniques. This work introduces the problem of combining multiple partitionings of a set of objects without accessing the original features. The system first identifies several application scenarios for the resultant 'knowledge reuse' framework, which we call cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information for building rule mining techniques. In addition to a direct maximization approach, the system proposes three effective and efficient techniques for obtaining high-quality combiners.

2. RELATED WORKS
Andrea Baraldi and Palma Blonda propose an equivalence between the concepts of fuzzy clustering and soft competitive learning in clustering algorithms on the basis of the existing literature; moreover, a set of functional attributes is selected for use as dictionary entries in the comparison of clustering algorithms. In Alfred Ultsch's work, systems for clustering with collectives of autonomous agents follow either the ant approach of picking up and dropping objects or the DataBot approach of identifying the data points with artificial-life creatures; in DataBot systems the clustering behaviour is controlled by movement programs. Julia Handl and Bernd Meyer note that sorting and clustering methods inspired by the behavior of real ants are among the earliest methods in ant-based meta-heuristics; they revisit these methods in the context of a concrete application and introduce modifications that yield significant improvements in both quality and efficiency, first re-examining their capability to simultaneously perform a combination of clustering and multi-dimensional scaling. In Handl, Knowles and Dorigo, ant-based clustering and sorting is a nature-inspired heuristic for general clustering tasks that has been applied variously, from problems arising in commerce, to circuit design, to text mining, all with some promise; however, although early results were broadly encouraging, there has been very limited analytical evaluation of the algorithm. Alexander Strehl and Joydeep Ghosh introduce the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. The system first


identifies several application scenarios for the resultant 'knowledge reuse' framework called cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, three effective and efficient techniques are proposed for obtaining high-quality combiners (consensus functions). The first combiner induces a similarity measure from the partitionings and then re-clusters the objects. The second combiner is based on hypergraph partitioning. The third one collapses groups of clusters into meta-clusters which then compete for each object to determine the combined clustering. Due to the low computational costs of the techniques, it is quite feasible to use a supra-consensus function that evaluates all three approaches against the objective function and picks the best solution for a given situation. The effectiveness of cluster ensembles is evaluated in three qualitatively different application scenarios: (i) where the original clusters were formed based on non-identical sets of features; (ii) where the original clustering algorithms worked on non-identical sets of objects; and (iii) where a common data set is used and the main purpose of combining multiple clusterings is to improve the quality and robustness of the solution. Promising results are obtained in all three situations for synthetic as well as real data sets. Nicolas Labroche, Nicolas Monmarché and Gilles Venturini introduce a method for the unsupervised clustering problem based on a model of the chemical recognition system of ants. This system allows ants to discriminate between nestmates and intruders, and thus to create homogeneous groups of individuals sharing a similar odor by continuously exchanging chemical cues. This phenomenon, known as "colonial closure", inspired a new clustering algorithm, which was then compared with a well-known method such as K-MEANS. The previous literature on fuzzy clustering depicted above turns on the following points: the first work handles functional attributes with theoretical analysis; the second and third deal with cluster object movement on synthetic data sets; the fourth and fifth deal with a heuristic ant optimization model with trial repetition; the sixth and seventh utilize unsupervised clustering with class-tree structuring; and the final one uses c-fuzzy-means clustering in a sequential way. This motivates us to base our proposal on ACO with C fuzzy means. Building on the sequential C-Fuzzy ACO approach, we derive a parallel fuzzy ant clustering model to improve the attribute accuracy rate and achieve faster execution on the proposed problem domain.
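Since the discussion repeatedly contrasts the ant-based approach with randomly initialized FCM, a compact textbook FCM loop is sketched below for reference; this is a generic Python rendering with our own naming, not the authors' code.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, tol=1e-5, seed=0):
    """Textbook Fuzzy C Means: alternate centroid and membership updates.
    X is (n_samples, n_features); returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))            # standard FCM membership update
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:        # converged: memberships stable
            return centers, U_new
        U = U_new
    return centers, U
```

The result of such a run depends strongly on the random initialization, which is exactly the sensitivity the ant-based scheme below is meant to mitigate.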


3. FUZZY ANT CLUSTERING
Ant-based clustering algorithms are usually inspired by the way ants cluster dead nest mates into piles without negotiating about where to gather the corpses. These algorithms are characterized by the lack of centralized control or a priori information, which makes them very appropriate candidates for the task at hand. Since the fuzzy ants algorithm needs neither an initial partitioning of the data nor a predefined number of clusters, it is very well suited for the Web People Search task, where the system does not know in advance how many clusters (or individuals) correspond to a particular document set (or person name). A detailed description of the algorithm is given by Schockaert et al. It involves a pass in which ants can only pick up one item, as well as a pass during which ants can only pick up an entire heap. A fuzzy ant-based clustering algorithm was introduced in which the ants are endowed with a level of intelligence in the form of IF/THEN rules that allow them to do approximate reasoning. As a result, at any time the ants can decide for themselves whether to pick up a single item or an entire heap, which makes a separation of the clustering into different passes superfluous. The system experimented with different numbers of ant runs and fixed the number of runs at 800000 for the experiments. In addition, the system also evaluated different values for the parameters that determine the probability that a document or heap of documents is picked up or dropped by the ants, and kept the following values for the experiments:

Table 1: Parameter settings for fuzzy clustering
n1 (probability of dropping one item)       | 1
m1 (probability of picking up one item)     | 1
n2 (probability of dropping an entire heap) | 5
m2 (probability of picking up a heap)       | 5

3.1 Hierarchical Clustering
The second clustering algorithm the system applies is an agglomerative hierarchical approach. This algorithm builds a hierarchy of clusterings that can be represented as a tree (called a dendrogram) which has singleton clusters (individual documents) as leaves and a single cluster containing all documents as root. An agglomerative clustering algorithm builds this tree from the leaves to the top, in each step merging the two clusters with the largest similarity. Cutting the tree at a given height gives a clustering with a selected number of clusters. The system opted to cut the tree at different similarity thresholds between the document pairs, at intervals of 0.1 (e.g. for threshold 0.2 all document pairs with similarities

above 0.2 are clustered together). For the experiments, the system used an implementation of Agnes (Agglomerative Nesting), which is fully described elsewhere.

3.2 Fuzzy Ant Parallel System
Clustering approaches are typically quite sensitive to initialization. This work examines a swarm-inspired approach to building clusters which allows a more global search for the best partition than iterative optimization approaches. The approach is described with cooperating ants as its basis: the ants participate in placing cluster centroids in feature space and produce a partition which can be used as is or further optimized, the latter via a focused iterative optimization algorithm. Experiments were done both with deterministic algorithms, which assign each example to one and only one cluster, and with fuzzy algorithms, which partially assign examples to multiple clusters. The algorithms are from the C-means family; they were integrated with swarm intelligence concepts to give clustering approaches that are less sensitive to initialization.

4. EXPERIMENTAL SIMULATION ON ANT BASED PARALLEL CLUSTER
The implementation of the fuzzy ant based parallel clustering algorithm for rule mining used three real data sets obtained from the UCI repository: the Iris data set, the Wine Recognition data set, and the Glass Identification data set. The simulation, conducted in Matlab, normalizes the feature values between 0 and 1. The normalization is linear: the minimum value of a dataset-specific feature is mapped to 0 and the maximum value of the feature is mapped to 1. Initialize the ants with random initial values and random directions. There are two directions, positive and negative: the positive direction means the ant is moving in the feature space from 0 to 1, and the negative direction from 1 to 0. Clear the initial memory. The ants are initially assigned to a particular feature within a particular cluster of a particular partition; the ants never change the feature, cluster or partition assigned to them.

Repeat
  For one epoch /* one epoch is n iterations of random ant movement */
    For all ants
      With a probability Prest the ant rests for this epoch
      If the ant is not resting, then with a probability Pcontinue the ant continues in the same direction, else it changes direction
      The ant moves in the selected direction by a value between Dmin and Dmax
    The new Rm value is calculated using the new cluster centers, obtained by recording the


positions of the ants, which move the features of the clusters for a given partition.
    If the partition is better than any of the old partitions in memory, then the worst partition is removed from memory and the new partition is copied to the memories of the ants making up the partition.
    If the partition is not better than any of the old partitions in memory, then with a probability PContinueCurrent the ant continues with the current partition; else, with probability 0.6 the ant moves to the best known partition, with probability 0.2 to the second best, with probability 0.1 to the third best, with probability 0.075 to the fourth best, and with probability 0.025 to the worst known partition.
Until the stopping criterion is met.
The stopping criterion is the number of epochs.

Table 2- Parameter Values
Parameter            | Value
Number of ants       | 30 * c * #features
Memory per ant       | 5
Iterations per epoch | 50
Epochs               | 1000
Prest                | 0.01
Pcontinue            | 0.75
PContinueCurrent     | 0.20
Dmin                 | 0.001
Dmax                 | 0.01

Note that the multiplier 30 for the number of ants allows for 30 partitions. Three data sets (Glass, Wine and Iris) were evaluated from a mixture of five Gaussians. The probability distribution across all the data sets is the same, but the means and standard deviations of the Gaussians differ. Of the three data sets, two had 500 instances each and the remaining one had 1000 instances; each instance had two attributes. To visualize the Iris data set, the Principal Component Analysis (PCA) algorithm was used to project the data points into 2D and 3D space.

5. RESULTS AND DISCUSSIONS
The ants move the cluster centers in feature space to find a good partition for the data. There are fewer controlling parameters than in previous ant-based clustering algorithms, which typically group the objects on a two-dimensional grid. Results from 18 data sets show the superiority of the algorithm over the randomly initialized FCM and HCM algorithms. For comparison purposes, Table 3 shows the frequency of occurrence of different extrema for the ant-initialized FCM and HCM algorithms and for the randomly initialized FCM and HCM algorithms.

Table 3- Frequency of different extrema from parallel fuzzy based ant clustering, for the Glass (2 class), Iris and Wine data sets
Data Set        | Extrema | Frequency HCM, ant init. | Frequency HCM, random init. | Sequential C-Fuzzy ACO (existing) | Parallel C-Fuzzy ACO (proposed)
Glass (2 class) | 34.1320 | 19 | 3  | 31      | 27.8
Glass (2 class) | 34.1343 | 11 | 19 | 32.12   | 28.5
Glass (2 class) | 34.1372 | 19 | 15 | 32.36   | 29.1
Glass (2 class) | 34.1658 | 1  | 5  | 32.89   | 29.82
Iris            | 6.9981  | 50 | 23 | 5.3938  | 4.23
Iris            | 7.1386  | 0  | 14 | 5.8389  | 4.3658
Iris            | 10.9083 | 0  | 5  | 8.3746  | 5.3256
Iris            | 12.1437 | 0  | 8  | 10.6434 | 8.2356
Wine            | 9.3645  | 20 | 2  | 5.2369  | 3.2567
Wine            | 11.3748 | 15 | 20 | 8.2356  | 5.236
Wine            | 13.8483 | 12 | 18 | 10.2356 | 8.3656

The ant-initialized parallel ant fuzzy algorithm always finds better extrema for the Iris data set, and for the Wine data set the ant-initialized algorithm finds the better extrema 49 out of 50 times. The ant-initialized HCM algorithm always finds better extrema for the Iris data set, and for the Glass (2 class) data set a majority of the time. For the different Iris runs, the ant-initialized parallel algorithm finds a better extremum most of the time. When the ACO approach was used to optimize the clustering criteria, the ant approach for parallel C Means found better extrema 64% of the time for the Iris data set; the ant-initialized parallel C-Fuzzy ACO finds better extrema all of the time. The number of ants is an important parameter of the algorithm; this number increases only when more partitions are searched for at the same time, as ants are (currently) added in increments (Graph 1 and Graph 2). The quality of the final partition improves with an increase in the number of ants, but the improvement comes at the expense of increased execution time.

Graph 1: Number of Iterations vs. Time (curves: Ant Fuzzy Parallel, Ant Fuzzy Sequential)
Graph 2: Time vs. Path Length (curves: Ant Fuzzy Sequential, Ant Fuzzy Parallel)
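A minimal sketch, under our own simplifications, of the per-ant movement rules and the memory-jump probabilities listed in the procedure above (Prest, Pcontinue, Dmin/Dmax, and the 0.6/0.2/0.1/0.075/0.025 jump distribution); it is an illustrative Python rendering, not the authors' Matlab implementation.

```python
import random

P_REST, P_CONTINUE = 0.01, 0.75
D_MIN, D_MAX = 0.001, 0.01
JUMP = [(0.6, 0), (0.2, 1), (0.1, 2), (0.075, 3), (0.025, 4)]  # (prob, memory rank)

class Ant:
    """One ant owns a single feature of one cluster centroid in [0, 1]."""
    def __init__(self):
        self.value = random.random()
        self.direction = random.choice([-1, 1])

    def move(self):
        if random.random() < P_REST:        # ant rests for this epoch
            return
        if random.random() >= P_CONTINUE:   # otherwise it may reverse direction
            self.direction *= -1
        step = random.uniform(D_MIN, D_MAX)
        self.value = min(1.0, max(0.0, self.value + self.direction * step))

def pick_memory_slot():
    """Choose which remembered partition to jump back to (best ... worst)."""
    r, acc = random.random(), 0.0
    for prob, rank in JUMP:
        acc += prob
        if r < acc:
            return rank
    return JUMP[-1][1]
```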




7. CONCLUSION
This work discussed a swarm-inspired optimization algorithm to partition data into clusters, described using the ant paradigm. The approach is to have a coordinated group of ants position cluster centroids in feature space. The algorithm was evaluated with a soft clustering formulation utilizing the fuzzy c-means objective function and a hard clustering formulation utilizing the hard c-means objective function. The presented clustering approach seems clearly advantageous for data sets where many local extrema are expected. The cluster discovery aspect of the algorithm provides the advantage of obtaining a partition and, at the same time, an indication of the number of clusters; that partition can be further optimized or accepted as is. This is in contrast to some other schemes, which require partitions to be created with different numbers of clusters and then evaluated. The results are generally a better optimized partition (objective function) than obtained with FCM/HCM; a large number of random initializations is needed to be competitive in skipping the poor local extrema that the ant-based algorithm avoids. It has also provided better final partitions on average than a previously introduced evolutionary computation clustering approach for several data sets. Random initialization has been shown to be the best baseline for the c-means family, and the ant clustering algorithm results in generally better partitions than a single random initialization. The parallel version of the ants algorithm operates much faster than the sequential implementation, making it a clear choice for minimizing the chance of finding a poor extremum when doing c-means clustering. This algorithm should also scale better for large numbers of examples than grid-based ant clustering algorithms.

REFERENCES
[1] Baraldi A. and Blonda P. (1999a) IEEE Transactions on Systems, Man, and Cybernetics, 29(6), 778-785.
[2] Kanade P.M. and Hall L.O. (2003) IEEE Transactions on Fuzzy Systems, 11(2), 227-232.
[3] Handl J. and Meyer B. (2002) Springer-Verlag, 2439, 913-923.
[4] Handl J., Knowles J. and Dorigo M. (2003) IOS Press, Amsterdam, The Netherlands, 204-213.
[5] Strehl A. and Ghosh J. (2002) Journal of Machine Learning Research, 3, 583-617.
[6] Labroche N., Monmarche N. and Venturini G. (2002) IOS Press, France, 345-349.



Advances in Information Mining, ISSN: 0975–3265, Volume 2, Issue 1, 2010, pp-01-07

Fuzzy multi-objective multi-index transportation problem
Lohgaonkar M.H.1, Bajaj V.H.1*, Jadhav V.A.2 and Patwari M.B.2
1Department of Statistics, Dr. B. A. M. University, Aurangabad, MS, vhbajaj@gmail.com, mhlohgaonkar@gmail.com
2Department of Statistics, Science College, Nanded, MS

Abstract- The aim of this paper is to present a fuzzy multi-objective multi-index transportation problem and to develop a multi-objective multi-index fuzzy programming model. This model can not only satisfy more of the actual requirements of the integral system but is also more flexible than conventional transportation problems. Furthermore, it can offer more information to the decision maker (DM) for reference, and thus raise the quality of decision-making. In this paper we use special types of linear and non-linear membership functions to solve the multi-objective multi-index transportation problem; this gives an optimal compromise solution.
Keywords- Transportation problem, multi-objective transportation problem, multi-index, linear membership function, non-linear membership function

Introduction
Fuzzy set theory was proposed by L. A. Zadeh and has found extensive application in various fields. Bellman and Zadeh [2] were the first to consider the application of fuzzy set theory to optimization problems in a fuzzy environment; they proposed that both the objective function and the constraints in the model could be represented by corresponding fuzzy sets and should be treated in the same manner. The earliest applications to transportation problems include Prade [11], O'he'igeartaigh [10] and Chanas et al. [4], but these researchers emphasized theory and algorithms, and the investigations are illustrated with simple instances lacking actual cases of application. Moreover, these models involve only a single objective and are classical two-index transportation problems. In actual transportation problems, multiple objective functions are generally considered, including average delivery time of the commodities, minimum cost, etc. Zimmermann [15] applied fuzzy set theory to the linear multi-criteria decision making problem, used the linear fuzzy membership function and presented an application to the fuzzy linear vector maximum problem; he showed that solutions obtained by fuzzy linear programming always provide efficient solutions as well as an optimal compromise solution. Aneja and Nair [1] considered the multi-objective transportation problem model. Multi-index transportation problems are extensions of conventional transportation problems and are appropriate for solving transportation problems with multiple supply points and multiple demand points, as well as problems using diverse modes of transportation or delivering different kinds of merchandise; thus the resulting problem is more complicated than conventional transportation problems. Junginger [9], who proposed a set of logic problems to solve multi-index transportation problems, also conducted a detailed investigation of the characteristics of the multi-index transportation problem model.

Rautman et al. [12] used a multi-index transportation problem model to solve a shipping scheduling problem and suggested that the employment of such a model not only improves transportation efficiency but also optimizes the integral system.

Mathematical Model
Multi-objective Multi-index Transportation Problem
Let $a_{ijl}$ be a multi-dimensional array, $1 \le i \le m$, $1 \le j \le n$, $1 \le l \le k$, and let $A = (a_{ij})$, $B = (b_{jl})$, $C = (c_{il})$ be multi-matrices; then the multi-index transportation problem is defined as follows:

$$\text{Minimize } Z = \sum_i \sum_j \sum_l a_{ijl}\, X_{ijl} \quad (1)$$

Subject to

$$\sum_l X_{ijl} = a_{ij} \ \ \forall (i,j); \quad \sum_j X_{ijl} = c_{il} \ \ \forall (i,l); \quad \sum_i X_{ijl} = b_{jl} \ \ \forall (j,l); \quad X_{ijl} \ge 0 \ \ \forall (i,j,l) \quad (2)$$

It is immediate that

$$\sum_i a_{ij} = \sum_l b_{jl}; \quad \sum_j a_{ij} = \sum_l c_{il}; \quad \sum_j b_{jl} = \sum_i c_{il} \quad (3)$$

are three necessary conditions; however, they are known to be non-sufficient.
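As a sketch of how model (1)-(2) can be set up mechanically, the following builds the multi-index LP with the open-source PuLP library; the data arrays are placeholders the reader must supply, and the wrapper name is our own.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum

def multi_index_tp(cost, a, b, c):
    """cost[i][j][l]; a[i][j], b[j][l], c[i][l] as in Eqs. (1)-(2)."""
    m, n, k = len(cost), len(cost[0]), len(cost[0][0])
    I, J, L = range(m), range(n), range(k)
    x = LpVariable.dicts("x", (I, J, L), lowBound=0)
    prob = LpProblem("multi_index_TP", LpMinimize)
    prob += lpSum(cost[i][j][l] * x[i][j][l] for i in I for j in J for l in L)  # Eq. (1)
    for i in I:                                   # the three sets of marginals, Eq. (2)
        for j in J:
            prob += lpSum(x[i][j][l] for l in L) == a[i][j]
    for i in I:
        for l in L:
            prob += lpSum(x[i][j][l] for j in J) == c[i][l]
    for j in J:
        for l in L:
            prob += lpSum(x[i][j][l] for i in I) == b[j][l]
    prob.solve()
    return prob
```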

Copyright © 2010, Bioinfo Publications, Advances in Information Mining, ISSN: 0975–3265, Volume 2, Issue 1, 2010


Fuzzy multi-objective multi-index transportation problem

The multi-objective double transportation problem (DTP) is formulated as follows:

$$\text{Minimize } Z_p = \sum_{i=1}^{m}\sum_{j=1}^{n} k^{(1)}_{ij}\, x^{(1)}_{ij} + \sum_{i=1}^{m}\sum_{j=1}^{n} k^{(2)}_{ij}\, x^{(2)}_{ij} \quad (4)$$

Subject to

$$\sum_{j=1}^{n} x^{(1)}_{ij} = a_{1i} \quad \forall i \quad (5)$$

$$\sum_{j=1}^{n} x^{(2)}_{ij} = a_{2i} \quad \forall i \quad (6)$$

$$\sum_{i=1}^{m} x^{(1)}_{ij} = b_{1j} \quad \forall j \quad (7)$$

$$\sum_{i=1}^{m} x^{(2)}_{ij} = b_{2j} \quad \forall j \quad (8)$$

$$x^{(1)}_{ij} + x^{(2)}_{ij} = c_{ij} \quad \forall (i,j) \quad (9)$$

$$x^{(1)}_{ij},\ x^{(2)}_{ij} \ge 0 \quad \forall (i,j) \quad (10)$$

It may easily be seen that for the existence of a solution the following conditions are necessary:

$$\sum_{i=1}^{m} a_{1i} = \sum_{j=1}^{n} b_{1j} \quad (11)$$

$$\sum_{i=1}^{m} a_{2i} = \sum_{j=1}^{n} b_{2j} \quad (12)$$

$$\sum_{j=1}^{n} c_{ij} = a_{1i} + a_{2i} \quad \forall i \quad (13)$$

$$\sum_{i=1}^{m} c_{ij} = b_{1j} + b_{2j} \quad \forall j \quad (14)$$

$$c_{ij} \le \min(a_{1i}, b_{1j}) + \min(a_{2i}, b_{2j}) \quad \forall (i,j) \quad (15)$$

It may also be seen that the DTP is composed of two transportation tables and one C matrix: $T_1$ is the table with costs $(k^{(1)}_{ij})$, supplies $a_{11},\ldots,a_{1m}$ and demands $b_{11},\ldots,b_{1n}$; $T_2$ is the table with costs $(k^{(2)}_{ij})$, supplies $a_{21},\ldots,a_{2m}$ and demands $b_{21},\ldots,b_{2n}$; together with the matrix $C = (c_{ij})_{m\times n}$. (16)

Fuzzy Algorithm to solve the multi-objective multi-index transportation problem
Step 1: Solve the multi-objective multi-index transportation problem as a single-objective transportation problem P times, taking one of the objectives at a time.
Step 2: From the results of Step 1, determine the corresponding value of every objective at each solution derived. According to each solution and value for every objective, we can form the pay-off matrix

$$\begin{array}{c|cccc} & Z_1(X) & Z_2(X) & \cdots & Z_p(X) \\ \hline X^{(1)} & Z_{11} & Z_{12} & \cdots & Z_{1p} \\ X^{(2)} & Z_{21} & Z_{22} & \cdots & Z_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ X^{(P)} & Z_{p1} & Z_{p2} & \cdots & Z_{pp} \end{array}$$

where $X^{(1)}, X^{(2)}, \ldots, X^{(P)}$ are the isolated optimal solutions of the P different transportation problems for the P different objective functions, and $Z_{ij} = Z_j(X^{(i)})$ ($i = 1,2,\ldots,p$; $j = 1,2,\ldots,p$) is the element in the i-th row and j-th column of the pay-off matrix.
Step 3: From Step 2, find for each objective the worst ($U_p$) and best ($L_p$) values corresponding to the set of solutions, where

$$U_p = \max(Z_{1p}, Z_{2p}, \ldots, Z_{pp}) \quad \text{and} \quad L_p = Z_{pp}, \quad p = 1,2,\ldots,P \quad (17)$$

An initial fuzzy model of the problem (4)-(10) can then be stated as: find $X_{ij}$, $i = 1,\ldots,m$, $j = 1,\ldots,n$, so as to satisfy $\tilde{Z}_p \lesssim L_p$, $p = 1,2,\ldots,P$ (18), subject to the constraints (5)-(10) (19).
Step 4: Case (i): Define the membership function for the p-th objective function as follows:

$$\mu_p(X) = \begin{cases} 1 & \text{if } Z_p(X) \le L_p \\ \dfrac{U_p - Z_p(X)}{U_p - L_p} & \text{if } L_p < Z_p < U_p \\ 0 & \text{if } Z_p \ge U_p \end{cases} \quad (20)$$

Step 5: Find an equivalent crisp model by using the linear membership function for the initial fuzzy model:

$$\text{Maximize } \lambda \quad \text{subject to} \quad \lambda \le \frac{U_p - Z_p(X)}{U_p - L_p}, \quad p = 1,2,\ldots,P, \ \text{ and (5)-(10)} \quad (21)$$



Step 6: Solve the crisp model by an appropriate mathematical programming algorithm.
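A sketch of how Steps 5-6 might look in code, again with PuLP, assuming the constraints (5)-(10) and the linear expressions for each Z_p have already been built: the max-min model (21)/(22) is a single LP in λ. All names here are our own.

```python
from pulp import LpProblem, LpMaximize, LpVariable

def max_min_crisp(constraints, Z_exprs, U, L):
    """Eq. (21): maximize lambda s.t. lambda <= (U_p - Z_p)/(U_p - L_p).
    constraints: already-built LP constraints (5)-(10);
    Z_exprs: linear expressions for each Z_p; U, L: worst/best values per objective."""
    prob = LpProblem("fuzzy_compromise", LpMaximize)
    lam = LpVariable("lambda", lowBound=0, upBound=1)
    prob += lam                                  # objective: maximize lambda
    for Zp, Up, Lp in zip(Z_exprs, U, L):
        prob += Zp + lam * (Up - Lp) <= Up       # linearized form of (21)
    for con in constraints:
        prob += con
    prob.solve()
    return lam.value()
```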

Equivalently:

$$\text{Maximize } \lambda \quad \text{subject to} \quad \sum_i\sum_j C^p_{ij} X_{ij} + \lambda\,(U_p - L_p) \le U_p, \quad p = 1,2,\ldots,P, \ \text{ and (5)-(10)} \quad (22)$$

Now, using the hyperbolic membership function for the p-th objective function:

$$\mu^H_{Z_p}(x) = \begin{cases} 1 & \text{if } Z_p \le L_p \\ \dfrac{1}{2}\,\dfrac{e^{\{\frac{U_p+L_p}{2} - Z_p(x)\}\alpha_p} - e^{-\{\frac{U_p+L_p}{2} - Z_p(x)\}\alpha_p}}{e^{\{\frac{U_p+L_p}{2} - Z_p(x)\}\alpha_p} + e^{-\{\frac{U_p+L_p}{2} - Z_p(x)\}\alpha_p}} + \dfrac{1}{2} & \text{if } L_p < Z_p < U_p \\ 0 & \text{if } Z_p \ge U_p \end{cases} \quad (23)$$

where $\alpha_p = \dfrac{6}{U_p - L_p}$. The crisp model for this fuzzy model can be formulated as:

$$\text{Maximize } \lambda \quad \text{subject to} \quad \lambda \le \frac{1}{2}\tanh\!\Big(\Big\{\frac{U_p+L_p}{2} - Z_p(x)\Big\}\alpha_p\Big) + \frac{1}{2}, \quad \lambda \ge 0, \ \text{ and (5)-(10)} \quad (24)$$

This crisp model can be solved as

$$\text{Maximize } X_{mn+1} \quad (25)$$

subject to $\alpha_p Z_p(x) + X_{mn+1} \le \alpha_p\,(U_p + L_p)/2$, $p = 1,2,\ldots,P$, subject to (5)-(10) and $X_{mn+1} \ge 0$, where $X_{mn+1} = \tanh^{-1}(2\lambda - 1)$.

Now the exponential membership function for the p-th objective function is defined as

$$\mu^E_{Z_p}(x) = \begin{cases} 1 & \text{if } Z_p \le L_p \\ \dfrac{e^{-S\Psi_p(X)} - e^{-S}}{1 - e^{-S}} & \text{if } L_p < Z_p < U_p \\ 0 & \text{if } Z_p \ge U_p \end{cases} \quad (26)$$

where $\Psi_p(X) = \dfrac{Z_p - L_p}{U_p - L_p}$, $p = 1,2,\ldots,P$, and S is a non-zero parameter prescribed by the decision maker.

Numerical Examples
Example 1
The cost data (reconstructed here from the simplified objective functions (29) and (31) and the constraints (30), since the printed matrices are garbled in the source) are, for the first objective, the two transportation tables

$$T_1 = \begin{pmatrix} 4 & 3 & 5 \\ 8 & 6 & 2 \\ 7 & 4 & 1 \\ 9 & 10 & 12 \end{pmatrix}, \quad T_2 = \begin{pmatrix} 8 & 6 & 3 \\ 5 & 4 & 1 \\ 9 & 2 & 6 \\ 4 & 9 & 3 \end{pmatrix} \quad (27)$$

and, for the second objective,

$$C_1 = \begin{pmatrix} 5 & 6 & 7 \\ 4 & 5 & 2 \\ 1 & 3 & 4 \\ 4 & 2 & 3 \end{pmatrix}, \quad C_2 = \begin{pmatrix} 10 & 9 & 9 \\ 7 & 9 & 2 \\ 8 & 7 & 9 \\ 8 & 4 & 5 \end{pmatrix} \quad (28)$$

with supplies $a_1 = (9, 14, 6, 7)$, $a_2 = (6, 7, 5, 6)$, demands $b_1 = (14, 12, 10)$, $b_2 = (5, 8, 11)$, and

$$C = (c_{ij}) = \begin{pmatrix} 5 & 7 & 3 \\ 8 & 4 & 9 \\ 4 & 1 & 6 \\ 2 & 8 & 3 \end{pmatrix}$$

Example 1 is simplified as

Minimize $Z_1 = 4X^{(1)}_{11} + 3X^{(1)}_{12} + 5X^{(1)}_{13} + 8X^{(1)}_{21} + 6X^{(1)}_{22} + 2X^{(1)}_{23} + 7X^{(1)}_{31} + 4X^{(1)}_{32} + X^{(1)}_{33} + 9X^{(1)}_{41} + 10X^{(1)}_{42} + 12X^{(1)}_{43} + 8X^{(2)}_{11} + 6X^{(2)}_{12} + 3X^{(2)}_{13} + 5X^{(2)}_{21} + 4X^{(2)}_{22} + X^{(2)}_{23} + 9X^{(2)}_{31} + 2X^{(2)}_{32} + 6X^{(2)}_{33} + 4X^{(2)}_{41} + 9X^{(2)}_{42} + 3X^{(2)}_{43}$ (29)

Subject to
$X^{(1)}_{11}+X^{(1)}_{12}+X^{(1)}_{13} = 9$; $X^{(1)}_{21}+X^{(1)}_{22}+X^{(1)}_{23} = 14$; $X^{(1)}_{31}+X^{(1)}_{32}+X^{(1)}_{33} = 6$; $X^{(1)}_{41}+X^{(1)}_{42}+X^{(1)}_{43} = 7$;
$X^{(2)}_{11}+X^{(2)}_{12}+X^{(2)}_{13} = 6$; $X^{(2)}_{21}+X^{(2)}_{22}+X^{(2)}_{23} = 7$; $X^{(2)}_{31}+X^{(2)}_{32}+X^{(2)}_{33} = 5$; $X^{(2)}_{41}+X^{(2)}_{42}+X^{(2)}_{43} = 6$;
$X^{(1)}_{11}+X^{(1)}_{21}+X^{(1)}_{31}+X^{(1)}_{41} = 14$; $X^{(1)}_{12}+X^{(1)}_{22}+X^{(1)}_{32}+X^{(1)}_{42} = 12$; $X^{(1)}_{13}+X^{(1)}_{23}+X^{(1)}_{33}+X^{(1)}_{43} = 10$;
$X^{(2)}_{11}+X^{(2)}_{21}+X^{(2)}_{31}+X^{(2)}_{41} = 5$; $X^{(2)}_{12}+X^{(2)}_{22}+X^{(2)}_{32}+X^{(2)}_{42} = 8$; $X^{(2)}_{13}+X^{(2)}_{23}+X^{(2)}_{33}+X^{(2)}_{43} = 11$;
$X^{(1)}_{ij}+X^{(2)}_{ij} = c_{ij}$ for all (i,j): $X^{(1)}_{11}+X^{(2)}_{11}=5$; $X^{(1)}_{12}+X^{(2)}_{12}=7$; $X^{(1)}_{13}+X^{(2)}_{13}=3$; $X^{(1)}_{21}+X^{(2)}_{21}=8$; $X^{(1)}_{22}+X^{(2)}_{22}=4$; $X^{(1)}_{23}+X^{(2)}_{23}=9$; $X^{(1)}_{31}+X^{(2)}_{31}=4$; $X^{(1)}_{32}+X^{(2)}_{32}=1$; $X^{(1)}_{33}+X^{(2)}_{33}=6$; $X^{(1)}_{41}+X^{(2)}_{41}=2$; $X^{(1)}_{42}+X^{(2)}_{42}=8$; $X^{(1)}_{43}+X^{(2)}_{43}=3$. (30)
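Under the data reconstruction above, Example 1 can be checked mechanically; the sketch below (PuLP, names ours) minimizes Z1 subject to (30) and should reproduce the optimum Z1 = 300 reported below for Step 1 of the algorithm.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, value

T1 = [[4, 3, 5], [8, 6, 2], [7, 4, 1], [9, 10, 12]]   # mode-1 costs for Z1
T2 = [[8, 6, 3], [5, 4, 1], [9, 2, 6], [4, 9, 3]]     # mode-2 costs for Z1
a1, a2 = [9, 14, 6, 7], [6, 7, 5, 6]
b1, b2 = [14, 12, 10], [5, 8, 11]
C = [[5, 7, 3], [8, 4, 9], [4, 1, 6], [2, 8, 3]]

I, J = range(4), range(3)
x1 = LpVariable.dicts("x1", (I, J), lowBound=0)
x2 = LpVariable.dicts("x2", (I, J), lowBound=0)
prob = LpProblem("Z1", LpMinimize)
prob += lpSum(T1[i][j] * x1[i][j] + T2[i][j] * x2[i][j] for i in I for j in J)
for i in I:                                      # row (supply) constraints
    prob += lpSum(x1[i][j] for j in J) == a1[i]
    prob += lpSum(x2[i][j] for j in J) == a2[i]
for j in J:                                      # column (demand) constraints
    prob += lpSum(x1[i][j] for i in I) == b1[j]
    prob += lpSum(x2[i][j] for i in I) == b2[j]
for i in I:                                      # coupling to the C matrix, Eq. (9)
    for j in J:
        prob += x1[i][j] + x2[i][j] == C[i][j]
prob.solve()
print(value(prob.objective))   # expected: 300.0
```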


Example 2 is simplified as

Minimize $Z_2 = 5X^{(1)}_{11} + 6X^{(1)}_{12} + 7X^{(1)}_{13} + 4X^{(1)}_{21} + 5X^{(1)}_{22} + 2X^{(1)}_{23} + 1X^{(1)}_{31} + 3X^{(1)}_{32} + 4X^{(1)}_{33} + 4X^{(1)}_{41} + 2X^{(1)}_{42} + 3X^{(1)}_{43} + 10X^{(2)}_{11} + 9X^{(2)}_{12} + 9X^{(2)}_{13} + 7X^{(2)}_{21} + 9X^{(2)}_{22} + 2X^{(2)}_{23} + 8X^{(2)}_{31} + 7X^{(2)}_{32} + 9X^{(2)}_{33} + 8X^{(2)}_{41} + 4X^{(2)}_{42} + 5X^{(2)}_{43}$ (31)

subject to the same constraints (30).

For objective $Z_1$ we find the optimal solution

$X^{(1)} = \{ X^{(1)}_{11}=5,\ X^{(1)}_{12}=4,\ X^{(1)}_{21}=8,\ X^{(1)}_{22}=2,\ X^{(1)}_{23}=4,\ X^{(1)}_{33}=6,\ X^{(1)}_{41}=1,\ X^{(1)}_{42}=6;\ X^{(2)}_{12}=3,\ X^{(2)}_{13}=3,\ X^{(2)}_{22}=2,\ X^{(2)}_{23}=5,\ X^{(2)}_{31}=4,\ X^{(2)}_{32}=1,\ X^{(2)}_{41}=1,\ X^{(2)}_{42}=2,\ X^{(2)}_{43}=3 \}$, with $Z_1 = 300$.

For objective $Z_2$ we find the optimal solution

$X^{(2)} = \{ X^{(1)}_{11}=4,\ X^{(1)}_{12}=5,\ X^{(1)}_{21}=8,\ X^{(1)}_{22}=4,\ X^{(1)}_{23}=2,\ X^{(1)}_{31}=1,\ X^{(1)}_{33}=5,\ X^{(1)}_{41}=1,\ X^{(1)}_{42}=3,\ X^{(1)}_{43}=3;\ X^{(2)}_{11}=1,\ X^{(2)}_{12}=2,\ X^{(2)}_{13}=3,\ X^{(2)}_{23}=7,\ X^{(2)}_{31}=3,\ X^{(2)}_{32}=1,\ X^{(2)}_{33}=1,\ X^{(2)}_{41}=1,\ X^{(2)}_{42}=5 \}$, with $Z_2 = 283$.

Now for $X^{(1)}$ we can find $Z_2(X^{(1)}) = 291$, and for $X^{(2)}$ we can find $Z_1(X^{(2)}) = 330$. The pay-off matrix is

$$\begin{array}{c|cc} & Z_1 & Z_2 \\ \hline X^{(1)} & 300 & 291 \\ X^{(2)} & 330 & 283 \end{array}$$

From this matrix, $U_1 = 330$, $U_2 = 291$, $L_1 = 300$, $L_2 = 283$. The initial fuzzy model is: find $X_{ij}$ so as to satisfy $\tilde{Z}_1 \lesssim 300$ and $\tilde{Z}_2 \lesssim 283$. (32)

Define the membership functions for the objective functions $Z_1(X)$ and $Z_2(X)$, respectively:

$$\mu_1(X) = \begin{cases} 1 & \text{if } Z_1(X) \le 300 \\ \dfrac{330 - Z_1(X)}{330 - 300} & \text{if } 300 < Z_1(X) < 330 \\ 0 & \text{if } Z_1(X) \ge 330 \end{cases} \qquad \mu_2(X) = \begin{cases} 1 & \text{if } Z_2(X) \le 283 \\ \dfrac{291 - Z_2(X)}{291 - 283} & \text{if } 283 < Z_2(X) < 291 \\ 0 & \text{if } Z_2(X) \ge 291 \end{cases}$$

Find an equivalent crisp model:

Maximize λ subject to $30\lambda + Z_1(X) \le 330$ and $8\lambda + Z_2(X) \le 291$,

and solve it by an appropriate mathematical algorithm. Written out in full, the two constraints are

$4X^{(1)}_{11}+3X^{(1)}_{12}+5X^{(1)}_{13}+8X^{(1)}_{21}+6X^{(1)}_{22}+2X^{(1)}_{23}+7X^{(1)}_{31}+4X^{(1)}_{32}+X^{(1)}_{33}+9X^{(1)}_{41}+10X^{(1)}_{42}+12X^{(1)}_{43}+8X^{(2)}_{11}+6X^{(2)}_{12}+3X^{(2)}_{13}+5X^{(2)}_{21}+4X^{(2)}_{22}+X^{(2)}_{23}+9X^{(2)}_{31}+2X^{(2)}_{32}+6X^{(2)}_{33}+4X^{(2)}_{41}+9X^{(2)}_{42}+3X^{(2)}_{43}+30\lambda \le 330$

$5X^{(1)}_{11}+6X^{(1)}_{12}+7X^{(1)}_{13}+4X^{(1)}_{21}+5X^{(1)}_{22}+2X^{(1)}_{23}+1X^{(1)}_{31}+3X^{(1)}_{32}+4X^{(1)}_{33}+4X^{(1)}_{41}+2X^{(1)}_{42}+3X^{(1)}_{43}+10X^{(2)}_{11}+9X^{(2)}_{12}+9X^{(2)}_{13}+7X^{(2)}_{21}+9X^{(2)}_{22}+2X^{(2)}_{23}+8X^{(2)}_{31}+7X^{(2)}_{32}+9X^{(2)}_{33}+8X^{(2)}_{41}+4X^{(2)}_{42}+5X^{(2)}_{43}+8\lambda \le 291$

subject to (30). The optimal compromise solution of the problem is $\lambda = 0.6521$ with

$X^{(*)} = \{ X^{(1)}_{11}=5,\ X^{(1)}_{12}=2.2608,\ X^{(1)}_{13}=1.7391,\ X^{(1)}_{21}=8,\ X^{(1)}_{22}=3.7391,\ X^{(1)}_{23}=2.2608,\ X^{(1)}_{33}=6,\ X^{(1)}_{41}=1,\ X^{(1)}_{42}=6;\ X^{(2)}_{12}=4.7391,\ X^{(2)}_{13}=1.2608,\ X^{(2)}_{23}=6.7391,\ X^{(2)}_{31}=4,\ X^{(2)}_{32}=1,\ X^{(2)}_{41}=1,\ X^{(2)}_{42}=2,\ X^{(2)}_{43}=3 \}$

$Z_1^* = 309.3902$ and $Z_2^* = 283.4329$.

If we use the hyperbolic membership function, with

$\alpha_1 = \dfrac{6}{U_1 - L_1} = \dfrac{6}{330 - 300} = \dfrac{6}{30}$, $\ \dfrac{U_1 + L_1}{2} = \dfrac{630}{2} = 315$; $\quad \alpha_2 = \dfrac{6}{U_2 - L_2} = \dfrac{6}{291 - 283} = \dfrac{6}{8}$, $\ \dfrac{U_2 + L_2}{2} = \dfrac{574}{2} = 287$,

we get the membership functions $\mu^H_{Z_1}$ and $\mu^H_{Z_2}$ for the objectives $Z_1$ and $Z_2$ respectively, defined as follows:

$$\mu^H_{Z_1}(x) = \begin{cases} 1 & \text{if } Z_1(X) \le 300 \\ \frac{1}{2}\tanh\!\big(\tfrac{6}{30}\{315 - Z_1(X)\}\big) + \frac{1}{2} & \text{if } 300 < Z_1(X) < 330 \\ 0 & \text{if } Z_1(X) \ge 330 \end{cases}$$

$$\mu^H_{Z_2}(x) = \begin{cases} 1 & \text{if } Z_2(X) \le 283 \\ \frac{1}{2}\tanh\!\big(\tfrac{6}{8}\{287 - Z_2(X)\}\big) + \frac{1}{2} & \text{if } 283 < Z_2(X) < 291 \\ 0 & \text{if } Z_2(X) \ge 291 \end{cases}$$

We get an equivalent crisp model:

Maximize $X_{mn+1}$ subject to $\alpha_1 Z_1(X) + X_{mn+1} \le \frac{\alpha_1}{2}(U_1 + L_1)$ and $\alpha_2 Z_2(X) + X_{mn+1} \le \frac{\alpha_2}{2}(U_2 + L_2)$; multiplied through, these become

$24X^{(1)}_{11}+18X^{(1)}_{12}+30X^{(1)}_{13}+48X^{(1)}_{21}+36X^{(1)}_{22}+12X^{(1)}_{23}+42X^{(1)}_{31}+24X^{(1)}_{32}+6X^{(1)}_{33}+54X^{(1)}_{41}+60X^{(1)}_{42}+72X^{(1)}_{43}+48X^{(2)}_{11}+36X^{(2)}_{12}+18X^{(2)}_{13}+30X^{(2)}_{21}+24X^{(2)}_{22}+6X^{(2)}_{23}+54X^{(2)}_{31}+12X^{(2)}_{32}+36X^{(2)}_{33}+24X^{(2)}_{41}+54X^{(2)}_{42}+18X^{(2)}_{43}+30X_{mn+1} \le 1890$

$30X^{(1)}_{11}+36X^{(1)}_{12}+42X^{(1)}_{13}+24X^{(1)}_{21}+30X^{(1)}_{22}+12X^{(1)}_{23}+6X^{(1)}_{31}+18X^{(1)}_{32}+24X^{(1)}_{33}+24X^{(1)}_{41}+12X^{(1)}_{42}+18X^{(1)}_{43}+60X^{(2)}_{11}+54X^{(2)}_{12}+54X^{(2)}_{13}+42X^{(2)}_{21}+54X^{(2)}_{22}+12X^{(2)}_{23}+48X^{(2)}_{31}+42X^{(2)}_{32}+54X^{(2)}_{33}+48X^{(2)}_{41}+24X^{(2)}_{42}+30X^{(2)}_{43}+8X_{mn+1} \le 1746$

subject to (30). The problem was solved using the Linear INteractive and Discrete Optimization (LINDO) software; the optimal compromise solution is $X_{mn+1} = 1.9608$, $\lambda = 0.9804$, with

$X^{(*)} = \{ X^{(1)}_{11}=5,\ X^{(1)}_{12}=3.1304,\ X^{(1)}_{21}=8,\ X^{(1)}_{22}=2.8695,\ X^{(1)}_{23}=3.1304,\ X^{(1)}_{33}=6,\ X^{(1)}_{41}=1,\ X^{(1)}_{42}=6;\ X^{(2)}_{12}=3.8695,\ X^{(2)}_{13}=2.1304,\ X^{(2)}_{22}=1.1304,\ X^{(2)}_{23}=5.8695,\ X^{(2)}_{31}=4,\ X^{(2)}_{32}=1,\ X^{(2)}_{41}=1,\ X^{(2)}_{42}=2,\ X^{(2)}_{43}=3 \}$

$Z_1^* = 300.8683$ and $Z_2^* = 282.3024$.

Using the exponential membership function (with S = 1):

$$\mu^E_{Z_1}(x) = \begin{cases} 1 & \text{if } Z_1 \le 300 \\ \dfrac{e^{-\Psi_1(X)} - e^{-1}}{1 - e^{-1}} & \text{if } 300 < Z_1 < 330 \\ 0 & \text{if } Z_1 \ge 330 \end{cases} \qquad \mu^E_{Z_2}(x) = \begin{cases} 1 & \text{if } Z_2 \le 283 \\ \dfrac{e^{-\Psi_2(X)} - e^{-1}}{1 - e^{-1}} & \text{if } 283 < Z_2 < 291 \\ 0 & \text{if } Z_2 \ge 291 \end{cases}$$

where

$$\Psi_1(X) = \frac{Z_1 - L_1}{U_1 - L_1} = \frac{Z_1 - 300}{30} \quad \text{and} \quad \Psi_2(X) = \frac{Z_2 - L_2}{U_2 - L_2} = \frac{Z_2 - 283}{8}$$

Then an equivalent crisp model can be formulated as

Maximize λ subject to $\lambda \le \dfrac{e^{-S\Psi_p(x)} - e^{-S}}{1 - e^{-S}}$, $p = 1,2$, $\lambda \ge 0$, and (30),

i.e. $e^{-S\Psi_p(X)} - (1 - e^{-S})\lambda \ge e^{-S}$. With S = 1 this gives

$e^{-\Psi_1(X)} - (0.6321)\lambda \ge 0.368$ and $e^{-\Psi_2(X)} - (0.6321)\lambda \ge 0.368$.

The problem was solved with the general interactive optimization (LINGO) software, giving $\lambda = 0.7084$ and

$X^{(*)} = \{ X^{(1)}_{11}=5,\ X^{(1)}_{12}=2.3703,\ X^{(1)}_{13}=1.6296,\ X^{(1)}_{21}=8,\ X^{(1)}_{22}=4,\ X^{(1)}_{23}=2,\ X^{(1)}_{33}=6,\ X^{(1)}_{41}=1,\ X^{(1)}_{42}=5.6296,\ X^{(1)}_{43}=0.3703;\ X^{(2)}_{12}=4.6296,\ X^{(2)}_{13}=1.3703,\ X^{(2)}_{31}=4,\ X^{(2)}_{32}=1,\ X^{(2)}_{41}=1,\ X^{(2)}_{42}=2.3703,\ X^{(2)}_{43}=2.6296 \}$

$Z_1^* = 306.1085$ and $Z_2^* = 270.6274$.

Conclusion In this paper multi-objective multi-index transportation problem is defined and problem is solved by using fuzzy programming technique (Linear, Hyperbolic and Exponential membership function). The multi-index transportation problem can represent different modes of origins and destination or it may represent a set of intermediate warehouse. If we use the hyperbolic membership function, then the crisp model becomes linear. The optimal compromise solution of hyperbolic membership function changes significantly if we compare with the solution obtained by the linear membership function but the optimal compromise solution of exponential membership function does not change significantly if we compare with the solution obtained by the linear membership function. References [1] Aneja V.P. and Nair K.P.K. (1979) Management Science, 25, 73-78. [2] Bellman R. E. and Zadeh L. A. (1970) Management science, 17, 141-164. [3] Bit A. K., Biswal M.P. and Alam S. S. (1993) Industrial Engineering Journal XXII, No. 6, 8-12. [4] Chanas S., Kolodzejczyk W. and Machaj A. (1984) Fuzzy set and systems, 13, 211-221. [5] Gwo-Hshiung Tzeng, Dusan Teodorovic and Ming-Jiu Hwang (1996) European Journal of Operations Research, 95, 62-72.

Advances in Information Mining, ISSN: 0975–3265, Volume 2, Issue 1, 2010


Lohgaonkar MH, Bajaj VH, Jadhav VA and Patwari MB

[6] [7] [8] [9] [10] [11]

[12] [13] [14] [15]

Haley K. B. (1963) Operations Research 10, 448-463. Haley K. B. (1963) Operations Research 11, 369-379. Haley K. B. (1965) Operations Research 16, 471-474. Junginger W. (1993) European Journal of Operational Research 66, 353-371. Oheigeartaigh M. (1982) fuzzy sets and systems, 8 , 235-243. Prade H. (1980) Fuzzy sets. Theory and applications to policy analysis and information Systems. Plenum Press, new work, 155-170. Rautman C.A. Reid R.A. and Ryder E.E. (1993) Operations Research 41, 459469. Verma Rakesh, Biswal M.P. and Biswas A. (1997) Fuzzy sets and systems 91, 37-43. Waiel F. and Abd El- Wahed (2001) fuzzy sets and systems, 117, 26-33. Zimmermann H. J. (1978) fuzzy set and system 1, 45-55.

Copyright © 2010, Bioinfo Publications, Advances in Information Mining, ISSN: 0975–3265, Volume 2, Issue 1, 2010

7


Advances in Information Mining, ISSN: 0975–3265, Volume 2, Issue 1, 2010, pp-18-22

Data mining - A Mathematical Realization and cryptic application using variable key
Chakrabarti P.
Sir Padampat Singhania University, Udaipur-313601, Rajasthan, India, prasun9999@rediffmail.com
Abstract- In this paper we depict various mathematical models based on the themes of data mining. The numerical representations of regression and linear models are explained. We also show the prediction of a datum in the light of statistical approaches, namely the probabilistic approach, data estimation and dispersion theory. The paper also deals with the efficient generation of shared keys required for direct communication among co-processors without the active participation of a server; minimization of time complexity, proper utilization of resources and an environment for parallel computing can thus be achieved with higher throughput in a secured fashion. The techniques involved are cryptic methods based on support analysis, the confidence rule, resource mining, sequence mining and feature extraction. A new approach towards realizing the variability concept of the key in the Wide-Mouth Frog, Yahalom and SKEY protocols is depicted in this context.
Keywords- data mining, regression, dispersion theory, sequence mining, variable key

Regression based data-mining techniques
A. Concept
We point out the scenario where the dependency of a datum at time instant t1 on another at t2 can be computed. If we take d1 as the datum at t1 and d2 as the datum at t2, we can write

d2 = a + b d1   (1)

where a, b are constants. Data prediction here is based on the linear regression model.
B. Linear representation
As per statistical prediction, let the predicted value of a datum d be Δ1 and its original value Δ2. As per the data-mining regression model, we denote Δi = d2,i − (a + b d1,i) as the error in taking a + b d1,i for d2,i; this is known as the error of estimation.
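To make Eq. (1) and the error of estimation concrete, here is a minimal Python sketch; the helper name fit_line and the sample data are illustrative assumptions, not from the paper:

```python
# A minimal sketch of the regression-based prediction above.
def fit_line(d1, d2):
    """Least-squares estimates of a, b in d2 = a + b*d1 (Eq. 1)."""
    n = len(d1)
    mean1, mean2 = sum(d1) / n, sum(d2) / n
    b = sum((x - mean1) * (y - mean2) for x, y in zip(d1, d2)) / \
        sum((x - mean1) ** 2 for x in d1)
    a = mean2 - b * mean1
    return a, b

d1 = [1.0, 2.0, 3.0, 4.0]          # datum values at time t1 (invented)
d2 = [2.1, 3.9, 6.2, 7.8]          # datum values at time t2 (invented)
a, b = fit_line(d1, d2)
# Error of estimation for each pair: delta_i = d2_i - (a + b*d1_i)
errors = [y - (a + b * x) for x, y in zip(d1, d2)]
print(a, b, errors)
```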

Prediction based on probabilistic approach
Suppose the observed data k1, k2, ..., km have respective probabilities p1, p2, ..., pm. When Σ_{i=1}^{m} pi = 1, then E(k) = Σ_{i=1}^{m} ki pi (2), provided it is finite. Here we use a bivariate probability based on K = (k1, k2, ..., km), the set of observed data, and Q = (q1, q2, ..., qn), the set of predicted values (1 < m < n).
Theorem 1. If the observed data set value and the predicted data set value are two jointly distributed random variables, then E(K + Q) = E(K) + E(Q).
Proof: Let K assume values k1, ..., km, let Q assume values q1, ..., qm, and let P(K = ki, Q = qj) = pij. Then
E(K + Q) = Σ_i Σ_j (ki + qj) pij = Σ_i Σ_j ki pij + Σ_i Σ_j qj pij = Σ_i ki Σ_j pij + Σ_j qj Σ_i pij = E(K) + E(Q)   (3)
Similarly, E(K · Q) = E(K) · E(Q)   (4), which holds when K and Q are independent.

Prediction based on datum estimation
Let the data space be (k1, k2, ..., kn), and let the distribution function f1(k) of the random variable k involve a parameter θ whose value is unknown; we have to estimate θ on the basis of the observed data space (k1, k2, ..., km), where m < n. We select θ̂ = f2(k1, k2, ..., km); it is basically a number and it is taken as the estimate of θ. Hence θ̂ is an estimator of θ, and the value of θ̂ obtained from the observed data space is an estimate of θ. |θ̂ − θ| should be negligible for successful prediction of a datum. Now we can state the datum assumption criteria as follows:
E(θ̂) = θ for the true value of θ   (5)
Var(θ̂) ≤ Var(Ψ)   (6)
for the true value of θ, Ψ being any other estimator satisfying equation (5). Hence data prediction has been pointed out on the basis of the property of unbiasedness (equation (5)) and the property of minimum variance (equation (6)).

Prediction based on dispersion theory and pattern analysis
The values of the data for different sessions are not all equal. In some cases the values are close to one another, while in other cases they deviate widely from one another. In order to get a proper idea about the overall nature of a given set of values it is necessary to know, besides the average, the extent to which the data differ


among themselves or, equivalently, how they are scattered about the average. Let the values k1, k2, ..., km be the obtained data and c the average of the original values km+1, km+2, ..., kn. The mean deviation of k about c is given by

MD_c = (1/(n−m)) Σ_{i=1}^{n−m} |ki − c|   (7)

In particular, when c = k̄, the mean deviation about the mean is given by

MD_k̄ = (1/(n−m)) Σ_{i=1}^{n−m} |ki − k̄|   (8)

B. Pattern matching
We want to study the trend analysis of future events based on prediction using previously observed data. If the event delivers some numerical data estimation, we can predict it in certain forms. We assume dp to be the predicted datum and do the observed datum. If dp and do are linearly related, then dp = a + b do (9). If they are exponentially related, the equation takes the form dp = a b^do (10). If a logarithmic-transformation-based prediction rule is observed, the equation becomes Dp = A + B do (11), where Dp = log dp, A = log a and B = log b. In the case of data merging towards obtaining meaningful information, the convention rule is di => d((i+k) mod n), where di ∈ D, k is the offset value and n is the number of sensed data elements, i.e. the number of elements of the set D. The value of k varies from stage to stage.
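A short Python sketch of Eqs. (7)-(8) and of the data-merging convention; the sample values and helper names are assumptions for illustration:

```python
# Mean deviation about a value c (Eq. 7); with c = mean, this is Eq. 8.
def mean_deviation(values, c):
    return sum(abs(k - c) for k in values) / len(values)

obtained = [4.0, 5.5, 5.0, 6.5, 4.5]     # invented session data
center = 5.0                             # average c of the original values
print(mean_deviation(obtained, center))  # about c (Eq. 7)
mean = sum(obtained) / len(obtained)
print(mean_deviation(obtained, mean))    # about the mean (Eq. 8)

# Data-merging convention d_i -> d_(i+k) mod n, with offset k:
D = ["d0", "d1", "d2", "d3"]
k = 2
print([D[(i + k) % len(D)] for i in range(len(D))])
```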

Communication based on support
A. Scheme
A and B are two parties. K1, ..., K6 are keys known only to A and B. A sends messages m1, ..., m6 in encrypted form with the help of one or more keys. A third party would have to decipher each message by trial and error and form sets. The key having maximum support is the shared key between A and B; if more than one key qualifies, one is primary while the others are candidates. The aim is to choose the shared key in such a way that the third party will not be able to decipher the messages.
B. Mathematical Analysis
Message   Encrypted key
m1        ek1 = f(k1,k3,k4,k6) = k1^k3^k4^k6
m2        ek2 = f(k3,k5) = k3^k5
m3        ek3 = f(k4,k5,k6) = k4^k5^k6
m4        ek4 = f(k2,k3,k5) = k2^k3^k5
m5        ek5 = f(k1,k2) = k1^k2
m6        ek6 = f(k1,k2,k3,k6) = k1^k2^k3^k6
It is seen that k3 is supported by 4 out of the 6 key sets, a support of 66.6%; hence the shared key of A and B is k3. If a hacker obtains k1, ..., k6, then by applying trial and error it will get the shared key. So the concept of an automatic variable shared key is proposed: shared key = (key having maximum support) xor (xor of the values of the messages where the support is not available). Here k3 is the key having maximum support and m3, m5 are the messages encrypted without k3; therefore shared key = k3^m3^m5. This scheme cannot be revealed to the hacker, who will attack k3 instead of the modified value of the shared key.

Communication based on confidence rule
A. Scheme
Input: m1, ..., m6 to A; K1, ..., K6 to A and B.
Step 1: A encrypts each of the messages with a combination of the keys and sends them to B.
Step 2: B finds the key pair with a confidence level of 100%, i.e. key1 => key2: whenever key1 exists, key2 also exists.
Step 3: The shared key is key1.
Step 4 (applied only for enhancing the security level): shared key = key1 XOR key-new, where key-new is chosen so that the confidence of key-new => key1 is minimum.
B. Mathematical Analysis
With the same key sets as above (Sk1 = (k1,k3,k4,k6), Sk2 = (k3,k5), Sk3 = (k4,k5,k6), Sk4 = (k2,k3,k5), Sk5 = (k1,k2), Sk6 = (k1,k2,k3,k6)), only k4 => k6 has a confidence level of 100%, so the shared key is k4 (up to Step 3).
Association   Probability
k1 => k4      1/3
k2 => k4      0
k3 => k4      1/4
k5 => k4      1/2
k6 => k4      2/3
So key-new = k2, since it has the least probability; hence shared key = k4 XOR k2.
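A sketch of the support- and confidence-based key selection above; the key sets follow the paper's table, while the numeric XOR stand-ins at the end are invented, since the actual message values are not given:

```python
keysets = {
    "m1": {"k1", "k3", "k4", "k6"}, "m2": {"k3", "k5"},
    "m3": {"k4", "k5", "k6"},       "m4": {"k2", "k3", "k5"},
    "m5": {"k1", "k2"},             "m6": {"k1", "k2", "k3", "k6"},
}
all_keys = set().union(*keysets.values())

# Support of a key = fraction of message key-sets containing it.
support = {k: sum(k in s for s in keysets.values()) / len(keysets)
           for k in all_keys}
best = max(support, key=support.get)
print(best, support[best])        # k3, 0.666... (4 of 6 sets)

# Confidence of ka => kb = P(kb present | ka present).
def confidence(ka, kb):
    with_a = [s for s in keysets.values() if ka in s]
    return sum(kb in s for s in with_a) / len(with_a)
print(confidence("k4", "k6"))     # 1.0, as in the confidence-rule example

# Variable shared key: max-support key XORed with the messages encrypted
# without it (numeric stand-ins for k3, m3, m5).
k3, m3, m5 = 0b1010, 0b0110, 0b0011
print(bin(k3 ^ m3 ^ m5))
```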


Statistical approaches of resource mining
A. Based on prediction of the most frequent word
The most frequent key can be obtained from Max(f1, f2, ..., fn), where f1, f2, ..., fn are relative frequencies and n is the total number of keys.
B. Based on prediction of a variable within an interval
We can predict the value of a variable key if the interval can be measured properly; the same scheme can be applied in hacking analysis.


Theorem 2. If a variable key V changes over time in an exponential manner, then the value of the variable at the centre of an interval (a1, a2) is the geometric mean of its values at a1 and a2.
Proof: Let V_a = m n^a. Then V_a1 = m n^(a1) and V_a2 = m n^(a2). The value of V at (a1 + a2)/2 is
m n^((a1+a2)/2) = [m^2 n^(a1+a2)]^(1/2) = [(m n^(a1))(m n^(a2))]^(1/2) = (V_a1 V_a2)^(1/2).
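A quick numeric check of Theorem 2, under assumed constants m and n:

```python
# For V(a) = m * n**a, the value at the midpoint of (a1, a2) equals the
# geometric mean of V(a1) and V(a2).
import math

m, n, a1, a2 = 2.0, 1.5, 1.0, 4.0      # illustrative constants
V = lambda a: m * n ** a
print(V((a1 + a2) / 2), math.sqrt(V(a1) * V(a2)))   # both 2 * 1.5**2.5
```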

C. Based on prediction of interrelated variables
In a message there may be a variable which depends on another variable through some equation; in that case extraction can be made.
Theorem 3. If a variable m is related to another variable n in the form m = an, where a is a constant, then the harmonic mean of m is related to that of n by the same equation.
Proof: Let x be the number of given values. Then
m_HM = x / (Σ_{i=1}^{x} 1/mi) = x / (Σ 1/(a ni))   [since mi = a ni]
     = x / ((1/a) Σ 1/ni) = a (x / (Σ 1/ni)) = a n_HM.

Shared key generation in the light of sequence mining
Let us suppose that four users, U1, U2, U3 and U4, are in a network. Each of U1, U2, U3 transmits three messages to U4 in successive sessions.
Sender   Key      Operation
U1       110110   U1(m1) -> U4
U2       100101   U2(m1) -> U4
U3       001010   U3(m1) -> U4
U1       001100   U1(m2) -> U4
U2       000011   U2(m2) -> U4
U3       100001   U3(m2) -> U4
U1       111100   U1(m3) -> U4
U2       000001   U2(m3) -> U4
U3       110100   U3(m3) -> U4
A. Algorithm
Step 1: Designate each bit of the key as a character.
Step 2: If the character's bit value is 1, include it in the sequence.
Step 3: Else ignore the value.
Step 4: Identify the pattern that is decided by the communicating party and fetch the combination.
Step 5: The shared key for each user will be based on the combined result.
Step 6: Repeat steps 1 to 5 for the other users.
Step 7: The final shared key will be based on the shared key in combined form of U1/U2/U3 and the computation scheme.
B. Analysis
The bits can be denoted by A, B, C, D, E, F. Combined sequence of U1: (A,B,D,E) (C,D) (A,B,C,D).
Table 1 - Combined sequence for U1
Session   Msg no.   A B C D E F   Sequence
1         1         1 1 0 1 1 0   (A,B,D,E)
2         4         0 0 1 1 0 0   (C,D)
3         7         1 1 1 1 0 0   (A,B,C,D)

Combined sequence of U2: (A,D,F) (E,F) (F).
Table 2 - Combined sequence for U2
Session   Msg no.   A B C D E F   Sequence
1         2         1 0 0 1 0 1   (A,D,F)
2         5         0 0 0 0 1 1   (E,F)
3         8         0 0 0 0 0 1   (F)

Combined sequence of U3: (C,E) (A,F) (A,B,D).
Table 3 - Combined sequence for U3
Session   Msg no.   A B C D E F   Sequence
1         3         0 0 1 0 1 0   (C,E)
2         6         1 0 0 0 0 1   (A,F)
3         9         1 1 0 1 0 0   (A,B,D)

C. Method 1
Communicating parties: U1 and U4 (say). The sequence counts of AB and D are AB = 2 and D = 3, so x1 = 2 and x2 = 3. U1 computes ((A.M. of 2 and 3) × (H.M. of 2 and 3))^(1/2) and U4 computes the G.M. of 2 and 3; so the shared key is 6^(1/2). If any occurrence is null, that parameter value is treated as zero.
D. Method 2
Communicating parties: U3 and U4 (say). For U3 the union of the session sequences becomes C E A F B D, so the shared key of U3 and U4 is C E A F B D.
E. Method 3
Communicating parties: U2 and U4 (say). The shared key is based on the intersection of the session sequences, and it is F.

Key using feature based method
Let six messages be sent by the sender, each encrypted by a combination of one or more keys using some function.
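A sketch of the sequence extraction and of Methods 1-3, using the session keys from the tables above; the helper names are ours:

```python
import math

BITS = "ABCDEF"
def sequence(key):                      # steps 1-3: keep bits set to 1
    return {BITS[i] for i, b in enumerate(key) if b == "1"}

U1 = [sequence(k) for k in ("110110", "001100", "111100")]
U2 = [sequence(k) for k in ("100101", "000011", "000001")]
U3 = [sequence(k) for k in ("001010", "100001", "110100")]
print(U1)                               # {A,B,D,E}, {C,D}, {A,B,C,D}

# Method 1 (U1/U4): with x1 = 2, x2 = 3, sqrt(AM * HM) equals the G.M.
x1, x2 = 2, 3
am = (x1 + x2) / 2
hm = 2 * x1 * x2 / (x1 + x2)
print(math.sqrt(am * hm), math.sqrt(x1 * x2))   # both sqrt(6)

# Method 2 (U3/U4): union of the sequences; Method 3 (U2/U4): intersection.
print(set().union(*U3))                 # {C, E, A, F, B, D}
print(set.intersection(*U2))            # {F}
```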


Table 4 - Association of keys against each message
Message   Keys associated
M1        SK1 = (K1, K3, K4, K6)
M2        SK2 = (K3, K5)
M3        SK3 = (K4, K5, K6)
M4        SK4 = (K2, K3, K5)
M5        SK5 = (K1, K2)
M6        SK6 = (K1, K2, K3, K6)

Table 5 - Determination of count and value
Key   Initial value   Count   Value   (Value)^2
K1    0.1             3       0.3     0.09
K2    0.2             3       0.6     0.36
K3    0.3             4       1.2     1.44
K4    0.4             2       0.8     0.64
K5    0.5             3       1.5     2.25
K6    0.6             3       1.8     3.24

Now CF = (x, y, z), where x = number of elements, y = linear sum of the elements and z = sum of the squares of the elements:
CF1 = (4, 4.1, 5.41); CF2 = (2, 2.7, 3.69); CF3 = (3, 4.1, 6.13); CF4 = (3, 3.3, 4.05); CF5 = (2, 0.9, 0.45); CF6 = (4, 3.9, 5.13).
So CFnet = accumulation of the maximum of each tuple = (4, 4.1, 6.13), and the shared key = floor of the modulus of (4.1 − 6.13) = 2.
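A sketch reproducing Table 5 and the clustering-feature computation; the dictionary and function names are ours:

```python
import math

# value = initial value * count, as in Table 5.
value = {k: init * cnt for k, init, cnt in [
    ("K1", 0.1, 3), ("K2", 0.2, 3), ("K3", 0.3, 4),
    ("K4", 0.4, 2), ("K5", 0.5, 3), ("K6", 0.6, 3)]}

keysets = [("K1","K3","K4","K6"), ("K3","K5"), ("K4","K5","K6"),
           ("K2","K3","K5"), ("K1","K2"), ("K1","K2","K3","K6")]

def cf(keys):                        # CF = (count, linear sum, square sum)
    vals = [value[k] for k in keys]
    return (len(vals), round(sum(vals), 2), round(sum(v*v for v in vals), 2))

cfs = [cf(s) for s in keysets]
print(cfs[0])                                    # (4, 4.1, 5.41) = CF1
cf_net = tuple(max(t[i] for t in cfs) for i in range(3))
print(cf_net)                                    # (4, 4.1, 6.13)
print(math.floor(abs(cf_net[1] - cf_net[2])))    # shared key = 2
```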

Wide-Mouth Frog using variable key
Both Alice and Bob share a secret key with a trusted server, Trent. The keys are used only for key distribution and not to encrypt any actual messages between users. The proposed algorithm is as follows:
1) Alice concatenates a timestamp TA, Bob's name and a technique f to deduce a random session key based on the timestamp and Bob's name. She then encrypts the whole message with the key she shares with Trent and sends it to Trent along with her name. Alice sends: A, EKA(TA, B, f).
2) Trent decrypts the message. For enhanced security, he concatenates a new timestamp TB, Alice's name, the function f and the difference d between TB and TA. He then encrypts the whole message with the key he shares with Bob. Trent sends: EKB(TB, A, f, d). Hence f is an automatic variable based on TB and d.
3) Bob decrypts it, first verifies the sender's name, and computes TA from TA = TB − d.
4) Then he computes f based on TA and the binary form of the ASCII value of his name.
5) Thus he computes K, the session key with which he will communicate with Alice.
6) In the next iteration TA and KA will be changed, and hence f, and so on. The main advantage is that the key K is nowhere transmitted.

Yahalom protocol using variable key
Both Bob and Alice share a secret key with Trent. Let RA = nonce chosen by Alice; NB = number chosen by Bob based on RA and A; KA = shared key between Alice and Trent; KB = shared key between Bob and Trent; A = Alice's name; B = Bob's name; K = random session key.
1. Alice concatenates her name and a random number and sends them to Bob.
2. Bob computes NB = RA + (binary form of the ASCII value of Alice's name) and sends Trent B, EKB(A, RA, f), where f is the offset which, when applied to NB, yields RA.
3. Trent generates two messages for Alice: EKA(B, K', RA, f, d) and EKB(A, K', d), where the random session key K = f(K', d).
4. Alice decrypts the first message, extracts K using f(K', d), and sends Bob the two messages EKB(A, K', d) and EK(RA, f).
5. Bob decrypts A, K', d and extracts K as f(K', d) = K. He then extracts NB using f(RA, f) ≡ NB. It is to be remembered that the functions f(K', d) and f(RA, f) should be reversible. Bob then checks whether NB has the same value.
At the end, both Alice and Bob are convinced that they are talking to each other and not to a third party. The advantage is that neither NB nor K is transmitted; the demerit is the calculation of NB and K using the specified functions.

Analysis of SKEY using variable key
SKEY is mainly a program for authentication and it is based on a one-way function. The proposed algorithm is as follows:
1. The host computes a Bernoulli trial with a biased coin for which p = probability of a 1 and q = (1 − p) = probability of a 0. Let the number of trials be n; assume n = 6 and string = 110011.
2. The host sends the string to Alice.
3. Alice modifies her own public key: the new public key = previous key + (binary equivalent of the number of 1's present in the string).
4. Alice creates a shared key.


5. Alice modifies the public key along with the modification scheme with the shared key.
6. Alice then encrypts the string with her private key and sends it back to the host along with her name.
7. The host first decrypts the public key, fetches it accordingly from Alice's database entry and computes the result.
8. If a match is found, it performs another level of verification by decrypting the string with the new value of Alice's public key.
9. If that also matches, then the authentication of Alice is certified.

Conclusion
The techniques involved for data prediction in this paper are the regression rule, the probabilistic approach, datum estimation analysis and dispersion theory. We have also shown how pattern matching can be sensed. Several approaches to shared-key computation on the basis of data-mining techniques have been discussed in detail with the relevant mathematical analysis. The variable-key concept has also been applied in cryptic data mining through the Wide-Mouth Frog, Yahalom and SKEY protocols.



Advances in Information Mining, ISSN: 0975–3265, Volume 2, Issue 1, 2010, pp-01-07

Fuzzy multi-objective multi-index transportation problem
Lohgaonkar M.H.¹, Bajaj V.H.¹*, Jadhav V.A.¹ and Patwari M.B.²
¹Department of Statistics, Dr. B. A. M. University, Aurangabad, MS, vhbajaj@gmail.com, mhlohgaonkar@gmail.com
²Department of Statistics, Science College, Nanded, MS

Abstract- The aim of this paper is to present a fuzzy multi-objective multi-index transportation problem and to develop a multi-objective multi-index fuzzy programming model. This model can not only satisfy more of the actual requirements of the integral system but is also more flexible than conventional transportation problems. Furthermore, it can offer more information to the decision maker (DM) for reference, and can thus raise the quality of decision-making. In this paper we use special types of linear and non-linear membership functions to solve the multi-objective multi-index transportation problem; this gives an optimal compromise solution.
Keywords- Transportation problem, multi-objective transportation problem, multi-index, linear membership function, non-linear membership function
Introduction
Fuzzy set theory was proposed by L. A. Zadeh and has found extensive application in various fields. Bellman and Zadeh [2] were the first to consider the application of fuzzy set theory to optimization problems in a fuzzy environment; they proposed that both the objective function and the constraints in the model could be represented by corresponding fuzzy sets and treated in the same manner. The earliest applications of it to transportation problems include Prade [11], Oheigeartaigh [10] and Chanas et al. [4], but these researchers emphasized theory and algorithms, and the above investigations are illustrated with simple instances lacking actual cases of application. Moreover, these models are of a single objective only and are classical two-index transportation problems. In actual transportation problems multiple objective functions are generally considered, including the average delivery time of the commodities, minimum cost, etc. Zimmermann [15] applied fuzzy set theory to the linear multi-criteria decision-making problem; he used the linear fuzzy membership function, presented the application of the fuzzy linear vector maximum problem, and showed that solutions obtained by fuzzy linear programming always provide efficient solutions as well as an optimal compromise solution. Aneja and Nair [1] studied the bicriteria transportation problem model. Multi-index transportation problems are extensions of conventional transportation problems and are appropriate for solving transportation problems with multiple supply points and multiple demand points, as well as problems using diverse modes of transportation or delivering different kinds of merchandise; thus the problem considered here is more complicated than conventional transportation problems. Junginger [9], who proposed a set of logic problems to solve multi-index transportation problems, also conducted a detailed investigation of the characteristics of the multi-index transportation problem model.

Rautman et al. [12] used a multi-index transportation model to solve a shipping-scheduling problem and suggested that the employment of such a model can not only improve transportation efficiency but also optimize the integral system.
Mathematical Model
Multi-objective Multi-index Transportation Problem
Let a_ijl, 1 ≤ i ≤ m, 1 ≤ j ≤ n, 1 ≤ l ≤ k, be a multi-dimensional array and let A = (a_ij), B = (b_jl), C = (c_il) be multi-matrices; then the multi-index transportation problem is defined as follows:

Minimize Z = Σ_i Σ_j Σ_l a_ijl X_ijl   (1)
Subject to
Σ_l X_ijl = a_ij   ∀ (i,j)
Σ_j X_ijl = c_il   ∀ (i,l)
Σ_i X_ijl = b_jl   ∀ (j,l)
X_ijl ≥ 0   ∀ (i,j,l)   (2)

It is immediate that
Σ_i a_ij = Σ_l b_jl;   Σ_j a_ij = Σ_l c_il;   Σ_j b_jl = Σ_i c_il   (3)
These are three necessary conditions; however, they are not sufficient.
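As an illustration of the model (1)-(2), here is a sketch that solves a toy 2x2x2 instance with scipy's linprog; the marginals a_ij, c_il, b_jl below are assumptions built from a known feasible plan, so the program is guaranteed consistent:

```python
import numpy as np
from scipy.optimize import linprog

m = n = k = 2
cost = np.arange(1.0, 9.0)                            # a_ijl flattened (invented)
a_ij = {(0, 0): 3, (0, 1): 3, (1, 0): 1, (1, 1): 4}   # sum over l
c_il = {(0, 0): 4, (0, 1): 2, (1, 0): 2, (1, 1): 3}   # sum over j
b_jl = {(0, 0): 1, (0, 1): 3, (1, 0): 5, (1, 1): 2}   # sum over i

def var(i, j, l):                                     # flatten (i, j, l)
    return (i * n + j) * k + l

A_eq, b_eq = [], []
for (i, j), v in a_ij.items():                        # sum_l X_ijl = a_ij
    row = np.zeros(m * n * k); row[[var(i, j, l) for l in range(k)]] = 1
    A_eq.append(row); b_eq.append(v)
for (i, l), v in c_il.items():                        # sum_j X_ijl = c_il
    row = np.zeros(m * n * k); row[[var(i, j, l) for j in range(n)]] = 1
    A_eq.append(row); b_eq.append(v)
for (j, l), v in b_jl.items():                        # sum_i X_ijl = b_jl
    row = np.zeros(m * n * k); row[[var(i, j, l) for i in range(m)]] = 1
    A_eq.append(row); b_eq.append(v)

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.fun, res.x.round(3))
```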

The multi-objective double transportation problem is formulated as follows:

Minimize Z_p = Σ_{i=1}^{m} Σ_{j=1}^{n} k(1)_ij x(1)_ij + Σ_{i=1}^{m} Σ_{j=1}^{n} k(2)_ij x(2)_ij   (4)
Subject to
Σ_{j=1}^{n} x(1)_ij = a_1i   ∀ i   (5)
Σ_{j=1}^{n} x(2)_ij = a_2i   ∀ i   (6)
Σ_{i=1}^{m} x(1)_ij = b_1j   ∀ j   (7)
Σ_{i=1}^{m} x(2)_ij = b_2j   ∀ j   (8)
x(1)_ij + x(2)_ij = c_ij   ∀ i,j   (9)
x(1)_ij, x(2)_ij ≥ 0   ∀ i,j   (10)

It may easily be seen that for the existence of a solution the following set of conditions is necessary:
Σ_{i=1}^{m} a_1i = Σ_{j=1}^{n} b_1j   (11)
Σ_{i=1}^{m} a_2i = Σ_{j=1}^{n} b_2j   (12)
Σ_{j=1}^{n} c_ij = a_1i + a_2i   ∀ i   (13)
Σ_{i=1}^{m} c_ij = b_1j + b_2j   ∀ j   (14)
Σ c_ij ≤ Min(a_1i + b_1j) + Min(a_2i + b_2j)   ∀ (i,j)   (15)

It may easily be seen that the DTP is composed of two transportation tables and one C matrix: T1 = (k(1)_ij) with row totals a_1i and column totals b_1j, T2 = (k(2)_ij) with row totals a_2i and column totals b_2j, and C = (c_ij)_{m×n}.   (16)

Fuzzy Algorithm to solve the multi-objective multi-index transportation problem
Step 1: Solve the multi-objective multi-index transportation problem as a single-objective transportation problem P times, taking one of the objectives at a time.
Step 2: From the results of Step 1, determine the corresponding value of every objective at each solution derived. From each solution and the value of every objective we can form the pay-off matrix

         Z_1(X)   Z_2(X)   ...   Z_p(X)
X^(1)    Z_11     Z_12     ...   Z_1p
X^(2)    Z_21     Z_22     ...   Z_2p
...      ...      ...      ...   ...
X^(p)    Z_p1     Z_p2     ...   Z_pp   (17)

where X^(1), X^(2), ..., X^(p) are the isolated optimal solutions of the P different transportation problems for the P different objective functions, and Z_ij = Z_j(X^(i)) (i = 1,...,p; j = 1,...,p) is the element in the i-th row and j-th column of the pay-off matrix.
Step 3: From Step 2, find for each objective the worst (U_p) and best (L_p) values corresponding to the set of solutions:
U_p = max(Z_1p, Z_2p, ..., Z_pp) and L_p = Z_pp,  p = 1,2,...,P   (18)
An initial fuzzy model of the problem (4)-(10) can be stated as: find X_ij so as to satisfy Z_p ≲ L_p (p = 1,2,...,P), subject to (4)-(10).   (19)
Step 4: Define the membership function for the p-th objective function as follows:
μ_p(X) = 1 if Z_p(X) ≤ L_p;  (U_p − Z_p(X))/(U_p − L_p) if L_p < Z_p < U_p;  0 if Z_p ≥ U_p   (20)
Step 5: Find an equivalent crisp model by using a linear membership function for the initial fuzzy model:
Maximize λ, subject to λ ≤ (U_p − Z_p(X))/(U_p − L_p) and (5)-(10)   (21)
Step 6: Solve the crisp model by an appropriate mathematical programming algorithm:
Maximize λ   (22)
subject to Z_p(X) + λ(U_p − L_p) ≤ U_p, p = 1,2,...,P, and (5)-(10).
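A sketch of Steps 5-6 as a linear program; it assumes the transportation constraints (5)-(10) are already assembled into A_eq, b_eq, and that costs holds the flattened objective coefficient vectors. In Example 1 below, U = (330, 291) and L = (300, 283):

```python
import numpy as np
from scipy.optimize import linprog

def compromise(costs, U, L, A_eq, b_eq):
    """Maximize lambda s.t. Z_p(X) + lambda*(U_p - L_p) <= U_p (Eq. 22)."""
    nvar = A_eq.shape[1]
    # Decision variables: (X, lambda); linprog minimizes, so use -lambda.
    obj = np.zeros(nvar + 1); obj[-1] = -1.0
    A_ub = [np.append(cp, Up - Lp) for cp, Up, Lp in zip(costs, U, L)]
    b_ub = list(U)
    A = np.hstack([A_eq, np.zeros((A_eq.shape[0], 1))])   # lambda unused in (5)-(10)
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A, b_eq=b_eq,
                  bounds=[(0, None)] * nvar + [(0, 1)])
    return res.x[-1], res.x[:-1]       # (lambda, compromise plan X)
```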

1   (U +L ) (U +L ) p p  { 2 -Zp(x)}αp -{ p2 p -Zp(x)}αp -e 1 1 e H µ Zp (x)= + (U +L ) (U +L ) 2 { p p -Zp(x)}αp -{ p p -Zp(x)}αp 2 2 +e  e 2 0  

if Zp ≤ Lp

U p -L p

S u b jec t X

U p -L p

X

(U +L ) (U +L ) { p p -Zp (x)}αp -{ p p -Zp (x)}αp 2 2 1 e -e 1 (24) λ≤ + (U +L ) (U +L ) p p p p 2 { -Zp (x)}αp -{ -Zp (x)}αp 2 2 2 e +e

λ≥0

&

Solve the crisp model as

X X X X X X

Maximize Xmn+1

(25)

subject to αp Zp (x) + Xmn+1 ≤ αp (Up + Lp ) /2 ,

p = 1,2,-----P

X X X

subject to (5)-(10) and Xmn+1 ≥ 0 -1 Where, X mn+1 =tanh (2λ-1)

X

Now, by using exponential membership function for the p th objective function and is defined as

X

1, if Z p ≤ L p   -SΨp (X) -S e -e (26) E µ Z p (x)=  , if L p < Z p < U p -S 1-e  0, if Z p ≥ U p   Zp -L P Where, Ψ p (X)= p=1,2,...,P U p -Lp

X

S is a non zero parameter, prescribed by the decision maker
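The three membership shapes, written as plain functions for checking values; a sketch assuming the interior branch (the 0/1 limiting cases of (20), (23), (26) apply outside the range), with (L, U) = (300, 330) from Example 1:

```python
import math

def mu_linear(z, L, U):                       # Eq. (20)
    return 1.0 if z <= L else 0.0 if z >= U else (U - z) / (U - L)

def mu_hyperbolic(z, L, U):                   # Eq. (23), alpha = 6/(U-L)
    alpha = 6.0 / (U - L)
    return 0.5 * math.tanh(((U + L) / 2.0 - z) * alpha) + 0.5

def mu_exponential(z, L, U, S=1.0):           # Eq. (26), interior branch
    psi = (z - L) / (U - L)
    return (math.exp(-S * psi) - math.exp(-S)) / (1.0 - math.exp(-S))

for f in (mu_linear, mu_hyperbolic, mu_exponential):
    print(f.__name__, round(f(315.0, 300.0, 330.0), 4))
```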


Numerical Examples
Example 1. The two transportation tables for the first objective, with supplies a_1i, a_2i, demands b_1j, b_2j and the matrix C (recovered from the model data), are:

T1 = k(1)_ij (supplies right, demands below):    T2 = k(2)_ij:
  4  3  5  | 9                                     8  6  3  | 6
  8  6  2  | 14                                    5  4  1  | 7
  7  4  1  | 6                                     9  2  6  | 5
  9 10 12  | 7                                     4  9  3  | 6
 14 12 10                                          5  8 11          (27)

C = (c_ij):
  5  7  3
  8  4  9
  4  1  6
  2  8  3          (28)

For the second objective the cost tables are:
C1 = k(1)_ij:        C2 = k(2)_ij:
  5  6  7             10  9  9
  4  5  2              7  9  2
  1  3  4              8  7  9
  4  2  3              8  4  5          (29)

Example 1 is simplified as
Minimize Z1 = 4X(1)11 + 3X(1)12 + 5X(1)13 + 8X(1)21 + 6X(1)22 + 2X(1)23 + 7X(1)31 + 4X(1)32 + X(1)33 + 9X(1)41 + 10X(1)42 + 12X(1)43 + 8X(2)11 + 6X(2)12 + 3X(2)13 + 5X(2)21 + 4X(2)22 + X(2)23 + 9X(2)31 + 2X(2)32 + 6X(2)33 + 4X(2)41 + 9X(2)42 + 3X(2)43
Subject to
X(1)11 + X(1)12 + X(1)13 = 9;  X(1)21 + X(1)22 + X(1)23 = 14;  X(1)31 + X(1)32 + X(1)33 = 6;  X(1)41 + X(1)42 + X(1)43 = 7
X(2)11 + X(2)12 + X(2)13 = 6;  X(2)21 + X(2)22 + X(2)23 = 7;  X(2)31 + X(2)32 + X(2)33 = 5;  X(2)41 + X(2)42 + X(2)43 = 6
X(1)11 + X(1)21 + X(1)31 + X(1)41 = 14;  X(1)12 + X(1)22 + X(1)32 + X(1)42 = 12;  X(1)13 + X(1)23 + X(1)33 + X(1)43 = 10
X(2)11 + X(2)21 + X(2)31 + X(2)41 = 5;  X(2)12 + X(2)22 + X(2)32 + X(2)42 = 8;  X(2)13 + X(2)23 + X(2)33 + X(2)43 = 11
X(1)11 + X(2)11 = 5;  X(1)12 + X(2)12 = 7;  X(1)13 + X(2)13 = 3;  X(1)21 + X(2)21 = 8;  X(1)22 + X(2)22 = 4;  X(1)23 + X(2)23 = 9;  X(1)31 + X(2)31 = 4;  X(1)32 + X(2)32 = 1;  X(1)33 + X(2)33 = 6;  X(1)41 + X(2)41 = 2;  X(1)42 + X(2)42 = 8;  X(1)43 + X(2)43 = 3   (30)

Example 2 is simplified as
Minimize Z2 = 5X(1)11 + 6X(1)12 + 7X(1)13 + 4X(1)21 + 5X(1)22 + 2X(1)23 + 1X(1)31 + 3X(1)32 + 4X(1)33 + 4X(1)41 + 2X(1)42 + 3X(1)43 + 10X(2)11 + 9X(2)12 + 9X(2)13 + 7X(2)21 + 9X(2)22 + 2X(2)23 + 8X(2)31 + 7X(2)32 + 9X(2)33 + 8X(2)41 + 4X(2)42 + 5X(2)43   (31)
subject to the constraints (30).

For objective Z1 we find the optimal solution as
X^(1) = { X(1)11=5, X(1)12=4, X(1)21=8, X(1)22=2, X(1)23=4, X(1)33=6, X(1)41=1, X(1)42=6; X(2)12=3, X(2)13=3, X(2)22=2, X(2)23=5, X(2)31=4, X(2)32=1, X(2)41=1, X(2)42=2, X(2)43=3 }
with Z1 = 300.
For objective Z2 we find the optimal solution as
X^(2) = { X(1)11=4, X(1)12=5, X(1)21=8, X(1)22=4, X(1)23=2, X(1)31=1, X(1)33=5, X(1)41=1, X(1)42=3, X(1)43=3; X(2)11=1, X(2)12=2, X(2)13=3, X(2)23=7, X(2)31=3, X(2)32=1, X(2)33=1, X(2)41=1, X(2)42=5 }
with Z2 = 283.
Now for X^(1) we can find out Z2(X^(1)) = 291, and for X^(2) we can find out Z1(X^(2)) = 330.
The pay-off matrix is

         Z1    Z2
X^(1)    300   291
X^(2)    330   283

From this matrix, U1 = 330, L1 = 300, U2 = 291, L2 = 283.
Find X_ij, i = 1,...,4, j = 1,2,3, so as to satisfy Z1 ≲ 300 and Z2 ≲ 283.   (32)
Define membership functions for the objective functions Z1(X) and Z2(X) respectively:

μ1(X) = 1 if Z1(X) ≤ 300;  (330 − Z1(X))/30 if 300 < Z1(X) < 330;  0 if Z1(X) ≥ 330
μ2(X) = 1 if Z2(X) ≤ 283;  (291 − Z2(X))/8 if 283 < Z2(X) < 291;  0 if Z2(X) ≥ 291
Then we get the equivalent crisp model:
Maximize λ
subject to 30λ + Z1(X) ≤ 330, 8λ + Z2(X) ≤ 291 and the constraints (30).
Solving the crisp model by an appropriate mathematical algorithm, the optimal compromise solution of the problem is λ = 0.6521,
X^(*) = { X(1)11=5, X(1)12=2.2608, X(1)13=1.7391, X(1)21=8, X(1)22=3.7391, X(1)23=2.2608, X(1)33=6, X(1)41=1, X(1)42=6; X(2)12=4.7391, X(2)13=1.2608, X(2)23=6.7391, X(2)31=4, X(2)32=1, X(2)41=1, X(2)42=2, X(2)43=3 }
with Z1* = 309.3902 and Z2* = 283.4329.

If we use the hyperbolic membership function, then with
α1 = 6/(U1 − L1) = 6/(330 − 300) = 6/30, (U1 + L1)/2 = 630/2 = 315,
α2 = 6/(U2 − L2) = 6/(291 − 283) = 6/8, (U2 + L2)/2 = 574/2 = 287,
the membership functions μ^H(Z1) and μ^H(Z2) for the objectives Z1 and Z2 are defined as
μ^H_Z1(x) = 1 if Z1(X) ≤ 300;  (1/2) tanh( (315 − Z1(X)) (6/30) ) + 1/2 if 300 < Z1(X) < 330;  0 if Z1(X) ≥ 330
μ^H_Z2(x) = 1 if Z2(X) ≤ 283;  (1/2) tanh( (287 − Z2(X)) (6/8) ) + 1/2 if 283 < Z2(X) < 291;  0 if Z2(X) ≥ 291
We get the equivalent crisp model:
Maximize X_mn+1
subject to
α1 Z1(X) + X_mn+1 ≤ α1 (U1 + L1)/2, i.e. 24X(1)11 + 18X(1)12 + 30X(1)13 + 48X(1)21 + 36X(1)22 + 12X(1)23 + 42X(1)31 + 24X(1)32 + 6X(1)33 + 54X(1)41 + 60X(1)42 + 72X(1)43 + 48X(2)11 + 36X(2)12 + 18X(2)13 + 30X(2)21 + 24X(2)22 + 6X(2)23 + 54X(2)31 + 12X(2)32 + 36X(2)33 + 24X(2)41 + 54X(2)42 + 18X(2)43 + 30X_mn+1 ≤ 1890,
α2 Z2(X) + X_mn+1 ≤ α2 (U2 + L2)/2, i.e. 30X(1)11 + 36X(1)12 + 42X(1)13 + 24X(1)21 + 30X(1)22 + 12X(1)23 + 6X(1)31 + 18X(1)32 + 24X(1)33 + 24X(1)41 + 12X(1)42 + 18X(1)43 + 60X(2)11 + 54X(2)12 + 54X(2)13 + 42X(2)21 + 54X(2)22 + 12X(2)23 + 48X(2)31 + 42X(2)32 + 54X(2)33 + 48X(2)41 + 24X(2)42 + 30X(2)43 + 8X_mn+1 ≤ 1746,
and the constraints (30).
The problem was solved using the linear interactive and discrete optimization (LINDO) software; the optimal compromise solution is X_mn+1 = 1.9608, λ = 0.9804,
X^(*) = { X(1)11=5, X(1)12=3.1304, X(1)21=8, X(1)22=2.8695, X(1)23=3.1304, X(1)33=6, X(1)41=1, X(1)42=6; X(2)12=3.8695, X(2)13=2.1304, X(2)22=1.1304, X(2)23=5.8695, X(2)31=4, X(2)32=1, X(2)41=1, X(2)42=2, X(2)43=3 }
with Z1* = 300.8683 and Z2* = 282.3024.
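A quick check of the relation X_mn+1 = tanh^(-1)(2λ − 1) against the reported figures:

```python
# lambda = (tanh(X_mn+1) + 1) / 2; X_mn+1 = 1.9608 gives roughly the
# reported lambda = 0.9804 (small rounding drift in the published figures).
import math
print(round((math.tanh(1.9608) + 1) / 2, 4))
```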

With the exponential membership function (taking S = 1),
μ^E_Z1(x) = 1 if Z1 ≤ 300;  (e^(−Ψ1(X)) − e^(−1))/(1 − e^(−1)) if 300 < Z1 < 330;  0 if Z1 ≥ 330
μ^E_Z2(x) = 1 if Z2 ≤ 283;  (e^(−Ψ2(X)) − e^(−1))/(1 − e^(−1)) if 283 < Z2 < 291;  0 if Z2 ≥ 291
where
Ψ1(X) = (Z1 − L1)/(U1 − L1) = (Z1 − 300)/30
Ψ2(X) = (Z2 − L2)/(U2 − L2) = (Z2 − 283)/8
and Z1, Z2 are the objective functions of (27) and (31).
Then an equivalent crisp model for the fuzzy model can be formulated as:
Maximize λ
subject to λ ≤ (e^(−SΨp(x)) − e^(−S))/(1 − e^(−S)), p = 1,2,...,P, λ ≥ 0, and the constraints (30);
equivalently, e^(−Ψp(X)) − (1 − e^(−S))λ ≥ e^(−S). With S = 1 this gives
e^(−Ψ1(X)) − (1 − 0.368)λ ≥ 0.368, i.e. e^(−Ψ1(X)) − 0.6321λ ≥ 0.368,
e^(−Ψ2(X)) − 0.6321λ ≥ 0.368.
The problem is solved by the general interactive optimization (LINGO) software, giving λ = 0.7084,
X^(*) = { X(1)11=5, X(1)12=2.3703, X(1)13=1.6296, X(1)21=8, X(1)22=4, X(1)23=2, X(1)33=6, X(1)41=1, X(1)42=5.6296, X(1)43=0.3703; X(2)12=4.6296, X(2)13=1.3703, X(2)31=4, X(2)32=1, X(2)41=1, X(2)42=2.3703, X(2)43=2.6296 }
with Z1* = 306.1085 and Z2* = 270.6274.

Conclusion
In this paper the multi-objective multi-index transportation problem is defined and solved by using the fuzzy programming technique with linear, hyperbolic and exponential membership functions. The multi-index transportation problem can represent different modes of origins and destinations, or it may represent a set of intermediate warehouses. If we use the hyperbolic membership function, the crisp model becomes linear. The optimal compromise solution obtained with the hyperbolic membership function changes significantly when compared with the solution obtained by the linear membership function, but the optimal compromise solution of the exponential membership function does not change significantly when compared with the linear one.
References
[1] Aneja V.P. and Nair K.P.K. (1979) Management Science, 25, 73-78.
[2] Bellman R.E. and Zadeh L.A. (1970) Management Science, 17, 141-164.
[3] Bit A.K., Biswal M.P. and Alam S.S. (1993) Industrial Engineering Journal, XXII(6), 8-12.
[4] Chanas S., Kolodzejczyk W. and Machaj A. (1984) Fuzzy Sets and Systems, 13, 211-221.
[5] Gwo-Hshiung Tzeng, Dusan Teodorovic and Ming-Jiu Hwang (1996) European Journal of Operations Research, 95, 62-72.


[6] Haley K.B. (1963) Operations Research, 10, 448-463.
[7] Haley K.B. (1963) Operations Research, 11, 369-379.
[8] Haley K.B. (1965) Operations Research, 16, 471-474.
[9] Junginger W. (1993) European Journal of Operational Research, 66, 353-371.
[10] Oheigeartaigh M. (1982) Fuzzy Sets and Systems, 8, 235-243.
[11] Prade H. (1980) Fuzzy Sets: Theory and Applications to Policy Analysis and Information Systems, Plenum Press, New York, 155-170.
[12] Rautman C.A., Reid R.A. and Ryder E.E. (1993) Operations Research, 41, 459-469.
[13] Verma Rakesh, Biswal M.P. and Biswas A. (1997) Fuzzy Sets and Systems, 91, 37-43.
[14] Waiel F. Abd El-Wahed (2001) Fuzzy Sets and Systems, 117, 26-33.
[15] Zimmermann H.J. (1978) Fuzzy Sets and Systems, 1, 45-55.



Advances in Information Mining, ISSN: 0975–3265, Volume 2, Issue 1, 2010, pp-08-12

A new fuzzy MADM approach used for finite selection problem Muley A.A. and Bajaj V.H.* *Department of Statistics, Dr. B. A. M. University, Aurangabad (M.S.)-431004, India vhbajaj@gmail.com, aniket.muley@gmail.com Abstract- This paper proposes a new approach to product configuration by applying the theory of Fuzzy Multiple Attribute Decision Making (FMADM), which focuses on uncertain and fuzzy requirements the customer, submits to the product supplier. The proposed method can be used in e-commerce websites, with which it is easy for customers to get his preferred product according to the utility value with respect to all attributes. The main concern of this paper, in which requirements the customer submitted to the configuration of television is vague. Further verify the validity and the feasibility of the proposed method compared with Weighted Product Method (WPM). Finally, the television is taken as an example to demonstrate the proposed methods. Keywords- MADM, Fuzzy, Triangular fuzzy number, T.V., Uncertainty Introduction Real world problems are often require a decision maker (DM) to rank discrete alternatives or, at least, to select best one. The MADM theory was developed to help the DM to solve such problems. MADM has been one of the fastest growing areas during the last decades depending on the changing in the business sector; Hwang & Yoon [1], Turban [4]. We focus on MADM which is used in a finite ‘selection’ or choice problem. In real world problem, MADM play most important role. Now a day’s television is the common in every person’s life. Here, we take as an application of selection of television configuration. Generally, common people purchase 21” size for house purpose; therefore we choose the most common size. Mass customization, as a business strategy, aims at meeting diverse customer needs while maintaining near mass production efficiency, can implement both economies of scale and scope for an enterprise, and has become the goal that the companies pursue; Zhu & Jiang [7]. In order to reach the goal, companies are often forced to adopt differentiation strategy to offer customer more choices of products to meet the growing individualization of demand, by giving a more customer-centric role. The configuration approaches based on rules which are usually dependent on expert’s experience to establish. The configuration is one of the most important ways to realize quickly product customization. But, in business, particularly through the internet, a customer normally develops in his mind some sort of ambiguity, given the choice of similar products. The main concern is the requirements of the customers with respect to configuration of television which are vague. The television is taken as an example to further verify the validity and the feasibility of the proposed method and compared with WPM by Millar & Starr [2]. Framework of product configuration based on uncertain customers requirements Each attribute has a finite set of possible values, in which, the variant is defined by using attributes and attribute values. Together, all attributes and attribute values describe a

complete range of the product family. Products in the same product family vary according to different attribute and its attribute value, choosing a product could be considered as a process of choosing its attributes and different attribute value. But, generally, it is difficult for a customer to express his requirements in a clear and unambiguous way, which is often due to the fact that he is not thoroughly familiar with the product which is the supplier offers. So, the requirements are often vague and fuzzy, preference weight varies with respect to different product attributes. We describe the customers’ vague and uncertain requirements in the form of fuzzy number by using the method of representation of fuzzy set. It is also the design to solve the configuration problem of the uncertain environment. As we know, there are various attributes in different products, but in which some attributes, such as color, shape, and so on, are not suitable to be represented as a fuzzy number. These attributes are often clear in the customers mind and the customer could select the attribute value by seeing the virtual product model in some browser environment. In realistic configuration system, it could be achieved by selecting the corresponding attribute value that the customer prefers directly. By using the theory of fuzzy MADM, the requirements the customer decided with respect to corresponding product attribute can be regarded as an ideal product. Firstly, the uncertain attribute value the customer decided would be represented in the form of triangular fuzzy number or interval fuzzy number, which is the most common way to solve uncertain, imprecise problems. Moreover, as the attribute values of the alternate products for the customer to select are determinate, which are usually definite and known, therefore, it is impossible to measure directly the distance or similarity degree between the ideal product the customer wanted and the alternate products. The definite attribute value is converted into the form of the fuzzy number so as to compute the distance between two fuzzy numbers. When choosing a product from a number of similar alternatives, customer




normally develops some sort of ambiguity. The ambiguity is mainly due to two reasons: firstly, how to make a final product choice to purchase, and secondly, on what basis the other products will be rejected. In order to answer these questions, the customer may like to classify the products into different preference levels, preferably through some numerical strength of preference, as in Mohanty & Bhasker [3]. We adopt the triangular fuzzy number to represent the vague requirements provided by the customers, as shown in Fig. (1):
μ_Ã(x) = 0 if x < a;  (x − a)/(b − a) if a ≤ x ≤ b;  (c − x)/(c − b) if b ≤ x ≤ c;  0 if x > c   (1)

Fuzzy MADM methodology
As we know, when a customer chooses his preferred product from many candidate products, this is done, in fact, by comparing the different attributes that describe product performance in different aspects, and by ranking these products according to the customer's subjective preference. The customer requirements for products are usually uncertain and vague, owing to the inability to understand product specifications comprehensively. On the other hand, the attribute values or specifications of products offered by manufacturers are determinate and known. The model of fuzzy MADM was first introduced by Yang & Chou [6]. The general MADM model can be described as follows:
• Let X = { X_i | i = 1, 2, ..., m } denote a finite discrete set of m (≥ 2) possible alternatives (courses of action, candidates);
• Let A = { A_j | j = 1, 2, ..., n } denote a finite set of n (≥ 2) attributes according to which the desirability of an alternative is to be judged;
• Let ω = (ω1, ω2, ..., ωn)^T be the vector of the weights, where Σ_{j=1}^{n} ωj = 1, ωj ≥ 0, j = 1, 2, ..., n, and ωj denotes the weight of attribute A_j;
• Let R = (r_ij)_{m×n} denote the m × n decision matrix, where r_ij (≥ 0) is the performance rating of alternative X_i with respect to attribute A_j.
Normally there are two basic types of attributes in a MADM problem: the first type is of 'cost' nature, and the second of 'benefit' nature. Since the attributes are generally incommensurate, the decision matrix needs to be normalized so as to transform the various attribute values into comparable ones. A common method of normalization is

Z_ij = (r_ij − r_j^min)/(r_j^max − r_j^min), i = 1,...,m; j = 1,...,n, for a benefit attribute   (2)
Z_ij = (r_j^max − r_ij)/(r_j^max − r_j^min), i = 1,...,m; j = 1,...,n, for a cost attribute   (3)

where Z_ij is the normalized attribute value and r_j^max and r_j^min are given by
r_j^max = max(r_1j, r_2j, ..., r_mj), j = 1,...,n   (4)
r_j^min = min(r_1j, r_2j, ..., r_mj), j = 1,...,n   (5)
Let Z = (Z_ij)_{m×n} be the normalized decision matrix. According to the SAW method, the overall weighted assessment value of alternative X_i is
d_i = Σ_{j=1}^{n} Z_ij ωj, i = 1,...,m   (6)
where d_i is a linear function of the weight variables, and the greater the value of d_i the better the alternative X_i. The aim of MADM is to rank the alternatives or to determine the best alternative, the one with the highest degree of desirability with respect to all relevant attributes. So the best alternative is the one with the greatest overall weighted assessment value.


The classic MADM techniques assume that all r_ij values are crisp numbers. In practical MADM problems the r_ij values can be crisp and/or fuzzy data. Fuzzy MADM methods have been developed owing to the lack of precision in assessing the performance rating of alternatives with respect to an attribute, in which the r_ij values are often linguistic terms or fuzzy numbers. The configuration approach based on fuzzy MADM is introduced in detail in the following algorithm:
Step 1: Representation of fuzzy requirements. When choosing a product from a number of similar alternatives, a customer normally develops in his mind some sort of ambiguity.
Step 2: Similarity measure. In Step 1 the customer's requirements have been described as triangular fuzzy numbers with respect to the different product attributes. In this step we take the requirement vector as the ideal product the customer really wants, with the purpose of measuring its similarity degree with the existing product vectors, in which the specification values are known and determinate. As we know, fuzzy numbers cannot be compared with crisp ones directly; the crisp numbers have to be transformed into the form of fuzzy numbers first. For example, for a crisp number b, its triangular fuzzy form can be written as
b̃ = (b^L, b^M, b^U), where b^L = b^M = b^U = b   (7)
The similarity measure between two triangular fuzzy numbers a = (a^L, a^M, a^U) and b = (b^L, b^M, b^U) can be calculated with Eq. (8); Xu [5]:
s(a, b) = (a^L b^L + a^M b^M + a^U b^U) / max( (a^L)^2 + (a^M)^2 + (a^U)^2, (b^L)^2 + (b^M)^2 + (b^U)^2 )   (8)
In the realistic configuration system, the match for attributes such as colour could be achieved by selecting directly the attribute value the customer prefers from the given alternate options; the similarity measure for this type of attribute is defined as follows:
are a = (a , a , a ) and b = (b , b , b ) , respectively. In realistic configuration system, it could be achieved by selecting an attribute value the customer prefers from the given alternate options. The similarity measure of this type of attributes is defined as follows:

1, a ' = b ' ' ' s(a , b ) =  (9) ' ' 0, a ≠ b Step 3: Construction of Decision Matrix (DM). Calculation result of similarity measure between alternate products and the ideal product can be concisely expressed in a matrix format which is called decision matrix in MADM problems, and in which columns indicate product attributes and rows alternate products. Thus, an element

Sij

in the in Eq. (10)

denotes the similarity degree to the ideal 10

product of the ith product with respect to the j attribute.

 S11 S12 S S22 DM =  21  ... ...   Sm1 Sm 2

... ... Sij ...

th

S1n  S2 n  ...  (10)  Smn 

Step 4: Normalization. In order to eliminate the difference of dimension among different attributes, normalization is needed to transform the various attribute dimensions into non-dimensional ones. Here we adopt Eqs. (11) and (12) to normalize the fuzzy numbers:
r̃_i = ( a_i/c_i^max, b_i/b_i^max, (c_i/a_i^max) ∧ 1 ) for a benefit attribute   (11)
r̃_i = ( c_i^min/c_i, c_i^min/b_i, (c_i^min/a_i) ∧ 1 ) for a cost attribute   (12)
where (g)_i^max = max_i{(g)_i} and (g)_i^min = min_i{(g)_i}.
Step 5: Ranking of the alternate products. The element S_ij in the decision matrix reflects the closeness degree of the ideal product to the i-th alternate product with respect to the j-th attribute. In this step we can use the SAW method, widely used in MADM, to calculate the utility value with respect to all attributes, from which the ranking order of the alternate products according to utility value can be obtained; the product with the highest utility value can be considered the closest to what the customer requires. The utility value of the i-th alternate product is calculated with Eq. (13):
U_i = Σ_{j=1}^{n} x_ij ωj, i = 1, 2, ..., m   (13)
and the maximum utility value can be written as Eq. (14):
U_max = max_i Σ_{j=1}^{n} x_ij ωj, i = 1, 2, ..., m   (14)
Here we also compare with the WPM of Millar & Starr [2] and check the feasibility of the customer's requirement:
U_i = Π_{j=1}^{n} x_ij^(ωj), i = 1, 2, ..., m   (15)

Case study
In this section we take the television as an example to illustrate the method described above. Table 1 shows the televisions that could be configured for different customers with respect to different attributes, described as follows:
Table 1 - Configuration of Television
Sr. No.   Speakers   Watt   Channels   Price
P1        6          1800   200        10300
P2        2          110    100        9790
P3        5          500    200        11990
P4        4          1200   200        12400
P5        2          200    200        9400
P6        2          400    100        11490
P7        2          250    200        9300
P8        4          500    200        9900
Suppose that the ideal product the customer wants according to the above attributes, and the corresponding preference weights, are as shown in Table 2.
Table 2 - The ideal product and attribute weight
Attributes   Ideal    Lower   Upper   Weight
Speakers     5        2       8       0.25
Watt         1000     200     2400    0.20
Channels     150      100     250     0.25
Price        10000    9000    12000   0.30

The vector of the ideal product can be represented as the following form of the triangular fuzzy number.

Table 4: Utility value of each product configuration by SAW
P1       P2       P3       P4       P5       P6       P7       P8
0.8447   0.5039   0.7214   0.7462   0.5791   0.5243   0.5815   0.7058

Table 4 presents the final utility values, with which the customer can rank the candidate products according to his preference over the different attributes; the order showing the closeness degree to the customer's requirements can be written as follows:

P1 > P4 > P3 > P8 > P7 > P5 > P6 > P2

Here we compare the above method with the WP method and check the feasibility of the customer's requirement calculated by Eq. (15); we get:

Table 5: Utility value of each product configuration by WPM
P1       P2      P3       P4       P5       P6       P7       P8
0.8373   0.356   0.6637   0.7398   0.4446   0.4558   0.4635   0.6452

P1 > P4 > P3 > P8 > P7 > P6 > P5 > P2

The corresponding vector of the attribute weights can be written as ω = (0.25, 0.20, 0.25, 0.30). The decision matrix, which shows the similarity degree with respect to each attribute between the ideal television the customer desires and the candidate ones, is obtained by using Eqs. (8)-(12) and is shown in Table 3.

Table 3: Decision Matrix
Sr. No.   U1       U2       U3       U4
P1        0.8333   0.6666   0.8333   0.9824
P2        0.3225   0.0582   0.5263   0.738
P3        0.8064   0.2647   0.8333   0.8618
P4        0.6451   0.6329   0.8333   0.8333
P5        0.3225   0.1058   0.8333   0.8966
P6        0.3225   0.2117   0.5263   0.897
P7        0.3225   0.1323   0.8333   0.887
P8        0.6451   0.2647   0.8333   0.9443

The utility value of each candidate product with respect to all attributes can be calculated by Eq. (13); the resulting values were given in Table 4 above.
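As a numerical check, here is a minimal sketch applying SAW (Eq. (13)) and WPM (Eq. (15)) to the Table 3 similarity values with the weight vector ω; it reproduces, for example, the P1 utilities reported in Tables 4 and 5.

```python
weights = [0.25, 0.20, 0.25, 0.30]
table3 = {  # similarity degrees from Table 3 (rows P1..P8)
    "P1": [0.8333, 0.6666, 0.8333, 0.9824],
    "P2": [0.3225, 0.0582, 0.5263, 0.7380],
    "P3": [0.8064, 0.2647, 0.8333, 0.8618],
    "P4": [0.6451, 0.6329, 0.8333, 0.8333],
    "P5": [0.3225, 0.1058, 0.8333, 0.8966],
    "P6": [0.3225, 0.2117, 0.5263, 0.8970],
    "P7": [0.3225, 0.1323, 0.8333, 0.8870],
    "P8": [0.6451, 0.2647, 0.8333, 0.9443],
}

def saw(x):   # Eq. (13): weighted sum
    return sum(w * v for w, v in zip(weights, x))

def wpm(x):   # Eq. (15): weighted product
    u = 1.0
    for w, v in zip(weights, x):
        u *= v ** w
    return u

for name, x in table3.items():
    print(name, round(saw(x), 4), round(wpm(x), 4))
# P1 yields 0.8447 (SAW) and 0.8373 (WPM), matching Tables 4 and 5.
```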

Due to the uncertainty of the customers' requirements, and the fact that different algorithms may yield different results, in a realistic configuration system several products with the highest similarity degrees to the customer's requirements can be presented for the customer to choose from, so as to satisfy the requirements to the greatest degree.

Conclusion
This paper proposes an approach to realize product-level configuration according to fuzzy and uncertain customer requirements by using the theory of fuzzy MADM. The television is taken as an example to demonstrate the feasibility of the proposed method for handling uncertain customer requirements. When the results of SAW and WPM are compared, the two methods agree on the top-ranked products for our problem, and the optimal selection of television is P1.

References
[1] Hwang C.L. and Yoon K.P. (1981) Springer, Berlin.
[2] Millar D.W. and Starr M.K. (1969) Prentice Hall, Englewood Cliffs, New Jersey.
[3] Mohanty B.K. and Bhasker B. (2005) Decision Support Systems, 38, 611-619.
[4] Turban E. (1988) Macmillan, New York.





[5] Xu Z.S. (2002) Systems Engineering and Electronics, 124, 9-12.
[6] Yang T. and Chou P. (2005) Mathematics and Computers in Simulation, 68, 9-21.
[7] Zhu B. and Jiang P.Y. (2005) The International Journal of Product Development, 2, 155-169.



Advances in Information Mining, ISSN: 0975–3265, Volume 2, Issue 1, 2010, pp-13-17

Ant based rule mining with parallel fuzzy cluster
Sankar K.1 and Krishnamoorthy K.2
1Department of Master of Computer Applications, KSR College of Engineering, Tiruchengode, san_kri_78@yahoo.com
2Department of Computer Science and Engineering, SONA College of Technology, Salem, kkr_510@yahoo.co.in

Abstract- Ant-based techniques in computer science are designed by taking biological inspiration from the behavior of these social insects. Data clustering techniques are classification algorithms that have a wide range of applications, from biology to image processing and data presentation. Since real-life ants perform clustering and sorting of objects among their many activities, we expect that a study of ant colonies can provide new insights for clustering techniques. The aim of clustering is to separate a set of data points into self-similar groups, such that the points that belong to the same group are more similar than points belonging to different groups. Each group is called a cluster. Data may be clustered using an iterative version of the Fuzzy C Means (FCM) algorithm, but the drawback of the FCM algorithm is that it is very sensitive to cluster center initialization, because the search is based on the hill-climbing heuristic. The ant-based algorithm provides a relevant partition of the data without any knowledge of the initial cluster centers. In the past, researchers have used ant-based algorithms based on stochastic principles coupled with the k-means algorithm. The proposed system in this work uses the Fuzzy C Means algorithm as the deterministic algorithm for ant optimization. The proposed model is used after reformulation, and the partitions obtained from the ant-based algorithm were better optimized than those from randomly initialized Hard C Means. The proposed technique executes the ant fuzzy in parallel for multiple clusters. This enhances the speed and accuracy of cluster formation for the required system problem.

1. INTRODUCTION
Research in using the social insect metaphor for solving problems is still in its infancy. The systems developed using swarm intelligence principles emphasize distributiveness, direct or indirect interactions among relatively simple agents, flexibility and robustness [4]. Successful applications have been developed in the communication networks, robotics and combinatorial optimization fields.

1.1 ANT COLONY OPTIMIZATION
Many species of ants cluster dead bodies to form cemeteries, and sort the larvae into several piles [4]. This behavior can be simulated using a simple model in which the agents move randomly in space and pick up and deposit items on the basis of local information (a pick/drop sketch is given below). The clustering and sorting behavior of ants can be used as a metaphor for designing new algorithms for data analysis and graph partitioning. The objects can be considered as items to be sorted; objects placed next to each other have similar attributes. This sorting takes place in two-dimensional space, offering a low-dimensional representation of the objects. Most swarm clustering work has followed the above model. In this work, there is implicit communication among the ants making up a partition, and the ants also have memory. However, they do not pick up and put down objects; rather, they place summary objects in locations and remember the locations that are evaluated as having good objective function values. The objects represent single dimensions of multidimensional cluster centroids which make up a data partition.
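As an illustration of the pick-up/deposit model just described, here is a minimal sketch using Deneubourg-style pick/drop probabilities; the constants K1, K2 and the neighborhood-similarity fraction f are illustrative assumptions, not parameters from this paper.

```python
K1, K2 = 0.1, 0.15  # assumed threshold constants (Deneubourg-style model)

def pick_probability(f):
    """Probability that an unladen ant picks up an item.

    f is the fraction of similar items perceived in the ant's
    neighborhood (0 = isolated item, 1 = dense similar cluster)."""
    return (K1 / (K1 + f)) ** 2

def drop_probability(f):
    """Probability that a laden ant drops the item it carries."""
    return (f / (K2 + f)) ** 2

# An isolated item is picked up almost surely; an item in a dense
# cluster is almost never picked up. Dropping behaves the other way.
for f in (0.0, 0.3, 0.9):
    print(f, pick_probability(f), drop_probability(f))
```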

1.2 CLUSTERING
The aim of cluster analysis is to find groupings or structures within unlabeled data [5]. The partitions found should result in similar data being assigned to the same cluster and dissimilar data assigned to different clusters. In most cases the data are real-valued vectors, and the Euclidean distance is one measure of similarity for such data sets. Clustering techniques can be broadly classified into a number of categories [6].

Hard C Means (HCM) is one of the simplest unsupervised clustering algorithms for a fixed number of clusters. The basic idea of the algorithm is to initially guess the centroids of the clusters and then refine them. Cluster initialization is crucial because the algorithm is very sensitive to it; a good choice for the initial cluster centers is to place them as far away from each other as possible. The nearest-neighbor rule is then used to assign each example to a cluster, and new cluster centroids are calculated from the clusters obtained. These steps are repeated until there is no significant change in the centroids. Hard clustering algorithms assign each example to one and only one cluster. This model is inappropriate for real data sets in which the boundaries between the clusters may not be well defined. Fuzzy algorithms can partially assign data to multiple clusters; the strength of membership in a cluster depends on the closeness of the example to the cluster center. The Fuzzy C Means (FCM) algorithm allows an example to be a partial member of more than one cluster. The FCM algorithm is based on minimizing an objective function.
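For reference, here is a minimal sketch of the standard FCM alternating updates mentioned above; the fuzzifier m = 2 and the convergence tolerance are conventional illustrative choices, not values from the paper.

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Fuzzy C Means: alternate centroid and membership updates
    until the memberships stop changing."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per example
    for _ in range(max_iter):
        Um = U ** m                              # fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # guard against zero distance
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U

# Example: three fuzzy clusters on random 2-D data
X = np.random.default_rng(1).random((150, 2))
centers, U = fcm(X, c=3)
```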




The drawback of clustering algorithms like FCM and HCM, which are based on the hill-climbing heuristic, is that prior knowledge of the number of clusters in the data is required, and they are significantly sensitive to cluster center initialization. The proposal of this work moves in the direction of constructing C fuzzy means clustering with ant colony optimization (parallel ant agents) to evolve efficient rule mining techniques. This work introduces the problem of combining multiple partitionings of a set of objects without accessing the original features. The system first identifies several application scenarios for the resultant 'knowledge reuse' framework, which the system calls cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information for building rule mining techniques. In addition to a direct maximization approach, the system proposes three effective and efficient techniques for obtaining high-quality combiners.

2. RELATED WORKS
Andrea Baraldi and Palma Blonda [1] proposed an equivalence between the concepts of fuzzy clustering and soft competitive learning in clustering algorithms on the basis of the existing literature; moreover, a set of functional attributes is selected for use as dictionary entries in the comparison of clustering algorithms. Alfred Ultsch's systems for clustering with collectives of autonomous agents follow either the ant approach of picking up and dropping objects or the DataBot approach of identifying the data points with artificial-life creatures; in DataBot systems the clustering behaviour is controlled by movement programs. For Julia Handl and Bernd Meyer, sorting and clustering methods inspired by the behavior of real ants are among the earliest methods in ant-based meta-heuristics; they revisit these methods in the context of a concrete application and introduce modifications that yield significant improvements in terms of both quality and efficiency, first re-examining their capability to simultaneously perform a combination of clustering and multi-dimensional scaling. In J. Handl, J. Knowles and M. Dorigo, ant-based clustering and sorting is a nature-inspired heuristic for general clustering tasks; it has been applied variously, from problems arising in commerce, to circuit design, to text mining, all with some promise. However, although early results were broadly encouraging, there has been very limited analytical evaluation of the algorithm. Alexander Strehl and Joydeep Ghosh introduce the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. The system first


identifies several application scenarios for the resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, the system proposes three effective and efficient techniques for obtaining high-quality combiners (consensus functions). The first combiner induces a similarity measure from the partitionings and then reclusters the objects. The second combiner is based on hypergraph partitioning. The third one collapses groups of clusters into meta-clusters which then compete for each object to determine the combined clustering. Due to the low computational costs of the techniques, it is quite feasible to use a supra-consensus function that evaluates all three approaches against the objective function and picks the best solution for a given situation. The system evaluates the effectiveness of cluster ensembles in three qualitatively different application scenarios: (i) where the original clusters were formed based on non-identical sets of features, (ii) where the original clustering algorithms worked on non-identical sets of objects, and (iii) where a common data set is used and the main purpose of combining multiple clusterings is to improve the quality and robustness of the solution. Promising results are obtained in all three situations for synthetic as well as real data sets. Nicolas Labroche, Nicolas Monmarché and Gilles Venturini introduce a method to solve the unsupervised clustering problem, based on a modeling of the chemical recognition system of ants. This system allows ants to discriminate between nestmates and intruders, and thus to create homogeneous groups of individuals sharing a similar odor by continuously exchanging chemical cues. This phenomenon, known as "colonial closure", inspired the development of a new clustering algorithm, which is then compared to a well-known method such as K-MEANS.

The previous literature on fuzzy clustering depicted above insists on the following parameters. The first work handles functional attributes with theoretical analysis. The second and third deal with cluster object movement issues on synthetic data sets. The fourth and fifth deal with a heuristic ant optimization model with trial repetition. The sixth and seventh authors utilized unsupervised clustering with class tree structuring. The final one uses c-fuzzy means clustering in a sequential way. This motivates us to base our proposal on ACO with c-fuzzy means. Building on the sequential C-fuzzy ACO clustering problem, we derive a parallel fuzzy ant clustering model to improve the attribute accuracy rate and achieve faster execution on the proposed problem domain.




3. FUZZY ANT CLUSTERING
Ant-based clustering algorithms are usually inspired by the way ants cluster dead nest mates into piles, without negotiating about where to gather the corpses. These algorithms are characterized by the lack of centralized control or a priori information, which makes them very appropriate candidates for the task at hand. Since the fuzzy ants algorithm does not need an initial partitioning of the data or a predefined number of clusters, it is very well suited for the Web People Search task, where the system does not know in advance how many clusters (or individuals) correspond to a particular document set (or person name). A detailed description of the algorithm is given by Schockaert et al. It involves a pass in which ants can only pick up one item, as well as a pass during which ants can only pick up an entire heap. A fuzzy ant-based clustering algorithm was introduced where the ants are endowed with a level of intelligence in the form of IF/THEN rules that allow them to do approximate reasoning. As a result, at any time the ants can decide for themselves whether to pick up a single item or an entire heap, which makes a separation of the clustering into different passes superfluous. The system experimented with different numbers of ant runs and fixed the number of runs to 800000 for the experiments. In addition, the system also evaluated different values for the parameters that determine the probability that a document or heap of documents is picked up or dropped by the ants, and kept the following values for the experiments:

Table 1: Parameter settings for fuzzy clustering
n1 (probability of dropping one item):      1
m1 (probability of picking up one item):    1
n2 (probability of dropping an entire heap): 5
m2 (probability of picking up a heap):       5

3.1 Hierarchical Clustering
The second clustering algorithm the system applies is an agglomerative hierarchical approach. This clustering algorithm builds a hierarchy of clusterings that can be represented as a tree (called a dendrogram), which has singleton clusters (individual documents) as leaves and a single cluster containing all documents as root. An agglomerative clustering algorithm builds this tree from the leaves to the top, in each step merging the two clusters with the largest similarity. Cutting the tree at a given height gives a clustering with a selected number of clusters. The system opted to cut the tree at different similarity thresholds between the document pairs, with intervals of 0.1 (e.g. for threshold 0.2 all document pairs with similarities

above 0.2 are clustered together). For the experiments, the system used an implementation of Agnes (Agglomerative Nesting), which is fully described elsewhere.

3.2 Fuzzy Ant Parallel System
Clustering approaches are typically quite sensitive to initialization. In this work, the system examines a swarm-inspired approach to building clusters which allows for a more global search for the best partition than iterative optimization approaches. The approach is described with cooperating ants as its basis. The ants participate in placing cluster centroids in feature space. They produce a partition which can be utilized as is or further optimized; the further optimization can be done via a focused iterative optimization algorithm. Experiments were done both with deterministic algorithms, which assign each example to one and only one cluster, and with fuzzy algorithms, which partially assign examples to multiple clusters. The algorithms are from the C-means family. These algorithms were integrated with swarm intelligence concepts, resulting in clustering approaches that are less sensitive to initialization.

4. EXPERIMENTAL SIMULATION ON ANT BASED PARALLEL CLUSTER
The implementation of the fuzzy ant based parallel clustering algorithm for rule mining used three real data sets obtained from the UCI repository: the Iris Data Set, the Wine Recognition Data Set, and the Glass Identification Data Set. The simulation, conducted in MATLAB, normalizes the feature values between 0 and 1. The normalization is linear: the minimum value of a data-set-specific feature is mapped to 0 and the maximum value of the feature is mapped to 1. The ants are initialized with random initial values and random directions. There are two directions, positive and negative: a positive direction means the ant is moving in the feature space from 0 to 1, and a negative direction from 1 to 0. The initial memory is cleared. The ants are initially assigned to a particular feature within a particular cluster of a particular partition; the ants never change the feature, cluster or partition assigned to them.

Repeat
  For one epoch /* one epoch is n iterations of random ant movement */
    For all ants
      With a probability Prest the ant rests for this epoch
      If the ant is not resting, then with a probability Pcontinue the ant
        continues in the same direction, else it changes direction
      With a value between Dmin and Dmax the ant moves in the selected direction
    The new Rm value is calculated using the new cluster centers obtained by recording the





5. RESULTS AND DISCUSSIONS
The ants move the cluster centers in feature space to find a good partition for the data. There are fewer controlling parameters than in previous ant-based clustering algorithms, which typically group the objects on a two-dimensional grid. Results from 18 data sets show the superiority of the algorithm over the randomly initialized FCM and HCM algorithms. For comparison purposes, Table 3 shows the frequency of occurrence of different extrema for the ant-initialized and randomly initialized HCM algorithms, alongside the sequential (existing) and parallel (proposed) C-fuzzy ACO.


Table 3: Frequency of different extrema from parallel fuzzy based ant clustering, for the Glass (2 class), Iris and Wine data sets
Data Set          Extrema   Freq. (HCM, ant init)   Freq. (HCM, random init)   Sequential C-Fuzzy ACO (existing)   Parallel C-Fuzzy ACO (proposed)
Glass (2 class)   34.1320   19                      3                          31                                  27.8
Glass (2 class)   34.1343   11                      19                         32.12                               28.5
Glass (2 class)   34.1372   19                      15                         32.36                               29.1
Glass (2 class)   34.1658   1                       5                          32.89                               29.82
Iris              6.9981    50                      23                         5.3938                              4.23
Iris              7.1386    0                       14                         5.8389                              4.3658
Iris              10.9083   0                       5                          8.3746                              5.3256
Iris              12.1437   0                       8                          10.6434                             8.2356
Wine              9.3645    20                      2                          5.2369                              3.2567
Wine              11.3748   15                      20                         8.2356                              5.236
Wine              13.8483   12                      18                         10.2356                             8.3656

The ant-initialized parallel ant fuzzy algorithm always finds the better extrema for the Iris data set, and for the Wine data set it finds the better extrema 49 out of 50 times. The ant-initialized HCM algorithm always finds the better extrema for the Iris data set, and for the Glass (2 class) data set a majority of the time. For the different Iris extrema, the ant-initialized parallel algorithm finds a better extremum most of the time. When the ACO approach was used to optimize the clustering criteria, the ant approach for parallel C Means found better extrema 64% of the time for the Iris data set; the ant-initialized parallel C-fuzzy ACO finds better extrema all the time. The number of ants is an important parameter of the algorithm. This number only increases when more partitions are searched for at the same time, as ants are (currently) added in increments (Graph 1 and Graph 2). The quality of the final partition improves with an increase of ants, but the improvement comes at the expense of increased execution time.

positions of the ants, which move the features of the clusters for a given partition.
  If the partition is better than any of the old partitions in memory
    then the worst partition is removed from memory and this new partition is
    copied to the memories of the ants making up the partition
  If the partition is not better than any of the old partitions in memory
    then with a probability PContinueCurrent the ant continues with the current partition
    else with a probability 0.6 the ant moves to the best known partition, with a
      probability 0.2 to the second best known partition, with a probability 0.1 to the
      third best, with a probability 0.075 to the fourth best and with a probability
      0.025 to the worst known partition
Until the stopping criterion is met

The stopping criterion is the number of epochs.

Table 2: Parameter Values
Parameter              Value
Number of ants         30 * c * #features
Memory per ant         5
Iterations per epoch   50
Epochs                 1000
Prest                  0.01
Pcontinue              0.75
PContinueCurrent       0.20
Dmin                   0.001
Dmax                   0.01

Note that the multiplier 30 for the number of ants allows for 30 partitions. Three data sets (Glass, Wine, Iris) were evaluated from a mixture of five Gaussians. The probability distribution across all the data sets is the same, but the means and standard deviations of the Gaussians are different. Of the three data sets, two had 500 instances each and the remaining one had 1000 instances. Each instance had two attributes. To visualize the Iris data set, the Principal Component Analysis (PCA) algorithm was used to project the data points into 2D and 3D space.
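A minimal Python sketch of the per-epoch ant movement loop described above, using the Table 2 parameters; the partition evaluation and per-ant memory bookkeeping are omitted, and the class layout is an illustrative assumption.

```python
import random

P_REST, P_CONTINUE = 0.01, 0.75           # Table 2 parameters
D_MIN, D_MAX = 0.001, 0.01

class Ant:
    """Each ant owns one feature of one cluster centroid of one partition."""
    def __init__(self):
        self.position = random.random()    # feature value in [0, 1]
        self.direction = random.choice((-1, 1))

    def move(self):
        if random.random() < P_REST:       # ant rests this epoch
            return
        if random.random() >= P_CONTINUE:  # otherwise it may reverse direction
            self.direction = -self.direction
        step = random.uniform(D_MIN, D_MAX) * self.direction
        self.position = min(1.0, max(0.0, self.position + step))

ants = [Ant() for _ in range(30 * 3 * 2)]  # 30 partitions, c=3 clusters, 2 features
for _ in range(50):                        # one epoch = 50 iterations (Table 2)
    for ant in ants:
        ant.move()
```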

Graph 1: Number of Iterations vs. Time (curves: Ant Fuzzy Sequential and Ant Fuzzy Parallel)




Graph 2: Time vs. Path Length (curves: Ant Fuzzy Sequential and Ant Fuzzy Parallel)

7. CONCLUSION
The system discussed a swarm-inspired optimization algorithm to partition data into clusters, described using the ant paradigm. The approach is to have a coordinated group of ants position cluster centroids in feature space. The algorithm was evaluated with a soft clustering formulation utilizing the fuzzy c-means objective function and a hard clustering formulation utilizing the hard c-means objective function. The presented clustering approach seems clearly advantageous for data sets where many local extrema are expected. The cluster discovery aspect of the algorithm provides the advantage of obtaining a partition at the same time as it indicates the number of clusters; that partition can be further optimized or accepted as is. This is in contrast to some other schemes, which require partitions to be created with different numbers of clusters and then evaluated. The results are generally a better-optimized partition (objective function) than obtained with FCM/HCM. One needs a large number of random initializations to be competitive in terms of skipping some of the poor local extrema, which was done with the ant-based algorithm. It has provided better final partitions on average than a previously introduced evolutionary computation clustering approach for several data sets. Random initializations have been shown to be the best approach for the c-means family, and the ant clustering algorithm results in generally better partitions than a single random initialization. The parallel version of the ants algorithm operates much faster than the sequential implementation, making it a clear choice for minimizing the chance of finding a poor extremum when doing c-means clustering. This algorithm should scale better for large numbers of examples than grid-based ant clustering algorithms.

REFERENCES
[1] Baraldi A. and Blonda P. (1999a) IEEE Transactions on Systems, Man, and Cybernetics, 29(6), 778-785.
[2] Kanade P.M. and Hall L.O. (2003) IEEE Transactions on Fuzzy Systems, 11(2), 227-232.
[3] Handl J. and Meyer B. (2002) Springer-Verlag, 2439, 913-923.
[4] Handl J., Knowles J. and Dorigo M. (2003) IOS Press, Amsterdam, the Netherlands, 204-213.
[5] Strehl A. and Ghosh J. (2002) Journal of Machine Learning Research, 3, 583-617.
[6] Labroche N., Monmarché N. and Venturini G. (2002) France: IOS Press, 345-349.




Advances in Information Mining, ISSN: 0975–3265, Volume 2, Issue 1, 2010, pp-18-22

Data mining - A Mathematical Realization and cryptic application using variable key
Chakrabarti P.
Sir Padampat Singhania University, Udaipur-313601, Rajasthan, India, prasun9999@rediffmail.com

Abstract- In this paper we depict various mathematical models based on the themes of data mining. The numerical representations of regression and linear models are explained. We also show the prediction of a datum in the light of statistical approaches, namely the probabilistic approach, data estimation and dispersion theory. The paper also deals with the efficient generation of the shared keys required for direct communication among co-processors without active participation of a server. Hence minimization of time complexity, proper utilization of resources and an environment for parallel computing can be achieved with higher throughput in a secured fashion. The techniques involved are cryptic methods based on support analysis, confidence rule, resource mining, sequence mining and feature extraction. A new approach towards realizing the variability concept of the key in the Wide-Mouth Frog Protocol, Yahalom Protocol and SKEY Protocol is depicted in this context.
Keywords- data mining, regression, dispersion theory, sequence mining, variable key

Regression based data-mining techniques
A. Concept
We point out the scenario in which the dependency of a datum at time instant t1 on another at t2 can be computed. If we assume d1 to be the datum at t1 and d2 the datum at t2, then we can write the equation
d2 = a + b d1   (1)
where a, b are constants. Data prediction based on the linear regression model is concentrated on here.

B. Linear representation
As per statistical prediction, let the predicted value of a datum d be ∆1 and assume its original value is ∆2. As per the data-mining-based regression model, we can denote ∆i = d2,i − (a + b d1,i) as the error in taking a + b d1,i for d2,i; this is known as the error of estimation.

Prediction based on probabilistic approach
Suppose the observed data k1, k2, k3, ..., km have respective probabilities p1, p2, ..., pm. When Σ (i=1 to m) pi = 1, then E(K) = Σ (i=1 to m) ki pi   (2), provided it is finite. Here we use bivariate probability based on K = (k1, k2, k3, ..., km), i.e. the set of observed data, and Q = (q1, q2, q3, ..., qn), i.e. the set of predictive values (1 < m < n).

Theorem 1: If the observed data set values and predicted data set values are two jointly distributed random variables, then E(K + Q) = E(K) + E(Q).
Proof: Let K assume values k1, k2, k3, ..., km and Q assume values q1, q2, q3, ..., qm,

with P(K = ki, Q = qj) = pij for i = 1 to m and j = 1 to m. Then
E(K + Q) = Σi Σj (ki + qj) pij
         = Σi Σj ki pij + Σi Σj qj pij
         = Σi ki Σj pij + Σj qj Σi pij
         = E(K) + E(Q)   (3)
Similarly, for independent K and Q,
E(K · Q) = E(K) · E(Q)   (4)

Prediction based on datum estimation
Let the data space be (k1, k2, k3, ..., kn), and let the distribution function f1(k) of the random variable k involve a parameter θ whose value is unknown; we have to guess the value of θ on the basis of the observed data space (k1, k2, ..., km), where m < n. We select θ̂ = f2(k1, k2, ..., km); it is basically a number, taken as a guess for the value of θ. Hence θ̂ is an estimator of θ, and the value of θ̂ obtained from the observed data space is an estimate of θ. |θ̂ − θ| should be negligible for successful prediction of the datum. Now we can represent the datum assumption criteria as below:
E(θ̂) = θ for the true value of θ   (5)
Var(θ̂) <= Var(Ψ)   (6)
for the true value of θ, Ψ being any other estimator satisfying equation (5). Hence the data prediction has been pointed out on the basis of the property of unbiasedness (equation (5)) and the property of minimum variance (equation (6)).

Prediction based on dispersion theory and pattern analysis
The values of the data for different sessions are not all equal. In some cases the values are close to one another, while in other cases they deviate highly from one another. In order to get a proper idea about the overall nature of a given set of values, it is necessary to know, besides the average, the extent to which the data differ




among themselves or, equivalently, how they are scattered about the average. Let the values k1, k2, k3, ..., km be the obtained data and c be the average of the original values k(m+1), k(m+2), ..., kn. The mean deviation of k about c is given by
MDc = (1 / (n − m)) Σ (i=1 to n−m) |ki − c|   (7)
In particular, when c = k̄, the mean deviation about the mean is given by
MDk̄ = (1 / (n − m)) Σ (i=1 to n−m) |ki − k̄|   (8)

B. Pattern matching
We want to study the trend analysis of future events based on prediction using previously observed data. If the event delivers some numerically based data estimation, then we can predict it in certain forms. We assume dp to be the predicted datum and do the observed datum. If dp and do are linearly related, then
dp = a + b do   (9)
If exponentially related, the equation will be of the form
dp = a b^do   (10)
If a logarithmic-transformation-based prediction rule is observed, then the equation will be
Dp = A + B do   (11)
where Dp = log dp, A = log a and B = log b. In case of data merging towards obtaining meaningful information, the convention rule is as follows: di => d((i+k) mod n), where di ∈ D, k is the offset value and n is the number of sensed data elements, i.e. the number of elements of the set D. The value of k varies from stage to stage.

Communication based on support
A. Scheme
A and B are two parties. K1, K2, K3, K4, K5, K6 are keys which are known to A and B only. A sends messages m1, m2, m3, m4, m5, m6 in encrypted form with the help of one or more keys. A third party would have to decipher each message by the error-and-trial method and form sets. The key having maximum support is the shared key between A and B. If the number of shared keys is more than one, then one is primary while the other is a candidate to it. Here we will find a shared key such that the third party will not be able to decipher the message.

B. Mathematical Analysis
Message   Encrypted key
m1        ek1 = f(k1,k3,k4,k6) = k1^k3^k4^k6
m2        ek2 = f(k3,k5) = k3^k5
m3        ek3 = f(k4,k5,k6) = k4^k5^k6
m4        ek4 = f(k2,k3,k5) = k2^k3^k5
m5        ek5 = f(k1,k2) = k1^k2
m6        ek6 = f(k1,k2,k3,k6) = k1^k2^k3^k6

It is seen that k3 is supported by 4 out of 6 sets of shared keys, so the support of k3 = 66.6%. Hence the shared key of A and B is k3. If a hacker hacks k1, k2, ..., k6, then by applying error-and-trial it

will get the shared key. So the concept of an automatic variable shared key is proposed. The concept is: shared key = (key having maximum support) xor (xor of the values of the messages where the support is not available). Hence k3 = key having maximum support and m3, m5 = messages encrypted without k3. Therefore, shared key = k3^m3^m5. This scheme cannot be revealed to the hacker, so the hacker will hack k3 instead of the modified value of the shared key.

Communication based on confidence rule
A. Scheme
Input: m1, m2, m3, m4, m5, m6 to A; K1, K2, K3, K4, K5, K6 to A and B.
Step 1: A encrypts each of the messages with a combination of the keys and sends it to B.
Step 2: B finds the key which has a confidence level of 100%, i.e. key1 => key2: if key1 exists, then key2 will also exist, and hence the confidence of key1 => key2 is 100%.
Step 3: The shared key is key1.
Step 4 (applied only for enhancing the security level): shared key = (key1) XOR (key-new), where key-new is chosen such that the confidence of key-new => key1 is minimum.

B. Mathematical Analysis
Message   Encrypted keys
m1        Sk1 = (k1,k3,k4,k6) = (k1^k3^k4^k6)
m2        Sk2 = (k3,k5) = (k3^k5)
m3        Sk3 = (k4,k5,k6) = (k4^k5^k6)
m4        Sk4 = (k2,k3,k5) = (k2^k3^k5)
m5        Sk5 = (k1,k2) = (k1^k2)
m6        Sk6 = (k1,k2,k3,k6) = (k1^k2^k3^k6)

Only k4 => k6 has a confidence level of 100%. So, shared key = k4 (up to step 3).

Association    Probability
k1 => k4       1/3
k2 => k4       0
k3 => k4       1/4
k5 => k4       1/2
k6 => k4       2/3

So key-new = k2, since it has the least probability. Hence, shared key = k4 XOR k2.

Statistical approaches of resource mining
A. Based on prediction of most frequent word
The most frequent key can be obtained based on max(f1, f2, ..., fn), where f1, f2, ..., fn are relative frequencies and n is the total number of keys.

B. Based on prediction of variable within interval





We can predict the value of a variable key if we can measure the interval properly. We can apply this scheme in hacking.
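Returning to the support-based scheme above, here is a minimal sketch that counts key support over the message-key associations and forms the automatic variable shared key; the integer encodings of k3, m3 and m5 are illustrative assumptions.

```python
from collections import Counter
from functools import reduce

# Key sets used to encrypt each message (from the support analysis above)
associations = {
    "m1": {"k1", "k3", "k4", "k6"},
    "m2": {"k3", "k5"},
    "m3": {"k4", "k5", "k6"},
    "m4": {"k2", "k3", "k5"},
    "m5": {"k1", "k2"},
    "m6": {"k1", "k2", "k3", "k6"},
}

support = Counter(k for keys in associations.values() for k in keys)
best_key, count = support.most_common(1)[0]
print(best_key, count / len(associations))      # k3 0.666...

# Variable shared key: XOR the best key with the messages that were
# encrypted WITHOUT it (here m3 and m5), assuming integer encodings.
values = {"k3": 0b0101, "m3": 0b0011, "m5": 0b1001}   # illustrative values
missing = [m for m, keys in associations.items() if best_key not in keys]
shared = reduce(lambda acc, m: acc ^ values[m], missing, values[best_key])
print(missing, bin(shared))
```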

Theorem 2: If a variable key V changes over time t in an exponential manner, then the value of the variable at the centre point of an interval (a1, a2) is the geometric mean of its values at a1 and a2.
Proof: Let Va = m n^a. Then Va1 = m n^a1 and Va2 = m n^a2. Now, the value of V at (a1 + a2)/2 is
m n^((a1+a2)/2) = [m^2 n^(a1+a2)]^(1/2) = [(m n^a1)(m n^a2)]^(1/2) = (Va1 Va2)^(1/2).

C. Based on prediction of interrelated variables
In a message there may be a variable which is dependent on another through some equation; in that case extraction can be made.
Theorem 3: If a variable m is related to another variable n in the form m = an, where a is a constant, then the harmonic mean of m is related to that of n by the same equation.
Proof: Let x be the number of given values. Then
mHM = x / (Σ 1/mi), i = 1 to x
    = x / (Σ 1/(a ni))        [since mi = a ni]
    = x / ((1/a) Σ 1/ni)
    = a (x / (Σ 1/ni))
    = a nHM

Shared key generation in the light of sequence mining
Let us suppose that four users, U1, U2, U3, U4, are in a network. Each of U1, U2, U3 transmits three messages to U4 in successive sessions.

Sender   Key      Operation
U1       110110   U1(m1) -> U4
U2       100101   U2(m1) -> U4
U3       001010   U3(m1) -> U4
U1       001100   U1(m2) -> U4
U2       000011   U2(m2) -> U4
U3       100001   U3(m2) -> U4
U1       111100   U1(m3) -> U4
U2       000001   U2(m3) -> U4
U3       110100   U3(m3) -> U4

A. Algorithm
Step 1: Designate each bit of the key as a character.
Step 2: If the character index value is 1, include it in the sequence.
Step 3: Else ignore the value.
Step 4: Identify the pattern that is decided by the communicating party and fetch the combination.
Step 5: The shared key for each user will be based on the combined result.
Step 6: Repeat steps 1 to 5 for the other users.
Step 7: The final shared key will be based on the shared key in combined form of U1/U2/U3 and the computation scheme.

B. Analysis
The bits can be denoted by A, B, C, D, E, F.
Combined sequence of U1: (A,B,D,E) (C,D) (A,B,C,D)

Table 1: Combined sequence for U1
Session   Key no.   A   B   C   D   E   F
1         1         1   1   0   1   1   0
2         4         0   0   1   1   0   0
3         7         1   1   1   1   0   0

Combined sequence of U2: (A,D,F) (E,F) (F)

Table 2: Combined sequence for U2
Session   Key no.   A   B   C   D   E   F
1         2         1   0   0   1   0   1
2         5         0   0   0   0   1   1
3         8         0   0   0   0   0   1

Combined sequence of U3: (C,E) (A,F) (A,B,D)

Table 3: Combined sequence for U3
Session   Key no.   A   B   C   D   E   F
1         3         0   0   1   0   1   0
2         6         1   0   0   0   0   1
3         9         1   1   0   1   0   0

C. Method 1
Communicating parties: U1 and U4 (say). The occurrence counts of the sequences A/B and D are as follows: AB = 2, D = 3; therefore x1 = 2 and x2 = 3. U1 will compute ((A.M. of 2 and 3) × (H.M. of 2 and 3))^(1/2) and U4 will compute the G.M. of 2 and 3. So, shared key = 6^(1/2). If any occurrence becomes null, then that parameter value is treated as zero.

D. Method 2
Communicating parties: U3 and U4 (say). In the case of U3, the union becomes C E A F B D. So the shared key of U3 and U4 is C E A F B D.

E. Method 3
Communicating parties: U2 and U4 (say). The shared key is based on intersection, and it is F.
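A small sketch of the three computation schemes just described, applied to the combined sequences above; note that AM(x1, x2) × HM(x1, x2) = x1 · x2, so Method 1 reproduces the geometric mean.

```python
import math

u2 = [{"A", "D", "F"}, {"E", "F"}, {"F"}]
u3 = [{"C", "E"}, {"A", "F"}, {"A", "B", "D"}]

# Method 1: geometric mean of two occurrence counts via AM and HM
x1, x2 = 2, 3                      # counts of A/B and D across U1's sessions
am = (x1 + x2) / 2
hm = 2 * x1 * x2 / (x1 + x2)
print(math.sqrt(am * hm))          # sqrt(6) ~ 2.449, same as GM(2, 3)

# Method 2: union of U3's session sequences
print(set.union(*u3))              # {'A', 'B', 'C', 'D', 'E', 'F'}

# Method 3: intersection of U2's session sequences
print(set.intersection(*u2))       # {'F'}
```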




Key using feature based method
Let six messages be sent by the sender, each encrypted by a combination of one or more keys using some function.

Table 4: Association of keys against each message
Message   Keys associated
M1        SK1 = (K1, K3, K4, K6)
M2        SK2 = (K3, K5)
M3        SK3 = (K4, K5, K6)
M4        SK4 = (K2, K3, K5)
M5        SK5 = (K1, K2)
M6        SK6 = (K1, K2, K3, K6)

Table 5: Determination of count and value
Key   Initial value   Count   Value   (Value)^2
K1    0.1             3       0.3     0.09
K2    0.2             3       0.6     0.36
K3    0.3             4       1.2     1.44
K4    0.4             2       0.8     0.64
K5    0.5             3       1.5     2.25
K6    0.6             3       1.8     3.24

Now CF = (x, y, z), where x = number of elements, y = linear sum of the elements and z = sum of the squares of the elements:
CF1 = (4, 4.1, 5.41)
CF2 = (2, 2.7, 3.69)
CF3 = (3, 4.1, 6.13)
CF4 = (3, 3.3, 4.05)
CF5 = (2, 0.9, 0.45)
CF6 = (4, 3.9, 5.13)
So CFnet = accumulation of the maximum of each tuple = (4, 4.1, 6.13), and shared key = floor of the modulus of (4.1 − 6.13) = 2.

Wide-Mouth Frog protocol using variable key
Both Alice and Bob share a secret key with a trusted server, Trent. The keys are used only for key distribution and not to encrypt any actual messages between users. The proposed algorithm is as follows:
1) Alice concatenates a timestamp TA, Bob's name and a technique f to deduce a random session key based on the timestamp and Bob's name. She then encrypts the whole message with the key she shares with Trent and sends it to Trent along with her name. Alice sends: A, EKA(TA, B, f).
2) Trent decrypts the message. For enhanced security, he concatenates a new timestamp TB, Alice's name, the function f and the difference d between TB and TA. He then encrypts the whole message with the key he shares with Bob. Trent sends: EKB(TB, A, f, d). Hence f is an automatic variable based on TB and d.
3) Bob decrypts it. He first verifies the sender's name and computes TA as TA = TB − d.
4) Then he computes f based on TA and the binary form of the ASCII value of his name.
5) Thus he computes K, i.e. the session key with which he will communicate with Alice.
6) In the next iteration TA and KA will be changed, and hence f, and so on.
The main advantage is that nowhere is the key K itself transmitted.

Yahalom protocol using variable key
Both Bob and Alice share a secret key with Trent. Let
RA = nonce chosen by Alice
NB = number chosen by Bob based on RA and A
KA = shared key between Alice and Trent
KB = shared key between Bob and Trent
A = Alice's name, B = Bob's name
K = random session key
1. Alice concatenates her name and a random number and sends it to Bob.
2. Bob computes NB = RA + (binary form of the ASCII value of Alice). He sends Trent B, EKB(A, RA, f), where f is the offset which, when applied on NB, yields RA.
3. Trent generates two messages to Alice: EKA(B, K', RA, f, d) and EKB(A, K', d), where K = random session key = f(K', d).
4. Alice decrypts the first message and extracts K using f(K', d). Alice sends Bob two messages: EKB(A, K', d) and EK(RA, f).
5. Bob decrypts A, K', d and extracts K as f(K', d) = K. Then he extracts NB using f(RA, f) ≡ NB. It is to be remembered that the functions f(K', d) and f(RA, f) should be reversible. Bob then checks whether NB has the same value.
At the end, both Alice and Bob are convinced that they are talking to each other and not to a third party. The advantage is that NB and K are never transmitted; the demerit is the calculation of NB and K using the functions specified.

Analysis of SKEY using variable key
SKEY is mainly a program for authentication, based on a one-way function. The proposed algorithm is as follows:
1. The host computes a Bernoulli trial with a biased coin, for which p = probability of a 1 and q = (1 − p) = probability of a 0. Let the number of trials be n. Assume n = 6 and string = 110011.
2. The host sends the string to Alice.
3. Alice modifies her own public key such that the new public key = previous key + (binary equivalent of the number of 1's present in the string).
4. Alice creates a shared key.
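A minimal sketch of the feature-based computation above, building the CF = (x, y, z) tuples from the Table 4 key sets and Table 5 values:

```python
import math

key_values = {"K1": 0.3, "K2": 0.6, "K3": 1.2, "K4": 0.8, "K5": 1.5, "K6": 1.8}
sk = [  # key sets of Table 4
    ("K1", "K3", "K4", "K6"), ("K3", "K5"), ("K4", "K5", "K6"),
    ("K2", "K3", "K5"), ("K1", "K2"), ("K1", "K2", "K3", "K6"),
]

# CF = (number of elements, linear sum, sum of squares)
cfs = []
for keys in sk:
    vals = [key_values[k] for k in keys]
    cfs.append((len(vals), round(sum(vals), 2),
                round(sum(v * v for v in vals), 2)))
print(cfs)                       # [(4, 4.1, 5.41), (2, 2.7, 3.69), ...]

cf_net = tuple(max(col) for col in zip(*cfs))
print(cf_net)                    # (4, 4.1, 6.13)
shared_key = math.floor(abs(cf_net[1] - cf_net[2]))
print(shared_key)                # 2
```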





5. Alice modifies the public key along with the modification scheme with the shared key.
6. Alice then encrypts the string with her private key and sends it back to the host along with her name.
7. The host first decrypts the public key, accordingly fetches it from Alice's database entry and computes the result.
8. If a match is found, then it performs another level of verification by decrypting the string with the new value of Alice's public key.
9. If that also matches, then the authentication of Alice is certified.

Conclusion
The techniques involved for data prediction in this paper are the regression rule, the probabilistic approach, datum estimation analysis and dispersion theory. We have also shown how pattern matching can be sensed. Several approaches to shared key computation on the basis of data mining techniques have been discussed in detail with relevant mathematical analysis. The variability concept of the key in the Wide-Mouth Frog Protocol, Yahalom Protocol and SKEY Protocol has also been applied in cryptic data mining.

References
[1] Chakrabarti P., et al. (2008) IJCSNS, 8, 7.
[2] Chakrabarti P., et al. Asian Journal of Information Technology, Article ID: 706AJIT.
[3] Chakrabarti P., et al. Asian Journal of Information Technology, Article ID: 743AJIT.
[4] Chakrabarti P., et al. (2008) IJHIS.
[5] Chakrabarti P. (2008) International Conference on Emerging Technologies and Applications in Engineering, Technology and Sciences, Rajkot.
[6] Chakrabarti P. (2008) ICQMOIT08, Hyderabad.
[7] Schneier B. (2008) Applied Cryptography, Wiley-India Edition.


