Page 118 - Demo

                                
All rights reserved. Infringing the author's rights by copying or printing exposes the offender to legal liability.

A good clustering method produces high-quality clusters with high intra-class similarity (cohesive within clusters) and low inter-class similarity (distinctive between clusters). The quality of a clustering method depends on: a) the similarity measure used by the method, b) its implementation, and c) its ability to discover some or all of the hidden patterns.

Measure the Quality of Clustering:

Dissimilarity/Similarity metric: Similarity is expressed in terms of a distance function, typically a metric: d(i, j). The definitions of distance functions are usually rather different for interval-scaled, boolean, categorical, ordinal, ratio, and vector variables. Weights should be associated with different variables based on applications and data semantics.

Quality of clustering: There is usually a separate "quality" function that measures the "goodness" of a cluster. It is hard to define "similar enough" or "good enough". The answer is typically highly subjective.

Considerations for Cluster Analysis:

1- Partitioning criteria: Single-level partitioning vs. hierarchical partitioning (often, multi-level hierarchical partitioning is desirable).
2- Separation of clusters: Exclusive (e.g., one customer belongs to only one region) and non-exclusive (e.g., one document may belong to more than one class).
3- Similarity measure: Distance-based (e.g., Euclidean, road network, vector) and connectivity-based (e.g., density or contiguity).
4- Clustering space: Full space (often when low-dimensional) vs. subspaces (often in high-dimensional clustering).

The following are typical requirements of clustering in data mining:

1- Scalability: Many clustering algorithms work well on small data sets containing fewer than several hundred data objects; however, a large database may contain millions or even billions of
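
The ideas of cohesion within clusters and separation between clusters can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not a method prescribed by the text: it takes Euclidean distance as d(i, j), adds a weighted variant to show per-variable weights, and uses the mean pairwise distance as one possible "quality" function. The function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Distance function d(i, j) for two points given as equal-length tuples."""
    return math.dist(p, q)


def weighted_euclidean(p, q, weights):
    """Euclidean distance with one non-negative weight per variable (assumed scheme)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))


def mean_intra_cluster_distance(clusters, d=euclidean):
    """Average d(i, j) over pairs of points in the same cluster (lower = more cohesive)."""
    dists = [d(p, q) for pts in clusters for p, q in combinations(pts, 2)]
    return sum(dists) / len(dists)  # assumes at least one same-cluster pair


def mean_inter_cluster_distance(clusters, d=euclidean):
    """Average d(i, j) over pairs of points in different clusters (higher = more distinctive)."""
    dists = [d(p, q)
             for a, b in combinations(clusters, 2)
             for p in a for q in b]
    return sum(dists) / len(dists)  # assumes at least two non-empty clusters


# Hypothetical toy data: two well-separated groups of 2-D points.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]
intra = mean_intra_cluster_distance(clusters)
inter = mean_inter_cluster_distance(clusters)
```

For this toy data the mean intra-cluster distance is small compared with the mean inter-cluster distance, which is the pattern a good clustering is expected to show. How small is "small enough" remains a judgment call, as noted above, and other quality functions could be substituted for the mean pairwise distance.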