Page 47 - Demo

P. 47

                                
                                    %u062c%u0645%u064a%u0639 %u0627%u0644%u062d%u0642%u0648%u0642 %u0645%u062d%u0641%u0648%u0638%u0629 %u0640 %u0627%u0625%u0644%u0639%u062a%u062f%u0627%u0621 %u0639%u0649%u0644 %u062d%u0642 %u0627%u0645%u0644%u0624%u0644%u0641 %u0628%u0627%u0644%u0646%u0633%u062e %u0623%u0648 %u0627%u0644%u0637%u0628%u0627%u0639%u0629 %u064a%u0639%u0631%u0636 %u0641%u0627%u0639%u0644%u0647 %u0644%u0644%u0645%u0633%u0627%u0626%u0644%u0629 %u0627%u0644%u0642%u0627%u0646%u0648%u0646%u064a%u062947attribute%u2013value/frequency pair, the buckets are called singleton buckets. Often, buckets instead represent continuous ranges for the given attribute. Buckets are Partitioned according to rules: Equal-width (equal bucket range) where In an equal-width histogram, the width of each bucket range is uniform, Equalfrequency (or equal-depth) where the buckets are created so that, roughly, the frequency of each bucket is constant Clustering: Partition data set into clusters based on similarity, and store cluster representation (e.g., centroid and diameter) only can be very effective if data is clustered but not if data is %u201csmeared%u201d Can have hierarchical clustering and be stored in multidimensional index tree structures, there are many choices of clustering definitions and clustering algorithms. Centroid distance is an alternative measure of cluster quality and is defined as the average distance of each cluster object from the cluster centroid.  Sampling: Is obtaining a small sample s to represent the whole data set N. it allows a mining algorithm to run in complexity that is potentially sub-linear to the size of the data Types of Sampling: a- Simple random sampling: -there is an equal probability of selecting any particular item. b- Sampling without replacement: -once an object is selected, it is removed from the population
41 42 43 44 45 46 47 48 49 50 51