

Discretization divides the range of a continuous attribute into intervals; interval labels can then replace the actual data values, which reduces data size. Discretization can be supervised or unsupervised, can be performed recursively on an attribute, and prepares the data for further analysis, e.g., classification.
Methods of Discretization:
a- Discretization by Binning: binning is a top-down splitting technique based on a specified number of bins. Binning methods are also used for data reduction and concept hierarchy generation. For example, attribute values can be discretized by applying equal-width or equal-frequency binning, and then replacing each bin's values by the bin mean or median.
b- Discretization by Histogram Analysis: histogram analysis is an unsupervised discretization technique because it does not use class information.
c- Discretization by Cluster, Decision Tree, and Correlation Analyses: cluster analysis is a popular data discretization method. A clustering algorithm can be applied to discretize a numeric attribute A by partitioning the values of A into clusters or groups; it is unsupervised and can use top-down splitting or bottom-up merging. Decision tree analysis is supervised and uses top-down splitting. Correlation analysis is supervised and uses bottom-up merging.
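The binning method in (a) can be sketched in code: partition the values into equal-width bins, then replace each value by the mean of its bin (smoothing by bin means). The sample data and the choice of 3 bins are assumptions for illustration, not taken from the text.

```python
# Sketch of discretization by binning: equal-width bins, then each
# value is replaced by its bin mean. Data and bin count are assumed.
def equal_width_bin_means(values, n_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Assign each value to a bin index in [0, n_bins - 1].
    def bin_index(v):
        i = int((v - lo) / width)
        return min(i, n_bins - 1)  # clamp so the maximum falls in the last bin
    bins = [[] for _ in range(n_bins)]
    for v in values:
        bins[bin_index(v)].append(v)
    means = [sum(b) / len(b) if b else None for b in bins]
    # Replace each value by the mean of its bin.
    return [means[bin_index(v)] for v in values]

data = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
print(equal_width_bin_means(data, 3))
```

Replacing each bin by its median instead of its mean only requires swapping the aggregation step.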
Simple Discretization by Binning: In equal-width (distance) partitioning, we divide the range into N intervals of equal size. If A and B are the lowest and highest values of the attribute, the width of the intervals is W = (B - A)/N. This is the most straightforward approach, but outliers may dominate the presentation, and skewed data is not handled well. In equal-depth (frequency) partitioning, the range is divided into N intervals, each containing approximately the same number of samples; this gives good data scaling, but managing categorical attributes can be tricky.
Example: Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34. Partition into equal-frequency (equi-depth) bins:
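The equal-depth partitioning of the price example can be sketched as follows; since the text does not state a bin count, the choice of 3 bins is an assumption for illustration.

```python
# Minimal sketch of equal-depth (equal-frequency) partitioning:
# split the sorted data into n_bins groups of approximately equal
# size (the last bin absorbs any remainder). 3 bins is an assumption.
def equal_depth_bins(sorted_values, n_bins):
    n = len(sorted_values)
    size = n // n_bins  # approximate number of samples per bin
    return [sorted_values[i * size:(i + 1) * size if i < n_bins - 1 else n]
            for i in range(n_bins)]

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
for b in equal_depth_bins(prices, 3):
    print(b)
```

With 12 sorted values and 3 bins, each bin receives 4 samples, regardless of how the values are spread over the range.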
                                