

All rights reserved. Infringement of the author's rights by copying or printing exposes the offender to legal liability.

where n is the number of tuples, $\bar{A}$ and $\bar{B}$ are the respective means of A and B, $\sigma_A$ and $\sigma_B$ are the respective standard deviations of A and B, and $\sum a_i b_i$ is the sum of the AB cross-products. If $r_{A,B} > 0$, A and B are positively correlated (A's values increase as B's do), and the higher the value, the stronger the correlation. $r_{A,B} = 0$ means A and B are uncorrelated (no linear relationship, which by itself does not establish independence), and $r_{A,B} < 0$ means they are negatively correlated.

VII. Data Reduction

Data reduction techniques obtain a reduced representation of the data set that is much smaller in volume, so that mining on the reduced data set is more efficient yet produces the same (or almost the same) analytical results. This matters because a database or data warehouse may store terabytes of data, and complex data analysis may take a very long time to run on the complete data set.

Data reduction strategies:

1- Dimensionality reduction: the process of reducing the number of random variables or attributes under consideration. As dimensionality increases, data becomes increasingly sparse, and the density and distance between points, which are critical to clustering and outlier analysis, become less meaningful. Dimensionality reduction helps eliminate irrelevant features and reduce noise, reduces the time and space required in data mining, and allows easier visualization.
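The loss of meaningful distances in high dimensions can be sketched numerically. The snippet below is only an illustration under stated assumptions: points are drawn uniformly from the unit cube, and the function name `relative_contrast`, the sample size, and the seed are hypothetical choices, not part of the original notes. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (max distance - min distance) / min distance from a random query point.

    Assumption: points are uniform in the unit cube [0, 1]^dim.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast tends to shrink as the number of dimensions grows.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

When the contrast is small, the nearest and farthest neighbours are almost equally far away, which is why distance-based clustering and outlier detection tend to degrade as attributes are added.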
Dimensionality reduction techniques include wavelet transforms, principal component analysis, and attribute subset selection.

2- Numerosity reduction: reduce data volume by choosing alternative, smaller forms of data representation. There are two kinds of methods:

a- Parametric methods (e.g., regression): assume the data fits some model, estimate the model parameters, store only the parameters, and discard the data (except possible outliers). Ex.: log-linear models, which obtain the value at a point in m-dimensional space as a product over appropriate marginal subspaces.

b- Non-parametric methods: do not assume a model. Major families: histograms, clustering, sampling.

Parametric data reduction includes regression and log-linear models:
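As a rough illustration of the parametric idea (fit a model, keep only its parameters and any outliers), the sketch below fits a straight line by ordinary least squares. The helper names `fit_line` and `reduce_with_line`, the tolerance `tol`, and the sample data are hypothetical and chosen only for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def reduce_with_line(xs, ys, tol):
    """Keep only the two model parameters plus points the line fits badly.

    A point is treated as an outlier when its residual exceeds `tol`
    (an assumed, user-chosen threshold).
    """
    slope, intercept = fit_line(xs, ys)
    outliers = [
        (x, y)
        for x, y in zip(xs, ys)
        if abs(y - (slope * x + intercept)) > tol
    ]
    return slope, intercept, outliers


if __name__ == "__main__":
    xs = list(range(10))
    ys = [2 * x + 1 for x in xs]
    ys[5] = 40  # one point that does not follow the line
    slope, intercept, outliers = reduce_with_line(xs, ys, tol=10)
    # Ten (x, y) pairs are replaced by two parameters and the stored outliers.
    print(slope, intercept, outliers)
```

Values can then be approximated from `slope * x + intercept` instead of being read from the stored data, at the cost of whatever error the model introduces.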
                                