Page 41 - Demo
P. 41


                                    %u062c%u0645%u064a%u0639 %u0627%u0644%u062d%u0642%u0648%u0642 %u0645%u062d%u0641%u0648%u0638%u0629 %u0640 %u0627%u0625%u0644%u0639%u062a%u062f%u0627%u0621 %u0639%u0649%u0644 %u062d%u0642 %u0627%u0645%u0644%u0624%u0644%u0641 %u0628%u0627%u0644%u0646%u0633%u062e %u0623%u0648 %u0627%u0644%u0637%u0628%u0627%u0639%u0629 %u064a%u0639%u0631%u0636 %u0641%u0627%u0639%u0644%u0647 %u0644%u0644%u0645%u0633%u0627%u0626%u0644%u0629 %u0627%u0644%u0642%u0627%u0646%u0648%u0646%u064a%u062941b- Regression: Smoothing is done by fitting the data into regression functions. Linear regression involves finding the %u201cbest%u201d line to fit two attributes (or variables) so that one attribute can be used to predict the other. Multiple linear regression is an extension of linear regression, where more than two attributes are involved, and the data are fit to a multidimensional surface. c- Clustering: Outliers may be detected by clustering, for example, where similar values are organized into groups, or %u201cclusters.%u201d Intuitively, values that fall outside of the set of clusters may be considered outliers d- Combined computer and human inspection: This is used to detect suspicious values and check by human (e.g., deal with possible outliers) 3- Data discrepancy: - Discrepancies can be caused by several factors, including poorly designed data entry forms that have many optional fields, human error in data entry, deliberate errors and data decay (outdated addresses). We can proceed with discrepancy detection by check:- a- Metadata:- This is where we can make use of the knowledge we gained about our data For example, what are the data type and domain of each attribute? What are the acceptable values for each attribute? 
                                
   35   36   37   38   39   40   41   42   43   44   45