II. Tasks of Data Preprocessing

The major steps involved in data preprocessing are:

1- Data cleaning: fills in missing values, smooths noisy data, identifies or removes outliers, and resolves inconsistencies. If users believe the data are dirty, they are unlikely to trust the results of any data mining applied to them. (A minimal cleaning sketch follows this list.)

2- Data integration: attributes representing a given concept may have different names in different databases, causing inconsistencies and redundancies. For example, the attribute for customer identification may be referred to as customer id in one data store and cust-id in another. (See the integration sketch after this list.)

3- Data reduction: obtains a reduced representation of the data set that is much smaller in volume yet produces the same (or almost the same) analytical results. Data reduction strategies include the following (a reduction sketch is given after this list):
   a- Dimensionality reduction: data encoding schemes are applied to obtain a reduced or "compressed" representation of the original data. Examples include data compression techniques (wavelet transforms and principal components analysis), attribute subset selection (removing irrelevant attributes), and attribute construction (deriving a small set of more useful attributes from the original set).
   b- Numerosity reduction: the data are replaced by alternative, smaller representations using parametric models (regression or log-linear models) or nonparametric models (histograms, clusters, sampling, or data aggregation).

4- Data transformation and data discretization: data transformation routines convert the data into forms appropriate for mining. This can be done by (see the transformation sketch after this list):
   a- Smoothing: uses binning, regression, and clustering to remove noise from the data.
   b- Attribute construction: new attributes are constructed and added from the given set of attributes.
   c- Aggregation: aggregation operations are performed on the data, for example, summarizing daily sales into monthly totals.
   d- Normalization: the attribute data are scaled so as to fall within a smaller range, such as 0.0 to 1.0.
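A minimal data-cleaning sketch in Python, assuming a pandas DataFrame with a hypothetical numeric column "price": the missing value is filled with the column mean, and values far outside the interquartile range are dropped as outliers. This is only one of several reasonable cleaning choices.

    import pandas as pd

    # Hypothetical raw data containing a missing value and an obvious outlier.
    df = pd.DataFrame({"price": [10.0, 12.0, None, 11.0, 13.0, 900.0]})

    # Fill the missing value with the column mean (one simple cleaning strategy).
    df["price"] = df["price"].fillna(df["price"].mean())

    # Identify outliers with the interquartile-range rule and drop them.
    q1, q3 = df["price"].quantile(0.25), df["price"].quantile(0.75)
    iqr = q3 - q1
    df = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    print(df)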
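A small data-integration sketch under the same assumptions, using two hypothetical data stores in which the customer identifier is named customer_id in one and cust-id in the other; the naming inconsistency is resolved by renaming before the records are merged on the shared key.

    import pandas as pd

    # Hypothetical records from two different data stores.
    store_a = pd.DataFrame({"customer_id": [1, 2], "city": ["Cairo", "Giza"]})
    store_b = pd.DataFrame({"cust-id": [1, 2], "orders": [5, 3]})

    # Reconcile the attribute names, then integrate on the common key.
    store_b = store_b.rename(columns={"cust-id": "customer_id"})
    merged = store_a.merge(store_b, on="customer_id", how="inner")

    print(merged)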
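A sketch of the two reduction strategies, assuming scikit-learn is available and using randomly generated data: principal components analysis compresses four hypothetical attributes into two components (dimensionality reduction), and random sampling keeps a fraction of the tuples as a smaller representation (numerosity reduction).

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    # Hypothetical data set: 100 tuples described by 4 attributes.
    X = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a1", "a2", "a3", "a4"])

    # Dimensionality reduction: keep only 2 principal components.
    X_reduced = PCA(n_components=2).fit_transform(X)
    print(X_reduced.shape)   # (100, 2)

    # Numerosity reduction: a 10% random sample stands in for the full data.
    sample = X.sample(frac=0.1, random_state=0)
    print(sample.shape)      # (10, 4)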
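A transformation and discretization sketch for a hypothetical "age" attribute: min-max normalization rescales the values to the range 0.0 to 1.0, and pd.cut discretizes them into three equal-width bins, a simple form of binning.

    import pandas as pd

    ages = pd.Series([18, 25, 33, 40, 58, 62], name="age")

    # Normalization: min-max scaling so the values fall within [0.0, 1.0].
    age_norm = (ages - ages.min()) / (ages.max() - ages.min())

    # Discretization: equal-width binning into three labelled intervals.
    age_bins = pd.cut(ages, bins=3, labels=["young", "middle", "senior"])

    print(pd.DataFrame({"age": ages, "normalized": age_norm, "bin": age_bins}))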