All rights reserved. Infringing the author's copyright by copying or printing exposes the offender to legal liability.

Data mining often requires data integration, that is, the merging of data from multiple data stores. Careful integration can help reduce and avoid redundancies and inconsistencies in the resulting data set. This can help improve the accuracy and speed of the subsequent data mining process.

1- Entity Identification Problem:
There are a number of issues to consider during data integration. Schema integration and object matching can be tricky. How can equivalent real-world entities from multiple data sources be matched up? This is referred to as the entity identification problem. For example, how can the data analyst or the computer be sure that customer_id in one database and cust_number in another refer to the same attribute? The metadata kept for each attribute can help answer this question. Examples of metadata for each attribute include the name, meaning, data type, and range of values permitted for the attribute, and null rules for handling blank, zero, or null values.

2- Detecting and Resolving Data Value Conflicts:
For the same real-world entity, attribute values from different sources may differ. Possible reasons: different representations or different scales, e.g., metric vs. British units.

3- Handling Redundancy in Data Integration:
Redundant data often occur when multiple databases are integrated.
Object identification: The same attribute or object may have different names in different databases.
Derivable data: One attribute may be a "derived" attribute in another table, e.g., annual revenue.
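The three integration issues listed so far can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the metadata records, the helper names (metadata_compatible, pounds_to_kg, is_derived), and the sample values do not come from the text. Matching metadata only suggests that two attributes are the same; it does not prove it.

```python
import math

# Hypothetical metadata records for two attributes from different databases.
customer_id = {"name": "customer_id", "meaning": "customer identifier",
               "dtype": "int", "range": (1, 999_999), "null_allowed": False}
cust_number = {"name": "cust_number", "meaning": "customer identifier",
               "dtype": "int", "range": (1, 999_999), "null_allowed": False}


def metadata_compatible(a, b):
    """Entity identification: compare all metadata except the attribute name."""
    keys = ("meaning", "dtype", "range", "null_allowed")
    return all(a[k] == b[k] for k in keys)


def pounds_to_kg(pounds):
    """Value conflict: convert a British-unit weight to the metric scale."""
    return pounds * 0.45359237


def is_derived(total, parts, rel_tol=1e-9):
    """Redundancy: check whether `total` is simply the sum of `parts`."""
    return math.isclose(total, sum(parts), rel_tol=rel_tol)


# The names differ, but the remaining metadata agrees.
same_attribute = metadata_compatible(customer_id, cust_number)

# A weight stored in pounds in one source, rescaled before merging.
weight_kg = pounds_to_kg(150.0)

# Annual revenue that can be recomputed from twelve monthly values.
monthly_revenue = [10.0] * 12
annual_is_derived = is_derived(120.0, monthly_revenue)
```

In practice a match on metadata would be treated as a candidate to confirm with a domain expert, and a derived attribute such as annual revenue would usually be dropped or recomputed rather than stored twice.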
Redundant attributes can often be detected by correlation analysis and covariance analysis. Careful integration of the data from multiple sources may help reduce or avoid redundancies and inconsistencies and improve mining speed and quality.

Correlation analysis: Given two attributes, such analysis can measure how strongly one attribute implies the other, based on the available data.
a- For nominal data