
MatchIT

MatchIT is a desktop tool that operates as an Eclipse plug-in or as a standalone application. Its purpose is to automate and facilitate schema-level mapping between disparate data sources using the semantics of the terms used within the schemas. MatchIT drastically reduces the domain expertise, manual effort, and computer resources needed to achieve successful data federation, integration and interoperability.


MatchIT includes WordNet, a lexical ontology of the English language, and also lets users install and use their own ontologies. It currently has built-in importers for XML Schema (XSD) files and JDBC data sources, and it imports all schemas and associated metadata defined in a data source.

Vocabulary Discovery

Before mapping schemas, MatchIT must build a Vocabulary of the schemas. To do so, it uses a variety of techniques to discover and extract individual words from the element names in the schemas, and then maps these words to their equivalents in the installed ontology. The resulting Vocabulary contains all the words used in the schemas, along with their meanings as defined by the ontology. It can be edited as needed to accurately reflect the domain of the schemas, and it can be exported as an OWL file for use in other tools (e.g., Knoodl).
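One common way to extract words from element names is to split them at underscores, hyphens, digits, and camelCase boundaries, expand known abbreviations, and then look each word up in an ontology. The sketch below assumes exactly that approach; the function names, the abbreviation table, and the small dictionary standing in for WordNet are all hypothetical, and MatchIT's actual techniques may differ.

```python
import re

# Hypothetical abbreviation table; a real tool would use a larger, editable list.
ABBREVIATIONS = {"cust": "customer", "num": "number", "addr": "address"}


def split_identifier(name: str) -> list[str]:
    """Split an element name into lowercase words at separators and camelCase boundaries."""
    # "ClientID" -> "Client ID": break between a lowercase letter/digit and an uppercase letter.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    # "XMLFile" -> "XML File": break between an acronym and a following capitalized word.
    spaced = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", spaced)
    # Split on whitespace, underscores, hyphens, and digits; drop empty pieces.
    return [token.lower() for token in re.split(r"[\s_\-\d]+", spaced) if token]


def build_vocabulary(element_names: list[str],
                     ontology: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map every word found in the element names to its senses in the ontology.

    Words the ontology does not know get an empty sense list, so a user can
    fill them in later when editing the Vocabulary.
    """
    vocabulary: dict[str, list[str]] = {}
    for name in element_names:
        for word in split_identifier(name):
            word = ABBREVIATIONS.get(word, word)  # expand known abbreviations
            vocabulary[word] = ontology.get(word, [])
    return vocabulary


# Hypothetical stand-in for an installed ontology such as WordNet.
toy_ontology = {
    "customer": ["someone who pays for goods or services"],
    "client": ["someone who pays for goods or services"],
    "id": ["a means of identification"],
}
print(build_vocabulary(["Cust_Num", "ClientID"], toy_ontology))
```

Here "number" ends up with no senses because the toy ontology lacks it, which is the kind of gap a user would fill by editing the Vocabulary.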


Semantic Mapping

MatchIT uses the Vocabulary to compute the semantic similarity of terms in the schemas to be mapped. It can automatically map any number of source schemas to a single target schema.


Current data federation approaches offer matching capabilities based only on the syntax of terms, not on their semantics. If only syntax is examined, even minor variations in spelling, format, or terminology can make seemingly similar data irreconcilable. For example, a field named Cust_Num and another named ClientID would not be recognized as similar unless the meaning of the terms is considered: Cust is an abbreviation of Customer, Customer and Client are synonyms, and a Number can serve as a form of identification, or ID. Because MatchIT understands these relationships, it can recognize Cust_Num and ClientID as potential matches.
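The Cust_Num/ClientID example can be made concrete with a toy comparison. This sketch assumes a hypothetical synonym table and "is-a" table in place of a real ontology, scores semantic similarity as the Jaccard overlap of the two names' concept sets, and uses Python's `difflib` as a stand-in for purely syntactic matching. It illustrates the idea only; it is not MatchIT's algorithm.

```python
import re
from difflib import SequenceMatcher

# Hypothetical toy lexicon standing in for an ontology such as WordNet.
# Abbreviations and synonyms map to one canonical word.
SYNONYMS = {"cust": "customer", "client": "customer", "num": "number", "identifier": "id"}
# Hypothetical is-a relation: a number is one kind of identifier.
IS_A = {"number": "id"}


def split_identifier(name: str) -> list[str]:
    """Split an element name at separators and camelCase boundaries."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    spaced = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", spaced)
    return [token.lower() for token in re.split(r"[\s_\-\d]+", spaced) if token]


def concepts(name: str) -> set[str]:
    """Map each word to its canonical synonym, then generalize it via the is-a table."""
    result = set()
    for word in split_identifier(name):
        canonical = SYNONYMS.get(word, word)
        result.add(IS_A.get(canonical, canonical))
    return result


def semantic_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two names' concept sets, in [0, 1]."""
    ca, cb = concepts(a), concepts(b)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


def syntactic_similarity(a: str, b: str) -> float:
    """Character-level similarity only; knows nothing about meaning."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


print(syntactic_similarity("Cust_Num", "ClientID"))  # → 0.25
print(semantic_similarity("Cust_Num", "ClientID"))   # → 1.0
```

Both names reduce to the concepts {customer, id}, so they match semantically even though only two characters line up syntactically.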


Candidate matches are scored and ranked by a battery of string, semantic, and aggregate similarity algorithms, and are displayed with their associated metadata. They can be accepted or rejected, and filtered by source, by data type, or by scoring and vocabulary parameters. Derived Vocabularies and their word senses can be edited, and the matching algorithms can be tuned to produce new ranked recommendations and alternative matches. Diagnostics explain each match recommendation, helping users understand the underlying semantics.
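Combining a string score and a semantic score into an aggregate, then filtering and ranking by it, can be sketched as a weighted average with a threshold. In this sketch `difflib`'s `SequenceMatcher.ratio` stands in for the string algorithms, and the semantic scores, the default weight, and the threshold are hypothetical; how MatchIT actually combines its algorithms is not described here.

```python
from difflib import SequenceMatcher
from typing import Callable


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_candidates(source: str,
                    targets: list[str],
                    semantic_similarity: Callable[[str, str], float],
                    semantic_weight: float = 0.7,
                    min_score: float = 0.3) -> list[tuple[str, float]]:
    """Score every target element against one source element, best first.

    The aggregate score is a weighted average of a semantic score and a string
    score; candidates below min_score are filtered out. The default weight and
    threshold are hypothetical tuning parameters.
    """
    candidates = []
    for target in targets:
        score = (semantic_weight * semantic_similarity(source, target)
                 + (1 - semantic_weight) * string_similarity(source, target))
        if score >= min_score:
            candidates.append((target, round(score, 3)))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Hypothetical semantic scores, e.g. from a vocabulary-based comparison.
TOY_SEMANTIC_SCORES = {("Cust_Num", "ClientID"): 1.0, ("Cust_Num", "CustomerName"): 0.33}


def toy_semantic(a: str, b: str) -> float:
    return TOY_SEMANTIC_SCORES.get((a, b), 0.0)


ranked = rank_candidates("Cust_Num", ["ClientID", "CustomerName", "OrderDate"], toy_semantic)
print(ranked)
```

Raising `semantic_weight` favors candidates such as ClientID that share meaning but not spelling; lowering `min_score` surfaces more alternative matches for the user to accept or reject.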


MatchIT enables users to collaborate and to set up and maintain multiple matching projects that persist all results, imports, and exports, including lexicon modifications, discovered vocabularies, and schemas. Logging, notes and annotations allow progress to be monitored and audited and loose ends to be tagged for completion.