Similarity Computations with an OAI-PMH Aggregator
Research Question: How can we measure the similarity of metadata records using an OAI-PMH aggregator?
Methodology: We used the Vector Space Model (VSM) to compute the cosine similarity between records, based on the terms they share, with each term weighted by its approximate importance. We implemented an OAI-PMH aggregator that re-exports the results of the similarity calculations in each record's optional "about" container. We harvested 3,751 metadata records from NASA's Langley Technical Reports Server and computed similarities among these records.
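The methodology does not name the term-weighting scheme. The following is a minimal sketch that assumes TF-IDF weights stand in for each term's "approximate importance"; this is an assumption, not necessarily the authors' scheme. Each record's text becomes a sparse term-weight vector, and the similarity of two vectors a and b is dot(a, b) / (|a| * |b|). The helper names `tokenize`, `tfidf_vectors`, and `cosine`, and the sample titles, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters; a real system
    # would likely also drop stopwords and apply stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict: term -> weight) for each document.

    TF-IDF is an assumed weighting scheme, used here for illustration.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # A term that occurs in every document gets idf = log(1) = 0.
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical record titles, not taken from the harvested collection.
    records = [
        "Wind tunnel test of a supersonic transport model",
        "Supersonic transport model wind tunnel results",
        "Composite material fatigue under thermal loading",
    ]
    vecs = tfidf_vectors(records)
    print(round(cosine(vecs[0], vecs[1]), 3))  # shared terms, so between 0 and 1
    print(cosine(vecs[0], vecs[2]))  # no shared terms, so 0.0
```

Storing vectors as dictionaries keeps each comparison proportional to the number of distinct terms in the two records rather than to the full vocabulary, which helps when thousands of records are compared.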
Results: Our aggregator successfully computed similarities for the harvested metadata records, and the results proved useful for detecting duplicate records, similar records, and metadata errors.
Implications: This proof of concept shows that an OAI-PMH aggregator can compute similarities among metadata records. Such similarities have several uses, including detecting duplicate records, finding additional versions of the same work, and recommending similar documents. The method can be applied to the open corpus of OAI-PMH metadata to improve metadata management and discovery.
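OAI-PMH lets each record carry an optional "about" container next to its header and metadata, with contents defined by a separate XML schema. The sketch below shows one way an aggregator might package precomputed similarity scores there, ranking the closest records as recommendations and flagging very high scores as possible duplicates. The namespace `http://example.org/hypothetical/similarity`, the elements `similarity` and `related`, the `possibleDuplicate` attribute, the 0.9 threshold, the identifiers, and the `build_about` helper are all hypothetical and are not the format used in the article.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
# Hypothetical namespace for the similarity payload; OAI-PMH expects each
# "about" container's contents to be defined by its own XML schema.
SIM_NS = "http://example.org/hypothetical/similarity"

ET.register_namespace("", OAI_NS)
ET.register_namespace("sim", SIM_NS)


def build_about(scores, top_k=3, duplicate_threshold=0.9):
    """Return an OAI-PMH <about> element listing the top_k most similar records.

    scores maps OAI identifiers of other records to cosine similarity values.
    Scores at or above duplicate_threshold are flagged as possible duplicates.
    """
    about = ET.Element(f"{{{OAI_NS}}}about")
    container = ET.SubElement(about, f"{{{SIM_NS}}}similarity")
    # Highest similarity first, so the first entries act as recommendations.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for identifier, score in ranked[:top_k]:
        related = ET.SubElement(container, f"{{{SIM_NS}}}related")
        related.set("identifier", identifier)
        related.set("score", f"{score:.3f}")
        if score >= duplicate_threshold:
            related.set("possibleDuplicate", "true")
    return about


if __name__ == "__main__":
    # Hypothetical identifiers and scores for one record's neighbours.
    scores = {
        "oai:example:rec-2": 0.97,
        "oai:example:rec-7": 0.41,
        "oai:example:rec-9": 0.12,
        "oai:example:rec-4": 0.05,
    }
    print(ET.tostring(build_about(scores), encoding="unicode"))
```

A harvester that understands such a schema could then read recommendations and duplicate candidates directly from the aggregator's ListRecords responses instead of recomputing the similarities itself.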
Link to Article: https://arxiv.org/abs/0401001v1
Authors:
arXiv ID: 0401001v1