Lehigh University

SWAT abstracts 2005

S. Wang, Y. Guo, A. Qasem, J. Heflin. Rapid Benchmarking for Semantic Web Knowledge Base Systems. Technical Report LU-CSE-05-026. CSE Department, Lehigh University, 2005

We present a method for rapid development of benchmarks for Semantic Web knowledge base systems. At its core is a synthetic data generation approach for OWL that is scalable and models real-world data. The data generation algorithm learns from real domain documents and generates benchmark data based on the extracted properties relevant for benchmarking. We believe that this is important because the relative performance of systems will vary depending on the structure of the ontology and data used. However, due to the novelty of the Semantic Web, we rarely have sufficient data for benchmarking. Our approach helps overcome the problem of having insufficient real-world data for benchmarking and allows us to develop benchmarks for a variety of domains and applications in a very time-efficient manner. Based on our method, we have created a new Lehigh BibTeX Benchmark and conducted an experiment on four Semantic Web knowledge base systems. We have verified our hypothesis about the need for representative data by comparing the experimental results to those of our previous Lehigh University Benchmark. The differences between the two experiments demonstrate the influence of ontology and data on the capability and performance of the systems, and thus the need to use a benchmark that is representative of the systems' intended application. Finally, we evaluated the technique by comparing our synthetic data to real-world data and showed that it is a reasonable substitute when sufficient data is not available.



S. Wang, Y. Guo, A. Qasem, and J. Heflin. Rapid Benchmarking for Semantic Web Knowledge Base Systems. [Accepted] Fourth International Semantic Web Conference, Galway, Ireland, 2005

We present a method for rapid development of benchmarks for Semantic Web knowledge base systems. At its core is a synthetic data generation approach for OWL that is scalable and models real-world data. The data generation algorithm learns from real domain documents and generates benchmark data based on the extracted properties relevant for benchmarking. We believe that this is important because the relative performance of systems will vary depending on the structure of the ontology and data used. However, due to the novelty of the Semantic Web, we rarely have sufficient data for benchmarking. Our approach helps overcome the problem of having insufficient real-world data for benchmarking and allows us to develop benchmarks for a variety of domains and applications in a very time-efficient manner. Based on our method, we have created a new Lehigh BibTeX Benchmark and conducted an experiment on four Semantic Web knowledge base systems. We have verified our hypothesis about the need for representative data by comparing the experimental results to those of our previous Lehigh University Benchmark. The differences between the two experiments demonstrate the influence of ontology and data on the capability and performance of the systems, and thus the need to use a benchmark that is representative of the systems' intended application.



Y. Guo, Z. Pan, and J. Heflin. LUBM: A Benchmark for OWL Knowledge Base Systems. In Journal of Web Semantics, Vol 3, Issue 2, 2005 (currently available via pre-print server http://www.websemanticsjournal.org/)

We describe our method for benchmarking Semantic Web knowledge base systems with respect to use in large OWL applications. We present the Lehigh University Benchmark (LUBM) as an example of how to design such benchmarks. The LUBM features an ontology for the university domain, synthetic OWL data scalable to an arbitrary size, fourteen extensional queries representing a variety of properties, and several performance metrics. The LUBM can be used to evaluate systems with different reasoning capabilities and storage mechanisms. We demonstrate this with an evaluation of two memory-based systems and two systems with persistent storage.




Y. Guo and J. Heflin. On Logical Consequence for Collections of OWL Documents. [Accepted] Fourth International Semantic Web Conference, Galway, Ireland, 2005

In this paper, we investigate the (in)dependence among OWL documents with respect to logical consequence when they are combined, in particular the inference of concept and role assertions about individuals. On the one hand, we present a systematic approach to identifying those documents that affect the inference of a given fact. On the other hand, we consider ways to detect independence quickly. First, we demonstrate several special cases in which two documents are independent of each other. Second, we introduce an algorithm for checking independence in the general case. In addition, we describe two applications in which these results have allowed us to develop novel approaches to overcoming some difficulties with reasoning on large-scale OWL data. Both applications demonstrate the usefulness of this work for improving the scalability of a practical Semantic Web system that relies on reasoning about individuals.

