SWAT abstracts 2005

S. Wang, Y. Guo, A. Qasem, J. Heflin. Rapid Benchmarking for Semantic Web Knowledge Base Systems.
Technical Report LU-CSE-05-026. CSE Department, Lehigh University, 2005
We present a method for rapid development of benchmarks for Semantic Web
knowledge base systems. At the core, we have a synthetic data generation
approach for OWL that is scalable and models the real world data. The
data-generation algorithm learns from real domain documents and generates
benchmark data based on the extracted properties relevant for benchmarking. We
believe that this is important because relative performance of systems will
vary depending on the structure of the ontology and data used. However, due to
the novelty of the Semantic Web, we rarely have sufficient data for
benchmarking. Our approach helps overcome the problem of having insufficient
real world data for benchmarking and allows us to develop benchmarks for a
variety of domains and applications in a very time efficient manner. Based on
our method, we have created a new Lehigh BibTeX Benchmark and conducted an
experiment on four Semantic Web knowledge base systems. We have verified our
hypothesis about the need for representative data by comparing the experimental
result to that of our previous Lehigh University Benchmark. The difference in
both experiments has demonstrated the influence of ontology and data on the
capability and performance of the systems and thus the need of using a
representative benchmark for the intended application of the systems. Finally,
we evaluated the technique by comparing our synthetic data to real world data
and proved that it is a reasonable substitute when sufficient data is not
available.

S. Wang, Y. Guo, A. Qasem, and J. Heflin. Rapid Benchmarking for Semantic Web Knowledge Base Systems.
[Accepted] Fourth International Semantic Web Conference, Galway, Ireland, 2005
We present a method for rapid development of benchmarks for Semantic Web
knowledge base systems. At the core, we have a synthetic data generation
approach for OWL that is scalable and models the real world data. The
data-generation algorithm learns from real domain documents and generates
benchmark data based on the extracted properties relevant for benchmarking. We
believe that this is important because relative performance of systems will vary
depending on the structure of the ontology and data used. However, due to the
novelty of the Semantic Web, we rarely have sufficient data for benchmarking.
Our approach helps overcome the problem of having insufficient real world data
for benchmarking and allows us to develop benchmarks for a variety of domains
and applications in a very time efficient manner. Based on our method, we have
created a new Lehigh BibTeX Benchmark and conducted an experiment on four
Semantic Web knowledge base systems. We have verified our hypothesis about the
need for representative data by comparing the experimental result to that of our
previous Lehigh University Benchmark. The difference in both experiments has
demonstrated the influence of ontology and data on the capability and
performance of the systems and thus the need of using a representative benchmark
for the intended application of the systems.

Y. Guo, Z. Pan, and J. Heflin. LUBM: A Benchmark for OWL Knowledge Base Systems.
In Journal of Web Semantics, Vol 3, Issue 2, 2005 (currently available via pre-print server http://www.websemanticsjournal.org/)
We describe our method for benchmarking Semantic Web knowledge base systems with
respect to use in large OWL applications. We present the Lehigh University
Benchmark (LUBM) as an example of how to design such benchmarks. The LUBM
features an ontology for the university domain, synthetic OWL data scalable to
an arbitrary size, fourteen extensional queries representing a variety of
properties, and several performance metrics. The LUBM can be used to evaluate
systems with different reasoning capabilities and storage mechanisms. We
demonstrate this with an evaluation of two memory-based systems and two systems
with persistent storage.

Y. Guo, A. and J. Heflin. On Logical Consequence for Collections of OWL Documents.
[Accepted] Fourth International Semantic Web Conference, Galway, Ireland, 2005
In this paper,
we investigate the (in)dependence among OWL documents with respect to the
logical consequence when they are combined, in particular the inference of
concept and role assertions about individuals. On the one hand, we present a
systematic approach to identifying those documents that affect the inference of
a given fact. On the other hand, we consider ways for fast detection of
independence. First, we demonstrate several special cases in which two documents
are independent of each other. Secondly, we introduce an algorithm for checking
the independence in the general case. In addition, we describe two applications
in which the above results have allowed us to develop novel approaches to
overcome some difficulties with reasoning on large scale OWL data. Both
applications demonstrate the usefulness of this work for improving the
scalability of a practical Semantic Web system that relies on the reasoning
about individuals.