SWAT abstracts 2007

Z. Pan, A. Qasem, S. Kanitkar, F. Prabhakar and J. Heflin. Hawkeye: A Practical Large Scale Demonstration of Semantic Web Integration. In Proc. of the 3rd International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS'07). Vilamoura, Algarve, Portugal, Nov 27, 2007.
We discuss our DLDB knowledge base system and evaluate its capability in processing a very large set of real-world Semantic Web data. Using DLDB, we have constructed the Hawkeye knowledge base, in which we have loaded more than 166 million facts from a diverse set of real-world data sources. We use this knowledge base to demonstrate realistic integration queries in e-government and academic scenarios. In order to support Hawkeye, we extended DLDB with additional reasoning capabilities. At present, the Semantic Web consists of numerous independent ontologies. We demonstrate that OWL can be used to integrate these ontologies and thereby integrate the data sources that commit to them. In terms of performance, we show that the load time of our system is linear in the number of triples loaded. Furthermore, we show that many complex queries have response times under one minute, and that simple queries can be answered in seconds.

A. Qasem and J. Heflin. Efficient Selection and Integration of Data Sources for Answering Semantic Web Queries. In Proc. of Workshop on New forms of reasoning for the Semantic Web: scaleable, tolerant and dynamic, ISWC 07, Busan, Korea.
We present an approach to identifying the minimal set of potentially relevant Semantic Web data sources for a given query. Our solution involves the adaptation of an efficient information integration algorithm that has polynomial time complexity. We then use these selected sources and an OWL reasoner to answer queries on the Semantic Web. We introduce a concept of source relevance expressed in OWL to reduce the number of sources needed to get the answers to a query.
As the Semantic Web is an autonomous entity, some of the data sources may contain data that are not described directly in terms of a given query ontology. In our solution we define and use a mapping language that is a subset of OWL for the purpose of aligning heterogeneous ontologies. Our implemented system supports a subset of SPARQL queries, simple OWL ontologies and data sources that commit to them. Since the time to load sources is a dominating factor in performance, and our system identifies the minimal set of potentially relevant sources, it is very efficient. We have conducted an experiment using synthetic ontologies and data sources which demonstrates that our system performs well over a wide range of queries. A typical response time for a given workload of 20 domain ontologies, 20 map ontologies and 400 data sources is approximately 1 second. Furthermore, our system returned correct answers to 200 randomly generated queries in three different data configurations.

A. Chitnis, A. Qasem and J. Heflin. Benchmarking Reasoners for Multi-Ontology Applications. In Proc. of Workshop on Evaluation of Ontologies and Ontology-Based Tools, ISWC 07, Busan, Korea, 2007.
We describe an approach to create a synthetic workload for large scale extensional query answering experiments. The workload comprises multiple interrelated domain ontologies, data sources which commit to these ontologies, synthetic queries and map ontologies that specify a graph over the domain ontologies. Some of the important parameters of the system are the average number of classes and properties of the source ontology that are mapped to terms of the target ontology, and the number of data sources per ontology. The ontology graph is described by various parameters such as its diameter, the number of ontologies and the average out-degree of an ontology node. These parameters give a significant degree of control over the graph topology.
This graph of ontologies is the central component of our synthetic workload that effectively represents a web of data.

Y. Guo and J. Heflin. Document-Centric Query Answering for the Semantic Web. 2007 IEEE/WIC/ACM International Conference on Web Intelligence (WI '07), pp. 409-415, 2007.
In this paper, we propose document-centric query answering, a novel form of query answering for the Semantic Web. We discuss how we have built a knowledge base system to support the new queries. In particular, we describe the key techniques used in the system in order to address scalability issues. In addition, we show encouraging experimental results.

A. Qasem, D. Dimitrov, and J. Heflin. An Efficient and Complete Distributed Query Answering System for Semantic Web Data. Technical Report LU-CSE-07-007, Dept. of Computer Science and Engineering, 2007.
In this work we consider the problem of answering queries using distributed Semantic Web data sources. We define a mapping language that is a subset of OWL for the purpose of aligning heterogeneous ontologies. In order to answer queries we provide a two-step solution. First, given a query we identify potentially relevant sources, which we call the source selection problem. We adapt an information integration algorithm to provide a complete solution to this problem in polynomial time. Second, we load these selected sources into an OWL reasoner, and thereby achieve complete answers to the queries. Since the time to load sources is a dominating factor in performance, and our system identifies the minimal set of potentially relevant sources, it is very efficient. We have conducted an experiment using synthetic ontologies and data sources which demonstrates that our system performs well over a wide range of queries. A typical response time for a given workload of 20 domain ontologies, 20 map ontologies and 400 data sources is just a little over 1 second.

Z. Pan, A. Qasem, S. Kanitkar, F. Prabhakar, and J. Heflin. Hawkeye: A Practical Large Scale Demonstration of Semantic Web Integration. Technical Report LU-CSE-07-006, Dept. of Computer Science and Engineering, 2007.
At present, the Semantic Web consists of numerous independent ontologies. We put forward that OWL can be used to integrate these ontologies and thereby integrate the data sources that commit to them. In this paper we present the Hawkeye knowledge base, in which we have loaded more than 166 million facts from a diverse set of real-world data sources. In order to support Hawkeye, we extended our DLDB knowledge base system with additional reasoning capabilities. DLDB is a system that, given sufficient OWL descriptions, can answer queries that span heterogeneous data sources. We use the Hawkeye knowledge base to demonstrate realistic integration queries in e-government and academic scenarios. For example, our system can produce answers that integrate Citeseer and DBLP. We achieve this integration in a declarative way by only using OWL. These queries cannot be answered by traditional search engines. Furthermore, we show that many complex queries have response times under one minute, and that simple queries can be answered in seconds.

Y. Guo, A. Qasem, Z. Pan and J. Heflin. A Requirements Driven Framework for Benchmarking Semantic Web Knowledge Base Systems. In IEEE Transactions on Knowledge and Data Engineering: Special Issue: Knowledge and Data Engineering in the Semantic Web Era, 2007.
A key challenge for the Semantic Web is to acquire the capability to effectively query
large knowledge bases. As there will be several competing systems, we need benchmarks that
will objectively evaluate these systems. Development of effective benchmarks in an emerging
domain is a challenging endeavor. In this paper, we propose a requirements driven framework
for developing benchmarks for Semantic Web Knowledge Base Systems (SW KBSs). We
make two major contributions. First, we provide a list of requirements for SW KBS
benchmarks. This can serve as an unbiased guide to both the benchmark developers and
personnel responsible for systems acquisition and benchmarking. Second, we provide an
organized collection of techniques and tools needed to develop such benchmarks. In particular,
the collection contains a detailed guide for generating benchmark workload, defining
performance metrics and interpreting experimental results.