SemSearch 2010: International Semantic Search Workshop at WWW 2010
Call For Papers
Deadline for submissions:
March 6th, 2010 (12:00 AM, GMT)
Notification of acceptance:
March 28th, 2010
April 6th, 2010
April 26th-30th, 2010
April 26th, 2010
In recent years we have witnessed tremendous interest in, and substantial economic exploitation of, search technologies, both at web and enterprise scale. However, in existing search appliances the representation of user queries and resource content still relies almost exclusively on simple syntax-based descriptions of the resource content and the information need, as in the predominant keyword-centric paradigm (i.e. keyword queries matched against a bag-of-words document representation).
On the other hand, recent advances in the field of semantic technologies have resulted in tools and standards that allow for the articulation of domain knowledge in a formal manner at a high level of expressivity. At the same time, semantic repositories and reasoning engines have only now advanced to a state where querying and processing of this knowledge can scale to realistic IR scenarios.
In parallel to these developments, in the past years we have also seen the emergence of important results in adapting ideas from IR to the problem of search in RDF/OWL data, folksonomies, microformat collections or semantically tagged natural text. Common to these scenarios is that the search is focused not on a document collection, but on metadata (which may be linked to or embedded in textual information). Search and ranking in metadata stores is another key topic addressed by the workshop.
As such, semantic technologies are now in a state to provide significant contributions to IR problems.
In this context, several challenges arise for Semantic Search systems. These include, among others:
* How can semantic technologies be exploited to capture the information need of the user?
* How can the information need of the user be translated to expressive formal queries without requiring the user to handle a difficult query syntax?
* How can expressive resource descriptions be extracted (acquired) from documents (users)?
* How can expressive resource descriptions be stored and queried efficiently on a large scale?
* How can vague information needs and incomplete resource descriptions be handled?
* How can semantic search systems be evaluated and compared with standard IR systems?
In this context, challenges for Semantic Search research will include, among others:
* How can semantic technologies be applied to IR problems?
* How to address scalability and effectiveness of data Web search (by applying IR technologies)?
* How to allow web users to exploit the expressiveness of the semantic data on the Web? That is, how to lower the technical barriers for users to ask complex questions and to interact with web data to obtain concrete answers for complex needs?
* And most importantly, how can this new generation of search systems, which successfully exploit semantics for IR or for data Web search, be evaluated and compared (with standard IR systems or semantic repositories)?
Topics of Interest
Semantic Search is defined through two main directions. First is Semantic-driven IR, the application of semantic technologies to the IR problem. The second is Semantic Data Search, which mainly deals with the retrieval of semantic data. Main topics of interest for the envisioned workshop contributions include (but are not limited to) the following:
Semantic-driven IR
* Expressive Document Models
* Knowledge Extraction for Building Expressive Document Representation
* Matching and Ranking based on Expressive Document Representation
* Infrastructure for Semantic-driven IR
Semantic Data Search
* Crawling, Storage and Indexing of Semantic Data
* Semantic Data Search and Ranking
* Data Web Search: Search in Multi-Data-Source, Multi-Repository Scenarios
* Dealing with Vague, Incomplete and Dirty Semantic Data
* Infrastructure for Searching Semantic Data on the Web
Interaction Paradigms for Semantic Search
* Natural Language Interfaces
* Keyword-based Query Interfaces
* Hybrid Query Interfaces (A Combination of NL, Keywords, Forms, Facets, and Formal Queries)
* Visualization of Semantic Data and Expressive Document Representation on the Web
Evaluation of Semantic Search
* Evaluation Methodologies for Semantic Search
* Standard Datasets and Benchmarks for Semantic Search
* Infrastructure for Semantic Search Evaluation
Evaluation for Entity Search Track
Our ultimate goal is to develop a benchmark on the basis of which semantic search systems can be compared and analysed in a systematic fashion. Clearly, semantics can be used for different tasks (document vs. data retrieval) and can be exploited throughout the search process (for more usable query construction, for better matching and ranking, for richer results presentation, etc.). Hence, such a benchmark shall enable the study of different aspects of semantic search systems.
For this workshop, we will initially focus on the aspects of matching and ranking in the semantic data search scenario. In particular, we aim to analyze the effectiveness, efficiency and robustness of those features of semantic search systems which are ready to be applied to the Web today. A large share of Web search queries issued today are about entities, i.e. they are entity search queries, and there is a large and increasing amount of semantic data about entities on the Web. The research questions we aim to tackle are:
* How well do semantic data search engines perform on the task of Entity Search on the Web?
* What are the underlying concepts and techniques that make up the differences?
For answering these questions, we provide the following guidelines and support for evaluating entity search systems:
Queries: We provide a set of queries that are focused on the task of entity search. These queries represent a sample extracted from the Yahoo Web search query log. Every query is a plain list of keywords.
Data: We provide a corpus of datasets which contain entity descriptions in the form of RDF. They represent a sample of Web data extracted from publicly available sources (selected LOD datasets such as DBpedia). In addition, a large number of entity descriptions come from data associated with Web pages (Microformats, RDFa).
Relevance Judgement: The search systems produce lists of at most 10 entities ordered by relevance. These results have to be drawn from data in the corpus. Results will be evaluated on a three-point scale: (0) Not Relevant, (1) Relevant and (3) Perfect Match. A perfect match is a description of a resource that matches the entity to be retrieved by the query. A relevant result is a resource description that is related to the entity, i.e. the entity is contained in the description of that resource. Otherwise, a resource description is not relevant.
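To make the expected input and output concrete, the following is a minimal sketch of a keyword entity search over RDF-style triples that returns at most 10 entities ordered by score. It is an illustration only, not the workshop's reference system: the triples, the `ex:` identifiers, the function names and the keyword-overlap scoring are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy corpus of (subject, predicate, object) triples.
# These are made-up examples, not part of the workshop corpus.
TRIPLES = [
    ("ex:Berlin", "rdfs:label", "Berlin"),
    ("ex:Berlin", "ex:country", "Germany"),
    ("ex:Berlin_Wall", "rdfs:label", "Berlin Wall"),
    ("ex:Paris", "rdfs:label", "Paris"),
    ("ex:Paris", "ex:country", "France"),
]


def build_entity_index(triples):
    """Collect the object values of each subject as a list of lowercase tokens."""
    index = defaultdict(list)
    for subject, _predicate, obj in triples:
        index[subject].extend(obj.lower().split())
    return index


def search(keywords, index, limit=10):
    """Rank entities by the fraction of query keywords found in their description."""
    query = [k.lower() for k in keywords]
    scored = []
    for entity, tokens in index.items():
        token_set = set(tokens)
        hits = sum(1 for k in query if k in token_set)
        if hits:
            scored.append((hits / len(query), entity))
    # Highest score first; ties are broken by entity identifier for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entity for _score, entity in scored[:limit]]


INDEX = build_entity_index(TRIPLES)
RESULTS = search(["berlin", "germany"], INDEX)
```

Here the query is a plain list of keywords and the result is a ranked list of entity identifiers capped at 10. A real system would replace the overlap score with a proper ranking function.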
Evaluation Process: To participate, each system will have to run the provided queries on the corpus. We will provide a benchmarking system for participants to submit their results. The assessment of the results will be performed manually using Amazon Mechanical Turk. Based on the relevance judgements, recall, precision, F-measure and mean average precision will be computed and used as the basis for comparing the performance of search systems. The process of result submission, assessment and evaluation feedback will be supported by the benchmarking system.
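As a rough illustration of the measures named above, the sketch below computes precision, recall, F-measure and mean average precision from ranked result lists and graded judgements. It is not the benchmarking system's actual code. It assumes that any grade above 0 counts as relevant, that recall is measured against the judged relevant entities only, and that a ranked list contains no duplicates. All names and sample data are hypothetical.

```python
def relevant_set(grades, threshold=1):
    """Binarize graded judgements: grades >= threshold count as relevant (an assumption)."""
    return {entity for entity, grade in grades.items() if grade >= threshold}


def precision_recall_f1(ranked, relevant):
    """Set-based precision, recall and F1 for one ranked result list."""
    retrieved = set(ranked)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant entity appears."""
    hits = 0
    precision_sum = 0.0
    for rank, entity in enumerate(ranked, start=1):
        if entity in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs, judgements):
    """Average the per-query average precision over all judged queries."""
    scores = [
        average_precision(runs.get(query, []), relevant_set(grades))
        for query, grades in judgements.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical judgements on the 0 / 1 / 3 scale and two submitted runs.
JUDGEMENTS = {"q1": {"a": 3, "b": 0, "c": 1}, "q2": {"d": 1, "e": 0}}
RUNS = {"q1": ["a", "b", "c"], "q2": ["e", "d"]}

P, R, F1 = precision_recall_f1(RUNS["q1"], relevant_set(JUDGEMENTS["q1"]))
MAP = mean_average_precision(RUNS, JUDGEMENTS)
```

Binarizing the grades discards the distinction between a relevant result and a perfect match. A graded measure would keep it, but the text above names only the binary measures.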
* Marko Grobelnik, Jožef Stefan Institute, Ljubljana, Slovenia
* Peter Mika, Yahoo! Research, Barcelona, Spain
* Thanh Tran Duc, Institute AIFB, University of Karlsruhe (TH), Germany
* Haofen Wang, Apex Lab, Shanghai Jiao Tong University, China
* Bettina Berendt, Katholieke Universiteit Leuven, Belgium
* Wray Buntine, NICTA Canberra, Australia
* Pablo Castells, Universidad Autónoma de Madrid, Spain
* Gong Cheng, Nanjing University, China
* Mathieu d'Aquin, KMi, Open University, England
* Miriam Fernandez, KMi, Open University, England
* Blaz Fortuna, Jožef Stefan Institute, Slovenia
* Lise Getoor, University of Maryland, USA
* Rayid Ghani, Accenture Labs, USA
* Peter Haase, Fluid Operations, Walldorf, Germany
* Harry Halpin, University of Edinburgh, Scotland
* Andreas Harth, Institute AIFB, Karlsruhe Institute of Technology, Germany
* Michiel Hildebrand, Centre for Mathematics and Computer Science Amsterdam, Netherlands
* Wei Jin, North Dakota State University, USA
* Guenter Ladwig, Institute AIFB, Karlsruhe Institute of Technology, Germany
* Yuzhong Qu, Southeast University, Nanjing, China
* Sergej Sizov, University of Koblenz-Landau, Germany
* Kavitha Srinivas, IBM Research, Hawthorne, USA
* Nenad Stojanovic, FZI Karlsruhe, Germany
* Rudi Studer, Institute AIFB, University of Karlsruhe, Germany
* Cao Hoang Tru, HCMC University of Technology, HCMC, Vietnam
* Giovanni Tummarello, DERI, Galway, Ireland
* Yong Yu, Apex Lab, Shanghai Jiao Tong University, China
* Valentin Zacharias, FZI, Germany
* Ilya Zaihrayeu, University of Trento, Italy
* Hugo Zaragoza, Yahoo! Research Barcelona, Spain
* Lei Zhang, IBM Research, China
Submission and Proceedings
For submissions, the following rules apply:
* Full technical papers: up to 10 pages in ACM format
* Short position or demo papers: up to 5 pages in ACM format
* Submissions must be formatted using the WWW2010 templates available here.
* Submissions will be peer reviewed by three independent reviewers. Accepted papers will be presented at the workshop and included in the workshop proceedings.
* We will pursue a journal special issue on the topics of the workshop if we receive an appropriate number of high-quality submissions.
* Details on the proceedings and camera-ready formatting will be announced upon notification of the authors.
* Please submit your paper via the EasyChair submission system for SemSearch10.
The organization committee can be reached via firstname.lastname@example.org