FGGSIR 2010 : Feature Generation and Selection for Information Retrieval


When Jul 23, 2010 - Jul 23, 2010
Where Geneva, Switzerland
Submission Deadline May 30, 2010
Notification Due Jun 25, 2010
Final Version Due Jul 5, 2010
Categories    information retrieval

Call For Papers

Feature Generation and Selection for Information Retrieval
Workshop at the 33rd Annual ACM SIGIR Conference (SIGIR 2010)

July 23, 2010
Geneva, Switzerland


We solicit submissions for the Workshop on Feature Generation
and Selection for Information Retrieval, to be held on July 23,
2010, in Geneva, Switzerland, in conjunction with the 33rd
Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR 2010). The workshop
will bring together researchers and practitioners from academia
and industry to discuss the latest developments in various
aspects of feature generation and selection for textual
information retrieval.

Modern information retrieval systems facilitate information
access at unprecedented scale and level of sophistication.
However, in many cases the underlying representation of text
remains quite simple, often limited to using a weighted bag of
words. Over the years, several approaches to automatic feature
generation have been proposed (such as Latent Semantic
Indexing, Explicit Semantic Analysis, Hashing, and Latent
Dirichlet Allocation), yet their application in large-scale
systems remains the exception rather than the rule. On
the other hand, numerous studies in NLP and IR resort to
manually crafting features, which is a laborious and expensive
process. Such studies often focus on one specific problem, and
consequently many features they define are task- or
domain-dependent. As a result, little knowledge transfer is
possible to other problem domains. This limits our
understanding of how to reliably construct informative features
for new tasks.

An area of machine learning concerned with feature generation
(or constructive induction) studies methods that endow
computers with the ability to modify or enhance the
representation language. Feature generation techniques search
for new features that describe the target concepts better than
the attributes supplied with the training instances. It is
worthwhile to note that traditional machine learning data sets,
such as those available from the UCI data repository, are only
available as feature vectors, while their feature set is
essentially fixed. In fact, feature generation for specific UCI
benchmark data sets is frowned upon. On the other hand, textual
data is almost always available in its raw format (in some cases
as structured data with sufficient side information). Given the
importance of text as a data format, it is well worth
designing text-specific feature generation algorithms.
Complementary to feature generation, the issue of feature
selection arises. It aims to retain only the most informative
features, e.g., in order to reduce noise and to avoid
overfitting, and is essential when numerous features are
automatically constructed. This allows us to deal with features
that are correlated, redundant, or uninformative, which we
may want to decimate through a principled selection process.

We believe that much can be done in the quest for automatic
feature generation for text processing, for example, using
large-scale knowledge bases as well as the sheer amounts of textual
data easily accessible today. We further believe the time is
ripe to bring together researchers from many related areas
(including information retrieval, machine learning, statistics,
and natural language processing) to address these issues and
seek cross-pollination among the different fields.

Papers from a rich set of empirical, experimental, and
theoretical perspectives are invited. Topics of interest for
the workshop include but are not limited to:
- Identifying cases when new features should be constructed
- Knowledge-based methods (including identification of appropriate
knowledge resources)
- Efficiently utilizing human expertise (akin to active learning,
assisted feature construction)
- (Bayesian) nonparametric distribution models for text (e.g., LDA,
hierarchical Pitman-Yor model)
- Compression and autoencoder algorithms (e.g., information bottleneck,
deep belief networks)
- Feature selection (L1 programming, message passing, dependency
measures, submodularity)
- Cross-language methods for feature generation and selection
- New types of features, e.g., spatial features to support geographical IR
- Applications of feature generation in IR (e.g., constructing new
features for indexing, ranking)

The workshop will include invited talks as well as
presentations of accepted research contributions. The schedule
will provide time for both organized and open discussion.
Registration will be open to all SIGIR 2010 attendees.

Submission Instructions

Submissions should report new (unpublished) research results or
ongoing research. Submissions can be up to 8 pages long for
full papers, and up to 4 pages long for short papers. Papers
should be formatted in double-column ACM SIG proceedings format
(for LaTeX, use "Option 2"). Papers must be in English and must
be submitted as PDF files.

Papers should be submitted electronically using the EasyChair
system no later than 23:59 Pacific Standard Time, Sunday,
May 30, 2010.

At least one author of each accepted paper will be expected to
attend and present their findings at the workshop.

Important Dates
Submission Deadline: May 30, 2010
Acceptance notification: June 25, 2010
Camera-ready submission: July 5, 2010
Workshop date: July 23, 2010

Invited Speakers

The workshop will feature a keynote talk by Dr. Kenneth Church,
Chief Scientist of the Human Language Technology Center of
Excellence at the Johns Hopkins University. Additional invited
speakers are to be announced.

Organizing Committee
- Evgeniy Gabrilovich, Yahoo! Research, USA
- Alex Smola, Australian National University and Yahoo! Research, USA
- Naftali Tishby, Hebrew University of Jerusalem, Israel

Program Committee
- Francis Bach, INRIA, France
- Misha Bilenko, Microsoft Research, USA
- David Blei, Princeton, USA
- Karsten Borgwardt, Max Planck Institute, Germany
- Wray Buntine, NICTA, Australia
- Raman Chandrasekar, Microsoft Research, USA
- Kevyn Collins-Thompson, Microsoft Research, USA
- Silviu Cucerzan, Microsoft Research, USA
- Brian Davison, Lehigh University, USA
- Gideon Dror, Academic College of Tel-Aviv-Yaffo, Israel
- Wai Lam, CUHK, Hong Kong SAR, China
- Tie-Yan Liu, Microsoft Research Asia, China
- Donald Metzler, Yahoo Research, USA
- Daichi Mochihashi, NTT, Japan
- Filip Radlinski, Microsoft Research, United Kingdom
- Rajat Raina, Facebook, USA
- Pradeep Ravikumar, University of Texas at Austin, USA
- Mehran Sahami, Stanford, USA
- Le Song, CMU, USA
- Krysta Svore, Microsoft Research, USA
- Volker Tresp, Siemens, Germany
- Kai Yu, NEC, USA
- ChengXiang Zhai, UIUC, USA
- Jerry Zhu, University of Wisconsin, USA
