HiddenWeb 2014 : Structure, Mechanics, and Practical Uses of the Hidden Web
Call For Papers
For release in the Advances in Web Technologies and Engineering (AWTE) Book Series
The Advances in Web Technologies and Engineering (AWTE) Book Series aims to provide a venue for comprehensive publications that advance and expand the field of web systems, knowledge capture, and communication technologies. The series offers researchers and practitioners solutions for improving how technology is used, reflecting a growing awareness of the importance of web applications and engineering.
Competitive advantage depends on access to relevant information. With the Internet and Web delivering so much of the world’s information, people have long found ways to exploit publicly available open-source intelligence (OSINT). Since the inception of the Web, various types of data records have been served dynamically to users for a range of use cases. These records are web-accessible, though users generally have to authenticate to particular sites to gain access. This information is part of the so-called “Hidden Web” because its contents are not easily located using contemporary web browsers alone; rather, the data is accessed through Web forms, Web service interfaces, and focused Deep Web portals (designed to find particular types of information, such as dynamically generated or ephemeral information, various types of Web database records, or subscription-based materials).
Currently, the Hidden or Deep Web is said to contain upwards of 9 petabytes of information and tens of millions of discrete data sources, making it many times the size of the Surface Web (or Publicly Indexable Web), which contains billions of static Web pages. Multiple sources suggest a ratio of 500:1 to 2000:1 between the Hidden Web and the Surface Web, and the Hidden Web is said to contain hundreds of billions of Web pages. This suggests that a great deal of Web-based data goes unexploited and generally undiscovered by the majority of those accessing the WWW through browsers and limited Web forms alone. It also suggests that any solution for federated searching of the Hidden Web will need to be efficient and scalable while engaging a broad range of data. With advancements in Internet technology, Hidden Web data sources are expected to grow exponentially.
Some modern browsers have also added Hidden Web crawling capabilities. In the past decade, there have been various endeavors to map the Hidden Web by extracting metadata about its records, to extract selected data (structured, semi-structured, and unstructured) through federated Hidden Web searches, to protect some of the information, and to provide tools that help users access and reconstitute this information in human- and machine-usable form. Some progress has been made in this area, but challenges remain to be explored and addressed.
The overall objective of this book is to expand what is known about the contents of the Hidden Web and how to access them. The work will help readers understand some of the structures of the Hidden Web and ways to access and analyze the information there. It will also address the use of metadata for analysis.
This work will have implications not only for research but also for information security and assurance planning, in terms of understanding the various Hidden Web “attack surfaces” possible online and methods for protecting this Web-delivered data.
This book is intended for academics, researchers, journalists, and other professionals with an interest in the Hidden Web and the information it contains. Information from the Hidden Web can supplement what public Web searches return. In research, competitive advantage comes from accessing information that others do not have or do not know about.
There are also potential implications for those who work in IT security and data protection: knowing how to search the Hidden Web reveals some of the limits of site and information security.
Recommended topics include, but are not limited to, the following:
• History of the Hidden Web
o Legacy databases
o Deep Web directories
o Web portals/entry points to the Hidden Web
o Specialty search engines
o Hidden Web traffic
• A survey of the Hidden Web
• Extant standards
• Current information discovery paradigms on the Deep Web
• Conceptualizations and models of the Hidden Web
• Contemporary (meta)search engines and browsers and the Hidden Web
• Building tools for (federated) searching for and extracting information from the Deep Web
• Proprietary and open methods to crawl, search, and collect data from the Hidden Web
o Agents, metasearchers, and other tools for crawling the invisible Web
o Automated form filling strategies for information searching
o Manual (non-automated), automated, and mixed methods research on the Hidden Web
o Web mining strategies (including machine learning) for the Deep Web
o Metadata extraction from the invisible Web
• Entities and organizations on the Hidden Web
• Structures and schemas of Web databases on the Hidden Web
o Graphical and other visual representations of the Hidden Web’s structures
o Content networks and data clusters on the Hidden Web
o Social network analysis (SNA) on the Hidden Web
o Community mining
• Data hierarchies of the Deep Web
• Regions of the Hidden Web
o Specific domains on the Hidden Web: public zones, restricted zones, the illicit dark Web
o Latent communities on the Hidden Web
o Latent (associative) content structures on the Hidden Web
• Analyzing electronic data on the Hidden Web
o Types of sites on the Hidden Web
o Classification of information on the Hidden Web
o Structured, semi-structured, and unstructured data
o Textual, visual, audio, video, animation, and multimedia files on the Hidden Web
o Metadata and the Hidden Web
o (Manual and automated) data processing from the Hidden Web
o Data integration across the Deep Web
o Source validity
• Defining and assessing information quality on the Hidden Web
o Heuristics for ranking data relevance on the Hidden Web
• Protecting data on the Hidden Web (and other security angles)
• Monitoring data access on the Hidden Web
• Applied cases of research using the Hidden Web (task-specific approaches in unique domains)
• Elegant hacks of the Hidden Web
• Future of the evolving Hidden Web
Researchers and practitioners are invited to submit, on or before May 30, 2014, a one-page chapter proposal clearly explaining the mission and concerns of the proposed chapter. Proposals may be submitted through the book system or directly by email. Authors will be notified about the status of their proposals, and authors of accepted proposals will be sent chapter guidelines. Full chapters are expected to be submitted by July 30, 2014. All submitted chapters will undergo double-blind peer review. Contributors may also be asked to serve as peer reviewers for this project. Empirical research is especially desirable, and case studies are encouraged as well.
This book is scheduled to be published by IGI Global (formerly Idea Group Inc.), publisher of the “Information Science Reference” (formerly Idea Group Reference), “Medical Information Science Reference,” “Business Science Reference,” and “Engineering Science Reference” imprints. For additional information regarding the publisher, please visit www.igi-global.com. This publication is anticipated to be released in 2015.
May 30, 2014: Proposal Submission Deadline
June 15, 2014: Notification of Acceptance
July 30, 2014: Full Chapter Submission
September 15, 2014: Review Results Returned
October 30, 2014: Final Chapter Submission
(There will be some flexibility on the dates.)
Editorial Advisory Board
Dr. Roger McHaney, Kansas State University, USA
Nancy Hays, EDUCAUSE, USA
Abe Lederman, Deep Web Technologies, USA
Dr. Shalin Hai-Jew