MLCS 2018 : First workshop on Machine Learning for Computing Systems
Call For Papers
We invite authors to submit papers to the First workshop on Machine Learning for Computing Systems (MLCS) which will be co-located with ACM HPDC 2018 and held on June 12, 2018 in Tempe, AZ, USA.
As the HPC community rapidly approaches the era of exascale machines, the complexity of problems such as monitoring, troubleshooting, and design also increases. Large HPC facilities already produce terabytes of data each day, ranging from low-level hardware telemetry and system logs, to troubleshooting tickets. Current administration tools tend to focus on designing filters for well-defined system events, restricting them to only detect behaviors previously known to be interesting. These tools will never find new, previously unknown modes of behavior automatically, or adapt to changes in the system. Meanwhile, machine learning techniques are uniquely suited for characterizing and extracting knowledge from large and complex datasets.
Recently, machine learning techniques have also been used to better understand and analyze HPC machines and facilities. Interdisciplinary research at the intersection of machine learning and HPC has already produced advances in memory error mitigation, datacenter cooling, system log analysis, job scheduling, and many other areas. As the machine learning community focuses on human-understandable models, these models become extremely attractive for HPC-related decision support and for the development of data-driven tools to assist human experts. Additionally, HPC-related problems are often relevant to open machine learning research areas, such as anomaly detection within near-natural language text in logs, and there is a definite need for collaboration between HPC domain experts and statistical modeling / machine learning experts.
For these reasons, we are organizing the 1st Workshop on Machine Learning for Computing Systems (MLCS). MLCS 2018 will provide a much-needed opportunity for cutting-edge research ideas to be shared, and bring together researchers across the disciplines of machine learning, systems design, systems monitoring, HPC resilience, hardware architecture, data science, statistics, and applied mathematics to address a shared goal of better and more efficient use and monitoring of HPC machines and facilities.
Conference web-site: https://mlcsworkshop.weebly.com/
Working from our premise that the deluge of HPC monitoring data necessitates a move toward data-driven intelligent modeling, we solicit contributions including, but not limited to:
1. Use of machine learning or data science to better understand:
- Hardware faults and errors
- Software errors
- Telemetry data (temperature, voltages, cooling apparatus)
- Power consumption
- Facilities / building control
- Job scheduling
- Filesystem logs
- Network logs
- Syslog or console logs
- Error detection and correction
- Resilience and fault tolerance
- Failure troubleshooting / assistance of human experts
- Assistance of non-expert users
- HPC system security
2. Use of interpretable machine learning models for HPC-related decision support
- Including user/human-subject studies
3. Modeling techniques incorporating human expert knowledge along with knowledge extracted from data
- Use of these models to evaluate, confirm, or refute human assumptions
4. New or improved machine learning models particularly suited for HPC problems
5. Tools, at any stage of development, using data-driven technologies for some aspect of systems monitoring or design
6. Experience reports detailing successes and failures of machine learning applied to HPC
7. Formulations of unsolved data-related HPC problems with the potential for machine learning
- Especially including the public release of HPC-related datasets for use by the community
* Submission Instructions
We are soliciting three types of submissions: full papers; short work-in-progress, experience, or position papers; and poster abstracts.
- Submitted full papers must be no longer than 8 single-spaced 8.5" x 11" pages, including figures, tables, and references; in the ACM format (two-column format, using 10-point type on 12-point (single-spaced) leading; and a text block 6.5" wide x 9" deep). Author names and affiliations should appear on the title page.
- Submitted short work-in-progress, experience, or position papers must be no longer than 4 single-spaced 8.5" x 11" pages, including figures, tables, and references; in the ACM format (two-column format, using 10-point type on 12-point (single-spaced) leading; and a text block 6.5" wide x 9" deep). Author names and affiliations should appear on the title page.
- Submitted poster abstracts must be no longer than 2 single-spaced 8.5" x 11" pages, including figures, tables, and references; in the ACM format (two-column format, using 10-point type on 12-point (single-spaced) leading; and a text block 6.5" wide x 9" deep). Author names and affiliations should appear on the title page.
Submitted papers should present original theoretical and/or experimental research in any of the areas listed above. The work must not have been previously published or accepted for publication, and must not be currently under review by another conference or journal.
Accepted papers will be published in the workshop proceedings of ACM HPDC 2018 and made available in the ACM Digital Library.
* Important Dates:
Paper submissions due: April 9, 2018, 11:59pm AoE
Notification to authors: May 9, 2018
Final paper files due: May 12, 2018
* Submission Site:
* Workshop Organizers
- Elisabeth Baseman, Los Alamos National Laboratory, USA
- George Amvrosiadis, Carnegie Mellon University, USA
- Huiping Cao, New Mexico State University, USA
- Medha Bhadkamkar, Nimble Storage, USA
- Sean Blanchard, Los Alamos National Laboratory, USA
- John Daly, Department of Defense, USA
- Nathan DeBardeleben, Los Alamos National Laboratory, USA
- Kurt Ferreira, Sandia National Laboratories, USA
- Todd Gamblin, Lawrence Livermore National Laboratory, USA
- Chuan Hu, Microsoft, USA
- Satyajayant Misra, New Mexico State University, USA
- Frank Mueller, North Carolina State University, USA
- Nicole Nichols, Pacific Northwest National Laboratory, USA
- Aleatha Parker-Wood, Center for Advanced Machine Learning at Symantec, USA
- J. Ray Scott, Pittsburgh Supercomputing Center, USA
- Feng Yan, University of Nevada, Reno, USA
- Mai Zheng, New Mexico State University, USA