FTXS 2018 : Fault Tolerance for HPC at eXtreme Scale (FTXS) Workshop

Link: https://sites.google.com/site/ftxsworkshop/home/ftxs-2018

When	Nov 16, 2018
Where	Dallas, TX, USA
Submission Deadline	Aug 30, 2018
Notification Due	Sep 27, 2018

Categories	HPC, high-performance computing, fault tolerance, resilience

Call For Papers

8th Workshop on Fault-Tolerance for HPC at eXtreme Scale (FTXS 2018)

In conjunction with
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18)
Dallas, Texas, USA, November 11 – 16, 2018

Important Dates
* Submission of papers: August 30, 2018
* Author notification: September 27, 2018
* Camera-ready papers: TBA
* Workshop: Friday, November 16, 2018

Authors are invited to submit original papers on the research and practice of
fault-tolerance in extreme-scale distributed systems (primarily HPC systems, but
also grid and cloud systems). Resilience and fault-tolerance remain major
concerns for supercomputing, and advances in this area are needed to allow applications
to compute answers that are accurate (or within an acceptable error tolerance) in a timely and
efficient manner in the presence of degradations or failures of platform components
(both hardware and software).

Topics include, but are not limited to:
* Failure data analysis and field studies
* Power, performance, resilience (PPR) assessments / tradeoffs
* Novel fault-tolerance techniques and implementations
* Emerging hardware and software technology for resilience
* Silent data corruption (SDC) detection / correction techniques
* Advances in reliability monitoring, analysis, and control of highly complex
systems
* Failure prediction, error preemption, and recovery techniques
* Fault-tolerant programming models
* Models for software and hardware reliability
* Metrics and standards for measuring, improving, and enforcing effective
fault-tolerance
* Scalable Byzantine fault-tolerance and security from single-fault and
fail-silent violations
* Atmospheric evaluations relevant to HPC systems (terrestrial neutrons,
temperature, voltage, etc.)
* Near-threshold-voltage implications and evaluations for reliability
* Benchmarks and experimental environments including fault injection
* Frameworks and APIs for fault-tolerance and fault management

PAPER SUBMISSIONS
Submissions are solicited in the following categories:
* Regular papers presenting innovative ideas improving the state of the
art or discussing the issues seen on existing extreme-scale systems,
including some form of analysis and evaluation.
* Extended abstracts proposing disruptive ideas and challenging assumptions in the field, including
some form of preliminary results.
Extended abstracts will be evaluated separately and given shorter oral presentations.

Submissions shall be sent electronically and must conform to the SC18
proceedings style. Regular papers should not exceed ten (10) pages,
including all text, appendices, and figures. Extended abstracts
should not exceed six (6) pages.

WORKSHOP CO-CHAIRS
Nathan DeBardeleben – Los Alamos National Laboratory
Scott Levy – Sandia National Laboratories

ORGANIZING COMMITTEE
Keita Teranishi – Sandia National Laboratories
John Daly – Laboratory for Physical Sciences

PROGRAM COMMITTEE
Rizwan Ashraf – Oak Ridge National Laboratory
Leonardo Bautista Gomez – Barcelona Supercomputing Center
Aurélien Bouteiller – University of Tennessee Knoxville
Robert Clay – Sandia National Laboratories
James Elliott – Sandia National Laboratories
Christian Engelmann – Oak Ridge National Laboratory
Kurt B. Ferreira – Sandia National Laboratories
Qiang Guan – Kent State University
Sudhanva Gurumurthi – AMD
Hideyuki Jitsumoto – Tokyo Institute of Technology
Zhiling Lan – Illinois Institute of Technology
Naoya Maruyama – Lawrence Livermore National Laboratory
Bogdan Nicolae – Argonne National Laboratory
Yves Robert – ENS Lyon & Univ. Tenn. Knoxville
Vilas Sridharan – AMD
Abhinav Vishnu – Pacific Northwest National Laboratory
Panruo Wu – University of California at Riverside
