dc.contributor.author | Subasi, Omer | |
dc.contributor.author | Yalcin, Gulay | |
dc.contributor.author | Zyulkyarov, Ferad | |
dc.contributor.author | Unsal, Osman | |
dc.contributor.author | Labarta, Jesus | |
dc.date.accessioned | 2019-07-08T08:54:01Z | |
dc.date.available | 2019-07-08T08:54:01Z | |
dc.date.issued | 2017 | en_US |
dc.identifier.citation | 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). Book Group Author(s): IEEE. Book Series: IEEE-ACM International Symposium on Cluster Cloud and Grid Computing. Pages: 452-457. DOI: 10.1109/CCGRID.2017.40 | en_US |
dc.identifier.isbn | 978-1-5090-6611-7 | |
dc.identifier.issn | 2376-4414 | |
dc.identifier.other | Accession Number: WOS:000426912900048 | |
dc.identifier.other | DOI: 10.1109/CCGRID.2017.40 | |
dc.identifier.uri | http://acikerisim.agu.edu.tr/xmlui/handle/20.500.12573/72 | |
dc.description | This work is supported in part by the European Union Mont-blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402 and the FEDER funds under contract TIN2015-65316-P. | en_US |
dc.description.abstract | Fail-stop errors and Silent Data Corruptions (SDCs) are the most common failure modes for High Performance Computing (HPC) applications. There are studies that address fail-stop errors and studies that address SDCs. However, few studies address both types of errors together. In this paper, we propose a software-based selective replication technique for HPC applications that covers both fail-stop errors and SDCs. Since complete replication of applications can be costly in terms of resources, we develop a runtime-based technique for selective replication. Selective replication provides an opportunity to meet HPC reliability targets while decreasing resource costs. Our technique is low-overhead, automatic, and completely transparent to the user. | en_US |
dc.description.sponsorship | European Union Mont-blanc 2 Project - 610402 FEDER funds - TIN2015-65316-P | en_US |
dc.language.iso | eng | en_US |
dc.publisher | IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA | en_US |
dc.relation.ispartofseries | IEEE-ACM International Symposium on Cluster Cloud and Grid Computing;452-457 | |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.title | Designing and Modelling Selective Replication for Fault-tolerant HPC Applications | en_US |
dc.type | other | en_US |
dc.contributor.department | AGÜ, Faculty of Engineering, Department of Computer Engineering | en_US |
dc.contributor.institutionauthor | | |
dc.identifier.doi | 10.1109/CCGRID.2017.40 | |
dc.relation.publicationcategory | Other | en_US |