2026-04-14

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning	Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang	2026-04-14	Download	Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collecti...
EPAC: The Last Dance	Filippo Mantovani, Fabio Banchelli, Pablo Vizcaino, Roger Ferrer, Oscar Palomar, Francesco Minervini, Jesus Labarta, Mauro Olivieri, Sebastiano Pomata, Pedro Marcuello, Jordi Cortina, Alberto Moreno, Josep Sans, Roger Espasa, Vassilis Papaefstathiou, Nikolaos Dimou, Georgios Ieronymakis, Antonis Psathakis, Michalis Giaourtas, Iasonas Mastorakis, Manolis Marazakis, Eric Guthmuller, Andrea Bocco, Jérôme Fereyre, César Fuguet, Mate Kovač, Mario Kovač, Luka Mrković, Josip Ramljak, Luca Bertaccini, Tim Fischer, Frank K. Gurkaynak, Paul Scheffler, Luca Benini, Bhavishya Goel, Madhavan Manivannan, Tiago Rocha, Nuno Neves, Jens Krüger	2026-04-14	Download	This paper presents EPAC, a RISC-V-based accelerator chip developed within the European Processor Initiative (EPI) as part of a multi-year, multi-partner effort to build a European HPC processor ecosy...
CODO: An Automated Compiler for Comprehensive Dataflow Optimization	Weichuang Zhang, Yiquan Wang, Xinzhou Zhang, Chi Zhang, Yu Feng, Xiaofeng Hou, Chao Li, Jieru Zhao, Minyi Guo	2026-04-14	Download	FPGAs are well-suited for dataflow architectures that process data in a streaming or pipelined manner, thus satisfying the high computational and communication demands of emerging applications.
HARP: Hadamard-Domain Write-and-Verify for Noise-Robust RRAM Programming	Ilhuan Choi, Jiwon Yoo, Yoona Lee, Yewon Jeong, Jason Jaesung Lee, Woo-Seok Choi	2026-04-14	Download	Write-and-verify (WV) is essential for programming multi-level RRAM weights, yet under scaled-voltage and low-SNR conditions the verify read increasingly limits mapping accuracy, convergence speed and...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
DySkew: Dynamic Data Redistribution for Skew-Resilient Snowpark UDF Execution	Chenwei Xie, Urjeet Shrestha, Corbin McElhanney, Lukas Lorimer, Gopal V, Zihao Ye, Yi Pan, Nic Crouch, Elliott Brossard, Florian Funke, Yuxiong He	2026-04-14	Download	Snowflake revolutionized data warehousing with an elastic architecture that decouples compute and storage, enabling scalable solutions for diverse data analytics needs.
An Engineering Journey Training Large Language Models at Scale on Alps: The Apertus Experience	Jonathan Coles, Stefano Schuppli, Lukas Drescher, Fawzi Roberto Mohamed, Elia Palme, Henrique Mendonça, Miguel Gila, Mark Klein, Maxime Martinasso, Joost VandeVondele, Torsten Hoefler, Thomas Schulthess, Josh Romero, Igor Gorodetsky, Ryan Hankins, Isa Wazirzada, Martin Jaggi, Antoine Bosselut, Imanol Schlag, Antoni-Joan Solergibert i Llaquet, Alejandro Hernández Cano, Theofilos Ioannis Manitaras, Nicholas John Browning	2026-04-14	Download	Large Language Models (LLMs) have surged as a transformative technology for science and society, prompting governments worldwide to pursue sovereign AI capabilities that ensure data compliance and cul...
Towards a Linear-Algebraic Hypervisor	Breandan Considine	2026-04-14	Download	Many techniques in program synthesis, superoptimization, and array programming require parallel rollouts of general-purpose programs. GPUs, while capable targets for domain-specific parallelism, are t...
EPAC: The Last Dance	Filippo Mantovani, Fabio Banchelli, Pablo Vizcaino, Roger Ferrer, Oscar Palomar, Francesco Minervini, Jesus Labarta, Mauro Olivieri, Sebastiano Pomata, Pedro Marcuello, Jordi Cortina, Alberto Moreno, Josep Sans, Roger Espasa, Vassilis Papaefstathiou, Nikolaos Dimou, Georgios Ieronymakis, Antonis Psathakis, Michalis Giaourtas, Iasonas Mastorakis, Manolis Marazakis, Eric Guthmuller, Andrea Bocco, Jérôme Fereyre, César Fuguet, Mate Kovač, Mario Kovač, Luka Mrković, Josip Ramljak, Luca Bertaccini, Tim Fischer, Frank K. Gurkaynak, Paul Scheffler, Luca Benini, Bhavishya Goel, Madhavan Manivannan, Tiago Rocha, Nuno Neves, Jens Krüger	2026-04-14	Download	This paper presents EPAC, a RISC-V-based accelerator chip developed within the European Processor Initiative (EPI) as part of a multi-year, multi-partner effort to build a European HPC processor ecosy...
Intelligent resource prediction for SAP HANA continuous integration build workloads	Torsten Mandel, Jonathan Bader, Hanyoung Yoo, Stephan Kraft	2026-04-14	Download	Large enterprises often operate extensive Continuous Integration (CI) pipelines on large, heterogeneous compute clusters, where conservative, statically defined resource requirements are used to ensur...
Beyond Pre-Training: The Full Lifecycle of Foundation Models on HPC Systems	Dino Conciatore, Elia Oggian, Federico Da Forno, Stefano Schuppli, Jerome Tissieres, Joost VandeVondele, Maxime Martinasso	2026-04-14	Download	Large-scale pre-training of Foundational Models (FM) constitutes a computationally intensive first phase for enabling AI across diverse scientific and societal applications.
Three Birds, One Stone: Solving the Communication-Memory-Privacy Trilemma in LLM Fine-tuning Over Wireless Networks with Zeroth-Order Optimization	Zhijie Cai, Yuhao Zheng, Haolong Chen, Dongzhu Liu, Bin Wang, Guangxu Zhu	2026-04-14	Download	Federated Learning (FL) offers a promising pathway for collaboratively fine-tuning Large Language Models (LLMs) at the edge; however, this paradigm faces a critical bottleneck: the prohibitive communi...
Local-Splitter: A Measurement Study of Seven Tactics for Reducing Cloud LLM Token Usage on Coding-Agent Workloads	Justice Owusu Agyemang, Jerry John Kponyo, Elliot Amponsah, Godfred Manu Addo Boakye, Kwame Opuni-Boachie Obour Agyekum	2026-04-14	Download	We present a systematic measurement study of seven tactics for reducing cloud LLM token usage when a small local model can act as a triage layer in front of a frontier cloud model.
Decentralized Learning via Random Walk with Jumps	Zonghong Liu, Matthew Dwyer, Salim El Rouayheb	2026-04-14	Download	We study decentralized learning over networks where data are distributed across nodes without a central coordinator. Random walk learning is a token-based approach in which a single model is propagate...
A Periodic Space of Distributed Computing: Vision & Framework	Mohsen Amini Salehi, Adel N. Tousi, Hai Duc Nguyen, Murtaza Rangwala, Omar Rana, Tevfik Kosar, Valeria Cardellini, Rajkumar Buyya	2026-04-14	Download	Advances in networking and computing technologies throughout the early decades of the 21st century have transformed long-standing dreams of pervasive communication and computation into reality.
BlazingAML: High-Throughput Anti-Money Laundering (AML) via Multi-Stage Graph Mining	Haojie Ye, Arjun Laxman, Yichao Yuan, Krisztian Flautner, Nishil Talati	2026-04-14	Download	Money laundering detection faces challenges due to excessive false positives and inadequate adaptation to sophisticated multi-stage schemes that exploit modern financial networks.
PipeLive: Efficient Live In-place Pipeline Parallelism Reconfiguration for Dynamic LLM Serving	Xu Bai, Muhammed Tawfiqul Islam, Chen Wang, Adel N. Toosi	2026-04-14	Download	Pipeline parallelism (PP) is widely used to partition layers of large language models (LLMs) across GPUs, enabling scalable inference for large models.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Improving Network Clock Synchronization by Marking Congestion	Yash Deshpande, Quirin Vogel, Laura Becker, Kaan Aykurt, Wolfgang Kellerer	2026-04-14	Download	Achieving consistent time across devices in distributed systems often involves exchanging timestamped messages over a network. Precise time synchronization is crucial for applications such as cellular...
Graph-based Hierarchical Deep Reinforcement Learning for Deliverable Block Propagation with Optimal Hybrid Cost in Web 3.0	Shi Chen, Jinbo Wen, Jiawen Kang, Tenghui Huang, Maomao Zhang, Tao Zhang, Dong In Kim	2026-04-14	Download	Web 3.0 is envisioned as a decentralized paradigm, where blockchain serves as a core technology for transparent and tamper-proof data management.
Joint Clustering and Prediction of the Quality of Service in Vehicular Cellular Networks	Oscar Stenhammar, Gábor Fodor, Carlo Fischione	2026-04-14	Download	Machine learning models are increasingly deployed in wireless networks with stringent performance requirements. However, dynamic propagation environments and fluctuating traffic densities introduce co...
Advancing Network Digital Twin Framework for Generating Realistic Datasets	Oscar Stenhammar, Sundeep Rangan, Gábor Fodor, Carlo Fischione	2026-04-14	Download	The integration of accurate and reproducible wireless network simulations is a key enabler for research on open, virtualized, and intelligent communication systems.
Efficient Semantic Image Communication for Traffic Monitoring at the Edge	Damir Assylbek, Nurmukhammed Aitymbetov, Marko Ristin, Dimitrios Zorbas	2026-04-14	Download	Many visual monitoring systems operate under strict communication constraints, where transmitting full-resolution images is impractical and often unnecessary.
Large-Scale Measurement of NAT Traversal for the Decentralized Web: A Case Study of DCUtR in IPFS	Dennis Trautwein, Cornelius Ihle, Moritz Schubotz, Corinna Breitinger, Bela Gipp	2026-04-14	Download	The promise of decentralized peer-to-peer (P2P) systems is fundamentally gated by the challenge of Network Address Translation (NAT) traversal, with existing solutions often reintroducing the very cen...
LightTune: Lightweight Forward-Only Online Fine-Tuning with Applications to Link Adaptation	Ramy E. Ali, Federico Penna	2026-04-14	Download	Deploying machine learning (ML) algorithms on mobile phones is bottlenecked by performance degradation under dynamic, real-world conditions that differ from the offline training conditions.
Throughput Characterization of Wireless CSMA Networks With Arbitrary Sensing and Interference Topologies	Xinghua Sun, Wenhai Lin, Ruike Zhou	2026-04-14	Download	The performance analysis of wireless CSMA networks is notoriously difficult due to the intricate sensing and interference relationships among links.
Traffic-Aware Domain Partitioning and Load-Balanced Inter-Domain Routing for LEO Satellite Networks	Chen Zhou, Jiangtao Luo, Yongyi Ran	2026-04-14	Download	Low Earth Orbit (LEO) satellite networks provide global coverage and low latency, yet high node mobility, uneven traffic distribution, and stochastic link failures pose severe challenges for inter-dom...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
TierBPF: Page Migration Admission Control for Tiered Memory via eBPF	Xi Wang, Tal Zussman, Yuang Xu, Bin Ma, Asaf Cidon, Dong Li	2026-04-14	Download	Existing software-based memory tiering systems decide which pages to place on the slower or faster tier. However, they do not take into account two important factors that greatly influence application...
A Periodic Space of Distributed Computing: Vision & Framework	Mohsen Amini Salehi, Adel N. Tousi, Hai Duc Nguyen, Murtaza Rangwala, Omar Rana, Tevfik Kosar, Valeria Cardellini, Rajkumar Buyya	2026-04-14	Download	Advances in networking and computing technologies throughout the early decades of the 21st century have transformed long-standing dreams of pervasive communication and computation into reality.
Hybrid Adaptive Tuning for Tiered Memory Systems	Xi Wang, Jie Liu, Shuangyan Yang, Jongryool Kim, Pengfei Su, Dong Li	2026-04-14	Download	Memory tiering provides a cost-effective solution to increase memory capacity, utilization, and even bandwidth. Memory tiering relies on system software for memory profiling, detection of frequently a...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Towards a Linear-Algebraic Hypervisor	Breandan Considine	2026-04-14	Download	Many techniques in program synthesis, superoptimization, and array programming require parallel rollouts of general-purpose programs. GPUs, while capable targets for domain-specific parallelism, are t...
Large-Scale Measurement of NAT Traversal for the Decentralized Web: A Case Study of DCUtR in IPFS	Dennis Trautwein, Cornelius Ihle, Moritz Schubotz, Corinna Breitinger, Bela Gipp	2026-04-14	Download	The promise of decentralized peer-to-peer (P2P) systems is fundamentally gated by the challenge of Network Address Translation (NAT) traversal, with existing solutions often reintroducing the very cen...