
2026-04-14

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning | Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang | 2026-04-14 | Download | Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collecti... |
| EPAC: The Last Dance | Filippo Mantovani, Fabio Banchelli, Pablo Vizcaino, Roger Ferrer, Oscar Palomar, Francesco Minervini, Jesus Labarta, Mauro Olivieri, Sebastiano Pomata, Pedro Marcuello, Jordi Cortina, Alberto Moreno, Josep Sans, Roger Espasa, Vassilis Papaefstathiou, Nikolaos Dimou, Georgios Ieronymakis, Antonis Psathakis, Michalis Giaourtas, Iasonas Mastorakis, Manolis Marazakis, Eric Guthmuller, Andrea Bocco, Jérôme Fereyre, César Fuguet, Mate Kovač, Mario Kovač, Luka Mrković, Josip Ramljak, Luca Bertaccini, Tim Fischer, Frank K. Gurkaynak, Paul Scheffler, Luca Benini, Bhavishya Goel, Madhavan Manivannan, Tiago Rocha, Nuno Neves, Jens Krüger | 2026-04-14 | Download | This paper presents EPAC, a RISC-V-based accelerator chip developed within the European Processor Initiative (EPI) as part of a multi-year, multi-partner effort to build a European HPC processor ecosy... |
| CODO: An Automated Compiler for Comprehensive Dataflow Optimization | Weichuang Zhang, Yiquan Wang, Xinzhou Zhang, Chi Zhang, Yu Feng, Xiaofeng Hou, Chao Li, Jieru Zhao, Minyi Guo | 2026-04-14 | Download | FPGAs are well-suited for dataflow architectures that process data in a streaming or pipelined manner, thus satisfying the high computational and communication demands of emerging applications. |
| HARP: Hadamard-Domain Write-and-Verify for Noise-Robust RRAM Programming | Ilhuan Choi, Jiwon Yoo, Yoona Lee, Yewon Jeong, Jason Jaesung Lee, Woo-Seok Choi | 2026-04-14 | Download | Write-and-verify (WV) is essential for programming multi-level RRAM weights, yet under scaled-voltage and low-SNR conditions the verify read increasingly limits mapping accuracy, convergence speed and... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| DySkew: Dynamic Data Redistribution for Skew-Resilient Snowpark UDF Execution | Chenwei Xie, Urjeet Shrestha, Corbin McElhanney, Lukas Lorimer, Gopal V, Zihao Ye, Yi Pan, Nic Crouch, Elliott Brossard, Florian Funke, Yuxiong He | 2026-04-14 | Download | Snowflake revolutionized data warehousing with an elastic architecture that decouples compute and storage, enabling scalable solutions for diverse data analytics needs. |
| An Engineering Journey Training Large Language Models at Scale on Alps: The Apertus Experience | Jonathan Coles, Stefano Schuppli, Lukas Drescher, Fawzi Roberto Mohamed, Elia Palme, Henrique Mendonça, Miguel Gila, Mark Klein, Maxime Martinasso, Joost VandeVondele, Torsten Hoefler, Thomas Schulthess, Josh Romero, Igor Gorodetsky, Ryan Hankins, Isa Wazirzada, Martin Jaggi, Antoine Bosselut, Imanol Schlag, Antoni-Joan Solergibert i Llaquet, Alejandro Hernández Cano, Theofilos Ioannis Manitaras, Nicholas John Browning | 2026-04-14 | Download | Large Language Models (LLMs) have surged as a transformative technology for science and society, prompting governments worldwide to pursue sovereign AI capabilities that ensure data compliance and cul... |
| Towards a Linear-Algebraic Hypervisor | Breandan Considine | 2026-04-14 | Download | Many techniques in program synthesis, superoptimization, and array programming require parallel rollouts of general-purpose programs. GPUs, while capable targets for domain-specific parallelism, are t... |
| EPAC: The Last Dance | Filippo Mantovani, Fabio Banchelli, Pablo Vizcaino, Roger Ferrer, Oscar Palomar, Francesco Minervini, Jesus Labarta, Mauro Olivieri, Sebastiano Pomata, Pedro Marcuello, Jordi Cortina, Alberto Moreno, Josep Sans, Roger Espasa, Vassilis Papaefstathiou, Nikolaos Dimou, Georgios Ieronymakis, Antonis Psathakis, Michalis Giaourtas, Iasonas Mastorakis, Manolis Marazakis, Eric Guthmuller, Andrea Bocco, Jérôme Fereyre, César Fuguet, Mate Kovač, Mario Kovač, Luka Mrković, Josip Ramljak, Luca Bertaccini, Tim Fischer, Frank K. Gurkaynak, Paul Scheffler, Luca Benini, Bhavishya Goel, Madhavan Manivannan, Tiago Rocha, Nuno Neves, Jens Krüger | 2026-04-14 | Download | This paper presents EPAC, a RISC-V-based accelerator chip developed within the European Processor Initiative (EPI) as part of a multi-year, multi-partner effort to build a European HPC processor ecosy... |
| Intelligent resource prediction for SAP HANA continuous integration build workloads | Torsten Mandel, Jonathan Bader, Hanyoung Yoo, Stephan Kraft | 2026-04-14 | Download | Large enterprises often operate extensive Continuous Integration (CI) pipelines on large, heterogeneous compute clusters, where conservative, statically defined resource requirements are used to ensur... |
| Beyond Pre-Training: The Full Lifecycle of Foundation Models on HPC Systems | Dino Conciatore, Elia Oggian, Federico Da Forno, Stefano Schuppli, Jerome Tissieres, Joost VandeVondele, Maxime Martinasso | 2026-04-14 | Download | Large-scale pre-training of Foundational Models (FM) constitutes a computationally intensive first phase for enabling AI across diverse scientific and societal applications. |
| Three Birds, One Stone: Solving the Communication-Memory-Privacy Trilemma in LLM Fine-tuning Over Wireless Networks with Zeroth-Order Optimization | Zhijie Cai, Yuhao Zheng, Haolong Chen, Dongzhu Liu, Bin Wang, Guangxu Zhu | 2026-04-14 | Download | Federated Learning (FL) offers a promising pathway for collaboratively fine-tuning Large Language Models (LLMs) at the edge; however, this paradigm faces a critical bottleneck: the prohibitive communi... |
| Local-Splitter: A Measurement Study of Seven Tactics for Reducing Cloud LLM Token Usage on Coding-Agent Workloads | Justice Owusu Agyemang, Jerry John Kponyo, Elliot Amponsah, Godfred Manu Addo Boakye, Kwame Opuni-Boachie Obour Agyekum | 2026-04-14 | Download | We present a systematic measurement study of seven tactics for reducing cloud LLM token usage when a small local model can act as a triage layer in front of a frontier cloud model. |
| Decentralized Learning via Random Walk with Jumps | Zonghong Liu, Matthew Dwyer, Salim El Rouayheb | 2026-04-14 | Download | We study decentralized learning over networks where data are distributed across nodes without a central coordinator. Random walk learning is a token-based approach in which a single model is propagate... |
| A Periodic Space of Distributed Computing: Vision & Framework | Mohsen Amini Salehi, Adel N. Tousi, Hai Duc Nguyen, Murtaza Rangwala, Omar Rana, Tevfik Kosar, Valeria Cardellini, Rajkumar Buyya | 2026-04-14 | Download | Advances in networking and computing technologies throughout the early decades of the 21st century have transformed long-standing dreams of pervasive communication and computation into reality. |
| BlazingAML: High-Throughput Anti-Money Laundering (AML) via Multi-Stage Graph Mining | Haojie Ye, Arjun Laxman, Yichao Yuan, Krisztian Flautner, Nishil Talati | 2026-04-14 | Download | Money laundering detection faces challenges due to excessive false positives and inadequate adaptation to sophisticated multi-stage schemes that exploit modern financial networks. |
| PipeLive: Efficient Live In-place Pipeline Parallelism Reconfiguration for Dynamic LLM Serving | Xu Bai, Muhammed Tawfiqul Islam, Chen Wang, Adel N. Toosi | 2026-04-14 | Download | Pipeline parallelism (PP) is widely used to partition layers of large language models (LLMs) across GPUs, enabling scalable inference for large models. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Improving Network Clock Synchronization by Marking Congestion | Yash Deshpande, Quirin Vogel, Laura Becker, Kaan Aykurt, Wolfgang Kellerer | 2026-04-14 | Download | Achieving consistent time across devices in distributed systems often involves exchanging timestamped messages over a network. Precise time synchronization is crucial for applications such as cellular... |
| Graph-based Hierarchical Deep Reinforcement Learning for Deliverable Block Propagation with Optimal Hybrid Cost in Web 3.0 | Shi Chen, Jinbo Wen, Jiawen Kang, Tenghui Huang, Maomao Zhang, Tao Zhang, Dong In Kim | 2026-04-14 | Download | Web 3.0 is envisioned as a decentralized paradigm, where blockchain serves as a core technology for transparent and tamper-proof data management. |
| Joint Clustering and Prediction of the Quality of Service in Vehicular Cellular Networks | Oscar Stenhammar, Gábor Fodor, Carlo Fischione | 2026-04-14 | Download | Machine learning models are increasingly deployed in wireless networks with stringent performance requirements. However, dynamic propagation environments and fluctuating traffic densities introduce co... |
| Advancing Network Digital Twin Framework for Generating Realistic Datasets | Oscar Stenhammar, Sundeep Rangan, Gábor Fodor, Carlo Fischione | 2026-04-14 | Download | The integration of accurate and reproducible wireless network simulations is a key enabler for research on open, virtualized, and intelligent communication systems. |
| Efficient Semantic Image Communication for Traffic Monitoring at the Edge | Damir Assylbek, Nurmukhammed Aitymbetov, Marko Ristin, Dimitrios Zorbas | 2026-04-14 | Download | Many visual monitoring systems operate under strict communication constraints, where transmitting full-resolution images is impractical and often unnecessary. |
| Large-Scale Measurement of NAT Traversal for the Decentralized Web: A Case Study of DCUtR in IPFS | Dennis Trautwein, Cornelius Ihle, Moritz Schubotz, Corinna Breitinger, Bela Gipp | 2026-04-14 | Download | The promise of decentralized peer-to-peer (P2P) systems is fundamentally gated by the challenge of Network Address Translation (NAT) traversal, with existing solutions often reintroducing the very cen... |
| LightTune: Lightweight Forward-Only Online Fine-Tuning with Applications to Link Adaptation | Ramy E. Ali, Federico Penna | 2026-04-14 | Download | Deploying machine learning (ML) algorithms on mobile phones is bottlenecked by performance degradation under dynamic, real-world conditions that differ from the offline training conditions. |
| Throughput Characterization of Wireless CSMA Networks With Arbitrary Sensing and Interference Topologies | Xinghua Sun, Wenhai Lin, Ruike Zhou | 2026-04-14 | Download | The performance analysis of wireless CSMA networks is notoriously difficult due to the intricate sensing and interference relationships among links. |
| Traffic-Aware Domain Partitioning and Load-Balanced Inter-Domain Routing for LEO Satellite Networks | Chen Zhou, Jiangtao Luo, Yongyi Ran | 2026-04-14 | Download | Low Earth Orbit (LEO) satellite networks provide global coverage and low latency, yet high node mobility, uneven traffic distribution, and stochastic link failures pose severe challenges for inter-dom... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| TierBPF: Page Migration Admission Control for Tiered Memory via eBPF | Xi Wang, Tal Zussman, Yuang Xu, Bin Ma, Asaf Cidon, Dong Li | 2026-04-14 | Download | Existing software-based memory tiering systems decide which pages to place on the slower or faster tier. However, they do not take into account two important factors that greatly influence application... |
| A Periodic Space of Distributed Computing: Vision & Framework | Mohsen Amini Salehi, Adel N. Tousi, Hai Duc Nguyen, Murtaza Rangwala, Omar Rana, Tevfik Kosar, Valeria Cardellini, Rajkumar Buyya | 2026-04-14 | Download | Advances in networking and computing technologies throughout the early decades of the 21st century have transformed long-standing dreams of pervasive communication and computation into reality. |
| Hybrid Adaptive Tuning for Tiered Memory Systems | Xi Wang, Jie Liu, Shuangyan Yang, Jongryool Kim, Pengfei Su, Dong Li | 2026-04-14 | Download | Memory tiering provides a cost-effective solution to increase memory capacity, utilization, and even bandwidth. Memory tiering relies on system software for memory profiling, detection of frequently a... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Towards a Linear-Algebraic Hypervisor | Breandan Considine | 2026-04-14 | Download | Many techniques in program synthesis, superoptimization, and array programming require parallel rollouts of general-purpose programs. GPUs, while capable targets for domain-specific parallelism, are t... |
| Large-Scale Measurement of NAT Traversal for the Decentralized Web: A Case Study of DCUtR in IPFS | Dennis Trautwein, Cornelius Ihle, Moritz Schubotz, Corinna Breitinger, Bela Gipp | 2026-04-14 | Download | The promise of decentralized peer-to-peer (P2P) systems is fundamentally gated by the challenge of Network Address Translation (NAT) traversal, with existing solutions often reintroducing the very cen... |
