2026-04-14
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning | Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang | 2026-04-14 | Download | Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collecti... |
| EPAC: The Last Dance | Filippo Mantovani, Fabio Banchelli, Pablo Vizcaino, Roger Ferrer, Oscar Palomar, Francesco Minervini, Jesus Labarta, Mauro Olivieri, Sebastiano Pomata, Pedro Marcuello, Jordi Cortina, Alberto Moreno, Josep Sans, Roger Espasa, Vassilis Papaefstathiou, Nikolaos Dimou, Georgios Ieronymakis, Antonis Psathakis, Michalis Giaourtas, Iasonas Mastorakis, Manolis Marazakis, Eric Guthmuller, Andrea Bocco, Jérôme Fereyre, César Fuguet, Mate Kovač, Mario Kovač, Luka Mrković, Josip Ramljak, Luca Bertaccini, Tim Fischer, Frank K. Gurkaynak, Paul Scheffler, Luca Benini, Bhavishya Goel, Madhavan Manivannan, Tiago Rocha, Nuno Neves, Jens Krüger | 2026-04-14 | Download | This paper presents EPAC, a RISC-V-based accelerator chip developed within the European Processor Initiative (EPI) as part of a multi-year, multi-partner effort to build a European HPC processor ecosy... |
| CODO: An Automated Compiler for Comprehensive Dataflow Optimization | Weichuang Zhang, Yiquan Wang, Xinzhou Zhang, Chi Zhang, Yu Feng, Xiaofeng Hou, Chao Li, Jieru Zhao, Minyi Guo | 2026-04-14 | Download | FPGAs are well-suited for dataflow architectures that process data in a streaming or pipelined manner, thus satisfying the high computational and communication demands of emerging applications. |
| HARP: Hadamard-Domain Write-and-Verify for Noise-Robust RRAM Programming | Ilhuan Choi, Jiwon Yoo, Yoona Lee, Yewon Jeong, Jason Jaesung Lee, Woo-Seok Choi | 2026-04-14 | Download | Write-and-verify (WV) is essential for programming multi-level RRAM weights, yet under scaled-voltage and low-SNR conditions the verify read increasingly limits mapping accuracy, convergence speed and... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| DySkew: Dynamic Data Redistribution for Skew-Resilient Snowpark UDF Execution | Chenwei Xie, Urjeet Shrestha, Corbin McElhanney, Lukas Lorimer, Gopal V, Zihao Ye, Yi Pan, Nic Crouch, Elliott Brossard, Florian Funke, Yuxiong He | 2026-04-14 | Download | Snowflake revolutionized data warehousing with an elastic architecture that decouples compute and storage, enabling scalable solutions for diverse data analytics needs. |
| An Engineering Journey Training Large Language Models at Scale on Alps: The Apertus Experience | Jonathan Coles, Stefano Schuppli, Lukas Drescher, Fawzi Roberto Mohamed, Elia Palme, Henrique Mendonça, Miguel Gila, Mark Klein, Maxime Martinasso, Joost VandeVondele, Torsten Hoefler, Thomas Schulthess, Josh Romero, Igor Gorodetsky, Ryan Hankins, Isa Wazirzada, Martin Jaggi, Antoine Bosselut, Imanol Schlag, Antoni-Joan Solergibert i Llaquet, Alejandro Hernández Cano, Theofilos Ioannis Manitaras, Nicholas John Browning | 2026-04-14 | Download | Large Language Models (LLMs) have surged as a transformative technology for science and society, prompting governments worldwide to pursue sovereign AI capabilities that ensure data compliance and cul... |
| Towards a Linear-Algebraic Hypervisor | Breandan Considine | 2026-04-14 | Download | Many techniques in program synthesis, superoptimization, and array programming require parallel rollouts of general-purpose programs. GPUs, while capable targets for domain-specific parallelism, are t... |
| EPAC: The Last Dance | Filippo Mantovani, Fabio Banchelli, Pablo Vizcaino, Roger Ferrer, Oscar Palomar, Francesco Minervini, Jesus Labarta, Mauro Olivieri, Sebastiano Pomata, Pedro Marcuello, Jordi Cortina, Alberto Moreno, Josep Sans, Roger Espasa, Vassilis Papaefstathiou, Nikolaos Dimou, Georgios Ieronymakis, Antonis Psathakis, Michalis Giaourtas, Iasonas Mastorakis, Manolis Marazakis, Eric Guthmuller, Andrea Bocco, Jérôme Fereyre, César Fuguet, Mate Kovač, Mario Kovač, Luka Mrković, Josip Ramljak, Luca Bertaccini, Tim Fischer, Frank K. Gurkaynak, Paul Scheffler, Luca Benini, Bhavishya Goel, Madhavan Manivannan, Tiago Rocha, Nuno Neves, Jens Krüger | 2026-04-14 | Download | This paper presents EPAC, a RISC-V-based accelerator chip developed within the European Processor Initiative (EPI) as part of a multi-year, multi-partner effort to build a European HPC processor ecosy... |
| Intelligent resource prediction for SAP HANA continuous integration build workloads | Torsten Mandel, Jonathan Bader, Hanyoung Yoo, Stephan Kraft | 2026-04-14 | Download | Large enterprises often operate extensive Continuous Integration (CI) pipelines on large, heterogeneous compute clusters, where conservative, statically defined resource requirements are used to ensur... |
| Beyond Pre-Training: The Full Lifecycle of Foundation Models on HPC Systems | Dino Conciatore, Elia Oggian, Federico Da Forno, Stefano Schuppli, Jerome Tissieres, Joost VandeVondele, Maxime Martinasso | 2026-04-14 | Download | Large-scale pre-training of Foundational Models (FM) constitutes a computationally intensive first phase for enabling AI across diverse scientific and societal applications. |
| Three Birds, One Stone: Solving the Communication-Memory-Privacy Trilemma in LLM Fine-tuning Over Wireless Networks with Zeroth-Order Optimization | Zhijie Cai, Yuhao Zheng, Haolong Chen, Dongzhu Liu, Bin Wang, Guangxu Zhu | 2026-04-14 | Download | Federated Learning (FL) offers a promising pathway for collaboratively fine-tuning Large Language Models (LLMs) at the edge; however, this paradigm faces a critical bottleneck: the prohibitive communi... |
| Local-Splitter: A Measurement Study of Seven Tactics for Reducing Cloud LLM Token Usage on Coding-Agent Workloads | Justice Owusu Agyemang, Jerry John Kponyo, Elliot Amponsah, Godfred Manu Addo Boakye, Kwame Opuni-Boachie Obour Agyekum | 2026-04-14 | Download | We present a systematic measurement study of seven tactics for reducing cloud LLM token usage when a small local model can act as a triage layer in front of a frontier cloud model. |
| Decentralized Learning via Random Walk with Jumps | Zonghong Liu, Matthew Dwyer, Salim El Rouayheb | 2026-04-14 | Download | We study decentralized learning over networks where data are distributed across nodes without a central coordinator. Random walk learning is a token-based approach in which a single model is propagate... |
| A Periodic Space of Distributed Computing: Vision & Framework | Mohsen Amini Salehi, Adel N. Tousi, Hai Duc Nguyen, Murtaza Rangwala, Omar Rana, Tevfik Kosar, Valeria Cardellini, Rajkumar Buyya | 2026-04-14 | Download | Advances in networking and computing technologies throughout the early decades of the 21st century have transformed long-standing dreams of pervasive communication and computation into reality. |
| BlazingAML: High-Throughput Anti-Money Laundering (AML) via Multi-Stage Graph Mining | Haojie Ye, Arjun Laxman, Yichao Yuan, Krisztian Flautner, Nishil Talati | 2026-04-14 | Download | Money laundering detection faces challenges due to excessive false positives and inadequate adaptation to sophisticated multi-stage schemes that exploit modern financial networks. |
| PipeLive: Efficient Live In-place Pipeline Parallelism Reconfiguration for Dynamic LLM Serving | Xu Bai, Muhammed Tawfiqul Islam, Chen Wang, Adel N. Toosi | 2026-04-14 | Download | Pipeline parallelism (PP) is widely used to partition layers of large language models (LLMs) across GPUs, enabling scalable inference for large models. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Improving Network Clock Synchronization by Marking Congestion | Yash Deshpande, Quirin Vogel, Laura Becker, Kaan Aykurt, Wolfgang Kellerer | 2026-04-14 | Download | Achieving consistent time across devices in distributed systems often involves exchanging timestamped messages over a network. Precise time synchronization is crucial for applications such as cellular... |
| Graph-based Hierarchical Deep Reinforcement Learning for Deliverable Block Propagation with Optimal Hybrid Cost in Web 3.0 | Shi Chen, Jinbo Wen, Jiawen Kang, Tenghui Huang, Maomao Zhang, Tao Zhang, Dong In Kim | 2026-04-14 | Download | Web 3.0 is envisioned as a decentralized paradigm, where blockchain serves as a core technology for transparent and tamper-proof data management. |
| Joint Clustering and Prediction of the Quality of Service in Vehicular Cellular Networks | Oscar Stenhammar, Gábor Fodor, Carlo Fischione | 2026-04-14 | Download | Machine learning models are increasingly deployed in wireless networks with stringent performance requirements. However, dynamic propagation environments and fluctuating traffic densities introduce co... |
| Advancing Network Digital Twin Framework for Generating Realistic Datasets | Oscar Stenhammar, Sundeep Rangan, Gábor Fodor, Carlo Fischione | 2026-04-14 | Download | The integration of accurate and reproducible wireless network simulations is a key enabler for research on open, virtualized, and intelligent communication systems. |
| Efficient Semantic Image Communication for Traffic Monitoring at the Edge | Damir Assylbek, Nurmukhammed Aitymbetov, Marko Ristin, Dimitrios Zorbas | 2026-04-14 | Download | Many visual monitoring systems operate under strict communication constraints, where transmitting full-resolution images is impractical and often unnecessary. |
| Large-Scale Measurement of NAT Traversal for the Decentralized Web: A Case Study of DCUtR in IPFS | Dennis Trautwein, Cornelius Ihle, Moritz Schubotz, Corinna Breitinger, Bela Gipp | 2026-04-14 | Download | The promise of decentralized peer-to-peer (P2P) systems is fundamentally gated by the challenge of Network Address Translation (NAT) traversal, with existing solutions often reintroducing the very cen... |
| LightTune: Lightweight Forward-Only Online Fine-Tuning with Applications to Link Adaptation | Ramy E. Ali, Federico Penna | 2026-04-14 | Download | Deploying machine learning (ML) algorithms on mobile phones is bottlenecked by performance degradation under dynamic, real-world conditions that differ from the offline training conditions. |
| Throughput Characterization of Wireless CSMA Networks With Arbitrary Sensing and Interference Topologies | Xinghua Sun, Wenhai Lin, Ruike Zhou | 2026-04-14 | Download | The performance analysis of wireless CSMA networks is notoriously difficult due to the intricate sensing and interference relationships among links. |
| Traffic-Aware Domain Partitioning and Load-Balanced Inter-Domain Routing for LEO Satellite Networks | Chen Zhou, Jiangtao Luo, Yongyi Ran | 2026-04-14 | Download | Low Earth Orbit (LEO) satellite networks provide global coverage and low latency, yet high node mobility, uneven traffic distribution, and stochastic link failures pose severe challenges for inter-dom... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| TierBPF: Page Migration Admission Control for Tiered Memory via eBPF | Xi Wang, Tal Zussman, Yuang Xu, Bin Ma, Asaf Cidon, Dong Li | 2026-04-14 | Download | Existing software-based memory tiering systems decide which pages to place on the slower or faster tier. However, they do not take into account two important factors that greatly influence application... |
| A Periodic Space of Distributed Computing: Vision & Framework | Mohsen Amini Salehi, Adel N. Tousi, Hai Duc Nguyen, Murtaza Rangwala, Omar Rana, Tevfik Kosar, Valeria Cardellini, Rajkumar Buyya | 2026-04-14 | Download | Advances in networking and computing technologies throughout the early decades of the 21st century have transformed long-standing dreams of pervasive communication and computation into reality. |
| Hybrid Adaptive Tuning for Tiered Memory Systems | Xi Wang, Jie Liu, Shuangyan Yang, Jongryool Kim, Pengfei Su, Dong Li | 2026-04-14 | Download | Memory tiering provides a cost-effective solution to increase memory capacity, utilization, and even bandwidth. Memory tiering relies on system software for memory profiling, detection of frequently a... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Towards a Linear-Algebraic Hypervisor | Breandan Considine | 2026-04-14 | Download | Many techniques in program synthesis, superoptimization, and array programming require parallel rollouts of general-purpose programs. GPUs, while capable targets for domain-specific parallelism, are t... |
| Large-Scale Measurement of NAT Traversal for the Decentralized Web: A Case Study of DCUtR in IPFS | Dennis Trautwein, Cornelius Ihle, Moritz Schubotz, Corinna Breitinger, Bela Gipp | 2026-04-14 | Download | The promise of decentralized peer-to-peer (P2P) systems is fundamentally gated by the challenge of Network Address Translation (NAT) traversal, with existing solutions often reintroducing the very cen... |