2026-04-08

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Trilinear Compute-in-Memory Architecture for Energy-Efficient Transformer Acceleration	Md Zesun Ahmed Mia, Jiahui Duan, Kai Ni, Abhronil Sengupta	2026-04-08	Download	Self-attention in Transformers generates dynamic operands that force conventional Compute-in-Memory (CIM) accelerators into costly non-volatile memory (NVM) reprogramming cycles, degrading throughput ...
From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference	Ravindra Ganti, Steve Xu	2026-04-08	Download	We present an RL-driven compiler that jointly optimizes ASIC architecture, memory hierarchy, and workload partitioning for AI inference across 3nm to 28nm.
FILCO: Flexible Composing Architecture with Real-Time Reconfigurability for DNN Acceleration	Xingzhen Chen, Jinming Zhuang, Zhuoping Yang, Shixin Ji, Sarah Schultz, Zheng Dong, Weisong Shi, Peipei Zhou	2026-04-08	Download	With the development of deep neural network (DNN) enabled applications, achieving high hardware resource efficiency on diverse workloads is non-trivial in heterogeneous computing platforms.
Symbolic Polyhedral-Based Energy Analysis for Nested Loop Programs	Avinash Mahesh Nirmala, Dominik Walter, Frank Hannig, Jürgen Teich	2026-04-08	Download	This work presents a symbolic approach for estimating the energy consumption for nested loop programs when mapped and scheduled on parallel processor array accelerator architectures.
Assessing the Added Value of Onboard Earth Observation Processing with the IRIDE HEO Service Segment	Parampuneet Kaur Thind, Charles Mwangi, Giovanni Varetto, Lorenzo Sarti, Andrea Papa, Andrea Taramelli	2026-04-08	Download	Current operational Earth Observation (EO) services, including the Copernicus Emergency Management Service (CEMS), the European Forest Fire Information System (EFFIS), and the Copernicus Land Monitori...
TRAPTI: Time-Resolved Analysis for SRAM Banking and Power Gating Optimization in Embedded Transformer Inference	Jan Klhufek, Alberto Marchisio, Vojtech Mrazek, Lukas Sekanina, Muhammad Shafique	2026-04-08	Download	Transformer neural networks achieve state-of-the-art accuracy across language and vision tasks, but their deployment on embedded hardware is hindered by stringent area, latency, and energy constraints...
CBM-Dual: A 65-nm Fully Connected Chaotic Boltzmann Machine Processor for Dual Function Simulated Annealing and Reservoir Computing	Kanta Yoshioka, Soshi Hirayae, Yuichiro Tanaka, Yuichi Katori, Takashi Morie, Hakaru Tamukoh	2026-04-08	Download	This paper presents CBM-Dual, the first silicon-proven digital chaotic dynamics processor (CDP) supporting both simulated annealing (SA) and reservoir computing (RC).
SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs	Jintao Zhang, Xuanyao Fong	2026-04-08	Download	Large Language Model (LLM) inference on edge Neural Processing Units (NPUs) is fundamentally constrained by limited on-chip memory capacity. Although high-density embedded DRAM (eDRAM) is attractive f...
SwarmIO: Towards 100 Million IOPS SSD Emulation for Next-generation GPU-centric Storage Systems	Hyeseong Kim, Gwangoo Yeo, Minsoo Rhu	2026-04-08	Download	GPU-initiated I/O has emerged as a key mechanism for achieving high-throughput storage access by leveraging massive GPU thread-level parallelism, while recent industry trends point toward SSDs optimiz...
Self-Calibrating LLM-Based Analog Circuit Sizing with Interpretable Design Equations	Antonio J. Bujana, Aydin I. Karsilayan	2026-04-08	Download	We present a self-calibrating framework for analog circuit sizing in which a large language model (LLM) derives topology-specific analytical design equations directly from a raw circuit netlist.
CoverAssert: Iterative LLM Assertion Generation Driven by Functional Coverage via Syntax-Semantic Representations	Yonghao Wang, Yang Yin, Hongqin Lyu, Jiaxin Zhou, Zhiteng Chao, Mingyu Shi, Wenchao Ding, Yunlin Du, Jing Ye, Tiancheng Wang, Huawei Li	2026-04-08	Download	LLMs can generate SystemVerilog assertions (SVAs) from natural language specs, but single-pass outputs often lack functional coverage due to limited IC design understanding.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC	Mohammad Siavashi, Mariano Scazzariello, Gerald Q. Maguire, Dejan Kostić, Marco Chiesa	2026-04-08	Download	Large Language Model (LLM) inference is rapidly becoming a core datacenter service, yet current serving stacks keep the host CPU on the critical path for orchestration and token-level control.
MEV-ACE: Identity-Authenticated Fair Ordering for Proposer-Controlled MEV Mitigation	Jian Sheng Wang	2026-04-08	Download	Maximal Extractable Value, or MEV, remains a structural threat to blockchain fairness because a block producer can often observe pending transactions and unilaterally decide their ordering or inclusio...
Parallel Batch-Dynamic Maximal Independent Set	Guy Blelloch, Andrew Brady, Laxman Dhulipala, Jeremy Fineman, Jared Lo	2026-04-08	Download	We develop the first theoretically-efficient algorithm for maintaining the maximal independent set (MIS) of a graph in the parallel batch-dynamic setting.
Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning	Roberto Vercellino, Jared Willard, Gustavo Campos, Weslley da Silva Pereira, Olivia Hull, Matthew Selensky, Juliane Mueller	2026-04-08	Download	The rapid growth of generative artificial intelligence (AI) has introduced unprecedented computational demands, driving significant increases in the energy footprint of data centers.
Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS	Luca Pennati, Andong Hu, Ivy Peng, Lukas Müllender, Stefano Markidis	2026-04-08	Download	GROMACS is a de-facto standard for classical Molecular Dynamics (MD). The rise of AI-driven interatomic potentials that pursue near-quantum accuracy at MD throughput now poses a significant challenge:...
InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models	Hongyu Chen, Letian Ruan, Zilin Xu, Yuchen Li, Xinyu Chen, Jingwen Leng, Bingsheng He, Minyi Guo, Shixuan Sun	2026-04-08	Download	LoRA enables efficient customization of LLMs and is widely used in multi-tenant and multi-task serving. However, emerging model architectures such as MoE significantly increase LoRA memory cost, makin...
Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics	Youhe Jiang, Ran Yan, You Peng, Wenshuang Li, Taiyi Wang, Fangcheng Fu, Binhang Yuan	2026-04-08	Download	Modern Large Language Model (LLM) serving operates in highly volatile environments characterized by severe runtime dynamics, such as workload fluctuations and elastic cluster autoscaling.
Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale	Renzhong Yuan, Yijun Zeng, Xiaosong Gao, Linxi Yu, Haochun Liao, Han Wang	2026-04-08	Download	When output token counts can be predicted at submission time (Gan et al., 2026), client-side scheduling against a black-box LLM API becomes semi-clairvoyant: decisions condition on coarse token priors...
NestPipe: Large-Scale Recommendation Training on 1,500+ Accelerators via Nested Pipelining	Zhida Jiang, Zhaolong Xing, Huichao Chai, Tianxing Sun, Qiang Peng, Baopeng Yuan, Jiaxing Wang, Hua Du, Zhixin Wu, Xuemiao Li, Yikui Cao, Xinyu Liu, Yongxiang Feng, Zhen Chen, Ke Zhang	2026-04-08	Download	Modern recommendation models have increased to trillions of parameters. As cluster scales expand to O(1k), distributed training bottlenecks shift from computation and memory to data movement, especial...
On the Decidability of Distributed Tasks with Output Sets under Asynchrony and Any Number of Crashes	Timothé Albouy, Antonio Fernández Anta, Chryssis Georgiou, Nicolas Nicolaou, Junlang Wang	2026-04-08	Download	In this paper, we define a new class of distributed tasks, called SOS tasks (for Set of Output Sets tasks), defined by the set $O$ of distinct output sets of values that can be produced.
Determinacy with Priorities up to Clocks	Luigi Liquori, Michael Mendler, Claude Stolze	2026-04-08	Download	In Milner's seminal book on communication and concurrency introducing CCS, a process algebra inherently non-deterministic, chapter 11 was completely devoted to introduce the notion of determinacy and ...
Exploiting Aggregate Programming in a Multi-Robot Service Prototype	Giorgio Audrito, Andrea Basso, Daniele Bortoluzzi, Ferruccio Damiani, Giordano Scarso, Gianluca Torta	2026-04-08	Download	Multi-robot systems are becoming increasingly relevant within diverse application domains, such as healthcare, exploration, and rescue missions.
Branching Out: Existential External Choice in Effpi	Benjamin Robinson, Nobuko Yoshida	2026-04-08	Download	Effpi is a framework for writing strongly-typed message-passing programs in Scala, where the compiler enforces the conformance of process implementations to specified protocol types.
Beyond End-to-End: Dynamic Chain Optimization for Private LLM Adaptation on the Edge	Yebo Wu, Jingguang Li, Chunlin Tian, Kahou Tam, Zhijiang Guo, Li Li	2026-04-08	Download	Federated fine-tuning enables privacy-preserving LLM adaptation but faces a critical bottleneck: the disparity between LLMs' high memory demands and edge devices' limited capacity.
Nexus: Transparent I/O Offloading for High-Density Serverless Computing	JooYoung Park, Kevin Nguetchouang, Jovan Stojkovic, Likun Zhang, Riccardo Mancini, Marco Cali, Dmitrii Ustiugov	2026-04-08	Download	Serverless computing relies on extreme multi-tenancy to remain economically viable, driving providers to rely on virtual machines (VMs) that ensure strong isolation and seamless ecosystem compatibilit...
SwarmIO: Towards 100 Million IOPS SSD Emulation for Next-generation GPU-centric Storage Systems	Hyeseong Kim, Gwangoo Yeo, Minsoo Rhu	2026-04-08	Download	GPU-initiated I/O has emerged as a key mechanism for achieving high-throughput storage access by leveraging massive GPU thread-level parallelism, while recent industry trends point toward SSDs optimiz...
Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start	Xueshen Liu, Yongji Wu, Yuncheng Yao, Danyang Zhuo, Ion Stoica, Z. Morley Mao	2026-04-08	Download	Modern LLM service providers increasingly rely on autoscaling and parallelism reconfiguration to respond to rapidly changing workloads, but cold-start latency remains a major bottleneck.
Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication	Matthew Qian, Yahia Ramadan, Suhita Anubha, Ariful Azad	2026-04-08	Download	Sparse matrix-dense matrix multiplication (SpMM) is a critical kernel in scientific computing, graph analytics, and machine learning, whose performance is often constrained by memory bandwidth.
DynLP: Parallel Dynamic Batch Update for Label Propagation in Semi-Supervised Learning	S M Shovan, Arindam Khanda, S M Ferdous, Sajal K. Das, Mahantesh Halappanavar	2026-04-08	Download	Semi-supervised learning aims to infer class labels using only a small fraction of labeled data. In graph-based semi-supervised learning, this is typically achieved through label propagation to predic...
Contextual Chain: Single-State Ledger Design for Mobile/IoT Networks with Frequent Partitions	Song-Ju Kim	2026-04-08	Download	We study a lightweight ledger protocol for intermittent and noisy networks, motivated by IoT and mobile settings in which partitions are common and full-history verification is impractical.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
SAFE: Spatially-Aware Feedback Enhancement for Fault-Tolerant Trust Management in VANETs	İpek Abasıkeleş Turgut	2026-04-08	Download	Trust management in VANETs is critically important for secure communication between vehicles. In event-based trust systems, vehicles broadcast the events they witness to their surroundings and send fe...
RL-ASL: A Dynamic Listening Optimization for TSCH Networks Using Reinforcement Learning	F. Fernando Jurado-Lasso, J. F. Jurado	2026-04-08	Download	Time Slotted Channel Hopping (TSCH) is a widely adopted Media Access Control (MAC) protocol within the IEEE 802.15.4e standard, designed to provide reliable and energy-efficient communication in Indus...
IPEK: Intelligent Priority-Aware Event-Based Trust with Asymmetric Knowledge for Resilient Vehicular Ad-Hoc Networks	İpek Abasıkeleş Turgut	2026-04-08	Download	Vehicular Ad Hoc Networks (VANETs) are vulnerable to intelligent attackers who exploit the homogeneous treatment of traffic events in existing trust models.
Fast Heterogeneous Serving: Scalable Mixed-Scale LLM Allocation for SLO-Constrained Inference	Jiaming Cheng, Duong Tung Nguyen	2026-04-08	Download	Deploying large language model (LLM) inference at scale requires jointly selecting base models, provisioning heterogeneous GPUs, configuring parallelism, and distributing workloads under tight latency...
Multiprotocol Wireless Timer Synchronization for IoT Systems	Ziyao Zhou, Tiancheng Cao, Chen Shen, Jiaqi Zhang, Yuting Liu, Hen-Wei Huang	2026-04-08	Download	Accurate time synchronization is essential for Internet of Things (IoT) systems, where multiple distributed nodes must share a common time base for coordinated sensing and data fusion.
Aerial Booster-Cell Enabled Inter-Cell Interference Coordination for 5G NR Networks	Md Sharif Hossen, Vijay K. Shah, Ismail Guvenc	2026-04-08	Download	Cellular-connected unmanned aerial vehicles (UAVs) operating in 5G New Radio (NR) macro networks experience severe and spatially non-uniform downlink interference.
Enhancing Secure Intent-Based Networking with an Agentic AI: The EU Project MARE Approach	Iulisloi Zacarias, Marla Grunewald, Fin Gentzen, Xavi Masip-Bruin, Admela Jukan	2026-04-08	Download	In the EU project MARE, a novel plane was proposed and used in combination with intent-based networking (IBN), allowing the operator to focus on what, rather than on how.
Towards National Quantum Communication in Europe: Planning and Sizing Terrestrial QKD Networks	Sebastian Raubitzek, Werner Strasser, Sebastian Ramacher, Thomas Lebeth, Andreas Neuhold, Christoph Pacher	2026-04-08	Download	The European Union is developing the European Quantum Communication Infrastructure (EuroQCI) as a pan-European network to provide secure communication capabilities across Member States, including gove...
Contextual Chain: Single-State Ledger Design for Mobile/IoT Networks with Frequent Partitions	Song-Ju Kim	2026-04-08	Download	We study a lightweight ledger protocol for intermittent and noisy networks, motivated by IoT and mobile settings in which partitions are common and full-history verification is impractical.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC	Mohammad Siavashi, Mariano Scazzariello, Gerald Q. Maguire, Dejan Kostić, Marco Chiesa	2026-04-08	Download	Large Language Model (LLM) inference is rapidly becoming a core datacenter service, yet current serving stacks keep the host CPU on the critical path for orchestration and token-level control.
Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale	Renzhong Yuan, Yijun Zeng, Xiaosong Gao, Linxi Yu, Haochun Liao, Han Wang	2026-04-08	Download	When output token counts can be predicted at submission time (Gan et al., 2026), client-side scheduling against a black-box LLM API becomes semi-clairvoyant: decisions condition on coarse token priors...
Nexus: Transparent I/O Offloading for High-Density Serverless Computing	JooYoung Park, Kevin Nguetchouang, Jovan Stojkovic, Likun Zhang, Riccardo Mancini, Marco Cali, Dmitrii Ustiugov	2026-04-08	Download	Serverless computing relies on extreme multi-tenancy to remain economically viable, driving providers to rely on virtual machines (VMs) that ensure strong isolation and seamless ecosystem compatibilit...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC	Mohammad Siavashi, Mariano Scazzariello, Gerald Q. Maguire, Dejan Kostić, Marco Chiesa	2026-04-08	Download	Large Language Model (LLM) inference is rapidly becoming a core datacenter service, yet current serving stacks keep the host CPU on the critical path for orchestration and token-level control.
Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale	Renzhong Yuan, Yijun Zeng, Xiaosong Gao, Linxi Yu, Haochun Liao, Han Wang	2026-04-08	Download	When output token counts can be predicted at submission time (Gan et al., 2026), client-side scheduling against a black-box LLM API becomes semi-clairvoyant: decisions condition on coarse token priors...
