
2026-04-08

cs.AR - Architecture

| Title | Authors | Published | Abstract |
|---|---|---|---|
| Trilinear Compute-in-Memory Architecture for Energy-Efficient Transformer Acceleration | Md Zesun Ahmed Mia, Jiahui Duan, Kai Ni, Abhronil Sengupta | 2026-04-08 | Self-attention in Transformers generates dynamic operands that force conventional Compute-in-Memory (CIM) accelerators into costly non-volatile memory (NVM) reprogramming cycles, degrading throughput ... |
| From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference | Ravindra Ganti, Steve Xu | 2026-04-08 | We present an RL-driven compiler that jointly optimizes ASIC architecture, memory hierarchy, and workload partitioning for AI inference across 3nm to 28nm. |
| FILCO: Flexible Composing Architecture with Real-Time Reconfigurability for DNN Acceleration | Xingzhen Chen, Jinming Zhuang, Zhuoping Yang, Shixin Ji, Sarah Schultz, Zheng Dong, Weisong Shi, Peipei Zhou | 2026-04-08 | With the development of deep neural network (DNN) enabled applications, achieving high hardware resource efficiency on diverse workloads is non-trivial in heterogeneous computing platforms. |
| Symbolic Polyhedral-Based Energy Analysis for Nested Loop Programs | Avinash Mahesh Nirmala, Dominik Walter, Frank Hannig, Jürgen Teich | 2026-04-08 | This work presents a symbolic approach for estimating the energy consumption for nested loop programs when mapped and scheduled on parallel processor array accelerator architectures. |
| Assessing the Added Value of Onboard Earth Observation Processing with the IRIDE HEO Service Segment | Parampuneet Kaur Thind, Charles Mwangi, Giovanni Varetto, Lorenzo Sarti, Andrea Papa, Andrea Taramelli | 2026-04-08 | Current operational Earth Observation (EO) services, including the Copernicus Emergency Management Service (CEMS), the European Forest Fire Information System (EFFIS), and the Copernicus Land Monitori... |
| TRAPTI: Time-Resolved Analysis for SRAM Banking and Power Gating Optimization in Embedded Transformer Inference | Jan Klhufek, Alberto Marchisio, Vojtech Mrazek, Lukas Sekanina, Muhammad Shafique | 2026-04-08 | Transformer neural networks achieve state-of-the-art accuracy across language and vision tasks, but their deployment on embedded hardware is hindered by stringent area, latency, and energy constraints... |
| CBM-Dual: A 65-nm Fully Connected Chaotic Boltzmann Machine Processor for Dual Function Simulated Annealing and Reservoir Computing | Kanta Yoshioka, Soshi Hirayae, Yuichiro Tanaka, Yuichi Katori, Takashi Morie, Hakaru Tamukoh | 2026-04-08 | This paper presents CBM-Dual, the first silicon-proven digital chaotic dynamics processor (CDP) supporting both simulated annealing (SA) and reservoir computing (RC). |
| SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs | Jintao Zhang, Xuanyao Fong | 2026-04-08 | Large Language Model (LLM) inference on edge Neural Processing Units (NPUs) is fundamentally constrained by limited on-chip memory capacity. Although high-density embedded DRAM (eDRAM) is attractive f... |
| SwarmIO: Towards 100 Million IOPS SSD Emulation for Next-generation GPU-centric Storage Systems | Hyeseong Kim, Gwangoo Yeo, Minsoo Rhu | 2026-04-08 | GPU-initiated I/O has emerged as a key mechanism for achieving high-throughput storage access by leveraging massive GPU thread-level parallelism, while recent industry trends point toward SSDs optimiz... |
| Self-Calibrating LLM-Based Analog Circuit Sizing with Interpretable Design Equations | Antonio J. Bujana, Aydin I. Karsilayan | 2026-04-08 | We present a self-calibrating framework for analog circuit sizing in which a large language model (LLM) derives topology-specific analytical design equations directly from a raw circuit netlist. |
| CoverAssert: Iterative LLM Assertion Generation Driven by Functional Coverage via Syntax-Semantic Representations | Yonghao Wang, Yang Yin, Hongqin Lyu, Jiaxin Zhou, Zhiteng Chao, Mingyu Shi, Wenchao Ding, Yunlin Du, Jing Ye, Tiancheng Wang, Huawei Li | 2026-04-08 | LLMs can generate SystemVerilog assertions (SVAs) from natural language specs, but single-pass outputs often lack functional coverage due to limited IC design understanding. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
|---|---|---|---|
| Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC | Mohammad Siavashi, Mariano Scazzariello, Gerald Q. Maguire, Dejan Kostić, Marco Chiesa | 2026-04-08 | Large Language Model (LLM) inference is rapidly becoming a core datacenter service, yet current serving stacks keep the host CPU on the critical path for orchestration and token-level control. |
| MEV-ACE: Identity-Authenticated Fair Ordering for Proposer-Controlled MEV Mitigation | Jian Sheng Wang | 2026-04-08 | Maximal Extractable Value, or MEV, remains a structural threat to blockchain fairness because a block producer can often observe pending transactions and unilaterally decide their ordering or inclusio... |
| Parallel Batch-Dynamic Maximal Independent Set | Guy Blelloch, Andrew Brady, Laxman Dhulipala, Jeremy Fineman, Jared Lo | 2026-04-08 | We develop the first theoretically-efficient algorithm for maintaining the maximal independent set (MIS) of a graph in the parallel batch-dynamic setting. |
| Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning | Roberto Vercellino, Jared Willard, Gustavo Campos, Weslley da Silva Pereira, Olivia Hull, Matthew Selensky, Juliane Mueller | 2026-04-08 | The rapid growth of generative artificial intelligence (AI) has introduced unprecedented computational demands, driving significant increases in the energy footprint of data centers. |
| Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS | Luca Pennati, Andong Hu, Ivy Peng, Lukas Müllender, Stefano Markidis | 2026-04-08 | GROMACS is a de-facto standard for classical Molecular Dynamics (MD). The rise of AI-driven interatomic potentials that pursue near-quantum accuracy at MD throughput now poses a significant challenge:... |
| InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models | Hongyu Chen, Letian Ruan, Zilin Xu, Yuchen Li, Xinyu Chen, Jingwen Leng, Bingsheng He, Minyi Guo, Shixuan Sun | 2026-04-08 | LoRA enables efficient customization of LLMs and is widely used in multi-tenant and multi-task serving. However, emerging model architectures such as MoE significantly increase LoRA memory cost, makin... |
| Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics | Youhe Jiang, Ran Yan, You Peng, Wenshuang Li, Taiyi Wang, Fangcheng Fu, Binhang Yuan | 2026-04-08 | Modern Large Language Model (LLM) serving operates in highly volatile environments characterized by severe runtime dynamics, such as workload fluctuations and elastic cluster autoscaling. |
| Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale | Renzhong Yuan, Yijun Zeng, Xiaosong Gao, Linxi Yu, Haochun Liao, Han Wang | 2026-04-08 | When output token counts can be predicted at submission time (Gan et al., 2026), client-side scheduling against a black-box LLM API becomes semi-clairvoyant: decisions condition on coarse token priors... |
| NestPipe: Large-Scale Recommendation Training on 1,500+ Accelerators via Nested Pipelining | Zhida Jiang, Zhaolong Xing, Huichao Chai, Tianxing Sun, Qiang Peng, Baopeng Yuan, Jiaxing Wang, Hua Du, Zhixin Wu, Xuemiao Li, Yikui Cao, Xinyu Liu, Yongxiang Feng, Zhen Chen, Ke Zhang | 2026-04-08 | Modern recommendation models have increased to trillions of parameters. As cluster scales expand to O(1k), distributed training bottlenecks shift from computation and memory to data movement, especial... |
| On the Decidability of Distributed Tasks with Output Sets under Asynchrony and Any Number of Crashes | Timothé Albouy, Antonio Fernández Anta, Chryssis Georgiou, Nicolas Nicolaou, Junlang Wang | 2026-04-08 | In this paper, we define a new class of distributed tasks, called SOS tasks (for Set of Output Sets tasks), defined by the set 𝒪 of distinct output sets of values that can be produced. |
| Determinacy with Priorities up to Clocks | Luigi Liquori, Michael Mendler, Claude Stolze | 2026-04-08 | In Milner's seminal book on communication and concurrency introducing CCS, a process algebra inherently non-deterministic, chapter 11 was completely devoted to introduce the notion of determinacy and ... |
| Exploiting Aggregate Programming in a Multi-Robot Service Prototype | Giorgio Audrito, Andrea Basso, Daniele Bortoluzzi, Ferruccio Damiani, Giordano Scarso, Gianluca Torta | 2026-04-08 | Multi-robot systems are becoming increasingly relevant within diverse application domains, such as healthcare, exploration, and rescue missions. |
| Branching Out: Existential External Choice in Effpi | Benjamin Robinson, Nobuko Yoshida | 2026-04-08 | Effpi is a framework for writing strongly-typed message-passing programs in Scala, where the compiler enforces the conformance of process implementations to specified protocol types. |
| Beyond End-to-End: Dynamic Chain Optimization for Private LLM Adaptation on the Edge | Yebo Wu, Jingguang Li, Chunlin Tian, Kahou Tam, Zhijiang Guo, Li Li | 2026-04-08 | Federated fine-tuning enables privacy-preserving LLM adaptation but faces a critical bottleneck: the disparity between LLMs' high memory demands and edge devices' limited capacity. |
| Nexus: Transparent I/O Offloading for High-Density Serverless Computing | JooYoung Park, Kevin Nguetchouang, Jovan Stojkovic, Likun Zhang, Riccardo Mancini, Marco Cali, Dmitrii Ustiugov | 2026-04-08 | Serverless computing relies on extreme multi-tenancy to remain economically viable, driving providers to rely on virtual machines (VMs) that ensure strong isolation and seamless ecosystem compatibilit... |
| SwarmIO: Towards 100 Million IOPS SSD Emulation for Next-generation GPU-centric Storage Systems | Hyeseong Kim, Gwangoo Yeo, Minsoo Rhu | 2026-04-08 | GPU-initiated I/O has emerged as a key mechanism for achieving high-throughput storage access by leveraging massive GPU thread-level parallelism, while recent industry trends point toward SSDs optimiz... |
| Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start | Xueshen Liu, Yongji Wu, Yuncheng Yao, Danyang Zhuo, Ion Stoica, Z. Morley Mao | 2026-04-08 | Modern LLM service providers increasingly rely on autoscaling and parallelism reconfiguration to respond to rapidly changing workloads, but cold-start latency remains a major bottleneck. |
| Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication | Matthew Qian, Yahia Ramadan, Suhita Anubha, Ariful Azad | 2026-04-08 | Sparse matrix-dense matrix multiplication (SpMM) is a critical kernel in scientific computing, graph analytics, and machine learning, whose performance is often constrained by memory bandwidth. |
| DynLP: Parallel Dynamic Batch Update for Label Propagation in Semi-Supervised Learning | S M Shovan, Arindam Khanda, S M Ferdous, Sajal K. Das, Mahantesh Halappanavar | 2026-04-08 | Semi-supervised learning aims to infer class labels using only a small fraction of labeled data. In graph-based semi-supervised learning, this is typically achieved through label propagation to predic... |
| Contextual Chain: Single-State Ledger Design for Mobile/IoT Networks with Frequent Partitions | Song-Ju Kim | 2026-04-08 | We study a lightweight ledger protocol for intermittent and noisy networks, motivated by IoT and mobile settings in which partitions are common and full-history verification is impractical. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
|---|---|---|---|
| SAFE: Spatially-Aware Feedback Enhancement for Fault-Tolerant Trust Management in VANETs | İpek Abasıkeleş Turgut | 2026-04-08 | Trust management in VANETs is critically important for secure communication between vehicles. In event-based trust systems, vehicles broadcast the events they witness to their surroundings and send fe... |
| RL-ASL: A Dynamic Listening Optimization for TSCH Networks Using Reinforcement Learning | F. Fernando Jurado-Lasso, J. F. Jurado | 2026-04-08 | Time Slotted Channel Hopping (TSCH) is a widely adopted Media Access Control (MAC) protocol within the IEEE 802.15.4e standard, designed to provide reliable and energy-efficient communication in Indus... |
| IPEK: Intelligent Priority-Aware Event-Based Trust with Asymmetric Knowledge for Resilient Vehicular Ad-Hoc Networks | İpek Abasıkeleş Turgut | 2026-04-08 | Vehicular Ad Hoc Networks (VANETs) are vulnerable to intelligent attackers who exploit the homogeneous treatment of traffic events in existing trust models. |
| Fast Heterogeneous Serving: Scalable Mixed-Scale LLM Allocation for SLO-Constrained Inference | Jiaming Cheng, Duong Tung Nguyen | 2026-04-08 | Deploying large language model (LLM) inference at scale requires jointly selecting base models, provisioning heterogeneous GPUs, configuring parallelism, and distributing workloads under tight latency... |
| Multiprotocol Wireless Timer Synchronization for IoT Systems | Ziyao Zhou, Tiancheng Cao, Chen Shen, Jiaqi Zhang, Yuting Liu, Hen-Wei Huang | 2026-04-08 | Accurate time synchronization is essential for Internet of Things (IoT) systems, where multiple distributed nodes must share a common time base for coordinated sensing and data fusion. |
| Aerial Booster-Cell Enabled Inter-Cell Interference Coordination for 5G NR Networks | Md Sharif Hossen, Vijay K. Shah, Ismail Guvenc | 2026-04-08 | Cellular-connected unmanned aerial vehicles (UAVs) operating in 5G New Radio (NR) macro networks experience severe and spatially non-uniform downlink interference. |
| Enhancing Secure Intent-Based Networking with an Agentic AI: The EU Project MARE Approach | Iulisloi Zacarias, Marla Grunewald, Fin Gentzen, Xavi Masip-Bruin, Admela Jukan | 2026-04-08 | In the EU project MARE, a novel plane was proposed and used in combination with intent-based networking (IBN), allowing the operator to focus on what, rather than on how. |
| Towards National Quantum Communication in Europe: Planning and Sizing Terrestrial QKD Networks | Sebastian Raubitzek, Werner Strasser, Sebastian Ramacher, Thomas Lebeth, Andreas Neuhold, Christoph Pacher | 2026-04-08 | The European Union is developing the European Quantum Communication Infrastructure (EuroQCI) as a pan-European network to provide secure communication capabilities across Member States, including gove... |
| Contextual Chain: Single-State Ledger Design for Mobile/IoT Networks with Frequent Partitions | Song-Ju Kim | 2026-04-08 | We study a lightweight ledger protocol for intermittent and noisy networks, motivated by IoT and mobile settings in which partitions are common and full-history verification is impractical. |

cs.OS - Operating Systems

| Title | Authors | Published | Abstract |
|---|---|---|---|
| Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC | Mohammad Siavashi, Mariano Scazzariello, Gerald Q. Maguire, Dejan Kostić, Marco Chiesa | 2026-04-08 | Large Language Model (LLM) inference is rapidly becoming a core datacenter service, yet current serving stacks keep the host CPU on the critical path for orchestration and token-level control. |
| Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale | Renzhong Yuan, Yijun Zeng, Xiaosong Gao, Linxi Yu, Haochun Liao, Han Wang | 2026-04-08 | When output token counts can be predicted at submission time (Gan et al., 2026), client-side scheduling against a black-box LLM API becomes semi-clairvoyant: decisions condition on coarse token priors... |
| Nexus: Transparent I/O Offloading for High-Density Serverless Computing | JooYoung Park, Kevin Nguetchouang, Jovan Stojkovic, Likun Zhang, Riccardo Mancini, Marco Cali, Dmitrii Ustiugov | 2026-04-08 | Serverless computing relies on extreme multi-tenancy to remain economically viable, driving providers to rely on virtual machines (VMs) that ensure strong isolation and seamless ecosystem compatibilit... |

cs.PF - Performance

| Title | Authors | Published | Abstract |
|---|---|---|---|
| Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC | Mohammad Siavashi, Mariano Scazzariello, Gerald Q. Maguire, Dejan Kostić, Marco Chiesa | 2026-04-08 | Large Language Model (LLM) inference is rapidly becoming a core datacenter service, yet current serving stacks keep the host CPU on the critical path for orchestration and token-level control. |
| Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale | Renzhong Yuan, Yijun Zeng, Xiaosong Gao, Linxi Yu, Haochun Liao, Han Wang | 2026-04-08 | When output token counts can be predicted at submission time (Gan et al., 2026), client-side scheduling against a black-box LLM API becomes semi-clairvoyant: decisions condition on coarse token priors... |
