2026-04-08
cs.AR - Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Trilinear Compute-in-Memory Architecture for Energy-Efficient Transformer Acceleration | Md Zesun Ahmed Mia, Jiahui Duan, Kai Ni, Abhronil Sengupta | 2026-04-08 | Self-attention in Transformers generates dynamic operands that force conventional Compute-in-Memory (CIM) accelerators into costly non-volatile memory (NVM) reprogramming cycles, degrading throughput ... |
| From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference | Ravindra Ganti, Steve Xu | 2026-04-08 | We present an RL-driven compiler that jointly optimizes ASIC architecture, memory hierarchy, and workload partitioning for AI inference across 3nm to 28nm. |
| FILCO: Flexible Composing Architecture with Real-Time Reconfigurability for DNN Acceleration | Xingzhen Chen, Jinming Zhuang, Zhuoping Yang, Shixin Ji, Sarah Schultz, Zheng Dong, Weisong Shi, Peipei Zhou | 2026-04-08 | With the development of deep neural network (DNN) enabled applications, achieving high hardware resource efficiency on diverse workloads is non-trivial in heterogeneous computing platforms. |
| Symbolic Polyhedral-Based Energy Analysis for Nested Loop Programs | Avinash Mahesh Nirmala, Dominik Walter, Frank Hannig, Jürgen Teich | 2026-04-08 | This work presents a symbolic approach for estimating the energy consumption for nested loop programs when mapped and scheduled on parallel processor array accelerator architectures. |
| Assessing the Added Value of Onboard Earth Observation Processing with the IRIDE HEO Service Segment | Parampuneet Kaur Thind, Charles Mwangi, Giovanni Varetto, Lorenzo Sarti, Andrea Papa, Andrea Taramelli | 2026-04-08 | Current operational Earth Observation (EO) services, including the Copernicus Emergency Management Service (CEMS), the European Forest Fire Information System (EFFIS), and the Copernicus Land Monitori... |
| TRAPTI: Time-Resolved Analysis for SRAM Banking and Power Gating Optimization in Embedded Transformer Inference | Jan Klhufek, Alberto Marchisio, Vojtech Mrazek, Lukas Sekanina, Muhammad Shafique | 2026-04-08 | Transformer neural networks achieve state-of-the-art accuracy across language and vision tasks, but their deployment on embedded hardware is hindered by stringent area, latency, and energy constraints... |
| CBM-Dual: A 65-nm Fully Connected Chaotic Boltzmann Machine Processor for Dual Function Simulated Annealing and Reservoir Computing | Kanta Yoshioka, Soshi Hirayae, Yuichiro Tanaka, Yuichi Katori, Takashi Morie, Hakaru Tamukoh | 2026-04-08 | This paper presents CBM-Dual, the first silicon-proven digital chaotic dynamics processor (CDP) supporting both simulated annealing (SA) and reservoir computing (RC). |
| SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs | Jintao Zhang, Xuanyao Fong | 2026-04-08 | Large Language Model (LLM) inference on edge Neural Processing Units (NPUs) is fundamentally constrained by limited on-chip memory capacity. Although high-density embedded DRAM (eDRAM) is attractive f... |
| SwarmIO: Towards 100 Million IOPS SSD Emulation for Next-generation GPU-centric Storage Systems | Hyeseong Kim, Gwangoo Yeo, Minsoo Rhu | 2026-04-08 | GPU-initiated I/O has emerged as a key mechanism for achieving high-throughput storage access by leveraging massive GPU thread-level parallelism, while recent industry trends point toward SSDs optimiz... |
| Self-Calibrating LLM-Based Analog Circuit Sizing with Interpretable Design Equations | Antonio J. Bujana, Aydin I. Karsilayan | 2026-04-08 | We present a self-calibrating framework for analog circuit sizing in which a large language model (LLM) derives topology-specific analytical design equations directly from a raw circuit netlist. |
| CoverAssert: Iterative LLM Assertion Generation Driven by Functional Coverage via Syntax-Semantic Representations | Yonghao Wang, Yang Yin, Hongqin Lyu, Jiaxin Zhou, Zhiteng Chao, Mingyu Shi, Wenchao Ding, Yunlin Du, Jing Ye, Tiancheng Wang, Huawei Li | 2026-04-08 | LLMs can generate SystemVerilog assertions (SVAs) from natural language specs, but single-pass outputs often lack functional coverage due to limited IC design understanding. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC | Mohammad Siavashi, Mariano Scazzariello, Gerald Q. Maguire, Dejan Kostić, Marco Chiesa | 2026-04-08 | Large Language Model (LLM) inference is rapidly becoming a core datacenter service, yet current serving stacks keep the host CPU on the critical path for orchestration and token-level control. |
| MEV-ACE: Identity-Authenticated Fair Ordering for Proposer-Controlled MEV Mitigation | Jian Sheng Wang | 2026-04-08 | Maximal Extractable Value, or MEV, remains a structural threat to blockchain fairness because a block producer can often observe pending transactions and unilaterally decide their ordering or inclusio... |
| Parallel Batch-Dynamic Maximal Independent Set | Guy Blelloch, Andrew Brady, Laxman Dhulipala, Jeremy Fineman, Jared Lo | 2026-04-08 | We develop the first theoretically-efficient algorithm for maintaining the maximal independent set (MIS) of a graph in the parallel batch-dynamic setting. |
| Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning | Roberto Vercellino, Jared Willard, Gustavo Campos, Weslley da Silva Pereira, Olivia Hull, Matthew Selensky, Juliane Mueller | 2026-04-08 | The rapid growth of generative artificial intelligence (AI) has introduced unprecedented computational demands, driving significant increases in the energy footprint of data centers. |
| Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS | Luca Pennati, Andong Hu, Ivy Peng, Lukas Müllender, Stefano Markidis | 2026-04-08 | GROMACS is a de-facto standard for classical Molecular Dynamics (MD). The rise of AI-driven interatomic potentials that pursue near-quantum accuracy at MD throughput now poses a significant challenge:... |
| InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models | Hongyu Chen, Letian Ruan, Zilin Xu, Yuchen Li, Xinyu Chen, Jingwen Leng, Bingsheng He, Minyi Guo, Shixuan Sun | 2026-04-08 | LoRA enables efficient customization of LLMs and is widely used in multi-tenant and multi-task serving. However, emerging model architectures such as MoE significantly increase LoRA memory cost, makin... |
| Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics | Youhe Jiang, Ran Yan, You Peng, Wenshuang Li, Taiyi Wang, Fangcheng Fu, Binhang Yuan | 2026-04-08 | Modern Large Language Model (LLM) serving operates in highly volatile environments characterized by severe runtime dynamics, such as workload fluctuations and elastic cluster autoscaling. |
| Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale | Renzhong Yuan, Yijun Zeng, Xiaosong Gao, Linxi Yu, Haochun Liao, Han Wang | 2026-04-08 | When output token counts can be predicted at submission time (Gan et al., 2026), client-side scheduling against a black-box LLM API becomes semi-clairvoyant: decisions condition on coarse token priors... |
| NestPipe: Large-Scale Recommendation Training on 1,500+ Accelerators via Nested Pipelining | Zhida Jiang, Zhaolong Xing, Huichao Chai, Tianxing Sun, Qiang Peng, Baopeng Yuan, Jiaxing Wang, Hua Du, Zhixin Wu, Xuemiao Li, Yikui Cao, Xinyu Liu, Yongxiang Feng, Zhen Chen, Ke Zhang | 2026-04-08 | Modern recommendation models have increased to trillions of parameters. As cluster scales expand to O(1k), distributed training bottlenecks shift from computation and memory to data movement, especial... |
| On the Decidability of Distributed Tasks with Output Sets under Asynchrony and Any Number of Crashes | Timothé Albouy, Antonio Fernández Anta, Chryssis Georgiou, Nicolas Nicolaou, Junlang Wang | 2026-04-08 | In this paper, we define a new class of distributed tasks, called SOS tasks (for Set of Output Sets tasks), defined by the set of distinct output sets of values that can be produced. |
| Determinacy with Priorities up to Clocks | Luigi Liquori, Michael Mendler, Claude Stolze | 2026-04-08 | In Milner's seminal book on communication and concurrency introducing CCS, a process algebra inherently non-deterministic, chapter 11 was completely devoted to introduce the notion of determinacy and ... |
| Exploiting Aggregate Programming in a Multi-Robot Service Prototype | Giorgio Audrito, Andrea Basso, Daniele Bortoluzzi, Ferruccio Damiani, Giordano Scarso, Gianluca Torta | 2026-04-08 | Multi-robot systems are becoming increasingly relevant within diverse application domains, such as healthcare, exploration, and rescue missions. |
| Branching Out: Existential External Choice in Effpi | Benjamin Robinson, Nobuko Yoshida | 2026-04-08 | Effpi is a framework for writing strongly-typed message-passing programs in Scala, where the compiler enforces the conformance of process implementations to specified protocol types. |
| Beyond End-to-End: Dynamic Chain Optimization for Private LLM Adaptation on the Edge | Yebo Wu, Jingguang Li, Chunlin Tian, Kahou Tam, Zhijiang Guo, Li Li | 2026-04-08 | Federated fine-tuning enables privacy-preserving LLM adaptation but faces a critical bottleneck: the disparity between LLMs' high memory demands and edge devices' limited capacity. |
| Nexus: Transparent I/O Offloading for High-Density Serverless Computing | JooYoung Park, Kevin Nguetchouang, Jovan Stojkovic, Likun Zhang, Riccardo Mancini, Marco Cali, Dmitrii Ustiugov | 2026-04-08 | Serverless computing relies on extreme multi-tenancy to remain economically viable, driving providers to rely on virtual machines (VMs) that ensure strong isolation and seamless ecosystem compatibilit... |
| SwarmIO: Towards 100 Million IOPS SSD Emulation for Next-generation GPU-centric Storage Systems | Hyeseong Kim, Gwangoo Yeo, Minsoo Rhu | 2026-04-08 | GPU-initiated I/O has emerged as a key mechanism for achieving high-throughput storage access by leveraging massive GPU thread-level parallelism, while recent industry trends point toward SSDs optimiz... |
| Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start | Xueshen Liu, Yongji Wu, Yuncheng Yao, Danyang Zhuo, Ion Stoica, Z. Morley Mao | 2026-04-08 | Modern LLM service providers increasingly rely on autoscaling and parallelism reconfiguration to respond to rapidly changing workloads, but cold-start latency remains a major bottleneck. |
| Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication | Matthew Qian, Yahia Ramadan, Suhita Anubha, Ariful Azad | 2026-04-08 | Sparse matrix-dense matrix multiplication (SpMM) is a critical kernel in scientific computing, graph analytics, and machine learning, whose performance is often constrained by memory bandwidth. |
| DynLP: Parallel Dynamic Batch Update for Label Propagation in Semi-Supervised Learning | S M Shovan, Arindam Khanda, S M Ferdous, Sajal K. Das, Mahantesh Halappanavar | 2026-04-08 | Semi-supervised learning aims to infer class labels using only a small fraction of labeled data. In graph-based semi-supervised learning, this is typically achieved through label propagation to predic... |
| Contextual Chain: Single-State Ledger Design for Mobile/IoT Networks with Frequent Partitions | Song-Ju Kim | 2026-04-08 | We study a lightweight ledger protocol for intermittent and noisy networks, motivated by IoT and mobile settings in which partitions are common and full-history verification is impractical. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| SAFE: Spatially-Aware Feedback Enhancement for Fault-Tolerant Trust Management in VANETs | İpek Abasıkeleş Turgut | 2026-04-08 | Trust management in VANETs is critically important for secure communication between vehicles. In event-based trust systems, vehicles broadcast the events they witness to their surroundings and send fe... |
| RL-ASL: A Dynamic Listening Optimization for TSCH Networks Using Reinforcement Learning | F. Fernando Jurado-Lasso, J. F. Jurado | 2026-04-08 | Time Slotted Channel Hopping (TSCH) is a widely adopted Media Access Control (MAC) protocol within the IEEE 802.15.4e standard, designed to provide reliable and energy-efficient communication in Indus... |
| IPEK: Intelligent Priority-Aware Event-Based Trust with Asymmetric Knowledge for Resilient Vehicular Ad-Hoc Networks | İpek Abasıkeleş Turgut | 2026-04-08 | Vehicular Ad Hoc Networks (VANETs) are vulnerable to intelligent attackers who exploit the homogeneous treatment of traffic events in existing trust models. |
| Fast Heterogeneous Serving: Scalable Mixed-Scale LLM Allocation for SLO-Constrained Inference | Jiaming Cheng, Duong Tung Nguyen | 2026-04-08 | Deploying large language model (LLM) inference at scale requires jointly selecting base models, provisioning heterogeneous GPUs, configuring parallelism, and distributing workloads under tight latency... |
| Multiprotocol Wireless Timer Synchronization for IoT Systems | Ziyao Zhou, Tiancheng Cao, Chen Shen, Jiaqi Zhang, Yuting Liu, Hen-Wei Huang | 2026-04-08 | Accurate time synchronization is essential for Internet of Things (IoT) systems, where multiple distributed nodes must share a common time base for coordinated sensing and data fusion. |
| Aerial Booster-Cell Enabled Inter-Cell Interference Coordination for 5G NR Networks | Md Sharif Hossen, Vijay K. Shah, Ismail Guvenc | 2026-04-08 | Cellular-connected unmanned aerial vehicles (UAVs) operating in 5G New Radio (NR) macro networks experience severe and spatially non-uniform downlink interference. |
| Enhancing Secure Intent-Based Networking with an Agentic AI: The EU Project MARE Approach | Iulisloi Zacarias, Marla Grunewald, Fin Gentzen, Xavi Masip-Bruin, Admela Jukan | 2026-04-08 | In the EU project MARE, a novel plane was proposed and used in combination with intent-based networking (IBN), allowing the operator to focus on what, rather than on how. |
| Towards National Quantum Communication in Europe: Planning and Sizing Terrestrial QKD Networks | Sebastian Raubitzek, Werner Strasser, Sebastian Ramacher, Thomas Lebeth, Andreas Neuhold, Christoph Pacher | 2026-04-08 | The European Union is developing the European Quantum Communication Infrastructure (EuroQCI) as a pan-European network to provide secure communication capabilities across Member States, including gove... |
| Contextual Chain: Single-State Ledger Design for Mobile/IoT Networks with Frequent Partitions | Song-Ju Kim | 2026-04-08 | We study a lightweight ledger protocol for intermittent and noisy networks, motivated by IoT and mobile settings in which partitions are common and full-history verification is impractical. |
cs.OS - Operating Systems
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC | Mohammad Siavashi, Mariano Scazzariello, Gerald Q. Maguire, Dejan Kostić, Marco Chiesa | 2026-04-08 | Large Language Model (LLM) inference is rapidly becoming a core datacenter service, yet current serving stacks keep the host CPU on the critical path for orchestration and token-level control. |
| Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale | Renzhong Yuan, Yijun Zeng, Xiaosong Gao, Linxi Yu, Haochun Liao, Han Wang | 2026-04-08 | When output token counts can be predicted at submission time (Gan et al., 2026), client-side scheduling against a black-box LLM API becomes semi-clairvoyant: decisions condition on coarse token priors... |
| Nexus: Transparent I/O Offloading for High-Density Serverless Computing | JooYoung Park, Kevin Nguetchouang, Jovan Stojkovic, Likun Zhang, Riccardo Mancini, Marco Cali, Dmitrii Ustiugov | 2026-04-08 | Serverless computing relies on extreme multi-tenancy to remain economically viable, driving providers to rely on virtual machines (VMs) that ensure strong isolation and seamless ecosystem compatibilit... |
cs.PF - Performance
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC | Mohammad Siavashi, Mariano Scazzariello, Gerald Q. Maguire, Dejan Kostić, Marco Chiesa | 2026-04-08 | Large Language Model (LLM) inference is rapidly becoming a core datacenter service, yet current serving stacks keep the host CPU on the critical path for orchestration and token-level control. |
| Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale | Renzhong Yuan, Yijun Zeng, Xiaosong Gao, Linxi Yu, Haochun Liao, Han Wang | 2026-04-08 | When output token counts can be predicted at submission time (Gan et al., 2026), client-side scheduling against a black-box LLM API becomes semi-clairvoyant: decisions condition on coarse token priors... |