
2026-04-03

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Fast Cross-Operator Optimization of Attention Dataflow | Haodong Chang, Hailiang Hu, Zhenrui Wang, Yu Gong, Rongjian Liang, Zhexiang Tang, Bo Yuan, Jiang Hu | 2026-04-03 | Download | Attention is a fundamental computational kernel that accounts for the majority of the workload in transformer and LLM computing. Optimizing dataflow is crucial for enhancing both performance and energ... |
| YANA: Bridging the Neuromorphic Simulation-to-Hardware Gap | Brian Pachideh, Sven Nitzsche, Moritz Neher, Jann Krausse, Carmen Weigelt, Klaus Knobloch, Victor Pazmino Betancourt, Juergen Becker | 2026-04-03 | Download | Spiking Neural Networks (SNNs) promise significant advantages over conventional Artificial Neural Networks (ANNs) for applications requiring real-time processing of temporally sparse data streams unde... |
| InCoder-32B-Thinking: Industrial Code World Model for Thinking | Jian Yang, Wei Zhang, Jiajun Wu, Junhang Cheng, Tuney Zheng, Fanglin Xu, Weicheng Gu, Lin Jing, Yaxin Du, Joseph Li, Yizhi Li, Yan Xing, Chuan Hao, Ran Tao, Ruihao Gong, Aishan Liu, Zhoujun Li, Mingjie Tang, Chenghua Lin, Siheng Chen, Wayne Xin Zhao, Xianglong Liu, Ming Zhou, Bryan Dai, Weifeng Lv | 2026-04-03 | Download | Industrial software development across chip design, GPU optimization, and embedded systems lacks expert reasoning traces showing how engineers reason about hardware constraints and timing semantics. |
| EEspice: A Modular Circuit Simulation Platform with Parallel Device Model Evaluation via Graph Coloring | Xuanhao Bao, Danial Chitnis | 2026-04-03 | Download | As modern analogue/mixed-signal design increasingly relies on optimization-in-the-loop flows, such as AI and LLM-based sizing agents that repeatedly invoke SPICE-efficient, accurate high-performance s... |
| ChatSVA: Bridging SVA Generation for Hardware Verification via Task-Specific LLMs | Lik Tung Fu, Jie Zhou, Shaokai Ren, Mengli Zhang, Jia Xiong, Hugo Jiang, Nan Guan, Xi Wang, Jun Yang | 2026-04-03 | Download | Functional verification consumes over 50% of the IC development lifecycle, where SystemVerilog Assertions (SVAs) are indispensable for formal property verification and enhanced simulation-based debugg... |
| AXELRAM: Quantize Once, Never Dequantize | Yasushi Nishida | 2026-04-03 | Download | We propose AXELRAM, a smart SRAM macro architecture that computes attention scores directly from quantized KV cache indices without dequantization. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Hybrid Quantum-HPC Middleware Systems for Adaptive Resource, Workload and Task Management | Pradeep Mantha, Florian J. Kiwit, Nishant Saurabh, Shantenu Jha, Andre Luckow | 2026-04-03 | Download | Hybrid quantum-classical applications pose significant resource management challenges due to heterogeneity and dynamism in both infrastructure and workloads. |
| AEGIS: Scaling Long-Sequence Homomorphic Encrypted Transformer Inference via Hybrid Parallelism on Multi-GPU Systems | Zhaoting Gong, Ran Ran, Fan Yao, Wujie Wen | 2026-04-03 | Download | Fully Homomorphic Encryption (FHE) enables privacy-preserving Transformer inference, but long-sequence encrypted Transformers quickly exceed single-GPU memory capacity because encoded weights are alre... |
| Causal Inference for Quantifying Noisy Neighbor Effects in Multi-Tenant Cloud Environments | Philipe S. Schiavo, João P. S. Milanezi, Moisés R. N. Ribeiro, Víctor M. G. Martínez, João Henrique Corrêa, José Marcos Nogueira, Fernando Frota Redigolo, Tereza C. Carvalho, Flávio de Oliveira Silva | 2026-04-03 | Download | Resource sharing in multi-tenant cloud environments enables cost efficiency but introduces the Noisy Neighbor problem, i.e., co-located workloads that unpredictably degrade each other's performance. |
| TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing | Zhuohang Bian, Feiyang Wu, Chengrui Zhang, Hangcheng Dong, Yun Liang, Youwei Zhuo | 2026-04-03 | Download | Multi-agent LLM applications organize execution in synchronized rounds where a central scheduler gathers outputs from all agents and redistributes the combined context. |
| HistMSO: A Logic for Reasoning about Consistency Models with MONA | Isabelle Coget, Étienne Lozes | 2026-04-03 | Download | Reasoning about consistency models for replicated data systems is a challenging task that requires a deep understanding of both the consistency models themselves and a large part of human inputs in me... |
| CIDER: Boosting Memory-Disaggregated Key-Value Stores with Pessimistic Synchronization | Yuxuan Du, Xuchuan Luo, Xin Wang, Yangfan Zhou, Jiacheng Shen | 2026-04-03 | Download | Memory-disaggregated key-value (KV) stores suffer from a severe performance bottleneck due to their I/O redundancy issues. A huge amount of redundant I/Os are generated when synchronizing concurrent d... |
| FedSQ: Optimized Weight Averaging via Fixed Gating | Cristian Pérez-Corral, Jose I. Mestre, Alberto Fernández-Hernández, Manuel F. Dolz, José Duato, Enrique S. Quintana-Ortí | 2026-04-03 | Download | Federated learning (FL) enables collaborative training across organizations without sharing raw data, but it is hindered by statistical heterogeneity (non-i.i.d. |
| MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference | Zheming Yang, Qi Guo, Jun Wan, Jiarui Ruan, Yunqing Hu, Chang Zhao, Xiangyang Li | 2026-04-03 | Download | Multimodal large language models (MLLMs) enable powerful cross-modal reasoning capabilities but impose substantial computational and latency burdens, posing critical challenges for deployment on resou... |
| Digital Twin-Assisted In-Network and Edge Collaboration for Joint User Association, Task Offloading, and Resource Allocation in the Metaverse | Ibrahim Aliyu, Seungmin Oh, Sangwon Oh, Jinsul Kim | 2026-04-03 | Download | Advancements in extended reality (XR) are driving the development of the metaverse, which demands efficient real-time transformation of 2D scenes into 3D objects, a computation-intensive process that ... |
| Scalable Mean-Variance Portfolio Optimization via Subspace Embeddings and GPU-Friendly Nesterov-Accelerated Projected Gradient | Yi-Shuai Niu, Yajuan Wang | 2026-04-03 | Download | We develop a sketch-based factor reduction and a Nesterov-accelerated projected gradient algorithm (NPGA) with GPU acceleration, yielding a doubly accelerated solver for large-scale constrained mean-v... |
| Accelerating Nonlinear Time-History Analysis with Complex Constitutive Laws via Heterogeneous Memory Management: From 3D Seismic Simulation to Neural Network Training | Tsuyoshi Ichimura, Kohei Fujita, Hideaki Ito, Muneo Hori, Lalith Maddegedara | 2026-04-03 | Download | Nonlinear time-history evolution problems employing high-fidelity physical models are essential in numerous scientific domains. However, these problems face a critical dual bottleneck: the immense com... |
| Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training | Cunyang Wei, Siddharth Singh, Aishwarya Sarkar, Daniel Nichols, Tisha Patel, Aditya K. Ranjan, Sayan Ghosh, Ali Jannesari, Nathan R. Tallent, Abhinav Bhatele | 2026-04-03 | Download | Graph neural networks (GNNs) are widely used for learning on graph datasets derived from various real-world scenarios. Learning from extremely large graphs requires distributed training, and mini-batc... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Scaling Multi-agent Systems: A Smart Middleware for Improving Agent Interactions | Charles Fleming, Ramana Kompella, Peter Bosch, Vijoy Pandey | 2026-04-03 | Download | As Large Language Model (LLM) based Multi-Agent Systems (MAS) evolve from experimental pilots to complex, persistent ecosystems, the limitations of direct agent-to-agent communication have become incr... |
| Causal Inference for Quantifying Noisy Neighbor Effects in Multi-Tenant Cloud Environments | Philipe S. Schiavo, João P. S. Milanezi, Moisés R. N. Ribeiro, Víctor M. G. Martínez, João Henrique Corrêa, José Marcos Nogueira, Fernando Frota Redigolo, Tereza C. Carvalho, Flávio de Oliveira Silva | 2026-04-03 | Download | Resource sharing in multi-tenant cloud environments enables cost efficiency but introduces the Noisy Neighbor problem, i.e., co-located workloads that unpredictably degrade each other's performance. |
| Towards Near-Real-Time Telemetry-Aware Routing with Neural Routing Algorithms | Andreas Boltres, Niklas Freymuth, Benjamin Schichtholz, Michael König, Gerhard Neumann | 2026-04-03 | Download | Routing algorithms are crucial for efficient computer network operations, and in many settings they must be able to react to traffic bursts within milliseconds. |
| Open Challenges for Secure and Scalable Wi-Fi Connectivity in Rural Areas | Philip Virgil Berrer Astillo, Jayasree Sengupta, Mathy Vanhoef | 2026-04-03 | Download | Providing reliable, affordable, and secure Internet connectivity in rural areas remains a major challenge. Pay-for-use Wi-Fi hotspots are emerging as a scalable solution to provide affordable Internet... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| The Price of Interoperability: Exploring Cross-Chain Bridges and Their Economic Consequences | Yiyue Cao, Mingzhe Zheng, Lin William Cong, Siguang Li, Xuechao Wang | 2026-04-03 | Download | Modern blockchain ecosystems comprise many heterogeneous networks, creating a growing need for interoperability. Cross-chain bridges provide the core infrastructure for this interoperability by enabli... |
