2025-11-11

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
3D Guard-Layer: An Integrated Agentic AI Safety System for Edge Artificial Intelligence	Eren Kurshan, Yuan Xie, Paul Franzon	2025-11-11	Download	AI systems have found a wide range of real-world applications in recent years. The adoption of edge artificial intelligence, embedding AI directly into edge devices, is rapidly growing.
CO2-Meter: A Comprehensive Carbon Footprint Estimator for LLMs on Edge Devices	Zhenxiao Fu, Chen Fan, Lei Jiang	2025-11-11	Download	LLMs have transformed NLP, yet deploying them on edge devices poses great carbon challenges. Prior estimators remain incomplete, neglecting peripheral energy use, distinct prefill/decode behaviors, an...
DRACO: Co-design for DSP-Efficient Rigid Body Dynamics Accelerator	Xingyu Liu, Jiawei Liang, Yipu Zhang, Linfeng Du, Chaofang Ma, Hui Yu, Jiang Xu, Wei Zhang	2025-11-11	Download	We propose a hardware-efficient RBD accelerator based on FPGA, introducing three key innovations. First, we propose a precision-aware quantization framework that reduces DSP demand while preserving mo...
BDD2Seq: Enabling Scalable Reversible-Circuit Synthesis via Graph-to-Sequence Learning	Mingkai Miao, Jianheng Tang, Guangyu Hu, Hongce Zhang	2025-11-11	Download	Binary Decision Diagrams (BDDs) are instrumental in many electronic design automation (EDA) tasks thanks to their compact representation of Boolean functions.
UniFormer: Unified and Efficient Transformer for Reasoning Across General and Custom Computing	Zhuoheng Ran, Chong Wu, Renjie Xu, Maolin Che, Hong Yan	2025-11-11	Download	The success of neural networks such as convolutional neural networks (CNNs) has been largely attributed to their effective and widespread deployment on customised computing platforms, including field-...
WebAssembly on Resource-Constrained IoT Devices: Performance, Efficiency, and Portability	Mislav Has, Tao Xiong, Fehmi Ben Abdesslem, Mario Kušek	2025-11-11	Download	The increasing heterogeneity of hardware and software in the Internet of Things (IoT) poses a major challenge for the portability, maintainability and deployment of software on devices with limited re...
Re$^{\text{2}}$MaP: Macro Placement by Recursively Prototyping and Packing Tree-based Relocating	Yunqi Shi, Xi Lin, Zhiang Wang, Siyuan Xu, Shixiong Kai, Yao Lai, Chengrui Gao, Ke Xue, Mingxuan Yuan, Chao Qian, Zhi-Hua Zhou	2025-11-11	Download	This work introduces the Re$^{\text{2}}$MaP method, which generates expert-quality macro placements through recursively prototyping and packing tree-based relocating.
PIMfused: Near-Bank DRAM-PIM with Fused-layer Dataflow for CNN Data Transfer Optimization	Simei Yang, Xinyu Shi, Lu Zhao, Yunyu Ling, Quanjun Wang, Francky Catthoor	2025-11-11	Download	Near-bank Processing-in-Memory (PIM) architectures integrate processing cores (PIMcores) close to DRAM banks to mitigate the high cost of off-chip memory accesses.
Streaming Tensor Programs: A Streaming Abstraction for Dynamic Parallelism	Gina Sohn, Genghan Zhang, Konstantin Hossfeld, Jungwoo Kim, Nathan Sobotka, Nathan Zhang, Olivia Hsu, Kunle Olukotun	2025-11-11	Download	Dynamic behaviors are becoming prevalent in tensor applications, like machine learning, where many widely used models contain data-dependent tensor shapes and control flow.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
OSGym: Scalable OS Infra for Computer Use Agents	Zengyi Qin, Jinyuan Chen, Yunze Man, Shengcao Cao, Ziqi Pang, Zhuoyuan Wang, Han Fang, Ling Zhu, Zixin Xie, Zibu Wei, Tianshu Ran, Haoran Geng, Ray Pan, Qizhen Sun, Zachary Bright, Yuyang Cai, Chongye Yang, Jiace Zhao, Tianrui Liu, Han Cao, Yeyang Zhou, Rui Wang, Song Wang, Xiang Ren, Bo Zhang, Yutong Ban, Pieter Abbeel, Brian Anthony	2025-11-11	Download	Training computer use agents requires full-featured OS sandboxes with GUI environments, which consume substantial hardware resources as the number of sandboxes scales.
An MLIR pipeline for offloading Fortran to FPGAs via OpenMP	Gabriel Rodriguez-Canal, David Katz, Nick Brown	2025-11-11	Download	With the slowing of Moore's Law, heterogeneous computing platforms such as Field Programmable Gate Arrays (FPGAs) have gained increasing interest for accelerating HPC workloads.
Priority Matters: Optimising Kubernetes Clusters Usage with Constraint-Based Pod Packing	Henrik Daniel Christensen, Saverio Giallorenzo, Jacopo Mauro	2025-11-11	Download	Distributed applications employ Kubernetes for scalable, fault-tolerant deployments over computer clusters, where application components run in groups of containers called pods.
Gathering in Vertex- and Edge-Transitive Graphs without Multiplicity Detection under Round Robin	Serafino Cicerone, Alessia Di Fonso, Gabriele Di Stefano, Alfredo Navarra	2025-11-11	Download	In the field of swarm robotics, one of the most studied problems is Gathering. It asks for a distributed algorithm that brings the robots to a common location, not known in advance.
Forgetting Alternation and Blossoms: A New Framework for Fast Matching Augmentation and Its Applications to Sequential/Distributed/Streaming Computation	Taisuke Izumi, Naoki Kitamura, Yutaro Yamaguchi	2025-11-11	Download	Finding a maximum cardinality matching in a graph is one of the most fundamental problems. An algorithm proposed by Micali and Vazirani (1980) is well-known to solve the problem in $O(m\sqrt{n})$ time...
Range Asymmetric Numeral Systems-Based Lightweight Intermediate Feature Compression for Split Computing of Deep Neural Networks	Mingyu Sung, Suhwan Im, Vikas Palakonda, Jae-Mo Kang	2025-11-11	Download	Split computing distributes deep neural network inference between resource-constrained edge devices and cloud servers but faces significant communication bottlenecks when transmitting intermediate fea...
LOw-cOst yet High-Performant Sparse Matrix-Matrix Multiplication on Arm SME Architectures	Kelun Lei, Hailong Yang, Kaige Zhang, Kejie Ma, Yiqing Wang, Xin You, Yufan Xu, Enrique S. Quintana-Orti, Zhongzhi Luan, Yi Liu, Depei Qian	2025-11-11	Download	Sparse matrix-dense matrix multiplication (SpMM) is a critical kernel in both scientific computing and emerging graph learning workloads. The recent Armv9 architecture introduces Scalable Matrix Exten...
ProbSelect: Stochastic Client Selection for GPU-Accelerated Compute Devices in the 3D Continuum	Andrija Stanisic, Stefan Nastic	2025-11-11	Download	Integration of edge, cloud and space devices into a unified 3D continuum imposes significant challenges for client selection in federated learning systems.
BIPPO: Budget-Aware Independent PPO for Energy-Efficient Federated Learning Services	Anna Lackinger, Andrea Morichetta, Pantelis A. Frangoudis, Schahram Dustdar	2025-11-11	Download	Federated Learning (FL) is a promising machine learning solution in large-scale IoT systems, guaranteeing load distribution and privacy. However, FL does not natively consider infrastructure efficienc...
UniFormer: Unified and Efficient Transformer for Reasoning Across General and Custom Computing	Zhuoheng Ran, Chong Wu, Renjie Xu, Maolin Che, Hong Yan	2025-11-11	Download	The success of neural networks such as convolutional neural networks (CNNs) has been largely attributed to their effective and widespread deployment on customised computing platforms, including field-...
Foam Segmentation in Wastewater Treatment Plants: A Federated Learning Approach with Segment Anything Model 2	Mehmet Batuhan Duman, Alejandro Carnero, Cristian Martín, Daniel Garrido, Manuel Díaz	2025-11-11	Download	Foam formation in Wastewater Treatment Plants (WTPs) is a major challenge that can reduce treatment efficiency and increase costs. The ability to automatically examine changes in real-time with respec...
Generic Algorithm for Universal TDM Communication Over Inter Satellite Links	Miroslav Popovic, Marko Popovic, Pavle Vasiljevic, Ilija Basicevic	2025-11-11	Download	The original Python Testbed for Federated Learning Algorithms is a light FL framework, which provides the three generic algorithms: the centralized federated learning, the decentralized federated lear...
ACGraph: An Efficient Asynchronous Out-of-Core Graph Processing Framework	Dechuang Chen, Sibo Wang, Qintian Guo	2025-11-11	Download	Graphs are a ubiquitous data structure in diverse domains such as machine learning, social networks, and data mining. As real-world graphs continue to grow beyond the memory capacity of single machine...
Intelligence per Watt: Measuring Intelligence Efficiency of Local AI	Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya, Adrian Gamarra Lafuente, Medhya Goel, Rebecca Joseph, Shlok Natarajan, Etash Kumar Guha, Shang Zhu, Ben Athiwaratkun, John Hennessy, Azalia Mirhoseini, Christopher Ré	2025-11-11	Download	Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Rapidly growing demand strains this paradigm, and cloud providers struggle to sca...
Parallel Sampling via Autospeculation	Nima Anari, Carlo Baronio, CJ Chen, Alireza Haqi, Frederic Koehler, Anqi Li, Thuy-Duong Vuong	2025-11-11	Download	We present parallel algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models.
HeteroSTA: A CPU-GPU Heterogeneous Static Timing Analysis Engine with Holistic Industrial Design Support	Zizheng Guo, Haichuan Liu, Xizhe Shi, Shenglu Hua, Zuodong Zhang, Chunyuan Zhao, Runsheng Wang, Yibo Lin	2025-11-11	Download	In this paper, we introduce HeteroSTA, the first CPU-GPU heterogeneous timing analysis engine that efficiently supports: (1) a set of delay calculation models providing versatile accuracy-speed choice...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Universal Connection Schedules for Reconfigurable Networking	Shaleen Baral, Robert Kleinberg, Sylvan Martin, Henry Rogers, Tegan Wilson, Ruogu Zhang	2025-11-11	Download	Reconfigurable networks are a novel communication paradigm in which the pattern of connectivity between hosts varies rapidly over time. Prior theoretical work explored the inherent tradeoffs between t...
Vision Transformer Based User Equipment Positioning	Parshwa Shah, Dhaval K. Patel, Brijesh Soni, Miguel López-Benítez, Siddhartan Govindasamy	2025-11-11	Download	Recently, Deep Learning (DL) techniques have been used for User Equipment (UE) positioning. However, the key shortcomings of such models are that: i) they weigh the same attention to the entire input; ...
Toward Autonomous and Efficient Cybersecurity: A Multi-Objective AutoML-based Intrusion Detection System	Li Yang, Abdallah Shami	2025-11-11	Download	With increasingly sophisticated cybersecurity threats and rising demand for network automation, autonomous cybersecurity mechanisms are becoming critical for securing modern networks.
Adaptive Reallocation of RAN Functions for Resilient 6G Networks	Gabriel M. Almeida, Jacek Kibiłda, Joao F. Santos, Kleber Vieira Cardoso	2025-11-11	Download	The disaggregation of base stations into discrete RAN functions introduces new threats to mobile networks, as failures in one RAN function can trigger cascading failures and disrupt the entire functio...
Fault Tolerant Reconfigurable ML Multiprocessor	Tangrui Li, Justin Y. Shi, Matteo Spatola, Hongzheng Wang	2025-11-11	Download	This paper reports three computational experiments for a von Neumann inspired reconfigurable fault tolerant multiprocessor for neural network (NN) training workflows.
Demystifying QUIC from the Specifications	Darius Saif, Ashraf Matrawy	2025-11-11	Download	QUIC is an advanced transport layer protocol whose ubiquity on the Internet is now very apparent. Importantly, QUIC fuels the next generation of web browsing: HTTP/3.
A CODECO Case Study and Initial Validation for Edge Orchestration of Autonomous Mobile Robots	H. Zhu, T. Samizadeh, R. C. Sofia	2025-11-11	Download	Autonomous Mobile Robots (AMRs) increasingly adopt containerized micro-services across the Edge-Cloud continuum. While Kubernetes is the de-facto orchestrator for such systems, its assumptions of stab...
Revisiting Network Traffic Analysis: Compatible network flows for ML models	João Vitorino, Daniela Pinto, Eva Maia, Ivone Amorim, Isabel Praça	2025-11-11	Download	To ensure that Machine Learning (ML) models can perform a robust detection and classification of cyberattacks, it is essential to train them with high-quality datasets with relevant features.
SRE-Llama -- Fine-Tuned Meta's Llama LLM, Federated Learning, Blockchain and NFT Enabled Site Reliability Engineering (SRE) Platform for Communication and Networking Software Services	Eranga Bandara, Safdar H. Bouk, Sachin Shetty, Ravi Mukkamala, Abdul Rahman, Peter Foytik, Ross Gore, Xueping Liang, Ng Wee Keong, Kasun De Zoysa	2025-11-11	Download	Software services are crucial for reliable communication and networking; therefore, Site Reliability Engineering (SRE) is important to ensure these systems stay reliable and perform well in cloud-nati...
Argo: An efficient verification framework for distributed in-network computing	Mingyuan Song, Huan Shen, Jinghui Jiang, Qiang Su, Qingyu Song, Lu Tang, Wanjian Feng, Fei Yuan, Qiao Xiang, Jiwu Shu	2025-11-11	Download	Distributed in-network programs are increasingly deployed in data centers for their performance benefits, but shifting application logic to switches also enlarges the failure domain.
SMoRFFI: A Large-Scale Same-Model 2.4 GHz Wi-Fi Dataset and Reproducible Framework for RF Fingerprinting	Zewei Guo, Zhen Jia, JinXiao Zhu, Wenhao Huang, Yin Chen	2025-11-11	Download	Radio frequency (RF) fingerprinting exploits hardware imperfections for device identification, but distinguishing between same-model devices remains challenging due to their minimal hardware variation...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Work-in-Progress: Function-as-Subtask API Replacing Publish/Subscribe for OS-Native DAG Scheduling	Takahiro Ishikawa-Aso, Atsushi Yano, Yutaro Kobayashi, Takumi Jin, Yuuki Takano, Shinpei Kato	2025-11-11	Download	The Directed Acyclic Graph (DAG) task model for real-time scheduling finds its primary practical target in Robot Operating System 2 (ROS 2). However, ROS 2's publish/subscribe API leaves DAG precedenc...
WebAssembly on Resource-Constrained IoT Devices: Performance, Efficiency, and Portability	Mislav Has, Tao Xiong, Fehmi Ben Abdesslem, Mario Kušek	2025-11-11	Download	The increasing heterogeneity of hardware and software in the Internet of Things (IoT) poses a major challenge for the portability, maintainability and deployment of software on devices with limited re...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
PANDA: Noise-Resilient Antagonist Identification in Production Datacenters	Sixiang Zhou, Nan Deng, Krzysiek Rzadca, Xiaojun Lin, Y. Charlie Hu	2025-11-11	Download	Modern warehouse-scale datacenters commonly collocate multiple jobs on shared machines to improve resource utilization. However, such collocation often leads to performance interference caused by anta...
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models	Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang	2025-11-11	Download	Improving reasoning capabilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications.
Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory	Jie Ren, Bin Ma, Shuangyan Yang, Benjamin Francis, Ehsan K. Ardestani, Min Si, Dong Li	2025-11-11	Download	Deep learning recommendation models (DLRMs) are widely used in industry, and their memory capacity requirements reach the terabyte scale. Tiered memory architectures provide a cost-effective solution ...