2025-11-11

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| 3D Guard-Layer: An Integrated Agentic AI Safety System for Edge Artificial Intelligence | Eren Kurshan, Yuan Xie, Paul Franzon | 2025-11-11 | Download | AI systems have found a wide range of real-world applications in recent years. The adoption of edge artificial intelligence, embedding AI directly into edge devices, is rapidly growing. |
| CO2-Meter: A Comprehensive Carbon Footprint Estimator for LLMs on Edge Devices | Zhenxiao Fu, Chen Fan, Lei Jiang | 2025-11-11 | Download | LLMs have transformed NLP, yet deploying them on edge devices poses great carbon challenges. Prior estimators remain incomplete, neglecting peripheral energy use, distinct prefill/decode behaviors, an... |
| DRACO: Co-design for DSP-Efficient Rigid Body Dynamics Accelerator | Xingyu Liu, Jiawei Liang, Yipu Zhang, Linfeng Du, Chaofang Ma, Hui Yu, Jiang Xu, Wei Zhang | 2025-11-11 | Download | We propose a hardware-efficient RBD accelerator based on FPGA, introducing three key innovations. First, we propose a precision-aware quantization framework that reduces DSP demand while preserving mo... |
| BDD2Seq: Enabling Scalable Reversible-Circuit Synthesis via Graph-to-Sequence Learning | Mingkai Miao, Jianheng Tang, Guangyu Hu, Hongce Zhang | 2025-11-11 | Download | Binary Decision Diagrams (BDDs) are instrumental in many electronic design automation (EDA) tasks thanks to their compact representation of Boolean functions. |
| UniFormer: Unified and Efficient Transformer for Reasoning Across General and Custom Computing | Zhuoheng Ran, Chong Wu, Renjie Xu, Maolin Che, Hong Yan | 2025-11-11 | Download | The success of neural networks such as convolutional neural networks (CNNs) has been largely attributed to their effective and widespread deployment on customised computing platforms, including field-... |
| WebAssembly on Resource-Constrained IoT Devices: Performance, Efficiency, and Portability | Mislav Has, Tao Xiong, Fehmi Ben Abdesslem, Mario Kušek | 2025-11-11 | Download | The increasing heterogeneity of hardware and software in the Internet of Things (IoT) poses a major challenge for the portability, maintainability and deployment of software on devices with limited re... |
| Re$^{2}$MaP: Macro Placement by Recursively Prototyping and Packing Tree-based Relocating | Yunqi Shi, Xi Lin, Zhiang Wang, Siyuan Xu, Shixiong Kai, Yao Lai, Chengrui Gao, Ke Xue, Mingxuan Yuan, Chao Qian, Zhi-Hua Zhou | 2025-11-11 | Download | This work introduces the Re$^{2}$MaP method, which generates expert-quality macro placements through recursively prototyping and packing tree-based relocating. |
| PIMfused: Near-Bank DRAM-PIM with Fused-layer Dataflow for CNN Data Transfer Optimization | Simei Yang, Xinyu Shi, Lu Zhao, Yunyu Ling, Quanjun Wang, Francky Catthoor | 2025-11-11 | Download | Near-bank Processing-in-Memory (PIM) architectures integrate processing cores (PIMcores) close to DRAM banks to mitigate the high cost of off-chip memory accesses. |
| Streaming Tensor Programs: A Streaming Abstraction for Dynamic Parallelism | Gina Sohn, Genghan Zhang, Konstantin Hossfeld, Jungwoo Kim, Nathan Sobotka, Nathan Zhang, Olivia Hsu, Kunle Olukotun | 2025-11-11 | Download | Dynamic behaviors are becoming prevalent in tensor applications, like machine learning, where many widely used models contain data-dependent tensor shapes and control flow. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| OSGym: Scalable OS Infra for Computer Use Agents | Zengyi Qin, Jinyuan Chen, Yunze Man, Shengcao Cao, Ziqi Pang, Zhuoyuan Wang, Han Fang, Ling Zhu, Zixin Xie, Zibu Wei, Tianshu Ran, Haoran Geng, Ray Pan, Qizhen Sun, Zachary Bright, Yuyang Cai, Chongye Yang, Jiace Zhao, Tianrui Liu, Han Cao, Yeyang Zhou, Rui Wang, Song Wang, Xiang Ren, Bo Zhang, Yutong Ban, Pieter Abbeel, Brian Anthony | 2025-11-11 | Download | Training computer use agents requires full-featured OS sandboxes with GUI environments, which consume substantial hardware resources as the number of sandboxes scales. |
| An MLIR pipeline for offloading Fortran to FPGAs via OpenMP | Gabriel Rodriguez-Canal, David Katz, Nick Brown | 2025-11-11 | Download | With the slowing of Moore's Law, heterogeneous computing platforms such as Field Programmable Gate Arrays (FPGAs) have gained increasing interest for accelerating HPC workloads. |
| Priority Matters: Optimising Kubernetes Clusters Usage with Constraint-Based Pod Packing | Henrik Daniel Christensen, Saverio Giallorenzo, Jacopo Mauro | 2025-11-11 | Download | Distributed applications employ Kubernetes for scalable, fault-tolerant deployments over computer clusters, where application components run in groups of containers called pods. |
| Gathering in Vertex- and Edge-Transitive Graphs without Multiplicity Detection under Round Robin | Serafino Cicerone, Alessia Di Fonso, Gabriele Di Stefano, Alfredo Navarra | 2025-11-11 | Download | In the field of swarm robotics, one of the most studied problems is Gathering. It asks for a distributed algorithm that brings the robots to a common location, not known in advance. |
| Forgetting Alternation and Blossoms: A New Framework for Fast Matching Augmentation and Its Applications to Sequential/Distributed/Streaming Computation | Taisuke Izumi, Naoki Kitamura, Yutaro Yamaguchi | 2025-11-11 | Download | Finding a maximum cardinality matching in a graph is one of the most fundamental problems. An algorithm proposed by Micali and Vazirani (1980) is well-known to solve the problem in $O(m\sqrt{n})$ time... |
| Range Asymmetric Numeral Systems-Based Lightweight Intermediate Feature Compression for Split Computing of Deep Neural Networks | Mingyu Sung, Suhwan Im, Vikas Palakonda, Jae-Mo Kang | 2025-11-11 | Download | Split computing distributes deep neural network inference between resource-constrained edge devices and cloud servers but faces significant communication bottlenecks when transmitting intermediate fea... |
| LOw-cOst yet High-Performant Sparse Matrix-Matrix Multiplication on Arm SME Architectures | Kelun Lei, Hailong Yang, Kaige Zhang, Kejie Ma, Yiqing Wang, Xin You, Yufan Xu, Enrique S. Quintana-Orti, Zhongzhi Luan, Yi Liu, Depei Qian | 2025-11-11 | Download | Sparse matrix-dense matrix multiplication (SpMM) is a critical kernel in both scientific computing and emerging graph learning workloads. The recent Armv9 architecture introduces Scalable Matrix Exten... |
| ProbSelect: Stochastic Client Selection for GPU-Accelerated Compute Devices in the 3D Continuum | Andrija Stanisic, Stefan Nastic | 2025-11-11 | Download | Integration of edge, cloud and space devices into a unified 3D continuum imposes significant challenges for client selection in federated learning systems. |
| BIPPO: Budget-Aware Independent PPO for Energy-Efficient Federated Learning Services | Anna Lackinger, Andrea Morichetta, Pantelis A. Frangoudis, Schahram Dustdar | 2025-11-11 | Download | Federated Learning (FL) is a promising machine learning solution in large-scale IoT systems, guaranteeing load distribution and privacy. However, FL does not natively consider infrastructure efficienc... |
| UniFormer: Unified and Efficient Transformer for Reasoning Across General and Custom Computing | Zhuoheng Ran, Chong Wu, Renjie Xu, Maolin Che, Hong Yan | 2025-11-11 | Download | The success of neural networks such as convolutional neural networks (CNNs) has been largely attributed to their effective and widespread deployment on customised computing platforms, including field-... |
| Foam Segmentation in Wastewater Treatment Plants: A Federated Learning Approach with Segment Anything Model 2 | Mehmet Batuhan Duman, Alejandro Carnero, Cristian Martín, Daniel Garrido, Manuel Díaz | 2025-11-11 | Download | Foam formation in Wastewater Treatment Plants (WTPs) is a major challenge that can reduce treatment efficiency and increase costs. The ability to automatically examine changes in real-time with respec... |
| Generic Algorithm for Universal TDM Communication Over Inter Satellite Links | Miroslav Popovic, Marko Popovic, Pavle Vasiljevic, Ilija Basicevic | 2025-11-11 | Download | The original Python Testbed for Federated Learning Algorithms is a light FL framework, which provides the three generic algorithms: the centralized federated learning, the decentralized federated lear... |
| ACGraph: An Efficient Asynchronous Out-of-Core Graph Processing Framework | Dechuang Chen, Sibo Wang, Qintian Guo | 2025-11-11 | Download | Graphs are a ubiquitous data structure in diverse domains such as machine learning, social networks, and data mining. As real-world graphs continue to grow beyond the memory capacity of single machine... |
| Intelligence per Watt: Measuring Intelligence Efficiency of Local AI | Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya, Adrian Gamarra Lafuente, Medhya Goel, Rebecca Joseph, Shlok Natarajan, Etash Kumar Guha, Shang Zhu, Ben Athiwaratkun, John Hennessy, Azalia Mirhoseini, Christopher Ré | 2025-11-11 | Download | Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Rapidly growing demand strains this paradigm, and cloud providers struggle to sca... |
| Parallel Sampling via Autospeculation | Nima Anari, Carlo Baronio, CJ Chen, Alireza Haqi, Frederic Koehler, Anqi Li, Thuy-Duong Vuong | 2025-11-11 | Download | We present parallel algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models. |
| HeteroSTA: A CPU-GPU Heterogeneous Static Timing Analysis Engine with Holistic Industrial Design Support | Zizheng Guo, Haichuan Liu, Xizhe Shi, Shenglu Hua, Zuodong Zhang, Chunyuan Zhao, Runsheng Wang, Yibo Lin | 2025-11-11 | Download | We introduce in this paper, HeteroSTA, the first CPU-GPU heterogeneous timing analysis engine that efficiently supports: (1) a set of delay calculation models providing versatile accuracy-speed choice... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Universal Connection Schedules for Reconfigurable Networking | Shaleen Baral, Robert Kleinberg, Sylvan Martin, Henry Rogers, Tegan Wilson, Ruogu Zhang | 2025-11-11 | Download | Reconfigurable networks are a novel communication paradigm in which the pattern of connectivity between hosts varies rapidly over time. Prior theoretical work explored the inherent tradeoffs between t... |
| Vision Transformer Based User Equipment Positioning | Parshwa Shah, Dhaval K. Patel, Brijesh Soni, Miguel López-Benítez, Siddhartan Govindasamy | 2025-11-11 | Download | Recently, Deep Learning (DL) techniques have been used for User Equipment (UE) positioning. However, the key shortcomings of such models are that: i) they weigh the same attention to the entire input; ... |
| Toward Autonomous and Efficient Cybersecurity: A Multi-Objective AutoML-based Intrusion Detection System | Li Yang, Abdallah Shami | 2025-11-11 | Download | With increasingly sophisticated cybersecurity threats and rising demand for network automation, autonomous cybersecurity mechanisms are becoming critical for securing modern networks. |
| Adaptive Reallocation of RAN Functions for Resilient 6G Networks | Gabriel M. Almeida, Jacek Kibiłda, Joao F. Santos, Kleber Vieira Cardoso | 2025-11-11 | Download | The disaggregation of base stations into discrete RAN functions introduces new threats to mobile networks, as failures in one RAN function can trigger cascading failures and disrupt the entire functio... |
| Fault Tolerant Reconfigurable ML Multiprocessor | Tangrui Li, Justin Y. Shi, Matteo Spatola, Hongzheng Wang | 2025-11-11 | Download | This paper reports three computational experiments for a von Neumann inspired reconfigurable fault tolerant multiprocessor for neural network (NN) training workflows. |
| Demystifying QUIC from the Specifications | Darius Saif, Ashraf Matrawy | 2025-11-11 | Download | QUIC is an advanced transport layer protocol whose ubiquity on the Internet is now very apparent. Importantly, QUIC fuels the next generation of web browsing: HTTP/3. |
| A CODECO Case Study and Initial Validation for Edge Orchestration of Autonomous Mobile Robots | H. Zhu, T. Samizadeh, R. C. Sofia | 2025-11-11 | Download | Autonomous Mobile Robots (AMRs) increasingly adopt containerized micro-services across the Edge-Cloud continuum. While Kubernetes is the de-facto orchestrator for such systems, its assumptions of stab... |
| Revisiting Network Traffic Analysis: Compatible network flows for ML models | João Vitorino, Daniela Pinto, Eva Maia, Ivone Amorim, Isabel Praça | 2025-11-11 | Download | To ensure that Machine Learning (ML) models can perform a robust detection and classification of cyberattacks, it is essential to train them with high-quality datasets with relevant features. |
| SRE-Llama -- Fine-Tuned Meta's Llama LLM, Federated Learning, Blockchain and NFT Enabled Site Reliability Engineering(SRE) Platform for Communication and Networking Software Services | Eranga Bandara, Safdar H. Bouk, Sachin Shetty, Ravi Mukkamala, Abdul Rahman, Peter Foytik, Ross Gore, Xueping Liang, Ng Wee Keong, Kasun De Zoysa | 2025-11-11 | Download | Software services are crucial for reliable communication and networking; therefore, Site Reliability Engineering (SRE) is important to ensure these systems stay reliable and perform well in cloud-nati... |
| Argo: An efficient verification framework for distributed in-network computing | Mingyuan Song, Huan Shen, Jinghui Jiang, Qiang Su, Qingyu Song, Lu Tang, Wanjian Feng, Fei Yuan, Qiao Xiang, Jiwu Shu | 2025-11-11 | Download | Distributed in-network programs are increasingly deployed in data centers for their performance benefits, but shifting application logic to switches also enlarges the failure domain. |
| SMoRFFI: A Large-Scale Same-Model 2.4 GHz Wi-Fi Dataset and Reproducible Framework for RF Fingerprinting | Zewei Guo, Zhen Jia, JinXiao Zhu, Wenhao Huang, Yin Chen | 2025-11-11 | Download | Radio frequency (RF) fingerprinting exploits hardware imperfections for device identification, but distinguishing between same-model devices remains challenging due to their minimal hardware variation... |

cs.OS - Operating Systems

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Work-in-Progress: Function-as-Subtask API Replacing Publish/Subscribe for OS-Native DAG Scheduling | Takahiro Ishikawa-Aso, Atsushi Yano, Yutaro Kobayashi, Takumi Jin, Yuuki Takano, Shinpei Kato | 2025-11-11 | Download | The Directed Acyclic Graph (DAG) task model for real-time scheduling finds its primary practical target in Robot Operating System 2 (ROS 2). However, ROS 2's publish/subscribe API leaves DAG precedenc... |
| WebAssembly on Resource-Constrained IoT Devices: Performance, Efficiency, and Portability | Mislav Has, Tao Xiong, Fehmi Ben Abdesslem, Mario Kušek | 2025-11-11 | Download | The increasing heterogeneity of hardware and software in the Internet of Things (IoT) poses a major challenge for the portability, maintainability and deployment of software on devices with limited re... |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| PANDA: Noise-Resilient Antagonist Identification in Production Datacenters | Sixiang Zhou, Nan Deng, Krzysiek Rzadca, Xiaojun Lin, Y. Charlie Hu | 2025-11-11 | Download | Modern warehouse-scale datacenters commonly collocate multiple jobs on shared machines to improve resource utilization. However, such collocation often leads to performance interference caused by anta... |
| Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models | Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang | 2025-11-11 | Download | Improving reasoning capabilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. |
| Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory | Jie Ren, Bin Ma, Shuangyan Yang, Benjamin Francis, Ehsan K. Ardestani, Min Si, Dong Li | 2025-11-11 | Download | Deep learning recommendation models (DLRMs) are widely used in industry, and their memory capacity requirements reach the terabyte scale. Tiered memory architectures provide a cost-effective solution ... |
