2025-11-11
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| 3D Guard-Layer: An Integrated Agentic AI Safety System for Edge Artificial Intelligence | Eren Kurshan, Yuan Xie, Paul Franzon | 2025-11-11 | Download | AI systems have found a wide range of real-world applications in recent years. The adoption of edge artificial intelligence, embedding AI directly into edge devices, is rapidly growing. |
| CO2-Meter: A Comprehensive Carbon Footprint Estimator for LLMs on Edge Devices | Zhenxiao Fu, Chen Fan, Lei Jiang | 2025-11-11 | Download | LLMs have transformed NLP, yet deploying them on edge devices poses great carbon challenges. Prior estimators remain incomplete, neglecting peripheral energy use, distinct prefill/decode behaviors, an... |
| DRACO: Co-design for DSP-Efficient Rigid Body Dynamics Accelerator | Xingyu Liu, Jiawei Liang, Yipu Zhang, Linfeng Du, Chaofang Ma, Hui Yu, Jiang Xu, Wei Zhang | 2025-11-11 | Download | We propose a hardware-efficient RBD accelerator based on FPGA, introducing three key innovations. First, we propose a precision-aware quantization framework that reduces DSP demand while preserving mo... |
| BDD2Seq: Enabling Scalable Reversible-Circuit Synthesis via Graph-to-Sequence Learning | Mingkai Miao, Jianheng Tang, Guangyu Hu, Hongce Zhang | 2025-11-11 | Download | Binary Decision Diagrams (BDDs) are instrumental in many electronic design automation (EDA) tasks thanks to their compact representation of Boolean functions. |
| UniFormer: Unified and Efficient Transformer for Reasoning Across General and Custom Computing | Zhuoheng Ran, Chong Wu, Renjie Xu, Maolin Che, Hong Yan | 2025-11-11 | Download | The success of neural networks such as convolutional neural networks (CNNs) has been largely attributed to their effective and widespread deployment on customised computing platforms, including field-... |
| WebAssembly on Resource-Constrained IoT Devices: Performance, Efficiency, and Portability | Mislav Has, Tao Xiong, Fehmi Ben Abdesslem, Mario Kušek | 2025-11-11 | Download | The increasing heterogeneity of hardware and software in the Internet of Things (IoT) poses a major challenge for the portability, maintainability and deployment of software on devices with limited re... |
| ReMaP: Macro Placement by Recursively Prototyping and Packing Tree-based Relocating | Yunqi Shi, Xi Lin, Zhiang Wang, Siyuan Xu, Shixiong Kai, Yao Lai, Chengrui Gao, Ke Xue, Mingxuan Yuan, Chao Qian, Zhi-Hua Zhou | 2025-11-11 | Download | This work introduces the ReMaP method, which generates expert-quality macro placements through recursively prototyping and packing tree-based relocating. |
| PIMfused: Near-Bank DRAM-PIM with Fused-layer Dataflow for CNN Data Transfer Optimization | Simei Yang, Xinyu Shi, Lu Zhao, Yunyu Ling, Quanjun Wang, Francky Catthoor | 2025-11-11 | Download | Near-bank Processing-in-Memory (PIM) architectures integrate processing cores (PIMcores) close to DRAM banks to mitigate the high cost of off-chip memory accesses. |
| Streaming Tensor Programs: A Streaming Abstraction for Dynamic Parallelism | Gina Sohn, Genghan Zhang, Konstantin Hossfeld, Jungwoo Kim, Nathan Sobotka, Nathan Zhang, Olivia Hsu, Kunle Olukotun | 2025-11-11 | Download | Dynamic behaviors are becoming prevalent in tensor applications, like machine learning, where many widely used models contain data-dependent tensor shapes and control flow. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| OSGym: Scalable OS Infra for Computer Use Agents | Zengyi Qin, Jinyuan Chen, Yunze Man, Shengcao Cao, Ziqi Pang, Zhuoyuan Wang, Han Fang, Ling Zhu, Zixin Xie, Zibu Wei, Tianshu Ran, Haoran Geng, Ray Pan, Qizhen Sun, Zachary Bright, Yuyang Cai, Chongye Yang, Jiace Zhao, Tianrui Liu, Han Cao, Yeyang Zhou, Rui Wang, Song Wang, Xiang Ren, Bo Zhang, Yutong Ban, Pieter Abbeel, Brian Anthony | 2025-11-11 | Download | Training computer use agents requires full-featured OS sandboxes with GUI environments, which consume substantial hardware resources as the number of sandboxes scales. |
| An MLIR pipeline for offloading Fortran to FPGAs via OpenMP | Gabriel Rodriguez-Canal, David Katz, Nick Brown | 2025-11-11 | Download | With the slowing of Moore's Law, heterogeneous computing platforms such as Field Programmable Gate Arrays (FPGAs) have gained increasing interest for accelerating HPC workloads. |
| Priority Matters: Optimising Kubernetes Clusters Usage with Constraint-Based Pod Packing | Henrik Daniel Christensen, Saverio Giallorenzo, Jacopo Mauro | 2025-11-11 | Download | Distributed applications employ Kubernetes for scalable, fault-tolerant deployments over computer clusters, where application components run in groups of containers called pods. |
| Gathering in Vertex- and Edge-Transitive Graphs without Multiplicity Detection under Round Robin | Serafino Cicerone, Alessia Di Fonso, Gabriele Di Stefano, Alfredo Navarra | 2025-11-11 | Download | In the field of swarm robotics, one of the most studied problem is Gathering. It asks for a distributed algorithm that brings the robots to a common location, not known in advance. |
| Forgetting Alternation and Blossoms: A New Framework for Fast Matching Augmentation and Its Applications to Sequential/Distributed/Streaming Computation | Taisuke Izumi, Naoki Kitamura, Yutaro Yamaguchi | 2025-11-11 | Download | Finding a maximum cardinality matching in a graph is one of the most fundamental problems. An algorithm proposed by Micali and Vazirani (1980) is well-known to solve the problem in time... |
| Range Asymmetric Numeral Systems-Based Lightweight Intermediate Feature Compression for Split Computing of Deep Neural Networks | Mingyu Sung, Suhwan Im, Vikas Palakonda, Jae-Mo Kang | 2025-11-11 | Download | Split computing distributes deep neural network inference between resource-constrained edge devices and cloud servers but faces significant communication bottlenecks when transmitting intermediate fea... |
| LOw-cOst yet High-Performant Sparse Matrix-Matrix Multiplication on Arm SME Architectures | Kelun Lei, Hailong Yang, Kaige Zhang, Kejie Ma, Yiqing Wang, Xin You, Yufan Xu, Enrique S. Quintana-Orti, Zhongzhi Luan, Yi Liu, Depei Qian | 2025-11-11 | Download | Sparse matrix-dense matrix multiplication (SpMM) is a critical kernel in both scientific computing and emerging graph learning workloads. The recent Armv9 architecture introduces Scalable Matrix Exten... |
| ProbSelect: Stochastic Client Selection for GPU-Accelerated Compute Devices in the 3D Continuum | Andrija Stanisic, Stefan Nastic | 2025-11-11 | Download | Integration of edge, cloud and space devices into a unified 3D continuum imposes significant challenges for client selection in federated learning systems. |
| BIPPO: Budget-Aware Independent PPO for Energy-Efficient Federated Learning Services | Anna Lackinger, Andrea Morichetta, Pantelis A. Frangoudis, Schahram Dustdar | 2025-11-11 | Download | Federated Learning (FL) is a promising machine learning solution in large-scale IoT systems, guaranteeing load distribution and privacy. However, FL does not natively consider infrastructure efficienc... |
| UniFormer: Unified and Efficient Transformer for Reasoning Across General and Custom Computing | Zhuoheng Ran, Chong Wu, Renjie Xu, Maolin Che, Hong Yan | 2025-11-11 | Download | The success of neural networks such as convolutional neural networks (CNNs) has been largely attributed to their effective and widespread deployment on customised computing platforms, including field-... |
| Foam Segmentation in Wastewater Treatment Plants: A Federated Learning Approach with Segment Anything Model 2 | Mehmet Batuhan Duman, Alejandro Carnero, Cristian Martín, Daniel Garrido, Manuel Díaz | 2025-11-11 | Download | Foam formation in Wastewater Treatment Plants (WTPs) is a major challenge that can reduce treatment efficiency and increase costs. The ability to automatically examine changes in real-time with respec... |
| Generic Algorithm for Universal TDM Communication Over Inter Satellite Links | Miroslav Popovic, Marko Popovic, Pavle Vasiljevic, Ilija Basicevic | 2025-11-11 | Download | The original Python Testbed for Federated Learning Algorithms is a light FL framework, which provides the three generic algorithms: the centralized federated learning, the decentralized federated lear... |
| ACGraph: An Efficient Asynchronous Out-of-Core Graph Processing Framework | Dechuang Chen, Sibo Wang, Qintian Guo | 2025-11-11 | Download | Graphs are a ubiquitous data structure in diverse domains such as machine learning, social networks, and data mining. As real-world graphs continue to grow beyond the memory capacity of single machine... |
| Intelligence per Watt: Measuring Intelligence Efficiency of Local AI | Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya, Adrian Gamarra Lafuente, Medhya Goel, Rebecca Joseph, Shlok Natarajan, Etash Kumar Guha, Shang Zhu, Ben Athiwaratkun, John Hennessy, Azalia Mirhoseini, Christopher Ré | 2025-11-11 | Download | Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Rapidly growing demand strains this paradigm, and cloud providers struggle to sca... |
| Parallel Sampling via Autospeculation | Nima Anari, Carlo Baronio, CJ Chen, Alireza Haqi, Frederic Koehler, Anqi Li, Thuy-Duong Vuong | 2025-11-11 | Download | We present parallel algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models. |
| HeteroSTA: A CPU-GPU Heterogeneous Static Timing Analysis Engine with Holistic Industrial Design Support | Zizheng Guo, Haichuan Liu, Xizhe Shi, Shenglu Hua, Zuodong Zhang, Chunyuan Zhao, Runsheng Wang, Yibo Lin | 2025-11-11 | Download | We introduce in this paper, HeteroSTA, the first CPU-GPU heterogeneous timing analysis engine that efficiently supports: (1) a set of delay calculation models providing versatile accuracy-speed choice... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Universal Connection Schedules for Reconfigurable Networking | Shaleen Baral, Robert Kleinberg, Sylvan Martin, Henry Rogers, Tegan Wilson, Ruogu Zhang | 2025-11-11 | Download | Reconfigurable networks are a novel communication paradigm in which the pattern of connectivity between hosts varies rapidly over time. Prior theoretical work explored the inherent tradeoffs between t... |
| Vision Transformer Based User Equipment Positioning | Parshwa Shah, Dhaval K. Patel, Brijesh Soni, Miguel López-Benítez, Siddhartan Govindasamy | 2025-11-11 | Download | Recently, Deep Learning (DL) techniques have been used for User Equipment (UE) positioning. However, the key shortcomings of such models is that: i) they weigh the same attention to the entire input; ... |
| Toward Autonomous and Efficient Cybersecurity: A Multi-Objective AutoML-based Intrusion Detection System | Li Yang, Abdallah Shami | 2025-11-11 | Download | With increasingly sophisticated cybersecurity threats and rising demand for network automation, autonomous cybersecurity mechanisms are becoming critical for securing modern networks. |
| Adaptive Reallocation of RAN Functions for Resilient 6G Networks | Gabriel M. Almeida, Jacek Kibiłda, Joao F. Santos, Kleber Vieira Cardoso | 2025-11-11 | Download | The disaggregation of base stations into discrete RAN functions introduces new threats to mobile networks, as failures in one RAN function can trigger cascading failures and disrupt the entire functio... |
| Fault Tolerant Reconfigurable ML Multiprocessor | Tangrui Li, Justin Y. Shi, Matteo Spatola, Hongzheng Wang | 2025-11-11 | Download | This paper reports three computational experiments for a von Neumann inspired reconfigurable fault tolerant multiprocessor for neural network (NN) training workflows. |
| Demystifying QUIC from the Specifications | Darius Saif, Ashraf Matrawy | 2025-11-11 | Download | QUIC is an advanced transport layer protocol whose ubiquity on the Internet is now very apparent. Importantly, QUIC fuels the next generation of web browsing: HTTP/3. |
| A CODECO Case Study and Initial Validation for Edge Orchestration of Autonomous Mobile Robots | H. Zhu, T. Samizadeh, R. C. Sofia | 2025-11-11 | Download | Autonomous Mobile Robots (AMRs) increasingly adopt containerized micro-services across the Edge-Cloud continuum. While Kubernetes is the de-facto orchestrator for such systems, its assumptions of stab... |
| Revisiting Network Traffic Analysis: Compatible network flows for ML models | João Vitorino, Daniela Pinto, Eva Maia, Ivone Amorim, Isabel Praça | 2025-11-11 | Download | To ensure that Machine Learning (ML) models can perform a robust detection and classification of cyberattacks, it is essential to train them with high-quality datasets with relevant features. |
| SRE-Llama -- Fine-Tuned Meta's Llama LLM, Federated Learning, Blockchain and NFT Enabled Site Reliability Engineering(SRE) Platform for Communication and Networking Software Services | Eranga Bandara, Safdar H. Bouk, Sachin Shetty, Ravi Mukkamala, Abdul Rahman, Peter Foytik, Ross Gore, Xueping Liang, Ng Wee Keong, Kasun De Zoysa | 2025-11-11 | Download | Software services are crucial for reliable communication and networking; therefore, Site Reliability Engineering (SRE) is important to ensure these systems stay reliable and perform well in cloud-nati... |
| Argo: An efficient verification framework for distributed in-network computing | Mingyuan Song, Huan Shen, Jinghui Jiang, Qiang Su, Qingyu Song, Lu Tang, Wanjian Feng, Fei Yuan, Qiao Xiang, Jiwu Shu | 2025-11-11 | Download | Distributed in-network programs are increasingly deployed in data centers for their performance benefits, but shifting application logic to switches also enlarges the failure domain. |
| SMoRFFI: A Large-Scale Same-Model 2.4 GHz Wi-Fi Dataset and Reproducible Framework for RF Fingerprinting | Zewei Guo, Zhen Jia, JinXiao Zhu, Wenhao Huang, Yin Chen | 2025-11-11 | Download | Radio frequency (RF) fingerprinting exploits hardware imperfections for device identification, but distinguishing between same-model devices remains challenging due to their minimal hardware variation... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Work-in-Progress: Function-as-Subtask API Replacing Publish/Subscribe for OS-Native DAG Scheduling | Takahiro Ishikawa-Aso, Atsushi Yano, Yutaro Kobayashi, Takumi Jin, Yuuki Takano, Shinpei Kato | 2025-11-11 | Download | The Directed Acyclic Graph (DAG) task model for real-time scheduling finds its primary practical target in Robot Operating System 2 (ROS 2). However, ROS 2's publish/subscribe API leaves DAG precedenc... |
| WebAssembly on Resource-Constrained IoT Devices: Performance, Efficiency, and Portability | Mislav Has, Tao Xiong, Fehmi Ben Abdesslem, Mario Kušek | 2025-11-11 | Download | The increasing heterogeneity of hardware and software in the Internet of Things (IoT) poses a major challenge for the portability, maintainability and deployment of software on devices with limited re... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| PANDA: Noise-Resilient Antagonist Identification in Production Datacenters | Sixiang Zhou, Nan Deng, Krzysiek Rzadca, Xiaojun Lin, Y. Charlie Hu | 2025-11-11 | Download | Modern warehouse-scale datacenters commonly collocate multiple jobs on shared machines to improve resource utilization. However, such collocation often leads to performance interference caused by anta... |
| Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models | Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang | 2025-11-11 | Download | Improving reasoning capabilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. |
| Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory | Jie Ren, Bin Ma, Shuangyan Yang, Benjamin Francis, Ehsan K. Ardestani, Min Si, Dong Li | 2025-11-11 | Download | Deep learning recommendation models (DLRMs) are widely used in industry, and their memory capacity requirements reach the terabyte scale. Tiered memory architectures provide a cost-effective solution ... |