2025-03-12

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Hardware-Compatible Single-Shot Feasible-Space Heuristics for Solving the Quadratic Assignment Problem	Haesol Im, Chan-Woo Yang, Moslem Noori, Dmitrii Dobrynin, Elisabetta Valiante, Giacomo Pedretti, Arne Heittmann, Thomas Van Vaerenbergh, Masoud Mohseni, John Paul Strachan, Dmitri Strukov, Ray Beausoleil, Ignacio Rozada	2025-03-12	Download	Research into the development of special-purpose computing architectures designed to solve quadratic unconstrained binary optimization (QUBO) problems has flourished in recent years.
Hardware.jl - An MLIR-based Julia HLS Flow (Work in Progress)	Benedict Short, Ian McInerney, John Wickerson	2025-03-12	Download	Co-developing scientific algorithms and hardware accelerators requires domain-specific knowledge and large engineering resources. This leads to a slow development pace and high project complexity, whi...
EDEA: Efficient Dual-Engine Accelerator for Depthwise Separable Convolution with Direct Data Transfer	Yi Chen, Jie Lou, Malte Wabnitz, Johnson Loh, Tobias Gemmeke	2025-03-12	Download	Depthwise separable convolution (DSC) has emerged as a crucial technique, especially for resource-constrained devices. In this paper, we propose a dual-engine for the DSC hardware accelerator, which e...
FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics	Zeke Wang, Jie Zhang, Hongjing Huang, Yingtao Li, Xueying Zhu, Mo Sun, Zihan Yang, De Ma, Huajing Tang, Gang Pan, Fei Wu, Bingsheng He, Gustavo Alonso	2025-03-12	Download	Modern data analytics requires a huge amount of computing power and processes a massive amount of data. At the same time, the underlying computing platform is becoming much more heterogeneous on both ...
A Review on Proprietary Accelerators for Large Language Models	Sihyeong Park, Jemin Lee, Byung-Soo Kim, Seokhun Jeon	2025-03-12	Download	With the advancement of Large Language Models (LLMs), the importance of accelerators that efficiently process LLM computations has been increasing.
CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing	Mayank Kabra, Rakesh Nadig, Harshita Gupta, Rahul Bera, Manos Frouzakis, Vamanan Arulchelvan, Yu Liang, Haiyu Mao, Mohammad Sadrosadati, Onur Mutlu	2025-03-12	Download	Homomorphic encryption (HE) allows secure computation on encrypted data without revealing the original data, providing significant benefits for privacy-sensitive applications.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Parallel Batch-Dynamic Maximal Matching with Constant Work per Update	Guy E. Blelloch, Andrew C. Brady	2025-03-12	Download	We present a work optimal algorithm for parallel fully batch-dynamic maximal matching against an oblivious adversary. It processes batches of updates (either insertions or deletions of edges) in const...
Computing the Saturation Throughput for Heterogeneous p-CSMA in a General Wireless Network	Faezeh Dehghan Tarzjani, Bhaskar Krishnamachari	2025-03-12	Download	A well-known expression for the saturation throughput of heterogeneous transmitting nodes in a wireless network using p-CSMA, derived from Renewal Theory, implicitly assumes that all transmitting node...
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo	Zachary Charles, Gabriel Teston, Lucio Dery, Keith Rush, Nova Fallen, Zachary Garrett, Arthur Szlam, Arthur Douillard	2025-03-12	Download	As we scale to more massive machine learning models, the frequent synchronization demands inherent in data-parallel approaches create significant slowdowns, posing a critical challenge to further scal...
MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching	Tairan Xu, Leyang Xue, Zhan Lu, Adrian Jackson, Luo Mai	2025-03-12	Download	This paper presents MoE-Gen, a high-throughput MoE inference system optimized for single-GPU execution. Existing inference systems rely on model-based or continuous batching strategies, originally des...
The Shamrock code: I- Smoothed Particle Hydrodynamics on GPUs	Timothée David--Cléris, Guillaume Laibe, Yona Lapeyre	2025-03-12	Download	We present Shamrock, a performance portable framework developed in C++17 with the SYCL programming standard, tailored for numerical astrophysics on Exascale architectures.
Evaluating Multi-Instance DNN Inferencing on Multiple Accelerators of an Edge Device	Mumuksh Tayal, Yogesh Simmhan	2025-03-12	Download	Edge devices like Nvidia Jetson platforms now offer several on-board accelerators -- including GPU CUDA cores, Tensor Cores, and Deep Learning Accelerators (DLA) -- which can be concurrently exploited...
Energy Metrics for Edge Microservice Request Placement Strategies	Klervie Toczé, Simin Nadjm-Tehrani	2025-03-12	Download	Microservices are a way of splitting the logic of an application into small blocks that can be run on different computing units and used by other applications.
EDEA: Efficient Dual-Engine Accelerator for Depthwise Separable Convolution with Direct Data Transfer	Yi Chen, Jie Lou, Malte Wabnitz, Johnson Loh, Tobias Gemmeke	2025-03-12	Download	Depthwise separable convolution (DSC) has emerged as a crucial technique, especially for resource-constrained devices. In this paper, we propose a dual-engine for the DSC hardware accelerator, which e...
MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization	Zongwu Wang, Peng Xu, Fangxin Liu, Yiwei Hu, Qingxiao Sun, Gezi Li, Cheng Li, Xuan Wang, Li Jiang, Haibing Guan	2025-03-12	Download	Large language models (LLMs) are increasingly utilized for complex tasks requiring longer context lengths, with some models supporting up to 128K or 1M tokens.
Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach	Ruifeng She, Bowen Pang, Kai Li, Zehua Liu, Tao Zhong	2025-03-12	Download	As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential.
FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics	Zeke Wang, Jie Zhang, Hongjing Huang, Yingtao Li, Xueying Zhu, Mo Sun, Zihan Yang, De Ma, Huajing Tang, Gang Pan, Fei Wu, Bingsheng He, Gustavo Alonso	2025-03-12	Download	Modern data analytics requires a huge amount of computing power and processes a massive amount of data. At the same time, the underlying computing platform is becoming much more heterogeneous on both ...
Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference	Mohammad Siavashi, Faezeh Keshmiri Dindarloo, Dejan Kostic, Marco Chiesa	2025-03-12	Download	Large Language Models have revolutionized natural language processing, yet serving them efficiently in data centers remains challenging due to mixed workloads comprising latency-sensitive (LS) and bes...
Exploiting Unstructured Sparsity in Fully Homomorphic Encrypted DNNs	Aidan Ferguson, Perry Gibson, Lara D'Agata, Parker McLeod, Ferhat Yaman, Amitabh Das, Ian Colbert, José Cano	2025-03-12	Download	The deployment of deep neural networks (DNNs) in privacy-sensitive environments is constrained by computational overheads in fully homomorphic encryption (FHE).
Falafels: A tool for Estimating Federated Learning Energy Consumption via Discrete Simulation	Andrew Mary Huet de Barochez, Stéphan Plassart, Sébastien Monnet	2025-03-12	Download	The growth in computational power and data hungriness of Machine Learning has led to an important shift of research efforts towards the distribution of ML models on multiple machines, leading in even ...
Drift-Aware Federated Learning: A Causal Perspective	Yunjie Fang, Sheng Wu, Tao Yang, Xiaofeng Wu, Bo Hu	2025-03-12	Download	Federated learning (FL) facilitates collaborative model training among multiple clients while preserving data privacy, often resulting in enhanced performance compared to models trained by individual ...
Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge	Maximilian Abstreiter, Sasu Tarkoma, Roberto Morabito	2025-03-12	Download	The rapid rise of Language Models (LMs) has expanded the capabilities of natural language processing, powering applications from text generation to complex decision-making.
FedMSGL: A Self-Expressive Hypergraph Based Federated Multi-View Learning	Daoyuan Li, Zuyuan Yang, Shengli Xie	2025-03-12	Download	Federated learning is essential for enabling collaborative model training across decentralized data sources while preserving data privacy and security.
Not All Edges are Equally Robust: Evaluating the Robustness of Ranking-Based Federated Learning	Zirui Gong, Yanjun Zhang, Leo Yu Zhang, Zhaoxi Zhang, Yong Xiang, Shirui Pan	2025-03-12	Download	Federated Ranking Learning (FRL) is a state-of-the-art FL framework that stands out for its communication efficiency and resilience to poisoning attacks.
CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing	Mayank Kabra, Rakesh Nadig, Harshita Gupta, Rahul Bera, Manos Frouzakis, Vamanan Arulchelvan, Yu Liang, Haiyu Mao, Mohammad Sadrosadati, Onur Mutlu	2025-03-12	Download	Homomorphic encryption (HE) allows secure computation on encrypted data without revealing the original data, providing significant benefits for privacy-sensitive applications.
Performance Models for a Two-tiered Storage System	Aparna Sasidharan, Xian-He, Jay Lofstead, Scott Klasky	2025-03-12	Download	This work describes the design, implementation and performance analysis of a distributed two-tiered storage software. The first tier functions as a distributed software cache implemented using solid-s...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Computing the Saturation Throughput for Heterogeneous p-CSMA in a General Wireless Network	Faezeh Dehghan Tarzjani, Bhaskar Krishnamachari	2025-03-12	Download	A well-known expression for the saturation throughput of heterogeneous transmitting nodes in a wireless network using p-CSMA, derived from Renewal Theory, implicitly assumes that all transmitting node...
A Short Scalability Study on the SeQUeNCe Parallel Quantum Network Simulator	Aaron Welch, Mariam Kiran	2025-03-12	Download	As quantum networking continues to grow in importance, its study is of interest to an ever wider community and at an increasing scale. However, the development of its physical infrastructure remains b...
IUP: Integrated and Programmable User Plane for Next-Generation Mobile Networks	Chieh-Chun Chen, Chia-Yu Chang, Navid Nikaein	2025-03-12	Download	Mobile networks evolve on a regular basis to meet the requirements of a rapidly changing application ecosystem; hence, a future-proof design is key to getting the most out of their lifecycle.
Experimental Analysis of a Self-Coherent M-QAM Receiver by Means of Recurrent Optical Spectrum Slicing and Direct Detection	Kostas Sozos, Francesco Da Ros (Senior Member, Optica), Metodi Yankov, Stavros Deligiannidis, George Sarantoglou, Charis Mesaritakis, Adonis Bogris (Fellow, Optica)	2025-03-12	Download	High order modulation formats constitute the most prominent way for increasing spectral efficiency in transmission systems. Coherent transceivers that support such higher order formats require heavy d...
Charting 5G Energy Efficiency: Flexible Energy Modeling for Sustainable Networks	Anderson L de Araujo, Luc Deneire, Guillaume Urvoy-Keller, André L F de Almeida	2025-03-12	Download	Despite the rapid advancements in 5G technology, accurately assessing the energy consumption of its Radio Access Networks (RANs) remains a challenge due to the diverse range of applicable technologies...
Efficient Adaptive Bandwidth Allocation for Deadline-Aware Online Admission Control in Time-Sensitive Networking	Sifan Yu, Feng He, Anlan Xie, Luxi Zhao	2025-03-12	Download	With the growing demand for dynamic real-time applications, online admission control for time-critical event-triggered (ET) traffic in Time-Sensitive Networking (TSN) has become a critical challenge.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
BYOS: Knowledge-driven Large Language Models Bring Your Own Operating System More Excellent	Hongyu Lin, Yuchen Li, Haoran Luo, Kaichun Yao, Libo Zhang, Zhenghong Lin, Mingjie Xing, Yanjun Wu, Carl Yang	2025-03-12	Download	Operating system (OS) kernel tuning is a critical yet challenging problem for performance optimization, due to the large configuration space, complex interdependencies among configuration options, and...
KNighter: Transforming Static Analysis with LLM-Synthesized Checkers	Chenyuan Yang, Zijie Zhao, Zichen Xie, Haoyu Li, Lingming Zhang	2025-03-12	Download	Static analysis is a powerful technique for bug detection in critical systems like operating system kernels. However, designing and implementing static analyzers is challenging, time-consuming, and ty...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
AI Work Quantization Model: Closed-System AI Computational Effort Metric	Aasish Kumar Sharma, Michael Bidollahkhani, Julian Martin Kunkel	2025-03-12	Download	The rapid adoption of AI-driven automation in IoT environments, particularly in smart cities and industrial systems, necessitates a standardized approach to quantify AI's computational workload.
Exploiting Unstructured Sparsity in Fully Homomorphic Encrypted DNNs	Aidan Ferguson, Perry Gibson, Lara D'Agata, Parker McLeod, Ferhat Yaman, Amitabh Das, Ian Colbert, José Cano	2025-03-12	Download	The deployment of deep neural networks (DNNs) in privacy-sensitive environments is constrained by computational overheads in fully homomorphic encryption (FHE).
A Review on Proprietary Accelerators for Large Language Models	Sihyeong Park, Jemin Lee, Byung-Soo Kim, Seokhun Jeon	2025-03-12	Download	With the advancement of Large Language Models (LLMs), the importance of accelerators that efficiently process LLM computations has been increasing.
Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge	Maximilian Abstreiter, Sasu Tarkoma, Roberto Morabito	2025-03-12	Download	The rapid rise of Language Models (LMs) has expanded the capabilities of natural language processing, powering applications from text generation to complex decision-making.
Quantitative Analysis of Deeply Quantized Tiny Neural Networks Robust to Adversarial Attacks	Idris Zakariyya, Ferheen Ayaz, Mounia Kharbouche-Harrari, Jeremy Singer, Sye Loong Keoh, Danilo Pau, José Cano	2025-03-12	Download	Reducing the memory footprint of Machine Learning (ML) models, especially Deep Neural Networks (DNNs), is imperative to facilitate their deployment on resource-constrained edge devices.
