2025-01-31

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Theoretical complexity analysis of many-cores on a single chip	Ran Ginosar	2025-01-31	Download	When a single core is scaled up to m cores occupying the same chip area and executing the same (parallelizable) task, achievable speedup is square-root m, power is reduced by square-root m and energy ...
Efficient Read-Port-Count Reduction Schemes for the Centralized Physical Register File in a Superscalar Microprocessor	Denis Los	2025-01-31	Download	The physical register file supports increasing the execution width and depth of a superscalar microprocessor to exploit more instruction-level parallelism.
An All-digital 8.6-nJ/Frame 65-nm Tsetlin Machine Image Classification Accelerator	Svein Anders Tunheim, Yujin Zheng, Lei Jiao, Rishad Shafik, Alex Yakovlev, Ole-Christoffer Granmo	2025-01-31	Download	We present an all-digital programmable machine learning accelerator chip for image classification, based on Tsetlin machine (TM) principles.
A Tensor-Train Decomposition based Compression of LLMs on Group Vector Systolic Accelerator	Sixiao Huang, Tintin Wang, Ang Li, Ao Shen, Kai Li, Keyao Jiang, Mingqiang Huang, Hao Yu	2025-01-31	Download	Large language models (LLMs) are both storage-intensive and computation-intensive, posing significant challenges when deployed on resource-constrained hardware.
StruM: Structured Mixed Precision for Efficient Deep Learning Hardware Codesign	Michael Wu, Arnab Raha, Deepak A. Mathaikutty, Martin Langhammer, Engin Tunali, Daksha Sharma	2025-01-31	Download	In this paper, we propose StruM, a novel structured mixed-precision-based deep learning inference method, co-designed with its associated hardware accelerator (DPU), to address the escalating computat...
Latch Based Design for Fast Voltage Droop Response	Shreyas Srinivas, Ian W Jones, Goran Panic, Christoph Lenzen	2025-01-31	Download	We present a latch-based and PLL-free design of the voltage droop correction circuit of Lenzen, Fuegger, Kinali, and Wiederhake.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
The Free Termination Property of Queries Over Time	Conor Power, Paraschos Koutris, Joseph M Hellerstein	2025-01-31	Download	Building on prior work on distributed databases and the CALM Theorem, we define and study the question of free termination: in the absence of distributed coordination, what query properties allow node...
BICompFL: Stochastic Federated Learning with Bi-Directional Compression	Maximilian Egger, Rawad Bitar, Antonia Wachter-Zeh, Nir Weinberger, Deniz Gündüz	2025-01-31	Download	We address the prominent communication bottleneck in federated learning (FL). We specifically consider stochastic FL, in which models or compressed model updates are specified by distributions rather ...
Byzantine-Resilient Zero-Order Optimization for Communication-Efficient Heterogeneous Federated Learning	Maximilian Egger, Mayank Bakshi, Rawad Bitar	2025-01-31	Download	We introduce CyBeR-0, a Byzantine-resilient federated zero-order optimization method that is robust under Byzantine attacks and provides significant savings in uplink and downlink communication costs.
Asynchronous Fault-Tolerant Language Decidability for Runtime Verification of Distributed Systems	Armando Castañeda, Gilde Valeria Rodríguez	2025-01-31	Download	Implementing correct distributed systems is an error-prone task. Runtime Verification (RV) offers a lightweight formal method to improve reliability by monitoring system executions against correctness...
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques	Nathaniel Tomczak, Sanmukh Kuppannagari	2025-01-31	Download	Transformers have demonstrated great success in numerous domains including natural language processing and bioinformatics. This success stems from the use of the attention mechanism by these models in...
JustAct+: A Framework for Auditable Multi-Agent Systems Regulated by Inter-Organisational Policies	Christopher A. Esterhuyse, Tim Müller, L. Thomas van Binsbergen	2025-01-31	Download	In open multi-agent systems that cross organisational boundaries, agent actions must be regulated by complex policies. Consider medical data processing systems, which must observe generic laws (...
S-VOTE: Similarity-based Voting for Client Selection in Decentralized Federated Learning	Pedro Miguel Sánchez Sánchez, Enrique Tomás Martínez Beltrán, Chao Feng, Gérôme Bovet, Gregorio Martínez Pérez, Alberto Huertas Celdrán	2025-01-31	Download	Decentralized Federated Learning (DFL) enables collaborative, privacy-preserving model training without relying on a central server. This decentralized approach reduces bottlenecks and eliminates sing...
FL-APU: A Software Architecture to Ease Practical Implementation of Cross-Silo Federated Learning	F. Stricker, J. A. Peregrina, D. Bermbach, C. Zirpins	2025-01-31	Download	Federated Learning (FL) is an upcoming technology that is increasingly applied in real-world applications. Early applications focused on cross-device scenarios, where many participants with limited re...
A Bias-Correction Decentralized Stochastic Gradient Algorithm with Momentum Acceleration	Yuchen Hu, Xi Chen, Weidong Liu, Xiaojun Mao	2025-01-31	Download	Distributed stochastic optimization algorithms can simultaneously process large-scale datasets, significantly accelerating model training. However, their effectiveness is often hindered by the sparsit...
CPU vs. GPU for Community Detection: Performance Insights from GVE-Louvain and ν-Louvain	Subhajit Sahu	2025-01-31	Download	Community detection involves identifying natural divisions in networks, a crucial task for many large-scale applications. This report presents GVE-Louvain, one of the most efficient multicore implemen...
BSODiag: A Global Diagnosis Framework for Batch Servers Outage in Large-scale Cloud Infrastructure Systems	Tao Duan, Runqing Chen, Pinghui Wang, Junzhou Zhao, Jiongzhou Liu, Shujie Han, Yi Liu, Fan Xu	2025-01-31	Download	Cloud infrastructure is the collective term for all physical devices within cloud systems. Failures within the cloud infrastructure system can severely compromise the stability and availability of clo...
Continuous-Time Analysis of Federated Averaging	Tom Overman, Diego Klabjan	2025-01-31	Download	Federated averaging (FedAvg) is a popular algorithm for horizontal federated learning (FL), where samples are gathered across different clients and are not shared with each other or a central server.
Infer-EDGE: Dynamic DNN Inference Optimization in 'Just-in-time' Edge-AI Implementations	Motahare Mounesan, Xiaojie Zhang, Saptarshi Debroy	2025-01-31	Download	Balancing mutually diverging performance metrics, such as end-to-end latency, accuracy, and device energy consumption, is a challenging undertaking for deep neural network (DNN) inference in Just-in-T...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Open RAN Slicing with Quantum Optimization	Patatchona Keyela, Soumaya Cherkaoui	2025-01-31	Download	RAN slicing technology is a key aspect of the Open RAN paradigm, allowing simultaneous and independent provision of various services such as ultra-reliable low-latency communications (URLLC), enhanced...
Characterizing User Behavior: The Interplay Between Mobility Patterns and Mobile Traffic	Anne Josiane Kouam, Aline Carneiro Viana, Mariano G. Beiró, Leo Ferres, Luca Pappalardo	2025-01-31	Download	Mobile devices have become essential for capturing human activity, and eXtended Data Records (XDRs) offer rich opportunities for detailed user behavior modeling, which is useful for designing personal...
Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes	Zhiyao Xu, Dan Zhao, Qingsong Zou, Jingyu Xiao, Yong Jiang, Zhenhui Yuan, Qing Li	2025-01-31	Download	In recent years, as smart home systems have become more widespread, security concerns within these environments have become a growing threat. Currently, most smart home security solutions, such as ano...
APEX: Automated Parameter Exploration for Low-Power Wireless Protocols	Mohamed Hassaan M. Hydher, Markus Schuss, Olga Saukh, Kay Römer, Carlo Alberto Boano	2025-01-31	Download	Careful parametrization of networking protocols is crucial to maximize the performance of low-power wireless systems and ensure that stringent application requirements can be met.
On Measuring Available Capacity in High-speed Cloud Networks	Ganapathy Raman Madanagopal, Christofer Flinta, Andreas Johnsson, Farnaz Moradi, Daniel Turull	2025-01-31	Download	Measurement of available path capacity with high accuracy over high-speed links deployed in cloud and transport networks is vital for performance assessment and traffic engineering.
Reliability Modeling for Beyond-5G Mission Critical Networks Using Effective Capacity	Anudeep Karnam, Jobish John, Kishor C. Joshi, George Exarchakos, Sonia Heemstra de Groot, Ignas Niemegeers	2025-01-31	Download	Accurate reliability modeling for ultra-reliable low latency communication (URLLC) and hyper-reliable low latency communication (HRLLC) networks is challenging due to the complex interactions between ...
Swift: Rethinking RDMA Control Plane for Elastic Computing	Junxue Zhang, Han Tian, Xinyang Huang, Wenxue Li, Kaiqiang Xu, Dian Shen, Yong Wang, Kai Chen	2025-01-31	Download	Elastic computing enables dynamic scaling to meet workload demands, and Remote Direct Memory Access (RDMA) enhances this by providing high-throughput, low-latency network communication.
Sharing GPUs and Programmable Switches in a Federated Testbed with SHARY	Stefano Salsano, Andrea Mayer, Paolo Lungaroni, Pierpaolo Loreti, Lorenzo Bracciale, Andrea Detti, Marco Orazi, Paolo Giaccone, Fulvio Risso, Alessandro Cornacchia, Carla Fabiana Chiasserini	2025-01-31	Download	Federated testbeds enable collaborative research by providing access to diverse resources, including computing power, storage, and specialized hardware like GPUs, programmable switches and smart Netwo...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
On Measuring Available Capacity in High-speed Cloud Networks	Ganapathy Raman Madanagopal, Christofer Flinta, Andreas Johnsson, Farnaz Moradi, Daniel Turull	2025-01-31	Download	Measurement of available path capacity with high accuracy over high-speed links deployed in cloud and transport networks is vital for performance assessment and traffic engineering.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques	Nathaniel Tomczak, Sanmukh Kuppannagari	2025-01-31	Download	Transformers have demonstrated great success in numerous domains including natural language processing and bioinformatics. This success stems from the use of the attention mechanism by these models in...