
2025-01-31

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Theoretical complexity analysis of many-cores on a single chip | Ran Ginosar | 2025-01-31 | Download | When a single core is scaled up to m cores occupying the same chip area and executing the same (parallelizable) task, achievable speedup is square-root m, power is reduced by square-root m and energy ... |
| Efficient Read-Port-Count Reduction Schemes for the Centralized Physical Register File in a Superscalar Microprocessor | Denis Los | 2025-01-31 | Download | The physical register file supports increasing the execution width and depth of a superscalar microprocessor to exploit more instruction-level parallelism. |
| An All-digital 8.6-nJ/Frame 65-nm Tsetlin Machine Image Classification Accelerator | Svein Anders Tunheim, Yujin Zheng, Lei Jiao, Rishad Shafik, Alex Yakovlev, Ole-Christoffer Granmo | 2025-01-31 | Download | We present an all-digital programmable machine learning accelerator chip for image classification, based on Tsetlin machine (TM) principles. |
| A Tensor-Train Decomposition based Compression of LLMs on Group Vector Systolic Accelerator | Sixiao Huang, Tintin Wang, Ang Li, Ao Shen, Kai Li, Keyao Jiang, Mingqiang Huang, Hao Yu | 2025-01-31 | Download | Large language models (LLMs) are both storage-intensive and computation-intensive, posing significant challenges when deployed on resource-constrained hardware. |
| StruM: Structured Mixed Precision for Efficient Deep Learning Hardware Codesign | Michael Wu, Arnab Raha, Deepak A. Mathaikutty, Martin Langhammer, Engin Tunali, Daksha Sharma | 2025-01-31 | Download | In this paper, we propose StruM, a novel structured mixed-precision-based deep learning inference method, co-designed with its associated hardware accelerator (DPU), to address the escalating computat... |
| Latch Based Design for Fast Voltage Droop Response | Shreyas Srinivas, Ian W Jones, Goran Panic, Christoph Lenzen | 2025-01-31 | Download | We present a latch-based and PLL-free design of the voltage droop correction circuit of Lenzen, Fuegger, Kinali, and Wiederhake. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| The Free Termination Property of Queries Over Time | Conor Power, Paraschos Koutris, Joseph M Hellerstein | 2025-01-31 | Download | Building on prior work on distributed databases and the CALM Theorem, we define and study the question of free termination: in the absence of distributed coordination, what query properties allow node... |
| BICompFL: Stochastic Federated Learning with Bi-Directional Compression | Maximilian Egger, Rawad Bitar, Antonia Wachter-Zeh, Nir Weinberger, Deniz Gündüz | 2025-01-31 | Download | We address the prominent communication bottleneck in federated learning (FL). We specifically consider stochastic FL, in which models or compressed model updates are specified by distributions rather ... |
| Byzantine-Resilient Zero-Order Optimization for Communication-Efficient Heterogeneous Federated Learning | Maximilian Egger, Mayank Bakshi, Rawad Bitar | 2025-01-31 | Download | We introduce CyBeR-0, a Byzantine-resilient federated zero-order optimization method that is robust under Byzantine attacks and provides significant savings in uplink and downlink communication costs. |
| Asynchronous Fault-Tolerant Language Decidability for Runtime Verification of Distributed Systems | Armando Castañeda, Gilde Valeria Rodríguez | 2025-01-31 | Download | Implementing correct distributed systems is an error-prone task. Runtime Verification (RV) offers a lightweight formal method to improve reliability by monitoring system executions against correctness... |
| Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques | Nathaniel Tomczak, Sanmukh Kuppannagari | 2025-01-31 | Download | Transformers have demonstrated great success in numerous domains including natural language processing and bioinformatics. This success stems from the use of the attention mechanism by these models in... |
| JustAct+: A Framework for Auditable Multi-Agent Systems Regulated by Inter-Organisational Policies | Christopher A. Esterhuyse, Tim Müller, L. Thomas van Binsbergen | 2025-01-31 | Download | In open multi-agent systems that cross organisational boundaries, agent actions must be regulated by complex policies. Consider medical data processing systems, which must observe generic laws (... |
| S-VOTE: Similarity-based Voting for Client Selection in Decentralized Federated Learning | Pedro Miguel Sánchez Sánchez, Enrique Tomás Martínez Beltrán, Chao Feng, Gérôme Bovet, Gregorio Martínez Pérez, Alberto Huertas Celdrán | 2025-01-31 | Download | Decentralized Federated Learning (DFL) enables collaborative, privacy-preserving model training without relying on a central server. This decentralized approach reduces bottlenecks and eliminates sing... |
| FL-APU: A Software Architecture to Ease Practical Implementation of Cross-Silo Federated Learning | F. Stricker, J. A. Peregrina, D. Bermbach, C. Zirpins | 2025-01-31 | Download | Federated Learning (FL) is an upcoming technology that is increasingly applied in real-world applications. Early applications focused on cross-device scenarios, where many participants with limited re... |
| A Bias-Correction Decentralized Stochastic Gradient Algorithm with Momentum Acceleration | Yuchen Hu, Xi Chen, Weidong Liu, Xiaojun Mao | 2025-01-31 | Download | Distributed stochastic optimization algorithms can simultaneously process large-scale datasets, significantly accelerating model training. However, their effectiveness is often hindered by the sparsit... |
| CPU vs. GPU for Community Detection: Performance Insights from GVE-Louvain and ν-Louvain | Subhajit Sahu | 2025-01-31 | Download | Community detection involves identifying natural divisions in networks, a crucial task for many large-scale applications. This report presents GVE-Louvain, one of the most efficient multicore implemen... |
| BSODiag: A Global Diagnosis Framework for Batch Servers Outage in Large-scale Cloud Infrastructure Systems | Tao Duan, Runqing Chen, Pinghui Wang, Junzhou Zhao, Jiongzhou Liu, Shujie Han, Yi Liu, Fan Xu | 2025-01-31 | Download | Cloud infrastructure is the collective term for all physical devices within cloud systems. Failures within the cloud infrastructure system can severely compromise the stability and availability of clo... |
| Continuous-Time Analysis of Federated Averaging | Tom Overman, Diego Klabjan | 2025-01-31 | Download | Federated averaging (FedAvg) is a popular algorithm for horizontal federated learning (FL), where samples are gathered across different clients and are not shared with each other or a central server. |
| Infer-EDGE: Dynamic DNN Inference Optimization in 'Just-in-time' Edge-AI Implementations | Motahare Mounesan, Xiaojie Zhang, Saptarshi Debroy | 2025-01-31 | Download | Balancing mutually diverging performance metrics, such as end-to-end latency, accuracy, and device energy consumption, is a challenging undertaking for deep neural network (DNN) inference in Just-in-T... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Open RAN Slicing with Quantum Optimization | Patatchona Keyela, Soumaya Cherkaoui | 2025-01-31 | Download | RAN slicing technology is a key aspect of the Open RAN paradigm, allowing simultaneous and independent provision of various services such as ultra-reliable low-latency communications (URLLC), enhanced... |
| Characterizing User Behavior: The Interplay Between Mobility Patterns and Mobile Traffic | Anne Josiane Kouam, Aline Carneiro Viana, Mariano G. Beiró, Leo Ferres, Luca Pappalardo | 2025-01-31 | Download | Mobile devices have become essential for capturing human activity, and eXtended Data Records (XDRs) offer rich opportunities for detailed user behavior modeling, which is useful for designing personal... |
| Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes | Zhiyao Xu, Dan Zhao, Qingsong Zou, Jingyu Xiao, Yong Jiang, Zhenhui Yuan, Qing Li | 2025-01-31 | Download | In recent years, as smart home systems have become more widespread, security concerns within these environments have become a growing threat. Currently, most smart home security solutions, such as ano... |
| APEX: Automated Parameter Exploration for Low-Power Wireless Protocols | Mohamed Hassaan M. Hydher, Markus Schuss, Olga Saukh, Kay Römer, Carlo Alberto Boano | 2025-01-31 | Download | Careful parametrization of networking protocols is crucial to maximize the performance of low-power wireless systems and ensure that stringent application requirements can be met. |
| On Measuring Available Capacity in High-speed Cloud Networks | Ganapathy Raman Madanagopal, Christofer Flinta, Andreas Johnsson, Farnaz Moradi, Daniel Turull | 2025-01-31 | Download | Measurement of available path capacity with high accuracy over high-speed links deployed in cloud and transport networks is vital for performance assessment and traffic engineering. |
| Reliability Modeling for Beyond-5G Mission Critical Networks Using Effective Capacity | Anudeep Karnam, Jobish John, Kishor C. Joshi, George Exarchakos, Sonia Heemstra de Groot, Ignas Niemegeers | 2025-01-31 | Download | Accurate reliability modeling for ultra-reliable low latency communication (URLLC) and hyper-reliable low latency communication (HRLLC) networks is challenging due to the complex interactions between ... |
| Swift: Rethinking RDMA Control Plane for Elastic Computing | Junxue Zhang, Han Tian, Xinyang Huang, Wenxue Li, Kaiqiang Xu, Dian Shen, Yong Wang, Kai Chen | 2025-01-31 | Download | Elastic computing enables dynamic scaling to meet workload demands, and Remote Direct Memory Access (RDMA) enhances this by providing high-throughput, low-latency network communication. |
| Sharing GPUs and Programmable Switches in a Federated Testbed with SHARY | Stefano Salsano, Andrea Mayer, Paolo Lungaroni, Pierpaolo Loreti, Lorenzo Bracciale, Andrea Detti, Marco Orazi, Paolo Giaccone, Fulvio Risso, Alessandro Cornacchia, Carla Fabiana Chiasserini | 2025-01-31 | Download | Federated testbeds enable collaborative research by providing access to diverse resources, including computing power, storage, and specialized hardware like GPUs, programmable switches and smart Netwo... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| On Measuring Available Capacity in High-speed Cloud Networks | Ganapathy Raman Madanagopal, Christofer Flinta, Andreas Johnsson, Farnaz Moradi, Daniel Turull | 2025-01-31 | Download | Measurement of available path capacity with high accuracy over high-speed links deployed in cloud and transport networks is vital for performance assessment and traffic engineering. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques | Nathaniel Tomczak, Sanmukh Kuppannagari | 2025-01-31 | Download | Transformers have demonstrated great success in numerous domains including natural language processing and bioinformatics. This success stems from the use of the attention mechanism by these models in... |
