2025-02-21

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
An SMT Formalization of Mixed-Precision Matrix Multiplication: Modeling Three Generations of Tensor Cores	Benjamin Valpey, Xinyi Li, Sreepathi Pai, Ganesh Gopalakrishnan	2025-02-21	Download	Many recent computational accelerators provide non-standard (e.g., reduced precision) arithmetic operations to enhance performance for floating-point matrix multiplication.
PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System	Yintao He, Haiyu Mao, Christina Giannoula, Mohammad Sadrosadati, Juan Gómez-Luna, Huawei Li, Xiaowei Li, Ying Wang, Onur Mutlu	2025-02-21	Download	Large language models (LLMs) are widely used for natural language understanding and text generation. An LLM model relies on a time-consuming step called LLM decoding to generate output tokens.
Towards Efficient Flash Caches with Emerging NVMe Flexible Data Placement SSDs	Michael Allison, Arun George, Javier Gonzalez, Dan Helmick, Vikash Kumar, Roshan Nair, Vivek Shah	2025-02-21	Download	NVMe Flash-based SSDs are widely deployed in data centers to cache working sets of large-scale web services. As data centers face increasing sustainability demands, such as reduced carbon emissions, e...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models	Avanika Narayan, Dan Biderman, Sabri Eyuboglu, Avner May, Scott Linderman, James Zou, Christopher Re	2025-02-21	Download	We investigate an emerging setup in which a small, on-device language model (LM) with access to local data communicates with a frontier, cloud-hosted LM to solve real-world tasks involving financial, ...
Orthogonal Calibration for Asynchronous Federated Learning	Jiayun Zhang, Shuheng Li, Haiyu Huang, Xiaofan Yu, Rajesh K. Gupta, Jingbo Shang	2025-02-21	Download	Asynchronous federated learning mitigates the inefficiency of conventional synchronous aggregation by integrating updates as they arrive and adjusting their influence based on staleness.
Latency-Aware 2-Opt Monotonic Local Search for Distributed Constraint Optimization	Ben Rachmut, Roie Zivan, William Yeoh	2025-02-21	Download	Researchers recently extended Distributed Constraint Optimization Problems (DCOPs) to Communication-Aware DCOPs so that they are applicable in scenarios in which messages can be arbitrarily delayed.
Computation Offloading Strategies in Integrated Terrestrial and Non-Terrestrial Networks	Muhammad Ahmed Mohsin, Muhammad Umer, Amara Umar, Hatem Abou-Zeid, Syed Ali Hassan	2025-02-21	Download	The rapid growth of computation-intensive applications like augmented reality, autonomous driving, remote healthcare, and smart cities has exposed the limitations of traditional terrestrial networks, ...
Blockchain-based Trust Management in Security Credential Management System for Vehicular Network	SangHyun Byun, Arijet Sarker, Sang-Yoon Chang, Jugal Kalita	2025-02-21	Download	Cellular networking is advancing as a wireless technology to support diverse applications in vehicular communication, enabling vehicles to interact with various applications to enhance the driving exp...
What Every Computer Scientist Needs To Know About Parallelization	Temitayo Adefemi	2025-02-21	Download	Parallelization has become a cornerstone of modern computing, influencing everything from high performance supercomputers to everyday mobile devices.
NPB-Rust: NAS Parallel Benchmarks in Rust	Eduardo M. Martins, Leonardo G. Faé, Renato B. Hoffmann, Lucas S. Bianchessi, Dalvan Griebler	2025-02-21	Download	Parallel programming often requires developers to handle complex computational tasks that can yield many errors in its development cycle. Rust is a performant low-level language that promises memory s...
Hiku: Pull-Based Scheduling for Serverless Computing	Saman Akbari, Manfred Hauswirth	2025-02-21	Download	Serverless computing promises convenient abstractions for developing and deploying functions that execute in response to events. In such Function-as-a-Service (FaaS) platforms, scheduling is an integr...
HydraServe: Minimizing Cold Start Latency for Serverless LLM Serving in Public Clouds	Chiheng Lou, Sheng Qi, Chao Jin, Dapeng Nie, Haoran Yang, Yu Ding, Xuanzhe Liu, Xin Jin	2025-02-21	Download	With the proliferation of large language model (LLM) variants, developers are turning to serverless computing for cost-efficient LLM deployment.
PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System	Yintao He, Haiyu Mao, Christina Giannoula, Mohammad Sadrosadati, Juan Gómez-Luna, Huawei Li, Xiaowei Li, Ying Wang, Onur Mutlu	2025-02-21	Download	Large language models (LLMs) are widely used for natural language understanding and text generation. An LLM model relies on a time-consuming step called LLM decoding to generate output tokens.
Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning	Raghav Singhal, Kaustubh Ponkshe, Rohit Vartak, Lav R. Varshney, Praneeth Vepakomma	2025-02-21	Download	Low-Rank Adaptation (LoRA) has become ubiquitous for efficiently fine-tuning foundation models. However, federated fine-tuning using LoRA is challenging due to suboptimal updates arising from traditio...
OptiLog: Assigning Roles in Byzantine Consensus	Hanish Gogada, Christian Berger, Leander Jehl, Hans P. Reiser, Hein Meling	2025-02-21	Download	Byzantine Fault-Tolerant (BFT) protocols play an important role in blockchains. As the deployment of such systems extends to wide-area networks, the scalability of BFT protocols becomes a critical con...
Sampling in Cloud Benchmarking: A Critical Review and Methodological Guidelines	Saman Akbari, Manfred Hauswirth	2025-02-21	Download	Cloud benchmarks suffer from performance fluctuations caused by resource contention, network latency, hardware heterogeneity, and other factors along with decisions taken in the benchmark design.
Optimal Distributed Replacement Paths	Yi-Jun Chang, Yanyu Chen, Dipan Dey, Gopinath Mishra, Hung Thuan Nguyen, Bryce Sanchez	2025-02-21	Download	We study the replacement paths problem in the $\mathsf{CONGEST}$ model of distributed computing. Given an $s$-$t$ shortest path $P$, the goal is to compute, for every edge $e$ in $P$, the shortest-pat...
Accelerating the Dutch Atmospheric Large-Eddy Simulation (DALES) model with OpenACC	Lucas Esclapez, Laurent Soucasse, Caspar Jungbacker, Fredrik Jansson, Stephan R. de Roode, Pedro Costa, Gijs van den Oord, Alessio Sclocco	2025-02-21	Download	This paper presents the GPU porting through OpenACC directives of the Dutch Atmospheric Large-Eddy Simulation (DALES) application, a high-resolution atmospheric model.
From descriptive to distributed	Jan Grebík, Zoltán Vidnyánszky	2025-02-21	Download	In the past couple of years a rich connection has been found between the fields of descriptive set theory and distributed computing. Frequently, and less surprisingly, finitary algorithms can be adopt...
Adversarially-Robust Gossip Algorithms for Approximate Quantile and Mean Computations	Bernhard Haeupler, Marc Kaufmann, Raghu Raman Ravi, Ulysse Schaller	2025-02-21	Download	This paper presents gossip algorithms for aggregation tasks that demonstrate both robustness to adversarial corruptions of any order of magnitude and optimality across a substantial range of these cor...
FlexPie: Accelerate Distributed Inference on Edge Devices with Flexible Combinatorial Optimization [Technical Report]	Runhua Zhang, Hongxu Jiang, Jinkun Geng, Yuhang Ma, Chenhui Zhu, Haojie Wang	2025-02-21	Download	The rapid advancement of deep learning has catalyzed the development of novel IoT applications, which often deploy pre-trained deep neural network (DNN) models across multiple edge devices for collabo...
Offload Rethinking by Cloud Assistance for Efficient Environmental Sound Recognition on LPWANs	Le Zhang, Quanling Zhao, Run Wang, Shirley Bian, Onat Gungor, Flavio Ponzina, Tajana Rosing	2025-02-21	Download	Learning-based environmental sound recognition has emerged as a crucial method for ultra-low-power environmental monitoring in biological research and city-scale sensing systems.
SPAARC: Spatial Proximity and Association based prefetching for Augmented Reality in edge Cache	Nikhil Sreekumar, Abhishek Chandra, Jon Weissman	2025-02-21	Download	Mobile Augmented Reality (MAR) applications face performance challenges due to their high computational demands and need for low-latency responses.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Space-O-RAN: Enabling Intelligent, Open, and Interoperable Non Terrestrial Networks in 6G	Eduardo Baena, Paolo Testolina, Michele Polese, Dimitrios Koutsonikolas, Josep Jornet, Tommaso Melodia	2025-02-21	Download	Satellite networks are rapidly evolving, yet most non-terrestrial networks (NTNs) remain isolated from terrestrial orchestration frameworks. Their control architectures are typically monolithic and static, limiting their...
InSlicing: Interpretable Learning-Assisted Network Slice Configuration in Open Radio Access Networks	Ming Zhao, Yuru Zhang, Qiang Liu, Ahan Kak, Nakjung Choi	2025-02-21	Download	Network slicing is a key technology enabling the flexibility and efficiency of 5G networks, offering customized services for diverse applications.
Blockchain-based Trust Management in Security Credential Management System for Vehicular Network	SangHyun Byun, Arijet Sarker, Sang-Yoon Chang, Jugal Kalita	2025-02-21	Download	Cellular networking is advancing as a wireless technology to support diverse applications in vehicular communication, enabling vehicles to interact with various applications to enhance the driving exp...
A Comprehensive Survey of Linear, Integer, and Mixed-Integer Programming Approaches for Optimizing Resource Allocation in 5G and Beyond Networks	Naveed Ejaz, Salimur Choudhury	2025-02-21	Download	The introduction of 5G networks has significantly advanced communication technology, offering faster speeds, lower latency, and greater capacity.
Starlink in Northern Europe: A New Look at Stationary and In-motion Performance	Muhammad Asad Ullah, Antti Heikkinen, Mikko Uitto, Marko Höyhtyä, Antti Anttonen, Konstantin Mikhaylov, Timo Lind	2025-02-21	Download	Starlink has introduced the Flat High Performance (FHP) terminal, specifically designed to support the vehicles and the vessels in motion as well as the high-demand stationary users.
Energy Efficient Network Path Reconfiguration for Industrial Field Data	Theofanis P. Raptis, Andrea Passarella, Marco Conti	2025-02-21	Download	Energy efficiency and reliability are vital design requirements of recent industrial networking solutions. Increased energy consumption, poor data access rates and unpredictable end-to-end data access...
Complex Electromagnetic Space Combat System-of-systems Modeling and Key Node Identification Method	Xiao Liu, Sudan Han, Jinlin Peng	2025-02-21	Download	With the application of advanced science and technology in the military field, modern warfare has developed into a confrontation between systems.
Network Resource Optimization for ML-Based UAV Condition Monitoring with Vibration Analysis	Alexandre Gemayel, Dimitrios Michael Manias, Abdallah Shami	2025-02-21	Download	As smart cities begin to materialize, the role of Unmanned Aerial Vehicles (UAVs) and their reliability becomes increasingly important. One aspect of reliability relates to Condition Monitoring (CM), ...
Offload Rethinking by Cloud Assistance for Efficient Environmental Sound Recognition on LPWANs	Le Zhang, Quanling Zhao, Run Wang, Shirley Bian, Onat Gungor, Flavio Ponzina, Tajana Rosing	2025-02-21	Download	Learning-based environmental sound recognition has emerged as a crucial method for ultra-low-power environmental monitoring in biological research and city-scale sensing systems.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Hiku: Pull-Based Scheduling for Serverless Computing	Saman Akbari, Manfred Hauswirth	2025-02-21	Download	Serverless computing promises convenient abstractions for developing and deploying functions that execute in response to events. In such Function-as-a-Service (FaaS) platforms, scheduling is an integr...
Sampling in Cloud Benchmarking: A Critical Review and Methodological Guidelines	Saman Akbari, Manfred Hauswirth	2025-02-21	Download	Cloud benchmarks suffer from performance fluctuations caused by resource contention, network latency, hardware heterogeneity, and other factors along with decisions taken in the benchmark design.
AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms	Feiyang Chen, Yu Cheng, Lei Wang, Yuqing Xia, Ziming Miao, Lingxiao Ma, Fan Yang, Jilong Xue, Zhi Yang, Mao Yang, Haibo Chen	2025-02-21	Download	Transformers and large language models (LLMs) have revolutionized machine learning, with attention mechanisms at the core of their success. As the landscape of attention variants expands, so too do th...
