
2025-02-21

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| An SMT Formalization of Mixed-Precision Matrix Multiplication: Modeling Three Generations of Tensor Cores | Benjamin Valpey, Xinyi Li, Sreepathi Pai, Ganesh Gopalakrishnan | 2025-02-21 | Download | Many recent computational accelerators provide non-standard (e.g., reduced precision) arithmetic operations to enhance performance for floating-point matrix multiplication. |
| PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System | Yintao He, Haiyu Mao, Christina Giannoula, Mohammad Sadrosadati, Juan Gómez-Luna, Huawei Li, Xiaowei Li, Ying Wang, Onur Mutlu | 2025-02-21 | Download | Large language models (LLMs) are widely used for natural language understanding and text generation. An LLM model relies on a time-consuming step called LLM decoding to generate output tokens. |
| Towards Efficient Flash Caches with Emerging NVMe Flexible Data Placement SSDs | Michael Allison, Arun George, Javier Gonzalez, Dan Helmick, Vikash Kumar, Roshan Nair, Vivek Shah | 2025-02-21 | Download | NVMe Flash-based SSDs are widely deployed in data centers to cache working sets of large-scale web services. As data centers face increasing sustainability demands, such as reduced carbon emissions, e... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models | Avanika Narayan, Dan Biderman, Sabri Eyuboglu, Avner May, Scott Linderman, James Zou, Christopher Re | 2025-02-21 | Download | We investigate an emerging setup in which a small, on-device language model (LM) with access to local data communicates with a frontier, cloud-hosted LM to solve real-world tasks involving financial, ... |
| Orthogonal Calibration for Asynchronous Federated Learning | Jiayun Zhang, Shuheng Li, Haiyu Huang, Xiaofan Yu, Rajesh K. Gupta, Jingbo Shang | 2025-02-21 | Download | Asynchronous federated learning mitigates the inefficiency of conventional synchronous aggregation by integrating updates as they arrive and adjusting their influence based on staleness. |
| Latency-Aware 2-Opt Monotonic Local Search for Distributed Constraint Optimization | Ben Rachmut, Roie Zivan, William Yeoh | 2025-02-21 | Download | Researchers recently extended Distributed Constraint Optimization Problems (DCOPs) to Communication-Aware DCOPs so that they are applicable in scenarios in which messages can be arbitrarily delayed. |
| Computation Offloading Strategies in Integrated Terrestrial and Non-Terrestrial Networks | Muhammad Ahmed Mohsin, Muhammad Umer, Amara Umar, Hatem Abou-Zeid, Syed Ali Hassan | 2025-02-21 | Download | The rapid growth of computation-intensive applications like augmented reality, autonomous driving, remote healthcare, and smart cities has exposed the limitations of traditional terrestrial networks, ... |
| Blockchain-based Trust Management in Security Credential Management System for Vehicular Network | SangHyun Byun, Arijet Sarker, Sang-Yoon Chang, Jugal Kalita | 2025-02-21 | Download | Cellular networking is advancing as a wireless technology to support diverse applications in vehicular communication, enabling vehicles to interact with various applications to enhance the driving exp... |
| What Every Computer Scientist Needs To Know About Parallelization | Temitayo Adefemi | 2025-02-21 | Download | Parallelization has become a cornerstone of modern computing, influencing everything from high performance supercomputers to everyday mobile devices. |
| NPB-Rust: NAS Parallel Benchmarks in Rust | Eduardo M. Martins, Leonardo G. Faé, Renato B. Hoffmann, Lucas S. Bianchessi, Dalvan Griebler | 2025-02-21 | Download | Parallel programming often requires developers to handle complex computational tasks that can yield many errors in its development cycle. Rust is a performant low-level language that promises memory s... |
| Hiku: Pull-Based Scheduling for Serverless Computing | Saman Akbari, Manfred Hauswirth | 2025-02-21 | Download | Serverless computing promises convenient abstractions for developing and deploying functions that execute in response to events. In such Function-as-a-Service (FaaS) platforms, scheduling is an integr... |
| HydraServe: Minimizing Cold Start Latency for Serverless LLM Serving in Public Clouds | Chiheng Lou, Sheng Qi, Chao Jin, Dapeng Nie, Haoran Yang, Yu Ding, Xuanzhe Liu, Xin Jin | 2025-02-21 | Download | With the proliferation of large language model (LLM) variants, developers are turning to serverless computing for cost-efficient LLM deployment. |
| PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System | Yintao He, Haiyu Mao, Christina Giannoula, Mohammad Sadrosadati, Juan Gómez-Luna, Huawei Li, Xiaowei Li, Ying Wang, Onur Mutlu | 2025-02-21 | Download | Large language models (LLMs) are widely used for natural language understanding and text generation. An LLM model relies on a time-consuming step called LLM decoding to generate output tokens. |
| Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning | Raghav Singhal, Kaustubh Ponkshe, Rohit Vartak, Lav R. Varshney, Praneeth Vepakomma | 2025-02-21 | Download | Low-Rank Adaptation (LoRA) has become ubiquitous for efficiently fine-tuning foundation models. However, federated fine-tuning using LoRA is challenging due to suboptimal updates arising from traditio... |
| OptiLog: Assigning Roles in Byzantine Consensus | Hanish Gogada, Christian Berger, Leander Jehl, Hans P. Reiser, Hein Meling | 2025-02-21 | Download | Byzantine Fault-Tolerant (BFT) protocols play an important role in blockchains. As the deployment of such systems extends to wide-area networks, the scalability of BFT protocols becomes a critical con... |
| Sampling in Cloud Benchmarking: A Critical Review and Methodological Guidelines | Saman Akbari, Manfred Hauswirth | 2025-02-21 | Download | Cloud benchmarks suffer from performance fluctuations caused by resource contention, network latency, hardware heterogeneity, and other factors along with decisions taken in the benchmark design. |
| Optimal Distributed Replacement Paths | Yi-Jun Chang, Yanyu Chen, Dipan Dey, Gopinath Mishra, Hung Thuan Nguyen, Bryce Sanchez | 2025-02-21 | Download | We study the replacement paths problem in the $\mathsf{CONGEST}$ model of distributed computing. Given an $s$-$t$ shortest path $P$, the goal is to compute, for every edge $e$ in $P$, the shortest-pat... |
| Accelerating the Dutch Atmospheric Large-Eddy Simulation (DALES) model with OpenACC | Lucas Esclapez, Laurent Soucasse, Caspar Jungbacker, Fredrik Jansson, Stephan R. de Roode, Pedro Costa, Gijs van den Oord, Alessio Sclocco | 2025-02-21 | Download | This paper presents the GPU porting through OpenACC directives of the Dutch Atmospheric Large-Eddy Simulation (DALES) application, a high-resolution atmospheric model. |
| From descriptive to distributed | Jan Grebík, Zoltán Vidnyánszky | 2025-02-21 | Download | In the past couple of years a rich connection has been found between the fields of descriptive set theory and distributed computing. Frequently, and less surprisingly, finitary algorithms can be adopt... |
| Adversarially-Robust Gossip Algorithms for Approximate Quantile and Mean Computations | Bernhard Haeupler, Marc Kaufmann, Raghu Raman Ravi, Ulysse Schaller | 2025-02-21 | Download | This paper presents gossip algorithms for aggregation tasks that demonstrate both robustness to adversarial corruptions of any order of magnitude and optimality across a substantial range of these cor... |
| FlexPie: Accelerate Distributed Inference on Edge Devices with Flexible Combinatorial Optimization [Technical Report] | Runhua Zhang, Hongxu Jiang, Jinkun Geng, Yuhang Ma, Chenhui Zhu, Haojie Wang | 2025-02-21 | Download | The rapid advancement of deep learning has catalyzed the development of novel IoT applications, which often deploy pre-trained deep neural network (DNN) models across multiple edge devices for collabo... |
| Offload Rethinking by Cloud Assistance for Efficient Environmental Sound Recognition on LPWANs | Le Zhang, Quanling Zhao, Run Wang, Shirley Bian, Onat Gungor, Flavio Ponzina, Tajana Rosing | 2025-02-21 | Download | Learning-based environmental sound recognition has emerged as a crucial method for ultra-low-power environmental monitoring in biological research and city-scale sensing systems. |
| SPAARC: Spatial Proximity and Association based prefetching for Augmented Reality in edge Cache | Nikhil Sreekumar, Abhishek Chandra, Jon Weissman | 2025-02-21 | Download | Mobile Augmented Reality (MAR) applications face performance challenges due to their high computational demands and need for low-latency responses. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Space-O-RAN: Enabling Intelligent, Open, and Interoperable Non Terrestrial Networks in 6G | Eduardo Baena, Paolo Testolina, Michele Polese, Dimitrios Koutsonikolas, Josep Jornet, Tommaso Melodia | 2025-02-21 | Download | Satellite networks are rapidly evolving, yet most non-terrestrial networks (NTNs) remain isolated from terrestrial orchestration frameworks. Their control architectures are typically monolithic and static, limiting their... |
| InSlicing: Interpretable Learning-Assisted Network Slice Configuration in Open Radio Access Networks | Ming Zhao, Yuru Zhang, Qiang Liu, Ahan Kak, Nakjung Choi | 2025-02-21 | Download | Network slicing is a key technology enabling the flexibility and efficiency of 5G networks, offering customized services for diverse applications. |
| Blockchain-based Trust Management in Security Credential Management System for Vehicular Network | SangHyun Byun, Arijet Sarker, Sang-Yoon Chang, Jugal Kalita | 2025-02-21 | Download | Cellular networking is advancing as a wireless technology to support diverse applications in vehicular communication, enabling vehicles to interact with various applications to enhance the driving exp... |
| A Comprehensive Survey of Linear, Integer, and Mixed-Integer Programming Approaches for Optimizing Resource Allocation in 5G and Beyond Networks | Naveed Ejaz, Salimur Choudhury | 2025-02-21 | Download | The introduction of 5G networks has significantly advanced communication technology, offering faster speeds, lower latency, and greater capacity. |
| Starlink in Northern Europe: A New Look at Stationary and In-motion Performance | Muhammad Asad Ullah, Antti Heikkinen, Mikko Uitto, Marko Höyhtyä, Antti Anttonen, Konstantin Mikhaylov, Timo Lind | 2025-02-21 | Download | Starlink has introduced the Flat High Performance (FHP) terminal, specifically designed to support the vehicles and the vessels in motion as well as the high-demand stationary users. |
| Energy Efficient Network Path Reconfiguration for Industrial Field Data | Theofanis P. Raptis, Andrea Passarella, Marco Conti | 2025-02-21 | Download | Energy efficiency and reliability are vital design requirements of recent industrial networking solutions. Increased energy consumption, poor data access rates and unpredictable end-to-end data access... |
| Complex Electromagnetic Space Combat System-of-systems Modeling and Key Node Identification Method | Xiao Liu, Sudan Han, Jinlin Peng | 2025-02-21 | Download | With the application of advanced science and technology in the military field, modern warfare has developed into a confrontation between systems. |
| Network Resource Optimization for ML-Based UAV Condition Monitoring with Vibration Analysis | Alexandre Gemayel, Dimitrios Michael Manias, Abdallah Shami | 2025-02-21 | Download | As smart cities begin to materialize, the role of Unmanned Aerial Vehicles (UAVs) and their reliability becomes increasingly important. One aspect of reliability relates to Condition Monitoring (CM), ... |
| Offload Rethinking by Cloud Assistance for Efficient Environmental Sound Recognition on LPWANs | Le Zhang, Quanling Zhao, Run Wang, Shirley Bian, Onat Gungor, Flavio Ponzina, Tajana Rosing | 2025-02-21 | Download | Learning-based environmental sound recognition has emerged as a crucial method for ultra-low-power environmental monitoring in biological research and city-scale sensing systems. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Hiku: Pull-Based Scheduling for Serverless Computing | Saman Akbari, Manfred Hauswirth | 2025-02-21 | Download | Serverless computing promises convenient abstractions for developing and deploying functions that execute in response to events. In such Function-as-a-Service (FaaS) platforms, scheduling is an integr... |
| Sampling in Cloud Benchmarking: A Critical Review and Methodological Guidelines | Saman Akbari, Manfred Hauswirth | 2025-02-21 | Download | Cloud benchmarks suffer from performance fluctuations caused by resource contention, network latency, hardware heterogeneity, and other factors along with decisions taken in the benchmark design. |
| AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms | Feiyang Chen, Yu Cheng, Lei Wang, Yuqing Xia, Ziming Miao, Lingxiao Ma, Fan Yang, Jilong Xue, Zhi Yang, Mao Yang, Haibo Chen | 2025-02-21 | Download | Transformers and large language models (LLMs) have revolutionized machine learning, with attention mechanisms at the core of their success. As the landscape of attention variants expands, so too do th... |
