2025-02-21
cs.AR - Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| An SMT Formalization of Mixed-Precision Matrix Multiplication: Modeling Three Generations of Tensor Cores | Benjamin Valpey, Xinyi Li, Sreepathi Pai, Ganesh Gopalakrishnan | 2025-02-21 | Many recent computational accelerators provide non-standard (e.g., reduced precision) arithmetic operations to enhance performance for floating-point matrix multiplication. |
| PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System | Yintao He, Haiyu Mao, Christina Giannoula, Mohammad Sadrosadati, Juan Gómez-Luna, Huawei Li, Xiaowei Li, Ying Wang, Onur Mutlu | 2025-02-21 | Large language models (LLMs) are widely used for natural language understanding and text generation. An LLM model relies on a time-consuming step called LLM decoding to generate output tokens. |
| Towards Efficient Flash Caches with Emerging NVMe Flexible Data Placement SSDs | Michael Allison, Arun George, Javier Gonzalez, Dan Helmick, Vikash Kumar, Roshan Nair, Vivek Shah | 2025-02-21 | NVMe Flash-based SSDs are widely deployed in data centers to cache working sets of large-scale web services. As data centers face increasing sustainability demands, such as reduced carbon emissions, e... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models | Avanika Narayan, Dan Biderman, Sabri Eyuboglu, Avner May, Scott Linderman, James Zou, Christopher Re | 2025-02-21 | We investigate an emerging setup in which a small, on-device language model (LM) with access to local data communicates with a frontier, cloud-hosted LM to solve real-world tasks involving financial, ... |
| Orthogonal Calibration for Asynchronous Federated Learning | Jiayun Zhang, Shuheng Li, Haiyu Huang, Xiaofan Yu, Rajesh K. Gupta, Jingbo Shang | 2025-02-21 | Asynchronous federated learning mitigates the inefficiency of conventional synchronous aggregation by integrating updates as they arrive and adjusting their influence based on staleness. |
| Latency-Aware 2-Opt Monotonic Local Search for Distributed Constraint Optimization | Ben Rachmut, Roie Zivan, William Yeoh | 2025-02-21 | Researchers recently extended Distributed Constraint Optimization Problems (DCOPs) to Communication-Aware DCOPs so that they are applicable in scenarios in which messages can be arbitrarily delayed. |
| Computation Offloading Strategies in Integrated Terrestrial and Non-Terrestrial Networks | Muhammad Ahmed Mohsin, Muhammad Umer, Amara Umar, Hatem Abou-Zeid, Syed Ali Hassan | 2025-02-21 | The rapid growth of computation-intensive applications like augmented reality, autonomous driving, remote healthcare, and smart cities has exposed the limitations of traditional terrestrial networks, ... |
| Blockchain-based Trust Management in Security Credential Management System for Vehicular Network | SangHyun Byun, Arijet Sarker, Sang-Yoon Chang, Jugal Kalita | 2025-02-21 | Cellular networking is advancing as a wireless technology to support diverse applications in vehicular communication, enabling vehicles to interact with various applications to enhance the driving exp... |
| What Every Computer Scientist Needs To Know About Parallelization | Temitayo Adefemi | 2025-02-21 | Parallelization has become a cornerstone of modern computing, influencing everything from high performance supercomputers to everyday mobile devices. |
| NPB-Rust: NAS Parallel Benchmarks in Rust | Eduardo M. Martins, Leonardo G. Faé, Renato B. Hoffmann, Lucas S. Bianchessi, Dalvan Griebler | 2025-02-21 | Parallel programming often requires developers to handle complex computational tasks that can yield many errors in its development cycle. Rust is a performant low-level language that promises memory s... |
| Hiku: Pull-Based Scheduling for Serverless Computing | Saman Akbari, Manfred Hauswirth | 2025-02-21 | Serverless computing promises convenient abstractions for developing and deploying functions that execute in response to events. In such Function-as-a-Service (FaaS) platforms, scheduling is an integr... |
| HydraServe: Minimizing Cold Start Latency for Serverless LLM Serving in Public Clouds | Chiheng Lou, Sheng Qi, Chao Jin, Dapeng Nie, Haoran Yang, Yu Ding, Xuanzhe Liu, Xin Jin | 2025-02-21 | With the proliferation of large language model (LLM) variants, developers are turning to serverless computing for cost-efficient LLM deployment. |
| PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System | Yintao He, Haiyu Mao, Christina Giannoula, Mohammad Sadrosadati, Juan Gómez-Luna, Huawei Li, Xiaowei Li, Ying Wang, Onur Mutlu | 2025-02-21 | Large language models (LLMs) are widely used for natural language understanding and text generation. An LLM model relies on a time-consuming step called LLM decoding to generate output tokens. |
| Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning | Raghav Singhal, Kaustubh Ponkshe, Rohit Vartak, Lav R. Varshney, Praneeth Vepakomma | 2025-02-21 | Low-Rank Adaptation (LoRA) has become ubiquitous for efficiently fine-tuning foundation models. However, federated fine-tuning using LoRA is challenging due to suboptimal updates arising from traditio... |
| OptiLog: Assigning Roles in Byzantine Consensus | Hanish Gogada, Christian Berger, Leander Jehl, Hans P. Reiser, Hein Meling | 2025-02-21 | Byzantine Fault-Tolerant (BFT) protocols play an important role in blockchains. As the deployment of such systems extends to wide-area networks, the scalability of BFT protocols becomes a critical con... |
| Sampling in Cloud Benchmarking: A Critical Review and Methodological Guidelines | Saman Akbari, Manfred Hauswirth | 2025-02-21 | Cloud benchmarks suffer from performance fluctuations caused by resource contention, network latency, hardware heterogeneity, and other factors along with decisions taken in the benchmark design. |
| Optimal Distributed Replacement Paths | Yi-Jun Chang, Yanyu Chen, Dipan Dey, Gopinath Mishra, Hung Thuan Nguyen, Bryce Sanchez | 2025-02-21 | We study the replacement paths problem in the model of distributed computing. Given an $s$-$t$ shortest path $P$, the goal is to compute, for every edge $e$ in $P$, the shortest-pat... |
| Accelerating the Dutch Atmospheric Large-Eddy Simulation (DALES) model with OpenACC | Lucas Esclapez, Laurent Soucasse, Caspar Jungbacker, Fredrik Jansson, Stephan R. de Roode, Pedro Costa, Gijs van den Oord, Alessio Sclocco | 2025-02-21 | This paper presents the GPU porting through OpenACC directives of the Dutch Atmospheric Large-Eddy Simulation (DALES) application, a high-resolution atmospheric model. |
| From descriptive to distributed | Jan Grebík, Zoltán Vidnyánszky | 2025-02-21 | In the past couple of years a rich connection has been found between the fields of descriptive set theory and distributed computing. Frequently, and less surprisingly, finitary algorithms can be adopt... |
| Adversarially-Robust Gossip Algorithms for Approximate Quantile and Mean Computations | Bernhard Haeupler, Marc Kaufmann, Raghu Raman Ravi, Ulysse Schaller | 2025-02-21 | This paper presents gossip algorithms for aggregation tasks that demonstrate both robustness to adversarial corruptions of any order of magnitude and optimality across a substantial range of these cor... |
| FlexPie: Accelerate Distributed Inference on Edge Devices with Flexible Combinatorial Optimization [Technical Report] | Runhua Zhang, Hongxu Jiang, Jinkun Geng, Yuhang Ma, Chenhui Zhu, Haojie Wang | 2025-02-21 | The rapid advancement of deep learning has catalyzed the development of novel IoT applications, which often deploy pre-trained deep neural network (DNN) models across multiple edge devices for collabo... |
| Offload Rethinking by Cloud Assistance for Efficient Environmental Sound Recognition on LPWANs | Le Zhang, Quanling Zhao, Run Wang, Shirley Bian, Onat Gungor, Flavio Ponzina, Tajana Rosing | 2025-02-21 | Learning-based environmental sound recognition has emerged as a crucial method for ultra-low-power environmental monitoring in biological research and city-scale sensing systems. |
| SPAARC: Spatial Proximity and Association based prefetching for Augmented Reality in edge Cache | Nikhil Sreekumar, Abhishek Chandra, Jon Weissman | 2025-02-21 | Mobile Augmented Reality (MAR) applications face performance challenges due to their high computational demands and need for low-latency responses. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Space-O-RAN: Enabling Intelligent, Open, and Interoperable Non Terrestrial Networks in 6G | Eduardo Baena, Paolo Testolina, Michele Polese, Dimitrios Koutsonikolas, Josep Jornet, Tommaso Melodia | 2025-02-21 | Satellite networks are rapidly evolving, yet most non-terrestrial networks (NTNs) remain isolated from terrestrial orchestration frameworks. Their control architectures are typically monolithic and static, limiting their... |
| InSlicing: Interpretable Learning-Assisted Network Slice Configuration in Open Radio Access Networks | Ming Zhao, Yuru Zhang, Qiang Liu, Ahan Kak, Nakjung Choi | 2025-02-21 | Network slicing is a key technology enabling the flexibility and efficiency of 5G networks, offering customized services for diverse applications. |
| Blockchain-based Trust Management in Security Credential Management System for Vehicular Network | SangHyun Byun, Arijet Sarker, Sang-Yoon Chang, Jugal Kalita | 2025-02-21 | Cellular networking is advancing as a wireless technology to support diverse applications in vehicular communication, enabling vehicles to interact with various applications to enhance the driving exp... |
| A Comprehensive Survey of Linear, Integer, and Mixed-Integer Programming Approaches for Optimizing Resource Allocation in 5G and Beyond Networks | Naveed Ejaz, Salimur Choudhury | 2025-02-21 | The introduction of 5G networks has significantly advanced communication technology, offering faster speeds, lower latency, and greater capacity. |
| Starlink in Northern Europe: A New Look at Stationary and In-motion Performance | Muhammad Asad Ullah, Antti Heikkinen, Mikko Uitto, Marko Höyhtyä, Antti Anttonen, Konstantin Mikhaylov, Timo Lind | 2025-02-21 | Starlink has introduced the Flat High Performance (FHP) terminal, specifically designed to support the vehicles and the vessels in motion as well as the high-demand stationary users. |
| Energy Efficient Network Path Reconfiguration for Industrial Field Data | Theofanis P. Raptis, Andrea Passarella, Marco Conti | 2025-02-21 | Energy efficiency and reliability are vital design requirements of recent industrial networking solutions. Increased energy consumption, poor data access rates and unpredictable end-to-end data access... |
| Complex Electromagnetic Space Combat System-of-systems Modeling and Key Node Identification Method | Xiao Liu, Sudan Han, Jinlin Peng | 2025-02-21 | With the application of advanced science and technology in the military field, modern warfare has developed into a confrontation between systems. |
| Network Resource Optimization for ML-Based UAV Condition Monitoring with Vibration Analysis | Alexandre Gemayel, Dimitrios Michael Manias, Abdallah Shami | 2025-02-21 | As smart cities begin to materialize, the role of Unmanned Aerial Vehicles (UAVs) and their reliability becomes increasingly important. One aspect of reliability relates to Condition Monitoring (CM), ... |
| Offload Rethinking by Cloud Assistance for Efficient Environmental Sound Recognition on LPWANs | Le Zhang, Quanling Zhao, Run Wang, Shirley Bian, Onat Gungor, Flavio Ponzina, Tajana Rosing | 2025-02-21 | Learning-based environmental sound recognition has emerged as a crucial method for ultra-low-power environmental monitoring in biological research and city-scale sensing systems. |
cs.PF - Performance
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Hiku: Pull-Based Scheduling for Serverless Computing | Saman Akbari, Manfred Hauswirth | 2025-02-21 | Serverless computing promises convenient abstractions for developing and deploying functions that execute in response to events. In such Function-as-a-Service (FaaS) platforms, scheduling is an integr... |
| Sampling in Cloud Benchmarking: A Critical Review and Methodological Guidelines | Saman Akbari, Manfred Hauswirth | 2025-02-21 | Cloud benchmarks suffer from performance fluctuations caused by resource contention, network latency, hardware heterogeneity, and other factors along with decisions taken in the benchmark design. |
| AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms | Feiyang Chen, Yu Cheng, Lei Wang, Yuqing Xia, Ziming Miao, Lingxiao Ma, Fan Yang, Jilong Xue, Zhi Yang, Mao Yang, Haibo Chen | 2025-02-21 | Transformers and large language models (LLMs) have revolutionized machine learning, with attention mechanisms at the core of their success. As the landscape of attention variants expands, so too do th... |