2025-02-25

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs | Ruokai Yin, Yuhang Li, Priyadarshini Panda | 2025-02-25 | Download | Weight-only quantization has been widely explored in large language models (LLMs) to reduce memory storage and data loading overhead. During deployment on single-instruction-multiple-threads (SIMT) ar... |
| Kitsune: Enabling Dataflow Execution on GPUs | Michael Davies, Neal Crago, Karthikeyan Sankaralingam, Stephen W. Keckler | 2025-02-25 | Download | State of art DL models are growing in size and complexity, with many modern models also increasing in heterogeneity of behavior. GPUs are still the dominant platform for DL applications, relying on a ... |
| AxMED: Formal Analysis and Automated Design of Approximate Median Filters using BDDs | Vojtech Mrazek, Zdenek Vasicek | 2025-02-25 | Download | The increasing demand for energy-efficient solutions has led to the emergence of an approximate computing paradigm that enables power-efficient implementations in various application areas such as ima... |
| The Art of Beating the Odds with Predictor-Guided Random Design Space Exploration | Felix Arnold, Maxence Bouvier, Ryan Amaudruz, Renzo Andri, Lukas Cavigelli | 2025-02-25 | Download | This work introduces an innovative method for improving combinational digital circuits through random exploration in MIG-based synthesis. High-quality circuits are crucial for performance, power, and ... |
| Marco: Configurable Graph-Based Task Solving and Multi-AI Agents Framework for Hardware Design | Chia-Tung Ho, Jing Gong, Yunsheng Bai, Chenhui Deng, Haoxing Ren, Brucek Khailany | 2025-02-25 | Download | Hardware design presents numerous challenges stemming from its complexity and advancing technologies. These challenges result in longer turn-around-time (TAT) for optimizing performance, power, area, ... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Differentially Private Federated Learning With Time-Adaptive Privacy Spending | Shahrzad Kiani, Nupur Kulkarni, Adam Dziedzic, Stark Draper, Franziska Boenisch | 2025-02-25 | Download | Federated learning (FL) with differential privacy (DP) provides a framework for collaborative machine learning, enabling clients to train a shared model while adhering to strict privacy constraints. |
| Characterizing Production GPU Workloads using System-wide Telemetry Data | Onur Cankur, Brian Austin, Dhruva Kulkarni, Abhinav Bhatele | 2025-02-25 | Download | GPGPU-accelerated clusters and supercomputers are central to modern high-performance computing (HPC). Over the past decade, these systems continue to expand, and GPUs now expose a wide range of hardwa... |
| Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale | Jerome Ku, Eric Nguyen, David W. Romero, Garyk Brixi, Brandon Yang, Anton Vorontsov, Ali Taghibakhshi, Amy X. Lu, Dave P. Burke, Greg Brockman, Stefano Massaroli, Christopher Ré, Patrick D. Hsu, Brian L. Hie, Stefano Ermon, Michael Poli | 2025-02-25 | Download | We introduce convolutional multi-hybrid architectures, with a design grounded on two simple observations. First, operators in hybrid models can be tailored to token manipulation tasks such as in-conte... |
| Introducing JIRIAF: A Virtual Kubelet Integration for Optimizing HPC Resource Provisioning | Vardan Gyurjyan, Graham Heyes, Christopher Larrieu, David Lawrence, Jeng-Yuan Tsai | 2025-02-25 | Download | The JIRIAF (JLab Integrated Research Infrastructure Across Facilities) framework is designed to streamline resource management and optimize high-performance computing (HPC) workloads across heterogene... |
| ZCCL: Significantly Improving Collective Communication With Error-Bounded Lossy Compression | Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Xiaoyi Lu, Ken Raffenetti, Hui Zhou, Kai Zhao, Khalid Alharthi, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur | 2025-02-25 | Download | With the ever-increasing computing power of supercomputers and the growing scale of scientific applications, the efficiency of MPI collective communication turns out to be a critical bottleneck in lar... |
| Kitsune: Enabling Dataflow Execution on GPUs | Michael Davies, Neal Crago, Karthikeyan Sankaralingam, Stephen W. Keckler | 2025-02-25 | Download | State of art DL models are growing in size and complexity, with many modern models also increasing in heterogeneity of behavior. GPUs are still the dominant platform for DL applications, relying on a ... |
| Causal AI-based Root Cause Identification: Research to Practice at Scale | Saurabh Jha, Ameet Rahane, Laura Shwartz, Marc Palaci-Olgun, Frank Bagehorn, Jesus Rios, Dan Stingaciu, Ragu Kattinakere, Debasish Banerjee | 2025-02-25 | Download | Modern applications are built as large, distributed systems spanning numerous modules, teams, and data centers. Despite robust engineering and recovery strategies, failures and performance issues rema... |
| The Built-In Robustness of Decentralized Federated Averaging to Bad Data | Samuele Sabella, Chiara Boldrini, Lorenzo Valerio, Andrea Passarella, Marco Conti | 2025-02-25 | Download | Decentralized federated learning (DFL) enables devices to collaboratively train models over complex network topologies without relying on a central controller. |
| Armada: Memory-Efficient Distributed Training of Large-Scale Graph Neural Networks | Roger Waleffe, Devesh Sarda, Jason Mohoney, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Theodoros Rekatsinas, Shivaram Venkataraman | 2025-02-25 | Download | We study distributed training of Graph Neural Networks (GNNs) on billion-scale graphs that are partitioned across machines. Efficient training in this setting relies on min-edge-cut partitioning algor... |
| GPUArmor: A Hardware-Software Co-design for Efficient and Scalable Memory Safety on GPUs | Mohamed Tarek Ibn Ziad, Sana Damani, Mark Stephenson, Stephen W. Keckler, Aamer Jaleel | 2025-02-25 | Download | Memory safety errors continue to pose a significant threat to current computing systems, and graphics processing units (GPUs) are no exception. |
| Design and implementation of a distributed security threat detection system integrating federated learning and multimodal LLM | Yuqing Wang, Xiao Yang | 2025-02-25 | Download | Traditional security protection methods struggle to address sophisticated attack vectors in large-scale distributed systems, particularly when balancing detection accuracy with data privacy concerns. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Wireless sensor networks data synchronization using node MCU memory for precision agriculture applications | Kashif Sattar, Muhammad Arslan, Saqib Majeed, Salim Iqbal | 2025-02-25 | Download | Wireless Sensor Networks have risen as a highly promising technology suitable for precision agriculture implementations, enabling efficient monitoring and control of agricultural processes. |
| Equidistant-Sample or Wait-and-Sample to Minimize Age Under Sampling Constraint? | Subhankar Banerjee, Sennur Ulukus | 2025-02-25 | Download | We study a status update system with a source, a sampler, a transmitter, and a monitor. The source governs a stochastic process that the monitor wants to observe in a timely manner. |
| Semantic and Goal-oriented Wireless Network Coverage: The Area of Effectiveness | Mattia Merluzzi, Giuseppe Di Poce, Paolo Di Lorenzo | 2025-02-25 | Download | Assessing wireless coverage is a fundamental task for public network operators and private deployments, whose goal is to guarantee quality of service across the network while minimizing material waste... |
| Semantic Communications Services within Generalist Operated Networks | Quentin Lampin, Louis-Adrien Dufrène, Guillaume Larue | 2025-02-25 | Download | This paper addresses the challenge of integrating semantic communication principles into operated networks, traditionally optimized based on network-centric metrics rather than application-specific ne... |
| BD Currency Detection: A CNN Based Approach with Mobile App Integration | Syed Jubayer Jaman, Md. Zahurul Haque, Md Robiul Islam, Usama Abdun Noor | 2025-02-25 | Download | Currency recognition plays a vital role in banking, commerce, and assistive technology for visually impaired individuals. Traditional methods, such as manual verification and optical scanning, often s... |
| Task-Driven Semantic Quantization and Imitation Learning for Goal-Oriented Communications | Yu-Chieh Chao, Yubei Chen, Weiwei Wang, Achintha Wijesinghe, Suchinthaka Wanninayaka, Songyang Zhang, Zhi Ding | 2025-02-25 | Download | Semantic communication marks a new paradigm shift from bit-wise data transmission to semantic information delivery for the purpose of bandwidth reduction. |
| Routing Dynamics in Distributed Quantum Networks | Mst Shapna Akter, Md. Shazzad Hossain Shaon, Tasmin Karim, Md. Fahim Sultan, Emran Kanaan | 2025-02-25 | Download | Distributed quantum networks are not merely information conduits but intricate systems that embody the principles of quantum mechanics. In our study, we examine the underlying mechanisms of quantum co... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Accelerated Training on Low-Power Edge Devices | Mohamed Aboelenien Ahmed, Kilian Pfeiffer, Heba Khdr, Osama Abboud, Ramin Khalili, Jörg Henkel | 2025-02-25 | Download | Training on edge devices poses several challenges as these devices are generally resource-constrained, especially in terms of power. State-of-the-art techniques at the device level reduce the GPU freq... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference | Jintao Zhang, Chendong Xiang, Haofeng Huang, Jia Wei, Haocheng Xi, Jun Zhu, Jianfei Chen | 2025-02-25 | Download | An efficient attention implementation is essential for large models due to its quadratic time complexity. Fortunately, attention commonly exhibits sparsity, i.e. |
| DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning | Pusheng Xu, Yue Wu, Kai Jin, Xiaolan Chen, Mingguang He, Danli Shi | 2025-02-25 | Download | Purpose: To evaluate the accuracy and reasoning ability of DeepSeek-R1 and three other recently released large language models (LLMs) in bilingual complex ophthalmology cases. |
| Design and implementation of a distributed security threat detection system integrating federated learning and multimodal LLM | Yuqing Wang, Xiao Yang | 2025-02-25 | Download | Traditional security protection methods struggle to address sophisticated attack vectors in large-scale distributed systems, particularly when balancing detection accuracy with data privacy concerns. |
