2025-02-25
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs | Ruokai Yin, Yuhang Li, Priyadarshini Panda | 2025-02-25 | Download | Weight-only quantization has been widely explored in large language models (LLMs) to reduce memory storage and data loading overhead. During deployment on single-instruction-multiple-threads (SIMT) ar... |
| Kitsune: Enabling Dataflow Execution on GPUs | Michael Davies, Neal Crago, Karthikeyan Sankaralingam, Stephen W. Keckler | 2025-02-25 | Download | State of art DL models are growing in size and complexity, with many modern models also increasing in heterogeneity of behavior. GPUs are still the dominant platform for DL applications, relying on a ... |
| AxMED: Formal Analysis and Automated Design of Approximate Median Filters using BDDs | Vojtech Mrazek, Zdenek Vasicek | 2025-02-25 | Download | The increasing demand for energy-efficient solutions has led to the emergence of an approximate computing paradigm that enables power-efficient implementations in various application areas such as ima... |
| The Art of Beating the Odds with Predictor-Guided Random Design Space Exploration | Felix Arnold, Maxence Bouvier, Ryan Amaudruz, Renzo Andri, Lukas Cavigelli | 2025-02-25 | Download | This work introduces an innovative method for improving combinational digital circuits through random exploration in MIG-based synthesis. High-quality circuits are crucial for performance, power, and ... |
| Marco: Configurable Graph-Based Task Solving and Multi-AI Agents Framework for Hardware Design | Chia-Tung Ho, Jing Gong, Yunsheng Bai, Chenhui Deng, Haoxing Ren, Brucek Khailany | 2025-02-25 | Download | Hardware design presents numerous challenges stemming from its complexity and advancing technologies. These challenges result in longer turn-around-time (TAT) for optimizing performance, power, area, ... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Differentially Private Federated Learning With Time-Adaptive Privacy Spending | Shahrzad Kiani, Nupur Kulkarni, Adam Dziedzic, Stark Draper, Franziska Boenisch | 2025-02-25 | Download | Federated learning (FL) with differential privacy (DP) provides a framework for collaborative machine learning, enabling clients to train a shared model while adhering to strict privacy constraints. |
| Characterizing Production GPU Workloads using System-wide Telemetry Data | Onur Cankur, Brian Austin, Dhruva Kulkarni, Abhinav Bhatele | 2025-02-25 | Download | GPGPU-accelerated clusters and supercomputers are central to modern high-performance computing (HPC). Over the past decade, these systems continue to expand, and GPUs now expose a wide range of hardwa... |
| Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale | Jerome Ku, Eric Nguyen, David W. Romero, Garyk Brixi, Brandon Yang, Anton Vorontsov, Ali Taghibakhshi, Amy X. Lu, Dave P. Burke, Greg Brockman, Stefano Massaroli, Christopher Ré, Patrick D. Hsu, Brian L. Hie, Stefano Ermon, Michael Poli | 2025-02-25 | Download | We introduce convolutional multi-hybrid architectures, with a design grounded on two simple observations. First, operators in hybrid models can be tailored to token manipulation tasks such as in-conte... |
| Introducing JIRIAF: A Virtual Kubelet Integration for Optimizing HPC Resource Provisioning | Vardan Gyurjyan, Graham Heyes, Christopher Larrieu, David Lawrence, Jeng-Yuan Tsai | 2025-02-25 | Download | The JIRIAF (JLab Integrated Research Infrastructure Across Facilities) framework is designed to streamline resource management and optimize high-performance computing (HPC) workloads across heterogene... |
| ZCCL: Significantly Improving Collective Communication With Error-Bounded Lossy Compression | Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Xiaoyi Lu, Ken Raffenetti, Hui Zhou, Kai Zhao, Khalid Alharthi, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur | 2025-02-25 | Download | With the ever-increasing computing power of supercomputers and the growing scale of scientific applications, the efficiency of MPI collective communication turns out to be a critical bottleneck in lar... |
| Kitsune: Enabling Dataflow Execution on GPUs | Michael Davies, Neal Crago, Karthikeyan Sankaralingam, Stephen W. Keckler | 2025-02-25 | Download | State of art DL models are growing in size and complexity, with many modern models also increasing in heterogeneity of behavior. GPUs are still the dominant platform for DL applications, relying on a ... |
| Causal AI-based Root Cause Identification: Research to Practice at Scale | Saurabh Jha, Ameet Rahane, Laura Shwartz, Marc Palaci-Olgun, Frank Bagehorn, Jesus Rios, Dan Stingaciu, Ragu Kattinakere, Debasish Banerjee | 2025-02-25 | Download | Modern applications are built as large, distributed systems spanning numerous modules, teams, and data centers. Despite robust engineering and recovery strategies, failures and performance issues rema... |
| The Built-In Robustness of Decentralized Federated Averaging to Bad Data | Samuele Sabella, Chiara Boldrini, Lorenzo Valerio, Andrea Passarella, Marco Conti | 2025-02-25 | Download | Decentralized federated learning (DFL) enables devices to collaboratively train models over complex network topologies without relying on a central controller. |
| Armada: Memory-Efficient Distributed Training of Large-Scale Graph Neural Networks | Roger Waleffe, Devesh Sarda, Jason Mohoney, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Theodoros Rekatsinas, Shivaram Venkataraman | 2025-02-25 | Download | We study distributed training of Graph Neural Networks (GNNs) on billion-scale graphs that are partitioned across machines. Efficient training in this setting relies on min-edge-cut partitioning algor... |
| GPUArmor: A Hardware-Software Co-design for Efficient and Scalable Memory Safety on GPUs | Mohamed Tarek Ibn Ziad, Sana Damani, Mark Stephenson, Stephen W. Keckler, Aamer Jaleel | 2025-02-25 | Download | Memory safety errors continue to pose a significant threat to current computing systems, and graphics processing units (GPUs) are no exception. |
| Design and implementation of a distributed security threat detection system integrating federated learning and multimodal LLM | Yuqing Wang, Xiao Yang | 2025-02-25 | Download | Traditional security protection methods struggle to address sophisticated attack vectors in large-scale distributed systems, particularly when balancing detection accuracy with data privacy concerns. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Wireless sensor networks data synchronization using node MCU memory for precision agriculture applications | Kashif Sattar, Muhammad Arslan, Saqib Majeed, Salim Iqbal | 2025-02-25 | Download | Wireless Sensor Networks have risen as a highly promising technology suitable for precision agriculture implementations, enabling efficient monitoring and control of agricultural processes. |
| Equidistant-Sample or Wait-and-Sample to Minimize Age Under Sampling Constraint? | Subhankar Banerjee, Sennur Ulukus | 2025-02-25 | Download | We study a status update system with a source, a sampler, a transmitter, and a monitor. The source governs a stochastic process that the monitor wants to observe in a timely manner. |
| Semantic and Goal-oriented Wireless Network Coverage: The Area of Effectiveness | Mattia Merluzzi, Giuseppe Di Poce, Paolo Di Lorenzo | 2025-02-25 | Download | Assessing wireless coverage is a fundamental task for public network operators and private deployments, whose goal is to guarantee quality of service across the network while minimizing material waste... |
| Semantic Communications Services within Generalist Operated Networks | Quentin Lampin, Louis-Adrien Dufrène, Guillaume Larue | 2025-02-25 | Download | This paper addresses the challenge of integrating semantic communication principles into operated networks, traditionally optimized based on network-centric metrics rather than application-specific ne... |
| BD Currency Detection: A CNN Based Approach with Mobile App Integration | Syed Jubayer Jaman, Md. Zahurul Haque, Md Robiul Islam, Usama Abdun Noor | 2025-02-25 | Download | Currency recognition plays a vital role in banking, commerce, and assistive technology for visually impaired individuals. Traditional methods, such as manual verification and optical scanning, often s... |
| Task-Driven Semantic Quantization and Imitation Learning for Goal-Oriented Communications | Yu-Chieh Chao, Yubei Chen, Weiwei Wang, Achintha Wijesinghe, Suchinthaka Wanninayaka, Songyang Zhang, Zhi Ding | 2025-02-25 | Download | Semantic communication marks a new paradigm shift from bit-wise data transmission to semantic information delivery for the purpose of bandwidth reduction. |
| Routing Dynamics in Distributed Quantum Networks | Mst Shapna Akter, Md. Shazzad Hossain Shaon, Tasmin Karim, Md. Fahim Sultan, Emran Kanaan | 2025-02-25 | Download | Distributed quantum networks are not merely information conduits but intricate systems that embody the principles of quantum mechanics. In our study, we examine the underlying mechanisms of quantum co... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Accelerated Training on Low-Power Edge Devices | Mohamed Aboelenien Ahmed, Kilian Pfeiffer, Heba Khdr, Osama Abboud, Ramin Khalili, Jörg Henkel | 2025-02-25 | Download | Training on edge devices poses several challenges as these devices are generally resource-constrained, especially in terms of power. State-of-the-art techniques at the device level reduce the GPU freq... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference | Jintao Zhang, Chendong Xiang, Haofeng Huang, Jia Wei, Haocheng Xi, Jun Zhu, Jianfei Chen | 2025-02-25 | Download | An efficient attention implementation is essential for large models due to its quadratic time complexity. Fortunately, attention commonly exhibits sparsity, i.e. |
| DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning | Pusheng Xu, Yue Wu, Kai Jin, Xiaolan Chen, Mingguang He, Danli Shi | 2025-02-25 | Download | Purpose: To evaluate the accuracy and reasoning ability of DeepSeek-R1 and three other recently released large language models (LLMs) in bilingual complex ophthalmology cases. |
| Design and implementation of a distributed security threat detection system integrating federated learning and multimodal LLM | Yuqing Wang, Xiao Yang | 2025-02-25 | Download | Traditional security protection methods struggle to address sophisticated attack vectors in large-scale distributed systems, particularly when balancing detection accuracy with data privacy concerns. |