2025-01-21

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Analysis of a Memcapacitor-Based for Neural Network Accelerator Framework | Ankur Singh, Dowon Kim, Byung-Geun Lee | 2025-01-21 | Data-intensive computing tasks, such as training neural networks, are crucial for artificial intelligence applications but often come with high energy demands. |
| Pushing the Limits of BFP on Narrow Precision LLM Inference | Hui Wang, Yuan Cheng, Xiaomeng Han, Zhengpeng Zhao, Dawei Yang, Zhe Jiang | 2025-01-21 | The substantial computational and memory demands of Large Language Models (LLMs) hinder their deployment. Block Floating Point (BFP) has proven effective in accelerating linear operations, a cornersto... |
| Dissecting the NVIDIA Hopper Architecture through Microbenchmarking and Multiple Level Analysis | Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du, Hongyuan Liu, Qiang Wang, Xiaowen Chu | 2025-01-21 | This study presents a comprehensive multi-level analysis of the NVIDIA Hopper GPU architecture, focusing on its performance characteristics and novel features. |
| Accelerating Recommender Model ETL with a Streaming FPGA-GPU Dataflow | Yu Zhu, Wenqi Jiang, Piyumi Jasin Pathiranage, Yongjun He, Gustavo Alonso | 2025-01-21 | The real-time performance of recommender models depends on the continuous integration of massive volumes of new user interaction data into training pipelines. |
| A Fully Pipelined FIFO Based Polynomial Multiplication Hardware Architecture Based On Number Theoretic Transform | Moslem Heidarpur, Mitra Mirhassani, Norman Chang | 2025-01-21 | This paper presents digital hardware for computing polynomial multiplication using Number Theoretic Transform (NTT), specifically designed for implementation on Field Programmable Gate Arrays (FPGAs). |
| Supervised Learning for Analog and RF Circuit Design: Benchmarks and Comparative Insights | Asal Mehradfar, Xuzhe Zhao, Yue Niu, Sara Babakniya, Mahdi Alesheikh, Hamidreza Aghasi, Salman Avestimehr | 2025-01-21 | Automating analog and radio-frequency (RF) circuit design using machine learning (ML) significantly reduces the time and effort required for parameter optimization. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| An Empirical Characterization of Outages and Incidents in Public Services for Large Language Models | Xiaoyu Chu, Sacheendra Talluri, Qingxian Lu, Alexandru Iosup | 2025-01-21 | People and businesses increasingly rely on public LLM services, such as ChatGPT, DALLE, and Claude. Understanding their outages, and particularly measuring their failure-recovery processes, is becomin... |
| More for Less: Integrating Capability-Predominant and Capacity-Predominant Computing | Zhong Zheng, Michael E. Papka, Zhiling Lan | 2025-01-21 | Capability jobs (e.g., large, long-running tasks) and capacity jobs (e.g., small, short-running tasks) are two common types of workloads in high-performance computing (HPC). |
| CYCle: Choosing Your Collaborators Wisely to Enhance Collaborative Fairness in Decentralized Learning | Nurbek Tastan, Samuel Horvath, Karthik Nandakumar | 2025-01-21 | Collaborative learning (CL) enables multiple participants to jointly train machine learning (ML) models on decentralized data sources without raw data sharing. |
| Empower Healthcare through a Self-Sovereign Identity Infrastructure for Secure Electronic Health Data Access | Antonio López Martínez, Montassar Naghmouchi, Maryline Laurent, Joaquin Garcia-Alfaro, Manuel Gil Pérez, Antonio Ruiz Martínez, Pantaleone Nespoli | 2025-01-21 | Health data is one of the most sensitive data for people, which attracts the attention of malicious activities. We propose an open-source health data management framework, that follows a patient-centr... |
| Quantifying Energy and Cost Benefits of Hybrid Edge Cloud: Analysis of Traditional and Agentic Workloads | Siavash Alamouti | 2025-01-21 | This paper examines the workload distribution challenges in centralized cloud systems and demonstrates how Hybrid Edge Cloud (HEC) [1] mitigates these inefficiencies. |
| AdaServe: Accelerating Multi-SLO LLM Serving with SLO-Customized Speculative Decoding | Zikun Li, Zhuofu Chen, Remi Delacourt, Gabriele Oliaro, Zeyu Wang, Qinghan Chen, Shuhuai Lin, April Yang, Zhihao Zhang, Zhuoming Chen, Sean Lai, Xinhao Cheng, Xupeng Miao, Zhihao Jia | 2025-01-21 | Modern large language model (LLM) applications exhibit diverse service-level objectives (SLOs), from low-latency requirements in interactive coding assistants to more relaxed constraints in data wrang... |
| BotDetect: A Decentralized Federated Learning Framework for Detecting Financial Bots on the EVM Blockchains | Ahmed Mounsf Rafik Bendada, Abdelaziz Amara Korba, Mouhamed Amine Bouchiha, Yacine Ghamri-Doudane | 2025-01-21 | The rapid growth of decentralized finance (DeFi) has led to the widespread use of automated agents, or bots, within blockchain ecosystems like Ethereum, Binance Smart Chain, and Solana. |
| Dissecting the NVIDIA Hopper Architecture through Microbenchmarking and Multiple Level Analysis | Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du, Hongyuan Liu, Qiang Wang, Xiaowen Chu | 2025-01-21 | This study presents a comprehensive multi-level analysis of the NVIDIA Hopper GPU architecture, focusing on its performance characteristics and novel features. |
| O(1)-Round MPC Algorithms for Multi-dimensional Grid Graph Connectivity, EMST and DBSCAN | Junhao Gan, Anthony Wirth, Zhuo Zhang | 2025-01-21 | In this paper, we investigate three fundamental problems in the Massively Parallel Computation (MPC) model: (i) grid graph connectivity, (ii) approximate Euclidean Minimum Spanning Tree (EMST), and (i... |
| Accelerating Recommender Model ETL with a Streaming FPGA-GPU Dataflow | Yu Zhu, Wenqi Jiang, Piyumi Jasin Pathiranage, Yongjun He, Gustavo Alonso | 2025-01-21 | The real-time performance of recommender models depends on the continuous integration of massive volumes of new user interaction data into training pipelines. |
| Binary integer programming for optimizing ebit cost in distributed quantum circuits with fixed module allocation | Hyunho Cha, Jungwoo Lee | 2025-01-21 | Modular and networked quantum architectures can scale beyond the qubit count of a single device, but executing a circuit across modules requires implementing non-local two-qubit gates using shared ent... |
| SPID-Chain: A Smart Contract-Enabled, Polar-Coded Interoperable DAG Chain | Amirhossein Taherpour, Xiaodong Wang | 2025-01-21 | As the digital landscape evolves, Web3 has gained prominence, highlighting the critical role of decentralized, interconnected, and verifiable digital ecosystems. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Sequence Spreading-Based Semantic Communication Under High RF Interference | Hazem Barka, Georges Kaddoum, Mehdi Bennis, Md Sahabul Alam, Minh Au | 2025-01-21 | In the evolving landscape of wireless communications, semantic communication (SemCom) has recently emerged as a 6G enabler that prioritizes the transmission of meaning and contextual relevance over co... |
| Light commodity devices for building vehicular ad hoc networks: An experimental study | Jamal Toutouh, Enrique Alba | 2025-01-21 | Vehicular communication networks represent both an opportunity and a challenge for providing smart mobility services by using a hybrid solution that relies on cellular connectivity and short range com... |
| QoS-Aware Radio Access Technology (RAT) Selection in Hybrid Vehicular Networks | Zeeshan Hameed Mir, Jamal Toutouh, Fethi Filali, Enrique Alba | 2025-01-21 | The increasing number of wireless communication technologies and standards bring immense opportunities and challenges to provide seamless connectivity in Hybrid Vehicular Networks (HVNs). |
| A Stochastic Geometry Based Techno-Economic Analysis of RIS-Assisted Cellular Networks | Guodong Sun, Francois Baccelli, Luis Uzeda Garcia, Stefano Paris | 2025-01-21 | Reconfigurable intelligent surfaces (RISs) are a promising technology for enhancing cellular network performance and yielding additional value to network operators. |
| Harnessing Generative Pre-Trained Transformer for Datacenter Packet Trace Generation | Chen Griner | 2025-01-21 | Today, the rapid growth of applications reliant on datacenters calls for new advancements to meet the increasing traffic and computational demands. |
| Power Amplifier-Aware Transmit Power Optimization for OFDM and SC-FDMA Systems | Pawel Kryszkiewicz | 2025-01-21 | The Single Carrier-Frequency Division Multiple Access (SC-FDMA) is a transmission technique used in the uplink of Long Term Evolution (LTE) and 5G systems, as it is characterized by reduced transmitte... |
| Message Replication for Improving Reliability of LR-FHSS Direct-to-Satellite IoT | Sonu Rathi, Siddhartha S. Borkotoky | 2025-01-21 | Long-range frequency-hopping spread spectrum (LR-FHSS) promises to enhance network capacity by integrating frequency hopping into existing Long Range Wide Area Networks (LoRaWANs). |
| RayLoc: Wireless Indoor Localization via Fully Differentiable Ray-tracing | Xueqiang Han, Tianyue Zheng, Tony Xiao Han, Jun Luo | 2025-01-21 | Wireless indoor localization has been a pivotal area of research over the last two decades, becoming a cornerstone for numerous sensing applications. |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| An Empirical Characterization of Outages and Incidents in Public Services for Large Language Models | Xiaoyu Chu, Sacheendra Talluri, Qingxian Lu, Alexandru Iosup | 2025-01-21 | People and businesses increasingly rely on public LLM services, such as ChatGPT, DALLE, and Claude. Understanding their outages, and particularly measuring their failure-recovery processes, is becomin... |
| Dissecting the NVIDIA Hopper Architecture through Microbenchmarking and Multiple Level Analysis | Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du, Hongyuan Liu, Qiang Wang, Xiaowen Chu | 2025-01-21 | This study presents a comprehensive multi-level analysis of the NVIDIA Hopper GPU architecture, focusing on its performance characteristics and novel features. |
| A Stochastic Geometry Based Techno-Economic Analysis of RIS-Assisted Cellular Networks | Guodong Sun, Francois Baccelli, Luis Uzeda Garcia, Stefano Paris | 2025-01-21 | Reconfigurable intelligent surfaces (RISs) are a promising technology for enhancing cellular network performance and yielding additional value to network operators. |