2025-06-17

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Scaling Intelligence: Designing Data Centers for Next-Gen Language Models | Jesmin Jahan Tithi, Hanjiang Wu, Avishaii Abuhatzera, Fabrizio Petrini | 2025-06-17 | Download | The explosive growth of Large Language Models (LLMs), such as GPT-4 with 1.8 trillion parameters, demands a fundamental rethinking of data center architecture to ensure scalability, efficiency, and co... |
| ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors | Jongin Choi, Jina Park, Woojoo Lee, Jae-Jin Lee, Massoud Pedram | 2025-06-17 | Download | Multi-channel keyword spotting (KWS) has become crucial for voice-based applications in edge environments. However, its substantial computational and energy requirements pose significant challenges. |
| Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees | Ahmed Heakl, Sarim Hashmi, Chaimaa Abi, Celine Lee, Abdulrahman Mahmoud | 2025-06-17 | Download | The hardware ecosystem is rapidly evolving, with increasing interest in translating low-level programs across different instruction set architectures (ISAs) in a quick, flexible, and correct way to en... |
| Empirically-Calibrated H100 Node Power Models for Reducing Uncertainty in AI Training Energy Estimation | Alex C. Newkirk, Jared Fernandez, Jonathan Koomey, Imran Latif, Emma Strubell, Arman Shehabi, Constantine Samaras | 2025-06-17 | Download | As AI's energy demand continues to grow, it is critical to enhance the understanding of characteristics of this demand, to improve grid infrastructure planning and environmental assessment. |
| Tensor Manipulation Unit (TMU): Reconfigurable, Near-Memory Tensor Manipulation for High-Throughput AI SoC | Weiyu Zhou, Zheng Wang, Chao Chen, Yike Li, Yongkui Yang, Zhuoyu Wu, Anupam Chattopadhyay | 2025-06-17 | Download | While recent advances in AI SoC design have focused heavily on accelerating tensor computation, the equally critical task of tensor manipulation, centered on high-volume data movement with minimal com... |
| Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification | Nathaniel Pinckney, Chenhui Deng, Chia-Tung Ho, Yun-Da Tsai, Mingjie Liu, Wenfei Zhou, Brucek Khailany, Haoxing Ren | 2025-06-17 | Download | We present the Comprehensive Verilog Design Problems (CVDP) benchmark, a new dataset and infrastructure to advance LLM and agent research in hardware design and verification. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Scaling Intelligence: Designing Data Centers for Next-Gen Language Models | Jesmin Jahan Tithi, Hanjiang Wu, Avishaii Abuhatzera, Fabrizio Petrini | 2025-06-17 | Download | The explosive growth of Large Language Models (LLMs), such as GPT-4 with 1.8 trillion parameters, demands a fundamental rethinking of data center architecture to ensure scalability, efficiency, and co... |
| Zarr-Based Chunk-Level Cumulative Sums in Reduced Dimensions | Hailiang Zhang, Dieu My T. Nguyen, Christine Smit, Mahabal Hegde | 2025-06-17 | Download | Data analysis on massive multi-dimensional data, such as high-resolution large-region time averaging or area averaging for geospatial data, often involves calculations over a significant number of dat... |
| Utility-Driven Speculative Decoding for Mixture-of-Experts | Anish Saxena, Po-An Tsai, Hritvik Taneja, Aamer Jaleel, Moinuddin Qureshi | 2025-06-17 | Download | GPU memory bandwidth is the main bottleneck for low-latency Large Language Model (LLM) inference. Speculative decoding leverages idle GPU compute by using a lightweight drafter to propose K tokens, wh... |
| Event-Driven Online Vertical Federated Learning | Ganyu Wang, Boyu Wang, Bin Gu, Charles Ling | 2025-06-17 | Download | Online learning is more adaptable to real-world scenarios in Vertical Federated Learning (VFL) compared to offline learning. However, integrating online learning into VFL presents challenges due to th... |
| Scalable GPU Performance Variability Analysis framework | Ankur Lahiry, Ayush Pokharel, Seth Ockerman, Amal Gueroudji, Line Pouchard, Tanzima Z. Islam | 2025-06-17 | Download | Analyzing large-scale performance logs from GPU profilers often requires terabytes of memory and hours of runtime, even for basic summaries. These constraints prevent timely insight and hinder the int... |
| Resource Optimization with MPI Process Malleability for Dynamic Workloads in HPC Clusters | Sergio Iserte, Iker Martín-Álvarez, Krzysztof Rojek, José I. Aliaga, Maribel Castillo, Weronika Folwarska, Antonio J. Peña | 2025-06-17 | Download | Dynamic resource management is essential for optimizing computational efficiency in modern high-performance computing (HPC) environments, particularly as systems scale. |
| SETI@home: Data Acquisition and Front-End Processing | Eric J. Korpela, David P. Anderson, Jeff Cobb, Matt Lebofsky, Wei Liu, Dan Werthimer | 2025-06-17 | Download | SETI@home is a radio Search for Extraterrestrial Intelligence (SETI) project, looking for technosignatures in data recorded at multiple observatories from 1998 to 2020. |
| ClusterRCA: An End-to-End Approach for Network Fault Localization and Classification for HPC System | Yongqian Sun, Xijie Pan, Xiao Xiong, Lei Tao, Jiaju Wang, Shenglin Zhang, Yuan Yuan, Yuqi Li, Kunlin Jian | 2025-06-17 | Download | Network failure diagnosis is challenging yet critical for high-performance computing (HPC) systems. Existing methods cannot be directly applied to HPC scenarios due to data heterogeneity and lack of a... |
| Keigo: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy (Extended Version) | Rúben Adão, Zhongjie Wu, Changjun Zhou, Oana Balmau, João Paulo, Ricardo Macedo | 2025-06-17 | Download | We present Keigo, a concurrency- and workload-aware storage middleware that enhances the performance of log-structured merge key-value stores (LSM KVS) when they are deployed on a hierarchy of storage... |
| Concepts for designing modern C++ interfaces for MPI | C. Nicole Avans, Alfredo A. Correa, Sayan Ghosh, Matthias Schimek, Joseph Schuchart, Anthony Skjellum, Evan D. Suggs, Tim Niklas Uhl | 2025-06-17 | Download | Since the C++ bindings were deleted in 2008, the Message Passing Interface (MPI) community has revived efforts in building high-level modern C++ interfaces. |
| Consensus Power Inequality: A Comparative Study of Blockchain Networks | Kamil Tylinski, Abylay Satybaldy, Paolo Tasca | 2025-06-17 | Download | The distribution of consensus power is a cornerstone of decentralisation, influencing the security, resilience, and fairness of blockchain networks while ensuring equitable impact among participants. |
| Convergence-Privacy-Fairness Trade-Off in Personalized Federated Learning | Xiyu Zhao, Qimei Cui, Weicai Li, Wei Ni, Ekram Hossain, Quan Z. Sheng, Xiaofeng Tao, Ping Zhang | 2025-06-17 | Download | Personalized federated learning (PFL), e.g., the renowned Ditto, strikes a balance between personalization and generalization by conducting federated learning (FL) to guide personalized learning (PL). |
| A Novel Indicator for Quantifying and Minimizing Information Utility Loss of Robot Teams | Xiyu Zhao, Qimei Cui, Wei Ni, Quan Z. Sheng, Abbas Jamalipour, Guoshun Nan, Xiaofeng Tao, Ping Zhang | 2025-06-17 | Download | The timely exchange of information among robots within a team is vital, but it can be constrained by limited wireless capacity. The inability to deliver information promptly can result in estimation e... |
| The Redundancy of Full Nodes in Bitcoin: A Network-Theoretic Demonstration of Miner-Centric Propagation Topologies | Dr Craig S Wright | 2025-06-17 | Download | This paper formally examines the network structure of Bitcoin CORE (BTC) and Bitcoin Satoshi Vision (BSV) using complex graph theory to demonstrate that home-hosted full nodes are incapable of partici... |
| Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents | Qizheng Zhang, Michael Wornow, Gerry Wan, Kunle Olukotun | 2025-06-17 | Download | LLM-based agent applications have shown increasingly remarkable capabilities in complex workflows but incur substantial costs and latency due to extensive planning and reasoning requirements. |
| Efficient Serving of LLM Applications with Probabilistic Demand Modeling | Yifei Liu, Zuo Gan, Zhenghao Gan, Weiye Wang, Chen Chen, Yizhou Shan, Xusheng Chen, Zhenhua Han, Yifei Zhu, Shixuan Sun, Minyi Guo | 2025-06-17 | Download | Applications based on Large Language Models (LLMs) contain a series of tasks to address real-world problems with boosted capability, which have dynamic demand volumes on diverse backends. |
| Déjà Vu: Efficient Video-Language Query Engine with Learning-based Inter-Frame Computation Reuse | Jinwoo Hwang, Daeun Kim, Sangyeop Lee, Yoonsung Kim, Guseul Heo, Hojoon Kim, Yunseok Jeong, Tadiwos Meaza, Eunhyeok Park, Jeongseob Ahn, Jongse Park | 2025-06-17 | Download | Recently, Video-Language Models (VideoLMs) have demonstrated remarkable capabilities, offering significant potential for flexible and powerful video query systems. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| GCN-Driven Reinforcement Learning for Probabilistic Real-Time Guarantees in Industrial URLLC | Eman Alqudah, Ashfaq Khokhar | 2025-06-17 | Download | Ensuring packet-level communication quality is vital for ultra-reliable, low-latency communications (URLLC) in large-scale industrial wireless networks. |
| CNN-Enabled Scheduling for Probabilistic Real-Time Guarantees in Industrial URLLC | Eman Alqudah, Ashfaq Khokhar | 2025-06-17 | Download | Ensuring packet-level communication quality is vital for ultra-reliable, low-latency communications (URLLC) in large-scale industrial wireless networks. |
| Determinação Automática de Limiar de Detecção de Ataques em Redes de Computadores Utilizando Autoencoders | Luan Gonçalves Miranda, Pedro Ivo da Cruz, Murilo Bellezoni Loiola | 2025-06-17 | Download | Currently, digital security mechanisms like Anomaly Detection Systems using Autoencoders (AE) show great potential for bypassing problems intrinsic to the data, such as data imbalance. |
| Vulnerability Disclosure or Notification? Best Practices for Reaching Stakeholders at Scale | Ting-Han Chen, Jeroen van der Ham-de Vos | 2025-06-17 | Download | Security researchers are interested in security vulnerabilities, but these security vulnerabilities create risks for stakeholders. Coordinated Vulnerability Disclosure has been an accepted best practi... |
| A Novel Dynamic Bandwidth Allocation Design for 100G Coherent Passive Optical Network | Rujia Zou, Haipeng Zhang, Karthik Sundaresan, Zhensheng Jia, Suresh Subramaniam | 2025-06-17 | Download | With the rapid advancements in coherent Passive Optical Network (PON) technologies featuring 100G and higher data rates, this paper addresses the urgent requirement for sophisticated simulation and MA... |
| Optimizing System Latency for Blockchain-Encrypted Edge Computing in Internet of Vehicles | Cui Zhang, Maoxin Ji, Qiong Wu, Pingyi Fan, Qiang Fan | 2025-06-17 | Download | As Internet of Vehicles (IoV) technology continues to advance, edge computing has become an important tool for assisting vehicles in handling complex tasks. |
| TraGe: A Generic Packet Representation for Traffic Classification Based on Header-Payload Differences | Chungang Lin, Yilong Jiang, Weiyao Zhang, Xuying Meng, Tianyu Zuo, Yujun Zhang | 2025-06-17 | Download | Traffic classification has a significant impact on maintaining the Quality of Service (QoS) of the network. Since traditional methods heavily rely on feature extraction and large scale labeled data, s... |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Scaling Intelligence: Designing Data Centers for Next-Gen Language Models | Jesmin Jahan Tithi, Hanjiang Wu, Avishaii Abuhatzera, Fabrizio Petrini | 2025-06-17 | Download | The explosive growth of Large Language Models (LLMs), such as GPT-4 with 1.8 trillion parameters, demands a fundamental rethinking of data center architecture to ensure scalability, efficiency, and co... |
| Determinação Automática de Limiar de Detecção de Ataques em Redes de Computadores Utilizando Autoencoders | Luan Gonçalves Miranda, Pedro Ivo da Cruz, Murilo Bellezoni Loiola | 2025-06-17 | Download | Currently, digital security mechanisms like Anomaly Detection Systems using Autoencoders (AE) show great potential for bypassing problems intrinsic to the data, such as data imbalance. |
| Scalable GPU Performance Variability Analysis framework | Ankur Lahiry, Ayush Pokharel, Seth Ockerman, Amal Gueroudji, Line Pouchard, Tanzima Z. Islam | 2025-06-17 | Download | Analyzing large-scale performance logs from GPU profilers often requires terabytes of memory and hours of runtime, even for basic summaries. These constraints prevent timely insight and hinder the int... |
| Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents | Qizheng Zhang, Michael Wornow, Gerry Wan, Kunle Olukotun | 2025-06-17 | Download | LLM-based agent applications have shown increasingly remarkable capabilities in complex workflows but incur substantial costs and latency due to extensive planning and reasoning requirements. |
