2025-07-14

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| OpenGCRAM: An Open-Source Gain Cell Compiler Enabling Design-Space Exploration for AI Workloads | Xinxin Wang, Lixian Yan, Shuhan Liu, Luke Upton, Zhuoqi Cai, Yiming Tan, Shengman Li, Koustav Jana, Peijing Li, Jesse Cirimelli-Low, Thierry Tambe, Matthew Guthaus, H. -S. Philip Wong | 2025-07-14 | Download | Gain Cell memory (GCRAM) offers higher density and lower power than SRAM, making it a promising candidate for on-chip memory in domain-specific accelerators. |
| LASANA: Large-Scale Surrogate Modeling for Analog Neuromorphic Architecture Exploration | Jason Ho, James A. Boyle, Linshen Liu, Andreas Gerstlauer | 2025-07-14 | Download | Neuromorphic systems using in-memory or event-driven computing are motivated by the need for more energy-efficient processing of artificial intelligence workloads. |
| Solving the compute crisis with physics-based ASICs | Maxwell Aifer, Zach Belateche, Suraj Bramhavar, Kerem Y. Camsari, Patrick J. Coles, Gavin Crooks, Douglas J. Durian, Andrea J. Liu, Anastasia Marchenkova, Antonio J. Martinez, Peter L. McMahon, Faris Sbahi, Benjamin Weiner, Logan G. Wright | 2025-07-14 | Download | Escalating artificial intelligence (AI) demands expose a critical "compute crisis" characterized by unsustainable energy consumption, prohibitive training costs, and the approaching limits of conventi... |
| AssertCoder: LLM-Based Assertion Generation via Multimodal Specification Extraction | Enyuan Tian, Yiwei Ci, Qiusong Yang, Yufeng Li, Zhichao Lyu | 2025-07-14 | Download | Assertion-Based Verification (ABV) is critical for ensuring functional correctness in modern hardware systems. However, manually writing high-quality SVAs remains labor-intensive and error-prone. |
| Evaluating LLM-based Workflows for Switched-Mode Power Supply Design | Simon Nau, Jan Krummenauer, André Zimmermann | 2025-07-14 | Download | Large language models (LLMs) have great potential to enhance productivity in many disciplines, such as software engineering. However, it is unclear to what extent they can assist in the design process... |
| Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving | Wonung Kim, Yubin Lee, Yoonsung Kim, Jinwoo Hwang, Seongryong Oh, Jiyong Jung, Aziz Huseynov, Woong Gyu Park, Chang Hyun Park, Divya Mahajan, Jongse Park | 2025-07-14 | Download | Transformers are the driving force behind today's Large Language Models (LLMs), serving as the foundation for their performance and versatility. |
| AnalogTester: A Large Language Model-Based Framework for Automatic Testbench Generation in Analog Circuit Design | Weiyu Chen, Chengjie Liu, Wenhao Huang, Jinyang Lyu, Mingqian Yang, Yuan Du, Li Du, Jun Yang | 2025-07-14 | Download | Recent advancements have demonstrated the significant potential of large language models (LLMs) in analog circuit design. Nevertheless, testbench construction for analog circuits remains manual, creat... |
| Iceberg: Enhancing HLS Modeling with Synthetic Data | Zijian Ding, Tung Nguyen, Weikai Li, Aditya Grover, Yizhou Sun, Jason Cong | 2025-07-14 | Download | Deep learning-based prediction models for High-Level Synthesis (HLS) of hardware designs often struggle to generalize. In this paper, we study how to close the generalizability gap of these models thr... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Stream programs are monoid homomorphisms with state | Tyler Hou, Michael Arntzenius, Max Willsey | 2025-07-14 | Download | We define a broad class of deterministic stream functions and show they can be implemented as homomorphisms into a "state" monoid. The homomorphism laws are simpler than the conditions of previous sem... |
| Dissecting the NVIDIA Blackwell Architecture with Microbenchmarks | Aaron Jarmusch, Nathan Graddon, Sunita Chandrasekaran | 2025-07-14 | Download | The rapid development in scientific research provides a need for more compute power, which is partly being solved by GPUs. This paper presents a microarchitectural analysis of the modern NVIDIA Blackw... |
| FAFO: Over 1 million TPS on a single node running EVM while still Merkleizing every block | Ryan Zarick, Isaac Zhang, Daniel Wong, Thomas Kim, Bryan Pellegrino, Mignon Li, Kelvin Wong | 2025-07-14 | Download | Current blockchain execution throughput is limited by data contention, reducing execution layer parallelism. Fast Ahead-of-Formation Optimization (FAFO) is the first blockchain transaction scheduler t... |
| Access Control for Information-Theoretically Secure Key-Document Stores | Yin Li, Sharad Mehrota, Shantanu Sharma, Komal Kumari | 2025-07-14 | Download | This paper presents a novel key-based access control technique for secure outsourcing key-value stores where values correspond to documents that are indexed and accessed using keys. |
| Environmentally-Conscious Cloud Orchestration Considering Geo-Distributed Data Centers | Giulio Attenni, Novella Bartolini | 2025-07-14 | Download | This paper presents a theoretical discussion for environmentally-conscious job deployment and migration in cloud environments, aiming to minimize the environmental impact of resource provisioning whil... |
| Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout | Ji Liu, Beichen Ma, Qiaolin Yu, Ruoming Jin, Jingbo Zhou, Yang Zhou, Huaiyu Dai, Haixun Wang, Dejing Dou, Patrick Valduriez | 2025-07-14 | Download | Federated Learning (FL) is a promising distributed machine learning approach that enables collaborative training of a global model using multiple edge devices. |
| Consensus, Inconsistency, Emergence: what's paraconsistency got to do with it? | Gabriel Rocha | 2025-07-14 | Download | The consensus problem, briefly stated, consists of having processes in an asynchronous distributed system agree on a value. It is widely known that the consensus problem does not have a deterministic ... |
| Zorse: Optimizing LLM Training Efficiency on Heterogeneous GPU Clusters | Runsheng Benson Guo, Utkarsh Anand, Khuzaima Daudjee, Rathijit Sen | 2025-07-14 | Download | Large language models (LLMs) require vast amounts of GPU compute to train, but limited availability and high costs of GPUs make homogeneous clusters impractical for many organizations. |
| FalconFS: Distributed File System for Large-Scale Deep Learning Pipeline | Jingwei Xu, Junbin Kang, Mingkai Dong, Mingyu Liu, Lu Zhang, Shaohong Guo, Ziyan Qiu, Mingzhen You, Ziyi Tian, Anqi Yu, Tianhong Ding, Xinwei Hu, Haibo Chen | 2025-07-14 | Download | Client-side metadata caching has long been considered an effective method for accelerating metadata operations in distributed file systems (DFSs). However, we have found that client-side state (e.g. |
| Convergence of Agnostic Federated Averaging | Herlock, Rahimi, Dionysis Kalogerias | 2025-07-14 | Download | Federated learning (FL) enables decentralized model training without centralizing raw data. However, practical FL deployments often face a key realistic challenge: Clients participate intermittently i... |
| Temporal-Aware GPU Resource Allocation for Distributed LLM Inference via Reinforcement Learning | Chengze Du, Zhiwei Yu, Heng Xu, Haojie Wang, Bo liu, Jialong Li | 2025-07-14 | Download | The rapid growth of large language model (LLM) services imposes increasing demands on distributed GPU inference infrastructure. Most existing scheduling systems follow a reactive paradigm, relying sol... |
| Domain Borders Are There to Be Crossed With Federated Few-Shot Adaptation | Manuel Röder, Christoph Raab, Frank-Michael Schleif | 2025-07-14 | Download | Federated Learning has emerged as a leading paradigm for decentralized, privacy-preserving learning, particularly relevant in the era of interconnected edge devices equipped with sensors. |
| Past-Future Scheduler for LLM Serving under SLA Guarantees | Ruihao Gong, Shihao Bai, Siyu Wu, Yunqian Fan, Zaijun Wang, Xiuhong Li, Hailong Yang, Xianglong Liu | 2025-07-14 | Download | The exploration and application of Large Language Models (LLMs) is thriving. To reduce deployment costs, continuous batching has become an essential feature in current service frameworks. |
| Large-Scale Graph Building in Dynamic Environments: Low Latency and High Quality | Filipe Miguel Gonçalves de Almeida, CJ Carey, Hendrik Fichtenberger, Jonathan Halcrow, Silvio Lattanzi, André Linhares, Tao Meng, Ashkan Norouzi-Fard, Nikos Parotsidis, Bryan Perozzi, David Simcha | 2025-07-14 | Download | Learning and constructing large-scale graphs has attracted attention in recent decades, resulting in a rich literature that introduced various systems, tools, and algorithms. |
| A Model Aware AIGC Task Offloading Algorithm in IIoT Edge Computing | Xin Wang, Xiao Huan Li, Xun Wang | 2025-07-14 | Download | The integration of the Industrial Internet of Things (IIoT) with Artificial Intelligence-Generated Content (AIGC) offers new opportunities for smart manufacturing, but it also introduces challenges re... |
| ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism | Zedong Liu, Shenggan Cheng, Guangming Tan, Yang You, Dingwen Tao | 2025-07-14 | Download | Multimodal large language models (MLLMs) extend LLMs to handle images, videos, and audio by incorporating feature extractors and projection modules. |
| EAT: QoS-Aware Edge-Collaborative AIGC Task Scheduling via Attention-Guided Diffusion Reinforcement Learning | Zhifei Xu, Zhiqing Tang, Jiong Lou, Zhi Yao, Xuan Xie, Tian Wang, Yinglong Wang, Weijia Jia | 2025-07-14 | Download | The growth of Artificial Intelligence (AI) and large language models has enabled the use of Generative AI (GenAI) in cloud data centers for diverse AI-Generated Content (AIGC) tasks. |
| Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference | Jiaming Cheng, Duong Tung Nguyen | 2025-07-14 | Download | This letter investigates the optimal allocation of large language model (LLM) inference workloads across heterogeneous edge data centers (DCs) over time. |
| Intelligent Task Management via Dynamic Multi-region Division in LEO Satellite Networks | Zixuan Song, Zhishu Shen, Xiaoyu Zheng, Qiushi Zheng, Zheng Lei, Jiong Jin | 2025-07-14 | Download | As a key complement to terrestrial networks and a fundamental component of future 6G systems, Low Earth Orbit (LEO) satellite networks are expected to provide high-quality communication services when ... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Fault-Tolerant Architecture for Urban and Rural Digital Connectivity: Synergizing SDWMN, Direct-to-Mobile Broadcasting, and Hybrid Cloud Streaming | Pavel Malinovskiy | 2025-07-14 | Download | We propose an integrated architecture combining Software-Defined Wireless Mesh Networks (SDWMN), Direct-to-Mobile (D2M) broadcasting, and Kafka-based hybrid cloud streaming to improve wireless network... |
| FAFO: Over 1 million TPS on a single node running EVM while still Merkleizing every block | Ryan Zarick, Isaac Zhang, Daniel Wong, Thomas Kim, Bryan Pellegrino, Mignon Li, Kelvin Wong | 2025-07-14 | Download | Current blockchain execution throughput is limited by data contention, reducing execution layer parallelism. Fast Ahead-of-Formation Optimization (FAFO) is the first blockchain transaction scheduler t... |
| Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI | Jiangkai Wu, Zhiyuan Ren, Liming Liu, Xinggong Zhang | 2025-07-14 | Download | AI Video Chat emerges as a new paradigm for Real-time Communication (RTC), where one peer is not a human, but a Multimodal Large Language Model (MLLM). |
| On Splitting Lightweight Semantic Image Segmentation for Wireless Communications | Ebrahim Abu-Helalah, Jordi Serra, Jordi Perez-Romero | 2025-07-14 | Download | Semantic communication represents a promising technique towards reducing communication costs, especially when dealing with image segmentation, but it still lacks a balance between computational effici... |
| DNS Tunneling: Threat Landscape and Improved Detection Solutions | Novruz Amirov, Baran Isik, Bilal Ihsan Tuncer, Serif Bahtiyar | 2025-07-14 | Download | Detecting Domain Name System (DNS) tunneling is a significant challenge in security due to its capacity to hide harmful actions within DNS traffic that appears to be normal and legitimate. |
| Temporal-Aware GPU Resource Allocation for Distributed LLM Inference via Reinforcement Learning | Chengze Du, Zhiwei Yu, Heng Xu, Haojie Wang, Bo liu, Jialong Li | 2025-07-14 | Download | The rapid growth of large language model (LLM) services imposes increasing demands on distributed GPU inference infrastructure. Most existing scheduling systems follow a reactive paradigm, relying sol... |
| Fine-Grained Coordinated OFDMA With Fiber Backhaul Enabled by openwifi and White Rabbit | Thijs Havinga, Xianjun Jiao, Wei Liu, Baiheng Chen, Robbe Gaeremynck, Ingrid Moerman | 2025-07-14 | Download | Proper coordination is needed to guarantee the performance of wireless networks in dense deployments. Contention-based systems suffer badly in terms of latency when multiple devices compete for the sa... |
| Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference | Jiaming Cheng, Duong Tung Nguyen | 2025-07-14 | Download | This letter investigates the optimal allocation of large language model (LLM) inference workloads across heterogeneous edge data centers (DCs) over time. |
| Endorsement-Driven Blockchain SSI Framework for Dynamic IoT Ecosystems | Guntur Dharma Putra, Bagus Rakadyanto Oktavianto Putra | 2025-07-14 | Download | Self-Sovereign Identity (SSI) offers significant potential for managing identities in the Internet of Things (IoT), enabling decentralized authentication and credential management without reliance on ... |
| UavNetSim-v1: A Python-based Simulation Platform for UAV Communication Networks | Zihao Zhou, Zipeng Dai, Linyi Huang, Cui Yang, Youjun Xiang, Jie Tang, Kai-kit Wong | 2025-07-14 | Download | In unmanned aerial vehicle (UAV) networks, communication protocols and algorithms are essential for cooperation and collaboration between UAVs. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| FalconFS: Distributed File System for Large-Scale Deep Learning Pipeline | Jingwei Xu, Junbin Kang, Mingkai Dong, Mingyu Liu, Lu Zhang, Shaohong Guo, Ziyan Qiu, Mingzhen You, Ziyi Tian, Anqi Yu, Tianhong Ding, Xinwei Hu, Haibo Chen | 2025-07-14 | Download | Client-side metadata caching has long been considered an effective method for accelerating metadata operations in distributed file systems (DFSs). However, we have found that client-side state (e.g. |