2025-07-16

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| CRAFT: Latency and Cost-Aware Genetic-Based Framework for Node Placement in Edge-Fog Environments | Soheil Mahdizadeh, Amir Mahdi Rasouli, Mohammad Pourashory, Sadra Galavani, Mohsen Ansari | 2025-07-16 | Download | Reducing latency in the Internet of Things (IoT) is a critical concern. While cloud computing facilitates communication, it falls short of meeting real-time requirements reliably. |
| Characterizing State Space Model and Hybrid Language Model Performance with Long Context | Saptarshi Mitra, Rachid Karami, Haocheng Xu, Sitao Huang, Hyoukjun Kwon | 2025-07-16 | Download | Emerging applications such as AR are driving demands for machine intelligence capable of processing continuous and/or long-context inputs on local devices. |
| High-Performance Pipelined NTT Accelerators with Homogeneous Digit-Serial Modulo Arithmetic | George Alexakis, Dimitrios Schoinianakis, Giorgos Dimitrakopoulos | 2025-07-16 | Download | The Number Theoretic Transform (NTT) is a fundamental operation in privacy-preserving technologies, particularly within fully homomorphic encryption (FHE). |
| Accelerating Transistor-Level Simulation of Integrated Circuits via Equivalence of RC Long-Chain Structures | Ruibai Tang, Wenlai Zhao | 2025-07-16 | Download | Transistor-level simulation plays a vital role in validating the physical correctness of integrated circuits. However, such simulations are computationally expensive. |
| Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization | Prashanth Vijayaraghavan, Apoorva Nitsure, Charles Mackin, Luyao Shi, Stefano Ambrogio, Arvind Haran, Viresh Paruthi, Ali Elzein, Dan Coops, David Beymer, Tyler Baldwin, Ehsan Degan | 2025-07-16 | Download | Large Language Models (LLMs) have become widely used across diverse NLP tasks and domains, demonstrating their adaptability and effectiveness. |
| MOFCO: Mobility- and Migration-Aware Task Offloading in Three-Layer Fog Computing Environments | Soheil Mahdizadeh, Elyas Oustad, Mohsen Ansari | 2025-07-16 | Download | Task offloading in three-layer fog computing environments presents a critical challenge due to user equipment (UE) mobility, which frequently triggers costly service migrations and degrades overall sy... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| BootSeer: Analyzing and Mitigating Initialization Bottlenecks in Large-Scale LLM Training | Rui Li, Xiaoyun Zhi, Jinxin Chi, Menghan Yu, Lixin Huang, Jia Zhu, Weilun Zhang, Xing Ma, Wenjia Liu, Zhicheng Zhu, Daowen Luo, Zuquan Song, Xin Yin, Chao Xiang, Shuguang Wang, Wencong Xiao, Gene Cooperman | 2025-07-16 | Download | Large Language Models (LLMs) have become a cornerstone of modern AI, driving breakthroughs in natural language processing and expanding into multimodal jobs involving images, audio, and video. |
| Rel-HNN: Split Parallel Hypergraph Neural Network for Learning on Relational Databases | Md. Tanvir Alam, Md. Ahasanul Alam, Md Mahmudur Rahman, Md. Mosaddek Khan | 2025-07-16 | Download | Relational databases (RDBs) are ubiquitous in enterprise and real-world applications. Flattening the database poses challenges for deep learning models that rely on fixed-size input representations to... |
| CRAFT: Latency and Cost-Aware Genetic-Based Framework for Node Placement in Edge-Fog Environments | Soheil Mahdizadeh, Amir Mahdi Rasouli, Mohammad Pourashory, Sadra Galavani, Mohsen Ansari | 2025-07-16 | Download | Reducing latency in the Internet of Things (IoT) is a critical concern. While cloud computing facilitates communication, it falls short of meeting real-time requirements reliably. |
| Incentivised Orchestrated Training Architecture (IOTA): A Technical Primer for Release | Felix Quinque, Alan Aboudib, Szymon Fonau, Rodrigo Lopez Portillo Alcocer, Brian McCrindle, Steffen Cruz | 2025-07-16 | Download | In August 2024, Bittensor's Subnet 9 (SN9) demonstrated that a distributed network of incentivized, permissionless actors could each pretrain large language models (LLMs) ranging from 700 million to 1... |
| Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage | Junqing Lin, Jingwei Sun, Mingge Lu, Guangzhong Sun | 2025-07-16 | Download | Sparse Matrix-Vector Multiplication (SpMV) has become a critical performance bottleneck in the local deployment of sparse Large Language Models (LLMs), where inference predominantly operates on worklo... |
| Urban Green Governance: IoT-Driven Management and Enhancement of Urban Green Spaces in Campobasso | Antonio Salis, Gabriele Troina, Gianluca Boanelli, Marco Ottaviano, Paola Fortini, Soraya Versace | 2025-07-16 | Download | The efficient design and management of public green spaces is a key factor in promoting the health and well-being of urban population, as emphasized by the WHO, UNEP, and EEA. |
| Distributed Algorithms for Potential Problems | Alkida Balliu, Thomas Boudier, Francesco d'Amore, Fabian Kuhn, Dennis Olivetti, Gustav Schmid, Jukka Suomela | 2025-07-16 | Download | In this work, we present a fast distributed algorithm for local potential problems: these are graph problems where the task is to find a locally optimal solution where no node can unilaterally improve... |
| ARRC: Explainable, Workflow-Integrated Recommender for Sustainable Resource Optimization Across the Edge-Cloud Continuum | Brian-Frederik Jahnke, René Brinkhege, Jan Peter Meyer, Daniel Tebernum, Falk Howar | 2025-07-16 | Download | Achieving sustainable, explainable, and maintainable automation for resource optimization is a core challenge across the edge-cloud continuum. |
| MOFCO: Mobility- and Migration-Aware Task Offloading in Three-Layer Fog Computing Environments | Soheil Mahdizadeh, Elyas Oustad, Mohsen Ansari | 2025-07-16 | Download | Task offloading in three-layer fog computing environments presents a critical challenge due to user equipment (UE) mobility, which frequently triggers costly service migrations and degrades overall sy... |
| NineToothed: A Triton-Based High-Level Domain-Specific Language for Machine Learning | Jiacheng Huang, Zimin Li, Yinghui Li, Haojie Wang | 2025-07-16 | Download | The emergence of deep learning domain-specific languages (DSLs) has substantially reduced the obstacles in developing high-performance, cross-platform compute kernels. |
| BlockBPE: Parallel BPE Tokenization | Amos You | 2025-07-16 | Download | Tokenization is a critical preprocessing step in large language model pipelines, yet widely-used implementations remain CPU-bound and suboptimal for batch inference workflows on GPU. |
| Making Serverless Computing Extensible: A Case Study of Serverless Data Analytics | Minchen Yu, Yinghao Ren, Jiamu Zhao, Jiaqi Li | 2025-07-16 | Download | Serverless computing has attracted a broad range of applications due to its ease of use and resource elasticity. However, developing serverless applications often poses a dilemma -- relying on general... |
| A Parallel CPU-GPU Framework for Batching Heuristic Operations in Depth-First Heuristic Search | Ehsan Futuhi, Nathan R. Sturtevant | 2025-07-16 | Download | The rapid advancement of GPU technology has unlocked powerful parallel processing capabilities, creating new opportunities to enhance classic search algorithms. |
| Performance Cost Tradeoffs in Intelligent Load Balancing for Multi Data Center Cloud Systems: From Static Policies to Adaptive Resource Distribution | Saeid Aghasoleymani Najafabadi, Elaheh Nabavi Nia | 2025-07-16 | Download | Cloud computing infrastructures increasingly rely on geographically distributed data centers to meet the growing demand for low latency, high availability, and cost-efficient service delivery. |
| Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI | Samyam Rajbhandari, Mert Hidayetoglu, Aurick Qiao, Ye Wang, Juncheng Yang, Jeff Rasley, Michael Wyatt, Yuxiong He | 2025-07-16 | Download | Inference is now the dominant AI workload, yet existing systems force trade-offs between latency, throughput, and cost. Arctic Inference, an open-source vLLM plugin from Snowflake AI Research, introdu... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| CRAFT: Latency and Cost-Aware Genetic-Based Framework for Node Placement in Edge-Fog Environments | Soheil Mahdizadeh, Amir Mahdi Rasouli, Mohammad Pourashory, Sadra Galavani, Mohsen Ansari | 2025-07-16 | Download | Reducing latency in the Internet of Things (IoT) is a critical concern. While cloud computing facilitates communication, it falls short of meeting real-time requirements reliably. |
| LLM-Based Config Synthesis requires Disambiguation | Rajdeep Mondal, Nikolaj Bjorner, Todd Millstein, Alan Tang, George Varghese | 2025-07-16 | Download | Beyond hallucinations, another problem in program synthesis using LLMs is ambiguity in user intent. We illustrate the ambiguity problem in a networking context for LLM-based incremental configuration ... |
| FastReChain: A Novel Bidirectional Model-Based Algorithm for Topology Engineering of OCS-Based Clusters | Zihan Zhu, Xinchi Han, Dongchao Wu, Zhanbang Zhang, Jian Yang, Shizhen Zhao, Xinbing Wang | 2025-07-16 | Download | Optical Circuit Switching (OCS) technology is increasingly being adopted in data centers due to its advantages of low power consumption and low technology refresh costs. |
| MOFCO: Mobility- and Migration-Aware Task Offloading in Three-Layer Fog Computing Environments | Soheil Mahdizadeh, Elyas Oustad, Mohsen Ansari | 2025-07-16 | Download | Task offloading in three-layer fog computing environments presents a critical challenge due to user equipment (UE) mobility, which frequently triggers costly service migrations and degrades overall sy... |
| AI-Native Open RAN for Non-Terrestrial Networks: An Overview | Jikang Deng, S. Fizza Hassan, Hui Zhou, Saad Al-Ahmadi, Mohamed-Slim Alouini, Daniel B. Da Costa | 2025-07-16 | Download | Non-terrestrial network (NTN) is envisioned as a critical component of Sixth Generation (6G) networks by enabling ubiquitous services and enhancing network resilience. |
| Extremal Testing for Network Software using LLMs | Rathin Singha, Harry Qian, Srinath Saikrishnan, Tracy Zhao, Ryan Beckett, Siva Kesava Reddy Kakarla, George Varghese | 2025-07-16 | Download | Physicists often manually consider extreme cases when testing a theory. In this paper, we show how to automate extremal testing of network software using LLMs in two steps: first, ask the LLM to gener... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Rethinking the confidential cloud through a unified low-level abstraction for composable isolation | Adrien Ghosn, Charly Castes, Neelu S. Kalani, Yuchen Qian, Marios Kogias, Edouard Bugnion | 2025-07-16 | Download | Securing sensitive cloud workloads requires composing confidential virtual machines (CVMs) with nested enclaves or sandboxes. Unfortunately, each new isolation boundary adds ad-hoc access control mech... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Accelerating Transistor-Level Simulation of Integrated Circuits via Equivalence of RC Long-Chain Structures | Ruibai Tang, Wenlai Zhao | 2025-07-16 | Download | Transistor-level simulation plays a vital role in validating the physical correctness of integrated circuits. However, such simulations are computationally expensive. |
| BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search | Azhar Ikhtiarudin, Aditi Das, Param Thakkar, Akash Kundu | 2025-07-16 | Download | We present BenchRL-QAS, a unified benchmarking framework for reinforcement learning (RL) in quantum architecture search (QAS) across a spectrum of variational quantum algorithm tasks on 2- to 8-qubit ... |
| Kevin: Multi-Turn RL for Generating CUDA Kernels | Carlo Baronio, Pietro Marsella, Ben Pan, Simon Guo, Silas Alberti | 2025-07-16 | Download | Writing GPU kernels is a challenging task and critical for AI systems' efficiency. It is also highly iterative: domain experts write code and improve performance through execution feedback. |
