2025-11-05

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing	Mohsen Ahmadzadeh, Kaichang Chen, Georges Gielen	2025-11-05	Download	Analog/mixed-signal circuits are key for interfacing electronics with the physical world. Their design, however, remains a largely handcrafted process, resulting in long and error-prone design cycles.
ML-PCM : Machine Learning Technique for Write Optimization in Phase Change Memory (PCM)	Mahek Desai, Rowena Quinn, Marjan Asadinia	2025-11-05	Download	As transistor-based memory technologies like dynamic random access memory (DRAM) approach their scalability limits, the need to explore alternative storage solutions becomes increasingly urgent.
SMART-WRITE: Adaptive Learning-based Write Energy Optimization for Phase Change Memory	Mahek Desai, Rowena Quinn, Marjan Asadinia	2025-11-05	Download	As dynamic random access memory (DRAM) and other current transistor-based memories approach their scalability limits, the search for alternative storage methods becomes increasingly urgent.
LoRA-Edge: Tensor-Train-Assisted LoRA for Practical CNN Fine-Tuning on Edge Devices	Hyunseok Kwak, Kyeongwon Lee, Jae-Jin Lee, Woojoo Lee	2025-11-05	Download	On-device fine-tuning of CNNs is essential to withstand domain shift in edge applications such as Human Activity Recognition (HAR), yet full fine-tuning is infeasible under strict memory, compute, and...
Design and Optimization of Mixed-Kernel Mixed-Signal SVMs for Flexible Electronics	Florentia Afentaki, Maha Shatta, Konstantinos Balaskas, Georgios Panagopoulos, Georgios Zervakis, Mehdi B. Tahoori	2025-11-05	Download	Flexible Electronics (FE) have emerged as a promising alternative to silicon-based technologies, offering on-demand low-cost fabrication, conformality, and sustainability.
LaMoS: Enabling Efficient Large Number Modular Multiplication through SRAM-based CiM Acceleration	Haomin Li, Fangxin Liu, Chenyang Guan, Zongwu Wang, Li Jiang, Haibing Guan	2025-11-05	Download	Barrett's algorithm is one of the most widely used methods for performing modular multiplication, a critical nonlinear operation in modern privacy computing techniques such as homomorphic encryption (...
Delay Time Characterization on FPGA: A Low Nonlinearity, Picosecond Resolution Time-to-Digital Converter on 16-nm FPGA using Bin Sequence Calibration	Sunwoo Park, Byungkwon Park, Eunsung Kim, Jiwon Yune, Seungho Han, Seunggo Nam	2025-11-05	Download	We present a Time-to-Digital Converter (TDC) implemented on a 16 nm Xilinx UltraScale Plus FPGA that achieves a resolution of 1.15 ps, RMS precision of 3.
An Event-Driven Spiking Compute-In-Memory Macro based on SOT-MRAM	Deyang Yu, Chenchen Liu, Chuanjie Zhang, Xiao Fang, Weisheng Zhao	2025-11-05	Download	The application of Magnetic Random-Access Memory (MRAM) in computing-in-memory (CIM) has gained significant attention. However, existing designs often suffer from high energy consumption due to their ...
SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators	Jonathan Li, Nasim Farahini, Evgenii Iuliugin, Magnus Vesterlund, Christian Häggström, Guangtao Wang, Shubhangi Upasani, Ayush Sachdeva, Rui Li, Faline Fu, Chen Wu, Ayesha Siddiqua, John Long, Tuowen Zhao, Matheen Musaddiq, Håkan Zeffer, Yun Du, Mingran Wang, Qinghua Li, Bo Li, Urmish Thakker, Raghu Prabhakar	2025-11-05	Download	The proliferation of 100B+ parameter Large Language Models (LLMs) with 100k+ context length support has resulted in increasing demands for on-chip memory to support large KV caches.
LogicSparse: Enabling Engine-Free Unstructured Sparsity for Quantised Deep-learning Accelerators	Changhong Li, Biswajit Basu, Shreejith Shanker	2025-11-05	Download	FPGAs have been shown to be a promising platform for deploying Quantised Neural Networks (QNNs) with high-speed, low-latency, and energy-efficient inference.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms	Arijit Bhattacharjee, Ali TehraniJamsaz, Le Chen, Niranjan Hasabnis, Mihai Capota, Nesreen Ahmed, Ali Jannesari	2025-11-05	Download	Recent advances in large language models (LLMs) have significantly accelerated progress in code translation, enabling more accurate and efficient transformation across programming languages.
A General Input-Dependent Colorless Computability Theorem and Applications to Core-Dependent Adversaries	Yannis Coutouly, Emmanuel Godard	2025-11-05	Download	Distributed computing tasks can be presented with a triple (I, O, Δ). The solvability of a colorless task on the Iterated Immediate Snapshot model (IIS) has been characterized by the Colorless Comp...
Stone Duality Proofs for Colorless Distributed Computability Theorems	Cameron Calk, Emmanuel Godard	2025-11-05	Download	We introduce a new topological encoding of executions of round-based, full-information distributed protocols via spectral spaces. Such protocols constitute a model of distributed computations which ar...
Investigating the Impact of Isolation on Synchronized Benchmarks	Nils Japke, Furat Hamdan, Diana Baumann, David Bermbach	2025-11-05	Download	Benchmarking in cloud environments suffers from performance variability from multi-tenant resource contention. Duet benchmarking mitigates this by running two workload versions concurrently on the sam...
AnchorTP: Resilient LLM Inference with State-Preserving Elastic Tensor Parallelism	Wendong Xu, Chujie Chen, He Xiao, Kuan Li, Jing Xiong, Chen Zhang, Wenyong Zhou, Chaofan Tao, Yang Bai, Bei Yu, Ngai Wong	2025-11-05	Download	Large Language Model (LLM) inference services demand exceptionally high availability and low latency, yet multi-GPU Tensor Parallelism (TP) makes them vulnerable to single-GPU failures.
Universal Quantum Simulation of 50 Qubits on Europe's First Exascale Supercomputer Harnessing Its Heterogeneous CPU-GPU Architecture	Hans De Raedt, Jiri Kraus, Andreas Herten, Vrinda Mehta, Mathis Bode, Markus Hrywniak, Kristel Michielsen, Thomas Lippert	2025-11-05	Download	We have developed a new version of the high-performance Jülich universal quantum computer simulator (JUQCS-50) that leverages key features of the GH200 superchips as used in the JUPITER supercomputer,...
UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM	Hai Huang, Xuhong Qiang, Weisheng Zhao, Chenchen Liu	2025-11-05	Download	Large Language Models (LLMs) are increasingly deployed on edge devices with Neural Processing Units (NPUs), yet the decode phase remains memory-intensive, limiting performance.
Characterising Global Platforms: Centralised, Decentralised, Federated, and Grassroots	Ehud Shapiro	2025-11-05	Download	Global digital platforms are software systems designed to serve entire populations, with some already serving billions of people. We propose atomic transactions-based multiagent transition systems and...
SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators	Jonathan Li, Nasim Farahini, Evgenii Iuliugin, Magnus Vesterlund, Christian Häggström, Guangtao Wang, Shubhangi Upasani, Ayush Sachdeva, Rui Li, Faline Fu, Chen Wu, Ayesha Siddiqua, John Long, Tuowen Zhao, Matheen Musaddiq, Håkan Zeffer, Yun Du, Mingran Wang, Qinghua Li, Bo Li, Urmish Thakker, Raghu Prabhakar	2025-11-05	Download	The proliferation of 100B+ parameter Large Language Models (LLMs) with 100k+ context length support has resulted in increasing demands for on-chip memory to support large KV caches.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design-A2A, AP2, ERC-8004, and Beyond	Botao 'Amber' Hu, Helena Rong	2025-11-05	Download	As the "agentic web" takes shape, with billions of AI agents (often LLM-powered) autonomously transacting and collaborating, trust shifts from human oversight to protocol design.
Integrity Under Siege: A Rogue gNodeB's Manipulation of 5G Network Slice Allocation	Jiali Xu, Valeria Loscri, Romain Rouvoy	2025-11-05	Download	The advent of 5G networks, with network slicing as a cornerstone technology, promises customized, high-performance services, but also introduces novel attack surfaces beyond traditional threats.
Joint Optimization of DNN Model Caching and Request Routing in Mobile Edge Computing	Shuting Qiu, Fang Dong, Siyu Tan, Ruiting Zhou, Dian Shen, Patrick P. C. Lee, Qilin Fan	2025-11-05	Download	Mobile edge computing (MEC) can pre-cache deep neural networks (DNNs) near end-users, providing low-latency services and improving users' quality of experience (QoE).
Handover Configurations in Operational 5G Networks: Diversity, Evolution, and Impact on Performance	Moinak Ghoshal, Imran Khan, Phuc Dinh, Z. Jonny Kong, Omar Basit, Sizhe Wang, Yufei Feng, Y. Charlie Hu, Dimitrios Koutsonikolas	2025-11-05	Download	Mobility management in cellular networks, especially the handover (HO) process, plays a key role in providing seamless and ubiquitous Internet access.
CRSF: Enabling QoS-Aware Beyond-Connectivity Service Sharing in 6G Local Networks	Pragya Sharma, Amanda Xiang, Abbas Kiani, John Kaippallimalil, Tony Saboorian, Haining Wang	2025-11-05	Download	Sixth-generation (6G) networks are envisioned to support interconnected local subnetworks that can share specialized, beyond-connectivity services.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms	Arijit Bhattacharjee, Ali TehraniJamsaz, Le Chen, Niranjan Hasabnis, Mihai Capota, Nesreen Ahmed, Ali Jannesari	2025-11-05	Download	Recent advances in large language models (LLMs) have significantly accelerated progress in code translation, enabling more accurate and efficient transformation across programming languages.
One Size Does Not Fit All: Architecture-Aware Adaptive Batch Scheduling with DEBA	François Belias, Naser Ezzati-Jivan, Foutse Khomh	2025-11-05	Download	Adaptive batch size methods aim to accelerate neural network training, but existing approaches apply identical adaptation strategies across all architectures, assuming a one-size-fits-all solution.
PerfDojo: Automated ML Library Generation for Heterogeneous Architectures	Andrei Ivanov, Siyuan Shen, Gioele Gottardo, Marcin Chrapek, Afif Boudaoud, Timo Schneider, Luca Benini, Torsten Hoefler	2025-11-05	Download	The increasing complexity of machine learning models and the proliferation of diverse hardware architectures (CPUs, GPUs, accelerators) make achieving optimal performance a significant challenge.
Exploring Topologies in Quantum Annealing: A Hardware-Aware Perspective	Mario Bifulco, Luca Roversi	2025-11-05	Download	Quantum Annealing (QA) offers a promising framework for solving NP-hard optimization problems, but its effectiveness is constrained by the topology of the underlying quantum hardware.