
2025-11-05

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing | Mohsen Ahmadzadeh, Kaichang Chen, Georges Gielen | 2025-11-05 | Download | Analog/mixed-signal circuits are key for interfacing electronics with the physical world. Their design, however, remains a largely handcrafted process, resulting in long and error-prone design cycles. |
| ML-PCM : Machine Learning Technique for Write Optimization in Phase Change Memory (PCM) | Mahek Desai, Rowena Quinn, Marjan Asadinia | 2025-11-05 | Download | As transistor-based memory technologies like dynamic random access memory (DRAM) approach their scalability limits, the need to explore alternative storage solutions becomes increasingly urgent. |
| SMART-WRITE: Adaptive Learning-based Write Energy Optimization for Phase Change Memory | Mahek Desai, Rowena Quinn, Marjan Asadinia | 2025-11-05 | Download | As dynamic random access memory (DRAM) and other current transistor-based memories approach their scalability limits, the search for alternative storage methods becomes increasingly urgent. |
| LoRA-Edge: Tensor-Train-Assisted LoRA for Practical CNN Fine-Tuning on Edge Devices | Hyunseok Kwak, Kyeongwon Lee, Jae-Jin Lee, Woojoo Lee | 2025-11-05 | Download | On-device fine-tuning of CNNs is essential to withstand domain shift in edge applications such as Human Activity Recognition (HAR), yet full fine-tuning is infeasible under strict memory, compute, and... |
| Design and Optimization of Mixed-Kernel Mixed-Signal SVMs for Flexible Electronics | Florentia Afentaki, Maha Shatta, Konstantinos Balaskas, Georgios Panagopoulos, Georgios Zervakis, Mehdi B. Tahoori | 2025-11-05 | Download | Flexible Electronics (FE) have emerged as a promising alternative to silicon-based technologies, offering on-demand low-cost fabrication, conformality, and sustainability. |
| LaMoS: Enabling Efficient Large Number Modular Multiplication through SRAM-based CiM Acceleration | Haomin Li, Fangxin Liu, Chenyang Guan, Zongwu Wang, Li Jiang, Haibing Guan | 2025-11-05 | Download | Barrett's algorithm is one of the most widely used methods for performing modular multiplication, a critical nonlinear operation in modern privacy computing techniques such as homomorphic encryption (... |
| Delay Time Characterization on FPGA: A Low Nonlinearity, Picosecond Resolution Time-to-Digital Converter on 16-nm FPGA using Bin Sequence Calibration | Sunwoo Park, Byungkwon Park, Eunsung Kim, Jiwon Yune, Seungho Han, Seunggo Nam | 2025-11-05 | Download | We present a Time-to-Digital Converter (TDC) implemented on a 16 nm Xilinx UltraScale Plus FPGA that achieves a resolution of 1.15 ps, RMS precision of 3. |
| An Event-Driven Spiking Compute-In-Memory Macro based on SOT-MRAM | Deyang Yu, Chenchen Liu, Chuanjie Zhang, Xiao Fang, Weisheng Zhao | 2025-11-05 | Download | The application of Magnetic Random-Access Memory (MRAM) in computing-in-memory (CIM) has gained significant attention. However, existing designs often suffer from high energy consumption due to their ... |
| SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators | Jonathan Li, Nasim Farahini, Evgenii Iuliugin, Magnus Vesterlund, Christian Häggström, Guangtao Wang, Shubhangi Upasani, Ayush Sachdeva, Rui Li, Faline Fu, Chen Wu, Ayesha Siddiqua, John Long, Tuowen Zhao, Matheen Musaddiq, Håkan Zeffer, Yun Du, Mingran Wang, Qinghua Li, Bo Li, Urmish Thakker, Raghu Prabhakar | 2025-11-05 | Download | The proliferation of 100B+ parameter Large Language Models (LLMs) with 100k+ context length support have resulted in increasing demands for on-chip memory to support large KV caches. |
| LogicSparse: Enabling Engine-Free Unstructured Sparsity for Quantised Deep-learning Accelerators | Changhong Li, Biswajit Basu, Shreejith Shanker | 2025-11-05 | Download | FPGAs have been shown to be a promising platform for deploying Quantised Neural Networks (QNNs) with high-speed, low-latency, and energy-efficient inference. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms | Arijit Bhattacharjee, Ali TehraniJamsaz, Le Chen, Niranjan Hasabnis, Mihai Capota, Nesreen Ahmed, Ali Jannesari | 2025-11-05 | Download | Recent advances in large language models (LLMs) have significantly accelerated progress in code translation, enabling more accurate and efficient transformation across programming languages. |
| A General Input-Dependent Colorless Computability Theorem and Applications to Core-Dependent Adversaries | Yannis Coutouly, Emmanuel Godard | 2025-11-05 | Download | Distributed computing tasks can be presented with a triple (I, O, Δ). The solvability of a colorless task on the Iterated Immediate Snapshot model (IIS) has been characterized by the Colorless Comp... |
| Stone Duality Proofs for Colorless Distributed Computability Theorems | Cameron Calk, Emmanuel Godard | 2025-11-05 | Download | We introduce a new topological encoding of executions of round-based, full-information distributed protocols via spectral spaces. Such protocols constitute a model of distributed computations which ar... |
| Investigating the Impact of Isolation on Synchronized Benchmarks | Nils Japke, Furat Hamdan, Diana Baumann, David Bermbach | 2025-11-05 | Download | Benchmarking in cloud environments suffers from performance variability from multi-tenant resource contention. Duet benchmarking mitigates this by running two workload versions concurrently on the sam... |
| AnchorTP: Resilient LLM Inference with State-Preserving Elastic Tensor Parallelism | Wendong Xu, Chujie Chen, He Xiao, Kuan Li, Jing Xiong, Chen Zhang, Wenyong Zhou, Chaofan Tao, Yang Bai, Bei Yu, Ngai Wong | 2025-11-05 | Download | Large Language Model (LLM) inference services demand exceptionally high availability and low latency, yet multi-GPU Tensor Parallelism (TP) makes them vulnerable to single-GPU failures. |
| Universal Quantum Simulation of 50 Qubits on Europe's First Exascale Supercomputer Harnessing Its Heterogeneous CPU-GPU Architecture | Hans De Raedt, Jiri Kraus, Andreas Herten, Vrinda Mehta, Mathis Bode, Markus Hrywniak, Kristel Michielsen, Thomas Lippert | 2025-11-05 | Download | We have developed a new version of the high-performance Jülich universal quantum computer simulator (JUQCS-50) that leverages key features of the GH200 superchips as used in the JUPITER supercomputer,... |
| UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM | Hai Huang, Xuhong Qiang, Weisheng Zhao, Chenchen Liu | 2025-11-05 | Download | Large Language Models (LLMs) are increasingly deployed on edge devices with Neural Processing Units (NPUs), yet the decode phase remains memory-intensive, limiting performance. |
| Characterising Global Platforms: Centralised, Decentralised, Federated, and Grassroots | Ehud Shapiro | 2025-11-05 | Download | Global digital platforms are software systems designed to serve entire populations, with some already serving billions of people. We propose atomic transactions-based multiagent transition systems and... |
| SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators | Jonathan Li, Nasim Farahini, Evgenii Iuliugin, Magnus Vesterlund, Christian Häggström, Guangtao Wang, Shubhangi Upasani, Ayush Sachdeva, Rui Li, Faline Fu, Chen Wu, Ayesha Siddiqua, John Long, Tuowen Zhao, Matheen Musaddiq, Håkan Zeffer, Yun Du, Mingran Wang, Qinghua Li, Bo Li, Urmish Thakker, Raghu Prabhakar | 2025-11-05 | Download | The proliferation of 100B+ parameter Large Language Models (LLMs) with 100k+ context length support have resulted in increasing demands for on-chip memory to support large KV caches. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design-A2A, AP2, ERC-8004, and Beyond | Botao 'Amber' Hu, Helena Rong | 2025-11-05 | Download | As the "agentic web" takes shape-billions of AI agents (often LLM-powered) autonomously transacting and collaborating-trust shifts from human oversight to protocol design. |
| Integrity Under Siege: A Rogue gNodeB's Manipulation of 5G Network Slice Allocation | Jiali Xu, Valeria Loscri, Romain Rouvoy | 2025-11-05 | Download | The advent of 5G networks, with network slicing as a cornerstone technology, promises customized, high-performance services, but also introduces novel attack surfaces beyond traditional threats. |
| Joint Optimization of DNN Model Caching and Request Routing in Mobile Edge Computing | Shuting Qiu, Fang Dong, Siyu Tan, Ruiting Zhou, Dian Shen, Patrick P. C. Lee, Qilin Fan | 2025-11-05 | Download | Mobile edge computing (MEC) can pre-cache deep neural networks (DNNs) near end-users, providing low-latency services and improving users' quality of experience (QoE). |
| Handover Configurations in Operational 5G Networks: Diversity, Evolution, and Impact on Performance | Moinak Ghoshal, Imran Khan, Phuc Dinh, Z. Jonny Kong, Omar Basit, Sizhe Wang, Yufei Feng, Y. Charlie Hu, Dimitrios Koutsonikolas | 2025-11-05 | Download | Mobility management in cellular networks, especially the handover (HO) process, plays a key role in providing seamless and ubiquitous Internet access. |
| CRSF: Enabling QoS-Aware Beyond-Connectivity Service Sharing in 6G Local Networks | Pragya Sharma, Amanda Xiang, Abbas Kiani, John Kaippallimalil, Tony Saboorian, Haining Wang | 2025-11-05 | Download | Sixth-generation (6G) networks are envisioned to support interconnected local subnetworks that can share specialized, beyond-connectivity services. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms | Arijit Bhattacharjee, Ali TehraniJamsaz, Le Chen, Niranjan Hasabnis, Mihai Capota, Nesreen Ahmed, Ali Jannesari | 2025-11-05 | Download | Recent advances in large language models (LLMs) have significantly accelerated progress in code translation, enabling more accurate and efficient transformation across programming languages. |
| One Size Does Not Fit All: Architecture-Aware Adaptive Batch Scheduling with DEBA | François Belias, Naser Ezzati-Jivan, Foutse Khomh | 2025-11-05 | Download | Adaptive batch size methods aim to accelerate neural network training, but existing approaches apply identical adaptation strategies across all architectures, assuming a one-size-fits-all solution. |
| PerfDojo: Automated ML Library Generation for Heterogeneous Architectures | Andrei Ivanov, Siyuan Shen, Gioele Gottardo, Marcin Chrapek, Afif Boudaoud, Timo Schneider, Luca Benini, Torsten Hoefler | 2025-11-05 | Download | The increasing complexity of machine learning models and the proliferation of diverse hardware architectures (CPUs, GPUs, accelerators) make achieving optimal performance a significant challenge. |
| Exploring Topologies in Quantum Annealing: A Hardware-Aware Perspective | Mario Bifulco, Luca Roversi | 2025-11-05 | Download | Quantum Annealing (QA) offers a promising framework for solving NP-hard optimization problems, but its effectiveness is constrained by the topology of the underlying quantum hardware. |
