NLP88

[2025-2] 전연주 - Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning
Paper link: https://arxiv.org/abs/2407.16920
Conference: Neur..
2025. 12. 6.

[2025-2] 박제우 - The Impact of Reasoning Step Length on Large Language Models
Paper link: https://arxiv.org/abs/2401.04925
This paper appeared in the 2024 ACL Findings, and in 2025..
2025. 12. 6.

[2025-2] 최민서 - Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper link: https://arxiv.org/abs/2305.18290
1. Introductio..
2025. 11. 19.

[2025-2] 정유림 - Quantifying Attention Flow in Transformers
Paper overview
Title: Quantifying Attention Flow in Transformers
Year: 2020 (arXiv:2005.00928)
Citations: 1,331 as of 2025.11.08
Background: is an attention visualization an explanation?
Self-attention quantifies how much each token attends to every other token, so attention heatmaps have often been used as if they were explanations. However, as a Transformer gets deeper, information is contextualized and mixed across layers, and residual connections and FFNs let information bypass and accumulate around the attention. As a result, raw attention in higher layers often becomes nearly uniform, making token contributions hard to read..
2025. 11. 8.
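The paper reviewed in the last post addresses this problem with attention rollout (and attention flow), which folds the residual connection into each layer's attention and composes the layers by matrix multiplication. Below is a minimal NumPy sketch of the rollout computation, assuming per-layer attention matrices already averaged over heads; the function name and toy data are illustrative, not taken from the post.

```python
import numpy as np

def attention_rollout(attentions):
    """Attention rollout: mix each layer's (head-averaged) attention with the
    identity to model the residual connection, re-normalize rows, and compose
    layers by matrix multiplication."""
    seq_len = attentions[0].shape[0]
    rollout = np.eye(seq_len)
    for layer_attn in attentions:
        # 0.5 * A + 0.5 * I accounts for the residual stream, then row-normalize
        a = 0.5 * layer_attn + 0.5 * np.eye(seq_len)
        a = a / a.sum(axis=-1, keepdims=True)
        rollout = a @ rollout  # compose with all lower layers
    # rollout[i, j]: estimated contribution of input token j to position i at the top layer
    return rollout

# Toy usage: 4 layers of random row-stochastic attention over 6 tokens
rng = np.random.default_rng(0)
attns = [rng.dirichlet(np.ones(6), size=6) for _ in range(4)]
print(attention_rollout(attns).round(3))
```

Unlike a raw top-layer heatmap, the rolled-out matrix stays row-stochastic while crediting the input tokens that information actually flowed from, which is why it tends to be less uniform and easier to read as a saliency map.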