Juhyeon Kim, Sanghack Lee
Work in Progress, 2025
Juhyeon Kim, Chanhui Lee (LG AI Institute), Sanghack Lee
Work in Progress, 2025
Juhyeon Kim, Sanghack Lee
Work in Progress, 2025
Juhyeon Kim, Seongwook Son, Jiyong Ahn, Sanghack Lee
Work in Progress, 2024
Juhyeon Kim, Chanhui Lee (LG AI Institute), Sanghack Lee
Neural Information Processing Systems (NeurIPS) Workshop on Causality and Large Models, 2024 (Spotlight)
Pre-trained Language Models (PLMs) can reason about causality by leveraging vast pre-trained knowledge together with text descriptions of datasets, and they remain effective even when data is scarce. However, current PLM-based causal reasoning methods have crucial limitations: i) a PLM cannot take a large dataset in its prompt due to context-length limits, and ii) these methods are not adept at comprehending the causal structure as an interconnected whole. Data-driven causal discovery, on the other hand, recovers the causal structure as a whole, but it works well only when the number of observations is sufficiently large. To overcome the limitations of each approach, we propose a new framework that integrates PLM-based causal reasoning into data-driven causal discovery, resulting in improved and more robust performance. Furthermore, our framework extends to time-series data and exhibits superior performance.
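As a rough illustration of the kind of PLM-derived prior this line of work builds on, the sketch below queries a language model about each ordered pair of variables and aggregates repeated yes/no answers into an edge-confidence matrix that a downstream causal discovery algorithm can consume. It is a minimal sketch, not the paper's implementation: `query_plm`, the prompt wording, and the voting scheme are illustrative placeholders.

```python
# Minimal, hypothetical sketch (not the paper's code): aggregate pairwise PLM
# causal judgments into a d x d prior matrix for a downstream discovery method.
from itertools import permutations
import numpy as np

def build_plm_prior(var_descriptions, query_plm, n_votes=5):
    """query_plm: any callable taking a prompt string and returning the model's text."""
    d = len(var_descriptions)
    prior = np.zeros((d, d))
    for i, j in permutations(range(d), 2):
        prompt = (f"Does changing '{var_descriptions[i]}' directly cause a change "
                  f"in '{var_descriptions[j]}'? Answer yes or no.")
        votes = [query_plm(prompt).strip().lower().startswith("yes")
                 for _ in range(n_votes)]        # repeat queries to smooth sampling noise
        prior[i, j] = sum(votes) / n_votes       # edge confidence in [0, 1]
    return prior

if __name__ == "__main__":
    # Toy stand-in for a real PLM call: it only "believes" altitude -> temperature.
    def toy_plm(prompt):
        cause_part = prompt.split("directly cause")[0]
        return "yes" if "altitude" in cause_part and "temperature" in prompt else "no"

    print(build_plm_prior(["altitude", "temperature", "rainfall"], toy_plm, n_votes=1))
```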
Juhyeon Kim, Chanhui Lee (LG AI Institute), Sanghack Lee
arXiv preprint, 2023
Scaling laws have brought Pre-trained Language Models (PLMs) into the field of causal reasoning. Causal reasoning with PLMs relies solely on text-based descriptions, in contrast to causal discovery, which aims to determine causal relationships between variables from data. Recent work has proposed a method that mimics causal discovery by aggregating the outcomes of repeated causal reasoning elicited through specifically designed prompts. This highlights the usefulness of PLMs for discovering cause and effect, a task often limited by a lack of data, especially when many variables are involved. However, PLMs do not analyze data and are highly dependent on prompt design, which is a crucial limitation for using them directly in causal discovery. Accordingly, PLM-based causal reasoning carries the risk of overconfidence and false predictions when determining causal relationships. In this paper, we first empirically demonstrate these limitations of PLM-based causal reasoning through experiments on physics-inspired synthetic data. We then propose a new framework that integrates prior knowledge obtained from a PLM with a causal discovery algorithm, by initializing the adjacency matrix for causal discovery with the prior knowledge and adding a regularization term based on it. Our framework not only demonstrates improved performance through the integration of PLMs and causal discovery, but also suggests how to leverage PLM-extracted prior knowledge with existing causal discovery algorithms.
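The mechanism named in the abstract, initializing the adjacency matrix from PLM-derived prior knowledge and regularizing toward it, can be sketched roughly as follows. This is a simplified illustration under a linear SEM with the acyclicity constraint omitted, not the paper's actual algorithm; the function name, loss weights, and toy data are assumptions.

```python
# Hypothetical sketch: score-based discovery of a weighted adjacency matrix that is
# initialized from a PLM-derived prior and regularized to stay close to that prior.
# Simplifications: linear SEM, plain gradient descent, no acyclicity constraint.
import numpy as np

def plm_regularized_discovery(X, prior, lam_prior=0.1, lam_sparse=0.05,
                              lr=0.01, n_iters=2000):
    """X: (n, d) data matrix; prior: (d, d) PLM edge-confidence matrix in [0, 1]."""
    n, d = X.shape
    A = prior.astype(float).copy()            # initialize adjacency from the PLM prior
    np.fill_diagonal(A, 0.0)
    for _ in range(n_iters):
        resid = X - X @ A                      # reconstruction residual of the linear SEM
        grad = -X.T @ resid / n                # gradient of (1/2n) * ||X - XA||_F^2
        grad += lam_prior * (A - prior)        # pull A toward the PLM prior
        grad += lam_sparse * np.sign(A)        # subgradient of an L1 sparsity penalty
        A -= lr * grad
        np.fill_diagonal(A, 0.0)               # forbid self-loops
    return A

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    x0 = rng.normal(size=n)
    x1 = 2.0 * x0 + 0.1 * rng.normal(size=n)   # ground truth: x0 -> x1 -> x2
    x2 = -1.5 * x1 + 0.1 * rng.normal(size=n)
    X = np.column_stack([x0, x1, x2])
    prior = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], float)  # PLM-suggested edges
    print(np.round(plm_regularized_discovery(X, prior), 2))
```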
Juhyeon Kim, Yesong Choi, Sanghack Lee
CASE at EMNLP, 2022 (Spotlight)
Finding causal relations in text is challenging, requiring methods that range from defining event ontologies to developing suitable algorithmic approaches. In this paper, we develop a framework that classifies whether a given sentence contains a causal event. To overcome the small size of the original corpus (the Causal News Corpus) provided by the task organizers, we exploit an external corpus with causal labels. We further employ a data augmentation technique based on Part-of-Speech (POS) tags, motivated by our observation that some parts of speech are more (or less) relevant to causality. Our approach particularly improves recall in detecting causal events in sentences.
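As a rough illustration of POS-guided augmentation (a sketch, not the paper's exact recipe), the snippet below tags a sentence and randomly drops tokens whose parts of speech are assumed to be less relevant to causality, such as determiners and prepositions, to generate additional training variants; the tag set and drop probability are assumptions.

```python
# Hypothetical sketch of POS-based augmentation for causal sentence classification:
# drop some tokens whose POS tags are assumed less relevant to causality.
import random
import nltk

# Resource names differ across NLTK versions; unknown names are silently skipped.
for res in ("punkt", "punkt_tab", "averaged_perceptron_tagger",
            "averaged_perceptron_tagger_eng"):
    nltk.download(res, quiet=True)

# Penn Treebank tags treated as "safe to drop" (an assumption, not the paper's list).
DROPPABLE_TAGS = {"DT", "IN", "PDT", "RB", "UH"}

def pos_dropout(sentence: str, drop_prob: float = 0.3, seed: int = 0) -> str:
    """Return an augmented copy of `sentence` with some low-relevance tokens removed."""
    rng = random.Random(seed)
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    kept = [tok for tok, tag in tagged
            if tag not in DROPPABLE_TAGS or rng.random() > drop_prob]
    return " ".join(kept)

if __name__ == "__main__":
    s = "The heavy rain caused severe flooding in the coastal city."
    for i in range(3):                         # three augmented variants
        print(pos_dropout(s, seed=i))
```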