Juhyeon Kim, Sanghack Lee
Work in Progress, 2025
Juhyeon Kim, Chanhui Lee (LG AI Institute), Sanghack Lee
Work in Progress, 2025
Juhyeon Kim, Sanghack Lee
Work in Progress, 2025
Juhyeon Kim, Seongwook Son, Jiyong Ahn, Sanghack Lee
Work in Progress, 2024
Juhyeon Kim, Chanhui Lee (LG AI Institute), Sanghack Lee
Neural Information Processing Systems (NeurIPS) Workshop on Causality and Large Models, 2024 (Spotlight)
Pre-trained Language Models (PLMs) can reason about causality by leveraging vast pre-trained knowledge together with text descriptions of datasets, and they remain effective even when data is scarce. However, current PLM-based causal reasoning methods have crucial limitations: i) a PLM cannot take a large dataset in its prompt due to context-length limits, and ii) these methods are not adept at comprehending the causal structure as an interconnected whole. Data-driven causal discovery, on the other hand, recovers the causal structure as a whole, but it works well only when the number of observations is sufficiently large. To overcome the limitations of each approach, we propose a new framework that integrates PLM-based causal reasoning into data-driven causal discovery, resulting in improved and more robust performance. Furthermore, our framework extends to time-series data and exhibits superior performance.
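As a rough illustration of the kind of PLM-derived prior this line of work builds on, the sketch below queries a language model about each ordered pair of variables and aggregates repeated yes/no answers into an edge-confidence matrix that a downstream causal discovery algorithm can consume. It is a minimal sketch, not the paper's implementation: `query_plm`, the prompt wording, and the voting scheme are illustrative placeholders.

```python
# Minimal, hypothetical sketch (not the paper's code): aggregate pairwise PLM
# causal judgments into a d x d prior matrix for a downstream discovery method.
from itertools import permutations
import numpy as np

def build_plm_prior(var_descriptions, query_plm, n_votes=5):
    """query_plm: any callable taking a prompt string and returning the model's text."""
    d = len(var_descriptions)
    prior = np.zeros((d, d))
    for i, j in permutations(range(d), 2):
        prompt = (f"Does changing '{var_descriptions[i]}' directly cause a change "
                  f"in '{var_descriptions[j]}'? Answer yes or no.")
        votes = [query_plm(prompt).strip().lower().startswith("yes")
                 for _ in range(n_votes)]        # repeat queries to smooth sampling noise
        prior[i, j] = sum(votes) / n_votes       # edge confidence in [0, 1]
    return prior

if __name__ == "__main__":
    # Toy stand-in for a real PLM call: it only "believes" altitude -> temperature.
    def toy_plm(prompt):
        cause_part = prompt.split("directly cause")[0]
        return "yes" if "altitude" in cause_part and "temperature" in prompt else "no"

    print(build_plm_prior(["altitude", "temperature", "rainfall"], toy_plm, n_votes=1))
```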
Juhyeon Kim, Chanhui Lee (LG AI Institute), Sanghack Lee
arXiv preprint, 2023
Scaling laws have brought Pre-trained Language Models (PLMs) into the field of causal reasoning. Causal reasoning with PLMs relies solely on text-based descriptions, in contrast to causal discovery, which aims to determine causal relationships between variables from data. Recent work has proposed a method that mimics causal discovery by aggregating the outcomes of repeated causal reasoning elicited through specifically designed prompts. This highlights the usefulness of PLMs for discovering cause and effect, a task often limited by a lack of data, especially when many variables are involved. However, PLMs do not analyze data and are highly dependent on prompt design, which is a crucial limitation for using them directly in causal discovery. Accordingly, PLM-based causal reasoning carries the risk of overconfidence and false predictions when determining causal relationships. In this paper, we first empirically demonstrate these limitations of PLM-based causal reasoning through experiments on physics-inspired synthetic data. We then propose a new framework that integrates prior knowledge obtained from a PLM with a causal discovery algorithm, by initializing the adjacency matrix for causal discovery with the prior knowledge and adding a regularization term based on it. Our framework not only demonstrates improved performance through the integration of PLMs and causal discovery, but also suggests how to leverage PLM-extracted prior knowledge with existing causal discovery algorithms.
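The mechanism named in the abstract, initializing the adjacency matrix from PLM-derived prior knowledge and regularizing toward it, can be sketched roughly as follows. This is a simplified illustration under a linear SEM with the acyclicity constraint omitted, not the paper's actual algorithm; the function name, loss weights, and toy data are assumptions.

```python
# Hypothetical sketch: score-based discovery of a weighted adjacency matrix that is
# initialized from a PLM-derived prior and regularized to stay close to that prior.
# Simplifications: linear SEM, plain gradient descent, no acyclicity constraint.
import numpy as np

def plm_regularized_discovery(X, prior, lam_prior=0.1, lam_sparse=0.05,
                              lr=0.01, n_iters=2000):
    """X: (n, d) data matrix; prior: (d, d) PLM edge-confidence matrix in [0, 1]."""
    n, d = X.shape
    A = prior.astype(float).copy()            # initialize adjacency from the PLM prior
    np.fill_diagonal(A, 0.0)
    for _ in range(n_iters):
        resid = X - X @ A                      # reconstruction residual of the linear SEM
        grad = -X.T @ resid / n                # gradient of (1/2n) * ||X - XA||_F^2
        grad += lam_prior * (A - prior)        # pull A toward the PLM prior
        grad += lam_sparse * np.sign(A)        # subgradient of an L1 sparsity penalty
        A -= lr * grad
        np.fill_diagonal(A, 0.0)               # forbid self-loops
    return A

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    x0 = rng.normal(size=n)
    x1 = 2.0 * x0 + 0.1 * rng.normal(size=n)   # ground truth: x0 -> x1 -> x2
    x2 = -1.5 * x1 + 0.1 * rng.normal(size=n)
    X = np.column_stack([x0, x1, x2])
    prior = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], float)  # PLM-suggested edges
    print(np.round(plm_regularized_discovery(X, prior), 2))
```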
Juhyeon Kim, Yesong Choi, Sanghack Lee
CASE at EMNLP, 2022 (Spotlight)
Finding causal relations in text is challenging, requiring methods that range from defining event ontologies to developing suitable algorithmic approaches. In this paper, we develop a framework that classifies whether a given sentence contains a causal event. To overcome the small size of the original corpus (the Causal News Corpus) provided by the task organizers, we exploit an external corpus with causal labels. We further employ a data augmentation technique based on Part-of-Speech (POS) tags, motivated by our observation that some parts of speech are more (or less) relevant to causality. Our approach particularly improves recall in detecting causal events in sentences.
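As a rough illustration of POS-guided augmentation (a sketch, not the paper's exact recipe), the snippet below tags a sentence and randomly drops tokens whose parts of speech are assumed to be less relevant to causality, such as determiners and prepositions, to generate additional training variants; the tag set and drop probability are assumptions.

```python
# Hypothetical sketch of POS-based augmentation for causal sentence classification:
# drop some tokens whose POS tags are assumed less relevant to causality.
import random
import nltk

# Resource names differ across NLTK versions; unknown names are silently skipped.
for res in ("punkt", "punkt_tab", "averaged_perceptron_tagger",
            "averaged_perceptron_tagger_eng"):
    nltk.download(res, quiet=True)

# Penn Treebank tags treated as "safe to drop" (an assumption, not the paper's list).
DROPPABLE_TAGS = {"DT", "IN", "PDT", "RB", "UH"}

def pos_dropout(sentence: str, drop_prob: float = 0.3, seed: int = 0) -> str:
    """Return an augmented copy of `sentence` with some low-relevance tokens removed."""
    rng = random.Random(seed)
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    kept = [tok for tok, tag in tagged
            if tag not in DROPPABLE_TAGS or rng.random() > drop_prob]
    return " ".join(kept)

if __name__ == "__main__":
    s = "The heavy rain caused severe flooding in the coastal city."
    for i in range(3):                         # three augmented variants
        print(pos_dropout(s, seed=i))
```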