I am a Ph.D. student at Seoul National University, conducting research in causal discovery and large language models (LLMs). My work is broadly focused on building reliable, interpretable, and generalizable causal reasoning systems, particularly in settings where data is sparse, non-stationary, or embedded in unstructured text. My early research addressed the automated extraction of causal relationships from text using part-of-speech-based data augmentation, which I presented as first author at CASE at EMNLP 2022. Since then, I have been actively exploring how pre-trained language models can be leveraged to incorporate prior knowledge into causal structure learning, co-authoring papers presented at the NeurIPS 2024 Workshop on Causality and Large Models and currently under review at IEEE Access. Alongside this, I am the lead inventor of a causal discovery framework that integrates LLMs into graph-learning pipelines.
My recent efforts involve both theoretical and empirical investigations of non-stationary time-series causal discovery and spatio-temporal causal discovery. In addition, I am developing a series of studies on LLM citation reliability, including saliency-based methods, and on diffusion-based causal discovery using mixture-of-experts (MoE) architectures. My research aims to bridge language understanding, causal inference, and dynamic systems, with the long-term goal of building robust AI systems that can reason about change, uncertainty, and cause-and-effect relationships in real-world settings.
") does not match the recommended repository name for your site ("
").
", so that your site can be accessed directly at "http://
".
However, if the current repository name is intended, you can ignore this message by removing "{% include widgets/debug_repo_name.html %}
" in index.html
.
",
which does not match the baseurl
("
") configured in _config.yml
.
baseurl
in _config.yml
to "
".
Juhyeon Kim, Chanhui Lee, Sanghack Lee
Neural Information Processing Systems (NeurIPS) Workshop on Causality and Large Models, 2024 (Spotlight)
Pre-trained language models (PLMs) can reason about causality by leveraging vast pre-trained knowledge and text descriptions of datasets, proving effective even when data is scarce. However, current PLM-based causal reasoning methods have crucial limitations: i) PLMs cannot utilize large datasets in a prompt due to context-length limits, and ii) these methods are not adept at comprehending the causal structure as an interconnected whole. Conversely, data-driven causal discovery can recover the causal structure as a whole, although it works well only when the number of observations is sufficiently large. To overcome the limitations of each approach, we propose a new framework that integrates PLM-based causal reasoning into data-driven causal discovery, resulting in improved and robust performance. Furthermore, our framework extends to time-series data, where it exhibits superior performance.
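The paper's exact integration scheme is not reproduced on this page; the sketch below only illustrates the general idea under stated assumptions: an LLM-elicited prior over directed edges reweights a data-driven graph score during structure search. The names `llm_edge_prior` and `penalized_score` are hypothetical, not the framework's API.

```python
import numpy as np

def llm_edge_prior(var_names):
    """Hypothetical stand-in for prompting a PLM about each variable pair:
    returns P(i causes j) beliefs as a d x d matrix. A real implementation
    would query the model with text descriptions of the variables."""
    d = len(var_names)
    return np.full((d, d), 0.5)  # uninformative placeholder beliefs

def penalized_score(data_score, adjacency, prior, lam=1.0):
    """Combine a data-driven graph score (e.g., BIC) with the PLM prior:
    graphs whose edges agree with the PLM's beliefs score higher."""
    eps = 1e-8  # guard against log(0)
    log_prior = np.sum(
        adjacency * np.log(prior + eps)
        + (1 - adjacency) * np.log(1 - prior + eps)
    )
    return data_score + lam * log_prior
```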
Juhyeon Kim, Yesong Choi, Sanghack Lee
CASE Workshop at EMNLP 2022 (Spotlight)
Finding causal relations in texts has been a challenge, since it requires methods ranging from defining event ontologies to developing proper algorithmic approaches. In this paper, we developed a framework that classifies whether a given sentence contains a causal event. In our approach, we exploited an external corpus with causal labels to overcome the small size of the original corpus (Causal News Corpus) provided by the task organizers. Further, we employed a part-of-speech (POS) based data augmentation technique, motivated by our observation that some parts of speech are more (or less) relevant to causality. Our approach particularly improved recall in detecting causal events in sentences.
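The paper's exact augmentation procedure is not reproduced here; the following is a minimal sketch of the idea, assuming NLTK for tokenization and POS tagging. The set of droppable tags is an illustrative choice, not the tag set reported in the paper.

```python
import random
import nltk  # assumes the 'punkt' and 'averaged_perceptron_tagger' resources are installed

# POS tags assumed weakly related to causal cues -- an illustrative choice.
DROPPABLE_TAGS = {"DT", "PDT", "UH"}  # determiners, predeterminers, interjections

def augment_by_pos(sentence, drop_prob=0.3, seed=0):
    """Create an augmented copy of a training sentence by randomly dropping
    tokens whose POS tags are assumed less relevant to causality, leaving
    the causal / non-causal label unchanged."""
    rng = random.Random(seed)
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    kept = [word for word, tag in tagged
            if tag not in DROPPABLE_TAGS or rng.random() > drop_prob]
    return " ".join(kept)

# Example: augment_by_pos("The storm caused the flooding.")
# may yield "storm caused flooding ." while keeping the causal label.
```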