Cross-domain Analysis on Japanese Legal Pretrained Language Models (main author)

MIYAZAKI Keisuke

November 2022

Abstract

This paper investigates pretrained language models (PLMs) specialised in the Japanese legal domain. We create PLMs using different pretraining strategies and investigate their performance across multiple domains. Our findings are: (i) a PLM built with general-domain data can be improved by further pretraining with domain-specific data; (ii) domain-specific PLMs can learn domain-specific and general word meanings simultaneously and can distinguish between them; (iii) domain-specific PLMs work better on their target domain, yet they retain the information learnt in the original PLM even after being further pretrained with domain-specific data; (iv) PLMs sequentially pretrained with corpora of different domains show high performance on the later-learnt domains.

Type

Conference paper

Publication

In The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing

MIYAZAKI Keisuke

Corporate researcher