Research on Keywords Variations in Linguistics Based on TF-IDF and N-gram

Yuyao Li, Xueyi Wen, Xingyu Liu


The rapid development of natural language processing (NLP) holds great promise for bridging the divide among languages. One of its main innovative applications is the use of large-scale data to explore historical trends in a subject. However, since Saussure founded modern linguistics, relatively little research has examined variations within the field itself in a way that comprehensively reveals its trends. To trace changes in linguistic research hotspots, we use a dataset of more than 30,000 linguistics-related publications, with their titles, from the Web of Science and apply NLP techniques to their keywords and publication years. We find that the co-occurrence relationships between keywords, N-grams, and their relationships with publication year can effectively reveal changes in the themes of linguistic research. This research is intended to provide further insights and new methods applicable to linguistics and related disciplines.


Keywords: keyword extraction, TF-IDF, N-gram, Linear Discriminant Analysis (LDA)
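The abstract does not spell out the exact pipeline, so the following is only a minimal sketch under stated assumptions of how keywords, publication years, TF-IDF, co-occurrence and N-grams can be combined. It assumes that each publication year's pooled keywords form one "document", so TF-IDF favours keywords distinctive to that year. It also counts keyword pairs attached to the same paper as co-occurrences and extracts contiguous word N-grams. The toy records and the names `RECORDS`, `tfidf_by_year`, `cooccurrence` and `ngrams` are hypothetical illustrations, not the authors' data or code.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records (NOT the paper's Web of Science data):
# each entry is (publication year, author keywords).
RECORDS = [
    (2000, ["syntax", "phonology", "corpus"]),
    (2000, ["syntax", "semantics"]),
    (2010, ["corpus", "pragmatics", "syntax"]),
    (2010, ["corpus", "discourse"]),
    (2020, ["neural network", "corpus", "nlp"]),
    (2020, ["nlp", "neural network", "semantics"]),
]


def tfidf_by_year(records):
    """Assumption: pool each year's keywords into one document, then score by TF-IDF."""
    docs = defaultdict(Counter)
    for year, keywords in records:
        docs[year].update(k.lower() for k in keywords)
    n_docs = len(docs)
    # Document frequency: the number of years in which each keyword appears.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    scores = {}
    for year, counts in docs.items():
        total = sum(counts.values())
        # Relative term frequency times unsmoothed IDF, log(N / df).
        scores[year] = {
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        }
    return scores


def cooccurrence(records):
    """Count how often two distinct keywords are attached to the same paper."""
    pairs = Counter()
    for _, keywords in records:
        unique = sorted(set(k.lower() for k in keywords))
        pairs.update(combinations(unique, 2))
    return pairs


def ngrams(tokens, n=2):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    for year, s in sorted(tfidf_by_year(RECORDS).items()):
        # Highest score first; ties broken alphabetically.
        top = sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:2]
        print(year, [term for term, _ in top])
    print(cooccurrence(RECORDS).most_common(2))
    print(ngrams("keyword variations in linguistics".split()))
```

In this toy setup, a keyword that appears in every year (here `corpus`) receives an IDF of zero and drops out of each year's top terms. That is the intended effect of treating years as documents: persistent background topics are suppressed and period-specific hotspots surface. Smoothed IDF variants or per-paper documents are equally plausible choices.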

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
