
-
Korean-Korean Sign Language Parallel Corpus 2024
(Ver 1.0) Parallel corpus consisting of Korean spoken data translated into Korean Sign Language.
-
Korean-Vietnamese Parallel Corpus 2024
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Vietnamese language.
-
Korean-Uzbek Parallel Corpus 2024
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Uzbek language.
-
Korean-Thai Parallel Corpus 2023
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Thai language.
-
Korean-Tagalog Parallel Corpus 2024
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Tagalog language.
-
Korean-Russian Parallel Corpus 2024
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Russian language.
-
Korean-Khmer Parallel Corpus 2024
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Khmer language.
-
Korean-Indonesian Parallel Corpus 2024
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Indonesian language.
-
Korean-Hindi Parallel Corpus 2024
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Hindi language.
-
Korean Sign Language Raw Corpus 2025
(Ver 1.0) It is raw data for the sign language corpus, consisting of videos of two deaf people conversing in sign language.
-
Korean Sign Language Annotated Corpus 2025
(Ver 1.0) It is a multi-tier translated and annotated corpus of sign language videos of two deaf people conversing in sign language, translated into Korean, segmented and annotated.
-
Korean Sign Language-Korean Parallel Corpus 2025
(Ver 1.0) It is a parallel corpus composed of videos of two deaf people conversing in sign language and translated into Korean.
-
Historical Korean Corpus 2024
(Ver 1.0) A primary corpus of 32 sources, including Eongan (Hangeul letters) from the 17th-19th centuries and modern novels, Pansori-based narratives, and dictionaries from the early 20th century.
-
Korean Sign Language Annotated Corpus 2024
(Ver 1.1) It is a multi-tier translated and annotated corpus of sign language videos of two deaf people conversing in sign language, translated into Korean, segmented and annotated.
-
Korean Sign Language-Korean Parallel Corpus 2024
(Ver 1.1) It is a parallel corpus composed of videos of two deaf people conversing in sign language and translated into Korean.
-
Korean Language and Culture Knowledge Graph 2024
-
Historical Korean Corpus 2023
(Ver 2.0) A primary corpus of literature materials written in Hangeul from the 15th century, when it was created, to the early 20th century.
-
Corpus of Table Description Sentence 2024
(Ver 1.0) It is a corpus of description sentences for each of four types of area (row, column, discontinuous region, entire table) that contain key contents.
-
Summary Evaluation Corpus 2024
(Ver 1.0) This is a corpus of evaluations of summaries written by two different workers for articles extracted from 'NIKL Newspaper Corpus 2023'.
-
Summary Corpus 2024
(Ver 1.0) It is a corpus of summaries written by two different workers for articles extracted from 'NIKL Newspaper Corpus 2023'.
-
Instruction-based Generation Corpus for writing correction 2024
(Ver 1.0) A corpus of texts from the 'NIKL Raw Writing Data 2023(version 1.0)', selected by considering different units of writing (document, paragraph, sentence) and edited according to diagnostic criteria (content, organization, expression).
-
Grading Writing Data 2024
(Ver 1.0) A corpus of grading data provided by two scoring experts on 1,000-character argumentative writing text prepared by national and public university students in nine regions nationwide.
-
Raw Writing Data 2024
(Ver 1.0) A corpus of argumentative writing text of around 1,000 characters prepared by national and public university students in nine regions nationwide in 2024.
-
Grading Writing Data 2023(2)
(Ver 1.0) A corpus of grading data provided by two scoring experts in 2024 on 1,000-character argumentative writing text prepared by national and public university students in nine regions nationwide in 2023.
-
NIKL The Basic Knowledge Data of Korean Language 2024
(Ver 1.0) Data on Korean language knowledge regarding the types of errors in Korean writing and expression, analyzed based on official document review data and Korean language counseling data, including representative examples (question and answer, description, and case).
-
Korean-Korean Braille parallel corpus 2022
(Ver 1.0) Parallel corpora consisting of sentences extracted from Korean written data (news articles), translated into Braille and then proofread. The sentences in the corpora comprise a combination of Korean, Roman alphabets, numbers, and symbols.
-
Korean-Korean Braille parallel corpus 2024
(Ver 1.0) Parallel corpora consisting of sentences extracted from Korean written data (news articles), translated into Braille and then proofread. The sentences in the corpora comprise a combination of Korean, Roman alphabets, numbers, and symbols.
-
Korean-Korean Sign Language Parallel Corpus 2023
(Ver 1.0) Parallel corpus consisting of Korean spoken data translated into Korean Sign Language.
-
Dialogue Corpus transcription 2024
(Ver 1.0) It is a corpus of daily conversations where speakers freely talk about a specific topic or presented material.
-
Dialogue Corpus audio 2024
(Ver 1.0) It is a corpus composed of speeches (PCM files) of daily conversations and transcription data.
-
Newspaper 2024
(Ver 1.0) It is a corpus of newspaper articles produced in 2023, used with copyright permission from the media outlets and refined into a machine-analyzable format.
-
Korean Corpus of Inappropriate Utterances 2023
(Ver 1.0) It is a corpus of inappropriate speech that appears in online publications, annotated with explicit context, intensity, and domain.
-
Dialogue Contextual Inference Corpus 2024
(Ver 1.0) A corpus organized by drawing and evaluating five types of 'regular/adversarial' inferences based on the conversational context, common sense, world knowledge, etc.
-
Zero Anaphora Corpus 2024
(Ver 1.0) This is a corpus in which omitted essential components - such as subjects, objects, complements, and adverbials - have been restored based on context.
-
Dependency Parsed Corpus 2024
(Ver 1.0) It is a corpus that analyzes the syntactic structure of a sentence and attaches a dependency label to each word.
-
Korean Dialects Corpus 2021
(Ver 1.0) This raw corpus is a collection of oral-utterance survey results from 2021 (surveyed informants of three generations in each of the 10 regions).
-
Dialogue Corpus transcription 2023
(Ver 1.1) It is a corpus of daily conversations where speakers freely talk about a specific topic or presented material.
-
Dialogue Corpus audio 2023
(Ver 1.1) It is a corpus composed of speeches (PCM, WAV files) of daily conversations and transcription data.
-
Raw Writing Data 2023
(Ver 1.0) A corpus of argumentative writing text of around 1,000 characters prepared by national and public university students in nine regions nationwide in 2023.
-
Grading Writing Data 2023(1)
(Ver 1.0) A corpus of grading data provided by two scoring experts on 1,000-character argumentative writing text prepared by national and public university students in nine regions nationwide.
-
Korean-Indonesian Parallel Corpus 2023
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Indonesian language.
-
Korean-Hindi Parallel Corpus 2023
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Hindi language.
-
Korean-Khmer Parallel Corpus 2023
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Khmer language.
-
Korean-Russian Parallel Corpus 2023
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Russian language.
-
Korean-Thai Parallel Corpus 2023
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Thai language.
-
Korean-Tagalog Parallel Corpus 2023
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Tagalog language.
-
Korean-Uzbek Parallel Corpus 2023
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Uzbek language.
-
Korean-Vietnamese Parallel Corpus 2023
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Vietnamese language.
-
Korean Sign Language Raw Corpus 2024
(Ver 1.0) It is raw data for the sign language corpus, consisting of videos of two deaf people conversing in sign language.
-
Corpus of Korean Parliamentary Minutes Summarization 2023
(Ver 1.0) This is a corpus composed of important summaries for each issue, detailed summaries, and representative summaries of the entire document for the Korean Parliamentary Minutes.
-
Korean-Korean Braille parallel corpus 2023
(Ver 1.0) Parallel corpora consisting of sentences extracted from Korean written data (news articles and online posting materials), translated into Braille and then proofread. The sentences in the corpora comprise a combination of Korean, Roman alphabets, numbers, and symbols.
-
Korean-Korean Sign Language Parallel Corpus 2022
(Ver 1.0) Parallel corpus consisting of Korean spoken data translated into Korean Sign Language.
-
Korean Dialogue Summarization Corpus 2023
(Ver 1.0) This is a corpus comprising speaker-specific summaries, topic-specific summaries, and a representative summary derived from discourse analysis conducted on the NIKL Dialogue Corpus 2020 and 2021.
-
Dialogue Contextual Inference Corpus 2023
(Ver 1.1) A corpus organized by drawing five types of inferences based on the conversational context, common sense, world knowledge, etc.
-
Newspaper Corpus 2023
(Ver 1.0) It is a corpus of newspaper articles produced in 2022, used with copyright permission from the media outlets and refined into a machine-analyzable format.
-
Graph Based Sentence Generation Corpus 2022
(Ver 1.0) It is a corpus composed of baseline sentences that describe the contents of the graph along with a paraphrased version of each baseline sentence.
-
Corpus of Korean Parliamentary Minutes Summarization
(Ver 1.0) This is a corpus composed of important summaries for each issue, detailed summaries, and representative summaries of the entire document for the Korean Parliamentary Minutes.
-
Korean-Hindi Parallel Corpus 2022
(Ver 1.0) Parallel corpus consisting of Korean written and spoken data translated into Hindi.
-
Korean-Indonesian Parallel Corpus 2022
(Ver 1.0) Parallel corpus consisting of Korean written and spoken data translated into Indonesian.
-
Korean-Khmer Parallel Corpus 2022
(Ver 1.0) Parallel corpus consisting of Korean written and spoken data translated into Khmer.
-
Korean-Russian Parallel Corpus 2022
(Ver 1.0) Parallel corpus consisting of Korean written and spoken data translated into Russian.
-
Korean-Thai Parallel Corpus 2022
(Ver 1.0) Parallel corpus consisting of Korean written and spoken data translated into Thai.
-
Korean-Tagalog Parallel Corpus 2022
(Ver 1.0) Parallel corpus consisting of Korean written and spoken data translated into Tagalog.
-
Korean-Uzbek Parallel Corpus 2022
(Ver 1.0) Parallel corpus consisting of Korean written and spoken data translated into Uzbek.
-
Korean-Vietnamese Parallel Corpus 2022
(Ver 1.0) Parallel corpus consisting of Korean written and spoken data translated into Vietnamese.
-
Dialogue Corpus transcription 2022
(Ver 1.0) It is a corpus of daily conversations where speakers freely talk about a specific topic or presented material.
-
Dialogue Corpus audio 2022
(Ver 1.0) It is a corpus composed of speeches (PCM files) of daily conversations and transcription data.
-
Named Entity Dictionary 2022
(Ver 1.0) It is a dataset constructed by extracting entity expression, type, and knowledge base connection information from the Named Entity corpus.
-
Named Entity Linking 2022
(Ver 1.1) It is a dataset with information from Wikipedia attached to the Named Entity Corpus.
-
Newspaper Corpus 2022
(Ver 1.0) It is a corpus of newspaper articles produced in 2021, used with copyright permission from the media outlets and refined into a machine-analyzable format.
-
Online Posting Materials Corpus 2022
(Ver 1.0) This corpus consists of posts collected from various online communities and social networking service platforms.
-
Named Entity Corpus 2022
(Ver 1.1) It is a corpus that marks the boundary of named entities appearing in sentences and attaches semantic category tags to them.
-
Korean-Russian parallel Corpus 2021
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Russian.
-
Korean-Vietnamese parallel Corpus 2021
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Vietnamese.
-
Korean-Uzbek parallel Corpus 2021
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Uzbek.
-
Korean-Khmer parallel Corpus 2021
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Khmer.
-
Korean-Indonesian parallel Corpus 2021
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Indonesian.
-
Korean-Tagalog parallel Corpus 2021
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Tagalog.
-
Korean-Hindi parallel Corpus 2021
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Hindi.
-
Korean-Thai parallel Corpus 2021
(Ver 1.0) Parallel corpora consisting of Korean written and spoken data translated into Thai.
-
Dialogue Corpus audio 2021
(Ver 1.1) It is a corpus composed of speeches (PCM files) of daily conversations and transcription data.
-
Dialogue Corpus transcription 2021
(Ver 1.1) It is a corpus of daily conversations where speakers freely talk about a specific topic or presented material.
-
Dialogue Corpus audio 2020
(Ver 1.4) It is a corpus composed of speeches (PCM files) of daily conversations and transcription data.
-
Dialogue Corpus transcription 2020
(Ver 1.4) It is a corpus of daily conversations where speakers freely talk about a specific topic or presented material.
-
Spelling Error Correction Corpus 2022
(Ver 1.0) It is a corpus which corrects spelling errors in text data from websites.
-
Named Entity Linking 2021
(Ver 1.2) It is a dataset with information from Wikipedia attached to the Named Entity Corpus.
-
Named Entity Dictionary 2021
(Ver 1.1) It is a dataset constructed by extracting entity expression, type, and knowledge base connection information from the Named Entity corpus.
-
2022 Artificial Intelligence Language Capability Evaluation Competition Corpus: ABSA
(Ver 1.0) This is the task corpus for the 2022 National Institute of Korean Language 'Artificial Intelligence Language Capability Evaluation Competition'.
-
Newspaper Corpus 2021
(Ver 1.0) It is a corpus composed of articles from comprehensive magazines and specialized magazines.
-
Korean Parliamentary Corpus 2021
(Ver 1.1) It is a corpus composed of the minutes of the National Assembly subcommittees (2003~2020).
-
Commitment Bank Corpus 2021
(Ver 1.1) It is a corpus composed of speaker's commitment to a hypothesis implied in an embedded sentence.
-
Spelling Error Correction Corpus 2021
(Ver 1.0) It is a corpus which corrects spelling errors in text data from websites.
-
Aspect-Based Sentiment Analysis Corpus 2021
(Ver 1.1) It is a corpus with aspect-based sentiment information attached to the same documents as the National Institute of Korean Language Sentiment Analysis Corpus (2020).
-
Named Entity Corpus 2021
(Ver 1.0) It is a corpus that marks the boundary of named entities appearing in sentences and attaches semantic category tags to them.
-
Online Text Message Corpus
(Ver 1.1) It is a corpus of online conversations between two or more participants.
-
Commitment Bank Corpus 2020
(Ver 1.1) It is a corpus composed of speaker's commitment to a hypothesis implied in an embedded sentence.
-
Semantic Role Labeling Corpus
(Ver 1.0) It is a corpus that analyzes the arguments of the predicate of a sentence and attaches their semantic roles.
-
Word Sense Tagged Corpus 2020
(Ver 2.0) It is a corpus that distinguishes polysemous words and attaches their Urimalsaem semantic IDs.
-
Newspaper Corpus 2020
(Ver 1.1) It is a corpus composed of articles (2019) from comprehensive magazines, specialized magazines, and internet-based newspapers.
-
Named Entity Corpus 2020
(Ver 2.1) It is a corpus that marks the boundary of named entities appearing in sentences and attaches semantic category tags to them.
-
Newspaper Corpus
(Ver 2.0) It is a corpus composed of articles (2009~2018) from comprehensive magazines, specialized magazines, and internet-based newspapers.
-
Spoken Corpus
(Ver 1.2) It is a corpus composed of formal spoken data, such as broadcasts and lectures, and semi-spoken data, such as soap opera transcripts.
-
Non-publication Corpus
(Ver 1.2) It is a corpus composed of personal writings (such as poems, diaries, letters, impressions, etc.).
-
Grammaticality Judgment Corpus
(Ver 1.1) It is a corpus containing Korean speakers' judgments on the grammaticality (or acceptability) of Korean sentences.
-
Case Frame
(Ver 1.0) It is a model describing the obligatory semantic roles of predicates (with Urimalsaem and Sejong electronic dictionary semantic IDs).
-
Sentiment Analysis Corpus 2020
(Ver 1.0) It is a corpus that attaches sentiment analysis annotation to authors' subjective expressions.
-
Zero Anaphora Corpus 2020
(Ver 1.0) It is a corpus in which omitted subjects and objects in sentences are restored according to context.
-
Coreference Resolution Corpus 2019
(Ver 1.0) It is a corpus that finds and links different linguistic items referring to the same object.
-
Messenger Corpus
(Ver 2.0) It is a corpus of conversations between two or more participants via messenger.
-
Dependency-Parsed Corpus
(Ver 2.0) It is a corpus that analyzes the syntactic structure of a sentence and attaches a dependency label to each word.
-
Corpus of Korean Short Stories Read in Seoul Dialect
(Ver 2.0) It is a corpus of recitations by 120 Seoul speakers whose families have lived in Seoul or Gyeonggi-do for at least two generations.
-
Named Entity Corpus
(Ver 1.0) It is a corpus that marks the boundary of named entities appearing in sentences and attaches semantic category tags to them.
-
Summarization Corpus
(Ver 1.0) It is a corpus consisting of topic sentences extracted from a document and its summary.
-
Written Corpus
(Ver 1.2) It is a corpus of books, magazines, reports, etc.
-
Part-of-Speech Tagged Corpus
(Ver 1.1) It is a corpus that analyzes words into morphemes and attaches a part-of-speech tag to each morpheme.
-
Paraphrase Corpus
(Ver 1.0) It is a corpus of semantically similar sentences generated by humans and computers.
-
Lexical Relations Data: NIKLex
(Ver 1.0) It is a dataset in which language users evaluate lexical relationships such as synonyms, antonyms, hypernyms and hyponyms.
-
Korean Learners' Corpus Search Engine
※ Distributed from a dedicated website.
-
21st Century Sejong Project

