The Vietnamese Corpus Project aims to provide a well-organized collection of Vietnamese text resources, and also integrates the Vietnamese Wikipedia dictionary resource

lingskr/Vietnamese-Corpus-and-Dictionary

Vietnamese Corpus

Project Introduction

The Vietnamese Corpus Project aims to provide a well-organized collection of Vietnamese text resources covering multiple subject areas. The corpus can be used for natural language processing (NLP), machine translation, text analysis, and other research and applications involving Vietnamese. The documents in the corpus are categorized by subject so that users can easily access and utilize these resources.

This project also integrates the Vietnamese Wikipedia dictionary resource, so users can look up definitions and background information for Vietnamese vocabulary.

Classification Directory

The text documents in the corpus are grouped by topic. Each category is described below:

  • Chính trị Xã hội (Politics and Society) - Contains 6567 documents covering Vietnamese politics, social phenomena, and related issues.

  • Đời sống (Life) - Contains 4195 documents covering daily life, including family, education, and culture.

  • Kinh doanh (Business) - Contains 4276 documents covering business, the economy, and finance.

  • Pháp luật (Law) - Contains 6656 documents covering laws, regulations, and judicial cases.

  • Sức khỏe (Health) - Contains 4417 documents covering medicine and public health.

  • Thế giới (World) - Contains 5716 documents covering international news, global issues, and diplomatic affairs.

  • Thể thao (Sports) - Contains 5667 documents covering sports news, event reports, and athlete information.

  • Văn hóa (Culture) - Contains 5250 documents covering art, literature, and traditional culture.
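
The counts listed above add up to 42,744 documents. The following is a minimal Python sketch of how a local copy might be checked against those numbers and read as labeled text. It rests on assumptions not confirmed by this README: that each category is a folder named after its Vietnamese title, that documents are UTF-8 `.txt` files, and that the corpus lives at a path you supply. The function names (`count_documents`, `iter_documents`, `compare_with_readme`) are hypothetical helpers, not part of the project.

```python
import unicodedata
from pathlib import Path


def _nfc(name):
    # Vietnamese diacritics can be stored composed or decomposed on disk;
    # normalizing to NFC makes folder names comparable with the keys below.
    return unicodedata.normalize("NFC", name)


# Document counts as listed in this README.
DOCUMENTED_COUNTS = {
    _nfc(name): count
    for name, count in {
        "Chính trị Xã hội": 6567,
        "Đời sống": 4195,
        "Kinh doanh": 4276,
        "Pháp luật": 6656,
        "Sức khỏe": 4417,
        "Thế giới": 5716,
        "Thể thao": 5667,
        "Văn hóa": 5250,
    }.items()
}


def count_documents(corpus_root, pattern="*.txt"):
    """Count files matching `pattern` under each immediate subfolder.

    ASSUMPTION: one subfolder per category, documents stored as .txt files.
    """
    counts = {}
    for folder in sorted(Path(corpus_root).iterdir()):
        if folder.is_dir():
            counts[_nfc(folder.name)] = sum(
                1 for path in folder.rglob(pattern) if path.is_file()
            )
    return counts


def iter_documents(corpus_root, pattern="*.txt", encoding="utf-8"):
    """Yield (category, text) pairs, e.g. as input for a text classifier.

    ASSUMPTION: files are UTF-8 encoded; adjust `encoding` if they are not.
    """
    for folder in sorted(Path(corpus_root).iterdir()):
        if not folder.is_dir():
            continue
        for path in sorted(folder.rglob(pattern)):
            if path.is_file():
                yield _nfc(folder.name), path.read_text(encoding=encoding)


def compare_with_readme(counts):
    """Return {category: (found, documented)} where the two numbers differ."""
    return {
        name: (counts.get(name, 0), documented)
        for name, documented in DOCUMENTED_COUNTS.items()
        if counts.get(name, 0) != documented
    }


if __name__ == "__main__":
    print("Documents listed in README:", sum(DOCUMENTED_COUNTS.values()))
```

With a local copy in place, `compare_with_readme(count_documents("path/to/corpus"))` would return an empty dictionary if every folder matches the listed count, and `iter_documents("path/to/corpus")` could feed a classifier or tokenizer one labeled document at a time. If the repository uses a different layout or file extension, the `pattern` argument and the folder assumption would need to change accordingly.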

Wikipedia Dictionary

This project integrates the Vietnamese dictionary from Wikipedia.
