Session 13: Introduction to textual document analysis


Title: Introduction to textual document analysis
Time: Thursday, June 4, 2020, 14:00-15:00
Textual documents are a very important source of information. According to a rough estimate by a prestigious sociologist, approximately 70% of the information consumed by human society comes in the form of textual documents. The essential information in a textual document is embedded in natural language, which is usually noisy and unstructured. Extracting that information therefore takes either intensive reading by human readers or computational analysis by machines. In the context of big data, employing human readers for information extraction is generally out of the question, while machine analysis has become increasingly practical. In this seminar, Dr. Ou will share his knowledge and experience in the computational analysis of textual documents. The outline of his talk is as follows.

1. Main tasks in textual document analysis
2. Numeric representations of textual documents
3. Document similarities
4. Document classification
5. Extracting keywords from documents
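
As a rough illustration of items 2, 3 and 5 above, the following minimal Python sketch turns documents into TF-IDF weighted vectors, compares them with cosine similarity, and picks the highest-weighted terms as keywords. The announcement does not say which methods the talk covers, so TF-IDF, the smoothed IDF formula, the helper names (tokenize, tfidf_vectors, cosine_similarity, top_keywords) and the sample sentences are all assumptions made for this example. Document classification (item 4) is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            # Term frequency times a smoothed inverse document frequency
            # (the smoothing is an assumption of this sketch).
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_keywords(vector, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Made-up sample documents, used only for illustration.
docs = [
    "A patent claims a new battery electrode material.",
    "The patent describes a battery with a longer life.",
    "The recipe uses fresh tomatoes and basil.",
]
vectors = tfidf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # two patent texts
sim_02 = cosine_similarity(vectors[0], vectors[2])  # patent vs. recipe
print(sim_01 > sim_02)
print(top_keywords(vectors[0]))
```

The two patent sentences share several terms, so they score higher than the patent and recipe pair, which share none. A sketch this small has no stop-word removal or stemming, so common function words can still rank among the keywords.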


OU Wei holds a PhD in Knowledge Science from the Japan Advanced Institute of Science and Technology, where he studied under the sponsorship of the China Scholarship Council (CSC), through the Chinese government's program for building high-level universities. Dr. Ou's research interests lie mainly in machine learning and natural language processing. Before joining IBS, Dr. Ou worked as a data scientist at a major hiring company based in Tokyo, gaining rich experience in both academic and industrial research. At IBS, Dr. Ou currently focuses on the analysis of textual patent documents and on devising efficient algorithms to mine insightful information from them.