This short book was prepared as supplementary material for the Exploratory Data Analysis course I taught at the University of Notre Dame in the fall of 2018. It is heavily influenced by the book Text Mining with R: A Tidy Approach by Julia Silge and David Robinson.

This book is built around a dataset of teaching evaluations and illustrates how to analyze the narrative comments in evaluations of college professors.

I would like to thank all the students who took the class and motivated the preparation of the document. I also thank the students in my Lab for Big Data Methodology for many helpful discussions on text mining.

I will keep improving the document. If you have any comments or suggestions, please write to me: Zhiyong Zhang, 390 Corbett Family Hall, University of Notre Dame, IN 46530. You can also email me at zhiyongzhang(at)


The current version of the book [2018-12-23] consists of 8 chapters.

Chapter 1 introduces the teaching evaluation dataset.

Chapter 2 illustrates a simple application of text mining: obtaining gender information.

Chapter 3 focuses on computing word frequencies to identify the words most commonly used in teaching evaluations.

Chapter 4 introduces the document-term matrix for representing text data.

Chapter 5 investigates associations among words using methods such as n-grams, word correlations, and association rules.

Chapter 6 uses cluster analysis to understand word clusters.

Chapter 7 introduces several topic models, including latent Dirichlet allocation, correlated topic models, and supervised latent Dirichlet allocation.

Chapter 8 shows how to conduct sentiment analysis.

To cite the book, please use the following:

Zhang, Z. (2018). Text Mining for Social and Behavioral Research Using R: A Case Study on Teaching Evaluation. Retrievable from