Facebook Likes Estimator

The project seeks to identify the factors that determine how many likes posts on news publishers’ Facebook pages receive.

Data are collected from the Facebook pages of major news publishers such as The New York Times, BBC News, The Huffington Post, CNN, TIME, The Wall Street Journal, and The Economist. A parser is built to obtain the raw data, which include the total likes for the page, the creation time of the post, the content of the post, and the total likes for the post. From the raw data, we extract the following features: published day (Sunday to Saturday, encoded as 0-6), published time, post length (in words), and the tone of the post (scored on a scale of 0-6). The tone feature is computed with the Stanford CoreNLP natural language processing toolkit. Other features to be tested are event-specific time coefficients, for example the number of days before Thanksgiving, and term preference, which will rely on Stanford CoreNLP to identify popular terms in the text.
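To make the time and length features concrete, here is a minimal Python sketch of how they might be derived from one raw post. It is an illustration only, not the project's parser: the function names, the dictionary keys, and the timestamp format (`2015-11-22T14:30:00+0000`) are assumptions, the hour is taken in the timestamp's own time zone, and the tone feature is left out because it depends on Stanford CoreNLP.

```python
from datetime import date, datetime, timedelta


def published_day(created: datetime) -> int:
    """Encode the weekday as 0-6 with Sunday = 0 and Saturday = 6."""
    # datetime.weekday() uses Monday = 0, so shift by one and wrap around.
    return (created.weekday() + 1) % 7


def thanksgiving(year: int) -> date:
    """Return US Thanksgiving: the fourth Thursday of November."""
    nov1 = date(year, 11, 1)
    days_to_first_thursday = (3 - nov1.weekday()) % 7  # Thursday is weekday 3
    return nov1 + timedelta(days=days_to_first_thursday + 21)


def extract_features(created_time: str, message: str) -> dict:
    """Build a feature dictionary from one post (hypothetical field names)."""
    # Assumed timestamp format; adjust to whatever the parser actually returns.
    created = datetime.strptime(created_time, "%Y-%m-%dT%H:%M:%S%z")
    return {
        "published_day": published_day(created),
        "published_hour": created.hour,
        # Post length as a whitespace-separated word count.
        "word_count": len(message.split()),
        # Negative when the post falls after that year's Thanksgiving.
        "days_before_thanksgiving": (
            thanksgiving(created.year) - created.date()
        ).days,
    }


if __name__ == "__main__":
    print(extract_features("2015-11-22T14:30:00+0000",
                           "Breaking news: markets rally today"))
```

Counting words by splitting on whitespace is the simplest choice; a tokenizer would treat punctuation and URLs differently.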

The data are separated into training and test sets. After extracting the features, we first used visualization methods such as histograms and bar plots to gain insight into the data. Finally, we applied different mining techniques, for example decision trees, to clarify the relationship between the features and like counts.
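The split-then-model workflow can be sketched in plain Python. The example below is not the project's model: it shuffles rows into training and test sets, then fits a one-split regression stump, the simplest building block of a decision tree, on a single feature. The function names, the `word_count` and `likes` keys, and the four toy rows are all made up for illustration and do not reflect the collected data or its results.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(rows, feature, target="likes"):
    """Find the threshold on `feature` that minimizes the total SSE.

    Returns (threshold, mean_if_at_or_below, mean_if_above).
    """
    best = None
    for t in sorted({r[feature] for r in rows}):
        left = [r[target] for r in rows if r[feature] <= t]
        right = [r[target] for r in rows if r[feature] > t]
        if not left or not right:
            continue  # a split must leave rows on both sides
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, mean(left), mean(right))
    if best is None:
        raise ValueError("feature has a single value; cannot split")
    return best[1:]


def predict(stump, row, feature):
    """Predict the target for one row using a fitted stump."""
    threshold, below, above = stump
    return below if row[feature] <= threshold else above


if __name__ == "__main__":
    # Toy rows invented for this sketch only.
    toy = [
        {"word_count": 5, "likes": 100},
        {"word_count": 8, "likes": 120},
        {"word_count": 30, "likes": 40},
        {"word_count": 35, "likes": 50},
    ]
    train, test = train_test_split(toy, test_fraction=0.25)
    stump = fit_stump(train, "word_count")
    for row in test:
        print(row, "->", predict(stump, row, "word_count"))
```

A full decision tree applies the same split search recursively to each side and across all features; in practice a library implementation would replace this hand-written stump.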

Our results suggest that a better post consists of either a few sharp sentences or a detailed paragraph. The commonly seen emoji and hashtags have no effect on a post’s popularity. Another trend we found is that posts published during working hours and on workdays receive fewer likes.

Results: Achieved 95% accuracy in predicting how many likes a post will receive using machine learning techniques.

Machine Learning Engineer

I am excited about using software and machine learning skills to solve real-world problems.