Charts vs. News
2020. Individual Project.
Automated collection + analysis of UK Headlines & Spotify Charts, and development of a web-app that creates Spotify playlists based on the mood of the news.
Data Collection + Analysis • Web-App Development • Flask • Natural Language Toolkit • News, FyCharts & Spotipy APIs • Python • HTML
The Charts vs. News project consists of two parts:
The automated collection and sentiment analysis of UK news headlines and titles of songs in the UK Spotify Daily Top 200 chart, in order to evaluate the relation between the mood of the news and most popular music; and,
A web-app that demonstrates an example service / interaction that such data collection and analysis could enable. In this case, the web-app allows users to generate a Spotify playlist of chart-topping songs whose names either match or oppose the sentiment of current UK headlines.
The project was created for the Sensing & Internet of Things (IoT) module for final year Design Engineers at Imperial College. It advanced my data analysis and coding skills, in particular pushing me to further experiment with various APIs and build upon existing code, as well as search for patterns within and between time-series data using auto- and cross-correlation.
The project also introduced me to the basics of HTML + CSS, sentiment analysis using the Python Natural Language Toolkit (NLTK), and Python-based web development using the Flask micro-framework.
The source code is available to view on GitHub here.
Data Collection + Analysis
Findings + Data Analysis
The headlines and charts were gathered over a 12-day period, and the average daily compound sentiment score for each was calculated, creating two time-series data sets that could be analysed. Although the news was updated regularly throughout each day (the News API was queried every 10 minutes for new headlines), the sentiment scores for the headlines had to be downsampled to a single daily score to match the data rate of the Spotify charts, which were only updated daily. Ideally, if the project were carried forward, data would be gathered over a longer time period in order to produce more reliable and conclusive results.
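As a rough illustration of the downsampling step described above (a minimal sketch, not the project's actual code), the snippet below averages timestamped sentiment scores into one value per calendar day. The function name `daily_mean_scores` and the sample values are hypothetical, and the scores are assumed to have already been computed by a sentiment analyser.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_mean_scores(samples):
    """Average (timestamp, compound_score) pairs into one score per calendar day."""
    by_day = defaultdict(list)
    for timestamp, score in samples:
        by_day[timestamp.date()].append(score)
    # Sort by date so the result reads as a time series.
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Hypothetical headline scores (illustrative values, not project data).
samples = [
    (datetime(2020, 1, 1, 9, 0), 0.5),
    (datetime(2020, 1, 1, 9, 10), -0.25),
    (datetime(2020, 1, 2, 9, 0), 0.75),
]
daily = daily_mean_scores(samples)
for day, score in daily.items():
    print(day, score)
```

The same function could be applied to the song-title scores, which would give two daily series of equal length that can be compared directly.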
Overall, the limited data gathered did suggest a strong correlation between the sentiment of the headlines and the titles of songs in the charts (zero-lag Pearson correlation coefficient of 0.96). If further validated by a longer study, such a result could be a useful insight for music-curation services (such as radio stations and streaming services), offering another metric (the mood of current news) to determine which music to recommend or provide to listeners.
As well as gathering more data, other possible avenues for further development include examining different geographical locations and times of year, and carrying out a more accurate and in-depth analysis of the sentiment of headlines and, in particular, of songs, for instance by considering lyrics and elements of musical make-up such as key, tempo and energy/loudness.
Basic time-series analysis was performed first, including graphing the data and deriving simple measures from it (such as the ratio of positive-to-negative song titles and headlines per day).
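The positive-to-negative ratio mentioned above could be computed along these lines. This is a sketch under stated assumptions: the function name is hypothetical, and the ±0.05 cut-off for counting a compound score as positive or negative is a commonly used convention that I am assuming here, not necessarily the threshold used in the project.

```python
def positive_negative_ratio(scores, threshold=0.05):
    """Ratio of positive to negative items for one day's compound scores.

    Assumption: a score >= threshold counts as positive and a score
    <= -threshold counts as negative; anything in between is neutral.
    Returns None when there are no negative items (ratio undefined).
    """
    positives = sum(1 for s in scores if s >= threshold)
    negatives = sum(1 for s in scores if s <= -threshold)
    if negatives == 0:
        return None
    return positives / negatives


# Hypothetical scores for one day's headlines (illustrative values only).
ratio = positive_negative_ratio([0.6, 0.2, -0.4, 0.0])
print(ratio)
```

Returning `None` for a day without negative items avoids a division by zero; how such days should be treated in the analysis is a separate design choice.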
Auto- & Cross-Correlation
Auto-correlation and cross-correlation were performed to identify seasonal (repeating) patterns within each time-series data set and between the two, respectively. Linear regression between the two data sets was also performed and suggested a positive correlation between them. The cross-correlation suggested a strong zero-lag (same-day) correlation between the daily ratio of positive-to-negative headlines and that of the charts’ song titles (Pearson correlation coefficient = 0.96). In other words, the sentiment of the news for a given day seems to be correlated with the sentiment of the charts’ song titles for that same day.
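To make the idea of lagged correlation concrete, here is a minimal pure-Python sketch that computes the Pearson coefficient between two daily series at several lags. The function names and the sample series are hypothetical and purely illustrative; the project itself may well have used a library routine instead.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def cross_correlation(x, y, max_lag):
    """Pearson r between x[t] and y[t + lag] for each lag in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        result[lag] = pearson(xs, ys)
    return result


# Hypothetical daily ratios (illustrative values, not project data).
news = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0]
charts = [2 * v + 1 for v in news]  # constructed to track the news on the same day

cc = cross_correlation(news, charts, max_lag=2)
best_lag = max(cc, key=cc.get)
print(best_lag, round(cc[best_lag], 3))
```

Passing the same series as both arguments, for example `cross_correlation(news, news, max_lag=2)`, gives the auto-correlation at each lag. With only 12 daily points, each additional lag removes a data point from the comparison, so coefficients at larger lags should be read with caution.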