See below a selection of my articles in the Data Slice publication.
3.5 Years of a Relationship, in Whatsapp Messages
Analysing data from a WhatsApp chat with my girlfriend.
An Automated Framework for Predicting Sports Results
In this article, I discuss my Mel Rugby project (see the 'Projects' section).
A Game of Words
My very first blog post, analysing transcripts from the TV series Game of Thrones.
Below are some of the key projects I am working on in my spare time, each with a brief description. All of these projects are openly available on my GitHub (links included below).
Mel is the name of both a Twitter bot I created for the 2019 Rugby World Cup to tweet match score predictions and the machine learning project that produced those predictions.
I started this project shortly before the 2019 Rugby World Cup and developed it in R. The framework has four core parts:
The first part uses various web-scraping packages in R (mainly httr, xml2, rvest and RSelenium) to collect rugby match data from across the internet. This includes match results and statistics (for matches played since the previous run), upcoming fixtures and team announcements for upcoming games.
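The project does this in R with rvest; purely to illustrate the parsing step, here is a Python sketch that uses the standard library's `html.parser` to turn a results table into rows. The `TableParser` class, the table layout and the sample HTML are hypothetical and are not taken from any real rugby site or from the Mel code.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Ignore text outside cells (e.g. whitespace between tags).
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical results table, standing in for a scraped page.
SAMPLE_HTML = """
<table>
  <tr><th>Home</th><th>Away</th><th>Score</th></tr>
  <tr><td>England</td><td>Wales</td><td>20-10</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
parser.close()
header, *matches = parser.rows
print(header)   # ['Home', 'Away', 'Score']
print(matches)  # [['England', 'Wales', '20-10']]
```

Pages that build their tables with JavaScript would not contain these rows in the raw HTML, which is the kind of case where a browser-driving tool such as RSelenium is needed.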
The second part transforms the raw data into a selection of structured, readable tables and extracts additional features such as team form, individual player form, a hybrid world ranking and the relative strength of forwards versus backs.
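The exact definition of team form is not given here. As one plausible sketch, assuming form means a team's average points margin over its last few matches, the feature can be computed in date order so that each match only sees results from before it was played. The `team_form` function, the window of 5 matches and the sample scores are all hypothetical, and the sketch is in Python rather than the project's R.

```python
from collections import defaultdict, deque

FORM_WINDOW = 5  # hypothetical: how many recent matches count towards form


def _average(values):
    return sum(values) / len(values) if values else 0.0


def team_form(results, window=FORM_WINDOW):
    """For each (home, away, home_pts, away_pts) result in date order, return the
    (home_form, away_form) pair each team had *before* that match was played."""
    recent = defaultdict(lambda: deque(maxlen=window))  # team -> last margins
    features = []
    for home, away, home_pts, away_pts in results:
        features.append((_average(recent[home]), _average(recent[away])))
        # Only update after recording the features, so a match never sees itself.
        recent[home].append(home_pts - away_pts)
        recent[away].append(away_pts - home_pts)
    return features


# Made-up scores, listed oldest first.
results = [
    ("England", "Wales", 20, 10),
    ("Wales", "Ireland", 15, 15),
    ("Ireland", "England", 12, 24),
]
print(team_form(results))  # [(0.0, 0.0), (-10.0, 0.0), (0.0, 10.0)]
```

Computing the features before updating each team's history avoids leaking a match's own result into the inputs used to predict it.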
The third part, modelling, uses a combination of models. After experimenting, I found the best results came from first using a decision tree classifier to identify 'high-scoring games', where the match is likely to be very one-sided. Each upcoming fixture was then fed into a model trained on previous matches of the same type: for example, if a game is classified as high-scoring, its result is predicted by a model trained on high-scoring games. This helped to deal with the historic data being imbalanced towards low-scoring games.
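The two-stage idea can be sketched in plain Python. To stay dependency-free, stage one here is a depth-1 decision tree (a "stump") on the absolute ranking gap, and stage two fits a separate least-squares line for the points margin within each class. The single `ranking_gap` feature, the 30-point cut-off for a "high-scoring" game and the sample history are hypothetical; the real framework used its own features and models in R.

```python
from dataclasses import dataclass
from statistics import mean

HIGH_SCORING_MARGIN = 30  # hypothetical cut-off for a one-sided, "high-scoring" game


@dataclass
class Match:
    ranking_gap: float  # hypothetical feature: home rating minus away rating
    margin: float       # home points minus away points


def fit_stump(gaps, labels):
    """Depth-1 decision tree: choose the |gap| threshold with the fewest misclassifications."""
    best_threshold, best_errors = None, None
    for t in sorted({abs(g) for g in gaps}):
        errors = sum((abs(g) >= t) != y for g, y in zip(gaps, labels))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def fit_two_stage(history):
    """Stage 1: classify high-scoring games. Stage 2: one regression per class.
    Assumes the history contains both kinds of match."""
    gaps = [m.ranking_gap for m in history]
    labels = [abs(m.margin) >= HIGH_SCORING_MARGIN for m in history]
    threshold = fit_stump(gaps, labels)
    models = {}
    for is_high in (True, False):
        subset = [m for m, y in zip(history, labels) if y == is_high]
        models[is_high] = fit_line([m.ranking_gap for m in subset],
                                   [m.margin for m in subset])
    return threshold, models


def predict(threshold, models, ranking_gap):
    """Route the fixture to the model for its predicted class."""
    is_high = abs(ranking_gap) >= threshold
    slope, intercept = models[is_high]
    return is_high, slope * ranking_gap + intercept


# Made-up history: mostly close games plus a few one-sided ones.
history = [
    Match(2, 3), Match(-1, -5), Match(4, 7), Match(-3, -2), Match(1, 0),
    Match(20, 45), Match(-25, -50), Match(18, 38), Match(22, 40),
]
threshold, models = fit_two_stage(history)
print(threshold)                       # 18
print(predict(threshold, models, 21))  # classified as high-scoring
print(predict(threshold, models, 2))   # classified as low-scoring
```

Fitting the high-scoring model only on one-sided games means those relatively rare matches are not swamped by the many close games, which is the imbalance the two-stage approach is meant to address.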
The fourth and final part is a series of scripts that take the outputs of the previous steps, format them and tweet them from the @mel_rugby Twitter account. These include predictions for upcoming games, results of previous games and updates on the framework's accuracy.
All four parts of the framework were combined into a single master script, so it could be scheduled to run regularly without any human intervention.
'surveyr' is an R package I developed from scratch to enable quick and easy analysis of survey responses. It is available on my GitHub.
I developed it as part of the Data Science Labs programme at the Department for Education, where I was presented with a problem that colleagues in the Department were facing.
For large surveys, colleagues were struggling to analyse the responses: they either analysed small samples by hand or contracted the work out for several thousand pounds.
I approached this by using my expertise in text analytics and natural language processing to develop a package that would allow other analysts in the Department to analyse survey responses quickly and easily, even with no prior experience of text analysis.
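surveyr's own functions are not described here, so the following is only a generic Python sketch of one common first step in analysing free-text responses: counting how many responses mention each term once stopwords are removed. The `STOPWORDS` set, the `top_terms` function and the sample responses are hypothetical and are not part of surveyr.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real analyses use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "was", "is", "it", "to", "of", "for", "too", "more"}


def top_terms(responses, n=5):
    """Return the n terms mentioned by the most responses, ignoring stopwords."""
    counts = Counter()
    for text in responses:
        # Count each term once per response (document frequency, not raw frequency).
        terms = set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS
        counts.update(terms)
    return counts.most_common(n)


responses = [
    "The guidance was too long",
    "Guidance was unclear and long",
    "More funding for schools",
    "The form was long",
]
print(top_terms(responses, n=2))  # [('long', 3), ('guidance', 2)]
```

Counting each term once per response keeps a single verbose respondent from dominating the results, which matters when the goal is to see how widely a theme is shared.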
Send me an email directly and I'll get back to you as soon as possible.