Chris Brownlie

Data Scientist & Writer

Nottingham, UK


Experience

Data Scientist & Statistical Officer - Department for Education

  • Data Scientist (Higher Statistical Officer)

    • Creating complex R Shiny applications for interactive analysis
    • Developing SQL Server databases for the collection and storage of data
    • Creating interactive dashboards using Microsoft Power BI
    • Modelling complex funding using multiple data sources

  • Data Scientist

    • Providing monthly financial forecasts for an £18bn spend
    • Transferring complex funding models from Excel into R
    • Creating an R package for survey analysis

  • Graduate Data Engineer

    • Experience manipulating and exploring a complex SQL Server Data Warehouse
    • Extensive learning of SQL and R
    • Development of interactive applications for monitoring Data Warehouse processes

Editor - Data Slice

  • Editor

    • 500+ subscribers
    • Averaging over 4k views per article
    • Data Slice is an online publication I founded to host my blog and to combat the devaluation of Data Science articles on Medium
    • Articles focus on interesting datasets or topics and always aim to provide novel insights
    • See 'Articles' section for examples

Writer - Towards Data Science

  • Writer

    • 400+ followers of my personal profile
    • 50k+ views across all 12 articles
    • Towards Data Science is the largest Data Science publication on Medium with over 8m monthly viewers
    • My first blog post in April 2019 gained several thousand views, after which I was asked to contribute to TDS
    • See 'Articles' section for examples


Education

  • MSc Data Science - University of Sheffield

    • Studied part-time whilst working as a Data Scientist at the DfE
    • Key modules include: Data Mining, Data Visualisation, Researching Social Media & Database Design
    • Awarded funding by the Department for Education Analytical Community
  • BSc Economics - University of Nottingham

    • Key modules: Econometrics, Mathematical Modelling, Development Economics
    • Campus Ambassador for Nottingham Economics and Finance Society
    • Awarded Gainsborough Prize 2017 for best submission to the Nottingham Economic Review


Articles

Below is a selection of my articles in the Data Slice publication.


3.5 Years of a Relationship, in WhatsApp Messages

Analysing data from a WhatsApp chat with my girlfriend.

An Automated Framework for Predicting Sports Results

In this article I discuss my Mel Rugby project (see 'Projects' section).

A Game of Words

My very first blog post, analysing transcripts from the TV series Game of Thrones.




Projects

Below you can see some of the key projects I am working on in my spare time, along with a brief description. All of these projects are openly available on my GitHub (links included below).


Mel Rugby

Mel is the name of both a Twitter bot which I created for the Rugby World Cup 2019 to tweet score predictions for matches, and the Machine Learning project which produced the predictions.

I started this project shortly before the Rugby World Cup in 2019 and developed it using R. There are four core parts to the framework:

  1. The Scraper

    The first part of the project uses various webscraping packages in R (mainly httr, xml2, rvest and RSelenium) to scrape a variety of rugby match data from across the internet. This includes match results and statistics (since the last run), upcoming fixtures and team announcements for upcoming games.

  2. The Feature Extractor

    In this part of the project, the raw data is transformed into a selection of structured, readable tables. Additional features are also extracted such as: team form, individual player form, a hybrid world ranking and relative strength of forwards vs backs.

  3. The Models

    I used a combination of models to produce the predictions. After some investigation, I found I got the best results by first using a decision tree classifier to identify 'high-scoring games' (those where the match is likely to be very one-sided). Depending on how an upcoming fixture was classified, it was then fed into a model trained on previous instances of that type of match: for example, if an upcoming game is classified as high-scoring, its result is predicted using the model trained on high-scoring games. This helped to deal with the issue of historic data being imbalanced towards low-scoring games (see the sketch after this list).

  4. Mel

    The final stage of the framework is a series of scripts which take outputs from previous steps, format them and then tweet them from the @mel_rugby Twitter account. This includes tweeting predictions for upcoming games as well as results for previous games and updates on the accuracy of the framework.
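
A minimal sketch of the classify-then-route idea from step 3 is shown below, using rpart for the decision tree and simple linear models. The column names, thresholds and synthetic data are illustrative assumptions, not the exact features or models used in Mel Rugby.

    library(rpart)

    # Synthetic match history with made-up features (illustrative only)
    set.seed(42)
    history <- data.frame(
      ranking_gap = rnorm(200, mean = 0, sd = 6),
      home_form   = runif(200),
      away_form   = runif(200)
    )
    history$high_scoring  <- factor(ifelse(history$ranking_gap > 4, "yes", "no"))
    history$points_margin <- 3 * history$ranking_gap + rnorm(200, sd = 8)

    # Step 1: a decision tree classifier flags likely 'high-scoring' games
    classifier <- rpart(high_scoring ~ ranking_gap + home_form + away_form,
                        data = history, method = "class")

    # Step 2: a separate result model is trained on each type of match
    model_high <- lm(points_margin ~ ranking_gap + home_form + away_form,
                     data = subset(history, high_scoring == "yes"))
    model_low  <- lm(points_margin ~ ranking_gap + home_form + away_form,
                     data = subset(history, high_scoring == "no"))

    # Step 3: each upcoming fixture is routed to the model matching its class
    fixtures <- data.frame(ranking_gap = c(9, -1),
                           home_form   = c(0.8, 0.4),
                           away_form   = c(0.3, 0.5))
    game_type <- predict(classifier, newdata = fixtures, type = "class")
    predicted_margin <- ifelse(game_type == "yes",
                               predict(model_high, newdata = fixtures),
                               predict(model_low,  newdata = fixtures))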

All four parts of the framework were combined into a single master script, meaning the script could be scheduled to run regularly and required no human intervention.
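
As an illustration, a master script of this kind might be structured as in the sketch below. The file names and the scheduling call are assumptions for illustration; the actual scripts are in the GitHub repository.

    # master.R - runs the full pipeline end to end (file names illustrative)
    source("01_scraper.R")            # scrape latest results, fixtures and team news
    source("02_feature_extractor.R")  # build structured tables and derived features
    source("03_models.R")             # classify fixtures and predict results
    source("04_mel.R")                # format predictions and tweet from @mel_rugby

    # One option for unattended scheduling on a Linux machine (assumption):
    # cronR::cron_add(cronR::cron_rscript("master.R"), frequency = "daily")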

surveyr

'surveyr' is an R package I developed from scratch to enable quick and easy analysis of survey responses. It is available on my GitHub.

This was developed as part of the Data Science Labs programme at the Department for Education. During the programme I was presented with a problem that colleagues in the Department were experiencing.

The problem was that for large surveys, colleagues were struggling to analyse the responses, either taking small samples and analysing by hand or contracting the work out for several thousand pounds.

I approached this by using my expertise in Text Analytics and Natural Language Processing to develop a package which would allow other analysts in the Department to quickly and easily analyse survey responses, even if they had no experience with text analysis.
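
As an illustration of the kind of text analysis the package automates, the sketch below uses dplyr and tidytext to summarise free-text responses. It is a hypothetical example on made-up data, not surveyr's actual interface.

    library(dplyr)
    library(tidytext)

    # Made-up free-text survey responses (illustrative data only)
    responses <- tibble(
      id     = 1:3,
      answer = c("The new guidance was clear and easy to follow",
                 "More support is needed for smaller schools",
                 "The guidance arrived too late to be useful")
    )

    # Tokenise, drop stop words and count the most frequent terms,
    # the kind of summary an analyst would otherwise produce by hand
    responses %>%
      unnest_tokens(word, answer) %>%
      anti_join(stop_words, by = "word") %>%
      count(word, sort = TRUE)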

Contact Me

Send me an email directly and I'll get back to you as soon as possible.