Hello!

I'm Dr. William Nicholson

Data Scientist, Project Manager, Machine Learning Expert

IBM Machine Learning Professional (2022), AWS Cloud Technician, GCP Cloud Digital Leader

PRINCE2® Practitioner

Over 10 years of experience across academia and industry

Skill Sets

Articles by topic

Articles and coding demonstrations covering data science and analysis topics that reflect my main skill sets and career interests.

Most recent articles

The six most recent blog posts are shown below. For the rest, please see the section just above. Thank you!

Being clear and concise when speaking publicly

Speaking in public, or in front of an audience at work, can be daunting. But there are ways to bring structure and order to our thoughts and to how we convey them. In this article I'll discuss three simple techniques for making your next business speech or presentation sharp, to the point, and engaging for your audience.

Building a data science team

Whether you're a start-up or a large, established organization, setting up a data science team is no easy feat. Even in a leadership or management position, you're unlikely to be able to do everything yourself, and it would be unreasonable to expect that of anyone! Instead, your leadership, communication, and motivation skills are best used to build and lead a data science team that is motivated towards your goals. The question therefore becomes: what are the typical roles in a data science team? That's what we'll look at in this article.

Topic Modelling with BERTopic and DataMapPlot

Topic modelling is a form of text analysis that uses unsupervised machine learning to identify patterns, themes, clusters, and groups across a collection of documents. In this article I discuss using the powerful BERTopic library alongside quantized large language models to identify themes and topics in a collection of research papers at the intersection of artificial intelligence and ophthalmology. We'll then use the DataMapPlot library to produce a publication-ready visualization of the thematic structure within those papers' abstracts.

Loading data into Google Colab

Google Colab (or Colaboratory) is a complete, modern, cloud-based runtime environment. It lets individuals and teams work together on coding, data science, and machine learning problems with shared access to data, state-of-the-art GPUs and TPUs, and industry-standard Python libraries. You work in an executable document, a notebook much like a Jupyter Notebook, in which you can write and run code, write Markdown, and visualize your results. In this article I'll cover one of the most fundamental requirements every team will face: how to get data into your Colab notebook in the first place!
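
One common route is uploading a file from your own machine with `google.colab.files.upload()`, which returns a dictionary mapping each uploaded filename to its contents as bytes. Below is a minimal sketch, assuming a UTF-8 encoded CSV file; the helper `rows_from_upload` and the file `scores.csv` are hypothetical, the upload is simulated with a literal dictionary so the snippet runs outside Colab, and the article itself may well use a different method.

```python
import csv
import io


def rows_from_upload(uploaded: dict, filename: str) -> list:
    """Decode one uploaded CSV file into a list of row dictionaries."""
    text = uploaded[filename].decode("utf-8")  # assumes UTF-8 encoding
    return list(csv.DictReader(io.StringIO(text)))


# Inside a Colab notebook, `uploaded` would come from the upload widget:
#   from google.colab import files
#   uploaded = files.upload()  # returns {filename: file contents as bytes}
# Here the same shape of dictionary is simulated so the example runs anywhere.
uploaded = {"scores.csv": b"name,score\nada,91\nalan,88\n"}

rows = rows_from_upload(uploaded, "scores.csv")
for row in rows:
    print(row["name"], row["score"])
```

If pandas is available, `pandas.read_csv(io.BytesIO(uploaded["scores.csv"]))` turns the same bytes into a DataFrame directly.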

Database Normalization

Even a good database design can't always protect against bad data. But on plenty of occasions a good design does help us avoid the bigger headaches that bad data can cause. In this article I'll discuss database normalization: what it is, why we do it, and how we do it. I'll also discuss how to judge when a database table is normalized 'enough', and which signs suggest that bad data might pose a problem for our database.
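
To make the idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables, columns, and data are hypothetical and not taken from the article: a flat `orders_flat` table repeats customer details on every order, and splitting it into `customers` and `orders` stores each customer fact once, so changing an email becomes a single-row update rather than one update per order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalized table: customer details repeat on every order row.
cur.execute(
    "CREATE TABLE orders_flat ("
    "order_id INTEGER PRIMARY KEY, customer_name TEXT, "
    "customer_email TEXT, product TEXT)"
)
cur.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [
        (1, "Ada", "ada@example.com", "Keyboard"),
        (2, "Ada", "ada@example.com", "Monitor"),
        (3, "Alan", "alan@example.com", "Mouse"),
    ],
)
# Changing Ada's email here would mean updating every one of her order rows.
flat_rows_for_ada = cur.execute(
    "SELECT COUNT(*) FROM orders_flat WHERE customer_name = 'Ada'"
).fetchone()[0]

# Normalized design: each customer is stored once; orders refer to it by key.
cur.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "email TEXT NOT NULL UNIQUE)"
)
cur.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(customer_id), "
    "product TEXT NOT NULL)"
)

# Move the data: distinct customers first, then orders keyed by customer_id.
cur.execute(
    "INSERT INTO customers (name, email) "
    "SELECT DISTINCT customer_name, customer_email FROM orders_flat"
)
cur.execute(
    "INSERT INTO orders (order_id, customer_id, product) "
    "SELECT f.order_id, c.customer_id, f.product "
    "FROM orders_flat AS f JOIN customers AS c ON c.email = f.customer_email"
)
customer_count = cur.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

# The same email change is now a single-row update.
cur.execute("UPDATE customers SET email = 'ada@new.example' WHERE name = 'Ada'")
updated = cur.rowcount

# A join reconstructs the original flat view whenever it is needed.
rows = cur.execute(
    "SELECT o.order_id, c.name, c.email, o.product "
    "FROM orders AS o JOIN customers AS c USING (customer_id) "
    "ORDER BY o.order_id"
).fetchall()
conn.close()

print(flat_rows_for_ada, updated)  # 2 1
for row in rows:
    print(row)
```

The trade-off is that reads now need a join, which is usually a fair price for removing the risk of one customer ending up with two different emails.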

Getting started with Amazon Web Services

Amazon Web Services (AWS) was the most popular cloud provider as of the first quarter of 2022, controlling 33% of the entire market and beating Microsoft Azure's 21% share. Organizations and individual developers alike use cloud services from AWS, Microsoft, and other vendors for machine learning, data analytics, cloud-native development, application migration, and much more. In this short article I'll discuss how to set up your own AWS account as the root user, then how to create your first IAM admin user, before introducing some of the main services that stand out as the most useful throughout a data scientist's career.