Hello!

I'm Dr. William Nicholson

Data Scientist, Project Manager, Machine Learning Expert

IBM Machine Learning Professional (2022), AWS Cloud Technician, GCP Cloud Digital Leader

PRINCE2® Practitioner

>10 years' experience across academia and industry

Skill Sets

Articles by topic

Articles and coding demonstrations covering topics of data science and analysis that reflect my prominent skill sets and career interests

Most recent articles

The eight most recent blog posts are shown below. For the rest, please see the section just above. Thank you!

Clustering and the k-means algorithm

In this article I consider the task of clustering a collection of vectors into groups of vectors that are close to each other, as measured by the distance between pairs of them. In particular, I focus on a well-known clustering method, the k-means algorithm, and give some typical applications.

Foundations of Project Management

A strong foundation in project management can help you start a great career as a project manager, or push your career beyond software implementation alone. In this article I discuss the fundamentals of project management, including what a project is and what a project manager does, before moving on to the typical daily activities of a project manager and the traditional project management roles seen in the technology and IT industries.

Automatic Text Summarization: the plasticity of language

Automatic text summarization (ATS) techniques offer powerful solutions for generating accurate and informative summaries from textual data. In our digital age, where an estimated 403 million terabytes of data are generated daily, it is vital that we are able to distil large amounts of textual content into focused summaries containing just the salient details. In this article I provide an updated survey of state-of-the-art ATS methods, with a particular focus on how large language models address the complexities and nuances of automated text summarization.

Basic linear algebra: in NumPy, PyTorch and TensorFlow

Understanding even the simplest machine learning algorithm requires a solid grasp of the basics of linear algebra. We can think of linear algebra as a foundational tool that provides the mathematical framework we need to build and interpret sophisticated machine learning models. In this article, I'll guide you through the core concepts of linear algebra, starting from simple scalar arithmetic and progressing to the complexities of matrix multiplication.

Being clear and concise when speaking publicly

Speaking in public, or in front of an audience at work, can be daunting. But there are ways we can bring structure and order to our thoughts and how we convey them. In this article I'll discuss three simple techniques for ensuring your next business speech or presentation is sharp, to the point, and engages the audience and their opinion.

Building a data science team

Whether you're a new start-up or a large, established organization, setting up a data science team is no easy feat. While you may be in a leadership or management position, you're unlikely to be able to do everything yourself. It would be unreasonable to expect this of anyone! Instead, your leadership, communication, and motivation skills are best used to create and lead a data science team that is motivated towards your goals. The question therefore becomes: what are the typical roles in a data science team? That's what we'll look at in this article.

Topic Modelling with BERTopic and DataMapPlot

Topic modelling is a form of text analysis that uses unsupervised machine learning to identify patterns, themes, clusters, and groups across a collection of documents. In this article I discuss using the powerful BERTopic library alongside quantized large language models to identify themes and topics from a collection of research papers at the intersection of artificial intelligence and ophthalmology. Then, we'll use the DataMapPlot library to produce a publication-ready visualization of the thematic structure contained within the abstracts of those research papers.

Loading data into Google Colab

Google Colab (or Colaboratory) is a complete, modern, cloud-based runtime environment. It gives individuals and teams the ability to work together on coding, data science and machine learning problems with shared access to data, state-of-the-art GPUs and TPUs, and industry-standard Python libraries. You are provided with an executable document, i.e. a notebook much like a Jupyter Notebook, that allows both code and markdown to be written and executed, and your results visualized. In this article I'll cover one of the most fundamental requirements all teams will face: how to get data into your Colab notebook in the first place!