Hello!

I'm Dr. William Nicholson

Data Scientist, Project Manager, Machine Learning Expert

IBM Machine Learning Professional (2022), AWS Cloud Technician, GCP Cloud Digital Leader

PRINCE2® Practitioner

More than 10 years' experience across academia and industry

Skill Sets

Articles by topic

Articles and coding demonstrations covering topics in data science and analysis that reflect my main skill sets and career interests.

Most recent articles

The eight most recent blog posts are shown below. For the rest, please see the section just above. Thank you!

Automatic Text Summarization: the plasticity of language

Automatic text summarization (ATS) techniques offer powerful solutions for generating accurate and informative summaries from textual content. In our digital age, where an estimated 403 million terabytes of data are generated daily, it is vital that we are able to distil large amounts of text into focused summaries containing just the salient details. In this article I provide an updated survey of state-of-the-art ATS methods, with a particular focus on how large language models address the complexities and nuances of automated text summarization.

Being clear and concise when speaking publicly

Speaking in public, or in front of an audience at work, can be daunting. But there are ways we can bring structure and order to our thoughts and how we convey them. In this article I'll discuss three simple techniques for ensuring your next business speech or presentation is sharp, to the point, and engages the audience and their opinion.

Building a data science team

Whether you're a start-up just getting off the ground or a large, established organization, setting up a data science team is no easy feat. While you may be in a leadership or management position, you're unlikely to be able to do everything yourself. It would be unreasonable to expect this of anyone! Instead, your leadership, communication, and motivation skills are best used to create and lead a data science team that is motivated towards your goals. The question therefore becomes: what are the typical roles in a data science team? That's what we'll look at in this article.

Topic Modelling with BERTopic and DataMapPlot

Topic modelling is a form of text analysis that uses unsupervised machine learning to identify patterns, themes, clusters, and groups across a collection of documents. In this article I discuss using the powerful BERTopic library alongside quantized large language models to identify themes and topics in a collection of research papers at the intersection of artificial intelligence and ophthalmology. Then, we'll use the DataMapPlot library to produce a publication-ready visualization of the thematic structure contained within the abstracts of those research papers.

Loading data into Google Colab

Google Colab (or Colaboratory) is a complete, modern, cloud-based runtime environment. It gives individuals and teams the ability to work together on coding, data science, and machine learning problems with shared access to data, state-of-the-art GPUs and TPUs, and industry-standard Python libraries. You are provided with an executable document, i.e. a notebook much like a Jupyter Notebook, in which you can write and run code, write markdown, and visualize your results. In this article I'll cover one of the most fundamental requirements all teams will face: how to get data into your Colab notebook in the first place!

Database Normalization

Even a good database design can't always protect against bad data. But there are plenty of occasions when a good database design helps us avoid the worst of the headaches that bad data can cause. In this article I'll discuss database normalization: what it is, why we do it, and how we do it. I'll also discuss how we can determine when a database table is normalized 'enough', and which signs suggest that bad data might pose a problem for our database.

Getting started with Amazon Web Services

Amazon Web Services (AWS) was the most popular cloud provider through the first quarter of 2022, controlling 33% of the market and beating Microsoft Azure and its 21% share. Both organizations and individual developers use cloud services from AWS, Microsoft, and other vendors for machine learning, data analytics, cloud-native development, application migration, and many other purposes. In this short article I'll discuss how to set up your own AWS account as the root user, then how to create your first IAM admin user, before finally introducing some of the main services that stand out as the most useful throughout a data scientist's career.

Integrating Django with Tailwind CSS

Django is one of the most popular Python full-stack web frameworks available. Its high-level design makes rapid development of web apps easier and cleaner while requiring less code. Tailwind CSS is rapidly becoming the first-choice CSS framework for styling modern websites. Its utility-first approach makes it far easier to create beautifully styled websites and apps with consistent choices of colour, spacing, typography (and everything else CSS). In this article I'll show you how you can combine these two frameworks so that they work together without losing the ability to monitor development changes with both the npm and Django development servers.