In this article I consider the task of clustering a collection of vectors into groups, or clusters, of vectors that are close to each other, as measured by the distance between pairs of them. In particular, I focus on a famous clustering method, the k-means algorithm, and give some typical applications.
A strong foundation in project management can help you start a great career as a project manager, or help you enhance and push your career beyond software implementation alone. In this article I discuss the fundamentals of project management, including what a project is and what a project manager does, before moving on to the typical daily activities of a project manager and the traditional project management roles found in the technology and IT industries.
Automatic text summarization (ATS) techniques offer powerful solutions for generating accurate and informative summaries from textual content. In our digital age, where an estimated 403 million terabytes of data are generated daily, it is vital that we are able to distil large amounts of text into focused summaries containing just the salient details. In this article I provide an updated survey of state-of-the-art ATS methods, with a particular focus on how large language models address the complexities and nuances of automated text summarization.
Understanding even the simplest machine learning algorithm requires a solid grasp of the basics of linear algebra. We can think of linear algebra as a foundational tool that provides the mathematical framework we need when we later build and interpret sophisticated machine learning models. In this article, I'll guide you through the core concepts of linear algebra, starting from simple scalar arithmetic and progressing to matrix multiplication.
Speaking in public, or in front of an audience at work, can be daunting. But there are ways we can bring structure and order to our thoughts and to how we convey them. In this article I'll discuss three simple techniques for making sure your next business speech or presentation is sharp, to the point, and engaging for your audience.
Whether you're a start-up just getting going or a large established organization, setting up a data science team is no easy feat. Even in a leadership or management position, you're unlikely to be able to do everything yourself, and it would be unreasonable to expect this of anyone! Instead, your leadership, communication, and motivation skills are best used to build and lead a data science team that is motivated towards your goals. The question therefore becomes: what are the typical roles in a data science team? That's what we'll look at in this article.
Topic modelling is a form of text analysis that uses unsupervised machine learning to identify patterns, themes, clusters, and groups across a collection of documents. In this article I discuss using the powerful BERTopic library alongside quantized large language models to identify themes and topics in a collection of research papers at the intersection of artificial intelligence and ophthalmology. We'll then use the DataMapPlot library to produce a publication-ready visualization of the thematic structure contained within the abstracts of those papers.
Google Colab (or Colaboratory) is a complete, modern, cloud-based runtime environment. It gives individuals and teams the ability to work together on coding, data science, and machine learning problems with shared access to data, state-of-the-art GPUs and TPUs, and industry-standard Python libraries. You are provided with an executable document, i.e. a notebook much like a Jupyter Notebook, in which you can write and run both code and markdown and visualize your results. In this article I'll cover one of the most fundamental requirements every team will face: how to get data into your Colab notebook in the first place!