Automatic text summarization (ATS) techniques offer powerful solutions for generating accurate and informative summaries from textual content. In our digital age, where an estimated 403 million terabytes of data are generated daily, it is vital that we are able to distil large amounts of textual content into focused summaries containing just the salient details. In this article I provide an updated survey of state-of-the-art ATS methods, with a particular focus on how large language models address the complexities and nuances of automated text summarization.
Speaking in public, or in front of an audience at work, can be daunting. But there are ways we can bring structure and order to our thoughts and to how we convey them. In this article I'll discuss three simple techniques for ensuring your next business speech or presentation is sharp, to the point, and engages the audience and their opinions.
Whether you're a company just starting out or a large existing organization, setting up a data science team is no easy feat. While you may be in a leadership or management position, you're unlikely to be able to do everything yourself. It would be unreasonable to expect this of anyone! Instead, your leadership, communication, and motivation skills are best used to create and lead a data science team that is motivated towards your goals. The question therefore becomes: what are the typical roles in a data science team? That's what we'll look at in this article.
Topic modelling is a form of text analysis that uses unsupervised machine learning to identify patterns, themes, clusters, and groups across a collection of documents. In this article I discuss using the powerful BERTopic library alongside quantized large language models to identify themes and topics from a collection of research papers at the intersection of artificial intelligence and ophthalmology. Then, we'll use the DataMapPlot library to produce a publication-ready visualization of the thematic structure contained within the abstracts of those research papers.
Google Colab (or Colaboratory) is a complete, modern, cloud-based runtime environment. It gives individuals and teams the ability to work together on coding, data science, and machine learning problems with shared access to data, state-of-the-art GPUs and TPUs, and industry-standard Python libraries. You are provided with an executable document, i.e. a notebook much like a Jupyter Notebook, that allows both code and markdown to be written and executed, and your results visualized. In this article I'll cover one of the most fundamental requirements all teams will face: how to get data into your Colab notebook in the first place!
Even a good database design can't always protect against bad data. But there are plenty of occasions when a good database design helps us avoid the biggest of the database headaches that bad data can cause. Thus, in this article I'll discuss database normalization: what it is, why we do it, and how we do it. I'll also discuss how we can determine when a database table is normalized 'enough', and what indications we can look for that suggest bad data might pose a problem for our database.
Amazon Web Services (AWS) was the most popular cloud provider through the first quarter of 2022, controlling 33% of the entire market, ahead of Microsoft Azure and its 21% share. Both organizations and individual developers use cloud services from AWS, Microsoft, and other vendors for machine learning, data analytics, cloud-native development, application migration, and many other services. In this short article I'll discuss how to set up your own AWS account as the root user, then how to create your first IAM admin user, before finally introducing some of the main services that stand out as the most useful throughout a data scientist's career.
Django is one of the most popular Python full-stack web frameworks available. Its high-level design makes rapid development of web apps easier and cleaner while requiring less code. TailwindCSS is rapidly becoming the first-choice CSS framework for styling modern websites. Its utility-first approach makes it far easier to create beautifully styled websites and apps with consistent choices of colour, spacing, typography (and everything else CSS). In this article I'll show you how you can combine these two frameworks so that they work together, without losing the ability to watch for changes during development with both the npm and Django development servers.