Introduction
Whether you’re a start-up just getting off the ground or a large, established organization, setting up a data science team is no easy feat. Even in a leadership or management position, you’re unlikely to be able to do everything yourself, and it would be unreasonable to expect this of anyone! Instead, your leadership, communication, and motivation skills are best used to create and lead a data science team that is motivated towards your goals. The question therefore becomes: what are the typical roles in a data science team?
The first thing to realize is that quality data science is best achieved through team effort. It takes a group of people working together to deliver real, practical data science projects that bring true value to the company or organization. Those people include the data scientists, the managers of those data scientists, and the data engineers who build and maintain the infrastructure.
Let’s consider the key positions within a typical data science team.
Data Science Team Roles
Data Engineer
The data engineer is usually the person responsible for constructing and maintaining the databases in which data is held. This could be data generated internally by the organization as part of its day-to-day activities, or data generated by clients and customers of the company (customer accounts, for example). It could also be externally acquired data, e.g. from web scraping activities.
By extension, the data engineer is also the person you would turn to for extracting data from that database as and when the data scientists need it for analysis. This might mean maintaining multiple connected database tables. In particular, it could mean maintaining several tables that hold the same data at different levels of quality. For example, raw data might be acquired from the interactions of customers and clients with the company website; let’s suppose we store this raw data in a database table referred to as “bronze data”. The data engineer could then combine it with a separate table of data from, e.g., social media accounts or customer spending history. Doing so enhances the existing data in the “bronze data” table, creating a new data set that is richer or more complete, which we might refer to as the “silver data” table. Combining data in this way helps the data engineer meet data quality targets on dimensions such as completeness, accuracy, consistency, validity, uniqueness, and integrity.
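The bronze-to-silver step can be sketched in a few lines. This is only an illustration under stated assumptions: the table names `bronze_data`, `spending_history`, and `silver_data`, their columns, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for whatever database the team actually runs. The `completeness` helper shows one simple way to put a number on a single quality dimension.

```python
import sqlite3


def build_silver(conn):
    """Create a 'silver' table by enriching raw 'bronze' rows with spending history."""
    conn.executescript("""
        DROP TABLE IF EXISTS silver_data;
        CREATE TABLE silver_data AS
        SELECT b.customer_id, b.page_visited, s.total_spend
        FROM bronze_data AS b
        LEFT JOIN spending_history AS s ON s.customer_id = b.customer_id;
    """)


def completeness(conn, table, column):
    """Fraction of rows in `table` whose `column` is not NULL."""
    # Identifiers are interpolated into the SQL, so only pass trusted names here.
    total, filled = conn.execute(
        f"SELECT COUNT(*), COUNT({column}) FROM {table}"
    ).fetchone()
    return filled / total if total else 0.0


# Hypothetical sample data: three website visits, spending known for two customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bronze_data (customer_id INTEGER, page_visited TEXT);
    CREATE TABLE spending_history (customer_id INTEGER, total_spend REAL);
    INSERT INTO bronze_data VALUES (1, '/home'), (2, '/pricing'), (3, '/blog');
    INSERT INTO spending_history VALUES (1, 120.0), (2, 45.5);
""")

build_silver(conn)
silver_completeness = completeness(conn, "silver_data", "total_spend")
```

With this sample data, two of the three silver rows have a known `total_spend`, so the completeness score for that column is two thirds. A `LEFT JOIN` is used so that customers without spending history are kept and show up as gaps, rather than silently disappearing from the silver table.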
The data engineer may also be expected to put machine learning models into production: as their data scientist teammates deliver models, the data engineer deploys them on servers for large-scale or real-time use.
Thus the focus of a data engineer’s work is on creating production-grade infrastructure, both hardware and software, rather than on performing the actual day-to-day data analysis. They take the requirements of data scientists, again in terms of both hardware and software, and implement those either within a cloud environment or on dedicated in-house server systems. The work therefore also includes the long-term maintenance of these systems, ensuring they remain available in line with service level agreements made at data science management level (at a minimum) and organizational level (more likely).
Data Scientist
Turning to data scientists, we meet the people who are more likely, on a day-to-day basis, to extract data from the database in order to perform their analysis. Experience working with databases (both SQL and NoSQL flavours) is vital here. This pulling and analysing of data is done in order to run experiments and then, perhaps, visualize the results, so that the findings can be communicated to teammates or leadership. Data scientists will often work within the context of data analysis pipelines, where data is extracted, cleaned, analysed, and the findings presented.
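The four pipeline stages mentioned above (extract, clean, analyse, present) can be sketched as four small functions. Everything here is a hypothetical illustration: the `orders` table, its columns, and the sample values are invented, and an in-memory SQLite database is used only so the sketch is self-contained.

```python
import sqlite3
from statistics import mean


def extract(conn):
    """Pull the raw rows out of the database."""
    return conn.execute("SELECT customer_id, order_value FROM orders").fetchall()


def clean(rows):
    """Drop rows with missing or negative order values."""
    return [(cid, value) for cid, value in rows if value is not None and value >= 0]


def analyse(rows):
    """Compute the mean order value per customer."""
    by_customer = {}
    for cid, value in rows:
        by_customer.setdefault(cid, []).append(value)
    return {cid: mean(values) for cid, values in by_customer.items()}


def present(summary):
    """Format the findings as plain text for teammates or leadership."""
    return "\n".join(
        f"customer {cid}: mean order value {avg:.2f}"
        for cid, avg in sorted(summary.items())
    )


# Hypothetical sample data, including one missing and one invalid value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, order_value REAL);
    INSERT INTO orders VALUES (1, 20.0), (1, 40.0), (2, NULL), (2, 15.0), (3, -5.0);
""")

report = present(analyse(clean(extract(conn))))
print(report)
```

Keeping each stage as its own function means a stage can be tested or swapped independently, for example replacing the text report with a chart without touching the extraction or cleaning code.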
The experiments themselves could be exploratory in nature, for example discovering customer groupings, or popular products, or indeed the interrelationships between the two. But experiments are also conducted towards developing machine learning or prediction algorithms using data pulled from the database created by the data engineer. Once the appropriate model performance targets are met, the data scientist would then hand the model over to the data engineer to implement so that it can be run at scale. This separation of duties between data scientists (who develop the models) and data engineers (who implement the models at scale) is necessary to maintain a robust data ecosystem while driving meaningful insights.
Typically you would expect candidate data scientists to come from backgrounds in quantitative fields, such as mathematics, physics, or engineering. Ideally, additional data science training, and in particular programming experience, would also be evident in their profile. Conversely, you may find that candidates with software engineering backgrounds wish to make a career move into the data science field. Should this be the case, you would look for evidence of applied statistics training or of statistical analysis experience.
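One way to make the handover point explicit is a simple check of measured metrics against agreed targets. The sketch below is only an assumption about how a team might do this: the function name `unmet_targets`, the metric names, and all of the numbers are hypothetical.

```python
def unmet_targets(metrics, targets):
    """Return the names of metrics that fall short of their minimum target."""
    # A metric that was never measured counts as unmet.
    return [
        name
        for name, minimum in targets.items()
        if metrics.get(name, float("-inf")) < minimum
    ]


# Hypothetical targets agreed with stakeholders, and results from the latest experiment.
targets = {"precision": 0.80, "recall": 0.75}
metrics = {"precision": 0.84, "recall": 0.71}

shortfalls = unmet_targets(metrics, targets)
ready_for_handover = not shortfalls
```

Here recall falls short of its target, so the model would stay with the data scientist for further work rather than being passed to the data engineer.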
Data Science Manager
You could draw an analogy between the data science manager and a scrum master, or indeed a project manager, in the way they ensure effective interaction and communication within the data science team and with the rest of the organization. They would run the daily sprint meetings to ensure their data scientist teammates have everything they need to meet their sprint targets, and to remove any blockers that could be preventing progress towards task or milestone deadlines. Indeed, the data science manager would be the person who takes the goals and deliverables of the organization overall (or perhaps a sub-department) and expresses those goals in terms of priorities, which would then be scheduled against a timeline of tasks that roll up into milestones.
The data science manager would also be expected to recruit for and build the data science team. This goes hand in hand with their communication skills: they promote the capabilities and contributions of the data science team to others in the organization, including upper management and peers across departments. In this respect you can think of the data science manager as a facilitator, i.e. a bridge between the team and the broader organization. Their involvement in recruitment, team building, and promotion of the team’s work is essential for sustaining a productive and well-integrated data science function. And again, it places a much stronger emphasis on communication skills than is expected of, say, the data engineer.
One would hope a data science manager behaves in a supportive and motivational manner, being positive and encouraging towards their teammates, especially given the frustrations inherent in any engineering role. There is also a real skill in being able to handle, communicate, and re-strategise in response to disappointing results. Indeed, the data science manager should have sufficient knowledge of the data science field to recognize when deliverables requested by stakeholders cannot be achieved within the time, budget, or scope requested. The role, therefore, is to advocate for the most effective use of data science in a corporate setting.
Team Dynamics
While the above discussion highlights the clear delineation between data engineers, data scientists, and data science managers, and thus the importance of specialized roles within the team, data science projects will absolutely require a collaborative effort across the whole team. True value will only come from a data science team that works effectively together. Thus the challenge remains for managers and leaders to harness the diverse expertise within the data science team and the distinct responsibilities its members take on. From infrastructure to analysis to people management, this separation of duties is what keeps the data ecosystem robust while still driving meaningful insights.
The data science team should work collaboratively on projects, milestones, and tasks. You should plan for regular joint meetings and presentations to facilitate idea exchange and project coordination. It almost goes without saying that strong communication skills are needed both within the data science team and across the wider organization. Effective communication ensures that data science projects align with organizational goals and that insights are appropriately disseminated.
Of course, no matter how well you set up your data science team, and no matter how mature the personalities are, there will always be some potential for internal difficulties. These may be related to personality clashes or interactions between people. Other difficulties may be related to the way data scientists and data engineers tend to work, or to their performance under pressure. My suggestion here would be for the data science manager to set up an environment where these sorts of problems can be discussed, and thus minimized, to keep the team moving forwards together as quickly and amicably as possible.
Motivation is a huge component of keeping an internal data science team moving forward. Given the challenging nature of data science and data engineering, it’s important to use positive feedback and to celebrate the team’s successes (milestones being reached etc.) as much as possible. By establishing clear policies, fostering open communication, addressing issues promptly, and maintaining a positive work environment, you can effectively manage a data science team and mitigate internal difficulties.
Conclusions
And so there you have it: from a high level, the main roles to consider when building a data science team, and the principal team dynamics you will need to manage in order to be successful.