How to Build an Efficient Data Team to Work with Public Web Data
How to assemble an efficient data team is a frequently debated question among data experts. If you’re planning to build a data-driven product or improve your existing business with the help of public web data, you will need data specialists.
This article will cover key principles I have observed throughout my experience working in the public web data industry that may help you build an efficient data team.
Why isn’t there a universal recipe for working with public web data?
There is no universal recipe for working with public web data yet. The good news is that there are various ways to approach the subject and still get the desired results. Here we will explore the process of building a data team from the perspective of business leaders who are just getting started with public web data.
What is a data team?
A data team is responsible for collecting, processing, and providing data to stakeholders in the format needed for business processes. This team can be incorporated into a different department, such as the marketing department, or be a separate entity in the company.
The term data team can describe a team of any size, from one to two specialists to an extensive multilevel team managing and executing all aspects of data-related activities at the company.
Where to start?
There’s a straightforward principle that I recommend businesses working with public web data follow: an efficient data team works in alignment with your business needs. It all starts with what product you will build and what data will be needed.
Simply put, every company planning to start working with web data needs specialists who can ingest and process large amounts of data and those who can transform data into information valuable for the business. Usually, the transformation stage is where the data starts to create value for its downstream users.
To get to this stage, a small business can even start with one specialist.
The first hire can be a data engineer with analytical skills or a data analyst with experience working with big data and light data engineering. When building something more complex, it’s essential to understand that public web data is essentially used for answering business questions, and web data processing is all about iterations.
No matter the complexity of your product, you always start with acquiring a large amount of data.
Further iterations may include aggregating data or enriching it with data from additional sources. Then, you process it to extract information, such as specific insights. The result can be used in the processes that follow, for example, supporting business decision-making, building a new platform, or providing insights to clients.
From a product perspective, the answer to what data team you need is connected to the tools you will be using, which in turn depend on the volumes of data you will handle and how that data will be transformed. From this perspective, I can split building a data team into three scenarios:
- Scenario 1. You work with semi-automated or fully automated tools that don’t require customization and specific skills. Junior-level data specialists may even handle some tasks.
- Scenario 2. Some operations or data transformation processes require development work outside of the tools you’re using.
- Scenario 3. You cannot use the abovementioned options because your product requires full customization. In this case, you could use open-source software and build everything from scratch based on your exact product needs.
What is your product and vision for building an efficient data team?
Ultimately, the size of your data team and what specialists you need depend on your product and vision for it. Our experience building Coresignal’s data team taught us that the key principle is to match the team’s capabilities with product needs, regardless of the seniority level of the specialists.
How many data roles are there on a data team?
The short answer to this question is “It depends.” When it comes to the classification of data roles, there are many ways to look at this question. New roles emerge, and the lines between existing ones may sometimes overlap.
Let’s cover the most common roles in teams working with public web data. In my experience, the structure of data teams is tied to the process of working with web data, which consists of the following components:
- Getting data from the source system;
- Data engineering;
- Data analytics;
- Data science.
In her article published in 2017, well-known data scientist Monica Rogati introduced the concept of the hierarchy of data science needs in an organization. It shows that most data science-related needs in an organization are related to the parts of the process at the bottom of the pyramid: collecting, moving, storing, exploring, and transforming the data. These tasks also make a solid data foundation in an organization. The top layers include analytics, machine learning (ML), and artificial intelligence (AI).
However, all these layers are important in an organization working with web data and require specialists with a specific skill set.
Data engineers
Data engineers are responsible for managing the development, implementation, and maintenance of the processes and tools used for raw data ingestion to produce information for downstream use, for example, analysis or machine learning (ML).
When hiring data engineers, overall experience working with web data and specialization in working with specific tools are usually at the top of the priority list. You need a data engineer in scenarios 2 and 3 mentioned above, and also in scenario 1 if you decide to start with one specialist.
Data (or business) analysts
Data analysts primarily focus on existing data to evaluate how a business is performing and provide insights for improving it. You already need data analysts in scenarios 1 and 2 mentioned above.
The most common skills companies seek when hiring data analysts are SQL, Python, and other programming languages (depending on the tools used).
Data scientists
Data scientists are primarily responsible for advanced analytics focused on making predictions or generating insights. Analytics are considered “advanced” if you use them to build data models, for example, if your product involves machine learning or natural language processing.
Let’s say you want to work with data about companies by analyzing their public profiles. You want to identify the percentage of the business profiles in your database that are fake. Through multiple multi-layer iterations, you want to create a mathematical model that will allow you to identify the likelihood of a fake profile and categorize the profiles you’re analyzing based on specific criteria. For such use cases, companies often rely on data scientists.
Essential skills for a data scientist are mathematics and statistics, which are needed for building data models, and programming skills (Python, R). You will likely need data scientists in scenario 3 mentioned above.
Analytics engineer
This relatively new role is becoming increasingly popular, especially among companies working with public web data. As the title suggests, the role of an analytics engineer sits between that of an analyst, who focuses on analytics, and that of a data engineer, who focuses on infrastructure. Analytics engineers are responsible for preparing ready-to-use datasets for data analysis, which is usually performed by data analysts or data scientists, and for ensuring that the data is prepared for analysis in a timely manner.
SQL, Python, and experience with tools needed to extract, transform, and load data are among the essential skills required for analytics engineers. Having an analytics engineer would be useful in scenarios 2 and 3 mentioned above.
Three things to keep in mind when assembling a data team
As there are many different approaches to the classification of data roles, there’s also a variety of frameworks that can help you assemble and grow your data team. Let’s simplify it for an easy start and say that there are different lenses through which a business can evaluate what team will be needed to get started with web data.
Data lens
The web data I’m referring to in this article is big data. Large numbers of data records are usually delivered to you in large files and in raw format. It is best to have data specialists with experience working with large data volumes and the tools used for processing them.
Tech stack lens
When it comes to tools, keep in mind that the tools your organization will use for handling specific types of data also shape what specialists you will need. If you are not yet familiar with the required tools, consult an expert before hiring a data team or hire professionals to help you select the right tools for your business needs.
Organizational lens
You may also start building a data team by evaluating which stakeholders the data specialists will work closely with and deciding how this new team will fit into your vision of your organizational structure. For example, will the data team be a part of the engineering team? Will this team mainly focus on the product? Or will it be a separate entity in the organization?
Organizations that have a more advanced data maturity level and are building a product powered by data will look at this task through a more complex lens, which involves the company’s future vision, aligning on the definition of data across the organization, deciding who will manage it and how, and determining how the overall data infrastructure will look as the business grows.
What makes a data team efficient?
A data team is considered efficient as long as it meets the needs of your business, and in almost every case, the currency of data team efficiency is time and money.
So, you can rely on metrics like the amount of data processed during a specific time or the amount of money you spend. Once you track these metrics at regular intervals, the next thing to watch is how they change over time. Simply put, if your team manages to process more data for the same amount of money, the team is becoming more efficient.
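As a rough illustration of tracking such a metric over time, here is a minimal Python sketch. The `Period` structure, the field names, and the sample figures are all hypothetical and chosen only for the example; a real team would plug in its own volume and cost numbers.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """One reporting interval (hypothetical structure for this example)."""
    label: str
    records_processed: int
    cost_usd: float


def records_per_dollar(period: Period) -> float:
    """Amount of data processed per unit of money spent."""
    if period.cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return period.records_processed / period.cost_usd


def efficiency_trend(periods):
    """Return (label, records_per_dollar, relative_change) for each period.

    relative_change is None for the first period, because there is
    nothing earlier to compare it with.
    """
    results = []
    previous = None
    for period in periods:
        current = records_per_dollar(period)
        change = None if previous is None else (current - previous) / previous
        results.append((period.label, current, change))
        previous = current
    return results


# Made-up sample figures: same spend, more records in the second month.
periods = [
    Period("month 1", 10_000_000, 5_000.0),
    Period("month 2", 12_000_000, 5_000.0),
]
trend = efficiency_trend(periods)
for label, rate, change in trend:
    print(label, rate, change)
```

A positive relative change means more data was processed for the same money, which is the sense of "more efficient" used above. The same function works with processing time in place of cost if time is the resource you care about.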
Another efficiency indicator, which combines both time and money, is the quality of the code your team writes: you can have plenty of resources and iterate quickly, but errors mean more resources spent.
Besides the metrics that are easy to track, one of the most common problems that companies experience is a lack of trust in data. Trust in data is precisely what it sounds like. Although there is a way to track the time it takes to perform data-related tasks or see how much it costs, stakeholders may still question the reliability of these metrics and the data itself. This trust can be damaged by negative experiences, such as previous incidents, or simply by a lack of communication and information from data owners.
Moreover, working with large volumes of data means spotting errors is a complex task. Still, the organization should be able to trust the quality of the data it uses and the insights it produces using this data.
It is helpful to perform statistical tests allowing the data team to evaluate the quantitative metrics related to data quality, such as fill rates. By doing this, the organization can also accumulate historical data that will allow the data team to spot issues or negative trends in time. Another essential principle to apply in your organization is listening to client feedback regarding the quality of your data.
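To make the fill-rate idea concrete, here is a minimal Python sketch. It assumes records arrive as plain dictionaries; the field names, the sample records, the baseline values, and the 5-percentage-point tolerance are all hypothetical and only serve the example.

```python
def fill_rates(records, fields):
    """Share of records in which each field holds a non-empty value."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    empty_values = (None, "", [], {})
    rates = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in empty_values)
        rates[field] = filled / total
    return rates


def flag_drops(current, baseline, tolerance=0.05):
    """Fields whose fill rate fell more than `tolerance` below the baseline."""
    return [
        field
        for field, rate in current.items()
        if baseline.get(field, 0.0) - rate > tolerance
    ]


# Hypothetical sample of company profile records.
records = [
    {"name": "A", "website": "a.example", "employee_count": 10},
    {"name": "B", "website": "b.example", "employee_count": None},
    {"name": "C", "website": "", "employee_count": 25},
    {"name": "D", "website": "d.example"},
]
fields = ["name", "website", "employee_count"]

current = fill_rates(records, fields)
# Hypothetical historical fill rates accumulated from earlier deliveries.
baseline = {"name": 1.0, "website": 0.78, "employee_count": 0.9}
dropped = flag_drops(current, baseline)
print(current)
print(dropped)
```

Storing the output of `fill_rates` for every delivery is one simple way to build the historical record mentioned above, so that a sudden drop in a field stands out against earlier values.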
To sum up, it all comes down to having talented specialists in your data team who can work quickly, with precision, and build trust around the work they are doing.
Conclusion
Finally, here are some questions to help you assemble a data team:
- What is your product?
- What data will you be using?
- What are the key components of the product that involve data?
- What are the results expected from different project stages involving data?
- What tech stack will be required for that?
- Who are the stakeholders?
- What indicators will help you evaluate if your current data team meets your business needs?
I hope this article helped you gain a better understanding of different data roles that are common in organizations working with public web data, why they are essential, which metrics help companies measure the success of their data teams, and finally, how it is all connected to the way your organization thinks about the role of data.