Opinion
Open web data collection will fuel future innovations
“The future of web data collection is up to those who control it,” Bright Data CTO Ron Kol says about information-driven trends. “Future data collection efforts will need to evolve and grow.”
The phrase “knowledge is power” does not apply only to individuals but also to businesses. While individuals can learn from books and life lessons, businesses need a tool to gather market insights. This is where web data collection comes in. It is one of the best tools because it provides up-to-date and reliable information for businesses that have to make critical decisions.
Fortunately, there is no shortage of online data. In fact, a recent survey from Frost & Sullivan found that 49% of IT decision-makers use data collection for business-critical operations, such as market research. Another 44% said they use data collection to gather competitive public data.
We have created a lot of web data this year, in large part because of the tremendous digital shift the world has experienced. Take online shopping, for example. Those who wanted to avoid crowded stores during the pandemic turned to online retailers, like Amazon, for quick and easy delivery. This resulted in Amazon shipping 6,659 packages per minute in 2020.
Given the increased need for data-driven insights in these unprecedented times, we expect to see the following trends emerging as we advance.
Fraudulent digital activity will keep web data shielded
Sometimes, a few bad apples spoil the bunch. This saying is especially true for fraudsters who, even during these challenging times, continue to improperly use web data to conduct illegal activities, including fake reviews, illegal purchases, and phishing.
For example, a scam called brushing emerged last summer, in which fraudsters sent free merchandise to addresses that are publicly available on the internet. In reality, the scammers were only using the addresses to make people look like verified buyers of those products so they could write glowing (and fake) reviews. This is a clear breach of ethics and not the way public web data collection is intended to be used. Unfortunately, fraudulent activity like this is on the rise.
As would be expected, these fraudulent activities lead companies to take security precautions to protect themselves. Such precautions include limiting the amount of web data they make free and publicly available. These limitations are caused by those who abuse open data, but they affect everyone else, especially the business community that seeks to use public web data for legitimate purposes. Unfortunately, I believe this trend will continue well into 2022.
Web data as a business driver for decision-making
It’s expected that companies will continue to legitimately protect confidential and proprietary information with data privacy and copyright restrictions. However, plenty of web data that is publicly available and free for businesses can be leveraged to make informed decisions. This includes using data to drive innovation and product development forward, pricing products competitively, improving customer service and enhancing the quality of products.
For example, in the world of e-commerce, shipping data can help companies estimate how much they should ramp up or slow down their efforts.
Data can also be used to predict and react to market shifts as they unfold by following financial or consumer behavior trends in real time. For instance, 2021 has certainly been an unpredictable year so far, and I expect businesses across industries will continue to turn to web data to guide their decisions.
Evolved data markets will promote open and on-demand data
The Frost & Sullivan survey also found that 54% of IT decision-makers expressed a need for large-scale web data collection to keep pace with their businesses’ growing demand for information. However, for businesses to use online data, that data needs to be accessible – not hidden. Today, businesses often block public web data collection attempts while collecting such data themselves. This situation is caused by two major factors: the continuous need to block malicious online activity, and the notion that this public data is somehow part of what gives a company its competitive edge.
I believe that in the coming months and beyond, companies will realize that public data collection is part of ordinary business conduct and is necessary for everyone. They will also realize that when it comes to a business’s competitive edge, areas such as inventory, prices, product quality, and service quality play a big role as well. Once that realization settles in, blocking data will serve only to protect against abusive online activities.
So, how do we ensure that the right people get access to public data while blocking out the abusers? While this is not a simple task, it is certainly a doable one. One solution that could serve all is promoting the open exchange of information in central data hubs. Sites will continue to block abusers; this will not change. However, maybe they will permit compliant data collectors.
The good news is that central data hubs are already in use, and I predict that they will only grow in popularity as data markets expand over the next few years. Research shows that IT managers already desire on-demand, quality, and verified public data. It will be interesting to watch these data markets take shape.
Ultimately, the future of web data collection is up to those who control it. At the rapid rate that data is being produced, future data collection efforts will need to evolve and grow. Companies will need automated data collection to keep up with their competitors and to gather data at a rate that would be impossible to match manually. After all, the speed at which companies can access fresh public data will determine their relevance and success.
Ron Kol is the CTO of Bright Data.