AT Think

How to value data: the fuel powering the AI revolution

Data has become one of the most valuable assets a company can own. Much like oil powered the industrial era, data now drives innovation, competitiveness and profitability, especially in the world of artificial intelligence.

Machine learning models depend on vast quantities of high-quality data to identify patterns and make accurate predictions. Without data, AI systems would lack the foundation needed to learn, adapt and generate meaningful insights.

For businesses monetizing and leveraging their proprietary data, what methodologies can be employed to value these assets?

Examples of proprietary data monetization

In addition to using data internally to gain a competitive advantage, boost efficiency and improve decision making, companies with high-quality, specialized datasets are increasingly capitalizing on them by selling or licensing these assets to third parties.

Health care

One area where data is transforming the industry's landscape is diagnostic imaging, particularly the interpretation and use of medical images such as X-rays, MRIs and CT scans. A key benefit of AI in this field is its ability to accelerate the diagnostic process. Traditional methods of image interpretation can be time-consuming and subject to human error. AI, however, can process and analyze images quickly, significantly reducing the time and cost of diagnosis. AI can also improve diagnostic accuracy: by learning from large datasets of medical images, AI algorithms can identify patterns that human readers might overlook.

The cost to access imaging data can be substantial. Stanford University's Center for Artificial Intelligence in Medical Imaging curated an imaging data repository featuring a total of 223,462 unique pairs of radiology reports and chest X-rays across 187,711 studies from 64,725 patients. The school licenses the data for an annual fee of $70,000 per dataset. 

Automotive

Connected vehicles collect a wide range of data, including location, driving habits, vehicle health, car usage and even personal information from the driver's connected devices, such as a smartphone linked through Apple CarPlay. This data can be used for various purposes, including advertising, setting insurance rates and even employment verification. As more connected vehicles hit the road, the volume of data being generated is growing rapidly. Research by S&P Global found that connected vehicles can generate nearly 25 GB of data per hour from more than 100 different data points.

BMW's CarData platform provides access to the telematics data of BMW and Mini vehicles. The bundled vehicle information types are called "keys" and are arranged in categories such as usage data, event data, basic vehicle data and metadata. BMW charges €0.09 per individual event key and €0.29 per individual data key, with a maximum monthly cap of €5.00 per car.
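The fee schedule above can be turned into a quick per-car estimate. The sketch below is a minimal illustration that assumes key charges accumulate per car over a calendar month and are then capped at €5.00; the helper names `monthly_cost_per_car` and `monthly_fleet_cost` are hypothetical, and BMW's actual billing rules may differ.

```python
from decimal import Decimal

# Prices quoted in the article; the monthly accumulation and cap logic is an assumption.
EVENT_KEY_PRICE = Decimal("0.09")  # EUR per individual event key
DATA_KEY_PRICE = Decimal("0.29")   # EUR per individual data key
MONTHLY_CAP = Decimal("5.00")      # EUR maximum per car per month


def monthly_cost_per_car(event_keys: int, data_keys: int) -> Decimal:
    """Capped monthly charge for one car (hypothetical helper)."""
    if event_keys < 0 or data_keys < 0:
        raise ValueError("key counts must be non-negative")
    uncapped = event_keys * EVENT_KEY_PRICE + data_keys * DATA_KEY_PRICE
    return min(uncapped, MONTHLY_CAP)


def monthly_fleet_cost(cars: list[tuple[int, int]]) -> Decimal:
    """Sum of capped charges; each tuple is (event_keys, data_keys) for one car."""
    return sum((monthly_cost_per_car(e, d) for e, d in cars), Decimal("0"))


if __name__ == "__main__":
    print(monthly_cost_per_car(10, 5))   # 0.90 + 1.45 -> 2.35
    print(monthly_cost_per_car(20, 20))  # 1.80 + 5.80 = 7.60, capped -> 5.00
    print(monthly_fleet_cost([(10, 5), (20, 20)]))  # 2.35 + 5.00 -> 7.35
```

Under these assumptions, a car drawing 10 event keys and 5 data keys costs €2.35 a month, while for heavy users the cap, not the per-key prices, determines the monthly charge.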

Media

Reddit started charging for access to its data and content through its application programming interfaces (APIs) in 2023, asking developers to pay $12,000 per 50 million requests. An API is the mechanism through which third parties access a platform's data, whether to connect it to their own apps or to conduct research or data analysis. Reddit produces a massive amount of user-generated content from a diverse "community of communities," with a total of 5.3 billion pieces of content created by its users in the first half of 2024. Both Google and OpenAI use Reddit data to train their large language models, which underpin Google's Gemini and OpenAI's ChatGPT.
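Reddit's quoted rate works out to roughly $0.24 per 1,000 requests. The sketch below estimates spend for a given request volume. Whether usage is prorated or billed in whole 50-million-request blocks is an assumption, so both options are shown, and `api_cost` is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("12000")  # USD per block, as quoted in the article
REQUESTS_PER_BLOCK = 50_000_000     # requests covered by one block


def api_cost(requests: int, bill_whole_blocks: bool = False) -> Decimal:
    """Estimated API spend for a request volume (hypothetical helper).

    bill_whole_blocks=False prorates linearly; True rounds up to whole
    50-million-request blocks. Which model applies in practice is an assumption.
    """
    if requests < 0:
        raise ValueError("requests must be non-negative")
    if bill_whole_blocks:
        blocks = -(-requests // REQUESTS_PER_BLOCK)  # integer ceiling division
        return blocks * PRICE_PER_BLOCK
    return PRICE_PER_BLOCK * requests / REQUESTS_PER_BLOCK


if __name__ == "__main__":
    print(api_cost(1_000))                                # prorated: 0.24
    print(api_cost(120_000_000))                          # prorated: 28800
    print(api_cost(120_000_000, bill_whole_blocks=True))  # 3 blocks: 36000
```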

How to value proprietary data

As data increasingly becomes a critical asset for AI-driven companies, accurately determining its monetary value becomes essential. There are several primary valuation methodologies, each suited to different scenarios.

Market approach

Market-based valuation assesses data's value based on comparable market transactions or licensing agreements, adjusting prices for differences such as data quality, exclusivity or volume. While this approach is straightforward and market-driven, truly comparable datasets can be hard to find, which limits its accuracy for highly specialized or proprietary data.
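In practice, a market approach often comes down to adjusting each comparable price and then summarizing the adjusted set. This minimal sketch assumes multiplicative adjustment factors and uses the median as the summary statistic; the `Comparable` class, the factor values and the fees in the example are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Comparable:
    """One comparable licensing deal; every figure used below is hypothetical."""
    name: str
    annual_fee: float             # observed license fee for the comparable dataset
    quality_adj: float = 1.0      # >1.0 if the subject data is higher quality
    exclusivity_adj: float = 1.0  # >1.0 if the subject license is more exclusive
    volume_adj: float = 1.0       # >1.0 if the subject dataset is larger

    def adjusted_fee(self) -> float:
        # Multiplicative factors are one convention; additive percentage
        # adjustments are another.
        return self.annual_fee * self.quality_adj * self.exclusivity_adj * self.volume_adj


def market_value(comps: list[Comparable]) -> float:
    """Indicated value: the median of the adjusted comparable fees."""
    if not comps:
        raise ValueError("need at least one comparable")
    return median(c.adjusted_fee() for c in comps)


if __name__ == "__main__":
    comps = [
        Comparable("Deal A", 60_000, quality_adj=1.10, volume_adj=0.95),
        Comparable("Deal B", 80_000, quality_adj=0.90),
        Comparable("Deal C", 70_000, exclusivity_adj=1.20),
    ]
    for c in comps:
        print(c.name, round(c.adjusted_fee()))
    print("Indicated value:", round(market_value(comps)))
```

The median is less sensitive than the mean to a single outlying deal, which matters when only a handful of comparables exist.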

Cost approach

Cost-based valuation calculates data's value based on expenses incurred in obtaining, preparing, maintaining and storing it, including both direct costs (acquisition, infrastructure) and indirect costs (labor, compliance). This method is quantifiable and practical for newly created datasets but may undervalue datasets where the strategic or market worth significantly exceeds production costs.
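At its core, the cost approach is a structured sum. The sketch below groups the components the article lists into direct and indirect buckets. The class name and the example amounts are hypothetical, and the sketch deliberately leaves out any strategic premium, which is exactly the limitation noted above.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetCosts:
    """Cost buckets named in the article; the amounts in the example are hypothetical."""
    direct: dict[str, float] = field(default_factory=dict)    # e.g. acquisition, infrastructure
    indirect: dict[str, float] = field(default_factory=dict)  # e.g. labor, compliance

    def total(self) -> float:
        """Cost-approach indication: direct plus indirect costs."""
        amounts = list(self.direct.values()) + list(self.indirect.values())
        if any(a < 0 for a in amounts):
            raise ValueError("cost components must be non-negative")
        return sum(amounts)


if __name__ == "__main__":
    costs = DatasetCosts(
        direct={"acquisition": 250_000, "infrastructure": 80_000},
        indirect={"labor": 120_000, "compliance": 30_000},
    )
    print(costs.total())  # 480000
```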

Income approach

In this approach, data is valued based on the expected future economic benefits derived from its use, typically through revenue growth, cost reduction or operational improvements. It closely ties the data's valuation to tangible business outcomes, although forecasting future benefits introduces uncertainty and makes the result sensitive to assumptions.
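The income approach usually takes the form of a discounted cash flow calculation: each year's expected benefit is divided by (1 + r)^t, and the results are summed. This sketch assumes year-end cash flows and a single constant discount rate r, both simplifications. The benefit figures and rates in the example are hypothetical.

```python
def present_value(cash_flows: list[float], discount_rate: float) -> float:
    """Discount year-end benefits for years 1, 2, ... back to today."""
    if discount_rate <= -1:
        raise ValueError("discount_rate must be greater than -100%")
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows, start=1))


if __name__ == "__main__":
    # Hypothetical annual benefits (revenue growth plus cost savings) over three years.
    benefits = [100_000, 110_000, 121_000]
    for rate in (0.08, 0.10, 0.12):
        # Re-running at several rates shows how sensitive the result is to assumptions.
        print(f"{rate:.0%}: {present_value(benefits, rate):,.0f}")
```

At 10%, each year's benefit in this example discounts to about 90,909, for a total of about 272,727. Comparing that figure with the 8% and 12% results makes the approach's sensitivity to assumptions concrete.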

Advanced data-based valuations

Advanced statistical or data science-based valuation combines analytics and machine learning to create nuanced valuation models tailored specifically to data characteristics. This approach uses techniques like feature extraction, sensitivity analysis and predictive modeling to identify the relative importance of data attributes, such as freshness, frequency of access and delivery mode (real-time vs. batch). 
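One simple form of the sensitivity analysis mentioned above is one-at-a-time perturbation: change a single attribute, hold the others fixed and measure how much the value moves. The valuation function here, `toy_value`, is entirely made up for illustration, as are its coefficients. A real model would be estimated from historical pricing or usage data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataAttributes:
    freshness_days: float      # average age of the records, in days
    accesses_per_month: float  # how often the data is queried
    real_time: bool            # delivery mode: True = real-time, False = batch


def toy_value(a: DataAttributes) -> float:
    """A made-up valuation model, for illustration only."""
    base = 100_000.0
    freshness_factor = 1.0 / (1.0 + a.freshness_days / 30.0)  # staler data is worth less
    usage_factor = 1.0 + 0.01 * a.accesses_per_month           # heavier use adds value
    delivery_factor = 1.25 if a.real_time else 1.0             # real-time premium
    return base * freshness_factor * usage_factor * delivery_factor


def one_at_a_time_sensitivity(a: DataAttributes, bump: float = 0.10) -> dict[str, float]:
    """Percent change in value when each attribute is perturbed on its own."""
    baseline = toy_value(a)
    scenarios = {
        "freshness_days": replace(a, freshness_days=a.freshness_days * (1 + bump)),
        "accesses_per_month": replace(a, accesses_per_month=a.accesses_per_month * (1 + bump)),
        "real_time": replace(a, real_time=not a.real_time),  # flip delivery mode
    }
    return {name: (toy_value(s) - baseline) / baseline * 100 for name, s in scenarios.items()}


if __name__ == "__main__":
    subject = DataAttributes(freshness_days=30, accesses_per_month=50, real_time=False)
    effects = one_at_a_time_sensitivity(subject)
    # Rank attributes by the size of their effect on value.
    for name, pct in sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True):
        print(f"{name}: {pct:+.1f}%")
```

Flipping the delivery mode is a larger change than a 10% nudge to a continuous attribute, so the ranking depends on how the perturbations are chosen. That choice is itself an assumption worth documenting.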

While sophisticated and highly tailored, it requires advanced expertise and robust historical data to ensure accuracy. In practice, organizations frequently use hybrid methods, combining elements from multiple methodologies to capture data's full strategic, economic and operational value.
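One common way to combine methodologies is to weight the value indications from each approach and take the weighted average. The sketch below assumes that convention. The indications and weights in the example are hypothetical; in practice, the weights reflect judgment about how reliable each approach is for the dataset in question.

```python
def reconcile(indications: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of value indications from different approaches."""
    if set(indications) != set(weights):
        raise ValueError("every indication needs exactly one weight")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(indications[k] * weights[k] for k in indications)


if __name__ == "__main__":
    # Hypothetical indications and judgment-based weights.
    indications = {"market": 500_000, "cost": 480_000, "income": 600_000}
    weights = {"market": 0.3, "cost": 0.2, "income": 0.5}
    print(f"{reconcile(indications, weights):,.0f}")  # 546,000
```

Weighting makes the judgment explicit and auditable, which matters when the approaches disagree, as they often will for specialized datasets.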

Implications for the accounting profession

As data becomes a recognized intangible asset, CPAs are uniquely positioned to lead in its financial interpretation, governance and assurance. In financial reporting, proprietary datasets acquired in a business combination must be accurately valued for the purchase price allocation under ASC 805. During M&A due diligence, accounting professionals assess whether data-driven business models are sustainable, particularly where data is a core revenue generator. Cross-border data use also raises complex transfer pricing questions, requiring accountants to evaluate intercompany pricing models in line with OECD and IRS guidance.

For auditors, the rise of data-centric business practices adds new dimensions to SOX compliance and internal controls testing, where the integrity and monetization of data must be considered. CPAs in advisory roles are increasingly engaged to quantify the financial impact of enterprise data, such as evaluating the return on investment in CRM platforms or data licensing agreements. 

As data powers AI and digital transformation, the accounting profession will continue to play a critical role in ensuring these assets are reliably valued and properly disclosed.

MORE FROM ACCOUNTING TODAY