Data Valuation

Data is the main fuel of the modern world, enabling artificial intelligence and driving technological growth. The demand for data has grown substantially, and data products have become valuable assets to buy and sell, since acquiring high-quality data is essential for many sectors to discover knowledge. Because data is such a valuable resource, it is important to establish a principled method for quantifying the worth of data and its value to data seekers. Data valuation addresses this need and is an essential component of a fair data trading platform for data owners and seekers.

Problem Statement

Consider a Pharma Company that would like to purchase data from a Hospital. The challenge is how the Pharma Company can assess the worth of the Hospital's data without having access to it. In other words, the challenge is valuing invisible, decentralized data that is not available locally. Furthermore, we consider data valuation without focusing on a specific task; that is, the Pharma Company would like to know the worth of the data that is available only at the Hospital without disclosing the task for which it may want to purchase the data. This is called intrinsic data valuation, or a data-driven data valuation approach.


We use the fact that the Pharma Company also has data, which can be used to value the data at the Hospital. We value the Hospital's data by comparing its statistical properties with those of the data available locally to the Pharma Company. The rationale behind our approach is that the Pharma Company would like to purchase data that is both highly relevant to and diverse compared with its own data. Therefore, we estimate the relevance and diversity of the Hospital's data with respect to the Pharma Company's data. This is done by sending queries from the Pharma Company to the Hospital that compare the statistical properties of the two datasets along the directions that are important to the Pharma Company. The value of the data is then computed from the relevance and diversity estimates.
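
The query-based comparison described above can be illustrated with a minimal sketch. It is not the implementation from the paper, and it rests on several assumptions made only for illustration: the "important directions" are taken to be the top principal components of the Pharma Company's (buyer's) data, the Hospital (seller) answers a query by returning the variance of its own data along each direction, and relevance and diversity are geometric means of per-direction variance overlap and variance gap. The function names `buyer_query`, `seller_response`, and `relevance_and_diversity` are hypothetical.

```python
import numpy as np


def buyer_query(buyer_data, k):
    """Buyer side: top-k principal directions of the buyer's data
    and the buyer's variance along each of them (assumed query format)."""
    centered = buyer_data - buyer_data.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    return eigvecs[:, order], eigvals[order]


def seller_response(seller_data, directions):
    """Seller side: variance of the seller's data along each queried
    direction. Only these k numbers leave the seller, not the raw data."""
    centered = seller_data - seller_data.mean(axis=0)
    return (centered @ directions).var(axis=0)


def relevance_and_diversity(buyer_var, seller_var, eps=1e-12):
    """Illustrative scores in [0, 1], computed per direction and then
    combined with a geometric mean (an assumption of this sketch)."""
    larger = np.maximum(buyer_var, seller_var) + eps
    overlap = np.minimum(buyer_var, seller_var) / larger  # shared variance
    gap = np.abs(buyer_var - seller_var) / larger         # differing variance
    k = len(buyer_var)
    relevance = float(np.prod(overlap) ** (1.0 / k))
    diversity = float(np.prod(gap) ** (1.0 / k))
    return relevance, diversity
```

In this sketch the buyer calls `buyer_query` on its local data, sends only the returned directions to the seller, and passes the seller's reply together with its own variances to `relevance_and_diversity`. How the two scores are combined into a single value is left open here.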

Please refer to the paper here for more details.