Personal data is continually being generated, gathered and shared at an unprecedented scale, and it is useful for improving products and services. For instance, personal information is useful in a variety of domains, including medical research, crime analysis, web usage analysis, customer behavior analysis and risk analysis. However, raw personal data may contain sensitive information about individuals. Consequently, the use of personal data is constrained by privacy concerns. Owing to this mismatch between demand and supply, the concept of a personal data market has emerged, in which personal data is viewed as a commodity. Numerous studies have emphasized this view^{1, 2, 3, 4}. However, evaluating the costs and benefits of preserving or disseminating personal data, and quantifying the worth of personal data, is notably challenging.

There are three stakeholders involved in the personal data market:

The

The

The

Numerous studies examine the personal data market from the perspective of data privacy. They can be classified into technology-based and economics-based approaches. Technology-based approaches protect privacy while sharing personal data by applying masking, generalization and suppression techniques to anonymize the personally identifiable attributes in the data ^{5, 6, 7, 8}. Such approaches can be used in data mining applications to find aggregate or statistical information or to discover interesting patterns in the data, but they are not suitable for applications such as crime analysis and marketing, where sensitive information and individual identities are required.

Economics-based approaches rely on a pricing mechanism for privacy compensation. In these approaches, personal data is considered a commodity and individuals trade off their privacy for monetary gain ^{9, 10}. The worth of the personal data is determined using economic tools. Laudon ^{11} advocates a market-based approach in which individuals control the information about them and thus receive fair compensation for trading off their privacy. Gkatzelis et al. introduce a market that pays individuals according to their privacy choices, together with a mechanism that elicits truthful reports of their privacy preferences ^{12}. Similarly, ^{13} proposes a fair privacy compensation mechanism that depends on the privacy attitude of the individuals and the sensitivity of the information provided by each individual. In another study ^{14}, a big data market model is introduced and an optimal pricing scheme is formulated as a Stackelberg game to maximize the profit of the data source. Yet another study ^{15} presents an optimal pricing scheme based on the quality of the provided data. However, studies ^{14} and ^{15} do not consider the privacy preferences of the data providers.

Furthermore, some studies integrate technology-based approaches with economics-based models in order to preserve individuals’ privacy and increase the value of personal information at the same time. For instance, ^{16} proposes a mechanism that integrates a technology-based security mechanism with a market model that compensates individuals according to their privacy attributes. However, the mechanism of ^{16} has limited applicability, since the economic model it employs depends on the specific security mechanism.

The key determinants in pricing personal data are the privacy attitudes of the data owners and the value that data buyers derive from the information obtained. Privacy is the right of an individual, and privacy attitudes differ from one individual to another. Consequently, each person may be willing to sell his personal data at a different price. For instance, individuals who are less concerned about privacy may sell their personal data for a small payment, while others who are more concerned about privacy may consent to sell only for higher payments. Furthermore, some individuals may pretend to value privacy more than they do and misreport the cost of their personal data in the expectation of receiving higher payments. On the other hand, data buyers desire to obtain maximum utility from the information acquired at minimum subscription cost. For instance, a data buyer may prefer the data samples that yield more information content. Thus both data owners and data buyers are selfish, and each of them intends to maximize his own objective.

In this study, a pricing mechanism based on competitive equilibrium theory is presented that jointly optimizes the privacy loss of data owners and the utility of data buyers. A data buyer is assumed to buy the samples that maximize the information entropy.

The remainder of the study is organized as follows. Section 2 presents an overview of competitive equilibrium theory, and Section 3 describes the calculation of information entropy. Section 4 describes the proposed pricing model, and Section 5 presents the experimental results. Finally, conclusions are drawn in Section 6.

Competitive equilibrium, also termed Walrasian equilibrium, is a market state that satisfies the following conditions:

Producers/sellers and the consumers/buyers arrive at an equilibrium price for a product.

At an equilibrium price, the market clears; in other words demand equals supply for a product.

At equilibrium price, sellers maximize their profits and buyers maximize their utility subject to resource and technological constraints.

In this study, we will discuss the competitive equilibrium of a Fisher market model that contains ^{17}. Formally, the conditions for the market can be stated as follows:

For all

For each commodity

Walras introduced a trial-and-error process, called the tâtonnement process, for determining equilibrium prices ^{18}. Each buyer reports the demand for goods that maximizes his utility at the prices announced by an auctioneer. The market is in disequilibrium if there is excess demand or excess supply. No buying or selling takes place at disequilibrium; instead, the auctioneer lowers or raises the price of each good depending on its demand and supply. The process terminates when the market clears, that is, when the demand for and supply of every good are equal. According to ^{19}, equilibrium is guaranteed if each commodity is consumed by at least one consumer and each consumer buys at least one item. More formally, Arrow and Debreu prove that a competitive equilibrium exists if the utility functions of the agents are concave ^{20}.
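The price-adjustment loop described above can be illustrated with a minimal sketch. It is not the model used in this study: it assumes a Fisher market with Cobb-Douglas utilities (chosen only because demand then has a closed form), unit supply of each good, and a multiplicative price update. The function name `tatonnement` and the parameters `step`, `tol` and `max_iter` are hypothetical.

```python
import numpy as np

def tatonnement(weights, budgets, step=0.5, tol=1e-9, max_iter=10_000):
    """Auctioneer-style price adjustment in a toy Fisher market.

    Assumptions (not from the study): buyer j has Cobb-Douglas utility with
    weights[j, i] on good i (each row sums to 1), and every good has unit supply.
    """
    weights = np.asarray(weights, dtype=float)
    budgets = np.asarray(budgets, dtype=float)
    prices = np.ones(weights.shape[1])  # arbitrary positive starting prices
    for _ in range(max_iter):
        # Cobb-Douglas demand of buyer j for good i is weights[j, i] * budgets[j] / prices[i].
        demand = (weights * budgets[:, None]).sum(axis=0) / prices
        excess = demand - 1.0  # supply of every good is 1
        if np.max(np.abs(excess)) < tol:
            break  # the market clears: demand equals supply
        # Raise the price of over-demanded goods, lower it for under-demanded ones.
        prices = prices * (1.0 + step * excess)
    return prices

prices = tatonnement([[0.5, 0.5], [0.2, 0.8]], budgets=[1.0, 2.0])
```

In this toy market the prices settle at approximately (0.9, 2.1), the amounts of money the two buyers together wish to spend on each good, so the total budget of 3 is exactly spent when the market clears.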

Information entropy, introduced by Claude Shannon, is a measure of the ‘uncertainty’ or ‘surprise’ associated with a random variable. Let

where

Suppose a dataset contains

The total entropy of all samples in a dataset

A skewed probability distribution is unsurprising: it has low entropy and carries less information content. A balanced probability distribution, on the other hand, is surprising: it has high entropy and carries more information content. The contribution of information content of ^{21} as follows:

It can be noted that

According to ^{22}, the information entropy is a concave, continuous and continuously differentiable function of
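As a small illustration of the contrast between skewed and balanced distributions, the following sketch computes the Shannon entropy of the empirical distribution of one attribute. The base-2 logarithm (entropy in bits) and the function name `shannon_entropy` are assumptions made for this example; the sketch does not reproduce the per-sample contribution formula of this study.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    # H = -sum_k p_k * log2(p_k), where p_k is the relative frequency of value k.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

balanced = ["a", "b", "a", "b"]   # two equally likely values
skewed = ["a"] * 9 + ["b"]        # one value dominates

h_balanced = shannon_entropy(balanced)
h_skewed = shannon_entropy(skewed)
```

Here the balanced attribute has an entropy of 1 bit, whereas the skewed attribute has an entropy below 0.5 bit, in line with the observation that a balanced distribution carries more information content.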

We consider a data market

where

While each data owner

In order to obtain market equilibrium, the data market adjusts the prices of each data owner’s sample depending on demand and supply, and the information entropy of the corresponding data owner’s sample. If

Equilibrium is reached if a vector of prices

For each data buyer

The market clears, that is

Details of computing the competitive equilibrium prices, and the bundle of samples that maximizes each buyer's utility at those prices subject to his budget constraint, his data requirements and the privacy preferences of the data owners, are presented in Algorithm 1. Since information entropy is a concave, strictly continuous and continuously differentiable function of

Algorithm 1: Computation of competitive equilibrium prices

Input:

- Data samples: $D = \{x_1, \dots, x_n\}$
- Initial prices of samples: $p_1, \dots, p_n$
- Maximum prices of samples: $p^{\max}_1, \dots, p^{\max}_n$
- Minimum prices of samples: $p^{\min}_1, \dots, p^{\min}_n$
- Budgets of buyers: $b_1, \dots, b_m$

Output:

- Equilibrium prices: $p_1^*, \dots, p_n^*$
- Bundles of samples: $X_1, \dots, X_m$, where $X_j \subseteq D$

Steps:

1. Set $p_i^* \leftarrow p_i$ for $i = 1, \dots, n$.
2. Loop:
   - 2.1. At prices $p_1^*, \dots, p_n^*$, determine bundles of samples $X_1, \dots, X_m$ such that each data buyer’s utility function (5) is maximized subject to budget and data constraints, in accordance with the privacy preferences of the data sellers.
   - 2.2. If $\sum_{j=1}^{m} b_j - \sum_{j=1}^{m} \sum_{x_i \in X_j} p_i^* \geq \alpha$, then adjust the prices $p_1^*, \dots, p_n^*$ in accordance with the information entropy and demand of the corresponding sample, within the range of maximum and minimum prices specified by the data owners. Otherwise, exit the loop.
3. Report the equilibrium prices and the bundles of samples $X_1, \dots, X_m$ at these prices.
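The control flow of Algorithm 1 can be sketched in Python as follows. This is a rough illustration under several assumptions that are not part of the study: step 2.1 is replaced by a greedy choice of up to `k` affordable samples ranked by information per unit price, the adjustment in step 2.2 only raises prices in proportion to demand and information content, all minimum prices are positive, and the loop is capped at `max_iter` iterations because this simplified update is not guaranteed to converge. The names `choose_bundle`, `equilibrium_prices`, `info`, `k`, `step` and `max_iter` are hypothetical.

```python
import numpy as np

def choose_bundle(info, prices, budget, k):
    """Greedy stand-in for step 2.1 (assumption): take up to k affordable
    samples in decreasing order of information per unit price."""
    order = np.argsort(-info / prices, kind="stable")
    bundle, spent = [], 0.0
    for i in order:
        if len(bundle) == k:
            break
        if spent + prices[i] <= budget:
            bundle.append(int(i))
            spent += float(prices[i])
    return bundle, spent

def equilibrium_prices(info, p_init, p_min, p_max, budgets, k,
                       alpha=1e-2, step=0.05, max_iter=500):
    info = np.asarray(info, dtype=float)
    p_min = np.asarray(p_min, dtype=float)
    p_max = np.asarray(p_max, dtype=float)
    # Step 1: start from the initial prices, kept inside the owners' bounds.
    prices = np.clip(np.asarray(p_init, dtype=float), p_min, p_max)
    for _ in range(max_iter):  # Step 2, bounded for safety
        # Step 2.1: every buyer picks a bundle at the current prices.
        results = [choose_bundle(info, prices, b, k) for b in budgets]
        unspent = sum(budgets) - sum(spent for _bundle, spent in results)
        # Step 2.2: stop once the unspent money falls below alpha.
        if unspent < alpha:
            break
        demand = np.zeros_like(prices)
        for bundle, _spent in results:
            for i in bundle:
                demand[i] += 1
        # Assumed rule: raise prices in proportion to demand share and information.
        weight = (demand / len(budgets)) * (info / info.max())
        new_prices = np.clip(prices * (1.0 + step * weight), p_min, p_max)
        if np.allclose(new_prices, prices):
            break  # prices are pinned at their bounds; nothing left to adjust
        prices = new_prices
    # Step 3: report prices together with the bundles chosen at those prices.
    bundles = [choose_bundle(info, prices, b, k)[0] for b in budgets]
    return prices, bundles

prices, bundles = equilibrium_prices(
    info=[0.9, 0.5, 0.1, 0.7], p_init=[0.1] * 4, p_min=0.05, p_max=1.0,
    budgets=[1.0, 1.0], k=2)
```

Whatever the final prices are, the sketch keeps every price within the bounds given by the data owners and never lets a buyer exceed his budget. A faithful implementation would replace `choose_bundle` with the maximization of utility function (5).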

In order to substantiate the efficacy of the pricing mechanism presented in this study, experiments are conducted on the Adult dataset from the UCI Machine Learning Repository. The dataset consists of 48,842 instances described by 14 attributes. For the experiments, 20,000 samples are drawn at random from the 48,842 instances and 7 attributes are considered. The description of the attributes is presented in

Attribute | Cardinality | Sensitivity |
---|---|---|
Age | 65 | No |
Gender | 2 | No |
Marital Status | 3 | No |
Race | 4 | No |
Native Country | 25 | No |
Education | 10 | No |
Occupation | 10 | Yes |

In the experiments, 120 buyers are considered, each requesting 8000 samples. For simplicity, the budget of every buyer is set to 1. The initial price

In

In a data market, the objective of the data owners is to secure appropriate compensation in return for trading off their privacy. At the same time, the objective of the data buyers is to derive maximum utility from the data obtained. The value derived from the data depends primarily on its information entropy: the higher the information entropy, the higher the utility derived, and vice versa. Hence, in this study, a personal data market model and a pricing mechanism based on competitive equilibrium are presented that jointly optimize the profit of data owners and the utility of data buyers. Experiments validate the propositions and the effectiveness of the proposed approach. The proposed approach is applicable to a static data market. In the future, a solution for a dynamic market based on competitive equilibrium should be studied.