«Access to this paper is restricted to registered delegates of the EMAC 2015 Conference. Purchase Conversions and Attribution Modeling in Online ...»
Purchase Conversions and Attribution Modeling in Online Advertising: An Empirical Investigation
Author: TAHIR NISAR - Email: email@example.com
University: SOUTHAMPTON UNIVERSITY BUSINESS SCHOOL
Track: Modelling and Marketing Analytics
Co-author(s): Man Yeung (University of Southampton)
Purchase Conversions and Attribution Modeling in Online Advertising: An Empirical Investigation
Abstract
In a purchase funnel, a consumer may interact with an assortment of ad platforms ranging from display ads, paid search and organic search to social media and email. In this study, we consider attribution models that can be applied to assign sales credit to these and other online channels. Using an online firm’s conversion data, we investigate the commonly used last-click attribution model and compare its results with those of a cooperative game theory based (Shapley Value) attribution model. Our findings show that individual rewards vary significantly for different online channels under these two models. We also compute the contributions of the various estimated factors using the Shapley Value regression approach in order to decompose a consumer funnel by regressed sources. Our empirical research provides insights into the complexity of attribution modeling.
Keywords: Purchase funnel; Attribution modeling; Last-click; Shapley Value
Conference Track: Modeling and Marketing Analytics
1. Introduction
Digital advertising campaigns are often launched across multiple channels, a selection of which may include search, display ads, social media, mobile, video, and email. As consumers are exposed to advertisement impressions, these channels help them make purchase decisions or sign up to the service being advertised. To gauge the effectiveness of such advertising campaigns, it is necessary to know which media channels or advertising formats have contributed to a purchase conversion. This process is known as attribution. A better understanding of how conversion credit is assigned to the relevant channels can serve a number of research and industry purposes. For example, marketing managers may use attribution models to interpret the influence of advertisements on consumer behavior and to optimize their advertising campaigns.
In this paper we first examine the last-click attribution model and then consider a cooperative game theory (Shapley Value) based attribution approach as a statistical model for online businesses (Osborne and Rubinstein, 1994). The Shapley Value model assesses the contributions of a set of factors whose sum accounts for the purchase conversion. In our context, the approach yields an exact additive decomposition of the touch points into their contributory factors. Using an online firm’s purchase conversion data, the study sheds light on how these attribution models can be used to better measure advertising performance. As the effect of changing attribution models for different online channels has been largely unstudied, an analysis of these models allows conclusions to be drawn on whether an advertising format’s revenues differ significantly between the models. To facilitate our analysis, we compare the performance of display advertising with that of other online sales channels.
We first provide a brief literature survey to identify the challenges of attribution modeling in online advertising markets. Our empirical results on the outcomes of different attribution models are presented in the next section. The following section describes our findings on the Shapley Value regression model. The study then considers the implications for different online sales channels and attribution. These are summarized in the last section.
2. Attribution in online advertising: A literature survey
There is a small but rapidly growing body of literature that examines the entire clickstream history of individual consumers in terms of whether visits to different ad formats have positive effects that accumulate toward a purchase (e.g., learning about a product that the shopper intends to buy; see Wiesel, Pauwels and Arts, 2011). This strategy of modeling purchases as the result of the cumulative effects of all previous interactions largely focuses on how non-purchase activities (e.g., advertisement clicks, website visits) affect the probability of purchasing. Because these studies concentrate on non-purchase activities, they cannot directly address the question of attributing credit for conversion to each individual ad format. Relatedly, Xu, Duan and Whinston (2014) study the specific “exciting effects” between advertisement clicks (i.e., how the occurrence of an earlier advertisement click affects the probability of occurrence of subsequent advertisement clicks). Li and Kannan (2014) use a probit-based consideration and nested logit formulation for visit and purchase to attribute conversions. These and other predictive models (Li et al., 2010) have generally focused on classification accuracy and, more importantly, do not pay enough attention to the stability of the variable contribution estimates.
2.1. Shapley Value-based attribution model
In digital advertising, multi-channel attribution is one of the most important problems, especially as a wide variety of media are involved. In recent years, researchers have made efforts to develop a truly data-driven methodology to account for the influence of each user interaction on the final user decision. Shao and Li (2011) have developed a probabilistic model based on a combination of first and second-order conditional probabilities. There are two steps involved in generating the probabilistic model:
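Step 1. First compute the empirical probability of the main factors,
$$P(y \mid x_i) = \frac{N_{\text{positive}}(x_i)}{N_{\text{positive}}(x_i) + N_{\text{negative}}(x_i)},$$
and the pair-wise conditional probabilities,
$$P(y \mid x_i, x_j) = \frac{N_{\text{positive}}(x_i, x_j)}{N_{\text{positive}}(x_i, x_j) + N_{\text{negative}}(x_i, x_j)},$$
for $i \neq j$. A conversion event (purchase or sign-up) is denoted as $y$, which is a binary outcome variable, and $x_i$, $i = 1, \ldots, p$, denote $p$ different advertising channels. $N_{\text{positive}}(x_i)$ and $N_{\text{negative}}(x_i)$ denote the number of positive or negative users exposed to channel $i$, respectively, and $N_{\text{positive}}(x_i, x_j)$ and $N_{\text{negative}}(x_i, x_j)$ denote the number of positive or negative users exposed to both channels $i$ and $j$.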
Step 2. The contribution of channel $i$ is then computed at each positive user level as:
$$C(x_i) = P(y \mid x_i) + \frac{1}{2 N_{j \neq i}} \sum_{j \neq i} \left\{ P(y \mid x_i, x_j) - P(y \mid x_i) - P(y \mid x_j) \right\},$$
where $N_{j \neq i}$ denotes the total number of $j$’s not equal to $i$. In this case it equals $N - 1$, or the total number of channels minus one (the channel $i$ itself) for a particular user. An advantage of using this estimation is that it includes the second-order interaction terms in the probability model. As there is significant overlap between the influences of different touch points due to the user’s exposure to multiple media channels, the model fully estimates the empirical probability with the second-order interactions. Another important assumption is that the net effect of the second-order interaction goes evenly to each of the two factors involved.
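The two steps above can be illustrated with a minimal sketch. It assumes each user journey is reduced to the set of channels the user was exposed to plus a conversion flag; the function name `probabilistic_attribution` and the toy journeys are hypothetical and are not taken from Shao and Li (2011) or from the firm's data.

```python
from collections import Counter
from itertools import combinations


def probabilistic_attribution(journeys):
    """Per-user channel credits under the two-step probabilistic model.

    journeys: list of (set of channel names, converted flag) pairs.
    Returns one {channel: contribution} dict per positive (converting) user.
    """
    pos1, tot1 = Counter(), Counter()  # single-channel counts
    pos2, tot2 = Counter(), Counter()  # channel-pair counts
    for channels, converted in journeys:
        for c in channels:
            tot1[c] += 1
            pos1[c] += int(converted)
        for pair in combinations(sorted(channels), 2):
            tot2[pair] += 1
            pos2[pair] += int(converted)

    def p1(i):
        # Step 1: empirical P(y | x_i)
        return pos1[i] / tot1[i]

    def p2(i, j):
        # Step 1: empirical P(y | x_i, x_j)
        key = tuple(sorted((i, j)))
        return pos2[key] / tot2[key]

    credits = []
    for channels, converted in journeys:
        if not converted:
            continue
        user = {}
        for i in channels:
            others = [j for j in channels if j != i]
            c = p1(i)
            if others:
                # Step 2: half of each second-order interaction goes to channel i
                c += sum(p2(i, j) - p1(i) - p1(j) for j in others) / (2 * len(others))
            user[i] = c
        credits.append(user)
    return credits


# Hypothetical toy data, for illustration only.
toy = [
    ({"display", "search"}, True),
    ({"display"}, False),
    ({"search"}, True),
    ({"display", "search"}, False),
]
toy_credits = probabilistic_attribution(toy)
```

For a user exposed to a single channel there is no interaction term, so the contribution reduces to the first-order probability.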
Dalessandro et al. (2012) show that, after rescaling, this probability model is equivalent to their Shapley Value formulation under certain simplifying assumptions.
3. Data description
We utilize logs from a large-scale online sales platform to first identify where different online channels feature in the customer journey. In total, 996,708 transactions are included in the analysis, with total revenue of $158,519,417, at an average order value of $159.04. Our conversion data span 104 weeks from January 1, 2012 to February 28, 2014. Currently, the firm we investigated attributes revenue generated through online transactions to its various paid marketing tools on a last-click basis. In our data, we have information about the following digital channels: display ad, organic search, paid search, price comparison sites, email, retargeting, and social media.
4. Attribution models: An empirical investigation
Our specific hypotheses relate to the financial importance of the display advertising channel under the current last-click model, and to the effects of moving to a Shapley Value-based attribution model. We test the hypothesis that, as a converter, display advertising generates more revenue under the last-click model than under the Shapley Value-based attribution model. In addition, we compute the contributions of the various estimated factors using the Shapley Value regression approach so as to decompose a consumer funnel by regressed sources. The approach has the merit of computing the weighted marginal contributions of an estimated conversion source in various coalitions of conversion sources. These weighted contributions sum exactly to the considered channel impact measure.
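For reference, the weighting and the exact-sum property can be written in the standard cooperative game notation. The symbols here are assumptions introduced for illustration: $N$ is the set of $n$ conversion sources, $S$ a coalition of sources, and $v(S)$ the value of the impact measure when only the sources in $S$ are included.

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right], \qquad \sum_{i \in N} \phi_i(v) = v(N) - v(\emptyset).$$

The first expression is the weighted marginal contribution of source $i$ across coalitions; the second states that the contributions add up to the full measure.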
4.1. The last-click model
Current industry practice indicates that the majority of online sales are attributed on a “last ad” or “last-click” model. The model attributes all conversions to the last referring impression within a customer journey, which means it is the final interaction that matters from a marketing perspective (Li and Kannan, 2014). The contributions of display ads and the other online marketing tools to online revenue are presented in Table 1. It can be seen that, using the current last-click method, display ads generate 18.42% of total online revenue. The highest revenue generating online marketing tool is organic search, which brings in 63.45%. Social media contributes the least under the current model, at 0.02%. The mean order value for display ads offers insight into this, as at $159.04 it is higher than that of any other marketing tool. We conduct a two-sample t-test comparing the average order value of display ads with that of the rest of the online marketing tools. The test examines whether there is a significant difference between the mean order value for display ads and that for the rest of the online marketing tools. The t statistic of 21.22 is greater than the two-tail critical value of 1.96, indicating (at the 95% confidence level) that there is a significant difference between the average order values. Furthermore, the p-value of 3.13E-98 is considerably lower than 0.05.
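As a minimal sketch of the last-click rule described above, the snippet below credits each order's full revenue to the final touch point and reports revenue shares in percent. The function name `last_click_shares` and the toy journeys are hypothetical; they do not reproduce the firm's data or Table 1.

```python
from collections import defaultdict


def last_click_shares(journeys):
    """Revenue share (in percent) per channel under last-click attribution.

    journeys: list of (ordered list of touch points, order revenue) pairs.
    """
    revenue = defaultdict(float)
    for touchpoints, value in journeys:
        if touchpoints:
            # All credit goes to the last referring channel in the journey.
            revenue[touchpoints[-1]] += value
    total = sum(revenue.values())
    if total == 0:
        return {}
    return {ch: 100.0 * r / total for ch, r in revenue.items()}


# Hypothetical toy journeys, for illustration only.
toy_journeys = [
    (["display", "organic search"], 100.0),
    (["email", "display"], 50.0),
    (["organic search"], 50.0),
]
toy_shares = last_click_shares(toy_journeys)
```

In this toy example email receives no credit at all, although it appears in a converting journey, which is the limitation that motivates the alternative model.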
4.2. The Shapley Value-based attribution model
The Shapley Value methodology was developed in a cooperative game setting, and has been applied to problems ranging from measuring systemic risk in a macroeconomic environment to inequality indices (Osborne and Rubinstein, 1994). In a typical Shapley Value cooperative game, a group of players generates a shared “value” (e.g. wealth, cost) for the group as a whole. The Shapley Value of a player in a game is calculated as the player’s expected marginal contribution over the set of all permutations of the set of players. Correspondingly, the Shapley Value of an advertising medium is its expected marginal contribution over all possible sets of the interacting channels. We noted the relevant assumptions in the formulation in Section 2.1, and we use that formulation to calculate the percentage of value allocated to each channel.
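The permutation definition above can be sketched directly for a small number of channels. This is a generic textbook computation under stated assumptions, not the rescaled probabilistic formulation used in the study: the function name `shapley_values` and the coalition values in `toy_value` are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley Values by averaging marginal contributions over all orderings.

    players: iterable of player (channel) names.
    value:   function mapping a frozenset of players to the coalition's value.
    """
    players = list(players)
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the players before it.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical conversion values for each coalition of two channels.
toy_value = {
    frozenset(): 0.0,
    frozenset({"display"}): 0.10,
    frozenset({"search"}): 0.20,
    frozenset({"display", "search"}): 0.40,
}
toy_shapley = shapley_values(["display", "search"], toy_value.__getitem__)
```

Enumerating all orderings grows factorially with the number of channels, so this form is practical only for a handful of channels such as the seven considered here.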
Figure 1 shows the effects on revenue attribution for the online marketing tools using the Shapley Value-based model. Our results show that display ads represent 14.34% of the revenue generated, down from the 18.42% of revenue attributed under the last-click model, whereas organic search registers only a small increase from 63.45% to 64.17%. Social media and email record the largest changes in value percentage, as reflected in their revenue generation contributions of 2.14% and 2.58%, respectively. There is also a sizeable increase in paid search, from 10.92% under the last-click model to 12.85% under the probability model. We conduct a two-sample t-test comparing last-click and probability-based display ad rewards. The t statistic of 28.43 is greater than the two-tail critical value of 1.96, indicating (at the 95% confidence level) that there is a significant difference between the average display advertising returns. It can therefore be concluded that the Shapley Value-based attribution model on average attributes lower revenue to display ads. This is also supported by Table 2, which shows that display ads are allocated 24.86% lower revenue under the Shapley Value-based attribution model.
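The comparison of a t statistic against the critical value of 1.96 can be sketched as follows. The paper does not state which variant of the two-sample t-test was used, so the snippet assumes Welch's unequal-variance form; the function name `welch_t` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic allowing unequal variances (Welch's form)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical per-order rewards under the two attribution models.
last_click_rewards = [10.0, 12.0, 14.0]
shapley_rewards = [7.0, 8.0, 9.0]

t_stat = welch_t(last_click_rewards, shapley_rewards)
# 1.96 is the large-sample two-tail critical value at the 5% level; it suits
# samples of the size used in the study, not a three-point toy sample.
exceeds_critical = abs(t_stat) > 1.96
```

With very large samples the t distribution is close to the standard normal, which is why 1.96 serves as the two-tail critical value in the text.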