Visions, PDMA's product development e-magazine


Visions, October 1999

New Product Forecasting Part 2: What-if Analysis & Prospect Lists

Jeffrey Morrison, Director of Modeling, Equifax Corporation (m_jeffer@bellsouth.net)

Forecasters have always struggled with how best to develop realistic projections in an environment where historical data and adequate market research may be scarce. Although new product forecasters face even more challenges in this area, some statistical modeling techniques used to analyze mature products can be applied to new products to provide valuable insight into long-run market acceptance. This article is the second in a three-part series discussing quantitative forecasting techniques for new product forecasting.

In the last article (Visions, July 1999; page 10), we looked at Jim, who had recently been promoted to Product Manager in a national sports equipment company, ABC Athletics. The research group had just completed the development of a new golf ball that goes 20% farther than anything on the market. The financial people needed a ten-year forecast for demand and revenue. One of Jim's main tasks in his new job was to develop a long-run sales forecast that he could sell as "believable" to the very conservative vice-president of Finance. In addition, Jim should be prepared to answer a number of "what-if" questions from the Marketing vice-president about product penetration at different price offerings.

Survey or Trial Data
Luckily for Jim, a market survey had been sent out to subscribers of golf magazines a year earlier to determine the general level of market acceptance for such a product. After explaining the features of the new golf balls, the survey asked about gender, income, years of golfing experience, and brand loyalty. Then those surveyed were asked whether they would buy a package of the new golf balls at one of three price points. Survey data of this type is sometimes referred to as stated preference information because actual consumer behavior is not observed. If the data came from a product trial, then it would represent actual behavior, or revealed preferences. As you might guess, the literature shows that new product penetration rates derived from stated preference data are typically greater than those obtained from revealed preference sources.

Before Jim proceeds further, it is necessary to re-code some of the data into a more usable form. We begin by coding the surveyed purchase decision for each respondent as a zero or one value (0 = no purchase; 1 = purchase). This 0/1 coding scheme can also be used to re-code responses to other questions such as Gender, Previous Purchaser, etc. Afterwards, the data might look something like this...

Table 1

Survey Questions (750 Respondents)                                      Average Value

1) What is your income in thousands of dollars?                         $42.5
2) How many people in your household?                                   4.2
3) What is your age?                                                    55.3
4) What is your gender (female = 1)?                                    0.12
5) Have you purchased products from us before (yes = 1)?                0.14
6) Would you purchase these new golf balls at $20, $30, $40 per dozen?  0.43

Average price asked on the survey                                       $22

Table 2

       A          B         C            D      E             F            G
OBS    Buy?       Income    Number in    Age    Gender        Previous     Price
       (Yes=1)    ($000)    Household           (Female=1)    Purchaser    Surveyed
                                                              (Yes=1)

1      0          66        5            24     0             0            30
2      0          95        3            33     0             0            20
3      1          56        4            65     1             1            20
4      1          50        4            55     1             0            40
5      0          45        3            19     0             0            30
...    ...        ...       ...          ...    ...           ...          ...
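The 0/1 re-coding step described above can be sketched in a few lines of Python. This is only an illustration: the raw field names and answer strings ("yes", "female", and so on) are assumptions, not the actual survey layout. The two records correspond to observations 1 and 3 in Table 2.

```python
# Hypothetical raw survey records. The field names and the answer strings
# are assumptions made for illustration; the values mirror OBS 1 and 3.
raw_responses = [
    {"buy": "no", "income": 66, "household": 5, "age": 24,
     "gender": "male", "previous": "no", "price": 30},
    {"buy": "yes", "income": 56, "household": 4, "age": 65,
     "gender": "female", "previous": "yes", "price": 20},
]


def recode(record):
    """Return a copy of the record with categorical answers coded as 0/1."""
    coded = dict(record)
    coded["buy"] = 1 if record["buy"] == "yes" else 0            # 1 = purchase
    coded["gender"] = 1 if record["gender"] == "female" else 0   # 1 = female
    coded["previous"] = 1 if record["previous"] == "yes" else 0  # 1 = previous purchaser
    return coded


coded_rows = [recode(r) for r in raw_responses]
for row in coded_rows:
    print(row)
```

Numeric answers such as income, age, and price pass through unchanged; only the yes/no style questions are converted.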

Now let's define some terms typically used in regression. The dependent variable in regression is a general name for the variable we wish to forecast. The other variables, called independent or explanatory variables, will be used to explain or predict the behavior of the dependent variable. In this application, we specify the surveyed "purchase / no purchase" decision (column A) as the dependent variable in the model. This equation predicts the likelihood that an individual would purchase the product given his or her income, household size, age, gender, previous purchase history, and the product's price.

Coming up with the best model to predict purchasing behavior is as much of an art as it is a science.

First, you should include only variables in the equation that make sense, either from an economic standpoint or a marketing perspective.
Second, you should stay away from including two explanatory variables that are highly correlated with one another, i.e., that explain the same thing.
Third, check the signs of the coefficients the regression produces to make sure they make sense. For example, you would expect price to have a negative sign: when price goes up, the probability of purchase should go down.
Fourth, remove any explanatory variables from the equation that have a t-statistic generally less than 2 in absolute value. T-statistics are produced by all regression packages and can be used as a means to eliminate unnecessary variables from the model.
Fifth, keep the number of variables relatively small, typically fewer than 20.
Finally, see how your model does in predicting who would purchase or not purchase your product. Use a cut-off value of .50. If the equation produces a probability greater than the cut-off, the predicted choice is "buy." If the number is less than or equal to .50, the predicted choice is "not to buy." Compare this result to what was obtained from the survey to get an idea of your model's accuracy. Remember, no model is perfect!
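The final check in the list above, comparing predicted choices against surveyed choices at a .50 cut-off, might be sketched as follows. The probabilities and surveyed answers here are hypothetical placeholders, not results from the golf ball survey, and the function names are illustrative.

```python
CUTOFF = 0.50


def predict_choice(probability, cutoff=CUTOFF):
    """Return 1 (buy) if the probability exceeds the cut-off, else 0."""
    return 1 if probability > cutoff else 0


def accuracy(probabilities, surveyed, cutoff=CUTOFF):
    """Share of respondents whose predicted choice matches the survey."""
    predicted = [predict_choice(p, cutoff) for p in probabilities]
    hits = sum(1 for p, s in zip(predicted, surveyed) if p == s)
    return hits / len(surveyed)


# Hypothetical model outputs and surveyed decisions (illustration only).
model_probs = [0.81, 0.50, 0.35, 0.62]
surveyed = [1, 0, 0, 0]

print(accuracy(model_probs, surveyed))
```

Note that a probability of exactly .50 is classified as "not to buy," matching the rule stated in the text.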

What-if Analysis: Linear Models
Now you have the engine necessary for your "what-if" analysis: the equation derived from your survey data! Although the prediction equation reflects the likelihood of purchase for the respondent, it can easily be used to simulate changes in aggregate penetration rates. For example, maybe our final model (Equation 1) simply used two variables (price and income) to explain purchase behavior:

Equation 1
Likelihood of Purchase = -.08551 -.010727 * Price + .017733 * Income

First we need a starting point for our what-if analysis. Let's call this the Base Case. A Base Case penetration estimate could be calculated by simply substituting the average price asked in the survey ($22) and the average income ($42.5 thousand) into the above equation, where B1 and B2 are the price and income coefficients.

Base Case Penetration = Intercept + (B1 * Avg. Price) + (B2 * Avg. Income)
= -.08551 + (-.010727 * 22) + (.017733 * 42.5)
= .432

Now suppose you wanted to know how this would change if you decreased the price of your new product by 10% from the average price asked on the survey. Let's call this the Scenario penetration. Simply multiply the average surveyed price by .90 and recompute the equation for the new scenario as follows...

Scenario Penetration = Intercept + (B1 * Avg. Price * .90) + (B2 * Avg. Income)
= -.08551 + (-.010727 * 22 * .90) + (.017733 * 42.5)
= .456

The estimated change in penetration as a result of the price change is the difference between the Scenario and Base Case (.456 - .432 = .024, or 2.4 percentage points). The same procedure could also be used to test a variety of scenarios with different variables or combinations of "what-ifs" to determine the impact on long-run market acceptance.
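The Base Case and Scenario arithmetic can be reproduced with a short Python sketch. The coefficients and survey averages come from Equation 1 and Table 1; the function and variable names are illustrative assumptions.

```python
# Coefficients from Equation 1 (linear model).
INTERCEPT = -0.08551
B_PRICE = -0.010727
B_INCOME = 0.017733

# Survey averages from Table 1.
AVG_PRICE = 22.0     # dollars per dozen
AVG_INCOME = 42.5    # thousands of dollars


def linear_penetration(avg_price, avg_income):
    """Evaluate Equation 1 at the given average price and income."""
    return INTERCEPT + B_PRICE * avg_price + B_INCOME * avg_income


base = linear_penetration(AVG_PRICE, AVG_INCOME)
scenario = linear_penetration(AVG_PRICE * 0.90, AVG_INCOME)  # 10% price cut

print(f"Base case: {base:.3f}")
print(f"Scenario:  {scenario:.3f}")
print(f"Change:    {scenario - base:.3f}")
```

Run as written, this gives .432 for the Base Case and .456 for the Scenario. One caveat of any linear equation of this kind is that nothing forces its output to stay between 0 and 1 when the inputs are far from the survey averages.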

Using Non-Linear Models
In the above example, we used linear regression to estimate linear relationships in the data. Computing penetration rates was a simple matter of running the scenario averages for the different variables through the equation. There was no need to process respondent-level data. However, more advanced techniques such as probit or logistic regression can estimate nonlinear relationships. For these approaches, substituting survey averages into the equation to calculate penetration rates would be incorrect, because a nonlinear function evaluated at the averages does not generally equal the average of that function across respondents. Instead, the likelihood of purchase for each respondent would have to first be computed.

In our example, using logistic regression rather than linear regression would produce different estimates for the constant term (-3.8024), price (-.10487), and income (.13298). Although the equation is slightly more complicated because it uses an exponential function, forecasting probabilities of purchase and long-run penetration is still a simple matter. Table 3 shows an example summary of the calculations developed for the first five respondents in our survey after using a logistic regression approach (Equation 2):

Equation 2
Likelihood of Purchase (Respondent i) = [e^(-3.8024 -.10487 * Price +.13298 * Income)] / [1 + e^(-3.8024 -.10487 * Price +.13298 * Income)]

Base Case Likelihood of Purchase (Respondent 1)
= [e^(-3.8024 -.10487 * 30+.13298 * 66)] / [1 + e^(-3.8024 -.10487 * 30 +.13298 * 66)]
= 0.861545

Scenario Likelihood of Purchase (Respondent 1)
= [e^( -3.8024 -.10487 * 30 * .9 +.13298 * 66)] / [1 + e^(-3.8024 -.10487 * 30 * .9 +.13298 * 66)]
= 0.894993

Once the Base Case and Scenario likelihoods are calculated for everyone in the survey, the market penetration rates are derived by summing all the probabilities and dividing by the total number of respondents. The difference between these Base Case and Scenario penetration rates would reflect the impact of a 10% price reduction on the long-run market acceptance of your new product.

Table 3

       A          B         Original    Base Case     Proposed    Scenario
OBS    Buy?       Income    Price       Likelihood    Price       Likelihood
       (Yes=1)    ($000)    Surveyed

1      0          66        30          0.861545      30*.9       0.894993
2      0          95        20          0.998811      20*.9       0.999036
3      1          56        20          0.824492      20*.9       0.852811
4      1          50        40          0.206165      40*.9       0.283184
5      0          45        30          0.275998      30*.9       0.343037
...    ...        ...       ...         ...           ...         ...
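The respondent-level calculation behind Table 3 can be sketched in Python as follows. The coefficients are those of Equation 2 and the income and price values are the first five observations from Table 2; the function names are illustrative, and averaging over only five respondents is purely for demonstration (the real penetration rate would use all 750).

```python
import math

# Coefficients from Equation 2 (logistic model).
CONSTANT = -3.8024
B_PRICE = -0.10487
B_INCOME = 0.13298


def purchase_likelihood(price, income):
    """Equation 2: logistic likelihood of purchase for one respondent."""
    z = CONSTANT + B_PRICE * price + B_INCOME * income
    return math.exp(z) / (1.0 + math.exp(z))


# (income in $000, price surveyed) for observations 1-5 in Table 2.
respondents = [(66, 30), (95, 20), (56, 20), (50, 40), (45, 30)]


def penetration(rows, price_multiplier=1.0):
    """Average the respondent-level likelihoods to get a penetration rate."""
    probs = [purchase_likelihood(price * price_multiplier, income)
             for income, price in rows]
    return sum(probs) / len(probs)


base = penetration(respondents)            # prices as surveyed
scenario = penetration(respondents, 0.90)  # 10% price reduction

print(f"Respondent 1 base likelihood: {purchase_likelihood(30, 66):.6f}")
print(f"Base case penetration (5 respondents): {base:.3f}")
print(f"Scenario penetration (5 respondents):  {scenario:.3f}")
```

The key design point is that the averaging happens after the logistic transformation, one respondent at a time, rather than by plugging survey averages into the equation.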

Using Classification Tables
Another way to examine the purchase decision on a more detailed level is to produce a classification table. As before, simulate the impact of a price change by decreasing the value of the price asked of each respondent in the survey by 10%. Once the change is made, these new values for the explanatory variables in the model are run through the modeling equation to produce a new likelihood of purchase. By using a cut-off probability value of .50, you can determine the simulated new purchase decisions for everyone in your sample data. Those individuals with original predicted probabilities closest to the .50 cut-off will have the greatest chance to be affected by the new price change.

In other words, if a respondent's original probability was .22, then a 10% price reduction might not be enough to raise his purchase probability over the .50 cut-off value. His behavior or purchase choice would remain the same (no purchase). However, someone who had originally scored a .48 (no purchase) might easily be pushed over the .50 cut-off, placing him in the "purchase" category. Afterwards, a count could be done on all the respondents who changed their purchase decision because of the price scenario and compared with the original surveyed results. This could be done in SAS, Fortran, C++, Visual Basic, or by software packages like LifeCast Pro that are designed specifically for new product forecasting.
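As a rough sketch of the counting step, the Python below cross-tabulates each respondent's predicted decision before and after a price change, using the logistic coefficients from Equation 2 and the first five observations from Table 2. The function names and the dictionary layout of the table are assumptions made for illustration, and the 40% price cut in the second call is a hypothetical scenario that is not in the article.

```python
import math


def purchase_likelihood(price, income):
    """Equation 2, written in the algebraically equivalent 1/(1+e^-z) form."""
    z = -3.8024 - 0.10487 * price + 0.13298 * income
    return 1.0 / (1.0 + math.exp(-z))


def decision(probability, cutoff=0.50):
    """1 = purchase if the probability exceeds the cut-off, else 0."""
    return 1 if probability > cutoff else 0


# (income in $000, price surveyed) for observations 1-5 in Table 2.
respondents = [(66, 30), (95, 20), (56, 20), (50, 40), (45, 30)]


def classification_table(rows, price_multiplier, cutoff=0.50):
    """Count respondents by (base decision, scenario decision)."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for income, price in rows:
        before = decision(purchase_likelihood(price, income), cutoff)
        after = decision(
            purchase_likelihood(price * price_multiplier, income), cutoff)
        table[(before, after)] += 1
    return table


ten_percent_cut = classification_table(respondents, 0.90)
forty_percent_cut = classification_table(respondents, 0.60)  # hypothetical

print("Switched to purchase, 10% cut:", ten_percent_cut[(0, 1)])
print("Switched to purchase, 40% cut:", forty_percent_cut[(0, 1)])
```

For these five respondents none sit close enough to the .50 cut-off for a 10% reduction to change their predicted decision, which illustrates the point made above about respondents far from the cut-off.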

Prospect Lists
Not only does Jim have a platform to answer a variety of questions related to his forecast, but he is now in a position to help the sales force identify their best prospects. First, he collects information from list providers (outside sources) along with data internal to ABC Athletics on the variables used in his regression model: income, family size, age, etc. Next, he simply runs this prospect data through the regression equation and sorts the answer (probability of purchase) from highest to lowest. The highest results represent those prospects with the greatest likelihood of purchasing the new golf balls. The sales group then takes this rank-ordered list and begins to mail or call the individuals at the top first, working their way down the list based upon their remaining budget.
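A minimal Python sketch of the scoring and sorting step is shown below. It assumes the logistic model of Equation 2 is the scoring equation; the prospect records, their identifiers, and the single offer price of $30 are all hypothetical values chosen for illustration.

```python
import math


def purchase_likelihood(price, income):
    """Equation 2: logistic likelihood of purchase."""
    z = -3.8024 - 0.10487 * price + 0.13298 * income
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical prospect list (ids and incomes are made up for illustration).
prospects = [
    {"id": "P-001", "income": 40},
    {"id": "P-002", "income": 80},
    {"id": "P-003", "income": 60},
]
OFFER_PRICE = 30  # assumed price per dozen offered to every prospect

# Score each prospect, then sort from highest to lowest likelihood.
scored = [
    {**p, "likelihood": purchase_likelihood(OFFER_PRICE, p["income"])}
    for p in prospects
]
ranked = sorted(scored, key=lambda p: p["likelihood"], reverse=True)

for p in ranked:
    print(p["id"], round(p["likelihood"], 3))
```

The sales group would then work down `ranked` from the top until the mailing or calling budget is exhausted.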

Summary
In summary, survey data used with regression analysis can provide an excellent tool for answering a wide range of "what-if" questions regarding market acceptance, not to mention aiding sales and marketing efforts in finding the best prospects for their new products. When multiplied by a potential universe of subscribers, market penetration can be translated into unit sales over time.

The final article in this three-part series will describe a number of ways to forecast sales over time, reflecting the S-shaped pattern of the product life cycle, and will go into more detail on the use of diffusion models.
