New Product Forecasting Part 2: What-if Analysis & Prospect Lists
Jeffrey Morrison, Director of Modeling, Equifax Corporation
(m_jeffer@bellsouth.net)
Forecasters have always struggled with how best to develop realistic
projections in an environment where historical data and adequate market
research may be scarce. Although new product forecasters face even more
challenges in this area, some statistical modeling techniques used to
analyze mature products can be applied to new products to provide
valuable insight into long-run market acceptance. This article is the second
in a three-part series discussing quantitative techniques for new product
forecasting.
In the last article (Visions,
July 1999; page 10), we looked at Jim, who had recently been promoted
to Product Manager at a national sports equipment company, ABC Athletics.
The research group had just completed the development of a new golf ball
that goes 20% farther than anything on the market. The financial people
needed a ten-year forecast of demand and revenue. One of Jim's main tasks
in his new job was to develop a long-run sales forecast that he could
sell as "believable" to the very conservative vice president of Finance.
In addition, Jim had to be prepared to answer a number of "what-if" questions
from the vice president of Marketing about product penetration at different
price offerings.
Survey or Trial Data
Luckily for Jim, a market survey had been sent to subscribers of golf
magazines a year earlier to gauge the general level of market acceptance
for such a product. After the features of the new golf balls were explained,
respondents were asked about their gender, income, years of golfing experience,
and brand loyalty. They were then asked whether they would buy a package
of the new golf balls at one of three price points. Survey data of this type
is sometimes called stated preference information because actual consumer
behavior is not observed. If the data came from a product trial, it would
represent actual behavior, or revealed preferences. As you might guess, the
literature shows that new product penetration rates derived from stated
preference data are typically higher than those obtained from revealed
preference sources.
Before Jim proceeds further, some of the data must be recoded into a more
usable form. We begin by coding the surveyed purchase decision for each
respondent as a zero or one (0 = no purchase; 1 = purchase). The same 0/1
coding scheme can be used to recode responses to other questions such as
Gender, Previous Purchaser, etc. Afterwards, the data might look something
like this...
Table 1
Survey Questions (750 respondents)                                         Average Value
1) What is your income in thousands of dollars?                            $42.5
2) How many people are in your household?                                  4.2
3) What is your age?                                                       55.3
4) What is your gender (female = 1)?                                       .12
5) Have you purchased products from us before (yes = 1)?                   .14
6) Would you purchase these new golf balls at $20, $30, or $40 per dozen?  .43
Average price asked on the survey                                          $22
Table 2
Column key: A = Buy? (Yes=1); B = Income ($000); C = Number in Household;
D = Age; E = Gender (Female=1); F = Previous Purchaser (Yes=1);
G = Price Surveyed ($)

OBS   A     B     C     D     E     F     G
1     0     66    5     24    0     0     30
2     0     95    3     33    0     0     20
3     1     56    4     65    1     1     20
4     1     50    4     55    1     0     40
5     0     45    3     19    0     0     30
...   ...   ...   ...   ...   ...   ...   ...
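The 0/1 recoding described before Table 1 is easy to automate. The following is a minimal Python sketch, assuming the raw answers arrive as text such as "Yes"/"No" and "Male"/"Female"; the field names (`buy`, `female`, `previous`, and so on) and the helper functions are hypothetical, and the five records simply restate the first five rows of Table 2 in that assumed raw form.

```python
# Minimal sketch: recode raw survey answers into the 0/1 layout of Table 2.
# Field names and the text form of the raw answers are assumptions.

def to_binary(answer, positive):
    """Return 1 when the answer matches the positive category, else 0."""
    return 1 if answer.strip().lower() == positive else 0


def recode(response):
    """Convert one raw response into the numeric columns A through G."""
    return {
        "buy": to_binary(response["buy"], "yes"),            # A: Yes = 1
        "income": response["income"],                        # B: $000
        "household": response["household"],                 # C
        "age": response["age"],                              # D
        "female": to_binary(response["gender"], "female"),   # E: Female = 1
        "previous": to_binary(response["previous"], "yes"),  # F: Yes = 1
        "price": response["price"],                          # G: $ asked
    }


# First five Table 2 respondents, written in the assumed raw text form.
raw_responses = [
    {"buy": "No", "income": 66, "household": 5, "age": 24, "gender": "Male", "previous": "No", "price": 30},
    {"buy": "No", "income": 95, "household": 3, "age": 33, "gender": "Male", "previous": "No", "price": 20},
    {"buy": "Yes", "income": 56, "household": 4, "age": 65, "gender": "Female", "previous": "Yes", "price": 20},
    {"buy": "Yes", "income": 50, "household": 4, "age": 55, "gender": "Female", "previous": "No", "price": 40},
    {"buy": "No", "income": 45, "household": 3, "age": 19, "gender": "Male", "previous": "No", "price": 30},
]

coded = [recode(r) for r in raw_responses]
for row in coded:
    print(row)
```

Because the coded columns hold only zeros and ones, their averages are proportions, which is how Table 1 can report values such as .12 for gender and .43 for the purchase question.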
Now let's define some terms typically used in regression. The dependent
variable in regression is the general name for the variable we wish to forecast.
The other variables, called independent or explanatory variables, are
used to explain or predict the behavior of the dependent variable. In this
application, we specify the surveyed "purchase / no purchase" decision (column
A of Table 2) as the dependent variable in the model. The resulting equation
predicts the likelihood that an individual would purchase the product given
his or her income, household size, age, gender, previous purchase history,
and the product's price.
Coming up with the best model to predict purchasing behavior is as much
an art as it is a science.
- First, include only variables in the equation that make sense from
either an economic or a marketing perspective.
- Second, avoid including two explanatory variables that are highly
correlated with one another, i.e., that explain the same thing.
- Third, check the signs of the coefficients the regression produces
to make sure they make sense. For example, you would expect price to
have a negative sign: when price goes up, the probability of purchase
should go down.
- Fourth, remove from the equation any explanatory variables whose
t-statistic is generally less than 2 in absolute value. Regression
packages routinely report t-statistics, which can be used to eliminate
unnecessary variables from the model.
- Fifth, keep the number of variables relatively small, typically
fewer than 20.
- Finally, see how well your model predicts who would or would not
purchase your product. Use a cutoff value of .50: if the equation
produces a probability greater than the cutoff, the predicted choice
is "buy"; if the number is less than or equal to .50, the predicted
choice is "not buy." Compare these predictions with the survey responses
to get an idea of your model's accuracy. Remember, no model is perfect!
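The last guideline can be checked with a few lines of code. This Python sketch is illustrative only: it applies the .50 cutoff to the logistic-model likelihoods that Equation 2 (introduced later) produces for the first five respondents, and cross-tabulates the predictions against the surveyed choices in column A of Table 2. The function names are hypothetical, and five observations are far too few to judge a real model.

```python
# Sketch of the .50-cutoff accuracy check; function names are illustrative.

def classify(probabilities, cutoff=0.50):
    """Predicted choice: 1 ("buy") if p > cutoff, otherwise 0 ("not buy")."""
    return [1 if p > cutoff else 0 for p in probabilities]


def classification_table(actual, predicted):
    """Count (surveyed, predicted) pairs for a 2x2 classification table."""
    table = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        table[(a, p)] += 1
    return table


surveyed = [0, 0, 1, 1, 0]  # column A of Table 2
# Base Case likelihoods for respondents 1-5 computed from Equation 2.
likelihoods = [0.861545, 0.998811, 0.824492, 0.206165, 0.275998]

predicted = classify(likelihoods)
table = classification_table(surveyed, predicted)
hit_rate = (table[(0, 0)] + table[(1, 1)]) / len(surveyed)

for (a, p), n in sorted(table.items()):
    print(f"surveyed={a} predicted={p}: {n}")
print(f"share classified correctly: {hit_rate:.0%}")
```

On these five rows only two of the five surveyed choices are matched; a real check would use all 750 respondents.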
What-if Analysis: Linear Models
Now you have the engine for your "what-if" analysis: the equation
derived from your survey data! Although the prediction equation reflects
the likelihood of purchase for an individual respondent, it can easily be
used to simulate changes in aggregate penetration rates. For example, suppose
our final model (Equation 1) used only two variables, price and income,
to explain purchase behavior:
Equation 1
$$\text{Likelihood of Purchase} = -0.08551 - 0.010727 \times \text{Price} + 0.017733 \times \text{Income}$$

First we need a starting point for our what-if analysis. Let's call this
the Base Case. A Base Case penetration estimate can be calculated by simply
substituting the average price asked in the survey ($22) and the average
income ($42.5 thousand) into the above equation.
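$$\begin{aligned}
\text{Base Case Penetration} &= \text{Intercept} + B_1 \times \text{Avg. Price} + B_2 \times \text{Avg. Income} \\
&= -0.08551 + (-0.010727 \times 22) + (0.017733 \times 42.5) \\
&= 0.432
\end{aligned}$$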
Now suppose you wanted to know how this would change if you decreased
the price of your new product by 10% from the average price asked on the
survey. Let's call this the Scenario penetration. Simply multiply the
average surveyed price by .90 and recompute the equation for the new scenario
as follows:
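$$\begin{aligned}
\text{Scenario Penetration} &= \text{Intercept} + B_1 \times \text{Avg. Price} \times 0.90 + B_2 \times \text{Avg. Income} \\
&= -0.08551 + (-0.010727 \times 22 \times 0.90) + (0.017733 \times 42.5) \\
&= 0.456
\end{aligned}$$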
The estimated change in penetration resulting from the price change
is the difference between the Scenario and Base Case (.456 - .432 = .024,
or 2.4 percentage points). The same procedure can also be used to test a
variety of scenarios involving different variables, or combinations of
"what-ifs," to determine their impact on long-run market acceptance.
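The Base Case and Scenario arithmetic above takes only a few lines to automate. This Python sketch evaluates Equation 1 at the survey averages and then at several price changes; the -20% and +10% cases are hypothetical additions for illustration, and the constant and function names are ours, not the author's.

```python
# Sketch of the linear what-if analysis using Equation 1's coefficients.
INTERCEPT = -0.08551
B_PRICE = -0.010727   # per $ of price
B_INCOME = 0.017733   # per $000 of income

AVG_PRICE = 22.0      # average price asked on the survey ($)
AVG_INCOME = 42.5     # average income ($000)


def linear_penetration(price, income):
    """Evaluate Equation 1 at the given price and income."""
    return INTERCEPT + B_PRICE * price + B_INCOME * income


base = linear_penetration(AVG_PRICE, AVG_INCOME)
print(f"Base Case: {base:.3f}")

for change in (-0.20, -0.10, 0.10):  # hypothetical price scenarios
    scenario = linear_penetration(AVG_PRICE * (1 + change), AVG_INCOME)
    print(f"price {change:+.0%}: {scenario:.3f} ({scenario - base:+.3f} vs. Base Case)")
```

Because the model is linear, each 10% change in the $22 average price moves penetration by the same amount (about 2.4 points). A linear probability model can also return values below 0 or above 1 for extreme inputs, one reason nonlinear models such as the logistic regression below are often used instead.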
Using Nonlinear Models
In the above example, we used linear regression to estimate linear relationships
in the data. Computing penetration rates was a simple matter of running
the scenario averages for the different variables through the equation;
there was no need to process respondent-level data. However, more advanced
techniques such as probit or logistic regression can estimate nonlinear
relationships. For these approaches, substituting survey averages into
the equation to calculate penetration rates would be incorrect, because
the probability evaluated at the average price and income is generally not
equal to the average of the individual respondents' probabilities. Instead,
the likelihood of purchase must first be computed for each respondent and
then averaged.
In our example, using logistic regression rather than linear regression
would produce different estimates for the constant term (-3.8024), price
(-0.10487), and income (0.13298). Although the equation is slightly more
complicated because it uses an exponential function, forecasting probabilities
of purchase and long-run penetration is still a simple matter. Table 3
shows an example summary of the calculations for the first five
respondents in our survey using a logistic regression approach (Equation
2):
Equation 2
$$\text{Likelihood of Purchase (Respondent } i\text{)} = \frac{e^{-3.8024 - 0.10487 \times \text{Price}_i + 0.13298 \times \text{Income}_i}}{1 + e^{-3.8024 - 0.10487 \times \text{Price}_i + 0.13298 \times \text{Income}_i}}$$

Base Case Likelihood of Purchase (Respondent 1)
$$= \frac{e^{-3.8024 - 0.10487 \times 30 + 0.13298 \times 66}}{1 + e^{-3.8024 - 0.10487 \times 30 + 0.13298 \times 66}} = 0.861545$$
Scenario Likelihood of Purchase (Respondent 1)
$$= \frac{e^{-3.8024 - 0.10487 \times 30 \times 0.9 + 0.13298 \times 66}}{1 + e^{-3.8024 - 0.10487 \times 30 \times 0.9 + 0.13298 \times 66}} = 0.894993$$
Once the Base Case and Scenario likelihoods are calculated for everyone
in the survey, the market penetration rates are derived by summing the
probabilities and dividing by the total number of respondents. The
difference between the Base Case and Scenario penetration rates reflects
the impact of a 10% price reduction on the long-run market acceptance
of your new product.
Table 3
OBS   A: Buy?   B: Income  Original   Base Case    Proposed        Scenario
      (Yes=1)   ($000)     Price ($)  Likelihood   Price ($)       Likelihood
1     0         66         30         0.861545     30 × .9 = 27    0.894993
2     0         95         20         0.998811     20 × .9 = 18    0.999036
3     1         56         20         0.824492     20 × .9 = 18    0.852811
4     1         50         40         0.206165     40 × .9 = 36    0.283184
5     0         45         30         0.275998     30 × .9 = 27    0.343037
...   ...       ...        ...        ...          ...             ...
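The respondent-level calculations in Table 3, and the averaging step that turns them into penetration rates, can be sketched in Python as follows. The sketch uses only the Equation 2 coefficients and the five respondents shown in Table 3, so its rates describe those five rows, not the full 750-person sample. The last lines illustrate why averages should not be plugged into a nonlinear equation: the probability at the average price and income differs noticeably from the average of the individual probabilities.

```python
import math

# Equation 2 (logistic regression) coefficients.
CONSTANT = -3.8024
B_PRICE = -0.10487
B_INCOME = 0.13298


def purchase_probability(price, income):
    """Equation 2: likelihood of purchase for one respondent."""
    z = CONSTANT + B_PRICE * price + B_INCOME * income
    return math.exp(z) / (1.0 + math.exp(z))


# (income in $000, price surveyed in $) for the five respondents in Table 3.
respondents = [(66, 30), (95, 20), (56, 20), (50, 40), (45, 30)]

base = [purchase_probability(price, income) for income, price in respondents]
scenario = [purchase_probability(price * 0.9, income) for income, price in respondents]

# Penetration rate = sum of probabilities / number of respondents.
base_rate = sum(base) / len(base)
scenario_rate = sum(scenario) / len(scenario)
print(f"Base Case {base_rate:.3f}, Scenario {scenario_rate:.3f}, "
      f"change {scenario_rate - base_rate:+.3f}")

# Plugging averages into the nonlinear equation gives a different (wrong) rate.
avg_income = sum(income for income, _ in respondents) / len(respondents)
avg_price = sum(price for _, price in respondents) / len(respondents)
plug_in_rate = purchase_probability(avg_price, avg_income)
print(f"Probability at the averages: {plug_in_rate:.3f}")
```

On these five rows the 10% price cut raises the average likelihood by about four percentage points; with the full survey, the same code would simply loop over all 750 respondents.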
Using Classification Tables
Another way to examine the purchase decision in more detail is
to produce a classification table. As before, simulate the impact of a price
change by decreasing the price asked of each respondent in the survey by 10%.
Once the change is made, the new values of the explanatory variables are
run through the model equation to produce a new likelihood of purchase.
Using a cutoff probability of .50, you can then determine the simulated
purchase decision for everyone in your sample. Those individuals whose
original predicted probabilities are closest to the .50 cutoff are the
most likely to be affected by the price change.
In other words, if a respondent's original probability was .22, a 10%
price reduction might not be enough to raise his purchase probability
above the .50 cutoff, so his purchase choice would remain the same
(no purchase). However, someone who originally scored .48 (no purchase)
might easily be pushed over the .50 cutoff, placing him in the "purchase"
category. Afterwards, you can count all the respondents who changed their
purchase decision because of the price scenario and compare the result
with the original surveyed results. This can be done in SAS, Fortran,
C++, Visual Basic, or with software packages like LifeCast Pro that are
designed specifically for new product forecasting.
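The switching logic described above can be sketched in Python. Because only price changes in this scenario, a logistic model's log-odds shift by the price coefficient times the change in price, so a respondent can be re-scored from the original probability alone. The sketch assumes the Equation 2 price coefficient; the two hypothetical respondents reproduce the .22 and .48 examples above, with surveyed prices of $20 and $30 chosen purely for illustration.

```python
import math

B_PRICE = -0.10487  # price coefficient from Equation 2
CUTOFF = 0.50


def rescore(p, old_price, new_price):
    """Re-score a respondent after a price change, all else held fixed.

    In a logistic model the log-odds move by B_PRICE * (new_price - old_price).
    """
    log_odds = math.log(p / (1.0 - p)) + B_PRICE * (new_price - old_price)
    return 1.0 / (1.0 + math.exp(-log_odds))


def count_switchers(respondents, price_factor=0.90):
    """Count predicted choices that cross the cutoff in each direction."""
    to_buy = to_no_buy = 0
    for p, price in respondents:
        before = p > CUTOFF
        after = rescore(p, price, price * price_factor) > CUTOFF
        if after and not before:
            to_buy += 1
        elif before and not after:
            to_no_buy += 1
    return to_buy, to_no_buy


# Hypothetical respondents: (original probability, surveyed price in $).
examples = [(0.22, 20), (0.48, 30)]
for p, price in examples:
    print(f"{p:.2f} at ${price} -> {rescore(p, price, price * 0.90):.3f}")
print("switched to buy / to no buy:", count_switchers(examples))
```

Applied to the five respondents in Table 3, the same function finds no switchers, since none of them sits close enough to the cutoff.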
Prospect Lists
Not only does Jim have a platform for answering a variety of questions related
to his forecast, but he is now in a position to help the sales force
identify their best prospects. First, he collects the same information
used in his regression model (income, family size, age, etc.) from list
providers (outside sources) and from data internal to ABC Athletics.
Next, he runs this prospect data through the regression equation and
sorts the results (probabilities of purchase) from highest to lowest. The
highest results represent the prospects with the greatest likelihood
of purchasing the new golf balls. The sales group then takes this
rank-ordered list and mails or calls the individuals at the top first,
working down the list as far as its remaining budget allows.
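A prospect list is the model's scores sorted in descending order and cut off at what the budget allows. The Python sketch below is a minimal illustration that assumes the two-variable logistic model (Equation 2) and a hypothetical offer price of $30; the prospect identifiers, incomes, and contact budget are made up for the example, and a production version would score whatever variables the final model actually uses (family size, age, and so on).

```python
import math

# Equation 2 coefficients; OFFER_PRICE is a hypothetical launch price.
CONSTANT = -3.8024
B_PRICE = -0.10487
B_INCOME = 0.13298
OFFER_PRICE = 30.0


def purchase_probability(price, income):
    """Equation 2, written in the equivalent form 1 / (1 + e^-z)."""
    z = CONSTANT + B_PRICE * price + B_INCOME * income
    return 1.0 / (1.0 + math.exp(-z))


def build_call_list(prospects, max_contacts):
    """Score prospects, rank highest probability first, keep the top entries."""
    scored = [(pid, purchase_probability(OFFER_PRICE, income)) for pid, income in prospects]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:max_contacts]


# Hypothetical prospects: (identifier, income in $000).
prospects = [("P-001", 48), ("P-002", 81), ("P-003", 35), ("P-004", 62)]
for pid, prob in build_call_list(prospects, max_contacts=2):
    print(f"{pid}: {prob:.3f}")
```

With a single offer price, the ranking here is driven entirely by income; once more variables enter the model, the sort order reflects all of them together.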
Summary
In summary, survey data used with regression analysis can provide an excellent
tool for answering a wide range of "what-if" questions regarding market
acceptance, not to mention aiding sales and marketing efforts in finding
the best prospects for new products. When multiplied by a potential
universe of subscribers, market penetration can be translated into unit
sales over time.
The final article in this three-part series will describe a number of ways
to forecast sales over time that reflect the S-shaped pattern of the product
life cycle, going into more detail on the use of diffusion models.