Q2

In response to the interview questions by SOME COMPANY Inc.
August 8, 2011
Contents
1 Introduction
2 Features
3 Data acquisition, analysis and discussion
  3.1 Survey
    3.1.1 Sample survey questionnaire
    3.1.2 Customer demographics
    3.1.3 Estimate and predict customer behavior using linear model
  3.2 Real-time data analysis
    3.2.1 Data collection
    3.2.2 Purpose of collecting data
    3.2.3 Data modeling
Summary
References
Appendix
1 Introduction
This document is dedicated to analyzing SOME PRODUCT in terms of the three R metrics:
1. REACH: how to acquire customers;
2. RETENTION: how to keep customers;
3. REVENUE: how to earn income from customers.
I will structure the analysis in the following two aspects:
1. Summarize the available features that match these metrics, with some discussion.
2. Suggest some data collection methods and data analysis techniques.
I will also create some mock-up data to demonstrate the data I wish to collect and some statistical methods for analyzing it.
2 Features
A great game is innovative and always centers on its players. SOME PRODUCT is amazing and addictive. As a computer gamer who has played many simulation/management games, I would like to discuss the current features a bit and make some personal suggestions. You can treat this part as a review from a customer. The data analysis, which is the main body of this essay, is in the next section (jump to Section 3).

1. It is fun to play SOME PRODUCT [5]
The gameplay is fun and addictive. The graphics are stunning and cute. The sound and background music (BGM) are quite interesting. Avatars, profiles, state machines, tech trees, collections, sliders and interface toys are designed quite well in the game [1]. These are the core elements that reach and retain users.

2. Easy to learn, hard to master [5, 7]
The game is easy to pick up, even for five-year-old kids. The intuitive tutorial shown the first time the game is launched guides users through almost all the steps needed to operate a zoo. However, within a tiny space, some micromanagement skills are required. For example, how can more buildings fit into the zoo? Should zoo bucks be spent on an instant finish or not? Adequately advanced, more involved techniques are especially important for retaining advanced users, from whom much of the revenue may come.
Regarding players' skill levels, I am not sure about the player demographics (I will discuss this in the next section). For me, the game is a bit simple, and I would like to see more advanced features. For example, one obnoxious problem is building paths. So far, paths are totally redundant in the game because "tourists" run around anywhere. Even if I insist on building a path regardless of whether it is useful, I have to go to store -> decoration -> scroll through pages until I can find the path. And I have to dig through this hierarchy again if I need a corner or if I accidentally hit cancel.
I would suggest:
(a) Add some shortcut slots in a corner of the screen, to which players can drag whatever they want;
(b) Add options to the store interface that allow players to sort animals by price, alphabetical order or popularity. Also, add a fast way for customers to request the animals they want.
(c) Add bonus factors for building paths. For example, animals placed next to a path could earn income slightly more often.
(d) When I played Roller Coaster Tycoon¹ several years ago, I found it visually rewarding to see "tourists" streaming along the street, waiting in a long line at a particular roller coaster, or "laughing". SOME PRODUCT does have details like these; for example, "tourists" sometimes stop and take photos. But it is still not enough, and the "tourist" models look too similar to each other.
(e) It is also enjoyable to see the number of "tourists" increase steadily after introducing something new. For SOME PRODUCT, besides coins and XP, it would be better to add some animated "zoo statistics" numbers, such as the current number of tourists, today's income, etc.
(f) Is it possible to rotate the view angle and make the buildings more three-dimensional? Please refer to Paradise Island².

3. Gameplay depth [5]
Currently, players can already do a lot in the game. There are free items as well as fancy, expensive ones. A player who has spent a sizable amount of time in the game will have accumulated a large quantity of game currency. Adding new content on a weekly basis, especially expensive animals and advanced shops, will help absorb this money. If there are always new things to explore on a regular basis, users are more likely to check back from time to time, which is the so-called engagement or retention [7].

4. Viral channel [6]
The customer base grows virally. Common strategies such as cross-promotion (e.g., rewarding zoo bucks for installing another game) can expose users to more games, especially when an existing game has a large user base and a new game has just launched. Therefore, keeping More Awesome Games in a noticeable location is definitely a good idea.
Splash images after launching the game are another good idea, but keeping the number of these images reasonable is important. However, discoverability, i.e., how to get noticed and grow the customer base, remains a challenge [6], especially in this competitive market. It might also be worth exploring markets other than iOS and Android. For example, although Windows Phone 7 has only a 1% market share, there is not a single social game available for it, and Windows Phone has momentum. With the "Mango" update due in fall 2011, Windows Phone will gain in-app purchase functionality. It is definitely a promising market because (1) there are few competitors and (2) players are eager for high-quality alternatives to pricey Xbox Live games.

5. Promotion, sale and income source
While playing, I have seen several promotion methods, such as limited-edition collections and a pop-up banner saying "You've earned an EXCLUSIVE 50% off zoo Bucks RIGHT NOW! 4.99 for 125". It was really tempting to make a purchase, because it is affordable and 125 zoo bucks can do a lot. Constant promotions should target all strata of players, e.g., exclusive animals for advanced users and discounted items for average users. As for income, reference [4] suggests that 40% of income comes from in-app purchases and the rest from advertising and other sources. I will discuss promotion and revenue further in the next section.
¹ http://www.rollercoastertycoon.com/us/index.php
² https://market.android.com/details?id=com.seventeenbullets.android.island
6. Social features
There is a small set of basic social functions, such as visiting others' zoos, leaving tips, adding friends and sharing pictures on Facebook. However, the social functions are still not enough. In my opinion, nothing is more rewarding than posting a snapshot of the zoo and receiving compliments like "This is so amazing!" from friends. So far, the only action that involves two parties is leaving tips. This function is good but limited. First, I looked everywhere for a shortcut to add someone who tipped me as a friend, but I ended up typing his/her name myself! Second, whether or not one leaves tips does not influence the game much (it perhaps only affects some zoo goals). I would suggest several options:
(a) Reduce the building cost of some unique animals and instead require players to ask a certain number of friends for help with the message "help with raising animals!". Friends can simply click "help out" at no extra cost;
(b) Reduce the building time if friends help with building;
(c) I believe a carefully designed zoo is a piece of artwork. The game should provide a leaderboard or a "daily star" that features some beautifully designed zoos. A zoo design contest with a zoo bucks reward is another option for getting more involvement. These carefully decorated zoos will not only increase in-game time but are also perfect things to show off to friends;
(d) A chat channel where users can share pictures, find and chat with friends, request help, communicate with game moderators, etc.
I have only listed several items that might be directly beneficial. There are many more ideas for social game design and operation, but most of them require formal data collection and analysis before they can be evaluated.
3 Data acquisition, analysis and discussion

3.1 Survey
The best way to evaluate user experience is through surveys: they are straightforward and cost-effective. Surveys are also the best way to assess factors that are subjective, unique to a cluster of players, and cannot be obtained through indirect research such as secondary market research. General survey methods include simple random sampling (SRS), stratified random sampling, ratio and regression estimation based on SRS, cluster sampling, etc. The choice of method should be based on the real data. Taking SRS as an example: before the actual sampling, randomly select several players (20-50, depending on the population) for a pilot study, and collect related information (explanatory variables) such as gender, age, race, social status, education, marital status, etc. The purpose of the pilot study is to (1) test the survey questions and (2) determine the sample size. In addition, pull other information (response variables) from the database, such as total time spent in the game, total money spent, number of logins per day, etc. After these steps, the formal survey can be carried out.

3.1.1 Sample survey questionnaire
A formal survey questionnaire involves a group effort and several test periods. There are several forms, such as web-based surveys, email surveys, online focus groups and traditional paper-based surveys. Here I make a simple survey to demonstrate the concept.
Welcome to this very important survey, which helps us researchers learn what you really want so that we can provide better service. Thank you for filling it out!

Your gender: ( ) Male ( ) Female
Your age: ______________________
Your income range? ( ) Prefer not to say ( ) No income ( ) <$10,000 ( ) $10,000-$30,000 ( ) >$30,000
How do you feel today? ( ) I am feeling great! ( ) Not really, because ____________
What is (are) your favorite zoo animal(s)? ( ) Dog ( ) Cat ( ) Parrot ( ) Pigeon ( ) Unicorn ( ) Aragog ( ) Others (Name your favorite animals! It may appear in the next update!) ____________
How do you like the graphics? horrible - - - - - awesome!
Do you have a lot of friends playing the game? no - - - - - - a lot
If there is one feature you want in the game, what is it? _______

3.1.2 Customer demographics
Based on the collected data, we can gain some insight into customers. For example, we can calculate the proportion of players who give a high rating, the proportion of each gender, etc. Figure 1 shows the players decomposed into age groups. I use a pie chart (a stacked bar chart would also work) for exploratory data analysis (EDA).
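The proportion estimates mentioned above, and the pilot-study sample size from Section 3.1, can be sketched with the standard SRS formulas: the estimated variance of a sample proportion with the finite population correction, $(1 - n/N)\hat p(1-\hat p)/(n-1)$, and the required sample size $n_0 = z^2 p(1-p)/e^2$, adjusted to $n = n_0/(1 + n_0/N)$. This is a minimal Python sketch assuming a normal approximation and a 95% level; the function names and the numbers in the usage lines are hypothetical, not survey results.

```python
import math
from statistics import NormalDist


def srs_proportion(successes, n, population, level=0.95):
    """Estimate a proportion from a simple random sample drawn without replacement.

    Returns (p_hat, standard_error, (lower, upper)). The variance estimate
    (1 - n/N) * p_hat * (1 - p_hat) / (n - 1) includes the finite population
    correction; the interval uses the normal approximation.
    """
    if not 2 <= n <= population or not 0 <= successes <= n:
        raise ValueError("need 2 <= n <= population and 0 <= successes <= n")
    p_hat = successes / n
    fpc = 1 - n / population
    se = math.sqrt(fpc * p_hat * (1 - p_hat) / (n - 1))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return p_hat, se, (max(0.0, p_hat - z * se), min(1.0, p_hat + z * se))


def srs_sample_size(p_pilot, margin, population, level=0.95):
    """Sample size whose interval half-width is about `margin`.

    p_pilot is the proportion seen in the pilot study (0.5 is the safe choice).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n0 = z * z * p_pilot * (1 - p_pilot) / margin ** 2
    return math.ceil(n0 / (1 + n0 / population))  # finite population correction


# Hypothetical usage: 60 of 100 sampled players rated the game 4 or 5.
print(srs_proportion(60, 100, population=10_000))
print(srs_sample_size(p_pilot=0.6, margin=0.03, population=10_000))
```

The same two functions cover the pilot step (plug the pilot proportion into `srs_sample_size`) and the formal survey (report `srs_proportion` for each category of interest).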
[Figure 1: The relationship between age group and user demographics; pie chart of the share of players in each age group: 1) 8-18, 2) 19-25, 3) 26-35, 4) 36-45, 5) 46-56, 6) 60+]

From this mock-up data, we can see that the major user base is between ages 8 and 35. Decision making should be based on this result:
1. RETAIN the current customer population by producing more animals that are favored by the 8-35 age group. For example, these young people may like more vivid animation, more involved breeding methods, etc. This information can be obtained through secondary market research or from open-ended or multiple-choice questions in the survey. We can send out questionnaires from time to time, asking players: "Which animal do you desperately want in your zoo?". After acquiring this information, we can produce new content based mainly on their preferences. With plenty of attractive content designed especially for them, these players will feel important and be more loyal to the game.
2. REACH other customers. Find out why there are fewer older players. Is it because there are fewer smartphone users in this group? This can be evaluated with statistical methods (e.g., a binomial test). If this is not the case, we need to find out what these customers prefer. Do older people favor tame animals like dogs and rabbits? This information can be extracted from a carefully designed survey. Then, without disrupting the existing customer base, we can deploy new animals and develop new ads to attract these customers.
Furthermore, to study monetization, we can survey players' spending and social status conditional on gender, and draw a stacked bar chart or pie chart as shown in Figure 2. From this mock-up data, we see that a higher percentage of wealthy males than of wealthy females spend more money in the game. So, even before formal statistical analysis, we get a rough idea that we should focus more on retaining this portion of players, who bring more profit.

3.1.3 Estimate and predict customer behavior using linear model
Through surveys, we acquire information that is suitable for fitting linear models, i.e., regression analysis and analysis of variance. Formal modeling is a crucial step before we can make any decisions based on data. For example, after the EDA in the previous section, we suspect that some factors are influential and want to formally model the total money spent by players. We consider including age and the rating of the game in the model. We can then fit an analysis of covariance model by crossing age ($x_i$, a continuous variable) and the rating of the game ($y_j$, a categorical variable from 1 to 5). The model to start with is:

$$z_{ij} = \mu + \beta_1 x_i + \beta_2 y_j + \beta_{12} (xy)_{ij} + \varepsilon_{ij}, \qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2), \quad i \in \{1, \dots, n\},\ j \in \{1, \dots, 5\},$$

where $z_{ij}$ is the total money spent; $\mu$ is the average effect; $\beta_1$ is the effect of age; $\beta_2$ is the effect of rating; $\beta_{12}$ is the interaction effect between age and rating; and $\varepsilon_{ij}$ is the random error.
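The model above is fitted by least squares. A minimal Python sketch of that computation follows; the appendix does it with R's `lm(Spend ~ Rating + Age)`, which, like Table 1, treats rating as a single numeric column and leaves out the interaction. The sketch solves the normal equations, estimates $\sigma^2$ from the residuals on $n - k$ degrees of freedom, and divides each estimate by its standard error to get the t value. The data are simulated under assumed values (spend rises by 9 per year of age, rating has no effect) and are not the document's mock data. A categorical rating would instead use four dummy columns, and the interaction would be one more column (rating × age).

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]


def ols(X, y):
    """Least-squares fit of y on the columns of X (each row starts with a 1).

    Returns (coefficients, standard errors, t values); the residual variance
    uses n - k degrees of freedom, as in an lm() summary.
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coef, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return coef, se, [c / s for c, s in zip(coef, se)]


# Hypothetical mock data: spend grows with age; rating has no true effect.
rng = random.Random(2011)
X, spend = [], []
for _ in range(200):
    age = rng.uniform(10, 50)
    rating = rng.randint(1, 5)
    X.append([1.0, rating, age])
    spend.append(-50 + 9 * age + rng.gauss(0, 30))

coef, se, t = ols(X, spend)
for name, c, s, tv in zip(["(Intercept)", "Rating", "Age"], coef, se, t):
    print(f"{name:12s} {c:10.4f} {s:10.4f} {tv:8.2f}")
```

The printed columns mirror the Estimate, Std. Error and t value columns of an R coefficient summary; p-values would additionally need the t distribution with $n - k$ degrees of freedom.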
[Figure 2: The relationship between social status and percentage of spend in the game [2] in two charting forms (stacked bar chart and pie chart), by gender (Female, Male); response levels: 1) Wealthy, 2) Less Wealthy, 3) Intermediate, 4) Somewhat poor, 5) Poorest]
[Figure 3: The relationship between age and spend in the game, with 95% confidence band; x-axis: Age, y-axis: Spend]

By fitting the mock-up data, we find that rating is not statistically significant (Table 1) but age is significant (p-value = 0.0074 < 0.05). The interaction term had already been removed because it was not significant.
               Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)    -47.8195      79.7220     -0.60     0.5706
Rating          -0.5382      23.0856     -0.02     0.9822
Age              8.8597       2.2319      3.97     0.0074

Table 1: The coefficient summary
So we include age in the model and use it to predict how much users spend in the game. Our model-based inference is that age is strongly associated with how much money users spend in SOME PRODUCT: older people tend to spend more than younger people (though they form a smaller user base), perhaps because they are wealthier (a correlation analysis of age and social status could test this). With this in mind, we can predict future revenue from players' ages, and we should work on growing this user base, from which most revenue comes.
This is just a simplified demonstration of basic regression analysis and model selection; regression analysis is a big family. There are various methods, and which one to use should depend on the real data and on industry practice. For example, log-linear models are widely used for multinomial data, say, when counting the number of players in each age group. A negative binomial model handles data with extra variance (overdispersion) that a standard Poisson generalized linear model cannot account for, say, when modeling the number of players per day. For model selection, systematic approaches based on the likelihood ratio test (LRT) and AIC can be used, together with principal component analysis (PCA) to reduce a large set of correlated predictors. We may also need nonlinear regression to model more complicated data, such as viral growth patterns. And if we have multiple response variables, multivariate regression is the appropriate method to account for the correlation between responses (e.g., when customer counts and revenue are both modeled with respect to age, social status, etc., and we believe the two are closely related).
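Before choosing between a Poisson GLM and a negative binomial model for counts such as the number of players per day, a quick screening step is to check whether the variance greatly exceeds the mean. The Python below is a minimal sketch of the classical index-of-dispersion test: under a Poisson model, $D = \sum_t (x_t - \bar x)^2 / \bar x$ is approximately $\chi^2_{n-1}$, and the upper-tail p-value here uses the Wilson-Hilferty normal approximation. It is not a substitute for fitting both models and comparing them (e.g., by AIC or an LRT as mentioned above), and the daily counts in the usage lines are made up.

```python
import math
from statistics import NormalDist, fmean


def dispersion_test(counts):
    """Index-of-dispersion test of a Poisson model for a list of counts.

    Returns (variance_to_mean_ratio, D, p_value). A small p-value means the
    counts vary more than a Poisson model allows (overdispersion), which
    points toward a negative binomial model instead.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    mean = fmean(counts)
    if mean <= 0:
        raise ValueError("mean count must be positive")
    d = sum((x - mean) ** 2 for x in counts) / mean   # index of dispersion
    k = n - 1                                          # chi-square degrees of freedom
    # Wilson-Hilferty: (D/k)^(1/3) is roughly normal with mean 1 - 2/(9k)
    z = ((d / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    p_value = 1 - NormalDist().cdf(z)                  # upper tail only
    return d / k, d, p_value


# Hypothetical daily counts of new players
daily_players = [120, 98, 310, 45, 150, 400, 87, 60, 220, 130]
ratio, d, p = dispersion_test(daily_players)
print(f"variance/mean = {ratio:.1f}, D = {d:.1f}, p = {p:.3g}")
```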
3.2 Real-time data analysis

While surveys can give game developers insight into historical performance and a reference for the future, the performance of the game should also be closely monitored in real time.

3.2.1 Data collection
Real-time data analysis should be driven by the reach, retention and revenue metrics [7]. Key metrics such as daily active users (DAU), daily revenue and daily time spent are the data to be modeled. To be more specific, we are interested in the following data:
1. User traffic;
2. User churn rates over time (conditional on whether the players are organic or were attracted by cross-platform advertisement);
3. Levels and capital of the users;
4. Number of animals in the zoo;
5. Number of animals in the inventory;
6. Number of players who purchase a certain animal (conditional on paying with zoo bucks or coins);
7. Whether a player is online today or not;
8. Number of players who use zoo bucks for "instant finish" or to cure sick animals;
9. Average number of times players launch the game;
10. Average time players spend in the game;
11. Number of players who make in-app purchases, and the average revenue per paying user (ARPPU);
12. Average revenue per user (ARPU);
13. Effective cost per thousand impressions (eCPM);
14. Number of zoo goals each user has achieved;
15. Amount of time it takes players to finish a zoo goal;
16. Number of friends;
17. Number of invitations sent;
18. Number of friends each user brings into the game;
19. Number of tips received and given;
20. Number of visitors;
21. Number of pictures shared on Facebook.

3.2.2 Purpose of collecting data
The purpose of collecting these data is to track performance in real time, guide decision making, and find a balance between REACH, RETENTION and REVENUE. The following examples show how they work:
1. High prices for in-app virtual items will discourage players from sharing the game with friends, and this is reflected in the "number of invitations sent" and the "number of friends each user brings into the game". However, giving away too many free items for sharing, without a daily quota per recipient, will discourage friends from trying the game because it feels spammy [4];
2. Promotions can certainly encourage in-app purchases, and by carefully designing the depth of the discount, it is possible to maximize revenue. However, overly frequent promotions will discourage regular purchases, because players will always wait for the next promotion. We need to closely track the "number of players who make in-app purchases and the revenue" to design promotions;
3. Monitor "whether a player is online today or not". If a noticeable number of players drop off, something is likely wrong and needs to be fixed, so that we can retain the existing customers;
4. The "number of zoo goals each user has achieved" and the "amount of time it takes players to finish a zoo goal" help screen out unpopular zoo goals and give insight into their difficulty;
5. eCPM is a crucial factor that summarizes the revenue from splash advertisements, promotions, etc. Together with user traffic and user churn rate, eCPM gives a whole picture of the game's performance;
6. The top 3%-10% of paying users by revenue indicate which customers we should put more emphasis on.

3.2.3 Data modeling
For each group of data, there are various modeling methods. Besides the linear models mentioned above, there are other well-established procedures, such as factor analysis, hypothesis tests (e.g., Mann-Whitney U, Wilcoxon signed-rank), contingency tables (e.g., the Mantel-Haenszel test), longitudinal studies (e.g., cohort analysis), event history analysis [8], time series, Bayesian inference, etc. As a demonstration, I will model data with a time series model and briefly discuss Bayesian inference for real-time data analysis.
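As a minimal sketch of the time-series steps used in this subsection (sample ACF, an ARMA(1,1) fit and multi-step forecasts), the Python below works on a simulated series with assumed parameters, since the Yahoo! Finance data from the appendix are not reproduced here. The model is $x_t - \mu = \phi(x_{t-1} - \mu) + e_t + \theta e_{t-1}$, the same sign convention as R's `arima`. Note one swap: the appendix calls `arima(..., order = c(1, 0, 1))`, which uses maximum likelihood, whereas this sketch estimates $\phi$ and $\theta$ by minimizing the conditional sum of squares over a coarse grid, which is simpler but less precise. The `forecast` function also shows why long-horizon ARMA predictions flatten out at the series mean, consistent with the nearly constant prediction reported later in this subsection.

```python
import random
from statistics import fmean


def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (r_0 is 1)."""
    n, m = len(x), fmean(x)
    c0 = sum((v - m) ** 2 for v in x) / n
    return [sum((x[t] - m) * (x[t + h] - m) for t in range(n - h)) / n / c0
            for h in range(max_lag + 1)]


def css(x, mu, phi, theta):
    """Conditional sum of squared ARMA(1,1) innovations, starting from e_0 = 0.

    Returns (sum_of_squares, last_innovation).
    """
    e_prev, total = 0.0, 0.0
    for t in range(1, len(x)):
        e = (x[t] - mu) - phi * (x[t - 1] - mu) - theta * e_prev
        total += e * e
        e_prev = e
    return total, e_prev


def fit_arma11(x):
    """Grid search over phi, theta in [-0.95, 0.95] (step 0.05); mu is the sample mean."""
    mu = fmean(x)
    grid = [i / 20 for i in range(-19, 20)]
    _, phi, theta = min((css(x, mu, p, q)[0], p, q) for p in grid for q in grid)
    return mu, phi, theta


def forecast(x, mu, phi, theta, horizon):
    """h-step forecasts: one step uses the last innovation, then decay toward mu."""
    _, e_last = css(x, mu, phi, theta)
    first = mu + phi * (x[-1] - mu) + theta * e_last
    return [mu + phi ** (h - 1) * (first - mu) for h in range(1, horizon + 1)]


def simulate_arma11(n, mu, phi, theta, sigma, seed):
    """Hypothetical test series from a known ARMA(1,1), after 200 burn-in steps."""
    rng = random.Random(seed)
    out, level, e_prev = [], 0.0, 0.0
    for _ in range(n + 200):
        e = rng.gauss(0, sigma)
        level = phi * level + e + theta * e_prev
        e_prev = e
        out.append(mu + level)
    return out[200:]


series = simulate_arma11(1000, mu=0.14, phi=0.6, theta=0.3, sigma=0.05, seed=7)
r = acf(series, 5)
mu_hat, phi_hat, theta_hat = fit_arma11(series)
fc = forecast(series, mu_hat, phi_hat, theta_hat, horizon=100)
print(f"phi={phi_hat:.2f} theta={theta_hat:.2f} 100-step forecast={fc[-1]:.3f} mean={mu_hat:.3f}")
```

A residual check in the spirit of the diagnostics below would recompute the innovations with the fitted values and confirm that their `acf` beyond lag 0 stays near zero.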
[Figure 4: Real-time data visualization; standardized user count and revenue plotted against time]

The mock-up time series data in Figure 4 come from Yahoo! Finance; the URL can be found in the appendix. Here I model the "User Count" series and predict the number of users in the future. There is some hint of seasonality (around Time 300, 490 and 600), but it is not very obvious. Otherwise, judging from the plot, there is no obvious trend. To decide which time series model to use, the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) are shown in Figure 5. The top two plots show evidence of an underlying autoregressive (AR) process and a hint of a moving average (MA) process. After fitting an ARMA(1,1) model, the residual ACF and PACF (Figure 5, bottom two panels) show no significant autocorrelation, i.e., the residuals look like white noise, so the model appears adequate. Based on the model, we can make predictions. For demonstration, I predict the next 100 days and plot them in Figure 6. The predicted value is around 0.138; a nearly constant value is expected, because multi-step forecasts from a stationary ARMA model decay toward the estimated series mean. If we transform this back to counts, we expect the average number of players in the following 100 days to be around 52,062,404. If we have a financial goal in mind, we can combine this with the average percentage of players who make in-app purchases to decide whether a promotion should be carried out and how deep the discount should be.

[Figure 5: Model building and diagnosis; top: sample ACF and sample PACF of the user count; bottom: residual ACF and residual PACF of the ARMA(1,1) fit; x-axis: Lag]

[Figure 6: Time-series-model-based prediction; standardized user count against time, original series and prediction]

Another approach to handling real-time data is a Bayesian one. Bayesian models treat parameters as random variables and allow prior information to enter the model. One advantage of a Bayesian model is that we can use all the information available in the market for model building; for instance, we can borrow data from similar games by competitors as prior information. Another advantage is that, once a Bayesian model is implemented, we can keep updating it by feeding new data into the
model while treating the posterior obtained from the previous data as the new prior. Bayesian models are also well suited to handling missing data. For example, suppose we design a study to track the relationship between the number of players who make purchases ($r_{ij}$), the county they live in ($j$), and the time they spend in the game ($t_{ij}$). We can build the model:

$$r_{ij} \sim \mathrm{Binomial}(n_{ij}, p_{ij}), \qquad \operatorname{logit}(p_{ij}) = \beta_0 + \beta_1 t_{ij} + \delta_j, \qquad \delta_j \sim N(\beta_2, \sigma^2),$$

by setting

$$\beta_0 \sim N(\mu_{0,n-1}, \sigma^2_{0,n-1}), \quad \beta_1 \sim N(\mu_{1,n-1}, \sigma^2_{1,n-1}), \quad \beta_2 \sim N(\mu_{2,n-1}, \sigma^2_{2,n-1}), \quad \sigma^2 \sim \text{Inv-Gamma}(a_{n-1}, b_{n-1}),$$

where the subscript $n-1$ denotes prior information carried over from the previous update, and:
$n_{ij}$: the number of players in the $i$th group in the $j$th county;
$\delta_j$: the deviation from the base mean for county $j$;
$\beta_0$: the overall base mean;
$\beta_1$: the effect of the time players spend in SOME PRODUCT;
$\beta_2$: the effect of the county;
$\sigma^2$: the variation in the county effect.
By fitting a Bayesian model, we obtain posterior estimates of all the parameters. Each day, we can update the model with the new group of players, so the results can be evaluated continuously and we have the chance to modify our strategy accordingly. For example, we could extend a promotion if we predict that more players will buy zoo bucks, expand accrual for advertising, use unbalanced randomization to favor better-performing campaigns or animals, or focus on players in the counties that make more purchases [3].
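The hierarchical logistic model above would normally be fitted with MCMC software. As a much simpler stand-in that shows only the sequential idea (today's posterior becomes tomorrow's prior), the Python sketch below tracks a single purchase probability with the conjugate Beta-Binomial model. County effects and play time are deliberately left out, and the prior borrowed from a competitor's game and the daily counts are hypothetical numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(a, b) belief about the probability that a player makes a purchase."""
    a: float = 1.0
    b: float = 1.0

    def update(self, buyers: int, players: int) -> "BetaBelief":
        """Conjugate update with one day's data; the result is the next day's prior."""
        if not 0 <= buyers <= players:
            raise ValueError("need 0 <= buyers <= players")
        return BetaBelief(self.a + buyers, self.b + (players - buyers))

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def variance(self) -> float:
        s = self.a + self.b
        return self.a * self.b / (s * s * (s + 1))


# Hypothetical prior from a similar game: about 30 buyers per 1000 players.
belief = BetaBelief(a=30, b=970)
for buyers, players in [(45, 1200), (52, 1300), (38, 1100)]:  # hypothetical days
    belief = belief.update(buyers, players)
    print(f"posterior mean {belief.mean:.4f}, sd {belief.variance ** 0.5:.4f}")
```

Because the update only adds counts, feeding the days one at a time gives the same posterior as feeding them all at once, which is what makes daily updating cheap.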
Summary
In this short essay, I listed some game features and made several personal suggestions. My overall impression is that the game has a very vivid presentation and adequate depth, and it captures what players want. Nevertheless, some features, such as the social features, are still not strong enough to stand out from competitors. As for quantitative data analysis, I suggested several potential data collection and processing methods that may be useful for evaluating the market, customer behavior and financial status. Further inference, and the exact types of plots and statistical methods, should be chosen based on the real data.
References
[1] The end of Moore's law: A love story, 2010. http://techcrunch.com/2010/08/23/the-end-of-moores-law-a-love-story/.
[2] Pie charts in ggplot2, 2010. http://www.r-chart.com/2010/07/pie-charts-in-ggplot2.html.
[3] Donald A. Berry. Bayesian clinical trials. Nat Rev Drug Discov, 5(1):27-36, 2006. doi:10.1038/nrd1927.
[4] Eric Eldon. Ngmoco shares how it is making successful mobile social games, 2009. http://www.insidesocialgames.com/2010/03/19/ngmoco-shares-how-it-is-making-successful-mobile-social-games/.
[5] Brian Hastings. Insomniac's blueprint for social games. http://www.develop-online.net/features/1151/Insomniacs-blueprint-for-social-games.
[6] Jon Jordan. GDC 2011: Data, revenue, reach and learning from failure: ngmoco's live game playbook, 2011. http://www.pocketgamer.biz/r/PG.Biz/ngmoco+news/news.asp?c=27893.
[7] Jason Lim. Zynga's GM Andy Tian gives advice for game developers, 2011. http://technode.com/2011/04/13/zynga%E2%80%99s-gm-andy-tian-gives-advice-for-game-developers/.
[8] Jeroen K. Vermunt and Guy Moors. Event history analysis. Handbook of Quantitative Methods in Psychology, 2009.
Appendix
Below is the source R code:
### R code from vignette source 'Q2.rnw'
### Encoding: ISO8859-1

###################################################
### code chunk number 1: Q2.rnw:67-68
###################################################
setwd('F:\\My Dropbox\\Personal\\Job Hunting\\Report\\tinyco\\Question2')
###################################################
### code chunk number 2: Q2.rnw:266-300
###################################################
library(ggplot2)
library(grid)    # unit(), viewport(), grid.layout()
library(sqldf)
maxFontSize = 10
FigWidth = 7.5   # cm
FigHeight = 5.5  # cm
inch = 2.54      # cm
# tweak the screen device to get the right plot size in cm (dev.new() is portable; windows() is Windows-only)
dev.new(width = FigWidth/inch, height = FigHeight/inch, pointsize = maxFontSize)
age = read.table('Mock Data/age group.txt', header = TRUE)
sumFreq = sqldf("select SUM(Freq) as 'SumFreq' from age")
age = sqldf("select Freq,
  CASE WHEN AgeGrp == 0 THEN '1) 8-18'
       WHEN AgeGrp == 1 THEN '2) 19-25'
       WHEN AgeGrp == 2 THEN '3) 26-35'
       WHEN AgeGrp == 3 THEN '4) 36-45'
       WHEN AgeGrp == 4 THEN '5) 46-56'
       WHEN AgeGrp == 5 THEN '6) 60+'
  END ageGrp
  from age join sumFreq")
# Standardization: turn frequencies into proportions
age$Freq = age$Freq / sumFreq$SumFreq
p = ggplot(data = age, aes(x = factor(1), y = Freq, fill = ageGrp))
p = p + geom_bar(stat = "identity", width = 1) +
  theme(plot.background = element_blank(), plot.margin = unit(c(1, 0, 0, 0.5), "lines"))
p = p + coord_polar(theta = "y")
p = p + xlab('') + ylab('') + labs(fill = 'Age Group')
print(p)
ggsave("pie_chart_ageGrp.pdf", scale = 1.5)
dev.off()
###################################################
### code chunk number 3: Q2.rnw:331-366
###################################################
df = read.table('Mock Data/pie chart.txt', header = TRUE)
df = sqldf("select Summary,
  CASE WHEN Gender == 1 THEN 'Female'
       WHEN Gender == 2 THEN 'Male'
  END gender,
  CASE WHEN Freq == 5 THEN '1) Wealthy'
       WHEN Freq == 4 THEN '2) Less Wealthy'
       WHEN Freq == 3 THEN '3) Intermediate'
       WHEN Freq == 2 THEN '4) Somewhat poor'
       WHEN Freq == 1 THEN '5) Poorest'
  END response
  from df")
maxFontSize = 10
FigWidth = 12.5  # cm
FigHeight = 5.5  # cm
inch = 2.54      # cm
dev.new(width = FigWidth/inch, height = FigWidth/inch, pointsize = maxFontSize)
# First plot the stacked bar chart, one panel per gender
p = ggplot(data = df, aes(x = factor(1), y = Summary, fill = factor(response)))
p = p + geom_bar(stat = "identity", width = 1) + theme(plot.background = element_blank())
p = p + facet_grid(. ~ gender)
p = p + xlab('') + ylab('') + labs(fill = 'Response') + theme(axis.text.x = element_blank())
pushViewport(viewport(layout = grid.layout(2, 1)))
vp1 <- viewport(width = 1, height = 0.5, just = c("bottom"))
pdf("pie_chart.pdf")
print(p, vp = vp1)
# Second plot the pie chart
p = p + coord_polar(theta = "y")
vp2 <- viewport(width = 1, height = 0.5, just = c("top"))
print(p, vp = vp2)
#ggsave("pie_chart.pdf", scale = 1.5)
dev.off()
###################################################
### code chunk number 4: Q2.rnw:402-411
###################################################
dev.new(width = FigWidth/inch, height = FigHeight/inch, pointsize = maxFontSize)
age_status = read.table('Mock Data/age_status.txt', header = TRUE)
# Additive model: rating enters as a numeric column, interaction already dropped
model = lm(Spend ~ Rating + Age, data = age_status)
summary(model)
p = ggplot(data = age_status, aes(x = Age, y = Spend))
p = p + stat_smooth(method = "lm") + geom_point()
print(p)
ggsave("regress.pdf", scale = 1.5)
dev.off()
###################################################
### code chunk number 5: Q2.rnw:420-423
###################################################
library(xtable)
xtable(summary(model), caption = "The coefficient summary", label = 'coefsummary')
###################################################
### code chunk number 6: Q2.rnw:523-539
###################################################
# NOTE: the URL in the next line is cut off in the source document, so the call is incomplete as shown.
stockdata = read.csv('http://chart.yahoo.com/table.csv?s=C&a=0&b=01&c=2009&d=6&e=10&f=2011&
# standardize data (divide by the maximum)
stockdata$std.Volume = stockdata$Volume / max(stockdata$Volume)
stockdata$std.Adj.Close = stockdata$Adj.Close / max(stockdata$Adj.Close)
# modify device window size
dev.new(width = FigWidth/inch, height = FigHeight/inch, pointsize = maxFontSize)
p = ggplot(data = stockdata)
p = p + geom_line(aes(x = seq_along(std.Adj.Close), y = std.Volume, colour = 'User Count'))
p = p + geom_line(aes(x = seq_along(std.Adj.Close), y = std.Adj.Close, colour = 'Revenue'))
p = p + xlab('Time') + ylab('Standardized user count and revenue') + labs(colour = '')
print(p)
ggsave("real_time.pdf", scale = 1.5)
dev.off()
###################################################
### code chunk number 7: Q2.rnw:561-576
###################################################
pdf('timeseries.pdf', width = FigWidth*1.5/inch, height = FigWidth*1.5/inch)
par(mfrow = c(2, 2))
# Sample ACF and PACF
acf(stockdata$std.Volume, main = "", ylim = c(-0.2, 1), ylab = "sample ACF")
pacf(stockdata$std.Volume, main = "", ylim = c(-0.2, 1), ylab = "sample PACF")
model = arima(stockdata$std.Volume, order = c(1, 0, 1),
              seasonal = list(order = c(0, 0, 0), period = 100))
model
model.resid = resid(model)
# Residual ACF and PACF
acf(model.resid, main = "", ylim = c(-0.2, 1), ylab = "residual ACF")
pacf(model.resid, main = "", ylim = c(-0.2, 1), ylab = "residual PACF")
dev.off()
###################################################
### code chunk number 8: Q2.rnw:592-605
###################################################
dev.new(width = FigWidth/inch, height = FigHeight/inch, pointsize = maxFontSize)
predict.stockdata = data.frame(count = stockdata$std.Volume, indicator = factor("Original"))
spred = predict(model, n.ahead = 100)
predict.stockdata = rbind(predict.stockdata,
                          data.frame(count = as.numeric(spred$pred), indicator = factor("Prediction")))
p = ggplot(data = predict.stockdata, aes(x = seq_along(count), y = count, colour = indicator))
p = p + geom_line() + xlab("Time") + ylab("Standardized user count") + labs(colour = '')
print(p)
ggsave("prediction.pdf", scale = 1.5)
dev.off()
###################################################
### code chunk number 9: Q2.rnw:670-671
###################################################
Stangle('Q2.rnw')

###################################################
### code chunk number 10: Q2.rnw:674-676
###################################################
x = readLines('Q2.R')
cat(x, sep = "\n")