
Posted by u/ysharm10 5 months ago


[Q] What does it mean when adf and kpss shows contradictory results?
Hello! I am working with time series data. When I run the ADF test, it implies that the data is not
stationary, but KPSS says it's stationary. I've done my research before posting here, but it didn't
really help me decide which model to go for. According to what I read, ADF presumes that at least
one round of differencing is required until it finds clear reason to reject that hypothesis,
whereas KPSS assumes the series is stationary until it finds strong reason to reject that
hypothesis. Auto ARIMA picks a (5,0,0) model, since by default it uses KPSS for this test. However,
if I difference once, the best model is (0,1,0), which is a random walk model, and I don't know how
useful that is. In such a situation, how do you know which test is meant for your data?
Thanks!

Edit: Every test other than KPSS indicates the data might be non-stationary.

Time series and acf plot:


http://imgur.com/a/qd2jPa3

machine-learning-bro 5 months ago


I cannot comment on your exact question. But what I can say from experience is: don't spend too
much time on this aspect other than for your own edification and intellectual satisfaction. It
doesn't seem like your end goal is to prove stationary vs. non-stationary; you said it's for
selecting the best model to use. It is interesting that the two tests are giving contradictory
results.

But at the end of the day, you want to pick a model based on your best judgement or an educated
guess (all the other tests seem to indicate non-stationary) and then kind of just run with it (and
keep these findings in your back pocket in case they actually prove a point you didn't realize till
later). I work with a lot of time series data too, but I work in a clinical context. I'm biased in
that I always judge something by whether it has utility. I've spent too much time in my PhD
fiddling with data to make sure it follows assumptions, when at the end of the day I just have to
see how the model performs on predicting patient outcomes. Lol sorry, I'll get off my soapbox.
Interesting question!


ysharm10 5 months ago
Thanks for commenting! I'll definitely take this advice into consideration, especially since it's
coming from someone like you who has worked with time series models. Also, can a (0,1,0) model
ever be put to use on real data? I'm sorry, I don't have much experience with random walk models.
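
For context on the (0,1,0) question: an ARIMA(0,1,0) model without drift says each value is the
previous value plus noise, so its point forecast at every horizon is simply the last observation,
and the forecast variance grows linearly with the horizon. Below is a minimal sketch in plain
Python, assuming no drift term and a normal approximation for the interval. The function name and
the sample numbers are made up for illustration.

```python
import math


def random_walk_forecast(series, horizon, z=1.96):
    """Forecast an ARIMA(0,1,0) model without drift.

    Returns a list of (point, lower, upper) tuples for steps 1..horizon.
    z=1.96 gives an approximate 95% interval under normal errors.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    # First differences are the model's estimated innovations.
    diffs = [b - a for a, b in zip(series, series[1:])]
    # No-drift model: innovations have mean zero, so use the mean square.
    sigma = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    last = series[-1]
    out = []
    for h in range(1, horizon + 1):
        half_width = z * sigma * math.sqrt(h)  # variance grows like h
        out.append((last, last - half_width, last + half_width))
    return out


weekly = [102.0, 104.5, 103.0, 107.0, 106.0, 109.5]  # hypothetical data
for step, (point, lo, hi) in enumerate(random_walk_forecast(weekly, 3), 1):
    print(f"h={step}: {point:.1f} [{lo:.1f}, {hi:.1f}]")
```

This is the "naive" forecast, which is commonly used as a baseline that other models have to beat.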


efrique 5 months ago · edited 5 months ago


If I understand your post correctly, both tests failed to reject the null.

A failure to reject the null doesn't mean the null is true (which is how you seemed to be framing
your discussion of both tests). It means you failed to find enough evidence to reject it.

This would be very common if the series are short.
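
To see why failing to reject is common with short series, here is a rough simulation sketch using
only the standard library. It applies the simplest Dickey-Fuller regression (constant, no lagged
differences) to simulated AR(1) series of 60 points and counts how often the unit-root null is
rejected. The critical value -2.93 is the tabulated 5% value for a sample of about 50, used here
as an approximation. The AR coefficients, replication count, and function names are assumptions
chosen for illustration.

```python
import math
import random


def df_t_stat(y):
    """t-statistic on rho in the regression dy_t = a + rho * y_{t-1} + e_t."""
    x = y[:-1]
    dy = [b - a for a, b in zip(y, y[1:])]
    n = len(x)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (di - md) for xi, di in zip(x, dy))
    rho = sxy / sxx
    a = md - rho * mx
    rss = sum((di - a - rho * xi) ** 2 for xi, di in zip(x, dy))
    se = math.sqrt(rss / (n - 2) / sxx)  # OLS standard error of rho
    return rho / se


def rejection_rate(phi, n=60, reps=500, crit=-2.93, seed=0):
    """Share of simulated AR(1) series where the unit-root null is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y = [0.0]
        for _ in range(n - 1):
            y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
        if df_t_stat(y) < crit:
            rejections += 1
    return rejections / reps


# phi = 1.0 is a true random walk; 0.95 is stationary but close to a unit root.
for phi in (1.0, 0.95, 0.5):
    print(f"phi={phi}: rejected in {rejection_rate(phi):.0%} of runs")
```

The thing to compare is the rejection rate for phi = 0.95 against the rate for phi = 0.5: with 60
points, a stationary but persistent series is hard for the test to tell apart from a random walk.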


ysharm10 5 months ago

Yes, both tests failed to reject their null hypotheses. My data is weekly and I have 60 weeks of
data, so yeah, it's not that long. What do you suggest should be the approach in such situations?

efrique 5 months ago · edited 5 months ago
I'd normally try to avoid testing to identify models at all. Among many other issues, it simply
answers the wrong question.

Usually I start with thinking about/reading about/researching the nature of my variables very
carefully. If I felt I had to test for some reason, I'd try to avoid testing the specific data I
needed a model for, and instead test other, closely related data (e.g. the same variable in a
different time span, similar/closely related variables, etc.).

Note that if you're choosing between models by testing on the data you're modelling, that
interferes with the assumptions on which any subsequent inference is based -- tests, confidence
intervals, prediction intervals and so on; they no longer have their nominal properties.

[In short, much of the work in econometrics and finance is based on what I see as poor
practice. Definitely don't get me started on Jarque-Bera tests, we could be here a long time
picking those apart.]

ysharm10 5 months ago

I understand your point and I'll definitely keep it in mind. But for something like
univariate time series data, does it make sense to analyze data other than the series
you want to model? Every time series will be different. Sorry if I didn't get your point.

efrique 5 months ago · edited 5 months ago

"Every time series will be different"

1. Your sample's already too small to do model selection on.

2. If you want to use the data to select the model without screwing with the
properties of your inference, you would need to split your data into a subset for
model selection and the remainder for inference. Since your sample size was so
small I figured you'd prefer to consider approaches that wouldn't cost you most
of your data.

3. You seem to be hung up on getting "the right" model rather than a useful model.
This is a hopeless task, even more so with small samples.
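
Point 2 can be sketched as a time-ordered split: the earlier block is used only to pick the
model, the later block only for fitting and inference. This is a minimal illustration. The 50/50
fraction, the function name, and the placeholder data are assumptions, not a recommendation made
in this thread.

```python
def split_for_selection(series, selection_fraction=0.5):
    """Split a time-ordered series into (selection, inference) blocks.

    No shuffling: time order is preserved so each block is contiguous.
    """
    if not 0.0 < selection_fraction < 1.0:
        raise ValueError("selection_fraction must be strictly between 0 and 1")
    cut = int(len(series) * selection_fraction)
    if cut == 0 or cut == len(series):
        raise ValueError("series too short to split at this fraction")
    return series[:cut], series[cut:]


weekly = list(range(60))  # placeholder for 60 weekly observations
selection, inference = split_for_selection(weekly)
print(len(selection), len(inference))
```

With 60 weeks this leaves only 30 points in each block, which is exactly the cost described in
point 2.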


azatryt 5 months ago
Maybe you should consider repeating the ADF test with different lags (the "right" number should
be suggested by theory or previous literature), or maybe including drift/trend options depending
on your data. I don't know what software you are using, but Stata has options for all of these
features; maybe give the user manual page for that command a read. Also, the Dickey-Fuller test
has low statistical power for near-unit-root processes.


ysharm10 5 months ago


Got it! I'll try that. Thanks!


badge 5 months ago


I don’t have any good ideas, but just to check—I assume you know that the null hypotheses of the
ADF and KPSS tests are opposite (so to speak)?
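
Because the two nulls point in opposite directions, the four possible outcomes are easiest to
read side by side. Here is a small sketch in plain Python, assuming a 5% level. The function name,
the wording of the messages, and the example p-values are illustrative, and a non-rejection is
reported as "no evidence against" the null, not as confirmation of it.

```python
def read_adf_kpss(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Describe a pair of ADF / KPSS p-values without over-claiming.

    ADF null: the series has a unit root (non-stationary).
    KPSS null: the series is (level- or trend-) stationary.
    """
    adf_rejects = adf_pvalue < alpha
    kpss_rejects = kpss_pvalue < alpha
    if adf_rejects and not kpss_rejects:
        return "evidence against a unit root; none against stationarity"
    if kpss_rejects and not adf_rejects:
        return "evidence against stationarity; none against a unit root"
    if adf_rejects and kpss_rejects:
        return "both nulls rejected; neither simple description fits well"
    return "neither null rejected; the data are uninformative at this level"


# Hypothetical p-values matching the case where both tests fail to reject.
print(read_adf_kpss(adf_pvalue=0.30, kpss_pvalue=0.10))
```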


ysharm10 5 months ago

Yes, I know. I kept that in mind before concluding