Python Forum
Rookie Stock Prediction Cross Validation using
#1
Hey all,

I am very new to this but am trying to write a stock prediction tool. I know there are tons of examples, but I haven't found many that take the approach I'm trying.

I am using pandas_datareader to read in data from Yahoo. As I'm sure many of you know, the data contains the following columns:

High
Low
Open
Close
Volume
Adj Close
Date


I need to perform cross-validation where the date ranges stay in order (not shuffled, since this is time-series data), but I can't seem to set it up properly. I was trying to use TimeSeriesSplit. Does anyone have examples of this type of validation in use, or can you point me somewhere other than the sklearn documentation, which seems very vague?
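For what it's worth, the key property of `TimeSeriesSplit` is that it never shuffles: it splits by row position, so each test fold always comes after its training fold. A minimal sketch on a made-up business-day index (the dates and the `feature` column are illustrative only, not real prices):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Illustrative data: 12 business days with a dummy feature column
dates = pd.bdate_range("2018-09-03", periods=12)
df = pd.DataFrame({"feature": np.arange(12.0)}, index=dates)

# TimeSeriesSplit only looks at row positions, so rows must be in date order first
df = df.sort_index()

tscv = TimeSeriesSplit(n_splits=3)
folds = []
for train_idx, test_idx in tscv.split(df):
    train_dates = df.index[train_idx]
    test_dates = df.index[test_idx]
    # Every training date is earlier than every test date
    assert train_dates.max() < test_dates.min()
    folds.append((train_dates, test_dates))
    print(f"train {train_dates[0].date()}..{train_dates[-1].date()}  "
          f"test {test_dates[0].date()}..{test_dates[-1].date()}")
```

The training window grows with each split (an "expanding window"), while the test window is always the next block of days.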

Thanks in advance





Date        High         Low          Open         Close        Volume     Adj Close
2018-09-11  1178.680054  1156.239990  1161.630005  1177.359985  1209300.0  1177.359985
2018-09-12  1178.609985  1158.359985  1172.719971  1162.819946  1295500.0  1162.819946
2018-09-13  1178.609985  1162.849976  1170.739990  1175.329956  1431200.0  1175.329956
2018-09-14  1180.425049  1168.329956  1179.099976  1172.530029   939400.0  1172.530029
2018-09-17  1177.020020  1154.030029  1170.140015  1166.318970   649608.0  1166.318970
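Since `TimeSeriesSplit` splits purely by row position, the main thing to check is that the rows run oldest to newest. A minimal sketch using the five sample rows above (typed in by hand), with `n_splits=2` chosen only because there are so few rows:

```python
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# The five sample rows from the post, typed in by hand
df = pd.DataFrame(
    {
        "High": [1178.680054, 1178.609985, 1178.609985, 1180.425049, 1177.020020],
        "Low": [1156.239990, 1158.359985, 1162.849976, 1168.329956, 1154.030029],
        "Open": [1161.630005, 1172.719971, 1170.739990, 1179.099976, 1170.140015],
        "Close": [1177.359985, 1162.819946, 1175.329956, 1172.530029, 1166.318970],
        "Volume": [1209300.0, 1295500.0, 1431200.0, 939400.0, 649608.0],
        "Adj Close": [1177.359985, 1162.819946, 1175.329956, 1172.530029, 1166.318970],
    },
    index=pd.to_datetime(
        ["2018-09-11", "2018-09-12", "2018-09-13", "2018-09-14", "2018-09-17"]
    ),
)
df.index.name = "Date"

# Splits are positional, so confirm the rows are sorted by date
assert df.index.is_monotonic_increasing

splits = list(TimeSeriesSplit(n_splits=2).split(df))
for train_idx, test_idx in splits:
    print("TRAIN:", list(df.index[train_idx].date), "TEST:", list(df.index[test_idx].date))
```

If the index were not sorted, `df = df.sort_index()` before splitting would put it in order.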
#2
Show your code, and/or your current results or the full, untouched error traceback.
#3
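import pandas_datareader.data as pdr

stock = 'GOOG'
# Download the daily price history for the ticker from Yahoo
df = pdr.DataReader(stock, 'yahoo')
df.tail()
# Number of rows (trading days) in the data
SZ = len(df.index)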

The sklearn documentation shows this example for TimeSeriesSplit:
>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> tscv = TimeSeriesSplit(n_splits=3)
>>> print(tscv)  
TimeSeriesSplit(max_train_size=None, n_splits=3)
>>> for train_index, test_index in tscv.split(X):
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [0] TEST: [1]
TRAIN: [0 1] TEST: [2]
TRAIN: [0 1 2] TEST: [3]
But I cannot for the life of me work out what X and y should be for the data I am reading into a DataFrame.
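In scikit-learn terms, X is a 2-D array with one row per trading day and one column per input feature, and y is a 1-D array holding the value to predict for each of those rows; `tscv.split(X)` itself only uses the number of rows. One common setup (an assumption here, not the only option) is to use each day's High, Low, Open and Volume as features and the next day's Close as the target. A sketch on a small synthetic frame standing in for the DataReader output:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-in for the DataReader frame (column names as in the post, values made up)
dates = pd.bdate_range("2018-01-01", periods=20)
close = 100 + np.arange(20.0)
df = pd.DataFrame(
    {"High": close + 1, "Low": close - 1, "Open": close - 0.5,
     "Close": close, "Volume": np.full(20, 1e6)},
    index=dates,
)

# Target: the next trading day's Close; the last row has no next day, so drop it
df["Target"] = df["Close"].shift(-1)
df = df.dropna()

X = df[["High", "Low", "Open", "Volume"]].to_numpy()  # shape (n_days, n_features)
y = df["Target"].to_numpy()                           # shape (n_days,)

for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    print(X_train.shape, X_test.shape)
```

The important part is that X and y have the same number of rows and row i of y is the answer for row i of X.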

#4
I think I have it.
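import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import TimeSeriesSplit

forecast_out = 7  # number of days ahead to predict
# Label: the Close price forecast_out days in the future
df['Predict'] = df['Close'].shift(-forecast_out)
#print(df.tail())
# Features: every column except the label
X = np.array(df.drop(columns=['Predict']))
# Note: scaling the full array lets later rows influence how earlier rows are scaled
X = preprocessing.scale(X)
# The last forecast_out rows have no label yet; keep them aside for forecasting
X_forecast = X[-forecast_out:]
X = X[:-forecast_out]
y = np.array(df['Predict'])
y = y[:-forecast_out]

# Each split trains on earlier rows and tests on the rows right after them
tscv = TimeSeriesSplit(n_splits=10)
for train_index, test_index in tscv.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]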
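Once the loop produces the train/test arrays, the usual next step is to fit and score a model on each fold. A minimal sketch, assuming `LinearRegression` as a stand-in model (the thread doesn't say which model is used) and synthetic data in place of the downloaded frame. It also fits the scaler on each training fold only, so the test rows never influence the scaling:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and the shifted Close target
rng = np.random.default_rng(0)
X = np.cumsum(rng.normal(size=(200, 5)), axis=0)
y = X @ np.array([0.5, -0.2, 0.1, 0.0, 0.3]) + rng.normal(scale=0.1, size=200)

tscv = TimeSeriesSplit(n_splits=10)
scores = []
for train_index, test_index in tscv.split(X):
    # The pipeline fits the scaler on the training fold only, then applies it to the test fold
    model = make_pipeline(StandardScaler(), LinearRegression())
    model.fit(X[train_index], y[train_index])
    preds = model.predict(X[test_index])
    scores.append(mean_absolute_error(y[test_index], preds))

print("MAE per fold:", np.round(scores, 3))
print("mean MAE:", np.mean(scores))
```

Averaging the per-fold errors gives a single number for comparing models, while each fold still respects the time order.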