Rookie Stock Prediction Cross Validation using

Graeber · Sep-17-2018, 08:24 PM

Hey all,

I am very new to this but am trying to write a stock prediction tool. I know there are tons of examples, but I haven't found many that do it the way I'm trying to.

I am using pandas_datareader to read in data from Yahoo. As many of you know, I'm sure, the data contains the following columns:

High
Low
Open
Close
Volume
Adj Close
Date

I need to perform cross-validation in which the date ranges stay in order (not shuffled, since it's time-series data), but I cannot seem to get it properly defined. I was trying to use TimeSeriesSplit. Does anyone have examples of this type of validation in use, or can you point me to something other than the sklearn documentation, which seems very vague?

Thanks in advance

Date        High         Low          Open         Close        Volume     Adj Close
2018-09-11  1178.680054  1156.239990  1161.630005  1177.359985  1209300.0  1177.359985
2018-09-12  1178.609985  1158.359985  1172.719971  1162.819946  1295500.0  1162.819946
2018-09-13  1178.609985  1162.849976  1170.739990  1175.329956  1431200.0  1175.329956
2018-09-14  1180.425049  1168.329956  1179.099976  1172.530029  939400.0   1172.530029
2018-09-17  1177.020020  1154.030029  1170.140015  1166.318970  649608.0   1166.318970

**Larz60+** · Sep-17-2018, 08:46 PM

Show your code, and/or current results or the full, untouched error traceback.

Graeber · (This post was last modified: Sep-17-2018, 11:02 PM by Larz60+.)

import pandas_datareader.data as pdr

stock='GOOG'
df=pdr.DataReader(stock, 'yahoo')
df.tail()
SZ=len(df.index)

The sklearn documentation shows this example for TimeSeriesSplit:
>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> tscv = TimeSeriesSplit(n_splits=3)
>>> print(tscv)  
TimeSeriesSplit(max_train_size=None, n_splits=3)
>>> for train_index, test_index in tscv.split(X):
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [0] TEST: [1]
TRAIN: [0 1] TEST: [2]
TRAIN: [0 1 2] TEST: [3]

But I cannot for the life of me determine what should be X and what should be y for the data I am reading into a DataFrame.
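
One possible way to pick X and y, sketched here on made-up data so it runs without a network call: treat the price and volume columns as the features and the close some number of rows ahead as the target. The `horizon` value, the choice of feature columns, and the synthetic prices are all assumptions for illustration, not something the thread or sklearn prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-in for the Yahoo data (hypothetical values, same column names).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2018-01-01", periods=60)
close = 1000 + rng.normal(0, 5, size=60).cumsum()
df = pd.DataFrame(
    {
        "High": close + 5,
        "Low": close - 5,
        "Open": close + rng.normal(0, 1, size=60),
        "Close": close,
        "Volume": rng.integers(500_000, 1_500_000, size=60).astype(float),
        "Adj Close": close,
    },
    index=dates,
)

horizon = 1  # assumption: predict the close one trading day ahead
features = df[["High", "Low", "Open", "Close", "Volume"]]
target = df["Close"].shift(-horizon)

# The last `horizon` rows have no future close yet, so drop them from both.
X = features.iloc[:-horizon].to_numpy()
y = target.iloc[:-horizon].to_numpy()

tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X))
for train_index, test_index in splits:
    # Every training window ends before its test window begins.
    print(
        "TRAIN:", df.index[train_index[0]].date(), "to", df.index[train_index[-1]].date(),
        "| TEST:", df.index[test_index[0]].date(), "to", df.index[test_index[-1]].date(),
    )
```

Because the rows are already sorted by date, TimeSeriesSplit only ever tests on rows that come after the rows it trained on, which is the ordering the question asks for.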

Graeber · (This post was last modified: Sep-17-2018, 11:03 PM by Larz60+.)

I think I have it.

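import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import TimeSeriesSplit

forecast_out = 7  # number of rows (trading days) ahead to predict
# Target column: the close forecast_out rows in the future
df['Predict'] = df['Close'].shift(-forecast_out)
#print (df.tail())
X = np.array(df.drop(columns=['Predict']))
X = preprocessing.scale(X)
X_forecast = X[-forecast_out:]  # most recent rows, which have no known target yet
X = X[:-forecast_out]
y = np.array(df['Predict'])
y = y[:-forecast_out]  # drop the NaN targets left at the end by the shift

tscv = TimeSeriesSplit(n_splits=10)
for train_index, test_index in tscv.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]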
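
The loop above only builds the folds; a model still has to be fit and scored on each one. A minimal sketch of that step follows, assuming a plain LinearRegression (the thread does not name a model) and synthetic X and y in place of the real data. It also moves the scaling into a pipeline, so the scaler is fit on each training fold only. Calling preprocessing.scale on the whole array first would use the mean and standard deviation of rows that later land in the test folds.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, time-ordered data standing in for the real X and y (hypothetical).
rng = np.random.default_rng(42)
n_samples = 200
X = rng.normal(size=(n_samples, 5)).cumsum(axis=0)
y = X @ np.array([0.5, -0.2, 0.1, 0.3, 0.0]) + rng.normal(scale=0.1, size=n_samples)

tscv = TimeSeriesSplit(n_splits=10)
scores = []
for train_index, test_index in tscv.split(X):
    # A fresh pipeline per fold: the scaler only sees the training rows.
    model = make_pipeline(StandardScaler(), LinearRegression())
    model.fit(X[train_index], y[train_index])
    # score() returns R^2 on the later, held-out rows.
    scores.append(model.score(X[test_index], y[test_index]))

print("R^2 per fold:", np.round(scores, 3))
print("mean R^2:", np.mean(scores))
```

With real price data the per-fold scores will usually vary a lot more than on this toy data, so it is worth looking at each fold rather than only the mean.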
