Scikit-learn 2. Hands-on python scikit-learn: RandomForest.
To read about what Random Forest is, go to it-tuff.blogspot.com/machine-learning-1
If you can't remember where the scikit-learn models, metrics, etc. are located:
- find the sklearn installation path:
    locate sklearn | grep utils | cut -d"/" -f 1-6 | uniq
- cd to the found directory, for example:
    cd /usr/lib64/python2.7/site-packages/sklearn
- view the packages (subdirectories):
    ll | grep ^d
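The same lookup can be done from Python itself, which avoids depending on the locate database. This is a minimal sketch using only the standard library; the helper names package_dir and list_subpackages are made up for illustration and are not part of scikit-learn.

```python
import importlib.util
import pkgutil


def package_dir(name):
    """Return the install directory of a top-level package, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def list_subpackages(name):
    """Return the sorted names of the subpackages (directories) inside a package."""
    path = package_dir(name)
    if path is None:
        return []
    return sorted(m.name for m in pkgutil.iter_modules([path]) if m.ispkg)


# Prints the sklearn directory and its subpackages (ensemble, metrics, tree, ...),
# or None and an empty list if scikit-learn is not installed.
print(package_dir("sklearn"))
print(list_subpackages("sklearn"))
```

Here find_spec only locates the package without importing it, so the check is cheap even for a large library.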
>>> import pandas as pd
>>> test_file_path = "~/test.csv"
>>> test_data = pd.read_csv(test_file_path)
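>>> # check whether the data has NaN values (count per column)
>>> test_data.isnull().sum()
>>> # drop the rows that contain NaN values
>>> test_data = test_data.dropna(axis=0)
>>> # list the column names to choose features from
>>> test_data.columns.values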
>>> test_data_features = ['Rooms','Floors','Area']
>>> X = test_data[test_data_features]
>>> y = test_data.Price
>>> from sklearn.model_selection import train_test_split
>>> # split X and y in a single call so that the rows stay aligned
>>> train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
>>> from sklearn.ensemble import RandomForestRegressor
>>> # rfr stands for RandomForestRegressor (default n_estimators: 10 trees before scikit-learn 0.22, 100 since)
>>> test_rfr_model = RandomForestRegressor(random_state=1)
>>> test_rfr_model.fit(train_X, train_y)
>>> test_rfr_preds = test_rfr_model.predict(val_X)
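Because the default number of trees depends on the installed scikit-learn version, it can help to set n_estimators explicitly. The following is a minimal sketch that assumes synthetic data from make_regression in place of test.csv. The demo_* names and the data are illustrative only and do not come from the original example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the features and target used above (assumption).
demo_X, demo_y = make_regression(
    n_samples=200, n_features=3, noise=10.0, random_state=0
)

# Set n_estimators explicitly so the forest size does not depend on the version default.
demo_rfr_model = RandomForestRegressor(n_estimators=10, random_state=1)
demo_rfr_model.fit(demo_X, demo_y)

# estimators_ holds the fitted decision trees that make up the forest.
n_trees = len(demo_rfr_model.estimators_)
print(n_trees)
```

More trees usually give more stable predictions at the cost of longer training time.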
>>> from sklearn.metrics import mean_absolute_error
>>> mean_absolute_error(val_y, test_rfr_preds)
40.0
As you can see, even with default values Random Forest gives a better result: an MAE of 40.0, versus 150.0 for the CART decision tree in it-tuff.blogspot.com/scikit-learn-1.
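To reproduce this kind of comparison without the original test.csv, both models can be fitted on the same split and scored with the same metric. This is a rough sketch under stated assumptions: the data is synthetic (make_regression), the helper compare_mae is hypothetical, and the numbers it prints will not match the 40.0 and 150.0 reported here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_mae(random_state=0):
    """Fit a decision tree and a random forest on one split and return both MAEs."""
    # Synthetic regression data standing in for the house-price table (assumption).
    X, y = make_regression(
        n_samples=500, n_features=3, noise=20.0, random_state=random_state
    )
    # One call splits features and target together, keeping the rows aligned.
    train_X, val_X, train_y, val_y = train_test_split(
        X, y, random_state=random_state
    )

    models = {
        "decision_tree": DecisionTreeRegressor(random_state=1),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=1),
    }
    results = {}
    for name, model in models.items():
        model.fit(train_X, train_y)
        results[name] = mean_absolute_error(val_y, model.predict(val_X))
    return results


maes = compare_mae()
for name, mae in maes.items():
    print(name, round(mae, 2))
```

On noisy data the forest usually, though not always, ends up with the lower MAE, because averaging many trees reduces the variance of a single fully grown tree.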