Tuesday, August 14, 2018

Scikit-learn 2. Hands-on python scikit-learn: RandomForest.

To read about what Random Forest is, go to it-tuff.blogspot.com/machine-learning-1

If you can't remember where the scikit-learn models, metrics, etc. are located:

  1. locate sklearn | grep utils | cut -d"/" -f 1-6 | uniq
  2. cd to the found directory:
    1. cd /usr/lib64/python2.7/site-packages/sklearn
  3. to view packages:
    1. ll | grep ^d
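
The same lookup can also be done from Python itself, which avoids depending on locate or on a fixed install path. This is only a rough sketch: it assumes Python 3.6 or newer (the path above is from a Python 2.7 install), and the helper names package_dir and list_subpackages are made up for this example.

```python
import importlib
import importlib.util
import os
import pkgutil


def package_dir(package_name):
    """Return the directory a package was imported from."""
    package = importlib.import_module(package_name)
    return os.path.dirname(package.__file__)


def list_subpackages(package_name):
    """Return the sorted names of the package's direct subpackages."""
    package = importlib.import_module(package_name)
    return sorted(
        info.name
        for info in pkgutil.iter_modules(package.__path__)
        if info.ispkg
    )


# Only run the demo when scikit-learn is actually installed.
if importlib.util.find_spec("sklearn") is not None:
    print(package_dir("sklearn"))
    print(list_subpackages("sklearn"))
```

The printed list should contain the same directories that the ll | grep ^d step shows, for example ensemble, metrics and model_selection, which are the ones imported in the session that follows.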

>>> import pandas as pd
>>> test_file_path = "~/test.csv"
>>> test_data = pd.read_csv(test_file_path)
>>> # count the NaN values in each column
>>> test_data.isnull().sum()
>>> test_data = test_data.dropna(axis=0)
>>> test_data.columns.values
>>> test_data_features = ['Rooms','Floors','Area']
>>> X = test_data[test_data_features]
>>> y = test_data.Price
>>> from sklearn.model_selection import train_test_split
>>> # split features and target in a single call so their rows stay aligned
>>> train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
>>> from sklearn.ensemble import RandomForestRegressor
>>> # rfr stands for RandomForestRegressor (by default RFR creates 10 trees in scikit-learn < 0.22 and 100 trees from 0.22 on)
>>> test_rfr_model = RandomForestRegressor(random_state=1)
>>> test_rfr_model.fit(train_X, train_y)
>>> test_rfr_preds = test_rfr_model.predict(val_X)
>>> from sklearn.metrics import mean_absolute_error
>>> mean_absolute_error(val_y, test_rfr_preds)
40.0

As you can see, even with default values Random Forest gives better results: the MAE here is 40.0, while in it-tuff.blogspot.com/scikit-learn-1 the MAE of the CART decision tree was 150.0.
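
For reference, MAE (mean absolute error) is the average of the absolute differences between the actual and the predicted values, so a lower number means the predictions are closer to the real prices. Below is a minimal plain-Python sketch of that formula. It is not scikit-learn's implementation, the function name mean_absolute_error_plain is made up for this example, and the prices are invented and are not taken from test.csv.

```python
def mean_absolute_error_plain(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = sum(abs(actual - predicted)
                for actual, predicted in zip(y_true, y_pred))
    return total / len(y_true)


# Made-up prices, purely for illustration (not the data from test.csv).
actual_prices = [100.0, 200.0, 300.0]
predicted_prices = [110.0, 180.0, 330.0]

# The absolute errors are 10, 20 and 30, so the mean is 20.0.
print(mean_absolute_error_plain(actual_prices, predicted_prices))  # 20.0
```

Because MAE is expressed in the same units as the target (the price here), the two scores can be compared directly as long as both models are evaluated on the same validation rows.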
