IT Stuff

Pandas 1. Hands-on Python pandas intro.

To install pandas:

pip install pandas

[admin@localhost ~]$ cat > test.csv
Rooms,Price,Floors,Area
1,300,1,30
1,400,1,50
3,400,1,65
2,200,1,45
5,700,3,120
,400,2,70
,300,1,40
4,,2,95

>>> import pandas as pd

>>> test_file_path = "~/test.csv"

>>> test_data = pd.read_csv(test_file_path)

>>> type(test_data)
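The load step can also be tried without a file on disk. A minimal sketch, assuming the CSV text above is held in a string and handed to pd.read_csv through io.StringIO (the name CSV_TEXT and the in-memory buffer are illustrative, not part of the original session). It also counts the missing values per column, which is what dropna acts on later.

```python
import io

import pandas as pd

# Same rows as test.csv above, kept in memory so no file is needed
# (CSV_TEXT is a hypothetical name used only for this sketch).
CSV_TEXT = """Rooms,Price,Floors,Area
1,300,1,30
1,400,1,50
3,400,1,65
2,200,1,45
5,700,3,120
,400,2,70
,300,1,40
4,,2,95
"""

test_data = pd.read_csv(io.StringIO(CSV_TEXT))

print(type(test_data))   # the DataFrame class
print(test_data.shape)   # (8, 4): eight data rows, four columns

# Empty CSV fields are read as NaN; count them per column.
missing_per_column = test_data.isna().sum()
print(missing_per_column)
```

Rooms has two empty fields and Price has one, so three of the eight rows contain at least one NaN.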

>>> help(pd.DataFrame)

class DataFrame(pandas.core.generic.NDFrame)
 |  Two-dimensional size-mutable, potentially heterogeneous tabular data
 |  structure with labeled axes (rows and columns). Arithmetic operations
 |  align on both row and column labels. Can be thought of as a dict-like
 |  container for Series objects. The primary pandas data structure.

>>> test_data

>>> test_data.describe()

>>> test_data.columns

Index([u'Rooms', u'Price', u'Floors', u'Area'], dtype='object')
>>> test_data.columns.values
array(['Rooms', 'Price', 'Floors', 'Area'], dtype=object)

>>> help(test_data.dropna)

dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False) method of pandas.core.frame.DataFrame instance
    Remove missing values.

    Parameters
    ----------
    axis : {0 or 'index', 1 or 'columns'}, default 0
        Determine if rows or columns which contain missing values are
        removed.

        * 0, or 'index' : Drop rows which contain missing values.
        * 1, or 'columns' : Drop columns which contain missing value.

>>> test_data = test_data.dropna(axis=0)

>>> test_data

<output omitted - rows with NaN values are removed>
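To see what the axis argument changes, here is a small sketch that rebuilds the same table from a dict instead of reading test.csv (that construction is an assumption made so the block runs on its own) and compares dropping rows, dropping columns, and restricting the check to one column with subset.

```python
import pandas as pd

nan = float("nan")

# Same values as test.csv; the empty CSV fields become NaN.
test_data = pd.DataFrame({
    "Rooms": [1, 1, 3, 2, 5, nan, nan, 4],
    "Price": [300, 400, 400, 200, 700, 400, 300, nan],
    "Floors": [1, 1, 1, 1, 3, 2, 1, 2],
    "Area": [30, 50, 65, 45, 120, 70, 40, 95],
})

# axis=0 (the default): drop every row that holds at least one NaN.
rows_dropped = test_data.dropna(axis=0)

# axis=1: drop every column that holds at least one NaN.
cols_dropped = test_data.dropna(axis=1)

# subset: only look at the listed columns when deciding which rows to drop.
price_known = test_data.dropna(subset=["Price"])

print(len(rows_dropped))           # 5 rows survive
print(list(cols_dropped.columns))  # ['Floors', 'Area']
print(len(price_known))            # 7 rows survive
```

dropna returns a new DataFrame and leaves the original alone unless inplace=True is passed, which is why the session above assigns the result back to test_data.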

>>> test_data.describe()

>>> test_data.columns.values
array(['Rooms', 'Price', 'Floors', 'Area'], dtype=object)

>>> price = test_data.Price

>>> type(price)

>>> help(pd.Series)

class Series(pandas.core.base.IndexOpsMixin, pandas.core.generic.NDFrame)

| One-dimensional ndarray with axis labels (including time series).

>>> price

>>> price.describe()
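A single column comes back as a Series. The sketch below, assuming the cleaned five-row table is rebuilt from a dict rather than loaded from test.csv, shows that attribute access (test_data.Price) and bracket access (test_data["Price"]) give the same Series, and pulls out a few of the numbers describe() reports.

```python
import pandas as pd

# The five rows that remain after dropna(axis=0) in the session above.
test_data = pd.DataFrame({
    "Rooms": [1.0, 1.0, 3.0, 2.0, 5.0],
    "Price": [300.0, 400.0, 400.0, 200.0, 700.0],
    "Floors": [1, 1, 1, 1, 3],
    "Area": [30, 50, 65, 45, 120],
})

price = test_data["Price"]   # bracket access, works for any column name
same_series = price.equals(test_data.Price)  # attribute access, same data

print(type(price))
print(same_series)   # True
print(price.mean())  # 400.0
print(price.min(), price.max())
```

Bracket access is the safer habit: attribute access fails for column names that contain spaces or that clash with DataFrame methods.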

>>> test_data.columns.values
array(['Rooms', 'Price', 'Floors', 'Area'], dtype=object)

>>> test_data_features=['Rooms','Price']

>>> features = test_data[test_data_features]

>>> type(features)

>>> features

>>> features.describe()

>>> features.head(n=3)

>>> help(features.head)

head(self, n=5) method of pandas.core.frame.DataFrame instance

Return the first `n` rows.
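Selecting with a list of column names returns a DataFrame rather than a Series, and head(n=3) trims it to the first three rows. A minimal sketch, again assuming the cleaned table is built inline so the block is self-contained:

```python
import pandas as pd

# The five rows that remain after dropna(axis=0) in the session above.
test_data = pd.DataFrame({
    "Rooms": [1.0, 1.0, 3.0, 2.0, 5.0],
    "Price": [300.0, 400.0, 400.0, 200.0, 700.0],
    "Floors": [1, 1, 1, 1, 3],
    "Area": [30, 50, 65, 45, 120],
})

test_data_features = ["Rooms", "Price"]

# A list inside the brackets selects several columns at once.
features = test_data[test_data_features]

# head(n) keeps the first n rows; n defaults to 5.
first_three = features.head(n=3)

print(type(features))
print(first_three)
```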

>>> test_data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 4 columns):
Rooms 5 non-null float64
Price 5 non-null float64
Floors 5 non-null int64
Area 5 non-null int64
dtypes: float64(2), int64(2)
memory usage: 200.0 bytes
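The info() output lists Rooms and Price as float64 even though the file holds whole numbers. That is because NaN is a float, so a column with missing values is read as float64, and dropna does not convert it back. A short sketch, assuming the table is rebuilt from a dict and that casting back to int64 is wanted (the cast is an optional extra, not a step from the original session):

```python
import pandas as pd

nan = float("nan")

raw = pd.DataFrame({
    "Rooms": [1, 1, 3, 2, 5, nan, nan, 4],
    "Price": [300, 400, 400, 200, 700, 400, 300, nan],
    "Floors": [1, 1, 1, 1, 3, 2, 1, 2],
    "Area": [30, 50, 65, 45, 120, 70, 40, 95],
})

clean = raw.dropna(axis=0)

# Columns that had NaN stay float64 after the NaN rows are removed.
print(clean.dtypes)

# Once no NaN is left, the columns can be cast back to integers.
restored = clean.astype({"Rooms": "int64", "Price": "int64"})
print(restored.dtypes)
```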


Wednesday, August 8, 2018
