
Sunday, July 24, 2016

Python Pandas: Replacement method for convert_objects()

DataFrame.convert_objects() in Pandas is a very useful function for inferring better data types for your imported data.

For example, suppose you have just imported hockey player stats and every column has come in as object:
df.dtypes
Out[1]: 
PLAYER    object
TEAM      object
GP        object
G         object
A         object
PTS       object
+/-       object
dtype: object

Using convert_objects:
df.convert_objects(convert_numeric=True).dtypes
 __main__:1: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
Out[2]: 
PLAYER     object
TEAM       object
GP          int64
G           int64
A           int64
PTS         int64
+/-         int64
dtype: object


The warning says that convert_objects() is deprecated, but it does not make the replacement obvious: convert_objects() tried to infer types for every column in the data frame, whereas pandas.to_numeric() converts one column at a time. The solution is to run it on each column with DataFrame.apply(). Passing errors='ignore' leaves a column unchanged when it cannot be parsed as numbers:

df.apply(pd.to_numeric, errors='ignore').dtypes
Out[3]: 
PLAYER     object
TEAM       object
GP          int64
G           int64
A           int64
PTS         int64
+/-         int64
dtype: object
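
Here is a rough, self-contained sketch of the same idea that you can paste into a fresh session. The sample rows and the helper name to_numeric_or_keep are made up for illustration and are not the real stats. The helper catches the conversion error itself instead of relying on errors='ignore', because I believe pandas 2.2 and later warn that this option is deprecated. Treat this as one possible workaround, not the official replacement.

```python
import pandas as pd

# Hypothetical sample data: every value is a string, so each column
# starts out with a non-numeric dtype, as in the import described above.
df = pd.DataFrame({
    "PLAYER": ["Player A", "Player B"],
    "TEAM": ["AAA", "BBB"],
    "GP": ["82", "80"],
    "PTS": ["106", "89"],
    "+/-": ["-4", "19"],
})


def to_numeric_or_keep(column):
    """Return the column converted to numbers, or unchanged if parsing fails."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        # Non-numeric text such as player names ends up here.
        return column


# apply() calls the helper once per column; assign the result to keep it.
converted = df.apply(to_numeric_or_keep)
print(converted.dtypes)
```

The text columns (PLAYER, TEAM) should come back untouched, while GP, PTS and +/- become integer columns. As with convert_objects(), nothing is changed in place, so keep the returned data frame.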

1 comment:

  1. Thanks for your help. I found I needed:

    df = df.apply(pd.to_numeric, errors='ignore')
