pytz - Pandas convert datetime with a separate time zone column
I have a dataframe with a time zone column and a datetime column. I would like to convert these to UTC first so I can join them with other data, and then I'll have some calculations to convert from UTC to the viewer's local time zone eventually.
    datetime              time_zone
    2016-09-19 01:29:13   America/Bogota
    2016-09-19 02:16:04   America/New_York
    2016-09-19 01:57:54   Africa/Cairo

    import numpy as np
    import pandas as pd

    def create_utc(df, column, time_format='%Y-%m-%d %H:%M:%S'):
        # Flawed: 'timezone' is a whole column (a Series), but tz_localize
        # expects a single time zone for the entire column.
        timezone = df['tz']
        df[column + '_utc'] = df[column].dt.tz_localize(timezone).dt.tz_convert('UTC').dt.strftime(time_format)
        df[column + '_utc'].replace('NaT', np.nan, inplace=True)
        df[column + '_utc'] = pd.to_datetime(df[column + '_utc'])
        return df
That was my flawed attempt. The error says the truth value is ambiguous, which makes sense because the 'timezone' variable is referring to a column. How do I refer to the value in the same row?
Edit: here are the results for the answers below on 1 day of data (394,000 rows and 22 unique time zones). Edit 2: I added a groupby example in case someone wants to see the results. It is the fastest, by far.
    %%timeit
    for tz in df['tz'].unique():
        df.loc[df['tz'] == tz, 'datetime_utc2'] = df.loc[df['tz'] == tz, 'datetime'].dt.tz_localize(tz).dt.tz_convert('UTC')
    df['datetime_utc2'] = df['datetime_utc2'].dt.tz_localize(None)
1 loops, best of 3: 1.27 s per loop
    %%timeit
    df['datetime_utc'] = [d['datetime'].tz_localize(d['tz']).tz_convert('UTC') for i, d in df.iterrows()]
    df['datetime_utc'] = df['datetime_utc'].dt.tz_localize(None)
1 loops, best of 3: 50.3 s per loop
    df['datetime_utc'] = pd.concat([d['datetime'].dt.tz_localize(tz).dt.tz_convert('UTC') for tz, d in df.groupby('tz')])

1 loops, best of 3: 249 ms per loop
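For anyone who wants to try the groupby variant on its own, here is a minimal self-contained sketch. The helper name `to_utc_by_group` and its parameters are made up for illustration, and it assumes the frame has a unique index, no missing time zone values, and no local times that are ambiguous or nonexistent because of DST changes.

```python
import pandas as pd


def to_utc_by_group(df, dt_col="datetime", tz_col="tz"):
    """Hypothetical helper: naive local times plus a tz column -> naive UTC Series."""
    # Localize each time zone group in one vectorized call, then convert to UTC.
    parts = [
        grp[dt_col].dt.tz_localize(tz).dt.tz_convert("UTC")
        for tz, grp in df.groupby(tz_col)
    ]
    # Every piece is tz-aware UTC, so they concatenate into a single dtype.
    # Reindexing restores the original row order (assumes a unique index).
    utc = pd.concat(parts).reindex(df.index)
    # Drop the tz info to get naive UTC timestamps.
    return utc.dt.tz_localize(None)


df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2016-09-19 01:29:13",
        "2016-09-19 02:16:04",
        "2016-09-19 01:57:54",
    ]),
    "tz": ["America/Bogota", "America/New_York", "Africa/Cairo"],
})
df["datetime_utc"] = to_utc_by_group(df)
print(df)
```

The loop runs once per unique time zone rather than once per row, which is presumably why it scales so much better than the iterrows version.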
Here is a vectorized approach (it will loop df.time_zone.nunique() times):
    In [2]: t
    Out[2]:
                 datetime         time_zone
    0 2016-09-19 01:29:13    America/Bogota
    1 2016-09-19 02:16:04  America/New_York
    2 2016-09-19 01:57:54      Africa/Cairo
    3 2016-09-19 11:00:00    America/Bogota
    4 2016-09-19 12:00:00  America/New_York
    5 2016-09-19 13:00:00      Africa/Cairo

    In [3]: for tz in t.time_zone.unique():
       ...:     mask = (t.time_zone == tz)
       ...:     t.loc[mask, 'datetime'] = \
       ...:         t.loc[mask, 'datetime'].dt.tz_localize(tz).dt.tz_convert('UTC')
       ...:

    In [4]: t
    Out[4]:
                 datetime         time_zone
    0 2016-09-19 06:29:13    America/Bogota
    1 2016-09-19 06:16:04  America/New_York
    2 2016-09-18 23:57:54      Africa/Cairo
    3 2016-09-19 16:00:00    America/Bogota
    4 2016-09-19 16:00:00  America/New_York
    5 2016-09-19 11:00:00      Africa/Cairo
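If you would rather keep the original local times (for example, to convert back to a viewer's zone later), the same mask loop can write naive UTC values into a separate column instead of overwriting 'datetime'. This is only a sketch: the helper name `localize_to_utc` and the output column name are hypothetical, and it assumes every row has a valid time zone name and no local time that is ambiguous or nonexistent because of DST.

```python
import numpy as np
import pandas as pd


def localize_to_utc(frame, dt_col="datetime", tz_col="time_zone"):
    """Hypothetical helper: return naive UTC timestamps, leaving frame untouched."""
    # Preallocate the result as NaT so any unprocessed row is easy to spot.
    out = np.full(len(frame), np.datetime64("NaT"), dtype="datetime64[ns]")
    for tz in frame[tz_col].unique():
        # Boolean mask selecting the rows that belong to this time zone.
        mask = (frame[tz_col] == tz).to_numpy()
        local = frame.loc[mask, dt_col]
        # Localize the whole slice at once, convert to UTC, then drop tz info.
        utc = local.dt.tz_localize(tz).dt.tz_convert("UTC").dt.tz_localize(None)
        out[mask] = utc.to_numpy()
    return pd.Series(out, index=frame.index, name=dt_col + "_utc")


t = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2016-09-19 01:29:13", "2016-09-19 02:16:04", "2016-09-19 01:57:54",
        "2016-09-19 11:00:00", "2016-09-19 12:00:00", "2016-09-19 13:00:00",
    ]),
    "time_zone": [
        "America/Bogota", "America/New_York", "Africa/Cairo",
        "America/Bogota", "America/New_York", "Africa/Cairo",
    ],
})
t["datetime_utc"] = localize_to_utc(t)
print(t)
```

The values in the new column should match the Out[4] listing above, while 'datetime' still holds the local times.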