# In[1]
from datetime import datetime
datetime(year=2023,month=7,day=28)
# Out[1]
datetime.datetime(2023, 7, 28, 0, 0)
Using the dateutil module, you can parse dates from a variety of string formats.
# In[2]
from dateutil import parser
date=parser.parse("28th of July, 2023")
date
# Out[2]
datetime.datetime(2023, 7, 28, 0, 0)
Once you have a datetime object, you can do things like printing the day of the week.
# In[3]
date.strftime('%A')
# Out[3]
'Friday'
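As a small illustration of the standard-library tools above, the following sketch formats the same date with a few more strftime codes and shifts it with timedelta. The variable names (`label`, `next_week`) are illustrative only, and the weekday and month names assume the default (English) locale.

```python
from datetime import datetime, timedelta

date = datetime(year=2023, month=7, day=28)

# %A = full weekday name, %d = zero-padded day, %B = full month name
label = date.strftime('%A, %d %B %Y')
print(label)

# weekday() counts from Monday = 0, so a Friday is 4
print(date.weekday())

# timedelta handles simple date arithmetic
next_week = date + timedelta(days=7)
print(next_week.date())
```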
NumPy's datetime64 dtype encodes dates as 64-bit integers, and thus allows arrays of dates to be represented compactly and operated on efficiently.
# In[4]
import numpy as np
date=np.array('2023-07-28',dtype=np.datetime64)
date
# Out[4]
array('2023-07-28', dtype='datetime64[D]')
# In[5]
date+np.arange(12)
# Out[5]
array(['2023-07-28', '2023-07-29', '2023-07-30', '2023-07-31',
'2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04',
'2023-08-05', '2023-08-06', '2023-08-07', '2023-08-08'],
dtype='datetime64[D]')
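Because NumPy datetime64 arrays have a uniform type, this kind of operation is much faster than working directly with Python's datetime objects, especially as arrays get large.
One detail of the datetime64 and related timedelta64 objects is that they are built on a fundamental time unit. Because the datetime64 object is limited to 64-bit precision, the range of encodable times is 2^64 times this fundamental unit. In other words, datetime64 imposes a trade-off between time resolution and maximum time span.
# In[6]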
print(np.datetime64('2023-07-28')) # day based
print(np.datetime64('2023-07-28 12:00')) # minute based
print(np.datetime64('2023-07-28 12:59:59.50','ns')) # nanosecond based
# Out[6]
2023-07-28
2023-07-28T12:00
2023-07-28T12:59:59.500000000
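The resolution versus span trade-off can be checked with a little arithmetic. The sketch below assumes only that datetime64 stores a signed 64-bit tick count, and estimates how many years fit on one side of the epoch for nanosecond and microsecond units. The variable names are illustrative.

```python
import numpy as np

# Largest tick count a signed 64-bit integer can hold
max_ticks = np.iinfo(np.int64).max

# Approximate number of seconds in a year
seconds_per_year = 365.25 * 24 * 3600

# Years reachable on one side of the epoch for two different units
span_ns_years = max_ticks / 1e9 / seconds_per_year  # nanosecond ticks
span_us_years = max_ticks / 1e6 / seconds_per_year  # microsecond ticks

print(round(span_ns_years))  # a few hundred years
print(round(span_us_years))  # a few hundred thousand years

# The chosen unit is part of the dtype
stamp = np.datetime64('2023-07-28', 'ns')
print(stamp.dtype)
```

A finer unit shrinks the representable range by the same factor, which is why nanosecond timestamps cover only a few centuries around 1970.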
More information on datetime64 can be found in NumPy's datetime64 documentation.
Pandas builds on these tools to provide a Timestamp object, which combines the ease of use of datetime and dateutil with the efficient storage and vectorized interface of numpy.datetime64. From a group of these Timestamp objects, Pandas can construct a DatetimeIndex that can be used to index data in a Series or DataFrame.
# In[7]
import pandas as pd
date=pd.to_datetime("28th of July, 2023")
date
# Out[7]
Timestamp('2023-07-28 00:00:00')
# In[8]
date.strftime('%A')
# Out[8]
'Friday'
# In[9]
date+pd.to_timedelta(np.arange(12),'D')
# Out[9]
DatetimeIndex(['2023-07-28', '2023-07-29', '2023-07-30', '2023-07-31',
'2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04',
'2023-08-05', '2023-08-06', '2023-08-07', '2023-08-08'],
dtype='datetime64[ns]', freq=None)
# In[10]
index=pd.DatetimeIndex(['2023-07-28','2023-08-28',
'2024-07-28','2024-08-28'])
data=pd.Series([0,1,2,3],index=index)
data
# Out[10]
2023-07-28 0
2023-08-28 1
2024-07-28 2
2024-08-28 3
dtype: int64
# In[11]
data['2023-07-28':'2024-07-28']
# Out[11]
2023-07-28 0
2023-08-28 1
2024-07-28 2
dtype: int64
# In[12]
data['2023']
# Out[12]
2023-07-28 0
2023-08-28 1
dtype: int64
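The same kind of partial-string selection can be written with `.loc`, which makes it explicit that the lookup is label-based. This is a minimal sketch that rebuilds the small series from the cells above. The names `subset` and `in_2024` are illustrative.

```python
import pandas as pd

index = pd.DatetimeIndex(['2023-07-28', '2023-08-28',
                          '2024-07-28', '2024-08-28'])
data = pd.Series([0, 1, 2, 3], index=index)

# A month-level slice: both endpoints are inclusive, and '2024-07'
# covers the whole of July 2024
subset = data.loc['2023-08':'2024-07']
print(subset)

# A bare year selects every entry in that year
in_2024 = data.loc['2024']
print(in_2024)
```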
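For timestamps, Pandas provides the Timestamp type. It is essentially a replacement for Python's native datetime, but it is based on the more efficient np.datetime64 data type. The associated index structure is DatetimeIndex.
For time periods, Pandas provides the Period type. It encodes a fixed-frequency interval based on np.datetime64. The associated index structure is PeriodIndex.
For time deltas or durations, Pandas provides the Timedelta type. It is a more efficient replacement for Python's native datetime.timedelta type, and is based on np.timedelta64. The associated index structure is TimedeltaIndex.
Commonly, we use the pd.to_datetime function, which can parse a wide variety of formats.
Passing a single date to pd.to_datetime yields a Timestamp; passing a series of dates yields a DatetimeIndex by default.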
# In[13]
dates=pd.to_datetime([datetime(2023,7,28),'28th of July, 2023',
'2023-07-30','31-07-2023','20230801'])
dates
# Out[13]
DatetimeIndex(['2023-07-28', '2023-07-28', '2023-07-30', '2023-07-31',
'2023-08-01'],
dtype='datetime64[ns]', freq=None)
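To make the scalar versus sequence behaviour concrete, the sketch below checks the return types of `pd.to_datetime` and also passes an explicit `format` string, which removes any ambiguity for day-first input such as '31-07-2023'. The variable names are illustrative.

```python
import pandas as pd

single = pd.to_datetime('2023-07-28')
several = pd.to_datetime(['2023-07-28', '2023-07-30'])

print(type(single).__name__)   # scalar input gives a Timestamp
print(type(several).__name__)  # list input gives a DatetimeIndex

# An explicit format tells the parser exactly how to read the string
explicit = pd.to_datetime('31-07-2023', format='%d-%m-%Y')
print(explicit)
```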
Any DatetimeIndex can be converted to a PeriodIndex with the to_period function, with the addition of a frequency code.
# In[14]
dates.to_period('D')
# Out[14]
PeriodIndex(['2023-07-28', '2023-07-28', '2023-07-30', '2023-07-31',
'2023-08-01'],
dtype='period[D]')
# In[15]
dates-dates[0]
# Out[15]
TimedeltaIndex(['0 days', '0 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
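The scalar counterparts of these index types behave the same way. As a brief sketch, a single Timestamp can be converted to the Period that contains it, and subtracting two Timestamps gives a Timedelta. The variable names are illustrative.

```python
import pandas as pd

stamp = pd.Timestamp('2023-07-28 12:00')

# The monthly period that contains the timestamp
period = stamp.to_period('M')
print(period)
print(period.start_time)

# Subtracting two timestamps yields a Timedelta
delta = pd.Timestamp('2023-08-01') - pd.Timestamp('2023-07-28')
print(delta)
print(delta.days)
```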
To create regular date sequences, Pandas offers pd.date_range for timestamps, pd.period_range for periods, and pd.timedelta_range for time deltas.
# In[16]
pd.date_range('2023-07-28','2023-08-14')
# Out[16]
DatetimeIndex(['2023-07-28', '2023-07-29', '2023-07-30', '2023-07-31',
'2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04',
'2023-08-05', '2023-08-06', '2023-08-07', '2023-08-08',
'2023-08-09', '2023-08-10', '2023-08-11', '2023-08-12',
'2023-08-13', '2023-08-14'],
dtype='datetime64[ns]', freq='D')
# In[17]
pd.date_range('2023-07-28',periods=8)
# Out[17]
DatetimeIndex(['2023-07-28', '2023-07-29', '2023-07-30', '2023-07-31',
'2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04'],
dtype='datetime64[ns]', freq='D')
The spacing can be modified by altering the freq argument, which defaults to D.
# In[18]
pd.date_range('2023-07-28',periods=8,freq='H')
# Out[18]
DatetimeIndex(['2023-07-28 00:00:00', '2023-07-28 01:00:00',
'2023-07-28 02:00:00', '2023-07-28 03:00:00',
'2023-07-28 04:00:00', '2023-07-28 05:00:00',
'2023-07-28 06:00:00', '2023-07-28 07:00:00'],
dtype='datetime64[ns]', freq='H')
To create regular sequences of Period or Timedelta values, the similar pd.period_range and pd.timedelta_range functions are useful.
# In[19]
pd.period_range('2023-07',periods=8,freq='M')
# Out[19]
PeriodIndex(['2023-07', '2023-08', '2023-09', '2023-10', '2023-11', '2023-12',
'2024-01', '2024-02'],
dtype='period[M]')
# In[20]
pd.timedelta_range(0,periods=6,freq='H')
# Out[20]
TimedeltaIndex(['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
'0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00'],
dtype='timedelta64[ns]', freq='H')
Listing of Pandas frequency codes
| Code | Description | Code | Description |
|---|---|---|---|
| D | Calendar day | B | Business day |
| W | Weekly | | |
| M | Month end | BM | Business month end |
| Q | Quarter end | BQ | Business quarter end |
| A | Year end | BA | Business year end |
| H | Hours | BH | Business hours |
| T | Minutes | | |
| S | Seconds | | |
| L | Milliseconds | | |
| U | Microseconds | | |
| N | Nanoseconds | | |
Listing of start-indexed frequency codes
| Code | Description |
|---|---|
| MS | Month start |
| QS | Quarter start |
| AS | Year start |
| BMS | Business month start |
| BQS | Business quarter start |
| BAS | Business year start |
Additionally, you can change the month used to mark any quarterly or annual code by adding a three-letter month code as a suffix: Q-JAN, BQ-FEB, QS-MAR, BQS-APR, etc.
In the same way, the split point of the weekly frequency can be modified by adding a three-letter weekday code: W-SUN, W-MON, W-TUE, W-WED, etc.
Codes can be combined with numbers to specify other frequencies.
# In[21]
pd.timedelta_range(0,periods=6,freq='2H30T')
# Out[21]
TimedeltaIndex(['0 days 00:00:00', '0 days 02:30:00', '0 days 05:00:00',
'0 days 07:30:00', '0 days 10:00:00', '0 days 12:30:00'],
dtype='timedelta64[ns]', freq='150T')
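The weekday suffix and the numeric multiplier can both be tried on a short range. The sketch below uses only the `W-MON` and `2D` codes. Since 2023-07-28 is a Friday, the first Monday-anchored date falls on 2023-07-31. The variable names are illustrative.

```python
import pandas as pd

# Weekly frequency anchored on Mondays
mondays = pd.date_range('2023-07-28', periods=3, freq='W-MON')
print(mondays)

# A number in front of a code multiplies the step: every second day
every_other_day = pd.date_range('2023-07-28', periods=3, freq='2D')
print(every_other_day)
```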
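All of these short codes refer to specific instances of Pandas time series offsets, which can be found in the pd.tseries.offsets module.
# In[22]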
from pandas.tseries.offsets import BDay
pd.date_range('2023-07-28',periods=6,freq=BDay())
# Out[22]
DatetimeIndex(['2023-07-28', '2023-07-31', '2023-08-01', '2023-08-02',
'2023-08-03', '2023-08-04'],
dtype='datetime64[ns]', freq='B')
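Offset objects can also be added directly to a Timestamp. As a hedged sketch, the example below moves a Friday forward by business days and rolls it to the end of its month with `BDay` and `MonthEnd` from `pandas.tseries.offsets`. The variable names are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import BDay, MonthEnd

friday = pd.Timestamp('2023-07-28')

# One business day after a Friday skips the weekend
next_bday = friday + BDay(1)
print(next_bday)

# Two business days later lands on the Tuesday
two_bdays = friday + BDay(2)
print(two_bdays)

# MonthEnd(1) rolls forward to the last calendar day of the month
month_end = friday + MonthEnd(1)
print(month_end)
```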
# In[23]
import pandas_datareader.data as web
start_date = datetime(2006, 1, 1)
end_date = datetime(2016, 1, 1)
# Bank of America daily prices from the stooq data source
bac = web.DataReader('BAC', 'stooq', start_date, end_date)
bac.head()
# Out[23]
Open High Low Close Volume
Date
2015-12-31 14.7814 14.8325 14.6233 14.6233 5.417059e+07
2015-12-30 14.9473 14.9807 14.8070 14.8168 4.030734e+07
2015-12-29 14.9897 15.0780 14.9130 15.0131 5.251059e+07
2015-12-28 14.9630 14.9720 14.7539 14.8846 4.803435e+07
2015-12-24 15.0495 15.1035 14.9630 15.0063 3.380344e+07
# In[24]
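# Keep the closing price; stooq returns the newest rows first, so sort chronologically
bac=bac['Close'].sort_index()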
We can visualize this using the plot method.
# In[25]
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
bac.plot();

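One common need for time series data is resampling at a higher or lower frequency. This can be done using the resample method, or the much simpler asfreq method. The primary difference between the two is that resample is fundamentally a data aggregation, while asfreq is fundamentally a data selection.
# In[26]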
bac.plot(alpha=0.5,style='-')
bac.resample('BA').mean().plot(style=':')
bac.asfreq('BA').plot(style='--')
plt.legend(['input','resample','asfreq'],loc='upper left');
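The aggregation versus selection distinction is easier to see on a tiny synthetic series than on stock prices. The sketch below is not the BAC data. It uses two weeks of made-up daily values starting on a Monday and the weekly code `W` (weeks ending on Sunday), and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: values 0..13 over two weeks, starting on a Monday
idx = pd.date_range('2023-07-03', periods=14, freq='D')
s = pd.Series(np.arange(14, dtype=float), index=idx)

# resample aggregates: the mean of each week ending on Sunday
weekly_mean = s.resample('W').mean()
print(weekly_mean)

# asfreq selects: the single value observed on each Sunday
weekly_pick = s.asfreq('W')
print(weekly_pick)
```

The first week holds the values 0 through 6, so resample reports their mean while asfreq simply returns the Sunday value.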

Notice the difference: at each point, resample reports the average of the previous year, while asfreq reports the value at the end of the year. For up-sampling, resample and asfreq are largely equivalent, though resample has many more options available. Like the pd.fillna function, asfreq accepts a method argument to specify how values are imputed.
# In[27]
fig, ax=plt.subplots(2,sharex=True)
data=bac.iloc[:20]
data.asfreq('D').plot(ax=ax[0],marker='o')
data.asfreq('D',method='bfill').plot(ax=ax[1],style='-o')
data.asfreq('D',method='ffill').plot(ax=ax[1],style='--o')
ax[1].legend(["back-fill","forward-fill"]);
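The effect of the `method` argument can be checked numerically on a minimal example. The sketch below uses a made-up two-point series (a Friday and the following Monday) in place of the stock data, so the weekend appears as a gap when up-sampling to daily frequency. The variable names are illustrative.

```python
import pandas as pd

# Synthetic series with a weekend gap: Friday and the following Monday
idx = pd.to_datetime(['2023-07-28', '2023-07-31'])
s = pd.Series([1.0, 2.0], index=idx)

# Without a fill method, the new weekend rows are NaN
daily = s.asfreq('D')
print(daily)

# Forward-fill carries Friday's value into the weekend
ffilled = s.asfreq('D', method='ffill')
print(ffilled)

# Back-fill pulls Monday's value back into the weekend
bfilled = s.asfreq('D', method='bfill')
print(bfilled)
```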

Another common time series operation is shifting data in time. Pandas provides the shift method, which can be used to shift data by a given number of entries.
# In[28]
bac=bac.asfreq('D',method='pad')
ROI=100*(bac.shift(-365)-bac)/bac
ROI.plot()
plt.ylabel('% Return on Investment after 1 year');
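The return calculation relies on a negative shift aligning each row with a later value. A minimal sketch with made-up prices and a one-day horizon (instead of 365 days) shows the mechanics. The names `future` and `ret` are illustrative.

```python
import pandas as pd

idx = pd.date_range('2023-07-28', periods=4, freq='D')
s = pd.Series([10.0, 11.0, 12.0, 15.0], index=idx)

# shift(-1) moves values one row earlier, so each row holds the next day's value
future = s.shift(-1)
print(future)

# Percentage return from each day to the next; the last row has no future value
ret = 100 * (future - s) / s
print(ret)
```

The trailing NaN mirrors what happens at the end of the one-year calculation: the final 365 days have no future price to compare against.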

Rolling statistics are a third type of time series operation. They can be computed via the rolling attribute of Series and DataFrame objects, which returns a view similar to what we saw with the groupby operation.
# In[29]
rolling=bac.rolling(365,center=True)
data=pd.DataFrame({'input':bac,'one-year rolling_mean':rolling.mean(),
'one-year rolling_median':rolling.median()})
ax=data.plot(style=['-','--',':'])
ax.lines[0].set_alpha(0.3)
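To see what `center=True` changes, the sketch below applies a three-point window to a short made-up series and compares the trailing and centered means. The window size and variable names are illustrative, not those used for the stock data.

```python
import pandas as pd

idx = pd.date_range('2023-07-28', periods=5, freq='D')
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# Default: the window ends at each row, so the first two rows are NaN
trailing = s.rolling(3).mean()
print(trailing)

# center=True: the window is centered on each row, so both ends are NaN
centered = s.rolling(3, center=True).mean()
print(centered)
```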
