# In[1]
from datetime import datetime
datetime(year=2023,month=7,day=28)
# Out[1]
datetime.datetime(2023, 7, 28, 0, 0)
Using the `dateutil` module, you can parse dates from a variety of string formats.
# In[2]
from dateutil import parser
date=parser.parse("28th of July, 2023")
date
# Out[2]
datetime.datetime(2023, 7, 28, 0, 0)
Once you have a `datetime` object, you can do things like printing the day of the week.
# In[3]
date.strftime('%A')
# Out[3]
'Friday'
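The same kind of manipulation can be done with the standard library alone. The following is a minimal sketch, assuming the date arrives as an ISO-style string; the variable names are illustrative only. It uses `weekday()` rather than `strftime('%A')` because the day name returned by `strftime` depends on the active locale.

```python
from datetime import datetime, timedelta

# Parse a string with an explicit format (no third-party parser needed).
date = datetime.strptime("2023-07-28", "%Y-%m-%d")

# weekday() counts from Monday=0, so a Friday gives 4.
weekday_number = date.weekday()

# Adding a timedelta moves the date forward by a fixed duration.
one_week_later = date + timedelta(days=7)

print(weekday_number)
print(one_week_later.isoformat())
```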
NumPy's `datetime64` dtype encodes dates as 64-bit integers, and thus allows arrays of dates to be represented compactly and operated on efficiently.
# In[4]
import numpy as np
date=np.array('2023-07-28',dtype=np.datetime64)
date
# Out[4]
array('2023-07-28', dtype='datetime64[D]')
# In[5]
date+np.arange(12)
# Out[5]
array(['2023-07-28', '2023-07-29', '2023-07-30', '2023-07-31',
'2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04',
'2023-08-05', '2023-08-06', '2023-08-07', '2023-08-08'],
dtype='datetime64[D]')
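Other vectorized operations work on `datetime64` arrays in the same way. As a small sketch (the array and variable names here are illustrative), the block below builds a range of days, subtracts a date from the whole array, and filters to weekdays with `np.is_busday`, which treats Monday to Friday as business days by default.

```python
import numpy as np

# Five consecutive days starting on Friday 2023-07-28 (the end date is exclusive).
dates = np.arange("2023-07-28", "2023-08-02", dtype="datetime64[D]")

# Subtracting datetime64 values yields timedelta64 values, element by element.
offsets = dates - dates[0]

# Boolean mask of Monday-Friday dates, then the dates that pass it.
weekdays = np.is_busday(dates)
weekday_dates = dates[weekdays]

print(offsets)
print(weekday_dates)
```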
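Because NumPy `datetime64` arrays have a uniform type, this kind of operation can be much faster than working directly with Python's `datetime` objects, especially as arrays get large.
One detail of the `datetime64` and related `timedelta64` objects is that they are built on a fundamental time unit.
Because the `datetime64` object is limited to 64-bit precision, the range of encodable times is 2^64 times this fundamental unit.
In other words, `datetime64` imposes a trade-off between time resolution and maximum time span.
# In[6]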
print(np.datetime64('2023-07-28')) # day based
print(np.datetime64('2023-07-28 12:00')) # minute based
print(np.datetime64('2023-07-28 12:59:59.50','ns')) # nanosecond based
# Out[6]
2023-07-28
2023-07-28T12:00
2023-07-28T12:59:59.500000000
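To make the trade-off concrete, the sketch below inspects the unit NumPy infers from a string and estimates how many years 2^64 units cover at nanosecond and at second resolution. The year length of 365.25 days is an approximation chosen for the estimate, and the variable names are illustrative.

```python
import numpy as np

# The unit is inferred from the precision of the input string.
day = np.datetime64("2023-07-28")
minute = np.datetime64("2023-07-28 12:00")
print(day.dtype, minute.dtype)

# The largest signed 64-bit integer, read as nanoseconds since 1970-01-01.
latest_ns = np.datetime64(np.iinfo(np.int64).max, "ns")
print(latest_ns)

# Rough total span of 2**64 units, in years (approximate year length).
seconds_per_year = 60 * 60 * 24 * 365.25
span_years_ns = 2**64 / 1e9 / seconds_per_year   # nanosecond resolution
span_years_s = 2**64 / seconds_per_year          # second resolution
print(round(span_years_ns), f"{span_years_s:.2e}")
```

At nanosecond resolution the span is under 600 years, which is why nanosecond timestamps cannot represent dates much beyond the year 2262.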
More details on `datetime64` can be found in NumPy's datetime64 documentation.
Pandas provides a `Timestamp` object, which combines the ease of use of `datetime` and `dateutil` with the efficient storage and vectorized interface of `numpy.datetime64`.
From a group of these `Timestamp` objects, Pandas can construct a `DatetimeIndex` that can be used to index data in a `Series` or `DataFrame`.
# In[7]
import pandas as pd
date=pd.to_datetime("28th of July, 2023")
date
# Out[7]
Timestamp('2023-07-28 00:00:00')
# In[8]
date.strftime('%A')
# Out[8]
'Friday'
# In[9]
date+pd.to_timedelta(np.arange(12),'D')
# Out[9]
DatetimeIndex(['2023-07-28', '2023-07-29', '2023-07-30', '2023-07-31',
'2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04',
'2023-08-05', '2023-08-06', '2023-08-07', '2023-08-08'],
dtype='datetime64[ns]', freq=None)
# In[10]
index=pd.DatetimeIndex(['2023-07-28','2023-08-28',
'2024-07-28','2024-08-28'])
data=pd.Series([0,1,2,3],index=index)
data
# Out[10]
2023-07-28 0
2023-08-28 1
2024-07-28 2
2024-08-28 3
dtype: int64
# In[11]
data['2023-07-28':'2024-07-28']
# Out[11]
2023-07-28 0
2023-08-28 1
2024-07-28 2
dtype: int64
# In[12]
data['2023']
# Out[12]
2023-07-28 0
2023-08-28 1
dtype: int64
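The same partial-string selection can be written with `.loc`, which makes it explicit that rows are being selected by label. The sketch below rebuilds the small series from above so it runs on its own, then selects a single month, an open-ended range, and a per-year aggregate; the variable names are illustrative.

```python
import pandas as pd

index = pd.DatetimeIndex(["2023-07-28", "2023-08-28", "2024-07-28", "2024-08-28"])
data = pd.Series([0, 1, 2, 3], index=index)

# A year-month string selects every entry in that month.
august_2023 = data.loc["2023-08"]

# An open-ended slice selects everything from the start of 2024 onward.
from_2024 = data.loc["2024":]

# The index exposes date components, which can drive a groupby.
totals_by_year = data.groupby(data.index.year).sum()

print(august_2023)
print(from_2024)
print(totals_by_year)
```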
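For timestamps, Pandas provides the `Timestamp` type. It is essentially a replacement for Python's native `datetime`, but it's based on the more efficient `np.datetime64` data type. The associated index structure is `DatetimeIndex`.
For time periods, Pandas provides the `Period` type. This encodes a fixed-frequency interval based on `np.datetime64`. The associated index structure is `PeriodIndex`.
For time deltas or durations, Pandas provides the `Timedelta` type. `Timedelta` is a more efficient replacement for Python's native `datetime.timedelta` type, and is based on `np.timedelta64`. The associated index structure is `TimedeltaIndex`.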
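The three scalar types can be seen side by side in a short sketch. The dates are arbitrary and the variable names are illustrative; the point is only how a `Timestamp` relates to the `Period` that contains it and to the `Timedelta` between two timestamps.

```python
import pandas as pd

# A single point in time.
stamp = pd.Timestamp("2023-07-28")

# The month-long period that contains that point.
period = stamp.to_period("M")
month_start = period.start_time

# The duration between two points in time.
duration = pd.Timestamp("2023-08-01") - stamp

print(period, month_start, duration)
```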
Commonly, we use the `pd.to_datetime` function, which can parse a wide variety of formats.
Passing a single date to `pd.to_datetime` yields a `Timestamp`; passing a series of dates yields a `DatetimeIndex` by default.
# In[13]
dates=pd.to_datetime([datetime(2023,7,28),'28th of July, 2023',
'2023-07-30','31-07-2023','20230801'])
dates
# Out[13]
DatetimeIndex(['2023-07-28', '2023-07-28', '2023-07-30', '2023-07-31',
'2023-08-01'],
dtype='datetime64[ns]', freq=None)
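When the input format is known in advance, it can be safer to state it rather than rely on inference, since a string such as `31-07-2023` is only unambiguous because 31 cannot be a month. The sketch below passes an explicit `format` and uses `errors="coerce"` to turn unparseable entries into `NaT`; the sample strings are made up for illustration.

```python
import pandas as pd

# An explicit format removes any day-first versus month-first ambiguity.
day_first = pd.to_datetime("31-07-2023", format="%d-%m-%Y")

# With errors="coerce", entries that do not match become NaT instead of raising.
mixed = pd.to_datetime(
    ["31-07-2023", "not a date"], format="%d-%m-%Y", errors="coerce"
)

print(day_first)
print(mixed)
```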
Any `DatetimeIndex` can be converted to a `PeriodIndex` with the `to_period` function, with the addition of a frequency code.
# In[14]
dates.to_period('D')
# Out[14]
PeriodIndex(['2023-07-28', '2023-07-28', '2023-07-30', '2023-07-31',
'2023-08-01'],
dtype='period[D]')
# In[15]
dates-dates[0]
# Out[15]
TimedeltaIndex(['0 days', '0 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
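To make the creation of regular date sequences more convenient, Pandas offers a few functions for this purpose: `pd.date_range` for timestamps, `pd.period_range` for periods, and `pd.timedelta_range` for time deltas.
# In[16]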
pd.date_range('2023-07-28','2023-08-14')
# Out[16]
DatetimeIndex(['2023-07-28', '2023-07-29', '2023-07-30', '2023-07-31',
'2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04',
'2023-08-05', '2023-08-06', '2023-08-07', '2023-08-08',
'2023-08-09', '2023-08-10', '2023-08-11', '2023-08-12',
'2023-08-13', '2023-08-14'],
dtype='datetime64[ns]', freq='D')
# In[17]
pd.date_range('2023-07-28',periods=8)
# Out[17]
DatetimeIndex(['2023-07-28', '2023-07-29', '2023-07-30', '2023-07-31',
'2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04'],
dtype='datetime64[ns]', freq='D')
The spacing can be modified by altering the `freq` argument, which defaults to `D`.
# In[18]
pd.date_range('2023-07-28',periods=8,freq='H')
# Out[18]
DatetimeIndex(['2023-07-28 00:00:00', '2023-07-28 01:00:00',
'2023-07-28 02:00:00', '2023-07-28 03:00:00',
'2023-07-28 04:00:00', '2023-07-28 05:00:00',
'2023-07-28 06:00:00', '2023-07-28 07:00:00'],
dtype='datetime64[ns]', freq='H')
To create regular sequences of periods or time deltas, the similar `pd.period_range` and `pd.timedelta_range` functions are useful.
# In[19]
pd.period_range('2023-07',periods=8,freq='M')
# Out[19]
PeriodIndex(['2023-07', '2023-08', '2023-09', '2023-10', '2023-11', '2023-12',
'2024-01', '2024-02'],
dtype='period[M]')
# In[20]
pd.timedelta_range(0,periods=6,freq='H')
# Out[20]
TimedeltaIndex(['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
'0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00'],
dtype='timedelta64[ns]', freq='H')
Listing of Pandas frequency codes
Code | Description | Code | Description |
---|---|---|---|
D | Calendar day | B | Business day |
W | Weekly | | |
M | Month end | BM | Business month end |
Q | Quarter end | BQ | Business quarter end |
A | Year end | BA | Business year end |
H | Hours | BH | Business hours |
T | Minutes | | |
S | Seconds | | |
L | Milliseconds | | |
U | Microseconds | | |
N | Nanoseconds | | |
Listing of start-indexed frequency codes
Code | Description |
---|---|
MS | Month start |
QS | Quarter start |
AS | Year start |
BMS | Business month start |
BQS | Business quarter start |
BAS | Business year start |
Additionally, you can change the month used to mark any quarterly or annual code by adding a three-letter month code as a suffix: `Q-JAN`, `BQ-FEB`, `QS-MAR`, `BQS-APR`, etc.
In the same way, the split point of the weekly frequency can be modified by adding a three-letter weekday code: `W-SUN`, `W-MON`, `W-TUE`, `W-WED`, etc.
Codes can be combined with numbers to specify other frequencies.
# In[21]
pd.timedelta_range(0,periods=6,freq='2H30T')
# Out[21]
TimedeltaIndex(['0 days 00:00:00', '0 days 02:30:00', '0 days 05:00:00',
'0 days 07:30:00', '0 days 10:00:00', '0 days 12:30:00'],
dtype='timedelta64[ns]', freq='150T')
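A few of the codes from the tables can be tried directly. The sketch below uses a start-indexed code (`MS`), a weekday-anchored weekly code (`W-MON`), and a combined step of 2 hours 30 minutes. The step is passed as a `pd.Timedelta` rather than an alias string, on the assumption that this sidesteps differences between pandas versions: recent releases are understood to deprecate some single-letter aliases such as `H` and `T` in favor of `h` and `min`.

```python
import pandas as pd

# Month start: the first of each month on or after the start date.
month_starts = pd.date_range("2023-07-28", periods=3, freq="MS")

# Weekly frequency anchored on Mondays (2023-07-28 is a Friday).
mondays = pd.date_range("2023-07-28", periods=3, freq="W-MON")

# A combined frequency expressed as a Timedelta object.
step = pd.Timedelta(hours=2, minutes=30)
steps = pd.timedelta_range(start="0 days", periods=4, freq=step)

print(month_starts)
print(mondays)
print(steps)
```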
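All of these short codes refer to specific instances of Pandas time series offsets, which can be found in the `pd.tseries.offsets` module.
# In[22]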
from pandas.tseries.offsets import BDay
pd.date_range('2023-07-28',periods=6,freq=BDay())
# Out[22]
DatetimeIndex(['2023-07-28', '2023-07-31', '2023-08-01', '2023-08-02',
'2023-08-03', '2023-08-04'],
dtype='datetime64[ns]', freq='B')
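Offset objects can also be added to a single `Timestamp`, which is a convenient way to check what a frequency means. This is a small sketch using `BDay` and `MonthEnd` from `pandas.tseries.offsets`; the variable names are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import BDay, MonthEnd

friday = pd.Timestamp("2023-07-28")

# One business day after a Friday skips the weekend.
next_business_day = friday + BDay(1)

# Offsets can be multiplied to take several steps at once.
three_business_days = friday + 3 * BDay()

# MonthEnd rolls forward to the last calendar day of the month.
month_end = friday + MonthEnd(1)

print(next_business_day, three_business_days, month_end)
```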
# In[23]
import pandas_datareader.data as web
start_date = datetime(2006, 1, 1)
end_date = datetime(2016, 1, 1)
#Bank of America
bac = web.DataReader('BAC', 'stooq', start_date, end_date)
bac.head()
# Out[23]
Open High Low Close Volume
Date
2015-12-31 14.7814 14.8325 14.6233 14.6233 5.417059e+07
2015-12-30 14.9473 14.9807 14.8070 14.8168 4.030734e+07
2015-12-29 14.9897 15.0780 14.9130 15.0131 5.251059e+07
2015-12-28 14.9630 14.9720 14.7539 14.8846 4.803435e+07
2015-12-24 15.0495 15.1035 14.9630 15.0063 3.380344e+07
# In[24]
bac=bac['Close'].sort_index()  # the rows above are newest-first; put them in chronological order
We can visualize this using the `plot` method.
# In[25]
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
bac.plot();
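One common need for time series data is resampling at a higher or lower frequency. This can be done using the `resample` method, or the much simpler `asfreq` method.
The primary difference between the two is that `resample` is fundamentally a data aggregation, while `asfreq` is fundamentally a data selection.
# In[26]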
bac.plot(alpha=0.5,style='-')
bac.resample('BA').mean().plot(style=':')
bac.asfreq('BA').plot(style='--')
plt.legend(['input','resample','asfreq'],loc='upper left');
At each point, `resample` reports the average of the previous year, while `asfreq` reports the value at the end of the year.
For up-sampling, `resample` and `asfreq` are largely equivalent, though `resample` has many more options available.
Like the `fillna` method, `asfreq` accepts a `method` argument to specify how values are imputed.
# In[27]
fig, ax=plt.subplots(2,sharex=True)
data=bac.iloc[:20]
data.asfreq('D').plot(ax=ax[0],marker='o')
data.asfreq('D',method='bfill').plot(ax=ax[1],style='-o')
data.asfreq('D',method='ffill').plot(ax=ax[1],style='--o')
ax[1].legend(["back-fill","forward-fill"]);
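The difference between aggregation and selection is easier to see on a tiny synthetic series than on a plot. The sketch below is not the stock data: it uses made-up values 0 to 13 on fourteen consecutive days, down-samples them to weekly frequency both ways, and then up-samples a two-point series with each fill method.

```python
import pandas as pd

# Fourteen daily values, Monday 2023-07-24 through Sunday 2023-08-06.
days = pd.date_range("2023-07-24", periods=14, freq="D")
series = pd.Series(range(14), index=days)

# resample aggregates each week; asfreq picks the value on each week-ending Sunday.
weekly_mean = series.resample("W").mean()
weekly_pick = series.asfreq("W")

# Up-sampling a sparse series leaves gaps unless a fill method is given.
sparse = pd.Series([1.0, 2.0], index=pd.to_datetime(["2023-07-28", "2023-07-31"]))
gaps = sparse.asfreq("D")
forward = sparse.asfreq("D", method="ffill")
backward = sparse.asfreq("D", method="bfill")

print(weekly_mean.tolist(), weekly_pick.tolist())
print(gaps.tolist(), forward.tolist(), backward.tolist())
```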
Another common time series operation is shifting data in time. For this, Pandas provides the `shift` method, which can be used to shift data by a given number of entries.
# In[28]
bac=bac.asfreq('D',method='pad')
ROI=100*(bac.shift(-365)-bac)/bac
ROI.plot()
plt.ylabel('% Return on Investment after 1 year');
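The return calculation above can be checked by hand on a three-day toy series. This sketch uses made-up prices and a one-day horizon instead of 365 days. It also shows the `freq` argument of `shift`, which moves the index rather than the values.

```python
import pandas as pd

days = pd.date_range("2023-07-28", periods=3, freq="D")
prices = pd.Series([100.0, 110.0, 121.0], index=days)

# shift(-1) pulls the next day's price back onto each date; the last entry has none.
next_day = prices.shift(-1)
daily_return = 100 * (next_day - prices) / prices

# With freq given, the values stay put and the index moves forward by two days.
moved_index = prices.shift(2, freq="D")

print(daily_return)
print(moved_index)
```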
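Rolling statistics are a third type of time series operation implemented by Pandas. They can be computed via the `rolling` attribute of `Series` and `DataFrame` objects, which returns a view similar to what we saw with the `groupby` operation.
# In[29]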
rolling=bac.rolling(365,center=True)
data=pd.DataFrame({'input':bac,'one-year rolling_mean':rolling.mean(),
'one-year rolling_median':rolling.median()})
ax=data.plot(style=['-','--',':'])
ax.lines[0].set_alpha(0.3)
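A centered window leaves the edges of the result undefined, because a full window does not fit there. The sketch below shows this on five made-up values with a window of 3, and then uses `min_periods=1` as one possible way to keep the edge values by averaging whatever part of the window is available.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Centered window of 3: the first and last positions lack a full window.
centered = values.rolling(3, center=True).mean()

# min_periods=1 computes the statistic from as few as one observation.
padded = values.rolling(3, center=True, min_periods=1).mean()

print(centered.tolist())
print(padded.tolist())
```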