# In[1]
data=['peter','Paul','MARY','gUIDO']
[s.capitalize() for s in data]
# Out[1]
['Peter', 'Paul', 'Mary', 'Guido']
# In[2]
data=['peter','Paul',None,'MARY','gUIDO']
[s if s is None else s.capitalize() for s in data]
# Out[2]
['Peter', 'Paul', None, 'Mary', 'Guido']
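The None check in In[2] is what keeps the comprehension from failing. The sketch below shows the unguarded version raising an error on the missing value. The helper name safe_capitalize is hypothetical and used only for illustration.

```python
data = ['peter', 'Paul', None, 'MARY', 'gUIDO']

# The unguarded comprehension breaks on the missing value,
# because None has no capitalize() method.
try:
    [s.capitalize() for s in data]
    failed = False
except AttributeError:
    failed = True

def safe_capitalize(values):
    """Capitalize each string, passing None through unchanged (hypothetical helper)."""
    return [s if s is None else s.capitalize() for s in values]

result = safe_capitalize(data)
print(failed)
print(result)
```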
The str attribute of Pandas Series and Index objects containing strings provides vectorized string operations that also handle missing values.
# In[3]
import pandas as pd
names=pd.Series(data)
names.str.capitalize()
# Out[3]
0 Peter
1 Paul
2 None
3 Mary
4 Guido
dtype: object
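The same accessor is available on an Index. The following is a minimal sketch, assuming the same sample names as above; the variable names idx, capitalized, and result are illustrative only.

```python
import pandas as pd

# The str accessor works on an Index as well as on a Series.
idx = pd.Index(['peter', 'Paul', 'MARY', 'gUIDO'])
capitalized = idx.str.capitalize()
print(list(capitalized))

# On a Series with a missing entry, the entry stays missing
# instead of raising an error.
names = pd.Series(['peter', 'Paul', None, 'MARY', 'gUIDO'])
result = names.str.capitalize()
print(result.isna().tolist())
```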
The str methods mirror Python's built-in string methods.
# In[4]
monte=pd.Series(['Graham Chapman','John Cleese','Terry Gilliam',
'Eric Idle','Terry Jones','Michael Palin'])
# In[5]
monte.str.lower()
# Out[5]
0 graham chapman
1 john cleese
2 terry gilliam
3 eric idle
4 terry jones
5 michael palin
dtype: object
# In[6]
monte.str.len()
# Out[6]
0 14
1 11
2 13
3 9
4 11
5 13
dtype: int64
# In[7]
monte.str.startswith('T')
# Out[7]
0 False
1 False
2 True
3 False
4 True
5 False
dtype: bool
# In[8]
monte.str.split()
# Out[8]
0 [Graham, Chapman]
1 [John, Cleese]
2 [Terry, Gilliam]
3 [Eric, Idle]
4 [Terry, Jones]
5 [Michael, Palin]
dtype: object
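Boolean results such as those from startswith can be used directly as a filter. When the data contains missing values, the na argument controls what the mask holds at those positions. This is a small sketch under that assumption, reusing the names data with a None entry.

```python
import pandas as pd

names = pd.Series(['peter', 'Paul', None, 'MARY', 'gUIDO'])

# na=False makes the missing entry count as "no match",
# so the result is a clean boolean mask.
mask = names.str.startswith('P', na=False)
selected = names[mask]

print(mask.tolist())
print(selected.tolist())
```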
Several str methods also accept regular expressions and follow conventions of Python's built-in re module.
Mapping between Pandas methods and functions in Python's re module
Method | Description |
---|---|
match | Calls re.match on each element, returning a Boolean |
extract | Calls re.search on each element, returning the groups of the first match as strings |
findall | Calls re.findall on each element |
replace | Replaces occurrences of pattern with some other string |
contains | Calls re.search on each element, returning a Boolean |
count | Counts occurrences of pattern |
split | Equivalent to str.split, but accepts regexps |
rsplit | Equivalent to str.rsplit, but accepts regexps |
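A few of the methods in the table can be sketched side by side. The patterns below are illustrative choices, not taken from the original notes.

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

# match anchors at the start of each string; contains searches anywhere.
starts_with_terry = monte.str.match(r'Terry')
has_double_l = monte.str.contains(r'll')

# count returns the number of non-overlapping pattern matches per element.
vowel_counts = monte.str.count(r'[aeiou]')

# replace treats the pattern as a regular expression when regex=True.
initials = monte.str.replace(r'^(\w)\w*', r'\1.', regex=True)

print(starts_with_terry.tolist())
print(has_double_l.tolist())
print(vowel_counts.tolist())
print(initials.tolist())
```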
# In[9]
monte.str.extract('([A-Za-z]+)',expand=False)
# Out[9]
0 Graham
1 John
2 Terry
3 Eric
4 Terry
5 Michael
dtype: object
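With more than one capture group, extract returns a DataFrame with one column per group. The sketch below assumes named groups (first and last are illustrative labels) so that the columns get readable names.

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

# Two capture groups yield a DataFrame with one column per group;
# named groups become the column labels.
parts = monte.str.extract(r'(?P<first>[A-Za-z]+) (?P<last>[A-Za-z]+)')
print(parts)
```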
# In[10]
monte.str.findall(r'^[^AEIOU].*[^aeiou]$')
# Out[10]
0 [Graham Chapman]
1 []
2 [Terry Gilliam]
3 []
4 [Terry Jones]
5 [Michael Palin]
dtype: object
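The pattern in In[10] keeps names that start with anything other than an uppercase vowel and end with anything other than a lowercase vowel. As a sketch, the same pattern can be passed to match to build a boolean mask and select those rows directly; the variable names are illustrative.

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

# ^[^AEIOU] : first character is not an uppercase vowel
# [^aeiou]$ : last character is not a lowercase vowel
pattern = r'^[^AEIOU].*[^aeiou]$'

mask = monte.str.match(pattern)
selected = monte[mask]
print(selected.tolist())
```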
To learn more about Pandas string methods and regular expressions, see the following references:
1. About string methods
2. About regular expressions
Other Pandas string methods
Method | Description |
---|---|
get | Indexes each element |
slice | Slices each element |
slice_replace | Replaces slice in each element with the passed value |
cat | Concatenates strings |
repeat | Repeats values |
normalize | Returns Unicode form of strings |
pad | Adds whitespace to left, right, or both sides of strings |
wrap | Splits long strings into lines with length less than a given width |
join | Joins strings in each element of the Series with the passed separator |
get_dummies | Extracts dummy variables as a DataFrame |
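Several of the methods in this table can be tried on a shortened version of the series. This is a minimal sketch; the widths, fill characters, and separators are arbitrary choices made for illustration.

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam'])

# pad: left-pad each string to a width of 16 with '*'.
padded = monte.str.pad(16, side='left', fillchar='*')

# slice_replace: keep the first character and replace the rest with '.'.
masked = monte.str.slice_replace(1, None, '.')

# repeat: repeat each string twice.
doubled = pd.Series(['ab', 'cd']).str.repeat(2)

# join: join the list in each element with the given separator.
joined = monte.str.split().str.join('_')

# cat: with no other arguments, concatenate all elements into one string.
combined = monte.str.cat(sep=', ')

print(padded.tolist())
print(masked.tolist())
print(doubled.tolist())
print(joined.tolist())
print(combined)
```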
The get and slice operations, in particular, enable vectorized element access from each array. For example, str.slice(0,3) returns the first three characters of each string, and df.str.slice(0,3) is equivalent to df.str[0:3].
# In[11]
monte.str[0:3]
# Out[11]
0 Gra
1 Joh
2 Ter
3 Eri
4 Ter
5 Mic
dtype: object
Indexing via df.str.get(i) and df.str[i] works the same way.
These indexing methods also let you access elements of the lists returned by split. For example, to extract the last name of each entry:
# In[12]
monte.str.split().str[-1]
# Out[12]
0 Chapman
1 Cleese
2 Gilliam
3 Idle
4 Jones
5 Palin
dtype: object
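The equivalence of get and bracket indexing can be checked directly. The sketch below also shows what happens when a position is out of range for some elements; the short series is made up for illustration.

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])
words = monte.str.split()

# get(i) and [i] give the same element-wise result.
same = words.str.get(0).equals(words.str[0])

# A position that is out of range for an element yields a missing value
# for that element rather than raising an IndexError.
short = pd.Series(['ab', 'abcd'])
third = short.str.get(2)

print(same)
print(third.isna().tolist())
```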
The get_dummies method is useful when your data has a column containing some sort of coded indicator.
# In[13]
full_monte=pd.DataFrame({'name':monte,
'info':['B | C | D','B | D','A | C','B | D','B | C','B | C | D']})
full_monte
# Out[13]
name info
0 Graham Chapman B | C | D
1 John Cleese B | D
2 Terry Gilliam A | C
3 Eric Idle B | D
4 Terry Jones B | C
5 Michael Palin B | C | D
The get_dummies routine lets us split out these indicator variables into a DataFrame.
# In[14]
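# Remove the spaces around '|' first so the column labels come out as A, B, C, D.
full_monte['info'].str.replace(' ','',regex=False).str.get_dummies('|')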
# Out[14]
A B C D
0 0 1 1 1
1 0 1 0 1
2 1 0 1 0
3 0 1 0 1
4 0 1 1 0
5 0 1 1 1
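Once the indicators are split out, they can be attached back to the original frame or summarized. The following sketch assumes a shortened, space-free variant of the info column; the variable names are illustrative.

```python
import pandas as pd

full_monte = pd.DataFrame({
    'name': ['Graham Chapman', 'John Cleese', 'Terry Gilliam'],
    'info': ['B|C|D', 'B|D', 'A|C'],
})

dummies = full_monte['info'].str.get_dummies('|')

# Attach the indicator columns to the original frame by index.
combined = full_monte.join(dummies)

# Count how many entries carry each code.
code_counts = dummies.sum()

print(combined)
print(code_counts)
```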