Python
#!/usr/bin/env python3 import pandas as pd data = [['Alex', 10], ['Ronald', 18], ['Jane', 33]] df = pd.DataFrame(data, columns=['Name', 'Age']) print(df)
df = pd.DataFrame(data, columns=['Name', 'Age'])
$ python simple.py
Name Age
0 Alex 10
1 Ronald 18
2 Jane 33
We can update the index so that it does not start from 0.
change_index.py
#!/usr/bin/env python3
import pandas as pd
data = [['Alex', 10], ['Ronald', 18], ['Jane', 33]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
df.index = df.index + 1
print(df)
In the example, we add 1 to the index.
$ python change_index.py
Name Age
1 Alex 10
2 Ronald 18
3 Jane 33
series_numpy.py
#!/usr/bin/env python3
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
print(s)
The example creates a column of letters from an ndarray.
$ python series_numpy.py
0 a
1 b
2 c
3 d
dtype: object
A series can be created from a Python dictionary.
series_dict.py
#!/usr/bin/env python3
import pandas as pd
import numpy as np
data = {'coins' : 22, 'pens' : 3, 'books' : 28}
s = pd.Series(data)
print(s)
The example creates a series object from a dicionary of items.
$ python series_dict.py
coins 22
pens 3
books 28
dtype: int64
The index column does not have to be numerical. We can create our own custom index.
custom_index.py
#!/usr/bin/env python3
import pandas as pd
data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
"capital": ["Brasilia", "Moscow", "New Dehli", "Beijing", "Pretoria"],
"area": [8.516, 17.10, 3.286, 9.597, 1.221],
"population": [200.4, 143.5, 1252, 1357, 52.98]}
frame = pd.DataFrame(data)
print(frame)
print('------------------------------')
frame.index = ["BR", "RU", "IN", "CH", "SA"]
print(frame)
In the example, we create a data frame from a data dictionary. We print the data frame and then we change the index column with index property.
$ python custom_index.py
country capital area population
0 Brazil Brasilia 8.516 200.40
1 Russia Moscow 17.100 143.50
2 India New Dehli 3.286 1252.00
3 China Beijing 9.597 1357.00
4 South Africa Pretoria 1.221 52.98
------------------------------
country capital area population
BR Brazil Brasilia 8.516 200.40
RU Russia Moscow 17.100 143.50
IN India New Dehli 3.286 1252.00
CH China Beijing 9.597 1357.00
SA South Africa Pretoria 1.221 52.98
Pandas DataFrame has three basic parts: index, columns, and values.
index_vals_cols.py
#!/usr/bin/env python3
import pandas as pd
data = [['Alex', 10], ['Ronald', 18], ['Jane', 33]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(f'Index: {df.index}')
print(f'Columns: {df.columns}')
print(f'Values: {df.values}')
The example prints the index, columns, and values of a data frame.
$ python index_vals_cols.py
Index: RangeIndex(start=0, stop=3, step=1)
Columns: Index(['Name', 'Age'], dtype='object')
Values: [['Alex' 10]
['Ronald' 18]
['Jane' 33]]
import pandas as pd
df = pd.read_excel('readfile.xlsx', index_col=0)
print(df)
ImportError: Missing optional dependency ‘xlrd’. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.
python3 -m pip install xlrd
As an example, if our stock_prices
had a second sheet called Sheet2
, you would import that sheet to a pandas DataFrame like this:
new_data_frame.to_excel('stock_prices.xlsx', sheet_name='Sheet2')
How To Export .xlsx
Files Using Pandas
Exporting Excel files is very similar to importing Excel files, except we use to_excel
instead of read_excel
. An example is below using our randomly-generated df
DataFrame:
df.to_excel('my_new_excel_file.xlsx')
We are trying to filter the data of an excel sheet and save the filtered data as a new Excel file.
Excel Sheet used:
In this excel sheet we are having three categories in Species column-
- Setosa
- Versicolor
- Virginica
Now our aim is to filter these data by species category and to save this filtered data in different sheets with filename =species.subcategory name i.e. after the execution of the code we will going to get three files of following names-
- Setosa.xlsx
- Versicolor.xlsx
- Virginica.xlsx
# Python code to filter and save the
# data with different file names
import pandas
data = pandas.read_excel("datasets.xlsx")
speciesdata = data["Species"].unique()
for i in speciesdata:
a = data[data["Species"].str.contains(i)]
a.to_excel(i+".xlsx")
Output:
Explanation:
- First, we have imported the Pandas library.
- Then we have loaded the data.xlsx excel file in the data object.
- To fetch the unique values from that species column we have used
unique()
function. To check the unique values in the Species column we have called the unique() in speciesdata object. - Then we will going to iterate the speciesdata object as we will going to store the Species column unique values(i.e. Setosa, Versicolor, Virginica) one by one.
- In object “a” we are filtering out the data that matches the
Species.speciesdata
i.e. in each iteration object a will going to store three different types of data i.e. data of Setosa type then data of Versicolor type and at last the data of Virginica type. - Now to save the filtered data one by one in excel file we have used to_excel function, where, the file will going to be saved by the speciesdata name.