Close

July 28, 2020

Python

#!/usr/bin/env python3

import pandas as pd

data = [['Alex', 10], ['Ronald', 18], ['Jane', 33]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

print(df)
df = pd.DataFrame(data, columns=['Name', 'Age'])
$ python simple.py
    Name  Age
0    Alex   10
1  Ronald   18
2    Jane   33
 We can update the index so that it does not start from 0.
change_index.py

#!/usr/bin/env python3

import pandas as pd

data = [['Alex', 10], ['Ronald', 18], ['Jane', 33]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
df.index = df.index + 1

print(df)
 In the example, we add 1 to the index.

$ python change_index.py
    Name  Age
1    Alex   10
2  Ronald   18
3    Jane   33
series_numpy.py

#!/usr/bin/env python3

import pandas as pd
import numpy as np

data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)

print(s)
 The example creates a column of letters from an ndarray.

$ python series_numpy.py
0    a
1    b
2    c
3    d
dtype: object
 A series can be created from a Python dictionary.
series_dict.py

#!/usr/bin/env python3

import pandas as pd
import numpy as np

data = {'coins' : 22, 'pens' : 3, 'books' : 28}
s = pd.Series(data)

print(s)
 The example creates a series object from a dicionary of items.

$ python series_dict.py
coins    22
pens      3
books    28
dtype: int64
 The index column does not have to be numerical. We can create our own custom index.
custom_index.py

#!/usr/bin/env python3

import pandas as pd

data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Dehli", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252, 1357, 52.98]}

frame = pd.DataFrame(data)
print(frame)

print('------------------------------')

frame.index = ["BR", "RU", "IN", "CH", "SA"]
print(frame)
 In the example, we create a data frame from a data dictionary. We print the data frame and then we change the index column with index property.

$ python custom_index.py
        country    capital    area  population
0        Brazil   Brasilia   8.516      200.40
1        Russia     Moscow  17.100      143.50
2         India  New Dehli   3.286     1252.00
3         China    Beijing   9.597     1357.00
4  South Africa   Pretoria   1.221       52.98
------------------------------
         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Dehli   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
 Pandas DataFrame has three basic parts: index, columns, and values.
index_vals_cols.py

#!/usr/bin/env python3

import pandas as pd

data = [['Alex', 10], ['Ronald', 18], ['Jane', 33]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

print(f'Index: {df.index}')
print(f'Columns: {df.columns}')
print(f'Values: {df.values}')
 The example prints the index, columns, and values of a data frame.

$ python index_vals_cols.py
Index: RangeIndex(start=0, stop=3, step=1)
Columns: Index(['Name', 'Age'], dtype='object')
Values: [['Alex' 10]
    ['Ronald' 18]
    ['Jane' 33]]
import pandas as pd

df = pd.read_excel('readfile.xlsx', index_col=0)
print(df)

ImportError: Missing optional dependency ‘xlrd’. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.

python3 -m pip install xlrd

As an example, if our stock_prices had a second sheet called Sheet2, you would import that sheet to a pandas DataFrame like this:


new_data_frame.to_excel('stock_prices.xlsx', sheet_name='Sheet2')

How To Export .xlsx Files Using Pandas

Exporting Excel files is very similar to importing Excel files, except we use to_excel instead of read_excel. An example is below using our randomly-generated df DataFrame:


df.to_excel('my_new_excel_file.xlsx')

We are trying to filter the data of an excel sheet and save the filtered data as a new Excel file.

Excel Sheet used:

In this excel sheet we are having three categories in Species column-

  1. Setosa
  2. Versicolor
  3. Virginica

Now our aim is to filter these data by species category and to save this filtered data in different sheets with filename =species.subcategory name i.e. after the execution of the code we will going to get three files of following names-

  1. Setosa.xlsx
  2. Versicolor.xlsx
  3. Virginica.xlsx

# Python code to filter and save the  
# data with different file names 
import pandas 
  
  
data = pandas.read_excel("datasets.xlsx") 
  
speciesdata = data["Species"].unique() 
  
for i in speciesdata: 
    a = data[data["Species"].str.contains(i)] 
    a.to_excel(i+".xlsx") 

Output:

Explanation:

  • First, we have imported the Pandas library.
  • Then we have loaded the data.xlsx excel file in the data object.
  • To fetch the unique values from that species column we have used unique() function. To check the unique values in the Species column we have called the unique() in speciesdata object.
  • Then we will going to iterate the speciesdata object as we will going to store the Species column unique values(i.e. Setosa, Versicolor, Virginica) one by one.
  • In object “a” we are filtering out the data that matches the Species.speciesdata i.e. in each iteration object a will going to store three different types of data i.e. data of Setosa type then data of Versicolor type and at last the data of Virginica type.
  • Now to save the filtered data one by one in excel file we have used to_excel function, where, the file will going to be saved by the speciesdata name.