How to merge data in Python using Pandas merge

Use the popular Pandas library for data manipulation and analysis to read data from two files and join them into a single dataset.

1 2 Page 2
Page 2 of 2
martinheller@Martins-Retina-MacBook rmerges3ways % python3  
Python 3.7.2 (v3.7.2:9a3ffc0492, Dec 24 2018, 02:44:43)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> mydf = pd.read_csv("673598238_T_ONTIME_REPORTING.csv")
>>> mydf.head(10)
      FL_DATE OP_UNIQUE_CARRIER ORIGIN DEST  DEP_DELAY_NEW  Unnamed: 5
0  2019-08-01                DL    ATL  DFW           31.0         NaN
1  2019-08-01                DL    DFW  ATL            0.0         NaN
2  2019-08-01                DL    IAH  ATL           40.0         NaN
3  2019-08-01                DL    PDX  SLC            0.0         NaN
4  2019-08-01                DL    SLC  PDX            0.0         NaN
5  2019-08-01                DL    DTW  ATL           10.0         NaN
6  2019-08-01                DL    ATL  DTW            0.0         NaN
7  2019-08-01                DL    MSP  JFK           22.0         NaN
8  2019-08-01                DL    JFK  MSP            0.0         NaN
9  2019-08-01                DL    BHM  ATL            0.0         NaN
>>> mylookup = pd.read_csv("L_UNIQUE_CARRIERS.csv_")
>>> mylookup.head(10)
  Code                                        Description
0  02Q                                      Titan Airways
1  04Q                                 Tradewind Aviation
2  05Q                                Comlux Aviation, AG
3  06Q                      Master Top Linhas Aereas Ltd.
4  07Q                                Flair Airlines Ltd.
5  09Q  Swift Air, LLC d/b/a Eastern Air Lines d/b/a E...
6  0BQ                                                DCA
7  0CQ                               ACM AIR CHARTER GmbH
8  0FQ               Maine Aviation Aircraft Charter, LLC
9  0GQ       Inter Island Airways, d/b/a Inter Island Air
>>> joined_df = pd.merge(mydf, mylookup, left_on= "OP_UNIQUE_CARRIER", right_on="Code")
>>> joined_df.head(10)
      FL_DATE OP_UNIQUE_CARRIER ORIGIN  ... Unnamed: 5  Code           Description
0  2019-08-01                DL    ATL  ...        NaN    DL  Delta Air Lines Inc.
1  2019-08-01                DL    DFW  ...        NaN    DL  Delta Air Lines Inc.
2  2019-08-01                DL    IAH  ...        NaN    DL  Delta Air Lines Inc.
3  2019-08-01                DL    PDX  ...        NaN    DL  Delta Air Lines Inc.
4  2019-08-01                DL    SLC  ...        NaN    DL  Delta Air Lines Inc.
5  2019-08-01                DL    DTW  ...        NaN    DL  Delta Air Lines Inc.
6  2019-08-01                DL    ATL  ...        NaN    DL  Delta Air Lines Inc.
7  2019-08-01                DL    MSP  ...        NaN    DL  Delta Air Lines Inc.
8  2019-08-01                DL    JFK  ...        NaN    DL  Delta Air Lines Inc.
9  2019-08-01                DL    BHM  ...        NaN    DL  Delta Air Lines Inc.

[10 rows x 8 columns]

And there we have it. Or do we? If you run the code as a file from the command line, instead of line by line in the interactive Python console, there won’t be any output. We need to add some print() function calls:

import pandas as pd

mydf = pd.read_csv("673598238_T_ONTIME_REPORTING.csv")
mylookup = pd.read_csv("L_UNIQUE_CARRIERS.csv_")

print( mydf.head(10) )
print( mylookup.head(10) )

joined_df = pd.merge(mydf, mylookup, left_on= "OP_UNIQUE_CARRIER", right_on="Code")

print( joined_df.head(10) )

Now python3 pymerge.py will emit the output we want.

Copyright © 2021 IDG Communications, Inc.

1 2 Page 2
Page 2 of 2
How to choose a low-code development platform