
JSON normalize fails for nested JSON array

rajeshps
Participant
0 Kudos

Hello Team,

For the nested JSON below, the array is not getting normalized using:

import pandas as pd
import json

def on_input(data):
    # Both attempts; neither normalizes the nested arrays:
    df = pd.read_json(data)
    df = pd.json_normalize(json.loads(data))

    api.send("output", df.to_json(orient="records"))

api.set_port_callback("input1", on_input)

I also tried `max_level` and then `record_path`, but no luck.

Flow: Kafka producer(json string) -> avro decoder -> Python3 -> Hana Client

Is there any way to normalize the JSON, apply an IF/THEN/ELSE condition, add a suffix to the column fields, and eventually update the DB with no duplicates? Here header.poNumber and data.id are primary keys/unique identifiers.

Example:

 [
   {
      "header.poNumber":"9023496",
      "data.id":"10013459",
      "message.source":[
         {
            "createSource":null,
            "timeStamp":"2023-05-12T19:30:00.0000000+02:00",
            "type":"full"
         },
         {
            "createSource":"testdev",
            "timeStamp":"2023-05-11T19:30:00.0000000+02:00",
            "type":"ordersEstimated"
         },
         {
            "createSource": "event",
            "timeStamp":"2023-05-12T12:30:00.0000000+01:00",
            "type":"ordersCreated"
         }
      ],
      "message.time":[
         {
            "timeSource":"UTC",
            "typeId":"full"
         },
         {
            "timeSource":"IST",
            "typeId":"actual"
         }
      ]
   }
]

Expected output:

rajeshps
Participant
0 Kudos

Please check if this looks good, vitaliy.rudnytskiy

import pandas as pd
import json

def on_input(data):
    data_as_json = json.loads(data)

    data_as_json_filtered = list()
    for record in data_as_json:
        record_filtered = dict()
        for key in list(record.keys()):
            if key == 'message.time':
                record_filtered.update({key: item for item in [time for time in record[key]]
                                        if item['typeId'] == 'actual'})
            elif key == 'message.source':
                record_filtered.update({key: item for item in [source for source in record[key]]
                                        if item['type'] == 'actual'})
            else:
                record_filtered.update({key: record[key]})
        data_as_json_filtered.append(record_filtered)
    display(data_as_json_filtered)

    df_source = pd.json_normalize(data_as_json_filtered)

    api.send("output", df_source.to_json(orient="records"))

api.set_port_callback("input1", on_input)

Thank you very much!

rajeshps
Participant
0 Kudos
import pandas as pd
import json

def on_input(data):
    data_as_json = json.loads(data)
    data_as_json_filtered=list()
    for record in data_as_json:
        record_filtered = dict()
        for key in list(record.keys()):
            if key=='message.time':
                record_filtered.update({key: item for item in [time for time in record[key]] 
                        if item['typeId']=='actual'})
            elif key=='message.source':
                record_filtered.update({key: item for item in [source for source in record[key]] 
                          if item['type']=='actual'})
            else:
                record_filtered.update({key: record[key]})
        data_as_json_filtered.append(record_filtered)
    df_source=pd.json_normalize(data_as_json_filtered)
    api.send("output", df_source.to_json(orient="records"))
api.set_port_callback("input1", on_input)

vitaliy.rudnytskiy

It is failing at `for key in list(record.keys())` with the below error:


Group messages:

Group: default; Messages: Graph failure: operator.com.sap.system.python3Operator:python3operator1: Error while executing callback registered on port(s) ['input1']: 'str' object has no attribute 'keys' [line 10]

Process(es) terminated with error(s). restartOnFailure==false


The input is a JSON string and the output is a basic string in the Python3 operator.

I'd appreciate your valuable inputs and support. Thank you, vitaliy.rudnytskiy
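For reference, this error usually means the loop variable is a string rather than a dict. With a payload shaped like the one later in this thread, the two most common causes are iterating a top-level dict (which yields its keys as strings) and a doubly-encoded payload that is still a string after one `json.loads`. A minimal sketch (the payload value here is a hypothetical fragment, not the real operator input):

```python
import json

# Hypothetical fragment standing in for the real operator input.
payload = '{"header": {"poNumber": "9023496"}}'

# Cause 1: iterating a dict yields its KEYS, which are strings.
data_as_json = json.loads(payload)      # a dict, not a list of records
for record in data_as_json:             # record == "header"
    assert isinstance(record, str)      # record.keys() would raise AttributeError

# Cause 2: a doubly-encoded payload is still a string after one json.loads.
double_encoded = json.dumps(payload)
still_a_string = json.loads(double_encoded)
assert isinstance(still_a_string, str)  # needs a second json.loads

# Defensive pattern: decode again if needed, wrap a lone dict in a list.
if isinstance(data_as_json, str):
    data_as_json = json.loads(data_as_json)
records = data_as_json if isinstance(data_as_json, list) else [data_as_json]
for record in records:
    record.keys()                       # now always a dict
```

The defensive decode-and-wrap at the end makes the same record loop work whether the port delivers a list, a single dict, or a doubly-encoded string.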


Vitaliy-R
Developer Advocate

Hi Rajesh,

Do you have an exact value of the `data` that is sent to the input of the `on_input(data)`?

It might be different from the one you shared in the example.

Regards,
-Witalij

rajeshps
Participant
0 Kudos

vitaliy.rudnytskiy

Below is the data sent to the Python3 operator input1 as a JSON string. This needs filtering and then updating to a database table with no duplicate columns.

poNumber and id are primary keys/unique identifiers for each row. Thank you!

{
   "header":{
      "poNumber":"9023496"
   },
   "data":{
      "id":"10013459"
   },
   "message":{
      "source":[
         {
            "createSource":null,
            "timeStamp":"2023-05-12T19:30:00.0000000+02:00",
            "type":"full"
         },
         {
            "createSource":"testdev",
            "timeStamp":"2023-05-11T19:30:00.0000000+02:00",
            "type":"ordersEstimated"
         },
         {
            "createSource":"event",
            "timeStamp":"2023-05-12T12:30:00.0000000+01:00",
            "type":"ordersCreated"
         }
      ],
      "time":[
         {
            "timeSource":"UTC",
            "typeId":"full"
         },
         {
            "timeSource":"IST",
            "typeId":"actual"
         }
      ]
   }
}
Vitaliy-R
Developer Advocate

These two JSON documents have different structures:

1/ The first one was an array starting with `[`, while the second one is not, as it is a dictionary starting with `{`.

2/ The first one had `"message"` keys already flattened, while the second has them nested.

So, following the same idea from my previous code snippet, you need to rewrite it to properly process the structure you are getting on the input.
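As a sketch of that idea (not the exact code from this thread): `pd.json_normalize` flattens nested dicts into dotted column names on its own, and wrapping a single dict in a list lets the same record loop handle both payload shapes:

```python
import json
import pandas as pd

# Trimmed sample in the second (nested dict) shape.
payload = '''{
  "header": {"poNumber": "9023496"},
  "data": {"id": "10013459"}
}'''

data_as_json = json.loads(payload)

# Wrap a lone dict so the same loop handles both payload shapes.
records = data_as_json if isinstance(data_as_json, list) else [data_as_json]

# json_normalize flattens nested dicts into dotted column names by itself.
df = pd.json_normalize(records)
print(df.columns.tolist())
```

This covers the dict-vs-array difference; the nested `message` arrays still need the explicit filtering shown below.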

Regards,
-Witalij

rajeshps
Participant
0 Kudos

vitaliy.rudnytskiy

Could you please help me with the snippet? I tried it, and it fails at `for key in list(record.keys())` with the below error:

Group: default; Messages: Graph failure: operator.com.sap.system.python3Operator:python3operator1: Error while executing callback registered on port(s) ['input1']: 'str' object has no attribute 'keys' [line 10]

Thank you very much in advance!

Vitaliy-R
Developer Advocate
0 Kudos

As I wrote above: you do not have an array of records in the latter JSON example, compared to the JSON in the original question. Therefore you do not loop through records.

Here is a sample snippet (but you need to review it and, first of all, verify the exact schema of the JSON payload coming to your operator, to make sure it is not failing):

record_filtered = dict()
record = data_as_json
for key1 in list(record.keys()):
    if key1 == 'message':
        msg_filtered = dict()
        for key2 in record[key1].keys():
            if key2 == 'time':
                msg_filtered.update({key2: item
                                     for item in [time for time in record[key1][key2]]
                                     if item['typeId'] == 'actual'})
            elif key2 == 'source':
                msg_filtered.update({key2: item
                                     for item in [source for source in record[key1][key2]]
                                     if item['type'] == 'actual'})
            else:
                pass
        record_filtered.update({key1: msg_filtered})
    else:
        record_filtered.update({key1: record[key1]})
data_as_json_filtered = record_filtered
display(data_as_json_filtered)
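A self-contained run of this sketch against the second (nested dict) payload shape, trimmed to the fields above. Note that no `source` entry in the sample has type 'actual', so `message.source` is filtered away entirely, and `json_normalize` then flattens the surviving nested dicts into dotted columns:

```python
import json
import pandas as pd

data = '''{
  "header": {"poNumber": "9023496"},
  "data": {"id": "10013459"},
  "message": {
    "source": [
      {"createSource": "testdev", "timeStamp": "2023-05-11T19:30:00.0000000+02:00", "type": "ordersEstimated"},
      {"createSource": "event", "timeStamp": "2023-05-12T12:30:00.0000000+01:00", "type": "ordersCreated"}
    ],
    "time": [
      {"timeSource": "UTC", "typeId": "full"},
      {"timeSource": "IST", "typeId": "actual"}
    ]
  }
}'''

record = json.loads(data)
record_filtered = dict()
for key1 in record:
    if key1 == 'message':
        msg_filtered = dict()
        for key2 in record[key1]:
            if key2 == 'time':
                # Keep only the 'actual' entry; an empty match drops the key.
                msg_filtered.update({key2: item for item in record[key1][key2]
                                     if item['typeId'] == 'actual'})
            elif key2 == 'source':
                msg_filtered.update({key2: item for item in record[key1][key2]
                                     if item['type'] == 'actual'})
        record_filtered.update({key1: msg_filtered})
    else:
        record_filtered.update({key1: record[key1]})

df = pd.json_normalize(record_filtered)
print(df.columns.tolist())
```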

Regards,
-Witalij

Accepted Solutions (1)


Vitaliy-R
Developer Advocate
0 Kudos

If you need to keep 'actual' only, then one of the approaches might be filtering the JSON payload to keep only `actual` parts.

Then it is simple enough to flatten it in Pandas DataFrame.

Something like...

data_as_json = json.loads(data)
data_as_json_filtered = list()
for record in data_as_json:
    record_filtered = dict()
    for key in list(record.keys()):
        if key == 'message.time':
            record_filtered.update({key: item
                                    for item in [time for time in record[key]]
                                    if item['typeId'] == 'actual'})
        elif key == 'message.source':
            record_filtered.update({key: item
                                    for item in [source for source in record[key]]
                                    if item['type'] == 'actual'})
        else:
            record_filtered.update({key: record[key]})
    data_as_json_filtered.append(record_filtered)
display(data_as_json_filtered)
df_source = pd.json_normalize(data_as_json_filtered)

...gives me
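A self-contained run of that snippet against the sample array from the original question (with `display` replaced by `print`). Since no `message.source` entry in the sample has type 'actual', that key is dropped entirely, and only the 'actual' `message.time` entry survives:

```python
import json
import pandas as pd

data = '''[
  {
    "header.poNumber": "9023496",
    "data.id": "10013459",
    "message.source": [
      {"createSource": null, "timeStamp": "2023-05-12T19:30:00.0000000+02:00", "type": "full"},
      {"createSource": "testdev", "timeStamp": "2023-05-11T19:30:00.0000000+02:00", "type": "ordersEstimated"},
      {"createSource": "event", "timeStamp": "2023-05-12T12:30:00.0000000+01:00", "type": "ordersCreated"}
    ],
    "message.time": [
      {"timeSource": "UTC", "typeId": "full"},
      {"timeSource": "IST", "typeId": "actual"}
    ]
  }
]'''

data_as_json = json.loads(data)
data_as_json_filtered = list()
for record in data_as_json:
    record_filtered = dict()
    for key in list(record.keys()):
        if key == 'message.time':
            # Keep only the 'actual' entry; an empty match drops the key.
            record_filtered.update({key: item for item in record[key]
                                    if item['typeId'] == 'actual'})
        elif key == 'message.source':
            record_filtered.update({key: item for item in record[key]
                                    if item['type'] == 'actual'})
        else:
            record_filtered.update({key: record[key]})
    data_as_json_filtered.append(record_filtered)

df_source = pd.json_normalize(data_as_json_filtered)
print(df_source.columns.tolist())
```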

PS. Please mark the answer as Accepted or Helpful to help the rest of the community. Thank you.

rajeshps
Participant
0 Kudos

Thanks a lot for your kind reply, vitaliy.rudnytskiy. This really helps, and thanks for the valuable inputs : )

As mentioned above, I actually wanted to update like below:

For message.source: IF type = 'ordersEstimated' THEN update message.source.timeStamp1 and message.source.type1;

IF type = 'ordersCreated' THEN update message.source.timeStamp2 and message.source.type2. This is needed because duplicate column names are not allowed in the HANA DB.

vitaliy.rudnytskiy

Is this possible 😕

For message.time IF typeId = 'actual' THEN update message.time.timeStamp and message.time.typeId
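One possible sketch for producing suffixed columns per source type (not from this thread; the suffix mapping ordersEstimated → 1, ordersCreated → 2 is an assumption taken from the description above):

```python
import pandas as pd

# Trimmed sample record in the flattened-key shape from the question.
record = {
    "header.poNumber": "9023496",
    "data.id": "10013459",
    "message.source": [
        {"createSource": "testdev", "timeStamp": "2023-05-11T19:30:00.0000000+02:00", "type": "ordersEstimated"},
        {"createSource": "event", "timeStamp": "2023-05-12T12:30:00.0000000+01:00", "type": "ordersCreated"},
    ],
}

# Assumed suffix per source type, taken from the requirement above.
suffix_by_type = {"ordersEstimated": "1", "ordersCreated": "2"}

flat = {"header.poNumber": record["header.poNumber"], "data.id": record["data.id"]}
for src in record["message.source"]:
    suffix = suffix_by_type.get(src["type"])
    if suffix is None:
        continue  # skip unmapped types (e.g. 'full')
    flat[f"message.source.timeStamp{suffix}"] = src["timeStamp"]
    flat[f"message.source.type{suffix}"] = src["type"]

df = pd.DataFrame([flat])  # one row, unique column names per type
print(df.columns.tolist())
```

Because every source type maps to its own suffixed column pair, the result is a single row with unique column names, which avoids the duplicate-column problem when writing to HANA.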

Vitaliy-R
Developer Advocate
0 Kudos

Can you please accept the answer, if it helped, and mark your question as answered? Thank you. -Witalij

Answers (1)


Vitaliy-R
Developer Advocate
0 Kudos

I do not think you can simply normalize this record because it contains two arrays of dictionaries: `"message.time"` and `"message.source"`.

From the "expected output", I understand that you want to join values from `"message.time"` to values from `"message.source"` on the `type` attribute to create records.

So, I think you need to flatten two arrays into separate Pandas DataFrames, and then merge them on keys. Something like:

data_as_json = json.loads(data)
df_source = pd.json_normalize(data_as_json, record_path='message.source', meta=['header.poNumber', 'data.id'])
df_time = pd.json_normalize(data_as_json, record_path='message.time', meta=['header.poNumber', 'data.id'])
df = df_source.merge(df_time, left_on=['header.poNumber', 'data.id', 'type'], right_on=['header.poNumber', 'data.id', 'typeId'])
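A self-contained run of this approach with the sample array from the original question. Note that in this sample only type = 'full' appears in both arrays, so the inner join keeps a single row:

```python
import json
import pandas as pd

data = '''[
  {
    "header.poNumber": "9023496",
    "data.id": "10013459",
    "message.source": [
      {"createSource": null, "timeStamp": "2023-05-12T19:30:00.0000000+02:00", "type": "full"},
      {"createSource": "testdev", "timeStamp": "2023-05-11T19:30:00.0000000+02:00", "type": "ordersEstimated"},
      {"createSource": "event", "timeStamp": "2023-05-12T12:30:00.0000000+01:00", "type": "ordersCreated"}
    ],
    "message.time": [
      {"timeSource": "UTC", "typeId": "full"},
      {"timeSource": "IST", "typeId": "actual"}
    ]
  }
]'''

data_as_json = json.loads(data)

# Flatten each array into its own DataFrame, carrying the key columns along.
df_source = pd.json_normalize(data_as_json, record_path='message.source',
                              meta=['header.poNumber', 'data.id'])
df_time = pd.json_normalize(data_as_json, record_path='message.time',
                            meta=['header.poNumber', 'data.id'])

# Inner join on the keys plus type == typeId.
df = df_source.merge(df_time,
                     left_on=['header.poNumber', 'data.id', 'type'],
                     right_on=['header.poNumber', 'data.id', 'typeId'])

print(len(df))  # only 'full' exists in both arrays
```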

Here are my tests:

Regards,
-Vitaliy

rajeshps
Participant
0 Kudos

Thanks a lot for your kind reply, vitaliy.rudnytskiy. The output I'm expecting is only one row, as mentioned above.

While normalizing, consider message.source: IF type = 'ordersEstimated' THEN update message.source.timeStamp1 and message.source.type1;

IF type = 'ordersCreated' THEN update message.source.timeStamp2 and message.source.type2.

Similarly for message.time: IF typeId = 'actual' THEN update message.time.timeStamp and message.time.typeId.

header.poNumber and data.id are primary keys, so there are no duplicates when updating to the HANA database. Above is one event (1 record).
Flow looks like Kafka producer -> Avro decoder -> Python3 -> HANA client

vitaliy.rudnytskiy