Python class generation from YAML

YAML: I remember using it extensively back in the day. Somehow it fell out of favour when a new kid arrived on the block: JSON. This story is not about JSON.

I'm using the PyYAML library in this write-up.

# companies.yaml
- !Company
  company_name: PoundCompany
  active: True 
  has_dimensions: True
  dimensions:
    - sillyname1
    - sillyname2
  file_mapping:
    account_number_map: accounts.csv
    account_type_map: accounts.csv
    tax_code_map: taxcodes.csv
  bi_central_uuid: 00000000-0000-0000-0000-000000000001
- !Company
  company_name: EuroCompany
  active: True 
  has_dimensions: True
  dimensions:
    - sillyname1
    - sillyname2
  file_mapping:
    account_number_map: accounts.csv
    account_type_map: accounts.csv
    tax_code_map: taxcodes.csv
  bi_central_uuid: 00000000-0000-0000-0000-000000000002

This YAML describes two instances of the class defined below, one per !Company entry.
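The !Company entries are local YAML tags. PyYAML has no idea what to do with them until a constructor is registered, so a plain yaml.safe_load() refuses the file instead of guessing. Here is a minimal check. It assumes only that PyYAML is installed. try_plain_safe_load is a hypothetical helper, and the document is a trimmed copy of companies.yaml:

```python
import yaml

# Trimmed copy of companies.yaml: one company, two fields
DOC = """
- !Company
  company_name: PoundCompany
  active: True
"""


def try_plain_safe_load(text: str) -> str:
    """Describe what yaml.safe_load() does with an unregistered tag."""
    try:
        yaml.safe_load(text)
    except yaml.constructor.ConstructorError as exc:
        # exc.problem holds the human-readable reason
        return str(exc.problem)
    return "loaded without error"


print(try_plain_safe_load(DOC))
```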

# company.py
from dataclasses import dataclass, field

@dataclass
class CompanyModel:
    company_name: str = field(default="")
    active: bool = False
    has_dimensions: bool = False
    dimensions: list = field(default_factory=list)
    file_mapping: dict = field(default_factory=dict)
    bi_central_uuid: str = field(default="")

And this is my class. It might surprise you that I've gone with the xxxModel naming. To me, this is a model: a collection of attributes that I will, at some point, serialize to JSON before posting it to the Business Central API. This class is basically data.
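The posting step isn't part of this write-up, but the serialization half is short when the model is a dataclass. This is a minimal sketch. It assumes the JSON payload is just the model's fields as they are; the real Business Central payload may well need different field names. to_json is a hypothetical helper:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CompanyModel:
    company_name: str = field(default="")
    active: bool = False
    has_dimensions: bool = False
    dimensions: list = field(default_factory=list)
    file_mapping: dict = field(default_factory=dict)
    bi_central_uuid: str = field(default="")


def to_json(company: CompanyModel) -> str:
    # asdict() recursively copies the fields into plain dicts and lists,
    # which json.dumps() can serialize directly
    return json.dumps(asdict(company))


company = CompanyModel(company_name="PoundCompany", active=True, dimensions=["sillyname1"])
print(to_json(company))
```

No custom JSON encoder is needed here, as long as the fields only hold strings, bools, lists and dicts.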

The task at hand is to go from YAML to (model)Class!

The PyYAML documentation has a few examples of how to do this, none of which I found particularly interesting.

Onwards! Let's start by reading the YAML.

# company_loader.py
import yaml
from protocols import Settings, Company
from settings import SETTINGS
from company import CompanyModel
from beartype import beartype

@beartype
def get_companies(settings: Settings):
    with open(settings.base_dir / "companies.yaml", "rb") as fp:
        return yaml.load(fp, Loader=get_loader())

The interesting part here is get_loader(). Let's define it.

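# company_loader.py
@beartype
def get_loader() -> type[yaml.SafeLoader]:
    # Subclass SafeLoader so add_constructor() doesn't register !Company
    # on PyYAML's shared yaml.SafeLoader class
    class CompanyLoader(yaml.SafeLoader):
        pass

    CompanyLoader.add_constructor("!Company", company_constructor)
    return CompanyLoader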
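One detail about add_constructor() is worth knowing. It registers the tag on the class it is called on, not on a copy. If you call it on yaml.SafeLoader itself, every yaml.safe_load() in the process learns about !Company. Registering on a throwaway subclass keeps the change local. In the sketch below, CompanyLoader and build_company are illustrative names, and build_company is only a stand-in for the real constructor:

```python
import yaml


class CompanyLoader(yaml.SafeLoader):
    """SafeLoader subclass that owns the !Company constructor."""


def build_company(loader: yaml.SafeLoader, node: yaml.MappingNode) -> dict:
    # Stand-in constructor: return the mapping as a plain dict
    return loader.construct_mapping(node, deep=True)


# add_constructor() gives CompanyLoader its own copy of the constructor
# table before adding the tag, so yaml.SafeLoader is left untouched
CompanyLoader.add_constructor("!Company", build_company)

DOC = "!Company {company_name: PoundCompany, active: true}"

print(yaml.load(DOC, Loader=CompanyLoader))
print("!Company" in yaml.SafeLoader.yaml_constructors)
```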

Now we have created the loader and added a constructor to it. The constructor does all the heavy lifting: it builds an instance of our class from each !Company node. PyYAML's loaders come with a few construct_* helpers, such as construct_scalar(), construct_sequence() and construct_mapping(). To be honest, I gave yaml.SafeLoader.construct_mapping() a go and never figured out why it didn't produce lists and dicts. This is far from rocket science, though, and writing a constructor is relatively easy.
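There is a likely explanation for the construct_mapping() puzzle: the method is shallow by default (deep=False). It hands back nested sequences and mappings as empty containers and fills them in a later pass, after your constructor has returned. Anything that inspects or copies them inside the constructor therefore sees them empty. Passing deep=True builds them on the spot. The sketch below records what the constructor sees each way. make_loader and seen are illustrative names:

```python
import yaml

DOC = """
- !Company
  company_name: PoundCompany
  dimensions:
    - sillyname1
    - sillyname2
  file_mapping:
    account_number_map: accounts.csv
"""

# What each constructor saw at the moment it ran, keyed by "shallow"/"deep"
seen: dict[str, dict] = {}


def make_loader(deep: bool) -> type[yaml.SafeLoader]:
    """Build a SafeLoader subclass whose !Company constructor uses `deep`."""

    class _Loader(yaml.SafeLoader):
        pass

    def constructor(loader: yaml.SafeLoader, node: yaml.MappingNode) -> dict:
        data = loader.construct_mapping(node, deep=deep)
        # Snapshot the nested containers as they are right now
        seen["deep" if deep else "shallow"] = {
            "dimensions": list(data["dimensions"]),
            "file_mapping": dict(data["file_mapping"]),
        }
        return data

    _Loader.add_constructor("!Company", constructor)
    return _Loader


shallow_result = yaml.load(DOC, Loader=make_loader(deep=False))
deep_result = yaml.load(DOC, Loader=make_loader(deep=True))

print(seen["shallow"])  # {'dimensions': [], 'file_mapping': {}}
print(seen["deep"])     # {'dimensions': ['sillyname1', 'sillyname2'], 'file_mapping': {'account_number_map': 'accounts.csv'}}
print(shallow_result == deep_result)  # True: the shallow containers get filled in afterwards
```

With deep=True, the whole constructor could shrink to return CompanyModel(**loader.construct_mapping(node, deep=True)). The trade-off is that PyYAML, rather than your own code, then decides how each value is typed.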

This might be a good time to explain why I created this contraption. I needed to move data, which only ever appears in one static format, to the Business Central API. There's no room for interpretation.

Anyway. You'll find code in PyYAML that resembles the function below; the main difference is that I understand what my code does, and it does exactly what I need.

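# company_loader.py
@beartype
def company_constructor(
    loader: yaml.SafeLoader,
    node: yaml.nodes.MappingNode,
) -> Company:
    """Construct a CompanyModel from a !Company mapping node."""
    _node = {}

    def process_node(node: yaml.nodes.MappingNode) -> None:
        # node.value is a list of (key_node, value_node) pairs
        for attr_name, attr_value in node.value:
            if isinstance(attr_name, yaml.MappingNode):
                # Complex (mapping) key: flatten its pairs into _node and move on
                process_node(attr_name)
                continue
            # The resolver's tag names the type, e.g. "tag:yaml.org,2002:bool"
            _cast: str = attr_value.tag.split(":")[-1]
            if _cast in ["bool", "int"]:
                # Let PyYAML build the typed value instead of eval();
                # this also copes with "true", "yes" and "on"
                _node[attr_name.value] = loader.construct_object(attr_value)
            elif _cast == "seq":
                # Sequence of scalars -> list of their string values
                _node[attr_name.value] = [item.value for item in attr_value.value]
            elif _cast == "map":
                # Mapping of scalars -> dict of their string values
                _node[attr_name.value] = {
                    key.value: val.value for key, val in attr_value.value
                }
            else:
                # Everything else is kept as the scalar's raw text
                _node[attr_name.value] = attr_value.value

    process_node(node)
    return CompanyModel(**_node)  # type: ignore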

As you can see, process_node() is recursive and tailored to my needs. Only the top-level bools and ints get converted; the items in my lists and dicts are plain strings, so I leave them as they are. When it's done, I unpack the resulting dict into CompanyModel and return the initialized object.
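The _cast trick works because PyYAML's resolver tags every node before any constructor runs. yaml.compose() stops at that stage and returns the node graph without building Python objects. That makes it easy to see which tag suffix each field of companies.yaml ends up with. This is a small sketch; tag_suffixes is a hypothetical helper:

```python
import yaml

# Trimmed copy of companies.yaml with one value of each kind
DOC = """
- !Company
  company_name: PoundCompany
  active: True
  dimensions:
    - sillyname1
  file_mapping:
    account_number_map: accounts.csv
  bi_central_uuid: 00000000-0000-0000-0000-000000000001
"""


def tag_suffixes(text: str) -> dict[str, str]:
    """Map each field of the first !Company entry to its value's tag suffix."""
    root = yaml.compose(text, Loader=yaml.SafeLoader)  # SequenceNode, no objects built
    company = root.value[0]  # the MappingNode tagged !Company
    # Same split(":")[-1] as in process_node()
    return {key.value: value.tag.split(":")[-1] for key, value in company.value}


print(tag_suffixes(DOC))
# {'company_name': 'str', 'active': 'bool', 'dimensions': 'seq', 'file_mapping': 'map', 'bi_central_uuid': 'str'}
```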

Before wrapping up, I'd like to mention two minor details: I really like protocols (thank you, Arjan), and I also like beartype, a lot.

# protocols.py
from pathlib import Path
from typing import Protocol, runtime_checkable

@runtime_checkable
class Settings(Protocol):
    base_dir: Path

@runtime_checkable
class Company(Protocol):
    company_name: str
    active: bool
    has_dimensions: bool
    dimensions: list
    file_mapping: dict
    bi_central_uuid: str

You'll find the code behind SETTINGS elsewhere on this blog.
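With @runtime_checkable, isinstance() checks a protocol structurally: an object passes if it has the listed attributes, whether or not it inherits from anything. That is why a CompanyModel counts as a Company, and why any object with a base_dir can stand in for SETTINGS in a quick test. One caveat: isinstance() only checks that each attribute exists, not its type. The sketch below uses only the standard library, so beartype isn't needed to see this. The SimpleNamespace stubs are illustrative:

```python
from dataclasses import dataclass, field
from pathlib import Path
from types import SimpleNamespace
from typing import Protocol, runtime_checkable


@runtime_checkable
class Settings(Protocol):
    base_dir: Path


@runtime_checkable
class Company(Protocol):
    company_name: str
    active: bool
    has_dimensions: bool
    dimensions: list
    file_mapping: dict
    bi_central_uuid: str


@dataclass
class CompanyModel:
    company_name: str = field(default="")
    active: bool = False
    has_dimensions: bool = False
    dimensions: list = field(default_factory=list)
    file_mapping: dict = field(default_factory=dict)
    bi_central_uuid: str = field(default="")


# CompanyModel never mentions Company, yet its instances satisfy it
print(isinstance(CompanyModel(), Company))  # True
# Any object with a base_dir attribute can stand in for SETTINGS
print(isinstance(SimpleNamespace(base_dir=Path(".")), Settings))  # True
# Only presence is checked: a str base_dir passes as well
print(isinstance(SimpleNamespace(base_dir="not a Path"), Settings))  # True
# A missing attribute fails
print(isinstance(SimpleNamespace(), Settings))  # False
```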

The main takeaways.

  • YAML is still cool

  • Things don't need to get too complicated if you scope them well

  • Protocols are awesome

  • Arjan is awesome

  • Beartyping is next-level awesome