PyOCF

Code generation

We generate the Python code from the OCF schemas. This is mainly because the OCF schemas are still in beta, and we don't want to keep the code up to date manually, because then it would always be out of date.

There are already tools that can generate code from schemas, so we tried some out.

JSON Schema URL IDs

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "https://opencaptablecoalition.com/[...]/VestingTermsFile.schema.json",
  "title": "File - Vesting Terms",
  [...]
  "properties": {
    "items": {
      "type": "array",
      "description": "List of OCF vesting terms objects",
      "items": {
        "$ref": "https://opencaptablecoalition.com/[...]/VestingTerms.schema.json"
      }
   [...]
}

There was one big problem with many of them, though:

The JSON Schema files have URLs as ids, and several generators look at a reference and, if it's a URL, try to fetch the schema from that URL. That might work if the schema was released and published. But OCF is in beta and isn't completely published, so that fails. Instead, the code generator needs to load all the schema files first, and only then try to resolve the references.
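
The load-first approach can be sketched roughly as follows. This is a minimal illustration, not PyOCF's actual generator: the function names load_schemas and resolve_ref are made up for this example, and it assumes that every schema file has a "$id" and that each "$ref" is a full URL equal to one of those ids.

```python
import json
from pathlib import Path


def load_schemas(schema_dir):
    """Read every *.schema.json file under schema_dir, keyed by its "$id"."""
    registry = {}
    for path in sorted(Path(schema_dir).rglob("*.schema.json")):
        schema = json.loads(path.read_text(encoding="utf-8"))
        registry[schema["$id"]] = schema
    return registry


def resolve_ref(registry, ref):
    """Look a "$ref" URL up in the local registry instead of fetching it."""
    try:
        return registry[ref]
    except KeyError:
        raise LookupError(f"No local schema with $id {ref!r}") from None
```

Because every file is in the registry before any reference is followed, it doesn't matter whether the URLs actually serve anything.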

Existing tools

I looked at a few:

  • json-schema-codegen and statham-schema, where I couldn't work out how to create objects from more than one schema file at a time.
  • Warlock, which creates objects, not code, which makes me worried it's going to be hard to debug.
  • yacg, which the author weirdly has only published as a Docker container, so that went nowhere. The code is also quite complex, because it tries to be fully generic, supporting any type of schema and any type of output code, which seems overly ambitious.

And then datamodel-codegen, which I looked at more closely, as it seemed promising.

datamodel-codegen

It also has the URL loading problem, but I could work around that, and have it generate code.

However, it uses the file names as module names, which makes sense, but Python style guides say that module names should be lowercase, and the OCF schema file names are not. It also generates the class names from the schema titles. That gives us classes like pyocf.objects.StockClass.ObjectStockClass() when we would have wanted pyocf.objects.stockclass.StockClass().

class StockClass(BaseModel):
    class Config:
        extra = Extra.forbid

    __root__: Any


class StockClassModel(BaseModel):
    class Config:
        extra = Extra.forbid

    object_type: Optional[Any] = None
    name: str = Field(
        ...,
        description='Name for the stock type',
    )
    class_type: StockClassType.StockClassTypeModel = Field(
        ..., description='The type of this stock class'
    )
    [...]

Fixing those problems by modifying the schema still resulted in code like the listing above.

Why is there a StockClass and a StockClassModel? I don't know. And at this point I needed to use field discriminators, which I will explain later, and I couldn't get that to work, so I decided to make my own code generator.

My solution

Pydantic

Both datamodel-codegen and our code generator generate code that uses Pydantic, a runtime data validation library. You set up your classes with Python type hints, and Pydantic will, at runtime, make sure attributes only get values of the right type, which ensures that the data we save is valid. What's more, it will try to convert the input data to the right types. This is very handy for importing data from text files like JSON.
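
A small sketch of what that looks like in practice. The Issuance model here is a hypothetical stand-in, not one of the generated OCF classes, and the example only uses calls that should behave the same in Pydantic 1 and 2.

```python
from pydantic import BaseModel, ValidationError


class Issuance(BaseModel):
    # Hypothetical model for illustration; not a real OCF class.
    id: str
    quantity: int


# The JSON-style string "100" is converted to the integer 100.
issuance = Issuance(id="tx-1", quantity="100")

# A value that cannot be converted is rejected at runtime.
try:
    Issuance(id="tx-2", quantity="many")
except ValidationError as error:
    problems = error.errors()
```

After this runs, issuance.quantity is an int, and problems holds one entry pointing at the quantity field.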

Pydantic runtime verification is slow, but you can avoid all verification with .construct() if you need speed. So if you KNOW the data is as it should be you can construct it quickly.
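
A rough sketch of skipping validation, again with a hypothetical Issuance model. The text above uses the Pydantic 1 name .construct(); Pydantic 2 calls the same thing .model_construct(), so this example picks whichever exists. That version shim is my assumption and not necessarily how PyOCF does it.

```python
from pydantic import BaseModel


class Issuance(BaseModel):
    # Hypothetical model for illustration; not a real OCF class.
    id: str
    quantity: int


# Pydantic 2 renamed construct() to model_construct(); use whichever exists.
construct = getattr(Issuance, "model_construct", None) or Issuance.construct

# No validation or conversion happens here, so the string stays a string.
unchecked = construct(id="tx-1", quantity="100")
```

Note the trade-off: unchecked.quantity is still the string "100", so this is only safe for data that is already known to be correct.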

Field Discriminators

# List of OCF transaction objects
items: list[
        ConvertibleAcceptance
        | PlanSecurityAcceptance
        | StockAcceptance
        | WarrantAcceptance
        | ConvertibleCancellation
        | PlanSecurityCancellation
        | StockCancellation
        | WarrantCancellation
        | ConvertibleConversion
        | StockConversion
        | PlanSecurityExercise
        | WarrantExercise
        | ConvertibleIssuance
        | PlanSecurityIssuance
        | StockIssuance
        [...]

I mentioned field discriminators before.

There are 31 different types of transactions, which means the transaction file has a field like the one shown above.

But when loading this from a JSON file, how does it know which of these classes to create? By default it will simply create the first one, which is usually incorrect.

And this is where the magic of field discriminators comes in.

# List of OCF transaction objects
items: list[
    Annotated[
        ConvertibleAcceptance
    [29 classes removed for brevity]
        | StockPlanPoolAdjustment,
        Field(discriminator="object_type"),
    ]

This means it loads the JSON, looks at the object_type field of each item, goes through the allowed classes, and picks the class whose object type matches.

class StockPlanPoolAdjustment(Object, Transaction, StockPlanTransaction):
    """Stock Plan Pool Adjustment Transaction"""

    object_type: Literal["TX_STOCK_PLAN_POOL_ADJUSTMENT"] = \
        "TX_STOCK_PLAN_POOL_ADJUSTMENT"

The classes must have a literal to match against. It's less than pretty, but it works.

The Literal["TX_STOCK_PLAN_POOL_ADJUSTMENT"] bit says that this HAS to be a Literal string with that value, and the = "TX_STOCK_PLAN_POOL_ADJUSTMENT" sets it to that value. A bit redundant, but it seems like this is the correct way to do it.
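
Putting the pieces together, here is a self-contained sketch of a discriminated union. The two classes are stripped-down stand-ins for the generated ones (only an id field, no base classes), and the object_type strings are assumed for illustration.

```python
from typing import Annotated, List, Literal, Union

from pydantic import BaseModel, Field


class StockAcceptance(BaseModel):
    # Simplified stand-in; the object_type value is assumed.
    object_type: Literal["TX_STOCK_ACCEPTANCE"] = "TX_STOCK_ACCEPTANCE"
    id: str


class WarrantAcceptance(BaseModel):
    # Simplified stand-in; the object_type value is assumed.
    object_type: Literal["TX_WARRANT_ACCEPTANCE"] = "TX_WARRANT_ACCEPTANCE"
    id: str


class TransactionsFile(BaseModel):
    # Pydantic picks the class whose Literal matches each item's object_type.
    items: List[
        Annotated[
            Union[StockAcceptance, WarrantAcceptance],
            Field(discriminator="object_type"),
        ]
    ]


loaded = TransactionsFile(
    items=[
        {"object_type": "TX_WARRANT_ACCEPTANCE", "id": "a"},
        {"object_type": "TX_STOCK_ACCEPTANCE", "id": "b"},
    ]
)
kinds = [type(item).__name__ for item in loaded.items]
```

Each dictionary ends up as an instance of the class named by its object_type, regardless of the order in which the classes appear in the union.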

Code style

Generating code is one thing, generating pretty code is another. Luckily we have a shortcut. We just run black on the code after generating it.

Black doesn't break long strings or comments, so that is done in the generator code. All other formatting is done by Black.
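
The comment-wrapping part could look something like this sketch, which uses only the standard library. The helper names comment_lines and render_field are invented for this example, and the width of 88 matches Black's default line length; the real generator may do this differently.

```python
import textwrap


def comment_lines(description, indent=4, width=88):
    """Turn a long schema description into wrapped '#' comment lines."""
    prefix = " " * indent + "# "
    return textwrap.wrap(
        description,
        width=width,
        initial_indent=prefix,
        subsequent_indent=prefix,
    )


def render_field(name, type_name, description, indent=4):
    """Emit the comment lines followed by the field annotation."""
    lines = comment_lines(description, indent)
    lines.append(" " * indent + f"{name}: {type_name}")
    return "\n".join(lines) + "\n"
```

The output of render_field can then be written into the generated module, with Black run over the file afterwards for everything else.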

Import styling

from pyocf.objects.transactions.acceptance.stockacceptance import StockAcceptance
from pyocf.objects.transactions.acceptance.warrantacceptance import WarrantAcceptance

[...]

transaction: StockAcceptance | WarrantAcceptance

I currently import classes, so the code looks like the example above.

I could instead import just pyocf and write code like this:

import pyocf

[...]

transaction: (
    pyocf.objects.transactions.acceptance.stockacceptance.StockAcceptance |
    pyocf.objects.transactions.acceptance.warrantacceptance.WarrantAcceptance
)

Or any form of compromise there, like importing only 2 or 3 levels down. Opinions are welcome.

Fields?

STYLE ONE:

# Very long text here
object_type: ObjectType


STYLE TWO:

object_type: Annotated[
    ObjectType,
    Field(
        description="Very long text here"
    )
]

We could use Field() everywhere, not just when we have discriminator fields.

That would make it possible to get descriptions on fields. On the other hand, the descriptions are very long, which makes for ugly code, so I have just kept them as comments for the time being.

That's all folks