We generate the Python code from the OCF schemas. This is mainly because the OCF schemas are still in beta, and we don't want to keep the code up to date manually, because then it would never be fully up to date.
There are already tools that can generate code from schemas, so we tried some out.
    {
      "$schema": "http://json-schema.org/draft-07/schema",
      "$id": "https://opencaptablecoalition.com/[...]/VestingTermsFile.schema.json",
      "title": "File - Vesting Terms",
      [...]
      "properties": {
        "items": {
          "type": "array",
          "description": "List of OCF vesting terms objects",
          "items": {
            "$ref": "https://opencaptablecoalition.com/[...]/VestingTerms.schema.json"
          }
          [...]
    }
There was one big problem with many of them, though:
The JSON Schema files use URLs as ids, and several generators look at a reference and, if it's a URL, try to fetch the schema for that reference from that URL. That might work if the schema were released and published, but OCF is in beta and isn't completely published, so the fetch fails. Instead, the code generator needs to load all the schema files first, and only then try to resolve the references.
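As a rough sketch of the "load everything first, resolve afterwards" idea, the snippet below indexes local schema files by their "$id" and then looks references up in that index instead of fetching them. The function names and the *.schema.json glob are assumptions made for illustration; this is not necessarily how our generator is structured.

```python
import json
from pathlib import Path


def load_schemas(schema_dir):
    """Read every *.schema.json file and index the schemas by their "$id"."""
    registry = {}
    for path in sorted(Path(schema_dir).rglob("*.schema.json")):
        schema = json.loads(path.read_text(encoding="utf-8"))
        registry[schema["$id"]] = schema
    return registry


def resolve_ref(registry, ref):
    """Look up a "$ref" URL in the local registry instead of fetching it."""
    try:
        return registry[ref]
    except KeyError:
        raise LookupError(f"No local schema with $id {ref!r}") from None
```

Because every file is loaded before any reference is resolved, it doesn't matter whether the URLs in "$id" actually exist on the web.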
I looked at a few:
And then datamodel-codegen, which I looked at more closely, as it seemed promising.
It also has the URL loading problem, but I could work around that, and have it generate code.
However, it uses the file names as the names for modules, which makes sense, but unfortunately Python style guides say that module names should be lowercase, and the OCF schema file names are not. It also generates the class names from the schema titles. That gives us classes like pyocf.objects.StockClass.ObjectStockClass() when we would have wanted pyocf.objects.stockclass.StockClass().
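To make the naming we wanted concrete, here is a minimal sketch that derives a lowercase module name and a class name from a schema file name. The helper name is hypothetical, and it assumes every schema file ends in ".schema.json" and that the file name stem is the class name we want.

```python
from pathlib import PurePosixPath


def names_from_schema_file(filename):
    """Return (module_name, class_name) for a schema file name.

    "StockClass.schema.json" gives ("stockclass", "StockClass").
    """
    name = PurePosixPath(filename).name
    suffix = ".schema.json"
    if not name.endswith(suffix):
        raise ValueError(f"Not a schema file: {filename!r}")
    # The stem is used as the class name, its lowercase form as the module name.
    class_name = name[: -len(suffix)]
    return class_name.lower(), class_name
```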
Fixing those problems by modifying the schema still resulted in code like this:

    class StockClass(BaseModel):
        class Config:
            extra = Extra.forbid

        __root__: Any


    class StockClassModel(BaseModel):
        class Config:
            extra = Extra.forbid

        object_type: Optional[Any] = None
        name: str = Field(
            ...,
            description='Name for the stock type',
        )
        class_type: StockClassType.StockClassTypeModel = Field(
            ..., description='The type of this stock class'
        )
        [...]
Why is there both a StockClass and a StockClassModel? I don't know. At this point I also needed to use field discriminators, which I will explain later, and I couldn't get that to work, so I decided to make my own code generator.
Both datamodel-codegen and our code generator generate code that uses Pydantic. Pydantic is a runtime data validation library. You set up your classes with Python type hints, and Pydantic will, at runtime, make sure attributes only get values of the right type, which ensures that the data we save is valid. What's more, it will try to convert the input data to the right types. This is very handy for importing data from text files like JSON.
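A small sketch of what that validation and conversion looks like in practice. The model below is a made-up, trimmed-down example and not the class our generator produces; it only assumes that Pydantic is installed.

```python
from pydantic import BaseModel, ValidationError


class StockClass(BaseModel):
    # Hypothetical, trimmed-down model for illustration only.
    name: str
    votes_per_share: int


# The string "1" is converted to the integer 1 while the object is created.
common = StockClass(name="Common", votes_per_share="1")
print(type(common.votes_per_share))

# A value that cannot be converted to an integer is rejected.
try:
    StockClass(name="Common", votes_per_share="many")
except ValidationError as error:
    print("rejected field:", error.errors()[0]["loc"])
```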
Pydantic's runtime validation is slow, but you can skip all validation with .construct() if you need speed (in Pydantic 2 the method is called .model_construct()). So if you KNOW the data is as it should be, you can construct it quickly.
I mentioned field discriminators before.
There are 31 different types of transactions, which means the transaction file has a field like this example:

    # List of OCF transaction objects
    items: list[
        ConvertibleAcceptance
        | PlanSecurityAcceptance
        | StockAcceptance
        | WarrantAcceptance
        | ConvertibleCancellation
        | PlanSecurityCancellation
        | StockCancellation
        | WarrantCancellation
        | ConvertibleConversion
        | StockConversion
        | PlanSecurityExercise
        | WarrantExercise
        | ConvertibleIssuance
        | PlanSecurityIssuance
        | StockIssuance
        [...]
But when loading this from a JSON file, how does Pydantic know which of these classes to create? By default it will simply create the first one, and most of the time that's incorrect.
And this is where the magic of field discriminators comes in.

    # List of OCF transaction objects
    items: list[
        Annotated[
            ConvertibleAcceptance
            [29 classes removed for brevity]
            | StockPlanPoolAdjustment,
            Field(discriminator="object_type"),
        ]
    ]
This means it loads the JSON, looks at the object_type field in each item, looks at all the allowed classes, and finds the class whose object type matches.
The classes must have a literal to match against. It's less than pretty, but it works:

    class StockPlanPoolAdjustment(Object, Transaction, StockPlanTransaction):
        """Stock Plan Pool Adjustment Transaction"""

        object_type: Literal["TX_STOCK_PLAN_POOL_ADJUSTMENT"] = \
            "TX_STOCK_PLAN_POOL_ADJUSTMENT"
The Literal["TX_STOCK_PLAN_POOL_ADJUSTMENT"] bit says that this HAS to be a Literal string with that value, and the = "TX_STOCK_PLAN_POOL_ADJUSTMENT" sets it to that value. A bit redundant, but it seems like this is the correct way to do it.
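Putting the two pieces together, here is a small self-contained sketch of a discriminated union. The two transaction classes are cut down to a single illustrative field, TransactionsFile is a made-up container name, and Union[...] is used in place of the | syntax only to keep the example runnable on more Python versions.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class StockAcceptance(BaseModel):
    # The literal default is the value the discriminator is matched against.
    object_type: Literal["TX_STOCK_ACCEPTANCE"] = "TX_STOCK_ACCEPTANCE"
    security_id: str  # illustrative field only


class WarrantAcceptance(BaseModel):
    object_type: Literal["TX_WARRANT_ACCEPTANCE"] = "TX_WARRANT_ACCEPTANCE"
    security_id: str  # illustrative field only


class TransactionsFile(BaseModel):
    # Hypothetical container; object_type decides which class is created.
    items: list[
        Annotated[
            Union[StockAcceptance, WarrantAcceptance],
            Field(discriminator="object_type"),
        ]
    ]


data = {
    "items": [
        {"object_type": "TX_WARRANT_ACCEPTANCE", "security_id": "w-1"},
        {"object_type": "TX_STOCK_ACCEPTANCE", "security_id": "s-1"},
    ]
}
loaded = TransactionsFile(**data)
print([type(item).__name__ for item in loaded.items])
```

Both classes have exactly the same fields apart from object_type, so the literal is the only thing that tells them apart.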
Generating code is one thing; generating pretty code is another. Luckily we have a shortcut: we just run Black on the code after generating it.
Black doesn't break long strings or comments, so that is done in the generator code. All other formatting is done by Black.
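Breaking long comments in the generator can be done with the standard library. The sketch below is one possible way to do it and not necessarily what our generator does; the function name is hypothetical, and the width of 88 is chosen to match Black's default line length.

```python
import textwrap


def wrap_comment(text, indent="    ", width=88):
    """Wrap a long description into "# " comment lines no wider than `width`."""
    prefix = indent + "# "
    return "\n".join(
        textwrap.wrap(
            text,
            width=width,
            initial_indent=prefix,
            subsequent_indent=prefix,
        )
    )
```

The indent is counted as part of the width, so the wrapped comment still fits on the line when it sits inside a class body.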
I currently import classes, so the code looks like this:

    from pyocf.objects.transactions.acceptance.stockacceptance import StockAcceptance
    from pyocf.objects.transactions.acceptance.warrantacceptance import WarrantAcceptance
    [...]
    transaction: StockAcceptance | WarrantAcceptance
I could possibly just import pyocf instead, and have code like this:

    import pyocf
    [...]
    transaction = (
        pyocf.objects.transactions.acceptance.stockacceptance.StockAcceptance
        | pyocf.objects.transactions.acceptance.warrantacceptance.WarrantAcceptance
    )
Or any form of compromise there, like importing only 2 or 3 levels down. Opinions are welcome.
We could use Field() everywhere, not just when we have discriminator fields.
That would make it possible to get descriptions on fields, but on the other hand, the descriptions are very long, which makes for ugly code, so I just stuck them in as comments for the time being.

STYLE ONE:

    # Very long text here
    object_type: ObjectType

STYLE TWO:

    object_type: Annotated[
        ObjectType,
        Field(
            description="Very long text here"
        )
    ]
That's all, folks!