Why Zod 2 isn't leaving beta

Colin McDonnell @colinhacks

published December 7th, 2020

Zod's v2 has been in beta since September 2020 (around 3 months as of this writing). The people are starting to talk.

The background

The reason Zod 2 has been in beta for so long (well, one of the reasons anyway!) is that I’ve been increasingly unsatisfied with Zod’s implementation of transformers. Transformers are a mechanism within Zod to convert data from one type to another. For the sake of rigor, I decided it was paramount for transformers to encapsulate both pre- and post-transform validation; to create one, you gave it an input schema A, an output schema B, and a transformation function (arg: A)=>B.

Multiple issues have cropped up that have led me to re-evaluate the current approach. At best, transformers are a huge footgun and at worst they’re fundamentally flawed.

Context

In version 1, a Zod schema simply let you define a data type you'd like to validate. Under the hood, it magically inferred the static TypeScript type for your schema. More technically, there is a base class (ZodType) that tracked the inferred type in a generic parameter called Type.

Transformers break this assumption. They accept an Input of one type and return an Output of a different type. To account for this, every Zod schema now had to track both an input and output type. For non-transformers, Input and Output are the same. For transformers only, these types are different.

Let's look at stringToNumber as an example:

const stringToNumber = z.transformer(z.string(), z.number(), (val) =>
  parseFloat(val)
);

For stringToNumber, Input is string and Output is number. Makes sense.
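
To make the Input/Output bookkeeping concrete, here is a minimal sketch of a base class that carries both types as generic parameters. MiniType, InputOf, and OutputOf are hypothetical names invented for this illustration; this is a toy model of the idea, not Zod's actual ZodType.

```typescript
// Toy model only: MiniType is a hypothetical stand-in for a schema base class.
class MiniType<Output, Input = Output> {
  // Type-only fields: they exist so the compiler can read the types back out.
  declare readonly _input: Input;
  declare readonly _output: Output;
  readonly parse: (data: unknown) => Output;

  constructor(parse: (data: unknown) => Output) {
    this.parse = parse;
  }
}

type InputOf<T extends MiniType<unknown, unknown>> = T["_input"];
type OutputOf<T extends MiniType<unknown, unknown>> = T["_output"];

// A plain schema: Input defaults to Output, so both are string.
const str = new MiniType<string>((data) => {
  if (typeof data !== "string") throw new Error("Expected string");
  return data;
});

// A transformer-like schema: accepts string, returns number.
const stringToNumber = new MiniType<number, string>((data) =>
  parseFloat(str.parse(data))
);

type In = InputOf<typeof stringToNumber>; // string
type Out = OutputOf<typeof stringToNumber>; // number

const sampleInput: In = "12.5";
const sampleOutput: Out = stringToNumber.parse(sampleInput); // 12.5
```

For a non-transformer like str, both helper types resolve to string; only the transformer-like schema makes them diverge.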

What happens when you pass a value into stringToNumber.parse?

  1. The user passes a value into stringToNumber.parse
  2. The transformer passes this value through the parse function of its input schema (z.string()). If there are parsing errors, it throws a ZodError
  3. The transformer takes the output from that and passes it into the transformation function (val => parseFloat(val))
  4. The transformer takes the output of the transformation and validates it against the output schema (z.number())
  5. The result of that call is returned
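
The five steps above can be sketched in a few lines. This is a simplified model under my own assumptions, not Zod's code: Parser, parseString, parseNumber, and miniTransformer are hypothetical helpers, errors are plain Error objects rather than ZodError, and the toy number check treats NaN as invalid.

```typescript
// Hypothetical model of the five steps; none of these helpers are Zod APIs.
type Parser<T> = (data: unknown) => T;

const parseString: Parser<string> = (data) => {
  if (typeof data !== "string") {
    throw new Error(`Expected string, received ${typeof data}`);
  }
  return data;
};

const parseNumber: Parser<number> = (data) => {
  // Assumption of this sketch: NaN does not count as a valid number.
  if (typeof data !== "number" || Number.isNaN(data)) {
    throw new Error("Expected number");
  }
  return data;
};

function miniTransformer<A, B>(
  input: Parser<A>,
  output: Parser<B>,
  fn: (arg: A) => B
): Parser<B> {
  return (data) => {
    const validated = input(data); // step 2: validate against the input schema
    const transformed = fn(validated); // step 3: run the transformation function
    return output(transformed); // steps 4 and 5: validate the result and return it
  };
}

const stringToNumber = miniTransformer(parseString, parseNumber, (val) =>
  parseFloat(val)
);

const parsed = stringToNumber("3.14"); // 3.14
// stringToNumber(5) throws at step 2 (input validation).
// stringToNumber("abc") throws at step 4, because parseFloat("abc") is NaN.
```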

Here's the takeaway: for a generic transformer z.transformer(A, B, func), where A and B are Zod schemas, the argument of func should be the Output type of A and the return type is the Input type of B. This lets you do things like this:

const stringToNumber = z.transformer(z.string(), z.number(), (val) =>
  parseFloat(val)
);
const numberToBoolean = z.transformer(
  z.number(),
  z.boolean(),
  (val) => val > 25
);
const stringToNumberToBoolean = z.transformer(
  stringToNumber,
  numberToBoolean,
  (val) => 5 * val
);

The problems with transformers

After implementing transformers, I realized transformers could be used to implement another much-requested feature: default values. Consider this:

const stringWithDefault = z.transformer(
  z.string().optional(),
  z.string(),
  (val) => val || 'trout'
);
stringWithDefault.parse('marlin'); // => "marlin"
stringWithDefault.parse(undefined); // => "trout"

Voila, a schema that accepts string | undefined and returns string, substituting the default "trout" if the input is ever undefined.

So I implemented the .default(val:T) method in the ZodType base class as below (partially simplified):

default(def: Input) {
  return ZodTransformer.create(this.optional(), this, (x: any) => {
    return x === undefined ? def : x;
  });
}

Do you see the problem with that? I didn’t. Neither did anyone who read through the Transformers RFC which I left open for comment for a couple months before starting on implementation.

This implementation quickly gets wonky when you try to pass an existing transformer in as the input or output. In other words, transformers aren't composable. Let's see what happens if we try to set a default value on stringToNumber.

stringToNumber.default('3.14');

// EQUIVALENT TO
const defaultedStringToNumber = z.transformer(
  stringToNumber.optional(),
  stringToNumber,
  (val) => (val !== undefined ? val : '3.14')
);

defaultedStringToNumber.parse('5');
/* { ZodError: [
  {
    "code": "invalid_type",
    "expected": "string",
    "received": "number",
    "path": [],
    "message": "Expected string, received number"
  }
] */

Let’s walk through why this fails. The input ("5") is first passed into the transformer input (stringToNumber.optional()). This converts the string "5" to the number 5. This is then passed into the transformation function. But wait: val is now number | undefined, but the transformer function needs to return a string. Otherwise, if we pass 5 into stringToNumber.parse it’ll throw. So we need to convert 5 back to "5". That may seem easy in this toy example but it’s not possible in the general case. Zod can’t know how to magically undo the transformation function.
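
The same failure can be reproduced without Zod. The sketch below is a toy model under my own assumptions: Parser, optional, and miniTransformer are hypothetical helpers, and the transformation function's return type is deliberately left as unknown to stand in for the loose typing that let the bug through.

```typescript
// Toy reproduction; these helpers are hypothetical, not Zod APIs.
type Parser<T> = (data: unknown) => T;

const parseString: Parser<string> = (data) => {
  if (typeof data !== "string") {
    throw new Error(`Expected string, received ${typeof data}`);
  }
  return data;
};

const stringToNumber: Parser<number> = (data) => parseFloat(parseString(data));

function optional<T>(schema: Parser<T>): Parser<T | undefined> {
  return (data) => (data === undefined ? undefined : schema(data));
}

// fn returns `unknown`: nothing checks that its result is a valid
// input for the `output` schema.
function miniTransformer<A, B>(
  input: Parser<A>,
  output: Parser<B>,
  fn: (arg: A) => unknown
): Parser<B> {
  return (data) => output(fn(input(data)));
}

const defaulted = miniTransformer(
  optional(stringToNumber),
  stringToNumber,
  (val) => (val !== undefined ? val : "3.14")
);

// undefined -> "3.14" -> parsed by the output schema into 3.14
const fromDefault = defaulted(undefined);

// "5" -> 5 (input schema) -> 5 (function) -> output schema rejects a number
let failure = "";
try {
  defaulted("5");
} catch (err) {
  failure = err instanceof Error ? err.message : String(err);
}
// failure is "Expected string, received number"
```

The default path works only because the default happens to be a string; every non-default value has already been converted by the time it reaches the output schema.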

In practice, the current definition of default in ZodType shouldn’t have even been possible. The only reason the type checker didn’t catch this bug is that there are a regrettable number of anys floating around in Zod. It’s not a simple matter to switch them all to unknowns either; I’ve had to use any in several instances to get type inference and certain generic methods to work properly. I’ve tried multiple times to reduce the number of anys but I’ve never managed to crack it.

It’s possible this is a one-off issue. I could find some other way to implement .default() that doesn’t involve transformers. Unfortunately this isn’t even the only problem in Zod’s implementation.

The .transform method

Initially the only way to define transformers was with z.transformer(A, B, func). Eventually I implemented a convenience method you can use like this:

z.string().transform(z.number(), (val) => parseFloat(val));

// equivalent to
z.transformer(z.string(), z.number(), (val) => parseFloat(val));

Some users were executing multiple transforms in sequence without changing the actual data type:

z.string()
  .transform(z.string(), (val) => val.toLowerCase())
  .transform(z.string(), (val) => val.trim())
  .transform(z.string(), (val) => val.replace(' ', '_'));

To reduce the boilerplate here, it was recommended that I overload the method definition to support this syntax:

z.string()
  .transform((val) => val.toLowerCase())
  .transform((val) => val.trim())
  .transform((val) => val.replace(' ', '_'));

If the first argument is a function instead of a Zod schema, Zod should assume that the transformation doesn’t transform the type. In other words, z.string().transform((val) => val.trim()) should be equivalent to z.string().transform(z.string(), (val) => val.trim()). Makes sense.

Consider using this method on a transformer:

stringToNumber.transform(/* transformation_func */);

What type signature do you expect for transformation_func?

Most would probably expect (arg: number)=>number. Some would expect (arg: string)=>string. Neither of those is right; it’s (arg: number)=>string. Yes, really. The transformation function expects an input of number (the Output of stringToNumber) and a return type of string (the Input of stringToNumber). This type signature is a natural consequence of a series of logical design decisions, but the end result is dumb. Well done, self.

Intuitively, you should be able to append .transform(val => val) to any schema. Surely passing in the identity function to a method called .transform() should be a no-op? Unfortunately due to how transformers are implemented, that's not always possible.
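
Here is a small type-level sketch of where the (arg: number)=>string signature comes from. Mini and transformSelf are hypothetical names; the method models the shorthand as "use the same schema on both sides of the transformer", which is my reading of the behavior described above rather than Zod's source.

```typescript
// Toy model: Mini and transformSelf are hypothetical names, not Zod APIs.
class Mini<Out, In = Out> {
  readonly parse: (data: unknown) => Out;

  constructor(parse: (data: unknown) => Out) {
    this.parse = parse;
  }

  // Models the shorthand .transform(fn) as transformer(this, this, fn):
  // fn receives the Output of `this` and must return the Input of `this`.
  transformSelf(fn: (arg: Out) => In): Mini<Out, In> {
    return new Mini<Out, In>((data) => this.parse(fn(this.parse(data))));
  }
}

const str = new Mini<string>((data) => {
  if (typeof data !== "string") throw new Error("Expected string");
  return data;
});

const stringToNumber = new Mini<number, string>((data) =>
  parseFloat(str.parse(data))
);

// On a plain schema the signature is the intuitive (arg: string) => string.
const trimmed = str.transformSelf((val) => val.trim());

// On a transformer it becomes (arg: number) => string.
const doubled = stringToNumber.transformSelf((val) => String(val * 2));

const trimmedResult = trimmed.parse("  trout "); // "trout"
const doubledResult = doubled.parse("5"); // 10

// The identity function does not type-check on the transformer, because
// number is not assignable to string:
// stringToNumber.transformSelf((val) => val);
```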

More complicated examples

These strange composability issues also make it difficult to write any generic functions on top of Zod (of which .transform and .default are two examples). Others have encountered similar issues, like here and here. For the sake of simplicity, I won't break down the nuances of those cases; the underlying issue is the composability problem I described above.

When should defaults even be applied?

Strangely, there doesn't appear to be a consensus expectation on whether default values should be "applied" pre- or post-transformations. Consider the simple stringToNumber example; should stringToNumber.default(/* value */) accept a number or a string?

As implemented in Zod 2, it should accept string (the Input of stringToNumber, which serves as the output schema of the resulting transformer). In this case, the schema looks at the value passed into parse, checks if it's undefined, and, if so, hot-swaps in the default value. That value is then passed through all the transformers and refinements downstream (in this case, being converted to a number) by the non-optional stringToNumber. Remember, as implemented, this is what stringToNumber.default() returns:

stringToNumber.default('3.14');

// equivalent to

z.transformer(stringToNumber.optional(), stringToNumber, (val) =>
  val !== undefined ? val : '3.14'
);

But some users expect stringToNumber.default to accept a number; they assume that a default value "short-circuits" the parsing logic. If you pass undefined into .parse(), the parsing should "return early" and return the value. In this case, that value should be a number because that's the ultimate expected return type of the transformer returned by stringToNumber.default(val).
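
The two readings differ mainly in the type of the default value. A rough sketch, with validation left out to keep it short: Pipeline, defaultBeforeParse, and defaultInsteadOfParse are hypothetical names for illustration, not Zod APIs.

```typescript
// Hypothetical helpers contrasting the two expectations; not Zod APIs.
type Pipeline<I, O> = (input: I) => O;

const stringToNumber: Pipeline<string, number> = (input) => parseFloat(input);

// Reading 1: the default is an *input* value. It replaces undefined and
// then flows through the whole pipeline like any other input.
function defaultBeforeParse<I, O>(
  schema: Pipeline<I, O>,
  def: I
): Pipeline<I | undefined, O> {
  return (input) => schema(input === undefined ? def : input);
}

// Reading 2: the default is an *output* value. An undefined input
// short-circuits the pipeline and the default is returned as is.
function defaultInsteadOfParse<I, O>(
  schema: Pipeline<I, O>,
  def: O
): Pipeline<I | undefined, O> {
  return (input) => (input === undefined ? def : schema(input));
}

const preParse = defaultBeforeParse(stringToNumber, "3.14"); // takes a string
const shortCircuit = defaultInsteadOfParse(stringToNumber, 3.14); // takes a number

const viaPipeline = preParse(undefined); // 3.14, parsed from "3.14"
const viaShortCircuit = shortCircuit(undefined); // 3.14, returned directly
const regular = preParse("5"); // 5
```

Both produce a number in the end; the disagreement is only about which side of the pipeline the default belongs to, and therefore whether it gets transformed and refined.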

A path forward

Note from March 2021: the approach described below also had issues! Ultimately the implementation used in Zod 3 was slightly different.

I just created an RFC describing a plan to improve this state of affairs. The idea was to have each Zod schema contain a list of post-parse transformation functions. When a value is passed into .parse, Zod will type check the value, then pass it through the transform chain. This is the approach Yup uses.

const schema = z
  .string()
  .transform((val) => val.length)
  .transform((val) => val > 100);

type In = z.input<typeof schema>; // string
type Out = z.output<typeof schema>; // boolean

Unlike before, Zod doesn’t validate the data type between each transform. We’re relying on the TypeScript engine to infer the correct type based on the function definitions. In this sense, Zod is behaving just like I intended; it’s acting as a reliable source of type safety that lets you confidently implement the rest of your application logic — including transforms. Re-validating the type between each transform is overkill; TypeScript’s type checker already does that.

Each schema will still have an Input (the inferred type of the schema) and an Output (the output type of the last transform in the chain). But because we’re avoiding the weird hierarchy of ZodTransformers everything behaves in a much more intuitive way.
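
A rough sketch of that idea: validate once, then run a list of transform functions whose types are checked only by TypeScript. ChainSchema and its members are hypothetical names invented for this illustration, and the storage array uses any internally, so treat it as a model of the approach rather than the eventual implementation.

```typescript
// Hypothetical sketch of the transform-chain idea; ChainSchema is an invented name.
class ChainSchema<In, Out = In> {
  declare readonly _in: In; // type-only markers
  declare readonly _out: Out;
  private readonly check: (data: unknown) => In;
  // `any` is used for storage only; the public `transform` signature stays typed.
  private readonly transforms: ReadonlyArray<(value: any) => unknown>;

  private constructor(
    check: (data: unknown) => In,
    transforms: ReadonlyArray<(value: any) => unknown>
  ) {
    this.check = check;
    this.transforms = transforms;
  }

  static string(): ChainSchema<string> {
    return new ChainSchema<string>((data) => {
      if (typeof data !== "string") throw new Error("Expected string");
      return data;
    }, []);
  }

  // Appends a transform; the new Output is whatever fn returns.
  transform<Next>(fn: (value: Out) => Next): ChainSchema<In, Next> {
    return new ChainSchema<In, Next>(this.check, [...this.transforms, fn]);
  }

  parse(data: unknown): Out {
    let value: unknown = this.check(data); // the only runtime type check
    for (const fn of this.transforms) value = fn(value); // no re-validation
    return value as Out;
  }
}

const schema = ChainSchema.string()
  .transform((val) => val.length)
  .transform((val) => val > 100);

type In = (typeof schema)["_in"]; // string
type Out = (typeof schema)["_out"]; // boolean

const isLong: Out = schema.parse("trout"); // false, since 5 > 100 is false
```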

One interesting ramification is that you could interleave transforms and refinements. Zod could keep track of the order in which each transform/refinement was added and execute them all in sequence:

const schema = z
  .string()
  .transform((val) => parseFloat(val))
  .refine((val) => val > 25, { message: 'Too short' })
  .transform((val) => `${val}`)
  .refine((val) => /^\d+$/.test(val), { message: 'No floats allowed' });
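
One way to picture the bookkeeping is a single ordered list that holds both kinds of step. The sketch below is a guess at the shape of such a design, not the implementation: Effect and EffectSchema are hypothetical names, and failures are plain Error objects.

```typescript
// Hypothetical sketch: transforms and refinements kept in one ordered list.
type Effect =
  | { kind: "transform"; fn: (value: any) => unknown }
  | { kind: "refine"; check: (value: any) => boolean; message: string };

class EffectSchema<In, Out = In> {
  private readonly base: (data: unknown) => In;
  private readonly effects: ReadonlyArray<Effect>;

  private constructor(base: (data: unknown) => In, effects: ReadonlyArray<Effect>) {
    this.base = base;
    this.effects = effects;
  }

  static string(): EffectSchema<string> {
    return new EffectSchema<string>((data) => {
      if (typeof data !== "string") throw new Error("Expected string");
      return data;
    }, []);
  }

  transform<Next>(fn: (value: Out) => Next): EffectSchema<In, Next> {
    return new EffectSchema<In, Next>(this.base, [
      ...this.effects,
      { kind: "transform", fn },
    ]);
  }

  refine(check: (value: Out) => boolean, opts: { message: string }): EffectSchema<In, Out> {
    return new EffectSchema<In, Out>(this.base, [
      ...this.effects,
      { kind: "refine", check, message: opts.message },
    ]);
  }

  parse(data: unknown): Out {
    let value: unknown = this.base(data);
    // Run every step in the order it was added.
    for (const effect of this.effects) {
      if (effect.kind === "transform") value = effect.fn(value);
      else if (!effect.check(value)) throw new Error(effect.message);
    }
    return value as Out;
  }
}

const interleaved = EffectSchema.string()
  .transform((val) => parseFloat(val))
  .refine((val) => val > 25, { message: "Too short" })
  .transform((val) => `${val}`)
  .refine((val) => /^\d+$/.test(val), { message: "No floats allowed" });

const accepted = interleaved.parse("30"); // "30"
// interleaved.parse("12") throws "Too short" (first refinement).
// interleaved.parse("30.5") throws "No floats allowed" (second refinement).
```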

We'll have to wait and see how this turns out!

Update from March 2021

This is the general approach being used in the new Zod 3 alpha release. There are some small implementation differences: refinements and transforms are now "housed" within the ZodEffects wrapper class instead of being attached directly to the schema.

A note on default values: Zod no longer uses transforms to implement default values. Transforms are only capable of implementing post-parse defaults: parse the input, then swap in the default value if the result is undefined. But I've decided that pre-parse defaults make more sense (and are more expected by users). That is — if the input is undefined swap in the default value, and then finish the parsing process (which includes refining and transforming). As such, the defaulting logic now lives in the ZodOptional class.