What an interesting bug - Travis Swiers

Last Monday, I sat down at my desk with a cup of coffee to get my day started in the usual fashion: check calendar, check messages, check my preferred views in Datadog, check GitLab for review requests.

On this particular morning, Datadog had been alerting us that SQS messages had been sitting in the queue for longer than expected. This happens on occasion. Not particularly alarming or interesting. Usually some system Compassion Community relies on was down or slow to respond. Our team doesn't send these types of alerts to PagerDuty, as they're generally not user-blocking and can be handled during normal working hours. More often than not, the fix is just redriving the Dead Letter Queue.

The queue in question this time was SendPushNotificationDLQ. Not one of the usual suspects... Interest piqued!

Being on a distributed team and in the Mountain Time Zone, I dropped a message in Teams to see if anybody was on the case. An East Coast colleague was on it, so I moved on to other tasks.

An hour or so later, that colleague DM'd me asking if anything had stuck out to me. His hunch had turned out to be a dead end. I hadn't looked at it since learning he was on it, but I told him I'd take a look.

Root Cause Analysis

First stop: Datadog.

I knew the queue, and I knew that queue was processed by a Lambda function, so I could deduce the error likely originated in that function. I also knew the alerts started Saturday. With all this information, I wanted to look at this Lambda's invocations from Saturday and see if there were any errors. My preferred way to look at a specific function in Datadog is, unsurprisingly, the Lambda view (Infrastructure -> Serverless -> Lambda). Filtering by date range, environment, service, and function, I quickly found the failed invocations. A peek at the logs, and we have our error:

2025-08-23T22:15:14.244Z	20b1ebf8-7b1d-5e77-ba69-8adc44b1e580	ERROR	PrismaClientKnownRequestError: 
Invalid `prisma.notification.createMany()` invocation:


unexpected end of hex escape at line 1 column 764
    at Vn.handleRequestError (/node_modules/@prisma/client/runtime/library.js:121:7339)
    at Vn.handleAndLogRequestError (/node_modules/@prisma/client/runtime/library.js:121:6663)
    at Vn.request (/node_modules/@prisma/client/runtime/library.js:121:6370)
    at async l (/node_modules/@prisma/client/runtime/library.js:130:9617) {
  code: 'InvalidArg',
  clientVersion: '6.2.1',
  meta: { modelName: 'Notification' }
}

I also discovered it was a one-off. There were successful invocations before and after, and even during the retries of this message before it was moved to the DLQ.

The error looks familiar. I've seen similar errors parsing malformed JSON strings:

JSON.parse('{"key":')
// Uncaught SyntaxError: Unexpected end of JSON input

So what's in this message? I bet some user input is breaking out of a string!

From the same view in Datadog, we can also look at the trace of a given Lambda invocation. This gives me a high-level view of the path the request took to reach this Lambda. What I want to see is the Lambda's payload. We've instrumented our Lambdas so their request and response payloads are captured. From the trace, I highlight the Lambda and open the Lambda Request dropdown. A quick scan, and a row of emojis immediately pops out at me! That's the prettified, human-readable view of the payload. Let's look at the raw:

{
  // ...
  "detail": {
    // ...
    "description": "Today was our reading, writing, and math reinforcement graduation with more than 31 graduates \uD83E\uDD73\uD83E\uDD73\uD83E\uDD73\uD83E\uDD73\uD83E\uDD73",
    // ...
  }
}

Look at all those backslashes! I think we're onto something. What happens if we stringify and parse that?

JSON.parse(
  JSON.stringify({
    detail: {
      description: "Today was our reading, writing, and math reinforcement graduation with more than 31 graduates \uD83E\uDD73\uD83E\uDD73\uD83E\uDD73\uD83E\uDD73\uD83E\uDD73",
    },
  })
)

Nothing blows up... So what are we doing with this description before persisting it to the database? I head to the codebase to find out.

In the Lambda handler code, I quickly notice a call to a function called _truncateMessageBodies, which itself calls a utility function in the codebase called truncate. We're truncating the bodies of these messages to 100 characters so they show up nicely on the user's phone as a preview of the content.

Let's look at truncate's implementation:

export const truncate = (text, maxLength) =>
  text.length <= maxLength ? text : text.substring(0, maxLength - 3) + '...'

Straightforward enough: take in a string, text, and an integer, maxLength, defining the maximum number of characters you want in the string. If the given string exceeds that length, you get back its first maxLength - 3 characters followed by an ellipsis.
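It's worth being precise about what "characters" means here. In JavaScript, length and substring count UTF-16 code units, not the characters a reader sees, and anything outside the Basic Multilingual Plane (most emoji included) takes two code units: a high surrogate followed by a low surrogate. This short sketch shows the units behind our 🥳 and where index 97 lands in the reported description; prefix and kept are just illustrative names.

```javascript
const party = '\uD83E\uDD73' // 🥳 (U+1F973)

// One emoji, two UTF-16 code units: a high surrogate, then a low surrogate
console.log(party.length) // 2
console.log(party.charCodeAt(0).toString(16)) // d83e
console.log(party.charCodeAt(1).toString(16)) // dd73
console.log(party.codePointAt(0).toString(16)) // 1f973

// The reported description: 94 code units of text, then five emojis
const prefix =
  'Today was our reading, writing, and math reinforcement graduation with more than 31 graduates '
const description = prefix + party.repeat(5)
console.log(prefix.length) // 94

// truncate(description, 100) keeps substring(0, 97): 94 + 3 code units,
// i.e. one whole emoji plus the high surrogate of the next one
const kept = description.substring(0, 97)
console.log(kept.charCodeAt(96).toString(16)) // d83e
```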

Let's see what this does to our string:

const stringWithEmojis = "Today was our reading, writing, and math reinforcement graduation with more than 31 graduates \uD83E\uDD73\uD83E\uDD73\uD83E\uDD73\uD83E\uDD73\uD83E\uDD73"
truncate(stringWithEmojis, 100)
// 'Today was our reading, writing, and math reinforcement graduation with more than 31 graduates 🥳\uD83E...'

Half an emoji. That's not a great preview. It's still a valid string, though, right?
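As far as JavaScript is concerned, yes: a string can hold any sequence of UTF-16 code units, paired or not. What it isn't is well-formed. A minimal sketch of a check, using a hypothetical helper named hasLoneSurrogate (not something from our codebase), makes the difference visible. On runtimes with ES2024 support, String.prototype.isWellFormed() performs the same check natively.

```javascript
// Original truncate, copied from above: counts UTF-16 code units
const truncate = (text, maxLength) =>
  text.length <= maxLength ? text : text.substring(0, maxLength - 3) + '...'

// Hypothetical helper: true if the string has a high surrogate with no low
// surrogate after it, or a low surrogate with no high surrogate before it.
// No `u` flag, so the regex matches individual UTF-16 code units.
const hasLoneSurrogate = (text) =>
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/.test(text)

const description =
  'Today was our reading, writing, and math reinforcement graduation with more than 31 graduates ' +
  '\uD83E\uDD73'.repeat(5)
const preview = truncate(description, 100)

console.log(preview.endsWith('\uD83E...')) // true
console.log(hasLoneSurrogate(description)) // false
console.log(hasLoneSurrogate(preview)) // true
```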

Meanwhile, my colleague was taking a different approach: what does that error even mean? Searching for the error, you quickly find an issue on Prisma's GitHub repository where others have gotten the same error. Dig a little deeper, and you'll find that Prisma's v4 upgrade guide introduced "Better grammar for string literals".

There it is! Prisma is opinionated on what a valid string is.

When we truncated our string, we cut a surrogate pair in half, leaving a lone high surrogate (\uD83E) right before the ellipsis. That made the string invalid by Prisma's definition, causing a validation error.
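Where does a "hex escape" come into it? Since ES2019, JSON.stringify writes complete surrogate pairs out as-is but escapes lone surrogates as \uXXXX sequences, which also explains why the earlier stringify-and-parse experiment on the full description didn't blow up. The sketch below shows both cases. Our assumption, which we didn't confirm in Prisma's source, is that the query is serialized to JSON on its way to Prisma's engine, and that the engine's parser expects a high-surrogate escape like \ud83e to be followed by a low-surrogate escape, hence "unexpected end of hex escape".

```javascript
const whole = '\uD83E\uDD73' // a complete surrogate pair: 🥳
const half = '\uD83E' // a lone high surrogate, like the end of our truncated preview

// Complete pairs are written out as-is...
console.log(JSON.stringify(whole)) // "🥳"
// ...but since ES2019, lone surrogates are written as \u escapes
console.log(JSON.stringify(half)) // "\ud83e"

// JavaScript's own JSON.parse happily reads the escape back
console.log(JSON.parse(JSON.stringify(half)) === half) // true
```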

With all of this information, we write up a bug describing how to reproduce it and move on to fixing it.

Fix

Given that this codebase uses Prisma and we now know Prisma's opinions on strings, we need to make our truncate utility function respect surrogate pairs. The implementation isn't wrong; it just needs to be expanded.

An easy way in JavaScript to work with Unicode characters (like our emoji, represented by the surrogate pair '\uD83E\uDD73') is to convert the string to an array with Array.from, which splits it by code point rather than by UTF-16 code unit:

Array.from('\uD83E\uDD73\uD83E\uDD73\uD83E\uDD73\uD83E\uDD73\uD83E\uDD73')
// ['🥳', '🥳', '🥳', '🥳', '🥳']
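Array.from isn't the only option: spread syntax and for...of use the same string iterator, so they also walk a string by code point. Counting the reported description both ways shows why this matters for our message: it's 104 UTF-16 code units but only 99 code points, so a code-point-based truncate with a maxLength of 100 would leave it untouched. A quick sketch:

```javascript
const description =
  'Today was our reading, writing, and math reinforcement graduation with more than 31 graduates ' +
  '\uD83E\uDD73'.repeat(5) // five 🥳

// .length counts UTF-16 code units, so each 🥳 counts twice
console.log(description.length) // 104

// Array.from, spread, and for...of all use the string iterator, which yields code points
console.log(Array.from(description).length) // 99
console.log([...description].length) // 99

let codePoints = 0
for (const _ of description) codePoints++
console.log(codePoints) // 99
```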

Using this, our truncate implementation becomes:

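export const truncate = (text, maxLength) => {
  // Fast path: the code-point count never exceeds the UTF-16 length
  if (text.length <= maxLength) return text

  // Array.from splits by code point, so a surrogate pair like '\uD83E\uDD73' stays together
  const chars = Array.from(text)

  if (chars.length <= maxLength) return text

  // From here on, maxLength is measured in code points, not UTF-16 code units
  return chars.slice(0, maxLength - 3).join('') + '...'
}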
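The code-point version is enough to stop the Prisma error, since it never leaves a lone surrogate behind. It can still cut through an emoji built from several code points, though, such as ZWJ family sequences, flags, and skin-tone variants, producing a well-formed but visually broken preview. If that matters, one option is to count grapheme clusters with Intl.Segmenter (available in Node 16+ and modern browsers). This is a sketch, not what we shipped; truncateGraphemes is a hypothetical name.

```javascript
// Code-point truncate, as in the fix above
const truncate = (text, maxLength) => {
  if (text.length <= maxLength) return text
  const chars = Array.from(text)
  if (chars.length <= maxLength) return text
  return chars.slice(0, maxLength - 3).join('') + '...'
}

// Hypothetical grapheme-aware variant (not what we shipped).
// Intl.Segmenter groups code points into user-perceived characters.
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' })

const truncateGraphemes = (text, maxLength) => {
  if (text.length <= maxLength) return text
  const graphemes = Array.from(segmenter.segment(text), ({ segment }) => segment)
  if (graphemes.length <= maxLength) return text
  return graphemes.slice(0, maxLength - 3).join('') + '...'
}

// 👨‍👩‍👧 is five code points: man, ZWJ, woman, ZWJ, girl
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}'
const text = family + family + 'abcdef'

console.log(Array.from(family).length) // 5
console.log(truncate(text, 5) === '\u{1F468}\u200D...') // true: the family is split
console.log(truncateGraphemes(text, 5) === family + family + '...') // true
```

The trade-off is cost: segmenting is more work than Array.from, and maxLength now means user-perceived characters, so existing tests that count code points may need updating.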

Write a few more unit tests to cover this edge case, create a merge request, get some reviews and approvals, and merge.
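For illustration, a framework-agnostic sketch of the edge cases worth covering might look like this. The check helper is hypothetical, a stand-in for whatever assertions your test runner provides, and these aren't the exact tests from our merge request.

```javascript
// Fixed truncate, copied from the implementation above
const truncate = (text, maxLength) => {
  if (text.length <= maxLength) return text
  const chars = Array.from(text)
  if (chars.length <= maxLength) return text
  return chars.slice(0, maxLength - 3).join('') + '...'
}

// Hypothetical assertion helper so the sketch runs without a test framework
const check = (label, actual, expected) => {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`)
  }
  console.log(`ok - ${label}`)
}

const party = '\uD83E\uDD73' // 🥳

// Existing behavior for plain text is unchanged
check('short text is returned as-is', truncate('hello', 10), 'hello')
check('long text gets an ellipsis', truncate('abcdefghij', 8), 'abcde...')

// The edge case: a surrogate pair straddling the cut point stays whole
check('emoji at the cut point is not split', truncate('a' + party.repeat(5), 5), 'a' + party + '...')

// Emoji count once, so text that fits by code point is untouched
check('emoji-only text that fits is untouched', truncate(party.repeat(3), 5), party.repeat(3))

// The description from the failed invocation: 104 code units, 99 code points
const description =
  'Today was our reading, writing, and math reinforcement graduation with more than 31 graduates ' +
  party.repeat(5)
check('reported description is untouched at 100', truncate(description, 100), description)
```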

After QA gives us the green light, we can get this into production.

During QA I got this DM:

emoji at 100th char.. what an interesting bug lol

I couldn't agree more! (So much so it inspired this article.)

With QA approval, the code gets promoted to production. Once in prod, we can redrive our Dead Letter Queue and get this notification to the user!