Alex Venetidis

April 13, 2022

Building for scale when optimising for speed

Our startup environment is characterised by one thing at its core: speed. Failing to build things fast stunts growth and inhibits development of creative ideas, especially when solutions to engineering problems are scrutinised for scalability instead of their inherent value. With that being said, scaling the infrastructure of our product is an inevitable and crucial step as long as this scaling is considered within context and at the appropriate time.

Background & Description of the Problem:

We were recently faced with a scaling problem when one of our larger customers submitted a request for 2 years' worth of data for thousands of their users, for data analytics purposes. When processing such large requests, we implement a method of "chunking" the data into week-long segments. This allows processing to be broken down into a number of individual pieces, protecting us from the fragility of a single huge data payload being normalised at once.

When the large query comes in, it gets sent into a queue of tasks and waits to be processed asynchronously. Once a worker accepts the original two-year task, it spawns one task for each week of the initial request, and waits for them all to complete before sending a message indicating that all 2 years of data have been transferred successfully.
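To make the chunking step concrete, here is a minimal sketch of splitting a date range into week-long segments. The function name `week_chunks` and the example dates are assumptions made for illustration, not the actual implementation behind the API.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def week_chunks(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield (chunk_start, chunk_end) pairs covering [start, end).

    Each chunk spans at most 7 days; the last one may be shorter.
    """
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + timedelta(days=7), end)
        yield cursor, chunk_end
        cursor = chunk_end


# A two-year request (hypothetical dates) becomes roughly 100 sub-requests.
chunks = list(week_chunks(date(2020, 4, 1), date(2022, 4, 1)))
print(len(chunks))  # 105: 104 full weeks plus a 2-day remainder
```

In a setup like the one described, each pair would become its own task on the queue, with the parent task waiting for all of them to finish.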

As a result, each request submitted was broken down into ~100 further week-long requests, causing millions of calls to be pushed into our task queue. This first caused a huge backlog of queries waiting to be processed, and other devs were unable to receive any webhooks, whether for their own data requests or even for notifications of users authenticating! To make matters worse, each "parent request" would wait for its sub-queries to complete. Because there were so many "parent requests", they filled every worker, and each worker sat waiting on sub-requests that were never actually processed, since no worker was free to pick them up.

Due to the two factors above, our queues kept filling up instead of slowly being emptied.
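The starvation described above can be reproduced with a small deterministic model. This is a simplified sketch under assumed names (`simulate`, a single FIFO queue, a fixed worker count), not a reproduction of the real infrastructure: a worker that picks up a parent task enqueues its children and then stays occupied until all of them finish.

```python
from collections import deque


def simulate(num_workers: int, num_parents: int, children_per_parent: int):
    """Model one shared FIFO queue where parents block a worker until
    all of their children are done. Returns (status, parents_completed)."""
    queue = deque(("parent", p) for p in range(num_parents))
    pending = {}   # parent id -> number of children still unfinished
    blocked = 0    # workers currently held by waiting parents
    completed = 0

    while queue:
        free = num_workers - blocked
        if free == 0:
            # Every worker is a parent waiting on children that can never run.
            return "deadlock", completed
        # Each free worker takes one task per round.
        for _ in range(min(free, len(queue))):
            kind, pid = queue.popleft()
            if kind == "parent":
                pending[pid] = children_per_parent
                queue.extend(("child", pid) for _ in range(children_per_parent))
                blocked += 1
            else:
                pending[pid] -= 1
                if pending[pid] == 0:
                    blocked -= 1   # the parent can finish and free its worker
                    completed += 1
    return "done", completed


print(simulate(num_workers=4, num_parents=10, children_per_parent=100))
print(simulate(num_workers=4, num_parents=3, children_per_parent=100))
```

With more parents than workers, the first call reports a deadlock before a single parent completes; with fewer parents than workers, at least one worker stays free and everything drains.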

Immediate action: How did we find out about the incident and what actions did we take?

Thankfully, we had previously set up some internal testing infrastructure which informs us of webhook delays, tracks uptime for HTTP requests, and provides other analytics. Shortly after our customer's large requests were submitted, our testing powerhouse (Terra Thor, built from scratch in-house!) notified us on Discord about large webhook delays in the API.

After some time, when we realised this wasn't just a bug in the testing framework, we immediately went into crisis mode: what is the best thing we can do to resolve the problem right now? Working out how to stop it from happening again could come afterwards.

We spent 5 minutes diagnosing what the problem was, and another 5 understanding what the potential solutions were:

1. Clear the entire queue, and ask the developer to submit the query again once we had upgraded the infrastructure.

Drawback: this would also clear other developers' requests, and severely hinder them if, for example, they had authentication webhooks which weren't sent through in the end.

2. Spawn a large number of workers to hopefully clear the tasks after some time.

Drawback: we couldn't know how long this would take, and devs could be waiting for their webhooks for a while.

We deliberated and felt that clearing the queue would be far too costly, and would cause trouble for hundreds of devs relying on the API. We chose option 2, and the queue was then slowly cleared over around 1–2 hours.

Black box approach: Learnings & Changes to be made

Learning from mistakes and iterating as we go are two of our fundamental guiding principles, so this was the perfect opportunity to revisit our webhook infrastructure.

First, we gave chunking parent tasks their own separate queue. This ensures that they cannot block other requests from completing, as had just happened.
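As a rough illustration of why a separate queue helps, the sketch below models two worker pools: one that only runs parent tasks and one that only runs week-long sub-tasks. The function name `simulate_split` and the pool sizes are assumptions for the example. Parents can still block their own workers, but the sub-task pool never waits on anything, so it always makes progress.

```python
from collections import deque


def simulate_split(parent_workers: int, child_workers: int,
                   num_parents: int, children_per_parent: int) -> int:
    """Model separate queues and pools for parents and children.
    Returns the number of parent tasks that completed."""
    parent_queue = deque(range(num_parents))
    child_queue = deque()
    pending = {}   # parent id -> number of children still unfinished
    waiting = 0    # parent-pool workers blocked on their children
    completed = 0

    while parent_queue or child_queue:
        # Parent pool: start as many parents as there are idle parent workers.
        for _ in range(min(parent_workers - waiting, len(parent_queue))):
            pid = parent_queue.popleft()
            pending[pid] = children_per_parent
            child_queue.extend([pid] * children_per_parent)
            waiting += 1
        # Child pool: never blocks, so it always drains some work each round.
        for _ in range(min(child_workers, len(child_queue))):
            pid = child_queue.popleft()
            pending[pid] -= 1
            if pending[pid] == 0:
                waiting -= 1
                completed += 1
    return completed


# Same load that starves a single shared pool of 4 workers: now all 10 finish.
print(simulate_split(parent_workers=4, child_workers=4,
                     num_parents=10, children_per_parent=100))  # 10
```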

Second, we needed a more apparent notification of urgency when something is up with the API. We created an upper threshold of webhook delay, past which we will get notified by SMS on our mobile phones, so that we can understand if something is genuinely going wrong.
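A threshold check of this kind could look roughly like the following. The threshold value, the function name `check_webhook_delay`, and the `notify_sms` callback are all hypothetical; the post does not say which SMS provider or threshold is used, so the sender is passed in as a plain callable.

```python
from typing import Callable

# Hypothetical value: the actual threshold is not stated in this post.
SMS_THRESHOLD_SECONDS = 60.0


def check_webhook_delay(delay_seconds: float,
                        notify_sms: Callable[[str], None],
                        threshold: float = SMS_THRESHOLD_SECONDS) -> bool:
    """Send an SMS alert if the measured webhook delay exceeds the threshold.

    Returns True when an alert was sent, False otherwise.
    """
    if delay_seconds > threshold:
        notify_sms(
            f"Webhook delay {delay_seconds:.0f}s exceeds {threshold:.0f}s threshold"
        )
        return True
    return False


# Usage with a stand-in sender that just records messages.
sent = []
check_webhook_delay(300, sent.append)  # above threshold: one message recorded
check_webhook_delay(5, sent.append)    # below threshold: nothing sent
```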

Third, one developer's requests should never interfere with another's: a customer should not have to care what another of Terra's users is doing on their end, and has the right to expect reliable, uninterrupted service regardless. To this end, we decided it made sense to give larger developers their own set of queues, so that their tasks don't get tangled up with others'.
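One simple way to express this isolation is a routing function that maps each task to a queue name. Everything here is an assumption for illustration: the developer ids, the naming scheme, and the `queue_name` helper are invented, and the sketch also folds in the separate parent queue from the first change.

```python
# Hypothetical set of developers large enough to get dedicated queues.
DEDICATED_QUEUE_DEVELOPERS = {"dev_large_1", "dev_large_2"}


def queue_name(developer_id: str, is_chunking_parent: bool = False) -> str:
    """Pick the queue a task should be published to.

    Large developers get their own queues; everyone else shares one set.
    Chunking parent tasks are always kept apart from regular tasks.
    """
    prefix = (
        f"dev-{developer_id}"
        if developer_id in DEDICATED_QUEUE_DEVELOPERS
        else "shared"
    )
    suffix = "parents" if is_chunking_parent else "tasks"
    return f"{prefix}-{suffix}"


print(queue_name("dev_large_1"))                           # dev-dev_large_1-tasks
print(queue_name("dev_large_1", is_chunking_parent=True))  # dev-dev_large_1-parents
print(queue_name("dev_small_42"))                          # shared-tasks
```

Workers can then be assigned per queue, so a burst on one developer's queues leaves the shared queues untouched.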

While speed of implementation is one of our core values, we need to be able to react to situations in which we need to think of scalability, and adapt to our customers' needs. We learn, grow, and document our learnings so that we can build on top of robust foundations.
