Why I Rebuilt Our Entire Platform on Microservices — and What Breaking a Monolith Actually Looks Like

When we started DnXT Solutions, our goal was simple: build the best regulatory operations platform for life sciences. Like many startups, we began with what was practical and fast. Our first iteration was a classic monolith: a robust Java application, packaged as one big WAR file, deployed on WildFly. It worked. It got us our first few customers, then five, then ten. For a while, it was perfectly adequate, allowing us to focus on features and customer success.

But as we scaled, the cracks began to show. What started as minor annoyances quickly became existential threats to our growth and, more importantly, to our customers’ trust. At around 15 customers, with their submission volumes growing exponentially, our elegant monolith started to groan under the pressure. A single customer initiating a heavy publishing job could slow down document management for everyone else. Deployments, even minor bug fixes, meant scheduled downtime for ALL our customers. The final straw was the realization that a bug in our workflow engine, deep within the monolithic codebase, could bring down the entire platform. The thought of a single flaw impacting every single one of our life sciences clients, potentially disrupting critical regulatory submissions, was simply unacceptable.

It became clear: the foundational architecture that had served us well in the beginning was now a significant bottleneck. I made the call to rebuild. Not a minor refactor, but a complete decomposition of our monolithic regulatory platform into a microservices architecture. It was, without a doubt, the right decision for our future and for our customers. But let me tell you, the journey was harder, messier, and more complex than any architecture blog post or whitepaper could have ever prepared me for. This isn’t just a story about technology; it’s a story about hard choices, trade-offs, and the real-world grit it takes to build a resilient platform in a highly regulated industry.

The Decision: When a Monolith Becomes a Bottleneck

For those early days, our monolithic Java application was a dream. Everything was in one place. Developing a new feature meant touching related code within the same repository, compiling, and deploying. It was fast, efficient for a small team, and relatively easy to manage. We could respond quickly to customer feedback, which is crucial for early-stage product-market fit.

However, as our customer base grew and their usage patterns intensified, the inherent limitations of the monolith became glaringly obvious. Imagine a global top-20 pharma company uploading thousands of documents for an upcoming submission, while a mid-size biopharma is trying to finalize a critical eCTD publishing job. In our monolithic world, these concurrent, resource-intensive operations were all competing for the same CPU, memory, and database connections.

  • Scaling Walls: One customer’s heavy publishing job would literally slow down another customer’s document management experience. This “noisy neighbor” problem was unacceptable. We needed independent scaling for different functional areas.
  • Deployment Headaches: Every deployment, no matter how small, required a full redeployment of the entire application. This meant downtime, however brief, for ALL customers. In the life sciences, where submissions have strict deadlines and audit trails are paramount, even a few minutes of unscheduled downtime is a major issue. Our customers operate globally, meaning there’s no “off-peak” time for everyone.
  • Blast Radius of Bugs: A bug in one module, say the workflow engine, had the potential to bring down the entire platform. This single point of failure was a ticking time bomb we couldn’t ignore. We needed fault isolation.
  • Developer Velocity: Even with a well-structured monolith, the codebase grew vast. New developers faced a steep learning curve, and even experienced team members found themselves navigating a massive repository, increasing the risk of unintended side effects with every change.

The pain was real. Our ability to serve our growing customer base effectively and reliably was being compromised by our architecture. It wasn’t just about technical elegance; it was about business continuity, customer satisfaction, and our reputation in a demanding industry. That’s when I knew we had to embrace a microservices architecture for our regulatory platform.

“The pain was real. Our ability to serve our growing customer base effectively and reliably was being compromised by our architecture. It wasn’t just about technical elegance; it was about business continuity, customer satisfaction, and our reputation in a demanding industry.”

Our Microservices Architecture: Building for Regulatory Scale

The decision to decompose was just the beginning. The next step was designing what that new architecture would actually look like. We didn’t just blindly follow trends; we made pragmatic choices based on our team size, our specific regulatory requirements, and our desired outcomes.

Here’s what we actually built:

  • Independent Services: We broke down our monolith into 8+ distinct, independent services, each owning a specific domain and responsible for a set of related functionalities. These included:
    • db-service: Handles all core data persistence and retrieval, acting as the single source of truth for structured data.
    • auth-service: Manages user authentication, authorization, and single sign-on.
    • tenant-service: Responsible for managing tenant-specific configurations and metadata.
    • audit-service: Dedicated to capturing and maintaining a tamper-proof audit trail for all critical actions, a non-negotiable for 21 CFR Part 11 compliance.
    • search-service: Powers all search capabilities, indexing documents and metadata for fast retrieval.
    • workflow-service: Orchestrates complex regulatory workflows, approvals, and task assignments.
    • file-management-service: Handles secure storage, versioning, and retrieval of all regulatory documents and files.
    • ai-gateway: Our dedicated service for integrating AI/ML capabilities, like automated document classification or content extraction.
  • Domain Ownership & Independent Deployment: Each of these services owns its domain, has its own API, and, critically, can be deployed, updated, and scaled independently. This was a massive win for reliability and velocity. A bug fix in the auth-service no longer requires redeploying the entire platform, nor does it impact the workflow-service.
  • Azure Container Apps for Orchestration: We chose Azure Container Apps for orchestrating our microservices. Why not Kubernetes? Simply put, our team is relatively small. While Kubernetes is powerful, its operational complexity and steep learning curve weren’t justified for us. Azure Container Apps gives us the auto-scaling capabilities, traffic splitting, and serverless container execution we needed, without the heavy operational overhead of managing a full-blown K8s cluster. It allowed us to focus on building the product, not managing infrastructure.
  • Shared Oracle Database with Tenant Isolation: Data integrity and isolation are paramount in a regulatory platform serving multiple life sciences companies. We opted for a shared Oracle database, but with strict tenant isolation at the schema level. Our GLB_DATA platform layer ensures that each customer’s data resides in its own logical schema, providing robust segregation while allowing us to manage a single database instance efficiently. This was a critical design choice for multi-tenancy in a regulated environment.
  • Inter-Service Communication via REST: We considered event-driven architectures with message queues for inter-service communication. Ultimately, we decided on synchronous RESTful APIs. For our team size and the nature of our interactions (often requiring immediate responses and clear request-response patterns), REST was simpler to implement, debug, and trace. While it introduces some coupling, the explicit call stack made troubleshooting much more straightforward when a request spanned multiple services.
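The schema-level tenant isolation described above can be sketched in a few lines. This is an illustration, not our GLB_DATA layer: the tenant IDs, schema names, and `TenantSchemaRouter` class are hypothetical, though the statement it builds is Oracle's standard way to scope a session to one schema.

```java
import java.util.Map;

public class TenantSchemaRouter {
    // Hypothetical mapping from tenant ID to its logical Oracle schema;
    // in a real system this would come from tenant-service metadata.
    static final Map<String, String> SCHEMAS = Map.of(
            "acme-pharma", "ACME_DATA",
            "bio-labs", "BIOLABS_DATA");

    /** Builds the session statement that scopes all queries to one tenant's schema. */
    static String currentSchemaStatement(String tenantId) {
        String schema = SCHEMAS.get(tenantId);
        if (schema == null) throw new IllegalArgumentException("Unknown tenant: " + tenantId);
        // Whitelist the schema name: never interpolate raw input into SQL.
        if (!schema.matches("[A-Z][A-Z0-9_]*")) throw new IllegalArgumentException("Bad schema name");
        return "ALTER SESSION SET CURRENT_SCHEMA = " + schema;
    }

    public static void main(String[] args) {
        System.out.println(currentSchemaStatement("acme-pharma"));
        // → ALTER SESSION SET CURRENT_SCHEMA = ACME_DATA
    }
}
```

Issued once when a connection is checked out for a tenant's request, this keeps every subsequent unqualified table reference inside that tenant's schema.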

This new architecture immediately addressed many of our pain points. We could scale specific services independently, ensuring that a heavy publishing job only taxed the file-management-service and workflow-service, without impacting document browsing or authentication. Deployments became non-disruptive, and the blast radius of any potential bug was significantly reduced. The rebuilt platform was designed for resilience and scalability, meeting the stringent demands of the life sciences industry.

The Hard Parts No One Mentions: Real-World Microservices Challenges

While the architectural vision was clear, the actual execution of decomposing a monolith into a microservices architecture was a brutal education. This is where the rubber met the road, and we learned some hard lessons that architecture diagrams rarely convey.

  • Data Consistency Across Services: This was, without a doubt, one of the trickiest challenges. In a monolith, a single database transaction ensures atomicity. With microservices, where different services might own different pieces of data, achieving consistency becomes incredibly complex. For example, when a document is uploaded, the file-management-service needs to store the file, the db-service needs to record its metadata, the search-service needs to index its content, and the audit-service needs to log the upload event. If any one of these fails, what happens? Distributed transactions, in my experience, are largely a myth in practice – too complex, too slow, and often not fully supported. We had to embrace eventual consistency and implement robust retry mechanisms and compensating transactions. It meant designing for failure at every step and building complex saga patterns to ensure that if one step failed, we could either retry or gracefully roll back related operations. This is a level of complexity that simply doesn’t exist in a monolith.
  • Debugging in Production: In a monolithic application, when a user reports an issue, you can often trace the request path through a single log file or debugger. With a microservices architecture, a single user request might touch four, five, or even more independent services. Tracing an issue becomes exponentially harder. A request starts at the API Gateway, goes to auth-service, then db-service, then workflow-service, then file-management-service, and so on. To tackle this, we invested heavily in observability. We built correlation IDs into every single request. These unique IDs are passed from service to service, allowing us to stitch together the entire journey of a request across all services in our logging and monitoring tools. Without this, debugging in production would be a nightmare.
  • Environment Variable Management: This sounds trivial, but it quickly became configuration hell. Imagine 8 services, each requiring a dozen or more environment variables for database connections, API keys, service endpoints, logging levels, and more. That’s nearly 100 environment variables to manage across development, staging, and production environments. One wrong variable, a typo, or an outdated value, and a service silently fails or behaves unexpectedly. We moved from simple .env files to a centralized configuration management system (Azure App Configuration) to manage these variables dynamically and consistently, but the initial pain of managing this manually was significant.
  • The Team Size Problem: Microservices often assume larger, specialized teams where one team might own a single service or a small group of related services. We’re a lean, focused team. With 4 developers, each person ended up owning 2 or more services. This meant constant context switching, a broader knowledge domain for each developer, and a higher cognitive load. While it fostered full-stack ownership, it also meant that deep expertise in any single service could be diluted. It’s a trade-off that small teams embarking on a microservices journey need to be acutely aware of.
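The compensating-transaction idea above can be made concrete with a minimal saga sketch. The step names mirror the document-upload example; the `UploadSaga` class and `step` helper are hypothetical, and real compensations would call the owning services' undo endpoints rather than no-ops.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class UploadSaga {
    interface Step {
        String name();
        void execute() throws Exception;
        void compensate(); // undo a previously successful execute()
    }

    /** Runs steps in order; on failure, compensates completed steps in reverse. */
    static List<String> run(List<Step> steps) {
        Deque<Step> done = new ArrayDeque<>();
        List<String> log = new ArrayList<>();
        for (Step step : steps) {
            try {
                step.execute();
                done.push(step);
                log.add("ok:" + step.name());
            } catch (Exception e) {
                log.add("failed:" + step.name());
                while (!done.isEmpty()) {
                    Step s = done.pop();
                    s.compensate();
                    log.add("compensated:" + s.name());
                }
                break;
            }
        }
        return log;
    }

    // Test helper: a named step that optionally fails.
    static Step step(String name, boolean fails) {
        return new Step() {
            public String name() { return name; }
            public void execute() throws Exception {
                if (fails) throw new Exception(name + " failed");
            }
            public void compensate() { /* call the owning service's undo endpoint here */ }
        };
    }

    public static void main(String[] args) {
        // Simulate the search-service failing mid-upload; auditLog never runs.
        System.out.println(run(List.of(
                step("storeFile", false),
                step("recordMetadata", false),
                step("indexContent", true),
                step("auditLog", false))));
    }
}
```

Note the compensations run in reverse order of execution, so the system unwinds to its pre-upload state rather than being left half-written.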
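The correlation-ID approach is simple at its core. This sketch uses the common `X-Correlation-Id` header convention (a convention, not a standard); the `CorrelationId` class is illustrative: reuse the inbound ID when one arrives, mint a fresh one at the edge of the system.

```java
import java.util.Map;
import java.util.UUID;

public class CorrelationId {
    static final String HEADER = "X-Correlation-Id";

    /** Reuse the caller's correlation ID if present; otherwise mint a new one. */
    static String resolve(Map<String, String> incomingHeaders) {
        String id = incomingHeaders.get(HEADER);
        return (id == null || id.isBlank()) ? UUID.randomUUID().toString() : id;
    }

    public static void main(String[] args) {
        String id = resolve(Map.of());          // edge of the system: new ID minted
        System.out.println(HEADER + ": " + id); // attach to every log line and outbound call
        String same = resolve(Map.of(HEADER, id)); // downstream service: ID reused
        System.out.println("propagated: " + same.equals(id));
    }
}
```

In practice this logic lives in a servlet filter or HTTP client interceptor, so every log statement and every outbound request carries the ID without individual handlers having to remember it.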
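One cheap defense against silent misconfiguration, whatever the configuration store, is validating required variables at startup and failing loudly. A minimal sketch with hypothetical variable names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RequiredConfig {
    /** Returns the names of required variables that are missing or blank. */
    static List<String> missing(Map<String, String> env, List<String> required) {
        List<String> absent = new ArrayList<>();
        for (String name : required) {
            String value = env.get(name);
            if (value == null || value.isBlank()) absent.add(name);
        }
        return absent;
    }

    public static void main(String[] args) {
        // Demo with a literal map; a real service would pass System.getenv()
        // and throw IllegalStateException if the result is non-empty.
        Map<String, String> env = Map.of("DB_URL", "jdbc:oracle:thin:@//db:1521/ORCL");
        System.out.println("missing: " + missing(env, List.of("DB_URL", "DB_USER")));
    }
}
```

A service that dies at startup with a named list of missing variables is far easier to operate than one that limps along and fails on its first real request.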

Lessons for Founders: When (and How) to Decompose

Having lived through the trenches of this transformation, I have some strong opinions and hard-won advice for other founders, especially those in regulated industries like life sciences:

  • Don’t Decompose Until the Pain Is Real: This is my number one piece of advice. Premature microservices is worse than a well-structured monolith. If your monolith is still serving your customers well, if you’re not hitting significant scaling or deployment bottlenecks, don’t rush into microservices just because it’s the “trendy” architecture. A well-designed monolith with clear boundaries and modules can scale remarkably far. Decompose when the business pain is undeniable, when your existing architecture is actively hindering your growth or reliability.
  • Start with the Services That Have the Most Independent Scaling Needs: Don’t try to break everything apart at once. Identify the parts of your application that experience the most unpredictable load or have the most distinct scaling requirements. For us, the ai-gateway (which could have bursts of heavy processing) and the file-management-service (which handles large file uploads and downloads) were prime candidates. Extracting these first allowed us to gain experience with microservices patterns and tools without overhauling the entire system.
  • Invest in Observability BEFORE You Decompose: I cannot stress this enough. Before you even write the first line of microservice code, ensure you have robust logging, monitoring, and tracing in place. You will need it. Without centralized logs (like Azure Monitor with Log Analytics), metrics (Prometheus/Grafana), and distributed tracing (OpenTelemetry), you will be flying blind when issues arise across multiple services. This foundational investment is non-negotiable for a successful microservices migration, especially on a regulated platform.
  • Keep Services Coarse-Grained: The temptation is to break things down into the smallest possible units. Resist this urge, especially with a small team. Fine-grained microservices, where a single business capability is spread across many tiny services, is a recipe for maintenance hell. Each service comes with its own deployment pipeline, configuration, and operational overhead. Aim for coarse-grained services that encapsulate a significant business domain. It’s easier to split a larger service later than to merge many tiny ones.

Cloud-Native in a Regulated World: Beyond the Hype

Operating in the life sciences space adds another layer of complexity to any architectural decision. Regulatory compliance, particularly with standards like 21 CFR Part 11, doesn’t care about your chosen architecture; it cares about the outcomes.

  • 21 CFR Part 11 and Architecture Agnosticism: 21 CFR Part 11 doesn’t mandate a specific architecture. It mandates robust audit trails, strict access controls, data integrity, and secure electronic signatures. Our microservices architecture, with a dedicated audit-service and auth-service, actually made it easier to build and enforce these requirements consistently across the platform. Each service is designed with security and auditability in mind, ensuring that every critical action is logged, attributed, and protected.
  • Multi-Tenancy with Data Isolation is Non-Negotiable: In pharma, data segregation between clients is not just a best practice; it’s a fundamental requirement. Our decision to use a shared Oracle database with tenant isolation at the schema level was a direct response to this. It offers the operational efficiency of a shared database while providing the strong logical separation and security that our regulated customers demand.
  • Azure Container Apps: Auto-Scaling Without the K8s Complexity Tax: Choosing Azure Container Apps was a strategic decision to embrace cloud-native benefits without incurring the significant operational complexity and cost of Kubernetes for a smaller team. It gives us automatic scaling of services based on demand, integrated HTTPS endpoints, and event-driven capabilities, all managed by Azure. This allows us to deliver a highly available and scalable regulatory platform that meets peak demands without requiring a large DevOps team dedicated solely to Kubernetes management. It’s the sweet spot for delivering enterprise-grade performance in a lean operation.
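One common way to make an audit trail tamper-evident, in the spirit of the audit-service described above, is hash chaining: each entry's hash covers its content plus the previous entry's hash, so any after-the-fact edit breaks verification. This is an illustrative sketch, not our production design; the `AuditChain` class and entry format are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

public class AuditChain {
    record Entry(String action, String prevHash, String hash) {}

    final List<Entry> entries = new ArrayList<>();

    static String sha256(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(d);
        } catch (Exception e) { throw new IllegalStateException(e); }
    }

    /** Appends an action, chaining its hash to the previous entry's hash. */
    void append(String action) {
        String prev = entries.isEmpty() ? "genesis" : entries.get(entries.size() - 1).hash();
        entries.add(new Entry(action, prev, sha256(prev + "|" + action)));
    }

    /** True when every entry's hash still matches its contents and predecessor. */
    boolean verify() {
        String prev = "genesis";
        for (Entry e : entries) {
            if (!e.prevHash().equals(prev) || !e.hash().equals(sha256(prev + "|" + e.action()))) return false;
            prev = e.hash();
        }
        return true;
    }

    public static void main(String[] args) {
        AuditChain chain = new AuditChain();
        chain.append("upload:doc-1");
        chain.append("approve:doc-1");
        System.out.println("intact: " + chain.verify());
        // Rewriting an entry's action invalidates the whole chain:
        Entry e = chain.entries.get(1);
        chain.entries.set(1, new Entry("approve:doc-2", e.prevHash(), e.hash()));
        System.out.println("after tamper: " + chain.verify());
    }
}
```

Hash chaining alone is not Part 11 compliance, but it turns "trust the database row" into "detect any rewrite", which pairs well with append-only storage and per-action attribution.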

The journey from a monolithic regulatory platform to a resilient, scalable microservices architecture on Azure was transformative. It wasn’t easy, and we learned some tough lessons along the way. But the result is a platform that can truly scale with the demands of the life sciences industry, offering unparalleled reliability, performance, and the agility to innovate rapidly. This experience has deeply informed not just how we build our own products, but also how we advise our consulting clients on their own architectural transformations.

See Our Architecture in Action

Curious to learn more about how our microservices architecture powers DnXT’s regulatory platform? We’d be happy to walk you through a detailed demonstration and discuss how these architectural principles can apply to your own challenges. Contact us today to schedule a consultation.

About DnXT Solutions

DnXT Solutions provides cloud-native eCTD publishing, review, and regulatory compliance tools for life sciences companies. With 340+ submissions published and 20+ customers, DnXT is the regulatory platform purpose-built for speed and accuracy.