A Summary of Fundamentals of Software Architecture: An Engineering Approach

7/12/2021

The tech industry has always struggled to come up with a universal job description for "Software Architect".

Probably the biggest reason for this is that most Software Architects don't follow a formalized approach to the profession. Rather, most Architects are driven by intuition formed after many years of technical and industry-specific experience.

While this intuition is invaluable in itself, without a formalized approach most architects will fail to reach their full potential.

In their excellent book Fundamentals of Software Architecture: An Engineering Approach, Mark Richards and Neal Ford meticulously describe this approach, drawing insight from industry thought leaders. Throughout the book, they put the approach to the test, using case studies and scenarios from the world of modern, Agile-driven software development.

While I highly encourage a full reading, I'll attempt to do the book some justice by summarizing the material from my perspective (aka, my "lazy book report"):

4 Dimensions of Software Architecture

What do architects actually do? Below are the 4 major aspects of software architecture. (Each one is described in more detail in the sections below.)

  • Architecture Characteristics
  • Architecture Decisions
  • Structure
  • Design Principles

Architecture Characteristics

Architecture Characteristics are the “non-functional” success criteria of the system (e.g. “availability”, “security”). They typically influence some structural aspect of the design and are important to the success of the system.

  • Definitions
    • Implicit vs Explicit Characteristics
      • Implicit: won’t show in requirements docs but are necessary (e.g. "security", "reliability", etc)
      • Explicit: in requirements docs (e.g. "user can purchase the asset once their waiver is approved")
    • Common characteristics
      • Common operational characteristics
        • Availability
        • Continuity
        • Performance
        • Recoverability
        • Reliability/safety
        • Robustness
        • Scalability
      • Common structural characteristics
        • Configurability
        • Extensibility
        • Installability
        • Leveragability/reuse
        • Localization
        • Maintainability
        • Portability
        • Supportability
        • Upgradability
      • Common cross-cutting characteristics
        • Accessibility
        • Archivability
        • Authentication
        • Authorization
        • Legal
        • Privacy
        • Security
        • Supportability
        • Usability
    • Custom characteristics: the list above is incomplete. There may be custom characteristics needed for any application/domain/project.
    • "Best" vs "Least-Worst Architecture"
      • "Least-Worst" is preferable
      • We can’t include all of these characteristics because of company limitations (ability to support, or limited time/budget) and/or serious tradeoffs (e.g. improved security may hurt performance)
  • Identifying Characteristics
    • 3 ways: domain concerns, requirements, and implicit domain knowledge (the first two are expanded here)
      • Domain concerns
        • Listen to stakeholders, find a way to translate their domain terminology into architect terminology (ie “time to market” = “agility + testability + deployability”)
      • Requirements
        • Architecture Katas: a process for drawing out the important architecture characteristics. Format below:
          • Description
            • The overall domain problem the system is trying to solve
          • Users
            • The expected number and/or types of users of the system
          • Requirements
            • Domain/domain-level requirements an architect might expect from the domain experts
          • Additional Context
            • Implicit domain knowledge from the architect/stakeholders
    • Measuring Characteristics
      • Operational
        • Determine metrics and compare expected vs real metrics
          • e.g. page load time < 500 ms: use “first contentful paint” as the metric and gather it with Lighthouse
      • Structural
        • (narrow) metrics
          • Cyclomatic complexity: the number of independent paths through a function or method, which grows with every decision point (if, loop, case); it is not a ratio to lines of code
            • Generally, a value under 5 is good, but it depends
            • Focus testing on areas with lots of complexity
          • Testability: code coverage tools
          • Deployability: measured through # of failed deployments, deployment length, etc
    • Governing: ensuring that architecture characteristics are maintained
      • Fitness Functions
        • Concept of measuring the expected vs actual outcome of architectural characteristics
          • Mechanisms can include metrics, monitors, unit testing libraries, and chaos engineering
          • e.g. measurements for coupling/modularity/code coverage added to the CI/CD pipeline
        • Developers must understand the purpose behind the fitness function before you impose it
  • Avoid “Generic Architectures” that try to include too many characteristics: this leads to excess complexity and maintainability issues over time. Instead, determine the "must-haves" (the "Least-Worst Architecture")
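The measuring and governing ideas above can be combined into a small fitness function. The sketch below is only an illustration: it approximates cyclomatic complexity with Python's standard `ast` module by counting decision points, and the names `cyclomatic_complexity`, `complexity_violations`, and `SAMPLE`, as well as the set of counted node types, are assumptions made for this example. The threshold of 5 follows the rule of thumb above.

```python
import ast

# Node types counted as decision points (a simplified, assumed rule set).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds one extra path per additional operand.
            score += len(node.values) - 1
    return score


def complexity_violations(source: str, threshold: int = 5) -> dict[str, int]:
    """Return {function name: score} for every function above the threshold."""
    violations = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = cyclomatic_complexity(node)
            if score > threshold:
                violations[node.name] = score
    return violations


SAMPLE = '''
def simple(x):
    return x + 1

def tangled(a, b, c):
    if a and b:
        for i in range(c):
            if i % 2:
                while a:
                    a -= 1
    elif c:
        return 0
    return 1
'''

if __name__ == "__main__":
    # In a CI job, a non-empty result would fail the build.
    print(complexity_violations(SAMPLE))
```

Run against `SAMPLE`, only `tangled` is reported. Wired into a CI/CD pipeline, a check like this turns the "value under 5" guideline into something the team can see and discuss rather than a rule that lives in a document.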

Architecture Decisions

These decisions form the rules for how a system should be built. They will, ultimately, direct the development teams on what is and is not allowed. An example of an architecture decision would be: "data access can only be performed in the API layer".

  • Decision Anti-patterns:
    • "Covering your assets"
      • Not making a decision out of fear
      • Ways of avoiding
        • Last responsible moment - wait until you have all of the info to make a good decision (but not later)
        • Stay in collaboration with the dev team to ensure your decision can be implemented
    • "Groundhog Day"
      • Making a decision without providing the necessary justification, which results in endless back-and-forth discussions (same thing different day)
      • To avoid:
        • The justification for a decision should highlight all technical and business value
    • "Email-Driven Architecture"
      • The justification for your decision gets lost in an email trail
      • To avoid:
        • Document the decision in a centralized place (ie Confluence)
        • Message only the people the decision affects
  • Architecturally significant
    • What decisions should architects make?
      • decisions that affect the structure, nonfunctional characteristics, dependencies, interfaces, or construction techniques
  • Architecture Decision Records - use them!
  • Architecture Risk: decisions, naturally, are influenced by risk
    • Risk Matrix for assessing risk (figure: risk matrix)
    • Risk assessment, taken from the score above (figure: risk scoring matrix)
    • Risk Storming:
      1. Work with a group to form a consensus on risk (no architect can go it alone here)
        • Come up with a high-level architecture diagram (figure: high-level architecture diagram)
      2. Send it out to the storming group. Then, each group member will individually assess the risk using the matrix methods above
      3. Meet and form a consensus on risk (figure: high-level architecture diagram with consensus on risk)
      4. Work to agree on risk mitigation measures
      5. Come up with user stories and assess the risk of them getting done
  • Rather than guessing at risk, run tests to confirm, if possible
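The risk matrix and the individual-then-consensus flow of risk storming can be sketched in a few lines. This is a hedged illustration, not the book's tooling: it assumes the common 3x3 matrix in which impact and likelihood are each rated 1 to 3 and multiplied, with assumed bands of 1-2 low, 3-4 medium, and 6-9 high. `Risk` and `needs_discussion` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One participant's assessment of one area of the architecture."""
    area: str
    impact: int      # 1 = low, 2 = medium, 3 = high (assumed scale)
    likelihood: int  # 1 = low, 2 = medium, 3 = high (assumed scale)

    def __post_init__(self) -> None:
        for value in (self.impact, self.likelihood):
            if value not in (1, 2, 3):
                raise ValueError("impact and likelihood must be 1, 2, or 3")

    @property
    def score(self) -> int:
        return self.impact * self.likelihood

    @property
    def level(self) -> str:
        # Assumed bands: 1-2 low, 3-4 medium, 6-9 high.
        if self.score <= 2:
            return "low"
        if self.score <= 4:
            return "medium"
        return "high"


def needs_discussion(assessments: list[Risk]) -> list[str]:
    """Areas where participants disagree on the risk level."""
    levels: dict[str, set[str]] = {}
    for risk in assessments:
        levels.setdefault(risk.area, set()).add(risk.level)
    return sorted(area for area, seen in levels.items() if len(seen) > 1)
```

Feeding every participant's individual assessments into `needs_discussion` gives the agenda for the consensus meeting: only the areas where the levels differ need to be talked through.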

Structure

  • The type of architecture style (or styles) the system is implemented in (such as microservices, layered, or microkernel).
  • Architecture Styles
    • Big Ball of Mud (anti-pattern)
      • System of spaghetti with no architecture
      • Changes are difficult
      • Comes from a lack of guidance and governance
    • Monolithic
      • Layered Architecture
        • Horizontal layers divided by technical concern, usually presentation, business, persistence, and database
        • Difficult to apply domain-specific changes because those changes span many layers
        • Isolation:
          • rules around which layer can communicate with other layers (ie the presentation layer can’t communicate with the data layer directly)
          • Allows for modularity in the layers
        • Pros: cost, reliability, simplicity
        • Cons: deployability, elasticity, modularity, fault tolerance
      • Pipes & Filters
        • Series of independent filters processing info through pipes (ie a workflow process)
        • Pros: cost, modularity, simplicity
        • Cons: scalability, elasticity, fault tolerance
      • Microkernel
        • “Plug-in” architecture
        • Can be layered (technical) or domain partitioned
        • Can be runtime or compile-time
          • Advantage of runtime is that new plug-ins can be added
        • Can be “monolithic” (ie plugin dll’s or packages) or distributed (ie REST)
        • Registry - the place where metadata for the plugin is stored
        • Contract - typically plugins adhere to one contract
        • Pros: deployability, modularity, cost
        • Cons: scalability, elasticity, fault tolerance
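The microkernel pieces named above (core system, registry, contract, plug-ins) fit together roughly as follows. This is a minimal in-process sketch under stated assumptions: `Plugin`, `Registry`, `Core`, `Upper`, and `Exclaim` are hypothetical names, and a real registry would usually also hold metadata such as versions or remote endpoints.

```python
from abc import ABC, abstractmethod


class Plugin(ABC):
    """The contract: every plug-in exposes a name and a process() method."""
    name: str

    @abstractmethod
    def process(self, text: str) -> str:
        ...


class Registry:
    """Stores which plug-ins are available, keyed by name."""

    def __init__(self) -> None:
        self._plugins: dict[str, Plugin] = {}

    def register(self, plugin: Plugin) -> None:
        if not isinstance(plugin, Plugin):
            raise TypeError("plug-in does not implement the contract")
        self._plugins[plugin.name] = plugin

    def get(self, name: str) -> Plugin:
        return self._plugins[name]

    def names(self) -> list[str]:
        return sorted(self._plugins)


class Core:
    """Minimal core system: it only knows the contract and the registry."""

    def __init__(self, registry: Registry) -> None:
        self._registry = registry

    def run(self, text: str, plugin_names: list[str]) -> str:
        for name in plugin_names:
            text = self._registry.get(name).process(text)
        return text


class Upper(Plugin):
    name = "upper"

    def process(self, text: str) -> str:
        return text.upper()


class Exclaim(Plugin):
    name = "exclaim"

    def process(self, text: str) -> str:
        return text + "!"


registry = Registry()
registry.register(Upper())
registry.register(Exclaim())  # could equally be registered later, at runtime
core = Core(registry)
```

For example, `core.run("hi", ["upper", "exclaim"])` passes the text through both plug-ins in order. Because the core depends only on the contract, new behavior arrives by registering another plug-in rather than by changing the core.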
    • Distributed
      • Fallacies of distributed computing
        • The network is reliable
        • Latency is zero
        • Bandwidth is infinite
        • The network is secure
        • The network topology never changes
        • There is only one administrator
        • Transport cost is zero
        • The network is homogeneous
      • Distributed logs
        • Logs spread across many services make troubleshooting transaction problems very difficult
        • Use log aggregation and tracing solutions like Splunk, AWS X-Ray, etc
      • Distributed transactions
        • Eventually consistent instead of ACID
        • Transaction Saga pattern can help with this (see Transactions under Microservices)
      • Contract Maintenance and versioning
        • Hard to enforce contracts for services since they are spread among different teams
        • Communication patterns for versioning are inconsistent and require work
      • Service-based architecture
        • Vs microservices
          • Coarser-grained services
            • Data integrity (ACID)
            • Tradeoff - changes are riskier since they could affect the entire larger service
          • Shared database
            • Allows for easier data access (joins) and ACID transactions
            • DB partitioning and changes are harder
        • UI is a separate layer
        • Pros: deployability, fault tolerance, reliability, testability
        • Cons: elasticity, scalability
      • Event-Driven architecture
        • Using events to communicate between components
        • Broker style
          • Using an event broker like SNS, SQS, or RabbitMQ
          • Cons: the async nature makes error handling and data consistency difficult
          • Pros: scalability and performance
        • Mediator
          • Handles coordination of events and controls their lifecycle and workflow (basically what AWS Step Functions does)
          • Pros/Cons: inverse of the broker style
        • Workflow Processor
          • Handles errors from normal processors (like DLQ…)
        • Preventing data loss
          • Persisted message queues (SQS)
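          • Client acknowledge mode instead of auto acknowledge - don’t dequeue the message until the consumer confirms it was processed (similar to a visibility timeout)
          • Last participant support - don’t dequeue if the last processor fails (ie DB insert)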
        • Broadcasting
          • Sending messages to multiple consumers (ie an SNS topic)
        • Request-reply
          • Handling synchronous behavior in an asynchronous system
            • Request queue takes the request
            • Response queue sends back the result
            • Caller sends to request queue and blocks until it gets a response back from the response queue
        • Request-based vs event-based
          • Use request-based for data-driven and predictable workflows (ie retrieving customer data)
          • Use event-based for flexibility and responsiveness and handling complex workflows
          • Hybrid (both request and event-driven)
            • Examples are microservices (req-response for API requests, and event-driven between the APIs)
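The request-reply pattern described above (a request queue, a response queue, and a caller that blocks) can be sketched with the standard library alone. This is an in-process illustration with assumed names (`request_queue`, `reply_queue`, `worker`, `start_worker`, `call`); a real system would use a broker, and with several concurrent callers each one would need its own reply queue or a filter on the correlation ID.

```python
import queue
import threading
import uuid

request_queue = queue.Queue()
reply_queue = queue.Queue()


def worker() -> None:
    """Consumer: reads requests and posts replies tagged with the same ID."""
    while True:
        message = request_queue.get()
        if message is None:  # sentinel used here to stop the worker
            break
        correlation_id, payload = message
        reply_queue.put((correlation_id, payload.upper()))


def start_worker() -> threading.Thread:
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def call(payload: str, timeout: float = 2.0) -> str:
    """Caller: send a request, then block until the matching reply arrives."""
    correlation_id = str(uuid.uuid4())
    request_queue.put((correlation_id, payload))
    reply_id, result = reply_queue.get(timeout=timeout)
    if reply_id != correlation_id:
        raise RuntimeError("reply does not match the request")
    return result
```

After `start_worker()`, a call such as `call("ping")` looks synchronous to the caller even though both hops are asynchronous messages. The timeout matters: without it, a lost reply would block the caller forever.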
      • Space-based architecture
        • Using in-memory data stores that sync asynchronously to a centralized database
        • Overcomes database bottlenecks, because data stores can be scaled horizontally
        • Event processors (ie web servers) -> in memory db -> virtualized hardware -> central DB
      • Microservices
        • Smaller focused services, driven by an API layer, each contained in a bounded context
        • Bounded context
          • Transactions across bounded contexts are discouraged
          • Performance is a concern, since sometimes multiple network calls are needed to assemble data
          • Determining the right granularity of the bounded context is key
            • Avoid making services too small
              • Drivers for determining the size of the bounded context: purpose, transactions, choreography
        • Data isolation is key
        • API layer
          • for handling traffic and common concerns (ie security)
          • Avoid business logic in the API layer; keep that in the bounded context
          • Typically also used for service discovery
        • Operational reuse
          • Common shared sidecar component (ie for logging & monitoring)
          • Service mesh - a common way to handle all services operationally, usually manifests in some type of dashboard
        • Frontends
          • UI usually separated from the service bounded context (for technical reasons)
          • Can be a monolithic UI or micro frontend
        • Communication (between services)
          • Synchronous (request-response ie REST)
            • Orchestrator pattern (ie BFF) - create an intermediary “service” to coordinate data between the microservices (ie a service to get Customer and Catalog info from the respective microservices)
            • Front controller pattern - one microservice directly calling others (ie Customer service calling the Catalog service)
          • Asynchronous
            • broker-based event-driven architecture with queues/topics
        • Transactions
          • Avoid transactions across multiple microservices (bounded contexts) if at all possible
            • Instead, adjust the granularity of the bounded context (make it bigger!)
          • However, if necessary, use the transaction saga pattern (for example, Step Functions in AWS)
            • This pattern can get very complex so use it sparingly
        • Pros: scalability, elasticity, and evolvability
        • Cons: performance (network calls, security checks, etc)
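To make the transaction saga pattern mentioned under Transactions a little more concrete, here is a minimal sketch under stated assumptions: each step pairs an action with a compensating action, and a failure triggers the compensations of the steps that already completed, in reverse order. `Step`, `run_saga`, and the order-processing step names are hypothetical, and real sagas also have to deal with retries, persistence, and compensations that fail.

```python
from typing import Callable

# (name, action, compensation); all three are hypothetical placeholders.
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        _, action, _ = step
        try:
            action()
        except Exception:
            for _, _, compensate in reversed(completed):
                compensate()
            return False
        completed.append(step)
    return True


log: list[str] = []


def charge_card() -> None:
    raise RuntimeError("payment declined")  # simulated failure


ok = run_saga([
    ("reserve stock", lambda: log.append("reserve stock"),
     lambda: log.append("release stock")),
    ("charge card", charge_card, lambda: log.append("refund card")),
    ("ship order", lambda: log.append("ship order"),
     lambda: log.append("cancel shipment")),
])
```

Here the second step fails, so the stock reservation is released and the shipment is never attempted. Even in this toy form, each step needs a hand-written undo, which is one reason the pattern gets complex quickly.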
    • Choosing the right architecture style
      • Decision criteria
        • The domain
        • Architecture characteristics
        • Data architecture
      • Key determinations
        • Monolith vs distributed?
        • Where should the data live?
        • Communication between services - synchronous or async? (use synchronous by default)

Design Principles

  • Guidelines for developers (ie “use async messaging between services for better performance”, or “when to use REST vs RPC”)
  • Vs an architecture decision, which is a hard-and-fast rule (“service X should use async messaging to communicate with service Y”)

"People/Soft Skills" for the Architect

Woven throughout the book are additional examples of skills required to effectively work with people at the organization when applying a structured approach. The book stresses that these skills are just as important as the harder architecture skills.

Presenting Architecture

  • Diagramming
    • No formal languages (like UML) are necessary anymore, but there are some others like C4 that might be useful
    • Usually, start at the high-level architecture topology, then drill into specific parts (ie microkernel)
  • Presenting
    • Slides vs talking: keep slides engaging and lean, don’t just read the slides, add more through talking

Working with Teams

  • Team Boundaries (Constraints)
    • Provide the correct level: too loose = confusion, too tight = frustration
  • Architect personalities
    • Bad ones
      • Control freak - too tight, controls every decision (even low-level ones meant for devs); typical of new architects who used to be devs
      • Armchair architect - too loose, stays too high-level, and does not consider implementation or collaborating with dev teams; dev team has to take up the slack and handle architecture on their own
    • Good one
      • How much control?
        • Factors determining: team familiarity (new members need a bit more control to adhere), team size (larger = more control), experience (more junior = more control), project complexity (more complex = more control), project duration (longer = more control)
      • Recognizing team warning signs
        • Pluralistic ignorance: not everyone agrees but those that don’t are afraid to speak up
        • Process loss: team coordination slows down productivity (ie merge conflicts)
        • Diffusion of responsibility: the team is too big and people don’t know who should be doing what
      • Recognizing and ensuring efficient teams with high morale will help ensure that your architecture is implemented correctly
  • Team checklists
    • Create checklists to enforce architecture compliance
    • Don’t create too many: the more you create, the more likely they are to be ignored
    • Necessary checklists
      • Developer code completion checklist (figure: example developer checklist)
      • Unit and Functional Testing Checklist
        • Examples (where applicable)
          • Special characters in text and numeric fields
          • Minimum and maximum value ranges
          • Unusual and extreme test cases
          • Missing fields
      • Software release checklist
        • Examples
          • Configuration changes in servers or external configuration servers
          • Third-party libraries added to the project (JAR, DLL, etc.)
          • Database updates and corresponding database migration scripts

Expectations of an Architect

  • Make architecture decisions
  • Continually analyze the architecture
  • Keep current with the latest trends
  • Ensure compliance with decisions
  • Diverse exposure and experience
  • Have business domain knowledge
  • Possess interpersonal skills
  • Understand and navigate politics

Negotiation

  • With stakeholders
    • Pick up on buzzwords to understand the concerns (ie “we need zero downtime” = availability & reliability, “lightning-fast” = performance)
    • Get as much information upfront as possible before entering into negotiations (ie
    • Frame arguments in understandable terms instead of buzzwords (ie “5 nines” means about 1 second of downtime per day)
    • When all else fails bring up cost and time (“5 nines” would cost 2x more than “3 nines”, is it worth it?)
    • Divide and conquer - “is 5 nines needed for all parts of the system, or just some parts?”
  • With other architects
    • Prefer demonstration over discussion - show evidence for pros to your approach
    • Don’t get heated or personal
  • With developers
    • Provide justification rather than dictating from on high (ie avoid the “Ivory Tower”)
    • Guide developers to arrive at decisions on their own (“will framework Y satisfy the security constraints?”)
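The "nines" arithmetic used when negotiating with stakeholders is easy to check. The sketch below assumes a 365-day year; `allowed_downtime` and the constants are names chosen for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumes a non-leap year


def allowed_downtime(nines: int, period_seconds: int) -> float:
    """Seconds of downtime permitted by an availability of N nines."""
    unavailability = 10.0 ** -nines  # 5 nines -> 0.00001
    return period_seconds * unavailability


if __name__ == "__main__":
    for nines in (3, 4, 5):
        per_day = allowed_downtime(nines, SECONDS_PER_DAY)
        per_year_minutes = allowed_downtime(nines, SECONDS_PER_YEAR) / 60
        print(f"{nines} nines: {per_day:.3f} s/day, {per_year_minutes:.1f} min/year")
```

Five nines works out to roughly 0.86 seconds per day (a little over 5 minutes per year), while three nines allows about 86 seconds per day. Numbers like these tend to land better with stakeholders than the buzzword itself.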

Leadership

  • 4 C’s of architecture: communication, collaboration, clarity, and conciseness.
  • Important for gaining respect as a leader for dev teams
  • Balance pragmatic with visionary, be visionary within a realistic context
  • Pragmatic aspects to consider
    • Budget constraints and other cost-based factors
    • Time constraints and other time-based factors
    • Skillset and skill level of the development team
    • Trade-offs and implications associated with an architecture decision
    • Technical limitations of a proposed architectural design or solution
  • Don’t mandate or lean on your title with devs (“have you considered...”, instead of “you need to...”)

Architectural Thinking

Whether working through aspects of the 4 dimensions or working with people at your organization, you always need to think like an Architect.

  • Architecture vs Software Design
    • There is no separation, but developers usually handle much of the design
    • Architects should help guide the development design through leadership and mentoring
    • Traditional roles created a disconnect and made the implementation of architecture problematic
  • Technical Breadth vs Depth
    • Architects need more breadth, ie being aware of multiple cloud services instead of just one
    • Breadth is stuff you know + stuff you know you don’t know
    • Focus on breadth; maintaining depth AND breadth is very hard
  • Analyzing Tradeoffs
    • Everything is a trade-off.
    • It’s important to recognize the pros and cons of each option and determine the best option based on the scenario ie “security is more important than extensibility”
  • Understanding Business Drivers (more later)
  • Balancing Architecture with Hands-On Coding
    • Avoid the “bottleneck trap” - taking ownership of code on a team’s critical path (often “framework code”) and becoming a bottleneck for the team
      • Instead, delegate and lead that work with someone else on the team
    • Staying proficient as a developer
      • Code POC’s - helps communicate & validate architectural goal (ie 2 different caching solutions)
      • Tackle tech debt
      • Build dev tools to help the team
      • Do code reviews
  • Modularity
    • A logical grouping of like functionality
    • Measuring modularity: cohesion, coupling, and connascence
      • Cohesion:
        • how related the parts are to one another. (ie “customer-related functionality only belongs in the customer module”)
          • Measure: if you separate things in this module, will remote calls be needed for communication between them?
      • Coupling
        • Metrics are very academic here --take them with a grain of salt
      • Connascence
        • When 2 components need to change together to work
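A small example may help with the idea that two components need to change together. The sketch below contrasts what the connascence literature calls connascence of position (caller and callee must agree on argument order) with the weaker connascence of name (they only agree on field names). All function and class names here are hypothetical.

```python
from dataclasses import asdict, dataclass


def create_user_positional(name, email, is_admin):
    """Callers must know the argument ORDER: reordering breaks every caller."""
    return {"name": name, "email": email, "is_admin": is_admin}


first = create_user_positional("Ada", "ada@example.com", False)


@dataclass(frozen=True)
class NewUser:
    """Callers only need the field NAMES, a weaker form of connascence."""
    name: str
    email: str
    is_admin: bool = False


def create_user(request: NewUser) -> dict:
    return asdict(request)


# Field order at the call site no longer matters.
second = create_user(NewUser(email="ada@example.com", name="Ada"))
```

Both calls produce the same result, but only the first one breaks silently if someone swaps `name` and `email` in the signature. Moving toward weaker forms of connascence is one practical way to reduce how often components must change in lockstep.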
  • Component-based thinking
    • Architect vs developer role:
      • Components are about the lowest level that architects should go when performing a role (besides some of the code-level governance stated earlier). Architects should avoid making design decisions like code patterns, function, etc without the involvement of the developer. Let the dev lead in this role with guidance from/collaboration with the architect.
    • Architecture partitioning
      • Before coming up with components, you must decide how to partition your architecture from the top level
        • Domain partitioning
          • separate top-level components by workflows and/or domains
          • Advantages
            • Models business functions closely
            • Easier to build cross-functional teams
            • Aligns with microservices & distributed architecture
          • Disadvantages
            • Customization code appears in many places
        • Technical partitioning
          • Separate components by technical capabilities (ie presentation/bus logic/data)
          • Advantages
            • Separates customization code.
            • Aligns more closely to the layered architecture pattern.
          • Disadvantages
            • More coupling (between layers and data)
            • Possible duplication of domain concepts
    • Component Identification Flow (general process but can be customized, iterative approach)
      • Identifying Initial Components
        • Start from the partitioning chosen; beyond that, the architect is free to assign domain functionality to components as needed
        • The first pass is rarely perfect, the architect must iterate to get it right
      • Assign Requirements to Components
      • Analyze Roles and Responsibilities
        • Check that component granularity matches the roles and responsibilities identified in the requirements
      • Analyze architecture characteristics
        • How do the top-level architecture characteristics affect each component? ie does the “checkout” component need more elasticity in particular?
      • Restructure components
        • Based on further analysis and feedback
    • Component Design
      • Start coarse, and then refine
      • Entity trap anti-pattern: architecting based on DB structure or CRUD
      • Actor/actions approach
        • architects identify actors who perform activities with the application and the actions those actors may perform.
        • Works well if requirements define distinct roles and actions
        • Works well for monolith or distributed
      • Event Storming
        • determine which events occur in the system based on requirements and identified roles, and build components around those event and message handlers.
      • Workflow approach
        • models the components around workflows
        • builds components around the identified activities.