Tuesday, May 4, 2021

A Software Developer’s Design Document Handbook

Written by Fuad Balashov. Contributions from Kevin Hogan and Benjamin Edgar.

© 2021: Feel free to distribute and modify with attribution. Please contact me for permission if you intend to do so for commercial purposes.

This handbook is meant for those looking to refine or formalize their software design documents and for engineers writing their first design doc. Communicating technical details and designs is a key skill that I have had to develop in the 7 years I have been an engineer. Since I switched companies and became an engineering manager I have seen others struggle and learn the same skills I did, often organically. As a result I set out to create a document that captures the lessons I learned and generalizes them so they can be used across different software engineering applications.

This document offers some guidance on approaching software design documents and then offers a reusable framework for writing them. Because it is generic, I would encourage you to copy and modify it to your domain and company.

This document doesn’t get into design principles like SOLID, nor is it a guide on how to be a good software architect. Instead it focuses on one approach to communicating software design, which will dominate your time as you become a technical contributor and leader. Note that my experience is mainly working in mid to large size companies (engineering departments of 80-400 people) with agile teams of 4-8 software engineers, as well as embedded project managers and quality engineers. Some of this document assumes that you will have access to people in these roles and other engineers to get feedback from. It is possible that you may work in a much smaller team without these resources. I have also spent most of my work experience building mobile apps and web services/apps, which no doubt influences the language and examples of the doc, though I did try to generalize it.

Finally, there are some places where I have limited the depth I go into a topic to just the high level. For example, where I cover what to document when you are making an “API change”, I don’t offer best practices on making an API change. I do my best to link to additional resources in these cases, and would also encourage readers to do their own research on these topics if they are relevant to the design they are making.

Why write this, and how to use it

Real world problems rarely offer an opportunity for one clean solution. Engineers must consider the time and costs of implementation, how to support future improvements, and the opinions and needs of the stakeholders. Moreover, it is rare that all this information is available at the start of a project. Software engineering is as much an investigative and social practice as it is an applied one.

Today, as a software engineer becomes more senior, they are expected to shoulder more of these responsibilities. For your first few years your technical skills and ability to solve technical problems are essential. As you grow, you will use these skills as a foundation to solve business problems and lead others to solve problems. Even those who pursue the most technical paths will eventually find themselves solving problems that require business context and leadership.

In school I had taken a few classes on software design and architecture. Understanding these principles is helpful when approaching code and brainstorming how to design a system. However, it was at work that I received most of the training for when and how to apply these principles.

I want to provide a framework for approaching software design that will help me and others create and communicate robust designs. Each section (numbered) after the first can be seen as a section of a design document, while each subsection (lettered) may be a part of a design document you create.

A handbook, in my mind, should be a practical document that provides continued value after its first read. To this end I would encourage you to consider copy and pasting the numbered and lettered chapters into your design documents as a starting point. I also suggest you annotate and edit this document for your own use as you learn lessons from your features and designs.

Sidenote: Be Humble - Iterate and Communicate

Approach your design as an iterative art. Lay out your initial plan and identify areas where you are missing knowledge. Use that as a blueprint for what information you need to gather and take the time to investigate and document what you learn. Going through this cycle a couple times should get you to a point where you will be ready to collaborate on the design with others. There are bound to be points of your design others will improve on. Take what they offer, weigh it against your design and see if there are improvements you can make.

The Handbook

The length of your design document should roughly correspond to the complexity of the problem you are working with. In cases where you are designing something very large you should probably break down your design into separate segments and explore/write the segments as you get to them.

Let's walk through the different parts of a design document you may include. In general I would expect to have all the top level numbered sections below but not all of the nested lettered sections. Even if the problem you are working with doesn't seem to involve one of these, I would encourage you to include them as a section in the outline of your document and drop them if they aren't relevant.

1. Reviewers and Resources

Link any resources related to your feature so it is easy for you and others to navigate between documents. Once you are ready for your document to be reviewed, tag those who you'd like to review your document and add a field for them to express whether they have reviewed and approved the document.

2. Problem Statement

In a short and sweet statement, lay out what you are trying to do and why. Make sure to explain who the users will be and how this improves their workflow.

It is important to understand why you are adding a piece of functionality. You should at least have one conversation with your product owner/manager to determine the users of the feature, and what value the feature provides to them. Beyond just communicating this to others in your design, it will enable you to make improvements and trade-offs to maximize the value-to-effort ratio.

3. Considerations

Considerations provide background on the problem you are trying to solve. They help you and readers understand the lay of the land and help determine the technical requirements of your solution.

As you'll see below, these considerations cover a wide variety of areas. Some of them may not be relevant to every problem you need to solve, and others will have different meanings based on the problem you are tackling. For example, performance of a front-end feature may require responsiveness of interactions and animations, but for an API it may entail caching or queueing parallel calls.

3a. Performance

I have learned that when I am working on a problem which involves complex logic, parallelization, or constant use, I need to consider how to make the solution performant. Make sure to keep in mind how the code/feature will be used to ensure you are hitting the right mark with performance.

If you are working on a user-facing feature, how will users interface with it? This may be a feature where they can execute multiple parallel actions (uploading files). It may also be a feature that requires regular updates of the UI, which could bottleneck the JavaScript thread of a web application.

If this is an API, do you expect users to make many concurrent calls to it? In some cases caching may be a way to address this, but in other cases you may not be able to cache the data you are returning (a silly example being the current time). In cases like these it may make sense to limit complex uncached logic in your API and plan for performance testing to validate the design will meet your needs.
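To make the caching idea concrete, here is a minimal sketch of a time-based cache in Python. The class name `TTLCache`, the `get_or_compute` method, and the freshness window are all hypothetical names I made up for illustration; a real service would more likely lean on whatever caching layer its framework or infrastructure already provides.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Caches results for a fixed number of seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the expensive call
        value = compute()
        self._entries[key] = (now, value)
        return value
```

Calling `cache.get_or_compute("report", load_report)` repeatedly within the window would only run the hypothetical `load_report` once. Data that changes on every call, like the current time, gains nothing from this and needs the performance testing described above instead.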

Finally, in some cases, you may not have enough information on your performance requirements. If this happens, you can add logs and tests for performance as a basis for conversations with PMs as the feature is developed and rolls out.

3b. Time

Having an ambitious "delivery date" for your feature is actually better than not having one at all. It gives you the ability to have a meaningful discussion with your PM about what you can deliver and determine the right level of investment for a feature. It can also help to understand the relative priority of your feature so you can judge how much time you can dedicate to it.

In this vein, don't be taken in by the belief that a quick solution is less robust than one which takes longer. The pitfalls and inner workings of a simple solution are often much easier to understand than a larger more complex one.

3c. Monetary Costs

What technologies are available to solve your problem? Make sure to read their licenses and ensure you won't be accidentally open sourcing your code or have to pay some unexpected fees by doing so. Also, consider what hardware demands your solution may impose, how those demands will scale, and if this is something you should address.

If you are planning to use an api make sure you are familiar with the terms of use and read over the cost structure. There are all types of different cost structures that you can get acquainted with here. Providing an outline of the options available, their monetary & time costs, as well as the features they offer will be a helpful process for making and communicating your decision. This could be well represented in a table or matrix.

3d. Where Should the Solution Live?

Will you put this feature into an existing app? Where will the files live? Should this be in a new repo? How will the application be deployed?

Oftentimes these questions are answered by your company's best practices. The location of files and their names are determined by the patterns you follow, and most of the time you just build on an existing application rather than creating one from scratch. As a result, this is one of the rarer items to include in a technical design, but it is worth thinking about to make sure you aren't making the wrong decision because it is the familiar one.

There are advantages to putting your logic into a new application or breaking established patterns. Say for example you realize your feature should only be accessible to a different set of users than your existing application (for example it offers some new administrator functionality). In such a case you may want to isolate that part of the UI and application logic from the rest of the app to reduce the chance of exposing the new functionality to the wrong users. Alternatively, maybe the functionality you are adding just has nothing to do with any existing functionality. If the rest of your application is bloated, then this may be a good opportunity to step back and think how you can organize your code and user experience to be easier to maintain and use.

You should also consider how your feature will interact with existing functionality. Frequently your design will need to take into account how existing systems work and ensure that you do not break any existing functionality and that the existing systems play well with your new design (if necessary).

3e. Future Proofing/Maintainability

As you work on your feature you will need to come to the right balance between long term maintenance and short development times. It can be tempting to want to build the most robust feature that any incoming developer can jump into and add onto in the future. Similarly you are likely to get pressure to finish the current feature and move to the next as soon as possible. Getting informed on the needs for the feature now and in the likely future will enable you to communicate a clear timeline and justify the level of effort to business and product people alike.

To this end, talk to your PM. Determine how many users will be using this feature, and how they expect this feature to be built on in the future. Bias toward building things in a way that will be easy to expand on to meet likely demands in the future, but avoid investments for unlikely enhancements. For example, you may realize from a discussion with your PM that a column in a table, while only holding int values now, is likely to hold strings in the near future. This can allow you to design with that in mind, and maybe support the future use case with no added cost from the get-go.

In some cases you face a complex feature which is only meant to serve a few customers. If you realize "doing the feature right" is going to be an expensive undertaking, but you can put it together hack-ily quite easily, it is worth discussing the option with your PM. Identify where you can reasonably cut corners, and discuss the future costs which will need to be paid if the feature is expanded. For example, in a couple of instances, I have omitted front ends for managing feature configurations and instead required SQL queries to do so in order to get a feature out for a market test with one or two clients. This has come with the agreement that future expansion of the feature would require additional development.

3f. Privacy

Whenever you are working with data you are likely to run into privacy concerns. If you are taking in data from a user, or processing a client's data you need to consider how the information will be transmitted and stored.

I’d suggest not taking in data you don't need, and making sure access to a user's data is limited to the user. In the cases where data may be shared, work with your product owner to find a clear way to communicate that to users. The volume and sensitivity of the information you ask for will impact people's willingness to use the flow you have built.

If you need some "sensitive data", for example a social security number, credit card number, or address, avoid storing it if possible. In the cases where you do need to store it, make sure you are following appropriate regulations for the information and anonymizing or encrypting the data at rest.
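As a rough sketch of what "avoid storing the raw value" can look like, the snippet below uses only Python's standard library. The function names and the idea of keeping a keyed digest plus the last four digits are my own illustrative assumptions, not a compliance recipe; keyed hashing is pseudonymization rather than encryption, and the right technique depends on the regulations that apply to your data.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Returns a keyed SHA-256 digest so the raw value is never stored."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_card_number(card_number: str) -> str:
    """Keeps only the last four digits for display purposes."""
    digits = [c for c in card_number if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])
```

The digest lets you check whether two records refer to the same value without keeping the value itself, as long as the secret key is stored separately from the data.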

Your industry or company may also have specific rules around how user data is stored. Concerns I have personally had to address were around the format of the data (anonymization or encryption), the location of the data (literally where the database servers were), and protocols which could be used for passing information (SSL/TLS). Large companies should provide training and guidance around this. If yours doesn't, you can look externally to see what your industry's norms are.

If you are looking to learn more about applying good privacy practices to your designs this powerpoint from Microsoft is a solid introduction.

3g. Security

Security, along with performance and privacy, was one of the most abstruse aspects of design for me. Most aspects of security are handled by the language and framework you plan to use. The .NET framework, for example, offers you protection from vulnerabilities like SQL injection and CSRF if you follow its conventions strictly. This makes it easier to develop applications, but also easier to fall into common pitfalls when you stray from the framework. For this reason, it is valuable to understand the different types of vulnerabilities, how your framework protects against them, and how they tie into your feature.

Some common pitfalls are:

  • CSRF

  • XSS

  • Input Validation

  • Serialization/Deserialization

  • Injection (SQL, XML)

  • Broken Authentication or Session Management

  • Access Control

  • Buffer Overflows

  • Logging and Monitoring

You can learn more about different common security pitfalls by reading the OWASP Top Ten or taking this free OWASP course. The most common security issues I need to write code for are access control requirements. Normally this just requires configuring specific endpoints or controllers.

Finally, I recommend thinking about security as you plan testing for the feature. For example, if you are taking in input text, you should have cases for injecting SQL, HTML, or JS scripts to see whether they are sanitized.
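Test cases like these can be quite small. The sketch below uses Python's built-in `sqlite3` and `html` modules; the `users` table and the `find_user` and `render_comment` functions are hypothetical stand-ins for whatever your feature actually does with input text.

```python
import html
import sqlite3


def find_user(conn: sqlite3.Connection, name: str):
    # The "?" placeholder lets the driver treat name as data, not SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


def render_comment(text: str) -> str:
    # html.escape converts <, >, &, and quotes into HTML entities.
    return "<p>" + html.escape(text) + "</p>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# Test cases with hostile input.
assert find_user(conn, "alice") == [(1, "alice")]
assert find_user(conn, "' OR '1'='1") == []
assert "<script>" not in render_comment("<script>alert(1)</script>")
```

If the query were built by string concatenation instead, the second assertion would fail because the injected condition would match every row.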

3h. External Dependencies

What other teams are stakeholders in solving this problem? If you will be spinning up a new application and there is another team responsible for deployments it would behoove you to call that out and to include them in the review of the document. Maybe something about this app will be novel enough that they will need to do additional research and setup.

It is also possible another developer is working on the same area of the code as you. Make sure to explain how you will avoid stepping on each other's toes and decouple (or couple) your deployments. You may also need to structure the increments of your work to merge in a specific order to prevent conflicts.

4. Proposal

Now, after feeling out the problem and the constraints you must work with, it is time to declare what you think is the best approach to solving the problem. Instinctively I want to sell my solution and get people to agree with what I put forward. And each time I write a proposal, I end up having to step back and reevaluate how I can instead get people to understand my plan while inviting criticism and discussion. That isn't to say you shouldn't have an opinion. You should; just push yourself to keep an open mind through the discussions which take place.

Make sure to provide a high level description of the design before diving into the nitty gritty. If appropriate, I would suggest providing a high level diagram of your system/feature and how it interacts with other parts of the application. Sometimes a before/after diagram can also be helpful to include. The following sections describe common sections I have included in my proposals.

4a. Data Model Changes

Clearly lay out what data models you will be adding or changing. Make sure to explain the datatype of different fields and any relationship between models. Including code snippets helps to visualize the changes and provides a second way to communicate the same information. If you are removing or modifying existing fields this could have large repercussions, so make sure to think through whether any of your changes are “breaking” and how to mitigate any effects of that.
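A snippet in a design doc does not need to be production code. As an illustration, assuming a hypothetical existing `Ticket` model that gains an optional field and a relationship to a new `Attachment` model, a sketch using Python dataclasses could look like this:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Attachment:  # new model
    id: int
    file_name: str
    size_bytes: int


@dataclass
class Ticket:  # existing model
    id: int
    title: str
    created_at: datetime
    # New, optional field: existing records without it stay valid (non-breaking).
    assignee_id: Optional[int] = None
    # New one-to-many relationship: a ticket has zero or more attachments.
    attachments: List[Attachment] = field(default_factory=list)
```

The comments carry the part reviewers care about most: which pieces are new, what their types are, and why the change should not break existing data.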

You should also explain any migration plans between data model versions, but this is covered in a later section.

4b. API changes

Describing the API changes before you get into the code changes will make it easier for reviewers to understand why changes in your code are being made and keep in mind how a specific change relates to the larger picture. A diagram could also be helpful here. Make sure to consider whether the changes you are making are “breaking” and how you can offer a smooth upgrade path for consumers of the API.
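One common way to offer that upgrade path is to keep the old response shape available while consumers move over. The sketch below assumes a hypothetical endpoint that returns a user, and a hypothetical `api_version` parameter; how versions are actually negotiated (URL, header, and so on) is a decision for your own design.

```python
from typing import Dict


def serialize_user(first_name: str, last_name: str, api_version: int = 1) -> Dict[str, str]:
    """Builds the response body for a hypothetical user endpoint."""
    if api_version >= 2:
        # v2 splits the name; consumers opt in when they are ready.
        return {"first_name": first_name, "last_name": last_name}
    # v1 keeps the original shape so existing consumers do not break.
    return {"name": f"{first_name} {last_name}"}
```

Writing the two shapes side by side in the design doc makes it obvious to reviewers which consumers are affected and when the old version can be retired.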

4c. Code Changes

This section will no doubt be divided into smaller sections relevant to your feature. This could be multiple apps in a microservice ecosystem, or your front and backend in a web app. You could also take the approach of dividing this up by different “flows” or pieces of functionality you will need to modify for your feature. I would suggest writing out a first pass design, and through that process determining a meaningful categorization of your code changes.

4d. Picking your Tools

The newer the project you are working on, the more freedom you'll have to pick new tools. It is easy to get excited and want to use the best technology for the problem, but your goal should be to instead use the best tool for you and your organization. That means you may lose some performance, but instead use a programming language which is easier to learn, or you may have to give up some nice built-in syntax for your db query language because the tool you want to use has licensing fees and you work at a non-profit.

4e. Migrations / Backwards compatibility

Migration plans are a must when you are changing a data model. They are also common when you are changing an API and need to switch consumers of the API over. Explicitly call out how you will be switching over to the new data model without impacting end users. This may require you to split up your switch to the new model by first adding the new fields and then deleting the old ones in a later follow-up unit of work.
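A staged switch like this can be sketched with Python's built-in `sqlite3` module. The `orders` table and its columns are hypothetical, and a real migration would normally go through your team's migration tooling rather than raw statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO orders (quantity) VALUES (3), (12)")

# Step 1 (first release): add the new column next to the old one.
conn.execute("ALTER TABLE orders ADD COLUMN quantity_label TEXT")

# Step 2: backfill so old rows have a value in the new column.
conn.execute(
    "UPDATE orders SET quantity_label = CAST(quantity AS TEXT) "
    "WHERE quantity_label IS NULL"
)

# Step 3 (later release, once nothing reads `quantity`): drop the old column.
# Left out here; it is a separate unit of work.

rows = conn.execute(
    "SELECT id, quantity, quantity_label FROM orders ORDER BY id"
).fetchall()
```

Between steps 2 and 3 both columns exist, so old and new versions of the code can run side by side and a rollback does not lose data.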

4f. Toggles

If you are releasing new logic you should consider toggling it so you can control when the feature becomes accessible to users and turn it off if there is a bug. Toggles aren't always necessary. For example, if I am refactoring a piece of code and have solid test coverage then I won't worry about toggling my code. Make sure to explicitly call out any toggles, what they are for, and when they will be turned on in different environments.
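To illustrate, here is a minimal sketch of a per-environment toggle. The toggle name, the environment names, and the in-code dictionary are all hypothetical; most teams would read this from configuration or a feature-flag service instead.

```python
from typing import Dict

# Hypothetical toggle table: toggle name -> environment -> enabled?
TOGGLES: Dict[str, Dict[str, bool]] = {
    "new_checkout_flow": {"dev": True, "staging": True, "production": False},
}


def is_enabled(toggle: str, environment: str) -> bool:
    """Unknown toggles or environments default to off (the safe choice)."""
    return TOGGLES.get(toggle, {}).get(environment, False)


def checkout(environment: str) -> str:
    if is_enabled("new_checkout_flow", environment):
        return "new flow"
    return "old flow"
```

A table like `TOGGLES` also doubles as documentation: it states exactly which toggles exist and where each one is on, which is the information a design document should call out.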

Side tip (for picking your tools): keeping up with industry trends via blogs, newsletters, and other reading is a good way to be aware of appropriate tools you can use.

Some questions to consider when picking from different tools are:

  • What do you know?

  • What tools are available?

  • What functionality do you need?

  • What are the costs of licensing?

In cases where you have a lot of tools or dimensions to compare you can put together a matrix to compare tools to different pieces of functionality you need. This will provide you and others with a clear way of weighing the different options and enable you to come to consensus more easily.
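A matrix like this can live in a spreadsheet, but a few lines of code show the mechanics. In the sketch below the tools, criteria, weights, and 1-5 scores are all made up for illustration; the point is only that weighting each need and summing gives you a transparent number to discuss.

```python
from typing import Dict

# Hypothetical weights: how much each need matters (higher = more important).
weights: Dict[str, int] = {"team_familiarity": 3, "features": 2, "license_cost": 1}

# Hypothetical scores from 1 (poor) to 5 (great) for each candidate tool.
scores: Dict[str, Dict[str, int]] = {
    "Tool A": {"team_familiarity": 5, "features": 3, "license_cost": 4},
    "Tool B": {"team_familiarity": 2, "features": 5, "license_cost": 2},
}


def weighted_total(tool_scores: Dict[str, int]) -> int:
    return sum(weights[criterion] * score for criterion, score in tool_scores.items())


totals = {tool: weighted_total(s) for tool, s in scores.items()}
for tool, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{tool}: {total}")
```

With these made-up numbers Tool A totals 25 and Tool B totals 18. The totals matter less than the conversation about whether the weights and scores are right.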

5. Offering Alternatives

Sometimes it can help to offer alternative approaches to your solution. If the approach you are taking has some drawbacks or seems unintuitive, alternatives can help test whether another approach would be more desirable.

For example, let's say my approach requires me to merge into two repositories at once and deploy from them at the same time. Outlining alternatives to address this may reveal I can avoid this by introducing a toggle. Or perhaps I'll realize any alternative will be prohibitively expensive and we will be more confident in the original approach. Alternatives also help spark conversation with other team-members.

6. Setup

It can help to clearly delineate the setup necessary for your feature. This can be infrastructure setup or manually entered information needed to get the feature running. Call out what needs to be input in order to make it work. Similarly you should explain if you need to make a new deployment pipeline for the feature and if networking changes are needed so your app or code can get access to data to serve its function.

7. Testing and Logging

Laying out your plans for testing during the design phase will push you to design your code for testability and increase confidence when it comes time to deploy the feature. Don't try to write out a full test plan in this document. That can be covered in a separate page which you develop with your team's quality engineer. Aim to communicate the different types of testing needed and what areas you will cover with manual and automated testing.

Try to maximize what will be unit tested and minimize how much you need to integration test. This will make it easier to perform automated validation. It is helpful to review what tests exist which you can build on, so you can estimate how much work will be needed to write new tests. Beyond just the correctness of your code, you will also need to revisit how you will validate the performance and security of your feature based on the considerations from section 3.

Alongside testing, you should also outline what information you want to log for your feature. It will be valuable to determine the stories you want to tell and outputs you will build with the logged information. The types of information you are likely to include are diagnostic, error, and usage logs. Diagnostic information would capture the performance of the feature. Error logging is pretty self-descriptive. Usage logs should be based on the type of information you and PMs want to see from the users of the feature.
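As a sketch of how the three kinds of logs might sit in code, the example below uses Python's standard `logging` module with one logger per category. The logger names, the `export_report` function, and the message formats are hypothetical choices of mine, not a required convention.

```python
import logging
import time

diagnostic_log = logging.getLogger("feature.diagnostic")
error_log = logging.getLogger("feature.error")
usage_log = logging.getLogger("feature.usage")


def export_report(user_id: int, rows: list) -> int:
    """Hypothetical feature entry point that emits all three kinds of logs."""
    usage_log.info("export_requested user_id=%s row_count=%s", user_id, len(rows))
    start = time.perf_counter()
    try:
        if not rows:
            raise ValueError("nothing to export")
        return len(rows)
    except ValueError:
        # Records the message together with the stack trace.
        error_log.exception("export_failed user_id=%s", user_id)
        raise
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        diagnostic_log.info("export_duration_ms=%.1f", elapsed_ms)
```

Keeping the categories on separate loggers makes it easier to route them to different places later, for example usage logs to a dashboard your PM reads and error logs to an alerting tool.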

8. Tasking and Parallelization

Once you have laid out your design, you will need to outline how you will implement it. To do this I would recommend you break down your work into deliverable chunks which you could implement and write tests for in a day or so. Tasks should include enough information that someone else picking them up could understand what they need to do without necessarily needing to reach out to you. The breakdown of your tasks may live inside of a ticket management system (like Jira) instead of in your design document. In this case I would recommend linking the tickets/epic in your document.

Try to look for opportunities to make work parallelizable so multiple developers can take part in the effort. Having multiple developers take part in my feature gives me more confidence in the feature and excitement to deliver. For a mid-large sized feature you can likely create 2-3 meaningful work streams, while very large projects can greatly exceed this. I have been on a large project which grew to having 20 or so work streams as devs owned the development of different UI components. Separation of frontend and backend changes is a natural division, though note that creating an API layer between the two before you start building will help with integrating the two. You can also separate by "read flows" and "write flows" within your backend. Domain experts can often give you more insight on the best way to separate your work for your feature.

In cases where the number of developers exceeds the number of work streams you can plan to have developers pair together on tasks to learn new skills and push through the project more quickly.

Going through this exercise will test your knowledge as you will have to recap your design and ensure it can be built. Don't hesitate to change your design to facilitate easier parallelization.

9. Open Questions

Even after this exercise there will be unknowns. It’ll benefit you to call out what these are and invite collaboration to solve them. Open questions may cover how to handle edge cases, question assumptions you made in your design, or dig into unknowns about the technology you will be using or interacting with. With a solid set of questions at the end of your plan you can make it easier for others to engage, and open your plan up for constructive discussion. Once you get responses to the questions and come to a conclusion, make sure to come back and update this part of the document to share the decision.

Short Stories

I added a few stories of missteps made when developing features and the conclusions I drew from the experiences. I hope this helps illustrate how to get the most out of your design exercises and documents.

Own your Approach

A couple years ago I picked up a feature where I was the only designer and developer. It seemed pretty cut and dry as another developer had outlined most of the technical requirements. As a result, it wasn't exciting to me and I just put together a design based on my predecessor's conclusions. The design got some input from others, but since I was the only one owning it and the most senior developer on the team, the input was not comprehensive.

Development was stop and go since I found myself split between writing code for the feature and helping other developers on my team. In retrospect I realize that from the start I could have used another developer on the feature to increase throughput and motivation to complete it. This is something I could have asked for, but I didn't, due to a combination of pride and a misconception of what I could request.

Once I did have the feature done, and it came time to deploy, I realized I hadn't put forward a plan for integration testing, nor did I have an understanding of the performance requirements. This made setting the app up a pain. The deployment pipeline ended up being unique from any other we had due to security requirements around the app, and this created a new workstream for another team which we should have identified during the design stage.

It was a relief to finally have the feature go live. But within a couple weeks there were complaints that some unrelated features were broken. It turns out those other "unrelated features" needed to interact with my app. Due to a miscommunication with the PM for the feature, this had been missed. Moreover, we ended up having very weird performance issues which were due to some of the host machines being misconfigured. This proved to be a time consuming investigation and the problem was eventually addressed by the team who built the deployment pipeline.

Being disengaged at the start of the feature created many challenges later on. I didn't think about how the feature would be set up and deployed, resulting in expanded timelines and increasing the risk of issues in the deployment pipeline. I also didn't challenge the original plan to just have me develop the feature and prepare for parallelization. Having to go through the process and actually answer the questions that I had assumed had already been answered would have had a notable impact on the feature I was developing.

False Confidence

I have found that when I join a new team or pick up a project in a new area, I have the tendency to underestimate the difficulty of my first features and the time needed to complete them. I am going to take a story from one of my co-workers which depicts this better than any of my own. They are an experienced developer, having worked for years in the industry and at our company for close to 9 years.

Their first feature at our company was to add an enhancement to our mobile app which seemed like less than a week of work. I actually recall them saying, "If I can't get this done this week you should fire me". These were words we never let them live down, much to their chagrin. The feature ended up being much more involved due to both the state of our code and some requirements the PM hadn't described in the ticket. By just jumping into the feature instead of discussing the requirements with the PM and the code with other developers, they created wasted work and had to readjust their timeline.

Temper your confidence with some planning and validation to ensure you are building the right thing. Though a large design document would have been overkill in this case, going through the design process and building consensus would have been valuable. Having a design handbook specific to the mobile development team would have had an especially notable impact. If they had to explicitly answer how to handle the “common issues” that others working on the app had run into, they would have realized their original approach was flawed sooner.

The Review Process

After you send your document out for review, try to be responsive. I would recommend checking the document once a day to respond to comments and make updates. If you see there is a lot of confusion from reviewers, consider setting up a meeting to review and discuss the design.

If you get feedback which conflicts with your opinions, run through your reasoning and how you came to your conclusion. You may realize you made some poorly founded assumptions around typical user behavior or performance which you can better evaluate via existing logs or some A/B testing. You may also be offered a solution which cuts out all the cool parts of your design, but drastically reduces the complexity of what you are delivering and meets your other constraints. It can hurt your pride, but it will pay off to listen and make those changes and do the additional research.

Every plan will have its drawbacks and tradeoffs. You can discuss and plan as much as you want, but there will always be alternatives and contrary opinions. You don't need to incorporate every one, but it is helpful to hear and acknowledge them.

At a certain point, you need to move forward with an approach. It will have shortcomings, and there will be lessons learned while implementing the solution. In general, you and the team need to be responsive and communicate when those issues are encountered. Let PMs know when timelines change, or work with them to adjust scope; have conversations with those consuming your service to identify the base functionality they need; ask customers if they care more about X, Y, and Z or A, B, and C.

Epilogue

Note this document is for features and small applications. There will be times when you are faced with the design of a large application, or refactoring a large amount of code. This framework is helpful, but your solution will be hard to capture in one document. I would encourage you to break up the problem you are facing, set each aside as a separate investigation, and tie together all of those using a high level design document or charter. This is not something I outlined explicitly but the lessons offered will still be applicable.

As you investigate, code, and design features, I would recommend keeping notes of what you are doing and learning. This will make it easier to track changes to your design and update your design document with them. I also like to write down my goals each time I sit down to work, so that I reevaluate what I am doing each time I step back and stay more focused, especially when the path forward is unclear.

I wish you the best as a designer and engineer. I hope the outline offered in this handbook helps you in your journey and leads you and your teams to success! This handbook isn’t yet complete; it still needs your input to be truly yours. There will be plenty of lessons you learn along the way, and I encourage you to annotate and write about them as you do.

Happy Coding,

Fuad


