A Primer on Software Engineering for Science

Breaking down SWE 101 for scientists

Aug 07, 2024

Scientists new to computer programming are often self-taught, mentored by other scientists, and usually are not trained software engineers. Most might not even know the difference between computer science, software development, and software engineering (SWE). An increase in research software is a strong indication that scientists need to start understanding which software engineering practices are appropriate.

Understanding SWE practices can help scientists be more productive and efficient in their research, in part because they can speak the language of software engineering. It also helps biologists and software developers understand each other's needs and goals when they collaborate on a new research tool. To understand how to apply SWE practices to iterate faster in the lab, we need to go back to the basics. What makes someone a software engineer? What does it mean to do software engineering? What are some fundamental SWE practices?

Computer science is the theoretical study of computation, automation, and information. Computer programming, or coding, is a practical application of computer science: programmers focus mainly on writing instructions for computers, known as code, that bring software to life.

Software is a set of instructions, written as code in a programming language, for a computer to execute. Software developers use code to guide computers to do what users need. They also maintain computer programs, tools, and applications. For a scientist, these would be programs used to support biological research (BLAST, PyMOL, GMOD, etc.). Software development is a subset of software engineering.

A software engineer is someone who applies the principles of software engineering to design, develop, maintain, test, and evaluate software. The key difference between SWEs and programmers is engineering education. Engineering is the use of scientific principles to design and build machines, structures, and other items such as software. Principles are qualitative: they help us describe why and how things happen.

In the lab, an obvious example of a scientific principle is the scientific method, which represents a set of general principles. It is a highly variable and creative process, not an automated sequence of steps to follow, and its principles must be mastered to increase productivity and enhance perspective. The scientific method is not the only way to do things, but it is the best-known way to make discoveries that are free from religious, political, or philosophical bias.

In that same line of thinking, software development principles are the best-known way to design and develop software. Following certain guidelines can help scientists write clearer, more maintainable code with fewer errors. It can also help scientists be more transparent about software that was used as a part of the scientific process. Being able to reproduce research results is key, both in biological research and in software engineering.

The Software Development Cycle

The equivalent of the scientific method is the software development life cycle. Generally, it flows through the stages below:

Planning is the observation you make of a current problem. In this example, let's say you find that your bioinformatics workflow is too slow to run.
Analysis is the question you want to answer with a new or improved workflow: how can you make your workflow faster?
Design is the hypothesis that the new method you have in mind will be faster, and how you plan to test it.
Implementation is running the experiments with this new method.
Testing/integration is observing whether anything changes as you iterate on your experiments.
Maintenance is what happens once you have your conclusions/results.

Test-driven development is a specific example of a development cycle. Your test titles are your hypotheses, and each test body is the experiment. You write a test, run it and watch it fail, write new code, and run the tests again until they pass.
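Here is a minimal sketch of that loop in Python. The function `gc_content` and the test names are hypothetical, made up for illustration. In a real test-driven workflow the three tests would be written first and would fail until the function body was filled in.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence.

    Hypothetical helper, written after the tests below to make them pass.
    """
    if not sequence:
        return 0.0
    sequence = sequence.upper()
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


# Each test name states the hypothesis; each body is the experiment.
def test_half_gc_sequence_returns_one_half():
    assert gc_content("ATGC") == 0.5


def test_lowercase_input_is_treated_like_uppercase():
    assert gc_content("atgc") == 0.5


def test_empty_sequence_returns_zero():
    assert gc_content("") == 0.0
```

Assuming a test runner such as pytest is installed, saving this to a file named something like `test_gc_content.py` and running `pytest` would collect and run the three tests.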

So what exactly does a scientist need to know when developing code?

Just like there are some fundamental practices for making agar or running western blots, there are a few rules of thumb for SWE. Some fundamental practices include:

Legible, correct code
1. It’s not about how fast you write, but about how easy the code is to read and understand. This is useful if others need to debug or modify it. Keep indentation shallow, write modular code, use descriptive names, and add comments when appropriate.
Version control
1. Track your changes over time and sync them with a master copy stored on a server. Code can get lost or break, so it’s good to keep an updated backup. It’s also useful when you have multiple collaborators.
Testing code
1. Large software projects often depend on complicated testing frameworks, and testing them is more complex than just making sure the code spits out a correct output. At first, a very simple test might be enough, but as the code base grows over time, it’s critical to have a testing framework in place.
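To make the first and third practices concrete, here is a small sketch in Python. The names `COMPLEMENT`, `complement`, and `reverse_complement` are illustrative choices, not from any particular library. It shows short, modular functions with descriptive names, followed by the kind of very simple test a project might start with.

```python
# Lookup table for the four standard DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(base: str) -> str:
    """Return the complementary DNA base (A<->T, G<->C)."""
    return COMPLEMENT[base.upper()]


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(complement(base) for base in reversed(sequence))


def test_reverse_complement_of_known_sequence():
    # A very simple first test: one known input, one known output.
    assert reverse_complement("ATGC") == "GCAT"
```

Each function does one thing, so a collaborator can read, debug, or replace one piece without having to untangle the rest.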

These are just sample practices but are useful overall for scientists. Scientific software applications can get complex, with millions of lines of code. Collaborative tools and approaches are mission-critical as some applications require input from multiple stakeholders (mathematicians, biologists, or natural scientists).

Software reuse is also an important aspect. As mentioned previously, software is a means to an end for a scientist. If someone’s built an algorithm already for a specific problem, reusing code can reduce development time. Scientists can spend their time focusing elsewhere.
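As a small illustration of reuse, the sketch below counts k-mers with Python's standard-library `collections.Counter` instead of hand-writing the counting logic. The function name `count_kmers` and the example sequence are made up for this example.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a sequence."""
    kmers = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Counter already implements the tallying, so there is nothing to rewrite.
    return Counter(kmers)


counts = count_kmers("ATATAT", 2)
print(counts.most_common(1))  # prints [('AT', 3)]
```

The same idea applies at a larger scale: if a maintained package already solves the problem, building on it is usually faster than starting from scratch.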

Subaita's Notes
