Overcoming Challenges in Space Processor Emulation - with Terma
Narayan (Host):
Hi, and welcome to The Space Industry podcast by satsearch. My name is Narayan, COO at satsearch, and I'll be your host as we journey through the space industry.
The space sector is going through some seismic changes, promising to generate significant impact for life on Earth and enable humans to sustain life elsewhere in the cosmos.
At satsearch, we work with buyers and suppliers across the global marketplace, helping to accelerate missions through our online platform. Based on our day-to-day work supporting commercial activity, my aim here during this podcast is to shed light on the boots-on-the-ground developments across the globe that are helping foster and drive technical and commercial innovation.
So come join me as we delve into a fascinating, challenging, and ultimately inspiring sector.
Narayan (Host):
Hello and welcome to today's episode. We're here to discuss the challenges in space processor emulation and how to solve them with Dr. Mathias Holm, a Product Owner at Terma.
Dr. Mathias Holm holds a Master's in Computer Engineering from Chalmers in Gothenburg and worked as a Young Graduate Trainee at the Flight Software Section at ESTEC, and subsequently with flight software development in the UK.
In 2013, he completed his PhD, having done research on optimizing compilers at Leiden University. He has since been working with simulators and emulators at Terma, being involved in several operational simulators, as well as the Terma Emulator Full System Simulator.
Narayan (Host):
So Mathias, welcome to The Space Industry podcast. It's great to have you here, and we're really excited to learn about what you have to say about space processor emulators today.
Dr. Mathias Holm (Guest):
Thank you for the introduction. I'm also very grateful to be on the podcast.
Narayan (Host):
Great. So let's begin by setting the stage for, let's say, a non-expert listener who doesn't have a PhD in space emulators, by basically asking you: what do flight emulators do today? And can you give a brief overview of their role in actual flights and in the whole process of satellite development?
Dr. Mathias Holm (Guest):
Emulators are quite tricky to explain because people tend to get very low-level when they talk about these. I'll try to avoid that and keep it a bit more simple, so you don't need to be an expert in computer architecture to understand this, I hope. But basically, in simple terms, an emulator is a piece of software that simulates a microprocessor, its memory, and its peripherals. So that's basically the whole computer that you would fly in space, or put in a microwave, or wherever it may be. And it simulates these to the extent that we can run the real software, the real flight software that you would normally upload to the hardware, without any changes.
So that's the goal of these systems: to be so accurate that we can run the real software in them. So, why would we need to do this? Flight software especially tends to be built for very specialized computers. Some are radiation tolerant, with the microprocessors and memories being radiation-hardened, and so on. They also tend to have very special data buses, which is how you move data, like a USB cable or something like that, but very specialized for real-time behavior, to make sure that the software can run its control loops. So it reads from sensors, does some calculations to decide, say, whether to fire thrusters, and then gives the commands to the thrusters, and there is a timing constraint on how fast all of this needs to be done.
And another thing that we have with these systems, as I mentioned, is radiation tolerance. The radiation tolerance is not something that you need to simulate in an emulator, but it does make the hardware very expensive. So you don't really put these computers on every software engineer's desk; it would be too costly, and, you know, if someone spills a coffee on one, people would not be very happy. So we use emulators to simulate these systems instead.
And technology-wise, it's pretty much the same as when people try to play old games on new consoles. But we're obviously not trying to play Mario here; we're trying to run the flight software without the expensive hardware, or old hardware, on the desk. And it's not just for hardware cost reasons, actually, but also for convenience. As software developers, we edit code, or maybe some people generate it with AI these days, but then you have to compile it, debug it, test it. And this is just much more convenient with an emulator than with a physical box where you need to upload the binaries you have built and so on. So it adds a lot of convenience in that way.
Also, a bit different from emulators for playing Mario or Sonic is that with a flight computer emulator, we provide mechanisms to support actual software debugging. The industry uses this widely today, and it has been used for a long time. Typically, we've been building software validation facilities or operational simulators, and these all tend to be built for specific missions. But at the core of these systems, they're all emulators.
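To make the "software that simulates a microprocessor" idea concrete, here is a minimal sketch of the fetch-decode-execute loop at the heart of any emulator. The toy instruction set and program below are invented for illustration; a real flight processor model (a SPARC/LEON or RISC-V core, say) does the same thing at vastly greater fidelity, including peripherals, interrupts, and cycle counting.

```python
# Toy emulator: fetch-decode-execute over a tiny invented instruction set.

MEMORY = [0] * 256          # simulated RAM
REGS = [0, 0, 0, 0]         # simulated register file
PC = 0                      # program counter

LOAD, ADD, STORE, HALT = range(4)   # invented opcodes

# A tiny "flight program": r0 = mem[100]; r0 += mem[101]; mem[102] = r0
PROGRAM = [(LOAD, 0, 100), (ADD, 0, 101), (STORE, 0, 102), (HALT, 0, 0)]
MEMORY[100], MEMORY[101] = 2, 3

def step():
    """Execute one instruction; return False when the CPU halts."""
    global PC
    op, reg, addr = PROGRAM[PC]
    PC += 1
    if op == LOAD:
        REGS[reg] = MEMORY[addr]
    elif op == ADD:
        REGS[reg] += MEMORY[addr]
    elif op == STORE:
        MEMORY[addr] = REGS[reg]
    elif op == HALT:
        return False
    return True

while step():
    pass
print("mem[102] =", MEMORY[102])   # -> 5, computed by the emulated CPU
```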
Narayan (Host):
That's a beautifully placed analogy; I really like the way you explained it. Obviously, today, processors are becoming more and more complex, given the nature of the missions people are building, the complex payloads that are coming up, and the requirements they drive in terms of memory, processing, interfaces, and many other things. So we now have multi-core flight computers and hypervisors, which have made debugging extremely complex in the process. Can you walk us through why debugging is becoming a nightmare, and what role emulators play in managing this complexity?
Dr. Mathias Holm (Guest):
Yeah, I will. Basically, at some point in the distant past, when I started my career writing flight software, we had single-core systems; they ran at 20 megahertz with, I don't know, four megabytes of RAM. Not particularly sophisticated. The difficult part was fitting the software onto this, not necessarily developing and debugging it. But then we got to the point where we started to get more processors on the chip, so we got to multi-core, and clock rates started to increase a lot, especially lately. And the introduction of multi-core in space actually made it difficult to develop software for these systems, especially if you want to make sure the software is real-time.
So the industry decided, pretty much together, and at least in Europe it was partially driven by ESA, to introduce what are called Time and Space Partitioning hypervisors. Now, that's a very technical term, so we call them TSP hypervisors. But they basically partition time and space. Space here is not space as in orbit, but space as in memory for the software. Partitioning lets us run different applications, developed separately and at least partially independently, and control the time given to each, so we can get the real-time guarantees. We can do this over multiple processors and be sure that we run our control loops in real-time, yeah.
But what we got there was the space partitioning, and that meant that we had different applications kept separate. If you run a PC and start up Word and Excel, they're not the same program; if you crash Word, Excel doesn't go down at the same time, or vice versa. So this basically meant that it was now much more difficult to debug systems, because you have more applications in the loop. You can look at one application by itself; that's fine, that's pretty much what we always used to do. But when things start interacting with each other, when you have the full integrated suite of applications running, then you need to be able to visualize the behavior of all of these at the same time, or you will struggle to understand what is happening. And we already see this now; we see developers resorting to just logging and then looking at a two-gigabyte text file, trying to figure out what happened when and where, and whether that happened at the same time in different places, and so on. This is not an ideal situation, I would put it like that.
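The time-partitioning half of TSP can be pictured as a fixed schedule of windows repeated cyclically. The sketch below shows the idea in the style of an ARINC 653 major frame; the partition names and durations are invented for illustration, and the space-partitioning half (separate memory regions enforced by the MMU) is not modeled here.

```python
# Sketch of time partitioning as in a TSP hypervisor: a fixed "major
# frame" is divided into windows, and each partition only ever runs
# inside its own windows, so its timing is guaranteed by construction.

MAJOR_FRAME_MS = 100
SCHEDULE = [           # (partition, window length in ms), run cyclically
    ("AOCS",    40),   # attitude and orbit control loop
    ("PAYLOAD", 30),   # payload data processing
    ("TM/TC",   20),   # telemetry / telecommand handling
    ("SPARE",   10),   # reserved slack
]
assert sum(length for _, length in SCHEDULE) == MAJOR_FRAME_MS

def run_major_frames(n_frames):
    t = 0
    for _ in range(n_frames):
        for partition, length in SCHEDULE:
            print(f"t={t:4d} ms: {partition} runs for {length} ms")
            t += length

run_major_frames(2)
```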
With emulators, we have control over the simulated environment. We can add different tools, like debugging protocols, so we can connect external debuggers. But most importantly, we can stop time. And we can reverse time, either interactively, by saying "go back a line" if the emulator supports it, or by saving the state of the simulation, because it is a simulator, so we can write it to disk. If we know that we trigger a bug somewhere, we save the state a bit before we anticipate triggering it, and then we can always go back, speeding up the edit-compile-debug cycle.
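The snapshotting idea follows directly from the fact that the whole machine is just data inside the simulator. A minimal sketch, with an invented state layout standing in for a real emulator's registers, RAM, and device state:

```python
# Sketch of emulator state snapshotting: save the machine state to disk
# just before a suspected bug triggers, then restore it as often as
# needed instead of re-running from boot every time.

import pickle

state = {"pc": 0, "regs": [0] * 4, "ram": [0] * 1024, "cycles": 0}

def step(s):
    """Stand-in for executing one emulated instruction."""
    s["pc"] += 4
    s["cycles"] += 1

def save_snapshot(s, path):
    with open(path, "wb") as f:
        pickle.dump(s, f)

def load_snapshot(path):
    with open(path, "rb") as f:
        return pickle.load(f)

for _ in range(1000):
    step(state)
save_snapshot(state, "before_bug.snap")       # just before the bug region

for _ in range(42):                           # run into the buggy region...
    step(state)

state = load_snapshot("before_bug.snap")      # ...and jump straight back
print("restored at cycle", state["cycles"])   # -> 1000, no re-run from boot
```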
Now, if in an emulator we not only simulate the hardware, but also provide a way to understand the operating systems running there, and then also the actual applications running on those hypervisors and operating systems, then we will be able to provide a way to look at the complete system at the same time. So we can see the control loop application, say the AOCS, and we can see the payload processing application, and we can inspect all of these: what is the current state, is it running now, is it stopped, and so on. And we can also see how all this runs over time, together with the simulated hardware events, and get a timeline.
So someone reads from the sensors, okay, we see it coming in from the simulated hardware traffic here. Then we see the control loop doing something with it over time. And then we see something coming out on the actuators to fire the thrusters, turn on the torquers, or whatever it is, yeah.
This is what we're trying to do in the long run with the emulators: basically, to provide this way of looking at the complete system. As far as we are aware, no one has been able to do this so far. But it's where things are moving at the moment, at least.
Narayan (Host):
Yeah, that's an interesting thought, where you're essentially looking at supporting the entire lifetime of the mission, while being on the ground and trying to work with what you have in space. And you're really talking about how these contribute to mission operations as well, and how mission operations have come to rely on real-time simulation with software-based emulators.
So obviously, today we see complexity coming in with higher clock rates, more processor cores, and all of these kinds of things. The question that follows from this is: can emulators, or even simulators, actually be useful throughout the lifetime of such missions, and can they be adopted in the long run? And are they needed through the entire life of a mission? What do you think?
Dr. Mathias Holm (Guest):
Yeah, so that's a very good question. I just spoke a bit about debugging, which is the first phase, where you're developing your flight software. But as I also mentioned, we use the emulator for the operational simulation aspects too. And there, we're looking at the final software that has been deployed to the spacecraft. It's the real thing; it's done, it's developed. But we want to test the telecommand sequences before they are used on the real spacecraft. There have been a few incidents in the past where someone forgot to run their sequences through the emulator or simulator they had, and then they made a mistake. So we really want this full end-to-end testing of the sequences during operations.
Traditionally, because the operational simulator is typically also used for training, with lots of people being trained to handle the launch campaign of the spacecraft and so on, we have always had to run these in real-time. Otherwise, you don't simulate the stress of the launch, with a lot of people in the room and so on, in the correct way. For day-to-day use, though, I do not believe it strictly needs to run in real-time. Obviously, you want it to run as fast as possible, preferably even faster than real-time. But in some cases, that might not be possible, especially with the newer architectures.
But there are luckily a few routes to keeping the performance up. I already mentioned that maybe we don't need real-time in all cases, and then we can obviously relax those requirements. But we have two main variants, technically three, I guess, but one is not emulator-based, so we'll skip that for the moment.
One of these is virtualization. This is something people have often heard of, where they get a virtual machine running Linux somewhere. The other is something we call code substitution. That's a very low-level thing, but it basically means that we can intercept a piece of software being executed, or a few routines in the software, and then instead of running them in the emulator for real, we substitute a simulation model instead.
For some cases, that is quite doable. Take image processing: the software processes a camera image from, say, a star tracker. You don't really need to run this, because in the end the only thing it produces is the rotational information for the spacecraft, and you might as well just take that from the simulation infrastructure, since we already know how the spacecraft is oriented. You don't need to produce the star tracker image or anything like that. So you can save that work and just forward the result from the simulation. That means it will run faster, and if you do this at enough points, you can probably keep the real-time guarantees of the system. So that's an option.
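A minimal sketch of the code substitution idea follows, using the star tracker example. Everything here is invented for illustration: the addresses, the register conventions (register 31 as a return-address register), and the model function; a real substitution engine would hook the actual entry point of the routine in the flight binary.

```python
# Sketch of code substitution: when the emulated program counter reaches
# the entry point of an expensive routine (say, star-tracker image
# processing), the emulator skips the routine and calls a fast simulation
# model instead.

SUBSTITUTIONS = {}

def substitute(addr):
    """Register a Python model to replace the routine at 'addr'."""
    def deco(fn):
        SUBSTITUTIONS[addr] = fn
        return fn
    return deco

@substitute(0x4000_1000)                  # entry of process_star_image()
def star_tracker_model(cpu):
    # Instead of crunching a simulated camera frame, hand the flight
    # software the attitude the simulation infrastructure already knows.
    cpu["regs"][0] = 0xCAFE               # "pointer" to attitude data
    cpu["pc"] = cpu["regs"][31]           # emulate return-from-subroutine

def step(cpu):
    if cpu["pc"] in SUBSTITUTIONS:
        SUBSTITUTIONS[cpu["pc"]](cpu)     # run the model, skip real code
        return
    cpu["pc"] += 4                        # otherwise execute normally (stub)

cpu = {"pc": 0x4000_0FFC, "regs": [0] * 32}
cpu["regs"][31] = 0x4000_0800             # invented return address
step(cpu)                                 # normal step -> pc lands on hook
step(cpu)                                 # hook fires, routine substituted
print(hex(cpu["pc"]), hex(cpu["regs"][0]))
```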
I mentioned virtualization, and that's the other option: we run the emulator on the same processor family as the one we're trying to simulate. The difficulty with this is not so much running the code, but making sure that we simulate time to be at least roughly correct. Most hypervisors, the ones you get as part of, say, Linux or Windows, run what they call the guest, but the timing behavior comes from the host. That means you run too much code per unit of time, because your host machine runs at three gigahertz while the simulated target runs at five hundred megahertz. You run too much code, which means you impact the timing of the software, and that is not acceptable if we want a representative system. So that produces challenges for virtualization. It is technically possible; it's just difficult, and a lot of work to get done.
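The timing mismatch can be made concrete with a little arithmetic. In the sketch below, simulated time is advanced by target cycles rather than host wall-clock time; the clock rates and the cycles-per-instruction figure are invented round numbers, and a real timing-accurate virtualization layer would additionally have to interrupt the guest regularly to re-synchronize this budget, which is the hard part.

```python
# Sketch of the virtualization timing problem: guest code runs at host
# speed (say 3 GHz), but simulated time must advance as if the target
# ran at its own clock (say 500 MHz).

TARGET_HZ = 500_000_000       # the flight computer's clock (invented)
CPI = 1.0                     # assumed cycles per instruction (invented)

sim_time_s = 0.0

def account(instructions_executed):
    """Advance simulated time by target cycles, not host time."""
    global sim_time_s
    sim_time_s += instructions_executed * CPI / TARGET_HZ

# The host may blast through 3 billion instructions in one wall-clock
# second, but that work corresponds to six seconds of target time:
account(3_000_000_000)
print(f"simulated time advanced by {sim_time_s:.1f} s")   # -> 6.0 s
```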
Narayan (Host):
Great. What you are speaking to is essentially also increasing developer productivity at the end of the day. And obviously, when you look at what is happening in software in other sectors, there are many things people have tried out, given the scale at which those industries operate; they obviously have a lot more software engineers and a lot more scale to test at. One of those strategies is shift-left testing. For people who don't know shift-left testing, and I'm not an expert either, the gist of it is to detect and fix errors as early as possible, reducing costs and improving code quality by testing early and often, in feedback loops, with automation and in collaborating teams.
And from an emulator perspective, you also have concepts like the one you mentioned in the Terma documents I read for this podcast, the "emulator per engineer" concept, right? The heart of this comes down to increasing developer productivity and changing the way flight software is developed, compared with traditional hardware-based workflows. So what are some aspects of interest here when we look at this from the space industry's perspective?
Dr. Mathias Holm (Guest):
So, with shift-left, for those not familiar with the term, what we mean is that we can move various project phases left on the timeline. We move them earlier, and "left" means earlier because, at least for those of us who say this, we read from left to right. I know that's not the case everywhere, but that's the meaning: you have your graph, time zero starts at the left, and it goes forward to the right. Now, the industry, at least what we would call the OldSpace industry (NewSpace is more agile, of course), has traditionally been very waterfall-oriented. So we have these long projects: system requirements definitions, then a review, then we start on the software design, then that's reviewed, then we start on the software implementation, and so on. But when we need to test the software, we typically need to have the hardware available. When we start designing a spacecraft, the hardware and the software are often kicked off at the same time, but the actual testing work on the software, the real validation of everything, needs the hardware to be available.
So, when we talk about shift-left here, we basically say that an emulator can be made available earlier than the hardware. One of the reasons for this is that hardware is always a bit more challenging to develop; with software, we can much more easily provide you with an early version of it.
It might not have models of the complete hardware. Maybe something is missing; maybe you don't have some external bus you would want to use. But the system you get is at least useful for getting your software running on that subset of the hardware. Then we can provide more models of the different hardware components, and you can successively test more and more of the software, all before we actually get real hardware. So we get this iterative sort of hardware availability.
Piece by piece, we can fix issues in these models much more easily than you can on finished hardware, where an issue ends up in an errata document: there's a bug on this processor, don't do this. With models, we can at least fix the issue earlier. So it means you can start working on the software and test it before the hardware is out.
And this has been successfully used in many industries for different purposes. I know one of the BSD versions, which is an operating system like Linux, was first developed in an emulator before there was hardware available for the specific processor they wanted it to run on. So they were basically ready with the software, and as soon as the hardware came out, they loaded it up and ran it, and it was working, yeah.
So, we move this work as early as possible on the timeline, which means we can hopefully also finish everything earlier, and therefore meet our deadlines, while also adding more flexibility to the project.
Narayan (Host):
Sticking to the theme of improving productivity and learning from the traditional software engineering world, we have practices like continuous integration, which, to summarize very quickly, means frequently merging code into a central repository, and strategies like continuous delivery, which in layman's terms is about keeping the software in a state where every build is tested and ready to release on the fly.
These are some of the things the space world can take inspiration from to increase productivity, right? So how are emulators enabling these kinds of environments, CI/CD and automated testing for space missions, where hardware isn't always available? How do they behave here?
Dr. Mathias Holm (Guest):
Yeah, exactly. And this is actually one of the areas where emulator-based solutions really shine. If you build the software for the final target, whatever processor you put in the spacecraft, you need that target available to test it. So if you have a weekly build, or you build every time you add code to the system, without the hardware you get the binary, so you get a sort of continuous delivery, but it's not tested. To test it, we need the hardware, as I said. And if the hardware is not available, clearly you cannot run the tests on it.
And there are a few other issues. Even when the hardware is available, it can be very difficult to upload software to it. It might take a long time; maybe your software image is huge and you need to copy a gigabyte of code to a flash memory. That takes time, so the testing time in the CI/CD system increases.
But most importantly, if you have one piece of hardware for this, because radiation-tolerant computers are expensive, you really have a limit on how many concurrent test jobs you can run after you have built something. You can only run one test at a time; you need exclusive access, otherwise you impact the timing behavior and everything.
So that is really where these emulated solutions shine, because there's no limit on how many copies we can spin up at the same time, at least no technical limit. Then you can run the tests every time you commit code to the system: you kick off a build job, it runs a test job in the emulator automatically, and you get a test report showing pass or fail.
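A sketch of what that looks like in a CI job, fanning test suites out across parallel emulator instances. The `terma-emu` command line, its flags, and the image names are invented placeholders, not a real CLI; substitute whatever launches your emulator in batch mode with a given test image.

```python
# Sketch of emulator-based CI testing: unlike a single hardware board,
# we can spin up as many emulator instances as we like, one per suite.

import subprocess
from concurrent.futures import ThreadPoolExecutor

TEST_SUITES = ["aocs_tests.img", "payload_tests.img", "tmtc_tests.img"]

def run_suite(image):
    """Launch one emulator instance; report pass/fail from exit code."""
    result = subprocess.run(
        ["terma-emu", "--batch", "--load", image],   # hypothetical CLI
        capture_output=True, timeout=600,
    )
    return image, result.returncode == 0

with ThreadPoolExecutor(max_workers=len(TEST_SUITES)) as pool:
    for image, passed in pool.map(run_suite, TEST_SUITES):
        print(f"{image}: {'PASS' if passed else 'FAIL'}")
```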
Narayan (Host):
Thank you for taking the time today to explain all of this. I have a final question for you as a closing note to this episode, and that's more about playing moneyball for the future. The trend in the industry is towards faster, cheaper, and at the same time more complex missions. Obviously, this creates backward pressure on all the systems out there, including processors, and through them, simulators and emulators, right? So what are the biggest technical challenges you see moving ahead in this scenario, and are there any specifics you can mention about what Terma is doing to deal with them?
Dr. Mathias Holm (Guest):
Yeah, so we obviously have the general performance issues of these systems. Can we keep real-time, or how fast can we actually make the simulation of the flight computers? We discussed that a bit earlier. We get more processor cores, we get higher clock rates, and that all puts pressure on what we can actually do with the simulators or emulators.
Luckily, our own emulator already supports multi-threaded emulation of multi-core processors. So if you add more processor cores, okay, we can just add more threads in the system. It scales quite well. We're not too worried about what we would call horizontal scaling, where we get more processing units running in parallel.
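One common way to do multi-threaded multi-core emulation is quantum-based synchronization: one host thread per emulated core, with a barrier at the end of each simulated time quantum so no core drifts ahead of the others. The sketch below shows that generic technique; the details of Terma's own scheduler are not public, so treat this purely as an illustration.

```python
# Sketch of multi-threaded multi-core emulation with time quanta.

import threading

N_CORES = 4
QUANTUM_INSTRUCTIONS = 10_000   # how far a core may run before syncing
N_QUANTA = 3

barrier = threading.Barrier(N_CORES)
executed = [0] * N_CORES

def core(core_id):
    for q in range(N_QUANTA):
        # Stand-in for emulating QUANTUM_INSTRUCTIONS instructions:
        executed[core_id] += QUANTUM_INSTRUCTIONS
        barrier.wait()          # all cores agree on simulated time here
        if core_id == 0:
            print(f"quantum {q}: all cores at {executed[0]} instructions")
        barrier.wait()          # hold next quantum until report is out

threads = [threading.Thread(target=core, args=(i,)) for i in range(N_CORES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```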
The clock frequency increase is more of a challenge, because it places constraints on how we actually simulate: do we need to do virtualization, and so on? That is obviously something we're looking at for the future.
We also have other work on sophisticated features to optimize application-specific behavior. For example, programmable idle detection: we can sense, or you can specify, that the software is currently doing something useless, and then just skip ahead in time, because it has no impact on the mission. There are lots of things like that; memory scrubbing and so on are typical examples.
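The idle-skip idea can be sketched in a few lines: if the emulated program counter sits inside a user-flagged idle region, warp simulated time to the next pending event instead of grinding through the loop instruction by instruction. The addresses and event times below are invented for illustration.

```python
# Sketch of programmable idle detection with time-warping.

IDLE_LOOP = range(0x4000_2000, 0x4000_2010)   # user-flagged idle code

class Emu:
    def __init__(self):
        self.pc = 0x4000_2004                # currently in the idle loop
        self.sim_time_us = 0
        self.event_queue = [5_000, 12_000]   # next timer/IRQ times (us)

    def step(self):
        if self.pc in IDLE_LOOP and self.event_queue:
            # Nothing useful is happening: jump straight to the next
            # event instead of burning host cycles on the idle loop.
            self.sim_time_us = self.event_queue.pop(0)
            self.pc = 0x4000_0100            # invented handler address
        else:
            self.sim_time_us += 1            # normal instruction pace
            self.pc += 4

emu = Emu()
emu.step()
print(f"warped to t={emu.sim_time_us} us, pc={emu.pc:#x}")
```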
We mentioned code substitution, and that is also something we can already do, but it's not very ergonomic. It's very difficult; you need to be a super geeky person to actually understand how to program the substitution engine. We really need to improve the usability of these fairly sophisticated systems.
We already mentioned virtualization; the difficulty there is to maintain a correct notion of simulated time. We do know how to solve that, but it's a lot of work to get done. We cannot use the built-in virtualization solutions you have in Linux or wherever; we really need to look at this from a fresh perspective.
Now, the other thing we mentioned was the debugging support, especially for these systems-level software issues people are looking at; that is something we really need to solve. So we have actually started working on, firstly, a user interface to look at software on a systems level, but also to look at software over time. And hopefully, anything we do in that area to improve productivity will be a huge benefit to the end user.
Because when it comes to systems issues, we have sometimes seen that it can take a week to fix a bug, or a month, or even six months in some cases. It can be so tricky to figure out the exact reason. And in the end, you spend two months looking at an issue, and then it turns out the fix is adding two characters to the program or something like that. Not the most productive if you measure by typing, but, yeah, the difficulty is in the journey.
So, if we have something that takes six months, and we can provide tools to shave this down to five months, four months, three months, whatever it is, anything there will save a large amount of time and ultimately money for the end user. We're working on that. We have a graphical user interface for the emulator in the works for this. We never had a graphical interface before, because we integrated the emulator into other systems. We had what's called a command-line interface, where you sit and type commands. Software developers love it, but it's not necessarily the right tool when we look at systems issues. This is something we're looking at.
Another thing we're seeing is the move to other architectures. We used to have SPARC processors, at least in the European space sector, and we're now quickly moving towards new architectures like RISC-V. So this is obviously also being added. And when we look at complexity, we sometimes have missions with multiple processors, and they might be heterogeneous: a processor of one architecture, say SPARC, and then another one, a PowerPC or an ARM or a RISC-V or something like that. Many existing emulators focus on simulating just one of these. The advantage we have is that we can simulate all of them in the same process, as long as we have models for the different processor architectures in place. So that is something we're working on, along with improving how we integrate multiple such systems in the same configuration of the emulator.
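Heterogeneous co-simulation in one process can be pictured as a single event loop stepping several CPU models of different architectures in simulated-time order. The sketch below shows that scheduling idea only; the core names are just labels, the per-block timing is invented, and real architecture models would replace the toy `step_block` stubs.

```python
# Sketch of heterogeneous multi-processor emulation in a single process:
# always advance whichever core is furthest behind in simulated time, so
# both architectures share one coherent timeline.

import heapq

class Core:
    def __init__(self, name, clock_hz):
        self.name, self.clock_hz = name, clock_hz
    def step_block(self):
        """Emulate a block of 1000 instructions; return elapsed seconds."""
        return 1000 / self.clock_hz

cores = [Core("SPARC core", 250_000_000), Core("RISC-V core", 800_000_000)]

queue = [(0.0, i) for i in range(len(cores))]   # (sim time, core index)
heapq.heapify(queue)

for _ in range(6):
    t, i = heapq.heappop(queue)
    dt = cores[i].step_block()
    print(f"t={t * 1e6:8.2f} us: stepped {cores[i].name}")
    heapq.heappush(queue, (t + dt, i))
```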
Narayan (Host):
Mathias, this conversation has been really relaxing, and I guess that's because, as a Product Owner, you really come through with the information that is needed, not just from the perspective of somebody at the bottom of the pyramid doing the software engineering, but actually from that of the Product Owner. The perspectives you provided from that level were, I think, really interesting in this episode.
So thank you very much again for being a part of this episode, for giving us so many interesting nuggets of information, and for making the overall landscape easier to understand for people who are probably non-experts.
And for anybody out there, we will leave Mathias's contact details in case you want to reach out for advice or more information on the Terma Emulator itself. So thank you so much again for appearing as a guest here.
Dr. Mathias Holm (Guest):
Thank you very much. It was a great pleasure to be here. As I said initially, I hoped I wouldn't go too low-level on the details and lose people; I hope I managed to accomplish that goal at least.
Narayan (Host):
Oh, you absolutely did.
Dr. Mathias Holm (Guest):
Thank you.
Narayan (Host):
For more information on the Terma Emulator, you can go to their satsearch profile and find all the relevant information, including their data sheets and a detailed description of the emulator itself.
Narayan (Host):
Thanks for joining me today for another exciting story from the space industry. If you have any comments, feedback, or suggestions, please feel free to write to me at info@satsearch.com. And if you are looking to either speed up your space mission development or showcase your capabilities to a global audience, check out our marketplace at satsearch.com.
In the meantime, go daringly into the cosmos, until the next time we meet.
