Starship Reliability | Monte Carlo Simulation

How the simulation works

An individual launch is simulated as follows:

After launching, the booster may attempt a catch. If it does not attempt a catch, it is lost. Not attempting a catch could represent a failure after stage separation (as in Flight 2), an off-shore divert (as in Flight 6) or a last minute abort. Basically any scenario where the booster is lost, but the tower is safe. The probability of attempting a catch is defined as a percentage in Percent chance of booster catch attempt being made. If a booster catch is attempted, there are 2 possible outcomes:

The catch is successful, resulting in no lost boosters and no pads being damaged.
The catch fails, resulting in one booster being lost and one pad being damaged.

The probability of a successful booster catch is defined as a percentage in Percent chance of booster catch attempt being successful.

After launch, the ship may attempt a catch. If it does not attempt a catch, it is lost. Not attempting a catch could represent a failure during ascent (as in Flights 2, 7 and 8), a failure on-orbit that precludes re-entry (as in Flight 9), a failure during re-entry (as in Flight 3), an off-target re-entry (as in Flight 4), an abort before its flip maneuver or a last minute abort. Basically any scenario where the ship is lost, but the tower is safe. The probability of attempting a catch is defined as a percentage in Percent chance of ship catch attempt being made. If a ship catch is attempted, there are 2 possible outcomes:

The catch is successful, resulting in no lost ships and no pads being damaged.
The catch fails, resulting in one ship being lost and one pad being damaged.

The probability of a successful ship catch is defined as a percentage in Percent chance of ship catch attempt being successful.

So for a given launch, the best scenario is that no vehicles are lost and no pads are damaged, and the worst scenario is that both the booster and the ship are lost and 2 pads are damaged.
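The steps above can be sketched in a few lines of Python. This is only an illustration of the logic as described, not the page's actual code: the function names are made up, and probabilities are passed as fractions (0 to 1) rather than the percentages used in the settings.

```python
import random

def catch_outcome(p_attempt, p_success, rng):
    """Simulate one vehicle (booster or ship). Returns (vehicles_lost, pads_damaged)."""
    if rng.random() >= p_attempt:
        return 1, 0  # no catch attempt: vehicle lost, tower safe
    if rng.random() < p_success:
        return 0, 0  # successful catch: nothing lost, nothing damaged
    return 1, 1      # failed catch: vehicle lost and one pad damaged

def simulate_launch(booster, ship, rng):
    """booster and ship are (p_attempt, p_success) pairs, treated independently."""
    boosters_lost, booster_pads = catch_outcome(*booster, rng)
    ships_lost, ship_pads = catch_outcome(*ship, rng)
    return {
        "boosters_lost": boosters_lost,
        "ships_lost": ships_lost,
        "pads_damaged": booster_pads + ship_pads,
    }

if __name__ == "__main__":
    rng = random.Random()
    # Hypothetical inputs: 90% attempt / 95% success for the booster,
    # 80% attempt / 90% success for the ship.
    print(simulate_launch((0.90, 0.95), (0.80, 0.90), rng))
```

Each vehicle draws its own random numbers, which is why the worst case for a single launch is two lost vehicles and two damaged pads.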

A launch campaign is simulated as a series of individual launches (the number of launches in the campaign being specified by Number of launches), and the total number of lost boosters, ships and damaged pads is calculated for that specific launch campaign.

When you press Run Simulation, 10000 launch campaigns are simulated and their results collected. The graphs then show the distribution of results. That is, the first graph shows, out of the 10000 simulated campaigns, how many resulted in 0 pads being damaged, how many resulted in 1 pad being damaged etc. It's a pretty basic Monte Carlo simulation, intended to give you an intuitive feel for how adjusting certain variables (number of flights, success rate of tower catches etc.) makes outcomes more or less likely.
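Roughly, the campaign loop and the distribution behind the first graph could look like the sketch below. The names, the seed parameter and the use of fractions instead of percentages are assumptions for the sake of the example; the page's real implementation may differ.

```python
import random
from collections import Counter

def catch_outcome(p_attempt, p_success, rng):
    """One vehicle on one launch. Returns (vehicles_lost, pads_damaged)."""
    if rng.random() >= p_attempt:
        return 1, 0  # no attempt: vehicle lost, pad safe
    if rng.random() < p_success:
        return 0, 0  # caught
    return 1, 1      # failed catch

def simulate_campaign(n_launches, booster, ship, rng):
    """Total (boosters_lost, ships_lost, pads_damaged) over one campaign."""
    boosters_lost = ships_lost = pads_damaged = 0
    for _ in range(n_launches):
        b_lost, b_pad = catch_outcome(*booster, rng)
        s_lost, s_pad = catch_outcome(*ship, rng)
        boosters_lost += b_lost
        ships_lost += s_lost
        pads_damaged += b_pad + s_pad
    return boosters_lost, ships_lost, pads_damaged

def damaged_pad_distribution(n_launches, booster, ship, n_campaigns=10_000, seed=None):
    """Map 'number of damaged pads' -> 'number of campaigns with that result'."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_campaigns):
        _, _, pads = simulate_campaign(n_launches, booster, ship, rng)
        counts[pads] += 1
    return dict(sorted(counts.items()))

if __name__ == "__main__":
    # Hypothetical inputs: 10 launches, (p_attempt, p_success) per vehicle.
    print(damaged_pad_distribution(10, (0.90, 0.95), (0.80, 0.90)))
```

The same counting works for lost boosters and lost ships; only the value being tallied changes.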

Note: Changing the settings for Number of launch pads, Number of boosters and Number of ships does not actually change the simulation in any way. All it does is change the colours of the bars in the distribution chart. If there are zero failures, that's an ideal case so the bar is green. If there are some failures, but not enough to “run out of vehicles/pads”, that's less than ideal, but the campaign can still be completed, so the bar is orange. If there are enough failures to “run out of vehicles/pads” before the campaign completes then you can't complete the campaign and are not going beyond LEO today, so the bar is red. Purely cosmetic, you can interpret it as you see fit.
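As a sketch, the colour rule boils down to something like this. The function name is hypothetical, and I'm assuming here that "running out" means the number of failures has reached the number available; you could reasonably draw that line elsewhere.

```python
def bar_colour(failures, available):
    """Pick the bar colour for one outcome in the distribution chart.

    failures:  lost vehicles (or damaged pads) in this outcome
    available: the matching Number of boosters / ships / launch pads setting
    Assumption: reaching `available` failures counts as running out.
    """
    if failures == 0:
        return "green"   # ideal case: no failures at all
    if failures < available:
        return "orange"  # some failures, but the campaign can still be completed
    return "red"         # ran out of vehicles/pads before the campaign completes

if __name__ == "__main__":
    # With 3 pads available, colour the outcomes for 0 to 4 damaged pads.
    print([bar_colour(n, 3) for n in range(5)])
```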

Limitations and considerations

The simulation is very simple, just percentage chances of events happening. No physical simulations.
There is no distinction between an early catch abort (like the off-shore divert on Flight 6) and a last moment abort. If a catch is attempted in the simulation, there is risk to the pad. You should read Chance of attempting a catch as the probability of getting to the last moments of a catch attempt where pad damage is a possibility. Lower values here can simulate more early aborts that do not risk pad infrastructure.
There is no way to explicitly specify the probability of a failure prior to stage separation (as in Flight 1) as the booster and the ship are considered separately. However, the outcome of this scenario (both vehicles lost but no pads damaged) is a possible outcome of the simulation.
Similarly, there is no way to explicitly specify the probability of a failure on the pad (similar to AMOS-6). Again, the outcome of this scenario (ship and booster lost and one pad damaged) is a possible outcome of the simulation.
The simulation doesn't consider the success of the refueling mission. There is no way to model a scenario where the ship is unable to complete its refueling mission and the mission must be re-flown. If you want to account for this, just increase the number of launches.
The simulation does not consider the amount of damage a failed catch attempt could do to the launch infrastructure. There is no distinction between “a last second divert resulting in a crater next to the pad” and “a destroyed tower”. I want to avoid an ambiguous “damage percentage” or “days of repair needed” metric that isn't grounded in reality. You are free to interpret the nature of “Damaged Pads” however you wish.
There is no simulation of how quickly SpaceX can fix damaged pads, or replace destroyed vehicles, but as the number of available pads and vehicles is not considered by the simulation itself, this doesn't change anything.
There is no simulation of correlated or clustered failures. An issue causing several failures in a row can't be modelled here as each launch is independent. This limitation may bias results to be more optimistic than reality, as certain failure modes are excluded.
There is no simulation of learning and improvement over the course of the campaign, as the reliability probabilities are fixed over the course of a given campaign.
The booster and the ship catch attempts are independent events. While SpaceX currently has ~1 pad, they are building multiple pads at each of their launch sites, so the simulation implicitly assumes that even if the booster damages a pad, there is another one ready to catch the ship. Pad attrition is captured in the “Number of damaged pads” distribution.
There is no simulation of the time window the launches need to occur in. Fuel boil off rates, launch window constraints and turnaround time between launches are almost entirely speculative at this point, so excluding them from the simulation simplifies things massively. This simulation is focused on the risks of repeated catches to vehicle and pad infrastructure. If you want to simulate turnaround times between launches and how that impacts the number of refueling launches needed, you're in the wrong simulation.
There is no simulation of specific launches of fuel depots or the Starship departing beyond LEO. These are special cases: they involve a launch and a potential booster catch, but the ships themselves do not return to the launch pad for a catch. The simulation effectively assumes these ships are already in place at the start of the launch campaign.

Interesting things to simulate

Well, I think they're interesting

The default values for the simulation are a wild guess at a single launch campaign (e.g. sending a HLS Starship to the Moon or a single Starship to Mars). You could also simulate larger campaigns, like Musk's stated intention to send 100 Starships to Mars in the 2030 launch window or 500 in 2033. For the 500-ship case, just increase the Number of launches to 500 * number of launches needed to send a single Starship to Mars.
Tweaking the values of Chance of attempting a catch and Chance of a catch succeeding separately can allow you to simulate different scenarios:

Decreasing Chance of attempting a catch can simulate SpaceX being more cautious with catch attempts to reduce the risk of a failed catch damaging launch infrastructure.
Increasing Chance of attempting a catch can simulate a more gung-ho approach, attempting catches even when the conditions aren't ideal.
Increasing Chance of attempting a catch while increasing Chance of a catch succeeding can simulate improving overall reliability of the system.
Increasing Chance of attempting a catch while decreasing Chance of a catch succeeding simulates a case where catch attempts are inherently risky, resulting in more pads being damaged.
Setting Chance of attempting a catch to 0% effectively simulates launching in expendable mode, as there will never be a catch attempt (and thus, no risk to launch infrastructure). Really, that just means the number of lost vehicles will be equal to the number of launches. Not very interesting to visualise, but you can simulate it if you want!
Setting Chance of a catch succeeding to 100% allows you to simulate an alternate landing approach that does not risk damaging the launch pad (eg. using legs, rather than a tower catch) because this effectively sets the chance of damaging a pad to 0. You can then use the Chance of attempting a catch to set the probability of the alternative landing approach succeeding. If catches never damage the pad, then Chance of attempting a catch effectively becomes the probability of recovering the booster, with no infrastructure risk.

Decreasing Number of launches can represent improved Starship performance (lower dry mass, higher thrust, higher I_SP, increased payload to orbit etc.) as fewer flights would be required in this case. The inverse is also true. This is the key variable if you are speculating about the performance and capabilities of Starship.
Try inputting the current Falcon 9 reliability figures. Falcon 9 is currently the only operational, re-usable launch vehicle and successful landings of the Falcon 9 first stage have become routine occurrences, though failures occasionally happen, like CRS-16's 2018 offshore divert, Starlink 8-6's 2024 landing failure, Starlink 12-20's post landing failure, or Starlink 9-3's 2024 upper stage failure. Calculate the landing/catch attempt and failure probabilities as you see fit, then simulate how that would work out with a Starship launch campaign!
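If you want a quick sanity check on any of the scenarios above before running the simulation, the expected totals follow directly from the two probabilities: a vehicle survives only if a catch is attempted and succeeds, and a pad is damaged only if a catch is attempted and fails. A minimal sketch, with a made-up function name and probabilities given as fractions (0 to 1) for a single vehicle type:

```python
def expected_per_campaign(n_launches, p_attempt, p_success):
    """Expected (vehicles lost, pads damaged) for one vehicle type over a campaign."""
    p_lost = 1 - p_attempt * p_success        # lost unless attempted AND caught
    p_pad_damage = p_attempt * (1 - p_success)  # damaged only if attempted AND failed
    return n_launches * p_lost, n_launches * p_pad_damage

if __name__ == "__main__":
    n = 10  # hypothetical campaign length
    print("expendable mode:     ", expected_per_campaign(n, 0.0, 0.9))
    print("legs, no pad risk:   ", expected_per_campaign(n, 0.8, 1.0))
    print("cautious attempts:   ", expected_per_campaign(n, 0.6, 0.9))
    print("gung-ho attempts:    ", expected_per_campaign(n, 0.95, 0.9))
```

These are only averages. The simulation is still what tells you how wide the spread around them is.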