From Data to Display: A Complete Tutorial on the Kaplan-Meier Survival Curve

CLYTE research team
Nov 10

In the world of biomedical and clinical research, we often need to answer questions about time. How long until a patient relapses? How long does a new treatment extend life compared to a placebo? This is called time-to-event analysis, and the gold standard for visualizing it is the Kaplan-Meier survival curve.

This tool, first published by Edward L. Kaplan and Paul Meier in 1958, is a non-parametric method for estimating the probability of survival over time. Its main strength is how it handles the biggest challenge in clinical studies: incomplete data.

This tutorial will guide you through what a Kaplan-Meier curve is, why it's essential for handling "censored" data, how to calculate and plot it step-by-step, and how to interpret what it's telling you.

What is a Kaplan-Meier Survival Curve?

At its core, a Kaplan-Meier curve is a graph of the Kaplan-Meier estimator (also known as the "product-limit estimator"). It shows the estimated probability that an event (like death, disease relapse, or machine failure) has not yet happened at each point in time.

You read the plot like this:

Y-Axis: Represents the survival probability, starting at 1.0 (or 100%) at the beginning (Time 0), when no one has experienced the event.
X-Axis: Represents time (e.g., days, months, or years).

The line on the graph is a "step function." It stays flat for periods where no events occur and drops vertically every time one or more subjects experience the event. A steep, rapid drop suggests poor survival (a high event rate), while a flat, slow decline suggests better survival.

The Core Challenge: Understanding Censored Data

Why not just calculate the percentage of people who survived at 1, 2, and 3 years? The answer is censored data.

In a perfect study, you would follow every single subject until they all experienced the event. This is almost never possible. "Censoring" occurs when we lose contact with a subject or the study ends before they've had the event. We have incomplete information.

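Right-censoring, where we only know that a subject's event time is later than the last time we observed them, most often arises in two ways, both of which the Kaplan-Meier method is designed to handle: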

Lost to Follow-up: A patient moves to another country, stops responding to calls, or withdraws from the study. We know they survived up to the point we last saw them, but we don't know what happened after.
Study Ends: The study's funding runs out after 5 years, and many patients are still alive and event-free. We know they survived at least 5 years, but we don't know their total survival time.

Simply dropping censored subjects, or treating them as if they had experienced the event, would bias the results. The Kaplan-Meier method instead counts them as "at risk" of the event until the time they are censored, at which point they leave the at-risk pool. The curve does not drop for a censored subject, but the at-risk pool for the next event time gets smaller.
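A minimal sketch of this at-risk bookkeeping in Python. The toy dataset and the helper names `number_at_risk` and `events_at` are hypothetical, chosen only for illustration; status 1 marks an event and 0 marks a censored subject.

```python
# Hypothetical toy data: (time, status) pairs, status 1 = event, 0 = censored.
subjects = [(2, 1), (3, 1), (3, 0), (5, 1), (6, 0),
            (7, 1), (9, 1), (10, 0), (12, 1), (15, 0)]

def number_at_risk(data, t):
    """Count subjects still under observation just before time t.

    Anyone whose recorded time (event or censoring) is >= t has not yet
    left the study, so they belong to the at-risk pool at t.
    """
    return sum(1 for time, _ in data if time >= t)

def events_at(data, t):
    """Count subjects whose event (status 1) happened exactly at time t."""
    return sum(1 for time, status in data if time == t and status == 1)

# The subject censored at time 6 is at risk at times 5 and 6 but has left the pool by time 7.
for t in (5, 6, 7):
    print(f"t={t}: at risk={number_at_risk(subjects, t)}, events={events_at(subjects, t)}")
```

At time 6 the censored subject is still counted as at risk, but no event is recorded, so the curve would not drop there; by time 7 the pool has shrunk from 6 to 5.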

A Step-by-Step Tutorial: Calculating the Curve

The calculation is an iterative process of multiplying probabilities, which is why it's called the "product-limit" estimate. Let's walk through it.

Step 1: Sort Your Data

First, you need three pieces of information for every subject:

Time: The total duration they were in the "at risk" pool.
Status: A binary code: 1 = Event Occurred (e.g., death) or 0 = Censored (e.g., lost to follow-up).
Group: (Optional) The group they belong to, like "Treatment" or "Placebo".

You then sort all subjects by their time, from shortest to longest.
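One way to hold and sort these records in Python is sketched below. The `Subject` class and the sample records are hypothetical. The tie-break (events before censorings at the same time) is an assumption of this sketch that matches the usual convention that a subject censored at an event time is still at risk for that event.

```python
from dataclasses import dataclass

@dataclass
class Subject:
    time: float        # follow-up time (e.g., months)
    status: int        # 1 = event occurred, 0 = censored
    group: str = ""    # optional group label, e.g. "Treatment" or "Placebo"

# Hypothetical records, entered in arbitrary order.
records = [
    Subject(9, 1, "Placebo"), Subject(3, 0, "Placebo"), Subject(2, 1, "Placebo"),
    Subject(12, 1, "Placebo"), Subject(3, 1, "Placebo"),
]

# Sort by time, shortest first; at tied times, events (status 1) come before censorings (status 0).
records.sort(key=lambda s: (s.time, -s.status))
print([(s.time, s.status) for s in records])
```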

Step 2: Build the Survival Table

This is the engine of the calculation. You create a table that tracks the subjects at each time point at which an event occurs.

Time (t): The specific time an event happens.
Number at Risk (n): The total number of subjects who are still in the study and "at risk" just before time t.
Number of Events (d): The number of subjects who had the event at time t.
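Number Censored (c): The number of subjects who were censored at time t. By convention, subjects censored at the same time as an event are still counted in n for that event time.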
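A minimal Python sketch of building this table from (time, status) pairs. The data and the function name `survival_table` are hypothetical. Like the table described above, it only creates rows for event times; censorings at other times (6, 10 and 15 here) get no row of their own but show up as a smaller n at the next event time.

```python
from collections import Counter

# Hypothetical toy data: (time, status), status 1 = event, 0 = censored.
subjects = [(2, 1), (3, 1), (3, 0), (5, 1), (6, 0),
            (7, 1), (9, 1), (10, 0), (12, 1), (15, 0)]

def survival_table(data):
    """Return rows (t, n, d, c) for every distinct event time t.

    n: subjects at risk just before t (time >= t, so anyone censored at t is included)
    d: events at t
    c: censorings at t
    """
    events = Counter(time for time, status in data if status == 1)
    censored = Counter(time for time, status in data if status == 0)
    rows = []
    for t in sorted(events):
        n = sum(1 for time, _ in data if time >= t)
        rows.append((t, n, events[t], censored[t]))
    return rows

for row in survival_table(subjects):
    print(row)
```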

Step 3: Calculate Interval Survival

At each event time t, you calculate the conditional probability of surviving past t, given that a subject was still at risk just before t.

Interval Survival Probability = (n - d) / n

For example, if 20 people are at risk (n=20) and 1 person has the event (d=1), the probability of surviving that interval is (20 - 1) / 20 = 0.95 or 95%.

Step 4: Calculate Cumulative Survival (The "Product-Limit")

This is the final, most important step. The overall survival probability at time t, denoted S(t), is the survival probability at the previous event time, S(t_prev), multiplied by the new interval survival probability:

S(t) = S(t_prev) * [ (n - d) / n ]
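Written out in full, the recursion is a running product over the event times. Below is that closed form, where t_i are the distinct event times and n_i and d_i are the number at risk and the number of events at t_i, followed by a worked example with hypothetical numbers: 10 subjects at risk at the first event, one event at each of the first three event times, and one subject censored at the second event time (so n falls from 9 to 7).

```latex
S(t) = \prod_{i \,:\, t_i \le t} \frac{n_i - d_i}{n_i}
     = \prod_{i \,:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)

S(t_3) = \frac{10 - 1}{10} \cdot \frac{9 - 1}{9} \cdot \frac{7 - 1}{7}
       = 0.9 \cdot 0.889 \cdot 0.857 \approx 0.686
```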

You repeat this process all the way down:

Time 0: S(t) = 1.0 (100% survival)
First Event: S(t) = 1.0 * (Interval Survival 1)
Second Event: S(t) = (Result from First Event) * (Interval Survival 2)
...and so on.
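The same iteration as a short Python sketch. The table rows are hypothetical (time, number at risk, number of events) values, and `kaplan_meier` is an illustrative name, not a library function.

```python
# Hypothetical survival-table rows (t, n, d): event time, number at risk, events.
rows = [(2, 10, 1), (3, 9, 1), (5, 7, 1), (7, 5, 1), (9, 4, 1), (12, 2, 1)]

def kaplan_meier(rows):
    """Return (t, S(t)) pairs by multiplying interval survival probabilities."""
    s = 1.0                      # S = 1.0 at time 0
    curve = []
    for t, n, d in rows:
        s *= (n - d) / n         # multiply by the interval survival (n - d) / n
        curve.append((t, s))
    return curve

for t, s in kaplan_meier(rows):
    print(f"t={t:>2}  S(t)={s:.4f}")
```

For these rows the printed survival values are 0.9000, 0.8000, 0.6857, 0.5486, 0.4114 and 0.2057.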

The resulting S(t) values and their corresponding Time values are what you plot to create the iconic "step" graph.
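To make the "step" shape explicit, here is a hypothetical helper, `step_points`, that turns event times and S(t) values into the vertices of the plotted line. The input values are illustrative. If matplotlib is available, `plt.step` with `where="post"` draws the same shape directly from the times and survival values, once time 0 with S = 1.0 is prepended.

```python
# Hypothetical Kaplan-Meier output: event times and S(t) just after each event.
times = [2, 3, 5, 7, 9, 12]
surv = [0.9, 0.8, 0.6857, 0.5486, 0.4114, 0.2057]

def step_points(times, surv, t_end):
    """Return (x, y) vertex lists that trace the survival step function.

    The curve starts at (0, 1.0), stays flat until each event time,
    then drops vertically to the new S(t).
    """
    xs, ys = [0.0], [1.0]
    level = 1.0
    for t, s in zip(times, surv):
        xs += [t, t]          # horizontal run to t, then vertical drop at t
        ys += [level, s]
        level = s
    xs.append(t_end)          # flat tail up to the end of follow-up
    ys.append(level)
    return xs, ys

xs, ys = step_points(times, surv, t_end=15)
print(list(zip(xs, ys)))
```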

How to Interpret a Kaplan-Meier Plot

Now that you have the plot, what does it mean?

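The Steps: Remember, the curve only drops when an event happens. Relative to the height of the curve just before it, each drop equals the proportion of at-risk subjects who experienced the event at that time (d/n); in absolute terms, the drop is that proportion multiplied by the survival probability just before the drop.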
The Tick Marks: You will often see small tick marks (like + or |) on the horizontal lines. These indicate the time points where individual subjects were censored.
Median Survival Time: This is one of the most-cited metrics. To find it, draw a horizontal line from the 0.5 (or 50%) mark on the Y-axis. The time on the X-axis where the curve first reaches or drops below this line is the median survival time: the time by which an estimated half of the study population has experienced the event. If the curve never drops to 0.5, the median survival is reported as "not reached."
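The median reading can be sketched in a few lines of Python. The inputs are hypothetical Kaplan-Meier output, and `median_survival` is an illustrative name. Software packages may treat the edge case of a curve sitting exactly at 0.5 over an interval differently, so treat this as the simple "first time at or below 0.5" rule only.

```python
# Hypothetical Kaplan-Meier output: event times and S(t) just after each event.
times = [2, 3, 5, 7, 9, 12]
surv = [0.9, 0.8, 0.6857, 0.5486, 0.4114, 0.2057]

def median_survival(times, surv):
    """Return the earliest event time at which S(t) <= 0.5, or None if never reached."""
    for t, s in zip(times, surv):
        if s <= 0.5:
            return t
    return None   # report as "not reached"

print(median_survival(times, surv))           # 9
print(median_survival(times[:3], surv[:3]))   # None: curve never falls to 0.5
```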

Comparing Two Groups: The Log-Rank Test

The real power of survival analysis comes from comparing curves. Did the "New Treatment" group survive longer than the "Placebo" group?

You can plot both curves on the same graph. If the treatment curve is consistently above the placebo curve, it suggests a survival benefit. But is this difference "real" (statistically significant) or just due to random chance?

To answer this, you use a statistical test. The most common is the log-rank test.

Null Hypothesis (H0): The two groups have the same survival curve, i.e., there is no difference in survival between them.
The p-value: The test produces a p-value. A low p-value (typically p < 0.05) gives you evidence to reject the null hypothesis, suggesting that a statistically significant difference exists between the survival curves.
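For intuition, here is a minimal plain-Python sketch of the standard two-group log-rank statistic: at each event time it compares the observed events in one group with the number expected if both groups shared one survival curve, then divides the squared total difference by its variance. The function name, group data and data layout are hypothetical; in practice you would use your statistical package's implementation.

```python
import math

def log_rank(group_a, group_b):
    """Two-group log-rank test on (time, status) pairs (status 1 = event, 0 = censored).

    Returns (chi_square, p_value); under the null hypothesis the statistic
    approximately follows a chi-square distribution with 1 degree of freedom.
    """
    data = [(t, s, "a") for t, s in group_a] + [(t, s, "b") for t, s in group_b]
    event_times = sorted({t for t, s, _ in data if s == 1})
    o_minus_e = 0.0   # observed minus expected events in group a
    variance = 0.0
    for t in event_times:
        at_risk = [g for time, _, g in data if time >= t]
        n = len(at_risk)
        n_a = at_risk.count("a")
        d = sum(1 for time, s, _ in data if time == t and s == 1)
        d_a = sum(1 for time, s, g in data if time == t and s == 1 and g == "a")
        # Expected events in group a if both groups shared one survival curve.
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a at this event time.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; the groups cannot be compared")
    chi_square = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))  # upper tail of chi-square, 1 df
    return chi_square, p_value

# Hypothetical groups of (time, status) pairs.
treatment = [(6, 1), (7, 1), (8, 0), (9, 1), (10, 0)]
placebo = [(1, 1), (2, 1), (3, 1), (4, 0), (5, 1)]
chi_square, p = log_rank(treatment, placebo)
print(f"chi-square = {chi_square:.2f}, p = {p:.4f}")
```

The p-value uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom at x equals erfc(sqrt(x / 2)), which avoids any dependency beyond the standard library.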

Other tests, like the Breslow (generalized Wilcoxon) or Tarone-Ware tests, are also available. More advanced models, like the Cox proportional hazards model, can provide a Hazard Ratio (HR), which compares the instantaneous rate of the event (the hazard) in one group with that in the other.

Practical Application (Tools & Assumptions)

You don't need to do these calculations by hand. They are standard features in statistical software, including:

GraphPad Prism
SPSS
R (the survival package: survfit for the curve, survdiff for the log-rank test)

However, to use the method correctly, your data must meet its assumptions. The most critical one is independent (non-informative) censoring: the reason a subject is censored must be unrelated to their risk of the event. For example, if patients are withdrawn from a study because they are getting sicker, this assumption is violated and the results will be biased, typically making survival look better than it really is.
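To see why this assumption matters, here is a small simulation sketch in Python. The scenario (exponential event times, and a 50% chance of being withdrawn at 90% of one's event time) is entirely hypothetical and chosen only for illustration; `km_at` is an illustrative compact Kaplan-Meier estimate at a single time point.

```python
import math
import random

def km_at(data, t_star):
    """Kaplan-Meier estimate of S(t_star) from (time, status) pairs.

    Subjects are processed in time order (events before censorings at tied
    times); each event multiplies S by (n - 1) / n, which over d tied events
    gives the usual (n - d) / n factor.
    """
    s, n = 1.0, len(data)
    for time, status in sorted(data, key=lambda x: (x[0], -x[1])):
        if time > t_star:
            break
        if status == 1:
            s *= (n - 1) / n
        n -= 1          # event or censoring: the subject leaves the at-risk pool
    return s

random.seed(1)
true_times = [random.expovariate(1.0) for _ in range(2000)]  # true S(1) = exp(-1)

# Independent censoring: censoring times drawn without regard to each subject's risk.
independent = []
for t in true_times:
    c = random.uniform(0, 3)
    independent.append((min(t, c), 1 if t <= c else 0))

# Informative censoring (hypothetical): half of the subjects are withdrawn
# shortly before their event, e.g. because they were getting sicker.
informative = []
for t in true_times:
    if random.random() < 0.5:
        informative.append((0.9 * t, 0))
    else:
        informative.append((t, 1))

print(f"true S(1)       = {math.exp(-1):.3f}")
print(f"KM, independent = {km_at(independent, 1.0):.3f}")
print(f"KM, informative = {km_at(informative, 1.0):.3f}")
```

With censoring unrelated to risk, the estimate should land close to the true value of about 0.368; when sicker patients are withdrawn, their events are never recorded and the estimate comes out noticeably too optimistic.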

Kaplan-Meier Curve Frequently Asked Questions (FAQ)

How do you interpret the Kaplan-Meier survival curve?

Interpreting the curve is a straightforward process:

Check the Axes: The Y-axis is the probability of survival (from 1.0 at the top to 0.0 at the bottom). The X-axis is time (e.g., days, months, years).
Follow the Line: All subjects start at Time 0 with a 100% (1.0) survival probability. The line stays flat as long as no events occur.
Watch the "Steps": The curve drops down in a "step" every time one or more subjects experience the event (e.g., death). A large, sudden drop means a large share of the at-risk subjects experienced the event at that time. A long, flat line means a period with no events.
Look for Tick Marks: Small vertical tick marks (| or +) on the line indicate censored data. Each one marks a subject who was lost to follow-up or who was still event-free when the study ended. The curve does not drop for a censored subject, but they are removed from the "at risk" pool, which affects the size of the next drop.
Find Median Survival: To find the median survival, draw a horizontal line from the 50% (0.5) mark on the Y-axis. The first point where the curve reaches or drops below this line corresponds to the median survival time on the X-axis.

What is the difference between Kaplan-Meier and the survival curve?

Think of it like this: "Survival curve" is the general name for any graph that plots survival over time. The "Kaplan-Meier curve" is the name of a specific, non-parametric method or estimator used to create that graph.

The Kaplan-Meier method is the most popular way to generate a survival curve because it is specifically designed to correctly handle censored data (incomplete observations), which is extremely common in real-world clinical and biomedical studies.

How to calculate 5-year survival from a Kaplan-Meier curve?

This is a common and simple way to read the plot:

Find the "5-year" mark on the X-axis (time).
Draw a vertical line straight up from that point until you hit the Kaplan-Meier "step" curve.
From that point on the curve, draw a horizontal line straight to the left until you hit the Y-axis (survival probability).
The value on the Y-axis (e.g., 0.65) is your 5-year survival probability. In this example, it would mean there is a 65% probability of survival at 5 years.
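Reading the curve at a fixed time amounts to looking up the last step at or before that time. A minimal Python sketch using the standard `bisect` module; the event times (in years) and survival values are hypothetical, and `survival_at` is an illustrative name. This sketch does not check whether t lies beyond the last follow-up time, where the curve should not be read.

```python
import bisect

# Hypothetical Kaplan-Meier output: event times (years) and S(t) just after each event.
times = [0.8, 1.5, 2.9, 4.2, 6.0]
surv = [0.95, 0.88, 0.80, 0.71, 0.60]

def survival_at(times, surv, t):
    """Read S(t) off the step curve: the value after the last event at or before t."""
    i = bisect.bisect_right(times, t)   # number of event times <= t
    return 1.0 if i == 0 else surv[i - 1]

print(survival_at(times, surv, 5))    # 0.71: the 5-year survival probability
```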

How do you make a survival curve?

While the underlying math can be done by hand for small datasets (as shown in the tutorial), the standard way to create a Kaplan-Meier survival curve is with statistical software:

Collect Data: For each subject, you need two key variables: Time (the duration they were followed) and Status (a binary code, e.g., 1 = Event occurred, 0 = Censored).
Input Data: Enter this data into a program like GraphPad Prism, SPSS, or R.
Run the Analysis: Select the "Survival Analysis" or "Kaplan-Meier" function in your software. You will assign your "Time" variable and your "Status" variable. If you are comparing groups (e.g., Treatment vs. Placebo), you will also assign your "Group" variable.
Generate Output: The software automatically performs the product-limit calculations and generates the Kaplan-Meier curve plot. It will also typically provide a table with median survival times and, if you are comparing groups, the results of a log-rank test (with a p-value) to tell you if the difference between the curves is statistically significant.
