Sleep Stage Comparisons: Oura vs. Polar
Our sleep comparison article series has so far looked into comparing HRV readings between wearables during sleep (check out our last article comparing Apple Watch and Oura HRV here). We will now start to also take a look at hypnograms or sleep stage data, another super interesting sleep-tracking metric that more and more wearables are able to provide.
TL;DR
- Oura and Polar agree quite well on light sleep stage times and on the wake-up time, but they disagree on REM and Deep sleep stages, which makes sense as those stages are more difficult to determine.
- Oura overestimates Deep sleep stages and Polar records sleep latency readings that are far too low.
- Polar records a lot of interruptions in the sleep stages throughout the night (switching back to the awake state) which are likely not real interruptions. This was only seen for Oura in a few cases at the start of the sleep session which could be attributed to a true interruption before fully falling asleep.
- Both devices determine the switch from sleeping to awake quite well and at the same time. The accuracy of this reading is important for people to gain insight into what sleep stage they actually woke up from.
For a quick overview of sleep stages, most wearables divide sleep into the following: Awake in bed, Light Sleep, Deep Sleep, and Rapid Eye Movement (REM) Sleep. The light sleep stage is where you're not far off being awake. You would likely be woken up more easily in this stage and not feel as groggy. The Deep Sleep stage is where the body is thought to undergo physical and mental repair to get you ready for the next day. REM sleep is typically associated with dreaming. REM sleep is actually a deeper level of sleep than Deep Sleep, but it is pretty common for people to jump between REM sleep and light sleep stages. Being awoken while in deep sleep or REM sleep usually causes the classic groggy wake-up. Once you go through all stages of sleep, a sleep cycle is completed and the cycle repeats (this is a generalization, and cycles obviously have some variation). Cycles last 1.5 hours on average and people go through about 3–4 sleep cycles per night (this is of course dependent on the individual). On average, a person spends around 45–50% of their sleep time in Light Sleep, 20–25% in REM sleep, and 15–20% in Deep Sleep.
In medical settings, comprehensive sleep studies (polysomnography) are performed and include a hypnogram with stages determined using EEG/brain activity data. For example, Deep sleep is associated with low frequency, high amplitude fluctuations in the EEG signal. The EEG readings make the resulting sleep stage data accurate, but the setup has shortcomings: users are often not comfortable sleeping with the equipment on, so results could be skewed. That makes a wearable an appealing option for sleep stage readings.
Most wearable devices deduce which sleep stage you are in by using the device's accelerometer data to check your movement in bed. This is then combined with other metrics that are also known to be impacted by different sleep stages. In deep sleep, your heart rate, respiration rate, and HRV decrease (as expected, HRV does eventually increase as your body recovers overnight). You also move a lot less than in other stages. For REM sleep, your heart rate, HRV, and respiration are seen to increase instead, likely a result of dreaming. Breathing rate variation is also seen to lessen as you go through successive sleep stages, and HRV fluctuations are indicative of changes between the different stages of sleep.
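To make the idea concrete, here is a minimal Python sketch of a rule-based stager that combines movement with heart rate in the way described above. It is a toy under stated assumptions: the `Epoch` fields, the `classify_epoch` name, and every threshold are invented for illustration, and this is not how Oura or Polar actually compute their stages (their algorithms are not public in this article).

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    """One short window of sensor data (for example 30 seconds)."""
    movement: float    # accelerometer activity in arbitrary units (assumed scale)
    heart_rate: float  # mean beats per minute during the epoch
    sleeping_hr: float # the wearer's typical heart rate while asleep


def classify_epoch(e: Epoch) -> str:
    """Toy rule-based staging. All thresholds are illustrative assumptions."""
    # Lots of movement is treated as being awake.
    if e.movement > 50:
        return "awake"
    hr_ratio = e.heart_rate / e.sleeping_hr
    # Very still with a lowered heart rate: call it deep sleep.
    if e.movement < 2 and hr_ratio < 0.95:
        return "deep"
    # Mostly still with a raised heart rate: call it REM.
    if e.movement < 5 and hr_ratio > 1.05:
        return "rem"
    # Everything else falls back to light sleep.
    return "light"


if __name__ == "__main__":
    print(classify_epoch(Epoch(movement=1, heart_rate=50, sleeping_hr=55)))
```

A real device would also use HRV and respiration rate and would smooth decisions over neighbouring epochs, which is one reason two wearables can disagree on the same night.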
Now that we know a little about sleep stages and how they're measured by wearables, back to our Oura vs. Polar sleep stage comparison. A Terra team member wore an Oura ring Gen 3 and a Polar Unite to bed to test the sleep stage data of these wearables. Let's take a look at the data for a couple of nights below.
When looking at the above data, we see some places where the stages match and others where there is greater fluctuation and deviation between Oura and Polar. Given that there is no obvious way to determine sleep stage without EEG data, it is pretty impressive that the wearables match even occasionally. Given the novel nature of wearable sleep stage data, we can't expect these trends to match perfectly over shorter time intervals.
We can see that in the first hour of sleep, both Oura and Polar pick up an interruption. This was likely a real interruption considering it happened early on and both wearables recorded it. It is interesting that both devices note the team member went straight into the deep sleep stage, which may be a result of fatigue (they did go to sleep at 3AM after all). Both devices pick up a switch to the light sleep stage quite well for most of the night (between 4AM - 5AM and 6AM - 7AM). The light sleep stage is likely easier for a device to determine as it is associated with more frequent movement while asleep. We do see some places where the stages don't agree, including between 5AM - 6AM, which Polar notes as mostly light sleep and Oura notes as mostly deep sleep. This may be due to differences in their algorithms (prioritising different metrics to determine stage) or due to differences in sensor readings. It is difficult to say which of these is actually providing the correct reading for this time frame. Both devices do a good job of picking up when the user begins waking up (this is easier to do from the accelerometer data alone).
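One way to put a number on this kind of visual comparison is epoch-by-epoch agreement. The Python sketch below assumes both hypnograms have already been resampled onto the same epoch grid and use the same stage labels; the function names and the sample lists are hypothetical, not values from the nights discussed here. It reports the raw share of matching epochs and Cohen's kappa, which discounts the agreement expected by chance.

```python
from collections import Counter


def agreement(a, b):
    """Fraction of epochs where both devices report the same stage."""
    if len(a) != len(b) or not a:
        raise ValueError("hypnograms must be non-empty and equally long")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Agreement corrected for the matches expected by chance."""
    n = len(a)
    observed = agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each device's share per stage, summed.
    expected = sum(count_a[s] * count_b[s] for s in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both devices reported one identical stage throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Made-up example epochs, one label per epoch.
    oura = ["deep", "deep", "light", "light", "rem", "awake"]
    polar = ["deep", "light", "light", "light", "rem", "awake"]
    print(f"agreement: {agreement(oura, polar):.2f}")
    print(f"kappa:     {cohen_kappa(oura, polar):.2f}")
```

Neither number says which device is right. Without an EEG reference they only describe how often the two devices tell the same story.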
Looking at the differences for the whole night, we can see that Polar picks up a lot of switching to the awake state (the user has no memory of waking up during this late sleep session). A switch to the awake state during sleep is labeled an interruption. Since Oura does not pick up these interruptions, they are likely erroneous readings on Polar's side. We can also see that Oura records a lot more deep sleep than Polar on the whole, while Polar notes more REM sleep. We can better see these duration differences in the graph below.
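Counting interruptions from a hypnogram can be sketched in a few lines of Python. The definition used here is an assumption for illustration: an interruption is any run of "awake" epochs that falls between the first and the last sleep epoch, so time awake before falling asleep or after the final wake-up is not counted. The function name and the sample data are hypothetical.

```python
def count_interruptions(stages):
    """Count runs of 'awake' epochs that occur inside the sleep period."""
    sleep_positions = [i for i, s in enumerate(stages) if s != "awake"]
    if not sleep_positions:
        return 0  # never fell asleep, so nothing was interrupted
    first, last = sleep_positions[0], sleep_positions[-1]

    interruptions = 0
    previous = None
    for stage in stages[first:last + 1]:
        # A new interruption starts when we enter 'awake' from a sleep stage.
        if stage == "awake" and previous != "awake":
            interruptions += 1
        previous = stage
    return interruptions


if __name__ == "__main__":
    # Made-up night: two awakenings between falling asleep and getting up.
    night = ["awake", "light", "awake", "awake", "deep", "awake", "rem", "awake"]
    print(count_interruptions(night))
```

Running the same function on both devices' data for one night gives a direct count to compare, instead of eyeballing spikes in the hypnogram.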
We can see here that the duration of light sleep is closer between the wearables than the durations of the other sleep stages, as we saw from the hypnograms. It is obvious from the graph that there is a significant difference in REM and deep sleep estimates. Oura has a large amount of deep sleep recorded (great for recovery if that's true): over 2.5 hours, or 56% of the sleep time. Oura records a much lower duration for REM at 11% of sleep time, which is quite low. Polar disagrees with both of these extremes and has roughly the same amount of REM and Deep sleep (25% each). Polar's estimates here are more in line with what is expected for a night of sleep. However, Polar's sleep latency estimate of 20 seconds (which doesn't appear on the graph) is too low to be physically plausible. Oura estimates the sleep latency at around 10 minutes, which seems more reasonable (meaning Oura is better at picking up when you actually start sleeping). We noticed this last point when comparing Oura vs. Polar's heart rate readings during sleep as well. Take a look at one other night of sleep data below.
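The percentages and the latency figure quoted above can be derived from stage segments. The Python sketch below is a minimal illustration under assumptions: segments are `(stage, start, end)` tuples sorted by start time, percentages are taken relative to time asleep (awake time excluded), and latency is the gap between bedtime and the first non-awake segment. The `summarize` name and the sample timestamps are hypothetical and are not the recorded data from this test.

```python
from datetime import datetime, timedelta


def summarize(segments, bedtime):
    """Return (share of sleep time per stage, sleep latency)."""
    totals = {}
    for stage, start, end in segments:
        totals[stage] = totals.get(stage, timedelta()) + (end - start)

    # Total time asleep excludes any 'awake' segments.
    asleep = sum((d for s, d in totals.items() if s != "awake"), timedelta())
    shares = {}
    if asleep:
        shares = {s: d / asleep for s, d in totals.items() if s != "awake"}

    # Latency: time from getting into bed to the first sleep segment.
    first_sleep = next((start for stage, start, _ in segments if stage != "awake"), None)
    latency = first_sleep - bedtime if first_sleep is not None else None
    return shares, latency


if __name__ == "__main__":
    t = lambda h, m: datetime(2023, 1, 1, h, m)  # made-up date for the example
    segments = [
        ("awake", t(3, 0), t(3, 10)),
        ("deep", t(3, 10), t(4, 10)),
        ("light", t(4, 10), t(4, 40)),
        ("rem", t(4, 40), t(5, 10)),
    ]
    shares, latency = summarize(segments, bedtime=t(3, 0))
    for stage, share in shares.items():
        print(f"{stage}: {share:.0%}")
    print("latency:", latency)
```

Note that the choice of denominator matters: dividing by time in bed instead of time asleep would lower every percentage for a device that records many awake interruptions.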
As we can see, trends for this night are similar to those of the previously shown night, but with more fluctuation from both devices in the hypnogram (possibly indicating restless sleep). We can again see Oura's overestimate of the deep sleep stage and lower REM estimate, Polar showing too many awake-state interruptions that are likely not real, and Polar's far too low sleep latency readings. The light sleep stage agrees less in these readings, likely due to the increased fluctuation, and Polar estimates a very low duration for Deep Sleep (50 minutes or only 17% of total sleep time).
Both devices do a good job of detecting an actual interruption between 4AM - 5AM as well as the person waking up around 10AM, but Oura records this as waking up from light sleep, whereas Polar records it as waking up from REM.
We can see from this comparison that while sleep stage data is difficult to measure using wearable devices, the fact there is some agreement in trends is quite interesting, and perhaps these readings may one day get closer to EEG-derived hypnograms in terms of accuracy. There is still some disagreement in the data though, and it's pretty hard to tell which device is estimating the right stage in this case. This is especially apparent for REM and Deep Sleep stage estimates (which makes sense considering they're the most difficult stages to determine as well as differentiate from one another).