A Thorough Breakdown of the “Extreme Volume Study”

The authors of the "extreme volume study" break down their findings and the real-world application, and respond to critiques of the original paper.

Additional writing and research by Mike Israetel

Note from Greg: A couple months ago, there was a big discussion in the “evidence-based” side of the fitness industry about the limits of training volume. To a point, more volume tends to lead to more muscle growth, but logically, there must be a limit. One of the studies that the discussion focused on was this one, authored by Stronger By Science coach Cody Haun and the research team at Auburn University. Since there was so much chatter about the study, and since many of the important details of the study went overlooked, Cody wanted to write this article to set the record straight.

…

Our recent study examining “extreme-volume” resistance training and graded whey protein supplementation was published in Frontiers in Nutrition, and we decided to write this article together to encourage consideration of proper caveats and in hopes that readers could benefit from further clarification. Furthermore, a number of critiques and questions have been posited regarding the study design and results that we feel are appropriate to address for clarification. For example, it has been questioned whether or not this was indeed the highest volume training study in a six-week timeframe to date, as described in the manuscript. Other critiques relate to the type of lean body mass (LBM) gained (e.g., fluid versus tissue) and concerns about the design of the training protocol. Considering this, the outline of this article is as follows:

a.) Study design, purpose, and summary of results

b.) Addressing critiques and comparison to other studies

c.) Insights beyond the publication

d.) Implications

Importantly, the manuscript is open access, and data are freely available for your own inspection or analysis. More detailed information and explanation of study methodology can be found there, as this article will primarily focus on aspects pertaining to the outline above with practical implications highlighted and insight offered beyond the published work.

Study design, purpose, and summary of results

Briefly, the purpose of the study was to investigate the effects of training volumes higher than previously investigated in a six-week timeframe (i.e., ~one mesocycle) on muscle growth in resistance-trained young men. Additionally, we wanted to examine whether increasing whey protein intake in step with training volume improved muscle growth responses compared to a steady dose of whey or maltodextrin. Subjects were screened prior to enrollment in the study, and they reported about five years of training experience on average. Upon initial screening, subjects included in the study could barbell back squat ~1.75x bodyweight on average (based on stringent 3RM testing during the screening process). The training protocol was purposefully unique to this study, which we’ll discuss more below. Each set was programmed to be completed at 60% 1RM, and the exercises employed in the study, along with the set x rep configurations programmed each week, are shown in a manuscript figure. However, another way of showing these data for the purposes of this article is to separate sets for each exercise per week for the upper body and lower body, based on the muscles emphasized. These examples are shown below.

*This is only an example of categorizing exercises emphasizing specific musculature. This could be done differently, depending on the criteria you use. We categorized based on muscles lengthening and shortening the most during each exercise, relative to their resting lengths.

We used multiple advanced laboratory techniques to measure muscle growth in order to better characterize “true” muscle growth from both a regional (e.g. individual muscle, limbs, etc.) and whole-body perspective. These included: a) dual-energy x-ray absorptiometry (DXA) for estimates of lean body mass (LBM), b) ultrasound (US) of the biceps brachii muscle and vastus lateralis muscle for measurements of muscle thickness, c) bioelectrical impedance spectroscopy (BIS) for assessments of body water, and d) muscle biopsies of the vastus lateralis muscle for assessment of muscle fiber cross-sectional area (fCSA). We also worked with a registered dietitian to provide intricate nutrition programming to all participants in an attempt to ensure they were consuming enough calories to facilitate a moderate calorie surplus (i.e., ~500 calories above estimated daily energy expenditure) with reasonable macronutrient values. Please see the open access manuscript here for more detail.

Based on inferential or frequentist statistics, there were no significant differences between groups in proxies of muscle hypertrophy (more on what “true hypertrophy” means from our perspective below). However, effect size calculations revealed the group of subjects consuming the graded dose of supplemental whey (GWP) gained the most lean body mass and lost the most fat mass according to DXA. Since there were only ~10 subjects per group and the study was relatively short in duration, more research is necessary to clarify if graded whey supplementation is more effective than traditional supplemental approaches.

One of the most interesting findings was the apparent continued hypertrophy response (based on DXA LBM) to training past previously investigated volumes. That is, it didn’t seem that subjects clearly surpassed their maximal recoverable volume (MRV), on average. This suggests subjects were still training below their maximal adaptable volume (MAV). That said, there was quite a bit of heterogeneity in responses, and this interpretation depends on the measurement (more below). Considering this, and based on a variety of interpretations we’ve heard or read recently, we’d like to dedicate much of the rest of this article to the overall response of the cohort to the training protocol, regardless of group, with appreciation for the heterogeneity in responses and the limitations of various measurements.

Addressing critiques and comparison to other studies

Critique 1: “60 % 1RM at 3-4 RIR per set isn’t a very high intensity. This likely doesn’t reflect intensities used in the practical setting by serious trainees looking to build muscle.”

Counterpoint 1: At this point, the evidence is clear that hypertrophy-focused programs’ effective rep and load ranges can be quite flexible, so long as sets are taken to failure or near failure. For example, Schoenfeld et al published a meta-analysis in 2017 showing no significant difference between hypertrophy outcomes comparing studies utilizing ≤60 % 1RM and >60 % 1RM. Utilizing a variety of loading and rep structures organized in a logical manner probably maximizes muscle growth in the long-term to ensure both faster and slower twitch fibers are properly stimulated. Although fiber-type specific hypertrophy based on specific training doses is still contentious, based on available evidence and principles of neuromuscular physiology, we suspect preferential growth of fibers can occur based on training stimuli. In support of this thesis, a recent blood flow restriction (BFR) study in well-trained powerlifters reported significant type I fiber hypertrophy, but not type II fiber hypertrophy, from practical BFR using ~30% 1RM only for a few weeks out of a six-week block of training. This is the first layer of evidence supporting the initial selection of 60% 1RM across lifts and weeks in the study, although we’ll discuss other reasons more below.

Counterpoint 2: Acute measurements of muscle protein synthesis (MPS) in response to completion of resistance exercise at different percentages of 1RM have also been investigated by various labs, and these data are visualized below supporting 60% 1RM as sufficiently heavy to realize significant increases in MPS.

From Burd et al., 2012

From Poortmans, 2016

From Kumar et al., 2009

Perhaps the best example is from Kumar et al (2009), wherein the authors reported no significant differences in myofibrillar protein synthesis rate increases between 60%, 75%, and 90% 1RM with sets not being taken to failure (shown above).

Counterpoint 3: Multiple lines of evidence suggest that you don’t need to train all the way to failure to realize significant increases in muscle size or strength (Martorelli et al, 2017; Izquierdo et al, 2006; Nobrega & Libardi, 2016).

Beyond these data, since the programmed volumes were very high and primarily barbell movements were utilized, we wanted to improve participant safety, without sacrificing the muscle growth effects of training, by not requiring subjects to reach failure. We felt this was a fair tradeoff, as the programmed loads and structure of the protocol were likely sufficient to realize similar muscle fiber activation patterns. Electromyography (EMG) is a method used to measure the electrical activity of a muscle. It is commonly used in exercise science research to gain an understanding of the extent to which a muscle is active during an exercise, although it possesses certain limitations and should be interpreted carefully. A study from Sundstrup et al in 2012, which utilized EMG to measure muscle activity during sets taken to failure, provides more support for why we didn’t feel the need for subjects to reach failure. Sundstrup et al (2012) reported that normalized EMG plateaus at ~3-5 reps from failure, which suggests no additional total fiber recruitment past this point, though fiber rotation may occur or select fibers may receive more activation (we can’t know that for certain). In Sundstrup et al’s own words: “Furthermore, a plateau of high level of muscle activity was reached at approximately 10–12 repetitions of the 15 RM, indicating that a maximal level of EMG can be reached 3–5 repetitions before failure (Figure 1).”

From Sundstrup et al., 2012

Indeed, the subjects in our study generally reported that their reps in reserve fell within this range (~3-5 RIR, on average).

Counterpoint 4: Finally, one of the most well-established relationships between training variables and hypertrophy relates to training volume. Although we’ll address volume in more detail below, for now, this can be thought of as either challenging sets per muscle per week or tonnage per exercise per week. See example plots redrawn below and James Krieger’s recent article on the topic for more detail.

Redrawn from Schoenfeld et al., 2016

Basically, the more training volume completed, to a point, the more resultant muscle growth.

Thus, we elected to increase training volume each week through the addition of sets while holding load steady at 60 % 1RM for this study. We didn’t just hold intensity constant for shits and giggles or because we thought volume-only progressions were going to be the most effective. We did it, in part, to test if volume-only progressions could be effective, and if so, how effective. Stated differently, holding the load constant from week to week allowed for the examination of the growth effect of adding sets, without an interaction of load and set addition, to better clarify the effects of this programming strategy. Also, given the logistical difficulty of monitoring the safety of participants and accurate loading, standardizing reps per set and load per set provided clear instructions and feasibility for us as a research staff and for the participants. Consider that 30 subjects were taken through the protocol and we only had five squat racks in our weight room. This took a concerted effort. Practically, studies need to be realistic to carry out, and we’re less defending the protocol here, and more admitting it’s not perfect but we’ll take any good data we can get. Perfect data doesn’t exist, and we moved forward with limitations in mind.

You might still be thinking that doing a single set of 10 at 60% 1RM on four different exercises isn’t very difficult. But, consider that doing over 5 sets per exercise per session and over 20 sets total in a week is a different story. We imagine that most people reading this who have completed 5-10 sets of 10 reps in a workout in the past can appreciate just how challenging such a protocol is. With both of us having worked under the barbell for over a decade now, we know that over 20 sets of 10 with 60% 1RM on the back squat is by no means easy. Although subjects consistently reported that they could have done ~3-5 more reps with good technique, they also reported being very fatigued, soreness ratings significantly increased over time, and we doubt similar volume loads could have been sustained using higher intensities after week 3 without adjusting load down during training sessions. Based on available data and practical experience, we were aware that for a single set, subjects could have likely completed >15 reps before failure at 60% 1RM. However, keep in mind that subjects were programmed to complete more than 4 sets per exercise after week 1, and finished week 6 at 12 sets of 10 reps per exercise during session 1, 8 sets of 10 during session 2, and 12 sets of 10 during session 3. From our practical experience and available scientific data, subjects can be notoriously bad at RIR estimates. So, it’s very possible they low-balled at the beginning of the study and high-balled at the end.
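To put the final week in perspective, the per-exercise numbers above can be tallied in a quick sketch. The set and rep scheme and the 60% load come from the text; the 150 kg 1RM is a hypothetical lifter chosen purely for illustration, not study data.

```python
# Week 6 scheme per exercise, as described above: (sets, reps) per session.
WEEK6_SESSIONS = [(12, 10), (8, 10), (12, 10)]
RELATIVE_LOAD = 0.60  # fraction of 1RM used for every set in the study


def weekly_totals(sessions, one_rm_kg, relative_load=RELATIVE_LOAD):
    """Return (total sets, total reps, volume load in kg) for one exercise."""
    total_sets = sum(sets for sets, _ in sessions)
    total_reps = sum(sets * reps for sets, reps in sessions)
    volume_load_kg = total_reps * relative_load * one_rm_kg
    return total_sets, total_reps, volume_load_kg


# Hypothetical lifter with a 150 kg squat 1RM (an assumption, not study data).
sets, reps, tonnage = weekly_totals(WEEK6_SESSIONS, one_rm_kg=150)
print(sets, reps, round(tonnage))  # prints: 32 320 28800
```

For this hypothetical lifter, that is 32 sets and 320 reps on a single exercise in one week, before counting the other exercises in the program.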

Additionally, we piloted two iterations of the protocol on ourselves before having subjects complete it, and it was the most brutal six weeks of training we’d personally ever completed. Also, subjects were instructed to exert maximal force during each rep of each exercise, with the intent to better ensure higher threshold motor unit activation, and thereby stimulate greater fiber recruitment on a per-rep basis. So, you can call this training less than “optimally efficient” and many other things, but it was anything but easy. Also, it’s worth mentioning that yes, we fully expect recovery would have been much more difficult if the weights were heavier, weight was progressed, and RIR was lower. So, that should all be taken into account if you attempt to transfer the results of this program (specifically the sets done in the last week) to your own training.

Critique 2: “Other studies have investigated higher volumes.”

Recently, it has been argued that another study investigated higher volumes and that the claim this was the highest volume training study in this timeframe was technically not true, with specific reference to this 2015 study from Radaelli et al. However, based on our calculation of training volume as total reps for a given movement x load, the volumes investigated in our six-week study were indeed higher in this timeframe. Importantly, as pointed out by Greg in his review of this article, this is likely less important than considering this calculation in light of subjects’ maximum strength and volume load expressed relative to percentage of 1RM: for example, 5000lb of volume from 60% 1RM versus 5000lb of volume from 90% 1RM. So, for a more informative comparison, expressing these calculations relative to the subjects’ 1RM can provide more insight into the relative difficulty of the training program and the expected adaptive outcome.

In Radaelli et al’s study, the authors described the intensities of each set as “8-12 RM to concentric failure” and described load progression throughout the study by stating loads were increased by 5-10% in the next session once participants were able to hit 12 reps with a certain load. If we assume subjects lifted their 10 RM on average, this equates to ~75% 1RM. So, relatively speaking, each unit of volume may have been more difficult to achieve in the Radaelli et al study, further confounding a direct comparison. For example, although total volume is still higher in our study, relative volume load in the Radaelli et al study was higher the first two weeks and comparable on week 3, assuming subjects were lifting ~75% 1RM. Consequently, the apparent discrepancy depends on how one defines and expresses volume (e.g., sets per muscle per week vs. tonnage or volume load vs. volume load relative to percent 1RM).

The study from Radaelli et al included 48 men and lasted six months. Participants were separated into three groups that completed either one set, three sets, or five sets of each programmed exercise three days per week. Radaelli et al reported volume from the first and last training session. The volume load from participants in the one-set and three-set groups was considerably lower than in our study. But, although the five-set group surpassed the volume we investigated in the first week of our study, the total volume within the first six weeks of their study was still much lower than ours, with week 1 of their study being the only week that was higher.
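The ~75% 1RM assumption for a 10 RM lines up with common rep-max estimation equations. As a hedged illustration, the sketch below uses the Epley equation (1RM ≈ load × (1 + reps / 30)). The article doesn’t say which conversion was used, so the choice of equation here is an assumption, and the function name is made up for this example.

```python
def epley_fraction_of_1rm(reps_to_failure):
    """Estimate the fraction of 1RM that allows `reps_to_failure` reps.

    Rearranges the Epley equation, 1RM = load * (1 + reps / 30),
    into load / 1RM = 1 / (1 + reps / 30).
    """
    return 1 / (1 + reps_to_failure / 30)


# The 8-12 RM range described for the Radaelli et al protocol.
for reps in (8, 10, 12):
    print(reps, round(epley_fraction_of_1rm(reps), 2))
# prints:
# 8 0.79
# 10 0.75
# 12 0.71
```

Under this equation, the 8-12 RM range spans roughly 71-79% 1RM, so treating the average set as ~75% 1RM is a reasonable midpoint, though individual lifters and exercises can deviate from any such estimate.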

In the five-set group (n = 13), Radaelli et al reported volume load as ~140,000kg during the first session of the study. The authors reported that the volume load from the last session of the study in the five-set group was ~160,000kg. So, if we assume ~150,000kg was completed each session on average throughout the first six weeks of training, subjects completed ~450,000kg of volume each week and, therefore, ~2,700,000kg of volume in the first six weeks of the study. Randomly sampling 13 subjects from our study, subjects completed almost double the volume in the six-week timeframe we investigated (i.e., ~5,000,000kg). The only week that was higher in the Radaelli study was week 1 (~450,000kg vs ~375,000kg). Beyond the Radaelli et al study, a few other studies have also investigated very high volumes but, based on our in-depth search of the literature while designing the study and writing the manuscript, none surpassed the volume we investigated. We’ve plotted these data for visualization below. When considering volumes relative to sets per muscle per week, a different interpretation arises. As pointed out by Menno Henselmans in this article, the Radaelli et al five-set group completed 30 sets for the biceps each week and 45 sets for the triceps each week, which is a higher set-per-week value than our study when defining volume in this way. At any rate, we’re simply pointing out that while each interpretation is technically true, understanding how volume load is computed and presented in a manuscript is important to better comprehend the relationship between dose and adaptive response whilst appreciating the limitations of each method.
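The back-of-the-envelope comparison above can be reproduced in a few lines. This is only a sketch of the arithmetic described in the text: the per-session figures are the approximate group-level values quoted above, and the ~150,000kg average is the same simplifying assumption, taken here as the midpoint of the first and last sessions.

```python
SESSIONS_PER_WEEK = 3
WEEKS = 6

# Approximate group-level volume loads quoted above for the
# Radaelli et al five-set group (kg per session).
first_session_kg = 140_000
last_session_kg = 160_000

# Assumption from the text: ~150,000 kg per session on average,
# computed here as the midpoint of the two reported sessions.
avg_session_kg = (first_session_kg + last_session_kg) / 2

radaelli_weekly_kg = avg_session_kg * SESSIONS_PER_WEEK
radaelli_six_week_kg = radaelli_weekly_kg * WEEKS

# Approximate six-week figure quoted above for 13 sampled subjects.
our_six_week_kg = 5_000_000

print(radaelli_weekly_kg)    # 450000.0
print(radaelli_six_week_kg)  # 2700000.0
print(round(our_six_week_kg / radaelli_six_week_kg, 2))  # 1.85
```

A ratio of about 1.85 is what “almost double” refers to. Because the Radaelli et al average is an assumption rather than a reported value, the ratio should be read as a rough estimate.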

Thus, based on our definition of training volume and comprehensive review of the literature, our study appeared to be the highest volume (i.e., total reps x weight for each exercise) investigated. Importantly, while the article was awaiting publication, Dr. Brad Schoenfeld and colleagues published a study that may have involved higher total volume loads, but training doses were reported as sets per exercise per week and not as total volume loads. To be clear, we’re not arguing that our study is superior to others, nor are we trying to toot our own horn. We simply want to bring attention to the fact that we intended to investigate higher volume loads than had previously been investigated, and that this influenced the design of the study and how the training protocol was set up. This training protocol was not designed to be optimal or to be implemented directly in the practical setting. Rather, the qualitative nature of the design and proximal doses of sets per movement or muscle per week were meant to help inform practice and better understand adaptive proclivities in this population of young, well-trained males. Indeed, to others’ points, tracking volume in other ways, like the number of sets per muscle per week or per movement per week, is likely more feasible and accomplishes very similar goals compared to tracking “tonnage” or volume load. However, from a quantitative standpoint, the case can be made that tracking total volume load, volume load relative to body weight or 1RM, or other relative factors can allow for increased precision of overloading training or an improved understanding of dose-response relationships. Since it’s relatively easy to compute, it makes sense to track both in our view.

Critique 3: “The study was too short to draw confident conclusions.”

The study was designed with a “mesocycle” timeframe in mind. One of the intentions was to clarify upper limits of short-term RT dosing for hypertrophy intending to mimic the duration of a training cycle in the practical setting. So, we think this critique is not very applicable, as our research question wasn’t aimed at explicitly addressing longer-term interventions. However, we feel the results of the study can help people structure longer-term training programs, considering the implications of the study’s results. Certainly, longer-term training studies with a larger number of subjects will help clarify dose-response relationships in various subsets of the population.

Critique 4: “It looks like what the subjects actually gained was fluid.”

Indeed, body water measurements consistently increased over time, and the fCSA, US, and DXA data demonstrated different qualitative responses. However, the DXA LBM data show a clear average increase when subtracting the change in extracellular water from PRE-MID (+1.2kg), with the remaining increase thought to be primarily due to intracellular water and lean tissue (e.g. muscle). From MID-POST, only ~+0.2kg was observed when removing the change in extracellular fluid. When expressing changes in body mass over time with fat and total water removed, an average increase from PRE-MID of 0.7kg was observed, and an average decrease from MID-POST of 0.2kg, for a total change from PRE-POST of 0.5kg on average. This seems to support the notion that subjects may have been training beyond their MRV after week 3 of the study, although this may be unnecessarily myopic considering muscle is ~75% water and removing water altogether isn’t necessarily reflective of true “hypertrophy.” As Greg pointed out in the review of this article, it’s overly confident to argue too strongly that the increase in LBM was primarily due to water, considering the data collectively, especially since fCSA tended to increase mid-to-post. Since the US and fCSA data were from either the biceps brachii muscle or the VL muscle of the quad alone, those changes can’t necessarily be extrapolated to reflect the changes in all of the other muscles of the body. This is important to consider since the exercises employed emphasized musculature other than the biceps and VL (e.g. SLDL, bench press, OH press). Thus, the DXA LBM data better reflect the whole-body response, in our view.

To address this critique using a different way of showing the data, we’ve plotted the raw changes in LBM, changes in LBM minus changes in extracellular water (ECW) potentially due to swelling or edema, and changes in body mass with changes in fat mass and body water removed.
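For readers who want to follow the bookkeeping, here is a minimal sketch that sums the average changes quoted above across the two halves of the study. The values are the approximate group averages from the text. The PRE-POST figure for LBM minus ECW is not stated in the article; it is simply the sum of the two reported phases.

```python
# Average changes in kg, as quoted above for each half of the study.
lbm_minus_ecw = {"pre_mid": 1.2, "mid_post": 0.2}
mass_minus_fat_and_water = {"pre_mid": 0.7, "mid_post": -0.2}


def pre_to_post(changes):
    """Sum the PRE-MID and MID-POST changes to get the PRE-POST change."""
    return round(changes["pre_mid"] + changes["mid_post"], 1)


print(pre_to_post(lbm_minus_ecw))             # 1.4 (derived, not reported)
print(pre_to_post(mass_minus_fat_and_water))  # 0.5 (matches the text)
```

Either way of slicing the data shows most of the average gain occurring in the first three weeks, which is the pattern the MRV interpretation above rests on.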

Since the majority of the remaining body mass consists of bone, organs, and muscle tissue, it seems reasonable to us to assume the primary change explaining the increase was muscle, as it comprises the majority of the remaining tissue. We are completing further analyses now to better address the specific mode of hypertrophy, with intent to better understand fiber-level adaptations to myofibrillar and sarcoplasmic fractions in muscle biopsy samples. Although there’s a good bit of work remaining to be more confident in our findings so far, preliminary data from subjects exhibiting large increases in muscle fiber size (fCSA) indicate heterogeneous responses in myofibrillar protein concentration and sarcoplasmic protein concentrations as well. In other words, although increased fiber sizes were observed in quite a few subjects, this increase doesn’t seem to be entirely due to myofibrillar protein since sarcoplasmic protein concentrations also demonstrated increases in some cases. We’re performing follow-up experiments to be more confident in our findings before we present that more formally. You might also be interested in a manuscript we have in review that discusses the presence of or alterations in biomarkers that seem to explain a significant amount of the variation in muscle growth in this study so we can better understand physiological factors related to why people respond better or worse to high volumes of training. We hope you’ll check that out upon publication!

Considering this, we feel it’s important to bring attention to the heterogeneity in responses of fCSA (in both slower and faster-twitch fibers) and muscle thicknesses of the bicep and vastus lateralis to better interpret the results of the study. The plots below can serve to better demonstrate this than text, so check those out.

As you can see, averages only tell us so much. Some subjects continually realized increases in fCSA and muscle thickness while others didn’t. This speaks to the importance of individualizing training doses to train within the bounds of your MRV, and even suggests that different muscles probably possess different MRVs. Furthermore, since hypertrophy is empirically defined as an accrual of myofibrillar protein concomitant to increases in fCSA, these data only tell us so much. Future work will focus on the nature of hypertrophy in response to specific training interventions; this is the direction in which we’re currently focusing our research questions, similar to our analysis in this paper.

Insights beyond the publication

While the reps in reserve questions were answered similarly toward the end of the study, researcher observations were unanimous that the guys were struggling hard to meet technically sound rep goals toward the end, and that true RIR had declined. Two take-homes:

a.) Many subjects really were close to their max ability to recover or had actually exceeded it already.

b.) RIR use in lab settings is not without its imperfections and should be used alongside objective measures (velocity, RM tests, etc.) when possible.

In fact, subjects grew frustrated with having to provide a rating after each set, and it started to seem as if, owing to high fatigue, they were just repeating the same rating to avoid a change in load. Having interacted with many of them since the study, it seems they won’t be carrying out this protocol or training like it for a good while.

Implications

a.) If you go very high RIR to begin with and don’t increase bar weight, you can recover from lots of sets, more than most expected. At least one potential insight we can glean from this is that when people claim success with exceedingly high-volume programs, we should be very interested in their per-set RIR before drawing any tentative conclusions based on their experience. A 40-set program of compound lifts with 5 RIR averages can be survivable, whereas the same program with 0 RIR might be highly prohibitive.

b.) High RIR and stable bar weight are likely not the best ideas for consistent real-world training. The lower RIR and increasing bar weights used in practice are likely to cause more fatigue on a per-set basis, meaning that the total number of sets you can tolerate is probably lower than the set volume used in this study.

c.) Subjects were probably near or beyond their MRVs by the time they were doing 30 sets per muscle group per week at the end of this program. Since higher loads and lower RIRs will likely cause more fatigue per set, it’s unlikely that most people’s MRVs will be greater than 30 sets per muscle group per week in most practical settings. This gives us some insight as to the limits of volume progression in practice.

However, it stands to reason that stable bar weights in the context of overloading other training parameters (e.g. frequency, rest intervals, reps per set, total sets, etc.) might be a good option in some contexts, considering that hypertrophy was observed using the volume progression in this study. Stated differently, load progression for hypertrophy isn’t the ONLY way to productively overload training for hypertrophy, and it wasn’t necessary in this study to elicit a hypertrophic effect for some subjects. Future research can help clarify specific effects of load progression, frequency progression, or other progression styles to better understand adaptive responses.

In conclusion, every study teaches us something, but no study is the holy grail of knowledge, nor should any study’s results be taken at face value without reading into the methods and potentially even other observations of the researchers. For example, one of our favorite Brad Schoenfeld studies comparing high- versus low-load volume-equated routines demonstrates the importance of this nicely. Although hypertrophy was statistically the same, subjects in the high-load group reported excessive fatigue and it took them about an hour longer than the low-load group to complete training sessions. So, always look to the literature to instruct, but never accept research conclusions without a good dive into the intricacies of a study. Better yet, refer to the whole body of research on a subject and never take one study too seriously.

Note of thanks from the author: Thanks to Chris Vann, MS, for providing edits to this article and his critical assistance with the study. Also, thanks to my amazing PhD mentor, Dr. Michael Roberts, for facilitating this research and the other Molecular and Applied Sciences Lab members at Auburn University for all their help to make this project happen.


Cody Haun
