# What happened with Legos?

*“What happened with Legos? They used to be simple. Oh come on, I know you
know what I’m talking about. Legos were simple. Something happened out here
while I was inside. Harry Potter Legos, Star Wars Legos, complicated kits, tiny
little blocks. I mean I’m not saying it’s bad, I just wanna know what happened.”
– Prof. Cane, Community*

What happened with Legos? The question implies a kind of grumpy nostalgia that I don’t necessarily agree with, but underneath the back-in-my-day bluster there is an interesting question: how have Lego sets changed over the past several decades? There are a couple of obvious differences: the introduction of sets featuring licensed content, models aimed at adult collectors, and non-smiley-face minifigures, to name a few, but these are largely differences in marketing and branding. I’m interested in how Lego as a creative toy has changed over time.

Contemporary hand-wringing about changes in Lego focuses on two perceived shifts: first, Lego sets have gotten more expensive, and second, Lego sets have gotten more complex, with too many bricks that are tailored to specific models and less amenable to creative play. Others have addressed the economic angle; in short, per brick and adjusted for inflation, Legos are as inexpensive as they’ve ever been – so in this post I’m going to investigate the second point.

The data we’re using in this analysis was generously provided by Rebrickable.com, and it is fantastically comprehensive. In fact, it’s a little too comprehensive: it includes things like LEGO-branded watches, video games, and various other non-brick LEGO products. To tidy things up we’ll exclude any product that has fewer than 10 pieces.
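A minimal sketch of that filter in pandas, assuming the set list has already been loaded into a DataFrame with the `year`, `pieces`, and `descr` columns shown in the table further down. The two sub-10-piece rows (`watch-1`, `game-1`) are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the set list. Column names mirror the table shown
# later in this post; the two tiny "products" are invented examples.
sets = pd.DataFrame(
    {
        "set_id": ["00-1", "0011-2", "watch-1", "game-1"],
        "year": [1970, 1978, 2005, 2012],
        "pieces": [471, 12, 1, 0],
        "descr": ["Weetabix Castle", "Town Mini-Figures", "Watch (made up)", "Video game (made up)"],
    }
).set_index("set_id")

MIN_PIECES = 10  # products below this are treated as non-brick merchandise

# Keep only products with at least MIN_PIECES pieces.
brick_sets = sets[sets["pieces"] >= MIN_PIECES]
print(brick_sets)
```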

We’ll also use a hand-coded dataset which maps the brick categories defined by Lego onto a simpler subset of categories.

# Unique brick types by year

To start, let’s investigate how the number of Lego brick types has changed. With a little pandas aggregation and set logic we can get the list of brick types included in each Lego set.

After getting the brick type lists, our data looks like this:

```
year pieces descr piece_list
set_id
00-1 1970 471 Weetabix Castle 3062a 3006 3038 3005 3022 3001a 3043 3003 29c0...
0011-2 1978 12 Town Mini-Figures 3626apr0001 3833 3624 973c07 3625 973pb0091c01...
0012-1 1979 12 Space Mini-Figures 3626apr0001 3962a 3838 3842a 970c00 973p90c02
```
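The aggregation behind a table like this could look something like the sketch below. It assumes a hypothetical long-format inventory with one row per (set, piece, color); the column names are illustrative, not Rebrickable’s exact schema. Because color lives in its own column, grouping on `piece_id` alone collapses same-shaped pieces of different colors into a single entry, matching the brick type definition described next.

```python
import pandas as pd

# Hypothetical long-format inventory: one row per (set, piece, color).
inventory = pd.DataFrame(
    {
        "set_id": ["00-1", "00-1", "00-1", "0012-1", "0012-1"],
        "piece_id": ["3001a", "3001a", "3062a", "3626apr0001", "3838"],
        "color": ["red", "white", "white", "yellow", "white"],
        "quantity": [20, 10, 8, 1, 1],
    }
)
sets = pd.DataFrame(
    {
        "set_id": ["00-1", "0012-1"],
        "year": [1970, 1979],
        "descr": ["Weetabix Castle", "Space Mini-Figures"],
    }
).set_index("set_id")

# Set logic: the distinct piece ids in each set, stored as Python sets so
# later unions and differences across sets are easy.
piece_list = inventory.groupby("set_id")["piece_id"].unique().map(set)

# Attach the lists to the per-set metadata (joins on the set_id index).
sets = sets.join(piece_list.rename("piece_list"))
print(sets)
```

Storing Python sets rather than space-separated strings is a design choice here; it makes the year-level unions in the next step a one-liner.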

A quick note on colors and prints: due to the way the brick IDs are defined, bricks of the same shape but different color count as the same brick type, while bricks of the same shape but with a different print (e.g. a blank slope and the same slope printed with a telephone) count as different brick types.

From here it’s simple to group the sets by year and count how many unique brick types were included in each year’s sets.

Now we’re ready to look at how the number of brick types has changed over time. For each year I’ll plot three different metrics:

- the cumulative brick count – how many brick types have appeared in any year up to and including that year
- the non-cumulative brick count – how many brick types appear in that year’s sets
- the new brick count – how many brick types show up for the first time in that year
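One way to compute all three metrics is sketched below, assuming the per-set piece lists have already been expanded into one row per (year, piece type); the toy data and column names are hypothetical. The key observation is that a brick type’s first year determines both its contribution to the new count and to the cumulative count.

```python
import pandas as pd

# Hypothetical input: one row per (year, piece type) used in that year's sets.
year_pieces = pd.DataFrame(
    {
        "year": [1980, 1980, 1981, 1981, 1981, 1982],
        "piece_id": ["p1", "p2", "p1", "p3", "p4", "p1"],
    }
).drop_duplicates()

# Non-cumulative: distinct brick types in each year's sets.
per_year = year_pieces.groupby("year")["piece_id"].nunique()

# New: brick types whose first appearance falls in that year.
first_year = year_pieces.groupby("piece_id")["year"].min()
new = first_year.value_counts().reindex(per_year.index, fill_value=0)

# Cumulative: each type is counted once, in its first year, so the running
# total of `new` is the number of types seen up to and including that year.
cumulative = new.cumsum()

metrics = pd.DataFrame({"cumulative": cumulative, "per_year": per_year, "new": new})
print(metrics)
```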

Right off the bat we can see that there are a lot more Lego bricks than there used to be. The 90s were my Lego heyday, and since then the number of unique brick types has more than tripled; the situation is even more dramatic if your formative Lego experiences are from the 80s. (One other interesting note about these plots: Lego almost went bankrupt in 2003, and you can see signs of the turmoil that precipitated the crisis in the early 2000s and the fallout in the latter half of the decade.)

Not all bricks are created equal, though: for every generally useful brick like a classic 2x4 there’s also a super-specialized brick like a ‘61185c01’. Or rather, there used to be; the perceived proliferation of specialized pieces is one of the major complaints behind the ‘Legos aren’t the creative toy they used to be’ sentiment. Thanks to the data from Rebrickable we’re in a position to address this impression quantitatively, so next let’s investigate which types of bricks are most responsible for the increase.

The data contains category information for each brick type; however, the categories as provided are a little too fine-grained to be useful. To make things simpler I assigned each of the 55 original categories to one of four new categories:

- Minifigures: minifigures and accessories, including plants and animals
- Bricks: everything that you use to build stuff
- Non-brick Lego: Lego products that aren’t bricks, like Bionicle or Znap
- Other: everything else

We’ll use this mapping from original category to simple category to assign each piece to one of the four simple categories.
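Applying the mapping amounts to a dictionary lookup. The part table and category names below are hypothetical stand-ins, not the actual 55 categories. Sending unmapped categories to ‘Other’ via `fillna` is one possible design choice: it lets the ‘everything else’ bucket work without listing every leftover category explicitly.

```python
import pandas as pd

# Hypothetical excerpt of the part table; category names are illustrative.
parts = pd.DataFrame(
    {
        "piece_id": ["p1", "p2", "p3", "p4", "p5"],
        "category": ["Bricks", "Plates", "Minifig Heads", "Bionicle", "Stickers"],
    }
)

# Hypothetical excerpt of the hand-coded mapping onto the simple categories.
simple_map = {
    "Bricks": "Bricks",
    "Plates": "Bricks",
    "Minifig Heads": "Minifigures",
    "Bionicle": "Non-brick Lego",
}

# Look up each part's simple category; anything not in the mapping becomes "Other".
parts["simple_category"] = parts["category"].map(simple_map).fillna("Other")
print(parts)
```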

We want to look at the trends for ‘basic’ and ‘specialized’ bricks separately, so we need some way to separate the two types of bricks. Counting the number of different sets a particular brick type shows up in seems like a good way to do this; somewhat arbitrarily, I’ll say that any brick that occurs in fewer than 10 different Lego sets is ‘specialized’.
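A sketch of the threshold rule, assuming a table of (set, piece type) pairs already restricted to the ‘Bricks’ simple category; the ids are hypothetical. `nunique` counts distinct sets, so a brick type that appears many times within a single set still counts only once.

```python
import numpy as np
import pandas as pd

# Hypothetical (set, piece type) pairs: "p_common" is in 12 sets, "p_rare" in 2.
set_pieces = pd.DataFrame(
    {
        "set_id": [f"s{i}" for i in range(12)] + ["s0", "s1"],
        "piece_id": ["p_common"] * 12 + ["p_rare"] * 2,
    }
)

SPECIALIZED_THRESHOLD = 10  # fewer than this many distinct sets -> "specialized"

# Number of distinct sets each brick type appears in.
sets_per_piece = set_pieces.groupby("piece_id")["set_id"].nunique()

# Label each brick type by comparing its set count to the threshold.
brick_class = pd.Series(
    np.where(sets_per_piece < SPECIALIZED_THRESHOLD, "specialized", "basic"),
    index=sets_per_piece.index,
)
print(brick_class)
```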

Having assigned each brick type to one of our now five simple categories (‘basic’, ‘specialized’, ‘minifig’, ‘non-brick lego’, and ‘other’, with the original ‘Bricks’ category split in two), we can rerun the type-counting analysis above on each subset of bricks.
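Rerunning the count per category is mostly a matter of adding the category to the groupby. The sketch below, on hypothetical data, shows the non-cumulative count with one column per category; the cumulative and new counts can be split out the same way.

```python
import pandas as pd

# Hypothetical (year, piece type, category) rows after the five-way labelling.
year_pieces = pd.DataFrame(
    {
        "year": [1990, 1990, 1990, 1991, 1991, 1991, 1991],
        "piece_id": ["b1", "s1", "m1", "b1", "s2", "s3", "m2"],
        "category": [
            "basic", "specialized", "minifig",
            "basic", "specialized", "specialized", "minifig",
        ],
    }
)

# Distinct brick types per year, one column per category (0 where absent).
per_year_by_cat = (
    year_pieces.groupby(["year", "category"])["piece_id"]
    .nunique()
    .unstack(fill_value=0)
)
print(per_year_by_cat)
```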

It looks like new minifigures account for a large fraction of the increase in brick types, especially since about 2010. There are definitely more minifigures than there used to be; however, this also reflects the brick type labelling scheme: remember that identical pieces with different prints count as different bricks, so each distinct minifigure counts as another new brick type.

More relevant to our original premise, we have what appear to be some conflicting trends in the ‘specialized’ vs. ‘basic’ brick type counts. Let’s break those out into their own plots for a closer look.

The cumulative number of specialized brick types is higher than the number of basic brick types, and in most years since ~1995 more new specialized brick types have been introduced than basic ones, but in any given year there are many more basic than specialized brick types. To me, this suggests that a basic brick type, once introduced, stays in the rotation, while a specialized brick is introduced, used once or twice that year, and then phased out of production.

These trends, especially the new brick count by year, seem to support the ‘Lego has gotten too specialized’ argument. Up until about 1995 new ‘basic’ and ‘specialized’ bricks were being created at about the same rate, but since then specialized bricks have substantially outpaced basic bricks, except for a brief period in the late 2000s, which was presumably a result of the near-bankruptcy in 2003.

This speaks to the ‘How has Lego changed?’ question that motivates this analysis, but the conclusion so far is still weak because it doesn’t account for how common each type of brick is. Next we’ll expand on this analysis by looking at the distribution of bricks in a given year’s sets.

# Yearly brick frequency distributions

The number of brick types produced in a given year is an interesting metric, but it doesn’t capture much information about how suitable that year’s Legos are for creative play. Who cares if a set has a handful of overly specific bricks if that same set also has tons of basic bricks? Looking at brick frequency distributions should provide a better indication of how Lego sets are changing: if Lego sets have come to rely too heavily on lots of specialized, single-use bricks then we should see the distribution of brick frequencies shift towards smaller values. Alternatively, if Lego sets contain about the same proportion of generally useful bricks as they did in the past then the frequency distribution should look the same, just scaled up.

To calculate the brick frequency distributions we’ll add up how many of each brick type each Lego set contains, and then sum over all the sets in a given year.
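In pandas terms this is a single groupby-sum: grouping on (year, piece type) adds up the per-set quantities and the per-year totals in one step. The toy inventory and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical inventory rows: quantity of each piece type in each set, plus the set's year.
inventory = pd.DataFrame(
    {
        "year": [1985, 1985, 1985, 2015, 2015],
        "set_id": ["a", "a", "b", "c", "d"],
        "piece_id": ["p1", "p2", "p1", "p1", "p1"],
        "quantity": [10, 4, 6, 50, 25],
    }
)

# Total copies of each brick type across all of a year's sets.
freq = inventory.groupby(["year", "piece_id"])["quantity"].sum()
print(freq)
```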

The brick frequency distribution has a long tail, so we’ll use logarithmic bins for the histogram.
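A minimal sketch of logarithmic binning with NumPy, using made-up counts for a single year. `np.logspace` places bin edges at powers of ten, so each bin covers the same multiplicative range, and `np.histogram` counts how many brick types fall in each bin. One bin per decade is an arbitrary choice here; passing the same `edges` to matplotlib’s `hist` and setting a log x-axis would give the plotted version.

```python
import numpy as np

# Hypothetical per-type brick counts for one year (long-tailed).
counts = np.array([1, 1, 1, 2, 3, 5, 8, 40, 120, 1500])

# Enough powers of ten to cover the largest count: edges 1, 10, 100, ...
n_decades = int(np.ceil(np.log10(counts.max() + 1)))
edges = np.logspace(0, n_decades, n_decades + 1)

# Number of brick types whose count falls in each [10^k, 10^(k+1)) bin.
hist, _ = np.histogram(counts, bins=edges)
print(edges, hist)
```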

Looking at the frequency distributions, there are undeniably more specialized bricks than there used to be – the leftmost bin, corresponding to brick types which show up only once, becomes larger and larger relative to the rest of the distribution as we progress from 1985 to 2015. But this is just what we saw earlier when we counted unique brick types; what’s more interesting is how the other end of the distribution has changed. The different y-axis scales make it somewhat hard to see, but the tail has gotten both longer and fatter. Let’s zoom in on the high-frequency region to see this.

In 1985 there was a single brick type that showed up more than 1000 times in that year’s sets; in 2015 there were more than 25. There might be more specialized bricks than there used to be, but there are also way, way more general-purpose bricks as well. The question is, ‘Has the relative growth in the number of specialized bricks outpaced the growth in basic bricks?’

It’s important to note that we’re talking about the number of *bricks*, not the
number of brick *types* here – we’ve already established that the number of
specialized brick *types* is growing faster, now we want to know if that means
that we end up with more of the specialized bricks themselves.

It’s hard to say whether this is the case just by looking at the histograms (especially since we’re using a logarithmic scale). To make things easier, let’s borrow the idea of a cumulative distribution function and apply it to the yearly collection of bricks. (The year’s ‘collection’ of bricks is what you’d end up with if you bought one of every set released that year.) Roughly speaking, a cumulative distribution function tells you what fraction of a distribution falls at or below a given value. In our case, the ‘Collection CDF’ will tell us what fraction of the total number of bricks in a given year is accounted for by the rarest brick types, up to a given fraction of all brick types.

To generate our collection CDF we’ll proceed in two steps:

- Rank each brick type by frequency, from least to most common, then normalize the rank by the number of brick types that year. These are our x values.
- Take the cumulative sum of the brick frequencies in that ranked order, then normalize by the total number of bricks that year. These are our y values.

The resulting curve will tell us what fraction of our collection is made up of
bricks in the bottom *x* percent of brick types.
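The two steps translate almost directly into NumPy. The sketch below takes one year’s per-type brick counts (made-up numbers here) and returns the x and y values; the helper `share_of_bottom` is a hypothetical addition that reads the curve at a given fraction of brick types, the kind of reading used in the next paragraph. For readers coming from economics, this construction is the same as a Lorenz curve.

```python
import numpy as np


def collection_cdf(counts):
    """Return (x, y) for one year's collection CDF.

    counts: number of bricks of each type in that year's collection.
    x: fraction of brick types, ranked from least to most common.
    y: fraction of all bricks accounted for by those types.
    """
    ranked = np.sort(np.asarray(counts, dtype=float))  # least common first
    x = np.arange(1, len(ranked) + 1) / len(ranked)
    y = np.cumsum(ranked) / ranked.sum()
    return x, y


def share_of_bottom(counts, fraction):
    """Fraction of bricks contributed by the rarest `fraction` of brick types."""
    x, y = collection_cdf(counts)
    # Prepend the origin so the curve starts at (0, 0), then interpolate linearly.
    return float(np.interp(fraction, np.r_[0.0, x], np.r_[0.0, y]))


# Made-up year: 8 rare types with 1 brick each, 2 common types with 46 each.
example = [1] * 8 + [46, 46]
print(share_of_bottom(example, 0.8))  # ≈ 0.08: 8 of the 100 bricks
```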

In 2015, the bottom 80% of brick types, by frequency, accounted for a little under 10% of all the bricks in that year’s collection; in 1980 they accounted for about 20%. The same trend holds for the rest of the curve: rare brick types account for a smaller fraction of the yearly collection in 2015 than they did in 1980. In other words, we can definitively answer the question posed a few paragraphs ago: no, the relative growth in the number of specialized bricks has not outpaced the growth in basic bricks. There are a lot more uncommon, specialized brick types; however, in aggregate they make up a smaller fraction of the yearly Lego collection than they did in the past.

So what happened with Legos? They made a lot more of them. In doing so, they
made a lot of new, specialized bricks, but they made even more general purpose
bricks. This trend is easily obscured by the opposite trend in the number of
brick *types*, but from a ‘creative play’ standpoint the bricks you actually end
up with are more important than the bricks you could have ended up with.

The Jupyter notebook for this post is here, and the data is here.