The Method To The Madness: How We Arrived At Our AOTY List

Everybody loves talking about their year-end lists, but no one talks about perhaps the most important part: how they arrived at said lists! The bigger a site's staff gets, the harder it becomes to aggregate their year-end lists. One option is to just get everyone together and have them argue it out, with editors yelling louder than everyone else until some sort of list emerges, but that gets complicated and frustrating way too fast, no one ends up happy, and it wastes far too much time. We did something like that for our two-part year-end list for the podcast, but with a staff roster of 27 people, it's quite intractable. As such, I resorted to science. No, seriously. Let me tell you how I computed our AOTY list.

Last year, Nick painstakingly created our list by taking everyone's top 50 lists, assigning a point score to each position in the list (more on that later) and then ranking albums by their total summed score. That's a pretty good approach, but it requires way too much busywork, and the hand-picked weights he assigned to each rank are somewhat arbitrary and not necessarily optimal for achieving consensus. So I did some digging, read about 40 pages' worth of papers on aggregate ranked consensus, and narrowed it down to a set of methods. There are basically two big approaches to aggregation of this sort: you either bias towards things a few people really, really liked, or towards things most people liked a decent bit. In other words, you highlight either peaks of very high ranks or a more even distribution of relatively high ranks. There isn't really an objectively better way to do things, but we'll see how each plays out. For the record, Nick's approach was geared more towards the former.

Either way, we first need to get a list of albums people liked. Nick just had everyone submit their top 50-or-whatever, and hand-entered scores for all of them. The goal this year was to reduce busywork, so I decided to automate as much of the process as possible. I first asked everyone to create a joint list of all albums that they may consider putting anywhere near their top 50. We used Google Docs to arrive at this list, and the end result had a bit over 450 albums in it. That’s a lot!

I then created a SurveyMonkey poll where people would rank their top 50 from those items. Despite many people asking for the ability to make lists as long as 50 items (I had suggested 25, as I feel really long lists let people get sloppy with their listing instead of highlighting things they REALLY like, but that's just me), only a handful of our 27 staffers actually submitted 50-item lists. Most submitted lists ranging anywhere from 10 to 25 items (I mandated a minimum of 10). I took the results of that poll and imported them into MATLAB to play around with. I then implemented a bunch of models, mostly from “Distance-based and ad hoc consensus models in ordinal preference ranking” by Cook et al., plus Nick's previous model.

Distance Minimization

While I tried a variety of approaches, I want to highlight the results of a few. First, the hilariously bad one. This approach, which I call “distance minimization”, seeks to place each album at the rank on the final list that minimizes its total distance from the ranks everyone gave it. For example, if I rate an album #15 and Nick rates it #7, its final rank would land somewhere around #11, because that's the position that minimizes the combined distance from both our lists. This was hilariously bad: while it's a great method for closed-list voting, it breaks down when people can each vote on a different 50-album subset of 450 albums. When someone excludes an album, it usually means they don't like it, but this method doesn't take that into account. If I rate an album #1 and Nick rates it #50, it means we both like it; I just like it more than Nick does. The final score for that album should probably be pretty high, but this method would instead put it around #25, because it treats Nick's #50 as disliking the album more than anything else. Worse, if someone rates an album #1 and no one else votes for it at all, that album lands at #1 on the aggregate list, because no other score pulls its position down. Essentially, this approach doesn't work when we want our list to reflect what we all like; it only works when everyone votes on the same 50 albums. Anyway, here's what our list would have looked like if I had used it:

  1. ‘Yndi Halda – Under Summer’
  2. ‘The Body – No One Deserves Happiness’
  3. ‘Corima – Amaterasu’
  4. ‘Kanye West – The Life of Pablo’
  5. ‘Whispered – Metsutan: Songs of the Void’
  6. ‘Orphx – Pitch Black Mirror’
  7. ‘Car Seat Headrest – Teens of Denial’
  8. ‘Aenaon – Hypnosophy’
  9. ‘Axon-Neuron – Metamorphosis’
  10. ‘Chance the Rapper – Coloring Book’
  11. ‘Com Truise – Silicon Tare’
  12. ‘Dangers – The Bend In The Break’
  13. ‘Alcest – Kodama’
  14. ‘Slice the Cake – Odyssey to the Gallows/West’
  15. ‘Katatonia – The Fall of Hearts’
  16. ‘Deftones – Gore’
  17. ‘Öz ürügülü – Fashion and Welfare’
  18. ‘Dance Gavin Dance – Mothership’
  19. ‘Kashiwa Daisuke – Program Music II’
  20. ‘Bon Iver – 22, A Million’
  21. ‘Sturgill Simpson – A Sailor’s Guide to Earth’
  22. ‘John Zorn – The Classic Guide to Strategy, Vol. 4’
  23. ‘Meshuggah – The Violent Sleep of Reason’
  24. ‘Winterhorde – Maestro’
  25. ‘Thank You Scientist – Stranger Heads Prevail’
  26. ‘Childish Gambino – Awaken, My Love!’
  27. ‘The Avalanches – Wildflower’
  28. ‘Roly Porter – Third Law’
  29. ‘Agoraphobic Nosebleed – Arc’
  30. ‘Spirit Adrift – Chained to Oblivion’
  31. ‘How To Dress Well – Care’
  32. ‘Dark Tranquillity – Atoma’
  33. ‘Coma Cluster Void – Mind Cemeteries’
  34. ‘Clipping – Splendor and Misery’
  35. ‘Frank Ocean – Blonde’
  36. ‘Blazon Stone – War of the Roses’
  37. ‘Skee Mask – Shred’
  38. ‘Anohni – Hopelessness’
  39. ‘Maeth – Shrouded Mountain’
  40. ‘Thrice – To Be Everywhere Is to Be Nowhere’
  41. ‘Amygdala – Population Control’
  42. ‘Vektor – Terminal Redux’
  43. ‘Esperanza Spalding – Emily’s D+Evolution’
  44. ‘Gorguts – Pleiades Dust’
  45. ‘Protest the Hero – Pacific Myth’
  46. ‘Lady Gaga – Joanne’
  47. ‘Leon Vynehall – Rojus (Designed to Dance)’
  48. ‘The 1975 – I Like It When You Sleep…’
  49. ‘Black Tusk – Pillars of Ash’
  50. ‘Cobalt – Slow Forever’

This list would basically make a few people very happy and everyone else very unhappy.
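Our actual computation was done in MATLAB, but as a rough Python sketch (with made-up album names), the failure mode above can be reproduced by ranking albums by the median of the ranks they received, since the median minimizes total absolute distance to each voter's rank:

```python
from statistics import median

def distance_minimization(ballots):
    """Rank albums by the median of the ranks they received.

    The median minimizes total absolute distance to each voter's
    rank, which exposes the flaw described above: an album ranked
    #1 by a single voter gets median 1 and tops the aggregate list.
    """
    ranks = {}
    for ballot in ballots:  # each ballot lists albums best-first
        for pos, album in enumerate(ballot, start=1):
            ranks.setdefault(album, []).append(pos)
    return sorted(ranks, key=lambda a: median(ranks[a]))

# Hypothetical ballots: only one voter cares about 'Obscure Gem'.
ballots = [
    ['Obscure Gem'],
    ['A', 'B', 'C'],
    ['B', 'A', 'C'],
]
print(distance_minimization(ballots))
# 'Obscure Gem' tops the list despite receiving a single vote
```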

Pairwise Comparison

Then, let's look at what we finally went with: pairwise comparison. Essentially, for every pair of albums, I look at whether each person ranked album A over album B, and award the winning album points accordingly; I then add up the points and sort the results. This ended up being pretty reasonable, because it takes into account each person's internal taste and how it stacks up against everyone else's. There was one problem with this approach, though: the varying list lengths. Say my list has 10 items. My #1 album would “win” comparisons against 9 other albums, earning 9 points. Nick's list has 50 items, so his #1 would “win” 49 comparisons and earn 49 points, making his #1 worth more than mine. That's pretty unfair, so I scaled each point's worth by the length of the person's list: winning a comparison on a 10-album list earns 1 point, while winning one on a 50-album list earns 0.2 points. The end result was pretty representative of our aggregate tastes, and this is what we eventually went with. You can see the final results here. This falls into the latter category of list I described earlier: it makes the fewest people the least upset.
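A minimal Python sketch of this scoring (the real implementation was in MATLAB; the scale factor of 10 is inferred from the point values quoted above, and the ballots are hypothetical):

```python
from collections import defaultdict
from itertools import combinations

def pairwise_scores(ballots):
    """Rank albums by length-scaled pairwise wins.

    Each ballot lists albums best-first, so the earlier album of
    every pair "wins" that comparison. A win on a 10-item ballot is
    worth 1 point and a win on a 50-item ballot 10/50 = 0.2 points,
    so every voter's #1 carries roughly the same total weight.
    """
    scores = defaultdict(float)
    for ballot in ballots:
        weight = 10.0 / len(ballot)  # 10-item list => 1 point per win
        for winner, _loser in combinations(ballot, 2):
            scores[winner] += weight
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ballots: 'A' ranks highly on every list.
ballots = [
    ['A', 'B', 'C'],
    ['B', 'A', 'C'],
    ['A', 'C', 'B'],
]
print(pairwise_scores(ballots))  # 'A' comes out on top
```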

Nick’s Approach / Static Assignment

Finally, I tried Nick's approach for comparison. He assigned a static number of points to each rank, and added that number to an album's score for its position in each person's list. The numbers are, from rank 1 to 50: [150, 145, 140, 135, 130, 125, 120, 115, 110, 105, 101, 97, 93, 89, 85, 81, 77, 73, 69, 65, 62, 59, 56, 53, 50, 47, 44, 41, 38, 35, 33, 31, 29, 27, 25, 23, 21, 19, 17, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5]. This very heavily weights a person's top 10. Thankfully, the list resulting from this approach wasn't too different from the final list we went with; the top 10 in particular is quite similar. Props to Nick for coming up with a method that works pretty well! The difference comes into play deeper into the 50, with pairwise comparison highlighting albums that more people liked. Here's what the list could have looked like if we had followed Nick's method:

  1. ‘Car Bomb – Meta’
  2. ‘Vektor – Terminal Redux’
  3. ‘Oathbreaker – Rheia’
  4. ‘The Dillinger Escape Plan – Dissociation’
  5. ‘Obscura – Akroasis’
  6. ‘Cult of Luna & Julie Christmas – Mariner’
  7. ‘David Bowie – Blackstar’
  8. ‘Meshuggah – The Violent Sleep of Reason’
  9. ‘Alcest – Kodama’
  10. ‘Gorguts – Pleiades Dust’
  11. ‘Thank You Scientist – Stranger Heads Prevail’
  12. ‘Ihsahn – Arktis’
  13. ‘Astronoid – Air’
  14. ‘Clipping – Splendor and Misery’
  15. ‘Fallujah – Dreamless’
  16. ‘Wormrot – Voices’
  17. ‘Aesop Rock – The Impossible Kid’
  18. ‘Periphery – Periphery III’
  19. ‘Plini – Handmade Cities’
  20. ‘Danny Brown – Atrocity Exhibition’
  21. ‘Insomnium – Winter’s Gate’
  22. ‘Nails – You Will Never Be One of Us’
  23. ‘Haken – Affinity’
  24. ‘Devin Townsend Project – Transcendence’
  25. ‘Aenaon – Hypnosophy’
  26. ‘Dark Tranquillity – Atoma’
  27. ‘Slice the Cake – Odyssey to the Gallows/West’
  28. ‘Virvum – Illuminance’
  29. ‘Swans – The Glowing Man’
  30. ‘Inter Arma – Paradise Gallows’
  31. ‘Trap Them – Crown Feral’
  32. ‘Wormed – Krighsu’
  33. ‘Every Time I Die – Low Teens’
  34. ‘Gojira – Magma’
  35. ‘Ulcerate – Shrines of Paralysis’
  36. ‘Textures – Phenotype’
  37. ‘Saor – Guardians’
  38. ‘Black Queen – Fever Daydream’
  39. ‘Neurosis – Fires Within Fires’
  40. ‘Deathspell Omega – The Synarchy of Molten Bones’
  41. ‘Anciients – Voice of the Void’
  42. ‘Protest the Hero – Pacific Myth’
  43. ‘Radiohead – A Moon Shaped Pool’
  44. ‘A Sense Of Gravity – Atrament’
  45. ‘Cyborg Octopus – Learning to Breathe’
  46. ‘65daysofstatic – No Man’s Sky: Music For An Infinite Universe’
  47. ‘Shokran – Exodus’
  48. ‘O’Brother – Endless Light’
  49. ‘First Fragment – Dasein’
  50. ‘Deftones – Gore’
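
For completeness, here's a Python sketch of applying Nick's static point table (the weights are the ones listed above; the ballots are hypothetical):

```python
from collections import defaultdict

# Nick's static point table for ranks 1 through 50, as listed above.
POINTS = [150, 145, 140, 135, 130, 125, 120, 115, 110, 105,
          101,  97,  93,  89,  85,  81,  77,  73,  69,  65,
           62,  59,  56,  53,  50,  47,  44,  41,  38,  35,
           33,  31,  29,  27,  25,  23,  21,  19,  17,  15,
           14,  13,  12,  11,  10,   9,   8,   7,   6,   5]

def static_scores(ballots):
    """Sum each album's per-rank points across all ballots."""
    scores = defaultdict(int)
    for ballot in ballots:  # each ballot lists albums best-first
        for pos, album in enumerate(ballot):
            scores[album] += POINTS[pos]
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ballots: X earns 150 + 145 = 295, Z 150, Y 145.
print(static_scores([['X', 'Y'], ['Z', 'X']]))  # ['X', 'Z', 'Y']
```

Note how steep the table is: a single #1 vote (150 points) outweighs any album that only appears in the bottom halves of a few lists, which is exactly the "peaks of very high ranks" bias described earlier.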

In the end, there is no ultimately correct way to do this, and this was an interesting experiment for me. The end result left pretty much everyone happy (except for a few members whose taste is completely divorced from the rest of the blog), as evidenced by fellow staffers commenting on how many of their top 50 (or more realistically, top 20) albums actually made it onto the final list. It seems our staff is generally less upset with the list this year, and it took less busywork on our part to compile and compute, so that's a big plus too!


I think transparency in how these lists are decided is important, and I wish other sites would talk about how they make their lists as well. It would go a long way towards demystifying the process and grounding their lists. Plus, I'd be interested in knowing what they do. I hope this post clarified our approach a little! I'd also love to talk about this in more depth and share more statistics, so let me know in the comments if you want to know the highest-placing album that received only a single vote, or the lowest rank someone's #1 landed at, or whatever. If there's demand, I can make a separate post of bizarre statistics about our AOTY list.


Published 6 years ago