Making software is like: finding music
I like music, and I like finding new artists, albums and songs to listen to. But how do I find the music that I’ll like from among all of the music out there, given that I have limited time and money for purchasing and listening to music?
I could just buy CDs at random and hope that I like them, but this seems cost-ineffective; I would probably enjoy only a very small percentage of them. I could listen to CDs at random and then buy the ones that I like, but even this would be very time-consuming, given the thousands of hours’ worth of recordings released each year. I could guide my choices by album art, by record label, or by buying music from artists whose previous releases I’ve enjoyed, but none of these is necessarily a good indicator of how much I’ll enjoy an album.
What I really need is some sort of system that can tell me which CDs to listen to: an oracle for suggesting music. There are three basic forms of such systems that spring to mind.
1) Recommendations based on knowledge of your tastes. This is the old-fashioned way, where a friend who knows what you like recommends something that they think fits into the space of music you enjoy. They may not know what it is about the suggested album that makes it likely you will enjoy it; they just have a vague idea of what you like in music and see an overlap. If their tastes are very similar to yours, they may simply use their own appreciation of a given album as the basis for the suggestion. The power that guides the recommendation is a holistic understanding of what you need, which is likely to be effective. But it is difficult to achieve, requiring a person who both knows of the album you might like and knows you well enough to know that you might like it.
2) Recommendations based on a given preference. This is the very essence of Amazon.com’s “People who bought this also bought album X”. The low-tech version is when the guy at the record store sees you buying the Of Montreal album and suggests an old Apples in Stereo album. Once again, the suggesting party doesn’t really understand what it is you like about an album; they just know that people who like it tend to like another album. If you are in circle A of a Venn diagram, and circle A overlaps heavily with circle B, maybe you are in B as well, and would therefore like B. Some internet services let you enter a great many preferences and cross-reference these with others’ lists, looking for likely overlaps. The power that guides the recommendation is the combined preferences of all the people the service has access to. The system does not really understand what you like, nor what anyone else likes, but it lets these preferences imply similarities in the underlying qualities of the music, and matches on them.
3) Recommendations based on underlying qualities. This is the type of recommendation system that I was recently exposed to, and which inspired this post. At pandora.com there is a service into which you enter the name of an artist or song that you like, and it will play you music that you might like. This is all based on something called the Music Genome Project, which endeavors to examine thousands and thousands of songs, identifying basic qualities in them, from “varying tempo and time signatures” to “eclectic bass riff” to “east coast rap influence”. It appears that these qualities have been applied manually, with a group of people listening to all the songs and deciding which qualities fit. The result is that when you indicate that you like the Beatles, the service infers that you might like qualities commonly found in their songs, including “major key tonality”, “mellow rock instrumentation” and “a clear focus on studio production”. Then other songs with these qualities are played. These qualities are obviously applied subjectively, but they are phrased fairly objectively; there is no mention of “high quality” or “strangeness” or anything else that would be so relative. The power behind the recommendation comes from the qualities that were hidden away and only implied in the other two approaches, but which are explicitly leveraged here. This means that the qualities you like are more likely to carry over into the suggested music, but there is no holistic understanding of whether this will really result in music that you like.
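To make the difference between approaches 2 and 3 a little more concrete, here is a minimal Python sketch. The purchase histories, “Album X/Y/Z” and most of the quality tags are hypothetical toy data (only the three Beatles qualities are quoted above). Approach 2 is sketched as simple co-occurrence counting, and approach 3 as Jaccard overlap between quality sets. Neither is meant to be how Amazon or Pandora actually compute their recommendations.

```python
from collections import Counter

# Approach 2: "people who bought this also bought".
# Hypothetical purchase histories, one set of items per listener.
histories = [
    {"Of Montreal", "Apples in Stereo"},
    {"Of Montreal", "Apples in Stereo", "Album X"},
    {"Of Montreal", "The Beatles"},
    {"The Beatles", "Album Y"},
]

def also_bought(item, histories, top_n=3):
    """Rank other items by how many listeners own them alongside `item`."""
    counts = Counter()
    for owned in histories:
        if item in owned:
            counts.update(owned - {item})
    return [name for name, _ in counts.most_common(top_n)]

# Approach 3: match on explicit, hand-applied qualities.
# Hypothetical tags; only the Beatles qualities come from the post.
qualities = {
    "The Beatles": {"major key tonality", "mellow rock instrumentation",
                    "a clear focus on studio production"},
    "Album Y": {"major key tonality", "a clear focus on studio production",
                "vocal harmonies"},
    "Album Z": {"east coast rap influence", "eclectic bass riff"},
}

def jaccard(a, b):
    """Shared qualities divided by all qualities of the two sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_by_qualities(item, qualities, top_n=3):
    """Rank other items by how much their quality set overlaps `item`'s."""
    target = qualities[item]
    scores = {other: jaccard(target, tags)
              for other, tags in qualities.items() if other != item}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(also_bought("Of Montreal", histories))
print(similar_by_qualities("The Beatles", qualities))
```

With this toy data, Apples in Stereo ranks first for Of Montreal purely because the same listeners own both, while the Beatles’ closest match is Album Y because it shares two of the three named qualities. The first method never looks at the music; the second never looks at other listeners.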
So, speaking of “hidden away”, I have been especially coy about the point here. When it comes time to build software, we would like to be able to reuse others’ ideas. These ideas might come in the form of high-level understandings of a problem domain, design patterns or actual reusable code. But once again, there are so many ideas out there, and finding and using them costs time and money. Our project has certain tastes; that is, it needs certain kinds of ideas that are useful for what is being attempted. We can’t really spell out exactly what it is that we need. We can explain what ideas we already have, or we can try to describe our goals, but so many underlying factors determine which idea, pattern or code segment will be useful in a given project that finding useful ones is very difficult.
I will confess that I am not up to the minute on current reuse research, and I would appreciate any comments about where I’m wrong, but this is my take on the metaphor. In both cases, it doesn’t take much to ruin a good match. A single element of a given song might be enough to turn someone off it, and myriad details determine whether a given software idea is usable in a certain context. The biggest difference is likely the stakes: if a person doesn’t like a CD it’s no big deal, whereas a poorly applied software idea could be very costly indeed. But how well do our three basic approaches to finding music apply to software?
1) Recommendations based on knowledge of your tastes (that is, needs). This is the current approach to reuse, as I understand it. Someone is familiar with the software you are developing and the situation around it, and also knows of an idea that seems like a good fit, so they recommend it. In some sense you may need to recommend such an idea to yourself; that is, you must understand your own problem and identify the connection. A project’s needs are often much harder to keep in your head than your own musical tastes.
2) Recommendations based on a given preference. Can we automate this idea-suggesting process? I imagine rudimentary systems for this exist, but I haven’t heard of anything very impressive. I think such a system could be created, but it would lack the necessary user data. If we had every document created by software development teams over the last five years, we would likely be able to hash out some details of overlapping ideas, concepts, keywords, diagram sections and code segments. It seems it would then be possible to enter your project, in whatever state it is in, and receive suggestions based on similar projects. Some uniform formatting across projects would, of course, make this easier, but I don’t think it would be necessary for results.
I’m making a lot of assumptions about the technology here, and finding projects willing to give up such information is certainly a near impossibility, but I like the idea in its mythical, impractical state nonetheless. I know that Chris Jensen is working on ways to data-mine open source projects; similar technology might be applied here.
3) Recommendations based on underlying qualities. This is the approach that inspired this post, and the one that seems most interesting to me. I was less impressed by the recommendations Pandora.com gave than by the endeavor of deciding on these qualities and then describing thousands and thousands of songs by hand. If we tried to do the same thing for software, what would it look like? What would our qualities be? Would they relate to the domain? Perhaps the architectural style? Could we determine what the emphasis of a given design was? These seem analogous to the instrumentation, style influence and instrument emphasis qualities, respectively, found on the music site. I wonder whether some general underlying structure for classifying specific designs could be found. If so, constructing a version for software design would be much easier, since we would have something more generic to follow.
I think the benefit of such an endeavor goes beyond my initial point. Yes, once we had hashed out the details of such a set of qualities, we could analyze individual designs and try to match them up with each other. This would potentially be useful, and in many ways seems to hold more potential for good matches than approach #2 has for software, and even more than approach #3 has for music.
But finding these qualities and trying to really decide whether a design has them seems like it would be really fun, for lack of a better word. It would give me some sense of software aesthetics and qualities beyond the -ilities. It is perhaps unrealistic to think that appropriate qualities could be found and applied, but I would have thought the same about music. Maybe there is hope after all.
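Carrying approach #3 over to software might look something like the following Python sketch. The three facets (domain, architectural style, emphasis) are the candidate qualities floated above; the example designs, their labels and the facet weights are purely hypothetical, and weighted per-facet overlap is just one simple way such hand-applied descriptions could be compared.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Design:
    """A hand-described design; each facet holds a set of quality labels."""
    name: str
    domain: frozenset
    style: frozenset
    emphasis: frozenset

# Hypothetical weights for how much each facet counts toward a match.
# They sum to 1.0, so every match score falls between 0.0 and 1.0.
WEIGHTS = {"domain": 0.5, "style": 0.3, "emphasis": 0.2}

def overlap(a, b):
    """Jaccard overlap of two label sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_score(x, y):
    """Weighted sum of per-facet overlaps between two designs."""
    return sum(w * overlap(getattr(x, f), getattr(y, f))
               for f, w in WEIGHTS.items())

def suggest(project, catalog, top_n=2):
    """Return the names of catalogued designs most similar to `project`."""
    ranked = sorted(catalog, key=lambda d: match_score(project, d), reverse=True)
    return [d.name for d in ranked[:top_n]]

# Hypothetical catalog of previously described designs.
catalog = [
    Design("order-tracking service", frozenset({"e-commerce"}),
           frozenset({"client-server"}), frozenset({"data integrity"})),
    Design("chat client", frozenset({"messaging"}),
           frozenset({"event-driven"}), frozenset({"responsiveness"})),
    Design("audio plugin host", frozenset({"music"}),
           frozenset({"pipe-and-filter"}), frozenset({"low latency"})),
]

my_project = Design("my project", frozenset({"e-commerce"}),
                    frozenset({"event-driven"}), frozenset({"data integrity"}))
print(suggest(my_project, catalog))
```

Here the hypothetical e-commerce project is matched first with the order-tracking design, which shares its domain and emphasis even though its architectural style differs, and second with the chat client, which shares only its style. The arithmetic is trivial; deciding what belongs in each facet is the hard, hand-applied part, just as it is for the music.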
The software portion of this post started off about reuse and ended up seeking the nature of software. We know that reuse requires a good understanding of the nature of a given system, and I think the route there may run through my own area of interest: the nature of software in general.
** For those who would like to check out pandora.com, be warned that it is fairly limiting in terms of how many songs you can skip, how long you can use it before registering for free, and then how long you can use it without paying. However, as is often the case, there are some easy workarounds if you are willing to do a little research.