Greg Kochanski September 2011

Software for finding rhythmical similarities between music

The overall strategy in the software is as follows:
1) From each audio track, compute a "feature vector".
	This is a vector of floating point numbers that represents the
	rhythms of a particular track.   This is the computationally
	intensive step: you are condensing a few megabytes of audio
	down to a few kilobytes of feature vector.

	Ideally, the feature vector should capture something important
	about the music.  Here, we capture beat patterns.

1a) The software assumes that two tracks are similar if and only if they
	have the same "label".   In this software, the "label" is just
	the name of the album.  However there are a bunch of other
	reasonable choices you could use:  Year of composition,
	year of performance, artist, performer, or combinations thereof.
	The necessary constraint is that you need at least a few tracks
	per label.

2) Compress the feature vectors.   This is a process that reduces the
	length of each feature vector.  The intent is that it keeps
	information that helps discriminate labels, and throws away
	information that primarily varies within a label.

	When the compression algorithm is done, you can take the
	Euclidean distance between two feature vectors, and
	if the two vectors have the same label, the distance will be
	small; if the vectors have different labels, the distance
	will be large.  (Obviously, this will only work well to the
	extent that the feature vectors are correlated with the labels.)

	Compression is a fairly fast operation.   You accumulate a matrix
	from each track's feature vector, and then you do something similar
	to principle component analysis.

3) Apply a k-means clustering algorithm.   This efficiently brings together
	tracks that have similar values of the compressed feature vector.
	It gives you a list of which tracks are in which cluster.
	Tracks that are in the same cluster will be similar in the sense
	that they will probably have the same label, and they will have
	nearby values of the uncompressed feature vector.   Similarity
	here is a mixture of what the uncompressed feature vector defines
	as similar and what the labels define to be similar.

	Note that the clusters generated by the k-means algorithm
	do not have names.   They are just groups of similar songs.
	If you have very common labels, they might be split up into
	several clusters.   Rare labels would probably get merged
	together.

	If you want to give a name to a cluster, you could perhaps pick
	the label of one of the tracks that ends up in the cluster.
	If you really want your clusters to match the labels as exactly
	as possible, you'd want to replace this step with a classifier
	algorithm: something that explicitly matches labels and clusters.
	

Now, this software does the computation for one set of labels: album names.
However, if would be a fine idea to do the whole process separately for
several different sets of labels.    You could do it with album names,
and then with year, and with performer and songwriter, for instance.
That would give you four sets of clusters, and tracks could be similar in
four different ways.

Or, you could think of a pair of tracks (A and B) as being "near identical"
if they clustered together all four ways, "very similar" if A and B ended
up in the the same cluster three times out of four, "similar" if they
clustered together twice, and "slightly similar" if they shared a cluster
only once.

