Oreilly thoughtful machine learning 2014 9


        iterations.times do |i|
          puts "Iteration #{i}"
          expect_maximize
        end
      end

      def expect_maximize
        expect
        maximize
      end
    end

EM Jazz Clustering Results

Back to our results using EM clustering with our jazz music. To actually perform the analysis, we run the following script:

    require 'csv'
    require 'matrix'

    data = []
    artists = []

    CSV.foreach('./annotated_jazz_albums.csv', :headers => true) do |row|
      @headers ||= row.headers[2..-1]
      artists << row['artist_album']
      # Skip the first two columns and convert the rest to integers
      data << row.to_h.values[2..-1].map(&:to_i)
    end

    data = Matrix[*data]

    e = EMClusterer.new(25, data)
    e.cluster
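The loading loop above can be tried in isolation. The following is a minimal sketch, assuming a tiny inline CSV in place of `annotated_jazz_albums.csv`; the column names after `artist_album` and the sample rows are made up for illustration, since the real file layout is not shown here.

```ruby
require 'csv'
require 'matrix'

# Hypothetical stand-in for annotated_jazz_albums.csv. The column layout
# and the rows are assumptions, not the real data.
SAMPLE_CSV = <<~CSV
  artist_album,key_index,year,Genre_A,Genre_B
  Artist One - Album X,1,1959,1,0
  Artist Two - Album Y,2,1964,0,1
CSV

artists = []
rows = []

CSV.parse(SAMPLE_CSV, headers: true) do |row|
  artists << row['artist_album']
  # Drop the first two columns and coerce the remaining values to integers,
  # mirroring row.to_h.values[2..-1].map(&:to_i) in the script.
  rows << row.to_h.values[2..-1].map(&:to_i)
end

# Splat the array of rows into Matrix[], one matrix row per album.
data = Matrix[*rows]

puts "albums:   #{data.row_count}"
puts "features: #{data.column_count}"
```

Each album becomes one row of the matrix and each remaining CSV column becomes one feature, which is the shape the clusterer is handed in the script.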

The first thing you’ll notice about EM clustering is that it’s slow. It’s not as quick as simply calculating new centroids and iterating. Each iteration has to recompute means and covariance matrices, which is expensive. Occam’s Razor would tell us that EM clustering is most likely not a good choice for grouping large amounts of data.

The other thing you’ll notice is that our annotated jazz music will not work; this is because the covariance matrix is singular. A singular matrix has no inverse, and the Gaussian density that EM evaluates needs the inverse of the covariance matrix, so the calculation breaks down. Realistically, this problem is ill suited for EM clustering for this reason, so we have to transform it into a different problem altogether. We do that by collapsing the dimensions into the top two genres by index:

    require 'csv'

    CSV.open('./less_covariance_jazz_albums.csv', 'wb') do |csv|
      csv << %w[artist_album key_index year].concat(2.times.map {|a| "Genre_#{a}" })

      CSV.foreach('./annotated_jazz_albums.csv', :headers => true) do |row|
        genre_count = 0
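The singular covariance matrix mentioned above can be reproduced on toy data. The following is a minimal sketch, assuming one common cause of singularity: a column that is an exact linear combination of others. The `covariance` helper and the sample data are hypothetical and are not part of the book's `EMClusterer`; the text does not say which columns cause the problem in the jazz data.

```ruby
require 'matrix'

# Hypothetical helper: sample covariance of the columns of a Matrix,
# where each row is one observation. Rationals keep the arithmetic exact.
def covariance(data)
  n = data.row_count
  means = data.column_vectors.map { |col| col.to_a.sum.to_r / n }
  centered = Matrix.build(n, data.column_count) { |i, j| data[i, j] - means[j] }
  (centered.transpose * centered) / (n - 1)
end

# Made-up data: the third column is always the sum of the first two,
# so the columns are linearly dependent.
data = Matrix[
  [1, 0, 1],
  [0, 1, 1],
  [1, 1, 2],
  [0, 0, 0]
]

cov = covariance(data)
puts "singular? #{cov.singular?}"

begin
  cov.inverse
rescue ExceptionForMatrix::ErrNotRegular
  # The Gaussian density needs this inverse, so EM cannot proceed.
  puts 'covariance matrix cannot be inverted'
end

# Keeping only the two independent columns gives an invertible matrix.
reduced = covariance(data.minor(0, data.row_count, 0, 2))
puts "reduced singular? #{reduced.singular?}"
```

Reducing the number of dimensions, as the conversion script does by keeping only two genre columns, is one way to lower the chance of this kind of dependence.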


