Outer-Products: A Love Letter

The undeniable power of matrix magic.

By Benjamin Tenmann | Published in Towards Data Science | Jan 10, 2021 | 5 min read

Image by author.

Linear algebra and vector calculus are amazing! It really is a terrible shame that almost no time is dedicated to these parts of maths in high school. Instead, we are given integral and derivative problem sets ad nauseam. Yuck!

Vectors, matrices and arrays offer an incredibly powerful way of doing an immense amount of computation very efficiently. A great example of this is the NumPy module in Python, which gives the high-level language the ability to perform array-based computation. Here, I show how a method from linear algebra, the outer-product, can be used to avoid excessive looping and speed up computation. Let's get started!

Firstly, in case you are not familiar with linear algebra, there are a few ways to multiply two arrays. One of these is called the "outer-product" and is written as follows:

$$M = u \otimes v$$

where u and v are n- and m-dimensional vectors respectively and M is an n × m matrix. The ij-th element of M is computed as follows:

$$M_{ij} = u_i v_j$$

So at the end we get one product for each possible pairing of an element of u with an element of v. How is this useful? Let's find out!

Outer-products in Python

Base Python

Though technically not an outer-product between arrays, it is possible to produce something approximating it using only base Python.
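Before looking at the implementations, it may help to see the definition applied to a small case. The vectors here are made up purely for illustration (n = 3, m = 2) and are not the ones used in the code that follows:

```latex
u = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad
v = \begin{pmatrix} 4 \\ 5 \end{pmatrix}
\quad\Longrightarrow\quad
M = u \otimes v =
\begin{pmatrix}
1 \cdot 4 & 1 \cdot 5 \\
2 \cdot 4 & 2 \cdot 5 \\
3 \cdot 4 & 3 \cdot 5
\end{pmatrix}
=
\begin{pmatrix}
4 & 5 \\
8 & 10 \\
12 & 15
\end{pmatrix}
```

Each row of M corresponds to one element of u and each column to one element of v, which is why M has 3 rows and 2 columns.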
The most intuitive way is using lists (representing the vectors) and nested for loops:

    u = [2, 3, 5, 2, 7, 8, 10]
    v = [5, 2, 7, 2, 3, 10, 4]

    M = []
    for i in u:
        row = []  # one row of the result per element of u
        for j in v:
            row.append(i * j)
        M.append(row)
    print(M)

This expression returns a list of lists, where each list is analogous to a row vector in an array. This method, however, seems clunky. In fact, as we can see (fig. 1), it is. Nested for loops are notoriously inefficient in Python and generally frowned upon as not "Pythonic".

A more elegant and "Pythonic" solution is using list comprehensions:

    u = [2, 3, 5, 2, 7, 8, 10]
    v = [5, 2, 7, 2, 3, 10, 4]

    M = [i * j for i in u for j in v]  # flat list of length n * m
    print(M)

While you might appreciate how this solution is much more concise and neat than the nested for loops, you will also notice that this expression will not give you a list of lists like the previous solution, but rather one long list of length n × m. It is thus less intuitive to read and index. However, it runs significantly faster (fig. 1).

Finally, we can use the map() function provided by base Python to solve this problem:

    u = [2, 3, 5, 2, 7, 8, 10]
    v = [5, 2, 7, 2, 3, 10, 4]

    # pair each element of u (repeated len(v) times) with v repeated len(u) times
    M = list(map(lambda i, j: i * j,
                 [i for i in u for j in v],
                 v * len(u)))
    print(M)

Not only is this method the hardest to read, it is also the least efficient one (fig. 1). Though map() is generally a very efficient way of mapping a function over an iterable, here we have the problem of requiring a lot more memory. To implement this method, we ...
NumPy

In case you didn't catch the memo, NumPy is fantastic! An expansive module, it is an essential part of every data scientist's toolkit. Its core data type, the numpy.ndarray (created with numpy.array()), brings with it many useful operations and functionalities that make computation on large amounts of data a lot easier and quicker.

The outer-product is incredibly simple to compute, as it comes with the module as a pre-defined function:

    import numpy as np

    u = [2, 3, 5, 2, 7, 8, 10]
    v = [5, 2, 7, 2, 3, 10, 4]

    M = np.outer(u, v)  # 2-D array of shape (len(u), len(v))
    print(M)

It is also far more efficient than the base Python methods (fig. 1). Furthermore, the numpy.ndarray type of the output brings with it a whole host of neat methods and advantageous idiosyncrasies.

[Figure 1: line plot of time against size (0 to 1000) for four methods: for-loops, list-comprehension, map and outer-product.]

Figure 1: Time taken for computing the outer product of two vectors of a given size (length) using different methods. The outer-product (np.outer()) is by far the most efficient, while map() is the least efficient. Image by author.

What's the point?

Outer-products are all well and good, but they surely have limited use in the wider context of data science? Well, yes. Outer-products specifically aren't particularly useful beyond certain use-cases (e.g. error back-propagation in multi-layer perceptrons), but their principle can be generalised. For example, we can imagine that instead of a multiplication between two numbers, we use any arbitrary function which takes two numbers and returns a single value. In data science, we often want to find the differences, i.e. distances, between data-points.
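The generalisation described above can be sketched with NumPy's built-in binary ufuncs, each of which has an .outer() method that applies the operation to every pairing of elements. The arrays x and y below are made-up illustrative data, and this is only a minimal sketch for one-dimensional points:

```python
import numpy as np

# Illustrative data, not taken from the article.
x = np.array([1.0, 4.0, 6.0])
y = np.array([2.0, 3.0])

# .outer() applies the ufunc to each pairing (x[i], y[j]) and
# returns an array of shape (len(x), len(y)).
products = np.multiply.outer(x, y)     # the ordinary outer product
differences = np.subtract.outer(x, y)  # differences[i, j] = x[i] - y[j]
distances = np.abs(differences)        # absolute differences, i.e. 1-D distances

print(products)
print(distances)
```

Swapping np.multiply for another binary ufunc changes the operation applied to each pair while the looping stays inside NumPy.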
A classic example which uses this is the Simulated Annealing algorithm in the Travelling Salesman Problem (TSP). In the TSP, we are trying to find the shortest possible path to visit each city in a set exactly once, ending at the starting city. For this, a pairwise distance matrix for the set of cities is required. Here is a solution using NumPy:

    import numpy as np
    from scipy.stats import uniform

    n_cities = 500  # number of cities (assumed value)
    # random complex coordinates: real part is x, imaginary part is y
    coords = uniform.rvs(size=n_cities) + uniform.rvs(size=n_cities) * 1j

    def dist(a, b):  # calculates distance between two points
        d = abs(a - b)
        return d

    compute_dist = np.frompyfunc(dist, 2, 1)  # vectorise distance function

    D = compute_dist.outer(coords, coords).astype(np.float64)
    print(D)

Ok, so there are a few things going on here:

* I use the uniform function from scipy.stats to generate random coordinates for the cities.
* I use complex numbers for the coordinate system. This has a few advantages, principally that I can store a coordinate as a single element in an array; also, I get the Euclidean distance between two points by taking the absolute value of their difference.
* I vectorise the distance function by using np.frompyfunc(). This is the crucial step in this approach. We can now use the ufunc methods provided by NumPy on this function.
* One of these methods is .outer(), which we have seen before. It works as before, only that instead of a product of two numbers we will get the distance between two points.

Pretty neat, eh?
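For this particular distance function, the same matrix can also be obtained by broadcasting, without wrapping a Python function. The sketch below is an alternative under my own assumptions, not the method used above: it uses a small n_cities, and NumPy's seeded default_rng in place of scipy.stats.uniform so that it is reproducible. It then checks that both routes agree:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (assumption, for reproducibility)
n_cities = 5                    # small illustrative value

# Complex coordinates: real part is x, imaginary part is y.
coords = rng.uniform(size=n_cities) + rng.uniform(size=n_cities) * 1j

# Route 1: wrap a Python function as a ufunc and call .outer().
# np.frompyfunc returns object arrays, hence the cast to float64.
compute_dist = np.frompyfunc(lambda a, b: abs(a - b), 2, 1)
D_ufunc = compute_dist.outer(coords, coords).astype(np.float64)

# Route 2: broadcasting. A column minus a row gives every pairwise difference.
D_broadcast = np.abs(coords[:, np.newaxis] - coords[np.newaxis, :])

print(np.allclose(D_ufunc, D_broadcast))
```

Broadcasting only works when the pairwise function can be written with array operations. np.frompyfunc calls the Python function once per pair, so it is likely slower on large inputs, but it accepts any two-argument function.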
The biggest selling point of this is that it can be generalised, as I have mentioned before. For example, I recently used this method to calculate the pairwise sequence alignment scores for a set of DNA sequences. One could also easily imagine how this could be used to test different combinations of experimental parameters, using vectors of parameters. The possibilities are nigh endless, and we are only scratching the surface of ufuncs in NumPy.

O NumPy, you beautiful machine!

Thanks to Ludovic Benistant
