Posted by Dr. Vincent Granville on September 6, 2015 at 12:55pm in Data Mining Software
Here are some interesting things you can do with Legos to introduce analytic thinking, computational complexity, and
experimental design to kids 6 to 12 years old, and to get them interested in analytics.
Let's say you purchase two Lego sets, each one to build a different car. Let's assume that the overlap between the
two sets is substantial. There are three different ways to build the two cars. The first step consists of sorting
the pieces by color, and maybe also by size. The three ways to proceed are:
Sequentially: Build one car at a time. This is the traditional approach.
Semi-parallel system: Sort all the pieces from both sets together (so the pieces will be blended: some in
the red pile will belong to car A, some to car B). Then build the two cars sequentially, following the
instructions in the accompanying leaflets.
In parallel: Sort all the pieces from both sets together, then build the two cars at the same time,
advancing through the two sets of instructions side by side.
Which is the most efficient way to proceed? The least efficient is sequential. Why? If you are good at multitasking,
the fully parallel approach is best. Why? Note that with the semi-parallel approach, the first car that you build will
take less time than the second car (the pieces you need are easier to find because of the higher redundancy in the
blended piles), and less time than it would take with the sequential approach (for the same reason).
You can have your kid build two cars, A and B, in parallel, then two other cars, C and D, sequentially, to test my
assumptions and to help them get familiar with the concept of distributed architecture.
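One way to keep score during such an experiment is sketched below in Python. It assumes you time each pair of cars with a stopwatch and repeat the experiment a few times. The function names are hypothetical, and the numbers are placeholders for illustration, not measurements.

```python
from statistics import mean

def average_minutes(trials):
    """Average total build time (minutes for two cars) for each method."""
    return {method: mean(times) for method, times in trials.items()}

def fastest_method(trials):
    """Name of the method with the lowest average build time."""
    averages = average_minutes(trials)
    return min(averages, key=averages.get)

# Placeholder values only: replace them with your own stopwatch readings.
trials = {
    "parallel": [30.0, 28.0],
    "sequential": [36.0, 34.0],
}

print(average_minutes(trials))
print(fastest_method(trials))
```

With only a handful of builds, the averages will be noisy, so it helps to repeat the experiment with different cars before drawing a conclusion.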
Other concepts that can be introduced: building an 80-piece car takes more than twice as much time as building a
40-piece car. Why? (The same also applies to puzzles.) Note that if the overlap between A and B (the proportion of
Lego pieces that are identical in both A and B) is small, then the sequential approach will work best.
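One possible answer to the 80-piece question can be sketched with a toy model in Python. The model is an assumption made for illustration, not something measured: every piece is unique, and when r pieces remain in the pile, you inspect (r + 1) / 2 of them on average before finding the one you need. Sorting the pieces first would shrink these numbers, but the piles still grow with the size of the set.

```python
def expected_inspections(n_pieces):
    """Expected number of pieces looked at while building an n-piece model.

    Assumption: with r pieces left in the pile, the next needed piece is
    found after inspecting (r + 1) / 2 pieces on average.
    """
    return sum((r + 1) / 2 for r in range(1, n_pieces + 1))

small = expected_inspections(40)
large = expected_inspections(80)
ratio = large / small

print(small, large, round(ratio, 2))  # 430.0 1660.0 3.86
```

Under this model the search effort grows roughly with the square of the number of pieces, so doubling the size of the car close to quadruples the time spent looking for pieces.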