CSE6242 Data & Visual Analytics
Fun Materials for the week
How to create an intuitive statistical visualization?
Hans Rosling mastered that art, and we can learn a lot from his first TED talk.
Data Visualization
Edward Tufte is a pioneer in data visualization and a fellow of the American Statistical Association.
His books on visualization rank among Amazon's best 100 books of the 20th century.
"Thinking Eye with Edward Tufte" (link), a lecture Edward Tufte gave at the MIT Sloan Sports Analytics Conference, is a great place to learn about his design principles.
Information is Beautiful is one of the best places to learn how to create impactful infographics and data visuals. The website was founded by David McCandless, one of the brightest minds in data visualization, and it has tons of great examples.
He also conducts dataviz workshops in major cities around the world; you can check his workshop schedule, including online seminars, here.
The August 2020 edition of Wired magazine had a brilliant article on data visualization, aptly titled Is Your Chart a Detective Story? Or a Police Report? Here's a short excerpt from the article:
Consider the visualization created by information designer Will Burtin in 1951 to summarize the effectiveness of three antibiotics—penicillin, neomycin, and streptomycin—in treating 13 bacteria. Bacterial species are arrayed in a circular layout, with three bars for each bacterial infection representing the amount of each antibiotic needed to treat it. An inversion of the scale means that longer bars represent more effective antibiotics, aligning with a spontaneous interpretation of bigger is better, while shading behind the bars neatly organizes the bacteria into two groups according to whether they result in a positive or negative gram stain test.
This article is a great place to understand the pros and cons of creative, non-traditional visualizations.
“Simplicity is the ultimate sophistication.” This design mantra of Steve Jobs and Jony Ive changed our world and technology forever.
Jony Ive's biography, Jony Ive: The Genius Behind Apple's Greatest Products, has some wonderful lessons on creating simple, intuitive designs for complex problems. Here's a small excerpt from the book:
The process of simplification is design 101, a mind-set that every design student is taught in school. But not every student adopts it, and it’s rarely applied with the ruthless discipline practiced by Jony. Indeed, if there’s such a thing as a single secret to what Jony Ive does, it is to follow slavishly the simplification philosophy. That approach has accounted for many of the major breakthroughs, as well as for some products that failed and others that Apple hasn’t released. Caring enough to commit the enormous time and effort to get something right has also been Jony’s hallmark, from his earliest college projects onward. Jony’s ultimate goal is for his designs to disappear.
*"Correlation does not imply causation"*. Deciphering “causation” has been an open question for decades, if not centuries. Judea Pearl is described not only as one of the giants in the field of artificial intelligence but also as one of the founding fathers of the causal revolution.
His 2018 book, The Book of Why: The New Science of Cause and Effect, has rekindled interest in causal research.
Here’s an excerpt from his book:
*Ironically, the need for a theory of causation began to surface at the same time that statistics came into being. In fact, modern statistics hatched from the causal questions that Galton and Pearson asked about heredity and their ingenious attempts to answer them using cross-generational data. Unfortunately, they failed in this endeavor, and rather than pause to ask why, they declared those questions off-limits and turned to developing a thriving, causality-free enterprise called statistics.* [---] *My emphasis on language also comes from a deep conviction that language shapes our thoughts. You cannot answer questions that you cannot ask, and you cannot ask a question that you have no words for.*
His book has inspired many people, and there are now some wonderful Python packages on causality, including Microsoft's aptly titled DoWhy.
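To see why correlation alone can mislead, here is a minimal sketch in plain Python (standard library only; it does not use any causality package). It assumes a made-up scenario in which temperature drives both ice cream sales and sunburns; the variable names, distributions, and numbers are purely illustrative assumptions, not data from the book.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Return (observed, intervened) correlations for a toy confounded system."""
    rng = random.Random(seed)

    # Hypothetical confounder: temperature influences both variables.
    temperature = [rng.gauss(25, 5) for _ in range(n)]
    ice_cream = [t + rng.gauss(0, 2) for t in temperature]
    sunburns = [t + rng.gauss(0, 2) for t in temperature]

    # Observational data: the two variables move together.
    observed = pearson(ice_cream, sunburns)

    # Intervention: we set ice cream sales ourselves, ignoring temperature.
    # Sunburns are unchanged because ice cream never caused them.
    forced_ice_cream = [rng.gauss(25, 5) for _ in range(n)]
    intervened = pearson(forced_ice_cream, sunburns)

    return observed, intervened


if __name__ == "__main__":
    observed, intervened = simulate()
    print(f"correlation in observational data: {observed:.2f}")
    print(f"correlation after intervening:     {intervened:.2f}")
```

The observed correlation is strong, yet it vanishes once ice cream sales are set independently of temperature. That gap between "seeing" and "doing" is the kind of question causal tools are built to answer.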
Python
Python is among the most popular and easy-to-use languages in data science. This week, we recommend a book that will help beginners and advanced Python programmers alike revisit some of the hidden gems and good etiquette of Python programming.
- Fluent Python: Clear, Concise, and Effective Programming by Luciano Ramalho (link)
- Interview with Python creator Guido van Rossum on how Python makes thinking in code easier (link)
Richard Feynman
Richard Feynman is often considered the greatest teacher ever. He mastered not only physics but also the art of thinking correctly.
His famous line to NASA managers testifying before the Rogers Commission, which was investigating the Space Shuttle Challenger disaster, was “When you don't have data, you have to use reason. And they were giving you reasons.” It is an important reminder for data scientists to dive deep into a problem and find a resolution even with limited data.
The BBC documentary on his life, The Pleasure of Finding Things Out, is a never-ending oasis of inspiration. His famous talk on “Beauty” is part of this documentary.
His last book, The Meaning of It All: Thoughts of a Citizen-Scientist, is a great consolidation of his lectures and insights. Thanks also go to Bill Gates, who bought the rights to Feynman's Messenger Lectures (delivered at Cornell) and made them available free online under the name Project Tuva.
Traveling Salesman problem
Last week, after 44 years, computer scientists found a better way to come up with an approximate solution to the Traveling Salesman problem. You can read about it here.
The Traveling Salesman problem is NP-hard, and you can read about it here.
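To get a feel for the gap between an exact and an approximate answer, here is a minimal sketch comparing brute force with the simple nearest-neighbor heuristic. Note that nearest neighbor is a stand-in chosen for brevity; it is not the new algorithm from the result mentioned above and comes with no such guarantee. The city coordinates are hypothetical.

```python
import itertools
import math

# Hypothetical city coordinates, for illustration only.
CITIES = [(0, 0), (2, 6), (5, 1), (6, 5), (8, 2), (1, 3), (7, 7), (3, 4)]


def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def brute_force_tour(points):
    """Exact answer: try every ordering (only feasible for a handful of cities)."""
    rest = range(1, len(points))
    best = min(
        ((0,) + perm for perm in itertools.permutations(rest)),
        key=lambda order: tour_length(points, order),
    )
    return list(best)


def nearest_neighbor_tour(points, start=0):
    """Greedy heuristic: always travel to the closest unvisited city."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order


if __name__ == "__main__":
    exact = tour_length(CITIES, brute_force_tour(CITIES))
    greedy = tour_length(CITIES, nearest_neighbor_tour(CITIES))
    print(f"exact tour length:  {exact:.2f}")
    print(f"greedy tour length: {greedy:.2f}")
```

Brute force checks (n-1)! orderings, so it stops being practical after a dozen or so cities, which is why good approximation algorithms matter.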
There is a wonderful 2012 movie, aptly titled “Traveling Salesman”, which depicts the hours and days after four mathematicians solve one of the hardest problems in computer science, P versus NP. The movie is available for free on Prime and YouTube. Enjoy the movie!
Bayes' Rule
Bayes' rule has stood the test of time for centuries. Sharon Bertsch McGrayne's fascinating book, The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, & Emerged Triumphant from Two Centuries of Controversy, covers not only the history but also the immense practical applications of Bayes.
To complement this book, the movie The Imitation Game is a brilliant portrayal of how Alan Turing cracked the Enigma code using Bayes, eventually helping to defeat the Nazis.
If you are ready to start using Bayes, Allen Downey's Think Bayes is a great place to start, and it is Python-based.
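As a small warm-up before picking up the book, here is a minimal sketch of Bayes' rule applied to a diagnostic test. The function name and the numbers (1% prevalence, 95% sensitivity, 5% false positive rate) are illustrative assumptions, not taken from any of the works above.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(hypothesis | positive evidence) via Bayes' rule.

    prior:               P(hypothesis)
    likelihood:          P(positive | hypothesis)
    false_positive_rate: P(positive | not hypothesis)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


if __name__ == "__main__":
    # Hypothetical test: 1% of people have the condition.
    after_one = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
    print(f"after one positive test:  {after_one:.3f}")  # about 0.161

    # Yesterday's posterior becomes today's prior.
    after_two = posterior(prior=after_one, likelihood=0.95, false_positive_rate=0.05)
    print(f"after two positive tests: {after_two:.3f}")  # about 0.785
```

Even with a fairly accurate test, one positive result leaves the probability at only about 16% because the condition is rare. Feeding the posterior back in as the next prior is the updating loop at the heart of Bayesian reasoning.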
Machine Learning
This week, all of you started the Machine Learning based homework. With long-term career planning in ML in mind, here are three practical, “code”-based ML books (with limited theory):
- Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2 by Sebastian Raschka & Vahid Mirjalili
- Approaching (Almost) Any Machine Learning Problem by Abhishek Thakur
- The Hundred-Page Machine Learning Book by Andriy Burkov, which covers the essentials in just about 100 pages
For a quick review of ML theory, Chris Albon’s Machine Learning Flashcards are excellent.
Oscar Nominee Werner Herzog’s documentary Lo and Behold, Reveries of the Connected World (also starring Sebastian Thrun) is a wonderful watch to learn about the past, present, and future of the internet.
Young Scientist
E.O. Wilson's Letters to a Young Scientist is an insightful read for all ages, and an especially important one for students.
Here's a small excerpt from the book, where the living legend of biology draws an analogy between a scientist and an artist:
The ideal scientist thinks like a poet and only later works like a bookkeeper. Keep in mind that innovators in both literature and science are basically dreamers and storytellers. In the early stages of the creation of both literature and science, everything in the mind is a story. There is an imagined ending, and usually an imagined beginning, and a selection of bits and pieces that might fit in between. In works of literature and science alike, any part can be changed, causing a ripple among the other parts, some of which are discarded and new ones added. The surviving fragments are variously joined and separated and moved about as the story forms. One scenario emerges, then another. The scenarios, whether literary or scientific in nature, compete with one another. Some overlap. Words and sentences (or equations or experiments) are tried to make sense of the whole thing. Early on, an end to all the imagining is conceived. It arrives at a wondrous denouement (or scientific breakthrough). But is it the best, is it true? To bring the end safely home is the goal of the creative mind.
E.O. Wilson also has a TED talk on the same subject, Advice to a young scientist.