Monday, April 07, 2014

Size isn’t everything. Especially when you are thinking with it

Because of my electrical engineering background, most graduate school faculty expected me to work with numbers and computers. My first-year assignment was to be a research assistant to a professor who wanted to develop a GIS-based approach in her project, back when GIS on a personal computer was quite a joke by today's standards. It was soon clear to her that I was not planning to be a data geek in the social sciences.

I was not after numbers by any means. I was in graduate school to learn about ideas. I was blown away that so much had already been said and written down. Yet I had not known even the tiniest bits of those ideas when I was led down a narrow alley called electrical engineering. It is not that I did not value the empirical approach--after all, my background in science, math, and engineering had given me more than enough evidence of the importance of empirical data. But theories and ideas mattered more to me than what the numbers could reveal.

Over the years, I have watched with fascination the rapidly growing collection of data at all levels: Big Data, as it has come to be called. I have watched, too, the growing belief that mining Big Data will be the digital world's equivalent of the oracle at Delphi--the data will tell us everything we would want to know.

So it was with interest that I read two lengthy opinion pieces on the flaws of Big Data. The conclusion in one is:
Big data is here to stay, as it should be. But let’s be realistic: It’s an important resource for anyone analyzing data, not a silver bullet.
And the conclusion in the other is:
“Big data” has arrived, but big insights have not. The challenge now is to solve new problems and gain new answers – without making the same old statistical mistakes on a grander scale than ever.
Both essays methodically lay out their arguments, and I urge you to read them. I especially liked the following:
we almost forgot one last problem: the hype. Champions of big data promote it as a revolutionary advance. But even the examples that people give of the successes of big data, like Google Flu Trends, though useful, are small potatoes in the larger scheme of things. They are far less important than the great innovations of the 19th and 20th centuries, like antibiotics, automobiles and the airplane.
Yes, the hype of it all!

Big Data's cheerleaders claim:
that data analysis produces uncannily accurate results; that every single data point can be captured, making old statistical sampling techniques obsolete; that it is passé to fret about what causes what, because statistical correlation tells us what we need to know; and that scientific or statistical models aren’t needed because, to quote “The End of Theory”, a provocative essay published in Wired in 2008, “with enough data, the numbers speak for themselves”.
So, what about those claims?
Unfortunately, these four articles of faith are at best optimistic oversimplifications. At worst, according to David Spiegelhalter, Winton Professor of the Public Understanding of Risk at Cambridge university, they can be “complete bollocks. Absolute nonsense.”
Bollocks. Nonsense. I love such clear and succinct opinions!

All of this reminds me of a short reading that I assign to students. Well, it is actually an interview with George Dyson--yes, he is Freeman Dyson's son. George Dyson offers a profound observation that "information is cheap, meaning is expensive":
We now live in a world where information is potentially unlimited. Information is cheap, but meaning is expensive. Where is the meaning? Only human beings can tell you where it is. We’re extracting meaning from our minds and our own lives.
Go figure!  With or without Big Data, that is ;)
A Measuring Worm
By Richard Wilbur

This yellow striped green
Caterpillar, climbing up
The steep window screen,

Constantly (for lack
Of a full set of legs) keeps
Humping up his back.

It’s as if he sent
By a sort of semaphore
Dark omegas meant

To warn of Last Things.
Although he doesn’t know it,
He will soon have wings,

And I, too, don’t know
Toward what undreamt condition
Inch by inch I go.

2 comments:

Ramesh said...

Completely agree. Obtuse statistical analysis can prove, after a million hours of computer crunching, that your name is indeed Sriram.

Insights, great discoveries, major advances in any discipline, don't come from data only. You cannot create genius from data.

Love bollocks and nonsense. Add Bullshit, crap and such other stuff and descriptions can be vivid and complete.

Meaning is expensive indeed. Or perhaps exclusively reserved for messiahs, even though you do not believe in them.

Sriram Khé said...

Who are you, and what did you do to the Ramesh I know who disagreed with me? ;)