Charles Darwin's information seeking strategy

This week's find is a peer-reviewed paper by Murdock, Allen, and DeDeo from Indiana University, published in the journal Cognition. It is a fitting start to this series because it is about information foraging strategy: should the next book I read deepen the knowledge I already have about a subject, or should it teach me something new at a shallower level? I ask myself this question constantly when thinking about my life and career objectives, and I am sure a lot of other people do too. Information foraging is also a research field in information science, with applications such as recommender engines.

The article in Cognition is fun to read because the authors analyze Charles Darwin's reading habits with text analytics (Latent Dirichlet Allocation, a topic model) and then use the Kullback-Leibler divergence as a measure of "surprise": how different a new book is from the previous one, and from the whole body of books read before it. From July 1837 to May 1860, Darwin recorded the date on which he read each non-fiction book and wrote notes about it. He read 687 books in that time, roughly one every 12 days. The authors find that early on his strategy was exploitation (reinforcing knowledge he already had) and that he later shifted toward exploration. His exploration was also well above the baseline set by society's own increase in knowledge (in principle, each newly published book should add at least a few things that are new to the body of human knowledge).
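To make the "surprise" measure a little more concrete, here is a minimal Python sketch of the Kullback-Leibler divergence between topic distributions. It assumes each book has already been reduced to a probability distribution over the same set of LDA topics (the LDA step itself is omitted), and the function names and toy numbers are hypothetical, not taken from the paper. It computes two kinds of surprise that mirror the two comparisons above: against the previous book, and against everything read before. Using a plain average of past distributions as the "everything read before" reference is an assumption made here for simplicity, not necessarily how the authors aggregate past reading.

```python
import math
from typing import List, Sequence, Tuple


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence D(p || q) in bits between two topic distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # by convention 0 * log(0 / q) contributes nothing
        if qi == 0.0:
            return math.inf  # p puts weight on a topic q never covers
        total += pi * math.log2(pi / qi)
    return total


def mean_distribution(dists: Sequence[Sequence[float]]) -> List[float]:
    """Average several topic distributions (assumption: equal weight per book)."""
    n = len(dists)
    return [sum(d[j] for d in dists) / n for j in range(len(dists[0]))]


def surprise_sequence(books: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """For every book after the first, return (local, global) surprise:
    local  = divergence from the immediately preceding book,
    global = divergence from the average of all earlier books."""
    results = []
    for i in range(1, len(books)):
        local = kl_divergence(books[i], books[i - 1])
        past = mean_distribution(books[:i])
        results.append((local, kl_divergence(books[i], past)))
    return results


# Hypothetical reading log: three books described by three topics each.
reading_log = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],  # close to the first book: exploitation
    [0.1, 0.1, 0.8],  # mostly a new topic: exploration
]

for local, global_ in surprise_sequence(reading_log):
    print(f"local surprise {local:.3f} bits, global surprise {global_:.3f} bits")
```

In this toy log the third book scores far higher than the second on both measures, which is the kind of signal that distinguishes an exploratory reading choice from an exploitative one. Note that KL divergence is asymmetric, so the direction D(new || reference) used here is itself a modeling choice.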

About two years ago I invented an algorithm that does functionally the same thing as the method the authors published, although my measure of surprise was a different but related quantity (also an entropy), and I applied it to the analysis of companies rather than people. I did think about trying an analysis like this one, but I did not have the appropriate data. I was excited to read this paper, and I look forward to the US Patent and Trademark Office publishing my patent application so that I can talk about it openly.

The Cognition article is here:
A non-technical summary is here:

