{"id":31456,"date":"2021-01-29T07:11:21","date_gmt":"2021-01-29T12:11:21","guid":{"rendered":"https:\/\/centricconsulting.com\/?p=31456"},"modified":"2022-07-22T13:03:37","modified_gmt":"2022-07-22T17:03:37","slug":"small-data-an-experiment-with-the-little-data-that-has-a-powerful-impact","status":"publish","type":"post","link":"https:\/\/centricconsulting.com\/blog\/small-data-an-experiment-with-the-little-data-that-has-a-powerful-impact\/","title":{"rendered":"Small Data: An Experiment With the Little Data That Has a Powerful Impact"},"content":{"rendered":"<h2 style=\"text-align: center;\">We live in a big data world, so you may not think much of small data, but we look at just how big a story it can tell.<\/h2>\n<hr \/>\n<p class=\"intro-text\">If the world of data and analytics were a popularity contest, <a href=\"https:\/\/centricconsulting.com\/technology-solutions\/data-analytics\/big-data\/\">big data<\/a> would be the homecoming queen. Search for \u201cbig data\u201d in Google, and you\u2019ll get 142 million results in less than a second, as well as probably more than 250 articles from the past 24 hours alone. Dozens of new <a href=\"https:\/\/centricconsulting.com\/technology-solutions\/\">technologies<\/a> will help you process your big data, and you\u2019ll find countless conference talks centered on this single topic.<\/p>\n<p>\u00a0\u201cSmall data,\u201d on the other hand, is the nerdy kid in the corner. Try that same Google search with \u201csmall data,\u201d and you\u2019ll be met with less than 8 million results and maybe 15 articles from the past 24 hours &#8212; if it\u2019s been a busy day. Where\u2019s the love for small data? <strong>I\u2019m willing to contend that small data can make as much of an impact as its more popular friend, and I have a perfect example to support my stance.<\/strong><\/p>\n<h2>My Small Data Diversion<\/h2>\n<p>First, a disclaimer: I am a millennial by age, but certainly not by behavior or interests. 
I am not on Twitter, Instagram, TikTok, or whatever else is popular these days. I enjoy jigsaw puzzles and crochet. I do not subscribe to any streaming services other than Disney+, and I do not listen to Taylor Swift. Lastly, I absolutely cannot stand bubbly waters. My beverage of choice will always be Diet Dr. Pepper.<\/p>\n<p>Though far from my favorite drink, bubbly water\u2019s appeal, and the astounding 16 flavors a particular leading sparkling water brand comes in, led my friends and me to get creative. We did it in part to answer a couple of questions (Why was there so much variety? Which flavors do people prefer most?) and in part to beat the boredom and spend some (virtual) time together.<\/p>\n<p><strong>Enter: The 2021 Sparkling Water Tasting Tournament. To answer our questions, my friends and I decided to give each participant one can of each flavor and have everyone complete an individual bracket based on their tasting preferences.<\/strong> We seeded the brackets by the most recently available national sales numbers.<\/p>\n<p>Throughout the week, we got on Zoom calls to debate the merits of certain flavors and to watch each other\u2019s faces as we tried some particularly egregious varieties. The group designated me as the official number-cruncher, and I was initially tasked only with aggregating the results to reach a consensus.<\/p>\n<p>But I\u2019m a curious soul, and I thought this set of small data had a bigger story to tell than an aggregate opinion. To be fair, this data barely qualifies as small; it\u2019s tiny. I set up <a href=\"https:\/\/centricconsulting.com\/blog\/three-reasons-being-comfortable-with-code-helps-you-deliver-better-value\/\">SQL Server Express<\/a> on my desktop and created a single table to house each bracket&#8217;s results. The resulting table has a whopping 17 columns and 15 rows. <strong>It weighs in at an astounding 0.016 MB. 
I\u2019m certain smaller datasets exist, but this is a gold star example of something so minuscule that any big data fan would deem it inconsequential.<\/strong><\/p>\n<p>However, small data\u2019s benefit is that its size allows for higher standards of cleanliness and for more time to slice, dice and love the data to see what it is telling you. In our world, time and money are always constraints. And whenever those are constraints, small data will have value. The time and money required to elucidate impactful information from small data are also small. Better to spend a bit of time and money finding the right questions to ask before investing in more significant endeavors. For example, a business could host a small focus group and analyze the results before launching into a larger A\/B marketing test or some similar, more expensive campaign.<\/p>\n<h2>What Did I Learn From This Small Set of Carbonated Data?<\/h2>\n<p>Initially, I had to determine a framework by which to quantify the results. None of the participants created finalized \u201cstandings,\u201d so I didn\u2019t have any ranked output. Instead, each round had winners and losers. To avoid introducing bias to the results, this left me with some ties. Each bracket had a clear winner and a clear second place. It then had two flavors that didn\u2019t advance past the final four, so I labeled those as sharing third place.<\/p>\n<p>Four flavors didn\u2019t advance past the elite eight, and I labeled those as sharing fifth place. The remaining eight flavors that lost in the very first round all tied for ninth place. Therefore, in all calculations, numbers closer to one were more desirable, and numbers closer to nine were less desirable (Lower is better. Higher is worse).<\/p>\n<p>Now that each flavor had a numerical result from each bracket, naturally, the first thing I wanted to know is who won. 
In this example, there were multiple ways to crown a \u201cwinner\u201d \u2013 each with its pros and cons, and each with its part of the story to tell. <strong>Since each flavor had a numerical rank, the simple first step was to average each flavor\u2019s results across all brackets and call the flavor with the lowest overall average the winner.<\/strong><\/p>\n<p>In this case, Key Lime came out on top with an average rank of 4.3. But there is another way to slice the numbers to reveal more of the details in the story. A flavor could also come out on top if the most participants chose it as their favorite. In this case, Mango and LimonCello were each selected by three participants (20 percent) as the best flavor.<\/p>\n<p>At this point, I noticed a few flavors floating to the top of the list \u2013 Key Lime, Mango and LimonCello were relatively popular. To round things out, I needed to have information about the bottom of the stack, too. Which flavors fared poorly? <strong>The worst possible average rank for any flavor would be a nine, which would happen only if no participant took the flavor out of the first round.<\/strong><\/p>\n<p>In this dataset, the two flavors that scored the worst were Pasteque with an average of 8.1 and Passionfruit with 8.3. Yikes! Even after just these three quick exercises, I could easily tell that if I had every flavor available to me (and no Diet Dr. Pepper), I would have at least some flavors I\u2019d willingly try before others \u2013 and some you\u2019d have to bribe me to sample.<\/p>\n<h2>Breaking Down My Small Data Results Even Further<\/h2>\n<p>The first set of results pointed me to another question \u2013 Key Lime only had an average of 4.3, so obviously, some participants must have ranked it relatively poorly. I would therefore expect quite a bit of variance in the data. But which flavors were most polarizing or controversial, and therefore potentially popular with specific audiences but derided by others? 
<strong>Standard deviation is a quick, easy, readily accepted, and widely understood measure for exactly this sort of question.<\/strong><\/p>\n<p>When I calculated the standard deviation for this dataset, two flavors had remarkably higher standard deviations than the rest \u2013 LimonCello and Tangerine. Interestingly, LimonCello showed up earlier in the list of three potential \u201cwinners\u201d of the tournament but then showed up among the flavors with the most varied responses.<\/p>\n<p>On the other end of the spectrum, three flavors had particularly low standard deviations \u2013 Passionfruit, Lemon, and Pasteque. The story became more evident. Pasteque and Passionfruit were both near the bottom of the average overall rankings and had low variance. As much as the top of the brackets seemed murky, the bottom became more apparent. Perhaps preferences and favorites varied, but some flavors certainly appeared more universally disliked.<\/p>\n<p>Since the data pointed toward varied tastes, the next path I wanted to explore was determining which participants seemed to have normal (or abnormal) preferences compared to the group. This digs into the rational explanation for variances in flavor performance \u2013 human taste differs. In a group of friends, this was an entertaining exercise. After all, who wouldn\u2019t want to know which of their group had the most eccentric taste?<\/p>\n<p>In this case, the data told me something potentially even more noteworthy. <strong>For each participant, I calculated the difference between their rank for each flavor and the group average for that flavor.<\/strong> I then summed the absolute values of all the differentials to come up with what I fondly titled a \u201cweirdness score.\u201d The average weirdness score of all participants was 36, and it turns out that all weirdness scores fell between 30 and 39, except two. 
Those two scores were 42.1 and 54.5, and they belonged to the only two children who completed brackets. I\u2019m hesitantly proud to say that the weirdest tastes of all belonged to my daughter, age four.<\/p>\n<p>Now, a weirdness score sounds silly, but in this dataset, it tells us something important that we otherwise would have needed data science techniques, like clustering, to find out. Children have different tastes than adults. No one is shocked by that, but in less than 10 minutes of simple averaging, addition and subtraction, I discovered these results without specifically looking for them. <strong>Even if I didn\u2019t know it already, the data would\u2019ve led me to realize that part of the story.<\/strong><\/p>\n<p>Now that I was calculating differences, it was easy to ask a few related questions using the same basic math on different groupings of participants and results. For example, calculating differentials between each pair of participants (instead of between each participant and the average) shows which participants had similar tastes. On a small scale, this yielded some very entertaining results. At a larger scale, companies could use similar information to pair customers or even suggest products based on what similar customers also enjoyed.<\/p>\n<h2>Comparing the Results: Small Data With a Big Story<\/h2>\n<p>A final way I wanted to use the idea of differentials was by comparing the flavors themselves. Regardless of individual participants, did liking (or disliking) any one flavor correlate with liking or disliking any other flavor? <strong>Using this tiny data set, I leveraged the results into a baby version of predictive analytics. It turns out that in this small data, there were some remarkably strong relationships.<\/strong><\/p>\n<p>For example, anyone who liked Peach-Pear was much more likely also to enjoy LimonCello. If a participant took Peach-Pear to the final four, the average rank of LimonCello was 2.3. 
However, if Peach-Pear fell outside of the final four, the average for LimonCello dropped to 7.1. Perhaps this isn\u2019t shocking since both flavors are similarly sweet and pretty pronounced. Surprisingly, an even stronger like-like correlation existed between Coconut and Apricot \u2013 not a typical pairing based on flavor similarity. If Coconut made it to the final four, the average rank of Apricot was 2.0 \u2013 a solid second place. But if Coconut was outside of the final four, the average for Apricot fell to 7.9.<\/p>\n<p>Both of these are examples of like-like correlations, but I also searched for like-hate correlations. For example, those participants who took Razz-Cranberry to the final four soundly hated Tangerine \u2013 it had an average rank of 9.0, meaning it never made it out of the first round. But if participants didn\u2019t put Razz-Cranberry in their final four, the average for Tangerine rose to 4.8. If you like Razz-Cranberry, I might suggest you stay away from Tangerine.<\/p>\n<p><strong>I can also compare my results with the data set that generated the original seeding to see which flavors taste \u201cbetter\u201d (or worse) than their sales data would seem to predict.<\/strong> In this group, the biggest over-achievers were:<\/p>\n<ul>\n<li><strong>Peach-Pear<\/strong> &#8212; Seeded dead last but came in ninth place overall<\/li>\n<li><strong>Mango<\/strong> &#8212; Seeded 10th but came in second place<\/li>\n<li><strong>Lemon<\/strong> &#8212; Seeded 11th but came in third place.<\/li>\n<\/ul>\n<p>And the most significant relative losers were:<\/p>\n<ul>\n<li><strong>Pamplemousse<\/strong> &#8212; Seeded first but came in a disappointing eighth place<\/li>\n<li><strong>Pasteque<\/strong> &#8212; Seeded sixth but came in 15th place<\/li>\n<li><strong>Passionfruit<\/strong> &#8212; Seeded seventh but came in dead last.<\/li>\n<\/ul>\n<p>Discrepancies like this in any small data set shed a different light on the story. 
They lead to more questions to ask and, potentially, to explore further with different audiences. <strong>By following these leads, a business might discover that it is spending too much money marketing a flavor that wouldn\u2019t otherwise perform well, and that its efforts could better support a different product.<\/strong><\/p>\n<h2>Conclusion<\/h2>\n<p>In the end, with very low investment (sixteen 8-packs of sparkling water and less than three hours of data analysis), I\u2019ve landed on a few significant insights, a lot of entertainment for participants, several potential pathways for further exploration, and a better understanding of the stories this data was trying to tell. I also learned that I would still rather drink Diet Dr. Pepper, but if the world runs out of soda, I\u2019ll look for a can of LimonCello.<\/p>\n<p>I\u2019m an engineer by training and a technical consultant by trade \u2013 but I\u2019m a storyteller at heart. <strong>Data isn\u2019t useful as data \u2013 it only becomes useful when you take that data, turn it into information that becomes knowledge, and apply that knowledge to your situation until it becomes wisdom.<\/strong> Often, that requires your data to tell a story to unite a group of people around a shared understanding. And small data frequently has big stories to tell.<\/p>\n<p>Sometimes the outcome of that story may be as simple as shared laughs among a group of friends who cannot spend as much time together as they\u2019d like. But that outcome could also provide a deeper understanding of which beverages perform better with different subsets of the population, which might lead to a new marketing push and increased sales. Don\u2019t be tempted to take your small data for granted. 
The homecoming queen is popular for a reason, but the nerdy kid in the corner might be your key to success.<\/p>\n<p>As my favorite author, JRR Tolkien, <strong>almost<\/strong> wrote: \u201cEven the smallest [data] can change the course of the future.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We live in a big data world, so you may not think much of small data, but we look at just how big a story small data can tell.<\/p>\n","protected":false},"author":305,"featured_media":31457,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_oasis_is_in_workflow":0,"_oasis_original":0,"_oasis_task_priority":"","_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":"","footnotes":""},"categories":[1],"tags":[18616,16611],"coauthors":[21028],"acf":[],"publishpress_future_action":{"enabled":false,"date":"2024-07-21 
22:01:03","action":"change-status","newStatus":"draft","terms":[],"taxonomy":"category"},"_links":{"self":[{"href":"https:\/\/centricconsulting.com\/wp-json\/wp\/v2\/posts\/31456"}],"collection":[{"href":"https:\/\/centricconsulting.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/centricconsulting.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/centricconsulting.com\/wp-json\/wp\/v2\/users\/305"}],"replies":[{"embeddable":true,"href":"https:\/\/centricconsulting.com\/wp-json\/wp\/v2\/comments?post=31456"}],"version-history":[{"count":0,"href":"https:\/\/centricconsulting.com\/wp-json\/wp\/v2\/posts\/31456\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/centricconsulting.com\/wp-json\/wp\/v2\/media\/31457"}],"wp:attachment":[{"href":"https:\/\/centricconsulting.com\/wp-json\/wp\/v2\/media?parent=31456"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/centricconsulting.com\/wp-json\/wp\/v2\/categories?post=31456"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/centricconsulting.com\/wp-json\/wp\/v2\/tags?post=31456"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/centricconsulting.com\/wp-json\/wp\/v2\/coauthors?post=31456"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}