All Posts by Sayantam Dey

Sayantam Dey

Senior Director of Engineering

Sayantam Dey is the Senior Director of Engineering at 3Pillar Global, working out of our office in Noida, India. He has been with 3Pillar for ten years, delivering enterprise products and building frameworks for accelerated software development and testing in various technologies. His current areas of interest are data analytics, messaging systems and cloud services. He has authored the ‘Spring Integration AWS’ open source project and contributes to other open source projects such as SocialAuth and SocialAuth Android.

How Can Bots Understand Natural Language?

I picked up on a conversation between Alice and Bob that went like this: Alice: “The quick brown fox jumped over the lazy dog.” Bob: “Ok.” Alice: “Who jumped over the dog?” Bob: “The fox.” Alice: “What was the color of the fox?” Bob: “Brown.” Alice: “What was the dog like?” Bob: “Lazy.” This is… Read more »

Understanding Data in Data Science – Statistical Inference

In my previous posts in this series, I wrote on the analysis of single variables and multiple variables. The measures described in the previous posts apply to the entire known population, as well as random samples of the population. However, an additional challenge is introduced with random sampling – how do we tell if the sample mean, median,… Read more »

Understanding Data in Data Science – Multiple Variable Summaries

In my previous post on understanding data for analysis, I described the common approaches for the analysis of single variables. In this post, I’ll summarize the common approaches for analyzing the relationships between multiple variables in your data. Why is an analysis of the relationships important? Let’s start with a paradox. Simpson’s Paradox An intriguing effect is… Read more »

Using the ELK Stack for Data Analysis

ELK is a popular abbreviation of the Elasticsearch, Logstash, and Kibana stack. This is an end-to-end stack that handles everything from data aggregation to data visualization. On a recent project, I needed a database with a schema-less data model for aggregated queries and fast searching. I filtered my options to two choices – Elasticsearch and Solr… Read more »

Using Grid Heat Maps for Data Visualization

Heat maps represent values in a matrix as colors. Traditionally, heat maps have been used to indicate the level of activity in different systems. For example, a load test result can represent requests to different parts of the application as a heat map. The heat map appears as a mass of colors chosen from a… Read more »

Understanding the Data in Data Science

The most time-consuming aspect of any data science project is the transformation of data to a format that an analyst can use to build models. This is more critical for parametric models, which assume known distributions in the data. However, even before you begin to transform the data, you need to understand it. What does… Read more »

Real Time Analytics & Visualization with Apache Spark

I’m sure you’ve heard that fast data is the new black. If you haven’t, here’s the memo – big data processing is moving from a ‘store and process’ model to a ‘stream and compute’ model for data that has time-bound value. For example, consider a logging subsystem that records all kinds of informational, warning, and error… Read more »

Topic Clusters with TF-IDF Vectorization using Apache Spark

In my previous blog about building an Information Palace that clusters information automatically into different nodes, I wrote about using Apache Spark for creating the clusters from the collected information. This post outlines the core clustering approach that uses the new DataFrame API in Apache Spark and uses a zoomable circle packing to represent the clusters… Read more »

Our Secret Sauce with Intent Analysis

Interpreting the intent behind spoken or written language is a very complex process that humans learn through experience in different social settings. This is one reason why literature is studied in universities around the world and academics earn doctoral degrees distilling the various nuances of the works of great literary figures. 3Pillar recently teamed up… Read more »

Building a Software Information Palace

On the television show The Mentalist, the protagonist, Patrick Jane, often describes a “Memory Palace,” which he purportedly uses to store vast amounts of information that he can retrieve at will. To create this palace, he advises choosing a large, real, physical location with which you are intimately familiar. Once you have such a… Read more »