What are the benefits?
During data exploration, visualizations help to quickly form a view of the relationships between key variables, and to screen out variables that appear to have no impact on the variables we are interested in. This process usually generates insights that are informative, sometimes counter-intuitive or at odds with established views, and immediately useful to the business.
Robust statistical analysis allows us to validate or invalidate, from a statistical perspective, the hypotheses formulated during the exploration stage, and ultimately to support important business decisions with hard facts.
Examples of business applications
Statistical methods enable organizations to confidently make decisions that are grounded in science.
Real-life examples of applications include:
- Drug efficacy
- Staff performance
- A/B testing
- Sales performance
- Seasonal effects
- Product comparison
- Customer engagement
- Marketing campaigns impact
- Client segmentation
- Company performance
- Product defects
How to get started?
All that is required to start a data exploration project is (i) a statement of an overall concern (e.g. "What are the key factors affecting my sales results?") and (ii) a dataset comprehensive enough to cover as many potential variables as possible (e.g. date, sales, shop location, shop area, staff on duty, weather, promotions, marketing efforts, product arrangements, etc.).
Not all relevant data may be readily available at the start of the project. In such cases, the first step will be to gather the missing data, based on a mutually agreed list, over the course of a few weeks or months. This may require training key company employees and implementing data-gathering tools.
Once sufficient data is available, we will explore it and present our observations in a series of reports containing helpful visualizations. This process is iterative, as new insights or conclusions will likely raise fresh questions about other possible relationships between variables.
Once all interesting patterns and relationships have been identified through exploration, we will formulate hypotheses for statistical confirmation or invalidation. This step is critical, as some of the observations made in the first stage may be due to pure chance. Rigorous statistical methods allow us to make statements at a stated level of confidence, e.g. "At the 95% confidence level, sales per m² in location 1 are higher than sales per m² in location 2."
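No specific test is prescribed here; one simple option for the location example is a one-sided permutation test on the difference in mean sales per m². The sketch below assumes hypothetical weekly figures for two locations and a hypothetical `permutation_test` helper, and uses only the standard library (Welch's t-test would be a common alternative). A p-value below 0.05 indicates that a difference this large would be unlikely to arise by chance alone if both locations performed equally, which is what supports a statement at the 95% confidence level.

```python
import random
from statistics import mean

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """One-sided permutation test.

    H0: groups a and b come from the same distribution.
    H1: mean(a) > mean(b).
    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are exchangeable, so reassign them at random.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical weekly sales per m2 for two shop locations.
location_1 = [41.2, 39.8, 44.5, 42.1, 40.7, 43.9, 45.0, 41.8]
location_2 = [37.5, 38.9, 36.2, 40.1, 37.8, 39.4, 36.9, 38.3]

diff, p = permutation_test(location_1, location_2)
print(f"difference in means: {diff:.2f}, p-value: {p:.4f}")
if p < 0.05:
    print("Location 1 outperforms location 2 at the 95% confidence level.")
else:
    print("No significant difference at the 95% confidence level.")
```

Fixing the seed makes the result reproducible; increasing `n_permutations` reduces the random error in the estimated p-value.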