Instacart Analysis
Instacart is an online grocery store that operates through an app. They already had very good sales, but wanted to uncover more about their sales patterns. My task was to perform an initial data and exploratory analysis of some of their data to show them how to improve their sales. This task took me just over a week to complete.
Data
- Open source data sets downloaded directly from Instacart.
- All data sets include a common identifier that they can be merged on.
Skills
- Python
- Data wrangling
- Data merging
- Deriving variables
- Grouping data
- Aggregating data
- Reporting in Excel
- Population flows
Project/tools
- Python
- Jupyter Notebooks
- Visual Studio Code
Consistency checks, data wrangling & merging
I was given four dataframes to work with: ‘orders’, ‘products’, ‘departments’ and ‘customers’. I then went through all the usual steps to get these dataframes ready to answer some questions. First I checked the consistency of the data, which meant finding and fixing mixed-type columns, missing values and duplicate rows.
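A minimal sketch of what those three consistency checks can look like in pandas. The toy dataframe, its column names and its values are invented for illustration and are not taken from the Instacart files.

```python
import pandas as pd

# Toy stand-in for one of the dataframes; columns and values are invented.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "user_id": [10, 11, 11, "12", 13],            # int and str mixed on purpose
    "days_since_prior_order": [None, 7.0, 7.0, 3.0, 30.0],
})

# 1. Mixed-type columns: more than one Python type among the non-null values.
mixed_cols = [
    col for col in df.columns
    if df[col].dropna().map(type).nunique() > 1
]

# One possible fix: cast the offending column to a single type.
df["user_id"] = df["user_id"].astype(str)

# 2. Missing values: count the nulls in each column.
missing_counts = df.isnull().sum()

# 3. Duplicates: count fully identical rows, then keep the first of each.
duplicate_count = int(df.duplicated().sum())
df_clean = df.drop_duplicates().reset_index(drop=True)

print(mixed_cols)
print(missing_counts)
print(duplicate_count, len(df_clean))
```

Whether a missing value should be dropped, imputed or left alone depends on what it means in the data, so the sketch only counts them.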
With that out of the way, I put on my cowboy hat, strapped on my boots and got my wrangling on. Here I dropped a couple of unnecessary columns, renamed a few and changed a couple of variables’ data types.
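For anyone curious what that wrangling looks like in code, here is a small sketch. The column names (`eval_set`, `order_dow`) and the decision about which ones to drop or rename are assumptions made for the example, not a record of the exact changes in the project.

```python
import pandas as pd

# Toy version of an orders table; column names are assumed for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "user_id": [10, 11, 12],
    "eval_set": ["prior", "prior", "train"],   # assumed to be unnecessary
    "order_dow": [0, 3, 6],
})

# Drop a column that is not needed for the analysis.
orders = orders.drop(columns=["eval_set"])

# Rename a cryptic column to something readable.
orders = orders.rename(columns={"order_dow": "order_day_of_week"})

# Change a data type: IDs are labels, not numbers to do maths on.
orders["order_id"] = orders["order_id"].astype(str)

print(orders.columns.tolist())
```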
Finally, I merged the four dataframes into one big, glorious dataframe, ready to tell me what I needed to know.
Exploratory analysis & visualizations
The stakeholders and sales team from Instacart had some questions that needed answering. They wanted to find out more about their customers’ purchasing behaviors, since they couldn’t target everyone the same way. They needed to create different marketing strategies for different groups of people and so they came to me for answers.
Questions that needed answering
The sales team needed to know what the busiest days of the week and hours of the day were in order to schedule ads at times when there were fewer orders. The code to get those answers was pretty straightforward.
Days of the week
0 – Saturday
1 – Sunday
2 – Monday
3 – Tuesday
4 – Wednesday
5 – Thursday
6 – Friday
Hours of the day
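As a sketch of how straightforward this can be, counting orders per day and per hour comes down to `value_counts()`. The column names and the toy rows are assumptions; the day mapping is the one listed above.

```python
import pandas as pd

# Toy order data; column names and values are invented for illustration.
orders = pd.DataFrame({
    "order_day_of_week": [0, 0, 1, 0, 6, 1],
    "order_hour_of_day": [10, 10, 14, 11, 10, 9],
})

# Day numbering as listed above (0 = Saturday).
day_names = {
    0: "Saturday", 1: "Sunday", 2: "Monday", 3: "Tuesday",
    4: "Wednesday", 5: "Thursday", 6: "Friday",
}

# Orders per day, labelled with day names, busiest first.
orders_per_day = orders["order_day_of_week"].map(day_names).value_counts()

# Orders per hour, sorted by hour so it plots as a timeline.
orders_per_hour = orders["order_hour_of_day"].value_counts().sort_index()

busiest_day = orders_per_day.idxmax()
busiest_hour = int(orders_per_hour.idxmax())

print(busiest_day, busiest_hour)
```

Both series can go straight into a bar chart with `.plot.bar()` if matplotlib is installed.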
Who were the top buyers?
The older people got, the more spending power they had.
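One way to look at a question like this is to bin customers into age groups and aggregate their spending. The sketch below is illustrative only: the age cut-offs, group labels, `prices` column and values are all hypothetical, and it does not reproduce the project's figures.

```python
import pandas as pd

# Toy merged data; one row per purchased item. All values are invented.
merged = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "age": [22, 22, 45, 70, 70, 38],
    "prices": [5.0, 7.0, 10.0, 12.0, 8.0, 6.0],
})

# Derive an age-group variable; the bin edges are hypothetical.
merged["age_group"] = pd.cut(
    merged["age"],
    bins=[0, 39, 64, 120],
    labels=["young adult", "middle-aged", "senior"],
)

# Aggregate spending per age group.
spend_by_age = (
    merged.groupby("age_group", observed=True)["prices"]
    .agg(["sum", "mean", "count"])
)

print(spend_by_age)
```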
And what products were the top sellers?
Produce, followed by dairy/eggs and snacks.
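A ranking like this can come from counting rows per department in the merged dataframe. A minimal sketch, assuming a `department` column with one row per item ordered; the toy rows are invented.

```python
import pandas as pd

# Toy merged data: one row per item ordered. Values are invented.
merged = pd.DataFrame({
    "department": [
        "produce", "dairy eggs", "produce",
        "snacks", "produce", "dairy eggs",
    ],
})

# Count items per department, most frequent first, and keep the top three.
dept_counts = merged["department"].value_counts()
top_departments = dept_counts.head(3)

print(top_departments)
```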
I proposed the following:
Marketing
- Run targeted ads during peak hours when most people are shopping, thus ensuring most visibility.
- To attract more customers for the off-peak times, consider running ads that offer discounts and incentives for those periods.
Products
- Produce is the best-selling department among all ages and family groups. Expand the department even further: offer more variety and more options, and increase advertising.
Customers
- Offer discounts and incentives to the people in the ‘lower income’ bracket, as they spend almost no money compared to the rest. Vouchers sent directly to their homes will give them a reason to shop.
- Target older customers with products specifically made for them, as they have the most spending power.
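Customer groups such as the ‘lower income’ bracket are derived variables. A small sketch of how such a flag might be built with `.loc`, assuming an `income` column; the thresholds and labels other than ‘lower income’ are purely illustrative.

```python
import pandas as pd

# Toy customer data; incomes are invented.
customers = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "income": [28000, 65000, 120000, 49000],
})

# Derive an income bracket; the cut-offs below are hypothetical.
customers.loc[customers["income"] < 50000, "income_bracket"] = "lower income"
customers.loc[
    (customers["income"] >= 50000) & (customers["income"] < 100000),
    "income_bracket",
] = "middle income"
customers.loc[customers["income"] >= 100000, "income_bracket"] = "higher income"

print(customers["income_bracket"].value_counts())
```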
Retrospective
What went well
I really enjoyed this analysis. Working with multiple data sets totalling 32.5 million rows that needed cleaning, wrangling and merging was a great opportunity to practice and showcase my analytical skills. In the end, I managed to answer all of the stakeholders’ questions successfully.
What didn't go well
The dataframes had some constraints, which took a bit of time to overcome, but overall, I think this was a pretty straightforward analysis.
What's next
I would like to further explore and answer any additional questions Instacart might have. There is always room for improvement.
Final thoughts
Overall, I’m really happy with the entire project and how it all went. I ran into some issues, which my tutor helped me with, and that actually gave me a chance to practice troubleshooting my code.