Data analysis with the Microsoft PowerBI tool. Features:
- Individual work.
- Evaluation: content and presentation of the work (defending the work is mandatory).
- Submit at virtual.ipb.pt in the activity “Practical Work – PowerBI”.
o Submissions made any other way are not accepted.
2 files must be submitted (separately or zipped):
- PowerBI file (pbix)
- Report (pdf)
- Alternative – (Ordinary, Worker) (Final, Appeal, Special)
- Practical Work – 50%
- Final Written Exam – 50% (minimum exam score: 7 points)
Practical work – PowerBI
The dataset for this competition is a relational set of files describing customers’ orders over time. The goal of the competition is to predict which products will be in a user’s next order. The dataset is anonymized and contains a sample of over 3 million grocery orders from more than 200,000 Instacart users. For each user, we provide between 4 and 100 of their orders, with the sequence of products purchased in each order. We also provide the day of the week and hour of day the order was placed, and a relative measure of time between orders. For more information, see the blog post accompanying its public release.
- Description of files
Each entity (customer, product, order, aisle, etc.) has an associated unique id. Most of the files and variable names should be self-explanatory.
- aisles.csv
aisle_id,aisle
1,prepared soups salads
2,specialty cheeses
3,energy granola bars
- order_products__*.csv
These files specify which products were purchased in each order. order_products__prior.csv contains previous order contents for all customers. ‘reordered’ indicates that the customer has a previous order that contains the product. Note that some orders will have no reordered items. You may predict an explicit ‘None’ value for orders with no reordered items.
- orders.csv
This file indicates which set (prior, train, test) an order belongs to. You are predicting reordered items only for the test set orders. ‘order_dow’ is the day of the week (0 and 1 are Saturday and Sunday).
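Since ‘order_dow’ is stored as a number, reports by day of the week need a lookup from code to day name. The sketch below is one possible mapping, not part of the dataset: it takes 0 = Saturday and 1 = Sunday from the text and assumes the remaining codes simply continue the week (2 = Monday through 6 = Friday). The names DAY_NAMES, day_name and is_weekend are illustrative.

```python
# Taken from the text: 0 = Saturday and 1 = Sunday.
# Assumption: the remaining codes continue the week (2 = Monday ... 6 = Friday).
DAY_NAMES = [
    "Saturday", "Sunday", "Monday", "Tuesday",
    "Wednesday", "Thursday", "Friday",
]


def day_name(order_dow):
    """Translate an order_dow code (0-6) into a day name."""
    if not 0 <= order_dow <= 6:
        raise ValueError(f"order_dow must be between 0 and 6, got {order_dow}")
    return DAY_NAMES[order_dow]


def is_weekend(order_dow):
    """True for the two codes the text identifies as Saturday and Sunday."""
    return order_dow in (0, 1)


if __name__ == "__main__":
    for code in range(7):
        print(code, day_name(code), is_weekend(code))
```

The same lookup can be kept as a small seven-row table and related to the orders table on order_dow, so that charts show day names instead of codes.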
- products.csv
product_id,product_name,aisle_id,department_id
1,Chocolate Sandwich Cookies,61,19
3,Robust Golden Unsweetened Oolong Tea,94,7
1; Saturday, September 14, 1991; M;Poland
2; Thursday, November 30, 1989; M;Central African Republic
3; Friday, October 20, 1967; M;France
…
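The customer rows above appear to hold an id, a birth date written out in full, a gender code and a country, separated by semicolons. That reading is an assumption, since no header row is shown. Under it, the following sketch turns a row into typed fields and derives an age group. The reference date and the age-group boundaries are hypothetical and should be replaced by whatever the report actually uses.

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Hypothetical reference date for computing ages; not given in the assignment.
REFERENCE_DATE = date(2024, 1, 1)


def parse_customer(line):
    """Split one 'id; weekday, month day, year; gender;country' row into fields."""
    customer_id, birth_text, gender, country = [p.strip() for p in line.split(";")]
    # birth_text looks like "Saturday, September 14, 1991"; the weekday is redundant.
    _weekday, month_day, year = [p.strip() for p in birth_text.split(",")]
    month_name, day = month_day.split()
    birth = date(int(year), MONTHS.index(month_name) + 1, int(day))
    return {
        "customer_id": int(customer_id),
        "birth_date": birth,
        "gender": gender,
        "country": country,
    }


def age_on(birth, reference=REFERENCE_DATE):
    """Age in completed years on the reference date."""
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


def age_group(age):
    """Hypothetical age bands; adjust to the bands wanted in the report."""
    if age < 18:
        return "<18"
    if age < 35:
        return "18-34"
    if age < 55:
        return "35-54"
    return "55+"


if __name__ == "__main__":
    sample = [
        "1; Saturday, September 14, 1991; M;Poland",
        "2; Thursday, November 30, 1989; M;Central African Republic",
        "3; Friday, October 20, 1967; M;France",
    ]
    for row in sample:
        customer = parse_customer(row)
        age = age_on(customer["birth_date"])
        print(customer["customer_id"], customer["country"], age, age_group(age))
```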
Understanding of the dataset structure:
- Users are identified by user_id in the orders csv file. Each row of the orders csv file represents an order made by a user. Orders are identified by order_id;
- Each order of a user is characterized by an order_number, which specifies when it was made relative to the same user’s other orders;
- Each order consists of a set of products, each characterized by an add_to_cart_order feature giving the position in which it was added to the cart in that order;
- For each user we may have n-1 prior orders and 1 train order, or n-1 prior orders and 1 test order, for which we have to state which products have been reordered.
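The structure described above can be checked on a few rows before building the data model. The sketch below uses a handful of made-up rows rather than the real files. The column name eval_set for the prior/train/test label is an assumption, as are the helper names group_by_user, has_expected_structure and cart_sequence.

```python
from collections import defaultdict

# Made-up sample rows; eval_set is an assumed column name for the set label.
orders = [
    {"order_id": 10, "user_id": 1, "eval_set": "prior", "order_number": 1},
    {"order_id": 11, "user_id": 1, "eval_set": "prior", "order_number": 2},
    {"order_id": 12, "user_id": 1, "eval_set": "train", "order_number": 3},
    {"order_id": 20, "user_id": 2, "eval_set": "prior", "order_number": 1},
    {"order_id": 21, "user_id": 2, "eval_set": "test", "order_number": 2},
]
order_products = [
    {"order_id": 10, "product_id": 3, "add_to_cart_order": 2, "reordered": 0},
    {"order_id": 10, "product_id": 1, "add_to_cart_order": 1, "reordered": 0},
    {"order_id": 11, "product_id": 1, "add_to_cart_order": 1, "reordered": 1},
]


def group_by_user(order_rows):
    """Map each user_id to that user's orders, sorted by order_number."""
    grouped = defaultdict(list)
    for row in order_rows:
        grouped[row["user_id"]].append(row)
    return {
        user_id: sorted(rows, key=lambda r: r["order_number"])
        for user_id, rows in grouped.items()
    }


def has_expected_structure(user_orders):
    """Check for n-1 'prior' orders followed by one 'train' or 'test' order."""
    ordered = sorted(user_orders, key=lambda r: r["order_number"])
    *earlier, last = ordered
    return (
        all(r["eval_set"] == "prior" for r in earlier)
        and last["eval_set"] in ("train", "test")
    )


def cart_sequence(product_rows, order_id):
    """Product ids of one order, in the sequence they were added to the cart."""
    lines = [r for r in product_rows if r["order_id"] == order_id]
    return [r["product_id"] for r in sorted(lines, key=lambda r: r["add_to_cart_order"])]


if __name__ == "__main__":
    for user_id, user_orders in group_by_user(orders).items():
        print(user_id, [o["order_id"] for o in user_orders], has_expected_structure(user_orders))
    print(cart_sequence(order_products, 10))
```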
Purpose of the work
Those responsible for the site want the information system to produce reports that support better decision making. They therefore want answers to the following questions:
- Characterization of customers by age group, gender and nationality.
- On which day of the week (and at what time) do women do the most shopping?
- What are the most important departments and aisles (by number of products)?
- What are the most important aisles in each department (by number of products)?
- What are customers’ favorite departments and aisles (by age, gender and country)?
- When do customers do the most shopping: day of the week, time of day, department and aisle?
- On which day of the week do we get the most new customers (by age, gender and country)?
- How many products (on average) do customers purchase in their first order?
- How often do customers shop (total days between purchases)? Do underage customers shop more often?
- How many products do customers buy in each order (by day of the week and time of day)?
- How often do customers buy the same items again (reorders), i.e. what % of the products in an order were previously ordered?
- Which products sell best in the morning (0-12) and in the afternoon (12-24)? Which type of customer does the most shopping in each of the two periods?
- The 10 best-selling products for a given day of the week, department and aisle.
- Which products (Top 12) are most likely to be purchased again?
- Which products (Top 10) do customers put in the cart first (by age, gender and country)?
- Is there any association between the date of the last order and the probability of a new order?
- Is there any association between the number of orders and the probability of a reorder?
- What is the percentage of organic vs. non-organic orders (by age, gender and country)?
- Which customers always buy the same products (reorders)?
- For a given product, identify the customers most likely to buy it.
- Identify behavior patterns (data mining).
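As an illustration of how one of these questions maps onto the data, the reorder question (the % of products in an order that were previously ordered) comes down to averaging the ‘reordered’ flag per order. The sketch below does this on made-up rows and can serve as a cross-check for a Power BI measure. The function names and sample values are hypothetical.

```python
from collections import defaultdict

# Made-up rows in the shape (order_id, product_id, reordered).
sample_rows = [
    (10, 1, 0),
    (10, 3, 0),
    (11, 1, 1),
    (11, 3, 1),
    (11, 7, 0),
    (12, 1, 1),
]


def reorder_share_per_order(rows):
    """Fraction of products in each order that carry reordered = 1."""
    totals = defaultdict(int)
    reordered = defaultdict(int)
    for order_id, _product_id, flag in rows:
        totals[order_id] += 1
        reordered[order_id] += flag
    return {order_id: reordered[order_id] / totals[order_id] for order_id in totals}


def overall_reorder_share(rows):
    """Fraction of all purchased products that were reorders."""
    if not rows:
        return 0.0
    return sum(flag for _order_id, _product_id, flag in rows) / len(rows)


if __name__ == "__main__":
    for order_id, share in sorted(reorder_share_per_order(sample_rows).items()):
        print(f"order {order_id}: {share:.0%} reordered")
    print(f"overall: {overall_reorder_share(sample_rows):.0%}")
```

The per-order share and the overall share answer slightly different questions: the first weights every order equally once averaged, the second weights every purchased product equally. It is worth stating in the report which one is shown.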