Description: Assignment 2 Written Practical Report
Possible marks and weighting: 100 marks, 30%
Word count: 3000
Due date: 25/04/16
The key frameworks and concepts covered in modules 1–5 are particularly relevant for this assignment. Assignment 2 relates to the specific course learning objectives 1, 2 and 4 and associated MBA program learning goals and skills: Global Content, Problem solving, Critical thinking, and Written Communication at level 3:
1. demonstrate applied knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehouse design, data mining process, data visualisation and performance management) and resulting organisational change and how these apply to implementation of business intelligence in organisation systems and business processes
2. identify and solve complex organisational problems creatively and practically through the use of business intelligence and critically reflect on how evidence based decision making and sustainable business performance management can effectively address real world problems
4. demonstrate the ability to communicate effectively in a clear and concise manner in written report style for senior management with correct and appropriate acknowledgment of main ideas presented and discussed.
Note that you must use RapidMiner Studio for Task 1 and Tableau Desktop for Task 3 in this Assignment 2. Failure to do so may result in Task 1, Task 3 or both not being marked and being awarded zero marks.
Note carefully University policy on Academic Misconduct such as plagiarism, collusion and cheating. If any of these occur they will be found and dealt with by the USQ Academic Integrity Procedures. If proven, Academic Misconduct may result in failure of an individual assessment or the entire course, or in exclusion from a University program or programs.
Assignment 2 consists of three main tasks and a number of sub tasks
Task 1 Exploratory Data Analysis and Decision Tree Analysis (Worth 30 Marks)
a) Assignment 2 requires that you research and critically evaluate the literature surrounding the problem of effectively assessing loan applications for credit worthiness. Credit worthiness assessment reduces the risks associated with lending by determining which potential loan applications are considered to be a good, or alternatively a poor, credit risk and should on that basis be approved or rejected. Good risk management of loan applications can significantly improve the bottom line of financial institutions such as banks, building societies and credit unions. This research will inform your assessment and identification of the key variables in the credit data set which is provided for Assignment 2. Note that you should also refer to the data dictionary provided in Appendix A of this document and with the creditdata.csv file, as it defines each of the variables and its range of values. (About 250 words).
b) Using the RapidMiner Studio data mining tool, conduct an exploratory analysis of the creditdata.csv data set, which is provided in the Assignment 2 folder on the CIS8008 course study desk, to identify what you consider to be the top five key variables that contribute to determining whether a potential loan applicant is a good or a bad credit risk. Note that you should also refer to the data dictionary provided in Appendix A of this document and with the creditdata.csv file, as it defines each of the variables and its range of values.
Then, using the RapidMiner Studio data mining tool, build a simple predictive model of credit risk with a Decision Tree on a reduced creditdata.csv data set.
Discuss each of your top five variables in about 50 words in terms of the results of your exploratory data analysis. Then discuss the results of your decision tree analysis, drawing on the key outputs from RapidMiner Studio and on the relevant supporting literature on credit assessment and on the interpretation of decision trees. Your discussion should include appropriate statistical analysis results, such as graphs and results tables from your exploratory data analysis in RapidMiner, with some supporting references on building and interpreting predictive models using Decision Trees in data mining (about 250 words).
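As background for interpreting the decision tree, the sketch below shows how information gain, one common split criterion for decision trees, can rank categorical variables against credit_rating. It is illustrative only: the assignment itself must be completed in RapidMiner Studio, the function names are hypothetical, and the four rows are invented toy data that use codes from the Appendix A data dictionary rather than records from creditdata.csv.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="credit_rating"):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the subsets produced by the split
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Invented toy rows (assumption): NOT taken from creditdata.csv
rows = [
    {"Checking": "A", "housing": "A", "credit_rating": "Bad"},
    {"Checking": "A", "housing": "C", "credit_rating": "Bad"},
    {"Checking": "D", "housing": "A", "credit_rating": "Good"},
    {"Checking": "D", "housing": "C", "credit_rating": "Good"},
]

# Rank candidate variables from most to least informative
ranking = sorted(
    ["Checking", "housing"],
    key=lambda name: information_gain(rows, name),
    reverse=True,
)
```

In this toy data Checking separates Good from Bad perfectly while housing does not separate them at all, so Checking is ranked first. A variable that a decision tree places near the root is one that scores well on the chosen split criterion.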
Task 2 Data Warehousing and Big Data (Worth 35 Marks)
A data warehouse is the foundation of any Business Intelligence or Business Analytics initiative. Consider the following scenario: a large local government consists of seven departments, with many different data sets residing in each department. They want high level advice on the logical design of a data warehouse that would incorporate big data analytics.
(a) Discuss the possible approaches that could be used for designing a data warehouse architecture using Kimball’s or Inmon’s methodology and provide a high level logical design of a data warehouse architecture. (750 words)
(b) Discuss how your high level data warehouse architecture design in part (a) could incorporate the capture, processing, storage and presentation of big data. Your answer should focus on explaining a revised high level diagrammatic representation of the logical design of your data warehouse, including how big data analytics would be incorporated or integrated into that design.
Note that the coverage of these concepts in textbook Chapter 2 Data Warehousing is somewhat limited and dated and may not reflect current thinking in such a fast moving field. Hence you will need to research and critically review the current literature on data warehouses, data warehouse design architectures and data warehouse architecture design methodologies in more detail. You will also need to consider how big data is being incorporated or integrated into data warehouse initiatives in order to provide a comprehensive and informed answer to these sub questions for Task 2.
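To make the idea of a logical design concrete, the sketch below builds a tiny star schema of the kind usually associated with Kimball's dimensional approach: one fact table joined to dimension tables. It is a minimal illustration under stated assumptions, not a recommended design. The table names, column names, department names and figures are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# In-memory database so the example leaves nothing behind
conn = sqlite3.connect(":memory:")

# Hypothetical star schema: two dimensions and one fact table
conn.executescript("""
CREATE TABLE dim_department (
    department_key INTEGER PRIMARY KEY,
    department_name TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year INTEGER,
    month INTEGER
);
CREATE TABLE fact_service_request (
    department_key INTEGER REFERENCES dim_department(department_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    request_count INTEGER,
    cost REAL
);
""")

# Invented sample rows (assumption)
conn.executemany("INSERT INTO dim_department VALUES (?, ?)",
                 [(1, "Roads"), (2, "Waste")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20160101, 2016, 1), (20160201, 2016, 2)])
conn.executemany("INSERT INTO fact_service_request VALUES (?, ?, ?, ?)",
                 [(1, 20160101, 10, 500.0),
                  (1, 20160201, 5, 250.0),
                  (2, 20160101, 7, 140.0)])

# A typical analytical query: aggregate facts by a dimension attribute
totals = conn.execute("""
SELECT d.department_name, SUM(f.request_count), SUM(f.cost)
FROM fact_service_request AS f
JOIN dim_department AS d ON d.department_key = f.department_key
GROUP BY d.department_name
ORDER BY d.department_name
""").fetchall()
```

An Inmon-style design would typically start from a normalised enterprise-wide model and derive department-level data marts like this one from it, so the same query could sit on top of either approach.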
Task 3 Sales Reports using Tableau Desktop (Worth 25 Marks)
Task 3 Sales Reports using Tableau Desktop consists of the following sub tasks
Using Tableau Desktop and the Excel file SalesSuperstore.xlsx, provided via the Assignment 2 folder link on the course study desk, produce the following four reports with appropriate accompanying graphs, each based on a Tableau workbook sheet view. Briefly comment on each report in about 125 words in terms of the trends and patterns that are apparent in it.
The SalesSuperstore.xlsx file contains the following dimensions and information:
1. Customer Name, Customer Segment
2. Location - Region, State, City, Zip code
3. Product Category, Sub Category, Product Name, Product Container, Unit Price
4. Order Information
5. Shipping Information
6. Sales Information
7. Profit
a) Create a report and accompanying graph using Tableau that shows a trend analysis for sales by Product Category over the years 2009 to 2012 and comment on key trends and patterns apparent in this report (About 125 words)
b) Create a report and accompanying graph using Tableau that shows for each Product Category Average Profit and Total Sales for each month over the years 2009 to 2012 and comment on key trends and patterns apparent in this report (About 125 words)
c) Create a geographical map presentation using Tableau that shows graphically the relative size of Product Sales by City within each State for the year 2010 and comment on key trends and patterns in this report (About 125 words)
d) Create a report and accompanying graph using Tableau that shows, for Product Sub Categories that are technology based, Unit Prices, Sales and Profit for each month over the years 2009 to 2012 and comment on key trends and patterns in this report (About 125 words)
Your Assignment 2 report must be structured in report format as follows:
Cover page for assignment 2 report
1. Title Page
2. Table of Contents
3. Body of report – main sections and subsections for the Assignment 2 tasks and sub tasks
3.1 Task 1 (a main heading, with appropriate sub headings for each sub task)
3.2 Task 2 …
3.3 Task 3 …
4. List of References
5. List of Appendices
You need to submit two files when you submit Assignment 2
1. Your Assignment 2 Report for Tasks 1, 2 and 3 in Word document format with the extension .docx
2. Your Assignment 2 Task 3 as a Tableau packaged workbook with the extension .twbx
Use the following file naming convention:
1. Student_no_Student_name_CIS8008_Ass2.docx and
Online Assignment submission
All assignments must be submitted electronically via the course study Assignment 2 submission link and are subject to automated checking for plagiarism, collusion and cheating by Turnitin when you submit your Assignment 2 documents via the Assignment 2 submission link.
Harvard referencing resources
Install a referencing tool (for example, EndNote) that integrates with your word processor. These tools are a great help for referencing and citing sources in your assignments. For more information on how to get EndNote, visit the following webpage: http://www.usq.edu.au/library/referencing/endnote-bibliographic-software.
Study the referencing techniques for Harvard referencing. The USQ Librarian has compiled the following resources on how to reference correctly using the Harvard referencing system. Make use of these excellent resources if you are unsure how to reference correctly.
Library Harvard Referencing Guide http://www.usq.edu.au/library/referencing/harvard-agps-referencing-guide
Appendix A Data Dictionary and Description of the creditdata.csv data set.
1. Title: German Credit data – creditdata.csv
2. Number of Instances: 1000
3. Number of Attributes: 22 (8 numerical, 14 categorical)
4. Table with Attribute description for creditdata.csv
Attribute Name Type of Attribute Range of attribute
1. Custno Customer Id Custno1 to Custno1000
2. Checking Status of existing checking account (qualitative) A: <= 0 DM
B: <= 200 DM
C: >= 200 DM / Salary assignments for one year
D: No checking account
3. duration Duration in months of loan (numeric)
4. history Credit history (qualitative) A: no credits taken/ all credits paid back duly
B: all credits at this bank paid back duly
C: existing credits paid back duly till now
D: delay in paying off in the past
E: critical account/other credits existing (not at this bank)
5. purpose Purpose of proposed loan (qualitative) A: car
B: car (used)
C: furniture/equipment
D: radio/television
E: domestic appliances
6. amount Credit amount (numeric)
7. savings Savings account/bonds (in German currency) (qualitative) A: < 100 DM
B: 100 <= ... < 500 DM
C: 500 <= ... < 1000 DM
D: >= 1000 DM
E: unknown/ no savings account
8. employed Present employment since (qualitative) A: unemployed
B: < 1 year
C: 1 <= ... < 4 years
D: 4 <= ... < 7 years
E: >= 7 years
9. instalp Instalment rate as percentage of disposable income (numeric)
10. marital Personal status and sex (qualitative) A: male: divorced/separated
B: female : divorced/separated/married
C: male : single
D: male : married/widowed
E: female : single
11. coapp Other debtors / guarantors (qualitative) A: none
12. resident Present residence since (numeric) in years
13. property Property (qualitative) A: real estate
B: if not A: building society savings agreement/life insurance
C: if not A/B: car or other, not in attribute 7 (savings)
D: unknown / no property
14. age Age in years (numeric)
15. other Other instalment plans A: bank
16. housing Housing (qualitative) A: rent
C : for free
17. excred Number of existing credits at this bank (numeric)
18. job Job (qualitative) A: unemployed/ unskilled - non-resident
B: unskilled - resident
C: skilled employee / official
D: management/ self-employed/ highly qualified employee/ officer
19. depends Number of people being liable to provide maintenance for (numeric)
20. telephone Telephone A: none
B: yes, registered under the customer's name
21. foreign Foreign worker (qualitative) A: yes
B: no
22. credit_rating Credit rating (qualitative) Good
Bad
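The data dictionary above can also be read as a simple typing rule: some columns hold numbers and the rest hold category codes. The sketch below applies that rule when parsing CSV text. It is a hedged illustration only. It assumes the CSV header uses the attribute names exactly as listed above and that the numeric attributes are whole numbers, and the two sample rows are invented rather than taken from creditdata.csv.

```python
import csv
import io

# Attributes marked "(numeric)" in the data dictionary above
NUMERIC = {"duration", "amount", "instalp", "resident", "age", "excred", "depends"}


def load_credit_rows(text):
    """Parse CSV text, converting dictionary-numeric columns to int.

    Assumption: header names match the data dictionary attribute names
    and numeric values are integers.
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            name: int(value) if name in NUMERIC else value
            for name, value in raw.items()
        })
    return rows


# Invented two-row sample with a subset of columns (assumption)
sample = (
    "Custno,duration,amount,housing,credit_rating\n"
    "Custno1,12,1000,A,Good\n"
    "Custno2,24,2500,A,Bad\n"
)
rows = load_credit_rows(sample)
```

Keeping the category codes as text and only the numeric attributes as numbers mirrors the qualitative and numeric distinction made in the attribute table.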