Excel vs R for Prediction Analysis
Project Details:
This project is sub-part of another project that aims to reduce the usage of air conditioning by studying the optimal ambient temperatures and switching between air conditioning and room ventilation. The main project can be accessed in this link. Download links to projects and report available at the end of the page
​​
For the main project, there was a requirement to collect data on air conditioning usage. This sub project was a trial to see if prediction models can accurately predict the operation of the aircon. If the prediction model achieves a certain level of accuracy, manual recording would not be necessary which would save time and effort. This project uses data from 2 sensors, one sensor placed in a room where the air conditioning is operated and another control room where air conditioning is not operational.
​
The main aim of this prediction project is to highlight the ease and efficiency an organisation can achieve by using industry 4.0 methods such as machine learning instead of industry 3.0 methods of downloading reports and analysing them using Excel.
​
The study found that using machine learning methods instead of manual analysis using excel (as a company practicing industry 3.0 would) saves up to 80% time as well as increase the prediction accuracy. Furthermore, scalability and reproducibility were also found to be easier using a programming language like R instead of Excel.
​
Scroll down to view the project highlights and download links for project files
Software / programming languages used:
Excel
SQL
R
PowerBI
Statistical testing methods
Machine Learning
Highlights
Best prediction model achieved using Excel:

Best excel model: For the best excel model, an infliction point, which is the difference in temperature between the air-conditioned room and control room. It was found that at different time periods and at different temperatures, this infliction point that decides the status of the aircon can change, hence the above matrix was devised.

Best excel model: In addition to the infliction points being segregated by temperature and time, an offset was required. In the above matrix, the offsets are in minutes. It was noticed that it took a while for the temperatures to reach the specific infliction point when the air conditioning was turned on or off, hence, these offsets had to be applied to count for the delays in reaching infliction points.

Best excel model: Due to the complexity of using multiple timings and temperatures as well as offsets, the excel formulas had to be split into 2 columns with multiple nested if statements

Best excel model: With this method, we were able to achieve a 97.5% prediction rating, however, it should be noted that this is on the "training" dataset itself

Best excel model: However, on the test data, the best prediction rating that was reached was only 72%. Although 72% is fair, it is nowhere near the accuracy needed.
Now let's look at how R and machine learning fares against Excel

Best R model: Single decision tree yielded the greatest accuracy. The decision tree was set up using the Rpart method with cross validation, 6 predictors predicting 2 classes was found to give the best accuracy
