### The Bayesian D-Optimal Design In Mixture Experimental Design

- Research Article in Proceedings of the 1st International Conference on Statistics and Analytics, ICSA 2019, 2-3 August 2019, Bogor, Indonesia
- Authors:
- Uqwatul Alma Wizsa, Utami Dyah Syafitri, Aji Hamim Wigena
- Abstract:
Mixture designs are widely used experimental designs, particularly in industry. The component proportions in a mixture sum to 100%, and each proportion must be greater than or equal to 0%. The D-optimality criterion can help determine which mixture compositions to test, reducing trial-and-error formulation of the product. However, this criterion depends heavily on the assumed model. To reduce this dependence, a Bayesian approach is used. The Bayesian D-optimal algorithm was applied to a mixture of two components with constraint functions. Ten design points were formed from eleven candidate points. Applied to the two-component mixture, the Bayesian D-optimal algorithm did not converge to a design. Therefore, the classical D-optimal design was used instead, and it yielded three distinct points.
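
To illustrate how a classical D-optimal search over a candidate set works, here is a minimal sketch. It is not the authors' algorithm: it assumes a quadratic Scheffé model, an equally spaced grid of eleven candidates for the proportion of component 1, and no additional constraints (the abstract does not state the model, the candidates, or the constraint functions). All names are hypothetical.

```python
from itertools import combinations_with_replacement

# ASSUMPTION: eleven equally spaced candidates for the proportion of component 1.
CANDIDATES = [i / 10 for i in range(11)]
N_RUNS = 10  # ten design points, as in the abstract


def model_row(x1):
    """Row of the model matrix for an ASSUMED quadratic Scheffe model."""
    x2 = 1.0 - x1  # proportions sum to 1
    return (x1, x2, x1 * x2)


def outer_upper(row):
    """Upper triangle of the outer product row' * row (the matrix is symmetric)."""
    a, b, c = row
    return (a * a, a * b, a * c, b * b, b * c, c * c)


# Precompute each candidate's contribution to the information matrix X'X.
OUTER = [outer_upper(model_row(x)) for x in CANDIDATES]


def d_criterion(design):
    """Determinant of X'X for a design given as a tuple of candidate indices."""
    m11, m12, m13, m22, m23, m33 = (
        sum(col) for col in zip(*(OUTER[i] for i in design))
    )
    return (
        m11 * (m22 * m33 - m23 * m23)
        - m12 * (m12 * m33 - m23 * m13)
        + m13 * (m12 * m23 - m22 * m13)
    )


# Exhaustive search over all ways to choose 10 runs (with repetition) from 11 candidates.
best = max(combinations_with_replacement(range(len(CANDIDATES)), N_RUNS), key=d_criterion)
best_points = [CANDIDATES[i] for i in best]
support = sorted(set(best_points))
print("distinct design points:", support)
```

Under these assumptions the search places all ten runs on three distinct compositions, which matches the qualitative result reported for the classical criterion. Exhaustive search is only feasible for small problems like this one; larger candidate sets usually call for an exchange algorithm.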

### Spatio-temporal Bayes Regression with INLA in Statistical Downscaling Modeling for Estimating West Java Rainfall

- Research Article in Proceedings of the 1st International Conference on Statistics and Analytics, ICSA 2019, 2-3 August 2019, Bogor, Indonesia
- Authors:
- Ro’fah Nur Rachmawati, Anik Djuraidah, Aji Hamim Wigena, I Wayan Mangku
- Abstract:
Currently, inference for Bayesian spatio-temporal regression in statistical downscaling (SD) modeling still relies on MCMC methods, which suffer from convergence problems and very high computational demands. When the spatio-temporal model is complex and hierarchical, MCMC computation becomes inefficient. Therefore, this paper aims to predict rainfall at observed and unobserved locations using a Bayesian spatio-temporal model with an efficient, fast, and accurate inference method, INLA. The response variable is monthly rainfall at 57 locations in West Java, Indonesia, observed from 1981 to 2017 and assumed to be normally distributed. The explanatory variables consist of spatial and temporal random effects and fixed effects of monthly GCM precipitation with 8x5 dimensions (40 variables), reduced with PCA. Our model successfully predicts monthly rainfall at observed and unobserved locations using spatial characteristics from nearby locations, and it captures the annual cyclic behavior of monthly rainfall trends well. The correlation between predicted and actual rainfall is about 0.8 (for the 0.65 and 0.8 quantiles) and 0.7 (for the high 0.95 and 0.975 quantiles), with an RMSEP of 151 for the low (0.65) quantile. Finally, we present the regional rainfall for the entire West Java region. The eastern part near the Central Java border has higher rainfall, as does the west, while the north and south have lower rainfall.

### Small Area Estimation with Penalty for Specific Area Effects Selection

- Research Article in Proceedings of the 1st International Conference on Statistics and Analytics, ICSA 2019, 2-3 August 2019, Bogor, Indonesia
- Authors:
- Novi Hidayat Pusponegoro, Anang Kurnia, Khairil Anwar Notodiputro, Agus Mohamad Sholeh, Erni Tri Astuti
- Abstract:
Small area estimation (SAE) techniques are now widely employed to produce parameter estimates for small domains where sample sizes are too small to support direct estimation. As an indirect estimation method, SAE borrows strength from other related small areas to improve precision. In the 'Big data' era, database size and technology have developed rapidly. The availability of high data volumes leads to computational and statistical challenges, so existing SAE methods can no longer handle the complexity of fixed effects or random effects in such data. Big data also provide a large number of areas as observations, not all of which are small areas. This sparsity of random effects also violates the normality assumption on the random effects. Therefore, identifying the effective random effects is very important to ease the computational burden and to construct more interpretable models. This study presents a small area estimation method that overcomes the complexity of random effects using a hard-ridge penalty. Simulations demonstrate the performance of the method, which is then applied to estimate the sub-district level mean of per capita income using poverty survey data from Bangka Belitung Province in 2017.

### Comparing Decision Tree, Random Forest and Boosting in Identifying Weather Index for Rice Yield Prediction

- Authors:
- Mohammad Masjkur, Ken Seng Tan
- Abstract:
Modeling the relationship between weather indices and yield losses is the basis for developing weather-based index crop insurance. A data mining approach may overcome some limitations of traditional regression approaches in identifying a weather index for predicting crop yield. The purpose of this study is to evaluate the performance of Decision Tree, Random Forest, and Boosting in identifying the most important weather index for rice yield prediction. The study uses district-level rice yield data from 8 locations in the Java region, recorded annually from 1991 to 2014. The corresponding weather data consist of 48 weather variables, including the timescale Standardized Precipitation Index (SPI), Growing Degree Days (GDD), and Vapor Pressure Deficit (VPD) for each growing season. Results show that the Boosted Regression Tree is the best model for rice yield prediction compared with the Regression Tree and Random Forest. The most important weather indices are Growing Degree Days in growing season I (GDD I) and Growing Degree Days in growing season III (GDD III). Threshold values of GDD I > 2100 °C and GDD III > 2150 °C would trigger rice yield losses.

### Modeling of Quality of Education in Junior High School using Multivariate Adaptive Regression Splines (MARS) Method

- Authors:
- Urwawuska Ladini, Budi Susetyo, Indahwati Indahwati
- Abstract:
The accreditation of the education system in Indonesia was established based on national education standards, while the national examinations are conducted to measure students' academic achievement. The relationship between accreditation and national examinations is still under debate, considering that both are important in measuring education quality. Multivariate adaptive regression splines (MARS) is a regression technique that predicts a continuous response variable from several independent variables using a set of basis functions whose coefficients are estimated from the data. This research was conducted to model national examination results based on national education standard scores at the junior high school level, using a model that can accommodate interactions among the independent variables. The facilities and infrastructure standard is the most important variable, with an importance of 100%, followed by the graduate competency standard with an importance of 45.34%. The process and management standards did not significantly influence national examination results.

### Implementation of the Beta Distribution Parameter Estimation Method on Empirical Bayes of Small Area Estimation

- Authors:
- Siti Rafika Fiandasari, Margaretha Ari Anggorowati
- Abstract:
A problem arises in sample surveys when the available sample is not sufficient for estimating a parameter. Small Area Estimation can handle this problem by using an auxiliary variable, but difficulties remain when the auxiliary variable is hard to obtain or is not strongly correlated with the response variable. The Empirical Bayes method can handle this case because it does not need an auxiliary variable, but it involves parameters α and β that need to be estimated. This research uses four methods for estimating α and β: the Moment and Newton-Raphson methods by Rao, and the Moment and Newton-Raphson methods by Claire. The Moment method by Claire and the Moment and Newton-Raphson methods by Rao are more effective than the Newton-Raphson method by Claire, while the Empirical Bayes estimator is more effective than the direct estimator.

### The Bootstrap Stratified Random Sampling in Finite Population for Traffic Survey Data

- Authors:
- Kristiana Yunitaningtyas, Indahwati Indahwati, Muhammad Nur Aidi, Santi Susanti
- Abstract:
A traffic survey is an important technique for measuring traffic density and the gas emissions produced by vehicles, but it generally takes a long time to carry out. This study aims to apply stratified random sampling to traffic survey data in order to make data collection more efficient while maintaining a high degree of accuracy. The data are divided into strata based on traffic density, and the direct bootstrap resampling technique is applied with attention to the finite population correction factor. The finite-population bootstrap is expected to resolve the variance overestimation caused by the standard bootstrap. Evaluation is based on the validity, reliability, and accuracy of the bootstrap statistics. The results indicate that the bias and variance decrease when the number of bootstrap replications is large. A bootstrap sample size of 32 produced the lowest distribution of bias, adjusted variance, and MSE value.

### Study of Robust Regression Modeling Using MM-Estimator and Least Median Squares

- Authors:
- Khusnul Khotimah, Kusman Sadik, Akbar Rizki
- Abstract:
Ordinary least squares (OLS) is a method commonly used to estimate regression equations. One way to handle the sensitivity of OLS to outliers is to use a robust regression method. This study used least median of squares (LMS) and multi-stage method (MM) robust regression. Simulation results across various scenarios show that the LMS and MM methods perform better than OLS on data containing vertical outliers and bad leverage points. The MM method has the lowest average parameter estimation bias, followed by LMS, then OLS. LMS has the smallest average root mean squared error (RMSE) and the highest average R^2, followed by MM, then OLS. A comparison of the three methods on 2017 Indonesian rice production data, which contain 10% outliers, concluded that LMS is the best method. LMS produces the smallest RMSE, 4.44, and the highest R^2, 98%.
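
To make the LMS criterion concrete, the following sketch fits a simple regression line by minimizing the median of the squared residuals. It is an illustration only, not the procedure used in the study: it searches lines through every pair of observations (an elemental-subset approximation to the exact LMS fit), and the data and function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def lms_line(xs, ys):
    """Approximate least-median-of-squares fit for y = intercept + slope * x.

    Every line through two observations is a candidate; the candidate with the
    smallest median squared residual is returned as (intercept, slope).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid regression candidate
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        criterion = median(
            (y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)
        )
        if best is None or criterion < best[0]:
            best = (criterion, intercept, slope)
    return best[1], best[2]


# Synthetic data (NOT from the study): y = 1 + 2x with two gross outliers.
xs = list(range(10))
ys = [1 + 2 * x for x in xs]
ys[8], ys[9] = 60, -40

intercept, slope = lms_line(xs, ys)
print(intercept, slope)
```

Because more than half of the synthetic observations lie exactly on the line y = 1 + 2x, the median squared residual of that line is zero and the two outliers have no influence on the fit. An OLS fit to the same data would be pulled toward the outliers.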

### Cryptocurrency Forecasting using α-Sutte Indicator, ARIMA, and Long Short-Term Memory

- Authors:
- Apriliyanus Rakhmadi Pratama, Sigit Nugroho, Ketut Sukiyono
- Abstract:
The purpose of this study is to obtain bitcoin price predictions using three different forecasting methods (the ARIMA model, the α-Sutte indicator, and the LSTM algorithm) and to determine the accuracy of the three methods in forecasting the bitcoin price. Daily bitcoin closing prices from April 29, 2013 to February 6, 2019, taken from a coin market website, were analyzed using R and Python. Based on the smallest values of MSE, MAPE, and MAD, the LSTM algorithm gave the best prediction, followed by the α-Sutte indicator and the ARIMA model.
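
The three accuracy measures named in the abstract can be computed as in the sketch below. The definitions are the standard ones, with MAD taken to mean the mean absolute deviation of the forecast errors (an assumption, since the abstract does not define it). The numbers are hypothetical and are not bitcoin prices from the study.

```python
def mse(actual, forecast):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mad(actual, forecast):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical values for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

print(mse(actual, forecast), mad(actual, forecast), mape(actual, forecast))
```

A method is preferred when it gives smaller values on all three measures. MAPE is scale-free, which makes it convenient for comparing forecasts of a price series whose level changes a lot over time.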

### K-Nearest Neighbor Learning based Diabetes Mellitus Prediction and Analysis for eHealth Services

- Research Article in EAI Endorsed Transactions on Scalable Information Systems: Online First
- Authors:
- Iqbal H. Sarker, Md. Faisal Faruque, Hamed Alqahtani, Asra Kalim
- Abstract:
Nowadays, eHealth services have become a booming area; the term refers to computer-based health care and information delivery that improve health services locally, regionally, and worldwide. An effective disease risk prediction model based on analyzing electronic health data benefits not only patient care but also the provision of services through the corresponding data-driven eHealth systems. In this paper, we focus on predicting and analysing diabetes mellitus, an increasingly prevalent chronic disease that refers to a group of metabolic disorders characterized by a high blood sugar level over a prolonged period of time. K-Nearest Neighbor (KNN) is one of the most popular and simplest machine learning techniques for building such a disease risk prediction model from relevant health data. To achieve our goal, we present an optimal K-Nearest Neighbor (Opt-KNN) learning based prediction model that uses patients' habitual attributes in various dimensions. This approach determines the optimal number of neighbors, the one with a low error rate, to provide better prediction outcomes in the resulting model. The effectiveness of this machine learning eHealth model is examined by conducting experiments on real-world diabetes mellitus data collected from medical hospitals.
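
One common way to choose the number of neighbors by error rate is sketched below. This is an assumption about the general idea, not the authors' Opt-KNN procedure: it scores each candidate k by leave-one-out classification error and keeps the smallest k with the lowest error. The data are synthetic and all names are hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (squared Euclidean distance)."""
    neighbors = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def loo_error(data, k):
    """Leave-one-out error rate: predict each point from all the others."""
    errors = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) != label:
            errors += 1
    return errors / len(data)


def choose_k(data, candidates):
    """Return the candidate k with the lowest leave-one-out error (ties go to the smaller k)."""
    return min(candidates, key=lambda k: (loo_error(data, k), k))


# Synthetic two-feature records (NOT patient data): label 0 or 1.
data = [
    ((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.3), 0), ((1.1, 1.1), 0),
    ((3.0, 3.0), 1), ((3.2, 2.9), 1), ((2.8, 3.1), 1), ((3.1, 3.3), 1),
]
candidates = [1, 3, 5]
chosen = choose_k(data, candidates)
print(chosen, loo_error(data, chosen))
```

On real health records the features would normally be standardized first, since KNN distances are sensitive to the scale of each attribute.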