### Reducing Bitrate and Increasing the Quality of Inter Frame by Avoiding Quantization Errors in Stationary Blocks

- Research Article in EAI Endorsed Transactions on Industrial Networks and Intelligent Systems: Online First
- Authors:
- Xuan-Tu Tran, Ngoc-Sinh Nguyen, Duy-Hieu Bui, Minh-Trien Pham, Hung K. Nguyen, Cong-Kha Pham
- Abstract:
In image compression and video coding, quantization error helps to reduce the amount of information in the high-frequency components. However, in temporal prediction the quantization error contributes as noise to the total residual information. Therefore, the residual signal of the inter-picture prediction is greater than expected and always differs from zero, even when the input video contains only homogeneous frames. In this paper, we reveal the negative effects of quantization errors in inter prediction and propose a video encoding scheme that avoids the side effects of quantization errors in the stationary parts. We propose to implement a motion detection algorithm as the first stage of video encoding to separate the video into two parts: motion and static. The motion information allows us to force the residual data of the unchanged part to zero and keep the residual signal of the motion part unchanged. Besides, we design block-based filters which improve the motion results and make those results fit the encoding block size well. The fixed residual data of the static information permits us to pre-calculate its quantized coefficients and create a bypass encoding path for them. Experimental results with JPEG compression (MJPEG-DPCM) showed that the proposed method produces a lower bitrate than the conventional MJPEG-DPCM at the same quantization parameter, with lower computational complexity.
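The static/motion separation described above can be sketched as a block-level change detector. This is a minimal illustration, not the paper's pipeline: the sum-of-absolute-differences metric, the 8×8 block size, and the threshold value are all assumptions here.

```python
def block_sad(prev, cur, bx, by, bs):
    """Sum of absolute differences between co-located blocks of two frames."""
    return sum(
        abs(cur[by + i][bx + j] - prev[by + i][bx + j])
        for i in range(bs) for j in range(bs)
    )

def classify_blocks(prev, cur, bs=8, thresh=16):
    """Label each block 'static' or 'motion'. Static blocks could then
    take a bypass path with a forced-zero residual, as in the abstract."""
    h, w = len(cur), len(cur[0])
    return {
        (bx, by): "static" if block_sad(prev, cur, bx, by, bs) <= thresh else "motion"
        for by in range(0, h, bs)
        for bx in range(0, w, bs)
    }

# two 8x16 frames, identical except one pixel in the left block
prev = [[10] * 16 for _ in range(8)]
cur = [row[:] for row in prev]
cur[0][0] = 200
labels = classify_blocks(prev, cur)
```

Only the left block is flagged as motion; the right block's residual can be forced to zero, so no quantization noise accumulates there in later frames.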

### Histogram-based Feature Extraction for GPS Trajectory Clustering

- Research Article in EAI Endorsed Transactions on Industrial Networks and Intelligent Systems: Online First
- Authors:
- Chi Nguyen, Thao Dinh, Van-Hau Nguyen, Nhat Phuong Tran, Anh Le
- Abstract:
Clustering trajectories from GPS data is a crucial task for developing applications in intelligent transportation systems. Most existing approaches perform clustering on raw data consisting of series of GPS positions of moving objects over time. Such approaches are not suitable for classifying the moving behaviours of vehicles, e.g., for distinguishing between the trajectory of a taxi and that of a private car. In this paper, we focus on the problem of clustering trajectories of vehicles having the same moving behaviours. Our approach is based on histogram-based feature extraction to model the moving behaviours of objects and utilizes traditional clustering algorithms to group trajectories. We perform experiments on real datasets and obtain better results than existing approaches.
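One concrete histogram feature consistent with this idea is a normalised histogram of segment speeds. The abstract does not specify which motion statistics are binned, so the speed feature and the bin edges below are illustrative assumptions.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))

def speed_histogram(points, times, bins=(0, 5, 10, 20, 40, float("inf"))):
    """Normalised histogram of segment speeds (m/s): a fixed-length
    behaviour descriptor that ordinary clustering algorithms can consume."""
    counts = [0] * (len(bins) - 1)
    for i in range(1, len(points)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            continue
        v = haversine_m(points[i - 1], points[i]) / dt
        for b in range(len(counts)):
            if bins[b] <= v < bins[b + 1]:
                counts[b] += 1
                break
    total = sum(counts) or 1
    return [c / total for c in counts]

traj = [(0.0, 0.0), (0.0, 0.0001), (0.0, 0.0002)]   # slow, steady movement
hist = speed_histogram(traj, [0, 10, 20])
```

Because every trajectory maps to the same fixed-length vector regardless of its duration, standard algorithms such as k-means can cluster the behaviour descriptors directly.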

### Hidden Markov Model for Exchange Rate with EWMA Control Chart

- Research Article in Proceedings of the 1st International Conference on Statistics and Analytics, ICSA 2019, 2-3 August 2019, Bogor, Indonesia
- Authors:
- Rahmawati Ramadhan, Dodi Devianto, Maiyastri Maiyastri
- Abstract:
Nowadays, the US Dollar exchange rate is still very influential on the exchange rate stability of many countries, including Indonesia. The effect of the US Dollar exchange rate has caused the Rupiah exchange rate to fluctuate. This is one of the cases that can be modeled with the Hidden Markov Model (HMM), a development of the Markov chain in which the states cannot be observed directly (they are hidden) but only through a set of other observations. In this paper, an Exponentially Weighted Moving Average (EWMA) control chart is used to determine the states of the HMM. Based on the EWMA control chart, there are three states: increase, decrease, and constant. The probability of changes in the exchange rate in 2019 is predicted with the Baum-Welch algorithm on the HMM. Using 240 observations of the US Dollar to Rupiah exchange rate in 2018, the exchange rate in 2019 is predicted to increase with a probability of 0.57. The HMM results are connected to the EWMA control chart, which shows eight out-of-control data points: two in the increase state and six in the decrease state. Thus, the existence of out-of-control data implies a probability of an increase in the exchange rate in 2019.
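A minimal sketch of the state-labelling step, assuming the chart's centre line and standard deviation are estimated from the series itself and using the common choices λ = 0.2 and L = 3 (the paper's actual settings are not given in the abstract):

```python
import math

def ewma_states(x, lam=0.2, L=3.0):
    """Label each point of a series via an EWMA control chart:
    'increase' above the UCL, 'decrease' below the LCL, else 'constant'."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((xi - mu) ** 2 for xi in x) / (len(x) - 1))
    z, states = mu, []
    for i, xi in enumerate(x, start=1):
        z = lam * xi + (1 - lam) * z          # EWMA recursion
        half = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * i)))
        if z > mu + half:
            states.append("increase")
        elif z < mu - half:
            states.append("decrease")
        else:
            states.append("constant")
    return states

rates = [10.0] * 20 + [30.0] * 10   # a level shift upward
states = ewma_states(rates)
```

The level shift drives the EWMA statistic above the upper control limit, so the later points are labelled "increase"; those labels are what feed the HMM as its observable state sequence.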

### Prediction of Number of Claims Using Poisson Linear Mixed Model with AR(1) Random Effect

- Research Article in Proceedings of the 1st International Conference on Statistics and Analytics, ICSA 2019, 2-3 August 2019, Bogor, Indonesia
- Authors:
- Fia Fridayanti Adam, Anang Kurnia, I Gusti Putu Purnaba, I Wayan Mangku
- Abstract:
This study focuses on the number of claims in an insurance company in Indonesia across 35 locations. The approach taken is a Poisson linear mixed model with two random effects. The response variable is the number of claims, the fixed variable is the deductible, and the random effects are the area and the month of occurrence, which is assumed to follow a first-order autoregressive process. Fixed and random component estimation is carried out based on MPQL, while the variance components are estimated using REML with initial values β_0 = 0, β_1 = 0, σ_v^2 = 0.5, and σ_u^2 = 1. Modeling is carried out on training data comprising 75% of the observations, and predictions are carried out on testing data comprising the remaining 25%. Modeling on the training and testing data produces accurate models in almost all regions included in the model. This is indicated by MAPE values of less than 20% in all regions.
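The 20% accuracy threshold refers to the mean absolute percentage error, which can be computed as:

```python
def mape(actual, predicted):
    """Mean absolute percentage error; values under 20% were taken
    in the abstract to indicate an accurate model."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

m = mape([100, 200, 400], [90, 210, 420])   # errors of 10%, 5%, 5%
```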

### Mean Square Error of Non-Sampled Area in Small Area Estimation

- Research Article in Proceedings of the 1st International Conference on Statistics and Analytics, ICSA 2019, 2-3 August 2019, Bogor, Indonesia
- Authors:
- Faisal Haris, Azka Ubaidillah
- Abstract:
Small area estimation (SAE) is a statistical technique to predict the parameters of a subpopulation with a small or even zero sample size. An area with zero sample size can be estimated with the support of cluster information. The area random effect is assumed to be similar between regions and can be analyzed by clustering the auxiliary variables. In SAE, the mean square error (MSE) is used to compare the precision of parameter estimates, but there is no study that discusses the MSE of non-sampled areas in SAE. The main idea of this research is to modify the existing statistical method by adding cluster information under the assumption that similar areas have similar characteristics. The new method was evaluated by a simulation study and a case study to check its performance. The simulation shows that all modified methods produce a relatively similar MSE for the non-sampled areas.

### Simulation Study to Describe Bayesian Analysis of Nonlinear Structural Equation Modeling

- Authors:
- Ferra Yanuar, Aidinil Zetra
- Abstract:
Structural equation modeling (SEM) has been widely used in many disciplines, such as economics, politics, and health. Nonlinear structural equation modeling, as part of SEM, has also been developed analytically, but this development is still limited. In this method, the model parameters are estimated using conjugate priors in a Bayesian approach. In nonlinear SEM, the models are specified to include quadratic forms and/or interactions of latent variables. The posterior mean and posterior variance of the parameters are estimated using an iterative approach, since it is difficult to estimate those parameters analytically. The iterative approach used here is the Markov Chain Monte Carlo (MCMC) method with Gibbs sampling. A simulation study is done to illustrate the proposed estimation method for the nonlinear model. A set of 300 data points is generated to demonstrate the implementation of the proposed method. This study shows that the proposed nonlinear SEM model can be accepted based on goodness-of-fit criteria.

### A New Mixture Distribution for Extreme Excess Zeros: Negative Binomial-Generalized Exponential (NB-GE) Distribution

- Authors:
- Junifsa Afly Prameswari, Ida Fithriani, Siti Nurrohmah
- Abstract:
The Negative Binomial-Generalized Exponential (NB-GE) distribution is a distribution capable of modeling overdispersed data with extreme excess zeros, i.e., more than 80% zeros in a dataset. It is a mixture distribution obtained by mixing the Negative Binomial (NB) distribution with the Generalized Exponential (GE) distribution. The construction of the NB-GE distribution and its characteristics, such as the probability mass function, kth moment, mean, variance, skewness, and kurtosis, are discussed in this paper. The parameters of the NB-GE distribution are estimated using the maximum likelihood method. As an illustration, the NB-GE distribution is used to model fatal crash data that has more than 80% zeros.
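The mixture construction can be illustrated by Monte Carlo simulation, drawing the NB success probability as p = exp(-λ) with λ ~ GE(α, β), which is the standard NB-GE construction; the parameter values below are illustrative, not the paper's.

```python
import math, random

def rand_ge(alpha, beta):
    """Inverse-CDF draw from a Generalized Exponential(alpha, beta):
    F(x) = (1 - exp(-beta * x)) ** alpha."""
    u = random.random()
    return -math.log(1.0 - u ** (1.0 / alpha)) / beta

def rand_nb_ge(r, alpha, beta):
    """One NB-GE draw: NB(r, p) with p = exp(-lambda), lambda ~ GE."""
    p = math.exp(-rand_ge(alpha, beta))
    fails = 0
    for _ in range(r):  # NB(r, p) as r geometric waiting times
        fails += int(math.log(1.0 - random.random()) / math.log(1.0 - p))
    return fails

random.seed(0)
sample = [rand_nb_ge(r=2, alpha=1.0, beta=10.0) for _ in range(10000)]
zero_share = sample.count(0) / len(sample)
```

With these illustrative parameters the mixing distribution keeps the NB success probability near one, so roughly 80% of the draws are zero, which is exactly the excess-zero regime the distribution targets.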

### The Use of the MEWMA Control Chart in Controlling Major Components of Cement Product

- Authors:
- Surya Puspita Sari, Dodi Devianto
- Abstract:
Cement is one of the industrial products that has a quality control process. The major components, consisting of SiO2, Al2O3, Fe2O3, CaO, MgO and SO3, are the basic components of the cement product. This research explains the quality control of the major components of cement using the MEWMA control chart. The performance of the control chart is measured using the ARL, the average number of observations until the first out-of-control data point. The parameter ARL0 is the average run length of in-control data. In this research, the data is assumed to be in control. The optimization of ARL0 by the weighting parameter of the MEWMA control chart for λ = 1 is equal to the Hotelling T2 chart. The optimal value of the weighting parameter is determined using the bisection method, after which the variables did not show outlier data. Finally, this research shows that the cement production process is in control.

### Performance Evaluation of AIC and BIC in Time Series Clustering with Piccolo Method

- Authors:
- Triyani Hendrawati, Aji Hamim Wigena, I Made Sumertajaya, Bagus Sartono
- Abstract:
The Piccolo method uses the parameters of an autoregressive model to cluster time series data. One set of time series data can produce several models, but only one model is used for clustering. Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC) can be used for model selection. However, using different criteria for model selection can produce different models and therefore different clusters. The aim of this research is to evaluate the performance of AIC and BIC in time series clustering with the Piccolo method. A simulation comparing the performance of AIC with BIC in time series clustering using the Piccolo method was carried out. The results show that BIC is better than AIC.
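The two criteria differ only in the complexity penalty: AIC adds 2 per parameter, BIC adds ln(n). A sketch under simplifying assumptions (zero-mean AR fitted by ordinary least squares, Gaussian log-likelihood up to a constant):

```python
import math

def ar_fit_ic(x, p):
    """OLS fit of a zero-mean AR(p); returns (AIC, BIC) via the
    Gaussian approximation n*ln(RSS/n) + penalty."""
    n = len(x) - p
    X = [[x[t - j - 1] for j in range(p)] for t in range(p, len(x))]
    y = x[p:]
    # normal equations A b = c, solved by Gaussian elimination
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    c = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    for i in range(p):                       # forward elimination
        for r in range(i + 1, p):
            f = A[r][i] / A[i][i]
            A[r] = [A[r][k] - f * A[i][k] for k in range(p)]
            c[r] -= f * c[i]
    b = [0.0] * p
    for i in reversed(range(p)):             # back substitution
        b[i] = (c[i] - sum(A[i][k] * b[k] for k in range(i + 1, p))) / A[i][i]
    rss = sum((y[i] - sum(b[j] * X[i][j] for j in range(p))) ** 2 for i in range(n))
    aic = n * math.log(rss / n) + 2 * p
    bic = n * math.log(rss / n) + p * math.log(n)
    return aic, bic

import random
random.seed(1)
x = [0.0]
for _ in range(199):                         # simulate an AR(1) series
    x.append(0.6 * x[-1] + random.gauss(0, 1))
aic1, bic1 = ar_fit_ic(x, 1)
aic2, bic2 = ar_fit_ic(x, 2)
```

Since ln(199) ≈ 5.3 > 2, BIC penalizes the extra lag more heavily than AIC; this is why the two criteria can select different AR orders for the same series, and hence produce different clusters under the Piccolo method.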

### Hierarchical Generalized Linear Mixed Models for Multilevel Analysis of Indonesian Student’s PISA Mathematics Literacy Achievement

- Authors:
- Tonah Tonah, Anang Kurnia, Kusman Sadik
- Abstract:
Generally, learning assessment and evaluation data in education have a hierarchical structure; PISA data is one example. Multilevel models are methods that can be used to analyse hierarchical data structures and can be considered HGLM models. This study has two objectives: to examine the distribution of the mathematical literacy variable, and to select the best HGLM model to determine the student- and school-level variables that significantly influence students' mathematical literacy achievement. The results show that mathematical literacy achievement has a lognormal distribution and that the M7 model is the best model.

### Determination of General Circulation Model Domain Using LASSO to Improve Rainfall Prediction Accuracy in West Java

- Authors:
- Nanda Fadhli, Aji Hamim Wigena, Anik Djuraidah
- Abstract:
The statistical downscaling technique has often been used to predict rainfall. This technique needs a domain of general circulation model (GCM) data, and the selection of the GCM domain is an important factor in improving prediction accuracy. The goal of this study is to determine the optimum domain. This study uses GCM data from CFSRv2 with a grid resolution of 2.5°×2.5° and local rainfall data in West Java. The GCM domain is determined based on a minimum correlation value of 0.3 between the GCM data and the local rainfall data. Correlations are calculated for the grids in the four compass directions, with the grid directly above the local rainfall station as the reference. The domains are evaluated using a regression model with L1 (LASSO) regularization. The results show that the optimum domain is 8×5 grids.
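A generic coordinate-descent LASSO, sketched in plain Python to show how the L1 penalty zeroes out predictors (here, GCM grid cells) that contribute little; this illustrates the estimator, not the study's implementation.

```python
def soft_threshold(z, g):
    """Shrink z toward zero by g; exact zeros give LASSO its sparsity."""
    return (z - g) if z > g else (z + g) if z < -g else 0.0

def lasso_cd(X, y, lam, iters=200):
    """Coordinate-descent LASSO:
    minimize (1/2n) * ||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # partial residual excluding feature j
            r = [y[i] - sum(b[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            norm = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / norm
    return b

X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]   # orthogonal toy design
y = [2, 2, -2, -2]                          # depends only on column 0
b = lasso_cd(X, y, lam=0.5)
```

With λ = 0.5 the irrelevant second column is shrunk exactly to zero, while the informative one keeps a (slightly biased) nonzero coefficient; comparing candidate domains by such regularized fits is what the study's evaluation step amounts to.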

### Quasi Poisson Model for Estimating Under-Five Mortality Rate in Small Area

- Authors:
- Nofita Istiana, Anang Kurnia, Azka Ubaidillah
- Abstract:
The Under-Five Mortality Rate (U5MR) is an important indicator because it reflects socio-economic conditions and developments in the health sector. U5MR is obtained from the Demographic and Health Survey (DHS), whose estimation is designed for the national and provincial levels. The decentralization system makes U5MR important for sub-domains of provinces, such as the district/municipality level. Small area estimation (SAE) can be used for estimating U5MR at the district/municipality level using a mixed model. The model that is often used is the generalized linear mixed model (GLMM). Direct estimation of U5MR produces a large proportion of zero values (excess zeros), so the Poisson model is not suitable for modeling the data; the excess zeros violate the equidispersion assumption of the Poisson model. In this study, the quasi Poisson model produces better predictions than direct estimation. In addition, estimating U5MR per municipality makes it possible to produce U5MR maps at the municipality level.

### Hybrid Model of Seasonal ARIMA-ANN to Forecast Tourist Arrivals through Minangkabau International Airport

- Authors:
- Mutia Yollanda, Dodi Devianto
- Abstract:
Forecasting the number of tourist arrivals is required for the future development of the tourism industry to improve economic growth. Tourist arrival data can be analyzed by building a model that helps to find out the number of tourist arrivals through Minangkabau International Airport in the next period. The linear model used is the Seasonal Autoregressive Integrated Moving Average (SARIMA), and a nonlinear model of the SARIMA residuals is then built using an Artificial Neural Network (ANN). In this research, the SARIMA model obtained is SARIMA(1,0,1)(1,1,0)12. However, the residuals of the SARIMA model do not fulfill the no-autocorrelation assumption, so a new SARIMA-ANN model is proposed. The residual model of SARIMA is built using an ANN architecture with a 2-2-2-1 network topology. The performance of the time series model of tourist arrivals, using data from January 2012 to March 2019, is measured using the Mean Absolute Percentage Error (MAPE). The MAPE value of 17.1770% indicates that the model performs well in forecasting the number of tourist arrivals through Minangkabau International Airport in the future.

### Using LDA for Innovation Topics of Technology: Quantum Dots Patent Analysis

- Authors:
- Nurmitra Sari Purba, Rani Nooraeni
- Abstract:
This study seeks to explore information about one area of nanotechnology, quantum dots (QDs), through analysis of patent information. QDs patent documents were obtained from the United States patent database, the USPTO, using web scraping. In total, 3914 patents from 1988 to 2016 were taken and archived for analysis. This paper discusses how to apply Latent Dirichlet Allocation (LDA), a topic model, in a trend analysis methodology that exploits patent information. After text preprocessing and transformation, the number of topics is decided using the log-likelihood value. The LDA model is then used to identify underlying topic structures based on latent relationships among the extracted technological words. We extracted words from 6 relevant topics and showed that these topics are highly meaningful in explaining the technology applications of QDs.

### Unordered Features Selection of Low Birth Weight Data in Indonesia using the LASSO and Fused LASSO Techniques

- Authors:
- Yenni Kurniawati, Khairil Anwar Notodiputro, Bagus Sartono
- Abstract:
This paper aims to analyze the Low Birth Weight (LBW) data of infants in Indonesia by using the LASSO and Fused LASSO techniques. Fused LASSO is usually used to select parameters for ordered features; in this case, the features are unordered. Therefore, this research adopts three techniques for ordering features, and all these Fused LASSO techniques and LASSO are compared. This paper utilizes data on 1,176 LBW infants collected from the 2017 Indonesian Demographic and Health Survey (IDHS). The results showed that LASSO has the sparsest solution based on 5-fold cross-validation. The features that contribute to LBW are the mother's occupation, the mother's age, antenatal care, multiple birth, and birth order. However, Fused LASSO 1 has the lowest AIC and BIC values compared to the other ordering techniques. Ordering features by the correlation between the features and the outcome is recommended as an alternative technique to sort unordered features.

### Bayesian LASSO Quantile Regression: An Application to the Modeling of Low Birth Weight

- Authors:
- Ferra Yanuar, Aidinil Zetra, Arrival Rince Putri, Yudiantri Asdi
- Abstract:
Modeling low birth weight using ordinary least squares is inappropriate and inefficient. Low birth weight data violates the normality assumption since the data is right-skewed, and the data usually contains outliers as well. Many researchers have used the quantile regression approach to model this case, but this method has a limitation: it needs a moderate to large sample size. This study aims to combine quantile regression with the Bayesian LASSO approach to model low birth weight. The Bayesian method can handle small sample sizes since it combines the information in the data (the likelihood function) with prior information about the parameters to be estimated (the prior distribution). This study demonstrates that Bayesian quantile regression and Bayesian LASSO (Least Absolute Shrinkage and Selection Operator) quantile regression can yield acceptable models of the low birth weight case based on goodness-of-fit indicators. Bayesian LASSO quantile regression produced better parameter estimates since it yielded shorter 95% Bayesian credible intervals than Bayesian quantile regression.
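Quantile regression minimizes the check (pinball) loss rather than squared error; its asymmetry is what lets the fit target a chosen quantile of a right-skewed outcome such as birth weight:

```python
def pinball_loss(actual, predicted, tau):
    """Average check loss at quantile tau: under-prediction is weighted
    by tau, over-prediction by (1 - tau)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        u = a - p
        total += tau * u if u >= 0 else (tau - 1) * u
    return total / len(actual)

under = pinball_loss([10], [8], 0.9)    # under-predicting at tau = 0.9
over = pinball_loss([10], [12], 0.9)    # over-predicting by the same amount
```

At τ = 0.9 the same 2-unit error costs 1.8 when under-predicting but only 0.2 when over-predicting, pushing the fitted curve toward the 90th percentile. The LASSO penalty and the Bayesian priors in the paper sit on top of this loss.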

### Implementing Extreme Gradient Boosting (XGBoost) Classifier to Improve Customer Churn Prediction

- Authors:
- Iqbal Hanif
- Abstract:
As a part of Customer Relationship Management (CRM), churn prediction is very important for identifying customers who are most likely to churn and who need to be retained with caring programs to prevent them from churning. Among machine learning algorithms, Extreme Gradient Boosting (XGBoost) is a recently popular algorithm in many machine learning challenges; as an ensemble method, it is expected to give better predictions on imbalanced-class data, a common characteristic of customer churn data. This research aims to test whether the XGBoost algorithm gives better predictions than the existing logistic regression algorithm. The research was conducted using a sample of customer data (both churned and retained customers) and their behaviors recorded for 6 months, from October 2017 to March 2018. There were four phases in this research: data preparation, feature selection, modelling, and evaluation. The results show that the XGBoost algorithm gives better predictions than the LogReg algorithm based on prediction accuracy, specificity, sensitivity, and the ROC curve. The XGBoost model also has a better capability to separate churned customers from non-churned customers than the LogReg model, according to the KS chart and Gains-Lift charts produced by each algorithm.
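The accuracy, sensitivity, and specificity used to compare the two models come straight from the confusion matrix; a minimal helper (the labels here are illustrative, with 1 meaning churned):

```python
def churn_metrics(actual, predicted):
    """Accuracy, sensitivity (recall on churners) and specificity
    from binary labels, where 1 = churned."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # churners correctly flagged
        "specificity": tn / (tn + fp),   # stayers correctly cleared
    }

m = churn_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

On imbalanced churn data, sensitivity and specificity matter more than raw accuracy, since a model that predicts "stays" for everyone can still score high accuracy.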

### Bayes Risk Post-Pruning in Decision Tree to Overcome Overfitting Problem on Customer Churn Classification

- Authors:
- Devina Christianti, Sarini Abdullah, Siti Nurrohmah
- Abstract:
Classification is the process of assigning a set of data to existing classes. The decision tree is claimed to be faster and to produce better accuracy than other classifiers. However, it has a drawback: the classifier is susceptible to overfitting. This problem can be avoided by post-pruning, which trims subtrees with little influence on the classification to improve the model's performance in predicting data. This paper proposes a post-pruning method based on Bayes risk, in which the risk estimate of each parent node is compared with that of its leaves. The method is applied to two customer churn classification datasets, from the Kaggle site and IBM Datasets, with three different training set sizes (60%, 70%, and 80%). The results show that Bayes risk post-pruning can improve decision tree performance, and that larger training sets are associated with higher accuracy, precision, and recall.
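The pruning rule can be sketched as a bottom-up comparison of risk estimates. The abstract does not give the exact Bayes-risk estimator, so a Laplace-smoothed misclassification risk stands in for it here, and the dict-based tree encoding is also an assumption.

```python
def node_risk(counts, prior=1.0):
    """Laplace-smoothed misclassification risk of a node: estimated
    probability that a sample does not belong to the majority class."""
    n, k = sum(counts), len(counts)
    return (n - max(counts) + prior * (k - 1)) / (n + prior * k)

def prune(tree):
    """Bottom-up post-pruning: collapse a parent into a leaf when its
    own risk estimate is not worse than the weighted risk of its children."""
    if "children" not in tree:
        return tree
    tree["children"] = [prune(c) for c in tree["children"]]
    n = sum(tree["counts"])
    subtree_risk = sum(sum(c["counts"]) / n * node_risk(c["counts"])
                       for c in tree["children"])
    if node_risk(tree["counts"]) <= subtree_risk:
        tree = {"counts": tree["counts"]}   # collapse to a leaf
    return tree

weak = {"counts": [8, 2],
        "children": [{"counts": [4, 1]}, {"counts": [4, 1]}]}   # barely helps
clean = {"counts": [5, 5],
         "children": [{"counts": [5, 0]}, {"counts": [0, 5]}]}  # pure split
```

The split that barely improves purity is collapsed into a leaf, while the split that separates the classes cleanly survives, which is how post-pruning combats overfitting.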

### Variable Selection in Analyzing Live Infant Birth in Indonesia Using Group LASSO and Group SCAD

- Authors:
- Ita Wulandari, Khairil Anwar Notodiputro, Bagus Sartono
- Abstract:
Regression analysis often requires a selection of explanatory variables X1, X2, ..., Xp so that coefficient shrinkage can occur, which facilitates the interpretation of the regression equation obtained. In this context, the explanatory variables often have a grouping structure, so the more relevant problem is how to choose groups rather than individual variables. Group LASSO and group SCAD are techniques for selecting groups of variables which, in much of the literature, appear to have advantages over LASSO. In this study, the percentage of live-born children in the provinces of Bali and East Nusa Tenggara and in other Indonesian provinces was analyzed and linked to the explanatory variables using the group LASSO and group SCAD methods. The available explanatory variables are grouped based on theory and the results of previous studies. The results show that the best model is from the group SCAD method, with the smallest AIC, BIC, and GCV values. The factors included in the model for Bali province are demographic factors, women's status and autonomy, and the economy. For East Nusa Tenggara province, the factors that enter the model are demographics and economics, while for Indonesia in general, the factors included in the model are demography, women's status and autonomy, and family planning.

### Implementation of Extreme Learning Machine for Rainfall Forecasting

- Authors:
- Laksmita Puspaningrum, Ayundyah Kesumawati
- Abstract:
Indonesia is one of the countries with a large number of rainy days in a year, so forecasting is important for devising strategies to overcome the problem of erratic rainfall. There are many methods for rainfall forecasting; a recent one is the Extreme Learning Machine (ELM), a new learning method for artificial neural networks. ELM is an easy-to-use and effective learning algorithm for single-hidden-layer feed-forward neural networks (SLFNs), with the advantages of fast learning speed and good generalization performance. This research uses rainfall data in Sleman to produce a one-year forecast. It found that the ELM method has a smaller error than the GARCH method for all six rainfall stations in the Sleman region.

### Simulation Study for Comparison of Maximum Likelihood and Bayesian Method in Spatial Autoregressive Models with Heteroskedasticity

- Authors:
- Fitri Ramadhini, Anik Djuraidah, Aji Hamim Wigena
- Abstract:
Generally, spatial regression considers only one of the spatial effects, namely spatial dependence or heteroskedasticity between areas. Spatial autoregressive (SAR) models take into account only the dependence on the response variable. Most SAR estimators are valid if there is no violation of the error assumptions. Estimating SAR parameters under heteroskedasticity using the maximum likelihood (ML) method gives biased and inconsistent estimators. An alternative is the Bayesian method, which handles heteroskedasticity by modeling the structure of the variance-covariance matrix. Simulated data is used to evaluate the Bayesian method for estimating the parameters of a SAR model with heteroskedasticity. The results indicate that the Bayesian method provides parameter estimates with relatively small bias that are consistent compared to the ML method.

### Efficiency of Several Complex Survey Design using EBLUP in Small Area Estimation

- Authors:
- Nadra Yudelsa Ratu, Ika Yuni Wulansari
- Abstract:
The dissemination of data from a survey is carried out by estimating parameters from the survey results. The implementation of surveys at BPS is now getting more complex, with direct estimation results presented for small areas. However, the sample size for direct estimation in a small area is relatively small, so the estimates are not reliable enough, not efficient, and have low precision. Therefore, other statistics are needed that can accommodate the dissemination of total household expenditure data in small areas. This study applies the Small Area Estimation (SAE) method, namely Empirical Best Linear Unbiased Prediction (EBLUP), involving complex survey designs. The sampling methods used in the complex survey designs are Simple Random Sampling Without Replacement (SRSWOR), One-Stage Cluster (SRSWOR), Two-Stage Cluster (SRSWOR-SRSWOR), and Two-Stage Cluster (PPSWR-SRSWOR, where PPSWR is Probability Proportional to Size sampling). The efficiency of the estimation results is evaluated based on the MSE and RRMSE values obtained for each complex survey design. According to the calculations, the largest MSE and RRMSE values were obtained from the Two-Stage Cluster (SRSWOR-SRSWOR) sampling method, while the smallest were obtained from the SRSWOR sampling method, which appears to have a distinct advantage over the other sampling methods.

### Construction of ANFIS Model Based on LM-Test for Forecasting of Chili Price Data in Semarang

- Authors:
- Tarno Tarno, Di Asih I Maruddani, Rita Rahmawati
- Abstract:
This research aims to construct an Adaptive Neuro-Fuzzy Inference System (ANFIS) model for forecasting time series data. The ANFIS model is constructed and applied to chili price data in Semarang, using daily data recorded from December 2018 to May 2019. Input selection in ANFIS is done using the Lagrange Multiplier (LM) test; lag-1 with 2 membership functions is selected as the optimal input. Prediction performance on the in-sample data is measured by the mean absolute percentage error (MAPE) and root mean squared error (RMSE), whose values are 2.9% and 939.8, respectively.
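
MAPE and RMSE, the two accuracy measures reported here and in several later abstracts, are simple to compute. A minimal sketch on hypothetical actual/predicted values (not the chili price data):

```python
# MAPE and RMSE on toy data; the series values are hypothetical.
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [100.0, 200.0, 400.0]
pred = [110.0, 190.0, 420.0]
print(mape(actual, pred), rmse(actual, pred))
```

Note the units: MAPE is scale-free (which is why 2.9% reads as good), while RMSE is in rupiah, so 939.8 has to be judged against the price level.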

### Random Forest Lag Distributed Regression for Forecasting on Palm Oil Production

- Authors:
- Aulia Rizki Firdawanti, I Made Sumertajaya, Bagus Sartono
- Abstract:
Palm oil is one of the most widely cultivated commodities, so research is needed to determine the determinants of palm oil production and to forecast it. The objective is to perform data modeling and forecasting of palm oil production using random forest distributed lag regression, which combines the distributed lag regression and random forest methods. The results show that the model's performance is a correlation of 0.9302, an RMSE of 20.379, an MAE of 14.143, and an R-squared of 0.829. The 5 most important variables were quantity of palm oil, land area, palm oil age, the 8th lag of wind velocity, and the 1st lag of temperature. The distribution of the forecasting results is not much different from the distributions of the testing data and the original data.
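
The "distributed lag" part of this model amounts to building lagged copies of each covariate as extra predictors before fitting the random forest. A minimal sketch with a hypothetical toy series (variable names are not from the paper):

```python
# Building distributed-lag predictor rows from a series; toy data.

def make_lags(series, max_lag):
    """Return rows [x_{t-1}, ..., x_{t-max_lag}] for each usable time t."""
    return [
        [series[t - k] for k in range(1, max_lag + 1)]
        for t in range(max_lag, len(series))
    ]

temps = [28.0, 29.0, 27.0, 30.0, 31.0]
print(make_lags(temps, 2))
# Each row holds lag-1 and lag-2 of the series; such rows would be fed to
# the random forest together with the contemporaneous covariates.
```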

### Estimating the Poverty level in the Coastal Areas of Mukomuko District Using Small Area Estimation: Empirical Best Linear Unbiased Prediction Method

- Authors:
- Etis Sunandi, Dian Agustina, Herlin Fransiska
- Abstract:
This research aims to estimate the poverty level in the coastal areas of Mukomuko District using small area estimation. One of the estimation methods in small area estimation is Empirical Best Linear Unbiased Prediction (EBLUP); using this method, a poverty estimator for the coastal area of Mukomuko District is obtained. The parameter estimator is evaluated by its MSE (Mean Squared Error), computed with the bootstrap resampling method. The results show that the MSE of the EBLUP estimator is smaller than the MSE of the direct estimator in each village, indicating that estimation with the EBLUP method can improve the parameter estimates.

### Nowcasting Indonesia's GDP Growth Using Dynamic Factor Model: Are Fiscal Data Useful?

- Authors:
- Ardiana Alifatussaadah, Anindya Diva Primariesty, Agus Mohamad Soleh, Andriansyah Andriansyah
- Abstract:
Since their introduction by Giannone et al., GDP nowcasting models have been used in many countries, including Indonesia. The variables selected usually cover housing and construction, income, manufacturing, labor, surveys, international trade, retail, and consumption. Interestingly, fiscal variables are excluded even though government expenditure is an integral part of the basic GDP identity. Employing the quarter-to-quarter real GDP growth nowcasting technique of Bok et al., this paper tests the usefulness of including fiscal variables, in addition to 61 non-fiscal variables, in nowcasting Indonesia's GDP. The results show that even though fiscal data have low correlation coefficients with GDP, their inclusion may help produce a better early estimate of GDP growth, as indicated by a better RMSEP value.

### Analysis of Bayesian Generalized Linear Models on the Number of Tuberculosis Patients in Indonesia with R

- Authors:
- Femmy Diwidian, Anang Kurnia, Kusman Sadik
- Abstract:
Generalized Linear Models (GLM) extend the linear regression model to determine causal relationships, i.e. the effect of independent variables on a dependent variable, where the response variable is a member of the exponential family. In general, GLM parameters can be estimated with two approaches: the frequentist method and the Bayesian GLM method. In this study, both approaches are used to analyze the number of people suffering from tuberculosis in the 34 provinces of Indonesia. The data are based on the 2018 Indonesia Health Profile Data and Information published by the Ministry of Health of the Republic of Indonesia in 2018. Based on the model selection criteria, this study finds that the frequentist approach to GLM fits the number of tuberculosis sufferers in Indonesia better than Bayesian GLM.

### Bayesian Quantile Regression Modeling to Estimate Extreme Rainfall in Indramayu

- Authors:
- Eko Primadi Hendri, Aji Hamim Wigena, Anik Djuraidah
- Abstract:
Quantile regression can be used to analyze symmetric or asymmetric data. Estimates of the quantile regression parameters are obtained with the simplex method. Another approach is the Bayesian method, based on the asymmetric Laplace distribution and using MCMC, which numerically estimates the parameters from each posterior distribution. Both Bayesian quantile regression and classical quantile regression can be used for statistical downscaling in extreme rainfall cases. This study used statistical downscaling to relate global-scale data to local-scale data; the data were monthly rainfall in Indramayu and GCM output data. LASSO regularization was used to overcome multicollinearity in the GCM output data. The purpose of this study was to compare Bayesian quantile regression models with quantile regression. Both could predict extreme rainfall accurately and consistently one year ahead, with the Bayesian quantile regression model performing relatively better than the quantile regression.
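
Both the simplex and the Bayesian (asymmetric Laplace) formulations minimize the same quantile "check" loss: minimizing it at level tau yields the tau-th conditional quantile. A minimal sketch with hypothetical values:

```python
# The quantile ("check") loss underlying quantile regression; toy values.

def check_loss(residual, tau):
    """Asymmetric absolute loss: tau*r if r >= 0, else (tau - 1)*r."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

# For tau = 0.9, an upper quantile relevant to extreme rainfall,
# under-prediction (positive residual) is penalized 9x more heavily
# than over-prediction of the same size.
print(check_loss(1.0, 0.9), check_loss(-1.0, 0.9))
```

The asymmetric Laplace likelihood used in the Bayesian approach is simply exp(-check_loss) up to scale, which is why its posterior mode matches the classical fit.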

### Comparison of Maximum Likelihood and Generalized Method of Moments in Spatial Autoregressive Model with Heteroskedasticity

- Authors:
- Rohimatul Anwar, Anik Djuraidah, Aji Hamim Wigena
- Abstract:
Spatial dependence and spatial heteroskedasticity are problems in spatial regression. Spatial autoregressive (SAR) regression concerns only the dependence on the spatial lag. Estimating SAR parameters under heteroskedasticity with the maximum likelihood estimation (MLE) method gives biased and inconsistent estimates. An alternative is the generalized method of moments (GMM), which uses a combination of linear and quadratic moment functions simultaneously, making the computation easier than MLE. Bias is used to evaluate GMM in estimating the parameters of a SAR model with heteroskedastic disturbances on simulated data. The results show that GMM provides parameter estimates whose bias is relatively smaller and more consistent than those of the MLE method.
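
The simulation setting shared by the SAR abstracts can be sketched by generating data from y = rho·W·y + X·beta + e, i.e. y = (I − rho·W)⁻¹(X·beta + e), with area-specific error variances. The weight matrix and all parameter values below are hypothetical, not those of either study:

```python
# Generating one draw from a SAR model with heteroskedastic errors.
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Row-standardized contiguity weights for 4 areas on a line (hypothetical).
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
rho, beta = 0.4, 2.0
X = rng.normal(size=n)
# Heteroskedasticity: each area gets its own error standard deviation.
sigma = np.array([0.5, 1.0, 1.5, 2.0])
e = rng.normal(size=n) * sigma
# Reduced form: y = (I - rho W)^(-1) (X beta + e).
y = np.linalg.solve(np.eye(n) - rho * W, X * beta + e)
print(y.shape)
```

Repeating this draw many times and comparing estimator bias across methods is exactly the Monte Carlo design the abstract describes.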

### Confidence Interval for Multivariate Process Capability Indices in Statistical Inventory Control

- Authors:
- Mustafid Mustafid, Dwi Ispriyanti, Sugito Sugito, Diah Safitri
- Abstract:
Multivariate process capability indices (MPCI) play an important role in the analysis of statistical inventory control determined by several correlated consumer demands as quality characteristics. Inventory control management also needs confidence intervals for MPCI to handle the uncertainty of consumer demand. This research aims to apply confidence intervals for MPCI in statistical inventory control. Case studies were conducted in the apparel industry, using several types of apparel as the quality characteristics. The upper and lower limits of the MPCI intervals are obtained from sample data under the assumptions of multivariate normality and process stability; process sample data in stable condition are obtained with a multivariate control chart designed with Hotelling's T2. The MPCI confidence interval can be used as an indicator for determining the number of products kept in inventory based on consumer demand.

### Evaluation of Proportional Odds and Continuation Ratio Models for Smoker in Indonesia

- Authors:
- Rini Warti, Anang Kurnia, Kusman Sadik
- Abstract:
A polytomous model is used for response data with more than two categories. Models that can be used for ordinal-scale responses include the Proportional Odds Model, the Continuation-Ratio Model, the Partial Proportional Odds Model, and the Adjacent-Category Model. The Proportional Odds model assumes "proportionality", or parallelism, of the cumulative logits. If the parallel-logits assumption is not fulfilled, the Adjacent-Category and Continuation-Ratio models are alternatives. The purpose of this study is to evaluate the Proportional Odds (PO) and Continuation-Ratio (CR) models for smokers in Indonesia. The data were taken from the 2017 Indonesian Demographic and Health Survey (IDHS), classifying smokers into ordinal categories (mild, moderate, and severe). The results show a violation of the assumptions of the PO model, so the CR model was the alternative to use. Gender is a factor with a significant influence on all response categories. Based on the goodness of fit, deviance, Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and McFadden R2 values, the CR model is better to use than the PO model.

### Kernel Regression on Reflectance of Lithium Niobate in various Concentrations of Ruthenium Oxide

- Authors:
- Anne Mudya Yolanda, Muhammad Nur Aidi, Indahwati Indahwati, Irzaman Irzaman
- Abstract:
Research on LiNbO3 doped with RuO2 has been widely developed. This study aimed to use kernel regression to measure the influence of wavelength on the percentage reflectance of LiNbO3 doped with RuO2 and to compare the estimates across all concentrations. A hundred candidate bandwidths were tried to find the optimum bandwidth. For percentage reflectance at wavelengths from about 450.2 to 900.9, the results show that the optimum bandwidth is 11.56. The kernel regression smooths and estimates at each data point based on the optimum bandwidth, so the fitted values closely follow the observed data. The kernel regression model produced adjusted R-squared values of 0.9629, 0.9590, 0.9871, and 0.9840 for LiNbO3 doped at concentrations of 0, 2, 4, and 6%, respectively. For the same wavelength, the percentage reflectance of material made of LiNbO3 doped at the various concentrations is higher than that of material made of LiNbO3 only.
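
The smoother tuned by the bandwidth search described here is, in its simplest form, a Nadaraya-Watson kernel regression: a kernel-weighted average of the responses around each point. A minimal sketch with a Gaussian kernel and hypothetical toy data (not the reflectance measurements):

```python
# Nadaraya-Watson kernel regression at a single point; toy data.
import math

def nw_estimate(x0, xs, ys, bandwidth):
    """Gaussian-kernel weighted average of ys around x0; the bandwidth
    controls how far neighboring points influence the fit."""
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [10.0, 12.0, 11.0, 15.0]
fit = nw_estimate(2.5, xs, ys, bandwidth=1.0)
print(fit)
```

A small bandwidth makes the curve follow the data closely (high adjusted R-squared, as reported), while a large one over-smooths; trying a grid of candidates, as the study does, trades these off.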

### Combining Result of PCR and PLSR Statistical Downscaling with Quadratic Optimization for Improving Estimation

- Authors:
- Praditya Puspitaninggar, Aji Hamim Wigena, Anwar Fitrianto
- Abstract:
Statistical downscaling (SD) with principal component regression (PCR) and partial least squares regression (PLSR) is used to estimate rainfall. The accuracy of these models is measured by the root mean squared error of prediction (RMSEP); the smaller the RMSEP, the closer the estimate is to the actual rainfall. The RMSEP values of the PCR and PLSR models tend to be large, so a method is needed to improve the accuracy of the estimates. Accuracy was improved by combining the two SD estimates and minimizing the combined RMSEP as an objective function, with or without constraints. The optimization with constraints produced combined RMSEPs larger than those of the two individual models on a few datasets, while the optimization without constraints produced combined RMSEPs smaller than those of the two models on all datasets. The combined rainfall estimate was better than the estimates forming it.
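
Without constraints, minimizing the squared error of a convex-style combination w·p1 + (1−w)·p2 has a closed-form optimum, which explains why the unconstrained combination can never do worse than either model on the training criterion. A minimal sketch with hypothetical values (not the PCR/PLSR outputs):

```python
# Optimal unconstrained weight for combining two prediction vectors.

def optimal_weight(y, p1, p2):
    """Minimize sum (y - (w*p1 + (1-w)*p2))^2 over w; closed form
    w = sum(r*d) / sum(d*d) with d = p1 - p2, r = y - p2."""
    d = [a - b for a, b in zip(p1, p2)]
    r = [a - b for a, b in zip(y, p2)]
    return sum(ri * di for ri, di in zip(r, d)) / sum(di * di for di in d)

y = [10.0, 20.0, 30.0]
p1 = [12.0, 19.0, 33.0]   # e.g. a PCR-like estimate (hypothetical)
p2 = [9.0, 22.0, 28.0]    # e.g. a PLSR-like estimate (hypothetical)
w = optimal_weight(y, p1, p2)
combined = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
print(w, combined)
```

Setting w = 0 or w = 1 recovers the individual models, so the minimized combined error is bounded above by both, matching the paper's unconstrained result.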

### Determining Factors That Affecting Financial Performance of Sharia Small Business Bank by Using Ordinal Logistic Regression with Bootstrap Estimation

- Authors:
- Muhammad Ridho, Dodi Devianto
- Abstract:
The aim of this study is to find the factors affecting the financial performance of sharia small business banks in Indonesia. Financial performance is the response variable, grouped into three categories: bad, medium, and good. The predictor variables consist of 11 numerical variables and 1 categorical variable. The numerical variables are financial capital, productive assets, credit, third-party funds, income, capital adequacy ratio, return on assets, return on equity, financing-to-deposit ratio, non-performing financing, and operational efficiency, while the categorical variable is the location of the bank (1 if in a rural area, 2 if in a business center, and 3 if in a business and urban center). Ordinal logistic regression with bootstrap estimation is performed to determine which predictor variables affect the response. The study found that two predictors, return on assets and return on equity, significantly affect the response variable, with a hit ratio of 96.41%, so the model obtained is able to determine the performance level of sharia small business banks in Indonesia.

### Forecasting of Revenue, Number of Plane Movements and Number of Passenger Movements at Sultan Iskandar Muda International Airport Using the VARIMA Method

- Authors:
- Asep Rusyana, Lia Rahmati, Nurhasanah Nurhasanah
- Abstract:
Forecasting has become a necessity in various fields. One of the companies that forecasts is PT. Angkasa Pura II - Sultan Iskandar Muda (SIM) International Airport. The amount of income, the number of aircraft movements, and the number of passenger movements are some of the interesting quantities to study at an airport. This kind of problem can be solved with the Vector Autoregressive Integrated Moving Average (VARIMA) method, which models and forecasts the variables jointly. The analysis shows that the three variables follow a trend, with income, aircraft movements, and passenger movements increasing over time. The best model obtained is VARIMA(1,1,0). The percentage forecast errors for income, aircraft movements, and passenger movements are 14.028%, 11.003%, and 13.330%, respectively.

### Finding Factors Caused Diabetes Mellitus Disease Using Link Functions

- Authors:
- Sudarno Sudarno, Tatik Widiharih, Moch. Abdul Mukid
- Abstract:
Diabetes mellitus is a disease caused by an abnormality of the pancreas, which reduces the supply of insulin in the blood. If a person has diabetes mellitus for a long time, the body develops complications involving other diseases or organs. The disease can be detected from the fasting blood sugar level: a person whose fasting blood sugar exceeds 126 mg/dL can be said to have diabetes mellitus. Many people have diabetes mellitus nowadays; in general, a person can develop the disease through parental genetics, eating and drinking habits, and bad habits. A link function relates the response variable to the predictor variables in multiple linear regression, where the predictors enter linearly in both the variables and the parameters. The link functions in this paper are the logit, normit (probit), and complementary log-log (cloglog) link functions. The response variable is diabetes mellitus, while the candidate causal factors (covariates) are age, sex, systolic blood pressure, diastolic blood pressure, the logarithm of the ratio of urinary albumin to creatinine, high-density lipoprotein cholesterol, the logarithm of insulin, and smoking. In this research, the significant factors were sex, the logarithm of the ratio of urinary albumin to creatinine, and the logarithm of insulin.
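
The three link functions compared in this paper each map a linear predictor eta to a probability. A minimal sketch of their inverse (response) functions, with no reference to the diabetes data:

```python
# Inverse link functions for binary responses: logit, probit (normit),
# and complementary log-log.
import math

def logit_inv(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def probit_inv(eta):
    # Standard normal CDF, written via the error function.
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def cloglog_inv(eta):
    return 1.0 - math.exp(-math.exp(eta))

for f in (logit_inv, probit_inv, cloglog_inv):
    print(f.__name__, round(f(0.0), 4))
# At eta = 0, logit and probit both give 0.5, while the asymmetric
# cloglog link gives 1 - exp(-1), about 0.6321.
```

The asymmetry of cloglog is the practical reason to compare it against logit and probit: it approaches 1 faster than it approaches 0.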

### Spatial Durbin Model for Identifying the Factors Affecting Diarrhea in East Java

- Authors:
- Dyra Fitri Kesuma Dewi, Asep Saefuddin, Utami Dyah Syafitri
- Abstract:
Diarrhea is an infectious disease that has been a major health problem in Indonesia. To reduce the number of diarrhea cases, the government must examine the factors causing diarrhea in order to make the right decisions. Regression analysis can be used for this purpose; however, when regression analysis is conducted on spatial data, the residuals are usually spatially autocorrelated. Methods that can handle this spatial effect include the spatial autoregressive (SAR), spatial error (SEM), and spatial Durbin (SDM) models. In this research, those models were used to model the number of diarrhea cases in East Java in 2017. SDM was the best model because it had the smallest AIC and the largest pseudo-R2. Two variables significantly affected diarrhea cases in East Java: the percentage of people with a clean and healthy lifestyle, and the spatial lag of population density.

### Bayesian Zero Inflated Negative Binomial Regression Model for The Parkinson Data

- Authors:
- Shafira Shafira, Sarini Abdullah, Dian Lestari
- Abstract:
Excess zeros can be handled by the Zero-Inflated Poisson (ZIP) model, but if overdispersion remains in the data the ZIP model is no longer suitable; replacing the Poisson distribution with the negative binomial distribution in the counting process provides an alternative. The Zero-Inflated Negative Binomial (ZINB) regression model is estimated with the Bayesian method using non-informative conjugate priors. Parameters are sampled from the posterior distribution by Markov Chain Monte Carlo (MCMC) simulation with 50,000 burn-in and 150,000 iterations. The model was then applied to Parkinson's disease data from the Parkinson's Progression Markers Initiative (PPMI) program. The MCMC results showed convergence of the parameters. The motoric inspection result was significant in explaining whether Parkinson's patients have to take drugs, while the non-motoric inspection result and body response were significant in explaining motoric complications in Parkinson's disease sufferers.

### Twitter as Source of Auxiliary Information in Small Area Estimation (A Case Study about Estimation Electability of The Candidate Pairs Of President and Vice-President Of The 2019 President Election)

- Authors:
- Fathi Abdul Muhyi, Anang Kurnia, Bagus Sartono
- Abstract:
When it comes to electability, people become excited and always wonder who has higher electability. During the campaign for the 2019 presidential election, politics became one of the most interesting topics, especially on social media, where people talked about each pair of Indonesian presidential candidates. These discussions can be positive or negative. It is therefore interesting to link electability with the topics people discuss on social media: information extracted from social media can serve as auxiliary information to help estimate electability in each province using small area estimation.

### Analysis of Time Series Data Using Maximal Overlap Discrete Wavelet Transform Autoregressive Moving Average

- Authors:
- Sella Nofriska Sudrimo, Kusman Sadik, I Made Sumertajaya
- Abstract:
The price of broiler chickens shows fluctuation, or certain wave patterns. This study aims to predict fluctuating, non-stationary broiler chicken price data using MODWT-ARMA and ARIMA models, and to assess the ability of MODWT-ARMA to increase prediction accuracy. The data are separated with the wavelet transform (MODWT) into two parts, a wavelet (detail) signal and a smooth signal; each signal is then modeled with an ARMA model, and finally all signals are recombined. The results show that the MODWT-ARMA model has a smaller RMSE and normalized error than ARIMA: 1175.97 and 0.68 for MODWT-ARMA versus 2365.85 and 2.77 for ARIMA. In conclusion, MODWT-ARMA handles broiler chicken price data in Bogor better than the ARIMA model and can improve the accuracy of the predictions.
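
The decompose-model-recombine idea can be illustrated with a one-level Haar transform of the MODWT family: it splits a series into a smooth and a detail signal whose sum reconstructs the original exactly. In the paper each part is then modeled by ARMA; the sketch below shows only the transform, on a hypothetical toy series with a circular boundary:

```python
# One-level Haar MODWT-style decomposition with perfect reconstruction.

def haar_modwt_level1(x):
    """Split x into smooth (local average) and detail (local difference)
    signals of the same length; index -1 wraps around (circular)."""
    n = len(x)
    smooth = [(x[t] + x[t - 1]) / 2.0 for t in range(n)]
    detail = [(x[t] - x[t - 1]) / 2.0 for t in range(n)]
    return smooth, detail

x = [4.0, 6.0, 10.0, 8.0]
smooth, detail = haar_modwt_level1(x)
recombined = [s + d for s, d in zip(smooth, detail)]
print(recombined)  # equals x: the two signals recombine exactly
```

Modeling the slowly varying smooth part and the noisy detail part separately, as MODWT-ARMA does, is what lets the combined forecast track fluctuating data better than a single ARIMA fit.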

### The Study of Robust Estimators on Panel Data Regression Model for Data Contaminated with Outliers

- Authors:
- Mia Amelia, Kusman Sadik, Bagus Sartono
- Abstract:
Outliers can make parameter estimators biased and deviate from the actual values. This research studies robust estimators for the panel data regression model, namely least trimmed squares (LTS) and within-group generalized M (WGM). The aim is to study these robust estimation methods for panel data regression parameters on simulated data with various kinds of outliers and outlier proportions. The study uses data simulated from a fixed-effects panel data regression; the overall simulation design contains 16 types of contamination. The results show that the within estimation method is not robust against outliers. Based on the absolute relative bias and RMSE, the WGM method produces estimators with small variability and high accuracy across the various types of outliers and levels of outlier contamination.

### Clusterwise Regression Model Development with Gamma Distribution

- Authors:
- Reski Syafruddin, Agus M. Soleh, Aji H. Wigena
- Abstract:
This paper presents a development of clusterwise regression for a data set with a gamma distribution. Clusterwise regression is a method that simultaneously finds an optimal assignment of the data to k clusters and the best regression model for each cluster. An analysis of a simulated data set is presented for illustration, with gamma and normal distributions used for the response under different parameter scenarios. The simulation proceeds by initializing the number of clusters, classifying observations randomly as an initial partition, moving each observation to the cluster giving the smallest residual, and re-estimating the regression model from the final partition. The simulation shows that clusterwise regression is able to form partitions according to the distribution of the data, and to form the best generalized linear model with a gamma distribution as well as the linear regression model.
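
The loop described in the abstract (random initial partition, fit per cluster, reassign by smallest residual, refit) can be sketched with ordinary least squares in place of the gamma GLM. Everything below is a hypothetical toy, not the paper's simulation design:

```python
# Clusterwise regression loop with one predictor and plain least squares.
import random

def fit_line(pts):
    """OLS for one predictor: returns (intercept, slope)."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    b = sum((p[0] - mx) * (p[1] - my) for p in pts) / sxx if sxx else 0.0
    return my - b * mx, b

def clusterwise(points, k, iters=20, seed=1):
    """Random initial partition; repeatedly fit one line per cluster and
    move each point to the cluster whose line leaves the smallest residual."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    for _ in range(iters):
        lines = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            lines.append(fit_line(members if members else points))
        labels = [min(range(k),
                      key=lambda c: (y - lines[c][0] - lines[c][1] * x) ** 2)
                  for x, y in points]
    return labels

# Two noiseless regimes, y = x and y = -x; the loop usually separates them.
pts = [(float(x), float(x)) for x in range(1, 6)] + \
      [(float(x), float(-x)) for x in range(1, 6)]
print(clusterwise(pts, k=2))
```

Swapping `fit_line` for a gamma GLM fit gives the generalized variant the paper develops.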

### Spectral Analysis of Rainfall Anomaly in Tanjung Priok

- Authors:
- Tyas T Pujiastuti, Nurjaman Nurjaman, Sugeng Indarto
- Abstract:
Rainfall is the most important weather element and plays a critical role in human life. Located in the tropics, Indonesia has great rainfall variability influenced by various factors, including global-, regional-, and local-scale phenomena. To achieve better rainfall prediction accuracy, we have to understand the factors that contribute to the rainfall formation process. In this study, we conducted a spectral analysis of rainfall in Tanjung Priok; as the busiest port in the country, the Tanjung Priok region demands high-precision weather forecasts. Spectral analysis was used to recognize the rainfall pattern at the location. The results show that rainfall in Tanjung Priok follows the annual cycle, while the rainfall anomalies show six periods of 19, 31, 3, 42, 114, and 228 dasarians. These periods are indicative of the ITCZ, the Madden-Julian Oscillation, the dipole mode, and El Nino / La Nina.
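
The periods quoted above come from a periodogram: the squared magnitude of the discrete Fourier transform, whose peaks mark dominant cycles. A minimal sketch on a hypothetical pure sine series (not the Tanjung Priok data):

```python
# Periodogram via a direct DFT; the peak frequency gives the dominant period.
import cmath
import math

def periodogram(x):
    """Power at frequencies k/n for k = 1 .. n//2 (skipping the mean)."""
    n = len(x)
    power = []
    for k in range(1, n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coef) ** 2 / n)
    return power

n = 36
x = [math.sin(2 * math.pi * t / 12) for t in range(n)]  # period-12 cycle
power = periodogram(x)
k_star = max(range(len(power)), key=lambda k: power[k]) + 1
print(n / k_star)  # dominant period: 12.0
```

Applied to a dasarian anomaly series, the same peak-picking yields period estimates like the 19- or 228-dasarian cycles the study reports.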

### Comparison of MCEM and Bayesian Correction Methods of Spatially Lagged Covariates Measured with Error : Evidence from Monte Carlo Simulation

- Authors:
- Mohammad Masjkur, Henk Folmer, Asep Saefuddin
- Abstract:
Measurement errors in (spatially lagged) explanatory variables under the classical errors-in-variables assumption are not routinely accounted for in applied (spatial) research, in spite of their serious consequences: the estimators of the coefficients of the variables measured with error, but also of those measured without error, are biased and inconsistent. The purpose of this paper is to analyze and compare, by Monte Carlo simulation, two bias correction methods: Monte Carlo Expectation-Maximization (MCEM) and a Bayesian approach (BA). We consider a spatial lag model (SLX) with different spatial correlations of the covariate of interest, different measurement error variances, and different sample sizes, using relative bias (RelBias) and root mean squared error (RMSE) as evaluation criteria. The main result is that both the Bayesian approach and the MCEM method outperform the naive model without measurement error correction; moreover, the Bayesian approach performs better than the MCEM method.

### Two-Stage Statistical Downscaling Modeling with Multi-Class Random Forest on Rainfall Prediction

- Authors:
- Riana Hadiana, Agus Mohamad Soleh, Bagus Sartono
- Abstract:
Statistical downscaling (SD) modeling using General Circulation Model (GCM) output has been widely used to predict rainfall. A previous study found that SD modeling that predicts rainfall with rainfall grouping (two stages) gives a smaller root mean squared error of prediction (RMSEP) than SD modeling without grouping (one stage). In this study, daily and monthly rainfall were divided into three groups based on intensity (volume), and two-stage SD modeling was applied to predict rainfall. The first stage classified the rainfall groups using a random forest. The second stage predicted rainfall using partial least squares regression (PLSR). The classification accuracy obtained by the random forest for daily and monthly rainfall lay between 62% and 84%. The RMSEP obtained from two-stage SD modeling for daily rainfall was similar to that of one-stage SD modeling, with a coefficient of variation (CV) above 100%. The results differed when two-stage SD modeling was applied to monthly data: the RMSEP obtained was better than that of one-stage SD modeling, with a CV between 30% and 50%.

### Super Learner for Predicting Stock Market Trends: A Case Study of Jakarta Islamic Index Stock Exchange

- Authors:
- Gerry alfa Dito, Bagus Sartono, Annisa Annisa
- Abstract:
Predicting stock market trends has been one of the challenging tasks over the years. Trends have diverse influencing factors, which make them very dynamic and highly volatile. Forecasting models, the prevalent method for predicting stock market trends, have several difficulties with these characteristics: although forecasting models are efficient, they can have high forecasting error. Formulating the forecasting problem as a classification problem is an alternative approach to predicting stock market trends. Several studies have shown that machine learning is a suitable method for predicting stock market trends as a classification problem. This paper discusses the application of a powerful machine learning method, the Super Learner, to predict stock market trends. In addition, this research employs several technical indicators as predictor variables. Results show that the Super Learner model is useful for predicting both short-term and long-term trends.
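
The Super Learner is a cross-validated stacking ensemble, for which scikit-learn's `StackingClassifier` is a close stand-in. The sketch below classifies next-step up/down moves of a synthetic price series from two toy technical indicators (momentum and a moving-average gap); the base learners, indicators, and data are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
price = 100 + np.cumsum(rng.normal(0.1, 1.0, 800))   # synthetic index level

def indicators(p, w=10):
    """Two toy technical indicators: w-step momentum and moving-average gap."""
    mom = p[w:] - p[:-w]
    ma = np.convolve(p, np.ones(w) / w, mode="valid")[: len(mom)]
    return np.column_stack([mom, p[w:] - ma])

X = indicators(price)[:-1]                  # features use only past prices
y = (np.diff(price[10:]) > 0).astype(int)   # next-step trend: up (1) / down (0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("tree", DecisionTreeClassifier(max_depth=3, random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5)                                   # out-of-fold base predictions
stack.fit(X[:600], y[:600])
print(stack.score(X[600:], y[600:]))        # hold-out trend accuracy
```

The `cv=5` argument is what makes this a Super Learner rather than plain stacking: the meta-learner is trained on out-of-fold predictions of the base learners, which guards against overfitting to their in-sample behavior.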

### Daily Rainfall Prediction using Two-Stage Modeling with Boosting Classification on Statistical Downscaling

- Authors:
- Agung Satrio Wicaksono, Hari Wijayanto, Agus Mohamad Soleh
- Abstract:
Statistical downscaling (SD) techniques can be used to predict local rainfall from General Circulation Model (GCM) output data as large-scale global data. Previous research concluded that two-stage SD modeling with classification on monthly rainfall data can reduce the errors of one-stage modeling with partial least squares regression (PLSR). In this study, two-stage SD modeling with classification is used to predict daily rainfall. First, the robust Boosting method was used for classification, to determine the occurrence of rainfall on a given day. Second, the PLSR method was used to predict the amount of rainfall on the rainy days identified by the Boosting method. The capability of the model was tested at four stations, all located in West Java Province. Results obtained from 5-fold cross-validation with 2 repeats clearly show that the RMSEP value decreases as the classification accuracy increases.
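
The daily two-stage pipeline can be sketched as follows, with scikit-learn's `GradientBoostingClassifier` standing in for the paper's Boosting stage and a plain linear model standing in for PLSR; the synthetic predictors and rain-generating mechanism are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n, p = 800, 8
X = rng.normal(size=(n, p))                          # stand-in GCM predictors
wet = X[:, 0] + rng.normal(0, 0.5, n) > 0            # true rain occurrence
amount = np.where(wet, np.maximum(5 + 3 * X[:, 1] + rng.normal(0, 1, n), 0), 0.0)

# Stage 1: does it rain today?  Stage 2: how much, fitted on rainy days only.
clf = GradientBoostingClassifier(random_state=0).fit(X, wet)
reg = LinearRegression().fit(X[wet], amount[wet])

pred = np.where(clf.predict(X), reg.predict(X).clip(min=0), 0.0)
rmsep = np.sqrt(np.mean((pred - amount) ** 2))
print(rmsep)
```

This structure makes the paper's finding intuitive: every dry day misclassified as wet (or vice versa) contributes its full rainfall amount to the error, so RMSEP falls directly as stage-1 accuracy rises.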

### An Empirical Study in Forecasting Bitcoin Price Using Bayesian Regularization Neural Network

- Authors:
- Rina Sriwiji, Arum Handini Primandari
- Abstract:
In recent years, Bitcoin has attracted a lot of attention because of its nature, combining encryption technology and a monetary unit. For traders, Bitcoin has become a promising investment since its fluctuating price can potentially yield high profit (the higher the risk, the higher the return). Unlike conventional stocks, Bitcoin trades 24 hours a day without a closing period, which escalates the risk. Predicting the value of Bitcoin is expected to minimize this risk by taking into account information such as blockchain information, macroeconomic factors, and global currency ratios. However, multicollinearity among these independent variables means that ordinary regression cannot be used. This research employs a Bayesian Regularization Neural Network (BRNN), which is free of such assumptions. This method is a single-hidden-layer feed-forward neural network (SLNN) that uses Bayesian concepts to optimize weights, biases, and connection strengths. The data are a time series from January 23, 2017, to January 23, 2019. Regression with subset selection was employed to reduce the number of independent variables from 25 to 14. As a result, the predicted values are not much different from the actual data, with an accuracy of 91.1% based on the MAPE value.
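
A full BRNN re-estimates the regularization strength from the Bayesian evidence during training; in the simplified pure-NumPy stand-in below, the Gaussian prior on the weights reduces to a fixed L2 penalty `alpha`, and a single-hidden-layer network is trained by gradient descent on a toy 1-D regression task. The MAPE-based accuracy at the end mirrors the paper's evaluation; the architecture size, data, and hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(-2, 2, (200, 1))
y = 3.0 + np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)   # toy regression target

h, alpha, lr = 16, 1e-3, 0.1          # hidden units, L2 weight, learning rate
W1 = rng.normal(0, 0.5, (1, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.5, h);      b2 = 0.0

for _ in range(3000):                 # full-batch gradient descent
    H = np.tanh(X @ W1 + b1)          # single hidden layer (SLNN)
    err = H @ W2 + b2 - y
    dH = np.outer(err, W2) * (1 - H ** 2)              # back-prop through tanh
    W2 -= lr * (H.T @ err / len(y) + alpha * W2)       # alpha*W = Gaussian prior
    b2 -= lr * err.mean()
    W1 -= lr * (X.T @ dH / len(y) + alpha * W1)
    b1 -= lr * dH.mean(axis=0)

pred = np.tanh(X @ W1 + b1) @ W2 + b2
mape = 100 * np.mean(np.abs((pred - y) / y))
print(100 - mape)                     # the paper's accuracy = 100 - MAPE
```

The weight-decay term `alpha * W` is the mode of a zero-mean Gaussian prior on the weights; the genuine Bayesian regularization of MacKay adapts `alpha` (and the noise precision) from the data rather than fixing them.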

### Modeling of Quality of Education in Junior High School using Multivariate Adaptive Regression Splines (MARS) Method

- Authors:
- Urwawuska Ladini, Budi Susetyo, Indahwati Indahwati
- Abstract:
The accreditation of the education system in Indonesia was established based on national education standards, while national examinations are conducted to measure students' academic achievement. The relationship between accreditation and national examinations is still under debate, considering that both are important in measuring education quality. Multivariate adaptive regression splines (MARS) solve the regression problem of predicting a continuous response variable from several independent variables through a set of basis functions whose coefficients are fitted to the data. This research modeled national examination results from national education standard scores at the junior high school level, in a way that can accommodate interactions between the independent variables. The facilities and infrastructure standard is the most important variable, with a 100% influence on the goodness of the model, followed by the graduate competency standard with an importance of 45.34%. The process and management standards did not significantly influence national examinations.
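
MARS builds its fit from paired hinge (truncated linear) basis functions max(0, x − t) and max(0, t − x). A full MARS run searches knot locations forward and prunes backward; the sketch below skips the search and simply least-squares-fits one fixed knot to synthetic piecewise-linear data, to show how the basis captures a kink. The data and knot location are illustrative assumptions.

```python
import numpy as np

def hinge_basis(x, knots):
    """Intercept plus the MARS hinge pair max(0, x-t), max(0, t-x) per knot."""
    cols = [np.ones_like(x)]
    for t in knots:
        cols.append(np.maximum(0.0, x - t))
        cols.append(np.maximum(0.0, t - x))
    return np.column_stack(cols)

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 300)
y = np.where(x < 4, 2.0 * x, 8.0 + 0.5 * (x - 4)) + rng.normal(0, 0.2, 300)

B = hinge_basis(x, knots=[4.0])                  # knot placed at the true kink
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
rmse = np.sqrt(np.mean((B @ coef - y) ** 2))
print(rmse)                                      # ~0.2, the noise level
```

Products of hinges from different variables are what give MARS the variable-interaction terms the abstract mentions.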

### Comparing Decision Tree, Random Forest and Boosting in Identifying Weather Index for Rice Yield Prediction

- Authors:
- Mohammad Masjkur, Ken Seng Tan
- Abstract:
Modeling the relationship between weather indices and yield losses is a basis for developing weather-index-based crop insurance. Data mining approaches may overcome some limitations of traditional regression approaches in identifying a weather index for predicting crop yield. The purpose of this study is to evaluate the performance of decision trees, random forests, and boosting in identifying the most important weather index for rice crop yield prediction. The study uses district-level rice yield data from 8 locations in the Java region over the period 1991-2014. The corresponding weather data consist of 48 weather variables, including the timescale Standardized Precipitation Index (SPI), Growing Degree Days (GDD), and Vapor Pressure Deficit (VPD) for each growing season. Results show that the boosted regression tree is the best model for rice yield prediction, compared to the regression tree and the random forest. The most important weather indices are Growing Degree Days in growing season I (GDD I) and in growing season III (GDD III). Threshold values of GDD I > 2100 °C and GDD III > 2150 °C would trigger rice yield losses.
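
The GDD index named above is standard: it accumulates daily mean temperature above a base temperature over the season. The sketch below assumes a 10 °C base (common for rice, but an assumption here); the paper's thresholds, e.g. GDD I > 2100 °C, are season-long sums of this quantity.

```python
def growing_degree_days(tmax, tmin, t_base=10.0):
    """Season total of max((Tmax + Tmin)/2 - Tbase, 0), in degree-days."""
    return sum(max((hi + lo) / 2.0 - t_base, 0.0) for hi, lo in zip(tmax, tmin))

# One toy week of constant 32/22 C days: (27 - 10) * 7 = 119.0 degree-days.
print(growing_degree_days([32] * 7, [22] * 7))   # 119.0
```

A 120-day growing season at that daily rate would accumulate 2040 degree-days, which is the order of magnitude of the paper's thresholds.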