Journal of Statistics and Computer Science

Frequency: Bi-Annual

ISSN: 2583-5068

Peer Reviewed Journal

Table of Contents: Journal of Statistics and Computer Science, Vol. 1, Issue 1, Year 2022

On Application of the Cramér-von Mises Distance for Equivalence Testing

By: Vladimir Ostrovski
Journal of Statistics and Computer Science, Year: 2022, Vol. 1 (1), pp. 1-9
Received: 10 April 2022 | Revised: 07 June 2022 | Accepted: 09 June 2022 | Publication: 15 July 2022
DOI: 10.47509/JSCS.2022.v01i01.01

We consider an equivalence test for a fully specified continuous distribution on ℝ. Equivalence testing is a powerful tool for showing that observed data are sufficiently close to a given distribution. The test under consideration is based on the time-proven Cramér-von Mises distance. We show that the test is locally asymptotically most powerful. A consistent estimator of the asymptotic variance of the test statistic is provided, and the bootstrap percentile-t method is applied to improve the finite-sample performance of the equivalence test. A detailed algorithm for both the asymptotic and the percentile-t tests is given. An extensive simulation study of the finite-sample properties is performed, and a practical approach for finding efficient values of the tolerance parameter is provided.
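
As a rough illustration of the quantities involved, the Python sketch below computes the Cramér-von Mises distance, the integral of (F_n − F)² dF, between the empirical distribution function F_n of a sample and a fully specified continuous CDF F, using the standard closed form 1/(12n²) + (1/n) Σ ((2i − 1)/(2n) − F(x_(i)))² over the order statistics. The decision rule is an assumption made for illustration only: it declares equivalence when the estimated distance plus z_(1−α) times a plain nonparametric bootstrap standard error falls below the tolerance `eps`. This is not the article's consistent variance estimator or its percentile-t procedure, and the function names and defaults are hypothetical.

```python
import random
from statistics import NormalDist, stdev


def cvm_distance(sample, cdf):
    """Cramér-von Mises distance, the integral of (F_n - F)^2 dF, between the
    empirical CDF F_n of `sample` and a fully specified continuous CDF `cdf`,
    computed with the closed form over the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    total = sum(((2 * i - 1) / (2 * n) - cdf(x)) ** 2
                for i, x in enumerate(xs, start=1))
    return 1.0 / (12 * n * n) + total / n


def cvm_equivalence_test(sample, cdf, eps, alpha=0.05, n_boot=400, seed=0):
    """Illustrative test of H0: distance >= eps against H1: distance < eps.

    The standard error is a plain nonparametric bootstrap estimate (an
    assumption of this sketch), not the variance estimator from the article.
    Returns (equivalence_shown, distance, standard_error).
    """
    rng = random.Random(seed)
    n = len(sample)
    d = cvm_distance(sample, cdf)
    # Resample the data with replacement and recompute the distance each time.
    boot = [cvm_distance([rng.choice(sample) for _ in range(n)], cdf)
            for _ in range(n_boot)]
    se = stdev(boot)
    z = NormalDist().inv_cdf(1.0 - alpha)
    # Equivalence is declared when the one-sided upper bound d + z * se < eps.
    return d + z * se < eps, d, se
```

For example, `cvm_equivalence_test(data, NormalDist().cdf, eps=0.02)` asks whether `data` is close to a standard normal distribution. Here `eps` is measured on the scale of the distance itself; the article's own parametrisation of the tolerance may differ, and choosing it well is precisely the question its final sentence addresses.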

MSC 2020: 62G10

Keywords: Testing equivalence, Cramér-von Mises distance, equivalence test, neighborhood-of-model validation

Vladimir Ostrovski (2022). On Application of the Cramér-von Mises Distance for Equivalence Testing. Journal of Statistics and Computer Science. Vol. 1, No. 1, pp. 1-9. DOI: 10.47509/JSCS.2022.v01i01.01


Determining Brand Attributes for a Consumer Packaged Goods (CPG) Brand from Imbalanced Binary Data

By: Chiranjit Dutta, Balaji Raman, Venu Gorti and Kamal Sen
Journal of Statistics and Computer Science, Year: 2022, Vol. 1 (1), pp. 11-21
Received: 13 March 2022 | Revised: 25 May 2022 | Accepted: 30 May 2022 | Publication: 15 July 2022
DOI: 10.47509/JSCS.2022.v01i01.02

Classifying imbalanced binary data sets is challenging because one class dominates, typically accounting for more than 90% of the data. Examples of imbalanced data sets include fraudulent credit card activity, conversion rates of online advertisements and clinical diagnostic tests for rare diseases. Standard classification techniques such as logistic regression assume that the classes in the underlying data set are evenly distributed, and applying them to imbalanced data results in classifiers with poor prediction accuracy for the minority class. Machine learning methods such as SMOTE (Synthetic Minority Over-Sampling Technique) address this issue by oversampling the minority class, i.e., they create synthetic samples of the minority class instead of sampling with replacement. However, there are no standard rules for determining the size of the oversampled minority class. Moreover, it is difficult to draw causal inferences from a machine learning approach. In this paper, we discuss applications of novel statistical methodologies, such as generalized linear models with GEV links and power links, and latent factor models, to respondent-level survey data. The objective of the study is to identify demand-enhancing attributes for a CPG (Consumer Packaged Goods) brand in an emerging market.
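
To make the oversampling idea concrete, the following Python sketch implements the basic SMOTE interpolation step: pick a minority-class point, pick one of its k nearest minority-class neighbours, and create a synthetic point at a uniformly random position on the segment between them. It illustrates only the baseline mentioned above, not the GEV-link, power-link or latent factor methodology the article proposes; the function name and defaults (k = 5, Euclidean distance) are assumptions. The argument `n_synthetic` is exactly the quantity for which, as noted above, no standard rule exists.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate `n_synthetic` synthetic minority-class points by interpolating
    between a minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class points")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the *other* minority points (Euclidean).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position on the segment x -> nn
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic
```

Because every synthetic point lies on a segment between two existing minority points, SMOTE fills in the region the minority class already occupies rather than duplicating observations, which is the contrast with sampling with replacement drawn above.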

Keywords: Imbalanced binary response, GEV, power and reverse power link, SMOTE, brand imagery.

Chiranjit Dutta, Balaji Raman, Venu Gorti & Kamal Sen (2022). Determining Brand Attributes for a Consumer Packaged Goods (CPG) Brand from Imbalanced Binary Data. Journal of Statistics and Computer Science. Vol. 1, No. 1, pp. 33-43. DOI: 10.47509/JSCS.2022.v01i01.02


Two Runs Rule based Weighted Alternated Charting Statistic Control Charts to Monitor the Mean Vector of a Bivariate Process

By: Gadre M. P.
Journal of Statistics and Computer Science, Year: 2022, Vol. 1 (1), pp. 23-43
Received: 11 April 2022 | Revised: 21 June 2022 | Accepted: 24 June 2022 | Publication: 15 July 2022
DOI: 10.47509/JSCS.2022.v01i01.03

In conventional multivariate control charts, the charting statistic is a function of all quality characteristics. In the ‘Alternated Charting Statistic’ (ACS) chart, the charting statistic alternates so that only one of the two (or three) quality characteristics is inspected (measured) per sample. For bivariate processes, the ‘Weighted Alternated Charting Statistic’ (WACS) chart uses the weights of the two quality characteristics to decide which of them is inspected (measured) per sample, and it performs better than the ACS chart. In this article, two run-length-based control charts for bivariate processes are proposed: the ‘WACS Synthetic’ (WACS-Syn) chart and the ‘WACS Group Runs’ (WACS-GR) chart. When the two quality characteristics are uncorrelated or have small to moderate correlation, numerical results show that the proposed control charts perform better than the Hotelling T² chart, the ACS chart and the WACS chart. Furthermore, the WACS-GR chart performs significantly better than the WACS-Syn chart.
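
For reference, the Hotelling T² chart used as a benchmark above plots, for each sample of size n, the statistic T² = n (x̄ − μ₀)ᵀ Σ⁻¹ (x̄ − μ₀). The Python sketch below computes it for a bivariate process under the simplifying assumptions that the in-control mean μ₀ and covariance matrix Σ are known and that observations are bivariate normal; in that case T² follows a χ² distribution with 2 degrees of freedom while the process is in control, so the upper control limit for false-alarm probability α is −2 ln α. Only this benchmark chart is illustrated; the ACS, WACS, WACS-Syn and WACS-GR procedures are not reproduced, and the function names are hypothetical.

```python
import math


def hotelling_t2(sample, mu0, cov):
    """T^2 = n (xbar - mu0)' cov^{-1} (xbar - mu0) for a bivariate sample,
    with known in-control mean `mu0` and 2x2 covariance matrix `cov`."""
    n = len(sample)
    xbar = [sum(p[k] for p in sample) / n for k in range(2)]
    d0, d1 = xbar[0] - mu0[0], xbar[1] - mu0[1]
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form using the explicit inverse of a 2x2 matrix:
    # inv = [[s22, -s12], [-s21, s11]] / det
    q = (s22 * d0 * d0 - (s12 + s21) * d0 * d1 + s11 * d1 * d1) / det
    return n * q


def t2_upper_control_limit(alpha):
    """Upper alpha-quantile of chi^2 with 2 df: P(chi^2_2 > x) = exp(-x / 2)."""
    return -2.0 * math.log(alpha)
```

A sample signals when `hotelling_t2(sample, mu0, cov) > t2_upper_control_limit(alpha)`; with independent samples, α = 0.005 gives an in-control average run length of 1/α = 200 samples.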

MSC 2020 subject classification: 62P30

Keywords: Alternated charting statistic, Weighted Alternated charting statistic, WACS chart, ACS-Syn chart, ACS-GR chart.

Gadre M.P. (2022). Two Runs Rule based Weighted Alternated Charting Statistic Control Charts to Monitor the Mean Vector of a Bivariate Process. Journal of Statistics and Computer Science. Vol. 1, No. 1, pp. 57-77. DOI: 10.47509/JSCS.2022.v01i01.03


Applications of the Polynomial Pair f(s) and  f(S/ – ) to Control, Probability, and Matrix Analysis

By: David Hertz
Journal of Statistics and Computer Science, Year: 2022, Vol. 1 (1), pp. 45-53
Received: 07 March 2022 | Revised: 24 May 2022 | Accepted: 30 May 2022 | Publication: 15 July 2022
DOI: 10.47509/JSCS.2022.v01i01.04

David Hertz (2022). Applications of the Polynomial Pair f(s) and f(S/????–????) to Control, Probability, and Matrix Analysis. Journal of Statistics and Computer Science. Vol. 1, No. 1, pp. 79-87. DOI: 10.47509/JSCS.2022.v01i01.04


Cross-Validation for Supervised Learning with Tuning Parameters

By: Lyron Winderbaum and Inge Koch
Journal of Statistics and Computer Science, Year: 2022, Vol. 1 (1), pp. 55-76
Received: 25 April 2022 | Revised: 07 June 2022 | Accepted: 25 June 2022 | Publication: 15 July 2022
DOI: 10.47509/JSCS.2022.v01i01.05

Recent advances in machine learning and data science have led to widespread adoption of complex predictive modelling. Increasing awareness of the ‘reproducibility crisis’ has led to calls for improved transparency and accountability in scientific reporting. One important aspect of veridical data science is the robust estimation of prediction error. The availability of computational resources has made cross-validation (CV) a main tool for such estimation. We consider CV estimation in supervised learning for high-dimensional data, and focus on linear regression and discriminant analysis approaches based on variable selection with direct dimension reduction as well as lasso-type sparsity criteria. We highlight how the same description of a method could in fact apply to any one of several different cross-validation implementations. We outline key principles underpinning good cross-validation practice, describe several ‘pitfall’ implementations that subtly violate these principles in different ways, and present a more complex and computationally intensive implementation that does not. Using real data relating to endometrial cancer, we demonstrate the differences in the estimated error resulting from these different implementations, in the context of high-stakes decision making where accurate and robust estimation of prediction error is critical. We use simulated data to illustrate how these different implementations result in estimators of prediction error with very different properties and relationships to the true prediction error. We call for increased detail in method reporting, present principles for good practice in the implementation of cross-validation, and make recommendations to guide cross-validation implementation.
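
One classic pitfall of this kind can be illustrated with a small Python sketch. It assumes a two-class problem, a hypothetical filter that keeps the `m` features with the largest absolute difference in class means, and a nearest-centroid classifier; these choices are illustrative and are not the lasso-type or dimension-reduction methods studied in the article. The two estimates differ only in whether feature selection is repeated inside each training fold or run once on the full data before the folds are formed. In the latter case the held-out samples influence which features are chosen, which typically makes the error estimate optimistic. A full treatment of tuning parameters would, in the same spirit, also choose `m` inside each training fold (nested CV).

```python
import random
from statistics import fmean


def select_features(X, y, m):
    """Indices of the m features with the largest |mean(class 1) - mean(class 0)|."""
    def score(j):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        return abs(fmean(b) - fmean(a))
    return sorted(range(len(X[0])), key=score, reverse=True)[:m]


def nearest_centroid_errors(X_tr, y_tr, X_te, y_te, feats):
    """Number of misclassified test rows for a nearest-centroid rule on `feats`."""
    centroids = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_tr, y_tr) if l == lab]
        centroids[lab] = [fmean([r[j] for r in rows]) for j in feats]
    errors = 0
    for r, l in zip(X_te, y_te):
        dist = {lab: sum((r[j] - c) ** 2 for j, c in zip(feats, cen))
                for lab, cen in centroids.items()}
        errors += min(dist, key=dist.get) != l
    return errors


def cv_error(X, y, m, k=5, select_inside=True, seed=0):
    """k-fold CV error rate. select_inside=False reproduces the pitfall of
    selecting features once on all data and only then cross-validating."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    global_feats = None if select_inside else select_features(X, y, m)
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        # Correct practice: every data-driven step sees only the training fold.
        feats = select_features(X_tr, y_tr, m) if select_inside else global_feats
        errors += nearest_centroid_errors(X_tr, y_tr, [X[i] for i in fold],
                                          [y[i] for i in fold], feats)
    return errors / len(y)
```

On pure-noise data, for instance 40 samples with 500 independent standard normal features and arbitrary balanced labels, the true error rate of any classifier is 50%. The estimate with selection inside the folds should fall near that value, whereas the ‘select once’ estimate tends to be far lower.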

Keywords: Cross-Validation, Prediction, Proteomics, Reproducibility.

Lyron Winderbaum & Inge Koch (2022). Cross-Validation for Supervised Learning with Tuning Parameters. Journal of Statistics and Computer Science. Vol. 1, No. 1, pp. 89-106. DOI: 10.47509/JSCS.2022.v01i01.05


A Sequential Design for Estimating the Product of Two Non Simultaneously Zero Means

By: Zohra Benkamra, Mekki Terbeche and Mounir Tlemcani
Journal of Statistics and Computer Science, Year: 2022, Vol. 1 (1), pp. 77-98
Received: 12 April 2022 | Revised: 07 June 2022 | Accepted: 10 June 2022 | Publication: 15 July 2022
DOI: 10.47509/JSCS.2022.v01i01.06


Zohra Benkamra, Mekki Terbeche & Mounir Tlemcani (2022). A Sequential Design for Estimating the Product of Two Non Simultaneously Zero Means. Journal of Statistics and Computer Science. Vol. 1, No. 1, pp. 11-32. DOI: 10.47509/JSCS.2022.v01i01.06


Impact Estimation on COVID-19 Infections following School Reopening in September 2020 in Italy

By: Livio Fenga and Massimo Galli
Journal of Statistics and Computer Science, Year: 2022, Vol. 1 (1), pp. 99-110
Received: 19 April 2022 | Revised: 17 June 2022 | Accepted: 30 June 2022 | Publication: 15 July 2022
DOI: 10.47509/JSCS.2022.v01i01.07

Background: Since its outbreak, CoViD-19 (formerly known as 2019-nCoV) has raised many questions among public authorities, social organisations and school officials as to when students should be allowed to return to school. Such a decision is critical and must weigh its beneficial effects against the risks associated with students' increased exposure to the virus, which, as a result, might spread at a faster rate. To date, few studies in Italy have rigorously investigated the correlation between school reopening and the number of people testing positive for CoViD-19. This paper therefore aims to assess this impact and to illustrate the methodology used.

Methods: Official daily data on the cumulative number of people who tested positive for CoViD-19, together with external information on the different dates on which schools reopened in the various Italian regions, were used to build a stochastic model of the Seasonal Autoregressive Moving Average type that incorporates this external information.
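
The role of the external information can be sketched in a deliberately simplified form. The Python code below builds a hypothetical step regressor that switches from 0 to 1 a chosen number of days after a region's reopening date, removes a weekly pattern by seasonal differencing at lag 7, and estimates the size of the level shift by least squares on a generic daily series `y`. This amounts to a regression with SARIMA(0,0,0)(0,1,0)₇ errors: it omits the autoregressive and moving-average terms, model selection and significance testing of the authors' S-ARIMA-REG approach, and the function names, the weekly period and the `delay` parameter are all assumptions.

```python
def step_regressor(n_days, reopening_day, delay=0):
    """Hypothetical intervention dummy: 0 before reopening_day + delay, 1 afterwards."""
    return [1.0 if t >= reopening_day + delay else 0.0 for t in range(n_days)]


def seasonal_diff(x, s=7):
    """(1 - B^s) x_t = x_t - x_{t-s}, for t = s, ..., len(x) - 1."""
    return [x[t] - x[t - s] for t in range(s, len(x))]


def estimate_step_effect(y, reopening_day, delay=0, s=7):
    """Least-squares level-shift estimate for y_t = beta * step_t + u_t,
    assuming (1 - B^s) u_t is white noise (no AR or MA terms)."""
    dx = seasonal_diff(step_regressor(len(y), reopening_day, delay), s)
    dy = seasonal_diff(y, s)
    sxx = sum(v * v for v in dx)
    if sxx == 0:
        raise ValueError("step falls outside the usable part of the series")
    return sum(a * b for a, b in zip(dy, dx)) / sxx
```

Each region would receive its own `reopening_day`, which is how differing reopening dates can enter such a model as external information.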

Results: There was a statistically significant increase in the number of positive cases in all Italian regions associated with school reopening. On average, this increase occurred about 18.9 days after schools reopened. School reopening contributed significantly to the spread of the pandemic, with an overall estimated impact of about 228,724 positive cases.

Conclusions: The results suggest the need for strict control of all in-school activities. This could be achieved by using, to a variable extent, all available non-pharmaceutical interventions, such as limited access to school spaces, no overlapping practices between different sports in the same space, universal masking and bubble-sized classrooms. However, in many cases such measures might be neither a viable option, at least in the short run, nor reasonably applicable. Therefore, whenever the established safety criteria cannot be met, school buildings should remain closed.

Keywords: CoViD-19 pandemic; intervention analysis; S-ARIMA-REG models; school reopening; time series analysis

Livio Fenga & Massimo Galli (2022). Impact Estimation on COVID-19 Infections following School Reopening in September 2020 in Italy. Journal of Statistics and Computer Science. Vol. 1, No. 1, pp. 45-56. DOI: 10.47509/JSCS.2022.v01i01.07

