# Appraisal of linear type traits and prediction of milk production using machine learning approach in Bos indicus cattle

Devi Indu^{1*}, Naseeb Singh^{2}, Kuldeep Dudi^{3}, Ajay Kumar^{4}, S S Lathwal^{5}

^{1*}Livestock Production Management, ICAR-National Dairy Research Institute, Karnal-132001, Haryana, India

^{2}Farm Machinery and Power, ICAR-Research Complex for NEH Region, Umiam, Meghalaya-793103, India

^{3}District Extension Specialist, Krishi Vigyan Kendra (Ujha, HAU), Panipat-132103, India

^{4}(LPM), DUVASU, Mathura-281001, Uttar Pradesh, India

^{5}ICAR-National Dairy Research Institute, Karnal-132001, Haryana, India

Corresponding Author Email: indulathwal@gmail.com

**DOI:** https://doi.org/10.5281/zenodo.8125212

##### Keywords

### Abstract

Linear type traits are among the important factors that affect the production and performance of dairy cattle. In the present study, dimensionality reduction of multiple linear traits was carried out to identify the most important traits for body, udder and teat conformation of indigenous cattle. Four different machine learning algorithms were used to validate the factor analysis results against the production parameters. The high KMO value (0.72) and the significant Bartlett’s sphericity test (p<0.01) indicated correlations among the linear traits and supported the suitability of the data for factor analysis. A total of nine factors had eigenvalues ≥1 and accounted for 65.69% of the total variance. Factor 1 mainly reflected traits related to the body structure of a cow, such as stature and angularity, and mammary system traits, such as udder cleft, rear udder width and teat thickness. Factors 2, 3, 4, 5 and 7 were related to udder depth, fore udder attachment, teats and other body traits such as bone structure and rump width; factor 6 reflected body strength and stability, while factors 8 and 9 reflected cows with good teats. Given the pre-defined variance value of 65.69%, the developed models predicted the test day and 305-days milk yield values with R^{2} as high as 0.64 and 0.63, respectively, indicating that the derived factors explained the pre-defined variance effectively. Thus, body, strength and mammary system conformation are important auxiliary phenotypic traits that may be considered in selection programs to further improve the production potential of cows.

**Introduction**

Improving dairy cow productivity, reproduction and functionality is an ongoing challenge for researchers seeking to satisfy the future dairy needs of a growing population. However, a continuous focus on increasing milk production might erode the merit of some important type traits, with subsequent negative effects on cow health [1]. Linear traits represent the animal’s strengths, weaknesses and overall functionality and can thus be used as an effective tool to aid genetic improvement in cow functioning. Linear traits are imperative parameters that should be considered while designing selection criteria and breeding strategies in different countries [2]. These traits directly or indirectly affect the herd life, longevity and production performance of dairy cows [3] as well as culling decisions [4]. Conformation traits have medium to high heritabilities and can be recorded easily in a single appraisal, which makes them trustworthy traits for inclusion in selection indices [5] in the early life of animals. Body size traits like stature and heart girth have been found to be significant functional traits influencing feeding efficiency [6], which in turn is a key trait affecting the economic efficiency of dairy production. Dairy character, muscularity and BCS appear to be indicators of the metabolic reserve and energy balance of animals [7]. Body length, wither height and heart girth are related to milk production potential, and udder conformation traits are important parameters for the prediction of milk production [8] and the longevity of cows in a herd [9].

*Bos indicus* (zebu) cattle breeds have evolved special adaptations that make them suitable for dairy production under tropical climates [10]. Zebu cattle have a prominent hump, large dewlap, long ears, and large navel flap/sheath in males. In India, information is available on the colour, horn shape and body size of cattle breeds, but systematic information on the type traits of indigenous cattle is lacking. The relationship of different body conformation traits with milk yield has been studied in the past for indigenous cattle breeds: cows with proportionate bodies and medium-sized teats were average milk yielders in spite of non-prominent navel flaps. Conversely, cows with a large navel flap and dewlap and a well-developed udder, milk vein and teats were good milk yielders and docile, whereas cows with a short navel flap, small udder and hard teats were poor milkers [11]. Standard linear traits are usually large in number and show significant correlation and collinearity among themselves [12]. This can give inaccurate estimates of longevity, such as overestimation of herd life parameters, so a limited number of type traits with a known biological relationship with longevity should be used for indirect selection [8]. To limit the number of traits, several researchers have used the factor analysis technique, which explains the dependencies between traits by eliminating redundant information among correlated variables and representing them in a smaller number of derived variables called factors [1]. Published literature on the comprehensive study of linear traits and their dimensionality reduction in zebu cattle is scarce. Therefore, this study was conducted to identify the important traits that have a strong relationship with the milk production capacity of the animal.

Further, to validate the association between the factors extracted using the factor analysis technique [13] and production performance, regression models to predict milk yield were developed using machine learning algorithms. If the factors truly characterize cattle performance, these models should predict the milk yield of cattle with convincing accuracy. We also wanted to explore the application of ML models for prediction and validation. Multiple studies [14, 15] have shown that machine learning algorithms can outperform conventional regression models. The artificial neural network, support vector machine, random forest and extreme gradient boosting algorithms were used in the present study to develop regression models for predicting milk yield.

**Materials and methods**

*2.1 Location of the study area*

The present study was carried out at the Livestock Research Centre, ICAR-National Dairy Research Institute, Karnal, Haryana, India, between 2020 and 2022. The research protocols were approved by the Institutional Animal Ethics Committee (IAEC) of the research institute.

*2.2 Animal Selection and experimental design*

For the present study, 150 multiparous Sahiwal cows of first to fifth parity, aged 3.8 to 8.2 years, with a most probable production ability (MPPA) of around 2312.43±167.12 (mean±SE) kg milk were selected. All linear type traits were recorded as per the International Committee for Animal Recording [16]. Of the 26 type traits (details in Table 1), 18 were approved standard traits, 5 were common standard traits and 3 were additional traits (hump size, dewlap size and navel flap size) peculiar to zebu cattle. All measurements were recorded with each animal in a normal standing position on a levelled floor. To avoid between-scorer effects, the measurements were taken by the same person to the nearest centimeter using a measuring tape and a digital angle recorder with resolutions of 1.0 mm and 0.01°, respectively. Animals within 15 days after parturition were not scored, to avoid the influence of edematous swelling of the udder. Further, the analysis was done on data adjusted for non-genetic factors such as parity, stage of lactation and season. Production data were also collected from the livestock record unit of the institute.

*2.2 Animal husbandry practices*

As per standards [17], animals were kept in a loose housing system that included covered as well as open areas for free movement. The paddock floor was ‘brick on edge’, and the manger area was made of grooved concrete. The animals were fed *ad libitum* a conventional balanced diet comprising dry fodder (wheat straw), green fodder (berseem in winter, maize/jowar in summer) and compound concentrate feed, as per the normal feeding schedule recommended by [18]. Fresh water was available to the animals *ad libitum*.

*2.3 Descriptive statistics*

The descriptive statistics for all 26 linear type traits, along with their respective average score points (ASP) and interpretation, are given in Table 1. The ‘trait code’ column lists the abbreviations used for the traits in the present study, and the ‘desirable’ column shows the desirability of each trait based on its ideal score. Most of the linear traits in Sahiwal cattle were of intermediate (ASP of 4-6) and desirable type, except a few traits (i.e., stature, chest width, body depth, rump width, angularity, central ligament, rear udder height, rear udder width and fore udder attachment), which suggests scope for further improvement by including these traits in the selection programme. These findings agree with previous studies on Sahiwal, Tharparkar and Holstein cattle breeds [19, 13, 20].

*2.4 Pre-processing of dataset*

Scaling is an important pre-processing step before statistical analysis. Factor analysis seeks features with the greatest variance, and variance is higher for high-magnitude features, which skews the analysis in their favour. In the present dataset, the range of feature values varied widely, e.g., above 150.0 for lactation milk yield, above 100.0 for body stature, rear leg set, fore udder attachment etc., and below 10.0 for muscularity, teat thickness, hock development, etc. Therefore, the data were scaled using the *MinMax* scaler defined by Eq. (1) with a range of [-1.0, 1.0]. This scaler preserves the shape of the original data distribution without changing the information embedded in the original data.

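$$x_{scaled} = a + \frac{(x - x_{min})\,(b - a)}{x_{max} - x_{min}}$$ … (1)

where *x* is the original value of a feature, x_{min} and x_{max} are the minimum and maximum values of that feature, and [a, b] = [-1.0, 1.0] is the target range.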
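
The MinMax scaling to the range [-1.0, 1.0] described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `min_max_scale` and the stature values are hypothetical and are not taken from the study data.

```python
def min_max_scale(values, low=-1.0, high=1.0):
    """Linearly rescale a list of numbers to the interval [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("a constant feature cannot be min-max scaled")
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]


# Hypothetical stature measurements in cm (illustrative, not study data).
stature_cm = [118.0, 124.0, 130.0]
scaled = min_max_scale(stature_cm)
print(scaled)  # [-1.0, 0.0, 1.0]
```

The smallest observation maps to -1.0 and the largest to 1.0, so features recorded on very different scales end up with the same range before factor analysis.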

*2.5 Implementation of factor analysis technique*

Before carrying out factor analysis on a dataset, it is advisable to check the suitability of the dataset for the analysis. The two most common methods for this are Bartlett’s test of sphericity and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (MSA). Bartlett’s test of sphericity was used to test the null hypothesis that the correlation matrix of the variables is an identity matrix. An identity correlation matrix indicates that the variables are unrelated and thus unsuitable for factor analysis. The KMO test was used to determine the strength of the partial correlations between the variables, i.e., how well the variables explain each other. To obtain distinct and reliable factors, the KMO value should be close to 1.0, while a value near 0.0 indicates that factor analysis is likely to be inappropriate.
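
As a rough illustration of the two suitability checks, the sketch below computes Bartlett's chi-square statistic and the overall KMO value directly from a correlation matrix with NumPy. It uses the standard textbook formulas and synthetic data; the function names and the data are assumptions made for illustration and do not reproduce the study's analysis. A p-value for Bartlett's statistic would additionally require the chi-square distribution, which is omitted here.

```python
import numpy as np


def bartlett_statistic(data):
    """Bartlett's sphericity statistic and degrees of freedom for an (n x p) array."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, log_det = np.linalg.slogdet(corr)  # log of det(R), numerically stable
    chi_square = -(n - 1 - (2 * p + 5) / 6.0) * log_det
    dof = p * (p - 1) // 2
    return chi_square, dof


def kmo_overall(data):
    """Overall KMO measure of sampling adequacy for an (n x p) array."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)  # partial correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)


# Synthetic stand-in for scaled trait data: 150 animals, 6 traits sharing one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(150, 1))
traits = latent + rng.normal(size=(150, 6))

chi_square, dof = bartlett_statistic(traits)
kmo = kmo_overall(traits)
print(f"Bartlett chi-square = {chi_square:.1f} on {dof} df, KMO = {kmo:.2f}")
```

When the traits are uncorrelated, the determinant of the correlation matrix approaches 1 and the chi-square statistic approaches 0; when partial correlations are small relative to the raw correlations, the KMO value approaches 1.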

The freely available Python library *factor_analyzer* [21] was used for statistical analysis, run in the Spyder development environment on a laptop (8 GB RAM, Intel Core i5 CPU, Windows 10 operating system). As shown in Table 2, the high KMO-MSA mean value of 0.72 and the significant Bartlett’s sphericity test (p<0.01) suggest correlations among the linear traits and indicate that factor analysis is appropriate for the collected indigenous Sahiwal cow dataset. A KMO value greater than 0.5 indicates that the data are suitable for multivariate (factor) analysis. A previous study [22] reported a sampling adequacy value of 0.6 for different traits in cattle. A higher sampling adequacy estimate of 0.89 was reported for biometric traits of Kankrej cows [23], and values of 0.75 and 0.79 were reported in Holstein cattle [1, 24]. The number of derived factors was chosen based on their eigenvalues as well as the slope of the scree curve. Using the scree plot, all factors lying on the curve before its slope flattened were extracted, and the factors were rotated using varimax rotation to improve interpretation by removing ambiguity in the non-rotated solution.
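
The eigenvalue criterion for choosing the number of factors can be illustrated with a short NumPy sketch. It applies the Kaiser rule (eigenvalues greater than 1) to the correlation matrix of synthetic data built from two latent factors. The function name `kaiser_rule` and the data are hypothetical, and the variance shares here come from the unrotated eigenvalues, so they would not match figures reported after varimax rotation.

```python
import numpy as np


def kaiser_rule(data):
    """Return eigenvalues (descending), number retained, and their cumulative variance (%)."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    n_factors = int(np.sum(eigenvalues > 1.0))
    explained_pct = 100.0 * eigenvalues / eigenvalues.sum()
    cumulative_pct = float(np.sum(explained_pct[:n_factors]))
    return eigenvalues, n_factors, cumulative_pct


# Synthetic data: two independent latent factors, each driving three traits.
rng = np.random.default_rng(1)
f1 = rng.normal(size=(150, 1))
f2 = rng.normal(size=(150, 1))
noise = rng.normal(scale=0.5, size=(150, 6))
traits = np.hstack([f1, f1, f1, f2, f2, f2]) + noise

eigenvalues, n_factors, cumulative_pct = kaiser_rule(traits)
print("eigenvalues:", np.round(eigenvalues, 2))
print(f"retained factors: {n_factors}, cumulative variance: {cumulative_pct:.1f}%")
```

Plotting the sorted eigenvalues against their index gives the scree curve; the Kaiser rule corresponds to keeping the points that lie above the horizontal line at 1.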

*2.6 Validation of extracted factors with milk production performance*

To validate whether the factors derived by factor analysis could explain the pre-defined variance in the milk yield performance of Sahiwal cattle, several regression models were developed. The inputs to these models were the derived factors, and the predicted values were the test-day milk yield and the 305-days milk yield. Four machine learning algorithms, namely artificial neural network, support vector regression, random forest regression and extreme gradient boosting, were used to develop these models. For training, the scaled data were divided into training and validation sets in a ratio of 70:30.

*2.6.1. Artificial neural network model*

Several researchers have employed the artificial neural network (ANN), a modelling tool inspired by the human brain [25], as a regression model. A typical ANN consists of an input layer, an output layer and one or more hidden layers, with an activation function in each neuron (figure 1). In this study, an ANN with two hidden layers of 18 and 9 neurons, respectively, was used to estimate the milk yield parameters. The Rectified Linear Unit (ReLU) [26] activation function was used in each hidden-layer neuron to introduce nonlinearity, while a linear activation function [27] was used in the output layer because the model is for regression. The weights were optimized using the Adam optimizer [28], with the mean squared error [29] as the loss function. Each model was trained for 1000 epochs with a batch size of 16 data instances.
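
A network with this layout can be sketched with scikit-learn's `MLPRegressor`, used here purely as a stand-in because the framework used in the study is not stated. The data are synthetic, the random seeds are arbitrary, and `max_iter=1000` only caps the number of epochs, since the default stopping tolerance may end training earlier.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in: nine "factor scores" per animal and a noisy linear yield.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(150, 9))
y = X @ rng.normal(size=9) + rng.normal(scale=0.1, size=150)

# 70:30 split into training and validation data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Two hidden layers (18 and 9 neurons), ReLU activations, Adam optimizer,
# squared-error loss and a linear (identity) output unit.
ann = MLPRegressor(
    hidden_layer_sizes=(18, 9),
    activation="relu",
    solver="adam",
    batch_size=16,
    max_iter=1000,
    random_state=42,
)
ann.fit(X_train, y_train)
r2_val = ann.score(X_val, y_val)  # coefficient of determination on held-out data
print(f"validation R2 = {r2_val:.2f}")
```

A separate model of this kind would be fitted for each target, for example one for test-day milk yield and one for 305-days milk yield.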

*2.6.2. Support vector regression model*

The support vector machine developed by [30] is a supervised machine learning algorithm mainly used for classification tasks, in which a separating hyperplane is constructed with the help of support vectors. Support vectors can also be used for regression problems, in which case the method is called support vector regression (SVR). SVR rests on the same concept as the support vector machine: it maps the data points into a high-dimensional feature space and fits a hyperplane there in order to predict continuous values. In SVR, the best hyperplane is the one for which the maximum number of points lie within the ɛ-margin around it. In defining the SVR model, the radial basis function was used as the kernel function, and training was carried out for a maximum of 1000 iterations with a regularization parameter value of 1.0 and an ɛ-value [31] of 0.1.
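
The stated settings map naturally onto scikit-learn's `SVR` class, as in the hedged sketch below. The data are synthetic and the mapping of the regularization parameter to `C` is an assumption; the snippet only illustrates how the ɛ-margin behaves, not the study's results.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in for nine scaled factor scores and a continuous target.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(150, 9))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=150)

# RBF kernel, regularization parameter C = 1.0, epsilon = 0.1, at most 1000 iterations.
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1, max_iter=1000)
svr.fit(X, y)
pred = svr.predict(X)

# Share of training points whose residual lies inside the epsilon-margin.
inside_margin = float(np.mean(np.abs(y - pred) <= 0.1))
print(f"support vectors: {len(svr.support_)}, inside margin: {inside_margin:.2f}")
```

Points that fall strictly inside the margin contribute no loss, so only the remaining points act as support vectors and shape the fitted function.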

*2.6.3. Random forest regression model*

Random forest, proposed by [32], is a supervised machine learning algorithm that uses ensemble learning with a large number of decision trees. It is a bagging method that can be applied to classification as well as regression tasks. The random forest model was trained with 200 trees, bootstrap sampling was used to build the trees, and squared error was used to measure the quality of a split. The number of variables sampled at each split was set to three, and nodes were expanded until all leaves were pure or contained fewer than two samples. All other parameters were left at their default values in the *Sklearn* Python library.

*2.6.4. Extreme gradient boosting model*

Extreme gradient boosting is a machine learning approach for regression and classification problems that builds a prediction model as an ensemble of weak prediction models, usually decision trees [15]. Like other boosting techniques, it constructs the model in stages and generalizes them by allowing the optimization of a differentiable loss function. The extreme gradient boosting model was trained with 150 trees in the ensemble, and the root mean squared error (RMSE) was used for model evaluation. Increasing the maximum tree depth makes the model more complex and more likely to overfit. The number of trees in the ensemble was increased in steps of 25 until no further improvement was observed. By trial and error, each model’s learning rate, maximum depth, subsample ratio of samples and subsample ratio of features per tree were set to 0.1, 6, 0.7 and 1.0, respectively, as these values gave the lowest errors. All other parameters were left at their default values in the *xgboost* Python library.

*2.6.5. Performance evaluation of developed models*

The prediction accuracy of the developed models depends directly on the input factors. If the factors derived by factor analysis are unable to explain the pre-defined variance in the data, the accuracy of these models will be inadequate. To evaluate the performance of the developed models in predicting milk yield parameters, linear type traits and milk yield data of 25 randomly selected cows were collected from the same study area, following the same animal husbandry practices as described in the previous sections. Three experts ranked the cows according to the collected linear type traits, with the cow having the best linear type traits placed at the top. The scaled data were used in the factor analysis to obtain the derived factors. These factors were supplied as input to the developed models, which predicted the test day milk yield and 305-days milk yield values. The accuracy of the developed models was evaluated using the R^{2} (coefficient of determination) values as defined in Eq. (2).

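$$R^{2} = 1 - \frac{\sum_{i=1}^{N} (Y_{i} - \hat{Y}_{i})^{2}}{\sum_{i=1}^{N} (Y_{i} - \bar{Y})^{2}}$$ … (2)

where *N* is the number of samples, $Y_{i}$ is the observed value of the milk yield, $\hat{Y}_{i}$ is the predicted milk yield value, and $\bar{Y}$ is the mean observed milk yield value.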
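
The coefficient of determination can be computed in a few lines of plain Python, as in the sketch below. The function name `r_squared` and the milk yield values are hypothetical and serve only to show the arithmetic.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical test-day milk yields in kg (illustrative, not study data).
observed = [8.0, 10.0, 12.0, 14.0]
predicted = [9.0, 10.0, 11.0, 14.0]
print(round(r_squared(observed, predicted), 2))  # 0.9
```

Here the residual sum of squares is 2 and the total sum of squares is 20, so the predictions account for 90% of the variation in the observed values.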

For a comprehensive evaluation, the cows were ranked based on their predicted 305-days milk yield values, and these ranks were compared with the observed ranks. The ranking gap is the difference between the observed and predicted rank of a cow, which ideally should be nil. For accurate models, whose performance depends significantly on the input factors, the ranking gap should be as low as possible. The hypothesis here is that if the extracted factors are unable to explain the pre-defined variance in the data, the rank gap will be higher.
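
The ranking-gap comparison can be illustrated as follows. The sketch assumes that the cumulative ranking gap percentage at a given value is the share of cows whose absolute gap is at most that value; this reading, the helper names and the numbers are assumptions for illustration only.

```python
def rank_descending(values):
    """Rank items so that the largest value receives rank 1."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def cumulative_gap_percentage(observed_ranks, predicted_ranks, max_gap):
    """Percentage of animals whose absolute rank gap is at most max_gap."""
    gaps = [abs(o - p) for o, p in zip(observed_ranks, predicted_ranks)]
    within = sum(1 for gap in gaps if gap <= max_gap)
    return 100.0 * within / len(gaps)


# Hypothetical example with five cows (illustrative, not study data).
observed_ranks = [1, 2, 3, 4, 5]
predicted_yield = [2100.0, 2400.0, 1900.0, 2000.0, 1500.0]  # predicted 305-days yield, kg
predicted_ranks = rank_descending(predicted_yield)

print(predicted_ranks)                                                  # [2, 1, 4, 3, 5]
print(cumulative_gap_percentage(observed_ranks, predicted_ranks, 0))   # 20.0
print(cumulative_gap_percentage(observed_ranks, predicted_ranks, 1))   # 100.0
```

Under this reading, a cumulative percentage of 60.0% at a rank gap of three for 25 cows would correspond to 15 cows ranked within three places of their observed rank.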

**3. Results**

*3.1 Factor analysis results*

Out of 26 factors, nine were extracted using the Kaiser rule, i.e. retaining only the factors with eigenvalues greater than 1 (Table 4). The nine factors explained 65.69% of the cumulative variance among the linear type traits. The first factor accounted for 21.89% of the variation; the second, third, fourth, fifth, sixth, seventh, eighth and ninth factors explained 7.42%, 6.73%, 6.06%, 5.43%, 4.96%, 4.69%, 4.40% and 4.12% of the total variance, respectively. The scree plot (Figure 2) displayed all 26 factors, of which nine had eigenvalues greater than one.

Estimates of factor weights (loadings) for the linear traits after varimax rotation are given in figure 3. A larger weight indicates a stronger correlation or association between the factor and the original trait. Each factor can be interpreted biologically based on the sign and size of the factor weight. The first factor weights varied from –0.77 (RUH) to 0.85 (CL). The highest weights (>0.60) in the first factor were for CL, stature, RUW, TT and ANG, which relate to the fitness of dairy cows. The second factor weights varied from –0.29 (CW) to 0.95 (UD), with a high weight (>0.60) for UD. The third factor weights varied from –0.19 (BD) to 0.92 (BS). The fourth factor ranged from –0.15 (RUH) to 0.98 (FUA), with a high weight (>0.60) for FUA. The fifth factor ranged from –0.17 (LOC) to 0.63 (RW), with a high weight (>0.60) for RW. The sixth factor ranged from –0.49 (RLS) to 0.45 (MUS), the seventh from –0.17 (BCS) to 0.66 (RA), the eighth from –0.28 (HD) to 0.72 (FTP), and the ninth from –0.19 (NF) to 0.51 (RTP).

Figure 4 shows the loading plot for the nine factors, in which traits with loadings above 0.60 are shown in yellow and those below 0.60 in bluish green. The plot clearly shows that factor 1 explained CL, RUW, RUH and ST, factor 2 explained UD, factor 3 explained BS, and factor 4 explained FUA as significant traits. The results in fig 4 and 5 confirm the significance of body and udder traits.

*3.2 Evaluation of developed machine learning based models*

The developed machine learning-based models were used to predict the milk production potential (milk yield) of Sahiwal cattle, with the derived factors fed to the models as inputs. The predicted values of test day milk yield (TDMY) and 305-days milk yield (305 DMY) were compared with the observed values using the coefficient of determination (R^{2}). The R^{2} values attained by the developed models in predicting TDMY and 305-DMY are shown in Fig. 5 and Fig. 6, respectively. Fig. 5 shows that all developed models had an R^{2} value of at least 0.58 for predicting TDMY, with the artificial neural network having the highest R^{2} value of 0.64. Similarly, as shown in Fig. 6, the developed models predicted the 305-days MY with an R^{2} value over 0.60. The artificial neural network model showed the greatest R^{2} value of 0.63 for 305-days MY.

For a comprehensive validation, the ranks given to the cows by the three experts (observed ranks) and the ranks allotted to the cows based on predicted 305-days milk yield were analyzed along with the rank gap. Fig. 7 shows the observed rank, predicted rank and rank gap for each cow, from which it can be observed that the rank gap of most cows was below four for the developed models. This suggests that the models predicted the ranks, and hence the milk yield, with considerable accuracy. To quantify the ranking gap results, the frequency of the ranking gap for the developed models, along with the cumulative ranking gap percentage, was plotted as shown in Fig. 8. It can be observed from Fig. 8 that, for a rank gap value of three, the artificial neural network, support vector regression, random forest regression and XGBoost models attained cumulative ranking gap percentages of 60.0%, 52.0%, 56.0% and 44.0%, respectively, which are reasonable values considering the pre-defined variance value of 65.69%.

**4. Discussion**

*4.1 Analysis of extracted factors*

Factor analysis is a useful multivariate statistical technique to explain the variability and dependencies among observed correlated traits. The first factor showed significant weights for central ligament, stature, rear udder width, teat thickness and angularity. The central ligament is a median suspensory elastic ligament that provides support (60%) to the udder for proper udder balance and uniform teat placement on the udder floor. Rear udder width is considered an indicator of udder capacity [7]. Taller stature is a desired attribute, as such cows often have more capacity to eat, carry their udders higher and are less likely to develop mastitis. The second factor represented udder depth, which measures the position of the udder floor relative to the hock joint of the cow. Shallow or higher udders have been found to be associated with less mastitis, less udder injury and a longer herd life of dairy cows. The highest weight in the third factor was for bone structure, indicating that the bone structure of the hind limb (cannon bone) could affect the locomotion, standing ability and overall fitness of the animal. The fourth factor represented fore udder attachment, indicating the importance of the attachment of the fore udder to the body wall in the overall conformation of the udder; it is considered the third most significant physical trait of the udder for predicting the herd life of a cow [7]. The fifth factor indicated the significance of rump width, i.e., of an optimum distance between the pin bones to prevent diseases of the reproductive system [1].

The sixth, seventh, eighth and ninth factors represented muscularity, rump angle, fore teat placement and rear teat placement, respectively. Muscularity in the loin and thigh area has no direct correlation with milk yield but can play some role in the overall feeding status of the dairy cow; well-developed, fat-free and lean musculature is indicative of high production capacity. Rump angle represents the difference in height between the hook bones and the pin bones; a rump with pin bones lower than the hip bones is preferable for better uterine drainage, genital tract health and fewer calving difficulties. Fore teat placement describes the position of the front teats on the udder quarter; the teats should not be at either extreme, otherwise the cow is more likely to be culled [18].

A scree plot was also drawn to decide the actual number of important factors; factors with eigenvalues of one or more are usually considered for analysis. We found nine factors, while four factors were reported in Brazilian Holstein cattle [24], seven factors in Holstein cows of Colombia [1] and four factors in Indian buffaloes [33]. Differences in statistical methods and population structures might explain the difference in the number of extracted factors. In the present study, the first nine factors accounted for 65.69% of the total variance among linear type traits. Kern *et al.* [24] reported that the first two factors defined 100% of the total cumulative variance for 20 linear type traits in Brazilian Holstein cattle. We found body and udder related parameters across nine factors, and they cannot be grouped into two groups only; the first five factors included all the important udder and body traits. Factors 6 to 9 included additional traits like muscularity, rump angle and teat placements, which implies that these might also be important parameters for the selection of elite zebu dairy cows. If the traits of factors 1-9 are included in selection criteria, cows are expected to have higher stature, more body depth, an angular body, wider chest and rump width with a deep udder cleft, desirable (shallow) udder depth and strong legs for longer strides.

*4.2 Interpretation of milk production performance validation*

Four different machine learning algorithms were used to validate the factor analysis results against the production parameters (TDMY and 305 DMY). The R^{2} values of these four models, taken as prediction accuracy, ranged from 0.58 to 0.64 for TDMY and were above 0.60 for 305-DMY. As stated earlier, factors having eigenvalues greater than 1.0 were considered in the present study, which resulted in the selection of the first nine factors, explaining 65.69% of the total variance in the data. Given the pre-defined variance value of 65.69%, the developed models predicted the test day and 305-days milk yield values with R^{2} as high as 0.64 and 0.63, respectively, indicating that the derived factors explained the pre-defined variance effectively. We found a reasonable gap between observed and predicted values of milk yield, considering the variance value of 65.69% defined by nine factors. The hypothesis in this case was that a higher rank gap would be observed if the extracted factors were unable to explain the pre-defined variance; the observed cumulative ranking gap percentages show that this was not the case. So, from the low rank gap as well as the higher values of R^{2}, it can be inferred that the nine factors extracted using the factor analysis technique effectively represented the whole-body conformation. Previous studies have also reported a significant relationship between important linear traits and the milk yield of cows. Significant phenotypic correlations between milk yield and type traits like angularity, udder width, rear udder width, central ligament and fore udder attachment have been reported in dairy cows [24, 20]. In our study, traits like stature, angularity, central ligament, rear udder width, teat thickness, udder depth, fore udder attachment, rump width and rump angle were found to be important for a comprehensive explanation of variance.

Hence, it can be inferred that the nine factors extracted in the present study effectively explained the body, feet, udder and teat conformation of zebu dairy cattle and can adequately be considered for selecting dairy cows with better milk production potential.

**5. Conclusions**

The factor analysis technique appears to be a useful method to reduce the dimensionality of a large number of linear type traits, indicating nine important groups of traits that most accurately explained the body, udder and teat conformation of zebu dairy cattle. If the traits of the first factor are used as selection criteria, indigenous Sahiwal cows are expected to have a deep body, high angularity and taller stature, in addition to a wider rear udder with a strong central ligament and thicker teats. Traits corresponding to factors 2, 3, 4 and 5, if considered for selection, will reflect a strong cow with desirable udder depth, strong fore udder attachment, good bone quality and a wider rump. This clearly indicates that body, strength and mammary system conformation are important phenotypic traits that can be considered in selection programs for indigenous Sahiwal cows to further improve production potential.

**Acknowledgements**

The authors sincerely acknowledge the grant (SRG/2020/001804) provided by the Science and Engineering Research Board (SERB), New Delhi, India to carry out this study.

**Conflict of interest statement**

There was no conflict of interest with the financial organization regarding the manuscript.

**References**

- Corrales AJ, M. Cerón-Muñoz A, Jhon Cañas C, Herrera R, Samir Calvo C. Relationship between type traits and milk production in Holstein cows from Antioquia, Colombia. Revista MVZ Córdoba. 2011; 16:2507–2513.
- Miglior F, Muir BL, Van Doormaal BJ. Selection indices in Holstein cattle of various countries. J. dairy sci. 2005; 88:1255–1263.
- Rennó FP, de Araújo CV, Pereira JC, de Freitas MS, Torres RdeA, Rennó LN, Azevêdo JAG, Kaiser FdaR. Genetic and phenotypic correlations among type traits and milk yield of Brown Swiss cattle in Brazil. Revista Brasileira de Zootecnia. 2003; 32:1419–1430.
- Zavadilová L, Štípková M. Genetic correlations between longevity and conformation traits in the Czech Holstein population. Czech J. Anim. Sci. 2012; 57:125–136.
- Kistemaker G, Huapaya G. Parameter estimation for type traits in the Holstein, Ayrshire and Jersey Breeds. Dairy Cattle Breeding and Genetics Committee Report to the Genetic Evaluation Board. March. Mimeo. 2006.
- Lin CY, Lee AJ, McAllister AJ, Batra TR, Roy GL, Vesely JA, Wauthy JM, Winter KA. Intercorrelations among milk production traits and body and udder measurements in Holstein heifers. J. Dairy Sci. 1987; 70:2385–2393.
- Wells SJ, Trent AM, Marsh WE, McGovern PG, Robinson RA. Individual cow risk factors for clinical lameness in lactating dairy cows. Preventive Veterinary Medicine. 1993; 17:95–109.
- Vukasinovic N, Moll J, Künzi N. Factor analysis for evaluating relationships between herd life and type traits in Swiss Brown cattle. Livestock Prod. Sci. 1997; 49:227–234.
- Søndergaard E, Sørensen MK, Mao IL, Jensen J. Genetic parameters of production, feed intake, body weight, body composition, and udder health in lactating dairy cows. Livestock Prod. Sci. 2002; 77:23–34.
- Madalena F, Toledo-Alvarado H, Cala-Moreno N. Bos indicus Breeds and Bos indicus × Bos taurus Crosses. Encyclopedia of Dairy Sciences (Third edition). 2019; 30–47.
- Manoj S, Gurdeep S, Baljit S. Know about important breeds of dairy Cattle and Buffaloes in India, 27p. 2008.
- Mantovani R, Cerchiaro I, Contiero B. Factor analysis for genetic evaluation of linear type traits in dual purpose breeds. Italian J. Anim. Sci. 2005; 4:31–33. doi:10.4081/ijas.2005.2s.31.
- Zink V, Zavadilova L, Lassen J, Stipkova M, Vacek M, Stolc L. Analyses of genetic relationships between linear type traits, fat-to-protein ratio, milk production traits and somatic cell count in first-parity Czech Holstein cows. Czech J. Anim. Sci. 2014; 59:539–547.
- Thongpeth W, Lim A, Wongpairin A, Thongpeth T, Chaimontree S. Comparison of linear, penalized linear and machine learning models predicting hospital visit costs from chronic disease in Thailand. Informatics in Medicine Unlocked. 2021; 26:100769.
- Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, NY, USA. p. 785–794. 2016.
- International Committee of Animal Recording. Conformation recording of dairy cattle. Guidelines approved by the general assembly, Rome, Italy. 2018.
- BIS. Recommendations for loose housing system for animals. Bureau of Indian Standards. Ministry of Animal Husbandry, Dairying and Fisheries (IS 11799 : 2005). 2005.
- ICAR. Nutrient requirements of cattle and buffalo. Nutrient requirements of animals, Indian Council of Agricultural Research, New Delhi. India. 2013.
- Dubey A, Mishra S, Khune V, Gupta PK, Sahu BK, Nandanwar AK. Improving linear type traits to improve production sustainability and longevity in purebred Sahiwal cattle. J. Agricultural Sci. and Technol. 2012; 2:636.
- Godara AS, Tomar AKS, Patel M, Godara RS, Bhat SA, Bharati P. Body Conformation in Tharparkar Cattle as a Tool of Selection. J. Anim. Res. 2015; 5:423–430.
- Biggs J. factor-analyzer: A Factor Analysis tool written in Python. Available from: https://github.com/EducationalTestingService/factor_analyzer. 2022.
- Shah WA, Ahmad N, Javed K, Saadullah M, Babar ME, Pasha TN, Saleem AH. Multivariate analysis of Cholistani cattle in Punjab Pakistan. J. Anim. Plant and Sci. 2018; 28.
- Pundir RK, Singh PK, Singh KP, Dangi PS. Factor analysis of biometric traits of Kankrej cows to explain body conformation. Asian-Australasian J. Anim. Sci. 2011; 24:449–456.
- Kern EL, Cobuci JA, Costa CN, Pimentel CMM. Factor analysis of linear type traits and their relation with longevity in Brazilian Holstein cattle. Asian-Australasian J. Anim. Sci. 2014; 27:784–790.
- Zou J, Han Y, So SS. Overview of Artificial Neural Networks. In: Livingstone DJ, editor. Artificial Neural Networks: Methods and Applications. Humana Press, Totowa, NJ. p. 14–22. 2009.
- Agarap AF. Deep Learning using Rectified Linear Units (ReLU). 2019. doi:10.48550/arXiv.1803.08375.
- Zare Abyaneh H. Evaluation of multivariate linear regression and artificial neural networks in prediction of water quality parameters. J. Environ. Health Sci. Engineer. 2014; 12:40.
- Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs]. 2017. http://arxiv.org/abs/1412.6980
- Wallach D, Goffinet B. Mean squared error of prediction as a criterion for evaluating and comparing system models. Ecological Modelling. 1989; 44:299–306.
- Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on Computational learning theory. Association for Computing Machinery, New York, NY, USA. p. 144–152. 1992.
- Drucker H. Improving regressors using boosting techniques. In: ICML. Vol. 97. Citeseer. p. 107–115. 1997.
- Breiman L. Random forests. Machine Learning. 2001; 45:5–32.
- Vohra V, Niranjan SK, Mishra AK, Jamuna V, Chopra A, Sharma N, Jeong DK. Phenotypic characterization and multivariate analysis to explain body conformation in lesser known buffalo (Bubalus bubalis) from North India. Asian-Australasian J. Animal Sci. 2015; 28:311–317.

Table 1. Descriptive statistics and Average Score Points of linear traits in Sahiwal cows

| S. No. | Trait name | Trait code | Mean ± S.E. | ASP ± S.E. | Interpretation | Desirable |
|---|---|---|---|---|---|---|
| 1 | Stature (cm) | ST | 127.15±0.40 | 5.77±0.08 | Intermediate | Tall |
| 2 | Chest width (cm) | CW | 26.69±0.41 | 4.59±0.10 | Intermediate | Wide |
| 3 | Body depth (cm) | BD | 67.01±0.47 | 5.24±0.10 | Intermediate | Deep |
| 4 | Angularity (deg) | ANG | 65.37±0.76 | 4.69±0.09 | Intermediate | Wide |
| 5 | Rump angle (cm) | RA | 15.14±0.34 | 5.48±0.08 | Intermediate | More slope |
| 6 | Rump width (cm) | RW | 16.71±0.30 | 5.56±0.09 | Intermediate | Wide |
| 7 | Rear leg rear view (visual) | RLR | 5.25±0.09 | 5.05±0.06 | Intermediate | Intermediate |
| 8 | Rear leg set (deg) | RLS | 159.26±0.77 | 4.81±0.12 | Intermediate | Intermediate |
| 9 | Foot angle (deg) | FA | 45.09±0.46 | 4.84±0.10 | Intermediate | Intermediate |
| 10 | Fore udder attachment (deg) | FUA | 114.53±1.45 | 4.85±0.11 | Intermediate | Strong |
| 11 | Front teat placement (visual) | FTP | 5.11±0.08 | 4.41±0.07 | Intermediate | Intermediate |
| 12 | Teat length (cm) | TL | 7.34±0.14 | 3.50±0.10 | Intermediate | Intermediate |
| 13 | Udder depth (cm) | UD | 6.88±0.31 | 5.31±0.08 | Intermediate | Intermediate |
| 14 | Rear udder height (cm) | RUH | 12.57±0.47 | 5.52±0.10 | Intermediate | High |
| 15 | Central ligament | CL | 3.76±0.16 | 4.01±0.09 | Intermediate | Strong |
| 16 | Rear teat placement | RTP | 5.42±0.11 | 6.28±0.08 | Intermediate | Intermediate |
| 17 | Locomotion (visual) | LOC | 5.33±0.10 | 4.88±0.06 | Intermediate | Long stride |
| 18 | Body condition scoring (visual) | BC | 5.37±0.11 | 6.53±0.08 | Intermediate | Intermediate |
| 19 | Hock development (visual) | HD | 5.30±0.10 | 5.70±0.21 | Intermediate | Intermediate |
| 20 | Bone structure (visual) | BS | 5.41±0.10 | 5.38±0.17 | Intermediate | Intermediate |
| 21 | Rear udder width (cm) | RUW | 9.61±0.25 | 4.27±0.09 | Intermediate | Wide |
| 22 | Teat thickness (cm) | TT | 5.03±0.15 | 4.16±0.10 | Intermediate | Intermediate |
| 23 | Muscularity (visual) | MUS | 5.77±0.12 | 5.93±0.19 | Intermediate | Intermediate |
| 24 | Hump size (cm) | HS | 11.1±0.14 | 4.10±0.78 | Intermediate | Intermediate |
| 25 | Dewlap width (cm) | DW | 14.76±0.49 | 4.6±1.30 | Intermediate | Intermediate |
| 26 | Navel flap (cm) | NF | 8.66±0.29 | 4.8±1.53 | Intermediate | Intermediate |

Table 2. Results of Kaiser-Meyer-Olkin test and Bartlett’s test of sphericity.

| Particulars | Value |
|---|---|
| Kaiser-Meyer-Olkin measure | 0.7204 |
| Bartlett’s test of sphericity | 0.00042 |
| Chi-Square | 802.43 |
| p-value | <0.01 |
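
For readers unfamiliar with the chi-square statistic in Table 2, the standard textbook form of Bartlett's test of sphericity is given below. The symbols $n$ (number of animals scored), $p$ (number of linear traits) and $\mathbf{R}$ (the trait correlation matrix) are notation introduced here for illustration, and it is an assumption that this exact form underlies the reported value.

$$
\chi^{2} = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert\mathbf{R}\rvert,
\qquad
df = \frac{p(p-1)}{2}
$$

With the $p = 26$ linear traits listed in Table 1, this form would give $df = 26 \times 25 / 2 = 325$ degrees of freedom.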

Table 3. Total variance explained by different factors in Sahiwal cows

| Factor | Eigenvalue | % of variance explained | Cumulative variance (%) |
|---|---|---|---|
| 1 | 5.75 | 21.89 | 21.88 |
| 2 | 1.95 | 7.42 | 29.3 |
| 3 | 1.77 | 6.73 | 36.03 |
| 4 | 1.60 | 6.06 | 42.09 |
| 5 | 1.43 | 5.43 | 47.52 |
| 6 | 1.3 | 4.96 | 52.48 |
| 7 | 1.23 | 4.69 | 57.17 |
| 8 | 1.16 | 4.4 | 61.57 |
| 9 | 1.08 | 4.12 | 65.69 |
| 10 | 0.99 | 3.79 | 69.48 |
| 11 | 0.93 | 3.53 | 73.01 |
| 12 | 0.85 | 3.22 | 76.23 |
| 13 | 0.79 | 3.00 | 79.23 |
| 14 | 0.76 | 2.89 | 82.12 |
| 15 | 0.66 | 2.52 | 84.64 |
| 16 | 0.61 | 2.34 | 86.98 |
| 17 | 0.56 | 2.15 | 89.13 |
| 18 | 0.51 | 1.94 | 91.07 |
| 19 | 0.45 | 1.72 | 92.79 |
| 20 | 0.39 | 1.48 | 94.27 |
| 21 | 0.35 | 1.33 | 95.6 |
| 22 | 0.3 | 1.14 | 96.74 |
| 23 | 0.27 | 1.02 | 97.76 |
| 24 | 0.24 | 0.91 | 98.67 |
| 25 | 0.21 | 0.78 | 99.45 |
| 26 | 0.14 | 0.53 | 99.98 |
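
The relationship between the columns of Table 3 can be checked with a short sketch. It assumes that each percentage is the eigenvalue divided by the sum of all 26 tabulated eigenvalues, which is an inference from the reported numbers and not a statement of the authors' software settings. The function name `variance_explained` is hypothetical. The sketch applies the eigenvalue ≥ 1 retention rule described in the study.

```python
# Eigenvalues of the 26 factors, copied from Table 3.
eigenvalues = [5.75, 1.95, 1.77, 1.60, 1.43, 1.30, 1.23, 1.16, 1.08,
               0.99, 0.93, 0.85, 0.79, 0.76, 0.66, 0.61, 0.56, 0.51,
               0.45, 0.39, 0.35, 0.30, 0.27, 0.24, 0.21, 0.14]


def variance_explained(values):
    """Return (percent, cumulative percent) of variance per factor.

    Assumption: percentages are normalised by the sum of the
    eigenvalues, not by the number of traits.
    """
    total = sum(values)
    percent = [100.0 * v / total for v in values]
    cumulative = []
    running = 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


percent, cumulative = variance_explained(eigenvalues)

# Retention rule: keep factors whose eigenvalue is at least 1.
n_retained = sum(1 for v in eigenvalues if v >= 1.0)

print("factors retained:", n_retained)
print("cumulative variance of retained factors (%):",
      round(cumulative[n_retained - 1], 2))
```

Under this assumption the sketch retains 9 factors and gives a cumulative share of about 65.7%, which agrees with the 65.69% reported in Table 3 up to rounding of the tabulated eigenvalues.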

Fig. 1. Illustration of artificial neural network architecture along with weights matrices

Fig. 2. Illustration of Scree plot for extracted factors along with Eigen values.

Fig. 3. Estimates of factor weights for linear traits using varimax rotation (loadings)

Fig. 4. Loading plot of nine factors showing the weights for 26 linear traits under study

Fig. 5. Coefficient of determination (R^{2}) values obtained for test day milk yield prediction using developed regression models

Fig. 6. Coefficient of determination (R^{2}) values obtained for 305-days milk yield prediction using developed regression models

Fig. 7. Illustration of observed and predicted ranks for each model along with ranking gap

Fig. 8. Frequency of ranking gap for each model along with cumulative ranking gap percentage