
Factors influencing the choice of the people to lead a healthy lifestyle

The concept of health and its indicators, and the impact of bad habits on health. A study of the factors that influence people's choice of a certain type of lifestyle, using people aged 14 to 40 as an example. The stages of the analysis and its results.

Category	Sociology and social science
Type	term paper
Language	English
Date added	28.08.2016
File size	882.9 K

Posted on http://www.allbest.ru/

Introduction

A healthy lifestyle has always been important for people all over the world. Many definitions of this concept exist; one of them was given in the Journal of Health and Social Behavior: «Health lifestyles are defined here as collective patterns of health-related behavior based on choices from options available to people according to their life chances (Cockerham 2000a)» [8].

Everybody knows what should be done in order to remain healthy for a long time and what should definitely be avoided. Still, a large number of people suffer from early alcoholism, smoking, excess weight, physical inactivity, hypertension, osteochondrosis, etc. Everyone understands that maintaining a healthy lifestyle is crucial. Moreover, this topic has been actively discussed in the mass media over the last 30 years. According to Roshchina (2014), health is one of the major components of human well-being as well as a factor affecting a person's income and other opportunities.

A number of vital issues affect whether a person's health improves or deteriorates. For instance, in the contemporary world people do not pay much attention to their diet. The popularity of fast food restaurants negatively affects health and leads to excess weight in all categories of people, from small children to adults. What is more, this problem is observed not only in developed countries but also in developing ones.

Another issue is neglecting regular visits to doctors and avoiding health checks. As a result, many illnesses are discovered only at late stages. Smoking and uncontrolled alcohol consumption are habits that have a severe effect on human health. Some people start smoking when they are young, under the influence of fashion and their social environment. Unfortunately, only a few of them are able to give it up once they understand what kind of addiction smoking is. Similarly, alcoholism frequently starts as a way to have fun with friends or to get away from day-to-day problems. And these are just a few of the issues that have an adverse effect on health.

It should be mentioned that health status affects not only individuals but also the economy of a country as a whole. Poor health of the population results in two types of costs: direct and indirect (Kolosnytsina, Berdnikova, 2009) [2]. Direct costs are the population's expenditures on medical services, such as doctor visits, drugs, medical procedures, etc. Indirect costs of poor health primarily affect the labor market. People are the source of labor for every country, and healthy citizens work harder, miss fewer working days due to illness and produce more added value, which is one of the main attributes of a country's prosperity. Thus, the healthier the population is, the higher the gross domestic product.

Therefore, the health status of a population is a macro indicator that shows how developed a particular country is and how favorable the conditions for living and working in it are. It is essential to understand the current situation and find out whether there is an opportunity to improve conditions for citizens in the future. That is why modelling the factors that lead a population to a healthy lifestyle remains an important research issue. Given the importance of the population's health for a country's economy, it is crucial to understand which factors affect health. This knowledge might be useful for designing educational programs and programs promoting a healthy lifestyle that are targeted at particular groups of people. Tailoring educational programs to particular groups of people will make them more effective.

In this paper, we investigate which factors influence people's choice of a particular type of lifestyle. The sample considered in our research consists of people aged 14 to 40 in Russia over the period 2005-2014. We excluded people older than 40 from our research in order to avoid age-related issues. We believe that after 40 people are likely to have health problems that depend not only on their lifestyle but also on their age. Our assumption is backed by the research of Ekaterina Scherbakova published in Demoscope Weekly, which found that people start having age-related health problems at around 40-50 years old. So, we took the lower bound of this range as the cutoff to make sure that our results are not affected by health issues that depend on age. [27]

The aim of our research is to discover the factors that determine what type of lifestyle a particular person leads. At the first stage of the research, the sample will be divided into clusters according to a person's lifestyle type. For example, the first cluster will include people who do not pay proper attention to their health. The second cluster will consist of people who pay attention to their health but still have some unhealthy habits. The third cluster will contain people who care about their health. So, the higher the cluster number, the more attention a person pays to his or her health. The second stage of the research will be devoted to determining which of several socio-demographic characteristics affect the probability that a person sticks to a particular lifestyle.

The paper is organized as follows. Firstly, studies concerning the influence of various factors on healthy lifestyle will be examined and presented. Next, we will describe the theoretical approach that underlies factor and cluster analyses and present the model specification used for the purpose of the research. Thirdly, we will describe the data used in the paper and the model applied to provide statistical analysis of the data. Finally, we will present and discuss the results of our research and draw a conclusion about which factors have a significant effect on people's choice of a healthy lifestyle pattern.

1. Literature Review

There is a vast amount of research aimed at determining what influences the type of healthy lifestyle, and the majority of researchers conclude that there are different types (clusters) of healthy lifestyle that are influenced by a set of socio-demographic and economic variables such as gender, social status, level of education, level of income, etc. The papers can be divided into two blocks based on the approach and methodology used. The first block examines the influence of socio-demographic factors on separate variables that determine an individual's lifestyle. In the other papers, because lifestyle characteristics are strongly interrelated, they are grouped into factors, which are subsequently used to form several clusters, each combining people who follow a certain lifestyle. So the authors first determine clusters of individuals according to a set of lifestyle characteristics and then examine the influence of different socio-demographic and economic variables on the probability that an individual falls into a particular cluster.

So, the final stage of the research is the same for the two blocks of studies described above: both use independent variables (socio-economic and demographic characteristics of respondents) separately to determine the influence of a particular variable on belonging to a particular type of healthy lifestyle.

Studies performed on samples of individuals from different countries demonstrate that there is a significant association between socio-demographic and economic factors and the type of a person's lifestyle, measured in different ways. While the way that particular factors affect the type of lifestyle differs somewhat between countries, in general all of the authors report that characteristics such as age, gender, income, education, etc. have an impact on an individual's lifestyle.

In this section, we first consider papers in which the authors apply different variables describing a person's lifestyle separately, and then turn to papers where individuals are first combined into clusters on the basis of lifestyle factors.

We start with the basic paper of Grossman (1999) [7]. The author states that a person is born with a certain amount of health. Health is one of the types of human capital. It is viewed as a durable capital stock that yields an output of healthy time. The amount of health may increase or decrease due to different factors, and these factors play the role of investment in human health. Such factors as healthy diet, physical activity and medical care may increase the stock of health, while smoking and drinking have a negative effect on it. In Grossman's paper, the author considers two demand functions: for health and for medical care as one of the inputs into the production of health. He uses data from a United States survey conducted in 1963 by the National Opinion Research Center and the Center for Health Administration Studies of the University of Chicago. In this research the stock of health is measured by individuals' self-evaluation of their health status. Healthy time, the output of health capital, is measured either by the number of restricted-activity days due to illness or injury or by the number of work-loss days due to illness or injury. Medical care is measured by the respondent's medical expenditures, such as expenditures on doctors, dentists, hospital care, drugs, etc. In the health demand function, all of the initial hypotheses are confirmed by the research results. Wage rate and education are significant factors that positively affect health regardless of the measure of health applied. What is more, age is also a significant determinant of health, but the relationship between age and health level is negative. On the other hand, the association between age and medical expenditures is found to be positive and significant. These findings are in line with the pure investment model. In the demand function for medical care, the wage rate changes its sign, and both variables (education and wage rate) become insignificant.

Qi, Phillips and Hopman (2006) [6] demonstrate a significant influence of socioeconomic status on health behaviors and the use of preventative screening. The study explores the relationship between individual characteristics and health behavior. Income, education, age, sex, marital status, BMI, residence and access to a regular physician are considered as factors that determine health behavior. As in many papers on this topic, the dependent variables are various factors describing health behaviors (such as smoking, excessive alcohol use and regular physical activity) and also the utilization of preventative screening such as mammography, blood pressure checks and others. The authors run a separate logistic regression for each dependent variable. For the purpose of the analysis, data from the National Population Health Survey of Canada for 1998-1999 are used. The sample consists of about 14000 respondents aged 20 and over. The results demonstrate that the level of income has a negative association with smoking and excessive alcohol use: the lower an individual's income, the higher the probability of smoking and drinking, keeping other variables constant. Higher incomes are also associated with a greater probability of engaging in regular physical activity. A negative relationship is found between the level of education and smoking and drinking levels. Besides, a higher level of education is associated with a greater likelihood of having regular health checks. Finally, the authors obtain an unexpected result concerning blood pressure checks: individuals with lower income are more likely to have had their blood pressure checked during the previous 12 months. The researchers explain this by the fact that, in general, Canadians with lower income use basic health care more than wealthier Canadians. The other proposed reason is that lower income is associated with a higher risk of heart disease, so blood pressure may be monitored as part of the management of an existing heart disease.

Kolosnytsina and Berdnikova (2009) [2] focus on excess weight as one of the important factors affecting human health. The authors refer to the research conducted by Roshchina (2008), which confirms that body mass index has a positive and significant impact on the probability that an individual has such diseases as hypertension, heart attack, other cardiovascular diseases, and diabetes. In their own research the authors also show that higher body mass is associated with higher medical expenditures as well as more temporary disability. So, the authors explore factors affecting excess weight measured by body mass index. They use a sample of about 11000 Russian people who participated in the Russian Monitoring of the Economic Situation and Public Health in 2006. The population of Russia is characterized by a medium level of excess weight in comparison with other countries. According to the statistics, the highest share of overweight men is observed in the USA and equals 75.6%, while the lowest one is observed in Romania - 37.7%. The highest share of overweight women is again observed in the USA and equals 72.6%, while the lowest one is in France - 34.7%. In Russia, 46.5% of men are overweight, which is lower than the corresponding share of women, 51.7%. The major explanation of this phenomenon is lack of activity. According to the results of the research, three factors have a significant effect on weight: gender, age and the level of education. Age is found to be an important determinant of excess weight. The authors report a non-linear dependence between age and excess weight. The share of people with excess weight increases significantly after the age of thirty and reaches its peak after 45. At the same time, the fraction of people with excess weight is much lower in the 70+ age group. This is explained by the fact that the food preferences and lifestyle of elderly people were formed much earlier and have not undergone such significant changes in recent years as those of younger generations. Education is also shown to be an important factor affecting body mass index. The association between education and excess weight runs in opposite directions for men and women. More educated women are less likely to have problems with excess weight, whereas men with a higher level of education tend to have a higher body mass index. For example, 63% of men with incomplete secondary education have normal weight. This share falls to 45% in the group of men with higher education. The authors propose two theories. The first one is the neoclassical theory. It states that ideal weight is a normal good: when prices fall, consumption of goods rises, and this results in excess weight. The second theory is behavioral. It states that eating habits are simply a matter of preferences.

One more study was carried out by Kim (2003) [5]. Its main proposition is that socioeconomic status plays a huge role in defining the lifestyle a person follows. The sample in this research is composed of individuals from China and the United States. The Chinese respondents were selected by a random cluster process in eight provinces, while the Americans were chosen by a complex, multistage, area probability sample design. To measure the healthfulness of lifestyle, a new index is constructed. It is named the Lifestyle Index and is based on four main factors: alcohol drinking, smoking, diet and physical activity. To form the index, a different weight is assigned to each of the four factors. Socioeconomic status is defined by the level of income and the level of education. Age, gender, race and area of living are used as control variables. The research is conducted separately for China and the USA using several methods. First, the effect of income and education on the healthfulness of lifestyle is examined using logistic regression. Then the socioeconomic characteristics of respondents in the highest tertile or quartile of the Lifestyle Index score are compared with those of respondents in the lowest tertile or quartile. It is found that China and the USA show different results concerning the influence of the independent variables under consideration. In China an increase in income leads to a decrease in the likelihood of following a healthy lifestyle, while in America the result is the opposite. The same pattern holds for the level of education. In America, the higher the level of education, the more likely a person is to lead a healthy lifestyle, while in China, again, the effect is the opposite. When both factors (the level of education and the level of income) are combined, it is found that as socioeconomic status improves, the likelihood of leading a healthy lifestyle decreases significantly in China, while in America it increases by the same value. These differences are explained by the fact that the patterns of the lifestyle transition depend on the level of the country's development. In developing countries such as China, people with lower socioeconomic status tend to maintain a healthier lifestyle, as they are involved in more physical activity and consume more natural, low-fat food such as fruits, vegetables and grain. Wealthier people in developing countries consume more «processed foods, which are commonly high in fat, salt, and refined sugar». In developed countries, pursuing a healthier lifestyle costs more. So, only people with higher socioeconomic status and income can afford it. They are able to buy natural rather than processed foods, join health clubs and invest in expensive sports equipment.

So far we have examined a block of papers where various proxies are used for healthy lifestyle (health behaviors, utilization of preventative screening, self-evaluation of health status, body mass index, lifestyle index, etc.). In these studies the authors estimate the influence of socioeconomic and demographic factors on each proxy for healthy lifestyle separately. In all of them, such factors as the level of income and education as well as age and gender are shown to be important determinants of a person's lifestyle.

In the second block of studies, the authors first define clusters of respondents based on a set of criteria characterizing healthy lifestyle. Then they either examine the differences between clusters in terms of socio-demographic and economic characteristics or estimate the association between the probability of falling into a particular health cluster and a number of socioeconomic and demographic variables.

Chan and Leung (2015) [3] study the Hong Kong Chinese population using two-step cluster analysis. The aim of the paper is to understand what type of lifestyle behavior dominates and how socio-economic and demographic characteristics vary between clusters formed on the basis of people's health. The data for the research were collected through a structured self-reported questionnaire. Clusters of healthy lifestyle are determined based on a set of lifestyle characteristics including physical activity, following a diet, smoking and excessive alcohol consumption. The socio-demographic variables used in this study include age, gender, health status, and socioeconomic status measured by a person's education and employment. The results are obtained using the AIC criterion and log-likelihood distance. The authors form two clusters: «healthy» and «less healthy», the latter being characterized by a greater share of participants who smoke, drink more alcohol, do not follow a diet and are less engaged in physical activity. They report that socio-demographic factors such as health status, age and gender affect lifestyle behavior. Namely, young employed men with high-to-middle education prevail among the participants of the «less healthy» cluster. Thus, these results are in line with those obtained by Kim (2003) for the sample of Chinese people: more educated and wealthier Chinese people tend to be less healthy.

One more study concerning healthy lifestyle was conducted by Chan, Mok, Wong, Lee and Fok (2006) [4]. The authors study Hong Kong Chinese people and try to identify their health profiles using cluster analysis. The sample includes 702 participants (nurses) selected using the Hong Kong telephone directory. First, the authors perform factor analysis and generate 7 factors accounting for almost 70% of the variation. These factors are physical activity, health knowledge, mental health, diet, smoking, sexual behavior and social health. Second, they carry out a two-step cluster analysis and obtain 2 clusters characterizing people's health profiles. The results are obtained based on the BIC criterion and log-likelihood distance. The respondents who fall into cluster 1 account for more than 60% of the sample and are characterized by better health knowledge, less smoking, following diets and having better social health, which is measured by the healthiness of relations with family, friends and society. However, at the same time these respondents tend to be less engaged in physical activity. Cluster 2 accounts for about 40% of the sample and is characterized by poorer health knowledge and social health, more smoking and not following a diet. The respondents in this cluster, however, do more physical activity. The two clusters do not differ in terms of mental health and sexual behavior. The authors also report significant differences between clusters in terms of some demographic variables. The respondents who fall into cluster 2 are predominantly men who have no religious beliefs and are characterized by higher BMI and blood pressure. So, in this research the authors show that certain health profiles exist among Hong Kong Chinese people and that a set of demographic characteristics differs between people with these profiles. These results are in line with those of (Shin 1999, Hsu & Gallinagh 2001, Tung & Gillett 2005) [22], [23], [24], who report similar population profiles for people in other Asian countries.

Glorioso and Pisati (2014) [20] distinguish 13 lifestyles on the basis of a questionnaire survey of Italian people. These lifestyles are distinguished based on several variables, such as body mass index (BMI) and the frequency of its measurement, the frequency of physical activity, smoking, following a diet, taking the main medical tests (measurement of cholesterol, etc.) and using alternative medicine. The authors also report that there is a significant link between social status and belonging to a particular lifestyle. For example, people with higher education tend to have a healthier lifestyle.

Poortinga (2006) [24] also considers the problem of leading a healthy lifestyle. The author studies the clustering of four risk factors (alcohol drinking, smoking, lack of physical activity, and fruit and vegetable consumption) among men and women. He uses a multinomial multilevel regression model to examine the socio-demographic variation in the obtained clusters of risk factors. For the purpose of the research, Poortinga uses data from the 2003 Health Survey for England for a sample of almost 15000 respondents aged 16 and over. 6% of the sample appear to have no lifestyle risk factor, 26% have one, 42% have two and 25% have three lifestyle risk factors. About 5% of the respondents have all four lifestyle risk factors. The clustering appears to be more pronounced for women. According to the results of the research, a higher number of lifestyle risk factors is more typical for men, lower social class households, singles, and the economically inactive. At the same time, older age groups and homeowners are less likely to have a higher number of lifestyle risk factors.

One of the most important papers on healthy lifestyle in Russia is the paper by Roshchina (2014) [1]. The author constructs a typology of healthy lifestyles and identifies the principal factors that affect leading a healthy lifestyle. She also aims to find out whether social inequality plays a role in health. Roshchina takes the data from RLMS 2010-2014. The author looks at the habits of people aged 14 and over and tests which factors have a positive influence on health and which have a negative one. The author considers the following characteristics as components of lifestyle: physical activity, visiting doctors, taking vitamins, nutrition, diet, body mass, working hours, smoking and drinking. On the basis of these characteristics, cluster analysis is conducted and, as a result, five clusters from A to E are formed. Cluster A is characterized as the healthiest one, while cluster E is the worst in terms of health. It is characterized by the highest alcohol consumption and smoking. The respondents in the three clusters between the two extremes have some, but not all, bad habits. The second step of the research is to define how different variables affect the probability that a person falls into a particular health cluster. These variables are age, sex, education, income, health, employment and professional status. One more variable is also considered: social class, which is based on the status of the family member who brings in income. The author uses multinomial logistic regression to reveal the factors affecting lifestyle. The regression analysis shows that sex, education and age, as well as 'social class', significantly influence the type of healthy lifestyle a person leads. The initial hypothesis that social class has a significant positive influence on the type of lifestyle a person leads is not rejected. This result is in line with the one obtained by Christensen and Carpiano (2014) [19]. In their research the authors show that social class has a significant effect on the body mass index of Danish women. An unexpected result is that, although the statistics obtained with the constructed 'social class' variable are lower, the results are easier to interpret. Besides, the author reports that people with higher education and income tend to have a healthier lifestyle.

Overall, there are many studies covering the problem of healthy lifestyle and the factors affecting it. Various authors have examined which factors have a significant impact on health, using different samples for developed and developing countries.

Different proxies of health and healthy lifestyle are applied in the research. Some of the authors use one particular variable to measure health, such as body mass index (Kolosnytsina and Berdnikova, 2009; Christensen and Carpiano, 2014), individuals' self-evaluation of their health status (Grossman, 1999) or the use of preventative health screenings (Qi et al., 2006). Other authors use either a composite index of health based on several characteristics of lifestyle (Kim, 2003) or perform cluster analysis on the basis of various components of lifestyle (Roshchina, 2014; Chan et al., 2006; Poortinga, 2006). The components of lifestyle usually include four risk factors: smoking, excessive drinking, not following a diet and lack of physical activity. These factors can be supplemented by such variables as body mass index, the frequency of visiting doctors and taking medical tests, and access to medicine.

The results of the studies show that, first, there exist certain patterns of people's behavior in terms of lifestyle. In other words, people can be grouped into particular clusters characterizing their lifestyle. Second, there is a set of factors that affect a person's health or choice of lifestyle. These factors include several demographic and personal characteristics, such as age, gender, marital status, religion, place of living and BMI. Besides, socio-economic factors such as education, income, employment and economic status are reported to have a significant influence on health. There is some variation in the results of the research devoted to healthy lifestyle; however, there are also some similarities. First, education is reported to have a positive impact on health in various samples for developed countries (Grossman, 1999; Qi et al., 2006; Poortinga, 2006). However, for developing countries the results are contradictory. For the samples of Chinese people it is shown that more educated people are characterized by poorer health (Kim, 2003; Chan and Leung, 2015). For the sample of Russian people, education has a positive impact on health for women and a negative one for men (Kolosnytsina, Berdnikova, 2009). Income has a positive impact on health in developed countries (Grossman, 1999; Qi et al., 2006; Poortinga, 2006), while the results for developing countries are contradictory. Kim (2003) reports a negative association between income and health for Chinese people, whereas Roshchina (2015) reports a positive association for respondents in Russia. Age is found to have a positive effect on health by Chan et al. (2006) and a negative one by Grossman (1999) and Poortinga (2006). Kolosnytsina and Berdnikova (2009) and Roshchina (2015) report a non-linear relationship between age and lifestyle. The direction of the influence of sex and marital status on healthy lifestyle varies among studies. Kolosnytsina and Berdnikova (2009) and Roshchina (2015) report that men tend to follow a healthier lifestyle, whereas Poortinga (2006), Chan and Leung (2015) and Chan et al. (2006) find that women are healthier. Finally, social class is shown to have a positive effect on following a healthy lifestyle. Poortinga (2006) and Roshchina (2015) report that respondents belonging to a higher social class are more likely to follow a healthy lifestyle.

In our paper, we also investigate the factors that affect leading a healthy lifestyle, examining Russian people aged 14 to 40. We perform the analysis using the factor and cluster approaches in order to split the sample into categories (clusters) of people who tend to follow similar lifestyle patterns. Then we run a regression model to test the influence of different variables on the probability of falling into a specific lifestyle cluster.

2. Methodology of the research

In our paper we use factor and cluster analyses, which are well suited to analyzing survey data such as RLMS-HSE.

Since we use respondents' answers to a set of questions concerning lifestyle, there might be significant correlation between the initial variables that describe a respondent's lifestyle. So, it is reasonable to first use factor analysis for data reduction. Factor analysis reduces the data by combining the initial variables into linear combinations based on the correlations between them. The factors are linear combinations of the initial variables that retain most of the information needed for the research. Factor analysis finds a few common factors (say, q of them) that linearly reconstruct the m original variables:

$$y_{ij} = \beta_{i1} x_{1j} + \beta_{i2} x_{2j} + \cdots + \beta_{iq} x_{qj} + e_{ij},$$ where

· $y_{ij}$ is the value of the ith observation on the jth variable

· $\beta_{ik}$ is the ith observation on the kth common factor

· $x_{kj}$ is the set of linear coefficients called the factor loadings

· $e_{ij}$ is similar to a residual but is known as the jth variable's unique factor

By reconstruction we mean that, when the principal-component factor method is applied, the residual variance summed across all equations is minimized (the eigenvectors are returned in normalized form with unit length, L'L = I) [25].

Once the factors and their loadings have been estimated, we may interpret them. Interpretation typically means examining the loadings $x_{kj}$ and assigning a name to each factor. To ensure that the factors are independent, we perform an orthogonalization procedure [25], [14].

So, factor analysis has three stages:

1. Preparation of the covariance matrix (sometimes the correlation matrix is used instead);

2. Extraction of the initial orthogonal factors (main stage);

3. Rotation in order to obtain the final solution.

At the first stage, we examine the covariance matrix to understand the correlations and possible similarity between variables. At the second stage, we run factor analysis in a statistical package (Stata in our case). We choose the number of factors that will be used to estimate the factor loadings (pattern matrix) and the uniquenesses, based on the table of cumulative explained variance (the cumulative value should not be more than 0.8). At the third stage, we rerun factor analysis with the number of factors chosen at stage two and obtain orthogonal factors (pattern matrix) using the rotation option. After that, we assign each variable to a factor. We do this based on the principle that the factor loadings matrix contains the correlations between variables and factors: each variable is assigned to the factor with which it has the highest correlation.
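The second stage can be illustrated with a minimal sketch of principal-component factor extraction. It is written in Python with NumPy rather than Stata, the correlation matrix is hypothetical (two smoking items and two exercise items), and rotation is omitted, so it only approximates what a statistical package does.

```python
import numpy as np


def pcf_loadings(corr, n_factors):
    """Principal-component factor extraction from a correlation matrix.

    Returns the loadings of the first `n_factors` factors, the uniqueness
    of each variable and the cumulative share of explained variance.
    """
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Loadings are unit-length eigenvectors scaled by sqrt(eigenvalue).
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    uniqueness = 1.0 - (loadings ** 2).sum(axis=1)
    return loadings, uniqueness, cumulative


# Hypothetical correlation matrix, not taken from RLMS-HSE.
corr = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])

loadings, uniqueness, cumulative = pcf_loadings(corr, n_factors=2)
# Assign each variable to the factor with the largest absolute loading.
assigned = np.abs(loadings).argmax(axis=1)
print(cumulative)
print(assigned)
```

In this toy example the first two variables load on one factor and the last two on the other, which mirrors the rule of assigning each variable to the factor with which it correlates most.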

The final step is to understand what each factor means, based on the group of variables it contains. Our main goal is to reduce the complexity of the data set and detect the latent structure in the data. This reduction allows us not to exclude valuable variables from the analysis. What is more, factor scores can be used in model estimation without problems such as multicollinearity. The results of factor analysis can then be used for cluster analysis, which is the process of identifying groups of objects that are homogeneous within themselves and heterogeneous between each other. Several different methods can be used to identify such clusters.

In our analysis we do not need to create subclusters, so we use a nonhierarchical approach, K-means, which is a widely used technique for large datasets. K-means sets the initial cluster centroids randomly and assigns each object to the cluster with the closest centroid.

The K-means algorithm partitions the data in such a way that within-cluster variation is minimized, where within-cluster variation is measured by the distance from each observation to the center of its cluster. Since we work with a really small dataset and the results of the hierarchical approach are reproducible, we decided to stick to this one. [16]

We should also discuss how to choose the appropriate number of clusters. For hierarchical analysis, a suitable option is graphical analysis using a dendrogram. Since we use the K-means method, we rely on the Calinski-Harabasz approach (Calinski T., [17]).
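The assignment and update steps described above can be sketched as follows. This is a plain Python version of the standard K-means iteration (Lloyd's algorithm) on hypothetical two-dimensional points; the initialization and stopping rule are assumptions and may differ from those of a statistical package.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Plain K-means (Lloyd's algorithm); returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical factor scores forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = k_means(points, k=2)
print(labels)
```

Because the starting centroids are random, the numbering of the clusters is arbitrary; only the grouping itself is meaningful.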

The Calinski-Harabasz criterion is sometimes called the variance ratio criterion (VRC). The Calinski-Harabasz index is defined as

$$\mathrm{VRC}_k = \frac{SSB / (k - 1)}{SSW / (N - k)},$$ where

· SSB is the overall between-cluster variance

· SSW is the overall within-cluster variance

· k is the number of clusters

· N is the number of observations

If SSB is large and SSW is small, the clusters can be described as well-defined. To find the optimal number of clusters, we maximize $\mathrm{VRC}_k$ with respect to k. The optimal number of clusters is the solution with the highest Calinski-Harabasz index value. [26], [17]

After performing cluster analysis, we use its results to construct a multinomial logit model. In the multinomial logit model we assume that the log-odds of each response follow a linear model
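As a small worked illustration of the variance ratio criterion, the sketch below computes the index for two candidate partitions of the same data. The one-dimensional points and the cluster labels are hypothetical.

```python
def calinski_harabasz(points, labels):
    """Variance ratio criterion: (SSB / (k - 1)) / (SSW / (N - k))."""
    n = len(points)
    dim = len(points[0])
    clusters = sorted(set(labels))
    k = len(clusters)
    grand_mean = [sum(p[d] for p in points) / n for d in range(dim)]
    ssb = 0.0  # between-cluster sum of squares
    ssw = 0.0  # within-cluster sum of squares
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        ssb += len(members) * sum((centroid[d] - grand_mean[d]) ** 2
                                  for d in range(dim))
        ssw += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return (ssb / (k - 1)) / (ssw / (n - k))


# Hypothetical data: two tight groups around 1 and 11.
points = [(0.0,), (2.0,), (10.0,), (12.0,)]
vrc_two = calinski_harabasz(points, [0, 0, 1, 1])    # SSB = 100, SSW = 4
vrc_three = calinski_harabasz(points, [0, 0, 1, 2])  # SSB = 102, SSW = 2
print(vrc_two, vrc_three)  # 50.0 25.5
```

Here the two-cluster solution has the higher index, so it would be preferred over the three-cluster one.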

$$\eta_{ij} = \log\frac{\pi_{ij}}{\pi_{iJ}} = \alpha_j + x_i' \beta_j,$$ where

· $\alpha_j$ is a constant

· $\beta_j$ is a vector of regression coefficients, for $j = 1, 2, \ldots, J - 1$.

This model is analogous to a logistic regression model; the difference is that we have J-1 equations instead of one, and the probability distribution of the response is multinomial instead of binomial.

What is more, the choice of the reference category makes no difference in multinomial regression models, as one formulation can always be converted into another.

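To sum up, suppose that there are k categorical outcomes and, without loss of generality, let the base outcome be 1. According to Greene W.H. [18], the probability that the response for the jth observation is equal to the ith outcome is

$$p_{ij} = \Pr(y_j = i) = \begin{cases} \dfrac{1}{1 + \sum_{m=2}^{k} \exp(x_j \beta_m)}, & \text{if } i = 1, \\ \dfrac{\exp(x_j \beta_i)}{1 + \sum_{m=2}^{k} \exp(x_j \beta_m)}, & \text{if } i > 1, \end{cases}$$ where

· $x_j$ is the row vector of observed values of the independent variables for the jth observation

· $\beta_m$ is the coefficient vector for outcome m.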
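To make the role of the base outcome concrete, the following sketch evaluates the outcome probabilities of a multinomial logit model for one observation. The covariate values and coefficient vectors are hypothetical, and the base outcome is given a zero coefficient vector.

```python
import math


def mlogit_probabilities(x, betas):
    """Outcome probabilities for one observation.

    `x` is the row of covariates (with a leading 1.0 for the constant) and
    `betas` holds the coefficient vectors of outcomes 2..k; the base
    outcome 1 has all coefficients fixed at zero.
    """
    scores = [0.0] + [sum(xi * bi for xi, bi in zip(x, beta)) for beta in betas]
    denominator = sum(math.exp(s) for s in scores)
    return [math.exp(s) / denominator for s in scores]


# Hypothetical values: a constant plus one covariate, three outcomes.
x = [1.0, 0.5]
betas = [[0.2, 1.0], [-1.0, 0.4]]
probs = mlogit_probabilities(x, betas)
print(probs)
# The log-odds of outcome 2 against the base outcome equal x * beta_2.
print(math.log(probs[1] / probs[0]))
```

The probabilities sum to one, and the log-odds of any outcome against the base outcome are linear in the covariates, as in the log-odds formulation above.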

3. Data Description

The data used to construct the model originate from a non-governmental monitoring survey of the Russian population, the «Russian Longitudinal Monitoring Survey» (RLMS-HSE). The survey is based on a stable sample and is published annually. It is assumed that the same people are questioned every year. In the current research we use the data for ten years, 2005-2014. However, the representative years are 2005, 2006, 2010 and 2011, because in the other years some variables are missing: respondents did not answer the corresponding questions.

The main aim of the research is to determine the factors that affect people's health patterns, so it is essential to keep continuous track of the information they provide. So, if a person participated in the survey in one year but skipped it in another, we had to remove him or her from the data set. In addition, a large number of people provided illogical and inconsistent answers that could negatively affect the results of the model. For instance, there are respondents who reported in 2005 that they had quit smoking 5 years earlier (that is, around the year 2000), and then reported in the 2010 survey that they had stopped smoking 5 years earlier. Both answers cannot be true, so we had to remove such observations from the data. As a result, the data consist of people who provided accurate answers to all the corresponding questions throughout the period.

The variables used in the research are presented in Appendix 1.
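The consistency check described above can be sketched in a few lines. The function names and the one-year tolerance are hypothetical, and the sketch assumes that a respondent quit smoking only once.

```python
def implied_quit_years(answers):
    """Map each survey year to the calendar year in which smoking was given up.

    `answers` maps survey year -> reported number of years since quitting.
    """
    return {year: year - years_ago for year, years_ago in answers.items()}


def is_consistent(answers, tolerance=1):
    """Return True if all waves imply roughly the same quitting year.

    `tolerance` (in years) is a hypothetical allowance for rounding.
    """
    quit_years = implied_quit_years(answers).values()
    return max(quit_years) - min(quit_years) <= tolerance


# The example from the text: "5 years ago" reported in both 2005 and 2010.
inconsistent = {2005: 5, 2010: 5}
# A hypothetical consistent respondent who quit around the year 2000.
consistent = {2005: 5, 2010: 10}

print(is_consistent(inconsistent))  # False
print(is_consistent(consistent))    # True
```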

4. Descriptive Statistics

Now we would like to describe the data used in our research.

We take a sample of people aged 14 to 40 and base our work on panel data on individuals from 2005 to 2014. Initially, the panel included 1132 respondents: 492 men (43.46%) and 640 women (56.54%). However, some people were not included in the empirical research because they did not provide comprehensive answers to the survey questions. A typical case of exclusion is a respondent who answered «it is hard for me to say» or simply ignored some of the questions. Missing data points and «noise data» in general are the main problems of research based on sociological survey data.

We have also replaced with missing values all answers of the form «no answer», «do not know» and «it is hard for me to say». This procedure is necessary because such answers cannot be analyzed or interpreted.

Before conducting econometric research to identify the factors that affect keeping up with a healthy lifestyle, it is important to examine whether the data contain errors. We have checked for outliers (or «noise data»), that is, unique responses that cannot be explained: for instance, a negative age, a weight below ten kilos, or a height of more than three or four meters. Such answers might distort the results of the modeling, so they should be excluded (or replaced by missing data points). In the current research we have excluded outliers from the data before performing econometric modeling.

Now we would like to provide additional information on the data on which this research is based.

First, we would like to dwell on the questions that implied free responses. We believe that this form of information gathering might produce unexpected results, as people are not constrained by multiple-choice answers. For instance, variable m72_, which records the age at which the respondent started smoking, has a minimum value of 4 and a maximum value of 34. We decided to exclude the ages of 4, 5, 6 and 34, as they are outliers.

Another question with a free-response answer is connected to variable m75_, which records the number of cigarettes smoked per day. It has a maximum value of 60. This figure also seems extremely high. However, we believe that there are people who are such heavy smokers and smoke 3 packs per day. In fact, we can often see on TV police officers or news writers who smoke one cigarette after another during tense periods of everyday or work life.

Variable m78_ records the number of years since an individual gave up smoking. It takes values from 0 to 21. The answer 21 is also taken into account, as the respondent may be 39 years old at the moment of answering. The reported weight of one 31-year-old respondent, 21 kg, raises doubts. However, this individual drops out of the analysis, as he answered only a few questions, namely l5_, l20_, m3_ and s_occup.

So, we did not find any noticeable distortion of the data. There were some outliers (in weight and in the number of cigarettes per day); however, it was decided not to remove them from the data. We believe that they can be logically explained and, more importantly, provide additional value for the current research.
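The two cleaning rules above, recoding non-responses and dropping implausible values, can be sketched as follows. The field names and the plausibility bounds are hypothetical and are not taken from the RLMS-HSE documentation.

```python
NON_RESPONSES = {"no answer", "do not know", "it is hard for me to say"}

# Hypothetical plausibility bounds (inclusive), chosen only for illustration.
PLAUSIBLE_RANGE = {
    "age": (0, 120),
    "weight_kg": (10, 300),
    "height_cm": (50, 300),
}


def clean_value(name, value):
    """Return None for non-responses and out-of-range numbers, else the value."""
    if isinstance(value, str) and value.strip().lower() in NON_RESPONSES:
        return None
    bounds = PLAUSIBLE_RANGE.get(name)
    if bounds is not None and isinstance(value, (int, float)):
        low, high = bounds
        if not low <= value <= high:
            return None
    return value


def clean_record(record):
    """Apply `clean_value` to every field of one respondent's answers."""
    return {name: clean_value(name, value) for name, value in record.items()}


raw = {"age": -3, "weight_kg": 70, "height_cm": "do not know", "smokes": "yes"}
cleaned = clean_record(raw)
print(cleaned)
# {'age': None, 'weight_kg': 70, 'height_cm': None, 'smokes': 'yes'}
```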

Table 1. Descriptive statistics of the main variables

Variable	Observations	Mean	Std. Dev.	Min	Max
Gender	11320	1.565371	0.4957301	1	2
Number of visits to the doctor	11032	2.095178	1.001679	1	5
Did you have any health problems during the last 30 days?	11300	1.78	0.4142646	1	2
Were you in a hospital during the last 3 months?	10185	1.957486	0.2017674	1	2
How would you rate your health?	10130	3.553011	0.6181218	1	5
Do you smoke now?	11301	1.636758	0.4809551	1	2
Remember please, when did you start smoking? How old were you?	3112	16.31652	2.955865	4	34
Did you smoke during the last 7 days?	4089	1.005869	0.0763962	1	2
How many cigarettes do you smoke daily?	4025	14.12174	7.328462	1	60
Have you ever smoked?	7123	1.812158	0.3906135	1	2
How long ago did you quit smoking?	1229	4.235151	4.274483	0	21
Did you consume alcohol during the last 30 days?	8555	1.293279	0.4552917	1	2
How often did you consume alcohol during the last 30 days?	6347	2.501024	1.082534	1	6
Did you run in the last 30 days?	9038	1.040385	0.1968715	1	2
Did you swim in the last 30 days?	7904	1.042637	0.2020493	1	2
Did you go to fitness in the last 30 days?	9031	1.057579	0.2329594	1	2
How did your weight change over the last year?	10984	2.327749	0.7265783	1	3
Did you dance in the last 30 days?	9035	1.019812	0.139361	1	2
Did you play basketball or football in the last 30 days?	9039	1.958181	0.2001859	1	2
How often do you do physical exercises?	8390	1.834923	1.454371	1	5
Did you miss your work during the last 30 days due to illness?	6751	1.938824	0.2396708	1	2
Year of birth	11320	1981.421	5.216236	1974	1991
Age	11320	28.07862	5.954817	14	40
Weight	10789	69.80668	15.47784	21	160

Next we provide descriptive statistics for the data gathered with multiple-choice questions.

First, we look at the marital status of the people who participated in the survey. A larger share of women than of men is married (42.6% of all women versus only 36.9% of all men). At the same time, the share of divorced respondents is also higher among women, so more women than men have ended their official relationships.

Table 2. Contingency table for gender and marital status, count (row %)

marital status
gender	never married	married	live together but not married	divorced	widow (widower)	total
male	266 (54.5)	180 (36.9)	32 (6.6)	10 (2.0)	0 (0.0)	488 (100)
female	260 (40.9)	271 (42.6)	71 (11.2)	32 (5.0)	2 (0.3)	636 (100)
total	526 (46.8)	451 (40.1)	103 (9.2)	42 (3.7)	2 (0.2)	1 124 (100)

The next step is to examine the educational status of the respondents in detail. As Table 3 shows, a larger share of men than of women did not finish school (20.5% versus 16.5% in the category with additional education). We can also see that only 14.2% of the respondents have higher education.

Table 3. Contingency table for gender and level of education status, count (row %)

level of education
gender	finished 0-6 classes	did not finish school (7-8 classes)	did not finish school (7-8 classes) + additional education	finished high school	finished prof education	finished higher education	total
male	7 (1.4)	45 (9.2)	100 (20.5)	180 (37.0)	99 (20.3)	56 (11.5)	487 (100)
female	6 (1.0)	35 (5.5)	105 (16.5)	219 (34.4)	169 (26.5)	103 (16.2)	637 (100)
total	13 (1.2)	80 (7.1)	205 (18.2)	399 (35.5)	268 (23.8)	159 (14.2)	1 124 (100)

Furthermore, we can observe that only a few people in Russia rate their health as good or very good (less than 3% among both men and women). Men are more likely to consider their health bad. These results are also interesting because they describe the overall attitude towards health in our country. We can assume that it is not as good as it might be and that there should be reasons that explain why Russian people tend to have poor health.

Table 4. Contingency table for gender and self-assessment health status, count (row %)

How do you rate your level of health
gender	very good	good	normal	bad	very bad	total
male	1 (0.2)	13 (2.7)	177 (36.1)	273 (55.7)	26 (5.3)	490 (100)
female	2 (0.3)	13 (2.0)	287 (44.9)	314 (49.1)	23 (3.6)	639 (100)
total	3 (0.3)	26 (2.3)	464 (41.1)	587 (52.0)	49 (4.3)	1 129 (100)

The next point of the research concerns smoking. Men smoke more often than women (54.9% vs. 20.4%). The answers concerning alcohol consumption show the same pattern, although the difference between men and women is smaller (51.3% vs. 43.5%). So we can conclude that almost half of the respondents (46.9%) consumed alcohol during the last 30 days.

Table 5. Contingency table for gender and smoking, count (row %)

Do you smoke?
gender	yes	no	total
male	268 (54.9)	220 (45.1)	488 (100)
female	130 (20.4)	506 (79.6)	636 (100)
total	398 (35.4)	726 (64.6)	1 124 (100)
Pearson chi2(1) = 143.5166 Pr = 0.000

Table 6. Contingency table for gender and alcohol consumption, count (row %)

Did you consume alcohol during the last 30 days?
gender	yes	no	total
male	250 (51.3)	237 (48.7)	487 (100)
female	275 (43.5)	357 (56.5)	632 (100)
total	525 (46.9)	594 (53.1)	1 119 (100)
Pearson chi2(1) = 6.7573 Pr = 0.009

Having examined the statistics by gender, as well as the responses about bad habits and the respondents' self-assessment of their health, we may conclude on the basis of the chi-square statistics that there is a relationship between gender and bad habits. There also appears to be a tendency for bad habits to harm an individual's health.
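The Pearson statistic can be recomputed directly from the counts printed in Tables 5 and 6. The sketch below is an illustration in plain Python, not the original Stata output; the p-value helper is valid only for one degree of freedom, as in a 2x2 table.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rows: male, female; columns: yes, no (counts from Tables 5 and 6).
smoking = [[268, 220], [130, 506]]
alcohol = [[250, 237], [275, 357]]

chi2_smoking = pearson_chi2(smoking)   # about 143.52
chi2_alcohol = pearson_chi2(alcohol)   # about 6.76
print(chi2_smoking, p_value_df1(chi2_smoking))
print(chi2_alcohol, p_value_df1(chi2_alcohol))
```

Both statistics exceed the 5% critical value of 3.84 for one degree of freedom, so in both tables the habit is associated with gender, although the association is much stronger for smoking.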

5. Research hypotheses

We have formed the following hypotheses that we are going to test in the current research.

Hypothesis 1: Health status deteriorates with age (both for men and for women);

The logic behind this hypothesis is that the older a person gets, the higher the risk of suffering from serious diseases. Young people tend to be sick less frequently, as they have stronger immunity and are more often involved in physical activities.

Hypothesis 2: Weight has a significant impact on the health status;

Excess weight negatively affects the health of a respondent. As shown in several studies (Kolosnytsina, Berdnikova, 2009; Roshchina, 2015), people with a higher body mass tend to suffer more frequently from heart diseases and other dangerous illnesses.

Hypothesis 3: Women are more likely to lead a healthy life

The hypothesis can be explained by the fact that women are often more concerned about their health. They are more likely than men to follow various diets and take vitamins.

Hypothesis 4: People with a higher level of education are in favor of a healthy lifestyle

The hypothesis can be explained in two ways. First, people with a higher level of education are more conscious and are more likely to pay attention to their health. Second, a higher level of education might be a proxy for higher social status and income, and people with higher income can afford to buy natural food and visit fitness centers.

Hypothesis 5: Marital status plays a significant role in determining a healthy lifestyle. Namely, married people tend to follow healthier lifestyle patterns.

The logic behind this hypothesis is that married people are more aware of their health. Married couples, especially those with children, usually have fewer bad habits such as excessive alcohol consumption and smoking. They also pay attention to their nutrition, for example by consuming less fast food.

To test these hypotheses, we first conduct a cluster analysis of respondents' health status and then build a multinomial model to discover the factors that affect the probability of an individual falling into a particular cluster.

...