Composite Index
1. Meaning of Composite Index
A Composite Index is a single summary measure constructed by combining multiple individual indicators to represent a multidimensional concept that cannot be captured by a single variable.
Examples
Human Development Index (HDI)
Multidimensional Poverty Index (MPI)
Consumer Price Index (CPI)
Composite indices are widely used in economic, social, and development research for comparison across regions or over time.
2. Need for Composite Index
Captures multidimensional phenomena
Simplifies complex information
Enables ranking and comparison
Useful for policy formulation and evaluation
Steps in the Construction of a Composite Index
The construction of a composite index involves systematic and transparent steps to ensure reliability and validity.
Step 1: Conceptual Framework and Objective Definition
The first step is to clearly define:
What is being measured
Why the index is required
The dimensions involved
Example:
For HDI, the concept of human development includes health, education, and income.
Step 2: Selection of Indicators
Relevant indicators are selected for each dimension. Indicators should be:
Relevant
Measurable
Reliable
Comparable across units
Example (HDI):
Health → Life expectancy at birth
Education → Mean years of schooling, Expected years of schooling
Income → GNI per capita
Step 3: Data Collection
Data are collected from:
Census
Surveys
Administrative records
National and international databases
The data should be consistent across time and space.
Step 4: Treatment of Missing Values
Missing data must be handled carefully using methods such as:
Mean substitution
Interpolation
Deletion of indicators (if unavoidable)
Improper handling can distort the index.
Step 5: Normalization (Standardization) of Data
Since indicators are measured in different units, they must be converted to a common scale.
Common normalization methods:
Min–Max normalization
Z-score standardization
Min–Max formula:
$$X^* = \frac{X - X_{min}}{X_{max} - X_{min}}$$
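A minimal sketch (with hypothetical indicator values) showing both normalization methods in Python:

```python
import numpy as np

# Hypothetical values of one indicator (e.g., life expectancy) across five regions
x = np.array([62.0, 68.5, 71.2, 65.0, 74.8])

# Min-max normalization: rescales values to the [0, 1] interval
x_minmax = (x - x.min()) / (x.max() - x.min())

# Z-score standardization: mean 0, standard deviation 1
x_zscore = (x - x.mean()) / x.std()

print(np.round(x_minmax, 3))
print(np.round(x_zscore, 3))
```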
Step 6: Direction of Indicators
Indicators may be:
Positive (higher value = better outcome, e.g., income)
Negative (higher value = worse outcome, e.g., infant mortality)
Negative indicators are transformed so that higher values always indicate better performance.
Step 7: Weight Assignment
Weights reflect the relative importance of indicators.
Methods of weighting:
Equal weighting
Expert judgment
Statistical methods (e.g., PCA)
Example:
HDI assigns equal weight to its three dimensions.
Step 8: Aggregation of Indicators
Normalized and weighted indicators are combined to form the composite index.
Methods:
Arithmetic mean
Geometric mean
HDI uses the geometric mean to reduce substitutability among dimensions.
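The two aggregation rules can be compared with a small sketch; the dimension scores and equal weights below are hypothetical:

```python
import numpy as np

# Hypothetical normalized dimension indices (health, education, income) for one unit
dims = np.array([0.80, 0.65, 0.72])
weights = np.array([1 / 3, 1 / 3, 1 / 3])   # equal weights, as in HDI

# Arithmetic aggregation: weighted average of the dimension indices
arithmetic_index = np.sum(weights * dims)

# Geometric aggregation: weighted geometric mean, which penalises imbalance across dimensions
geometric_index = np.prod(dims ** weights)

print("Arithmetic:", round(arithmetic_index, 3))
print("Geometric: ", round(geometric_index, 3))
```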
Step 9: Sensitivity and Robustness Analysis
This step checks:
Sensitivity to choice of indicators
Sensitivity to weights and aggregation method
Ensures reliability of the index.
Step 10: Interpretation and Validation
The index values are interpreted, compared, and validated using:
Rankings
Time-series comparison
Cross-country or inter-regional analysis
Advantages of Composite Index
Captures multidimensionality
Simplifies complex data
Facilitates comparison
Useful for policy formulation
Limitations of Composite Index
Subjectivity in indicator and weight selection
Data availability constraints
Risk of oversimplification
Sensitivity to methodology
3. Dealing with Missing Values in Composite Index Construction
Missing values can distort index values and rankings. Hence, careful treatment is required.
(a) Deletion Methods
1. Listwise Deletion
Remove observations with missing values
Simple but reduces sample size
Suitable only when missing data are minimal and random
2. Pairwise Deletion
Uses all available data for each indicator
May create inconsistencies across components
(b) Imputation Methods
1. Mean/Median Imputation
Replace missing values with mean or median
Easy to implement
Reduces variability and may bias results
2. Regression Imputation
Predict missing values using other variables
More accurate but model-dependent
3. Multiple Imputation
Generates several plausible values
Accounts for uncertainty
Considered statistically robust
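A minimal sketch of the mean and regression imputation approaches above, using pandas and numpy with hypothetical indicator values:

```python
import numpy as np
import pandas as pd

# Hypothetical indicator data with one missing literacy value
df = pd.DataFrame({
    "literacy": [72.0, 65.5, np.nan, 80.2, 58.9],
    "income":   [41.0, 38.0, 35.5, 52.0, 30.0],
})

# Mean imputation: replace missing values with the column mean (simple, but reduces variance)
mean_imputed = df["literacy"].fillna(df["literacy"].mean())

# Regression imputation: predict the missing value from another, complete indicator
known = df.dropna()
slope, intercept = np.polyfit(known["income"], known["literacy"], deg=1)
missing = df["literacy"].isna()
reg_imputed = df["literacy"].copy()
reg_imputed[missing] = intercept + slope * df.loc[missing, "income"]

print(mean_imputed.round(2).tolist())
print(reg_imputed.round(2).tolist())
```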
(c) Indicator Adjustment Methods
1. Re-weighting
Adjust weights of remaining indicators
Prevents penalising units with missing data
2. Normalisation-Based Substitution
- Use regional or group averages after standardisation
(d) Threshold-Based Exclusion
Exclude indicators or units if missing data exceed a defined limit
Maintains reliability of the index
4. Best Practices
Analyse the pattern of missingness
Use transparent and consistent methods
Conduct sensitivity analysis
Clearly report assumptions and methods
Sources of Financial Data for Money and Capital Markets
Researchers rely on primary and secondary sources of financial data to identify research problems, formulate hypotheses, and conduct empirical analysis.
1. Sources of Financial Data for the Money Market
The money market deals with short-term funds and highly liquid instruments such as treasury bills, call money, commercial paper, and certificates of deposit.
(a) Central Bank Publications
The most important source of money market data is the Reserve Bank of India (RBI).
Key RBI publications include:
Handbook of Statistics on the Indian Economy
RBI Bulletin
Annual Report of RBI
Database on Indian Economy (DBIE)
These provide data on:
Call money rates
Treasury bill yields
Repo and reverse repo rates
Money supply (M1, M2, M3)
Liquidity conditions
(b) Government Sources
Ministry of Finance
Controller General of Accounts
Data includes:
Short-term government borrowings
Treasury bill auctions
Fiscal deficit financing
(c) Financial Institutions and Banks
Commercial banks
Primary dealers
They publish data on:
Interbank lending
Deposit and lending rates
Credit growth
(d) International Sources
International Monetary Fund (IMF)
World Bank
These provide comparable cross-country money market indicators such as:
Interest rates
Inflation
Monetary aggregates
2. Sources of Financial Data for the Capital Market
The capital market deals with long-term financial instruments such as shares, debentures, and bonds.
(a) Stock Exchanges
Major stock exchanges are:
Bombay Stock Exchange (BSE)
National Stock Exchange (NSE)
They provide:
Share prices and indices
Trading volume
Market capitalisation
Volatility measures
(b) Market Regulator
The Securities and Exchange Board of India (SEBI) publishes:
Market surveillance reports
Investor participation data
Mutual fund statistics
Corporate governance data
(c) Corporate Financial Statements
Annual reports
Balance sheets
Profit and loss accounts
These help analyse:
Firm performance
Capital structure
Dividend policy
(d) Financial Databases and Research Institutions
CMIE (Centre for Monitoring Indian Economy)
Stock market databases (firm-level and sectoral data)
(e) International Capital Market Data
World Federation of Exchanges
IMF’s Global Financial Stability Reports
Used for comparative and global market analysis.
3. Role of Financial Data in Identifying Research Issues
Financial data helps researchers to identify and refine research problems in the following ways:
(a) Detecting Trends and Patterns
Rising interest rate volatility may indicate liquidity stress.
Stock market booms or crashes suggest speculative bubbles or structural changes.
(b) Testing Economic Theories
Interest rate data helps test monetary transmission mechanisms.
Stock price behaviour helps test the Efficient Market Hypothesis.
(c) Identifying Market Imperfections
Credit rationing
Information asymmetry
Excess volatility
These issues become visible through empirical financial data.
(d) Policy Evaluation
Impact of monetary policy changes
Effects of financial reforms and deregulation
Performance of capital market regulations
4. Role of Financial Data in Analysis
Financial data enables quantitative and econometric analysis, such as:
(a) Econometric Modelling
Regression analysis
Time-series analysis
Volatility models (ARCH/GARCH)
(b) Risk and Return Analysis
Portfolio analysis
Asset pricing models (CAPM)
Credit risk assessment
(c) Forecasting
Interest rate forecasting
Stock market trend prediction
Business cycle analysis
(d) Cross-Country Comparisons
Financial depth
Market integration
Capital flows and stability
2. Application of Multistage Sampling to Study Poverty Incidence in a Village
Meaning of Multistage Sampling
Multistage sampling is a probability sampling technique in which the sample is selected in successive stages, using smaller and smaller sampling units at each stage. Instead of selecting final units (households) directly, selection is done step-by-step.
This method is especially useful when:
The population is large and scattered
A complete list of households is not readily available
Time and cost constraints exist
Applying Multistage Sampling to Measure Poverty in a Village
Suppose a researcher wants to estimate the incidence of poverty among households in a village or group of villages.
Stage 1: Selection of Region / District
From a state, a district is selected randomly or purposively (depending on research design).
Example: Select one backward district for poverty analysis.
Stage 2: Selection of Villages
From the selected district, a sample of villages is drawn using simple random sampling or probability proportional to size (PPS).
Larger villages may have higher probability of selection.
Stage 3: Selection of Households
From each selected village, a list of households is prepared.
Households are then selected randomly or systematically.
Stage 4: Data Collection
Information is collected on:
Income or consumption expenditure
Employment
Household size
Poverty is assessed using a poverty line (income or consumption-based).
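The stage-wise selection can be sketched in Python; the sampling frame (districts, villages, and household counts) below is purely illustrative, and the PPS draw is simplified to selection with replacement:

```python
import random

random.seed(42)

# Hypothetical sampling frame: district -> village -> number of households (all illustrative)
frame = {
    "District A": {"Village 1": 120, "Village 2": 250, "Village 3": 90},
    "District B": {"Village 4": 300, "Village 5": 150},
}

# Stage 1: select one district at random
district = random.choice(list(frame.keys()))

# Stage 2: select villages with probability proportional to size (simplified: with replacement)
villages = list(frame[district])
sizes = [frame[district][v] for v in villages]
sampled_villages = set(random.choices(villages, weights=sizes, k=2))

# Stage 3: within each selected village, draw a simple random sample of households
for v in sampled_villages:
    households = list(range(1, frame[district][v] + 1))
    sample = random.sample(households, k=20)
    print(district, v, "first sampled households:", sample[:5])
```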
Advantages of Multistage Sampling in Poverty Studies
Cost-effective and feasible
Suitable for rural and scattered populations
Flexible at different stages
Reduces fieldwork burden
Limitations
Sampling error may accumulate at each stage
More complex than single-stage sampling
Requires careful design and execution
Comparison of Operational Procedure: Stratified Sampling vs Multistage Sampling
Meaning of Stratified Sampling
In stratified sampling, the population is first divided into homogeneous sub-groups (strata) based on specific characteristics (e.g., income groups, caste, gender), and then samples are drawn from each stratum.
Comparison Table
| Basis of Comparison | Stratified Sampling | Multistage Sampling |
|---|---|---|
| Basic Principle | Population divided into strata | Sampling done in stages |
| Nature of Groups | Homogeneous within strata | Heterogeneous units at each stage |
| Sampling Stages | Single-stage after stratification | Two or more stages |
| Sampling Frame | Required for entire population | Required only for each stage |
| Selection of Units | Direct selection of final units | Indirect, step-by-step selection |
| Cost and Time | More costly for large populations | Relatively economical |
| Precision | High precision | Slightly lower precision |
| Usefulness | Small, well-defined populations | Large, geographically dispersed populations |
| Example | Sampling poor & non-poor households separately | District → Village → Household |
Operational Differences Explained
Stratified Sampling ensures representation of all important sub-groups and improves accuracy, but requires complete prior information.
Multistage Sampling is operationally simpler for large-scale field surveys but may involve higher sampling error.
Research Methods and Research Methodology
Distinction between Research Methods and Research Methodology
(a) Meaning of Research Methods
Research methods refer to the specific techniques, tools, and procedures used for collecting and analysing data in a research study. They answer the question “How is the research carried out?”
Examples:
Surveys
Interviews
Observation
Statistical analysis
Regression techniques
(b) Meaning of Research Methodology
Research methodology refers to the overall philosophical framework and logic that guides the selection and use of research methods. It explains why certain methods are used and how the research is systematically designed.
It includes:
Research philosophy
Assumptions about reality (ontology)
Nature of knowledge (epistemology)
Research strategy and design
(c) Differences between Research Methods and Research Methodology
| Basis | Research Methods | Research Methodology |
|---|---|---|
| Meaning | Techniques of data collection and analysis | Philosophical framework guiding research |
| Scope | Narrow | Broad |
| Focus | Practical and operational | Theoretical and conceptual |
| Concerned with | Data, tools, procedures | Logic, assumptions, justification |
| Level | Micro-level | Macro-level |
| Example | Interview, questionnaire | Positivism, interpretivism |
(d) Relationship between the Two
Research methodology determines which research methods are appropriate. Thus, methods are embedded within a methodology and cannot be chosen independently of it.
Formulation of a Research Proposal
Step 1: Identification of the Research Problem
Selection of a clear, specific, and researchable problem
Must be relevant to theory, policy, or practice
Avoids vague or overly broad topics
Example:
“Determinants of rural female labour force participation in India”
Step 2: Review of Literature
Study of existing research, theories, and findings
Helps to:
Identify research gaps
Avoid duplication
Refine research questions
Step 3: Statement of Research Objectives and Questions
Clearly states what the study aims to achieve
Objectives should be:
Specific
Measurable
Achievable
Example
To analyse the impact of education on labour participation
To examine regional variations
Step 4: Formulation of Hypotheses (if applicable)
Tentative statements to be tested empirically
Common in quantitative research
Example
H₁: Education has a positive impact on female labour force participation.
Step 5: Research Design
Overall blueprint of the study
Includes:
Type of study (exploratory, descriptive, causal)
Time dimension (cross-sectional or longitudinal)
Step 6: Data Sources and Data Collection Methods
Primary data: surveys, interviews, observations
Secondary data: Census, NSS, NFHS, RBI data
Justification of chosen method is essential.
Step 7: Sampling Design
Definition of:
Target population
Sample size
Sampling technique (random, stratified, purposive)
Step 8: Tools and Techniques of Analysis
Statistical and econometric tools to be used
Examples:
Descriptive statistics
Regression models
Index numbers
Step 9: Scope and Limitations of the Study
Defines boundaries of the research
Acknowledges constraints such as:
Time
Data availability
Methodological limitations
Step 10: Ethical Considerations
Confidentiality of data
Informed consent
Avoidance of plagiarism and bias
Step 11: Expected Contribution of the Study
Academic contribution
Policy relevance
Practical implications
Step 12: Time Schedule and Budget (if required)
Work plan with timelines
Financial requirements for data collection and analysis
Methodologies Used in Interpretive Research
Interpretive research employs several methodologies aimed at understanding subjective meanings and social processes.
(a) Phenomenological Methodology
Focuses on individuals’ lived experiences
Seeks to understand how people perceive and interpret economic realities
Used in studies of poverty, unemployment, and informal labour
Example:
Understanding how households experience poverty rather than merely measuring income levels.
(b) Ethnographic Methodology
Involves long-term immersion in a social setting
Uses participant observation and informal interviews
Common in rural development and informal sector studies
Example:
Studying work culture and survival strategies of street vendors.
(c) Hermeneutic Methodology
Concerned with interpretation of texts and narratives
Includes policy documents, interviews, and historical records
Emphasises context and historical background
Example:
Interpreting development policy documents to understand underlying assumptions about growth and welfare.
(d) Case Study Methodology
In-depth study of a single case or small number of cases
Useful for complex economic and institutional processes
Allows contextual understanding
Example:
A detailed study of one self-help group or cooperative society.
(e) Narrative and Discourse Analysis
Focuses on language, stories, and communication
Examines how economic realities are constructed through discourse
Example:
Analysing how poverty is framed in government reports versus community narratives.
5. Relevance of Interpretive Methodologies in Economics
Interpretive methodologies are particularly useful when:
Quantitative data fails to capture social realities
Human behaviour, institutions, and culture are central
Policy evaluation requires understanding stakeholder perspectives
They complement quantitative methods by providing depth, context, and meaning.
Action Research and Its Application to Reducing Malnourishment among Adolescent Students
1. Meaning of Action Research
Action Research is a participatory and problem-oriented research approach in which the researcher actively intervenes in a real-life situation to bring about change while simultaneously generating knowledge. It combines action, reflection, and research in a cyclical process.
Key features:
Focus on practical problem-solving
Conducted in real social settings
Involves stakeholders (teachers, students, parents, community)
Cyclical process: Plan → Act → Observe → Reflect
2. Objectives of Action Research in the Given Context
The objective of the proposed action research is to:
Identify the extent and causes of malnourishment among adolescent students
Design and implement context-specific interventions
Evaluate outcomes and improve nutritional status
The study is conducted in a rural school in Gujarat, where adolescent malnutrition may be linked to poverty, dietary habits, and lack of awareness.
Key Characteristics of Action Research and Their Role in Local-Level Change
1. Participatory Nature
Action research actively involves community members in:
Problem identification
Data collection
Decision-making
This ensures that the voices of disadvantaged groups are heard and valued, leading to solutions rooted in lived experiences.
2. Problem-Centred and Context-Specific
Unlike abstract research, action research focuses on real, local problems, such as:
Malnutrition
Poor sanitation
Low school attendance
This local relevance increases the effectiveness and acceptance of interventions.
3. Empowerment-Oriented
Participation builds:
Awareness
Confidence
Collective agency
Disadvantaged groups move from being subjects of research to agents of change, strengthening social inclusion.
4. Cyclical and Flexible Process
Action research follows a cycle of:
Planning
Action
Observation
Reflection
This allows continuous learning and correction, which is crucial in complex social settings.
5. Immediate Application of Findings
Findings are not delayed for academic publication but are translated directly into action, making it suitable for urgent local issues.
6. Democratic and Inclusive
It reduces power asymmetries between researchers and participants by promoting:
Dialogue
Mutual learning
Collective ownership
This is especially important for historically excluded groups.
7. Capacity Building
Action research enhances local capabilities by developing:
Problem-solving skills
Leadership
Organisational capacity
This ensures sustainability beyond the research period.
3. Steps in Action Research
Action research proceeds through systematic stages, as explained below.
4. Steps for Data Collection
(a) Identifying the Problem
Preliminary discussions with teachers and health workers
Review of school health records
Identification of symptoms such as low BMI, fatigue, absenteeism
(b) Data Collection Methods
(i) Primary Data
Anthropometric measurements: height, weight, BMI
Structured questionnaires for students on:
Dietary intake
Meal frequency
Awareness of nutrition
Interviews with:
Parents
Teachers
Anganwadi / health workers
Observation of:
Mid-day meal quality
Hygiene practices
(ii) Secondary Data
School health registers
ICDS and NFHS reports
Government nutrition programme guidelines
5. Steps for Data Analysis
(a) Quantitative Analysis
Classification of students as undernourished, normal, or overweight using BMI-for-age
Frequency and percentage analysis of:
Nutrient intake
Meal skipping
Anaemia symptoms
(b) Qualitative Analysis
Thematic analysis of interviews
Identification of key causes:
Inadequate diet
Cultural food practices
Economic constraints
Lack of nutrition awareness
6. Developing and Implementing Action Plans
Based on findings, targeted interventions are designed.
(a) Nutritional Interventions
Improvement in mid-day meal quality
Inclusion of:
- Pulses, eggs, milk, green vegetables
Coordination with local health departments
(b) Awareness and Behavioural Interventions
Nutrition education sessions for students
Workshops for parents on balanced diets
Posters and charts on adolescent nutrition
(c) Health Interventions
Regular health check-ups
Iron and folic acid supplementation
Deworming programmes
7. Observation and Evaluation
Monitoring changes in:
BMI and weight
Attendance and participation
Feedback from students and teachers
Comparison of pre- and post-intervention data
8. Reflection and Follow-up
Evaluation of effectiveness of actions
Identification of gaps and improvements
Modification of strategies if required
Planning the next action research cycle
9. Advantages of Action Research in This Context
Directly addresses a real social problem
Encourages community participation
Produces immediate and usable outcomes
Enhances policy and programme effectiveness
Difference between Research Design and Research Methods
(a) Research Design
Research design refers to the overall plan or blueprint of a research study. It specifies what type of study is to be conducted, how data will be collected, and how analysis will be carried out.
It answers the question: “What is the overall strategy of the research?”
Examples:
Exploratory research design
Descriptive research design
Experimental research design
(b) Research Methods
Research methods are the specific techniques or procedures used for data collection and analysis within the framework of a research design.
They answer the question: “How will data be collected and analysed?”
Examples:
Survey method
Interview method
Statistical analysis
(c) Differences between Research Design and Research Methods
| Basis | Research Design | Research Methods |
|---|---|---|
| Meaning | Overall research plan | Techniques used in research |
| Nature | Conceptual and strategic | Operational and practical |
| Scope | Broad | Narrow |
| Focus | Structure of the study | Execution of the study |
| Sequence | Decided first | Follow the design |
Methods of Univariate Data Analysis
Meaning of Univariate Data Analysis
Univariate data analysis refers to the analysis of a single variable at a time. Its main objective is to describe, summarise, and understand the distribution of that variable.
Methods of Univariate Data Analysis
(a) Frequency Distribution
Data is arranged into classes or categories
Shows how often each value occurs
Helps identify concentration and spread
(b) Measures of Central Tendency
These indicate the central or typical value of the data.
Mean – arithmetic average
Median – middle value
Mode – most frequently occurring value
(c) Measures of Dispersion
These measure the spread or variability of data.
Range
Variance
Standard deviation
(d) Graphical Presentation
Used for visual representation of data:
Bar diagrams
Pie charts
Histograms
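A short illustrative sketch of these univariate summaries in pandas, using hypothetical expenditure data:

```python
import pandas as pd

# Hypothetical monthly household expenditure data (a single variable)
expenditure = pd.Series([1200, 1500, 1500, 1800, 2100, 2500, 3200])

# Frequency distribution over class intervals
print(pd.cut(expenditure, bins=[1000, 2000, 3000, 4000]).value_counts().sort_index())

# Measures of central tendency
print("Mean:", expenditure.mean())
print("Median:", expenditure.median())
print("Mode:", expenditure.mode()[0])

# Measures of dispersion
print("Range:", expenditure.max() - expenditure.min())
print("Standard deviation:", round(expenditure.std(), 2))
```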
Steps in Analysing Qualitative Data
1. Meaning of Qualitative Data Analysis
Qualitative data analysis refers to the systematic process of organising, interpreting, and deriving meaning from non-numerical data such as interviews, observations, field notes, and documents.
2. Steps for Analysing Qualitative Data
(a) Data Preparation and Organisation
Transcribing interviews and field notes
Organising data into texts or documents
Reading data repeatedly for familiarity
(b) Coding
Assigning labels or codes to meaningful segments of data
Helps in identifying key ideas, concepts, and patterns
(c) Categorisation
Grouping similar codes into broader categories
Reduces complexity and aids interpretation
(d) Theme Identification
Identifying recurring themes and relationships
Themes represent important patterns in the data
(e) Interpretation
Linking themes with research questions
Understanding meanings in social and economic contexts
(f) Validation
Checking consistency of interpretations
Use of triangulation and participant feedback
Analysis of Findings in Grounded Theory
Meaning of Grounded Theory
Grounded Theory is a qualitative research approach in which theory is developed inductively from data, rather than testing pre-existing theories.
Steps in Analysing Findings from Grounded Theory
(a) Open Coding
Breaking data into small units
Identifying concepts and initial categories
(b) Axial Coding
Linking categories and sub-categories
Identifying causal relationships and conditions
(c) Selective Coding
Identifying a core category
Integrating all categories around this central theme
(d) Constant Comparative Method
Continuous comparison of data with emerging categories
Refines concepts and strengthens theory
(e) Theory Development
Formulating a substantive theory grounded in data
Explaining processes, actions, or interactions
Multicollinearity: Meaning, Detection and Implications
1. Meaning of Multicollinearity
Multicollinearity refers to a situation in a multiple regression model where two or more independent (explanatory) variables are highly correlated with each other. As a result, it becomes difficult to isolate the individual effect of each explanatory variable on the dependent variable.
Multicollinearity can be:
Perfect multicollinearity: exact linear relationship (model cannot be estimated)
Imperfect (high) multicollinearity: strong but not exact correlation (model is estimable but problematic)
2. Detection of Multicollinearity
(a) Correlation Matrix
- High pairwise correlation coefficients (close to ±1) among explanatory variables indicate multicollinearity.
(b) Variance Inflation Factor (VIF)
Measures how much the variance of a coefficient is inflated due to multicollinearity.
Rule of thumb:
- VIF > 10 → serious multicollinearity
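A minimal sketch of VIF-based detection using statsmodels, with hypothetical and deliberately collinear data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)

# Hypothetical regressors: x2 is constructed to be highly correlated with x1
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF for each explanatory variable; values above about 10 signal serious multicollinearity
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, i), 2))
```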
(c) T-statistics and R² Paradox
High R² but statistically insignificant individual coefficients
Indicates explanatory variables move together
(d) Auxiliary Regression
Regress one independent variable on the others
A high R² suggests multicollinearity
(e) Instability of Coefficients
- Coefficient estimates change significantly with small changes in data or model specification
3. Implications of Multicollinearity
(a) Inflated Standard Errors
- Makes coefficient estimates less precise
(b) Insignificant t-values
- Important variables may appear statistically insignificant
(c) Unreliable Coefficient Estimates
- Signs and magnitudes of coefficients may be counterintuitive
(d) Difficulty in Interpretation
- Hard to assess the individual impact of explanatory variables
(e) Reduced Predictive Reliability
- While overall fit (R²) may be high, predictions become less reliable
Difference between Realism and Instrumentalism and Evaluation of Milton Friedman’s Instrumentalism
1. Realism and Instrumentalism: Meaning
(a) Realism
Realism is a philosophical position which holds that:
Economic assumptions should be realistic and descriptively accurate
Models should reflect actual behaviour and real-world mechanisms
The truth or realism of assumptions matters for scientific explanation
In realism, theories are judged by:
Plausibility of assumptions
Explanatory power
Correspondence with real economic behaviour
(b) Instrumentalism
Instrumentalism argues that:
The realism of assumptions is irrelevant
What matters is the predictive accuracy of a theory
Theories are merely tools (instruments) for prediction, not descriptions of reality
Thus, even unrealistic assumptions are acceptable if predictions are accurate.
2. Differences between Realism and Instrumentalism
| Basis | Realism | Instrumentalism |
|---|---|---|
| View of assumptions | Must be realistic | Can be unrealistic |
| Purpose of theory | Explanation + prediction | Prediction only |
| Focus | Truth and causality | Usefulness |
| Relation to reality | Descriptive | Pragmatic |
| Evaluation criterion | Realism + accuracy | Predictive success |
Milton Friedman’s Instrumentalist Approach
The most influential proponent of instrumentalism in economics is Milton Friedman.
Friedman’s Core Argument
In his essay “The Methodology of Positive Economics”, Friedman argued that:
Economic theories should not be judged by the realism of assumptions
Unrealistic assumptions are common and unavoidable
A theory is valid if it yields accurate predictions
Example:
Firms are assumed to maximize profits, even if real firms do not consciously do so
The assumption is justified if it predicts firm behaviour correctly
4. Critical Evaluation of Friedman’s Instrumentalism
(a) Strengths
Practical and pragmatic
Allows economists to build simple models
Encourages testable predictions
Promotes empirical testing
- Shifts focus from philosophical debates to observable outcomes
Useful in policy analysis
- Predictive models can guide decision-making even if assumptions are simplified
(b) Criticisms
Neglect of explanation
Accurate prediction does not guarantee correct explanation
Wrong mechanisms may produce right predictions temporarily
Weakens causal understanding
- Unrealistic assumptions may hide important institutional and behavioural factors
Problem in complex social systems
In economics, predictions often fail due to changing contexts
Unrealistic assumptions reduce robustness
Limits critical scrutiny
- If assumptions are ignored, theories become difficult to challenge meaningfully
Usefulness of Analysis of Variance (ANOVA) in the Regression Model
1. Meaning of ANOVA in Regression
In the context of a regression model, Analysis of Variance (ANOVA) is a statistical technique used to decompose the total variation in the dependent variable into components attributable to the regression model and to random error. It helps in assessing the overall significance and explanatory power of the regression equation.
2. Decomposition of Variance in Regression
ANOVA divides the Total Sum of Squares (TSS) into:
$$\text{TSS} = \text{ESS} + \text{RSS}$$
where:
TSS (Total Sum of Squares): total variation in the dependent variable
ESS (Explained Sum of Squares): variation explained by the regression model
RSS (Residual Sum of Squares): unexplained variation (error)
3. Usefulness of ANOVA in Regression Analysis
(a) Testing Overall Significance of the Model
ANOVA provides the F-test, which tests the null hypothesis:
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$
It checks whether the explanatory variables jointly influence the dependent variable
Helps determine whether the regression model is meaningful
(b) Measuring Explanatory Power
From ANOVA, the coefficient of determination (R²) is obtained:
$$R^2 = \frac{\text{ESS}}{\text{TSS}}$$
Indicates the proportion of variation explained by the model
Higher R² implies better model fit
(c) Comparison of Alternative Models
ANOVA allows comparison between restricted and unrestricted models
Useful in model selection and specification testing
(d) Identifying Unexplained Variation
RSS highlights the magnitude of random error
Helps in diagnosing model inadequacy or omitted variables
(e) Basis for Further Diagnostic Tests
ANOVA provides a framework for:
Testing significance in multiple regression
Understanding goodness of fit before interpreting individual coefficients
4. ANOVA Table in Regression (Illustrative)
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Regression | ESS | k | ESS / k |
| Residual | RSS | n − k − 1 | RSS / (n − k − 1) |
| Total | TSS | n − 1 | — |
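The quantities in the table above can be obtained directly from a fitted regression; the sketch below uses statsmodels with simulated (hypothetical) data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulated (hypothetical) data: y depends on two regressors plus random error
n = 100
X = rng.normal(size=(n, 2))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()

# ANOVA decomposition of variation in the dependent variable
tss = model.centered_tss   # Total Sum of Squares
ess = model.ess            # Explained (regression) Sum of Squares
rss = model.ssr            # Residual Sum of Squares

print("TSS:", round(tss, 2), "= ESS + RSS:", round(ess + rss, 2))
print("R-squared (ESS/TSS):", round(model.rsquared, 3))
print("F-statistic:", round(model.fvalue, 2), "p-value:", round(model.f_pvalue, 4))
```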
Lorenz Curve as a Tool for Measuring Inequality
1. Meaning of the Lorenz Curve
The Lorenz Curve is a graphical tool used to measure inequality in the distribution of income or wealth in an economy. It shows the relationship between:
Cumulative percentage of population (on the X-axis), arranged from poorest to richest
Cumulative percentage of income or wealth (on the Y-axis)
The farther the Lorenz curve lies from the line of equality, the greater is the inequality.
2. Lorenz Curve under Different Cases
(a) Perfect Equality
Income is equally distributed among all individuals.
Each percentage of population receives the same percentage of income.
Representation:
The Lorenz curve coincides with the 45° line, known as the line of perfect equality.
Example:
20% of population earns 20% of income
50% of population earns 50% of income
Interpretation:
No inequality exists.
Gini coefficient = 0
(b) Perfect Inequality
- One individual (or household) receives all the income, while others receive nothing.
Representation:
The Lorenz curve lies along the horizontal axis until the last individual.
At 100% population, income jumps suddenly to 100%.
Interpretation:
Extreme inequality exists.
Gini coefficient = 1
(c) Relative Inequality
Income is unequally distributed, but not perfectly unequal.
This is the most common real-world case.
Representation:
The Lorenz curve lies between the line of perfect equality and the curve of perfect inequality.
The greater the bow away from the equality line, the higher the inequality.
Interpretation:
Indicates partial inequality.
Allows comparison between:
Regions
Time periods
Countries
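As a small illustration with hypothetical household incomes, the Lorenz curve coordinates and the Gini coefficient can be computed as follows:

```python
import numpy as np

# Hypothetical household incomes
income = np.array([5.0, 10.0, 15.0, 20.0, 50.0])
income_sorted = np.sort(income)

# Cumulative population and income shares, starting from the origin (0, 0)
cum_pop = np.concatenate(([0.0], np.arange(1, len(income_sorted) + 1) / len(income_sorted)))
cum_income = np.concatenate(([0.0], np.cumsum(income_sorted) / income_sorted.sum()))

# Gini coefficient = 1 - 2 * (area under the Lorenz curve), via the trapezoidal rule
area = np.sum((cum_income[1:] + cum_income[:-1]) / 2 * np.diff(cum_pop))
gini = 1 - 2 * area

print(list(zip(cum_pop.round(2), cum_income.round(2))))  # Lorenz curve coordinates
print("Gini coefficient:", round(gini, 3))               # 0 = equality, 1 = extreme inequality
```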
3. Importance of the Lorenz Curve
Simple and visual measure of inequality
Helps in comparing income distributions
Basis for calculating the Gini coefficient
Widely used in economic and development studies
Why is the Quasi-Participant Method Preferred over Simple Observation for Data Collection?
1. Meaning of Quasi-Participant Method
The quasi-participant method is a qualitative data collection approach in which the researcher partially participates in the social setting being studied while maintaining analytical distance. The researcher interacts with participants but does not become a full member of the group.
2. Meaning of Simple Observation
In simple observation, the researcher remains a detached observer, recording behaviour without interacting with participants. The role is passive, and understanding is limited to what is externally visible.
3. Reasons Why Quasi-Participant Method is Preferred
(a) Deeper Understanding of Social Reality
Quasi-participation allows the researcher to understand meanings, motivations, and perceptions
Observation alone captures only surface behaviour
(b) Better Contextual Interpretation
Social and economic actions are context-dependent
Partial participation helps interpret actions within cultural, institutional, and social contexts
(c) Access to Insider Information
Interaction builds rapport and trust
Participants may share experiences and explanations not visible through observation
(d) Reduced Observer Bias
Pure observation may lead to misinterpretation of actions
Engagement allows clarification and verification of observed behaviour
(e) Captures Dynamic Processes
Economic behaviour (e.g., labour relations, informal markets) involves processes over time
Quasi-participation captures changes, negotiations, and adaptations better than static observation
(f) More Suitable for Development and Institutional Studies
Useful in studying:
Poverty
Informal sector
Rural institutions
These require understanding lived experiences, not just visible actions
4. Limitations of Simple Observation
Limited insight into intentions and meanings
Risk of superficial conclusions
Cannot explain why people behave in a certain way
Hierarchical and Non-Hierarchical Methods of Clustering
Meaning of Clustering
Clustering is a multivariate statistical technique used to group observations such that:
objects within a cluster are similar, and
objects between clusters are dissimilar,
based on selected variables and a distance/similarity measure.
1. Hierarchical Clustering Methods
Definition
Hierarchical clustering creates a nested sequence of clusters arranged in the form of a tree structure (dendrogram). Once a cluster is formed, it cannot be undone.
Types
Agglomerative (bottom-up)
Each observation starts as a separate cluster
Clusters are progressively merged
Most commonly used
Divisive (top-down)
All observations start in one cluster
Clusters are progressively split
Computationally expensive and less common
Common linkage criteria
Single linkage (nearest neighbour)
Complete linkage (farthest neighbour)
Average linkage
Ward’s method (minimises within-cluster variance)


Features
Number of clusters not required in advance
Produces a dendrogram for visual interpretation
Sensitive to outliers and noise
Computationally intensive for large samples
2. Non-Hierarchical (Partitioning) Clustering Methods
Definition
Non-hierarchical clustering divides the data into a pre-specified number of clusters (k) and allows reallocation of observations to improve cluster quality.
Common methods
K-means clustering
K-medoids
ISODATA


Features
Number of clusters must be specified beforehand
Observations can move between clusters
Efficient for large datasets
Sensitive to initial seed selection
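Both families of methods can be sketched briefly with scipy and scikit-learn; the district-level data below are simulated for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Hypothetical standardized district-level indicators (e.g., literacy rate, poverty rate)
X = rng.normal(size=(30, 2))

# Hierarchical (agglomerative) clustering with Ward's linkage; the dendrogram is cut into 3 clusters
Z = linkage(X, method="ward")
hier_labels = fcluster(Z, t=3, criterion="maxclust")

# Non-hierarchical k-means clustering: the number of clusters must be specified in advance
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("Hierarchical labels:", hier_labels[:10])
print("K-means labels:     ", km.labels_[:10])
```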
3. Differences between Hierarchical and Non-Hierarchical Clustering
| Basis | Hierarchical Clustering | Non-Hierarchical Clustering |
|---|---|---|
| Nature | Nested, tree-based | Partition-based |
| Number of clusters | Not pre-specified | Must be pre-specified |
| Reallocation | Not possible | Possible |
| Output | Dendrogram | Final cluster membership |
| Computational cost | High | Relatively low |
| Suitability | Small datasets | Large datasets |
| Sensitivity to outliers | High | Moderate |
| Interpretation | Visual and intuitive | Less visual |
4. Criteria for Choosing a Clustering Method
1. Research Objective
Exploratory analysis → Hierarchical clustering
Classification or segmentation → Non-hierarchical clustering
2. Size of Dataset
Small samples (e.g., village surveys, pilot studies) → Hierarchical
Large samples (e.g., NSS, NFHS datasets) → Non-hierarchical
3. Knowledge of Number of Clusters
Unknown number of groups → Hierarchical
Known or policy-driven grouping (e.g., poor vs non-poor) → Non-hierarchical
4. Need for Interpretability
If visual interpretation and structure matter → Hierarchical
If efficiency and final grouping matter → Non-hierarchical
5. Presence of Outliers
- If data has many outliers → Prefer Non-hierarchical methods
6. Computational Constraints
Limited computing power → Non-hierarchical
Rich computing resources → Hierarchical possible
5. Application in Economic Research
Hierarchical clustering:
Regional development analysis
Typology of states or districts
Exploratory poverty profiling
Non-hierarchical clustering:
Consumer segmentation
Labour market classification
Credit risk grouping
National Family Health Survey (NFHS)
The National Family Health Survey (NFHS) is one of India’s most comprehensive and authoritative large-scale household survey databases on population, health, and nutrition. It provides reliable, nationally representative data crucial for public health planning, policy formulation, and academic research.
NFHS is conducted under the stewardship of the International Institute for Population Sciences with support from the Ministry of Health and Family Welfare (MoHFW), Government of India.
2. NFHS as a Public Health Database
Meaning and Scope
NFHS is a repeated cross-section survey conducted at regular intervals to collect data on:
Population and demographic indicators
Health and nutrition outcomes
Maternal and child health
Reproductive health
Disease prevalence
Health service utilisation
NFHS rounds include:
NFHS-1 (1992–93)
NFHS-2 (1998–99)
NFHS-3 (2005–06)
NFHS-4 (2015–16)
NFHS-5 (2019–21)
Key Features of NFHS Database
National and Sub-national Coverage
Representative at national, state, and district levels
Enables regional and inter-district comparisons
Large Sample Size
Covers millions of individuals and households
Enhances statistical reliability
Standardised Methodology
Uniform questionnaires across states
Allows comparison over time
Rich Health Indicators
Fertility, mortality, family planning
Child nutrition (stunting, wasting, underweight)
Anaemia, obesity, hypertension
Immunisation and maternal care
Gender-disaggregated Data
- Special focus on women, children, and adolescents
3. NFHS Variables Relevant to Public Health
(a) Maternal and Child Health
Antenatal care
Institutional deliveries
Infant and child mortality
Breastfeeding practices
(b) Nutrition and Anthropometry
Height-for-age (stunting)
Weight-for-height (wasting)
BMI, anaemia levels
(c) Disease Burden
Hypertension
Diabetes
HIV knowledge
Tuberculosis awareness
(d) Health Infrastructure and Access
Sanitation and drinking water
Health insurance coverage
Utilisation of public vs private healthcare
4. Usefulness of NFHS for Researchers
1. Evidence-Based Research
NFHS provides high-quality secondary data, enabling researchers to conduct:
Public health studies
Demographic analysis
Nutritional epidemiology
Health inequality research
2. Policy Evaluation
Researchers use NFHS to:
Evaluate government programmes such as:
National Health Mission (NHM)
POSHAN Abhiyaan
Janani Suraksha Yojana
Assess outcomes before and after policy interventions
3. Study of Health Inequalities
NFHS data allows analysis across:
Income groups
Caste and religion
Rural–urban divide
Gender and regional disparities
4. Time-Series and Trend Analysis
Since NFHS is conducted periodically, researchers can:
Study changes in health indicators over time
Analyse long-term improvements or setbacks in public health
5. Micro-level Econometric Analysis
NFHS is widely used for:
Regression analysis
Logistic and probit models
Impact evaluation studies
Examples
Determinants of child malnutrition
Effect of female education on fertility
Link between sanitation and child health
6. Interdisciplinary Research
NFHS supports research in:
Economics
Public health
Sociology
Gender studies
Development studies
7. International Comparability
NFHS follows global standards (DHS framework), allowing:
Cross-country comparisons
Global health benchmarking
5. Limitations of NFHS (Brief)
Mostly cross-sectional in nature
Limited scope for causal inference
Self-reported health data may involve recall bias
Despite these, NFHS remains the most reliable public health dataset in India.
Comparative Overview of Verification and Falsification
1. Introduction
The debate between verification and falsification lies at the heart of the philosophy of science and directly influences how research hypotheses are formulated, tested, and evaluated in economics and social sciences. These approaches represent two contrasting views of scientific knowledge and progress.
2. Verification Principle
Meaning
The verification principle holds that a scientific statement is meaningful only if it can be empirically verified through observation or experiment.
This view is associated with logical positivism, particularly the Vienna Circle.
Core Idea
A theory is scientific if repeated observations confirm it.
Key Features
Emphasis on induction
Knowledge grows through accumulation of confirming evidence
Statements must be empirically observable
Unobservable or metaphysical statements are considered meaningless
Example (Economics)
“Increase in income leads to increase in consumption.”
If repeated data observations support this, the theory is considered verified.
Limitations of Verification
Problem of induction:
No number of positive observations can conclusively prove a universal law.
Cannot handle counter-examples adequately.
Leads to confirmation bias, where researchers seek only supporting evidence.
3. Falsification Principle
Meaning
The principle of falsification, proposed by Karl Popper, argues that a theory is scientific not because it can be verified, but because it can be falsified.
Core Idea
A theory is scientific if it makes risky predictions that could, in principle, be proven false.
Key Features
Emphasis on deduction
Scientific knowledge grows through conjectures and refutations
One counter-example is sufficient to reject a theory
Clear demarcation between science and non-science
Example (Economics)
“Minimum wages always reduce employment.”
If even one credible empirical case contradicts this, the universal claim is falsified.
Strengths of Falsification
Avoids the problem of induction
Encourages critical testing
Promotes scientific rigor and objectivity
Criticisms
In practice, theories are rarely rejected outright due to:
Measurement errors
Auxiliary assumptions
Social sciences often deal with probabilistic laws, not strict universals
4. Verification vs Falsification: A Comparative Overview
| Basis | Verification | Falsification |
|---|---|---|
| Philosophical base | Logical Positivism | Critical Rationalism |
| Key thinker | Vienna Circle | Karl Popper |
| Logic used | Induction | Deduction |
| Criterion of science | Confirmability | Falsifiability |
| Role of evidence | Accumulates support | Seeks refutation |
| Status of theory | Accepted if verified | Accepted until falsified |
| View of progress | More confirmations | Elimination of false theories |
| Risk attitude | Conservative | Risk-embracing |
Popperian View of Verisimilitude
Meaning of Verisimilitude
Verisimilitude means truth-likeness or closeness to the truth.
Popper acknowledged that:
Scientific theories are never perfectly true
Yet, science progresses by developing theories that are closer to the truth
Popper’s Argument
Even when a theory is falsified, it may still:
Explain more facts
Make more precise predictions
Have greater empirical content than earlier theories
Thus, a new theory can be false yet better than an older one.
Example
Newtonian mechanics is false at relativistic speeds
Yet it is more truth-like than Aristotelian physics
Einstein’s theory is even closer to truth
How Verisimilitude Enables Scientific Progress
Science progresses without certainty
Replacement theories have:
Greater explanatory power
Higher falsifiability
Wider empirical scope
Criticism of Popper’s Verisimilitude
Early formulations faced logical difficulties
Measuring “closeness to truth” is problematic
In social sciences, truth-likeness is often context-dependent
Despite this, verisimilitude remains a powerful idea explaining progress without verification.
6. Relevance to Economic Research
Econometric models are tested, not verified
Hypotheses are retained until falsified
Competing theories are compared based on:
Predictive power
Empirical adequacy
Explanatory scope
Thus, economics largely follows a Popperian methodological stance in practice.
Correspondence Analysis for analysing associations
Correspondence Analysis (CA) is a multivariate statistical technique used to analyse and visually represent associations between categories of qualitative variables. It is especially useful when data are presented in the form of contingency tables.
2. Usefulness of Correspondence Analysis in Analysing Associations
1. Analysis of Association
Correspondence analysis helps in:
Identifying patterns of association between row and column categories
Measuring the degree of similarity or dissimilarity among categories
Categories located closer in the graphical display indicate stronger association.
2. Graphical Representation
CA provides a low-dimensional map (usually two-dimensional) where:
Rows and columns are displayed simultaneously
Associations are visually interpreted
This is particularly useful for exploratory data analysis.
3. Reduction of Data Complexity
Large contingency tables are simplified into:
A few principal dimensions
Without significant loss of information
This helps researchers interpret complex categorical relationships easily.
4. Detection of Structure and Profiles
Correspondence analysis identifies:
Row profiles (distribution of categories across columns)
Column profiles (distribution across rows)
This helps uncover hidden structures in categorical data.
5. Application in Social and Economic Research
CA is widely used in:
Consumer preference analysis
Poverty and deprivation studies
Occupational structure analysis
Education and health surveys
3. Applicability to Categorical Variables
Yes, correspondence analysis applies specifically to categorical variables.
It is designed for nominal and ordinal variables
Works on frequency data from:
Two-way tables
Multi-way contingency tables
Unlike regression analysis, CA does not require numerical measurement of variables.
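The core computation of simple correspondence analysis can be sketched with numpy on a hypothetical two-way table; this is a bare-bones illustration of the method, not a full CA implementation:

```python
import numpy as np

# Hypothetical contingency table: rows = income groups, columns = preferred savings instrument
N = np.array([[30, 10,  5],
              [20, 25, 15],
              [ 5, 15, 35]], dtype=float)

P = N / N.sum()              # correspondence matrix
r = P.sum(axis=1)            # row masses
c = P.sum(axis=0)            # column masses

# Standardized residuals and their singular value decomposition
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates: categories plotted close together indicate stronger association
row_coords = (U * sv) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]

print("Share of inertia by dimension:", np.round(sv**2 / (sv**2).sum(), 3))
print("Row coordinates (first 2 dims):\n", np.round(row_coords[:, :2], 3))
print("Column coordinates (first 2 dims):\n", np.round(col_coords[:, :2], 3))
```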
4. Advantages
Non-parametric in nature
No assumption of normality
Suitable for survey and census data
Complements chi-square tests by adding interpretation
5. Limitations (Brief)
Mainly exploratory
Sensitive to rare categories
Interpretation may be subjective
Canonical Correlation Analysis (CCA) help analyse interdisciplinary constructs
1. Introduction
Canonical Correlation Analysis (CCA) is a multivariate statistical technique used to study the relationship between two sets of variables simultaneously. Unlike simple correlation or regression, which examine relationships between individual variables, CCA captures the overall association between two multidimensional constructs, making it especially useful for interdisciplinary research.
2. Meaning of Canonical Correlation Analysis
CCA finds:
Linear combinations of variables in the first set (canonical variate U), and
Linear combinations of variables in the second set (canonical variate V)
such that the correlation between U and V is maximised.
$$U = a_1X_1 + a_2X_2 + \dots + a_pX_p$$
$$V = b_1Y_1 + b_2Y_2 + \dots + b_qY_q$$
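A minimal sketch using scikit-learn's CCA with simulated (hypothetical) socio-economic and health variables:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(3)
n = 200

# Hypothetical set 1: socio-economic variables (income, education, employment)
X = rng.normal(size=(n, 3))

# Hypothetical set 2: health variables, constructed to share a common factor with X
latent = X @ np.array([0.6, 0.3, 0.1])
Y = np.column_stack([
    latent + rng.normal(scale=0.5, size=n),    # e.g., a nutrition score
    -latent + rng.normal(scale=0.5, size=n),   # e.g., an illness frequency measure
    rng.normal(size=n),                        # an unrelated health variable
])

# Fit CCA and extract the first pair of canonical variates U and V
cca = CCA(n_components=1).fit(X, Y)
U, V = cca.transform(X, Y)

print("First canonical correlation:", round(np.corrcoef(U[:, 0], V[:, 0])[0, 1], 3))
```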
3. Why CCA is Useful for Interdisciplinary Constructs
Interdisciplinary research often involves:
Multiple variables from different disciplines
Concepts that cannot be captured by a single indicator
CCA helps by:
1. Capturing Multidimensional Relationships
It analyses sets of variables together, not one-to-one relationships.
2. Integrating Different Disciplines
Allows joint analysis of:
Economic variables
Social, psychological, health, or environmental variables
within a single framework.
3. Reducing Complexity
Multiple correlated variables are reduced to a few canonical functions, making interpretation manageable.
4. Avoiding Multiple Regression Problems
Instead of running many regressions, CCA provides a single comprehensive association measure.
4. Simple Interdisciplinary Example
Example: Relationship between Socio-economic Status and Health Outcomes
Set 1: Economic Variables
Income
Education
Employment status
Set 2: Health Variables
Body Mass Index (BMI)
Anaemia status
Frequency of illness
Using CCA:
A composite socio-economic index is created from income, education, and employment.
A composite health index is created from BMI, anaemia, and illness frequency.
CCA estimates the strength of association between these two indices.
Interpretation
A high canonical correlation indicates that better socio-economic conditions are strongly associated with better health outcomes.
Helps economists, sociologists, and public-health researchers draw integrated conclusions.
5. Applications in Research
Health economics
Education and labour market studies
Environment–economy linkages
Development and welfare analysis
6. Limitations (Brief)
Requires large sample sizes
Interpretation can be complex
Assumes linear relationships
Evaluation of Nominal, Ordinal, Interval and Ratio Scale Variables & Applicable Measures of Central Tendency
Measurement scales classify variables based on the nature of information they contain and the type of mathematical operations permitted on them. The four commonly used scales—nominal, ordinal, interval, and ratio—differ in terms of ordering, distance, and origin, which in turn determines the appropriate measure of central tendency.
2. Evaluation on Common Parameters
| Parameter | Nominal Scale | Ordinal Scale | Interval Scale | Ratio Scale |
|---|---|---|---|---|
| Nature of data | Qualitative | Qualitative | Quantitative | Quantitative |
| Classification | Yes | Yes | Yes | Yes |
| Ordering | No | Yes | Yes | Yes |
| Equal intervals | No | No | Yes | Yes |
| True zero | No | No | No | Yes |
| Meaningful ratios | No | No | No | Yes |
| Examples | Gender, caste | Income groups, ranks | Temperature (°C) | Income, age, expenditure |
1. Nominal Scale Variables
Nominal variables classify data into distinct categories without any order or ranking. The numbers or labels assigned are purely symbolic.
Key features
No ordering
No arithmetic operations possible
Only equality/inequality can be checked
Example (Economics)
Type of occupation: farmer, labourer, self-employed
Measure of central tendency: Mode
2. Ordinal Scale Variables
Ordinal variables classify data into categories that have a meaningful order, but the distance between categories is not equal or known.
Key features
Ranking is possible
Differences are not measurable
Arithmetic operations not meaningful
Example (Economics)
Income groups: low, middle, high
Measure of central tendency: Median and Mode
3. Interval Scale Variables
Interval variables have ordered categories with equal intervals, but they lack a true zero, so ratios are not meaningful.
Key features
Order and equal spacing
Zero is arbitrary
Differences are meaningful, ratios are not
Example (Economics)
Consumer Price Index (CPI)
Measure of central tendency: Mean, Median, Mode
4. Ratio Scale Variables
Ratio variables possess all the properties of interval scales and additionally have a true zero, making ratios meaningful.
Key features
Order, equal intervals, true zero
All arithmetic operations possible
Example (Economics)
Income, consumption expenditure
Measure of central tendency: Mean, Median, Mode
3. Measures of Central Tendency Applicable
(a) Nominal Scale
Only mode is applicable
Mean and median are meaningless because:
No numerical value
No ordering
Example: Most common occupation in a village
✔ Applicable: Mode only
(b) Ordinal Scale
Median and mode are applicable
Mean is not appropriate due to:
- Lack of equal intervals
Example: Poverty categories (low, medium, high)
✔ Applicable: Median, Mode
(c) Interval Scale
Mean, median, and mode are applicable
Ratios are meaningless due to absence of true zero
Example: Temperature in Celsius
✔ Applicable: Mean, Median, Mode
(d) Ratio Scale
All measures of central tendency are applicable
Allows meaningful comparison using ratios
Example: Monthly income, consumption expenditure
✔ Applicable: Mean, Median, Mode
4. Summary Table: Scale vs Central Tendency
| Scale | Mode | Median | Mean |
|---|---|---|---|
| Nominal | ✔ | ✘ | ✘ |
| Ordinal | ✔ | ✔ | ✘ |
| Interval | ✔ | ✔ | ✔ |
| Ratio | ✔ | ✔ | ✔ |
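A small pandas sketch, with hypothetical survey data, showing which measure is computed at each scale:

```python
import pandas as pd

# Hypothetical survey data with variables measured on different scales
df = pd.DataFrame({
    "occupation": ["farmer", "labourer", "farmer", "self-employed", "farmer"],      # nominal
    "income_group": pd.Categorical(["low", "middle", "high", "low", "middle"],
                                   categories=["low", "middle", "high"],
                                   ordered=True),                                    # ordinal
    "monthly_income": [8000, 15000, 32000, 9000, 14000],                             # ratio
})

print("Mode (nominal):", df["occupation"].mode()[0])
print("Median category (ordinal):",
      df["income_group"].sort_values().iloc[len(df) // 2])
print("Mean (ratio):", df["monthly_income"].mean())
```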
Mixed Methods Research
1. Meaning of Mixed Methods Research
Mixed Methods Research is an approach that integrates both quantitative and qualitative methods within a single study to gain a more comprehensive understanding of a research problem.
It combines:
Quantitative methods (numbers, measurement, statistical analysis), and
Qualitative methods (meanings, experiences, perceptions).
The core idea is that numbers explain “how much”, while qualitative insights explain “why and how.”
2. Key Characteristics
Methodological Integration
- Uses surveys, experiments, econometrics along with interviews, focus groups, or case studies.
Complementarity
- Qualitative findings help interpret quantitative results, and vice versa.
Triangulation
- Cross-validates findings using different methods, increasing reliability.
Flexibility
- Can be sequential (one after the other) or concurrent (both together).
3. Types of Mixed Methods Designs (Brief)
Sequential Explanatory: Quantitative → Qualitative
Sequential Exploratory: Qualitative → Quantitative
Concurrent Design: Both conducted simultaneously
4. Illustration with an Example
Example: Studying Female Labour Force Participation in Rural India
Quantitative Component
Use NSS/NFHS data
Apply regression analysis to study effects of education, wages, and household income
Qualitative Component
Conduct interviews and focus group discussions with rural women
Explore cultural norms, mobility constraints, and household decision-making
Integration
Quantitative analysis may show low participation despite education
Qualitative findings explain this through social norms and unpaid care work
Thus, mixed methods provide both statistical evidence and contextual understanding.
5. Advantages
Richer and deeper analysis
Better policy relevance
Reduces bias of single-method studies
Especially useful in development and public policy research
Hypothesis
A hypothesis is a tentative, testable statement about the relationship between two or more variables. It is formulated to provide a possible explanation of a phenomenon and is subjected to empirical verification using data.
In research methodology, a hypothesis acts as a guide to investigation, helping the researcher decide:
what data to collect,
how to analyze it, and
what conclusions to draw.
Example (Economics):
“An increase in education level leads to higher wages.”
This statement can be tested using data on education and income.
Definitions by Scholars
Goode and Hatt: “A hypothesis is a proposition which can be put to test to determine its validity.”
Kerlinger: “A hypothesis is a conjectural statement of the relationship between two or more variables.”
Sources of Hypothesis
Hypotheses do not arise randomly; they are derived from several intellectual and empirical sources:
1. Theory
Existing economic theories are the most important source of hypotheses.
Example: Keynesian theory suggests a relationship between income and consumption, leading to hypotheses about marginal propensity to consume.
2. Previous Studies and Literature
Past research findings often suggest gaps, contradictions, or extensions that give rise to new hypotheses.
3. Observation and Experience
Real-world observations—such as regional inequality, unemployment trends, or inflation behavior—may lead researchers to frame hypotheses.
4. Analogies
Similarities between phenomena can suggest hypotheses.
Example: Concepts from industrial organization applied to digital platforms.
5. Intuition and Insight
Researchers’ intellectual insight or creative thinking can generate hypotheses, though these must still be empirically tested.
6. Social and Policy Problems
Contemporary economic problems (poverty, inflation, unemployment, climate change) provide fertile ground for hypothesis formulation.
Steps Involved in Testing a Hypothesis
Hypothesis testing is a systematic statistical procedure used to decide whether empirical evidence supports or rejects a hypothesis.
Step 1: Formulation of Hypothesis
Two hypotheses are formulated:
Null Hypothesis (H₀):
States that there is no relationship or no effect.
Example: H₀: Education has no effect on wages.
Alternative Hypothesis (H₁ or Hₐ):
States that a relationship or effect does exist.
Example: H₁: Education positively affects wages.
Step 2: Selection of Significance Level (α)
The significance level represents the probability of rejecting a true null hypothesis.
Common levels:
1% (0.01)
5% (0.05)
10% (0.10)
Economics research commonly uses 5%.
Step 3: Choice of Appropriate Test Statistic
Depending on the nature of data and sample size, an appropriate test is selected, such as:
Z-test
t-test
Chi-square test
F-test
Step 4: Specification of Sampling Distribution
The probability distribution of the test statistic under the null hypothesis is identified (normal, t, chi-square, etc.).
Step 5: Computation of Test Statistic
Using sample data, the value of the test statistic is calculated.
Step 6: Determination of Critical Value / p-value
Critical value approach: Compare calculated value with table value.
p-value approach: Compare p-value with significance level.
Step 7: Decision Rule
If the calculated test statistic exceeds the critical value (in absolute terms for two-tailed tests), or the p-value is less than α → Reject H₀
Otherwise → Fail to reject H₀
Step 8: Conclusion and Interpretation
The result is interpreted in the context of the research problem, policy relevance, or economic theory.
Example:
“The null hypothesis is rejected at the 5% level, indicating that education significantly affects wages.”
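A minimal sketch of these steps in Python, using statsmodels on simulated data; the wage–education relationship and all numbers below are assumed for illustration, not taken from any actual survey.

```python
import numpy as np
import statsmodels.api as sm

# Step 1: H0: education has no effect on wages vs H1: it does
rng = np.random.default_rng(0)
n = 200
education = rng.integers(5, 18, size=n)                  # years of schooling (simulated)
wages = 5000 + 800 * education + rng.normal(0, 3000, n)  # simulated monthly wages

# Steps 3-5: OLS regression gives the t-statistic for the education coefficient
X = sm.add_constant(education)
model = sm.OLS(wages, X).fit()

t_stat = model.tvalues[1]
p_value = model.pvalues[1]
alpha = 0.05                                             # Step 2: 5% significance level

# Steps 6-8: compare the p-value with alpha and draw a conclusion
if p_value < alpha:
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}: reject H0 at the 5% level")
else:
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}: fail to reject H0 at the 5% level")
```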
Importance of Hypothesis in Research
Provides direction to research
Helps in theory testing
Ensures objectivity
Facilitates statistical analysis
Links theory with empirical evidence
Semi-log and Log-linear Regression Models
1. Introduction
In econometric analysis, logarithmic transformations are widely used to:
linearise non-linear relationships
stabilise variance
interpret coefficients meaningfully
Two commonly used models are the semi-log model and the log-linear model. Though both involve logarithms, they differ in structure and interpretation.
2. Semi-log Model
Definition
In a semi-log model, only one variable (either dependent or independent) is expressed in logarithmic form.
Types and Functional Form
(a) Log-linear (dependent variable in log form)
ln Y = α + βX + u
(b) Linear-log (independent variable in log form)
Y = α + β ln X + u
Interpretation
In log-linear form: 100 × β ≈ percentage change in Y for a one-unit change in X (a semi-elasticity).
In linear-log form: β / 100 ≈ absolute change in Y for a 1% change in X.
Applications
Wage determination models
Engel curve estimation
Growth analysis with dummy variables
Impact of education on earnings
3. Log-linear (Double-log) Model
Definition
In a log-linear (double-log) model, both dependent and independent variables are expressed in logarithmic form.
Functional Form
ln Y = α + β ln X + u
Interpretation
β is an elasticity
Measures percentage change in Y due to percentage change in X
Applications
Demand and supply analysis
Production function estimation (Cobb–Douglas)
Export–import demand studies
Price elasticity estimation
4. Distinction between Semi-log and Log-linear Models
| Basis | Semi-log Model | Log-linear (Double-log) Model |
|---|---|---|
| Variables in log form | One variable | Both variables |
| Functional form | Mixed (log–linear or linear–log) | Fully logarithmic |
| Coefficient interpretation | Semi-elasticity | Elasticity |
| Complexity | Relatively simple | More analytically rich |
| Common use | Growth, wage, policy impact | Demand, production, trade |
5. Advantages of Logarithmic Models (Brief)
Reduces heteroscedasticity
Handles non-linearity
Improves normality of residuals
Facilitates economic interpretation
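A minimal sketch, assuming simulated wage data, of how the two functional forms are estimated and how their coefficients are read; statsmodels is used here only for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
experience = rng.uniform(1, 30, n)                             # years of experience (simulated)
wage = np.exp(7 + 0.04 * experience + rng.normal(0, 0.2, n))   # true semi-elasticity ≈ 4%

# Semi-log (log-linear): ln(wage) = a + b * experience
X = sm.add_constant(experience)
semilog = sm.OLS(np.log(wage), X).fit()
print("Semi-elasticity: a one-year rise in experience raises wages by about "
      f"{100 * semilog.params[1]:.1f}%")

# Double-log: ln(wage) = a + b * ln(experience), so b is an elasticity
Xlog = sm.add_constant(np.log(experience))
doublelog = sm.OLS(np.log(wage), Xlog).fit()
print(f"Elasticity: a 1% rise in experience raises wages by about {doublelog.params[1]:.2f}%")
```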
Stratified random sampling & Purposive sampling
Data collection methods determine how units are selected from a population for research. Sampling methods are broadly classified into probability sampling and non-probability sampling.
Stratified random sampling belongs to probability sampling, while purposive sampling is a non-probability sampling method.
(a) Stratified Random Sampling
Meaning
Stratified random sampling is a probability sampling technique in which the population is first divided into homogeneous sub-groups called strata, and then random samples are drawn from each stratum.
The key idea is to ensure that all important sub-groups are adequately represented in the sample.
Steps Involved
Identify the target population
Divide the population into mutually exclusive and exhaustive strata based on relevant characteristics (e.g., income, region, gender)
Select samples randomly from each stratum
Combine samples from all strata to form the final sample
Types of Stratified Sampling
Proportionate stratified sampling: Sample size from each stratum is proportional to its population size
Disproportionate stratified sampling: Sample sizes differ intentionally to study smaller strata in detail
Example (Economics)
Suppose a researcher studies income inequality among households in India.
Population is divided into strata based on income groups:
Low income
Middle income
High income
Random samples are selected from each income group.
This ensures that all income categories are represented, avoiding bias toward any one group.
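A minimal sketch of proportionate stratified sampling in Python, assuming a hypothetical household data frame with an income_group column; the stratum shares and the 5% sampling fraction are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical population of 10,000 households with an income stratum
population = pd.DataFrame({
    "household_id": range(10_000),
    "income_group": rng.choice(["low", "middle", "high"], size=10_000, p=[0.5, 0.35, 0.15]),
})

# Proportionate stratified sample: draw 5% at random within each stratum
sample = (
    population
    .groupby("income_group", group_keys=False)
    .sample(frac=0.05, random_state=42)
)

print(sample["income_group"].value_counts())  # stratum shares mirror the population
```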
Merits
Ensures better representation
Reduces sampling error
Suitable for heterogeneous populations
Improves precision of estimates
Limitations
Requires prior information about population
More complex and time-consuming
Incorrect stratification can distort results
(b) Purposive Sampling
Meaning
Purposive sampling (also called judgmental sampling) is a non-probability sampling method in which the researcher selects units deliberately based on their relevance to the research objective.
Selection depends on the researcher’s judgment and expertise, not randomization.
Key Characteristics
No random selection
Focus on information-rich cases
Common in qualitative and exploratory research
Used when the population is small or specialized
Types of Purposive Sampling
Expert sampling – selecting experts in a field
Typical case sampling – selecting representative cases
Critical case sampling – selecting crucial or extreme cases
Example (Economics)
Suppose a study examines policy challenges faced by Self-Help Groups (SHGs).
The researcher deliberately selects:
SHG leaders
NGO coordinators
Local development officers
These respondents are chosen because they possess specialized knowledge, not because they represent the population statistically.
Merits
Useful when probability sampling is not feasible
Cost-effective and time-saving
Ideal for in-depth qualitative studies
Focuses on relevant respondents
Limitations
High risk of researcher bias
Findings cannot be generalized
Lack of statistical representativeness
Comparison between Stratified Random and Purposive Sampling
| Basis | Stratified Random Sampling | Purposive Sampling |
|---|---|---|
| Type | Probability sampling | Non-probability sampling |
| Selection | Random | Researcher’s judgment |
| Representativeness | High | Limited |
| Bias | Low | High |
| Statistical inference | Possible | Not reliable |
| Usage | Quantitative studies | Qualitative / exploratory studies |
Significance of Normal Distribution Assumption in Regression Analysis
In classical linear regression analysis, one of the important assumptions is that the error term (disturbance term) follows a normal distribution with mean zero and constant variance. This assumption plays a crucial role in statistical inference, though not in the estimation of regression coefficients themselves.
Meaning of the Assumption
The normality assumption states that:
uᵢ ~ N(0, σ²)
where
uᵢ is the error term,
the mean is zero, and
the variance σ² is constant.
This implies that most errors cluster around zero, with extreme errors being rare.
Significance of Normality Assumption
1. Validity of Hypothesis Testing
Normality ensures that:
t-tests for individual regression coefficients
F-tests for overall model significance
are statistically valid, especially in small samples.
2. Construction of Confidence Intervals
Confidence intervals for regression coefficients rely on the assumption that estimators follow a normal or t-distribution, which is guaranteed when errors are normally distributed.
3. Exact Sampling Distributions
With normal errors, estimators such as OLS coefficients have exact sampling distributions, not just approximate ones. This improves reliability of inference.
4. Importance in Small Samples
In large samples, the Central Limit Theorem often compensates for non-normality. However, in small samples, normality becomes crucial for correct inference.
5. Efficiency of Estimators
Under normality, OLS estimators are not only Best Linear Unbiased Estimators (BLUE) but also maximum likelihood estimators, giving them desirable optimal properties.
6. Prediction and Forecasting
Normality of errors allows probabilistic statements about forecast errors, improving prediction accuracy and interpretation.
What Normality Does Not Affect
It does not affect unbiasedness or consistency of OLS estimators.
Regression coefficients can still be estimated without normality.
Normality mainly affects inference, not estimation.
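A minimal sketch of how the normality of OLS residuals can be checked in practice, using the Jarque–Bera test from statsmodels on simulated data; the model and data are assumed only for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(7)
n = 100
x = rng.uniform(0, 10, n)
y = 2 + 1.5 * x + rng.normal(0, 1, n)   # simulated data with normal errors

model = sm.OLS(y, sm.add_constant(x)).fit()

# Jarque-Bera tests H0: residuals are normally distributed
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(model.resid)
print(f"JB statistic = {jb_stat:.2f}, p-value = {jb_pvalue:.3f}")
if jb_pvalue < 0.05:
    print("Reject normality of residuals at the 5% level")
else:
    print("No evidence against normality; t- and F-based inference is more defensible")
```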
Reciprocal Form of Regression Model
The reciprocal regression model is a non-linear functional form in which the dependent variable is related to the reciprocal (inverse) of the independent variable. It is used when the effect of an explanatory variable on the dependent variable diminishes rapidly at first and then slowly.
Specification of the Model
The general reciprocal regression model is written as:
Y = a + b/X + u
where:
Y = dependent variable
X = independent variable
a, b = parameters
u = random error term
If b > 0, Y falls as X rises, so the relationship between Y and X is inverse; if b < 0, Y rises with X.
Explanation of the Reciprocal Relationship
As X increases, 1/X decreases.
Changes in X have larger effects when X is small and smaller effects when X is large.
This captures the diminishing influence of X on Y.
Applicability of Reciprocal Regression in Economics
1. Average Cost and Output
AC = a + b/Q
As output (Q) increases, average fixed cost falls.
Widely used in cost theory.
2. Productivity and Labour Input
Output per worker = a + b/L
Small increases in labour at low levels significantly affect productivity.
Effect weakens as labour increases.
3. Interest Rate and Investment Efficiency
I = a + b/r
At very low interest rates, changes have strong effects on investment.
At higher rates, marginal impact declines.
4. Time Taken and Speed
T = a + b/S
Common in transport and logistics economics.
Increasing speed initially reduces time sharply; later reductions are smaller.
5. Poverty or Inequality Measures
Used when improvements are rapid at low income levels but slow at higher levels.
Useful in development economics.
Illustration (Simple Example)
Suppose we model average cost (AC) as:
AC = 50 + 200/Q
When Q = 10, AC = 50 + 200/10 = 70
When Q = 100, AC = 50 + 200/100 = 52
This shows rapid decline initially and slow decline later, which linear models cannot capture well.
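A minimal sketch of estimating the reciprocal form by OLS on the transformed regressor 1/Q, using simulated average-cost data; all numbers are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
Q = rng.uniform(5, 200, 150)                       # simulated output levels
AC = 50 + 200 / Q + rng.normal(0, 2, 150)          # true relation: AC = 50 + 200/Q

# The model is linear in the transformed variable 1/Q, so OLS applies directly
X = sm.add_constant(1 / Q)
fit = sm.OLS(AC, X).fit()

a_hat, b_hat = fit.params
print(f"Estimated AC = {a_hat:.1f} + {b_hat:.1f}/Q")   # estimates should be close to 50 and 200
```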
Merits of the Reciprocal Model
Captures non-linear diminishing effects
Economically intuitive in many real-world situations
Often provides better fit than linear models
Limitations
Cannot be used when X = 0
Interpretation of coefficients is less direct than in linear models
Sensitive to measurement errors in small values of X
Distinguish
(i) Oral Histories and Life History
| Basis | Oral Histories | Life History |
|---|---|---|
| Meaning | Collection of oral accounts of past events | Detailed narrative of an individual’s entire life |
| Focus | Specific events or periods | Whole life experiences |
| Scope | Narrow and event-centred | Broad and comprehensive |
| Time Coverage | Limited time span | Long-term (childhood to present) |
| Use | Historical and social research | Sociological and developmental studies |
(ii) Participant Observation and Non-Participant Observation
| Basis | Participant Observation | Non-Participant Observation |
|---|---|---|
| Role of researcher | Actively participates in the group | Remains detached and passive |
| Interaction | High interaction with subjects | No or minimal interaction |
| Depth of data | Rich and in-depth | Limited to observable behaviour |
| Objectivity | Risk of involvement bias | More objective |
| Suitability | Cultural and community studies | Behavioural and institutional studies |
(iii) Pooled Data and Panel Data
| Basis | Pooled Data | Panel Data |
|---|---|---|
| Nature | Cross-section samples from different periods combined (pooled) | Same units observed over time |
| Time dimension | No individual time tracking | Has both time and individual dimensions |
| Unit identity | Not preserved | Preserved |
| Analysis | Simpler regression | Advanced econometric techniques |
| Example | Different households surveyed once | Same households surveyed yearly |
(iv) Questionnaire and Schedule
| Basis | Questionnaire | Schedule |
|---|---|---|
| Meaning | A list of questions filled by the respondent | A list of questions filled by the enumerator |
| Mode | Self-administered | Interview-based |
| Literacy requirement | Requires literate respondents | Suitable for illiterate respondents |
| Cost | Low cost | Relatively expensive |
| Response rate | Usually low | High |
| Bias | Less interviewer bias | Possibility of interviewer bias |
| Suitability | Large, educated populations | Rural areas and field surveys |
(v) Population Census and Economic Census
| Basis | Population Census | Economic Census |
|---|---|---|
| Coverage | Counts people | Counts economic units/enterprises |
| Objective | Demographic and social information | Structure of economic activities |
| Conducted by | Registrar General of India | Ministry of Statistics & Programme Implementation |
| Information collected | Age, sex, literacy, occupation | Type of enterprise, employment, location |
| Frequency | Once every 10 years | Periodic (not fixed like population census) |
| Use | Planning social services | Economic planning and industrial policy |
(vi) Primary Data and Secondary Data
Primary Data
Collected first-hand by the researcher
Specific to the research objective
Costly and time-consuming
Secondary Data
Already collected by others
General-purpose data
Economical and time-saving
(vii) Population Census and Economic Census
Population Census
Complete enumeration of people
Focuses on demographic and social characteristics
Unit of study: individual/household
Economic Census
Complete enumeration of economic establishments
Focuses on economic activities and employment
Unit of study: enterprise/establishment
(viii) Research Techniques and Research Tools
Research Techniques
Procedures or methods used to conduct research
Indicate how data are collected or analysed
Examples: survey, interview, regression analysis
Research Tools
Instruments used to implement techniques
Help in data collection or measurement
Examples: questionnaire, interview schedule, scale
(ix) Measured Variable and Latent Variable
Measured Variable
Directly observable and measurable
Data obtained through instruments
Example: income, years of education
Latent Variable
Not directly observable
Inferred from measured variables
Example: poverty, intelligence, job satisfaction
(x) Inclusion and Exclusion of Variables and Linkage with Adjusted R²
Inclusion of Variables
Adds explanatory factors to the model
May increase goodness of fit
Exclusion of Variables
Removes irrelevant or insignificant variables
Helps avoid overfitting
Link with Adjusted R²
R² always increases when variables are added
Adjusted R² increases only if the new variable improves the model after adjusting for degrees of freedom
Used as a criterion for variable selection
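The standard formula linking the two measures (with n observations and k regressors) makes this trade-off explicit:
Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − k − 1)
Adding a regressor raises adjusted R² only if the resulting increase in R² more than offsets the loss of one degree of freedom.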
(xi) Life History vs Narratives
| Basis | Life History | Narratives |
|---|---|---|
| Meaning | Detailed account of an individual’s entire life | Story or account of specific events or experiences |
| Scope | Long-term, comprehensive | Shorter, focused |
| Nature | Chronological | Thematic or episodic |
| Usage | Sociology, anthropology | Qualitative social research |
| Depth | Very deep | Relatively limited |
(xii) Sampling Errors vs Non-sampling Errors
| Basis | Sampling Errors | Non-sampling Errors |
|---|---|---|
| Meaning | Errors due to using a sample instead of population | Errors arising from data collection and processing |
| Occurrence | Only in sample surveys | In both census and sample surveys |
| Cause | Random variation | Bias, non-response, measurement errors |
| Control | Reduced by larger samples | Reduced by better survey design |
| Nature | Quantifiable | Often non-quantifiable |
(xiii) Instrumentalism vs Realism
| Basis | Instrumentalism | Realism |
|---|---|---|
| View of theories | Tools for prediction | True description of reality |
| Assumptions | Need not be realistic | Must reflect real-world mechanisms |
| Focus | Predictive accuracy | Explanatory power |
| Associated with | Milton Friedman | Critical realism |
| Criticism | Ignores reality | Difficult to empirically verify |
(xiv) Positive vs Normative Measures of Inequality
| Basis | Positive Measures | Normative Measures |
|---|---|---|
| Nature | Descriptive | Value-based |
| Objective | Measure inequality as it exists | Judge inequality against a standard |
| Examples | Gini coefficient, Lorenz curve | Atkinson index |
| Value judgment | Absent | Present |
| Usage | Empirical analysis | Welfare evaluation |
(xv) Research Design vs Research Methods
| Basis | Research Design | Research Methods |
|---|---|---|
| Meaning | Overall plan of research | Techniques used to collect and analyze data |
| Nature | Conceptual framework | Operational tools |
| Scope | Broad | Narrow |
| Examples | Exploratory, descriptive design | Survey, interview, regression |
| Stage | Before data collection | During data collection and analysis |
(xvi) Time-series Data vs Pooled Data
| Basis | Time-series Data | Pooled Data |
|---|---|---|
| Meaning | Observations on a single unit over time | Combination of cross-section and time-series data |
| Units | Same unit repeatedly | Multiple units over time |
| Time dimension | Essential | Essential |
| Example | India’s GDP from 2010–2024 | GDP of several states from 2010–2024 |
| Use | Trend and forecasting | Heterogeneity and richer inference |
(xvii) Action Research vs Exploratory Research
| Basis | Action Research | Exploratory Research |
|---|---|---|
| Purpose | Solving a practical problem | Gaining initial understanding |
| Nature | Intervention-oriented | Discovery-oriented |
| Outcome | Immediate action and change | Hypothesis formulation |
| Researcher role | Active participant | Detached investigator |
| Example | Improving school attendance | Studying causes of low attendance |
(xviii) Induction vs Deduction Approach of Enquiry
| Basis | Induction | Deduction |
|---|---|---|
| Direction | From specific to general | From general to specific |
| Logic | Observation → theory | Theory → hypothesis → test |
| Nature | Theory-building | Theory-testing |
| Used in | Qualitative research | Quantitative research |
| Example | Observing markets to build theory | Testing demand theory with data |
(xix) Nominal vs Ordinal Scaling Techniques
| Basis | Nominal Scale | Ordinal Scale |
|---|---|---|
| Nature | Classification only | Classification with ranking |
| Order | No order | Order exists |
| Arithmetic operations | Not possible | Limited (no meaningful difference) |
| Example | Gender, religion | Income groups (low, middle, high) |
| Central tendency | Mode | Median, mode |
(xx) Ontology vs Epistemology
| Basis | Ontology | Epistemology |
|---|---|---|
| Concern | Nature of reality | Nature of knowledge |
| Key question | What exists? | How do we know? |
| Focus | Being and existence | Knowledge and justification |
| Research link | What is reality being studied | How reality can be studied |
| Example | Is poverty objective or subjective? | Can poverty be measured statistically? |