DevxSubh
I develop 3D visuals, user interfaces and web applications.
Nutrition & Obesity Trends Analysis (Bell-Labs)

Bell Labs (the repository name) implements a reproducible analytics stack for global diet and health outcomes. Raw FAO nutrition and population data and WHO obesity prevalence data are cleaned, harmonized, and joined through a staged pipeline (`run_pipeline.py`) with five steps: FAO preprocessing, obesity standardization, food-group mapping, panel construction, and a final master merge that produces `master_panel_final.csv`. The resulting dataset supports macro-level questions about per-capita energy, protein, fat, food-group shares, and obesity trends. The analysis layers include scripted EDA (`perform_eda.py`, `extended_eda.py`), interactive Plotly dashboards (`interactive_plot.py`), and notebook-driven exploration, with documented methodology, a data dictionary, and research notes under `doc/`.
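
As a rough illustration of the staged design, the sketch below runs five named steps in a fixed order. It is not the contents of `run_pipeline.py`: the step names, `placeholder_step`, and `run_steps` are hypothetical, and each step only records that it ran instead of touching any data.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, List[str]]
Step = Tuple[str, Callable[[Context], Context]]


def placeholder_step(name: str) -> Callable[[Context], Context]:
    """Return a stand-in step that only records that it ran (hypothetical)."""
    def step(context: Context) -> Context:
        completed = context.get("completed", []) + [name]
        return {**context, "completed": completed}
    return step


# Names mirror the five stages described above; the real module layout is unknown.
STEP_NAMES = [
    "fao_preprocessing",
    "obesity_standardization",
    "food_group_mapping",
    "panel_construction",
    "master_merge",
]
STEPS: List[Step] = [(name, placeholder_step(name)) for name in STEP_NAMES]


def run_steps(steps: List[Step]) -> Context:
    """Run each step in order, passing the context from one step to the next."""
    context: Context = {"completed": []}
    for name, func in steps:
        print(f"[pipeline] {name}")
        context = func(context)
    return context
```

Keeping the steps in one ordered list makes it easy to rerun the whole chain or to start from a later stage when only downstream logic has changed.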

Timeline

Multi-month

Role

Data / ML Engineer

Team

Solo

Status

Completed

Technology Stack

Python, pandas, NumPy, matplotlib, seaborn, Plotly, Jupyter, scikit-learn

Key Features

  • Single-command pipeline via `run_pipeline.py` producing cleaned CSVs and `data/processed/final/master_panel_final.csv`
  • Five-step ETL: FAO clean, obesity clean, item→food-group mapping, intermediate panels, master panel with missing-data handling (e.g. limited interpolation for gaps)
  • Country–year panel: 171 countries, 2010–2022 overlap, variables for nutrients per capita/day, food-group kcal and share columns, population, `obesity_pct`
  • Exploratory analysis outputs: summaries, correlations, trends under `data/outputs/`
  • Interactive Plotly charts: energy vs obesity, food-group shares over time, country comparisons
  • Notebook curriculum from raw exploration through main EDA (`notebooks/`) with README guidance
  • Documentation: methodology, data dictionary, dataset analysis, and research notes for report-grade reuse
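
The "limited interpolation for gaps" mentioned in the ETL step could look something like the sketch below. It is an assumption about one reasonable policy, not the project's actual rule: the function name, the `country` and `year` column names, and the default `max_gap=1` are all hypothetical; only `obesity_pct` comes from the panel description.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(panel: pd.DataFrame, column: str, max_gap: int = 1) -> pd.DataFrame:
    """Linearly interpolate `column` within each country.

    Assumes one row per country-year, so consecutive rows are one year apart.
    pandas fills at most `max_gap` consecutive missing values and leaves the
    rest of a longer gap missing; limit_area="inside" prevents filling before
    the first or after the last observed value.
    """
    ordered = panel.sort_values(["country", "year"]).copy()
    ordered[column] = ordered.groupby("country")[column].transform(
        lambda s: s.interpolate(method="linear", limit=max_gap, limit_area="inside")
    )
    return ordered


# Toy values for illustration only, not real WHO figures.
toy = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 5,
    "year": [2010, 2011, 2012, 2010, 2011, 2012, 2013, 2014],
    "obesity_pct": [10.0, np.nan, 14.0, 20.0, np.nan, np.nan, np.nan, 28.0],
})
filled = interpolate_short_gaps(toy, "obesity_pct", max_gap=1)
print(filled)
```

In the toy data, country A's single missing year is filled, while country B's three-year gap is only partly filled, which keeps long gaps visible as missing in the final panel.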

Key Learnings

  • Panel data construction for international health and agriculture statistics
  • Python packaging of multi-step ETL with clear folder conventions (raw → cleaned → panels → final)
  • Bridging notebooks and scripts for both exploration and repeatable runs
  • Communicating nutrition–obesity relationships responsibly with documented limitations

Key Challenges

  • Aligning country names and codes across FAO and WHO sources
  • Managing missing values and interpolation policy without overstating certainty
  • Keeping intermediate artifacts organized so the pipeline is rerunnable and auditable
  • Explaining high-dimensional nutrition structure to non-technical readers via clear visuals
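
For the country-alignment challenge, one common approach is to normalize names to a shared key and then audit the join. The sketch below assumes that approach. The alias table, function names, and the `country`, `year`, and `energy_kcal` columns are hypothetical, and the numbers are toy values, not FAO or WHO data.

```python
import pandas as pd

# Hypothetical alias table; a real one would be built from the source files.
NAME_ALIASES = {
    "türkiye": "turkey",
    "viet nam": "vietnam",
}


def normalize_country(name: str) -> str:
    """Lowercase, collapse whitespace, then apply known aliases."""
    key = " ".join(name.strip().lower().split())
    return NAME_ALIASES.get(key, key)


def merge_with_audit(fao: pd.DataFrame, who: pd.DataFrame):
    """Outer-join on (country_key, year) and split matched from unmatched rows."""
    fao = fao.assign(country_key=fao["country"].map(normalize_country))
    who = who.assign(country_key=who["country"].map(normalize_country))
    merged = fao.merge(
        who.drop(columns="country"),
        on=["country_key", "year"],
        how="outer",
        indicator=True,          # adds a _merge column: both / left_only / right_only
        validate="one_to_one",   # raises if a country-year key is duplicated
    )
    matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
    unmatched = merged[merged["_merge"] != "both"]
    return matched, unmatched


fao = pd.DataFrame({
    "country": ["Viet Nam", "Turkey", "Chad"],
    "year": [2010, 2010, 2010],
    "energy_kcal": [1.0, 2.0, 3.0],   # toy values
})
who = pd.DataFrame({
    "country": ["Vietnam", "Türkiye"],
    "year": [2010, 2010],
    "obesity_pct": [4.0, 5.0],        # toy values
})
matched, unmatched = merge_with_audit(fao, who)
print(unmatched[["country_key", "year", "_merge"]])
```

Reviewing the unmatched rows after each run shows which names still need an alias, so rows are never dropped silently by an inner join.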

Impact & Results

  • One merged research-ready dataset for regression, ML, or policy-style analysis
  • Transparent workflow others can clone, rerun, and extend (MIT license)
  • Foundation for coursework or publications citing FAO and WHO provenance

Future Enhancements

  • Predictive models on `master_panel_final.csv` (e.g. panel regression, forecasting)
  • Automated tests for pipeline steps and data quality checks
  • Packaging as installable module or CLI for non-notebook users
  • Optional dashboard app layer on top of processed outputs
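
The planned data quality checks might start from something like the sketch below. It is only a guess at useful checks: `check_panel` and the `country` and `year` column names are hypothetical, while `obesity_pct` and the 2010–2022 range come from the panel description above.

```python
import pandas as pd


def check_panel(panel: pd.DataFrame, first_year: int = 2010, last_year: int = 2022) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Each country-year pair should appear once in a panel.
    if panel.duplicated(subset=["country", "year"]).any():
        problems.append("duplicate country-year rows")
    # Years should stay inside the documented overlap window.
    if not panel["year"].between(first_year, last_year).all():
        problems.append("year outside expected range")
    # Percentages must lie in [0, 100]; missing values are allowed here.
    pct = panel["obesity_pct"].dropna()
    if not pct.between(0, 100).all():
        problems.append("obesity_pct outside 0-100")
    return problems


# Toy rows for illustration only.
good = pd.DataFrame({"country": ["A", "A"], "year": [2010, 2011], "obesity_pct": [10.0, None]})
bad = pd.DataFrame({"country": ["A", "A"], "year": [2009, 2009], "obesity_pct": [10.0, 140.0]})
print(check_panel(good))
print(check_panel(bad))
```

A function that returns problems instead of raising can be reused both at the end of a pipeline run and inside a test runner such as pytest.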