Nutrition & Obesity Trends Analysis (Bell-Labs)

Bell-Labs (the repository name) implements a reproducible analytics stack for global diet and health outcomes. A staged pipeline (`run_pipeline.py`) cleans, harmonizes, and joins raw FAO nutrition and population inputs with WHO obesity prevalence in five steps: FAO preprocessing, obesity standardization, food-group mapping, panel construction, and a final master merge that produces `master_panel_final.csv`. The dataset supports macro-level questions on per-capita energy, protein, fat, food-group shares, and obesity trends. Analysis layers include scripted EDA (`perform_eda.py`, `extended_eda.py`), interactive Plotly dashboards (`interactive_plot.py`), and notebook-driven exploration. Methodology, a data dictionary, and research notes are documented under `doc/`.
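
The staged flow described above can be sketched as an ordered list of step functions run in sequence. This is a minimal illustration, not the contents of `run_pipeline.py`: the function names, step labels, and the shared context dictionary are hypothetical placeholders, and each step body stands in for real file I/O.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Step = Callable[[Context], Context]


# Hypothetical placeholder steps: each one would normally read files from the
# previous stage and write its own outputs. Here they only record a marker.
def clean_fao(ctx: Context) -> Context:
    return {**ctx, "fao": "cleaned"}


def clean_obesity(ctx: Context) -> Context:
    return {**ctx, "obesity": "cleaned"}


def map_food_groups(ctx: Context) -> Context:
    return {**ctx, "food_groups": "mapped"}


def build_panels(ctx: Context) -> Context:
    return {**ctx, "panels": "built"}


def build_master_panel(ctx: Context) -> Context:
    return {**ctx, "master": "master_panel_final.csv"}


# Order matters: later steps depend on artifacts from earlier ones.
STEPS: List[Tuple[str, Step]] = [
    ("fao_clean", clean_fao),
    ("obesity_clean", clean_obesity),
    ("food_group_mapping", map_food_groups),
    ("intermediate_panels", build_panels),
    ("master_panel", build_master_panel),
]


def run_pipeline(steps: List[Tuple[str, Step]] = STEPS) -> Tuple[Context, List[str]]:
    """Run every step in order and return the final context plus a step log."""
    ctx: Context = {}
    completed: List[str] = []
    for name, step in steps:
        ctx = step(ctx)
        completed.append(name)
    return ctx, completed


if __name__ == "__main__":
    _, log = run_pipeline()
    print(" -> ".join(log))
```

Keeping the steps in one explicit list makes the run order visible in a single place and lets a step be rerun or skipped without touching the others.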

Timeline

Multi-month

Role

Data / ML Engineer

Team

Solo

Status

Completed

Technology Stack

Python, pandas, NumPy, matplotlib, seaborn, Plotly, Jupyter, scikit-learn

Key Features

Single-command pipeline via `run_pipeline.py` producing cleaned CSVs and `data/processed/final/master_panel_final.csv`

Five-step ETL: FAO clean, obesity clean, item→food-group mapping, intermediate panels, master panel with missing-data handling (e.g. limited interpolation for gaps)

Country–year panel: 171 countries over the overlapping 2010–2022 period, with variables for nutrients per capita per day, food-group kcal and share columns, population, and `obesity_pct`

Exploratory analysis outputs: summaries, correlations, trends under `data/outputs/`

Interactive Plotly charts: energy vs obesity, food-group shares over time, country comparisons

Notebook curriculum from raw exploration through main EDA (`notebooks/`) with README guidance

Documentation: methodology, data dictionary, dataset analysis, and research notes for report-grade reuse
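
The "limited interpolation for gaps" mentioned in the ETL feature above could look roughly like the following pandas sketch. It is an assumption about one reasonable policy, not the project's actual rule: the `country` and `year` column names, the `interpolate_limited` helper, and the `max_gap` default are hypothetical, and only `obesity_pct` is taken from the panel description.

```python
import numpy as np
import pandas as pd


def interpolate_limited(panel: pd.DataFrame, value_col: str, max_gap: int = 2) -> pd.DataFrame:
    """Linearly fill short interior gaps per country, leaving edges untouched.

    Assumes one row per (country, year) with consecutive years present,
    because method="linear" treats rows as equally spaced.
    """
    out = panel.sort_values(["country", "year"]).copy()
    out[value_col] = out.groupby("country")[value_col].transform(
        lambda s: s.interpolate(
            method="linear",
            limit=max_gap,        # fill at most max_gap consecutive missing values
            limit_area="inside",  # never extrapolate before the first or after the last observation
        )
    )
    return out


if __name__ == "__main__":
    # Made-up values for illustration only.
    demo = pd.DataFrame(
        {
            "country": ["A", "A", "A", "A"],
            "year": [2010, 2011, 2012, 2013],
            "obesity_pct": [10.0, np.nan, 14.0, np.nan],
        }
    )
    print(interpolate_limited(demo, "obesity_pct", max_gap=1))
```

With pandas' default forward direction, a gap longer than `max_gap` has only its first `max_gap` values filled and the rest stay missing, so a stricter policy that skips long gaps entirely would need an extra check on gap length.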

Key Learnings

Panel data construction for international health and agriculture statistics
Python packaging of multi-step ETL with clear folder conventions (raw → cleaned → panels → final)
Bridging notebooks and scripts for both exploration and repeatable runs
Communicating nutrition–obesity relationships responsibly with documented limitations

Key Challenges

Aligning country names and codes across FAO and WHO sources
Managing missing values and interpolation policy without overstating certainty
Keeping intermediate artifacts organized so the pipeline is rerunnable and auditable
Explaining high-dimensional nutrition structure to non-technical readers via clear visuals
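
For the country-alignment challenge above, one common approach is to normalize names to a shared key and then audit what fails to match on each side. The sketch below uses only the standard library. The alias table, the function names, and the example spellings are illustrative assumptions and are not taken from the actual FAO or WHO files; where both sources carry standard country codes, joining on those is usually more reliable than joining on names.

```python
import unicodedata
from typing import Dict, Iterable, Set, Tuple

# Hypothetical alias table mapping normalized spellings to one canonical key.
ALIASES: Dict[str, str] = {
    "viet nam": "vietnam",
}


def normalize_name(name: str) -> str:
    """Strip accents, lowercase, collapse whitespace, then apply aliases."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    key = " ".join(no_accents.lower().split())
    return ALIASES.get(key, key)


def audit_overlap(
    fao_names: Iterable[str], who_names: Iterable[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (matched, only in FAO, only in WHO) sets of normalized keys."""
    fao = {normalize_name(n) for n in fao_names}
    who = {normalize_name(n) for n in who_names}
    return fao & who, fao - who, who - fao


if __name__ == "__main__":
    # "Atlantis" is a made-up entry standing in for an unmatched name.
    matched, fao_only, who_only = audit_overlap(
        ["C\u00f4te d'Ivoire", "Viet Nam", "Atlantis"],
        ["Cote d'Ivoire", "Vietnam"],
    )
    print(sorted(matched), sorted(fao_only), sorted(who_only))
```

Reviewing the two "only in" sets after each run turns silent row loss in the merge into an explicit list that can be fixed by extending the alias table.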

Impact & Results

One merged research-ready dataset for regression, ML, or policy-style analysis

Transparent workflow others can clone, rerun, and extend (MIT license)

Foundation for coursework or publications citing FAO and WHO provenance

Future Enhancements

Predictive models on `master_panel_final.csv` (e.g. panel regression, forecasting)

Automated tests for pipeline steps and data quality checks

Packaging as an installable module or CLI for non-notebook users

Optional dashboard app layer on top of processed outputs
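
The automated data quality checks listed above could start from a small validation function like the one below. This is a sketch under stated assumptions: the `check_master_panel` helper and the `country` and `year` column names are hypothetical, while `obesity_pct` and the 2010–2022 range come from the panel description.

```python
from typing import List

import pandas as pd


def check_master_panel(
    panel: pd.DataFrame, first_year: int = 2010, last_year: int = 2022
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []

    # Each (country, year) pair should appear at most once in a panel.
    if panel.duplicated(subset=["country", "year"]).any():
        problems.append("duplicate country-year rows")

    # Years should stay inside the documented overlap window (inclusive).
    if not panel["year"].between(first_year, last_year).all():
        problems.append("year outside expected range")

    # Prevalence is a percentage; missing values are allowed and skipped here.
    obesity = panel["obesity_pct"].dropna()
    if not obesity.between(0, 100).all():
        problems.append("obesity_pct outside 0-100")

    return problems
```

A test runner such as pytest could load `master_panel_final.csv` and assert that the returned list is empty, which keeps the checks reusable both in tests and at the end of a pipeline run.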