(always a work in progress)
Why
This portfolio tracks and shares what I’ve learned through the projects and courses I’ve taken on my journey to level up my nerd skills. It includes links to Jupyter notebooks, code from tutorials & workshops, book notes, and algorithm coding exercises.
Brief career tour: I rocked full-stack and data-centric software engineering for over 13 years using many languages, then at the senior level shifted focus to team leadership and project management. Later I decided to get back to being hands-on and build more great software. At that point Python became my clear language of choice and my journey into machine learning began.
Ping me on LinkedIn if you’d like to know more!
Contents
- Data Engineering: Preparation & Cleaning
- Exploratory Data Analysis
- Data Visualization
- Machine Learning: Modeling & Algorithms
- Statistics & Probability
Data Engineering: Preparation & Cleaning
- Preparing data for EDA: The Human Microbiome Project’s catalog of microbes
  - Data set: Human Microbiome Project’s catalog of microbes
  - Technologies: pandas, matplotlib, seaborn
- Course notes: “Python and pandas for Data Science” (DataCamp)
  - Technologies & topics: pandas, NumPy; importing, cleaning, merging, and transforming data
Exploratory Data Analysis
- Cleaning and exploration of HCCI’s 2016 Health Care Cost and Utilization Report
- Course notes: “Elements of Data Science Part 2 - EDA” (AWS Training and Certification)
Data Visualization
- Visualizations of prescription drug spending & costs
  - Data set: HCCI’s 2016 Health Care Cost and Utilization Report
  - Technologies: matplotlib, seaborn, Bokeh
- Course notes: “Data Visualization with matplotlib, Seaborn, and Bokeh” (DataCamp)
  - Technologies: matplotlib, seaborn, Bokeh
Machine Learning: Modeling & Algorithms
- Supervised learning: Gradient Boosting Machines demo with XGBoost (Feb 2020)
  - Technologies: pandas, scikit-learn, scipy, xgboost, matplotlib
  - Data set: Airline On-Time Statistics and Delay Causes from the Bureau of Transportation Statistics (BTS)
- Course notes: “Supervised Learning with scikit-learn” (DataCamp)
  - Technologies & topics: scikit-learn, classification, regression, model tuning, pipelines
- Course notes: “Unsupervised Learning in Python” (DataCamp)
  - Technologies & topics: scikit-learn, k-means clustering, t-SNE, PCA, NMF
- Course notes: “Deep Learning in Python” (DataCamp)
  - Technologies & topics: Keras, deep learning, backpropagation, model tuning
- Course notes: “Network Analysis in Python” (DataCamp)
  - Technologies & topics: NetworkX, graph visualization, pathfinding, graph structures