Hend
Abd El-Ghafour

Logo

Data Analyst

View My GitHub Profile

Turning Data into cheerful, understandable, & Actionable Insights


About Me Projects Testimonials Experience Education Contact

Human Resources Tableau Dashboard

Sep 2024

View Project

HR Tableau

See More
Overview

Created a comprehensive Tableau dashboard to analyze employee data, gaining insights into workforce aspects, including demographics, hiring and termination trends, and salary distributions. This analysis aimed to understand employee characteristics, department-specific trends, and performance evaluations to drive data-informed decisions.

Tools & Techniques

Tableau for calculations, data visualization and interactive dashboards.

Roles & Responsibilities

  • Data Cleaning:
    Verified data types, identified null values, and inspected unique entries, such as detecting nulls in the 'termdate' column indicating non-terminated employees.
  • Visualization:
    Selected the most appropriate charts for effective data presentation and created a comprehensive employee information table.
  • HR Employee Details
  • Analysis:
    Conducted statistical analysis to identify trends in hiring, terminations, and salary distributions.

Challenges Faced

  • Data Gaps:
    Identified missing values in critical fields, requiring strategies for accurate interpretation.
  • Complex Relationships:
    Analyzed complex relationships between hiring, terminations, and department-level trends.
  • Data Standardization:
    Needed to verify data consistency across branches and departments for accurate insights.

Achievements

  • Employee Analysis:
    Total employee count reached 8,950 (7,984 active, 966 terminated).
  • Hiring Trends:
    Noted peak hiring in 2017 with 1,560 new employees, while 2021 experienced the lowest hiring rate with 382 hires.
  • Termination Analysis:
    Found that 2023 had the highest terminations, with 174 employees (18% of total terminations), predominantly in the Operations department.
  • Departmental Insights:
    Operations had the highest activity, with 30% of both active and terminated employees, suggesting high turnover.
  • Geographical Distribution:
    70% of employees were based at HQ in New York, which also had a higher termination rate.
  • Gender Analysis:
    Gender distribution was slightly male-dominated (54%), with a balanced termination rate (11% each for males and females).
  • Educational Trends:
    Identified that bachelor’s degree holders formed the largest employee group (61%) with noticeable termination disparities among educational levels.

Insights

  • Hiring & Termination Trends:
    The Operations department’s turnover was high, and New York HQ showed the highest activity, with a considerable termination rate.
  • Gender & Education Dynamics:
    Gender imbalances were observed in specific educational categories, with a higher termination rate among female high school graduates and male PhD holders.
  • Performance Ratings:
    Educational level affected performance ratings, with high school graduates more often rated "Needs Improvement," while PhD holders frequently achieved "Excellent" ratings.
  • Salary Disparities:
    Significant disparities were observed, with males earning more at the bachelor’s level, while females earned more at the PhD level.

Future Application

  • Workforce Planning:
    Explore hiring and termination trends to optimize staffing and reduce turnover in high-activity departments like Operations.
  • Turnover Analysis:
    Conduct a deeper analysis of the reasons behind turnover patterns, especially in specific positions and departments.
  • Gender & Education Balance:
    Investigate gender disparities in salary and termination rates to promote equity.
  • Performance-Based Retention:
    Reevaluate performance rating criteria and termination practices to ensure fair and consistent employee assessments.
  • Compensation Strategy:
    Research if salary differences are consistent over time and explore whether performance and experience are accurately reflected in the company’s pay structure.




______________________________________________________



No Show Apointments

Sep 2024

View Project

No Show

See More
Overview

This project analyzes a dataset of 100,000 medical appointments in Brazil, focusing on the factors that impact patient attendance, such as demographics, health conditions, appointment scheduling gaps, and reminders. Key questions address the effects of gender, age, neighborhood, health conditions, and communication on attendance rates.

Tools & Techniques

  • Tools:
    Python for Data Assessment and Cleaning using Pandas & Numpy, and for Data Visualization using Matplotlib & Seaborn.
  • Data Cleaning:
    Used to standardize column labels, add calculated columns, and categorize patient age groups and appointment scheduling gaps.
  • Data Analysis:
    Applied statistical methods, including descriptive statistics, to understand patterns in attendance across demographic and health-related variables.
  • Visualization:
    Created charts and graphs to present findings on attendance patterns across different patient groups and appointment characteristics.

Roles & Responsibilities

  • Data Preparation:
    Cleaned data and standardized labels to facilitate analysis.
  • Data Exploration:
    Generated insights through descriptive statistics and visualizations.
  • Analysis of Factors Influencing Attendance:
    Assessed relationships between patient demographics, health conditions, appointment scheduling, and attendance.
  • Recommendation Development:
    Suggested approaches to improve appointment attendance, particularly through reminder systems and optimized scheduling.

Challenges Faced

  • Data Complexity:
    A large number of variables, including demographic, health, and scheduling information, required careful analysis and categorization to maintain data integrity.
  • Categorization:
    Defining and assigning appropriate categories for age groups and scheduling gaps posed challenges in achieving balanced, meaningful groupings.
  • Data Completeness:
    Sparse neighborhood data limited definitive conclusions regarding neighborhood influence on attendance rates. Additionally, some appointment dates were recorded prior to their scheduled dates, leading to negative appointment gaps.

Achievements

  • Insights into Attendance Patterns:
    Identified key factors, such as age, neighborhood, and reminder messages, that impact attendance rates.
  • Practical Recommendations:
    Developed actionable suggestions for healthcare providers to improve attendance, including adjusting scheduling practices and implementing reminder systems.
  • Categorization Model:
    Created a structured model for age and appointment gap categories to facilitate further analysis and comparisons.

Insights

  • Gender:
    Women make up the majority of patients, with a high attendance rate across both genders, indicating no significant difference in attendance based on gender.
  • Age Groups:
    Middle-aged, elderly, and child groups showed the highest commitment to appointments, suggesting a link between age and regular healthcare engagement.
  • Appointment Scheduling Gap:
    Patients are more likely to attend appointments with shorter scheduling gaps. Same-day and short-gap appointments had the highest attendance rates.
  • Neighborhood Influence:
    Certain neighborhoods showed significantly higher attendance rates, although the data suggests this may also correlate with lower appointment counts in these areas.
  • Scholarship Status:
    Attendance rates were marginally lower for patients with a government scholarship, indicating a potential but small effect.
  • Health Conditions:
    Patients with chronic conditions such as hypertension or diabetes showed slightly higher attendance rates, which may indicate a greater commitment to health management among these groups.
  • Reminders:
    Confirmations sent to patients increased attendance by over 10%, suggesting a strong positive impact from reminder messages.

Future Application

  • Enhanced Patient Reminder Systems:
    Implement automated SMS or phone reminders to improve attendance rates, especially for moderate to long-gap appointments.
  • Optimized Scheduling Strategies:
    Focus on offering same-day or short-gap appointment slots to increase attendance.
  • Neighborhood Targeting:
    Conduct further studies to understand neighborhood-based attendance patterns, potentially incorporating geographic proximity or access factors.
  • Scholarship & Attendance Correlation:
    Further statistical testing is recommended to understand the impact of government support on attendance consistency.
  • Personalized Engagement:
    Apply insights on health conditions and age-related attendance patterns to create targeted communication strategies, potentially increasing engagement with chronic disease patients and elderly individuals.



______________________________________________________



Northwind

Aug 2024:Sep 2024

View Project

Northwind

See More
Overview

The Northwind database is a sample database created by Microsoft, containing the sales data of "Northwind Traders," a fictitious company that imports and exports specialty foods worldwide. It includes details on customers, orders, inventory, purchasing, suppliers, shipping, employees, and single-entry accounting.

Tools & Techniques

  • SQL:
    Created views and tables; adjusted database diagrams.
  • Python (Pandas):
    Calculated growth rates.
  • Excel:
    Employed Power Query, Power Pivot, and pivot tables to analyze data, create relationships, and develop interactive dashboards.

Roles & Responsibilities

  • Data Cleaning and Standardization:
    Ensured data types and missing values were handled, particularly in shipping and ordering dates.
  • Data Analysis:
    Analyzed sales trends, customer demographics, shipping efficiency, and product performance.
  • Dashboard Creation:
    Developed seven dashboards summarizing critical insights into sales, product performance, and customer behaviors.

Challenges Faced

  • Incomplete Data:
    Sales records spanned only from July 1996 to May 1998, requiring careful adjustments in year-over-year comparisons.
  • Data Consistency:
    Managed inconsistencies in order quantities and stock levels, and standardized date data types.
  • Complex Growth Calculations:
    Analyzed growth across inconsistent time periods to derive accurate trends.

Achievements

  • Growth Analysis Adjustments:
    Created an "Actual Country Growth Table," yielding more accurate comparisons by aligning half-year periods for better insights.
  • Order and Customer Insights:
    Classified 89 customers into new or repeat based on order activity, facilitating customer tracking and targeted insights.
  • Comprehensive Inventory Status:
    Developed a robust stock status system, classifying products into Safe Stock, Restock Needed, No Restock, and Stopped.
  • Dashboard:
    Completed a comprehensive, interactive dashboard with critical sales insights and key metrics.

Insights
  • Sales and Orders:
    • Orders Summary: 830 orders processed for 89 customers, with 51,317 total quantities sold.
    • Shipping Efficiency: Average shipping time was 8.48 days.
  • Product Analysis:
    • Product Categories: 8 categories with 77 products.
    • Net Sales: $1,265,793.18; Net Revenue: $89K.
    • Discounts: Totaled $89K across all products
    • Shipping Cost: $65K.
    • Highest Net Sales by Month: Achieved in April 1998, totaling $124K.
  • Geographical Insights:
    • Top Countries by Sales: in 1996: USA - $35K, in 1997: Germany - $117K, & in 1998: USA - $93K.
    • Country with Most Customers: USA, with 13 recorded customers.
    • Discount Performance by Country: USA had the highest discounts and net sales across all three years.
  • Customer Insights:
    • Total Amount Paid by Customers (Including Shipping): $1.3M.
    • New Customers in 1998: 1 new customer.
    • Highest Monthly Sales: April 1998, driven by 51 customers placing orders.
  • Product Stock and Reorder Analysis:
    • Continued & Discontinued Products: 10.13% were discontinued, while 89.87% remained active.Of the active products: 66.52% were in "Safe Stock" status, and 23.35% required restocking.
    • Top Category by Performance: Beverages, with net sales of $268K, net revenue of $19K, discounts of $19K, and $4K in freight, across each year.
  • Shipping Companies:
    • Top Freight Costs by Year: 1996: Federal Shipping - $4K, 1997 & 1998: United Package Company - $12K each year.
    • Delivery Performance: 809 orders were shipped to their destination, with 772 delivered on time.
    • Highest On-Time Deliveries: 1996: Federal Shipping - 55 on-time deliveries, 1997 & 1998: United Package Company with 142 and 103 on-time deliveries, respectively.

Future Application

  • Improve Inventory Management:
    Use reorder level and stock status data to streamline restocking processes.
  • Optimize Shipping Processes:
    Increase on-time deliveries by selecting shipping companies based on past performance.
  • Enhance Customer Targeting:
    Focus on high-growth markets, like the USA, and monitor purchasing trends for customer retention and acquisition strategies.

Note
The values presented (in USD) are rounded to the nearest thousand or million.




______________________________________________________



Death & Life Expectancy for Mothers & Children

View Project



Death & Life

See More
Overview

The dataset analyzes seven indicators impacting maternal and child health from 1995 to 2010 across six WHO regions. These indicators include Total Fertility Rate, Percentage of Births Attended by Skilled Health Staff, Child Mortality Rate, Maternal Mortality Rate, Average Life Expectancy, Total Health Spending per Person in USD, and Percentage of Total Health Expenditure in GDP.

Tools & Techniques
  • Programming Language: Python
  • Data Wrangling: Pandas - Numpy
  • Visualization: Matplotlib - Seaborn

Roles & Responsibilities

  • Data Analysis:
    Evaluated health indicators for each region over the specified period.
  • Comparative Analysis:
    Compared indicators across regions and years to identify trends and patterns.
  • Additional Research:
    Investigated improvements, inconsistencies, and relationships between indicators.

Challenges Faced

  • Addressed data quality issues such as null values and unpivoting columns.
  • Managed large datasets by segmenting countries into regions for effective analysis.

Achievements

  • Trend Identification:
    Noted trends such as decreasing Total Fertility Rates, Child Mortality Rates, and Maternal Mortality Rates across all regions.
  • Regional Insights:
    Identified which regions had the highest and lowest values for each indicator.
  • Correlation Analysis:
    Found relationships between health expenditure indicators and various health outcomes.

Insights

  • Total Fertility Rate:
    Decreased overall, with AFR showing the highest rates and EUR the lowest.
  • Births Attended by Skilled Health Staff:
    Highest in EUR and lowest in AFR. Inconsistencies in data affected the ability to determine relationships with other indicators.
  • Child Mortality Rate:
    Decreased globally, with AFR having the highest and EUR the lowest rates.
  • Maternal Mortality Rate:
    Showed a global decrease, with AFR experiencing the highest rates and EUR the lowest.
  • Average Life Expectancy:
    Increased across regions, with EUR having the highest and AFR the lowest.
  • Total Health Spending per Person:
    Showed an overall increase, with EUR spending the most and SEAR the least. It is considered a reliable indicator, though its impact on skilled birth attendance showed variability.
  • Percentage of Total Health Expenditure in GDP:
    Generally increased, with EUR having the highest and EMR the lowest. It is considered a Significant indicator affecting health metrics, but with regional inconsistencies.

Future Application

  • Compare findings with global health indicators from the same period.
  • Investigate regional improvements and inconsistencies in health metrics.
  • Explore reasons behind the decline in health expenditure and its impact on other indicators, especially in the SEAR region and in 2009.



______________________________________________________



US Bikeshare

View Project



US Bikeshare

See More
Overview

This project analyzed bikeshare data from Chicago, New York City, and Washington to uncover usage patterns and user demographics.

Tools & Techniques
  • Programming Language: Python
  • Data Wrangling: Pandas - Numpy

Roles & Responsibilities

I independently conducted all aspects of the project, including coding the analysis scripts, and preparing and presenting findings.

Challenges Faced

  • Developed an interactive terminal script for real-time statistics based on user input.
  • Managed data discrepancies across cities and datasets.

Achievements

  • Popular Times of Travel:
    Identified the most common month, day of the week, and hour of the day for bike rentals.
  • Popular Stations and Trips:
    Determined the most common start and end stations, as well as the most frequent trips.
  • Trip Duration:
    Calculated total and average travel times.
  • User Info:
    Counted each user type (e.g., subscriber, customer), & Analyzed gender-specific data and birth years for NYC and Chicago.

Insights

  • Peak rental times align with commuting hours, suggesting daily transportation use.
  • Popular stations are near landmarks or transportation hubs, indicating high-demand areas.
  • Average ride lengths and travel time variability provided insights into trip duration.
  • User demographics analysis revealed the composition by user type and gender.

Future Application

  • Expand analysis to additional cities or more recent data for evolving usage patterns.
  • Enhance the terminal script's interactivity for detailed queries and customized outputs.
  • Explore correlations between bikeshare usage and external factors such as weather or events.





About Me Projects Testimonials Experience Education Contact